
Pandas 'Usecols do not match columns, columns expected but not found'

Here is my data file:

[screenshot of the data file]

The file reads without error, and I can list its columns:


import pandas as pd

df = pd.read_csv('data.csv')
print(df.columns)

[screenshot of the printed columns]

Why do I get an error when I try to read only one column, or several?

df2 = pd.read_csv('data.csv', usecols=['<TICKER>'])

[screenshot of the error quoted in the title]

Solution:

Your CSV is malformed: double quotes (") wrap each full line, so the parser reads every line as a single field. The file therefore loads as one column, and <TICKER> is not found as a column of its own.

You should remove them before trying to read the file as CSV.
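To see the effect on a small scale, here is a minimal sketch with made-up data. The <PER> and <CLOSE> columns and all values are assumptions for illustration, not taken from the question's file:

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: every line is wrapped in double quotes
raw = '"<TICKER>,<PER>,<CLOSE>"\n"AFLT,D,25.5"\n"AFLT,D,26.1"\n'

# The whole header line is read as the name of one single column
df = pd.read_csv(StringIO(raw))
columns = list(df.columns)
print(columns)

# Asking for <TICKER> alone fails, because no column has that exact name
try:
    pd.read_csv(StringIO(raw), usecols=['<TICKER>'])
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

Printing the columns first is a quick way to confirm this diagnosis before changing anything in the file.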

Here is an example that pre-processes the file to strip the outer " characters:

from io import StringIO
import pandas as pd

with open('data.csv') as f:
    # Drop the line ending first, then the quotes wrapping each line
    cleaned = '\n'.join(line.rstrip('\n').strip('"') for line in f)

df = pd.read_csv(StringIO(cleaned), usecols=['<TICKER>'])

Output:

  <TICKER>
0     AFLT
1     AFLT
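The same idea can be tried without touching any file. This is only a sketch: the helper name strip_outer_quotes and the sample data (including the <PER> and <CLOSE> columns) are hypothetical. Unlike a plain strip('"'), it removes exactly one pair of quotes, and only when a line both starts and ends with one:

```python
from io import StringIO
import pandas as pd

def strip_outer_quotes(lines):
    """Yield each line without its line ending and without one pair of wrapping quotes."""
    for line in lines:
        line = line.rstrip('\r\n')
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            line = line[1:-1]
        yield line

# Hypothetical sample standing in for data.csv
raw = '"<TICKER>,<PER>,<CLOSE>"\n"AFLT,D,25.5"\n"AFLT,D,26.1"\n'

cleaned = '\n'.join(strip_outer_quotes(StringIO(raw)))
df = pd.read_csv(StringIO(cleaned), usecols=['<TICKER>'])
print(df)
```

With a real file, the StringIO(raw) argument would be replaced by the open file object.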

Another quick-and-dirty approach, assuming no field is quoted on its own (i.e. the only " characters are the ones wrapping each line), is to treat " as an additional separator:

import csv
# csv.QUOTE_NONE (== 3): do not treat " as a quote character
df = pd.read_csv('data.csv', sep=',|"', usecols=['<TICKER>'],
                 engine='python', quoting=csv.QUOTE_NONE)
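One thing to keep in mind with the separator trick: the " at the start and end of each line produces an empty field on both sides, which should surface as extra unnamed columns when usecols is not given. A sketch on made-up data (the sample values and the <PER> and <CLOSE> columns are assumptions) that reads everything and then keeps only the named columns:

```python
import csv
from io import StringIO
import pandas as pd

# Hypothetical sample standing in for data.csv
raw = '"<TICKER>,<PER>,<CLOSE>"\n"AFLT,D,25.5"\n"AFLT,D,26.1"\n'

# Split on either a comma or a double quote; quoting is disabled entirely
df = pd.read_csv(StringIO(raw), sep=',|"', engine='python',
                 quoting=csv.QUOTE_NONE)

# Discard the placeholder columns created by the empty edge fields, if any
named = df.loc[:, ~df.columns.str.startswith('Unnamed')]
print(named)
```

Passing usecols, as in the answer above, avoids the extra columns in the first place.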