How to manage column names containing multiple spaces when using read_csv

I use the same piece of code to import multiple dataframes. Usually they have the same column names with different data. However, sometimes the column names have a different number of spaces before or after them.

import pandas as pd

# file_path and schema are defined elsewhere (schema is shown below)
df = pd.read_csv(
    file_path,
    delimiter="|",
    low_memory=True,
    dtype=schema,
    usecols=schema.keys(),
)

The schema of the file is in a different file:

file_schema = {
    " Age ": str,
    " Name ": str,
    " Country ": str,
}

In other cases, there are no spaces before or after the names:

file_schema = {
    "Age": str,
    "Name": str,
    "Country": str,
}

Currently, with a single schema, I get errors related to usecols whenever the spaces around the column names in the file do not match the schema.
Is there a way to write the column names once, in one schema file, so that it works no matter how many spaces appear before or after the names?

>Solution :

I think it should be possible to match the column names with

pd.read_csv(..., usecols=lambda x: x.strip() in schema.keys())

and then either strip them afterwards with

df.columns = df.columns.str.strip()

or, even better, try to pass them explicitly with

pd.read_csv(..., header=0, names=list(schema))

if you know that all columns declared in schema will be in the file and in order.
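
As a rough illustration of the two options above, here is a minimal sketch. The sample data, the extra "Extra" column, and the variable names are made up for the example. It also assumes every schema type is str, as in the question, so that dtype=str can stand in for dtype=schema in the first variant.

```python
import io

import pandas as pd

# Schema written once, with clean (stripped) column names
schema = {"Age": str, "Name": str, "Country": str}

# Hypothetical sample: padded header names plus one column not in the schema
padded = " Age | Name | Country | Extra \n30|Alice|France|x\n41|Bob|Peru|y\n"

# Variant 1: select columns by their stripped name, then strip the labels.
# dtype=str is used because all schema types in the question are str.
df1 = pd.read_csv(
    io.StringIO(padded),
    delimiter="|",
    usecols=lambda name: name.strip() in schema,
    dtype=str,
)
df1.columns = df1.columns.str.strip()

# Variant 2: replace the header row with the schema names.
# Only safe when the file holds exactly the schema columns, in schema order.
exact = " Age | Name | Country \n30|Alice|France\n"
df2 = pd.read_csv(
    io.StringIO(exact),
    delimiter="|",
    header=0,
    names=list(schema),
    dtype=schema,
)

print(list(df1.columns))
print(list(df2.columns))
```

The first variant tolerates extra or reordered columns. The second is shorter but silently mislabels data if the file's column order ever differs from the schema.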

I am not sure whether dtype=schema will immediately cause the next problem, though, since its keys also have to match the column names as they are read.
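
One way to sidestep the dtype question is to read only the header row first, work out which raw names correspond to the schema, and key both usecols and dtype on those raw names before renaming. The sketch below assumes this approach. read_with_schema is a hypothetical helper, not part of pandas, and the sample data is invented.

```python
import io

import pandas as pd


def read_with_schema(source, schema, sep="|"):
    """Read a delimited file whose header names may be padded with spaces.

    `schema` maps clean (stripped) column names to dtypes.
    """
    # Read only the header row to learn the raw column names
    header = pd.read_csv(source, sep=sep, nrows=0)
    raw_to_clean = {raw: raw.strip() for raw in header.columns}

    # Keep the raw names whose stripped form appears in the schema
    wanted = [raw for raw, clean in raw_to_clean.items() if clean in schema]
    missing = set(schema) - {raw_to_clean[raw] for raw in wanted}
    if missing:
        raise ValueError(f"columns missing from file: {sorted(missing)}")

    # Key dtype on the raw names so it matches what the parser sees
    dtype = {raw: schema[raw_to_clean[raw]] for raw in wanted}

    # Rewind file-like objects before the second pass; paths need no rewind
    if hasattr(source, "seek"):
        source.seek(0)

    df = pd.read_csv(source, sep=sep, usecols=wanted, dtype=dtype)
    return df.rename(columns=raw_to_clean)


schema = {"Age": str, "Name": str, "Country": str}
sample = io.StringIO("  Age |Name  | Country \n30|Alice|France\n")
result = read_with_schema(sample, schema)
print(list(result.columns))
```

The cost is reading the header twice, which is negligible next to the full parse. The sketch does not handle two raw names that strip to the same clean name.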
