Add a column while reading files in with pd.read_csv

I’m reading in all files within a folder, and files that share the same name prefix in their filename are merged together. Now I would like to add each file's individual path (an element of v) as an additional column each time a file is read in as a DataFrame.
Is this possible?

This is the code:

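import glob
import os
from collections import defaultdict

import pandas as pd

# path: the folder containing the files (defined elsewhere)
filenames = glob.glob(os.path.join(path, "*.xls"))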

dd = defaultdict(list)
for fn in filenames:
    dd[fn.split('_202')[0]].append(fn)
    
dict_df = {}
for k, v in dd.items():
    #print("v: ", v)
    dict_df[k] = pd.concat([pd.read_csv(fn
                                        , parse_dates=['time']
                                        , dayfirst=True
                                        , skiprows=4
                                        , sep="\t"
                                        , decimal=','
                                        ) for fn in v
                            ]
                           , ignore_index=True
                           )

Solution:

Yes, it is possible to add each file's path as an additional column while reading the files in as DataFrames with pd.read_csv. One approach is to modify the list comprehension inside the for loop, where the files are read, so that each DataFrame gets a column holding the path information.

Here’s an example modification to the code:

dict_df = {}
for k, v in dd.items():
    dict_df[k] = pd.concat([pd.read_csv(fn
                                        , parse_dates=['time']
                                        , dayfirst=True
                                        , skiprows=4
                                        , sep="\t"
                                        , decimal=','
                                        ).assign(file_path=fn) # add a column with this file's own path
                            for fn in v
                            ]
                           , ignore_index=True
                           )

The assign() method adds a new column called file_path to each DataFrame as it is read in, before the frames are concatenated. The value is fn, the full path of the file being read, so every row records the file it came from. Note that os.path.dirname(fn) would return only the containing folder, which is the same for every file matched by the glob pattern; use os.path.basename(fn) instead if only the file name is wanted.
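To try the pattern end to end, here is a minimal, self-contained sketch. The file names (siteA_2021.xls and so on), the time/value columns, and the helper name read_grouped are hypothetical and only serve the illustration; the parse_dates, dayfirst and skiprows options from the question are left out because the sample files have no extra header rows. Unlike the original code, the grouping key here is taken from the base name of each file so that the folder name cannot affect it.

```python
import glob
import os
import tempfile
from collections import defaultdict

import pandas as pd


def read_grouped(path):
    """Read tab-separated files in `path`, grouped by the part of the file
    name before '_202', tagging every row with the file it came from."""
    groups = defaultdict(list)
    for fn in sorted(glob.glob(os.path.join(path, "*.xls"))):
        # Group on the base name only, so the folder name is ignored.
        groups[os.path.basename(fn).split("_202")[0]].append(fn)

    return {
        key: pd.concat(
            [
                pd.read_csv(fn, sep="\t", decimal=",").assign(
                    file_path=fn,                    # full path of this file
                    file_name=os.path.basename(fn),  # file name only
                )
                for fn in files
            ],
            ignore_index=True,
        )
        for key, files in groups.items()
    }


# Hypothetical sample data, written to a temporary folder.
tmp_dir = tempfile.mkdtemp()
samples = {
    "siteA_2021.xls": "time\tvalue\n01.01.2021\t1,5\n",
    "siteA_2022.xls": "time\tvalue\n01.01.2022\t2,5\n02.01.2022\t3,0\n",
    "siteB_2021.xls": "time\tvalue\n01.01.2021\t4,0\n",
}
for name, text in samples.items():
    with open(os.path.join(tmp_dir, name), "w") as fh:
        fh.write(text)

frames = read_grouped(tmp_dir)
print(frames["siteA"][["value", "file_name"]])
```

Because assign() is called on each DataFrame before pd.concat, the source column survives the merge, and every row of a combined frame can be traced back to its file.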
