Add a column with the file path while reading files in (pd.read_csv)

March 14, 2023

I’m reading in all files within a folder, and files that share the same name prefix in their filename are merged together. Now I would like to add each file’s individual path (the elements of v) as an additional column each time a file is read in as a DataFrame.
Is this possible?

This is the code:

import glob
import os
from collections import defaultdict

import pandas as pd

filenames = glob.glob(os.path.join(path, "*.xls"))

dd = defaultdict(list)
for fn in filenames:
    dd[fn.split('_202')[0]].append(fn)
    
dict_df = {}
for k, v in dd.items():
    #print("v: ", v)
    dict_df[k] = pd.concat([pd.read_csv(fn
                                        , parse_dates=['time']
                                        , dayfirst=True
                                        , skiprows=4
                                        , sep="\t"
                                        , decimal=','
                                        ) for fn in v
                            ]
                           , ignore_index=True
                           )

>Solution :

Yes, it is possible to add each file’s path as an additional column while reading the files in with pd.read_csv. One approach is to chain .assign() onto each pd.read_csv call inside the list comprehension, so the column is added to every DataFrame before they are concatenated.

Here’s an example modification to the code:

dict_df = {}
for k, v in dd.items():
    dict_df[k] = pd.concat([pd.read_csv(fn
                                        , parse_dates=['time']
                                        , dayfirst=True
                                        , skiprows=4
                                        , sep="\t"
                                        , decimal=','
                                        ).assign(file_path=fn) # add a column with this file's path
                            for fn in v
                            ]
                           , ignore_index=True
                           )

The assign() method adds a new column called file_path to each DataFrame as it is read in. Because assign() broadcasts a scalar to every row, each row carries the full path of the file it came from (fn, as returned by glob). If you only want the file name, use os.path.basename(fn); if you only want the containing folder, use os.path.dirname(fn), but note that this value is identical for every file read from the same folder.
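
To try the approach without the original data, here is a minimal self-contained sketch. The sample file names, their contents, and the helper read_with_path are hypothetical and only serve the illustration. It also assumes the files have no header preamble, so skiprows and parse_dates are left out, and it groups on the base file name rather than the full path so that the temporary directory name cannot interfere with the split.

```python
import os
import tempfile
from collections import defaultdict
from glob import glob

import pandas as pd

# Hypothetical sample data: two tab-separated files sharing a name prefix.
tmp_dir = tempfile.mkdtemp()
samples = {
    "sensorA_2023-01.xls": "time\tvalue\n01.01.2023\t1,5\n",
    "sensorA_2023-02.xls": "time\tvalue\n01.02.2023\t2,5\n",
}
for name, text in samples.items():
    with open(os.path.join(tmp_dir, name), "w") as fh:
        fh.write(text)

# Group files by the part of the file name before "_202".
groups = defaultdict(list)
for fn in sorted(glob(os.path.join(tmp_dir, "*.xls"))):
    groups[os.path.basename(fn).split("_202")[0]].append(fn)


def read_with_path(fn):
    """Read one file and tag every row with the file it came from."""
    df = pd.read_csv(fn, sep="\t", decimal=",")
    # assign() broadcasts the scalar strings to all rows of this DataFrame.
    return df.assign(file_path=fn, file_name=os.path.basename(fn))


# One concatenated DataFrame per group, each row labelled with its source.
dict_df = {
    key: pd.concat([read_with_path(fn) for fn in files], ignore_index=True)
    for key, files in groups.items()
}

print(dict_df["sensorA"][["value", "file_name"]])
```

Pulling the read into a small function keeps the list comprehension short and gives one place to add further per-file columns later.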