I’m reading in all files within a folder, and files whose names share the same prefix are merged together. Now I would like to add each file's individual path as an additional column when each file is read in as a DataFrame.
Is this possible?
This is the code:
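import glob
import os
from collections import defaultdict

import pandas as pd

# 'path' is the folder containing the files (defined elsewhere)
filenames = glob.glob(os.path.join(path, "*.xls"))
# Group files by the part of the name before the date ("_202...")
dd = defaultdict(list)
for fn in filenames:
    dd[fn.split('_202')[0]].append(fn)

dict_df = {}
for k, v in dd.items():
    #print("v: ", v)
    # The files are read as tab-separated text, hence read_csv with sep="\t"
    dict_df[k] = pd.concat([pd.read_csv(fn,
                                        parse_dates=['time'],
                                        dayfirst=True,
                                        skiprows=4,
                                        sep="\t",
                                        decimal=',')
                            for fn in v],
                           ignore_index=True)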
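To make the grouping step concrete, here is a small sketch using hypothetical filenames (not taken from the question). `fn.split('_202')[0]` keeps everything before the first "_202", i.e. before a date such as 2023-..., so files that differ only in their date end up in the same list.

```python
from collections import defaultdict

# Hypothetical filenames; in the question they come from glob.glob(...)
filenames = [
    "/data/sensorA_2023-01-01.xls",
    "/data/sensorA_2023-01-02.xls",
    "/data/sensorB_2023-01-01.xls",
]

dd = defaultdict(list)
for fn in filenames:
    # Everything before the first "_202" (start of the date) is the group key
    dd[fn.split("_202")[0]].append(fn)

for key, files in dd.items():
    print(key, "->", files)
```

Note that the key still contains the folder part of the path, because the split is applied to the full path.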
Solution:
Yes, this is possible. One approach is to add a column with the path information to each DataFrame right after pd.read_csv reads it, inside the list comprehension in the for loop, before the frames are passed to pd.concat.
Here’s an example modification to the code:
dict_df = {}
for k, v in dd.items():
    dict_df[k] = pd.concat([pd.read_csv(fn,
                                        parse_dates=['time'],
                                        dayfirst=True,
                                        skiprows=4,
                                        sep="\t",
                                        decimal=',')
                              .assign(file_path=fn)  # add a column with this file's full path
                            for fn in v],
                           ignore_index=True)
The assign() method adds a new column called file_path to each DataFrame as it is read in. Because fn is a single string, its value is repeated on every row from that file, so after pd.concat every row still records the full path of the file it came from. If you only want the file name, use os.path.basename(fn) instead. Avoid os.path.dirname(fn) here: it returns the folder, which is the same for every file matched by glob in one folder, so it cannot tell the files apart.
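A self-contained sketch to check the behaviour end to end. It writes two made-up tab-separated files (hypothetical names `sensorA_2023-01-01.xls` and `sensorA_2023-01-02.xls`, four lines to skip before the header, day-first dates, comma decimals) into a temporary folder, assuming they resemble the real files, then reads them with the same read_csv options and tags each frame with its own path.

```python
import glob
import os
import tempfile

import pandas as pd

# Made-up file contents (assumed layout): 4 lines to skip, then a tab-separated
# header, day-first dates and ',' as the decimal mark.
junk = "meta1\nmeta2\nmeta3\nmeta4\n"
samples = {
    "sensorA_2023-01-01.xls": "time\tvalue\n01/02/2023 10:00\t1,5\n",
    "sensorA_2023-01-02.xls": "time\tvalue\n02/02/2023 10:00\t2,5\n",
}

with tempfile.TemporaryDirectory() as tmpdir:
    for name, body in samples.items():
        with open(os.path.join(tmpdir, name), "w") as f:
            f.write(junk + body)

    # Both files share the "sensorA" prefix, so they form a single group here
    filenames = sorted(glob.glob(os.path.join(tmpdir, "*.xls")))

    df = pd.concat(
        [
            pd.read_csv(fn, parse_dates=["time"], dayfirst=True, skiprows=4,
                        sep="\t", decimal=",")
            .assign(file_path=fn)  # full path of the file each row came from
            for fn in filenames
        ],
        ignore_index=True,
    )

print(df[["time", "value", "file_path"]])
```

Each row's file_path differs because it is the full path of that row's file; swap `fn` for `os.path.basename(fn)` in the assign call if only the file name is wanted.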