Python/Pandas here. I have 2 Excel files. In CSV form they look like:
mappings.xlsx (as a csv)
Id,Standard,FieldName
1,'Animal','Pet'
2,'Color','Clr'
3,'Food','Snack'
info.xslx (as a csv)
Pet,Clr,Snack
Dog,Blue,Pizza
Cat,Purple,French Fries
I want to read these 2 Excel files into Pandas DataFrames, and then use the mappings
dataframe to rename the columns of the info
dataframe:
mappings_data = pd.read_excel('mappings.xlsx', engine="openpyxl")
mappings_df = pd.DataFrame(mappings_data)
info_data = pd.read_excel('info.xlsx', engine="openpyxl")
info_df = pd.DataFrame(info_data)
At this point the info_df
has the following columns:
info_df["Pet"]
info_df["Clr"]
info_df["Snack"]
But because we have the mappings:
Pet
==>Animal
Clr
==>Color
Snack
==>Food
I want to end up with an info_df
that looks like so:
info_df["Animal"]
info_df["Color"]
info_df["Food"]
But obviously, everything needs to be dynamic based on the mappings.xlsx
file. Can anyone point me in the right direction please?
>Solution :
One approach is to create a dictionary from FieldName
and Standard
:
d = dict(zip(mappings_df['FieldName'], mappings_df['Standard']))
info_df = info_df.rename(columns=d)
Note that read_excel
returns a DataFrame
. The following can be simplified:
mappings_data = pd.read_excel('mappings.xlsx', engine="openpyxl")
mappings_df = pd.DataFrame(mappings_data)
info_data = pd.read_excel('info.xlsx', engine="openpyxl")
info_df = pd.DataFrame(info_data)
to just
mappings_df = pd.read_excel('mappings.xlsx', engine="openpyxl")
info_df = pd.read_excel('info.xlsx', engine="openpyxl")