Dynamically renaming Pandas DataFrame columns based on configurable mapping

Advertisements

Python/Pandas here. I have 2 Excel files. In CSV form they look like:

mappings.xlsx (as a csv)

Id,Standard,FieldName
1,'Animal','Pet'
2,'Color','Clr'
3,'Food','Snack'

info.xslx (as a csv)

Pet,Clr,Snack
Dog,Blue,Pizza
Cat,Purple,French Fries

I want to read these 2 Excel files into Pandas DataFrames, and then use the mappings dataframe to rename the columns of the info dataframe:

mappings_data = pd.read_excel('mappings.xlsx', engine="openpyxl")
mappings_df = pd.DataFrame(mappings_data)

info_data = pd.read_excel('info.xlsx', engine="openpyxl")
info_df = pd.DataFrame(info_data)

At this point the info_df has the following columns:

info_df["Pet"]
info_df["Clr"]
info_df["Snack"]

But because we have the mappings:

  • Pet ==> Animal
  • Clr ==> Color
  • Snack ==> Food

I want to end up with an info_df that looks like so:

info_df["Animal"]
info_df["Color"]
info_df["Food"]

But obviously, everything needs to be dynamic based on the mappings.xlsx file. Can anyone point me in the right direction please?

>Solution :

One approach is to create a dictionary from FieldName and Standard:

d = dict(zip(mappings_df['FieldName'], mappings_df['Standard']))
info_df = info_df.rename(columns=d)

Note that read_excel returns a DataFrame. The following can be simplified:

mappings_data = pd.read_excel('mappings.xlsx', engine="openpyxl")
mappings_df = pd.DataFrame(mappings_data)

info_data = pd.read_excel('info.xlsx', engine="openpyxl")
info_df = pd.DataFrame(info_data)

to just

mappings_df = pd.read_excel('mappings.xlsx', engine="openpyxl")

info_df = pd.read_excel('info.xlsx', engine="openpyxl")

Leave a ReplyCancel reply