Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Dynamically renaming Pandas DataFrame columns based on configurable mapping

Python/Pandas here. I have 2 Excel files. In CSV form they look like:

mappings.xlsx (as a csv)

Id,Standard,FieldName
1,'Animal','Pet'
2,'Color','Clr'
3,'Food','Snack'

info.xslx (as a csv)

Pet,Clr,Snack
Dog,Blue,Pizza
Cat,Purple,French Fries

I want to read these 2 Excel files into Pandas DataFrames, and then use the mappings dataframe to rename the columns of the info dataframe:

mappings_data = pd.read_excel('mappings.xlsx', engine="openpyxl")
mappings_df = pd.DataFrame(mappings_data)

info_data = pd.read_excel('info.xlsx', engine="openpyxl")
info_df = pd.DataFrame(info_data)

At this point the info_df has the following columns:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

info_df["Pet"]
info_df["Clr"]
info_df["Snack"]

But because we have the mappings:

  • Pet ==> Animal
  • Clr ==> Color
  • Snack ==> Food

I want to end up with an info_df that looks like so:

info_df["Animal"]
info_df["Color"]
info_df["Food"]

But obviously, everything needs to be dynamic based on the mappings.xlsx file. Can anyone point me in the right direction please?

>Solution :

One approach is to create a dictionary from FieldName and Standard:

d = dict(zip(mappings_df['FieldName'], mappings_df['Standard']))
info_df = info_df.rename(columns=d)

Note that read_excel returns a DataFrame. The following can be simplified:

mappings_data = pd.read_excel('mappings.xlsx', engine="openpyxl")
mappings_df = pd.DataFrame(mappings_data)

info_data = pd.read_excel('info.xlsx', engine="openpyxl")
info_df = pd.DataFrame(info_data)

to just

mappings_df = pd.read_excel('mappings.xlsx', engine="openpyxl")

info_df = pd.read_excel('info.xlsx', engine="openpyxl")
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading