Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python: how to multiple merge between two data frames when number of variables are unknown

My example master dataset is the following:

df1 = pd.DataFrame(
    data=
    [['dfs','cat', 'giraffe', 'fish'],
     ['axs','dog', 'pig', 'bird'],
     ['fgd','cow', 'lion', 'bull'],
     ['rty','mouse', 'elephant', 'sheep'],
     ['pwe', 'fox', 'tiger', 'horse']],
    columns=['othervar', 'makevar0', 'makevar1', 'makevar2']
)
print(df1)

The other dataset is as follows:

df2 = pd.DataFrame(
    data=
    [['cat', 34],
     ['pig', 23],
     ['cow', 45],
     ['elephant', 86],
     ['horse', 90]],
    columns=['varx', 'targetvar']
)
print(df2)

I want to merge df1 and df2 like follows:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

merged = pd.merge(df1, df2[['varx', 'targetvar']], left_on='makevar0',
                        right_on='varx',
                        how='left', validate='m:1')
print(merged)

The issue is I want to do this for all makevar0, makeover1, and makevar2 in the master dataset; but there may be more than 3 such variables. That is the master dataset may have 2,3,4 or any number of makevars (not very large number and certainly below 10). However, all these will be named starting with ‘makevar’ followed by a number. I want to merge all the makevars with the varx in 2nd dataset to get targetvar mapped in the master data in a loop. The 2nd dataframe does not change at all.

Any help is appreciated so that I don’t have to manually check the number of makevars and write the same line of codes many times.

>Solution :

I would suggest using Series.map to substitute the values in each makevar column:

# create a mapping series
s = df2.set_index('varx')['targetvar']

# Substitute values in each makevar like column
for c in df1.filter(like='makevar'):
    df1[c.replace('make', 'target')] = df1[c].map(s)

Result

  othervar makevar0  makevar1 makevar2  targetvar0  targetvar1  targetvar2
0      dfs      cat   giraffe     fish        34.0         NaN         NaN
1      axs      dog       pig     bird         NaN        23.0         NaN
2      fgd      cow      lion     bull        45.0         NaN         NaN
3      rty    mouse  elephant    sheep         NaN        86.0         NaN
4      pwe      fox     tiger    horse         NaN         NaN        90.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading