How to map multiple datasets based on the same row values of one column?

I have three datasets:

df1
   name              A          B          C          D
0  Jeffrey Ray       0.171130   0.307791  -0.031378   0.366607
1  Crystal Roberts   0.429614   1.674201   0.176098  -1.289623
2  Paul Choute       0.396332   0.879003   1.889926   0.745100
3  Amy Clements      1.452253   0.724694  -0.830645  -0.342138
4  Stanley Coston   -0.135439  -0.293293   0.684627   0.538049

df2
   name              A          B          C          D
0  Jeffrey Ray       0.173430   0.399791  -0.012278   0.88897
1  Sara Flicker      0.635744   1.699001   0.176098  -1.11113
2  Jack Sether       0.323333   0.879003   1.889926   0.983640
3  Amy Clements      1.452253   0.784734  -0.673695  -0.342138
4  Stanley Coston   -0.143222  -0.293293   0.683647   0.638479

df3
   name              A          B          C          D
0  Jeffrey Ray       0.638273   0.687777   0.766666   0.900032
1  Sara Flicker      0.635744   1.699001   0.176098  -1.11113
2  Samantha Runyon   0.323333  -0.879003   1.889926  -0.683640
3  Amy Clements     -1.452253   0.784734  -0.673695  -0.342138
4  Stanley Coston   -0.143222  -0.282222   0.683647  -0.638479

I want to match rows by name across the three datasets and, for each name that appears in all three, get the sum of D:

Jeffrey Ray  0.366607+0.88897+0.900032
Amy Clements -0.342138+(-0.342138)+(-0.342138)
Stanley Coston 0.538049+0.638479-0.638479

I tried pd.merge:

pd.merge(df1, df2, on='name', how='inner').sum().reset_index()

But the output is not what I expected. Any suggestions on how I should change it?
Another issue is that pd.merge can only merge two datasets at a time. Is there an efficient way to combine multiple datasets at once without a for loop?

Thanks

Solution:

How about:

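# Stack the three frames on top of each other
dfs = pd.concat([df1, df2, df3])
# Sum D per name, keeping only names that occur 3 times (once per frame)
out = dfs.groupby('name')['D'].sum()[dfs['name'].value_counts() == 3]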

Output:

 name
Amy Clements     -1.026414
Jeffrey Ray       2.155609
Stanley Coston    0.538049
Name: D, dtype: float64
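
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same concat-then-groupby idea. It rebuilds only the name and D columns from the question. The level names `source` and `row` and the column `n_sources` are illustrative names chosen here, not part of the original answer. Instead of comparing against value_counts, this variant counts how many distinct frames each name appears in, which should also hold up if a name happens to be repeated inside a single frame.

```python
import pandas as pd

# Only the columns needed for the sum; values copied from the question.
df1 = pd.DataFrame({
    "name": ["Jeffrey Ray", "Crystal Roberts", "Paul Choute",
             "Amy Clements", "Stanley Coston"],
    "D": [0.366607, -1.289623, 0.745100, -0.342138, 0.538049],
})
df2 = pd.DataFrame({
    "name": ["Jeffrey Ray", "Sara Flicker", "Jack Sether",
             "Amy Clements", "Stanley Coston"],
    "D": [0.88897, -1.11113, 0.983640, -0.342138, 0.638479],
})
df3 = pd.DataFrame({
    "name": ["Jeffrey Ray", "Sara Flicker", "Samantha Runyon",
             "Amy Clements", "Stanley Coston"],
    "D": [0.900032, -1.11113, -0.683640, -0.342138, -0.638479],
})

frames = [df1, df2, df3]

# Stack the frames; `keys` adds an outer index level that records which
# frame each row came from. Move that level into a regular column.
stacked = pd.concat(
    frames, keys=list(range(len(frames))), names=["source", "row"]
).reset_index(level="source")

# Per name: the total of D and the number of distinct frames it appears in.
summary = stacked.groupby("name").agg(
    D=("D", "sum"),
    n_sources=("source", "nunique"),
)

# Keep only the names present in every frame.
out = summary.loc[summary["n_sources"] == len(frames), "D"]
print(out.round(6))
```

Because `frames` is a plain list, the same code works for any number of datasets without an explicit loop over pairwise merges.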