Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combine dataframes based on multiple conditions in python

Table A

Item    amount  year
A   100 2011
A   110 2011
A   125 2012
A   25  2012
B   54  2013
C   55  2020
D   68  2022

Table B

item    year    desc
A   2011    xxx
A   2011    xxx
A   2012    xxx
B   2011    xxx
B   2012    xxx
B   2013    xxx
B   2014    xxx
C   2020    xxx
D   2022    xxx

Table C

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

item    year    desc    Total
A   2011    xxx 220
A   2011    xxx 0
A   2012    xxx 150
B   2011    xxx 0
B   2012    xxx 0
B   2013    xxx 54
B   2014    xxx 0
C   2020    xxx 55
D   2022    xxx 68

These are example tables that represent dataframes that I’m going to create from seperate excel sheets.

Basically there’s a many to many relationship going on and I want to be able to create a combined sheet that will roll up the "amount" total (from Table A) for each year of the item (Table B). I don’t want to duplicate the amounts for multiple items with the same year as you see in table A, item A has two 2011 entries.

What I’d like to get is a result like Table C below

I’m currently trying to use pandas merge but not having much luck. Can anyone lend a hand?

>Solution :

IIUC, you can use a merge and post-process to remove the duplicates per year:

out = (dfB
    .merge(dfA.rename(columns={'Item': 'item'})
              .groupby(['item', 'year'], as_index=False).sum(), how='left')
    .assign(amount=lambda d: d['amount']
                             .mask(d.groupby('year').cumcount().gt(0), 0)
                             .fillna(0)
           )
)

output:

  item  year desc  amount
0    A  2011  xxx   210.0
1    A  2011  xxx     0.0
2    A  2012  xxx   150.0
3    B  2011  xxx     0.0
4    B  2012  xxx     0.0
5    B  2013  xxx    54.0
6    B  2014  xxx     0.0
7    C  2020  xxx    55.0
8    D  2022  xxx    68.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading