How to sum values across groups without summing duplicates

I have the following df:

     A    B     C    D
0  foo    a  1200  300
0  foo    a   700  300
0  foo    b  1000  300
1  bar    b   270   70
1  bar    a   350   70
2  abc    c   270  300
2  abc    a   350  300

I want to display the sum of the values in column D grouped by column B, but I do not want to sum the repeated values of column D for a single value in column A. That is, column D has only one value per value in column A.

foo will only ever have the value 300 and bar will only have the value 70 in column D. The values in this column are just repeated because I have repeated indexes.

I want to print something like (no need to show formatting, I just need to output the correct sums):

a: 300 (from foo) + 70 (from bar) + 300 (from abc) = 670
b: 300 (from foo) + 70 (from bar) = 370
c: 300 (from abc)

That is, values in column D should not be summed together if the value in column A is the same among them.

Solution:

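You could drop the repeated (A, B) pairs first, so that each value of A contributes its D only once per group, and then sum D per value of B.

df.drop_duplicates(['A', 'B']).groupby('B')['D'].sum()
B
a    670
b    370
c    300
Name: D, dtype: int64

Note that using pd.unique() after the groupby, as in df.groupby('B')['D'].apply(lambda x: sum(pd.unique(x))), deduplicates on the value of D rather than on A. With this data it returns 370 for a, because foo and abc both have 300 in column D and are counted only once.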
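
To check the sums end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and deduplicates on the (A, B) pair before summing. The helper name sum_d_per_b is hypothetical and only used for illustration. The sketch assumes, as the question states, that column D holds a single value per value of column A.

```python
import pandas as pd

# Sample data copied from the question, including the repeated index.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)


def sum_d_per_b(frame):
    """Sum D per value of B, counting each value of A once per group.

    Hypothetical helper: assumes D is constant for a given value of A.
    """
    # Keep one row per (A, B) pair so a repeated A does not add its D twice.
    deduped = frame.drop_duplicates(subset=["A", "B"])
    return deduped.groupby("B")["D"].sum()


result = sum_d_per_b(df)
print(result.to_dict())  # {'a': 670, 'b': 370, 'c': 300}
```

Deduplicating on the (A, B) pair rather than on the value of D keeps foo's 300 and abc's 300 as separate contributions to group a, while the two foo rows in that group count only once.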