Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to sum up a column based on another columns value Python

I have this example df

col1 = [1,1,1,2,2,1,1,1,2,2,2]
col2 = [20, 23, 12, 44, 14, 42, 44, 1, 42, 62, 11]

data = {"col1": col1, "col2": col2}
df = pd.DataFrame(data)

I need to add a column that adds up the col2 every time the col1 is 1 and then the same for when it is 2. I have tried grouping by col1 but this skips every time there is a 2 in between
The expected output would be this.

col1    col2  col3
1       20    55 
1       23    55
1       12    55
2       44    58
2       14    58
1       42    87
1       44    87
1       1     87
2       42    115
2       62    115
2       11    115

Please let me know how to approach this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use GroupBy.transform with helper Series for consecutive values generated by comapre shifted values for not equal and cumulative sum:

df['col3'] = df.groupby(df['col1'].ne(df['col1'].shift()).cumsum())['col2'].transform('sum')
print (df)
    col1  col2  col3
0      1    20    55
1      1    23    55
2      1    12    55
3      2    44    58
4      2    14    58
5      1    42    87
6      1    44    87
7      1     1    87
8      2    42   115
9      2    62   115
10     2    11   115
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading