How to sum up a column based on another columns value Python

May 31, 2022

I have this example df

col1 = [1,1,1,2,2,1,1,1,2,2,2]
col2 = [20, 23, 12, 44, 14, 42, 44, 1, 42, 62, 11]

data = {"col1": col1, "col2": col2}
df = pd.DataFrame(data)

I need to add a column that adds up the col2 every time the col1 is 1 and then the same for when it is 2. I have tried grouping by col1 but this skips every time there is a 2 in between
The expected output would be this.

col1    col2  col3
1       20    55 
1       23    55
1       12    55
2       44    58
2       14    58
1       42    87
1       44    87
1       1     87
2       42    115
2       62    115
2       11    115

Please let me know how to approach this

>Solution :

Use GroupBy.transform with helper Series for consecutive values generated by comapre shifted values for not equal and cumulative sum:

df['col3'] = df.groupby(df['col1'].ne(df['col1'].shift()).cumsum())['col2'].transform('sum')
print (df)
    col1  col2  col3
0      1    20    55
1      1    23    55
2      1    12    55
3      2    44    58
4      2    14    58
5      1    42    87
6      1    44    87
7      1     1    87
8      2    42   115
9      2    62   115
10     2    11   115