I have this example df
col1 = [1,1,1,2,2,1,1,1,2,2,2]
col2 = [20, 23, 12, 44, 14, 42, 44, 1, 42, 62, 11]
data = {"col1": col1, "col2": col2}
df = pd.DataFrame(data)
I need to add a column that adds up the col2 every time the col1 is 1 and then the same for when it is 2. I have tried grouping by col1 but this skips every time there is a 2 in between
The expected output would be this.
col1 col2 col3
1 20 55
1 23 55
1 12 55
2 44 58
2 14 58
1 42 87
1 44 87
1 1 87
2 42 115
2 62 115
2 11 115
Please let me know how to approach this
>Solution :
Use GroupBy.transform with helper Series for consecutive values generated by comapre shifted values for not equal and cumulative sum:
df['col3'] = df.groupby(df['col1'].ne(df['col1'].shift()).cumsum())['col2'].transform('sum')
print (df)
col1 col2 col3
0 1 20 55
1 1 23 55
2 1 12 55
3 2 44 58
4 2 14 58
5 1 42 87
6 1 44 87
7 1 1 87
8 2 42 115
9 2 62 115
10 2 11 115