Logic is to add the value in the new_r_aa after the first line of code output under ‘aa_cumul’ Example, the code will take 3 (which is the first output under ‘aa_cumul’) and sum it with the value 8 under the ‘new_r_aa’ column. This will result in 3+8 = 11 11+9=20, 20+ 8=28, 28+8=36 and so on. But remember the first groupby row should be unaffected and unchanged from the original input.
Data
data = {
'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
'quarter': ['2024Q1', '2024Q2', '2024Q3', '2024Q4', '2025Q1', '2024Q1'],
'cml_bb_racks': [6, 13, 18, 20, 30, 5],
'r_aa_bx': [0, 2, 3, 4, 2, 1],
'cml_aa_bx': [1, 3, 6, 10, 12, 2],
'BB_AA_Bx_Ratio': [6.0, 4.333333, 3.0, 2.0, 2.5, 2.5],
'expected_aa_bx_delta': [1.81, 2.856537, 2.395498, 0.0, 2.1, 2.0],
'total aa': [1.810000, 4.856537, 5.395498, 4.000000, 4.100000, 3.000000],
'total round aa': [2.0, 6.0, 6.0, 4.0, 6.0, 4.0],
'new_r_aa': [2.0, 8.0, 9.0, 8.0, 8.0, 5.0],
'aa_cumul': [3.0, 0.0, 0.0, 0.0, 0.0, 6.0]
}
df = pd.DataFrame(data)
Desired
data = {
'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
'quarter': ['2024Q1', '2024Q2', '2024Q3', '2024Q4', '2025Q1', '2024Q1'],
'cml_bb_racks': [6, 13, 18, 20, 30, 5],
'r_aa_bx': [0, 2, 3, 4, 2, 1],
'cml_aa_bx': [1, 3, 6, 10, 12, 2],
'BB_AA_Bx_Ratio': [6.0, 4.333333, 3.0, 2.0, 2.5, 2.5],
'expected_aa_bx_delta': [1.81, 2.856537, 2.395498, 0.0, 2.1, 2.0],
'total aa': [1.800000, 4.856537, 5.395498, 4.000000, 4.100000, 3.000000],
'total round aa': [2, 6, 6, 4, 6, 4],
'new_r_aa': [2.0, 8.0, 9.0, 8.0, 8.0, 5.0],
'aa_cumul': [3.0, 11.0, 20.0, 28.0, 36.0, 6.0]
}
df = pd.DataFrame(data)
Doing
df['aa_cumul'] = df.groupby(['city', 'ID'])['new_r_aa'].transform(lambda x: x.cumsum().add(x.iloc[0])).fillna(df['new_r_aa'])
However, this code is not applying the correct reset. Any suggestion is helpful.
>Solution :
IIUC, the below code will get you your desired output:
df['aa_cumul'] = df.groupby(['city', 'ID'])['new_r_aa'].transform(lambda x: x.cumsum().add(-x.iloc[0])) + df.groupby(['city', 'ID'])['aa_cumul'].cumsum()
The first part basically does the cumusum for the ‘new_r_aa’ column without the first entry used and the second part just adds the existing value of aa_cumul to that. Assuming aa_cumul has all zeros except the first element
Output:
0 3.0
1 11.0
2 20.0
3 28.0
4 36.0
5 6.0
dtype: float64