Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Tricky calculation refresh based on columns using Pandas

Logic is to add the value in the new_r_aa after the first line of code output under ‘aa_cumul’ Example, the code will take 3 (which is the first output under ‘aa_cumul’) and sum it with the value 8 under the ‘new_r_aa’ column. This will result in 3+8 = 11 11+9=20, 20+ 8=28, 28+8=36 and so on. But remember the first groupby row should be unaffected and unchanged from the original input.

Data

data = {
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'quarter': ['2024Q1', '2024Q2', '2024Q3', '2024Q4', '2025Q1', '2024Q1'],
    'cml_bb_racks': [6, 13, 18, 20, 30, 5],
    'r_aa_bx': [0, 2, 3, 4, 2, 1],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'BB_AA_Bx_Ratio': [6.0, 4.333333, 3.0, 2.0, 2.5, 2.5],
    'expected_aa_bx_delta': [1.81, 2.856537, 2.395498, 0.0, 2.1, 2.0],
    'total aa': [1.810000, 4.856537, 5.395498, 4.000000, 4.100000, 3.000000],
    'total round aa': [2.0, 6.0, 6.0, 4.0, 6.0, 4.0],
    'new_r_aa': [2.0, 8.0, 9.0, 8.0, 8.0, 5.0],
    'aa_cumul': [3.0, 0.0, 0.0, 0.0, 0.0, 6.0]
}

df = pd.DataFrame(data)

Desired

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data = {
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'quarter': ['2024Q1', '2024Q2', '2024Q3', '2024Q4', '2025Q1', '2024Q1'],
    'cml_bb_racks': [6, 13, 18, 20, 30, 5],
    'r_aa_bx': [0, 2, 3, 4, 2, 1],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'BB_AA_Bx_Ratio': [6.0, 4.333333, 3.0, 2.0, 2.5, 2.5],
    'expected_aa_bx_delta': [1.81, 2.856537, 2.395498, 0.0, 2.1, 2.0],
    'total aa': [1.800000, 4.856537, 5.395498, 4.000000, 4.100000, 3.000000],
    'total round aa': [2, 6, 6, 4, 6, 4],
    'new_r_aa': [2.0, 8.0, 9.0, 8.0, 8.0, 5.0],
    'aa_cumul': [3.0, 11.0, 20.0, 28.0, 36.0, 6.0]
}

df = pd.DataFrame(data)

Doing

df['aa_cumul'] = df.groupby(['city', 'ID'])['new_r_aa'].transform(lambda x: x.cumsum().add(x.iloc[0])).fillna(df['new_r_aa'])

However, this code is not applying the correct reset. Any suggestion is helpful.

>Solution :

IIUC, the below code will get you your desired output:

df['aa_cumul'] = df.groupby(['city', 'ID'])['new_r_aa'].transform(lambda x: x.cumsum().add(-x.iloc[0])) + df.groupby(['city', 'ID'])['aa_cumul'].cumsum()

The first part basically does the cumusum for the ‘new_r_aa’ column without the first entry used and the second part just adds the existing value of aa_cumul to that. Assuming aa_cumul has all zeros except the first element

Output:

0     3.0
1    11.0
2    20.0
3    28.0
4    36.0
5     6.0
dtype: float64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading