Home Tricky calculation refresh based on columns using Pandas

Questions

Tricky calculation refresh based on columns using Pandas

September 29, 2023

Logic is to add the value in the new_r_aa after the first line of code output under ‘aa_cumul’ Example, the code will take 3 (which is the first output under ‘aa_cumul’) and sum it with the value 8 under the ‘new_r_aa’ column. This will result in 3+8 = 11 11+9=20, 20+ 8=28, 28+8=36 and so on. But remember the first groupby row should be unaffected and unchanged from the original input.

Data

data = {
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'quarter': ['2024Q1', '2024Q2', '2024Q3', '2024Q4', '2025Q1', '2024Q1'],
    'cml_bb_racks': [6, 13, 18, 20, 30, 5],
    'r_aa_bx': [0, 2, 3, 4, 2, 1],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'BB_AA_Bx_Ratio': [6.0, 4.333333, 3.0, 2.0, 2.5, 2.5],
    'expected_aa_bx_delta': [1.81, 2.856537, 2.395498, 0.0, 2.1, 2.0],
    'total aa': [1.810000, 4.856537, 5.395498, 4.000000, 4.100000, 3.000000],
    'total round aa': [2.0, 6.0, 6.0, 4.0, 6.0, 4.0],
    'new_r_aa': [2.0, 8.0, 9.0, 8.0, 8.0, 5.0],
    'aa_cumul': [3.0, 0.0, 0.0, 0.0, 0.0, 6.0]
}

df = pd.DataFrame(data)

Desired

data = {
    'city': ['NY', 'NY', 'NY', 'NY', 'NY', 'CA'],
    'ID': ['AA', 'AA', 'AA', 'AA', 'AA', 'AA'],
    'quarter': ['2024Q1', '2024Q2', '2024Q3', '2024Q4', '2025Q1', '2024Q1'],
    'cml_bb_racks': [6, 13, 18, 20, 30, 5],
    'r_aa_bx': [0, 2, 3, 4, 2, 1],
    'cml_aa_bx': [1, 3, 6, 10, 12, 2],
    'BB_AA_Bx_Ratio': [6.0, 4.333333, 3.0, 2.0, 2.5, 2.5],
    'expected_aa_bx_delta': [1.81, 2.856537, 2.395498, 0.0, 2.1, 2.0],
    'total aa': [1.800000, 4.856537, 5.395498, 4.000000, 4.100000, 3.000000],
    'total round aa': [2, 6, 6, 4, 6, 4],
    'new_r_aa': [2.0, 8.0, 9.0, 8.0, 8.0, 5.0],
    'aa_cumul': [3.0, 11.0, 20.0, 28.0, 36.0, 6.0]
}

df = pd.DataFrame(data)

Doing

df['aa_cumul'] = df.groupby(['city', 'ID'])['new_r_aa'].transform(lambda x: x.cumsum().add(x.iloc[0])).fillna(df['new_r_aa'])

However, this code is not applying the correct reset. Any suggestion is helpful.

>Solution :

IIUC, the below code will get you your desired output:

df['aa_cumul'] = df.groupby(['city', 'ID'])['new_r_aa'].transform(lambda x: x.cumsum().add(-x.iloc[0])) + df.groupby(['city', 'ID'])['aa_cumul'].cumsum()

The first part basically does the cumusum for the ‘new_r_aa’ column without the first entry used and the second part just adds the existing value of aa_cumul to that. Assuming aa_cumul has all zeros except the first element

Output:

0     3.0
1    11.0
2    20.0
3    28.0
4    36.0
5     6.0
dtype: float64

numpy

byMR

Published September 29, 2023

Add a comment

rbind 2 data frames by the elements in 2 columns, avoiding nested loops

byMR

September 29, 2023

Questions

Haskell dot operator with multiple parameters

byMR

September 29, 2023

Questions

Redirection of python script output to file not working

byMR

September 29, 2023

Questions

Type '() -> N' cannot conform to 'View'

byMR

September 29, 2023

Questions

Leetcode bug? (Function isn't stopping even after it returns in binary search)

byMR

September 29, 2023

Questions

How to Join (NOT COMBINE) two more django querysets?

byMR

September 29, 2023

Tricky calculation refresh based on columns using Pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

rbind 2 data frames by the elements in 2 columns, avoiding nested loops

Haskell dot operator with multiple parameters

Redirection of python script output to file not working

Type '() -> N' cannot conform to 'View'

Leetcode bug? (Function isn't stopping even after it returns in binary search)

How to Join (NOT COMBINE) two more django querysets?

Keep Up to Date with the Most Important News

Tricky calculation refresh based on columns using Pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

rbind 2 data frames by the elements in 2 columns, avoiding nested loops

Haskell dot operator with multiple parameters

Redirection of python script output to file not working

Type '() -> N' cannot conform to 'View'

Leetcode bug? (Function isn't stopping even after it returns in binary search)

How to Join (NOT COMBINE) two more django querysets?

Discover more from Dev solutions