Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

DF Rolling calculation with starting value and based on another column

Lets say I have a dataframe where I have banking information. I have present value, a list of transactions, and I want to work backwards to calculate the balance over time.

Here is the dataframe:


value   CN_running_balance
2020-08-07  -50.82  843.70
2020-08-06  893.77  NaN
2020-08-05  0.00    NaN
2020-08-04  -9.56   NaN
2020-08-03  -12.21  NaN
... ... ...
2020-05-14  1224.78 NaN
2020-05-13  0.00    NaN
2020-05-12  0.00    NaN
2020-05-11  -25.00  NaN
2020-05-10  -0.00   NaN

And I want to transform the running balance to use that rows value column for next row, by subtracting the value from the balance.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

value   CN_running_balance
2020-08-07  -50.82  843.70
2020-08-06  893.77  894.52
2020-08-05  0.00    0.75
2020-08-04  -9.56   etc
2020-08-03  -12.21  etc
... ... ...
2020-05-14  1224.78 etc
2020-05-13  0.00    etc
2020-05-12  0.00    etc
2020-05-11  -25.00  etc
2020-05-10  -0.00   etc

This has been pretty tricky for me so I would appreciate any suggestions on how to solve the problem!

>Solution :

I would suggest simply iterating over your dataframe as you have to forward fill the value. Here is one approach:

from io import StringIO
import pandas as pd

# Create your data frame
input_string = """value  CN_running_balance
2020-08-07  -50.82  843.70
2020-08-06  893.77  NaN
2020-08-05  0.00  NaN
2020-08-04  -9.56  NaN
2020-08-03  -12.21  NaN
2020-05-14  1224.78  NaN
2020-05-13  0.00  NaN
2020-05-12  0.00  NaN
2020-05-11  -25.00  NaN
2020-05-10  -0.00  NaN"""

data = StringIO(input_string)
df = pd.read_csv(data, sep='  ', engine='python')

# Create iterable index
df = df.reset_index()

# Forward fill running balance
for i in range(1, len(df)):
    df.loc[i, 'CN_running_balance']  = df.loc[i-1, 'CN_running_balance'] - df.loc[i-1, 'value']

# Reset original index
df = df.set_index('index')       
print(df)

Output:

              value  CN_running_balance
index                                  
2020-08-07   -50.82              843.70
2020-08-06   893.77              894.52
2020-08-05     0.00                0.75
2020-08-04    -9.56                0.75
2020-08-03   -12.21               10.31
2020-05-14  1224.78               22.52
2020-05-13     0.00            -1202.26
2020-05-12     0.00            -1202.26
2020-05-11   -25.00            -1202.26
2020-05-10    -0.00            -1177.26
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading