Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Fillna based on month over month percent change and previous value in Pandas

Given a time series data as follow, I need to fill NaNs in column value based on previous value and MoM_pct:

          date    value  MoM_pct
0    2012-1-31  17222.1   0.0019
1    2012-2-29      NaN   0.0101  --> calculated by (1+0.0101) * 17222.1
2    2012-3-31  15650.2   0.0121
3    2012-4-30  15603.1   0.0093
4    2012-5-31      NaN   0.0111  --> calculated by (1+0.0111) * 15603.1
5    2012-6-30      NaN   0.0112  --> calculated by (1+0.0111) * 15603.1 * (1+0.0112)
6    2012-7-31  16314.9   0.0103
7    2012-8-31  16658.9   0.0111
8    2012-9-30      NaN   0.0132  --> calculated by (1+0.0132) * 16658.9
9   2012-10-31      NaN   0.0118  --> calculated by (1+0.0132) * 16658.9 * (1+0.0118)
10  2012-11-30  18476.7   0.0121
11  2012-12-31  20334.2   0.0123
12   2013-1-31      NaN   0.0014  --> calculated by (1+0.0014) * 20334.2
13   2013-2-28      NaN   0.0087  --> calculated by (1+0.0014) * 20334.2 * (1+0.0087)

The expected result will like this:

          date        value  MoM_pct
0    2012-1-31  17222.10000   0.0019
1    2012-2-29  17396.04321   0.0101
2    2012-3-31  15650.20000   0.0121
3    2012-4-30  15603.10000   0.0093
4    2012-5-31  15776.29441   0.0111
5    2012-6-30  15952.98891   0.0112
6    2012-7-31  16314.90000   0.0103
7    2012-8-31  16658.90000   0.0111
8    2012-9-30  16878.79748   0.0132
9   2012-10-31  17077.96729   0.0118
10  2012-11-30  18476.70000   0.0121
11  2012-12-31  20334.20000   0.0123
12   2013-1-31  20362.66788   0.0014
13   2013-2-28  20539.82309   0.0087
14   2013-3-31  17641.20000   0.0150

The example data:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

from numpy import nan
import pandas as pd
    
df = pd.DataFrame({
     'date': {0: '2012-1-31',
                  1: '2012-2-29',
                  2: '2012-3-31',
                  3: '2012-4-30',
                  4: '2012-5-31',
                  5: '2012-6-30',
                  6: '2012-7-31',
                  7: '2012-8-31',
                  8: '2012-9-30',
                  9: '2012-10-31',
                  10: '2012-11-30',
                  11: '2012-12-31',
                  12: '2013-1-31',
                  13: '2013-2-28',
                  14: '2013-3-31'},
     'value': {0: 17222.1,
                   1: nan,
                   2: 15650.2,
                   3: 15603.1,
                   4: nan,
                   5: nan,
                   6: 16314.9,
                   7: 16658.9,
                   8: nan,
                   9: nan,
                   10: 18476.7,
                   11: 20334.2,
                   12: nan,
                   13: nan,
                   14: 17641.2},
    'MoM_pct': {0: 0.0019,
                     1: 0.0101,
                     2: 0.0121,
                     3: 0.0093,
                     4: 0.0111,
                     5: 0.0112,
                     6: 0.0103,
                     7: 0.0111,
                     8: 0.0132,
                     9: 0.0118,
                     10: 0.0121,
                     11: 0.0123,
                     12: 0.0014,
                     13: 0.0087,
                     14: 0.015}
    })

My code below works for the example data, but it’s not suitable if sample has more than two consecutive values are NaNs, also not concise:

df['value1'] = df['value'].shift(1)*df['MoM_pct'].add(1)
df['value2'] = df['value1'].shift(1)*df['MoM_pct'].add(1)
df['value'].fillna(df['value1'], inplace=True)
df['value'].fillna(df['value2'], inplace=True)

How could I improve my code above? Thanks.

>Solution :

We can create the multipliers from the MoM_pct column with groupby cumprod then fillna the valid values with the multiplicative identity 1 and multiply with the ffilled value column:

m = df['value'].isna()
df['value'] = (
        df['value'].ffill() *
        (1 + df['MoM_pct']).groupby((~m).cumsum()[m]).cumprod().fillna(1)
)

df:

          date         value  MoM_pct
0    2012-1-31  17222.100000   0.0019
1    2012-2-29  17396.043210   0.0101
2    2012-3-31  15650.200000   0.0121
3    2012-4-30  15603.100000   0.0093
4    2012-5-31  15776.294410   0.0111
5    2012-6-30  15952.988907   0.0112
6    2012-7-31  16314.900000   0.0103
7    2012-8-31  16658.900000   0.0111
8    2012-9-30  16878.797480   0.0132
9   2012-10-31  17077.967290   0.0118
10  2012-11-30  18476.700000   0.0121
11  2012-12-31  20334.200000   0.0123
12   2013-1-31  20362.667880   0.0014
13   2013-2-28  20539.823091   0.0087
14   2013-3-31  17641.200000   0.0150

Here is the breakdown of steps as a DataFrame:

m = df['value'].isna()
print(pd.DataFrame({
    'm': m,
    'groups': (~m).cumsum(),
    'filtered groups': (~m).cumsum()[m],
    'result of cumprod': (1 + df['MoM_pct']).groupby(
        (~m).cumsum()[m]
    ).cumprod(),
    'fill missing with 1': (1 + df['MoM_pct']).groupby(
        (~m).cumsum()[m]
    ).cumprod().fillna(1)
}))
        m  groups  filtered groups  result of cumprod  fill missing with 1
0   False       1              NaN                NaN             1.000000
1    True       1              1.0           1.010100             1.010100
2   False       2              NaN                NaN             1.000000
3   False       3              NaN                NaN             1.000000
4    True       3              3.0           1.011100             1.011100
5    True       3              3.0           1.022424             1.022424
6   False       4              NaN                NaN             1.000000
7   False       5              NaN                NaN             1.000000
8    True       5              5.0           1.013200             1.013200
9    True       5              5.0           1.025156             1.025156
10  False       6              NaN                NaN             1.000000
11  False       7              NaN                NaN             1.000000
12   True       7              7.0           1.001400             1.001400
13   True       7              7.0           1.010112             1.010112
14  False       8              NaN                NaN             1.000000
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading