Home Fillna based on month over month percent change and previous value in Pandas

Questions

Fillna based on month over month percent change and previous value in Pandas

November 11, 2021

Given a time series data as follow, I need to fill NaNs in column value based on previous value and MoM_pct:

          date    value  MoM_pct
0    2012-1-31  17222.1   0.0019
1    2012-2-29      NaN   0.0101  --> calculated by (1+0.0101) * 17222.1
2    2012-3-31  15650.2   0.0121
3    2012-4-30  15603.1   0.0093
4    2012-5-31      NaN   0.0111  --> calculated by (1+0.0111) * 15603.1
5    2012-6-30      NaN   0.0112  --> calculated by (1+0.0111) * 15603.1 * (1+0.0112)
6    2012-7-31  16314.9   0.0103
7    2012-8-31  16658.9   0.0111
8    2012-9-30      NaN   0.0132  --> calculated by (1+0.0132) * 16658.9
9   2012-10-31      NaN   0.0118  --> calculated by (1+0.0132) * 16658.9 * (1+0.0118)
10  2012-11-30  18476.7   0.0121
11  2012-12-31  20334.2   0.0123
12   2013-1-31      NaN   0.0014  --> calculated by (1+0.0014) * 20334.2
13   2013-2-28      NaN   0.0087  --> calculated by (1+0.0014) * 20334.2 * (1+0.0087)

The expected result will like this:

          date        value  MoM_pct
0    2012-1-31  17222.10000   0.0019
1    2012-2-29  17396.04321   0.0101
2    2012-3-31  15650.20000   0.0121
3    2012-4-30  15603.10000   0.0093
4    2012-5-31  15776.29441   0.0111
5    2012-6-30  15952.98891   0.0112
6    2012-7-31  16314.90000   0.0103
7    2012-8-31  16658.90000   0.0111
8    2012-9-30  16878.79748   0.0132
9   2012-10-31  17077.96729   0.0118
10  2012-11-30  18476.70000   0.0121
11  2012-12-31  20334.20000   0.0123
12   2013-1-31  20362.66788   0.0014
13   2013-2-28  20539.82309   0.0087
14   2013-3-31  17641.20000   0.0150

The example data:

from numpy import nan
import pandas as pd
    
df = pd.DataFrame({
     'date': {0: '2012-1-31',
                  1: '2012-2-29',
                  2: '2012-3-31',
                  3: '2012-4-30',
                  4: '2012-5-31',
                  5: '2012-6-30',
                  6: '2012-7-31',
                  7: '2012-8-31',
                  8: '2012-9-30',
                  9: '2012-10-31',
                  10: '2012-11-30',
                  11: '2012-12-31',
                  12: '2013-1-31',
                  13: '2013-2-28',
                  14: '2013-3-31'},
     'value': {0: 17222.1,
                   1: nan,
                   2: 15650.2,
                   3: 15603.1,
                   4: nan,
                   5: nan,
                   6: 16314.9,
                   7: 16658.9,
                   8: nan,
                   9: nan,
                   10: 18476.7,
                   11: 20334.2,
                   12: nan,
                   13: nan,
                   14: 17641.2},
    'MoM_pct': {0: 0.0019,
                     1: 0.0101,
                     2: 0.0121,
                     3: 0.0093,
                     4: 0.0111,
                     5: 0.0112,
                     6: 0.0103,
                     7: 0.0111,
                     8: 0.0132,
                     9: 0.0118,
                     10: 0.0121,
                     11: 0.0123,
                     12: 0.0014,
                     13: 0.0087,
                     14: 0.015}
    })

My code below works for the example data, but it’s not suitable if sample has more than two consecutive values are NaNs, also not concise:

df['value1'] = df['value'].shift(1)*df['MoM_pct'].add(1)
df['value2'] = df['value1'].shift(1)*df['MoM_pct'].add(1)
df['value'].fillna(df['value1'], inplace=True)
df['value'].fillna(df['value2'], inplace=True)

How could I improve my code above? Thanks.

>Solution :

We can create the multipliers from the MoM_pct column with groupby cumprod then fillna the valid values with the multiplicative identity 1 and multiply with the ffilled value column:

m = df['value'].isna()
df['value'] = (
        df['value'].ffill() *
        (1 + df['MoM_pct']).groupby((~m).cumsum()[m]).cumprod().fillna(1)
)

df:

          date         value  MoM_pct
0    2012-1-31  17222.100000   0.0019
1    2012-2-29  17396.043210   0.0101
2    2012-3-31  15650.200000   0.0121
3    2012-4-30  15603.100000   0.0093
4    2012-5-31  15776.294410   0.0111
5    2012-6-30  15952.988907   0.0112
6    2012-7-31  16314.900000   0.0103
7    2012-8-31  16658.900000   0.0111
8    2012-9-30  16878.797480   0.0132
9   2012-10-31  17077.967290   0.0118
10  2012-11-30  18476.700000   0.0121
11  2012-12-31  20334.200000   0.0123
12   2013-1-31  20362.667880   0.0014
13   2013-2-28  20539.823091   0.0087
14   2013-3-31  17641.200000   0.0150

Here is the breakdown of steps as a DataFrame:

m = df['value'].isna()
print(pd.DataFrame({
    'm': m,
    'groups': (~m).cumsum(),
    'filtered groups': (~m).cumsum()[m],
    'result of cumprod': (1 + df['MoM_pct']).groupby(
        (~m).cumsum()[m]
    ).cumprod(),
    'fill missing with 1': (1 + df['MoM_pct']).groupby(
        (~m).cumsum()[m]
    ).cumprod().fillna(1)
}))

        m  groups  filtered groups  result of cumprod  fill missing with 1
0   False       1              NaN                NaN             1.000000
1    True       1              1.0           1.010100             1.010100
2   False       2              NaN                NaN             1.000000
3   False       3              NaN                NaN             1.000000
4    True       3              3.0           1.011100             1.011100
5    True       3              3.0           1.022424             1.022424
6   False       4              NaN                NaN             1.000000
7   False       5              NaN                NaN             1.000000
8    True       5              5.0           1.013200             1.013200
9    True       5              5.0           1.025156             1.025156
10  False       6              NaN                NaN             1.000000
11  False       7              NaN                NaN             1.000000
12   True       7              7.0           1.001400             1.001400
13   True       7              7.0           1.010112             1.010112
14  False       8              NaN                NaN             1.000000