How to replace NaN values with a forward fill and a decreasing rate in a pandas DataFrame?

I am trying to replace NaN values in a pandas DataFrame using a forward fill combined with a discount (decay) rate of 0.9.

I have the following data set:

    Column1 Column2 Column3 Column4
0   1.0     5       -9.0    13.0
1   NaN     6       -10.0   15.0
2   3.0     7       NaN     NaN
3   NaN     8       NaN     NaN

For reproducibility:

import numpy as np
import pandas as pd

# Use np.nan directly so the columns are numeric rather than object dtype
df1 = pd.DataFrame({
    'Column1': [1, np.nan, 3, np.nan],
    'Column2': [5, 6, 7, 8],
    'Column3': [-9, -10, np.nan, np.nan],
    'Column4': [13, 15, np.nan, np.nan],
})

I was able to replace the NaN values with a plain forward fill:

df2 = df1.ffill()  # same result as df1.fillna(method='ffill'), which recent pandas versions deprecate

    Column1 Column2 Column3 Column4
0   1.0     5       -9.0    13.0
1   1.0     6       -10.0   15.0
2   3.0     7       -10.0   15.0
3   3.0     8       -10.0   15.0

Additionally, I want to multiply each forward-filled value by 0.9 once for every consecutive NaN it fills, so the factor compounds along a run of NaNs. For example, in Column3:

NaN value row 2, column 3: -10 * 0.9 = -9

NaN value row 3, column 3: -9 * 0.9 = -8.1

This would yield the following data set:

    Column1 Column2 Column3 Column4
0   1.0     5       -9.0    13.00
1   0.9     6       -10.0   15.00
2   3.0     7       -9.0    13.50
3   2.7     8       -8.1    12.15

Is there an easy way to deal with that?

Thanks a lot!

Solution:

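Create a DataFrame of exponents by counting, for each cell, its position within a run of consecutive NaN values in its column (non-NaN cells get 0):

# Each non-NaN value starts a new group; the NaNs that follow it share that group
groups = df1.notna().cumsum()
# Within each group, number the NaNs 1, 2, 3, ... (the leading non-NaN cell stays 0)
exp = df1.apply(lambda col: col.isna().groupby(groups[col.name]).cumsum())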

#    Column1  Column2  Column3  Column4
# 0        0        0        0        0
# 1        1        0        0        0
# 2        0        0        1        1
# 3        1        0        2        2
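
To see how the grouping produces these exponents, it may help to trace the intermediate values for a single column. The sketch below rebuilds Column3 from the question as a standalone Series; the names `s` and `steps` are only illustrative, and the explicit `astype(int)` is an extra precaution to guarantee an integer count.

```python
import numpy as np
import pandas as pd

# Column3 from the question, as a standalone Series
s = pd.Series([-9.0, -10.0, np.nan, np.nan], name="Column3")

# Every non-NaN value bumps the counter, so the NaNs after it inherit its group id
groups = s.notna().cumsum()

# Running count of NaNs inside each group: the cell's position within its gap
exp = s.isna().astype(int).groupby(groups).cumsum()

# Side-by-side view of the value, its group id, and the resulting exponent
steps = pd.DataFrame({"value": s, "group": groups, "exp": exp})
print(steps)
```

Rows 2 and 3 fall into the same group as the -10.0 in row 1, so they receive exponents 1 and 2 respectively.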

Then forward-fill and multiply each cell by 0.9 raised to its exponent (cells that were not NaN have exponent 0, so they stay unchanged):

df2 = df1.ffill().mul(0.9 ** exp)

#    Column1  Column2  Column3  Column4
# 0      1.0      5.0     -9.0    13.00
# 1      0.9      6.0    -10.0    15.00
# 2      3.0      7.0     -9.0    13.50
# 3      2.7      8.0     -8.1    12.15
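
The two steps above can be wrapped into a small reusable helper. This is a minimal sketch, not part of the original answer: the function name `ffill_with_decay` and its `rate` parameter are hypothetical, chosen here for illustration.

```python
import numpy as np
import pandas as pd


def ffill_with_decay(df: pd.DataFrame, rate: float = 0.9) -> pd.DataFrame:
    """Forward-fill NaNs, scaling each filled value by rate ** (position in its gap).

    Hypothetical helper; assumes all columns are numeric.
    """
    # Each non-NaN value opens a new group within its column
    groups = df.notna().cumsum()
    # Position of every NaN inside its gap (0 for cells that hold a value)
    exp = df.isna().apply(
        lambda col: col.astype(int).groupby(groups[col.name]).cumsum()
    )
    # Forward-fill, then apply the compounded decay factor cell by cell
    return df.ffill().mul(rate ** exp)


df1 = pd.DataFrame({
    "Column1": [1, np.nan, 3, np.nan],
    "Column2": [5, 6, 7, 8],
    "Column3": [-9, -10, np.nan, np.nan],
    "Column4": [13, 15, np.nan, np.nan],
})

result = ffill_with_decay(df1, rate=0.9)
print(result)
```

A column that starts with NaN has nothing to fill from, so those leading cells remain NaN after the multiplication.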