Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

problem in pandas 'apply' function to replace missing values

I want to replace np.nan values with other value in pandas.DataFrame using ‘apply’ function. And I will use replace method that where NaN is replaced with max value of each column (axis=0). You better understand below.

import pandas as pd

df = pd.DataFrame({'a':[1, np.nan, 3],
                  'b':[np.nan,5,6],
                  'c':[7,8,np.nan]})

result = df.apply(lambda c: c.replace(np.nan, max(c)), axis=0)
print(result)

There are three np.nan values. Two of them is replaced with appropriate values, but just one value is still np.nan(below picture)

enter image description here

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

After setting argument axis to 1, there is still one value that isn’t replaced. What’s the reason?

>Solution :

Python’s max doesn’t work if a list starts with NaN; so max(df['b'])returns NaN and it cannot fill the NaN value in that column. Use c.max() instead (which works because by default Series.max skips NaNs). So:

df = df.apply(lambda c: c.replace(np.nan, c.max()), axis=0)

But instead of replace, you could use fillna on axis:

df = df.fillna(df.max(), axis=0)

Output:

     a    b    c
0  1.0  6.0  7.0
1  3.0  5.0  8.0
2  3.0  6.0  8.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading