How to use pandas df groupby and apply function to fill in NaN values with the column's median?

Using pandas, I’ve been working on Kaggle’s Titanic problem, and I have tried different variants of groupby/apply to fill in the NaN entries of the training data’s train['Age'] column.

ID   Age
887  19.0
888  NaN
889  26.0
890  32.0

How would I go through the elements and replace these NaN values with something like the median age?

I’ve tried variations of

train.Age = train.Age.apply(lambda x: x.fillna(x.median()))

without success. Could someone point me in the right direction? I don’t even need the code; just some tips or hints would help. I’ve been reading through the pandas documentation without making any progress.
Can it be done with just apply, or does it need some kind of groupby method?

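Solution:

You can call fillna directly on the column, without apply. Series.apply passes each element to the function one at a time, so inside the lambda x is a single float, which has no fillna or median method; that is why the attempt above fails.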

train.Age = train.Age.fillna(train.Age.median())
train
Out[561]: 
    ID   Age
0  887  19.0
1  888  26.0
2  889  26.0
3  890  32.0
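
Since the question also asks about groupby, here is a minimal sketch of filling each missing age with the median of its own group rather than the overall median. It assumes a grouping column named Pclass (passenger class); the sample rows and their Pclass values are made up for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the Pclass values are invented for illustration.
train = pd.DataFrame({
    "ID": [887, 888, 889, 890, 891],
    "Pclass": [1, 1, 3, 3, 1],
    "Age": [19.0, np.nan, 26.0, np.nan, 32.0],
})

# transform("median") returns a Series aligned with the original index that
# holds, for every row, the median Age of that row's group (NaN is skipped).
group_median = train.groupby("Pclass")["Age"].transform("median")

# Fill each missing age with the median of its own group.
train["Age"] = train["Age"].fillna(group_median)

# Fall back to the overall median in case an entire group had no ages.
train["Age"] = train["Age"].fillna(train["Age"].median())

print(train)
```

transform is used here instead of apply because it always returns a result with the same index as the input, so it can be passed straight to fillna.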