Imputation: Why do we replace NaN values with the mean, and doesn't it affect our data?

Why do we replace the NaN values in a DataFrame with the mean, and doesn't changing them affect our data?

0     1.048242
1     1.688173 
2          NaN
3     0.194162
4     0.194162
5     0.493194
6          NaN
7     0.675041
8          NaN
9     0.101743
10    3.112086
mean = df['view_duration'].mean()  # NaN values are skipped by default
df['view_duration'] = df['view_duration'].fillna(mean)

0     1.048242
1     1.688173
2     0.938350
3     0.194162
4     0.194162
5     0.493194
6     0.938350
7     0.675041
8     0.938350
9     0.101743
10    3.112086


>Solution :

Replacing nulls with other relevant values (such as the mean) is called imputation. It is usually done before training machine learning models, because most of them cannot accept nulls.

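It does not change the mean of the column, because every filled value equals the mean of the observed values. It does affect other statistics, though: the variance shrinks, since the filled values have zero deviation from the mean.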
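To check this on the numbers from the question, here is a minimal sketch that rebuilds the column as a standalone Series. The variable names are illustrative, not from the original post. It compares the mean and standard deviation before and after filling.

```python
import numpy as np
import pandas as pd

# The 'view_duration' values from the question, rebuilt as a standalone Series.
s = pd.Series(
    [1.048242, 1.688173, np.nan, 0.194162, 0.194162, 0.493194,
     np.nan, 0.675041, np.nan, 0.101743, 3.112086],
    name="view_duration",
)

# Series.mean() and Series.std() skip NaN by default.
mean_before = s.mean()
std_before = s.std()

# Fill every NaN with the mean of the observed values.
filled = s.fillna(mean_before)
mean_after = filled.mean()
std_after = filled.std()

print(f"mean before: {mean_before:.6f}, after: {mean_after:.6f}")
print(f"std  before: {std_before:.6f}, after: {std_after:.6f}")
```

The two means agree, while the standard deviation after filling is smaller, because the three filled values sit exactly at the mean and add no spread.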

Please note that if a column has too many nulls (a common rule of thumb is above 30%, but this should be judged case by case), it is better not to impute and to drop the rows with nulls instead.
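One way to apply this rule of thumb is to measure the fraction of missing values first and then choose between dropping and imputing. The sketch below is only an illustration: the 0.30 cutoff is the rough figure mentioned above, not a fixed rule, and the names `THRESHOLD` and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rule-of-thumb cutoff from the text; adjust per dataset.
THRESHOLD = 0.30

df = pd.DataFrame({
    "view_duration": [1.048242, 1.688173, np.nan, 0.194162, 0.194162,
                      0.493194, np.nan, 0.675041, np.nan, 0.101743, 3.112086],
})

# isna() gives booleans; their mean is the fraction of missing values.
missing_fraction = df["view_duration"].isna().mean()

if missing_fraction > THRESHOLD:
    # Too many gaps: drop the rows where this column is null.
    cleaned = df.dropna(subset=["view_duration"])
else:
    # Few enough gaps: impute with the mean of the observed values.
    col_mean = df["view_duration"].mean()
    cleaned = df.assign(view_duration=df["view_duration"].fillna(col_mean))

print(f"missing fraction: {missing_fraction:.3f}, rows kept: {len(cleaned)}")
```

For the data in the question, 3 of 11 values are missing (about 27%), so this sketch takes the imputation branch and keeps all rows.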
