Python: fill NaN in dataframe with random values picked from the same column

I have a dataframe with some NaN values, like the one below, and I would like to fill in the NaN values in a column with random picks from the same column.
E.g. randomly pick values from Col1 to fill in the NaN values in Col1:

   Col1      Col2      Col3      Col4   Col5
0  -0.671603 -0.792415  0.783922 NaN    Blue
1   0.207720       NaN  0.996131 Tom    Yellow
2  -0.892115 -1.282333       NaN Julia  NaN
3  -0.315598 -2.371529 -1.959646 NaN    Pink
4        NaN       NaN -0.584636 NaN    Orange
5   0.314736 -0.692732 -0.303951 Jim    NaN
6   0.355121       NaN       NaN NaN    Red
7        NaN -1.900148  1.230828 Sophia NaN
8  -1.795468  0.490953       NaN Anne   Blue
9  -0.678491 -0.087815       NaN NaN    NaN
10  0.755714  0.550589 -0.702019 NaN    Pink
11  0.951908 -0.529933  0.344544 Tobi   Yellow
12       NaN  0.075340 -0.187669 Jon    Red
13       NaN  0.314342 -0.936066 NaN    Yellow
14       NaN  1.293355  0.098964 Peter  Orange

Any ideas?

I have tried something like this:

import numpy as np
import pandas as pd

num_nan= df[col_name].isna().sum()
for n in len(range(num_nan)):
  #pick random value from e.g. col1 that's not NaN
  df[col_name] = df[col_name].where((pd.notnull(df)), None).sample(random_state= 1)     
  #replace NaN-value in e.g. col1 with picked value
  df[col_name]= df.fillna('value')

to replace the NaN values in a column with a random pick from the same column.

>Solution :

You can try:

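for c in df:
    mask = df[c].isna()  # True where the value is missing
    # draw one replacement per NaN (with replacement) from the column's non-NaN values
    df.loc[mask, c] = np.random.choice(df.loc[~mask, c], size=mask.sum())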

print(df)

Prints (for example):

        Col1      Col2      Col3    Col4    Col5
0  -0.671603 -0.792415  0.783922     Jon    Blue
1   0.207720 -1.900148  0.996131     Tom  Yellow
2  -0.892115 -1.282333 -0.702019   Julia     Red
3  -0.315598 -2.371529 -1.959646    Tobi    Pink
4  -0.892115  0.075340 -0.584636     Jon  Orange
5   0.314736 -0.692732 -0.303951     Jim    Pink
6   0.355121 -0.792415  0.344544     Tom     Red
7  -0.892115 -1.900148  1.230828  Sophia     Red
8  -1.795468  0.490953 -0.303951    Anne    Blue
9  -0.678491 -0.087815  0.344544     Jon  Yellow
10  0.755714  0.550589 -0.702019   Peter    Pink
11  0.951908 -0.529933  0.344544    Tobi  Yellow
12 -0.678491  0.075340 -0.187669     Jon     Red
13  0.951908  0.314342 -0.936066   Julia  Yellow
14 -0.892115  1.293355  0.098964   Peter  Orange
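
If you need the same fill on every run, one possible variation is to wrap the loop in a function and draw from a seeded generator. This is only a sketch: the function name `fill_nan_with_random_picks`, the `seed` parameter and the small `demo` frame are made up for illustration and are not part of the question or the answer above. It also skips columns that have no NaN or no non-NaN values to draw from.

```python
import numpy as np
import pandas as pd


def fill_nan_with_random_picks(df, seed=None):
    """Return a copy of df where each NaN is replaced by a random
    non-NaN value drawn (with replacement) from the same column.

    Hypothetical helper; the name and the seed parameter are assumptions.
    """
    rng = np.random.default_rng(seed)  # seeded generator for repeatable fills
    out = df.copy()
    for col in out.columns:
        mask = out[col].isna()
        pool = out.loc[~mask, col].to_numpy()
        # Skip columns with nothing to fill or nothing to draw from
        if not mask.any() or pool.size == 0:
            continue
        out.loc[mask, col] = rng.choice(pool, size=int(mask.sum()))
    return out


# Small made-up frame for illustration
demo = pd.DataFrame({
    "Col1": [-0.67, 0.21, np.nan, -0.32, np.nan],
    "Col5": ["Blue", "Yellow", None, "Pink", None],
})
filled = fill_nan_with_random_picks(demo, seed=1)
print(filled)
```

Because the draw is with replacement, the same value can be picked for several NaN cells, as in the example output above. The function works on a copy, so the original dataframe keeps its NaN values.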