Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to impute nan values in a Pandas dataframe from a multi-index dataframe?

I have the following dataframe:

df = pd.DataFrame([[np.nan, 2, 20, 4],
                   [3, 1, np.nan, 1],
                   [3, 1, 15, 1],
                   [np.nan, 1, np.nan, 1],
                   [10, 1, 30, 4],
                   [50, 2, 35, 4],
                   [10, 1, 37, 4],
                   [40, 2, 30, 1]],
                  columns=list("ABCD"))

I want to fill the Nan values with their group means.
Towards that purpose, I run the following:

df_mean=df.groupby(["B","D"]).mean()
df_mean

        A     C
B   D       
1   1   3.0   15.0
    4   10.0  33.5
2   1   40.0  30.0
    4   50.0  27.5

Is there a way to fill the dataframe df with the values computed in df_mean?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

One way to do this would be as in this answer

df[["A", "C"]] = (
    df
    # create groups
    .groupby(["B", "D"])
    # transform the groups by filling na values with the group mean
    .transform(lambda x: x.fillna(x.mean()))
)

However, for a few millions of rows, where the simple groupby([…]).mean() would take a few seconds, take too long…

It there a quicker way to solve this?

>Solution :

Use GroupBy.transform by mean and pass to DataFrame.fillna:

df = df.fillna(df.groupby(["B", "D"]).transform('mean'))
print (df)
      A  B     C  D
0  50.0  2  20.0  4
1   3.0  1  15.0  1
2   3.0  1  15.0  1
3   3.0  1  15.0  1
4  10.0  1  30.0  4
5  50.0  2  35.0  4
6  10.0  1  37.0  4
7  40.0  2  30.0  1 

Your solution with aggregation is possible also use this way:

df = df.fillna(df[['B','D']].join(df.groupby(["B","D"]).mean(), on=['B','D']))
print (df)
      A  B     C  D
0  50.0  2  20.0  4
1   3.0  1  15.0  1
2   3.0  1  15.0  1
3   3.0  1  15.0  1
4  10.0  1  30.0  4
5  50.0  2  35.0  4
6  10.0  1  37.0  4
7  40.0  2  30.0  1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading