I have a DataFrame df with students from three different classes. I am trying to fill in the missing ages with the mean age of the other students in the same class. I tried two approaches; one works and the other does not. I cannot figure out why, since both seem to do exactly the same thing. Could you explain why Solution B does not work while Solution A does?
Solution A: (Working)
df.loc[(df['Age'].isna()) & (df['Class'] == 1),'Age'] = mean_age
Solution B: (Not working)
df.loc[df['Class'] == 1,'Age'].fillna(mean_age, inplace=True)
>Solution :
If I understand correctly, you want to fill each missing age with the mean age of that student's class:
df['Age'] = df['Age'].fillna(df.groupby('Class')['Age'].transform('mean'))
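Solution B cannot work because df.loc[df['Class'] == 1, 'Age'] selects rows with a boolean mask, which returns a new Series (a copy) rather than a view of the original DataFrame. Calling fillna(..., inplace=True) on it fills that temporary copy, which is then discarded, so df itself is never changed. Solution A works because it assigns through df.loc directly, which writes into the original DataFrame.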
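To see the difference, here is a minimal sketch on a small made-up DataFrame. The sample values and the names attempt, missing_after_b, class_means and filled are illustrative assumptions, not taken from the question. It runs the Solution B pattern, counts the missing values that remain, and then applies the groupby fill.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sample data: three classes, one missing age in each.
df = pd.DataFrame({
    "Class": [1, 1, 1, 2, 2, 3, 3],
    "Age": [20.0, 22.0, np.nan, 30.0, np.nan, 40.0, np.nan],
})

# Solution B pattern: the boolean-mask selection returns a new Series,
# so the in-place fill only changes that temporary object.
attempt = df.copy()
mean_age = attempt.loc[attempt["Class"] == 1, "Age"].mean()
with warnings.catch_warnings():
    # Some pandas versions warn about this pattern; silenced for the demo.
    warnings.simplefilter("ignore")
    attempt.loc[attempt["Class"] == 1, "Age"].fillna(mean_age, inplace=True)
missing_after_b = int(attempt["Age"].isna().sum())
print(missing_after_b)  # 3, nothing was filled in the frame itself

# Per-class fill: transform('mean') returns one class mean per row,
# aligned with df's index, so fillna can use it row by row.
class_means = df.groupby("Class")["Age"].transform("mean")
filled = df["Age"].fillna(class_means)
print(filled.tolist())  # [20.0, 22.0, 21.0, 30.0, 30.0, 40.0, 40.0]
```

Assigning the result back (df['Age'] = filled) is what updates the DataFrame; fillna without inplace=True returns a new Series and leaves its input untouched.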