Home Is there a way to fill in specific missing rows in a column using rows in another column as a filter

Questions

Is there a way to fill in specific missing rows in a column using rows in another column as a filter

June 5, 2022

I have a dataset of car attributes with missing values in some columns. In the Distance column for example, there are missing values and I want to replace them with the mean. There is a second column however, Car Type it shows whether the car is brand new or used. A brand new car would not have that many miles driven compared to a used car. I want to replace the NaN values in Distance with the mean of Distance values where the Car Type == 'Brand New'

Minimal setup:

df = pd.DataFrame({'Car type': ['New','Used','New','New','New','Used','New','New'],
                   'Distance':[20,2222,34,np.nan,np.nan,np.nan,50,10]})
print(df)

  Car type  Distance
0      New      20.0
1     Used    2222.0
2      New      34.0
3      New       NaN
4      New       NaN
5     Used       NaN
6      New      50.0
7      New      10.0

>Solution :

Compute the mean for each Car Type and broadcast the values (with transform) to all rows then use fillna to replace NaN by the mean value:

df['Distance'] = (df['Distance'].fillna(df.groupby('Car type')['Distance']
                                .transform('mean')))
print(df)

# Output
  Car type  Distance
0      New      20.0
1     Used    2222.0
2      New      34.0
3      New      28.5  # mean of New car
4      New      28.5  # mean of New car
5     Used    2222.0  # mean of Used car
6      New      50.0
7      New      10.0