Filling NaN in dataframe column based on groupby dataframe value

I want to impute values to replace NaNs in a dataframe. The dataframe contains Hour, Location and multiple numeric columns for speeds, times, etc.
In each numeric column containing NaNs, I want to replace the missing value with that column's average for the same Hour and Location.

For example, the Duration in row 2 would be given the average of the other Hour 0, Location BG values (3.15, 3.18, 3.22), and so on.

Hour Location Duration
0 BG 3.15
0 BG NaN
0 WH 3.12
0 BG 3.18
0 BG 3.22
1 WH 43.1
1 WH NaN
1 BG 42.5
1 WH 38.4
1 BG 37.8

I compute the per-group means with

df_mean = df.groupby(['Hour','Location']).mean(numeric_only=True)

with the intention of using this dataframe of means as the source for the replacement values, but I haven't managed to make that work.

>Solution :

df['Duration'] = df['Duration'].fillna(
    df.groupby(['Hour', 'Location'])['Duration'].transform('mean')
)
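
Since the question mentions several numeric columns, the same idea can probably be extended to all of them in one pass. The sketch below is only an illustration under assumptions: it rebuilds the sample data from the question, and the names `value_cols` and `group_means` are made up here. `transform('mean')` returns values aligned to the original rows, so `fillna` can use them directly without merging `df_mean` back in.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the Duration column is shown there)
df = pd.DataFrame({
    "Hour": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "Location": ["BG", "BG", "WH", "BG", "BG", "WH", "WH", "BG", "WH", "BG"],
    "Duration": [3.15, np.nan, 3.12, 3.18, 3.22, 43.1, np.nan, 42.5, 38.4, 37.8],
})

keys = ["Hour", "Location"]

# Assumption: every numeric column other than the grouping keys should be imputed
value_cols = [c for c in df.select_dtypes("number").columns if c not in keys]

# Per-row group means, with the same index and columns as df[value_cols]
group_means = df.groupby(keys)[value_cols].transform("mean")

# Fill only the missing cells; existing values are left untouched
df[value_cols] = df[value_cols].fillna(group_means)

print(df)
```

If every value in an (Hour, Location) group is NaN for some column, the group mean is NaN too, so those cells stay missing and would need a separate fallback.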