Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

DataFrame: impute column with the median value of each category

My dataset looks like:

df = pd.DataFrame(
{'sensor1': [1.94,0.93,0.98,1.75,1.75,3.25,0.5,0.5,5.59,6.02,9.21,4.54,3.71,1.05],
 'sensor2': [-0.91,0.42,-0.11,0.0,0.0,-0.12,0.0,0.0,0.48,0.26,-1.5,-0.75,-1.45,0.06],
 'sensor3': [18,19,20,-2094,-2094,17,17,17,-985,-985,1163,1163,1163,-1265],
 'type': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]}
)

df.head()
    sensor1 sensor2 sensor3 type
0   1.94    -0.91      18    1
1   0.93     0.42      19    1
2   0.98    -0.11      20    1
3   1.75     0.00   -2094    1
4   1.75     0.00   -2094    1

I negative value from output of sensor3 means invalid. I would like to impute all invalid values in that column, using the median value of each type category.

Required

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    sensor1 sensor2 sensor3 type
0   1.94    -0.91     18     1
1   0.93     0.42     19     1
2   0.98    -0.11     20     1
3   1.75     0.00     19     1
4   1.75     0.00     19     1
5   3.25    -0.12     17     2
6   0.50     0.00     17     2
7   0.50     0.00     17     2
8   5.59     0.48     17     2
9   6.02     0.26     17     2
10  9.21    -1.50   1163     3
11  4.54    -0.75   1163     3
12  3.71    -1.45   1163     3
13  1.05     0.06   1163     3

>Solution :

Create mask for identify invalid values, here less like 0 by Series.lt, then filter rows with inverted mask and aggregate median, last filter rows in DataFrame.loc and mapping by medians by Series.map:

m = df['sensor3'].lt(0)

df.loc[m, 'sensor3'] = df.loc[m, 'type'].map(df[~m].groupby('type')['sensor3'].median())
print (df)
    sensor1  sensor2  sensor3  type
0      1.94    -0.91       18     1
1      0.93     0.42       19     1
2      0.98    -0.11       20     1
3      1.75     0.00       19     1
4      1.75     0.00       19     1
5      3.25    -0.12       17     2
6      0.50     0.00       17     2
7      0.50     0.00       17     2
8      5.59     0.48       17     2
9      6.02     0.26       17     2
10     9.21    -1.50     1163     3
11     4.54    -0.75     1163     3
12     3.71    -1.45     1163     3
13     1.05     0.06     1163     3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading