Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Select certain columns, fill missing values in these columns with the average of columns with not-na values

Given a dataframe df as follows:

         date      value        pred1        pred2        pred3
0   2021-9-19  3613.9663          NaN          NaN          NaN
1   2021-9-26  3613.0673          NaN          NaN          NaN
2   2021-10-3  3568.1668          NaN          NaN          NaN
3  2021-10-10  3592.1666  3631.411799          NaN          NaN
4  2021-10-17  3572.3662  3637.792491  3677.656329          NaN
5  2021-10-24  3582.6036  3678.800911  3707.926324  3712.669694
6  2021-10-31  3547.3361  3731.336899  3735.695071  3733.021293
7   2021-11-7  3491.5677  3749.422639  3743.507496          NaN
8  2021-11-14  3539.1002  3737.851787          NaN          NaN
9  2021-11-21  3560.3734          NaN          NaN          NaN

Suppose for columns pred1, pred2 and pred3, except that all these columns are NaNs, if it has one non-null value, I will populate the other columns with that value, and if it has two or more non-null values, I will use the average of non-null values to fill null values of other columns.

For example, for the row with date 2021-10-10, I will populate pred2 and pred3 with the value of pred1; for the row with date 2021-10-17, I will populate pred3 with the mean of pred1 and pred2 on that day; The same logic will apply to 2021-11-7 and 2021-11-14.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The expected result:

         date      value        pred1        pred2        pred3
0   2021-9-19  3613.9663          NaN          NaN          NaN
1   2021-9-26  3613.0673          NaN          NaN          NaN
2   2021-10-3  3568.1668          NaN          NaN          NaN
3  2021-10-10  3592.1666  3631.411799  3631.411799  3631.411799
4  2021-10-17  3572.3662  3637.792491  3677.656329  3657.724410
5  2021-10-24  3582.6036  3678.800911  3707.926324  3712.669694
6  2021-10-31  3547.3361  3731.336899  3735.695071  3733.021293
7   2021-11-7  3491.5677  3749.422639  3743.507496     2.957571
8  2021-11-14  3539.1002  3737.851787  3737.851787  3737.851787
9  2021-11-21  3560.3734          NaN          NaN          NaN

Anyone could help to deal this issue? Thanks.

>Solution :

df.iloc[:,2:].apply(lambda x: x.fillna(x.mean()), axis = 1)
 
         pred1        pred2        pred3
0          NaN          NaN          NaN
1          NaN          NaN          NaN
2          NaN          NaN          NaN
3  3631.411799  3631.411799  3631.411799
4  3637.792491  3677.656329  3657.724410
5  3678.800911  3707.926324  3712.669694
6  3731.336899  3735.695071  3733.021293
7  3749.422639  3743.507496  3746.465068
8  3737.851787  3737.851787  3737.851787
9          NaN          NaN          NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading