Given a dataframe df as follows:
date value pred1 pred2 pred3
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3631.411799 NaN NaN
4 2021-10-17 3572.3662 3637.792491 3677.656329 NaN
5 2021-10-24 3582.6036 3678.800911 3707.926324 3712.669694
6 2021-10-31 3547.3361 3731.336899 3735.695071 3733.021293
7 2021-11-7 3491.5677 3749.422639 3743.507496 NaN
8 2021-11-14 3539.1002 3737.851787 NaN NaN
9 2021-11-21 3560.3734 NaN NaN NaN
Suppose for columns pred1, pred2 and pred3, except that all these columns are NaNs, if it has one non-null value, I will populate the other columns with that value, and if it has two or more non-null values, I will use the average of non-null values to fill null values of other columns.
For example, for the row with date 2021-10-10, I will populate pred2 and pred3 with the value of pred1; for the row with date 2021-10-17, I will populate pred3 with the mean of pred1 and pred2 on that day; The same logic will apply to 2021-11-7 and 2021-11-14.
The expected result:
date value pred1 pred2 pred3
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3631.411799 3631.411799 3631.411799
4 2021-10-17 3572.3662 3637.792491 3677.656329 3657.724410
5 2021-10-24 3582.6036 3678.800911 3707.926324 3712.669694
6 2021-10-31 3547.3361 3731.336899 3735.695071 3733.021293
7 2021-11-7 3491.5677 3749.422639 3743.507496 2.957571
8 2021-11-14 3539.1002 3737.851787 3737.851787 3737.851787
9 2021-11-21 3560.3734 NaN NaN NaN
Anyone could help to deal this issue? Thanks.
>Solution :
df.iloc[:,2:].apply(lambda x: x.fillna(x.mean()), axis = 1)
pred1 pred2 pred3
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 3631.411799 3631.411799 3631.411799
4 3637.792491 3677.656329 3657.724410
5 3678.800911 3707.926324 3712.669694
6 3731.336899 3735.695071 3733.021293
7 3749.422639 3743.507496 3746.465068
8 3737.851787 3737.851787 3737.851787
9 NaN NaN NaN