Assuming for the dataframe df as follows:
date actual_value fitted_value predicted_value code
0 2023/8/31 NaN NaN 520.994413 LX0301
1 2023/9/30 NaN NaN 580.967973 LX0301
2 2023/10/31 NaN NaN 650.392867 LX0301
3 2023/8/31 471.459992 520.027310 NaN LX0301
4 2023/9/30 NaN NaN 531.199547 LX0301
5 2023/10/31 NaN NaN 600.053484 LX0301
6 2023/8/31 471.459992 511.902229 NaN LX0301
7 2023/9/30 480.400211 518.202630 NaN LX0301
8 2023/10/31 NaN NaN 537.890792 LX0301
9 2023/8/31 NaN NaN 99.216818 LX0101
10 2023/9/30 NaN NaN 98.624779 LX0101
11 2023/10/31 NaN NaN 98.690596 LX0101
12 2023/8/31 99.400000 99.221767 NaN LX0101
13 2023/9/30 NaN NaN 98.822977 LX0101
14 2023/10/31 NaN NaN 98.875062 LX0101
15 2023/8/31 99.400000 99.220693 NaN LX0101
16 2023/9/30 98.700000 98.802146 NaN LX0101
17 2023/10/31 NaN NaN 98.738091 LX0101
For each group of code (in this example, there are LX0301 and LX0101), if date‘s values are the same, I need to fill NaNs in actual_value with the actual_value that exists on that date. Taking LX0301 as an example, we see that the actual_value in rows 0, 1 and 4, 2023/8/31 and 2023/9/30 are missing values, but they appear in rows 6 (or 3) and 7, so we need to use the existing actual values in rows 6 (or 3) and 7 to fill in row 0, 1 and 4. Note: For each set of data in this example, the actual values on the same date are the same (for example, the actual values in rows 3 and 6, rows 12 and 15 are the same). How to achieve this? Thanks.
The required result is as follows:
date actual_value fitted_value predicted_value code
0 2023/8/31 471.459992 NaN 520.994413 LX0301
1 2023/9/30 480.400211 NaN 580.967973 LX0301
2 2023/10/31 NaN NaN 650.392867 LX0301
3 2023/8/31 471.459992 520.027310 NaN LX0301
4 2023/9/30 480.400211 NaN 531.199547 LX0301
5 2023/10/31 NaN NaN 600.053484 LX0301
6 2023/8/31 471.459992 511.902229 NaN LX0301
7 2023/9/30 480.400211 518.202630 NaN LX0301
8 2023/10/31 NaN NaN 537.890792 LX0301
9 2023/8/31 99.400000 NaN 99.216818 LX0101
10 2023/9/30 98.700000 NaN 98.624779 LX0101
11 2023/10/31 NaN NaN 98.690596 LX0101
12 2023/8/31 99.400000 99.221767 NaN LX0101
13 2023/9/30 98.700000 NaN 98.822977 LX0101
14 2023/10/31 NaN NaN 98.875062 LX0101
15 2023/8/31 99.400000 99.220693 NaN LX0101
16 2023/9/30 98.700000 98.802146 NaN LX0101
17 2023/10/31 NaN NaN 98.738091 LX0101
>Solution :
Try:
df['actual_value_new'] = df.groupby(['code','date'])['actual_value'].transform('first')
Output (used actual_value_new just to show old and new columns):
date actual_value fitted_value predicted_value code actual_value_new
0 2023/8/31 NaN NaN 520.994413 LX0301 471.459992
1 2023/9/30 NaN NaN 580.967973 LX0301 480.400211
2 2023/10/31 NaN NaN 650.392867 LX0301 NaN
3 2023/8/31 471.459992 520.027310 NaN LX0301 471.459992
4 2023/9/30 NaN NaN 531.199547 LX0301 480.400211
5 2023/10/31 NaN NaN 600.053484 LX0301 NaN
6 2023/8/31 471.459992 511.902229 NaN LX0301 471.459992
7 2023/9/30 480.400211 518.202630 NaN LX0301 480.400211
8 2023/10/31 NaN NaN 537.890792 LX0301 NaN
9 2023/8/31 NaN NaN 99.216818 LX0101 99.400000
10 2023/9/30 NaN NaN 98.624779 LX0101 98.700000
11 2023/10/31 NaN NaN 98.690596 LX0101 NaN
12 2023/8/31 99.400000 99.221767 NaN LX0101 99.400000
13 2023/9/30 NaN NaN 98.822977 LX0101 98.700000
14 2023/10/31 NaN NaN 98.875062 LX0101 NaN
15 2023/8/31 99.400000 99.220693 NaN LX0101 99.400000
16 2023/9/30 98.700000 98.802146 NaN LX0101 98.700000
17 2023/10/31 NaN NaN 98.738091 LX0101 NaN