Home Groupby one column, if the date column are the same, fill in the missing values in the numerical column

Questions

Groupby one column, if the date column are the same, fill in the missing values in the numerical column

November 15, 2023

Assuming for the dataframe df as follows:

          date  actual_value  fitted_value  predicted_value    code
0    2023/8/31           NaN           NaN       520.994413  LX0301
1    2023/9/30           NaN           NaN       580.967973  LX0301
2   2023/10/31           NaN           NaN       650.392867  LX0301
3    2023/8/31    471.459992    520.027310              NaN  LX0301
4    2023/9/30           NaN           NaN       531.199547  LX0301
5   2023/10/31           NaN           NaN       600.053484  LX0301
6    2023/8/31    471.459992    511.902229              NaN  LX0301
7    2023/9/30    480.400211    518.202630              NaN  LX0301
8   2023/10/31           NaN           NaN       537.890792  LX0301
9    2023/8/31           NaN           NaN        99.216818  LX0101
10   2023/9/30           NaN           NaN        98.624779  LX0101
11  2023/10/31           NaN           NaN        98.690596  LX0101
12   2023/8/31     99.400000     99.221767              NaN  LX0101
13   2023/9/30           NaN           NaN        98.822977  LX0101
14  2023/10/31           NaN           NaN        98.875062  LX0101
15   2023/8/31     99.400000     99.220693              NaN  LX0101
16   2023/9/30     98.700000     98.802146              NaN  LX0101
17  2023/10/31           NaN           NaN        98.738091  LX0101

For each group of code (in this example, there are LX0301 and LX0101), if date‘s values are the same, I need to fill NaNs in actual_value with the actual_value that exists on that date. Taking LX0301 as an example, we see that the actual_value in rows 0, 1 and 4, 2023/8/31 and 2023/9/30 are missing values, but they appear in rows 6 (or 3) and 7, so we need to use the existing actual values in rows 6 (or 3) and 7 to fill in row 0, 1 and 4. Note: For each set of data in this example, the actual values on the same date are the same (for example, the actual values in rows 3 and 6, rows 12 and 15 are the same). How to achieve this? Thanks.

The required result is as follows:

          date  actual_value  fitted_value  predicted_value    code
0    2023/8/31    471.459992           NaN       520.994413  LX0301
1    2023/9/30    480.400211           NaN       580.967973  LX0301
2   2023/10/31           NaN           NaN       650.392867  LX0301
3    2023/8/31    471.459992    520.027310              NaN  LX0301
4    2023/9/30    480.400211           NaN       531.199547  LX0301
5   2023/10/31           NaN           NaN       600.053484  LX0301
6    2023/8/31    471.459992    511.902229              NaN  LX0301
7    2023/9/30    480.400211    518.202630              NaN  LX0301
8   2023/10/31           NaN           NaN       537.890792  LX0301
9    2023/8/31     99.400000           NaN        99.216818  LX0101
10   2023/9/30     98.700000           NaN        98.624779  LX0101
11  2023/10/31           NaN           NaN        98.690596  LX0101
12   2023/8/31     99.400000     99.221767              NaN  LX0101
13   2023/9/30     98.700000           NaN        98.822977  LX0101
14  2023/10/31           NaN           NaN        98.875062  LX0101
15   2023/8/31     99.400000     99.220693              NaN  LX0101
16   2023/9/30     98.700000     98.802146              NaN  LX0101
17  2023/10/31           NaN           NaN        98.738091  LX0101

>Solution :

Try:

df['actual_value_new'] = df.groupby(['code','date'])['actual_value'].transform('first')

Output (used actual_value_new just to show old and new columns):

          date  actual_value  fitted_value  predicted_value    code  actual_value_new
0    2023/8/31           NaN           NaN       520.994413  LX0301        471.459992
1    2023/9/30           NaN           NaN       580.967973  LX0301        480.400211
2   2023/10/31           NaN           NaN       650.392867  LX0301               NaN
3    2023/8/31    471.459992    520.027310              NaN  LX0301        471.459992
4    2023/9/30           NaN           NaN       531.199547  LX0301        480.400211
5   2023/10/31           NaN           NaN       600.053484  LX0301               NaN
6    2023/8/31    471.459992    511.902229              NaN  LX0301        471.459992
7    2023/9/30    480.400211    518.202630              NaN  LX0301        480.400211
8   2023/10/31           NaN           NaN       537.890792  LX0301               NaN
9    2023/8/31           NaN           NaN        99.216818  LX0101         99.400000
10   2023/9/30           NaN           NaN        98.624779  LX0101         98.700000
11  2023/10/31           NaN           NaN        98.690596  LX0101               NaN
12   2023/8/31     99.400000     99.221767              NaN  LX0101         99.400000
13   2023/9/30           NaN           NaN        98.822977  LX0101         98.700000
14  2023/10/31           NaN           NaN        98.875062  LX0101               NaN
15   2023/8/31     99.400000     99.220693              NaN  LX0101         99.400000
16   2023/9/30     98.700000     98.802146              NaN  LX0101         98.700000
17  2023/10/31           NaN           NaN        98.738091  LX0101               NaN