Home how to delete NaN values in pandas while adding other columns' value to the very next row which has not NaN value?

Questions

how to delete NaN values in pandas while adding other columns' value to the very next row which has not NaN value?

August 19, 2022

Here is my dataframe view:

index   date    B   N   S   y_B   y_N   y_S  f_price    y_price  p_change   price_day_t-2
1   2021-01-12  1   29  0   2.0   57.0  0.0  10250.0    10760.0  -1.0        11060.0
2   2021-01-13  0   67  0   1.0   29.0  0.0  9810.0     10250.0  -1.0        10760.0
3   2021-01-14  2   19  0   0.0   67.0  0.0  NaN         NaN     NaN           NaN
4   2021-01-15  1   6   0   2.0   19.0  0.0  NaN         NaN     NaN           NaN
5   2021-01-16  2   46  0   1.0   6.0   0.0  9340.0     9810.0   -1.0        10250.0
6   2021-01-17  3   22  0   2.0   46.0  0.0  NaN         NaN     NaN           NaN
7   2021-01-18  1   34  0   3.0   22.0  0.0  8890.0     9340.0   -1.0        9810.0

I want to delete rows with null values but at the same time, I want to add the other columns’ values to the very next row without null values. which the result would be like this:

index   date    B   N   S   y_B   y_N   y_S  f_price    y_price  p_change   price_day_t-2
1   2021-01-12  1   29  0   2.0   57.0  0.0  10250.0    10760.0  -1.0        11060.0
2   2021-01-13  0   67  0   1.0   29.0  0.0  9810.0     10250.0  -1.0        10760.0
5   2021-01-16  5   71  0   3.0   92.0  0.0  9340.0     9810.0   -1.0        10250.0
7   2021-01-18  4   56  0   5.0   68.0  0.0  8890.0     9340.0   -1.0        9810.0

for example in row n.5 column "B", the result is 5 because of the aggregation 2+1+2. (the two other values [1] and [2] are from deleted rows)

How can I do this?

>Solution :

You can create groups by comapre last column from back for non missing values with cumulative sum, then aggregate sum for all columns without date, it aggregate by last values per groups:

g = df.iloc[::-1, -1].notna().cumsum()

d = dict.fromkeys(df.columns, 'sum')
d['date'] = 'last'

df = df.groupby(g, sort=False).agg(d).reset_index(drop=True)
print (df)
         date  B   N  S  y_B   y_N  y_S  f_price  y_price  p_change  \
0  2021-01-12  1  29  0  2.0  57.0  0.0  10250.0  10760.0      -1.0   
1  2021-01-13  0  67  0  1.0  29.0  0.0   9810.0  10250.0      -1.0   
2  2021-01-16  5  71  0  3.0  92.0  0.0   9340.0   9810.0      -1.0   
3  2021-01-18  4  56  0  5.0  68.0  0.0   8890.0   9340.0      -1.0   

   price_day_t-2  
0        11060.0  
1        10760.0  
2        10250.0  
3         9810.0