If the period is X days, I want the slush to be summed over that time period(the time period is not equal to the length of the dataframe for each slush variable).
I want a new column A to show the Slush of that row + the prior slush within an appropriate period, where in this case, period = row.
I have checked other posts, I don’t believe it is Cumsum as the value is not carried over indefinitely and instead is dropped after its period is over.
Data
| Slush | Period |
|---|---|
| 85 | 1 |
| 20 | 3 |
| 40 | 3 |
| 15 | 1 |
| 22 | 4 |
Desired Output:
| Slush | Period | A |
|---|---|---|
| 85 | 1 | 85 |
| 20 | 3 | 20 |
| 40 | 3 | 60 |
| 15 | 1 | 75 |
| 22 | 4 | 62 |
>Solution :
You can use numpy for a custom cumsum:
a = np.arange(len(df))
mask = a < (a+df['Period'].to_numpy())[:,None]
df['A'] = (np.triu(mask) * df['Slush'].to_numpy()[:,None]).sum(axis=0)
Output:
Slush Period A
0 85 1 85
1 20 3 20
2 40 3 60
3 15 1 75
4 22 4 62
Intermediates:
np.triu(mask)
array([[ True, False, False, False, False],
[False, True, True, True, False],
[False, False, True, True, True],
[False, False, False, True, False],
[False, False, False, False, True]])
(np.triu(mask) * df['Slush'].to_numpy()[:,None])
array([[85, 0, 0, 0, 0],
[ 0, 20, 20, 20, 0],
[ 0, 0, 40, 40, 40],
[ 0, 0, 0, 15, 0],
[ 0, 0, 0, 0, 22]])