This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
    {
        'a': [3, 1, 2, 5, 10, 3, 13, 3, 2],
    }
)
The expected output adds two columns, a_max and a_min:
    a  a_max  a_min
0   3    NaN    NaN
1   1      3      1
2   2      3      1
3   5      3      1
4  10      3      1
5   3     10      3
6  13     10      3
7   3     13      3
8   2     13      2
Logic:
I'll explain the logic row by row. The window over this df is expanding: the first window contains only the first row, the second window contains the first two rows, and so on.
The first four windows therefore cover rows 0, 0-1, 0-2 and 0-3, and the window keeps expanding in the same way.
For each window I need to find the maximum value, and then the minimum of the values that come BELOW that maximum, that is, in later rows of the same window.
I'll start with the yellow window. For this window the max value is 3 and the min value BELOW it is 1, which is why a_max and a_min for this window are 3 and 1.
Now for the orange window (rows 0 to 3) the maximum value is 5, but since this window has no rows BELOW that maximum, the previous a_max and a_min are repeated.
The same logic applies to the rest of the rows.
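The logic described above can also be spelled out as an explicit loop. This is only a sketch: the helper name max_and_min_below is hypothetical, and it assumes that when the maximum appears more than once in a window, the first occurrence is the one that counts.

```python
import numpy as np
import pandas as pd


def max_and_min_below(values):
    """Hypothetical helper: loop version of the expanding-window logic."""
    a_max, a_min = [], []
    prev_max, prev_min = np.nan, np.nan
    for i in range(len(values)):
        window = values[: i + 1]
        # Position of the maximum in the current window
        # (assumption: the first occurrence wins on ties).
        pos = int(np.argmax(window))
        if pos < i:
            # There are rows below the maximum: update both values.
            prev_max = window[pos]
            prev_min = min(window[pos + 1:])
        # Otherwise the maximum is the last row, so the previous pair is repeated.
        a_max.append(prev_max)
        a_min.append(prev_min)
    return a_max, a_min


df = pd.DataFrame({'a': [3, 1, 2, 5, 10, 3, 13, 3, 2]})
df['a_max'], df['a_min'] = max_and_min_below(df['a'].tolist())
print(df)
```

The loop rescans every window, so it is quadratic in the number of rows. It is meant as a readable reference for the logic rather than something to run on a large df.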
This is my attempt:
df['a_max'] = df.a.cummax()
df['a_min'] = df.a.cummin()
>Solution :
This is a tricky one. I would use cummax + shift, then mask + ffill, to compute a_max. Then a_min is the groupby.cummin within each group of identical a_max values:
# compute the shifted cummax
cm = df['a'].cummax().shift()
# a_max is the cummax except if the current row is larger
df['a_max'] = cm.mask(df['a'].gt(cm)).ffill()
# a_min is the cummin of the current group of a_max
df['a_min'] = df.groupby('a_max')['a'].cummin()
Output:
    a  a_max  a_min
0   3    NaN    NaN
1   1    3.0    1.0
2   2    3.0    1.0
3   5    3.0    1.0
4  10    3.0    1.0
5   3   10.0    3.0
6  13   10.0    3.0
7   3   13.0    3.0
8   2   13.0    2.0
Intermediates:
    a  a_max  cummax  shift  mask  ffill  a_min
0   3    NaN       3    NaN   NaN    NaN    NaN
1   1    3.0       3    3.0   3.0    3.0    1.0
2   2    3.0       3    3.0   3.0    3.0    1.0
3   5    3.0       5    3.0   NaN    3.0    1.0
4  10    3.0      10    5.0   NaN    3.0    1.0
5   3   10.0      10   10.0  10.0   10.0    3.0
6  13   10.0      13   10.0   NaN   10.0    3.0
7   3   13.0      13   13.0  13.0   13.0    3.0
8   2   13.0      13   13.0  13.0   13.0    2.0
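The intermediate columns can be rebuilt one step at a time with something like the sketch below. The steps frame and its column names are only illustrative and mirror the table headers. The final a_max is the same as the ffill column.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 1, 2, 5, 10, 3, 13, 3, 2]})

# Illustrative frame holding each intermediate as its own column.
steps = pd.DataFrame({'a': df['a']})
# Running maximum up to and including the current row.
steps['cummax'] = steps['a'].cummax()
# Running maximum of the rows strictly above the current row.
steps['shift'] = steps['cummax'].shift()
# Hide rows where the current value sets a new maximum.
steps['mask'] = steps['shift'].mask(steps['a'].gt(steps['shift']))
# Repeat the previous a_max on those hidden rows.
steps['ffill'] = steps['mask'].ffill()
# Running minimum within each group of identical a_max values.
steps['a_min'] = steps.groupby('ffill')['a'].cummin()
print(steps)
```

Row 0 has no a_max, so it falls outside every group and its a_min stays NaN.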
