This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'close': [109, 109, 105, 110, 105, 120, 120, 11, 90, 100],
'high': [110, 110, 108, 108, 115, 122, 123, 1120, 1000, 300],
'target': [107, 107, 107, 107, 107, 124, 124, 500, 500, 500]
}
)
Masks are:
m1 = (
(df.high > df.target) &
(df.close > df.target)
)
m2 = (
(df.high > df.target) &
(df.close < df.target)
)
Expected output is getting row 7 as output:
close high target
7 11 1120 500
The process is:
a) The grouping is by the target column.
b) For each group I want to find the first row that meets conditions of m2 IF ONLY m1 does not have any match BEFORE it.
For example:
For group 107, there is a match for m2 but since m1 has a match BEFORE that, this group should be skipped.
For the next group which is 124 there are no rows that has a match for m2.
For group 500 there is a row and there are no rows before it that m1 is true.
For each group I want one row with this condition and for the total result I want the first match. So for example if one row has been found for multiple groups, the first row should be selected in the df regardless of groups.
These are my attempts:
# attmpt 1
df['a'] = m1.cummax()
df['b'] = m2.cummax()
# attempt 2
out = df[m2.cumsum().eq(1) & m2]
>Solution :
IIUC, you can build a reverse mask with groupby.cummin:
out = df[(~m1).groupby(df['target']).cummin() & m2].drop_duplicates('target')
Or to keep the first overall match:
out = df[(~m1).groupby(df['target']).cummin() & m2].iloc[:1]
Output:
close high target
7 11 1120 500
Intermediates:
close high target ~m1 cummin ... & m2
0 109 110 107 False False False
1 109 110 107 False False False
2 105 108 107 True False False
3 110 108 107 False False False
4 105 115 107 True False False
5 120 122 124 True True False
6 120 123 124 True True False
7 11 1120 500 True True True
8 90 1000 500 True True True
9 100 300 500 True True False