Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I get the first row that meets conditions of a mask if another condition is not present before it?

This is my DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        'close': [109, 109, 105, 110, 105, 120, 120, 11, 90, 100],
        'high': [110, 110, 108, 108, 115, 122, 123, 1120, 1000, 300],
        'target': [107, 107, 107, 107, 107, 124, 124, 500, 500, 500]
    }
)

Masks are:

m1 = (
    (df.high > df.target) &
    (df.close > df.target)
)

m2 = (
    (df.high > df.target) &
    (df.close < df.target)
)

Expected output is getting row 7 as output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  close  high  target
7     11  1120  500

The process is:

a) The grouping is by the target column.

b) For each group I want to find the first row that meets conditions of m2 IF ONLY m1 does not have any match BEFORE it.

For example:

For group 107, there is a match for m2 but since m1 has a match BEFORE that, this group should be skipped.

For the next group which is 124 there are no rows that has a match for m2.

For group 500 there is a row and there are no rows before it that m1 is true.

For each group I want one row with this condition and for the total result I want the first match. So for example if one row has been found for multiple groups, the first row should be selected in the df regardless of groups.

These are my attempts:

# attmpt 1
df['a'] = m1.cummax()
df['b'] = m2.cummax()
# attempt 2
out = df[m2.cumsum().eq(1) & m2]

>Solution :

IIUC, you can build a reverse mask with groupby.cummin:

out = df[(~m1).groupby(df['target']).cummin() & m2].drop_duplicates('target')

Or to keep the first overall match:

out = df[(~m1).groupby(df['target']).cummin() & m2].iloc[:1]

Output:

   close  high  target
7     11  1120     500

Intermediates:

   close  high  target    ~m1  cummin  ... & m2
0    109   110     107  False   False     False
1    109   110     107  False   False     False
2    105   108     107   True   False     False
3    110   108     107  False   False     False
4    105   115     107   True   False     False
5    120   122     124   True    True     False
6    120   123     124   True    True     False
7     11  1120     500   True    True      True
8     90  1000     500   True    True      True
9    100   300     500   True    True     False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading