Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I find the first row that meets conditions of a mask for each group?

This is my DataFrame:

import pandas as pd
df = pd.DataFrame(
    {
        'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
        'b': [1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2],
        'c': [9, 8, 11, 13, 14, 3, 104, 106, 11, 100, 70, 7]
    }
)

Expected output: Creating column out:

    a  b    c    out
0   x  1    9    NaN
1   x  1    8    NaN
2   x  1   11    NaN
3   x  2   13  found
4   x  2   14    NaN
5   y  1    3    NaN
6   y  1  104  found
7   y  1  106    NaN
8   y  2   11    NaN
9   y  2  100    NaN
10  y  2   70    NaN
11  y  2    7    NaN

The mask is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

mask = (df.c > 10)

The process: Grouping is by column a:

a) For each group, finding the first row that meets the conditions of the mask.

b) For group x this condition only applies when b == 2. That is why row 3 is selected.

And this is my attempt. It is getting close but it feels like this is not the way:

def func(g):
    mask = (g.c > 10)
    g.loc[mask.cumsum().eq(1) & mask, 'out'] = 'found'
    return g

df = df.groupby('a').apply(func)

>Solution :

One option with groupby.idxmax:

mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))

idx = mask.groupby(df['a']).idxmax()
df.loc[idx[mask.loc[idx].values], 'out'] = 'found'

Another with groupby.transform:

mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))

df.loc[mask & mask.groupby(df['a'])
                  .transform(lambda m: (~m).shift(fill_value=True)
                                           .cummin()),
       'out'] = 'found'

Output, with an extra group z that has no match:

    a  b    c    out
0   x  1    9    NaN
1   x  1    8    NaN
2   x  1   11    NaN
3   x  2   13  found
4   x  2   14    NaN
5   y  1    3    NaN
6   y  1  104  found
7   y  1  106    NaN
8   y  2   11    NaN
9   y  2  100    NaN
10  y  2   70    NaN
11  y  2    7    NaN
12  z  3    1    NaN
13  z  3    1    NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading