Follow

Follow

Contact

Home How can I find the first row that meets conditions of a mask for each group?

Questions

How can I find the first row that meets conditions of a mask for each group?

byMR

April 14, 2024

This is my DataFrame:

import pandas as pd
df = pd.DataFrame(
    {
        'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
        'b': [1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2],
        'c': [9, 8, 11, 13, 14, 3, 104, 106, 11, 100, 70, 7]
    }
)

Expected output: Creating column out:

    a  b    c    out
0   x  1    9    NaN
1   x  1    8    NaN
2   x  1   11    NaN
3   x  2   13  found
4   x  2   14    NaN
5   y  1    3    NaN
6   y  1  104  found
7   y  1  106    NaN
8   y  2   11    NaN
9   y  2  100    NaN
10  y  2   70    NaN
11  y  2    7    NaN

The mask is:

mask = (df.c > 10)

The process: Grouping is by column a:

a) For each group, finding the first row that meets the conditions of the mask.

b) For group x this condition only applies when b == 2. That is why row 3 is selected.

And this is my attempt. It is getting close but it feels like this is not the way:

def func(g):
    mask = (g.c > 10)
    g.loc[mask.cumsum().eq(1) & mask, 'out'] = 'found'
    return g

df = df.groupby('a').apply(func)

>Solution :

One option with groupby.idxmax:

mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))

idx = mask.groupby(df['a']).idxmax()
df.loc[idx[mask.loc[idx].values], 'out'] = 'found'

Another with groupby.transform:

mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))

df.loc[mask & mask.groupby(df['a'])
                  .transform(lambda m: (~m).shift(fill_value=True)
                                           .cummin()),
       'out'] = 'found'

Output, with an extra group z that has no match:

    a  b    c    out
0   x  1    9    NaN
1   x  1    8    NaN
2   x  1   11    NaN
3   x  2   13  found
4   x  2   14    NaN
5   y  1    3    NaN
6   y  1  104  found
7   y  1  106    NaN
8   y  2   11    NaN
9   y  2  100    NaN
10  y  2   70    NaN
11  y  2    7    NaN
12  z  3    1    NaN
13  z  3    1    NaN

group-by

byMR

Published April 14, 2024

Add a comment

Leave a ReplyCancel reply

Read more

Questions

lags while using loops in python and the code doesn't stop

byMR

April 14, 2024

Questions

How to clear the pre-commit caches?

byMR

April 14, 2024

Questions

How do I add Hilt to a Kotlin project in Android Studio?

byMR

April 14, 2024

Questions

Unexpected output while dealing with hierarchies in Java

byMR

April 14, 2024

Questions

not able to find items on specific screen area

byMR

April 14, 2024

Questions

How to partition a dataframe into dataframes with no missing and contiguous indices?

byMR

April 14, 2024