This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'a': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
'b': [1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2],
'c': [9, 8, 11, 13, 14, 3, 104, 106, 11, 100, 70, 7]
}
)
Expected output: Creating column out:
a b c out
0 x 1 9 NaN
1 x 1 8 NaN
2 x 1 11 NaN
3 x 2 13 found
4 x 2 14 NaN
5 y 1 3 NaN
6 y 1 104 found
7 y 1 106 NaN
8 y 2 11 NaN
9 y 2 100 NaN
10 y 2 70 NaN
11 y 2 7 NaN
The mask is:
mask = (df.c > 10)
The process: Grouping is by column a:
a) For each group, finding the first row that meets the conditions of the mask.
b) For group x this condition only applies when b == 2. That is why row 3 is selected.
And this is my attempt. It is getting close but it feels like this is not the way:
def func(g):
mask = (g.c > 10)
g.loc[mask.cumsum().eq(1) & mask, 'out'] = 'found'
return g
df = df.groupby('a').apply(func)
>Solution :
One option with groupby.idxmax:
mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))
idx = mask.groupby(df['a']).idxmax()
df.loc[idx[mask.loc[idx].values], 'out'] = 'found'
Another with groupby.transform:
mask = (df['c'] > 10) & (df['a'].ne('x') | df['b'].eq(2))
df.loc[mask & mask.groupby(df['a'])
.transform(lambda m: (~m).shift(fill_value=True)
.cummin()),
'out'] = 'found'
Output, with an extra group z that has no match:
a b c out
0 x 1 9 NaN
1 x 1 8 NaN
2 x 1 11 NaN
3 x 2 13 found
4 x 2 14 NaN
5 y 1 3 NaN
6 y 1 104 found
7 y 1 106 NaN
8 y 2 11 NaN
9 y 2 100 NaN
10 y 2 70 NaN
11 y 2 7 NaN
12 z 3 1 NaN
13 z 3 1 NaN