I’ve a dataframe
| a | b | c |
|---|---|---|
| one | 6 | 11 |
| one | 7 | 12 |
| two | 8 | 23 |
| two | 9 | 14 |
| three | 10 | 15 |
| three | 20 | 25 |
I want to apply groupby at column a and then find the highest value in column c, so that, the highest value gets flagged, i.e.
| a | b | c |
|---|---|---|
| one | 6 | 11 |
| one | 7 | 12 |
Compare value 11&12, then
| a | b | c |
|---|---|---|
| two | 8 | 23 |
| two | 9 | 14 |
Compare value 23&14, then
| a | b | c |
|---|---|---|
| three | 10 | 15 |
| three | 20 | 25 |
Finally resulting in:
| a | b | c | flag |
|---|---|---|---|
| one | 6 | 11 | no |
| one | 7 | 12 | yes |
| two | 8 | 23 | yes |
| two | 9 | 14 | no |
| three | 10 | 15 | no |
| three | 20 | 25 | yes |
I/P DF:
df = pd.DataFrame({
'a':["one","one","two","two","three","three"]
, 'b':[6,7,8,9,10,20]
, 'c':[11,12,23,14,15,25]
# , 'flag': ['no', 'yes', 'yes', 'no', 'no', 'yes']
})
df
>Solution :
You can use groupby.transform to get the max value per group, and numpy.where to map the True/False to 'yes'/'no':
df['flag'] = np.where(df.groupby('a')['c'].transform('max').eq(df['c']), 'yes', 'no')
output:
a b c flag
0 one 6 11 no
1 one 7 12 yes
2 two 8 23 yes
3 two 9 14 no
4 three 10 15 no
5 three 20 25 yes
Intermediates:
df.groupby('a')['c'].transform('max')
0 12
1 12
2 23
3 23
4 25
5 25
Name: c, dtype: int64
df.groupby('a')['c'].transform('max').eq(df['c'])
0 False
1 True
2 True
3 False
4 False
5 True
Name: c, dtype: bool