I have a large dataset grouped by column, row, year, potveg, and total. I am trying to get the max value of ‘total’ column in a specific year of a group. i.e., for the dataset below;
col row year potveg total
-125.0 42.5 2015 9 697.3
2015 13 535.2
2015 15 82.3
2016 9 907.8
2016 13 137.6
2016 15 268.4
2017 9 961.9
2017 13 74.2
2017 15 248.0
2018 9 937.9
2018 13 575.6
2018 15 215.5
-135.0 70.5 2015 8 697.3
2015 10 535.2
2015 19 82.3
2016 8 907.8
2016 10 137.6
2016 19 268.4
2017 8 961.9
2017 10 74.2
2017 19 248.0
2018 8 937.9
2018 10 575.6
2018 19 215.5
I want my output to look like this:
col row year potveg total
-125.0 42.5 2015 9 697.3
2016 9 907.8
2017 9 961.9
2018 9 937.9
-135.0 70.5 2015 8 697.3
2016 8 907.8
2017 8 961.9
2018 8 937.9
I tried
df.groupby(['col','row','year','potveg']).agg({'total':'max'})
and
df.groupby(['col','row','year','potveg'])['total'].max()
but those don’t seem to work because it outputs the same results.
I think the issue is the ‘potveg’ column which is a subgroup. I am not sure how to select rows containing max value of ‘total’.
>Solution :
One possible solution, using .idxmax() inside groupby.apply:
print(
df.groupby(["col", "row", "year"], as_index=False, sort=False).apply(
lambda x: x.loc[x["total"].idxmax()]
)
)
Prints:
col row year potveg total
0 -125.0 42.5 2015.0 9.0 697.3
1 -125.0 42.5 2016.0 9.0 907.8
2 -125.0 42.5 2017.0 9.0 961.9
3 -125.0 42.5 2018.0 9.0 937.9
4 -135.0 70.5 2015.0 8.0 697.3
5 -135.0 70.5 2016.0 8.0 907.8
6 -135.0 70.5 2017.0 8.0 961.9
7 -135.0 70.5 2018.0 8.0 937.9
DataFrame used:
| col | row | year | potveg | total |
|---|---|---|---|---|
| -125 | 42.5 | 2015 | 9 | 697.3 |
| -125 | 42.5 | 2015 | 13 | 535.2 |
| -125 | 42.5 | 2015 | 15 | 82.3 |
| -125 | 42.5 | 2016 | 9 | 907.8 |
| -125 | 42.5 | 2016 | 13 | 137.6 |
| -125 | 42.5 | 2016 | 15 | 268.4 |
| -125 | 42.5 | 2017 | 9 | 961.9 |
| -125 | 42.5 | 2017 | 13 | 74.2 |
| -125 | 42.5 | 2017 | 15 | 248 |
| -125 | 42.5 | 2018 | 9 | 937.9 |
| -125 | 42.5 | 2018 | 13 | 575.6 |
| -125 | 42.5 | 2018 | 15 | 215.5 |
| -135 | 70.5 | 2015 | 8 | 697.3 |
| -135 | 70.5 | 2015 | 10 | 535.2 |
| -135 | 70.5 | 2015 | 19 | 82.3 |
| -135 | 70.5 | 2016 | 8 | 907.8 |
| -135 | 70.5 | 2016 | 10 | 137.6 |
| -135 | 70.5 | 2016 | 19 | 268.4 |
| -135 | 70.5 | 2017 | 8 | 961.9 |
| -135 | 70.5 | 2017 | 10 | 74.2 |
| -135 | 70.5 | 2017 | 19 | 248 |
| -135 | 70.5 | 2018 | 8 | 937.9 |
| -135 | 70.5 | 2018 | 10 | 575.6 |
| -135 | 70.5 | 2018 | 19 | 215.5 |