Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas groupby: get max value in a subgroup

I have a large dataset grouped by column, row, year, potveg, and total. I am trying to get the max value of ‘total’ column in a specific year of a group. i.e., for the dataset below;

col      row    year    potveg  total

-125.0  42.5    2015    9       697.3
                2015    13      535.2
                2015    15      82.3
                2016    9       907.8
                2016    13      137.6
                2016    15      268.4
                2017    9       961.9
                2017    13      74.2
                2017    15      248.0
                2018    9       937.9
                2018    13      575.6
                2018    15      215.5
-135.0  70.5    2015    8       697.3
                2015    10      535.2
                2015    19      82.3
                2016    8       907.8
                2016    10      137.6
                2016    19      268.4
                2017    8       961.9
                2017    10      74.2
                2017    19      248.0
                2018    8       937.9
                2018    10      575.6
                2018    19      215.5

I want my output to look like this:

col      row    year    potveg  total

-125.0  42.5    2015    9       697.3
                2016    9       907.8
                2017    9       961.9
                2018    9       937.9
-135.0  70.5    2015    8       697.3
                2016    8       907.8
                2017    8       961.9
                2018    8       937.9

I tried

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.groupby(['col','row','year','potveg']).agg({'total':'max'})

and

df.groupby(['col','row','year','potveg'])['total'].max()

but those don’t seem to work because it outputs the same results.
I think the issue is the ‘potveg’ column which is a subgroup. I am not sure how to select rows containing max value of ‘total’.

>Solution :

One possible solution, using .idxmax() inside groupby.apply:

print(
    df.groupby(["col", "row", "year"], as_index=False, sort=False).apply(
        lambda x: x.loc[x["total"].idxmax()]
    )
)

Prints:

     col   row    year  potveg  total
0 -125.0  42.5  2015.0     9.0  697.3
1 -125.0  42.5  2016.0     9.0  907.8
2 -125.0  42.5  2017.0     9.0  961.9
3 -125.0  42.5  2018.0     9.0  937.9
4 -135.0  70.5  2015.0     8.0  697.3
5 -135.0  70.5  2016.0     8.0  907.8
6 -135.0  70.5  2017.0     8.0  961.9
7 -135.0  70.5  2018.0     8.0  937.9

DataFrame used:

col row year potveg total
-125 42.5 2015 9 697.3
-125 42.5 2015 13 535.2
-125 42.5 2015 15 82.3
-125 42.5 2016 9 907.8
-125 42.5 2016 13 137.6
-125 42.5 2016 15 268.4
-125 42.5 2017 9 961.9
-125 42.5 2017 13 74.2
-125 42.5 2017 15 248
-125 42.5 2018 9 937.9
-125 42.5 2018 13 575.6
-125 42.5 2018 15 215.5
-135 70.5 2015 8 697.3
-135 70.5 2015 10 535.2
-135 70.5 2015 19 82.3
-135 70.5 2016 8 907.8
-135 70.5 2016 10 137.6
-135 70.5 2016 19 268.4
-135 70.5 2017 8 961.9
-135 70.5 2017 10 74.2
-135 70.5 2017 19 248
-135 70.5 2018 8 937.9
-135 70.5 2018 10 575.6
-135 70.5 2018 19 215.5
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading