Groupby index and keep the max column value given a single column

Scenario: I have a dataframe with duplicated indices, and I want to group by the index while keeping the max value. I found a solution in "Drop duplicates by index, keeping max for each column across duplicates"; however, it takes the max of each column separately, so the result mixes values from different rows.

Question: Instead of mixing values from different rows, I want to keep a single row per index: the row where column "C" is highest among the rows sharing that index, with all of that row's values intact (not combined with high values of other columns from other rows). How should the groupby be performed?

What I tried: From the linked question, I got

df.groupby(df.index).max()

and tried to modify it to:

df.groupby(df.index)['C'].max()

but this returns only column "C" and drops the other columns of the dataframe.

>Solution :

You don’t provide a sample of your data so I’m just going for a general approach.

That said, you can sort the dataframe by C, then groupby with head:

# this assumes that index has only one level
df.sort_values('C', ascending=False).groupby(level=0).head(1)

Or:

df.sort_values('C').groupby(level=0).tail(1)
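Since the question has no sample data, here is a minimal sketch on a made-up dataframe (the column names "A" and "B", the values, and the index labels are all assumptions for illustration). It compares the column-wise max from the question with the two sort-based variants above:

```python
import pandas as pd

# Hypothetical sample data: the index labels "a" and "b" are each duplicated.
df = pd.DataFrame(
    {"A": [1, 9, 5, 2], "B": [8, 3, 4, 7], "C": [10, 20, 40, 30]},
    index=["a", "a", "b", "b"],
)

# Column-wise max: each column is reduced independently, so a result row
# can combine values that never appeared together in one original row.
mixed = df.groupby(level=0).max()

# Sort by C descending, then keep the first row of every index group.
top = df.sort_values("C", ascending=False).groupby(level=0).head(1)

# Equivalent: sort by C ascending, then keep the last row of every group.
bottom = df.sort_values("C").groupby(level=0).tail(1)

print(mixed)
print(top.sort_index())
print(bottom.sort_index())
```

Note that head(1) and tail(1) return rows in the sorted order rather than index order, so a final sort_index() may be wanted. If several rows in a group tie on "C", only one of them is kept.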

Also take a look at this related question (not by the index, but a column):
