Pandas – Drop rows based on multiple columns including max/min

I have a pandas DataFrame where I used groupby.ngroup() to identify groups of related data (basically duplicated data, but not exactly because that would have been too easy…).

DisID BunchData GroupID
1000 xyz 1
2012 abc 2
2014 abc 2
3000 def 3

I am trying to figure out how to remove the row with the minimum "DisID" within each GroupID, but only when that GroupID contains more than one row. In this case, the output would look like:

DisID BunchData GroupID
1000 xyz 1
2014 abc 2
3000 def 3

Thanks!

Solution:

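Sort by DisID with sort_values, then call drop_duplicates on GroupID with keep='last', so that only the row with the largest DisID in each group is kept. For groups of one or two rows, as in the example, this is the same as removing the minimum; a group with three or more rows would be reduced to its single maximum row.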

df = df.sort_values('DisID').drop_duplicates(['GroupID'], keep='last')
df
Out[170]: 
   DisID BunchData  GroupID
0   1000       xyz        1
2   2014       abc        2
3   3000       def        3
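
If a group can hold more than two rows and only the minimum should go, a boolean mask built with groupby(...).transform is one possible alternative. The sketch below rebuilds the sample frame from the question; the names group_size, group_min, is_group_min and result are made up for illustration, and it assumes DisID is numeric.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "DisID": [1000, 2012, 2014, 3000],
    "BunchData": ["xyz", "abc", "abc", "def"],
    "GroupID": [1, 2, 2, 3],
})

# Per-row view of each group's size and smallest DisID.
group_size = df.groupby("GroupID")["DisID"].transform("size")
group_min = df.groupby("GroupID")["DisID"].transform("min")

# Flag a row only if its group has more than one row and it holds the group minimum.
# Note: if the minimum is tied within a group, every tied row is flagged.
is_group_min = (group_size > 1) & (df["DisID"] == group_min)

result = df[~is_group_min]
print(result)
```

On the sample data this keeps the rows with DisID 1000, 2014 and 3000, the same as the drop_duplicates answer. The two approaches only differ once a group has three or more rows, where the mask removes just the smallest DisID and keeps the rest.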