Pandas – Drop rows based on multiple columns including max/min

June 24, 2022

I have a pandas DataFrame where I used groupby.ngroup() to identify groups of related data (basically duplicated data, but not exactly because that would have been too easy…).

DisID	BunchData	GroupID
1000	xyz	1
2012	abc	2
2014	abc	2
3000	def	3
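
For anyone who wants to reproduce this setup, here is a minimal sketch of how such a GroupID column could be built with groupby.ngroup(). The question does not say which columns were grouped, so grouping on BunchData alone (and adding 1 so numbering starts at 1) is an assumption made only for illustration.

```python
import pandas as pd

# Sample data copied from the table above (GroupID is derived below)
df = pd.DataFrame({
    "DisID": [1000, 2012, 2014, 3000],
    "BunchData": ["xyz", "abc", "abc", "def"],
})

# Assumption: groups are defined by BunchData only.
# sort=False numbers groups in order of first appearance; ngroup() is 0-based, so add 1.
df["GroupID"] = df.groupby("BunchData", sort=False).ngroup() + 1

print(df)
```

With the default sort=True the groups would instead be numbered in sorted key order, so the IDs would not line up with the table above.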

I am trying to figure out how to remove the row with the minimum "DisID" within each GroupID, but only when that GroupID contains more than one row. In this case, the output would look like:

DisID	BunchData	GroupID
1000	xyz	1
2014	abc	2
3000	def	3

Thanks!

>Solution :

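Sort by DisID with sort_values, then call drop_duplicates on GroupID with keep='last', so that only the row with the largest DisID in each group is kept. For groups of one or two rows, as in the example, this is the same as removing the minimum. Note that a group with three or more rows would be reduced to its single maximum row, not just lose its minimum.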

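# Sort ascending by DisID so the largest DisID in each group comes last,
# then keep only that last row for every GroupID
df = df.sort_values('DisID').drop_duplicates(['GroupID'], keep='last')
print(df)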
   DisID BunchData  GroupID
0   1000       xyz        1
2   2014       abc        2
3   3000       def        3
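
If some groups can have three or more rows and only the minimum should go, a boolean mask built with groupby.transform is one alternative. This is a minimal sketch, not part of the original answer; the helper name drop_group_min and its parameters are hypothetical, and it assumes the column names from the question.

```python
import pandas as pd

def drop_group_min(df, group_col="GroupID", value_col="DisID"):
    """Hypothetical helper: drop the minimum value_col row of each group
    that has more than one row; single-row groups are left untouched."""
    grouped = df.groupby(group_col)[value_col]
    group_size = grouped.transform("size")  # rows in each row's group
    group_min = grouped.transform("min")    # smallest value in each row's group
    is_min_of_multi_row_group = df[value_col].eq(group_min) & group_size.gt(1)
    return df[~is_min_of_multi_row_group]

# Sample data from the question
df = pd.DataFrame({
    "DisID": [1000, 2012, 2014, 3000],
    "BunchData": ["xyz", "abc", "abc", "def"],
    "GroupID": [1, 2, 2, 3],
})

result = drop_group_min(df)
print(result)
```

On the sample data this keeps the rows with DisID 1000, 2014 and 3000, the same as the sort_values approach. One design caveat: if several rows in a group tie for the minimum DisID, this mask removes all of them.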