Pandas – Drop rows where not totally duplicated

July 6, 2022

I have a DataFrame that contains the following columns (along with others). I am trying to figure out how to remove all rows where: For each group number in ID_Dets if there exists more than 1 unique number in ID_Dets_2 then drop all rows.

I have bolded the rows that I would want removed. Thx!

Index	Other Columns	ID_Dets	ID_Dets_2
11	aga	4	5
13	af	2	3
14	da	3	4
15	aga	3	4
16	as	0	0
17	ga	0	0
18	asg	0	1
19	asa	0	1
4	ga	1	2
5	aah	1	2

>Solution :

You can count the number of unique values per group and set a threshold to have 1 unique value:

df[df.groupby('ID_Dets')['ID_Dets_2'].transform('nunique').eq(1)]

or:

df.groupby('ID_Dets').filter(lambda g: len(g['ID_Dets_2'].unique())<=1)

output:

   Index Other Columns  ID_Dets  ID_Dets_2
0     11           aga        4          5
1     13            af        2          3
2     14            da        3          4
3     15           aga        3          4
8      4            ga        1          2
9      5           aah        1          2