Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – Drop rows where *not* totally duplicated

I have a DataFrame that contains the following columns (along with others). I am trying to figure out how to remove all rows where: For each group number in ID_Dets if there exists more than 1 unique number in ID_Dets_2 then drop all rows.

I have bolded the rows that I would want removed. Thx!

Index Other Columns ID_Dets ID_Dets_2
11 aga 4 5
13 af 2 3
14 da 3 4
15 aga 3 4
16 as 0 0
17 ga 0 0
18 asg 0 1
19 asa 0 1
4 ga 1 2
5 aah 1 2

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can count the number of unique values per group and set a threshold to have 1 unique value:

df[df.groupby('ID_Dets')['ID_Dets_2'].transform('nunique').eq(1)]

or:

df.groupby('ID_Dets').filter(lambda g: len(g['ID_Dets_2'].unique())<=1)

output:

   Index Other Columns  ID_Dets  ID_Dets_2
0     11           aga        4          5
1     13            af        2          3
2     14            da        3          4
3     15           aga        3          4
8      4            ga        1          2
9      5           aah        1          2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading