Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Check for duplicate rows for a subset of columns in a Pandas DataFrameGroupBy object

Suppose I have a groupby object (grouped on Col1) like below:

Col1     Col2     Col3     Col4     Col5
----------------------------------------
AAA       001      456      846      239     row1
          002      374      238      904     row2
          003      456      846      239     row3

BBB       001      923      222      398     row1
          002      923      222      398     row2
          003      755      656      949     row3

CCC       001      324      454      565     row1
          002      744      345      336     row2
          003      567      355      756     row3

Is there a way to check for duplicate rows based on [Col3, Col4, Col5] within each group. In the example above, for group AAA, row 1 and row 3 match since Col3, Col4, Col5 values are same. Again in group BBB, row 1 and row 2 match. In group CCC, using above logic, we don’t have any duplicate rows.

Can we create a list with ‘n’ boolean elements (where ‘n’ represents the number of groups), fill it with True if there is any duplicate for that group, else False. Therefore, for above example output would be:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

[True, True, False]

>Solution :

You can try groupby column Col1 then use duplicated() to check if there are any duplicated from Col3 to Col5

out = (df.groupby('Col1')
       .apply(lambda g: g[['Col3','Col4','Col5']].duplicated().any())
       .tolist())
print(out)

[True, True, False]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading