Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Removing duplicates based on matching column values with boolean indexing

After merging two DF’s I have the following dataset:

DB_ID x_val y_val
x01 405 407
x01 405 405
x02 308 306
x02 308 308
x03 658 658
x03 658 660
x04 None 658
x04 None 660
x05 658 660
x06 660 660

The y table contains multiple values for the left join variable (not included in table), resulting in multiple rows per unique DB_ID (string variable, not in df index).

The issue is that only one row is correct, where x_val and y_val match. I tried removing the duplicates with the following code:
df= df[~df['DB_ID'].duplicated() | combined['x_val'] != combined['y_val']]

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

This however doesn’t work. I am looking for a solution to achieve the following result:

DB_ID x_val y_val
x01 405 405
x02 308 308
x03 658 658
x04 None 658
x05 658 660
x06 660 660

>Solution :

Idea is compare both column for not equal, then sorting and reove duplicates by DB_ID:

df = (df.assign(new = df['x_val'].ne(df['y_val']))
        .sort_values(['DB_ID','new'])
        .drop_duplicates('DB_ID')
        .drop('new', axis=1))
print (df)
  DB_ID x_val y_val
1   x01   405   405
3   x02   308   308
4   x03   658   658
6   x04  None   658
8   x05   658   660
9   x06   660   660

If need equal NaNs or Nones use:

df = (df.assign(new = df['x_val'].fillna('same').ne(df['y_val'].fillna('same')))
        .sort_values(['DB_ID','new'])
        .drop_duplicates('DB_ID')
        .drop('new', axis=1))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading