Quick way to find duplicate IDs in two datasets?

I have two different datasets, df and df2.
Say each dataset consists of person ID numbers.

I want to find any person IDs that appear in both df and df2, and remove those duplicates from df2.

> dput(df)
c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)


> dput(df2)
c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

Here, we see that person ID number 123 appears in both datasets. How can I easily filter df2 to remove any rows where the person ID already appears in df?

Note: my real datasets include several thousand rows, so finding and removing the duplicate person IDs by hand would be a pain.

> Solution:

Please try

df <- c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
df2 <- c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

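# match() returns NA for every element of df2 that has no match in df
index <- which(is.na(match(df2, df)))
# Keep only the df2 values that do not appear in df
df2[index]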
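
The same filter can also be written with `%in%`, which is a thin wrapper around `match()`. The sketch below is only one possible variant. It first repeats the vector case, then shows how it might look if the IDs live in data frames; the names `people1`, `people2`, `person_id`, and `score` are hypothetical, since the question does not show the real column layout.

```r
# Same example vectors as in the question
df  <- c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
df2 <- c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

# IDs that occur in both vectors (useful for checking what will be dropped)
shared_ids <- intersect(df, df2)

# Keep only the df2 values that do not occur in df
df2_unique <- df2[!df2 %in% df]

# Hypothetical data-frame version: person_id and score are assumed column names
people1 <- data.frame(person_id = df)
people2 <- data.frame(person_id = df2, score = seq_along(df2))

# Drop every row of people2 whose person_id already appears in people1
people2_unique <- people2[!people2$person_id %in% people1$person_id, , drop = FALSE]
```

Unlike `setdiff(df2, df)`, the `%in%` filter keeps any IDs that are repeated within df2 itself, so it only removes the overlap with df.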