Quick way to find duplicate IDs in two datasets?

January 12, 2023

I have two different datasets, df and df2.
Each dataset consists of person ID numbers.

I want to find any person IDs that appear in both df and df2, and then remove those IDs from df2.
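To just see which IDs the two datasets share, base R's intersect() does it in one call. A minimal sketch, assuming the data are plain numeric vectors as in the dput() output below:

```r
df <- c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
df2 <- c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

# IDs present in both vectors (repeated IDs are collapsed to one)
shared_ids <- intersect(df, df2)
shared_ids  # [1] 123
```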

> dput(df)
c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)

> dput(df2)
c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

Here, person ID 123 appears in both datasets. How can I filter df2 to remove any rows whose person ID already appears in df?

*Keep in mind that my real datasets include several thousand rows, so finding and removing the duplicate person IDs by hand isn't practical.
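If the real datasets are data frames rather than plain vectors, the same idea applies to the ID column. A sketch, assuming a hypothetical ID column named id in both data frames (the value column is also made up for illustration; adjust both to the real names):

```r
# Hypothetical data frames: the column names `id` and `value` are assumptions
df <- data.frame(id = c(123, 242, 142), value = c("a", "b", "c"))
df2 <- data.frame(id = c(123, 0, 121), value = c("x", "y", "z"))

# %in% flags each row of df2 whose id also appears in df; ! keeps the others
df2_unique <- df2[!(df2$id %in% df$id), ]
df2_unique
```

Because %in% compares the whole column at once, this works the same whether df2 has ten rows or several thousand.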

Solution:

Use match() to find the elements of df2 that have no match in df, and keep only those:

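df <- c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
df2 <- c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

# match() returns NA for every element of df2 that does not occur in df
index <- which(is.na(match(df2, df)))
# keep only those IDs, which drops 123 (the one ID shared with df)
df2[index]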
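The same filter can be written more compactly with %in%, which base R defines in terms of match(). A sketch checking that both forms give the same result on the sample data:

```r
df <- c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
df2 <- c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

# Approach from the answer: positions in df2 with no match in df
via_match <- df2[which(is.na(match(df2, df)))]

# Equivalent: keep the elements of df2 that are not %in% df
via_in <- df2[!(df2 %in% df)]

identical(via_match, via_in)  # TRUE
```

One design note: setdiff(df2, df) looks similar but also removes repeated IDs within df2 itself, so the match()/%in% filter is the safer choice if df2 may legitimately contain the same ID more than once.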