I have two different datasets, df and df2.
Say each dataset consists of person ID numbers.
I want to find any person IDs that appear in both df and df2 and remove those duplicates from df2.
> dput(df)
c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
> dput(df2)
c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)
Here, person ID 123 appears in both datasets. How can I easily filter df2 to remove every entry whose person ID already appears in df?
*Keep in mind that my real datasets include several thousand rows, so finding and removing the duplicate person IDs by hand would be a pain.
Solution:
Please try this:
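# Person IDs from the question
df <- c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
df2 <- c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)
# match() returns NA for each element of df2 that is not found in df,
# so these positions are the IDs that appear only in df2
index <- which(is.na(match(df2, df)))
df2[index]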
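A slightly shorter sketch of the same idea uses base R's `%in%` operator, which is defined as `match(x, table, nomatch = 0) > 0`, so negating it keeps exactly the IDs that are missing from df. Since the real data probably lives in data frames, the second half shows the same filter on a row basis; the column names `id` and `score` are hypothetical and should be replaced with your actual ID column.

```r
# Vectors from the question
df  <- c(123, 242, 142, 1535, 355, 253, 533, 676, 347, 49)
df2 <- c(123, 0, 121, 32435, 34555, 25653, 53363, 67366, 336447, 4369)

# Keep only the IDs in df2 that do not occur anywhere in df
df2_unique <- df2[!df2 %in% df]
print(df2_unique)

# Data frame version (hypothetical columns: `id` holds the person ID,
# `score` stands in for any other per-person data)
people  <- data.frame(id = df)
people2 <- data.frame(id = df2, score = seq_along(df2))

# Drop every row of people2 whose id already appears in people
people2_unique <- people2[!people2$id %in% people$id, , drop = FALSE]
print(people2_unique)
```

If you already use dplyr, `anti_join(people2, people, by = "id")` performs the same row filtering on data frames. Note that `setdiff(df2, df)` is not a drop-in replacement, because it also collapses repeated IDs within df2.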