How to find duplicated values in two columns between two dataframes and remove non-duplicates in R?

February 2, 2023

Say I have two data frames that look like this:

df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

I want to compare df1$ID with df2$ID and remove every row of df2 whose ID does not appear in df1, so that the new data frame looks like this:

df3 <- data.frame(ID = c("G","F","F","B","A","F","A","B","A","B","A","G"),
                  code = c(1,2,3,3,1,2,1,1,3,2,1,1),
                  class = c(2,4,5,2,3,2,1,2,4,5,2,1))

Solution:

Use %in% to keep only the rows of df2 whose ID also appears in df1$ID:

df2[df2$ID %in% df1$ID, ]

   ID code class
1   G    1     2
2   F    2     4
4   F    3     5
5   B    3     2
6   A    1     3
7   F    2     2
9   A    1     1
10  B    1     2
11  A    3     4
12  B    2     5
14  A    1     2
15  G    1     1
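
Note that the row names in the output above are the original positions in df2 (3, 8 and 13 are missing because those rows had ID "C"). If you want the result to match df3 exactly, with row names running from 1 to 12, a minimal base R sketch could look like the one below. The names keep and dropped_ids are only illustrative and do not come from the question.

```r
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

# Logical vector with one entry per row of df2:
# TRUE where that row's ID occurs anywhere in df1$ID
keep <- df2$ID %in% df1$ID

# Keep only the matching rows (all columns)
df3 <- df2[keep, ]

# Subsetting preserves the old row names (1, 2, 4, ...); reset them to 1..n
rownames(df3) <- NULL

# Optional check: which IDs were removed from df2?
dropped_ids <- setdiff(df2$ID, df1$ID)
```

If you already use dplyr, semi_join(df2, df1, by = "ID") should give the same rows, since a semi join keeps the rows of the first table that have a match in the second without adding any columns.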