How to extract the IDs of non-matching values between 2 data frames by ID in R?

I am trying to build a report of all non-matching values between 2 data frames. I tried to apply the solution here, but the intersect function does not work because the two data frames have different numbers of columns.

I am using the comparedf function from the arsenal package, which does a good job of showing me the differences between the data frames, but I am not sure how to save the non-matching rows to another data frame or vector.

Here is an example:

df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5),
                  var2 = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5),
                  var2 = c(1, 2, 4, 3, 5))

library(arsenal)
summary(comparedf(df1, df2, by = "id"))

This gives the following output:

Table: Differences detected

var.x   var.y   id   values.x   values.y    row.x   row.y
------  ------  ---  ---------  ---------  ------  ------
var     var     b    2          3               2       2
var     var     c    3          4               3       3
var     var     d    4          2               4       4
var2    var2    c    3          4               3       3
var2    var2    d    4          3               4       4

Is there a way to extract the IDs from this table as a vector? Subsetting df1 to only these IDs would also work.

Edit: I added another variable column because in my real dataset multiple columns are being compared at the same time.

>Solution :

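This returns the ids that appear in the differences table produced by comparedf. There is one entry per differing value, so an id can repeat when several columns differ: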

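library(arsenal)

df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5))

# Summarise the comparison, matching rows by id
vec1 <- summary(comparedf(df1, df2, by = "id"))
# diffs.table holds one row per differing value
df4 <- vec1$diffs.table
# ids of the rows that differ between df1 and df2
list1 <- df4$id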
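
With the two-column example from the question, the same id shows up once per differing column (c and d each appear twice in the table), so the extracted ids probably need de-duplicating before they are used to subset df1. A minimal sketch, assuming the diffs.table element used above and that the ids can be compared as character values; the names diff_table, diff_ids and df1_mismatch are made up for illustration:

```r
library(arsenal)

# Example data from the question (two compared columns)
df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5),
                  var2 = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5),
                  var2 = c(1, 2, 4, 3, 5))

# One row per differing value, across all compared columns
diff_table <- summary(comparedf(df1, df2, by = "id"))$diffs.table

# Collapse repeated ids so each non-matching id appears once
# (as.character is an assumption, in case id comes back as a factor)
diff_ids <- unique(as.character(diff_table$id))

# Keep only the rows of df1 whose id has at least one difference
df1_mismatch <- df1[df1$id %in% diff_ids, ]
```

The same diff_ids vector can be used on df2 in the same way if the non-matching rows from both sides are needed.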