Compare the sequence of several columns and identify what differs between them and a reference point

I want to compare a set of columns with a reference and identify the difference (or differences) between them. My data looks something like this toy data:

df <- data.frame(id = c(1:5),
                 var1 = c("A","A","A","A","B"),
                 var2 = c(10,20,10,10,10),
                 var3 = c("A2", "A2", "A3", "A2", "A2"),
                 var4 = c("B2", "B2", "B2", "B3", "B2"),
                 var5 = c("C2", "C2", "C2", "C2", "C4"))

This gives the following dataframe:

  id var1 var2 var3 var4 var5
1  1    A   10   A2   B2   C2
2  2    A   20   A2   B2   C2
3  3    A   10   A3   B2   C2
4  4    A   10   A2   B3   C2
5  5    B   10   A2   B2   C4

I also have a reference sequence, stored in the vector ref (see below). I want to compare the sequence of var1, var2, var3, var4 and var5 in each row with this reference and (1) create a new column containing the part of the sequence (the column value) that differs, and (2) create a new column indicating in which column the difference was found.

# Same as the first row of df
ref <- c("A", 10, "A2", "B2", "C2")

The expected output should be:

  id diff which
1  1 <NA>  <NA>
2  2   20  var2
3  3   A3  var3
4  4   B3  var4
5  5   C4  var5

Hence, with this output I can see that for the observation with id == 2, the difference between the reference and the row's sequence was found in var2, and the differing value was 20.

Does anyone know how to do this?

Solution:

In base R:

l <- apply(df[-1], 1, function(x) x[x != ref])
data.frame(id = 1:nrow(df),
           diff = sapply(l, toString),
           which = sapply(l, function(x) toString(names(x))))

  id  diff      which
1  1                 
2  2    20       var2
3  3    A3       var3
4  4    B3       var4
5  5 B, C4 var1, var5
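
The snippet above relies on two behaviours that are worth knowing about: apply() converts the data frame to a character matrix first (numeric columns go through format(), which can pad values of unequal width, e.g. " 5" next to "10"), and it only returns a list when the rows have differing numbers of mismatches. Below is a minimal sketch of a variant that sidesteps both and returns NA for rows without a difference, as in the expected output of the question. The helper name collapse_or_na and the intermediate objects cols, m, neq and res are my own, and the sketch assumes the compared columns contain no missing values.

```r
# Toy data and reference vector from the question
df <- data.frame(id = 1:5,
                 var1 = c("A", "A", "A", "A", "B"),
                 var2 = c(10, 20, 10, 10, 10),
                 var3 = c("A2", "A2", "A3", "A2", "A2"),
                 var4 = c("B2", "B2", "B2", "B3", "B2"),
                 var5 = c("C2", "C2", "C2", "C2", "C4"))
ref <- c("A", 10, "A2", "B2", "C2")  # c() coerces every element to character

# Columns to compare (everything except id)
cols <- setdiff(names(df), "id")

# Convert each column to character on its own, so numbers are not padded
m <- do.call(cbind, lapply(df[cols], as.character))

# TRUE where a cell differs from the reference value for its column
neq <- m != matrix(ref, nrow = nrow(m), ncol = ncol(m), byrow = TRUE)

# Hypothetical helper: NA when nothing differs, else a comma-separated string
collapse_or_na <- function(x) if (length(x) == 0) NA_character_ else toString(x)

rows <- seq_len(nrow(m))
res <- data.frame(
  id    = df$id,
  diff  = vapply(rows, function(i) collapse_or_na(m[i, neq[i, ]]), character(1)),
  which = vapply(rows, function(i) collapse_or_na(cols[neq[i, ]]), character(1))
)
res
```

Row 5 still reports two differences (var1 and var5), because in the toy data it differs from ref in both columns.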