I have the following dataset called df:
structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
"e f", "b c", "f g", "c d", "y z", "t u")), class = "data.frame", row.names = c(NA,
-12L))
For this dataset, I have two vectors of matches: matching1 <- c("a b", "b c", "c d") and matching2 <- c("c d","e f","f g"). In my df, I would like to create a new column and assign a value depending on which vector a string matches: 1 for strings in matching1, 2 for strings in matching2, and 3 for every string that matches neither. Ideally, the assignment for matching2 would not overwrite a value already assigned by matching1, because both vectors contain the string "c d". I know I can use:
matches1 <- paste0(na.omit(matching1), "", collapse = "|")
to collapse the vector into a single pattern separated by | (or), and I have tried to combine it with case_when. However, case_when only allows single patterns, and the list of potential matches in my original dataset is very long, so I would like to avoid spelling out every condition explicitly.
The output should look like this:
structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
"e f", "b c", "f g", "c d", "y z", "t u"), col2 = c("1", "2",
"3", "3", "3", "3", "2", "1", "2", "1", "3", "3")), class = "data.frame", row.names = c(NA,
-12L))
>Solution :
I think this does it:
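Edit: the matching2 step is now performed first, to handle the case where "c d" is in both vectors and matching1 is preferred:
# Assign 2 to strings found in matching2, 3 to everything else
df$ans <- ifelse(df$col1 %in% matching2, 2, 3)
# Overwrite with 1 for strings found in matching1, so matching1 wins for "c d"
df$ans <- ifelse(df$col1 %in% matching1, 1, df$ans)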
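Or the pre-edit version, adjusted following langtang’s comment:
# Assign 1 to strings found in matching1, 3 to everything else
df$ans <- ifelse(df$col1 %in% matching1, 1, 3)
# Assign 2 only to matching2 strings that are not also in matching1
df$ans <- ifelse(df$col1 %in% setdiff(matching2, matching1), 2, df$ans)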
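If the real data has more than two vectors of matches, the same idea can probably be generalised with a single lookup table instead of one ifelse per vector. The following is only a sketch in base R: the names groups, lookup and idx are made up for illustration, and it assumes exact string matches (as with %in%), not regex patterns. Earlier vectors take priority because match() returns the first position at which a string occurs.

```r
df <- data.frame(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
                          "e f", "b c", "f g", "c d", "y z", "t u"))
matching1 <- c("a b", "b c", "c d")
matching2 <- c("c d", "e f", "f g")

# Hypothetical container: all match vectors, listed in priority order
groups <- list(matching1, matching2)

# One row per string; value is the index of the vector it came from
lookup <- data.frame(
  key   = unlist(groups),
  value = rep(seq_along(groups), lengths(groups))
)

# match() returns the FIRST position of each string, so a string that
# appears in several vectors ("c d") keeps the value of the earliest one.
# Strings found in no vector give NA.
idx <- match(df$col1, lookup$key)

# Unmatched strings get the fallback value (here 3 = number of vectors + 1)
df$ans <- ifelse(is.na(idx), length(groups) + 1L, lookup$value[idx])
```

With the vectors exactly as given in the question, "c d" receives 1 and "d e" receives 3, since "d e" appears in neither vector. Adding a third vector only requires appending it to groups.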