Assign value to a new column based on any of the multiple patterns from a vector

I have the following dataset called df:

structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z", 
"e f", "b c", "f g", "c d", "y z", "t u")), class = "data.frame", row.names = c(NA, 
-12L))

For this dataset, I have two vectors of matches: matching1 <- c("a b", "b c", "c d") and matching2 <- c("c d","e f","f g"). In my df, I would like to create a new column and assign a value based on which vector a string matches. For a match in matching1 I would like to assign a value of 1, for a match in matching2 a value of 2, and for every string not matched a value of 3. Ideally, the assignment for matching2 would not overwrite a value already assigned for matching1, because both vectors contain the string "c d". I know I can use:

matches1 <- paste0(na.omit(matching1), "", collapse = "|")

to collapse the vector into a single pattern joined with | (or), and I have tried to combine it with case_when. However, case_when only allows single patterns, and the list of potential matches in my original dataset is very long, so I would like to avoid spelling out every condition explicitly.

The output should look like this:

structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z", 
"e f", "b c", "f g", "c d", "y z", "t u"), col2 = c("1", "3", 
"3", "3", "3", "3", "2", "1", "2", "1", "3", "3")), class = "data.frame", row.names = c(NA, 
-12L))

>Solution :

I think this does it:

Edit: checking matching2 first and matching1 second, so that where a string such as "c d" is in both vectors, the matching1 value is preferred:

df$ans<-ifelse(df$col1 %in% matching2, 2, 3)
df$ans<-ifelse(df$col1 %in% matching1, 1, df$ans)
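
If there are many more vectors than two, the same idea (fill in the lowest-priority value first, then let higher-priority vectors overwrite it) can be wrapped in a loop. This is only a sketch: `assign_first_match` and its arguments are hypothetical names, not part of any package, and it assumes exact matches via `%in%` and that the position of a vector in the list is the value to assign.

```r
# Data and vectors copied from the question
df <- structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
  "e f", "b c", "f g", "c d", "y z", "t u")), class = "data.frame",
  row.names = c(NA, -12L))
matching1 <- c("a b", "b c", "c d")
matching2 <- c("c d", "e f", "f g")

# Hypothetical helper: `groups` is a list of character vectors in priority
# order; element i of the list assigns the value i.
assign_first_match <- function(x, groups, default) {
  out <- rep(default, length(x))
  # Walk the groups in reverse so that earlier groups overwrite later ones
  for (i in rev(seq_along(groups))) {
    out[x %in% groups[[i]]] <- i
  }
  out
}

df$ans <- assign_first_match(df$col1, list(matching1, matching2), default = 3)
```

With the data above, "c d" ends up as 1 because matching1 comes first in the list.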

Or the pre-edit version, incorporating langtang’s comment:

df$ans<-ifelse(df$col1 %in% matching1, 1, 3)
df$ans<-ifelse(df$col1 %in% setdiff(matching2, matching1), 2, df$ans)
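
Both versions above use `%in%`, which only matches whole strings. If the vector entries are instead meant as patterns that may occur inside longer strings, the collapsed pattern from the question can be combined with `grepl`. The sketch below assumes that reading; `pattern1` and `pattern2` are names chosen for illustration, and entries containing regex metacharacters such as `.` or `(` would need escaping first.

```r
# Data and vectors copied from the question
df <- structure(list(col1 = c("a b", "d e", "g f", "h j", "j k", "y z",
  "e f", "b c", "f g", "c d", "y z", "t u")), class = "data.frame",
  row.names = c(NA, -12L))
matching1 <- c("a b", "b c", "c d")
matching2 <- c("c d", "e f", "f g")

# Collapse each vector into one alternation pattern, e.g. "a b|b c|c d"
pattern1 <- paste(matching1, collapse = "|")
pattern2 <- paste(matching2, collapse = "|")

# Nested ifelse: pattern1 is tested first, so it wins for "c d"
df$ans <- ifelse(grepl(pattern1, df$col1), 1,
                 ifelse(grepl(pattern2, df$col1), 2, 3))
```

Because the patterns are not anchored, a longer string that merely contains "a b" would also be assigned 1.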