I have a dataframe as shown below and the output from this code is shown below.
- Essentially, I want to read values from columns which can be of two formats as shown in col1 and col2.
- If the format is as in col1 then we split text and read third element and if the format is as in col2 we read the cell value as it is.
- We then want to pass condition if the values in columns (col1 and col3 after splitting col1) to create col4 and col5.
- Finally, we want the result in col6.
- the variable "selection" is a list of numbers
Question: Since in the original dataframe, there can be multiple columns of value format type as in col1 and col2, how the code below can be modified to get the desired output?
#STEP1
df <- data.frame(
col1 = c("abc_1_102", "abc_1_103", "xyz_1_104"),
col2 = c("107", "108", "106")
)
#STEP2
split_text <- strsplit(df$col1, "_")
third_elements <- sapply(split_text, function(x) if(length(x) >= 3) x[3] else NA)
#STEP3
df$col3<-third_elements
#STEP4
selection<-c(107,102,108)
df$col4<-ifelse(df$col2 %in% selection,"SELECT","NOTSELECT")
df$col5<-ifelse(df$col3 %in% selection,"SELECT","NOTSELECT")
#STEP5
df$col6<-paste(df$col4,df$col5,sep = ",")
Output from above code:
col1 col2 col3 col4 col5 col6
1 abc_1_102 107 102 SELECT SELECT SELECT,SELECT
2 abc_1_103 108 103 SELECT NOTSELECT SELECT,NOTSELECT
3 xyz_1_104 106 104 NOTSELECT NOTSELECT NOTSELECT,NOTSELECT
Desired output
col1 col2 col6
1 abc_1_102 107 SELECT,SELECT
2 abc_1_103 108 NOTSELECT,SELECT
3 xyz_1_104 106 NOTSELECT,NOTSELECT
>Solution :
You can do this all in one go with by pasting two ifelse statements together. The ifelse for col2 is straightforward. The ifelse for col3 uses grepl to search for any of the numbers in select by creating a search string using paste(..., collapse = "|") (pasting the "or" operator between them). The outer paste(…, sep = ",")` puts it all together:
df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
ifelse(grepl(paste(selection, collapse = "|"), df$col1), "SELECT", "NOTSELECT"),
sep = ",")
Or to more safely add a word boundary to the second ifelse (thanks to @r2evans!)
df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
ifelse(grepl(
paste0("(^|_)(", paste(selection, collapse = "|"), ")(_|$)\\b"),
df$col1),
"SELECT", "NOTSELECT"),
sep = ",")
Both give the same output in this example:
col1 col2 col6
1 abc_1_102 107 SELECT,SELECT
2 abc_1_103 108 SELECT,NOTSELECT
3 xyz_1_104 106 NOTSELECT,NOTSELECT