I have this type of data:
set.seed(123)
df <- data.frame(
v1 = sample(LETTERS[1:10], 5),
v2 = sample(LETTERS[1:10], 5),
v3 = sample(LETTERS[1:10], 5),
v4 = sample(LETTERS[1:10], 5)
)
as well as a number of key words:
keys <- c("A", "C", "F", "H")
I want to (i) detect which keyword is present in each column of df and (ii) create new columns for each of the keys, recording with 1if that key is present. I’ve been using a for loop so far, which, however, does not work:
library(stringr)
for(i in keys){
df[i] <- +str_detect(apply(df, 1, paste0, collapse = " "), keys[i])
}
df
v1 v2 v3 v4 A C F H
1 C A J I NA NA NA NA
2 H E E C NA NA NA NA
3 D H F A NA NA NA NA
4 G D I J NA NA NA NA
5 F C A F NA NA NA NA
I don’t know why it doesn’t work, as the same logic applied to a single key does work:
+str_detect(apply(df, 1, paste0, collapse = " "), keys[1])
[1] 1 0 1 0 1
I’d be grateful for advice on how the for loop must be tweaked but also for an entirely different solution to the problem (e.g., dplyr).
>Solution :
In your for loop you are indexing on both the index and the name. Test keys["A"], it’ll return NA. You can use which to avoid this:
library(stringr)
for(i in keys){
df[i] <- ifelse(str_detect(apply(df, 1, paste0, collapse = " "), keys[which(keys == i)]), 1, 0)
}
output
v1 v2 v3 v4 A C F H
1 C E J I 0 1 0 0
2 J D E J 0 0 0 0
3 B F C E 0 1 1 0
4 H H H C 0 1 0 1
5 F A A B 1 0 1 0
Another option would be to index on the position (1, 2, 3, 4):
for(i in seq_along(keys)){
df[keys[i]] <- ifelse(str_detect(apply(df, 1, paste0, collapse = " "), keys[i]), 1, 0)
}