I am trying to do a string search for multiple strings on one column, but the script does not return strings that have parentheses.
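library(dplyr)
library(stringr)

a <- c("Apple", "Facebook", "Google (1992)")
b <- c(1, 2, 3)
c <- data.frame(a, b)
# unique values of column a, used below as the search terms
d <- c %>%
  distinct(a) %>%
  pull()
# the terms are joined with "|" and interpreted as one regular expression
c %>%
  filter(str_detect(a, paste(d, collapse = "|"))) %>%
  group_by(a) %>%
  tally()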
I want the last script to return "Apple", "Facebook", "Google (1992)", but it only returns the first two. Is there something I can add to the "collapse" argument to include strings with parentheses?
>Solution :
(Per the comments, you don't even need regex in this case. But for future reference:) As you may already know, parentheses have to be escaped in regular expressions. Unescaped, "(1992)" is read as a capture group, so the pattern "Google (1992)" looks for the text "Google 1992" and never matches the literal parentheses. Escaping is easy enough when you're specifying the pattern directly, e.g., str_detect(a, "Google \\(1992\\)"). But it can be slightly trickier when the pattern is stored in a variable, as in your case. You can handle this as follows:
library(stringr)
str_detect(a, paste(
  str_replace_all(d, c("\\(" = "\\\\(", "\\)" = "\\\\)")),
  collapse = "|"
))
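In the vector of replacements, the left-hand sides are patterns, so the parenthesis itself has to be escaped ("\\(" matches a literal "("). The right-hand sides are replacement strings, where the backslash is the character that needs escaping: "\\\\" inserts one literal backslash (which R prints as "\\"), so "\\\\(" turns each "(" into a backslash followed by "(".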
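To see what the escaping produces, and what the non-regex route mentioned at the top of this answer might look like, here is a minimal sketch. The names df, targets, escaped, exact and hits are made up for illustration, and the %in% and fixed() variants are my assumption about what the comments had in mind, not something confirmed there.

```r
library(dplyr)
library(stringr)

a <- c("Apple", "Facebook", "Google (1992)")
b <- c(1, 2, 3)
df <- data.frame(a, b)                     # "df" instead of "c" to avoid masking base::c()
targets <- as.character(unique(df$a))      # the search terms, as plain character strings

# 1. Escape the parentheses, then inspect the pattern as the regex engine sees it.
escaped <- str_replace_all(targets, c("\\(" = "\\\\(", "\\)" = "\\\\)"))
writeLines(escaped)                        # writeLines() shows the raw characters
regex_hits <- str_detect(df$a, paste(escaped, collapse = "|"))

# 2. Exact matching: no regex at all, so nothing needs escaping.
exact <- df %>%
  filter(a %in% targets) %>%
  count(a)

# 3. Literal substring matching: fixed() makes stringr treat the pattern as plain text.
#    fixed() has no "|" alternation, so test each term separately and combine with OR.
hits <- Reduce(`|`, lapply(targets, function(p) str_detect(df$a, fixed(p))))
```

Option 2 only works when you want whole-value matches, which is the case in the question because the search terms come from the same column. Option 3 is closer to the original str_detect() call if partial matches matter.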