I need to extract the part of a string that starts after one pattern and ends before another pattern.
My code is as follows:
library(stringr)
s1="Ben Fisher5.0 out of 5 stars\n\n\n\n\n\n\n\n \n \n Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States <U+0001F1FA><U+0001F1F8>"
start_pattern <- "out of 5 stars\n\n\n\n\n\n\n\n \n \n"
end_pattern <- "<U+0001F1FA>"
str_extract(s1, paste0("(?<=", start_pattern, ").*(?=",end_pattern , ")"))
But this code returns NA.
Could anyone suggest how to get the correct result? I need the final output to be "Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States"
>Solution :
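The original pattern returns NA for two reasons: . does not match newline characters by default, so .* cannot cross the \n \n inside the review text, and the unescaped + in <U+0001F1FA> is read as a quantifier on U rather than as a literal plus sign. Instead of (.*), we can capture one or more characters that are not < with ([^<]+). The prefix (out of 5 stars[\n\\s]+) matches the starting text followed by one or more newline or whitespace characters, and the suffix (\\s+<U\\+0001F1FA) matches one or more whitespace characters followed by the literal string <U+0001F1FA, with the + escaped because it is a regex metacharacter. group = 1 returns only the captured group rather than the whole match.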
library(stringr)
str_extract(s1, "out of 5 stars[\n\\s]+([^<]+)\\s+<U\\+0001F1FA", group = 1)
-output
[1] "Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States"
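
If your installed stringr does not accept the group argument (as far as I know it was added in stringr 1.5.0), a similar result can be sketched with str_match() and a lazy .*? under dotall = TRUE, so that . is allowed to cross newlines. This is only an alternative sketch under that assumption, not the only way to do it; the names m and result are made up for illustration.

```r
library(stringr)

s1 <- "Ben Fisher5.0 out of 5 stars\n\n\n\n\n\n\n\n \n \n Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States <U+0001F1FA><U+0001F1F8>"

# dotall = TRUE lets . match newline characters;
# \\+ makes the plus sign in <U+0001F1FA> literal;
# .*? is lazy, so the capture stops at the first whitespace run
# that is followed by the end marker.
m <- str_match(
  s1,
  regex("out of 5 stars\\s+(.*?)\\s+<U\\+0001F1FA>", dotall = TRUE)
)

# Column 1 holds the full match, column 2 the first capture group.
result <- m[, 2]
result
```

Unlike ([^<]+), the lazy .*? version would still work if the review text itself contained a < character, since it stops only at the full end marker.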