Extracting a part of a string using R

I need to extract the part of a string that starts at one pattern and ends at another pattern.

My code is as follows:

library(stringr)

s1 <- "Ben Fisher5.0 out of 5 stars\n\n\n\n\n\n\n\n \n \n Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States <U+0001F1FA><U+0001F1F8>"

start_pattern <- "out of 5 stars\n\n\n\n\n\n\n\n \n \n"
end_pattern <- "<U+0001F1FA>"

str_extract(s1, paste0("(?<=", start_pattern, ").*(?=", end_pattern, ")"))

But this code gives NA as the output.

Could anyone suggest how to get the correct result? I need the final output to be "Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States"

>Solution :

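The original pattern returns NA for two reasons: by default . does not match newline characters, and the unescaped + in <U+0001F1FA> is read as a quantifier rather than a literal plus sign. Instead of (.*), we can capture one or more characters that are not < with ([^<]+). The prefix out of 5 stars[\n\\s]+ matches the starting text followed by one or more whitespace characters, including newlines. The suffix \\s+<U\\+0001F1FA matches one or more whitespace characters followed by the literal string <U+0001F1FA; the + is escaped because it is a regex metacharacter. The argument group = 1 returns only the captured group, and it needs a version of stringr that supports it (1.5.0 or later).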

library(stringr)
str_extract(s1, "out of 5 stars[\n\\s]+([^<]+)\\s+<U\\+0001F1FA", group = 1)

-output

[1] "Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States"
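
If stringr is not available, or its version lacks the group argument, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: it assumes s1 contains the literal text <U+0001F1FA> as shown in the question, and the names pattern, m and result are made up for the example. The (?s) flag lets . match newlines, and the lazy (.*?) stops at the first whitespace run that is followed by the end marker.

```r
s1 <- "Ben Fisher5.0 out of 5 stars\n\n\n\n\n\n\n\n \n \n Purchased 2, one had a quality issue but they sent a replacement\n \nReviewed in the United States <U+0001F1FA><U+0001F1F8>"

# (?s)   : let . match newline characters as well
# \\s+   : skip the whitespace and newlines after the start text
# (.*?)  : lazily capture the review text
# \\+    : match a literal plus sign in the end marker
pattern <- "(?s)out of 5 stars\\s+(.*?)\\s+<U\\+0001F1FA>"

# regexec() returns the position of the full match and of each capture group;
# regmatches() turns those positions into the matched strings.
m <- regmatches(s1, regexec(pattern, s1, perl = TRUE))

# Element 1 is the full match, element 2 is the first capture group.
result <- m[[1]][2]
print(result)
```

Because the capture is lazy and both ends consume the surrounding whitespace, result should hold the review text without the leading or trailing spaces.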