Extract a non-matching portion of a string using stringr and lookahead

I have a string which always contains unwanted text at the end. I would like to extract everything but the unwanted text.

library(stringr)

text <- "my_text_and_unwanted_text"
output <- str_extract(text, ".*(?=<_and)")
output
# [1] NA

I am hoping that ".*" matches all the text that precedes "_and", so the intended result is "my_text", but I get NA. I have reviewed a number of posts but am having trouble finding examples that show how to match everything except the unwanted string.

Solution:

Another way to think of this operation is to replace the unwanted text with nothing rather than extract everything else. This is often simpler.

library(stringr)

text <- "my_text_and_unwanted_text"
# Delete "_and" and everything after it
str_replace(text, "_and.*", "")
# [1] "my_text"
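A minimal sketch of the same replace approach applied to several strings at once, assuming the unwanted text always starts at "_and". Only the first input comes from the question; the other two are hypothetical. str_replace() is vectorised, and a string that contains no "_and" is returned unchanged. Base R's sub() does the same job if you would rather not depend on stringr.

```r
library(stringr)

# Hypothetical inputs; only the first one comes from the question
texts <- c("my_text_and_unwanted_text", "other_and_junk", "no_marker_here")

# Delete the first "_and" and everything after it
str_replace(texts, "_and.*", "")
# returns c("my_text", "other", "no_marker_here")

# Base R equivalent, no stringr needed
sub("_and.*", "", texts)
# returns c("my_text", "other", "no_marker_here")
```

Because "_and.*" is anchored at the first "_and" the engine finds, everything from the first occurrence onward is dropped.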

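For the extracting approach, your attempt was very close. Note that (?<= is look-behind syntax, and the (?=< in your pattern is actually a look-ahead for the literal text "<_and", which never appears in the string, so there is no match and you get NA. For a look-ahead, put the text to look for directly after (?=: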

str_extract(text, ".*(?=_and)")
# [1] "my_text"
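A short sketch contrasting the look-around forms on the question's string, plus two edge cases worth knowing; the strings "a_and_b_and_c" and "no_marker_here" are hypothetical. Because ".*" is greedy, ".*(?=_and)" stops at the last "_and"; the lazy ".*?" stops at the first one, which is what the str_replace() approach does. When "_and" is absent, str_extract() returns NA rather than the whole string.

```r
library(stringr)

text <- "my_text_and_unwanted_text"

# Look-ahead: everything before "_and"
str_extract(text, ".*(?=_and)")
# returns "my_text"

# Look-behind: everything after "_and_"
str_extract(text, "(?<=_and_).*")
# returns "unwanted_text"

# The question's pattern looks ahead for a literal "<_and", which never occurs
str_extract(text, ".*(?=<_and)")
# returns NA

# Hypothetical string containing "_and" twice
twice <- "a_and_b_and_c"
str_extract(twice, ".*(?=_and)")   # greedy: stops at the last "_and", returns "a_and_b"
str_extract(twice, ".*?(?=_and)")  # lazy: stops at the first "_and", returns "a"

# No "_and" at all: extraction gives NA
str_extract("no_marker_here", ".*(?=_and)")
# returns NA
```

If strings without the marker should be kept whole, the replace approach is usually the safer choice.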