REGEX to extract a string after an underscore up to a final mark in R

I’m sure this is a silly question, but I have a couple of strings such as data_PB_Belf.csv and I need to extract only PB_Belf (and so on). How can I extract everything after the first _ up to the . (preferably using stringr)?

data
[1] "data_PB_Belf.csv" "data_PB_NI.csv" ...

str_replace(data[1], "^[^_]+_([^_]+)_.*", "\\1") ## the closest I got; it returns "PB"
  • I tried to adapt the code from here, but I wasn’t able to. I’m sure that there’s a way to use str_replace() or str_sub() or str_extract(), I just can’t get the right Regex. Thanks in advance!

Solution:

We can match one or more characters that are not an underscore ([^_]+) from the start (^) of the string, followed by an _, then capture one or more characters that are not a dot (([^.]+)), followed by a literal dot (the dot is a metacharacter, so it is escaped as \\.) and any remaining characters (.*). The whole match is then replaced with the backreference (\\1) to the captured group:

sub("^[^_]+_([^.]+)\\..*", "\\1", data)
[1] "PB_Belf" "PB_NI" 

Or with str_replace

library(stringr)
str_replace(data, "^[^_]+_([^.]+)\\..*", "\\1")
[1] "PB_Belf" "PB_NI" 
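
Since the question also mentions str_extract(), here is a minimal sketch of two extraction-based alternatives. It assumes the input holds only the two file names shown in the question (the rest of the vector is elided there), and that every name contains at least one underscore and one dot. The variable names `extracted` and `matched` are illustrative only.

```r
library(stringr)

# Assumed sample input: only the two names visible in the question
data <- c("data_PB_Belf.csv", "data_PB_NI.csv")

# Lookbehind: start right after the first "_" and take everything up to the first "."
extracted <- str_extract(data, "(?<=_)[^.]+")
extracted
# [1] "PB_Belf" "PB_NI"

# Capture group: str_match() returns a matrix; column 2 holds the first group
matched <- str_match(data, "^[^_]+_([^.]+)\\.")[, 2]
matched
# [1] "PB_Belf" "PB_NI"
```

Both variants return NA for a string that does not match, whereas sub() and str_replace() return such a string unchanged, which may make failures easier to spot.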