I’m sure this is a silly question. I have a couple of strings such as data_PB_Belf.csv, and I need to extract only PB_Belf (and so on). How can I extract everything after the first _ up to the . (preferably using stringr)?
data
[1] "data_PB_Belf.csv" "data_PB_NI.csv" ...
str_replace(data[1], "^[^_]+_([^_]+)_.*", "\\1") ## the closest I got; it returns "PB"
I tried to adapt the code from here, but I wasn’t able to. I’m sure that there’s a way to use str_replace() or str_sub() or str_extract(), I just can’t get the right regex. Thanks in advance!
>Solution :
We can match one or more characters that are not a _ ([^_]+) from the start (^) of the string, followed by a _. Then we capture one or more characters that are not a dot (([^.]+)), followed by a literal dot (. is a regex metacharacter, so it is escaped as \\.) and any remaining characters (.*). Replacing the whole match with the backreference to the captured group (\\1) leaves only the captured part.
sub("^[^_]+_([^.]+)\\..*", "\\1", data)
[1] "PB_Belf" "PB_NI"
Or with str_replace
library(stringr)
str_replace(data, "^[^_]+_([^.]+)\\..*", "\\1")
[1] "PB_Belf" "PB_NI"
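Since the question also mentions str_extract(), here is a minimal sketch of that route, assuming data holds only the two file names shown in the question. Instead of replacing the whole string, it uses a lookbehind ((?<=_)) so the match starts right after the first _ and runs up to, but not including, the first dot. The str_match() variant reuses the capture-group pattern from above and picks the second column, which holds the first captured group.

```r
library(stringr)

# Assumed sample input: only the two file names visible in the question
data <- c("data_PB_Belf.csv", "data_PB_NI.csv")

# Lookbehind: start right after the first "_", then take everything up to the first "."
extracted <- str_extract(data, "(?<=_)[^.]+")
extracted
# [1] "PB_Belf" "PB_NI"

# Same capture-group pattern as above; column 2 is the first captured group
matched <- str_match(data, "^[^_]+_([^.]+)\\.")[, 2]
matched
# [1] "PB_Belf" "PB_NI"
```

Both variants return NA for a string that does not match, whereas sub() and str_replace() return the input unchanged, which may matter if some file names do not follow the pattern.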