How to extract the part of a string that matches a pattern between separators in R

I’m trying to extract part of a file name that matches a set of letters with variable length. The file names consist of several parameters separated by "_", but they vary in the number of parts. I’m trying to pull some of the parameters out to use separately.

Example file names:

a = "Vel_Mag_ft_modelExisting_350cfs_blah3.tif"
b = "Depth_modelDesign_11000cfs_blah2.tif"

I’m trying to pull out the parts that start with "model" so I end up with

"modelExisting"
"modelDesign"

The file names are stored as a variable in a data.frame.
I’ve tried:

library(tidyverse)
tibble(files = c(a,b))%>%
  mutate(attempt1 = str_extract(files, "model"),
         attempt2 = str_match(str_split(files, "_"), "model"))

but I just ended up with "model" in all cases, not the full "model…" piece that I need.

The pieces I need are a consistent number of pieces from the end, but I couldn’t figure out how to specify that either. I tried

str_split(files, "_")[-3] 

but this threw an error saying it must be size 480 or 1, not size 479.

Solution:

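We can create a function that captures the run of letters sitting between a "_" and a following "_" plus one or more digits. The parentheses in the pattern define the capture group, and the backreference (\\1) in the replacement returns only that group, discarding the rest of the file name.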

f1 <- function(x) sub(".*_([[:alpha:]]+)_\\d+.*", "\\1", x)

Testing:

> f1(a)
[1] "modelExisting"
> f1(b)
[1] "modelDesign"
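
As a possible alternative that stays closer to the stringr attempts in the question, here is a minimal sketch of two other routes. It assumes the piece always starts with the literal text "model" and contains no underscore, and that it is always the third piece from the end; the names by_prefix, pieces and by_position are made up for illustration. Note that str_split() returns a list with one element per file name, so [-3] drops the third file's pieces instead of indexing within each name, which would explain the size error.

```r
library(stringr)

a <- "Vel_Mag_ft_modelExisting_350cfs_blah3.tif"
b <- "Depth_modelDesign_11000cfs_blah2.tif"
files <- c(a, b)

# Option 1: match "model" plus every following character that is not "_"
by_prefix <- str_extract(files, "model[^_]+")

# Option 2: split each name on "_" and take the third piece from the end.
# str_split() returns a list, so the indexing happens inside each element.
pieces <- str_split(files, "_")
by_position <- vapply(pieces, function(p) p[length(p) - 2], character(1))

print(by_prefix)
print(by_position)
```

Inside a data.frame, the first option should drop straight into mutate(), for example mutate(model = str_extract(files, "model[^_]+")). The positional version may be the safer choice if some parameter values do not begin with "model".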
