I’m trying to extract part of a file name that matches a set of letters with variable length. The file names consist of several parameters separated by "_", but they vary in the number of parts. I’m trying to pull some of the parameters out to use separately.
Example file names:
a = "Vel_Mag_ft_modelExisting_350cfs_blah3.tif"
b = "Depth_modelDesign_11000cfs_blah2.tif"
I’m trying to pull out the parts that start with "model" so I end up with "modelExisting" and "modelDesign".
The file names are stored as a column in a data.frame. I tried:
library(tidyverse)
tibble(files = c(a, b)) %>%
  mutate(attempt1 = str_extract(files, "model"),
         attempt2 = str_match(str_split(files, "_"), "model"))
but I just ended up with "model" in every case rather than the full "model…" part that I need.
The pieces I need are a consistent number of pieces from the end, but I couldn’t figure out how to specify that either. I tried
but this threw an error saying it must be size 480 or 1, not size 479.
We can create a function that captures the word before the _ and one or more digits (\\d+) as a group. In the replacement, specify the backreference (\\1) of the captured group
f1 <- function(x) sub(".*_([[:alpha:]]+)_\\d+.*", "\\1", x)
> f1(a)
[1] "modelExisting"
> f1(b)
[1] "modelDesign"
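As a rough sketch of how this could be applied to the data.frame column from the question, the function can be called inside mutate. Two alternatives are shown alongside it for comparison: a direct str_extract with the pattern "model[^_]+" (the literal "model" followed by everything up to the next underscore), and a positional approach that takes the piece a fixed number of positions from the end. The helper nth_from_end and the column names model_sub, model_extract and model_pos are hypothetical names introduced here for illustration, and the position 3 is an assumption that only fits the two example file names.

```r
library(tidyverse)

a <- "Vel_Mag_ft_modelExisting_350cfs_blah3.tif"
b <- "Depth_modelDesign_11000cfs_blah2.tif"

# Function from the answer: keep the alphabetic word that is
# followed by "_" and one or more digits
f1 <- function(x) sub(".*_([[:alpha:]]+)_\\d+.*", "\\1", x)

# Hypothetical helper: split on "_" and return the piece that sits
# n positions from the end (n = 1 is the last piece).
# Assumes every name has at least n pieces; otherwise vapply errors.
nth_from_end <- function(x, n) {
  vapply(strsplit(x, "_", fixed = TRUE),
         function(p) p[length(p) - n + 1],
         character(1))
}

result <- tibble(files = c(a, b)) %>%
  mutate(
    model_sub     = f1(files),                          # sub() with a backreference
    model_extract = str_extract(files, "model[^_]+"),   # "model" up to the next "_"
    model_pos     = nth_from_end(files, 3)              # third piece from the end
  )

result
```

All three columns should agree for these two names. The str_extract version returns NA when no part starts with "model", whereas sub returns the input unchanged when the pattern does not match, so the choice may depend on how non-matching file names should be handled.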