How to extract part of a string matching a pattern between separators in R

I’m trying to extract part of a file name that matches a set of letters with variable length. The file names consist of several parameters separated by "_", but they vary in the number of parts. I’m trying to pull some of the parameters out to use separately.

Example file names:

a = "Vel_Mag_ft_modelExisting_350cfs_blah3.tif"
b = "Depth_modelDesign_11000cfs_blah2.tif"

I’m trying to pull out the parts that start with "model" so that I end up with:

"modelExisting"
"modelDesign"

The file names are stored as a column in a data.frame.
I’ve tried:

library(tidyverse)
tibble(files = c(a,b))%>%
  mutate(attempt1 = str_extract(files, "model"),
         attempt2 = str_match(str_split(files, "_"), "model"))

but I just ended up with "model" in all cases, not the full "model…." piece that I need.

The pieces I need are a consistent number of pieces from the end, but I couldn’t figure out how to specify that either. I tried

str_split(files, "_")[-3] 

but this threw an error saying it must be size 480 or 1, not size 479.

Solution:

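We can write a function that captures the run of letters sitting between an underscore and a following underscore plus one or more digits. The parentheses in the pattern form the capture group, and the replacement "\\1" is a backreference to it, so sub() replaces the whole file name with just the captured piece.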

f1 <- function(x) sub(".*_([[:alpha:]]+)_\\d+.*", "\\1", x)

Testing:

> f1(a)
[1] "modelExisting"
> f1(b)
[1] "modelDesign"
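
The two attempts in the question can also be made to work inside mutate(). The sketch below is one possible way to do it, not part of the original answer. It assumes the wanted piece always starts with "model" and contains no underscore, and that it is always the third "_"-separated piece from the end, as in the two example names. The helper piece_from_end() is a hypothetical name introduced only for illustration. Note that [-3] applied directly to the result of str_split() drops the third file's entry from the list rather than selecting within each file, which is why the size came out as 479 instead of 480.

```r
library(dplyr)
library(stringr)

a <- "Vel_Mag_ft_modelExisting_350cfs_blah3.tif"
b <- "Depth_modelDesign_11000cfs_blah2.tif"

# Hypothetical helper (not from the original answer): return the n-th
# "_"-separated piece, counted from the end, for each file name.
# Assumes every file name has at least n pieces.
piece_from_end <- function(x, n) {
  parts <- str_split(x, "_")  # list with one character vector per file
  vapply(parts, function(p) p[length(p) - n + 1], character(1))
}

result <- tibble(files = c(a, b)) %>%
  mutate(
    # "model" plus every following character up to the next underscore
    by_prefix = str_extract(files, "model[^_]+"),
    # the same piece, selected by its position from the end
    by_position = piece_from_end(files, 3)
  )

result
```

Matching on the "model" prefix is the more direct option when the prefix is reliable. Selecting by position may be the better choice if other parameters without a fixed prefix need to be pulled out in the same way.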