Regex get string between intervals underscores

October 3, 2022

I’ve seen a lot of similar questions, but I wasn’t able to get the desired output.

I have a string means_variab_textimput_x2_200.txt and I want to capture ONLY what is between the second and third underscores: textimput

I’m using R, stringr, I’ve tried many things, but none solved the issue:

my_string <- "means_variab_textimput_x2_200.txt"

str_extract(my_string, '[_]*[^_]*[_]*[^_]*[_]*[^_]*')
"means_variab_textimput"

str_extract(my_string, '^(?:([^_]+)_){4}')
"means_variab_textimput_x2_"

str_extract(my_string, '[_]*[^_]*[_]*[^_]*[_]*[^_]*\\.') ## the closest I got was this
"_textimput_x2_200."

Any ideas? Ps: I’m VERY new to Regex, so details would be much appreciated 🙂

Additional question: can I also get only a "part" of the word? Say, only text instead of textimput, but without counting the words? It would be good to know both possibilities.
Several similar questions were helpful, but I couldn’t get the final expected result. Thanks in advance.

>Solution :

stringr uses ICU-based regular expressions. One option would be a regex lookbehind, but the text in front of the target has no fixed (or bounded) length here, so (?<=...) wouldn’t work. Alternatively, either remove the unwanted substrings with str_remove, or use str_replace to match the whole string, capture the third word, i.e. the run of characters that are not _ ([^_]+), and replace the match with the backreference (\\1) to the captured word:

library(stringr)
str_replace(my_string, "^[^_]+_[^_]+_([^_]+)_.*", "\\1") 
[1] "textimput"
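The str_remove route mentioned above is not shown, so here is a minimal sketch of it, together with a str_match variant that reads the capture group directly. The variable names (trimmed, third, third_alt) are only illustrative, and the patterns assume the string has at least three underscore-separated fields.

```r
library(stringr)

my_string <- "means_variab_textimput_x2_200.txt"

# Option 1: drop the first two "word_" fields, then drop everything
# from the next underscore to the end of the string.
trimmed <- str_remove(my_string, "^(?:[^_]+_){2}")
third   <- str_remove(trimmed, "_.*$")

# Option 2: str_match() returns a character matrix; column 1 is the
# full match and column 2 is the first capture group.
third_alt <- str_match(my_string, "^(?:[^_]+_){2}([^_]+)")[, 2]

print(third)
print(third_alt)
```

Both variants avoid the lookbehind entirely: the first deletes what surrounds the target, the second captures the target itself.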

If we need only the first four characters of the third word:

str_replace(my_string, "^[^_]+_[^_]+_([^_]{4}).*", "\\1") 
[1] "text"

In base R, it is easier to split the string with strsplit and pick the third word by indexing:

strsplit(my_string, "_")[[1]][3]
# [1] "textimput"
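If the same extraction is needed for several strings or for a different field, the split-and-index idea can be wrapped in a small function. This is only a sketch; nth_field is a hypothetical helper name, not part of base R or stringr.

```r
# Hypothetical helper: return the n-th separator-delimited field of each
# string in x (NA when a string has fewer than n fields).
nth_field <- function(x, n, sep = "_") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) p[n], character(1))
}

my_string <- "means_variab_textimput_x2_200.txt"

word   <- nth_field(my_string, 3)  # the whole third field
prefix <- substr(word, 1, 4)       # only its first four characters

print(word)
print(prefix)
```

Using fixed = TRUE treats the separator as a literal string rather than a regular expression, and substr covers the "only part of the word" case without a second regex.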

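Alternatively, still in base R, use regexpr with perl = TRUE. Here \K discards everything matched up to that point, so only the third word is returned: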

regmatches(my_string, regexpr("^([^_]+_){2}\\K[^_]+", my_string, perl = TRUE))
# [1] "textimput"

For the substring

regmatches(my_string, regexpr("^([^_]+_){2}\\K[^_]{4}", my_string, perl = TRUE))
[1] "text"
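One thing to keep in mind with regmatches: when applied to a vector, it returns only the elements that matched, so the result can be shorter than the input. A minimal sketch of keeping the positions aligned with NA, assuming a second, made-up file name that has no underscores:

```r
# The second element is a hypothetical input that does not match.
x <- c("means_variab_textimput_x2_200.txt", "no-underscores.txt")

m <- regexpr("^([^_]+_){2}\\K[^_]+", x, perl = TRUE)

# regexpr() returns -1 for non-matches; fill only the matched positions.
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)

print(out)
```

This matters mostly when the result is assigned back into a data frame column, where the lengths have to agree.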

stringr

by MR

Published October 03, 2022
