Extract everything before second delimiter in R

July 11, 2023

Building off this previous post:
How to extract string after 2nd delimiter in R

I have a string like the following:

"dat1/set1/set1_covars.csv"

And I want to extract everything up to and including the second /, like:

"dat1/set1/"

I was using variations of:

sub("^([^/]+/){2}", "", "dat1/set1/set1_covars.csv")

with ^ and .* moved around to different places, but I just can’t seem to get the syntax right.

Any help would be appreciated.

Solution:

This seems to work:

sub("(^([^/]+/){2}).*$", "\\1", "dat1/set1/set1_covars.csv")

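Compared with your attempt, this makes three changes:
add () around the part of the pattern that matches everything up to and including the second delimiter, so that it becomes the first capture group;
add .*$ so that the match also covers the rest of the line;
replace the blank-string replacement with the first capture group ("\\1").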
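
To make the difference concrete, here is a small sketch that runs both patterns side by side. The variable names (x, stripped, kept) and the second and third input strings are made up for illustration; only the first string comes from the question.

```r
# Example inputs; only the first one is from the question,
# the other two are hypothetical.
x <- c("dat1/set1/set1_covars.csv",
       "dat2/set9/sub/deep/file.csv",
       "no_delimiters.csv")

# Original attempt: the match (the first two fields) is replaced by "",
# so only the remainder of the string survives.
stripped <- sub("^([^/]+/){2}", "", x)

# Capture-group version: the whole string is matched, and only
# group 1 (the first two fields) is put back.
kept <- sub("(^([^/]+/){2}).*$", "\\1", x)

print(stripped)
print(kept)
```

Note that sub() returns its input unchanged when the pattern does not match, so a string with fewer than two delimiters passes through as-is in both cases.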

@GregorThomas points out that for this example dirname() would work (it returns "dat1/set1", without the trailing /), but not if your tree is deeper, because it only strips the last path component.
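
A quick sketch of where dirname() and the regex diverge. The deeper path is a hypothetical example, not one from the question, and the variable names are illustrative.

```r
shallow <- "dat1/set1/set1_covars.csv"
deep    <- "dat1/set1/sub/set1_covars.csv"  # hypothetical deeper path

# dirname() removes only the final path component
# and does not keep a trailing "/".
dir_shallow <- dirname(shallow)
dir_deep    <- dirname(deep)

# The regex keeps the first two components regardless of depth.
two_shallow <- sub("(^([^/]+/){2}).*$", "\\1", shallow)
two_deep    <- sub("(^([^/]+/){2}).*$", "\\1", deep)

print(c(dir_shallow, dir_deep))
print(c(two_shallow, two_deep))
```

For the shallow path the two approaches differ only by the trailing slash; for the deeper path dirname() keeps the extra directory level, which is not what the question asks for.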

Alternatively:

stringr::str_extract("dat1/set1/set1_covars.csv", "^([^/]+/){2}")
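
If you would rather avoid the stringr dependency, a rough base R equivalent can be sketched with regexpr() and regmatches(). One difference to be aware of: str_extract() returns NA for inputs that do not match, whereas regmatches() silently drops them. The second input string and the variable names below are illustrative.

```r
x   <- c("dat1/set1/set1_covars.csv", "no_delimiters.csv")  # 2nd is hypothetical
pat <- "^([^/]+/){2}"

m       <- regexpr(pat, x)   # match position, or -1 where there is no match
matched <- regmatches(x, m)  # non-matching elements are dropped

# To keep one result per input, fill in NA for the non-matches by hand.
hit <- as.integer(m) != -1L
aligned <- rep(NA_character_, length(x))
aligned[hit] <- matched

print(matched)
print(aligned)
```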

It seemed as though you could also do this with a lookbehind expression, i.e. define a pattern "(?<=^([^/]+/){2}).*$" that says ".*$ preceded by two delimiters, but don’t count the delimiter stuff in the matched expression", and replace the match with a blank. But we run into trouble:

repetition quantifiers (i.e. "{2}") aren’t allowed in lookbehind expressions
if we spell out the repetition explicitly ("(?<=^[^/]+/[^/]+/).*$") and specify perl = TRUE, R complains that the lookbehind is not fixed-length (each [^/]+ can match any number of characters)
lookahead/lookbehind always hurts my brain anyway
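
One way to get a lookbehind-like effect without an actual lookbehind is PCRE's \K escape, which drops everything matched so far from the reported match. This is a different technique from the ones above, offered only as a sketch; it needs perl = TRUE, and the variable names are illustrative.

```r
x <- "dat1/set1/set1_covars.csv"

# "^([^/]+/){2}" consumes the first two fields, then \K resets the start
# of the match, so only the ".*$" part is replaced by "".
res <- sub("^([^/]+/){2}\\K.*$", "", x, perl = TRUE)

print(res)
```

Because \K is not a lookbehind, the part before it may be variable-length, which sidesteps the fixed-length restriction.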