Building off this previous post:
How to extract string after 2nd delimiter in R
Have a string like the following:
"dat1/set1/set1_covars.csv"
And want to extract everything up to and including the second /, like:
"dat1/set1/"
I was using variations of:
sub("^([^/]+/){2}", "", "dat1/set1/set1_covars.csv")
With ^ and .* moved around in different places, but just can’t seem to get the syntax right.
Any help would be appreciated.
>Solution :
This seems to work:
sub("(^([^/]+/){2}).*$", "\\1", "dat1/set1/set1_covars.csv")
- add () around the stuff that defines the stuff up to the second delimiter;
- add .*$ to include the rest of the line;
- replace the blank-string replacement with a replacement by the first capture group.
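If you need this for a different number of delimiters, the same pattern can be built programmatically. This is only a sketch: the helper name keep_through_nth and its arguments are made up for illustration, and it assumes the delimiter is a single character with no special meaning inside a [^...] character class.

```r
# Hypothetical helper (not from the question): keep everything up to and
# including the n-th delimiter, using the capture-group approach above.
keep_through_nth <- function(x, n = 2, delim = "/") {
  # Builds e.g. "(^([^/]+/){2}).*$" for n = 2 and delim = "/"
  pattern <- sprintf("(^([^%s]+%s){%d}).*$", delim, delim, n)
  sub(pattern, "\\1", x)
}

keep_through_nth("dat1/set1/set1_covars.csv")   # "dat1/set1/"
keep_through_nth("a/b/c/d.txt", n = 3)          # "a/b/c/"
```

Note that sub() returns the input unchanged when the pattern does not match, so a string with fewer than n delimiters comes back as is.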
@GregorThomas points out that for this example dirname() would work, but not if your tree is deeper.
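A quick illustration of that caveat (the deeper path is made up for the example):

```r
# dirname() strips only the last path component
shallow <- dirname("dat1/set1/set1_covars.csv")
deep    <- dirname("dat1/set1/sub1/set1_covars.csv")

shallow  # "dat1/set1"       (note: no trailing slash)
deep     # "dat1/set1/sub1"  (not just the first two levels)
```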
Alternatively:
stringr::str_extract("dat1/set1/set1_covars.csv", "^([^/]+/){2}")
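If you would rather avoid the stringr dependency, one base-R way to extract the match (rather than delete the remainder) is regexpr() with regmatches(). A minimal sketch:

```r
x <- "dat1/set1/set1_covars.csv"

# regexpr() locates the first match; regmatches() pulls out the matched text
m <- regexpr("^([^/]+/){2}", x)
res <- regmatches(x, m)
res  # "dat1/set1/"
```

One difference to keep in mind on vectors: regmatches() drops elements that don't match, whereas str_extract() returns NA for them.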
It seemed as though you could also do this with a lookbehind expression (i.e., define a pattern "(?<=^([^/]+/){2}).*$" that says ".*$ preceded by two delimiters, but don’t count delimiter stuff in the matched expression") and replacing with a blank, but we run into trouble:
- repetition quantifiers (i.e. "{2}") aren’t allowed in lookbehind expressions
- if we spell out the repetition explicitly ("(?<=^[^/]+/[^/]+/).*$") and specify perl = TRUE then it notes we’re only allowed to use fixed-length expressions
- lookahead/lookbehind always hurts my brain anyway
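One possible way around the fixed-length restriction, which I'm adding as a sketch rather than as something from the discussion above, is PCRE's \K. It tells the engine to drop everything matched so far from the reported match, so only the part after it gets replaced. This needs perl = TRUE:

```r
x <- "dat1/set1/set1_covars.csv"

# Match the first two "segment/" pieces, then \K resets the match start,
# so only the remainder (.*$) is replaced with the empty string.
res_k <- sub("^(?:[^/]+/){2}\\K.*$", "", x, perl = TRUE)
res_k
```

This should give the same "dat1/set1/" as the capture-group version; whether it is any easier on the brain than lookbehind is debatable.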