How can I use a pattern to extract 1.10E+93, 1012055, 1018598, AOL, and WDF985 from the strings vector below? I want to extract 1.10E+93 and 1012055 twice each and each of the other values once, i.e., one value per string.
# strings from which to extract values
strings <- c('/ccr/1.10E+93_ccrdt/indices/1.10E+93_ccr_ann_123.csv',
'/ccr/1.10E+93_ccrdt/indices/1.10E+93_obsrst_ann.csv',
'/ccr/1012055_obsrt/indices/1012055_obsrrt.csv',
'/ccr/1012055_obsrt/indices/1012055_ccr_ann.csv',
'/ccr/1018598_obsrt/indices/1018598_obsrrt.csv',
'/ccr/AOL_obsrt/indices/AOL_rrst.csv',
'/ccr/WDF985_obsrt/indices/WDF985_rrst.csv')
Solution:
There are lots of ways to do this (Copilot or ChatGPT might give you a good answer). Maybe you want something like this:
strings |>
## remove everything up to the last slash
stringr::str_remove("^.*/") |>
## remove underscore and everything after it
stringr::str_remove("_.*$")
[1] "1.10E+93" "1.10E+93" "1012055" "1012055" "1018598" "AOL" "WDF985"
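If you'd rather avoid the stringr dependency, the same two-step idea can be sketched in base R: basename() drops everything up to the last slash, and sub() drops the first underscore and everything after it. This is a sketch that assumes every file name has an underscore after the ID; names without one are returned unchanged.

```r
strings <- c('/ccr/1.10E+93_ccrdt/indices/1.10E+93_ccr_ann_123.csv',
             '/ccr/1.10E+93_ccrdt/indices/1.10E+93_obsrst_ann.csv',
             '/ccr/1012055_obsrt/indices/1012055_obsrrt.csv',
             '/ccr/1012055_obsrt/indices/1012055_ccr_ann.csv',
             '/ccr/1018598_obsrt/indices/1018598_obsrrt.csv',
             '/ccr/AOL_obsrt/indices/AOL_rrst.csv',
             '/ccr/WDF985_obsrt/indices/WDF985_rrst.csv')

# basename() keeps only the file name, i.e. the part after the last "/"
file_names <- basename(strings)

# sub() removes the first "_" and everything that follows it
ids <- sub("_.*$", "", file_names)
print(ids)
```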
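Or, capturing the first run of characters between a slash and an underscore (here that is the directory name, which is the same as the file-name prefix):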
stringr::str_extract(strings, "/([^/_]*)_", group = TRUE)
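The group argument of str_extract() only exists in recent stringr releases (1.5.0 or later, as far as I know). A base R sketch of the same capture-group extraction uses regexec(), which records the first match together with its capture groups, and regmatches() to pull out the matched text.

```r
strings <- c('/ccr/1.10E+93_ccrdt/indices/1.10E+93_ccr_ann_123.csv',
             '/ccr/1.10E+93_ccrdt/indices/1.10E+93_obsrst_ann.csv',
             '/ccr/1012055_obsrt/indices/1012055_obsrrt.csv',
             '/ccr/1012055_obsrt/indices/1012055_ccr_ann.csv',
             '/ccr/1018598_obsrt/indices/1018598_obsrrt.csv',
             '/ccr/AOL_obsrt/indices/AOL_rrst.csv',
             '/ccr/WDF985_obsrt/indices/WDF985_rrst.csv')

# regexec() finds the first "/...._" match and the position of group 1
matches <- regmatches(strings, regexec("/([^/_]*)_", strings))

# Each element is c(full_match, group_1); keep the captured group.
# A string with no match gives character(0), and [2] on it yields NA.
ids <- vapply(matches, function(m) m[2], character(1))
print(ids)
```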
Or (base R)
gsub("^.*/([^_]*)_.*$", "\\1", strings)
You can also use stringr::str_extract() with lookbehind/lookahead assertions in the regular expression. These are useful for requirements of the form "extract all characters between [A] and [B], but don't include [A] or [B] in the result".
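As a sketch, the capture-group pattern above can be rewritten with a lookbehind for "/" and a lookahead for "_", so no group is needed. This version uses base R regexpr() with perl = TRUE, because the default regex engine does not support lookarounds; the same pattern should also work as stringr::str_extract(strings, "(?<=/)[^/_]+(?=_)"), since stringr's ICU engine supports lookarounds too.

```r
strings <- c('/ccr/1.10E+93_ccrdt/indices/1.10E+93_ccr_ann_123.csv',
             '/ccr/1.10E+93_ccrdt/indices/1.10E+93_obsrst_ann.csv',
             '/ccr/1012055_obsrt/indices/1012055_obsrrt.csv',
             '/ccr/1012055_obsrt/indices/1012055_ccr_ann.csv',
             '/ccr/1018598_obsrt/indices/1018598_obsrrt.csv',
             '/ccr/AOL_obsrt/indices/AOL_rrst.csv',
             '/ccr/WDF985_obsrt/indices/WDF985_rrst.csv')

# (?<=/)  : match must be preceded by "/", which is not included in the result
# [^/_]+  : one or more characters that are neither "/" nor "_"
# (?=_)   : match must be followed by "_", which is also not included
pattern <- "(?<=/)[^/_]+(?=_)"

# regexpr() returns the first match per string; regmatches() extracts it
ids <- regmatches(strings, regexpr(pattern, strings, perl = TRUE))
print(ids)
```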