How can I find the common separator in a character vector of two-part words, given that the separator always matches "[^[:alnum:]]+"?
For example, in the first vector, the common separator is ".", and in the second vector, the common separator is "_".
Is it possible to have a function that accepts a vector like first or second, and outputs "." or "_"?
first = c("L2DF.L2DA", "L2G.L2DA", "L2L.L2DA", "L2M.L2DA", "L2P.L2DA",
"L2V.L2DA", "L2G.L2DF", "L2L.L2DF", "L2M.L2DF", "L2P.L2DF", "L2V.L2DF",
"L2L.L2G", "L2M.L2G", "L2P.L2G", "L2M.L2L", "L2P.L2L", "L2P.L2M",
"L2R.L2DA", "L2R.L2DF", "L2R.L2G", "L2R.L2L", "L2R.L2M", "L2R.L2P",
"L2V.L2R", "L2V.L2G", "L2V.L2L", "L2V.L2M", "L2V.L2P")
second = c("L2DF_L2DA", "L2G_L2DA", "L2L_L2DA", "L2M_L2DA", "L2P_L2DA",
"L2V_L2DA", "L2G_L2DF", "L2L_L2DF", "L2M_L2DF", "L2P_L2DF", "L2V_L2DF",
"L2L_L2G", "L2M_L2G", "L2P_L2G", "L2M_L2L", "L2P_L2L", "L2P_L2M",
"L2R_L2DA", "L2R_L2DF", "L2R_L2G", "L2R_L2L", "L2R_L2M", "L2R_L2P",
"L2V_L2R", "L2V_L2G", "L2V_L2L", "L2V_L2M", "L2V_L2P")
>Solution :
You could have something like this:
sep_extract <- \(s) stringr::str_extract_all(s, "[^[:alnum:]]") |> unlist() |> unique()
# or using base R (strip the alphanumerics and keep whatever is left):
sep_extract <- \(s) gsub("[[:alnum:]]", "", s) |> unique()
sep_extract(first) # [1] "."
sep_extract(second) # [1] "_"
Notes:
- This will only work if you know the only non-alphanumerics in your strings are separators. If that’s not the case, you would have to specify which is which, or use a more complicated regex.
- You can remove the + from the regex if you use str_extract_all(), as it matches each non-alphanumeric character individually, so a repeated separator character is picked up regardless.
- If you’d prefer to keep each combination as its own thing, you can remove unlist().
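As a rough sketch of the "more complicated regex" mentioned in the first note: the version below only captures the run of non-alphanumerics that sits between two alphanumeric parts, and stops if the elements do not all agree. The name common_sep and the error messages are made up for illustration, and it assumes every element has exactly two parts.

```r
# Hypothetical helper (not from the answer above): capture only the run of
# non-alphanumerics between two alphanumeric parts, then require that every
# element of the vector uses the same one.
common_sep <- function(s) {
  pattern <- "^[[:alnum:]]+([^[:alnum:]]+)[[:alnum:]]+$"

  # Reject anything that is not "part<separator>part"
  if (!all(grepl(pattern, s))) {
    stop("some elements are not two alphanumeric parts joined by a separator")
  }

  # Keep only the captured separator, then de-duplicate
  seps <- unique(sub(pattern, "\\1", s))
  if (length(seps) != 1L) {
    stop("no single common separator: ", paste(seps, collapse = " "))
  }
  seps
}

common_sep(c("L2DF.L2DA", "L2G.L2DA")) # [1] "."
common_sep(c("L2DF_L2DA", "L2G_L2DA")) # [1] "_"
```

Unlike sep_extract(), this returns a multi-character separator such as "--" as a single string, and it raises an error instead of returning several candidates when the vector mixes separators.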