Pattern matching character strings when everything but the last two characters of the string is the same

I have the following vector.

column_names <- c("6Li", "7Li", "10B", "11B", "7Li.1",
                  "205Pb", "206Pb", "207Pb", "238U",
                  "206Pb.1", "238U.1")

Notice that some of the values are duplicates of other values with a ".1" appended. I want to extract each of these strings together with the original string it duplicates, so that only the following are returned.

#[1] "7Li"     "7Li.1"   "206Pb"   "238U"    "206Pb.1" "238U.1" 

Assume you don’t know the index positions, so you cannot simply index these values out with column_names[c(2,5,7,9,10,11)]. How can I use pattern matching to extract these values?

>Solution :

There is likely a more elegant solution, but in base R you could try a combination of grep/gsub and paste:

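# Base names of the entries that end in ".1" ("$" anchors the match to the end)
dups <- gsub("\\.1$", "", grep("\\.1$", column_names, value = TRUE))
# Match each base name exactly, with or without the ".1" suffix
idx <- grep(paste0("^(", paste(dups, collapse = "|"), ")(\\.1)?$"), column_names)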
# [1]  2  5  7  9 10 11

column_names[idx]
# [1] "7Li"     "7Li.1"   "206Pb"   "238U"    "206Pb.1" "238U.1" 
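
As an alternative, here is a minimal sketch that avoids building a regular expression out of the data: strip the trailing ".1" from every name, then keep each name whose stripped form also occurs with the suffix. The names base_names, dup_bases, keep and result are only illustrative, and the sketch assumes the duplicate suffix is always exactly ".1".

```r
column_names <- c("6Li", "7Li", "10B", "11B", "7Li.1",
                  "205Pb", "206Pb", "207Pb", "238U",
                  "206Pb.1", "238U.1")

# Remove a trailing ".1" from every name (names without it are unchanged)
base_names <- sub("\\.1$", "", column_names)

# Base names that appear at least once with the ".1" suffix
dup_bases <- base_names[grepl("\\.1$", column_names)]

# Keep every entry whose base name is one of those duplicated bases
keep <- base_names %in% dup_bases
result <- column_names[keep]
result
# [1] "7Li"     "7Li.1"   "206Pb"   "238U"    "206Pb.1" "238U.1"
```

Because %in% compares whole strings, a name such as "17Li" would not be picked up just because it contains "7Li". If the suffix could also be ".2", ".3" and so on, the pattern "\\.[0-9]+$" might be used in both calls instead, though that is an assumption beyond the original question.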
