I have a dataframe with the following column names:
NewYork_10
NewYork_20
NewYork3_10
NewYork3_20
NewYork4_10
NewYork4_20
HongKong_10
HongKong_20
SanFrancisco_10
SanFrancisco_20
And I have a vector:
list_cities <- c("NewYork", "SanFrancisco")
I want a script that creates a new dataframe, selecting only those columns whose name before the underscore exactly matches one of the strings in the vector.
In the example above, the new dataframe would have the following columns:
NewYork_10
NewYork_20
SanFrancisco_10
SanFrancisco_20
I made several attempts using regex matching:
dplyr::select(matches(list_cities))
dplyr::select(matches(paste0(list_cities), "_"))
I even tried using anchors with a vector, which I’m not sure is possible:
dplyr::select(matches(paste0("^",list_cities, "_.*")))
But in every case it captures all columns whose names start with one of the strings in the vector, not just the exact matches.
>Solution :
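We can also use matches, anchoring the pattern at the start and grouping the alternatives so that the underscore must follow either name:
library(dplyr)
df %>%
  select(matches("^(NewYork|SanFrancisco)_"))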
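To avoid hard-coding the city names, the pattern can be built from the vector itself. The following is a minimal sketch, assuming the vector is called list_cities as in the attempts above; the data frame df and its dummy values are made up here purely for illustration.

```r
library(dplyr)

# Hypothetical data frame using the column names from the question
col_names <- c("NewYork_10", "NewYork_20", "NewYork3_10", "NewYork3_20",
               "NewYork4_10", "NewYork4_20", "HongKong_10", "HongKong_20",
               "SanFrancisco_10", "SanFrancisco_20")
df <- as.data.frame(matrix(0, nrow = 2, ncol = length(col_names),
                           dimnames = list(NULL, col_names)))

list_cities <- c("NewYork", "SanFrancisco")

# Build "^(NewYork|SanFrancisco)_": anchored at the start, with the
# alternatives grouped so the underscore must follow whichever name matched
pattern <- paste0("^(", paste(list_cities, collapse = "|"), ")_")

result <- df %>% select(matches(pattern))
names(result)
```

Without the grouping parentheses, the alternation splits the whole pattern, so "NewYork" on its own would match NewYork3_10 as well. If the names in the vector could contain regex metacharacters, they would need escaping before being pasted into the pattern. The same pattern also works in base R with df[, grepl(pattern, names(df)), drop = FALSE].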