How to select the columns of a dataframe based on a vector of strings, matching for exact coincidence?

I have a dataframe with the following column names:

NewYork_10
NewYork_20
NewYork3_10
NewYork3_20
NewYork4_10
NewYork4_20
HongKong_10
HongKong_20
SanFrancisco_10
SanFrancisco_20

And I have a vector:

list_cities <- c("NewYork", "SanFrancisco")

I want a script that creates a new dataframe containing only the columns whose name before the underscore exactly matches one of the strings in the vector.
In the example above, the new dataframe would have the following columns:
NewYork_10
NewYork_20
SanFrancisco_10
SanFrancisco_20

I tried several regex-based approaches:

dplyr::select(matches(list_cities))

dplyr::select(matches(paste0(list_cities, "_")))

And even using anchors for a vector, which I’m not sure is possible.

dplyr::select(matches(paste0("^",list_cities, "_.*")))

But in every case it selects every column whose name starts with one of the given strings (for example, NewYork3_10), not only the exact matches.

Solution:

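We can use matches() with a single anchored pattern. Put the alternatives in one group so that the ^ anchor and the trailing underscore apply to every city name:

df %>%
    select(matches("^(NewYork|SanFrancisco)_"))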
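Rather than typing the city names into the regular expression by hand, the pattern can be assembled from the vector. The following is a minimal sketch in base R, assuming the vector is called list_cities as in the question's attempts; the dataframe contents are dummy values made up for illustration, and only the column names come from the question.

```r
# Column names from the question; the cell values are placeholders.
cols <- c("NewYork_10", "NewYork_20", "NewYork3_10", "NewYork3_20",
          "NewYork4_10", "NewYork4_20", "HongKong_10", "HongKong_20",
          "SanFrancisco_10", "SanFrancisco_20")
df <- as.data.frame(matrix(0, nrow = 2, ncol = length(cols)))
names(df) <- cols

list_cities <- c("NewYork", "SanFrancisco")

# Build "^(NewYork|SanFrancisco)_": the anchor and the underscore sit
# outside the group, so they apply to every alternative.
pattern <- paste0("^(", paste(list_cities, collapse = "|"), ")_")

# Keep only the columns whose name matches the pattern.
selected <- df[, grepl(pattern, names(df)), drop = FALSE]
print(names(selected))
```

The same pattern string can be passed to dplyr as select(matches(pattern)). This sketch assumes the city names contain no regex metacharacters (such as . or +); names that do would need escaping before being pasted into the pattern.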