Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: How to filter rows where the start of entries match a list

I have a very large dataframe (a snippet of it below) that I would like to filter. I would only like to keep the rows if at least one entry in the columns that start with "i10_pr" start with any values I have in list.

Here is an example of my dataframe and the list

#>     ID      visitorder i10_pr1 i10_pr2 i10_pr3 i10_pr4 i10_pr5 sum
#> 1   7466851          3 BW28ZZZ BR30Y0Z BR39Y0Z                   0
#> 2   7023336          1 0BDC8ZX 0BDC8W7 07D78ZX                   0
#> 3   2481935          3 5A09357 3C1ZX8Z 06HN33Z B54CZZA 0W993ZX   1
#> 4   4605446          1 5A1955Z 0BH17EZ 03HY32Z 02HV33Z GZ58ZZZ   3
#> 5   7287173          2 

list <- c("0B", "ND", "3J", "BW", "BR")                                           0

The output would look like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

#>     ID      visitorder i10_pr1 i10_pr2 i10_pr3 i10_pr4 i10_pr5 sum
#> 1   7466851          3 BW28ZZZ BR30Y0Z BR39Y0Z                   0
#> 2   7023336          1 0BDC8ZX 0BDC8W7 07D78ZX                   0
#> 4   4605446          1 5A1955Z 0BH17EZ 03HY32Z 02HV33Z GZ58ZZZ   3

>Solution :

We may create a regex pattern to filter the columns that starts_with 'i10_pr' using if_any to find rows having at least one match

library(stringr)
library(dplyr)
v1 <- c("0B", "ND", "3J", "BW", "BR")
pat <- sprintf('^(%s)', str_c(v1, collapse = "|"))
df1 %>%
    filter(if_any(starts_with("i10_pr"), ~ str_detect(.x, pat)))

-output

       ID visitorder i10_pr1 i10_pr2 i10_pr3 i10_pr4 i10_pr5 sum
1 7466851          3 BW28ZZZ BR30Y0Z BR39Y0Z    <NA>    <NA>   0
2 7023336          1 0BDC8ZX 0BDC8W7 07D78ZX    <NA>    <NA>   0
4 4605446          1 5A1955Z 0BH17EZ 03HY32Z 02HV33Z GZ58ZZZ   3

Or extract the first two characters and use %in%

df1 %>% 
    filter(if_any(starts_with("i10_pr"), ~ substr(.x, 1, 2) %in% v1))

-output

       ID visitorder i10_pr1 i10_pr2 i10_pr3 i10_pr4 i10_pr5 sum
1 7466851          3 BW28ZZZ BR30Y0Z BR39Y0Z    <NA>    <NA>   0
2 7023336          1 0BDC8ZX 0BDC8W7 07D78ZX    <NA>    <NA>   0
4 4605446          1 5A1955Z 0BH17EZ 03HY32Z 02HV33Z GZ58ZZZ   3

data

df1 <- structure(list(ID = c(7466851L, 7023336L, 2481935L, 4605446L), 
    visitorder = c(3L, 1L, 3L, 1L), i10_pr1 = c("BW28ZZZ", "0BDC8ZX", 
    "5A09357", "5A1955Z"), i10_pr2 = c("BR30Y0Z", "0BDC8W7", 
    "3C1ZX8Z", "0BH17EZ"), i10_pr3 = c("BR39Y0Z", "07D78ZX", 
    "06HN33Z", "03HY32Z"), i10_pr4 = c(NA, NA, "B54CZZA", "02HV33Z"
    ), i10_pr5 = c(NA, NA, "0W993ZX", "GZ58ZZZ"), sum = c(0L, 
    0L, 1L, 3L)), class = "data.frame", row.names = c("1", "2", 
"3", "4"))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading