
Multiple patterns matching with a column by dplyr

I’d like to create a column derived from a character-typed column. I have a set of patterns: rows whose value contains any of them should be accepted, and all other rows should not.

Here is what I tried:

library(dplyr)

set.seed(1)

index <- sample(1:nrow(iris),10)

iris2 <- iris[index,]

required_cols <- c('ersicol','inic')

iris2 %>% 
mutate(logical_column = case_when(any(sapply(required_cols,grepl,x = Species)) ~ 'WORKED',
                                  TRUE ~ 'NOT_WORKED'))

In this case, every row of logical_column is marked ‘WORKED’, but only observations whose Species contains the ‘ersicol’ or ‘inic’ pattern should be marked ‘WORKED’.
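The reason every row ends up ‘WORKED’ can be seen by evaluating the case_when() condition on its own, outside mutate(). This sketch rebuilds iris2 as in the question; the names match_matrix and the use of rowSums() are illustrative additions, not part of the original post.

```r
set.seed(1)
iris2 <- iris[sample(1:nrow(iris), 10), ]
required_cols <- c('ersicol', 'inic')

# sapply() calls grepl() once per pattern, giving a 10 x 2 logical matrix:
# one row per observation, one column per pattern
match_matrix <- sapply(required_cols, grepl, x = iris2$Species)
dim(match_matrix)

# any() collapses the whole matrix to a single TRUE, which case_when()
# then recycles to every row
any(match_matrix)

# rowSums() > 0 instead keeps one TRUE/FALSE per observation
rowSums(match_matrix) > 0
```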

The desired output is:

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    logical_column
          <dbl>       <dbl>        <dbl>       <dbl> <fct>      <chr>         
 1          5.8         2.7          4.1         1   versicolor WORKED        
 2          6.4         2.8          5.6         2.1 virginica  WORKED        
 3          4.4         3.2          1.3         0.2 setosa     NOT_WORKED        
 4          4.3         3            1.1         0.1 setosa     NOT_WORKED        
 5          7           3.2          4.7         1.4 versicolor WORKED        
 6          5.4         3            4.5         1.5 versicolor WORKED        
 7          5.4         3.4          1.7         0.2 setosa     NOT_WORKED        
 8          7.6         3            6.6         2.1 virginica  WORKED        
 9          6.1         2.8          4.7         1.2 versicolor WORKED        
10          4.6         3.4          1.4         0.3 setosa     NOT_WORKED    

Thanks in advance.

Solution:

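The any() is the problem here. Without grouping, grepl() runs on the whole Species column, and any() collapses all of those results into a single TRUE, which case_when() then applies to every row. If we want to keep the OP’s code, use rowwise so that any() is evaluated one row at a time: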

library(dplyr)
iris2 %>%
    rowwise() %>%
    mutate(logical_column = case_when(
        any(sapply(required_cols, grepl, x = Species)) ~ 'WORKED',
        TRUE ~ 'NOT_WORKED')) %>%
    ungroup()

Output:

# A tibble: 10 × 6
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    logical_column
          <dbl>       <dbl>        <dbl>       <dbl> <fct>      <chr>         
 1          5.8         2.7          4.1         1   versicolor WORKED        
 2          6.4         2.8          5.6         2.1 virginica  WORKED        
 3          4.4         3.2          1.3         0.2 setosa     NOT_WORKED    
 4          4.3         3            1.1         0.1 setosa     NOT_WORKED    
 5          7           3.2          4.7         1.4 versicolor WORKED        
 6          5.4         3            4.5         1.5 versicolor WORKED        
 7          5.4         3.4          1.7         0.2 setosa     NOT_WORKED    
 8          7.6         3            6.6         2.1 virginica  WORKED        
 9          6.1         2.8          4.7         1.2 versicolor WORKED        
10          4.6         3.4          1.4         0.3 setosa     NOT_WORKED  
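rowwise() evaluates the expression once per row, which can get slow on larger data. If you want to keep the OP’s grepl() idea without rowwise(), one option (a swapped-in technique, not the original answer’s) is to combine the per-pattern results with Reduce() and the elementwise |, so the condition has exactly one TRUE/FALSE per row. A minimal sketch, rebuilding iris2 as in the question; the variable name result is illustrative:

```r
library(dplyr)

set.seed(1)
iris2 <- iris[sample(1:nrow(iris), 10), ]
required_cols <- c('ersicol', 'inic')

# lapply() gives one logical vector per pattern; Reduce(`|`, ...) ORs them
# elementwise, so each row is TRUE if it matches at least one pattern
result <- iris2 %>%
  mutate(logical_column = case_when(
    Reduce(`|`, lapply(required_cols, grepl, x = Species)) ~ 'WORKED',
    TRUE ~ 'NOT_WORKED'))

result
```

Unlike sapply() followed by rowSums(), this also works on a one-row data frame, because lapply() never simplifies its result to a plain vector.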

It may be more efficient to use vectorized options: paste (str_c) the ‘required_cols’ into a single regex alternation (collapse = "|"), use str_detect to check whether any of the substrings is present, convert the resulting logical to a numeric index (+1 turns FALSE/TRUE into 1/2), and use that index to pick from a replacement vector.

library(stringr)
iris2 %>% 
   mutate(logical_column = c("NOT_WORKED", "WORKED")[
     1 + str_detect(Species, str_c(required_cols, collapse = "|"))])

Output:

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species logical_column
68           5.8         2.7          4.1         1.0 versicolor         WORKED
129          6.4         2.8          5.6         2.1  virginica         WORKED
43           4.4         3.2          1.3         0.2     setosa     NOT_WORKED
14           4.3         3.0          1.1         0.1     setosa     NOT_WORKED
51           7.0         3.2          4.7         1.4 versicolor         WORKED
85           5.4         3.0          4.5         1.5 versicolor         WORKED
21           5.4         3.4          1.7         0.2     setosa     NOT_WORKED
106          7.6         3.0          6.6         2.1  virginica         WORKED
74           6.1         2.8          4.7         1.2 versicolor         WORKED
7            4.6         3.4          1.4         0.3     setosa     NOT_WORKED
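To make each step of the vectorized version visible, here it is applied to the three species names directly rather than to iris2. The intermediate names pattern, hits and status are illustrative only:

```r
library(stringr)

required_cols <- c('ersicol', 'inic')

# str_c(..., collapse = "|") joins the patterns into one regex alternation
pattern <- str_c(required_cols, collapse = "|")
pattern   # "ersicol|inic"

# str_detect() returns one logical per element of its input
hits <- str_detect(c("versicolor", "virginica", "setosa"), pattern)
hits      # TRUE TRUE FALSE

# 1 + hits coerces FALSE/TRUE to 1/2, which indexes the label vector
status <- c("NOT_WORKED", "WORKED")[1 + hits]
status    # "WORKED" "WORKED" "NOT_WORKED"
```

If the column can contain NA, str_detect() returns NA for those elements and the indexed result is NA as well.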