Use case_when() in R with multiple conditional rules and multiple columns

I need to create a new column (inside_class) that classifies the rows of a data.frame according to specific rules, using two columns as a reference.

I have a column with several parameters (parameter) and another with values (value).

The rule is:

If parameter is pH and value is >= 6 and <= 9, then inside_class = yes; if not, then inside_class = no.

If parameter is DO and value is >= 5.0, then inside_class = yes; if not, then inside_class = no.

I tried the code below, but some pH values are not classified according to the rule.

dput output:

df<-structure(list(Estacao2 = c("1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1", "1", "1", "10", "10"), parameter = c("pH", 
"DO", "pH", "DO", "pH", "DO", "pH", "DO", "pH", "DO", "pH", "DO", 
"pH", "DO", "pH", "DO", "pH", "DO", "pH", "DO", "pH", "DO", "pH", 
"DO", "pH", "DO", "pH", "DO", "pH", "DO"), value = c(4.475, 7.2, 
5.65, 5.15, 6.65, 6.425, 6.4, 6.56, 6.05, 5.533, 5.75, 5.825, 
5.625, 6.25, 5.833, 6.2, 5.35, 4.3, 5.867, 5.8, 5.375, 7.4, 5.6, 
6.45, 5.55, 6.625, 6.033, 7.667, 7.438, 7.312)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -30L), groups = structure(list(
    Estacao2 = c("1", "10"), .rows = structure(list(1:28, 29:30), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

code:

df2<-df%>%mutate(inside_class = case_when(
    (parameter=='pH'& value %in% c(6.00:9.00) ~ 'yes'),
    (parameter=='DO' & value>=5.0 ~'yes'),
    TRUE~'no'
  ))


Solution:

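Note that c(6.00:9.00) is the same as c(6:9), which is c(6, 7, 8, 9). The %in% operator tests for exact membership in that set, so a non-integer pH such as 6.65 never matches, even though it lies between 6 and 9. For an inclusive range test you could use dplyr::between, like this: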

library(dplyr)

df %>% 
  mutate(inside_class = case_when(
    parameter == 'pH' & between(value, 6, 9) ~ 'yes',
    parameter == 'DO' & value >= 5.0 ~ 'yes',
    TRUE ~ 'no'
  ))

# A tibble: 30 × 4
# Groups:   Estacao2 [2]
   Estacao2 parameter value inside_class
   <chr>    <chr>     <dbl> <chr>       
 1 1        pH         4.47 no          
 2 1        DO         7.2  yes         
 3 1        pH         5.65 no          
 4 1        DO         5.15 yes         
 5 1        pH         6.65 yes         
 6 1        DO         6.42 yes         
 7 1        pH         6.4  yes         
 8 1        DO         6.56 yes         
 9 1        pH         6.05 yes         
10 1        DO         5.53 yes         
# … with 20 more rows
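
To see why the original %in% test missed values, here is a minimal base R sketch. The vector ph is made up for illustration and is not taken from the question's data; the last comparison is the plain-R equivalent of what between(value, 6, 9) checks.

```r
# Toy values chosen for illustration; they are not rows from the question's data.
ph <- c(5.65, 6, 6.65, 7, 9, 9.2)

# 6.00:9.00 is the same as 6:9, i.e. the four integers 6, 7, 8, 9.
allowed <- 6.00:9.00
print(allowed)

# %in% checks exact membership, so only the whole numbers 6, 7 and 9 match.
in_set <- ph %in% allowed
print(in_set)

# An explicit, inclusive range test also catches 6.65.
in_range <- ph >= 6 & ph <= 9
print(in_range)
```

Only the range test marks 6.65 as inside the interval, which is the behaviour the pH rule asks for.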