Cannot get conditional case_when() to work applied to new variable created with `!!` in mutate

As you may have surmised, I struggled to describe this problem. I want to create a new variable (including its name) by summing across a subset of existing columns, identified by a short string within their names, and THEN derive a new conditional variable from the newly created one. As always, it is easier to show with an example. Here is some toy data: seven columns in total, one ID variable, then three columns for a cannabis measure (identified by the letters ‘cann’ in the centre of the column name), then three for an alcohol measure (identified by the letters ‘alc’ in the same position).

set.seed(1)

d <- data.frame(id = letters[1:10],
                q1_cann_a = round(rnorm(10),1),
                q1_cann_b = round(rnorm(10),1),
                q1_cann_c = round(rnorm(10),1),
                q1_alc_a = round(rnorm(10),1),
                q1_alc_b = round(rnorm(10),1),
                q1_alc_c = round(rnorm(10),1))

d

# output
#    id q1_cann_a q1_cann_b q1_cann_c q1_alc_a q1_alc_b q1_alc_c
# 1   a      -0.6       1.5       0.9      1.4     -0.2      0.4
# 2   b       0.2       0.4       0.8     -0.1     -0.3     -0.6
# 3   c      -0.8      -0.6       0.1      0.4      0.7      0.3
# 4   d       1.6      -2.2      -2.0     -0.1      0.6     -1.1
# 5   e       0.3       1.1       0.6     -1.4     -0.7      1.4
# 6   f      -0.8       0.0      -0.1     -0.4     -0.7      2.0
# 7   g       0.5       0.0      -0.2     -0.4      0.4     -0.4
# 8   h       0.7       0.9      -1.5     -0.1      0.8     -1.0
# 9   i       0.6       0.8      -0.5      1.1     -0.1      0.6
# 10  j      -0.3       0.6       0.4      0.8      0.9     -0.1

Now say I want to calculate the sum of the three cannabis columns. I wrote a function that takes the string at the centre of each set of three variable names and creates a new variable whose name is that string with "_total" pasted on the end. THAT part I can do. The next step, which I cannot make work, is to use that newly created variable to create a new conditional variable: if the sum of the three variables is > 0 the element is "positive", otherwise it is "negative".

sumFunct <- function(data, drug) {
data %>%
  rowwise %>%
     mutate(!!paste0(drug, "_total") := sum(c_across(contains(drug))),
            !!paste0(drug, "_any") := factor(case_when(!!paste0(drug, "_total") > 0 ~ "positive",
                                                       TRUE ~ "negative"),
                                             levels = c("negative",
                                                        "positive")))
}

sumFunct(d, "cann")

# A tibble: 10 × 9
# Rowwise: 
#   id    q1_cann_a q1_cann_b q1_cann_c q1_alc_a q1_alc_b q1_alc_c cann_total cann_any
#   <chr>     <dbl>     <dbl>     <dbl>    <dbl>    <dbl>    <dbl>      <dbl> <fct>   
# 1 a          -0.6       1.5       0.9      1.4     -0.2      0.4        1.8 positive
# 2 b           0.2       0.4       0.8     -0.1     -0.3     -0.6        1.4 positive
# 3 c          -0.8      -0.6       0.1      0.4      0.7      0.3       -1.3 positive
# 4 d           1.6      -2.2      -2       -0.1      0.6     -1.1       -2.6 positive
# 5 e           0.3       1.1       0.6     -1.4     -0.7      1.4        2   positive
# 6 f          -0.8       0        -0.1     -0.4     -0.7      2         -0.9 positive
# 7 g           0.5       0        -0.2     -0.4      0.4     -0.4        0.3 positive
# 8 h           0.7       0.9      -1.5     -0.1      0.8     -1          0.1 positive
# 9 i           0.6       0.8      -0.5      1.1     -0.1      0.6        0.9 positive
# 10 j         -0.3       0.6       0.4      0.8      0.9     -0.1        0.7 positive

As you can see, the first part worked fine and the conditional variable got the right name, but the conditional itself failed: every row is "positive", even where the total is negative. I’m pretty sure it has something to do with restating the first new variable on the right-hand side of the := when calculating the second variable, but I don’t know how to fix it. I have terrible trouble with tidyeval stuff, so any help is much appreciated. I’d also take advice on how to better name this post.

>Solution :

This is how I would solve it.

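  • Instead of rowwise and sum, I would use rowSums, which is already vectorised over rows.
  • In your second expression, !!paste0(drug, "_total") is incorrect: !! injects the string itself (e.g. "cann_total") rather than the column, so case_when() compares a constant string with 0 and gets TRUE for every row. Instead, look the column up by name with the .data pronoun: .data[[paste0(drug, "_total")]].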
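
To see why the string version always comes out "positive", here is a minimal base-R sketch. The names col_name and d_small are made up for illustration and do not appear in the question; the point is only the difference between comparing a column's name and comparing its values.

```r
drug <- "cann"
col_name <- paste0(drug, "_total")   # just the string "cann_total"

# Comparing a string with a number coerces the number to "0" and then
# compares the two strings as text, so this is a single TRUE
# (letters sort after digits in typical locales).
col_name > 0

# Hypothetical two-row data frame, for illustration only
d_small <- data.frame(cann_total = c(1.8, -1.3))

# Looking the column up by its name compares the actual values instead
d_small[[col_name]] > 0
```

Inside mutate(), .data[[col_name]] plays the same role as d_small[[col_name]] does here.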
library(dplyr)

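sumFunct <- function(data, drug) {
  data %>%  # use the `data` argument, not the global `d`
    # Row-wise sum of every column whose name contains `drug`
    mutate(!!paste0(drug, "_total") := rowSums(pick(contains(drug))),
           !!paste0(drug, "_any") := factor(
             # .data[[...]] looks the new column up by its name
             case_when(.data[[paste0(drug, "_total")]] > 0 ~ "positive",
                       TRUE ~ "negative"),
             levels = c("negative", "positive")))
}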

sumFunct(d, "cann")
#   id q1_cann_a q1_cann_b q1_cann_c q1_alc_a q1_alc_b q1_alc_c cann_total cann_any
#1   a      -0.6       1.5       0.9      1.4     -0.2      0.4        1.8 positive
#2   b       0.2       0.4       0.8     -0.1     -0.3     -0.6        1.4 positive
#3   c      -0.8      -0.6       0.1      0.4      0.7      0.3       -1.3 negative
#4   d       1.6      -2.2      -2.0     -0.1      0.6     -1.1       -2.6 negative
#5   e       0.3       1.1       0.6     -1.4     -0.7      1.4        2.0 positive
#6   f      -0.8       0.0      -0.1     -0.4     -0.7      2.0       -0.9 negative
#7   g       0.5       0.0      -0.2     -0.4      0.4     -0.4        0.3 positive
#8   h       0.7       0.9      -1.5     -0.1      0.8     -1.0        0.1 positive
#9   i       0.6       0.8      -0.5      1.1     -0.1      0.6        0.9 positive
#10  j      -0.3       0.6       0.4      0.8      0.9     -0.1        0.7 positive

If you make the change from the second point in your original rowwise approach, it should work as well.
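
For completeness, a sketch of what the rowwise version might look like with only that change applied. The function name sum_rowwise and the two-row data frame d_small are assumptions made for this example (the values are rows a and c of the toy data), and the trailing ungroup() is an optional extra that drops the rowwise grouping.

```r
library(dplyr)

# Hypothetical name; same logic as the question's function, but the
# condition uses .data[[...]] instead of !!paste0(...)
sum_rowwise <- function(data, drug) {
  total <- paste0(drug, "_total")   # e.g. "cann_total"
  data %>%
    rowwise() %>%
    mutate(!!total := sum(c_across(contains(drug))),
           !!paste0(drug, "_any") := factor(
             case_when(.data[[total]] > 0 ~ "positive",
                       TRUE ~ "negative"),
             levels = c("negative", "positive"))) %>%
    ungroup()   # drop the rowwise grouping before returning
}

# Illustration data: rows a and c of the toy data set
d_small <- data.frame(id = c("a", "c"),
                      q1_cann_a = c(-0.6, -0.8),
                      q1_cann_b = c(1.5, -0.6),
                      q1_cann_c = c(0.9, 0.1))

out <- sum_rowwise(d_small, "cann")
out
```

Storing the column name in total once avoids repeating the paste0() call on both sides of the :=.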