I want to create a score based on dummy variables, where different combinations add up to a certain ‘stringency’.
library(dplyr)
set.seed(2)
df <- data.frame(id = 1:20,
                 a = rbinom(20, 1, 0.6),
                 b = rbinom(20, 1, 0.6),
                 c = rbinom(20, 1, 0.6),
                 d = rbinom(20, 1, 0.6),
                 e = rbinom(20, 1, 0.6))
Which looks like
id a b c d e
1 1 0 0 0 1
2 0 1 1 0 0
3 1 0 1 0 1
4 1 1 1 1 1
5 0 1 0 0 1
6 0 1 0 1 0
7 1 1 0 1 0
8 0 1 1 1 1
9 1 0 1 1 0
10 1 1 0 1 1
11 1 1 1 1 0
12 1 1 1 1 1
13 0 0 0 1 1
14 1 0 0 1 1
15 1 1 1 1 1
16 0 0 0 0 1
17 0 0 0 1 1
18 1 1 0 0 1
19 1 0 0 1 1
20 1 1 0 1 1
Now I am trying to create the following variable:
df <- df %>%
  mutate(stringency = case_when(a == 1 ~ 3,
                                a == 1 & b == 1 ~ 6,
                                a == 1 & c == 1 ~ 6,
                                a == 1 & b == 1 & c ~ 7,
                                a == 1 & d == 1 ~ 11,
                                a == 1 & e == 1 ~ 9,
                                a == 1 & b == 1 & d == 1 ~ 9,
                                a == 1 & b == 1 & e == 1 ~ 9,
                                TRUE ~ 0))
However, in the result only the first condition (a == 1 ~ 3) ever takes effect:
id a b c d e stringency
1 1 0 0 0 1 3
2 0 1 1 0 0 0
3 1 0 1 0 1 3
4 1 1 1 1 1 3
5 0 1 0 0 1 0
6 0 1 0 1 0 0
7 1 1 0 1 0 3
8 0 1 1 1 1 0
9 1 0 1 1 0 3
10 1 1 0 1 1 3
11 1 1 1 1 0 3
12 1 1 1 1 1 3
13 0 0 0 1 1 0
14 1 0 0 1 1 3
15 1 1 1 1 1 3
16 0 0 0 0 1 0
17 0 0 0 1 1 0
18 1 1 0 0 1 3
19 1 0 0 1 1 3
20 1 1 0 1 1 3
What I want is that it ‘builds up’: if you have just a, you get 3; if you have a and b, you get 6; etc.
Any ideas on how I can do this? Many thanks
>Solution :
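You have to pay attention to the order of your conditions. For each row, case_when returns the value of the first condition that is TRUE and ignores every condition after it. Because a == 1 comes first in your code, every row with a == 1 gets 3 and the more specific conditions are never reached. Therefore you want your most specific conditions at the beginning and a == 1 at the end.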
library(dplyr)
set.seed(2)
df <- data.frame(id = 1:20,
a = rbinom(20, 1, 0.6),
b = rbinom(20, 1, 0.6),
c = rbinom(20, 1, 0.6),
d = rbinom(20, 1, 0.6),
e = rbinom(20, 1, 0.6))
df <- df %>%
  mutate(stringency = case_when(a == 1 & b == 1 & c == 1 ~ 7,
                                a == 1 & b == 1 & d == 1 ~ 9,
                                a == 1 & b == 1 & e == 1 ~ 9,
                                a == 1 & d == 1 ~ 11,
                                a == 1 & e == 1 ~ 9,
                                a == 1 & b == 1 ~ 6,
                                a == 1 & c == 1 ~ 6,
                                a == 1 ~ 3,
                                TRUE ~ 0))
df
#> id a b c d e stringency
#> 1 1 1 0 0 0 1 9
#> 2 2 0 1 1 0 0 0
#> 3 3 1 0 1 0 1 9
#> 4 4 1 1 1 1 1 7
#> 5 5 0 1 0 0 1 0
#> 6 6 0 1 0 1 0 0
#> 7 7 1 1 0 1 0 9
#> 8 8 0 1 1 1 1 0
#> 9 9 1 0 1 1 0 11
#> 10 10 1 1 0 1 1 9
#> 11 11 1 1 1 1 0 7
#> 12 12 1 1 1 1 1 7
#> 13 13 0 0 0 1 1 0
#> 14 14 1 0 0 1 1 11
#> 15 15 1 1 1 1 1 7
#> 16 16 0 0 0 0 1 0
#> 17 17 0 0 0 1 1 0
#> 18 18 1 1 0 0 1 9
#> 19 19 1 0 0 1 1 11
#> 20 20 1 1 0 1 1 9
Created on 2023-04-05 by the reprex package (v2.0.1)
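Note that row 4 gets 7 only because that condition is listed first. In that row a, b, c, d and e are all 1, so every condition matches, and a different order could just as well give 9, 11, 6 or 3. If overlapping combinations should resolve in a particular way, you need to add more specific conditions to make that explicit.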
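To illustrate the point about overlapping conditions, here is a minimal sketch using a single row where every dummy is 1, like row 4. The names all_ones, abc_first, ad_first and exclusive are made up for this example, and the extra d == 0 & e == 0 terms are only an assumption about how you might want ties resolved, not something taken from your scoring rules.

```r
library(dplyr)

# Hypothetical one-row data frame where every dummy is 1 (like row 4)
all_ones <- data.frame(a = 1, b = 1, c = 1, d = 1, e = 1)

# Same two conditions in a different order: the first TRUE one wins
abc_first <- all_ones %>%
  mutate(stringency = case_when(a == 1 & b == 1 & c == 1 ~ 7,
                                a == 1 & d == 1 ~ 11,
                                TRUE ~ 0))

ad_first <- all_ones %>%
  mutate(stringency = case_when(a == 1 & d == 1 ~ 11,
                                a == 1 & b == 1 & c == 1 ~ 7,
                                TRUE ~ 0))

abc_first$stringency  # 7
ad_first$stringency   # 11

# Assumed tie-break: a, b, c only scores 7 when d and e are both 0.
# With that extra restriction the order of the two conditions no
# longer matters for this row.
exclusive <- all_ones %>%
  mutate(stringency = case_when(a == 1 & b == 1 & c == 1 & d == 0 & e == 0 ~ 7,
                                a == 1 & d == 1 ~ 11,
                                TRUE ~ 0))

exclusive$stringency  # 11
```

Spelling out which dummies must be 0 makes the conditions mutually exclusive, so the result no longer depends on how they are ordered.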