Suppose I have a dataframe like the following:
X Y Z
1 b 3
2 a 8
3 a 7
4 c 1
5 b 6
6 a 4
7 a 9
8 b 5
9 a 4
I want to create columns A and B, dummy variables indicating whether Z is above (A = 1) or below (B = 1) the median of Z within each group of Y. Rows where Z equals the group median get 0 in both columns. So the desired output would be the following:
X Y Z A B
1 b 3 0 1
2 a 8 1 0
3 a 7 0 0
4 c 1 0 0
5 b 6 1 0
6 a 4 0 1
7 a 9 1 0
8 b 5 0 0
9 a 4 0 1
Solution:
Compare Z with its per-group median; multiplying the logical result by 1 turns TRUE/FALSE into 1/0. With data.table:
library(data.table)
setDT(df)[, `:=`(A = 1*(Z>median(Z)), B=1*(Z<median(Z))), Y]
or, with dplyr:
library(dplyr)
df %>% group_by(Y) %>% mutate(A=1*(Z>median(Z)), B=1*(Z<median(Z)))
Output (as printed by the data.table version):
X Y Z A B
<int> <char> <int> <num> <num>
1: 1 b 3 0 1
2: 2 a 8 1 0
3: 3 a 7 0 0
4: 4 c 1 0 0
5: 5 b 6 1 0
6: 6 a 4 0 1
7: 7 a 9 1 0
8: 8 b 5 0 0
9: 9 a 4 0 1
Input:
df <- structure(list(X = 1:9, Y = c("b", "a", "a", "c", "b", "a", "a",
"b", "a"), Z = c(3L, 8L, 7L, 1L, 6L, 4L, 9L, 5L, 4L)), row.names = c(NA,
-9L), class = "data.frame")
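If you would rather not load a package, the same idea can be sketched in base R with ave(), which returns the group-wise median aligned to each row. This is only an illustrative alternative, not part of the answer above; the helper name grp_median is made up for the example, and as.integer() is used in place of 1*(...) so the dummies come out as integers.

```r
# Example data, copied from the Input section above.
df <- data.frame(
  X = 1:9,
  Y = c("b", "a", "a", "c", "b", "a", "a", "b", "a"),
  Z = c(3L, 8L, 7L, 1L, 6L, 4L, 9L, 5L, 4L)
)

# Per-row group median: ave() applies median() within each level of Y
# and returns a vector aligned with the rows of df.
# as.numeric() avoids integer/double mixing when a group has an even size.
grp_median <- ave(as.numeric(df$Z), df$Y, FUN = median)

# Strict comparisons, so a row equal to its group median gets 0 in both columns.
df$A <- as.integer(df$Z > grp_median)
df$B <- as.integer(df$Z < grp_median)

print(df)
```

For this data the group medians are 7 for a, 5 for b and 1 for c, which is why rows 3, 4 and 8 end up with 0 in both A and B.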