Create dummy variables for below or above the median within group in R

March 11, 2023

Suppose I have a dataframe like the following:

X Y Z
1 b 3
2 a 8
3 a 7
4 c 1
5 b 6
6 a 4
7 a 9
8 b 5
9 a 4

I want to create columns A and B, dummy variables indicating whether the value of Z is above (A) or below (B) the median of Z within each group of Y.

Solution:

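You can do this with either data.table or dplyr. Both compare Z with the median of Z computed within each group of Y; multiplying the logical result by 1 converts TRUE/FALSE to 1/0.

# data.table: setDT() converts df by reference, and := adds A and B within each group of Y
library(data.table)
setDT(df)[, `:=`(A = 1 * (Z > median(Z)), B = 1 * (Z < median(Z))), by = Y]

# dplyr: same logic; ungroup() drops the grouping afterwards
library(dplyr)
df %>% group_by(Y) %>% mutate(A = 1 * (Z > median(Z)), B = 1 * (Z < median(Z))) %>% ungroup()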

Output (as printed by data.table; a row whose Z equals its group median gets 0 in both columns):

       X      Y     Z     A     B
   <int> <char> <int> <num> <num>
1:     1      b     3     0     1
2:     2      a     8     1     0
3:     3      a     7     0     0
4:     4      c     1     0     0
5:     5      b     6     1     0
6:     6      a     4     0     1
7:     7      a     9     1     0
8:     8      b     5     0     0
9:     9      a     4     0     1

Input:

df <- structure(list(X = 1:9, Y = c("b", "a", "a", "c", "b", "a", "a",
"b", "a"), Z = c(3L, 8L, 7L, 1L, 6L, 4L, 9L, 5L, 4L)), row.names = c(NA, 
-9L), class = "data.frame")
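
If you would rather not load a package, the same per-group comparison can probably be done in base R. The sketch below is not part of the original answer: it assumes ave() from the stats package is an acceptable substitute for the grouped operations above, and the helper name grp_median is made up for illustration.

```r
# Same data as in the "Input" section of the post.
df <- data.frame(
  X = 1:9,
  Y = c("b", "a", "a", "c", "b", "a", "a", "b", "a"),
  Z = c(3L, 8L, 7L, 1L, 6L, 4L, 9L, 5L, 4L)
)

# ave() returns, for every row, the median of Z within that row's Y group.
# grp_median is a hypothetical helper name, not something from the answer.
grp_median <- ave(as.numeric(df$Z), df$Y, FUN = median)

# Strict comparisons: a row equal to its group median gets 0 in both columns.
df$A <- as.integer(df$Z > grp_median)
df$B <- as.integer(df$Z < grp_median)

print(df)
```

Here A and B come out as integers rather than the numeric columns produced by 1 * (...); use as.numeric() instead of as.integer() if the column type matters.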