Use sample() function within mutate() and case_when()


Let’s say my input dataset is given by df2:

df2 <- data.frame(a = c(1,NA,6,NA), b = c(2,4,5,1))

a b
1 2
NA 4
6 5
NA 1

I would like to create a third variable called "c" that takes the value of b if a is not missing. If a is missing (rows 2 and 4), c should randomly take either 0 or the value of b.

In terms of code, I was thinking of something like this:

df2 <- df2 %>% 
  mutate(c = case_when(is.na(a) ~ sample(c(0, b), n(), replace = TRUE),
                       TRUE ~ b))

But it doesn’t give me the result I want.

Any idea?

Solution:

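The sample function won’t vectorize the way you want here. sample(c(0, b), n(), replace = TRUE) draws from the single pooled vector c(0, b), that is, 0 together with every value of b in the column, so a row can end up with another row's b instead of choosing between 0 and its own b. We could use if_else instead: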

library(dplyr)

df2 %>% 
  mutate(c = case_when(is.na(a) ~ if_else(runif(n()) < .5, 0, b),
                       TRUE ~ b))
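A minimal base-R sketch of why the question's sample() call misbehaves, using only the b column from the question. The seed is arbitrary and only there for reproducibility; the actual draws will vary with it, so no specific draws are shown.

```r
# The b column from df2 in the question.
b <- c(2, 4, 5, 1)

# c(0, b) is one pooled vector: 0 followed by every value of b.
pool <- c(0, b)
print(pool)   # [1] 0 2 4 5 1

set.seed(1)   # arbitrary seed, for reproducibility only
draws <- sample(pool, length(b), replace = TRUE)

# Each draw can be any element of the pool, not "0 or this row's b",
# so a row may receive a value that belongs to a different row.
print(draws)

# TRUE where the draw happens to be 0 or the row's own b, FALSE otherwise.
print(draws == 0 | draws == b)
```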

We use runif(n()) to draw one uniform random number per row. If it is less than 0.5 we return 0, otherwise we return that row's b, so each row with a missing a gets 0 or b with equal probability. For example:

set.seed(369)
df2 %>% 
  mutate(c=case_when(is.na(a) ~ if_else(runif(n()) <.5, 0, b),
                     TRUE ~ b))
#    a b c
# 1  1 2 2
# 2 NA 4 0
# 3  6 5 5
# 4 NA 1 1
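If you would rather keep sample(), one option is to sample a per-row TRUE/FALSE choice instead of sampling the values themselves. The sketch below is a base-R rendering of that idea (the variable names use_zero and zero_share are illustrative, and the seed is arbitrary), followed by a quick simulation checking that a row with a missing a gets 0 about half the time.

```r
df2 <- data.frame(a = c(1, NA, 6, NA), b = c(2, 4, 5, 1))

set.seed(42)  # arbitrary seed; other seeds give other results
# One fair coin flip per row: TRUE means "use 0" if a is missing.
use_zero <- sample(c(TRUE, FALSE), nrow(df2), replace = TRUE)

# Rows with a missing a get 0 when their flip is TRUE, otherwise b;
# rows where a is present always keep b.
df2$c <- ifelse(is.na(df2$a) & use_zero, 0, df2$b)
print(df2)

# Sanity check: over many repetitions, row 4 (a missing, b = 1)
# should receive 0 roughly half the time.
zero_share <- mean(replicate(10000, {
  flip <- sample(c(TRUE, FALSE), nrow(df2), replace = TRUE)
  ifelse(is.na(df2$a) & flip, 0, df2$b)[4] == 0
}))
print(zero_share)
```

Inside mutate(), the same idea would read if_else(is.na(a) & sample(c(TRUE, FALSE), n(), replace = TRUE), 0, b). For an uneven split, sample() also accepts a prob argument, e.g. prob = c(p, 1 - p).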
