I have a table and would like to randomly flag three records from each group with a 1 and all other records with a 0.
I know that I can accomplish this using the following code, but this seems clunky and inefficient. Is there any other way I can accomplish the same thing?
library(tidyverse)

dat <- data.frame(row_id = 1:10,
                  grp = c(rep("a", 5), rep("b", 5)))

dat_sample <- dat %>%
  group_by(grp) %>%
  sample_n(3) %>%
  mutate(val = 1)

dat %>%
  left_join(dat_sample, by = c("row_id", "grp")) %>%
  mutate(val = coalesce(val, 0))
Solution:
One option is to use mutate instead of a join: group by grp, sample three values of row_number() within each group, and test each row's row_number() against that sample with %in%. This gives a logical vector, which is coerced to 0/1 with as.integer or the unary +.
library(dplyr)

dat %>%
  group_by(grp) %>%
  mutate(val = +(row_number() %in% sample(row_number(), 3))) %>%
  ungroup()
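A variant of the same idea, offered only as a sketch: sample(n()) returns a random permutation of 1..n() within each group, so comparing it with 3 flags exactly three rows (or every row, if a group has fewer than three). The name flagged and the set.seed call are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)

# Same example data as in the question
dat <- data.frame(row_id = 1:10,
                  grp = c(rep("a", 5), rep("b", 5)))

set.seed(42)  # assumption: fixed seed only to make the illustration reproducible

flagged <- dat %>%
  group_by(grp) %>%
  # sample(n()) is a random permutation of 1..n();
  # exactly min(3, n()) of its entries are <= 3
  mutate(val = as.integer(sample(n()) <= 3)) %>%
  ungroup()

# Number of flagged rows per group (weighted count sums val)
count(flagged, grp, wt = val)
```

Unlike sample(row_number(), 3), this form does not raise an error when a group has fewer than three rows, which may or may not be the behaviour you want.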
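Or, if an exact count of three per group is not required, flag each row independently with probability 0.3. Note that the number of flagged rows per group will then vary from run to run:

dat %>%
  group_by(grp) %>%
  mutate(val = rbinom(n(), 1, 0.3)) %>%
  ungroup()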