Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

tidyverse: Simulating random sample with nested factor

I want to simulate random sample with nested factor. Factor Dept has two levels A & B. Level A has two nested levels A1 and A2. Level B has three nested levels B1, B2 and B3. Want to simulate random sample from 2022-01-01 to 2022-01-31 using some R code. Part of desired output is given below (from 2022-01-01 to 2022-01-02 only for reference).

library(tibble)

set.seed(12345)
df1 <-
  tibble(
    Date   = c(rep("2022-01-01", 5), rep("2022-01-02", 4), rep("2022-01-03", 4))
  , Dept   = c("A", "A", "B", "B", "B", "A", "B", "B", "B", "A", "A", "B", "B")
  , Prog   = c("A1", "A2", "B1", "B2", "B3", "A1", "B1", "B2", "B3", "A1", "A2", "B2", "B3")
  , Amount = runif(n = 13, min = 50000, max = 100000) 
  )

df1
#> # A tibble: 13 x 4
#>    Date       Dept  Prog  Amount
#>    <chr>      <chr> <chr>  <dbl>
#>  1 2022-01-01 A     A1    86045.
#>  2 2022-01-01 A     A2    93789.
#>  3 2022-01-01 B     B1    88049.
#>  4 2022-01-01 B     B2    94306.
#>  5 2022-01-01 B     B3    72824.
#>  6 2022-01-02 A     A1    58319.
#>  7 2022-01-02 B     B1    66255.
#>  8 2022-01-02 B     B2    75461.
#>  9 2022-01-02 B     B3    86385.
#> 10 2022-01-03 A     A1    99487.
#> 11 2022-01-03 A     A2    51727.
#> 12 2022-01-03 B     B2    57619.
#> 13 2022-01-03 B     B3    86784.

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

If we want to sample randomly, create the expanded data with crossing and then filter/slice to return random rows for each ‘date’

library(dplyr)
library(tidyr)
library(stringr)
crossing(Date = seq(as.Date("2022-01-01"), as.Date("2022-01-31"), 
   by = "1 day"), Dept = c("A", "B"), Prog = 1:3) %>%
   mutate(Prog = str_c(Dept, Prog)) %>%
  filter(Prog != "A3") %>% 
  mutate(Amount = runif(n = n(), min = 50000, max = 100000)) %>% 
  group_by(Date) %>% 
  slice(seq_len(sample(row_number(), 1)))  %>%
  ungroup

-output

# A tibble: 102 × 4
   Date       Dept  Prog  Amount
   <date>     <chr> <chr>  <dbl>
 1 2022-01-01 A     A1    83964.
 2 2022-01-01 A     A2    93428.
 3 2022-01-01 B     B1    85187.
 4 2022-01-01 B     B2    79144.
 5 2022-01-01 B     B3    65784.
 6 2022-01-02 A     A1    86014.
 7 2022-01-03 A     A1    76060.
 8 2022-01-03 A     A2    56412.
 9 2022-01-03 B     B1    87365.
10 2022-01-03 B     B2    66169.
# … with 92 more rows
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading