Generate a Subset of Combinations in R

I wish to generate a data set in R mimicking the responses to a 5-variable (x1, x2, x3, x4, x5), 5-level data set (1,2,3,4,5).

I’d like the data set to have around n = 15000 responses and to contain around 75% of the total possible combinations.

Therefore, approximately 75% of 5^5 = 3125 should be covered in the data set of around n = 15000 observations.

Would anyone be able to show me how this can be executed in R, please?

Solution:

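library(tidyr)  # provides pivot_wider(); the native pipe |> needs R >= 4.1

set.seed(42)
# Long format: 15000 observations x 5 questions, each level drawn uniformly from 1:5
data.frame(obs = 1:15000,
           q = rep(paste0("x", 1:5), each = 15000),
           level = sample(1:5, 15000 * 5, TRUE)) |>
  # Reshape to one row per observation, one column per question
  pivot_wider(names_from = q, values_from = level)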

Produces

# A tibble: 15,000 × 6
     obs    x1    x2    x3    x4    x5
   <int> <int> <int> <int> <int> <int>
 1     1     1     4     4     5     1
 2     2     5     4     3     5     4
 3     3     1     5     4     3     4
 4     4     1     3     5     4     5
 5     5     2     5     5     3     5
 6     6     4     1     5     1     1
 7     7     2     5     2     1     4
 8     8     2     1     2     1     4
 9     9     1     2     3     3     3
10    10     4     5     1     3     5
# ℹ 14,990 more rows

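Appending |> unite("combo", x1:x5, remove = FALSE) |> count(combo) to the pipeline (unite() is from tidyr, count() from dplyr) shows that the data cover 3,094 of the 3,125 possible level combinations. That is about 99%, noticeably above the roughly 75% asked for, because uniform sampling at n = 15000 reaches almost every combination.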
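
If coverage needs to sit near 75% rather than wherever uniform sampling lands, one possible approach (a sketch, not part of the answer above) is to pick the allowed combinations first and then sample responses only from those. The names all_combos, allowed, n_keep and sim are illustrative, and the choice to include every allowed combination at least once is an assumption made so that the coverage is exact.

```r
set.seed(42)

# All 5^5 = 3125 possible combinations of the five variables
all_combos <- expand.grid(x1 = 1:5, x2 = 1:5, x3 = 1:5, x4 = 1:5, x5 = 1:5)

# Keep a random 75% of the combinations
n_keep  <- round(0.75 * nrow(all_combos))
allowed <- all_combos[sample(nrow(all_combos), n_keep), ]

# Use each allowed combination once, fill the remaining rows by sampling
# with replacement, then shuffle the row order
n_obs <- 15000
idx <- sample(c(seq_len(n_keep),
                sample(n_keep, n_obs - n_keep, replace = TRUE)))

sim <- allowed[idx, ]
rownames(sim) <- NULL
sim <- cbind(obs = seq_len(n_obs), sim)

# Share of all possible combinations that appear in the simulated data
n_seen   <- nrow(unique(sim[, paste0("x", 1:5)]))
coverage <- n_seen / nrow(all_combos)
```

Here coverage is n_keep / 3125 by construction. Dropping the seq_len(n_keep) part and sampling all 15000 rows with replacement would give slightly lower, random coverage instead.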
