I am trying to create a data frame in R with 1000 rows and 10 variables (columns), each of which can take integer values from 1 to 10.
Without further constraints this is simple to do, for example:
library(foreach)
# 1000 rows, each a random permutation of 1:10; .combine = "rbind" stacks them into a matrix
foreach (i = 1:1000, .combine = "rbind") %do% {
  sample(1:10, 10)
}
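For comparison, here is a base-R sketch of the same construction without foreach (an illustration added here, not part of the original question). Every row contains each value exactly once, but the count of each value within a column is random and only varies around 100:

```r
# Base-R equivalent of the foreach loop: replicate() returns a 10 x 1000
# matrix (one permutation of 1:10 per column), so t() turns it into 1000 x 10.
m <- t(replicate(1000, sample(1:10, 10)))

dim(m)  # 1000 rows, 10 columns

# Counts of each value in column 1: close to 100 on average,
# but not guaranteed to be exactly 100.
table(m[, 1])
```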
However, what I need is to ensure that each value appears in each column exactly 100 times.
For example, the value "1" must appear exactly 100 times in the first column, 100 times in the second column, and so on through the tenth column.
Likewise, the value "2" must appear 100 times in each of the ten columns, and so on for every value up to 10.
Many arrangements satisfy this constraint, and I just want to draw one of them at random. How can I do that?
Solution:
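Build a vector that contains each value exactly 100 times, then use replicate() with sample() to shuffle it independently for each column. Since every column is a permutation of that vector, each value appears exactly 100 times in each column, and because each shuffle is uniform, every valid arrangement is equally likely: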
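reps <- rep(1:10, each = 100)              # 1,1,...,1,2,...,10: each value 100 times (length 1000)
data <- replicate(10, sample(reps, 1000))  # shuffle reps independently for each of the 10 columns
# convert to a data frame if needed (columns are named X1 to X10):
df <- data.frame(data)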
Output (the values are random and will differ between runs):
#         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#    [1,]    6    1    2    9    3    2    4    5    6     5
#    [2,]    4    7    6    1    9    5    1    8    5     1
#    [3,]    9    9    6   10    2    8    8    7   10     7
#    [4,]    4    2    6    7    1    5    8    5    5     8
#    [5,]    4    4    4    6    3    1    9    8    3    10
# ....
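If you need this for other sizes, the same idea can be wrapped in a small function. This is only a sketch: the name balanced_columns and its arguments are hypothetical, and set.seed() is there purely to make the draw reproducible:

```r
# Hypothetical helper (not from the answer): generalises the
# rep() + replicate() + sample() approach to any number of values,
# repetitions and columns.
balanced_columns <- function(n_values = 10, n_reps = 100, n_cols = 10) {
  # every value 1..n_values repeated n_reps times
  reps <- rep(seq_len(n_values), each = n_reps)
  # shuffle the same multiset independently for each column
  m <- replicate(n_cols, sample(reps))
  # data.frame() on an unnamed matrix names the columns X1, X2, ...
  data.frame(m)
}

set.seed(1)  # optional: reproducible draw
balanced_df <- balanced_columns()
```

Because the constraint only concerns columns, shuffling each column independently is all that is needed; rows are not forced to contain distinct values (row 3 in the output above contains 9 twice, for instance).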
Check that each value occurs exactly 100 times in every column:
apply(data, 2, table)
#    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# 1   100  100  100  100  100  100  100  100  100   100
# 2   100  100  100  100  100  100  100  100  100   100
# 3   100  100  100  100  100  100  100  100  100   100
# 4   100  100  100  100  100  100  100  100  100   100
# 5   100  100  100  100  100  100  100  100  100   100
# 6   100  100  100  100  100  100  100  100  100   100
# 7   100  100  100  100  100  100  100  100  100   100
# 8   100  100  100  100  100  100  100  100  100   100
# 9   100  100  100  100  100  100  100  100  100   100
# 10  100  100  100  100  100  100  100  100  100   100
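Reading the table by eye works for a 10 x 10 case, but the check can also be done programmatically. A minimal sketch using base R's tabulate(), which counts how often each integer from 1 to nbins occurs (the data generation is repeated so the block runs on its own):

```r
# Same construction as in the answer
reps <- rep(1:10, each = 100)
data <- replicate(10, sample(reps, 1000))

# tabulate() counts occurrences of 1..nbins in one column;
# apply() over columns (MARGIN = 2) returns a 10 x 10 count matrix
counts <- apply(data, 2, tabulate, nbins = 10)

# Stops with an error unless every value occurs exactly 100 times in every column
stopifnot(all(counts == 100))
```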