Creating multiple training subsets using sample() in R

I have a training dataset that consists of 60,000 observations that I want to create 9 subset training sets from. I want to sample randomly without replacement; I need 3 datasets of 500 observations, 3 datasets of 1,000 observations, and 3 datasets of 2,000 observations.


How can I do this using sample() in R?

>Solution :

Assuming your data.frame is named df, you can do:

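# sizes of the 9 training sets: 3 x 500, 3 x 1000, 3 x 2000
sample_sizes <- c(rep(500, 3), rep(1000, 3), rep(2000, 3))
# draw all 10,500 row indices at once, without replacement
sampling <- sample(nrow(df), sum(sample_sizes))

# cut the sampled rows into 9 consecutive groups of the requested sizes
training_sets <- split(df[sampling, ], rep(1:9, sample_sizes))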
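
As a quick sanity check, here is a minimal sketch that runs the same approach on made-up data and confirms the set sizes and the absence of shared rows. The toy df, its id and x columns, and the seed are assumptions for illustration only; they stand in for your real 60,000-row data.

```r
# Hypothetical stand-in for the real data (assumption for illustration)
set.seed(42)
df <- data.frame(id = seq_len(60000), x = rnorm(60000))

sample_sizes <- c(rep(500, 3), rep(1000, 3), rep(2000, 3))
# One draw of 10,500 distinct row indices
sampling <- sample(nrow(df), sum(sample_sizes))
training_sets <- split(df[sampling, ], rep(seq_along(sample_sizes), sample_sizes))

# Number of rows in each of the 9 sets
set_sizes <- sapply(training_sets, nrow)

# TRUE if no row id occurs in more than one set
all_ids <- unlist(lapply(training_sets, function(d) d$id))
no_overlap <- anyDuplicated(all_ids) == 0
```

Because every index comes from a single call to sample() without replacement, the 9 sets are disjoint by construction; the check only makes that visible.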

This samples without replacement across the whole dataset, so no observation appears in more than one training set.
If you want sampling without replacement within each training set only (so the same observation may appear in several training sets):

sample_sizes <- c(rep(500, 3), rep(1000, 3), rep(2000, 3))
# one independent draw per training set; indices may repeat across sets
sampling <- do.call(c, lapply(sample_sizes, function(i) sample(nrow(df), i)))
training_sets <- split(df[sampling, ], rep(1:9, sample_sizes))
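
A possible variation on the independent-draw approach, sketched on made-up data: it keeps one index vector per set and builds each training set directly from it, instead of concatenating the indices and splitting afterwards. The toy df, the id and x columns, and the names index_list, within_unique and shared_rows are hypothetical and used only for illustration.

```r
# Hypothetical stand-in for the real data (assumption for illustration)
set.seed(42)
df <- data.frame(id = seq_len(60000), x = rnorm(60000))

sample_sizes <- c(rep(500, 3), rep(1000, 3), rep(2000, 3))

# One independent draw (without replacement) per training set
index_list <- lapply(sample_sizes, function(n) sample(nrow(df), n))

# Build each set from its own index vector
training_sets <- lapply(index_list, function(idx) df[idx, ])

# TRUE if no set contains the same row twice
within_unique <- all(sapply(training_sets, function(d) anyDuplicated(d$id) == 0))

# Count of index values that were already drawn by an earlier set
shared_rows <- sum(duplicated(unlist(index_list)))
```

One reason to prefer this form: indexing a data.frame with repeated row indices makes R append suffixes to the duplicated row names to keep them unique, whereas subsetting per set leaves the original row names untouched.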
