Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Creating multiple training subsets using sample() in R

I have a training dataset that consists of 60,000 observations that I want to create 9 subset training sets from. I want to sample randomly without replacement; I need 3 datasets of 500 observations, 3 datasets of 1,000 observations, and 3 datasets of 2,000 observations.

enter image description here

How can I do this using sample() in R?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Given your data.frame is named df you do:

sample_sizes <- c(rep(500,3), rep(1000,3), rep(2000,3))
sampling <- sample(60000, sum(sample_sizes))

training_sets <- split(df[sampling,], rep(1:9, sample_sizes)) 

This do sampling without replacement over all dataset.
If you want sampling without replacement in each training set (but not through all training sets):

sample_sizes <- c(rep(500,3), rep(1000,3), rep(2000,3))
sampling <- do.call(c, lapply(sample_sizes, function(i) sample(60000, i)))
training_sets <- split(df[sampling,], rep(1:9, sample_sizes)) 

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading