Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sampling randomly and non-randomly in one sample

Is there a way to sample X number of random rows and X non-random rows in a single sample?
For example, I want to get 1,000 samples of 4 rows of iris. I want to randomly sample 3 rows of iris and the fourth row will be the same one in each sample (this is to mimic a hybrid sampling design).

I can sample 3 random rows 1000x and the fixed row 1000x and then merge the two data frames together, but for a few reasons this is not an ideal situation. The code to do that looks something like the following:

df<- iris

fixed_sample<- iris[7,]

random<- list()
fixed<- list()

counter<- 0
for (i in 1:1000) {
  # sample 4 randomly selected transects 100 time
  tempsample_random<- df[sample(1:nrow(df), 3, replace=F),]
  tempsample_fixed<- fixed_sample[sample(1:nrow(fixed_sample), 1, replace=F), ]
  
  random[[i]]=tempsample_random
  fixed[[i]]=tempsample_fixed
  
  
  counter<- counter+1
  print(counter)
}


random_results<- do.call(rbind, random)
fixed_results<- do.call(rbind, fixed)

From here I would make a new column as a grouping variable and then merge them together based on that group. So every four rows of the final data frame has 3 random rows and row number 7 (fixed_sample) in each sample.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’ve looked into using splitstackshape::stratified, but haven’t gotten it to work the way I need it to. I’ll be doing this over several levels of sampling effort (sample 2, 3, 4, 5 rows, etc. 1,000x each) so it would be ideal to be able to pull the fixed and random rows in the same sample from the beginning.

Any help would be greatly appreciated.

>Solution :

I think you can do this in a single line using lapply. In this case we will draw 3 samples, but you can change seq(3) to seq(1000) to get your 1000 samples. I have followed your example and selected row 7 as the fixed row.

lapply(seq(3), function(i) iris[c(sample(seq(nrow(iris))[-7], 3), 7),])
#> [[1]]
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 67           5.6         3.0          4.5         1.5 versicolor
#> 105          6.5         3.0          5.8         2.2  virginica
#> 111          6.5         3.2          5.1         2.0  virginica
#> 7            4.6         3.4          1.4         0.3     setosa
#> 
#> [[2]]
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#> 147          6.3         2.5          5.0         1.9 virginica
#> 131          7.4         2.8          6.1         1.9 virginica
#> 126          7.2         3.2          6.0         1.8 virginica
#> 7            4.6         3.4          1.4         0.3    setosa
#> 
#> [[3]]
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 143          5.8         2.7          5.1         1.9  virginica
#> 145          6.7         3.3          5.7         2.5  virginica
#> 60           5.2         2.7          3.9         1.4 versicolor
#> 7            4.6         3.4          1.4         0.3     setosa

Created on 2022-05-18 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading