Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Using sample() to sample from nested lists in R

I’m looking for a way to use sample() to sample values from different lists based on a value in another column of a data.table – at the moment I’m getting a recursive indexing failed error – code and more explanation below:

First set up example data:

library(stats)
library(data.table)

# list of three different nest survival rates
survival<-list(0.91,0.95,0.99)

# incubation period
inc.period<-28

# then set up function to use the geometric distribution to generate 3 lists of incubation outcomes based on the nest survivals and incubation period above.
# e.g. less than 28 is a nest failure, 28 is a successful nest.

create.sample <- function(survival){
  outcome<-rgeom(100,1-survival)
  fifelse(outcome > inc.period, inc.period, outcome)
}

# then create list of 100 nest outcomes with 3 different survival values using lapply 

inc.outcomes <- lapply(survival,create.sample)

# set up a data.table - each row of data will be a nest.

index<-c(1:3)
iteration<-1:20
dt = CJ(index,iteration)

Then I want to make a new column ‘inc.period’ which samples from the ‘inc.outcomes’ list using the index column of the dt to select which of the three ‘inc.outcomes’ lists to sample from (with a different sample for each row of data).
So e.g. when index = 1, the sampled value comes from inc.outcomes[[1]] – which is the low nest survival list, when index = 2 I sample from inc.outcomes[[2]] etc.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The code would look something like this but this doesn’t work (I get a recursive indexing failed error):

dt[,inc.period:= sample(inc.outcomes[[index]],nrow(dt),replace = TRUE)]

Any help or advice gratefully received, also suggestions for different approaches to this problem – this is for an update to code that runs in a shiny simulation so quicker options preferred!

>Solution :

Two problems:

  • inc.outcomes[[index]] is a problem since index is 60-long here, meaning you are ultimately trying inc.outcomes[[ c(1,1,...,2,2,...,3,3) ]], which is incorrect. [[-indexing is either length-1 (for most uses) or a vector as long as its list is nested. For example, in list(list(1,2),list(3,4))[[ c(1,2) ]] the [[c(1,2)]] with length-2 works because the have 2-deep nested lists. Since inc.outcomes is only 1-deep, we can only have length-1 in the [[ indexing.

  • This means we need to do this by-index. (An from this, we need to change from nrow(dt) to .N, but frankly we should be using that anyway even without by=.)

dt[, inc.period := sample(inc.outcomes[[ index[1] ]], .N, replace = TRUE), by = index]
#     index iteration inc.period
#     <int>     <int>      <num>
#  1:     1         1         17
#  2:     1         2         17
#  3:     1         3         21
#  4:     1         4         24
#  5:     1         5          3
#  6:     1         6          1
#  7:     1         7         17
#  8:     1         8          0
#  9:     1         9          1
# 10:     1        10          0
# ---                           
# 51:     3        11          0
# 52:     3        12          0
# 53:     3        13         28
# 54:     3        14         28
# 55:     3        15          9
# 56:     3        16         28
# 57:     3        17          7
# 58:     3        18         28
# 59:     3        19         28
# 60:     3        20         28

My data:

dt <- setDT(structure(list(index = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), iteration = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,  11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L)), row.names = c(NA, -60L), class = c("data.table", "data.frame"), sorted = c("index", "iteration")))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading