Using sample() to sample from nested lists in R

January 20, 2022

I’m looking for a way to use sample() to sample values from different lists, with the list chosen by the value in another column of a data.table. At the moment I’m getting a "recursive indexing failed" error. Code and more explanation below:

First set up example data:

library(stats)
library(data.table)

# list of three different nest survival rates
survival<-list(0.91,0.95,0.99)

# incubation period
inc.period<-28

# then set up function to use the geometric distribution to generate 3 lists of incubation outcomes based on the nest survivals and incubation period above.
# e.g. less than 28 is a nest failure, 28 is a successful nest.

create.sample <- function(survival){
  outcome<-rgeom(100,1-survival)
  fifelse(outcome > inc.period, inc.period, outcome)
}

# then use lapply to create a list of 3 vectors, each holding 100 nest outcomes for one survival value

inc.outcomes <- lapply(survival,create.sample)

# set up a data.table - each row of data will be a nest.

index<-c(1:3)
iteration<-1:20
dt = CJ(index,iteration)
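
As a sanity check on this setup: rgeom(n, prob) returns the number of failures before the first success, and here a "success" in that sense is the nest failing on a given day, with prob = 1 - survival. The chance of reaching day 28 should therefore be survival^28. A minimal sketch comparing that with simulated outcomes; the seed, the larger sample size of 1e5 and the names theory and sim_success are arbitrary choices for illustration, not part of the code above:

```r
set.seed(1)                      # arbitrary seed for repeatability
inc.period <- 28
survival   <- c(0.91, 0.95, 0.99)

# Theoretical probability that a nest survives the whole incubation period
theory <- survival^inc.period

# Simulated share of capped outcomes equal to inc.period, using a larger sample
sim_success <- sapply(survival, function(s) {
  outcome <- pmin(rgeom(1e5, 1 - s), inc.period)  # cap at inc.period, as in create.sample()
  mean(outcome == inc.period)
})

round(cbind(survival, theory, sim_success), 3)
```

The two columns should agree closely, which suggests the capped geometric draws behave as intended.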

Then I want to make a new column ‘inc.period’ which samples from the ‘inc.outcomes’ list using the index column of the dt to select which of the three ‘inc.outcomes’ lists to sample from (with a different sample for each row of data).
So e.g. when index = 1, the sampled value comes from inc.outcomes[[1]] – which is the low nest survival list, when index = 2 I sample from inc.outcomes[[2]] etc.

The code would look something like this but this doesn’t work (I get a recursive indexing failed error):

dt[,inc.period:= sample(inc.outcomes[[index]],nrow(dt),replace = TRUE)]

Any help or advice gratefully received, also suggestions for different approaches to this problem – this is for an update to code that runs in a shiny simulation so quicker options preferred!

Solution:

Two problems:

inc.outcomes[[index]] is a problem since index is 60 long here, meaning you are ultimately trying inc.outcomes[[ c(1,1,...,2,2,...,3,3) ]], which is incorrect. [[-indexing takes either a length-1 index (for most uses) or a vector as long as its list is nested. For example, in list(list(1,2),list(3,4))[[ c(1,2) ]] the length-2 index [[c(1,2)]] works because we have 2-deep nested lists. Since inc.outcomes is only 1-deep, we can only have length-1 in the [[ indexing.
This means we need to do this by index, one group at a time. (And because of this, we need to change from nrow(dt) to .N, the number of rows in the current group, but frankly we should be using that anyway even without by=.)
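
The [[ behaviour described above can be checked on a small example. This is only a sketch; the lists flat and nested are made-up toy data, not the question's inc.outcomes:

```r
# Hypothetical toy lists, just to show how [[ treats its index
flat   <- list(c(10, 11), c(20, 21), c(30, 31))   # 1-deep, like inc.outcomes
nested <- list(list(1, 2), list(3, 4))            # 2-deep

flat[[2]]            # length-1 index: returns the second element
nested[[c(1, 2)]]    # length-2 index on a 2-deep list: same as nested[[1]][[2]]

# A longer index on the 1-deep list is treated as recursive indexing and errors;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(flat[[c(1, 1, 2)]], error = function(e) conditionMessage(e))
res
```

Inside by = index every value of index in a group is the same, so index[1] gives the length-1 index that [[ needs.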

dt[, inc.period := sample(inc.outcomes[[ index[1] ]], .N, replace = TRUE), by = index]
#     index iteration inc.period
#     <int>     <int>      <num>
#  1:     1         1         17
#  2:     1         2         17
#  3:     1         3         21
#  4:     1         4         24
#  5:     1         5          3
#  6:     1         6          1
#  7:     1         7         17
#  8:     1         8          0
#  9:     1         9          1
# 10:     1        10          0
# ---                           
# 51:     3        11          0
# 52:     3        12          0
# 53:     3        13         28
# 54:     3        14         28
# 55:     3        15          9
# 56:     3        16         28
# 57:     3        17          7
# 58:     3        18         28
# 59:     3        19         28
# 60:     3        20         28
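
The sampled values above are random, so they will differ from run to run. For a repeatable check, here is a minimal self-contained sketch of the same grouped approach. The seed is arbitrary, pmin() stands in for the fifelse() cap in the question, and .BY$index is used in place of index[1] (both give the current group's single value):

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

inc.period <- 28
survival   <- list(0.91, 0.95, 0.99)

# Same idea as create.sample() in the question; pmin() caps outcomes at inc.period
create.sample <- function(s) pmin(rgeom(100, 1 - s), inc.period)
inc.outcomes  <- lapply(survival, create.sample)

dt <- CJ(index = 1:3, iteration = 1:20)

# One sample() call per group; .N is the number of rows in that group
dt[, inc.period := sample(inc.outcomes[[.BY$index]], .N, replace = TRUE), by = index]

# Check: every sampled value in a group is present in that group's pool
ok <- dt[, all(inc.period %in% inc.outcomes[[.BY$index]]), by = index]$V1
ok
```

One caveat: if x is a single number of 1 or more, sample(x, ...) samples from 1:x instead of returning x. That is not an issue here because each pool holds 100 values, but it matters if a pool could ever shrink to a single value.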

My data:

dt <- setDT(structure(list(index = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), iteration = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,  11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L)), row.names = c(NA, -60L), class = c("data.table", "data.frame"), sorted = c("index", "iteration")))