Unable to subset a matrix of data.frames using lapply and sapply


I have a list whose `Data` element is a matrix of data.frames. For example:

> ls <- list(Dates = seq.Date(as.Date('2023-01-01'), by = 'day', length.out = 3), Pars = seq(0.5, 2.0, 0.5))
> df <- data.frame(X = runif(10,0,1), Y = runif(10,0,1))
> ls$Data <- sapply(ls$Dates, function(d) lapply(ls$Pars, function(p) df))
> colnames(ls$Data) <- as.character(ls$Dates); rownames(ls$Data) <- ls$Pars
> ls

$Dates
[1] "2023-01-01" "2023-01-02" "2023-01-03"

$Pars
[1] 0.5 1.0 1.5 2.0

$Data
    2023-01-01   2023-01-02   2023-01-03  
0.5 data.frame,2 data.frame,2 data.frame,2
1   data.frame,2 data.frame,2 data.frame,2
1.5 data.frame,2 data.frame,2 data.frame,2
2   data.frame,2 data.frame,2 data.frame,2
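The printed `data.frame,2` cells hint at what `sapply()` actually built: a list-matrix, that is, a list carrying a `dim` attribute, whose row and column names are stored as character strings. A minimal check, assuming the same setup with a fixed seed and with each cell filled directly with `df` (both choices of this sketch, not of the question):

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
ls <- list(Dates = seq.Date(as.Date('2023-01-01'), by = 'day', length.out = 3),
           Pars  = seq(0.5, 2.0, 0.5))
df <- data.frame(X = runif(10, 0, 1), Y = runif(10, 0, 1))

# Each date yields a list of 4 data.frames; sapply() binds them into a 4 x 3 list-matrix
ls$Data <- sapply(ls$Dates, function(d) lapply(ls$Pars, function(p) df))
colnames(ls$Data) <- as.character(ls$Dates)
rownames(ls$Data) <- ls$Pars  # numeric values are coerced to character names

typeof(ls$Data)                         # "list": the matrix cells are list elements
dim(ls$Data)                            # 4 3
rownames(ls$Data)                       # "0.5" "1" "1.5" "2"
class(ls$Data[['1.5', '2023-01-02']])   # "data.frame"
```

Note that the row names are now the strings `"0.5"`, `"1"`, `"1.5"`, `"2"`, not the numbers in `ls$Pars`.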

I can easily extract a column from any one of these data.frames by indexing with the row and column names:

> ls$Data[['1.5','2023-01-02']]$Y
 [1] 0.78773262 0.54989971 0.29513767 0.42966110 0.01719963 0.87326344 0.85021538 0.16226286 0.76293787
[10] 0.53882718
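Because `ls$Data` is a matrix, `[[` accepts either dimnames (character strings) or integer positions, and the two only agree when they line up. A minimal sketch, assuming each cell holds its own random data.frame (unlike the question, where every cell holds the same `df`) so that cells can be told apart; `match()` is one way to turn a label into its position:

```r
set.seed(42)  # assumption: reproducible example data
ls <- list(Dates = seq.Date(as.Date('2023-01-01'), by = 'day', length.out = 3),
           Pars  = seq(0.5, 2.0, 0.5))
# Assumption for illustration: a distinct data.frame in every cell
ls$Data <- sapply(ls$Dates, function(d)
  lapply(ls$Pars, function(p) data.frame(X = runif(10), Y = runif(10))))
dimnames(ls$Data) <- list(as.character(ls$Pars), as.character(ls$Dates))

by_name     <- ls$Data[['1.5', '2023-01-02']]$Y
by_position <- ls$Data[[3, 2]]$Y   # row 3 is named "1.5", column 2 "2023-01-02"
identical(by_name, by_position)    # TRUE

# match() turns a label into its position in the dimnames
i <- match('1.5', rownames(ls$Data))          # 3
j <- match('2023-01-02', colnames(ls$Data))   # 2

# Trap: the number 1 means "first row" (named "0.5"), not the row named "1"
identical(ls$Data[[1, 2]]$Y, ls$Data[['1', 2]]$Y)  # FALSE
```

The last line shows why passing numeric values from `ls$Pars` can silently pick the wrong cell even when it does not raise an error.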

Building on this, I want to add a matrix shaped like `Data` to my list, holding the sum of the `Y` column of each data.frame. I tried combining `sapply` and `lapply` with the same subsetting as above, but I get an error:

> ls$SumY <- sapply(ls$Dates, function(d) lapply(ls$Pars, function(p) sum(ls$Data[[p, d]]$Y)))
Error in ls$Data[[p, d]] :
attempt to select less than one element in get1index <real>
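The error can be reproduced outside the loop. A minimal sketch, assuming the same list-matrix with character dimnames: in the first iteration `p` is the number `0.5` and `d` is a `Date`, and `[[` treats both as numeric positions rather than names. `tryCatch()` is used here only so the sketch keeps running after each failure:

```r
ls <- list(Dates = seq.Date(as.Date('2023-01-01'), by = 'day', length.out = 3),
           Pars  = seq(0.5, 2.0, 0.5))
df <- data.frame(X = runif(10), Y = runif(10))
ls$Data <- sapply(ls$Dates, function(d) lapply(ls$Pars, function(p) df))
dimnames(ls$Data) <- list(as.character(ls$Pars), as.character(ls$Dates))

p <- ls$Pars[1]   # 0.5: a number, not the name "0.5"
d <- ls$Dates[2]  # a Date, stored internally as a day count
unclass(d)        # 19359 (days since 1970-01-01)

# A numeric row subscript of 0.5 is used as a position (the error in the question);
# the Date subscript becomes column position 19359, far past the 3 columns
err_p <- tryCatch(ls$Data[[p, '2023-01-02']], error = function(e) e)
err_d <- tryCatch(ls$Data[['1.5', d]], error = function(e) e)
conditionMessage(err_p)
conditionMessage(err_d)
```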

Solution:

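You can't use `Date` and `numeric` objects to look up character row and column names. `[[` treats a numeric subscript as a position, not a name: `0.5` is truncated to position 0 (hence "attempt to select less than one element"), and a `Date` is stored as a day count (19359 for 2023-01-02), far beyond the 3 columns. Convert both to `character` first: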

## demonstration
ls$Data[['1.5', ls$Dates[2]]]$Y
# Error in ls$Data[["1.5", ls$Dates[2]]] : subscript out of bounds

## works if you convert to `character`
ls$Data[['1.5', as.character(ls$Dates[2])]]$Y
 # [1] 0.35346265 0.25918428 0.56523229 0.09214479 0.11412712 0.92853271 0.65296477 0.12045425
 # [9] 0.95620851 0.87551876

## whole thing works if you convert to `character`
sapply(as.character(ls$Dates), function(d)
  lapply(as.character(ls$Pars), function(p)
    sum(ls$Data[[p, d]]$Y)
  )
)
#      2023-01-01 2023-01-02 2023-01-03
# [1,] 4.91783    4.91783    4.91783   
# [2,] 4.91783    4.91783    4.91783   
# [3,] 4.91783    4.91783    4.91783   
# [4,] 4.91783    4.91783    4.91783  
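Two refinements, offered as a sketch rather than as part of the answer above. First, the inner `lapply()` makes each cell a one-element list, so the result is a list-matrix; an inner `sapply()` instead returns a plain numeric 4 x 3 matrix whose dimnames come from the character vectors being iterated over, which can be stored directly as `ls$SumY`. Second, since the sums have the same shape as `Data`, the name lookup can be skipped entirely by visiting every cell with `vapply()` and rebuilding the matrix with `matrix()`. The sketch assumes a distinct random data.frame per cell so that the sums differ; `sum_y2` is a hypothetical name used only for comparison:

```r
set.seed(1)  # assumption: reproducible example data
ls <- list(Dates = seq.Date(as.Date('2023-01-01'), by = 'day', length.out = 3),
           Pars  = seq(0.5, 2.0, 0.5))
# Assumption: a distinct data.frame per cell, so the sums differ
ls$Data <- sapply(ls$Dates, function(d)
  lapply(ls$Pars, function(p) data.frame(X = runif(10), Y = runif(10))))
dimnames(ls$Data) <- list(as.character(ls$Pars), as.character(ls$Dates))

# 1) Inner sapply() instead of lapply(): a numeric matrix labelled by
#    the character row and column names used for the lookups
ls$SumY <- sapply(as.character(ls$Dates), function(d)
  sapply(as.character(ls$Pars), function(p) sum(ls$Data[[p, d]]$Y)))

# 2) No name lookup at all: vapply() walks the cells in storage (column-major)
#    order, and matrix() restores the shape and labels taken from ls$Data
sum_y2 <- matrix(vapply(ls$Data, function(cell) sum(cell$Y), numeric(1)),
                 nrow = nrow(ls$Data), dimnames = dimnames(ls$Data))

ls$SumY['1.5', '2023-01-02']
```

Both forms give the same numeric matrix; the second one also sidesteps the `Date`/`numeric` indexing problem because it never uses the values in `ls$Dates` or `ls$Pars` as subscripts.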
