Here is my data:
df1 <- fread('
id , date1 , date2
id_0001 , 2017-01-01, 2017-01-05
id_0002 , 2017-01-02, 2017-01-08
id_0003 , 2017-01-04, 2017-01-07
')
df2<- fread('
date , value
2017-01-01, 1
2017-01-02, 2
2017-01-03, 5
2017-01-04, 5
2017-01-05, 5
2017-01-06, 3
2017-01-07, 4
2017-01-08, 7
2017-01-09, 5
2017-01-10, 1
2017-01-11, 5
')
I want to summarize (get mean) the value from df2 by each id from df1 witinin the range between rowwise date1 and date2.
The result is like this:
| id | date1 | date2 | value |
|---|---|---|---|
| id_0001 | 2017-01-01 | 2017-01-05 | mean(c(1,2,5,5,5)) |
| id_0002 | 2017-01-02 | 2017-01-08 | mean(c(2,5,5,5,3,4,7)) |
| id_0003 | 2017-01-04 | 2017-01-07 | mean(c(5,5,3,4)) |
I know that I can extend id by date1 and date2 in df1 and perform left_join by dates to df2 and then summarize. However, as data volume increases, r cannot handle vectors of a certain size when further analysis is needed. Is there a data.table way of doing this inter-dataframe summary?
>Solution :
If means are inexact, it seems you’re after an approach like below:
library(data.table)
df1[, value := df2[.SD, on = .(date >= date1, date <= date2), mean(value), by = .EACHI]$V1]
Output:
df1
id date1 date2 value
1: id_0001 2017-01-01 2017-01-05 3.600000
2: id_0002 2017-01-02 2017-01-08 4.428571
3: id_0003 2017-01-04 2017-01-07 4.250000