Consider the following tibble:
df <- tribble(
~dt, ~value, ~avg,
make_date(2023-08-01),1,NA,
make_date(2023,08,02),2,1,
make_date(2023,08,04),3,1.5,
make_date(2023,08,07),4,3,
make_date(2023,08,08),5,4,
make_date(2023,08,09),6,4.5,
make_date(2023,08,10),7,5,
make_date(2023,08,11),8,6,
make_date(2023,08,12),9,7
)
I want to calculate the average value for any records in the three days prior to the dt. So for example, on 2023-08-04 I average the values from 2023-08-03, 2023-08-02, and 2023-08-01 which are NA, 2, 1 so average of these (na.rm) values is 1.5.
So for this example, I want to add the column ‘avg’:
dt value avg
<date> <dbl> <dbl>
2014-01-01 1 NA
2023-08-02 2 1
2023-08-04 3 1.5
2023-08-07 4 3
2023-08-08 5 4
2023-08-09 6 4.5
2023-08-10 7 5
2023-08-11 8 6
2023-08-12 9 7
There may be gaps of any size in the dates. The real application will have many dates (thousands) and be grouped by a subject_id (not included here). And the ‘three’ days prior may need to be repeated for other window sizes.
>Solution :
Here is an approach using slide_index_dbl from the {slider} package. (Side note: if you load the entire {tidyverse} package, you do not need to call the lubridate functions with namespace ::, it is simply done here to be explicit.)
library(tidyverse)
dat <- tibble(
dt = lubridate::ymd(c(
"2023-08-01", "2023-08-02", "2023-08-04", "2023-08-07", "2023-08-08",
"2023-08-09", "2023-08-10", "2023-08-11", "2023-08-12", "2023-08-16",
"2023-08-17", "2023-08-18"
)),
value = 1:12
)
dat %>%
mutate(
res = slider::slide_index_dbl(
.x = value,
.i = dt,
.f = ~ mean(.x, na.rm = TRUE),
.before = lubridate::days(3),
.after = lubridate::days(-1)
)
)