How to calculate an average over a prior date window?

December 30, 2023

Consider the following tibble:

  df <- tribble(
    ~dt, ~value, ~avg,
    make_date(2023-08-01),1,NA,
    make_date(2023,08,02),2,1,
    make_date(2023,08,04),3,1.5,
    make_date(2023,08,07),4,3,
    make_date(2023,08,08),5,4,
    make_date(2023,08,09),6,4.5,
    make_date(2023,08,10),7,5,
    make_date(2023,08,11),8,6,
    make_date(2023,08,12),9,7
  )

I want to calculate the average value for any records in the three days prior to the dt. So for example, on 2023-08-04 I average the values from 2023-08-03, 2023-08-02, and 2023-08-01 which are NA, 2, 1 so average of these (na.rm) values is 1.5.

So for this example, I want to add the column ‘avg’:

  dt         value   avg
<date>       <dbl>  <dbl>
2014-01-01     1     NA  
2023-08-02     2     1  
2023-08-04     3     1.5
2023-08-07     4     3  
2023-08-08     5     4  
2023-08-09     6     4.5
2023-08-10     7     5  
2023-08-11     8     6  
2023-08-12     9     7

There may be gaps of any size in the dates. The real application will have many dates (thousands) and be grouped by a subject_id (not included here). And the ‘three’ days prior may need to be repeated for other window sizes.

>Solution :

Here is an approach using slide_index_dbl from the {slider} package. (Side note: if you load the entire {tidyverse} package, you do not need to call the lubridate functions with namespace ::, it is simply done here to be explicit.)

library(tidyverse)

dat <- tibble(
  dt = lubridate::ymd(c(
    "2023-08-01", "2023-08-02", "2023-08-04", "2023-08-07", "2023-08-08",
    "2023-08-09", "2023-08-10", "2023-08-11", "2023-08-12", "2023-08-16",
    "2023-08-17", "2023-08-18"
    )),
  value = 1:12
  )

dat %>% 
  mutate(
    res = slider::slide_index_dbl(
      .x = value, 
      .i = dt, 
      .f = ~ mean(.x, na.rm = TRUE),
      .before = lubridate::days(3),
      .after = lubridate::days(-1)
    )
  )