How to calculate an average over a prior date window?

Consider the following tibble:

  df <- tribble(
    ~dt, ~value, ~avg,
    make_date(2023,08,01),1,NA,
    make_date(2023,08,02),2,1,
    make_date(2023,08,04),3,1.5,
    make_date(2023,08,07),4,3,
    make_date(2023,08,08),5,4,
    make_date(2023,08,09),6,4.5,
    make_date(2023,08,10),7,5,
    make_date(2023,08,11),8,6,
    make_date(2023,08,12),9,7
  )  

For each row, I want to calculate the average value of any records in the three days prior to dt. For example, for 2023-08-04 I average the values from 2023-08-03, 2023-08-02, and 2023-08-01, which are NA (no record), 2, and 1; ignoring the missing value (na.rm) gives an average of 1.5.

So for this example, I want to add the column ‘avg’:

  dt         value   avg
<date>       <dbl>  <dbl>
2023-08-01     1     NA  
2023-08-02     2     1  
2023-08-04     3     1.5
2023-08-07     4     3  
2023-08-08     5     4  
2023-08-09     6     4.5
2023-08-10     7     5  
2023-08-11     8     6  
2023-08-12     9     7  

There may be gaps of any size in the dates. The real application will have many dates (thousands) and be grouped by a subject_id (not included here). And the ‘three’ days prior may need to be repeated for other window sizes.
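
To make the window definition concrete, here is a minimal base R sketch (not part of the original post) that reproduces the 'avg' column above. The function name prior_window_mean and its days argument are hypothetical. It compares every date with every other date, so it is quadratic in the number of rows and is meant as a reference check rather than a scalable solution.

```r
# Reference sketch in base R: mean of `value` over the `days` calendar days
# strictly before each date. Names here are illustrative assumptions.
prior_window_mean <- function(dt, value, days = 3) {
  vapply(seq_along(dt), function(i) {
    # Rows whose date falls in [dt[i] - days, dt[i] - 1]
    in_window <- dt >= dt[i] - days & dt < dt[i]
    # Return NA (rather than NaN) when no rows fall in the window
    if (any(in_window)) mean(value[in_window], na.rm = TRUE) else NA_real_
  }, numeric(1))
}

dt <- as.Date(c("2023-08-01", "2023-08-02", "2023-08-04", "2023-08-07",
                "2023-08-08", "2023-08-09", "2023-08-10", "2023-08-11",
                "2023-08-12"))
value <- 1:9

avg <- prior_window_mean(dt, value, days = 3)
print(data.frame(dt, value, avg))
```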

>Solution :

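Here is an approach using slide_index_dbl() from the {slider} package. The window runs from three days before each date (.before = days(3)) to one day before it (.after = days(-1)), so the current row is excluded. Rows with nothing in that window, such as the first one, come back as NaN rather than NA, because mean() of an empty vector is NaN. (Side note: library(tidyverse) attaches {lubridate} from tidyverse 2.0.0 onward, so the lubridate:: prefix is optional and is used here only to be explicit. {slider} is not part of the tidyverse and has to be installed separately.)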

library(tidyverse)

dat <- tibble(
  dt = lubridate::ymd(c(
    "2023-08-01", "2023-08-02", "2023-08-04", "2023-08-07", "2023-08-08",
    "2023-08-09", "2023-08-10", "2023-08-11", "2023-08-12", "2023-08-16",
    "2023-08-17", "2023-08-18"
    )),
  value = 1:12
  )

dat %>% 
  mutate(
    res = slider::slide_index_dbl(
      .x = value, 
      .i = dt, 
      .f = ~ mean(.x, na.rm = TRUE),
      .before = lubridate::days(3),
      .after = lubridate::days(-1)
    )
  )
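
The question also mentions grouping by subject_id and repeating the calculation for other window sizes. One possible way to extend the approach, as a sketch under stated assumptions: the helper add_prior_mean, the output column naming (avg_3d, avg_7d), and the sample data are all hypothetical, and plain numbers are passed to .before and .after on the assumption that slider treats them as day offsets for a Date index. Empty windows are mapped to NA instead of NaN.

```r
library(dplyr)
library(slider)

# Hypothetical helper: per-subject mean of `value` over the `days` days
# before each `dt`. Column names subject_id, dt, value are assumptions.
add_prior_mean <- function(data, days = 3) {
  data %>%
    group_by(subject_id) %>%
    arrange(dt, .by_group = TRUE) %>%  # the index must be ascending within each group
    mutate(
      "avg_{days}d" := slide_index_dbl(
        .x = value,
        .i = dt,
        .f = function(x) {
          m <- mean(x, na.rm = TRUE)
          if (is.nan(m)) NA_real_ else m  # empty or all-NA window gives NA
        },
        .before = days,  # window starts `days` days before dt
        .after = -1      # and ends the day before dt
      )
    ) %>%
    ungroup()
}

# Made-up data with two subjects and gaps between dates
dat2 <- tibble(
  subject_id = rep(c("a", "b"), each = 4),
  dt = as.Date(c("2023-08-01", "2023-08-02", "2023-08-04", "2023-08-07",
                 "2023-08-01", "2023-08-03", "2023-08-04", "2023-08-10")),
  value = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# Apply the helper once per window size
res <- dat2 %>%
  add_prior_mean(days = 3) %>%
  add_prior_mean(days = 7)
print(res)
```

Calling the helper once per window size keeps each call simple; for many window sizes, the calls could be chained in a loop or with purrr::reduce().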