Identify index presentations and re-attendances within a 28 day period

My dataset records presentations to a location by an individual. This is tabulated below but included as a dput further down.

Identifier date
"A1" "28/01/2020"
"A1" "01/04/2020"
"A1" "16/08/2020"
"A1" "20/08/2020"
"A1" "30/08/2020"
"A1" "31/10/2020"
"A1" "14/11/2020"
"A1" "26/11/2020"
"A1" "25/12/2020"
"A1" "04/05/2021"
"A1" "08/05/2021"
"A1" "26/07/2021"

The individual attends sporadically and on occasions returns several times within a 28 day period.

Working in date order I want to identify index cases that occur at least 28 days apart and reattendances that occur within the 28 day window.

The resulting data when processed correctly would look like this:

Identifier date index_visit index_id
"A1" "28/01/2020" TRUE 1
"A1" "01/04/2020" TRUE 2
"A1" "16/08/2020" TRUE 3
"A1" "20/08/2020" FALSE 3
"A1" "30/08/2020" FALSE 3
"A1" "31/10/2020" TRUE 4
"A1" "14/11/2020" FALSE 4
"A1" "26/11/2020" FALSE 4
"A1" "25/12/2020" TRUE 5
"A1" "04/05/2021" TRUE 6
"A1" "08/05/2021" FALSE 6
"A1" "26/07/2021" TRUE 7

I would prefer solutions that make use of dplyr as this is what I am most familiar with.

Dput:

data <- structure(list(identifier = c("A1", "A1", "A1", "A1", "A1", "A1",
                                  "A1", "A1", "A1", "A1", "A1", "A1", "A1"),
                    date = structure(c(18520, 18504, 18621, 18580, 18353, 18751,
                                       18289, 18494, 18592, 18490, 18755, 18834, 18566),
                                     class = "Date")),
               row.names = c(NA, -13L), class = "data.frame")

>Solution :

My function time_episodes() was originally written to identify episodes of disease reinfection but is applicable to any episodic event analysis.

It also supports more complex time units like months through lubridate.

# Uncomment to install
# remotes::install_github("NicChr/timeplyr")
library(tidyverse)
library(timeplyr)

data <- as_tibble(data)

episodic_data <- data %>%
  group_by(identifier) %>%
  time_episodes(date, time_by = "day", window = 28, switch_on_boundary = TRUE)

episodic_data %>%
  arrange(identifier, date) %>%
  mutate(index_visit = ep_id_new > 0)
# A tibble: 13 × 7
# Groups:   identifier [1]
   identifier date       t_elapsed ep_start   ep_id ep_id_new index_visit
   <chr>      <date>         <dbl> <date>     <int>     <int> <lgl>      
 1 A1         2020-01-28         0 2020-01-28     1         1 TRUE       
 2 A1         2020-04-01        64 2020-04-01     2         2 TRUE       
 3 A1         2020-08-16       137 2020-08-16     3         3 TRUE       
 4 A1         2020-08-20         4 2020-08-16     3         0 FALSE      
 5 A1         2020-08-30        10 2020-08-16     3         0 FALSE      
 6 A1         2020-09-15        16 2020-08-16     3         0 FALSE      
 7 A1         2020-10-31        46 2020-10-31     4         4 TRUE       
 8 A1         2020-11-14        14 2020-10-31     4         0 FALSE      
 9 A1         2020-11-26        12 2020-10-31     4         0 FALSE      
10 A1         2020-12-25        29 2020-12-25     5         5 TRUE       
11 A1         2021-05-04       130 2021-05-04     6         6 TRUE       
12 A1         2021-05-08         4 2021-05-04     6         0 FALSE      
13 A1         2021-07-26        79 2021-07-26     7         7 TRUE 

Alternative method

data %>%
  group_by(identifier) %>%
  arrange(identifier, date) %>%
  mutate(elapsed = time_elapsed(date, "days")) %>%
  mutate(index_visit = row_number() == 1 | elapsed >= 28)
# A tibble: 13 × 4
# Groups:   identifier [1]
   identifier date       elapsed index_visit
   <chr>      <date>       <dbl> <lgl>      
 1 A1         2020-01-28      NA TRUE       
 2 A1         2020-04-01      64 TRUE       
 3 A1         2020-08-16     137 TRUE       
 4 A1         2020-08-20       4 FALSE      
 5 A1         2020-08-30      10 FALSE      
 6 A1         2020-09-15      16 FALSE      
 7 A1         2020-10-31      46 TRUE       
 8 A1         2020-11-14      14 FALSE      
 9 A1         2020-11-26      12 FALSE      
10 A1         2020-12-25      29 TRUE       
11 A1         2021-05-04     130 TRUE       
12 A1         2021-05-08       4 FALSE      
13 A1         2021-07-26      79 TRUE   

Leave a Reply