Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Count the number of timestamps in a given vector that fall within an interval in R

I want to count the number of events that occur within intervals.

I start with a table that has three columns: start dates, end dates, and the interval created by them.

table <-
  tibble( 
    start = c( "2022-08-02", "2022-10-06", "2023-01-11"), 
    end = c("2022-08-04", "2023-02-06", "2023-02-04"), 
    interval = start %--% end
  )

I also have a vector of timestamp events:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

events <- c(ymd("2022-08-07"), ymd("2022-10-17"), ymd("2023-01-17"), ymd("2023-02-02"))

For each interval in my table, I want to know how many events fell within that interval so that my final table looks something like this (but with the correct counts):

   start       end         interval                       n_events_within_interval
   <chr>       <chr>       <Interval>                                        <int>
 1 2022-08-02  2022-08-04  2022-08-02 UTC--2022-08-04 UTC                        2
 2 2022-10-06  2023-02-06  2022-10-06 UTC--2023-02-06 UTC                        2
 3 2023-01-11  2023-02-04  2023-01-11 UTC--2023-02-04 UTC                        2

I have tried this so far, but I’m not sure how to get mutate to cycle through the events vector for each row:

library(tidyverse)
library(lubridate)
library(purrr)

table <-
  tibble( 
    start = c( "2022-08-02", "2022-10-06", "2023-01-11"), 
    end = c("2022-08-04", "2023-02-06", "2023-02-04"), 
    interval = start %--% end
  )

table

#> # A tibble: 3 × 3
#>   start      end        interval                      
#>   <chr>      <chr>      <Interval>                    
#> 1 2022-08-02 2022-08-04 2022-08-02 UTC--2022-08-04 UTC
#> 2 2022-10-06 2023-02-06 2022-10-06 UTC--2023-02-06 UTC
#> 3 2023-01-11 2023-02-04 2023-01-11 UTC--2023-02-04 UTC


events <- c(ymd("2022-08-07"), ymd("2022-10-17"), ymd("2023-01-17"), ymd("2023-02-02"))
events
#> [1] "2022-08-07" "2022-10-17" "2023-01-17" "2023-02-02"


table %>% 
  mutate(
    n_events_within_interval = sum(events %within% interval)
  )
#> Warning in as.numeric(a) - as.numeric(int@start): longer object length is not a
#> multiple of shorter object length
#> Warning in as.numeric(a) - as.numeric(int@start) <= int@.Data: longer object
#> length is not a multiple of shorter object length
#> Warning in as.numeric(a) - as.numeric(int@start): longer object length is not a
#> multiple of shorter object length
#> # A tibble: 3 × 4
#>   start      end        interval                       n_events_within_interval
#>   <chr>      <chr>      <Interval>                                        <int>
#> 1 2022-08-02 2022-08-04 2022-08-02 UTC--2022-08-04 UTC                        2
#> 2 2022-10-06 2023-02-06 2022-10-06 UTC--2023-02-06 UTC                        2
#> 3 2023-01-11 2023-02-04 2023-01-11 UTC--2023-02-04 UTC                        2

Created on 2023-02-15 with reprex v2.0.2

>Solution :

We could use rowwise

library(dplyr)
library(lubridate)
table %>% 
  rowwise %>% 
  mutate(n_events_within_interval = sum(events %within% interval)) %>%
  ungroup
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading