Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Separate intersecting time intervals by number of segments (R)

I’m working on a dataset of time intervals. Some intervals overlap. I would like to take the raw interval data, and break it into consecutive intervals, by number of overlaps. In the toy data below, there are 3 intervals. My desired output is a data frame that contains the start and stop of where there’s only one ID, then start and stop where ID 1 and ID 2 intersect, then start and stop of where IDs 1-3 intersect, then start and stop of where IDs 1 and 3 intersect, and finally the start and stop of the remainder of ID 1.

library(lubridate)
library(ggplot2)

df <- structure(list(ID = 1:3, Start = structure(c(1690740180, 1690740480, 
1690741380), class = c("POSIXct", "POSIXt"), tzone = "America/Iqaluit"), 
    End = structure(c(1690751520, 1690742140, 1690742280), class = c("POSIXct", 
    "POSIXt"), tzone = "America/Iqaluit")), row.names = 3:5, class = "data.frame")

ggplot(df) + geom_segment(aes(x = Start, xend = End, y = as.factor(ID), yend = as.factor(ID)))

Desired output should look like this:

  Intervals               Start                 End
         1 2023-07-30 14:03:00 2023-07-30 14:07:59
         2 2023-07-30 14:08:00 2023-07-30 14:22:59
         3 2023-07-30 14:23:00 2023-07-30 14:35:40
         2 2023-07-30 14:35:40 2023-07-30 14:38:00
         1 2023-07-30 14:38:00 2023-07-30 15:06:40

I can do this by interpolating the data down to 1 sec and checking for intersections, but I was hoping for a cleaner solution.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Here’s a base R solution:

alltimes <- unique(sort(c(df$Start, df$End)))
intervals <- sapply(intervals[-length(intervals)],
                    function(tm) df$Start <= tm & tm < df$End)
intervals
#       [,1]  [,2] [,3]  [,4]  [,5]
# [1,]  TRUE  TRUE TRUE  TRUE  TRUE
# [2,] FALSE  TRUE TRUE FALSE FALSE
# [3,] FALSE FALSE TRUE  TRUE FALSE

In intervals, each row is a row from the original df, each column is a time segment, and the value indicates if the original df row was found in that time segment. We can take the sums of each column to create the Intervals column, and then the Start and End columns are merely the pairs of our alltimes vector.

data.frame(
  Intervals = colSums(intervals),
  Start = alltimes[-length(alltimes)],
  End = alltimes[-1]
)
#   Intervals               Start                 End
# 1         1 2023-07-30 14:03:00 2023-07-30 14:08:00
# 2         2 2023-07-30 14:08:00 2023-07-30 14:23:00
# 3         3 2023-07-30 14:23:00 2023-07-30 14:35:40
# 4         2 2023-07-30 14:35:40 2023-07-30 14:38:00
# 5         1 2023-07-30 14:38:00 2023-07-30 17:12:00

I’m not certain if the new End should be the same as the next Start or offset by one second, your expected output uses both. Also, I don’t know how your last row is 15:06:40 (is not in your raw data), I suspect it’s an artifact of your real data and not the sample.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading