Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Generate id for durations with dplyr

I have a column with dates (Time), I consider one duration as consecutive times of 1s:

data <- data.frame(Time = c("2021-12-01 01:01:01","2021-12-01 01:01:02","2021-12-01 01:01:03","2021-12-01 01:01:05","2021-12-01 01:01:06"))

I would like to generate an Id for each duration like this:

data <- data.frame(Time = c("2021-12-01 01:01:01","2021-12-01 01:01:02","2021-12-01 01:01:03","2021-12-01 01:01:05","2021-12-01 01:01:06"),Id = c(1,1,1,2,2))

With dplyr…
Thank you

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Up front:

cumsum(c(TRUE, as.numeric(diff(as.POSIXct(data$Time)), units = "secs") > 1L))
# [1] 1 1 1 2 2

First, you should really be working with real timestamps and not strings. If you’re doing anything else with your Time field, it is almost certainly going to be a number-like operation, so you should do this up-front with

data$Time <- as.POSIXct(data$Time)

This works easily here because they are well-formed along the default format of "%Y-%m-%d %H:%M:%S"; see ?strptime for the %-codes.

From here, you want to keep track of when a difference in time is more than 1 second. The differencing is easy enough with:

as.numeric(diff(data$Time), units = "secs")
# [1] 1 1 2 1

Really, the key operator is diff, but it can report minutes or hours or such if the data is widely-enough spaced; there’s an internal heuristic for that. Wrapping it in as.numeric(., units="secs") forces it to always be in seconds.

From here, we need a cumulative sum of when it is above 1, ergo > 1L, so cumsum(. > 1L).

Note that we have input length 5 but output length 4, this makes sense realizing that differences are between two elements. We force the first difference-test to be TRUE. If you have since changed to POSIXt-class, then the original code is reduced slightly to be

cumsum(c(TRUE, as.numeric(diff(data$Time), units = "secs") > 1L))

and therefore to store it as Id,

data$Id <- cumsum(c(TRUE, as.numeric(diff(data$Time), units = "secs") > 1L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading