Generate id for durations with dplyr

December 13, 2021

I have a column of timestamps (Time). I consider one duration to be a run of consecutive timestamps spaced 1 second apart:

data <- data.frame(Time = c("2021-12-01 01:01:01","2021-12-01 01:01:02","2021-12-01 01:01:03","2021-12-01 01:01:05","2021-12-01 01:01:06"))

I would like to generate an Id for each duration like this:

data <- data.frame(Time = c("2021-12-01 01:01:01","2021-12-01 01:01:02","2021-12-01 01:01:03","2021-12-01 01:01:05","2021-12-01 01:01:06"),Id = c(1,1,1,2,2))

With dplyr…
Thank you

Solution:

Up front:

cumsum(c(TRUE, as.numeric(diff(as.POSIXct(data$Time)), units = "secs") > 1L))
# [1] 1 1 1 2 2

First, you should really be working with real timestamps and not strings. If you’re doing anything else with your Time field, it is almost certainly going to be a number-like operation, so you should do this up-front with

data$Time <- as.POSIXct(data$Time)

This works easily here because the strings are well-formed and already follow the default format of "%Y-%m-%d %H:%M:%S"; see ?strptime for the %-codes.
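If the strings were in some other layout, the format would have to be spelled out. The snippet below is a small sketch of that idea; the day-first string and the choice of tz = "UTC" are assumptions made for illustration and are not part of the question's data.

```r
# Timestamps in the default layout, as in the question.
x <- c("2021-12-01 01:01:01", "2021-12-01 01:01:02")

# Default parsing and explicit-format parsing give the same instants.
t_default  <- as.POSIXct(x, tz = "UTC")
t_explicit <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
all(t_default == t_explicit)
# [1] TRUE

# A hypothetical day-first string needs its own format codes.
t_other <- as.POSIXct("01/12/2021 01:01:01",
                      format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
t_other == t_default[1]
# [1] TRUE
```

Fixing the time zone explicitly is optional, but it keeps the parsed values independent of the machine's locale settings.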

From here, you want to keep track of when a difference in time is more than 1 second. The differencing is easy enough with:

as.numeric(diff(data$Time), units = "secs")
# [1] 1 1 2 1

Really, the key operator is diff, but it can report minutes or hours or such if the data is widely-enough spaced; there’s an internal heuristic for that. Wrapping it in as.numeric(., units="secs") forces it to always be in seconds.
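To see why the explicit units matter, here is a short illustration with timestamps spaced minutes apart. The values are made up for this example and are not from the question.

```r
# Hypothetical timestamps spaced 5 and 10 minutes apart.
x <- as.POSIXct(c("2021-12-01 01:00:00",
                  "2021-12-01 01:05:00",
                  "2021-12-01 01:15:00"), tz = "UTC")

d <- diff(x)

# diff() picked minutes on its own because the smallest gap is at least 60 s.
units(d)
# [1] "mins"

# Without units, the numbers are in whatever unit diff() chose.
as.numeric(d)
# [1]  5 10

# With units = "secs", the result is always in seconds.
as.numeric(d, units = "secs")
# [1] 300 600
```

A test such as > 1L applied to the first result would silently compare minutes, not seconds, which is the mistake the wrapper guards against.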

From here, we need a cumulative sum of when it is above 1, ergo > 1L, so cumsum(. > 1L).
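As a quick illustration of that step on its own, using the four differences printed above (the variable names gaps and is_break are only for this sketch):

```r
# The differences in seconds from the example data.
gaps <- c(1, 1, 2, 1)

# TRUE wherever the gap to the previous timestamp is more than 1 second.
is_break <- gaps > 1L
is_break
# [1] FALSE FALSE  TRUE FALSE

# The running count of breaks so far.
cumsum(is_break)
# [1] 0 0 1 1
```

The result has only four values and starts at 0, so it cannot yet be used directly as a per-row Id.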

Note that the input has length 5 but the differences have length 4. This makes sense, since each difference is taken between two adjacent elements. To get one value per row, we prepend TRUE as the first difference-test, so the first row always opens a new duration and the Ids start at 1. If you have already converted Time to POSIXt-class, then the original code is reduced slightly to be

cumsum(c(TRUE, as.numeric(diff(data$Time), units = "secs") > 1L))

and therefore to store it as Id,

data$Id <- cumsum(c(TRUE, as.numeric(diff(data$Time), units = "secs") > 1L))
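Since the question asks for dplyr, the same expression can be placed inside mutate(). This is only a sketch that wraps the base R logic above, assuming the dplyr package is installed; the tz = "UTC" argument and the name result are choices made for this example.

```r
library(dplyr)

data <- data.frame(Time = c("2021-12-01 01:01:01", "2021-12-01 01:01:02",
                            "2021-12-01 01:01:03", "2021-12-01 01:01:05",
                            "2021-12-01 01:01:06"))

result <- data %>%
  mutate(
    # Convert the strings to real timestamps first.
    Time = as.POSIXct(Time, tz = "UTC"),
    # A new duration starts on the first row or after a gap of more than 1 s.
    Id = cumsum(c(TRUE, as.numeric(diff(Time), units = "secs") > 1L))
  )

result$Id
# [1] 1 1 1 2 2
```

This assumes the rows are already sorted by Time and that the data frame has at least one row; if the order is not guaranteed, an arrange(Time) step before mutate() would be needed.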

dplyr

by MR

Published December 13, 2021
