Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to determine difference in days between two dates across two columns and two rows by group?

I am looking to determine the difference in days by groups across two columns and two rows. Essentially subtract from the End Day by the subsequent Start Day in the subsequent row and record the difference as new column in the data frame and start over when a new group (ID) has been identified.

Start_Date   End_Date     ID   
  
2014-05-09   2015-05-08   01
2015-05-09   2016-05-08   01 
2016-05-11   2017-05-10   01
2017-05-11   2018-05-10   01
2016-08-29   2017-08-28   02
2017-08-29   2018-08-28   02

The result should be something like table below.

Start_Date   End_Date     ID   Days_Difference 
  
2014-05-09   2015-05-08   01         NA
2015-05-09   2016-05-08   01         01
2016-05-11   2017-05-10   01         03
2017-05-11   2018-05-10   01         01
2016-08-29   2017-08-28   02         NA
2017-08-29   2018-08-28   02         01

Essentially I want to take the difference of the End Date and its left diagonal Start date across groups (ID). I am having a really hard time with this one. I don’t think my code would be helpful. Any solution using tidyverse, data.table, or base R would be greatly appreciated!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We may get the difference between the lead (next element) of ‘Start_Date’ and ‘End_Date’ after grouping

library(dplyr)
df1 <- df1 %>%
   mutate(across(ends_with("Date"), as.Date)) %>%
   group_by(ID) %>% 
   mutate(Days_Difference = as.numeric(lag(lead(Start_Date) - End_Date))) %>% 
   ungroup

-output

df1
# A tibble: 6 × 4
  Start_Date End_Date      ID Days_Difference
  <date>     <date>     <int>           <dbl>
1 2014-05-09 2015-05-08     1              NA
2 2015-05-09 2016-05-08     1               1
3 2016-05-11 2017-05-10     1               3
4 2017-05-11 2018-05-10     1               1
5 2016-08-29 2017-08-28     2              NA
6 2017-08-29 2018-08-28     2               1

Or a similar logic with data.table

library(data.table)
setDT(df1)[, Days_Difference := 
    as.numeric(shift(shift(as.IDate(Start_Date), type = "lead") - 
       as.IDate(End_Date))), ID]

-output

> df1
   Start_Date   End_Date    ID Days_Difference
       <char>     <char> <int>           <num>
1: 2014-05-09 2015-05-08     1              NA
2: 2015-05-09 2016-05-08     1               1
3: 2016-05-11 2017-05-10     1               3
4: 2017-05-11 2018-05-10     1               1
5: 2016-08-29 2017-08-28     2              NA
6: 2017-08-29 2018-08-28     2               1

data

df1 <- structure(list(Start_Date = c("2014-05-09", "2015-05-09", "2016-05-11", 
"2017-05-11", "2016-08-29", "2017-08-29"), End_Date = c("2015-05-08", 
"2016-05-08", "2017-05-10", "2018-05-10", "2017-08-28", "2018-08-28"
), ID = c(1L, 1L, 1L, 1L, 2L, 2L)), class = "data.frame", 
row.names = c(NA, 
-6L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading