Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Rolling join two data.tables with date in R

I have two data tables called dt1 (which has date column with all weekdays) and dt2 (which has date column with some weekdays and some weekends). I want to join dt2 table into dt1 table so that –

  1. The dates in dt2 get aligned with dates in dt1, we will call the new column dates_2.
  2. In case dt2 has a date (Saturday or Sunday), which is not there in dt1, then move that date to the next date in dt1.
  3. The dates where dt2 doesn’t have a corresponding date in dt1, then fill NA in dates_2.
library(data.table)
date = seq(from = as.Date("2019-12-31"), to=as.Date("2020-12-31"), by=paste(1,"days"))
date <- date[!weekdays(date) %in% c('Saturday','Sunday')]
  
dt1 = data.table(group = "A", date1 = as.Date(date))
dt2 = structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A"
), date2 = structure(c(18286, 18391, 18468, 18566, 18657, 18741, 
18832, 18923), class = "Date")), row.names = c(NA, -8L), class = c("data.table", "data.frame"))

I have tried the following code –

dt3 = dt1[dt2, on = c("group" = "group"), date_2 := date2, roll = T]
dt3[]

But the dt3 does not roll properly.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Please guide me on what I am doing wrong.

Here is the expected result. For brevity, I am showing the only subset of the expected result

   group      date1     date_2
...
...
...
1:     A 2020-05-06     NA
2:     A 2020-05-07     NA
3:     A 2020-05-08     NA
4:     A 2020-05-11     2020-05-09
5:     A 2020-05-12     NA
6:     A 2020-05-13     NA
7:     A 2020-05-14     NA
...
...
...
and the series goes on

Since the date "2020-05-09" is not there in dt1, the date in dt2 got rolled over to the next date. Rest all the dates are NA.

>Solution :

We may use non-equi join

dt1[dt2, date_2 := date2, on = .(group, date1 > date2), mult = "first"]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading