Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Combine datasets by nearest time 'after'

I would like to combine two datasets based on the nearest time after, but I don’t know how to specify this with data.table.

The x dataset is here and the ‘y’ dataset is below:

#y data.table 
structure(list(date.time = structure(c(1551107760, 1551171420, 
1551181500, 1551181560, 1551182400, 1551188100, 1551194700, 1551262320, 
1551262380, 1551264420, 1551267960, 1551272400, 1551338580, 1551343140, 
1551343380, 1551349800, 1551351600, 1551369000, 1551424680, 1551425520, 
1551429000, 1551446880), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    id = c(12L, 3L, 3L, 10L, 10L, 11L, 4L, 10L, 3L, 11L, 4L, 
    12L, 2L, 10L, 12L, 12L, 12L, 12L, 2L, 12L, 2L, 10L)), row.names = c(NA, 
-22L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x000001dc21061ef0>, sorted = "date.time")

If I specify roll=nearest:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

require(data.table)
setDT(x)
setDT(y)

x[, reference.x := date.time] #ref to check join 
setkey(y, date.time)
setkey(x, date.time)

#join dataframe 
results <- x[y, roll="nearest"]

You will see that the function works as it should, however I would like the join to be based on the nearest time of x that occurred after y. In the results you will see that the date and time of reference.x are before the date and time of y.

> results
            date.time    class         reference.x id
 1: 2019-02-25 15:16:00 8.2 14.8 2019-02-25 15:16:00 12 #on or after
 2: 2019-02-26 08:57:00 8.2 14.8 2019-02-26 08:57:00  3 #on or after 
 3: 2019-02-26 11:45:00 8.2 14.8 2019-02-26 11:45:00  3 #on or after
 4: 2019-02-26 11:46:00 8.2 14.8 2019-02-26 11:45:00 10 #before (incorrect) 
 5: 2019-02-26 12:00:00 8.2 14.8 2019-02-26 11:59:00 10
 6: 2019-02-26 13:35:00 8.2 14.8 2019-02-26 13:35:00 11
 7: 2019-02-26 15:25:00 8.2 14.8 2019-02-26 15:25:00  4
 8: 2019-02-27 10:12:00 8.2 14.8 2019-02-27 07:10:00 10
 9: 2019-02-27 10:13:00 8.2 14.8 2019-02-27 07:10:00  3
10: 2019-02-27 10:47:00 8.2 14.8 2019-02-27 07:10:00 11
11: 2019-02-27 11:46:00 8.2 14.8 2019-02-27 07:10:00  4
12: 2019-02-27 13:00:00 8.2 14.8 2019-02-27 07:10:00 12
13: 2019-02-28 07:23:00 8.2 14.8 2019-02-28 07:26:00  2
14: 2019-02-28 08:39:00 8.2 14.8 2019-02-28 08:39:00 10
15: 2019-02-28 08:43:00 8.2 14.8 2019-02-28 08:42:00 12
16: 2019-02-28 10:30:00 8.2 14.8 2019-02-28 10:30:00 12
17: 2019-02-28 11:00:00 8.2 14.8 2019-02-28 10:59:00 12
18: 2019-02-28 15:50:00 8.2 14.8 2019-02-28 15:49:00 12
19: 2019-03-01 07:18:00 8.2 14.8 2019-03-01 07:18:00  2
20: 2019-03-01 07:32:00 8.2 14.8 2019-03-01 07:32:00 12
21: 2019-03-01 08:30:00 8.2 14.8 2019-03-01 08:30:00  2
22: 2019-03-01 13:28:00 8.2 14.8 2019-03-01 13:27:00 10
>

structure(list(date.time = structure(c(1551107760, 1551171420, 
1551181500, 1551181560, 1551182400, 1551188100, 1551194700, 1551262320, 
1551262380, 1551264420, 1551267960, 1551272400, 1551338580, 1551343140, 
1551343380, 1551349800, 1551351600, 1551369000, 1551424680, 1551425520, 
1551429000, 1551446880), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    class = c("8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", 
    "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", 
    "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", 
    "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", "8.2 14.8", 
    "8.2 14.8", "8.2 14.8", "8.2 14.8"), reference.x = structure(c(1551107760, 
    1551171420, 1551181500, 1551181500, 1551182340, 1551188100, 
    1551194700, 1551251400, 1551251400, 1551251400, 1551251400, 
    1551251400, 1551338760, 1551343140, 1551343320, 1551349800, 
    1551351540, 1551368940, 1551424680, 1551425520, 1551429000, 
    1551446820), tzone = "UTC", class = c("POSIXct", "POSIXt"
    )), id = c(12L, 3L, 3L, 10L, 10L, 11L, 4L, 10L, 3L, 11L, 
    4L, 12L, 2L, 10L, 12L, 12L, 12L, 12L, 2L, 12L, 2L, 10L)), sorted = "date.time", class = c("data.table", 
"data.frame"), row.names = c(NA, -22L), .internal.selfref = <pointer: 0x000001dc21061ef0>)

Any help would be greatly appreciated.

>Solution :

The option you are looking for is roll = -Inf

From the documentation (?data.table):

-Inf rolls backwards instead; i.e., next observation carried backward (NOCB).

> results <- x[y, roll=-Inf]
> head(results)

             date.time    class         reference.x id
1: 2019-02-25 15:16:00 8.2 14.8 2019-02-25 15:16:00 12
2: 2019-02-26 08:57:00 8.2 14.8 2019-02-26 08:57:00  3
3: 2019-02-26 11:45:00 8.2 14.8 2019-02-26 11:45:00  3
4: 2019-02-26 11:46:00 8.2 14.8 2019-02-26 11:47:00 10
5: 2019-02-26 12:00:00 8.2 14.8 2019-02-26 12:01:00 10
6: 2019-02-26 13:35:00 8.2 14.8 2019-02-26 13:35:00 11
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading