Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

summarising a 2 column data.table to continuous integers

Assume a toy data.table of two integer columns x aand y. The x coumn could contain any integer but y column will have a value one larger (next consecutive integer).

data.table::data.table(
           x = c(1L, 3L, 7L, 8L, 11L, 12L, 13L, 15L),
           y = c(2L, 4L, 8L, 9L, 12L, 13L, 14L, 16L)
)
#>     x  y
#> 1:  1  2
#> 2:  3  4
#> 3:  7  8
#> 4:  8  9
#> 5: 11 12
#> 6: 12 13
#> 7: 13 14
#> 8: 15 16

I want to collapse the rows that have consecutive running integer ranges (having one number common between rows) such that in the new data.table x has the minimum and y the max of the range. The only condition is that the range of integers (x:y) will include all numbers that were present in the original rows that are replaced.

Here is the desired output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

data.table::data.table(
           x = c(1L, 3L, 7L, 11L, 15L),
           y = c(2L, 4L, 9L,  14L, 16L)
)
#>     x  y
#> 1:  1  2
#> 2:  3  4
#> 3:  7  9
#> 4: 11 14
#> 5: 15 16

Created on 2022-04-28 by the reprex package (v2.0.1)

This is such a simple requirement but I am unable to get my head around to solve it. There may be many ways to solve this but I am looking for an elegant way.

I am open to any solution using either data.table or dpylr.

>Solution :

Create a grouping column and then summarise to return the first of ‘x’ column and last of ‘y’ column – grouping is created by checking whether the lag values of ‘y’ are not equal to the current values of ‘x’, get the cumulative sum

library(data.table)
dt[, .(x = first(x), y = last(y)), .(grp = cumsum(shift(y, 
     fill = first(y)) != x))][, grp := NULL][]

-output

      x     y
   <int> <int>
1:     1     2
2:     3     4
3:     7     9
4:    11    14
5:    15    16
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading