I have a data.frame where each id maps to several discontiguous linear intervals, which do not overlap, and are sorted in ascending order:
df <- data.frame(id = c(rep("id1",3),rep("id2",4)),
start = c(101,220,307,550,658,742,855),
end = c(154,246,326,625,712,811,944),
stringsAsFactors = F)
I’d like to add new start and end columns that will cumulatively sum up the interval widths and show the cumulative start and end coordinates.
So, for the example df above, these new start and end columns (cum.start, cum.end) will be:
df$cum.start <- c(1,55,82,1,77,132,202)
df$cum.end <- c(54,81,101,76,131,201,291)
Any dplyr way of doing this?
>Solution :
We can use lag and cumsum:
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(cum.start = c(1, lag(cumsum(end - start + 1))[-1] + 1) ,
cum.end = cumsum(end - start + 1))
#> # A tibble: 7 x 5
#> # Groups: id [2]
#> id start end cum.start cum.end
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 id1 101 154 1 54
#> 2 id1 220 246 55 81
#> 3 id1 307 326 82 101
#> 4 id2 550 625 1 76
#> 5 id2 658 712 77 131
#> 6 id2 742 811 132 201
#> 7 id2 855 944 202 291