I have a data df2
date X Days
2020-01-06 525 NA
2020-01-07 799 NA
2020-01-08 782 NA
2020-01-09 542 NA
2020-01-10 638 5
2020-01-11 1000 5
2020-01-12 1400 3
2020-01-13 3500 1
I want to count how many days it will take for the sum of X to surpass a value. In this case, the value is 3000.
For example on 1/13, it took 1 day because X is 3500, so it already surpassed 3000. On 1/12 it took 3 days (1400+1000+638)=3038.
I wish to get the column Days.
dput(df2)
structure(list(date = structure(c(1578268800, 1578355200, 1578441600,
1578528000, 1578614400, 1578700800, 1578787200, 1578873600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), X = c(525, 799, 782, 542, 638, 1000,
1400, 3500), Days = c(NA, NA, NA, NA, 5, 5, 3, 1)), class = "data.frame", row.names = c(NA,
-8L))
>Solution :
I think a rolling-function works well. Unlike most rolling functions which have a fixed window that is smaller than the length of data, we will intentionally make this full-width.
zoo::rollapplyr(
df2$X, nrow(df2),
FUN = function(z) which(cumsum(rev(z)) > 3000)[1],
partial = TRUE)
# [1] NA NA NA NA 5 5 3 1
(I’m ignoring date, assuming that the rows are consecutive-days.)