I have a long time-series data frame grouped by id. The series have different start dates and also missing observations. I want to complete the missing observations by completing the date and id combinations and filling them with 0.
What I want to avoid is completing missing observations at the beginning of a series, because a missing start only indicates that the series begins later (for example, because the product was launched on a different date).
In my reprex I used complete() from tidyr. It does the opposite of what I want: instead of completing id "A1" with "2015-01-04", it completes id "B1" with "2015-01-01", which is not needed here. Does complete() always create groups of the same size? If so, maybe it is the wrong function.
How can I achieve the opposite in the following example?
library(tidyr)

data <- data.frame(
  id    = c(rep("A1", 6), rep("B1", 5)),
  value = seq(1, 9, length.out = 11),
  date  = as.Date(c(
    # A1 starts on 2015-01-01; 2015-01-04 is missing
    "2015-01-01", "2015-01-02", "2015-01-03",
    "2015-01-05", "2015-01-06", "2015-01-07",
    # B1 starts later, on 2015-01-02; 2015-01-04 is missing as well
    "2015-01-02", "2015-01-03", "2015-01-05",
    "2015-01-06", "2015-01-07"
  ))
)

data %>% complete(date, id, fill = list(value = 0))
Solution:
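complete(date, id) only crosses values that already occur in the data: every id gets every date that appears anywhere, and 2015-01-04, which appears for neither id, is never created. Instead, group by id and provide the dates to fill explicitly, running from each id's own first date to its last: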
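library(dplyr)  # group_by() and ungroup() come from dplyr, not tidyr

data %>%
  group_by(id) %>%
  # seq() over Dates with by = 1 steps one day at a time, per id
  complete(date = seq(min(date), max(date), by = 1), fill = list(value = 0)) %>%
  ungroup()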
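A variant of the same idea, offered as a sketch: tidyr's full_seq() builds the per-group date sequence in place of seq(min(date), max(date), by = 1), assuming tidyr's Date method for full_seq() (period = 1 means one day). The summary at the end only checks that each id keeps its own start date, so no leading zeros are added. The names `filled` and `check` are illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
data <- data.frame(
  id    = c(rep("A1", 6), rep("B1", 5)),
  value = seq(1, 9, length.out = 11),
  date  = as.Date(c("2015-01-01", "2015-01-02", "2015-01-03",
                    "2015-01-05", "2015-01-06", "2015-01-07",
                    "2015-01-02", "2015-01-03", "2015-01-05",
                    "2015-01-06", "2015-01-07"))
)

filled <- data %>%
  group_by(id) %>%
  # full_seq(date, 1) spans this id's first to last date in 1-day steps,
  # so nothing is added before an id's first observation
  complete(date = full_seq(date, 1), fill = list(value = 0)) %>%
  ungroup()

# Sanity check: start date, row count and number of filled zeros per id
check <- filled %>%
  group_by(id) %>%
  summarise(start = min(date), n = n(), zeros = sum(value == 0))
print(check)
```

With the question's data, A1 should keep its start of 2015-01-01 and B1 its start of 2015-01-02, each with 2015-01-04 filled in as 0.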