I thought this would be trivial, and I think it must be, but I am very tired and stuck at this problem at the moment.
Consider a df with two columns, one with a year, and the other with a binary variable indicating some event.
df <- data.frame(year = c(2000,2001,2002,2003,2004, 2005,2006,2007,2008,2010),
flag = c(0,0,0,1,0,0,0,1,0,0))
I want to create a third column that simply counts the years since the last flag and that resets when a new flag appears, like so:
I thought this code would do the job:
First, add a 0 as "year_since" for every year with a flag, then, if there was a flag in the previous year, add 1 to the value of the previous "year_since".
df <- df %>% mutate(year_since = ifelse(flag == 1, 0, NA)) %>%
mutate(year_since = ifelse(dplyr::lag(flag, n=1, order_by = "year") == 1 & is.na(year_since),
dplyr::lag(year_since, n=1, order_by = "year")+1, year_since))
However, this returns NA for every row that should be 1,2,3, and so on.
>Solution :
You could do
df %>%
group_by(group = cumsum(flag)) %>%
mutate(year_since = ifelse(group == 0, NA, seq(n()) - 1)) %>%
ungroup() %>%
select(-group)
#> # A tibble: 10 x 3
#> year flag year_since
#> <dbl> <dbl> <dbl>
#> 1 2000 0 NA
#> 2 2001 0 NA
#> 3 2002 0 NA
#> 4 2003 1 0
#> 5 2004 0 1
#> 6 2005 0 2
#> 7 2006 0 3
#> 8 2007 1 0
#> 9 2008 0 1
#> 10 2010 0 2
Created on 2022-09-16 with reprex v2.0.2

