Sequentially label specific values in a series of values in R

October 3, 2022

I have a table that looks like this:

         Date Season
1  2022-01-01  Val_1
2  2022-01-02  Val_1
3  2022-01-03  Val_1
4  2022-01-04  Val_2
5  2022-01-05  Val_2
6  2022-01-06  Val_2
7  2022-01-07  Val_1
8  2022-01-08  Val_1
9  2022-01-09  Val_1
10 2022-01-10  Val_2
11 2022-01-11  Val_2
12 2022-01-12  Val_2
13 2022-01-13  Val_1
14 2022-01-14  Val_1
15 2022-01-15  Val_1

What I want to do is label each sequence of continuous Season values from 1 to the total number of continuous sequences that exist in the column, for every value that is in the column. I have seen similar solutions solved with function like rle, but I don’t currently see how to mold it to this problem. Here is an example of the output I want:

         Date Season Season_Num
1  2022-01-01  Val_1          1
2  2022-01-02  Val_1          1
3  2022-01-03  Val_1          1
4  2022-01-04  Val_2          1
5  2022-01-05  Val_2          1
6  2022-01-06  Val_2          1
7  2022-01-07  Val_1          2
8  2022-01-08  Val_1          2
9  2022-01-09  Val_1          2
10 2022-01-10  Val_2          2
11 2022-01-11  Val_2          2
12 2022-01-12  Val_2          2
13 2022-01-13  Val_1          3
14 2022-01-14  Val_1          3
15 2022-01-15  Val_1          3

>Solution :

With a single mutate call, using cumsum and lag:

library(dplyr)
df %>% 
  mutate(Season_num = cumsum(Season == "Val_1" & lag(Season, default = "Val_2") != Season))

#          Date Season Season_num
# 1  2022-01-01  Val_1          1
# 2  2022-01-02  Val_1          1
# 3  2022-01-03  Val_1          1
# 4  2022-01-04  Val_2          1
# 5  2022-01-05  Val_2          1
# 6  2022-01-06  Val_2          1
# 7  2022-01-07  Val_1          2
# 8  2022-01-08  Val_1          2
# 9  2022-01-09  Val_1          2
# 10 2022-01-10  Val_2          2
# 11 2022-01-11  Val_2          2
# 12 2022-01-12  Val_2          2
# 13 2022-01-13  Val_1          3
# 14 2022-01-14  Val_1          3
# 15 2022-01-15  Val_1          3