I have a table that looks like this:
Date Season
1 2022-01-01 Val_1
2 2022-01-02 Val_1
3 2022-01-03 Val_1
4 2022-01-04 Val_2
5 2022-01-05 Val_2
6 2022-01-06 Val_2
7 2022-01-07 Val_1
8 2022-01-08 Val_1
9 2022-01-09 Val_1
10 2022-01-10 Val_2
11 2022-01-11 Val_2
12 2022-01-12 Val_2
13 2022-01-13 Val_1
14 2022-01-14 Val_1
15 2022-01-15 Val_1
What I want to do is label each sequence of continuous Season values from 1 to the total number of continuous sequences that exist in the column, for every value that is in the column. I have seen similar solutions solved with function like rle, but I don’t currently see how to mold it to this problem. Here is an example of the output I want:
Date Season Season_Num
1 2022-01-01 Val_1 1
2 2022-01-02 Val_1 1
3 2022-01-03 Val_1 1
4 2022-01-04 Val_2 1
5 2022-01-05 Val_2 1
6 2022-01-06 Val_2 1
7 2022-01-07 Val_1 2
8 2022-01-08 Val_1 2
9 2022-01-09 Val_1 2
10 2022-01-10 Val_2 2
11 2022-01-11 Val_2 2
12 2022-01-12 Val_2 2
13 2022-01-13 Val_1 3
14 2022-01-14 Val_1 3
15 2022-01-15 Val_1 3
>Solution :
With a single mutate call, using cumsum and lag:
library(dplyr)
df %>%
mutate(Season_num = cumsum(Season == "Val_1" & lag(Season, default = "Val_2") != Season))
# Date Season Season_num
# 1 2022-01-01 Val_1 1
# 2 2022-01-02 Val_1 1
# 3 2022-01-03 Val_1 1
# 4 2022-01-04 Val_2 1
# 5 2022-01-05 Val_2 1
# 6 2022-01-06 Val_2 1
# 7 2022-01-07 Val_1 2
# 8 2022-01-08 Val_1 2
# 9 2022-01-09 Val_1 2
# 10 2022-01-10 Val_2 2
# 11 2022-01-11 Val_2 2
# 12 2022-01-12 Val_2 2
# 13 2022-01-13 Val_1 3
# 14 2022-01-14 Val_1 3
# 15 2022-01-15 Val_1 3