Advertisements
How can I calculate col2
?
col2
should increment each time a value in col1
appears after being interrupted by a different value, e.g. creating a grouping variable that changes every time a new sequence starts.
structure(list(col1 = c("A", "A", "A", "A", "B", "B", "B", "C",
"C", "C", "C", "A", "A", "E", "E", "E", "F", "F", "F", "G", "G",
"G", "C", "C", "C", "A", "A", "A"), col2 = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L)), class = "data.frame", row.names = c(NA,
-28L))
Desired output:
col1 col2
1 A 1
2 A 1
3 A 1
4 A 1
5 B 1
6 B 1
7 B 1
8 C 1
9 C 1
10 C 1
11 C 1
12 A 2
13 A 2
14 E 1
15 E 1
16 E 1
17 F 1
18 F 1
19 F 1
20 G 1
21 G 1
22 G 1
23 C 2
24 C 2
25 C 2
26 A 3
27 A 3
28 A 3
>Solution :
One option would be to use consecutive_id
two times like so:
library(dplyr, warn = FALSE)
df |>
select(-col2) |>
mutate(col2 = consecutive_id(col1)) |>
mutate(col2 = consecutive_id(col2), .by = col1)
#> col1 col2
#> 1 A 1
#> 2 A 1
#> 3 A 1
#> 4 A 1
#> 5 B 1
#> 6 B 1
#> 7 B 1
#> 8 C 1
#> 9 C 1
#> 10 C 1
#> 11 C 1
#> 12 A 2
#> 13 A 2
#> 14 E 1
#> 15 E 1
#> 16 E 1
#> 17 F 1
#> 18 F 1
#> 19 F 1
#> 20 G 1
#> 21 G 1
#> 22 G 1
#> 23 C 2
#> 24 C 2
#> 25 C 2
#> 26 A 3
#> 27 A 3
#> 28 A 3