I have a large data frame like this:
df <- data.frame(min = seq(1, 90, by = 1), event = sample(LETTERS, 90, replace = TRUE))
I would like to create a new column (segment) that identifies and names the segments between specific values.
For example, the first segment should run from the beginning of the data frame up to and including the first event "A". The second segment should start right after that "A" and continue through the next event "A". The last segment should run from just after the last "A" to the end of the data frame.
The desired output makes this clearer:
| min | event | segment |
|---|---|---|
| 1 | C | 1-5 |
| 2 | D | 1-5 |
| 3 | D | 1-5 |
| 4 | E | 1-5 |
| 5 | A | 1-5 |
| 6 | E | 6-10 |
| 7 | G | 6-10 |
| 8 | G | 6-10 |
| 9 | G | 6-10 |
| 10 | A | 6-10 |
| 11 | F | 11-12 |
| 12 | G | 11-12 |
I think I need to use for loops, but I'm a bit confused about how to do that.
>Solution :
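No need for explicit loops. Some simple data manipulation with dplyr should do the trick: cumsum(event == "A") counts the "A" events seen so far, and lag() shifts that count down one row so that each "A" stays in the segment it closes. Note that the .by argument requires dplyr 1.1.0 or later.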
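library(dplyr)

df <- data.frame(min = 1:12,
                 event = c("C", "D", "D", "E", "A",
                           "E", "G", "G", "G", "A", "F", "G"))

df %>%
  # Count the "A" rows seen so far, shifted down one row so each "A" ends its own segment
  mutate(cluster = lag(cumsum(event == "A"), 1, 0)) %>%
  # Label each row with the first and last `min` of its cluster
  mutate(segment = paste(first(min), last(min), sep = "-"), .by = "cluster") %>%
  select(-cluster)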
#>    min event segment
#> 1    1     C     1-5
#> 2    2     D     1-5
#> 3    3     D     1-5
#> 4    4     E     1-5
#> 5    5     A     1-5
#> 6    6     E    6-10
#> 7    7     G    6-10
#> 8    8     G    6-10
#> 9    9     G    6-10
#> 10  10     A    6-10
#> 11  11     F   11-12
#> 12  12     G   11-12
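If dplyr is not available, the same idea can probably be written in base R. The following is only a sketch under that assumption, not part of the original answer; the helper names `hits`, `cluster`, and `ranges` are made up for illustration.

```r
# Same sample data as in the dplyr answer
df <- data.frame(min = 1:12,
                 event = c("C", "D", "D", "E", "A",
                           "E", "G", "G", "G", "A", "F", "G"))

# Running count of "A" events; the "A" row itself is already counted
hits <- cumsum(df$event == "A")

# Shift the count down one row (first row gets 0) so each "A"
# belongs to the segment it closes rather than the next one
cluster <- c(0, head(hits, -1))

# One "first-last" label per cluster, named by cluster id
ranges <- vapply(split(df$min, cluster),
                 function(x) paste(x[1], x[length(x)], sep = "-"),
                 character(1))

# Look up each row's label by its cluster id
df$segment <- unname(ranges[as.character(cluster)])
```

Here `split()` plays the role of `.by = "cluster"`, and the shifted `cumsum()` replaces `lag()`. For the 90-row data frame in the question, the same lines should work unchanged, since nothing depends on the number of rows.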