R: create a new column with for loops

December 16, 2023

I have a large data frame like this:

 df <- data.frame(min = seq(1, 90, by = 1), event = sample(LETTERS, 90, replace = TRUE))

I would like to create a new column (segment) that identifies and names the segments between specific values.

For example, the first segment should run from the beginning of the data frame up to and including the first event "A". The second segment should start on the row after that "A" and continue up to and including the next "A". The last segment should run from the row after the last "A" to the end of the data frame.

It is easier to show the desired output:

min	event	segment
1	C	1-5
2	D	1-5
3	D	1-5
4	E	1-5
5	A	1-5
6	E	6-10
7	G	6-10
8	G	6-10
9	G	6-10
10	A	6-10
11	F	11-12
12	G	11-12

I think I should use a for loop, but I am a bit confused about how to do that.

>Solution :

No explicit loop is needed. Some simple data manipulation with dplyr (version 1.1.0 or later, which added the .by argument) should do the trick:

library(dplyr)

df <- data.frame(min = 1:12, 
                 event = c("C", "D", "D", "E", "A", 
                           "E", "G", "G", "G", "A", "F", "G"))

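df %>%
  # count the "A" rows seen so far, shifted down one row so each "A" closes its own segment
  mutate(cluster = lag(cumsum(event == "A"), 1, 0)) %>%
  # label every row with the first and last `min` value of its cluster
  mutate(segment = paste(first(min), last(min), sep = "-"), .by = "cluster") %>%
  select(-cluster)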
#>    min event segment
#> 1    1     C     1-5
#> 2    2     D     1-5
#> 3    3     D     1-5
#> 4    4     E     1-5
#> 5    5     A     1-5
#> 6    6     E    6-10
#> 7    7     G    6-10
#> 8    8     G    6-10
#> 9    9     G    6-10
#> 10  10     A    6-10
#> 11  11     F   11-12
#> 12  12     G   11-12
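
For comparison, since the question asked about for loops, here is a minimal base R sketch of the same labelling done with an explicit loop. The helper name label_segments_loop and its marker argument are made up for illustration and are not part of the answer above. It assumes the rows are already sorted by min.

```r
# Hypothetical helper (not from the original answer): label segments with an explicit loop.
label_segments_loop <- function(df, marker = "A") {
  n <- nrow(df)
  segment <- character(n)
  start <- 1  # row index where the current segment begins
  for (i in seq_len(n)) {
    # a segment closes on a marker row or on the last row of the data frame
    if (df$event[i] == marker || i == n) {
      segment[start:i] <- paste(df$min[start], df$min[i], sep = "-")
      start <- i + 1
    }
  }
  df$segment <- segment
  df
}

df <- data.frame(min = 1:12,
                 event = c("C", "D", "D", "E", "A",
                           "E", "G", "G", "G", "A", "F", "G"))

result <- label_segments_loop(df)
print(result)
```

On this small example the loop should give the same segment column as the dplyr pipeline. On a large data frame the vectorised cumsum approach is likely to be the faster and shorter option.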