I am working with the R programming language.
I have the following dataset:
id = c("A","A","A","A","A","B", "B", "B", "B")
result = c(1,1,0,1,1,0,1,0,1)
my_data = data.frame(id, result)
For each unique id, I want to create a "counter variable" that:
- if the first result value is 1 then counter = 1 , else 0
- increases by 1 each time when result = 1
- becomes 0 when the result = 0
- remains 0 until the first result = 1 is encountered
- restart to increasing by 1 each time the result = 1
- when the next unique id is encountered, the counter initializes back to 1 if result = 1 , else 0
I think the final result should look something like this:
id result counter
1 A 1 1
2 A 1 2
3 A 0 0
4 A 1 1
5 A 1 2
6 B 0 0
7 B 1 1
8 B 0 0
9 B 1 1
I have these two codes that I am trying to use:
# creates counter by treating entire dataset as a single ID
my_data$counter = unlist(lapply(split(my_data$results, c(0, cumsum(abs(diff(!my_data$results == 1))))), function(x) (x[1] == 1) * seq(length(x))))
# creates counter by taking into consideration ID's
my_data$counter = ave(my_data$results, my_data$id, FUN = function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)})
But I am not sure how to interpret these correctly. For example, I am interested in learning about how to write a general function to accomplish this task with general conditions – e.g. if result = AAA then counter restarts to 0, if result = BBB then counter + 1, if result = CCC then counter + 2, if result = DDD then counter – 1.
Can someone please show me how to do this?
Thanks!
>Solution :
We may create a grouping column with rleid and then do the grouping by ‘id’ and the rleid of ‘result’
library(dplyr)
library(data.table)
my_data %>%
group_by(id) %>%
mutate(grp = rleid(result)) %>%
group_by(grp, .add = TRUE) %>%
mutate(counter = row_number() * result)%>%
ungroup %>%
select(-grp)
-output
# A tibble: 9 × 3
id result counter
<chr> <dbl> <dbl>
1 A 1 1
2 A 1 2
3 A 0 0
4 A 1 1
5 A 1 2
6 B 0 0
7 B 1 1
8 B 0 0
9 B 1 1
Or using data.table
library(data.table)
setDT(my_data)[, counter := seq_len(.N) * result, .(id, rleid(result))]
-output
> my_data
id result counter
1: A 1 1
2: A 1 2
3: A 0 0
4: A 1 1
5: A 1 2
6: B 0 0
7: B 1 1
8: B 0 0
9: B 1 1