R: Restarting a Counter By Groups Under Conditions

I am working with the R programming language.

I have the following dataset:

id = c("A","A","A","A","A","B", "B", "B", "B")
result = c(1,1,0,1,1,0,1,0,1)
my_data = data.frame(id, result)

For each unique id, I want to create a "counter variable" that:

  • starts at 1 if the first result value is 1, otherwise at 0
  • increases by 1 each time result = 1
  • resets to 0 when result = 0
  • stays at 0 until the next result = 1 is encountered
  • resumes increasing by 1 each time result = 1
  • starts over when the next unique id is encountered: 1 if result = 1, otherwise 0

I think the final result should look something like this:

   id result counter
1  A      1       1
2  A      1       2
3  A      0       0
4  A      1       1
5  A      1       2
6  B      0       0
7  B      1       1
8  B      0       0
9  B      1       1

I have these two snippets that I am trying to use:

# creates counter by treating the entire dataset as a single ID
my_data$counter = unlist(lapply(split(my_data$result, c(0, cumsum(abs(diff(!my_data$result == 1))))), function(x) (x[1] == 1) * seq(length(x))))

# creates counter by taking the IDs into consideration
my_data$counter = ave(my_data$result, my_data$id, FUN = function(x){ tmp<-cumsum(x);tmp-cummax((!x)*tmp)})

But I am not sure how to interpret these correctly. I am also interested in learning how to write a general function that accomplishes this task under arbitrary conditions, e.g. if result = AAA then the counter restarts at 0, if result = BBB then counter + 1, if result = CCC then counter + 2, if result = DDD then counter - 1.

Can someone please show me how to do this?

Thanks!

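Solution:

We can build a run identifier with rleid() from data.table, which increases by 1 every time ‘result’ changes value, and then group by ‘id’ and by that run identifier. Within each run, row_number() * result counts upward for runs of 1 and gives 0 for runs of 0.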

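library(dplyr)
library(data.table)  # provides rleid()
my_data %>% 
  group_by(id) %>%
  # grp changes each time result changes within an id
  mutate(grp = rleid(result)) %>% 
  group_by(grp, .add = TRUE) %>%
  # position within the run, zeroed out when result is 0
  mutate(counter = row_number() * result) %>%
  ungroup() %>% 
  select(-grp)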

Output:

# A tibble: 9 × 3
  id    result counter
  <chr>  <dbl>   <dbl>
1 A          1       1
2 A          1       2
3 A          0       0
4 A          1       1
5 A          1       2
6 B          0       0
7 B          1       1
8 B          0       0
9 B          1       1
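
To see why grouping on rleid() produces the counter, it may help to look at the intermediate values for a single id. The sketch below uses only the five rows of id A; the names run_id and pos_in_run are made up for illustration, and base R's ave() stands in for the grouped mutate().

```r
library(data.table)

# result values for id "A" only
result <- c(1, 1, 0, 1, 1)

# rleid() gives each run of consecutive identical values its own integer
run_id <- rleid(result)
run_id
# 1 1 2 3 3

# position of each row inside its run (illustrative helper vector)
pos_in_run <- ave(result, run_id, FUN = seq_along)
pos_in_run
# 1 2 1 1 2

# multiplying by result turns the runs where result == 0 into 0
counter <- pos_in_run * result
counter
# 1 2 0 1 2
```

Because the real solution groups by id first, a run can never continue from one id into the next, which is what makes the counter start over for every id.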

Or, using data.table:

library(data.table)
setDT(my_data)[, counter := seq_len(.N) * result, .(id, rleid(result))]

Output:

> my_data
   id result counter
1:  A      1       1
2:  A      1       2
3:  A      0       0
4:  A      1       1
5:  A      1       2
6:  B      0       0
7:  B      1       1
8:  B      0       0
9:  B      1       1
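
For the general case raised in the question (AAA resets, BBB adds 1, CCC adds 2, DDD subtracts 1), the rleid trick no longer applies, because the counter depends on accumulated steps rather than on run position. One possible approach is a plain loop per id. This is only a sketch under stated assumptions: run_counter, increments, reset_values and the events data frame are hypothetical names and example data, every id is assumed to start from 0, and the counter is allowed to go negative.

```r
# Hypothetical rules: how much each value adds, and which values reset to 0
increments   <- c(BBB = 1, CCC = 2, DDD = -1)
reset_values <- "AAA"

# Compute the counter for the values of ONE id, carrying the total forward
run_counter <- function(x, increments, reset_values) {
  x <- as.character(x)
  unknown <- setdiff(x, c(names(increments), reset_values))
  if (length(unknown) > 0) {
    stop("no rule for: ", paste(unknown, collapse = ", "))
  }
  out <- numeric(length(x))
  current <- 0                              # assumption: each id starts at 0
  for (i in seq_along(x)) {
    if (x[i] %in% reset_values) {
      current <- 0                          # reset rule
    } else {
      current <- current + increments[[x[i]]]  # add the step for this value
    }
    out[i] <- current
  }
  out
}

# Made-up example data
events <- data.frame(
  id     = c("A", "A", "A", "A", "B", "B", "B"),
  result = c("BBB", "CCC", "AAA", "BBB", "DDD", "BBB", "CCC")
)

# Apply the function per id and put the pieces back in the original row order
events$counter <- unsplit(
  lapply(split(events$result, events$id), run_counter,
         increments = increments, reset_values = reset_values),
  events$id
)
events$counter
# 1 3 0 1 -1 0 2
```

split() and unsplit() are used here instead of ave() because ave() writes the results back into a copy of its input, which would turn the numeric counter into character when the input column is character. With increments = c("1" = 1) and reset_values = "0", the same function should reproduce the counter from the original 0/1 data.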