Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I find three or more consecutive dates with the same value in R?

I’m trying to find periods of 3 days or more where the values are the same. As an example, if January 1st, 2nd, and 3rd all have a value of 2, they should be included – but if January 2nd has a value of 3, then none of them should be.

I’ve tried a few ways so far but no luck! Any help would be greatly appreciated!

Reprex:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library("dplyr")
#Goal: include all values with values of 2 or less for 5 consecutive days and allow for a "cushion" period of values of 2 to 5 for up to 3 days
data <- data.frame(Date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-15", "2000-01-16", "2000-01-17", "2000-01-18", "2000-01-19", "2000-01-20", "2000-01-21", "2000-01-22", "2000-01-23", "2000-01-24", "2000-01-25", "2000-01-26", "2000-01-27", "2000-01-28", "2000-01-29", "2000-01-30"),
                   Value = c(2,2,2,5,2,2,1,0,1,8,7,7,7,5,2,3,4,5,7,2,6,6,6,6,2,0,3,4,0,1))

head(data)                   
#Goal: values should include dates from 2000-01-01 to 2000-01-03, 2000-01-11 to 2000-01-13, and 2000-01-21 to 2000-01-24

#My attempt so far but it doesn't work 
attempt1 <- data %>% 
  group_by(group_id = as.integer(gl(n(),3,n()))) %>% #3 day chunks
  filter(Value == Value) %>%  #looking for the values being the same inbetween, but this doesn't work for that
  ungroup() %>% 
  select(-group_id)

head(attempt1)

>Solution :

You can create a temporary variable using rleid from the data.table package.

data %>%
  group_by(data.table::rleid(Value)) %>%
  filter(n() >= 3) %>%
  ungroup() %>%
  select(Date, Value)
#> # A tibble: 10 x 2
#>    Date       Value
#>    <chr>      <dbl>
#>  1 2000-01-01     2
#>  2 2000-01-02     2
#>  3 2000-01-03     2
#>  4 2000-01-11     7
#>  5 2000-01-12     7
#>  6 2000-01-13     7
#>  7 2000-01-21     6
#>  8 2000-01-22     6
#>  9 2000-01-23     6
#> 10 2000-01-24     6

Or, if you want to avoid using another package, you could equivalently do

data %>%
  group_by(temp = cumsum(c(1, diff(Value) != 0))) %>%
  filter(n() > 2) %>%
  ungroup() %>%
  select(-temp)
#> # A tibble: 10 x 2
#>    Date       Value
#>    <chr>      <dbl>
#>  1 2000-01-01     2
#>  2 2000-01-02     2
#>  3 2000-01-03     2
#>  4 2000-01-11     7
#>  5 2000-01-12     7
#>  6 2000-01-13     7
#>  7 2000-01-21     6
#>  8 2000-01-22     6
#>  9 2000-01-23     6
#> 10 2000-01-24     6

Created on 2022-09-12 with reprex v2.0.2

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading