Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Group by Consecutive Dates R

I want to group animals based on consecutive months they were found within the same burrow, but also split up those groups if the months were not consecutive.

#Input Data
burrow.data <- read.csv
   Animal Burrow       Date
1     027  B0961 2022-03-01
2     027  B0961 2022-04-26
3     033  1920  2021-11-02
4     033  1955  2022-03-29
5     033  1955  2022-04-26
6     063  B0540 2021-04-21
7     063  B0540 2022-01-04
8     063  B0540 2022-03-01
9     101  B0021 2020-11-23
10    101  B0021 2020-12-23
11    101  B0021 2021-11-04
12    101  B0021 2022-01-06
13    101  B0021 2022-02-04
14    101  B0021 2022-03-03

#Expected Output
 Animal Burrow grp Date.Start   Date.End
1    033  1920   1 2021-11-02 2021-11-02
2    033  1955   1 2022-03-29 2022-04-26
3    101  B0021  1 2020-11-23 2020-12-23
4    101  B0021  2 2022-01-06 2020-03-03
5    063  B0540  1 2021-04-21 2022-03-01
6    027  B0961  1 2022-03-01 2022-04-26

I used code from another post: Group consecutive dates in R

And wrote:
burrow.input <- burrow.data[order(burrow.data$Date),]
burrow.input$grp <- ave(as.integer(burrow.input$Date), burrow.input[-4], FUN = function(z) cumsum(c(TRUE, diff(z)>1)))
burrow.input

out <- aggregate(Date ~ Animal + Burrow + grp, data = burrow.input, FUN = function(z) setNames(range(z), c("Start", "End")))
out <- do.call(data.frame,out)

out[,4:5] <- lapply(out[,4:5], as.Date, origin = "1970-01-01")
out

The code keeps grouping 101 into a single group instead of two groups broken up by a date gap (See below).
How can I fix this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  Animal Burrow grp Date.Start   Date.End
1    033  1920   1 2021-11-02 2021-11-02
2    033  1955   1 2022-03-29 2022-04-26
3    101  B0021  1 2020-11-23 2022-03-03
4    063  B0540  1 2021-04-21 2022-03-01
5    027  B0961  1 2022-03-01 2022-04-26

>Solution :

Group the data by Animal, Burrow and a grouping variable that changes each time the date jumps by more than 1 month. Here as.yearmon converts the date to a yearmon object which internally is a year plus 0 for Jan, 1/12 for Feb, …, 11/12 for Dec so multiply that by 12 and check whether the difference between it and the prior value is greater than 1. Take the cumulative sum of that to generate a grouping variable. Finally summarize that, sort and remove the grouping variable that was added.

library(dplyr)
library(zoo)

burrow.data %>%
  group_by(Animal, Burrow, 
    diff = cumsum( c(1, diff(12 * as.yearmon(Date)) > 1) ) ) %>%
  summarize(Date.start = first(Date), Date.end = last(Date), .groups = "drop") %>%
  arrange(Burrow) %>%
  select(-diff)

giving:

# A tibble: 7 × 4
  Animal Burrow Date.start Date.end  
   <int> <chr>  <chr>      <chr>     
1     33 1920   2021-11-02 2021-11-02
2     33 1955   2022-03-29 2022-04-26
3    101 B0021  2020-11-23 2021-11-04
4    101 B0021  2022-01-06 2022-03-03
5     63 B0540  2021-04-21 2022-01-04
6     63 B0540  2022-03-01 2022-03-01
7     27 B0961  2022-03-01 2022-04-26
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading