Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Select multiple date ranges from a data table in R

I have a data table with 7 columns: date, time, temperature, id, treatment, year, and month. I have data from 3 winters from 2019 – 2022. The data were taken from November to April.

But I am interested in data from December to March only. So the first year is from 2019 December to 2020 March, the second year is from 2020 December to 2021 March, and the third year is from 2021 December to 2022 March.

I managed to use this code to obtain the months of interest.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

filter(month(ymd(date)) %in% c(1,2,3,12))

But how do I select specific date ranges (start and end) for each year?

2019-12-05 to 2020-03-05 AND
2020-12-07 to 2021-03-15 AND
2021-12-01 to 2021-03-28

Here is a dummy dataset to work with with just two columns (if it helps.)

library(tidyverse)

set.seed(123)

date <- sample(seq(as.Date('2019-10-01'), as.Date('2023-04-25'), by="day"), 1000)
df <- data.frame( date)
df <- df %>% mutate(id = row_number())

Most of the example posts focus on one date range, but not disjointed date ranges like this. Any help is appreciated.

>Solution :

We can do a range join (given dplyr_1.1.0 or newer).

Reducing your sample a bit,

set.seed(123)
date <- sample(seq(as.Date('2019-10-01'), as.Date('2023-04-25'), by="day"), 30)
df <- data.frame( date)
df <- df %>% mutate(id = row_number())
head(df)
#         date id
# 1 2020-11-18  1
# 2 2021-01-05  2
# 3 2020-03-27  3
# 4 2021-03-09  4
# 5 2020-04-12  5
# 6 2022-04-25  6

We can define the ranges in a separate frame:

ranges <- tibble(startdate=c("2019-12-05", "2020-12-07", "2021-12-01"), enddate=c("2020-03-05", "2021-03-15", "2021-03-28")) %>%
  mutate(across(c(startdate, enddate), as.Date), group = row_number())
ranges
# # A tibble: 3 × 3
#   startdate  enddate    group
#   <date>     <date>     <int>
# 1 2019-12-05 2020-03-05     1
# 2 2020-12-07 2021-03-15     2
# 3 2021-12-01 2021-03-28     3

and do the join/filter using inner_join:

inner_join(df, ranges, join_by(between(date, startdate, enddate)))
#         date id  startdate    enddate group
# 1 2021-01-05  2 2020-12-07 2021-03-15     2
# 2 2021-03-09  4 2020-12-07 2021-03-15     2
# 3 2021-03-02 21 2020-12-07 2021-03-15     2
# 4 2021-02-01 30 2020-12-07 2021-03-15     2

where only four rows are found in the three ranges we specified.

Assuming the ranges date ranges are non-overlapping, this is analogous to

df %>%
  filter(
    between(date, ranges$startdate[1], ranges$enddate[1]) | 
    between(date, ranges$startdate[2], ranges$enddate[2]) | 
    between(date, ranges$startdate[3], ranges$enddate[3])
  )
#         date id
# 1 2021-01-05  2
# 2 2021-03-09  4
# 3 2021-03-02 21
# 4 2021-02-01 30

Note that if any of the ranges date ranges overlap, you may have multiples. For instance,

ranges <- tibble(startdate=c("2019-12-05", "2020-12-07", "2021-01-01"), enddate=c("2020-03-05", "2021-03-15", "2021-01-31")) %>%
  mutate(across(c(startdate, enddate), as.Date), group = row_number())
ranges
# # A tibble: 3 × 3
#   startdate  enddate    group
#   <date>     <date>     <int>
# 1 2019-12-05 2020-03-05     1
# 2 2020-12-07 2021-03-15     2
# 3 2021-01-01 2021-01-31     3  # <--- this overlaps with id=2

then we’ll see repeats of ids:

inner_join(df, ranges, join_by(between(date, startdate, enddate)))
#         date id  startdate    enddate group
# 1 2021-01-05  2 2020-12-07 2021-03-15     2
# 2 2021-01-05  2 2021-01-01 2021-01-31     3
# 3 2021-03-09  4 2020-12-07 2021-03-15     2
# 4 2021-03-02 21 2020-12-07 2021-03-15     2
# 5 2021-02-01 30 2020-12-07 2021-03-15     2

If this is a concern, then you’ll need to find some way to deal with multiple rows per id, such as first or min(group) or something else.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading