Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to check if a series of dates are coherent in a dataframe?

Let’s say I have this example df (but with so many dates) :

library(lubridate)

df = data.frame(id = 1:5,
                date1 = dmy(c('01/02/2021','03/05/2022','01/03/2021','08/12/2021','01/02/2021')),
                date2 = dmy(c('03/02/2021','04/06/2022','03/08/2021','08/11/2022','02/05/2021')),
                date3 = dmy(c('04/03/2021','03/08/2022','06/09/2021','09/08/2022','03/06/2021')),
                date4 = dmy(c('05/08/2021','08/09/2022','09/10/2021','21/12/2022','03/07/2021')),
                date5 = dmy(c('09/11/2021','11/08/2022','04/11/2021','10/02/2023','04/11/2021')))

I want a simple way to check that all the dates are coherent for each id (that date1<date2 and date1<date3 and date2 < date3 and so on …). Meaning that I want to check if there are no mistakes for each two dates including all possibilities.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You don’t need to check for every possible combination of dates. If A < B and B < C then automatically A < C.

We can take difference of consecutive dates and return TRUE if the difference for all of them is greater than 0.

You may do this using base R apply :

df$is_coherent <- apply(df[-1], 1, \(x) all(diff(as.Date(x)) > 0))
df

#  id      date1      date2      date3      date4      date5 is_coherent
#1  1 2021-02-01 2021-02-03 2021-03-04 2021-08-05 2021-11-09        TRUE
#2  2 2022-05-03 2022-06-04 2022-08-03 2022-09-08 2022-08-11       FALSE
#3  3 2021-03-01 2021-08-03 2021-09-06 2021-10-09 2021-11-04        TRUE
#4  4 2021-12-08 2022-11-08 2022-08-09 2022-12-21 2023-02-10       FALSE
#5  5 2021-02-01 2021-05-02 2021-06-03 2021-07-03 2021-11-04        TRUE

Or implementing the same logic using dplyr :

library(dplyr)

df %>%
  rowwise() %>%
  mutate(is_coherent = all(diff(c_across(starts_with("date"))) > 0))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading