Let’s say I have this example df (but with so many dates) :
library(lubridate)
df = data.frame(id = 1:5,
date1 = dmy(c('01/02/2021','03/05/2022','01/03/2021','08/12/2021','01/02/2021')),
date2 = dmy(c('03/02/2021','04/06/2022','03/08/2021','08/11/2022','02/05/2021')),
date3 = dmy(c('04/03/2021','03/08/2022','06/09/2021','09/08/2022','03/06/2021')),
date4 = dmy(c('05/08/2021','08/09/2022','09/10/2021','21/12/2022','03/07/2021')),
date5 = dmy(c('09/11/2021','11/08/2022','04/11/2021','10/02/2023','04/11/2021')))
I want a simple way to check that all the dates are coherent for each id (that date1<date2 and date1<date3 and date2 < date3 and so on …). Meaning that I want to check if there are no mistakes for each two dates including all possibilities.
>Solution :
You don’t need to check for every possible combination of dates. If A < B and B < C then automatically A < C.
We can take difference of consecutive dates and return TRUE if the difference for all of them is greater than 0.
You may do this using base R apply :
df$is_coherent <- apply(df[-1], 1, \(x) all(diff(as.Date(x)) > 0))
df
# id date1 date2 date3 date4 date5 is_coherent
#1 1 2021-02-01 2021-02-03 2021-03-04 2021-08-05 2021-11-09 TRUE
#2 2 2022-05-03 2022-06-04 2022-08-03 2022-09-08 2022-08-11 FALSE
#3 3 2021-03-01 2021-08-03 2021-09-06 2021-10-09 2021-11-04 TRUE
#4 4 2021-12-08 2022-11-08 2022-08-09 2022-12-21 2023-02-10 FALSE
#5 5 2021-02-01 2021-05-02 2021-06-03 2021-07-03 2021-11-04 TRUE
Or implementing the same logic using dplyr :
library(dplyr)
df %>%
rowwise() %>%
mutate(is_coherent = all(diff(c_across(starts_with("date"))) > 0))