I have a data frame with start and end dates, and I am calculating the difference in time between them using the difftime function. However, some of my start dates are greater than my end dates, which results in a negative time difference. I need to swap the start date for the end date and vice versa when this happens. How can I do this?
Here is an example data frame:
df1 <- data.frame(matrix(ncol = 3, nrow = 3))
colnames(df1)[1:3] <- c('date_start', 'date_end', 'diff')
df1$date_start <- c(as.Date('2004-11-09'),
                    as.Date('2020-01-01'),
                    as.Date('1992-09-01'))
df1$date_end <- c(as.Date('2005-11-09'),
                  as.Date('2010-12-31'),
                  as.Date('2006-10-31'))
df1$diff <- difftime(df1$date_end, df1$date_start)
df1
The correct data frame should look like this (with the start and end dates swapped in the second row):
  date_start   date_end      diff
1 2004-11-09 2005-11-09  365 days
2 2010-12-31 2020-01-01 3288 days
3 1992-09-01 2006-10-31 5173 days
>Solution :
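df1 <- df1 |>
  transform(date_start = pmin(date_start, date_end),
            date_end = pmax(date_start, date_end))
# Recompute diff so it reflects the swapped dates.
df1$diff <- difftime(df1$date_end, df1$date_start)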
(base::transform is similar to dplyr::mutate, but one difference is that each term is calculated based on the incoming data, not on "the data as it exists after modifications to that point." The code would not work like this with mutate, since date_start would be overwritten after the first step.)
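
To make the difference concrete, here is a small sketch in base R only. The names df, fixed, seq_df, safe and swap are made up for illustration, and the step-by-step assignments are only a stand-in for the sequential order in which mutate evaluates its arguments, not dplyr itself. The last part shows one possible order-independent alternative: decide which rows need swapping first, then swap them by logical index.

```r
# Same example dates as in the question.
df <- data.frame(
  date_start = as.Date(c("2004-11-09", "2020-01-01", "1992-09-01")),
  date_end   = as.Date(c("2005-11-09", "2010-12-31", "2006-10-31"))
)

# transform(): both expressions are evaluated against the original columns.
fixed <- transform(df,
                   date_start = pmin(date_start, date_end),
                   date_end   = pmax(date_start, date_end))
fixed$diff <- difftime(fixed$date_end, fixed$date_start, units = "days")

# Sequential updates: the second step already sees the overwritten
# date_start, so the later date in row 2 is lost.
seq_df <- df
seq_df$date_start <- pmin(seq_df$date_start, seq_df$date_end)
seq_df$date_end   <- pmax(seq_df$date_start, seq_df$date_end)

# Order-independent alternative: flag the rows first, then swap the
# two columns for those rows only (assignment is by position).
safe <- df
swap <- safe$date_start > safe$date_end
safe[swap, c("date_start", "date_end")] <- safe[swap, c("date_end", "date_start")]
```

In seq_df, row 2 ends up with the same date in both columns, which is the failure mode described above. The safe version gives the same dates as fixed and would also work inside a sequential pipeline, because the swap decision is made before any column is changed.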