I am running into issues with subsetting after subtracting dates using lubridate. I have a dataframe:
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
returndate <- c("2018-09-15", "2018-09-18", "2018-09-20", "2019-09-15", "2021-08-20", "2020-07-21","2020-09-21", "2018-08-15")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
df <- data.frame(customerid, orderdate, returndate, orderid)
I then want to subset difference = 1 day. Here is my code:
df$orderdate <- as.Date(df$orderdate, format="%m/%d/%Y")
df$resultdate <- as.Date(df$resultdate, format="%m/%d/%Y")
df$order_result_datediff <- difftime(df$orderdate, df$resultdate, units = "days" )
df$nextday <- subset(df, df$order_result_datediff == '1 day')
The subsetting does not work, and I just get all the results in the column. Thank you!
>Solution :
It looks like there are a couple of issues with your code.
First, you are specifying the format "%m/%d/%Y" when converting the date strings to Date objects, but the actual format of the date strings in your orderdate and returndate columns is "%Y-%m-%d". So you should use "%Y-%m-%d" instead.
Second, when you use subset(df, df$order_result_datediff == ‘1 day’), you are comparing the order_result_datediff column to the string ‘1 day’, which is not a valid way to compare time intervals. Instead, you can use the as.numeric() function to convert the time interval to a number of days, and then compare that to the number 1.
Here’s the corrected code:
library(lubridate)
customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
returndate <- c("2018-09-15", "2018-09-18", "2018-09-20", "2019-09-15", "2021-08-20", "2020-07-21","2020-09-21", "2018-08-15")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
df <- data.frame(customerid, orderdate, returndate, orderid)
df$orderdate <- as.Date(df$orderdate, format="%Y-%m-%d")
df$returndate <- as.Date(df$returndate, format="%Y-%m-%d")
df$order_result_datediff <- difftime(df$orderdate, df$returndate, units = "days" )
df$nextday <- subset(df, as.numeric(df$order_result_datediff) == 1)
This should give you a new column nextday that contains the subset of rows where the difference between orderdate and returndate is exactly one day. Note that in this case, there are no rows where the difference is exactly one day, so the nextday column will be all NA values.