Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Running into issues with subsetting after subtracting dates?

I am running into issues with subsetting after subtracting dates using lubridate. I have a dataframe:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
returndate <- c("2018-09-15", "2018-09-18", "2018-09-20", "2019-09-15", "2021-08-20", "2020-07-21","2020-09-21", "2018-08-15")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")
df <- data.frame(customerid, orderdate, returndate, orderid)

I then want to subset difference = 1 day. Here is my code:

df$orderdate <- as.Date(df$orderdate, format="%m/%d/%Y")
df$resultdate <- as.Date(df$resultdate, format="%m/%d/%Y")
df$order_result_datediff <- difftime(df$orderdate, df$resultdate, units = "days" ) 
df$nextday <- subset(df, df$order_result_datediff == '1 day')

The subsetting does not work, and I just get all the results in the column. Thank you!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

It looks like there are a couple of issues with your code.

First, you are specifying the format "%m/%d/%Y" when converting the date strings to Date objects, but the actual format of the date strings in your orderdate and returndate columns is "%Y-%m-%d". So you should use "%Y-%m-%d" instead.

Second, when you use subset(df, df$order_result_datediff == ‘1 day’), you are comparing the order_result_datediff column to the string ‘1 day’, which is not a valid way to compare time intervals. Instead, you can use the as.numeric() function to convert the time interval to a number of days, and then compare that to the number 1.

Here’s the corrected code:

library(lubridate)

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
returndate <- c("2018-09-15", "2018-09-18", "2018-09-20", "2019-09-15", "2021-08-20", "2020-07-21","2020-09-21", "2018-08-15")
orderid <- c("1", "2", "3", "4", "5", "6", "7", "8")


df <- data.frame(customerid, orderdate, returndate, orderid)
df$orderdate <- as.Date(df$orderdate, format="%Y-%m-%d")
df$returndate <- as.Date(df$returndate, format="%Y-%m-%d")
df$order_result_datediff <- difftime(df$orderdate, df$returndate, units = "days" ) 

df$nextday <- subset(df, as.numeric(df$order_result_datediff) == 1)

This should give you a new column nextday that contains the subset of rows where the difference between orderdate and returndate is exactly one day. Note that in this case, there are no rows where the difference is exactly one day, so the nextday column will be all NA values.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading