R Comparing all columns of one dataframe with all columns of another

Advertisements

I have two dataframes each containing lists of dates in columns and I need to find the difference in between these dates in days by comparing each column against each other. The number of columns in each data frame can vary so I’d like to code to be flexible enough to handle this.

I can create a new column which determines the number of days between two columns on the same dataset using the following

DF.Out<-mutate(DF, DF=as.numeric(difftime(DF$Date_1,DF$Date_2, units = "d")))

but have been unable to expand this to handle my current problem I guess I need to use lapply or similar? See an example of input and desired output below

Input Dataframe1:

df1_Date1 df1_Date2
20/1/2023 25/1/2023
15/1/2023 17/1/2023

Input Dataframe2:

df2_Date1 df2_Date2 df2_Date3
23/1/2023 24/1/2023 20/1/2023
16/1/2023 20/1/2023 27/1/2023

Desired Output

df1_Date1 v df2_Date1 df1_Date1 v df2_Date2 df1_Date1 v df2_Date3 df1_Date2 v df2_Date1 df1_Date2 v df2_Date2 df1_Date2 v df2_Date3
3 4 0 -2 -1 -5
1 5 13 -1 3 10

>Solution :

We could use outer

df1[] <- lapply(df1, as.Date, format = "%d/%m/%Y")
df2[] <- lapply(df2, as.Date, format = "%d/%m/%Y")
f1 <- Vectorize(function(x, y) list(as.numeric(difftime(y, x, units = "d"))))
m1 <- outer(df1, df2, FUN = f1)
            df2_Date1 df2_Date2 df2_Date3
df1_Date1      3, 1      4, 5     0, 12
df1_Date2    -2, -1     -1, 3    -5, 10
as.data.frame(setNames(c(m1), do.call(paste, 
   c(expand.grid(dimnames(m1)), sep = " v "))))

-output

   df1_Date1.v.df2_Date1 df1_Date2.v.df2_Date1 df1_Date1.v.df2_Date2 df1_Date2.v.df2_Date2 df1_Date1.v.df2_Date3 df1_Date2.v.df2_Date3
1                     3                    -2                     4                    -1                     0                    -5
2                     1                    -1                     5                     3                    12                    10

data

df1 <- structure(list(df1_Date1 = c("20/1/2023", "15/1/2023"), df1_Date2 = c("25/1/2023", 
"17/1/2023")), class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(df2_Date1 = c("23/1/2023", "16/1/2023"), df2_Date2 = c("24/1/2023", 
"20/1/2023"), df2_Date3 = c("20/1/2023", "27/1/2023")), class = "data.frame", row.names = c(NA, 
-2L))

Leave a ReplyCancel reply