Merging two data.tables that don't have common columns

Advertisements

I want to merge two data.tables that don’t have a common column, so I would end up with N1*N2 rows, where N1 and N2 are the number of rows in each dataframe.

Doing this with base R works:

A <- data.frame(id = 1:6, value = 19:24)
B <- data.frame(value2 = c(25, 25, 26, 26), value3 = 4:5)

A
#>   id value
#> 1  1    19
#> 2  2    20
#> 3  3    21
#> 4  4    22
#> 5  5    23
#> 6  6    24

B
#>   value2 value3
#> 1     25      4
#> 2     25      5
#> 3     26      4
#> 4     26      5

merge(A, B, all = TRUE)
#>    id value value2 value3
#> 1   1    19     25      4
#> 2   2    20     25      4
#> 3   3    21     25      4
#> 4   4    22     25      4
#> 5   5    23     25      4
#> 6   6    24     25      4
#> 7   1    19     25      5
#> 8   2    20     25      5
#> 9   3    21     25      5
#> 10  4    22     25      5
#> 11  5    23     25      5
#> 12  6    24     25      5
#> 13  1    19     26      4
#> 14  2    20     26      4
#> 15  3    21     26      4
#> 16  4    22     26      4
#> 17  5    23     26      4
#> 18  6    24     26      4
#> 19  1    19     26      5
#> 20  2    20     26      5
#> 21  3    21     26      5
#> 22  4    22     26      5
#> 23  5    23     26      5
#> 24  6    24     26      5

But if I now have two data.tables and not dataframes anymore, it errors:

library(data.table)

A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)

merge(A, B, all = TRUE)
#> Error in merge.data.table(A, B, all = TRUE): A non-empty vector of column names for `by` is required.

How can I reproduce the base R behavior with data.table (without necessarily using merge())?

>Solution :

You are looking for a cross-join. In data.table, there is a CJ function but it only works with one data set, otherwise you can do:

res <- setkey(A[, c(k=1, .SD)], k)[B[, c(k=1, .SD)], allow.cartesian = TRUE][, k := NULL]
res
    id value value2 value3
 1:  1    19     25      4
 2:  2    20     25      4
 3:  3    21     25      4
 4:  4    22     25      4
 5:  5    23     25      4
 6:  6    24     25      4
 7:  1    19     25      5
 8:  2    20     25      5
 9:  3    21     25      5
10:  4    22     25      5
11:  5    23     25      5
12:  6    24     25      5
13:  1    19     26      4
14:  2    20     26      4
15:  3    21     26      4
16:  4    22     26      4
17:  5    23     26      4
18:  6    24     26      4
19:  1    19     26      5
20:  2    20     26      5
21:  3    21     26      5
22:  4    22     26      5
23:  5    23     26      5
24:  6    24     26      5
    id value value2 value3

Note the alternative dplyr solution:

dplyr::cross_join(A, B)

Leave a ReplyCancel reply