Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merging two data.tables that don't have common columns

I want to merge two data.tables that don’t have a common column, so I would end up with N1*N2 rows, where N1 and N2 are the number of rows in each dataframe.

Doing this with base R works:

A <- data.frame(id = 1:6, value = 19:24)
B <- data.frame(value2 = c(25, 25, 26, 26), value3 = 4:5)

A
#>   id value
#> 1  1    19
#> 2  2    20
#> 3  3    21
#> 4  4    22
#> 5  5    23
#> 6  6    24

B
#>   value2 value3
#> 1     25      4
#> 2     25      5
#> 3     26      4
#> 4     26      5

merge(A, B, all = TRUE)
#>    id value value2 value3
#> 1   1    19     25      4
#> 2   2    20     25      4
#> 3   3    21     25      4
#> 4   4    22     25      4
#> 5   5    23     25      4
#> 6   6    24     25      4
#> 7   1    19     25      5
#> 8   2    20     25      5
#> 9   3    21     25      5
#> 10  4    22     25      5
#> 11  5    23     25      5
#> 12  6    24     25      5
#> 13  1    19     26      4
#> 14  2    20     26      4
#> 15  3    21     26      4
#> 16  4    22     26      4
#> 17  5    23     26      4
#> 18  6    24     26      4
#> 19  1    19     26      5
#> 20  2    20     26      5
#> 21  3    21     26      5
#> 22  4    22     26      5
#> 23  5    23     26      5
#> 24  6    24     26      5

But if I now have two data.tables and not dataframes anymore, it errors:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(data.table)

A <- data.table(id = 1:6, value = 19:24)
B <- data.table(value2 = c(25, 25, 26, 26), value3 = 4:5)

merge(A, B, all = TRUE)
#> Error in merge.data.table(A, B, all = TRUE): A non-empty vector of column names for `by` is required.

How can I reproduce the base R behavior with data.table (without necessarily using merge())?

>Solution :

You are looking for a cross-join. In data.table, there is a CJ function but it only works with one data set, otherwise you can do:

res <- setkey(A[, c(k=1, .SD)], k)[B[, c(k=1, .SD)], allow.cartesian = TRUE][, k := NULL]
res
    id value value2 value3
 1:  1    19     25      4
 2:  2    20     25      4
 3:  3    21     25      4
 4:  4    22     25      4
 5:  5    23     25      4
 6:  6    24     25      4
 7:  1    19     25      5
 8:  2    20     25      5
 9:  3    21     25      5
10:  4    22     25      5
11:  5    23     25      5
12:  6    24     25      5
13:  1    19     26      4
14:  2    20     26      4
15:  3    21     26      4
16:  4    22     26      4
17:  5    23     26      4
18:  6    24     26      4
19:  1    19     26      5
20:  2    20     26      5
21:  3    21     26      5
22:  4    22     26      5
23:  5    23     26      5
24:  6    24     26      5
    id value value2 value3

Note the alternative dplyr solution:

dplyr::cross_join(A, B)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading