How to (efficiently) perform Cartesian product on a key subset [R]

Suppose I have these data:

data1 <- read.delim(textConnection(
"id val1
1 blue
1 green
1 red
2 black
2 brown
2 white"
), sep=' ')

data2 <- read.delim(textConnection(
"id val2
1 cat
1 dog
1 fish
2 hat
2 coat
2 car"
), sep=' ')

I would like to compute all combinations (the Cartesian product) of blue, green, and red with cat, dog, and fish for id=1, and of black, brown, and white with hat, coat, and car for id=2. I could do it in a for loop with expand.grid and then build the output using rbind, but my actual data have several ids and several vals, so that approach runs slowly.

Solution:

In base R, we can use split on both datasets to create lists of values by 'id', then apply expand.grid to the corresponding list elements with Map. The result is a list with one data frame per id, which can be combined with rbind if a single data frame is needed.

Map(expand.grid, split(data1$val1, data1$id), split(data2$val2, data2$id))
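If a single data frame that keeps the id is wanted, the per-id pieces can be stacked afterwards. The following is a minimal sketch, assuming the same data as in the question (rebuilt here with data.frame so the block runs on its own). The names pairs_by_id and result are made up for illustration, and stringsAsFactors = FALSE is my addition to keep the values as character.

```r
# Same data as in the question, rebuilt inline so this block is self-contained
data1 <- data.frame(id = rep(1:2, each = 3),
                    val1 = c("blue", "green", "red", "black", "brown", "white"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(id = rep(1:2, each = 3),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"),
                    stringsAsFactors = FALSE)

# One expand.grid per id; the list is named by id (as character strings)
pairs_by_id <- Map(
  function(v1, v2) expand.grid(val1 = v1, val2 = v2, stringsAsFactors = FALSE),
  split(data1$val1, data1$id),
  split(data2$val2, data2$id)
)

# Attach the id (taken from the list names) to each piece, then stack them
result <- do.call(rbind, lapply(names(pairs_by_id), function(k) {
  data.frame(id = k, pairs_by_id[[k]], stringsAsFactors = FALSE)
}))
rownames(result) <- NULL
```

Note that Map pairs the two split lists by position, so this sketch assumes both datasets contain the same set of ids. The id column comes back as character because it is taken from the list names.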

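Or with data.table, joining on id. Here allow.cartesian = TRUE permits the many-to-many join, and setDT converts data1 to a data.table by reference: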

library(data.table)
setDT(data1)[data2, on = .(id), allow.cartesian = TRUE]
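For comparison, base R's merge also produces every val1/val2 pairing within each id when the key is duplicated on both sides. This is a sketch under the same assumptions about the data, not part of the original answer, and result is an illustrative name.

```r
# Same data as in the question, rebuilt inline so this block is self-contained
data1 <- data.frame(id = rep(1:2, each = 3),
                    val1 = c("blue", "green", "red", "black", "brown", "white"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(id = rep(1:2, each = 3),
                    val2 = c("cat", "dog", "fish", "hat", "coat", "car"),
                    stringsAsFactors = FALSE)

# A many-to-many merge on id yields all val1/val2 combinations within each id
result <- merge(data1, data2, by = "id")
```

Unlike the Map approach, the id column is carried through directly. How merge compares in speed on larger data is not something this sketch measures.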