Given:
library(data.table)
dat1 <- setDT(data.frame(pat=c("A.C",".BC"),val=c(1,2)))
dat2 <- setDT(data.frame(q=c("ABC","AXC","XBC"),val2=c(10,11,12)))
I want the result to be:
dat3 <- setDT(data.frame(pat=c("A.C","A.C",".BC",".BC"),val=c(1,1,2,2),q=c("ABC","AXC","ABC","XBC"),val2=c(10,11,10,12)))
dat3
pat val q val2
1: A.C 1 ABC 10
2: A.C 1 AXC 11
3: .BC 2 ABC 10
4: .BC 2 XBC 12
In other words, a left join of dat1 to dat2 for each match of regex pattern pat to query string q
I was wondering if this is possible with a concise data.table merge expression, i.e.
dat1[dat2, .(pat, val, q, val2), on= .(grepl(pat,q))] # this does not work
or any other data.table trick for efficiency. In reality dat1 is hundreds of rows and dat2 can be 10-100 thousand rows.
This sounds simple enough, but I haven’t found a post that quite covers it.
The closest I’ve gotten to is the following:
match_pat_to_q <- function(pattern, data, data.col="q"){
ret<-lapply(pattern, function(x){
data[grepl(x,get(data.col))]
})
names(ret) <- pattern # becomes an .id column in next step
rbindlist(ret, idcol=TRUE)
}
match_pat_to_q(dat1$pat, dat2)[dat1, on=.(.id==pat)]
.id q val2 val
1: A.C ABC 10 1
2: A.C AXC 11 1
3: .BC ABC 10 2
4: .BC XBC 12 2
>Solution :
You can solve your probelm as follows:
dat1[, dat2[grep(pat, q), .(val, x=q, val2)], by=pat]
pat val x val2
<char> <num> <char> <num>
1: A.C 1 ABC 10
2: A.C 1 AXC 11
3: .BC 2 ABC 10
4: .BC 2 XBC 12
You group dat1 by pat. This allows to logically create one group for each pattern in dat1. Then for each group, you filter the corresponding data from dat2 using pattern matching while selecting the variables needed at the same time.