Suppose I have a set of ranges, one per row:
| lower | upper |
|---|---|
| -10.4443200 | -8.695751 |
| -10.5356594 | -7.372029 |
| -3.9635740 | -2.661712 |
| -2.7043889 | -1.051237 |
| 0.8921994 | 2.525341 |
| 0.8495998 | 2.982567 |
| 0.9639315 | 3.149708 |
| 1.2656724 | 3.362623 |
| 2.8932368 | 5.332422 |
| 4.6476099 | 5.489882 |
What is an efficient way to count the number of pairs of ranges that overlap with one another?
One naive way is the double loop below, but it is slow for millions of comparisons. Perhaps a vectorised approach using `foverlaps` would be ideal.
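```r
# Naive O(n^2) approach: test each unordered pair (i, j), i < j, exactly once
n <- nrow(a)
n_pairs <- 0
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    # two closed ranges overlap when each starts no later than the other ends
    if (a$lower[i] <= a$upper[j] && a$lower[j] <= a$upper[i]) {
      n_pairs <- n_pairs + 1
    }
  }
}
n_pairs
```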
Data (output of `dput`):
```r
a <- structure(list(lower = c(-10.4443200112593, -10.5356593568179,
-3.96357398513697, -2.70438891891616, 0.892199380698278, 0.849599807772024,
0.963931532617852, 1.2656723800301, 2.89323680524585, 4.64760986325676
), upper = c(-8.69575093847071, -7.37202901360451, -2.66171192367237,
-1.05123670198647, 2.5253413373515, 2.98256679223578, 3.14970844448057,
3.3626226637927, 5.33242229071662, 5.48988156249026)), row.names = c(NA,
-10L), class = "data.frame")
```
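For moderate sizes, the double loop can be vectorised in base R by building the full pairwise overlap matrix with `outer()`. This is a sketch, not something from the original post. It allocates an n × n logical matrix, so it will not scale to millions of rows, but it is handy for checking faster methods on small inputs.

```r
# Sample ranges from the table above
a <- data.frame(
  lower = c(-10.4443200, -10.5356594, -3.9635740, -2.7043889, 0.8921994,
            0.8495998, 0.9639315, 1.2656724, 2.8932368, 4.6476099),
  upper = c(-8.695751, -7.372029, -2.661712, -1.051237, 2.525341,
            2.982567, 3.149708, 3.362623, 5.332422, 5.489882)
)

# ov[i, j] is TRUE when closed ranges i and j overlap:
# lower[i] <= upper[j] and upper[i] >= lower[j]
ov <- outer(a$lower, a$upper, "<=") & outer(a$upper, a$lower, ">=")

# ov is symmetric and its diagonal is all TRUE (each range overlaps itself),
# so drop the diagonal and halve the rest to count unordered pairs
n_pairs <- (sum(ov) - nrow(a)) / 2
n_pairs
```

For the sample data this gives 12 overlapping pairs.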
Solution:

A `data.table` approach:
```r
library(data.table)
setDT(a)
setkey(a, lower, upper)  # foverlaps() requires y to be keyed on the interval columns
# Subtract 1 because every range overlaps itself
foverlaps(a, a)[, .(N = .N - 1L), by = .(lower, upper)]
# lower upper N
# 1: -10.4443200 -8.695751 1
# 2: -10.5356594 -7.372029 1
# 3: -2.7043889 -1.051237 1
# 4: -3.9635740 -2.661712 1
# 5: 0.8921994 2.525341 3
# 6: 0.9639315 3.149708 4
# 7: 1.2656724 3.362623 4
# 8: 2.8932368 5.332422 4
# 9: 0.8495998 2.982567 4
#10: 4.6476099 5.489882 1
```
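The result above gives, per range, how many other ranges it overlaps. Each overlapping pair is counted once for each of its two ranges, so the number of pairs is half the sum of `N` (24 / 2 = 12 for the sample data). When only the total is needed, a sort-based count avoids materialising every overlap at all. The sketch below swaps in that different technique (it is not the answer's method), and the function name `count_overlapping_pairs` is hypothetical. It treats ranges as closed, so ranges that merely touch at an endpoint count as overlapping. The idea: for each range i, `findInterval()` counts the ranges whose lower bound is `<=` `upper[i]`. Summed over all i, that total counts every range once against itself, every overlapping pair twice, and every non-overlapping pair once, which is enough to solve for the overlap count.

```r
# Hypothetical helper: count unordered pairs of overlapping closed ranges
# in O(n log n) time without listing the pairs
count_overlapping_pairs <- function(lower, upper) {
  n <- as.numeric(length(lower))  # double, so n * (n - 1) cannot overflow
  # For each range i: how many ranges j (including i itself) have
  # lower[j] <= upper[i]; findInterval() returns the number of elements
  # of the sorted vector that are <= each query value
  starts_before_end <- findInterval(upper, sort(lower))
  s <- sum(as.numeric(starts_before_end))
  # s = n (self-matches) + 2 * overlapping + 1 * non-overlapping,
  # and overlapping + non-overlapping = n * (n - 1) / 2
  s - n - n * (n - 1) / 2
}

# Sample ranges from the table above
a <- data.frame(
  lower = c(-10.4443200, -10.5356594, -3.9635740, -2.7043889, 0.8921994,
            0.8495998, 0.9639315, 1.2656724, 2.8932368, 4.6476099),
  upper = c(-8.695751, -7.372029, -2.661712, -1.051237, 2.525341,
            2.982567, 3.149708, 3.362623, 5.332422, 5.489882)
)
count_overlapping_pairs(a$lower, a$upper)
```

The sort dominates the cost, so this should stay fast for millions of rows, whereas the number of rows returned by `foverlaps(a, a)` grows with the number of overlaps.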