How to efficiently calculate the number of overlaps between a set of ranges?

June 30, 2023

Suppose I have a set of ranges, one per row:

lower	upper
-10.4443200	-8.695751
-10.5356594	-7.372029
-3.9635740	-2.661712
-2.7043889	-1.051237
0.8921994	2.525341
0.8495998	2.982567
0.9639315	3.149708
1.2656724	3.362623
2.8932368	5.332422
4.6476099	5.489882

What is an efficient way to count the number of pairs of ranges that overlap with one another?

One naive way is to compare every pair of rows in a nested loop, as below, but this is slow for millions of comparisons. Perhaps a vectorised approach using foverlaps would be ideal.

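library(data.table)
a <- as.data.table(data)  # `data` is the data frame defined below
setkey(a, lower, upper)

n_pairs <- 0L
for (i in seq_len(nrow(a) - 1L)) {
    for (j in (i + 1L):nrow(a)) {
        # nomatch = NULL returns zero rows when the two ranges do not overlap
        n_pairs <- n_pairs + nrow(foverlaps(a[i, ], a[j, ], nomatch = NULL))
    }
}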

data=structure(list(lower = c(-10.4443200112593, -10.5356593568179,
-3.96357398513697, -2.70438891891616, 0.892199380698278, 0.849599807772024,
0.963931532617852, 1.2656723800301, 2.89323680524585, 4.64760986325676
), upper = c(-8.69575093847071, -7.37202901360451, -2.66171192367237,
-1.05123670198647, 2.5253413373515, 2.98256679223578, 3.14970844448057,
3.3626226637927, 5.33242229071662, 5.48988156249026)), row.names = c(NA,
-10L), class = "data.frame")

Solution:

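A data.table approach using a self-join with foverlaps, where mydata is the data frame from the question (named data there). It returns, for each range, the number of other ranges it overlaps: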

library(data.table)
setDT(mydata)
setkey(mydata, lower, upper)
# !! use .N - 1 because each row overlaps with itself !!
# name the count column N so it matches the output below
foverlaps(mydata, mydata)[, .(N = .N - 1L), by = .(lower, upper)]
#          lower     upper N
# 1: -10.4443200 -8.695751 1
# 2: -10.5356594 -7.372029 1
# 3:  -2.7043889 -1.051237 1
# 4:  -3.9635740 -2.661712 1
# 5:   0.8921994  2.525341 3
# 6:   0.9639315  3.149708 4
# 7:   1.2656724  3.362623 4
# 8:   2.8932368  5.332422 4
# 9:   0.8495998  2.982567 4
#10:   4.6476099  5.489882 1
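
The per-range counts above count every overlapping pair twice, once from each side, so the number of distinct pairs asked for in the question is half their sum. The following is a minimal sketch of that last step, not part of the original answer. The names `ranges`, `id`, `hits`, `n_pairs` and `n_pairs_sorted` are made up for illustration, and the values are the rounded ones displayed in the question. Because the self-join materialises one row per overlapping pair, it may get large when many ranges overlap, so the sketch also includes a sort-based cross-check with base R's findInterval, which only counts.

```r
library(data.table)

# Example intervals from the question, rounded as displayed there
ranges <- data.table(
  lower = c(-10.4443200, -10.5356594, -3.9635740, -2.7043889, 0.8921994,
            0.8495998, 0.9639315, 1.2656724, 2.8932368, 4.6476099),
  upper = c(-8.695751, -7.372029, -2.661712, -1.051237, 2.525341,
            2.982567, 3.149708, 3.362623, 5.332422, 5.489882)
)
ranges[, id := .I]  # hypothetical helper column identifying each range
setkey(ranges, lower, upper)

# Self-join: one row per ordered pair of overlapping ranges, self-matches included.
# Columns from the first argument get an "i." prefix (i.lower, i.upper, i.id).
hits <- foverlaps(ranges, ranges, nomatch = NULL)

# Keep each unordered pair once; this also drops the self-matches
n_pairs <- hits[i.id < id, .N]
n_pairs
# 12

# Cross-check without building the pairs (assumes closed intervals, lower <= upper):
# overlaps of range i = #(lower_j <= upper_i) - #(upper_j < lower_i) - 1 (itself)
per_range <- findInterval(ranges$upper, sort(ranges$lower)) -
  findInterval(ranges$lower, sort(ranges$upper), left.open = TRUE) - 1L
n_pairs_sorted <- sum(per_range) / 2
n_pairs_sorted
# 12
```

The filter `i.id < id` is one way to avoid the divide-by-two step. The findInterval variant needs only two sorts, so it is likely to scale better when just the total is needed.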