Algorithm to total the combined length of segments

February 3, 2022

I am looking into this problem:

Let’s assume we have N segments on the X-axis with different start and end points. This is described with the next data structure:

[(start1, end1),..., (startN, endN)]

Now we want to calculate the whole size of such segments, but overlaps should be not be double counted.

Example

input: [(0,3),(1,2), (6,7)]
output: 4

Because segment (1,2) overlaps with segment (0,3), the distance between 0 and 3 is just 3, and then between 6 and 7 it is 1. So 3+1 = 4.

Question

Is there an algorithm to make this calculation for large input sizes?

>Solution :

You can proceed as follows:

Extract both coordinates of each pair and mark whether they end a segment (flag) or not (when they start one). Sort these new pairs (x, end). Then process them. When the coordinate starts a segment, push the coordinate on a stack. When it ends a segment, pop the corresponding start coordinate from the stack. If the stack is empty add the difference between end and start coordinate to the total:

lst = [(0,3),(1,2), (6,7)]

stack = []
total = 0
for x, end in sorted((x, i) for pair in lst for i, x in enumerate(pair)):
    if end:
        start = stack.pop()
        if not len(stack):
            total += x - start
    else:
        stack.append(x)

print(total)  # 4