Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find overlapping ranges from a large set of data efficiently

Is there an efficient algorithm for checking for overlaps between multiple bounded continuous ranges?

If we have a pair of tasks, each represented by a start time and an end time, a function such as the one below can detect clashes.

# tasks are of the form (start_time, end_time)

def has_clash(task_1: tuple[float, float], task_2: tuple[float, float]) -> bool:
    """Check if a clash exists between two tasks."""
    return min(task_1[1], task_2[1]) > max(task_1[0], task_2[0])

To check multiple tasks, we would have to run the above test pairwise on the set of tasks, for an overall complexity of O(n^2), e.g.:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
    """Check if any tasks clash, return True if no clashes, false otherwise."""
    if len(tasks) < 2:
        return False
    clash = any(
        has_clash(task, other_task)
        for i, (task) in enumerate(tasks[1:])
        for other_task in tasks[:i]
    )
    return clash

Can this complexity be improved upon by e.g. sorting or some other mathematical magic?

Is there an algorithm that does this more efficiently?

>Solution :

Sort the list by start time, then just compare the last end to the next start time of consecutive entries. The complexity for this is just O(nlogn) for sorting and O(n) for iterating the pairs.

def has_any_clash(tasks: list[tuple[float, float]]) -> bool:
    """Check if any tasks clash, return True if no clashes, false otherwise."""
    sorted_tasks = sorted(tasks)
    return any(a2 < b1 for (a1, a2), (b1, b2) in zip(sorted_tasks, sorted_tasks[1:]))

print(has_any_clash([]))  # False
print(has_any_clash([(0, 4)]))  # False
print(has_any_clash([(1, 5), (0, 4)]))  # False
print(has_any_clash([(5, 8), (0, 4), (6, 7)]))  # True

If you want a list which tasks clash (those could be more that just the consecutive ones) that is a bit more involved, but not by much. In addition to the above:

  • keep track of "running" tasks, using a heap, sorted by end-time
  • iterate tasks sorted by start-time, in each iteration:
    • pop tasks from running heap while end-time is < current tasks’s start time
    • all remaining tasks in running heap clash with current task
    • add current task to running heap
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading