Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Aggregate rows only if dates range overlap

I am trying to build a complex sql query for hours but still didn’t find any way to do it as expected.

Here is my table and my dataset :

create table Skills
(
ID varchar(10),
StartDate date,
EndDate date,
Skill varchar(10)
);

Insert into Skills values
('1','2021-01-01','2021-12-31','A'),
('1','2022-01-01','2022-12-31','B'),
('2','2021-01-01','2021-12-31','A'),
('2','2021-11-30','2022-12-31','B'),
('3','2021-01-01','2021-12-31','A'),
('3','2021-11-30','2022-12-31','B'),
('3','2022-11-30','2023-12-31','C'),
('4','2021-01-01','2021-12-31','A'),
('4','2022-01-01','2022-12-31','B'),
('4','2022-11-30','2023-12-31','C');

I would like to aggregate rows by ID only when dates range (StartDate, EndDate) overlap.
Here is the expected result :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

1, 2021-01-01, 2021-12-31, A
1, 2022-01-01, 2022-12-31, B
2, 2021-01-01, 2022-12-31, B
3, 2021-01-01, 2023-12-31, C
4, 2021-01-01, 2021-12-31, A
4, 2022-01-01, 2023-12-31, C

When rows with overlapping dates range are aggregated, we need to keep the oldest StartDate, the newest EndDate and the Skill associated to the newest EndDate.

I tried so many queries with partition by, lag, cte, etc.

Could you help me find the right solution please ?

Thanks,
Regards

>Solution :

This is a gaps and islands problem, to solve it you can use lag() to determine where the "islands" start, Then use a cumulative sum() to determine gaps :

select id, min(start_date) as start_date, max(end_date) as end_date, max(Skill) as skill
from (
      select d.*,
      sum(case when DATEDIFF(prev_end_date, start_date) > 0 then 0 else 1 end)  over (partition by id order by start_date) as grp
      from (
            select d.*,
            lag(end_date) over (partition by id order by start_date) as prev_end_date
            from Skills d
      ) d
    ) d
group by id, grp
order by id, min(start_date);

Demo here

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading