The calculation of a time span with gaps and islands was already discussed.
But what about different patterns of time spans present?
The present data is a combination of gaps between time spans and contiguous time spans. These are available in different forms.
ID 1 has 5 time spans of which 4 lie within the first time span from 2000-08-08 to 2019-03-31 but for ID 2 the time spans are present in such a way that the next time span in the next line falls within the end date of the following line. ID 3 has a gap between the first enddate and the following startdate.
My code works just for ID 2 and 3.
How do I find the time span for ID 1 as well?
In exactly this case my code does not work.
The problem occurs with ID 1 as the code is not able to solve it with 1 iteration, because the data is present in a different way.
| id | startdate | enddate |
|---|---|---|
| 1 | 2000-08-08 | 2019-03-31 |
| 1 | 2007-06-08 | 2007-09-09 |
| 1 | 2008-02-08 | 2011-08-04 |
| 1 | 2012-08-01 | 2012-09-20 |
| 1 | 2018-02-01 | 2019-03-31 |
| 2 | 2013-02-18 | 2019-03-28 |
| 2 | 2018-04-12 | 2020-07-28 |
| 3 | 2015-01-11 | 2015-04-02 |
| 3 | 2016-02-08 | 2021-11-22 |
My code:
WITH cte1 AS (
SELECT
id,
startdate,
enddate,
CASE
WHEN LAG(enddate) OVER (PARTITION BY id ORDER BY startdate) >= DATEADD(day, -1, startdate) THEN 0
ELSE 1
END AS new_grp
FROM tab1
), cte2 AS (
SELECT
cte1.*,
SUM(new_grp) OVER (PARTITION BY id ORDER BY startdate) AS grp_num
FROM cte1
)
SELECT
id,
MIN(startdate) AS startdate,
MAX(enddate) AS enddate
FROM cte2
GROUP BY id, grp_num
ORDER BY id, startdate;
But when I want to derive the time span for ID 1 it does need 4 iterations – so the classical gaps and islands approach is not feasible.
What I expect:
| id | startdate | enddate |
|---|---|---|
| 1 | 2000-08-08 | 2019-03-31 |
| 2 | 2013-02-18 | 2020-07-28 |
| 3 | 2015-01-11 | 2015-04-02 |
| 3 | 2016-02-08 | 2021-11-22 |
>Solution :
You can take max(enddte) in preceding rows.
Change
CASE
WHEN LAG(enddate) OVER (PARTITION BY id ORDER BY startdate) >= DATEADD(day, -1, startdate) THEN 0
ELSE 1
END AS new_grp
with
CASE
WHEN MAX(enddate)
OVER (PARTITION BY id ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
>= DATEADD(day, -1, startdate)
THEN 0
ELSE 1
END AS new_grp
Query example (from your fiddle)
WITH cte1 AS (
SELECT
id,
startdate,
enddate,
CASE
WHEN MAX(enddate)
OVER (PARTITION BY id ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
>= DATEADD(day, -1, startdate)
THEN 0
ELSE 1
END AS new_grp
FROM tab1
), cte2 AS (
SELECT
cte1.*,
SUM(new_grp) OVER (PARTITION BY id ORDER BY startdate) AS grp_num
FROM cte1
)
SELECT
id,
MIN(startdate) AS startdate,
MAX(enddate) AS enddate
FROM cte2
GROUP BY id, grp_num
ORDER BY id, startdate;