I have a table:
| user_id | type | date |
|---|---|---|
| 1 | prime | 2022-05-31 |
| 1 | usual | 2022-06-30 |
| 1 | usual | 2022-07-31 |
| 1 | usual | 2022-09-31 |
I need to get a table grouped by user_id and type and validity period. The main difficulty is that the periods may not be consecutive (they have more than 1 month distance between each other):
Otput should be:
| user_id | type | start_month | end_month |
|---|---|---|---|
| 1 | prime | 2022-05 | 2022-05 |
| 1 | usual | 2022-06 | 2022-07 |
| 1 | usual | 2022-09 | 2022-09 |
>Solution :
This is a typical gaps-and-island problem, where you need to build a more refined partitioning. In order to do that you:
- create a flag to understand when to change partition
- compute a running sum on the flag to create your actual partitions
- apply the aggregation
WITH cte AS (
SELECT *,
CASE WHEN LEAD(date) OVER(PARTITION BY user_id, type ORDER BY date) > date + INTERVAL '1 MONTH' THEN 0 ELSE 1 END AS changepart
FROM tab
), cte2 AS (
SELECT *,
SUM(changepart) OVER(PARTITION BY user_id, type ORDER BY date) AS parts
FROM cte
)
SELECT user_id,
type,
MIN(date) AS start_dt,
MAX(date) AS end_dt
FROM cte2
GROUP BY user_id,
type,
parts
Output:
| user_id | type | start_dt | end_dt |
|---|---|---|---|
| 1 | prime | 2022-05-31 | 2022-05-31 |
| 1 | usual | 2022-06-30 | 2022-07-31 |
| 1 | usual | 2022-09-30 | 2022-09-30 |
Check the demo here.
Note: You can’t have year and month only dates. A date is always composed of a year, a month and a day. At best, your "start_dt" and "end_dt" can be transposed to the first day of that month, using DATE_TRUNC('MONTH', your_date).