I have the below table that shows a monthly view of customers that received an email in a particular month.
| cust_id | eom_date | email_sent |
|---|---|---|
| 1 | 2023-06-30 | 1 |
| 1 | 2023-05-31 | |
| 1 | 2023-04-30 | 1 |
| 1 | 2023-03-31 | |
| 1 | 2023-02-28 | |
| 1 | 2023-01-31 | 1 |
| 2 | 2023-06-30 | 1 |
| 2 | 2023-05-31 | |
| 2 | 2023-04-30 | 1 |
| 2 | 2023-03-31 | |
| 2 | 2023-02-28 | |
| 2 | 2023-01-31 | |
| 2 | 2022-12-31 | 1 |
I need to find out the number of months that has passed since the last email a customer received. This is the desired outcome:
| cust_id | eom_date | email_sent | months_since_last_email |
|---|---|---|---|
| 1 | 2023-06-30 | 1 | 0 |
| 1 | 2023-05-31 | 1 | |
| 1 | 2023-04-30 | 1 | 0 |
| 1 | 2023-03-31 | 2 | |
| 1 | 2023-02-28 | 1 | |
| 1 | 2023-01-31 | 1 | 0 |
| 2 | 2023-06-30 | 1 | 0 |
| 2 | 2023-05-31 | 1 | |
| 2 | 2023-04-30 | 1 | 0 |
| 2 | 2023-03-31 | 3 | |
| 2 | 2023-02-28 | 2 | |
| 2 | 2023-01-31 | 1 | |
| 2 | 2022-12-31 | 1 | 0 |
I tried the below in BigQuery but it only gives me cust_ids for previous periods, basically getting nowhere:
SELECT cust_id,
eom_date,
lag(cust_id, 1) over (
partition by cust_id order by eom_date) t1,
lag(cust_id, 2) over (
partition by cust_id order by eom_date) t2
FROM table
Thanks in advance.
>Solution :
This is for SQL Server. Don’t have BigQuery, but these are pretty commonly supported functions.
The general idea should work for BigQuery using date_diff instead of datediff
Fiddle: https://dbfiddle.uk/Hor4MnrD
CREATE TABLE your_table_name (
cust_id INT,
eom_date DATE,
email_sent INT
);
INSERT INTO your_table_name (cust_id, eom_date, email_sent)
VALUES
(1, '2023-06-30', 1),
(1, '2023-05-31', NULL),
(1, '2023-04-30', 1),
(1, '2023-03-31', NULL),
(1, '2023-02-28', NULL),
(1, '2023-01-31', 1),
(2, '2023-06-30', 1),
(2, '2023-05-31', NULL),
(2, '2023-04-30', 1),
(2, '2023-03-31', NULL),
(2, '2023-02-28', NULL),
(2, '2023-01-31', NULL),
(2, '2022-12-31', 1);
Option A: Correlated subquery. Find the last email sent by cust_id and take the date diff.
select *,
datediff(
month,
(
select max(t2.eom_date)
from your_table_name t2
where t1.cust_id = t2.cust_id
and t1.eom_date >= t2.eom_date
and t2.email_sent = 1
group
by t2.cust_id
),
eom_date) as months_since_last_email
from your_table_name t1
order
by cust_id,
eom_date desc;
Option B: Window function. Send non-email months as null, so the max in the unbounded preceding window will be the last sent date.
select *,
datediff(
month,
max(case when email_sent = 1 then
eom_date
end
) over ( partition by cust_id
order by eom_date
rows between unbounded preceding and current row
),
eom_date) as months_since_last_email
from your_table_name t1
order
by cust_id,
eom_date desc;