in my project I got 2 tables (postgresql) – partners and campaigns. Each partner may have many campaigns, but each campaign may have only one partner. The data looks like the following:
partners:
id | name
=========
1 Partner_1
2 Partner_2
campaigns:
id | name | partner_id | started_at
=============================================================
1 camp_1 1 2022-07-06 00:00:00.000000 +00:00
2 camp_2 1 2022-07-07 00:00:00.000000 +00:00
3 camp_3 2 2022-08-08 00:00:00.000000 +00:00
4 camp_4 2 2022-08-09 00:00:00.000000 +00:00
The task is for each partner select only 1 campaign with the most recent started_at date. So the final result would look like the following:
partner_name | campaign_started_at
==================================
Partner_1 2022-07-07 00:00:00.000000 +00:00
Partner_2 2022-08-09 00:00:00.000000 +00:00
For these purposes I guess I need to use a window function with select subquery and Group By statements and I tried the following:
SELECT public.partners.name,
public.campaigns.started_at as camp_started_at,
rank() OVER (PARTITION BY public.partners.name ORDER BY public.campaigns.started_at DESC)
FROM public.partners
JOIN campaigns ON campaigns.partner_id = partners.id
GROUP BY partners.name, public.campaigns.started_at;
but it did not produce the desired result. Any ideas how to fix it would be welcome.
>Solution :
You can simply use MAX() and GROUP BY partners.name to get your desired output. The window function RANK() OVER() is expensive and overkill here.
SELECT public.partners.name
, MAX(public.campaigns.started_at) as camp_started_at
FROM public.partners
JOIN campaigns ON campaigns.partner_id = partners.id
GROUP BY partners.name
;