I have following SQL table
eventdate userid traffic location
18.09.2023 user_1 10 A
18.09.2023 user_1 20 A
18.09.2023 user_2 10 B
18.09.2023 user_2 20 B
18.09.2023 user_2 30 B
18.09.2023 user_3 100 A
19.09.2023 user_1 50 B
19.09.2023 user_2 10 B
19.09.2023 user_2 20 B
19.09.2023 user_3 150 C
19.09.2023 user_3 250 C
20.09.2023 user_1 50 A
20.09.2023 user_1 20 A
20.09.2023 user_2 30 B
20.09.2023 user_3 110 C
20.09.2023 user_3 120 C
I want to have result as follows
eventdate – beginning of each week,
userid – unique user id per week,
traffic – sum of all traffics,
location – the most frequent location appeared with the week
For example
eventdate userid traffic location
18.09.2023 user_1 150 A
18.09.2023 user_2 120 B
18.09.2023 user_3 730 C
I was able to achieve the result with following query
SELECT t1.eventdate, t1.userid, t1.traffic, t2.location
FROM (SELECT TO_CHAR(TRUNC(TO_DATE('2023-09-18', 'yyyy-mm-dd'), 'IW'),
'yyyy-mm-dd') AS eventdate,
tk.userid,
SUM(tk.traffic) AS traffic
FROM test_kt tk
GROUP BY tk.userid) t1
JOIN (
WITH cte AS
(
SELECT tk2.userid,
tk2.location,
ROW_NUMBER() OVER
(PARTITION BY tk2.userid ORDER BY COUNT(tk2.location) DESC) rn
FROM test_kt tk2
GROUP BY tk2.userid, tk2.location
)
SELECT userid, location
FROM cte
WHERE rn = 1
) t2
ON t1.userid = t2.userid;
Are there any efficient ways to do so?
>Solution :
You may achieve the same with STATS_MODE
aggregate function:
with a(eventdate, userid, traffic, location) as (
select to_date('18.09.2023', 'dd.mm.yyyy'), 'user_1', 10, 'A' from dual union all
select to_date('18.09.2023', 'dd.mm.yyyy'), 'user_1', 20, 'A' from dual union all
select to_date('18.09.2023', 'dd.mm.yyyy'), 'user_2', 10, 'B' from dual union all
select to_date('18.09.2023', 'dd.mm.yyyy'), 'user_2', 20, 'B' from dual union all
select to_date('18.09.2023', 'dd.mm.yyyy'), 'user_2', 30, 'B' from dual union all
select to_date('18.09.2023', 'dd.mm.yyyy'), 'user_3', 100, 'A' from dual union all
select to_date('19.09.2023', 'dd.mm.yyyy'), 'user_1', 50, 'B' from dual union all
select to_date('19.09.2023', 'dd.mm.yyyy'), 'user_2', 10, 'B' from dual union all
select to_date('19.09.2023', 'dd.mm.yyyy'), 'user_2', 20, 'B' from dual union all
select to_date('19.09.2023', 'dd.mm.yyyy'), 'user_3', 150, 'C' from dual union all
select to_date('19.09.2023', 'dd.mm.yyyy'), 'user_3', 250, 'C' from dual union all
select to_date('20.09.2023', 'dd.mm.yyyy'), 'user_1', 50, 'A' from dual union all
select to_date('20.09.2023', 'dd.mm.yyyy'), 'user_1', 20, 'A' from dual union all
select to_date('20.09.2023', 'dd.mm.yyyy'), 'user_2', 30, 'B' from dual union all
select to_date('20.09.2023', 'dd.mm.yyyy'), 'user_3', 110, 'C' from dual union all
select to_date('20.09.2023', 'dd.mm.yyyy'), 'user_3', 120, 'C' from dual
)
select
trunc(eventdate, 'iw') as eventdate,
userid,
sum(traffic) as traffic,
stats_mode(location) as location
from a
group by
trunc(eventdate, 'iw'),
userid
EVENTDATE | USERID | TRAFFIC | LOCATION |
---|---|---|---|
2023-09-18 | user_1 | 150 | A |
2023-09-18 | user_2 | 120 | B |
2023-09-18 | user_3 | 730 | C |
UPD: You may also consult the documentation for this function that shows sample query to obtain multiple modes if they are exist (for which you will use DENSE_RANK
in your original query in place of ROW_NUMBER
).