Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to get 1st column value using partition

I have a table looks like this

CREATE TABLE myTable
(
    userid      text,
    webid         text,
    "timestamp" timestamp
);

INSERT INTO myTable
    ("userid", "webid", "timestamp")
VALUES ('A', '34', '2023-01-31 17:34:49.000'),
       ('A', '73', '2023-01-31 17:34:50.000'),
       ('A', '97', '2023-01-31 17:34:58.000'),
       ('A', '17', '2023-01-31 17:35:02.000'),
       ('A', '17', '2023-01-31 17:35:07.000'),
       ('A', '17', '2023-01-31 17:35:18.000'),
       ('A', '17', '2023-01-31 17:35:30.000'),
       ('A', '1', '2023-01-31 17:35:37.000'),
       ('A', '1', '2023-01-31 17:35:38.000'),
       ('A', '77', '2023-01-31 17:35:41.000'),
       ('A', '77', '2023-01-31 17:35:42.000'),
       ('A', '1', '2023-01-31 17:37:10.000'),
       ('A', '1', '2023-01-31 17:37:12.000'),
       ('A', '77', '2023-01-31 17:37:14.000'),
       ('A', '77', '2023-01-31 17:37:15.000'),
       ('B', '33', '2023-01-31 06:37:15.000'),
       ('B', '56', '2023-01-31 06:40:15.000')
  ;

userid webid timestamp
A 34 2023-01-31 17:34:49
A 73 2023-01-31 17:34:50
A 97 2023-01-31 17:34:58
A 17 2023-01-31 17:35:02
A 17 2023-01-31 17:35:07
A 17 2023-01-31 17:35:18
A 17 2023-01-31 17:35:30
A 1 2023-01-31 17:35:37
A 1 2023-01-31 17:35:38
A 77 2023-01-31 17:35:41
A 77 2023-01-31 17:35:42
A 1 2023-01-31 17:37:10
A 1 2023-01-31 17:37:12
A 77 2023-01-31 17:37:14
A 77 2023-01-31 17:37:15
B 33 2023-01-31 06:37:15
B 56 2023-01-31 06:40:15

The correct output I want is this:

userid webid timestamp first_ts
A 34 2023-01-31 17:34:49 2023-01-31 17:34:49
A 73 2023-01-31 17:34:50 2023-01-31 17:34:50
A 97 2023-01-31 17:34:58 2023-01-31 17:34:58
A 17 2023-01-31 17:35:02 2023-01-31 17:35:02
A 17 2023-01-31 17:35:07 2023-01-31 17:35:02
A 17 2023-01-31 17:35:18 2023-01-31 17:35:02
A 17 2023-01-31 17:35:30 2023-01-31 17:35:02
A 1 2023-01-31 17:35:37 2023-01-31 17:35:37
A 1 2023-01-31 17:35:38 2023-01-31 17:35:37
A 77 2023-01-31 17:35:41 2023-01-31 17:35:41
A 77 2023-01-31 17:35:42 2023-01-31 17:35:41
A 1 2023-01-31 17:37:10 2023-01-31 17:37:10
A 1 2023-01-31 17:37:12 2023-01-31 17:37:10
A 77 2023-01-31 17:37:14 2023-01-31 17:37:14
A 77 2023-01-31 17:37:15 2023-01-31 17:37:14
B 33 2023-01-31 06:37:15 2023-01-31 06:37:15
B 56 2023-01-31 06:40:15 2023-01-31 06:40:15

first_ts means first timestamp. If a userid visits consecutive webid, their first_ts should be the first timestamp of this webid. If the userid visits webid let’s say webid = 77 and jumps to another webid = 1 and then jumps back to webid = 77, the first_ts should be refreshed, it won’t be the very first one. I used following code to do that, but unable to get correct first_ts for userid who jumps back to previous webid.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

select userid, webid, 
lag(webid,1) over(partition by userid order by timestamp desc) as next_webid,
lag(webid,1) over(partition by userid order by timestamp asc) as previous_webid,
timestamp,
case when webid = next_webid then first_value(timestamp) over(partition by userid, webid order by timestamp) 
     when webid = next_webid and webid = previous_webid then first_value(timestamp) over(partition by userid, webid order by timestamp) 
     when webid = previous_webid then first_value(timestamp) over(partition by userid, webid order by timestamp) 
     else timestamp
end as first_ts
from myTable 
order by userid, timestamp;

this wrong query will return output

userid webid timestamp first_ts
A 34 2023-01-31 17:34:49 2023-01-31 17:34:49
A 73 2023-01-31 17:34:50 2023-01-31 17:34:50
A 97 2023-01-31 17:34:58 2023-01-31 17:34:58
A 17 2023-01-31 17:35:02 2023-01-31 17:35:02
A 17 2023-01-31 17:35:07 2023-01-31 17:35:02
A 17 2023-01-31 17:35:18 2023-01-31 17:35:02
A 17 2023-01-31 17:35:30 2023-01-31 17:35:02
A 1 2023-01-31 17:35:37 2023-01-31 17:35:37
A 1 2023-01-31 17:35:38 2023-01-31 17:35:37
A 77 2023-01-31 17:35:41 2023-01-31 17:35:41
A 77 2023-01-31 17:35:42 2023-01-31 17:35:41
A 1 2023-01-31 17:37:10 2023-01-31 17:35:37
A 1 2023-01-31 17:37:12 2023-01-31 17:35:37
A 77 2023-01-31 17:37:14 2023-01-31 17:35:41
A 77 2023-01-31 17:37:15 2023-01-31 17:35:41
B 33 2023-01-31 06:37:15 2023-01-31 06:37:15
B 56 2023-01-31 06:40:15 2023-01-31 06:40:15

in row 12, 13, 14, 15, the first_ts is wrong.
Anyone knows how to modify the query to get correct first_ts? Thanks!

>Solution :

This is a gaps and islands problem. One solution uses the difference in row numbers method:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY userid ORDER BY timestamp) rn1,
              ROW_NUMBER() OVER (PARTITION BY userid, webid ORDER BY timestamp) rn2
    FROM myTable
)

SELECT userid, webid, timestamp,
       MIN(timestamp) OVER (PARTITION BY userid, webid, rn1 - rn2) AS first_ts
FROM cte
ORDER BY userid, timestamp;

The idea here is to use the row number difference to form a pseudo group, which is defined by the userid, webid, and rn1 - rn2. We then take the minimum timestamp for each such group/island to find the first_ts. To get a better understanding of how difference in row numbers works, just run SELECT * FROM cte and check the results.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading