Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Gap-and-island for more than time threshold

I have these tables:

CREATE TABLE labels(user_id INT, session_id INT,
  start_time TIMESTAMP,mode TEXT);

INSERT INTO labels (user_id,session_id,start_time,mode)
VALUES  (48,652,'2016-04-01 00:47:00+01','foot'),
(9,656,'2016-04-01 00:03:39+01','car'),(9,657,'2016-04-01 00:26:51+01','car'),
(9,658,'2016-04-01 00:45:19+01','car'),(46,663,'2016-04-01 00:13:12+01','car');

CREATE TABLE raw_data(user_id INT, session_id INT,timestamp TIMESTAMP) ;

INSERT INTO raw_data(user_id, session_id,timestamp)          
VALUES (8,652,'2016-04-01 00:46:11.638+01'),(8,652,'2016-04-01 00:47:00.566+01'),
       (8,652,'2016-04-01 00:48:06.383+01'),(9,656,'2016-04-01 00:14:17.707+01'),
       (9,656,'2016-04-01 00:15:18.664+01'),(9,656,'2016-04-01 00:16:19.687+01'),
       (9,656,'2016-04-01 00:24:20.691+01'),(9,656,'2016-04-01 00:25:23.681+01'),
       (9,657,'2016-04-01 00:24:50.842+01'),(9,657,'2016-04-01 00:26:51.096+01'),
       (9,657,'2016-04-01 00:37:54.092+01');

I want to investigate sessions of a user having time time difference between successive rows greater than 5 minutes, reporting:

  • session and session mode
  • number of sessions with the gap-and-island problem.

My attempt:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

select session_id     
      ,timestamp    
      ,user_id  
      ,start_time   
      ,count(diff) over()/2 as number_of_session_with_problem
from  (
       select *
              ,case when timestamp-lag(timestamp) over(partition by session_id order by timestamp)    > '00:05:00.000' then 1 when lead(timestamp) over(partition by session_id order by timestamp) - timestamp > '00:05:00.000' then 1 end as diff
       from   raw_data join labels using(session_id)
      ) t

Error:

ERROR:  column reference "user_id" is ambiguous
LINE 3:       ,user_id 
               ^

Expected results:

session_id  timestamp            user_id    start_time        number_of_session_with_problem
   656     2016-04-01 00:16:19.687  9      2016-04-01 00:03:39     1
   657     2016-04-01 00:26:51.096  9      2016-04-01 00:26:51     2 

Note: dbfiddle.

>Solution :

The problem is we have two user_id and they’re not defined so you need to specifically choose them and give them aliases.

select  session_id    
       ,timestamp   
       ,user_id 
       ,start_time  
       ,count(diff) over() as number_of_sessions_with_problem
from   (
       select session_id     
             ,timestamp    
             ,labels.user_id  
             ,start_time   
             ,case when lead(timestamp) over(partition by session_id order by timestamp)-timestamp > '00:05:00.000' then 1 end as diff
       from   raw_data join labels using(session_id)
       ) t
where  diff = 1
session_id timestamp user_id start_time number_of_sessions_with_problem
656 2016-04-01 00:16:19.687 9 2016-04-01 00:03:39 2
657 2016-04-01 00:26:51.096 9 2016-04-01 00:26:51 2

Fiddle

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading