Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Add a column and populate based on other columns values

I have the following table:

CREATE TABLE trajectory(
    user_id int, 
    session_id int, 
    lat double precision, 
    lon double precision
);

INSERT INTO trajectory(user_id, session_id, lat, lon) VALUES 
(1, 25304,39.984702, 116.318417),
(1, 25304, 39.984683, 116.31845), 
(1, 25304, 39.984686, 116.318417),
(1, 25304, 39.984688, 116.318385), 
(1, 20959,40.008304, 116.319876),
(1, 20959,40.008413, 116.319962), 
(1, 20959,40.007171, 116.319458),
(2, 55305, 39.984094, 116.319236), 
(2, 55305, 39.984198, 116.319322), 
(2, 55305, 39.984224, 116.319402), 
(2, 34104, 40.013867, 116.306473),
(2, 34104, 40.013907, 116.306488);

Question:

I want to add a column trip_id to this table, based on the user_id and session_id columns, so that when the user’s session id changes, I know the user is on a new trip so I add that id to the new trip column.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Required output:

user_id session_id lat lon trip_id
1 25304 39.984702 116.318417 1
1 25304 39.984683 116.31845 1
1 25304 39.984686 116.318417 1
1 25304 39.984688 116.318385 1
1 20959 40.008304 116.319876 2
1 20959 40.008413 116.319962 2
1 20959 40.007171 116.319458 2
2 55305 39.984094 116.319236 1
2 55305 39.984198 116.319322 1
2 55305 39.984224 116.319402 1
2 34104 40.013867 116.306473 2
2 34104 40.013907 116.306488 2

How can I do this?

>Solution :

We can approach this with a gaps-and-island technique. Assuming that you have a column to order your dataset, say ordering_id:

select t.*, 
    count(*) 
        filter(where session_id is distinct from lag_session_id) 
        over(partition by user_id order by ordering_id) trip_id
from (
    select t.*, 
        lag(session_id) over(partition by user_id order by ordering_id) lag_session_id
    from trajectory t
) t

The idea is to bring the "preceding" session id of the same user with lag, then to compare it with the value on the current row; we can then count how many times it changed, which defines the trip id.

user_id session_id lat lon ordering_id lag_session_id trip_id
1 25304 39.984702 116.318417 1 null 1
1 25304 39.984683 116.31845 2 25304 1
1 25304 39.984686 116.318417 3 25304 1
1 25304 39.984688 116.318385 4 25304 1
1 20959 40.008304 116.319876 5 25304 2
1 20959 40.008413 116.319962 6 20959 2
1 20959 40.007171 116.319458 7 20959 2
2 55305 39.984094 116.319236 8 null 1
2 55305 39.984198 116.319322 9 55305 1
2 55305 39.984224 116.319402 10 55305 1
2 34104 40.013867 116.306473 11 55305 2
2 34104 40.013907 116.306488 12 34104 2

fiddle

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading