I have the following table:
CREATE TABLE trajectory(
user_id int,
session_id int,
lat double precision,
lon double precision
);
INSERT INTO trajectory(user_id, session_id, lat, lon) VALUES
(1, 25304,39.984702, 116.318417),
(1, 25304, 39.984683, 116.31845),
(1, 25304, 39.984686, 116.318417),
(1, 25304, 39.984688, 116.318385),
(1, 20959,40.008304, 116.319876),
(1, 20959,40.008413, 116.319962),
(1, 20959,40.007171, 116.319458),
(2, 55305, 39.984094, 116.319236),
(2, 55305, 39.984198, 116.319322),
(2, 55305, 39.984224, 116.319402),
(2, 34104, 40.013867, 116.306473),
(2, 34104, 40.013907, 116.306488);
Question:
I want to add a column trip_id to this table, based on the user_id and session_id columns, so that when the user’s session id changes, I know the user is on a new trip so I add that id to the new trip column.
Required output:
| user_id | session_id | lat | lon | trip_id |
|---|---|---|---|---|
| 1 | 25304 | 39.984702 | 116.318417 | 1 |
| 1 | 25304 | 39.984683 | 116.31845 | 1 |
| 1 | 25304 | 39.984686 | 116.318417 | 1 |
| 1 | 25304 | 39.984688 | 116.318385 | 1 |
| 1 | 20959 | 40.008304 | 116.319876 | 2 |
| 1 | 20959 | 40.008413 | 116.319962 | 2 |
| 1 | 20959 | 40.007171 | 116.319458 | 2 |
| 2 | 55305 | 39.984094 | 116.319236 | 1 |
| 2 | 55305 | 39.984198 | 116.319322 | 1 |
| 2 | 55305 | 39.984224 | 116.319402 | 1 |
| 2 | 34104 | 40.013867 | 116.306473 | 2 |
| 2 | 34104 | 40.013907 | 116.306488 | 2 |
How can I do this?
>Solution :
We can approach this with a gaps-and-island technique. Assuming that you have a column to order your dataset, say ordering_id:
select t.*,
count(*)
filter(where session_id is distinct from lag_session_id)
over(partition by user_id order by ordering_id) trip_id
from (
select t.*,
lag(session_id) over(partition by user_id order by ordering_id) lag_session_id
from trajectory t
) t
The idea is to bring the "preceding" session id of the same user with lag, then to compare it with the value on the current row; we can then count how many times it changed, which defines the trip id.
| user_id | session_id | lat | lon | ordering_id | lag_session_id | trip_id |
|---|---|---|---|---|---|---|
| 1 | 25304 | 39.984702 | 116.318417 | 1 | null | 1 |
| 1 | 25304 | 39.984683 | 116.31845 | 2 | 25304 | 1 |
| 1 | 25304 | 39.984686 | 116.318417 | 3 | 25304 | 1 |
| 1 | 25304 | 39.984688 | 116.318385 | 4 | 25304 | 1 |
| 1 | 20959 | 40.008304 | 116.319876 | 5 | 25304 | 2 |
| 1 | 20959 | 40.008413 | 116.319962 | 6 | 20959 | 2 |
| 1 | 20959 | 40.007171 | 116.319458 | 7 | 20959 | 2 |
| 2 | 55305 | 39.984094 | 116.319236 | 8 | null | 1 |
| 2 | 55305 | 39.984198 | 116.319322 | 9 | 55305 | 1 |
| 2 | 55305 | 39.984224 | 116.319402 | 10 | 55305 | 1 |
| 2 | 34104 | 40.013867 | 116.306473 | 11 | 55305 | 2 |
| 2 | 34104 | 40.013907 | 116.306488 | 12 | 34104 | 2 |