Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to correctly get the second row of a partition by

First of all, I’m using BigQuery. In my table I have some duplicated rows due to a change in one of the columns (called fmid). When this happens, I have to get the latest record, which is defined by the time (called created_date). The only thing these duplicated records have in common is the id (called tid).

I have tried two different options to set which row is the first and which one is the latest, so then I could remove what I want.

First one:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

SELECT *, row_number() OVER( PARTITION BY tid) as number

FROM (SELECT created_date,tid, fmid

FROM table 

ORDER BY tid,created_date)

ORDER BY tid, created_date

Results:
Sometimes it’s correct, but the order of the id 296644550 is not right.

1   2023-02-03 22:11:07 UTC 233330088   4   1
2   2023-02-04 10:07:06 UTC 233330088   18  2
3   2023-02-11 11:06:42 UTC 284447799   4   1
4   2023-02-11 11:22:13 UTC 284447799   18  2
5   2023-02-10 10:39:23 UTC 296644550   4   2
6   2023-02-10 19:32:08 UTC 296644550   18  1

Second one:

SELECT *, row_number() OVER( PARTITION BY tid,created_date) as number

FROM (SELECT created_date,tid, fmid

FROM table 

ORDER BY tid,created_date)

ORDER BY tid, created_date

Results:
The column "number" gets the same result.

Row created_date    tid fmid    number
1   2023-02-03 22:11:07 UTC 233330088   4   1
2   2023-02-04 10:07:06 UTC 233330088   18  1
3   2023-02-11 11:06:42 UTC 284447799   4   1
4   2023-02-11 11:22:13 UTC 284447799   18  1
5   2023-02-10 10:39:23 UTC 296644550   4   1
6   2023-02-10 19:32:08 UTC 296644550   18  1

Any idea of what’s wrong?

>Solution :

the sorting has to happen in the window of row_number, so that you get the correct number

SELECT created_date,tid, fmid
FROM (SELECT created_date,tid, fmid
      , row_number() OVER( PARTITION BY tid ORDER BY created_date) as number
       FROM table 
)
WHERE number = 2
ORDER BY tid

or as CTE

WITH CTE AS (
SELECT created_date,tid, fmid
      , row_number() OVER( PARTITION BY tid ORDER BY created_date) as number
       FROM table )

SELECT created_date,tid, fmid
FROM CTE
WHERE number = 2
ORDER BY tid
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading