SQL Query Help in Postgresql

Using this data: https://dbfiddle.uk/zgLVbIbJ
I am trying to get a query that updates rows 1-3, 5-7, 9, 11 when LIMIT 3 is used. I am trying to find the first 3 thing_id’s which match, ordered by status_1_date. Then update all matching thing_id’s rows. So that’s thing_1, thing_2, thing_3 and update only their matching rows which are 1-3, 5-7, 9, 11

This question spawned from a previous thread but no solution: Unable to get Postgresql LIMIT to work with UPDATE and subquery

The issues I am trying to solve is I work with a large database and I need to limit the data returned from it but ensure I get a complete set of data for the entities I am querying for. The thing_id is not the primary key in our database however the thing_id represents an entity that could be composed of data from N rows as indicated by a shared thing_id. I need to make sure I get a complete set of rows for all thing entities where LIMIT represents the total number of thing entities I am fetching data for.

>Solution :

This can’t be done with a limit, because it’s too… limited. Instead we can use a window function. Let’s break it down…

First, we identify the matching rows and put them in order.

select
  *
from thing
where c_id = 'c_id'
  and name like 'protocol://full/path/to/thing/%'
order by status_1_date;

Simple enough.

From those results, we need to pick the first 3 distinct matching thing_ids. We can add a column to the result with dense_rank.

select
  dense_rank() over (order by thing_id) as thing_rank,
  *
from thing
where c_id = 'c_id'
  and name like 'protocol://full/path/to/thing/%'
order by status_1_date;
thing_rank  name    thing_id    c_id    status  status_1_date
1   protocol://full/path/to/thing/thing_1.file1     thing_1     c_id    status_1    2023-09-29 09:00:01
1   protocol://full/path/to/thing/thing_1.file2     thing_1     c_id    status_1    2023-09-29 09:00:02
2   protocol://full/path/to/thing/thing_2.file5     thing_2     c_id    status_1    2023-09-29 09:00:02.5
2   protocol://full/path/to/thing/thing_2.file1     thing_2     c_id    status_1    2023-09-29 09:00:03
2   protocol://full/path/to/thing/thing_2.file2     thing_2     c_id    status_1    2023-09-29 09:00:04
2   protocol://full/path/to/thing/thing_2.file3     thing_2     c_id    status_1    2023-09-29 09:00:05
3   protocol://full/path/to/thing/thing_3.file1     thing_3     c_id    status_1    2023-09-29 09:00:06
4   protocol://full/path/to/thing/thing_4.file1     thing_4     c_id    status_1    2023-09-29 09:00:06.1
3   protocol://full/path/to/thing/thing_3.file2     thing_3     c_id    status_1    2023-09-29 09:00:06.2
4   protocol://full/path/to/thing/thing_4.file2     thing_4     c_id    status_1    2023-09-29 09:00:06.3
5   protocol://full/path/to/thing/thing_5.file1     thing_5     c_id    status_1    2023-09-29 09:00:06.4
5   protocol://full/path/to/thing/thing_5.file2     thing_5     c_id    status_1    2023-09-29 09:00:06.5
5   protocol://full/path/to/thing/thing_5.file3     thing_5     c_id    status_1    2023-09-29 09:00:06.6
6   protocol://full/path/to/thing/thing_6.file1     thing_6     c_id    status_1    2023-09-29 09:00:06.7

Now we have a way to identify the rows which correspond to the first three thing_ids: thing_1 has a rank of 1, thing_2 has a rank of 2, and thing_3 has a rank of 3.

We use dense_rank, not rank, to ensure the ranks are 1, 2, 3… Otherwise we’d have gaps.

Join with that sub-query on the primary key, and only update those with a rank <= 3.

with ranked_matching_things as (
  select
    dense_rank() over (order by thing_id) as thing_rank,
    name
  from thing
  where c_id = 'c_id'
    and name like 'protocol://full/path/to/thing/%'
  order by status_1_date
)
update thing
set status = 'CHANGED'
from ranked_matching_things
where thing.name = ranked_matching_things.name
  and ranked_matching_things.thing_rank <= 3

Demonstration

Leave a Reply