Using this data: https://dbfiddle.uk/zgLVbIbJ
I am trying to get a query that updates rows 1-3, 5-7, 9, 11 when LIMIT 3
is used. I am trying to find the first 3 thing_id’s which match, ordered by status_1_date. Then update all matching thing_id’s rows. So that’s thing_1, thing_2, thing_3 and update only their matching rows which are 1-3, 5-7, 9, 11
This question spawned from a previous thread but no solution: Unable to get Postgresql LIMIT to work with UPDATE and subquery
The issues I am trying to solve is I work with a large database and I need to limit the data returned from it but ensure I get a complete set of data for the entities I am querying for. The thing_id is not the primary key in our database however the thing_id represents an entity that could be composed of data from N rows as indicated by a shared thing_id. I need to make sure I get a complete set of rows for all thing entities where LIMIT represents the total number of thing entities I am fetching data for.
>Solution :
This can’t be done with a limit, because it’s too… limited. Instead we can use a window function. Let’s break it down…
First, we identify the matching rows and put them in order.
select
*
from thing
where c_id = 'c_id'
and name like 'protocol://full/path/to/thing/%'
order by status_1_date;
Simple enough.
From those results, we need to pick the first 3 distinct matching thing_ids. We can add a column to the result with dense_rank
.
select
dense_rank() over (order by thing_id) as thing_rank,
*
from thing
where c_id = 'c_id'
and name like 'protocol://full/path/to/thing/%'
order by status_1_date;
thing_rank name thing_id c_id status status_1_date
1 protocol://full/path/to/thing/thing_1.file1 thing_1 c_id status_1 2023-09-29 09:00:01
1 protocol://full/path/to/thing/thing_1.file2 thing_1 c_id status_1 2023-09-29 09:00:02
2 protocol://full/path/to/thing/thing_2.file5 thing_2 c_id status_1 2023-09-29 09:00:02.5
2 protocol://full/path/to/thing/thing_2.file1 thing_2 c_id status_1 2023-09-29 09:00:03
2 protocol://full/path/to/thing/thing_2.file2 thing_2 c_id status_1 2023-09-29 09:00:04
2 protocol://full/path/to/thing/thing_2.file3 thing_2 c_id status_1 2023-09-29 09:00:05
3 protocol://full/path/to/thing/thing_3.file1 thing_3 c_id status_1 2023-09-29 09:00:06
4 protocol://full/path/to/thing/thing_4.file1 thing_4 c_id status_1 2023-09-29 09:00:06.1
3 protocol://full/path/to/thing/thing_3.file2 thing_3 c_id status_1 2023-09-29 09:00:06.2
4 protocol://full/path/to/thing/thing_4.file2 thing_4 c_id status_1 2023-09-29 09:00:06.3
5 protocol://full/path/to/thing/thing_5.file1 thing_5 c_id status_1 2023-09-29 09:00:06.4
5 protocol://full/path/to/thing/thing_5.file2 thing_5 c_id status_1 2023-09-29 09:00:06.5
5 protocol://full/path/to/thing/thing_5.file3 thing_5 c_id status_1 2023-09-29 09:00:06.6
6 protocol://full/path/to/thing/thing_6.file1 thing_6 c_id status_1 2023-09-29 09:00:06.7
Now we have a way to identify the rows which correspond to the first three thing_ids: thing_1 has a rank of 1, thing_2 has a rank of 2, and thing_3 has a rank of 3.
We use dense_rank
, not rank
, to ensure the ranks are 1, 2, 3… Otherwise we’d have gaps.
Join with that sub-query on the primary key, and only update those with a rank <= 3.
with ranked_matching_things as (
select
dense_rank() over (order by thing_id) as thing_rank,
name
from thing
where c_id = 'c_id'
and name like 'protocol://full/path/to/thing/%'
order by status_1_date
)
update thing
set status = 'CHANGED'
from ranked_matching_things
where thing.name = ranked_matching_things.name
and ranked_matching_things.thing_rank <= 3