Using this data: https://dbfiddle.uk/zgLVbIbJ
I am trying to get a query that updates rows 1-3, 5-7, 9, 11 when
LIMIT 3 is used. I am trying to find the first 3 thing_id’s which match, ordered by status_1_date. Then update all matching thing_id’s rows. So that’s thing_1, thing_2, thing_3 and update only their matching rows which are 1-3, 5-7, 9, 11
This question spawned from a previous thread but no solution: Unable to get Postgresql LIMIT to work with UPDATE and subquery
The issues I am trying to solve is I work with a large database and I need to limit the data returned from it but ensure I get a complete set of data for the entities I am querying for. The thing_id is not the primary key in our database however the thing_id represents an entity that could be composed of data from N rows as indicated by a shared thing_id. I need to make sure I get a complete set of rows for all thing entities where LIMIT represents the total number of thing entities I am fetching data for.
This can’t be done with a limit, because it’s too… limited. Instead we can use a window function. Let’s break it down…
First, we identify the matching rows and put them in order.
select * from thing where c_id = 'c_id' and name like 'protocol://full/path/to/thing/%' order by status_1_date;
From those results, we need to pick the first 3 distinct matching thing_ids. We can add a column to the result with
select dense_rank() over (order by thing_id) as thing_rank, * from thing where c_id = 'c_id' and name like 'protocol://full/path/to/thing/%' order by status_1_date;
thing_rank name thing_id c_id status status_1_date 1 protocol://full/path/to/thing/thing_1.file1 thing_1 c_id status_1 2023-09-29 09:00:01 1 protocol://full/path/to/thing/thing_1.file2 thing_1 c_id status_1 2023-09-29 09:00:02 2 protocol://full/path/to/thing/thing_2.file5 thing_2 c_id status_1 2023-09-29 09:00:02.5 2 protocol://full/path/to/thing/thing_2.file1 thing_2 c_id status_1 2023-09-29 09:00:03 2 protocol://full/path/to/thing/thing_2.file2 thing_2 c_id status_1 2023-09-29 09:00:04 2 protocol://full/path/to/thing/thing_2.file3 thing_2 c_id status_1 2023-09-29 09:00:05 3 protocol://full/path/to/thing/thing_3.file1 thing_3 c_id status_1 2023-09-29 09:00:06 4 protocol://full/path/to/thing/thing_4.file1 thing_4 c_id status_1 2023-09-29 09:00:06.1 3 protocol://full/path/to/thing/thing_3.file2 thing_3 c_id status_1 2023-09-29 09:00:06.2 4 protocol://full/path/to/thing/thing_4.file2 thing_4 c_id status_1 2023-09-29 09:00:06.3 5 protocol://full/path/to/thing/thing_5.file1 thing_5 c_id status_1 2023-09-29 09:00:06.4 5 protocol://full/path/to/thing/thing_5.file2 thing_5 c_id status_1 2023-09-29 09:00:06.5 5 protocol://full/path/to/thing/thing_5.file3 thing_5 c_id status_1 2023-09-29 09:00:06.6 6 protocol://full/path/to/thing/thing_6.file1 thing_6 c_id status_1 2023-09-29 09:00:06.7
Now we have a way to identify the rows which correspond to the first three thing_ids: thing_1 has a rank of 1, thing_2 has a rank of 2, and thing_3 has a rank of 3.
rank, to ensure the ranks are 1, 2, 3… Otherwise we’d have gaps.
Join with that sub-query on the primary key, and only update those with a rank <= 3.
with ranked_matching_things as ( select dense_rank() over (order by thing_id) as thing_rank, name from thing where c_id = 'c_id' and name like 'protocol://full/path/to/thing/%' order by status_1_date ) update thing set status = 'CHANGED' from ranked_matching_things where thing.name = ranked_matching_things.name and ranked_matching_things.thing_rank <= 3