Given the following schema
CREATE TEMP TABLE tmp_table
(
id_bucket int2 NOT NULL,
id_property int2 NOT NULL,
v1 varchar(320) NOT NULL,
v2 varchar(320) NULL
);
And the following data
id_bucket id_property v1 v2
------------------------------
1 3 OT A
1 8 RO C
1 2 OT C
1 1 RT D
1 5 OT C
1 4 OT C
I would like that all rows that have the same values for v1 and v2 (in this case v1 = OT, v2 = C) be updated with a value for v2 of NULL except for one, regardless.
I tried grouping by my criterias, but then I lose the details of the rows in question. I know id_property will probably be useful, but when using GROUP BY’s, I don’t have access to that column.
I’ve been pulling my hair out for a while on this one. Any ideas ?
>Solution :
Your solution are analytic functions aka window functions.
UPDATE tmp_table
SET v2 = NULL
WHERE (id_bucket,
id_property) in -- this assumes these columns form a unique key
(SELECT id_bucket,
id_property
FROM
(SELECT id_bucket,
id_property,
row_number() OVER (
PARTITION BY v1, v2) rn -- numbers rows with identical v1 and v2
FROM tmp_table
) x
WHERE rn > 1
); -- removes the first row with identical v1, v2
Results in
| id_bucket | id_property | v1 | v2 |
|---|---|---|---|
| 1 | 3 | OT | A |
| 1 | 8 | RO | C |
| 1 | 2 | OT | C |
| 1 | 1 | RT | D |
| 1 | 5 | OT | null |
| 1 | 4 | OT | null |
See the DB-fiddle for a full example.
The benefit of a solution based on window functions is that the underlying table gets selected only once, while comparable solutions based on group by in general need to select the table at least a second time. This makes window functions much more efficient.
Of course, that is only really relevant when table get large.