Updating all rows but one that have same values on certain columns

August 17, 2023

Given the following schema

CREATE TEMP TABLE tmp_table 
(
    id_bucket int2 NOT NULL, 
    id_property int2 NOT NULL,
    v1 varchar(320) NOT NULL,
    v2 varchar(320) NULL
);

And the following data

id_bucket  id_property v1   v2
------------------------------
1          3           OT   A
1          8           RO   C
1          2           OT   C
1          1           RT   D   
1          5           OT   C
1          4           OT   C

I would like that all rows that have the same values for v1 and v2 (in this case v1 = OT, v2 = C) be updated with a value for v2 of NULL except for one, regardless.

I tried grouping by my criterias, but then I lose the details of the rows in question. I know id_property will probably be useful, but when using GROUP BY’s, I don’t have access to that column.

I’ve been pulling my hair out for a while on this one. Any ideas ?

>Solution :

Your solution are analytic functions aka window functions.

UPDATE tmp_table
SET v2 = NULL
WHERE (id_bucket,
       id_property) in -- this assumes these columns form a unique key
    (SELECT id_bucket,
            id_property
     FROM
       (SELECT id_bucket,
               id_property,
               row_number() OVER (
                  PARTITION BY v1, v2) rn -- numbers rows with identical v1 and v2
                                                
        FROM tmp_table
       ) x
     WHERE rn > 1 
    ); -- removes the first row with identical v1, v2

Results in

id_bucket	id_property	v1	v2
1	3	OT	A
1	8	RO	C
1	2	OT	C
1	1	RT	D
1	5	OT	null
1	4	OT	null

See the DB-fiddle for a full example.

The benefit of a solution based on window functions is that the underlying table gets selected only once, while comparable solutions based on group by in general need to select the table at least a second time. This makes window functions much more efficient.
Of course, that is only really relevant when table get large.