I am getting a duplicate-data error when I run a MERGE that joins on a column of BINARY type:
MERGE INTO TARGET_TABLE tgt
USING (
SELECT src.*
FROM SRC_TABLE src
) src ON (NVL(tgt.brand,'N') = NVL(src.brand,'N')
AND tgt.host = src.host
AND tgt.binary_col = src.binary_col )
WHEN MATCHED AND (src.TOTAL_DISTINCT != tgt.TOTAL_DISTINCT
OR src.TOTAL_DISTINCT_2 != tgt.TOTAL_DISTINCT_2) THEN
UPDATE
SET tgt.TOTAL_DISTINCT=src.TOTAL_DISTINCT,
tgt.TOTAL_DISTINCT_2=src.TOTAL_DISTINCT_2
WHEN NOT MATCHED THEN
INSERT(BRAND, HOST,binary_col,TOTAL_DISTINCT,TOTAL_DISTINCT_2)
values(src.BRAND,src.HOST,
src.binary_col,src.TOTAL_DISTINCT,src.TOTAL_DISTINCT_2);
When I run the query below, I get count = 2 for a couple of my records:
SELECT nvl(src.brand,'N'), src.host, TO_CHAR(src.binary_col, 'HEX'), count(*)
FROM src_table src
GROUP BY nvl(src.brand,'N'), src.host, TO_CHAR(src.binary_col, 'HEX')
HAVING count(*)> 1;
However, when I try to find those individual records with the query below, it returns only 1 record for each key that the GROUP BY reported as having 2:
SELECT src.*
FROM src_table src
WHERE
brand is null
and host in ('www.myhost.com')
and src.binary_col in ('6D5F2AC3300942F6A4CE066F056CAF0B'::BINARY)
Is there a workaround when using binary columns in merge/group by statements?
>Solution :
A quicker way to identify the "duplicated" rows is to use QUALIFY with a windowed COUNT:
SELECT COALESCE(src.brand,'N'), src.host, TO_CHAR(src.binary_col, 'HEX'), *
FROM src_table src
QUALIFY COUNT(*) OVER (PARTITION BY COALESCE(src.brand,'N'), src.host,
TO_CHAR(src.binary_col, 'HEX')) > 1;
This approach returns every duplicated row in full instead of collapsing each group into a single row, as the GROUP BY/HAVING approach does.
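Once the duplicated rows are visible, one possible workaround for the MERGE itself is to keep a single source row per join key inside the USING subquery. The sketch below reuses the tables and columns from the question. The ORDER BY inside ROW_NUMBER() is an assumption: it is only a placeholder tie-breaker and should be replaced with whatever rule decides which duplicate is the right one to keep.

```sql
MERGE INTO TARGET_TABLE tgt
USING (
    SELECT src.*
    FROM SRC_TABLE src
    -- Keep exactly one row per join key so each target row matches at most one source row.
    -- ASSUMPTION: the ORDER BY is a placeholder tie-breaker, not a rule taken from the question.
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY NVL(src.brand, 'N'), src.host, src.binary_col
        ORDER BY src.TOTAL_DISTINCT DESC, src.TOTAL_DISTINCT_2 DESC
    ) = 1
) src ON (NVL(tgt.brand, 'N') = NVL(src.brand, 'N')
    AND tgt.host = src.host
    AND tgt.binary_col = src.binary_col)
WHEN MATCHED AND (src.TOTAL_DISTINCT != tgt.TOTAL_DISTINCT
    OR src.TOTAL_DISTINCT_2 != tgt.TOTAL_DISTINCT_2) THEN
    UPDATE
    SET tgt.TOTAL_DISTINCT = src.TOTAL_DISTINCT,
        tgt.TOTAL_DISTINCT_2 = src.TOTAL_DISTINCT_2
WHEN NOT MATCHED THEN
    INSERT (BRAND, HOST, binary_col, TOTAL_DISTINCT, TOTAL_DISTINCT_2)
    VALUES (src.BRAND, src.HOST, src.binary_col, src.TOTAL_DISTINCT, src.TOTAL_DISTINCT_2);
```

The PARTITION BY uses the same three expressions as the ON clause, so the subquery cannot hand the MERGE two rows for the same key. Note that NVL(src.brand,'N') places a NULL brand and a literal 'N' brand in the same partition, so it may be worth checking the raw brand values of the rows that the QUALIFY COUNT(*) query returns.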