Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Need help finding duplicate values for Data Quality checks

I have a table which requires me to ensure that a combination of attributes should have a unique record against it.

col1 col2 col3
a     b   x
a     b   y
a     c   x
a     d   z
e     b   w

How do I ensure that a col1+col2 combination only has unique col3 values. Here ab has both x and y as col3 values. I have to send such rows to a reject file and I am looking for the right filter query.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We can use an aggregation approach. To identify rows which are failing the unique requirement use:

WITH cte AS (
    SELECT col1, col2
    FROM yourTable
    GROUP BY col1, col2
    HAVING MIN(col3) <> MAX(col3)
)

SELECT t1.*
FROM yourTable t1
INNER JOIN cte t2
    ON t2.col1 = t1.col1 AND
       t2.col2 = t1.col2;
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading