Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Selecting Different Rows in a Group based on condition

I have the following problem with selecting rows in a group:
I have records from different data sources, each data source has a priority assigned to it.
Whenever I have two identical records with different data sources, I want to select the one with the lowest priority.
Whenever I have duplicate records from the same data source, but with different entries in the VALUE column, I want to select the row with ‘t’ (if there are several, take the first).

Here’s an example dataset:

Example Dataset

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

And the expected result:

Expected result

What I did until now was to select those entries, that had the lowest prio inside a group.
Here’s a code example:

WITH Product AS (
SELECT 1 id, 'A' code, 'f' value, 'S1' datasource, 1 prio FROM DUAL 
UNION ALL
SELECT 1 id, 'A' code, 'f' value, 'S2' datasource, 2 prio FROM DUAL 
UNION ALL 
SELECT 3 id, 'A' code, 'f' value, 'S3' datasource, 3 prio FROM DUAL 
UNION ALL 
SELECT 2 id, 'A' code, 'f' value, 'S1' datasource, 1 prio FROM DUAL 
UNION ALL 
SELECT 2 id, 'B' code, 't' value, 'S1' datasource, 1 prio FROM DUAL 
),
extended_product AS
(
SELECT p.*, ROW_NUMBER() OVER(PARTITION BY p.ID ORDER BY p.PRIO ASC) AS ROW_NUMBER
FROM Product p 
)
SELECT ep.*
FROM extended_product ep
WHERE ROW_NUMBER = 1;

I don’t know how to do both conditions, selecting lowest prio and in case all entries inside the group have the same datasource, take only those with ‘t’ (and take the first one if there are several).

>Solution :

You’re close, I think. row_number misses the value part:

ROW_NUMBER () OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, VALUE DESC) AS ROW_NUMBER

SQL> WITH
  2     Product
  3     AS
  4        (SELECT 1 id, 'A' code, 'f' VALUE, 'S1' datasource, 1 prio FROM DUAL
  5         UNION ALL
  6         SELECT 1 id, 'B' code, 'f' VALUE, 'S2' datasource, 2 prio FROM DUAL
  7         UNION ALL
  8         SELECT 3 id, 'C' code, 'f' VALUE, 'S3' datasource, 3 prio FROM DUAL
  9         UNION ALL
 10         SELECT 2 id, 'D' code, 'f' VALUE, 'S1' datasource, 1 prio FROM DUAL
 11         UNION ALL
 12         SELECT 2 id, 'E' code, 't' VALUE, 'S1' datasource, 1 prio FROM DUAL),
 13     extended_product
 14     AS
 15        (SELECT p.*,
 16                ROW_NUMBER ()
 17                   OVER (PARTITION BY p.ID ORDER BY p.PRIO ASC, VALUE DESC) AS ROW_NUMBER
 18           FROM Product p)
 19    SELECT ep.*
 20      FROM extended_product ep
 21     WHERE ep.ROW_NUMBER = 1
 22  ORDER BY ID, PRIO;

        ID C V DA       PRIO ROW_NUMBER
---------- - - -- ---------- ----------
         1 A f S1          1          1
         2 E t S1          1          1
         3 C f S3          3          1

SQL>
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading