Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Select only rows that have a value for all categories in a column after group by

DB fiddle: https://dbfiddle.uk/QLKIuW1G


For a given table, I have grouped by two columns and calculated the average of a third column. However, I would like to only get rows that have a value for all possible values in the category column (these are 1 and 2). How can this result be achieved?

Example table:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

| name     | category | number |
| -------- | -------- | ------ | 
| jack     | 1        | 12.30  |
| jack     | 1        | 12.50  |
| jack     | 2        | 13.35  |
| jack     | 2        | 13.35  |
| jack     | 2        | 13.35  |
| james    | 1        | 18.76  |
| james    | 1        | 20.38  |
| kate     | 1        | 22.14  |
| kate     | 1        | 22.18  |
| kate     | 2        | 21.80  |
| kate     | 2        | 22.00  |

Current status:

SELECT name, category, AVG(number) AS average_number
FROM dummy_table
GROUP BY name, category

-- Gives:

| name     | category | average_number |
| -------- | -------- | -------------- | 
| jack     | 1        | 12.40          |
| jack     | 2        | 13.35          |
| james    | 1        | 19.57          |
| kate     | 1        | 22.16          |
| kate     | 2        | 21.90          |

Desired result:

-- `James` is dropped because he only has values for category 1, and not category 2.

| name     | category | average_number |
| -------- | -------- | -------------- | 
| jack     | 1        | 12.40          |
| jack     | 2        | 13.35          |
| kate     | 1        | 22.16          |
| kate     | 2        | 21.90          |

DDL (from fiddle, different values):

Create Table Emps(person_name VARCHAR(50), category INT, number DECIMAL);

Insert Into Emps Values
('steve',1,15.30),
('steve',1,14.30),
('steve',2,14.30),
('steve',2,13.30),
('john',1,18.76),
('john',1,17.36),
('john',1,18.42),
('angela',1,23.84),
('angela',1,24.48),
('angela',2,22.00),
('angela',2,22.94);

>Solution :

My first naive solution would be to include a WHERE-clause that checks if there are records in the table for the same name but with different category values. That can be done with a subquery.

Something like this:

SELECT name, category, AVG(number) AS average_number
FROM dummy_table
WHERE name IN (SELECT name
               FROM dummy_table
               WHERE category IN (1, 2)
               GROUP BY name
               HAVING COUNT(DISTINCT category) = 2)
GROUP BY name, category

I included an additional check (WHERE-clause) in the subquery to only regard categories 1 and 2. You could omit that inner WHERE-clause if there are only 2 categories.

Perhaps there are more elegant solutions, so I will keep an eye on the other answers in this post. 😉

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading