Home Select only rows that have a value for all categories in a column after group by

Questions

Select only rows that have a value for all categories in a column after group by

October 25, 2022

For a given table, I have grouped by two columns and calculated the average of a third column. However, I would like to only get rows that have a value for all possible values in the category column (these are 1 and 2). How can this result be achieved?

Example table:

| name     | category | number |
| -------- | -------- | ------ | 
| jack     | 1        | 12.30  |
| jack     | 1        | 12.50  |
| jack     | 2        | 13.35  |
| jack     | 2        | 13.35  |
| jack     | 2        | 13.35  |
| james    | 1        | 18.76  |
| james    | 1        | 20.38  |
| kate     | 1        | 22.14  |
| kate     | 1        | 22.18  |
| kate     | 2        | 21.80  |
| kate     | 2        | 22.00  |

Current status:

SELECT name, category, AVG(number) AS average_number
FROM dummy_table
GROUP BY name, category

-- Gives:

| name     | category | average_number |
| -------- | -------- | -------------- | 
| jack     | 1        | 12.40          |
| jack     | 2        | 13.35          |
| james    | 1        | 19.57          |
| kate     | 1        | 22.16          |
| kate     | 2        | 21.90          |

Desired result:

-- `James` is dropped because he only has values for category 1, and not category 2.

| name     | category | average_number |
| -------- | -------- | -------------- | 
| jack     | 1        | 12.40          |
| jack     | 2        | 13.35          |
| kate     | 1        | 22.16          |
| kate     | 2        | 21.90          |

DDL (from fiddle, different values):

Create Table Emps(person_name VARCHAR(50), category INT, number DECIMAL);

Insert Into Emps Values
('steve',1,15.30),
('steve',1,14.30),
('steve',2,14.30),
('steve',2,13.30),
('john',1,18.76),
('john',1,17.36),
('john',1,18.42),
('angela',1,23.84),
('angela',1,24.48),
('angela',2,22.00),
('angela',2,22.94);

>Solution :

My first naive solution would be to include a WHERE-clause that checks if there are records in the table for the same name but with different category values. That can be done with a subquery.

Something like this:

SELECT name, category, AVG(number) AS average_number
FROM dummy_table
WHERE name IN (SELECT name
               FROM dummy_table
               WHERE category IN (1, 2)
               GROUP BY name
               HAVING COUNT(DISTINCT category) = 2)
GROUP BY name, category

I included an additional check (WHERE-clause) in the subquery to only regard categories 1 and 2. You could omit that inner WHERE-clause if there are only 2 categories.

Perhaps there are more elegant solutions, so I will keep an eye on the other answers in this post. 😉