Average of rows grouped by id, but only if COUNT of grouped rows are > than N [SQL]

Advertisements

I have a table with the following schema:

CREATE TABLE IF NOT EXISTS itemreviews(
  itemid INT NOT NULL,
  reviewid INT NOT NULL,
  reviewrating INT NOT NULL,
  reviewsummary VARCHAR(256) NOT NULL,
  PRIMARY KEY(itemid, reviewid)
);

Here’s an example of some data in this table:

+--------+----------+--------------+----------------------+
| itemId | reviewId | reviewRating |    reviewSummary     |
+--------+----------+--------------+----------------------+
|     01 | VM2LK    |            5 | Nice item! Loved it. |
|     01 | P4KF2    |            3 | Meh, it's okay.      |
|     02 | 3IK42    |            1 | Horrible, hated it.  |
|     02 | KDXX2    |            3 | It's fine.           |
|     03 | 3KXXZ    |            4 | I liked it!          |
|     03 | KQ23S    |            3 | It's okay.           |
+--------+----------+--------------+----------------------+

I was tasked with writing a query to sort items by the average of their ratings.

So, using the same dataset above, the result of that query would simply be:

+--------+---------+
| itemid | average |
+--------+---------+
|     01 |       4 | // item 01 had ratings 5 and 3 so (5 + 3) / number of ratings = 4
|     03 |     3.5 | // item 03 had ratings 4 and 3 so (4 + 3) / number of ratings = 3.5
|     02 |       2 | // item 02 had ratings 1 and 3 so (1 + 3) / number of ratings = 2
+--------+---------+

I am currently using this query to achieve this result:

SELECT itemid, AVG(reviewRating) as average FROM itemreviews 
GROUP BY itemid
ORDER BY average DESC 

The problem, however, is that I have another requirement:

We should only include, in the result, items that have at least n number of reviews.

In the example above, all items have 2 reviews equally.

If, suppose, item 03 had only 1 review, and that I only want items with at least 2 reviews, then item 03 should be excluded. Only item 01 and 02 would be sorted:

+--------+---------+
| itemid | average |
+--------+---------+
|     01 |       4 | // item 01 had ratings 5 and 3 so (5 + 3) / number of ratings = 4
|     02 |       2 | // item 02 had ratings 1 and 3 so (1 + 3) / number of ratings = 2
// item 03 is no more, since it only had one review
+--------+---------+

I have no idea how to apply this requirement.

I am guessing that I need to group by itemid, then somehow use COUNT to find the number of reviews an item has, and then filter those that don’t have at least n reviews, and only then sort it.

But I am already calculating the average, which, I presume, already "counts" internally. I am worried about performance as well.

If anyone can give me any pointers, I’d be very thankful.

>Solution :

As you already Grouping you can use HAVING COUNT(*) to get number of rows

SELECT itemid, AVG(reviewRating) as average FROM itemreviews 
GROUP BY itemid
HAVING COUNT(*) > 1
ORDER BY average DESC 

Leave a Reply Cancel reply