Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Average of rows grouped by id, but only if COUNT of grouped rows are > than N [SQL]

I have a table with the following schema:

CREATE TABLE IF NOT EXISTS itemreviews(
  itemid INT NOT NULL,
  reviewid INT NOT NULL,
  reviewrating INT NOT NULL,
  reviewsummary VARCHAR(256) NOT NULL,
  PRIMARY KEY(itemid, reviewid)
);

Here’s an example of some data in this table:

+--------+----------+--------------+----------------------+
| itemId | reviewId | reviewRating |    reviewSummary     |
+--------+----------+--------------+----------------------+
|     01 | VM2LK    |            5 | Nice item! Loved it. |
|     01 | P4KF2    |            3 | Meh, it's okay.      |
|     02 | 3IK42    |            1 | Horrible, hated it.  |
|     02 | KDXX2    |            3 | It's fine.           |
|     03 | 3KXXZ    |            4 | I liked it!          |
|     03 | KQ23S    |            3 | It's okay.           |
+--------+----------+--------------+----------------------+

I was tasked with writing a query to sort items by the average of their ratings.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So, using the same dataset above, the result of that query would simply be:

+--------+---------+
| itemid | average |
+--------+---------+
|     01 |       4 | // item 01 had ratings 5 and 3 so (5 + 3) / number of ratings = 4
|     03 |     3.5 | // item 03 had ratings 4 and 3 so (4 + 3) / number of ratings = 3.5
|     02 |       2 | // item 02 had ratings 1 and 3 so (1 + 3) / number of ratings = 2
+--------+---------+

I am currently using this query to achieve this result:

SELECT itemid, AVG(reviewRating) as average FROM itemreviews 
GROUP BY itemid
ORDER BY average DESC 

The problem, however, is that I have another requirement:

We should only include, in the result, items that have at least n number of reviews.

In the example above, all items have 2 reviews equally.

If, suppose, item 03 had only 1 review, and that I only want items with at least 2 reviews, then item 03 should be excluded. Only item 01 and 02 would be sorted:

+--------+---------+
| itemid | average |
+--------+---------+
|     01 |       4 | // item 01 had ratings 5 and 3 so (5 + 3) / number of ratings = 4
|     02 |       2 | // item 02 had ratings 1 and 3 so (1 + 3) / number of ratings = 2
// item 03 is no more, since it only had one review
+--------+---------+

I have no idea how to apply this requirement.

I am guessing that I need to group by itemid, then somehow use COUNT to find the number of reviews an item has, and then filter those that don’t have at least n reviews, and only then sort it.

But I am already calculating the average, which, I presume, already "counts" internally. I am worried about performance as well.

If anyone can give me any pointers, I’d be very thankful.

>Solution :

As you already Grouping you can use HAVING COUNT(*) to get number of rows

SELECT itemid, AVG(reviewRating) as average FROM itemreviews 
GROUP BY itemid
HAVING COUNT(*) > 1
ORDER BY average DESC 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading