Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Count duplicates using count() and count(distinct)

I am unclear why this SQL query can be used to count duplicate values for a particular field:

select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);

returns 88**

while this query cannot:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

SELECT count(Country) - count(Distinct Country) 
FROM Customers;

returns 70**

As a side note, why do they give different results?

*FYI, these queries work in the W3Schools trySQL interface.

https://www.w3schools.com/sql/trysql.asp?filename=trysql_asc#:~:text=If%20you%20switch%20to%20a,your%20browser%2C%20for%20testing%20purposes.

What I have tried:

I have run these queries and they give different results (returning 88 vs 70). I expected them to produce the same result.

>Solution :

Consider these rows in your Customers database in Country column

  1. India
  2. India
  3. India
  4. USA
  5. USA
  6. Australia

Now consider your 1st query:-

select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);

First let’s consider the inner query that you are using:-

select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1

The output (Output1) of this query will be:-

Country | country_count

India | 3

USA | 2

Now consider applying the outer query on this output:-

select sum(country_count)
FROM Output1

The answer should be 5 for this

But for your 2nd query:-

SELECT count(Country) - count(Distinct Country) 
FROM Customers;

The output should be 6-3 = 3

So, both are doing different things. That’s the reason you are getting different outputs.

1st query : Ignore the Countries which have only 1 count and then add up all the counts for the other ones.

2nd query : Ignore the first count from all the distinct Countries and add up rest of the counts.

Hope this helps.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading