I am unclear why this SQL query can be used to count duplicate values for a particular field:
select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);
returns 88**
while this query cannot:
SELECT count(Country) - count(Distinct Country)
FROM Customers;
returns 70**
As a side note, why do they give different results?
*FYI, these queries work in the W3Schools trySQL interface.
What I have tried:
I have run these queries and they give different results (returning 88 vs 70). I expected them to produce the same result.
>Solution :
Consider these rows in your Customers database in Country column
- India
- India
- India
- USA
- USA
- Australia
Now consider your 1st query:-
select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);
First let’s consider the inner query that you are using:-
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
The output (Output1) of this query will be:-
Country | country_count
India | 3
USA | 2
Now consider applying the outer query on this output:-
select sum(country_count)
FROM Output1
The answer should be 5 for this
But for your 2nd query:-
SELECT count(Country) - count(Distinct Country)
FROM Customers;
The output should be 6-3 = 3
So, both are doing different things. That’s the reason you are getting different outputs.
1st query : Ignore the Countries which have only 1 count and then add up all the counts for the other ones.
2nd query : Ignore the first count from all the distinct Countries and add up rest of the counts.
Hope this helps.