The below SQL query gets all the first_names which have different last names.
select a.first_name
from names a
WHERE a.first_name in (
select b.first_name
from names b
WHERE a.last_name<>b.last_name
)
I am not able to figure out how this exactly works. What I thought would happens is, for every row in the table the subquery will check if there exists a row with a different last name. But the checking happens among only identical values of first name. Can someone explain how this actually works.
>Solution :
Firstly, as I mentioned in the comments, I would recommend instead using a HAVING for this. It won’t require 2 scans of the table, just one, it won’t return duplicates for the same first_name and it’s probably easier for you to understand:
SELECT first_name
FROM dbo.names
GROUP BY first_name
HAVING COUNT(DISTINCT last_name) > 1;
This should be fairly self explanative; it returns rows where there is more than 1 distinct value of last_name for each value of first_name.
For your query, let’s just look at the WHERE:
WHERE a.first_name IN (SELECT b.first_name
FROM names b
WHERE a.last_name<>b.last_name
Firstly, we "join" the 2 instances of names (aliased a ad b) where the value of last_name differs in the 2 instances. This will, likely, be an expensive operation; if you have a table will 1,000 rows, and it has 900 different values for last_name then you’re going to end up with the join getting hundreds of matches for each row.
After that, it checks to see the value of first_name for the row in the instance of names aliased as a in the rowset from the join; it is does then there are 2 (or more) instances of same first_name with different last_name values.
Using an IN with a subquery is no different, in context, to using a literal list of values. WHERE SomeColumn IN (1,2,3,4,5,7) would be the same as WHERE SomeColumn IN (SELECT I FROM SomeTable) if the column I in the table SomeTable had the values 1, 2, 3, 4, 5, and 7.