Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Can someone explain how the sql query works?

The below SQL query gets all the first_names which have different last names.

select a.first_name
from names a
WHERE a.first_name in (
select b.first_name 
  from names b
  WHERE a.last_name<>b.last_name
)

I am not able to figure out how this exactly works. What I thought would happens is, for every row in the table the subquery will check if there exists a row with a different last name. But the checking happens among only identical values of first name. Can someone explain how this actually works.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Firstly, as I mentioned in the comments, I would recommend instead using a HAVING for this. It won’t require 2 scans of the table, just one, it won’t return duplicates for the same first_name and it’s probably easier for you to understand:

SELECT first_name
FROM dbo.names
GROUP BY first_name
HAVING COUNT(DISTINCT last_name) > 1;

This should be fairly self explanative; it returns rows where there is more than 1 distinct value of last_name for each value of first_name.


For your query, let’s just look at the WHERE:

WHERE a.first_name IN (SELECT b.first_name 
                       FROM names b
                       WHERE a.last_name<>b.last_name

Firstly, we "join" the 2 instances of names (aliased a ad b) where the value of last_name differs in the 2 instances. This will, likely, be an expensive operation; if you have a table will 1,000 rows, and it has 900 different values for last_name then you’re going to end up with the join getting hundreds of matches for each row.

After that, it checks to see the value of first_name for the row in the instance of names aliased as a in the rowset from the join; it is does then there are 2 (or more) instances of same first_name with different last_name values.

Using an IN with a subquery is no different, in context, to using a literal list of values. WHERE SomeColumn IN (1,2,3,4,5,7) would be the same as WHERE SomeColumn IN (SELECT I FROM SomeTable) if the column I in the table SomeTable had the values 1, 2, 3, 4, 5, and 7.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading