So, I have dataset that looks like below. I would like to write a query that extract users who have done steps 1,2 & 3 but not 4 & 5. The output of the query should be User 1 and User 4.
User_ID | Steps |
---|---|
User 1 | Step 1 |
User 1 | Step 2 |
User 1 | Step 3 |
User 2 | Step 1 |
User 2 | Step 2 |
User 2 | Step 3 |
User 2 | Step 4 |
User 2 | Step 5 |
User 3 | Step 1 |
User 3 | Step 2 |
User 3 | Step 3 |
User 3 | Step 4 |
User 3 | Step 5 |
User 4 | Step 1 |
User 4 | Step 2 |
User 4 | Step 3 |
Trying to use NOT IN but it does not work.
Tried using NOT IN but it does not work. I’m a bit new to SQL so help would be appreciated.
The result of the SQL query should be distinct user_ids that did not perform step 4 & 5.
Expected result:
User 1
User 4
>Solution :
It sounds like your requirement is to find users who have only done steps 1, 2, and 3. One approach uses aggregation:
SELECT User_ID
FROM yourTable
GROUP BY User_ID
HAVING COUNT(DISTINCT Steps) = 3 AND
COUNT(CASE WHEN Steps NOT IN (1, 2, 3) THEN 1 END) = 0;
The first COUNT
in the HAVING
clause ensures that there are exactly 3 distinct steps for a matching user. The second COUNT
ensures that those steps can only be 1, 2, or 3.