Let’s say I have two tables (from uploaded csv files) and I want to do a diff based on an id+territory in the new file that wasn’t in the old file. The easiest way to do this is something like:
SELECT id, territory FROM this_week EXCEPT SELECT id, territory FROM last_week
However, what I’m trying to do is get ALL fields (in both tables — one row per key) that are generated by that difference. How could this be done?
Either postgres or bigquery is fine. Both have the EXCEPT set op.
>Solution :
Use NOT EXISTS:
SELECT * -- all columns of "this_week"
FROM this_week t
WHERE NOT EXISTS (
SELECT FROM last_week l
WHERE t.id = l.id
AND t.territory = l.territory
);
And index on last_week (id, territory) would typically help performance (a lot).
This shows all columns of this_week.
I don’t see the point in adding columns of last_week, which would be empty (null) by definition of the query if you’d left-join.
Basics:
Note a subtle difference:
EXCEPT (when used without ALL) folds duplicates. This query does not. You may want one or the other. Typically, you want this.