I’m learning the databricks platform at the moment, and i’m on a lesson where we are talking about CTE’s. This specific query is of a CTE in a CTE definition, and the girl in the video is not doing the best job breaking down what exactly this query is doing.
WITH lax_bos AS (
WITH origin_destination (origin_airport, destination_airport) AS (
SELECT
origin,
destination
FROM
external_table
)
SELECT
*
FROM
origin_destination
WHERE
origin_airport = 'LAX'
AND destination_airport = 'BOS'
)
SELECT
count(origin_airport) AS `Total Flights from LAX to BOS`
FROM
lax_bos;
the output of the query comes out to 684 wich i know comes from the last select statement, It’s just mostly everything thats going on above, i don’t fully understand whats happening.
>Solution :
at first you choose 2 needed columns from external_table and name this cte "origin_destination" :
SELECT
origin,
destination
FROM
external_table
next you filter it in another cte named "lax_bos"
SELECT
*
FROM
origin_destination ------the cte you already made
WHERE
origin_airport = 'LAX'
AND destination_airport = 'BOS'
and this is the main query where you use cte "lax_bos" that you made in previous step, here you just count a number of flights:
SELECT
count(origin_airport) AS `Total Flights from LAX to BOS`
FROM
lax_bos