This is the table:
| id | region | variety | price |
|---|---|---|---|
| 1 | Alexander Valley | Cabernet Sauvignon | 35 |
| 2 | Alexander Valley | Cabernet Sauvignon | 45 |
| 3 | Alexander Valley | Merlot | 19 |
| 4 | California | Sauvignon Blanc | 8 |
| 5 | California | Pinot Noir | 17 |
I wanted to find out for each region, the cheapest and most expensive variety, so the output should be:
| region | expensive | cheap |
|---|---|---|
| Alexander Valley | Cabernet Sauvignon | Merlot |
| California | Pinot Noir | Sauvignon Blanc |
I was able to get the correct result using two first_value()
SELECT
DISTINCT region,
FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price) AS cheapest
FROM wine_list
I thought it would be equivalent to the following query
SELECT
DISTINCT region,
FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS cheapest
FROM wine_list
However now my output is:
| region | expensive | cheap |
|---|---|---|
| Alexander Valley | Cabernet Sauvignon | Cabernet Sauvignon |
| Alexander Valley | Cabernet Sauvignon | Merlot |
| California | Pinot Noir | Pinot Noir |
| California | Pinot Noir | Sauvignon Blanc |
Why is my output wrong? I am so baffled.
>Solution :
The default window for FIRST_VALUE and LAST_VALUE is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I.e. it is the first resp. last value "so far".
You want it for the whole dataset, however, so you must explicitely describe the window range:
SELECT DISTINCT
region,
FIRST_VALUE(variety) OVER
(PARTITION BY region ORDER BY price DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS expensive,
LAST_VALUE(variety) OVER
(PARTITION BY region ORDER BY price DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cheapest
FROM wine_list;