Incorrect results when using last_value()

September 3, 2022

This is the table:

id	region	variety	price
1	Alexander Valley	Cabernet Sauvignon	35
2	Alexander Valley	Cabernet Sauvignon	45
3	Alexander Valley	Merlot	19
4	California	Sauvignon Blanc	8
5	California	Pinot Noir	17

I wanted to find out for each region, the cheapest and most expensive variety, so the output should be:

region	expensive	cheap
Alexander Valley	Cabernet Sauvignon	Merlot
California	Pinot Noir	Sauvignon Blanc

I was able to get the correct result using two first_value()

SELECT
  DISTINCT region,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price) AS cheapest
FROM wine_list

I thought it would be equivalent to the following query

SELECT
  DISTINCT region,
  FIRST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS expensive,
  LAST_VALUE(variety) OVER (PARTITION BY region ORDER BY price DESC) AS cheapest
FROM wine_list

However now my output is:

region	expensive	cheap
Alexander Valley	Cabernet Sauvignon	Cabernet Sauvignon
Alexander Valley	Cabernet Sauvignon	Merlot
California	Pinot Noir	Pinot Noir
California	Pinot Noir	Sauvignon Blanc

Why is my output wrong? I am so baffled.

>Solution :

The default window for FIRST_VALUE and LAST_VALUE is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. I.e. it is the first resp. last value "so far".

You want it for the whole dataset, however, so you must explicitely describe the window range:

SELECT DISTINCT
  region,
  FIRST_VALUE(variety) OVER 
    (PARTITION BY region ORDER BY price DESC
     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS expensive,
  LAST_VALUE(variety) OVER 
     (PARTITION BY region ORDER BY price DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cheapest
FROM wine_list;