SQL: Identifying a Row Closest to a Given Date?

I am working with Netezza SQL.

I have the following table (my_table):

  id fav_color date_of_entry
1  1       red    2009-01-01
2  1      blue    2010-05-05
3  1       red    2011-01-01
4  2     green    2009-02-02
5  2       red    2010-04-04
6  2      blue    2020-09-09
7  3       red    2009-05-05
8  3      blue    2009-06-06
9  3       red    2010-05-05

My Question:

  • I want to create a variable (var1) corresponding to the favorite color for each ID on 2010-01-01 . Since no one declares their favorite color exactly on 2010-01-01 , I want to find out the color that each ID has the earliest available date before 2010-01-01.

  • I want to also create a new variable (var2) that is 1 is var1 = red else 0

The final answer should look like this:.

  id fav_color date_of_entry  var1 var2
1  1       red    2009-01-01   red    1
2  1      blue    2010-05-05   red    1
3  1       red    2011-01-01   red    1
4  2     green    2009-02-02 green    0
5  2       red    2010-04-04 green    0
6  2      blue    2020-09-09 green    0
7  3       red    2009-05-05  blue    0
8  3       red    2009-06-06  blue    0
9  3       red    2010-05-05  blue    0

Here is my attempt to do this:

WITH temp AS (
    SELECT id, fav_color, date_of_entry,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_of_entry DESC) AS row_num
    FROM my_table
)
SELECT my_table.id, my_table.fav_color, my_table.date_of_entry, temp.fav_color AS var1,
CASE WHEN temp.fav_color = 'red' THEN 1 ELSE 0 END AS var2
FROM my_table
LEFT JOIN temp
ON my_table.id = temp.id AND temp.row_num = 1;

But my final answer does not match the desired output:

  id fav_color date_of_entry var1 var2
1  1       red    2009-01-01  red    1
2  1      blue    2010-05-05  red    1
3  1       red    2011-01-01  red    1
4  2     green    2009-02-02 blue    0
5  2       red    2010-04-04 blue    0
6  2      blue    2020-09-09 blue    0
7  3       red    2009-05-05  red    1
8  3      blue    2009-06-06  red    1
9  3       red    2010-05-05  red    1

Can someone please show me how to fix this?

Thanks!

Note:

Here is the data for the question:

my_table = (
id = c(1,1,1,2,2,2,3,3,3),
fav_color = c("red", "blue", "red", "green", "red", "blue", "red", "blue", "red"),
date_of_entry = c("2009-01-01", "2010-05-05", "2011-01-01", "2009-02-02", "2010-04-04", "2020-09-09", "2009-05-05", "2009-06-06", "2010-05-05")
)

>Solution :

In the CTE, you should also filter off any records occurring after 2010-01-01:

WITH cte AS (
    SELECT id, fav_color, date_of_entry,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_of_entry DESC) AS rn
    FROM my_table
    WHERE date_of_entry <= '2010-01-01'
)

SELECT id, fav_color, date_of_entry,
       CASE WHEN fav_color = 'red' THEN 1 ELSE 0 END AS var2
FROM cte
WHERE rn = 1;
ORDER BY id;

Leave a Reply