Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

SQL: Identifying a Row Closest to a Given Date?

I am working with Netezza SQL.

I have the following table (my_table):

  id fav_color date_of_entry
1  1       red    2009-01-01
2  1      blue    2010-05-05
3  1       red    2011-01-01
4  2     green    2009-02-02
5  2       red    2010-04-04
6  2      blue    2020-09-09
7  3       red    2009-05-05
8  3      blue    2009-06-06
9  3       red    2010-05-05

My Question:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  • I want to create a variable (var1) corresponding to the favorite color for each ID on 2010-01-01 . Since no one declares their favorite color exactly on 2010-01-01 , I want to find out the color that each ID has the earliest available date before 2010-01-01.

  • I want to also create a new variable (var2) that is 1 is var1 = red else 0

The final answer should look like this:.

  id fav_color date_of_entry  var1 var2
1  1       red    2009-01-01   red    1
2  1      blue    2010-05-05   red    1
3  1       red    2011-01-01   red    1
4  2     green    2009-02-02 green    0
5  2       red    2010-04-04 green    0
6  2      blue    2020-09-09 green    0
7  3       red    2009-05-05  blue    0
8  3       red    2009-06-06  blue    0
9  3       red    2010-05-05  blue    0

Here is my attempt to do this:

WITH temp AS (
    SELECT id, fav_color, date_of_entry,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_of_entry DESC) AS row_num
    FROM my_table
)
SELECT my_table.id, my_table.fav_color, my_table.date_of_entry, temp.fav_color AS var1,
CASE WHEN temp.fav_color = 'red' THEN 1 ELSE 0 END AS var2
FROM my_table
LEFT JOIN temp
ON my_table.id = temp.id AND temp.row_num = 1;

But my final answer does not match the desired output:

  id fav_color date_of_entry var1 var2
1  1       red    2009-01-01  red    1
2  1      blue    2010-05-05  red    1
3  1       red    2011-01-01  red    1
4  2     green    2009-02-02 blue    0
5  2       red    2010-04-04 blue    0
6  2      blue    2020-09-09 blue    0
7  3       red    2009-05-05  red    1
8  3      blue    2009-06-06  red    1
9  3       red    2010-05-05  red    1

Can someone please show me how to fix this?

Thanks!

Note:

Here is the data for the question:

my_table = (
id = c(1,1,1,2,2,2,3,3,3),
fav_color = c("red", "blue", "red", "green", "red", "blue", "red", "blue", "red"),
date_of_entry = c("2009-01-01", "2010-05-05", "2011-01-01", "2009-02-02", "2010-04-04", "2020-09-09", "2009-05-05", "2009-06-06", "2010-05-05")
)

>Solution :

In the CTE, you should also filter off any records occurring after 2010-01-01:

WITH cte AS (
    SELECT id, fav_color, date_of_entry,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_of_entry DESC) AS rn
    FROM my_table
    WHERE date_of_entry <= '2010-01-01'
)

SELECT id, fav_color, date_of_entry,
       CASE WHEN fav_color = 'red' THEN 1 ELSE 0 END AS var2
FROM cte
WHERE rn = 1;
ORDER BY id;
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading