I am working with Netezza SQL.
I have the following table (my_table):
id fav_color date_of_entry
1 1 red 2009-01-01
2 1 blue 2010-05-05
3 1 red 2011-01-01
4 2 green 2009-02-02
5 2 red 2010-04-04
6 2 blue 2020-09-09
7 3 red 2009-05-05
8 3 blue 2009-06-06
9 3 red 2010-05-05
My Question:
-
I want to create a variable (var1) corresponding to the favorite color for each ID on 2010-01-01 . Since no one declares their favorite color exactly on 2010-01-01 , I want to find out the color that each ID has the earliest available date before 2010-01-01.
-
I want to also create a new variable (var2) that is 1 is var1 = red else 0
The final answer should look like this:.
id fav_color date_of_entry var1 var2
1 1 red 2009-01-01 red 1
2 1 blue 2010-05-05 red 1
3 1 red 2011-01-01 red 1
4 2 green 2009-02-02 green 0
5 2 red 2010-04-04 green 0
6 2 blue 2020-09-09 green 0
7 3 red 2009-05-05 blue 0
8 3 red 2009-06-06 blue 0
9 3 red 2010-05-05 blue 0
Here is my attempt to do this:
WITH temp AS (
SELECT id, fav_color, date_of_entry,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_of_entry DESC) AS row_num
FROM my_table
)
SELECT my_table.id, my_table.fav_color, my_table.date_of_entry, temp.fav_color AS var1,
CASE WHEN temp.fav_color = 'red' THEN 1 ELSE 0 END AS var2
FROM my_table
LEFT JOIN temp
ON my_table.id = temp.id AND temp.row_num = 1;
But my final answer does not match the desired output:
id fav_color date_of_entry var1 var2
1 1 red 2009-01-01 red 1
2 1 blue 2010-05-05 red 1
3 1 red 2011-01-01 red 1
4 2 green 2009-02-02 blue 0
5 2 red 2010-04-04 blue 0
6 2 blue 2020-09-09 blue 0
7 3 red 2009-05-05 red 1
8 3 blue 2009-06-06 red 1
9 3 red 2010-05-05 red 1
Can someone please show me how to fix this?
Thanks!
Note:
Here is the data for the question:
my_table = (
id = c(1,1,1,2,2,2,3,3,3),
fav_color = c("red", "blue", "red", "green", "red", "blue", "red", "blue", "red"),
date_of_entry = c("2009-01-01", "2010-05-05", "2011-01-01", "2009-02-02", "2010-04-04", "2020-09-09", "2009-05-05", "2009-06-06", "2010-05-05")
)
>Solution :
In the CTE, you should also filter off any records occurring after 2010-01-01
:
WITH cte AS (
SELECT id, fav_color, date_of_entry,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_of_entry DESC) AS rn
FROM my_table
WHERE date_of_entry <= '2010-01-01'
)
SELECT id, fav_color, date_of_entry,
CASE WHEN fav_color = 'red' THEN 1 ELSE 0 END AS var2
FROM cte
WHERE rn = 1;
ORDER BY id;