Selecting data from two different tables with case statement

I have two tables.

tab1:

dates
2021-09-30
2021-10-01
2021-10-02
2021-10-04
2021-11-15

buckets:


bucket_dates
2021-10-01
2021-10-02
2021-10-03
2021-11-03

I want to join these two tables to get the final result below. If a bucket date has a matching date in tab1, use it; if there is no match, use the next highest date from tab1 (e.g. for 2021-10-03 and 2021-11-03).

Result table:

bucket_dates  final_dates
2021-10-01    2021-10-01
2021-10-02    2021-10-02
2021-10-03    2021-10-04
2021-11-03    2021-11-15
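The matching rule above ("exact date if present, otherwise the next highest") can be sketched in plain Python with `bisect`; the dates are taken from the question, and `next_highest` is a hypothetical helper name:

```python
# Pure-Python sketch of the matching rule: use the date itself if it is in
# tab1, otherwise the next highest date in tab1.
import bisect

tab1_dates = ["2021-09-30", "2021-10-01", "2021-10-02",
              "2021-10-04", "2021-11-15"]  # must be sorted
bucket_dates = ["2021-10-01", "2021-10-02", "2021-10-03", "2021-11-03"]

def next_highest(d, sorted_dates):
    # bisect_left finds the first date >= d
    # (ISO yyyy-mm-dd strings sort lexicographically)
    i = bisect.bisect_left(sorted_dates, d)
    return sorted_dates[i] if i < len(sorted_dates) else None

for d in bucket_dates:
    print(d, next_highest(d, tab1_dates))
```

This reproduces the result table, including 2021-10-03 → 2021-10-04 and 2021-11-03 → 2021-11-15.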

I tried to do this with the following join query:

select a.bucket_dates,
   case when b.dates is null then (select min(c.dates) from tab1 c where c.dates > a.bucket_dates)
      else b.dates end as final_dates
from buckets a left join tab1 b 
 on a.bucket_dates = b.dates; 

but this query gives the error below:

Correlated column is not allowed in a non-equality predicate

Any suggestions would be really helpful.

>Solution:

If PySpark won't allow > or >= in a correlated sub-query, just use MIN() in the main query:

SELECT
  b.bucket_dates,
  MIN(t.dates) AS final_dates
FROM buckets AS b
LEFT JOIN tab1 AS t
  ON t.dates >= b.bucket_dates
GROUP BY b.bucket_dates;
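The query above is not Spark-specific, so it can be verified end to end with SQLite as a stand-in for Spark SQL (a minimal sketch; table and column names are taken from the question):

```python
# Verify the accepted answer's query using SQLite as a stand-in for
# Spark SQL, with the sample data from the question.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (dates TEXT)")
conn.executemany("INSERT INTO tab1 VALUES (?)",
                 [("2021-09-30",), ("2021-10-01",), ("2021-10-02",),
                  ("2021-10-04",), ("2021-11-15",)])
conn.execute("CREATE TABLE buckets (bucket_dates TEXT)")
conn.executemany("INSERT INTO buckets VALUES (?)",
                 [("2021-10-01",), ("2021-10-02",),
                  ("2021-10-03",), ("2021-11-03",)])

# MIN() over a >= join picks the exact match when one exists,
# otherwise the next highest date in tab1.
rows = conn.execute("""
    SELECT b.bucket_dates, MIN(t.dates) AS final_dates
    FROM buckets AS b
    LEFT JOIN tab1 AS t
      ON t.dates >= b.bucket_dates
    GROUP BY b.bucket_dates
    ORDER BY b.bucket_dates
""").fetchall()

for bucket, final in rows:
    print(bucket, final)
```

Note that a bucket date later than every date in tab1 would yield NULL for final_dates, since MIN() over an empty LEFT JOIN match set is NULL.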