Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

SQL Join Using REGEXP_SUBSTR and Wildcard

I’m trying to join two tables in Snowflake using REGEX_SUBSTR and a wildcard but having no luck. Here is what I have:

SELECT P.NAME,
       ACL.CONTENT_NAME,
       REGEXP_SUBSTR(CONTENT_NAME, '/([^/]+(\\.pdf))$', 1, 1, 'e', 1) AS PDFS
FROM ACTIVITY_DOWNLOAD AD
         JOIN PROGRAM P
              ON P.NAME LIKE '%' || REGEXP_SUBSTR(CONTENT_NAME, '/([^/]+(\\.pdf))$', 1, 1, 'e', 1) || '%'

Running the query doesn’t return anything–any ideas on what the issue is?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

using some CTE’s for fake data:

WITH ACTIVITY_DOWNLOAD(CONTENT_NAME) as (
    select * from values
     ('http://www.example.com/string1/string2.pdf'),
     ('http://www.example.com/string1/string3.gif')
), PROGRAM(name) as (
    select * from values
    ('string2.pdf')
)

and the SQL

SELECT P.NAME,
       ACL.CONTENT_NAME,
       REGEXP_SUBSTR(CONTENT_NAME, '/([^/]+(\\.pdf))$', 1, 1, 'e', 1) AS PDFS
FROM ACTIVITY_DOWNLOAD ACL
         JOIN PROGRAM P
              ON P.NAME LIKE '%' || REGEXP_SUBSTR(ACL.CONTENT_NAME, '/([^/]+(\\.pdf))$', 1, 1, 'e', 1) || '%'

gives:

NAME CONTENT_NAME PDFS
string2.pdf http://www.example.com/string1/string2.pdf string2.pdf

which aligned with what I would expect.

depending how large your ACTIVITY_DOWNLOAD table is, you might find a CTE/sub-select give better performance.

SELECT P.NAME,
       f.CONTENT_NAME,
       f.file_name AS PDFS
FROM (
    select distinct
        CONTENT_NAME,
        REGEXP_SUBSTR(ACL.CONTENT_NAME, '/([^/]+(\\.pdf))$', 1, 1, 'e', 1) as file_name,
        '%' || file_name || '%' as like_filter
    from ACTIVITY_DOWNLOAD as acl
)f
JOIN PROGRAM P
    ON P.NAME LIKE f.like_filter

OR

WITH clean_content_names as (
    select distinct
        CONTENT_NAME,
        REGEXP_SUBSTR(ACL.CONTENT_NAME, '/([^/]+(\\.pdf))$', 1, 1, 'e', 1) as file_name,
        '%' || file_name || '%' as like_filter
    from ACTIVITY_DOWNLOAD as acl
)
SELECT P.NAME,
       f.CONTENT_NAME,
       f.file_name AS PDFS
FROM clean_content_names f
JOIN PROGRAM P
    ON P.NAME LIKE f.like_filter
WITH ACTIVITY_DOWNLOAD(CONTENT_NAME) as (
    select * from values
     ('http://www.example.com/string1/string2.pdf'),
     ('http://www.example.com/string1/string3.gif')
), PROGRAM(name) as (
    select * from values
    ('string2.pdf')
)
SELECT P.NAME,
       f.CONTENT_NAME,
       f.file_name AS PDFS
FROM (
    select distinct
        CONTENT_NAME,
        REGEXP_SUBSTR(ACL.CONTENT_NAME, '/([^/]+(\\.pdf))$', 1, 1, 'e', 1) as file_name,
        '%' || file_name || '%' as like_filter
    from ACTIVITY_DOWNLOAD as acl
)f
JOIN PROGRAM P
    ON P.NAME LIKE f.like_filter;
NAME CONTENT_NAME PDFS
string2.pdf http://www.example.com/string1/string2.pdf string2.pdf
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading