mismatched input error when trying to use Spark subquery

New to PySpark. I'm trying to get a query to run, and it seems like it should run, but I get an EOF error and I'm not sure how to resolve it.

What I'm trying to do is find all rows in blah.table where the value in the "domainname" column matches a value from a list of domains. Then I want to grab the two ID columns from those rows, and do another search finding all rows in blah.table that contain those pairs of IDs.

So far I have:

df = spark.sql("select * from blah.table where id1,id2 in (select id1,id2 from blah.table where domainname in ('list.com','of.com','domains.com'))")

When I run it I get this error:

mismatched input ',' expecting {<EOF>, ';'}

If I split the query up, this seems to run fine by itself:

df = spark.sql("select id1,id2 from blah.table where domainname in ('list.com','of.com','domains.com')")

How can I make this work?

>Solution :

Spark SQL doesn't accept a bare column list on the left of IN (hence the `mismatched input ','` error); wrap the pair in parentheses so it forms a row value:

select * from blah.table
where (id1, id2) in (select id1,id2 from blah.table
                     where domainname in ('list.com','of.com','domains.com'))

Or, use EXISTS:

select * from blah.table t1
where EXISTS (select * from blah.table t2
              where t1.id1 = t2.id1 and t1.id2 = t2.id2
                and t2.domainname in ('list.com','of.com','domains.com'))
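Both forms return the same rows. A standalone Spark session isn't practical to spin up here, but the SQL pattern itself can be sanity-checked with SQLite (which also supports row-value IN, since version 3.15); the table and data below are hypothetical stand-ins for blah.table:

```python
import sqlite3

# Toy stand-in for blah.table with the columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id1 INTEGER, id2 INTEGER, domainname TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 10, "list.com"),
        (1, 10, "other.com"),    # same (id1, id2) pair, so it should also be returned
        (2, 20, "unrelated.com"),
        (3, 30, "of.com"),
    ],
)

# Row-value IN, as in the answer above.
rows_in = conn.execute("""
    SELECT * FROM tbl
    WHERE (id1, id2) IN (SELECT id1, id2 FROM tbl
                         WHERE domainname IN ('list.com', 'of.com', 'domains.com'))
""").fetchall()

# Equivalent correlated EXISTS form.
rows_exists = conn.execute("""
    SELECT * FROM tbl t1
    WHERE EXISTS (SELECT 1 FROM tbl t2
                  WHERE t1.id1 = t2.id1 AND t1.id2 = t2.id2
                    AND t2.domainname IN ('list.com', 'of.com', 'domains.com'))
""").fetchall()

print(sorted(rows_in) == sorted(rows_exists))  # both queries match the same 3 rows
```

In PySpark the corrected string is simply passed to `spark.sql(...)` as before; only the `(id1, id2)` parentheses change.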