Spark SQL: Select 10 randomly chosen groups of values from a column of a table

I have a table in a database. There is a column "id", and each id has multiple rows.
I want to randomly pick 10 ids and then load all the rows with these ids.
The following works and gives me a table with 10 random ids. That's fine.

distinct_ids = spark.sql(f"""SELECT DISTINCT id
FROM {database_name}.{orig_table_cl_name}
ORDER BY RAND() LIMIT 10""")

But now I can't figure out how to do an inner join with the original table to get all the rows containing the right ids.
I tried:

distinct_ids = spark.sql(
f""" (SELECT DISTINCT vehicle_id 
FROM {database_name}.{orig_table_name} 
ORDER BY RAND() LIMIT 10) 
AS table
INNER JOIN table
ON {database_name}.{orig_table_name}.id = table.id""")

This gives me the following error:


ParseException: 
mismatched input 'AS' expecting {<EOF>, ';'}(line 1, pos 99)

== SQL ==
 (SELECT DISTINCT vehicle_id FROM pnds12v_dev_core.t_cycle_log_car_v2_0  ORDER BY RAND() LIMIT 10) AS table

I tried a bunch of other orderings of the clauses, but I figure I am missing a general understanding of how this works. And since I thought there were no semicolons when using spark.sql, the error message doesn't give me any hints.

Can someone explain to me how to do such a query?

>Solution :

Use a common table expression (CTE) to define a temporary result set, then join the original table against it.

distinct_ids = spark.sql(
    f"""WITH sampled AS (
        SELECT DISTINCT vehicle_id AS id
        FROM {database_name}.{orig_table_name}
        ORDER BY RAND()
        LIMIT 10)
    SELECT t.*
    FROM {database_name}.{orig_table_name} AS t
    INNER JOIN sampled
        ON t.vehicle_id = sampled.id""")

Note that the CTE is named sampled rather than table, since TABLE is a keyword in Spark SQL, and the join condition uses the original column name vehicle_id on the base table side.
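The pattern (sample distinct keys in a CTE, then join back to the full table) can be verified without a Spark cluster. A minimal sketch using Python's stdlib sqlite3, with a hypothetical in-memory table t; note that SQLite spells the function RANDOM() instead of RAND(), and table is reserved there, so the CTE is named picked:

```python
import sqlite3

# In-memory table: 20 distinct ids, 5 rows each (100 rows total).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(i % 20, f"v{i}") for i in range(100)])

# The CTE picks 10 random distinct ids; the inner join then pulls
# every row whose id is in that sample.
rows = con.execute("""
    WITH picked AS (
        SELECT DISTINCT id FROM t ORDER BY RANDOM() LIMIT 10
    )
    SELECT t.* FROM t INNER JOIN picked ON t.id = picked.id
""").fetchall()

sampled_ids = {r[0] for r in rows}
print(len(sampled_ids), len(rows))  # 10 ids, 5 rows each -> 10 50
```

The same shape transfers to spark.sql directly: only the function name (RAND vs RANDOM) and the table/column identifiers change.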