Databricks Spark SQL: fast query, slow at showing data

Using the following example on a large table:

pages = spark.sql('select * from table xx')

I found that the query runs in seconds, but as soon as I try to look at the data with pages.show(n=10), it takes minutes just to return a sample of it. What is happening under the hood that makes this so slow?

The SQL (spark.sql) command takes < 1 second, but pages.show(n=10) takes minutes.

Solution:

Spark uses lazy evaluation, so it does not actually execute the query (e.g. select * from table xx) until an action is called (e.g. .show(), a write via .write, or display() in Databricks).
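To make the transformation/action split concrete, here is a toy model in plain Python. It is not PySpark: LazyFrame, slow_table, and counter are hypothetical names made up for illustration. Transformations (filter, select) only record a step and return a new plan; the show() action is the first call that reads any rows. Real Spark additionally optimizes the plan and runs it as distributed jobs, and whether it can stop reading after the first n rows depends on the query, but the point at which work starts is the same.

```python
counter = {"rows_read": 0}


def slow_table():
    """Hypothetical stand-in for scanning a large table, one row at a time."""
    for i in range(1_000_000):
        counter["rows_read"] += 1
        yield {"id": i}


class LazyFrame:
    """Toy model of a lazily evaluated DataFrame (not the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source       # zero-argument callable returning an iterator of rows
        self._steps = tuple(steps)  # recorded transformations, not yet run

    def filter(self, predicate):
        # Transformation: record the step and return a new plan; no rows are read.
        return LazyFrame(self._source, self._steps + (("filter", predicate),))

    def select(self, func):
        # Transformation: record the step and return a new plan; no rows are read.
        return LazyFrame(self._source, self._steps + (("select", func),))

    def take(self, n):
        # Action: pull rows from the source and apply the recorded steps,
        # stopping once n result rows have been produced.
        if n <= 0:
            return []
        out = []
        for row in self._source():
            keep = True
            for kind, fn in self._steps:
                if kind == "filter" and not fn(row):
                    keep = False
                    break
                if kind == "select":
                    row = fn(row)
            if keep:
                out.append(row)
                if len(out) == n:
                    break
        return out

    def show(self, n=20):
        # Action: print the first n rows (built on take()).
        for row in self.take(n):
            print(row)


pages = LazyFrame(slow_table).filter(lambda r: r["id"] % 2 == 0)
print(counter["rows_read"])  # 0: building the plan read nothing
pages.show(n=3)              # the action is what actually reads rows
print(counter["rows_read"])  # 5: ids 0..4 were read to find three even ids
```

The counter stays at 0 until show() runs, which mirrors why spark.sql(...) returns almost instantly while pages.show(n=10) carries the full cost of reading the data.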

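The part that runs in under a second is only the analysis step: Spark parses the query, checks that the table and columns exist, and builds a plan, but it does not read any data yet. The minutes you see on pages.show(n=10) are the actual execution: that call launches the job(s) that read the table and return the rows.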
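The split between a fast analysis step and a slow execution step can be sketched with another toy model, again in plain Python rather than PySpark. CATALOG, STORAGE, stats, sql(), and show() are all hypothetical. sql() only consults table metadata, so it is cheap and fails early on a bad name (in real Spark, a misspelled table or column makes spark.sql() itself raise an error); show() is the only place rows are touched.

```python
# Hypothetical catalog: table name -> column names (metadata only, no rows).
CATALOG = {"xx": ["id", "title"]}

# Hypothetical storage holding the actual rows of table xx.
STORAGE = {"xx": [{"id": i, "title": f"page {i}"} for i in range(100_000)]}
stats = {"rows_scanned": 0}


class AnalysisError(Exception):
    """Raised when a query refers to an unknown table or column."""


def sql(table, columns=None):
    """Toy analogue of the fast part of spark.sql(): analyze, do not execute.

    Only CATALOG is consulted, so the cost does not depend on table size.
    """
    if table not in CATALOG:
        raise AnalysisError(f"table not found: {table}")
    schema = CATALOG[table]
    cols = list(columns) if columns else list(schema)
    unknown = [c for c in cols if c not in schema]
    if unknown:
        raise AnalysisError(f"unresolved columns: {unknown}")
    # The result describes the work to do; it holds no data.
    return {"scan": table, "project": cols}


def show(plan, n=20):
    """Toy analogue of an action: only now is STORAGE actually read."""
    if n <= 0:
        return []
    out = []
    for row in STORAGE[plan["scan"]]:
        stats["rows_scanned"] += 1
        out.append({c: row[c] for c in plan["project"]})
        if len(out) == n:
            break
    return out


plan = sql("xx", ["id"])           # fast: metadata checks only
print(stats["rows_scanned"])       # 0
try:
    sql("xx", ["no_such_column"])  # a bad query fails here, before any data is read
except AnalysisError as err:
    print("analysis error:", err)
print(show(plan, n=3))             # execution happens here
print(stats["rows_scanned"])       # 3
```

In this sketch the cost of sql() depends only on the catalog, while the cost of show() depends on how much data has to be read, which is the same asymmetry behind the < 1 second versus minutes observed in the question.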
