Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Hive queries taking so long

I have a CDP environment running Hive, for some reason some queries run pretty quickly and others are taking even more than 5 minutes to run, even a regular select current_timestamp or things like that.
I see that my cluster usage is pretty low so I don’t understand why this is happening.

How can I use my cluster fully? I read some posts in the cloudera website, but they are not helping a lot, after all the tuning all the things are the same.

Something to note is that I have the following message in the hive logs:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

"Get Query Coordinator (AM)  350"

Then I see that the time to execute the query was pretty low.

I am using tez, any idea what can I look at?

>Solution :

Besides taking care of the overall tuning: https://community.cloudera.com/t5/Community-Articles/Demystify-Apache-Tez-Memory-Tuning-Step-by-Step/ta-p/245279

Please check my answer to this same issue here Enable hive parallel processing

That post explains what you need to do to enable parallel processing.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading