How do Snowflake primary key definitions affect query performance?

I need to define primary keys on the Snowflake tables into which I’m loading data, since the ELT framework I’m using (Singer.io) uses them to define the match condition for merge/upsert. However, the order of the columns in the key seems to affect partitioning and query performance.

My primary key contains both a low-cardinality and a high-cardinality column. I’ve tried defining the key with the columns listed in either order. When the low-cardinality column is listed first, the table has fewer partitions and some queries are slightly faster, but other queries can be much slower than with no primary key or with the high-cardinality column first.

So Snowflake appears to use primary keys as clustering/micro-partitioning hints, even though it doesn’t enforce them. Are there any guidelines or information on how the choice of primary key (not clustering key) can affect query performance?

Solution:

As of April 2023, defining primary and foreign keys has no performance drawbacks. The only performance advantage is join elimination, which is described here:

https://docs.snowflake.com/en/user-guide/join-elimination
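
As a rough illustration of join elimination, here is a minimal sketch. The tables `regions` and `orders` and their columns are hypothetical and not taken from the question. It assumes the `RELY` constraint property, which tells the optimizer it may trust a constraint that Snowflake does not enforce. Whether the join is actually removed depends on the query and the Snowflake version, so check the plan rather than assume it.

```sql
-- Hypothetical tables; every name here is illustrative.
CREATE OR REPLACE TABLE regions (
    region_id NUMBER NOT NULL,
    name      VARCHAR,
    -- RELY marks the unenforced constraint as trustworthy for the optimizer.
    CONSTRAINT pk_regions PRIMARY KEY (region_id) RELY
);

CREATE OR REPLACE TABLE orders (
    order_id  NUMBER NOT NULL,
    region_id NUMBER NOT NULL,
    amount    NUMBER(12, 2),
    CONSTRAINT pk_orders PRIMARY KEY (order_id) RELY,
    CONSTRAINT fk_orders_region FOREIGN KEY (region_id)
        REFERENCES regions (region_id) RELY
);

-- No column of regions is selected, so this join is a candidate for
-- elimination. Inspect the plan to see whether regions is still scanned.
EXPLAIN
SELECT o.order_id, o.amount
FROM orders o
JOIN regions r ON o.region_id = r.region_id;
```

Because Snowflake does not enforce these constraints, `RELY` is only safe when the loaded data really does satisfy them. If it does not, an eliminated join can change query results.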
