Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Can SQL Server query time-complexity (latency) reach O(1) given a set of optimizations?

Imagine one has two SQL tables containing daily time-series data.

Both of the tables have primary key (and since it’s SQL Server the key is also clustered). The key is composite and includes field BUSINESS_DATE of type Date.

Both table are partitioned monthly.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Table_Short contains only 2 years worth of data.
Table_Long contains 50 years worth of data.

The end-user can only query maximum last 1 year worth of data at a time. Therefore, the front-end (UI) will always make a SELECT statement like

SELECT <some columns>
FROM <Table>
WHERE BUSINESS_DATE >= t1 
AND BUSINESS_DATE <= t2

for some dates t1, t2.

Given that the user only needs to look into the last year and the fact that the tables are partitioned and ordered by business_date, is it true, that the query latency should not depend on the table size, thus query against Table_Long will take the same time as against Table_Long?

My understanding was that SQL Server stores tables as Binary Tree, where search is at worst O(log n).
However, given that we only need the last year in either 2 years or 50 years, should the time-complexity become O(1)?

>Solution :

Query latencies may still depend on table size even though SQL Server uses a B-tree index to efficiently search and retrieve data. In certain cases, latency can depend on table size even if the data is partitioned.

In theory, while time complexity of the search operation may still be O(log n). When it comes to practice, SELECT statement shouldn’t depend on the overall size of the tables, but more on the number of partitions that need to be scanned to get data within a specific date range.(the partitioning scheme allows for efficient pruning of irrelevant data)

  • In your case, we are querying for the last year of data, and both tables are partitioned by month. SQL Server query optimizer should be able to use the partitioning scheme to prune the partitions that do not contain the relevant data, resulting in a smaller search space and faster query execution.

Ps. SQL Server uses a clustered index physically ordering the data in a table, which further improves query performance.

  • in your case, the primary key in both tables is clustered and includes BUSINESS_DATE field (data within each partition will be ordered by BUSINESS_DATE, making it easier and faster to locate data within each partition, resulting in query execution times that are not dependent on the overall size of the tables)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading