Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

choose latest partition of a Bigquery table where filter over partition column is required

I have been using the following query

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) 

It is not ideal as the partition could be unavailable due to delay.. Thus I try the following queries

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) IN 
    (
       SELECT 
         MAX(DATE(_PARTITIONTIME)) AS max_partition
       FROM `project.dataset.table`
       WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    )

as well as

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE TIMESTAMP(DATE(_PARTITIONTIME)) IN 
    (
      SELECT parse_timestamp("%Y%m%d", MAX(partition_id))
      FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
      WHERE table_name = 'table'
    )

Neither of them work due to

Cannot query over table ‘project.dataset.table’ without a filter over
column(s) ‘_PARTITION_LOAD_TIME’, ‘_PARTITIONDATE’, ‘_PARTITIONTIME’
that can be used for partition elimination.

>Solution :

In both of your solutions the limiting filter for the partition column is calculated during the query. This lead to full table scan.

Therfore, you need to add a filter for the partition column which is always know at the beginning of the run of your query.

SELECT DISTINCT
      *
    FROM
      `project.dataset.table` t
    WHERE DATE(_PARTITIONTIME) IN 
    (
       SELECT 
         MAX(DATE(_PARTITIONTIME)) AS max_partition
       FROM `project.dataset.table`
       WHERE DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    )
AND DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading