Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Find table sizes and as percent of total dataset size in BigQuery dataset

I Just utilize two temporary inline tables to work out table sizes and total dataset size Google BigQuery. The dataset here is called test.

Once these two temp tables (called table_sizes and dataset_size respectively) are created, I do a cartesian join and that should be OK because dataset_size only has one column, namely DB_size.

WITH table_sizes AS ( 
 SELECT
    table_id
  , row_count
  , ROUND(SUM(size_bytes) / (1024*1024), 2) AS size_in_mb
FROM test.__TABLES__
GROUP BY
  table_id,
  row_count
), dataset_size AS
(
SELECT
ROUND(SUM(size_bytes)/1024/1024, 2) AS DB_size
FROM test.__TABLES__
)
SELECT
    t.table_id
  , t.row_count
  , t.size_in_mb
  , ROUND(t.size_in_mb*100/d.DB_size,1) AS percent
  FROM
  table_sizes t,
  dataset_size d
  ORDER BY 
  t.size_in_mb DESC
  LIMIT 5
  ;

It results in the following output

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

[{
  "table_id": "randomdata",
  "row_count": "11100330",
  "size_in_mb": "1651.42",
  "percent": "97.6"
}, {
  "table_id": "ocod_full",
  "row_count": "95535",
  "size_in_mb": "39.98",
  "percent": "2.4"
}, {
  "table_id": "DUMMY",
  "row_count": "10000",
  "size_in_mb": "0.99",
  "percent": "0.1"
}, {
  "table_id": "abcd",
  "row_count": "2",
  "size_in_mb": "0.0",
  "percent": "0.0"
}, {
  "table_id": "json",
  "row_count": "0",
  "size_in_mb": "0.0",
  "percent": "0.0"
}]

If anyone has better suggestions please add

>Solution :

As you second query returns a scalar value you can try.

 SELECT
    table_id
  , row_count
  , ROUND(SUM(size_bytes) / (1024*1024), 2) AS size_in_mb
,ROUND(SUM(size_bytes) / (1024*1024), 2) / (
  SELECT
     ROUND(SUM(size_bytes)/1024/1024, 2) 
  FROM test.__TABLES__
  ) AS precent
FROM test.__TABLES__

GROUP BY
  table_id,
  row_count
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading