Find table sizes and as percent of total dataset size in BigQuery dataset

I Just utilize two temporary inline tables to work out table sizes and total dataset size Google BigQuery. The dataset here is called test.

Once these two temp tables (called table_sizes and dataset_size respectively) are created, I do a cartesian join and that should be OK because dataset_size only has one column, namely DB_size.

WITH table_sizes AS ( 
 SELECT
    table_id
  , row_count
  , ROUND(SUM(size_bytes) / (1024*1024), 2) AS size_in_mb
FROM test.__TABLES__
GROUP BY
  table_id,
  row_count
), dataset_size AS
(
SELECT
ROUND(SUM(size_bytes)/1024/1024, 2) AS DB_size
FROM test.__TABLES__
)
SELECT
    t.table_id
  , t.row_count
  , t.size_in_mb
  , ROUND(t.size_in_mb*100/d.DB_size,1) AS percent
  FROM
  table_sizes t,
  dataset_size d
  ORDER BY 
  t.size_in_mb DESC
  LIMIT 5
  ;

It results in the following output

[{
  "table_id": "randomdata",
  "row_count": "11100330",
  "size_in_mb": "1651.42",
  "percent": "97.6"
}, {
  "table_id": "ocod_full",
  "row_count": "95535",
  "size_in_mb": "39.98",
  "percent": "2.4"
}, {
  "table_id": "DUMMY",
  "row_count": "10000",
  "size_in_mb": "0.99",
  "percent": "0.1"
}, {
  "table_id": "abcd",
  "row_count": "2",
  "size_in_mb": "0.0",
  "percent": "0.0"
}, {
  "table_id": "json",
  "row_count": "0",
  "size_in_mb": "0.0",
  "percent": "0.0"
}]

If anyone has better suggestions please add

>Solution :

As you second query returns a scalar value you can try.

 SELECT
    table_id
  , row_count
  , ROUND(SUM(size_bytes) / (1024*1024), 2) AS size_in_mb
,ROUND(SUM(size_bytes) / (1024*1024), 2) / (
  SELECT
     ROUND(SUM(size_bytes)/1024/1024, 2) 
  FROM test.__TABLES__
  ) AS precent
FROM test.__TABLES__

GROUP BY
  table_id,
  row_count

Leave a Reply