I just use two temporary inline tables (CTEs) to work out per-table sizes and the total dataset size in Google BigQuery. The dataset here is called test.
Once these two temp tables (called table_sizes and dataset_size respectively) are created, I do a cartesian join, which is fine because dataset_size returns a single row with one column, DB_size.
WITH table_sizes AS (
  -- Per-table row counts and sizes in MB
  SELECT
    table_id
    , row_count
    , ROUND(SUM(size_bytes) / (1024 * 1024), 2) AS size_in_mb
  FROM test.__TABLES__
  GROUP BY
    table_id
    , row_count
), dataset_size AS (
  -- Total dataset size in MB (single row)
  SELECT
    ROUND(SUM(size_bytes) / (1024 * 1024), 2) AS DB_size
  FROM test.__TABLES__
)
SELECT
  t.table_id
  , t.row_count
  , t.size_in_mb
  , ROUND(t.size_in_mb * 100 / d.DB_size, 1) AS percent
FROM
  table_sizes t
  , dataset_size d
ORDER BY
  t.size_in_mb DESC
LIMIT 5
;
It produces the following output:
[{
"table_id": "randomdata",
"row_count": "11100330",
"size_in_mb": "1651.42",
"percent": "97.6"
}, {
"table_id": "ocod_full",
"row_count": "95535",
"size_in_mb": "39.98",
"percent": "2.4"
}, {
"table_id": "DUMMY",
"row_count": "10000",
"size_in_mb": "0.99",
"percent": "0.1"
}, {
"table_id": "abcd",
"row_count": "2",
"size_in_mb": "0.0",
"percent": "0.0"
}, {
"table_id": "json",
"row_count": "0",
"size_in_mb": "0.0",
"percent": "0.0"
}]
If anyone has better suggestions, please add them.
>Solution:
As your second query returns a scalar value, you can inline it as a subquery:
SELECT
  table_id
  , row_count
  , ROUND(SUM(size_bytes) / (1024 * 1024), 2) AS size_in_mb
  -- Divide raw bytes by the dataset total and scale to a percentage
  , ROUND(SUM(size_bytes) * 100 / (
      SELECT SUM(size_bytes)
      FROM test.__TABLES__
    ), 1) AS percent
FROM test.__TABLES__
GROUP BY
  table_id
  , row_count
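Since __TABLES__ yields one row per table (which the queries above already rely on), a window-function variant can avoid referencing the metadata view twice. This is a minimal sketch along those lines, not a tested replacement:
-- Sketch: analytic total instead of a scalar subquery.
-- Assumes test.__TABLES__ returns one row per table.
SELECT
  table_id
  , row_count
  , ROUND(size_bytes / (1024 * 1024), 2) AS size_in_mb
  , ROUND(size_bytes * 100 / SUM(size_bytes) OVER (), 1) AS percent
FROM test.__TABLES__
ORDER BY size_in_mb DESC
LIMIT 5
;
Here SUM(size_bytes) OVER () computes the dataset total in the same pass, so both the GROUP BY and the second scan of __TABLES__ drop away.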