Big Data Query

A Simple Query — Or So We Thought

EXPORT DATA
OPTIONS (
    uri = 'gs://xxxxx/*.json',
    format = 'JSON',
    overwrite = true)
AS (
    SELECT *
    FROM `bigquery-public-data.crypto_solana_xxxxx.Instructions`
    LIMIT 1000000
);

This query exports 1,000,000 rows from the Instructions table in the crypto_solana dataset (hosted in BigQuery’s public datasets) to a Google Cloud Storage bucket in JSON format.
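In hindsight, the cheapest query of all would have been a size check before the export. As a sketch of one way to do it: BigQuery exposes per-table metadata through the __TABLES__ meta-table (dataset name elided here to match the query above), and reading it costs next to nothing:

SELECT
  table_id,
  row_count,
  size_bytes / POW(10, 12) AS size_tb  -- decimal terabytes
FROM `bigquery-public-data.crypto_solana_xxxxx.__TABLES__`
WHERE table_id = 'Instructions';

A dry run (bq query --dry_run) similarly reports the bytes a specific query would be billed for, before it ever runs.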

Three queries. 1,576.56 TB of data “scanned.” The invoice for those three queries: $9,847.24.

The cost breakdown was even crazier:

  • Total “scanned” data: 1,576.56 TB across three queries
  • Each query, despite using LIMIT, was billed for 509.89 TB of scanned data
  • Queries ran in about 22 seconds each, which would imply a scan rate of roughly 23 TB per second, a strong hint that nothing close to that much data was actually read
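The total is at least consistent with BigQuery’s published on-demand pricing: assuming the $6.25 per TiB list rate and the first TiB free each month, (1,576.56 - 1) × $6.25 ≈ $9,847.25, within a cent of the invoice. The bill was not a glitch; it was the pricing model working exactly as documented.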
So where did the number come from? BigQuery’s on-demand pricing bills by data referenced, not data returned. Consider the simplest possible trap:

SELECT * FROM huge_table LIMIT 100;
  • Even if only 100 rows are returned, you’re charged as if you scanned the entire table: LIMIT is applied after the scan, so it never reduces the amount of data billed.
  • If the table is 1 PB in size, you’re billed for 1 PB of data scanned.
  • Row filters don’t help either. If your query references a table, you pay for every referenced column of it; charges are based on total referenced data, not on the rows actually returned.
  • Partition pruning is unpredictable: a query you expect to prune may still scan, and be billed for, the full table. (Two guardrails are sketched after this list.)
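None of this makes the table unusable; it means the query has to be shaped around the billing model. Below is a minimal sketch of one guardrail that genuinely reduces billed bytes, TABLESAMPLE, applied to the original export. The 1 PERCENT figure and the sample-*.json path are illustrative, not from the original incident:

-- TABLESAMPLE reads a random subset of the table’s storage blocks,
-- and BigQuery bills only for the blocks it actually reads,
-- unlike LIMIT, which is applied after the full scan.
EXPORT DATA
OPTIONS (
    uri = 'gs://xxxxx/sample-*.json',
    format = 'JSON',
    overwrite = true)
AS (
    SELECT *
    FROM `bigquery-public-data.crypto_solana_xxxxx.Instructions`
    TABLESAMPLE SYSTEM (1 PERCENT)
);

As a second line of defense, the maximum_bytes_billed job setting (available in the bq CLI, the API, and the console) makes BigQuery fail a query outright rather than bill past the cap, turning a five-figure surprise into an error message.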