BigQuery Cost Optimization + Best Practices
● Exabyte-scale data warehousing
● Built-in ML and geospatial analytics for predictive insights (unique)
● Encrypted, durable, secure, and highly available
● High-speed, in-memory BI Engine for faster reporting and analysis (unique)
BigQuery | Architectural Advantage
Decoupled storage and compute for maximum flexibility
● Replicated, distributed storage (99.999999999% durability)
● Highly available compute cluster (Dremel)
● Distributed in-memory shuffle tier
● Petabit network connecting storage and compute
● SQL:2011 compliant
● Streaming ingest; free bulk loading
● Access via REST API, Web UI, CLI, and client libraries in 7 languages
BigQuery | Managed storage
Durable and persistent storage with automatic backup
● Seamless maintenance
Optimize querying
1. Query required data
2. Enforce cost control
3. Partition and cluster
4. Flat-rate pricing

Query required data
● Avoid SELECT * (use the preview option to explore your data - it's free!)
● Denormalize your data (nested fields). *Bear in mind: BigQuery is a data warehouse
● Filter your query as early and as often as possible to improve performance and reduce cost
● Check how much your query is going to be charged
● Avoid SQL anti-patterns
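The first two points can be sketched in SQL; the table and column names (`mydataset.events`, `event_date`, `customer_id`, `amount`) are hypothetical:

```sql
-- Anti-pattern: scans (and bills) every column in the table.
-- SELECT * FROM mydataset.events;

-- Better: select only the columns you need, and filter as
-- early as possible so less data is processed.
SELECT
  customer_id,
  SUM(amount) AS total_amount
FROM mydataset.events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY customer_id;
```

To check the charge before running, use the query validator in the web UI or a dry run (`bq query --dry_run`), which reports the bytes that would be processed without executing the query.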
Enforce cost control: Avoid human errors
● Enforce MAX limits on bytes processed at query, user, and project level
● Cancelling a running query may still cost $ (you are billed for the bytes already processed)
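Hard limits are set outside SQL (custom quotas, or `maximum_bytes_billed` on a query job), but spend can be audited from `INFORMATION_SCHEMA`. A sketch, assuming jobs run in the US region:

```sql
-- Bytes billed per user over the last 7 days.
SELECT
  user_email,
  SUM(total_bytes_billed) / POW(1024, 4) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
GROUP BY user_email
ORDER BY tib_billed DESC;
```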
Partition and cluster: Partition & cluster your data
● Partition your table to reduce the data swept
  ○ Enable the required partition filter
● Cluster your table to further prune the data read within each partition
(Diagram: Partitioning vs. Clustering)
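A sketch of a partitioned, clustered table with the required partition filter enforced; all names are hypothetical:

```sql
CREATE TABLE mydataset.events
PARTITION BY DATE(event_ts)
CLUSTER BY customer_id
OPTIONS (require_partition_filter = TRUE)
AS
SELECT event_ts, customer_id, amount
FROM mydataset.events_staging;

-- With the option above, queries must filter on the partition
-- column or they are rejected, e.g.:
--   WHERE DATE(event_ts) = '2024-01-15'
```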
Flat-rate pricing: Flat-rate & Reservations
● Think about flat-rate once your BigQuery processing cost > $10K
  ○ Familiarize yourself with BigQuery cost using our pricing calculator
● How many slots should you buy? Visualize slot utilization in Stackdriver
02 Optimizing Storage
Optimizing Storage
1. Data retention
2. Long-term storage
3. Avoid duplicate storage
4. Streaming inserts
5. Backup and recovery

How long are you keeping your data?
● If your table or partition has not been edited for 90 days, the storage price drops by 50% (long-term storage)
  ○ Watch out for any actions that edit your table: loading into BQ, DML operations, streaming inserts, ...
● For long-term archives with access frequency of at most once a year, leverage the Coldline storage class in GCS
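Candidates already in (or approaching) long-term storage can be found from the legacy `__TABLES__` metadata view; a sketch, with a hypothetical dataset name:

```sql
-- Tables not modified in the last 90 days, largest first.
SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified,
  ROUND(size_bytes / POW(1024, 3), 1) AS size_gib
FROM mydataset.__TABLES__
WHERE last_modified_time <
      UNIX_MILLIS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY))
ORDER BY size_bytes DESC;
```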
Avoid duplicate copies of data
Use cases:
● Frequently changing small side inputs
● Ingestion with cleanup that needs to be archived
● Querying of large archives
● Gotcha: querying is less performant
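One way to avoid a duplicate copy is to query the archive in place in GCS through an external table; a sketch, where the bucket, dataset, and format are assumptions:

```sql
CREATE OR REPLACE EXTERNAL TABLE mydataset.archive_ext
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-archive-bucket/events/*.parquet']
);

-- Gotcha from the slide: queries over external data are slower
-- than queries over native BigQuery storage.
```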
Streaming inserts: Loading the data
● Batch upload is free. Use streaming inserts only if the data is consumed by downstream processes in real time.
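Batch loading can be done with the `bq load` CLI or the `LOAD DATA` SQL statement; a sketch with hypothetical names:

```sql
-- Free batch load from GCS (no streaming-insert charges).
LOAD DATA INTO mydataset.events
FROM FILES (
  format = 'AVRO',
  uris = ['gs://my-bucket/exports/*.avro']
);
```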
BigQuery Materialized Views
● Sub-second queries
● Simplified architecture
● Smart tuning
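A sketch of a materialized view that BigQuery can transparently use to answer matching aggregate queries (the "smart tuning" above); names are hypothetical:

```sql
CREATE MATERIALIZED VIEW mydataset.daily_totals AS
SELECT
  customer_id,
  DATE(event_ts) AS day,
  SUM(amount) AS total_amount
FROM mydataset.events
GROUP BY customer_id, day;
```

The view is refreshed incrementally, and queries aggregating `mydataset.events` by these keys can be served from it without rewriting them.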
Visualize cost
bit.ly/gcp-co-bq
Thank you
Appendix
Ingestion formats, from faster to slower loading into BigQuery:
1. Avro (compressed)
2. Avro (uncompressed)
3. Parquet / ORC
4. CSV
5. JSON
6. CSV (compressed)
7. JSON (compressed)
Introducing
BigQuery Omni
A flexible, fully managed, multi-cloud
analytics solution that lets you analyze data
across public clouds without leaving the
familiar BigQuery user interface.
Data integration partners
Databases
Data warehouses
Pricing Efficiency
● Flex Slots
● BigQuery slot recommendations