
Cost optimization best practices for BigQuery
TecHub | Data Analytics
Google BigQuery
Google Cloud Platform's enterprise data warehouse for analytics

● Fully managed and serverless, for maximum agility and scale
● Real-time insights from streaming data
● Exabyte-scale data warehousing
● Built-in ML and geospatial analysis for predictive insights
● Encrypted, durable, secure, and highly available
● High-speed, in-memory BI Engine for faster reporting and analysis
BigQuery | Architectural Advantage
Decoupled storage and compute for maximum flexibility

● Replicated, distributed storage (99.9999999999% durability)
● Highly available compute cluster (Dremel) with a distributed in-memory shuffle tier
● Storage and compute connected by a petabit network
● SQL:2011 compliant
● Ingest via streaming or free bulk loading
● Access through the REST API, Web UI, CLI, and client libraries in 7 languages
BigQuery | Managed storage
Durable and persistent storage with automatic backup

● Tables are stored in an optimized columnar format
● Each table is compressed and encrypted on disk
● Storage is durable, and each table is replicated across data centers (zones within a region)
● You can time travel on data within 7 days

BigQuery | Large stateless compute
Modern architecture for scalability and performance

● Superlinear horizontal scalability
● Immune to node/rack downtime
● Seamless maintenance
● Pipelined execution, dynamic work repartitioning, speculative execution
Cost optimization techniques

Query processing
● On-demand pricing
  ○ Query the data you need
  ○ Query cost controls
  ○ Partition and cluster your tables (includes zero-maintenance auto-reclustering)
● Flat-rate pricing

Storage
● Data retention
● Long-term storage
● Avoid duplicate storage - use the federated data access model
● Streaming inserts
● Backup and recovery
01 Optimize querying
Query the data you need

● Avoid SELECT * (use the preview option to explore your data - it's free!)
● Denormalize your data (nested fields). Bear in mind: BigQuery is a data warehouse.
● Filter your query as early and as often as possible to improve performance and reduce cost (see the sketch after this list).
● Check how much your query is going to be charged before running it.
● Avoid SQL anti-patterns.
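A minimal sketch of the SELECT * and early-filter habits, assuming a hypothetical `my_project.shop.orders` table: name only the columns you need and push the filter down so BigQuery scans less data.

-- Avoid: SELECT * scans every column in the table.
-- Better: project only what you need and filter as early as possible.
SELECT
  order_id,
  customer_id,
  total_usd
FROM `my_project.shop.orders`
WHERE order_date >= '2024-01-01'  -- if order_date is the partitioning column,
  AND status = 'SHIPPED';         -- this filter also cuts the bytes billed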
Avoid human errors

● Enforce MAX limits on bytes processed at the query, user, and project level.
● Cancelling a query may still cost money: you are billed for the work done before the job stops.
● Use caching intelligently - repeated identical queries are served from the free results cache (see the sketch below).
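A minimal sketch of the caching point, reusing the hypothetical `my_project.shop.orders` table: the 24-hour results cache only serves byte-for-byte identical, deterministic queries, so non-deterministic functions typically disable it.

-- Served from the results cache on identical re-runs (no charge):
SELECT customer_id, SUM(total_usd) AS spend
FROM `my_project.shop.orders`
WHERE order_date = '2024-01-01'
GROUP BY customer_id;

-- Typically not cached: CURRENT_DATE() is non-deterministic, so re-runs are billed.
SELECT customer_id, SUM(total_usd) AS spend
FROM `my_project.shop.orders`
WHERE order_date = CURRENT_DATE()
GROUP BY customer_id;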
Partition & cluster your data

● Partition your table to reduce the data scanned.
  ○ Enable the required partition filter option.
● Cluster to further prune the data blocks that are read (both are combined in the DDL sketch below).
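A minimal DDL sketch, assuming a hypothetical `my_project.shop.events` table: partition by day, cluster by a frequently filtered column, and require a partition filter so unscoped full scans are rejected.

CREATE TABLE `my_project.shop.events`
(
  event_ts    TIMESTAMP,
  customer_id STRING,
  payload     STRING
)
PARTITION BY DATE(event_ts)                 -- queries pay only for the days they touch
CLUSTER BY customer_id                      -- prunes blocks within each partition
OPTIONS (require_partition_filter = TRUE);  -- rejects queries that omit a date filter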
Flat-rate & reservations

● Think about flat-rate pricing once your BigQuery processing cost exceeds ~$10K a month.
  ○ Familiarize yourself with BigQuery costs using our pricing calculator.
● How many slots should you buy? Visualize slot utilization in Stackdriver, or estimate it from the jobs metadata as sketched below.
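A minimal sizing sketch over the jobs metadata, assuming the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view is available for your region: average slot usage per day is total slot-milliseconds divided by the milliseconds in a day.

SELECT
  DATE(creation_time) AS day,
  SUM(total_slot_ms) / (1000 * 60 * 60 * 24) AS avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY day
ORDER BY day;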
02 Optimizing storage
How long are you keeping your data?

● Set a TTL (expiration) on your data at the dataset level or at the table level.
● Similar to dataset-level and table-level, you can also set expiration at the partition level. Do check out our public documentation for the default behaviors. Both dataset- and table-level settings are sketched below.
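A minimal sketch of both levels, assuming a hypothetical `my_project.staging` dataset: new tables default to a 30-day TTL, and one existing table gets an explicit expiration.

-- Dataset level: every new table expires 30 days after creation by default.
ALTER SCHEMA `my_project.staging`
SET OPTIONS (default_table_expiration_days = 30);

-- Table level: this specific table expires at a fixed point in time.
ALTER TABLE `my_project.staging.daily_extract`
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
);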
Be wary of how you edit your data

● If your table or partition has not been edited for 90 days, its storage price drops by 50% (long-term storage). A query for spotting long-term bytes follows.
  ○ Watch out for any action that edits the table: loading into BQ, DML operations, streaming inserts, etc.
● For long-term archives accessed at most once a year, leverage the Coldline storage class in GCS.
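A minimal sketch for spotting long-term storage, assuming the `INFORMATION_SCHEMA.TABLE_STORAGE` view is available in your region:

SELECT
  table_schema,
  table_name,
  active_logical_bytes / POW(1024, 3)    AS active_gib,    -- billed at the full rate
  long_term_logical_bytes / POW(1024, 3) AS long_term_gib  -- billed at ~50% of it
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
ORDER BY long_term_gib DESC;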
Avoid duplicate copies of data

Leverage BigQuery's federated data access model for your data stored on:
● Google Drive
● Cloud Bigtable
● Cloud Storage
● Cloud SQL

Use cases:
● Frequently changing small side inputs
● Ingestion with cleanup that needs to be archived
● Querying of large archives (sketched below)

Gotcha: querying external data is less performant than querying native BigQuery storage.
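A minimal federated sketch over Cloud Storage, assuming hypothetical Parquet files under `gs://my-archive-bucket/events/`: the data stays in GCS, so no duplicate copy is billed in BigQuery storage.

CREATE EXTERNAL TABLE `my_project.archive.events_ext`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-archive-bucket/events/*.parquet']
);

-- Query it like any native table; bytes are read from GCS at query time.
SELECT COUNT(*) FROM `my_project.archive.events_ext`;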
Loading the data

● Batch loading is free. Use streaming inserts only if the data is consumed by downstream processes in real time.

Understanding DR and backup processes

● By default, a 7-day history of your tables is tracked by BigQuery at the service level.
  ○ You can find examples of point-in-time restores in our public documentation; one is sketched below.
● If you delete your table, you cannot restore it after 2 days.
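A minimal point-in-time sketch using time travel, reusing the hypothetical `my_project.shop.orders` table: read the table as it was an hour ago, for example to recover from a bad DML statement.

SELECT *  -- SELECT * is fine here: the goal is a full snapshot, not a cheap scan
FROM `my_project.shop.orders`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);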
BigQuery Materialized Views

● Zero maintenance: refreshes are automatically synchronized with data changes in the base tables. No user input required.
● Always fresh: always consistent with the base table. Querying an MV will never return stale data.
● Self tuning: when you query the base table directly, BigQuery will rewrite the query to use the MV for better performance and/or efficiency.
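A minimal sketch, again assuming the hypothetical `my_project.shop.orders` table: a materialized view that pre-aggregates daily revenue, which BigQuery can also substitute into matching queries against the base table.

CREATE MATERIALIZED VIEW `my_project.shop.daily_revenue_mv` AS
SELECT
  order_date,
  SUM(total_usd) AS revenue_usd
FROM `my_project.shop.orders`
GROUP BY order_date;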
Flexibility and choice across the BI process

Introducing BigQuery BI Engine
● Sub-second queries
● Simplified architecture
● Smart tuning
Visualize cost

● Create your own dashboard (example)
● Analyze spending trends & query trends over time
● Break down cost per project and per user (a per-user sketch follows)
● Be proactive about tracking your expensive queries and optimizing them
● Resources: BQ audit logs queries repository (GitHub), blog post
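A minimal per-user breakdown, again assuming `INFORMATION_SCHEMA.JOBS_BY_PROJECT` is available; the $5/TiB figure is an assumed on-demand rate - substitute the rate on your own contract.

SELECT
  user_email,
  SUM(total_bytes_billed) / POW(1024, 4)     AS tib_billed,
  SUM(total_bytes_billed) / POW(1024, 4) * 5 AS est_cost_usd  -- assumed $5/TiB
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY user_email
ORDER BY est_cost_usd DESC;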
For more details: bit.ly/gcp-co-bq
Thank you
Appendix
Ingestion formats

Relative loading speed into BigQuery, from faster to slower:
● Avro (compressed)
● Avro (uncompressed)
● Parquet / ORC
● CSV
● JSON
● CSV (compressed)
● JSON (compressed)
Introducing BigQuery Omni

A flexible, fully managed, multi-cloud analytics solution that lets you analyze data across public clouds without leaving the familiar BigQuery user interface.
Data integration partners

● SaaS data sources
● Databases
● Data warehouses
● B2B, EDI data

Resource Optimizations
● BigQuery Partitioning & Clustering
● Federation: Avoid duplication of data
● Data retention and clean up for active storage
● BigQuery Caching

Pricing Efficiency
● Flex Slots
● BigQuery slot recommendations
