BigQuery

Legacy vs Standard SQL

Legacy SQL – tables are referenced with square brackets, and UDFs are available in the web console. Project and dataset are separated with ":" (e.g. [project:dataset.table]).

Standard SQL – table names are quoted with backticks, and the separator is ".". It does not support TABLE_DATE_RANGE and TABLE_QUERY, but these can be replaced with a table wildcard plus _TABLE_SUFFIX. It supports querying nested and repeated data.
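A minimal sketch of the difference, using placeholder project/dataset/table names (`myproject.mydataset.events`) that stand in for your own:

```sql
-- Legacy SQL: square brackets, ":" between project and dataset
SELECT COUNT(*) FROM [myproject:mydataset.events20240101];

-- Legacy SQL: TABLE_DATE_RANGE over daily sharded tables
SELECT COUNT(*)
FROM TABLE_DATE_RANGE([myproject:mydataset.events],
                      TIMESTAMP('2024-01-01'), TIMESTAMP('2024-01-31'));

-- Standard SQL: backticks, "." as separator; the wildcard + _TABLE_SUFFIX
-- filter covers the same date range without TABLE_DATE_RANGE
SELECT COUNT(*)
FROM `myproject.mydataset.events*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131';
```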

Standard SQL advantages:

▪ Composability using WITH clauses and SQL functions.

▪ Subqueries in the SELECT list and WHERE clause.

▪ Correlated subqueries

▪ ARRAY and STRUCT data types (legacy SQL had REPEATED and RECORD data types)

▪ Inserts, updates, and deletes (DML)

▪ COUNT(DISTINCT <expr>) is exact and scalable, providing the accuracy of EXACT_COUNT_DISTINCT without its limitations

▪ Automatic predicate push-down through JOINs

▪ Complex JOIN predicates, including arbitrary expressions

▪ Table wildcards and _TABLE_SUFFIX

▪ Stricter timestamp checking
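Several of these advantages can be shown in one query. This is an illustrative sketch only; the table and column names (`myproject.mydataset.events`, `user_id`, `event_ts`) are placeholders:

```sql
-- WITH clause for composability
WITH daily AS (
  SELECT user_id, DATE(event_ts) AS d, COUNT(*) AS n
  FROM `myproject.mydataset.events`
  GROUP BY user_id, d
)
SELECT user_id,
       -- ARRAY of STRUCTs: each user's daily counts as nested, repeated data
       ARRAY_AGG(STRUCT(d, n) ORDER BY d) AS history,
       -- exact, scalable distinct count in Standard SQL
       COUNT(DISTINCT d) AS active_days
FROM daily
GROUP BY user_id;
```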

Best practices / Performance

▪ Avoid self-joins; use window functions instead

▪ If data is skewed (some partitions are much larger than others), filter as early as possible. Use APPROX_TOP_COUNT to detect skew

▪ Avoid joins that produce more output rows than input rows

▪ Avoid point-specific DML; batch DML statements instead

▪ Sub-queries are often more efficient than joins

▪ Select only the columns that are needed

▪ Filter with a WHERE clause so that as few rows as possible are processed

▪ With multiple joins, do the biggest join first. The left side of a join should be the bigger table

▪ Low-cardinality GROUP BYs are faster. Low cardinality means that the column contains a lot of repeated values in its data range

▪ LIMIT doesn't affect cost, as it only controls how many rows are displayed; the full scan still happens

▪ Built-in functions are faster than JavaScript UDFs


▪ Exact functions are slower than approximate built-in functions; use an approximate built-in if possible. For example, instead of COUNT(DISTINCT), use APPROX_COUNT_DISTINCT()

▪ Order only in the outermost query, not in inner queries. The outer query is performed last, so put complex operations at the end, after all filtering is done

▪ Wildcards – be as specific as possible; a longer table prefix matches fewer tables

▪ Performance – query time is split between stages; this can also be seen in Stackdriver

▪ Each stage has four phases: wait, read, compute, write

▪ Tail skew – the maximum time spent in a stage is significantly more than the average, because some partitions are much bigger than others. Tail skew can be detected using an approximate aggregate function such as APPROX_TOP_COUNT

▪ To avoid tail skew, filter as early as possible

▪ Batch loading is free; streaming has a cost. Unless data is needed in real time, use batch loading when possible

▪ Denormalize when possible, but still use STRUCTs and ARRAYs to keep related data nested

▪ External data sources are slow; use them only when needed

▪ Monitor query performance using the "Details" page, which shows whether there is read, compute, or write latency. The query plan shows the different stages and the breakdown of time between the different activities within each stage
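Two of the practices above sketched as queries. All table and column names (`myproject.mydataset.daily_counts`, `user_id`, `country`) are placeholders, not real datasets:

```sql
-- Window function instead of a self-join: compare each row with the
-- previous day's value in a single scan, no join needed
SELECT d, n,
       LAG(n) OVER (ORDER BY d) AS prev_n
FROM `myproject.mydataset.daily_counts`;

-- Approximate aggregates: cheaper than their exact equivalents, and
-- APPROX_TOP_COUNT also reveals which keys dominate (i.e. skew)
SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_users,
       APPROX_TOP_COUNT(country, 5) AS top_countries
FROM `myproject.mydataset.events`;
```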
