M1 - Introduction To Data Engineering Slides
Data Engineering
Agenda
Explore the role of a data engineer
Intro to BigQuery
So… how do we get the raw data from multiple systems, and where can we store it durably?
Manage the data
Productionize data processes
A data lake brings together data from across the
enterprise into a single location
[Diagram: raw data from RDBMS, spreadsheets, offline files, and other systems and apps is replicated into the Data Lake]
Key considerations when building a Data Lake
Extract, Transform, and Load
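The three ETL stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample CSV, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (messy on purpose).
RAW_CSV = """order_id,amount,currency
1001, 19.99 ,usd
1002,,usd
1003, 5.00 ,USD
"""


def extract(raw_text):
    """Extract: read raw rows from the source exactly as they arrive."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows without an amount, then normalize types and casing."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(amount), row["currency"].strip().upper())
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw_text, conn):
    """Run the three stages in order and return (row count, total amount)."""
    load(conn, transform(extract(raw_text)))
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` loads only the two complete rows; the row with the missing amount is dropped during the transform step.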
What if your data arrives continuously and endlessly?
THIS DATA DOES NOT WAIT
Streaming Data Processing: Cloud Pub/Sub, Cloud Dataflow, BigQuery
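To make the idea of processing data that never stops arriving concrete, here is a minimal sketch in plain Python. It does not use the Pub/Sub or Dataflow APIs: a generator stands in for the message source, and a hypothetical `tumbling_window_sums` function groups events into fixed time windows, assuming events arrive in timestamp order.

```python
from collections import defaultdict


def event_stream():
    """Stand-in for an unbounded source: yields (timestamp_seconds, key, value)."""
    events = [
        (0, "store_a", 2),
        (3, "store_b", 1),
        (7, "store_a", 4),
        (11, "store_a", 1),
        (14, "store_b", 5),
        (21, "store_b", 2),
    ]
    for event in events:
        yield event


def tumbling_window_sums(events, window_seconds):
    """Sum values per key in fixed windows, emitting each window when it closes.

    Assumes events arrive in timestamp order; late data is not handled here.
    """
    current_start = None
    sums = defaultdict(int)
    for ts, key, value in events:
        start = (ts // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        if start != current_start:
            # An event from a later window closes the current one.
            yield current_start, dict(sums)
            sums = defaultdict(int)
            current_start = start
        sums[key] += value
    # Final flush, needed only because this demo stream is finite.
    if current_start is not None:
        yield current_start, dict(sums)
```

Iterating over `tumbling_window_sums(event_stream(), 10)` yields one result per 10-second window as soon as that window closes, rather than waiting for the whole dataset, which a truly unbounded stream would never deliver.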
Example Query: Give me all the in-store promotions for recent orders and their inventory levels
[Chart: consumption vs. capacity over time; under-utilized capacity is wasting $$$]
Challenge: Queries need to be optimized for
performance (caching, parallel execution)
Intro to BigQuery
● Performance tuning
● Resource provisioning
● Utilization improvements
● Handling growing scale
● Deployment & configuration
● Reliability
You don't need to provision resources before using BigQuery
[Chart: consumption vs. allocation over time]
Federated Query
Cloud SQL
● Postgres
● MySQL
● SQL Server
Cloud Storage
Demo: Federated Queries with BigQuery
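For Cloud SQL sources, a federated query in BigQuery is written with the `EXTERNAL_QUERY` function, which takes a connection ID and a SQL string to run on the external database. The sketch below only builds the query text; the helper name `build_federated_query`, the connection ID, and the `orders` table are hypothetical, and the quoting check is deliberately conservative.

```python
def build_federated_query(connection_id, external_sql):
    """Return BigQuery SQL that runs external_sql on a Cloud SQL connection.

    To keep quoting trivially safe, this sketch rejects double quotes and
    backslashes instead of trying to escape them.
    """
    for text in (connection_id, external_sql):
        if '"' in text or "\\" in text:
            raise ValueError("double quotes and backslashes are not supported here")
    return (
        "SELECT *\n"
        f'FROM EXTERNAL_QUERY("{connection_id}",\n'
        f'  """{external_sql}""")'
    )


if __name__ == "__main__":
    # Hypothetical connection ID and table name, for illustration only.
    print(
        build_federated_query(
            "my-project.us.my-connection",
            "SELECT order_id, status FROM orders",
        )
    )
```

The resulting text can be pasted into the BigQuery console or passed to a client library; BigQuery sends the inner statement to the external database and returns the rows as a temporary result.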
Cloud SQL
● Automatic encryption
● 30TB storage capacity
● 60,000 IOPS (read/write per second)
● Auto-scale and auto backup
[Diagram labels: Data Lake; Data Warehouse; Feature pipeline; ML Model; Data Eng Pipeline; Other Team Data Warehouse; BI pipeline; Reporting Dashboards]
Intro to BigQuery
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.sample_model`,
    TABLE eval_table)
Partner BI tools
● No need to manage OLAP cubes or separate BI servers for dashboard performance
Data Catalog
AI Platform
http://www.multichannel-blog.co.uk/2017/05/03/google-the-future-of-cloud-conference-in-london-3-4th-may/
Twitter democratized data analysis using BigQuery
“We believe that users with a wide range of technical skills should be able to discover
data and have access to SQL-based analysis and visualization tools that perform well”
-- Twitter
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/democratizing-data-analysis-with-google-bigquery.html
Recap
● Data sources
● Data lakes
● Data warehouses
● Google Cloud solutions for Data Engineering
Concept Review:
for analysis
Data stores
AI Platform Notebooks
Here’s a useful guide for “GCP products in 4 words or less”
https://github.com/gregsramblings/google-cloud-4-words
Updated continually by Greg Wilson - Google DevRel