GCP Technologies

Google Cloud Storage is one of the most important and widely used services of GCP. It provides blob-type storage for unstructured data such as audio, video, and images. Datastore and Firestore provide NoSQL database services for transactional, non-relational data with strong consistency. BigQuery is the data warehouse service that provides SQL capabilities for analysis on large datasets. Dataflow is used for processing streaming data in flight using directed acyclic graphs.


Products | Storage Type | Corresponding technologies

Google Cloud Storage | Blob storage | Media storage (audio/video/images)
Cloud SQL | OLTP | MySQL / PostgreSQL
Cloud Spanner | OLTP | Horizontally scalable relational database (Google proprietary)
Datastore | NoSQL (document / key-value pairs) | MongoDB
Bigtable | NoSQL (wide column, key-value pairs) | HBase
BigQuery | OLAP data warehouse | Hive
Memcache | Cache storage | RAM (Memcached)

Let's start by talking about some simple use cases:


You have data coming in for long term storage, which might be used for analysis, logging, reporting, long term legal archives: Google Cloud Storage
You have transactional data, which is relational in nature, coming in to keep record of transactions, data in GBs. Since the data is transactional you want strong consistency, that is, records should be consistent across the database: Cloud SQL
You have transactional data, which is relational in nature, coming in to keep record of transactions, data > 10 TB. Since the data is transactional you want strong consistency, that is, records should be consistent across the database: Cloud Spanner
You have transactional data, which is NOT relational in nature, coming in to keep record of transactions, with consistency, that is, records should be consistent across the database: Cloud Datastore/Firestore
(Firestore is the next version of Datastore, so both are essentially the same thing; use Firestore in case the data comes from mobile apps)

What is ACID? In case you have transactional data coming in, you need to ensure that the databases are ACID in nature, that is:
1) Atomicity
2) Consistency
3) Isolation
4) Durability
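To make the four properties concrete, here is a minimal transaction sketch; it uses Python's built-in sqlite3 purely for illustration (with Cloud SQL you would connect through a MySQL or PostgreSQL driver instead), and the accounts table and amounts are made up:

    import sqlite3

    # Illustrative only: sqlite3 stands in for a relational database; the
    # transaction idea is the same with Cloud SQL (MySQL/PostgreSQL).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
    conn.commit()

    try:
        # Atomicity: both updates succeed together or not at all
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
        conn.commit()          # Durability: once committed, the change survives
    except sqlite3.Error:
        conn.rollback()        # Consistency: a failed transfer leaves balances untouched

    print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 70), (2, 80)]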
You have data coming in petabytes, it is getting streamed in, relational in nature, and you will have to do analysis on top of that, or create reports with this data: BigQuery
You have high volume, structured, but non-relational data, like time series streamed in from IoT devices etc. The dataset can become huge, but you have no idea how big, you might have to handle a lot of writes and multiple updates in the database, and this data will be used for analysis: Bigtable
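A minimal sketch of writing one time-series cell to Bigtable with the Python client; the project, instance, table and column family names below are placeholders, not real resources:

    import datetime, time
    from google.cloud import bigtable
    from google.cloud.bigtable import column_family

    client = bigtable.Client(project="my-project", admin=True)   # placeholder project
    instance = client.instance("iot-instance")
    table = instance.table("sensor-readings")

    # Create the table with one column family if it does not exist yet
    if not table.exists():
        table.create(column_families={"metrics": column_family.MaxVersionsGCRule(1)})

    # Row keys for time series are usually "entity#timestamp" so related rows sort together
    row_key = f"sensor42#{int(time.time())}".encode()
    row = table.direct_row(row_key)
    row.set_cell("metrics", b"temperature", b"21.5",
                 timestamp=datetime.datetime.utcnow())
    row.commit()   # single-row writes are atomic in Bigtable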

You have a cost constraint, data getting streamed in, non-relational data, but you have just launched your web app/game, so you aren't sure how popular it will be, it might not even take off, and this is all transaction data with no analysis required; data gets stored in: Datastore (it has an excellent scale-down facility, really strong consistency in the database, ACID transactions, and daily free quota limits; see the sketch below)
Relational data coming in, streamed in, need to create reports: Get the data into BQ, create reports in Data Studio. Data Studio can NOT be linked with Bigtable
Bigtable and Cloud Spanner are very expensive, but can store huge quantities of data
Datastore is really cheap, but can't store relational data, has no link with reporting, and is not used for analytics
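A minimal Datastore sketch with the Python client, showing how a schemaless entity is written and read back; the project id, kind and properties are made-up examples:

    from google.cloud import datastore

    client = datastore.Client(project="my-project")   # placeholder project id

    # Entities are schemaless: each one can carry different properties
    key = client.key("GameScore")                      # incomplete key, id assigned on put
    entity = datastore.Entity(key=key)
    entity.update({"player": "neha", "score": 120, "level": 3})
    client.put(entity)                                 # ACID write, strongly consistent by key

    # Read it back by key (strongly consistent lookup)
    print(client.get(entity.key))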

Google Cloud Storage: Cheapest and highly available, but only for unstructured data. If you need cheap storage and are not sure about the usage of the data right now, store it here. Compliance data gets stored here
BigQuery is as cheap as Cloud Storage when it comes to data warehousing, so it is an excellent case for streaming data and batch storage data that is relational in nature, structured data which will be used for analysis and reporting
Any limitation of BQ? Yes, it can't store data where the row size is more than 10 MB
Any limitation of Cloud SQL: Max 30 TB data size; anything above that, if it's relational, transactional data, put it in Cloud Spanner. And Cloud SQL is available only at regional level
Any limitation of Cloud Spanner: It's Google's proprietary technology, data is sparsely populated, globally consistent, petabytes of capacity, but not for analysis
Cloud SQL supports both PostgreSQL and MySQL
Cloud Spanner is ideal for financial systems
Datastore has good integration with Google App Engine
Any issue with Bigtable? Can't store relational data, isn't used to store transactional data, best for time series kind of data, to analyze, and is very expensive. Data is stored sparsely
Datastore: The best part is that it has a free tier; also you are charged on the basis of the quantity of data returned, not the quantity of data queried
BQ: Charged on the basis of the quantity of data scanned (processed) by your query, not how much data is returned
BigQuery is also Google's proprietary database, columnar storage, with SQL capabilities (see the query sketch below)
Bigtable is used by Google to power its Search, Gmail, Maps etc. It supports BOTH streaming and batch data transfers
In Datastore, data is stored on the basis of key-value pairs in entities. There is no fixed schema, and every row can be of a different schema and different size
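To make the BigQuery notes above concrete, here is a minimal query sketch with the Python client; it assumes the public bigquery-public-data.samples.shakespeare table and a placeholder project id:

    from google.cloud import bigquery

    # Billing is based on bytes scanned by the query, so select only what you need
    client = bigquery.Client(project="my-project")     # placeholder project id

    query = """
        SELECT corpus, SUM(word_count) AS total_words
        FROM `bigquery-public-data.samples.shakespeare`
        GROUP BY corpus
        ORDER BY total_words DESC
        LIMIT 5
    """
    for row in client.query(query):                    # runs the job and iterates results
        print(row.corpus, row.total_words)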
Use case (one per product, in the order of the table above)
Static websites, storing images, video or audio data (unstructured data): Google Cloud Storage
OLTP, transactional relational data with strong consistency: Cloud SQL
OLTP, transactional relational data with strong consistency, with a HUGE dataset: Cloud Spanner
Transactional data with strong consistency, something that can easily scale down to 0: Datastore
Transactional data with strong consistency, something that can easily scale up: Bigtable
SQL based data warehouse: BigQuery
RAM attached to VMs, very expensive, very fast, but gets deleted when the machine restarts: Memcache

GCS: Google Cloud Storage is one of the most important and widely used services of GCP
Storage classes: Multi-Regional, Regional, Nearline, Coldline, and Archive
Multiregional: For frequently accessed data, stored in multiple regions for better availability and redundancy
Regional: For frequently accessed data, but not as highly available as multiregional
Nearline: For data which is not accessed as frequently, maybe once a month. Storage is cheaper, access is more expensive. 30 days minimum storage
Coldline: For data that is accessed once a year, for archival storage, maybe like for audits etc. Storage is cheapest, access is more expensive, 90 days minimum storage
Archival: Lowest cost for data archiving, 365 days minimum
You can store an object of size 5TB at max
Data that is accessed frequently is called hot data
GCS data is stored in buckets
Data is stored in the form of objects, and each object has its own URL
Every GCS bucket has to be stored with a GLOBALLY unique name
When the same piece of data is stored again within GCS, it replaces the earlier object
You can switch on versioning if you want to store multiple versions of data
Each version of the data is stored within the bucket and costs the same per object
You can put lifecycle management rules on the bucket, where you can specify to delete objects after a specific period of time
Lifecycle management rules are supplied to GCS as a JSON file (see the sketch below)
You can't get deleted objects back in GCS, deletion is permanent
It has strong global consistency
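A minimal sketch of the bucket operations described above (upload, versioning, a lifecycle delete rule) using the Python client; the project, bucket and object names are placeholders, and bucket names must be globally unique:

    from google.cloud import storage

    client = storage.Client(project="my-project")          # placeholder project id
    bucket = client.bucket("my-unique-media-bucket")       # placeholder bucket name

    # Upload an object; every object gets its own URL under the bucket
    blob = bucket.blob("videos/intro.mp4")
    blob.upload_from_filename("intro.mp4")

    # Turn on versioning so re-uploading the same name keeps older generations
    bucket.versioning_enabled = True
    bucket.patch()

    # Lifecycle rule: delete objects 365 days after creation
    bucket.add_lifecycle_delete_rule(age=365)
    bucket.patch()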

Refer to the image below to see the availability SLA of storage options


Products | Type | Corresponding technologies | Use case
Dataproc | Managed Hadoop offering | Hadoop, Spark | Making data processing easier and faster through processing on clusters
Dataflow | In-flight data ETL offering | Apache Beam | Serverless streaming and batch processing platform
BigQuery | Data warehouse providing SQL capabilities to query | Hive | OLAP based, highly scalable, multi-cloud data warehouse
Cloud Pub/Sub | Helps ingest streaming live data and then hand it over to other GCP tools | Kafka | Messaging and event driven service for data ingestion and processing
Cloud Datalab | Tool based on Jupyter to write your own ML algorithms | Jupyter Notebook | Integrated tool for data exploration, analysis, visualization and machine learning
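A minimal sketch of publishing a message to Cloud Pub/Sub with the Python client, to show the ingestion side of the table above; the project id, topic name and message contents are placeholders:

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "clickstream")   # placeholder names

    # Publish a message; subscribers (e.g. a Dataflow pipeline) pull it downstream
    future = publisher.publish(topic_path,
                               data=b'{"user": "neha", "action": "login"}',
                               source="web")          # attributes are plain strings
    print(future.result())                            # server-assigned message id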

NOT very important from exam perspective, just read:


Dataflow in some more detail
You have real time data streaming in, which needs to be processed in flight: Dataflow
Think about it this way: a graph is created, a graph of all the processes that will act upon the data, which will be ingested from a source and then written to a sink: Dataflow
Dataflow is based on Apache Beam
All transformations within Dataflow are done with the help of DAGs (Directed Acyclic Graphs)
Some important terms that you need to know about Dataflow: pcollection, transforms and pipeline
Think of a DAG as a flowchart; data needs to pass through the whole flowchart to be processed. The only difference is that flowcharts run linearly, whereas a DAG doesn't necessarily work linearly
A pipeline is a DAG as a whole, repeatable jobs from start to finish; look at figure one to see one pipeline representation
A transform takes one or more pcollections, performs the processing function that you provide on the elements of that pcollection, and produces the output pcollection
Driver: Defines the computation DAG
Runner: Executes the DAG on the backend
Transforms never change the input pcollection; they receive the input pcollection and make changes in the output pcollection
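A minimal runnable sketch of these terms using the Apache Beam Python SDK on the local DirectRunner (on Dataflow you would add DataflowRunner pipeline options); the sample data is made up:

    import apache_beam as beam

    # The whole "with" block is the pipeline (the DAG as a whole)
    with beam.Pipeline() as pipeline:
        events = pipeline | "Source" >> beam.Create(
            ["gcs,10", "bq,25", "gcs,5"])                        # input PCollection

        totals = (
            events
            | "Parse" >> beam.Map(lambda line: (line.split(",")[0],
                                                 int(line.split(",")[1])))
            | "SumPerKey" >> beam.CombinePerKey(sum)             # transform -> new PCollection
        )

        totals | "Sink" >> beam.Map(print)                       # write to a sink (stdout here)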

Fig1: Data Flow Pipeline


Cloud Dataproc: A managed Hadoop offering, supports Spark. It includes Hive and Pig as well
No ops required: Just create cluster, use it and then turn it off
Don't store data in HDFS since it becomes expensive
It's ideal for moving existing code in GCP
Create clusters using compute engine VMs
Need at least 1 master and 2 worker nodes
You can use preemptible machines, but not as a master node
Remember that preemptible machines have to be released within 30 seconds whenever GCP wants them back, and they will definitely be reclaimed within 24 hours
Use preemptible machines for operations, not for storing data
You can't make cluster only with preemptible machines
The minimum persistent disk that you can have is 100 GB
Initialization scripts can be on git, or GCS
Can work with Dataproc from the Command Line Interface, the GCP Console, or programmatically (see the sketch below)
You can run as root
Clusters can be easily scaled up or down, even while jobs are running

Operations of scaling are:
1) Add workers
2) Remove workers
3) Add HDFS storage
For high availability clusters, use 3 master nodes rather than 1; they run Apache ZooKeeper for automatic failover
You can also have single node clusters, with 1 node acting as both master and worker, but it can't be preemptible; it can be used for learning
Dataproc jobs do not restart on failure; you can optionally change this for streaming and long running jobs
It has connectors for both BQ and Bigtable
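A minimal sketch of creating a Dataproc cluster programmatically with the Python client, matching the 1 master + 2 workers minimum above; the project id, region and cluster name are placeholders:

    from google.cloud import dataproc_v1

    project_id, region = "my-project", "us-central1"        # placeholder values
    cluster_client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": project_id,
        "cluster_name": "exam-prep-cluster",
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
            # preemptible/secondary workers could be added here, but never as the master
        },
    }

    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    operation.result()   # blocks until the cluster is running; delete it when jobs finish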