GCP Fund Module 4 Storage in The Cloud
Google Cloud Platform has many storage options that satisfy nearly every
customer use case. In this module, we turn our attention to the core storage
options: Google Cloud Storage, Google Cloud SQL, Google Cloud Spanner,
Cloud Datastore, and Google Cloud Bigtable.
Agenda
Cloud Storage
Cloud Bigtable
Cloud SQL and Cloud Spanner
Cloud Datastore
Google Cloud Storage offers developers and IT organizations durable and highly
available object storage. It assesses no minimum fee; you pay only for what you use.
Prior provisioning of capacity isn’t necessary.
What’s object storage? It’s not the same as file storage, in which you manage your
data as a hierarchy of folders. It’s not the same as block storage, in which your
operating system manages your data as chunks of disk. Instead, object storage
means this: you say to your storage, “Here, keep this arbitrary sequence of bytes,”
and the storage lets you address it with a unique key. In Google Cloud Storage and in
other systems, these unique keys are in the form of URLs, which means object
storage interacts well with web technologies.
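To make that concrete, here is a minimal Python sketch using the google-cloud-storage
client library; the bucket and object names are made up, and credentials are assumed
to be configured in the environment.

```python
# Minimal sketch: store an object and address it by its unique key
# (bucket and object names are hypothetical).
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-example-bucket")   # an existing bucket
blob = bucket.blob("photos/cat.png")          # the object's unique key

blob.upload_from_filename("cat.png")          # "keep this sequence of bytes"

# The object is addressable by a URL built from the bucket name and key, e.g.
# https://storage.googleapis.com/my-example-bucket/photos/cat.png
print(blob.public_url)
```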
Google Cloud Storage always encrypts your data on the server side, before it is
written to disk, at no additional charge. Data traveling between a customer’s device
and Google is encrypted by default using HTTPS/TLS (Transport Layer Security). In
fact, Google was the first major cloud provider to enable HTTPS/TLS by default.
Google Cloud Storage is not a file system, although it can be accessed as one via
third-party tools such as Cloud Storage FUSE. The storage objects offered by Google
Cloud Storage are “immutable,” which means that you do not edit them in place, but
instead create a new version. Google Cloud Storage’s primary use is whenever binary
large-object storage is needed: online content, backup and archiving, storage of
intermediate results in processing workflows, and more.
Offline Media Import/Export is a third-party solution that allows you to load data into
Google Cloud Storage by sending your physical media, such as hard disk drives
(HDDs), tapes, and USB flash drives, to a third-party service provider who uploads
data on your behalf. Offline Media Import/Export is helpful if you’re limited to a slow,
unreliable, or expensive internet connection.
Storage class
Your Cloud Storage files are organized into buckets. When you create a bucket:
you give it a globally-unique name; you specify a geographic location where the
bucket and its contents are stored; and you choose a default storage class.
Pick a location that minimizes latency for your users. For example, if most of
your users are in Europe, you probably want to pick a European location: a GCP
region in Europe, or else the EU multi-region.
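As a rough sketch of those three choices in code, using the google-cloud-storage
Python client (the bucket name, location, and storage class are placeholders):

```python
# Sketch: create a bucket with a globally-unique name, a location, and a
# default storage class (all values here are placeholders).
from google.cloud import storage

client = storage.Client()
bucket = storage.Bucket(client, name="my-globally-unique-bucket-name")
bucket.storage_class = "MULTI_REGIONAL"       # default storage class
client.create_bucket(bucket, location="EU")   # EU multi-region
```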
There are several ways to control users’ access to your objects and buckets.
For most purposes, Cloud IAM is sufficient. Roles are inherited from project to
bucket to object. If you need finer control, you can create access control lists
(“ACLs”). ACLs define who has access to your buckets and objects, as well as
what level of access they have. Each ACL consists of
two pieces of information: A scope, which defines who can perform the
specified actions (for example, a specific user or group of users). And a
permission, which defines what actions can be performed (for example, read
or write).
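As a hedged illustration of scope and permission, here is a small Python sketch
that grants one user read access to one object; the email address, bucket, and
object names are hypothetical.

```python
# Sketch: grant a specific user (the scope) read permission on one object.
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-example-bucket").blob("reports/q1.pdf")

acl = blob.acl
acl.user("alice@example.com").grant_read()   # scope: one user; permission: read
acl.save()                                   # persist the ACL change
```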
Remember that Cloud Storage objects are immutable. You can turn on object
versioning on your buckets if you want. If you do, Cloud Storage keeps a
history of modifications--that is, overwrites or deletes--of all objects in the
bucket. You can list the archived versions of an object, restore an object to an
older state, or permanently delete a version, as needed. If you don’t turn on
object versioning, new always overwrites old.
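Turning versioning on is a single bucket property; a minimal sketch with a
hypothetical bucket name follows.

```python
# Sketch: enable object versioning on a bucket and list archived generations.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-example-bucket")
bucket.versioning_enabled = True
bucket.patch()                                # persist the change

# Each overwrite or delete now archives the previous generation.
for blob in client.list_blobs("my-example-bucket", versions=True):
    print(blob.name, blob.generation)
```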
Cloud Storage also offers lifecycle management policies. For example, you
could tell Cloud Storage to delete objects older than 365 days, or to delete
objects created before January 1, 2013; or to keep only the 3 most recent
versions of each object in a bucket that has versioning enabled.
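For example, the 365-day rule could be attached to a bucket roughly like this
(a sketch; the bucket name is made up):

```python
# Sketch: add a lifecycle rule that deletes objects older than 365 days.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-example-bucket")
bucket.add_lifecycle_delete_rule(age=365)     # condition: age greater than 365 days
bucket.patch()                                # persist the lifecycle policy
```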
Choosing among Cloud Storage classes
The slide compares the four classes by retrieval price, total price per GB
transferred, and use cases. Use cases: Multi-Regional, content storage and
delivery; Regional, in-region analytics and transcoding; Nearline, long-tail
content and backups; Coldline, archiving and disaster recovery.
Cloud Storage lets you choose among four different types of storage classes:
Regional, Multi-regional, Nearline and Coldline. Multi-regional and Regional are
high-performance object storage, whereas Nearline and Coldline are backup and
archival storage. All of the storage classes are accessed in analogous ways using the
Cloud Storage API, and they all offer millisecond access times.
Regional Storage lets you store your data in a specific GCP region, such as
us-central1, europe-west1, or asia-east1. It’s cheaper than multi-regional storage, but it offers less
redundancy.
Multi-Regional Storage costs a bit more, but it’s geo-redundant. That means you
pick a broad geographical location, like United States, the European Union, or Asia,
and Cloud Storage stores your data in at least two geographic locations separated by
at least 160 kilometers.
The availability of these storage classes varies: Multi-Regional has the highest
availability at 99.95%, followed by Regional at 99.9%, and Nearline and Coldline at
99.0%.
As for pricing, all storage classes incur a cost per gigabyte of data stored per month,
with multi-regional having the highest storage price and coldline the lowest storage
price. Egress and data transfer charges may also apply.
In addition to those charges, Nearline storage also incurs an access fee per gigabyte
of data read, and Coldline storage incurs a higher fee per gigabyte of data read.
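If an object’s access pattern cools down over time, it can be moved to a cheaper
class by rewriting it in place; here is a hedged sketch with made-up names.

```python
# Sketch: rewrite an existing object into the Nearline storage class.
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-example-bucket").blob("logs/2017-archive.tar.gz")
blob.update_storage_class("NEARLINE")         # rewrites the object in the new class
```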
There are several ways to bring data into Cloud Storage
Regardless of which storage class you choose, there are several ways to bring data
into Cloud Storage.
Many customers simply use gsutil, the Cloud Storage command-line tool from the
Cloud SDK. You can also move data in with a drag and drop in the GCP Console, if
you use the Google Chrome browser. But what if you have to upload terabytes or
even petabytes of data? Google Cloud Platform offers the online Storage Transfer
Service and the offline Transfer Appliance to help.
The Storage Transfer Service lets you schedule and manage batch transfers to Cloud
Storage from another cloud provider, from a different Cloud Storage region, or from an
HTTP(S) endpoint.
The Transfer Appliance is a rackable, high-capacity storage server that you lease
from Google Cloud. You simply connect it to your network, load it with data, and then
ship it to an upload facility where the data is uploaded to Cloud Storage. The service
enables you to securely transfer up to a petabyte of data on a single appliance. As of
this recording, it’s still beta, and it’s not available everywhere, so check the website for
details.
Cloud Storage works with other GCP services
(Slide: BigQuery imports and exports tables; Compute Engine uses startup scripts,
images, and general object storage.)
There are other ways of getting your data into Cloud Storage, as this storage option is
tightly integrated with many of the Google Cloud Platform products and services.
For example, you can import and export tables from and to BigQuery, as well as
Cloud SQL.
You can also store App Engine logs, Cloud Datastore backups, and objects used by
App Engine applications like images. Cloud Storage can also store instance startup
scripts, Compute Engine images, and objects used by Compute Engine applications.
In short, Cloud Storage is often the ingestion point for data being moved into the
cloud, and is frequently the long-term storage location for data.
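For example, loading a table into BigQuery directly from a Cloud Storage URI might
look roughly like the following sketch; the project, dataset, table, and object
names are placeholders.

```python
# Sketch: load a CSV object from Cloud Storage into a BigQuery table.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.autodetect = True                  # infer the schema from the file

load_job = client.load_table_from_uri(
    "gs://my-example-bucket/data/orders.csv", # source object in Cloud Storage
    "my-project.my_dataset.orders",           # destination table
    job_config=job_config,
)
load_job.result()                             # wait for the load to finish
```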
Agenda
Cloud Storage
Cloud Bigtable
Cloud SQL and Cloud Spanner
Cloud Datastore
Cloud Bigtable is Google's NoSQL big data database service. It's the same
database that powers many core Google services, including Search, Analytics,
Maps, and Gmail.
Why choose Cloud Bigtable?
● Replicated storage
● Data encryption in-flight and at rest
● Role-based ACLs
● Drives major applications such as Google
Analytics and Gmail
Streaming
Data can be streamed in (written event by event)
through a variety of popular stream processing
frameworks like Cloud Dataflow Streaming, Spark
Streaming, and Storm.
Batch Processing
Data can be read from and written to Cloud Bigtable
through batch processes like Hadoop MapReduce,
Dataflow, or Spark. Often, summarized or newly
calculated data is written back to Cloud Bigtable or to a
downstream database.
As Cloud Bigtable is part of the GCP ecosystem, it can interact with other GCP
services and third-party clients.
From an application API perspective, data can be read from and written to Cloud
Bigtable through a data service layer like Managed VMs, the HBase REST Server, or
a Java Server using the HBase client. Typically this will be to serve data to
applications, dashboards, and data services.
If streaming is not an option, data can also be read from and written to Cloud Bigtable
through batch processes like Hadoop MapReduce, Dataflow, or Spark. Often,
summarized or newly calculated data is written back to Cloud Bigtable or to a
downstream database.
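As a rough sketch of a single-row write through the Cloud Bigtable client library
for Python (the instance, table, column family, and row key are hypothetical):

```python
# Sketch: write one row to a Cloud Bigtable table.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("events")

row = table.direct_row(b"device#1234#20170101")      # row key
row.set_cell("metrics", b"temperature", b"22.5")      # family, qualifier, value
row.commit()
```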
Agenda
Cloud Storage
Cloud Bigtable
Cloud SQL and Cloud Spanner
Cloud Datastore
Every Cloud SQL instance includes a network firewall, allowing you to control
network access to your database instance.
Easily scale up to 64 processor cores and more than 100 GB of RAM. Quickly
scale out with read replicas.
Automatic replication
Google Cloud SQL supports the following read replica scenarios:
● Cloud SQL instances replicating from a Cloud SQL master instance
Replicas are other instances in the same project and location as the
master instance. This feature is in Beta.
● Cloud SQL instances replicating from an external master instance
The master instance is external to Google Cloud SQL. For example, it
can be outside the Google network or in a Google Compute Engine
instance. This feature is in Beta.
● External MySQL instances replicating from a Cloud SQL master
instance
External replicas are in hosting environments, outside of Cloud SQL.
Managed backups
Cloud SQL takes care of securely storing your backed-up data and makes it
easy for you to restore from a backup and perform a point-in-time recovery to a
specific state of an instance. Cloud SQL retains up to 7 backups for each
instance, which are included in the cost of your instance.
Cloud SQL customer data is encrypted when on Google's internal networks and
when stored in database tables, temporary files, and backups.
App Engine: Cloud SQL can be used with App Engine using standard drivers. You can
configure a Cloud SQL instance to follow an App Engine application.
Compute Engine: Compute Engine instances can be authorized to access Cloud SQL
instances using an external IP address. Cloud SQL instances can be configured with a
preferred zone.
External service: Cloud SQL can be used with external applications and clients.
Standard tools can be used to administer databases. External read replicas can be
configured.
Another benefit of Cloud SQL instances is that they are accessible by other GCP
services and even external services. You can use Cloud SQL with App Engine using
standard drivers like Connector/J for Java or MySQLdb for Python.
You can authorize Compute Engine instances to access Cloud SQL instances and
configure the Cloud SQL instance to be in the same zone as your virtual machine.
Cloud SQL also supports other applications and tools that you might be used to, like
SQL Workbench, Toad and other external applications using standard MySQL drivers.
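For instance, a connection from Python uses an ordinary MySQL driver. The sketch
below uses PyMySQL rather than MySQLdb, and the IP address, credentials, and table
are made up.

```python
# Sketch: connect to a Cloud SQL (MySQL) instance with a standard driver.
import pymysql

connection = pymysql.connect(
    host="203.0.113.10",        # the Cloud SQL instance's IP address (placeholder)
    user="appuser",
    password="example-password",
    database="orders",
)
try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT id, status FROM customer_orders LIMIT 10")
        for row in cursor.fetchall():
            print(row)
finally:
    connection.close()
```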
Cloud Spanner is a horizontally scalable RDBMS
Cloud Spanner supports:
● Automatic replication
● Strong global consistency
● Managed instances with high availability
● SQL (ANSI 2011 with extensions)
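Because Cloud Spanner speaks SQL, querying it from an application looks much like
querying any relational database. Here is a hedged sketch using the Python client;
the instance, database, and table names are placeholders.

```python
# Sketch: run a SQL query against a Cloud Spanner database.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

with database.snapshot() as snapshot:
    results = snapshot.execute_sql("SELECT CustomerId, Name FROM Customers")
    for row in results:
        print(row)
```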
Cloud Datastore
The total size of Cloud Datastore databases can grow to terabytes and more.
Google Cloud Datastore: benefits
● Schemaless access
○ No need to think about underlying data
structure
● Local development tools
● Includes a free daily quota
● Access from anywhere through a RESTful
interface
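As a small illustration of schemaless access with the Python client (the kind and
property names are made up):

```python
# Sketch: store and fetch a schemaless entity in Cloud Datastore.
from google.cloud import datastore

client = datastore.Client()

key = client.key("Task")                      # incomplete key; Datastore assigns the ID
task = datastore.Entity(key=key)
task.update({"description": "Buy milk", "done": False})
client.put(task)

fetched = client.get(task.key)                # the key now carries the assigned ID
print(fetched["description"], fetched["done"])
```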
Comparing storage options: technical details
Unit size: Cloud Datastore, 1 MB per entity; Cloud Bigtable, ~10 MB per cell and
~100 MB per row; Cloud Storage, 5 TB per object; Cloud SQL, determined by the
database engine; Cloud Spanner, 10,240 MiB per row; BigQuery, 10 MB per row.
Now that we covered GCP’s core storage options, let’s compare them to help you
choose the right service for your application or workflow.
This table focuses on the technical differentiators of the storage services. Each row is
a technical specification and each column is a service. Let me cover each service
from left to right.
Consider using Cloud Datastore if you need to store structured objects, or if you
require support for transactions and SQL-like queries. This storage service provides
terabytes of capacity with a maximum unit size of 1 MB per entity.
Consider using Cloud Bigtable, if you need to store a large amount of structured
objects. Cloud Bigtable does not support SQL queries, nor does it support multi-row
transactions. This storage service provides petabytes of capacity with a maximum unit
size of 10 MB per cell and 100 MB per row.
Consider using Cloud Storage, if you need to store immutable blobs larger than 10
MB, such as large images or movies. This storage service provides petabytes of
capacity with a maximum unit size of 5 TB per object.
Consider using Cloud SQL or Cloud Spanner if you need full SQL support for an
online transaction processing system. Cloud SQL provides up to 10,230 GB,
depending on machine type, while Cloud Spanner provides petabytes. If Cloud SQL
does not fit your requirements because you need horizontal scalability, not just
through read replicas, consider using Cloud Spanner.
We didn’t cover BigQuery in this module as it sits on the edge between data storage
and data processing, but you will learn more about it in the “Big Data and Machine
Learning in the Cloud” module. The usual reason to store data in BigQuery is to use
its big data analysis and interactive querying capabilities. You would not want to use
BigQuery, for example, as the backing store for an online application.
Comparing storage options: use cases
Cloud Datastore: best for semi-structured application data and durable key-value
data; use cases include getting started and App Engine applications.
Cloud Bigtable: best for “flat” data, heavy read/write, events, and analytical data;
use cases include AdTech, financial, and IoT data.
Cloud Storage: best for structured and unstructured binary or object data; use cases
include images, large media files, and backups.
Cloud SQL: best for web frameworks and existing applications; use cases include user
credentials and customer orders.
Cloud Spanner: best for large-scale database applications (larger than ~2 TB);
use it whenever high I/O and global consistency are needed.
BigQuery: best for interactive querying and offline analytics; use cases include
data warehousing.
Considering the technical differentiators of the different storage services helps some
people decide which storage service to choose; others prefer to consider use cases. Let
me go through each service one more time.
Cloud Datastore is best for semi-structured application data that is used in App
Engine applications.
Bigtable is best for analytical data with heavy read and write events, like AdTech,
financial or IoT data.
Cloud Storage is best for structured and unstructured binary or object data, like
images, large media files and backups.
Cloud SQL is best for web frameworks and existing applications, like storing user
credentials and customer orders.
Cloud Spanner is best for large-scale database applications that are larger than 2 TB.
For example, for financial trading and e-commerce use cases.
In this lab you will create a Google Cloud Storage bucket and place an image in it.
You’ll also configure an application running in Google Compute Engine to use a
database managed by Google Cloud SQL and to reference the image in the Cloud
Storage bucket.
Lab Objectives
● Create a Cloud Storage bucket and place an image into it
● Configure an application running in Compute Engine to use a database managed by
Cloud SQL and to reference the image in the Cloud Storage bucket
More resources
Overview of Cloud Storage https://cloud.google.com/storage/
Getting started with Google Cloud SQL https://cloud.google.com/sql/docs/quickstart
Cloud Bigtable https://cloud.google.com/bigtable/docs/
Cloud Spanner https://cloud.google.com/spanner/docs/
Cloud Datastore https://cloud.google.com/datastore/docs/