GCP Storage

This document provides information on various Google Cloud Platform (GCP) storage options including object storage, databases, structured vs unstructured data, and NoSQL databases. It discusses key aspects of Cloud Storage, Cloud SQL, Cloud Spanner, and Cloud Datastore. The main priorities for database users on GCP are to help migrate existing databases to the cloud and select the right service, usually Cloud SQL for MySQL or Postgres workloads. GCP offers both relational and non-relational storage options to suit different data and application needs.

Uploaded by

Paresh Bapat
Copyright
© All Rights Reserved

Module 4 - Storage

ACID refers to the four key properties of a transaction: atomicity, consistency,
isolation, and durability.

1. Storage options in GCP:

GCP offers many different storage options from object storage to database services.
These options help users save costs, reduce the time it takes to launch, and make the
most of their datasets by being able to analyze a wide variety of data.

2. Google’s priorities for users with a database?

For users with databases, Google’s first priority is to help them migrate existing
databases to the cloud and move them to the right service. This will usually be users
moving MySQL or Postgres workloads to Cloud SQL. The second priority is to help
users innovate, build or rebuild for the cloud, take advantage of mobile, and plan for
future growth.

a. Migrate existing databases to the cloud, and move them to the right service.
b. Innovate, build, or rebuild for the cloud, take advantage of mobile, and plan for
future growth.
3. Structured vs unstructured data?

Structured data is what most people are used to working with and typically fits within
columns and rows in spreadsheets or relational databases. You can expect this type of
data to be organized and clearly defined, and usually easy to capture, access and
analyze. Examples of structured data would include names, addresses, contact
numbers, dates and billing info. The benefit of structured data is that it can be
understood by programming languages and data can be manipulated relatively quickly.

It’s estimated that around 80 percent of all data is unstructured. It’s far more difficult to
process or analyze unstructured data using traditional methods as there’s no internal
identifier to enable search functions to identify it. Unstructured data often includes text
and multimedia content, for example, e-mail messages, documents, photos, videos,
presentations, webpages, and so on. Organizations are focusing increasingly on mining
unstructured data for insights that will provide them with a competitive edge.

4. Choice of storage according to need?

5. Cloud storage notes:


Cloud Storage is one of many storage options on GCP. It stores and serves object
data, also known as blob data. Gmail and Google Photos use it. You can store an unlimited
number of objects, up to 5 terabytes in size each, and Cloud Storage is well suited to
binary or object data such as images, media serving, and backups.

Classes of Storage in cloud storage:


Buckets contain objects which can be accessed by their own methods. In
addition to the acl property, buckets contain bucketAccessControls, for use in
fine-grained manipulation of an existing bucket's access controls.

A bucket is always owned by the project team owners group.

Cloud Storage files are organized into buckets. When you create a bucket, you give it a
globally unique name, you specify a geographic location where the bucket and its
contents are stored, and you choose a default storage class for it.

There are several ways to control users’ access to your objects and buckets. For most
purposes, Cloud IAM is sufficient. Roles are inherited from project to bucket to object. If
you need finer control, you can create access control lists. ACLs define who has access
to your buckets and objects, as well as what level of access they have. Each ACL
consists of two pieces of information: A scope, which defines who can perform the
specified actions and a permission, which defines what actions can be performed, for
example, read or write.
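
The scope-plus-permission idea can be sketched in a few lines of Python. This is only an in-memory illustration, not the Cloud Storage API: the AclEntry class, the is_allowed helper, and the example identities are hypothetical names. It assumes the three permission levels READER, WRITER, and OWNER, where each level includes the ones below it.

```python
from dataclasses import dataclass

# Permission levels are concentric: OWNER includes WRITER, which includes READER.
_LEVEL = {"READER": 1, "WRITER": 2, "OWNER": 3}


@dataclass(frozen=True)
class AclEntry:
    scope: str       # who the entry applies to, e.g. "allUsers" or "user-alice@example.com"
    permission: str  # what they may do: "READER", "WRITER" or "OWNER"


def is_allowed(acl, identity, needed):
    """Return True if some entry covering `identity` grants at least `needed`."""
    for entry in acl:
        covers = entry.scope in ("allUsers", identity)
        if covers and _LEVEL[entry.permission] >= _LEVEL[needed]:
            return True
    return False


# Hypothetical ACL: anyone may read, one named user may also write.
bucket_acl = [
    AclEntry("allUsers", "READER"),
    AclEntry("user-alice@example.com", "WRITER"),
]

print(is_allowed(bucket_acl, "user-alice@example.com", "WRITER"))  # True
print(is_allowed(bucket_acl, "user-bob@example.com", "WRITER"))    # False
print(is_allowed(bucket_acl, "user-bob@example.com", "READER"))    # True
```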

You can turn on object versioning on your buckets if you want. If you do, Cloud Storage
keeps a history of modifications (that is, overwrites or deletes) of all objects in the
bucket. You can list the archived versions of an object, restore an object to an older
state, or permanently delete a version, as needed. If you don’t turn on object versioning,
new always overwrites old.

Cloud Storage also offers lifecycle management policies. For example, you could tell
Cloud Storage to delete objects older than 365 days, or to delete objects created before
January 1, 2013; or to keep only the 3 most recent versions of each object in a bucket
that has versioning enabled.
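
The three example policies can be written down as lifecycle rules. The sketch below builds them as plain Python dictionaries in the rule/action/condition layout that a lifecycle configuration file is assumed to use here. The helper names delete_when and lifecycle_config are made up for illustration, so check the condition keys against the current Cloud Storage documentation before applying a configuration to a real bucket.

```python
import json


def delete_when(**condition):
    """Build one lifecycle rule that deletes objects matching `condition`."""
    return {"action": {"type": "Delete"}, "condition": condition}


def lifecycle_config(*rules):
    """Wrap one or more rules in the top-level lifecycle document."""
    return {"rule": list(rules)}


# Delete objects older than 365 days.
older_than_a_year = delete_when(age=365)

# Delete objects created before January 1, 2013.
created_before_2013 = delete_when(createdBefore="2013-01-01")

# In a versioned bucket, delete any version that has 3 or more newer versions,
# which leaves only the 3 most recent versions of each object.
keep_three_versions = delete_when(numNewerVersions=3)

# Each rule is independent; a bucket would normally use only the ones it needs.
print(json.dumps(lifecycle_config(keep_three_versions), indent=2))
```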

6. What is a database?

A database is a collection of information that is organized so that it can easily be
accessed and managed. Users build software applications on top of databases to
support business tasks, such as buying a ticket, filing an expense report, storing a
photo, or storing medical records.

GCP offers two managed relational (SQL) database services: Cloud SQL and Cloud Spanner.

7. Cloud SQL notes:

Cloud SQL is a fully managed, relational database service that makes it easy to set up,
maintain, manage, and administer MySQL, PostgreSQL, and SQL Server databases in the
cloud. This allows you to focus on your applications. Cloud SQL is well suited to WordPress
sites, ecommerce applications, CRM tools, geospatial applications, and any other
application that is compatible with MySQL, PostgreSQL, or SQL Server.

8. Cloud Spanner notes:


So how does Cloud Spanner work?

Data is automatically copied across regions using synchronous replication, meaning a
write is committed only once a majority of replicas have accepted it. As a result, queries
always return consistent and ordered answers regardless of the region. Google uses
replication within and across regions to achieve availability, so if one region goes
offline, a user’s data can still be served from another region.

As with Cloud SQL, Cloud Spanner is aligned to relational database requirements. The
key difference is that Cloud Spanner combines the benefits of relational database
structure with non-relational horizontal scale. Vertical scaling is where you make a single
instance larger or smaller. Horizontal scaling is when you scale by adding and removing
servers. What makes Cloud Spanner unique is the fact that it is a relational database
that scales horizontally. Cloud Spanner users are often in the advertising, finance, and
marketing technology industries, where the need exists to manage end user metadata.

The difference between Cloud Spanner and other databases:

Familiar relational database structure:

Most databases today require making trade-offs between scale and consistency. With
Cloud Spanner, users get the best of relational database structure and non-relational
database scale and performance with strong external consistency across rows, regions,
and continents.

Scales to very large databases:

What this means is that Cloud Spanner can scale to very large database sizes while still
giving IT and developers the familiarity they are used to with other relational databases
such as MySQL, PostgreSQL, or proprietary databases.

Strong external consistency:

Cloud Spanner is strongly consistent. Data added or updated from any location is
immediately available, regardless of the location from which it is accessed.

Reduces operational overheads:

Cloud Spanner also dramatically reduces the operational overhead needed to keep the
database online and serving traffic. Users often move to Cloud Spanner from sharded
MySQL deployments and expensive proprietary solutions.

SCALE + SQL

Cloud Spanner scales horizontally and serves data with low latency while maintaining
transactional consistency and up to 99.999% availability, which works out to roughly
5 minutes of downtime per year. Cloud Spanner can scale to arbitrarily large database
sizes to help avoid rewrites and migrations. The use of multiple databases or sharded
databases as an alternative solution introduces unnecessary complexity and cost.
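
The downtime implied by an availability percentage is simple arithmetic: multiply the minutes in a year by the unavailable fraction. A short sketch follows; the function name max_downtime_minutes is just an illustrative choice, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent):
    """Yearly downtime budget, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


print(round(max_downtime_minutes(99.999), 2))  # 5.26
print(round(max_downtime_minutes(99.99), 2))   # 52.56
```

So "five nines" allows a little over 5 minutes of downtime per year, while "four nines" allows about 53 minutes.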

FULLY MANAGED

Cloud Spanner allows you to create or scale a globally replicated database for
mission-critical apps through a handful of clicks. Synchronous replication and
maintenance are also automatic and built-in.

LAUNCH FASTER

Cloud Spanner is a relational database with full relational semantics, ACID transactions,
and handles schema changes as an online operation with no planned downtime. You
can reuse existing SQL skills to query data in Cloud Spanner using familiar, industry
standard ANSI SQL 2011.

ENTERPRISE GRADE SECURITY

Enterprise grade security includes data-layer encryption by default in transit and at rest,
granular identity and access management, and audit logging.

9. NoSQL (non-relational) options:

Google offers two managed NoSQL database options. Cloud Datastore is a fully
managed, serverless NoSQL document store that supports ACID transactions. Cloud
Bigtable is a petabyte scale, sparse wide column NoSQL database that offers extremely
low write latency.

10. Cloud Datastore notes:


Cloud Datastore is a highly-scalable NoSQL database that’s ideal for rapid and flexible
web and mobile development. Cloud Datastore is a schemaless database, which means
it doesn’t rely on a schema the way a relational database does. Cloud Datastore is
therefore ideal if you have nonrelational data and want a serverless database without
having to worry about nodes or cluster management. Cloud Datastore isn’t a full SQL
database though, and isn’t an effective storage solution for data being used for analysis.
Cloud Datastore allows you to change your data structure as your application evolves,
so there’s no need to perfect your data model at the beginning of your project. With
NoSQL, storing new properties in your data requires no schema changes.

Cloud Datastore serves high-speed queries no matter how big your database, to ensure
your applications maintain high performance. Cloud Datastore uses Google Query
Language (GQL), which, because it's a query language in a SQL-like syntax format, is
both familiar and easy to learn. Complex queries are enabled with secondary indexes (called
built-in indexes in Cloud Datastore) and composite indexes. It automatically scales to support
millions of API requests per second and 100s of terabytes of data, so no configuration or
capacity planning is needed.

It’s fully managed by Google so you can instantly provision a scalable and available
NoSQL database, without the hassle of spinning up virtual machines and maintaining
databases. Cloud Datastore automatically handles sharding and replication across
multiple datacenters to provide a database that is highly available and durable. This
allows users to focus on application development.

With the RESTful interface of Cloud Datastore, data can easily be accessed by any
deployment target. You can build solutions that run on App Engine, GKE, and Compute
Engine and use Cloud Datastore as their integration point. Cloud Datastore
automatically encrypts all data before it is written to disk and automatically decrypts the
data when read by an authorized user.

Examples of Cloud Datastore use cases:
1 User profiles: Cloud Datastore is the best choice for shifting data requirements
without needing downtime, such as user game profiles where flexibility enables rapid
development of new features. It’s also ideal for storing user profiles to deliver a
customized experience based on the user’s past activities and preferences.

2 Product catalogues: Cloud Datastore enables true data hierarchy through ancestor
paths, which means related data can be strongly grouped together, making it exceptional
for tasks such as storing product reviews.

3 Recording transactions: Cloud Datastore is well suited for recording transactions
based on ACID properties, for example, transferring funds from one bank account to
another.

4 Mobile games: For mobile games, Cloud Datastore provides a durable key-value store
which allows player data to be efficiently stored and accessed. Its scalability
accommodates the growth of games, whether there are 10 players or 100 million.
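
The funds-transfer use case above relies on atomicity: either both account updates happen or neither does. The sketch below shows that all-or-nothing behaviour with a plain Python dictionary. It does not use the Cloud Datastore client library, and the transfer function, the InsufficientFunds exception, and the account names are hypothetical.

```python
import copy


class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer."""


def transfer(accounts, src, dst, amount):
    """Move `amount` from `src` to `dst`, applying both updates or neither."""
    working = copy.deepcopy(accounts)  # stage the changes on a private copy
    working[src] -= amount
    working[dst] += amount
    if working[src] < 0:
        # Abort: `accounts` has not been modified at this point.
        raise InsufficientFunds(src)
    accounts.update(working)           # "commit": publish both changes together


accounts = {"alice": 100, "bob": 20}
transfer(accounts, "alice", "bob", 30)
print(accounts)  # {'alice': 70, 'bob': 50}

try:
    transfer(accounts, "alice", "bob", 500)
except InsufficientFunds:
    print(accounts)  # unchanged: {'alice': 70, 'bob': 50}
```

A real database also has to handle concurrent transactions and crashes in the middle of a commit, which this single-process sketch ignores.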

11. Cloud Bigtable notes:


Cloud Bigtable aligns with non-relational database requirements and is a
high-performance NoSQL database service for large analytical and throughput-intensive
operational workloads. It’s designed for very large amounts of data and is great for IoT,
user analytics, financial data analysis, time series data, and graph data. Cloud Bigtable
is also an option if support isn’t required for ACID transactions or if the data isn’t highly
structured. It is used by Google Analytics, Maps, and Gmail.

Bigtable was an internal Google database system that was so revolutionary it kickstarted
the NoSQL industry. Google wanted to build a database that could deliver real-time
access to petabytes of data. The result was Bigtable, and in 2006 Google released a
research paper describing it. The paper was later recognized as one of the most influential
of the previous decade. It gave people outside of Google ideas that led to the
creation of popular NoSQL databases. In 2015, Cloud Bigtable was made available as a
service users could use for their own applications.

FAST AND PERFORMANT

Because Cloud Bigtable offers high performance under high load, large apps and
workflows are faster, more reliable, and more efficient running on Cloud Bigtable. Cloud
Bigtable is ideal for storing large amounts of data with very low latency.

SEAMLESS SCALING AND REPLICATION


Databases can automatically and seamlessly scale to billions of rows and thousands of
columns, allowing you to store petabytes of data. Changes to the deployment
configuration are immediate, so there’s no downtime during reconfiguration. Replication
adds high availability for live serving apps, and workload isolation for serving versus
analytics.

FULLY MANAGED

Because Bigtable is a fully managed service, there’s no need to worry about configuring
and tuning your database for performance or scalability. Google also creates data
backups to protect against catastrophic events and allow for disaster recovery.

INTEGRATED AND SECURE

You can use Cloud Bigtable for a range of applications, from real-time ad analytics all the
way to tracking millions of readings from thousands of IoT sensors. Because Cloud
Bigtable is compatible with industry standard tools like HBase and Hadoop, as well as with
BigQuery and Cloud Dataflow, it's easy to put all that data to work for your app. In terms of
security, all data in Cloud Bigtable is encrypted both in flight and at rest, while access to
Cloud Bigtable data is easily controlled through IAM permissions.

Cloud Bigtable structure:


Processing, which is done through a front-end server pool and nodes, is handled
separately from the storage. A Cloud Bigtable table is sharded into blocks
of contiguous rows, called tablets, to help balance the workload of queries.
Tablets are similar to HBase regions. Tablets are stored on Colossus,
which is Google's file system, in a Sorted Strings Table, or SSTable,
format. An SSTable provides a persistent, ordered immutable map from
keys to values, where both keys and values are arbitrary byte strings.
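
The idea of sharding a table into blocks of contiguous rows can be sketched with a sorted list of row keys. This is a toy model only: the real service splits and rebalances tablets based on load and size rather than a fixed row count, and the function names and "sensor#" keys here are made up for illustration.

```python
from bisect import bisect_right


def split_into_tablets(sorted_keys, rows_per_tablet):
    """Cut a sorted list of row keys into contiguous blocks ("tablets")."""
    return [
        sorted_keys[i:i + rows_per_tablet]
        for i in range(0, len(sorted_keys), rows_per_tablet)
    ]


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose key range would hold `row_key`."""
    start_keys = [tablet[0] for tablet in tablets]
    # Keys are sorted, so a binary search over the first key of each tablet suffices.
    return max(bisect_right(start_keys, row_key) - 1, 0)


keys = sorted(f"sensor#{i:03d}" for i in range(10))
tablets = split_into_tablets(keys, 4)

print(len(tablets))                        # 3
print(find_tablet(tablets, "sensor#005"))  # 1
```

Because rows are stored in key order, a lookup only needs to find the one tablet whose range covers the key, and a scan over neighbouring keys usually stays within a single tablet.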

Charts show that as the required queries per second increase, the number of
nodes required increases too. Throughput scales linearly: each node you add
increases throughput by a roughly constant amount, up to hundreds of
nodes.
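
Linear scaling makes capacity planning a matter of division. The sketch below estimates the node count for a target query rate. The per-node figure of 10,000 queries per second is a hypothetical placeholder, not a published number, and nodes_needed is an illustrative helper name.

```python
import math

ASSUMED_QPS_PER_NODE = 10_000  # hypothetical per-node throughput, for illustration only


def nodes_needed(target_qps, qps_per_node=ASSUMED_QPS_PER_NODE):
    """Estimate nodes required if throughput grows linearly with node count."""
    return max(1, math.ceil(target_qps / qps_per_node))


for qps in (10_000, 50_000, 250_000):
    print(qps, nodes_needed(qps))
# 10000 1
# 50000 5
# 250000 25
```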
