Cloud DBs

Uploaded by shwetasha03
Copyright © All Rights Reserved
• Assignments > Activity #13 – Create a DevOps Pipeline

• Term Project check-in this week
  - Details and links to Calendly are now on the front page of myCourses
  - Remember: each check-in is 5% of your project grade
  - Check-in details can be found on myCourses at Content > Project > Project Check-in
• DevOps is a set of practices that combines software development (Dev) and information-technology operations (Ops); it aims to shorten the systems development life cycle and provide continuous delivery with high software quality
  - It can be thought of as a branch of Agile
  - It places extra focus on the development and operations teams collaborating closely, especially during the release, deployment, and production-support phases of the SDLC
• DevOps is a cultural shift

Source: https://www.linkedin.com/pulse/transition-devops-jasmine-scott/
• A CI/CD pipeline is the backbone of the modern DevOps environment. It bridges the gap between development and operations teams by automating the building, testing, and deployment of applications:
  - Continuous Integration (CI) = short-lived feature branches; the team merges to the master branch multiple times per day; the build and test process is fully automated; deployment is manual
  - Continuous Delivery (CD) = CI + the entire software release process is automated (it may be composed of multiple stages); deployment to production is manual
  - Continuous Deployment = CI + CD + fully automated deployment to production
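The three levels above differ only in where automation stops and a person takes over. The sketch below is a minimal illustration under stated assumptions: the stage functions (`build`, `run_tests`, `package_release`, `deploy_to_production`) and the `approve` callback are hypothetical placeholders, not part of any real CI/CD tool.

```python
from typing import Callable


def build() -> str:
    return "build ok"


def run_tests() -> str:
    return "tests ok"


def package_release() -> str:
    return "release packaged"


def deploy_to_production() -> str:
    return "deployed"


def run_pipeline(mode: str, approve: Callable[[], bool]) -> list[str]:
    """Run stages until `mode` says a human must take over.

    mode: "integration" (CI), "delivery" (continuous delivery),
          or "deployment" (continuous deployment).
    approve: hypothetical stand-in for a person approving the production deploy.
    """
    if mode not in {"integration", "delivery", "deployment"}:
        raise ValueError(f"unknown mode: {mode}")
    log = [build(), run_tests()]           # CI: build and test are always automated
    if mode == "integration":
        return log                         # releasing and deploying stay manual
    log.append(package_release())          # delivery: the release process is automated
    if mode == "delivery" and not approve():
        log.append("waiting for manual deploy to production")
        return log
    log.append(deploy_to_production())     # deployment: no human gate at all
    return log


print(run_pipeline("delivery", approve=lambda: False))
```

In "deployment" mode the approval callback is never consulted, which is the whole difference between continuous delivery and continuous deployment.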
CI/CD on AWS - Summary
• There are several tools that can help with your pipelines on AWS
• There are many more third-party tools you can choose from
Databases in the Cloud (SQL vs. NoSQL)

SWEN 514/614: Engineering Cloud Software Systems
Department of Software Engineering
Rochester Institute of Technology

Relational Databases

• Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables
• Each row in a table is a record with a unique ID called the key
• The standard user and application programming interface of a relational database is the Structured Query Language (SQL)
• If you want to use a relational (aka SQL) database on AWS, you have two options:
  1. Operate a relational database yourself on top of virtual machines, as we did in our WordPress example using MySQL
  2. Use the Amazon Relational Database Service (RDS)
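A minimal illustration of tables, keys, and SQL using Python's built-in `sqlite3` module. An in-memory SQLite database stands in here for MySQL or an RDS engine, and the `car` table is a hypothetical example; the SQL shown is standard enough to carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("""
    CREATE TABLE car (
        car_key INTEGER PRIMARY KEY,  -- the unique key identifying each row
        make    TEXT NOT NULL,
        model   TEXT NOT NULL,
        year    INTEGER
    )
""")
conn.executemany(
    "INSERT INTO car (car_key, make, model, year) VALUES (?, ?, ?, ?)",
    [(1, "Nissan", "Pathfinder", 2003), (2, "Honda", "Civic", 2005)],
)

# SQL is also how the data is read back out.
rows = conn.execute(
    "SELECT make, model FROM car WHERE year > ? ORDER BY car_key", (2004,)
).fetchall()
print(rows)  # [('Honda', 'Civic')]

# The key must be unique, so a second row with key 1 is rejected.
try:
    conn.execute("INSERT INTO car VALUES (1, 'Toyota', 'Corolla', 2010)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```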
Amazon Relational Database Service (RDS)

• Amazon RDS is a managed database service
• It is not a database itself; instead, it provides multiple database engines to choose from, including Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server
• It is also commonly referred to as Database as a Service (DBaaS)
Managed Database Service

• What does this mean for what you would manage vs. what the cloud provider (AWS) manages?
Amazon RDS - Features & Benefits

• Process automation – RDS automates administrative tasks such as database backups, software patching, failure detection, and recovery
• Scalability – RDS offers two levels of scalability features, vertical and horizontal, either of which can be applied in minutes
• Security – RDS allows you to encrypt your databases using keys you manage through AWS Key Management Service (KMS)
• High Availability – RDS offers a feature called Multi-AZ, which provides an SLA uptime of 99.95%. AWS maintains a synchronous "standby" replica of the database in another Availability Zone (AZ)
  - Because the database and its replica are kept in sync, committed data is not lost when the standby takes over
Amazon RDS – High Availability

• Amazon RDS lets you launch highly available (HA) databases
• Compared to a default database consisting of a single database instance, an HA RDS database consists of two database instances: a master and a standby database
• All clients send requests to the master database
• Data is replicated between the master and the standby database synchronously

[Figure: my-database…rds.amazonaws.com]
Amazon RDS – High Availability

• If the master database becomes unavailable due to hardware or network failures, RDS starts the failover process
• RDS automatically fails over to the standby so that database operations can resume quickly without administrative intervention
• The standby database then becomes the master database

[Figure: my-database…rds.amazonaws.com]
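The master/standby behavior above can be modeled in a few lines. This is a toy sketch, not how RDS is implemented: the `Instance` and `HAEndpoint` classes and the instance names are hypothetical. A write is applied to both copies before it returns (synchronous replication), and when the master is unhealthy the standby is promoted behind the same endpoint, so clients keep using one address.

```python
class Instance:
    """One database instance in one Availability Zone (toy model)."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}
        self.healthy = True


class HAEndpoint:
    """Clients only ever talk to this endpoint, never to an instance directly."""

    def __init__(self) -> None:
        self.master = Instance("db-az1")
        self.standby = Instance("db-az2")

    def _ensure_master(self) -> None:
        if not self.master.healthy:
            self.failover()

    def write(self, key: str, value: str) -> None:
        self._ensure_master()
        # Synchronous replication: the write lands on both copies before returning.
        self.master.data[key] = value
        self.standby.data[key] = value

    def read(self, key: str) -> str:
        self._ensure_master()
        return self.master.data[key]

    def failover(self) -> None:
        # Promote the standby; provision a fresh standby from its data.
        self.master = self.standby
        self.standby = Instance("db-az1-replacement")
        self.standby.data = dict(self.master.data)


db = HAEndpoint()
db.write("order:1", "paid")
db.master.healthy = False      # simulate a hardware or network failure
print(db.read("order:1"))      # paid (served by the promoted standby)
print(db.master.name)          # db-az2
```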
Amazon RDS – Read Replication

• A database suffering from too many read requests can be scaled horizontally by adding additional database instances for read replication
• Changes to the database are asynchronously replicated to an additional read-only database instance
• The read requests can be distributed between the master database and its read-replication databases to increase read throughput
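A sketch of read scaling with asynchronous replicas, assuming a hypothetical in-memory model (`ReplicatedDB` is not an AWS API). Writes go to the master and are queued; replicas apply the queue later, so they can briefly return stale data; reads rotate round-robin over the master and the replicas.

```python
import itertools
from collections import deque


class ReplicatedDB:
    """Toy master + read replicas with asynchronous replication."""

    def __init__(self, replica_count: int):
        self.master: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = [{} for _ in range(replica_count)]
        self.pending: deque = deque()          # changes not yet applied to replicas
        # Reads rotate over the master and every replica to spread the load.
        self._readers = itertools.cycle([self.master, *self.replicas])

    def write(self, key: str, value: str) -> None:
        self.master[key] = value              # all writes go to the master
        self.pending.append((key, value))     # shipped to replicas later

    def replicate(self) -> None:
        """Apply queued changes (a real system does this continuously in the background)."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value

    def read(self, key: str):
        return next(self._readers).get(key)


db = ReplicatedDB(replica_count=2)
db.write("price", "10")
stale = [db.read("price") for _ in range(3)]   # master, replica 1, replica 2
db.replicate()
fresh = [db.read("price") for _ in range(3)]
print(stale)   # ['10', None, None]  replicas have not caught up yet
print(fresh)   # ['10', '10', '10']
```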
Aurora vs. other DB offerings…

• Better performance
• Compatible with MySQL and PostgreSQL
• Serverless option
  - The database automatically starts up, shuts down, and scales capacity up or down based on your application's needs
• Native high availability, including "multi-master"
  - You can run multiple read/write instances of your database across multiple AZs
• Downsides
  - Can be more expensive; vendor lock-in
Amazon RDS (managed) vs. Self-hosting

• You would need considerable time and staff know-how to build a comparable relational database environment based on virtual machines
SQL Databases - Recap

• If your data is primarily structured, a SQL database is likely the right choice
• A SQL database is a great fit for transaction-oriented systems such as customer relationship management tools, accounting software, and e-commerce platforms
• SQL databases are best when you need ACID compliance:
  - Atomicity → Each transaction either succeeds completely or is fully rolled back
  - Consistency → Data written to the database must be valid according to all defined rules
  - Isolation → When transactions run concurrently, they do not interfere with each other and behave as if they were run sequentially
  - Durability → Once a transaction has been committed to the database, it is considered permanent, even in the event of a system failure
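Atomicity and consistency in practice, sketched with Python's `sqlite3` module. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception escapes; the `CHECK` constraint plays the role of a "defined rule". The `account` table and `transfer` function are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"   # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            # Violates the CHECK constraint if src would go negative.
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False   # the credit to dst above was rolled back too


def balances() -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


first = transfer("alice", "bob", 30)
second = transfer("alice", "bob", 500)
print(first, second, balances())  # True False {'alice': 70, 'bob': 30}
```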
SQL Databases - Challenges

• With SQL databases, there is overhead for complex selects, updates, and deletes:
  - Select – joining many tables to create one huge result table
  - Update – each update can affect other tables
  - Delete – must guarantee consistency of data
• They can be costly to scale
  - Generally, they scale vertically rather than horizontally
• SQL databases don't do well with unstructured data or with scaling to extremely high volumes of data
SQL Databases - Challenges

In one day:
24 million transactions processed by Walmart
100 TB of data uploaded to Facebook
175 million tweets on Twitter

How do you store, query, and process this data efficiently?

Answer: NoSQL

1 zettabyte (ZB) → 1 million petabytes → 1 billion terabytes → 1 trillion gigabytes

Source: https://www.sandfield.co.nz/news/opinion/444/part-1-data-science-so-what-is-it-exactly
What is NoSQL?

• NoSQL is an approach to databases that represents a shift away from traditional relational database management systems
• NoSQL databases do not rely on tables, columns, rows, or fixed schemas; they use more flexible data models
• NoSQL databases follow "schema on read" rather than the "schema on write" approach of SQL databases
• They are widely recognized for their ease of development, functionality, and performance at scale
• NoSQL databases use a variety of data models, including key-value, graph, column store, and document
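The schema-on-write vs. schema-on-read distinction, sketched in Python. SQLite stands in for a relational database and checks a record against the table definition when it is written; a plain list of JSON strings stands in for a NoSQL store, which accepts any shape and leaves interpretation to the reading code. The `person` table, the records, and `display_name` are hypothetical.

```python
import json
import sqlite3

# Schema on write: the structure is declared up front and enforced on INSERT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
try:
    conn.execute("INSERT INTO person (id, name, nickname) VALUES (1, 'Ada', 'A')")
    sql_accepted_unknown_field = True
except sqlite3.OperationalError:   # the table has no "nickname" column
    sql_accepted_unknown_field = False

# Schema on read: documents of any shape are stored as-is...
store = [
    json.dumps({"id": 1, "name": "Ada", "nickname": "A"}),
    json.dumps({"id": 2, "full_name": "Grace Hopper"}),
]


# ...and the reader decides how to interpret each one.
def display_name(raw: str) -> str:
    doc = json.loads(raw)
    return doc.get("name") or doc.get("full_name", "unknown")


names = [display_name(raw) for raw in store]
print(sql_accepted_unknown_field, names)  # False ['Ada', 'Grace Hopper']
```

The flexibility moves work to read time: every consumer of `store` has to cope with both shapes.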
NoSQL DBs – Key-Value

• Key-value databases are the simplest NoSQL databases
• Every item in the database is stored as an attribute name (or 'key') together with its 'value'
• A value can be any type of binary object (text, video, JSON document, etc.)
• Examples: Berkeley DB, Cassandra, and AWS DynamoDB
Key-Value DB - Example

SQL (normalized tables):

Car
| CarKey | MakeKey | ModelKey | ColorKey | Year |
| 1      | 1       | 1        | 2        | 2003 |
| 2      | 2       | 1        | 3        | 2005 |
| 3      | 2       | 1        | 2        | 2005 |

Color
| ColorKey | Color |
| 1        | Red   |
| 2        | Green |
| 3        | Blue  |

MakeModel
| ModelKey | MakeKey | Model      |
| 1        | 1       | Pathfinder |
| 1        | 2       | Bluebird   |
| 2        | 1       | Civic      |

Make
| MakeKey | Make   |
| 1       | Nissan |
| 2       | Honda  |

NoSQL (key-value):
| Key | Value                                                                 |
| 1   | Make: Nissan, Model: Pathfinder, Color: Green, Year: 2003, State: NY  |
| 2   | Make: Honda, Model: Civic, Color: Green, Year: 2005, State: Maine     |

• Question – What would I need to do to add a State value?
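One way to answer the question above, sketched in Python: in the key-value model, State is just another attribute added to the values that need it, while the relational model needs a schema change first. The dict-based store and the trimmed-down `car` table are hypothetical simplifications of the example, not DynamoDB or RDS code.

```python
import sqlite3

# Key-value: each key maps to a self-contained value (here, a dict).
cars_kv: dict[int, dict[str, object]] = {
    1: {"Make": "Nissan", "Model": "Pathfinder", "Color": "Green", "Year": 2003},
    2: {"Make": "Honda", "Model": "Civic", "Color": "Green", "Year": 2005},
}
# Adding State touches only the values that need it; no schema change.
cars_kv[1]["State"] = "NY"
cars_kv[2]["State"] = "Maine"

# Relational: the schema must change before any row can hold a State.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_key INTEGER PRIMARY KEY, year INTEGER)")
conn.execute("INSERT INTO car VALUES (1, 2003)")
conn.execute("ALTER TABLE car ADD COLUMN state TEXT")   # schema change first
conn.execute("UPDATE car SET state = 'NY' WHERE car_key = 1")

print(cars_kv[2]["State"], conn.execute("SELECT state FROM car").fetchone())
# Maine ('NY',)
```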


Key-Value - Benefits

• The simple data format makes write and read operations fast
• Both keys and values can be anything, ranging from simple objects to complex compound objects
• Key-value databases are highly partitionable and allow horizontal scaling at scales that other types of databases cannot achieve
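Why key-value data partitions so easily: every item is located by its key alone, so a hash of the key can decide which node stores it. A minimal sketch, assuming simple hash-mod-N placement over hypothetical node names; production systems typically use more elaborate schemes, such as consistent hashing.

```python
import hashlib
from typing import Optional


def node_for(key: str, nodes: list[str]) -> str:
    """Choose a node from a hash of the key (sha256 is stable across runs, unlike hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class PartitionedKV:
    def __init__(self, nodes: list[str]) -> None:
        self.nodes = nodes
        self.shards: dict[str, dict[str, str]] = {n: {} for n in nodes}

    def put(self, key: str, value: str) -> None:
        self.shards[node_for(key, self.nodes)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Each lookup touches exactly one node; no cross-node join is needed.
        return self.shards[node_for(key, self.nodes)].get(key)


kv = PartitionedKV(["node-a", "node-b", "node-c"])
for i in range(1000):
    kv.put(f"user#{i}", f"profile {i}")
print(kv.get("user#42"))                          # profile 42
print({n: len(s) for n, s in kv.shards.items()})  # roughly a third of the keys each
```

Hash-mod-N reassigns most keys whenever N changes, which is the main reason real systems prefer consistent hashing or managed partition splitting.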
NoSQL DBs – Document

• Similar to key-value databases
• Used for storing, retrieving, and managing semi-structured data
• There is no single schema: documents can contain many different key-value pairs, key-array pairs, or even nested documents
• They typically store self-describing JSON, XML, and BSON (Binary JSON) documents
• Popular fields in a document can be indexed to provide fast retrieval without knowing the key
• Examples: MongoDB, Apache CouchDB, and AWS DocumentDB
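A toy document store in Python, assuming documents are plain dicts that may nest other dicts and lists. `create_index` builds a secondary index on one field so documents can be found by that field without knowing their key. The class and its methods are hypothetical, not the MongoDB or DocumentDB API, and indexed fields are assumed to hold hashable values.

```python
from collections import defaultdict
from typing import Any


class DocumentStore:
    """Toy document store: documents are dicts; indexed fields must hold hashable values."""

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, Any]] = {}
        self.indexes: dict[str, defaultdict] = {}

    def insert(self, doc_id: str, doc: dict[str, Any]) -> None:
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():   # keep existing indexes current
            if field in doc:
                index[doc[field]].add(doc_id)

    def create_index(self, field: str) -> None:
        index: defaultdict = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        if field in self.indexes:       # indexed: go straight to the matching ids
            ids = self.indexes[field].get(value, set())
        else:                           # not indexed: check every document
            ids = {i for i, d in self.docs.items() if d.get(field) == value}
        return [self.docs[i] for i in sorted(ids)]


store = DocumentStore()
store.insert("o1", {"customer": "ann", "status": "shipped",
                    "items": [{"sku": "A1", "qty": 2}]})        # nested documents
store.insert("o2", {"customer": "bob", "status": "pending", "notes": "gift wrap"})
store.create_index("status")
store.insert("o3", {"customer": "cy", "status": "shipped"})
print([d["customer"] for d in store.find("status", "shipped")])  # ['ann', 'cy']
```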
NoSQL DBs – Graph Database

• Graph databases are purpose-built to store and navigate relationships
• Graph databases use nodes to store data entities and edges to store relationships between entities
• An edge always has a start node, end node, type, and direction; an edge can describe parent-child relationships, actions, ownership, and the like
• There is no limit to the number and kind of relationships a node can have
• Examples: Neo4j, Microsoft Cosmos DB, and Amazon Neptune
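A minimal in-memory property graph in Python, assuming nodes are string ids and each edge records a start node, end node, and type, with direction running from start to end. The `reachable` traversal (for example, "friends of friends") is a hypothetical illustration, not the Neo4j or Neptune query API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: str       # direction: start -> end
    end: str
    rel_type: str    # e.g. "FRIEND", "OWNS"


edges = [
    Edge("ann", "bob", "FRIEND"),
    Edge("bob", "cy", "FRIEND"),
    Edge("cy", "dee", "FRIEND"),
    Edge("ann", "card-1", "OWNS"),
]

# Adjacency list: each node keeps its outgoing edges, so a traversal only
# follows relationships instead of scanning all the data.
out_edges: dict[str, list[Edge]] = {}
for e in edges:
    out_edges.setdefault(e.start, []).append(e)


def reachable(start: str, rel_type: str, max_hops: int) -> set[str]:
    """Breadth-first traversal following only edges of one type, up to max_hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for e in out_edges.get(node, []):
            if e.rel_type == rel_type and e.end not in seen:
                seen.add(e.end)
                frontier.append((e.end, hops + 1))
    return seen - {start}


print(sorted(reachable("ann", "FRIEND", 2)))  # ['bob', 'cy']
```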
Graph Database – Benefits

• Can store and process large volumes of data and enable the search, discovery, and exploration of large networks
• Ideal for solutions such as fraud detection, 360° customer views, and social networks
• Query speed depends only on the number of concrete relationships traversed, not on the total amount of data
• Compliance with standards (RDF, SPARQL), no vendor lock-in
• Easy to publish/consume (e.g., knowledge graphs – more on this later)

Source: https://neo4j.com/
Relational vs. Key-Value vs. Graph Database

Source: https://www.nextplatform.com/2018/09/19/the-graph-database-poised-to-pounce-on-the-mainstream/
NoSQL DBs – Column Store

• At first glance, NoSQL column stores look similar to a traditional SQL database
• Columns are logically grouped into column families, which can contain a virtually unlimited number of columns

[Figure: Row Keys]
Column Store DBs - Example

• Row Key – Each row has a unique key, which is a unique identifier for that row
• Column – Each column contains a name (key), a value, and a timestamp:
  - Name – the name of the name/value pair
  - Value – the value of the name/value pair
  - Timestamp – the date and time that the data was inserted; this can be used to determine the most recent version of the data
• A column family consists of multiple rows (e.g., Employees)
• Each row can contain a different number of columns from the other rows. The columns don't have to match the columns in the other rows (i.e., they can have different column names, data types, etc.)
• Each column is contained within its row and doesn't span all rows as in a relational database. It contains a name/value pair, along with a timestamp.
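The row key / column / timestamp structure above, sketched in Python for a hypothetical Employees column family. Each row key maps to its own set of columns, and each column keeps timestamped versions so the newest one wins. Integer timestamps stand in for real insertion times, and the helper names are hypothetical; this mirrors the idea behind Bigtable-style stores, not any product's API.

```python
from collections import defaultdict

# row key -> column name -> list of (timestamp, value) versions
employees = defaultdict(lambda: defaultdict(list))


def put(row_key: str, column: str, value: str, timestamp: int) -> None:
    employees[row_key][column].append((timestamp, value))


def latest(row_key: str, column: str):
    """Most recent value of a column, or None if this row has no such column."""
    versions = employees.get(row_key, {}).get(column)
    return max(versions, key=lambda v: v[0])[1] if versions else None


def columns(row_key: str) -> list[str]:
    return sorted(employees.get(row_key, {}))


# Rows in the same column family need not share columns.
put("emp-1", "name", "Ann", timestamp=1)
put("emp-1", "title", "Engineer", timestamp=1)
put("emp-2", "name", "Bob", timestamp=1)
put("emp-2", "phone", "555-0100", timestamp=1)
# A later write adds a new version; the old one is kept.
put("emp-1", "title", "Senior Engineer", timestamp=2)

print(columns("emp-1"), columns("emp-2"))  # ['name', 'title'] ['name', 'phone']
print(latest("emp-1", "title"))            # Senior Engineer
print(len(employees["emp-1"]["title"]))    # 2
```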
Column Store DBs - Benefits

• Compression – Column stores are very efficient at data compression and/or partitioning
• Aggregation queries – Due to their structure, columnar databases perform particularly well with aggregation queries (such as SUM, COUNT, AVG, etc.)
• Scalability – Column store databases are very scalable. They are well suited to massively parallel processing (MPP), which involves spreading data across a large cluster of machines, often thousands of machines
• Fast to load and query – Columnar stores can be loaded extremely fast. A billion-row table could be loaded within a few seconds, and you can start querying and analyzing almost immediately
• Examples: Google Bigtable, HBase, and Cassandra
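Why aggregations and compression favor a columnar layout, shown with the same hypothetical sales data stored row-wise and column-wise in Python. In the column layout, SUM(amount) only needs the one `amount` array, and a column of repeated values compresses well with run-length encoding. This illustrates the layout idea only; it is not a benchmark.

```python
from itertools import groupby

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"region": "east", "product": "A", "amount": 10},
    {"region": "east", "product": "B", "amount": 25},
    {"region": "east", "product": "A", "amount": 5},
    {"region": "west", "product": "B", "amount": 40},
]

# Column-oriented: each field is stored as its own contiguous array.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# SUM(amount): the row layout reads whole records to reach one field;
# the column layout reads only the amount array.
row_total = sum(r["amount"] for r in rows)
col_total = sum(columns["amount"])


def run_length_encode(values):
    """Compress runs of equal adjacent values into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


print(row_total, col_total)                  # 80 80
print(run_length_encode(columns["region"]))  # [('east', 3), ('west', 1)]
```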
NoSQL DBs in AWS - DynamoDB
• DynamoDB is a key-value and document database that can deliver single-digit millisecond performance at any scale
• It's a fully managed, multi-region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications
• There are no servers to provision, patch, or manage, and no software to install, maintain, or operate. It automatically scales tables to adjust for capacity and maintains performance with zero administration
• Highly scalable: it can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second
• Supports encryption at rest
DynamoDB – Features
• Data types:
  - Single-valued – Number, String, Binary, Boolean, and Null
  - Multi-valued – String Set, Number Set, and Binary Set
  - Document – List and Map
• Supports both Query and Scan for searching the database
  - Query
    - Faster, as it searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process
  - Scan
    - Less efficient, as it scans the entire table
    - You can specify filters to refine the values returned to you, but they are applied only after the complete scan
• The class activity will look at these further
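To make the Query/Scan difference concrete, here is a small in-memory model in Python. It assumes a hypothetical table whose partition key is `customer` and whose sort key is `order_date`. `query` jumps to one partition and uses the sorted sort keys to locate a "begins with" range; `scan` reads every item and only then applies the filter. It mimics the semantics only and is not the boto3 or AWS CLI interface used in the activity.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Callable

# partition key -> items kept sorted by sort key
table: dict[str, list[dict]] = defaultdict(list)


def put_item(item: dict) -> None:
    partition = table[item["customer"]]
    partition.append(item)
    partition.sort(key=lambda i: i["order_date"])


def query(customer: str, date_prefix: str) -> tuple[list[dict], int]:
    """One partition, sort key 'begins with' condition; returns (items, items read)."""
    partition = table.get(customer, [])
    dates = [i["order_date"] for i in partition]
    start = bisect_left(dates, date_prefix)   # jump to the first possible match
    matches = []
    for item in partition[start:]:
        if not item["order_date"].startswith(date_prefix):
            break                             # sorted, so no later item can match
        matches.append(item)
    return matches, len(matches)


def scan(predicate: Callable[[dict], bool]) -> tuple[list[dict], int]:
    """Reads every item in the table; the filter is applied afterwards."""
    everything = [i for part in table.values() for i in part]
    return [i for i in everything if predicate(i)], len(everything)


for cust, date, total in [("ann", "2024-01-05", 30), ("ann", "2024-02-11", 12),
                          ("ann", "2024-02-20", 7), ("bob", "2024-02-03", 99)]:
    put_item({"customer": cust, "order_date": date, "total": total})

q_items, q_read = query("ann", "2024-02")
s_items, s_read = scan(lambda i: i["customer"] == "ann"
                       and i["order_date"].startswith("2024-02"))
print(len(q_items), q_read)  # 2 2
print(len(s_items), s_read)  # 2 4
```

Both calls return the same two orders, but the scan had to read every item in the table to find them; on a large table that difference dominates cost and latency.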
NoSQL – Recap

• NoSQL provides a high level of scalability
• Able to handle large volumes of structured, semi-structured, and unstructured data
• Schema on read (NoSQL) vs. schema on write (relational)
• It is used in distributed computing environments
• Implementation is less costly
• Programming is easy and flexible
NoSQL – Challenges

Data Consistency
• Most NoSQL databases do not perform ACID transactions to ensure that data remains consistent across the entire database as it is moved around
• NoSQL relies on the principle of "eventual consistency", which poses the risk that data on one database node may go out of sync with data on another node

Lack of Standards
• The design and query languages of NoSQL databases vary widely between different NoSQL products

Less Support
• Many NoSQL systems are open-source projects, and the companies offering support are often small and/or start-ups without the global reach, support resources, or credibility of an Oracle, Microsoft, or IBM
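A sketch of the eventual-consistency risk described above. It assumes two hypothetical replicas that exchange updates only when `sync` runs and resolve conflicts with last-writer-wins on a logical timestamp, which is one common strategy among several. Until they sync, the same key reads differently depending on which node answers.

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, tuple[int, str]] = {}   # key -> (timestamp, value)

    def write(self, key: str, value: str, ts: int) -> None:
        self.data[key] = (ts, value)

    def read(self, key: str):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a: Replica, b: Replica) -> None:
    """Exchange state; for each key the write with the higher timestamp wins."""
    for key in set(a.data) | set(b.data):
        newest = max(e for e in (a.data.get(key), b.data.get(key)) if e is not None)
        a.data[key] = b.data[key] = newest


us, eu = Replica("us-east"), Replica("eu-west")
us.write("cart:7", "1 item", ts=1)
eu.write("cart:7", "2 items", ts=2)      # concurrent update on another node

before = (us.read("cart:7"), eu.read("cart:7"))
sync(us, eu)
after = (us.read("cart:7"), eu.read("cart:7"))
print(before)  # ('1 item', '2 items')  nodes disagree
print(after)   # ('2 items', '2 items')  eventually consistent
```

Note that last-writer-wins silently discards the older update, exactly the kind of anomaly an ACID transaction would prevent.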
When to pick SQL vs. NoSQL?
SQL more ideal for:
• Transactions & Data Consistency – For anything to do with money or numbers that requires transactions, ACID, and data consistency, relational databases have a clear advantage
• Storing Relationships – Relational databases are built to store relationships. They have been tried and tested and are used by big players in the industry such as Facebook.

NoSQL more ideal for:
• Handling a Large Number of Read/Write Operations – Use NoSQL databases when you need to scale fast. They can add nodes on the fly and can handle more concurrent traffic and large amounts of data with minimal latency.
• Running Data Analytics – NoSQL databases fit best for data analytics use cases, where you have to deal with an influx of massive amounts of data

Bottom line – The choice between NoSQL and SQL depends on the complex business needs of an organization and the volume and variety of data it consumes
Source: https://www.quora.com/Why-and-when-should-I-use-NoSQL-instead-of-SQL
What about other cloud providers?

• AWS vs. Azure

Source: https://thomaslarock.com/2019/05/updated-data-services-comparison-aws-vs-azure/

• Going back to your scenarios, you've been asked to identify a NoSQL solution in your recommendation
• What type of NoSQL database(s) would you recommend, and where would they integrate into your architecture?
• You have ~10 minutes to discuss a plan
• The scenarios can be found in Assignments > Activity #7 > Cost Estimating Scenarios
DynamoDB Activity (Due next class)

• You will be creating a DynamoDB table and using the query and scan operations through the AWS command-line interface (CLI)
• Data for the database will be transferred from an S3 bucket to your DynamoDB table

Bonus

• Located in Assignments > Activity #14 – Create a DynamoDB
• There is 1 deliverable plus an opportunity for a bonus point in this activity
