NoSQL Databases

An Overview
Dr. Kalpakis, Introduction to Data Science, Fall 2017

The need

Scaling Relational Databases
• Vertically (or up)
• Can be achieved by hardware upgrades (e.g., faster CPU, more memory, or
larger disks)
• Limited by the amount of CPU, RAM and disk that can be configured on a
single machine

• Horizontally (or out)


• Can be achieved by adding more machines
• Requires database sharding and probably replication
• Limited by the Read-to-Write ratio and communication overhead
• ACID requirements constrain scalability
Data Sharding
• Data is typically sharded (or striped) to allow for parallel accesses
• Amdahl’s Law gives the ideal speedup due to sharding
• Real speedup is lower due to communication overhead and workload
imbalance
Input data: a large file, split into chunks across three machines
Machine 1: Chunk 1, Chunk 2
Machine 2: Chunk 3, Chunk 4
Machine 3: Chunk 5, Chunk 6
E.g., parallel access to chunks 1, 3 and 5
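
The ideal speedup mentioned above can be sketched numerically. This is a minimal illustration of Amdahl's Law, S(N) = 1 / ((1 - p) + p / N), assuming p is the fraction of the work that sharding lets us run in parallel and N is the number of machines. The function name and the sample values are hypothetical, and the sketch ignores communication overhead and workload imbalance, so real speedups would be lower.

```python
def amdahl_speedup(parallel_fraction: float, machines: int) -> float:
    """Ideal speedup when `parallel_fraction` of the work is spread over `machines`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / machines)


if __name__ == "__main__":
    # Illustrative assumption: 90% of the work can be parallelized.
    for n in (1, 3, 10, 100):
        print(f"{n:>3} machines -> speedup {amdahl_speedup(0.9, n):.2f}")
```

Even with many machines, the serial fraction caps the speedup: with p = 0.9 it can never exceed 10.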

Data Replication
• Replicating data across servers helps
• Avoid performance bottlenecks
• Avoid single points of failure
• Enhance scalability and availability

Main Server

Replicated Servers

Relational Databases & ACID properties
• Execution of DB code blocks (aka transactions) ensures
• Atomicity: either all instructions are executed or none of them are
• Consistency: at the end, the transaction leaves the database in a consistent state
• Isolation: each transaction is oblivious to other concurrent manipulations of the database
• Durability: upon completion, modifications to the DB are permanent
• Consistency in distributed relational databases is often achieved using the
2-phase commit protocol (2PC)
• When sharding and replicating relational databases, ensuring
consistency is costly since real-life distributed systems are unreliable
• even worse when the network partitions
• AID (atomicity, isolation, durability) are relatively easier to support in distributed systems
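2-Phase Commit protocol (2PC)
Roles: one Coordinator; Participants 1-3 run on DB Servers 1-3
Phase I: Voting
1. VOTE_REQUEST (coordinator to all participants)
2. VOTE_COMMIT (each participant to the coordinator)
Phase II: Commit
3. GLOBAL_COMMIT (coordinator to all participants)
4. LOCAL_COMMIT (each participant commits locally)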
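
The message flow above can be sketched in a few lines. This is a simplified, in-process illustration rather than a real distributed implementation: there are no timeouts, logging, or crash recovery. The class and function names are hypothetical, and the VOTE_ABORT and GLOBAL_ABORT messages are an assumption taken from the usual textbook description of 2PC; the slide only shows the commit path.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    name: str
    can_commit: bool = True   # whether this participant is able to commit
    state: str = "INIT"

    def on_vote_request(self) -> str:
        # Phase I: answer the coordinator's VOTE_REQUEST.
        self.state = "READY" if self.can_commit else "ABORT"
        return "VOTE_COMMIT" if self.can_commit else "VOTE_ABORT"

    def on_decision(self, decision: str) -> None:
        # Phase II: apply the coordinator's global decision locally (LOCAL_COMMIT).
        self.state = "COMMIT" if decision == "GLOBAL_COMMIT" else "ABORT"


def two_phase_commit(participants: List[Participant]) -> str:
    """Run both phases as the coordinator and return the global decision."""
    votes = [p.on_vote_request() for p in participants]            # steps 1 and 2
    unanimous = all(vote == "VOTE_COMMIT" for vote in votes)
    decision = "GLOBAL_COMMIT" if unanimous else "GLOBAL_ABORT"    # step 3
    for p in participants:
        p.on_decision(decision)                                    # step 4
    return decision


if __name__ == "__main__":
    servers = [Participant("DB Server 1"), Participant("DB Server 2"),
               Participant("DB Server 3")]
    print(two_phase_commit(servers), [p.state for p in servers])
```

A single participant that cannot commit forces every participant to abort, which is why the protocol becomes costly when nodes or links are unreliable.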

The CAP Theorem
• “Of three properties of a shared data system: data consistency, system availability
and tolerance to network partitions, only two can be achieved at any given moment.”
• Conjectured by Eric Brewer (2000) and proven by Nancy Lynch and Seth Gilbert (2002)
• “CAP prohibits only a tiny part of the design space: perfect availability and consistency in
the presence of partitions, which are rare.” (Eric Brewer, 2012)
• Consistency:
• All nodes should see the same data at the same time (strict consistency)
• Availability:
• Node failures do not prevent survivors from continuing to operate
• Partition-tolerance:
• The system continues to operate despite network partitions

• Very large systems will almost certainly experience partitions, so they must
choose between C and A
Various Consistency types
• Strong Consistency
• Any access after an update returns the updated value
• Eventual Consistency
• If no new updates are made, eventually all accesses will return the last updated value
• Read-your-writes
• After updating an item, that process never sees an older value of it
• Monotonic read consistency
• If a process has seen a particular value of an item, it never sees an older value
afterwards
• Monotonic write consistency
• Writes by the same process are serialized
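
The session guarantees above can be illustrated with a small sketch. It assumes a single writing client, explicit version numbers per key, and a manual `sync` step standing in for background replication. All names here (`Replica`, `Session`, `sync`, `StaleReadError`) are hypothetical and do not come from any particular database.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, Optional[str]]  # (version, value); version 0 means "never written"


class StaleReadError(Exception):
    """Raised when a replica is behind what this session has already seen."""


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, Versioned] = {}

    def apply(self, key: str, version: int, value: Optional[str]) -> None:
        # Keep only the newest version of each key.
        if version > self.read(key)[0]:
            self.data[key] = (version, value)

    def read(self, key: str) -> Versioned:
        return self.data.get(key, (0, None))


def sync(source: Replica, target: Replica) -> None:
    # Stand-in for background replication: copy newer versions to the target.
    for key, (version, value) in source.data.items():
        target.apply(key, version, value)


class Session:
    """Client session enforcing read-your-writes and monotonic reads."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.seen: Dict[str, int] = {}  # highest version written or read per key

    def write(self, key: str, value: str, replica: int) -> None:
        version = self.seen.get(key, 0) + 1  # single-writer assumption
        self.replicas[replica].apply(key, version, value)
        self.seen[key] = version

    def read(self, key: str, replica: int) -> Optional[str]:
        version, value = self.replicas[replica].read(key)
        if version < self.seen.get(key, 0):
            raise StaleReadError(f"replica {replica} is behind for {key!r}")
        self.seen[key] = version
        return value
```

In this sketch, a read from a replica that has not yet received the session's own write is rejected rather than returning an older value. Once `sync` has run and no new updates arrive, every replica returns the last written value, which is the eventual consistency case.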

BASE: an antidote to ACID
• Basically Available: the system guarantees availability, in terms of
the CAP theorem
• Soft state: the state of the system may change
over time, even without input
• Eventual consistency: the system will become
consistent over time, provided it receives no new input during that time
• Most NoSQL databases relax ACID and adopt BASE

CAP and databases

Taxonomy of NoSQL (Not-only SQL) databases
• Key-Value Stores
• Lookup a single value for a key
• Amazon’s DynamoDB
• Document Stores
• Access data by key or by search of “document” data.
• MongoDB
• CouchDB
• Column Stores
• Column-wise storage of tabular data
• Google’s BigTable
• Facebook’s Cassandra
• Graph Stores
• Native graph storage, efficient graph algorithms
• Neo4j
• Google’s Pregel
Key-Value Stores
DynamoDB Data Model
• Mandatory: key-value access pattern; determines data distribution
• Optional: models 1:N relationships; enables rich queries
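
One way a key can determine data distribution is hash partitioning, sketched below. This is a generic illustration and not DynamoDB's actual partitioning scheme or API. The function `shard_for`, the class `ShardedKeyValueStore`, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedKeyValueStore:
    """In-memory stand-in for a key-value store spread over several machines."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        # A lookup touches exactly one shard: the one the key hashes to.
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedKeyValueStore(num_shards=3)
    for user in ("alice", "bob", "carol", "dave"):
        store.put(user, f"profile of {user}")
    print(store.get("carol"), [len(shard) for shard in store.shards])
```

Because the shard is computed from the key alone, a single-key lookup never needs to consult the other shards, which is what makes the key-value access pattern easy to scale out.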

Column Stores

Document Stores

in JSON/BSON
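
As a rough illustration of accessing documents by key or by searching their contents, here is a minimal in-memory sketch using JSON-compatible dictionaries. It is not the MongoDB or CouchDB API. The class `DocumentStore`, its methods, and the sample documents are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]


class DocumentStore:
    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def insert(self, doc_id: str, document: Document) -> None:
        # Round-trip through JSON: rejects non-serializable values and stores a copy.
        self._docs[doc_id] = json.loads(json.dumps(document))

    def get(self, doc_id: str) -> Optional[Document]:
        # Access by key.
        return self._docs.get(doc_id)

    def find(self, **criteria: Any) -> List[Document]:
        # Access by search: return documents whose fields equal all given values.
        return [doc for doc in self._docs.values()
                if all(doc.get(name) == value for name, value in criteria.items())]


if __name__ == "__main__":
    store = DocumentStore()
    # Documents in the same store need not share a schema.
    store.insert("u1", {"name": "Ada", "langs": ["python", "sql"]})
    store.insert("u2", {"name": "Lin", "city": "Baltimore"})
    print(store.get("u1"))
    print(store.find(city="Baltimore"))
```

The two sample documents carry different fields, which is the schema flexibility that document stores are usually chosen for.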

MongoDB Architecture

Queries

Graph Stores
Graph Stores – neo4j


Pros/Cons of NoSQL
• Advantages:
• High elastic scalability
• Lower cost
• Schema flexibility, semi-structured data
• Disadvantages:
• No standardization
• Less mature
• Limited query capabilities
• Programming with eventual consistency is counter-intuitive

NewSQL
• A DBMS that delivers the scalability and flexibility promised by NoSQL while
retaining the support for SQL queries and/or ACID, or to improve performance for
appropriate workloads.
Matt Aslett, “How Will The Database Incumbents Respond To NoSQL And NewSQL?”
https://www.451research.com/report-short?entityId=66963

Properties      Traditional SQL   NoSQL   NewSQL
ACID            Y                 N       Y
In-memory DB    N                 Y       Y
Big Data        N                 Y       Y
RDBMS           Y                 N       Y

• NewSQL databases have
• SQL as the primary interface.
• ACID support for transactions.
• Non-locking concurrency control.
• High per-node performance.
• Parallel, shared-nothing architecture.
Michael Stonebraker, “New SQL: An Alternative to NoSQL and Old SQL for New OLTP Apps”
http://cacm.acm.org/blogs/blog-cacm/109710
