NoSQL KK
An Overview
Dr. Kalpakis, Introduction to Data Science, Fall 2017
The need
Scaling Relational Databases
• Vertically (or up)
• Can be achieved by hardware upgrades (e.g., faster CPU, more memory, or
larger disks)
• Limited by the amount of CPU, RAM and disk that can be configured on a
single machine
Data Replication
• Replicating data across servers helps
• Avoid performance bottlenecks
• Avoid single points of failure
• Enhance scalability and availability
[Figure: a main server replicating its data to several replicated servers]
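A minimal sketch of the replication idea above, assuming a hypothetical in-memory store in which every write is applied synchronously to a main server and all live replicas, while reads are spread across the live servers. It shows why replication removes the single point of failure (reads still succeed after the main server dies). It deliberately ignores how a failed server catches up after it recovers.

```python
import random


class Server:
    """One server holding a full copy of the data (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """A main server plus replicated servers (hypothetical sketch)."""

    def __init__(self, n_replicas):
        self.main = Server("main")
        self.replicas = [Server(f"replica-{i}") for i in range(n_replicas)]

    def _live(self):
        return [s for s in [self.main] + self.replicas if s.alive]

    def write(self, key, value):
        # Synchronously apply the write to every live copy.
        for server in self._live():
            server.data[key] = value

    def read(self, key):
        # Spread reads over all live servers so no single one becomes a bottleneck;
        # if a server fails, the survivors keep answering.
        live = self._live()
        if not live:
            raise RuntimeError("no live servers")
        return random.choice(live).data.get(key)
```

For example, after `store = ReplicatedStore(2)` and `store.write("x", 1)`, setting `store.main.alive = False` still lets `store.read("x")` return `1` from a replica.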
Relational Databases & ACID properties
• Executing DB code blocks (aka transactions) ensures:
• Atomicity: either all instructions or none of them are executed
• Consistency: at the end, the database is left in a consistent state
• Isolation: the transaction is unaffected by other concurrent manipulations of the database
• Durability: upon completion, modifications to the DB are permanent
• Consistency in distributed relational databases is often ensured using the 2-phase commit protocol (2PC)
• When sharding and replicating relational databases, ensuring consistency is costly since real-life distributed systems are unreliable
• even worse when the network partitions
• A, I, and D are relatively easier to support in distributed systems
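Atomicity on a single relational database can be seen with Python's built-in `sqlite3` module. In this sketch the `accounts` table, its `CHECK` constraint and the `transfer` helper are hypothetical. The transfer credits the destination first and then debits the source; if the debit violates the constraint, the `with conn:` block rolls back the whole transaction, so the earlier credit is undone too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit would otherwise leave a partial update.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was rolled back.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

`transfer(conn, "alice", "bob", 30)` succeeds, while `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances exactly as they were.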
2-Phase Commit protocol (2PC)
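Phase I: Voting
1. VOTE_REQUEST: the coordinator asks each participant whether it can commit
2. VOTE_COMMIT: each participant answers that it is prepared to commit
[Figure: the coordinator exchanging these messages with Participant 1 (DB Server 1) and Participant 2 (DB Server 2)]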
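A minimal in-memory sketch of the whole protocol, assuming hypothetical `Participant` objects instead of real DB servers. Only the voting phase is shown on the slide; the `VOTE_ABORT` reply and the Phase II `GLOBAL_COMMIT`/`GLOBAL_ABORT` decision follow common textbook naming and are assumptions here. Timeouts, durable logging and coordinator crashes (the cases that make 2PC block) are left out.

```python
from enum import Enum


class Vote(Enum):
    VOTE_COMMIT = "VOTE_COMMIT"
    VOTE_ABORT = "VOTE_ABORT"


class Decision(Enum):
    GLOBAL_COMMIT = "GLOBAL_COMMIT"
    GLOBAL_ABORT = "GLOBAL_ABORT"


class Participant:
    """A DB server that can prepare a local transaction (hypothetical, in-memory)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def on_vote_request(self):
        # Phase I: prepare the local transaction and vote.
        if self.can_commit:
            self.state = "READY"
            return Vote.VOTE_COMMIT
        self.state = "ABORT"
        return Vote.VOTE_ABORT

    def on_decision(self, decision):
        # Phase II: apply the coordinator's global decision.
        self.state = "COMMIT" if decision is Decision.GLOBAL_COMMIT else "ABORT"


def two_phase_commit(participants):
    """Coordinator: run both phases and return the global decision."""
    # Phase I (voting): send VOTE_REQUEST to every participant and collect votes.
    votes = [p.on_vote_request() for p in participants]
    # Phase II (decision): commit only if every participant voted VOTE_COMMIT.
    if all(v is Vote.VOTE_COMMIT for v in votes):
        decision = Decision.GLOBAL_COMMIT
    else:
        decision = Decision.GLOBAL_ABORT
    for p in participants:
        p.on_decision(decision)
    return decision
```

A single `VOTE_ABORT` from either DB server is enough to abort the transaction everywhere, which is what keeps the distributed update atomic.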
The CAP Theorem
• “Of three properties of a shared data system: data consistency, system availability
and tolerance to network partitions, only two can be achieved at any given moment.”
• Conjectured by Eric Brewer (2000) and proven by Nancy Lynch and Seth Gilbert (2002)
• “CAP prohibits only a tiny part of the design space: perfect availability and consistency in
the presence of partitions, which are rare.” (Eric Brewer, 2012)
• Consistency:
• All nodes should see the same data at the same time (strict consistency)
• Availability:
• Node failures do not prevent survivors from continuing to operate
• Partition-tolerance:
• The system continues to operate despite network partitions
• Very large systems will almost certainly partition, so they must choose between C and A
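The choice between C and A during a partition can be sketched with a single hypothetical replica that has been cut off from its peers. In "CP" mode it refuses to answer rather than risk returning stale data; in "AP" mode it keeps answering from its local copy, which may be stale if the other side of the partition has accepted newer writes. The class and its modes are illustrative assumptions, not any particular database's behavior.

```python
class PartitionAwareReplica:
    """One replica of a replicated value (hypothetical, in-memory)."""

    def __init__(self, mode, value):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = value
        self.partitioned = False  # True while cut off from the other replicas

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Choose consistency: refuse to answer rather than risk a stale value.
            raise TimeoutError("replica unavailable during network partition")
        # Choose availability: answer from the local copy, which may be stale.
        return self.value
```

Once the partition heals (`partitioned = False`), both modes answer normally, which matches Brewer's point that CAP only constrains behavior while a partition is in progress.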
Various Consistency types
• Strong Consistency
• after an update completes, any subsequent access will return the updated value
• Eventual Consistency
• if no new updates are made, eventually all accesses will return the last updated value
• Read-your-writes
• Upon updating an item, a process never sees an older value
• Monotonic read consistency
• If a process has seen a particular value of an item, that process never sees an older value of it afterwards
• Monotonic write consistency
• serializes the writes by the same process
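Two of these session guarantees can be stated as checks over one process's history. This sketch assumes a hypothetical representation in which each operation is a tuple `(op, item, version)` and every write to an item creates a new, higher version number; a history violates the guarantee if the process ever reads a version older than one it already saw (monotonic reads) or older than one it wrote itself (read-your-writes).

```python
def monotonic_reads_ok(history):
    """history: (op, item, version) tuples for ONE process, in issue order.
    Monotonic reads: after reading version v of an item, the process
    never reads an older version of that item."""
    seen = {}
    for op, item, version in history:
        if op == "read":
            if item in seen and version < seen[item]:
                return False
            seen[item] = version
    return True


def read_your_writes_ok(history):
    """Read-your-writes: after writing version v of an item, the process
    never reads a version older than v."""
    written = {}
    for op, item, version in history:
        if op == "write":
            written[item] = version
        elif op == "read" and item in written and version < written[item]:
            return False
    return True
```

For example, `[("write", "x", 3), ("read", "x", 2)]` violates read-your-writes even though an eventually consistent store may legitimately return version 2 for a while.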
BASE: an antidote to ACID
• Basically Available: the system does guarantee availability
• Soft state: the state of the system may change over time, even without input
• Eventual consistency: the system will become consistent over time, provided that it receives no input during that time
• Most NoSQL databases relax ACID and adopt BASE
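One common way to reach eventual consistency is last-writer-wins merging plus anti-entropy (replicas periodically exchange their entries). The sketch below assumes hypothetical in-memory replicas, caller-supplied timestamps, and ties broken by comparing values; real systems differ in how they assign timestamps and resolve conflicts. It also illustrates "soft state": a replica's contents change during gossip with no new client input.

```python
class LWWReplica:
    """Replica storing key -> (timestamp, value); last writer wins (hypothetical)."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        self._apply(key, ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def _apply(self, key, ts, value):
        # Keep the entry with the larger timestamp; equal timestamps are
        # broken by comparing values so every replica picks the same winner.
        current = self.store.get(key)
        if current is None or (ts, value) > current:
            self.store[key] = (ts, value)

    def gossip_to(self, other):
        # Anti-entropy: push every local entry to another replica.
        for key, (ts, value) in self.store.items():
            other._apply(key, ts, value)
```

If replica `a` accepts `x = "red"` at time 1 and replica `b` accepts `x = "blue"` at time 2, the two disagree until they gossip; once every pair has exchanged entries and no new writes arrive, all replicas return `"blue"`.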
CAP and databases
Taxonomy of NoSQL (Not-only SQL) databases
• Key-Value Stores
• Look up a single value for a key
• Amazon’s DynamoDB
• Document Stores
• Access data by key or by searching the contents of “documents”
• MongoDB
• CouchDB
• Column Stores
• Column-wise storage of tabular data
• Google’s BigTable
• Facebook’s Cassandra
• Graph Stores
• Native graph storage, efficient graph algorithms
• Neo4j
• Google’s Pregel
Key-Value Stores
DynamoDB Data Model
• Partition key (mandatory): key-value access pattern; determines data distribution
• Sort key (optional): models 1:N relationships; enables rich queries
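The two key roles can be sketched with a hypothetical in-memory table; this is not the DynamoDB API. The partition key alone is hashed (MD5 here, an arbitrary choice) to pick a partition, which is what spreads data across servers. Within a partition key, items are kept ordered by an optional sort key, so one customer's orders (a 1:N relationship) can be fetched with a range query instead of a single exact lookup.

```python
import bisect
import hashlib


class KeyValueTable:
    """Hypothetical in-memory table keyed by (partition key, sort key)."""

    def __init__(self, n_partitions=4):
        self.n_partitions = n_partitions
        # One dict per partition: partition key -> (sorted sort keys, values in same order)
        self.partitions = [{} for _ in range(n_partitions)]

    def _items(self, pk):
        # Hash the partition key to choose a partition; this decides data distribution.
        index = int(hashlib.md5(pk.encode()).hexdigest(), 16) % self.n_partitions
        return self.partitions[index].setdefault(pk, ([], []))

    def put(self, pk, value, sk=""):
        keys, values = self._items(pk)
        i = bisect.bisect_left(keys, sk)
        if i < len(keys) and keys[i] == sk:
            values[i] = value  # overwrite the existing item
        else:
            keys.insert(i, sk)
            values.insert(i, value)

    def get(self, pk, sk=""):
        # Key-value access: one exact lookup by (pk, sk).
        keys, values = self._items(pk)
        i = bisect.bisect_left(keys, sk)
        return values[i] if i < len(keys) and keys[i] == sk else None

    def query(self, pk, sk_from, sk_to):
        # Range query on the sort key within one partition key (models 1:N).
        keys, values = self._items(pk)
        lo = bisect.bisect_left(keys, sk_from)
        hi = bisect.bisect_right(keys, sk_to)
        return list(zip(keys[lo:hi], values[lo:hi]))
```

With a profile stored under `("customer#1", "")` and orders under sort keys such as `"order#2017-10-15"`, `query("customer#1", "order#2017-09", "order#2017-10-31")` returns only the orders in that date range.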
Column Stores
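The "column-wise storage of tabular data" idea from the taxonomy can be sketched by pivoting hypothetical row records into one list per column. An aggregate over one attribute then reads only that column's values instead of every full row. This simplified sketch assumes all rows share the same keys; wide-column stores such as BigTable and Cassandra organize data into column families and are considerably more involved.

```python
def to_columns(rows):
    """Pivot row-oriented records (dicts sharing the same keys) into one list per column."""
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}


def column_mean(columns, name):
    # An aggregate touches only the one column it needs, not whole rows.
    values = columns[name]
    return sum(values) / len(values)
```

For three rows with ages 30, 40 and 50, `column_mean(to_columns(rows), "age")` scans the single `age` column and returns `40.0`.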
Document Stores in JSON/BSON
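Both access paths of a document store (by key, or by searching document contents) can be sketched over a few hypothetical JSON documents whose fields differ from one another. The dotted-path lookup and the "equal, or contained in an array" matching rule loosely mimic how MongoDB matches fields; `find` and `get_field` are illustrative helpers, not the MongoDB API.

```python
import json

# Hypothetical collection of JSON documents; fields may differ between documents.
raw = [
    '{"_id": 1, "name": "Ann", "courses": ["databases", "statistics"],'
    ' "address": {"city": "Baltimore"}}',
    '{"_id": 2, "name": "Bob", "address": {"city": "Columbia"}}',
    '{"_id": 3, "name": "Cy", "courses": ["databases"]}',
]
collection = {doc["_id"]: doc for doc in map(json.loads, raw)}


def get_field(doc, path):
    """Follow a dotted path such as 'address.city'; return None if any part is missing."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(collection, path, expected):
    # Search by content: match a scalar field, or membership in an array field.
    matches = []
    for doc in collection.values():
        value = get_field(doc, path)
        if value == expected or (isinstance(value, list) and expected in value):
            matches.append(doc["_id"])
    return matches
```

`collection[2]` is access by key, while `find(collection, "courses", "databases")` searches document contents and returns `[1, 3]`; document 2 simply lacks the field and is skipped.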
MongoDB Architecture
Queries
Graph Stores
Graph Stores – Neo4j
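Native graph storage keeps each node's neighbors with the node itself, so following an edge is a direct lookup rather than a join. A minimal sketch, assuming a hypothetical social graph stored as adjacency lists, runs breadth-first search to find a shortest path; it illustrates the kind of traversal graph stores make efficient, not Neo4j's own implementation or query language.

```python
from collections import deque

# Hypothetical social graph: each node stores its own list of neighbors.
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Bob", "Carol", "Eve"],
    "Eve": ["Dave"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

`shortest_path(graph, "Alice", "Eve")` returns `["Alice", "Bob", "Dave", "Eve"]`; in a relational schema, each hop would typically require another self-join on a friendships table.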
Pros/Cons of NoSQL
• Advantages:
• High elastic scalability
• Lower cost
• Schema flexibility, semi-structured data
• Disadvantages:
• No standardization
• Less mature
• Limited query capabilities
• Programming with eventual consistency is counter-intuitive
NewSQL
• A DBMS that delivers the scalability and flexibility promised by NoSQL while retaining support for SQL queries and/or ACID, or improving performance for appropriate workloads.
Matt Aslett – “How Will The Database Incumbents Respond To NoSQL And NewSQL?”
https://www.451research.com/report-short?entityId=66963