Lecture 3: Principles of NoSQL Databases
source: Holubová, Kosek, Minařík, Novák. Big Data a NoSQL databáze. 2015.
Example (2): Relational Model
Agenda
● Fundamentals of RDBMSs and NoSQL Databases
● Data Model of Aggregates
● Models of Data Distribution
○ scalability, sharding
○ replication: master-slave, peer-to-peer
○ combination
● Consistency
○ write-write vs. read-write conflict
○ strategies and techniques
○ relaxing consistency
Aggregates
An aggregate
● A data unit with a complex structure
○ Not simply a tuple (a table row) as in an RDBMS
● A collection of related objects treated as a unit
○ unit for data manipulation and management of consistency
source: Holubová, Kosek, Minařík, Novák. Big Data a NoSQL databáze. 2015.
Example (4): Aggregates
// collection "Customer"
{
  "customerID": 1,
  "name": "Jan Novák",
  "address": {
    "city": "Praha",
    "street": "Krásná 5",
    "ZIP": "111 00"
  }
}

// collection "Order"
{
  "orderNumber": 11,
  "date": "2015-04-01",
  "customerID": 1,
  "orderItems": [
    {
      "productID": 111,
      "name": "Vysavač ETA E1490",
      "quantity": 1,
      "price": 1300
    },
    {
      "productID": 112,
      "name": "Sáček k ETA E1490",
      "quantity": 10,
      "price": 300
    }
  ]
}

// collection "Invoice"
{
  "invoiceID": 2015003,
  "orderNumber": 11,
  "bankAccount": "64640439/0100",
  "paymentDate": "2015-04-16",
  "address": {
    "city": "Brno",
    "street": "Slunečná 7",
    "ZIP": "602 00"
  }
}
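As a rough illustration of "treated as a unit", the Order aggregate from the example can be held as one nested value and fetched with a single key lookup. The in-memory dictionary `orders` and the helper `order_total` are hypothetical names used only for this sketch, and `price` is assumed to be a unit price.

```python
# Hypothetical in-memory "collection": order number -> whole Order aggregate.
orders = {
    11: {
        "orderNumber": 11,
        "date": "2015-04-01",
        "customerID": 1,
        "orderItems": [
            {"productID": 111, "name": "Vysavač ETA E1490", "quantity": 1, "price": 1300},
            {"productID": 112, "name": "Sáček k ETA E1490", "quantity": 10, "price": 300},
        ],
    }
}


def order_total(order):
    """Sum quantity * price over the items nested inside one aggregate.

    Assumption: "price" is the price of a single unit.
    """
    return sum(item["quantity"] * item["price"] for item in order["orderItems"])


order = orders[11]          # one key lookup returns the complete unit
total = order_total(order)  # no join against a separate items table is needed
```

In a relational schema the order items would typically live in their own table and be joined in; here they travel with the order.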
NoSQL Databases: Aggregate-oriented
Many NoSQL stores are aggregate-oriented:
○ There is no general strategy to set aggregate boundaries
○ Aggregates give the database information about which bits of data will be manipulated together
■ What should be stored on the same node
● Traditional choice
○ in favour of strong consistency
○ very simple to realize (no handling of data distribution)
source: https://blogs.oracle.com/jag/resource/Fallacies.html
Distribution Models: Overview
● Horizontal scalability = scaling out
● Two generic ways of data distribution:
○ Replication – the same data is copied over multiple nodes
■ Master-slave or peer-to-peer
○ Sharding – different data chunks are put on different nodes
(data partitioning)
● Applicability: different clients access different parts of the dataset
● Reads from any node
source: Sadalage & Fowler: NoSQL Distilled, 2012
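The difference between sharding and replication can be sketched in a few lines. This is only one possible placement scheme (hash of the key modulo the number of nodes, with replicas on the following nodes); the node names and the functions `shard_for` and `replicas_for` are assumptions made for the example, not something prescribed by the slides.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def shard_for(key, nodes=NODES):
    """Sharding: each key is owned by exactly one node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def replicas_for(key, nodes=NODES, factor=2):
    """Replication on top of sharding: the owner plus the next factor - 1 nodes."""
    start = nodes.index(shard_for(key, nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(factor)]


owner = shard_for("customer:1")
copies = replicas_for("customer:1", factor=2)
```

Using the aggregate key (for example a customer ID) as the sharding key keeps data that is manipulated together on the same node.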
Master-slave Replication (2)
● For scaling a read-intensive application
○ More read requests → more slave nodes
○ The master fails → the slaves can still handle read requests
○ A slave can become a new master quickly (it is a replica)
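The behaviour listed above can be illustrated with a small in-memory model. It is a sketch under simplifying assumptions: replication is synchronous, reads are spread round-robin over live slaves, and failover is a manual `promote()` call. The class and method names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._reads = 0

    def write(self, key, value):
        # All writes go through the master.
        if not self.master.alive:
            raise RuntimeError("no master available for writes")
        self.master.data[key] = value
        for slave in self.slaves:  # assumption: synchronous replication
            if slave.alive:
                slave.data[key] = value

    def read(self, key):
        # Reads are served by slaves, so adding slaves adds read capacity.
        live = [s for s in self.slaves if s.alive]
        if not live:
            raise RuntimeError("no slave available for reads")
        node = live[self._reads % len(live)]
        self._reads += 1
        return node.data.get(key)

    def promote(self):
        # A live slave already holds a replica, so it can take over as master.
        new_master = next(s for s in self.slaves if s.alive)
        self.slaves.remove(new_master)
        self.master = new_master
        return new_master


cluster = MasterSlaveCluster(Node("m"), [Node("s1"), Node("s2")])
cluster.write("k", "A")
cluster.master.alive = False           # the master fails
read_after_failure = cluster.read("k")  # slaves still answer reads
try:
    cluster.write("k", "B")
    write_rejected = False
except RuntimeError:
    write_rejected = True               # writes are unavailable until failover
cluster.promote()
cluster.write("k", "B")
read_after_failover = cluster.read("k")
```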
● Example:
○ A single database instance is always consistent
○ If the replication factor > 1, the system must handle the
writes and/or reads in a special way
CAP Theorem (2)
Availability
● If a node (server) is working, it can read and write data
○ Every request must result in a response
Partition Tolerance
● System continues to operate, even if two sets of servers
get isolated
○ A connection failure should not shut the system down
Nodes have replicated data, two write attempts
[Figure: node 1 and node 2 receiving Write(key, A) and Write(key, B)]
● Strong consistency: agreement
○ Before the write is committed, both nodes have to agree on the order of the writes
● In case of partitioning, the master can commit a write
○ Losing some consistency: data on the slave will be stale for reads
[Figure: master and slave separated by a partition; Write(key, A) is answered (OK), Write(key, B) waits forever]
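The partitioned master-slave case can be modelled roughly as follows: the master keeps committing writes, the slave cannot receive them, and reads from the slave return stale data until the link is restored. `MasterSlavePair` and its methods are hypothetical names for this sketch.

```python
class MasterSlavePair:
    """Hypothetical two-node setup: one master, one slave replica."""

    def __init__(self):
        self.master = {}
        self.slave = {}
        self.partitioned = False

    def write(self, key, value):
        # Writes always go to the master, which commits locally.
        self.master[key] = value
        # Replication reaches the slave only while the link is up.
        if not self.partitioned:
            self.slave[key] = value

    def read_from_slave(self, key):
        return self.slave.get(key)

    def heal(self):
        # Once the partition ends, the slave catches up with the master.
        self.partitioned = False
        self.slave.update(self.master)


pair = MasterSlavePair()
pair.write("key", "A")               # replicated to the slave
pair.partitioned = True
pair.write("key", "B")               # committed on the master only
stale = pair.read_from_slave("key")  # the slave still holds the old value
pair.heal()
fresh = pair.read_from_slave("key")  # the slave has caught up
```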
Write (Update) Consistency in NoSQL When Data is Replicated Peer to Peer
● Choosing Availability:
○ Peer-to-peer replication
○ Eventual consistency
● In case of Partitioning
○ All requests are answered (full Availability)
○ We risk losing consistency guarantees completely
[Figure: peer 1 and peer 2, before and during a partition, each receiving Write(key, A) and Write(key, B)]
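When both peers accept writes during a partition, the conflict has to be noticed after the partition heals. One common way to do that, not stated on the slides and used here only as an illustration, is a version vector per key: a copy is newer only if its counters are at least as large as the other copy's for every peer. The names `Peer`, `dominates` and `reconcile` are hypothetical.

```python
def dominates(a, b):
    """True if version vector a has seen every write recorded in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


class Peer:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (value, version vector)

    def write(self, key, value):
        # Every peer accepts writes locally (availability is preserved).
        _, vector = self.store.get(key, (None, {}))
        vector = dict(vector)
        vector[self.name] = vector.get(self.name, 0) + 1
        self.store[key] = (value, vector)


def reconcile(p1, p2, key):
    """Exchange copies after a partition heals; report a conflict if neither is newer."""
    v1, vec1 = p1.store[key]
    v2, vec2 = p2.store[key]
    if dominates(vec1, vec2):
        p2.store[key] = (v1, dict(vec1))
        return "in sync"
    if dominates(vec2, vec1):
        p1.store[key] = (v2, dict(vec2))
        return "in sync"
    return "conflict"  # concurrent writes: the application must resolve them


peer1, peer2 = Peer("peer1"), Peer("peer2")
peer1.write("key", "A")   # accepted on one side of the partition
peer2.write("key", "B")   # accepted on the other side
outcome = reconcile(peer1, peer2, "key")
```

Detection is only half of the story; how the two values are merged is an application decision.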
Example:
● Read quorum: R = 2 (R + W > N)
● 2 nodes contacted for read => the newest data returned
[Figure: peer 1, peer 2 and peer 3; Write(key, A) and Write(key, B) arrive at the peers, followed by Read(key)]
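The quorum condition can be checked with a small simulation. The slide gives R = 2; the values N = 3 and W = 2 are assumptions chosen so that R + W > N holds, and the explicit version numbers stand in for whatever ordering mechanism a real store uses. Function names are hypothetical.

```python
from itertools import combinations

N, W, R = 3, 2, 2          # assumption: N = 3 and W = 2; the slide only states R = 2
assert R + W > N           # every read set overlaps every write set

replicas = [dict() for _ in range(N)]  # one key-value map per peer


def quorum_write(key, value, version, targets):
    """Store (version, value) on the contacted peers; needs at least W of them."""
    if len(targets) < W:
        raise RuntimeError("write quorum not reached")
    for i in targets:
        replicas[i][key] = (version, value)


def quorum_read(key, targets):
    """Ask at least R peers and return the value with the highest version."""
    if len(targets) < R:
        raise RuntimeError("read quorum not reached")
    answers = [replicas[i][key] for i in targets if key in replicas[i]]
    if not answers:
        return None
    return max(answers, key=lambda pair: pair[0])[1]


quorum_write("key", "A", version=1, targets=[0, 1, 2])
quorum_write("key", "B", version=2, targets=[0, 1])   # peer 2 misses this write

# Whichever two peers are contacted, at least one of them holds version 2.
results = [quorum_read("key", list(pair)) for pair in combinations(range(N), R)]
```

With R = 1 the same read could hit only peer 2 and return the stale value, which is why the inequality matters.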
Relaxing Durability
Durability:
● When Write is committed, the change is permanent
● In some cases, strict durability is not essential and it
can be traded for scalability (write performance)
○ e.g., storing session data, collecting sensor data
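The trade-off can be sketched with a store that acknowledges writes while they are still in memory and flushes them to durable storage in batches. `BufferedStore`, its `flush_every` parameter and the `crash()` method are hypothetical; the dictionary `disk` merely stands in for persistent storage.

```python
class BufferedStore:
    def __init__(self, flush_every=3):
        self.disk = {}      # stands in for durable storage
        self.buffer = {}    # in-memory only, lost on a crash
        self.flush_every = flush_every

    def write(self, key, value):
        # The write is acknowledged before it is durable (relaxed durability).
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_every:
            self.flush()
        return "ack"

    def flush(self):
        self.disk.update(self.buffer)
        self.buffer.clear()

    def crash(self):
        # Everything not yet flushed is gone.
        self.buffer.clear()


store = BufferedStore(flush_every=3)
store.write("session:1", "alice")
store.write("session:2", "bob")
store.write("session:3", "carol")      # third write triggers a flush
store.write("session:4", "dave")       # acknowledged, but only buffered
store.crash()
survived = sorted(store.disk)
```

Fewer flushes mean faster writes, at the price of losing the most recent acknowledged updates if the node fails.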