NoSQL Databases
Principles
Ecole d’Ingénierie Digitale et d’Intelligence Artificielle (EIDIA)
Cycle Préparatoire formation Ingénieur
Khawla TADIST
Academic Year 2023-2024
Outline
Different aspects of data distribution
Scaling
Vertical vs. horizontal
Distribution models
Sharding
Replication
Master-slave vs. peer-to-peer architectures
CAP properties
Consistency, Availability and Partition tolerance
ACID vs. BASE
Scalability
What is scalability?
Capability of a system to handle growing amounts of data and/or queries without
losing performance,
Or its potential to be enlarged in order to accommodate such growth.
Two general approaches
Vertical scaling
Horizontal scaling
Scalability - Vertical Scalability
Vertical scaling (scaling up/down)
Adding resources to a single node in a system
Increasing the number of CPUs,
Extending system memory,
Using larger disk arrays,
…
i.e. larger and more powerful machines are involved
Scalability - Vertical Scalability
Vertical scaling (scaling up/down)
Traditional choice
In favor of strong consistency
Easy to implement and deploy
No issues caused by data distribution
…
Works well in many cases but …
Scalability - Vertical Scalability Drawbacks
Performance limits
Even the most powerful machine has a limit
Everything tends to work well… until we start approaching these limits
Higher costs
The cost of expansion increases exponentially
In particular, it exceeds the total cost of equivalent commodity hardware
Scalability - Vertical Scalability Drawbacks
Proactive provisioning
New projects/applications might evolve rapidly
Upfront budget is needed when deploying new machines
Flexibility is therefore seriously limited
Scalability - Vertical Scalability Drawbacks
Vendor lock-in
There are only a few manufacturers of large machines
Customer is made dependent on a single vendor
Their products, services, but also implementation details, proprietary formats,
interfaces, …
i.e. it is difficult or impossible to switch to another vendor
Deployment downtime
Downtime is often unavoidable when scaling up
Scalability - Horizontal Scalability
Horizontal scaling (scaling out/in)
Adding more nodes to a system
i.e. the system is distributed across multiple nodes in a cluster
Choice of many NoSQL systems
Scalability - Horizontal Scalability
Horizontal scaling (scaling out/in)
Advantages
Commodity hardware, cost effective
Flexible deployment and maintenance
Often outperforms vertical scaling
…
Unfortunately, there are also plenty of false assumptions…
Scalability - Horizontal Scalability
Drawbacks
False assumptions (the classic fallacies of distributed computing)
Network is reliable
Network is secure
Latency is zero
Bandwidth is infinite
Topology does not change
There is one administrator
Transport cost is zero
Scalability - Horizontal Scalability
Consequences
Significantly increases complexity
Complexity of management,
Programming model, …
Introduces new issues and problems
Synchronization of nodes
Data distribution
Data consistency
Recovery from failures
…
Scalability - Horizontal Scalability
A standalone node might still be a better option in certain cases
e.g. for graph databases
Simply because it is difficult to split and distribute graphs
In other words
It can make sense to run even a NoSQL database system on a single node
No distribution at all is the simplest (and often the preferred) scenario
But in general, horizontal scaling does open new possibilities
Scalability - Horizontal Scalability
Architecture
What is a cluster?
A collection of mutually interconnected commodity nodes
Based on the shared-nothing architecture
Nodes do not share their CPUs, memory, hard drives,…
Each node runs its own operating system instance
Nodes send messages to interact with each other
Nodes of a cluster can be heterogeneous
Data, queries, computation, workload, …
This is all distributed among the nodes within a cluster
Distribution Models
Generic techniques of data distribution
Sharding
Different data on different nodes
Motivation: increasing volume of data, increasing performance
Replication
Copies of the same data on different nodes
Motivation: increasing performance, increasing fault tolerance
Distribution Models
The two techniques are orthogonal
i.e. we can use either of them, or combine them both
NoSQL systems often offer automatic sharding and replication
Distribution Models - Sharding
Sharding (horizontal partitioning)
Placement of different data on different nodes
What does different data mean? Different aggregates
– E.g. key-value pairs, documents, …
Distribution Models - Sharding
Sharding (horizontal partitioning)
Placement of different data on different nodes
Related pieces of data that are accessed together should also be kept together
– Specifically, operations involving data on multiple shards should be
avoided
Distribution Models - Sharding
Sharding (horizontal partitioning)
The questions are…
How to design aggregate structures?
How to actually distribute these aggregates?
Distribution Models - Sharding
Sharding (horizontal partitioning)
Distribution Models - Sharding
Objectives
Uniformly distributed data (volume of data)
Balanced workload (read and write requests)
Respecting physical locations
e.g. different data centers for users around the world
…
Unfortunately, these objectives…
May contradict each other
May change over time
Distribution Models - Sharding
Sharding (horizontal partitioning)
Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Distribution Models - Sharding
How to actually determine shards for aggregates?
We not only need to be able to place new data when handling write requests,
But also to find the data when handling read requests,
i.e. when a given search criterion is provided (e.g. key, id, …)
Distribution Models - Sharding
How to actually determine shards for aggregates?
We must be able to determine the shard corresponding to a given key
So that the requested data can be accessed and returned,
Or failure can be correctly detected when the data is missing
Distribution Models - Sharding
Sharding strategies
Based on mapping structures
Data is placed on shards in an arbitrary fashion (e.g. round-robin), and a lookup
structure records where each aggregate lives (not suitable at scale)
Based on general rules:
Hash partitioning
Range partitioning
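To make the two rule-based strategies concrete, here is a minimal sketch in Python. The shard count, the split points, and the key format are illustrative assumptions, not taken from any particular system.

import hashlib

NUM_SHARDS = 4

def hash_shard(key: str) -> int:
    # Hash partitioning: a stable digest of the key, taken modulo the
    # number of shards, spreads keys roughly uniformly across shards.
    # (Python's built-in hash() is randomized per process for strings,
    # so a digest is used instead.)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Range partitioning: keys are assigned by ordered split points,
# so adjacent keys stay together (here: a-f, g-m, n-s, t-z).
SPLIT_POINTS = ["g", "n", "t"]

def range_shard(key: str) -> int:
    for shard, upper_bound in enumerate(SPLIT_POINTS):
        if key < upper_bound:
            return shard
    return len(SPLIT_POINTS)

print(hash_shard("alice"), range_shard("alice"))  # e.g. 2 0

Note the trade-off: hash partitioning balances data volume well but scatters adjacent keys, while range partitioning keeps them together at the risk of hotspots.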
Distribution Models - Replication
Replication
Placement of multiple copies – replicas – of the same data on different nodes
Replication factor = the number of copies
Two approaches:
Master-slave architecture
Peer-to-peer architecture
Distribution Models - Replication - Master-Slave
Master-Slave Architecture
Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Distribution Models - Replication - Master-Slave
Architecture
One node is primary (master), all the others are secondary (slaves)
Master node bears all the management responsibility
All the nodes contain identical data
Distribution Models - Replication - Master-Slave
Architecture
Read requests can be handled by either the master or the slaves
Suitable for read-intensive applications
More read requests to deal with → more slaves to deploy
When the master fails, read operations can still be handled
Distribution Models - Replication - Master-Slave
Write requests can only be handled by the master
Newly written data is propagated to all the slaves
Consistency issue
Fortunately, the master serializes writes, so at most one write request is handled at a
time
But the propagation still takes some time, during which obsolete reads might happen
Hence certain synchronization is required to avoid conflicts
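A minimal sketch of master-slave request routing in Python, assuming in-memory dictionaries as nodes; the class and method names are illustrative, not a real API.

import random

class MasterSlaveCluster:
    def __init__(self, num_slaves: int = 2):
        self.master = {}
        self.slaves = [dict() for _ in range(num_slaves)]

    def write(self, key, value):
        # Only the master handles writes...
        self.master[key] = value
        # ...and propagates them to every slave. Propagation is synchronous
        # here; real systems often propagate asynchronously, which is
        # exactly when obsolete reads can happen.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads may be served by the master or by any slave.
        node = random.choice([self.master] + self.slaves)
        return node.get(key)

cluster = MasterSlaveCluster()
cluster.write("user:1", "Alice")
print(cluster.read("user:1"))  # 'Alice'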
Distribution Models - Replication - Master-Slave
In case of master failure, a new one needs to be appointed
Manually (user-defined)
Automatically (cluster-elected)
Since the nodes contain identical data, the appointment can be fast
The master nevertheless represents a potential bottleneck (in terms of both
performance and failures)
Distribution Models - Replication - Peer-to-Peer
Peer-to-Peer Architecture
Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Distribution Models - Replication - Peer-to-Peer
Architecture
All the nodes have equal roles and responsibilities
All the nodes contain identical data once again
Distribution Models - Replication - Peer-to-Peer
Both read and write requests can be handled by any node
No bottleneck, no single point of failure
More requests to deal with → more nodes to deploy
Distribution Models - Replication - Peer-to-Peer
Both read and write requests can be handled by any node
Consistency issues
Unfortunately, multiple write requests can be initiated independently and handled at
the same time
Hence synchronization is required to avoid conflicts
Distribution Models - Sharding and Replication
Observations with respect to replication:
Does the replication factor really need to correspond to the number of nodes?
No, a replication factor of 3 is often the right choice
Consequences
– Nodes will no longer contain identical data
– Replica placement strategy will be needed
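One possible replica placement strategy is sketched below in Python. It is an assumption for illustration only (the source does not prescribe a strategy): hash the key onto a fixed node list and store copies on the next consecutive nodes.

import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]
REPLICATION_FACTOR = 3  # 3 copies, fewer than the 5 nodes

def replica_nodes(key: str) -> list:
    # Map the key to a starting node, then walk the node list
    # circularly to pick the remaining replicas.
    start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

print(replica_nodes("order:1001"))  # e.g. ['node-c', 'node-d', 'node-e']

As the bullet above anticipates, with such a strategy the nodes no longer hold identical data.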
Sharding and replication can be combined… but how?
Distribution Models - Sharding and Replication
Combinations of sharding and replication
Sharding + master-slave replication
Multiple masters, each for different data
Roles of the nodes can overlap
– Each node can be master for some data and/or slave for other data
Distribution Models - Sharding and Replication
Combinations of sharding and replication
Sharding + peer-to-peer replication
Placement of anything anywhere
CAP Theorem
Assumptions
System with sharding and replication
Read and write operations on a single aggregate
CAP properties = properties of a distributed system
Consistency
Availability
Partition tolerance
CAP Theorem
CAP theorem
It is not possible to have a distributed system that would guarantee consistency,
availability, and partition tolerance at the same time.
Only 2 of these 3 properties can be enforced.
But, what do these properties actually mean?
CAP Theorem - Properties
Consistency
Read and write operations must be executed atomically
There must exist a total order on all operations such that each operation looks as if it
was completed at a single instant,
i.e. as if all the operations were executed one by one on a single standalone node
CAP Theorem - Properties
Consistency
Practical consequence: After a write operation, all readers see the same data
Since any node can be used for handling read requests, atomicity of write
operations means that changes must be propagated to all the replicas
CAP Theorem - Properties
Availability
If a node is working, it must respond to user requests
Every read or write request received by a non-failing node in the system must result in a
response
CAP Theorem - Properties
Partition tolerance
System continues to operate even when two or more sets of nodes get isolated
i.e. a connection failure MUST NOT shut the whole system down
CAP Theorem - Consequences
At most two properties can be guaranteed
CA = Consistency + Availability
CP = Consistency + Partition tolerance
AP = Availability + Partition tolerance
CAP Theorem - Consequences
If at most two properties can be guaranteed…
CA = Consistency + Availability
Traditional ACID properties are easy to achieve
Examples: traditional RDBMS deployments
In general, any single-node system
However, should a network partition happen, all the nodes must be forced to stop
accepting user requests
CAP Theorem - Consequences
If at most two properties can be guaranteed…
CP = Consistency + Partition tolerance
Examples: MongoDB, HBase
CAP Theorem - Consequences
If at most two properties can be guaranteed…
AP = Availability + Partition tolerance
New concept of BASE properties
Examples: Apache Cassandra, Apache CouchDB
Other examples: web caching, DNS
CAP Theorem - Consequences
Partition tolerance is necessary in clusters
Why?
Because network failures cannot be ruled out and are difficult to detect
Does it mean that only purely CP and AP systems are possible?
No…
CAP Theorem - Consequences
The real meaning of the CAP theorem:
Partition tolerance is a MUST,
But we can trade off consistency versus availability
Even slightly relaxed consistency can bring a lot of extra availability
Such trade-offs are not only possible, but often work very well in practice
ACID Properties
Traditional ACID properties
Atomicity
Partial execution of transactions is not allowed (all or nothing)
Consistency
Transactions bring the database from one consistent (valid) state to another
Isolation
Although multiple transactions may execute in parallel, each must behave as if it
were executing alone, i.e. transactions must not interfere with each other
Durability
Effects of committed transactions must remain durable
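A minimal sketch of atomicity in practice, using SQLite from Python's standard library (an illustrative choice; any ACID-compliant RDBMS behaves the same way): either both updates of a transfer commit, or neither does.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    # The connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# Both updates were rolled back: all or nothing.
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# -> [(100,), (0,)]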
BASE Properties
New concept of BASE properties
Basically Available
The system works basically all the time
Partial failures can occur, but without total system failure
Soft State
The system is in flux (unstable)
Changes occur all the time
Eventual Consistency
Sooner or later, the system will reach a consistent state
ACID and BASE
ACID
Choose consistency over availability
Pessimistic approach
Implemented by traditional relational databases
BASE
Choose availability over consistency
Optimistic approach
Common in NoSQL databases
Allows levels of scalability that cannot be achieved with ACID
Current trend in NoSQL:
Strong consistency → eventual consistency
Consistency
Consistency in general…
Consistency is the absence of contradictions in the database
Strong consistency is achievable even in clusters, but eventual consistency is often
sufficient
Even if an already occupied hotel room gets booked again, the situation can be
resolved in the real world
…
Consistency
Write consistency (update consistency)
Problem: write-write conflict
Two or more write requests on the same aggregate are initiated concurrently
Issue: lost update
Question: Do we need to solve the problem in the first place?
Consistency
Write consistency (update consistency)
Question: Do we need to solve the problem in the first place?
If yes, then there are two general solutions
Pessimistic approaches
Preventing conflicts from occurring
Techniques: write locks, …
Optimistic approaches
Conflicts may occur, but are detected and resolved later on
Techniques: version stamps, …
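A minimal sketch of the optimistic approach with version stamps in Python (illustrative names, not a real API): each write must present the version it read, and a stale version means a write-write conflict was detected.

class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        # Returns the current version together with the value.
        return self.data.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self.data.get(key, (0, None))
        if expected_version != current_version:
            # Someone else wrote in the meantime: conflict detected,
            # to be resolved later (retry, merge, ...).
            raise RuntimeError("write-write conflict on %r" % key)
        self.data[key] = (current_version + 1, value)

store = VersionedStore()
version, _ = store.read("cart:7")
store.write("cart:7", ["book"], version)  # succeeds, version becomes 1
try:
    store.write("cart:7", ["pen"], version)  # version 0 is now stale
except RuntimeError as conflict:
    print(conflict)

A pessimistic approach would instead acquire a write lock before the read-modify-write cycle, preventing the conflict rather than detecting it.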
Consistency
Read consistency (replication consistency)
Problem: read-write conflict
Write and read requests on the same aggregate are initiated concurrently
Issue: inconsistent read
When not treated, an inconsistency window will exist
Propagation of changes to all the replicas takes some time
Until this process is finished, inconsistent reads may happen; even the initiator of the
write request may read stale data!
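A toy illustration of the inconsistency window in Python (illustrative code, assuming a primary that propagates to one replica asynchronously): until propagation completes, a reader hitting the replica sees stale data.

primary, replica = {}, {}
pending = []  # changes written but not yet propagated

def write(key, value):
    primary[key] = value
    pending.append((key, value))  # propagation is deferred

def propagate():
    # In a real system this runs in the background after some delay.
    while pending:
        key, value = pending.pop(0)
        replica[key] = value

write("room:42", "booked")
print(primary.get("room:42"))  # 'booked' on the primary
print(replica.get("room:42"))  # None: stale read inside the window
propagate()
print(replica.get("room:42"))  # 'booked' once propagation completes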
Conclusion
There is a wide range of options influencing…
Scalability
– how well does the system scale (data and requests)?
Availability
– when may nodes refuse to handle user requests?
Consistency
– what level of consistency is required?
Latency
– how complicated is it to handle user requests?
Durability
– is committed data written reliably?
Resilience
– can the data be recovered in case of failures?
It’s good to know these properties and choose the right trade-off