2 NoSQL Databases Principles
Ecole d’Ingénierie Digitale et d’Intelligence Artificielle (EIDIA)
Preparatory Cycle, Engineering Program
Khawla TADIST
Academic year 2023-2024
Outline
Different aspects of data distribution
Scaling
Vertical vs. horizontal
Distribution models
Sharding
Replication
Master-slave vs. peer-to-peer architectures
CAP properties
Consistency, Availability and Partition tolerance
ACID vs. BASE
Scalability
What is scalability?
The capability of a system to handle growing amounts of data and/or queries without
losing performance,
or its potential to be enlarged in order to accommodate such growth.
Two general approaches
Vertical scaling
Horizontal scaling
Scalability - Vertical Scalability
Performance limits
Even the most powerful machine has a limit
Moreover, everything works well only until we start approaching these limits
Higher costs
The cost of expansion grows much faster than the capacity gained (often described as exponential)
In particular, it is higher than the sum of costs of equivalent commodity
hardware
Scalability - Vertical Scalability Drawbacks
Proactive provisioning
New projects/applications might evolve rapidly
Upfront budget is needed when deploying new machines
As a result, flexibility is seriously limited
Vendor lock-in
There are only a few manufacturers of large machines
The customer becomes dependent on a single vendor:
its products and services, but also implementation details, proprietary formats,
interfaces, …
i.e. it is difficult or impossible to switch to another vendor
Deployment downtime
Downtime is often unavoidable when scaling up
Scalability - Horizontal Scalability
Objectives
Uniformly distributed data (volume of data)
Balanced workload (read and write requests)
Respecting physical locations
e.g. different data centers for users around the world
…
Unfortunately, these objectives…
May mutually contradict each other
May change over time
Distribution Models - Sharding
Sharding (horizontal partitioning)
Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Sharding strategies
Based on mapping structures
Data is placed on shards in an arbitrary fashion (e.g. round-robin), so a mapping from each item to its shard must be maintained (not suitable)
Based on general rules:
Hash partitioning,
Range partitioning
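The two rule-based strategies can be sketched as follows. This is a minimal illustration, not a production router: the function names, the use of MD5 as a stable hash, and the example boundaries are all assumptions made for the sketch.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash partitioning: derive the shard from a stable hash of the key."""
    # hashlib yields the same digest in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, boundaries: list[str]) -> int:
    """Range partitioning: `boundaries` is a sorted list of split points.

    Keys below boundaries[0] go to shard 0; keys at or above the last
    boundary go to the last shard (len(boundaries) + 1 shards in total).
    """
    return bisect.bisect_right(boundaries, key)


# Hypothetical split points: shard 0 = [.., "h"), shard 1 = ["h", "p"), shard 2 = ["p", ..)
BOUNDARIES = ["h", "p"]
```

Hash partitioning tends to spread keys evenly but scatters neighbouring keys, whereas range partitioning keeps neighbouring keys together at the risk of uneven shards.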
Distribution Models - Replication
Replication
Placement of multiple copies – replicas – of the same data on different nodes
Replication factor = the number of copies
Two approaches:
Master-slave architecture
Peer-to-peer architecture
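To make the replication factor concrete, here is a small sketch that chooses which nodes hold the copies of one item. Placing replicas on consecutive nodes is only an assumption chosen for simplicity; the function name and parameters are hypothetical.

```python
def place_replicas(primary: int, num_nodes: int, replication_factor: int) -> list[int]:
    """Return the node indices that store the copies of one item.

    The first copy goes to `primary`; the remaining copies go to the next
    nodes in index order, wrapping around at the end of the cluster.
    """
    if not 1 <= replication_factor <= num_nodes:
        raise ValueError("replication factor must be between 1 and num_nodes")
    return [(primary + i) % num_nodes for i in range(replication_factor)]
```

For example, with 5 nodes and a replication factor of 3, an item whose first copy lives on node 3 is also stored on nodes 4 and 0.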
Distribution Models - Replication - Master-Slave
Master-Slave Architecture
Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Architecture
One node is primary (master), all the others are secondary (slaves)
Master node bears all the management responsibility
All the nodes contain identical data
Read requests can be handled by either the master or the slaves
Suitable for read-intensive applications
More read requests to deal with → more slaves to deploy
When the master fails, read operations can still be handled
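The behaviour described above can be sketched with an in-memory toy cluster. The class and attribute names are hypothetical, and two simplifications are assumed: writes go only through the master, and the master copies each write to every slave synchronously.

```python
import random
from typing import Optional


class MasterSlaveCluster:
    """Toy master-slave replication: one master, several read-only slaves."""

    def __init__(self, num_slaves: int) -> None:
        self.master: dict[str, str] = {}
        self.slaves: list[dict[str, str]] = [{} for _ in range(num_slaves)]
        self.master_alive = True

    def write(self, key: str, value: str) -> None:
        # Assumption: only the master accepts writes.
        if not self.master_alive:
            raise RuntimeError("master is down: writes are unavailable")
        self.master[key] = value
        for slave in self.slaves:
            slave[key] = value  # synchronous propagation (simplification)

    def read(self, key: str) -> Optional[str]:
        # Any live node can serve a read; slaves keep working without the master.
        candidates = list(self.slaves)
        if self.master_alive:
            candidates.append(self.master)
        if not candidates:
            raise RuntimeError("no node available for reads")
        return random.choice(candidates).get(key)
```

Adding slaves increases read capacity only; in this sketch the master remains the single point through which all writes pass.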
Distribution Models - Replication - Peer-to-Peer
Architecture
All the nodes have equal roles and responsibilities
All the nodes contain identical data once again
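A minimal sketch of the peer-to-peer case, where every node accepts both reads and writes. The names `Peer` and `build_cluster` are hypothetical, and the sketch assumes that each write is pushed to all other peers immediately and that no two peers write the same key concurrently (conflict handling is left out).

```python
from typing import Optional


class Peer:
    """Toy peer: every node can serve reads and accept writes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.peers: list["Peer"] = []

    def write(self, key: str, value: str) -> None:
        # Apply locally, then push the change to every other peer.
        self.data[key] = value
        for peer in self.peers:
            peer.data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)


def build_cluster(names: list[str]) -> list[Peer]:
    """Create peers and connect each one to all the others."""
    nodes = [Peer(name) for name in names]
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
    return nodes
```

Unlike the master-slave case, no single node is required for writes, but concurrent writes to the same key on different peers would need an explicit resolution rule.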
Distribution Models - Replication
Assumptions
System with sharding and replication
Read and write operations on a single aggregate
CAP properties = properties of a distributed system
Consistency
Availability
Partition tolerance
CAP Theorem
It is not possible to have a distributed system that would guarantee consistency,
availability, and partition tolerance at the same time.
At most 2 of these 3 properties can be guaranteed at the same time.
But, what do these properties actually mean?
CAP Theorem - Properties
Consistency
Read and write operations must be executed atomically
There must exist a total order on all operations such that each operation looks as if it
was completed at a single instant,
i.e. as if all the operations were executed one by one on a single standalone node
Practical consequence: After a write operation, all readers see the same data
Since any node can be used for handling of read requests, atomicity of write
operations means that changes must be propagated to all the replicas
Availability
If a node is working, it must respond to user requests
Every read or write request received by a non-failing node in the system must result in a
response
Partition tolerance
System continues to operate even when two or more sets of nodes get isolated
i.e. a connection failure MUST NOT shut the whole system down
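The trade-off between the three properties can be illustrated with a toy two-node store. This is only a sketch under strong simplifications: the class names, the `mode` flag ("CP" or "AP"), and the manual `partitioned` switch are assumptions made for the illustration, not part of any real system.

```python
from typing import Optional


class Replica:
    def __init__(self) -> None:
        self.value: Optional[str] = None


class TwoNodeStore:
    """Two replicas of a single value, with a simulated network partition."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False  # True = the two nodes cannot communicate

    def write(self, node_index: int, value: str) -> bool:
        if not self.partitioned:
            # Healthy network: update both replicas.
            for node in self.nodes:
                node.value = value
            return True
        if self.mode == "CP":
            # Keep replicas identical by rejecting the write (gives up availability).
            return False
        # AP: accept the write locally; replicas now diverge (gives up consistency).
        self.nodes[node_index].value = value
        return True

    def read(self, node_index: int) -> Optional[str]:
        return self.nodes[node_index].value
```

During a partition, the "CP" store keeps both replicas equal but turns writes away, while the "AP" store keeps answering but lets readers on different nodes see different values.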
CAP Theorem - Consequences
Consistency in general…
Consistency is the lack of contradiction in the database
Strong consistency is achievable even in clusters, but eventual consistency might
often be sufficient
Even when an already unavailable hotel room is booked once again, the situation can be
resolved in the real world
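The hotel example can be sketched as two replicas that accepted bookings independently and are reconciled later. The function name is hypothetical, and keeping the first replica's guest on a conflict is an arbitrary assumption: the point is only that double bookings are detected and reported so that they can be settled outside the database.

```python
def merge_bookings(
    replica_a: dict[str, str], replica_b: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Merge room -> guest maps from two replicas and list double-booked rooms."""
    merged = dict(replica_a)
    conflicts = []
    for room, guest in replica_b.items():
        if room in merged and merged[room] != guest:
            # Same room booked by different guests: keep replica_a's entry
            # (arbitrary choice) and flag the room for manual resolution.
            conflicts.append(room)
        else:
            merged[room] = guest
    return merged, sorted(conflicts)
```

After the merge both replicas can adopt the same state, and the reported rooms are handed to staff, e.g. to offer one of the guests another room.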
…
Consistency