Module 2.3
•Partition tolerance
•Proven by Nancy Lynch et al. at MIT.
•http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
CAP theorem
Consistency
• The client perceives that a set of operations has occurred all at once – Pritchett
• More like the Atomic property of ACID transactions
Availability
• Availability: data must remain available.
• Node failures do not prevent survivors from continuing to operate – Wikipedia
• Every operation must terminate in an intended response – Pritchett
Partition Tolerance
• Partition tolerance: the data may be partitioned across network segments due to network failures.
• The system continues to operate despite arbitrary message loss – Wikipedia
• Operations will complete, even if individual components are unavailable – Pritchett
February 16, 2024
CAP Theorem
• Consistency
• 2 types of consistency:
1. Strong consistency – ACID (Atomicity, Consistency, Isolation, Durability)
2. Weak consistency – BASE (Basically Available, Soft state, Eventual consistency)
BASE, an ACID Alternative
Almost the opposite of ACID.
•Basically available: Nodes in a distributed environment can go down, but the whole system shouldn’t be affected.
•Soft state (scalable): The state of the system and its data may change over time.
•Eventual consistency: Given enough time, data will be consistent across the distributed system.
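The soft-state and eventual-consistency ideas above can be illustrated with a small sketch. It assumes a last-write-wins rule based on an integer version number; the names `Replica` and `anti_entropy` are hypothetical and chosen for this example only, and real systems use more careful conflict resolution.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding key -> (version, value). Hypothetical toy model."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, version):
        # Last-write-wins (an assumption): keep the value only if it is newer.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state so both replicas hold the newest version of every key."""
    for key, (version, value) in list(a.store.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.store.items()):
        a.write(key, value, version)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["book"], version=1)
anti_entropy(r1, r2)                           # both replicas agree on version 1
r1.write("cart", ["book", "pen"], version=2)   # r2 has not seen this write yet
stale = r2.read("cart")                        # soft state: r2 still serves the old value
anti_entropy(r1, r2)                           # given enough time, replicas converge
fresh = r2.read("cart")
```

Between the second write and the second synchronization, `r2` answers with stale data instead of refusing the read, which is the trade-off BASE accepts.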
Characteristics of BASE Transactions
• Weak consistency: stale data is okay
• Availability takes priority
• Best effort
• Approximate answers are okay
• Simpler and faster
CAP
The CAP theorem categorizes systems into three categories:
CP (consistent and partition tolerant): At first glance the CP category is confusing, because it suggests a system that is consistent and partition tolerant but never available. In fact, CP refers to a category of systems where availability is sacrificed only in the case of a network partition.
CA (consistent and available): CA systems are consistent and available in the absence of any network partition. Single-node DB servers are often categorized as CA systems, because they do not need to deal with partition tolerance. The only hole in this theory is that single-node DB systems are not a network of shared data systems and thus do not fall under the purview of CAP.
AP (available and partition tolerant): These are systems that are available and partition tolerant but cannot guarantee consistency.
CAP theorem with databases that “choose” CA, CP and AP
[Figure: Example CA. RDBMS nodes A and B linked by replication; users issue reads and writes.]
[Figure: Example CP. Node A with a backup kept in sync by replication; a user reads and writes.]
• Today, NoSQL databases are classified based on the two CAP characteristics
they support:
• CP database: A CP database delivers consistency and partition tolerance at
the expense of availability. When a partition occurs between any two
nodes, the system has to shut down the non-consistent node (i.e., make it
unavailable) until the partition is resolved.
• AP database: An AP database delivers availability and partition tolerance at
the expense of consistency. When a partition occurs, all nodes remain
available but those at the wrong end of a partition might return an older
version of data than others. (When the partition is resolved, the AP
databases typically resync the nodes to repair all inconsistencies in the
system.)
• CA database: A CA database delivers consistency and availability across all
nodes. It can’t do this if there is a partition between any two nodes in the
system, however, and therefore can’t deliver fault tolerance.
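The difference between the CP and AP choices during a partition can be sketched as a toy simulation. Everything here (`Node`, `replicated_write`, `demo`, the `"CP"`/`"AP"` mode strings) is hypothetical and deliberately simplified to two nodes; it only illustrates the behaviour described in the bullets above and does not model any particular database.

```python
class Unavailable(Exception):
    """Raised when a node refuses to serve a request."""


class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.value = None
        self.partitioned = False  # True when the node is cut off from its peer

    def read(self):
        # A CP node refuses to answer when it cannot confirm it has the latest value.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest value during partition")
        # An AP node always answers, possibly with an older version.
        return self.value


def replicated_write(primary, secondary, value):
    """Write to the primary and replicate to the secondary if it is reachable."""
    primary.value = value
    if not secondary.partitioned:
        secondary.value = value


def demo(mode):
    a, b = Node(mode), Node(mode)
    replicated_write(a, b, "v1")   # both nodes hold v1
    b.partitioned = True           # network partition isolates b
    replicated_write(a, b, "v2")   # only a receives v2
    try:
        return b.read()
    except Unavailable:
        return "unavailable"


ap_result = demo("AP")   # the isolated node answers with the older version
cp_result = demo("CP")   # the isolated node refuses to answer
```

In the AP run the isolated node returns the older `"v1"`; in the CP run it is effectively shut down until the partition heals.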
What is NOSQL?
• The Name:
• Stands for Not Only SQL
• The term NOSQL was introduced by Carlo Strozzi in 1998 to name his file-based database
• It was reintroduced by Eric Evans when an event was organized to discuss open source distributed databases
• Eric states that “… but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for. …”
What is NOSQL?
• Key features (advantages):
• non-relational
• don’t require a schema
• data are replicated to multiple nodes (so, identical & fault-tolerant) and can be partitioned:
• down nodes easily replaced
• no single point of failure
• horizontally scalable
• cheap, easy to implement (open-source)
• massive write performance
• fast key-value access
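A minimal sketch of how partitioning and replication combine to give key-value access without a single point of failure. The class `PartitionedStore`, its method names, and the simple hash-modulo placement are assumptions made for this example; production systems typically use consistent hashing and more elaborate repair logic.

```python
import hashlib


class PartitionedStore:
    """Toy key-value store: each key lives on `replicas` consecutive nodes."""

    def __init__(self, node_names, replicas=2):
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}
        self.replicas = replicas
        self.down = set()  # names of nodes that are currently unreachable

    def _owners(self, key):
        # Hash the key to pick a starting node, then take the next nodes as replicas.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.names)
        return [self.names[(start + i) % len(self.names)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for name in self._owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Any live replica that holds the key can answer.
        for name in self._owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


store = PartitionedStore(["n0", "n1", "n2"])
store.put("user:42", {"name": "Ada"})
owners = store._owners("user:42")
store.down.add(owners[0])            # the first replica goes down
survivor_value = store.get("user:42")  # the second replica still answers
```

Adding more node names spreads keys over more machines, which is the horizontal scaling the list refers to; with plain modulo placement, though, changing the node count moves most keys, which is why real systems prefer consistent hashing.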
What is NOSQL?
• Disadvantages:
• Don’t fully support relational features
• no join, group by, order by operations (except within partitions)
• no referential integrity constraints across partitions
• No declarative query language (e.g., SQL) → more programming
• Relaxed ACID (see CAP theorem) → fewer guarantees
• No easy integration with other applications that support SQL
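To make the "more programming" point concrete, here is a sketch of what a join plus group-by plus order-by looks like when the application has to do it itself. The data layout (`customers` as a key-value map, `orders` as a list of documents) and the function name `totals_by_customer` are invented for illustration.

```python
# Two separate "collections", as they might come back from a key-value or document store.
customers = {"c1": {"name": "Ada"}, "c2": {"name": "Linus"}}
orders = [
    {"id": "o1", "customer_id": "c1", "total": 30},
    {"id": "o2", "customer_id": "c2", "total": 12},
    {"id": "o3", "customer_id": "c1", "total": 8},
    {"id": "o4", "customer_id": "c9", "total": 5},  # dangling reference: nothing forbids it
]


def totals_by_customer(customers, orders):
    """Application-side equivalent of JOIN + GROUP BY + ORDER BY total DESC."""
    totals = {}
    for order in orders:
        customer = customers.get(order["customer_id"])  # the "join"
        if customer is None:
            continue  # no referential integrity, so the code must handle orphans
        name = customer["name"]
        totals[name] = totals.get(name, 0) + order["total"]  # the "group by"
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # the "order by"


report = totals_by_customer(customers, orders)
```

In SQL the same report would be a single declarative query; here the join logic, the handling of orphaned orders, and the sort all live in application code.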
Questions?