Brewer's Conjecture (CAP)


Life after CAP

Ali Ghodsi [email protected]

CAP conjecture [reminder]


Can only have two of:
  Consistency
  Availability
  Partition-tolerance

Examples
  Databases, 2PC, centralized algorithms (C & A)
  Distributed databases, majority protocols (C & P)
  DNS, Bayou (A & P)

CAP theorem
Formalization by Gilbert & Lynch
What does impossible mean?
  There exists an execution which violates one of CAP
  Not possible to guarantee that an algorithm has all three at all times
Ways around it:
  Shard data with different CAP tradeoffs
  Detect partitions and weaken consistency

Partition-tolerance & availability


What is partition-tolerance?
  Consistency and Availability are provided by the algorithm
  Partitions are external events (scheduler/oracle)
  Partition-tolerance is really a failure model
  Partition-tolerance is equivalent to message omissions

In the CAP theorem


Proof rests on partitions that never heal
Datacenters can guarantee recovery of partitions!
  Can guarantee that conflict resolution eventually happens

How do we ensure consistency?


Main technique to be consistent: the quorum principle
Example: majority quorums
  Always write to and read from a majority of nodes
  At least one node knows the most recent value
  majority(9) = 5

[Diagram: WRITE(v) goes to a majority of the nine nodes; READ contacts a majority, so the two quorums always overlap]
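A minimal sketch of this principle, assuming in-memory nodes and synchronous calls (the node structure and function names here are illustrative, not from the talk):

```python
import random

# Majority-quorum sketch: writes go to a majority, reads come from a
# majority. Any two majorities of N nodes intersect, so at least one
# node in every read quorum has seen the most recent write.
N = 9
MAJORITY = N // 2 + 1                      # majority(9) = 5
nodes = [{"version": 0, "value": None} for _ in range(N)]

def write(value, version):
    # any majority of nodes will do
    for node in random.sample(nodes, MAJORITY):
        if version > node["version"]:
            node["version"], node["value"] = version, value

def read():
    # contact a majority and keep the freshest reply
    replies = random.sample(nodes, MAJORITY)
    newest = max(replies, key=lambda n: n["version"])
    return newest["value"]

write("x = 5", version=1)
print(read())                              # always "x = 5": quorums overlap
```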

Quorum Principle
Majority quorum
  Pro: tolerates up to ⌈N/2⌉ − 1 crashes
  Con: have to read/write ⌊N/2⌋ + 1 values

Read/write quorums (Dynamo, ZooKeeper, Chain Replication)
  Read R nodes, write W nodes, s.t. R + W > N (and W > N/2)
  Pro: tune the performance of reads vs. writes
  Con: availability can suffer
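The two constraints are easy to sanity-check in code; a small sketch (the function name is mine, not Dynamo's or ZooKeeper's API):

```python
# R + W > N ensures every read quorum overlaps every write quorum;
# W > N/2 ensures any two write quorums overlap, so versions stay
# totally ordered.
def valid_quorum_config(n: int, r: int, w: int) -> bool:
    return r + w > n and 2 * w > n

# Tuning examples for N = 9:
print(valid_quorum_config(9, 5, 5))  # True: balanced (majority quorums)
print(valid_quorum_config(9, 2, 8))  # True: fast reads, expensive writes
print(valid_quorum_config(9, 4, 5))  # False: R + W = N, a read can miss the latest write
```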

Maekawa Quorum

[Figure: nodes P1–P9 arranged in a 3×3 grid]

Arrange nodes in an M×M grid
Write to a row + a column, read a column (they always overlap)
Pro: only need to read/write O(√N) nodes
Con: tolerates at most O(√N) crashes (requires reconfiguration)
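A sketch of the grid construction, assuming a perfect-square N (the helper name and the particular row/column choices are illustrative):

```python
import math

# Grid (Maekawa-style) quorums, simplified as on the slide: a write
# quorum is one full row plus one full column, a read quorum is one
# full column. Every column crosses every row, so reads meet writes.
def grid_quorums(n, write_row=0, write_col=0, read_col=1):
    m = math.isqrt(n)
    assert m * m == n, "this sketch assumes N = M * M"
    grid = [[row * m + col for col in range(m)] for row in range(m)]
    write_q = set(grid[write_row]) | {grid[r][write_col] for r in range(m)}
    read_q = {grid[r][read_col] for r in range(m)}
    return write_q, read_q

write_q, read_q = grid_quorums(9)
print(sorted(write_q))   # [0, 1, 2, 3, 6]: 2*sqrt(N) - 1 = 5 nodes
print(sorted(read_q))    # [1, 4, 7]: sqrt(N) = 3 nodes
print(write_q & read_q)  # {1}: the quorums always intersect
```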

Probabilistic Quorums
  Quorum size ℓ·√N (ℓ > 1); two quorums intersect with probability ≥ 1 − e^(−ℓ²)
  Example, N = 16 nodes:
    Probabilistic: quorum size 7, intersects ≥ 95%, tolerates 9 failures
    Maekawa: quorum size 7, intersects 100%, tolerates 4 failures
  Pro: small quorums, high fault-tolerance
  Con: could fail to intersect; N usually large
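A Monte-Carlo sketch of the intersection probability, using the slide's N = 16, quorum-size-7 example (the code itself is an assumption, not from the talk):

```python
import math, random

# Two uniformly random quorums of size l * sqrt(N) fail to intersect
# with probability at most e^(-l^2). Here N = 16 and quorum size 7,
# so l = 7/4 = 1.75 and the bound gives >= ~95% intersection; the
# measured rate is higher, since the formula is only a lower bound.
N, Q, TRIALS = 16, 7, 100_000
hits = sum(
    bool(set(random.sample(range(N), Q)) & set(random.sample(range(N), Q)))
    for _ in range(TRIALS)
)
bound = 1 - math.exp(-((Q / math.sqrt(N)) ** 2))
print(f"measured intersection rate: {hits / TRIALS:.4f}")  # ~0.997
print(f"lower bound 1 - e^(-l^2):   {bound:.4f}")          # ~0.953
```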

Quorums and CAP


With quorums we can get:
  C & P: a partition can make the quorum unavailable
  C & A: absence of partitions ensures availability and atomicity

Faced with a decision when we fail to get a quorum [Brewer11]:
  Sacrifice availability by waiting for the partitions to merge
  Sacrifice atomicity by ignoring the quorum

Can we get CAP for weaker consistency?

What does atomicity really mean?


[Diagram: processes P1, P2, P3 issuing W(5), R, and W(6); each operation is drawn as an interval from invocation to response]

Linearization points:
  Read ops appear as if they happened immediately at all nodes at some time between invocation and response
  Write ops appear as if they happened immediately at all nodes at some time between invocation and response
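To make the definition concrete, here is a hypothetical brute-force check for small single-register histories (the history format and names are mine, not the talk's): an execution is atomic iff some total order of the operations respects real time and ordinary register semantics.

```python
from itertools import permutations

# An execution is linearizable (atomic) iff the operations can be
# totally ordered so that (1) the order respects real time -- if op A
# responded before op B was invoked, A comes first -- and (2) every
# read returns the value of the most recent preceding write.
def linearizable(history):
    for order in permutations(history):
        # (1) reject orders that place b after a although b finished
        # before a was even invoked
        if any(b["resp"] < a["inv"]
               for i, a in enumerate(order) for b in order[i + 1:]):
            continue
        # (2) replay register semantics along the chosen order
        value, ok = None, True
        for op in order:
            if op["kind"] == "W":
                value = op["val"]
            elif op["val"] != value:
                ok = False
                break
        if ok:
            return True
    return False

# First history: W(5) and W(6) overlap the reads in time, so R:6
# followed by R:5 can still be linearized (W(6) before W(5)).
atomic = [
    {"kind": "W", "val": 5, "inv": 0, "resp": 10},
    {"kind": "W", "val": 6, "inv": 0, "resp": 10},
    {"kind": "R", "val": 6, "inv": 1, "resp": 2},
    {"kind": "R", "val": 5, "inv": 3, "resp": 4},
]
# Second history: W(5) strictly precedes W(6), yet a later read
# returns 5 -- no valid linearization exists.
not_atomic = [
    {"kind": "W", "val": 5, "inv": 0, "resp": 1},
    {"kind": "W", "val": 6, "inv": 2, "resp": 3},
    {"kind": "R", "val": 6, "inv": 4, "resp": 5},
    {"kind": "R", "val": 5, "inv": 6, "resp": 7},
]
print(linearizable(atomic))      # True
print(linearizable(not_atomic))  # False
```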

Definition of Atomicity
Linearization points (as above): every read and write appears to take effect instantaneously at some time between its invocation and its response.

[Diagram: W(5) on P1 and W(6) on P3 overlap in time; P2 reads R:6 and then R:5 — atomic, since the writes can be linearized with W(6) before W(5)]

Definition of Atomicity

[Diagram: two executions of W(5) and W(6) with reads on P2, side by side; one is labeled atomic, the other (R:6 followed by R:5) is labeled not atomic]

Atomicity too strong?


[Diagram: W(5) and W(6) with P2 reading R:6 then R:5 — not atomic]

Linearization points too strong?
  Why not just have R:5 appear atomically right after W(5)?
  Lamport: what if P2's operator phones P1's operator and tells her "I just read 6"?

Atomicity too strong?


[Diagram: W(5) on P1 and W(6) on P3; P2 reads R:5 then R:6 — not atomic, but sequentially consistent]

Sequential consistency
  Weaker than atomicity
  Sequential consistency removes the real-time requirement
  Any global ordering is OK as long as it respects each process's local ordering
  Does Gilbert & Lynch's proof fall apart for sequential consistency?

Causal memory
  Weaker than sequential consistency
  No need for a global view; each process may have a different view
  Local: reads and writes return immediately to the caller
  The CAP theorem does not apply to causal memory

[Diagram: P1 does W(0) then R:1, P2 does W(1) then R:0 — causally consistent, even though no single global order explains both reads]
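A minimal sketch of a causally consistent single-key store using vector clocks (class and method names are mine; real causal memories track dependencies more carefully):

```python
# Writes and reads return immediately to the caller; a replica applies
# a remote write only after all of its causal predecessors, so every
# process sees writes in a causally consistent (possibly different) order.
class Replica:
    def __init__(self, node_id, n):
        self.id, self.n = node_id, n
        self.vc = [0] * n            # vector clock: writes seen per node
        self.value = None
        self.pending = []            # remote writes not yet causally ready

    def write(self, value):
        self.vc[self.id] += 1        # local event, returns immediately
        self.value = value
        return (value, list(self.vc), self.id)   # message to propagate

    def read(self):
        return self.value            # local, never blocks

    def deliver(self, msg):
        self.pending.append(msg)
        applied = True
        while applied:               # apply every causally ready write
            applied = False
            for m in list(self.pending):
                value, vc, sender = m
                next_from_sender = vc[sender] == self.vc[sender] + 1
                deps_satisfied = all(vc[k] <= self.vc[k]
                                     for k in range(self.n) if k != sender)
                if next_from_sender and deps_satisfied:
                    self.vc[sender] = vc[sender]
                    self.value = value   # concurrent writes: last applied wins
                    self.pending.remove(m)
                    applied = True

# The slide's execution: P1 and P2 write concurrently, then each sees
# the other's value -- causally consistent, though the views differ.
p1, p2 = Replica(0, 2), Replica(1, 2)
m1, m2 = p1.write(0), p2.write(1)
p1.deliver(m2); p2.deliver(m1)
print(p1.read(), p2.read())          # 1 0
```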

Going really weak


Eventual consistency
  When the network is not partitioned, all nodes eventually have the same value
  I.e., don't be consistent at all times, only after partitions heal!

Based on a powerful technique: gossipping (see the sketch below)
  Periodically exchange logs with one random node
  Exchanges must use constant-sized packets: set reconciliation, Merkle trees, etc.
  Use (clock, node_id) to break ties between events in the log
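A toy anti-entropy round, gossipping only the latest write rather than full logs (a simplifying assumption; the names and structure are mine):

```python
import random

# Each round, every node exchanges state with one random peer; ties
# between concurrent writes are broken by the (clock, node_id) pair,
# so all nodes converge on the same winner in roughly O(log N) rounds.
random.seed(1)
N = 64
# each node starts with its own write, tagged (clock, node_id, value)
nodes = [{"latest": (1, i, f"value-{i}")} for i in range(N)]

def gossip_round(nodes):
    for node in nodes:
        peer = random.choice([p for p in nodes if p is not node])
        winner = max(node["latest"], peer["latest"])  # lexicographic tie-break
        node["latest"] = peer["latest"] = winner

rounds = 0
while len({n["latest"] for n in nodes}) > 1:
    gossip_round(nodes)
    rounds += 1
print(f"all {N} nodes converged after {rounds} rounds (log2({N}) = 6)")
```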

Properties of gossipping
All nodes will have the same value in O(log N) time
No positive-feedback cycles that congest the network

BASE
A catch-all for any consistency model C that enables C-A-P:
  Eventual consistency
  PRAM consistency
  Causal consistency

Main ingredients
  Stale data
  Soft state (regenerable state)
  Approximate answers

Summary
No need to ensure CAP at all times
  Switch between algorithms, or satisfy a subset at different times

Weaken the consistency model
  Choose a weaker consistency model:
    Causal memory (relatively strong) works around CAP
  Only be consistent when the network isn't partitioned:
    Eventual consistency (very weak) works around CAP

Weaken partition-tolerance
  Some environments never partition, e.g. datacenters
  Tolerate unavailability in small quorums
  Some environments have recovery guarantees (partitions heal within X hours): perform conflict resolution then

Related Work (ignored in talk)


PRAM consistency (Pipelined RAM)
  Weaker than causal, and non-blocking

Eventual linearizability (PODC '10)
  Becomes atomic after quiescent periods

Gossipping & set reconciliation
  Lots of related work
