Distributed Systems
Michaelmas 2021
Dr David J Greaves and Dr Martin Kleppmann
(With thanks to Dr Robert N. M. Watson and Dr Steven Hand)
Concurrent and Distributed Systems
[Figure (log scale): 42 years of microprocessor trend data – https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/]
Concurrent systems outline
1. Introduction to concurrency, threads, and mutual exclusion.
2. Automata composition - safety and liveness.
3. Semaphores and associated design patterns.
4. CCR, monitors and concurrency in programming languages.
5. Deadlock, liveness and priority inversion and limits on
parallelism.
6. Concurrency without shared data; transactions.
7. Further transactions.
8. Crash recovery; lock free programming; (Transactional
memory).
See the ‘Learner’s Guide’ on the course pages for additional notes as well.
Recommended reading
• “Operating Systems, Concurrent and Distributed Software Design“, Jean
Bacon and Tim Harris, Addison-Wesley 2003
• “Designing Data-Intensive Applications”, Martin Kleppmann O’Reilly
Media 2017
• “Modern Operating Systems”, Andrew Tanenbaum, Prentice-Hall, 2007
and later editions; a free PDF is available online.
• “Java Concurrency in Practice”, Brian Goetz and others, Addison-Wesley
2006
See the “Learner’s Guide” on the course pages for additional notes as well.
In this course we will
• Investigate concurrency in computer systems
– Processes, threads, interrupts, hardware
• Consider how to control concurrency
– Mutual exclusion (locks, semaphores), condition synchronization,
HLL primitives and lock-free programming
• Learn about deadlock, livelock, priority inversion
– And prevention, avoidance, detection, recovery
• See how abstraction can provide support for correct & fault-tolerant
concurrent execution
– Transactions, serialisability, concurrency control
• Look at techniques for anticipating and detecting deadlock
• Later, we will extend these ideas to distributed systems.
Recall: Processes and threads
• Processes are instances of programs in execution
– OS unit of protection & resource allocation
– Has a virtual address space; and one or more threads
• Threads are entities managed by the scheduler
– Represents an individual execution context
– A thread control block (TCB) holds the saved context (registers,
including stack pointer), scheduler info, etc
• Threads run in the address spaces of their process
– (and also in the kernel address space on behalf of user code)
• Context switches occur when the OS saves the state of one thread
and restores the state of another
– If a switch is between threads in different processes, then process
state is also switched – e.g., the address space.
Concurrency with a single CPU (1)
• Process / OS concurrency
– Process X runs for a while (until blocks or interrupted)
– OS runs for a while (e.g. does some TCP processing)
– Process X resumes where it left off…
• Inter-process concurrency
– Process X runs for a while; then OS; then Process Y; then OS; then
Process Z; etc
• Intra-process concurrency
– Process X has multiple threads X1, X2, X3, …
– X1 runs for a while; then X3; then X1; then X2; then …
Concurrency with a single CPU (2)
• With just one CPU, can think of concurrency as
interleaving of different executions, e.g.
Proc(A) OS Proc(B) OS Proc(B) OS Proc(C) OS Proc(A)
time
void main(void) {
  threadid_t threads[NUMTHREADS];   // Thread IDs
  int i;                            // Counter

  // Additional threads are started explicitly
  for (i = 0; i < NUMTHREADS; i++)
    threads[i] = thread_create(threadfn, i);
Multiple threads within a process
• A single-threaded process has code, a heap, a stack, and registers
• Additional threads have their own registers and stacks
– Per-thread program counters ($pc) allow execution flows to differ
– Per-thread stack pointers ($sp) allow call stacks and local variables to differ
• Heap and code (+ global variables) are shared between all threads
• Access to another thread’s stack is possible in some languages – but
deeply discouraged!
[Diagram: a process address space containing per-thread register sets ($a0, $a1, $t0, $sp, $pc) and per-thread stacks, plus a shared heap and code]
1:N - user-level threading
• Kernel only knows about (and schedules) processes
• A userspace library implements threads, context switching, scheduling,
synchronisation, …
– E.g., the JVM or a threading library
• Advantages
– Lightweight creation/termination + context switch; application-specific
scheduling; OS independence
• Disadvantages
– Awkward to handle blocking system calls or page faults, preemption; cannot
use multiple CPUs
• Very early 1990s!
[Diagram: threads T1–T3 multiplexed in userspace onto process P1; the kernel schedules processes P1 and P2 onto CPU 1 and CPU 2]
1:1 - kernel-level threading
• Kernel provides threads directly
– By default, a process has one thread…
– … but can create more via system calls
• Kernel implements threads, thread context switching, scheduling, etc.
• Userspace thread library 1:1 maps user threads into kernel threads
• Advantages:
– Handles preemption, blocking syscalls
– Straightforward to use multiple CPUs
• Disadvantages:
– Higher overhead (trap to kernel); less flexible; less portable
• Model of choice across major OSes
– Windows, Linux, MacOS, FreeBSD, Solaris, …
[Diagram: threads T1–T3 of process P1 scheduled directly by the kernel onto CPU 1 and CPU 2]
M:N - hybrid threading
• Best of both worlds?
– M:N threads, scheduler activations, …
• Kernel exposes a smaller number (M) of activations – typically 1:1 with CPUs
• Userspace schedules a larger number (N) of threads onto available activations
– Kernel upcalls when a thread blocks, returning the activation to userspace
– Kernel upcalls when a thread wakes up, userspace schedules it on an activation
– Kernel controls maximum parallelism by limiting number of activations
• Removed from most OSes – why?
• Now: Virtual Machine Monitors (VMMs)
– Each Virtual CPU (VCPU) is an activation
• Reappears in concurrency frameworks
– E.g., Apple’s Grand Central Dispatch (GCD)
[Diagram: threads T1–T3 of process P1 multiplexed in userspace onto kernel activations bound to CPU 1 and CPU 2]
Advantages of concurrency
• Allows us to overlap computation and I/O on a single machine
• Can simplify code structuring and/or improve responsiveness
– E.g. one thread redraws the GUI, another handles user input, and
another computes game logic
– E.g. one thread per HTTP request
– E.g. background GC thread in JVM/CLR
• Enables the seamless (?!) use of multiple CPUs – greater performance
through parallel processing
Concurrent systems
• In general, have some number of processes…
– … each with some number of threads …
– … running on some number of computers…
– … each with some number of CPUs.
• For this half of the course we’ll focus on a single computer running a multi-
threaded process
– most problems & solutions generalize to multiple processes, CPUs, and
machines, but more complex
– (we’ll look at distributed systems later in the term)
• Challenge: threads will access shared resources concurrently via their
common address space
Example: Housemates Buying Beer
• Thread 1 (person 1) • Thread 2 (person 2)
1.Look in fridge 1.Look in fridge
2.If no beer, go buy beer 2.If no beer, go buy beer
3.Put beer in fridge 3.Put beer in fridge
Solution #1: Leave a Note
• Thread 1 (person 1) • Thread 2 (person 2)
1.Look in fridge 1.Look in fridge
2.If no beer & no note 2.If no beer & no note
1.Leave note on fridge 1.Leave note on fridge
2.Go buy beer 2.Go buy beer
3.Put beer in fridge 3.Put beer in fridge
4.Remove note 4.Remove note
Critical Sections & Mutual Exclusion
• The high-level problem here is that we have
two threads trying to solve the same problem
– Both execute buyBeer() concurrently
– Ideally want only one thread doing that at a time
• We call this code a critical section
– A piece of code which should never be concurrently
executed by more than one thread
• Ensuring this involves mutual exclusion
– If one thread is executing within a critical section, all
other threads are prohibited from entering it
Achieving Mutual Exclusion
• One way is to let only one thread ever execute a particular
critical section – e.g. a nominated beer buyer – but this
restricts concurrency
• Alternatively our (broken) solution #1 was trying to provide
mutual exclusion via the note
– Leaving a note means “I’m in the critical section”;
– Removing the note means “I’m done”
– But, as we saw, it didn’t work ;-)
• This was because we could experience a context switch
between reading ‘note’, and setting it
Non-Solution #1: Leave a Note
// thread 1                          // thread 2
beer = checkFridge();
if(!beer) {
  if(!note) {                        // we decide to enter the
                                     // critical section here…
        <context switch>
                                     beer = checkFridge();
                                     if(!beer) {
                                       if(!note) {
                                         note = 1;
                                         buyBeer();
                                         note = 0;
                                       }
                                     }
        <context switch>
    note = 1;                        // … but only mark the fact here
    buyBeer();
    note = 0;
  }
}
These problems are referred to as race conditions, in which multiple threads
“race” with one another during conflicting access to shared resources.
Atomicity
• What we want is for the checking of note and the (conditional)
setting of note to happen without any other thread being
involved
– We don’t care if another thread reads it after we’re done; or
sets it before we start our check
– But once we start our check, we want to continue without
any interruption
• If a sequence of operations (e.g. read-and-set) are made to
occur as if one operation, we call them atomic
– Since indivisible from the point of view of the program
• An atomic read-and-set operation is sufficient for us to
implement a correct beer program
Solution #2: Atomic Note
// thread 1 // thread 2
beer = checkFridge(); beer = checkFridge();
if(!beer) { if(!beer) {
if(read-and-set(note)) { if(read-and-set(note)) {
buyBeer(); buyBeer();
note = 0; note = 0;
} }
} }
Implementing mutual exclusion
• Associate a mutual exclusion lock with each
critical section, e.g. a variable L
– (must ensure use correct lock variable!)
– ENTER_CS() = “LOCK(L)”
LEAVE_CS() = “UNLOCK(L)”
• Can implement LOCK() using read-and-set():
LOCK(L) { UNLOCK(L) {
while(!read-and-set(L)) L = 0;
; // do nothing }
}
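As a concrete illustration (a sketch of my own, not from the original slides), C11’s atomic_flag provides exactly this read-and-set primitive; note that atomic_flag_test_and_set returns the previous value, so we spin while it reports ‘already set’:

#include <stdatomic.h>

atomic_flag L = ATOMIC_FLAG_INIT;   // clear == unlocked

void lock(void) {
    // Returns the previous value: keep spinning until we are
    // the thread that changes the flag from clear to set.
    while (atomic_flag_test_and_set(&L))
        ;  // spin
}

void unlock(void) {
    atomic_flag_clear(&L);
}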
Solution #3: mutual exclusion locks
// thread 1 // thread 2
LOCK(fridgeLock); LOCK(fridgeLock);
beer = checkFridge(); beer = checkFridge();
if(!beer) if(!beer)
buyBeer(); buyBeer();
UNLOCK(fridgeLock); UNLOCK(fridgeLock);
• Next time:
– Operating System and hardware instructions and structures,
– Interacting automata view of concurrency,
– Introduction to formal modelling of concurrency.
Concurrent systems
Lecture 2: Hardware, OS and Automaton Views
Dr David J Greaves
(Thanks to Dr Robert N. M. Watson)
(2020/21: Bakery Algorithm)
From last time ...
• Concurrency exploits parallel and distributed
computation.
• Concurrency is also a useful programming
paradigm and a virtualisation means.
• Race conditions arise with imperative
languages in shared memory (sadly the
predominant paradigm of last 15 years).
• Concurrency bugs are hard to anticipate.
This time
• Computer architecture and O/S summary
• Hardware support for atomicity
• Basic Automata Theory/Jargon and
interactions.
• Simple model checking
• Dining Philosophers Taster
• Primitive-free atomicity (Bakery Alg)
General comments
• Concurrency is essential in modern systems
– overlapping I/O with computation
– building distributed systems
– But throws up a lot of challenges
• need to ensure safety, allow synchronization, and avoid issues of liveness
(deadlock, livelock, …)
• A major risk of over-engineering exists: putting in too many locks that are
not really needed.
• It’s also possible to get accidental, excessive serialisation, killing the
expected parallel speedup.
• Generally worth building sequential system first
– and worth using existing libraries, tools and design patterns rather than
rolling your own!
https://www.cl.cam.ac.uk/~djg11/socdam-patterns-hls-touchstones/soc-design-patterns/sp1-socparts/zhp6c8e57449.html
Operating System Behaviour
• Even on a uniprocessor, interrupt routines will ‘magically’ change stored
values in memory.
• Stop-the-world atomic operations are undesirable on parallel hardware.
Hardware foundations for atomicity 2
• How can we implement atomic read-and-set?
• Simple pair of load and store instructions fail
the atomicity test (obviously divisible!)
• Need a new ISA primitive for protection against
parallel access to memory from another CPU
• Two common flavours:
– Atomic Compare and Swap (CAS)
– Load Linked, Store Conditional (LL/SC)
– (But we also find atomic increment, bitset etc..)
Atomic Compare and Swap (CAS)
• Instruction operands: memory address, prior + new values
– If prior value matches in-memory value, new value stored
– If prior value does not match in-memory value, instruction fails
– Software checks return value, can loop on failure
• Found on CISC systems such as x86 (cmpxchg).
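To make the retry pattern concrete, here is a sketch (my own, using C11 atomics rather than raw cmpxchg) of an atomic add built from CAS; on failure, the compare-exchange refreshes ‘old’ with the current in-memory value and the loop retries:

#include <stdatomic.h>

int atomic_add(atomic_int *counter, int delta) {
    int old = atomic_load(counter);
    // CAS: succeeds only if *counter still equals old;
    // otherwise old is refreshed and we loop to retry.
    while (!atomic_compare_exchange_weak(counter, &old, old + delta))
        ;  // retry
    return old + delta;
}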
Finite State Machine: Fairness and Livelock
• An FSM is a tuple (Q, q0, Σ, Δ): states, start state, input alphabet, transition function.
• A live state is one that can be returned to infinitely often in the future.
• ‘Bad’ states are those that lead away from the main live behaviour.
[Figure: example FSM] Ignoring the ‘F’, the live states of this FSM include Q5 and Q6.
F has been labelled as a ‘fair’ state. If we also discard the start-up ‘lasso
stem’, its existence changes the live states to just Q2, Q3, Q4. Manual
labelling defines the intended system behaviour.
Any fair state is live, and states from which no fair state can be reached
are not live. [Hence if we also labelled Q5 as F, fairness cannot be achieved.]
Example couplings:
Half coupled: let y = M1 in state A.
Fully coupled: let y = M1 in state A and x = M2 in state 0.
while(true) { // philosopher i
  think();
  wait(fork[i]);
  wait(fork[(i+1) % 5]);
  eat();
  signal(fork[i]);
  signal(fork[(i+1) % 5]);
}
• For now, read ‘wait’ as ‘pick up’ and ‘signal’ as ‘put down’
• See next time for definitions.
• Exercise: Draw out FSM product for 2 or 3 philosophers.
Reachable State Space Algorithm
• 0. Input FSM = (Q, q0, Σ, Δ)
• 1. Initialise reachable R = { q0 }
• 2. while(changes)
R = R ∪ { q’| q’ = Δ(q, σ), q ∈R, σ ∈Σ }
The `while(changes)’ construct makes this a fixed-point
iteration.
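A minimal sketch of this fixed-point iteration in C, assuming a small FSM stored as an explicit transition table (the array names and sizes are my own):

#include <stdbool.h>

#define NSTATES 8
#define NSYMS   2

int delta[NSTATES][NSYMS];   // delta[q][s]: successor of state q on symbol s

void reachable(int q0, bool R[NSTATES]) {
    for (int q = 0; q < NSTATES; q++) R[q] = false;
    R[q0] = true;                        // R = { q0 }
    bool changed = true;
    while (changed) {                    // fixed-point iteration
        changed = false;
        for (int q = 0; q < NSTATES; q++) {
            if (!R[q]) continue;
            for (int s = 0; s < NSYMS; s++) {
                int q2 = delta[q][s];    // q' = Δ(q, σ)
                if (!R[q2]) { R[q2] = true; changed = true; }
            }
        }
    }
}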
Lamport Bakery Algorithm
// Take a ticket on entry to the bakery, one greater
// than the maximum issued (or in use).
void lock(tid)   // tid = thread identifier
{
  Enter[tid] = true;
  Number[tid] = 1 + maxj(Number[j]);
  Enter[tid] = false;
  for (j in 0..NTHREADS-1)
  { // Note: the continue statements operate on their `while’s,
    // not the outer `for’. They are `spins’.
    while (Enter[j]) continue;
    while (Number[j] && (Number[j], j) < (Number[tid], tid)) continue;
  }
}
void unlock(tid) { Number[tid] = 0; }   // discard our ticket
Summary + next time
• We looked at underlying hardware structures (but this
was for completeness rather than for examination
purposes)
• We looked at finite-state models of programs and a
model checker, but do note that today’s tools can
cope only with highly-abstracted models or small sub-
systems of real-world applications.
• Next time
– Access to hardware primitives via O/S
Concurrent systems
Lecture 3: Mutual exclusion, semaphores, and producer-consumer relationships
Dr David J Greaves
(Thanks to Dr Robert N. M. Watson)
Reminder from last time
• Automata models of concurrent systems
• Concurrency hardware mechanisms
From first lecture
From last time: beer-buying example
• Thread 1 (person 1) • Thread 2 (person 2)
1. Look in fridge 1. Look in fridge
2. If no beer, go buy beer 2. If no beer, go buy beer
3. Put beer in fridge 3. Put beer in fridge
• In most cases, this works just fine…
• But if both people look (step 1) before either refills the fridge (step 3)… we’ll end up with
too much beer!
• Obviously more worrying if “look in fridge” is “check reactor”, and “buy beer” is “toggle
safety system” ;-)
Implementing mutual exclusion
• Associate a mutual exclusion lock with each
critical section, e.g. a variable L
– (must ensure use correct lock variable!)
ENTER_CS() = “LOCK(L)”
LEAVE_CS() = “UNLOCK(L)”
• Can implement LOCK() using read-and-set():
LOCK(L) { UNLOCK(L) {
while(!read-and-set(L)) L = 0;
continue; // do nothing }
}
Semaphores
• Even with atomic ops, busy waiting remains inefficient…
– Lock contention with spinning-based solution wastes CPU cycles.
– Better to sleep until resource available.
• Dijkstra (THE, 1968) proposed semaphores
– New type of variable
– Initialized once to an integer value (default 0)
• Supports two operations: wait() and signal()
– Sometimes called down() and up()
– (and originally called P() and V() ... blurk!)
• Can be used for mutual exclusion with sleeping
• Can also be used for condition synchronisation
– Wake up another waiting thread on a condition or event
– E.g., “There is an item available for processing in a queue”
Semaphore implementation
• Implemented as an integer and a queue
wait(sem) {
if(sem > 0) {
sem = sem - 1;
} else suspend caller & add thread to queue for sem
}
signal(sem) {
if no threads are waiting {
sem = sem + 1;
} else wake up some thread on queue
}
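For comparison, a runnable sketch using POSIX semaphores (my own example, not from the slides), with the semaphore initialised to 1 so it behaves as a mutual exclusion lock:

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t mutex;                      // initialised to 1 => mutual exclusion

void *worker(void *arg) {
    sem_wait(&mutex);             // wait(): may sleep rather than spin
    printf("thread %ld in critical section\n", (long)arg);
    sem_post(&mutex);             // signal(): wakes one sleeping waiter
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);       // initialise once, to integer value 1
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}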
[Sequence diagrams: threads A and B (and C) call wait(aSem) and signal(aSem).
With aSem initialised to 0, a wait() blocks the caller until another thread’s
signal() wakes it (“wake-up waiting”); while the value remains 0, waiters
queue on the semaphore (B, then B and C, blocked), each signal() admits one
queued thread into its critical section, and a final signal() with no waiters
increments aSem to 1.]
Producer-consumer problem
• General “pipe” concurrent programming paradigm
– E.g. pipelines in Unix; staged servers; work stealing;
download thread vs. rendering thread in web browser
• Shared buffer B[] with N slots, initially empty
• Producer thread wants to produce items; consumer thread wants to consume them
– (If the producer is paused while the buffer is full, this is called ‘backpressure’.)
[Diagram: circular buffer with in and out indices]
Producer-consumer solution
int buffer[N]; int in = 0, out = 0;
spaces = new Semaphore(N);
items = new Semaphore(0);
[Diagram: circular buffer with in and out indices]
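The slide’s produce/consume bodies were lost in extraction; the standard pattern, in the same pseudocode style, looks like this (a reconstruction, not the verbatim slide):

produce(item) {
    wait(spaces);             // wait for an empty slot (may sleep)
    buffer[in] = item;
    in = (in + 1) % N;
    signal(items);            // announce a newly full slot
}

int consume() {
    wait(items);              // wait for a full slot
    item = buffer[out];
    out = (out + 1) % N;
    signal(spaces);           // announce a newly empty slot
    return item;
}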
Producer-consumer solution
• Use of semaphores for N-resource allocation
– In this case, resource is a slot in the buffer
– spaces allocates empty slots (for producer)
– items allocates full slots (for consumer)
• No explicit mutual exclusion
– Threads will never try to access the same slot at the
same time; if “in == out” then either
• buffer is empty (and consumer will sleep on items), or
• buffer is full (and producer will sleep on spaces)
– NB: in and out are each accessed solely in one of the
producer (in) or consumer (out)
Generalized producer-consumer
• Previously had exactly one producer thread, and
exactly one consumer thread
• More generally might have many threads adding
items, and many removing them
• If so, we do need explicit mutual exclusion
– E.g. to prevent two consumers from trying to remove
(and consume) the same item
– (Race conditions due to concurrent use of in and out
precluded when just one thread on each end)
• Can implement with one more semaphore…
Generalized P-C solution
int buffer[N]; int in = 0, out = 0;
spaces = new Semaphore(N);
items = new Semaphore(0);
guard = new Semaphore(1); // for mutual exclusion
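Again the bodies were lost in extraction; a reconstruction in the same style, with guard providing the mutual exclusion described above (note guard is acquired after spaces/items, so no thread sleeps while holding it):

produce(item) {
    wait(spaces);
    wait(guard);              // exclude other producers and consumers
    buffer[in] = item;
    in = (in + 1) % N;
    signal(guard);
    signal(items);
}

int consume() {
    wait(items);
    wait(guard);              // exclude other consumers and producers
    item = buffer[out];
    out = (out + 1) % N;
    signal(guard);
    signal(spaces);
    return item;
}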
Summary + next time
• Implementing mutual exclusion: hardware support for
atomicity and inter-processor interrupts
• Semaphores for mutual exclusion, condition synchronisation,
and resource allocation
• Two-party and generalised producer-consumer relationships
• Invariants and locks
• Next time:
– Multi-Reader Single-Writer (MRSW) locks
– Starvation and fairness
– Alternatives to semaphores/locks
– Concurrent primitives in practice
Concurrent systems
Lecture 4: CCR, monitors, and
concurrency in practice.
Dr David J Greaves
(Thanks to Dr Robert N. M. Watson)
Reminder from last time
• Implementing mutual exclusion: hardware
support for atomicity and inter-processor
interrupts
• Semaphores for mutual exclusion, condition
synchronisation, and resource allocation
• Two-party and generalised producer-
consumer relationships
• Invariants and locks
From last time: Semaphores summary
• Powerful abstraction for implementing concurrency control:
– mutual exclusion & condition synchronization
• Better than read-and-set()… but correct use requires
considerable care
– e.g. forget to wait(), can corrupt data
– e.g. forget to signal(), can lead to infinite delay
– generally get more complex as add more semaphores
• Used internally in some OSes and libraries, but generally
deprecated for other mechanisms…
Multiple-Readers Single-Writer (MRSW)
• Another common synchronisation paradigm is MRSW
– Shared resource accessed by a set of threads
• e.g. cached set of DNS results
– Safe for many threads to read simultaneously, but a writer
(updating) must have exclusive access
– MRSW locks have read lock and write lock operations
– Mutual exclusion vs. data stability
• Simple implementation uses two semaphores
• First semaphore is a mutual exclusion lock (mutex)
– Any writer must wait to acquire this
• Second semaphore protects a reader count
– Reader count incremented whenever a reader enters
– Reader count decremented when a reader exits
– First reader acquires mutex; last reader releases mutex.
Simplest MRSW solution
int nr = 0; // number of readers
rSem = new Semaphore(1); // protects access to nr
wSem = new Semaphore(1); // protects writes to data
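The slide’s lock bodies were lost in extraction; the standard two-semaphore pattern described above looks like this (a reconstruction):

read_lock() {
    wait(rSem);                 // protect nr
    nr = nr + 1;
    if (nr == 1) wait(wSem);    // first reader locks out writers
    signal(rSem);
}

read_unlock() {
    wait(rSem);
    nr = nr - 1;
    if (nr == 0) signal(wSem);  // last reader admits writers
    signal(rSem);
}

write_lock()   { wait(wSem); }  // writers need exclusive access
write_unlock() { signal(wSem); }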
A fairer MRSW solution
int nr = 0; // number of readers
rSem = new Semaphore(1); // protects access to nr
wSem = new Semaphore(1); // protects writes to data
turn = new Semaphore(1); // write is awaiting a turn
Conditional critical regions (CCRs):
shared int A, B, C;
region A, B {
  await( /* arbitrary condition */ );
  // critical code using A and B
}
Monitor condition variable operations:
wait(cv) {
  suspend thread and add it to the queue for cv,
  release monitor lock;
}
signal(cv) {
  if any threads queued on cv, wake one thread;
}
broadcast(cv) {
  wake all threads queued on cv;
}
Monitor Producer-Consumer solution?
monitor ProducerConsumer {
  int in, out, buffer[N];
  condition notfull = TRUE, notempty = FALSE;

  procedure produce(item) {
    if ((in-out) == N) wait(notfull);     // if buffer is full, wait for consumer
    buffer[in % N] = item;
    if ((in-out) == 0) signal(notempty);  // if buffer was empty, signal the consumer
    in = in + 1;
  }

  procedure int consume() {
    if ((in-out) == 0) wait(notempty);    // if buffer is empty, wait for producer
    item = buffer[out % N];
    if ((in-out) == N) signal(notfull);   // if buffer was full, signal the producer
    out = out + 1;
    return(item);
  }

  /* init */ { in = out = 0; }
}
Does this work?
• Depends on implementation of wait() & signal()
• Imagine two threads, T1 and T2
– T1 enters the monitor and calls wait(C) – this suspends T1,
places it on the queue for C, and unlocks the monitor
– Next T2 enters the monitor, and invokes signal(C)
– Now T1 is unblocked (i.e. capable of running again)…
– … but can only have one thread active inside a monitor!
• If we let T2 continue (signal-and-continue), T1 must queue for
re-entry to the monitor
– And no guarantee it will be next to enter
• Otherwise T2 must be suspended (signal-and-wait), allowing
T1 to continue…
Note: C is either of our two condition variables.
Signal-and-Wait (“Hoare Monitors”)
[Diagram slide: content lost in extraction]
Signal-and-Continue: same code as the ProducerConsumer monitor above.
Signal-and-Continue example (1)
[Timeline diagram: P1 enters the monitor but waits as !(not full); P2 tries to
enter and is enqueued on E; C1 enters, removes an item, and signals “not full”;
P1 wakes up but must queue to re-enter the monitor; P2 enters first, inserts an
item, and sets !(not full) again; P1 then resumes despite !(not full).
Legend: thread in monitor / thread waits for monitor / thread waits for
condition; buffer full = !(not full), buffer has space = (not full).]
With signal-and-continue semantics, must use while instead of if, in case the
condition becomes false again while waiting.
Signal-and-Continue example (2)
• Consider multiple producer-consumer threads
1. P1 enters. Buffer is full so blocks on queue for C
2. C1 enters.
3. P2 tries to enter; occupied, so queues on E
4. C1 continues, consumes, and signals C (“notfull”)
5. P1 unblocks; monitor occupied, so queues on E
6. C1 exits, allowing P2 to enter
7. P2 fills buffer, and exits monitor
8. P1 resumes and tries to add item – BUG!
• Hence must re-test condition:
i.e. while ((in - out) == N) wait(notfull);
if() replaced with while() for conditions
Concurrency in practice
• Seen a number of abstractions for concurrency
control
– Mutual exclusion and condition synchronization
• Next let’s look at some concrete examples:
– POSIX pthreads (C/C++ API)
– FreeBSD kernels
– Java.
Example: pthreads (1)
• Standard (POSIX) threading API for C, C++, etc
• mutexes, condition variables, and barriers
• Mutexes are essentially binary semaphores:
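(The code listing on this slide was lost in extraction; the core of the mutex API looks like this:)

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void critical(void) {
    pthread_mutex_lock(&m);     // blocks until the mutex is acquired
    /* ... critical section ... */
    pthread_mutex_unlock(&m);
}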
Example: pthreads (2)
• Condition variables are Mesa-style:
int pthread_cond_init(pthread_cond_t *cond, ...);
int pthread_cond_wait(pthread_cond_t *cond,
pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
worker() {
while(!done) {
// do work for this round
pthread_barrier_wait(&B);
}
}
Example: FreeBSD kernel
• Kernel provides spin locks, mutexes, conditional variables,
reader-writer + read-mostly locks
– Semantics (roughly) modelled on POSIX threads
• A variety of deferred work primitives
• “Fully preemptive” and highly threaded
– (e.g., interrupt processing in threads)
– Interesting debugging tools, such as DTrace, lock contention
measurement, and lock-order checking
• Further details are in 2019’s
lecture 8 ...
For modern C++ support, see https://en.cppreference.com/w/cpp/thread
Example: Java synchronization (1)
• Inspired by monitors – objects have intrinsic locks
• Synchronized methods:
• Synchronized statements:
public void myMethod() throws ...{
synchronized(this) {
// This code runs with the intrinsic lock held.
}}
Example: Java synchronization (2)
• Objects have condition variables for guarded blocks
• wait() puts the thread to sleep:
Parallel C++ Extensions: Cilk and OpenMP
• Cilk allowed a function call to be ‘spawned’ to another worker
and requires all the results to be ready at the ‘sync’ boundary.
• OpenMP embeds parallelisation suggestions in #pragma
directives.
// Cilk C/C++
cilk int fib(int n) {
  if (n < 2) return n;
  else {
    int x = spawn fib(n-1);
    int y = spawn fib(n-2);
    sync;
    return x + y;
  }
}

// OpenMP C/C++
double sum_array(double A[], int len)
{
  double sum = 0.0;
  #pragma omp parallel for reduction(+:sum)  // reduction avoids a race on sum
  for (int i = 0; i < len; i++)
    sum += Normalise(A[i]);
  return sum;
}
Summary + next time
• Multi-Reader Single-Writer (MRSW) locks
• Alternatives to semaphores/locks:
– Conditional critical regions (CCRs)
– Monitors
– Condition variables
– Signal-and-wait vs. signal-and-continue semantics
• Concurrency primitives in practice
• Concurrency primitives wrap-up
• Next time:
– Problems with concurrency: deadlock, livelock, priorities
– Resource allocation graphs; deadlock {prevention, detection, recovery}
– Priority and scheduling; priority inversion; (auto) parallelism limits.
Concurrent systems
Lecture 5: Liveness and Priority Guarantees
Dr David J Greaves
(Thanks to Dr Robert N. M. Watson)
Reminder from last time
• Multi-Reader Single-Writer (MRSW) locks
• Alternatives to semaphores/locks:
– Conditional critical regions (CCRs)
– Monitors
– Condition variables
– Signal-and-wait vs. signal-and-continue semantics
• Concurrency primitives in practice
• Concurrency primitives wrap-up
From last time: primitives summary
• Concurrent systems require means to ensure:
– Safety (mutual exclusion in critical sections), and
– Progress (condition synchronization)
• Spinlocks (busy wait); semaphores; CCRs and monitors
– Hardware primitives for synchronisation
– Signal-and-Wait vs. Signal-and-Continue
• Many of these are still used in practice
– Subtle minor differences can be dangerous
– Require care to avoid bugs – e.g., “lost wakeups”
• More detail on implementation in additional material on web page.
[Diagram: resource allocation graph over resources Ra, Rb, Rc, Rd containing a cycle – deadlock!]
Resource allocation graphs (2)
• Can generalize to resources which can have K
distinct users (c/f semaphores)
• Absence of a cycle means no deadlock…
– but presence only means may encounter deadlock, e.g.
[Diagram: threads T1–T4 and resources Ra(1), Rb(2), Rc(2), Rd(1), each resource
in the quantity shown; the graph contains a cycle, yet deadlock need not occur
because some resources have more than one instance]
Deadlock Static Prevention
1. Mutual Exclusion: resources have bounded #owners
– Could always allow access… but probably unsafe ;-(
– However can help e.g. by using MRSW locks
2. Hold-and-Wait: can get Rx and wait for Ry
– Require that we request all resources simultaneously; deny the
request if any resource is not available now
– But must know maximal resource set in advance = hard?
3. No Preemption: keep Rx until you release it
– Stealing a resource generally unsafe (but see later)
4. Circular Wait: cyclic dependency
– Impose a partial order on resource acquisition
– Can work: but requires programmer discipline
– Lock order enforcement rules used in many systems e.g., FreeBSD
WITNESS – static and dynamic orders checked
Example: Dining Philosophers
• 5 philosophers, 5 forks, round table…
Semaphore forks[] = new Semaphore[5];
// Deadlock-prone: every philosopher may pick up their left fork first
while(true) { // philosopher i
  think();
  wait(fork[i]);
  wait(fork[(i+1) % 5]);
  eat();
  signal(fork[i]);
  signal(fork[(i+1) % 5]);
}

// Deadlock-free: impose a partial order on fork acquisition
while(true) { // philosopher i
  think();
  first = MIN(i, (i+1) % 5);
  second = MAX(i, (i+1) % 5);
  wait(fork[first]);
  wait(fork[second]);
  eat();
  signal(fork[second]);
  signal(fork[first]);
}
Deadlock Dynamic Avoidance
• Prevention aims for deadlock-free “by design”
• Deadlock avoidance is a dynamic scheme:
– Assumption: We know maximum possible resource allocation
for every process / thread
– Assumption: A process granted all desired resources will
complete, terminate, and free its resources
– Track actual allocations in real-time
– When a request is made, only grant if guaranteed no deadlock
even if all others take max resources
• E.g. Banker’s Algorithm
– Not really useful in general as need a priori knowledge of
#processes/threads, and their max resource needs.
Deadlock detection (anticipation)
• Deadlock detection is a dynamic scheme that determines if deadlock
exists (or would exist if we granted a request)
– Principle: At some moment in execution, examine resource allocations and the
graph
– Determine if there is at least one plausible sequence of events in which all
threads could make progress
– I.e., check that we are not in an unsafe state in which no further sequences can
complete without deadlock
• When only a single instance of each resource, can explicitly check for a
cycle:
– Keep track which object each thread is waiting for
– From time to time, iterate over all threads and build the resource allocation
graph
– Run a cycle detection algorithm on the graph: O(n²)
• Or use Banker’s Alg if have multi-instance resources (more difficult)
Banker’s Algorithm (1)
• Have m distinct resources and n threads
• V[0:m-1], vector of currently available resources
• A, the n x m resource allocation matrix, and
R, the n x m (outstanding) request matrix
– A[i,j] is the number of objects of type j owned by thread i
– R[i,j] is the number of objects of type j needed by thread i
• Proceed by successively marking rows in A for
threads that are not part of a deadlocked set
– If we cannot mark all rows of A we have deadlock
Optimistic assumption: if we can fulfil thread i’s request R[i], then it will run
to completion and release held resources for other threads to allocate.
Banker’s Algorithm (2)
• Mark all zero rows of A (since a thread holding zero
resources can’t be part of deadlock set)
• Initialize a working vector W[0:m-1] to V
– W[] describes any free resources at start, plus any
resources released by a hypothesized sequence of
satisfied threads freeing and terminating
• Select an unmarked row i of A s.t. R[i] <= W
– (i.e. find a thread whose request can be satisfied)
– Set W = W + A[i]; mark row i, and repeat
• Terminate when no such row can be found
– Unmarked rows (if any) are in the deadlock set
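A compact sketch of the algorithm in C (my own rendering; array names follow the slides, with N threads and M resource types):

#include <stdbool.h>

#define M 3     // resource types
#define N 5     // threads

// Returns true iff no deadlock; marked[i] is set for threads able to finish.
bool detect(int V[M], int A[N][M], int R[N][M], bool marked[N]) {
    int W[M];
    for (int j = 0; j < M; j++) W[j] = V[j];   // working vector := available

    // A thread holding zero resources cannot be part of a deadlock set.
    for (int i = 0; i < N; i++) {
        marked[i] = true;
        for (int j = 0; j < M; j++)
            if (A[i][j] != 0) marked[i] = false;
    }

    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < N; i++) {
            if (marked[i]) continue;
            bool fits = true;                  // is R[i] <= W ?
            for (int j = 0; j < M; j++)
                if (R[i][j] > W[j]) { fits = false; break; }
            if (fits) {                        // i can run to completion:
                for (int j = 0; j < M; j++)
                    W[j] += A[i][j];           // it frees what it holds
                marked[i] = true;
                progress = true;
            }
        }
    }
    for (int i = 0; i < N; i++)
        if (!marked[i]) return false;          // unmarked rows: deadlock set
    return true;
}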
Banker’s Algorithm: Example 1
• Five threads and three resources (none free):

       A (allocated)   R (requested)   V (available)
       X  Y  Z         X  Y  Z         X  Y  Z
  T0   0  1  0         0  0  0         0  0  0
  T1   2  0  0         2  0  2
  T2   3  0  3         0  0  0
  T3   2  1  1         1  0  0
  T4   0  0  1         0  0  2

• W starts equal to V = (0,0,0). T0’s request (0,0,0) can be satisfied, so
mark T0 and add its allocation: W = (0,1,0). Continuing, T2, T3, T4 and
finally T1 can each be satisfied in turn, so all rows become marked: no
deadlock.

Banker’s Algorithm: Example 2
• As before, but T2 now requests (0,0,1):

       A (allocated)   R (requested)
       X  Y  Z         X  Y  Z
  T0   0  1  0         0  0  0
  T1   2  0  0         2  0  2
  T2   3  0  3         0  0  1
  T3   2  1  1         1  0  0
  T4   0  0  1         0  0  2

• After marking T0, W = (0,1,0) – and now we cannot find a row with
R <= W!! T1–T4 remain unmarked: they form the deadlock set.
[Diagram: example for parallel speedup – 35 units of work run across four
servers, showing data dependency arcs as typically found; arcs implicitly
exist between all adjacent work-unit boxes.]

// Map-reduce style works nicely:
// - Map: a function or expression is applied at each index point or for
//   each member of a set.
// - Reduce: an associative operator (xor here) joins up all of the results
//   using an arbitrary tree structure.
public static int associative_reduction_example(int starting)
{ int vr = 0;
  for (int i=0;i<15;i++)                    // or also i+=4
  { int vx = (i+starting)*(i+3)*(i+5);      // mapped computation
    vr ^= ((vx&128)>0 ? 1:0);               // associative reduction
  }
  return vr;
}
Summary + next time
• Next time:
– Concurrency without shared data
– Active objects; message passing
– Composite operations; transactions
– ACID properties; isolation; serialisability
Concurrent systems
Lecture 6: Concurrency without shared data, composite operations
and transactions, and serialisability
Dr David J Greaves
(Thanks to Dr Robert N. M. Watson)
Reminder from last time
• Liveness properties
• Deadlock (requirements; resource allocation graphs; detection;
prevention; recovery)
• The Dining Philosophers
• Priority inversion
• Priority inheritance
Concurrency is so hard!
If only there were some way that programmers could accomplish useful concurrent
computation without…
Observation: the code of exactly one thread, and the data that
only it accesses, effectively experience mutual exclusion.
Producer-Consumer in Ada
task-body ProducerConsumer is
...
loop
  SELECT
    when count < buffer-size    -- clause is active only when condition is true
      ACCEPT insert(item) do    -- ACCEPT dequeues a client request and
        -- insert item into buffer    performs the operation
      end;
      count++;
  or
    when count > 0              -- single thread: no need for mutual exclusion
      ACCEPT consume(item) do
        -- remove item from buffer
      end;
      count--;
  end SELECT                    -- non-deterministic choice between a set of
end loop                        -- guarded ACCEPT clauses
Message passing
• Dynamic invocations between threads can be thought of
as general message passing
– Thread X can send a message to Thread Y
– Contents of message can be arbitrary data values
• Can be used to build Remote Procedure Call (RPC)
– Message includes name of operation to invoke along with
any parameters
– Receiving thread checks operation name, and invokes the
relevant code
– Return value(s) sent back as another message
• (Called Remote Method Invocation (RMI) in Java)
We will discuss message passing and RPC in detail in the 2nd half; a taster
now, as these ideas apply to local, not just distributed, systems.
Message passing semantics
• Can conceptually view sending a message to be similar to
sending an email:
1. Sender prepares contents locally, and then sends
2. System eventually delivers a copy to receiver
3. Receiver checks for messages
• In this model, sending is asynchronous:
– Sender doesn’t need to wait for message delivery
– (but they may, of course, choose to wait for a reply)
– Bounded FIFO may ultimately apply sender back pressure
• Receiving is also asynchronous:
– messages first delivered to a mailbox, later retrieved
– message is a copy of the data (i.e. no actual sharing)
Synchronous Message Passing
• FSM view: both (all) participating FSMs execute the message passing primitive
simultaneously.
• Send and receive operations must be part of edge guard (before the slash).
Asynchronous Message Passing
Message passing: summary
• A way of sidestepping (at least some of) the issues with
shared memory concurrency
– No direct access to data => no data race conditions
– Threads choose actions based on message
• Explicit message passing can be awkward
– Many weird and wonderful languages ;-)
• Can also use with traditional languages, e.g.
– Transparent messaging via RPC/RMI
– Scala, Kilim (actors on Java), Bastion for Rust, …
We have eliminated some of the issues associated with shared memory, but
these are still concurrent programs subject to deadlock, livelock, etc.
Composite operations
• So far have seen various ways to ensure safe concurrent access to
a single object
– e.g. monitors, active objects, message passing
• More generally want to handle composite operations:
– i.e. build systems which act on multiple distinct objects
• As an example, imagine an internal bank system which allows
account access via three method calls:
int amount = getBalance(account);
bool credit(account, amount);
bool debit(account, amount);
Composite operations
• Consider two concurrently executing client threads:
– One wishes to transfer 100 quid from the savings account to the
current account
– The other wishes to learn the combined balance
// thread 1: transfer 100 // thread 2: check balance
// from savings->current s = getBalance(savings);
debit(savings, 100); c = getBalance(current);
credit(current, 100); tot = s + c;
Problems with composite operations
Two separate kinds of problem here:
1. Insufficient Isolation
– Individual operations being atomic is not enough
– E.g., want the credit & debit making up the transfer to
happen as one operation
– Could fix this particular example with a new transfer()
method, but not very general ...
2. Fault Tolerance
– In the real world, programs (or systems) can fail
– Need to make sure we can recover safely
Transactions
• Want programmer to be able to specify that a set of operations should
happen atomically, e.g.
// transfer amt from A -> B
transaction {
if (getBalance(A) > amt) {
debit(A, amt);
credit(B, amt);
return true;
} else return false;
}
ACID Properties
Want committed transactions to satisfy four properties:
• Atomicity: either all or none of the transaction’s operations are performed
– Programmer doesn’t need to worry about clean up
• Consistency: a transaction transforms the system from one consistent
state to another – i.e., preserves invariants
– Programmer must ensure e.g. conservation of money
• Isolation: each transaction executes [as if] isolated from the concurrent
effects of others
– Can ignore concurrent transactions (or partial updates)
• Durability: the effects of committed transactions survive subsequent
system failures
– If system reports success, must ensure this is recorded on disk
Isolation
• To ensure a transaction executes in isolation could just have a
server-wide lock… simple!
// transfer amt from A -> B
transaction { // acquire server lock
if (getBalance(A) > amt) {
debit(A, amt);
credit(B, amt);
return true;
} else return false;
} // release server lock
History graphs: bad schedules
[Diagram: T1: START → S.getBalance → C.getBalance → COMMIT, with conflicting
operations from T2 (S.debit, C.credit) interleaved between them]

Isolation – serialisability
T1: S.getBalance, C.getBalance
T2: S.debit, C.credit
The transaction system must ensure that, regardless of any actual concurrent
execution used to improve performance, only results consistent with
serialisable orderings are visible to the transaction programmer.
Summary + next time
• Concurrency without shared data (Active Objects)
• Message passing, actor model (Occam, Erlang)
• Composite operations; transactions; ACID properties
• Isolation and serialisability
• History graphs; good (and bad) schedules
Concurrent systems
Lecture 7: Isolation vs. Strict Isolation,
2-Phase Locking (2PL), Time Stamp Ordering (TSO), and
Optimistic Concurrency Control (OCC)
Dr David J Greaves
(Thanks to Dr Robert N. M. Watson)
Reminder from last time
• Concurrency without shared data
– Active objects
• Message passing; the actor model
– Occam, Erlang
• Composite operations
– Transactions, ACID properties
– Isolation and serialisability
• History graphs; good (and bad) schedules
Last time: isolation – serialisability
• The idea of executing transactions serially (one after the other) is a useful model
– We want to run transactions concurrently
– But the result should be as if they ran serially
• Consider two transactions, T1 and T2
T1 transaction { T2 transaction {
s = getBalance(S); debit(S, 100);
c = getBalance(C); credit(C, 100);
return (s + c); return true;
} }
• If assume individual operations are atomic, then there are six possible
ways the operations can interleave…
Isolation allows transaction programmers to reason about the interactions
between transactions trivially: they appear to execute in serial.
Transaction systems execute transactions concurrently for performance and rely
on the definition of serialisability to decide if an actual execution schedule
is allowable.
From last lecture
Isolation – serialisability
T1: S.getBalance C.getBalance
T2: S.debit C.credit
Two-phase locking (2PL)
• Associate a lock with every object
– Could be mutual exclusion, or MRSW
• Transactions proceed in two phases:
– Expanding Phase: during which locks are acquired but
none are released
– Shrinking Phase: during which locks are released, and no
further are acquired
• Operations on objects occur in either phase,
providing appropriate locks are held
– Guarantees serializable execution
2PL example
// transfer amt from A -> B
transaction {
  readLock(A);              // acquire a read (shared) lock before reading A
  if (getBalance(A) > amt) {
    // Expanding phase:
    writeLock(A);           // upgrade to a write (exclusive) lock before writing A
    debit(A, amt);
    writeLock(B);           // acquire a write (exclusive) lock before writing B
    credit(B, amt);
    // Shrinking phase:
    writeUnlock(B);         // release locks when done to allow concurrency
    addInterest(A);
    writeUnlock(A);
    tryCommit(return=true);
  } else {
    readUnlock(A);
    tryCommit(return=false);
  }
}
Problems with 2PL
• Requires knowledge of which locks required:
– Complexity arises if complex control flow inside a transaction
– Some transactions look up objects dynamically
– But can be automated in many systems
– User may declare affected objects statically to assist checker tool or have built-in
mechanisms in high-level language (HLL) compilers.
• Risk of deadlock:
– Can attempt to impose a partial order
– Or can detect deadlock and abort, releasing locks
– (this is safe for transactions due to rollback, which is nice)
• Non-Strict Isolation: releasing locks during execution means others can
access those objects
– e.g. T1 updates B, then releases write lock; now T2 can read or overwrite the
uncommitted value
– Hence T2’s fate is tied to T1 (whether commit or abort)
– Can fix with strict 2PL: hold all locks until transaction end
Strict(er) 2PL example
// transfer amt from A -> B
transaction {
  readLock(A);
  if (getBalance(A) > amt) {
    // Expanding phase:
    writeLock(A);
    debit(A, amt);
    writeLock(B);
    credit(B, amt);
    addInterest(A);         // retain lock on B here to ensure strict isolation
    tryCommit(return=true);
  } else {
    readUnlock(A);
    tryCommit(return=false);
  }
} on commit, abort {        // unlock-all phase: hold all locks to transaction end
  unlock(A);
  unlock(B);
}
// By holding locks longer, Strict 2PL risks greater contention
2PL: rollback
• Recall that transactions can abort
– Could be due to run-time conflicts (non-strict 2PL), or
could be programmed (e.g. on an exception)
• Using locking for isolation works, but means that
updates are made ‘in place’
– i.e. once acquire write lock, can directly update
– If transaction aborts, need to ensure no visible effects
• Rollback is the process of returning the world to
the state it was in before the transaction started
– I.e., to implement atomicity: all happened, or none.
Why might a transaction abort?
• Some failures are internal to transaction systems:
– Transaction T2 depends on T1, and T1 aborts
– Deadlock is detected between two transactions
– Memory is exhausted or a system error occurs
• Some are programmer-triggered:
– Transaction self-aborted – e.g., debit() failed due to
inadequate balance
• Some failures must be programmer visible
• Others may simply trigger retry of the transaction
Implementing rollback: undo
• One strategy is to undo operations, e.g.
– Keep a log of all operations, in order: O1, O2, .. On
– On abort, undo changes of On, O(n-1), .. O1
• Must know how to undo an operation:
– Assume we log both operations and parameters
– Programmer can provide an explicit counter action
• UNDO(credit(A, x) ⇒ debit(A, x));
• May not be sufficient (e.g. setBalance(A, x))
– Would need to record previous balance, which we may
not have explicitly read within transaction…
Implementing rollback: copy
• A more brute-force approach is to take a copy of an
object before [first] modification
– On abort, just revert to original copy
• Has some advantages:
– Doesn’t require programmer effort
– Undo is simple, and can be efficient (e.g. if there are many
operations, and/or they are complex)
• However can lead to high overhead if objects are
large … and may not be needed if don’t abort!
– Can reduce overhead with partial copying
Timestamp ordering (TSO)
• 2PL and Strict 2PL are widely used in practice
– But can limit concurrency (certainly the latter)
– And must be able to deal with deadlock
• Time Stamp Ordering (TSO) is an alternative approach:
– As a transaction begins, it is assigned a timestamp – the proposed
eventual (total) commit order / serialisation
– Timestamps are comparable, and unique (can think of as e.g. current
time – or a logical incrementing number)
– Every object O records the timestamp of the last transaction to
successfully access (read? write?) it: V(O)
– T can access object O iff V(T) >= V(O), where V(T) is the timestamp of T
(otherwise rejected as “too late”)
– If T is non-serialisable with timestamp, abort and roll back
Timestamps allow us to explicitly track new “happens-before”
edges, detecting (and preventing) violations
TSO example 1
T1 transaction { T2 transaction {
s = getBalance(S); debit(S, 100);
c = getBalance(C); credit(C, 100);
return = s + c; return true;
} }
Implementing OCC (2)
• NB: There are many approaches following this basic technique.
• Various efficient schemes for shadowing
– e.g. write buffering, page-based copy-on-write.
• All complexity resides in the two-step validator that must reflect a
serialisable commit order in its ultimate side effects.
• Read validation:
– Must ensure that all versions of data read by T (all shadows) were valid at some
particular time t
– This becomes the tentative start time for T
• Serialisability validation:
– Must ensure that there are no conflicts with any committed transactions which
have a later start time
• Optimality matching:
– For a batch, must choose a serialisation that commits as many as possible, possibly
weighted by other heuristics, such as favouring those rejected on a previous attempt.
OCC example (1)
• Validator keeps track of last k validated transactions,
their timestamps, and the objects they updated
Transaction | Validation Timestamp | Objects Updated | Writeback Done?
T5          | 10                   | A, B, C         | Yes
T6          | 11                   | D               | Yes
T7          | 12                   | A, E            | No
Isolation & concurrency: Summary
• 2PL explicitly locks items as required, then releases
– Guarantees a serializable schedule
– Strict 2PL avoids cascading aborts
– Can limit concurrency & prone to deadlock
• TSO assigns timestamps when transactions start
– Cannot deadlock, but may miss serializable schedules
– Suitable for distributed/decentralized systems.
• OCC executes with shadow copies, then validates
– Validation assigns timestamps when transactions end
– Lots of concurrency & admits many serializable schedules
– No deadlock but potential livelock when contention is high.
• Differing tradeoffs between optimism and concurrency, but also potential starvation,
livelock, and deadlock.
• Ideas like TSO/OCC will recur in Distributed Systems.
Summary + next time
• History graphs; good (and bad) schedules
• Isolation vs. strict isolation; enforcing isolation
• Two-phase locking; rollback
• Timestamp ordering (TSO)
• Optimistic concurrency control (OCC)
• Isolation and concurrency summary
• Next time:
– Transactional durability: crash recovery and logging,
– Lock-free programming,
– Transactional memory (if time permits).
Concurrent systems
Lecture 8a: Durability & crash recovery.
Lecture 8b: lock-free programming & transactional memory.
Dr David J Greaves
(Thanks to Dr Robert N. M. Watson)
Reminder from last time
• Two-phase locking (2PL); rollback
• Timestamp ordering (TSO)
• Optimistic concurrency control (OCC)
This time
• Transaction durability: crash recovery, logging
– Write-ahead logging
– Checkpoints
– Recovery and Rollback
• Advanced topics (as time permits)
– Lock-free programming
– Transactional memory
Crash Recovery & Logging
• Transactions require ACID properties
– So far have focused on I (and implicitly C).
• How can we ensure Atomicity & Durability?
– Need to make sure that a transaction is always done entirely or
not at all
– Need to make sure that a transaction reported as committed
remains so, even after a crash.
• Consider for now a fail-stop model:
– If system crashes, all in-memory contents are lost
– Data on disk, however, remains available after reboot
The small print: we must keep in mind the limitations of fail-stop, even as we assume it.
Failing hardware/software do weird stuff. Pay attention to hardware price differentiation.
Using persistent storage
• Simplest “solution”: write all updated objects to disk on
commit, read back on reboot
– Doesn’t work, since crash could occur during write
– Can fail to provide Atomicity and/or Consistency
• Instead split update into two stages
1. Write proposed updates to a write-ahead log
2. Write actual updates
• Crash during #1 => no actual updates done
• Crash during #2 => use log to redo, or undo.
• Recall transactions can also abort (and cascading aborts), so log
can help undo the changes made.
Write-ahead logging
• Log: an ordered, append-only file on disk
• Contains entries like <txid, obj, op, old, new>
– ID of transaction, object modified, (optionally) the operation
performed, the old value and the new value
– This means we can both “roll forward” (redo operations) and
“rollback” (undo operations)
• When persisting a transaction to disk:
– First log a special entry <txid, START>
– Next log a number of entries to describe operations
– Finally log another special entry <txid, COMMIT>
• We build composite-operation atomicity from fundamental atomic
operation single-sector write.
– Much like building high-level primitives over LL/SC or CAS!
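As an illustration (the field names and widths are my own, not a real system’s format), a log entry might be laid out like this:

#include <stdint.h>

// <txid, obj, op, old, new> — illustrative layout only
struct log_entry {
    uint64_t txid;      // ID of transaction
    uint32_t obj;       // object modified
    uint32_t op;        // (optionally) the operation performed
    uint64_t old_val;   // old value: supports rollback (undo)
    uint64_t new_val;   // new value: supports roll-forward (redo)
};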
Using a write-ahead log
• When executing transactions, perform updates to objects in memory with
lazy write back
– I.e. the OS will normally delay all disk writes to improve efficiency.
• Invariant: write log records before corresponding data.
• But when wish to commit a transaction, must first synchronously flush a
commit record to the log
– Assume there is an fsync() or fdatasync() operation or similar which allows us
to force data out to disk.
– Only report transaction committed when fsync() returns.
• Can improve performance by delaying flush until we have a number of
transaction to commit - batching
– Hence at any point in time we have some prefix of the write-ahead log on disk,
and the rest in memory.
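A sketch of the commit path under these assumptions (my own example; error handling omitted; fsync() is the POSIX call mentioned above):

#include <unistd.h>
#include <stddef.h>

// Append the operation records and the commit record, then force the
// log to disk; only after fsync() returns may we report "committed".
void log_commit(int log_fd, const void *ops, size_t ops_len,
                const void *commit_rec, size_t commit_len) {
    write(log_fd, ops, ops_len);            // may sit in OS buffers
    write(log_fd, commit_rec, commit_len);
    fsync(log_fd);                          // synchronous flush
}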
The Big Picture
[Diagram: RAM acts as a cache of the disk (e.g. no in-memory copy of z); the
log is conceptually infinite, and spans RAM & disk. At recovery:
T2 – not active at checkpoint, but has since committed, with its commit record
in the log: REDO.
T3 – active at checkpoint; still in progress at the crash: UNDO.
T4 – active at checkpoint; has since committed, with its record in the log: REDO.
T5 – not active at checkpoint, and still in progress: UNDO.]
Recovery algorithm
• Initialize undo set U = { set of active txactions }
• Also have redo set R, initially empty.
• Walk log forward as indicated by checkpoint record:
– If see a START record, add transaction to U
– If see a COMMIT record, move transaction from U->R
• When hit end of log, perform undo:
– Walk backward and undo all records for all Tx in U
• When reach checkpoint timestamp again, Redo:
– Walk forward, and re-do all records for all Tx in R
• After recovery, we have effectively checkpointed
– On-disk store is consistent, so can (generally) truncate the log.
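A sketch of this in C over an in-memory array of parsed records (the record types and the undo_op/redo_op helpers are hypothetical names of my own):

#include <stdbool.h>

typedef enum { REC_START, REC_OP, REC_COMMIT } kind_t;
typedef struct { kind_t kind; int txid; /* op details ... */ } rec_t;

void undo_op(const rec_t *r);   // re-apply old value (hypothetical helper)
void redo_op(const rec_t *r);   // re-apply new value (hypothetical helper)

// log[0..n) holds records from the checkpoint onwards.
// U[] starts as the set of transactions active at the checkpoint; R[] empty.
void recover(const rec_t *log, int n, bool U[], bool R[]) {
    for (int i = 0; i < n; i++) {                 // walk forward
        if (log[i].kind == REC_START)  U[log[i].txid] = true;
        if (log[i].kind == REC_COMMIT) { U[log[i].txid] = false;
                                         R[log[i].txid] = true; }
    }
    for (int i = n - 1; i >= 0; i--)              // undo pass, backwards
        if (log[i].kind == REC_OP && U[log[i].txid]) undo_op(&log[i]);
    for (int i = 0; i < n; i++)                   // redo pass, forwards
        if (log[i].kind == REC_OP && R[log[i].txid]) redo_op(&log[i]);
}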
Transactions: Summary
• Standard mutual exclusion techniques not programmer friendly
when dealing with >1 object
– intricate locking (& lock order) required, or
– single coarse-grained lock, limiting concurrency
• Transactions allow us a better way:
– potentially many operations (reads and updates) on many objects, but
should execute as if atomically
– underlying system deals with providing isolation, allowing safe
concurrency, and even fault tolerance!
• Appropriate only if operations are “transactional”
– E.g., discrete events in time, as must commit to be visible
• Transactions are used both in databases and filesystems.
Advanced Topics
• Will briefly look at two advanced topics
– lock-free data structures, and
– transactional memory
• Then, next time, Distributed Systems
Lock-free programming
• What’s wrong with locks?
– Difficult to get right (if locks are fine-grained)
– Don’t scale well (if locks too coarse-grained)
– Don’t compose well (deadlock!)
– Poor cache behavior (e.g. convoying)
– Priority inversion
– And can be expensive
• Lock-free programming involves getting rid of locks ... but not at the cost
of safety!
• Recall TAS, CAS, LL/SC from our early lecture: what if we used them to
implement something other than locks?
Assumptions
• We have a cache-consistent shared-memory system (and we
understand the sequential consistency model)
• Low-level (assembly instructions) include:
val = read(addr); // atomic read from memory
(void) write(addr, val); // atomic write to memory
done = CAS(addr, old, new); // atomic compare-and-swap
• Compare-and-Swap (CAS) is atomic
• Reads value of addr (‘val’), compares with ‘old’, and updates
memory to ‘new’ iff old==val -- without interruption.
• Something like this instruction common on most modern
processors (e.g. cmpxchg on x86 – or LL/SC on RISC)
• Typically used to build spinlocks (or mutexes, or semaphores,
or whatever...)
Lock-free approach
• Directly use CAS to update shared data
• For example, consider a lock-free linked list of integers
– list is singly linked, and sorted
– Use CAS to update pointers
– Handle CAS failure cases (i.e., races)
• Represents the ‘set’ abstract data type, i.e.
– find(int) -> bool
– insert(int) -> bool
– delete(int) -> bool (the delete() operation is left as an exercise for you this year)
• Return values required as operations may fail, requiring retry (typically in
a loop)
• Assumption: hardware supports atomic operations on pointer-size types.
• Assumption: full sequential consistency (or fences used as needed).
Searching a sorted list
• find(20):
[Diagram: the search walks H → 10 → 30 → T looking for 20]
Inserting an item with a simple store
• insert(20):
[Diagram: a new node 20 with next = 30 is linked after 10 by a simple store;
a concurrent insert(25) doing the same simple store to 10’s next pointer can
cause one of the two insertions to be lost]
Concurrent find+insert
• find(20) -> false • insert(20) -> true
[Diagram: find(20) traverses H → 10 → 30 → T while insert(20) links a new
node between 10 and 30]
(One issue with lock-free programming is that it sometimes relies on a change
being reflected through a pointer having a different value. So as store is
reclaimed, we should sometimes quarantine recently-used memory to stop a
change becoming invisible – the so-called ABA problem.)
Concurrent find+insert
• find(20) -> false • insert(20) -> true
[Diagram: this thread saw that 20 was not in the set… but the other thread
succeeded in putting it in!]
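A sketch of the insert operation in C11 (my own rendering of the pattern above), assuming head and tail sentinel nodes whose keys bound all real keys, and no concurrent delete (so the ABA problem does not arise in this simplified version):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct node {
    int key;
    _Atomic(struct node *) next;
};

bool insert(struct node *head, int key) {
    for (;;) {
        struct node *prev = head;
        struct node *cur  = atomic_load(&prev->next);
        while (cur->key < key) {            // tail sentinel bounds the walk
            prev = cur;
            cur  = atomic_load(&cur->next);
        }
        if (cur->key == key)
            return false;                   // already present
        struct node *n = malloc(sizeof *n);
        n->key = key;
        atomic_store(&n->next, cur);
        // Atomically swing prev->next from cur to n; if another thread
        // changed it since we read cur, the CAS fails and we retry.
        if (atomic_compare_exchange_strong(&prev->next, &cur, n))
            return true;
        free(n);                            // lost the race: retry from head
    }
}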
(S/W) Transactional Memory (TM)
• Based on optimistic concurrency control.
• Instead of: lock(&sharedx_mutex);
sharedx[i] *= sharedx[j] + 17;
unlock(&sharedx_mutex);
Use: atomic {
sharedx[i] *= sharedx[j] + 17;
}
Has “obvious” semantics, i.e. all operations within block
occur as if atomically
Transactional since under the hood it looks like:
credit(a, x) = atomic {
setbal(a, readbal(a) + x);
}
debit(a, x) = atomic {
setbal(a, readbal(a) - x);
}
transfer(a, b, x) = atomic {
debit(a, x);
credit(b, x);
}
TM advantages
• Cannot deadlock:
– No locks, so don’t have to worry about locking order
– (Though may get live lock if not careful)
• No races (mostly):
– Cannot forget to take a lock (although you can forget to put
atomic { } around your critical section ;-))
• Scalability:
– High performance possible via OCC
– No need to worry about complex fine-grained locking
• There remains a simplicity vs. performance tradeoff
– Too much atomic {} and implementation can’t find concurrency.
Too little, and errors arise from poor interleaving.
TM is very promising…
• Essentially does ‘ACI’ but no D
– no need to worry about crash recovery
– can work entirely in memory
– can be implemented in HLL, VM or hardware (S/W v H/W TM)
– some hardware support emerging (take 1)
– some hardware support emerging (take 2)
• Last decade, both x86 and Arm offered direct support for transactions
using augmented cache protocols
– … And promptly withdrawn in errata
– Now back on the street again
– Security vulnerabilities (timing attacks and the like)?
• But not a panacea
– Contention management can get ugly (lack of parallel speedup)
– Difficulties with irrevocable actions / side effects (e.g. I/O)
– Still working out exact semantics (type of atomicity, handling exceptions, signalling,
...)
Supervision questions + exercises
• Supervision questions (this slide likely out-of-date; see web site)
– CS0, CS1: get started, for discussion with supervisors
– CS2: Threads and synchronisation
• Semaphores, priorities, and work distribution
– CS3: Transactions
• ACID properties, 2PL, TSO, and OCC
– CS4: Miscellaneous: lock free, load balancing, work stealing.
• See also the optional Java practical exercises
– Java concurrency primitives and fundamentals
– Threads, synchronisation, guarded blocks, producer-
consumer, and data races.
Concurrent systems: summary
• Concurrency is essential in modern systems
– overlapping I/O with computation,
– exploiting multi-core,
– building distributed systems.
• But throws up a lot of challenges
– need to ensure safety, allow synchronization, and avoid issues of
liveness (deadlock, livelock, ...)
• Major risks of bugs and over-engineering
– generally worth running as a sequential system first,
– too much locking leads to too much serial execution,
– and worth using existing libraries, tools and design patterns rather
than rolling your own!
Summary + next time
• Transactional durability: crash recovery and logging
– Write-ahead logging; checkpoints; recovery.
• Advanced topics
– Lock-free programming
– Transactional memory.
• Notes on supervision exercises.