
13-Conc Bugs

Here are a few ways to fix the code: 1. Impose a lock ordering - Always acquire lock A before lock B. 2. Use a single lock to protect both resources. 3. Don't take multiple locks at the same time. Release lock A before taking lock B in Thread 1. 4. Use trylock() to avoid blocking - Thread 1 tries lock B, if it fails it releases lock A and yields. By enforcing a lock ordering, using a single lock, or not holding multiple locks concurrently, we break the circular wait condition and prevent deadlock from occurring.


Fall 2017 :: CSE 306

Concurrency Bugs
Nima Honarmand
(Based on slides by Prof. Andrea Arpaci-Dusseau)

Concurrency Bugs are Serious


The Therac-25 incident (1980s)

“The accidents occurred when the high-power electron beam
was activated instead of the intended low power beam, and
without the beam spreader plate rotated into place. Previous
models had hardware interlocks in place to prevent this, but
Therac-25 had removed them, depending instead on software
interlocks for safety. The software interlock could fail due to a
race condition.”

“…in three cases, the injured patients later died.”


Source: en.wikipedia.org/wiki/Therac-25

Concurrency Bugs are Serious (2)


Northeast blackout of 2003

“The Northeast blackout of 2003 was a widespread power outage
that occurred throughout parts of the Northeastern and Midwestern
United States and the Canadian province of Ontario on Thursday,
August 14, 2003, just after 4:10 p.m. EDT.”

“The blackout's primary cause was a bug in the alarm system... The
lack of an alarm left operators unaware of the need to re-distribute
power after overloaded transmission lines hit unpruned foliage,
triggering a "race condition" in the energy management system…
What would have been a manageable local blackout cascaded into
massive widespread distress on the electric grid.”

Source: en.wikipedia.org/wiki/Northeast_blackout_of_2003

Concurrency Study from 2008

For four major projects, search for concurrency bugs
among > 500K bug reports. Analyze a small sample to
identify common types of concurrency bugs.
Source: Lu et al., “Learning from mistakes: a comprehensive study
on real world concurrency bug characteristics”

Atomicity Violation Bugs


“The desired serializability among multiple memory
accesses is violated (i.e. a code region is intended to be
atomic, but the atomicity is not enforced during
execution)”
MySQL Example

Thread 1:
    if (thd->proc_info) {
        …
        fputs(thd->proc_info, …);
        …
    }

Thread 2:
    thd->proc_info = NULL;

• What’s wrong?
• How to fix?
  • Use a lock
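
The lock-based fix can be sketched with POSIX threads as follows. This is a minimal illustration, not MySQL's actual code: the type `thd_t`, the field `proc_info_lock`, and the helpers `report_proc_info` and `clear_proc_info` are hypothetical names. The idea is that the NULL check and the use of `proc_info` sit inside one critical section, and the writer takes the same lock, so the pointer cannot be cleared between the check and the use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the MySQL thread descriptor. */
typedef struct {
    const char     *proc_info;
    pthread_mutex_t proc_info_lock;   /* assumed lock guarding proc_info */
} thd_t;

/* Thread 1: check and use proc_info inside one critical section.
 * Copies the string into out and returns its length (0 if NULL). */
int report_proc_info(thd_t *thd, char *out, size_t cap)
{
    assert(thd != NULL && out != NULL && cap > 0);
    int n = 0;
    out[0] = '\0';
    pthread_mutex_lock(&thd->proc_info_lock);
    if (thd->proc_info) {
        /* proc_info cannot become NULL here: the writer needs the lock. */
        n = snprintf(out, cap, "%s", thd->proc_info);
    }
    pthread_mutex_unlock(&thd->proc_info_lock);
    return n;
}

/* Thread 2: clear proc_info under the same lock. */
void clear_proc_info(thd_t *thd)
{
    assert(thd != NULL);
    pthread_mutex_lock(&thd->proc_info_lock);
    thd->proc_info = NULL;
    pthread_mutex_unlock(&thd->proc_info_lock);
}
```

Protecting only the reader or only the writer would not help; both sides have to agree on the same lock.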

Ordering Violation Bugs


“The desired order between two (groups of) memory
accesses is flipped (i.e., A should always be executed
before B, but the order is not enforced during
execution)”

Mozilla Example

Thread 1:
    void init() {
        …
        mThread = PR_CreateThread(mMain, …);
        …
    }

Thread 2:
    void mMain(…) {
        …
        mState = mThread->State;
        …
    }

• What’s wrong?
• How to fix?
  • Use a condition variable

Ordering Violation Bugs (2)


Thread 1:
    void init() {
        …
        mThread = PR_CreateThread(mMain, …);
        mutex_lock(&mtLock);
        mtInit = 1;
        cond_signal(&mtCond);
        mutex_unlock(&mtLock);
        …
    }

Thread 2:
    void mMain(…) {
        …
        mutex_lock(&mtLock);
        while (mtInit == 0)
            cond_wait(&mtCond, &mtLock);
        mutex_unlock(&mtLock);
        mState = mThread->State;
        …
    }

• Why are we using a new flag (mtInit) instead of mThread itself?
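
A self-contained pthreads sketch of the same pattern is shown below. It is a simplification under stated assumptions: the integer `mThreadState` stands in for `mThread->State`, and `init_and_join` is a hypothetical wrapper that also joins the child so the result can be inspected. One plausible reason for a dedicated flag is visible here: `mtInit` is only ever read and written while holding `mtLock`, whereas the value being published is written outside the lock.

```c
#include <assert.h>
#include <pthread.h>

/* Shared state mirroring the slide's names; mThreadState is a stand-in. */
static pthread_mutex_t mtLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mtCond = PTHREAD_COND_INITIALIZER;
static int mtInit = 0;         /* becomes 1 once mThreadState is valid */
static int mThreadState = 0;   /* stands in for mThread->State */
static int mState = -1;        /* value observed by the child thread */

/* Thread 2: wait until the parent has finished initializing, then read. */
static void *mMain(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtLock);
    while (mtInit == 0)                 /* loop also handles spurious wakeups */
        pthread_cond_wait(&mtCond, &mtLock);
    pthread_mutex_unlock(&mtLock);
    mState = mThreadState;              /* ordered after mtInit = 1 */
    return NULL;
}

/* Thread 1: create the child, initialize, then signal.
 * Returns the value the child observed. */
int init_and_join(void)
{
    pthread_t child;
    int rc = pthread_create(&child, NULL, mMain, NULL);
    assert(rc == 0);
    (void)rc;

    mThreadState = 42;                  /* the initialization the child needs */
    pthread_mutex_lock(&mtLock);
    mtInit = 1;
    pthread_cond_signal(&mtCond);
    pthread_mutex_unlock(&mtLock);

    pthread_join(child, NULL);
    return mState;
}
```

The child sees the initialized value whether it runs before or after the parent's signal, because it checks the flag before deciding to wait.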

Fixing Concurrency Bugs: Easy?


• If all we had to do was add locks and condition variables,
  concurrent programming would be quite simple

• Problems?

1) Adding too many locks increases the danger of deadlocks

2) How about having just a few big locks then?
   • Causes performance problems because it reduces concurrency

Locking Granularity
• Coarse-grain locking
  • Have one (or a few) locks that protect all (or big chunks) of shared state
  • Example: early Linux’s BKL (Big Kernel Lock)
    • One big lock protecting all kernel data
    • Only one processor could execute kernel code at any point in time; others
      would have to wait
  • Significant contention over big locks → hurts performance

• Fine-grain locking
  • Have many small locks, each protecting one (or a few) objects
  • Reduces contention → better performance
  • Increases deadlock risk
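
The two granularities can be contrasted on a small table of counters. This is an illustrative sketch only; the types `coarse_table_t` and `fine_table_t`, the bucket count, and the function names are all made up for the example.

```c
#include <assert.h>
#include <pthread.h>

#define NBUCKETS 8

/* Coarse-grained: one lock protects every bucket. */
typedef struct {
    pthread_mutex_t lock;
    long counts[NBUCKETS];
} coarse_table_t;

/* Fine-grained: each bucket has its own lock. */
typedef struct {
    pthread_mutex_t locks[NBUCKETS];
    long counts[NBUCKETS];
} fine_table_t;

void coarse_init(coarse_table_t *t)
{
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    for (int i = 0; i < NBUCKETS; i++)
        t->counts[i] = 0;
}

void fine_init(fine_table_t *t)
{
    assert(t != NULL);
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&t->locks[i], NULL);
        t->counts[i] = 0;
    }
}

/* Every caller contends on the same lock, whatever the key. */
void coarse_inc(coarse_table_t *t, unsigned key)
{
    pthread_mutex_lock(&t->lock);
    t->counts[key % NBUCKETS]++;
    pthread_mutex_unlock(&t->lock);
}

/* Callers contend only when their keys map to the same bucket. */
void fine_inc(fine_table_t *t, unsigned key)
{
    unsigned b = key % NBUCKETS;
    pthread_mutex_lock(&t->locks[b]);
    t->counts[b]++;
    pthread_mutex_unlock(&t->locks[b]);
}
```

The fine-grained version stays simple only because each operation touches one bucket. An operation that needs two buckets at once would have to hold two locks, which is where the deadlock risk comes from.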

Deadlock Bugs
• Deadlock: no progress can be made because two or
  more threads are each waiting for another to take
  some action, and thus none ever does

• Could arise when we need to coordinate access to
  more than one shared resource
  • Means we need to grab and hold multiple locks
    simultaneously

Deadlock Theory
• Deadlocks can only occur when all four conditions are true:
  1) Mutual exclusion
  2) Hold-and-wait
  3) Circular wait
  4) No preemption
• Eliminate deadlock by eliminating any one condition

[Figure: intersection with STOP signs and labels A, B, C, D]



1) Mutual Exclusion
• Definition: “Threads claim exclusive control of
  resources that they require (e.g., thread grabs a lock)”

• Strategy: eliminate locks
  • Try to use atomic instructions instead

Concurrent Counter Example

Code with locks:
    void add(int *val, int amt)
    {
        mutex_lock(&m);
        *val += amt;
        mutex_unlock(&m);
    }

Code with Compare-and-Swap (CAS):
    void add(int *val, int amt)
    {
        int old;    /* declared outside the loop so the while condition can see it */
        do {
            old = *val;
        } while (!CAS(val, old, old + amt));
    }
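
The CAS loop above can be written with C11 atomics, where `atomic_compare_exchange_weak` plays the role of `CAS`. This is a sketch under assumptions: the worker thread, the iteration counts, and `run_counter_demo` are invented for the demonstration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Lock-free add: retry until no other thread changed *val in between. */
void add(atomic_int *val, int amt)
{
    int old = atomic_load(val);
    /* On failure, the call stores the current value of *val into old,
     * so old + amt is recomputed from fresh data on the next attempt. */
    while (!atomic_compare_exchange_weak(val, &old, old + amt))
        ;
}

static atomic_int counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        add(&counter, 1);
    return NULL;
}

/* Run four workers concurrently and return the final count. */
int run_counter_demo(void)
{
    pthread_t t[4];
    atomic_store(&counter, 0);
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&counter);
}
```

For a plain counter, `atomic_fetch_add` would do the same job in one call; the explicit loop is kept here because it mirrors the slide.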

Example: Lock-Free Linked List Insert


Code with locks:
    void insert(int val)
    {
        node_t *n = malloc(sizeof(*n));
        n->val = val;
        mutex_lock(&m);
        n->next = head;
        head = n;
        mutex_unlock(&m);
    }

Code with Compare-and-Swap (CAS):
    void insert(int val)
    {
        node_t *n = malloc(sizeof(*n));
        n->val = val;
        do {
            n->next = head;
        } while (!CAS(&head, n->next, n));
    }
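
A runnable version of the CAS-based insert, again using C11 atomics, might look like this. The `node_t` layout follows the slide; the malloc check, the return code, the `inserter` thread, and `run_insert_demo` are additions made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} node_t;

static _Atomic(node_t *) head = NULL;

/* Push val at the front of the list.
 * Returns 0 on success, -1 if malloc fails. */
int insert(int val)
{
    node_t *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->val = val;
    n->next = atomic_load(&head);
    /* On failure, n->next is overwritten with the current head,
     * so each retry links the node in front of the newest head. */
    while (!atomic_compare_exchange_weak(&head, &n->next, n))
        ;
    return 0;
}

static void *inserter(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        insert(i);
    return NULL;
}

/* Run two concurrent inserters, then count the nodes in the list. */
int run_insert_demo(void)
{
    pthread_t t[2];
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, inserter, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    int count = 0;
    for (node_t *p = atomic_load(&head); p != NULL; p = p->next)
        count++;
    return count;
}
```

The demo never frees its nodes, which keeps it short; it covers insertion only.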

2) Hold-and-Wait
• Definition: “Threads hold resources allocated to them
(e.g., locks they have already acquired) while waiting
for additional resources (e.g., locks they wish to
acquire).”
• Strategy: release currently held resources when waiting
for new ones
Example with trylock

    top:
        pthread_mutex_lock(A);
        if (pthread_mutex_trylock(B) != 0)
        {
            pthread_mutex_unlock(A);
            goto top;
        }


Problem w/ This Strategy


• Potential for livelock: no process makes forward
  progress, but the state of the involved processes
  constantly changes
  • Can happen if all processes release resources and
    then try to re-acquire, fail, and keep doing this
• Classic solution: back-off techniques
  • Random back-off: wait for a random amount of time
    before retrying
  • Exponential back-off: wait for an exponentially increasing
    amount of time before retrying
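
The trylock pattern and both back-off ideas can be combined in one helper. The following is a sketch, not a tuned implementation: `lock_both`, the window sizes, and the small pseudo-random generator are all assumptions chosen for illustration. The demo runs two threads that request the same two locks in opposite orders, which would risk deadlock with plain blocking locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Acquire first, then second, without ever blocking on second while
 * holding first. After each failed attempt, sleep for a random time
 * drawn from a window that doubles until it reaches a cap. */
void lock_both(pthread_mutex_t *first, pthread_mutex_t *second)
{
    long window_ns = 1000;            /* initial window: 1 us (arbitrary) */
    const long cap_ns = 1000000;      /* stop doubling near 1 ms (arbitrary) */
    /* Cheap per-call pseudo-random state, seeded from a stack address. */
    unsigned seed = (unsigned)(uintptr_t)&window_ns;

    for (;;) {
        pthread_mutex_lock(first);
        if (pthread_mutex_trylock(second) == 0)
            return;                   /* now holding both locks */
        pthread_mutex_unlock(first);  /* release so others can progress */

        seed = seed * 1103515245u + 12345u;
        struct timespec ts = { 0, (long)((seed >> 16) % (unsigned long)window_ns) };
        nanosleep(&ts, NULL);         /* random back-off */
        if (window_ns < cap_ns)
            window_ns *= 2;           /* exponential growth of the window */
    }
}

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;
static long shared = 0;               /* updated only while holding A and B */

static void *a_then_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        lock_both(&A, &B);
        shared++;
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
    }
    return NULL;
}

static void *b_then_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        lock_both(&B, &A);
        shared++;
        pthread_mutex_unlock(&A);
        pthread_mutex_unlock(&B);
    }
    return NULL;
}

/* Both threads should finish despite the opposite lock orders. */
long run_backoff_demo(void)
{
    pthread_t t1, t2;
    int rc1 = pthread_create(&t1, NULL, a_then_b, NULL);
    int rc2 = pthread_create(&t2, NULL, b_then_a, NULL);
    assert(rc1 == 0 && rc2 == 0);
    (void)rc1; (void)rc2;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared;
}
```

Randomizing the delay matters as much as growing it: two threads that back off for identical intervals can keep colliding in lockstep.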

3) Circular Wait
• Definition: “There exists a circular chain of threads such
that each thread holds a resource (e.g., lock) being
requested by next thread in the chain.”

• Usually the easiest deadlock requirement to attack

• Strategy: impose a well-documented order of acquiring locks
  • Decide which locks should be acquired before others
  • If A before B, never acquire A if B is already held!
  • Document this, and write code accordingly

• Works well if system has distinct layers



Simple Example
Thread 1 Thread 2
lock(&A); lock(&B);
lock(&B); lock(&A);

How would you fix this code?

Thread 1 Thread 2
lock(&A); lock(&A);
lock(&B); lock(&B);

Example: mm/filemap.c lock ordering


/*
 * Lock ordering:
 *
 *  ->i_mmap_lock                       (vmtruncate)
 *    ->private_lock                    (__free_pte->__set_page_dirty_buffers)
 *      ->swap_lock                     (exclusive_swap_page, others)
 *        ->mapping->tree_lock
 *
 *  ->i_mutex
 *    ->i_mmap_lock                     (truncate->unmap_mapping_range)
 *
 *  ->mmap_sem
 *    ->i_mmap_lock
 *      ->page_table_lock or pte_lock   (various, mainly in memory.c)
 *        ->mapping->tree_lock          (arch-dependent flush_dcache_mmap_lock)
 *
 *  ->mmap_sem
 *    ->lock_page                       (access_process_vm)
 *
 *  ->mmap_sem
 *    ->i_mutex                         (msync)
 *
 *  ->i_mutex
 *    ->i_alloc_sem                     (various)
 *
 *  ->inode_lock
 *    ->sb_lock                         (fs/fs-writeback.c)
 *    ->mapping->tree_lock              (__sync_single_inode)
 *
 *  ->i_mmap_lock
 *    ->anon_vma.lock                   (vma_adjust)
 *
 *  ->anon_vma.lock
 *    ->page_table_lock or pte_lock     (anon_vma_prepare and various)
 *
 *  ->page_table_lock or pte_lock
 *    ->swap_lock                       (try_to_unmap_one)
 *    ->private_lock                    (try_to_unmap_one)
 *    ->tree_lock                       (try_to_unmap_one)
 *    ->zone.lru_lock                   (follow_page->mark_page_accessed)
 . . .


Encapsulation Makes Ordering Difficult


• Encapsulation, and emphasis on code modularity, make
  things difficult
  • Can’t control the order in which locks are acquired when we
    are calling a function in another module

• What could go wrong in this code?

    set_t *intersect(set_t *s1, set_t *s2)
    {
        set_t *rv = malloc(sizeof(*rv));
        mutex_lock(&s1->lock);
        mutex_lock(&s2->lock);
        for (int i = 0; i < s1->len; i++) {
            if (set_contains(s2, s1->items[i]))
                set_add(rv, s1->items[i]);
        }
        mutex_unlock(&s2->lock);
        mutex_unlock(&s1->lock);
        return rv;
    }

• Deadlock possible if one thread calls intersect(s1, s2)
  and another thread calls intersect(s2, s1)

One Possible Solution


• Acquire the locks in the order of their virtual
  addresses when possible

    set_t *intersect(set_t *s1, set_t *s2) {
        set_t *rv = malloc(sizeof(*rv));
        /* uintptr_t (from <stdint.h>) is wide enough to hold a pointer */
        if ((uintptr_t)&s1->lock < (uintptr_t)&s2->lock) {
            mutex_lock(&s1->lock);
            mutex_lock(&s2->lock);
        } else {
            mutex_lock(&s2->lock);
            mutex_lock(&s1->lock);
        }
        for (int i = 0; i < s1->len; i++) {
            if (set_contains(s2, s1->items[i]))
                set_add(rv, s1->items[i]);
        }
        mutex_unlock(&s2->lock);
        mutex_unlock(&s1->lock);
        return rv;
    }

• You may also want to change the order of unlock()s to be
  the reverse of the lock()s.
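
The address-ordering rule can be factored into a pair of helpers. This is a sketch with hypothetical names (`lock_pair`, `unlock_pair`); it also covers a case the slide does not mention, namely both arguments referring to the same lock, as would happen in a call like `intersect(s, s)`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Lock two mutexes in ascending address order.
 * If both pointers name the same mutex, lock it only once. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != NULL && b != NULL);
    if (a == b) {
        pthread_mutex_lock(a);
    } else if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(a);
        pthread_mutex_lock(b);
    } else {
        pthread_mutex_lock(b);
        pthread_mutex_lock(a);
    }
}

/* Release in the reverse of the acquisition order. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != NULL && b != NULL);
    if (a == b) {
        pthread_mutex_unlock(a);
    } else if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_unlock(b);
        pthread_mutex_unlock(a);
    } else {
        pthread_mutex_unlock(a);
        pthread_mutex_unlock(b);
    }
}
```

With helpers like these, callers can pass the two locks in either order and still acquire them in one global order.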

Other Complications
• Sometimes can’t know all virtual addresses in advance

• Example: when traversing a linked list where each
  object has a separate lock

Linux Example: fs/dcache.c


    void d_prune_aliases(struct inode *inode) {
        struct dentry *dentry;
        struct hlist_node *p;
    restart:
        spin_lock(&inode->i_lock);
        hlist_for_each_entry(dentry, p, &inode->i_dentry, d_alias) {
            spin_lock(&dentry->d_lock);
            if (!dentry->d_count) {
                __dget_dlock(dentry);
                __d_drop(dentry);
                spin_unlock(&dentry->d_lock);
                spin_unlock(&inode->i_lock);
                dput(dentry);
                goto restart;
            }
            spin_unlock(&dentry->d_lock);
        }
        spin_unlock(&inode->i_lock);
    }

• Make sure inode lock is acquired before dentry locks
• When a list element is removed, have to restart from the
  beginning because the order of items has changed.

4) Deadlock Detection and Recovery


• Database systems use many, many locks
  • Very difficult to always avoid deadlocks in general in
    such a system

• Last-resort strategy: detect deadlocks, and recover
  • Detection usually involves looking out for locks that are
    held for too long
  • Recovery usually requires a restart of the database app

• An example of breaking the “No preemption” condition
  • By restarting, we are forcibly releasing the resource
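
One simple way to notice a lock that has been held for too long is a timed acquire. The sketch below uses POSIX `pthread_mutex_timedlock`, which is not available on every platform; it is not how any particular database does detection. The helper `lock_with_timeout`, the 50 ms timeout, and the demo thread are assumptions for illustration, and a timeout only signals a suspected deadlock, not a proven one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Try to take m, giving up after timeout_ms milliseconds.
 * Returns 0 on success or ETIMEDOUT if the lock stayed held. */
int lock_with_timeout(pthread_mutex_t *m, long timeout_ms)
{
    struct timespec deadline;
    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &deadline);
}

static pthread_mutex_t stuck = PTHREAD_MUTEX_INITIALIZER;
static atomic_int holder_ready;
static atomic_int holder_release;

/* Simulates a thread that sits on a lock until told to let go. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&stuck);
    atomic_store(&holder_ready, 1);
    while (!atomic_load(&holder_release))
        sched_yield();
    pthread_mutex_unlock(&stuck);
    return NULL;
}

/* Returns the result of the timed acquire while the holder has the lock. */
int run_detection_demo(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, holder, NULL);
    assert(rc == 0);
    (void)rc;
    while (!atomic_load(&holder_ready))
        sched_yield();

    int result = lock_with_timeout(&stuck, 50);   /* lock is held: times out */

    atomic_store(&holder_release, 1);             /* "recovery": free the lock */
    pthread_join(t, NULL);
    return result;
}
```

In the demo the holder cooperates when asked to release; the recovery described above is harsher, since a restart takes the lock away without the holder's consent.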

Summary: Current Reality


[Figure: plot with axes Performance and Complexity, comparing
coarse-grained locking and fine-grained locking]

Unsavory trade-off between synchronization
complexity and performance

Locking in Kernel
• All locking stuff we discussed so far applies equally
to kernel and user code
• Spinlocks
• Blocking locks
• Granularity
• Deadlock
• Etc.

• However, there is one form of concurrency that’s
  (almost) only found in kernel, remember?
  • Yes, interrupts!

Locks and Interrupts


• Suppose you are in the disk driver (say, serving a read()
  syscall) and holding a disk-related lock

• Say a disk interrupt happens, and you need to grab the
  same lock in the interrupt service routine (ISR)

• What would happen?
  • Yes, deadlock
  • Can’t finish the ISR without grabbing the lock
  • Can’t return to driver code (to release the lock) without finishing ISR

• Can you identify the multiple resources that are involved in
  the deadlock?
  1) Lock
  2) CPU

Solution
• How can we solve this problem?

• Two-part solution:
  1) Only use spinlocks in ISRs; never call, directly or
     indirectly, a routine that would use a blocking lock
  2) When acquiring a spinlock in kernel, disable interrupts
     on the current processor

• Why just on this processor? Is it okay to get an
  interrupt on other processors?

• This is why xv6 kernel spinlocks disable interrupts
