Byte Python Concurrent and Parallel Programming V2
Parallel Programming
threads, locks, processes and events
Introduction
“The free lunch is over” - Herb Sutter, 2005, http://www.gotw.ca/publications/concurrency-ddj.htm
[thread 1]
with stdout_lock:
    print "Why it is tricky"

[thread 2]
with stdout_lock:
    print "to use threads"
Difficulties with multithreading
● Threads are nondeterministic
● Scheduling is done by the OS, not by the Python interpreter
● It is unpredictable when a thread runs, hence code needs to be thread safe
● Threads that use I/O block (wait) until the resource, for example the filesystem, becomes available
● We must use locks to synchronise multithreaded access to shared state
● Synchronising state across threads using locks is difficult and error-prone
Thread locks & synchronisation
● threading.Lock : default lock
○ acquire() closes
○ release() opens
○ subsequent acquire calls on a closed lock block the acquiring thread - even the owning thread
● threading.RLock : reëntrant lock with ownership counter
○ subsequent acquire calls from the owning thread do not block
■ useful when invoking thread-safe code on the same thread
● threading.Event : simple flag
○ set() and clear() raise and lower the flag
○ wait() puts thread in wait state until the event is set
● threading.Condition : conditional lock with notification
○ acquire() takes the underlying lock
○ notify() signals a waiting thread that the condition has changed state
○ wait() releases the lock and blocks the thread until another thread that holds the lock calls notify()
● threading.Semaphore : lock with a counter
○ acquire() decrements the counter and blocks once it reaches zero (the initial value is the maximum number of concurrent holders)
○ release() increments the counter again, waking one blocked acquirer - see the sketch after this list
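A short sketch combining two of these primitives, an Event as a start signal and a Semaphore limiting concurrent access (the worker count and pool size are illustrative assumptions):

import threading
import time

start = threading.Event()        # simple flag: workers wait until it is set
slots = threading.Semaphore(2)   # at most two workers inside the critical section

def worker(n):
    start.wait()                 # blocks until start.set() is called
    with slots:                  # acquire() on entry, release() on exit
        print "worker %d has a slot" % n
        time.sleep(0.1)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
start.set()                      # release all waiting workers at once
for t in threads:
    t.join()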
Locking pitfalls
● deadlock
○ thread A → waits on thread B → waits on thread A
○ A waits on C, C waits on B, B waits on A
■ usually caused by circular dependencies or by never releasing a lock (a thread may wait on C for a long time while never releasing resource A)
a.resource_a_lock.acquire()   # thread A: locks resource A
b.resource_b_lock.acquire()   # thread B: locks resource B
b.resource_a_lock.acquire()   # thread B: blocks, A still holds resource A
b.resource_c_lock.acquire()   # thread B: would also need C, never reached
a.resource_b_lock.acquire()   # thread A: blocks, B still holds resource B -> deadlock
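One common remedy, not shown on the slides, is to always acquire locks in a single fixed order so a circular wait cannot form; a minimal sketch:

import threading

resource_a_lock = threading.Lock()
resource_b_lock = threading.Lock()

def thread_a():
    # both threads take the locks in the same (alphabetical) order
    with resource_a_lock:
        with resource_b_lock:
            pass  # work with both resources

def thread_b():
    with resource_a_lock:   # same order as thread_a, never b-then-a
        with resource_b_lock:
            pass  # work with both resources

ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start(); tb.start()
ta.join(); tb.join()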
Locking pitfalls
● race condition
○ thread A retrieves value of bank account f00b4r (EUR 100)
○ thread B gets scheduled and does the same (EUR 100)
○ thread B increases the value by 10% (EUR 110)
○ thread A gets scheduled and increases its stale EUR 100 by 15%, writing EUR 115: thread B's 10% update is lost (the correct final balance would have been EUR 126,50)
With shared-state multithreading, use a lock or an event to keep these
get-update-store operations (transactions) atomic (thread safe), as in the sketch below
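A minimal sketch of the locked version of the bank-account example (the module-level float and the apply_interest helper are assumptions for illustration):

import threading

balance = 100.0                  # EUR, shared state
balance_lock = threading.Lock()

def apply_interest(rate):
    global balance
    with balance_lock:           # the whole get-update-store runs atomically
        current = balance        # get
        current *= (1 + rate)    # update
        balance = current        # store

t1 = threading.Thread(target=apply_interest, args=(0.10,))
t2 = threading.Thread(target=apply_interest, args=(0.15,))
t1.start(); t2.start()
t1.join(); t2.join()
print balance                    # ~126.5 regardless of scheduling order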
Use queues to share state between threads
Queue - synchronised task container (it uses simple locks internally)
import Queue
q = Queue.Queue()
q.put(task) # append a task
task = q.get() # retrieve a task, blocks if no task is available
q.task_done() # mark the retrieved task as complete (decreases the task counter)
q.join() # block until all tasks are complete
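Tying the API together, a minimal sketch with one daemon worker thread (the worker function and the five dummy tasks are assumptions):

import Queue
import threading

q = Queue.Queue()

def worker():
    while True:
        task = q.get()        # blocks until a task is available
        print "handling", task
        q.task_done()         # one outstanding task fewer

t = threading.Thread(target=worker)
t.daemon = True               # don't keep the process alive for this thread
t.start()

for i in xrange(5):
    q.put(i)
q.join()                      # returns once task_done() was called five times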
B - Buy Beer Brought By Bicycle
How it works
I have:
● a warehouse filled with tasty beer
● a cargo bike
● a cell phone where I can get orders
Pseudocode
while True:
    order = pop_order_off_stack()
    if not order:
        continue
    collect_order(order)
    ready_bicycle()
    route = determine_route(order.address)
    bike_to_address(route)
    fulfill_transaction(order.price)
    return_home(route)
So it got a bit popular
● Hire six employees
○ three messengers (threads)
○ a dispatcher (scheduler)
○ two order pickers (threads)
● Buy another cargo bike (pool)
import threading
import Queue
from time import sleep

n_bikes, n_messengers, n_order_pickers = 2, 3, 2  # two cargo bikes, three messengers, two order pickers

q_orders = Queue.Queue()
q_loaded_bikes = Queue.Queue(maxsize=n_bikes)
q_empty_bikes = Queue.Queue(maxsize=n_bikes)
for b in xrange(n_bikes):                         # start with every bike parked and empty
    q_empty_bikes.put(b)

for m in xrange(n_messengers):
    thread = threading.Thread(target=messenger_worker, args=(q_loaded_bikes, q_empty_bikes, q_orders, m))
    thread.daemon = True
    thread.start()

for o in xrange(n_order_pickers):
    thread = threading.Thread(target=order_picker_worker, args=(q_orders, q_empty_bikes, q_loaded_bikes, o))
    thread.daemon = True
    thread.start()

q_orders.join()                                   # block until every order has been delivered
def order_picker_worker(order_queue, empty_queue, ready_queue, n):
    def process_order(order):
        while True:
            try:
                empty_bike = empty_queue.get(True, 1)   # wait up to 1s for an empty bike
            except Queue.Empty:
                continue                                # no bike yet, try again
            sleep(5)                                    # load the beer onto the bike
            ready_queue.put(empty_bike, True, 1)
            return                                      # order is on a bike, done
    while True:
        try:
            beer_order = order_queue.get(True, 1)       # wait up to 1s for an order
        except Queue.Empty:
            sleep(5)                                    # no orders, take a break
            continue
        process_order(beer_order)
def messenger_worker(full_queue, empty_queue, order_queue, n):
    while True:
        try:
            loaded_bike = full_queue.get(True, 1)   # wait up to 1s for a loaded bike
        except Queue.Empty:
            continue
        sleep(10)                                   # cycle to the customer and back
        empty_queue.put(loaded_bike, True, 1)       # park the now-empty bike
        order_queue.task_done()                     # the order has been delivered
Prefer a Condition wait() over a busy while True loop:
#order picker
ready_cv.acquire()
orders.put(ready_order)
ready_cv.notify()            # wake one waiting messenger
ready_cv.release()

#messenger
ready_cv.acquire()
while True:
    try:
        order = orders.get_nowait()
        break
    except Queue.Empty:
        ready_cv.wait()      # releases the lock while waiting, reacquires on notify()
ready_cv.release()
do_stuff_with_order(order)
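Wired together as a runnable Python 2 sketch (the single picker/messenger pair and the three hard-coded orders are assumptions for illustration):

import Queue
import threading

orders = Queue.Queue()
ready_cv = threading.Condition()

def order_picker():
    for i in xrange(3):
        with ready_cv:               # acquire()/release() via the context manager
            orders.put("order %d" % i)
            ready_cv.notify()        # wake a waiting messenger

def messenger():
    for _ in xrange(3):
        with ready_cv:
            while orders.empty():    # guard against waking up with nothing to do
                ready_cv.wait()
            order = orders.get()
        print "delivering", order    # do_stuff_with_order()

m = threading.Thread(target=messenger)
m.start()
order_picker()
m.join()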
Greenlets instead of Threads
import gevent
from gevent.queue import Queue #similar to Queue, tuned for greenlets
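A minimal sketch of the messenger pattern on greenlets (assumes gevent is installed; the worker, the single order queue and the counts are illustrative):

import gevent
from gevent.queue import Queue, Empty   # gevent's queue yields instead of blocking the OS thread

q_orders = Queue()

def messenger_worker(n):
    while True:
        try:
            order = q_orders.get(timeout=1)   # yields to other greenlets while waiting
        except Empty:
            return                            # nothing left to deliver
        gevent.sleep(1)                       # cooperative "cycling", not a blocking sleep
        print "greenlet %d delivered %s" % (n, order)

for i in xrange(5):
    q_orders.put("order %d" % i)

workers = [gevent.spawn(messenger_worker, n) for n in xrange(3)]
gevent.joinall(workers)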
Enter multiprocessing!
Difficulties with parallelism
● Amdahl's law: the achievable speedup is limited by the fraction of the program that stays serial
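As a formula, with p the parallelisable fraction and n the number of workers (the formula is standard, not from the slides):

S(n) = 1 / ((1 - p) + p / n)

For example, p = 0.9 and n = 8 gives S(8) = 1 / (0.1 + 0.1125) ≈ 4.7, and no number of extra cores pushes the speedup past 1 / (1 - p) = 10.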
Communication between processes
import multiprocessing as mp
manager = mp.Manager()
queue_1 = manager.Queue()
…
process = mp.Process(target=messenger_worker, args=(queue_1, queue_2, …))
process.start()
…
class Messenger(mp.Process):
…
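A minimal runnable sketch combining both styles above, sharing one Manager queue between the processes (the single queue and the Messenger subclass wiring are simplified assumptions):

import multiprocessing as mp

def messenger_worker(order_queue, n):
    # runs in its own process, with its own interpreter and its own GIL
    order = order_queue.get()
    print "process %d delivering %s" % (n, order)

class Messenger(mp.Process):
    def __init__(self, order_queue, n):
        super(Messenger, self).__init__()
        self.order_queue, self.n = order_queue, n
    def run(self):
        messenger_worker(self.order_queue, self.n)

if __name__ == "__main__":
    manager = mp.Manager()
    queue_1 = manager.Queue()            # proxy object, safe to share between processes
    for i in xrange(2):
        queue_1.put("order %d" % i)
    p1 = mp.Process(target=messenger_worker, args=(queue_1, 0))   # function style
    p2 = Messenger(queue_1, 1)                                    # subclass style
    p1.start(); p2.start()
    p1.join(); p2.join()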
Questions?