Byte Python Concurrent and Parallel Programming V2
Parallel Programming
threads, locks, processes and events
Introduction
“The free lunch is over” - Herb Sutter, 2005 http://www.gotw.ca/publications/concurrency-ddj.htm
[thread 1]
with stdout_lock:
    print "Why it is tricky"

[thread 2]
with stdout_lock:
    print "to use threads"
Difficulties with multithreading
● Threads are nondeterministic
● Scheduling is done by the OS, not by the Python interpreter
● It is unpredictable when a thread runs, hence code needs to be thread safe
● Threads that do I/O block (wait) for resources (e.g. the filesystem) to
become available
● We must use locks to synchronise multithreaded access to shared state
● Synchronising state across threads using locks is difficult and error-prone
Thread locks & synchronisation
● threading.Lock : default lock
○ acquire() closes
○ release() opens
○ subsequent acquire calls on a closed lock block the acquiring thread - even the owning thread
● threading.RLock : reëntrant lock with ownership counter
○ subsequent acquire calls from the owning thread do not block
■ useful when invoking thread-safe code on the same thread
● threading.Event : simple flag
○ set() / clear() raise and lower the flag
○ wait() puts thread in wait state until the event is set
● threading.Condition : conditional lock with notification
○ acquire() takes the underlying lock (blocks if another thread holds it)
○ notify() signals a waiting thread that the condition has changed state
○ wait() releases the lock and blocks until another thread holding the lock calls notify()
● threading.Semaphore : lock with a counter
○ the counter starts at a given value; acquire() decrements it and blocks once it reaches zero
○ release() increments the counter again, waking a blocked acquirer
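A minimal sketch of these primitives in action, using the standard threading module; the non-blocking acquire(False) calls are only there to make the counter states visible without actually blocking:

```python
import threading

lock = threading.Lock()
with lock:                 # acquire(); a second acquire() here would block, even from this thread
    pass                   # release() on exit

rlock = threading.RLock()
with rlock:
    with rlock:            # re-entrant: the owning thread may acquire again
        pass

event = threading.Event()
event.set()                # raise the flag
assert event.wait(1)       # returns immediately once the event is set

sem = threading.Semaphore(2)     # counter starts at 2
assert sem.acquire(False)        # counter -> 1
assert sem.acquire(False)        # counter -> 0
assert not sem.acquire(False)    # would block; the non-blocking attempt fails
sem.release()                    # counter -> 1 again
```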
Locking pitfalls
● deadlock
○ thread A → waits on thread B → waits on thread A
○ A waits on C, C waits on B, B waits on A
■ usually circular dependencies, or a lock never being released (a thread
might be waiting on C for a long time, never releasing resource A)
a.resource_a_lock.acquire()  # thread A: locks resource A
b.resource_b_lock.acquire()  # thread B: locks resource B
b.resource_a_lock.acquire()  # thread B: blocks, A holds resource A
a.resource_b_lock.acquire()  # thread A: blocks, B holds resource B → deadlock
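One common fix is a global lock ordering: if every thread acquires resource A before resource B, no thread can ever hold B while waiting for A, so the circular wait above cannot form. A small sketch (the lock names are illustrative):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker():
    # fixed global order: always take lock_a before lock_b,
    # so no thread can hold lock_b while waiting for lock_a
    with lock_a:
        with lock_b:
            pass  # touch both shared resources safely

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # both threads finish: no deadlock
```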
Locking pitfalls
● race condition
○ thread A retrieves value of bank account f00b4r (EUR 100)
○ thread B gets scheduled and does the same (EUR 100)
○ thread B increases the value by 10% (EUR 110)
○ thread A gets scheduled and increases its stale EUR 100 by 15% (EUR 115)
○ thread B's update is lost: the balance should have been EUR 126,5
With shared-state multithreading, use a lock or event to keep these
Get-Update-Store operations (transactions) atomic (thread-safe)
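A sketch of the bank-account example with the Get-Update-Store transaction guarded by a lock (the apply_interest helper and list-based account state are illustrative):

```python
import threading

balance = [100.0]                  # shared account state (EUR)
balance_lock = threading.Lock()

def apply_interest(rate):
    # the lock makes the whole Get-Update-Store transaction atomic
    with balance_lock:
        current = balance[0]       # Get
        current *= 1.0 + rate      # Update
        balance[0] = current       # Store

threads = [threading.Thread(target=apply_interest, args=(r,))
           for r in (0.10, 0.15)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# no update is lost: 100 * 1.10 * 1.15 == 126.5 in either scheduling order
```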
Use queues to share state between threads
Queue - synchronised task container (it uses simple locks internally)
import Queue
q = Queue.Queue()
q.put() # append a task
q.get() # retrieve a task, blocks if no task is available
q.task_done() # decrease task counter
q.join() # block until all tasks are complete
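Putting that API together, a minimal producer/consumer sketch (the module was renamed to queue in Python 3, hence the import dance; the None sentinel is just one common way to stop a worker):

```python
try:
    import queue               # Python 3
except ImportError:
    import Queue as queue      # Python 2

import threading

q = queue.Queue()
results = []

def worker():
    while True:
        task = q.get()         # blocks until a task is available
        if task is None:       # sentinel: tells the worker to stop
            q.task_done()
            break
        results.append(task * 2)
        q.task_done()          # decrease the unfinished-task counter

t = threading.Thread(target=worker)
t.daemon = True
t.start()

for n in (1, 2, 3):
    q.put(n)                   # append tasks
q.put(None)
q.join()                       # blocks until task_done() was called for every put()
```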
B - Buy Beer Brought By Bicycle
How it works
I have:
● a warehouse filled with tasty beer
● a cargo bike
● a cell phone where I can get orders
Pseudocode
while True:
    order = pop_order_off_stack()
    if order is None:
        continue
    collect_order(order)
    ready_bicycle()
    route = determine_route(order.address)
    bike_to_address(route)
    fulfill_transaction(order.price)
    return_home(route)
So it got a bit popular
● Hire employees
○ three messengers (threads)
○ a dispatcher (scheduler)
○ two order pickers (threads)
● Buy another cargo bike (pool)
q_orders = Queue.Queue()
q_loaded_bikes = Queue.Queue(maxsize=n_bikes)
q_empty_bikes = Queue.Queue(maxsize=n_bikes)
for m in xrange(n_messengers):
    thread = threading.Thread(target=messenger_worker,
                              args=(q_loaded_bikes, q_empty_bikes, q_orders, m))
    thread.daemon = True
    thread.start()

for o in xrange(n_order_pickers):
    thread = threading.Thread(target=order_picker_worker,
                              args=(q_orders, q_empty_bikes, q_loaded_bikes, o))
    thread.daemon = True
    thread.start()

q_orders.join()
def order_picker_worker(order_queue, empty_queue, ready_queue, n):
    def process_order(order):
        while True:
            try:
                empty_bike = empty_queue.get(True, 1)
            except Queue.Empty:
                continue
            sleep(5)                     # pick the order and load the bike
            ready_queue.put(empty_bike)
            break                        # this order is handled
    while True:
        try:
            beer_order = order_queue.get(True, 1)
        except Queue.Empty:
            continue
        process_order(beer_order)
def messenger_worker(full_queue, empty_queue, order_queue, n):
    while True:
        try:
            loaded_bike = full_queue.get(True, 1)
        except Queue.Empty:
            continue
        sleep(10)                        # cycle to the address and deliver
        empty_queue.put(loaded_bike)
        order_queue.task_done()
Condition.wait() instead of busy-waiting in while True:
#order picker
ready_cv.acquire()
orders.put(ready_order)
ready_cv.notify()
ready_cv.release()

#messenger
ready_cv.acquire()
while orders.empty():
    ready_cv.wait()    # releases the lock while waiting, reacquired on notify()
order = orders.get()
ready_cv.release()
do_stuff_with_order(order)
Greenlets instead of Threads
import gevent
from gevent.queue import Queue #similar to Queue, tuned for greenlets
Enter multiprocessing!
Difficulties with parallelism
● Amdahl's law: speedup is limited by the serial fraction of the program
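Amdahl's law says that with a parallelisable fraction p of the work spread over n workers, the overall speedup is 1 / ((1 - p) + p / n). A quick sketch (the helper name is ours):

```python
def amdahl_speedup(p, n):
    """Theoretical speedup with a parallel fraction p of the work on n workers."""
    return 1.0 / ((1.0 - p) + p / n)

# with 90% parallelisable work, 10 workers give only ~5.3x,
# and even infinitely many workers can never exceed 1 / (1 - 0.9) = 10x
```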
Communication between processes
import multiprocessing as mp
manager = mp.Manager()
queue_1 = manager.Queue()
…
process = mp.Process(target=messenger_worker, args=(queue_1, queue_2, …))
process.start()
…
class Messenger(mp.Process):
…
Questions?