Concurrency: CS2403 Programming Languages
Concurrency
Chung-Ta King
Department of Computer Science
National Tsing Hua University
(Slides are adapted from Concepts of Programming Languages, R.W. Sebesta)
Outline
Parallel architecture and programming
Language supports for concurrency
Controlling concurrent tasks
Sharing data
Synchronizing tasks
Sequential Computing
von Neumann arch. with Program Counter (PC)
dictates sequential execution
Traditional programming thus follows a single
thread of control
(Figure: the program counter steps through the sequence of program points reached as control flows through the program)
(Introduction to Parallel Computing, Blaise Barney)
Sequential Programming Dominates
Sequential programming has dominated
throughout computing history
Why?
Why is there no need to change programming style?
2 Factors Help to Maintain Perf.
IC technology: ever shrinking feature size
Moore’s law, faster switching, more functionalities
Architectural innovations to remove bottlenecks
in von Neumann architecture
Memory hierarchy for reducing memory latency:
registers, caches, scratchpad memory
Hide or tolerate memory latency: multithreading,
prefetching, predication, speculation
Executing multiple instructions in parallel: pipelining,
multiple issue (in-/out-of-order, VLIW), SIMD
multimedia extensions (inst.-level parallelism, ILP)
(Prof. Mary Hall, Univ. of Utah)
End of Sequential Programming?
It is infeasible to continue improving the performance of uniprocessors
Power, clocking, ...
Multicore architecture prevails (homogeneous or
heterogeneous)
Achieve performance gains with simpler processors
Sequential programming still alive!
Why?
Throughput versus execution time
Can we live with sequential prog. forever?
Parallel Programming
A programming style that specifies concurrency (control structure) & interaction (communication structure) between concurrent subtasks
Still in imperative language style
Concurrency can be expressed at various levels
of granularity
Machine instruction level, high-level language
statement level, unit level, program level
Different models assume different architectural
support
Look at parallel architectures first
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
Parallel Control Mechanisms
2 Classes of Parallel Architecture
Distributed memory architectures
Processing units (PEs) connected by an interconnect
Each PE has its own distinct address space without a
global address space, and they explicitly
communicate to exchange data
Ex.: PC clusters connected by commodity Ethernet
Shared Memory Programming
Often organized as a collection of threads of control
Each thread has private data, e.g., local stack, and a
set of shared variables, e.g., global heap
Threads communicate implicitly by writing and
reading shared variables
Threads coordinate through locks and barriers
implemented using shared variables
Distributed Memory Programming
Organized as named processes
A process is a thread of control plus local address
space -- NO shared data
A process cannot see the memory contents of other
processes, nor can it address and access them
Logically shared data is partitioned over processes
Processes communicate by explicit send/receive, i.e., asking the destination process to access its local data on behalf of the requesting process
Coordination is implicit in communication events (blocking/non-blocking send and receive)
Shared Memory Prog. with Threads
Several thread libraries:
PThreads: the POSIX threading interface
POSIX: Portable Operating System Interface for UNIX
Interface to OS utilities
System calls to create and synchronize threads
OpenMP is a newer standard
Allow a programmer to separate a program into serial
regions and parallel regions
Provide synchronization constructs
Compiler generates thread program & synch.
Extensions to Fortran, C, C++ mainly by directives
(Prof. Mary Hall, Univ. of Utah)
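A minimal Pthreads sketch (not from the original slides; the thread count and function name are illustrative) showing how a program creates threads and waits for them:

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4                     /* illustrative thread count */

void *hello(void *arg) {                  /* code run by each thread */
    long id = (long) arg;
    printf("hello from thread %ld\n", id);
    return NULL;
}

int main() {
    pthread_t tid[NUM_THREADS];
    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, hello, (void *) i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);       /* wait for every thread to finish */
    return 0;
}

Compile with the Pthreads library enabled, e.g., gcc -pthread hello.c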
Thread Basics
A thread is a program unit that can be in
concurrent execution with other program units
Threads differ from ordinary subprograms:
When a program unit starts the execution of a
thread, it is not necessarily suspended
When a thread’s execution is completed, control may
not return to the caller
All threads run in the same address space but have
own runtime stacks
Message Passing Prog. with MPI
MPI defines a standard library for message-
passing that can be used to develop portable
message-passing programs using C or Fortran
Based on Single Program, Multiple Data (SPMD)
All communication and synchronization require subroutine calls; there are no shared variables
Program runs on a single processor just like any
uniprocessor program, except for calls to message
passing library
It is possible to write fully-functional message-
passing programs by using only six routines
(Prof. Mary Hall, Univ. of Utah; Prof. Ananth Grama, Purdue Univ.)
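A sketch of a complete program built from the six routines usually meant here: MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, MPI_Send, and MPI_Recv. This example is not from the slides; the message value is made up and at least two processes are assumed:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, value;
    MPI_Init(&argc, &argv);                        /* start the MPI runtime */
    MPI_Comm_size(MPI_COMM_WORLD, &size);          /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);          /* this process's id: 0..size-1 */
    if (rank == 0) {
        value = 42;                                /* illustrative data */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 of %d received %d\n", size, value);
    }
    MPI_Finalize();                                /* shut down MPI */
    return 0;
}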
Message Passing Basics
The computing system consists of p processes, each with its own exclusive address space
Each data element must belong to one of the
partitions of the space; hence, data must be explicitly
partitioned and placed
All interactions (read-only or read/write) require
cooperation of two processes - the process that has
the data and one that wants to access the data
All processes execute asynchronously unless they
interact through send/receive synchronizations
Controlling Concurrent Tasks (cont.)
Java Thread class has several methods to
control the execution of threads
The yield method is a request from the running thread to voluntarily surrender the processor
The sleep method blocks the thread that calls it for a specified period of time
The join method forces a thread to delay its execution until the run method of another thread has completed its execution
Controlling Concurrent Tasks (cont.)
Java thread priority:
A thread's default priority is the same as that of the thread that created it
If main creates a thread, its default priority is NORM_PRIORITY
The Thread class defines two other priority constants, MAX_PRIORITY and MIN_PRIORITY
The priority of a thread can be changed with the setPriority method
Controlling Concurrent Tasks (cont.)
MPI:
The programmer writes the code for a single process; the mpicc compiler wrapper links in the necessary MPI libraries
mpicc -g -Wall -o mpi_hello mpi_hello.c
The execution environment starts parallel processes
mpiexec -n 4 ./mpi_hello
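The slides do not show mpi_hello.c itself; a plausible minimal version consistent with the commands above (the contents are a guess, not the original file) would be:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

With mpiexec -n 4, four copies of this one program run as separate processes, each printing its own rank.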
Pthreads Mutex Example
#include <pthread.h>

pthread_mutex_t sum_lock;               /* protects the shared variable sum */
int sum;

int main() {
    ...
    pthread_mutex_init(&sum_lock, NULL); /* initialize the mutex before use */
    ...
}

void *find_min(void *list_ptr) {
    int my_sum;
    /* ... compute my_sum from this thread's portion of the list ... */
    pthread_mutex_lock(&sum_lock);       /* enter critical section */
    sum += my_sum;                       /* only one thread updates sum at a time */
    pthread_mutex_unlock(&sum_lock);     /* leave critical section */
    return NULL;
}
Synchronizing Tasks (cont.)
OpenMP:
OpenMP has a reduction clause
sum = 0;
#pragma omp parallel for reduction(+:sum)
for (i=0; i < 100; i++) {
    sum += array[i];
}
OpenMP also has a critical directive; the enclosed code is executed by all threads, but by only one thread at a time
#pragma omp critical [( name )] new-line
sum = sum + 1;
(Prof. Mary Hall, Univ. of Utah)
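A minimal compilable sketch of the reduction clause above (not from the slides; the array contents are made up):

#include <stdio.h>

int main() {
    int array[100], sum = 0;
    for (int i = 0; i < 100; i++)
        array[i] = i;                    /* illustrative input data */

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100; i++)
        sum += array[i];                 /* each thread accumulates a private copy of sum */

    printf("sum = %d\n", sum);           /* private copies are combined with + at the end */
    return 0;
}

Compile with OpenMP enabled, e.g., gcc -fopenmp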
Synchronizing Tasks (cont.)
Java:
A method that includes the synchronized modifier disallows any other synchronized method from running on the object while it is in execution
public synchronized void deposit(int i)
{…}
public synchronized int fetch() {…}
The above two methods are synchronized which
prevents them from interfering with each other
Synchronizing Tasks (cont.)
Java:
Cooperation synchronization is achieved via wait,
notify, and notifyAll methods
All methods are defined in Object, which is the root
class in Java, so all objects inherit them
The wait method must be called in a loop
The notify method is called to tell one waiting thread that the event it was waiting for has occurred
The notifyAll method awakens all of the threads
on the object’s wait list
Synchronizing Tasks (cont.)
MPI:
Use send/receive to complete task synchronizations,
but semantics of send/receive have to be specialized
Non-blocking send/receive: send() and receive() calls return immediately, whether or not the operation has completed
Blocking send/receive:
Unbuffered blocking send() does not return until
matching receive() is encountered at receiving process
Buffered blocking send() will return after the sender
has copied the data into the designated buffer
Blocking receive() forces the receiving process to wait
(Prof. Ananth Grama, Purdue Univ.)
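A rough sketch (not from the slides) contrasting a blocking MPI_Send with a non-blocking MPI_Irecv completed by MPI_Wait; the data value is illustrative and at least two processes are assumed:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, data = 7, recv_data;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Blocking send: returns after the data is buffered, or (if unbuffered)
           only when the matching receive has been encountered */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Non-blocking receive: returns immediately, whether or not data has arrived */
        MPI_Irecv(&recv_data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ... other useful work could overlap with the communication here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* now block until the message is in recv_data */
        printf("process 1 received %d\n", recv_data);
    }
    MPI_Finalize();
    return 0;
}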
Unbuffered Blocking