Chapter 4 Concurrent Programming
4.1 Introduction to Parallel Computing
In the early days, most computers had only one processing element,
known as the processor or Central Processing Unit (CPU).
Computer programs are traditionally written for serial computation.
Parallel computing is a computing scheme which tries to use multiple
processors executing parallel algorithms to solve problems faster.
With the advent of multicore processors in recent years, most operating
systems, such as Linux, support Symmetrical Multiprocessing (SMP).
The future of computing is clearly in the direction of parallel
computing.
4.1.1 Sequential Algorithms vs. Parallel Algorithms
A sequential algorithm consists of many steps, all of which are performed by a
single task serially, one step at a time.
A parallel algorithm consists of separate tasks, all of which can be performed in parallel.
4.1.2 Parallelism vs. Concurrency
In general, a parallel algorithm only identifies tasks that can be executed
in parallel, but it does not specify how to map the tasks to processing
elements.
Ideally, all the tasks in a parallel algorithm should be executed
simultaneously in real time.
However, true parallel executions can only be achieved in systems with
multiple processing elements, such as multiprocessor or multicore
systems.
On single CPU systems, only one task can execute at a time.
In this case, different tasks can only execute concurrently, i.e. logically
in parallel.
4.2 Threads
4.2.1 Principle of Threads
In the process model, processes are independent execution units.
Each process executes in either kernel mode or user mode.
While in user mode, each process executes in a unique address space,
which is separated from other processes.
Although each process is an independent unit, it has only one execution
path.
Whenever a process must wait for something, e.g. an I/O completion
event, it becomes suspended and the entire process execution stops.
Threads are independent execution units in the same address space of a process.
Every process is created in a unique address space with a main
thread.
When a process begins, it executes the main thread of the process.
The main thread may create other threads. Each thread may create yet more
threads.
All threads in a process execute in the same address space of the process but each
thread is an independent execution unit.
In the threads model, if a thread becomes suspended, other threads may continue
to execute.
In addition to sharing a common address space, threads also share many other
resources of a process, such as user id, open file descriptors and
signals, etc.
Currently, almost all operating systems support Pthreads, which is the threads
standard of IEEE POSIX 1003.1c.
4.2.2 Advantages of Threads
Threads have many advantages over processes.
(1). Thread creation and switching are faster.
(2). Threads are more responsive.
(3). Threads are better suited to parallel computing.
Parallel algorithms often require the execution entities to share common
data.
In the process model, processes cannot share data efficiently because their
address spaces are all distinct; they must exchange data by Interprocess
Communication (IPC), which is costly.
Threads in the same process share all the (global) data in the same address
space.
4.2.3 Disadvantages of Threads
(1). Because of the shared address space, threads need explicit
synchronization from the user.
(2). Many library functions may not be thread safe, e.g. the traditional
strtok() function, which divides a string into tokens in-line.
In general, any function which uses global variables or relies on
contents of static memory is not thread safe (see the sketch after this list).
(3). On single CPU systems, using threads to solve problems is actually
slower than using a sequential program due to the overhead in thread
creation and context switching at run-time.
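As a minimal sketch of the thread-safety issue in point (2), the POSIX strtok_r() function can be used instead of strtok(); it keeps its parsing position in a caller-supplied pointer rather than in static memory, so different threads can tokenize different strings safely. The string and token names below are illustrative.
/* strtok() keeps its scan position in static memory, so it is not
   thread safe; strtok_r() stores the position in a caller-supplied pointer */
#include <stdio.h>
#include <string.h>
int main(void)
{
    char line[] = "this is a line of words";
    char *saveptr;                              // per-caller parsing state
    char *token = strtok_r(line, " ", &saveptr);
    while (token != NULL) {
        printf("token: %s\n", token);
        token = strtok_r(NULL, " ", &saveptr);
    }
    return 0;
}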
4.4 Threads Management Functions
The Pthreads library offers the following APIs for threads management.
pthread_create(thread, attr, function, arg) : create thread
pthread_exit(status) : terminate thread
pthread_cancel(thread) : cancel thread
pthread_attr_init(attr) : initialize thread attributes
pthread_attr_destroy(attr) : destroy thread attribute
4.4.1 Create Thread
Threads are created by the pthread_create() function.
int pthread_create (pthread_t *pthread_id, pthread_attr_t *attr,
void *(*func)(void *), void *arg);
which returns 0 on success or an error number on failure. Parameters to
the pthread_create() function are
• pthread_id is a pointer to a variable of the pthread_t type. It will be
filled with the unique thread ID assigned by the OS kernel.
• A thread may get its own ID by the pthread_self() function.
• In Linux, pthread_t type is defined as unsigned long, so thread ID can
be printed as %lu.
• attr is a pointer to another opaque data type, which specifies the thread
attributes.
• func is the entry address of a function for the new thread to execute.
• arg is a pointer to a parameter for the thread function, which can be
written as void *func(void *arg).
• The steps of using an attr parameter are as follows.
(1). Define a pthread attribute variable pthread_attr_t attr
(2). Initialize the attribute variable with pthread_attr_init(&attr)
(3). Set the attribute variable and use it in pthread_create() call
(4). If desired, free the attr resource by pthread_attr_destroy(&attr)
• By default, every thread is created to be joinable with other threads.
• If desired, a thread can be created with the detached attribute, which
makes it non-joinable with other threads.
• The following code segment shows how to create a detached thread.
pthread_attr_t attr;                           // define an attr variable
pthread_attr_init(&attr);                      // initialize attr
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); // set attr
pthread_create(&thread_id, &attr, func, NULL); // create thread with attr
pthread_attr_destroy(&attr);                   // optional: destroy attr
• Every thread is created with a default stack size. A program may find
out the stack size recorded in a thread attributes object by the function
int pthread_attr_getstacksize (pthread_attr_t *attr, size_t *stacksize);
which fills *stacksize with the stack size in attr; for a newly initialized
attr this is the default stack size.
The following code segment shows how to create a thread with a
specific stack size.
pthread_attr_t attr;                            // attr variable
size_t stacksize;                               // stack size
pthread_attr_init(&attr);                       // initialize attr
stacksize = 0x10000;                            // stacksize = 64KB
pthread_attr_setstacksize(&attr, stacksize);    // set stack size in attr
pthread_create(&threads[t], &attr, func, NULL); // create thread with stack size
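Conversely, the following code segment is a minimal sketch of querying the stack size recorded in an attributes object; the variable names and the printf() call are illustrative.
pthread_attr_t attr;
size_t stacksize;
pthread_attr_init(&attr);                       // attr now holds default values
pthread_attr_getstacksize(&attr, &stacksize);   // read the default stack size
printf("default stack size = %zu bytes\n", stacksize);
pthread_attr_destroy(&attr);                    // free attr resource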
• If the attr parameter is NULL, threads will be created with default
attributes.
• In fact, this is the recommended way of creating threads, which should
be followed unless there is a compelling reason to alter the thread
attributes.
4.4.2 Thread ID
Thread ID is an opaque data type, which depends on implementation.
Thread IDs should not be compared directly.
If needed, they can be compared by the pthread_equal() function.
int pthread_equal (pthread_t t1, pthread_t t2);
which returns zero if the threads are different threads, non-zero
otherwise.
4.4.3 Thread Termination
A thread terminates when the thread function finishes.
Alternatively, a thread may call the function
void pthread_exit (void *status);
to terminate explicitly, where status is the exit status of the thread.
As usual, a 0 (NULL) exit value means normal termination, and non-zero values
mean abnormal termination.
4.4.4 Thread Join
A thread can wait for the termination of another thread by
int pthread_join (pthread_t thread, void **status_ptr);
The exit status of the terminated thread is returned in status_ptr.
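Putting the management functions together, the following is a minimal sketch in which the main thread creates a thread with default attributes, the new thread prints its ID obtained by pthread_self() and terminates by pthread_exit(), and the main thread waits for it by pthread_join(). The thread function task() and its argument are illustrative.
#include <stdio.h>
#include <pthread.h>
void *task(void *arg)               // thread function: print ID, then exit
{
    printf("thread %lu running with arg=%s\n",
           (unsigned long)pthread_self(), (char *)arg);
    pthread_exit((void *)0);        // explicit termination, exit status 0
}
int main(void)
{
    pthread_t tid;
    void *status;
    pthread_create(&tid, NULL, task, "hello");  // default attributes
    pthread_join(tid, &status);                 // wait for thread termination
    printf("main: thread %lu exited with status %ld\n",
           (unsigned long)tid, (long)status);
    return 0;
}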
4.5.1 Sum of Matrix by Threads
Example 4.1: Assume that we want to compute the sum of all the
elements in an N x N matrix of integers.
The program must be compiled as
gcc C4.1.c -pthread
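The full C4.1.c program is not reproduced in this excerpt; the following is a minimal sketch of the same idea, assuming N working threads, each of which computes the partial sum of one row of a global matrix A and deposits it in a global sum[ ] array; the main thread joins with the working threads and adds the partial sums. The names A, sum, func and the test values are illustrative.
#include <stdio.h>
#include <pthread.h>
#define N 4
int A[N][N], sum[N];                // global matrix and partial sums
void *func(void *arg)               // working thread: sum one row
{
    long row = (long)arg;
    int s = 0;
    for (int i = 0; i < N; i++)
        s += A[row][i];
    sum[row] = s;                   // deposit partial sum in global array
    pthread_exit(NULL);
}
int main(void)
{
    pthread_t threads[N];
    int total = 0;
    for (int i = 0; i < N; i++)     // fill the matrix with test values
        for (int j = 0; j < N; j++)
            A[i][j] = i * N + j + 1;
    for (long i = 0; i < N; i++)    // create N working threads
        pthread_create(&threads[i], NULL, func, (void *)i);
    for (int i = 0; i < N; i++) {   // join with each thread, add partial sums
        pthread_join(threads[i], NULL);
        total += sum[i];
    }
    printf("total = %d\n", total);
    return 0;
}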
4.5.2 Quicksort by Threads
Example 4.2: Quicksort by Concurrent Threads
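The full program is not listed in this excerpt; the following is a minimal sketch of the idea, assuming each thread partitions its range of a global array a[ ], then creates two new threads to sort the left and right partitions and joins with them. The array contents, the PARM type and the function names are illustrative.
#include <stdio.h>
#include <pthread.h>
int a[] = {5, 1, 9, 3, 7, 2, 8, 6, 4, 0};   // global array to be sorted
typedef struct { int low, high; } PARM;     // sorting range for a thread
void *qsort_thread(void *arg)
{
    PARM *p = (PARM *)arg;
    int low = p->low, high = p->high;
    if (low >= high)                        // range of 0 or 1 element: done
        return NULL;
    int pivot = a[high], i = low - 1;       // partition around last element
    for (int j = low; j < high; j++)
        if (a[j] < pivot) {
            i++;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    i++;
    int t = a[i]; a[i] = a[high]; a[high] = t;
    PARM left = {low, i - 1}, right = {i + 1, high};
    pthread_t tl, tr;                       // sort both halves concurrently
    pthread_create(&tl, NULL, qsort_thread, &left);
    pthread_create(&tr, NULL, qsort_thread, &right);
    pthread_join(tl, NULL);
    pthread_join(tr, NULL);
    return NULL;
}
int main(void)
{
    int n = sizeof(a) / sizeof(a[0]);
    PARM whole = {0, n - 1};
    pthread_t t;
    pthread_create(&t, NULL, qsort_thread, &whole);
    pthread_join(t, NULL);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}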
4.6 Threads Synchronization
Since threads execute in the same address space of a process, they share
all the global variables and data structures in the same address space.
When several threads try to modify the same shared variable or data
structure, if the outcome depends on the execution order of the threads,
it is called a race condition.
In order to prevent race conditions, as well as to support threads
cooperation, threads need synchronization.
In general, synchronization refers to the mechanisms and rules used to
ensure the integrity of shared data objects and coordination of
concurrently executing entities.
4.6.1 Mutex Locks
In Pthreads, locks are called mutex, which stands for Mutual Exclusion.
Mutex variables are declared with the type pthread_mutex_t, and they
must be initialized before using.
(1) Statically, as in
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
which defines a mutex variable m and initializes it with default
attributes.
(2) Dynamically with the pthread_mutex_init() function, which allows
setting the mutex attributes by an attr parameter, as in
pthread_mutex_init (pthread_mutex_t *m, pthread_mutexattr_t *attr);
As usual, the attr parameter can be set to NULL for default attributes.
After initialization, mutex variables can be used by threads via the
following functions.
int pthread_mutex_lock (pthread_mutex_t *m);    // lock mutex
int pthread_mutex_unlock (pthread_mutex_t *m);  // unlock mutex
int pthread_mutex_trylock (pthread_mutex_t *m); // try to lock mutex
int pthread_mutex_destroy (pthread_mutex_t *m); // destroy mutex
• Threads use mutexes as locks to protect shared data objects.
• A thread first creates a mutex and initializes it once.
• A newly created mutex is in the unlocked state and without an owner.
• Each thread tries to access a shared data object by
pthread_mutex_lock(&m);      // lock mutex
access shared data object;   // access shared data in a critical region
pthread_mutex_unlock(&m);    // unlock mutex
A sequence of executions which can only be performed by one
execution entity at a time is commonly known as a Critical Region
(CR).
Example 4.3: This example is a modified version of Example 4.1. As
before, we shall use N working threads to compute the sum of all the
elements of an NxN matrix of integers. Each working thread computes
the partial sum of a row. Instead of depositing the partial sums in a
global sum[ ] array, each working thread tries to update a global
variable, total, by adding its partial sum to it.
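The full program is not reproduced in this excerpt; the following is a minimal sketch of the working thread's update step, assuming a global int total, a mutex m, and the global matrix A and dimension N of the Example 4.1 sketch above.
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  // protects the global total
int total = 0;
void *func(void *arg)            // working thread: sum one row, update total
{
    long row = (long)arg;
    int partial = 0;
    for (int i = 0; i < N; i++)
        partial += A[row][i];
    pthread_mutex_lock(&m);      // enter critical region
    total += partial;            // update shared variable
    pthread_mutex_unlock(&m);    // exit critical region
    pthread_exit(NULL);
}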
4.6.2 Deadlock Prevention
Deadlock is a condition in which many execution entities mutually wait
for one another so that none of them can proceed.
For example, assume that thread T1 holds a mutex m1 and requests another
mutex m2, while thread T2 holds m2 and requests m1.
In this case, T1 and T2 would mutually wait for each other forever, so
they are in a deadlock due to crossed locking requests.
There are many ways to deal with possible deadlocks, which include
deadlock prevention, deadlock avoidance, deadlock detection and
recovery, etc.
In real systems, the only practical way is deadlock prevention, which tries
to prevent deadlocks from occurring when designing parallel algorithms.
A simple way to prevent deadlock is to order the mutexes and ensure that
every thread requests mutex locks only in a single direction, so that there
are no loops in the request sequences.
However, it may not be possible to design every parallel algorithm with
only uni-directional locking requests.
In such cases, the conditional locking function,
pthread_mutex_trylock(),
may be used to prevent deadlocks.
The trylock() function returns immediately with an error if the mutex is
already locked.
In that case, the calling thread may back-off by releasing some of the
locks it already holds, allowing other threads to continue.
In the above crossed locking example, we may redesign one of the
threads, e.g. T1, as follows
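The redesigned T1 is not listed in this excerpt; the following code segment is a minimal sketch of the back-off idea, using the mutexes m1 and m2 of the crossed locking example above.
// Redesigned T1: lock m1, then try to lock m2; if m2 is unavailable,
// back off by releasing m1 so that T2 can proceed, then retry.
while (1) {
    pthread_mutex_lock(&m1);
    if (pthread_mutex_trylock(&m2) == 0)
        break;                       // acquired both m1 and m2
    pthread_mutex_unlock(&m1);       // back off: release m1 and retry
}
// critical region: access the objects protected by m1 and m2
pthread_mutex_unlock(&m2);
pthread_mutex_unlock(&m1);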
4.6.3 Condition Variables
Condition variables provide a means for threads cooperation.
Condition variables are always used in conjunction with mutex locks.
In Pthreads, condition variables are declared with the type
pthread_cond_t, and must be initialized before using.
Like mutexes, condition variables can also be initialized in two ways.
(1) Statically, when it is declared, as in
pthread_cond_t con = PTHREAD_COND_INITIALIZER;
which defines a condition variable, con, and initializes it with default
attributes.
(2) Dynamically with the pthread_cond_init() function, which allows
setting a condition variable with an attr parameter.
For simplicity, we shall always use a NULL attr parameter for default
attributes.
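As a minimal sketch of how a condition variable is used together with a mutex: a waiting thread locks the mutex, tests a shared condition and calls pthread_cond_wait() while the condition does not hold; another thread changes the condition and calls pthread_cond_signal() to wake it up. The names m, con, done, waiter and signaler are illustrative.
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t con = PTHREAD_COND_INITIALIZER;
int done = 0;                        // shared condition protected by m
void *waiter(void *arg)              // waits until done becomes nonzero
{
    pthread_mutex_lock(&m);
    while (!done)                    // re-test the condition after each wakeup
        pthread_cond_wait(&con, &m); // atomically unlock m and wait on con
    pthread_mutex_unlock(&m);
    return NULL;
}
void *signaler(void *arg)            // sets the condition and wakes the waiter
{
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_signal(&con);       // wake up one waiting thread
    pthread_mutex_unlock(&m);
    return NULL;
}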