
ACA 2024W 03 Shared-Memory Programming 1-35


Multicore & GPU Programming: An Integrated Approach

3 Shared-memory programming: Threads


By G. Barlas

© G. Barlas, 2015

Modifications by H. Weber

Objectives
• Learn what threads are and how you can create them.
• Learn how to initialize threads in order to perform a desired task.
• Learn how to terminate a multi-threaded program using different techniques.
• Understand problems associated with having threads access shared resources, like race conditions and deadlocks.
• Learn what semaphores and monitors are and how you can use them in your programs.
• Become familiar with classical synchronization problems and their solutions.
• Learn how threads can be dynamically managed at run-time.
• Learn effective debugging techniques for multi-threaded programs.

Introduction

• Thread: An execution path (a sequence of instructions) that is managed separately by the operating system scheduler as a unit. There can be multiple threads per process.
• Threads are good for:
  − Improving performance
  − Background tasks
  − Asynchronous processing
  − Improving program structure

Thread Creation

• Traditionally, threads are handled by third-party libraries, such as:
  − pThreads: A C-based library that is considered the common denominator for threading support.
  − winThreads: A Windows-only C++ based library.
  − Qt threads: A part of the Qt cross-platform C++ toolkit.
• The C++11 standard (ISO/IEC 14882:2011) includes a built-in thread library.
• However, some higher-level functionality is missing from the standard.


Threads in C++11

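A minimal sketch of thread creation with the C++11 standard library. The names worker and run_workers are hypothetical and chosen only for illustration. Each std::thread starts running as soon as it is constructed, and join() waits for it to finish.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical task: thread `id` stores id*id in its own slot of `out`.
// Each thread writes a different element, so no locking is needed here.
void worker(int id, std::vector<int> &out) {
    out[id] = id * id;
}

// Starts n threads, waits for all of them, and returns the filled vector.
std::vector<int> run_workers(int n) {
    assert(n >= 0);
    std::vector<int> out(n, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(worker, i, std::ref(out));  // thread starts here
    for (std::thread &t : threads)
        t.join();                                        // wait for completion
    return out;
}
```

Arguments are copied into the new thread by default, which is why the output vector is wrapped in std::ref.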

Thread Creation in Qt
• Qt thread management is almost identical to Java's.
• Example:


Thread Creation in Qt (cont.)

Threads in Qt : Implicit Approach


• Qt also supports a simpler way of using threads (without the QThread formality):
• The QtConcurrent namespace provides a run() method.
• QThreadPool provides the threads used by run().
• Questions:
  − How do we pass parameters?
  − How do we detect when the thread is done?
  − How do we collect the result?

Implicit Threads in Qt (cont.)
• Parameters for the function to be run by a thread can be appended to the QtConcurrent::run() invocation.
• run() returns an instance of the QFuture class.
• QFuture provides:
  − isStarted()
  − isRunning()
  − isFinished()
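The same three questions (passing parameters, detecting completion, collecting the result) can be illustrated with the C++11 standard library, used here as a stand-in for QtConcurrent::run() and QFuture. This is only an analogous sketch. The names sum_to and run_async_sum are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical task: sums the integers 1..n.
long sum_to(int n) {
    long s = 0;
    for (int i = 1; i <= n; ++i) s += i;
    return s;
}

long run_async_sum(int n) {
    assert(n >= 0);
    // Parameters simply follow the function in the call.
    std::future<long> f = std::async(std::launch::async, sum_to, n);

    // Non-blocking completion check, comparable in spirit to isFinished().
    bool done = f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    (void)done;  // not needed further in this sketch

    return f.get();  // blocks until the task is done and returns its result
}
```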

Implicit Threads in Qt (cont.)


• A simple example:

    #include <QtConcurrent/QtConcurrentRun>
    #include <QFuture>
    #include <iostream>

    using namespace std;

    int* test(int N, char *data)
    {
        int *x = new int[N];
        x[0]=100;
        return x;
    }

    int main(){
        int N=3;
        char c;
        QFuture<int *> t = QtConcurrent::run(test, N, &c);
        cout << (t.result())[0] << endl;
        return 0;
    }


A more elaborate example: multi-threaded file hashing

A more elaborate example: multi-threaded file hashing (2)


Data Sharing Between Threads
• Race condition: An abnormal program behavior caused by dependence of the result on the relative timing of events in the program.

A Race Condition in action

A Race Condition in action (2)

A Race Condition in action (3)

• A sample run:


Semaphores

• A semaphore is a software abstraction proposed by the late Dutch computer scientist Edsger Dijkstra that supports two atomic operations:
  – P ("try to decrement"): if the semaphore's value is positive, it is decremented and the caller proceeds; if the value is zero, the caller blocks until another thread performs V.
  – V ("increment"): increments the value, which lets one blocked thread (if any) proceed.
• Semaphores are equipped with a queue for holding blocked threads. If the queue is a FIFO queue, the semaphore is called strong (weak otherwise).
• A binary semaphore (a.k.a. mutex, from mutual exclusion) can have only two states. P blocks if the old value is false.

Semaphore Classes in Qt

[Figure: Qt semaphore classes; only the label "= Counting Semaphore" survives]


Using a Semaphore

• A semaphore can be utilized in three distinct ways:
  − As a lock. Suitable semaphore type: binary.
  − As a resource counter. Suitable semaphore type: general.
  − As a signaling mechanism. Suitable semaphore type: binary or general, depending on the application.

Semaphore Use Patterns

Common Problem Patterns:
Producers-Consumers (Code without Sync.)

Producers-Consumers
(Code with Sync. for n prod., m cons.)

Termination Problem

• How can we stop the threads upon program termination in an orchestrated manner?
• If the main/parent thread exits, the OS also terminates all child threads, but this is not the proper course of action.
• Two ways to terminate threads without help from the OS:
  − Shared data item
  − Messages

Termination Using a Shared Flag
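• Changes to the shared flag should be detected as soon as possible.
• To keep threads from working with a cached copy of the flag, the corresponding variable has to be declared volatile. (In standard C++11, volatile alone does not make accesses atomic or ordered between threads; std::atomic<bool> provides that guarantee portably.)
• Example: application in the producers-consumers solution (a similar declaration should be in the Consumer class):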
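A minimal sketch of flag-based termination using the C++11 standard library, with std::atomic<bool> playing the role of the shared flag. The function run_until_flag is hypothetical. The worker re-reads the flag on every iteration and leaves its loop soon after the main thread sets it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Lets a worker spin for roughly `millis` milliseconds, then asks it to stop.
// Returns the number of loop iterations the worker completed.
long run_until_flag(int millis) {
    assert(millis >= 0);
    std::atomic<bool> stop(false);   // the shared termination flag
    long iterations = 0;             // written only by the worker

    std::thread worker([&] {
        while (!stop.load()) {       // flag is checked on every iteration
            ++iterations;
            std::this_thread::yield();
        }
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(millis));
    stop.store(true);                // request termination
    worker.join();                   // after join, `iterations` is safe to read
    return iterations;
}
```

A thread that is blocked (for example on a semaphore) never reaches the flag check, so it additionally has to be woken up.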

Termination Using a Shared Flag (2)


• Assume that a consumer decides about the termination condition at runtime.
• Threads that are blocked have to be woken up:

Termination Using a Shared Flag (3)

Termination Using Messages


• Same requirements as when using a shared flag:
  – The termination message should reach all threads.
  – The threads should be able to detect the termination message within a reasonable amount of time.
• If the application already exchanges data between all threads, a special data item can encode the termination message.


Monitors
• A monitor is an object that encapsulates thread coordination logic. Its methods are mutually exclusive for the threads using it.
  – Monitors use a condition variable.
  – Qt provides a condition variable implementation in the form of the QWaitCondition class.
• Commonly used methods of the QWaitCondition class:
  – wait(): Forces a thread to block. A reference to the mutex controlling entry to the monitor is passed to the method, which unlocks it while the thread waits and locks it again before returning, so the programmer does not have to do this separately.
  – wakeOne(): Notifies one blocked thread.
  – wakeAll(): Notifies all blocked threads. They will all run, one by one, inside the monitor.

Monitors (2)
• Qt provides one more class that simplifies the management of the entry-controlling mutex: QMutexLocker.
• An instance of this class should be created at the very first line of each monitor method. Its responsibility is to lock the entry-controlling mutex, and that is why its constructor is passed a reference to it.
• The benefit of this seemingly redundant act is that a monitor method can terminate at multiple program positions without requiring an explicit mutex unlock.
• When the destructor of the QMutexLocker class is called (upon method termination), the mutex is unlocked, saving effort and eliminating potential programming errors.
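A minimal sketch of a monitor in standard C++11, where std::unique_lock takes the role described above for QMutexLocker and std::condition_variable the role of QWaitCondition. The class BufferMonitor and the function transfer are hypothetical. Each method locks the entry mutex on its first line, and the lock is released automatically on every exit path.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

class BufferMonitor {
public:
    explicit BufferMonitor(std::size_t capacity) : cap_(capacity) {
        assert(capacity > 0);
    }

    void put(int x) {
        std::unique_lock<std::mutex> lock(m_);   // entry to the monitor
        notFull_.wait(lock, [this] { return q_.size() < cap_; });
        q_.push_back(x);
        notEmpty_.notify_one();                  // wake one waiting consumer
    }                                            // lock released here

    int get() {
        std::unique_lock<std::mutex> lock(m_);   // entry to the monitor
        notEmpty_.wait(lock, [this] { return !q_.empty(); });
        int x = q_.front();
        q_.pop_front();
        notFull_.notify_one();                   // wake one waiting producer
        return x;                                // lock released on return
    }

private:
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::deque<int> q_;
    std::size_t cap_;
};

// One producer thread sends 1..n through the monitor; the caller sums them.
long transfer(int n) {
    BufferMonitor buf(4);
    std::thread producer([&] { for (int i = 1; i <= n; ++i) buf.put(i); });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += buf.get();
    producer.join();
    return sum;
}
```

Here the buffer access itself happens inside the monitor, which matches the first of the two designs discussed next.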


Design #1: Critical Section inside Monitor

• What if the critical section is too time-consuming?
  – Output of s<thread1> to out<thread1> may take a long time.
    • ⇒ All other outputs to other streams have to wait.
  – Better solution for such cases:
    • Monitor controls entry to the critical section.

Design #2: Monitor controls entry to Critical Section

• The arrangement is similar to the semaphore-based design.
• But monitors can check for very complicated conditions in an easy way.
• Example: producers-consumers implementation.
• This approach is sensible when buffer deposit/extraction is time-consuming (e.g. when it involves a deep copy of an object and not just the copy of a pointer).


Producers-Consumers:
Buffer I/O exterior to the monitor (example)

Look at Code monitor2ProdCons.cpp.

Static Thread Management Revisited


• Qt's QThreadPool class manages a collection of QThreads.
• Useful QThreadPool functions:
  − start(): reserves a thread and uses it to run the supplied QRunnable
  − activeThreadCount(): how many threads are currently running
  − maxThreadCount(): maximum number of threads in QThreadPool
  − setMaxThreadCount(): mutator method for the maximum number of threads
  − waitForDone(): waits for all threads to finish
• These functions are accessed via QThreadPool::globalInstance()->...
• A major issue is that there is little control over which thread will run and when.

Debugging Multithreaded Appl.
• Tools & setup:
  − Use a debugger supporting multithreaded execution (e.g. DDD).
  − Compile your code with debugging information enabled (-g).
  − Do not let the compiler optimize the code (no -O switches).
• Application instrumentation:
  − The application should generate some form of log or trace history.
  − Limit or control the number of threads, e.g. use a single thread to weed out algorithmic errors.

Debugging Multithreaded Appl. (2)


• You can differentiate normal from debugging output by using the standard error stream.
• The debugging messages can be filtered and redirected to a file, e.g. (works in both Linux and Windows):
    $ myprog 2> trace.log
• Debugging output should be time-stamped.
• Compiler directives can be used to control when and which parts of the debugging code are active.
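A minimal sketch of time-stamped debugging output on standard error that can be switched off at compile time. The macro names DEBUG and TRACE and the function trace_line are hypothetical. The timestamp is the time elapsed since the first trace call, which is usually enough to order events from different threads.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>

#ifndef DEBUG
#define DEBUG 1   // assumption: normally set by the build, e.g. -DDEBUG=0
#endif

// Builds one trace line: elapsed microseconds, thread id, message.
std::string trace_line(const std::string &msg) {
    // Initialized once, on the first call (thread-safe in C++11).
    static const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                       std::chrono::steady_clock::now() - start).count();
    std::ostringstream os;
    os << "[" << us << " us] [thread " << std::this_thread::get_id() << "] "
       << msg;
    return os.str();
}

#if DEBUG
// The whole line is built first and written in one statement,
// which reduces interleaving between threads.
#define TRACE(msg) (std::cerr << trace_line(msg) << std::endl)
#else
#define TRACE(msg) ((void)0)
#endif
```

A call such as TRACE("consumer started") then ends up in trace.log when the program is run with the redirection shown above.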


Debugging Multithreaded Appl. (3)

Look at Code debugSample.cpp.

A sample run


Qt's Higher Level Functionality

• Qt offers functions for processing collections of data in parallel, using one of the following operations (or a combination of two):
  − Mapping: applying a function to all the elements of the collection
  − Filtering: selecting a subset based on the outcome of a predicate function
  − Reduction: consolidating all the data into a single result, e.g. a summation

Qt's Higher Level Functionality (2)


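• Both blocking and non-blocking functions are offered. The non-blocking ones require the use of a QFuture object to query completion. Qt's list of functions:
  − Non-blocking: map(), mapped(), mappedReduced(), filter(), filtered(), filteredReduced()
  − Blocking: blockingMap(), blockingMapped(), blockingMappedReduced(), blockingFilter(), blockingFiltered(), blockingFilteredReduced()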


Concurrent Map
• The QtConcurrent::map function applies a supplied function to all the members of a collection in parallel. Example:

Concurrent Map (2)


• The supplied function needs to have this signature:

• If a new collection needs to be created, the mapped variant can be used instead. In that case, there is no need to use references to the input data:

• The blocking variants differ only in that they do not require the use of a QFuture object.

The "mapped" variant (example)

The "mapped" variant


• The results can be retrieved in the following ways:
  − Using iterators to go over the new collection:

      QFuture<int> r = QtConcurrent::mapped(data, mult);
      for(QFuture<int>::const_iterator i = r.begin(); i != r.end(); i++)
          cout << *i << " ";
      cout << endl;

  − Using the resultAt and resultCount methods of QFuture:

      for(int i=0;i<r.resultCount(); i++)
          cout << r.resultAt(i) << " ";
      cout << endl;

  − Getting a reference to the QList<> object generated:

      QList<int> res = r.results();
      for(int i=0;i<res.size(); i++)
          cout << res[i] << " ";
      cout << endl;

The blocking "mapped" variant
• The results can be retrieved immediately from the call:

    QList<int> res = QtConcurrent::blockingMapped(data, mult);

    for(int i=0;i<res.size(); i++)
        cout << res[i] << " ";
    cout << endl;

Map-reduce

• Provided by the QtConcurrent::mappedReduced() and QtConcurrent::blockingMappedReduced() functions.
• Two phases are applied, requiring two functions: one for the mapping and one for the reduction.
• Reduction function signature:

• The first argument receives the result of the reduction.
• Major issue: the reduction is sequential.


Map-reduce Example
• Finding the variance of a random variable x:

Map-reduce Example (cont.)
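One possible way to phrase the variance computation as a map step followed by a reduce step, assuming the population variance in the form E[x^2] - (E[x])^2. This is a plain C++11 sketch with hypothetical names (Moments, to_moments, add_moments, variance). It runs sequentially and only illustrates the shape of the two functions, with the reduction function receiving its result in the first argument.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Intermediate result carried from the map phase to the reduce phase.
struct Moments {
    double sum;    // sum of x
    double sumSq;  // sum of x^2
    long n;        // number of samples
};

// Map: turns one sample into its contribution to the sums.
Moments to_moments(const double &x) {
    return Moments{x, x * x, 1};
}

// Reduce: folds one intermediate value into the running result
// (first argument receives the result).
void add_moments(Moments &result, const Moments &m) {
    result.sum += m.sum;
    result.sumSq += m.sumSq;
    result.n += m.n;
}

// Population variance via E[x^2] - (E[x])^2.
double variance(const std::vector<double> &xs) {
    assert(!xs.empty());
    Moments total{0.0, 0.0, 0};
    for (const double &x : xs)
        add_moments(total, to_moments(x));  // the map calls are independent
    double mean = total.sum / total.n;
    return total.sumSq / total.n - mean * mean;
}
```

Only the to_moments calls are independent of each other; the accumulation into a single Moments object is the sequential part.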


A Case Study:
Multithreaded Image Matching
• Problem: Finding in a pool of images a subset that best matches a target image.
• A match can be decided based on a variety of criteria. A simple one is mutual information:

  $$MI(I, J) = \sum_{\forall x} \sum_{\forall y} p_{xy} \log_2 \left( \frac{p_{xy}}{p_x \, p_y} \right)$$

  where x represents a gray-level in I and y a gray-level in J.
• $p_x$ is the probability of having a pixel in I with a gray-level of x (similarly $p_y$ for J).
• $p_{xy}$ is the joint probability defined by $p_{xy} = n_{xy}/n$, where n is the total number of pixels in I, J, and $n_{xy}$ is the number of pixels that have a gray-level of x in I and y in J.
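A minimal sketch of the mutual information computation defined above, assuming 8-bit gray-levels (256 levels) and two images of equal size stored as flat pixel arrays. The function name mutual_information and the storage layout are assumptions. Terms with a zero joint count are skipped, since they contribute nothing to the sum.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double mutual_information(const std::vector<unsigned char> &I,
                          const std::vector<unsigned char> &J) {
    assert(I.size() == J.size() && !I.empty());
    const int L = 256;                        // assumed number of gray-levels
    std::vector<long> nxy(L * L, 0), nx(L, 0), ny(L, 0);

    // Count gray-level occurrences and co-occurrences.
    for (std::size_t k = 0; k < I.size(); ++k) {
        ++nxy[I[k] * L + J[k]];
        ++nx[I[k]];
        ++ny[J[k]];
    }

    const double n = static_cast<double>(I.size());
    double mi = 0.0;
    for (int x = 0; x < L; ++x)
        for (int y = 0; y < L; ++y) {
            long c = nxy[x * L + y];
            if (c == 0) continue;             // zero-probability term
            double pxy = c / n;               // p_xy = n_xy / n
            double px = nx[x] / n, py = ny[y] / n;
            mi += pxy * std::log2(pxy / (px * py));
        }
    return mi;
}
```

Comparing an image with itself yields its entropy, while two statistically independent images yield a value of zero.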

A Case Study:
Multithreaded Image Matching (2)
• Look at Code image_matching/main.cpp.
  – Uses QtConcurrent::blockingMap
  – Uses the template class QThreadStorage for holding data in an object or class that is thread-specific. Useful methods of QThreadStorage:
    • bool hasLocalData(): returns true if a data item has been stored on behalf of the current thread
    • T localData(): retrieves the data item corresponding to the currently running thread
    • void setLocalData(T): associates a data item with the currently running thread

A Case Study:
Multithreaded Image Matching (3)
• Average speedup over 10 runs of each scenario.

[Figure: speedup (y-axis, ticks from 1.2 to 1.32) versus number of images (x-axis, 0 to 300)]

A Case Study:
Multithreaded Image Matching (4)
• Average speedup over 10 runs of each scenario.
• Without the 9 lines of code below the comment "read the target and all other images".