Concurrent Programming With Threads
Rajkumar Buyya
School of Computer Science and Software Engineering
Monash Technology
Melbourne, Australia
Email: [email protected]
URL: http://www.dgs.monash.edu.au/~rajkumar
Objectives
- Explain parallel computing, from architectures and operating systems to programming paradigms and applications
- Explain the multithreading paradigm and all aspects of how to use it in an application:
  - Cover all basic MT concepts
  - Explore issues related to MT
  - Contrast Solaris, POSIX, and Java threads
  - Look at the APIs in detail
  - Examine some Solaris, POSIX, and Java code examples
- Debate: MPP and cluster computing
Agenda
- Overview of Computing
- Operating Systems Issues
- Threads Basics
- Multithreading with Solaris and POSIX threads
- Multithreading in Java
- Distributed Computing
- Grand Challenges
- Solaris, POSIX, and Java example code
Computing Elements
[Figure: layered view of a multi-processor computing system. Applications and programming paradigms sit on top of a threads interface; beneath it, the operating system (microkernel) manages processes, threads, and processors (P) over the hardware.]
Two Eras of Computing
[Figure: timeline from 1940 to 2030 showing two overlapping eras, the Sequential Era and the Parallel Era. Each era progresses through architectures, compilers, applications, and problem-solving environments (PSEs), moving from R&D through commercialization to commodity.]
History of Parallel Processing
- Parallel processing can be traced to a tablet dated around 100 BC.
- The tablet had three calculating positions.
- From the multiple positions we can infer a concern for reliability and/or speed.
Motivating Factors
Just as we learned to fly not by constructing a machine that flaps its wings like a bird, but by applying the aerodynamic principles demonstrated by nature, we modeled parallel processing after the workings of biological species.
Motivating Factors
The aggregate speed with which the brain carries out complex calculations, even though an individual neuron's response is slow (on the order of milliseconds), demonstrates the feasibility of parallel processing.
Why Parallel Processing?
- Computation requirements are ever increasing: visualization, distributed databases, simulations, scientific prediction (e.g., earthquakes), etc.
- Sequential architectures are reaching physical limitations (speed of light, thermodynamics).
Technical Computing
Solving technology problems using computer modeling, simulation, and analysis:
- Life Sciences
- Mechanical Design & Analysis (CAD/CAM)
- Aerospace
- Geographic Information Systems
Computational Power Improvement
[Figure: C.P.I. plotted against the number of processors (1, 2, ...). The multiprocessor curve keeps climbing as processors are added, while the uniprocessor curve stays flat.]
Computational Power Improvement
[Figure: growth plotted against age (5, 10, ..., 45), contrasting vertical growth with horizontal growth.]
Why Parallel Processing?
- The technology of parallel processing is mature and can be exploited commercially; there is significant R&D work on the development of tools and environments.
- Significant developments in networking technology are paving the way for heterogeneous computing.
Why Parallel Processing?
- Hardware improvements such as pipelining and superscalar execution are not scalable and require sophisticated compiler technology.
- Vector processing works well only for certain kinds of problems.
A Parallel Program Has and Needs ...
- Multiple processes active simultaneously solving a given problem, generally on multiple processors.
- Communication and synchronization of its processes (this forms the core of parallel programming effort).
Processing Elements Architecture
Processing Elements
- A simple classification by Flynn, based on the number of instruction and data streams:
  - SISD: conventional
  - SIMD: data parallel, vector computing
  - MISD: systolic arrays
  - MIMD: very general, multiple approaches
- The current focus is on the MIMD model, using general-purpose processors (no shared memory).
SISD: A Conventional Computer
[Figure: a single processor receives one instruction stream, consuming a data input and producing a data output.]
Speed is limited by the rate at which the computer can transfer information internally.
Examples: PC, Macintosh, workstations.
The MISD Architecture
[Figure: processors A, B, and C each receive their own instruction stream (A, B, and C) but share a single data input stream, producing one data output stream.]
More of an intellectual exercise than a practical configuration: a few have been built, but none are commercially available.
SIMD Architecture
[Figure: a single instruction stream drives processors A, B, and C, each with its own data input stream and data output stream.]
Every processor executes the same instruction on its own data, e.g., Ci <= Ai * Bi.
Examples: CRAY vector-processing machines, Thinking Machines CM*.
MIMD Architecture
[Figure: processors A, B, and C each receive their own instruction stream and operate on their own data input and output streams.]
Unlike SISD and MISD machines, a MIMD computer works asynchronously.
- Shared memory (tightly coupled) MIMD
- Distributed memory (loosely coupled) MIMD
Shared Memory MIMD Machine
[Figure: processors A, B, and C connect through memory buses to a global memory system.]
- Communication: the source PE writes data to global memory and the destination PE retrieves it.
- Easy to build; conventional SISD operating systems can easily be ported.
- Limitations: reliability and expandability. A failure of a memory component or of any processor affects the whole system, and increasing the number of processors leads to memory contention.
- Example: Silicon Graphics supercomputers.
Distributed Memory MIMD
[Figure: processors A, B, and C each connect through a memory bus to their own memory system, and the processors are linked by IPC channels.]
- Communication: IPC over a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared-memory MIMD, it is easily and readily expandable.
- Highly reliable: a CPU failure does not affect the whole system.
Laws of Caution ...
- The speed of a computer is proportional to the square of its cost (speed = cost^2); equivalently, cost grows as the square root of speed, so doubling the speed roughly multiplies the cost by 1.4.
- The speedup achieved by a parallel computer increases as the logarithm of the number of processors: speedup = log2(no. of processors). With 1024 processors, for example, this predicts a speedup of only 10.
Caution ...
Very fast developments in parallel processing and related areas have blurred concept boundaries, causing a lot of terminological confusion: concurrent computing/programming, parallel computing/processing, multiprocessing, distributed computing, etc.
It's hard to imagine a field that changes as rapidly as computing.
Caution ...
Computer science is an immature science: it lacks a standard taxonomy and terminology.
Caution ...
- There are no strict delimiters for contributors to the area of parallel processing: computer architecture, operating systems, high-level languages, databases, and computer networks all have a role to play.
- This makes it a hot topic of research.
Parallel Programming Paradigms
- Multithreading
- Task-level parallelism
Serial vs. Parallel
[Figure: a queue of people served by a single counter (serial) versus the same queue served by two counters, COUNTER 1 and COUNTER 2 (parallel).]
High Performance Computing

function1()
{
    // ...... function stuff
}
function2()
{
    // ...... function stuff
}

Serial machine (single CPU):
    function1();
    function2();
    Time: add(t1, t2)

Parallel machine (MPP, a massively parallel system containing thousands of CPUs):
    function1() || function2()
    Time: max(t1, t2)
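As a concrete illustration, here is a minimal POSIX-threads sketch (my own, not from the slides; function1 and function2 are placeholder names) of running the two functions concurrently so that elapsed time approaches max(t1, t2):

    #include <pthread.h>
    #include <stdio.h>

    /* Placeholder work functions standing in for the slide's
       function1() and function2(). */
    void *function1(void *arg) { puts("function1"); return NULL; }
    void *function2(void *arg) { puts("function2"); return NULL; }

    int main(void)
    {
        pthread_t t1, t2;

        /* Start both functions at once; on a parallel machine they
           run simultaneously, so total time is max(t1, t2). */
        pthread_create(&t1, NULL, function1, NULL);
        pthread_create(&t2, NULL, function2, NULL);

        /* Wait for both to complete. */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }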
Single and Multithreaded Processes
[Figure: a single-threaded process has one instruction stream; a multithreaded process has multiple threads of execution (multiple instruction streams) within a common address space.]
OS: Multi-Processing, Multi-Threaded
[Figure: multiple applications running across multiple CPUs on top of threaded libraries and multi-threaded I/O.]
- Better response times in multiple-application environments.
- Higher throughput for parallelizable applications.
Multi-threading, continued ...
A multi-threaded OS enables parallel, scalable I/O.
[Figure: several applications issue I/O requests through the OS kernel, which spreads them across multiple CPUs.]
Multiple independent I/O requests can be satisfied simultaneously because all the major disk, tape, and network drivers have been multi-threaded, allowing any given driver to run on multiple CPUs simultaneously.
Basic Process Model
[Figure: two processes, each with its own TEXT, DATA, and STACK segments, communicate through kernel-maintained channels: shared memory segments, pipes, open files, or mmap'd files.]
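As a minimal sketch of one such kernel-maintained channel (my example, not from the slides), two processes with separate address spaces can communicate through a pipe:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        pipe(fd);                        /* kernel-maintained channel */

        if (fork() == 0) {               /* child: its own DATA/STACK */
            char buf[16];
            ssize_t n = read(fd[0], buf, sizeof buf - 1);
            buf[n > 0 ? n : 0] = '\0';
            printf("child read: %s\n", buf);
            _exit(0);
        }
        write(fd[1], "hello", 5);        /* parent sends via the kernel */
        wait(NULL);
        return 0;
    }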
What are Threads?
- A thread is a piece of code that can execute concurrently with other threads.
- It is a schedulable entity on a processor, with its own local state, access to global/shared state, a program counter, and a hardware/software context.
[Figure: a running thread object with its hardware context: registers, status word, program counter.]
Threaded Process Model
[Figure: threads within a process, each with its own stack, sharing the process's TEXT, DATA, and shared memory.]
- Threads within a process are independently executable units.
- All threads are part of one process, so communication among them is easier and simpler, as the sketch below illustrates.
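A minimal sketch (mine, not from the slides) of that simpler communication: two POSIX threads updating one shared variable, synchronized with a mutex:

    #include <pthread.h>
    #include <stdio.h>

    int shared_counter = 0;                        /* in the shared DATA segment */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *increment(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);             /* serialize updates */
            shared_counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %d\n", shared_counter);  /* prints 200000 */
        return 0;
    }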
Levels of Parallelism

Code granularity                      Code item
Large grain (task level)              Program
Medium grain (control level)          Function (thread)
Fine grain (data level)               Loop
Very fine grain (multiple issue)      With hardware

[Figure: examples at each level: tasks i-1, i, i+1 (task level); func1(), func2(), func3() (control level); statements a(0)=.., b(0)=.., ... (data level); individual operations such as +, x, load (multiple-issue level).]
Simple Thread Example

#include <thread.h>

void *func(void *arg)
{
    void *exit_value = 0;   /* define local data */
    /* ... function code ... */
    thr_exit(exit_value);
}

int main()
{
    thread_t tid;
    void *exit_value;

    /* ... */
    thr_create(0, 0, func, NULL, 0, &tid);  /* pass func itself, not func() */
    /* ... */
    thr_join(tid, 0, &exit_value);
    /* ... */
    return 0;
}
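For comparison, a roughly equivalent POSIX threads sketch (mine; the POSIX API is contrasted with Solaris threads later in the deck):

    #include <pthread.h>

    void *func(void *arg)
    {
        void *exit_value = 0;            /* local data */
        /* ... function code ... */
        pthread_exit(exit_value);        /* terminate, returning a value */
    }

    int main(void)
    {
        pthread_t tid;
        void *exit_value;

        pthread_create(&tid, NULL, func, NULL);
        /* ... */
        pthread_join(tid, &exit_value);  /* wait and collect the value */
        return 0;
    }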
Few Popular Thread Models
- POSIX threads, ISO/IEEE standard
- Mach C threads, CMU
- SunOS LWP threads, Sun Microsystems
- PARAS CORE threads, C-DAC
- Java threads, Sun Microsystems
- Chorus threads, Paris
- OS/2 threads, IBM
- Windows NT/95 threads, Microsoft
Multithreading - Uniprocessors
Concurrency vs. parallelism: on a uniprocessor there is concurrency, since the number of simultaneous execution units is greater than the number of CPUs.
[Figure: processes P1, P2, and P3 time-sliced on a single CPU.]
Multithreading - Multiprocessors
Concurrency vs. parallelism: with multiple processors there is parallelism, the number of executing processes being equal to the number of CPUs.
[Figure: processes P1, P2, and P3 each running on its own CPU.]
Computational Model
Parallel execution is due to:
- concurrency of threads on virtual processors
- concurrency of threads on physical processors
True parallelism: the threads-to-processors map is 1:1.
[Figure: user-level threads are scheduled by the user onto virtual processors, which the kernel schedules onto physical processors.]
General Architecture of the Thread Model
- Hides the details of the machine architecture.
- Maps user threads to kernel threads.
- Process virtual memory is shared: a state change made in the VM by one thread is visible to the others.
Process Parallelism

int add(int a, int b, int &result)
{
    // function stuff
}
int sub(int a, int b, int &result)
{
    // function stuff
}

pthread_t t1, t2;
pthread_create(&t1, add, a, b, &r1);
pthread_create(&t2, sub, c, d, &r2);
pthread_par(2, t1, t2);

(The calls above follow the slide's illustrative pseudo-API; a standard POSIX sketch follows below.)

[Figure: MISD and MIMD processing: instruction streams IS1 and IS2 drive two processors, add operating on data (a, b) to produce r1 and sub on (c, d) to produce r2.]
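POSIX itself has no pthread_par, and a start routine takes a single void * argument, so a minimal standard-pthreads version of the same idea (the op_args struct and the sample values are my own) looks like this:

    #include <pthread.h>
    #include <stdio.h>

    /* Bundle arguments and result, since a pthreads start routine
       receives exactly one void * parameter. */
    struct op_args { int a, b, result; };

    void *add(void *p)
    {
        struct op_args *x = p;
        x->result = x->a + x->b;
        return NULL;
    }

    void *sub(void *p)
    {
        struct op_args *x = p;
        x->result = x->a - x->b;
        return NULL;
    }

    int main(void)
    {
        struct op_args r1 = { 1, 2, 0 }, r2 = { 7, 4, 0 };
        pthread_t t1, t2;

        pthread_create(&t1, NULL, add, &r1);   /* IS1: add on (a, b) */
        pthread_create(&t2, NULL, sub, &r2);   /* IS2: sub on (c, d) */

        pthread_join(t1, NULL);                /* stands in for pthread_par */
        pthread_join(t2, NULL);

        printf("add = %d, sub = %d\n", r1.result, r2.result);
        return 0;
    }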
Data Parallelism
[Figure: a single instruction stream (Sort) is applied in parallel to the two halves of the data, d1 ... dn/2 and dn/2+1 ... dn.]

sort(int *array, int count)
// ......
// ......
pthread_t thread1, thread2;
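The slide's listing is only a fragment; one way it might continue (a sketch under my own assumptions: qsort on each half, with the final merge omitted) is:

    #include <pthread.h>
    #include <stdlib.h>

    struct half { int *base; int count; };

    static int cmp(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* The same operation (sort) runs on each half of the data:
       data parallelism. */
    static void *sort_half(void *p)
    {
        struct half *h = p;
        qsort(h->base, h->count, sizeof(int), cmp);
        return NULL;
    }

    void sort(int *array, int count)
    {
        pthread_t thread1, thread2;
        struct half lo = { array, count / 2 };
        struct half hi = { array + count / 2, count - count / 2 };

        pthread_create(&thread1, NULL, sort_half, &lo);
        pthread_create(&thread2, NULL, sort_half, &hi);
        pthread_join(thread1, NULL);
        pthread_join(thread2, NULL);

        /* The two sorted halves would still need a final merge,
           omitted here. */
    }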