23.L20 Multiprocessing Multithreading Vectorization
Multiprocessing, multithreading and vectorization
Sparsh Mittal
IIT Hyderabad, India
Strong and weak scaling
Shared Memory vs Message Passing
Shared Memory
All the threads share the virtual address space.
They can communicate with each other by reading and
writing values from/to shared memory.
● The application itself must prevent data corruption (e.g., Lock/Unlock around shared data)
● Example programming models: OpenMP, CUDA
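The Lock/Unlock discipline above can be sketched with Python threads. This is only an illustration under stated assumptions: the names `counter`, `add_many`, and `run` are hypothetical, and Python's `threading` module stands in for the OpenMP or CUDA models named on the slide.

```python
import threading

counter = 0                 # shared variable, visible to every thread
lock = threading.Lock()     # protects the read-modify-write on counter

def add_many(n):
    """Increment the shared counter n times, holding the lock each time."""
    global counter
    for _ in range(n):
        with lock:          # Lock on entry, Unlock on exit of the block
            counter += 1

def run(num_threads=4, n=10_000):
    """Start num_threads workers and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every increment happens inside the critical section, `run(4, 1000)` returns 4000; without the lock, concurrent read-modify-write sequences could lose updates.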
Message Passing
Programs communicate with each other by sending and
receiving messages (analogous to sending emails).
They do not share memory addresses.
Example: MPI
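As a rough illustration of message passing (not MPI itself), the sketch below sends a text message to a separate process and reads its reply. The two processes have separate address spaces, so the only way data moves is through the message. The names `CHILD_SOURCE` and `remote_sum` are hypothetical, and pipes via `subprocess` are an assumed stand-in for a real message-passing library.

```python
import subprocess
import sys

# Code run by the child process: receive one message on stdin, reply on stdout.
CHILD_SOURCE = (
    "import sys\n"
    "numbers = [int(tok) for tok in sys.stdin.read().split()]\n"
    "print(sum(numbers))\n"
)

def remote_sum(numbers):
    """Send numbers to a separate process as a message and return its reply."""
    message = " ".join(str(n) for n in numbers)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=message,          # the "send"
        capture_output=True,    # the reply comes back on stdout
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

The parent never touches the child's variables; it only sees the reply message.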
Types of Parallelism
• Data Parallelism
– Different pieces of data can be operated on in parallel
– SIMD: Vector processing, array processing
– Systolic arrays, streaming processors
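A minimal sketch of data parallelism, assuming the data can be split into independent chunks: the same operation is applied to each chunk by a separate worker. The function names are hypothetical. In CPython, threads mainly illustrate the structure here; they do not guarantee a speedup for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_chunk(chunk, factor):
    """Apply the same operation to every element of one chunk."""
    return [factor * x for x in chunk]

def parallel_scale(data, factor, num_workers=4):
    """Split data into chunks and process the chunks concurrently."""
    chunk_size = max(1, -(-len(data) // num_workers))   # ceiling division
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map returns results in the same order as the input chunks
        results = list(pool.map(scale_chunk, chunks, [factor] * len(chunks)))
    return [y for part in results for y in part]
```

No chunk depends on another, which is what makes the work data parallel.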
Flynn's Taxonomy
Flynn's Classification
SISD and SIMD
MISD
MIMD
Summary
• SISD: Single instruction operates on single data element
• SIMD: Single instruction operates on multiple data
elements
– Array processor
– Vector processor
• MISD: Multiple instructions operate on single data
element
– Closest form: systolic array processor, streaming
processor
• MIMD: Multiple instructions operate on multiple data
elements (multiple instruction streams)
– Multiprocessor
– Multithreaded processor
Multiprocessing
Multithreading
The Notion of Threads
Operation of the Program
[Figure: timeline of the program showing the parent thread, initialisation, the child threads, and the sequential section; axis: time]
Multithreading
Analogy
Coarse Grained Multithreading
Implementation
Advantages
Fine Grained Multithreading
Simultaneous Multithreading
Main Idea
Partition the issue slots across threads
Scenario: in the same cycle,
issue 2 instructions for thread 1,
1 instruction for thread 2,
and 1 instruction for thread 3
Support required
Need smart instruction selection logic.
Balance fairness and throughput
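One possible selection policy can be sketched as follows. It is an assumption for illustration, not the policy of any particular processor: slots are handed out one at a time in round-robin order, so every thread with ready instructions gets a slot before any thread gets a second one. The function name and the `ready`, `width`, and `start` parameters are hypothetical.

```python
def partition_issue_slots(ready, width=4, start=0):
    """Return how many instructions each thread issues this cycle.

    ready[i] is the number of instructions thread i could issue now.
    width is the number of issue slots per cycle.
    start is the thread that gets the first slot (rotate it for fairness).
    """
    issued = [0] * len(ready)
    slots_left = width
    progress = True
    while slots_left > 0 and progress:
        progress = False
        for k in range(len(ready)):
            t = (start + k) % len(ready)
            if slots_left > 0 and issued[t] < ready[t]:
                issued[t] += 1        # give thread t one more slot
                slots_left -= 1
                progress = True
    return issued
```

With a 4-wide issue stage and `ready = [3, 1, 1]`, the call `partition_issue_slots([3, 1, 1])` returns `[2, 1, 1]`, which matches the scenario above. Leftover slots go to threads that still have work, so throughput is not sacrificed for fairness.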
Summary
[Figure: issue slots versus time, with slots marked by thread; legend entries recovered: Thread 2, Thread 3, Thread 4]
Vectorization (and vector processor)
BIG PICTURE
Vectorization
https://colfaxresearch.com/knl-avx512/
Some of the SIMD instruction sets used in industry
Vector Processors
Software Interface
Example of Vector Addition
[Figure: vector addition involving the vector registers vr1, vr2, and vr3]
Let us define eight 128-bit vector registers in SimpleRisc: vr0 ... vr7
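A small software model of vector addition may help. It assumes each 128-bit register holds four 32-bit lanes, which is consistent with the 4-byte address steps in the vector store semantics later in the deck. The function name `v_add` and the roles of the registers (two sources, one destination) are assumptions, since the original figure is not recoverable.

```python
LANES = 4            # 128-bit register / 32-bit elements (assumed element width)
MASK = 0xFFFFFFFF    # keeps each lane within 32 bits

def v_add(vr_a, vr_b):
    """Element-wise addition of two 4-lane registers, wrapping at 32 bits."""
    assert len(vr_a) == LANES and len(vr_b) == LANES
    return [(a + b) & MASK for a, b in zip(vr_a, vr_b)]

# Hypothetical usage: vr3 receives the lane-by-lane sum of vr1 and vr2.
vr1 = [1, 2, 3, 4]
vr2 = [10, 20, 30, 40]
vr3 = v_add(vr1, vr2)
```

A single vector instruction performs all four additions, where scalar code would need four separate add instructions.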
Loading Vector Registers
Scatter Gather Operation
Instruction: v.sg.ld vr1, vr2
Semantics: vr1 ← ([vr2[0]], [vr2[1]], [vr2[2]], [vr2[3]])
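The gather semantics can be modelled in a few lines. In this sketch, `memory` is a hypothetical dictionary mapping a byte address to the 32-bit word stored there, and `v_sg_ld` is an illustrative Python name for the `v.sg.ld` instruction.

```python
def v_sg_ld(memory, vr2):
    """Gather load: lane i of the result is the word at address vr2[i]."""
    return [memory[addr] for addr in vr2]

# Hypothetical memory contents at four unrelated addresses.
memory = {4: 10, 40: 9, 100: 7, 200: 8}
vr2 = [100, 200, 40, 4]         # one address per lane
vr1 = v_sg_ld(memory, vr2)      # the values 7, 8, 9, 10 gathered into one register
```

The addresses need not be contiguous or ordered, which is what distinguishes a gather from a regular vector load.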
Vector Store Operation
Instruction: v.st vr1, 12[r1]
Semantics:
[r1+12] ← vr1[0]
[r1+16] ← vr1[1]
[r1+20] ← vr1[2]
[r1+24] ← vr1[3]
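The store semantics can be checked with a short model. Here `memory` is a hypothetical dictionary from byte address to 32-bit word, and `v_st` is an illustrative Python name for the `v.st` instruction; the 4-byte stride follows from the addresses listed above.

```python
WORD_BYTES = 4   # each of the four lanes is one 32-bit word

def v_st(memory, vr1, r1, offset):
    """Store lane i of vr1 at byte address r1 + offset + 4*i."""
    base = r1 + offset
    for i, value in enumerate(vr1):
        memory[base + WORD_BYTES * i] = value
    return memory

# Hypothetical usage mirroring v.st vr1, 12[r1] with r1 = 100.
mem = v_st({}, [5, 6, 7, 8], r1=100, offset=12)
```

With `r1 = 100`, the four lanes land at addresses 112, 116, 120, and 124, that is, at `r1+12` through `r1+24`.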
Vector Operations
Design of a Vector Processor
Salient Points
• We have a vector register file and a scalar register file.
• There are scalar and vector functional units.
• Unless we are converting a vector to a scalar or vice versa, we in general do not forward values between vector and scalar instructions.
• The memory unit needs support for regular operations, vector operations, and possibly scatter-gather operations.