Module 2 - Parallel Computing
Module Two
(Parallel Computing – Explicit Parallelism)
Architectural Concept
(Serial to Parallel)
The Von Neumann Architecture (1940s)
Composed of four main components:
Memory
Control Unit
Arithmetic Logic Unit
Input/Output
Flynn's taxonomy classifies computer architectures along two independent dimensions: Instruction and Data.
Each of these dimensions can have only one of two possible states:
Single or Multiple.
Single Instruction, Single Data (SISD)
This is a serial (non-parallel) computer with
Single Instruction: Only one instruction stream is executed by the CPU
during any one clock cycle
Single Data: Only one data stream is utilized during any one clock cycle
Deterministic execution
Multiple Instruction, Single Data (MISD)
Multiple Instruction: Each processing unit operates on the data
independently via separate instruction streams.
Single Data: A single data stream is fed into multiple processors.
Examples: Few (if any) actual examples of this class of parallel
computer have ever existed.
Multiple Instruction, Multiple Data (MIMD)
Multiple Instruction: Every processor may be executing a different
instruction stream
Multiple Data: Every processor may be working on a different data
stream
Control Structure of Parallel Platforms
Example 1 (SIMD) – Parallelism from Single Instruction on Multiple Processors
• Consider the following code segment that adds two vectors:
for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];
• The iterations of the loop are independent of one another and can therefore be executed concurrently.
• Distributing the iterations, each with the appropriate data, across multiple processors allows the loop to execute much faster.
• In a SIMD parallel computer, the same instruction is dispatched to all processors and executed concurrently.
• The Intel Pentium processor with Streaming SIMD Extensions (SSE) provides a number of instructions that execute the same instruction on multiple data items.
• These architectural enhancements rely on the highly structured (regular) nature of the underlying computation (e.g. image processing or graphics) to deliver improved performance.
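As an illustration (not from the original slides), the same vector addition can be written with SSE intrinsics; each intrinsic applies one instruction to four single-precision values at a time. The function name and array handling below are assumptions made for this sketch:

#include <xmmintrin.h>

/* Hypothetical helper: add two float vectors using 4-wide SSE operations. */
void add_vectors_sse(const float *a, const float *b, float *c, int n)
{
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);             /* load 4 floats from a  */
        __m128 vb = _mm_loadu_ps(&b[i]);             /* load 4 floats from b  */
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));    /* one add, 4 data items */
    }
    for (; i < n; i++)                               /* finish any leftovers  */
        c[i] = a[i] + b[i];
}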
SIMD – Selective Execution Challenge
• While the SIMD concept works well for structured computations on parallel data structures such as arrays, it is less well suited to selective (conditional) execution.
• The idling challenge: when different data items need different branches of a conditional, the processing units whose branch is not currently being executed must sit idle, reducing the effective parallelism.
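A hypothetical loop with a data-dependent branch illustrates the problem; on a SIMD machine the two branches are executed one after the other, and the processing units whose elements do not take the current branch sit idle:

for (i = 0; i < 1000; i++) {
    if (a[i] > 0)
        c[i] = a[i] + b[i];   /* units whose a[i] <= 0 idle during this step */
    else
        c[i] = 0;             /* units whose a[i] >  0 idle during this step */
}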
Memory Architecture
Memory Architecture
(Shared memory)
General Characteristics:
All processors can access all memory as a single global address space, so a change made to a memory location by one processor is visible to the other processors.
Memory Architecture
(Shared memory)
Advantages:
- Global address space provides a user-friendly programming perspective to memory.
- Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs.
Disadvantages:
- Lack of scalability between memory and CPUs: adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase the traffic associated with cache/memory management.
- Synchronization: the programmer is responsible for synchronization constructs that ensure "correct" access of global memory.
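As a hedged sketch of such a synchronization construct (not taken from the slides), a mutex can serialize updates to a shared variable so that concurrent read-modify-write sequences do not lose updates; the names below are illustrative:

#include <pthread.h>

long shared_sum = 0;
pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called concurrently by many tasks; only one at a time may update shared_sum. */
void add_to_shared_sum(long local_value)
{
    pthread_mutex_lock(&sum_lock);
    shared_sum += local_value;
    pthread_mutex_unlock(&sum_lock);
}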
Memory Architecture
(Distributed memory)
General Characteristics:
The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.
Memory Architecture
(Distributed memory)
Advantages:
- Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionately.
- Rapid memory access: each processor can rapidly access its own memory without interference and without the overhead incurred in maintaining global cache coherency.
Disadvantages:
- Places a heavy demand on the programmer, who is responsible for many of the details associated with data communication between processors.
Memory Architecture
(Hybrid Distributed-Shared memory)
General Characteristics:
The largest and fastest computers in the world today employ both shared and distributed memory architectures.
Needs network communications: the distributed memory component is the networking of multiple shared memory/GPU machines, which know only about their own memory, not the memory on another machine. A network is therefore needed to move data from one machine to another.
Message-passing model:
- Many programming details (MPI or PVM)
- Better user control (data & work decomposition)
- Larger systems and better performance
Shared Memory Models
This is perhaps the simplest parallel programming model: processes/tasks share a common address space, which they read and write to asynchronously.
Thread Models
This is a type of shared memory programming model in which a single "heavy weight" process can have multiple "light weight", concurrent execution paths (threads).
Thread Models
- From a programming perspective, thread implementations commonly comprise:
• A library of subroutines that are called from within parallel source code
• A set of compiler directives embedded in either serial or parallel source
code
Types of Thread Models
Historically, there are two main implementation types of threaded models:
- POSIX Threads, and
- OpenMP
POSIX Threads
- Specified by the IEEE POSIX 1003.1c standard (1995). C Language only.
- Part of Unix/Linux operating systems
- Library based
- Commonly referred to as Pthreads.
- Very explicit parallelism; requires significant programmer attention to
detail.
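A minimal Pthreads sketch (illustrative, not from the slides): the library calls pthread_create and pthread_join explicitly create the lightweight execution paths and wait for them to finish; the worker function is an assumed example:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

void *worker(void *arg)
{
    long id = (long) arg;
    printf("hello from thread %ld\n", id);    /* each thread runs this path */
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    long t;

    for (t = 0; t < NTHREADS; t++)            /* explicitly create each thread */
        pthread_create(&threads[t], NULL, worker, (void *) t);

    for (t = 0; t < NTHREADS; t++)            /* explicitly wait for each one  */
        pthread_join(threads[t], NULL);

    return 0;
}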
Types of Thread Models
OpenMP
- Industry standard with fork-join model
- Compiler directive based
- Portable / multi-platform, including Unix and Windows platforms
- Available in C/C++ and Fortran implementations
- Can be very easy and simple to use
- Provides for "incremental parallelism". Can begin with a serial code.
Types of Thread Models
OpenMP - Example
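The original example code is not reproduced in these notes; the following is a minimal OpenMP sketch of the earlier vector-addition loop, compiled with OpenMP support (e.g. -fopenmp). The array sizes and values are assumptions:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    int i;
    double a[1000], b[1000], c[1000];

    for (i = 0; i < 1000; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* The directive forks a team of threads, splits the loop iterations
       among them, and joins the threads at the end of the loop.        */
    #pragma omp parallel for
    for (i = 0; i < 1000; i++)
        c[i] = a[i] + b[i];

    printf("c[999] = %.1f\n", c[999]);
    return 0;
}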
Distributed Memory / Message Passing Interface (MPI) Models
• Data transfer usually requires cooperative operations to be
performed by each process. For example, a send operation must
have a matching receive operation.
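A hedged sketch of the cooperative send/receive pairing described above (illustrative, not taken from the slides); task 0 sends one integer that task 1 must explicitly receive:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* the send ...      */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* ... and its match */
        printf("task 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}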
Hybrid Programming Model
A typical hybrid model is the combination of the MPI model with OpenMP threads.
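A minimal hybrid sketch under the usual assumption that MPI handles communication between nodes while OpenMP threads share memory within each MPI task (illustrative only):

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each MPI task forks OpenMP threads that share that task's memory. */
    #pragma omp parallel
    printf("MPI task %d, OpenMP thread %d\n", rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}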
SPMD & MPMD Programming Models
1- Single Program Multiple Data (SPMD)
SINGLE PROGRAM: All tasks execute their copy of the same program
simultaneously. This program can be threads, message passing, data
parallel or hybrid.
MULTIPLE DATA: All tasks may use different data
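In practice the single program usually branches on a task identifier, so different tasks take different paths through the same code; a hedged fragment (the helper names are hypothetical):

/* Every task runs this same program (SPMD); its rank decides its role. */
if (rank == 0)
    distribute_work();        /* hypothetical master-side helper */
else
    process_partition(rank);  /* hypothetical worker-side helper */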
2- Multiple Program Multiple Data (MPMD)
MULTIPLE PROGRAM: Tasks may execute different programs simultaneously; as with SPMD, the programs can be threads, message passing, data parallel or hybrid.
MULTIPLE DATA: All tasks may use different data
Designing Parallel Programs
Important considerations in designing parallel programs include:
- Parallel overhead
- Granularity: a qualitative measure of the ratio of computation to communication.
Coarse: relatively large amounts of computational work are done between communication events.
Fine: relatively small amounts of computational work are done between communication events.
Cost of Parallel Programs
Amdahl's Law:
- States that the potential program speedup is defined by the fraction of code (P) that can be parallelized:
    speedup = 1 / (1 - P)
- Accounting for N processors working on the parallel fraction, this becomes:
    speedup = 1 / ((1 - P) + P / N)
“You can spend a lifetime getting 95% of your code to be parallel, and never achieve better than 20x speedup no matter how many processors you throw at it!”
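As a quick numerical check of the 20x figure (illustrative code, not from the slides), evaluating Amdahl's formula with P = 0.95 shows the speedup approaching, but never exceeding, 1 / (1 - 0.95) = 20 as the processor count N grows:

#include <stdio.h>

/* Amdahl's law: speedup(N) = 1 / ((1 - P) + P / N) */
double amdahl_speedup(double P, double N)
{
    return 1.0 / ((1.0 - P) + P / N);
}

int main(void)
{
    printf("P = 0.95, N = 100:   speedup = %.2f\n", amdahl_speedup(0.95, 100.0));    /* about 16.8 */
    printf("P = 0.95, N = 10000: speedup = %.2f\n", amdahl_speedup(0.95, 10000.0));  /* about 20.0 */
    return 0;
}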
Further Topics in Parallel Computing
Supercomputers
"Supercomputer" refers to computing systems capable of sustaining high-performance computing applications that require a large number of processors, shared/distributed memory, and multiple disks.
Summary
• Parallel computing relies on parallel hardware
• Parallel computing needs parallel software
• So parallel programming requires:
– New way of thinking
– Identification of parallelism
– Design of parallel algorithm
• Implementation can be a challenge
– requires careful attention to details by the
programmer
LAB Exercises
Run Matlab on a Multiprocessor machine and
execute each of the following codes.
Code 2: Parallel Code