
Principles of Parallel Computing

(Mastering Cloud Computing: Chapter#2)

Rashmi Kansakar
Computing Eras

➢ Sequential - 1940s+

➢ Parallel and Distributed - 1960s+

➔ But… CS curricula typically teach sequential programming


➔ Parallel programming is hard!
➔ Moore’s Law trends in modern CPUs now require us to write
parallel programs.
Computing Eras…
Von Neumann Architecture
What is Serial Computing?

Traditionally, software has been written for serial computation:

✓ Runs on a single computer having a single Central Processing Unit (CPU)

✓ A problem is broken into a discrete series of instructions

✓ Instructions are executed one after another

✓ Only one instruction may execute at any moment in time
What is Parallel Computing?
Parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem

✓ The problem is broken into independent parts

✓ Each processing element executes its part of the algorithm simultaneously with the others

✓ The computational problem is solved in less time with multiple compute resources than with a single compute resource

✓ The compute resources can be a single computer with multiple processors or several networked computers
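
As an illustration only (not from the slides), here is a minimal Python sketch of this idea: a summation problem is broken into independent chunks, and a pool of worker processes computes the partial sums simultaneously.

    # Minimal sketch: split a summation across worker processes.
    from multiprocessing import Pool

    def partial_sum(chunk):
        # Each processing element works on its own part of the data.
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        n_workers = 4
        size = len(data) // n_workers  # assumes len(data) divides evenly
        chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
        with Pool(n_workers) as pool:
            # Each worker runs simultaneously with the others.
            partials = pool.map(partial_sum, chunks)
        print(sum(partials))  # same result as sum(data)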
Parallel Vs. Distributed
➢ Parallel - tightly coupled system
○ Computation is divided among processors sharing common memory
○ Homogeneous components: each processor is of the same type and capacity
○ The definition has loosened with InfiniBand and distributed memory

➢ Distributed - architecture in which computation is broken down into units and executed concurrently
○ Parallel is a subtype; distributed is the more general term
○ Different nodes, processors, or cores
○ Heterogeneous components
○ E.g., grid computing, Internet computing systems
Parallel Computing

Renewed interest…
➢ Larger computation tasks
➢ CPUs have reached physical limits
➢ Hardware features (pipelining, superscalar execution, etc.) require
complex compilers and have also reached their limits
➢ Vector processing is effective, but its applicability is limited to specific workloads
➢ Networking technology has matured
Hardware Architectures

➢ Single instruction, single data (SISD)

➢ Single instruction, multiple data (SIMD)

➢ Multiple instruction, single data (MISD)

➢ Multiple instruction, multiple data (MIMD)


Single Instruction Single Data (SISD)

§ Sequential computers (no parallel instruction/data streams)
§ Single instruction: only one instruction is acted on by the CPU in one clock cycle
§ Single data: only one data stream is used as input during any one clock cycle
§ Older-generation mainframes and minicomputers
§ ‘Normal’ computers: modern-day PCs and Macs
§ The programming style typically taught in CS1, CS2, and DS
Single Instruction Multiple Data (SIMD)

§ Multiple data streams are processed against a single instruction stream
§ Multiple data: each processing unit can operate on a different data element
§ Vector processors: one instruction operates on 1-D arrays of data called vectors
§ Scientific workloads, vector and matrix operations
§ GPUs (CUDA), Sony PS3 Cell processor (1, 2), Cray’s vector processor
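
As a hedged illustration (not from the slides), NumPy shows this SIMD style in Python: a single vectorized operation is applied across entire arrays, and NumPy's compiled kernels can use the CPU's SIMD instructions underneath.

    import numpy as np

    a = np.arange(1_000_000, dtype=np.float32)
    b = np.arange(1_000_000, dtype=np.float32)

    # One logical instruction ("add") applied to many data elements at once,
    # instead of an explicit element-by-element Python loop.
    c = a + b

    # Equivalent scalar (SISD-style) loop, far slower:
    # c = np.empty_like(a)
    # for i in range(len(a)):
    #     c[i] = a[i] + b[i]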
Multiple Instruction Single Data (MISD)

§ Each processing unit operates on the data independently via separate instruction streams, e.g. y = sin(x) + cos(x) + tan(x)
§ A single data stream is fed into multiple processing units
§ No commercial machines exist, though CPU superscalar execution and pipelining have a similar feel
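
A hedged Python sketch of the slide's example: the same single data stream x is fed to several workers, each executing a different instruction stream.

    import math
    from concurrent.futures import ThreadPoolExecutor

    def misd_style(x):
        # The same datum x goes to multiple "processing units",
        # each applying a different instruction stream.
        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = [pool.submit(f, x) for f in (math.sin, math.cos, math.tan)]
            return sum(f.result() for f in futures)

    print(misd_style(1.0))  # y = sin(x) + cos(x) + tan(x)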
Multiple Instruction Multiple Data (MIMD)

§ Multiple autonomous processors simultaneously execute different instructions on different data
§ Multiple Instruction: Every processor may be executing a different instruction stream
§ Multiple Data: Every processor may be working with a different data stream
§ Asynchronous transfers are generally faster than synchronous transfers
§ Most supercomputers, networked parallel clusters, grids, clouds, multi-core PCs
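
An illustrative sketch (assumption: Python processes standing in for autonomous processors): each worker below executes a different instruction stream on a different data stream, simultaneously.

    from multiprocessing import Process

    def count_evens(data):   # one instruction stream...
        print("evens:", sum(1 for x in data if x % 2 == 0))

    def total(data):         # ...and a different instruction stream
        print("total:", sum(data))

    if __name__ == "__main__":
        # Different instructions on different data, running simultaneously.
        procs = [Process(target=count_evens, args=(range(0, 100),)),
                 Process(target=total, args=(range(100, 200),))]
        for p in procs:
            p.start()
        for p in procs:
            p.join()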
Memory Architectures in MIMD

Shared memory
- All PEs are connected to a single global memory and have access to it
- Tightly coupled multiprocessor systems
- Communication happens through the shared memory
- Shared-memory MIMD is easier to program, but less tolerant of failure and harder to scale
- A failure can affect the entire system

Distributed memory
- All PEs have local memory; loosely coupled multiprocessor systems
- Cost-effective: can use commodity, off-the-shelf processors
- Failures can be isolated, which makes this design popular
- Each processor can access its own memory without interference
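
A hedged sketch of the two styles using Python's multiprocessing module: the shared-memory worker communicates through a common memory location, while the distributed-memory worker has no shared state and communicates by message passing.

    from multiprocessing import Process, Queue, Value

    def shared_worker(counter):
        # Shared memory: communicate through a common location (needs locking).
        with counter.get_lock():
            counter.value += 1

    def distributed_worker(q):
        # Distributed memory: no shared state; send a message instead.
        q.put("partial result")

    if __name__ == "__main__":
        counter, q = Value("i", 0), Queue()
        ps = [Process(target=shared_worker, args=(counter,)),
              Process(target=distributed_worker, args=(q,))]
        for p in ps:
            p.start()
        msg = q.get()  # receive the message before joining
        for p in ps:
            p.join()
        print(counter.value, msg)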
Hardware Architectures
Flynn's taxonomy: based on the number of concurrent instruction
and data streams available in the architecture

➢ SISD: uniprocessors
➢ SIMD: vector processors, parallel processing
➢ MISD: arguably pipelined computers
➢ MIMD: multi-computers, multi-processors
Parallel Processor Architectures

§ The computers of today, and tomorrow, have tremendous processing power that requires parallel programming to fully utilize.
§ There are significant differences between sequential and parallel programming that can be challenging.
§ With early exposure to these differences, students are capable of achieving performance improvements with multicore programming.
How to Program in Parallel?

➢ Problem specific!
➢ Approaches:
○ Data parallelism
■ MapReduce (data based); see the sketch after this list
○ Process parallelism
■ Game/Cell Processor (code based)
○ Farmer-and-worker model
■ Web serving (Apache) (thread based)
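
As a hedged sketch of the data-parallel (MapReduce-style) approach above: input chunks are mapped to partial word counts in parallel, then reduced into one result.

    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_chunk(lines):
        # Map: count words in one chunk of the input, independently.
        return Counter(word for line in lines for word in line.split())

    if __name__ == "__main__":
        text = ["the quick brown fox", "the lazy dog", "the end"]
        chunks = [text[0:1], text[1:2], text[2:3]]
        with Pool(3) as pool:
            partial_counts = pool.map(map_chunk, chunks)
        # Reduce: merge the independent partial results.
        total = reduce(lambda a, b: a + b, partial_counts)
        print(total.most_common(3))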
Level of Parallelism
➢ Goal?
○ Never have a processor idle!
○ ‘Grain size’ important
■ How you break up the problem.
Grain Size         Code Item                         Parallelized By
Large (task)       Separate (heavyweight process)    Programmer
Medium (control)   Function or procedure (thread)    Programmer
Fine               Loop or instruction block         Compiler
Very Fine          Instruction                       Processor or OS


Level of Parallelism…
[Diagram: parallelism is handled at the programmer, compiler, and processor & OS levels]
Limits to Parallelism (so far)
Linear speedup not possible
➢ Doubling # cores doesn’t double speed
■ Communication overhead
➢ General guidelines:
■ Computation speed grows roughly as sqrt(system cost): the faster a system
becomes, the more expensive it is to make it faster
■ The speed of a parallel computer increases roughly as log(n), where
n is the number of processors
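
One standard way to formalize this limit (an addition, not from the slides) is Amdahl's law: if a fraction s of a program is inherently serial, the speedup on n processors is bounded no matter how many processors are added.

\[
\mathrm{Speedup}(n) = \frac{1}{s + \frac{1-s}{n}}, \qquad
\lim_{n \to \infty} \mathrm{Speedup}(n) = \frac{1}{s}
\]

For example, with s = 0.1 (10% of the work serial), even infinitely many processors give at most a 10x speedup, which is why doubling the number of cores does not double the speed.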
Parallel Overhead
➢ A sequential program runs on a single processor and has a single line of control
➢ Parallel programming means making many processors collectively work on a single
program
➢ Parallel overhead:
○ The amount of time required to coordinate parallel tasks, as opposed to doing useful
work.

➢ Parallel overhead can include factors such as:
○ Task start-up time
○ Synchronizations
○ Data communications
○ Software overhead imposed by parallel languages, libraries, the operating system, etc.
○ Task termination time
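
A hedged sketch of measuring this overhead in Python: for a deliberately tiny task, process start-up and data communication usually make the parallel version slower than the serial one.

    import time
    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == "__main__":
        data = list(range(10_000))  # small on purpose: overhead dominates

        t0 = time.perf_counter()
        serial = [square(x) for x in data]
        t1 = time.perf_counter()

        with Pool(4) as pool:  # task start-up + data communication costs
            parallel = pool.map(square, data)
        t2 = time.perf_counter()

        print(f"serial: {t1 - t0:.4f}s, parallel: {t2 - t1:.4f}s")
        # On most machines the parallel run is slower here: coordinating
        # the tasks costs more than the useful work being done.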
Why Use Parallel Computing?
➢ Save time and money
- In theory, throwing more resources at a task will shorten its time to completion
- Parallel computers can be built from cheap, commodity components
➢ Solve larger problems
- Many problems are so large and complex that it is impractical or impossible to
solve them on a single computer, especially given limited computer memory
➢ Provide concurrency
- A single compute resource can only do one thing at a time
- Multiple compute resources can be doing many things simultaneously
➢ Limits to serial computing
- It is increasingly expensive to make a single processor faster
- Using a larger number of moderately fast commodity processors to achieve
the same or better performance is less expensive
The Future
Trends indicated by ever-faster networks, distributed systems, and multi-processor
computer architectures clearly show that PARALLELISM is the future of
computing.
➢ There has been a greater than 1000x increase in supercomputer
performance, with no end currently in sight
➢ The race is already on for exascale computing!

1 exaFLOPS = 10^18 FLOPS (floating-point operations per second);
a 1-FLOPS machine performs one floating-point operation per second.
Moore’s Law and beyond
Moore's Law originated around 1970. A simplified
version of the law states that processor speeds,
or overall processing power for computers, will
double every two years. Moore's Law is no longer
holding, and GPUs are advancing at a faster pace
than CPUs.
Improving parallel processing: CPU vs. GPU
➢ The CPU is sometimes called the brains of a
computer, while a GPU acts as a specialized
microprocessor.
➢ A CPU is good at handling multiple tasks, but
a GPU can handle a few specific tasks very fast.
➢ A GPU (graphics processing unit) is a
programmable processor designed to quickly
render high-resolution images and video.
➢ CUDA cores are an Nvidia GPU's equivalent of
CPU cores. They are optimized for running a
large number of calculations simultaneously.
➢ GPUs render billions of triangles per second
➢ Parallel processing on steroids!

Intel XEON PLATINUM 9282       NVIDIA TITAN V
CPU cores: 56                  CUDA cores: 5,120; Tensor cores: 640
Retail: $50,000+               Retail: $2,999
Transistors: 8 billion         Transistors: 21 billion
Beyond GPU to dedicated ML silicon
#   CPU                                                 GPU
1   Stands for Central Processing Unit                  Stands for Graphics Processing Unit
2   Consumes or needs more memory than a GPU            Consumes or requires less memory than a CPU
3   Lower speed than a GPU                              Faster than a CPU
4   Contains a few powerful cores                       Contains many weaker cores
5   Suitable for serial instruction processing          Not suitable for serial instruction processing
6   Not suitable for parallel instruction processing    Suitable for parallel instruction processing
7   Emphasizes low latency                              Emphasizes high throughput
Comparing GPU & CPU

MythBusters hosts Adam and Jamie paint the Mona Lisa in 80 milliseconds!

https://www.youtube.com/watch?v=WmW6SD-EHVY
Why Deep Learning uses GPUs

• Artificial intelligence with PyTorch and CUDA
• CUDA cores are an Nvidia GPU's equivalent of CPU cores; they are
optimized for running a large number of calculations simultaneously
• The video below discusses how CUDA fits in with PyTorch and, more
importantly, why we use GPUs in neural network programming

https://www.youtube.com/watch?v=6stDhEA0wFQ
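
A minimal PyTorch sketch (an illustration, not from the slides) of how CUDA fits in: tensors are moved to the GPU device, where thousands of CUDA cores execute the arithmetic in parallel.

    import torch

    # Use the GPU if CUDA is available; otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # A large matrix multiply: on a GPU, thousands of CUDA cores run the
    # multiply-accumulate operations simultaneously.
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    c = a @ b
    print(c.device)  # prints cuda:0 when a GPU is present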
