W3C1 Principles of Parallel Computing
Rashmi Kansakar
Computing Eras
➢ Sequential - 1940s+
Renewed interest…
➢ Larger computation tasks
➢ CPUs have reached physical limits
➢ Hardware features (pipelining, superscalar, etc.) require
complex compilers - these have reached their limits
➢ Vector processing is effective, but its applicability is isolated
➢ Networking technology is mature
Hardware Architectures
Ø SISD (Single Instruction, Single Data)
Uniprocessors
Ø SIMD (Single Instruction, Multiple Data)
Vector processors
Parallel processing (see the sketch after this list)
Ø MISD (Multiple Instruction, Single Data)
Maybe pipelined computers
Ø MIMD (Multiple Instruction, Multiple Data)
Multi-computers
Multi-processors
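To make the SIMD idea concrete: one instruction operates on many data elements at once. A minimal sketch using NumPy's vectorized operations (NumPy is an illustrative assumption, not something the slides use):

```python
import numpy as np

# SIMD idea: a single instruction (add) is applied to many data
# elements at once, instead of looping element by element.
a = np.arange(8, dtype=np.float32)
b = np.ones(8, dtype=np.float32)
c = a + b   # one vectorized add across all 8 elements
print(c)
```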
Parallel Processor Architectures
➢ Problem specific!
➢ Approaches:
○ Data parallelism
■ MapReduce (data based) - see the sketch after this list
○ Process parallelism
■ Game/Cell Processor (code based)
○ Farmer-and-worker model
■ Web serving (Apache) (thread based)
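To make the data-parallelism approach concrete, here is a minimal word-count sketch in the MapReduce style (Python's multiprocessing module and the word-count task are illustrative choices, not from the slides):

```python
from multiprocessing import Pool
from collections import Counter
from functools import reduce

# Map step: each worker process counts words in its own chunk of the data.
def map_count(chunk):
    return Counter(word for line in chunk for word in line.split())

# Reduce step: merge two per-chunk counts into one.
def merge_counts(a, b):
    a.update(b)
    return a

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox"] * 1000
    chunks = [lines[i::4] for i in range(4)]      # split the data across 4 workers
    with Pool(processes=4) as pool:
        partials = pool.map(map_count, chunks)    # the data-parallel map step
    total = reduce(merge_counts, partials, Counter())
    print(total.most_common(3))
```

Each worker runs the same code on a different slice of the data, which is exactly the data-parallel pattern MapReduce generalizes.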
Level of Parallelism
➢ Goal?
○ Never have a processor idle!
○ ‘Grain size’ important
■ How you break up the problem.
Grain Size   Code Item                       Parallelized By
Large        Separate, heavyweight process   Programmer
Medium       Function or procedure           Programmer
Fine         Loop or instruction block       Parallelizing compiler
Very fine    Instruction                     Processor & OS
Limits to Parallelism
Linear speedup is not (yet) possible
➢ Doubling the number of cores doesn't double the speed
■ Communication overhead
➢ General guidelines (see the sketch after this list):
■ Computation speed ≈ sqrt(system cost): the faster a system
becomes, the more expensive it is to make it faster
■ The speed of a parallel computer increases roughly as log(n),
where n = number of processors
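A quick numeric illustration of the log(n) guideline against ideal linear speedup (a minimal sketch; using log base 2 for the rule of thumb is an assumption):

```python
import math

# Rule of thumb from the slide: parallel speedup grows as log(n),
# while ideal (unachievable) linear speedup would grow as n itself.
for n in [2, 4, 8, 16, 32, 64]:
    ideal = n
    guideline = math.log2(n)  # assuming log base 2 for the rule of thumb
    print(f"{n:3d} processors: ideal {ideal:3d}x vs guideline ~{guideline:4.1f}x")
```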
Parallel Overhead
➢ A sequential program runs on a single processor and has a single line of control
➢ Parallel programming is making many processors collectively work on a single
program
➢ Parallel overhead:
○ The amount of time required to coordinate parallel tasks, as opposed to doing useful
work (see the toy model below)
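A toy model of how coordination time erodes speedup (the work size and per-processor overhead constants below are made-up numbers for illustration):

```python
# Toy model: parallel time = (useful work / n) + coordination overhead,
# where the overhead grows with the number of processors to coordinate.
WORK = 100.0              # assumed seconds of useful sequential work
OVERHEAD_PER_PROC = 0.5   # assumed seconds of coordination per processor

for n in [1, 2, 4, 8, 16, 32, 64]:
    t = WORK / n + OVERHEAD_PER_PROC * n
    print(f"{n:2d} processors: {t:6.2f} s total, speedup {WORK / t:5.2f}x")
```

Past a certain point, adding processors makes the run slower: the coordination term dominates, which is exactly the parallel overhead described above.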
1 exaFLOPS = 10^18 floating-point operations per second
(FLOPS = FLoating-point OPerations per Second; a 1 FLOPS machine
performs one floating-point operation per second)
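For a sense of scale, a quick back-of-the-envelope calculation (the 10^21-operation workload is a hypothetical number):

```python
EXAFLOPS = 1e18   # floating-point operations per second
ops = 1e21        # hypothetical workload size
print(f"{ops / EXAFLOPS:,.0f} seconds at 1 exaFLOPS")  # -> 1,000 seconds
```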
Moore’s Law and beyond
Moore's Law originated around 1970. A simplified
version of the law states that processor speeds,
or overall processing power for computers, will
double every two years. Moore's Law is no longer
holding, and GPUs are advancing at a faster pace
than CPUs.
Improving parallel processing: CPU vs. GPU
Ø The CPU is sometimes called the brains of a
computer, while the GPU acts as a specialized
microprocessor.
Ø A CPU is good at handling multiple tasks, but
a GPU can handle a few specific tasks very fast.
Ø A GPU (graphics processing unit) is a
programmable processor designed to quickly
render high-resolution images and video.
Ø CUDA cores are an NVIDIA GPU's equivalent of
CPU cores. They are optimized for running a
large number of calculations
simultaneously.
Ø GPUs render billions of triangles per second
Ø PARALLEL PROCESSING on steroids!
Intel XEON PLATINUM 9282: 56 CPU cores, 8 billion transistors, retail $50,000+
NVIDIA TITAN V: 5,120 CUDA cores, 640 Tensor cores, 21 billion transistors, retail $2,999
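To show what thousands of CUDA cores buy in practice, here is a sketch that offloads an element-wise computation to the GPU (it assumes the CuPy library and a CUDA-capable NVIDIA GPU, neither of which the slides mention):

```python
import numpy as np

# The same element-wise computation on the CPU (NumPy) and, if
# available, on the GPU (CuPy), where it runs across many CUDA cores.
x_cpu = np.arange(10_000_000, dtype=np.float32)
y_cpu = np.sqrt(x_cpu) * 2.0                # CPU: a few powerful cores

try:
    import cupy as cp                       # assumption: CuPy is installed
    x_gpu = cp.asarray(x_cpu)               # copy the data to GPU memory
    y_gpu = cp.sqrt(x_gpu) * 2.0            # GPU: thousands of weaker cores
    print("results match:", bool(cp.allclose(cp.asarray(y_cpu), y_gpu)))
except ImportError:
    print("CuPy not available; CPU result only:", y_cpu[:3])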
Beyond GPU to dedicated ML silicon
CPU vs. GPU
1. CPU stands for Central Processing Unit, while GPU stands for Graphics Processing Unit.
2. A CPU consumes or needs more memory than a GPU, while a GPU requires less memory than a CPU.
3. A CPU's speed is lower than a GPU's, while a GPU is faster than a CPU.
4. A CPU contains a few powerful cores, while a GPU contains many weaker cores.
5. A CPU is suitable for serial instruction processing, while a GPU is not.
6. A CPU is not suitable for parallel instruction processing, while a GPU is.
7. A CPU emphasizes low latency, while a GPU emphasizes high throughput.
Comparing GPU & CPU
https://www.youtube.com/watch?v=WmW6SD-EHVY
Why Deep Learning uses GPUs
https://www.youtube.com/watch?v=6stDhEA0wFQ