
CS516: Parallelization of Programs

Overview of Parallel Architectures

Vishwesh Jatala
Assistant Professor
Department of CSE
Indian Institute of Technology Bhilai
[email protected]

2023-24 W
1
Recap: Why Parallel Architectures?
• Moore’s Law: The number of transistors on an IC doubles about every two years

2
Recap: Moore’s Law Effect

3
Processor Architecture Roadmap

4
Course Outline
■ Introduction
■ Overview of Parallel Architectures
■ Performance
■ Parallel Programming
• GPUs and CUDA programming
■ Case studies
■ Extracting Parallelism from Sequential Programs Automatically

5
Flynn’s Taxonomy
• Flynn’s classification of computer architecture

6
SISD: Single Instruction, Single Data
• The von Neumann architecture

• Implements a universal Turing machine

• Conforms to serial algorithmic analysis

From http://arstechnica.com/paedia/cpu/part-1/cpu1-1.html

7
SIMD: Single Instruction, Multiple Data
• Single control stream

• All processors operating in lock step

• Fine-grained parallelism

8
SIMD: Single Instruction, Multiple Data

• Example: GPUs

From http://arstechnica.com/paedia/cpu/part-1/cpu1-1.html

9
MIMD: Multiple Instructions, Multiple Data
• Most of the machines that are prevalent today

• Multi-core, SMP, Clusters, NUMA machines, etc.

10
Rest of today’s lecture…
• Flynn’s classification of computer architecture

11
Flynn’s Taxonomy
• Flynn’s classification of computer architecture

12
MIMD: Shared Memory Multiprocessors
• Tightly coupled multiprocessors
• Shared global memory address space
• Traditional multiprocessing: symmetric multiprocessing (SMP)
• Existing multi-core processors, multithreaded processors
• Programming model is similar to uniprocessors (i.e., a multitasking uniprocessor), except:
• Operations on shared data require synchronization
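
A minimal sketch of the last point (not from the slides, assuming C++ threads): two threads update a counter in the shared address space, and a mutex provides the required synchronization.

    #include <mutex>
    #include <thread>

    int counter = 0;   // shared data in the global address space
    std::mutex m;      // guards the shared data

    void work() {
        for (int i = 0; i < 1000; ++i) {
            std::lock_guard<std::mutex> lock(m);  // operations on shared data require synchronization
            ++counter;
        }
    }

    int main() {
        std::thread t1(work), t2(work);
        t1.join();
        t2.join();
        // counter == 2000 only because the mutex serializes the increments
    }

Without the lock, the two increments could interleave and updates could be lost.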

13
Interconnection Schemes for SMP

14
SMP Architectures

15
UMA: Uniform Memory Access
• All processors have the same uncontended latency to memory
• Symmetric multiprocessing (SMP) ~ UMA with bus interconnect

16
UMA: Uniform Memory Access
+ Data placement is unimportant/less important (easier to optimize code and make use of the available memory space)
- Scaling the system increases all latencies
- Contention could restrict bandwidth and increase latency

17
How to Scale Shared Memory Machines?
• Two general approaches

• Maintain UMA
• Provide a scalable interconnect to memory
• Scaling the system increases memory latency

• Interconnect complete processors with local memory
• NUMA (Non-Uniform Memory Access)
• Local memory faster than remote memory
• Still needs a scalable interconnect for accessing remote memory

18
NUMA: Non-Uniform Memory Access
• Shared memory as local versus remote memory
+ Low latency to local memory
- Much higher latency to remote memories
+ Bandwidth to local memory may be higher
- Performance very sensitive to data placement
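
A placement sketch for the last point, assuming Linux with libnuma (the slides do not name a library, and node 0 is only illustrative): the array is allocated on one NUMA node, so threads running on that node see local-memory latency while threads on other nodes pay the remote-memory penalty.

    #include <numa.h>     // libnuma; link with -lnuma (assumed environment)
    #include <cstddef>
    #include <cstdio>

    int main() {
        if (numa_available() < 0) {
            std::printf("libnuma not available on this machine\n");
            return 1;
        }
        const std::size_t n = 1 << 20;
        // Place the whole array on NUMA node 0 (illustrative choice).
        double* a = static_cast<double*>(numa_alloc_onnode(n * sizeof(double), 0));
        for (std::size_t i = 0; i < n; ++i)
            a[i] = 0.0;   // fast when executed on node 0, slower from a remote node
        numa_free(a, n * sizeof(double));
        return 0;
    }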

19
MIMD: Message Passing Architectures
• Loosely coupled multiprocessors
• No shared global memory address space
• Multicomputer network
• Network-based multiprocessors
• Usually programmed via message passing
• Explicit calls (send, receive) for communication
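
A minimal message-passing sketch, assuming MPI as the interface (the slide names the model, not a specific library): rank 0 sends a value to rank 1 with explicit send/receive calls, since there is no shared address space.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int value = 42;
        if (rank == 0) {
            // explicit send: the data is copied over the network to rank 1
            MPI_Send(&value, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, /*source=*/0, /*tag=*/0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            std::printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 2; each rank is a separate process with its own private address space.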

20
MIMD: Message Passing Architectures

21
Historical Evolution: 1960s & 70s

• Early MPs
• Mainframes
• Small number of processors
• crossbar interconnect
• UMA

22
Historical Evolution: 1980s

• Bus-Based MPs
• enabler: processor-on-a-board
• economical scaling
• precursor of today’s SMPs
• UMA

23
Historical Evolution: Late 80s, mid 90s
• Large Scale MPs (Massively Parallel Processors)
• multi-dimensional interconnects
• each node a computer (proc + cache + memory)
• NUMA
• still used for “supercomputing”

24
Flynn’s Taxonomy
• Flynn’s classification of computer architecture

25
SIMD: Single Instruction, Multiple Data

• Example: GPUs

From http://arstechnica.com/paedia/cpu/part-1/cpu1-1.html

26
Data Parallel Programming Model
• Programming Model
• Operations are performed on each element of a large (regular) data structure (array, vector, matrix)

• Simple example (A, B and C are vectors)


C = (A * B)
• The operations can be executed in sequential or parallel steps
• Language supports array assignment
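
A sketch of the C = A * B example above, first as a CUDA kernel in which each thread computes one element, then as the equivalent sequential loop (kernel name and launch configuration are illustrative, not from the slides):

    // Data parallel (CUDA): each thread computes one element of C = A * B
    __global__ void vecMul(const float* A, const float* B, float* C, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n)
            C[i] = A[i] * B[i];
    }

    // Sequential version of the same operation, for comparison
    void vecMulSeq(const float* A, const float* B, float* C, int n) {
        for (int i = 0; i < n; ++i)
            C[i] = A[i] * B[i];
    }

    // Launch with enough threads to cover all n elements (device pointers assumed):
    // vecMul<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);

One thread per array element matches the fine-grained, lock-step execution described earlier for SIMD machines.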

27
On Sequential Hardware

28
On Data Parallel Hardware

29
Data Parallel Architectures
• Early architectures directly mirrored the programming model

• Single control processor (broadcasts each instruction to an array/grid of processing elements)

• Examples: Connection Machine, MPP (Massively Parallel Processor)

30
Data Parallel Architectures
• Later data parallel architectures
• Higher integration → SIMD units on chip along with caches
• More generic → multiple cooperating multiprocessors (GPUs)
• Specialized hardware support for global synchronization

31
SIMD: Graphics Processing Units
• The early GPU designs
• Specialized for graphics processing only
• Exhibit SIMD execution
• Less programmable
• NVIDIA GeForce 256

• In 2007, fully programmable GPUs
• CUDA released

32
Single-core CPU vs Multi-core vs GPU

33
Single-core CPU vs Multi-core vs GPU

34
NVIDIA V100 GPU

https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
35
Specifications

36
CPUs vs GPUs

Chip-to-chip comparison of peak memory bandwidth (GB/s) and peak double-precision gigaflops for GPUs and CPUs since 2008.

https://www.nextplatform.com/2019/07/10/a-decade-of-accelerated-computing-augurs-well-for-gpus
37
GPU Applications

38
Specifications

39
Multi-GPU Systems

https://www.azken.com/images/dgx1_images/dgx1-system-architecture-whitepaper1.pdf

40
Summary
• Parallel architectures are inevitable

• Different architectures have evolved

• Flynn’s taxonomy:

• SISD

• MISD

• MIMD

• SIMD

41
References
• David Culler, Jaswinder Pal Singh, and Anoop Gupta. 1998. Parallel Computer
Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers Inc.,
San Francisco, CA, USA

• https://safari.ethz.ch/architecture/fall2020/doku.php?id=schedule

• https://www.cse.iitd.ac.in/~soham/COL380/page.html

• https://s3.wp.wsu.edu/uploads/sites/1122/2017/05/6-9-2017-slides-vFinal.pptx

• https://ebhor.com/full-form-of-cpu/

• Miscellaneous resources on the internet


42
Thank You

43
