
CS516: Parallelization of Programs

Overview of Parallel Architectures

Vishwesh Jatala
Assistant Professor
Department of CSE
Indian Institute of Technology Bhilai
[email protected]

2023-24 W
Recap: Why Parallel Architectures?
• Moore’s Law: The number of transistors on an IC doubles about every two years

Recap: Moore’s Law Effect

Processor Architecture RoadMap

Course Outline
■ Introduction
■ Overview of Parallel Architectures
■ Performance
■ Parallel Programming
• GPUs and CUDA programming
■ Case studies
■ Extracting Parallelism from Sequential Programs Automatically

Flynn’s Taxonomy
• Flynn’s classification of computer architecture

SISD: Single Instruction, Single Data
• The von Neumann architecture

• Implements a universal Turing machine

• Conforms to serial algorithmic analysis

From http://arstechnica.com/paedia/c/cpu/part-1/cpu1-1.html

SIMD: Single Instruction, Multiple Data
• Single control stream

• All processors operating in lock step

• Fine-grained parallelism
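
A concrete sketch of lock-step execution (illustrative, not from the slides; assumes an x86 CPU with AVX support): one instruction multiplies eight floats at once.

#include <immintrin.h>   // AVX intrinsics; assumes the host CPU supports AVX

// Multiply two float arrays element-wise, eight lanes per instruction.
// For brevity, n is assumed to be a multiple of 8.
void simd_mul(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   // load 8 floats
        __m256 vb = _mm256_loadu_ps(b + i);   // load 8 floats
        __m256 vc = _mm256_mul_ps(va, vb);    // one instruction, 8 multiplies in lock step
        _mm256_storeu_ps(c + i, vc);          // store 8 results
    }
}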

SIMD: Single Instruction, Multiple Data

• Example: GPUs

From http://arstechnica.com/paedia/c/cpu/part-1/cpu1-1.html

MIMD: Multiple Instructions, Multiple Data
• Most of the machines that are prevalent today

• Multi-core, SMP, Clusters, NUMA machines, etc.

Rest of today’s lecture…
• Flynn’s classification of computer architecture

Flynn’s Taxonomy
• Flynn’s classification of computer architecture

MIMD: Shared Memory Multiprocessors
• Tightly coupled multiprocessors
• Shared global memory address space
• Traditional multiprocessing: symmetric multiprocessing (SMP)
• Existing multi-core processors, multithreaded processors
• Programming model similar to uniprocessors (i.e., multitasking uniprocessor) except
• Operations on shared data require synchronization
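
A minimal sketch of why that synchronization matters (illustrative code, not from the slides): four threads increment one shared counter; std::atomic makes each read-modify-write indivisible, where a plain int would lose updates.

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::atomic<long> counter{0};               // shared data, visible to all threads
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([&counter] {
            for (int i = 0; i < 1000000; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // atomic increment
        });
    for (auto& w : workers) w.join();
    std::printf("counter = %ld\n", counter.load());  // always 4000000
    return 0;
}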

Interconnection Schemes for SMP

SMP Architectures

UMA: Uniform Memory Access
• All processors have the same uncontended latency to memory
• Symmetric multiprocessing (SMP) ~ UMA with bus interconnect

UMA: Uniform Memory Access
+ Data placement unimportant/less important (easier to optimize code and make use of available memory space)
- Scaling the system increases all latencies
- Contention could restrict bandwidth and increase latency

How to Scale Shared Memory Machines?
• Two general approaches

• Maintain UMA
• Provide a scalable interconnect to memory
• Scaling the system increases memory latency

• Interconnect complete processors with local memory


• NUMA (Non-uniform memory access)
• Local memory faster than remote memory
• Still needs a scalable interconnect for accessing remote memory

NUMA: Non-Uniform Memory Access
• Shared memory as local versus remote memory
+ Low latency to local memory
- Much higher latency to remote memories
+ Bandwidth to local memory may be higher
- Performance very sensitive to data placement
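
Data placement can be controlled explicitly. A minimal sketch assuming Linux with libnuma installed (the slides prescribe no API, so this is just one possible realization):

#include <cstdio>
#include <numa.h>   // Linux libnuma; link with -lnuma (an assumption about the platform)

int main() {
    if (numa_available() < 0) { std::puts("no NUMA support"); return 1; }
    // Place the buffer on node 0: threads running on node 0 see local latency,
    // threads on any other node pay the remote-access penalty for the same data.
    size_t bytes = 1 << 20;
    char* buf = static_cast<char*>(numa_alloc_onnode(bytes, 0));
    buf[0] = 42;                  // touch the page so it is actually mapped
    numa_free(buf, bytes);
    return 0;
}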

MIMD: Message Passing Architectures
• Loosely coupled multiprocessors
• No shared global memory address space
• Multicomputer network
• Network-based multiprocessors
• Usually programmed via message passing
• Explicit calls (send, receive) for communication
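
A minimal sketch of explicit send/receive using MPI, one common realization of this model (the slides name the model, not a particular library):

#include <cstdio>
#include <mpi.h>

// Run with at least two processes, e.g.: mpirun -np 2 ./a.out
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {                       // no shared memory: data moves by messages
        int value = 42;
        MPI_Send(&value, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, /*source=*/0, /*tag=*/0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}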

MIMD: Message Passing Architectures

Historical Evolution: 1960s & 70s

• Early MPs
• Mainframes
• Small number of processors
• crossbar interconnect
• UMA

Historical Evolution: 1980s

• Bus-Based MPs
• enabler: processor-on-a-board
• economical scaling
• precursor of today’s SMPs
• UMA

Historical Evolution: Late 80s, mid 90s
• Large Scale MPs (Massively Parallel Processors)
• multi-dimensional interconnects
• each node a computer (proc + cache + memory)
• NUMA
• still used for “supercomputing”

Flynn’s Taxonomy
• Flynn’s classification of computer architecture

SIMD: Single Instruction, Multiple Data

• Example: GPUs

From http://arstechnica.com/paedia/c/cpu/part-1/cpu1-1.html

Data Parallel Programming Model
• Programming Model
• Operations are performed on each element of a large (regular) data structure (array, vector, matrix)

• Simple example (A, B and C are vectors)


C = (A * B)
• The operations can be executed in sequential or parallel steps
• Language supports array assignment

On Sequential Hardware
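
A sketch of this execution style: the element-wise product C = A * B as one ordinary loop, one element per step.

// Sequential execution: a single processor walks the vectors element by element.
void vec_mul_seq(const float* A, const float* B, float* C, int n) {
    for (int i = 0; i < n; ++i)
        C[i] = A[i] * B[i];   // n independent multiplies, executed one at a time
}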

On Data Parallel Hardware
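
The same computation in data-parallel form, sketched as a CUDA kernel with one thread per element (the 256-thread block size is an illustrative choice):

#include <cuda_runtime.h>

// Data-parallel execution: every element gets its own thread, so all the
// independent multiplies can proceed at once on SIMD hardware.
__global__ void vec_mul(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] * B[i];
}

// Launch with enough 256-thread blocks to cover n elements:
//   vec_mul<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);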

Data Parallel Architectures
• Early architectures directly mirrored programming model

• Single control processor (broadcasts each instruction to an array/grid of processing elements)

• Examples: Connection Machine, MPP (Massively Parallel Processor)

Data Parallel Architectures
• Later data parallel architectures
• Higher integration → SIMD units on chip along with caches
• More generic → multiple cooperating multiprocessors (GPUs)
• Specialized hardware support for global synchronization

SIMD: Graphics Processing Units
• The early GPU designs
• Specialized for graphics processing only
• Exhibit SIMD execution
• Less programmable
• NVIDIA GeForce 256

• In 2007, fully programmable GPUs
• CUDA released
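
A minimal host-side sketch of what “fully programmable” means in CUDA, reusing the vec_mul kernel sketched earlier (illustrative; error checking omitted):

#include <cuda_runtime.h>

// Copy inputs to the GPU, launch the kernel, copy the result back.
void vec_mul_on_gpu(const float* A, const float* B, float* C, int n) {
    float *dA, *dB, *dC;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dA, bytes);  cudaMalloc(&dB, bytes);  cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);
    vec_mul<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);   // kernel from the earlier sketch
    cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);  cudaFree(dB);  cudaFree(dC);
}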

Single-core CPU vs Multi-core vs GPU

Single-core CPU vs Multi-core vs GPU

NVIDIA V100 GPU

https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Specifications

CPUs vs GPUs

Chip-to-chip comparison of peak memory bandwidth (GB/s) and peak double-precision gigaflops for GPUs and CPUs since 2008.

https://www.nextplatform.com/2019/07/10/a-decade-of-accelerated-computing-augurs-well-for-gpus
GPU Applications

Specifications

Multi-GPU Systems

https://www.azken.com/images/dgx1_images/dgx1-system-architecture-whitepaper1.pdf

Summary
• Parallel architectures are inevitable

• Different architectures have evolved

• Flynn’s taxonomy:

• SISD

• MISD

• MIMD

• SIMD

References
• David Culler, Jaswinder Pal Singh, and Anoop Gupta. 1998. Parallel Computer
Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers Inc.,
San Francisco, CA, USA

• https://safari.ethz.ch/architecture/fall2020/doku.php?id=schedule

• https://www.cse.iitd.ac.in/~soham/COL380/page.html

• https://s3.wp.wsu.edu/uploads/sites/1122/2017/05/6-9-2017-slides-vFinal.pptx

• https://ebhor.com/full-form-of-cpu/

• Miscellaneous resources on the Internet


Thank You
