CS516: Parallelization of Programs: Overview of Parallel Architectures
Vishwesh Jatala
Assistant Professor
Department of CSE
Indian Institute of Technology Bhilai
[email protected]
2023-24 W
Recap: Why Parallel Architectures?
• Moore’s Law: The number of transistors on an IC doubles about every two years
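Stated as a rough formula (one common reading of the observation, with N_0 the transistor count in a base year and t the elapsed time in years):

    N(t) = N_0 \cdot 2^{t/2}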
Recap: Moore’s Law Effect
Processor Architecture Roadmap
Course Outline
■ Introduction
■ Overview of Parallel Architectures
■ Performance
■ Parallel Programming
• GPUs and CUDA programming
■ Case studies
■ Extracting Parallelism from Sequential Programs Automatically
Flynn’s Taxonomy
• Flynn’s classification of computer architecture by instruction and data streams: SISD, SIMD, MISD, MIMD
SISD: Single Instruction, Single Data
• The von Neumann architecture
SIMD: Single Instruction, Multiple Data
• Single control stream
• Fine-grained parallelism, as in the sketch below
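As a concrete illustration (a minimal sketch, not from the slides: x86 SSE intrinsics in C, where a single _mm_add_ps instruction adds four floats in parallel; add4 is an illustrative name):

    #include <immintrin.h>

    /* One instruction stream; each vector instruction operates on
       four data elements at once (fine-grained parallelism). */
    void add4(const float *a, const float *b, float *out) {
        __m128 va = _mm_loadu_ps(a);     /* load 4 floats from a */
        __m128 vb = _mm_loadu_ps(b);     /* load 4 floats from b */
        __m128 vc = _mm_add_ps(va, vb);  /* single add instruction, 4 results */
        _mm_storeu_ps(out, vc);          /* store 4 results */
    }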
SIMD: Single Instruction, Multiple Data
• Example: GPUs
MIMD: Multiple Instructions, Multiple Data
• Most machines that are prevalent today are MIMD
Rest of today’s lecture…
• Flynn’s classification of computer architecture, in more detail
Flynn’s Taxonomy
• Flynn’s classification of computer architecture
MIMD: Shared Memory Multiprocessors
• Tightly coupled multiprocessors
• Shared global memory address space
• Traditional multiprocessing: symmetric multiprocessing (SMP)
• Existing multi-core processors, multithreaded processors
• Programming model similar to that of uniprocessors (i.e., a multitasking uniprocessor), except
• Operations on shared data require synchronization, as in the sketch below
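A minimal sketch of that synchronization requirement, assuming POSIX threads (the slide does not prescribe an API; worker and counter are illustrative names):

    #include <pthread.h>
    #include <stdio.h>

    long counter = 0;                           /* shared data */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);          /* synchronize access */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);     /* 400000 only because of the lock */
        return 0;
    }

Without the lock, the four threads race on counter and the final value is unpredictable; compile with -pthread.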
Interconnection Schemes for SMP
SMP Architectures
UMA: Uniform Memory Access
• All processors have the same uncontended latency to memory
• Symmetric multiprocessing (SMP) ~ UMA with bus interconnect
UMA: Uniform Memory Access
+ Data placement unimportant/less important (easier to optimize code and make use of available memory space)
- Scaling the system increases all latencies
- Contention could restrict bandwidth and increase latency
How to Scale Shared Memory Machines?
• Two general approaches
• Maintain UMA: provide a scalable interconnect to memory, though scaling the system still increases memory latency for all processors
• Give up UMA: distribute memory with the processors and accept non-uniform access times (NUMA, next slide)
NUMA: Non-Uniform Memory Access
• Shared memory as local versus remote memory
+ Low latency to local memory
- Much higher latency to remote memories
+ Bandwidth to local memory may be higher
- Performance very sensitive to data placement (see the sketch below)
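To make the data-placement sensitivity concrete, a minimal sketch assuming the Linux libnuma API (not mentioned on the slide), which places an allocation on a chosen node:

    #include <numa.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {    /* kernel without NUMA support */
            fprintf(stderr, "NUMA not available\n");
            return 1;
        }
        size_t size = 1 << 20;
        /* Allocate on node 0: low latency for cores on node 0,
           higher latency when accessed from remote nodes. */
        char *buf = numa_alloc_onnode(size, 0);
        if (buf == NULL) return 1;
        buf[0] = 1;                    /* touch to fault the page in */
        numa_free(buf, size);
        return 0;
    }

Compile with -lnuma; placing data on the node whose cores use it most is exactly the tuning the slide alludes to.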
MIMD: Message Passing Architectures
• Loosely coupled multiprocessors
• No shared global memory address space
• Multicomputer network
• Network-based multiprocessors
• Usually programmed via message passing
• Explicit calls (send, receive) for communication, as sketched below
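The send/receive style in a minimal sketch, assuming MPI as the message-passing library (the slide names only the send/receive operations, not MPI; run with two ranks):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int value = 42;
        if (rank == 0) {
            /* Explicit send: there is no shared address space to write into. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int received;
            MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", received);
        }
        MPI_Finalize();
        return 0;
    }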
MIMD: Message Passing Architectures
Historical Evolution: 1960s & 70s
• Early MPs
• Mainframes
• Small number of processors
• Crossbar interconnect
• UMA
Historical Evolution: 1980s
• Bus-Based MPs
• Enabler: processor-on-a-board
• Economical scaling
• Precursor of today’s SMPs
• UMA
Historical Evolution: Late 80s, mid 90s
• Large-scale MPs (Massively Parallel Processors)
• Multi-dimensional interconnects
• Each node a computer (processor + cache + memory)
• NUMA
• Still used for “supercomputing”
Flynn’s Taxonomy
• Flynn’s classification of computer architecture
SIMD: Single Instruction, Multiple Data
• Example: GPUs
Data Parallel Programming Model
• Programming Model
• Operations are performed on each element of a large (regular) data structure (array, vector, matrix)
On Sequential Hardware
On Data Parallel Hardware
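Side by side in code (a minimal sketch; the OpenMP pragma stands in for whatever data-parallel hardware executes the loop, and scale_seq/scale_par are illustrative names):

    #define N 1024

    /* Sequential hardware: one control stream visits one element at a time. */
    void scale_seq(float *a, float s) {
        for (int i = 0; i < N; i++)
            a[i] *= s;
    }

    /* Data-parallel hardware: the same operation is applied to many
       elements at once; here OpenMP distributes iterations across
       lanes/cores. Compile with -fopenmp. */
    void scale_par(float *a, float s) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] *= s;
    }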
Data Parallel Architectures
• Early architectures directly mirrored the programming model
Data Parallel Architectures
• Later data parallel architectures
• Higher integration → SIMD units on chip along with caches
• More generic → multiple cooperating multiprocessors (GPUs)
• Specialized hardware support for global synchronization
SIMD: Graphics Processing Units
• The early GPU designs
• Specialized for graphics processing only
• Exhibit SIMD execution
• Limited programmability
• Example: NVIDIA GeForce 256
Single-core CPU vs Multi-core vs GPU
Single-core CPU vs Multi-core vs GPU
NVIDIA V100 GPU
https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
Specifications
CPUs vs GPUs
Chip-to-chip comparison of peak memory bandwidth in GB/s and peak double-precision gigaflops for GPUs and CPUs since 2008.
https://www.nextplatform.com/2019/07/10/a-decade-of-accelerated-computing-augurs-well-for-gpus
GPU Applications
Specifications
Multi-GPU Systems
https://www.azken.com/images/dgx1_images/dgx1-system-architecture-whitepaper1.pdf
Summary
• Parallel architectures are inevitable
• Flynn’s taxonomy:
• SISD
• SIMD
• MISD
• MIMD
References
• David Culler, Jaswinder Pal Singh, and Anoop Gupta. 1998. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
• https://safari.ethz.ch/architecture/fall2020/doku.php?id=schedule
• https://www.cse.iitd.ac.in/~soham/COL380/page.html
• https://s3.wp.wsu.edu/uploads/sites/1122/2017/05/6-9-2017-slides-vFinal.pptx
• https://ebhor.com/full-form-of-cpu/