Lecture 2


Parallel Computing Platforms

Implicit Parallelism
Scope of Parallelism
• Conventional architectures coarsely comprise a processor, a
memory system, and a datapath.
• Each of these components presents significant performance
bottlenecks.
• A number of architectural innovations over the years have
addressed these bottlenecks.
• One of the most important innovations is multiplicity – in
processing units, datapaths, and memory units.
• This multiplicity is either entirely hidden from the programmer,
as in the case of implicit parallelism, or exposed to the
programmer in different forms.
Scope of Parallelism

• Parallelism addresses each of these components (processor,
memory system, and datapath) in significant ways.
• Different applications utilize different aspects of parallelism -
e.g., data intensive applications utilize high aggregate
throughput, server applications utilize high aggregate network
bandwidth, and scientific applications typically utilize high
processing and memory system performance.
• It is important to understand each of these performance
bottlenecks.
Implicit Parallelism
• Pipelining and Superscalar Execution
• Very Long Instruction Word Processors
Implicit Parallelism: Trends in Microprocessor
Architectures
• Microprocessor clock speeds have posted impressive gains
over the past two decades (two to three orders of magnitude).
• Higher levels of device integration have made available a
large number of transistors.
• The question of how best to utilize these resources is an
important one.
• Current processors use these resources in multiple functional
units and execute multiple instructions in the same cycle.
• The precise manner in which these instructions are selected
and executed provides impressive diversity in architectures.
1. Pipelining and Superscalar Execution

• Pipelining overlaps various stages of instruction execution to
improve performance.
• At a high level of abstraction, an instruction can be executed
while the next one is being decoded and the next one is being
fetched.
• A real-life example is an assembly line for the manufacture
of cars.
– Divide the process into multiple pipelined stages
– Each pipeline stage has multiple units
– This leads to a multi-fold speedup as compared to serial production
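The assembly-line speedup argument can be sketched numerically. A minimal model with made-up instruction and stage counts (not figures from the slides):

```python
# Toy model of pipeline throughput (illustrative numbers only).
def serial_cycles(n_instructions, n_stages):
    # Unpipelined: each instruction passes through every stage
    # before the next instruction starts.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # Pipelined: n_stages cycles to fill the pipe, then one
    # instruction completes per cycle.
    return n_stages + (n_instructions - 1)

n, k = 100, 5
speedup = serial_cycles(n, k) / pipelined_cycles(n, k)
print(f"{k}-stage pipeline, {n} instructions: speedup {speedup:.2f}x")
```

For large n the speedup approaches k, the number of stages, which is the ideal case the assembly-line analogy suggests.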
Pipelining and Superscalar Execution

• Pipelining, however, has several limitations.
• The speed of a pipeline is eventually limited by the slowest
stage.
• To keep each stage short and the clock fast, conventional
processors rely on very deep pipelines (20-stage pipelines in
state-of-the-art Pentium processors).
• However, in typical program traces, every fifth or sixth
instruction is a conditional jump! This requires very accurate
branch prediction.
• The penalty of a misprediction grows with the depth of the
pipeline, since a larger number of in-flight instructions must be
flushed.
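The interaction of pipeline depth and branch cost can be illustrated with a back-of-the-envelope model. The branch frequency below matches the one-in-five-to-six figure above; the misprediction rate is an assumed value:

```python
# Rough cost model for branch mispredictions (assumed misprediction rate).
def effective_cpi(depth, branch_freq=1/6, mispredict_rate=0.05):
    # Each misprediction flushes roughly `depth` in-flight instructions,
    # so the expected penalty per instruction grows with pipeline depth.
    return 1.0 + branch_freq * mispredict_rate * depth

for depth in (5, 10, 20):
    print(f"{depth}-stage pipeline: effective CPI = {effective_cpi(depth):.3f}")
```

Doubling the depth doubles the misprediction penalty term, which is why deep pipelines demand very accurate branch predictors.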
Pipelining and Superscalar Execution

• One simple way of alleviating these bottlenecks is to use
multiple pipelines.
• During each clock cycle, multiple instructions are piped into
the processor in parallel.
• These instructions are executed on multiple functional units.
Superscalar Execution: An Example

Example of two-way superscalar execution of instructions.
Superscalar Execution: An Example

• In the above example, there is some wastage of resources
due to data dependencies.
• The example also illustrates that different instruction mixes
with identical semantics can take significantly different
execution times.
Superscalar Execution

• Scheduling of instructions is determined by a number of
factors:
– True Data Dependency: The result of one operation is an input
to the next.
– Resource Dependency: Two operations require the same
resource.
– Branch Dependency: Scheduling instructions across conditional
branch statements cannot be done deterministically a priori.
– The scheduler, a piece of hardware, looks at a large number of
instructions in an instruction queue and selects an appropriate
number of them to execute concurrently based on these factors.
– The complexity of this hardware is an important constraint on
superscalar processors.
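The true-data-dependency check can be sketched for hypothetical three-address instructions of the form (dest, src1, src2):

```python
# True (RAW) dependency check for hypothetical (dest, src1, src2) tuples.
def true_dependency(earlier, later):
    # The later instruction reads a register the earlier one writes.
    dest = earlier[0]
    return dest in later[1:]

i1 = ("r1", "r2", "r3")   # r1 = r2 + r3
i2 = ("r4", "r1", "r5")   # r4 = r1 + r5: true dependency on i1
i3 = ("r6", "r7", "r8")   # r6 = r7 + r8: independent of i1
print(true_dependency(i1, i2))  # True
print(true_dependency(i1, i3))  # False
```

Resource dependencies would additionally need a model of the functional units, and branch dependencies are what force the scheduler to predict rather than know.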
Superscalar Execution:
Issue Mechanisms
• In the simpler model, instructions can be issued only in the
order in which they are encountered.
• That is, if the second instruction cannot be issued because it
has a data dependency with the first, only one instruction is
issued in the cycle.
• This is called in-order issue.
Superscalar Execution:
Issue Mechanisms
• In a more aggressive model, instructions can be issued out of
order.
• In this case, if the second instruction has data dependencies
with the first, but the third instruction does not, the first and
third instructions can be co-scheduled.
• This is also called dynamic issue.
• Performance of in-order issue is generally limited.
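A toy two-way issue model, under the simplifying assumptions that a dependency only blocks co-issue within the same cycle and that results are available in the next cycle, shows how dynamic issue can beat in-order issue:

```python
# Simplified two-way issue: compare in-order and out-of-order (dynamic)
# scheduling of hypothetical (dest, src1, src2) instructions.
def depends(earlier, later):
    # RAW hazard: `later` reads the register `earlier` writes.
    return earlier[0] in later[1:]

def in_order_cycles(prog):
    cycles, i = 0, 0
    while i < len(prog):
        # Co-issue the next instruction only if it is independent.
        if i + 1 < len(prog) and not depends(prog[i], prog[i + 1]):
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles

def dynamic_issue_cycles(prog):
    pending = list(prog)
    cycles = 0
    while pending:
        first = pending.pop(0)
        # Co-issue the first later instruction that depends neither on
        # `first` nor on any earlier instruction still waiting to issue.
        for j, ins in enumerate(pending):
            if not depends(first, ins) and \
               not any(depends(e, ins) for e in pending[:j]):
                pending.pop(j)
                break
        cycles += 1
    return cycles

prog = [("r1", "r2", "r3"),    # i1
        ("r4", "r1", "r5"),    # i2: depends on i1
        ("r6", "r7", "r8"),    # i3: independent
        ("r9", "r10", "r11")]  # i4: independent
print(in_order_cycles(prog), dynamic_issue_cycles(prog))  # 3 2
```

In-order issue stalls the independent i3 and i4 behind the i1-i2 dependency; dynamic issue co-schedules i1 with i3 and then i2 with i4, finishing a cycle earlier.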
Superscalar Execution:
Efficiency Considerations
• Not all functional units can be kept busy at all times.
• If during a cycle, no functional units are utilized, this is referred
to as vertical waste.
• If during a cycle, only some of the functional units are utilized,
this is referred to as horizontal waste.
• Due to limited parallelism in typical instruction traces, data
and resource dependencies, and the scheduler's limited ability
to extract parallelism, the performance of superscalar
processors is eventually limited.
• Conventional microprocessors typically support four-way
superscalar execution.
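These two kinds of waste can be counted from a cycle-by-cycle occupancy trace. The trace below is made up for illustration, on a four-way machine as mentioned above:

```python
# Count vertical and horizontal waste in a made-up occupancy trace.
def waste(trace, width=4):
    # trace[c] = number of functional units busy in cycle c.
    vertical = sum(1 for busy in trace if busy == 0)
    horizontal = sum(width - busy for busy in trace if 0 < busy < width)
    utilization = sum(trace) / (width * len(trace))
    return vertical, horizontal, utilization

trace = [4, 2, 0, 3, 1, 4]       # hypothetical 4-way machine, 6 cycles
v, h, u = waste(trace)
print(f"vertical-waste cycles: {v}, horizontal-waste slots: {h}, "
      f"utilization: {u:.0%}")
```

Fully idle cycles count as vertical waste, partially filled cycles contribute horizontal waste, and together they bound the achievable utilization.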
2. Very Long Instruction Word (VLIW)
Processors
• The hardware cost and complexity of the superscalar scheduler
is a major consideration in processor design.
• To address this issue, VLIW processors rely on compile-time
analysis to identify and bundle together instructions that can be
executed concurrently.
• These instructions are packed and dispatched together, and
thus the name very long instruction word.
• This concept was used with some commercial success in the
Multiflow Trace machine (circa 1984).
• Variants of this concept are employed in the Intel IA64
processors.
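Compile-time bundling can be sketched as a greedy pass over hypothetical (dest, src1, src2) instructions; real VLIW compilers also reorder code and apply far more analysis:

```python
# Greedy, in-order VLIW bundling sketch: pack mutually independent
# instructions into one long instruction word (hypothetical format).
def conflict(a, b):
    # True if the two instructions cannot share a word: one reads the
    # other's destination, or both write the same register.
    return a[0] in b[1:] or b[0] in a[1:] or a[0] == b[0]

def bundle(prog, width=4):
    words, current = [], []
    for ins in prog:
        if len(current) < width and not any(conflict(e, ins) for e in current):
            current.append(ins)
        else:
            words.append(current)
            current = [ins]
    if current:
        words.append(current)
    return words

prog = [("r1", "r2", "r3"), ("r4", "r5", "r6"),
        ("r7", "r1", "r4"),                      # reads r1 and r4
        ("r8", "r9", "r10")]
for word in bundle(prog):
    print(word)
```

Because all dependence analysis happens before execution, the hardware needs no dynamic scheduler, which is exactly the cost and complexity saving VLIW targets.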
Very Long Instruction Word (VLIW)
Processors: Considerations
• The compiler has a much larger context from which to select
instructions for co-scheduling.
• Compilers, however, do not have runtime information such as
cache misses. Scheduling is, therefore, inherently
conservative.
• Branch and memory prediction is more difficult.
• VLIW performance is highly dependent on the compiler.
Techniques such as loop unrolling, speculative execution, and
branch prediction are critical.
• Typical VLIW processors are limited to 4-way to 8-way
parallelism.
Thank You
