Instruction Level Parallelism
The document discusses instruction-level parallelism (ILP) and its impact on program execution speed, emphasizing the importance of potential parallelism in both the program and processor. It explains concepts like pipelining, multiple instruction issue, and scheduling constraints that affect how instructions can be executed concurrently. Additionally, it addresses the complexities of data dependencies and the challenges of optimal instruction scheduling, which is NP-complete, suggesting heuristic approaches like list scheduling to manage these issues.
How fast can a program be run on a processor with instruction-level parallelism? The answer depends on:
– the potential parallelism in the program,
– the available parallelism on the processor,
– our ability to extract parallelism from the original sequential program,
– our ability to find the best parallel schedule given the scheduling constraints.
If all the operations in a program are highly dependent upon one another, then no amount of hardware or parallelization techniques can make the program run fast in parallel.

Processor Architectures

A processor usually issues several operations in a single clock cycle. In fact, it is possible for a machine to issue just one operation per clock and yet achieve instruction-level parallelism through pipelining. Every processor, be it a high-performance supercomputer or a standard machine, uses an instruction pipeline: a new instruction can be fetched every clock while preceding instructions are still going through the pipeline. A simple 5-stage instruction pipeline first fetches the instruction (IF), decodes it (ID), executes the operation (EX), accesses memory (MEM), and writes back the result (WB).

Pipelined Execution

Some instructions take several clocks to execute; even when a memory access hits in the cache, it usually takes several clocks for the cache to return the data. The execution of an instruction is pipelined if succeeding instructions not dependent on its result are allowed to proceed. Thus, even if a processor can issue only one operation per clock, several operations might be in their execution stages at the same time; the sketch below makes this overlap concrete. If the deepest execution pipeline has n stages, potentially n operations can be in flight at once. Note that not all instructions are fully pipelined: floating-point adds and multiplies often are, but floating-point divides, being more complex and less frequently executed, often are not.

Multiple Instruction Issue

By issuing several operations per clock, processors can keep even more operations in flight. The largest number of operations that can be executed simultaneously is the instruction issue width multiplied by the average number of stages in the execution pipeline. Like pipelining, parallelism on multiple-issue machines can be managed either by software or by hardware. Machines that rely on software to manage their parallelism are known as VLIW (Very Long Instruction Word) machines, while those that manage it with hardware are known as superscalar machines. VLIW machines, as their name implies, have wider-than-normal instruction words that encode the operations to be issued in a single clock; the compiler decides which operations are issued in parallel and encodes that information explicitly in the machine code. Superscalar machines, on the other hand, have a regular instruction set with ordinary sequential-execution semantics: they automatically detect dependences among instructions and issue each instruction as its operands become available. Some processors include both VLIW and superscalar functionality.
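To make the pipeline overlap concrete, here is a minimal sketch, not from the original slides, that prints which stage of a classic 5-stage in-order pipeline each instruction occupies on every clock; the instruction names and the no-stall assumption are illustrative.

    # Timeline of a 5-stage in-order pipeline (illustrative sketch).
    # With no stalls, a new instruction enters IF each clock, so up to
    # five instructions are in flight at once.

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def pipeline_timeline(instructions):
        """Map each instruction to {clock: stage}, assuming no stalls."""
        timeline = {}
        for i, instr in enumerate(instructions):
            # Instruction i occupies stage s during clock i + s.
            timeline[instr] = {i + s: stage for s, stage in enumerate(STAGES)}
        return timeline

    instrs = ["i1", "i2", "i3", "i4"]
    timeline = pipeline_timeline(instrs)
    last = len(instrs) + len(STAGES) - 2
    for instr in instrs:
        row = [timeline[instr].get(c, "...") for c in range(last + 1)]
        print(instr, " ".join(f"{s:>4}" for s in row))

Running it shows the overlap: at clock 3, i1 is already in MEM while i4 is only being fetched (IF).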
Code Scheduling Constraints

– Control-dependence constraints: all the operations executed in the original program must be executed in the optimized one.
– Data-dependence constraints: the operations in the optimized program must produce the same results as the corresponding ones in the original program.
– Resource constraints: the schedule must not oversubscribe the resources of the machine.
These scheduling constraints guarantee that the optimized program produces the same results as the original.

Data Dependence

– True dependence (read after write, RAW): if a write is followed by a read of the same location, the read depends on the value written.
– Antidependence (write after read, WAR): if a read is followed by a write to the same location, there is an antidependence from the read to the write.
– Output dependence (write after write, WAW): two writes to the same location share an output dependence; if it is violated, the location will hold the wrong value after both operations are performed.

Antidependences and output dependences are referred to as storage-related dependences. They are not true dependences and can be eliminated by using different locations to store different values.

Antidependences

Antidependences are not real dependences, in the sense that they do not arise from the flow of data; they are due to a single location being used to store different values. Most of the time, storage-related dependences can be removed by renaming locations, e.g. registers. In the original slide's example (the figure is not reproduced here), two memory load instructions both write R1, creating a storage-related (WAW) dependence that can be removed by renaming the second definition of R1; the sketch below reconstructs the idea.
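The following is a hypothetical reconstruction of that renaming example (the figure is unavailable, so the registers and instructions are assumptions): instructions are modeled as (destination, sources) pairs, and a small helper classifies the dependence between two of them.

    # Hypothetical reconstruction of the renaming example; the original
    # figure is unavailable, so these instructions are illustrative.
    # Each instruction is a (destination, sources) pair.

    before = [
        ("R1", []),      # LD  R1, a
        ("R3", ["R1"]),  # ADD R3, R1, 4
        ("R1", []),      # LD  R1, b   <- second write to R1: WAW dependence
        ("R4", ["R1"]),  # ADD R4, R1, 8
    ]

    after = [
        ("R1", []),      # LD  R1, a
        ("R3", ["R1"]),  # ADD R3, R1, 4
        ("R2", []),      # LD  R2, b   <- renamed: the WAW dependence is gone
        ("R4", ["R2"]),  # ADD R4, R2, 8
    ]

    def dependence(first, second):
        """Classify how a later instruction depends on an earlier one."""
        d1, s1 = first
        d2, s2 = second
        if d1 in s2:
            return "RAW (true dependence)"
        if d2 in s1:
            return "WAR (antidependence)"
        if d1 == d2:
            return "WAW (output dependence)"
        return None

    print(dependence(before[0], before[2]))  # WAW (output dependence)
    print(dependence(after[0], after[2]))    # None: renaming removed it

Renaming the second load's destination to the fresh register R2 removes the storage-related dependence, so the two loads may be reordered or overlapped.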
Instruction ordering

When a compiler emits the instructions corresponding to a program, it imposes a total order on them. However, that order is usually not the only valid one, in the sense that it can be changed without modifying the program's behavior. For example, if two instructions i1 and i2 appear sequentially in that order and are independent, it is possible to swap them. Among all the valid permutations of the instructions composing a program, i.e. those that preserve the program's behavior, some can be more desirable than others: one order might lead to a faster program on some machine because of architectural constraints. The aim of instruction scheduling is to find a valid order that optimizes some metric, like execution speed.

Instruction Scheduling Example

Consider the expression (a + b) + c + (d + e). Evaluated as one long left-to-right chain, each addition waits for the previous one; a parallel evaluation computes the independent subexpressions a + b and d + e at the same time, as the sketch below shows.
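A minimal sketch of the difference, modeling expressions as nested pairs and counting the cycles needed when independent adds may run in the same cycle (the one-cycle-per-add latency and the availability of two adders are assumptions):

    # Expression trees as nested pairs; leaves are variable names.
    def depth(expr):
        """Cycles needed with enough adders: the height of the tree."""
        if isinstance(expr, str):
            return 0
        left, right = expr
        return 1 + max(depth(left), depth(right))

    chain = (((("a", "b"), "c"), "d"), "e")   # ((((a + b) + c) + d) + e)
    tree  = ((("a", "b"), "c"), ("d", "e"))   # ((a + b) + c) + (d + e)
    print(depth(chain))  # 4 cycles: every add waits for the previous one
    print(depth(tree))   # 3 cycles: a + b and d + e run in parallel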
Dependence Graph

The dependence graph is a directed graph representing dependences among instructions. Its nodes are the instructions to schedule, and there is an edge from node n1 to node n2 iff the instruction of n2 depends on that of n1. Any topological sort of the nodes of this graph represents a valid way to schedule the instructions. (The slide's worked dependence-graph example is a figure that is not reproduced here; the sketch below builds a small graph instead.)
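In place of the missing figure, here is a minimal sketch (the instruction sequence is an assumption) that builds a dependence graph with Python's standard graphlib module and prints one valid topological order:

    from graphlib import TopologicalSorter  # standard library, Python 3.9+

    # An edge n1 -> n2 means n2 depends on n1; graphlib takes the mapping
    # from each node to the set of its predecessors.
    deps = {
        "LD R1, a":       set(),
        "LD R2, b":       set(),
        "ADD R3, R1, 4":  {"LD R1, a"},
        "ADD R4, R2, 8":  {"LD R2, b"},
        "MUL R5, R3, R4": {"ADD R3, R1, 4", "ADD R4, R2, 8"},
    }

    # Any topological sort of the graph is a valid schedule.
    print(list(TopologicalSorter(deps).static_order()))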
Difficulty of scheduling

Optimal instruction scheduling is NP-complete. As always, this implies that we will use techniques based on heuristics to find a good, but sometimes not optimal, solution to the problem. List scheduling is such a technique for scheduling the instructions of a single basic block. Its basic idea is to simulate the execution of the instructions, and to try to schedule an instruction only when all of its operands can be used without stalling the pipeline.
List Scheduling Algorithm

The list scheduling algorithm maintains two lists:
– Ready, the list of instructions that could be scheduled without a stall, ordered by priority;
– Active, the list of instructions that are currently being executed.
At each step, the highest-priority instruction from Ready is scheduled and moved to Active, where it stays for a time equal to its delay. Before scheduling is performed, renaming is done to remove all the antidependences that can be removed. A sketch of the algorithm follows this list.
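Below is a minimal single-issue list scheduler (a sketch: the instruction names, latencies, and single-issue assumption are illustrative, not from the slides). The priority used is the latency-weighted length of the longest path to the end of the block, one of the schemes discussed next.

    # preds[n] = instructions n depends on; latency[n] = cycles n executes.
    preds = {
        "a": set(), "b": set(),
        "c": {"a"}, "d": {"a", "b"},
        "e": {"c", "d"},
    }
    latency = {"a": 3, "b": 3, "c": 1, "d": 1, "e": 1}
    succs = {n: {m for m in preds if n in preds[m]} for n in preds}

    def priority(n):
        # Longest latency-weighted path from n to the end of the block.
        return latency[n] + max((priority(s) for s in succs[n]), default=0)

    ready = {n for n in preds if not preds[n]}  # operands available now
    active = {}                                 # instruction -> finish cycle
    done, cycle, schedule = set(), 0, []

    while ready or active:
        # Retire instructions whose results become available this cycle.
        for n, finish in list(active.items()):
            if finish <= cycle:
                del active[n]
                done.add(n)
                # A successor is ready once all its predecessors are done.
                ready |= {s for s in succs[n] if preds[s] <= done}
        if ready:  # issue at most one instruction per cycle
            n = max(ready, key=priority)
            ready.remove(n)
            active[n] = cycle + latency[n]
            schedule.append((cycle, n))
        cycle += 1

    print(schedule)  # e.g. [(0, 'a'), (1, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]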
Prioritizing instructions

Nodes (i.e. instructions) are sorted by priority in the Ready list. Several schemes exist to compute the priority of a node, which can be equal to:
– the length of the longest latency-weighted path from it to a root of the dependence graph,
– the number of its immediate successors,
– the number of its descendants,
– its latency,
– etc.
Unfortunately, no single scheme is better in all cases.

Scheduling Conflicts

It is hard to decide whether scheduling should be done before or after register allocation: if register allocation is done first, it can introduce antidependences when registers are reused; if scheduling is done first, register allocation can introduce spill code that destroys the schedule. The solution is to schedule first, then allocate registers, and schedule once more if spilling was necessary; the sketch below outlines this phase ordering.
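A rough sketch of that phase ordering (the pass functions here are stubs standing in for real compiler passes, not an actual API):

    # Hypothetical driver showing the schedule / allocate / reschedule order.

    def schedule(block):
        """Stand-in for list scheduling (see the sketch above)."""
        return block

    def allocate_registers(block):
        """Stand-in for register allocation; returns (block, spilled?)."""
        return block, False  # pretend no spill code was needed

    def compile_block(block):
        block = schedule(block)                     # schedule virtual registers
        block, spilled = allocate_registers(block)  # may insert spill code
        if spilled:
            block = schedule(block)                 # repair schedule around spills
        return block

    print(compile_block(["LD R1, a", "ADD R2, R1, 4"]))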