Instruction level parallelism

Asma Hameed
Syeda Phool Zehra
Zarnigar Altaf
 Computer designers and computer architects have been striving to improve uniprocessor
computer performance since the first computer was designed, largely by exploiting
advances in implementation technology.
 Architectural innovations have also played a part, and one of the most significant of these
over the last decade has been the rediscovery of RISC architectures.
 RISC architectures have gained acceptance in both scientific and marketing circles.
 Computer architects have also been thinking of new ways to improve uniprocessor
performance by exploiting instruction-level parallelism. These proposals include:
VLIW
superscalar
and some older ideas such as vector processing.
 Computer architects take advantage of parallelism by issuing more than one instruction per
cycle, either explicitly (as in VLIW or superscalar machines) or implicitly (as in vector machines).
 The amount of instruction-level parallelism varies widely depending on the type of code
being executed, so when considering uniprocessor performance improvements due to the
exploitation of instruction-level parallelism, it is important to keep in mind the type of
application environment.
 If the dominant applications have little instruction-level parallelism, the performance
improvements will be much smaller.
 Parallel computing is a form of computation in which many calculations are carried out
simultaneously, operating on the principle that large problems can often be divided into
smaller ones, which are then solved concurrently ("in parallel"). Parallel computations use
multi-processor computers and/or several independent computers interconnected in some
way, working together on a common task.
 Parallelism is the simultaneous use of multiple compute resources to solve a
computational problem:
•The problem is run using multiple CPUs.
•It is broken into discrete parts that can be solved concurrently.
•Each part is further broken down into a series of instructions.
•Instructions from each part execute simultaneously on different CPUs.
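The steps above can be sketched with Python's standard multiprocessing module; the work function, the data, and the chunk size here are purely illustrative:

```python
from multiprocessing import Pool

def solve_part(part):
    # Each part is further broken down into a series of instructions;
    # here the "work" is simply summing the chunk (illustrative only).
    return sum(part)

if __name__ == "__main__":
    data = list(range(100))
    # Break the problem into discrete parts that can be solved concurrently.
    chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
    with Pool(processes=4) as pool:
        # The parts execute simultaneously on different CPUs.
        partial_sums = pool.map(solve_part, chunks)
    total = sum(partial_sums)  # combine the concurrent results
    print(total)  # 4950
```

The `if __name__ == "__main__"` guard is required so that worker processes can safely re-import the script on platforms that spawn rather than fork.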
With the era of ever-increasing processor speeds slowly coming to an end, computer
architects are exploring new ways of increasing throughput. One of the most
promising is to look for and exploit different types of parallelism in code.

 Instruction Level Parallelism:
Instruction-level parallelism (ILP) takes advantage of sequences of instructions
that require different functional units (such as the load unit, ALU, FP
multiplier, etc.). The idea is to have these non-dependent instructions
execute simultaneously so that the functional units are kept busy as often as
possible; equivalently, ILP describes how many of the operations in a computer
program can be performed simultaneously.
 Data Level Parallelism:
DLP is the act of performing the same operation on multiple data elements simultaneously.
Example of DLP:
Consider an operation on an image in which processing each pixel is
independent of the ones around it (such as brightening). This type of image
processing lends itself well to having multiple pixels modified simultaneously using
the same modification function. Other types of operations that allow the exploitation of
DLP are matrix, array, and vector processing.
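As a minimal sketch of the brightening example (assuming 8-bit grey-level pixels stored in a plain list), the same operation is applied to every pixel independently, so a data-parallel machine could compute all of the results at once:

```python
def brighten(pixels, amount):
    """Apply the same operation to every pixel independently (DLP).

    Each result depends only on its own input pixel, so the pixels
    could be processed in any order, or all simultaneously.
    """
    return [min(p + amount, 255) for p in pixels]  # clamp to 8-bit range

image = [10, 120, 250, 64]      # illustrative 8-bit grey levels
print(brighten(image, 20))       # [30, 140, 255, 84]
```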
 Thread Level Parallelism:
Thread-level parallelism is the act of running multiple flows of execution of a single
process simultaneously.
 Applications using TLP:
TLP is most often found in applications that need to run independent, unrelated tasks,
such as computation, memory accesses, and I/O, simultaneously.
These types of applications are often found on machines that have a high workload,
such as web servers.
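A minimal sketch of TLP using Python's standard threading module (the task bodies are illustrative): two unrelated tasks of the same process run as separate threads of execution.

```python
import threading
import queue

results = queue.Queue()  # thread-safe channel for the results

def compute_task():
    # A compute-style flow of execution.
    results.put(("compute", sum(range(1000))))

def io_task():
    # An unrelated I/O-style flow of execution in the same process.
    results.put(("io", "request handled"))

threads = [threading.Thread(target=compute_task),
           threading.Thread(target=io_task)]
for t in threads:
    t.start()            # both flows of execution run simultaneously
for t in threads:
    t.join()             # wait for every thread to finish

collected = []
while not results.empty():
    collected.append(results.get())
print(collected)
```

The completion order of the two threads is not deterministic, which is why the results go through a thread-safe queue rather than a shared variable.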

 TASK PARALLELISM
Entirely different calculations can be performed on either the same or different sets of
data.
 Abbreviated as ILP, instruction-level parallelism is a measure of the number of operations
that can be performed simultaneously in a computer program. Microprocessors exploit ILP by
executing multiple instructions from a single program in a single cycle.

Consider the following program:


1) e = a + b
2) f = c + d
3) g = e * f
Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of
them are completed. However, operations 1 and 2 do not depend on any other operation, so they
can be calculated simultaneously. If we assume that each operation can be completed in one unit
of time, then these three instructions can be completed in a total of two units of time.
 A goal of compiler and processor designers is to identify and take advantage of as much ILP as
possible.
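The two-unit schedule for the program above can be checked mechanically. The following is a small sketch (not any particular processor's scheduler) that assigns each operation the earliest cycle in which all of its inputs are ready, assuming every operation takes one unit of time:

```python
# Each operation maps to the set of operations whose results it needs.
deps = {
    "e = a + b": set(),
    "f = c + d": set(),
    "g = e * f": {"e = a + b", "f = c + d"},
}

cycle = {}
for op in deps:                      # dicts preserve insertion order
    preds = deps[op]
    # An operation issues one cycle after its latest-finishing input;
    # operations with no dependences all issue in cycle 1.
    cycle[op] = 1 + max((cycle[p] for p in preds), default=0)

print(cycle)
# {'e = a + b': 1, 'f = c + d': 1, 'g = e * f': 2}
print(max(cycle.values()), "units of time in total")  # 2
```

Operations 1 and 2 land in the same cycle because they are independent, and operation 3 must wait one cycle for both of its inputs, matching the two-unit total above.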

Ordinary program execution:
Ordinary programs are typically written under a sequential execution model, where the
instructions execute one after the other and in the order specified by the programmer.

ILP program execution:
ILP allows the compiler and the processor to overlap the execution of multiple
instructions, or even to change the order in which instructions are executed.
 Control dependence:
An instruction is control-dependent on a branch if the outcome of the branch
determines whether that instruction is executed at all.
 Resource dependence:
An instruction is resource-dependent on a previously issued instruction if it requires
a hardware resource that is still being used by that instruction, e.g. two divides
competing for a single divide unit:
div r1, r2, r3
div r4, r2, r5
 Computer architecture is a contract (the instruction format and the
interpretation of the bits that constitute an instruction) between the
class of programs that are written for the architecture and the set of
processor implementations of that architecture.
 In ILP architectures, this contract additionally covers the information embedded
in the program pertaining to the available parallelism between instructions and
operations in the program.

Sequential Architectures:
The program is not expected to convey any explicit information regarding
parallelism. (Superscalar processors)
Dependence Architectures:
The program explicitly indicates the dependences that exist between
operations (Dataflow processors)
Independence Architectures:
The program provides information as to which operations are independent of
one another. (VLIW processors)
 Program contains no explicit information regarding dependencies that
exist between instructions
 Dependencies between instructions must be determined by the hardware
 It is only necessary to determine dependencies with sequentially
preceding instructions that have been issued but not yet completed
 Compiler may re-order instructions to facilitate the hardware’s task of
extracting parallelism

 A superscalar CPU architecture implements a form of parallelism called
instruction-level parallelism within a single processor.
 A superscalar processor executes more than one instruction during a
clock cycle by simultaneously dispatching multiple instructions to
redundant functional units on the processor.
 Each functional unit is not a separate CPU core but an execution
resource within a single CPU, such as an arithmetic logic unit, a bit
shifter, or a multiplier.
 A superscalar CPU is typically also pipelined, although pipelining and
superscalar execution are considered different performance-enhancement
techniques.
The superscalar technique is traditionally associated with several
identifying characteristics (within a given CPU core):
 Instructions are issued from a sequential instruction stream
 CPU hardware dynamically checks for data dependencies
between instructions at run time (versus software checking at
compile time)
 The CPU accepts multiple instructions per clock cycle
 The compiler (or programmer) identifies the parallelism in the program and
communicates it to the hardware by specifying the dependences between
operations.
 The hardware determines at run time when each operation is
independent of the others and performs the scheduling.
 No scanning of the sequential program is needed to determine dependences.
 Objective: execute each instruction at the earliest possible time
(as soon as its input operands and a functional unit are available).

 By knowing which operations are independent, the hardware needs no
further checking to determine which instructions can be issued in the
same cycle.
 The set of independent operations is much larger than the set of
dependent operations, so only a subset of the independent operations is specified.
 The compiler may additionally specify on which functional unit and in
which cycle an operation is executed, so the hardware needs to make
no run-time decisions.
 A hazard is created whenever there is a dependence between instructions and they
are close enough that the overlap caused by pipelining would change the order of
access to an operand. Data hazards lower performance. The situation in which the
next instruction depends on the result of the previous one occurs very often, and
such instructions cannot be executed together. There are three situations in which
a data hazard can occur:
 Read after write (RAW):
A RAW data hazard refers to a situation where an instruction refers to a result that has
not yet been calculated or retrieved. RAW is the most common type of data hazard. It
arises when the next instruction tries to read a source before the previous instruction
writes to it, so the next instruction incorrectly gets the old value.
 Write after read (WAR):
A WAR data hazard represents a problem with concurrent execution. It arises when the
next instruction writes to a destination before the previous instruction reads it, so
the previous instruction incorrectly gets the new value.
 Write after write (WAW):
A WAW data hazard is another situation that may occur in a concurrent execution
environment. It arises when two instructions write to the same destination and the
writes complete out of order, leaving the wrong final value behind.
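The three situations can be illustrated with ordinary assignments standing in for register operations (the register names r1..r4 and values are illustrative). In each pair, swapping the two statements would change a final value, which is exactly the ordering a pipeline must preserve:

```python
r1, r2, r3 = 5, 7, 3

# RAW (read after write): the second statement reads r4,
# which the first statement writes; it must see the new value.
r4 = r1 + r2          # writes r4 (12)
raw_result = r4 * 2   # reads r4 -> must wait for the write above

# WAR (write after read): the second statement writes r2,
# which the first statement still needs to read first.
war_result = r2 + r3  # must read the old r2 (7)
r2 = 0                # writes r2 -> must not happen before the read

# WAW (write after write): both statements write r3;
# the later write must be the one whose value survives.
r3 = 10
r3 = 20               # the final value of r3 must be 20

print(raw_result, war_result, r3)  # 24 10 20
```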
 Structural hazards
 A structural hazard occurs when a part of the processor's hardware is needed by two or more
instructions at the same time.
 Control hazards (branch hazards)
 Branching hazards (also known as control hazards) occur with branches. On many
instruction pipeline microarchitectures, the processor will not know the outcome of the
branch when it needs to insert a new instruction into the pipeline.
TECHNIQUE                                   REDUCES
Forwarding and bypassing                    Potential data hazard stalls
Delayed branches & branch scheduling        Control hazard stalls
Basic dynamic scheduling (scoreboarding)    Data hazards from true dependences
Dynamic scheduling with renaming            Data hazards from antidependences and output dependences
Dynamic branch prediction                   Control stalls
Speculation                                 Data and control hazard stalls
Dynamic memory disambiguation               Data hazard stalls with memory

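Of the techniques above, dynamic branch prediction is easy to sketch. The following is a minimal two-bit saturating-counter predictor (a standard textbook scheme, not tied to any particular CPU); it must mispredict twice in a row before it changes its prediction, so a single untaken iteration of a loop branch does not flip it.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 taken."""

    def __init__(self):
        self.counter = 2  # start weakly predicting "taken"

    def predict(self):
        return self.counter >= 2  # True means "branch taken"

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(self.counter + 1, 3)
        else:
            self.counter = max(self.counter - 1, 0)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]  # an illustrative loop branch
hits = 0
for taken in outcomes:
    hits += (p.predict() == taken)
    p.update(taken)
print(hits, "of", len(outcomes), "predicted correctly")  # 4 of 5
```

The single not-taken outcome only weakens the counter from 3 to 2, so the predictor keeps predicting taken and the remaining iterations still hit.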
Micro-architectural techniques that are used to exploit ILP include:
 Instruction pipelining, where the execution of multiple
instructions can be partially overlapped.

 Register renaming, which refers to a technique used to avoid
unnecessary serialization of program operations imposed by the
reuse of registers by those operations; it is used to enable
out-of-order execution.
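As a minimal sketch of the renaming idea (the architectural names r1..r6 and the simple counter allocator are illustrative, not any real design), each write is given a fresh physical register, so WAR and WAW conflicts caused by reusing a register name disappear while true dependences are preserved:

```python
def rename(instructions):
    """Map each architectural destination register to a fresh physical one.

    instructions: list of (dest, [srcs]) pairs in program order.
    """
    mapping = {}      # architectural name -> current physical name
    next_phys = 0
    renamed = []
    for dest, srcs in instructions:
        # Sources read whichever physical register currently holds the value.
        phys_srcs = [mapping.get(s, s) for s in srcs]
        # Every write gets a brand-new physical register, so reusing an
        # architectural name no longer serializes independent operations.
        mapping[dest] = f"p{next_phys}"
        next_phys += 1
        renamed.append((mapping[dest], phys_srcs))
    return renamed

# r1 is reused as a destination; after renaming, the two writes go to
# different physical registers and become independent of each other.
prog = [("r1", ["r2", "r3"]),   # r1 = r2 op r3
        ("r4", ["r1"]),         # r4 = op r1   (true dependence, preserved)
        ("r1", ["r5", "r6"])]   # r1 = r5 op r6 (WAW on r1 removed)
print(rename(prog))
# [('p0', ['r2', 'r3']), ('p1', ['p0']), ('p2', ['r5', 'r6'])]
```

Note that the second instruction still reads p0, so the true RAW dependence survives renaming; only the false dependences on the name r1 are removed.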

 Dataflow architectures, another class of architectures where
ILP is explicitly specified, but which have not been actively
researched since the 1980s.

 Superscalar execution, VLIW, and the closely related Explicitly
Parallel Instruction Computing concepts, in which multiple
execution units are used to execute multiple instructions in
parallel.
 Instruction-level parallelism is mainly used to increase a processor's
performance; however, parallelism can also be used to increase the
energy efficiency of a system. Instruction-level parallelism makes it
possible to execute more than one instruction per cycle. Today's
processors use more than one pipeline, which means that they have a
superscalar architecture. Instruction-level parallelism increases
performance, but an ideal sequence of uniform instructions is rare: the
execution of one instruction often depends on the result of the
previous instruction's execution. This situation is a data hazard, and
data hazards reduce the architecture's performance.
Thank You
