Instruction Level Parallelism
INTRODUCTION
Computer designers and computer architects have been striving to improve uniprocessor
computer performance since the first computer was designed. The most significant advances in
uniprocessor performance have come from exploiting advances in implementation technology.
Architectural innovations have also played a part, and one of the most significant of these over
the last decade has been the rediscovery of RISC architectures. Now that RISC architectures
have gained acceptance both in scientific and marketing circles, computer architects have been
thinking of new ways to improve uniprocessor performance. Many of these proposals, such as
VLIW and superscalar, and even relatively old ideas such as vector processing, try to improve
computer performance by exploiting instruction-level parallelism. They take advantage of this
parallelism by issuing more than one instruction per cycle explicitly (as in VLIW or superscalar
machines) or implicitly (as in vector machines). In this paper we will limit ourselves to
improving uniprocessor performance, and will not discuss methods of improving application
performance by using multiple processors in parallel. The amount of instruction-level parallelism
varies widely depending on the type of code being executed. When we consider uniprocessor
performance improvements due to exploitation of instruction-level parallelism, it is important to
keep in mind the type of application environment. If the applications are dominated by highly
parallel code (e.g., weather forecasting), any of a number of different parallel computers (e.g.,
vector, MIMD) would improve application performance. However, if the dominant applications
have little instruction-level parallelism (e.g., compilers, editors, event-driven simulators, lisp
interpreters), the performance improvements will be much smaller.
ILP HISTORY
PARALLELISM
With the era of increasing processor speeds slowly coming to an end, computer architects are
exploring new ways of increasing throughput. One of the most promising is to look for and
exploit different types of parallelism in code.
TYPES OF PARALLELISM
Instruction level parallelism (ILP) takes advantage of sequences of instructions that require
different functional units (such as the load unit, ALU, FP multiplier, etc). Different architectures
approach this in different ways, but the idea is to have these non-dependent instructions
executing simultaneously to keep the functional units busy as often as possible.
Data level parallelism (DLP) is more of a special case than instruction level parallelism. DLP is
the act of performing the same operation on multiple data elements simultaneously. A classic example
of DLP is performing an operation on an image in which processing each pixel is independent
from the ones around it (such as brightening). This type of image processing lends itself well to
having multiple pixels modified simultaneously using the same modification function. Other
types of operations that allow the exploitation of DLP are matrix, array, and vector processing.
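The brightening example can be sketched in plain Python (the pixel values and brighten amount are illustrative; a real data-parallel machine would apply the operation to many pixels per instruction via SIMD lanes, whereas this sequential loop only illustrates the operation itself):

```python
# Brighten a grayscale image. Every pixel update is independent of its
# neighbours, so a data-parallel machine could process many pixels at once.
def brighten(pixels, amount=40):
    # Clamp to 255 so brightened pixels stay in the valid 8-bit range.
    return [min(p + amount, 255) for p in pixels]

image = [10, 120, 250, 60]           # illustrative pixel values
print(brighten(image))               # [50, 160, 255, 100]
```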
Thread level parallelism (TLP) is the act of running multiple flows of execution of a single
process simultaneously. TLP is most often found in applications that need to run independent,
unrelated tasks (such as computation, memory access, and I/O) simultaneously. These types of
applications are often found on machines that have a high workload, such as web servers. TLP is
a popular ground for current research due to the rising popularity of multi-core and multi-
processor systems, which allow for different threads to truly execute in parallel.
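A minimal Python sketch of the idea, assuming the two tasks really are independent (note that CPython's global interpreter lock limits true parallelism for pure computation, but the simulated I/O wait below still overlaps with the computation):

```python
import threading
import time

results = {}

def compute():
    # A CPU-bound task.
    results["sum"] = sum(range(100_000))

def fake_io():
    # Stands in for a network or disk wait.
    time.sleep(0.05)
    results["io"] = "done"

# Run the two unrelated tasks as separate threads of one process (TLP).
threads = [threading.Thread(target=compute), threading.Thread(target=fake_io)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results["io"])   # done
```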
INSTRUCTION LEVEL PARALLELISM
DEFINITION
Instruction level parallelism (ILP) is a measure of how many of the operations in a program can
be performed simultaneously.
EXPLANATION
Consider the following sequence of operations:
1. e = a + b
2. f = c + d
3. g = e * f
Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of
them are completed. However, operations 1 and 2 do not depend on any other operation, so they
can be calculated simultaneously. If we assume that each operation can be completed in one unit
of time then these three instructions can be completed in a total of two units of time, giving an
ILP of 3/2.
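This schedule can be computed mechanically: each operation runs in the earliest cycle in which all of its inputs are ready. A small sketch using the three operations above:

```python
# Map each operation to the operations whose results it needs.
deps = {
    "e = a + b": [],
    "f = c + d": [],
    "g = e * f": ["e = a + b", "f = c + d"],
}

def schedule(deps):
    """Assign each operation the earliest cycle after all of its inputs
    finish, assuming every operation takes one unit of time."""
    cycle = {}
    for op, needs in deps.items():        # listed in program order
        cycle[op] = 1 + max((cycle[d] for d in needs), default=0)
    return cycle

cycles = schedule(deps)
total = max(cycles.values())              # 2 units of time overall
print(len(deps) / total)                  # ILP = 3 / 2 = 1.5
```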
A goal of compiler and processor designers is to identify and take advantage of as much ILP as
possible. Ordinary programs are typically written under a sequential execution model where
instructions execute one after the other and in the order specified by the programmer. ILP allows
the compiler and the processor to overlap the execution of multiple instructions or even to
change the order in which instructions are executed.
How much ILP exists in programs is very application specific. In certain fields, such as graphics
and scientific computing, the amount can be very large. However, workloads such as
cryptography exhibit much less parallelism.
DATA DEPENDENCY
A data dependency in computer science is a situation in which a program statement
(instruction) refers to the data of a preceding statement.
Dependences are a property of programs. If two instructions are data dependent, they cannot
execute simultaneously: a dependence results in a hazard, and the hazard causes a stall. Data
dependences may occur through registers or memory.
TYPES OF DEPENDENCY
1. Name dependence
Two instructions use the same register/memory (name), but there is no flow
of data.
1) Anti-dependence
2) Output dependence
Output dependence
An output dependency occurs when the ordering of instructions will affect the final output value
of a variable. In the example below, there is an output dependency between instructions 3 and 1;
changing the ordering of instructions in this example will change the final value of B, thus these
instructions cannot be executed in parallel.
1. A = 2 * X
2. B = A / 3
3. A = 9 * Y
Anti-dependence
An anti-dependency occurs when an instruction requires a value that is later updated. In the
following example, instruction 3 anti-depends on instruction 2: the ordering of these instructions
cannot be changed, nor can they be executed in parallel, as this could affect the final value of A.
1. B = 3
2. A = B + 1
3. B = 7
2. Data dependence
A true dependency, also known as a data dependency, occurs when an instruction depends on the
result of a previous instruction:
1. A = 3
2. B = A
3. C = B
Instruction 3 is truly dependent on instruction 2, as the final value of C depends on the
instruction updating B. Instruction 2 is truly dependent on instruction 1, as the final value of B
depends on the instruction updating A. Since instruction 3 is truly dependent upon instruction 2
and instruction 2 is truly dependent on instruction 1, instruction 3 is also truly dependent on
instruction 1. Instruction level parallelism is therefore not an option in this example.
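All three register dependence types can be detected mechanically by comparing which registers two instructions read and write. A sketch, where the `(destination, sources)` instruction encoding is a simplification assumed for illustration:

```python
def classify(first, second):
    """Return the dependences of `second` on `first`.
    Each instruction is a (destination, [sources]) pair."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = []
    if dest1 in srcs2:
        kinds.append("true (RAW)")     # second reads what first writes
    if dest2 in srcs1:
        kinds.append("anti (WAR)")     # second writes what first reads
    if dest1 == dest2:
        kinds.append("output (WAW)")   # both write the same register
    return kinds

print(classify(("A", ["X"]), ("B", ["A"])))   # ['true (RAW)']
print(classify(("B", ["A"]), ("A", ["Y"])))   # ['anti (WAR)']
print(classify(("A", ["X"]), ("A", ["Y"])))   # ['output (WAW)']
```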
3. Control dependence
Any instruction B is control dependent on a preceding instruction A if the latter determines
whether B should execute or not. In the following example, instruction 2 is control dependent on
instruction 1.
1. If a == b goto AFTER
2. A = 2 * X
3. AFTER:
4. Resource dependence
An instruction is resource-dependent on a previously issued instruction if it requires a hardware
resource that is still being used by the earlier instruction, e.g., two division instructions
contending for a single, non-pipelined divide unit.
TYPES OF HAZARDS
1. Data hazards
A hazard is created whenever there is dependence between instructions, and they are close
enough that the overlap caused by pipelining would change the order of access to an operand. In
other words, data hazards occur when instructions that exhibit data dependence modify data in
different stages of a pipeline. Data hazards lower performance: the situation in which an
instruction depends on the result of the previous one occurs very often, and such instructions
cannot be executed together. There are three situations in which a data hazard can occur:
Read After Write (RAW)
A RAW data hazard refers to a situation where an instruction refers to a result that has not yet
been calculated or retrieved. It is the most common type of data hazard, and arises when an
instruction tries to read a source operand before the previous instruction writes to it, so the
reading instruction incorrectly gets the old value.
For example:
1. R2 <- R1 + R3
2. R4 <- R2 + R3
The first instruction is calculating a value to be saved in register 2, and the second is going to use
this value to compute a result for register 4. However, in a pipeline, when we fetch the operands
for the 2nd operation, the results from the first will not yet have been saved, and hence we have a
data dependency.
We say that there is a data dependency with instruction 2, as it is dependent on the completion of
instruction 1.
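We can mimic this timing problem directly (the initial register values are illustrative): if instruction 2 fetches R2 before instruction 1 writes it back, it computes with the stale value.

```python
# 1. R2 <- R1 + R3 ; 2. R4 <- R2 + R3
regs = {"R1": 5, "R2": 0, "R3": 7}        # illustrative starting values

stale_r2 = regs["R2"]                     # instruction 2 fetches R2 too early
regs["R2"] = regs["R1"] + regs["R3"]      # instruction 1 finally writes 12

wrong = stale_r2 + regs["R3"]             # 0 + 7 = 7   (hazard result)
right = regs["R2"] + regs["R3"]           # 12 + 7 = 19 (intended result)
print(wrong, right)                       # 7 19
```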
Write After Read (WAR)
A WAR hazard arises when the next instruction writes to a destination before the previous
instruction reads it. In this case, the previous instruction incorrectly gets the new value.
For example:
1. R4 <- R1 + R3
2. R3 <- R1 + R2
If there is a chance that instruction 2 may complete before instruction 1 (i.e., with concurrent
execution), we must ensure that we do not store the result into register 3 before instruction 1 has
had a chance to fetch its operands.
Write After Write (WAW)
A WAW data hazard is another situation that may occur in a concurrent execution environment:
the next instruction writes to a destination before a previous instruction has written to it, which
can leave the wrong final value in the register.
For example:
1. R2 <- R1 + R2
2. R2 <- R4 + R7
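WAR and WAW hazards are name conflicts rather than real data flow, so hardware (or a compiler) can remove them by register renaming: every write gets a fresh destination. A sketch, using a simplified `(destination, sources)` instruction encoding assumed for illustration:

```python
import itertools

def rename(instructions):
    """Map each architectural destination to a fresh physical register so
    that only true (RAW) dependences remain; WAR and WAW conflicts vanish.
    Each instruction is a (destination, [sources]) pair."""
    fresh = (f"P{i}" for i in itertools.count())
    current = {}                                  # architectural -> physical
    renamed = []
    for dest, srcs in instructions:
        srcs = [current.get(s, s) for s in srcs]  # read the current names
        current[dest] = next(fresh)               # every write gets a new name
        renamed.append((current[dest], srcs))
    return renamed

# The two instructions above both write R2; after renaming they write
# different physical registers and may complete in either order.
prog = [("R2", ["R1", "R2"]), ("R2", ["R4", "R7"])]
print(rename(prog))  # [('P0', ['R1', 'R2']), ('P1', ['R4', 'R7'])]
```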
2. Structural hazards
A structural hazard occurs when a part of the processor's hardware is needed by two or more
instructions at the same time. A canonical example is a single memory unit that is accessed both
in the fetch stage where an instruction is retrieved from memory, and the memory stage where
data is written and/or read from memory.
3. Branch hazards
Branching hazards (also known as control hazards) occur with branches. On many instruction
pipeline microarchitectures, the processor will not know the outcome of the branch when it needs
to insert a new instruction into the pipeline.
There are several mechanisms for supporting ILP in hardware: first, using multiple, parallel
functional units, and second, pipelining the functional units. Dynamic scheduling, such as
scoreboarding or Tomasulo's algorithm, can also be used to reduce the effect of dependences in
a program. Superscalar and VLIW are the basic hardware techniques for exploiting ILP.
Superscalar and VLIW processors can potentially provide large performance improvements over
their scalar predecessors by providing multiple data paths and functional units. These parallel
resources are exploited by concurrently executing independent instructions from the instruction
stream. However, conditional branch instructions pose difficult problems for all types of
processors that exploit ILP. Recent studies have shown that, using conventional code
optimization and scheduling methods, superscalar and VLIW processors cannot produce a
sustained speedup of more than about two for nonnumeric programs. For such programs,
conventional architectural and compilation methods do not provide enough support to utilize
these processors.
ILP Architectures
The information embedded in a program about the available parallelism between its instructions
and operations is referred to as the ILP architecture. An ILP architecture is a contract (the
instruction format and the interpretation of the bits that constitute an instruction) between the
class of programs that are written for the architecture and the set of processor implementations
of that architecture.
Dependence architectures: the program explicitly indicates the dependences that exist between
operations (e.g., dataflow processors).
Superscalar Processors
A superscalar CPU architecture implements a form of parallelism called instruction level
parallelism within a single processor. It therefore allows faster CPU throughput than would
otherwise be possible at a given clock rate.
Objective: execute each instruction at the earliest possible time, as soon as its input operands
and a functional unit are available.
By knowing which operations are independent, the hardware needs no further checking to
determine which instructions can be issued in the same cycle.
Only a subset of the independent operations may be specified. The compiler may additionally
specify on which functional unit and in which cycle an operation is executed, so the hardware
needs to make no run-time decisions.
VLIW Processor
Very long instruction word or VLIW refers to a CPU architecture designed to take advantage
of instruction level parallelism (ILP). A processor that executes every instruction one after the
other (i.e. a non-pipelined scalar architecture) may use processor resources inefficiently,
potentially leading to poor performance. The performance can be improved by executing
different sub-steps of sequential instructions simultaneously (this is pipelining), or even
executing multiple instructions entirely simultaneously as in superscalar architectures. Further
improvement can be achieved by executing instructions in an order different from the order in
which they appear in the program; this is called out-of-order execution. A VLIW processor takes
a different approach: the compiler groups several independent operations into a single very long
instruction word, so the hardware can issue them together without run-time dependence checking.
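A VLIW compiler's job can be caricatured as packing independent operations into fixed-width issue bundles. A greedy sketch, where the 2-wide issue width and `(destination, sources)` encoding are assumptions for illustration (anti-dependences are not checked, to keep it short):

```python
def pack_bundles(instructions, width=2):
    """Greedy in-order packing: each cycle's bundle takes up to `width`
    operations whose registers do not conflict with an earlier operation
    that has not yet issued. (Anti-dependences are ignored for brevity.)"""
    remaining = list(instructions)
    bundles = []
    while remaining:
        bundle, pending_writes = [], set()
        for instr in list(remaining):
            dest, srcs = instr
            independent = not ((set(srcs) | {dest}) & pending_writes)
            if len(bundle) < width and independent:
                bundle.append(instr)
                remaining.remove(instr)
            pending_writes.add(dest)   # later ops must wait for this write
        bundles.append(bundle)
    return bundles

# e = a + b ; f = c + d ; g = e * f  -> g must wait for the second bundle
prog = [("e", ["a", "b"]), ("f", ["c", "d"]), ("g", ["e", "f"])]
print(pack_bundles(prog))
```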
Limits to ILP
One limit is hardware sophistication: the issue, dependence-checking, and speculation logic
needed to exploit more ILP grows increasingly complex.
Compiler sophistication
Overcoming Limits
CONCLUSION