Instruction level parallelism

Asma Hameed
Syeda Phool Zehra
Zarnigar Altaf
 Computer designers and computer architects have been striving to improve uniprocessor
computer performance since the first computer was designed, largely by exploiting
advances in implementation technology.
 Architectural innovations have also played a part, and one of the most significant of these
over the last decade has been the rediscovery of RISC architectures.
 RISC architectures have gained acceptance in both scientific and marketing circles.
 Computer architects have also been thinking of new ways to improve uniprocessor
performance by exploiting instruction-level parallelism. These proposals include:
VLIW
superscalar
and some older ideas such as vector processing.
 Computer architects take advantage of parallelism by issuing more than one instruction per
cycle, either explicitly (as in VLIW or superscalar machines) or implicitly (as in vector machines).
 The amount of instruction-level parallelism varies widely depending on the type of code
being executed, so when considering uniprocessor performance improvements due to the
exploitation of instruction-level parallelism, it is important to keep in mind the type of
application environment.
 If the dominant applications have little instruction-level parallelism, the performance
improvements will be much smaller.
 Parallel computing is a form of computation in which many calculations are carried out
simultaneously, operating on the principle that large problems can often be divided into
smaller ones, which are then solved concurrently ("in parallel"). Parallel computations use
multi-processor computers and/or several independent computers interconnected in some
way, working together on a common task.
 Parallelism is the simultaneous use of multiple compute resources to solve a
computational problem:
•The problem is run using multiple CPUs.
•It is broken into discrete parts that can be solved concurrently.
•Each part is further broken down into a series of instructions.
•Instructions from each part execute simultaneously on different CPUs.
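The steps above can be sketched with Python's standard multiprocessing module; the work function, the data, and the chunk size here are purely illustrative:

```python
from multiprocessing import Pool

def solve_part(part):
    # Each part is further broken down into a series of instructions;
    # here the "work" is simply summing the chunk (illustrative only).
    return sum(part)

if __name__ == "__main__":
    data = list(range(100))
    # Break the problem into discrete parts that can be solved concurrently.
    chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
    with Pool(processes=4) as pool:
        # The parts execute simultaneously on different CPUs.
        partial_sums = pool.map(solve_part, chunks)
    total = sum(partial_sums)  # combine the concurrent results
    print(total)  # 4950
```

The `if __name__ == "__main__"` guard is required so that worker processes can safely re-import the script on platforms that spawn rather than fork.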
With the era of ever-increasing processor speeds slowly coming to an end, computer
architects are exploring new ways of increasing throughput. One of the most
promising is to look for and exploit different types of parallelism in code.

 Instruction Level Parallelism:
Instruction-level parallelism (ILP) takes advantage of sequences of instructions
that require different functional units (such as the load unit, ALU, FP
multiplier, etc.). The idea is to have these non-dependent instructions
execute simultaneously so that the functional units are kept busy as often as
possible; equivalently, ILP describes how many of the operations in a computer
program can be performed simultaneously.
 Data Level Parallelism:
DLP is the act of performing the same operation on multiple data elements simultaneously.
Example of DLP:
Consider an operation on an image in which processing each pixel is
independent of the ones around it (such as brightening). This type of image
processing lends itself well to having multiple pixels modified simultaneously using
the same modification function. Other types of operations that allow the exploitation of
DLP are matrix, array, and vector processing.
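As a minimal sketch of the brightening example (assuming 8-bit grey-level pixels stored in a plain list), the same operation is applied to every pixel independently, so a data-parallel machine could compute all of the results at once:

```python
def brighten(pixels, amount):
    """Apply the same operation to every pixel independently (DLP).

    Each result depends only on its own input pixel, so the pixels
    could be processed in any order, or all simultaneously.
    """
    return [min(p + amount, 255) for p in pixels]  # clamp to 8-bit range

image = [10, 120, 250, 64]      # illustrative 8-bit grey levels
print(brighten(image, 20))       # [30, 140, 255, 84]
```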
 Thread Level Parallelism:
Thread-level parallelism is the act of running multiple flows of execution of a single
process simultaneously.
 Applications using TLP:
TLP is most often found in applications that need to run independent, unrelated tasks,
such as computation, memory accesses, and I/O, simultaneously.
These types of applications are often found on machines that have a high workload,
such as web servers.
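A minimal sketch of TLP using Python's standard threading module (the task bodies are illustrative): two unrelated tasks of the same process run as separate threads of execution.

```python
import threading
import queue

results = queue.Queue()  # thread-safe channel for the results

def compute_task():
    # A compute-style flow of execution.
    results.put(("compute", sum(range(1000))))

def io_task():
    # An unrelated I/O-style flow of execution in the same process.
    results.put(("io", "request handled"))

threads = [threading.Thread(target=compute_task),
           threading.Thread(target=io_task)]
for t in threads:
    t.start()            # both flows of execution run simultaneously
for t in threads:
    t.join()             # wait for every thread to finish

collected = []
while not results.empty():
    collected.append(results.get())
print(collected)
```

The completion order of the two threads is not deterministic, which is why the results go through a thread-safe queue rather than a shared variable.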

 TASK PARALLELISM
Entirely different calculations can be performed on either the same or different sets of
data.
 Abbreviated as ILP, instruction-level parallelism is a measure of the number of operations
that can be performed simultaneously in a computer program. Microprocessors exploit ILP by
executing multiple instructions from a single program in a single cycle.

Consider the following program:


1) e = a + b
2) f = c + d
3) g = e * f
Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of
them are completed. However, operations 1 and 2 do not depend on any other operation, so they
can be calculated simultaneously. If we assume that each operation can be completed in one unit
of time, then these three instructions can be completed in a total of two units of time.
 A goal of compiler and processor designers is to identify and take advantage of as much ILP as
possible.
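The two-unit schedule for the program above can be checked mechanically. The following is a small sketch (not any particular processor's scheduler) that assigns each operation the earliest cycle in which all of its inputs are ready, assuming every operation takes one unit of time:

```python
# Each operation maps to the set of operations whose results it needs.
deps = {
    "e = a + b": set(),
    "f = c + d": set(),
    "g = e * f": {"e = a + b", "f = c + d"},
}

cycle = {}
for op in deps:                      # dicts preserve insertion order
    preds = deps[op]
    # An operation issues one cycle after its latest-finishing input;
    # operations with no dependences all issue in cycle 1.
    cycle[op] = 1 + max((cycle[p] for p in preds), default=0)

print(cycle)
# {'e = a + b': 1, 'f = c + d': 1, 'g = e * f': 2}
print(max(cycle.values()), "units of time in total")  # 2
```

Operations 1 and 2 land in the same cycle because they are independent, and operation 3 must wait one cycle for both of its inputs, matching the two-unit total above.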

Ordinary program execution:
Ordinary programs are typically written under a sequential execution model, where the
instructions execute one after the other and in the order specified by the programmer.

ILP program execution:
ILP allows the compiler and the processor to overlap the execution of multiple
instructions, or even to change the order in which instructions are executed.
 Control dependence:
An instruction is control-dependent on a branch if the outcome of the branch
determines whether that instruction is executed at all.
 Resource dependence:
An instruction is resource-dependent on a previously issued instruction if it requires
a hardware resource that is still being used by that instruction, e.g. two divides
competing for a single divide unit:
div r1, r2, r3
div r4, r2, r5
 Computer architecture is a contract (the instruction format and the
interpretation of the bits that constitute an instruction) between the
class of programs that are written for the architecture and the set of
processor implementations of that architecture.
 In ILP architectures, this contract additionally covers the information embedded
in the program pertaining to the available parallelism between instructions and
operations in the program.

Sequential Architectures:
The program is not expected to convey any explicit information regarding
parallelism. (Superscalar processors)
Dependence Architectures:
The program explicitly indicates the dependences that exist between
operations (Dataflow processors)
Independence Architectures:
The program provides information as to which operations are independent of
one another. (VLIW processors)
 Program contains no explicit information regarding dependencies that
exist between instructions
 Dependencies between instructions must be determined by the hardware
 It is only necessary to determine dependencies with sequentially
preceding instructions that have been issued but not yet completed
 Compiler may re-order instructions to facilitate the hardware’s task of
extracting parallelism

 A superscalar CPU architecture implements a form of parallelism called
instruction-level parallelism within a single processor.
 A superscalar processor executes more than one instruction during a
clock cycle by simultaneously dispatching multiple instructions to
redundant functional units on the processor.
 Each functional unit is not a separate CPU core but an execution
resource within a single CPU, such as an arithmetic logic unit, a bit
shifter, or a multiplier.
 A superscalar CPU is typically also pipelined, although pipelining and
superscalar execution are considered different performance-enhancement
techniques.
The superscalar technique is traditionally associated with several
identifying characteristics (within a given CPU core):
 Instructions are issued from a sequential instruction stream
 CPU hardware dynamically checks for data dependencies
between instructions at run time (versus software checking at
compile time)
 The CPU accepts multiple instructions per clock cycle
 The compiler (or programmer) identifies the parallelism in the program and
communicates it to the hardware by specifying the dependences between
operations.
 The hardware determines at run time when each operation is
independent of the others and performs the scheduling.
 No scanning of the sequential program is needed to determine dependences.
 Objective: execute each instruction at the earliest possible time
(as soon as its input operands and a functional unit are available).

 By knowing which operations are independent, the hardware needs no
further checking to determine which instructions can be issued in the
same cycle.
 The set of independent operations is much larger than the set of
dependent operations, so only a subset of the independent operations is specified.
 The compiler may additionally specify on which functional unit and in
which cycle an operation is executed, so the hardware needs to make
no run-time decisions.
 A hazard is created whenever there is a dependence between instructions and they
are close enough that the overlap caused by pipelining would change the order of
access to an operand. Data hazards lower performance. The situation in which the
next instruction depends on the result of the previous one occurs very often, and
such instructions cannot be executed together. There are three situations in which
a data hazard can occur:
 Read after write (RAW):
A RAW data hazard refers to a situation where an instruction refers to a result that has
not yet been calculated or retrieved. RAW is the most common type of data hazard. It
arises when the next instruction tries to read a source before the previous instruction
writes to it, so the next instruction incorrectly gets the old value.
 Write after read (WAR):
A WAR data hazard represents a problem with concurrent execution. It arises when the
next instruction writes to a destination before the previous instruction reads it, so
the previous instruction incorrectly gets the new value.
 Write after write (WAW):
A WAW data hazard is another situation that may occur in a concurrent execution
environment. It arises when two instructions write to the same destination and the
writes complete out of order, leaving the wrong final value behind.
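The three situations can be illustrated with ordinary assignments standing in for register operations (the register names r1..r4 and values are illustrative). In each pair, swapping the two statements would change a final value, which is exactly the ordering a pipeline must preserve:

```python
r1, r2, r3 = 5, 7, 3

# RAW (read after write): the second statement reads r4,
# which the first statement writes; it must see the new value.
r4 = r1 + r2          # writes r4 (12)
raw_result = r4 * 2   # reads r4 -> must wait for the write above

# WAR (write after read): the second statement writes r2,
# which the first statement still needs to read first.
war_result = r2 + r3  # must read the old r2 (7)
r2 = 0                # writes r2 -> must not happen before the read

# WAW (write after write): both statements write r3;
# the later write must be the one whose value survives.
r3 = 10
r3 = 20               # the final value of r3 must be 20

print(raw_result, war_result, r3)  # 24 10 20
```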
 Structural hazards
 A structural hazard occurs when a part of the processor's hardware is needed by two or more
instructions at the same time.
 Control hazards (branch hazards)
 Branching hazards (also known as control hazards) occur with branches. On many
instruction pipeline microarchitectures, the processor will not know the outcome of the
branch when it needs to insert a new instruction into the pipeline.
TECHNIQUE                                   REDUCES
Forwarding and bypassing                    Potential data hazard stalls
Delayed branches & branch scheduling        Control hazard stalls
Basic dynamic scheduling (scoreboarding)    Data hazards from true dependences
Dynamic scheduling with renaming            Data hazards from antidependences and output dependences
Dynamic branch prediction                   Control stalls
Speculation                                 Data and control hazard stalls
Dynamic memory disambiguation               Data hazard stalls with memory

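Of the techniques above, dynamic branch prediction is easy to sketch. The following is a minimal two-bit saturating-counter predictor (a standard textbook scheme, not tied to any particular CPU); it must mispredict twice in a row before it changes its prediction, so a single untaken iteration of a loop branch does not flip it.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 taken."""

    def __init__(self):
        self.counter = 2  # start weakly predicting "taken"

    def predict(self):
        return self.counter >= 2  # True means "branch taken"

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(self.counter + 1, 3)
        else:
            self.counter = max(self.counter - 1, 0)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]  # an illustrative loop branch
hits = 0
for taken in outcomes:
    hits += (p.predict() == taken)
    p.update(taken)
print(hits, "of", len(outcomes), "predicted correctly")  # 4 of 5
```

The single not-taken outcome only weakens the counter from 3 to 2, so the predictor keeps predicting taken and the remaining iterations still hit.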
Micro-architectural techniques that are used to exploit ILP include:
 Instruction pipelining, where the execution of multiple
instructions can be partially overlapped.

 Register renaming, which refers to a technique used to avoid
unnecessary serialization of program operations imposed by the
reuse of registers by those operations; it is used to enable
out-of-order execution.
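As a minimal sketch of the renaming idea (the architectural names r1..r6 and the simple counter allocator are illustrative, not any real design), each write is given a fresh physical register, so WAR and WAW conflicts caused by reusing a register name disappear while true dependences are preserved:

```python
def rename(instructions):
    """Map each architectural destination register to a fresh physical one.

    instructions: list of (dest, [srcs]) pairs in program order.
    """
    mapping = {}      # architectural name -> current physical name
    next_phys = 0
    renamed = []
    for dest, srcs in instructions:
        # Sources read whichever physical register currently holds the value.
        phys_srcs = [mapping.get(s, s) for s in srcs]
        # Every write gets a brand-new physical register, so reusing an
        # architectural name no longer serializes independent operations.
        mapping[dest] = f"p{next_phys}"
        next_phys += 1
        renamed.append((mapping[dest], phys_srcs))
    return renamed

# r1 is reused as a destination; after renaming, the two writes go to
# different physical registers and become independent of each other.
prog = [("r1", ["r2", "r3"]),   # r1 = r2 op r3
        ("r4", ["r1"]),         # r4 = op r1   (true dependence, preserved)
        ("r1", ["r5", "r6"])]   # r1 = r5 op r6 (WAW on r1 removed)
print(rename(prog))
# [('p0', ['r2', 'r3']), ('p1', ['p0']), ('p2', ['r5', 'r6'])]
```

Note that the second instruction still reads p0, so the true RAW dependence survives renaming; only the false dependences on the name r1 are removed.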

 Dataflow architectures, another class of architectures where
ILP is explicitly specified, but which have not been actively
researched since the 1980s.

 Superscalar execution, VLIW, and the closely related Explicitly
Parallel Instruction Computing concepts, in which multiple
execution units are used to execute multiple instructions in
parallel.
 Instruction-level parallelism is mainly used to increase a processor's
performance; however, parallelism can also be used to increase the
energy efficiency of a system. Instruction-level parallelism makes it
possible to execute more than one instruction per cycle. Today's
processors use more than one pipeline, which means that they have a
superscalar architecture. Instruction-level parallelism increases
performance, but an ideal sequence of uniform instructions is rare: the
execution of one instruction often depends on the result of the
previous instruction's execution. This situation is a data hazard, and
data hazards reduce the architecture's performance.
Thank You
