Instruction Level Parallelism
Asma Hameed
Syeda Phool Zehra
Zarnigar Altaf
Computer designers and computer architects have been striving to improve uniprocessor
performance since the first computer was designed, largely by exploiting advances in
implementation technology.
Architectural innovations have also played a part, and one of the most significant of these
over the last decade has been the rediscovery of RISC architectures.
RISC architectures have gained acceptance in both scientific and marketing circles.
Computer architects have also been thinking of new ways to improve uniprocessor
performance by exploiting instruction-level parallelism. Among these proposals are:
•VLIW
•superscalar
•some older ideas, such as vector processing
Computer architects take advantage of this parallelism by issuing more than one instruction
per cycle, either explicitly (as in VLIW or superscalar machines) or implicitly (as in vector
machines).
The amount of instruction-level parallelism varies widely depending on the type of code
being executed. When considering uniprocessor performance improvements due to the
exploitation of instruction-level parallelism, it is therefore important to keep the
application environment in mind: if the dominant applications have little instruction-level
parallelism, the performance improvements will be much smaller.
Parallel computing is a form of computation in which many calculations are carried out
simultaneously, operating on the principle that large problems can often be divided into
smaller ones, which are then solved concurrently ("in parallel"). Parallel computations use
multiprocessor computers and/or several independent computers, interconnected in some
way, working together on a common task.
Parallelism is the simultaneous use of multiple compute resources to solve a
computational problem:
•The problem is run using multiple CPUs.
•The problem is broken into discrete parts that can be solved concurrently (a sketch of
this decomposition follows the list).
•Each part is further broken down to a series of instructions.
•Instructions from each part execute simultaneously on different CPUs.
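As an illustration of this decomposition, here is a minimal sketch in C using POSIX
threads. The array contents, chunk size, and thread count are illustrative choices made
for the example, not part of the original text:

#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double data[N];
static double partial[NTHREADS];

/* Each thread solves one discrete part of the problem: summing its chunk. */
static void *sum_chunk(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < N; i++)
        data[i] = 1.0;

    /* The parts execute simultaneously on different CPUs, if available. */
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, sum_chunk, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %f\n", total);
    return 0;
}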
With the era of increasing processor speeds slowly coming to an end, computer
architects are exploring new ways of increasing throughput. One of the most
promising is to look for and exploit different types of parallelism in code.
TASK PARALLELISM
Entirely different calculations can be performed on either the same or different sets of
data.
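A minimal sketch of task parallelism in C, assuming two invented tasks (a minimum and a
sum of squares) that run concurrently over the same data:

#include <pthread.h>
#include <stdio.h>

#define N 1000
static double data[N];
static double min_result, sumsq_result;

/* Task 1: find the minimum element. */
static void *find_min(void *arg) {
    (void)arg;
    double m = data[0];
    for (int i = 1; i < N; i++)
        if (data[i] < m)
            m = data[i];
    min_result = m;
    return NULL;
}

/* Task 2: an entirely different calculation on the same data. */
static void *sum_squares(void *arg) {
    (void)arg;
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += data[i] * data[i];
    sumsq_result = s;
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++)
        data[i] = (double)(i % 7);

    pthread_t t1, t2;
    pthread_create(&t1, NULL, find_min, NULL);
    pthread_create(&t2, NULL, sum_squares, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("min = %f, sum of squares = %f\n", min_result, sumsq_result);
    return 0;
}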
Abbreviated as ILP, instruction-level parallelism is a measure of the number of operations
that can be performed simultaneously in a computer program. Microprocessors exploit ILP by
executing multiple instructions from a single program in a single cycle.
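The effect is easiest to see at the source level. In the hypothetical C fragment below,
the first function's statements have no data dependences on one another, so a superscalar
or VLIW processor can overlap them, while the second forms a serial dependence chain; the
function and variable names are invented for the example:

/* The four statements below are mutually independent, so the two adds
 * and two multiplies can issue in the same cycle on a wide machine. */
double ilp_friendly(double a, double b, double c, double d) {
    double w = a + b;
    double x = c + d;
    double y = a * c;
    double z = b * d;
    return (w + x) + (y + z);
}

/* Here each statement needs the previous result, so the operations form
 * a chain and only one useful operation can complete per step. */
double serial_chain(double a, double b, double c, double d) {
    double t = a + b;
    t = t + c;
    t = t + d;
    return t;
}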
Resource dependence:
An instruction is resource-dependent on a previously issued instruction if it requires
a hardware resource that is still in use by the previously issued instruction.
e.g.
div r1, r2, r3   ; occupies the divide unit for many cycles
div r4, r2, r5   ; operands are ready, but it must wait for the divide unit
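The same effect can be sketched from C source. Whether the two divisions below actually
contend depends on how many divide units the target processor provides, so that is an
assumption of the example:

/* Both divisions are data-independent, yet on a core with a single
 * non-pipelined divide unit the second must wait for the first: a
 * resource dependence, not a data dependence. Names are illustrative. */
double resource_dependent(double r2, double r3, double r5) {
    double r1 = r2 / r3;  /* occupies the divide unit */
    double r4 = r2 / r5;  /* stalls until the divider is free */
    return r1 + r4;
}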
Computer architecture is a contract (the instruction format and the interpretation of
the bits that constitute an instruction) between the class of programs that are written
for the architecture and the set of processor implementations of that architecture.
In ILP architectures, this contract additionally covers information embedded in the
program pertaining to the available parallelism between the instructions and operations
in the program.
Sequential architectures:
The program is not expected to convey any explicit information regarding
parallelism (superscalar processors).
Dependence architectures:
The program explicitly indicates the dependences that exist between
operations (dataflow processors).
Independence architectures:
The program provides information as to which operations are independent of
one another (VLIW processors).
In a sequential architecture, the program contains no explicit information regarding
the dependences that exist between instructions, so these dependences must be
determined by the hardware. It is only necessary to check dependences against
sequentially preceding instructions that have been issued but not yet completed.
The compiler may still re-order instructions to facilitate the hardware's task of
extracting parallelism.
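As a rough software analogue of this hardware check, the sketch below tests two
instructions for the classic RAW, WAR, and WAW dependences. The instruction encoding
(one destination register, two source registers) is invented for the illustration:

#include <stdbool.h>
#include <stdio.h>

/* Invented encoding: each instruction names one destination register
 * and two source registers. */
struct instr {
    int dst;
    int src1, src2;
};

/* True if 'later' cannot issue alongside or ahead of 'earlier':
 *   RAW: later reads a register that earlier writes
 *   WAR: later writes a register that earlier reads
 *   WAW: both write the same register */
static bool depends_on(struct instr earlier, struct instr later) {
    bool raw = later.src1 == earlier.dst || later.src2 == earlier.dst;
    bool war = later.dst == earlier.src1 || later.dst == earlier.src2;
    bool waw = later.dst == earlier.dst;
    return raw || war || waw;
}

int main(void) {
    struct instr i1 = { .dst = 1, .src1 = 2, .src2 = 3 };  /* add r1, r2, r3 */
    struct instr i2 = { .dst = 4, .src1 = 1, .src2 = 5 };  /* add r4, r1, r5 */
    printf("i2 depends on i1: %s\n", depends_on(i1, i2) ? "yes" : "no");
    return 0;
}

In real superscalar hardware this comparison happens in parallel across the issue
window, and WAR and WAW hazards are usually removed by register renaming rather
than by stalling.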