Computer Architecture Unit 3



Unit 3
Parallelism
Parallelism in computer architecture is a method of breaking down tasks
into smaller parts that can be processed simultaneously by multiple
processors. This increases the speed and efficiency of computation.
Types of parallelism:
i) Instruction-level parallelism (ILP)
ii) Task parallelism
iii) Multiple instruction, multiple data (MIMD)

INSTRUCTION LEVEL PARALLELISM


Basic concepts
➢ Instruction Level Parallelism (ILP) is a measure of how many of the
operations in a computer program can be performed simultaneously.
➢ It is used to refer to an architecture in which multiple operations can
be performed in parallel within a particular process, which has its own set of
resources – address space, registers, identifiers, state, and program
counters.
➢ It refers to the compiler design techniques and processors designed to
execute operations, like memory load and store, integer addition, and
float multiplication, in parallel to improve the performance of the
processors.
➢ It is a family of processor and compiler design techniques that speed up
execution by causing individual machine operations, such as memory
loads and stores, integer additions and floating-point multiplications,
to execute in parallel.
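
As a hedged illustration (not part of the original notes; the variable names are invented), the short Python sketch below contrasts operations with no mutual dependencies, which a processor exploiting ILP could overlap, with a dependent chain that must complete serially no matter how much hardware is available.

    # Hypothetical values, used only for this illustration.
    x, y, m, n = 2.5, 4.0, 3, 7

    # Independent operations: neither result feeds the other,
    # so a processor with ILP could execute both in the same cycle.
    a = x * y      # floating-point multiplication
    b = m + n      # integer addition

    # Dependent chain: each operation needs the previous result,
    # so these must execute one after another regardless of ILP.
    c = x * y
    d = c + n      # needs c
    e = d * 2.0    # needs d

    print(a, b, e)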
Classification of ILP Architectures:
• Sequential Architecture: Here, the program is not expected to
explicitly convey any information regarding parallelism to the hardware,
as in superscalar architecture.

• Dependence Architectures: The program explicitly conveys
information regarding dependencies between operations, as in dataflow
architecture.
• Independence Architecture: Here, the program gives information
regarding which operations are independent of one another so that they
can be executed in parallel.
Advantages of Instruction-Level Parallelism:
• Improved Performance: ILP can significantly improve the
performance of processors by allowing multiple instructions to be
executed simultaneously or out-of-order. This can lead to faster
program execution and better system throughput.
• Efficient Resource Utilization: ILP can help to efficiently utilize
processor resources by allowing multiple instructions to be executed at
the same time. This can help to reduce resource wastage and increase
efficiency.
• Reduced Instruction Dependency: ILP can help to reduce the number
of instruction dependencies, which can limit the amount of instruction-
level parallelism that can be exploited. This can help to improve
performance and reduce bottlenecks.
Disadvantages of Instruction-Level Parallelism
• Increased Complexity: Implementing ILP can be complex and
requires additional hardware resources, which can increase the
complexity and cost of processors.
• Data Dependency: Data dependency can limit the amount of
instruction-level parallelism that can be exploited. This can lead to
lower performance and reduced throughput.
• Reduced Energy Efficiency: ILP can reduce the energy efficiency of
processors by requiring additional hardware resources and increasing
instruction overhead. This can increase power consumption and result
in higher energy costs.

Techniques for increasing ILP


Increasing ILP can improve processor performance. Here are some
techniques for increasing ILP:
1. Pipelining:
• Break down the instruction execution process into a series of stages.
• Each stage can process a different instruction simultaneously (a timing sketch follows this list).
2. Superscalar Execution:
• Execute multiple instructions simultaneously using multiple execution
units.
• Requires complex hardware to manage instruction dependencies.
3. Out-of-Order Execution (OoOE):
• Execute instructions out of their original order to minimize
dependencies.
• Requires complex hardware to manage instruction dependencies.
4. Speculative Execution:
• Execute instructions before it is known whether they are actually
needed.
• Can improve performance by reducing dependencies.
5. Register Renaming:
• Rename registers to avoid dependencies between instructions.
• Allows for more instructions to be executed simultaneously.
6. VLIW:
• VLIW stands for Very Long Instruction Word (discussed in detail later in this unit).
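
As a rough, hedged sketch of the pipelining idea from item 1 above (not taken from the notes; the numbers are assumed), the Python snippet below compares ideal execution times with and without a pipeline, using the standard formulas: n*k cycles without pipelining versus k + (n - 1) cycles with a k-stage pipeline, assuming no stalls or hazards.

    def unpipelined_cycles(n_instructions: int, k_stages: int) -> int:
        # Each instruction occupies the whole processor for k cycles.
        return n_instructions * k_stages

    def pipelined_cycles(n_instructions: int, k_stages: int) -> int:
        # The first instruction takes k cycles to fill the pipeline;
        # after that one instruction completes every cycle (no stalls assumed).
        return k_stages + (n_instructions - 1)

    n, k = 100, 5
    print("without pipeline:", unpipelined_cycles(n, k), "cycles")   # 500
    print("with pipeline:   ", pipelined_cycles(n, k), "cycles")     # 104
    print("speedup: %.2fx" % (unpipelined_cycles(n, k) / pipelined_cycles(n, k)))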

SUPERSCALAR ARCHITECTURE
Definition: The main principle of the superscalar approach is that it executes
instructions independently in different instruction pipelines, which leads to
parallel processing and thereby speeds up instruction processing.

❖ Superscalar refers to a machine that is designed to improve the
performance of the execution of scalar instructions.
❖ Superscalar design contrasts with the intent of vector processors, because in
most applications the bulk of the operations are on scalar quantities.
❖ In a superscalar processor, instructions execute independently in
different pipelines, so throughput increases.
❖ It is a CPU that implements a form of parallelism called ILP within a
single processor.
❖ More commonly used in RISC.
❖ In a superscalar processor, there are multiple functional units, each of
which is implemented as a pipeline.
❖ Each pipeline consists of multiple stages to handle multiple
instructions at a time, which supports parallel execution of instructions.
❖ Superscalar processors are much faster than scalar processors because
the CPU can execute multiple instructions per clock cycle, which
increases throughput.
❖ A scalar processor works on one or two data items at a time, while in a
superscalar processor each instruction still processes one data item; but
because there are multiple execution units within the CPU, multiple
instructions can process separate data items concurrently.
❖ A superscalar processor typically fetches multiple instructions at a time
& tries to find nearby instructions that are independent of one another &
can therefore be executed in parallel.
❖ If there is any dependency between the input & output of instructions, then
those instructions cannot be executed in parallel.
❖ Some unnecessary dependencies are eliminated by using additional
registers.

As an example, consider a processor with two execution units: one for
integer and one for floating-point operations. The instruction fetch unit is
capable of reading two instructions at a time and storing them in an
instruction queue. In each cycle, the dispatch unit retrieves and decodes up
to two instructions from the front of the queue. If there is one integer
instruction, one floating-point instruction, and no hazards, both instructions
are dispatched in the same clock cycle.
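
The toy Python simulation below is a hedged sketch (not taken from the notes; the instruction labels are invented) of the dispatch rule just described: each cycle it issues, in program order, at most one integer and one floating-point instruction from the front of the queue, so a well-mixed stream finishes in roughly half the cycles a single-issue machine would need.

    from collections import deque

    def dispatch_cycles(instructions):
        """Count cycles when at most one 'int' and one 'fp' instruction
        can be dispatched per cycle, in program order."""
        queue = deque(instructions)
        cycles = 0
        while queue:
            cycles += 1
            used = set()
            # Look at up to two instructions at the front of the queue;
            # stop if the next one needs a unit already used this cycle.
            while queue and len(used) < 2 and queue[0] not in used:
                used.add(queue.popleft())
        return cycles

    mixed = ["int", "fp", "int", "fp", "int", "fp"]
    ints_only = ["int"] * 6
    print(dispatch_cycles(mixed))      # 3 cycles: one int + one fp each cycle
    print(dispatch_cycles(ints_only))  # 6 cycles: only one integer unit available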
Advantages:
• The compiler can avoid many hazards through judicious selection and
ordering of instructions.

• In general, high performance is achieved if the compiler is able to
arrange program instructions to take maximum advantage of the
available hardware units.
Limitations:
• A superscalar processor depends on the ability to execute multiple
instructions in parallel.
• A combination of compiler-based optimization & hardware techniques
can be used to maximize instruction level parallelism.

SUPERPIPELINED ARCHITECTURE
Definition: An alternative approach to achieving greater performance
(throughput) is referred to as superpipelining.
❖ In a superpipelined processor, many pipeline stages perform tasks that
require less than half a clock cycle, so an internal clock running at double
the speed can perform two tasks in one external clock cycle, doubling the
number of instructions executed per unit time.
❖ A superpipelined architecture is one that makes use of more, smaller
pipeline stages in an attempt to shorten the clock period. With more stages,
more instructions can be in the pipeline at the same time, increasing
parallelism (throughput); that is, instructions are overlapped.

Such a processor issues two instructions per clock cycle & is capable of
executing two instances of each stage in parallel, so that no stage has to be
idle at any time.

❖ The number of instructions being processed at a given time depends
on the number of pipeline stages, commonly termed the pipeline depth.
❖ Some designers use a maximum of eight pipeline stages.
Benefits: Increases the level of parallelism and the number of instructions
executed per unit time.
Limitations: The speed of execution is limited by the slowest pipeline stage.
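
As a hedged back-of-the-envelope sketch (the timing figures are assumed, not from the notes), the Python lines below show why splitting the same work into more, shorter stages raises ideal throughput, and also why the slowest stage is the limit: the clock period is set by the longest stage delay, so halving the stage length roughly doubles the ideal instruction completion rate.

    # Total work per instruction, in nanoseconds (assumed figure).
    total_work_ns = 10.0

    def ideal_throughput(n_stages: int) -> float:
        """Instructions completed per nanosecond, ignoring stalls and latch
        overhead. The clock period equals the longest (here: equal) stage delay."""
        stage_delay = total_work_ns / n_stages
        return 1.0 / stage_delay

    print(ideal_throughput(5))    # 0.5 instructions/ns with 5 stages
    print(ideal_throughput(10))   # 1.0 instructions/ns with 10 shorter stages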

VLIW
❖ Very long instruction word (VLIW) refers to instruction set
architectures that are designed to exploit instruction-level
parallelism (ILP).
❖ The limitations of the superscalar processor become prominent as
instruction scheduling grows more complex. Issues such as exploiting the
intrinsic parallelism in the instruction stream, hardware complexity, cost,
and branch instructions are addressed by a different instruction set
architecture called the Very Long Instruction Word (VLIW), or VLIW
machines, in which the compiler rather than the hardware schedules
independent operations.

Advantages:
• Reduces hardware complexity.
• Reduces power consumption.
• Simplifies decoding and instruction issues.
• Increases potential clock rate.
• Functional units are assigned work corresponding to their position in the
instruction packet by the compiler.
Disadvantages:
• Complex compilers are required.
• Increased program code size.
• Larger memory bandwidth and register-file bandwidth.
• Unscheduled events, for example a cache miss, could lead to a stall that
stalls the entire processor.
• Unfilled operation slots in a VLIW instruction waste memory space and
instruction bandwidth.
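
A small, hedged Python sketch of the VLIW idea (illustrative only; the slot names and operation strings are invented): the compiler packs independent operations into fixed slots of one long instruction word, one slot per functional unit, and fills any slot it cannot use with a no-op, which is exactly why unfilled slots waste space and bandwidth, as noted in the last disadvantage above.

    # One "very long instruction word" as a fixed set of slots, one per functional unit.
    SLOTS = ("int_alu", "fp_mul", "load_store")

    def pack_word(ops):
        """Place independent operations into their slots; empty slots become no-ops."""
        word = {slot: "nop" for slot in SLOTS}
        for slot, op in ops:
            word[slot] = op
        return word

    # The compiler has found three independent operations for this cycle:
    word1 = pack_word([("int_alu", "add r1, r2, r3"),
                       ("fp_mul", "mul f1, f2, f3"),
                       ("load_store", "load r4, 0(r5)")])

    # Only one independent operation was found, so two slots are wasted as no-ops:
    word2 = pack_word([("int_alu", "add r6, r1, r4")])

    print(word1)
    print(word2)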
