Unit-4 Pipeline and Vector Processing
Parallel processing may occur in the instruction stream, the data stream, or both.
Instruction stream: The sequence of instructions read from memory constitutes
an instruction stream.
Data stream: The operations performed on the data in the processor constitute a
data stream.
Flynn's classification divides computers into four major groups as follows:
● Single instruction stream, single data stream – SISD
● Single instruction stream, multiple data stream – SIMD
● Multiple instruction stream, single data stream – MISD
● Multiple instruction stream, multiple data stream – MIMD
SISD –
● Represents the organization of a single computer containing a control
unit, a processor unit, and a memory unit.
● Instructions are executed sequentially.
● Parallel processing may be achieved by means of multiple functional
units or by pipeline processing
SIMD – Includes multiple processing units with a single control unit. All
processors receive the same instruction from the control unit, but operate
on different data.
One type of parallel processing that does not fit Flynn's classification is
pipelining.
Pipelining
● Pipelining is a technique of decomposing a sequential process into
suboperations, with each subprocess being executed in a special
dedicated segment that operates concurrently with all other segments
● Each segment performs partial processing dictated by the way the
task is partitioned
● The result obtained from the computation in each segment is
transferred to the next segment in the pipeline
● The final result is obtained after the data have passed through all
segments
● Each segment can be viewed as an input register followed
by a combinational circuit
● A clock is applied to all registers after enough time has elapsed to
perform all segment activity
● The information flows through the pipeline one step at a time.
● Car assembly analogy: if building a car takes three one-hour stages,
each car still takes three hours to finish with pipelining, but the
line now produces one car each hour rather than one every three hours.
Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7
Each suboperation is to be implemented in a segment within a pipeline.
Each segment has one or two registers and a combinational circuit as
shown in Fig.9-2.
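The example can be traced with a short Python sketch. It assumes a three-segment layout in which segment 1 loads R1 and R2 with Ai and Bi, segment 2 places the product in R3 and Ci in R4, and segment 3 places the sum in R5. The register assignment and the function name `multiply_add_pipeline` are assumptions for illustration, since the figure itself is not reproduced here.

```python
def multiply_add_pipeline(a, b, c):
    """Simulate a three-segment pipeline computing a[i] * b[i] + c[i]."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None  # pipeline registers, empty at the start
    results = []
    # n inputs need n + 2 clock pulses to pass through three segments.
    for clock in range(n + 2):
        # All registers load on the same clock pulse, so the new values are
        # computed from the old ones before anything is overwritten.
        new_r5 = r3 + r4 if r3 is not None else None        # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None        # segment 2: multiply
        new_r4 = c[clock - 1] if 1 <= clock <= n else None  # segment 2: load Ci
        new_r1 = a[clock] if clock < n else None            # segment 1: load Ai
        new_r2 = b[clock] if clock < n else None            # segment 1: load Bi
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            results.append(r5)  # one finished result per clock once the pipe is full
    return results


print(multiply_add_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

The first result appears after the third clock pulse; after that, one result leaves the pipeline on every pulse.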
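Instruction cycle in the CPU processed with a four-segment pipeline:
FI fetches the instruction, DA decodes it and calculates the effective address,
FO fetches the operand, and EX executes the instruction.
(Assume that most instructions store the result in a register, so that
executing the instruction and storing the result can be combined in one segment.)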
● The figure shows the operation of the instruction pipeline. Time on the horizontal axis
is divided into steps of equal duration.
● Up to four suboperations in the instruction cycle can overlap, and up to four different
instructions can be in progress at the same time.
● It is assumed that the processor has separate instruction and data memories
so that the operations in FI and FO can proceed at the same time.
● Assume that instruction 3 is a branch instruction. As soon as this instruction is
decoded in segment DA in step 4, the transfer from FI to DA of the other instructions
is halted until the branch instruction is executed in step 6.
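The timing described above can be reproduced with a small Python sketch. It assumes four segments named FI, DA, FO and EX, one step per segment, and a taken branch, so the instruction after the branch is fetched in the step following the branch's last segment. The functions `schedule` and `print_diagram` are hypothetical helpers, and the held fetch of instruction 4 in step 4 is not drawn.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]


def schedule(num_instructions, branch_at=None):
    """Return, for each instruction, the step in which it occupies each segment.

    Assumption: the branch is taken, so the next instruction is fetched in
    the step after the branch leaves its last segment.
    """
    timing = {}
    start = 1  # step in which the next instruction enters FI
    for instr in range(1, num_instructions + 1):
        timing[instr] = {seg: start + offset for offset, seg in enumerate(SEGMENTS)}
        if instr == branch_at:
            start = timing[instr]["EX"] + 1  # wait until the branch has executed
        else:
            start += 1                       # normal case: one new fetch per step
    return timing


def print_diagram(timing):
    """Print a space-time diagram: one row per instruction, one column per step."""
    last_step = max(steps["EX"] for steps in timing.values())
    print("step " + " ".join(f"{s:>3}" for s in range(1, last_step + 1)))
    for instr, steps in timing.items():
        by_step = {step: seg for seg, step in steps.items()}
        row = " ".join(f"{by_step.get(s, '-'):>3}" for s in range(1, last_step + 1))
        print(f"{instr:>4} " + row)


print_diagram(schedule(7, branch_at=3))
```

With `branch_at=3`, instruction 3 is in DA in step 4 and finishes in step 6, and instruction 4 does not enter FI again until step 7. Calling `schedule(7)` without a branch gives the ideal case of one new instruction per step.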
Instruction Pipeline
There are three major difficulties that cause the instruction pipeline to
deviate from its normal operation:
● Resource conflicts caused by access to memory by two segments
at the same time. Most of these conflicts can be resolved by using
separate instruction and data memories.
● Data dependency conflicts arise when an instruction depends on
the result of a previous instruction, but this result is not yet
available.
● Branch difficulties arise from branch and other instructions that
change the value of PC.
RISC Pipeline
The simplicity of the RISC instruction set can be utilized to implement an
instruction pipeline using a small number of suboperations, with each being
executed in one clock cycle.
The instruction cycle can be divided into three suboperations and
implemented in three segments:
I: Instruction fetch: The I segment fetches the instruction from program
memory.
A: ALU operation: The instruction is decoded and an ALU operation is
performed in the A segment. The ALU is used for three different functions,
depending on the decoded instruction. It performs an operation for a data
manipulation instruction, it evaluates the effective address for a load or store
instruction, or it calculates the branch address for a program control
instruction.
E: Execute instruction: The E segment directs the output of the ALU to one
of three destinations, depending on the decoded instruction. It transfers the
result of the ALU operation into a destination register in the register file, it
transfers the effective address to a data memory for loading or storing, or it
transfers the branch address to the program counter.
Consider now the operation of the following four instructions:
1. LOAD: R1 <-- M[address 1]
2. LOAD: R2 <-- M[address 2]
3. ADD: R3 <-- R1 + R2
4. STORE: M[address 3] <-- R3
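A rough Python sketch of how these four instructions move through the I, A and E segments, and where a data dependency shows up. It rests on several assumptions that are not stated in the text: one instruction enters the pipeline per clock cycle, a register written in E during cycle c is usable only from cycle c + 1, ADD reads its operands in the A segment, and STORE reads R3 in its E segment. The tuple layout and the name `find_conflicts` are hypothetical.

```python
# (name, destination register or None, source registers, segment that reads them)
PROGRAM = [
    ("LOAD", "R1", [], "A"),             # R1 <-- M[address 1]
    ("LOAD", "R2", [], "A"),             # R2 <-- M[address 2]
    ("ADD", "R3", ["R1", "R2"], "A"),    # R3 <-- R1 + R2
    ("STORE", None, ["R3"], "E"),        # M[address 3] <-- R3
]

SEGMENT_OFFSET = {"I": 0, "A": 1, "E": 2}


def find_conflicts(program):
    """Return (cycle, instruction, register) for every read that comes too early."""
    written_in = {}  # register -> clock cycle in which an E segment writes it
    conflicts = []
    for index, (name, dest, sources, read_segment) in enumerate(program):
        first_cycle = index + 1  # cycle in which the instruction is in segment I
        read_cycle = first_cycle + SEGMENT_OFFSET[read_segment]
        for reg in sources:
            # Assumed rule: a value written in cycle c is usable from cycle c + 1.
            if reg in written_in and read_cycle <= written_in[reg]:
                conflicts.append((read_cycle, name, reg))
        if dest is not None:
            written_in[dest] = first_cycle + SEGMENT_OFFSET["E"]
    return conflicts


print(find_conflicts(PROGRAM))

# Inserting a no-operation after the second LOAD delays the ADD by one cycle.
with_nop = PROGRAM[:2] + [("NOP", None, [], "A")] + PROGRAM[2:]
print(find_conflicts(with_nop))
```

Under these assumptions the only conflict is the ADD using R2 in clock cycle 4, the same cycle in which the second LOAD is still writing it. Delaying the ADD by one cycle removes it.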
Q3: Perform the addition of the following floating-point numbers using an
arithmetic pipeline.
X = 0.9504 * 10^3
Y = 0.8200 * 10^2
Also draw the diagram representing the pipeline for floating-point
addition and subtraction.
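One way to check the arithmetic is the Python sketch below. It assumes the usual four-segment organization of a floating-point adder: compare the exponents, align the mantissas, add the mantissas, normalize the result. Numbers are represented as (mantissa, exponent) pairs in base 10, and the function name `fp_add_pipeline` is hypothetical. The segments run one after another for a single pair of operands, so the sketch shows what each segment does rather than how operands overlap.

```python
from decimal import Decimal


def fp_add_pipeline(x, y):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    (mx, ex), (my, ey) = x, y

    # Segment 1: compare the exponents by subtraction.
    diff = ex - ey

    # Segment 2: keep the larger exponent and shift the other mantissa right.
    if diff >= 0:
        exponent = ex
        my = my * Decimal(10) ** -diff
    else:
        exponent = ey
        mx = mx * Decimal(10) ** diff

    # Segment 3: add the aligned mantissas.
    mantissa = mx + my

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(mantissa) >= 1:
        mantissa /= 10
        exponent += 1
    while mantissa != 0 and abs(mantissa) < Decimal("0.1"):
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent


X = (Decimal("0.9504"), 3)
Y = (Decimal("0.8200"), 2)
print(fp_add_pipeline(X, Y))
```

For the given operands the exponent difference is 1, so Y is aligned to 0.0820 * 10^3. The mantissa sum is 1.0324 * 10^3, which normalizes to 0.10324 * 10^4.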
Practice Questions
Q4: Determine the number of clock cycles it takes to process 200
tasks in a 6-segment pipeline.
Solution: k = 6 segments, n = 200 tasks
(k + n – 1) = 6 + 200 – 1 = 205 clock cycles
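The formula follows because the first task needs k cycles to pass through all segments, and each of the remaining n - 1 tasks completes one cycle later. A small Python sketch of the calculation is given below. The `speedup` function additionally assumes that a non-pipelined unit would need k cycles per task, which is not stated in the question.

```python
def pipeline_cycles(k, n):
    """Clock cycles needed for n tasks in a k-segment pipeline: k + n - 1."""
    return k + n - 1


def speedup(k, n):
    """Speedup over a non-pipelined unit, assuming it needs k cycles per task."""
    return (n * k) / pipeline_cycles(k, n)


print(pipeline_cycles(6, 200))  # 205
print(speedup(6, 200))
```

As n grows, the speedup approaches k, the number of segments.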