Chapter 5: Pipelining and Vector Processing
- Dr. T.KAVITHA,
ASSOC.PROF.
MVSREC
COMPLEX INSTRUCTION SET COMPUTERS: CISC
• Parallel Processing
• Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
• VLIW processor
PARALLEL PROCESSING
Parallel processing is a term used to denote a large class of techniques that
provide simultaneous data-processing tasks for the purpose of increasing the
computational speed of a computer system.
Levels of Parallel Processing
- Program level
- Task level
- Instruction level
- Data level
PARALLEL COMPUTERS
– Flynn's classification
Based on the number of Instruction Streams and Data Streams that are
manipulated simultaneously
• Instruction Stream
– Sequence of Instructions read from memory
• Data Stream
– Operations performed on the data in the processor
SISD (Single Instruction stream, Single Data stream)
Characteristics
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time
Instructions are executed sequentially, and the system may or may not have
parallel processing capabilities.
MISD (Multiple Instruction streams, Single Data stream)
(Figure: several memory–control unit–processor (M–CU–P) paths, each with its
own instruction stream, operating on a single data stream.)
Characteristics
- There is no computer at present that can be classified as MISD
MIMD (Multiple Instruction streams, Multiple Data streams)
(Figure: processor–memory (P–M) pairs connected through an interconnection
network to a shared memory; each processor has its own control unit and
instruction stream.)
Characteristics
- Multiple processing units
Parallel processing is achieved in practice through:
- Pipeline processing
- Vector processing
- Array processors
PIPELINING
1. Arithmetic Pipeline
2. Instruction Pipeline
GENERAL PIPELINE
General structure of a 4-segment pipeline: the input passes through segments
S1–S4, with a register R1–R4 after each segment; all registers are driven by a
common clock.

    Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4
Space-Time Diagram

    Clock cycle:  1   2   3   4   5   6   7   8   9
    Segment 1:   T1  T2  T3  T4  T5  T6
    Segment 2:       T1  T2  T3  T4  T5  T6
    Segment 3:           T1  T2  T3  T4  T5  T6
    Segment 4:               T1  T2  T3  T4  T5  T6
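The space-time diagram can be reproduced with a small program. Below is a
minimal sketch (in C, not part of the slides) that prints which task occupies
which segment on every clock cycle, using the fact that task Tj reaches segment
s at clock cycle j + s - 1; the values k = 4 and n = 6 mirror the diagram above.

#include <stdio.h>

int main(void) {
    int k = 4;   /* number of segments */
    int n = 6;   /* number of tasks    */
    for (int s = 1; s <= k; s++) {
        printf("Segment %d:", s);
        for (int c = 1; c <= n + k - 1; c++) {
            int t = c - s + 1;                 /* task in segment s at cycle c */
            if (t >= 1 && t <= n) printf("  T%d", t);
            else                  printf("    ");
        }
        printf("\n");
    }
    return 0;
}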
PIPELINE SPEEDUP
k:  Number of segments in the pipeline
n:  Number of tasks to be performed
tp: Clock cycle time of the pipeline
tn: Time to complete each task in the equivalent non-pipelined unit

The first task takes k*tp to emerge from the pipe; the remaining (n - 1) tasks
emerge at the rate of one task per clock cycle and are completed in (n - 1)*tp.

Sk: Speedup
Sk = n*tn / ((k + n - 1)*tp)

If n becomes much larger than k - 1, then (k + n - 1) approaches the value of
n, and

    lim Sk = tn / tp    ( = k, if tn = k*tp )
    n->∞
Example: k = 4 segments, tp = 20 ns, n = 100 tasks, tn = k*tp = 80 ns

Pipelined system:      (k + n - 1)*tp = (4 + 100 - 1) * 20 = 2060 ns
Non-pipelined system:  n*tn = 100 * 80 = 8000 ns
Speedup:               Sk = 8000 / 2060 = 3.88
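As a quick check of the formula, the following small C sketch (not part of the
slides) plugs in the same numbers, k = 4 segments, tp = 20 ns, n = 100 tasks
and tn = 80 ns:

#include <stdio.h>

int main(void) {
    int k = 4, n = 100;
    double tp = 20.0, tn = 80.0;
    double pipelined     = (k + n - 1) * tp;   /* 103 * 20 = 2060 ns */
    double non_pipelined = n * tn;             /* 100 * 80 = 8000 ns */
    printf("Pipelined:     %.0f ns\n", pipelined);
    printf("Non-pipelined: %.0f ns\n", non_pipelined);
    printf("Speedup Sk:    %.2f\n", non_pipelined / pipelined);  /* about 3.88 */
    return 0;
}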
ARITHMETIC PIPELINE
Example: floating-point adder/subtractor with four segments.
Inputs:  X = A x 2^a,  Y = B x 2^b
(Figure: the two operands flow through the four segments, with registers R
holding the intermediate results between segments.)

The four segments and their functions:
[1] Compare the exponents (by subtraction)
[2] Choose the exponent and align the mantissa
[3] Add or subtract the mantissas
[4] Normalize the result

Decimal example:  X = 0.9504 x 10^3
                  Y = 0.8200 x 10^2
                  Z = X + Y = 1.0324 x 10^3 = 0.10324 x 10^4
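The four segments can be traced in software. The sketch below (an illustration,
not taken from the slides) walks the decimal example through compare, align,
add and normalize; the base-10 mantissa/exponent representation and the
assumption that X has the larger exponent are simplifications for illustration
only.

#include <stdio.h>

int main(void) {
    double mx = 0.9504, my = 0.8200;   /* mantissas  */
    int    ex = 3,      ey = 2;        /* exponents (assumes ex >= ey) */

    /* Segment 1: compare the exponents by subtraction */
    int diff = ex - ey;

    /* Segment 2: choose the larger exponent and align the smaller mantissa */
    int e = ex;
    for (int i = 0; i < diff; i++) my /= 10.0;   /* 0.8200 -> 0.0820 */

    /* Segment 3: add the mantissas */
    double mz = mx + my;                          /* 0.9504 + 0.0820 = 1.0324 */

    /* Segment 4: normalize the result */
    if (mz >= 1.0) { mz /= 10.0; e += 1; }        /* 1.0324 x 10^3 -> 0.10324 x 10^4 */

    printf("Z = %.5f x 10^%d\n", mz, e);
    return 0;
}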
Instruction Pipeline: Pipeline processing can occur not only in the data
stream but in the instruction stream as well.
This causes the instruction fetch and execute phases to overlap and to be
performed as simultaneous operations.
INSTRUCTION CYCLE
Six phases in an instruction cycle:
[1] Fetch the instruction from memory
[2] Decode the instruction
[3] Calculate the effective address
[4] Fetch the operands from memory
[5] Execute the instruction
[6] Store the result in the proper place
In the 4-stage pipeline used below these phases are combined into FI (fetch
instruction), DA (decode and calculate effective address), FO (fetch operand)
and EX (execute and store the result).
INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline

Conventional (sequential) execution:
    i      FI DA FO EX
    i+1                FI DA FO EX
    i+2                            FI DA FO EX

Pipelined execution:
    i      FI DA FO EX
    i+1       FI DA FO EX
    i+2          FI DA FO EX
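A small C sketch (an illustration, not from the slides) makes the cycle counts
behind the two diagrams explicit: conventional execution needs n*k cycles,
while the pipeline needs k + n - 1.

#include <stdio.h>

int main(void) {
    int k = 4;   /* pipeline stages: FI, DA, FO, EX */
    int n = 3;   /* number of instructions          */
    int conventional = n * k;        /* 3 * 4 = 12 cycles */
    int pipelined    = k + n - 1;    /* 4 + 3 - 1 = 6 cycles */
    printf("Conventional: %d cycles\n", conventional);
    printf("Pipelined:    %d cycles\n", pipelined);
    return 0;
}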
Instruction Pipeline
Four-segment instruction pipeline (flowchart):

Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate the effective address
           Branch? yes -> update PC, empty the pipe, and fetch from the branch target
                   no  -> continue
Segment 3: Fetch operand from memory
Segment 4: Execute the instruction
           Interrupt? yes -> interrupt handling, update PC, empty the pipe
                      no  -> fetch the next instruction
Timing of the instruction pipeline when instruction 3 is a branch (the
instruction fetched at step 4 is discarded and the branch target is fetched at
step 7):

    Step:           1   2   3   4   5   6   7   8   9  10  11  12  13
    Instruction 1:  FI  DA  FO  EX
                2:      FI  DA  FO  EX
       (Branch) 3:          FI  DA  FO  EX
                4:              FI   -   -  FI  DA  FO  EX
                5:                          FI  DA  FO  EX
                6:                              FI  DA  FO  EX
                7:                                  FI  DA  FO  EX
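The step numbers in this table can be reproduced with a simple model. The
sketch below (an illustration with stated assumptions, not from the slides)
assumes the branch is resolved in its EX stage and that the target can only be
fetched on the following clock cycle.

#include <stdio.h>

int main(void) {
    int n = 7;
    int branch_at = 3;   /* instruction 3 is a taken branch, as in the diagram */
    int fi = 1;          /* step at which the next instruction is fetched      */
    for (int i = 1; i <= n; i++) {
        int da = fi + 1, fo = fi + 2, ex = fi + 3;
        printf("Instruction %d: FI=%d DA=%d FO=%d EX=%d\n", i, fi, da, fo, ex);
        if (i == branch_at)
            fi = ex + 1;   /* target fetch waits until the branch has executed */
        else
            fi = fi + 1;   /* otherwise one instruction is fetched per cycle   */
    }
    return 0;
}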
PIPELINE CONFLICTS (HAZARDS)
Resource conflicts: two segments need to access memory in the same clock cycle,
e.g. two loads sharing a one-port memory. Most of these conflicts can be
resolved by using separate instruction and data memories.

    i      FI DA FO EX
    i+1       FI DA FO EX

Solution: the pipeline is stalled until the conflicting memory access completes.
Data dependency conflicts
Hardware technique: Interlock
- Hardware detects the data dependency and delays the scheduling of the
  dependent instruction by stalling enough clock cycles.
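A minimal sketch (not from the slides) of how such an interlock check could be
modelled: compare the destination register of one instruction with the source
registers of the next and stall the dependent one. The instruction encoding,
register numbers and the stall length are assumptions for illustration.

#include <stdio.h>

typedef struct { const char *op; int dest, src1, src2; } Instr;

int main(void) {
    Instr prog[] = {
        { "LOAD", 1, 5, -1 },   /* R1 <- M[R5]                      */
        { "ADD",  2, 1,  3 },   /* R2 <- R1 + R3  (depends on R1)   */
        { "SUB",  4, 6,  7 },   /* R4 <- R6 - R7  (no dependency)   */
    };
    int n = sizeof prog / sizeof prog[0];
    for (int i = 1; i < n; i++) {
        int raw = (prog[i].src1 == prog[i-1].dest) ||
                  (prog[i].src2 == prog[i-1].dest);
        if (raw)
            printf("%s depends on %s: interlock stalls it (assumed 2 cycles)\n",
                   prog[i].op, prog[i-1].op);
        else
            printf("%s has no dependency on %s: no stall\n",
                   prog[i].op, prog[i-1].op);
    }
    return 0;
}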
Branch difficulties
- The branch target address is not known until the branch instruction is
  decoded.

    Branch instruction   FI DA FO EX
    Next instruction        FI DA FO EX

Prefetch target instruction: fetch both the branch target and the next
sequential instruction.
- Both are saved until the branch is executed; then the correct instruction
  stream is selected and the wrong stream is discarded.
Delayed Branch