5 Pipelined Processor: Temporal Overlapping of Processing, Assembly Line

This document discusses the principles and design of pipelined processors. It covers topics such as the basic concept of pipelining, pipeline performance measures, data and control dependencies that can cause stalls, and techniques to improve performance like bypassing and multiple-operation instructions. It also provides examples of integer and load/store pipeline designs for RISC and CISC processors, including stages, latency, and techniques to reduce load-use delay.


5 Pipelined Processor

temporal overlapping of processing, assembly line

5.1 Basic concept
5.2 Design space of pipelines
5.3 Overview of pipelined instruction processing
5.4 Pipelined execution of integer and Boolean instructions
5.5 Pipelined processing of loads and stores


5.1.1 Principle of pipelining

Principle of pipelining e.g.

Processing of a sequence of instructions using a basic pipeline

Pipelined and unpipelined processing

5.1.2 General structure of pipelines

Structure and pipelined operation of the FX unit of the IBM Power1

Pipeline Performance Measures


Cycle time: tc
is determined by the worst-case processing time of the longest stage

Repetition Rate: R
the shortest possible time interval between subsequent independent instructions in the pipeline

Performance potential of a pipeline: P = 1/(R * tc)
e.g. PowerPC 603, FP double-precision multiply: R = 2, tc = 12 nsec
P = 1/(R * tc) = 1/(2 * 12 nsec) ≈ 41.7 MFLOPS
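To make the arithmetic explicit, a minimal sketch in Python (the function name is illustrative, not from the slides):

def performance_potential(R_cycles, tc_ns):
    # P = 1 / (R * tc), returned in operations per second
    return 1.0 / (R_cycles * tc_ns * 1e-9)

# PowerPC 603 FP double-precision multiply: R = 2 cycles, tc = 12 nsec
print(performance_potential(2, 12) / 1e6)   # ~41.7 MFLOPS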

Performance with RAW-dependent instructions
Latency:
specifies the amount of time that the result of a particular instruction takes to become available in the pipeline for a subsequent dependent instruction.

Define-use latency (10 to 100 cycles)


mul r1, r2, r3
add r5, r1, r4

Load-use latency (1 to 3 cycles)


load r1, x
add r5, r1, r2

Stall: if the latency is n cycles, the immediately following RAW-dependent instruction has to be stalled in the pipeline for n-1 cycles
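A minimal sketch of the stall rule above, assuming a simple in-order pipeline (names are illustrative):

def stall_cycles(latency, issue_gap=1):
    # With a latency of n cycles, a RAW-dependent instruction issued
    # issue_gap cycles after the producer stalls for n - issue_gap cycles
    # (never less than zero).
    return max(0, latency - issue_gap)

print(stall_cycles(3))   # immediately following instruction: 2 stall cycles
print(stall_cycles(1))   # latency of 1 cycle: no stall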

Improve Performance
Multiple-operation instructions

HP PA 7100
FMPYADD RM1, RM2, RM3, RA1, RA2
RM3 ← RM1 * RM2; RA2 ← RA1 + RA2

PowerPC
FMA for performing (A*C) + B
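The semantics of these multiple-operation instructions, written out as a short Python sketch that mirrors the register transfers above (an illustration only, not the processors' actual datapaths):

# HP PA 7100 FMPYADD: one instruction does an FP multiply and an FP add
def fmpyadd(rm1, rm2, ra1, ra2):
    rm3 = rm1 * rm2          # RM3 <- RM1 * RM2
    ra2 = ra1 + ra2          # RA2 <- RA1 + RA2
    return rm3, ra2

# PowerPC FMA: fused multiply-add, (A * C) + B in a single instruction
def fma(a, c, b):
    return a * c + b

print(fmpyadd(2.0, 3.0, 1.0, 4.0))   # (6.0, 5.0)
print(fma(2.0, 3.0, 1.0))            # 7.0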

5.1.4 Application scenarios of pipelines

5.2 Design space of pipelines


Key aspects of the design space of pipelines

5.2.2 Basic layout of a pipeline


Design space of the overall stage layout

Increasing parallelism by raising the number of pipeline stages

Eight-stage pipeline

Problems arise with more stages


data and control dependencies occur more frequently
the pipeline stalls while waiting for data and has to be reloaded in case of a branch

subtasks become less balanced (in execution time)


cycle time is determined by the worst-case processing time of the longest stage (see the sketch below)

In most cases
5-10 stages
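A small sketch of why unbalanced stages hurt: the cycle time is set by the slowest stage, so the faster stages sit partly idle (the stage latencies below are made-up numbers for illustration):

# Illustrative stage latencies in ns for a deep, unbalanced pipeline
stage_latencies_ns = [3.0, 4.0, 9.0, 4.0, 5.0]

cycle_time_ns = max(stage_latencies_ns)        # 9.0 ns: the longest stage
useful_work_ns = sum(stage_latencies_ns)       # 25.0 ns of actual work
slot_time_ns = cycle_time_ns * len(stage_latencies_ns)   # 45.0 ns of stage slots

print(cycle_time_ns, useful_work_ns / slot_time_ns)   # tc = 9.0, utilization ~0.56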

Pipelines e.g. DEC 21064

Layout of the stage sequence

Bypasses (data forwarding to resolve RAW dependencies)


Unless special arrangements are made, the result of an operation instruction is written into the register file or into memory, and is then fetched from there as a source operand.

Principle of bypassing in define-use and load-use conflicts
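A minimal sketch of the bypassing idea under a simplified model: an operand is taken from the producing stage's output latch when the destination matches, and from the register file otherwise (all names and values are illustrative):

def read_operand(reg, regfile, bypass_tag=None, bypass_value=None):
    # With bypassing, a result that has not yet been written back can be
    # forwarded straight from the execute/load stage to a dependent instruction.
    if bypass_tag == reg:
        return bypass_value      # forwarded result (define-use or load-use)
    return regfile[reg]          # normal register file read

regfile = {"r2": 7, "r3": 3, "r4": 5}
# mul r1, r2, r3 has just produced 21; add r5, r1, r4 reads it via the bypass:
r1 = read_operand("r1", regfile, bypass_tag="r1", bypass_value=21)
print(r1 + regfile["r4"])        # 26, without waiting for the writeback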

Possibilities for the timing of pipeline operation

5.3 Overview: pipelined instruction processing

Declaration of logical pipelines: e.g. PowerPC 601

Detailed specification of each of the pipelines: e.g.

Implementation of instruction pipelines (vs. logical)

Layout of physical pipelines

Multiplicity of pipelines

Preserving sequential consistency

Preserving sequential consistency, implementation e.g.

Preserving sequential consistency, e.g.

Case studies: Pentium


Logical layout of the Pentium's pipelines

Case studies: PowerPC 604

5.4 Pipelined execution of integer and Boolean instructions (FX)

RISC pipelines 4 or 5 stages

Traditional FX pipeline of RISC processors

Logical to physical: e.g. the PowerPC 601 using a single universal FX unit

Layout with 5 stages, e.g.: FX and L/S pipelines in the MIPS R4200
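As a generic illustration of such a short RISC FX pipeline (using textbook-style fetch, decode, execute and write-back stages rather than any specific processor's), a tiny sketch that prints which stage each instruction occupies in each cycle:

STAGES = ["F ", "D ", "E ", "WB"]    # generic 4-stage FX pipeline

def pipeline_chart(n_instructions):
    # With no stalls, instruction i is in stage s during cycle i + s,
    # so successive instructions overlap in time (the assembly-line effect).
    width = n_instructions + len(STAGES) - 1
    for i in range(n_instructions):
        row = ["  "] * width
        for s, name in enumerate(STAGES):
            row[i + s] = name
        print("i%d: %s" % (i, " ".join(row)))

pipeline_chart(4)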

CISC pipeline 6 or 5 stages

Traditional CISC pipeline:


The execution of register-memory instructions

CISC pipeline:
Execution of register-register and load/store instructions

CISC pipeline with 5 stages: recycling the E/C stage

Implementation of FX units: how many

Trends in increasing performance

5.5 Pipelined processing of loads and stores

5.5.3 Load-use delay: RISC pipelines

Load-use delay: MIPS

Load-use delay: CISC

Handling Load-use delay


Basic approaches to cope with a load-use delay

Remove Load-use delay

Remove load-use delay: bringing forward the calculation of the virtual address (for slow caches)
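One of the basic approaches above is to let the compiler hide the load-use delay by moving an independent instruction between the load and its use; a minimal scheduling sketch under that assumption (the tuple encoding and helper name are made up for illustration):

def can_fill_delay_slot(load, candidate):
    # The candidate may be hoisted into the load's delay slot if it does not
    # read the loaded register and does not overwrite anything the pair uses.
    load_dest, load_srcs = load
    cand_dest, cand_srcs = candidate
    return (load_dest not in cand_srcs and cand_dest != load_dest
            and cand_dest not in load_srcs)

# (dest, sources):  load r1, x ; add r5, r1, r2 ; sub r6, r7, r8
load_insn = ("r1", ["x"])
use_insn = ("r5", ["r1", "r2"])
indep = ("r6", ["r7", "r8"])

schedule = [load_insn, use_insn, indep]
if can_fill_delay_slot(load_insn, indep):
    schedule = [load_insn, indep, use_insn]   # one cycle of load-use delay hidden
print(schedule)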
