Pipeline Processing
PARALLEL PROCESSING
• A parallel processing system performs concurrent
data processing to achieve a shorter execution
time
– Flynn's classification
• Based on the multiplicity of Instruction Streams and
Data Streams
• Instruction Stream
– Sequence of Instructions read from memory
• Data Stream
– Operations performed on the data in the processor
• Representative parallel architectures (from the classification diagram)
– MISD: no practical machines exist
– Systolic arrays, dataflow machines, associative processors
– Message-passing multicomputers with hypercube, mesh, and reconfigurable topologies
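As a loose software illustration of the stream counts above (Flynn's classes describe hardware organisations, so this comparison is only an analogy, and the array values are made up):

import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# SISD-style: a single instruction stream handles one data item per step.
sisd_result = [a[i] + b[i] for i in range(len(a))]

# SIMD-style: one vector "add" is applied to many data elements at once.
simd_result = np.array(a) + np.array(b)

print(sisd_result)        # [11, 22, 33, 44]
print(simd_result)        # [11 22 33 44]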
PIPELINE PROCESSING
• Pipeline is a technique of overlapping the
execution of several instructions to reduce
the execution time of a set of instructions.
• Disadvantages
– Variable throughput
– Different stages may introduce different amounts of delay
Synchronous pipeline
• Clocked latches are used to interface between
stages; the latches isolate the inputs of a stage
from its outputs.
• Upon arrival of a clock pulse, all latches transfer
data to the next stage simultaneously (see the
sketch below).

Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4   (a common clock drives every latch R1-R4)
• Advantage
– Equal delay in all stages
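The clocked-latch behaviour can be sketched in software. In this minimal Python sketch (the stage functions s1..s4 and the input values are made-up assumptions), every latch value is computed from the current state and then all latches are updated together, mimicking the simultaneous transfer on a clock pulse:

def s1(x): return x + 1          # illustrative stage operations
def s2(x): return x * 2
def s3(x): return x - 3
def s4(x): return x * x

def clock_tick(latches, new_input):
    r1, r2, r3, r4 = latches
    # Evaluate every stage from the current latch contents, then update all
    # latches at once -- this models the simultaneous transfer on the clock edge.
    return (s1(new_input),
            s2(r1) if r1 is not None else None,
            s3(r2) if r2 is not None else None,
            s4(r3) if r3 is not None else None)

latches = (None, None, None, None)
for cycle, x in enumerate([10, 20, 30, 40], start=1):
    latches = clock_tick(latches, x)
    print(f"clock {cycle}: R1-R4 = {latches}")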
Instruction Execution Steps
• Instruction Fetch (IF) from main memory
• Instruction Decode (ID)
• Operand Fetch (OF), if any
• Execution of the decoded instruction (EX)
Non-pipelined Computer
• 6 stages: Instruction Fetch, Instruction Decode, Operand Address Calculation, Operand Fetch, Execute, Write Result
Space-Time Diagram (4 segments, 6 tasks)

Clock cycle   1    2    3    4    5    6    7    8    9
Segment 1     T1   T2   T3   T4   T5   T6
Segment 2          T1   T2   T3   T4   T5   T6
Segment 3               T1   T2   T3   T4   T5   T6
Segment 4                    T1   T2   T3   T4   T5   T6
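The diagram follows a simple rule: task Ti enters segment 1 in clock cycle i and occupies segment s in cycle i + s - 1, so the last of n tasks leaves a k-segment pipeline in cycle k + n - 1. A small Python sketch (the function name and the formatting are assumptions) that reproduces the diagram above:

def space_time(k, n):
    """Print the space-time diagram for n tasks in a k-segment pipeline."""
    total = k + n - 1
    print(f"{'cycle':<10}" + "".join(f"{c:>4}" for c in range(1, total + 1)))
    for s in range(1, k + 1):
        row = [""] * total
        for t in range(1, n + 1):
            row[t + s - 2] = f"T{t}"       # task t is in segment s at cycle t+s-1
        print(f"{'segment ' + str(s):<10}" + "".join(f"{cell:>4}" for cell in row))

space_time(k=4, n=6)   # reproduces the 4-segment, 6-task diagram above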
Pipelined Computer

Stage / Time   1    2    3    4    5    6    ...
IF             I1   I2   I3   I4
ID                  I1   I2   I3
OF                       I1   I2   I3
EX                            I1   I2   I3
In the first cycle instruction I1 is fetched from memory. In the second
cycle another instruction I2 is fetched from memory and simultaneously
I1 is decoded by the instruction decoding unit.
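The same overlap can be tabulated per clock cycle. A minimal Python sketch (an assumed ideal pipeline with no stalls) that lists which instruction each unit holds in every cycle:

STAGES = ["IF", "ID", "OF", "EX"]

def schedule(num_instructions):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    table = {}
    for i in range(1, num_instructions + 1):        # instruction Ii enters IF at cycle i
        for offset, stage in enumerate(STAGES):     # and reaches each later stage one cycle apart
            table.setdefault(i + offset, {})[stage] = f"I{i}"
    return table

table = schedule(4)
for cycle in sorted(table):
    row = "  ".join(f"{st}:{table[cycle].get(st, '--')}" for st in STAGES)
    print(f"cycle {cycle}: {row}")
# cycle 1: IF:I1  ID:--  OF:--  EX:--
# cycle 2: IF:I2  ID:I1  OF:--  EX:--   (matches the description above)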
INSTRUCTION PIPELINE

Sequential execution (no overlap):
i      FI  DA  FO  EX
i+1                    FI  DA  FO  EX
i+2                                    FI  DA  FO  EX

Pipelined execution (overlapped):
i      FI  DA  FO  EX
i+1        FI  DA  FO  EX
i+2            FI  DA  FO  EX
PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a special dedicated segment that
operates concurrently with all other segments.
Example: compute Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Segment 1:  R1 ← Ai,  R2 ← Bi            (load Ai and Bi from memory)
Segment 2:  R3 ← R1 * R2,  R4 ← Ci       (multiply; load Ci from memory)
Segment 3:  R5 ← R3 + R4                 (add)
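A minimal Python sketch of this three-segment pipeline (the operand values are made up; computing every new_* value before assigning models all registers being clocked together):

A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [9, 9, 9, 9, 9, 9, 9]

r1 = r2 = r3 = r4 = r5 = None
results = []
for cycle in range(len(A) + 3):                  # extra cycles drain the pipe
    # Evaluate each segment from the current register contents,
    # then update every register at once (one clock pulse).
    new_r5 = r3 + r4 if r3 is not None else None             # segment 3: adder
    new_r3 = r1 * r2 if r1 is not None else None             # segment 2: multiplier
    new_r4 = C[cycle - 1] if 1 <= cycle <= len(C) else None  # segment 2: load Ci
    new_r1 = A[cycle] if cycle < len(A) else None            # segment 1: load Ai
    new_r2 = B[cycle] if cycle < len(B) else None            # segment 1: load Bi
    r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
    if r5 is not None:
        results.append(r5)

print(results)    # equals [A[i]*B[i] + C[i] for i in range(7)]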
Four-Segment Instruction Pipeline (flowchart)
Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address, then test for a branch
Segment 3: Fetch operand from memory
Segment 4: Execute instruction, then test for an interrupt (interrupt handling if one is pending)
On a branch or an interrupt, the PC is updated and the pipe is emptied before instruction fetch resumes.
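A behavioural (not cycle-accurate) Python sketch of this flow; the tiny memory layout and the LOAD/BRANCH instruction format are assumptions made only for illustration:

memory = {0: ("LOAD", 100), 1: ("BRANCH", 5), 5: ("LOAD", 101),
          100: 42, 101: 7}
pc, acc = 0, 0

for _ in range(3):                      # run three instructions
    opcode, eff_addr = memory[pc]       # segments 1-2: fetch, decode, effective address
    if opcode == "BRANCH":
        pc = eff_addr                   # branch taken: update the PC
        continue                        # (a real pipeline would also empty the pipe here)
    operand = memory[eff_addr]          # segment 3: fetch operand from memory
    acc = operand                       # segment 4: execute (a plain load here)
    pc += 1                             # sequential PC update

print(acc)    # 7 -> the LOAD at the branch target (address 5) executed last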
Timing of the instruction pipeline when instruction 3 is a (taken) branch:

Step             1    2    3    4    5    6    7    8    9   10   11   12   13
Instruction 1    FI   DA   FO   EX
            2         FI   DA   FO   EX
 (Branch)   3              FI   DA   FO   EX
            4                   FI   --   --   FI   DA   FO   EX
            5                                  FI   DA   FO   EX
            6                                       FI   DA   FO   EX
            7                                            FI   DA   FO   EX

Instruction 4 is fetched in step 4, discarded when the branch is detected, and fetched again in step 7 once the branch completes execution in step 6.
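A quick check of the cycle counts implied by this table (assuming, as above, that the branch is resolved in its EX step and fetching restarts in the following step):

stages, n = 4, 7

ideal = stages + n - 1                 # k + n - 1 = 10 cycles with no branch
refetch_step = 6 + 1                   # branch (instr 3) executes in step 6; instr 4 refetched in step 7
# instructions 4..7 then flow through without further breaks:
with_branch = refetch_step + (n - 4) + (stages - 1)                  # 7 + 3 + 3 = 13
print(ideal, with_branch, "branch penalty =", with_branch - ideal)   # 10 13 ... 3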
Example: 6 tasks divided into 4 segments

Clock cycle   1    2    3    4    5    6    7    8    9
Segment 1     T1   T2   T3   T4   T5   T6
Segment 2          T1   T2   T3   T4   T5   T6
Segment 3               T1   T2   T3   T4   T5   T6
Segment 4                    T1   T2   T3   T4   T5   T6

With k = 4 segments and n = 6 tasks, all tasks complete in k + n - 1 = 9 clock cycles.
Pipeline Performance
• Latency
– The amount of time a single operation takes to execute
• Throughput
– The rate at which operations are completed (operations per second or operations per cycle)
• For a non-pipelined processor:  Throughput = 1 / Latency
• For a pipelined processor:  Throughput > 1 / Latency, because several operations overlap in the pipeline, so completions occur more often than once per operation latency
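A numeric sketch of these two relations, reusing the 25 ns / 5-stage / 1 ns-latch figures from the worked example further below:

unpipelined_latency = 25e-9                              # seconds per operation
stages, latch = 5, 1e-9

nonpipelined_throughput = 1 / unpipelined_latency        # exactly 1 / latency
pipelined_cycle = unpipelined_latency / stages + latch   # 6 ns
pipelined_latency = pipelined_cycle * stages             # 30 ns
pipelined_throughput = 1 / pipelined_cycle               # one completion per cycle

print(f"non-pipelined: {nonpipelined_throughput / 1e6:.0f} Mops/s")      # 40
print(f"pipelined:     {pipelined_throughput / 1e6:.0f} Mops/s "
      f"> 1/latency = {1 / pipelined_latency / 1e6:.0f} Mops/s")         # 167 > 33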
• Cycle time of a pipelined processor
– Depends on four factors:
• The cycle time of the unpipelined processor
• The number of pipeline stages
• How evenly the datapath logic is divided among the stages
• The latency of the pipeline latches
• If the logic is evenly divided among the stages, the clock period of the pipelined processor is

Cycle time (pipelined) = Cycle time (unpipelined) / Number of pipeline stages + pipeline latch latency

For an unpipelined cycle time of 25 ns, 5 stages, and a 1 ns latch latency:
Cycle time of the 5-stage pipeline = (25 ns / 5) + 1 ns = 6 ns
Latency of the pipeline = cycle time of the pipeline x number of pipeline stages
                        = 6 ns x 5 = 30 ns
For a 50-stage pipeline: cycle time = (25 ns / 50) + 1 ns = 1.5 ns
Latency of the pipeline = 1.5 ns x 50 = 75 ns
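The same arithmetic as a small Python helper (the function name is an arbitrary choice):

def pipelined_cycle_time(unpipelined_ns, stages, latch_ns):
    """Clock period when the unpipelined logic is divided evenly among the stages."""
    return unpipelined_ns / stages + latch_ns

for stages in (5, 50):
    cycle = pipelined_cycle_time(25, stages, 1)
    print(f"{stages:>2} stages: cycle time = {cycle} ns, latency = {cycle * stages} ns")
#  5 stages: cycle time = 6.0 ns, latency = 30.0 ns
# 50 stages: cycle time = 1.5 ns, latency = 75.0 ns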
Questions
• Suppose an unpipelined processor with a 25 ns
cycle time is divided into 5 pipeline stages with
latencies of 5, 7, 3, 6, and 4 ns. If the pipeline
latch latency is 1 ns, what is the cycle time of
the pipelined processor? What is the latency of
the resulting pipeline?
Solution
• The stage latencies are unequal, so the clock period must accommodate the slowest stage.
• The longest pipeline stage latency is 7 ns.
• Pipeline latch latency = 1 ns
• Cycle time = longest stage latency + pipeline latch latency
             = 7 + 1 = 8 ns
Therefore, the cycle time of the pipelined processor = 8 ns
There are 5 pipeline stages.
Total latency = cycle time of the pipeline x number of pipeline stages
              = 8 ns x 5 = 40 ns
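The same answer, computed directly (the stage latencies are taken from the question above):

stage_latencies = [5, 7, 3, 6, 4]        # ns, from the question
latch = 1                                # ns

cycle_time = max(stage_latencies) + latch          # slowest stage sets the clock
total_latency = cycle_time * len(stage_latencies)
print(cycle_time, total_latency)                   # 8 40  (ns)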
Question
• Suppose that an unpipelined processor has a
cycle time of 25 ns and that its datapath is made
up of modules with latencies of 2, 3, 4, 7, 3, 2,
and 4 ns (in that order). In pipelining this
processor, it is not possible to rearrange the
order of the modules (for example, putting the
register-read stage before the instruction-decode
stage) or to divide a module into multiple pipeline
stages (for complexity reasons). Given pipeline
latches with a 1 ns latency, what is the minimum
cycle time that can be achieved by pipelining this
processor?
Solution
• There is no limit on the number of pipeline stages, but a module cannot be split across stages.
• The 7 ns module must therefore occupy a stage by itself, and that stage determines the clock period.
• Minimum cycle time = longest module latency + pipeline latch latency
                     = 7 + 1 = 8 ns
Branch Instructions