
Stud CSA Mod 5p2 Arithmetic SuperPipeline

The total branch penalty is 0.36 clock cycles.

Uploaded by sheenanees


COMPUTER SYSTEM ARCHITECTURE - CS 405

Module 5, Part 2
Arithmetic Pipeline Design
Superscalar Pipeline Design
Arithmetic Pipeline for Floating-Point Addition & Subtraction

• The exponents are compared by subtraction in segment 1. The exponent difference determines how many times the mantissa associated with the smaller exponent must be shifted to the right.
• This shift, done in segment 2, aligns the two mantissas.
• The aligned mantissas are then added or subtracted in segment 3.
• Finally, the result is normalized in segment 4.
• When an overflow occurs, the mantissa of the sum or difference is shifted right and the exponent is incremented by one.
• When an underflow occurs, the number of leading zeros in the mantissa determines the number of left shifts of the mantissa and the number that must be subtracted from the exponent.
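The four segments can be sketched in Python as below. This is a minimal illustration, assuming decimal mantissas normalized to the range [0.1, 1); the radix, the use of `decimal.Decimal` (to avoid binary rounding noise) and the name `fp_add` are illustrative choices, not taken from the slides. In hardware each segment is a separate stage with latches between them; here they are simply consecutive steps of one function.

```python
from decimal import Decimal


def fp_add(xm, xe, ym, ye, base=10):
    """Add X = xm * base**xe and Y = ym * base**ye.

    Mantissas are assumed normalized: 1/base <= |m| < 1.
    Returns the normalized (mantissa, exponent) of the sum.
    """
    # Segment 1: compare the exponents by subtraction.
    diff = xe - ye
    # Segment 2: align by shifting the mantissa with the smaller exponent right.
    if diff >= 0:
        ym, exp = ym / base ** diff, xe
    else:
        xm, exp = xm / base ** -diff, ye
    # Segment 3: add the mantissas (a negative operand gives subtraction).
    m = xm + ym
    # Segment 4: normalize the result.
    if abs(m) >= 1:
        # Overflow: shift the mantissa right once and increment the exponent.
        m, exp = m / base, exp + 1
    while m != 0 and abs(m) < Decimal(1) / base:
        # Leading zeros: shift left once per zero and decrement the exponent.
        m, exp = m * base, exp - 1
    return m, exp


# Illustrative operands: 0.9504 x 10^3 + 0.8200 x 10^2 = 0.10324 x 10^4
m, e = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(m, e)
```

Subtracting nearly equal numbers, e.g. `fp_add(Decimal("0.5"), 1, Decimal("-0.45"), 1)`, exercises the leading-zero branch: the raw difference 0.05 × 10^1 is normalized to 0.5 × 10^0.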
Fixed-Point Multiplication Pipeline:
• A pipelined multiplier based on digit products can be designed using digit-product generation logic and digit adders.
Example:
25 × 35 = (2 × 3) × 100 + (2 × 5 + 5 × 3) × 10 + (5 × 5) = 600 + 250 + 25 = 875

For binary multiplication, let:

A = a1a0
B = b1b0
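The digit-product scheme for two 2-digit operands can be sketched as follows. This is an illustrative sketch; the function name `digit_product_multiply` is hypothetical, and it assumes both operands are smaller than radix². The four digit products are independent of one another (the "digit-product generation" step, done in parallel in hardware), and the weighted sum corresponds to the digit adders.

```python
def digit_product_multiply(a, b, radix=10):
    """Multiply A = a1a0 by B = b1b0 (two digits each) in the given radix."""
    a1, a0 = divmod(a, radix)
    b1, b0 = divmod(b, radix)
    # Digit-product generation: four independent products.
    p00, p01, p10, p11 = a0 * b0, a0 * b1, a1 * b0, a1 * b1
    # Digit adders: combine the products with their positional weights.
    return p11 * radix ** 2 + (p01 + p10) * radix + p00


print(digit_product_multiply(25, 35))          # 875
print(digit_product_multiply(0b11, 0b10, 2))   # 6 (binary 11 x 10 = 110)
```

With `radix=2` every digit product is a single AND gate, which is how the same idea carries over to binary multiplication.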
Multiply Pipeline Design
• Multiply A and B, two 8-bit numbers. The product is a 16-bit number:
P = A × B = P0 + P1 + P2 + P3 + P4 + P5 + P6 + P7
• Multiplying the least significant bit of the multiplier (B) by all the bits of the multiplicand (A) gives the partial product P0.
• The partial products P0, P1, P2, P3, P4, P5, P6 and P7, corresponding to the multiplication of each of the 8 bits of the multiplier by all bits of the multiplicand, can be generated simultaneously.
• Partial product Pj is obtained by multiplying the multiplicand A by the jth bit of B and then shifting the result j bits to the left, for j = 0 to 7.
• Thus partial product Pj is (8 + j) bits long, with j trailing zeros.
• Hence P0 to P7 range in length from 8 to 15 bits.
• These partial products must now be added.
• The eight partial products are summed with a Wallace tree of CSAs (carry-save adders) followed by a CPA (carry-propagate adder) at the final stage.
• (P0, P1, P2), (P3, P4, P5) and (P6, P7) are given to 3 CSAs in the first stage; their outputs are fed to CSAs in later stages and finally to a CPA to produce the 16-bit output P.
Figure: Wallace tree for multiplying two 8-bit numbers
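A minimal Python sketch of this multiply scheme is shown below. `csa` models a word-wide 3:2 carry-save adder (sum = XOR of the three inputs, carry = majority shifted left one place), and the final `sum` of the two remaining words plays the role of the CPA. The grouping used here (operands taken three at a time, leftovers passed to the next level) is a simple illustrative choice and may differ in detail from the tree in the figure; all function names are hypothetical.

```python
def partial_products(a, b, n=8):
    """P_j = A * b_j shifted left j places, for j = 0 .. n-1."""
    return [(a if (b >> j) & 1 else 0) << j for j in range(n)]


def csa(x, y, z):
    """3:2 carry-save adder: returns (sum, carry) with x + y + z == sum + carry."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n=8):
    """Reduce the n partial products with CSA levels, then add the last two (CPA)."""
    terms = partial_products(a, b, n)
    levels = 0
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3    # operands forming complete triples
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(csa(*terms[i:i + 3]))
        reduced.extend(terms[full:])          # leftovers pass to the next level
        terms = reduced
        levels += 1
    return sum(terms), levels                 # final carry-propagate addition


product, levels = wallace_multiply(0b10110101, 0b01101110)
print(product, levels)   # 19910 4
```

With eight operands the loop needs four CSA levels (8 → 6 → 4 → 3 → 2 operands) before the single carry-propagate addition, which is why only the last stage pays for full carry propagation.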
Floating Point Multiplication Pipeline
• FP multiplication involves the following three major steps:
1. Multiplication of fractions.
2. Addition of exponents.
3. Normalization of the result.
• Since fractions and exponents are fixed-point numbers, steps 1 and 2 can be implemented using the principles discussed before.
• The normalization step can be implemented as in floating-point addition.
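The three steps can be sketched in Python under the same illustrative convention as a decimal example (mantissas normalized to [0.1, 1)); the name `fp_multiply` is hypothetical. Because both fractions are at least 0.1, their product is at least 0.01, so normalization needs at most one left shift.

```python
from decimal import Decimal


def fp_multiply(xm, xe, ym, ye, base=10):
    """Multiply X = xm * base**xe by Y = ym * base**ye (normalized mantissas)."""
    # Step 1: multiply the fractions (a fixed-point multiplication).
    m = xm * ym
    # Step 2: add the exponents (a fixed-point addition).
    exp = xe + ye
    # Step 3: normalize; with normalized inputs at most one left shift is needed.
    if m != 0 and abs(m) < Decimal(1) / base:
        m, exp = m * base, exp - 1
    return m, exp


# 0.2 x 10^3 times 0.3 x 10^-1 = 0.06 x 10^2, normalized to 0.6 x 10^1
m, e = fp_multiply(Decimal("0.2"), 3, Decimal("0.3"), -1)
print(m, e)
```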
Floating-Point Division Pipeline:

Division appears less frequently in computer programs than addition, subtraction and multiplication, so a separate pipeline unit for division is seldom implemented. It is common to schedule division using the adder and multiplier pipelines.
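One common way to schedule division on the adder and multiplier pipelines (the slides do not say which method is meant) is Newton-Raphson reciprocal iteration, x(k+1) = x(k) × (2 − d × x(k)), which uses only multiplications and a subtraction. The sketch below assumes the divisor has been normalized to [0.5, 1) and uses the well-known linear initial estimate 48/17 − 32/17 × d; the name `nr_divide` is hypothetical.

```python
def nr_divide(n, d, iterations=4):
    """Approximate n / d using only multiply and subtract steps.

    Assumes 0.5 <= d < 1 (a normalized fraction).
    """
    if not 0.5 <= d < 1:
        raise ValueError("divisor must be normalized to [0.5, 1)")
    x = 48 / 17 - 32 / 17 * d      # initial reciprocal estimate, relative error <= 1/17
    for _ in range(iterations):
        x = x * (2 - d * x)        # two multiplies (multiplier pipe) and one subtract (adder pipe)
    return n * x                   # a final multiply turns 1/d into n/d


print(nr_divide(0.3, 0.75))        # close to 0.4
```

The error roughly squares on every iteration, so four iterations starting from an error of at most 1/17 already exceed double precision.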
Static vs Dynamic Pipeline

Static pipeline:
• It assumes only one functional configuration at a time.
• It can be either unifunctional or multifunctional.
• It is preferred when instructions of the same type are to be executed continuously.
• A unifunctional pipeline must be static.

Dynamic pipeline:
• It permits several functional configurations to exist simultaneously.
• A dynamic pipeline must be multifunctional.
• It requires more elaborate control and sequencing mechanisms.
• A multifunctional pipeline may be either static or dynamic.
Multifunctional Pipeline (*no need to study)

Multifunctional Pipeline - 4X-TI-ASC
• A multifunction pipeline may perform different functions, either at different times or at the same time, by interconnecting different subsets of stages in the pipeline.
• E.g.: 4X-TI-ASC (supercomputer, 1973)
• It has four multifunction pipeline processors, each reconfigurable for a variety of arithmetic or logic operations at different times.
• Its central processor comprises nine units:
– one instruction processing unit (IPU),
– four memory buffer units, and
– four arithmetic units.
• Thus it provides four parallel execution pipelines below the IPU.
• Any mixture of scalar and vector instructions can be executed simultaneously in the four pipes.
Figure: 2-issue superscalar processor
In-order Instruction Issue & In-order Completion
In-order Instruction Issue & Out-of-order Completion
Out-of-order Instruction Issue & Out-of-order Completion
Instruction Issue & Completion
• In-order issue with in-order completion is the simplest to implement, but it causes unnecessary stalls or delays in order to preserve program order. It is still attractive in a multiprocessor environment.
• Out-of-order completion is found in both scalar and superscalar processors.
• Long-latency operations (loads and FP operations) can be hidden with out-of-order completion.
• Output dependence and antidependence prevent out-of-order completion.
• Out-of-order issue gives the freedom to exploit parallelism and enhances pipeline efficiency.
• Multiple-pipeline scheduling is an NP-complete problem, so optimal scheduling is expensive.
• Simple data dependence checking, a small look-ahead window and scoreboarding mechanisms, along with an optimizing compiler, are needed to exploit instruction parallelism in a superscalar processor.
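To make "simple data dependence checking" concrete, here is a sketch that classifies the dependence of a later instruction on an earlier one. The `Instr` record (one destination register, a tuple of source registers) and the register names are hypothetical. A flow (RAW) dependence forces the later instruction to wait for the earlier result, while anti (WAR) and output (WAW) dependences are the relations that block out-of-order completion, so the later instruction is treated as safe to finish first only when no dependence exists.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str              # destination register
    srcs: Tuple[str, ...]  # source registers


def dependences(first: Instr, second: Instr) -> Set[str]:
    """Dependences of `second` on `first`, where `first` is earlier in program order."""
    deps: Set[str] = set()
    if first.dest in second.srcs:
        deps.add("RAW")    # flow dependence: second reads what first writes
    if second.dest in first.srcs:
        deps.add("WAR")    # antidependence: second overwrites a source of first
    if second.dest == first.dest:
        deps.add("WAW")    # output dependence: both write the same register
    return deps


def may_complete_out_of_order(first: Instr, second: Instr) -> bool:
    """True if `second` may finish before `first` (no dependence between them)."""
    return not dependences(first, second)


i1 = Instr("R1", ("R2", "R3"))   # R1 <- R2 op R3
i2 = Instr("R2", ("R4", "R5"))   # WAR with i1
i3 = Instr("R1", ("R6",))        # WAW with i1
i4 = Instr("R7", ("R1",))        # RAW on i1
i5 = Instr("R8", ("R9",))        # independent of i1
for later in (i2, i3, i4, i5):
    print(sorted(dependences(i1, later)), may_complete_out_of_order(i1, later))
```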
Pipelining Example
• Assume the 5 stages take 10 ns, 8 ns, 10 ns, 10 ns and 7 ns respectively.
• Unpipelined
– Avg instruction execution time = 10 + 8 + 10 + 10 + 7 = 45 ns
• Pipelined
– Each stage introduces some overhead, say 1 ns per stage.
– We can only go as fast as the slowest stage!
– Each stage then takes 10 + 1 = 11 ns; in steady state one instruction completes every 11 ns.
– Speedup = Unpipelined time / Pipelined time = 45 ns / 11 ns ≈ 4.1, or about a 4X speedup
Note: the latency of an individual instruction is actually higher when pipelined: 5 × 11 ns = 55 ns versus 45 ns.
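The arithmetic of this example can be reproduced with a small helper (hypothetical name `pipeline_timing`). It assumes, as the example does, that every stage is clocked at the slowest stage delay plus a fixed per-stage overhead.

```python
def pipeline_timing(stage_ns, overhead_ns):
    """Compare an unpipelined datapath with a pipelined one.

    Assumes every stage is clocked at the slowest stage delay
    plus a fixed per-stage (latch) overhead.
    """
    unpipelined = sum(stage_ns)              # one instruction passes all stages
    cycle = max(stage_ns) + overhead_ns      # clock set by the slowest stage
    speedup = unpipelined / cycle            # steady-state throughput ratio
    latency = cycle * len(stage_ns)          # time for a single instruction
    return unpipelined, cycle, speedup, latency


unpiped, cycle, speedup, latency = pipeline_timing([10, 8, 10, 10, 7], 1)
print(unpiped, cycle, round(speedup, 2), latency)   # 45 11 4.09 55
```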
Measuring Performance with Stalls
$$\text{Speedup from pipelining} = \frac{\text{Avg instr time unpipelined}}{\text{Avg instr time pipelined}} = \frac{\text{CPI}_{\text{unpiped}}}{\text{CPI}_{\text{pipelined}}} \times \frac{\text{Clock cycle}_{\text{unpiped}}}{\text{Clock cycle}_{\text{piped}}}$$

We also know that the ideal CPI is 1:

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall cycles per instr} = 1 + \text{Pipeline stall cycles per instr}$$

Assuming an identical clock cycle, substitution yields:

$$\text{Speedup from pipelining} = \frac{\text{CPI}_{\text{unpiped}}}{1 + \text{Pipeline stall cycles per instr}}$$
Measuring Stall Performance
Given:

$$\text{Speedup from pipelining} = \frac{\text{CPI}_{\text{unpiped}}}{1 + \text{Pipeline stall cycles per instr}}$$

In our simple case each instruction takes the same number of cycles, which is equal to the number of pipeline stages, i.e. the pipeline depth, so CPI_unpiped = pipeline depth:

$$\text{Speedup from pipelining} = \frac{\text{Pipeline depth}}{1 + \text{Stall cycles per instruction}}$$

If there are no pipeline stalls, the intuitive result is that pipelining can improve performance by a factor equal to the depth of the pipeline.
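The last formula is easy to tabulate. The helper name `speedup_with_stalls` is hypothetical, and the stall rates below are arbitrary example inputs, not measurements from the slides.

```python
def speedup_with_stalls(depth, stall_cycles_per_instr):
    """Speedup = pipeline depth / (1 + stall cycles per instruction).

    Assumes identical clock cycles and CPI_unpiped equal to the pipeline depth.
    """
    return depth / (1 + stall_cycles_per_instr)


for stalls in (0.0, 0.25, 0.5, 1.0):
    print(f"5 stages, {stalls} stalls/instr -> speedup {speedup_with_stalls(5, stalls):.2f}")
# speedups: 5.00, 4.00, 3.33, 2.50
```

Even one stall cycle per instruction halves the ideal speedup, which is why the branch-penalty calculation that follows matters.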
Example Dynamic Hardware Prediction

Basic Branch Prediction: Branch Target Buffers

| Instruction in buffer | Prediction | Actual branch | Penalty cycles |
|---|---|---|---|
| Yes | Taken | Taken | 0 |
| Yes | Taken | Not taken | 2 |
| No | | Taken | 3 |

Determine the total branch penalty for a BTB using the above penalties. Assume also the following:
• Prediction accuracy of 90%
• Hit rate in the buffer of 90%
• 60% taken branch frequency

Solution:
Branch penalty = Misprediction penalty + Buffer miss penalty

$$\text{Branch penalty} = (\text{Buffer hit rate} \times \text{Misprediction rate} \times \text{Penalty cycles}) + ((1 - \text{Buffer hit rate}) \times \text{Taken fraction} \times \text{Penalty cycles})$$

Branch penalty = (90% × 10% × 2) + (10% × 60% × 3)
Branch penalty = 0.18 + 0.18 = 0.36 clock cycles
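The same calculation as a small function (hypothetical name `btb_branch_penalty`). It assumes, as the solution does, that only two cases cost cycles: a buffer hit that is mispredicted, and a buffer miss on a branch that turns out to be taken; any case not listed in the penalty table is taken to cost 0.

```python
def btb_branch_penalty(hit_rate, accuracy, taken_freq,
                       mispredict_penalty=2, miss_taken_penalty=3):
    """Average penalty cycles per branch for a branch-target buffer."""
    # Hit in the buffer, but the prediction was wrong.
    mispredict = hit_rate * (1 - accuracy) * mispredict_penalty
    # Miss in the buffer, and the branch is actually taken.
    buffer_miss = (1 - hit_rate) * taken_freq * miss_taken_penalty
    return mispredict + buffer_miss


print(round(btb_branch_penalty(0.90, 0.90, 0.60), 2))   # 0.36
```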
