Chapter 6

Pipelining and superscalar techniques can improve processor performance. Pipelining divides instruction execution into stages to allow multiple instructions to be processed simultaneously. Superscalar techniques use multiple functional units and dynamic scheduling to allow multiple instructions to execute concurrently. Key aspects of pipelining include pipeline hazards, branch prediction, reservation tables, and latency analysis. Arithmetic pipelines apply similar techniques to speed up numerical calculations by dividing operations like addition and multiplication into stages.

Uploaded by

K S Sanath Kashyap

Pipelining and superscalar techniques
Prajwala T R
Dept. of CSE
PESIT
Linear pipelining processors
• A cascade of linearly connected processing stages that performs a fixed function over a stream of data
• External inputs are fed into the first stage, S1
• k processing stages, S1 to Sk
Asynchronous model
• Uses a handshaking protocol between adjacent stages
• Ready and acknowledge signals
• Variable throughput and variable delays at each stage
Synchronous model
• Clocked latches act as the interface between stages
• On arrival of a clock pulse, the latches transfer data to the next stage
• Pipeline stages are combinational logic circuits
• Reservation table: a space-time diagram showing which pipeline stage is in use in each clock cycle
Reservation table
Clock and timing protocol
• τ: clock cycle time
• τi: time delay of stage Si
• d: time delay of a latch
• τ = τm + d, where τm = max{τi} is the longest stage delay; usually τm >> d
• Pipeline frequency f = 1/τ
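Clock skewing
• The same clock pulse may arrive at different stages with a time offset of s
• tmax and tmin: delays of the longest and shortest logic paths within a stage
• To avoid a race condition between two successive stages:
– τm > tmax + s
– d < tmin - s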
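Speedup
• A linear pipeline of k stages processes n tasks in k + (n-1) clock cycles: k cycles for the first task and one more cycle for each of the remaining n-1 tasks
• Tk = [k + (n-1)] τ
• A non-pipelined processor with the same total delay needs T1 = nkτ for n tasks
• Speedup is the ratio of non-pipelined time to pipelined time: Sk = T1 / Tk = nk / [k + (n-1)]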
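The timing formulas on this slide can be checked numerically. The following is a minimal sketch; the function names and the sample values (k = 4, n = 64) are illustrative assumptions, not part of the slides.

```python
def pipelined_time(k, n, tau):
    """Tk = [k + (n - 1)] * tau: time for n tasks on a k-stage pipeline."""
    return (k + (n - 1)) * tau


def nonpipelined_time(k, n, tau):
    """T1 = n * k * tau: time for n tasks without pipelining."""
    return n * k * tau


def speedup(k, n):
    """Sk = T1 / Tk = nk / [k + (n - 1)]; tau cancels out."""
    return (n * k) / (k + (n - 1))


if __name__ == "__main__":
    # Hypothetical example: 4 stages, 64 tasks.
    print(round(speedup(4, 64), 2))
    # The speedup approaches k as the number of tasks grows.
    print(round(speedup(4, 10**6), 4))
```

For a single task the speedup is 1, and it approaches k only for long task streams, which is why pipelining pays off on repeated work.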
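Optimal pipeline stages
• t: total time required by the non-pipelined program
• The same program on a pipeline with k stages:
– Clock period p = t/k + d, where d is the latch delay
– Cost = c + kh, where c is the cost of all logic stages and h is the cost of each latch
– Frequency f = 1/p
– Performance/cost ratio PCR = f / (c + kh)
• Efficiency Ek = speedup / k = n / [k + (n-1)]
• Pipeline throughput Hk = number of tasks / time = n / Tk = n / ([k + (n-1)] τ)
• Hk = Ek · f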
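To see how the performance/cost ratio picks a stage count, here is a small sketch. It takes c as the total cost of the logic stages and h as the cost per latch (an assumption about the symbols), and uses the stationary point of PCR = 1 / [(t/k + d)(c + kh)], which works out to k0 = sqrt(t·c / (d·h)). The numeric values are hypothetical.

```python
import math


def pcr(k, t, d, c, h):
    """Performance/cost ratio f / (c + k*h) with clock period p = t/k + d."""
    f = 1.0 / (t / k + d)
    return f / (c + k * h)


def optimal_stages(t, d, c, h):
    """Stage count maximizing PCR: set d/dk of (t/k + d)(c + k*h) to zero."""
    return math.sqrt((t * c) / (d * h))


if __name__ == "__main__":
    # Hypothetical values: t = 100, d = 1, c = 50, h = 2.
    k0 = optimal_stages(100, 1, 50, 2)
    print(k0)
    print(pcr(49, 100, 1, 50, 2) < pcr(50, 100, 1, 50, 2) > pcr(51, 100, 1, 50, 2))
```

In practice k must be an integer and is limited by how finely the logic can be divided, so k0 is a guide rather than a design rule.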
Reservation and latency analysis
• Feedforward connections of a pipeline
• Feedback connections of a pipeline
• Example: a 3-stage pipeline
• The output of the pipeline is not necessarily taken from the last stage
Reservation table
• Example of a reservation table for a function (multiple check marks may appear in a row and in a column)
• Shows the time-space flow of data through the pipeline
• The number of columns is the evaluation time of the function
• All initiations of a static pipeline use the same reservation table
• A dynamic pipeline may use a different reservation table for different initiations
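Latency analysis
• Latency: the number of clock cycles between two initiations of a pipeline. A latency of k means the two initiations are k clock cycles apart
• Collision: two or more initiations try to use the same pipeline stage at the same time
• Latencies that cause collisions are called forbidden latencies
• In a reservation table, the forbidden latencies are the distances between any two check marks in the same row
• Latency cycle: a latency sequence that repeats itself indefinitely
• Average latency: the sum of the latencies in a latency cycle divided by the number of latencies in it
• Collision-free scheduling: find latency cycles that avoid all forbidden latencies
• Collision vector: bit i is 1 if latency i is forbidden and 0 if it is permitted
• State transition diagram: shows the states reachable through permitted latencies
• Greedy cycles: cycles formed by always taking the minimum permitted latency out of each state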
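The steps from reservation table to state transition diagram can be sketched in code. The table below is a hypothetical 3-stage example (not taken from the slides), stored as one set of busy clock cycles per stage; the function names are illustrative.

```python
from itertools import combinations

# Hypothetical 3-stage reservation table: one set per stage (row),
# holding the clock cycles (columns, 0-based) in which that stage is busy.
TABLE = [{0, 5, 7}, {1, 3}, {2, 4, 6}]


def forbidden_latencies(table):
    """Distances between any two check marks in the same row."""
    return {b - a for row in table for a, b in combinations(sorted(row), 2)}


def collision_vector(table):
    """Bit string C_m ... C_1, where C_i = 1 if latency i is forbidden."""
    forbidden = forbidden_latencies(table)
    m = max(forbidden, default=0)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


def state_diagram(table):
    """Map each reachable state to {permitted latency: next state}."""
    cv = collision_vector(table)
    m = len(cv)
    initial = int(cv, 2) if cv else 0
    diagram, todo = {}, [initial]
    while todo:
        state = todo.pop()
        if state in diagram:
            continue
        arcs = {}
        for p in range(1, m + 1):
            if not (state >> (p - 1)) & 1:        # bit C_p is 0: latency p is permitted
                arcs[p] = (state >> p) | initial  # shift right p bits, OR with initial vector
        arcs[m + 1] = initial                     # any latency above m returns to the initial state
        diagram[state] = arcs
        todo.extend(arcs.values())
    return diagram


if __name__ == "__main__":
    print(sorted(forbidden_latencies(TABLE)))
    print(collision_vector(TABLE))
    for state, arcs in sorted(state_diagram(TABLE).items()):
        print(format(state, "07b"), {p: format(s, "07b") for p, s in arcs.items()})
```

The arc labelled m + 1 stands for every latency longer than the collision vector, since all of those lead back to the initial state.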
Pipeline schedule optimization
• Insert noncompute delay stages into the original pipeline
• This gives a new reservation table, and hence a new collision vector
• The minimal average latency (MAL) is bounded below by the maximum number of check marks in any row of the reservation table
• MAL <= average latency of any greedy cycle
• The average latency of any greedy cycle is bounded above by the number of 1’s in the collision vector plus 1
• Pipeline throughput: N tasks completed in n pipeline cycles give a throughput of N/n
• Pipeline efficiency: the percentage of time each pipeline stage is used over a long series of tasks
• Higher pipeline throughput corresponds to lower average latency
• Higher efficiency means less idle time
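The bounds on the minimal average latency can be illustrated with a short sketch. The collision vector 1011010 and the count of 3 check marks in the busiest row are hypothetical inputs chosen for illustration; `greedy_cycle` is an illustrative helper that always follows the smallest permitted latency.

```python
def greedy_cycle(cv, start=None):
    """Follow the smallest permitted latency from `start` until a state repeats.

    cv is the collision vector as a bit string C_m ... C_1.
    Returns the list of latencies on the cycle that is reached.
    """
    m = len(cv)
    initial = int(cv, 2)
    state = initial if start is None else start
    seen = {}   # state -> position in `path` at which it was first visited
    path = []   # latencies taken so far
    while state not in seen:
        seen[state] = len(path)
        # Smallest latency whose bit is 0; m + 1 is always permitted.
        p = next(q for q in range(1, m + 2) if q > m or not (state >> (q - 1)) & 1)
        path.append(p)
        state = initial if p > m else (state >> p) | initial
    return path[seen[state]:]


CV = "1011010"   # hypothetical collision vector C7 ... C1
MAX_MARKS = 3    # hypothetical: most check marks in any reservation-table row

cycle = greedy_cycle(CV)
average = sum(cycle) / len(cycle)
lower, upper = MAX_MARKS, CV.count("1") + 1

if __name__ == "__main__":
    print(cycle, average, lower, upper)
    print(greedy_cycle(CV, start=0b1011011))
```

Starting from a different state can reach a different greedy cycle, so a greedy cycle is not automatically the one with the minimal average latency.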
Instruction pipeline design
• Instruction execution phase
– Fetch
– Decode
– Issue
– Execute
– Write results
Mechanisms for instruction pipelining
• Prefetch buffers
– Sequential buffers
– Target buffers
• Buffers are used in pairs
• Operate first-in, first-out (FIFO)
– Loop buffer
• Saves fetch time
• Avoids unnecessary memory accesses
Mechanisms for instruction pipelining contd…
• Multiple functional units
– The stage with the maximum number of check marks in its reservation-table row becomes the bottleneck
– Use multiple copies of the bottleneck stage
– Reservation stations
• Operations wait there until their data dependencies are resolved
• Act as buffers
– Tag unit
– Multiple functional units
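Mechanisms for instruction pipelining contd…
• Internal data forwarding
• Store-load forwarding
– A load that follows a store to the same memory location is replaced by a register-to-register move
• Load-load forwarding
– The second of two loads from the same memory location is replaced by a register-to-register move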
Dot product example
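Mechanisms for instruction pipelining contd…
• Hazard avoidance
– D(I): domain, the set of inputs of instruction I
– R(I): range, the set of outputs of instruction I
– RAW hazard (flow dependence): a later instruction J reads what I writes, R(I) ∩ D(J) ≠ ∅
– WAR hazard (antidependence): a later instruction J writes what I reads, D(I) ∩ R(J) ≠ ∅
– WAW hazard (output dependence): I and J write the same location, R(I) ∩ R(J) ≠ ∅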
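The three hazard types can be tested with set intersections on the domain and range of two instructions, where I comes before J in program order. This is a minimal sketch; the function name and the register names in the example are illustrative assumptions.

```python
def hazards(domain_i, range_i, domain_j, range_j):
    """Return the hazards between instruction I and a later instruction J.

    domain_* is the set of inputs (registers read),
    range_* is the set of outputs (registers written).
    """
    found = set()
    if range_i & domain_j:   # J reads something I writes
        found.add("RAW")
    if domain_i & range_j:   # J writes something I reads
        found.add("WAR")
    if range_i & range_j:    # I and J write the same location
        found.add("WAW")
    return found


if __name__ == "__main__":
    # I: R1 = R2 + R3    J: R4 = R1 * R5   (J reads R1 written by I)
    print(hazards({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}))
    # I: R1 = R2 + R3    J: R2 = R4 + R5   (J overwrites R2 read by I)
    print(hazards({"R2", "R3"}, {"R1"}, {"R4", "R5"}, {"R2"}))
```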
Mechanisms for instruction pipelining contd…
• Static scheduling
• Handled by the compiler
• Example loop:
for (i = 1000; i > 0; i = i - 1)
    x[i] = x[i] + s;
• Loop unrolling is also a static scheduling method
• Replicate the loop body several times and adjust the loop termination code accordingly
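Loop unrolling is normally done by the compiler on machine code, but the transformation itself can be shown at source level. The sketch below unrolls the add-scalar loop by a factor of 4 (an arbitrary choice) and adds a clean-up loop for leftover elements; the function name is illustrative.

```python
def add_scalar_unrolled(x, s):
    """Add s to every element of x in place, with the loop body replicated 4 times."""
    i = len(x) - 1
    # Unrolled loop: four independent updates per iteration, one loop test.
    while i >= 3:
        x[i] += s
        x[i - 1] += s
        x[i - 2] += s
        x[i - 3] += s
        i -= 4
    # Clean-up loop for the 0 to 3 elements that remain.
    while i >= 0:
        x[i] += s
        i -= 1
    return x


if __name__ == "__main__":
    print(add_scalar_unrolled([1, 2, 3, 4, 5, 6], 10))
```

The benefit on a pipelined processor comes from fewer branch and loop-overhead instructions and from the larger block of independent operations available for scheduling.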
Static scheduling
• Determine whether loop unrolling would be useful
• Use different registers for each replicated loop body to avoid conflicts
• Determine whether loads and stores can be interchanged safely
• Schedule the code while preserving the dependences
Dynamic scheduling
• Hardware rearranges instruction execution at run time to reduce stalls
• The 2 techniques are
– Scoreboarding on the CDC 6600
– Tomasulo algorithm
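Tomasulo algorithm
• Register renaming (tagging) removes WAR and WAW hazards; RAW hazards are handled by holding an instruction in its reservation station until its operands are available
• Reservation stations fetch and buffer operands as soon as they are available, instead of reading them from the registers
• Results are broadcast on a common data bus to the waiting reservation stations
Tomasulo algorithm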
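The renaming idea can be shown in isolation. The sketch below is not the full Tomasulo algorithm (there are no reservation stations and no common data bus); it only gives every destination a fresh name so that WAR and WAW conflicts disappear while true RAW dependences are kept. The names P0, P1, ... and the three-operand instruction format are assumptions made for the example.

```python
from itertools import count


def rename(instructions):
    """Rename registers in a list of (dest, src1, src2) instructions."""
    fresh = (f"P{i}" for i in count())
    mapping = {}  # architectural register -> most recent renamed register

    def lookup(reg):
        # A register read before any write still needs a name.
        if reg not in mapping:
            mapping[reg] = next(fresh)
        return mapping[reg]

    renamed = []
    for dest, src1, src2 in instructions:
        s1, s2 = lookup(src1), lookup(src2)  # sources use the latest names
        mapping[dest] = next(fresh)          # every write gets a fresh name
        renamed.append((mapping[dest], s1, s2))
    return renamed


if __name__ == "__main__":
    program = [
        ("R1", "R2", "R3"),  # I1: R1 = R2 op R3
        ("R2", "R1", "R4"),  # I2: RAW on R1, WAR on R2 with I1
        ("R1", "R5", "R6"),  # I3: WAW on R1 with I1, WAR on R1 with I2
    ]
    for instr in rename(program):
        print(instr)
```

After renaming, I2 still reads the name written by I1, so the RAW dependence is preserved, while no two instructions share a destination.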
CDC scoreboarding
• Multiple functional units
• The scoreboard is a unit that keeps track of the registers needed by instructions waiting for the various functional units
• Centralized control
• When all the registers an instruction needs hold valid data, the scoreboard enables its execution
• The scoreboard releases the resources once the instruction has been executed
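Branch handling techniques
• Branch taken: the branch condition holds and control transfers to the target
• Branch target: the instruction to which control transfers
• Delay slot (b): the number of pipeline cycles wasted between a taken branch and its target; in general 0 <= b <= k-1
• p: probability that an instruction in the stream is a conditional branch
• q: probability that a conditional branch is taken
• Total branch penalty for n instructions: pqnbτ
• Effective pipeline throughput: Heff = n / Teff = nf / (k + n - 1 + pqnb)
• Upper bound as n → ∞ with b = k-1: Heff* = f / [pq(k-1) + 1]
• Performance degradation factor: D = (f - Heff*) / f = pq(k-1) / [pq(k-1) + 1]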
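The branch penalty model can be evaluated numerically. This sketch assumes the usual reading of the symbols (p is the fraction of conditional branches, q the fraction of those that are taken, b the cycles lost per taken branch, with b = k - 1 in the limiting case); the values k = 8, p = 0.2, q = 0.6 are hypothetical.

```python
def effective_throughput(n, k, f, p, q, b):
    """Heff = n*f / (k + n - 1 + p*q*n*b) for n instructions."""
    return n * f / (k + n - 1 + p * q * n * b)


def degradation(k, p, q):
    """D = pq(k-1) / [pq(k-1) + 1], the limit for n -> infinity with b = k-1."""
    x = p * q * (k - 1)
    return x / (x + 1)


if __name__ == "__main__":
    k, p, q, f = 8, 0.2, 0.6, 1.0  # hypothetical values, f normalized to 1
    print(round(degradation(k, p, q), 4))
    # For a long instruction stream the measured loss approaches D.
    h = effective_throughput(10**9, k, f, p, q, k - 1)
    print(round((f - h) / f, 4))
```

Deeper pipelines (larger k) lose more throughput per taken branch, which is the motivation for the prediction schemes that follow.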
Branch prediction-static scheme
• Solution 1: freeze or flush the pipeline, holding or deleting the instructions after the branch until the branch destination is known
• Solution 2: predict not taken
– Continue fetching instructions as if the branch were not taken
– If the branch is taken, discard the fetched instructions and restart at the target
• Solution 3: predict taken
– As soon as the branch is encountered, compute the target address and load it into the PC
– Best performance
• Solution 4: delayed branch, with the execution order
– Branch instruction
– Delay slot instruction(s)
– Branch target, if taken
Delay slot
Branch hazards-dynamic solution
• Prediction uses the recent history of branches
• Branch history table
• 3 classes of strategy
• Hardware: branch target buffer
• 2-bit prediction scheme
– Branch prediction buffer
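A 2-bit prediction scheme is commonly built from saturating counters, one per entry of the branch prediction buffer. The sketch below assumes that form; the table size, the indexing by `pc % entries`, and the initial counter value are illustrative choices, not details from the slides.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per entry: 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # assumed start state: weakly not taken

    def _index(self, pc):
        return pc % self.entries       # low-order address bits select the entry

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredictions(predictor, pc, outcomes):
    """Run a sequence of actual branch outcomes and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    # A loop branch taken 9 times, then not taken, executed twice.
    outcomes = ([True] * 9 + [False]) * 2
    print(mispredictions(TwoBitPredictor(), 0x40, outcomes))
```

Because a single loop exit moves the counter only from 3 to 2, the predictor still predicts taken when the loop is entered again, which is the advantage over a 1-bit scheme.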
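Delayed branches
• A delayed branch of d cycles allows at most d-1 useful instructions to be executed following the branch
• NOPs act as fillers when no useful instruction can be moved into a delay slot
• The delay-slot instructions execute whether or not the branch is taken, so the program result must remain the same in both cases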
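Arithmetic pipeline design
• Applied to speed up numerical calculations
• Computers work with finite precision
– Numbers exceeding the limit must be rounded off or truncated
– Overflow condition
– Underflow condition
• IEEE 754 format
• Numbers can be represented in fixed-point or floating-point form
• Fixed-point numbers
– Sign-magnitude, 1’s complement, 2’s complement
– 1’s complement has two representations of zero (the "dirty zero")
– Multiplying two n-bit numbers produces a product of up to 2n bits, so multiplication and division need 2n-bit register space
• Floating-point numbers
• Single precision
– X = (m, e) = m × r^e, with mantissa m, exponent e and radix r
– Sign bit: 1 bit
– Exponent: 8 bits, stored with a bias of 127 (normalized exponents range from -126 to +127)
– Mantissa: 23-bit fraction f, with an implied leading 1
• X = (-1)^s × 2^(e-127) × m, where e is the stored exponent field and m = 1.f
• Floating-point addition
• Floating-point multiplication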
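The single-precision layout can be inspected directly. The sketch below uses Python's standard `struct` module to get the 32-bit pattern of a float and splits it into the three fields; the helper names are illustrative, and `value_of` only handles normalized numbers (stored exponent between 1 and 254).

```python
import struct


def decode_single(x):
    """Split x, rounded to IEEE 754 single precision, into (sign, exponent field, fraction field)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    return sign, exponent, fraction


def value_of(sign, exponent, fraction):
    """(-1)^s * 2^(e-127) * 1.f for normalized numbers only."""
    mantissa = 1 + fraction / 2**23  # implied leading 1
    return (-1) ** sign * 2.0 ** (exponent - 127) * mantissa


if __name__ == "__main__":
    fields = decode_single(-0.15625)  # -0.15625 = -1.25 * 2^-3
    print(fields)
    print(value_of(*fields))
```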
Static arithmetic pipelines
• Scalar pipelines: one function at a time
• Vector or dynamic pipelines: more than one function at a time
• Multiple functional units
• The 3 stages of floating-point addition are
– Exponent comparison and equalization
– Fraction addition
– Normalization and rounding off the values
• Carry propagation adder
• Carry save adder
• Multiplication pipeline design
• Convergence division
– Q = N/D
– Generate the reciprocal of the divisor by an iterative process, then multiply: A/B = A × (1/B)
– Use the Newton-Raphson method to solve for 1/B: x(i+1) = x(i) - f(x(i)) / f'(x(i))