Super Scalar Architecture With Dynamic Branch Prediction

This document describes a proposed super scalar architecture with dynamic branch prediction to improve CPU throughput. A super scalar processor can execute multiple instructions per clock cycle by dispatching instructions to redundant functional units simultaneously. Dynamic branch prediction predicts branch outcomes based on prior computational history. The proposed approach combines a super scalar architecture with a 2-bit dynamic branch predictor. Benchmark results show the combined approach improves clock cycles per instruction (CPI) over a basic 5-stage pipeline, increasing processor throughput.

Super Scalar Architecture with Dynamic Branch Prediction

Asish Skaria, Raghuveer Gopalakrishnan, Nikhil Vasanthakumar, Ashwath Narayan

University of Florida, Gainesville, USA

Abstract – As a novel approach, we propose a Super Scalar architecture with Dynamic Branch Prediction that exploits instruction-level parallelism to increase CPU throughput at the same clock rate. A Super Scalar processor, unlike a scalar processor, executes more than one instruction per clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Branch prediction, on the other hand, predicts the outcome of conditional branch instructions. Accurate branch prediction techniques are essential for throughput enhancement. In dynamic branch prediction, the hardware adjusts the prediction while execution proceeds; the prediction is based on the computational history of the program.

The results show that the CPI, and thereby the throughput, of the processor improves in comparison to the simple five-stage pipeline processor, which executes one instruction every clock cycle.

Keywords: Super Scalar, Dynamic Branch Predictor, Reorder Buffer, Branch Table Buffer.

1. INTRODUCTION

Exploitation of parallelism is essential in enhancing the throughput of a processor. The five-stage pipelined processor is one of the simplest designs that achieves parallelism. In this simple processor, parallelism is achieved by beginning the fetch and decode steps of an instruction before the prior instruction finishes its execution. Super Scalar architecture extends the concept of instruction pipelining and decreases the idle time of the CPU components.

Super Scalar architecture includes a long instruction pipeline and multiple identical execution units. In Super Scalar processors, multiple instructions are fetched and passed to a dispatcher. The dispatcher compares the instructions and decides whether they can be executed simultaneously. It then forwards the independent instructions to the available execution units of the processor, thereby executing them in parallel. The more instructions the Super Scalar processor is able to dispatch simultaneously, the more instructions are completed in a given cycle.

The intricacy of Super Scalar architecture lies in the design of an effective dispatch unit. The dispatcher needs to determine quickly and correctly whether instructions can be executed in parallel, and to dispatch them in such a way as to keep as many execution units busy as possible.

Branch predictors are crucial in today’s processors for achieving high performance. Branch predictors are broadly classified as Static Branch Predictors and Dynamic Branch Predictors. A static predictor predicts the outcome of a branch based solely on the branch instruction, whereas a dynamic predictor bases its prediction on the dynamic history of the branch instruction. Dynamic branch prediction can be implemented in many ways; we use a 2-bit Dynamic Branch Predictor in our Super Scalar architecture to enhance the performance of the processor.

The performance parameter (CPI) for a set of benchmarks is calculated using the simple five-stage pipeline and compared with the Super Scalar architecture.

2. RELATED WORK

Currently, extensive research is being done to improve the performance of superscalar architecture. Superscalar architectures have the capacity to reduce the clock cycles per
instruction by fetching more instructions, but the pipeline stages must improve as well to exploit the fetch bandwidth. Recent studies [1] [5] have indicated that the performance of a large instruction window that receives all the renamed registers can be substantially improved by partitioning the window into several small blocks, each holding a dynamic code sequence. Thus the performance has improved by a factor of 1.5 to 3. Loops also show a strong tendency to exhibit vector-like behavior, as evident in our optimization, so vector tables have been used.

Branch prediction in superscalar architectures is a critical parameter that affects performance. A branch misprediction can considerably degrade the performance of the superscalar processor. Dynamic scheduling is another aspect of prime importance to performance. Active studies are therefore being done on various branch prediction schemes, such as static and dynamic branch prediction, and on the effect of branch target buffers. A branch target buffer (BTB), which reduces the performance penalty of branches in pipelined processors by predicting the path of the branch and caching information used by the branch, is analyzed in [2]. The focus is on implementing a BTB with a limited number of bits. A method for discarding branches from the BTB is examined. This method discards the branch with the smallest expected value for improving performance; it outperforms the least recently used (LRU) strategy by a small margin, at the cost of additional complexity. Secondly, it resolves what information is to be stored in the buffer. A BTB entry can consist of one or more of the following: branch tag, prediction information, the branch target address, and instructions at the branch target. Various BTB designs, with one or more of these fields, are evaluated and compared.

Handling multiple instructions at a time reduces the clock cycles per instruction significantly. The two most popular architectures that make use of multiple instruction issue are the superscalar and VLIW architectures. In the superscalar processor, the hardware decides at run time which instructions are to be issued in parallel, while the VLIW processor re-orders the original sequential code into fixed-size instruction groups that are fetched and issued in parallel. Gordon Steven et al. [3] analyzed the advantages and limitations of each of these processors in exploiting instruction-level parallelism. They further proposed that, for the processors to achieve full performance, a confluence of both VLIW and superscalar architecture is required. Thus a hybrid processor was designed that incorporated the aggressive run time instruction scheduling in the VLIW and the object code compatibility of the super scalar architecture over a wide range of implementations, and that removes any requirement for no-ops.

A lot of focus has also been on improving the fetching of instructions in a super scalar architecture. A fetch mechanism is better if it provides higher performance, but also if it is less complex, takes fewer resources, requires less chip area, or consumes less power. Oliverio J. Santana et al. [4] designed a novel fetch engine based on the execution of long streams of sequential instructions. An instruction stream is a sequential run of instructions from the target of a taken branch to the next taken branch. A single instruction stream may contain multiple basic blocks as long as all the intermediate branches are not taken. As such, an instruction stream is fully defined by its starting instruction address and its length, since the behavior of the branches contained inside the stream is implicit in the definition: all intermediate branches are not taken, while the terminating branch is always taken.

3. METHODOLOGY

3.1 FIVE STAGE PIPELINE

The classic five-stage pipeline has five major stages for executing an instruction. The stages are:
1. Instruction Fetch.
2. Instruction Decode and Register Fetch.
3. Execute.
4. Memory Access.
5. Write Back.
A five-stage pipeline can be thought of as a series of data paths shifted in time. The simple five-stage pipeline fetches and
executes only one instruction per cycle. Though simple pipelining helps in reducing the Clock Cycles Per Instruction (CPI), the CPI can be improved further by fetching and executing multiple instructions per cycle. This leads to the design of a new architecture, the Super Scalar architecture, which improves CPI by exploiting the aforesaid parameters.

Figure 1: Simple five stage pipeline

3.2 SUPER SCALAR ARCHITECTURE

Super Scalar processors dynamically issue multiple instructions each clock cycle from a conventional linear instruction stream. There are three distinct sections in a Super Scalar pipeline:
1. In-order Section: Instruction Fetch and Decode.
2. Out-of-order Section: Issue and Execution stage.
3. In-order Section: Retirement and Write Back stage.

Figure 2: Super Scalar Architecture

Figure 2 shows the main components of the Super Scalar architecture. The IF stage fetches multiple instructions simultaneously and forwards them to the decode stage for further execution. The IF stage uses the entries in the 2-bit Branch Table Buffer (BTB) to determine the next target address for the instruction fetch process. The BTB stores branch and jump addresses, their target addresses, and their prediction information.

BRANCH ADDRESS    TARGET ADDRESS    PREDICTION BITS
...               ...               ...

Figure 3: Branch Target Buffer (BTB)

We use a dynamic 2-bit branch predictor to compute the prediction bits and to decide the outcome of a branch. Dynamic branch predictors rely on the computational history of the program to predict a conditional branch. During the start-up phase of program execution, where a static branch predictor might be effective, the history information is gathered; dynamic branch prediction then becomes effective. In general, dynamic branch prediction gives better results than static prediction, but at the cost of increased hardware complexity.

Figure 4: State Diagram of a 2-bit Predictor.

The above figure shows the state flow in a 2-bit predictor. The state is incremented if the prediction is correct and decremented if a misprediction occurs. In a 2-bit predictor a prediction must miss twice before it is changed. The two bit
predictor is implemented in the BTB by assigning two state bits to each entry in the BTB. The BTB is indexed using the branch address to read the prediction bits and thereby determine the predicted outcome of the conditional branch.

The Instruction window and the Issue stages separate the Decode stage from the Execution units of the architecture. The Instruction window holds all the instructions waiting to be dispatched to the execution stage. We use Issue logic that examines the waiting instructions in the Instruction window. If no dependencies or hazards are involved, the logic simultaneously assigns a number of instructions to the execution units, up to a maximum Issue bandwidth. The program order of the issued instructions is stored in a reorder buffer. Instruction issue from the instruction window is out-of-order.

Instructions that are independent and have all their operands ready are dispatched to the execution units. Dispatch is usually not a pipeline stage. Both dispatch and execution are done out-of-order. An instruction is completed when the execution stage finishes the computation and the result is made available for forwarding and buffering. Instruction completion is out of program order.

Once an instruction is completed, the next phase is the commit phase. Committing an operation means that the results of the operation have been made permanent and the operation is retired from the scheduler. We implement the Write Back stage in order. Re-order buffers are used to achieve this in-order behavior. The re-order buffer keeps the original program order of the instructions and allows result serialization during the Write Back stage. The re-order buffer is implemented as a circular FIFO buffer; the buffer entries are allocated in the issue stage and deallocated serially when the instruction retires. State bits in the re-order buffer indicate whether the instruction has completed its execution.

4. EXPERIMENT RESULTS

Various benchmark programs are simulated on our Super Scalar based instruction set simulator. In the software architecture of the simulator, an assembler is invoked which converts the assembly language programs into machine language. The output of the assembler is in hexadecimal format. This assembler output, a text file, is then passed to the simulator as input. The simulator simulates the machine language program and creates two output text files, “Result.txt” and “memory.txt”, containing the register and memory location values respectively. The Clock Cycles Per Instruction (CPI) and Clock value are also displayed in the output files.

5. CONCLUSION AND FUTURE WORK

A Super Scalar architecture is proposed as an alternative to the five-stage pipeline architecture. The architectures were experimentally verified on a few benchmarks, and the performance analysis was made and plotted. The Super Scalar architecture, which executes multiple instructions per clock cycle, shows improvement in the CPI value, thereby improving the throughput and performance of the processor. The performance of the Super Scalar architecture will be further improved and studied by incorporating more complex branch prediction schemes such as correlating branch predictors, which decide the branch outcome based on the history of other branch instructions in the program as well. The number of redundant functional units will also be increased, and more instructions will be fetched simultaneously.

REFERENCES

[1]. Vajapeyam, S. and Mitra, T. 1997. Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences. Proceedings of the 24th Annual International Symposium on Computer Architecture (Denver, Colorado, United States, June 01 - 04, 1997). ISCA '97. ACM, New York, NY, 1-12. DOI= http://doi.acm.org/10.1145/264107.264119

[2]. Perleberg, C. H. and Smith, A. J. 1993. Branch Target Buffer Design and Optimization. IEEE Trans. Comput. 42, 4 (Apr. 1993), 396-412. DOI= http://dx.doi.org/10.1109/12.214687

[3]. Steven, G., Christianson, B., Collins, R., Potter, R., and Steven, F. A superscalar architecture to exploit instruction level parallelism. University of Hertfordshire, Hatfield, Herts. AL10 9AB, UK. Received 30 November 1995; revised 19 August 1996; accepted I August 1996.

[4]. Santana, O. J., Ramirez, A., Larriba-Pey, J. L., and Valero, M. 2004. A low-complexity fetch architecture for high-performance superscalar processors. ACM Trans. Archit. Code Optim. 1, 2 (Jun. 2004). http://doi.acm.org/10.1145/1011528.1011532

[5]. Wallace, S. and Bagherzadeh, N. 1994. Performance Issues of a Superscalar Microprocessor. Proceedings of the 1994 International Conference on Parallel Processing - Volume 01 (August 15 - 19, 1994). ICPP. IEEE Computer Society, Washington, DC, 293-297. DOI= http://dx.doi.org/10.1109/ICPP.1994.160

http://en.wikipedia.org/wiki/Superscalar
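
As an illustration of the 2-bit predictor stored in the BTB (Section 3.2), the following is a minimal Python sketch and not the simulator used in this work. It assumes the common saturating-counter encoding: states 0 to 3, predict taken when the state is 2 or 3, move up on a taken branch and down on a not-taken branch. This numbering may differ from the state diagram in Figure 4. The names BranchTargetBuffer, predict, update and misprediction_count, the dictionary-based table, and the initial state are all hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: branch address -> (target address, 2-bit state)."""

    def __init__(self, initial_state=1):
        # Assumption: a new entry starts in state 1 (weakly not taken).
        self.initial_state = initial_state
        self.entries = {}

    def predict(self, branch_addr):
        """Return (predicted_taken, target); a BTB miss predicts not taken."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            return False, None
        target, state = entry
        return state >= 2, target

    def update(self, branch_addr, taken, target):
        """Train the entry with the resolved outcome of the branch."""
        _, state = self.entries.get(branch_addr, (target, self.initial_state))
        if taken:
            state = min(state + 1, 3)  # saturate at strongly taken
        else:
            state = max(state - 1, 0)  # saturate at strongly not taken
        self.entries[branch_addr] = (target, state)


def misprediction_count(outcomes, branch_addr=0x40, target=0x10):
    """Replay taken/not-taken outcomes of one branch and count misses."""
    btb = BranchTargetBuffer()
    misses = 0
    for taken in outcomes:
        predicted, _ = btb.predict(branch_addr)
        if predicted != taken:
            misses += 1
        btb.update(branch_addr, taken, target)
    return misses
```

Under these assumptions, a loop branch that is taken nine times and then falls through once is mispredicted only on its first and last iteration, and an entry in the strongly taken state keeps predicting taken after a single not-taken outcome.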
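
The Issue logic of Section 3.2, which scans the Instruction window and issues hazard-free instructions up to the Issue bandwidth, could be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' implementation: there is no register renaming, an instruction is blocked by any RAW, WAW or WAR hazard against an older instruction still in the window (including one selected in the same cycle), and registers not written by any instruction in the window are treated as ready. The Instr tuple and the function select_for_issue are hypothetical names.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination and a tuple of sources.
Instr = namedtuple("Instr", ["name", "dest", "srcs"])


def select_for_issue(window, issue_width):
    """Pick up to issue_width hazard-free instructions from the window.

    The window is a list in program order; the result may skip blocked
    instructions, so issue is out of order.
    """
    issued = []
    older_dests = set()  # registers written by older window instructions
    older_srcs = set()   # registers read by older window instructions
    for instr in window:
        raw = any(src in older_dests for src in instr.srcs)
        waw = instr.dest in older_dests
        war = instr.dest in older_srcs
        if len(issued) < issue_width and not (raw or waw or war):
            issued.append(instr)
        # Conservative: every older instruction, issued this cycle or not,
        # still constrains the younger ones that follow it.
        older_dests.add(instr.dest)
        older_srcs.update(instr.srcs)
    return issued
```

For example, with a window holding r1 = r2 + r3, r4 = r1 + r5, r6 = r7 + r8 and r9 = r6 + r2, the first and third instructions are selected together, while the second and fourth wait on their producers.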
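
The re-order buffer described in Section 3.2 (a circular FIFO whose entries are allocated at issue, marked by a state bit on completion, and deallocated serially at retirement) might look like the following minimal sketch. It is an assumption-laden illustration rather than the simulator's code: the class name ReorderBuffer, its method names, and the default capacity of four entries are hypothetical.

```python
class ReorderBuffer:
    """Circular FIFO that retires instructions in program order."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot: [instruction, done_bit]
        self.head = 0                   # oldest entry, next to retire
        self.tail = 0                   # next free slot
        self.count = 0

    def allocate(self, instruction):
        """Reserve a slot at issue time; return its index, or None if full."""
        if self.count == self.capacity:
            return None
        index = self.tail
        self.slots[index] = [instruction, False]
        self.tail = (self.tail + 1) % self.capacity
        self.count += 1
        return index

    def complete(self, index):
        """Set the state bit when execution finishes (in any order)."""
        self.slots[index][1] = True

    def retire(self):
        """Free completed entries from the head only, preserving order."""
        retired = []
        while self.count > 0 and self.slots[self.head][1]:
            retired.append(self.slots[self.head][0])
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
        return retired
```

Because retire only ever frees the head entry, an instruction that completes early stays in the buffer until every older instruction has completed, which is what serializes results for the in-order Write Back stage.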
