Advanced Computer Architecture Pipeline and Branch Prediction

This document describes a programming assignment involving the simulation of a pipelined machine. The objective is to evaluate the performance of the pipeline by implementing dependency tracking and forwarding, extending the pipeline to be superscalar, and integrating branch prediction. Part A involves implementing dependency tracking and forwarding for scalar and superscalar pipelines. Part B extends the pipeline to support different branch predictors, including always taken, gshare, gselect, and tournament predictors. The document provides details on the problem, objectives, implementation requirements, and files provided for the assignment.

Lab 2 Part A due: 09/15

Part B due: 09/22

CS 4290 / CS 6290 / ECE 4100 / ECE 6100

Advanced Computer Architecture

Lab 2: Dependency Tracking and Forwarding for 5-stage Superscalar Pipeline with Branch Prediction (10 Pts)

Part A Due: Friday, September 15th, 2023 (11:55 pm)

Part B Due: Friday, September 22nd, 2023 (11:55 pm)

Figure 1: 5-stage pipeline example.

This is an individual assignment. You can discuss this assignment with other classmates, but you
should code your assignment individually. You are NOT allowed to see the code of (or show your
code to) other students. We will be using code similarity detection software. If you think you
collaborated with another student closely enough to cause more code similarities than would
normally be expected, please declare it upfront, at the time of submission. Please note that
such a declaration does not excuse copying code.

OBJECTIVE
The objective of the second programming assignment is to evaluate the performance of a
pipelined machine. In particular, you will equip the code to check data dependencies in a pipeline,
implement a forwarding path, and extend your pipeline to be an “N-wide” superscalar machine.
The second part of the assignment integrates a branch predictor with the superscalar pipeline.

PROBLEM DESCRIPTION

The in-order five-stage pipeline is a variant of the five-stage pipeline we discussed in class, as
shown in the Figure above. It consists of Instruction Fetch (IF), Instruction Decode (ID), Execute
(EX), Memory (MEM) and Writeback (WB) stages. In addition, this pipeline implements an ISA
extension that supports a new type of ALU instructions with 3 source operands, such as ADD
<dest_reg> <src1_reg> <src2_reg> <src3_reg>.

We will use a trace-driven simulator that is strictly meant for doing timing simulation to estimate
the pipeline’s achieved performance. To keep the framework simple, we will not be doing any
functional simulation, which means the trace records that are fed to the pipelined machine do not
contain any data values, and your pipeline will not track any data values (in Registers, Memory,
or PC) either. Furthermore, the traces only contain the committed path of instructions. The
purpose of our simulation is to figure out how many clock cycles it takes to execute the given
instruction stream, for a variety of different microarchitectures such as with or without
forwarding and with a varying superscalar width (N).

For this assignment, we will assume that the register file employs a write on falling edge, which
means it is possible to write to the register file in the first half of the clock cycle and read from the
register file in the second half of the clock cycle. Therefore, there is no need to stall an instruction
in the decode stage if it has a RAW dependency with an instruction in the WB stage.
Consequently, there is also no need to implement data forwarding from the WB stage to the ID
stage.

You are provided with a trace reader, as well as a sample pipeline that simulates an N-wide
superscalar machine, without any dependence tracking. Your objectives are the following:

Part A: Assume that the pipeline has perfect branch prediction and does not suffer any stalls due
to control flow dependencies.

A.1 (2 points) Implement data dependency tracking and related stalls for a scalar machine (N=1).
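
The stall condition for A.1 can be sketched as follows. This is only an illustration under stated assumptions: the Op struct and the function names raw_hazard and id_must_stall are hypothetical, and the real record layout is defined in trace.h. Because the register file is written in the first half of the cycle and read in the second half, a producer in WB is deliberately not checked.

```cpp
// Hypothetical, simplified trace record; the real layout lives in trace.h.
struct Op {
    bool valid = false;                          // false models a pipeline bubble
    int  src_reg[3] = {0, 0, 0};
    bool src_needed[3] = {false, false, false};  // is the matching src_reg meaningful?
    int  dest_reg = 0;
    bool dest_needed = false;
};

// True if `consumer` reads a register that `producer` writes (RAW dependency).
bool raw_hazard(const Op& consumer, const Op& producer) {
    if (!consumer.valid || !producer.valid || !producer.dest_needed) return false;
    for (int i = 0; i < 3; ++i) {
        if (consumer.src_needed[i] && consumer.src_reg[i] == producer.dest_reg) {
            return true;
        }
    }
    return false;
}

// Scalar pipeline without forwarding: the op in ID waits while its producer
// is in EX or MEM. WB is not checked (write first half, read second half).
bool id_must_stall(const Op& id, const Op& ex, const Op& mem) {
    return raw_hazard(id, ex) || raw_hazard(id, mem);
}
```

Note that all three source operands are checked, since the ISA extension allows ALU instructions with three sources.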

A.2 (2 points) Generalize the above (A.1) to an N-wide superscalar pipeline. We will test for N=2,
although your code should be general enough to work for any reasonable value of N. Note that
for a superscalar machine you may have data dependencies not only from EX and MEM stages,
but also from older instructions that are in the ID stage.
Tip: Remember that you are building an in-order processor; hence, if at any point in time a younger
instruction is further ahead in the pipeline than an older instruction (which is perhaps stalled in
the ID stage), your implementation has an error.
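
One possible way to generalize the check to N lanes is sketched below. It assumes, purely for illustration, that lane 0 of each stage holds the oldest instruction (in the real simulator you would likely have to order instructions by op_id yourself). The Op struct and id_stalls are hypothetical names. Once an older lane stalls, every younger lane in ID stalls as well, which preserves in-order execution.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified op: register -1 means "operand not used".
struct Op {
    bool valid = false;
    int  src[3] = {-1, -1, -1};
    int  dest = -1;
};

// True if `c` reads the register that `p` writes.
bool raw_hazard(const Op& c, const Op& p) {
    if (!c.valid || !p.valid || p.dest < 0) return false;
    for (int s : c.src) {
        if (s == p.dest) return true;
    }
    return false;
}

// Per-lane stall flags for the ID stage (no forwarding).
// Assumption: within `id`, a lower index means an older instruction.
std::vector<bool> id_stalls(const std::vector<Op>& id,
                            const std::vector<Op>& ex,
                            const std::vector<Op>& mem) {
    std::vector<bool> stall(id.size(), false);
    bool older_stalled = false;
    for (std::size_t i = 0; i < id.size(); ++i) {
        bool hazard = older_stalled;  // never let a younger op pass an older one
        for (const Op& p : ex)  hazard = hazard || raw_hazard(id[i], p);
        for (const Op& p : mem) hazard = hazard || raw_hazard(id[i], p);
        for (std::size_t j = 0; j < i; ++j) {
            hazard = hazard || raw_hazard(id[i], id[j]);  // older op still in ID
        }
        stall[i] = id[i].valid && hazard;
        older_stalled = older_stalled || stall[i];
    }
    return stall;
}
```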

A.3 (2 points for CS6290/ECE6100, 3 points for CS4290/ECE4100) Implement Data Forwarding (from
both MEM and EX). Note that the existence of a forwarding path does not necessarily mean that a
producer in a later stage can always pass its value to a dependent instruction in an earlier stage.
For example, a Load instruction does not have its value available until the MEM stage, so you
cannot forward the value of a Load from the EX stage to the ID stage for an instruction dependent
on this Load. We will test A.3 for N=2, although your program should work for any reasonable
value of N.
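
The Load restriction can be expressed as a small predicate. The sketch below covers the scalar case only and uses hypothetical names (OpType, Stage, can_forward, id_must_stall_fwd); the actual op types are the ones listed in trace.h. For each source register it looks at the youngest producer first (EX before MEM) and stalls only when that producer cannot supply the value yet.

```cpp
// Hypothetical op types and record; the real ones are defined in trace.h.
enum class OpType { ALU, LD, ST, CBR, OTHER };
enum class Stage { EX, MEM };

struct Op {
    bool   valid;
    OpType type;
    int    src[3];  // -1 means "operand not used"
    int    dest;    // -1 means "no destination"
};

// Can a producer sitting in `stage` hand its result to an op in ID this cycle?
bool can_forward(const Op& producer, Stage stage) {
    if (stage == Stage::MEM) return true;  // ALU result or loaded data is ready
    return producer.type != OpType::LD;    // a Load in EX has no data yet
}

bool writes(const Op& op, int reg) { return op.valid && op.dest == reg; }

// Scalar pipeline with forwarding from EX and MEM.
bool id_must_stall_fwd(const Op& id, const Op& ex, const Op& mem) {
    if (!id.valid) return false;
    for (int s : id.src) {
        if (s < 0) continue;
        if (writes(ex, s)) {               // youngest producer wins
            if (!can_forward(ex, Stage::EX)) return true;
            continue;
        }
        if (writes(mem, s) && !can_forward(mem, Stage::MEM)) return true;
    }
    return false;
}
```

With this rule, a consumer directly behind a Load still loses one cycle, while a consumer behind an ALU operation does not stall at all.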

Part B: Extend your pipeline to support Branch Prediction. For this part, we will assume that the
machine has an idealized Branch Target Buffer (BTB), which identifies the conditional branches
(CBR) as soon as the instruction is fetched and also provides the correct target address. Your
objective is to consult the direction prediction on instruction fetch. If the prediction is correct, the
fetch unit continues to fetch subsequent instructions, otherwise the fetch unit stalls until the branch
resolves.
Tip: The reason you should stop fetching on misprediction is because the trace only contains
instructions that are committed (i.e., instructions on the correct execution path), so you cannot
fetch instructions on the wrong path and then flush them, as would happen in a real pipeline.
Therefore, you will only simulate the performance effect of mispredicting the branch direction.

B.1 (1 point for CS6290/ECE6100, 2 points for CS4290/ECE4100) Implement an “AlwaysTaken”
predictor and integrate it with your pipeline. We will evaluate your machine from A.3 with N=2.
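
As a rough illustration, an always-taken predictor needs no state at all. The struct and the count_mispredicts helper below are hypothetical and do not reproduce the interface in bpred.h; they only show that every not-taken branch counts as a misprediction under this policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone predictor; the lab's real interface is in bpred.h.
struct AlwaysTaken {
    bool predict(std::uint64_t /*pc*/) const { return true; }
    void update(std::uint64_t /*pc*/, bool /*taken*/) {}  // nothing to learn
};

// Counts mispredictions over a sequence of resolved branch directions.
std::size_t count_mispredicts(const std::vector<bool>& outcomes) {
    AlwaysTaken bp;
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        if (bp.predict(0) != taken) ++misses;
        bp.update(0, taken);
    }
    return misses;
}
```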

B.2 (1 point) Implement a gshare predictor, shown in the following figure, with HistoryLength =
14 (use the bottom 14 bits of the PC to XOR with the bottom 14 bits of the Global History Register,
GHR) and a PHT consisting of 2-bit counters, all initialized to the weakly taken state (10).

Figure 2: gshare branch predictor.
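
A minimal gshare sketch under the parameters above might look like the following. The class name, method names, and the point at which the GHR is shifted are assumptions made for illustration; they are not taken from bpred.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned      kHistoryLength = 14;
constexpr std::uint32_t kIndexMask = (1u << kHistoryLength) - 1;
constexpr std::uint8_t  kWeaklyTaken = 2;  // binary 10

// Hypothetical stand-alone gshare predictor.
class GShare {
public:
    GShare() : pht_(std::size_t{1} << kHistoryLength, kWeaklyTaken) {}

    // Counter values 2 and 3 (binary 10 and 11) predict taken.
    bool predict(std::uint64_t pc) const { return pht_[index(pc)] >= 2; }

    void update(std::uint64_t pc, bool taken) {
        std::uint8_t& ctr = pht_[index(pc)];
        if (taken) {
            if (ctr < 3) ++ctr;  // saturate at strongly taken (11)
        } else {
            if (ctr > 0) --ctr;  // saturate at strongly not taken (00)
        }
        // Shift the resolved direction into the global history register.
        ghr_ = ((ghr_ << 1) | (taken ? 1u : 0u)) & kIndexMask;
    }

private:
    // Bottom 14 PC bits XOR bottom 14 GHR bits.
    std::uint32_t index(std::uint64_t pc) const {
        return (static_cast<std::uint32_t>(pc) ^ ghr_) & kIndexMask;
    }

    std::vector<std::uint8_t> pht_;
    std::uint32_t ghr_ = 0;
};
```

The counter is updated with the index computed before the history shifts, so prediction and update for the same branch touch the same PHT entry as long as no other branch is resolved in between.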

B.3 (1 point) [Required for CS6290/ECE6100. Optional for ECE4100/CS4290 (extra 1 point)]
Implement a gselect 7/7 predictor, shown in the following figure. The predictor has a configuration
of HistoryLength = 14 (use the bottom 7 bits of the PC to concatenate with the bottom 7 bits of
the Global History Register, GHR) and a PHT consisting of 2-bit counters, initialized to the weakly
taken state (10). Please note that the PC bits are followed by the GHR bits, i.e., index[13:7] =
PC[6:0] and index[6:0] = GHR[6:0], as indicated in the figure.

Figure 2: gselect 7/7 branch predictor.
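
The index layout index[13:7] = PC[6:0], index[6:0] = GHR[6:0] can be written as a one-line helper. The function name gselect_index is hypothetical.

```cpp
#include <cstdint>

// gselect 7/7 index: PC[6:0] in the upper 7 bits, GHR[6:0] in the lower 7 bits.
std::uint32_t gselect_index(std::uint64_t pc, std::uint32_t ghr) {
    const std::uint32_t pc_bits  = static_cast<std::uint32_t>(pc & 0x7F);
    const std::uint32_t ghr_bits = ghr & 0x7F;
    return (pc_bits << 7) | ghr_bits;
}
```

For example, PC[6:0] = 0000101 and GHR[6:0] = 0000011 give index 00001010000011, which is (5 << 7) | 3 = 643. The counters and the GHR update can be handled the same way as for gshare; only the index function differs.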

B.4 (1 point) [Required for CS6290/ECE6100. Optional for ECE4100/CS4290 (extra 1 point)]
Implement a tournament predictor, shown in the following figure. The predictor has a
HistoryLength = 10 (use the bottom 10 bits of the PC, PC[9:0]) and a PHT consisting of 2-bit
counters, initialized to the weakly taken state (10). The tournament predictor will select the
GShare Predictor’s prediction if the counter is in the not taken state and will select the GSelect
Predictor’s prediction otherwise. When a branch is resolved, the GShare and GSelect predictors
are updated normally. The Tournament PHT is updated only if the GShare prediction differs from
the GSelect prediction, and the tournament counter shifts toward the direction of the predictor that
matches the branch resolution.

Figure 3: GShare-GSelect tournament branch predictor.
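
The selection and update rules for the tournament PHT can be sketched as below. The class TournamentChooser is hypothetical and takes the two component predictions as inputs instead of owning the GShare and GSelect predictors. It assumes that counter values 2 and 3 (the taken states) select GSelect, so the counter is incremented when GSelect was right and decremented when GShare was right.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical chooser table indexed by PC[9:0].
class TournamentChooser {
public:
    TournamentChooser() : pht_(std::size_t{1} << 10, std::uint8_t{2}) {}  // weakly taken (10)

    // Not-taken states (0, 1) pick GShare; taken states (2, 3) pick GSelect.
    bool predict(std::uint64_t pc, bool gshare_pred, bool gselect_pred) const {
        return pht_[index(pc)] >= 2 ? gselect_pred : gshare_pred;
    }

    // Called when the branch resolves. The component predictors are updated
    // elsewhere; the chooser only moves when the two of them disagreed.
    void update(std::uint64_t pc, bool gshare_pred, bool gselect_pred, bool taken) {
        if (gshare_pred == gselect_pred) return;
        std::uint8_t& ctr = pht_[index(pc)];
        if (gselect_pred == taken) {
            if (ctr < 3) ++ctr;  // GSelect was right: move toward GSelect
        } else {
            if (ctr > 0) --ctr;  // GShare was right: move toward GShare
        }
    }

private:
    static std::size_t index(std::uint64_t pc) {
        return static_cast<std::size_t>(pc & 0x3FF);
    }

    std::vector<std::uint8_t> pht_;
};
```

Since every counter starts in the weakly taken state, this sketch initially follows GSelect, and a single disagreement that GShare wins is enough to switch that entry over to GShare.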

WHAT IS PROVIDED:
The simulator directory consists of sources and traces (note that these are different traces than
Lab1, as we need to do dependency tracking). The src directory contains the source code that
you will modify. The key files are the following:

1. sim.cpp and trace.h

The sim.cpp file is responsible for opening the trace, initialization, instantiating and
executing the pipeline until completion. The trace.h file serves a similar purpose as in
Lab1, however it has a few additional fields needed for this assignment.

2. pipeline.cpp/.h
These files contain the Pipeline class, internal structures and methods implementing the
pipeline functionality. The simulator is a series of latches storing the operands on
completion of the pipeline stages. You need to implement the functions pipe_cycle_IF()
… pipe_cycle_WB() to accommodate pipeline functionality and dependency handling. You
may add any additional structures you require. You also need to update the variables
marked stat_* at the right conditions in the code.

3. bpred.cpp/.h
These files contain the branch predictor interfaces. The interface contains only three
functions: the predictor initialization function, a function for getting the predicted value, and
a function for updating the predictor. You need to implement these functions according to
the branch prediction policy used in each case.

It is strongly recommended that you go through all files carefully and make sure you understand
the simulator’s structure before you proceed to write any code.

How to run the simulator:

1) Download the zipfile and type “unzip Lab2.zip”
2) Type “cd Lab2/src”
3) Type “make”
4) Type “./sim -h” for command line options (pipewidth, bpredpolicy, etc.)
5) Type “./sim ../traces/mcf.ptr.gz” (to test the current pipeline for N=1)
6) Type “./sim -pipewidth 2 ../traces/mcf.ptr.gz” (to test the current pipeline for N=2)

For Part A, you need to modify the pipe_cycle_* functions in pipeline.cpp. For Part B, you need
to add the data structures in bpred.h, functionality in bpred.cpp, and the pipe_check_bpred
function in pipeline.cpp. You will also need to implement the stall of fetch on branch mispredictions
and release the stall when the branch resolves (when the branch is in the MEM stage; however,
you can fetch only in the next cycle).
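
One way to model the fetch stall and its one-cycle release delay is sketched below. FetchStall and its methods are hypothetical names, not part of the provided code. The release is recorded when the mispredicted branch is seen in MEM but only takes effect at the end of that cycle, so fetch resumes in the following cycle.

```cpp
#include <cstdint>

// Hypothetical helper tracking a fetch stall caused by one mispredicted branch.
struct FetchStall {
    bool          stalled = false;
    bool          release_pending = false;
    std::uint64_t branch_op_id = 0;

    // Call from IF when the fetched branch turns out to be mispredicted.
    void on_mispredict(std::uint64_t op_id) {
        stalled = true;
        release_pending = false;
        branch_op_id = op_id;
    }

    // Call from MEM for each op in that stage; op_id identifies the op uniquely.
    void on_mem(std::uint64_t op_id) {
        if (stalled && op_id == branch_op_id) release_pending = true;
    }

    // Call from IF before fetching.
    bool can_fetch() const { return !stalled; }

    // Call once after all stages have run for the current cycle.
    void end_cycle() {
        if (release_pending) {
            stalled = false;
            release_pending = false;
        }
    }
};
```

The branch is matched by op_id and not by instruction address, because several operations in the trace can share the same address.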

We have provided a script called runall.sh that can run all of the experiments (A1, A2, A3, B1,
B2, B3, B4) for all four traces in the trace directory in a single invocation. This script is located
in the Lab_2/scripts directory. You may need to do “chmod +x runall.sh” before running this script.

Reference output for testing:

We provide an additional small trace (sml) in the traces directory, along with a sample expected
output for it in the reference directory, as a point of comparison for the outputs produced by your
solution. The reference directory also includes a reference output for the gcc trace.

Local and Gradescope autograders:

We provide a Python script (grade.py) in the scripts directory to generate a score based on the
reference outputs for the small and gcc traces. Compile the project and run python3 grade.py for
the CS6290/ECE6100 score, and python3 grade.py -u for the CS4290/ECE4100 score. A simple
autograder on Gradescope runs a similar script after submitting the required files. Please note
that these autograders are not comprehensive and your code will be tested against additional
traces.

WHAT to SUBMIT (on Gradescope):

For Part A
- pipeline.cpp
- bpred.cpp (implementation not required for bpred.cpp in Part A)

We will provide a sample solution for Part A after its deadline. Please use this (or your original
one if it is correct) to implement Part B.

For Part B
- pipeline.cpp
- bpred.cpp

Note for CS4290/ECE4100 students: You are not required to do B.3 and B.4. However, you
can still choose to do them for Extra Credit worth 1 point each.

REFERENCE MACHINE:
We will use the virtual machine oortcloud.cc.gatech.edu as a reference machine. You should
be able to connect to it using ssh and transfer files between this machine and your local
machine using scp, following the same steps as in Lab 1.

Before submitting your code, ensure that it compiles on this machine and generates the desired
output. To receive full credit, the output must contain only what the print_stats function
produces, without any extra printf statements. Please follow the submission instructions.

NOTE: It is impractical for us to support other platforms such as Mac, Windows, Ubuntu, etc.

FAQ:
1. Reason for Instruction Address not being unique in the Trace
During Trace generation, the complex x86 instructions having multiple operations at a
particular address were converted to simpler operations having the types provided in the trace
header file. These simpler instructions would then have the same instruction address. The
instruction address is thus not a unique identifier for an operation. Instead, op_id is supposed
to be used for that.

2. Data Forwarding for operations with condition codes or belonging to the OTHER op-type with
destination
Handling data forwarding for the above is similar to the handling of ALU operations. Load
instructions having cc_write can only forward their condition codes from the MEM stage.

3. Meaning of *_needed fields in the trace structure

These are binary 1 / 0 values, informing whether src1, src2, src3, and dest fields are valid in
an operation read from the trace file. If these are ‘1’, the corresponding values in the src1,
src2, src3, and dest fields represent the register being read from or written to.

4. Meaning of cc_read and cc_write

Consider the following operation
if (condition operation)
The condition operation implicitly writes to a condition 'status' register. Such an operation
would have cc_write set to 1. The following branch instruction based on the condition would
have the cc_read set to 1.
cc_read and cc_write are 1 / 0 values. Only branches perform a cc_read. The reading takes
place in the Instruction Decode stage, similar to the source register values
(Refer: http://en.wikipedia.org/wiki/Status_register).
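
The condition-code dependency can be treated like one extra register. The sketch below is a scalar illustration with hypothetical names (Op, OpType, cc_must_stall); it combines the cc_read / cc_write flags with the rule that a Load can only forward its condition codes from MEM.

```cpp
// Hypothetical op types and record; the real ones are defined in trace.h.
enum class OpType { ALU, LD, ST, CBR, OTHER };

struct Op {
    bool   valid;
    OpType type;
    bool   cc_read;   // op reads the condition codes (only branches do)
    bool   cc_write;  // op writes the condition codes
};

// Should the op in ID stall because of a condition-code dependency?
bool cc_must_stall(const Op& id, const Op& ex, const Op& mem, bool forwarding) {
    if (!id.valid || !id.cc_read) return false;
    const bool ex_writes  = ex.valid && ex.cc_write;
    const bool mem_writes = mem.valid && mem.cc_write;
    if (!forwarding) return ex_writes || mem_writes;
    // With forwarding, the youngest writer (EX) supplies the codes unless it
    // is a Load, whose condition codes are only available from MEM.
    return ex_writes && ex.type == OpType::LD;
}
```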

5. What is the pipe_print_state function in pipeline.cpp?

This function is provided as a helpful tool for debugging purposes, and you can choose to use
it while developing your solution. Please make sure you do not have a call to this function
in your submitted solution. Leaving this on will result in additional printed statements, which,
as mentioned earlier, will result in you not receiving full credit.

6. Why are the five pipeline stages invoked in reverse order?

You will realize that doing so simplifies flow control and handling backpressure. In a pipeline,
if stage n needs to stall, all previous stages will also need to stall. Therefore, it is more practical
to move things forward from the end towards the start of the pipeline.
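
A toy model makes the benefit of the reverse order visible. MiniPipe below is a hypothetical scalar pipeline with four latches, unrelated to the provided Pipeline class. Because WB runs first and IF runs last, each latch has already been drained by the later stage before the earlier stage tries to fill it, and a stalled ID automatically blocks IF.

```cpp
#include <array>

struct Latch {
    bool valid;
    int  op_id;
};

// Hypothetical scalar pipeline. latch[0]: IF->ID, latch[1]: ID->EX,
// latch[2]: EX->MEM, latch[3]: MEM->WB.
struct MiniPipe {
    std::array<Latch, 4> latch{};  // all latches start empty
    int next_op = 1;
    int retired = 0;

    // Advance an op by one latch if the destination latch is free.
    void move(int from, int to) {
        if (latch[from].valid && !latch[to].valid) {
            latch[to] = latch[from];
            latch[from].valid = false;
        }
    }

    // One clock cycle, stages invoked from the end of the pipeline to the start.
    void cycle(bool id_stall) {
        if (latch[3].valid) {        // WB: retire and free the last latch
            ++retired;
            latch[3].valid = false;
        }
        move(2, 3);                  // MEM
        move(1, 2);                  // EX
        if (!id_stall) move(0, 1);   // ID: a stalled op stays in latch[0]
        if (!latch[0].valid) {       // IF: fetch only if ID has drained latch[0]
            latch[0] = Latch{true, next_op++};
        }
    }
};
```

In this model the first op retires in the fifth cycle, and a stall in ID leaves latch[0] occupied so that IF fetches nothing in that cycle, without any explicit stall signal being passed backwards.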
