Investigating Instruction Pipelining
Introduction
Objectives
At the end of this lab you should be able to:
Demonstrate the difference between pipelined and sequential processing of CPU instructions
Explain pipeline data dependencies and data hazards
Describe a pipeline technique to eliminate data hazards
Demonstrate the benefits of the compiler's "loop unrolling" optimization for instruction pipelining
Describe how the compiler re-arranges instructions to minimize data dependencies
Show the use of a jump-predict table for pipeline optimization
Basic Theory
Modern CPUs incorporate instruction pipelines, which can work on different
stages of several instructions in parallel, thus improving the overall
performance of the CPU. However, most programs include instructions that do
not readily lend themselves to smooth pipelining, causing pipeline hazards
and effectively reducing the CPU's performance. As a result, CPU pipelines
are designed with some tricks up their sleeves for dealing with these
hazards.
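As a rough back-of-the-envelope illustration (not a figure taken from the
simulator), consider a five-stage pipeline: executed one at a time, N
instructions need about 5 x N clock cycles, whereas a full pipeline needs
only about 5 + (N - 1) cycles once it is flowing. The short Python sketch
below works through the arithmetic for 100 instructions, assuming ideal
conditions with no hazards:

# Idealised cycle counts for a 5-stage pipeline with no hazards (illustration only).
stages = 5
instructions = 100

sequential_cycles = stages * instructions        # each instruction runs start to finish alone
pipelined_cycles = stages + (instructions - 1)   # fill the pipeline, then one result per cycle

print(sequential_cycles, pipelined_cycles)                        # 500 vs 104
print("speed-up factor:", sequential_cycles / pipelined_cycles)   # roughly 4.8

In practice the measured speed-up factor is lower than this ideal figure,
because hazards interrupt the flow of instructions; the exercises below
explore exactly why.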
Exercise 1 – Difference between the sequential and the pipelined execution
of CPU instructions
Enter the following source code, compile it and load it into the simulator's memory:
program Ex1
for n = 1 to 20
p = p + 1
next
end
Open the CPU pipeline window by clicking on the SHOW PIPELINE… button in
the CPU simulator’s window. You should now see the Instruction Pipeline
window. This window simulates the behaviour of a CPU pipeline. Here we can
observe the different stages of the pipeline as program instructions are
processed. This pipeline has five stages. The stages are colour-coded as
shown in the key for the “Pipeline Stages”.
List the names of the stages here:
1. Fetch
2. Decode
3. Read Operands
4. Execute
5. Write Result
The instructions that are being pipelined are listed on the left side (in white
text boxes). The newest instruction in the pipeline is at the bottom and the
oldest at the top. You’ll see this when you run the instructions. The horizontal
yellowish boxes display the stages of an instruction as it progresses through
the pipeline. At the bottom left corner pipeline statistics are displayed as the
instructions are executed.
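To picture what the window is showing, the short Python sketch below (purely
illustrative, not part of the simulator) prints how three instructions
overlap in the five stages, each instruction entering the pipeline one clock
cycle after the previous one:

# Staggered view of three instructions moving through the five pipeline stages.
# F = Fetch, D = Decode, R = Read Operands, E = Execute, W = Write Result.
stages = "F D R E W"
for i in range(3):
    print("  " * i + stages)   # each row is one instruction, shifted right by one cycle

Reading the printout column by column, each column is one clock cycle, and in
the middle cycles the CPU is working on three different instructions at once.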
Check the box titled Stay on top and make sure the No instruction pipeline
check box is selected. In the CPU simulator window, bring the speed slider
down to around a reading of 30. Run the program and observe the pipeline.
Wait for the program to complete. Now make a note of the following values:
Next, uncheck the No instruction pipeline checkbox, reset and run the above
program again and wait for it to complete.
Note down your observations on how the pipeline visually behaved differently:
Now, once again, make a note of the following values:
Briefly explain why you think there is a difference in the two sets of values:
- No instruction pipeline: with this box checked the pipeline is disabled, so
the CPU must finish each instruction completely before starting the next one;
with the pipeline enabled, several instructions are processed in overlapping
stages, so the same program completes in fewer clock cycles.
Exercise 2 – Pipeline data dependency and data hazards
CPU pipelines often have to deal with various hazards, i.e. those aspects of
the CPU architecture which prevent the pipeline from running smoothly and
uninterrupted. These are often called "pipeline hazards". One such hazard is
called the "data hazard". A data hazard is caused by the unavailability of an
operand value when it is needed: an instruction wants to read a value that an
earlier instruction, still in the pipeline, has not yet written. In order to
demonstrate this, create a program (call it Ex2) and enter the following set
of instructions. Note that the ADD R01, R03 instruction reads R01 immediately
after the MOV #3, R01 that writes it, so the ADD cannot proceed until that
value becomes available:
MOV #1, R01
MOV #5, R03
MOV #3, R01
ADD R01, R03
HLT
Make a note of the following values:
CPI (Clocks Per Instruction) 13
SF (Speed-up Factor) 1.92
Data Hazards 1
Data hazards are situations where an instruction needs an operand value that
an earlier instruction, still in the pipeline, has not yet produced; the
dependent instruction has to wait, which appears as a bubble in the pipeline.
Exercise 3 – A pipeline technique to eliminate data hazards
One way of dealing with a "data hazard" is to get the CPU to "speed up" the
availability of operands to pipelined instructions. One such method is called
"operand forwarding", a kind of short-cut taken by the hardware in which a
result is routed straight from the stage that produces it to the instruction
waiting for it. To demonstrate this, check the box titled Enable operand
forwarding and run the above code again.
- Operand forwarding passes a result from one instruction directly to the
instruction that needs it as soon as the result is produced, rather than
making the dependent instruction wait until the value has been written back.
Because the operand becomes available earlier, the dependent instruction no
longer has to stall, so the bubble caused by the data hazard is removed while
the instructions still execute in the correct order and produce the same
results.
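As a toy model of why forwarding removes the bubble (a sketch only, using
stage numbers that match the five stages listed in Exercise 1, not the
simulator's internals): a result is normally available only after Write
Result (stage 5), but a dependent instruction wants it at Read Operands
(stage 3); forwarding hands it over straight after Execute (stage 4).

# Toy stall counter for two dependent instructions 'distance' apart in program order.
def stall_cycles(result_ready_stage, operand_needed_stage, distance):
    gap = result_ready_stage - (operand_needed_stage + distance)
    return max(gap, 0)   # a negative gap means no waiting is needed

# Back-to-back instructions, as in Ex2 (MOV writes R01, ADD reads it immediately):
print("without forwarding:", stall_cycles(5, 3, 1))   # 1 bubble cycle
print("with forwarding   :", stall_cycles(4, 3, 1))   # 0 bubble cycles

The exact number of bubble cycles in the simulator may differ, but the
principle is the same: the earlier the result can be handed over, the fewer
cycles the dependent instruction has to wait.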
Has the bubble seen in Exercise 2 disappeared (or burst!)?
Make a note of the following values:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Data Hazards
Exercise 4 – Compiler "loop unrolling" optimization
Make a note of the size of the code generated for Ex4_1 here:
Now, load this code into the CPU simulator's memory.
Make sure the pipeline window stays on top. Also make sure the Enable
operand forwarding and Enable jump prediction boxes are both unchecked.
First, select program Ex4_1 from the PROGRAM LIST frame in the CPU
simulator window then click the RESET button. Make sure the speed of
simulation is set at maximum. Now click the RUN button to run program
Ex4_1. Observe the pipeline and when the program is finished make a note of
the following values:
Do the same with program Ex4_2 and make note of the following values:
Briefly comment on your observations, making reference to the code sizes and
the number of instructions executed:
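For reference, here is a rough Python analogue of what "loop unrolling"
means (an assumption about how Ex4_1 and Ex4_2 differ, since their listings
are not reproduced here): the unrolled version does the same work with far
fewer compare-and-jump instructions, at the price of a larger program.

# Rolled loop: 20 iterations, so the compare-and-jump at the end of the loop runs 20 times.
p = 0
for n in range(20):
    p = p + 1

# Unrolled by a factor of 4: the same final value, but only 5 iterations' worth of jumps;
# the repeated body makes the generated code larger.
q = 0
for n in range(5):
    q = q + 1
    q = q + 1
    q = q + 1
    q = q + 1

print(p, q)   # both print 20

Fewer jumps generally means fewer interruptions to the pipeline, which is why
an unrolled program can execute more instructions overall and still come out
ahead.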
Exercise 5 – Compiler re-arranging instruction sequence to help minimize
data dependencies
The optimization in Exercise 4 is one example of how a modern compiler can
provide support for the CPU pipeline. Another example is when the compiler
re-arranges the code without changing the logic of the code. This is done to
minimize pipeline hazards such as the “data hazard” we studied in Exercise
3. Here we demonstrate this technique.
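As a purely illustrative sketch (not the simulator's actual optimizer
output), the Python snippet below shows the idea using the three-line program
of this exercise: "b = a" reads the value written by "a = 1", so the compiler
can slide the independent "c = 2" in between them to space out the
dependency.

# Each statement is recorded as (destination, value it reads); None means no register read.
original  = [("a", None), ("b", "a"), ("c", None)]   # dependent pair back to back
reordered = [("a", None), ("c", None), ("b", "a")]   # same results, dependency spaced out

def back_to_back_dependencies(seq):
    # Count places where a statement reads the result of the statement right before it.
    return sum(1 for prev, cur in zip(seq, seq[1:]) if cur[1] == prev[0])

print(back_to_back_dependencies(original))    # 1 -> likely pipeline bubble
print(back_to_back_dependencies(reordered))   # 0 -> bubble avoided

The logic of the program is unchanged: both orderings leave a, b and c with
the same final values.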
Make sure the Show dependencies check box is checked and ONLY the Redundant
Code optimization is selected. Enter the following source code, compile it
and load it into memory:
program Ex5_1
a = 1
b = a
c = 2
end
Copy the CPU instruction sequence generated below (do not include the
instruction addresses):
Next, select the optimization option Code Dependencies. Change the program
name to Ex5_2, compile it and load it into memory.
Copy the CPU instruction sequence generated below:
How do the two sequences differ? Does the change affect the logic of the
program? Briefly explain the rationale for the change:
Let’s see if we can measure any improvement introduced by this “out of
sequence execution” method.
First reset and run program Ex5_1 and make note of the values below:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Next, reset and run program Ex5_2 and make note of the values below:
CPI (Clocks Per Instruction)
SF (Speed-up Factor)
Do you see any improvement in program Ex5_2 over program Ex5_1 (express this
as a percentage)?
No improvement
Exercise 6 – Jump predict table
The CPU pipeline uses a table to keep a record of predicted jump addresses.
Whenever a conditional jump instruction is being executed, this table is
consulted to see what the jump address is predicted to be. If the prediction
turns out to be wrong, the calculated address is used instead. The predicted
address will often be correct, with only occasional mispredictions, so the
overall effect is an improvement in the CPU's performance.
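A minimal sketch of the idea in Python, assuming a simple "remember the last
target" scheme (the simulator's own table layout and field names may differ):

# Toy jump-predict table: keyed by the address of a conditional jump instruction.
jump_table = {}   # jump instruction address -> target address it actually took last time

def predict(jump_addr, fall_through_addr):
    # If this jump has been seen before, guess it goes where it went last time;
    # otherwise guess it is not taken and simply falls through.
    return jump_table.get(jump_addr, fall_through_addr)

def update(jump_addr, actual_target):
    # Record where the jump really went so the next prediction can use it.
    jump_table[jump_addr] = actual_target

A correct prediction lets the pipeline keep fetching without waiting for the
jump to be resolved; a misprediction costs the cycles needed to discard the
wrongly fetched instructions.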
Enter the following program and compile it with ONLY the Enable optimizer
and Remove redundant code check boxes selected. Load the compiled
program in the CPU.
program Ex6
i = 0
for p = 1 to 40
i = i + 1
if i = 10 then
i = 0
r = i
end if
next
end
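Before running it, note why this program suits jump prediction: the test
i = 10 is true only once in every ten iterations, so whichever way the
compiler arranges the conditional jump, its outcome is heavily biased in one
direction. The tiny Python check below (an illustration, not the simulator)
counts the split over the 40 iterations:

# The condition "i = 10" holds on iterations 10, 20, 30 and 40 only.
biased = sum(1 for i in range(1, 41) if i % 10 == 0)
print(biased, "iterations take the rare path,", 40 - biased, "take the common path")

A predictor that simply remembers the usual outcome will therefore be right
most of the time.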
Run the program and make a note of the following pipeline stats:
Now, in the pipeline window select the Enable jump prediction check box.
Reset the program and run it again. Make a note of the following pipeline
stats:
Click on the SHOW JUMP TABLE… button. You should see the Jump Predict
Table window showing. This table keeps an entry for each conditional jump
instruction. Each entry contains the following fields. Can you suggest what
each field stands for? Enter your suggestions in the table below:
JInstAddr
JTarget
PStat
Count