Study Guide Chapter 3

Chapter 3 discusses instruction-level parallelism (ILP) in CPU architecture, focusing on techniques like pipelining and dynamic scheduling to enhance performance. It covers the challenges posed by conditional code and dependencies, including data, name, and control dependencies, and introduces various optimization techniques such as loop unrolling and branch prediction. Additionally, it explores dynamic scheduling methods, register renaming, multiple issue, and multithreading techniques to improve CPU efficiency.


Exam 2 Study Guide

Chapter 3 is on utilizing instruction-level parallelism (ILP) to improve performance in a CPU architecture.


ILP refers to the modern techniques of pipelining instructions to allow multiple instructions to be in
flight at the same time at different execution stages. This may be combined with dynamic scheduling as
well, allowing the re-ordering of instructions for better efficiency, or statically scheduled with any
optimization taking place at compile time. ILP is constrained by a few different factors, primarily the
presence of conditional code and dependencies. Conditional code may be handled in a few different
fashions to minimize its impact but remains a performance problem. The presence of conditional code
limits the ability to reorganize other code, as it is generally not possible to reschedule past the condition.
The book estimates that branches make up 15-25% of instructions in modern code, meaning that on average only roughly three to six instructions can be reordered before running into a branch.

Dependencies generally come in one of three categories: data dependencies, name dependencies, and
control dependencies. Data dependencies occur when instructions cannot be reordered because the result of one instruction, directly or indirectly, requires the completion of a previous instruction. In a simplified math example, if you had the equation A = B + C * D, you would get a different result for A (disregarding values chosen specifically to avoid this, such as zero for all) if you tried to change the order of operations within the equation for better efficiency. In a program, this often involves doing math with a variable that is modified earlier in the code: if the math is moved ahead of the modification, the result is not the expected value because the correct value of the variable has not yet been computed.
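To make this concrete, here is a minimal C sketch of the A = B + C * D example broken into the two primitive operations a compiler or CPU actually sees (function and variable names are just for illustration):

```c
/* The multiply must complete before the add can use its result, so the two
 * statements cannot be swapped without changing the value of A. */
int compute(int B, int C, int D) {
    int t = C * D;   /* produces t */
    int A = B + t;   /* consumes t: a true (read-after-write) data dependence */
    return A;
}
```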

These dependencies give rise to hazards in three patterns: read after write (RAW), write after write (WAW), and write after read (WAR). Read after read also occurs, but is not a hazard: no data is modified, so values remain consistent. In read after write, the order must be maintained so that a read occurring after a write receives the correct value (reversing the order would read the unmodified value). Write after write causes a problem if a later write in the program is moved in front of an earlier write to the same location; the program then continues with incorrect data stored, because the write that should have occurred first now occurs second and overwrites the newer value. This is generally not a large consideration in simple in-order pipelines, where writes happen in a fixed stage. Write after read occurs when a write is moved in front of a read of the same data, so the read receives the (incorrectly) updated value instead of whatever the pre-write value was.
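A quick sketch of the three hazard patterns, written as plain C assignments standing in for individual instructions (the variable names are arbitrary stand-ins for registers):

```c
int r1, r2, r3;

void hazard_examples(void) {
    /* RAW: the second statement reads r1, so it cannot be moved ahead of
     * the first statement, which writes r1. */
    r1 = r2 + 1;
    r3 = r1 * 2;

    /* WAR: the second statement writes r2; moving it ahead of the first
     * would make the first statement read the new value instead. */
    r3 = r2 + 5;
    r2 = 0;

    /* WAW: both statements write r3; reordering them would leave the older
     * value (10) in r3 after both have run. */
    r3 = 10;
    r3 = 20;
}
```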

Name dependencies occur when multiple instructions use the same named register or memory location without any actual flow of data between them. These have similar consequences to data dependencies, but the instructions are not chained together in any fashion; they are simply reusing the same resource. Name dependencies are what give rise to the WAR and WAW hazards above, while true data dependencies give rise to RAW hazards.

In a control dependency, instructions are dependent if they can’t be reordered with respect to a branch without changing the operation of the program. A simple example of this is an IF/THEN/ELSE statement. The contents of the ELSE are control dependent upon the branch; they can’t be moved ahead of it, or into the prior (THEN) path, without changing the operation of the program.
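As a concrete sketch (a hypothetical function, same idea as the IF/THEN/ELSE above):

```c
/* Each assignment is control dependent on the branch condition; neither can
 * be hoisted above the if or moved into the other arm without changing the
 * program's behavior. */
void absolute_value(int x, int *out) {
    if (x > 0) {
        *out = x;    /* control dependent on (x > 0) being true  */
    } else {
        *out = -x;   /* control dependent on (x > 0) being false */
    }
}
```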
There are a number of techniques that can be used to expose and exploit ILP for increased performance.
These may be present in the compiler for compile-time optimizations, in the hardware for runtime
optimizations, or in both.

One of the fundamental hardware ILP optimizations is in intelligently scheduling the pipeline of the CPU.
The pipeline allows multiple instructions to be issued and worked on during each clock cycle, with each
of them being in a different execution stage. Pipeline stalls occur when dependencies occur in close
proximity and can’t be worked around, or if a branch is mispredicted (among other things). These stalls may be avoided if instructions can be placed far enough apart that a dependent instruction is not scheduled before the instruction it depends on has finished executing.
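Real scheduling happens on machine instructions, but the idea can be sketched at the C level (hypothetical code; assume each pointer read acts like a load that takes a cycle or two to complete):

```c
/* Unscheduled: each loaded value is used immediately, which on a simple
 * pipeline forces a stall while the load completes. */
void unscheduled(int *p, int *q, int *sum, int *count) {
    int b = *p;
    *sum += b;       /* uses b right after its load: likely stall */
    int c = *q;
    *count += c;     /* uses c right after its load: likely stall */
}

/* Scheduled: the second (independent) load is moved between the first load
 * and its use, giving each load time to finish before its value is needed. */
void scheduled(int *p, int *q, int *sum, int *count) {
    int b = *p;
    int c = *q;
    *sum += b;
    *count += c;
}
```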

Another technique of this type is loop unrolling. In a program structure where you have a loop that has
no dependencies between loop iterations, the compiler can pull the loop apart and restructure the
execution to allow for more efficient execution. For example, if you had an architecture that could issue
4 reads a clock, and a loop that required reading in two values per loop execution, the compiler could
restructure the loop to pull in all four values in one execution of the loop, cutting the repetitions of the
loop in half by duplicating the work in each individual iteration. This also serves to reduce branch control hazards: with fewer invocations of the loop body, the loop's control decision (loop again or not) is evaluated fewer times.
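A minimal C sketch of that idea, unrolling by a factor of two so each iteration performs four reads instead of two (assumes n is even; a real compiler would also add cleanup code for leftover iterations):

```c
/* Original loop: two reads (a[i] and b[i]) per iteration, one branch per
 * element. */
void sum_arrays(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Unrolled by two: four reads per iteration and half as many loop branches. */
void sum_arrays_unrolled(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; i += 2) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
    }
}
```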

The next scheduling improvement has to do with mitigating branch stalls by adding support for intelligently deciding whether a branch is likely to be taken. The simplest version of branch prediction relies on a static table that tracks how often a branch was taken in the past and uses that to decide whether the branch will be taken in the future. This does nothing for the first execution of a branch, since there is no history for it; it's mainly useful for loops. A more sophisticated version tracks branches as the program runs and dynamically maintains a record of each branch's recent behavior to predict whether it will be taken again. These come in a 1-bit form, which flips its prediction every time the branch's behavior changes, and a 2-bit form, which requires two consecutive mispredictions (taken, then not taken twice, for example) before the prediction changes.
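A minimal sketch of a 2-bit saturating-counter predictor for a single branch (the starting state is an arbitrary choice for illustration):

```c
/* States 0-1 predict not taken, states 2-3 predict taken.  Two consecutive
 * mispredictions are needed to flip the prediction, unlike a 1-bit predictor
 * which flips on every miss. */
static unsigned char state = 2;   /* start "weakly taken" (arbitrary) */

int predict_taken(void) {
    return state >= 2;
}

void update_predictor(int was_taken) {
    if (was_taken) {
        if (state < 3) state++;   /* saturate at strongly taken */
    } else {
        if (state > 0) state--;   /* saturate at strongly not taken */
    }
}
```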

A more modern approach uses correlated branch predictors, such that the hardware can note patterns
between branches and apply this to future branch behavior prediction. For example, if there are
branches B1, B2, and B3, a correlated system could note that B3 is always taken if B1 is taken and
avoided if B1 is avoided. This allows the system to predict B3 with complete accuracy in this situation.
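A sketch of a very small correlating predictor: one bit of global history (was the most recent branch taken?) selects between two 2-bit counters for each branch. In the B1/B3 example, the counter used for B3 when B1 was taken can settle on "taken" while the other settles on "not taken." The table size and the simple modulo indexing are arbitrary choices for illustration:

```c
#define BP_ENTRIES 1024

static unsigned char counters[BP_ENTRIES][2]; /* [branch index][global history bit] */
static int last_taken = 0;                    /* 1-bit global history */

int correlating_predict(unsigned pc) {
    return counters[pc % BP_ENTRIES][last_taken] >= 2;
}

void correlating_update(unsigned pc, int was_taken) {
    unsigned char *c = &counters[pc % BP_ENTRIES][last_taken];
    if (was_taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
    last_taken = was_taken;   /* record this outcome as the new history */
}
```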

Another approach is referred to as a tournament predictor. In this setup, the hardware maintains both a global predictor, which uses the recent behavior of all branches, and a local predictor, which uses the history of the individual branch (indexed by its address). A selector tracks which of the two has been more accurate for each branch and uses that predictor's guess.
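One way to sketch the selection logic: a 2-bit chooser per branch tracks which underlying predictor has been more accurate recently and uses that one's guess (the local and global predictors themselves are assumed to exist elsewhere, e.g. along the lines of the sketches above):

```c
#define CHOOSER_ENTRIES 1024

static unsigned char chooser[CHOOSER_ENTRIES]; /* 0-1 favor local, 2-3 favor global */

int tournament_predict(unsigned pc, int local_pred, int global_pred) {
    return (chooser[pc % CHOOSER_ENTRIES] >= 2) ? global_pred : local_pred;
}

void tournament_update(unsigned pc, int local_pred, int global_pred, int was_taken) {
    unsigned char *c = &chooser[pc % CHOOSER_ENTRIES];
    int local_ok  = (local_pred  == was_taken);
    int global_ok = (global_pred == was_taken);
    if (global_ok && !local_ok && *c < 3) (*c)++;   /* shift toward global */
    if (local_ok && !global_ok && *c > 0) (*c)--;   /* shift toward local  */
}
```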

Dynamic scheduling has also been referred to several times as a method for extracting ILP in an
architecture. This approach seeks to avoid pipeline stalls by moving instruction order around to better
utilize the processor pipeline. This reordering introduces data hazards that would not occur in a static issue design, where you never have to account for the effects of moving dependent instructions around. These hazards can be dealt with using a variety of mitigation techniques.

The most common method of avoiding the data hazards introduced by dynamic scheduling is register renaming, which is used in all modern desktop architectures. In this scheme, instructions reference a set of addressable, or named, registers. However, the implementation of the ISA contains a much larger number of physical registers that are not directly addressable. The CPU maps each named register referenced by an instruction onto a physical register, allowing multiple versions of the data for a single named register to exist at once, since each version lives in a separate physical register that is selected as needed to represent the named register.
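A rough sketch of the mapping idea (register counts and the free-register handling are greatly simplified; real hardware recycles physical registers once their values are no longer needed):

```c
#define ARCH_REGS  32    /* named registers visible to the programmer */
#define PHYS_REGS  128   /* larger pool of physical registers */

static int rename_map[ARCH_REGS];  /* named register -> current physical register */
static int next_free = ARCH_REGS;  /* naive "free list": just a counter */

/* An instruction that writes named register rd gets a fresh physical
 * register, so older in-flight readers still see the previous value. */
int rename_dest(int rd) {
    int phys = next_free++ % PHYS_REGS;   /* real hardware reuses freed registers */
    rename_map[rd] = phys;
    return phys;
}

/* Source operands simply read the current mapping for their named register. */
int rename_src(int rs) {
    return rename_map[rs];
}
```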

Register renaming is complemented by a reorder buffer (ROB), which adds a commit step (i.e., actually completing the operation of an instruction) to the standard issue/execute/write cycle. This approach allows the CPU to issue instructions (i.e., read them from the instruction queue) in order, dynamically reschedule them for execution, write the results to the ROB, and then commit the results in order by retiring completed entries from the head of the ROB. From an external view, the CPU looks like an in-order system, pulling in sequential instructions and completing them in order. Internally it is dynamically scheduling execution, but the ROB forces the commits to occur in order so that the reordering never becomes visible.
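A simplified sketch of in-order commit through a ROB (structure and sizes are illustrative, not any particular CPU's design): instructions get ROB entries in program order at issue, results can arrive out of order, but only the oldest entry is ever retired, and only once its result is ready.

```c
#include <stdbool.h>

#define ROB_SIZE 64

struct rob_entry {
    bool ready;   /* has the result been written back yet? */
    int  dest;    /* architectural register to update at commit */
    int  value;   /* result value to commit */
};

static struct rob_entry rob[ROB_SIZE];
static int head = 0, tail = 0;   /* head = oldest entry, tail = next free slot */

/* Called each cycle: retire consecutive ready entries from the head, oldest
 * first, so architectural state is only ever updated in program order. */
void commit(int arch_regs[]) {
    while (head != tail && rob[head].ready) {
        arch_regs[rob[head].dest] = rob[head].value;
        rob[head].ready = false;
        head = (head + 1) % ROB_SIZE;
    }
}
```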

Another technique that can be combined with speculation (or not) and with dynamic scheduling (or not) is multiple issue. In this implementation, some functional units inside the CPU (ALU, FPU, etc.) are duplicated and/or other parts are widened to handle multiple operations at once (register files with multiple read/write ports, for example). Modern CPUs typically issue 4+ instructions per clock. Multiple issue is used in all modern architectures, even simple statically scheduled ones, but it works better with more sophisticated implementations because they give the technique more room to actually execute the instructions fetched. I won't be asking you to duplicate or answer questions about the implementation graphics on page 223; that would be beyond the scope of the exam.

The last concept on the exam will be multithreading techniques. These are various techniques for switching the execution resources of a CPU core from one task to another, either in response to stalls in the instruction stream (i.e., if a thread stalls out, switch to another one while waiting for the stall to clear) or based upon a scheduling algorithm. The basic approaches are coarse-grained multithreading, which isn't used in modern architectures and involves relatively slow swapping between active threads; fine-grained multithreading, which swaps threads each clock cycle (skipping a stalled thread and continuing with an active one); and simultaneous multithreading (SMT), in which instructions from multiple threads can coexist in the CPU pipeline at the same time. SMT relies on larger register files and register renaming to keep the context of each thread current, and it allows the CPU to issue instructions from multiple threads in the same clock cycle. SMT is commonly used in modern CPUs to provide “extra” threads that can fill instruction issue slots which would otherwise go unused, increasing overall efficiency in most circumstances. The technique is not an improvement (and may cause performance losses) if a single thread was already using all the issue resources efficiently.
