Chapter 14 - Processor Structure and Function
The pipeline has two independent stages. The first stage fetches
an instruction and buffers it. When the second stage is free, the
first stage passes it the buffered instruction. While the second
stage is executing the instruction, the first stage takes advantage
of any unused memory cycles to fetch and buffer the next
instruction. This is called instruction prefetch or fetch overlap.
14.6 List and briefly explain various ways in which an
instruction pipeline can deal with conditional branch
instructions.
• Multiple streams: A brute-force approach is to replicate the initial portions of
the pipeline and allow the pipeline to fetch both the instruction following the
branch and the branch target, making use of two streams.
• Prefetch branch target: When a conditional branch is recognized, the target of
the branch is prefetched, in addition to the instruction following the branch.
This target is then saved until the branch instruction is executed. If the branch
is taken, the target has already been prefetched.
• Loop buffer: A loop buffer is a small, very-high-speed memory maintained by the
instruction fetch stage of the pipeline and containing the n most recently
fetched instructions, in sequence. If a branch is to be taken, the hardware first
checks whether the branch target is within the buffer. If so, the next
instruction is fetched from the buffer.
• Branch prediction: A prediction is made whether a conditional branch will be
taken when executed, and subsequent instructions are fetched accordingly.
• Delayed branch: It is possible to improve pipeline performance by automatically
rearranging instructions within a program, so that branch instructions occur
later than actually desired.
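The branch-prediction item above does not say how the prediction is made. One widely used dynamic scheme, not described in the text above, keeps a 2-bit saturating counter per branch. The sketch below assumes one counter per branch address and an initial "weakly not taken" state; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Hypothetical 2-bit saturating-counter branch predictor.
    States 0-1 predict not taken; states 2-3 predict taken."""

    def __init__(self, initial_state: int = 1):
        self.initial_state = initial_state  # 1 = weakly not taken (assumption)
        self.counters = {}                  # branch address -> state in 0..3

    def predict(self, branch_addr: int) -> bool:
        # True means "predict taken": fetch from the branch target next.
        return self.counters.get(branch_addr, self.initial_state) >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        # Step one state toward the actual outcome, saturating at 0 and 3,
        # so a single anomalous outcome does not flip a strong prediction.
        state = self.counters.get(branch_addr, self.initial_state)
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        self.counters[branch_addr] = state


def count_mispredictions(predictor: TwoBitPredictor, trace) -> int:
    """Run a trace of (branch_addr, taken) pairs and count wrong guesses."""
    misses = 0
    for addr, taken in trace:
        if predictor.predict(addr) != taken:
            misses += 1
        predictor.update(addr, taken)
    return misses
```

On a loop branch that is taken nine times and then falls through, repeated twice, this predictor misses only on the very first iteration and on each loop exit (3 misses in 20 branches): a single not-taken outcome moves the counter from strongly to weakly taken instead of flipping the prediction.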
14.7 List two sources of interrupts and exceptions.
1. Interrupts
• Maskable interrupts: Received on the processor’s INTR pin. The
processor does not recognize a maskable interrupt unless the
interrupt enable flag (IF) is set.
• Nonmaskable interrupts: Received on the processor’s NMI pin.
Recognition of such interrupts cannot be prevented.
2. Exceptions
• Processor-detected exceptions: Result when the processor
encounters an error while attempting to execute an instruction.
• Programmed exceptions: These are instructions that generate
an exception (e.g., INTO, INT3, INT, and BOUND).
SHORT ANSWER
1. A processor must: fetch instruction, interpret instruction, process data, write data, and
_________.
fetch data
2. The major components of the processor are an arithmetic and logic unit (ALU) and a
__________.
control unit (CU)
3. The _________ element is needed to transfer data between the various registers and the
ALU.
internal processor bus
5. __________ registers are used by the control unit to control the operation of the
processor and by privileged operating system programs to control the execution of programs.
Control and status
6. Many processor designs include a register or set of registers often known as the
_________ that contain status information and condition codes.
program status word (PSW)
7. An instruction cycle includes the following stages: fetch, execute, and _______.
interrupt
8. __________ is a process where new inputs are accepted at one end before previously
accepted inputs appear as outputs at the other end.
Pipelining
9. __________ or fetch overlap is where, while the second stage is executing the
instruction, the first stage takes advantage of any unused memory cycles to fetch and
buffer the next instruction.
Instruction prefetch
10. A __________ occurs when the pipeline, or some portion of the pipeline, must stall
because conditions do not permit continued execution.
pipeline hazard
11. The three types of data hazards are: read after write (RAW), write after write (WAW),
and _________.
write after read (WAR)
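The three data-hazard types can be told apart by comparing the registers that two instructions read and write. This minimal sketch assumes each instruction writes at most one register; the `Instr` type and `data_hazards` function are hypothetical, and the dependency names in the comments follow the Chapter 16 key terms (true data dependency, antidependency, output dependency).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: at most one destination, any number of sources."""
    text: str
    dest: str | None
    srcs: tuple[str, ...] = ()


def data_hazards(first: Instr, second: Instr) -> list[str]:
    """Classify data hazards when `second` follows `first` in program order."""
    hazards = []
    if first.dest is not None and first.dest in second.srcs:
        hazards.append("RAW")  # second reads what first writes (true data dependency)
    if second.dest is not None and second.dest in first.srcs:
        hazards.append("WAR")  # second overwrites a source first still reads (antidependency)
    if first.dest is not None and first.dest == second.dest:
        hazards.append("WAW")  # both write the same register (output dependency)
    return hazards


add = Instr("ADD R1, R2, R3", dest="R1", srcs=("R2", "R3"))
sub = Instr("SUB R4, R1, R5", dest="R4", srcs=("R1", "R5"))
print(data_hazards(add, sub))  # ['RAW']
```

In a simple in-order pipeline it is typically only RAW that forces a stall; WAR and WAW become a concern once instructions can complete out of order.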
12. A _________, also known as a branch hazard, occurs when the pipeline makes the
wrong decision on a branch prediction and therefore brings instructions into the pipeline
that must subsequently be discarded.
control hazard
13. Two classes of events cause the x86 to suspend execution of the current instruction
stream and respond to the event: interrupts and ________.
exceptions
14. The ________ flag allows the programmer to disable debug exceptions so that the
instruction can be restarted after a debug exception without immediately causing another
debug exception.
resume
15. Data are exchanged with the processor from external memory through a _________.
data bus
Chapter 15 - Reduced Instruction Set Computers
Although RISC systems have been defined and designed in a variety of ways
by different groups, the key elements shared by most designs are these:
• A large number of general-purpose registers, and/or the use of compiler
technology to optimize register usage
• A limited and simple instruction set
• An emphasis on optimizing the instruction pipeline
As the cost of hardware has dropped, the relative cost of software has risen.
Along with that, a chronic shortage of programmers has driven up software
costs in absolute terms. Thus, the major cost in the life cycle of a system is
software, not hardware. Adding to the cost, and to the inconvenience, is the
element of unreliability: it is common for programs, both system and
application, to continue to exhibit new bugs after years of operation.
15.3 What is the use of a circular buffer of register windows?
How many procedure activations can be held by an N-window register file?
The circular organization is shown in Figure 13.2, which depicts a circular
buffer of six windows. The buffer is filled to a depth of 4 (A called B; B
called C; C called D) with procedure D active. The current-window pointer
(CWP) points to the window of the currently active procedure. Register
references by a machine instruction are offset by this pointer to determine
the actual physical register. The saved window pointer (SWP) identifies the
window most recently saved in memory. If procedure D now calls procedure E,
arguments for E are placed in D’s temporary registers (the overlap between
w3 and w4) and the CWP is advanced by one window.
If procedure E then makes a call to procedure F, the call cannot be made with
the current status of the buffer. This is because F’s window overlaps A’s
window. If F begins to load its temporary registers, preparatory to a call, it will
overwrite the parameter registers of A (A.in). Thus, when CWP is incremented
(modulo 6) so that it becomes equal to SWP, an interrupt occurs, and A’s
window is saved. Only the first two portions (A.in and A.loc) need be saved.
Then, the SWP is incremented and the call to F proceeds. A similar interrupt
can occur on returns. For example, subsequent to the activation of F, when B
returns to A, CWP is decremented and becomes equal to SWP. This causes an
interrupt that results in the restoration of A’s window. From the preceding, it
can be seen that an N-window register file can hold only N-1 procedure
activations. The value of N need not be large.
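The overflow and underflow behavior described above can be traced with a toy model. This sketch tracks which procedure activations occupy windows rather than actual register contents, and assumes whole windows are spilled and restored; the class and method names are hypothetical.

```python
from collections import deque


class CircularWindowFile:
    """Toy model of an N-window circular register file (hypothetical names)."""

    def __init__(self, n_windows: int):
        self.n = n_windows
        self.resident = deque()  # activations held in windows, oldest first
        self.saved = []          # activations spilled to memory (a stack)
        self.overflows = 0
        self.underflows = 0

    def call(self, proc: str) -> None:
        # With N-1 windows in use, the new window's temporaries would overlap
        # the oldest window's parameter registers, so spill the oldest first.
        if len(self.resident) == self.n - 1:
            self.saved.append(self.resident.popleft())
            self.overflows += 1
        self.resident.append(proc)

    def ret(self) -> str:
        finished = self.resident.pop()
        # If the caller's window was spilled, restore it from memory.
        if not self.resident and self.saved:
            self.resident.appendleft(self.saved.pop())
            self.underflows += 1
        return finished


rf = CircularWindowFile(6)
for p in "ABCDEF":   # A calls B, ..., E calls F
    rf.call(p)
print(list(rf.resident), rf.saved)  # ['B', 'C', 'D', 'E', 'F'] ['A']
```

Running the example from the text with six windows, the call to F spills A's window, so at most five (N-1) activations are ever resident; when B later returns to A, A's window is restored, matching the underflow case described above.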
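15.4 Explain the “one instruction per cycle” characteristic of RISC.
A RISC processor aims to execute one machine instruction per machine cycle.
A machine cycle is defined to be the time it takes to fetch two operands from
registers, perform an ALU operation, and store the result in a register. Thus,
RISC machine instructions should be no more complicated than, and execute
about as fast as, microinstructions on CISC machines.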
3. The difference between the operations provided in high-level languages (HLLs) and those
provided in computer architecture is known as the ________.
semantic gap
4. Blocks of memory, recently used global variables, memory addressing, and one operand
addressed and accessed per cycle are characteristics of _________ organizations.
cache
5. Individual variables, compiler assigned global variables, register addressing, and multiple
operands addressed and accessed in one cycle are characteristics of __________ organizations.
large register file
6. The acronym RISC stands for __________.
reduced instruction set computer
8. A ________ is defined to be the time it takes to fetch two operands from registers,
perform an ALU operation, and store the result in a register.
machine cycle
10. __________ is a way of increasing the efficiency of the pipeline by making use of a
branch that does not take effect until after execution of the following instruction.
Delayed branch
11. ________ can improve performance by reducing loop overhead, increasing instruction
parallelism by improving pipeline performance, and improving register, data cache, or TLB
locality.
Unrolling
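Loop unrolling is a compiler transformation, so the sketch below only illustrates its shape: a dot product written as a plain loop and the same loop unrolled by a factor of 4 (an arbitrary choice) with a remainder loop. Python itself gains no speed from this; the point is that loop-control work runs once per four elements and the four independent partial sums expose parallelism to a pipeline. The function names are hypothetical, and integer inputs are assumed because reassociating floating-point sums can change rounding.

```python
def dot_rolled(a, b):
    """Plain loop: index update and branch on every element."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation unrolled by 4, using four independent accumulators."""
    n = len(a)
    s0 = s1 = s2 = s3 = 0
    i = 0
    while i + 4 <= n:
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
        i += 4
    total = s0 + s1 + s2 + s3
    # Remainder loop for lengths that are not a multiple of 4.
    while i < n:
        total += a[i] * b[i]
        i += 1
    return total
```

With `a = list(range(10))` and `b = list(range(10, 20))`, both versions return 735; the unrolled version runs its main loop twice and its remainder loop twice.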
12. The MIPS R4000 processor chip is partitioned into two sections, one containing the
CPU and the other containing a _________ for memory management.
coprocessor
13. A ________ architecture replicates each of the pipeline stages so that two or more
instructions at the same stage of the pipeline can be processed simultaneously.
superscalar
15. The work that has been done on assessing merits of the RISC approach can be grouped
into two categories: quantitative and _________.
qualitative
Chapter 16 - Instruction-Level Parallelism and
Superscalar Processors
16.1 Overview
16.2 Design Issues
Key Terms
antidependency out-of-order issue
branch prediction output dependency
commit procedural dependency
flow dependency read-write dependency
in-order completion register renaming
in-order issue resource conflict
instruction issue retire
instruction-level parallelism superpipelined
instruction window superscalar
machine parallelism true data dependency
micro-operations write-read dependency
micro-ops write-write dependency
out-of-order completion
REVIEW QUESTIONS
2. The term ________ refers to a machine that is designed to improve the performance of
the execution of scalar instructions.
superscalar
3. ________ exploits the fact that many pipeline stages perform tasks that require less
than half a clock cycle.
Superpipelining
4. The term _________ parallelism refers to the degree to which, on average, the
instructions of a program can be executed in parallel.
instruction-level
5. A _________ is a competition of two or more instructions for the same resource at the
same time.
resource conflict
6. _________ is a measure of the ability of the processor to take advantage of instruction-
level parallelism.
Machine parallelism
7. Committing or _________ the instruction is when instructions are conceptually put back
into sequential order and their results are recorded.
retiring
8. In the operation of the Pentium 4 each instruction is translated into one or more fixed-
length RISC instructions known as _________.
micro-operations (or micro-ops)
9. The ________ takes the already decoded micro-ops from the instruction decoder and
assembles them into program-ordered sequences of micro-ops called traces.
trace cache
10. The _________ predicts the instruction stream, fetches instructions from the L1
instruction cache, and places the fetched instructions into a buffer for consumption by the
decode pipeline.
instruction fetch unit
11. Instruction-level parallelism is also determined by __________, which is the time until
the result of an instruction is available for use as an operand in a subsequent instruction.
operation latency
12. Superscalar instruction issue policies are grouped into the following categories: in-
order issue with in-order completion, out-of-order issue with out-of-order completion, and
____________.
in-order issue with out-of-order completion
13. With ____________ any number of instructions may be in the execution stage at any
one time, up to the maximum degree of machine parallelism across all functional units.
out-of-order completion
14. The ________ is a buffer used to decouple the decode and execute stages of the
pipeline to allow out-of-order issue.
instruction window
Invalid: The line in the cache does not contain valid data.
17.7 What are some of the key benefits of clustering?
Absolute scalability: It is possible to create large clusters that far surpass the
power of even the largest standalone machines.
Uniform memory access (UMA): All processors have access to all parts of
main memory using loads and stores. The memory access time of a processor
to all regions of memory is the same. The access times experienced by
different processors are the same.
Nonuniform memory access (NUMA): All processors have access to all parts
of main memory using loads and stores. The memory access time of a
processor differs depending on which region of main memory is accessed.
The last statement is true for all processors; however, for different processors,
which memory regions are slower and which are faster differ.
2. Computer systems that fall into the _________ category have a single machine instruction
that controls the simultaneous execution of a number of processing elements on a lockstep
basis.
SIMD (single instruction, multiple data stream)
3. Computer systems that fall into the __________ category have a sequence of data that is
transmitted to a set of processors, each of which executes a different instruction sequence.
MISD (multiple instruction, single data stream)
4. Computer systems that fall into the ________ category have a set of processors that
simultaneously execute different instruction sequences on different data sets.
MIMD (multiple instruction, multiple data stream)
7. ______ protocols distribute the responsibility for maintaining cache coherence among
all of the cache controllers in a multiprocessor.
Snoopy
8. The four states of the MESI protocol are: modified, shared, invalid, and ______.
exclusive
13. Widely used in data centers to save space and improve system management, a
_________ is a server architecture that houses multiple server modules in a single chassis.
blade server
14. An approach where all processors have access to all parts of main memory using loads
and stores, with the memory access time of a processor differing depending on which
region of main memory is accessed, is _________.
NUMA (nonuniform memory access)
15. A good example of a pipelined ALU organization for vector processing is the vector
facility developed for the _________ architecture and implemented on the high-end 3090
series.
IBM 370
Chapter 18 - Multicore Computers
Amdahl’s law
chip multiprocessor
multicore
simultaneous multithreading (SMT)
superscalar
REVIEW QUESTIONS
Some of the key elements of the threading strategy for the rendering
module are listed in [LEON07] and include the following:
• Construct scene-rendering lists for multiple scenes in parallel (e.g.,
the world and its reflection in water).
• Overlap graphics simulation.
• Compute character bone transformations for all characters in all
scenes in parallel.
• Allow multiple threads to draw in parallel.
18.5 At a top level, what are the main design variables
in a multicore organization?
2. A related advantage is that data shared by multiple cores is not replicated at the
shared cache level.
3. With proper frame replacement algorithms, the amount of shared cache allocated
to each core is dynamic, so that threads that have less locality can employ more
cache.
5. The use of a shared L2 cache confines the cache coherency problem to the L1 cache
level, which may provide some additional performance advantage.
SHORT ANSWER
1. _________ states that performance increase is roughly proportional to square root of
increase in complexity.
Pollack’s rule
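Pollack’s rule can be written as a proportionality and used to compare two designs. Measuring "complexity" as logic or transistor count is an assumption of this sketch; the example shows that doubling complexity yields only about a 40% performance gain.

```latex
\text{Performance} \propto \sqrt{\text{Complexity}}
\quad\Longrightarrow\quad
\frac{P_2}{P_1} = \sqrt{\frac{C_2}{C_1}},
\qquad
\frac{C_2}{C_1} = 2 \;\Rightarrow\; \frac{P_2}{P_1} = \sqrt{2} \approx 1.41
```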
2. _______ law assumes a program in which a fraction (1 - f) of the execution time involves
code that is inherently serial and a fraction f that involves code that is infinitely parallelizable
with no scheduling overhead.
Amdahl’s
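Under the assumptions stated in the question, running on N processors shrinks the normalized execution time from 1 to (1 - f) + f/N, so the speedup is 1 / ((1 - f) + f/N). A minimal sketch of that formula follows; the function name is hypothetical.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when fraction f of single-processor execution
    time is perfectly parallelizable and fraction (1 - f) is serial."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


for n in (2, 8, 64):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With f = 0.9, eight processors give a speedup of only about 4.7, and no number of processors can push it past 1/(1 - f) = 10: the serial fraction dominates.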
7. ________ threading involves the selective use of fine-grain threading for some systems
and single threading for other systems.
Hybrid
8. ________ threading is when many similar or identical tasks are spread across multiple
processors.
Fine-grained
9. Individual modules called systems are assigned to individual processors with ________
threading.
coarse
10. The ________, introduced in 2006, implements two x86 superscalar processors with a
shared L2 cache.
Intel Core Duo
11. The Core Duo _________ is designed to manage chip heat dissipation to maximize
processor performance within thermal constraints.
thermal control unit
12. The __________ is responsible for reducing power consumption when possible, thus
increasing battery life for mobile platforms, such as laptops.
power management logic
13. A multicore computer, also known as a _________, combines two or more processors
on a single piece of silicon.
chip multiprocessor
control bus
control path
control signal
control unit
hardwired
implementation
microoperations
REVIEW QUESTIONS
2. The ____________ of a processor causes the processor to step through a series of micro-
operations in the proper sequence, based on the program being executed.
control unit
3. The _________ of a processor generates the control signals that cause each micro-
operation to be executed.
control unit
4. The ____________ generated by the control unit cause the opening and closing of logic
gates, resulting in the transfer of data to and from registers and the operation of the ALU.
control signals
5. The six things needed to specify the function of a processor are: operations (opcodes),
addressing modes, registers, I/O module interface, memory module interface, and ________.
interrupts
6. Each of the smaller cycles involves a series of steps, each of which involves the
processor registers, referred to as _________.
micro-operations
7. The __________ register specifies the address in memory for a read or write
operation.
memory address (MAR)
8. The __________ register contains the value to be stored in memory or the last value
read from memory.
memory buffer (MBR)
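Using the MAR and MBR defined above, the fetch cycle can be expressed as micro-operations grouped into time steps. This sketch models registers as a Python dict and memory as a list, and assumes every instruction occupies one memory location, so the PC advances by 1; the function name is hypothetical.

```python
def fetch_cycle(regs: dict, memory: list) -> list:
    """Apply the fetch-cycle micro-operations and return a per-step log."""
    log = []
    # t1: place the address of the next instruction in MAR.
    regs["MAR"] = regs["PC"]
    log.append("t1: MAR <- (PC)")
    # t2: read memory into MBR and increment PC in the same step; the two
    # micro-operations touch different registers, so they do not conflict.
    regs["MBR"] = memory[regs["MAR"]]
    regs["PC"] = regs["PC"] + 1  # assumption: one location per instruction
    log.append("t2: MBR <- Memory; PC <- (PC) + 1")
    # t3: move the fetched instruction into IR for decoding.
    regs["IR"] = regs["MBR"]
    log.append("t3: IR <- (MBR)")
    return log


regs = {"PC": 2, "MAR": 0, "MBR": 0, "IR": 0}
for step in fetch_cycle(regs, [0, 0, "ADD X", 0]):
    print(step)
```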
9. If the instruction specifies an indirect address, then a(n) ________ cycle must precede
the execute cycle.
indirect
10. __________ is when the control unit examines the opcode and generates a sequence
of micro-operations based on the value of the opcode.
Instruction decoding
11. The key control unit inputs are: clock, instruction register, control signals from control
bus, and _________.
flags
12. The timing of processor operations is synchronized by the __________ and controlled
by the control unit with control signals.
clock
13. Control unit implementation techniques fall into two categories: microprogrammed
implementation and ___________ implementation.
hardwired
14. The _____________ must control the state of the instruction cycle.
control unit
15. In a __________ implementation the control unit is essentially a state machine circuit
and its input logic signals are transformed into a set of output logic signals, which are the
control signals.
hardwired
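A hardwired control unit’s next-state logic for the instruction cycle can be sketched as a pure function of the current state and its input signals. In hardware this would be combinational logic feeding a state register; here a Python function stands in for it. The two-bit encoding (00 fetch, 01 indirect, 10 execute, 11 interrupt) follows the usual textbook instruction cycle code as we understand it, and `interrupt_pending` is assumed to mean an enabled interrupt is waiting; all names are hypothetical.

```python
from enum import Enum


class Cycle(Enum):
    """Instruction cycle states with an assumed two-bit encoding."""
    FETCH = 0b00
    INDIRECT = 0b01
    EXECUTE = 0b10
    INTERRUPT = 0b11


def next_cycle(current: Cycle, indirect_addr: bool, interrupt_pending: bool) -> Cycle:
    """Next state of the instruction-cycle state machine."""
    if current is Cycle.FETCH:
        # An indirect address requires an indirect cycle before execute.
        return Cycle.INDIRECT if indirect_addr else Cycle.EXECUTE
    if current is Cycle.INDIRECT:
        return Cycle.EXECUTE
    if current is Cycle.EXECUTE:
        return Cycle.INTERRUPT if interrupt_pending else Cycle.FETCH
    return Cycle.FETCH  # after the interrupt cycle, resume fetching
```

For an instruction with an indirect operand and no pending interrupt, the machine steps FETCH, INDIRECT, EXECUTE, then back to FETCH.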
Chapter 20 - Microprogrammed Control
5. In a __________ microinstruction every bit in the control field attaches to a control line.
horizontal
6. In a ________ microinstruction a code is used for each action to be performed and the
decoder translates this code into individual control signals.
vertical
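The contrast between horizontal and vertical microinstructions can be shown by decoding both formats for the same small set of control lines. The eight line names below are hypothetical. The horizontal word needs 8 bits and can assert several lines at once, while the vertical word needs only a 3-bit code but asserts one line per microinstruction. Real vertical formats often split the word into several encoded fields, each with its own decoder.

```python
# Hypothetical set of eight control lines (names are illustrative only).
CONTROL_LINES = ["PC_out", "MAR_in", "MEM_read", "MBR_out",
                 "IR_in", "PC_inc", "ALU_add", "AC_in"]


def horizontal_signals(word: int) -> set:
    """Horizontal: one bit per control line, read directly from the word."""
    return {name for bit, name in enumerate(CONTROL_LINES)
            if (word >> bit) & 1}


def vertical_signals(code: int) -> set:
    """Vertical: a 3-bit code names one action; a decoder (here a lookup)
    turns it into the single control line to assert."""
    if not 0 <= code < len(CONTROL_LINES):
        raise ValueError("code out of range for a 3-bit field")
    return {CONTROL_LINES[code]}
```

Asserting `PC_out` and `MAR_in` together takes one horizontal word (`0b00000011`) but two separate vertical microinstructions (codes 0 and 1), which is the usual space-versus-speed trade-off between the two formats.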
8. _________ processors, with their simpler instruction format, typically use hardwired
control units.
RISC
9. The two basic tasks performed by a microprogrammed control unit are microinstruction
sequencing and microinstruction __________.
execution
12. Each microinstruction cycle is made up of two parts: fetch and _________.
execute
13. Two approaches can be taken to organizing the encoded microinstruction into fields:
functional and __________.
resource
15. The principal function of the 8818 __________ is to generate the next microinstruction
address for the microprogram.
microsequencer
Key Terms – Assembly Language
assembler
assembly language
comment
directive
dynamic linker
instruction
label
linkage editor
linking
load-time dynamic linking
loading
macro
mnemonic
one-pass assembler
operand
relocation
run-time dynamic linking
two-pass assembler