
Chapter 14 - Processor Structure and Function

14.1 Processor Organization
14.2 Register Organization
14.3 Instruction Cycle
14.4 Instruction Pipelining
Key Terms
branch prediction
condition code
delayed branch
flag
instruction cycle
instruction pipeline
instruction prefetch
program status word (PSW)
REVIEW QUESTIONS

14.1 Explain user-visible registers and control/status registers.
User-visible registers: These enable the machine- or assembly
language programmer to minimize main-memory references by
optimizing use of registers.
Control and status registers: These are used by the control unit
to control the operation of the CPU and by privileged, operating
system programs to control the execution of programs.
14.2 Give examples of general-purpose registers.

Segment pointers, Index registers, Stack pointer.


14.3 Explain the advantages and disadvantages of
condition codes.
Advantages:
• They should reduce the number of COMPARE and TEST instructions needed, because condition codes are set by normal arithmetic and data movement instructions.
• A conditional instruction, such as BRANCH, is simpler than a composite instruction such as TEST AND BRANCH.
• Condition codes facilitate multiway branches. For example, a TEST instruction can be followed by two branches, one on less than zero and one on greater than or equal to zero.
• Condition codes can be saved on the stack during subroutine calls along with other register information.

Disadvantages:
• Condition codes add complexity to both software and hardware.
• Condition codes are irregular; they are typically not part of the main data path, so they require extra hardware connections.
• Often condition code machines must add special non-condition-code instructions for special situations anyway, such as bit checking, loop control, and atomic semaphore operations.
• In a pipelined implementation, condition codes require special synchronization to avoid conflicts.
14.4 List and explain the common fields of a program
status word.
• Sign: Contains the sign bit of the result of the last arithmetic
operation.
• Zero: Set when the result is 0.
• Carry: Set if an operation resulted in a carry (addition) into or
borrow (subtraction) out of a high-order bit. Used for multiword
arithmetic operations.
• Equal: Set if a logical compare result is equality.
• Overflow: Used to indicate arithmetic overflow.
• Interrupt Enable/Disable: Used to enable or disable interrupts.
• Supervisor: Indicates whether the processor is executing in
supervisor or user mode. Certain privileged instructions can be
executed only in supervisor mode, and certain areas of memory can
be accessed only in supervisor mode.
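As an illustration (not part of the review answer), the Sign, Zero, Carry, and Overflow fields can be computed for an 8-bit addition. The function name and the 8-bit width are assumptions of this sketch:

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags), where flags
    models the Sign, Zero, Carry, and Overflow fields of a PSW."""
    raw = a + b
    result = raw & 0xFF                      # keep the low 8 bits
    flags = {
        "S": (result >> 7) & 1,              # sign bit of the result
        "Z": int(result == 0),               # result is zero
        "C": int(raw > 0xFF),                # carry out of the high-order bit
        # overflow: both operands agree in sign but the result's sign differs
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

# 0x7F + 0x01: signed overflow (127 + 1 = -128 in two's complement), no carry
print(add8_flags(0x7F, 0x01))   # → (128, {'S': 1, 'Z': 0, 'C': 0, 'V': 1})
# 0xFF + 0x01: wraps to 0 with a carry out, no signed overflow
print(add8_flags(0xFF, 0x01))   # → (0, {'S': 0, 'Z': 1, 'C': 1, 'V': 0})
```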
14.5 What do you mean by “instruction prefetch” or
“fetch overlap”?

The pipeline has two independent stages. The first stage fetches an instruction and buffers it. When the second stage is free, the first stage passes it the buffered instruction. While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction. This is called instruction prefetch or fetch overlap.
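A back-of-the-envelope sketch of the saving from fetch overlap, assuming a 1-cycle fetch and a 2-cycle execute (illustrative numbers, not from the text):

```python
def run_times(n_instructions, fetch=1, execute=2):
    """Compare total cycles with and without a two-stage fetch/execute
    overlap (instruction prefetch). Assumes execute >= fetch, so each
    fetch after the first hides entirely inside the previous execute."""
    sequential = n_instructions * (fetch + execute)
    # with prefetch, only the first fetch is exposed; every later fetch
    # overlaps the previous instruction's execute stage
    overlapped = fetch + n_instructions * execute
    return sequential, overlapped

print(run_times(10))   # → (30, 21): 30 cycles sequential vs 21 overlapped
```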
14.6 List and briefly explain various ways in which an
instruction pipeline can deal with conditional branch
instructions.
• Multiple streams: A brute-force approach is to replicate the initial portions of the pipeline and allow the pipeline to fetch both instructions, making use of two streams.
• Prefetch branch target: When a conditional branch is recognized, the target of the branch is prefetched, in addition to the instruction following the branch. This target is then saved until the branch instruction is executed. If the branch is taken, the target has already been prefetched.
• Loop buffer: A loop buffer is a small, very-high-speed memory maintained by the instruction fetch stage of the pipeline and containing the n most recently fetched instructions, in sequence. If a branch is to be taken, the hardware first checks whether the branch target is within the buffer. If so, the next instruction is fetched from the buffer.
• Branch prediction: A prediction is made whether a conditional branch will be taken when executed, and subsequent instructions are fetched accordingly.
• Delayed branch: It is possible to improve pipeline performance by automatically rearranging instructions within a program, so that branch instructions occur later than actually desired.
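Branch prediction is commonly implemented with a 2-bit saturating counter, which needs two consecutive mispredictions before it flips its bias. A minimal sketch (the class name and the outcome pattern are illustrative, not from the text):

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 taken.
    A single mispredict never flips a strongly biased counter."""
    def __init__(self):
        self.state = 0                       # start strongly not-taken

    def predict(self):
        return self.state >= 2               # True means "predict taken"

    def update(self, taken):
        # saturate at 0 and 3 instead of wrapping around
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, True, False, True]   # a typical loop-branch pattern
hits = 0
for taken in outcomes:
    hits += p.predict() == taken
    p.update(taken)
print(hits)   # → 2 correct out of 5 while warming up from the cold state
```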
14.7 List two sources of interrupts and exceptions.

1. Interrupts
• Maskable interrupts: Received on the processor’s INTR pin. The
processor does not recognize a maskable interrupt unless the
interrupt enable flag (IF) is set.
• Nonmaskable interrupts: Received on the processor’s NMI pin.
Recognition of such interrupts cannot be prevented.

2. Exceptions
• Processor-detected exceptions: Results when the processor
encounters an error while attempting to execute an instruction.
• Programmed exceptions: These are instructions that generate
an exception (e.g., INTO, INT3, INT, and BOUND).
SHORT ANSWER
1. A processor must: fetch instruction, interpret instruction, process data, write data, and
_________.
fetch data

2. The major components of the processor are an arithmetic and logic unit (ALU) and a
__________.
control unit (CU)

3. The _________ element is needed to transfer data between the various registers and the
ALU.
internal processor bus

4. _________ registers enable the machine or assembly language programmer to minimize
main memory references by optimizing use of registers.
User-visible

5. __________ registers are used by the control unit to control the operation of the
processor and by privileged operating system programs to control the execution of programs.
Control and status
6. Many processor designs include a register or set of registers often known as the
_________ that contain status information and condition codes.
program status word (PSW)

7. An instruction cycle includes the following stages: fetch, execute, and _______.
interrupt

8. __________ is a process where new inputs are accepted at one end before previously
accepted inputs appear as outputs at the other end.
Pipelining

9. __________ or fetch overlap is where, while the second stage is executing the
instruction, the first stage takes advantage of any unused memory cycles to fetch and
buffer the next instruction.
Instruction prefetch

10. A __________ occurs when the pipeline, or some portion of the pipeline, must stall
because conditions do not permit continued execution.
pipeline hazard
11. The three types of data hazards are: read after write (RAW), write after write (WAW),
and _________.
write after read (WAR)

12. A _________, also known as a branch hazard, occurs when the pipeline makes the
wrong decision on a branch prediction and therefore brings instructions into the pipeline
that must subsequently be discarded.
control hazard

13. Two classes of events cause the x86 to suspend execution of the current instruction
stream and respond to the event: interrupts and ________.
exceptions

14. The ________ flag allows the programmer to disable debug exceptions so that the
instruction can be restarted after a debug exception without immediately causing another
debug exception.
resume

15. Data are exchanged with the processor from external memory through a _________.
data bus
Chapter 15 - Reduced Instruction Set Computers

15.1 Instruction Execution Characteristics
15.2 The Use of a Large Register File
15.3 Compiler-Based Register Optimization
15.4 Reduced Instruction Set Architecture
15.5 RISC Pipelining
15.6 MIPS R4000
15.7 SPARC
15.8 RISC Versus CISC Controversy
Key Terms
complex instruction set computer (CISC)
delayed branch
delayed load
high-level language (HLL)
reduced instruction set computer (RISC)
register file
register window
SPARC
REVIEW QUESTIONS
15.1 List the characteristics of a RISC organization.
The different characteristics are:
• The family concept: Introduced by IBM with its System/360 in 1964, followed shortly
thereafter by DEC, with its PDP-8. The family concept decouples the architecture of a
machine from its implementation. A set of computers is offered, with different
price/performance characteristics, that presents the same architecture to the user. The
differences in price and performance are due to different implementations of the same
architecture.
• Microprogrammed control unit: Suggested by Wilkes in 1951 and introduced by IBM
on the S/360 line in 1964. Microprogramming eases the task of designing and
implementing the control unit and provides support for the family concept.
• Cache memory: First introduced commercially on IBM S/360 Model 85 in 1968. The
insertion of this element into the memory hierarchy dramatically improves
performance.
• Pipelining: A means of introducing parallelism into the essentially sequential nature of
a machine-instruction program. Examples are instruction pipelining and vector
processing.
• Multiple processors: This category covers a number of different organizations.
15.2 What are the key elements used in the construction
of RISC hardware? How does its cost differ from that of
other manufactured products?

Although RISC systems have been defined and designed in a variety of ways
by different groups, the key elements shared by most designs are these:
• A large number of general-purpose registers, and/or the use of compiler
technology to optimize register usage
• A limited and simple instruction set
• An emphasis on optimizing the instruction pipeline
As the cost of hardware has dropped, the relative cost of software has risen.
Along with that, a chronic shortage of programmers has driven up software
costs in absolute terms. Thus, the major cost in the life cycle of a system is
software, not hardware. Adding to the cost, and to the inconvenience, is the
element of unreliability: it is common for programs, both system and
application, to continue to exhibit new bugs after years of operation.
15.3 What is the purpose of the circular-buffer organization of register windows?
How many procedure activations can an N-window register file hold?
The circular organization is shown in Figure 13.2, which depicts a circular buffer of six windows. The buffer is filled to a depth of 4 (A called B; B called C; C called D) with procedure D active. The current-window pointer (CWP) points to the window of the currently active procedure. Register references by a machine instruction are offset by this pointer to determine the actual physical register. The saved window pointer (SWP) identifies the window most recently saved in memory. If procedure D now calls procedure E, arguments for E are placed in D’s temporary registers (the overlap between w3 and w4) and the CWP is advanced by one window.
If procedure E then makes a call to procedure F, the call cannot be made with the current status of the buffer. This is because F’s window overlaps A’s window. If F begins to load its temporary registers, preparatory to a call, it will overwrite the parameter registers of A (A.in). Thus, when CWP is incremented (modulo 6) so that it becomes equal to SWP, an interrupt occurs, and A’s window is saved. Only the first two portions (A.in and A.loc) need be saved. Then, the SWP is incremented and the call to F proceeds. A similar interrupt can occur on returns. For example, subsequent to the activation of F, when B returns to A, CWP is decremented and becomes equal to SWP. This causes an interrupt that results in the restoration of A’s window. From the preceding, it can be seen that an N-window register file can hold only N-1 procedure activations. The value of N need not be large.
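The pointer bookkeeping described above can be sketched with a small simulation. The class name and spill/restore counters are inventions of this sketch; only pointer movement is modeled, not the register contents:

```python
class WindowFile:
    """Circular buffer of n overlapping register windows.
    cwp = current-window pointer; swp = most recently saved window.
    An n-window file holds at most n-1 procedure activations."""
    def __init__(self, n):
        self.n, self.cwp, self.swp = n, 0, n - 1
        self.spills = self.restores = 0

    def call(self):
        nxt = (self.cwp + 1) % self.n
        if nxt == self.swp:                  # window overflow: buffer full
            self.spills += 1                 # save the oldest window to memory
            self.swp = (self.swp + 1) % self.n
        self.cwp = nxt

    def ret(self):
        self.cwp = (self.cwp - 1) % self.n
        if self.cwp == self.swp:             # window underflow: was spilled
            self.restores += 1               # restore it from memory
            self.swp = (self.swp - 1) % self.n

wf = WindowFile(6)
for _ in range(5):                           # A calls B calls C ... calls F
    wf.call()
print(wf.spills)                             # → 1  (A's window spilled)
for _ in range(5):                           # everyone returns back to A
    wf.ret()
print(wf.restores)                           # → 1  (A's window restored)
```

With six windows, the sixth activation (F) forces exactly one spill, matching the N-1 limit stated above.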
15.4 Explain the “one instruction per cycle”
characteristic of RISC.

A machine cycle is defined to be the time it takes to fetch two
operands from registers, perform an ALU operation, and store
the result in a register. Thus, RISC machine instructions should be
no more complicated than, and execute about as fast as,
microinstructions on CISC machines (discussed in Part Four).
With simple, one-cycle instructions, there is little or no need for
microcode; the machine instructions can be hardwired. Such
instructions should execute faster than comparable machine
instructions on other machines, because it is not necessary to
access a microprogram control store during instruction
execution.
15.5 Define a way of increasing the efficiency of the
pipeline.

• Delayed branch, a way of increasing the efficiency of the pipeline, makes use of a branch that does not take effect until after execution of the following instruction (hence the term delayed). The instruction location immediately following the branch is referred to as the delay slot. The compiler fills the delay slot with an instruction that the branch does not depend on. Thus, the original semantics of the program are retained but one less clock cycle is required for execution.

• This interchange of instructions will work successfully for unconditional branches, calls, and returns. For conditional branches, this procedure cannot be blindly applied. If the condition that is tested for the branch can be altered by the immediately preceding instruction, then the compiler must refrain from doing the interchange.
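A toy version of the interchange for an unconditional branch, using made-up (op, dest, src) tuples; a real compiler would also check condition-code dependencies before doing this for conditional branches:

```python
def fill_delay_slot(prog):
    """Move the instruction ahead of an unconditional JUMP into the
    branch delay slot (i.e., swap it to just after the branch). Only
    the first JUMP is handled in this sketch."""
    out = list(prog)
    for i in range(1, len(out)):
        if out[i][0] == "JUMP":
            out[i - 1], out[i] = out[i], out[i - 1]   # fill the delay slot
            break
    return out

prog = [("LOAD", "rA", "X"),
        ("ADD", "rA", "1"),
        ("JUMP", "L1", None),
        ("STORE", "rA", "Y")]       # instruction at label L1
print(fill_delay_slot(prog))
# ADD now occupies the delay slot after JUMP: same semantics as the
# original order, but no NOOP (wasted cycle) is needed after the branch.
```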
SHORT ANSWER
1. Introduced by IBM with its System/360, the _________ is a set of computers offered with
different price and performance characteristics that presents the same architecture to the user.
family concept

2. A large number of general-purpose registers, and/or the use of compiler technology to
optimize register usage, a limited and simple instruction set, and an emphasis on optimizing
the instruction pipeline are all key elements of _________ architectures.
RISC (reduced instruction set computer)

3. The difference between the operations provided in high-level languages (HLLs) and those
provided in computer architecture is known as the ________.
semantic gap

4. Blocks of memory, recently used global variables, memory addressing, and one operand
addressed and accessed per cycle are characteristics of _________ organizations.
cache

5. Individual variables, compiler assigned global variables, register addressing, and multiple
operands addressed and accessed in one cycle are characteristics of __________ organizations.
large register file
6. The acronym RISC stands for __________.
reduced instruction set computer

7. Although a variety of different approaches to reduced instruction set architecture have
been taken, certain characteristics are common to all of them: register-to-register
operations, simple addressing modes, simple instruction formats, and __________.
one instruction per cycle

8. A ________ is defined to be the time it takes to fetch two operands from registers,
perform an ALU operation, and store the result in a register.
machine cycle

9. The acronym CISC stands for _________.
complex instruction set computer

10. __________ is a way of increasing the efficiency of the pipeline by making use of a
branch that does not take effect until after execution of the following instruction.
Delayed branch
11. ________ can improve performance by reducing loop overhead, increasing instruction
parallelism by improving pipeline performance, and improving register, data cache, or TLB
locality.
Unrolling

12. The MIPS R4000 processor chip is partitioned into two sections, one containing the
CPU and the other containing a _________ for memory management.
coprocessor

13. A ________ architecture replicates each of the pipeline stages so that two or more
instructions at the same stage of the pipeline can be processed simultaneously.
superscalar

14. The acronym SPARC stands for __________.
scalable processor architecture

15. The work that has been done on assessing merits of the RISC approach can be grouped
into two categories: quantitative and _________.
qualitative
Chapter 16 - Instruction-Level Parallelism and
Superscalar Processors

16.1 Overview
16.2 Design Issues
Key Terms
antidependency
branch prediction
commit
flow dependency
in-order completion
in-order issue
instruction issue
instruction-level parallelism
instruction window
machine parallelism
micro-operations
micro-ops
out-of-order completion
out-of-order issue
output dependency
procedural dependency
read-write dependency
register renaming
resource conflict
retire
superpipelined
superscalar
true data dependency
write-read dependency
write-write dependency
REVIEW QUESTIONS

16.1 What is the essential characteristic of the superscalar approach to processor design?

A superscalar processor is one in which multiple independent instruction pipelines are used. Each pipeline consists of multiple stages, so that each pipeline can handle multiple instructions at a time. Multiple pipelines introduce a new level of parallelism, enabling multiple streams of instructions to be processed at a time.
16.2 What is the difference between the superscalar
and superpipelined approaches?

Superpipelining exploits the fact that many pipeline stages perform tasks that require less than half a clock cycle. Thus, a doubled internal clock speed allows the performance of two tasks in one external clock cycle.
16.3 What is instruction-level parallelism?

Instruction-level parallelism refers to the degree to which the instructions of a program can be executed in parallel.
16.4 Briefly define the following terms:
• True data dependency
• Procedural dependency
• Resource conflicts
• Output dependency
• Antidependency
• True data dependency: A second instruction needs data produced by the first instruction.
• Procedural dependency: The instructions following a branch (taken or not taken) have a procedural dependency on the branch and cannot be executed until the branch is executed.
• Resource conflicts: A resource conflict is a competition of two or more instructions for the same resource at the same time.
• Output dependency: Two instructions update the same register, so the later instruction must update later.
• Antidependency: A second instruction destroys a value that the first instruction uses.
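The register-based dependencies among these (true, anti, output) can be classified mechanically from each instruction's read and write sets; the dictionary layout and register names here are illustrative:

```python
def classify(first, second):
    """Classify register dependencies between two instructions given
    their read/write sets. Returns the set of dependency types."""
    deps = set()
    if first["writes"] & second["reads"]:
        deps.add("true (RAW)")        # second reads what first produced
    if first["reads"] & second["writes"]:
        deps.add("anti (WAR)")        # second destroys what first uses
    if first["writes"] & second["writes"]:
        deps.add("output (WAW)")      # both update the same register
    return deps

i1 = {"reads": {"r1", "r2"}, "writes": {"r3"}}
i2 = {"reads": {"r3"}, "writes": {"r1"}}
print(classify(i1, i2))   # → {'true (RAW)', 'anti (WAR)'}
```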
16.5 What is the distinction between instruction-level
parallelism and machine parallelism?

Instruction-level parallelism exists when instructions in a sequence are independent and thus can be executed in parallel by overlapping. Machine parallelism is a measure of the ability of the processor to take advantage of instruction-level parallelism. Machine parallelism is determined by the number of instructions that can be fetched and executed at the same time (the number of parallel pipelines) and by the speed and sophistication of the mechanisms that the processor uses to find independent instructions.
16.6 List and briefly define three types of superscalar
instruction issue policies.
In-order issue with in-order completion: Issue instructions in
the exact order that would be achieved by sequential execution
and to write results in that same order.
In-order issue with out-of-order completion: Issue instructions
in the exact order that would be achieved by sequential
execution but allow instructions to run to completion out of
order.
Out-of-order issue with out-of-order completion: The processor
has a lookahead capability, allowing it to identify independent
instructions that can be brought into the execute stage.
Instructions are issued with little regard for their original
program order. Instructions may also run to completion out of
order.
16.7 What is the purpose of an instruction window?

For an out-of-order issue policy, the instruction window is a buffer that holds decoded instructions. These may be issued from the instruction window in the most convenient order.
16.8 What is register renaming and what is its
purpose?

Registers are allocated dynamically by the processor hardware, and they are associated with the values needed by instructions at various points in time. When a new register value is created (i.e., when an instruction executes that has a register as a destination operand), a new register is allocated for that value.
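A minimal sketch of the idea, assuming instructions encoded as (dest, src1, src2) architectural register numbers (an invented encoding): every destination gets a fresh physical register, which removes WAR and WAW dependencies while preserving RAW ones.

```python
def rename(instrs, n_arch):
    """Map architectural registers to physical ones. Sources read the
    current rename table; each destination is given a fresh physical
    register, so later writes never clobber earlier values."""
    table = {r: r for r in range(n_arch)}    # arch -> physical mapping
    next_phys = n_arch                       # first free physical register
    out = []
    for dest, s1, s2 in instrs:
        p1 = table[s1] if s1 is not None else None
        p2 = table[s2] if s2 is not None else None
        out.append((next_phys, p1, p2))
        table[dest] = next_phys              # future reads of dest see this
        next_phys += 1
    return out

# R3 := R3 op R5 ; R3 := R5 op const  -- the WAW hazard on R3 disappears
print(rename([(3, 3, 5), (3, 5, None)], n_arch=8))
# → [(8, 3, 5), (9, 5, None)]: the two writes target distinct registers
```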
16.9 What are the key elements of a superscalar
processor organization?
(1) Instruction fetch strategies that simultaneously fetch multiple
instructions, often by predicting the outcomes of, and fetching beyond,
conditional branch instructions. These functions require the use of
multiple pipeline fetch and decode stages, and branch prediction logic.
(2) Logic for determining true dependencies involving register values,
and mechanisms for communicating these values to where they are
needed during execution.
(3) Mechanisms for initiating, or issuing, multiple instructions in
parallel.
(4) Resources for parallel execution of multiple instructions, including
multiple pipelined functional units and memory hierarchies capable of
simultaneously servicing multiple memory references.
(5) Mechanisms for committing the process state in correct order.
SHORT ANSWER
1. A ________ implementation of a processor architecture is one in which common
instructions can be initiated simultaneously and executed independently.
superscalar

2. The term ________ refers to a machine that is designed to improve the performance of
the execution of scalar instructions.
superscalar

3. ________ exploits the fact that many pipeline stages perform tasks that require less
than half a clock cycle.
Superpipelining

4. The term _________ parallelism refers to the degree to which, on average, the
instructions of a program can be executed in parallel.
instruction-level

5. A _________ is a competition of two or more instructions for the same resource at the
same time.
resource conflict
6. _________ is a measure of the ability of the processor to take advantage of instruction-
level parallelism.
Machine parallelism

7. Committing or _________ the instruction is when instructions are conceptually put back
into sequential order and their results are recorded.
retiring

8. In the operation of the Pentium 4 each instruction is translated into one or more fixed-
length RISC instructions known as _________.
micro-operations (or micro-ops)

9. The ________ takes the already decoded micro-ops from the instruction decoder and
assembles them into program-ordered sequences of micro-ops called traces.
trace cache

10. The _________ predicts the instruction stream, fetches instructions from the L1
instruction cache, and places the fetched instructions into a buffer for consumption by the
decode pipeline.
instruction fetch unit
11. Instruction-level parallelism is also determined by __________, which is the time until
the result of an instruction is available for use as an operand in a subsequent instruction.
operation latency

12. Superscalar instruction issue policies are grouped into the following categories: in-
order issue with in-order completion, out-of-order issue with out-of-order completion, and
____________.
in-order issue with out-of-order completion

13. With ____________ any number of instructions may be in the execution stage at any
one time, up to the maximum degree of machine parallelism across all functional units.
out-of-order completion

14. The ________ is a buffer used to decouple the decode and execute stages of the
pipeline to allow out-of-order issue.
instruction window

15. An alternative to _________ is scoreboarding.
register renaming
PART FIVE PARALLEL ORGANIZATION
Chapter 17 - Parallel Processing

17.1 Multiple Processor Organizations
17.2 Symmetric Multiprocessors
17.3 Cache Coherence and the MESI Protocol
17.4 Multithreading and Chip Multiprocessors
17.5 Clusters
17.6 Nonuniform Memory Access
17.7 Vector Computation
Key Terms
active standby
cache coherence
cluster
directory protocol
failback
failover
MESI protocol
multiprocessor
nonuniform memory access (NUMA)
passive standby
snoopy protocol
symmetric multiprocessor (SMP)
uniform memory access (UMA)
uniprocessor
vector facility
REVIEW QUESTIONS
17.1 List and briefly define three types of computer
system organization.
Single instruction, single data (SISD) stream: A single processor
executes a single instruction stream to operate on data stored in a
single memory.
Single instruction, multiple data (SIMD) stream: A single machine
instruction controls the simultaneous execution of a number of
processing elements on a lockstep basis. Each processing element has
an associated data memory, so that each instruction is executed on a
different set of data by the different processors.
Multiple instruction, multiple data (MIMD) stream: A set of
processors simultaneously execute different instruction sequences on
different data sets.
17.2 What are the chief characteristics of an SMP?

1. There are two or more similar processors of comparable capability.
2. These processors share the same main memory and I/O facilities and
are interconnected by a bus or other internal connection scheme, such
that memory access time is approximately the same for each
processor.
3. All processors share access to I/O devices, either through the same
channels or through different channels that provide paths to the same
device.
4. All processors can perform the same functions (hence the term
symmetric).
5. The system is controlled by an integrated operating system that
provides interaction between processors and their programs at the
job, task, file, and data element levels.
17.3 What are some of the potential advantages of an
SMP compared with a uniprocessor?

Performance: If the work to be done by a computer can be organized so that
some portions of the work can be done in parallel, then a system with
multiple processors will yield greater performance than one with a single
processor of the same type.
Availability: In a symmetric multiprocessor, because all processors can
perform the same functions, the failure of a single processor does not halt
the machine. Instead, the system can continue to function at reduced
performance.
Incremental growth: A user can enhance the performance of a system by
adding an additional processor.
Scaling: Vendors can offer a range of products with different price and
performance characteristics based on the number of processors configured in
the system.
17.4 What are some of the key OS design issues for an SMP?

• Simultaneous concurrent processes: OS routines need to be reentrant to allow several processors to execute the same OS code simultaneously. With multiple processors executing the same or different parts of the OS, OS tables and management structures must be managed properly to avoid deadlock or invalid operations.
• Scheduling: Any processor may perform scheduling, so conflicts must be avoided. The scheduler must assign ready processes to available processors.
• Synchronization: With multiple active processes having potential access to shared address spaces or shared I/O resources, care must be taken to provide effective synchronization. Synchronization is a facility that enforces mutual exclusion and event ordering.
• Memory management: Memory management on a multiprocessor must deal with all of the issues found on uniprocessor machines, as is discussed in Chapter 8. In addition, the operating system needs to exploit the available hardware parallelism, such as multiported memories, to achieve the best performance. The paging mechanisms on different processors must be coordinated to enforce consistency when several processors share a page or segment and to decide on page replacement.
• Reliability and fault tolerance: The operating system should provide graceful degradation in the face of processor failure. The scheduler and other portions of the operating system must recognize the loss of a processor and restructure management tables accordingly.
17.5 What is the difference between software and
hardware cache coherent schemes?

Software cache coherence schemes attempt to avoid the need for additional hardware circuitry and logic by relying on the compiler and operating system to deal with the problem. In hardware schemes, the cache coherence logic is implemented in hardware.
17.6 What is the meaning of each of the four states in
the MESI protocol?

Modified: The line in the cache has been modified (different from main memory) and is available only in this cache.
Exclusive: The line in the cache is the same as that in main memory and is not present in any other cache.
Shared: The line in the cache is the same as that in main memory and may be present in another cache.
Invalid: The line in the cache does not contain valid data.
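The transitions implied by these definitions can be tabulated for a single cache line. The event names, and the simplification that a local read miss always lands in Shared (real MESI enters Exclusive when no other cache holds the line), are assumptions of this sketch:

```python
# MESI transitions seen by one cache line. Keys are (state, event);
# "snoop_*" events are another processor's accesses observed on the bus.
MESI = {
    ("M", "snoop_read"):  "S",   # supply the dirty data, demote to Shared
    ("M", "snoop_write"): "I",
    ("E", "local_write"): "M",   # no bus traffic needed: line is exclusive
    ("E", "snoop_read"):  "S",
    ("E", "snoop_write"): "I",
    ("S", "local_write"): "M",   # must first invalidate the other copies
    ("S", "snoop_write"): "I",
    ("I", "local_read"):  "S",   # simplification: assume others hold it too
    ("I", "local_write"): "M",
}

def step(state, event):
    return MESI.get((state, event), state)   # unlisted events: state unchanged

s = "I"
for ev in ["local_read", "local_write", "snoop_read"]:
    s = step(s, ev)
print(s)   # I -read-> S -write-> M -snooped read-> S
```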
17.7 What are some of the key benefits of clustering?
• Absolute scalability: It is possible to create large clusters that far surpass the power of even the largest standalone machines.
• Incremental scalability: A cluster is configured in such a way that it is possible to add new systems to the cluster in small increments. Thus, a user can start out with a modest system and expand it as needs grow, without having to go through a major upgrade in which an existing small system is replaced with a larger system.
• High availability: Because each node in a cluster is a standalone computer, the failure of one node does not mean loss of service.
• Superior price/performance: By using commodity building blocks, it is possible to put together a cluster with equal or greater computing power than a single large machine, at much lower cost.
17.8 What is the difference between failover and
failback?

The function of switching applications and data resources over from a failed system to an alternative system in the cluster is referred to as failover. A related function is the restoration of applications and data resources to the original system once it has been fixed; this is referred to as failback.
17.9 What are the differences among UMA, NUMA,
and CC-NUMA?

Uniform memory access (UMA): All processors have access to all parts of
main memory using loads and stores. The memory access time of a processor
to all regions of memory is the same. The access times experienced by
different processors are the same.

Nonuniform memory access (NUMA): All processors have access to all parts
of main memory using loads and stores. The memory access time of a
processor differs depending on which region of main memory is accessed.
The last statement is true for all processors; however, for different processors,
which memory regions are slower and which are faster differ.

Cache-coherent NUMA (CC-NUMA): A NUMA system in which cache
coherence is maintained among the caches of the various processors.
SHORT ANSWER
1. Computer systems that fall into the __________ category have a single processor that
executes a single instruction stream to operate on data stored in a single memory.
SISD (single instruction, single data stream)

2. Computer systems that fall into the _________ category have a single machine instruction
that controls the simultaneous execution of a number of processing elements on a lockstep
basis.
SIMD (single instruction, multiple data stream)

3. Computer systems that fall into the __________ category have a sequence of data that is
transmitted to a set of processors, each of which executes a different instruction sequence.
MISD (multiple instruction, single data stream)

4. Computer systems that fall into the ________ category have a set of processors that
simultaneously execute different instruction sequences on different data sets.
MIMD (multiple instruction, multiple data stream)

5. A ________ is a group of interconnected, whole computers working together as a unified
computing resource that can create the illusion of being one machine.
cluster
6. The __________ is the simplest mechanism for constructing a multiprocessor system.
time-shared bus

7. ______ protocols distribute the responsibility for maintaining cache coherence among
all of the cache controllers in a multiprocessor.
Snoopy

8. The four states of the MESI protocol are: modified, shared, invalid, and ______.
exclusive

9. An approach that allows for a high degree of instruction-level parallelism without
increasing circuit complexity or power consumption is called ________.
multithreading

10. Two key characteristics of a process are: scheduling/execution and ________.
resource ownership
11. The four principal approaches to multithreading are: interleaved (fine-grained),
blocked (coarse-grained), simultaneous, and ________.
chip multiprocessing

12. _________ is the easiest multithreading approach to implement.
Interleaved multithreaded scalar

13. Widely used in data centers to save space and improve system management, a
_________ is a server architecture that houses multiple server modules in a single chassis.
blade server

14. An approach where all processors have access to all parts of main memory using loads
and stores, with the memory access time of a processor differing depending on which
region of main memory is accessed, is _________.
NUMA (nonuniform memory access)

15. A good example of a pipelined ALU organization for vector processing is the vector
facility developed for the _________ architecture and implemented on the high-end 3090
series.
IBM 370
Chapter 18 - Multicore Computers

18.1 Hardware Performance Issues


18.2 Software Performance Issues
18.3 Multicore Organization
Key Terms

Amdahl’s law
chip multiprocessor
multicore
simultaneous multithreading (SMT)
superscalar
REVIEW QUESTIONS

18.1 List the major organizational changes that have occurred
in processor design in chronological order.
The organizational changes in processor design have primarily been focused on
increasing instruction-level parallelism, so that more work could be done in
each clock cycle. These changes include, in chronological order:
• Pipelining: Individual instructions are executed through a pipeline of stages
so that while one instruction is executing in one stage of the pipeline, another
instruction is executing in another stage of the pipeline.
• Superscalar: Multiple pipelines are constructed by replicating execution
resources. This enables parallel execution of instructions in parallel pipelines,
so long as hazards are avoided.
• Simultaneous multithreading (SMT): Register banks are replicated so that
multiple threads can share the use of pipeline resources.
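The benefit of pipelining described above can be quantified with a small cycle-count sketch, under the idealized assumption of no hazards or stalls:

```python
def serial_cycles(n_instructions: int, n_stages: int) -> int:
    # Without pipelining, each instruction passes through all stages
    # before the next instruction can begin.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # With an ideal pipeline, the first instruction takes n_stages
    # cycles to fill the pipe; each later instruction completes one
    # cycle after its predecessor.
    return n_stages + (n_instructions - 1)
```

For 100 instructions through a 5-stage pipeline this gives 500 cycles serially versus 104 pipelined, approaching (but never reaching) a 5x speedup; real hazards reduce it further, which is the diminishing-returns point made in 18.2.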
18.2 Give several reasons for the choice by designers
to move to a multicore organization rather than
increase parallelism within a single processor.

In the case of pipelining, simple 3-stage pipelines were replaced by pipelines
with 5 stages, and then many more stages, with some implementations
having over a dozen stages. There is a practical limit to how far this trend can
be taken, because with more stages, there is the need for more logic, more
interconnections, and more control signals. With superscalar organization,
performance increases can be achieved by increasing the number of parallel
pipelines. Again, there are diminishing returns as the number of pipelines
increases. More logic is required to manage hazards and to stage instruction
resources. Eventually, a single thread of execution reaches the point where
hazards and resource dependencies prevent the full use of the multiple
pipelines available. This same point of diminishing returns is reached with
SMT, as the complexity of managing multiple threads over a set of pipelines
limits the number of threads and number of pipelines that can be effectively
utilized.
18.3 Valve found that a hybrid threading approach was the most
promising and would scale the best as multicore systems with
eight or sixteen processors became available. List some of the key
elements of their threading strategy for the rendering module.

Some of the key elements of the threading strategy for the rendering
module are listed in [LEON07] and include the following:
• Construct scene-rendering lists for multiple scenes in parallel (e.g.,
the world and its reflection in water).
• Overlap graphics simulation.
• Compute character bone transformations for all characters in all
scenes in parallel.
• Allow multiple threads to draw in parallel.
18.5 At a top level, what are the main design variables
in a multicore organization?

• The number of core processors on the chip

• The number of levels of cache memory

• The amount of cache memory that is shared


18.6 List some advantages of a shared L2 cache among cores
compared to separate dedicated L2 caches for each core.
1. Constructive interference can reduce overall miss rates. That is, if a thread on one
core accesses a main memory location, this brings the frame containing the referenced
location into the shared cache. If a thread on another core soon thereafter accesses
the same memory block, the memory locations will already be available in the shared
on-chip cache.

2. A related advantage is that data shared by multiple cores is not replicated at the
shared cache level.

3. With proper frame replacement algorithms, the amount of shared cache allocated
to each core is dynamic, so that threads that have less locality can employ more
cache.

4. Interprocessor communication is easy to implement, via shared memory locations.

5. The use of a shared L2 cache confines the cache coherency problem to the L1 cache
level, which may provide some additional performance advantage.
SHORT ANSWER
1. _________ states that performance increase is roughly proportional to square root of
increase in complexity.
Pollack’s rule

2. _______ law assumes a program in which a fraction (1 - f) of the execution time involves
code that is inherently serial and a fraction f that involves code that is infinitely parallelizable
with no scheduling overhead.
Amdahl’s
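Both rules above reduce to one-line formulas, so a quick numerical sketch is easy to write and check:

```python
import math

def pollack_performance(relative_complexity: float) -> float:
    # Pollack's rule: performance grows roughly as the square root
    # of the increase in complexity (e.g., transistor budget).
    return math.sqrt(relative_complexity)

def amdahl_speedup(f: float, n: int) -> float:
    # Amdahl's law: speedup on n processors when a fraction f of the
    # execution time is infinitely parallelizable and (1 - f) is serial.
    return 1.0 / ((1.0 - f) + f / n)
```

Doubling complexity buys only about a 1.41x performance gain (`pollack_performance(2)`), and with f = 0.9 even 100 processors yield under 10x (`amdahl_speedup(0.9, 100)` ≈ 9.17); both observations motivate the move to multicore.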

3. __________ applications are characterized by having a small number of highly threaded
processes.
Multithreaded native

4. _________ applications are characterized by the presence of many single-threaded
processes.
Multiprocess

5. ________ is a multithreaded process that provides scheduling and memory management
for Java applications.
Java Virtual Machine
6. _______ is an animation engine used by Valve for its games and licensed for other
game developers.
Source

7. ________ threading involves the selective use of fine-grain threading for some systems
and single threading for other systems.
Hybrid

8. ________ threading is when many similar or identical tasks are spread across multiple
processors.
Fine-grained

9. Individual modules called systems are assigned to individual processors with ________
threading.
coarse

10. The ________, introduced in 2006, implements two x86 superscalar processors with a
shared L2 cache.
Intel Core Duo
11. The Core Duo _________ is designed to manage chip heat dissipation to maximize
processor performance within thermal constraints.
thermal control unit

12. The __________ is responsible for reducing power consumption when possible, thus
increasing battery life for mobile platforms, such as laptops.
power management logic

13. A multicore computer, also known as a _________, combines two or more processors
on a single piece of silicon.
chip multiprocessor

14. A single piece of silicon is called a ________.
die

15. The _________ is a cache-coherent, point-to-point link based electrical interconnect
specification for Intel processors and chipsets that enables high-speed communications
among connected processor chips.
QPI (Quick Path Interconnect)
PART SIX THE CONTROL UNIT
Chapter 19 - Control Unit Operation

19.1 Micro-operations
19.2 Control of the Processor
19.3 Hardwired Implementation
Key Terms

control bus
control path
control signal
control unit
hardwired
implementation
micro-operations
REVIEW QUESTIONS

19.1 Explain the distinction between the written sequence and
the time sequence of an instruction.
19.2 What is the relationship between instructions and micro-
operations?
19.3 What is the overall function of a processor's control unit?
19.4 Outline a three-step process that leads to a characterization
of the control unit.
19.5 What basic tasks does a control unit perform?
19.6 Provide a typical list of the inputs and outputs of a control
unit.
19.7 List three types of control signals.
19.8 Briefly explain what is meant by a hardwired
implementation of a control unit.
SHORT ANSWER
1. The execution of an instruction involves the execution of a sequence of substeps, generally
called ________.
cycles

2. The ____________ of a processor causes the processor to step through a series of micro-
operations in the proper sequence, based on the program being executed.
control unit

3. The _________ of a processor generates the control signals that cause each micro-
operation to be executed.
control unit

4. The ____________ generated by the control unit cause the opening and closing of logic
gates, resulting in the transfer of data to and from registers and the operation of the ALU.
control signals

5. The six things needed to specify the function of a processor are: operations (opcodes),
addressing modes, registers, I/O module interface, memory module interface, and ________.
interrupts
6. Each of the smaller cycles involves a series of steps, each of which involves the
processor registers, referred to as _________.
micro-operations

7. The __________ register specifies the address in memory for a read or write
operation.
memory address (MAR)

8. The __________ register contains the value to be stored in memory or the last value
read from memory.
memory buffer (MBR)

9. If the instruction specifies an indirect address, then a(n) ________ cycle must precede
the execute cycle.
indirect
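The MAR and MBR described above can be seen in action in a sketch of the fetch cycle broken into micro-operations. Register names follow the text; the memory contents and time-unit grouping are illustrative.

```python
def fetch_cycle(regs: dict, memory: dict) -> None:
    # t1: MAR <- PC
    regs["MAR"] = regs["PC"]
    # t2: MBR <- memory[MAR]; PC <- PC + 1
    # (these two micro-operations can share a time unit because
    # they use different data paths)
    regs["MBR"] = memory[regs["MAR"]]
    regs["PC"] += 1
    # t3: IR <- MBR
    regs["IR"] = regs["MBR"]

regs = {"PC": 0x100, "MAR": 0, "MBR": 0, "IR": 0}
memory = {0x100: 0xA1B2}  # one made-up instruction word
fetch_cycle(regs, memory)
```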

10. __________ is when the control unit examines the opcode and generates a sequence
of micro-operations based on the value of the opcode.
Instruction decoding
11. The key control unit inputs are: clock, instruction register, control signals from control
bus, and _________.
flags

12. The timing of processor operations is synchronized by the __________ and controlled
by the control unit with control signals.
clock

13. Control unit implementation techniques fall into two categories: microprogrammed
implementation and ___________ implementation.
hardwired

14. The _____________ must control the state of the instruction cycle.
control unit

15. In a __________ implementation the control unit is essentially a state machine circuit
and its input logic signals are transformed into a set of output logic signals, which are the
control signals.
hardwired
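The state-machine view in answer 15 can be sketched as a pure function from (current state, inputs) to (next state, control signals); the states, opcodes, and signal names here are all invented for illustration.

```python
FETCH, DECODE, EXECUTE = "fetch", "decode", "execute"

def control_logic(state: str, opcode: int) -> tuple:
    # Combinational mapping: current state plus inputs determine
    # the next state and the asserted control signals.
    if state == FETCH:
        return DECODE, {"MAR<-PC", "memory_read"}
    if state == DECODE:
        return EXECUTE, {"IR<-MBR"}
    # EXECUTE: the signals asserted depend on the decoded opcode.
    signals = {"ALU_add"} if opcode == 0x1 else {"ALU_sub"}
    return FETCH, signals
```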
Chapter 20 - Microprogrammed Control

20.1 Basic Concepts


20.2 Microinstruction Sequencing
20.3 Microinstruction Execution
Key Terms

control memory
control word
firmware
hard microprogramming
horizontal microinstruction
microinstruction encoding
microinstruction execution
microinstruction sequencing
microinstructions
microprogram
microprogrammed control unit
microprogramming language
soft microprogramming
unpacked microinstruction
vertical microinstruction
REVIEW QUESTIONS
20.1 What is the difference between a hardwired implementation and
a microprogrammed implementation of a control unit?
20.2 How is a horizontal microinstruction interpreted?
20.3 What is the purpose of a control memory?
20.4 What is a typical sequence in the execution of a horizontal
microinstruction?
20.5 What is the difference between horizontal and vertical
microinstructions?
20.6 What are the basic tasks performed by a microprogrammed
control unit?
20.7 What is the difference between packed and unpacked
microinstructions?
20.8 What is the difference between hard and soft microprogramming?
20.9 What is the difference between functional and resource
encoding?
20.10 List some common applications of microprogramming.
SHORT ANSWER
1. An alternative to a hardwired control unit is a __________ control unit in which the logic
of the control unit is specified by a microprogram.
microprogrammed

2. The __________ generated by a microinstruction are used to cause register transfers
and ALU operations.
control signals

3. Each line of a microprogramming language describes a set of micro-operations occurring
at one time and is known as a ________.
microinstruction

4. A sequence of microinstructions is known as a ___________, or firmware.
microprogram

5. In a __________ microinstruction every bit in the control field attaches to a control line.
horizontal
6. In a ________ microinstruction a code is used for each action to be performed and the
decoder translates this code into individual control signals.
vertical
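The contrast in answers 5 and 6 can be sketched in code: a decoder expands a compact vertical code into the bit-per-line horizontal pattern. All codes, bit patterns, and control-line names below are invented for illustration.

```python
# Each horizontal control bit drives exactly one control line.
CONTROL_LINES = ["MAR<-PC", "MBR<-MEM", "PC<-PC+1", "IR<-MBR"]

# A vertical microinstruction stores a short code; the decoder
# expands it into the full horizontal bit pattern.
VERTICAL_DECODE = {
    0b00: [1, 0, 0, 0],  # assert MAR<-PC
    0b01: [0, 1, 1, 0],  # assert MBR<-MEM and PC<-PC+1 together
    0b10: [0, 0, 0, 1],  # assert IR<-MBR
}

def decode_vertical(code: int) -> list:
    # Return the names of the control lines asserted by this code.
    bits = VERTICAL_DECODE[code]
    return [name for name, bit in zip(CONTROL_LINES, bits) if bit]
```

The trade-off is visible in the sketch: the vertical form needs only 2 bits per microinstruction plus a decoder, while the horizontal form needs 4 bits but no decode delay.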

7. Microprogramming is the dominant technique for implementing control units in pure
_________ architectures due to its ease of implementation.
CISC

8. _________ processors, with their simpler instruction format, typically use hardwired
control units.
RISC

9. The two basic tasks performed by a microprogrammed control unit are microinstruction
sequencing and microinstruction __________.
execution

10. A ________________ instruction depends on the following types of information: ALU
flags, part of the opcode or address mode fields of the machine instruction, parts of a
selected register (such as the sign bit), and status bits within the control unit.
conditional branch
11. The __________ approach involves the use of a microinstruction address that has
previously been saved in temporary storage within the control unit.
residual control

12. Each microinstruction cycle is made up of two parts: fetch and _________.
execute

13. Two approaches can be taken to organizing the encoded microinstruction into fields:
functional and __________.
resource

14. The LSI-11 is a good example of a __________ microinstruction approach.
vertical

15. The principal function of the 8818 __________ is to generate the next microinstruction
address for the microprogram.
microsequencer
Key Terms – Assembly Language
assembler
assembly language
comment
directive
dynamic linker
instruction
label
linkage editor
linking
load-time dynamic linking
loading
macro
mnemonic
one-pass assembler
operand
relocation
run-time dynamic linking
two-pass assembler
