Ch-6-Functional Organization
FUNCTIONAL ORGANIZATION
Implementation of
Simple Data Paths
The Big Picture: Where are We
Now?
• The Five Classic Components of a Computer
– Input
– Output
– Memory
– Datapath (part of the processor)
– Control (part of the processor)
CPU Basics
• The computer’s CPU fetches, decodes, and
executes program instructions.
• The two principal parts of the CPU are the
datapath and the control unit.
– The datapath consists of an arithmetic-logic unit and
storage units (registers) that are interconnected by a
data bus that is also connected to main memory.
– The various CPU components perform sequenced
operations according to signals provided by the
control unit.
Organization of a Simple Computer
Register transfer language
• Two-character names denote registers, such as R0, R1, DR, or
SA.
• Arrows indicate data transfers. To copy the contents of the
source register R2 into the destination register R1 in one
clock cycle:
R1 ← R2
• A conditional transfer is performed only if the Boolean
condition in front of the colon is true. To transfer R3 to R2
when K = 1:
K: R2 ← R3
• Multiple transfers on the same clock cycle are separated by
commas.
R1 ← R2, K: R2 ← R3
Register transfer operations
• We can apply arithmetic operations to registers.
R1 ← R2 + R3
R3 ← R1 - 1
• Logical operations are applied bitwise. AND and OR are denoted
with special symbols to prevent confusion with arithmetic
operations.
R2 ← R1 ∧ R2 bitwise AND
R3 ← R0 ∨ R1 bitwise OR
• Lastly, we can shift registers. Here, the source register R1 is
not modified, and we assume that the shift input is just 0.
R2 ← sl R1 left shift
R2 ← sr R1 right shift
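The register transfer notation above can be mimicked in a few lines of Python. This is only a minimal sketch: the 8-bit register width, the initial register values, and the `step` helper are assumptions made for illustration. The helper reads every source from the old state before writing any destination, so transfers listed together behave as if they occur on the same clock cycle.

```python
WIDTH = 8                    # assumed register width (not given in the slides)
MASK = (1 << WIDTH) - 1      # keeps every result within WIDTH bits

def step(regs, transfers):
    """Apply one clock cycle of register transfers.

    transfers is a list of (condition, destination, function) tuples.
    All sources are read from the old state, so the transfers are
    effectively simultaneous.
    """
    new = dict(regs)
    for cond, dest, fn in transfers:
        if cond:
            new[dest] = fn(regs) & MASK
    return new

# Hypothetical initial register contents
regs = {"R0": 0b00001111, "R1": 0b00110011, "R2": 5, "R3": 7}
K = 1

# R1 <- R2, K: R2 <- R3   (same clock cycle)
regs = step(regs, [(True, "R1", lambda r: r["R2"]),
                   (K == 1, "R2", lambda r: r["R3"])])
after_parallel = dict(regs)  # R1 holds the old R2; R2 holds R3

regs = step(regs, [(True, "R1", lambda r: r["R2"] + r["R3"])])   # R1 <- R2 + R3
regs = step(regs, [(True, "R3", lambda r: r["R1"] - 1)])         # R3 <- R1 - 1
regs = step(regs, [(True, "R2", lambda r: r["R1"] & r["R2"])])   # bitwise AND
regs = step(regs, [(True, "R3", lambda r: r["R0"] | r["R1"])])   # bitwise OR

regs = step(regs, [(True, "R2", lambda r: r["R1"] << 1)])        # R2 <- sl R1
left = regs["R2"]
regs = step(regs, [(True, "R2", lambda r: r["R1"] >> 1)])        # R2 <- sr R1
right = regs["R2"]
```

Masking after every transfer models the fixed register width: a bit shifted out on the left is lost, and a 0 enters at the vacated position, matching the assumption that the shift input is 0.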
CONTROL UNIT:
Hardwired realization vs.
Microprogrammed realization
Micro Operations
• The execution of an instruction involves a
sequence of substeps, generally called cycles.
• For example, the execution of an instruction may consist of fetch,
indirect, execute, and interrupt cycles.
• Each cycle is in turn made up of a sequence of
more fundamental operations, called
micro-operations, which are the functional, or atomic,
operations of a CPU.
• A single micro-operation generally involves a
transfer between two registers, a transfer
between a register and an external bus, or a
single ALU operation.
The figure depicts the relationship among the
various concepts in the execution of a program, which
consists of the sequential execution of instructions.
• Each instruction is executed during an
instruction cycle made up of shorter subcycles
like fetch, indirect, execute, and interrupt.
• MAR ← (PC)
– Move the contents of the PC to the MAR.
• MBR ← Memory
– Move the contents of the memory location
specified by the MAR to the MBR.
• PC ← (PC) + 1
– Increment the PC so that it points to the next instruction.
• IR ← (MBR)
– Move the contents of the MBR to the IR.
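The four fetch micro-operations above can be traced with a small simulation. This is a sketch under stated assumptions: memory is modelled as a Python dictionary of word addresses, each instruction occupies one word (so the PC advances by 1), and the memory contents and starting address are hypothetical.

```python
def fetch_cycle(cpu, memory):
    """Perform the fetch micro-operations in order on the cpu register dict."""
    cpu["MAR"] = cpu["PC"]             # MAR <- (PC)
    cpu["MBR"] = memory[cpu["MAR"]]    # MBR <- Memory
    cpu["PC"] = cpu["PC"] + 1          # PC <- (PC) + 1
    cpu["IR"] = cpu["MBR"]             # IR <- (MBR)
    return cpu

# Hypothetical memory contents and initial register state
memory = {100: "ADD R0, R2, R3", 101: "SUB R1, R4, R5"}
cpu = {"PC": 100, "MAR": None, "MBR": None, "IR": None}

fetch_cycle(cpu, memory)
first_ir, first_pc = cpu["IR"], cpu["PC"]

fetch_cycle(cpu, memory)   # the incremented PC now selects the next instruction
```

After the first call the IR holds the word at address 100 and the PC holds 101, which is why the second call fetches the following instruction.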
Functions of Control Unit
• Sequencing
– Causing the CPU to step through a series of micro-
operations
• Execution
– Causing the performance of each micro-op
• This is done using Control Signals
Control Signals
• Clock
– One micro-instruction (or set of parallel micro-
instructions) per clock cycle
• Instruction register
– Op-code for current instruction
– Determines which micro-instructions are performed
• Flags
– State of CPU
– Results of previous operations
• From control bus
– Interrupts
– Acknowledgements
Model of Control Unit
The inputs are as follows:
• Clock: This is how the CU “keeps time”. The CU causes
one micro-operation (or a set of simultaneous
micro-operations) to be performed for each clock pulse.
• Instruction register: The opcode of the current
instruction is used to determine which
micro-operations to perform during the execute cycle.
• Flags: These are needed by the CU to determine the
status of the processor and the outcome of previous
ALU operations.
• Control signals from the system bus: The control bus
portion of the system bus provides signals to the CU,
such as interrupt signals and acknowledgments.
The outputs are as follows:
• Control signals within the processor: These are of
two types: those that cause data to be moved from
one register to another, and those that activate
specific ALU functions.
• Control signals to the system bus: These are also of two
types: control signals to memory and control signals
to the I/O modules.
• The control signals generated by the CU open and close
logic gates, resulting in the
transfer of data to and from registers and the
operation of the ALU.
• Consider the fetch cycle (as an example) to
see how the CU maintains control.
• The first step is to transfer the contents of the
PC to MAR.
• The CU does this by activating the control
signal that opens the logic gates between the
PC and the MAR.
• The next step is to read a word from memory
into the MBR and increment the PC.
• After an instruction has been fetched into the
instruction register (IR), decoding begins with
an address-select or sequencing-logic method.
• This identifies the instruction and transfers it to
the microinstruction PC.
• Each microinstruction is an encoded
representation of a micro-operation which,
when executed, issues control signals to the
data path.
• The CU reads the word from memory and increments the PC by sending the following
control signals simultaneously:
– A control signal that opens gates, allowing the
contents of the MAR onto the address bus.
– A memory read control signal (RD) on the control bus.
– A control signal that opens the gates, allowing the
contents of the data bus to be stored in the MBR.
– A control signal to the logic gates that add 1 to the
contents of the PC and store the result back in the PC.
• Following this, the CU sends a control signal that opens the gates
between the MBR and the IR.
• This completes the fetch cycle.
• The CU must decide whether to perform an
indirect cycle or an execute cycle next.
• To decide this, it examines the IR to see
whether an indirect memory reference is
made or not.
• The indirect and interrupt cycles work
similarly.
• For the execute cycle, the CU begins by
examining the opcode and, on the basis of
that, it decides which sequence of micro-
operations to perform for the execute cycle.
Control Unit Implementation
• A wide variety of techniques have been used
for the control unit implementation.
• Most of these fall into one of two
categories:
– Hardwired control
– Microprogrammed control
Hardwired control unit
• A hardwired control unit is essentially a logic block
consisting of gates, flip-flops, decoders, and other
digital circuits.
• When an instruction is read into the IR, the bit
pattern provides an input to the logic block.
• The output bit pattern provides control signals for
the data path together with information needed to
control the timing signal on the next clock cycle.
• In this way each instruction causes an appropriate
sequence of control signals to be generated.
Hardwired control unit
• A hardwired control unit can only be altered by
rearranging the wiring that connects the various
logic components.
• This is a time-consuming exercise and usually
involves redesigning the complete logic block.
• On the other hand, hardwired control units are faster
than microprogrammed control units. This is
because they avoid microprogram memory read
operations, which tend to be slower than the basic
Boolean operations performed by the hardwired
decoder logic.
The main points to be noted about hardwired control
units are:
• They minimize the average number of clock cycles needed
per instruction.
• They occupy a relatively small area (typically 10% of the
CPU chip area).
• They are less flexible than microprogrammed control units
and cannot be easily modified without extensive redesign.
• They are impractical for use with complex instruction
formats.
• Their sequencing and micro-operation logic is complex.
• They are difficult to design and test.
• It is difficult to add new instructions.
Microprogrammed control unit
• The control information is stored in a control memory,
and the control memory is programmed to initiate the
required sequence of micro-operations.
• Any required change can be made by updating the
microprogram in control memory.
• In many ways, a microprogrammed control unit is like a
CPU within a CPU: it has its own microinstruction
program counter, which it uses to access a microprogram
stored in a ROM or programmable logic array (PLA).
• A microprogram consists of a set of microinstructions,
each having a bit pattern that controls
the movement of information through the data path.
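The idea of a control memory stepped through by a microinstruction program counter can be sketched as follows. Everything concrete here is an assumption for illustration: the signal names, their bit positions, the `END` marker bit, and the three-word microroutine (which encodes the fetch cycle) are hypothetical and do not describe any particular CPU.

```python
# Hypothetical control-signal bit positions within a microinstruction word
PC_TO_MAR = 0b00001   # open gates from PC to MAR
MEM_READ  = 0b00010   # read memory into MBR
INC_PC    = 0b00100   # add 1 to PC
MBR_TO_IR = 0b01000   # open gates from MBR to IR
END       = 0b10000   # assumed marker: last microinstruction of the routine

SIGNAL_NAMES = {
    PC_TO_MAR: "PC_TO_MAR",
    MEM_READ: "MEM_READ",
    INC_PC: "INC_PC",
    MBR_TO_IR: "MBR_TO_IR",
}

# Control memory: one bit pattern per microinstruction
CONTROL_MEMORY = [
    PC_TO_MAR,            # cycle 1
    MEM_READ | INC_PC,    # cycle 2: two signals issued together
    MBR_TO_IR | END,      # cycle 3
]

def run_microprogram(control_memory, start=0):
    """Step the micro-PC through control memory, collecting the signals per cycle."""
    micro_pc = start
    issued = []
    while True:
        word = control_memory[micro_pc]
        issued.append([name for bit, name in SIGNAL_NAMES.items() if word & bit])
        if word & END:
            return issued
        micro_pc += 1

signals_per_cycle = run_microprogram(CONTROL_MEMORY)
```

Changing the behaviour of this control unit means editing `CONTROL_MEMORY` rather than any logic, which illustrates the flexibility described above.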
Microprogrammed control unit
The main points to be noted about a
microprogrammed control unit are:
• It is flexible and allows designers to incorporate
new and more powerful instructions as technology
increases the available chip area for the CPU.
• It allows any design errors found
during the prototyping stage to be removed.
• It requires several clock cycles to execute each
instruction, due to the access time of the microprogram
memory, so operation is slow.
• It occupies a large portion (typically 55%) of the CPU
chip area.
Introduction to
Instruction-Level Parallelism (ILP)
Introduction
• Pipelining became a universal technique around 1985
– Overlaps execution of instructions
– Exploits “Instruction Level Parallelism”
Instruction Level Parallelism (ILP)
• Suppose we have an expression of the form
x = (a+b) * (c-d)
• Assuming a, b, c, and d are in registers, this might
turn into
ADD R0, R2, R3
SUB R1, R4, R5
MUL R0, R0, R1
STR R0, x
ILP (cont)
ADD R0, R2, R3
SUB R1, R4, R5
MUL R0, R0, R1
STR R0, x
• The MUL has a dependence on
the ADD and the SUB, and the
STR has a dependence on the
MUL
• However, the ADD and SUB are
independent
• In theory, we could execute
them in parallel, even out of
order
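The dependences described above can be found mechanically. The sketch below is an illustration only: the tuple encoding of the four instructions and the `true_dependences` helper are assumptions. It records, for each register an instruction reads, the most recent earlier instruction that wrote it. It tracks only these read-after-write dependences and ignores other kinds, such as two instructions writing the same register.

```python
# (opcode, destination register, source registers); STR writes memory, not a register
program = [
    ("ADD", "R0", ["R2", "R3"]),
    ("SUB", "R1", ["R4", "R5"]),
    ("MUL", "R0", ["R0", "R1"]),
    ("STR", None, ["R0"]),
]

def true_dependences(program):
    """Return (producer_index, consumer_index) pairs for read-after-write dependences."""
    deps = []
    last_writer = {}                      # register name -> index of latest writer
    for i, (_op, dest, sources) in enumerate(program):
        for src in sources:
            if src in last_writer:
                deps.append((last_writer[src], i))
        if dest is not None:
            last_writer[dest] = i
    return deps

deps = true_dependences(program)

def independent(i, j, deps):
    """True if neither instruction directly depends on the other."""
    return (i, j) not in deps and (j, i) not in deps

add_sub_independent = independent(0, 1, deps)
```

With instructions numbered from 0, the pairs found are ADD to MUL, SUB to MUL, and MUL to STR, while no pair links ADD and SUB, so those two could be issued in the same cycle.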
The Data Flow Graph
• We can see this more clearly if we draw the
data flow graph
R2  R3    R4  R5
  \  /      \  /
  ADD       SUB
     \      /
       MUL
        |
        x
Amount of ILP?
• This is obviously a very simple example
• However, real programs often have quite a
few independent instructions which could be
executed in parallel
• The exact number is clearly program-dependent,
but analysis has shown that about 4 is not
uncommon (in parts of the program, anyway).
Instruction Pipelining
• As computer systems evolve, greater
performance can be achieved by taking
advantage of improvements in technology,
such as faster circuitry.
• In addition, organizational enhancements to
the CPU can improve performance.
• We have already seen some examples of this,
such as the use of multiple registers and the use of
cache memory.
• Another organizational approach which is
quite common nowadays is instruction
pipelining.
• Instruction pipelining is similar to the use of
an assembly line in a manufacturing plant.
• An assembly line takes advantage of the fact
that a product goes through various stages of
production.
• By laying the production process out in an
assembly line, products at various stages can
be worked on simultaneously. This process is
also referred to as pipelining.
• To apply this concept to instruction execution,
we must recognize that, in fact, an instruction
execution process has a number of stages.
• Clearly there should be some opportunity for
pipelining.
• As a simple approach, consider subdividing
instruction processing into two stages: fetch
instruction and execute instruction.
• There are times during the execution of an
instruction when main memory is not being
accessed.
• This time could be used to fetch the next
instruction in parallel with the execution of
the current one.
• This pipelining method is known as instruction
‘prefetch’ or ‘fetch overlap’.
• It should be clear that this process will speed up instruction
execution.
• However, doubling the execution rate is unlikely, for
reasons such as the following:
– The execution time will generally be longer than the fetch
time.
– Execution involves reading and storing operands and
performing the operation. Thus the fetch stage may have to
wait for some time before it can empty its buffer.
– A conditional branch instruction makes the address of the
next instruction unknown, so the fetch stage must wait until
it receives the next instruction address from the
execute stage.
– The execute stage may then have to wait while the next
instruction is fetched.
• To gain further speed up, the pipeline must
have more stages.
• Consider the following decomposition of the
instruction processing.
• Fetch Instruction (FI): Read the next expected instruction into a buffer.
• Decode Instruction (DI): Determine the opcode and the operand specifiers.
• Calculate Operands (CO): Calculate the effective address of each source
operand. This may involve displacement, register
indirect, or other forms of address calculation.
• Fetch Operands (FO): Fetch each operand from memory.
• Execute Instruction (EI): Perform the indicated operation.
• Write Operands (WO): Store the result in memory.
• The figure below shows that a six-stage pipeline
can reduce the execution time for 9
instructions (as an example) from 54 time
units to 14 time units.
• Let us assume that all 9 instructions are of equal
duration.
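The 54 and 14 time-unit figures follow from simple counting: without pipelining, n instructions of k stages take n × k time units, while with pipelining the first instruction takes k units and each later one completes one unit after its predecessor, giving k + (n − 1). The sketch below checks this arithmetic and prints a text version of a timing diagram. It assumes every stage takes exactly one time unit and that no stalls occur; the function names are illustrative.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def sequential_time(n, k):
    """Time units for n instructions of k stages executed one after another."""
    return n * k

def pipelined_time(n, k):
    """Time units with an ideal k-stage pipeline: fill once, then one per unit."""
    return k + (n - 1)

def timing_diagram(n, stages):
    """Return one text row per instruction; column t shows the stage active at time t."""
    total = pipelined_time(n, len(stages))
    rows = []
    for i in range(n):
        # instruction i enters the pipeline i time units after the first one
        cells = ["  "] * i + list(stages) + ["  "] * (total - i - len(stages))
        rows.append(f"I{i + 1} " + " ".join(cells))
    return "\n".join(rows)

before = sequential_time(9, len(STAGES))
after = pipelined_time(9, len(STAGES))
diagram = timing_diagram(9, STAGES)
print(before, after)
print(diagram)
```

Each row of the printed diagram is shifted one column to the right of the row above it, so the last instruction finishes its WO stage in column 14.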
Timing diagram for Instruction Pipeline operation
• The diagram assumes that all of the stages can
be performed in parallel. In particular, it
assumes that there are no memory conflicts.
For example, the FI, FO, and WO stages each involve a
memory access.
• The diagram implies that all these accesses
can occur simultaneously, but most memory
systems will not permit that.
• However, the desired value may be in the
cache, or the FO or WO stage may be null.
• Thus, much of the time, memory conflicts will
not slow down the pipeline.
• The diagram assumes that each instruction
goes through all six stages of the pipeline.
• This will not always be the case.
• For example, a LOAD instruction does not
need the WO stage.
• Several other factors serve to limit the
performance enhancement.
• If the six stages are not of equal duration, there
will be some waiting involved at various
pipeline stages.
Pipelining
– Instruction prefetch breaks up instruction execution into two
parts: fetch and execute.
– In pipelining, we break an instruction up into
many parts, each one handled by a dedicated
hardware unit, with all units running in parallel.
– Each unit is called a stage. After the pipeline is
filled, one instruction completes in every time interval
equal to the length of the longest stage. This time interval is
the clock cycle of the CPU. The time to fill the
pipeline is called the latency.
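The relationship between stage lengths, clock cycle, and latency can be made concrete with a short calculation. This is a sketch under assumptions: the stage durations below are invented for illustration, every stage is stretched to the length of the longest one, and no stalls occur.

```python
def pipeline_metrics(stage_times, n):
    """Return (cycle, latency, pipelined_total, unpipelined_total) for n instructions."""
    k = len(stage_times)
    cycle = max(stage_times)               # clock cycle = longest stage
    latency = k * cycle                    # time to fill the pipeline
    pipelined_total = (k + n - 1) * cycle  # fill once, then one result per cycle
    unpipelined_total = n * sum(stage_times)
    return cycle, latency, pipelined_total, unpipelined_total

# Hypothetical stage durations in nanoseconds for a six-stage pipeline
stage_times = [2, 1, 1, 2, 3, 1]
cycle, latency, pipelined_total, unpipelined_total = pipeline_metrics(stage_times, 9)
```

With these assumed durations the clock cycle is set by the 3 ns stage, so the shorter stages sit idle for part of every cycle. This is the waiting that arises when stages are not of equal duration, and it is why the speedup falls short of the number of stages.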
Pipelining
a) A five-stage pipeline
b) The state of each stage as a function of time. Nine clock cycles are
illustrated
Limits to Pipelining
• Hazards prevent the next instruction from executing
during its designated clock cycle
– Structural hazards: an attempt to use the same
hardware to do two different things at once
– Data hazards: an instruction depends on the result of a
prior instruction still in the pipeline
– Control hazards: caused by the delay between the
fetching of instructions and decisions about
changes in control flow (branches and jumps)