Unit 2-Basic Processing Unit
UNIT II
Overview
Instruction Set Processor (ISP) or Central Processing
Unit (CPU) – the unit which executes instructions and
coordinates the activities of the other units
A typical computing task consists of a series of steps
specified by a sequence of machine instructions that
constitute a program.
An instruction is executed by carrying out a sequence
of more fundamental operations.
Some Fundamental Concepts
Processor fetches one instruction at a time and performs
the operation specified.
Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
Processor keeps track of the address of the memory
location containing the next instruction to be fetched
using Program Counter (PC).
Instruction Register (IR) – holds the instruction fetched
from memory
Executing an Instruction
Fetch the contents of the memory location pointed to by
the PC. The contents of this location are loaded into the
IR (fetch phase).
IR ← [[PC]]
Assuming that an instruction is 4 bytes long, increment
the contents of the PC by 4 to point to the next
instruction. (fetch phase)
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR
(execution phase).
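The three steps above can be sketched as a small loop. This is only an illustrative model: the dictionary-based memory and the two opcodes (LOADI, HALT) are hypothetical and do not come from the slides.

```python
# Toy model of the fetch/execute cycle. The memory layout and the opcodes
# ("LOADI", "HALT") are hypothetical, chosen only for illustration.
INSTRUCTION_SIZE = 4  # bytes, as assumed in the text


def run(memory, pc=0):
    """Fetch and execute until HALT; return (registers, addresses fetched)."""
    registers = {}
    fetched_at = []
    while True:
        ir = memory[pc]              # IR <- [[PC]]   (fetch phase)
        fetched_at.append(pc)
        pc = pc + INSTRUCTION_SIZE   # PC <- [PC] + 4 (fetch phase)
        opcode, *operands = ir       # execution phase: act on the IR contents
        if opcode == "HALT":
            return registers, fetched_at
        if opcode == "LOADI":
            reg, value = operands
            registers[reg] = value
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


program = {0: ("LOADI", "R1", 7), 4: ("LOADI", "R2", 9), 8: ("HALT",)}
registers, fetched_at = run(program)
print(registers, fetched_at)
```

Because no branch is executed, the instructions are fetched from successive locations 0, 4, and 8.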
Processor Organization
Internal organization of the
processor
ALU – used to perform arithmetic and logical operations.
Registers for temporary storage.
Various digital circuits for executing different micro-operations
(gates, MUX, decoders, counters).
Internal path for movement of data between ALU and
registers.
Driver circuits for transmitting signals to external units.
Receiver circuits for incoming signals from external
units.
PC:
Keeps track of execution of a program
Contains the memory address of the next instruction to be fetched and
executed.
MAR:
Holds the address of the location to be accessed.
The input of MAR is connected to the internal bus and its output to the external bus.
MDR:
Contains data to be written into or read out of the addressed location.
It has two inputs and two outputs.
Data can be loaded into MDR either from memory bus or from internal
processor bus.
The data and address lines are connected to the internal bus via MDR
and MAR
Registers:
The processor registers R0 to Rn-1 vary considerably from one
processor to another.
General-purpose registers are available to the programmer.
Special-purpose registers include index and stack registers.
Registers Y, Z, and TEMP are temporary registers used by the processor
during the execution of some instructions.
Multiplexer:
Selects either the output of register Y or the constant value 4 to be
provided as input A of the ALU.
Constant 4 is used by the processor to increment the contents of PC.
Data Path:
The registers, ALU and interconnecting bus are collectively
referred to as the data path.
Executing an Instruction
Execution of an instruction involves one or more of the
following operations:
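Figure 7.2. Input and output gating for the registers in Figure 7.1.
[Figure labels: register Ri with gates Riin and Riout; register Y with Yin; constant 4 and Y feeding the MUX (Select) into ALU input A; bus feeding ALU input B; register Z with Zin and Zout.]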
The input and output gates for register Ri are
controlled by signals Riin and Riout .
3. Zout, R3in
Processor Organization
Figure 7.4. Connection and control signals for register MDR.
Fetching a Word from Memory
The response time of each memory access varies (for example, on a cache miss).
To accommodate this, the processor waits until it receives an
indication that the requested operation has been completed
(Memory-Function-Completed, MFC).
Move (R1), R2
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
Control sequence:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
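The wait for MFC can be sketched as a loop that polls the memory once per clock cycle. This is a minimal sketch: the Memory class, its latency parameter, and the cycle counting are assumptions made for illustration, not part of the slides.

```python
class Memory:
    """Hypothetical memory that needs a variable number of cycles per read."""

    def __init__(self, contents, latency):
        self.contents = contents
        self.latency = latency       # cycles until MFC is asserted
        self._remaining = 0
        self._address = None

    def start_read(self, address):
        self._address = address
        self._remaining = self.latency

    def tick(self):
        """Advance one clock cycle; return (MFC, data)."""
        self._remaining -= 1
        if self._remaining <= 0:
            return True, self.contents[self._address]
        return False, None


def move_indirect(regs, memory, src="R1", dst="R2"):
    """Move (R1), R2 as three control steps; return cycles spent waiting."""
    mar = regs[src]                  # step 1: R1out, MARin, Read
    memory.start_read(mar)
    waited = 0
    while True:                      # step 2: MDRinE, WMFC
        mfc, data = memory.tick()
        waited += 1
        if mfc:
            mdr = data               # MDR is loaded from the memory bus
            break
    regs[dst] = mdr                  # step 3: MDRout, R2in
    return waited


regs = {"R1": 100, "R2": 0}
waited = move_indirect(regs, Memory({100: 42}, latency=3))
print(regs["R2"], waited)
```

A larger latency value models a slower access such as a cache miss; the control sequence itself does not change, only the number of cycles spent in step 2.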
[Figure: timing of a memory Read operation; signals shown include Clock, MR, MDRinE, and Data.]
Storing a word in memory: Move R2, (R1)
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
Execution of a Complete
Instruction
Add (R3), R1
Fetch the instruction
Fetch the first operand (the contents of the
memory location pointed to by R3)
Perform the addition
Store the result into R1
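The four actions above can be written out at the register-transfer level. This is a sketch only: the dictionary-based memory, the register values, and the string standing in for the encoded instruction are hypothetical.

```python
def add_indirect(regs, memory, pc, instruction_size=4):
    """Carry out Add (R3), R1 as the four actions listed above."""
    ir = memory[pc]                   # 1. fetch the instruction
    pc += instruction_size
    operand = memory[regs["R3"]]      # 2. fetch the operand that R3 points to
    total = operand + regs["R1"]      # 3. perform the addition
    regs["R1"] = total                # 4. store the result into R1
    return ir, pc


regs = {"R1": 5, "R3": 200}
memory = {0: "Add (R3), R1", 200: 37}   # hypothetical contents
ir, next_pc = add_indirect(regs, memory, pc=0)
print(regs["R1"], next_pc)
```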
Execution of a Complete Instruction
[Figure: single-bus organization of the datapath. Labels: internal processor bus, PC, instruction decoder and control logic, MAR (address lines), MDR (data lines), memory bus, IR, registers up to Rn-1, Y, TEMP, constant 4, Select, MUX, ALU (inputs A and B; control lines including XOR and carry-in).]
[Figure: three-bus organization of the datapath. Labels: PC, incrementer, register file, constant 4, MUX, ALU (inputs A and B, output R), instruction decoder, IR, MDR, MAR.]
Control sequence on the three-bus organization (first three steps):
Step Action
1 PCout, R=B, MARin, Read, IncPC
2 WMFC
3 MDRoutB, R=B, IRin
Control signals shown for the single-bus organization: MDRout, IRin; R1out, Yin.
[Figure: generation of the Zin control signal from time slots T1, T4, and T6.]
Zin is asserted during time slot T1 for all instructions, during T6 for an Add instruction,
during T4 for an unconditional branch instruction, and so on.
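The rule for Zin can be read as a Boolean expression, Zin = T1 + T6·ADD + T4·BR + ..., and sketched as a function. The instruction mnemonics "ADD" and "BR" and the integer time-slot encoding are assumptions for illustration; terms for other instructions are left out, as in the text.

```python
def z_in(time_slot, instruction):
    """Zin = T1 + T6.ADD + T4.BR (terms for other instructions omitted)."""
    return (
        time_slot == 1
        or (time_slot == 6 and instruction == "ADD")
        or (time_slot == 4 and instruction == "BR")
    )


for slot in range(1, 8):
    print(slot, z_in(slot, "ADD"), z_in(slot, "BR"))
```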
Control signals for Add (R3), R1
[Figure: processor with separate instruction and data caches, connected through a bus interface to the system bus, main memory, and input/output.]
Micro-instruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                  0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                  1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                  0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                  0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                  0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                  0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                  0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1
Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
Control Word (CW) is a word whose individual
bits represent various control signals.
Every instruction will need a sequence of CWs for
its execution.
At every step, some control signals are asserted
(=1) and all others are 0
Sequence of CWs for an instruction forms the
microroutine for that instruction.
Each CW in this microroutine is referred to as a
microinstruction.
Every instruction will have its own microroutine
which is made up of microinstructions.
Microroutines for all instructions in the instruction set
of a computer are stored in a special memory called
Control Store.
Control signals are generated by sequentially reading
the CWs of the corresponding microroutine from the
control store.
Basic organization of a microprogrammed
control unit
Microprogram counter (mPC) is used to read
CWs from control store sequentially.
When a new instruction is loaded into IR, starting
address generator generates the starting address of
the microroutine.
This address is loaded into the mPC. mPC is
automatically incremented by the clock, so
successive microinstructions are read from the
control store.
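A minimal sketch of this organization, using the microroutine for Add (R3), R1: the control store is a list of 16-bit control words, and the mPC steps through it until a word asserts End. The bit patterns are the seven rows of the control-signal table for Add (R3), R1; the column order in SIGNALS, the starting address 0, and the use of End as the stop condition are assumptions made for this sketch.

```python
# Assumed column order of the control word (one bit per control signal).
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

# Control store holding one microroutine: Add (R3), R1, one CW per entry.
CONTROL_STORE = [
    "0111000111000000",
    "1000001000100010",
    "0000110000000000",
    "0011000000000100",
    "0000001000010010",
    "0000100011000000",
    "0000000000101001",
]

# Hypothetical starting address generator: instruction -> microroutine address.
STARTING_ADDRESS = {"Add (R3), R1": 0}


def run_microroutine(instruction):
    """Read CWs sequentially via the mPC; return the asserted signals per step."""
    mpc = STARTING_ADDRESS[instruction]   # loaded when IR receives the instruction
    steps = []
    while True:
        cw = CONTROL_STORE[mpc]
        asserted = [name for name, bit in zip(SIGNALS, cw) if bit == "1"]
        steps.append(asserted)
        if "End" in asserted:
            return steps
        mpc += 1                          # mPC incremented on each clock


steps = run_microroutine("Add (R3), R1")
for number, asserted in enumerate(steps, start=1):
    print(number, ", ".join(asserted))
```

Each printed line is one microinstruction: the signals whose bits are 1 in that control word.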
Execution of conditional Branch
Instruction
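[Figure: control unit organization for branching in the microprogram. A starting and branch address generator, with inputs from the IR and the condition codes, loads the mPC; the clocked mPC addresses the control store, which outputs the CW.]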
[Figure: sequential laundry timeline from 6 PM to midnight; each of the loads A to D takes 30, 40, and 20 minutes in turn.]
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
Traditional Pipeline Concept
[Figure: pipelined laundry timeline starting at 6 PM; loads A to D in task order, with intervals of 30, 40, 40, 40, 40, and 20 minutes.]
• Pipelined laundry takes 3.5 hours for 4 loads
Traditional Pipeline Concept
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Stall for dependences
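The laundry figures can be checked with a short calculation. This is a sketch under the assumption that all loads are identical and each stage handles one load at a time; the function names are illustrative.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes all stages before the next one starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """The first load passes through every stage; each further load
    completes one slowest-stage interval after the previous one."""
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


stages = [30, 40, 20]   # stage times in minutes, from the laundry example
seq = sequential_minutes(stages, 4)
pipe = pipelined_minutes(stages, 4)
print(seq / 60, pipe / 60, seq / pipe)
```

The result is 6 hours sequentially and 3.5 hours pipelined. The speedup is below the stage count of 3 because the stages are unbalanced and only four loads pass through.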
Use the Idea of Pipelining in a
Computer
Fetch + Execution
(a) Sequential execution: F1 E1 F2 E2 F3 E3 (instructions I1, I2, I3 in turn)
(b) Hardware: instruction fetch unit and execution unit, separated by interstage buffer B1
(c) Pipelined execution:
Clock cycle   1    2    3    4
Instruction
I1            F1   E1
I2                 F2   E2
I3                      F3   E3
Fetch + Decode + Execution + Write
Clock cycle   1    2    3    4    5    6    7
Instruction
I1            F1   D1   E1   W1
I2                 F2   D2   E2   W2
I3                      F3   D3   E3   W3
I4                           F4   D4   E4   W4
F: Fetch instruction; D: Decode instruction and fetch operands; E: Execute operation; W: Write results
Interstage buffers: B1, B2, B3
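The staggered timing of the four-stage pipeline can be generated programmatically. This sketch assumes an ideal pipeline with no stalls, in which instruction i enters the first stage in clock cycle i; the function names are illustrative.

```python
def pipeline_schedule(num_instructions, stages=("F", "D", "E", "W")):
    """Map each instruction to {clock_cycle: step}, assuming no stalls."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        # Instruction i enters the first stage in cycle i.
        schedule[f"I{i}"] = {i + s: f"{stage}{i}" for s, stage in enumerate(stages)}
    return schedule


def total_cycles(num_instructions, num_stages):
    """Cycles to finish all instructions: fill the pipeline, then one per cycle."""
    return num_stages + num_instructions - 1


schedule = pipeline_schedule(4)
cycles = total_cycles(4, 4)
for name, steps in schedule.items():
    row = [steps.get(c, "  ") for c in range(1, cycles + 1)]
    print(name, " ".join(row))
```

Four instructions complete in 7 cycles instead of the 16 that strictly sequential execution of four 4-step instructions would need.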
Figure 8.3. Effect of an execution operation taking more than one clock cycle.
[Figure shows instructions I1 to I5 passing through the F, D, E, and W stages.]
Pipeline Performance
• The previous pipeline is said to have been stalled for two clock cycles.
• Structural hazard – the situation when two instructions require the use
of a given hardware resource at the same time.
Pipeline Performance
Instruction hazard
Clock cycle   1    2    3    4    5    6    7    8    9
Instruction
I1            F1   D1   E1   W1
I2                 F2   F2   F2   F2   D2   E2   W2
I3                                     F3   D3   E3   W3
Clock cycle   1    2    3    4    5    6    7    8    9
Stage
F: Fetch      F1   F2   F2   F2   F2   F3
D: Decode          D1   idle idle idle D2   D3
E: Execute              E1   idle idle idle E2   E3
Idle periods – stalls (bubbles)
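[Figure: effect of a Load instruction on pipeline timing. I1: F1 D1 E1 W1; I2 (Load): F2 D2 E2 M2 W2, with an extra memory access step M2; I3, I4, and I5 follow.]
[Figure: pipeline with a data dependency between I1 (Mul) and I2 (Add). I1: F1 D1 E1 W1; I2: F2 D2 D2A E2 W2, with the decode step split into D2 and D2A; I3 and I4 follow.]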
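[Figure: operand forwarding. (a) Datapath: the register file supplies source registers SRC1 and SRC2 to the ALU, whose output goes to the result register RSLT and on to the destination. (b) Position of the source and result registers in the processor pipeline: SRC1 and SRC2 feed stage E: Execute (ALU), RSLT feeds stage W: Write (Register file), with a forwarding path from RSLT back to the ALU inputs.]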
Graph of Moore’s Law
Single-core computer
Limits
• Both physical and practical reasons pose significant
constraints to simply building ever faster serial computers:
– Transmission speeds - the speed of a serial computer is directly
dependent upon how fast data can move through hardware.
Absolute limits are the speed of light (30 cm/nanosecond) and
the transmission limit of copper wire (9 cm/nanosecond).
Increasing speeds necessitate increasing proximity of processing
elements.
– Limits to miniaturization - processor technology is allowing an
increasing number of transistors to be placed on a chip.
However, even with molecular or atomic-level components, a
limit will be reached on how small components can be.
– Economic limitations - it is increasingly expensive to make a
single processor faster. Using a larger number of moderately fast
commodity processors to achieve the same (or better)
performance is less expensive.
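The transmission-speed limit can be made concrete by computing how far a signal can travel in one clock period. This is a sketch: the two speeds are the figures quoted above, while the 3 GHz clock rate is a hypothetical example value.

```python
LIGHT_CM_PER_NS = 30.0    # speed of light, as quoted above
COPPER_CM_PER_NS = 9.0    # transmission limit of copper wire, as quoted above


def reach_per_cycle_cm(speed_cm_per_ns, clock_ghz):
    """Distance a signal can cover in one clock period."""
    period_ns = 1.0 / clock_ghz   # a 1 GHz clock has a 1 ns period
    return speed_cm_per_ns * period_ns


clock_ghz = 3.0  # hypothetical clock rate
light_reach = reach_per_cycle_cm(LIGHT_CM_PER_NS, clock_ghz)
copper_reach = reach_per_cycle_cm(COPPER_CM_PER_NS, clock_ghz)
print(light_reach, copper_reach)
```

At this example clock rate a signal in copper covers only about 3 cm per cycle, which is why higher speeds require processing elements to be closer together.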
Limits
• The relation between frequency of operation and power
consumption is given by:
Power = ½ × Capacitive load × Voltage² × Frequency
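The formula can be explored numerically. This is a sketch: the capacitive load, voltage, and frequency values below are hypothetical and serve only to show how power scales.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Power = 1/2 x capacitive load x voltage^2 x frequency."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency


# Hypothetical baseline: 1 nF switched load, 1.2 V supply, 3 GHz clock.
base = dynamic_power(1e-9, 1.2, 3e9)
# Same chip with voltage and frequency both reduced by 15 percent.
scaled = dynamic_power(1e-9, 1.2 * 0.85, 3e9 * 0.85)
ratio = scaled / base
print(base, scaled, ratio)
```

Because voltage enters squared, lowering voltage and frequency together by 15 percent cuts power to about 61 percent of the baseline (0.85 cubed).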