Comp206 Lecture7
The Processor
Read/write data
Address calculation for load/store
Operation execution
• State elements
Internal storage
• Combinational element
Operate on data
[Figure: multiplexer with inputs I0 and I1, select S and output Y; ALU with inputs A and B, function select F and output Y]
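A combinational element computes its output from its current inputs only, with no stored state. The sketch below models a two-input multiplexer and a small ALU as pure functions. The operation names accepted for F ("and", "or", "add", "sub") and the 32-bit width are assumptions made for illustration, not values from the slides.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath width


def mux2(i0: int, i1: int, s: int) -> int:
    """Two-input multiplexer: Y is I0 when S is 0 and I1 when S is 1."""
    return i1 if s else i0


def alu(a: int, b: int, f: str) -> int:
    """Toy ALU: Y = F(A, B). The names accepted for F are hypothetical."""
    if f == "and":
        return a & b
    if f == "or":
        return a | b
    if f == "add":
        return (a + b) & MASK32  # wrap around like 32-bit hardware
    if f == "sub":
        return (a - b) & MASK32
    raise ValueError(f"unsupported ALU function: {f}")


if __name__ == "__main__":
    print(mux2(5, 9, 0), mux2(5, 9, 1))
    print(alu(7, 5, "sub"))
```

Both functions return the same result for the same inputs every time, which is what distinguishes them from state elements.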
Sequential Elements
• Register: stores data in a circuit
Uses a clock signal to determine when to update the
stored value
Edge-triggered: update when Clk changes from 0 to 1
If a signal is written at the same time it is read, the
result is unpredictable
A clocking methodology makes it predictable
Clock edge: quick transition from low to high or vice
versa
[Figure: timing diagrams for a register (Clk, D, Q) and for a register with write control (Clk, Write, D, Q)]
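The register behavior described above can be sketched in software. This is a minimal model, assuming a rising-edge-triggered register with an optional Write input; the class name `Register` and the method `tick` are hypothetical names chosen for this example.

```python
class Register:
    """Rising-edge-triggered register with a write-enable input."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial     # stored value, visible on output Q
        self._prev_clk = 0   # last clock level seen, used to detect edges

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present the clock level, D and Write; return Q after this step."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d       # update only on a 0 -> 1 transition with Write set
        self._prev_clk = clk
        return self.q


if __name__ == "__main__":
    reg = Register()
    print(reg.tick(0, 7))           # clock low: Q keeps its initial value
    print(reg.tick(1, 7))           # rising edge: Q takes the value of D
    print(reg.tick(1, 9))           # clock stays high: no edge, Q unchanged
    print(reg.tick(0, 9))           # falling edge: Q unchanged
    print(reg.tick(1, 9, write=0))  # rising edge but Write is 0: Q unchanged
```

Changes on D between rising edges never reach Q, which is why a value read during a cycle is the one written in an earlier cycle.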
Clocking Methodology
• Combinational logic transforms data during clock cycles
Between clock edges
Input from state elements, output to state element
Longest delay determines clock period
• Edge-triggered timing: no feedback within a single cycle
State elements are updated on a rising edge
Inputs are written in the previous cycle
[Figure label: 1 cycle]
Branch Instructions
[Figure: branch datapath; annotations: "Just re-routes wires", "Compute branch target", "Sign-bit wire replicated"]
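The branch target computation can be sketched as follows. This assumes the standard MIPS rule, target = (PC + 4) + (sign-extended 16-bit offset shifted left by 2); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Extend a 16-bit field to 32 bits by replicating the sign bit (bit 15)."""
    value &= 0xFFFF
    if value & 0x8000:
        return value | 0xFFFF0000  # copy the sign bit into bits 31:16
    return value


def branch_target(pc: int, offset16: int) -> int:
    """Return (PC + 4) + (sign-extended offset << 2), wrapped to 32 bits."""
    word_offset = sign_extend16(offset16)
    # In hardware the shift by 2 costs no logic: it only re-routes wires.
    byte_offset = (word_offset << 2) & MASK32
    return (pc + 4 + byte_offset) & MASK32


if __name__ == "__main__":
    print(hex(branch_target(0x1000, 3)))       # forward branch by 3 words
    print(hex(branch_target(0x1000, 0xFFFF)))  # offset -1: back by one word
```

With an offset of -1 (0xFFFF) the target equals the address of the branch itself, since the offset is relative to PC + 4.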
Instruction   31:26      25:21   20:16   15:0
Load/Store    35 or 43   rs      rt      address
Branch        4          rs      rt      address
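The bit ranges above translate directly into shifts and masks. The sketch below extracts the four fields from a 32-bit instruction word and classifies it by opcode (35 or 43 for load/store, 4 for branch); the function names and the sample field values are hypothetical.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit instruction word into the fields of the layout above."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
        "address": instr & 0xFFFF,       # bits 15:0
    }


def classify(opcode: int) -> str:
    """Map the opcodes listed above to an instruction class."""
    return {35: "load", 43: "store", 4: "branch"}.get(opcode, "other")


if __name__ == "__main__":
    # Hypothetical word: opcode 35, rs 19, rt 8, address 32.
    word = (35 << 26) | (19 << 21) | (8 << 16) | 32
    fields = decode_fields(word)
    print(fields, classify(fields["opcode"]))
```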
The control unit can set all but one of the control signals based solely on the opcode
field of the instruction
• The exception is PCSrc: this control line should be asserted only if the instruction is
branch on equal (a decision that the control unit can make) and the Zero output of the
ALU, which is used for equality comparison, is also asserted
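The PCSrc rule is a logical AND of two signals. A minimal sketch, assuming branch on equal has opcode 4 as in the format table; the function names and the `target` parameter are hypothetical.

```python
BEQ_OPCODE = 4  # opcode of branch on equal


def pc_src(opcode: int, alu_zero: bool) -> bool:
    """PCSrc = Branch AND Zero."""
    branch = opcode == BEQ_OPCODE  # decided by the control unit from the opcode
    return branch and alu_zero     # Zero comes from the ALU equality comparison


def next_pc(pc: int, target: int, opcode: int, alu_zero: bool) -> int:
    """Select the branch target when PCSrc is asserted, otherwise PC + 4."""
    return target if pc_src(opcode, alu_zero) else pc + 4


if __name__ == "__main__":
    print(hex(next_pc(0x1000, 0x2000, 4, True)))   # beq taken
    print(hex(next_pc(0x1000, 0x2000, 4, False)))  # beq not taken
    print(hex(next_pc(0x1000, 0x2000, 35, True)))  # not a branch
```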
R-Type Instruction Control Path
• The instruction is fetched, and the PC is incremented
• Two registers, $t2 and $t3, are read from the register file; also,
the main control unit computes the setting of the control lines
• The ALU operates on the data read from the register file, using
the function code (bits 5:0, which is the funct field, of the
instruction) to generate the ALU function
• The result from ALU is written into register file using bits
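The R-type steps can be traced in a small simulation. This sketch relies on several assumptions that are not on the slide: the standard MIPS R-format (rs in bits 25:21, rt in bits 20:16, destination rd in bits 15:11), the standard funct codes for add, sub, and, or, and $t1 as a hypothetical destination register (registers 9, 10, 11 stand for $t1, $t2, $t3). The function name and the dictionary used as instruction memory are also illustrative.

```python
MASK32 = 0xFFFFFFFF


def step_r_type(pc: int, instr_mem: dict, regs: list) -> int:
    """Execute one R-type instruction and return the incremented PC."""
    instr = instr_mem[pc]          # 1. fetch the instruction
    new_pc = pc + 4                #    and increment the PC
    rs = (instr >> 21) & 0x1F      # 2. read two source registers
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F      #    destination field (assumed bits 15:11)
    funct = instr & 0x3F           #    function code, bits 5:0
    a, b = regs[rs], regs[rt]
    operations = {                 # 3. funct selects the ALU function
        0x20: a + b,               #    add
        0x22: a - b,               #    sub
        0x24: a & b,               #    and
        0x25: a | b,               #    or
    }
    if funct not in operations:
        raise ValueError(f"unsupported funct code: {funct:#x}")
    if rd != 0:                    # register 0 is hard-wired to zero in MIPS
        regs[rd] = operations[funct] & MASK32  # 4. write the result back
    return new_pc


if __name__ == "__main__":
    regs = [0] * 32
    regs[10], regs[11] = 5, 7      # $t2 = 5, $t3 = 7
    add_instr = (10 << 21) | (11 << 16) | (9 << 11) | 0x20
    print(step_r_type(0, {0: add_instr}, regs), regs[9])
```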
• Pipelining
As soon as the washer is finished with the first load and that load is placed in
the dryer, you load the washer with the second dirty load
• If all stages take a similar amount of time and there is enough work
to do, the speedup approaches the number of stages
◼ Four loads: speedup = 8/3.5 ≈ 2.3x
◼ The pipeline is not full yet: 2.3 < 4
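The laundry numbers can be reproduced with a short calculation. This sketch assumes four stages of 0.5 time units each, a stage time chosen because it yields the totals 8 and 3.5 quoted above; the function names are hypothetical.

```python
def sequential_time(loads: int, stages: int, stage_time: float) -> float:
    """Each load passes through every stage before the next load starts."""
    return loads * stages * stage_time


def pipelined_time(loads: int, stages: int, stage_time: float) -> float:
    """The first load fills the pipeline; each later load adds one stage time."""
    return (stages + loads - 1) * stage_time


def speedup(loads: int, stages: int, stage_time: float) -> float:
    return sequential_time(loads, stages, stage_time) / pipelined_time(
        loads, stages, stage_time
    )


if __name__ == "__main__":
    print(sequential_time(4, 4, 0.5), pipelined_time(4, 4, 0.5))
    print(round(speedup(4, 4, 0.5), 1))     # four loads
    print(round(speedup(1000, 4, 0.5), 2))  # many loads
```

With only four loads the fill time dominates, so the speedup stays well below 4; with many loads it approaches the number of stages.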
MIPS Pipeline
• Five stages, one step per stage
IF: Instruction fetch from memory
ID: Instruction decode & register read
EX: Execute operation or calculate address
MEM: Access memory operand
WB: Write result back to register
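The overlap between instructions can be visualized with a small schedule generator. This is a sketch of an ideal pipeline, assuming one new instruction enters per cycle and there are no hazards or stalls; `pipeline_schedule` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions: int) -> list:
    """Row i lists the stage instruction i occupies in each cycle ('' if none)."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for i, row in enumerate(pipeline_schedule(3)):
        print(f"instr {i}: " + " ".join(f"{stage:>3}" for stage in row))
```

Reading the output column by column shows that, once the pipeline is full, every stage is busy with a different instruction in the same cycle.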
Pipeline Performance
• Single-cycle model: every instruction takes exactly one clock cycle, so the
clock cycle must be stretched to accommodate the slowest instruction
• Pipelined: all the pipeline stages take a single clock cycle, so the clock
cycle must be long enough to accommodate the slowest operation
[Figure: single-cycle execution (Tc = 800 ps) versus pipelined execution (Tc = 200 ps)]
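The two clock periods follow from per-stage latencies. The latencies below are assumed values, chosen only because they reproduce Tc = 800 ps and Tc = 200 ps; the sketch also assumes the slowest instruction uses all five stages.

```python
# Assumed stage latencies in picoseconds (illustrative, not from the slides).
STAGE_LATENCY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_period(latencies: dict) -> int:
    """The cycle must fit the slowest instruction, assumed to use every stage."""
    return sum(latencies.values())


def pipelined_period(latencies: dict) -> int:
    """The cycle must fit the slowest single stage."""
    return max(latencies.values())


if __name__ == "__main__":
    print(single_cycle_period(STAGE_LATENCY_PS))  # 800
    print(pipelined_period(STAGE_LATENCY_PS))     # 200
```

Stages that finish early, such as the 100 ps ones assumed here, still occupy a full 200 ps cycle in the pipelined design.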
Pipeline Speedup
• If all stages are balanced
i.e., all take the same time
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$$
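A quick check of this formula against the clock periods from the Pipeline Performance slide (800 ps nonpipelined, 200 ps pipelined, five stages). The function name is hypothetical, and the comparison is a sketch of the ideal case, not a measured result.

```python
def ideal_time_between_instructions(nonpipelined_ps: float, num_stages: int) -> float:
    """Balanced-stage ideal: nonpipelined time divided by the number of stages."""
    return nonpipelined_ps / num_stages


if __name__ == "__main__":
    ideal = ideal_time_between_instructions(800, 5)
    actual = 200  # pipelined clock period from the slides
    print(ideal)         # 160.0 ps if all five stages were balanced
    print(800 / actual)  # 4.0: achieved speedup, below the ideal of 5
```

The gap between 160 ps and 200 ps is consistent with stages that are not perfectly balanced: the clock must fit the slowest stage.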