CH 5
State Element
A state element has at least two inputs and one output.
The two required inputs are:
The data value to be written
The clock, which determines when the data value is written
The output from a state element provides the value that was
written in an earlier clock cycle.
Adding a Clock to a Circuit
Clock: free running signal with fixed cycle time (clock period)
[Figure: clock waveform alternating between high (1) and low (0), with the period, rising edge, and falling edge marked]
Logic Design Convention-State Elements
Two styles
Unclocked vs. Clocked
Clocks used in synchronous logic
When should an element that contains state be updated?
Depends on the element type
Two synchronous state elements
Latch
State changes on the valid level (level-triggered)
D Flip Flops
State changes only on a clock edge (edge-triggered methodology)
[Figure: clock waveform with the cycle time and the falling edge marked]
Read/Write in the Same Cycle
Use the edge-triggered methodology
Read in the first half of the cycle and write in the second half
[Figure: a state element feeds combinational logic, whose output is written back to the same state element]
Storage Element: Register
(Basic Building Block)
Register
Similar to the D Flip Flop
N-bit input and output
Write Enable input
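Write Enable:
Asserted -> update the register contents on the clock edge
Not asserted -> the register contents do not change
[Figure: N-bit register with Write Enable and CLK inputs]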
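The register behavior described above can be sketched in a few lines. This is a minimal model, not a hardware description: the class name `Register` and the method `clock_edge` are hypothetical names chosen for illustration, and one call to `clock_edge` stands for one active clock edge.

```python
class Register:
    """N-bit edge-triggered register with a write-enable input (illustrative model)."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1   # keeps stored values within N bits
        self.value = 0                 # stored state, always visible on the output

    def read(self):
        # The output shows the value captured on an earlier clock edge.
        return self.value

    def clock_edge(self, data_in, write_enable):
        # State changes only on a clock edge, and only when Write Enable is asserted.
        if write_enable:
            self.value = data_in & self.mask


reg = Register(width=8)
reg.clock_edge(0x5A, write_enable=False)   # ignored: Write Enable not asserted
before = reg.read()
reg.clock_edge(0x5A, write_enable=True)    # captured on this edge
after = reg.read()
```

Here `before` is still the reset value and `after` holds the written data, which mirrors the rule that the contents update only when Write Enable is asserted.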
Register File
Note: we still use the real clock to determine when to write
[Figure: write port of the register file. An n-to-2^n decoder decodes the register number; its outputs, together with Write, drive the C input of Register 0 through Register n - 1, and the register data drives every D input.]
Abstraction of Mux
[Figure: a 32-bit 2-to-1 multiplexor with inputs A and B, output C, and a Select line, drawn as an array of 32 one-bit multiplexors (A31/B31/C31 down to A0/B0/C0) that share the same Select signal]
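The abstraction in this figure, one wide multiplexor standing for 32 one-bit multiplexors with a shared Select, can be sketched as follows. The function names `mux1` and `mux32` are hypothetical, and the convention that Select = 0 picks A is an assumption (the figure residue does not say which input corresponds to 0).

```python
def mux1(a, b, select):
    """1-bit 2-to-1 multiplexor: returns a when select is 0, b when select is 1."""
    return b if select else a


def mux32(a, b, select):
    """32-bit multiplexor built from 32 one-bit multiplexors sharing one Select line."""
    result = 0
    for i in range(32):
        a_i = (a >> i) & 1            # bit i of input A
        b_i = (b >> i) & 1            # bit i of input B
        result |= mux1(a_i, b_i, select) << i
    return result


c_when_0 = mux32(0xAAAA5555, 0x12345678, 0)
c_when_1 = mux32(0xAAAA5555, 0x12345678, 1)
```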
Storage Element: Register File
Register File consists of 32 registers:
Two 32-bit output busses: busA and busB
One 32-bit input bus: busW
Register is selected by:
RA (number) selects the register to put on busA (data)
RB (number) selects the register to put on busB (data)
RW (number) selects the register to be written via busW (data) when Write Enable is 1
[Figure: register file block with 5-bit RW, RA, and RB inputs, Write Enable, Clk, a 32-bit busW input, and 32-bit busA and busB outputs]
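A small model of the register file interface may help here. It is only a sketch: the class and method names are hypothetical, reads are modeled as combinational (no clock involved), and a write happens only on a clock edge with Write Enable asserted. The MIPS convention that register 0 always reads as zero is deliberately not modeled, since the slide does not mention it.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port (sketch)."""

    def __init__(self, count=32, width=32):
        self.regs = [0] * count
        self.mask = (1 << width) - 1

    def read(self, ra, rb):
        # RA and RB select the registers placed on busA and busB.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # RW selects the register written from busW, only when Write Enable is 1.
        if write_enable:
            self.regs[rw] = bus_w & self.mask


rf = RegisterFile()
rf.clock_edge(rw=9, bus_w=123, write_enable=True)
rf.clock_edge(rw=10, bus_w=456, write_enable=False)   # no write: enable is 0
bus_a, bus_b = rf.read(ra=9, rb=10)
```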
Memory (idealized)
One input bus: Data In
One output bus: Data Out
Memory word is selected by:
Address selects the word to put on Data Out
Write Enable = 1: address selects the memory word to be written via the Data In bus
Clock input (CLK)
The CLK input is a factor ONLY during a write operation
During a read operation, the memory behaves as a combinational logic block
[Figure: memory block with 32-bit Data In and Data Out busses and a Clk input]
The Big Picture: Where Are We Now?
[Figure: computer organization overview, with the Datapath and Output blocks labeled]
The CPU
Datapath & Control
Simplified MIPS to contain only 3 classes of instructions:
memory-reference instructions:
lw, sw
arithmetic-logical instructions:
add, sub, and, or, slt
control flow instructions:
beq, j
Key design principles
Make the common case fast
Simplicity favors regularity
The MIPS Instruction Formats
All MIPS instructions are 32 bits long. The three instruction formats:
R-type   op       rs       rt       rd       shamt    funct
bits     31:26    25:21    20:16    15:11    10:6     5:0
width    6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
I-type   op       rs       rt       immediate
bits     31:26    25:21    20:16    15:0
width    6 bits   5 bits   5 bits   16 bits
J-type   op       target address
bits     31:26    25:0
width    6 bits   26 bits
The different fields are:
op: operation of the instruction
rs, rt, rd: the source and destination register specifiers
shamt: shift amount
funct: selects the variant of the operation in the “op” field
address / immediate: address offset or immediate value
target address: target address of the jump instruction
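Because the fields sit at fixed bit positions, extracting them is a matter of shifting and masking. The sketch below uses a hypothetical helper `decode`; it returns every field of all three formats at once and leaves it to the caller to use only those that apply to the instruction's format. The example word is the standard encoding of add $t1, $t2, $t3 ($t1 = 9, $t2 = 10, $t3 = 11).

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields (all formats at once)."""
    return {
        "op":        (instr >> 26) & 0x3F,     # bits 31:26
        "rs":        (instr >> 21) & 0x1F,     # bits 25:21
        "rt":        (instr >> 16) & 0x1F,     # bits 20:16
        "rd":        (instr >> 11) & 0x1F,     # bits 15:11 (R-type)
        "shamt":     (instr >> 6) & 0x1F,      # bits 10:6  (R-type)
        "funct":     instr & 0x3F,             # bits 5:0   (R-type)
        "immediate": instr & 0xFFFF,           # bits 15:0  (I-type)
        "target":    instr & 0x03FFFFFF,       # bits 25:0  (J-type)
    }


r_fields = decode(0x014B4820)   # add $t1, $t2, $t3
i_fields = decode(0x8D490020)   # lw $t1, 32($t2)
```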
Execution Flow
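[Figure: the execution cycle: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction. The example program is add c, a, b; sub d, a, c; add d, c, b, with operands a, b, c held in memory or registers and an adder computing a+b.]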
Steps in Executing an Instruction
Instruction Fetch (IF)
Fetch the next instruction from memory
Instruction Decode (ID)
Examine instruction to determine:
What operation is performed by the instruction (e.g., addition)
What operands are required, and where the result goes
Operand Fetch
Fetch the operands
Execution (EX)
Perform the operation on the operands
Result Writeback (WB)
Write the result to the specified location
Next Instruction
Determine where to get next instruction
Overview of Implementations
Generic Implementation (high similarity between
instructions)
use the program counter (PC) to supply instruction address
get the instruction from memory
read registers
use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers
Memory-reference lw $t1, 32($t2)
Arithmetic add $t1, $t2, $t3
Control flow beq $t1, $t2, L1
Abstract View of MIPS Implementation
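[Figure: the PC supplies the address to the instruction memory; the instruction supplies three register numbers to the registers; register data feeds the ALU; the ALU output supplies the address to the data memory; data returns to the registers]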
Next Instruction: PC Datapath
NextPC = PC + 4
[Figure, two slides: the PC drives the read address of the instruction memory while an adder computes PC + 4; the second slide adds a second adder to the same datapath]
Abstract View of Basic Implementation
Two types of functional units:
elements that operate on data values (combinational)
elements that contain state (sequential)
[Figure: PC, instruction memory, registers, ALU, and data memory, with two adders for computing the next PC]
Basic Implementation (+Instruction Decode)
[Figure: the basic implementation with multiplexors (MUX) and a Control unit added. Control lines shown: Branch, RegWrite, ALU operation, MemRead, MemWrite. The ALU Zero output and Branch together steer the multiplexor that selects the next PC.]
How to Design a Processor:
Step-by-step
Analyze instruction set → datapath requirements
the meaning of each instruction is given by the register transfers
datapath must include storage element for ISA registers
Datapath Components
Common to all instructions:
Instruction memory
PC and its update
Datapath of R type instructions (e.g. add $t1, $t2, $t3)
ALU
Register set (file)
Datapath of memory-reference instructions (e.g. lw $t1, offset($2) )
ALU (for address calculation)
Register set
Sign extension unit
data memory
Datapath for a branch instruction (e.g. beq $1, $2, offset)
Sign extension + 2-bit shifter
Register set
Adder
ALU (zero output)
Instruction Memory and PC Update
Two state elements are needed to store and access
instructions, and an adder is needed to compute the next
instruction address.
[Figure: a. instruction memory (instruction address in, instruction out); b. PC; c. adder (Add) producing Sum]
PC Datapath and Instruction Fetch
NextPC = PC + 4
[Figure: the PC drives the read address of the instruction memory; an adder computes PC + 4 and feeds it back to the PC]
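The fetch step and the PC update can be sketched together. This is an illustrative model under two assumptions: the instruction memory is a Python list of 32-bit words, and the PC is a byte address, so the word index is PC divided by 4. The function names are hypothetical.

```python
def fetch(instruction_memory, pc):
    """Read the instruction word addressed by the (byte-addressed) PC."""
    return instruction_memory[pc // 4]


def next_pc(pc):
    """Sequential update: NextPC = PC + 4, kept within 32 bits."""
    return (pc + 4) & 0xFFFFFFFF


program = [0x014B4820, 0x8D490020]   # two example instruction words
pc = 0
first = fetch(program, pc)
pc = next_pc(pc)
second = fetch(program, pc)
```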
Datapath for R-format
R-type instructions (e.g. add $t1, $t2, $t3)
Steps:
Read two registers
Register file
Perform the ALU operation
ALU
Write the result into a register
Register file
Datapath components
Register file
A collection of registers in which any register can be read or written
by specifying its register number (register address) in the file
Needs a write control signal, “RegWrite”
How many ports are required?
ALU
Datapath for R-format (1)
32 registers
Two read ports and one write port
Only write control
[Figure: a. Registers: two 5-bit read register numbers and one 5-bit write register number, a Write data input, the RegWrite control, and outputs Read data 1 and Read data 2. b. ALU: a 4-bit ALU operation control, with outputs Zero and ALU result.]
The two elements needed to implement R-format ALU operations are the register file and the ALU.
Datapath for R-format (2)
[Figure: the instruction supplies Read register 1, Read register 2, and Write register; Read data 1 and Read data 2 feed the ALU; the ALU result returns to Write data]
Datapath of Memory-Reference Instructions
Memory-reference instructions (e.g. lw $t1, offset($t2) )
Steps:
Read one or two registers (lw: one register, sw: two registers)
Register file
Memory address calculation
ALU + sign extension unit
Memory read/write
Data memory
Write the result into a register
Register file
Datapath components
ALU (for address calculation)
Register set
Sign extension unit
Can you figure out why this is needed?
Data memory
I-Format Instructions
Define the following “fields”:
opcode   rs       rt       immediate
6 bits   5 bits   5 bits   16 bits
opcode: uniquely specifies an I-format instruction
rs: specifies the only register operand
rt: specifies register which will receive result of computation
(target register)
For addi and slti, the immediate is sign-extended to 32 bits and treated
as a signed integer
16 bits ➔ the immediate can represent up to 2^16 different values
Key concept: Only one field is inconsistent with R-format.
Most importantly, opcode is still in same location
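Sign extension of the 16-bit immediate, mentioned above for addi and slti, copies bit 15 into the upper 16 bits of the 32-bit word. A minimal sketch, with hypothetical helper names; `to_signed32` is only there to show the resulting two's-complement value as a Python integer:

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a 32-bit two's-complement word."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:                 # bit 15 set: value is negative
        return imm16 | 0xFFFF0000      # replicate the sign bit into bits 31:16
    return imm16


def to_signed32(word):
    """Interpret a 32-bit word as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


positive = sign_extend16(0x0020)
negative = sign_extend16(0xFFFC)
```

The 2^16 encodings therefore cover the signed range from -32768 to 32767.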
Datapath of Memory-Reference Instructions (1)
The two units needed to implement loads and stores are
the data memory unit and the sign-extension unit, in
addition to the register file and ALU
Datapath of Memory-Reference Instructions (2)
[Figure: register file, ALU, data memory, and sign-extension unit. Read data 1 and the sign-extended (16 to 32 bits) offset feed the ALU; the ALU result is the data memory address; Read data 2 is the memory write data; the memory read data returns to the register file's Write data.]
(The datapath for a load or store that does a register access)
Combine R-type and Memory-Reference
[Figure: the R-type and memory-reference datapaths merged. An ALUSrc multiplexor chooses between Read data 2 and the sign-extended immediate as the second ALU input; a MemtoReg multiplexor chooses between the ALU result and the memory read data as the register write data. Control lines: RegWrite, ALUSrc, ALU operation (4 bits), MemWrite, MemRead, MemtoReg.]
Datapath for Branch
Branch instruction (e.g. beq $1, $2, offset)
Base address: PC+4
Already computed at instruction fetch
Offset
Shifted left by 2 bits
Word alignment
Steps
Read two registers
Register file
Branch/jump address calculation
Adder + sign extension unit + 2-bit shift
Compare the register contents to check whether the condition is true
ALU (zero output)
Write the result into the PC
If branch, new PC = branch address (PC+4+offset)
If jump, replace the lower 28 bits of the PC with the 26-bit immediate
value from the instruction shifted left by 2 bits
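The branch steps above reduce to a short calculation. The sketch below uses hypothetical function names and makes one assumption explicit: the branch is taken only when the Branch control signal and the ALU Zero output are both 1 (the usual arrangement for beq in this datapath).

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a 32-bit two's-complement word."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Branch target = (PC + 4) + (sign-extended offset shifted left by 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def next_pc(pc, imm16, branch, zero):
    """Choose the next PC; assumes the branch is taken when Branch and Zero are both 1."""
    pc_src = branch and zero
    return branch_target(pc, imm16) if pc_src else (pc + 4) & MASK32


forward = branch_target(0x00400000, 3)        # 3 instructions past PC + 4
backward = branch_target(0x00400000, 0xFFFF)  # offset of -1 instruction
```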
Datapath for Branch (1)
The datapath for a branch uses an ALU for evaluation of the
branch condition and a separate adder for computing the
branch target as the sum of the incremented PC and the
sign-extended, lower 16 bits of the I instruction (the branch
displacement) shifted left 2 bits.
[Figure: branch datapath. The register file supplies two operands to the ALU (4-bit ALU operation), whose Zero output goes to the branch control logic; a separate adder sums PC + 4 from the instruction datapath with the sign-extended offset shifted left 2 to produce the branch target.]
[Figure: the combined single-cycle datapath: PC, instruction memory, the PC + 4 and branch-target adders with a multiplexor selecting the next PC, register file, sign extend (16 to 32), ALU, and data memory, with control lines RegWrite, ALUSrc, ALU operation, MemWrite, MemRead, MemtoReg.]
Control – The Hardest Part of Design
Purpose
Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexor inputs)
How you get these control signals:
Information comes from the 32 bits of the instruction
Example: the R-type fields op, rs, rt, rd, shamt, funct
[Figure: the combined datapath with its control points grouped into multiplexor selects ("For MUX") and ALU operation ("For ALU")]
Design Method for Control
Multi-level control (decoding)
Instruction opcode: main control unit (first level)
ALU control
Sub-control for arithmetic
MUX control
Which source registers and destination registers
ALU input source
Input source of destination register
Input source of PC
Result for first level
Seven 1-bit control lines
2-bit ALUOP control signals
The above control signals can be set based solely on the opcode
field of the instruction
Exception: PCSrc (depends on the beq result)
ALU Control (1)
Instructions using ALU
Load/store
address calculation – add
lw $t1, offset($t2)
Branch on equal
Subtract for comparison
beq $t1, $t2, offset
R-type
and/or
set-on-less-than
ALU Control (2)
Multi-level control (decoding)
Instruction opcode: main control unit (first level) generates the 2-bit ALUOp
00 = lw, sw
01 = beq
10 = arithmetic
2nd level: ALU control (sub-control) uses the function code for arithmetic instructions
Reduces the size of the main control but may increase the delay
ALU Control (3) - Truth Table For Gate Implementation
Instruction opcode   ALUOp   Instruction operation   Function code   Desired ALU action   ALU control input
LW                   00      Load word               xxxxxx          add                  0010
SW                   00      Store word              xxxxxx          add                  0010
Branch equal         01      Branch equal            xxxxxx          subtract             0110
R-type               10      ADD                     100000          add                  0010
R-type               10      Subtract                100010          subtract             0110
R-type               10      AND                     100100          and                  0000
R-type               10      OR                      100101          or                   0001
R-type               10      set-on-less-than        101010          set-on-less-than     0111
ALUOp1   ALUOp0   F5   F4   F3   F2   F1   F0   Operation
0        0        X    X    X    X    X    X    0010
X        1        X    X    X    X    X    X    0110
1        X        X    X    0    0    0    0    0010
1        X        X    X    0    0    1    0    0110
1        X        X    X    0    1    0    0    0000
1        X        X    X    0    1    0    1    0001
1        X        X    X    1    0    1    0    0111
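The second truth table translates almost line by line into a function. This is a behavioral sketch rather than the gate implementation the slide title refers to; the name `alu_control` is hypothetical. Rows are checked in table order, and only F3..F0 of the funct field are examined, as the don't-care entries for F5 and F4 indicate.

```python
# ALU control outputs for ALUOp = 1X, indexed by funct bits F3..F0.
FUNCT_TO_OPERATION = {
    0b0000: 0b0010,   # add
    0b0010: 0b0110,   # subtract
    0b0100: 0b0000,   # and
    0b0101: 0b0001,   # or
    0b1010: 0b0111,   # set-on-less-than
}


def alu_control(alu_op, funct):
    """Return the 4-bit ALU control input for a 2-bit ALUOp and 6-bit funct field."""
    if alu_op == 0b00:                 # lw, sw: address calculation
        return 0b0010
    if alu_op & 0b01:                  # ALUOp0 = 1 (beq): subtract for comparison
        return 0b0110
    return FUNCT_TO_OPERATION[funct & 0b1111]   # ALUOp1 = 1: decode F3..F0


lw_op = alu_control(0b00, 0b000000)
beq_op = alu_control(0b01, 0b000000)
slt_op = alu_control(0b10, 0b101010)
```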
Design the Main Control Unit
R-Format
0 rs rt rd shamt funct
31:26 25:21 20:16 15:11 10:6 5:0
I-Format: load/store
35 or 43 rs rt address
31:26 25:21 20:16 15:0
Design the Main Control Unit
The destination register is in one of two places
For a load it is in bit positions 20-16 (rt),
For an R-type instruction it is in bit positions 15-11 (rd).
Need to add a multiplexor to select which field of the instruction
is used to indicate the register number to be written.
R-Format
0 rs rt rd shamt funct
31:26 25:21 20:16 15:11 10:6 5:0
I-Format: load/store
35 or 43 rs rt address
31:26 25:21 20:16 15:0
Design the Main Control Unit
[Figure: the datapath with instruction fields routed explicitly: Instruction[25:21] to Read register 1, Instruction[20:16] to Read register 2, a RegDst multiplexor choosing Instruction[20:16] or Instruction[15:11] as Write register, Instruction[15:0] to sign extend, and Instruction[5:0] to the ALU control. Control lines: PCSrc, RegDst, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg, ALUOp.]
What remained? The seven 1-bit control signals and ALUOp
Effect of Seven 1-bit Control Signals
The function of each of the seven control signals: when the 1-bit
control to a two-way multiplexor is asserted, the multiplexor
selects the input corresponding to 1. Otherwise, if the control is
deasserted, the multiplexor selects the 0 input. Remember that the
state elements all have the clock as an implicit input and that the
clock is used in controlling writes.
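The settings of the seven 1-bit signals and the 2-bit ALUOp for each instruction class can be written as a lookup keyed by opcode. The values below are an assumption taken from the standard textbook control table for this datapath, since the table itself is not reproduced in these notes; `None` marks a don't-care. Opcodes 0, 35, and 43 appear in the format slides; opcode 4 for beq is the standard MIPS encoding. The names `CONTROL` and `main_control` are hypothetical.

```python
# One row per instruction class; None means "don't care".
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,          # R-format
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,          # lw
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,    # sw
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,    # beq
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode (raises KeyError if unsupported)."""
    return CONTROL[opcode]


lw_signals = main_control(35)
beq_signals = main_control(4)
```

Every signal here depends only on the opcode; PCSrc is not in the table because it also depends on the ALU Zero output.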
[Figure: the single-cycle datapath with the main control unit. Instruction[31:26] drives Control, which outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instruction[5:0] and the two ALUOp bits drive the ALU control, which outputs the four ALU control bits.]
[Figure: the control unit as a logic block with inputs Op5 through Op0 and outputs RegDst, ALUSrc, ..., ALUOp1, ALUOp0]
Final Control – Truth Table
Datapath for a “load” Operation
[Figure: the single-cycle datapath with control, showing the operation of a load]
Datapath for “beq”
[Figure: the single-cycle datapath with control, showing the operation of beq]
Design with “jump” Instruction (1)
Implement “jump” by concatenating
Upper 4 bits of “PC+4”: NextPC[31:28]
26-bit immediate field from the instruction
Low-order bits 00
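The concatenation of these three pieces can be sketched with shifts and masks. The function name `jump_address` is hypothetical; the two test words below are made-up encodings with opcode 2 (the standard MIPS opcode for j) chosen only to exercise the arithmetic.

```python
def jump_address(pc, instr):
    """Form the jump target: (PC + 4)[31:28], then the 26-bit field, then 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target26 = instr & 0x03FFFFFF              # Instruction[25:0]
    return (pc_plus_4 & 0xF0000000) | (target26 << 2)


low_region = jump_address(0x00400000, 0x08100004)    # target field 0x0100004
high_region = jump_address(0x90000000, 0x0BFFFFFF)   # target field 0x3FFFFFF
```

Shifting the 26-bit field left by 2 supplies the two low-order zero bits and yields 28 bits, so only the top 4 bits come from PC + 4.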
Design with “jump” Instruction (2)
[Figure: the single-cycle datapath extended for jump. Instruction[25:0] is shifted left 2 (26 to 28 bits) and concatenated with PC + 4[31:28] to form Jump address[31:0]; an additional multiplexor controlled by the Jump signal selects between the jump address and the output of the branch multiplexor.]
Our Simple Control Structure
All of the logic is combinational
We wait for everything to settle down so that the right thing is done
The ALU might not produce the “right answer” right away
We use write signals along with clock to determine when to write
Cycle time determined by length of the longest path
[Figure: state element 1 feeds combinational logic, which feeds state element 2, all within one clock cycle]
Why a Single-Cycle Implementation Is Not Used Today
The longest path is almost certainly that of a load instruction, which uses
five functional units in series:
Instruction memory
Register file
ALU
Data memory
Register file
Several other instruction classes could fit in a shorter
clock cycle.
Performance of Single Cycle Implementation
[Figure: the single-cycle datapath, used to trace the critical path of each instruction class]
Critical Path for Different Instructions
Class        Functional units used
R-type       Instruction fetch   Register access   ALU   Register access
Load word    Instruction fetch   Register access   ALU   Memory access   Register access
Store word   Instruction fetch   Register access   ALU   Memory access
Branch       Instruction fetch   Register access   ALU
Jump         Instruction fetch
Instruction class   Instruction memory   Register read   ALU operation   Data memory   Register write   Total
R-type              200                  50              100             0             50               400
Load word           200                  50              100             200           50               600
Store word          200                  50              100             200           -                550
Branch              200                  50              100             0             -                350
Jump                200                  -               -               -             -                200
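The totals in the table, and the resulting single-cycle clock period, follow from summing the delays of the units each class uses. A small sketch using the table's numbers (in the table's own time units); the dictionary names are hypothetical.

```python
# Delay of each functional unit, taken from the table above.
DELAY = {
    "instruction_memory": 200,
    "register_read": 50,
    "alu": 100,
    "data_memory": 200,
    "register_write": 50,
}

# Functional units on the critical path of each instruction class.
PATH = {
    "R-type":     ["instruction_memory", "register_read", "alu", "register_write"],
    "Load word":  ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "Store word": ["instruction_memory", "register_read", "alu", "data_memory"],
    "Branch":     ["instruction_memory", "register_read", "alu"],
    "Jump":       ["instruction_memory"],
}

totals = {name: sum(DELAY[unit] for unit in units) for name, units in PATH.items()}

# A single-cycle design must use one clock period long enough for the slowest class.
cycle_time = max(totals.values())
```

With a fixed clock, every instruction takes `cycle_time`, even a jump that needs only a third of it.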
Performance of Single Cycle Machines
Single Cycle Design
Timing problems
Fixed clock cycle time
Significant penalty
Clock cycle equal to the worst case
what if we had a more complicated instruction like floating point?
Violates “make the common case fast”
Acceptable only for the small instruction set
Variable clock cycle time
Hard to implement
Asynchronous design style ?
Area problems
wasteful of area: duplicate resources
Where we are headed
One Solution:
Shorter clock cycle time and use “multiple clock cycle”
Different instructions take different numbers of cycles
Multicycle datapath
Another solution
Pipelining (Ch. 6)
Overlapping the execution of multiple instructions