CH 5

The document discusses the organization of computer processors, focusing on datapath and control mechanisms. It covers logic design conventions, state elements, clocking methodologies, and the structure of registers and memory in a CPU. Additionally, it outlines the execution flow of MIPS instructions and the design principles for creating a processor, emphasizing the importance of understanding instruction sets and datapath requirements.

Uploaded by

秦槐駿
Computer Organization

The Processor: Datapath and Control

Prof. Ya-Shu Chen


National Taiwan University of Science and Technology
1
Logic Design Conventions
 Two types of logic elements
 Combinational logic
 Output only depends on the current input
 Given the same input, combinational logic always produces the same
output.
 Used for the ALU, multiplier, and other datapath elements
 Sequential logic (state element)
 Output depends on the current inputs and the current state
 An element is a state element if it has some internal storage.
 The instruction and data memories as well as the registers are all
examples of state elements.

2
State Element
 A state element has at least two inputs and one output.
 The two required inputs are
 The data value to be written
 The clock, which determines when the data value is written
 The output from a state element provides the value that was
written in an earlier clock cycle.

3
Adding a Clock to a Circuit
 Clock: free running signal with fixed cycle time (clock period)

[Figure: clock waveform alternating between high (1) and low (0), with the period, rising edge, and falling edge marked]

° The clock determines when to write a memory element
• level-triggered: store while the clock is high (or low)
• edge-triggered: store only on a clock edge
° We will use a negative (falling) edge-triggered methodology

4
Logic Design Convention-State Elements

 Two styles
 Unclocked vs. Clocked
 Clocks used in synchronous logic
 When should an element that contains state be updated?
 Depends on the element type
 Two synchronous state elements
 Latch
 State changes on the valid level (level-triggered)
 D Flip Flops
 State changes only on a clock edge (edge-triggered methodology)
[Figure: clock waveform with the cycle time (clock period), rising edge, and falling edge marked]


5
Clocking Methodology
 Clocking methodology
 Defines when signals can be read and when they can be written
 Mainstream: An edge triggered methodology
 Any value stored in a sequential logic element is updated only on
a clock edge.
 Determines when data is valid and stable relative to the
clock
 Typical execution:
 read contents of some state elements,
 send values through some combinational logic
 write results to one or more state elements
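The read, compute, write pattern above can be sketched in Python. This is only an illustrative model, not part of the slides: the `Register` class, its `tick` method, and the `increment` function are hypothetical names chosen for the example.

```python
class Register:
    """Edge-triggered state element: the output changes only when tick() is called."""

    def __init__(self, value=0):
        self.value = value        # value visible on the output during the current cycle
        self.next_value = value   # value currently presented at the data input

    def tick(self):
        # Models the clock edge: the input is captured and becomes the new output.
        self.value = self.next_value


def increment(x):
    # Combinational logic: the output depends only on the current input.
    return (x + 1) & 0xFFFFFFFF


counter = Register(0)
history = []
for _ in range(3):
    history.append(counter.value)                  # 1. read a state element
    counter.next_value = increment(counter.value)  # 2. send the value through combinational logic
    counter.tick()                                 # 3. write the result on the clock edge

print(history, counter.value)
```

Because the stored value changes only inside `tick`, the same register can be read and written within one cycle without the new value racing back into the logic.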

6
Read/Write at the Same Cycle
 Use the edge-triggered methodology
 Read at the first half cycle and write at the second half cycle

[Figure: a state element feeds combinational logic, whose output is written back to the same state element]

7
Storage Element: Register
(Basic Building Block)
 Register
 Similar to the D Flip Flop
 N-bit input and output
 Write Enable input
 Write Enable:
 Asserted -> update the register contents
[Figure: register with Data In, Data Out, Write Enable, and CLK]

8
Register File

 Built using D flip-flops


[Figure: read ports of the register file. Each read register number drives the select input of a multiplexor that picks one of Register 0 through Register n-1 as Read data 1 or Read data 2. Block symbol inputs: Read register number 1, Read register number 2, Write register, Write Data, Write; outputs: Read data 1, Read data 2]

9
Register File
 Note: we still use the real clock to determine when to write

[Figure: write port of the register file. An n-to-2^n decoder on the register number, combined with the Write signal, drives the C input of each register; Register data feeds the D input of every register]

10
Abstraction of Mux
[Figure: a 32-bit multiplexor with inputs A and B, output C, and one Select line, built from 32 one-bit multiplexors (A31/B31/C31 down to A0/B0/C0) that share the same Select]
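The bit-sliced construction of the wide multiplexor can be sketched in Python. This is an illustrative model only: the names `mux1` and `mux32` are hypothetical, and the convention that Select = 0 picks A is an assumption, since the slide does not state which input corresponds to which select value.

```python
def mux1(a_bit, b_bit, select):
    # One-bit 2-to-1 multiplexor (assumed convention: select = 0 -> A, select = 1 -> B).
    return b_bit if select else a_bit


def mux32(a, b, select):
    # A 32-bit multiplexor is 32 one-bit multiplexors sharing the same select line.
    result = 0
    for i in range(32):
        bit = mux1((a >> i) & 1, (b >> i) & 1, select)
        result |= bit << i
    return result


print(hex(mux32(0xDEADBEEF, 0x12345678, 0)))
print(hex(mux32(0xDEADBEEF, 0x12345678, 1)))
```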

11
Storage Element: Register File
 Register File consists of 32 registers:
 Two 32-bit output busses: busA and busB
 One 32-bit input bus: busW
 Register is selected by:
 RA (number) selects the register to put on busA (data)
 RB (number) selects the register to put on busB (data)
 RW (number) selects the register to be written
via busW (data) when Write Enable is 1

[Figure: register file of 32 32-bit registers with 5-bit inputs RW, RA, and RB, Write Enable, Clk, 32-bit input busW, and 32-bit outputs busA and busB]

 Clock input (CLK)
 The CLK input is a factor ONLY during write operation
 During read operation, behaves as a combinational logic block:

 RA or RB valid → busA or busB valid after “access time.”
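A minimal Python sketch of this behavior, assuming a hypothetical `RegisterFile` class (the method names `read` and `clock_edge` are illustrative, not from the slides). Reads are modeled as purely combinational; the write happens only when the clock edge arrives with Write Enable asserted.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self, count=32):
        self.regs = [0] * count

    def read(self, ra, rb):
        # Combinational read: busA and busB simply follow RA and RB.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        # The clock matters only for writes: register RW is updated from busW
        # when Write Enable is 1.
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=8, bus_w=42, write_enable=1)  # written
rf.clock_edge(rw=9, bus_w=7, write_enable=0)   # ignored, Write Enable is 0
print(rf.read(8, 9))
```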


12
Storage Element: Idealized Memory
 Memory (idealized)
 One input bus: Data In
 One output bus: Data Out
 Memory word is selected by:
 Address selects the word to put on Data Out
 Write Enable = 1: address selects the memory
word to be written via the Data In bus

[Figure: idealized memory with Address, Write Enable, Clk, 32-bit Data In, and 32-bit Data Out]
 Clock input (CLK)
 The CLK input is a factor ONLY during write operation
 During read operation, behaves as a combinational logic block:

 Address valid → Data Out valid after “access time.”

13
The Big Picture: Where Are We Now?

 The five classic components of a computer


[Figure: the five classic components: the Processor (containing Control and Datapath), Memory, Input, and Output; labeled "Harvard Architecture"]

 An abstract view of major functions of MIPS:


 Two types of function units
 Elements that operate on data values (combinational)
 Elements that contain state (sequential)

14
The CPU

 Processor (CPU): the active part of the computer, which
does all the work (data manipulation and decision-making)
 Datapath: portion of the processor which contains
hardware necessary to perform operations required by the
processor (the brawn)
 Control: portion of the processor (also in hardware) which
tells the datapath what needs to be done (the brain)

15
Datapath & Control
 Simplified MIPS to contain only 3 classes of instructions:
 memory-reference instructions:
 lw, sw
 arithmetic-logical instructions:
 add, sub, and, or, slt
 control flow instructions:
 beq, j
 Key design principles
 Make the common case fast
 Simplicity favors regularity

16
The MIPS Instruction Formats
 All MIPS instructions are 32 bits long. The three instruction formats:
 R-type
    Bits:   31-26   25-21   20-16   15-11   10-6    5-0
    Field:  op      rs      rt      rd      shamt   funct
    Width:  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
 I-type
    Bits:   31-26   25-21   20-16   15-0
    Field:  op      rs      rt      immediate
    Width:  6 bits  5 bits  5 bits  16 bits
 J-type
    Bits:   31-26   25-0
    Field:  op      target address
    Width:  6 bits  26 bits
 The different fields are:
 op: operation of the instruction
 rs, rt, rd: the source and destination register specifiers
 shamt: shift amount
 funct: selects the variant of the operation in the “op” field
 address / immediate: address offset or immediate value
 target address: target address of the jump instruction
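The field layout can be checked with a short Python sketch that slices a 32-bit instruction word by shifting and masking. The function name `decode` and the dictionary keys are illustrative choices; the test word is the encoding of `add $8, $17, $18` that appears later in these slides (000000 10001 10010 01000 00000 100000, or 0x02324020).

```python
def decode(instr):
    # Extract every field of all three formats; which ones are meaningful
    # depends on the opcode.
    return {
        "op": (instr >> 26) & 0x3F,      # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt": (instr >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct": instr & 0x3F,           # bits 5-0   (R-type)
        "immediate": instr & 0xFFFF,     # bits 15-0  (I-type)
        "target": instr & 0x03FFFFFF,    # bits 25-0  (J-type)
    }


fields = decode(0b000000_10001_10010_01000_00000_100000)
print(fields)
```

All fields are extracted unconditionally, which mirrors the hardware: the wires for every field exist, and control decides which ones are used.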
17
Execution Flow
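[Figure: execution flow: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction. Memory holds the example program (add c, a, b; sub d, a, c; add d, c, b) and the operands a, b, c; the adder computes a+b and the result goes to a register]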
18
Steps in Executing an Instruction
 Instruction Fetch (IF)
 Fetch the next instruction from memory
 Instruction Decode (ID)
 Examine instruction to determine:
 What operation is performed by the instruction (e.g., addition)
 What operands are required, and where the result goes
 Operand Fetch
 Fetch the operands
 Execution (EX)
 Perform the operation on the operands
 Result Writeback (WB)
 Write the result to the specified location
 Next Instruction
 Determine where to get next instruction
19
Overview of Implementations
 Generic Implementation (high similarity between
instructions)
 use the program counter (PC) to supply instruction address
 get the instruction from memory
 read registers
 use the instruction to decide exactly what to do
 All instructions except jump use the ALU after reading the registers
 Memory-reference lw $t1, 32($t2)
 Arithmetic add $t1, $t2, $t3
 Control flow beq $t1, $t2, L1

20
Abstract View of MIPS Implementation

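[Figure: abstract view: the PC addresses the instruction memory; the instruction supplies three register numbers to the register file; the registers feed the ALU; the ALU result is either the address for the data memory or data written back to the registers]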

21
Next Instruction: PC Datapath

 NextPC = PC + 4

[Figure: the PC supplies the read address to the instruction memory, and an adder computes PC + 4]

22
Next Instruction: PC Datapath

 (PC + 4) + branch offset


 PC = (PC+4) + (immediate*4)

[Figure: one adder computes PC + 4 and a second adder adds the branch offset to it; the PC supplies the read address to the instruction memory]

23
Abstract View of Basic Implementation
 Two types of functional units:
 elements that operate on data values (combinational)
 elements that contain state (sequential)

[Figure: abstract datapath: PC, instruction memory, registers, ALU, and data memory, with two adders for computing the next PC]
24
Basic Implementation (+Instruction Decode)
[Figure: the basic implementation with multiplexors and control lines added: a Control unit drives Branch, RegWrite, ALU operation, MemRead, and MemWrite; the ALU Zero output together with Branch selects the next PC]

25
How to Design a Processor:
Step-by-step
 Analyze instruction set → datapath requirements
The meaning of each instruction is given by the register transfers
The datapath must include storage elements for the ISA registers
The datapath must support each register transfer
 Select a set of datapath components and establish the clocking
methodology
 Assemble a datapath meeting the requirements
 Analyze the implementation of each instruction to determine the
settings of the control points that effect the register transfer.
 Assemble the control logic

26
Datapath Components
 Common to all instructions:
 Instruction memory
 PC and its update
 Datapath of R type instructions (e.g. add $t1, $t2, $t3)
 ALU
 Register set (file)
 Datapath of memory-reference instructions (e.g. lw $t1, offset($2) )
 ALU (for address calculation)
 Register set
 Sign extension unit
 data memory
 Datapath for a branch inst. (e.g. beq $1, $2, offset)
 Sign extension + 2bit shifter
 Register set
 Adder
 ALU (zero output)
27
Instruction Memory
and PC Update
 Two state elements are needed to store and access
instructions, and an adder is needed to compute the next
instruction address.

[Figure: a. Instruction memory (instruction address in, instruction out); b. Program counter (PC); c. Adder (Add, Sum)]

28
PC Datapath and
Instruction Fetch
 NextPC = PC + 4

[Figure: the PC supplies the read address to the instruction memory, and an adder computes PC + 4]
29
Datapath for R-format
 R type instructions (e.g. ADD $t1, $t2, $t3)
 steps:
 Read two registers
 Register file
 Perform an ALU operation
 ALU
 Write the result into a register
 Register file

 Datapath component
 Register file
 A collection of registers in which any register can be read or written
by specifying the number of register (register address) in the file
 Needs a write control signal “RegWrite”
 How many ports are required?
 ALU
30
Datapath for R-format (1)
 32 registers
 Two read ports and one write port
 Only writes need a control signal (RegWrite)

[Figure: a. Registers: three 5-bit register numbers (Read register 1, Read register 2, Write register), Write Data, and the RegWrite control in; Read Data 1 and Read Data 2 out. b. ALU: 4-bit ALU operation control in; Zero and ALU result out]

The two elements needed to implement R-format ALU operations are the register file and the ALU.
31
Datapath for R-format (2)

[Figure: the datapath for R-type instructions: the instruction supplies Read register 1, Read register 2, and Write register; Read Data 1 and Read Data 2 feed the ALU; the ALU result returns to the register file as Write Data]

32
Datapath of Memory-Reference Instructions
 Memory-reference instructions (e.g. lw $t1, offset($t2) )
 Steps:
 Read one or two registers (lw: one register, sw: two registers)
 Register file
 Memory address calculation
 ALU + sign extension unit
 Memory read/write
 Data memory
 Write the result into a register
 Register file
 Datapath components
 ALU (for address calculation)
 Register set
 Sign extension unit
 Can you figure out why this is needed?
 Data memory
33
I-Format Instructions
 Define the following “fields”:
6 5 5 16
opcode rs rt immediate
 opcode: uniquely specifies an I-format instruction
 rs: specifies the only register operand
 rt: specifies the register that will receive the result of the computation
(target register)
 For addi and slti, the immediate is sign-extended to 32 bits and treated
as a signed integer
 16 bits ➔ the immediate can represent up to 2^16
different values
 Key concept: Only one field is inconsistent with R-format.
Most importantly, opcode is still in the same location
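Sign extension of the 16-bit immediate can be sketched in Python as follows. The helper names `sign_extend16` and `to_u32` are hypothetical, and the base address 0x1000 is an arbitrary value chosen for the example; the offset 32 is taken from the `lw $t1, 32($t2)` example used earlier in the slides.

```python
def sign_extend16(imm16):
    # Interpret the low 16 bits as a two's-complement signed integer.
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def to_u32(value):
    # The 32-bit pattern that the hardware would carry on the bus.
    return value & 0xFFFFFFFF


base = 0x1000  # assumed contents of the base register
print(hex(to_u32(base + sign_extend16(32))))       # positive offset
print(hex(to_u32(base + sign_extend16(0xFFFC))))   # 0xFFFC encodes -4
print(hex(to_u32(sign_extend16(0xFFFC))))          # upper 16 bits copy the sign bit
```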
34
Datapath of Memory-Reference Instructions
(1)
 The two units needed to implement loads and stores are
the data memory unit and the sign-extension unit, in
addition to the register file and ALU

35
Datapath of Memory-Reference Instructions
(2)
[Figure: the datapath for a load or store that does a register access: the ALU adds Read Data 1 to the 16-bit offset sign-extended to 32 bits; the ALU result is the data memory address; Read Data 2 is the memory Write Data; the memory Read Data returns to the register file as Write Data]
36
Combine R-type and Memory-Reference

[Figure: combined R-type and memory-reference datapath: the ALUSrc multiplexor selects Read Data 2 (0) or the sign-extended immediate (1) as the second ALU input; the MemtoReg multiplexor selects the ALU result (0) or the memory Read Data (1) as the register Write Data; control signals RegWrite, 4-bit ALU operation, MemWrite, and MemRead]

37
Datapath for Branch
 Branch instruction (e.g. beq $1, $2, offset)
 Base address: PC+4
 add at the moment of instruction fetch
 Offset
 Shift left by 2-bits
 word alignment
 Steps
 Read two registers
 Register file
 Branch/Jump address calculation
 Adder + sign extension unit+2-bit shift
 Compare the register contents to check whether the condition is true
 ALU (zero output)
 Write the result into PC
 If branch, new PC = branch address (PC + 4 + offset × 4)
 If jump, replace the lower 28 bits of the PC with the 26-bit immediate
value from the instruction shifted left by 2 bits
38
Datapath for Branch (1)
 The datapath for a branch uses an ALU for evaluation of the
branch condition and a separate adder for computing the
branch target as the sum of the incremented PC and the
sign-extended, lower 16 bits of the I instruction (the branch
displacement) shifted left 2 bits.
[Figure: branch datapath: the register file outputs feed the ALU, whose Zero output goes to the branch control logic; the 16-bit immediate is sign-extended to 32 bits, shifted left 2 (only routing), and added to PC + 4 from the instruction datapath to form the branch target]
39
Branches: Instruction Format
 Use I-format:
opcode rs rt immediate
 opcode specifies beq or bne
 rs and rt specify registers to compare
 What can immediate specify? PC-relative addressing
 Immediate specifies word address
 Instructions are word aligned (byte address is always a multiple of 4,
i.e., it ends with 00 in binary)
 Now, we can branch +/- 2^15 words from the PC (or +/- 2^17 bytes) and handle
loops 4 times as large
 Immediate is relative to PC + 4
 Due to hardware, add immediate to (PC+4), not to PC
 If branch not taken: PC = PC + 4
 If branch taken: PC = (PC+4) + (immediate*4)
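The next-PC rule for beq can be sketched in Python. The function name `branch_next_pc` and the example addresses are illustrative assumptions; the arithmetic follows the two cases listed above.

```python
def branch_next_pc(pc, imm16, taken):
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if not taken:
        return pc_plus_4                   # PC = PC + 4
    # Sign-extend the 16-bit immediate, then multiply by 4 (shift left 2)
    # because the immediate counts words, not bytes.
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF   # PC = (PC+4) + (immediate*4)


print(hex(branch_next_pc(0x100, 3, True)))        # forward by 3 words from PC + 4
print(hex(branch_next_pc(0x100, 0xFFFF, True)))   # immediate -1: back to the branch itself
print(hex(branch_next_pc(0x100, 3, False)))       # not taken
```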
40
Creating a Single Datapath
[Figure: the single datapath: PC, instruction memory, register file, ALU, data memory, sign extend, shift left 2, and two adders; multiplexors controlled by PCSrc, ALUSrc, and MemtoReg; control signals RegWrite, 4-bit ALU operation, MemWrite, and MemRead]

41
Control – The Hardest Part
of Design
 Purpose
 Selecting the operations to perform (ALU, read/write, etc.)
 Controlling the flow of data (multiplexor inputs)
 How you get these control signals:
 Information comes from the 32 bits of the instruction
 Example:

add $8, $17, $18    Instruction format:

    op      rs     rt     rd     shamt  funct
    000000  10001  10010  01000  00000  100000

The ALU's operation is based on the instruction type (opcode) and the function code


42
What Control Signals
Do We Need?
[Figure: the single datapath with its control points marked: multiplexor selects (PCSrc, ALUSrc, MemtoReg), the 4-bit ALU operation, and the write/read enables (RegWrite, MemWrite, MemRead)]
43
Design Method for Control
 Multi-level control (decoding)
 Instruction opcode: main control unit (first level)
 ALU control
 Sub-control for arithmetic
 MUX control
 Which source registers and destination registers
 ALU input source
 Input source of destination register
 Input source of PC
 Result for first level
 Seven 1-bit control lines
 2-bit ALUOP control signals
 The above control signals can be set based solely on the opcode
field of the instruction
 Exception: PCSrc (depends on the beq result)

44
ALU Control (1)
 Instructions using ALU
 Load/store
 address calculation – add
lw $t1, offset($t2)
 Branch eq
 Subtract for comparison
beq $t1, $t2, offset
 R-type
 and/or
 set-on-less-than

45
ALU Control (2)
 Multi-level control (decoding)
 Instruction opcode: the main control unit (first level) sets ALUOp
00 = lw, sw
01 = beq
10 = arithmetic
 2nd level: function code for arithmetic : sub control
 Reduce the size of main control but may increase the delay

46
ALU Control (3) -
Truth Table For Gate Implementation
Instruction    ALUOp   Instruction        Function   Desired            ALU control
opcode                 operation          code       ALU action         input
LW             00      Load word          XXXXXX     add                0010
SW             00      Store word         XXXXXX     add                0010
Branch equal   01      Branch equal       XXXXXX     subtract           0110
R-type         10      ADD                100000     add                0010
R-type         10      Subtract           100010     subtract           0110
R-type         10      AND                100100     and                0000
R-type         10      OR                 100101     or                 0001
R-type         10      set-on-less-than   101010     set-on-less-than   0111

ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0  Operation
0       0       X   X   X   X   X   X   0010
X       1       X   X   X   X   X   X   0110
1       X       X   X   0   0   0   0   0010
1       X       X   X   0   0   1   0   0110
1       X       X   X   0   1   0   0   0000
1       X       X   X   0   1   0   1   0001
1       X       X   X   1   0   1   0   0111
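The ALU control truth table can be written as a small Python function. This is a behavioral sketch, not the gate-level implementation; the name `alu_control` is hypothetical. As in the table, only the low four funct bits (F3-F0) are examined for R-type instructions.

```python
def alu_control(alu_op, funct):
    alu_op1 = (alu_op >> 1) & 1
    alu_op0 = alu_op & 1
    if alu_op1 == 0 and alu_op0 == 0:
        return 0b0010            # lw/sw: add for the address calculation
    if alu_op0 == 1:
        return 0b0110            # beq: subtract for the comparison
    # ALUOp1 = 1: R-type, decode F3-F0 (F5 and F4 are don't-cares)
    r_type = {
        0b0000: 0b0010,          # add
        0b0010: 0b0110,          # subtract
        0b0100: 0b0000,          # and
        0b0101: 0b0001,          # or
        0b1010: 0b0111,          # set-on-less-than
    }
    return r_type[funct & 0xF]


for alu_op, funct in [(0b00, 0), (0b01, 0), (0b10, 0b100000), (0b10, 0b101010)]:
    print(f"{alu_op:02b} {funct:06b} -> {alu_control(alu_op, funct):04b}")
```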

47
Design the Main Control Unit

 The op field, also called the opcode
 is always contained in bits 31-26. We will refer to this field as Op[5-0].
 The two registers to be read
 are always specified by the rs and rt fields, at positions 25-21 and 20-16.
This is true for the R-type instructions, branch equal, and store.
R-Format
    Field:  0         rs      rt      rd      shamt   funct
    Bits:   31:26     25:21   20:16   15:11   10:6    5:0
I-Format: load/store
    Field:  35 or 43  rs      rt      address
    Bits:   31:26     25:21   20:16   15:0
I-Format: branch
    Field:  4         rs      rt      address
    Bits:   31:26     25:21   20:16   15:0
48
Design the Main Control Unit
 The base register for load and store instructions
 always in bit positions 25-21 (rs).
 The 16-bit offset for branch equal, load, and store
 always in positions 15-0.

I-Format: load/store
    Field:  35 or 43  rs      rt      address
    Bits:   31:26     25:21   20:16   15:0
I-Format: branch
    Field:  4         rs      rt      address
    Bits:   31:26     25:21   20:16   15:0

49
Design the Main Control Unit
 The destination register is in one of two places
 For a load it is in bit positions 20-16 (rt),
 For an R-type instruction it is in bit positions 15-11 (rd).
 Need to add a multiplexor to select which field of the instruction
is used to indicate the register number to be written.

R-Format
    Field:  0         rs      rt      rd      shamt   funct
    Bits:   31:26     25:21   20:16   15:11   10:6    5:0
I-Format: load/store
    Field:  35 or 43  rs      rt      address
    Bits:   31:26     25:21   20:16   15:0
50
Design the Main Control Unit
[Figure: the datapath with all multiplexors and control lines: RegDst selects Instruction[20:16] (0) or Instruction[15:11] (1) as the Write register; ALUSrc, MemtoReg, and PCSrc as before; the ALU control unit takes ALUOp and Instruction[5:0]. What remained? The main control that generates the seven 1-bit control signals and ALUOp]

51
Effect of Seven
1-bit Control Signals

The function of each of the seven control signals. When the 1-bit
control to a two-way multiplexor is asserted, the multiplexor
selects the input corresponding to 1. Otherwise, if the control is
deasserted, the multiplexor selects the 0 input. Remember that the
state elements all have the clock as an implicit input and that the
clock is used in controlling writes.
52
[Figure: the simple datapath with the control unit: Instruction[31:26] feeds the main control, which generates RegDst, Branch, MemRead, MemtoReg, ALUOp (two bits), MemWrite, ALUSrc, and RegWrite; the ALU control combines ALUOp with Instruction[5:0] to produce the four-bit ALU operation]

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
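The control settings in this table can be modeled as a lookup keyed by opcode. This is a behavioral sketch under stated assumptions: the names `CONTROL`, `main_control`, and `pc_src` are hypothetical, don't-care entries (X) are represented as `None`, and the opcode values 0, 35 (lw), 43 (sw), and 4 (beq) are the standard MIPS encodings shown in the format diagrams. PCSrc is modeled as Branch AND Zero, which is the usual way the beq result is folded in.

```python
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),        # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),        # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),        # sw
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),        # beq
}


def main_control(opcode):
    # First-level decoding: every signal except PCSrc depends only on the opcode.
    return CONTROL[opcode]


def pc_src(branch, zero):
    # PCSrc also needs the ALU Zero output, so it is not set by the opcode alone.
    return branch & zero


signals = main_control(4)
print(signals, pc_src(signals["Branch"], zero=1))
```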
53
Control Unit Design

[Figure: the control unit as a block: inputs Op5 through Op0; outputs RegDst, ALUSrc, …, ALUOp1, ALUOp0]

54
Final Control – Truth Table

 The control function for the simple one-clock implementation is completely
specified by this truth table. The top half of the table gives the combinations of
input signals that correspond to the four opcodes that determine the control
output setting. (Remember that Op (5-0) corresponds to bits 31-26 of the
instruction, which is the opcode field.) The bottom portion of the table gives the
outputs.
55
Datapath for R-type
[Figure: the complete datapath and control unit, as used by an R-type instruction]

56
Datapath for a “load” Operation
[Figure: the complete datapath and control unit, as used by a load]

57
Datapath for “beq”
[Figure: the complete datapath and control unit, as used by beq]

58
Design with “jump”
Instruction (1)
 Implement “jump” by concatenating
 Upper 4-bits of “PC+4”: NextPC[31:28]
 26-bit immediate field from instruction
 Bits 00
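The concatenation can be sketched in Python. The function name `jump_target` and the example addresses are illustrative; the instruction words use opcode 2, the standard MIPS encoding of `j`, which is not stated in the slides.

```python
def jump_target(pc, instr):
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000      # NextPC[31:28]
    target26 = instr & 0x03FFFFFF        # 26-bit immediate field, Instruction[25:0]
    return upper4 | (target26 << 2)      # shifting left by 2 appends the bits 00


print(hex(jump_target(0x40000000, 0x08000010)))
print(hex(jump_target(0x0FFFFFFC, 0x08000001)))  # the upper bits come from PC + 4, not PC
```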

59
Design with “jump” Instruction (2)
[Figure: the datapath extended for jump: Instruction[25:0] is shifted left 2 (26 bits to 28 bits) and concatenated with PC + 4[31:28] to form Jump address[31:0]; an additional multiplexor controlled by the Jump signal selects between the jump address and the output of the branch multiplexor]

60
Our Simple Control Structure
 All of the logic is combinational
 We wait for everything to settle down, and the right thing to be
done
 ALU might not produce “right answer” right away
 We use write signals along with clock to determine when to write
 Cycle time determined by length of the longest path

[Figure: within one clock cycle, State element 1 feeds combinational logic, whose output is written to State element 2]

61
Why a Single-Cycle Implementation
Is Not Used Today
 The longest path is almost certainly that of a load instruction, which uses
five functional units in series:
 Instruction memory
 Register file
 ALU
 Data memory
 Register file
 Several other instruction classes could fit in a shorter
clock cycle.

62
Performance of Single Cycle Implementation

 Calculate cycle time assuming negligible delays except:
 memory (200ps), ALU and adders (100ps), register file access
(50ps)
[Figure: the single-cycle datapath (PC, instruction memory, register file, ALU, data memory, adders, sign extend, shift left 2, and multiplexors) used for the delay calculation]

63
Critical Path for Different
Instructions
Class        Functional units used
R-type       Instruction fetch | Register access | ALU | Register access
Load word    Instruction fetch | Register access | ALU | Memory access | Register access
Store word   Instruction fetch | Register access | ALU | Memory access
Branch       Instruction fetch | Register access | ALU
Jump         Instruction fetch

Delays in ps:
Instruction  Instruction  Register  ALU        Data    Register  Total
class        memory       read      operation  memory  write
R-type       200          50        100        0       50        400
Load word    200          50        100        200     50        600
Store word   200          50        100        200               550
Branch       200          50        100        0                 350
Jump         200                                                 200
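The totals in the table follow from summing the delays of the units on each class's path. A short Python sketch of that calculation, with hypothetical names (`DELAY_PS`, `PATHS`, `latency_ps`) and the unit delays taken from the previous slide:

```python
# Unit delays in picoseconds (memory 200, ALU 100, register file access 50).
DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Functional units used in series by each instruction class.
PATHS = {
    "R-type":     ["imem", "reg_read", "alu", "reg_write"],
    "Load word":  ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store word": ["imem", "reg_read", "alu", "dmem"],
    "Branch":     ["imem", "reg_read", "alu"],
    "Jump":       ["imem"],
}

latency_ps = {cls: sum(DELAY_PS[unit] for unit in units) for cls, units in PATHS.items()}
cycle_time_ps = max(latency_ps.values())  # a fixed clock must fit the slowest class

print(latency_ps)
print(cycle_time_ps)
```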
64
Performance of Single Cycle Machines

 Two implementation types


 1 cycle per instruction with fixed length cycle
 1 cycle per instruction with variable length cycle
 Which one is faster? Assume 25% load, 10% stores, 45% ALU,
15% branch, 5% jump

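Average CPU cycle time with a variable-length clock
= 600 × 25% + 550 × 10% + 400 × 45% + 350 × 15% + 200 × 5%
= 447.5 ps

\frac{\text{CPU performance}_{\text{variable}}}{\text{CPU performance}_{\text{fixed}}} = \frac{\text{CPU time}_{\text{fixed}}}{\text{CPU time}_{\text{variable}}} = \frac{600}{447.5} \approx 1.34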
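The comparison can be reproduced with a few lines of Python. The variable names are illustrative; the per-class latencies and the instruction mix are the ones given above.

```python
latency_ps = {"Load word": 600, "Store word": 550, "R-type": 400, "Branch": 350, "Jump": 200}
mix = {"Load word": 0.25, "Store word": 0.10, "R-type": 0.45, "Branch": 0.15, "Jump": 0.05}

# Variable-length clock: each instruction takes only as long as its own path,
# so the average cycle time is the mix-weighted mean.
average_ps = sum(latency_ps[cls] * mix[cls] for cls in mix)

# Fixed-length clock: every instruction takes as long as the slowest class.
fixed_ps = max(latency_ps.values())

speedup = fixed_ps / average_ps
print(round(average_ps, 1), fixed_ps, round(speedup, 2))
```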

65
Single Cycle Design
 Timing problems
 Fixed clock cycle time
 Significant penalty
 Clock cycle equal to the worst case
 What if we had a more complicated instruction like floating point?
 Violates “make the common case fast”
 Acceptable only for a small instruction set
 Variable clock cycle time
 Hard to implement
 Asynchronous design style ?
 Area problems
 wasteful of area: duplicate resources

66
Where we are headed
 One Solution:
 Use a shorter clock cycle time and let an instruction take multiple clock cycles
 Different instructions take different numbers of cycles
 Multicycle datapath
 Another solution
 Pipelining (Ch. 6)
 Overlapping the execution of multiple instructions

[Figure: multicycle datapath preview: PC, a single memory for instructions and data, Instruction register, Memory data register, register file, A and B registers, ALU, and ALUOut]

67
See You Next Class!

68
