04 The processor
CSCE 5610
Computer System Architecture
The Processor

• Single-cycle implementation
  • All operations take the same amount of time: a single cycle.
• Multicycle implementation
  • Allows faster operations to take less time than slower ones, so overall performance can be increased.
• Pipelining
  • Lets a processor overlap the execution of several instructions, potentially leading to big performance gains.
• Control: beq
Today we’ll build a single-cycle implementation of this instruction set.
— All instructions will execute in the same amount of time; this will
determine the clock cycle time for our performance equations.
— We’ll explain the datapath first, and then make the control unit.
Instruction fetching
• The CPU is always in an infinite loop, fetching instructions from memory and executing them.
• The program counter or PC register holds the address of the current instruction.
[Figure: the PC drives the read address of the instruction memory, which outputs Instruction [31-0]; an adder adds 4 to the PC.]

Basic MIPS implementation
The first two steps for every instruction:
● Send PC to instruction memory
● Read one or two registers using fields in the machine code

● Instruction Memory: stores the code and supplies an instruction given an address
● Program Counter (PC): holds the address of the current instruction
● Adder: increments the PC to the address of the next instruction (PC+4). Why PC+4?

32-bit Machine Code for add $20, $9, $10

Field   op      rs      rt      rd      shamt   func
Width   6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
Bits    000000  01001   01010   10100   00000   100000
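As a rough illustration of the field layout above, the following Python sketch packs and unpacks an R-type instruction word. The helper names `encode_r_type` and `decode_r_type` are hypothetical and exist only for this example; the field widths and the add $20, $9, $10 example come from the slide.

```python
def encode_r_type(rs, rt, rd, shamt=0, func=0b100000, op=0):
    """Pack the six R-type fields into one 32-bit word.

    The default func (100000) is the value shown for add on the slide.
    """
    fields = (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("func", func, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func


def decode_r_type(word):
    """Split a 32-bit word back into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "func": word & 0x3F,
    }


# add $20, $9, $10: rd = 20, rs = 9, rt = 10
word = encode_r_type(rs=9, rt=10, rd=20)
print(f"{word:032b}")
print(decode_r_type(word))
```

Printing the word in binary makes it easy to compare against the six fields on the slide, bit group by bit group.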
Registers and ALUs
• R-type instructions must access registers and an ALU.
• Our register file stores thirty-two 32-bit values.
— Each register specifier is 5 bits long.
— You can read from two registers at a time.
— RegWrite is 1 if a register should be written.
[Figure: register file with inputs Read register 1, Read register 2, Write register, Write data and RegWrite, and outputs Read data 1 and Read data 2.]

Executing an R-type instruction
1. Read an instruction from the instruction memory.
2. The source registers, specified by instruction fields rs and rt, should be read from the register file.
3. The ALU performs the desired operation.
4. Its result is stored in the destination register, which is specified by field rd of the instruction word.
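The four R-type steps can be sketched in Python as follows. This is a minimal model under stated assumptions, not the datapath itself: the function name `execute_r_type`, the dictionary used as instruction memory, and the register values 7 and 5 are all hypothetical. The ALU operation is passed in as a plain function rather than selected by a control unit.

```python
def execute_r_type(pc, instruction_memory, registers, alu_operation):
    """Run one R-type instruction and return the next PC (PC + 4)."""
    # Step 1: read the instruction from the instruction memory.
    word = instruction_memory[pc]
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F

    # Step 2: read the two source registers named by rs and rt.
    a = registers[rs]
    b = registers[rt]

    # Step 3: the ALU performs the desired operation (kept to 32 bits).
    result = alu_operation(a, b) & 0xFFFFFFFF

    # Step 4: with RegWrite = 1, store the result in register rd.
    registers[rd] = result

    return pc + 4


registers = [0] * 32           # thirty-two 32-bit registers
registers[9] = 7               # hypothetical initial values
registers[10] = 5
instruction_memory = {0: 0x012AA020}   # add $20, $9, $10

next_pc = execute_r_type(0, instruction_memory, registers, lambda a, b: a + b)
print(registers[20], next_pc)
```

The sketch ignores details such as register $0 being hard-wired to zero; it is meant only to mirror the order of the four steps.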
ALUOp   Function
000     and
001     or
010     add
110     subtract
111     slt

R-type instruction fields and bit positions:

Field   op      rs      rt      rd      shamt   func
Bits    31-26   25-21   20-16   15-11   10-6    5-0
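The ALUOp table can be read as a small function. The sketch below is one way to model it in Python, assuming 32-bit operands and a signed comparison for slt; the names `alu` and `to_signed` are hypothetical. It also returns a Zero flag, which is 1 when the result is 0.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(aluop, a, b):
    """Return (result, zero) for the 3-bit ALUOp codes in the table."""
    a &= MASK32
    b &= MASK32
    if aluop == 0b000:
        result = a & b                  # and
    elif aluop == 0b001:
        result = a | b                  # or
    elif aluop == 0b010:
        result = (a + b) & MASK32       # add
    elif aluop == 0b110:
        result = (a - b) & MASK32       # subtract
    elif aluop == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0   # slt
    else:
        raise ValueError(f"ALUOp {aluop:03b} is not in the table")
    zero = 1 if result == 0 else 0
    return result, zero


print(alu(0b010, 3, 4))
print(alu(0b110, 5, 5))
```

Subtracting equal operands gives a zero result and sets the Zero flag, which is how the subtract code doubles as an equality test.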
sw instruction path
• An example store instruction is sw $a0, 16($sp).
• The ALUOp must be 010 (add), again to compute the effective address.
[Figure: single-cycle datapath with the sw path highlighted. Components: PC, two adders, Shift left 2, instruction memory, registers, sign extend, ALU, data memory, and muxes selected by PCSrc, RegDst, ALUSrc and MemToReg. Control signals: RegWrite, MemWrite, MemRead, ALUOp.]

beq instruction path
• One sample branch instruction is beq $at, $0, offset.
• The ALUOp is 110 (subtract), to test for equality.
• The branch may or may not be taken, depending on the ALU’s Zero output.
• PCSrc = 0 if the branch is not taken; PCSrc = 1 if the branch is taken.
[Figure: the same single-cycle datapath with the beq path highlighted.]
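A small Python sketch may help connect the two paths above to arithmetic. It assumes the usual MIPS conventions that the 16-bit immediate is sign-extended, and that a taken branch goes to PC + 4 plus the offset shifted left by 2 (the Shift left 2 and Add units in the figure). The function names and the numeric inputs are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(immediate):
    """Sign-extend a 16-bit immediate to a Python integer."""
    immediate &= 0xFFFF
    return immediate - 0x10000 if immediate & 0x8000 else immediate


def sw_effective_address(base, offset16):
    """sw: the ALU adds (ALUOp 010) the base register and the offset."""
    return (base + sign_extend16(offset16)) & MASK32


def next_pc_for_beq(pc, rs_value, rt_value, offset16):
    """beq: the ALU subtracts (ALUOp 110); Zero decides PCSrc."""
    zero = 1 if ((rs_value - rt_value) & MASK32) == 0 else 0
    pc_src = zero                         # 1 if taken, 0 if not taken
    pc_plus_4 = (pc + 4) & MASK32
    branch_target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32
    return (branch_target if pc_src else pc_plus_4), pc_src


# sw $a0, 16($sp) with a hypothetical $sp value
print(hex(sw_effective_address(0x7FFF0000, 16)))
# beq with equal and unequal operands, hypothetical offset of 3 instructions
print(next_pc_for_beq(0x1000, 5, 5, 3))
print(next_pc_for_beq(0x1000, 5, 6, 3))
```

The mux controlled by PCSrc corresponds to the final conditional expression: it picks either PC + 4 or the branch target.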
• The register file and data memory have write control signals, RegWrite and MemWrite. These units can be written to only if the control signal is asserted and there is a positive clock edge.
• In a single-cycle machine the PC is updated on each clock cycle, so we don’t bother to give it an explicit write control signal.
[Figure: data memory with inputs Read address, Write address, Write data, MemRead and MemWrite, and output Read data.]

Clock frequency
● the number of clock cycles occurring in 1 second
● measured in units of Hertz (Hz)
● also called the Clock Rate
● F = 1 / Period

When designing processors, we work in units of nanoseconds or GHz.

Clock cycle period      Clock rate
10 nsec                 100 MHz
5 nsec                  200 MHz
2 nsec                  500 MHz
1 nsec (10^-9 sec)      1 GHz (10^9 Hz)
500 psec                2 GHz
250 psec                4 GHz
200 psec                5 GHz
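The relation F = 1 / Period can be checked against the table with a few lines of Python. In this sketch the period is given as a whole number of picoseconds, an assumption made only to keep the arithmetic exact; the function names are hypothetical.

```python
def clock_rate_hz(period_ps):
    """F = 1 / Period, with the period given in picoseconds."""
    return 1e12 / period_ps


def describe(period_ps):
    """Format the clock rate in MHz or GHz, whichever reads better."""
    rate = clock_rate_hz(period_ps)
    if rate >= 1e9:
        return f"{rate / 1e9:g} GHz"
    return f"{rate / 1e6:g} MHz"


# Periods from the table: 10, 5, 2, 1 nsec and 500, 250, 200 psec
for period_ps in (10_000, 5_000, 2_000, 1_000, 500, 250, 200):
    print(f"{period_ps:>6} psec -> {describe(period_ps)}")
```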
CPU Time
CPU Time Exercise
Compute the longest path in the add instruction
[Figure: single-cycle datapath annotated with component latencies of 2 ns, 1 ns and 0 ns.]

The Slowest Instruction...
• If all instructions must complete within one clock cycle, then the cycle time has to be large enough to accommodate the slowest instruction.
• For example, lw $t0, -4($sp) is the slowest instruction needing ___ns.
— Assuming the circuit latencies below.
[Figure: single-cycle datapath annotated with component latencies of 2 ns, 1 ns and 0 ns.]
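The longest-path idea can be sketched as a sum of component latencies along each instruction's path. Everything numeric below is an assumption: the figure's labels (2 ns, 1 ns, 0 ns) cannot be matched to components with certainty from this text, so the sketch assumes 2 ns for each memory and the ALU, 1 ns for a register file read or write, and 0 ns for everything else. The dictionary and function names are hypothetical.

```python
# Assumed latencies in ns (not confirmed by the source figure).
LATENCY_NS = {
    "instruction_memory": 2,
    "register_read": 1,
    "alu": 2,
    "data_memory": 2,
    "register_write": 1,
}

# Components on the critical path of each instruction type.
PATHS = {
    "add": ["instruction_memory", "register_read", "alu", "register_write"],
    "lw": ["instruction_memory", "register_read", "alu",
           "data_memory", "register_write"],
    "sw": ["instruction_memory", "register_read", "alu", "data_memory"],
    "beq": ["instruction_memory", "register_read", "alu"],
}


def instruction_latency(name):
    """Sum the assumed latencies along one instruction's path."""
    return sum(LATENCY_NS[component] for component in PATHS[name])


def single_cycle_time():
    """The cycle time must cover the slowest instruction."""
    return max(instruction_latency(name) for name in PATHS)


for name in PATHS:
    print(name, instruction_latency(name), "ns")
print("cycle time:", single_cycle_time(), "ns")
```

With different latency assumptions the numbers change, but the structure stays the same: the single-cycle clock period is the maximum over all instruction paths.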