04 The Processor

The document discusses the implementation of a simple MIPS-based instruction set architecture, focusing on single-cycle, multicycle, and pipelining techniques. It details the operation of the CPU, instruction fetching, and the execution of various instruction types, including R-type, I-type, and branch instructions. Additionally, it explains the control unit's role in managing control signals for proper instruction execution.

Uploaded by

x6ycdqdpj6

CSCE 5610
Computer System Architecture
The Processor

Content

• Single-cycle implementation
— All operations take the same amount of time: a single cycle.
• Multicycle implementation
— Allows faster operations to take less time than slower ones, so overall performance can be increased.
• Pipelining
— Lets a processor overlap the execution of several instructions, potentially leading to big performance gains.

Single-cycle implementation

• We will describe the implementation of a simple MIPS-based instruction set supporting just the following operations.

    Arithmetic:     add  sub  and  or  slt
    Data Transfer:  lw   sw
    Control:        beq

• Today we’ll build a single-cycle implementation of this instruction set.
— All instructions will execute in the same amount of time; this will determine the clock cycle time for our performance equations.
— We’ll explain the datapath first, and then make the control unit.

Computers are state machines

• A computer is just a big fancy state machine.
— Registers, memory, hard disks and other storage form the state.
— The processor keeps reading and updating the state, according to the instructions in some program.

[Figure: the CPU reading and updating the State.]

Instruction fetching

• The CPU is always in an infinite loop, fetching instructions from memory and executing them.
• The program counter or PC register holds the address of the current instruction.

[Figure: the PC drives the Read address input of the Instruction memory, which outputs Instruction [31-0]; an adder computes PC + 4.]

Basic MIPS implementation

The first two steps for every instruction:
● Send PC to instruction memory
● Read one or two registers using fields in the machine code

● Instruction Memory: stores the code and supplies an instruction given an address
● Program Counter (PC): holds the address of the current instruction
● Adder: increments the PC to the address of the next instruction
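
The fetch step can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the dictionary standing in for the instruction memory, the `fetch` helper and the start address 0x00400000 are not from the slides.

```python
def fetch(instruction_memory, pc):
    """Return (instruction word at pc, address of the next sequential instruction)."""
    instruction = instruction_memory[pc]   # send the PC to the instruction memory
    next_pc = (pc + 4) & 0xFFFFFFFF        # adder: move to the next 4-byte instruction
    return instruction, next_pc

# Assumed program image: three instruction words at consecutive word addresses.
program = {
    0x00400000: 0x012AA020,
    0x00400004: 0x8FA8FFFC,
    0x00400008: 0xAFA40010,
}

pc = 0x00400000
fetched = []
for _ in range(3):
    word, pc = fetch(program, pc)
    fetched.append(word)
```

After the loop, `fetched` holds the three words in program order and `pc` points just past the last one.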

Building a datapath: Fetch

To execute any instruction:
1. Fetch the instruction from memory
2. Increment the PC to point at the next instruction

Why PC + 4? Each instruction is four bytes long.

Encoding R-type instructions

• Last lecture, we saw encodings of MIPS instructions as 32-bit values.
• Register-to-register arithmetic instructions use the R-type format.
— op is the instruction opcode, and func specifies a particular arithmetic operation.
— rs, rt and rd are source and destination registers.

    op      rs      rt      rd      shamt   func
    6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

• An example instruction and its encoding (32-bit machine code):

    add $20, $9, $10      000000 01001 01010 10100 00000 100000
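
As a quick check of the R-type layout, the field packing can be written out in Python. The helper name `encode_r_type` and the constant `ADD_FUNC` are chosen here for illustration; the field widths and the example instruction come from the slide.

```python
def encode_r_type(rs, rt, rd, shamt, func, op=0):
    """Pack the six R-type fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func

ADD_FUNC = 0b100000  # func field of add, as in the example encoding

# add $20, $9, $10  ->  rd = 20, rs = 9, rt = 10
word = encode_r_type(rs=9, rt=10, rd=20, shamt=0, func=ADD_FUNC)
bits = f"{word:032b}"
fields = " ".join([bits[0:6], bits[6:11], bits[11:16], bits[16:21], bits[21:26], bits[26:32]])
```

Here `fields` reproduces the six groups shown in the example encoding.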
Registers and ALUs

• R-type instructions must access registers and an ALU.
• Our register file stores thirty-two 32-bit values.
— Each register specifier is 5 bits long.
— You can read from two registers at a time.
— RegWrite is 1 if a register should be written.
• Here’s a simple ALU with five operations, selected by a 3-bit control signal ALUOp.

    ALUOp   Function
    000     and
    001     or
    010     add
    110     subtract
    111     slt

Executing an R-type instruction

1. Read an instruction from the instruction memory.
2. The source registers, specified by instruction fields rs and rt, should be read from the register file.
3. The ALU performs the desired operation.
4. Its result is stored in the destination register, which is specified by field rd of the instruction word.

    op      rs      rt      rd      shamt   func
    31-26   25-21   20-16   15-11   10-6    5-0

[Figure: the instruction memory feeds I [25 - 21] and I [20 - 16] to the register file's read ports and I [15 - 11] to its write register input; the two read data outputs go to the ALU (controlled by ALUOp), whose Result returns to Write data (controlled by RegWrite).]
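
The ALU table and the R-type steps can be modelled with a short Python sketch. It is a behavioural illustration only: the function names, the list used as a register file and the sample register values are assumptions, while the ALUOp encodings are the ones in the table.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(alu_op, a, b):
    """Return (result, zero) using the 3-bit ALUOp encoding from the table."""
    if alu_op == 0b000:
        result = a & b
    elif alu_op == 0b001:
        result = a | b
    elif alu_op == 0b010:
        result = (a + b) & MASK
    elif alu_op == 0b110:
        result = (a - b) & MASK
    elif alu_op == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0   # slt compares signed values
    else:
        raise ValueError(f"unsupported ALUOp {alu_op:03b}")
    return result, result == 0

def execute_r_type(registers, rs, rt, rd, alu_op):
    """Steps 2-4: read rs and rt, run the ALU, write the result to rd (RegWrite = 1)."""
    result, _zero = alu(alu_op, registers[rs], registers[rt])
    registers[rd] = result

registers = [0] * 32
registers[9], registers[10] = 5, 7                            # assumed sample values
execute_r_type(registers, rs=9, rt=10, rd=20, alu_op=0b010)   # add $20, $9, $10
```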

Encoding I-type instructions

• The lw, sw and beq instructions all use the I-type encoding.
— rt is the destination for lw, but a source for beq and sw.
— address is a 16-bit signed constant.

    op      rs      rt      address
    6 bits  5 bits  5 bits  16 bits

• Two example instructions:

    lw $t0, -4($sp)       100011 11101 01000 1111 1111 1111 1100
    sw $a0, 16($sp)       101011 11101 00100 0000 0000 0001 0000

Accessing data memory

• For an instruction like lw $t0, -4($sp), the base register $sp is added to the sign-extended constant to get a data memory address.
• This means the ALU must accept either a register operand for arithmetic instructions, or a sign-extended immediate operand for lw and sw.
• We’ll add a multiplexer, controlled by ALUSrc, to select either a register operand (0) or a constant operand (1).

[Figure: datapath extended with the data memory (MemRead, MemWrite), a sign-extend unit on I [15 - 0], and the ALUSrc mux on the ALU's second input.]
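
The address computation for lw and sw can be illustrated as follows. This is a sketch under assumptions: the helper names and the sample $sp value 0x7FFFEFFC are invented for the example; the instruction word is the lw encoding shown above.

```python
def decode_i_type(word):
    """Split a 32-bit I-type word into (op, rs, rt, 16-bit address field)."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF

def sign_extend_16(imm16):
    """Sign-extend a 16-bit field to a Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Base register plus sign-extended constant, wrapped to 32 bits."""
    return (base + sign_extend_16(imm16)) & 0xFFFFFFFF

lw_word = 0b100011_11101_01000_1111111111111100    # lw $t0, -4($sp)
op, rs, rt, imm = decode_i_type(lw_word)

sp_value = 0x7FFFEFFC                              # assumed contents of $sp
address = effective_address(sp_value, imm)
```

With these inputs the constant decodes to -4, so the load reads from the word just below the assumed stack pointer.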
MemToReg

• The register file’s "Write data" input has a similar problem. It must be able to store either the ALU output of R-type instructions, or the data memory output for lw.
• We add a mux, controlled by MemToReg, to select between saving the ALU result (0) or the data memory output (1) to the registers.

RegDst

• A final annoyance is the destination register of lw is in rt instead of rd.

    op      rs      rt      address
    lw $rt, address($rs)

• We’ll add one more mux, controlled by RegDst, to select the destination register from either instruction field rt (0) or field rd (1).

[Figure: datapath with the MemToReg mux on the register file's Write data input and the RegDst mux on its Write register input.]
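
The two write-back multiplexers can be sketched as plain selection functions. The names `mux` and `write_back` and the sample operand values are assumptions made for this illustration; the select encodings (0 and 1) follow the bullets above.

```python
def mux(select, in0, in1):
    """Two-input multiplexer: returns in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0

def write_back(rt, rd, alu_result, mem_data, reg_dst, mem_to_reg):
    """Return (destination register number, value to write) for the register file."""
    destination = mux(reg_dst, rt, rd)               # RegDst: rt (0) or rd (1)
    value = mux(mem_to_reg, alu_result, mem_data)    # MemToReg: ALU (0) or memory (1)
    return destination, value

# R-type: destination comes from rd, value from the ALU.
r_type = write_back(rt=10, rd=20, alu_result=12, mem_data=0, reg_dst=1, mem_to_reg=0)

# lw: destination comes from rt, value from the data memory.
load = write_back(rt=8, rd=31, alu_result=0x7FFFEFF8, mem_data=42, reg_dst=0, mem_to_reg=1)
```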

Branches

• For branch instructions, the constant is not an address but an instruction offset from the current program counter to the desired address.

        beq $at, $0, L
        add $v1, $v0, $0
        add $v1, $v1, $v1
        j   Somewhere
    L:  add $v1, $v0, $v0

• The target address L is three instructions past the instruction that follows the beq (the offset is counted from PC + 4), so the encoding of the branch instruction has 0000 0000 0000 0011 for the address field.

    000100  00001  00000  0000 0000 0000 0011
    op      rs     rt     address

• Instructions are four bytes long, so the actual memory offset is 12 bytes.

The steps in executing a beq

1. Fetch the instruction, like beq $at, $0, offset, from memory.
2. Read the source registers, $at and $0, from the register file.
3. Compare the values by subtracting them in the ALU.
4. If the subtraction result is 0, the source operands were equal and the PC should be loaded with the target address, PC + 4 + (offset x 4).
5. Otherwise the branch should not be taken, and the PC should just be incremented to PC + 4 to fetch the next instruction sequentially.
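
The beq steps reduce to a small next-PC computation. The sketch below assumes a hypothetical helper `beq_next_pc` and an arbitrary branch address 0x1000; the formula PC + 4 + (offset x 4) is the one given in the steps.

```python
MASK = 0xFFFFFFFF

def sign_extend_16(imm16):
    """Sign-extend the 16-bit address field."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def beq_next_pc(pc, rs_value, rt_value, offset16):
    """Return the next PC for a beq located at address pc."""
    zero = ((rs_value - rt_value) & MASK) == 0              # ALU subtraction, Zero output
    pc_plus_4 = (pc + 4) & MASK
    target = (pc_plus_4 + (sign_extend_16(offset16) << 2)) & MASK   # shift left 2 = x 4
    return target if zero else pc_plus_4

taken = beq_next_pc(0x1000, 0, 0, 0b0000_0000_0000_0011)       # operands equal
not_taken = beq_next_pc(0x1000, 1, 0, 0b0000_0000_0000_0011)   # operands differ
```

With an address field of 3, the taken branch lands 16 bytes after the beq itself, which is the position of label L in the example.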
Branching hardware

• We need a second adder, since the ALU is already doing subtraction for the beq.
• Multiply the constant by 4 (shift left 2) to get the offset.
• PCSrc=1 branches to PC+4+(offset x 4).
• PCSrc=0 continues to PC+4.

[Figure: complete single-cycle datapath with the PC + 4 adder, the shift-left-2 unit, the branch target adder and the PCSrc mux feeding the PC.]

Control

• The control unit is responsible for setting all the control signals so that each instruction is executed properly.
— The control unit’s input is the 32-bit instruction word.
— The outputs are values for the blue control signals in the datapath.
• Most of the signals can be generated from the instruction opcode alone, and not the entire 32-bit word.
• To illustrate the relevant control signals, we will show the route that is taken through the datapath by R-type, lw, sw and beq instructions.

R-type instruction path

• The R-type instructions include add, sub, and, or, and slt.
• The ALUOp is determined by the instruction’s "func" field.

lw instruction path

• An example load instruction is lw $t0, -4($sp).
• The ALUOp must be 010 (add), to compute the effective address.

[Figure: single-cycle datapath showing the route taken by the instruction.]
sw instruction path

• An example store instruction is sw $a0, 16($sp).
• The ALUOp must be 010 (add), again to compute the effective address.

beq instruction path

• One sample branch instruction is beq $at, $0, offset.
• The ALUOp is 110 (subtract), to test for equality.
• The branch may or may not be taken, depending on the ALU’s Zero output.

    PCSrc = 0, if branch not taken
    PCSrc = 1, if branch taken

[Figure: single-cycle datapath showing the route taken by the instruction.]

Control Signal Table

    Operation  RegDst  RegWrite  ALUSrc  ALUOp  MemWrite  MemRead  MemToReg
    add        1       1         0       010    0         0        0
    sub        1       1         0       110    0         0        0
    and        1       1         0       000    0         0        0
    or         1       1         0       001    0         0        0
    slt        1       1         0       111    0         0        0
    lw         0       1         1       010    0         1        1
    sw         X       0         1       010    1         0        X
    beq        X       0         0       110    0         0        X

• sw and beq are the only instructions that do not write any registers.
• lw and sw are the only instructions that use the constant field as an ALU operand. They also depend on the ALU to compute the effective memory address.
• ALUOp for R-type instructions depends on the instructions’ func field.
• The PCSrc control signal (not listed) should be set if the instruction is beq and the ALU’s Zero output is true.

Generating Control Signals

• The control unit needs 13 bits of inputs.
— Six bits make up the instruction’s opcode.
— Six bits come from the instruction’s func field.
— It also needs the Zero output of the ALU.
• The control unit generates 10 bits of output, corresponding to the signals mentioned on the previous page.

[Figure: the Control block takes I [31 - 26], I [5 - 0] and Zero as inputs and drives RegDst, RegWrite, ALUSrc, ALUOp, MemWrite, MemRead, MemToReg and PCSrc.]
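
The control signal table maps naturally onto a lookup structure. The sketch below is an illustration, not the control unit's actual logic: it keys the table by mnemonic for readability (real hardware decodes the opcode and func bits), and the names `Control`, `CONTROL` and `pc_src` are assumptions. `None` stands in for the don't-care entries marked X.

```python
from typing import NamedTuple, Optional

class Control(NamedTuple):
    reg_dst: Optional[int]      # None models a don't-care (X)
    reg_write: int
    alu_src: int
    alu_op: int
    mem_write: int
    mem_read: int
    mem_to_reg: Optional[int]   # None models a don't-care (X)

X = None
CONTROL = {
    "add": Control(1, 1, 0, 0b010, 0, 0, 0),
    "sub": Control(1, 1, 0, 0b110, 0, 0, 0),
    "and": Control(1, 1, 0, 0b000, 0, 0, 0),
    "or":  Control(1, 1, 0, 0b001, 0, 0, 0),
    "slt": Control(1, 1, 0, 0b111, 0, 0, 0),
    "lw":  Control(0, 1, 1, 0b010, 0, 1, 1),
    "sw":  Control(X, 0, 1, 0b010, 1, 0, X),
    "beq": Control(X, 0, 0, 0b110, 0, 0, X),
}

def pc_src(operation, zero):
    """PCSrc is set only for a beq whose ALU Zero output is true."""
    return 1 if operation == "beq" and zero else 0

# Properties stated in the bullets, recomputed from the table.
non_writers = sorted(op for op, c in CONTROL.items() if c.reg_write == 0)
constant_users = sorted(op for op, c in CONTROL.items() if c.alu_src == 1)
```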


Single-Cycle Performance

• We saw a MIPS single-cycle datapath and control unit.
• Now, we’ll explore the factors that contribute to a processor’s execution time, and look specifically at the performance of the single-cycle machine.
• Next, we’ll explore how to improve on the single-cycle machine’s performance using pipelining.

Processor clock period

The processor clock oscillates between high and low signal levels:

[Figure: oscilloscope trace of the clock signal switching between 0 Volts and 1.2 Volts over clock cycles 1 to 4.]

The clock synchronizes the processor’s operations on rising & falling edges:
● a register updates contents upon arrival of clock edge
● a processor starts instruction execution at clock edge

Clock Period = Clock Cycle Time


● interval of time between 2 adjacent rising edges
● duration of 1 cycle of the clock signal
● clock period has units of seconds (nanoseconds, microseconds, or picoseconds)

Edge-triggered State Elements

• In an instruction like add $t1, $t1, $t2, how do we know $t1 is not updated until after its original value is read?
• We’ll assume that our state elements are positive edge triggered, and are updated only on the positive edge of a clock signal.
• The register file and data memory have explicit write control signals, RegWrite and MemWrite. These units can be written to only if the control signal is asserted and there is a positive clock edge.
• In a single-cycle machine the PC is updated on each clock cycle, so we don’t bother to give it an explicit write control signal.

[Figure: the three state elements: the register file (RegWrite), the data memory (MemWrite, MemRead) and the PC.]

Processor Frequency

Clock Frequency = 1 / (Clock Period)

A CPU’s Clock Frequency is the inverse of its Clock Cycle time:
● the number of clock cycles occurring in 1 second
● measured in units of Hertz (Hz)
● also called the Clock Rate: F = 1 / Period

When designing processors, we work in units of nanoseconds or GHz.

    Clock cycle period      Clock rate
    10 nsec                 100 MHz
    5 nsec                  200 MHz
    2 nsec                  500 MHz
    1 nsec (10^-9 sec)      1 GHz (10^9 Hz)
    500 psec                2 GHz
    250 psec                4 GHz
    200 psec                5 GHz
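
The period-to-rate table follows directly from F = 1 / Period. The helper below is a small illustration; its name and the choice of picoseconds and MHz as working units are assumptions made to keep the arithmetic exact.

```python
def clock_rate_mhz(period_ps):
    """Clock rate in MHz for a clock period given in picoseconds (F = 1 / Period)."""
    # 1 / (period_ps * 1e-12 s) = 1e12 / period_ps Hz = 1e6 / period_ps MHz
    return 1_000_000 / period_ps

# The seven periods from the table, expressed in picoseconds.
periods_ps = [10_000, 5_000, 2_000, 1_000, 500, 250, 200]
rates_mhz = [clock_rate_mhz(p) for p in periods_ps]
```

Each entry of `rates_mhz` matches the corresponding clock rate in the table (1000 MHz = 1 GHz).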
CPU Time

CPU Time Exercise

CPI: Cycles per Instruction

■ The average number of clock cycles per instruction, or CPI, is a function of the
machine and program.
■ The CPI depends on the actual instructions appearing in the program— a
floating-point intensive application might have a higher CPI than an integer-
based program.
■ It also depends on the CPU implementation. For example, a Pentium can
execute the same instructions as an older 80486, but faster.
■ We assumed each instruction took one cycle, so we had CPI = 1.
■ The CPI can be >1 due to memory stalls and slow instructions.
■ The CPI can be <1 on machines that execute more than 1 instruction per
cycle (superscalar).
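
CPI is one of three factors that determine CPU time, together with the dynamic instruction count and the clock cycle time. As a reminder, the standard performance equation relating them can be written as below; the symbol names are chosen here for illustration.

```latex
\[
  \text{CPU time} \;=\; \text{Instructions executed} \times \text{CPI} \times \text{Clock cycle time}
\]
```

For the single-cycle machine, CPI = 1, so CPU time is simply the number of instructions executed multiplied by the clock cycle time.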
Instructions Executed

• Instructions executed:
— We are not interested in the static instruction count, or how many lines of code are in a program.
— Instead we care about the dynamic instruction count, or how many instructions are actually executed when the program runs.
— There are three lines of code below, but the number of instructions executed would be 2001.

              li   $4, 1000
    Ostrich:  subi $4, $4, 1
              bne  $4, $0, Ostrich
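
The difference between the static and dynamic counts can be checked by stepping through the loop. This is a hand-written trace of the three-instruction program, not a MIPS simulator; the function name is an assumption.

```python
def count_executed(initial=1000):
    """Count the instructions executed by the li / subi / bne loop."""
    executed = 0
    reg4 = initial          # li $4, 1000
    executed += 1
    while True:
        reg4 -= 1           # Ostrich: subi $4, $4, 1
        executed += 1
        executed += 1       # bne $4, $0, Ostrich (executed on every iteration)
        if reg4 == 0:       # branch falls through once $4 reaches 0
            break
    return executed

static_count = 3
dynamic_count = count_executed()
```

The loop body runs 1000 times at two instructions per iteration, plus the single li, which gives the 2001 quoted above.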

Clock Cycle Time

• One "cycle" is the minimum time it takes the CPU to do any work.
— The clock cycle time or clock period is just the length of a cycle.
— The clock rate, or frequency, is the reciprocal of the cycle time.
• Generally, a higher frequency is better.
• Some examples illustrate some typical frequencies.
— A 500MHz processor has a cycle time of 2ns.
— A 2GHz (2000MHz) CPU has a cycle time of just 0.5ns.

How does add go through the datapath

[Figure: complete single-cycle datapath.]
Compute the longest path in the add instruction

[Figure: single-cycle datapath with each component labeled by its latency (2 ns, 1 ns or 0 ns).]

The Slowest Instruction...

• If all instructions must complete within one clock cycle, then the cycle time has to be large enough to accommodate the slowest instruction.
• For example, lw $t0, -4($sp) is the slowest instruction, needing 8ns.
— Assuming the circuit latencies below.

    reading the instruction memory      2ns
    reading the base register $sp       1ns
    computing memory address $sp-4      2ns
    reading the data memory             2ns
    storing data back to $t0            1ns
    total                               8ns

...Determines the Clock Cycle Time

• If we make the cycle time 8ns then every instruction will take 8ns, even if they don’t need that much time.
• For example, the instruction add $s4, $t1, $t2 really needs just 6ns.

    reading the instruction memory      2ns
    reading registers $t1 and $t2       1ns
    computing $t1 + $t2                 2ns
    storing the result into $s4         1ns
    total                               6ns

[Figure: datapath with component latencies: instruction memory 2 ns, register file 1 ns, ALU 2 ns, data memory 2 ns; muxes and sign extend 0 ns.]
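
The per-instruction latencies are just sums of the component delays along each instruction's path. The sketch below uses the delays from the tables; the dictionary names and the way each path is listed are assumptions made for the illustration.

```python
# Component delays in nanoseconds, taken from the latency tables.
LATENCY_NS = {
    "instruction_memory": 2,
    "register_read": 1,
    "alu": 2,
    "data_memory": 2,
    "register_write": 1,
}

# Components each instruction class passes through, in order.
PATHS = {
    "lw":  ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "add": ["instruction_memory", "register_read", "alu", "register_write"],
    "sw":  ["instruction_memory", "register_read", "alu", "data_memory"],
    "beq": ["instruction_memory", "register_read", "alu"],
}

instruction_time_ns = {op: sum(LATENCY_NS[c] for c in path) for op, path in PATHS.items()}

# The single-cycle clock must accommodate the slowest instruction.
cycle_time_ns = max(instruction_time_ns.values())
slowest = max(instruction_time_ns, key=instruction_time_ns.get)
```

Under these delays lw is the slowest instruction and sets the cycle time for every other instruction.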
How bad is this?

• With these same component delays, a sw instruction would need 7ns, and beq would need just 5ns.
• Let’s consider the gcc instruction mix.

    Instruction   Frequency
    Arithmetic    48%
    Loads         22%
    Stores        11%
    Branches      19%

• With a single-cycle datapath, each instruction would require 8ns.
• But if we could execute instructions as fast as possible, the average time per instruction for gcc would be?

    (48% x 6ns) + (22% x 8ns) + (11% x 7ns) + (19% x 5ns) = 6.36ns

• The single-cycle datapath is about 1.26 times slower!
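
The weighted average and the slowdown factor can be reproduced with a few lines of arithmetic. The variable names are chosen for this illustration; the percentages and latencies are the ones listed above.

```python
# (frequency in percent, time in ns if each instruction ran as fast as possible)
GCC_MIX = {
    "arithmetic": (48, 6),
    "loads":      (22, 8),
    "stores":     (11, 7),
    "branches":   (19, 5),
}

SINGLE_CYCLE_NS = 8   # every instruction takes the full 8ns cycle

# Weighted average time per instruction, in ns.
average_ns = sum(pct * ns for pct, ns in GCC_MIX.values()) / 100

# How many times slower the single-cycle datapath is than that ideal average.
slowdown = SINGLE_CYCLE_NS / average_ns
```

Rounded to two decimals, `slowdown` gives the 1.26 figure quoted in the last bullet.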
