Single-Cycle Processors:

Datapath & Control

Arvind

Computer Science & Artificial Intelligence Lab

M.I.T.

Based on the material prepared by

Arvind and Krste Asanovic

6.823 L5- 2
Arvind

Instruction Set Architecture (ISA) versus Implementation
ISA is the hardware/software interface
Defines set of programmer visible state
Defines instruction format (bit encoding) and instruction
semantics

Examples: MIPS, x86, IBM 360, JVM

Many possible implementations of one ISA

360 implementations: model 30 (c. 1964), z900 (c. 2001)


x86 implementations: 8086 (c. 1978), 80186, 286, 386, 486,
Pentium, Pentium Pro, Pentium-4 (c. 2000), AMD Athlon,
Transmeta Crusoe, SoftPC
MIPS implementations: R2000, R4000, R10000, ...

JVM: HotSpot, PicoJava, ARM Jazelle, ...

September 26, 2005


6.823 L5- 3
Arvind

Processor Performance

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$

Instructions per program depends on source code, compiler technology, and ISA
Cycles per instruction (CPI) depends upon the ISA and the microarchitecture
Time per cycle depends upon the microarchitecture and the base technology

Microarchitecture           CPI    cycle time
Microcoded                  >1     short
Single-cycle unpipelined    1      long     (this lecture)
Pipelined                   1      short
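As a rough illustration of the performance equation above, the sketch below multiplies the three factors for two microarchitectures. The instruction count and cycle times are hypothetical values chosen only for the example, and `execution_time` is an illustrative helper, not part of the lecture.

```python
def execution_time(instructions, cpi, cycle_time):
    """Time per program = instructions x cycles/instruction x time/cycle."""
    return instructions * cpi * cycle_time

# Hypothetical workload and cycle times (arbitrary time units).
program_instructions = 1_000_000
single_cycle = execution_time(program_instructions, cpi=1, cycle_time=27)
pipelined = execution_time(program_instructions, cpi=1, cycle_time=10)

print(single_cycle, pipelined, single_cycle / pipelined)
```

Both designs have CPI = 1 here, so the whole difference comes from the cycle time.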


6.823 L5- 4
Arvind

Microarchitecture: Implementation of an ISA

[Figure: a Controller and a Datapath. The Controller drives the Datapath's control points; the Datapath returns status lines to the Controller.]

Structure: How components are connected (static)
Behavior: How data moves between components (dynamic)
Hardware Elements

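Combinational circuits
Mux, Demux, Decoder, ALU, ...

[Figure: block symbols for a Mux (inputs A0 ... An-1, output O), a Demux (input A, outputs O0 ... On-1), a Decoder (outputs O0 ... On-1), and an ALU (inputs A and B, outputs Result and Comp?). The Sel inputs are lg(n) bits wide. The ALU's OpSelect input chooses among:
- Add, Sub, ...
- And, Or, Xor, Not, ...
- GT, LT, EQ, Zero, ...]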

Synchronous state elements
Flipflop, Register, Register file, SRAM, DRAM

[Figure: a D flip-flop (inputs D, En, Clk; output Q) and a register built from n flip-flops that share En and Clk (inputs D0 ... Dn-1, outputs Q0 ... Qn-1).]

Edge-triggered: Data is sampled at the rising edge
6.823 L5- 6
Arvind

Register Files
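[Figure: 2R+1W register file. Inputs: Clock, WE (we), ReadSel1 (rs1), ReadSel2 (rs2), WriteSel (ws), WriteData (wd). Outputs: ReadData1 (rd1), ReadData2 (rd2). Internally: register 0 through register 31, each 32 bits wide; the select inputs rs1, rs2 and ws are 5 bits wide, and wd, rd1 and rd2 are 32 bits wide.]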

No timing issues in reading a selected register
Register files with a large number of ports are difficult to design
- Intel's Itanium GPR file has 128 registers with 8 read ports and 4 write ports!
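A minimal behavioral sketch of the 2R+1W register file, assuming reads are combinational and the single write port is sampled at the rising clock edge. The class and method names are illustrative, not taken from the lecture.

```python
class RegisterFile:
    """2R+1W register file: two read ports, one clocked write port."""

    def __init__(self, num_regs=32, width=32):
        self.regs = [0] * num_regs
        self.mask = (1 << width) - 1

    def read(self, rs1, rs2):
        # Reads are combinational: outputs follow the select inputs at any time.
        return self.regs[rs1], self.regs[rs2]

    def clock_edge(self, we, ws, wd):
        # The write happens only at the rising edge, and only if we is set.
        if we:
            self.regs[ws] = wd & self.mask


rf = RegisterFile()
rf.clock_edge(we=True, ws=5, wd=42)
rf.clock_edge(we=False, ws=6, wd=99)  # ignored: write enable is low
print(rf.read(5, 6))
```

Each extra read or write port would add another select/data pair to this interface, which hints at why heavily multi-ported files are hard to build in hardware.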
6.823 L5- 7
Arvind

A Simple Memory Model

[Figure: MAGIC RAM block with inputs WriteEnable, Clock, Address and WriteData, and output ReadData.]

Reads and writes are always completed in one cycle
- a Read can be done any time (i.e. combinational)
- a Write is performed at the rising clock edge if it is enabled
  the write address and data must be stable at the clock edge

Later in the course we will present a more realistic model of memory
6.823 L5- 8
Arvind

Implementing MIPS:

Single-cycle per instruction

datapath & control logic



6.823 L5- 9
Arvind

The MIPS ISA

Processor State
- 32 32-bit GPRs, R0 always contains a 0
- 32 single precision FPRs, may also be viewed as 16 double precision FPRs
- FP status register, used for FP compares & exceptions
- PC, the program counter
- some other special registers

Data types
- 8-bit byte, 16-bit half word
- 32-bit word for integers
- 32-bit word for single precision floating point
- 64-bit word for double precision floating point

Load/Store style instruction set
- data addressing modes: immediate & indexed
- branch addressing modes: PC relative & register indirect
- Byte addressable memory, big endian mode

All instructions are 32 bits
6.823 L5- 10
Arvind

Instruction Execution

Execution of an instruction involves

1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back

and the computation of the address of the next instruction
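The five steps above can be sketched as a toy interpreter that performs all of them in one call per instruction. This is only an illustration under stated assumptions: instructions are Python tuples rather than 32-bit encodings, memory is a dictionary keyed by address, branch delay slots are ignored, and the names `step`, `program`, `regs` and `mem` are hypothetical.

```python
def step(pc, regs, mem, program):
    """Execute one instruction and return the address of the next one."""
    op, *args = program[pc // 4]              # 1. instruction fetch
    next_pc = pc + 4                          # default next-instruction address
    write = None                              # (register, value) to write back
    if op == "ADD":                           # 2. decode and register fetch
        rd, rs, rt = args
        write = (rd, regs[rs] + regs[rt])     # 3. ALU operation
    elif op == "ADDI":
        rt, rs, imm = args
        write = (rt, regs[rs] + imm)
    elif op == "LW":
        rt, rs, disp = args
        write = (rt, mem.get(regs[rs] + disp, 0))   # 4. memory operation
    elif op == "SW":
        rt, rs, disp = args
        mem[regs[rs] + disp] = regs[rt]
    elif op == "BEQZ":
        rs, offset = args
        if regs[rs] == 0:
            next_pc = pc + 4 + offset * 4     # offset counted in words
    else:
        raise ValueError(f"unsupported op {op}")
    if write is not None and write[0] != 0:   # 5. write back (R0 stays 0)
        regs[write[0]] = write[1] & 0xFFFFFFFF
    return next_pc


program = [
    ("ADDI", 1, 0, 7),    # R1 = R0 + 7
    ("SW", 1, 0, 100),    # mem[100] = R1
    ("LW", 2, 0, 100),    # R2 = mem[100]
    ("ADD", 3, 1, 2),     # R3 = R1 + R2
]
regs, mem, pc = [0] * 32, {}, 0
while pc // 4 < len(program):
    pc = step(pc, regs, mem, program)
print(regs[3], mem[100], pc)
```

Every instruction finishes inside a single `step` call, which mirrors the single-cycle assumption that all five steps fit in one clock period.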


6.823 L5- 11
Arvind

Datapath: Reg-Reg ALU Instructions

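[Figure: datapath for register-register ALU instructions. The PC addresses the instruction memory and feeds a 0x4 adder. inst<25:21>, inst<20:16> and inst<15:11> drive the GPR ports rs1, rs2 and ws; rd1 and rd2 feed the ALU, whose result returns to wd. inst<5:0> drives ALU Control. Control signals: OpCode, RegWrite.]

RegWrite Timing?

Format (field widths in bits):
  fields:  0       | rs      | rt      | rd      | 0      | func
  bits:    6       | 5       | 5       | 5       | 5      | 6
  range:   <31:26> | <25:21> | <20:16> | <15:11> | <10:6> | <5:0>
Semantics: rd ← (rs) func (rt)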
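As a sketch of how the decode step could slice a 32-bit word into the register-register fields shown above, the function below uses shifts and masks. The name `decode_rtype`, the key `zero` for the fixed-zero field, and the sample word are illustrative assumptions.

```python
def decode_rtype(inst):
    """Split a 32-bit word into the six register-register format fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,  # inst<31:26>, 0 for this format
        "rs":     (inst >> 21) & 0x1F,  # inst<25:21>
        "rt":     (inst >> 16) & 0x1F,  # inst<20:16>
        "rd":     (inst >> 11) & 0x1F,  # inst<15:11>
        "zero":   (inst >> 6) & 0x1F,   # inst<10:6>, 0 for this format
        "func":   inst & 0x3F,          # inst<5:0>
    }


# Hypothetical word: opcode 0, rs = 1, rt = 2, rd = 3, func = 0x20.
word = (1 << 21) | (2 << 16) | (3 << 11) | 0x20
fields = decode_rtype(word)
print(fields)
```

In hardware these slices are just wires from the instruction register, so decoding the register specifiers costs no logic.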
6.823 L5- 12
Arvind

Datapath: Reg-Imm ALU Instructions

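[Figure: datapath for register-immediate ALU instructions. inst<25:21> drives rs1 and inst<20:16> drives ws of the GPRs; inst<15:0> passes through Imm Ext to the ALU; inst<31:26> drives ALU Control. Control signals: OpCode, ExtSel, RegWrite.]

Format (field widths in bits):
  fields:  opcode  | rs      | rt      | immediate
  bits:    6       | 5       | 5       | 16
  range:   <31:26> | <25:21> | <20:16> | <15:0>
Semantics: rt ← (rs) op immediate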
6.823 L5- 13
Arvind

Conflicts in Merging Datapath

[Figure: the register-register and register-immediate datapaths overlaid. Conflicts appear where two sources drive one input: ws (inst<20:16> or inst<15:11>), the second ALU operand (rd2 or Imm Ext), and ALU Control (inst<31:26> or inst<5:0>). Annotation: Introduce muxes.]

Formats (field widths in bits):
  0      | rs | rt | rd | 0 | func       (6, 5, 5, 5, 5, 6)    rd ← (rs) func (rt)
  opcode | rs | rt | immediate          (6, 5, 5, 16)         rt ← (rs) op immediate
6.823 L5- 14
Arvind

Datapath for ALU Instructions

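[Figure: merged datapath for ALU instructions. <25:21> drives rs1 and <20:16> drives rs2; a mux controlled by RegDst (rt / rd) selects ws; a mux controlled by BSrc (Reg / Imm) selects the second ALU operand; <15:0> passes through Imm Ext; <31:26> and <5:0> drive ALU Control. Control signals: OpCode, RegDst, ExtSel, OpSel, BSrc, RegWrite.]

Formats (field widths in bits):
  0      | rs | rt | rd | 0 | func       (6, 5, 5, 5, 5, 6)    rd ← (rs) func (rt)
  opcode | rs | rt | immediate          (6, 5, 5, 16)         rt ← (rs) op immediate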
6.823 L5- 15
Arvind

Datapath for Memory Instructions

Should program and data memory be separate?

Harvard style: separate (Aiken and Mark 1 influence)
- read-only program memory
- read/write data memory
- at some level the two memories have to be the same

Princeton style: the same (von Neumann's influence)
- A Load or Store instruction requires accessing the memory more than once during its execution


6.823 L5- 16
Arvind

Load/Store Instructions: Harvard Datapath

[Figure: Harvard datapath for loads and stores. The ALU adds the base register and the extended displacement (disp) to form the data memory address; the data memory has ports we, addr, wdata and rdata; WBSrc (ALU / Mem) selects the value written back to the GPRs. Control signals: OpCode, RegDst, ExtSel, OpSel, BSrc, RegWrite, MemWrite, WBSrc.]

Format (field widths in bits):
  fields:  opcode  | rs      | rt      | displacement
  bits:    6       | 5       | 5       | 16
  range:   <31:26> | <25:21> | <20:16> | <15:0>
Addressing mode: (rs) + displacement

rs is the base register
rt is the destination of a Load or the source for a Store
6.823 L5- 17
Arvind

MIPS Control Instructions


Conditional (on GPR) PC-relative branch: BEQZ, BNEZ
  fields:  opcode | rs | -  | offset
  bits:    6      | 5  | 5  | 16

Unconditional register-indirect jumps: JR, JALR
  fields:  opcode | rs | -  | -
  bits:    6      | 5  | 5  | 16

Unconditional absolute jumps: J, JAL
  fields:  opcode | target
  bits:    6      | 26

PC-relative branches add offset×4 to PC+4 to calculate the target address (offset is in words): ±128 KB range
Absolute jumps append target×4 to PC<31:28> to calculate the target address: 256 MB range
jump-&-link stores PC+4 into the link register (R31)
All Control Transfers are delayed by 1 instruction
- we will worry about the branch delay slot later
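The two target-address rules above can be sketched as follows. The sketch assumes the 16-bit branch offset is sign-extended and that addresses wrap at 32 bits; the function names are illustrative, and the branch delay slot is ignored.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    # PC-relative: (PC + 4) + offset * 4, with the offset counted in words.
    return (pc + 4 + sign_extend16(offset16) * 4) & 0xFFFFFFFF


def jump_target(pc, target26):
    # Absolute: PC<31:28> followed by target * 4 (a 28-bit byte address).
    return (pc & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


print(hex(branch_target(0x1000, 0xFFFF)))     # offset of -1 word
print(hex(jump_target(0x40001000, 0x100)))
```

A 16-bit signed word offset spans 2^15 words in each direction, i.e. 128 KB, and a 26-bit word target spans 2^28 bytes, i.e. 256 MB, which matches the ranges quoted above.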
6.823 L5- 18
Arvind

Conditional Branches (BEQZ, BNEZ)


[Figure: Harvard datapath extended for conditional branches. A second adder produces the branch target br; PCSrc selects the next PC from pc+4 or br; the ALU's zero? output feeds control. Control signals: OpCode, RegDst, ExtSel, OpSel, BSrc, zero?, PCSrc, RegWrite, MemWrite, WBSrc.]


6.823 L5- 19
Arvind

Register-Indirect Jumps (JR)


[Figure: the same datapath extended for register-indirect jumps. PCSrc now selects the next PC from pc+4, br or rind. Control signals: OpCode, RegDst, ExtSel, OpSel, BSrc, zero?, PCSrc, RegWrite, MemWrite, WBSrc.]


6.823 L5- 20
Arvind

Register-Indirect Jump-&-Link (JALR)


[Figure: the same datapath extended for jump-&-link. The constant 31 is added as a choice for the GPR write select (ws). PCSrc selects from pc+4, br or rind. Control signals: OpCode, RegDst, ExtSel, OpSel, BSrc, zero?, PCSrc, RegWrite, MemWrite, WBSrc.]


6.823 L5- 21
Arvind

Absolute Jumps (J, JAL)


[Figure: the same datapath extended for absolute jumps. PCSrc now selects the next PC from pc+4, br, rind or jabs. Control signals: OpCode, RegDst, ExtSel, OpSel, BSrc, zero?, PCSrc, RegWrite, MemWrite, WBSrc.]


6.823 L5- 22
Arvind

Harvard-Style Datapath for MIPS


[Figure: complete Harvard-style datapath for MIPS: PC, 0x4 adder, branch-target adder, instruction memory, GPRs (with 31 as a ws choice), Imm Ext, ALU, ALU Control and data memory. PCSrc selects from pc+4, br, rind or jabs. Control signals: OpCode, RegDst, ExtSel, OpSel, BSrc, zero?, PCSrc, RegWrite, MemWrite, WBSrc.]


23

Five-minute break to stretch your legs

6.823 L5- 24
Arvind

Single-Cycle Hardwired Control: Harvard architecture

We will assume the clock period is sufficiently long for all of the following steps to be completed:

1. instruction fetch
2. decode and register fetch
3. ALU operation
4. data fetch if required
5. register write-back setup time

$$t_C > t_{IFetch} + t_{RFetch} + t_{ALU} + t_{DMem} + t_{RWB}$$

At the rising edge of the following clock, the PC, the register file and the memory are updated


6.823 L5- 25
Arvind

Hardwired Control is pure Combinational Logic

[Figure: a block of combinational logic. Inputs: op code, zero?. Outputs: ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, PCSrc.]


6.823 L5- 26
Arvind

ALU Control & Immediate Extension

[Figure: ALU control and immediate-extension decoding. Inputs: Inst<5:0> (Func) and Inst<31:26> (Opcode). A Decode Map on the opcode produces ALUop and ExtSel (sExt16, uExt16, High16); OpSel is chosen among (Func, Op, +, 0?).]
6.823 L5- 27
Arvind

Hardwired Control Table

Opcode        ExtSel   BSrc   OpSel   MemW   RegW   WBSrc   RegDst   PCSrc
ALU           *        Reg    Func    no     yes    ALU     rd       pc+4
ALUi          sExt16   Imm    Op      no     yes    ALU     rt       pc+4
ALUiu         uExt16   Imm    Op      no     yes    ALU     rt       pc+4
LW            sExt16   Imm    +       no     yes    Mem     rt       pc+4
SW            sExt16   Imm    +       yes    no     *       *        pc+4
BEQZ (z=0)    sExt16   *      0?      no     no     *       *        br
BEQZ (z=1)    sExt16   *      0?      no     no     *       *        pc+4
J             *        *      *       no     no     *       *        jabs
JAL           *        *      *       no     yes    PC      R31      jabs
JR            *        *      *       no     no     *       *        rind
JALR          *        *      *       no     yes    PC      R31      rind

BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rt / rd / R31
PCSrc = pc+4 / br / rind / jabs
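Because hardwired control is purely combinational, the table above can be modeled as a stateless lookup. The sketch below copies the rows as they appear in the table, including the z=0 and z=1 suffixes on BEQZ; the dictionary layout and the name `control_signals` are illustrative assumptions.

```python
# Column order follows the table; "*" marks a don't-care entry.
FIELDS = ("ExtSel", "BSrc", "OpSel", "MemW", "RegW", "WBSrc", "RegDst", "PCSrc")

CONTROL = {
    "ALU":      ("*",      "Reg", "Func", False, True,  "ALU", "rd",  "pc+4"),
    "ALUi":     ("sExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "ALUiu":    ("uExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "LW":       ("sExt16", "Imm", "+",    False, True,  "Mem", "rt",  "pc+4"),
    "SW":       ("sExt16", "Imm", "+",    True,  False, "*",   "*",   "pc+4"),
    "BEQZ,z=0": ("sExt16", "*",   "0?",   False, False, "*",   "*",   "br"),
    "BEQZ,z=1": ("sExt16", "*",   "0?",   False, False, "*",   "*",   "pc+4"),
    "J":        ("*",      "*",   "*",    False, False, "*",   "*",   "jabs"),
    "JAL":      ("*",      "*",   "*",    False, True,  "PC",  "R31", "jabs"),
    "JR":       ("*",      "*",   "*",    False, False, "*",   "*",   "rind"),
    "JALR":     ("*",      "*",   "*",    False, True,  "PC",  "R31", "rind"),
}


def control_signals(row):
    """Pure lookup: the outputs depend only on the inputs, with no state."""
    return dict(zip(FIELDS, CONTROL[row]))


print(control_signals("LW"))
```

Don't-care entries are what let a logic synthesizer simplify the real circuit, since any value may be driven on those outputs.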
6.823 L5- 28
Arvind

Pipelined MIPS
To pipeline MIPS:

First build MIPS without pipelining with CPI=1
Next, add pipeline registers to reduce cycle time while maintaining CPI=1


6.823 L5- 29
Arvind

Pipelined Datapath

[Figure: pipelined datapath. The datapath (PC, 0x4 adder, instruction memory, IR, GPRs, Imm Ext, ALU, data memory) is divided into five phases: fetch, decode & Reg-fetch, execute, memory, write-back.]

Clock period can be reduced by dividing the execution of an instruction into multiple cycles

$$t_C > \max\{t_{IM},\ t_{RF},\ t_{ALU},\ t_{DM},\ t_{RW}\} \quad (= t_{DM} \text{ probably})$$

However, CPI will increase unless instructions are pipelined
6.823 L5- 30
Arvind

An Ideal Pipeline

[Figure: four pipeline stages in sequence: stage 1, stage 2, stage 3, stage 4.]

All objects go through the same stages
No sharing of resources between any two stages
Propagation delay through all pipeline stages is equal
The scheduling of an object entering the pipeline is not affected by the objects in other stages

These conditions generally hold for industrial assembly lines.
But can an instruction pipeline satisfy the last condition?
6.823 L5- 31
Arvind

How to divide the datapath into stages

Suppose memory is significantly slower than other stages. In particular, suppose

tIM = 10 units
tDM = 10 units
tALU = 5 units
tRF = 1 unit
tRW = 1 unit

Since the slowest stage determines the clock, it may be possible to combine some stages without any loss of performance


6.823 L5- 32
Arvind

Alternative Pipelining
[Figure: the five-phase pipelined datapath (fetch, decode & Reg-fetch, execute, memory, write-back) with phases merged.]

Write-back stage takes much less time than other stages. Suppose we combined it with the memory phase

$$t_C > \max\{t_{IM},\ t_{RF} + t_{ALU},\ t_{DM} + t_{RW}\} = t_{DM} + t_{RW}$$

This increases the critical path by 10% (from 10 to 11 units with the delays assumed on the previous slide)
6.823 L5- 33
Arvind

Maximum Speedup by Pipelining


Assumptions                                    Unpipelined tC   Pipelined tC   Speedup
1. tIM = tDM = 10, tALU = 5, tRF = tRW = 1
   4-stage pipeline                            27               10             2.7
2. tIM = tDM = tALU = tRF = tRW = 5
   4-stage pipeline                            25               10             2.5
3. tIM = tDM = tALU = tRF = tRW = 5
   5-stage pipeline                            25               5              5.0

It is possible to achieve higher speedup with more stages in the pipeline.
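The three rows of the table can be reproduced with a short calculation: the unpipelined clock must cover the sum of all step delays, while the pipelined clock only covers the slowest stage. The 4-stage grouping below (register fetch merged with the ALU step) is an assumption that happens to reproduce the table's numbers; the source does not state which steps share a stage, and the function names are illustrative.

```python
def unpipelined_tc(delays):
    # One long cycle must cover every step in sequence.
    return sum(delays.values())


def pipelined_tc(stages, delays):
    # The slowest stage sets the clock; a stage may merge several steps.
    return max(sum(delays[step] for step in stage) for stage in stages)


case1 = {"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}
case2 = {"IM": 5, "RF": 5, "ALU": 5, "DM": 5, "RW": 5}
five_stage = [("IM",), ("RF",), ("ALU",), ("DM",), ("RW",)]
# Assumed grouping: register fetch and ALU share one stage.
four_stage = [("IM",), ("RF", "ALU"), ("DM",), ("RW",)]

results = []
for delays, stages in [(case1, four_stage), (case2, four_stage), (case2, five_stage)]:
    t_unpiped = unpipelined_tc(delays)
    t_piped = pipelined_tc(stages, delays)
    results.append((t_unpiped, t_piped, t_unpiped / t_piped))
    print(t_unpiped, t_piped, t_unpiped / t_piped)
```

With equal step delays, the speedup equals the number of stages only when no stage merges two steps, which is why the 5-stage row reaches 5.0 while the 4-stage row stops at 2.5.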


34

Thank you !
