Single-Cycle Processors: Datapath & Control
Arvind
Computer Science & Artificial Intelligence Lab, M.I.T.
6.823 L5
Instruction Set Architecture (ISA) versus Implementation
ISA is the hardware/software interface: it defines the set of programmer-visible state, and it defines the instruction format (bit encoding) and instruction semantics.
Processor Performance
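[Figure: a controller drives the datapath through control points and receives status lines back from it.]
[Figure: combinational hardware elements: decoder (outputs O0 ... On-1), demux, n-input mux with lg(n) select lines, and an ALU with operands A and B, a Result output, and a Comp? status output.]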
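The combinational elements above can be modeled as pure functions of their inputs. This is a minimal sketch assuming unsigned integer signals; the function names, the ALU operation names, and the reading of Comp? as a zero test are all assumptions for illustration, not from the lecture.

```python
MASK32 = 0xFFFFFFFF

def mux(inputs, sel):
    """n-input multiplexer: the lg(n)-bit select picks one input."""
    return inputs[sel]

def decoder(a, n):
    """lg(n)-bit input a -> n one-hot outputs O0..On-1."""
    return [1 if i == a else 0 for i in range(n)]

def demux(x, sel, n):
    """Route input x to output sel; all other outputs are 0."""
    return [x if i == sel else 0 for i in range(n)]

def alu(a, b, op):
    """32-bit ALU returning (result, comp); comp is assumed to be result == 0."""
    ops = {
        "add": a + b,
        "sub": a - b,
        "and": a & b,
        "or": a | b,
    }
    result = ops[op] & MASK32   # wrap to 32 bits (two's complement)
    return result, result == 0
```

Because none of these functions keeps state, their outputs depend only on the current inputs, which is what distinguishes them from the clocked register file and memory.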
Register Files
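[Figure: a 2R+1W register file with two read ports (ReadSel1/rs1 selects ReadData1/rd1, ReadSel2/rs2 selects ReadData2/rd2) and one write port (WriteSel/ws, WriteData/wd, WriteEnable/we), clocked by Clock. The implementation shown holds register 0 through register 31: the selects rs1, rs2, and ws are 5 bits wide and the data ports rd1, rd2, and wd are 32 bits wide.]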
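A minimal behavioral sketch of the 2R+1W register file, assuming reads are combinational and the single write happens only at the clock edge when we is set. The class and method names are hypothetical.

```python
class RegisterFile:
    """2R+1W register file: 32 registers of 32 bits, 5-bit selects."""

    def __init__(self, nregs=32, width=32):
        self.regs = [0] * nregs
        self.mask = (1 << width) - 1

    def read(self, rs1, rs2):
        """Two combinational read ports: rd1 = R[rs1], rd2 = R[rs2]."""
        return self.regs[rs1], self.regs[rs2]

    def clock(self, we, ws, wd):
        """Single write port: R[ws] <- wd at the clock edge, only if we is set."""
        if we:
            self.regs[ws] = wd & self.mask
```

Because the write only happens in `clock`, a read issued in the same cycle as a write to the same register still returns the old value.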
[Figure: "MAGIC" RAM with inputs Address, WriteData, WriteEnable, and Clock, and output ReadData.]
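One plausible reading of the idealized ("magic") RAM is that ReadData is available in the same cycle the Address is presented, while WriteData is stored at the clock edge when WriteEnable is asserted. The sketch below assumes exactly that behavior, plus the arbitrary choice that never-written addresses read as 0; the class name is hypothetical.

```python
class MagicRAM:
    """Idealized memory: combinational read, clocked write (assumed behavior)."""

    def __init__(self):
        self.words = {}  # address -> 32-bit word; unwritten addresses read as 0

    def read(self, address):
        # ReadData is valid in the same cycle as Address (no wait states).
        return self.words.get(address, 0)

    def clock(self, write_enable, address, write_data):
        # WriteData is stored only at the clock edge, and only if WriteEnable is set.
        if write_enable:
            self.words[address] = write_data & 0xFFFFFFFF
```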
Implementing MIPS
Processor State
32 32-bit GPRs; R0 always contains 0
Data types
Instruction Execution
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back
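The processor state and the five steps above can be sketched as one function that completes a whole instruction per call, as in a single-cycle machine. To stay independent of the bit encoding, the sketch assumes instructions are stored already decoded as hypothetical tuples ("add", rd, rs, rt), ("lw", rt, rs, disp), and ("sw", rt, rs, disp); these names and the tiny instruction subset are illustrative only.

```python
MASK32 = 0xFFFFFFFF

class State:
    """Programmer-visible state (PC, 32 32-bit GPRs) plus the two memories."""

    def __init__(self, program):
        self.pc = 0
        self.gpr = [0] * 32   # R0 is never written, so it always reads 0
        self.imem = program   # byte address -> pre-decoded instruction tuple
        self.dmem = {}        # byte address -> 32-bit word


def step(st):
    """Run one instruction through the five steps (single cycle)."""
    # 1. instruction fetch
    op, t, rs, x = st.imem[st.pc]        # t: destination register, or store source
    # 2. decode and register fetch
    a = st.gpr[rs]
    b = st.gpr[x] if op == "add" else x  # x is rt for "add", a displacement otherwise
    # 3. ALU operation: register add, or base + displacement for lw/sw
    alu_out = (a + b) & MASK32
    # 4. memory operation (optional)
    result = alu_out
    if op == "lw":
        result = st.dmem.get(alu_out, 0)
    elif op == "sw":
        st.dmem[alu_out] = st.gpr[t]
    # 5. write back (nothing for stores; R0 is never written)
    if op != "sw" and t != 0:
        st.gpr[t] = result
    st.pc = (st.pc + 4) & MASK32
```

For example, a program that loads a word, adds it to itself, and stores the sum would run as four calls to `step`, one per instruction.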
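[Figure: datapath for register-register ALU instructions. The PC addresses the instruction memory, and an adder computes PC + 0x4. inst<25:21> and inst<20:16> drive the GPR read selects rs1 and rs2, and inst<15:11> drives the write select ws. rd1 and rd2 feed the ALU (with status output z), whose result returns to the GPR write data wd. ALU Control derives the ALU OpCode from inst<5:0>; RegWrite enables the GPR write.]
RegWrite timing?
Format (field widths in bits: 6, 5, 5, 5, 5, 6; bit positions 31-26, 25-21, 20-16, 15-11, 10-6, 5-0):
0 | rs | rt | rd | 0 | func        rd ← (rs) func (rt)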
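A minimal sketch of how the fields of the register-register format are packed and extracted with shifts and masks. The helper names are hypothetical, and calling bits 10-6 "shamt" follows MIPS usage; in this format that field is 0.

```python
def encode_rtype(rs, rt, rd, func):
    """Pack 0 | rs | rt | rd | 0 | func into a 32-bit word."""
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (0 << 6) | func

def decode_rtype(inst):
    """Split a 32-bit register-register word into its fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,  # inst<31:26>, 0 for this format
        "rs": (inst >> 21) & 0x1F,      # inst<25:21>, GPR read select rs1
        "rt": (inst >> 16) & 0x1F,      # inst<20:16>, GPR read select rs2
        "rd": (inst >> 11) & 0x1F,      # inst<15:11>, GPR write select ws
        "shamt": (inst >> 6) & 0x1F,    # inst<10:6>, 0 in this format
        "func": inst & 0x3F,            # inst<5:0>, to ALU Control
    }
```

As a check, `encode_rtype(rs=1, rt=2, rd=3, func=0x21)` (0x21 is the MIPS ADDU function code, used here only as an example) gives 0x00221821, and decoding that word returns the same fields.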
September 26, 2005
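[Figure: datapath for register-immediate ALU instructions. inst<25:21> still selects rs1, but inst<20:16> now drives the write select ws, and inst<15:0> passes through the immediate extender (Imm Ext, controlled by ExtSel) to the ALU's second input. ALU Control derives the ALU OpCode from inst<31:26>.]
Format (field widths in bits: 6, 5, 5, 16; bit positions 31-26, 25-21, 20-16, 15-0):
opcode | rs | rt | immediate        rt ← (rs) op immediate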
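A minimal sketch of the register-immediate path, assuming the immediate is sign-extended (one of the choices ExtSel offers) and fixing "op" to addition for illustration. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a 32-bit unsigned value."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def decode_itype(inst):
    """Split opcode | rs | rt | immediate."""
    return ((inst >> 26) & 0x3F,   # inst<31:26>, to ALU Control
            (inst >> 21) & 0x1F,   # inst<25:21>, source register rs
            (inst >> 16) & 0x1F,   # inst<20:16>, destination register rt (ws)
            inst & 0xFFFF)         # inst<15:0>, to the immediate extender

def exec_itype_add(gpr, inst):
    """rt <- (rs) + sign-extended immediate ('op' fixed to add here)."""
    _, rs, rt, imm = decode_itype(inst)
    if rt != 0:                    # R0 always contains 0
        gpr[rt] = (gpr[rs] + sign_extend16(imm)) & MASK32
```

With a 16-bit immediate of 0xFFFF, sign extension yields 0xFFFFFFFF, so the operation effectively subtracts 1 from (rs).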
[Figure: the register-register and register-immediate datapaths combined by introducing muxes: one chooses the write select ws from inst<15:11> or inst<20:16>, and one chooses the ALU's second operand from rd2 or the extended immediate. ALU Control takes inst<31:26> and inst<5:0>.]
Formats:
0 | rs | rt | rd | 0 | func        rd ← (rs) func (rt)
opcode | rs | rt | immediate        rt ← (rs) op immediate
[Figure: the same combined datapath with the instruction fields labeled: <25:21>, <20:16>, <15:11>, and <15:0> are routed to the GPRs and the immediate extender, and <31:26>, <5:0> are routed to ALU Control.]
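A minimal sketch of one ALU instruction passing through the combined datapath, with the two muxes made explicit. The select names RegDst and BSrc are the control-signal names used in the control section of these notes; their string values, the function name, and passing the ALU operation as a Python function are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def alu_instruction(gpr, inst, reg_dst, b_src, alu_op):
    """Execute one ALU instruction on the combined datapath.

    reg_dst: RegDst select, "rd" -> ws = inst<15:11>, "rt" -> ws = inst<20:16>
    b_src:   BSrc select, "reg" -> rd2, "imm" -> sign-extended inst<15:0>
    alu_op:  two-argument function standing in for ALU Control's choice
    """
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    rd = (inst >> 11) & 0x1F
    imm = inst & 0xFFFF
    if imm & 0x8000:                       # sign extension assumed
        imm |= 0xFFFF0000
    rd1, rd2 = gpr[rs], gpr[rt]            # both read ports are always read
    b = rd2 if b_src == "reg" else imm     # BSrc mux
    ws = rd if reg_dst == "rd" else rt     # RegDst mux
    result = alu_op(rd1, b) & MASK32
    if ws != 0:                            # R0 always contains 0
        gpr[ws] = result
```

The same hardware serves both formats; only the two mux selects (and the ALU operation) change from one instruction class to the other.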
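[Figure: datapath for loads and stores. A data memory is added: the ALU computes its address from the base register and the extended displacement (disp); rd2 supplies the write data (wdata) when MemWrite is asserted; and the WBSrc mux selects either the ALU output or the memory's rdata as the GPR write data. RegWrite enables the GPR write.]
Format (field widths in bits: 6, 5, 5, 16; bit positions 31-26, 25-21, 20-16, 15-0):
opcode | rs | rt | displacement        addressing mode: (rs) + displacement
rs is the base register.
rt is the destination of a load or the source for a store.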
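A minimal sketch of the register-plus-displacement addressing mode, assuming (as in MIPS) that the 16-bit displacement is sign-extended. The function name and the dictionary standing in for data memory are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sext16(x):
    """Interpret a 16-bit field as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def load_store(gpr, dmem, is_store, rs, rt, disp):
    """Effective address = (rs) + sign-extended displacement."""
    addr = (gpr[rs] + sext16(disp)) & MASK32   # ALU computes the address
    if is_store:
        dmem[addr] = gpr[rt]                   # MemWrite: rt is the source
    elif rt != 0:
        gpr[rt] = dmem.get(addr, 0)            # WBSrc = memory: rt is the destination
    return addr
```

With base register contents 0x1000 and displacement 0xFFFC (that is, -4), both the store and the load access address 0x0FFC.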
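JR, JALR: field widths 6, 5, 5, 16 bits, with the opcode in bits 31-26 and rs in bits 25-21; the jump target is the contents of rs.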
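[Figure: datapath extended for control transfers: one adder computes pc+4 from the PC and 0x4, a second adder computes PC-relative targets, and the ALU's zero output z is available to control, alongside the GPRs, immediate extender, and data memory.]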
[Figure: the same datapath with the constant 31 as an extra input to the GPR write select, so that a jump-and-link instruction can write pc+4 into register 31.]
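A minimal sketch of the next-PC selection implied by these datapaths. It follows MIPS conventions as an assumption: a PC-relative target adds the sign-extended offset times 4 to pc+4, an absolute jump keeps the top four bits of pc+4, and a register-indirect jump (JR, JALR) takes its target from rs. The PCSrc select names and the helper `jalr` are hypothetical; writing the link address into R31 follows the constant 31 on the GPR write select.

```python
MASK32 = 0xFFFFFFFF

def next_pc(pc, pc_src, rs_value=0, offset16=0, target26=0):
    """PCSrc mux over the possible next-PC values."""
    pc4 = (pc + 4) & MASK32                        # first adder: pc + 0x4
    if pc_src == "pc+4":
        return pc4
    if pc_src == "branch":                         # second adder: PC-relative target
        off = offset16 - 0x10000 if offset16 & 0x8000 else offset16
        return (pc4 + off * 4) & MASK32
    if pc_src == "jump":                           # absolute: (pc+4)<31:28> | target << 2
        return (pc4 & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)
    if pc_src == "register":                       # JR, JALR: target is (rs)
        return rs_value & MASK32
    raise ValueError(f"unknown PCSrc {pc_src!r}")

def jalr(gpr, pc, rs):
    """Jump-and-link through a register: R31 <- pc+4, pc <- (rs)."""
    target = next_pc(pc, "register", rs_value=gpr[rs])  # read rs before the link write
    gpr[31] = (pc + 4) & MASK32
    return target
```

Reading rs before writing R31 matters when rs is itself R31: the jump must use the old contents.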
Harvard architecture
We will assume that the clock period is sufficiently long for all of the following to complete:
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. data fetch if required
5. register write-back setup time
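Under this assumption the five delays occur back to back within one cycle, so the clock period must be at least their sum, $t_C > t_{IM} + t_{RF} + t_{ALU} + t_{DM} + t_{RW}$ (worst case: an instruction that uses data memory). A small arithmetic sketch; the numeric delays are assumed, borrowed from the example values in the pipelining discussion ($t_{IM} = t_{DM} = 10$, $t_{ALU} = 5$, $t_{RF} = t_{RW} = 1$), and the function name is hypothetical.

```python
# Example delays in arbitrary time units (assumed values).
DELAYS = {"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}

def single_cycle_period(d):
    """Lower bound on the single-cycle clock period: all five steps must
    fit in one cycle, so their delays add up."""
    return d["IM"] + d["RF"] + d["ALU"] + d["DM"] + d["RW"]

print(single_cycle_period(DELAYS))  # 27
```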
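[Figure: control as combinational logic. Inputs: the op code and the ALU's zero? status. Outputs: ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, PCSrc.]
[Figure: ALU control and immediate extension. A Decode Map takes Inst<31:26> (Opcode) and Inst<5:0> (Func) to produce ALUop; OpSel chooses among (Func, Op, +, 0?). ExtSel chooses the immediate extension among (sExt16, uExt16, High16).]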
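Since the control is purely combinational, it can be sketched as a lookup table plus one condition on zero?. The signal names below come from the figures; the instruction classes (including a branch-if-zero class, "beqz"), the signal values, and the function names are hypothetical, and None marks a don't-care.

```python
FIELDS = ("ExtSel", "BSrc", "OpSel", "MemWrite", "RegWrite", "WBSrc", "RegDst")

TABLE = {
    "alu":  (None,     "Reg", "Func", 0, 1, "ALU", "rd"),  # rd <- (rs) func (rt)
    "alui": ("sExt16", "Imm", "Op",   0, 1, "ALU", "rt"),  # rt <- (rs) op imm
    "lw":   ("sExt16", "Imm", "+",    0, 1, "Mem", "rt"),  # rt <- M[(rs) + disp]
    "sw":   ("sExt16", "Imm", "+",    1, 0, None,  None),  # M[(rs) + disp] <- (rt)
    "beqz": ("sExt16", None,  "0?",   0, 0, None,  None),  # branch if (rs) == 0
}

def control(inst_class, zero):
    """Combinational function of the decoded op code and zero?."""
    signals = dict(zip(FIELDS, TABLE[inst_class]))
    taken = inst_class == "beqz" and zero
    signals["PCSrc"] = "br" if taken else "pc+4"
    return signals

def extend(imm16, ext_sel):
    """Immediate extension selected by ExtSel."""
    imm16 &= 0xFFFF
    if ext_sel == "sExt16":   # sign-extend to 32 bits
        return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16
    if ext_sel == "uExt16":   # zero-extend to 32 bits
        return imm16
    if ext_sel == "High16":   # place the immediate in the upper 16 bits
        return imm16 << 16
    raise ValueError(f"unknown ExtSel {ext_sel!r}")
```

Only PCSrc depends on zero?; every other signal is a function of the op code alone in this sketch.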
Pipelined MIPS
To pipeline MIPS:
Pipelined Datapath
[Figure: the datapath divided into five phases (fetch, decode & register fetch, execute, memory, and write-back), with an instruction register (IR) holding the fetched instruction.]
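The clock period can be reduced by dividing the execution of an instruction into multiple cycles:
$t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM}, t_{RW}\}$ (probably $= t_{DM}$),
where $t_{IM}$, $t_{RF}$, $t_{ALU}$, $t_{DM}$, and $t_{RW}$ are the delays of instruction fetch, register fetch, the ALU operation, data memory access, and register write-back.
However, CPI will increase unless instructions are pipelined.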
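The trade-off can be quantified with the usual performance equation, time per program = instructions × CPI × clock period. A minimal sketch under stated assumptions: the example delays ($t_{IM} = t_{DM} = 10$, $t_{ALU} = 5$, $t_{RF} = t_{RW} = 1$), an arbitrary count of 1000 instructions, and the simplification that every instruction spends one cycle in each of the five phases (CPI = 5) when nothing overlaps.

```python
def execution_time(n_instructions, cpi, t_clock):
    """Time/program = instructions/program x cycles/instruction x time/cycle."""
    return n_instructions * cpi * t_clock

# Assumed example delays and instruction count.
t = {"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}
n = 1000

single_cycle = execution_time(n, cpi=1, t_clock=sum(t.values()))  # one long cycle
multi_cycle = execution_time(n, cpi=5, t_clock=max(t.values()))   # five short cycles, no overlap

print(single_cycle, multi_cycle)  # 27000 50000
```

Under these assumptions the shorter clock alone makes things slower; it pays off only when the phases of different instructions overlap, that is, when instructions are pipelined.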
An Ideal Pipeline
Dividing the datapath into stages
Suppose memory is significantly slower than the other stages. In particular, suppose
$t_{IM} = 10$ units, $t_{DM} = 10$ units, $t_{ALU} = 5$ units, $t_{RF} = 1$ unit, $t_{RW} = 1$ unit.
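With these delays, the clock period of a pipeline is set by its slowest stage, where a stage that groups several phases takes the sum of their delays. A minimal sketch of that rule; the helper `pipeline_clock` and the list-of-phases representation of stages are hypothetical.

```python
# Delays from the text, in units.
T = {"IM": 10, "DM": 10, "ALU": 5, "RF": 1, "RW": 1}

def pipeline_clock(stages, t):
    """Phases grouped into one stage run in series within a cycle, so a stage
    takes the sum of its phase delays; the slowest stage sets the clock."""
    stage_times = [sum(t[p] for p in stage) for stage in stages]
    return max(stage_times), stage_times

five_stage = [["IM"], ["RF"], ["ALU"], ["DM"], ["RW"]]
print(pipeline_clock(five_stage, T))  # (10, [10, 1, 5, 10, 1])
```

With one phase per stage, the two memory stages set a 10-unit clock while the register-fetch and write-back stages are busy for only 1 unit of each cycle.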
Alternative Pipelining
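[Figure: the same datapath and phase labels (fetch, decode & register fetch, execute, memory, write-back), with register fetch grouped together with execute and write-back grouped together with memory, giving three stages.]
With five separate stages, $t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM}, t_{RW}\} = t_{DM}$.
With this grouping, $t_C > \max\{t_{IM}, t_{RF} + t_{ALU}, t_{DM} + t_{RW}\} = t_{DM} + t_{RW}$,
given $t_{ALU} = 5$ and $t_{RF} = t_{RW} = 1$.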
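Substituting the example delays ($t_{IM} = t_{DM} = 10$, $t_{ALU} = 5$, $t_{RF} = t_{RW} = 1$) into both bounds gives a quick numerical check:

$$
\begin{aligned}
\text{five stages:}\quad t_C &> \max\{10,\ 1,\ 5,\ 10,\ 1\} = 10 = t_{DM} \\
\text{three stages:}\quad t_C &> \max\{10,\ 1 + 5,\ 10 + 1\} = 11 = t_{DM} + t_{RW}
\end{aligned}
$$

Under these numbers, the three-stage grouping needs a clock period 10% longer than the five-stage pipeline.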
Thank you!