Computer Architecture-Performance - Datapath
Dina Tantawy
Computer Engineering Department
Cairo University
Agenda
• Recap
• Performance Fallacies and pitfalls
• Real stuff: Benchmarking the Intel core i7
• What is pipelining?
• Characteristics of pipelining
Component Analysis
Component            IC    CPI   Tc
Compiler             X     X
ISA                  X     X     X
Core organization          X     X
Technology                       X
Is this true?
Computer Engineering, Cairo University
MIPS Example
Op       Freq   CPI_i   Freq x CPI_i   Freq x CPI_i in three alternative designs
Store    10%    3       .3             .3    .3    .3
Branch   20%    2       .4             .4    .2    .4
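As a rough sketch, each row's contribution to the average CPI is its frequency times its cycle count. The snippet below checks this for the two rows listed above (Store and Branch). The function name `cpi_contribution` is an illustrative choice and does not come from the slides.

```python
def cpi_contribution(freq, cycles):
    """Contribution of one instruction class to the average CPI."""
    return freq * cycles

# Rows taken from the table above: (frequency, cycles per instruction).
rows = {"Store": (0.10, 3), "Branch": (0.20, 2)}

contributions = {op: cpi_contribution(f, c) for op, (f, c) in rows.items()}
for op, value in contributions.items():
    print(f"{op}: {value:.1f}")
```

Summing the contributions of every instruction class in a mix gives the average CPI for that mix.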
• Performance depends on
• Algorithm: affects IC, possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc
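The three factors above combine in the standard relation CPU time = IC x CPI x Tc. A minimal sketch, with hypothetical numbers chosen only for illustration (they are not measurements from the slides):

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time in seconds = IC x CPI x Tc."""
    return instruction_count * cpi * clock_cycle_time

# Hypothetical baseline: 1 million instructions, CPI of 2.0, 1 ns clock cycle.
baseline = cpu_time(1_000_000, 2.0, 1e-9)
# Hypothetical change: a compiler that halves IC, with CPI and Tc unchanged.
improved = cpu_time(500_000, 2.0, 1e-9)

speedup = baseline / improved
print(speedup)
```

In practice a change to one factor often moves another (for example, fewer but more complex instructions can raise CPI), which is why the bullets list several factors per component.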
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
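The formula above is a geometric mean of n execution time ratios. A minimal sketch of the computation follows; the ratio values are hypothetical and are not CINT2006 results.

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of n execution time ratios."""
    n = len(ratios)
    return math.prod(ratios) ** (1.0 / n)

# Hypothetical ratios, for illustration only.
example_ratios = [2.0, 8.0]
print(geometric_mean(example_ratios))  # 4.0
```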
CINT2006 for Intel Core i7 920
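[Figure: sequential laundry timeline. Loads A, B, C, D in task order; each load takes 30, 40, and 20 minutes for its three stages, and the loads run one after another.]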
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
What Is Pipelining? Start work ASAP
[Figure: pipelined laundry timeline from 6 PM to midnight. Loads A, B, C, D in task order; intervals along the time axis are 30, 40, 40, 40, 40, and 20 minutes.]
• Pipelined laundry takes 3.5 hours for 4 loads
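The 6 hour and 3.5 hour figures can be reproduced with a small simulation. This is a sketch under two assumptions that are not stated on the slides: each stage handles one load at a time, and a finished load can wait between stages without blocking the stage it just left. The stage durations (30, 40, 20 minutes) are read from the laundry figures.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stage_minutes)

def pipelined_minutes(stage_minutes, loads):
    """Each stage starts a load as soon as the load and the stage are both free."""
    # finish[s] is the time at which stage s finished its most recent load.
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(ready, finish[s])
            finish[s] = start + duration
            ready = finish[s]
    return finish[-1]

stages = [30, 40, 20]
print(sequential_minutes(stages, 4) / 60)  # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)   # 3.5 hours
```

The pipelined total is paced by the slowest stage (40 minutes), which is why the timeline shows four 40-minute intervals between the first 30 and the last 20.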
• Usually instructions are few in number and are typically one size.
MIPS ISA
[Figure: single-cycle MIPS datapath with control. Components: PC, instruction memory, register file, 16-to-32-bit sign extend, ALU (zero and ovf outputs) with ALU control, data memory, and two adders with a shift-left-2 unit for the next PC. Instruction fields: Instr[31-26] to the control unit, Instr[25-21], Instr[20-16], Instr[15-11], Instr[15-0], Instr[5-0]. Control signals: PCSrc, ALUOp, Branch, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite, RegDst.]
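The control unit in the datapath above maps Instr[31-26] to the named control signals. The slides only name the signals; the values in this sketch follow the usual textbook single-cycle MIPS design and should be read as an assumption, as should the `CONTROL` table and `control_signals` helper, which are illustrative names.

```python
# Main control signals per instruction class, keyed by opcode (Instr[31-26]).
# Values are assumed from the standard textbook design; None marks a don't-care.
CONTROL = {
    0x00: dict(name="R-type", RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
               MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0x23: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0x2B: dict(name="sw", RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0x04: dict(name="beq", RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
               MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def control_signals(instruction):
    """Look up the control signals for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F
    return CONTROL[opcode]

# 0x8C010064 encodes lw $1, 100($0).
print(control_signals(0x8C010064)["name"])
```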
R-type Instruction Data/Control Flow
[Figure: single-cycle MIPS datapath with the data and control flow of an R-type instruction highlighted.]
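The bit ranges labeled on the datapath (Instr[31-26], Instr[25-21], and so on) can be extracted with shifts and masks. A minimal sketch; the field names `rs`, `rt`, `rd`, `imm`, and `funct` are the conventional MIPS names and do not appear on the slides.

```python
def decode_fields(instruction):
    """Split a 32-bit MIPS instruction into the fields labeled on the datapath."""
    return {
        "opcode": (instruction >> 26) & 0x3F,  # Instr[31-26]
        "rs": (instruction >> 21) & 0x1F,      # Instr[25-21]
        "rt": (instruction >> 16) & 0x1F,      # Instr[20-16]
        "rd": (instruction >> 11) & 0x1F,      # Instr[15-11]
        "imm": instruction & 0xFFFF,           # Instr[15-0]
        "funct": instruction & 0x3F,           # Instr[5-0]
    }

# 0x012A4020 encodes add $t0, $t1, $t2 (registers 8, 9, 10).
fields = decode_fields(0x012A4020)
print(fields)
```

For an R-type instruction the destination register comes from Instr[15-11]; the immediate field overlaps it and is only meaningful for loads, stores, and branches.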
Load Word Instruction Data/Control Flow
[Figure: single-cycle MIPS datapath with the data and control flow of a load word instruction highlighted.]
Store Word Instruction?
Branch Instruction Data/Control Flow
[Figure: single-cycle MIPS datapath with the data and control flow of a branch instruction highlighted.]
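The branch path in the datapath (add 4, sign extend 16 to 32, shift left 2, add) can be sketched as below. Deriving PCSrc as Branch AND zero is the usual textbook choice and is an assumption here; the function names are illustrative.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, imm16):
    """Branch target = (PC + 4) + (sign-extended Instr[15-0] shifted left by 2)."""
    return (pc + 4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF

def next_pc(pc, imm16, branch, zero):
    """Select the next PC. Assumes PCSrc = Branch AND zero."""
    pc_src = bool(branch) and bool(zero)
    return branch_target(pc, imm16) if pc_src else (pc + 4) & 0xFFFFFFFF

print(hex(next_pc(0x1000, 3, branch=1, zero=1)))  # 0x1010
print(hex(next_pc(0x1000, 3, branch=1, zero=0)))  # 0x1004
```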
RISC Instruction Set Implementation
• We first need to look at how instructions in the MIPS instruction
set are implemented without pipelining. We’ll assume that any
instruction in this MIPS subset can be executed in at most 5
clock cycles.
• The five clock cycles are broken up into the following steps:
• Instruction Fetch Cycle
• Instruction Decode/Register Fetch Cycle
• Execution Cycle
• Memory Access Cycle
• Write-Back Cycle
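The five steps above can be laid out as a cycle-by-cycle schedule for the non-pipelined case. This sketch assumes every instruction uses all five cycles, which is the upper bound stated above; `unpipelined_schedule` is an illustrative name.

```python
STAGES = [
    "Instruction Fetch",
    "Instruction Decode/Register Fetch",
    "Execution",
    "Memory Access",
    "Write-Back",
]

def unpipelined_schedule(num_instructions):
    """Return (cycle, instruction index, stage) tuples with no overlap.

    Assumes each instruction takes the full five cycles.
    """
    schedule = []
    cycle = 1
    for index in range(num_instructions):
        for stage in STAGES:
            schedule.append((cycle, index, stage))
            cycle += 1
    return schedule

for entry in unpipelined_schedule(2):
    print(entry)
```

Without pipelining, instruction 1 cannot begin its fetch until cycle 6, after instruction 0 has completed write-back.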
[Figure: instruction decode. The instruction feeds the control unit and the register file (inputs Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2).]
Executing R Format Operations (IE)
[Figure: R-format execution. The instruction selects two registers in the register file; Read Data 1 and Read Data 2 feed the ALU (overflow and zero outputs), and the ALU result returns to Write Data.]
[Figure: datapath elements for loads and stores (register file, ALU with overflow and zero outputs, 16-to-32-bit sign extend, data memory with Address, Write Data, Read Data, and MemRead) and for branches (PC, adder for PC + 4, shift left 2, adder producing the branch target address, ALU control).]
• If a load, the effective address computed in the previous cycle is used to read memory. The actual data transfer to the register does not occur until the next cycle.
• If a store, the data from the register is written to the effective address in memory.
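The two cases above can be sketched as a single memory-access step. Memory is modeled as a Python dict from address to value, which is a simplification chosen for this example; the function name and the "load"/"store" labels are hypothetical.

```python
def memory_access(kind, effective_address, store_value, memory):
    """Memory Access cycle for one instruction.

    Returns the value read for a load (to be written to the register in the
    following cycle), or None for a store and for any other instruction.
    """
    if kind == "load":
        # Unwritten addresses read as 0 in this simplified model.
        return memory.get(effective_address, 0)
    if kind == "store":
        memory[effective_address] = store_value
    return None

memory = {100: 7}
loaded = memory_access("load", 100, None, memory)
memory_access("store", 104, 42, memory)
print(loaded, memory[104])
```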
[Figure: register file and ALU (overflow and zero outputs), with the result path to Write Data.]
[Figure: pipeline diagram. Four instructions in instruction order, each passing through the stages Ifetch, Reg, ALU, DMem, Reg.]
[Figure: program execution order (in instructions) against time (axis ticks 2 to 18) for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through instruction fetch, Reg, ALU, data access, Reg. Non-pipelined: a new instruction starts every 8 ns. Pipelined: a new instruction starts every 2 ns, with 2 ns per stage.]
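The timing in the figure can be checked with a short calculation. This is a sketch assuming an ideal pipeline with no stalls; the 8 ns instruction time, 2 ns stage time, and five stages are read from the figure, and the function names are illustrative.

```python
def unpipelined_ns(num_instructions, instruction_ns=8):
    """Total time when each instruction finishes before the next one starts."""
    return num_instructions * instruction_ns

def pipelined_ns(num_instructions, stage_ns=2, num_stages=5):
    """Total time for an ideal pipeline: fill it once, then one result per stage time."""
    return (num_stages + num_instructions - 1) * stage_ns

print(unpipelined_ns(3))  # 24
print(pipelined_ns(3))    # 14

# For a long instruction stream the ratio approaches 8 ns / 2 ns = 4.
print(unpipelined_ns(1_000_000) / pipelined_ns(1_000_000))
```

A single instruction still takes 10 ns in the pipelined case (five stages of 2 ns), so pipelining improves throughput rather than the latency of one instruction.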
CPU pipelining: Example
[Figure annotation: Wrong register number]