Week6 Performance Numericals

Uploaded by

Markhor Gaming

Week 6 & 7

Numerical Problems +
Midterm Review for CSA

Adapted mostly from: Prof. Onur Mutlu

ETH Zurich
Evaluating the Single-Cycle
Microarchitecture

A Single-Cycle Microarchitecture
◼ Is this a good idea/design?

◼ When is this a good design?

◼ When is this a bad design?

◼ How can we design a better microarchitecture?

Performance Analysis Basics
Processor Performance
◼ How fast is my program?
❑ Every program consists of a series of instructions
❑ Each instruction needs to be executed.
◼ So how fast are my instructions?
❑ Instructions are realized on the hardware
❑ They can take one or more clock cycles to complete
❑ Cycles per Instruction = CPI
◼ How much time is one clock cycle?
❑ The critical path determines how much time one cycle
requires; this is the clock period.
❑ 1/clock period = clock frequency = how many cycles can be
done each second.
Processor Performance
◼ Now as a general formula
❑ Our program consists of executing N instructions.
❑ Our processor needs CPI cycles for each instruction.
❑ The maximum clock speed of the processor is f,
and the clock period is therefore T=1/f
◼ Our program executes in
N x CPI x (1/f) =
N x CPI x T seconds
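As a minimal sketch of the formula above, the function below computes N x CPI x T with T = 1/f. The function name and the sample numbers (one billion instructions, a CPI of 1, a 1 GHz clock) are illustrative assumptions, not values from the slides.

```python
def execution_time(n_instructions: int, cpi: float, clock_hz: float) -> float:
    """Return program run time in seconds: N x CPI x T, with T = 1/f."""
    clock_period = 1.0 / clock_hz  # T = 1/f, in seconds
    return n_instructions * cpi * clock_period


# Hypothetical numbers, chosen only to illustrate the formula.
t = execution_time(n_instructions=10**9, cpi=1.0, clock_hz=1e9)
print(f"{t:.3f} s")
```

Halving the CPI or doubling the clock frequency each halve the result, which is why both appear as design levers in the slides that follow.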
Performance Analysis Basics
◼ Execution time of an instruction
❑ {CPI} x {clock cycle time}
◼ CPI: Number of cycles it takes to execute an instruction

◼ Execution time (aka runtime) of a program

❑ Sum over all instructions [{CPI} x {clock cycle time}]
❑ {# of instructions} x {Average CPI} x {clock cycle time}
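The two forms of the runtime expression above give the same number. The sketch below checks this on a made-up instruction trace; the cycle counts and the 2 ns clock cycle time are assumptions chosen only for illustration.

```python
# Hypothetical trace: cycles taken by each executed instruction.
cycles_per_instruction = [4, 5, 4, 3, 3, 4]
clock_cycle_time = 2e-9  # seconds; an assumed value

# Form 1: sum over all instructions of (CPI x clock cycle time).
runtime_sum = sum(c * clock_cycle_time for c in cycles_per_instruction)

# Form 2: (# of instructions) x (average CPI) x (clock cycle time).
n = len(cycles_per_instruction)
average_cpi = sum(cycles_per_instruction) / n
runtime_avg = n * average_cpi * clock_cycle_time

print(runtime_sum, runtime_avg, average_cpi)
```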

Performance Analysis of
Our Single-Cycle Design
A Single-Cycle Microarchitecture: Analysis
◼ Every instruction takes 1 cycle to execute
❑ CPI (Cycles per instruction) is strictly 1

◼ How long each instruction takes is determined by how long
the slowest instruction takes to execute
❑ Even though many instructions do not need that long to
execute
◼ Clock cycle time of the microarchitecture is determined by
how long it takes to complete the slowest instruction
❑ Critical path of the design is determined by the processing
time of the slowest instruction

What is the Slowest Instruction to Process?
◼ Let’s go back to the basics

◼ All six phases of the instruction processing cycle take a single
machine clock cycle to complete
❑ Fetch
❑ Decode
❑ Evaluate Address
❑ Fetch Operands
❑ Execute
❑ Store Result
◼ Equivalently, in five steps:
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)

◼ Does each of the above phases take the same time (latency)
for all instructions?
A simplified view/model of SC processor
◼ Assumptions:
❑ Ignore mux delays
❑ Ignore single register delays (for PC)
❑ Ignore delay for control unit

Example Single-Cycle Datapath Analysis
◼ Assume (for the design in the previous slide)
❑ memory units (read or write): 200 ps
❑ ALU and adders: 100 ps
❑ register file (read or write): 50 ps
❑ other combinational logic: 0 ps
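Delay per step in ps (resource used in parentheses); "-" means the step is not used

Instr.    IF (mem)   ID (RF)   EX (ALU)   MEM (mem)   WB (RF)   Delay
R-type    200        50        100        -           50        400
I-type    200        50        100        -           50        400
LW        200        50        100        200         50        600
SW        200        50        100        200         -         550
Branch    200        50        100        -           -         350
Jump      200        -         -          -           -         200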
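The per-instruction delays in this table can be reproduced by summing the unit delays along each instruction's path, and the single-cycle clock period is then the largest sum. The sketch below does that; the dictionary layout and variable names are illustrative assumptions, while the delays come from the slide.

```python
# Unit delays in ps, from the assumptions on this slide.
MEM, ALU, RF = 200, 100, 50

# Resources used by each instruction class, in step order (IF, ID, EX, MEM, WB).
paths = {
    "R-type": [MEM, RF, ALU, RF],
    "I-type": [MEM, RF, ALU, RF],
    "LW":     [MEM, RF, ALU, MEM, RF],
    "SW":     [MEM, RF, ALU, MEM],
    "Branch": [MEM, RF, ALU],
    "Jump":   [MEM],
}

delays = {name: sum(steps) for name, steps in paths.items()}
slowest = max(delays, key=delays.get)
clock_period_ps = delays[slowest]  # the clock must fit the slowest instruction

for name, d in delays.items():
    print(f"{name:7s} {d} ps")
print("critical path:", slowest, clock_period_ps, "ps")
```

LW sets the clock period because it is the only instruction that uses all five steps.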
Let’s Find the Critical Path

[Figure: single-cycle MIPS datapath with the control unit and its control signals (PCSrc1=Jump, PCSrc2=Br Taken, RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite). Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
R-Type and I-Type ALU

[Figure: single-cycle MIPS datapath with the R-type / I-type ALU path marked. Delay labels in the figure: 100 ps, 100 ps, 200 ps, 250 ps, 350 ps, 400 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
LW

[Figure: single-cycle MIPS datapath with the LW path marked. Delay labels in the figure: 100 ps, 100 ps, 200 ps, 250 ps, 350 ps, 550 ps, 600 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
SW

[Figure: single-cycle MIPS datapath with the SW path marked. Delay labels in the figure: 100 ps, 100 ps, 200 ps, 250 ps, 350 ps, 550 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Branch Taken

[Figure: single-cycle MIPS datapath with the taken-branch path marked. Delay labels in the figure: 100 ps, 200 ps, 200 ps, 250 ps, 350 ps, 350 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Jump

[Figure: single-cycle MIPS datapath with the jump path marked. Delay labels in the figure: 100 ps, 200 ps, 200 ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Exec. Time for 100 billion instructions
◼ Example:
For a program with 100 billion instructions executing on a
single-cycle MIPS processor:

Execution Time = # instructions x CPI x Tc

= (100 × 10^9)(1)(600 × 10^-12 s)
= 100 × 600 × 10^-3 s
= 60 s
Single-Cycle vs. Multicycle

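[Figure: two timing diagrams for Instr 1 to Instr 4, each showing the clock, the time needed, and the time allotted. In the single-cycle case every instruction is allotted the same full clock period. In the multicycle case the instructions take 3 cycles, 5 cycles, 3 cycles, and 4 cycles, and the difference is marked as time saved.]

Fig. Single-cycle versus multicycle instruction execution.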

Performance of the Multicycle Processor
Instruction mix and cycles per instruction
R-type    44%   4 cycles
Load      24%   5 cycles
Store     12%   4 cycles
Branch    18%   3 cycles
Jump       2%   3 cycles

Contribution to CPI
R-type    0.44 × 4 = 1.76
Load      0.24 × 5 = 1.20
Store     0.12 × 4 = 0.48
Branch    0.18 × 3 = 0.54
Jump      0.02 × 3 = 0.06
_____________________________
Average CPI ≈ 4.04

[Figure: steps used by each instruction class. ALU-type: one step not used. Load: all steps used. Store: one step not used. Branch (and jr): three steps not used. Jump (except jr & jal): four steps not used.]

Note: ALU is not used in the last two cases here as separate
hardware exists for branch and jump address calculation,
which is not the case for our multicycle MIPS
Multi-Cycle Performance: Average CPI
◼ Instructions take different number of cycles:
❑ 3 cycles: beq, j
❑ 4 cycles: R-Type, sw, addi
❑ 5 cycles: lw Realistic?
◼ CPI is weighted average, e.g. SPECINT2000 benchmark:
❑ 25% loads
❑ 10% stores
❑ 11% branches
❑ 2% jumps
❑ 52% R-type
❑ 0.25X

◼ Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5)
= 4.12
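Both average-CPI figures in this part of the deck (4.04 for the 44/24/12/18/2 mix and 4.12 for the SPECINT2000 mix) are weighted averages of cycles per instruction class. The sketch below recomputes them; the function name and dictionary layout are illustrative assumptions.

```python
def average_cpi(mix):
    """mix maps instruction class -> (fraction of instructions, cycles)."""
    total = sum(frac for frac, _ in mix.values())
    assert abs(total - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(frac * cycles for frac, cycles in mix.values())


first_mix = {
    "r-type": (0.44, 4),
    "load":   (0.24, 5),
    "store":  (0.12, 4),
    "branch": (0.18, 3),
    "jump":   (0.02, 3),
}
specint_mix = {
    "load":   (0.25, 5),
    "store":  (0.10, 4),
    "branch": (0.11, 3),
    "jump":   (0.02, 3),
    "r-type": (0.52, 4),
}

print(round(average_cpi(first_mix), 2))
print(round(average_cpi(specint_mix), 2))
```

The check that the fractions sum to 1 is a cheap guard against a mistyped mix.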
If we can’t ignore “smaller” delays
◼ Find
❑ the critical path delay of the SC processor (or: find a lower bound
on the cycle time for the program counter)
❑ the time taken for 100 billion instructions to execute

LW path when muxes etc can’t be ignored

[Figure: single-cycle MIPS datapath with the LW path marked, with PC register, mux, and control delays included. Time labels in the figure: 5, 20, 24, 25, 40, 60, 80, 82, 97. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
LW path when muxes etc can’t be ignored
◼ Basically, 2 muxes and the CU operate in parallel to the regfile
❑ The CU is in parallel to the regfile: the CU finishes its work at 28 ps, while the regfile finishes at 40 ps
❑ One mux output is stable at 27 ps. Its output is used only after 82 ps, so it is not part of the critical path
❑ Another mux output is stable at 29 ps, while the rs read will complete at 40 ps
❑ In SW, immediate data is sent and not rt data

[Figure: the same LW path as on the previous slide, annotated with the notes above. Time labels in the figure: 5, 20, 24, 25, 40, 60, 80, 82, 97. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Execution Time = # instructions x CPI x Tc

= (100 × 10^9)(1)(97 × 10^-12 s)
= 100 × 97 × 10^-3 s
= 9.7 s
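The 97 ps clock period can be read as a running sum of delays along the LW path. In the sketch below the individual component delays are assumptions, inferred from the differences between successive time labels in the figure (5, 25, 40, 60, 80, 82, 97); the component names are also guesses at what each label marks. The side-path finish times (27, 28, and 29 ps) are the ones quoted on the slide.

```python
# Component delays in ps along the LW path. These are ASSUMPTIONS inferred
# from differences between the time labels in the figure, not given values.
lw_path = [
    ("PC register", 5),
    ("instruction memory", 20),
    ("register file read", 15),
    ("ALU (address)", 20),
    ("data memory read", 20),
    ("write-back mux", 2),
    ("register file write", 15),
]

arrival = 0
for name, delay in lw_path:
    arrival += delay
    print(f"{name:20s} done at {arrival} ps")

clock_period_ps = arrival

# Side paths quoted on the slide finish before the register read (40 ps),
# so they do not lengthen the critical path.
side_paths_ps = {"control unit": 28, "mux stable at 27 ps": 27, "mux stable at 29 ps": 29}
assert all(t <= 40 for t in side_paths_ps.values())

n_instructions = 100 * 10**9
exec_time_s = n_instructions * 1 * clock_period_ps * 1e-12  # CPI = 1
print(f"Tc = {clock_period_ps} ps, execution time = {exec_time_s:.1f} s")
```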

SC vs MC Perf. Which one is faster?
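◼ Consider the hardware times of major units (all others
being negligible) in a datapath as given below.
Determine the minimum clock cycle time, average CPI,
and average instruction execution time for the single-cycle
and multi-cycle datapaths.
❑ Unit times (as used in the solution): memory 25 ps, register file 15 ps, ALU 20 ps
❑ Instruction mix (as used in the solution): lw 10%, sw 10%, R-type 40%, branch 20%, jump 20%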

Solution
Single-cycle clock period (cycle time) is determined by the
lw instruction, which activates the critical path:
Cycle time = memory + reg. file + ALU + memory + reg. file
= 25 + 15 + 20 + 25 + 15 = 100ps
Av. CPI =1
Av. time/instruction
= 100 ✕ 1 = 100ps

Multi-cycle clock period (cycle time) is determined by the
slowest hardware unit (memory in this case):
Cycle time = 25ps
Clock cycles used by instructions are 5 for lw, 4 for sw, 4 for
r-type, 3 for branch and 3 for jump. Therefore,
Av. CPI
= 0.1✕5 + 0.1✕4 + 0.4✕4 + 0.2✕3 + 0.2✕3
= 3.7
Av. time/instruction
= 25 ✕ 3.7 = 92.5ps
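The single-cycle and multi-cycle results can be checked side by side. The sketch below uses the unit times and instruction mix from the solution; the variable names are illustrative assumptions.

```python
# Unit delays in ps, as used in the solution.
MEM, RF, ALU = 25, 15, 20

# Single-cycle: the clock must cover lw (memory, reg. file, ALU, memory, reg. file).
sc_cycle = MEM + RF + ALU + MEM + RF
sc_time = sc_cycle * 1  # CPI = 1

# Multi-cycle: the clock must cover the slowest hardware unit.
mc_cycle = max(MEM, RF, ALU)
mix = {  # instruction class -> (fraction, cycles)
    "lw":     (0.1, 5),
    "sw":     (0.1, 4),
    "r-type": (0.4, 4),
    "branch": (0.2, 3),
    "jump":   (0.2, 3),
}
mc_cpi = sum(frac * cycles for frac, cycles in mix.values())
mc_time = mc_cycle * mc_cpi

print(sc_time, round(mc_cpi, 2), round(mc_time, 1), round(sc_time / mc_time, 3))
```

With this mix the multi-cycle design is faster by a factor of about 1.08. The gain is small because the 25 ps clock is set by the memory, so the 15 ps and 20 ps steps waste part of each cycle.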

Performance ratio
(a) Suppose an operation involving register file, memory or
ALU each takes 1 time unit. Neglecting the time of all other
hardware, how much time will each MIPS instruction
take on a single-cycle datapath? Consider R-type, lw, sw,
beq and j instructions.

(b) What will be the execution times for MIPS instructions on
a 5-cycle multi-cycle datapath using a clock period of 1 time
unit?
(c) A program contains the following mix of instructions: lw
5%, sw 5%, r-type 70%, branch 10%, jump 10%.
What is the ratio of single-cycle CPU time to multicycle CPU
time for running this program on these datapaths?

Solution
(a) Each instruction will take 5 units of time on a single-cycle
datapath.

(b) Times for MIPS instructions to run on a multi-cycle
datapath are:
Load, lw             5 time units
Store, sw            4 time units
R-type, add, etc.    4 time units
Branch, beq, bne     3 time units
Jump, j              3 time units

Solution
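Part (c) follows from the answers to (a) and (b) as a weighted average over the given instruction mix. The sketch below is one way to carry out that arithmetic, assuming 5 time units per instruction on the single-cycle datapath and the per-class times from (b); the variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

sc_time_per_instr = 5  # time units, from part (a)
mc_cycles = {"lw": 5, "sw": 4, "r-type": 4, "branch": 3, "jump": 3}  # part (b)
mix_percent = {"lw": 5, "sw": 5, "r-type": 70, "branch": 10, "jump": 10}

# Average multicycle time per instruction, weighted by the mix.
mc_avg = sum(Fraction(mix_percent[k], 100) * mc_cycles[k] for k in mc_cycles)

# Ratio of single-cycle CPU time to multicycle CPU time.
ratio = Fraction(sc_time_per_instr) / mc_avg
print(float(mc_avg), round(float(ratio), 3))
```

Under these assumptions the multicycle average is 3.85 time units per instruction and the ratio is 100/77, roughly 1.3.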

Performance for a benchmark program
Clock rates for single-cycle and multicycle datapaths are
given as 1GHz and 5GHz, respectively.
The following subroutine is used for estimating performance.
The argument register $a0 contains a large positive
integer and $a1 contains 1.
loop: sub $a0, $a0, $a1
      beq $a0, $0, done
      j loop
done: jr $31
Determine:
(a) Average cycles per instruction (CPI) for the two datapaths.
(b) How much faster is the execution of the program on the
multicycle processor compared to that on the single-cycle processor?
Solution
(a) CPI
Single-cycle CPI = 1.0, because each instruction executes in
one cycle.
The instruction mix for multicycle datapath is:
sub takes 4 cycles and is executed a0 times
beq takes 3 cycles and is executed a0 times
j takes 3 cycles and is executed a0 – 1 times
jr takes 3 cycles and is executed once

Total number of instructions = 3a0 – 1 + 1 = 3a0

Multicycle CPI
= (4×a0 + 3×a0 + 3×a0 – 3 + 3)/(3×a0) = 10/3 = 3.333
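The instruction counts above can be checked by tracing the subroutine. The sketch below is a hand-written Python model of the four-instruction loop, not a MIPS simulator; the function name and the choice of a0 = 1000 are assumptions for illustration.

```python
# Multicycle cycles per instruction, from the solution above.
CYCLES = {"sub": 4, "beq": 3, "j": 3, "jr": 3}


def run_loop(a0, a1=1):
    """Return the list of instructions executed by the subroutine."""
    executed = []
    while True:
        a0 -= a1                  # loop: sub $a0, $a0, $a1
        executed.append("sub")
        executed.append("beq")    # beq $a0, $0, done
        if a0 == 0:
            break                 # branch taken, skip the jump
        executed.append("j")      # j loop
    executed.append("jr")         # done: jr $31
    return executed


trace = run_loop(1000)
n = len(trace)
cycles = sum(CYCLES[op] for op in trace)
print(n, cycles, round(cycles / n, 3))
```

For any positive a0 the trace has 3×a0 instructions and 10×a0 cycles, so the multicycle CPI is 10/3 regardless of the loop count.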

Solution
(b) Execution time ratio:
The multicycle clock period is 0.2ns and the single-cycle clock
period is 1ns.
Therefore,
Performance ratio
= (single-cycle exec time)/(multicycle exec time)
= (1×3a0)/(0.2×3.333×3a0) // 1GHz -> 1 ns
// 5GHz -> 0.2 ns
= 1.5
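The ratio can be recomputed exactly with rational arithmetic. In the sketch below the helper name and the value a0 = 1000 are assumptions; the instruction count cancels out of the ratio, so any positive value gives the same result.

```python
from fractions import Fraction

sc_period_ns = Fraction(1)      # 1 GHz -> 1 ns
mc_period_ns = Fraction(1, 5)   # 5 GHz -> 0.2 ns
sc_cpi = Fraction(1)
mc_cpi = Fraction(10, 3)        # from part (a)


def exec_time_ns(n_instructions, cpi, period_ns):
    """Execution time in ns: N x CPI x T."""
    return n_instructions * cpi * period_ns


n = 3 * 1000  # 3 x a0 instructions, with an arbitrary a0 = 1000
ratio = exec_time_ns(n, sc_cpi, sc_period_ns) / exec_time_ns(n, mc_cpi, mc_period_ns)
print(ratio)  # 3/2
```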
