CAO Fall 2024 Lecture 06 Design Metrics Performance Evaluation

Uploaded by

Omair Siddique

EE-321 Fall 2024

Computer Architecture and Organization

Lecture # 06
Design Metrics and CPU Performance Evaluation

Muhammad Imran
Acknowledgement

▪ Content from the following sources has been used in these lectures:

▪ Computer Organization and Design, RISC-V 2nd Edition, Patterson and Hennessy
▪ Computer Organization and Design, RISC-V 1st Edition, Patterson and Hennessy
Contents

▪ Design Metrics and Design Tradeoffs

▪ Throughput
▪ Latency
▪ Timing
▪ Evaluating Computing Performance
Design Metrics and Design Tradeoffs
Measuring Speed of a Design

▪ Throughput
▪ Amount of data processed per clock cycle
▪ Bits per cycle or bits per second
▪ Tasks executed per unit time
▪ Instructions per second, Instructions per cycle etc.

▪ Latency
▪ Time to process a single task
▪ Number of cycles or seconds

▪ Timing
▪ Defined by the logic delays between sequential elements
▪ Clock period, frequency
Example …
[Figure: 8-bit input → register (D/Q) → combinational logic → register (D/Q) → combinational logic → register (D/Q) → 8-bit output; all registers share one clock]

▪ Throughput?
▪ (Bits per output sample / time between two output samples)
▪ 8 bits/cycle; if 1 cycle = 10 ns, throughput = 8 bits / 10 ns = 800 Mbit/s
▪ Throughput can also be expressed as 1 task (sample) per cycle!
Example …
[Figure: the same pipeline of three registers with two combinational logic blocks between them, 8-bit input and output]

▪ Latency?
▪ Time to complete one task / sample
▪ 3 clock cycles, if 1 cycle = 10 ns, latency = 30 ns
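The throughput and latency figures for this three-register example can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slides; the function and constant names are made up for illustration.

```python
# Values from the example: 8-bit samples, 3 register stages, 10 ns clock.
BITS_PER_SAMPLE = 8
PIPELINE_STAGES = 3
CLOCK_PERIOD_S = 10e-9


def throughput_bits_per_second(bits_per_sample, clock_period_s):
    """One sample leaves the pipeline every cycle once it is full."""
    return bits_per_sample / clock_period_s


def latency_seconds(stages, clock_period_s):
    """A single sample needs one clock cycle per register stage."""
    return stages * clock_period_s


throughput = throughput_bits_per_second(BITS_PER_SAMPLE, CLOCK_PERIOD_S)
latency = latency_seconds(PIPELINE_STAGES, CLOCK_PERIOD_S)
print(f"throughput = {throughput / 1e6:.0f} Mbit/s")
print(f"latency    = {latency * 1e9:.0f} ns")
```

Note that throughput depends only on the clock period here, while latency also grows with the number of stages.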
Example …
[Figure: the same pipeline of three registers with two combinational logic blocks between them, 8-bit input and output]

▪ Timing?
▪ Minimum clock period = tclk2q + longest combinational logic delay + ts (tclk2q: clock-to-Q delay of the register, ts: setup time)
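As a rough illustration of the timing rule, the sketch below computes the minimum clock period and the corresponding maximum frequency. The delay values (0.5 ns clock-to-Q, 6.0 ns and 8.5 ns logic blocks, 1.0 ns setup) are hypothetical and are not taken from the lecture.

```python
def min_clock_period_ns(t_clk2q_ns, logic_delays_ns, t_setup_ns):
    """Clock-to-Q delay + slowest combinational block + setup time."""
    return t_clk2q_ns + max(logic_delays_ns) + t_setup_ns


# Hypothetical delays in nanoseconds, chosen only for illustration.
period_ns = min_clock_period_ns(
    t_clk2q_ns=0.5,
    logic_delays_ns=[6.0, 8.5],  # one entry per logic block between registers
    t_setup_ns=1.0,
)
max_freq_mhz = 1e3 / period_ns  # 1 / ns = GHz, so 1e3 / ns = MHz

print(f"minimum clock period = {period_ns} ns")
print(f"maximum clock rate   = {max_freq_mhz} MHz")
```

Only the slowest logic block matters: speeding up the 6.0 ns block would not allow a faster clock.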
Design Tradeoffs: Multicycle Design
Xpower = 1;
for(i=0; i < 3; i++)
    Xpower = X*Xpower;

[Figure: multicycle datapath with one 8-bit multiplier (×), a 2-to-1 mux controlled by Start, and one 8-bit register clocked by clk; input X[7:0], output Xpower[7:0]]

▪ Throughput?
▪ 1 sample or task / 3 cycles
▪ 8 bits / 3 cycles ≈ 2.67 bits per cycle
Design Tradeoffs: Multicycle Design
Xpower = 1;
for(i=0; i < 3; i++)
    Xpower = X*Xpower;

[Figure: multicycle datapath with one 8-bit multiplier (×), a 2-to-1 mux controlled by Start, and one 8-bit register clocked by clk; input X[7:0], output Xpower[7:0]]

▪ Latency?
▪ 3 clock cycles
Design Tradeoffs: Multicycle Design
Xpower = 1;
for(i=0; i < 3; i++)
    Xpower = X*Xpower;

[Figure: multicycle datapath with one 8-bit multiplier (×), a 2-to-1 mux controlled by Start, and one 8-bit register clocked by clk; input X[7:0], output Xpower[7:0]]

▪ Clock Timing?
▪ Clock period = tclk2q + 1 multiplier delay + 1 mux delay + ts
Design Tradeoffs: Pipelining
Xpower = 1;
for(i=0; i < 3; i++)
    Xpower = X*Xpower;

[Figure: the multicycle datapath (top) and its pipelined version (bottom), in which the loop is unrolled into a chain of 8-bit multipliers separated by registers between X[7:0] and Xpower[7:0]]

▪ Throughput after pipelining?

▪ 1 task per cycle, 8 bits per cycle! (Improved!)
Design Tradeoffs: Pipelining
Xpower = 1;
for(i=0; i < 3; i++)
    Xpower = X*Xpower;

[Figure: the multicycle datapath (top) and its pipelined version (bottom), in which the loop is unrolled into a chain of 8-bit multipliers separated by registers between X[7:0] and Xpower[7:0]]

▪ Latency after pipelining?

▪ 3 cycles! (Same!)
Design Tradeoffs: Pipelining
Xpower = 1;
for(i=0; i < 3; i++)
    Xpower = X*Xpower;

[Figure: the multicycle datapath (top) and its pipelined version (bottom), in which the loop is unrolled into a chain of 8-bit multipliers separated by registers between X[7:0] and Xpower[7:0]]

▪ Timing after pipelining?

▪ Critical path still involves one multiplier delay (Same!)
Design Tradeoffs: Pipelining
Xpower = 1;
for(i=0; i < 3; i++)
    Xpower = X*Xpower;

[Figure: the multicycle datapath (top) and its pipelined version (bottom), in which the loop is unrolled into a chain of 8-bit multipliers separated by registers between X[7:0] and Xpower[7:0]]

▪ Cost of pipelining?
▪ More area! (Additional registers + Multiplier)
Design Tradeoffs: Single Cycle Design
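[Figure: single-cycle datapath with two cascaded 8-bit multipliers (×) computing Xpower[7:0] from X[7:0], with no registers between the multipliers]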

▪ Throughput?
▪ 1 sample per cycle or 8 bits per cycle!
▪ Latency?
▪ 1 cycle (low latency!)
▪ Timing?
▪ Clock period = 2 multiplier delays + tclk2q + ts
▪ Slower clock may undermine low latency!
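To make the tradeoff between the three designs concrete, the sketch below compares them in nanoseconds rather than cycles. All delay values are hypothetical, and the pipelined critical path is assumed to be one multiplier between two registers. The cycle counts come from the slides.

```python
from dataclasses import dataclass

# Hypothetical delays in nanoseconds, chosen only for illustration.
T_CLK2Q, T_SETUP, T_MULT, T_MUX = 0.5, 0.5, 8.0, 1.0


@dataclass
class Design:
    name: str
    clock_period_ns: float  # set by the critical path
    latency_cycles: int     # cycles needed for one task
    cycles_per_task: int    # cycles between successive results

    @property
    def latency_ns(self):
        return self.latency_cycles * self.clock_period_ns

    @property
    def tasks_per_us(self):
        return 1e3 / (self.cycles_per_task * self.clock_period_ns)


designs = [
    Design("multicycle", T_CLK2Q + T_MULT + T_MUX + T_SETUP, 3, 3),
    Design("pipelined", T_CLK2Q + T_MULT + T_SETUP, 3, 1),
    Design("single-cycle", T_CLK2Q + 2 * T_MULT + T_SETUP, 1, 1),
]

for d in designs:
    print(f"{d.name:12s} period={d.clock_period_ns:5.1f} ns  "
          f"latency={d.latency_ns:5.1f} ns  "
          f"throughput={d.tasks_per_us:6.1f} tasks/us")
```

With these assumed delays the single-cycle design still finishes one task soonest, but its long clock period gives it lower throughput than the pipelined design. Other delay values could change the ranking.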
How do we evaluate computers?
Defining Performance

▪ Which airplane is fastest / best performing?

▪ Cruising Speed
▪ How fast a single task can be executed …
▪ How many passengers are transported in a given time?
▪ That’s throughput …
In a similar manner, computers may be evaluated for several parameters …
Execution Time vs Throughput
▪ Desktop Computer
▪ How fast does it execute a program?
▪ Parameter of interest is execution time / response time
▪ To improve performance → reduce execution time!
▪ Server / Datacenter Computers
▪ How many tasks / jobs are executed in a given time?
▪ Focus is throughput / bandwidth!
▪ To improve performance → enhance throughput!
▪ For single core systems
▪ Performance = 1 / Execution Time
Execution Time vs Throughput

▪ Throughput may impact response time

▪ Example: Given multiple jobs
▪ If we increase number of cores
▪ Throughput increases!
▪ If jobs need to be queued (too many)
▪ More cores would also reduce the response time!
▪ If we make a single core faster
▪ Both response time and throughput improve (response time decreases, throughput increases)!
CPU Execution Time

▪ Execution Time / Elapsed Time

▪ Time to run a program
▪ Includes I/O time, waiting time etc.
▪ CPU Execution Time / CPU Time
▪ Time spent by CPU in executing the program
▪ Does not include I/O time or waiting time etc!
▪ Sub types:
▪ User CPU Time
▪ Time spent by CPU on program itself
▪ System CPU Time
▪ Time spent by OS on behalf of the program! (Not other programs!)
▪ CPU Performance refers to User CPU Time!
CPU Performance Factors
CPU Execution Time for a program = CPU Clock Cycles for a program × Clock Cycle Time

CPU Execution Time for a program = CPU Clock Cycles for a program / Clock Rate (frequency)
▪ Example
▪ A program runs in 10 s on Computer A @ 2 GHz
▪ On Computer B, it will run in 6 s at a faster clock, but will take 1.2 times as many clock cycles!
▪ What would be the clock rate for Computer B?
CPU Performance Factors
CPU Execution Time for a program = CPU Clock Cycles for a program × Clock Cycle Time

CPU Execution Time for a program = CPU Clock Cycles for a program / Clock Rate (frequency)
▪ Solution
▪ Cycles for Computer A = 10s × 2GHz = 20G cycles
▪ Cycles for Computer B = 1.2 × 20G = 24G cycles
▪ Clock Rate for Computer B = Cycles / Execution Time = 24G cycles / 6 s = 4 GHz
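The same solution can be reproduced step by step in Python. This is a minimal sketch using the numbers from the example; the variable names are illustrative only.

```python
# Given values from the example.
time_a_s = 10.0         # execution time on Computer A
clock_rate_a_hz = 2e9   # Computer A runs at 2 GHz
time_b_s = 6.0          # target execution time on Computer B
cycle_ratio_b = 1.2     # B needs 1.2 times as many cycles as A

# Clock cycles = execution time x clock rate.
cycles_a = time_a_s * clock_rate_a_hz
cycles_b = cycle_ratio_b * cycles_a

# Clock rate = clock cycles / execution time.
clock_rate_b_hz = cycles_b / time_b_s

print(f"cycles on A   = {cycles_a / 1e9:.0f} G")
print(f"cycles on B   = {cycles_b / 1e9:.0f} G")
print(f"clock rate B  = {clock_rate_b_hz / 1e9:.1f} GHz")
```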
Instruction Performance
▪ Execution time also depends on the number of instructions in a program!

CPU Clock Cycles for a program = Instructions for a program × Average Clock Cycles per Instruction (CPI)

▪ CPI (average) is one way of comparing different implementations of the same ISA
▪ Given the same number of instructions per program
Instruction Performance
CPU Clock Cycles for a program = Instructions for a program × Average Clock Cycles per Instruction (CPI)

▪ Example: Comparing two implementations of same ISA

▪ Computer A has clock cycle of 250ps and CPI of 2.0 for a program
▪ Computer B has clock cycle of 500ps and CPI of 1.2 for same program
▪ Which one is faster for this program? How much?
▪ Solution
▪ Clock cycles for Computer A = I × 2
▪ Clock cycles for Computer B = I × 1.2
▪ CPU Time for A = Clock cycles × Cycle Time = I × 2 × 250ps = I × 500ps
▪ CPU Time for B = Clock cycles × Cycle Time = I × 1.2 × 500ps = I × 600ps
▪ Computer B takes 600/500 = 1.2 times as long, i.e., Computer A is 1.2 times faster for this program!
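Because both computers run the same instruction count I, the comparison reduces to time per instruction (CPI × cycle time). The short sketch below reproduces that comparison; the helper name is made up for illustration.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time spent on one instruction, in picoseconds."""
    return cpi * cycle_time_ps


time_a_ps = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)
time_b_ps = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)

# Ratio of execution times; the instruction count I cancels out.
slowdown_of_b = time_b_ps / time_a_ps

faster = "A" if time_a_ps < time_b_ps else "B"
print(f"A: {time_a_ps:.0f} ps per instruction")
print(f"B: {time_b_ps:.0f} ps per instruction")
print(f"Computer {faster} is faster by a factor of {slowdown_of_b:.2f}")
```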
CPU Performance Equation
CPU Time = Instructions Count × CPI × Clock Cycle Time

CPU Time = (Instructions Count × CPI) / Clock Rate

Time = Seconds / Program = (Instructions / Program) × (Clock Cycles / Instruction) × (Seconds / Clock Cycle)
CPU Performance Equation
CPU Time = Instructions Count × CPI × Clock Cycle Time

Example: Comparing two code sequences!

Hardware specifications (by hardware designer): CPI for each instruction class

Instruction class   A   B   C
CPI                 1   2   3

Two alternative code sequences (options for compiler writer!): instruction counts for each instruction class

Code sequence   A   B   C
1               2   1   2
2               4   1   1

▪ Which code sequence executes most instructions?

▪ What is CPI of each code sequence?
▪ Which will execute faster?
▪ Solution
▪ Code Sequence 2 executes the most instructions: 6, versus 5 for Code Sequence 1
▪ Cycles for Code Sequence 1 = (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles, CPI = 10/5 = 2
▪ Cycles for Code Sequence 2 = (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles, CPI = 9/6 = 1.5 (faster)
▪ CPU Time for Code Sequence 1 = 10 × Clock Cycle Time
▪ CPU Time for Code Sequence 2 = 9 × Clock Cycle Time (faster)
▪ Fewer instructions do not always mean faster execution !!!
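The per-class bookkeeping in this solution is easy to mechanize. The sketch below uses the CPI values and instruction counts from the example. The dictionary layout and function name are illustrative choices.

```python
# CPI of each instruction class (hardware specification).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two code sequences.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts):
    """Return (instruction count, clock cycles, average CPI)."""
    instructions = sum(counts.values())
    cycles = sum(n * CPI_BY_CLASS[cls] for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for seq_id, counts in SEQUENCES.items():
    instructions, cycles, cpi = summarize(counts)
    print(f"sequence {seq_id}: {instructions} instructions, "
          f"{cycles} cycles, CPI = {cpi}")
```

Since both sequences run on the same clock, the one with fewer cycles is faster, regardless of its instruction count.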
Knowledge Check!
▪ A given application written in Java runs 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 as many instructions as the old compiler. Unfortunately, it increases the CPI by a factor of 1.1. How fast can we expect the application to run using this new compiler?
▪ Solution
▪ CPU Time Old = 15s
▪ Instructions Count Old = I
▪ Instructions Count New = 0.6 × I
▪ CPI Old = C
▪ CPI New = 1.1 × C
▪ CPU Time Old = 15s = I × C × Clock Cycle Time
▪ CPU Time New = 0.6 × I × 1.1 × C × Clock Cycle Time = 0.66 × I × C × Clock Cycle Time
▪ CPU Time New = 0.66 × CPU Time Old = 0.66 × 15s = 9.9s
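Since the clock cycle time is unchanged, the new CPU time is the old one scaled by the two ratios. A minimal sketch of that scaling, with illustrative names:

```python
def scaled_cpu_time(old_time_s, instruction_ratio, cpi_ratio, cycle_time_ratio=1.0):
    """Scale CPU time by the relative change in each factor of
    CPU Time = Instructions Count x CPI x Clock Cycle Time."""
    return old_time_s * instruction_ratio * cpi_ratio * cycle_time_ratio


new_time_s = scaled_cpu_time(old_time_s=15.0, instruction_ratio=0.6, cpi_ratio=1.1)
print(f"new CPU time = {new_time_s:.1f} s")
```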
Exercise 1
CPU Time = Instructions Count × CPI × Clock Cycle Time

▪ Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions per second?

▪ Solution (a)
▪ Instructions per second = instructions per cycle × cycles per second
▪ Instructions per second for P1 = (1/1.5) × 3GHz = 2G instructions/s
▪ Instructions per second for P2 = (1/1) × 2.5GHz = 2.5G instructions/s
▪ Instructions per second for P3 = (1/2.2) × 4GHz = 1.818G instructions/s
▪ P2 has highest performance!
Exercise 1

CPU Time = Instructions Count × CPI × Clock Cycle Time

▪ Consider three different processors P1, P2, and P3 executing the same
instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5
GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a
CPI of 2.2.
b. If the processors each execute a program in 10 seconds, find the number
of cycles and the number of instructions.
▪ Solution (b)
▪ Execution time = Instruction count × CPI × Clock cycle time
▪ Number of cycles = Execution Time × Clock Rate
▪ Number of cycles for P1 = 10s × 3GHz = 30G cycles
▪ Number of cycles for P2 = 10s × 2.5GHz = 25G cycles
▪ Number of cycles for P3 = 10s × 4GHz = 40G cycles
▪ Instructions count = (Execution Time × Clock Rate)/CPI
▪ Instructions count for P1 = (10s × 3G)/1.5 = 20G instructions
▪ Instructions count for P2 = (10s × 2.5G)/1.0 = 25G instructions
▪ Instructions count for P3 = (10s × 4G)/2.2 = 18.18G instructions
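Parts (a) and (b) apply the same two formulas to each processor, so they can be tabulated in one loop. The sketch below uses the clock rates and CPIs from the exercise. The data layout is an illustrative choice.

```python
# (clock rate in Hz, CPI) as given in the exercise statement.
PROCESSORS = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}
EXECUTION_TIME_S = 10.0

results = {}
for name, (clock_hz, cpi) in PROCESSORS.items():
    cycles = EXECUTION_TIME_S * clock_hz          # part (b): cycles
    results[name] = {
        "instructions_per_second": clock_hz / cpi,  # part (a)
        "cycles": cycles,
        "instructions": cycles / cpi,               # part (b): instructions
    }

best = max(results, key=lambda n: results[n]["instructions_per_second"])

for name, r in results.items():
    print(f"{name}: {r['instructions_per_second'] / 1e9:.3f} G instr/s, "
          f"{r['cycles'] / 1e9:.0f} G cycles, "
          f"{r['instructions'] / 1e9:.2f} G instructions")
print("highest instructions per second:", best)
```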
Exercise 1
CPU Time = Instructions Count × CPI × Clock Cycle Time

▪ Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
c. We are trying to reduce the execution time by 30%, but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

▪ Solution (c)
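▪ Let n be the factor by which the clock rate increases, so the clock cycle time becomes Clock Cycle Time / n
▪ 0.7 × CPU Time = Instructions Count × (1.2 × CPI) × (Clock Cycle Time / n)
▪ 1.2 / n = 0.7 → n = 1.2/0.7 ≈ 1.714
▪ The clock rate (for any processor) must be increased by a factor of about 1.714 (about 71% higher) to achieve a 30% reduction in execution time!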
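The scaling factor from part (c) can be applied to each processor to get the clock rate it would need. A small sketch, assuming the instruction count stays the same and using illustrative names:

```python
def required_clock_scale(time_ratio, cpi_ratio, instruction_ratio=1.0):
    """Factor by which the clock rate must change so that
    new time = time_ratio x old time, given the other ratios."""
    return instruction_ratio * cpi_ratio / time_ratio


# 30% shorter execution time, 20% higher CPI.
n = required_clock_scale(time_ratio=0.7, cpi_ratio=1.2)

CLOCK_RATES_GHZ = {"P1": 3.0, "P2": 2.5, "P3": 4.0}
new_rates_ghz = {name: rate * n for name, rate in CLOCK_RATES_GHZ.items()}

print(f"clock rate must scale by {n:.3f}")
for name, rate in new_rates_ghz.items():
    print(f"{name}: {CLOCK_RATES_GHZ[name]} GHz -> {rate:.2f} GHz")
```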
Exercise 2
CPU Time = Instructions Count × CPI × Clock Cycle Time

▪ Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (classes A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.
▪ Given a program with a dynamic instruction count of 1.0E6 instructions
divided into classes as follows: 10% class A, 20% class B, 50% class C,
and 20% class D
▪ Which is faster: P1 or P2?
▪ Solution
▪ CPU Time for P1 = (1e6) × (0.1 × 1 + 0.2 × 2 + 0.5 × 3 + 0.2 × 3) × (1/2.5GHz) = 1.04 × 10^-3 seconds
▪ CPU Time for P2 = (1e6) × (0.1 × 2 + 0.2 × 2 + 0.5 × 2 + 0.2 × 2) × (1/3GHz) ≈ 0.667 × 10^-3 seconds
▪ P2 is faster!!
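The weighted-CPI calculation for this exercise can be written once and applied to both implementations. The sketch below uses the instruction mix, CPIs, and clock rates from the problem statement. The structure and names are illustrative only.

```python
# Fraction of the dynamic instruction count in each class.
CLASS_MIX = {"A": 0.10, "B": 0.20, "C": 0.50, "D": 0.20}
INSTRUCTION_COUNT = 1.0e6

IMPLEMENTATIONS = {
    "P1": {"clock_hz": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    "P2": {"clock_hz": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}


def evaluate(impl):
    """Return (global CPI, clock cycles, CPU time in seconds)."""
    global_cpi = sum(CLASS_MIX[c] * impl["cpi"][c] for c in CLASS_MIX)
    cycles = INSTRUCTION_COUNT * global_cpi
    return global_cpi, cycles, cycles / impl["clock_hz"]


summary = {name: evaluate(impl) for name, impl in IMPLEMENTATIONS.items()}
faster = min(summary, key=lambda n: summary[n][2])

for name, (cpi, cycles, seconds) in summary.items():
    print(f"{name}: global CPI = {cpi:.2f}, cycles = {cycles:.3g}, "
          f"CPU time = {seconds * 1e3:.3f} ms")
print("faster implementation:", faster)
```

The same function also yields the global CPI and the clock-cycle counts for each implementation.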
Exercise 2
CPU Time = Instructions Count × CPI × Clock Cycle Time

▪ Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (classes A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.
▪ Given a program with a dynamic instruction count of 1.0E6
instructions divided into classes as follows: 10% class A, 20% class
B, 50% class C, and 20% class D
▪ What is the global CPI for each implementation??
▪ Solution
▪ Global CPI for P1 = 0.1 × 1 + 0.2 × 2 + 0.5 × 3 + 0.2 × 3 = 2.6
▪ Global CPI for P2 = 0.1 × 2 + 0.2 × 2 + 0.5 ×2 + 0.2 × 2 = 2
Exercise 2
CPU Time = Instructions Count × CPI × Clock Cycle Time

▪ Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (classes A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2.
▪ Given a program with a dynamic instruction count of 1.0E6
instructions divided into classes as follows: 10% class A, 20% class
B, 50% class C, and 20% class D
▪ Find the clock cycles required in both cases
▪ Solution
▪ Clock Cycles for P1 = 1e6 × 2.6 = 2.6M cycles
▪ Clock Cycles for P2 = 1e6 × 2 = 2M cycles
MIPS as Performance Measure
▪ MIPS: Million instructions per second

MIPS = Instructions Count / (Execution Time × 10^6)

▪ MIPS is the rate of instruction execution
▪ i.e., for a fixed instruction count, it is inversely proportional to execution time!
▪ Limitations
▪ Cannot compare computers with different ISAs as the instructions count
and CPI would vary!
▪ Varies between programs for same ISA!
▪ Alternatively,

MIPS = Instructions Count / ((Instructions Count × CPI / Clock Rate) × 10^6) = Clock Rate / (CPI × 10^6)
MIPS vs Execution Time!
MIPS = Instructions Count / (Execution Time × 10^6)

MIPS = Clock Rate / (CPI × 10^6)

▪ For a given program, consider:

Measurement Computer A Computer B
Instruction Count 10 billion 8 billion
Clock rate 4 GHz 4 GHz
CPI 1.0 1.1

▪ Which computer has higher MIPS rating?

▪ Solution
▪ MIPS for Computer A = 4G/(1.0 × 10^6) = 4 × 10^3
▪ MIPS for Computer B = 4G/(1.1 × 10^6) ≈ 3.64 × 10^3
▪ Computer A has higher MIPS rating!
MIPS vs Execution Time!
MIPS = Instructions Count / (Execution Time × 10^6)

MIPS = Clock Rate / (CPI × 10^6)

▪ For a given program, consider:

Measurement Computer A Computer B

Instruction Count 10 billion 8 billion
Clock rate 4 GHz 4 GHz
CPI 1.0 1.1

▪ Which one is faster?

▪ Solution
▪ Execution Time for A = (10G)/(4 × 10^3 × 10^6) = 2.5s
▪ Execution Time for B = (8G)/(3.64 × 10^3 × 10^6) ≈ 2.198s
▪ Computer B is faster despite having lower MIPS rating!
Execution Time is a more accurate measure of performance!
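The disagreement between MIPS rating and execution time can be reproduced directly from the table. In the sketch below, execution time is computed from the CPU performance equation instead of from the rounded MIPS value, so Computer B comes out at exactly 2.2 s. The slides' 2.198 s is a rounding effect. Names are illustrative.

```python
COMPUTERS = {
    "A": {"instructions": 10e9, "clock_hz": 4e9, "cpi": 1.0},
    "B": {"instructions": 8e9, "clock_hz": 4e9, "cpi": 1.1},
}


def mips(c):
    """MIPS = Clock Rate / (CPI x 10^6)."""
    return c["clock_hz"] / (c["cpi"] * 1e6)


def execution_time_s(c):
    """CPU Time = Instructions Count x CPI / Clock Rate."""
    return c["instructions"] * c["cpi"] / c["clock_hz"]


for name, c in COMPUTERS.items():
    print(f"Computer {name}: {mips(c):.0f} MIPS, "
          f"{execution_time_s(c):.3f} s")

higher_mips = max(COMPUTERS, key=lambda n: mips(COMPUTERS[n]))
faster = min(COMPUTERS, key=lambda n: execution_time_s(COMPUTERS[n]))
print("higher MIPS rating:", higher_mips)
print("shorter execution time:", faster)
```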
Amdahl’s Law

▪ Performance enhancement possible by a given improvement is limited by the amount that the improved feature is used!

Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected
▪ Example
▪ Suppose a program runs in 100 seconds on a computer, with multiply
operations responsible for 80 seconds of this time. How much do I have
to improve the speed of multiplication if I want my program to run five
times faster?
▪ Solution
▪ Target execution time = 100s / 5 = 20s
▪ 20 = (80/n) + 20 → 80/n = 0, which no finite n can satisfy
▪ That is, there is no amount by which we can enhance multiply to achieve a fivefold increase in performance, if multiply accounts for only 80% of the workload.
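A short sketch of Amdahl's Law as used in this example is given below. The function names are made up, and the 40 s target in the second call is a hypothetical case added only to show a reachable goal.

```python
def time_after_improvement(affected_s, unaffected_s, factor):
    """Amdahl's Law: only the affected part shrinks by `factor`."""
    return affected_s / factor + unaffected_s


def required_factor(affected_s, unaffected_s, target_s):
    """Improvement factor needed to reach target_s, or None if the
    unaffected part alone already takes at least that long."""
    remaining = target_s - unaffected_s
    if remaining <= 0:
        return None
    return affected_s / remaining


# Example from the slides: 100 s total, 80 s in multiply, 5x speedup wanted.
print(required_factor(affected_s=80.0, unaffected_s=20.0, target_s=100.0 / 5))

# Hypothetical reachable target: 40 s overall (2.5x speedup).
print(required_factor(affected_s=80.0, unaffected_s=20.0, target_s=40.0))

# Even a very large multiply speedup leaves just over 20 s.
print(time_after_improvement(80.0, 20.0, factor=1000.0))
```

The first call returns `None` because the 20 s of unaffected work already uses up the whole 20 s target.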
Homework!

▪ Problems for Practice

▪ Do practice for exam!
▪ Chapter 1 Exercise Problems
▪ 1.6, 1.7,1.8, 1.14, 1.15
▪ Chapter 2 Exercise Problems
▪ 2.39 and 2.40
Relevant Reading
▪ Computer Organization and Design (RISC-V Edition), Patterson and Hennessy
▪ Chapter 1
▪ Sections 1.6, 1.8, 1.9 and 1.10!
