Designing for Performance

Raul Queiroz Feitosa


Objective

In this chapter we examine the most common approach to assessing processor and computer system performance.



Outline
◼ Performance Assessment
◼ Amdahl’s Law
◼ Benchmarks
◼ Homework



Which one would you choose?

Intel Xeon Platinum 8458P:          39 MB cache,  2.2 GHz, 26 cores
AMD Ryzen Threadripper PRO 5975WX:  128 MB cache, 3.6 GHz, 32 cores



What matters?
❑ Cost
❑ Size
❑ Reliability
❑ Security
❑ Power Consumption
❑ Performance
❑…



Main CPU operations
❑ Fetch instructions
❑ Decode instructions
❑ Load and Store data
❑ Logic and Arithmetic Operations
❑ Fixed-Point

❑ Floating-Point



Performance Factors
Clock frequency ( f ) – expressed in multiples of Hz
• Clock cycle - one increment, or pulse, of the clock.
• Clock period ( τ ) - the time between consecutive pulses.
• Duty cycle – the ratio of time a signal is high compared to the total time.

[Figure: a clock generator driving the CPU; the waveform marks the clock period τ and one clock cycle, shown next to the actual clock signal.]



Performance Factors
Clock frequency

• Usually, multiple clock cycles are required per instruction.
• The amount of work implied by one instruction varies considerably.
• Pipelining allows the overlapped (simultaneous) execution of instructions.
• So, the clock frequency is not the whole story!



Performance Factors
Instruction Execution Rate

• Expressed in millions of instructions per second (MIPS) or millions of floating-point operations per second (MFLOPS).
• Heavily dependent on the instruction set, compiler design, processor implementation, cache, and memory hierarchy.
• So, the instruction execution rate is not the whole story!



Performance Factors
CPI – the average number of cycles per instruction

• CPIk - number of cycles per instruction of type k.
• Ik - number of machine instructions of type k executed by a program.
• Ic - total number of machine instructions executed by a program.

Ic = Σk Ik   (summing over the n instruction types)

CPI = ( Σk CPIk × Ik ) / Ic

f(MHz) = CPI × MIPS, i.e., MIPS = f(MHz) / CPI
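
As a quick illustration, the sketch below (Python; the instruction mix, CPI values and clock frequency are assumptions made up for this example, not figures from the slides) evaluates the three formulas above.

# Hypothetical instruction mix: counts (in millions) and CPI per instruction type
counts = {"alu": 600, "load_store": 300, "branch": 100}   # Ik, in millions
cpi    = {"alu": 1,   "load_store": 2,   "branch": 3}     # CPIk

Ic  = sum(counts.values())                                 # total instruction count (millions)
CPI = sum(cpi[k] * counts[k] for k in counts) / Ic         # weighted-average CPI

f_mhz = 1000                                               # assumed 1 GHz clock
MIPS  = f_mhz / CPI                                        # MIPS = f(MHz) / CPI

print(f"CPI = {CPI:.2f}, MIPS = {MIPS:.0f}")               # CPI = 1.50, MIPS = 667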



Performance Factors
T – processor time needed to execute a program
T = Ic × CPI × τ

A refinement yields

T = Ic × [ p + (m × K) ] × τ
where
p is the number of processor cycles to decode + execute the instruction
m is the number of memory references needed
K is the ratio between memory cycle time and processor cycle time.
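
A minimal numeric sketch of the refined model (all parameter values below are illustrative assumptions, not figures from the slides):

# Refined execution-time model: T = Ic * (p + m*K) * tau
Ic  = 2_000_000      # instructions executed (assumed)
p   = 2              # processor cycles to decode and execute one instruction (assumed)
m   = 0.4            # memory references per instruction (assumed)
K   = 5              # memory cycle time / processor cycle time (assumed)
tau = 1 / 400e6      # clock period of an assumed 400 MHz clock

T = Ic * (p + m * K) * tau
print(f"T = {T * 1e3:.1f} ms")   # T = 20.0 ms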



Review Question 1
System attributes affecting the performance factors

                                Ic    p    m    K    τ
Instruction set architecture    ✓     ✓
Compiler technology             ✓     ✓    ✓
Processor implementation              ✓              ✓
Cache and memory hierarchy                      ✓    ✓

• Ic is the total number of executed instructions


• p is the number of cycles for processor internal operations
• m is the number of memory references needed
• K is the ratio between memory cycle time and processor cycle time.
• τ is the clock period.

Review Question 2
Consider two codes produced by two compilers for the same source program. The instructions of the machine that will execute these codes can be divided into classes A (CPI = 1) and B (CPI = 2). The numbers of executed instructions for each class are:

Class   compiler 1   compiler 2   comments
A       600M         400M         CPI = 1
B       400M         400M         CPI = 2
a) Compute the execution time for both codes assuming a clock rate = 1 GHz.
T1 = (600 × 1 + 400 × 2) × 10⁶ / 10⁹ = 1.4 s
T2 = (400 × 1 + 400 × 2) × 10⁶ / 10⁹ = 1.2 s

b) Which compiler produces the more efficient code, and by what factor?
Compiler 2 is 1.4 / 1.2 = 1.17 times more efficient (faster) than compiler 1.

c) Which code executes at the higher MIPS rate?

CPI1 = (600 × 1 + 400 × 2) × 10⁶ / (1000 × 10⁶) = 1.4 clocks/instruction
CPI2 = (400 × 1 + 400 × 2) × 10⁶ / (800 × 10⁶)  = 1.5 clocks/instruction

Therefore, MIPS1 = 1000 / 1.4 = 714 and MIPS2 = 800 / 1.2 = 667
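
The same numbers can be reproduced with a short verification script (Python; purely a check of the arithmetic above, not part of the original slides):

# Review Question 2, verified numerically
f = 1e9                                    # 1 GHz clock
mixes = {1: {"A": 600e6, "B": 400e6},      # compiler 1
         2: {"A": 400e6, "B": 400e6}}      # compiler 2
cpi = {"A": 1, "B": 2}

for c, mix in mixes.items():
    cycles = sum(cpi[k] * n for k, n in mix.items())
    Ic     = sum(mix.values())
    T      = cycles / f                    # execution time in seconds
    mips   = Ic / (T * 1e6)                # millions of instructions per second
    print(f"compiler {c}: T = {T:.1f} s, CPI = {cycles / Ic:.1f}, MIPS = {mips:.0f}")
# compiler 1: T = 1.4 s, CPI = 1.4, MIPS = 714
# compiler 2: T = 1.2 s, CPI = 1.5, MIPS = 667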



Outline
◼ Performance Assessment
◼ Amdahl’s Law
◼ Benchmarks
◼ Homework



Amdahl’s Law
potential speed-up of the program using multiple processors

❑ T is the total execution time for the program on a single processor
❑ Fraction (1 − f) of code is inherently serial
❑ Fraction f of code is parallelizable with no scheduling overhead
❑ N is the number of processors that fully exploit the parallel portions of code

[Diagram: on a single processor the execution time T splits into a serial part T(1 − f) and a parallelizable part Tf; on N parallel processors the parallelizable part shrinks to Tf/N.]

Speedup = time to execute the program on a single processor / time on N parallel processors
        = [ T(1 − f) + Tf ] / [ T(1 − f) + Tf/N ]
        = 1 / [ (1 − f) + f/N ]
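
A small helper function makes the formula concrete (a sketch; the 80% parallel fraction used below is an assumed example value):

def amdahl_speedup(f, N):
    """Speedup when a fraction f of the work is spread over N processors."""
    return 1.0 / ((1 - f) + f / N)

for N in (2, 4, 16, 1024):
    print(N, round(amdahl_speedup(0.8, N), 2))   # 1.67, 2.5, 4.0, 4.98
# As N grows, the speedup approaches the ceiling 1 / (1 - 0.8) = 5.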



Amdahl’s Law
potential speed-up of the program using multiple processors

❑ The performance gain is limited by the parallelizable portion of the code!
❑ If f is small, adding processors has little effect.
❑ As N → ∞, the speedup is bounded by 1/(1 − f).
❑ Adding more processors gives diminishing returns.

Speedup = 1 / [ (1 − f) + f/N ]



Amdahl’s Law
in practice
Parallel programs introduce an overhead due to coordination
and synchronization, not present in their sequential
counterparts.
[Diagram: as before, but the run on N parallel processors now contains an extra overhead term o in addition to T(1 − f) and Tf/N.]

So, the actual speed-up becomes

Speedup = [ T(1 − f) + Tf ] / [ T(1 − f) + Tf/N + o ] = 1 / [ (1 − f) + f/N + o/T ]
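
Extending the earlier sketch with the overhead term, here expressed as a fraction o/T of the sequential runtime (that normalization and the example values are assumptions):

def amdahl_with_overhead(f, N, o_over_T):
    """Speedup when the parallel run adds an overhead o, given as a fraction of T."""
    return 1.0 / ((1 - f) + f / N + o_over_T)

# Same assumed 80% parallel fraction as before, plus a 5% coordination overhead
print(round(amdahl_with_overhead(0.8, 4, 0.05), 2))   # 2.22 instead of 2.5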



Review Question 3
A program spends 60% of its execution time on floating-point operations; 90% of that time is spent in parallelizable loops. When the code is parallelized, coordination and synchronization make the part not involving floating-point operations 10% longer.

a) Find the improvement in execution time achieved by doubling the speed of the floating-point unit.

speedup = 1 / (0.6/2 + 0.4) = 1.43

b) Find the improvement in execution time achieved by using two processors having the same speed and structure as the original one.

speedup = 1 / (0.6 × 0.9 / 2 + 0.6 × 0.1 + 1.1 × 0.4) = 1.30

c) What would be the improvement if both changes were implemented?

speedup = 1 / (0.6 × 0.9 / 4 + 0.6 × 0.1 / 2 + 1.1 × 0.4) = 1.65
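
The three answers can be reproduced with a few lines of Python (verification only; the fractions come straight from the question):

fp, par, stretch = 0.60, 0.90, 1.10   # FP share, parallelizable share of FP, non-FP stretch

a = 1 / (fp / 2 + (1 - fp))                                         # faster FPU only
b = 1 / (fp * par / 2 + fp * (1 - par) + stretch * (1 - fp))        # two processors only
c = 1 / (fp * par / 4 + fp * (1 - par) / 2 + stretch * (1 - fp))    # both changes
print(f"{a:.2f} {b:.2f} {c:.2f}")                                   # 1.43 1.30 1.65
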
Amdahl’s Law
Generalization for any design improvement

Speedup = execution time before enhancement / execution time after enhancement

Suppose that the enhancement affects a fraction f of the total runtime before the enhancement, and that the speed-up brought by this enhancement is SUf. Thus

Speedup = 1 / [ (1 − f) + f / SUf ]



Amdahl’s Law
Generalization for any design improvement

Example:
Suppose that a task spends 40% of its time on floating-point operations, and a new FPU provides a speedup of K. Then the overall speedup is

Speedup = 1 / [ (1 − 0.4) + 0.4/K ]

So, the maximum speedup (as K → ∞) is 1 / 0.6 ≈ 1.67.
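
Numerically (a quick check with a few assumed values of K):

def overall_speedup(f, su_f):
    """Generalized Amdahl: a fraction f of the runtime is sped up by factor su_f."""
    return 1.0 / ((1 - f) + f / su_f)

for K in (2, 10, 1e6):
    print(K, round(overall_speedup(0.4, K), 2))   # 1.25, 1.56, 1.67
# Even an arbitrarily fast FPU cannot push the speedup beyond 1 / (1 - 0.4) ≈ 1.67.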



Outline
◼ Performance Assessment
◼ Amdahl’s Law
◼ Benchmarks
◼ Homework



Benchmarks
Motivation

A high-level language statement

A=B+C /* assume all quantities in main memory */

Compiled code on CISC:
  add   mem(B),mem(C),mem(A)

Compiled code on RISC:
  load  mem(B),reg(1);
  load  mem(C),reg(2);
  add   reg(1),reg(2),reg(3);
  store reg(3),mem(A);

Assume that both machines take the same time to run the same high-level code.
So, if the CISC machine runs at 1 MIPS, the RISC machine runs at 4 MIPS: it executes four instructions in the time the CISC machine executes one, yet the useful work done is identical.
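
In numbers (a tiny Python sketch; the statement time is an arbitrary assumption):

# Both machines finish the same A = B + C statement in the same (assumed) time
t = 1e-6                       # seconds per statement, assumed equal for both machines
mips_cisc = 1 / (t * 1e6)      # 1 instruction executed  -> 1 MIPS
mips_risc = 4 / (t * 1e6)      # 4 instructions executed -> 4 MIPS
print(mips_cisc, mips_risc)    # 1.0 4.0: a 4x MIPS rating for identical useful work
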
Benchmarks
Definition
❑ Programs designed to test performance
❑ Written in a high-level language → portable
❑ Represent a particular application or system programming area (scientific, commercial)
❑ Easily measured and widely distributed
❑ The best-known collection of benchmark suites is maintained by the Standard Performance Evaluation Corporation (SPEC)
❑ The best-known of the SPEC suites is CPU2017:
  ◼ contains 43 benchmarks organized into four suites
  ◼ includes an optional metric for measuring energy consumption
Standard Performance Evaluation Corporation (SPEC)



Benchmarks
SPECspeed metric

❑ SPEC benchmarks are not concerned with instruction execution rates.
❑ A base runtime is defined for each benchmark using a reference machine.
❑ The speed metric is the ratio of the reference time to the run time on the system under test (ri = Trefi / Tsuti):
  ◼ Trefi - execution time for benchmark i on the reference machine
  ◼ Tsuti - execution time for benchmark i on the system under test



Benchmarks
Averaging SPEC metrics

For the SPECspeed metric, the selected ratios are averaged using the geometric mean, which is reported as the overall metric.
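
A sketch of the SPECspeed-style calculation (Python; the benchmark names are real CPU2017 integer benchmarks, but the reference and measured times below are made up for illustration):

import math

# (reference time, system-under-test time) in seconds, per benchmark - assumed values
times = {"600.perlbench_s": (1775, 500),
         "602.gcc_s":       (3981, 900),
         "605.mcf_s":       (4721, 1100)}

ratios = [t_ref / t_sut for t_ref, t_sut in times.values()]
speed_metric = math.prod(ratios) ** (1 / len(ratios))   # geometric mean of the ratios
print(round(speed_metric, 2))                           # ~4.07 for these assumed times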



Outline
◼ Performance Assessment
◼ Amdahl’s Law
◼ Benchmarks
◼ Homework



Homework 1
A program involves the execution of 2 million instructions on a 400 MHz
processor. CPI and the proportion of four instruction types are given below.
Compute the average CPI:

instruction type              CPI   instruction mix
Arithmetic and logic            1   60%
Load/store with cache hit       2   18%
Branch                          4   12%
Load/store with cache miss      8   10%

Answer:
average CPI = (1 × 0.6) + (2 × 0.18) + (4 × 0.12) + (8 × 0.1) = 2.24



Homework 2
Consider two hardware implementations, M1 and M2, of the same instruction set. There are three instruction classes: F, I and N. The M1 clock rate is 600 MHz, and the clock cycle of M2 is 2 ns. The average CPI values for these three instruction classes are:

Class   CPI of M1   CPI of M2   Comments
F       5.0         4.0         floating-point
I       2.0         3.8         integer
N       2.4         2.0         non-arithmetic

a) Compute the peak performance of M1 and M2 in MIPS.
b) If 50% of the instructions executed in a given program belong to class N and the others are
   equally distributed between F and I, which is the faster machine, and by what factor?



Homework 2
c) A designer of M1 plans to change the design to improve performance. Assuming the instruction
   mix in (b), which of the options below is the most beneficial?
   1. Use an FPU twice as fast (CPI = 2.5 for class F).
   2. Add a second ALU to reduce the CPI for integer operations to 1.2.
   3. Use faster logic that allows a clock rate of 750 MHz while keeping the same CPI values.
d) The CPI values given above include cache misses, which occur 5 times per 100 executed
   instructions. Each cache miss implies a 10-cycle penalty. A fourth redesign option consists of
   using a larger instruction cache so as to reduce the miss ratio from 5% to 3%. Compare this
   alternative with the previous options.
e) Characterize application programs that execute faster on M1 than on M2, i.e., discuss the
   instruction composition of such applications. Hint: let x, y and 1 − x − y be the fractions of
   instructions belonging to classes F, I and N, respectively.



Homework 3
A processor is used for an application where 30%, 25% and 10% of the processing time is spent on floating-point addition, multiplication and division, respectively. For a new processor version, three alternatives are being considered, all of them involving nearly the same design and implementation cost. Which one should be selected?
a) Redesign the adder, making it twice as fast as the old one.
b) Redesign the multiplier, making it three times as fast as the old one.
c) Redesign the divider, making it ten times as fast as the old one.



Homework 4
T is the average processing time of a computer operating at frequency f. Instructions are grouped into 3 types, as shown below.

Instruction type            CPI
Floating-point arithmetic    10
Integer arithmetic            5
Non-arithmetic                2

Typically, a program executes the same proportion of instructions from all three types. Compute the MIPS rate and the new execution time if the FPU becomes twice as fast.



Homework 5
Let f1 and f2 be the operating frequencies of processors P1 and P2, respectively. Assume that two compilers generate different executable codes for the same source program, each of which may be executed by P1 as well as by P2. The codes have the characteristics given below:

Instruction type            CPI   Proportion (compiler 1)   Proportion (compiler 2)
Floating-point arithmetic    10   20%                       30%
Integer arithmetic            5   30%                       10%
Non-arithmetic                2   50%                       60%

Compute the ratio f1/f2 for which the processing time of P1 executing code 1 equals the processing time of P2 executing code 2.



Homework 6
The code of an application can be separated into a sequential part (S) and a parallelizable part (P). When the application runs on a single processor, the number of executed instructions of type P is twice the number of type S. When the application runs on multiple processors, the number of instructions of type S increases by 10%. Consider the following two configurations:

A) A single-processor machine operating at frequency 2f.
B) A four-processor machine operating at frequency f.

a) Determine the limiting ratio r between the CPI of instructions of type P and type S
   (r = CPIP / CPIS) for which configuration A is faster than configuration B.
b) Compute the upper limit of the speed-up that can be achieved using multiple processors
   without changing the operating frequency.



Designing for Performance

END
15-17, 24,28,31-25

