06 CA (Performance Enhancement)

The document discusses performance enhancement in computer architecture. It explains that performance is determined by execution time, not just the number of instructions or cycles. It introduces Amdahl's law, which states that the overall speedup from an enhancement is limited by the fraction of time the original program spends running non-enhanced code. Several examples are provided to illustrate how to use Amdahl's law to evaluate potential performance improvements.

Uploaded by

Royal Stars

Performance Enhancement

CS353 – Computer Architecture

Najeeb-Ur-Rehman
Assistant Professor
Department of Computer Science
Faculty of Computing & IT
University of Gujrat
# of Instructions Example
A compiler designer is trying to decide between
two code sequences for a particular machine.
Based on the hardware implementation, there
are three different classes of instructions: Class
A, Class B, and Class C, and they require one,
two, and three cycles (respectively).
The first code sequence has 5 instructions: 2 of
A, 1 of B, and 2 of C. The second sequence has 6
instructions: 4 of A, 1 of B, and 1 of C.
Which sequence will be faster? How much?
What is the CPI for each sequence?
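One way to check the comparison is to count cycles for each sequence. The sketch below assumes both sequences run on the same machine at the same clock rate, so fewer cycles means less execution time; the function and variable names are illustrative and do not come from the slides.

```python
# Cycles needed by each instruction class, as stated in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(mix):
    """Total clock cycles for an instruction mix given as {class: count}."""
    return sum(count * CYCLES_PER_CLASS[cls] for cls, count in mix.items())


def cpi(mix):
    """Average cycles per instruction for the mix."""
    return total_cycles(mix) / sum(mix.values())


seq1 = {"A": 2, "B": 1, "C": 2}  # first sequence: 5 instructions
seq2 = {"A": 4, "B": 1, "C": 1}  # second sequence: 6 instructions

for name, mix in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    print(f"{name}: {total_cycles(mix)} cycles, CPI = {cpi(mix):.2f}")

# With the same clock rate assumed, the ratio of cycle counts is the speedup.
print(f"Sequence 2 is {total_cycles(seq1) / total_cycles(seq2):.2f}x faster")
```

Sequence 2 executes more instructions yet needs 9 cycles against 10 (CPI 1.5 versus 2.0), so it comes out roughly 1.11 times faster.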
Performance
• Performance is determined by execution time
• Do any of the other variables equal performance?
  • # of cycles to execute program?
  • # of instructions in program?
  • # of cycles per second?
  • average # of cycles per instruction?
  • average # of instructions per second?
• Common pitfall: thinking one of the variables is indicative of performance when it really isn't.

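A small sketch of why no single variable stands in for performance. It uses the usual CPU time relation (instruction count × CPI × clock cycle time) and the perl and mcf rows of the CINT2006 table for the Opteron X4 2356 that appears later in these slides; the function name is illustrative.

```python
def execution_time_s(instruction_count, cpi, cycle_time_ns):
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns * 1e-9


# Instruction count, CPI and cycle time taken from the perl and mcf table rows.
perl_time = execution_time_s(2118e9, 0.75, 0.40)
mcf_time = execution_time_s(336e9, 10.00, 0.40)

print(f"perl: 2118e9 instructions -> {perl_time:.0f} s")
print(f"mcf:   336e9 instructions -> {mcf_time:.0f} s")
```

mcf executes far fewer instructions than perl at the same clock rate, yet takes about twice as long because its CPI is much higher. The results land within a couple of seconds of the table's execution times, since the table inputs are rounded.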
Quantitative Principles
• Make the common case fast
  • Favor the frequent case (simpler) over the infrequent case.
  • For example, given that overflow in addition is infrequent, favor optimizing the case when no overflow occurs.
• Objective
  • Determine the frequent case.
  • Determine how much improvement in performance is possible by making it faster.
• Amdahl's law can be used to quantify the latter given that we have information concerning the former.

Amdahl’s law
• The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
• This implies that the time consumed by events whose performance is not improved limits the effect of any improvement.
• The lowest performer restricts all others.

Amdahl’s law….
The parameter to use in measuring the effect of Amdahl's law is speedup:

$$\text{Speedup} = \frac{\text{Performance using enhancement}}{\text{Performance without using enhancement}}$$

or

$$\text{Speedup} = \frac{\text{Execution time without enhancement}}{\text{Execution time with enhancement}}$$

Speedup depends on two factors
• The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement:
  $\text{Fraction}_{\text{enhanced}} \le 1$
• The improvement gained by the enhanced execution mode:
  $\text{Speedup}_{\text{enhanced}} > 1$

Example
• Trip from point A to point B in two parts: the A-C part always takes 20, while the C-B part takes 50, 20, 4, 1.7, or 0.3.

A-C Trip   C-B Trip   Total Time   C-B Speedup   Overall Speedup
20         50         70           1             1
20         20         40           2.5           1.75
20         4          24           12.5          2.9
20         1.7        21.7         29.4          3.2
20         0.3        20.3         166.66        3.4

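The trip table can be reproduced with a few lines of arithmetic. This is a minimal sketch; the constant and function names are illustrative, and the times are unitless because the slide gives no unit.

```python
A_TO_C = 20.0           # part of the trip that is never sped up
C_TO_B_ORIGINAL = 50.0  # original time for the part that can be sped up


def trip_speedups(c_to_b):
    """Return (total time, C-B speedup, overall speedup) for a given C-B time."""
    total = A_TO_C + c_to_b
    leg_speedup = C_TO_B_ORIGINAL / c_to_b
    overall_speedup = (A_TO_C + C_TO_B_ORIGINAL) / total
    return total, leg_speedup, overall_speedup


for c_to_b in (50, 20, 4, 1.7, 0.3):
    total, leg, overall = trip_speedups(c_to_b)
    print(f"C-B {c_to_b:>4}: total {total:5.1f}, "
          f"C-B speedup {leg:7.2f}, overall {overall:4.2f}")

# Upper bound on the overall speedup as the C-B time approaches zero.
print(f"Limit: {(A_TO_C + C_TO_B_ORIGINAL) / A_TO_C:.2f}")
```

However fast the C-B part becomes, the overall speedup cannot exceed 70/20 = 3.5, because the A-C part still takes 20.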
Amdahl’s law
$\text{Exec time}_{\text{new}}$ = execution time after some enhancement
$\text{Exec time}_{\text{old}}$ = execution time before any enhancement
$\text{Fraction}_{\text{enhanced}}$ = fraction of work using the enhancement
$\text{Speedup}_{\text{enhanced}}$ = speedup of enhanced mode
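
The formula that accompanied these definitions did not survive extraction. In terms of the four quantities defined above, Amdahl's law is usually written as follows (this is the standard textbook form, not text recovered from the slide):

$$\text{Exec time}_{\text{new}} = \text{Exec time}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{Exec time}_{\text{old}}}{\text{Exec time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$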

Example – I
• We are considering an enhancement to the processor of a web server. The new CPU is 20 times faster on search queries than the old processor. The old processor is busy with search queries 70% of the time. What is the speedup gained by integrating the enhanced CPU?

Example – I Solution
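
The worked figures for this solution did not survive extraction. A minimal sketch of the calculation, assuming the standard form of Amdahl's law and treating the 70% as a fraction of the original execution time; `amdahl_speedup` is an illustrative helper name.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Search queries take 70% of the time and become 20 times faster.
overall = amdahl_speedup(fraction_enhanced=0.70, speedup_enhanced=20)
print(f"Overall speedup: {overall:.2f}")
```

Under these assumptions the overall speedup is 1 / (0.30 + 0.70/20) = 1 / 0.335, or about 2.99.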

Example – II
• Suppose that we are considering an enhancement to the processor of a server system used for web serving. The new CPU is 10 times faster on computation in the web serving application than the original processor. Assuming that the original CPU is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?

Example – II Solution
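
The worked figures for this solution did not survive extraction. The sketch below works the example in terms of normalized execution time, assuming the I/O wait is unaffected by the faster CPU; the variable names are illustrative.

```python
OLD_TIME = 1.0                   # original execution time, normalized

compute_old = 0.40 * OLD_TIME    # time spent computing (benefits from the new CPU)
io_wait = 0.60 * OLD_TIME        # time spent waiting for I/O (assumed unchanged)

compute_new = compute_old / 10   # computation runs 10 times faster
new_time = io_wait + compute_new

overall = OLD_TIME / new_time
print(f"New time: {new_time:.2f} of the original, speedup {overall:.4f}")
```

The new execution time is 0.64 of the original, giving an overall speedup of about 1.56. Even an infinitely fast CPU could not push it past 1/0.6, or about 1.67.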

Amdahl’s law Example
• Consider an operation that takes 20 ns on a machine with the enhancement and 100 ns on a machine without it. Assume the enhancement can only be used 30% of the time.
• What is the overall speedup?

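One way to work this example, as a sketch: derive the speedup of the enhanced mode from the two timings, then apply Amdahl's law. It assumes the 30% is a fraction of the original execution time, which the slide does not state explicitly.

```python
time_without_ns = 100.0   # time on the machine without the enhancement
time_with_ns = 20.0       # time on the machine with the enhancement
fraction_enhanced = 0.30  # assumption: fraction of the ORIGINAL execution time

# Speedup of the enhanced mode itself.
speedup_enhanced = time_without_ns / time_with_ns

# Amdahl's law: the unenhanced 70% is unchanged, the rest shrinks by 5x.
overall = 1.0 / ((1.0 - fraction_enhanced)
                 + fraction_enhanced / speedup_enhanced)

print(f"Speedup of enhanced mode: {speedup_enhanced:.1f}")
print(f"Overall speedup: {overall:.2f}")
```

With these assumptions the enhanced mode is 5 times faster, but the overall speedup is only 1 / 0.76, about 1.32.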
Corollary
• If an enhancement is only usable for a fraction of a task, we can’t speed up the task by more than the reciprocal of 1 minus that fraction.

$$\text{Performance improvement limit} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$

Example
• Frequency of FP instructions: 25%
• Average CPI of FP instructions: 4.0
• Average CPI of other instructions: 1.33
• Frequency of FPSQR: 2%
• CPI of FPSQR: 20
• Design Alternative 1: Reduce CPI of FPSQR from 20 to 2.
• Design Alternative 2: Reduce average CPI of all FP instructions to 2.5.
• Compare these two design alternatives using the CPU performance equation.

Solution
Original CPI = 0.25*4 + 1.33*(1-0.25) = 2.0
Option 1 CPI = 2.0 - 2%*(20-2) = 1.64
Option 2 CPI = 0.25*2.5 + 1.33*(1-0.25) = 1.625
Speedup of Option 1 = 2/1.64 = 1.2195
Speedup of Option 2 = 2/1.625 = 1.2308

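The arithmetic of this solution can be checked with a short script. It is a sketch under two assumptions: the quoted 1.33 is treated as 4/3, which is what the rounded results imply, and instruction count and clock rate are the same for all three designs, so speedup is simply the ratio of CPIs. The constant names are illustrative.

```python
FP_FREQ = 0.25      # fraction of instructions that are FP
FP_CPI = 4.0        # average CPI of FP instructions
OTHER_CPI = 4 / 3   # quoted as 1.33; 4/3 is an assumption matching the rounding
FPSQR_FREQ = 0.02   # fraction of instructions that are FPSQR

original_cpi = FP_FREQ * FP_CPI + (1 - FP_FREQ) * OTHER_CPI

# Option 1: FPSQR CPI drops from 20 to 2, saving 18 cycles on 2% of instructions.
option1_cpi = original_cpi - FPSQR_FREQ * (20 - 2)

# Option 2: average CPI of all FP instructions drops to 2.5.
option2_cpi = FP_FREQ * 2.5 + (1 - FP_FREQ) * OTHER_CPI

for name, new_cpi in (("Option 1", option1_cpi), ("Option 2", option2_cpi)):
    print(f"{name}: CPI = {new_cpi:.3f}, speedup = {original_cpi / new_cpi:.4f}")
```

Option 2 comes out slightly ahead, with a speedup of about 1.23 against about 1.22 for Option 1.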
Example – III
A common transformation required in graphics engines is
square root. Implementations of floating-point (FP) square
root vary significantly in performance, especially among
processors designed for graphics. Suppose FP square root
(FPSQR) is responsible for 20% of the execution time of a
critical graphics benchmark. One proposal is to enhance
the FPSQR hardware and speed up this operation by a
factor of 10. The other alternative is just to try to make all
FP instructions in the graphics processor run faster by a
factor of 1.6; FP instructions are responsible for a total of
50% of the execution time for the application. The design
team believes that they can make all FP instructions run 1.6
times faster with the same effort as required for the fast
square root. Compare these two design alternatives.

Solution – III
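
The worked figures for this solution did not survive extraction. A sketch of the comparison, assuming the standard form of Amdahl's law and the percentages of execution time given in Example III; `amdahl_speedup` is an illustrative helper name.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Alternative 1: FPSQR is 20% of execution time and becomes 10 times faster.
speedup_fpsqr = amdahl_speedup(0.20, 10)

# Alternative 2: all FP instructions are 50% of execution time, 1.6 times faster.
speedup_all_fp = amdahl_speedup(0.50, 1.6)

print(f"Faster FPSQR:        {speedup_fpsqr:.4f}")
print(f"Faster FP overall:   {speedup_all_fp:.4f}")
```

Under these assumptions, speeding up all FP instructions is slightly better (about 1.23 versus about 1.22), because FP work covers a larger share of the execution time.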

Exercise
Clock freq = 1.4 GHz
FP instructions = 25%
Average CPI of FP instructions = 4.0
Average CPI of other instructions = 1.33
FPSQRT = 2%, CPI of FPSQRT = 20
• Design Option 1: decrease the CPI of FPSQRT to 2, clock freq = 1.2 GHz
• Design Option 2: decrease the average CPI of all FP instructions to 2.5, clock freq = 1.1 GHz

Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
• Doesn’t account for
  • Differences in ISAs between computers
  • Differences in complexity between instructions

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

• CPI varies between programs on a given CPU

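To see how much the MIPS rating of a single CPU can swing between programs, the sketch below applies the last form of the formula to three rows of the CINT2006 table for the Opteron X4 2356 that follows in these slides. The 2.5 GHz clock rate is derived from the 0.40 ns cycle time in that table; the function name is illustrative.

```python
def mips(clock_rate_hz, cpi):
    """MIPS rating = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CYCLE_TIME_NS = 0.40                  # from the table's Tc column
clock_rate_hz = 1e9 / CYCLE_TIME_NS   # 0.40 ns per cycle is 2.5 GHz

# CPI values from the perl, bzip2 and mcf rows of the table.
for name, cpi in (("perl", 0.75), ("bzip2", 0.85), ("mcf", 10.00)):
    print(f"{name:6s} CPI {cpi:5.2f} -> {mips(clock_rate_hz, cpi):6.0f} MIPS")
```

The same processor rates at roughly 3,333 MIPS on perl and 250 MIPS on mcf, which is why a single MIPS figure says little about a machine.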
SPEC CPU Benchmark
• Programs used to measure performance
  • Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  • Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  • Elapsed time to execute a selection of programs
    • Negligible I/O, so focuses on CPU performance
  • Normalize relative to reference machine
  • Summarize as geometric mean of performance ratios
    • CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

CINT2006 for Opteron X4 2356

Name        Description                    IC ×10^9  CPI    Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl        Interpreted string processing  2,118     0.75   0.40     637            9,777         15.3
bzip2       Block-sorting compression      2,389     0.85   0.40     817            9,650         11.8
gcc         GNU C Compiler                 1,050     1.72   0.40     724            8,050         11.1
mcf         Combinatorial optimization     336       10.00  0.40     1,345          9,120         6.8
go          Go game (AI)                   1,658     1.09   0.40     721            10,490        14.6
hmmer       Search gene sequence           2,783     0.80   0.40     890            9,330         10.5
sjeng       Chess game (AI)                2,176     0.96   0.40     837            12,100        14.5
libquantum  Quantum computer simulation    1,623     1.61   0.40     1,047          20,720        19.8
h264avc     Video compression              3,102     0.80   0.40     993            22,130        22.3
omnetpp     Discrete event simulation      587       2.94   0.40     690            6,250         9.1
astar       Games/path finding             1,082     1.79   0.40     773            7,020         9.1
xalancbmk   XML parsing                    1,058     2.70   0.40     1,143          6,900         6.0

Geometric mean of SPECratios: 11.7

High cache miss rates

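The geometric mean in the last row can be recomputed from the SPECratio column. A minimal sketch; the dictionary and function names are illustrative.

```python
import math

# SPECratio column of the CINT2006 table for the Opteron X4 2356.
spec_ratios = {
    "perl": 15.3, "bzip2": 11.8, "gcc": 11.1, "mcf": 6.8,
    "go": 14.6, "hmmer": 10.5, "sjeng": 14.5, "libquantum": 19.8,
    "h264avc": 22.3, "omnetpp": 9.1, "astar": 9.1, "xalancbmk": 6.0,
}


def geometric_mean(values):
    """n-th root of the product, computed via logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


summary = geometric_mean(spec_ratios.values())
print(f"Geometric mean of SPECratios: {summary:.1f}")
```

This reproduces the 11.7 reported in the table.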
SPEC Benchmark
• Desktop Benchmarks
  • CPU-intensive benchmarks
    • SPEC89
    • SPEC92
    • SPEC95
    • SPEC2000
    • SPEC2006
  • Graphics-intensive benchmarks
    • SPEC2000
    • SPECviewperf: used for benchmarking systems supporting the OpenGL graphics library
    • SPECapc: consists of applications that make extensive use of graphics

SPEC Benchmark
• Server Benchmarks
  • SPECrate: processing rate of a multiprocessor
  • SPECSFS: file server benchmark
  • SPECWeb: Web server benchmark
• Transaction-processing (TP) benchmarks
  • TPC benchmarks (Transaction Processing Council)
    • TPC-A, 1985
    • TPC-C, 1992
    • TPC-H, TPC-R, TPC-W

SPEC Benchmark
• Embedded Benchmarks
  • EDN Embedded Microprocessor Benchmark Consortium (or EEMBC, pronounced “embassy”).

Power Consumption Trends
• Power = Dynamic power + Leakage power
  • $\text{Dynamic power} \propto \text{activity} \times \text{capacitance} \times \text{voltage}^2 \times \text{frequency}$
  • Capacitance per transistor and voltage are decreasing, but number of transistors and frequency are increasing at a faster rate
  • Leakage power is also rising and will soon match dynamic power
• Power consumption is already around 100 W in some high-performance processors today

Power wall

$$\text{Power} = K \cdot (\text{Capacitive Load}) \cdot (\text{Voltage})^2 \cdot (\text{Frequency Switched})$$

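Because voltage enters the power relation squared, modest reductions compound quickly. The sketch below compares a new design against an old one; the 0.85 scaling factors are hypothetical numbers chosen only for illustration and do not come from the slides.

```python
def relative_dynamic_power(cap_scale, voltage_scale, freq_scale):
    """Ratio of new to old dynamic power; the constant K cancels out."""
    return cap_scale * voltage_scale ** 2 * freq_scale


# Hypothetical: capacitive load, voltage and frequency each scaled to 85%.
ratio = relative_dynamic_power(cap_scale=0.85, voltage_scale=0.85,
                               freq_scale=0.85)
print(f"New design uses {ratio:.2f} of the original dynamic power")
```

With these made-up factors the new design would draw roughly half the dynamic power of the old one.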
Fallacy: Low Power at Idle
• Look back at X4 power benchmark
  • At 100% load: 295 W
  • At 50% load: 246 W (83%)
  • At 10% load: 180 W (61%)
• Google data center
  • Mostly operates at 10% – 50% load
  • At 100% load less than 1% of the time
• Consider designing processors to make power proportional to load

Performance Metrics
• MIPS: Millions of instructions per second
• MFLOPS: Millions of floating-point operations per second
• Topic 4.5 From Book
• http://ece-research.unm.edu/jimp/611/slides/chap1_3.html
