Lec 2

The document discusses performance issues in computer systems, highlighting the dramatic increase in computing power and the need for efficient design to handle demanding applications. It covers various strategies to improve performance, such as optimizing clock cycles, using multicore processors, and applying Amdahl's Law for resource allocation. Additionally, it emphasizes the importance of benchmarking and the role of SPEC in evaluating system performance.

Uploaded by

mohamedhuawei010

+ Lecture 2

Performance Issues
+
Designing for Performance
• The cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically
• Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
• Processors are so inexpensive that we now have microprocessors we throw away
• Desktop applications that require the great power of today’s microprocessor-based systems include:
  – Image processing
  – Three-dimensional rendering
  – Speech recognition
  – Videoconferencing
  – Multimedia authoring
  – Voice and video annotation of files
  – Simulation modeling
• Businesses are relying on increasingly powerful servers to handle transaction and database processing and to support massive client/server networks that have replaced the huge mainframe computer centers of yesteryear
• Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-transaction-rate applications for a broad spectrum of clients
+
Performance Balance
• Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
• Architectural examples include:
  – Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  – Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  – Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  – Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow
+
Improvements in Chip Organization and Architecture
• Increase hardware speed of processor
  – Fundamentally due to shrinking logic gate size
  – More gates, packed more tightly, increasing clock rate
  – Propagation time for signals reduced
• Increase size and speed of caches
  – Dedicating part of processor chip
  – Cache access times drop significantly
• Change processor organization and architecture
  – Increase effective speed of instruction execution
  – Parallelism
+

How to Improve Performance?

• To improve performance you can either:
  – Decrease the CPI (clock cycles per instruction) by using new hardware.
  – Decrease the clock cycle time (that is, increase the clock rate) by reducing propagation delays or by using pipelining.
  – Decrease the number of required cycles by improving the ISA or the compiler.

06/08/2025
+
Relative Performance
+
Measuring Execution Time
+
Execution Time and CPU Clocking
+
CPU Time
+
EXAMPLE
 Our favorite program runs in 10 seconds on computer A,
which has a 2 GHz clock. We are trying to help a
computer designer build a computer, B, which will run
this program in 6 seconds. The designer has determined
that a substantial increase in the clock rate is possible,
but this increase will affect the rest of the CPU design,
causing computer B to require 1.2 times as many clock
cycles as computer A for this program. What clock rate
should we tell the designer to target?
+
Answer

To run the program in 6 seconds, B must have twice the clock rate of A.
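
The answer follows from the relation CPU time = clock cycles / clock rate. A minimal Python sketch of the arithmetic, using only the numbers given in the example (the variable names are illustrative and not from the slides):

```python
# CPU time = clock cycles / clock rate
time_a = 10.0                 # seconds on computer A
rate_a = 2e9                  # A runs at 2 GHz
cycles_a = time_a * rate_a    # clock cycles the program needs on A

time_b = 6.0                  # target run time on computer B
cycles_b = 1.2 * cycles_a     # B needs 1.2 times as many cycles as A
rate_b = cycles_b / time_b    # clock rate B must reach

print(f"B clock rate: {rate_b / 1e9:.1f} GHz ({rate_b / rate_a:.1f}x A)")
```

So the designer should target about 4 GHz, which is twice the clock rate of A.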
+
Instruction Count and CPI
Clock Cycles per Instruction (CPI)
• Instructions take different number of cycles to execute
– Multiplication takes more time than addition
– Floating point operations take longer than integer ones
– Accessing memory takes more time than accessing registers
• CPI is an average number of clock cycles per instruction

[Figure: instructions I1 to I7 executing over 14 clock cycles]
CPI = 14/7 = 2

• Important point
Changing the cycle time often changes the number of cycles required
for various instructions (more later)
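
The averaging behind CPI can be sketched in a few lines of Python. The per-instruction cycle counts below are assumed values chosen only so that seven instructions total 14 cycles, as in the figure; the slide does not give the individual counts.

```python
def average_cpi(cycles_per_instruction):
    """Average CPI = total clock cycles / number of instructions executed."""
    return sum(cycles_per_instruction) / len(cycles_per_instruction)

# Hypothetical cycle counts for I1..I7 (assumed; only the total of 14 is from the slide)
cycles = [1, 3, 2, 2, 1, 4, 1]
print(average_cpi(cycles))  # 2.0
```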

+
CPI Example
+
CPI in More Detail
Performance Equation
• To execute, a given program will require …
– Some number of machine instructions
– Some number of clock cycles
– Some number of seconds
• We can relate CPU clock cycles to instruction count
CPU cycles = Instruction Count × CPI

• Performance Equation: (related to instruction count)


Time = Instruction Count × CPI × cycle time
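
As a rough illustration, the performance equation can be written as a small Python function. The function name and the sample numbers are assumptions made for this sketch, not values from the slides.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Time = Instruction Count x CPI x cycle time, with cycle time = 1 / clock rate."""
    cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * cycle_time

# Hypothetical program: 10 billion instructions, average CPI of 2, 2 GHz clock
seconds = cpu_time(10e9, 2.0, 2e9)
print(f"{seconds:.2f} s")
```

Halving any one of the three factors while the other two stay fixed halves the run time, which is why each factor is a separate target for optimization.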

[Figure: who influences each term of the equation. Instruction count: program, compiler, architecture (ISA). CPI: compiler (scheduling), organization (uArch), microarchitects. Cycle time: physical design, circuit designers.]
Understanding Performance Equation
Time = Instruction Count × CPI × cycle time

• How to improve (i.e. decrease) CPU time:


– Clock rate: hardware technology & organization,
– CPI: organization, ISA and compiler technology,
– Instruction count: ISA & compiler technology.
Many potential performance improvement techniques primarily improve
one component with small or predictable impact on the other two.
               I-Count   CPI   Cycle
Program           X
Compiler          X       X
ISA               X       X
Organization              X      X
Technology                       X
+
CPI Example
+
Performance Summary
+
Clock Rate and Power Trends
+
Reducing Power
• Suppose a new CPU has
  – 85% of capacitive load of old CPU
  – 15% voltage and 15% frequency reduction
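
The slide text does not include the power formula, so the sketch below assumes the usual dynamic power model for CMOS logic, power ∝ capacitive load × voltage² × frequency. Under that assumption, the new-to-old power ratio can be estimated as follows (function and variable names are illustrative):

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Assumed model: dynamic power is proportional to C * V^2 * f."""
    return capacitive_load * voltage ** 2 * frequency

# Old CPU normalized to 1.0 for every quantity; the new CPU is at 85% of each
old_power = dynamic_power(1.0, 1.0, 1.0)
new_power = dynamic_power(0.85, 0.85, 0.85)
print(f"new/old power: {new_power / old_power:.2f}")  # 0.52
```

Because voltage enters squared, the ratio is 0.85⁴, so the new CPU would draw roughly half the power of the old one.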

• The power wall
  – We can’t reduce voltage further
  – We can’t remove more heat
• How else can we improve performance?


+
Multiprocessors
• Multicore microprocessors
  – More than one processor per chip
• Requires explicitly parallel programming
  – Compare with instruction level parallelism
    – Hardware executes multiple instructions at once
    – Hidden from the programmer
  – Hard to do
    – Programming for performance
    – Load balancing
    – Optimizing communication and synchronization
+
Amdahl's Law
• In computer architecture, Amdahl's law (or Amdahl's argument) is a formula that shows how much faster a task can be completed when you add more resources to the system.
• The law can be stated as: "the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used".
• Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors.
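
For the parallel computing case, the law is commonly written as speedup = 1 / ((1 − p) + p / n), where p is the fraction of run time that can be parallelized and n is the number of processors. The slide text does not spell out this formula, so the Python sketch below is an illustration under that standard form, with assumed names and a sample value of p = 0.8.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Theoretical speedup when only `parallel_fraction` of the run time scales with n."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# With 80% of the work parallelizable, speedup approaches but never reaches 1 / 0.2 = 5
for n in (2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

The serial 20% sets the ceiling: adding processors beyond a certain point yields almost no further gain.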
+
Amdahl's Law
+
Example on Amdahl's Law

• Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
• Solution:
  Suppose we improve multiplication by a factor s
  25 sec (4 times faster) = 80 sec / s + 20 sec
  s = 80 / (25 – 20) = 80 / 5 = 16
  Improve the speed of multiplication by s = 16 times
• How about making the program 5 times faster?
  20 sec (5 times faster) = 80 sec / s + 20 sec
  s = 80 / (20 – 20) = ∞
  Impossible to make 5 times faster!
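
The same calculation can be sketched in Python. The helper name `required_improvement` is hypothetical; it solves target time = affected / s + unaffected for s and returns infinity when the unaffected part alone already uses up the target time.

```python
import math

def required_improvement(total_time, affected_time, target_speedup):
    """Factor s by which the affected part must speed up to hit the target overall speedup."""
    target_time = total_time / target_speedup
    unaffected_time = total_time - affected_time
    remaining = target_time - unaffected_time   # time budget left for the affected part
    if remaining <= 0:
        return math.inf                         # no finite s can reach the target
    return affected_time / remaining

print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # inf
```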

+
Benchmark Principles

• Desirable characteristics of a benchmark program:
  1. It is written in a high-level language, making it portable across different machines
  2. It is representative of a particular kind of programming domain or paradigm, such as systems programming, numerical programming, or commercial programming
  3. It can be measured easily
  4. It has wide distribution
+
Standard Performance Evaluation Corporation (SPEC)
• Benchmark suite
  – A collection of programs, defined in a high-level language
  – Together attempt to provide a representative test of a computer in a particular application or system programming area
• SPEC
  – An industry consortium
  – Defines and maintains the best known collection of benchmark suites aimed at evaluating computer systems
  – Performance measurements are widely used for comparison and research purposes
+
SPEC CPU2006
• Best known SPEC benchmark suite
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
• Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
• Suite contains over 3 million lines of code
• Fifth generation of processor intensive suites from SPEC
+
Pitfall: MIPS as a Performance Metric
+
Concluding Remarks
