Performance

This document discusses performance metrics for measuring computer systems. It covers execution time, throughput, and component metrics, and how to measure and analyze performance properly. Key points include what makes a good metric, how to ensure reproducible experiments, which metrics to avoid (such as MIPS), and how averages of times and rates should be calculated.

Uploaded by

bijan shrestha
Copyright © All Rights Reserved

Performance Metrics

Why study performance metrics?


• determine the benefit/lack of benefit of designs
• computer design is too complex to intuit performance &
performance bottlenecks
• have to be careful about what you mean to measure & how
you measure it

What you should get out of this discussion


• good metrics for measuring computer performance
• what they should be used for
• what metrics you shouldn’t use & how metrics are misused

perf
Performance of Computer Systems
Many different factors to take into account when determining
performance:
• Technology
• circuit speed (clock, MHz)
• processor technology (how many transistors on a chip)
• Organization
• type of processor (ILP)
• configuration of the memory hierarchy
• type of I/O devices
• number of processors in the system
• Software
• quality of the compilers
• organization & quality of OS, databases, etc.

“Principles” of Experimentation

Meaningful metrics
execution time & component metrics that explain it

Reproducibility
machine configuration, compiler & optimization level, OS, input

Real programs
no toy programs, kernels, or synthetic benchmarks
SPEC is the norm (integer, floating point, graphics, webserver)
TPC-B, TPC-C & TPC-D for database transactions

Simulation
long executions, warm start to mimic steady-state behavior
usually applications only; some OS simulation
simulator “validation” & internal checks for accuracy

Metrics that Measure Performance
Raw speed: peak performance (never attained)

Execution time: time to execute one program from beginning to end
• the “performance bottom line”
• wall clock time, response time
• Unix time function: 13.7u 23.6s 18:27 3%

Throughput: total amount of work completed in a given time
• transactions (database) or packets (web servers) / second
• an indication of how well hardware resources are being used
• good metric for chip designers or managers of computer systems

(Often improving execution time will improve throughput & vice versa.)

Component metrics: subsystem performance, e.g., memory behavior
• help explain how execution time was obtained
• pinpoint performance bottlenecks

Execution Time

PerformanceA = 1 / ExecutionTimeA

Processor A is faster than processor B, i.e.,

ExecutionTimeA < ExecutionTimeB
PerformanceA > PerformanceB

Relative Performance

n = PerformanceA / PerformanceB = ExecutionTimeB / ExecutionTimeA

performance of A is n times greater than that of B
execution time on B is n times longer than on A
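A quick numeric sketch of this relationship, using hypothetical execution times:

```python
# Relative performance from execution times (hypothetical values).
time_a = 10.0   # seconds on processor A
time_b = 15.0   # seconds on processor B

perf_a = 1 / time_a
perf_b = 1 / time_b

n = perf_a / perf_b   # equals time_b / time_a
assert abs(n - time_b / time_a) < 1e-12
print(f"A is {n:.1f}x faster than B")
```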

CPU Execution Time
The time the CPU spends executing an application
• no memory effects
• no I/O
• no effects of multiprogramming

CPUExecutionTime = CPUClockCycles * ClockCycleTime

Cycle time (clock period) can be expressed as a time or as a rate
• clock cycle time = 1 / clock cycle rate

CPUExecutionTime = CPUClockCycles / ClockCycleRate

• clock cycle rate of 1 MHz = cycle time of 1 µs
• clock cycle rate of 1 GHz = cycle time of 1 ns
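The reciprocal relationship above can be checked directly; a minimal sketch:

```python
# Clock cycle time (seconds) is the reciprocal of clock rate (Hz).
def cycle_time(rate_hz: float) -> float:
    return 1.0 / rate_hz

assert cycle_time(1e6) == 1e-6   # 1 MHz -> 1 microsecond
assert cycle_time(1e9) == 1e-9   # 1 GHz -> 1 nanosecond
```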

CPI

CPUClockCycles = NumberOfInstructions * CPI

CPI = average number of clock cycles per instruction
• a throughput metric
• a component metric, not a measure of overall performance
• used for processor organization studies, given a fixed compiler & ISA

Can have different CPIs for different classes of instructions,
e.g., floating point instructions take longer than integer instructions

                  n
CPUClockCycles =  Σ (CPIi × Ci)
                 i=1

where CPIi = the CPI for the ith class of instructions
and Ci = the number of instructions of the ith class that have been executed

Improving part of the architecture can improve a CPIi
• we then talk about the contribution of an instruction class to the overall CPI
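The per-class sum can be sketched in Python, using a hypothetical instruction mix:

```python
# CPUClockCycles = sum over classes of CPIi * Ci (hypothetical mix).
mix = [
    ("integer",        1.0, 500_000),   # (class, CPIi, Ci)
    ("load/store",     2.0, 300_000),
    ("floating point", 4.0, 200_000),
]

cycles = sum(cpi * count for _, cpi, count in mix)
instructions = sum(count for _, _, count in mix)
avg_cpi = cycles / instructions   # overall CPI for this mix

assert cycles == 1_900_000
assert abs(avg_cpi - 1.9) < 1e-12
```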

CPU Execution Time

CPUExecutionTime = NumberOfInstructions * CPI * ClockCycleTime

To measure:
• execution time: depends on all 3 factors
• time the program
• number of instructions: determined by the ISA
• programmable hardware counters
• profiling
• count number of times each basic block is executed
• instruction sampling
• CPI: determined by the ISA & implementation
• simulator: interpret (in software) every instruction &
calculate the number of cycles it takes to simulate it
• clock cycle time: determined by the implementation & process
technology

Factors are interdependent:
• RISC: increases instructions/program, but decreases CPI &
clock cycle time because the instructions are simple
• CISC: decreases instructions/program, but increases CPI &
clock cycle time because many instructions are more complex
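The interdependence can be illustrated with hypothetical numbers for the same program compiled for a RISC-like and a CISC-like machine:

```python
# CPUExecutionTime = instructions * CPI * clock cycle time
def exec_time(instructions: int, cpi: float, cycle_time_s: float) -> float:
    return instructions * cpi * cycle_time_s

risc = exec_time(1_200_000, 1.2, 1e-9)   # more, but simpler, instructions
cisc = exec_time(  800_000, 2.5, 2e-9)   # fewer, but more complex, instructions

# Neither factor alone decides the winner; only the product matters.
assert risc < cisc
```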

Metrics Not to Use
MIPS (millions of instructions per second)
MIPS = instruction count / (execution time × 10^6)
     = clock rate / (CPI × 10^6)
- instruction set-dependent (even for similar architectures)
- implementation-dependent
- compiler technology-dependent
- program-dependent
+ intuitive: the higher, the better

MFLOPS (millions of floating point operations per second)
MFLOPS = floating point operations / (execution time × 10^6)
+ FP operations are independent of the FP instruction implementation
- different machines implement different FP operations
- different FP operations take different amounts of time
- only measures FP code

static metrics (code size)
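A sketch of why MIPS can mislead: with two hypothetical machines X and Y running the same program, the machine with the higher MIPS rating can still take longer.

```python
# MIPS = clock rate / (CPI * 10^6); time = instructions * CPI / clock rate
def mips(clock_hz: float, cpi: float) -> float:
    return clock_hz / (cpi * 1e6)

# Hypothetical: X executes simple instructions quickly (low CPI)
# but needs more of them for the same program.
x_mips = mips(1e9, 1.0)                 # 1000 MIPS
y_mips = mips(1e9, 2.0)                 #  500 MIPS

x_time = 2_000_000 * 1.0 / 1e9          # 2.0 ms (2M instructions)
y_time =   800_000 * 2.0 / 1e9          # 1.6 ms (0.8M instructions)

assert x_mips > y_mips and x_time > y_time   # higher MIPS, yet slower
```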

Means
Measuring the performance of a workload
• arithmetic mean: used for averaging execution times

  AM = (1/n) × Σ(i=1..n) timei

• harmonic mean: used for averaging rates ("the average of", as
  opposed to "the average statistic of")

  HM = p / Σ(i=1..p) (1 / ratei)

• weighted arithmetic mean: used when the programs are executed with
  different frequencies, for example:

  WM = (1/n) × Σ(i=1..n) (timei × weighti)
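A small sketch with hypothetical data, showing why the harmonic mean is the right average for rates:

```python
# Two programs, each doing the same amount of work (hypothetical numbers).
times = [2.0, 8.0]       # seconds per program
work  = 100.0            # operations per program

arith_time = sum(times) / len(times)                 # 5.0 s (arithmetic mean)
rates = [work / t for t in times]                    # [50.0, 12.5] ops/s
harm_rate = len(rates) / sum(1 / r for r in rates)   # harmonic mean of rates

# The harmonic mean of the rates equals total work / total time:
assert abs(harm_rate - 2 * work / sum(times)) < 1e-9
```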

Means

Time (secs)

            FP Ops   Computer A   Computer B   Computer C
program 1      100            1           10           20
program 2      100         1000          100           20
total                      1001          110           40
arith mean                  500.5         55           20

Rate (FLOPS)

            FP Ops   Computer A   Computer B   Computer C
program 1      100          100           10            5
program 2      100            0.1          1            5
harm mean                     0.2          1.8          5
arith mean                   50.1          5.5          5

Computer C is ~25 times faster than A when measuring execution time
(1001 / 40 ≈ 25).

Still true when measuring MFLOPS (a rate) with the harmonic mean
(5 / 0.2 = 25).
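The table's means can be recomputed programmatically from the per-program times:

```python
# Recompute the table's means from per-program times (100 FP ops each).
fp_ops = 100.0
times = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [20.0, 20.0]}

def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)

for name, ts in times.items():
    rates = [fp_ops / t for t in ts]
    # Harmonic mean of rates == total work / total time, so it ranks
    # machines the same way execution time does.
    assert abs(harmonic_mean(rates) - len(ts) * fp_ops / sum(ts)) < 1e-9

speedup = sum(times["A"]) / sum(times["C"])   # 1001 / 40
assert round(speedup) == 25                   # C is ~25x faster than A
```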

Speedup

Speedup = ExecutionTimeBeforeImprovement / ExecutionTimeAfterImprovement

Amdahl’s Law:
Performance improvement from speeding up one part of a computer
system is limited by the proportion of time the enhancement is used.
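Amdahl’s Law is commonly written as Speedup = 1 / ((1 − f) + f/s), where f is the fraction of execution time affected and s is the speedup of that part. A sketch with hypothetical numbers:

```python
# Amdahl's Law: overall speedup when fraction f of execution time
# is sped up by a factor s.
def amdahl(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

# Hypothetical: the FP unit is made 10x faster, but FP work is only
# 40% of total execution time.
assert abs(amdahl(0.4, 10.0) - 1.5625) < 1e-12

# Even an infinite speedup of that 40% caps the overall gain at 1/0.6:
assert abs(amdahl(0.4, 1e12) - 1.0 / 0.6) < 1e-3
```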
