
CSCI 5204/EE5364

Advanced Computer Architecture

Performance, Power, Dependability


Pen-Chung Yew
Department of Computer Science and Engineering
University of Minnesota
http://www.cs.umn.edu/~yew

With slides from: Profs. Zhai, Mowry, Falsafi, Hill, Hoe, Lipasti, Shen,
Smith, Sohi, Vijaykumar, Patterson, Culler
Teaching Staff and Course Resources
Instructor: Prof. Pen-Chung Yew
• Office: 6-225C Keller Hall
• E-mail: [email protected]
• Office phone: 612-625-7387
• Office hour: 2:30 pm – 3:30 pm Tuesday
TA: Kartik Ramkrishnan
• Office: 6-244 Keller Hall
• Email: [email protected]
• Office hour: 1:30 pm – 3:30 pm Friday @ Rm L-103, Lind Hall
Web Page: https://canvas.umn.edu/courses/460175

Textbooks
Required Readings:
• Computer Architecture: A Quantitative Approach, 6th Edition, John L.
Hennessy and David A. Patterson. Morgan Kaufmann Publishers, 2018.
(Available at the University Bookstore.)

Suggested Readings
• Various research papers: Available online from the course
webpage.

More Information and Tech Papers
Major Architecture Conferences:
• International Symposium on Computer Architecture (ISCA)
• International Symposium on Microarchitecture (Micro)
• High-Performance Computer Architecture (HPCA)
• Architectural Support for Programming Languages and Operating
Systems (ASPLOS)
Major Compiler Conferences:
• Programming Language Design and Implementation (PLDI)
• Code Generation and Optimization (CGO)

Technical Weekly Reports


https://www.hpcwire.com/

Tentative Course Organization
• 2 Mid-term Exams (50%)
• 75-minute in-class exam each;
• Scheduled on Tue 10/15 & Tue 12/10, or check Canvas web site
• Each covers its respective portion of course materials;
• No final exam
• No make-up exams

• 2 Homework Assignments (20%)


• Project (30%)
• Team projects - a team of 2-3 students.
• A scaled-down version of a real research project (~8 weeks);
• Preferably aligned with your research interests, e.g., MS/PhD
work, or future job prospects.
• Novel ideas are encouraged, but not necessary
• Could be 1-person paper surveys
• Discuss your project proposal with Professor Yew
• Project proposals due on Thursday 9/26/2024
Key to High Performance and Low Power:
Exploiting Various Kinds of Parallelism

• Classes of parallelism in applications: (Software)


• Data-Level Parallelism (DLP)
• Task-Level Parallelism (TLP)
• Request-Level Parallelism (RLP)
• Classes of architectural parallelism: (Hardware)
• Instruction-Level Parallelism (ILP)
• Vector architectures/Graphic Processor Units (GPUs)
• Thread-Level Parallelism (multicores)
• Request-Level Parallelism (warehouse-scale computers)
Flynn’s Taxonomy
• Single instruction stream, single data stream (SISD)
• Single instruction stream, multiple data streams (SIMD)
• Vector architectures
• Multimedia extensions (Intel AVX-512)
• Graphics processor units (GPUs)

• Multiple instruction streams, multiple data streams (MIMD)


• Tightly-coupled MIMD - scientific computing (latency)
• Loosely-coupled MIMD - multiprogramming (throughput)

• Multiple instruction streams, single data stream (MISD)


• No commercial implementation
Outline – Performance, Power, Dependability

1. Define, quantify, and summarize performance


2. Define and quantify power
3. Define and quantify dependability

Measuring System Performance
• Two typical system performance metrics
• Speedup
• Execution time
• Benchmarks
Two Typical System Performance Metrics
• Response time (Figures 1.9 and 1.10)
• Also known as (a.k.a.) latency, running time, elapsed time,
completion time, execution time
• Time between end of an inquiry or a request and beginning of a
response
• 50-90X improvement for processors (over 25-40 years)
• 6-8X improvement for memory and disks (over 25-40 yrs)

• Throughput (Figure 1.9 and 1.10)


• The amount of work that a computer can do in a given period
• 32,000-40,000X improvement for processors (over 25-40 yrs)
• 300-1200X improvement for memory and disks (over 25-40 yrs)
Latency (Response Time) and Bandwidth (Throughput)

Figure 1.9 Log-log plot of bandwidth and latency milestones in Figure 1.10
relative to the first milestone (over 25–40 yrs).
• Latency improved 8–91×
• Bandwidth improved about 400–32,000×
• Except for networking, the other three technologies have seen only modest
improvements in latency and bandwidth since ~2012: 0%–23% in latency and
23%–70% in bandwidth
Figure 1.10 Performance milestones over 25–40 years for microprocessors,
memory, networks, and disks.
• Microprocessor milestones are several generations of IA-32 processors,
from the 16-bit 80286 to the 64-bit multicore, out-of-order, superpipelined
Core i7.
• Memory milestones go from 16-bit-wide, plain DRAM to 64-bit-wide DDR4
SDRAM.
• Ethernet advanced from 10 Mbits/s to 400 Gbits/s.
• Disk milestones are based on rotation speed, from 3600 to 15,000 RPM.
• Each case is best-case bandwidth, and latency is the time for a simple
operation assuming no contention.
Performance Expressed as Time
Definition:

    Performance_X = 1 / Execution Time_X

Performance of system X is N times faster than system Y:

    Speedup = Performance_X / Performance_Y
            = Execution Time_Y / Execution Time_X
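The definition above translates directly into code. A minimal sketch (the
function and variable names are mine, not from the slides):

    # Sketch: speedup of system X over system Y, from measured execution times.
    def speedup(exec_time_y, exec_time_x):
        return exec_time_y / exec_time_x

    # Example: if Y takes 10 s and X takes 2 s, X is 5 times faster than Y.
    print(speedup(10.0, 2.0))  # 5.0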
Review: Principles of Computer Design

• Processor Performance Equation and parameters that impact processor
performance (From EE4364/CS4203)

    CPU Time = Instruction Count × CPI × Clock Cycle Time
             = (Instruction Count × CPI) / Clock Rate
Review: Principles of Computer Design

• Different instruction types have different CPIs
  • e.g., an integer add takes 1 cycle, while a floating-point add takes
    3 cycles

    CPU Clock Cycles = Σ_i (IC_i × CPI_i)

IC_i : instruction count for instruction type i
CPI_i : # clock cycles per instruction for instruction type i
Review: CPI Example for Compiler
• Alternative compiled code sequences using instructions in classes A, B, C
                    Class A   Class B   Class C
CPI for class          1         2         3
IC in sequence 1       2         1         2
IC in sequence 2       4         1         1
(IC: instruction count)

Sequence 1: IC = 5                     Sequence 2: IC = 6
Clock Cycles = 2×1 + 1×2 + 2×3 = 10    Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 10/5 = 2.0                  Avg. CPI = 9/6 = 1.5

CPU Time = # Clock Cycles × Clock Cycle Time
         = (Instruction Count × CPI) / Clock Rate

Note: Performance is determined by CPU time, not by instruction count
alone: sequence 2 executes more instructions yet takes fewer cycles.
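A quick sketch that reproduces these numbers (the CPIs and instruction
counts come from the table above; the helper names are mine):

    # Sketch: clock cycles and average CPI for the two compiled sequences.
    cpi = {"A": 1, "B": 2, "C": 3}    # cycles per instruction class
    seq1 = {"A": 2, "B": 1, "C": 2}   # instruction counts, sequence 1
    seq2 = {"A": 4, "B": 1, "C": 1}   # instruction counts, sequence 2

    def clock_cycles(ic_by_class):
        return sum(ic * cpi[cls] for cls, ic in ic_by_class.items())

    for name, seq in [("Sequence 1", seq1), ("Sequence 2", seq2)]:
        ic = sum(seq.values())
        print(name, "cycles =", clock_cycles(seq),
              "avg CPI =", clock_cycles(seq) / ic)
    # Sequence 1: cycles = 10, avg CPI = 2.0
    # Sequence 2: cycles = 9,  avg CPI = 1.5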
Review: Components of Performance

CPU Time = Instruction Count × CPI × Clock Cycle Time

                        Instruction Count   CPI   Clock Rate
Application Program            X
Compiler                       X            (X)
Instruction Set (ISA)          X             X
Computer Organization                        X        X
Hardware Technology                                   X
Performance Expressed as Rate
• Rates are performance measures expressed in units of work
per unit time.
• Examples:
• millions of instructions / sec (MIPS)
• millions of floating point instructions / sec (MFLOPS)
• millions of bytes / sec (MBytes/sec)
• millions of bits / sec (Mbits/sec)
• Frames of images / sec
• samples / sec
• transactions / sec (TPS)

Example: MIPS (Million Instructions Per Second)

• Using rates as a performance measure can be misleading


• Example: millions of instructions / sec (MIPS)
• Instruction sets may be different
• Instructions do different amounts of work and take
different amounts of time to finish
• Compiler may be different
• Different instruction sequences can be chosen for
the same task with different optimization levels
• Input may be different
• Different program inputs may cause different
instruction mixes
Peak Rate
• Peak performance can be misleading
• Example: the i860 is advertised as having a peak rate of
80 MFLOPS (40 MHz with 2 flops per cycle).
• However, the measured performance of some compiled
linear algebra kernels (icc -O2) tells a different story:

Kernel               1d fft  sasum  saxpy  sdot  sgemm  sgemv  spvma
MFLOPS                 8.5    3.2    6.1   10.3    6.2   15.0    8.1
%Peak (efficiency)     11%     4%     7%    13%     8%    19%    10%
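Peak efficiency is simply the measured rate divided by the advertised peak
rate; a minimal sketch (kernel rates from the table above, rounding mine):

    # Sketch: efficiency = measured rate / peak rate (i860 peak = 80 MFLOPS).
    peak_mflops = 80.0
    measured = {"1d fft": 8.5, "sasum": 3.2, "saxpy": 6.1, "sdot": 10.3,
                "sgemm": 6.2, "sgemv": 15.0, "spvma": 8.1}
    for kernel, mflops in measured.items():
        print(f"{kernel}: {mflops / peak_mflops:.0%} of peak")
    # e.g. sdot reaches 10.3/80, about 13% of the advertised peak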
Choosing Programs to Evaluate Performance
• Five levels of programs used:
• Real applications
• Modified (or scripted) applications
• Kernels
• Toy benchmarks
• Synthetic benchmarks

• How do we decide if computer A is better than computer B?
• Which set of programs (benchmarks) should we use
for performance evaluation?
www.spec.org
• SPEC – System Performance Evaluation Corporation
• Desktop/Workstation benchmarks
• CPU intensive benchmarks
• Integer applications (CINT2006, 2017)

• Floating point applications (CFP2006, 2017)

• Graphics-intensive benchmarks
• SPECviewperf12

• SPECapc (Application Performance Characterization)

• Server Benchmarks
• CPU throughput benchmarks (SPECrate)
• File server benchmarks (SPECsfs)
• Web server benchmarks (SPECweb)
SPEC CPU Benchmarks (1989 – 2017)
SPEC 2017 Active Benchmark Suites

How to Compare Performance Across a Benchmark Suite?
Computer A Computer B Computer C
Program 1 (secs) 1 10 20
Program 2 (secs) 1000 100 20

A is 10 times faster than B for program 1


B is 10 times faster than A for program 2
A is 20 times faster than C for program 1
C is 50 times faster than A for program 2
B is 2 times faster than C for program 1
C is 5 times faster than B for program 2

• All statements are correct

But, which computer performs better?


Measure I: Total/Average Execution Time

                  Computer A (sec)   Computer B (sec)   Computer C (sec)
Program 1                 1                 10                 20
Program 2              1000                100                 20
------------------------------------------------------------------------
Arithmetic Mean       500.5                 55                 20

    Arithmetic Mean = (1/n) × Σ_{i=1}^{n} Time_i

Time_i is the execution time of the i-th program, out of a total of n
programs in the workload
Problems with Arithmetic Mean
§ Applications do not have the same probability of being run
§ Longer programs weigh more heavily in the average
For example, two machines timed on two benchmarks
Machine A Machine B
Program 1 2 seconds (80%) 4 seconds (80%)
Program 2 12 seconds (20%) 8 seconds (20%)

• If we take arithmetic mean, Program 2 “counts more”


than Program 1
• An improvement in Program 2 changes the average
more than a proportional improvement in Program 1
• But Program 1 is run 4X more frequently than Program 2
Example
• You live 20 miles away from school, and you drive
  • 60 miles/hour for the first 10 miles, then 10 miles/hour for the next
    10 miles
• What is your average speed going home?

Wrong answer:
  (60 miles/hour + 10 miles/hour) / 2 = 35 miles/hour (This is wrong!!)
  20 miles / 35 miles/hour = 0.57 hour = 34.2 min

Correct answer:
  10 miles / (60 miles/hour) + 10 miles / (10 miles/hour)
    = 1/6 + 1 = 7/6 hr = 1 hr 10 min
  20 miles / (7/6 hr) = 17.14 miles/hour

Alternatively, weight the speeds by the time spent at each:
  1/6 hr + 1 hr = 7/6 hr total, so the weights are 1/7 and 6/7
  (1/7) × 60 miles/hour + (6/7) × 10 miles/hour = 17.14 miles/hour
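A tiny sketch verifying the correct calculation (the leg distances and
speeds are the ones in the example):

    # Average speed = total distance / total time, not the mean of the speeds.
    legs = [(10, 60), (10, 10)]   # (miles, miles per hour)
    total_miles = sum(dist for dist, _ in legs)
    total_hours = sum(dist / speed for dist, speed in legs)
    print(total_miles / total_hours)   # 17.14... mph, not 35 mph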
Measure II - Weighted Execution Time

                   Machine A (sec)  Machine B (sec)  Machine C (sec)
Program 1                 1               10               20
Program 2              1000              100               20
---------------------------------------------------------------------
Arith Mean (w1)        2.00            10.09               20
Arith Mean (w2)       91.91            18.19               20
Arith Mean (w3)      500.50            55.00               20

Weightings (Program 1, Program 2), each set chosen to equalize the two
programs' weighted times on one machine:
  W(1): .999, .001   (from A: 1000/1001 = 0.999, 1/1001 = 0.001)
  W(2): .909, .091   (from B: 100/110 = 0.909, 10/110 = 0.091)
  W(3): .50,  .50    (from C: 20/40 = 0.5)

    Weighted Arithmetic Mean = Σ_{i=1}^{n} Weight_i × Time_i

Time_i is the execution time of the i-th program in the workload
Weight_i is the frequency of the i-th program in the workload

Problem:
How can we agree on one set of weights?
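A minimal sketch of the weighted arithmetic mean (times and weights from
the table above; helper names are mine):

    # Sketch: weighted arithmetic mean of execution times.
    times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}   # secs per program

    def weighted_mean(ts, weights):
        return sum(w * t for w, t in zip(weights, ts))

    w1 = [0.999, 0.001]   # W(1): equalizes the two programs' weighted time on A
    for machine, ts in times.items():
        print(machine, round(weighted_mean(ts, w1), 2))
    # A 2.0, B 10.09, C 20.0, matching the Arith Mean (w1) row above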
Recap (09/10/2024)
• Course organization – Term project proposal due 9/26/24
• We use execution time, throughput, and power as our performance metrics
• A suite of benchmarks such as those produced by SPEC is
often used to measure and compare performance
• Need to summarize the performance measurements across
benchmark suite and across machines.
• Arithmetic means
• Applications are run disproportionately
• Longer programs have more weights in arithmetic means
• Weighted arithmetic means
• Difficult to determine the weights
• Geometric means
Measure III - Normalized Execution Time
• Normalize execution time to a reference machine
• Take the average of the normalized execution time
Normalized against machine A
Machine A(S) Machine B Machine C
Program 1 1 (1) 10 (10) 20 (20)
Program 2 1000 (1) 100 (0.1) 20 (0.02)
Normalized against machine B
Machine A Machine B(S) Machine C
Program 1 1 (0.1) 10 (1) 20 (2)
Program 2 1000 (10) 100 (1) 20 (0.2)
Normalized against machine C
Machine A Machine B Machine C(S)
Program 1 1(0.05) 10 (0.5) 20 (1)
Program 2 1000(50) 100 (5) 20 (1)
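The normalization step itself is mechanical; a small sketch (machine and
program labels taken from the tables above):

    # Sketch: normalize execution times to a chosen reference machine.
    times = {
        "A": {"P1": 1, "P2": 1000},   # seconds
        "B": {"P1": 10, "P2": 100},
        "C": {"P1": 20, "P2": 20},
    }

    def normalize(times, ref):
        """Each machine's time on each program, relative to machine `ref`."""
        return {m: {p: t / times[ref][p] for p, t in progs.items()}
                for m, progs in times.items()}

    print(normalize(times, "A"))
    # {'A': {'P1': 1.0, 'P2': 1.0}, 'B': {'P1': 10.0, 'P2': 0.1},
    #  'C': {'P1': 20.0, 'P2': 0.02}}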
Geometric Means

    Geometric Mean of rates (relative to a reference)

      = ( Π_{i=1}^{n} (Rate_i / Rate_ref,i) )^(1/n)

      = ( Π_{i=1}^{n} Rate_i )^(1/n) / ( Π_{i=1}^{n} Rate_ref,i )^(1/n)
Normalized Execution Time

                  Normalized to A       Normalized to B       Normalized to C
                  A     B      C        A     B     C         A      B     C
Program 1         1.0   10     20       0.1   1.0   2.0       0.05   0.5   1.0
Program 2         1.0   0.1    0.02     10    1.0   0.2       50     5     1.0
Arithmetic Mean   1.0   5.05   10.01    5.05  1.0   1.1       25.03  2.75  1.0
Geometric Mean    1.0   1.0    0.63     1.0   1.0   0.63      1.58   1.58  1.0
                                                             (1.0)  (1.0) (0.63)

(The parenthesized row re-normalizes the C-referenced geometric means to
machine A: 1.58/1.58 = 1.0 and 1.0/1.58 = 0.63, the same values obtained
with A as the reference. The arithmetic means, by contrast, rank the
machines differently under each reference: each machine looks best when it
is chosen as the reference.)
Calculating Geometric Means

n = # of programs in the benchmark suite

Geometric Means:

    ( Π_{i=1}^{n} (Rate_i / Rate_ref,i) )^(1/n)
      = ( Π_{i=1}^{n} Rate_i )^(1/n) / ( Π_{i=1}^{n} Rate_ref,i )^(1/n)

Arithmetic Means:

    (1/n) Σ_{i=1}^{n} (Time_ref,i / Time_i)
      ≠ ( (1/n) Σ_{i=1}^{n} Time_ref,i ) / ( (1/n) Σ_{i=1}^{n} Time_i )

• For a geometric mean, the ratio of geometric means and the geometric
mean of the ratios are the same, i.e. we can choose any machine as the
reference machine.
• The arithmetic mean does not have this property.
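A short sketch checking this property on the three machines used earlier
(helper names are mine, not from the slides):

    from math import prod

    # Sketch: geometric mean of per-program rate ratios.
    def geomean(xs):
        return prod(xs) ** (1.0 / len(xs))

    times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}   # secs per program

    # Try every machine as the reference:
    for ref in times:
        means = {m: geomean([t / r for t, r in zip(ts, times[ref])])
                 for m, ts in times.items()}
        print("ref =", ref, {m: round(v, 2) for m, v in means.items()})
    # The relative ordering, and every pairwise ratio (e.g. B/A = 1.0),
    # is the same no matter which machine is the reference.

Note that a 2 sec to 1 sec improvement and a 1000 sec to 500 sec
improvement both contribute the same factor-of-2 ratio to a geometric
mean, which is exactly the disadvantage discussed below.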
Pros & Cons of Geometric Mean
Advantages:
• The ratio of geometric means and the geometric mean of the ratios are
the same
• Geometric means of normalized execution times are consistent regardless
of the choice of reference machine
Disadvantages:
• Geometric means do not predict execution time
• They encourage optimizations on benchmarks that are easy to improve
• Example:
  • An optimization that improves one benchmark from 2 sec to 1 sec
  • An optimization that improves another benchmark from 1000 sec to 500 sec
  • Which one did a better optimization job?
  Both look the same under a geometric mean (each is a 2× speedup)!
  But saving 500 sec is more challenging than saving 1 sec
Observations
• Arithmetic mean and geometric mean can come to
different conclusions
• The arithmetic mean performance varies depending on the system you
normalize to, i.e. depending on the reference machine chosen
  → Not ideal for ratios, especially speedups!
• The geometric mean is consistent no matter which
machine is the reference, but
• Does not predict runtime because it normalizes
execution time
• Each application now counts equally

How Does SPEC Report Performance?
• A Sun Ultra Enterprise 2 workstation with a 296-MHz
UltraSPARC II processor is the current reference machine
for SPEC (It was a DEC machine originally)
• It uses the Geometric Mean to report SPEC ratios

Figure 1.19 SPEC2006Cint execution times (in seconds) for the Sun Ultra 5
(the reference computer of SPEC2006) and execution times and SPEC ratios
for the AMD A10 and Intel Xeon E5-2690.
• The final two columns show the ratios of execution times and of SPEC
ratios.
• The ratio of the execution times is identical to the ratio of the SPEC
ratios, and the ratio of the geometric means (63.72/31.91 = 2.00) is
identical to the geometric mean of the ratios (2.00).
• This shows the irrelevance of the reference computer in relative
performance comparisons.
Other Metrics to Measure
• Cost?
• Power and energy?
• Dependability?
Trends in Cost
• Cost driven down by learning curve
• Improving yield
• More good chips produced from the same number of wafers

• DRAM: price closely tracks cost

• Microprocessors: price depends on volume


• 10% less for each doubling of volume
• Amortizing initial engineering costs
Other Metrics to Measure
• Cost?
• Power and energy?
• Dependability?

Growth in Clock Rate on Power

• Intel 80386
consumed ~ 2 W
• 3.3 GHz Intel Core i7
consumes ~130 W
• Heat must be
dissipated from
1.5 x 1.5 cm chip
• This is the limit
of what can be
cooled by air
Power and Energy
• Power: amount of energy used per unit time
• Energy: 1 joule
  = 1 ampere through 1 ohm for 1 second
  = energy required to accelerate a 1 kg mass at 1 m/s² through a distance
    of 1 meter

• Challenges: Get more power in, get more heat out


• Thermal Design Power (TDP)
• Characterizes sustained power consumption
• Used as target for power supply and cooling system
• Lower than peak power (which can be 1.5X higher than TDP), higher than
average power consumption
• Energy-per-task is often a better measurement w.r.t battery life
• There are two components in the power consumed
• Static Power and Dynamic Power
Static Power
• Static power consumption
• 25-50% of total power
• Powerstatic = Currentstatic x Voltage
• Scales with number of transistors
• To reduce: use power gating, i.e. cut off the power
Define and Quantify Dynamic Power
• For CMOS chips, dominant energy consumption has been in
switching transistors on and off, called dynamic power
    Power_dynamic = 1/2 × CapacitiveLoad × Voltage² × FrequencySwitched

• For mobile devices, energy is a better metric for battery life

    Energy_dynamic = CapacitiveLoad × Voltage²
• Capacitive load - a function of number of transistors
connected to output and technology, which determines
capacitance of wires and transistors
• Dropping voltage helps both, so went from 5V to recent 1V
• For a fixed task, slowing clock rate (frequency switched)
reduces power, but not energy
• To save energy & dynamic power, most CPUs now turn off
clock of inactive modules (e.g. Fl. Pt. Unit) – Clock Gating
Example of Quantifying Power
• Suppose a 15% reduction in voltage results in a 15% reduction in
frequency. What is the impact on dynamic power?

    Power_dynamic = 1/2 × CapacitiveLoad × Voltage² × FrequencySwitched

    NewPower_dynamic = 1/2 × CapacitiveLoad × (0.85 × Voltage)²
                           × (0.85 × FrequencySwitched)
                     = (0.85)³ × OldPower_dynamic
                     ≈ 0.6 × OldPower_dynamic
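The same calculation as a sketch (the 0.85 factors are the 15% reductions
from the example; names are mine):

    # Dynamic power = 1/2 * C * V^2 * f; apply 15% cuts to voltage and frequency.
    def dynamic_power(c_load, voltage, freq):
        return 0.5 * c_load * voltage**2 * freq

    old = dynamic_power(1.0, 1.0, 1.0)     # normalized baseline
    new = dynamic_power(1.0, 0.85, 0.85)   # 15% lower voltage and frequency
    print(new / old)                       # 0.85**3 = 0.614..., ~60% of old power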
Leakage Current and Static Power
• Because leakage current still flows even when the clock to a
transistor is switched off (not powered off), now static
power is important too
    Power_static = Current_static × Voltage

• Leakage current increases in processors with smaller


transistor sizes
• Increasing the number of transistors increases static power
even if they are idle
• In 2006, goal for leakage is 25% of total power
consumption; high performance designs at 40%
• Very low power systems even gate voltage to inactive
modules to control loss due to leakage, i.e. power gating
Reducing Power
• Techniques for reducing power:
  • Power Gating (“dark silicon” approach) – turn off power to components
    on chip not in use
  • Dynamic Voltage-Frequency Scaling (DVFS) – changing/scaling voltage
    and clock frequency (rate) to tune performance/power
  • Low-power states for processor, DRAM, disks
  • Overclocking (increasing clock rate briefly) to boost performance

Figure 1.12 Energy savings for a server using an AMD Opteron
microprocessor, 8 GB of DRAM, and one ATA disk.
• At 1.8 GHz, the server can handle at most 2/3 of the workload without
causing service-level violations
• At 1 GHz, it can safely handle only 1/3 of the workload
Other Metrics to Measure
• Cost?
• Power?
• Dependability?

Define and Quantify Dependability (1/2)
• How to decide when a system is operating properly?
• Infrastructure providers now offer Service Level
Agreements (SLA) to guarantee that their networking or
power service would be dependable
• Systems alternate between 2 states of service with respect
to an SLA:
State 1: Service accomplishment, where the service is
delivered as specified in SLA
State 2: Service interruption, where the delivered
service is different from the SLA
• Failure = transition from state 1 to state 2
• Restoration = transition from state 2 to state 1
Define and Quantify Dependability (2/2)
• Module reliability = measure of continuous service accomplishment
(or expected time-to-failure).
• There are 2 metrics:
  • Mean Time To Failure (MTTF) measures Reliability
  • Failure Rate = 1/MTTF; often reported as Failures In Time (FIT), the
    number of failures per billion (10⁹) hours of operation
  • Mean Time To Repair (MTTR) measures Service Interruption
• Mean Time Between Failures (MTBF) = MTTF + MTTR
• Module availability measures service as alternate between states of
accomplishment and interruption (value between 0 and 1, e.g. 0.9)
• Module availability = MTTF / ( MTTF + MTTR) = MTTF / MTBF

Time -->
     |<------- MTTF ------->|<--- MTTR --->|
     |<-------------- MTBF -------------->|
Example-1 of Calculating MTTF
• A disk subsystem has the following components and MTTFs. Assume failures
are independent and lifetimes are exponentially distributed.
  10 disks → MTTF is 1×10⁶ hours for each disk
  1 ATA disk controller → MTTF is 0.5×10⁶ hours
  1 power supply → MTTF is 0.2×10⁶ hours
  1 cooling fan → MTTF is 0.2×10⁶ hours
  1 ATA cable → MTTF is 1×10⁶ hours
What is the MTTF of the disk system as a whole?
Answer:
  Failure Rate_system = 10 × 1/(1×10⁶) + 1/(0.5×10⁶) + 1/(0.2×10⁶)
                        + 1/(0.2×10⁶) + 1/(1×10⁶)
                      = (10 + 2 + 5 + 5 + 1)/(1×10⁶)
                      = 23,000 failures per 10⁹ hours = 23,000 FIT
  MTTF = 1/Failure Rate ≈ 43,000 hours (~5 years)
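A sketch reproducing this arithmetic (component counts and MTTFs from the
example; the 24-hour MTTR used for availability is a hypothetical value,
not from the slides):

    # Sketch: system failure rate = sum of component failure rates
    # (independent failures, exponentially distributed lifetimes).
    components = [   # (count, MTTF in hours)
        (10, 1_000_000),   # disks
        (1,    500_000),   # ATA controller
        (1,    200_000),   # power supply
        (1,    200_000),   # cooling fan
        (1,  1_000_000),   # ATA cable
    ]
    failure_rate = sum(n / mttf for n, mttf in components)   # failures/hour
    print(failure_rate * 1e9)   # 23000.0 FIT (failures per 10^9 hours)

    mttf_sys = 1 / failure_rate
    print(round(mttf_sys))      # 43478 hours, ~5 years

    mttr = 24                   # hypothetical repair time in hours
    print(mttf_sys / (mttf_sys + mttr))   # availability ≈ 0.99945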
Summary

• Performance is important to measure


• For computer architects -> comparing different
design/implementation options
• For developers of software optimizing their code -> applications
• For users -> deciding which machine to use, or to buy

• Performance metrics are subtle


• Easy to mess up the “machine A is XXX times faster than machine
B” numerical performance comparison
• You need to know exactly what you are measuring: time, rate,
throughput, CPI, cycles, etc
• You need to know how combining these to give aggregate numbers
does different kinds of “distortions” to the individual numbers
• No metric is perfect, so lots of emphasis on standard benchmarks
and several different metrics to get a better overall picture
And in conclusion …
• Tracking and extrapolating technology are part of
computer architect’s responsibility
• Expect bandwidth in disks, DRAM, network, and
processors to improve by at least as much as the square
of the improvement in latency
• Quantify dynamic and static power
• Power_dynamic = 1/2 × Capacitance × Voltage² × frequency = 1/2 × C × V² × f
• Energy vs. power

• Quantify dependability
• Reliability (MTTF, FIT), Availability (99.9…)

• Quantify and summarize performance
  • Arithmetic Mean, Ratios, and Geometric Mean
