CS5204/EE5364 - Advanced Computer Architecture - Performance
With slides from: Profs. Zhai, Mowry, Falsafi, Hill, Hoe, Lipasti, Shen,
Smith, Sohi, Vijaykumar, Patterson, Culler
Teaching Staff and Course Resources
Instructor: Prof. Pen-Chung Yew
• Office: 6-225C Keller Hall
• E-mail: [email protected]
• Office phone: 612-625-7387
• Office hour: 2:30 pm - 3:30 pm Tuesday
TA: Kartik Ramkrishnan
• Office: 6-244 Keller Hall
• Email: [email protected]
• Office hour: 1:30 pm - 3:30 pm Friday, Room L-103, Lind Hall
Web Page: https://canvas.umn.edu/courses/460175
Textbooks
Required Readings:
• Computer Architecture: A Quantitative Approach, 6th
Edition, John Hennessy and David Patterson. Morgan
Kaufmann Publishers, 2018. (Available at the University
Bookstore.)
Suggested Readings
• Various research papers: Available online from the course
webpage.
More Information and Tech Papers
Major Architecture Conferences:
• International Symposium on Computer Architecture (ISCA)
• International Symposium on Microarchitecture (Micro)
• High-Performance Computer Architecture (HPCA)
• Architectural Support for Programming Languages and Operating
Systems (ASPLOS)
Major Compiler Conferences:
• Programming Language Design and Implementation (PLDI)
• Code Generation and Optimization (CGO)
Tentative Course Organization
• 2 Mid-term Exams (50%)
• 75-minute in-class exam each;
• Scheduled on Tue 10/15 & Tue 12/10, or check Canvas web site
• Each covers its respective portion of course materials;
• No final exam
• No make-up exams
Measuring System Performance
• Two typical system performance metrics
• Speedup
• Execution time
• Benchmarks
Two Typical System Performance Metrics
• Response time (Figures 1.9 and 1.10)
• Also known as (a.k.a.) latency, running time, elapsed time,
completion time, execution time
• Time between end of an inquiry or a request and beginning of a
response
• 50-90X improvement for processors (over 25-40 years)
• 6-8X improvement for memory and disks (over 25-40 years)
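Review: Principles of Computer Design
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
• Example: two code sequences compared by instruction count (IC) and clock cycles: Sequence 1 has IC = 5; Sequence 2 has IC = 6
• What affects each factor:
                           Instruction Count   CPI   Clock Rate
Application Program        X
Compiler                   X                   (X)
Instruction Set (ISA)      X                   X
Computer Organization                          X     X
Hardware Technology                                  X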
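A minimal sketch of the CPU time relation (instruction count × CPI / clock rate), included to show how a change at one level, such as the compiler, can move two factors at once. The function name and all numbers are hypothetical and are not taken from the slides.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Return CPU time in seconds: IC x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical baseline: 2 billion instructions, CPI of 1.5, 2 GHz clock.
baseline = cpu_time(2e9, 1.5, 2e9)

# Hypothetical compiler change: 10% fewer instructions, but CPI rises to 1.6.
optimized = cpu_time(1.8e9, 1.6, 2e9)

print(f"baseline  = {baseline:.2f} s")
print(f"optimized = {optimized:.2f} s")
print(f"speedup   = {baseline / optimized:.3f}")
```

The instruction count drops by 10%, yet the speedup is smaller than that because the CPI went up; only the product of the three factors determines CPU time.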
Performance Expressed as Rate
• Rates are performance measures expressed in units of work
per unit time.
• Examples:
• millions of instructions / sec (MIPS)
• millions of floating-point operations / sec (MFLOPS)
• millions of bytes / sec (MBytes/sec)
• millions of bits / sec (Mbits/sec)
• Frames of images / sec
• samples / sec
• transactions / sec (TPS)
Example: MIPS (Million Instructions Per Second)
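A minimal sketch of a MIPS calculation, assuming the usual definitions MIPS = instruction count / (execution time × 10^6) = clock rate / (CPI × 10^6). The function names and the numbers are hypothetical and are not taken from the slides.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS from a measured run: instructions / (seconds x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_cpi(clock_rate_hz, cpi):
    """Equivalent form: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical program: 2 billion instructions, CPI of 1.5, 2 GHz clock.
instruction_count = 2e9
cpi = 1.5
clock_rate_hz = 2e9
execution_time_s = instruction_count * cpi / clock_rate_hz

rating_a = mips_from_time(instruction_count, execution_time_s)
rating_b = mips_from_cpi(clock_rate_hz, cpi)
print(f"{rating_a:.1f} MIPS, {rating_b:.1f} MIPS")
```

Both forms give the same rating because execution time is itself IC × CPI / clock rate. Note that the rating says nothing about how much work each instruction does, so it is only comparable across machines that run the same instruction stream.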
Choosing Programs to Evaluate Performance
• Five levels of programs used:
• Real applications
• Modified (or scripted) applications
• Kernels
• Toy benchmarks
• Synthetic benchmarks
• Graphics-intensive benchmarks
• SPECviewperf12
• Server Benchmarks
• CPU throughput benchmarks (SPECrate)
• File server benchmarks (SPECsfs)
• Web server benchmarks (SPECweb)
SPEC CPU Benchmarks (1989 – 2017)
SPEC 2017 Active Benchmark Suites
How to Compare Performance Across a Suite?
                    Computer A   Computer B   Computer C
Program 1 (secs)    1            10           20
Program 2 (secs)    1000         100          20
$$\text{Arithmetic Mean} = \frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$
$$\text{Weighted Arithmetic Mean} = \sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i$$
Time_i is the execution time of the i-th program in the workload
Example (from the table): B: 10/110 = 0.091, 100/110 = 0.909; C: 20/40 = 0.5
Problem:
How can we agree on one set of weights?
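A minimal sketch of the weights problem, using the execution times from the table above. The weights (Program 1 assumed to be 90% of the workload) are hypothetical and chosen only to show that the ranking can change with the weights.

```python
# Execution times in seconds, copied from the table above.
times = {
    "A": [1, 1000],
    "B": [10, 100],
    "C": [20, 20],
}

def arithmetic_mean(ts):
    return sum(ts) / len(ts)

def weighted_arithmetic_mean(ts, weights):
    # The weights are expected to sum to 1.
    return sum(w * t for w, t in zip(weights, ts))

# Hypothetical weights: assume Program 1 makes up 90% of the workload.
weights = [0.9, 0.1]

am = {m: arithmetic_mean(ts) for m, ts in times.items()}
wam = {m: weighted_arithmetic_mean(ts, weights) for m, ts in times.items()}

for m in times:
    print(f"{m}: arithmetic mean = {am[m]:.1f} s, weighted mean = {wam[m]:.1f} s")

fastest_am = min(am, key=am.get)
fastest_wam = min(wam, key=wam.get)
print(f"fastest by arithmetic mean: {fastest_am}")
print(f"fastest by weighted mean:   {fastest_wam}")
```

With equal weights Computer C comes out fastest; with the assumed 90/10 weights Computer B does. The conclusion depends on a choice that different users would make differently.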
Recap (09/10/2024)
• Course organization – Term project proposal due 9/26/24
• We use execution time, throughput, and power as our
performance metrics
• A suite of benchmarks such as those produced by SPEC is
often used to measure and compare performance
• Need to summarize the performance measurements across
benchmark suite and across machines.
• Arithmetic means
• Applications are run disproportionately
• Longer programs have more weight in arithmetic means
• Weighted arithmetic means
• Difficult to determine the weights
• Geometric means
Measure III - Normalized Execution Time
• Normalize execution time to a reference machine
• Take the average of the normalized execution time
Normalized against machine A
             Machine A(S)   Machine B     Machine C
Program 1    1 (1)          10 (10)       20 (20)
Program 2    1000 (1)       100 (0.1)     20 (0.02)
Normalized against machine B
             Machine A      Machine B(S)  Machine C
Program 1    1 (0.1)        10 (1)        20 (2)
Program 2    1000 (10)      100 (1)       20 (0.2)
Normalized against machine C
             Machine A      Machine B     Machine C(S)
Program 1    1 (0.05)       10 (0.5)      20 (1)
Program 2    1000 (50)      100 (5)       20 (1)
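Geometric Means
$$\text{Geometric Mean} = \sqrt[n]{\prod_{i=1}^{n} \frac{\text{Rate}_i}{\text{Rate}_{ref}}} = \frac{\sqrt[n]{\prod_{i=1}^{n} \text{Rate}_i}}{\sqrt[n]{\prod_{i=1}^{n} \text{Rate}_{ref}}}$$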
Normalized Execution Time
                  Normalized to A       Normalized to B       Normalized to C
                  A     B     C         A     B     C         A     B     C
Program 1         1.0   10    20              1.0                         1.0
Program 2         1.0   0.1   0.02            1.0                         1.0
Arithmetic Mean   1.0   5.05  10.01           1.0                         1.0
Geometric Mean    1.0   1.0   0.63            1.0                         1.0
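A minimal sketch that computes every cell of a table like the one above from the raw execution times (Program 1 and Program 2 on machines A, B, and C). The helper names are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math

# Execution times in seconds for Program 1 and Program 2 on each machine.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

def normalize(times, ref):
    """Divide each machine's times by the reference machine's times."""
    return {m: [t / r for t, r in zip(ts, times[ref])] for m, ts in times.items()}

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.prod(xs) ** (1 / len(xs))

# summary[ref][machine] = (arithmetic mean, geometric mean) of normalized times
summary = {}
for ref in times:
    norm = normalize(times, ref)
    summary[ref] = {m: (arithmetic_mean(v), geometric_mean(v)) for m, v in norm.items()}
    for m, (am, gm) in summary[ref].items():
        print(f"normalized to {ref}: {m}  AM = {am:.2f}  GM = {gm:.2f}")
```

In this sketch the arithmetic mean of normalized times ranks a different machine best under each reference (whichever machine is the reference scores 1.0, the lowest value), while the ratio between any two machines' geometric means is the same under all three references.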
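$$\frac{\sqrt[n]{\prod_{i=1}^{n} \text{Rate}_i}}{\sqrt[n]{\prod_{i=1}^{n} \text{Rate}_{ref}}} \quad \text{versus} \quad \frac{\frac{1}{n}\sum_{i=1}^{n} \text{Time}_{ref}}{\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i}$$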
How Does SPEC Report Performance?
• A Sun Ultra Enterprise 2 workstation with a 296-MHz
UltraSPARC II processor is the current reference machine
for SPEC (it was originally a DEC machine)
• It uses the geometric mean to report SPEC ratios
Figure 1.19: SPEC2006Cint execution times (in seconds) for the Sun Ultra 5,
the reference computer of SPEC2006, and execution times and SPEC ratios
for the AMD A10 and Intel Xeon E5-2690.
• The final two columns show the ratios of execution times
and of SPEC ratios.
• The ratio of the execution times is identical to the ratio
of the SPEC ratios, and the ratio of the geometric
means (63.72/31.91 = 2.00) is identical to the geometric
mean of the ratios (2.00).
• This shows that the choice of reference computer is
irrelevant to relative performance.
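The equality noted above can be worked out from the definition of the SPEC ratio as reference time divided by measured time. The symbols $T_{ref,i}$, $T_{A,i}$, and $T_{B,i}$ (execution time of benchmark $i$ on the reference machine and on machines A and B) are notation introduced here for illustration, not taken from the slides.

```latex
\frac{\sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_{A,i}}}
     {\sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_{B,i}}}
= \sqrt[n]{\prod_{i=1}^{n} \frac{\text{SPECRatio}_{A,i}}{\text{SPECRatio}_{B,i}}}
= \sqrt[n]{\prod_{i=1}^{n} \frac{T_{ref,i} / T_{A,i}}{T_{ref,i} / T_{B,i}}}
= \sqrt[n]{\prod_{i=1}^{n} \frac{T_{B,i}}{T_{A,i}}}
```

The reference times cancel benchmark by benchmark, so the relative result depends only on the two machines being compared.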
Other Metrics to Measure
• Cost?
• Power and energy?
• Dependability?
Trends in Cost
• Cost driven down by learning curve
• Improving yield
• More good chips produced from the same number of chips manufactured
Growth in Clock Rate on Power
• Intel 80386 consumed ~2 W
• 3.3 GHz Intel Core i7 consumes ~130 W
• Heat must be dissipated from a 1.5 x 1.5 cm chip
• This is the limit of what can be cooled by air
Power and Energy
• Power: amount of energy used per unit time
• Energy: 1 joule
= 1 ampere through 1 ohm for 1 second;
= energy required to accelerate a 1 kg mass at 1 m/s² through a
distance of 1 meter.
• Power: 1 watt = 1 joule/second
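$$\text{Power}_{dynamic} = \frac{1}{2} \times \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{FrequencySwitched}$$
• Example: voltage and switching frequency are both scaled to 0.85 of their original values
$$\text{Power}_{dynamic,new} = \frac{1}{2} \times 0.85 \times \text{CapacitiveLoad} \times (0.85 \times \text{Voltage})^2 \times \text{FrequencySwitched}$$
$$= (0.85)^3 \times \text{OldPower}_{dynamic} \approx 0.6 \times \text{OldPower}_{dynamic}$$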
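A minimal sketch of the scaling calculation above, assuming the dynamic power relation 1/2 × capacitive load × voltage² × switching frequency. The function name is hypothetical, and the inputs are normalized to 1.0 because only the ratio matters.

```python
def dynamic_power(capacitive_load, voltage, frequency_switched):
    """Dynamic power: 1/2 x C x V^2 x f."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency_switched

# Normalized baseline: load, voltage, and frequency all set to 1.0.
old_power = dynamic_power(1.0, 1.0, 1.0)

# Voltage and switching frequency both scaled to 0.85 of their old values.
new_power = dynamic_power(1.0, 0.85, 0.85)

ratio = new_power / old_power
print(f"new / old dynamic power = {ratio:.3f}")
```

Voltage contributes twice because it is squared, so a 15% reduction in both voltage and frequency gives 0.85³, roughly a 40% drop in dynamic power.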
Leakage Current and Static Power
• Leakage current still flows even when the clock to a
transistor is switched off (as long as it is not powered off),
so static power is now important too
$$\text{Power}_{static} = \text{Current}_{static} \times \text{Voltage}$$
Define and Quantify Dependability (1/2)
• How to decide when a system is operating properly?
• Infrastructure providers now offer Service Level
Agreements (SLA) to guarantee that their networking or
power service would be dependable
• Systems alternate between 2 states of service with respect
to an SLA:
State 1: Service accomplishment, where the service is
delivered as specified in SLA
State 2: Service interruption, where the delivered
service is different from the SLA
• Failure = transition from state 1 to state 2
• Restoration = transition from state 2 to state 1
Define and Quantify Dependability (2/2)
• Module reliability = measure of continuous service accomplishment
(or expected time-to-failure).
• There are 2 metrics
• Mean Time To Failure (MTTF) measures Reliability
• No. of Failures In Time (FIT) = 1/MTTF, the Failure Rate
• Often reported as failures per billion hours of operation
• Mean Time To Repair (MTTR) measures Service Interruption
• Mean Time Between Failures (MTBF) = MTTF+MTTR
• Module availability measures service as the system alternates between the states of
accomplishment and interruption (a value between 0 and 1, e.g. 0.9)
• Module availability = MTTF / ( MTTF + MTTR) = MTTF / MTBF
Figure: timeline showing one MTTF interval followed by one MTTR interval, together spanning one MTBF
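A minimal sketch of the dependability metrics defined above: FIT as failures per billion hours, MTBF = MTTF + MTTR, and availability = MTTF / (MTTF + MTTR). The function names and the MTTF and MTTR values are hypothetical.

```python
def fit_rate(mttf_hours):
    """Failures in time: failures per billion (10^9) hours of operation."""
    return 1e9 / mttf_hours

def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is delivered as specified."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical module: fails once per million hours, takes a day to repair.
mttf = 1_000_000.0
mttr = 24.0

mtbf = mttf + mttr
print(f"FIT          = {fit_rate(mttf):.0f}")
print(f"MTBF         = {mtbf:.0f} hours")
print(f"availability = {availability(mttf, mttr):.6f}")
```

Availability depends on both numbers: halving the repair time improves it just as surely as lengthening the time to failure.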
Example-1 of Calculating MTTF
• A disk subsystem has the following components and MTTFs. Assume
failures are independent and lifetimes are exponentially distributed.
10 disks -> MTTF is 1 × 10^6 hours for each disk
1 ATA disk controller -> MTTF is 0.5 × 10^6 hours
1 power supply -> MTTF is 0.2 × 10^6 hours
1 cooling fan -> MTTF is 0.2 × 10^6 hours
1 ATA cable -> MTTF is 1 × 10^6 hours
What is the MTTF of the disk system as a whole?
Answer:
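$$\text{Failure rate}_{system} = 10 \times \frac{1}{1 \times 10^6} + \frac{1}{0.5 \times 10^6} + \frac{1}{0.2 \times 10^6} + \frac{1}{0.2 \times 10^6} + \frac{1}{1 \times 10^6} = \frac{10 + 2 + 5 + 5 + 1}{1 \times 10^6 \text{ hours}} = \frac{23{,}000}{1 \times 10^9 \text{ hours}}$$
= 23,000 Failures in Time (FIT)
$$\text{MTTF}_{system} = \frac{1}{\text{Failure rate}_{system}} = \frac{1 \times 10^9 \text{ hours}}{23{,}000} \approx 43{,}500 \text{ hours (just under 5 years)}$$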
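A minimal sketch of the calculation above, assuming (as the example states) independent failures with exponentially distributed lifetimes, so that component failure rates simply add. The variable names are hypothetical; the component counts and MTTFs are the ones listed in the example.

```python
# (name, count, MTTF in hours) for each component of the disk subsystem.
components = [
    ("disk", 10, 1.0e6),
    ("ATA disk controller", 1, 0.5e6),
    ("power supply", 1, 0.2e6),
    ("cooling fan", 1, 0.2e6),
    ("ATA cable", 1, 1.0e6),
]

# With independent, exponentially distributed lifetimes, failure rates add.
failure_rate_per_hour = sum(count / mttf for _, count, mttf in components)

fit = failure_rate_per_hour * 1e9          # failures per billion hours
system_mttf_hours = 1 / failure_rate_per_hour
system_mttf_years = system_mttf_hours / (24 * 365)

print(f"failure rate = {fit:.0f} FIT")
print(f"system MTTF  = {system_mttf_hours:.0f} hours ({system_mttf_years:.2f} years)")
```

The ten disks account for 10 of the 23 failures per million hours, and the power supply and fan together account for another 10, so the least reliable parts dominate the system MTTF even though there is only one of each.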
Summary
• Quantify dependability
• Reliability (MTTF, FIT), Availability (99.9…)