
CSCI 5204/EE5364

Advanced Computer Architecture

Performance, Power, Dependability


Pen-Chung Yew
Department of Computer Science and Engineering
University of Minnesota
http://www.cs.umn.edu/~yew

With slides from: Profs. Zhai, Mowry, Falsafi, Hill, Hoe, Lipasti, Shen,
Smith, Sohi, Vijaykumar, Patterson, Culler
Teaching Staff and Course Resources
Instructor: Prof. Pen-Chung Yew
• Office: 6-225C Keller Hall
• E-mail: [email protected]
• Office phone: 612-625-7387
• Office hour: 2:30 pm – 3:30 pm Tuesday
TA: Kartik Ramkrishnan
• Office: 6-244 Keller Hall
• Email: [email protected]
• Office hour: 1:30 pm – 3:30 pm Friday @ Rm L-103, Lind Hall
Web Page: https://canvas.umn.edu/courses/460175

Textbooks
Required Readings:
• Computer Architecture: A Quantitative Approach, 6th Edition, John L.
Hennessy and David A. Patterson. Morgan Kaufmann Publishers, 2018.
(Available at the University Bookstore.)

Suggested Readings
• Various research papers: Available online from the course
webpage.

More Information and Tech Papers
Major Architecture Conferences:
• International Symposium on Computer Architecture (ISCA)
• International Symposium on Microarchitecture (Micro)
• High-Performance Computer Architecture (HPCA)
• Architectural Support for Programming Languages and Operating
Systems (ASPLOS)
Major Compiler Conferences:
• Programming Language Design and Implementation (PLDI)
• Code Generation and Optimization (CGO)

Technical Weekly Reports


https://www.hpcwire.com/

Tentative Course Organization
• 2 Mid-term Exams (50%)
• 75-minute in-class exam each;
• Scheduled on Tue 10/15 & Tue 12/10, or check Canvas web site
• Each covers its respective portion of course materials;
• No final exam
• No make-up exams

• 2 Homework Assignments (20%)


• Project (30%)
• Team projects - a team of 2-3 students.
• A scaled-down version of a real research project (~8 weeks);
• Preferably aligned with your research interests, e.g., MS/PhD
work, or future job prospects.
• Novel ideas are encouraged, but not necessary
• Could be 1-person paper surveys
• Discuss your project proposal with Professor Yew
• Project proposals due on Thursday 9/26/2024
Key to High Performance and Low Power:
Exploiting Various Kinds of Parallelism

• Classes of parallelism in applications: (Software)


• Data-Level Parallelism (DLP)
• Task-Level Parallelism (TLP)
• Request-Level Parallelism (RLP)
• Classes of architectural parallelism: (Hardware)
• Instruction-Level Parallelism (ILP)
• Vector architectures/Graphic Processor Units (GPUs)
• Thread-Level Parallelism (multicores)
• Request-Level Parallelism (warehouse-scale computers)
Flynn’s Taxonomy
• Single instruction stream, single data stream (SISD)
• Single instruction stream, multiple data streams (SIMD)
• Vector architectures
• Multimedia extensions (Intel AVX-512)
• Graphics processor units (GPUs)

• Multiple instruction streams, multiple data streams (MIMD)


• Tightly-coupled MIMD - scientific computing (latency)
• Loosely-coupled MIMD - multiprogramming (throughput)

• Multiple instruction streams, single data stream (MISD)


• No commercial implementation
Outline – Performance, Power, Dependability

1. Define, quantify, and summarize performance


2. Define and quantify power
3. Define and quantify dependability

Measuring System Performance
• Two typical system performance metrics
• Speedup
• Execution time
• Benchmarks
Two Typical System Performance Metrics
• Response time (Figures 1.9 and 1.10)
• Also known as (a.k.a.) latency, running time, elapsed time,
completion time, execution time
• Time between end of an inquiry or a request and beginning of a
response
• 50-90X improvement for processors (over 25-40 years)
• 6-8X improvement for memory and disks (over 25-40 yrs)

• Throughput (Figure 1.9 and 1.10)


• The amount of work that a computer can do in a given period
• 32,000-40,000X improvement for processors (over 25-40 yrs)
• 300-1200X improvement for memory and disks (over 25-40 yrs)
Latency (Response Time) and Bandwidth (Throughput)

Figure 1.9 Log-log plot of bandwidth and latency milestones in Figure 1.10
relative to the first milestone (over 25–40 yrs).
• Latency improved 8–91×
• Bandwidth improved about 400–32,000×
• Except for networking, the other three technologies have seen only modest
improvements in latency and bandwidth since ~2012: 0%–23% in latency and
23%–70% in bandwidth
Figure 1.10 Performance milestones over 25–40 years for microprocessors,
memory, networks, and disks.
• Microprocessor milestones are several generations of IA-32 processors,
from the 16-bit 80286 to the 64-bit multicore, out-of-order, superpipelined
Core i7.
• Memory milestones go from 16-bit-wide, plain DRAM to 64-bit-wide DDR4
SDRAM.
• Ethernet advanced from 10 Mbits/s to 400 Gbits/s.
• Disk milestones are based on rotation speed, from 3600 to 15,000 RPM.
• Each case is best-case bandwidth, and latency is the time for a simple
operation assuming no contention.
Performance Expressed as Time
Definition:

    Performance_X = 1 / Execution Time_X

Performance of system X is N times faster than system Y:

    Speedup = Performance_X / Performance_Y
            = Execution Time_Y / Execution Time_X
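The definition above translates directly into code. A minimal sketch (the
function and variable names are mine, not from the slides):

    # Sketch: speedup of system X over system Y, from measured execution times.
    def speedup(exec_time_y, exec_time_x):
        return exec_time_y / exec_time_x

    # Example: if Y takes 10 s and X takes 2 s, X is 5 times faster than Y.
    print(speedup(10.0, 2.0))  # 5.0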
Review: Principles of Computer Design

• Processor Performance Equation and parameters that impact processor
performance (From EE4364/CS4203)

    CPU Time = Instruction Count × CPI × Clock Cycle Time
             = (Instruction Count × CPI) / Clock Rate
Review: Principles of Computer Design

• Different instruction types have different CPIs
  • e.g., an integer add takes 1 cycle, while a floating-point add takes
    3 cycles

    CPU Clock Cycles = Σ_i (IC_i × CPI_i)

IC_i : instruction count for instruction type i
CPI_i : # clock cycles per instruction for instruction type i
Review: CPI Example for Compiler
• Alternative compiled code sequences using instructions in classes A, B, C
                    Class A   Class B   Class C
CPI for class          1         2         3
IC in sequence 1       2         1         2
IC in sequence 2       4         1         1
(IC: instruction count)

Sequence 1: IC = 5                     Sequence 2: IC = 6
Clock Cycles = 2×1 + 1×2 + 2×3 = 10    Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 10/5 = 2.0                  Avg. CPI = 9/6 = 1.5

CPU Time = # Clock Cycles × Clock Cycle Time
         = (Instruction Count × CPI) / Clock Rate

Note: Performance is determined by CPU time, not by instruction count
alone: sequence 2 executes more instructions yet takes fewer cycles.
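A quick sketch that reproduces these numbers (the CPIs and instruction
counts come from the table above; the helper names are mine):

    # Sketch: clock cycles and average CPI for the two compiled sequences.
    cpi = {"A": 1, "B": 2, "C": 3}    # cycles per instruction class
    seq1 = {"A": 2, "B": 1, "C": 2}   # instruction counts, sequence 1
    seq2 = {"A": 4, "B": 1, "C": 1}   # instruction counts, sequence 2

    def clock_cycles(ic_by_class):
        return sum(ic * cpi[cls] for cls, ic in ic_by_class.items())

    for name, seq in [("Sequence 1", seq1), ("Sequence 2", seq2)]:
        ic = sum(seq.values())
        print(name, "cycles =", clock_cycles(seq),
              "avg CPI =", clock_cycles(seq) / ic)
    # Sequence 1: cycles = 10, avg CPI = 2.0
    # Sequence 2: cycles = 9,  avg CPI = 1.5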
Review: Components of Performance

CPU Time = Instruction Count × CPI × Clock Cycle Time

                        Instruction Count   CPI   Clock Rate
Application Program            X
Compiler                       X            (X)
Instruction Set (ISA)          X             X
Computer Organization                        X        X
Hardware Technology                                   X
Performance Expressed as Rate
• Rates are performance measures expressed in units of work
per unit time.
• Examples:
• millions of instructions / sec (MIPS)
• millions of floating point instructions / sec (MFLOPS)
• millions of bytes / sec (MBytes/sec)
• millions of bits / sec (Mbits/sec)
• Frames of images / sec
• samples / sec
• transactions / sec (TPS)

Example: MIPS (Million Instructions Per Second)

• Using rates as a performance measure can be misleading


• Example: millions of instructions / sec (MIPS)
• Instruction sets may be different
• Instructions do different amounts of work and take
different amounts of time to finish
• Compiler may be different
• Different instruction sequences can be chosen for
the same task with different optimization levels
• Input may be different
• Different program inputs may cause different
instruction mixes
Peak Rate
• Peak performance can be misleading
• Example: the i860 is advertised as having a peak rate of
80 MFLOPS (40 MHz with 2 flops per cycle).
• However, the measured performance of some compiled
linear algebra kernels (icc -O2) tells a different story:

Kernel               1d fft  sasum  saxpy  sdot  sgemm  sgemv  spvma
MFLOPS                 8.5    3.2    6.1   10.3    6.2   15.0    8.1
%Peak (efficiency)     11%     4%     7%    13%     8%    19%    10%
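Peak efficiency is simply the measured rate divided by the advertised peak
rate; a minimal sketch (kernel rates from the table above, rounding mine):

    # Sketch: efficiency = measured rate / peak rate (i860 peak = 80 MFLOPS).
    peak_mflops = 80.0
    measured = {"1d fft": 8.5, "sasum": 3.2, "saxpy": 6.1, "sdot": 10.3,
                "sgemm": 6.2, "sgemv": 15.0, "spvma": 8.1}
    for kernel, mflops in measured.items():
        print(f"{kernel}: {mflops / peak_mflops:.0%} of peak")
    # e.g. sdot reaches 10.3/80, about 13% of the advertised peak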
Choosing Programs to Evaluate Performance
• Five levels of programs used:
• Real applications
• Modified (or scripted) applications
• Kernels
• Toy benchmarks
• Synthetic benchmarks

• How do we decide if computer A is better than computer B?
• Which set of programs (benchmarks) should we use
for performance evaluation?
www.spec.org
• SPEC – System Performance Evaluation Corporation
• Desktop/Workstation benchmarks
• CPU intensive benchmarks
• Integer applications (CINT2006, 2017)

• Floating point applications (CFP2006, 2017)

• Graphics-intensive benchmarks
• SPECviewperf12

• SPECapc (Application Performance Characterization)

• Server Benchmarks
• CPU throughput benchmarks (SPECrate)
• File server benchmarks (SPECsfs)
• Web server benchmarks (SPECweb)
SPEC CPU Benchmarks (1989 – 2017)
SPEC 2017 Active Benchmark Suites

How to Compare Performance Across a Benchmark Suite?
Computer A Computer B Computer C
Program 1 (secs) 1 10 20
Program 2 (secs) 1000 100 20

A is 10 times faster than B for program 1


B is 10 times faster than A for program 2
A is 20 times faster than C for program 1
C is 50 times faster than A for program 2
B is 2 times faster than C for program 1
C is 5 times faster than B for program 2

• All statements are correct

But, which computer performs better?


Measure I: Total/Average Execution Time

                  Computer A (sec)   Computer B (sec)   Computer C (sec)
Program 1                 1                 10                 20
Program 2              1000                100                 20
------------------------------------------------------------------------
Arithmetic Mean       500.5                 55                 20

    Arithmetic Mean = (1/n) × Σ_{i=1}^{n} Time_i

Time_i is the execution time of the i-th program, out of a total of n
programs in the workload
Problems with Arithmetic Mean
§ Applications do not have the same probability of being run
§ Longer programs weigh more heavily in the average
For example, two machines timed on two benchmarks
Machine A Machine B
Program 1 2 seconds (80%) 4 seconds (80%)
Program 2 12 seconds (20%) 8 seconds (20%)

• If we take arithmetic mean, Program 2 “counts more”


than Program 1
• An improvement in Program 2 changes the average
more than a proportional improvement in Program 1
• But Program 1 is run 4X more frequently than Program 2
Example
• You live 20 miles away from school, and you drive
  • 60 miles/hour for the first 10 miles, then 10 miles/hour for the next
    10 miles
• What is your average speed going home?

Wrong answer:
  (60 miles/hour + 10 miles/hour) / 2 = 35 miles/hour (This is wrong!!)
  20 miles / 35 miles/hour = 0.57 hour = 34.2 min

Correct answer:
  10 miles / (60 miles/hour) + 10 miles / (10 miles/hour)
    = 1/6 + 1 = 7/6 hr = 1 hr 10 min
  20 miles / (7/6 hr) = 17.14 miles/hour

Alternatively, weight the speeds by the time spent at each:
  1/6 hr + 1 hr = 7/6 hr total, so the weights are 1/7 and 6/7
  (1/7) × 60 miles/hour + (6/7) × 10 miles/hour = 17.14 miles/hour
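A tiny sketch verifying the correct calculation (the leg distances and
speeds are the ones in the example):

    # Average speed = total distance / total time, not the mean of the speeds.
    legs = [(10, 60), (10, 10)]   # (miles, miles per hour)
    total_miles = sum(dist for dist, _ in legs)
    total_hours = sum(dist / speed for dist, speed in legs)
    print(total_miles / total_hours)   # 17.14... mph, not 35 mph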
Measure II - Weighted Execution Time

                   Machine A (sec)  Machine B (sec)  Machine C (sec)
Program 1                 1               10               20
Program 2              1000              100               20
---------------------------------------------------------------------
Arith Mean (w1)        2.00            10.09               20
Arith Mean (w2)       91.91            18.19               20
Arith Mean (w3)      500.50            55.00               20

Weightings (Program 1, Program 2), each set chosen to equalize the two
programs' weighted times on one machine:
  W(1): .999, .001   (from A: 1000/1001 = 0.999, 1/1001 = 0.001)
  W(2): .909, .091   (from B: 100/110 = 0.909, 10/110 = 0.091)
  W(3): .50,  .50    (from C: 20/40 = 0.5)

    Weighted Arithmetic Mean = Σ_{i=1}^{n} Weight_i × Time_i

Time_i is the execution time of the i-th program in the workload
Weight_i is the frequency of the i-th program in the workload

Problem:
How can we agree on one set of weights?
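A minimal sketch of the weighted arithmetic mean (times and weights from
the table above; helper names are mine):

    # Sketch: weighted arithmetic mean of execution times.
    times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}   # secs per program

    def weighted_mean(ts, weights):
        return sum(w * t for w, t in zip(weights, ts))

    w1 = [0.999, 0.001]   # W(1): equalizes the two programs' weighted time on A
    for machine, ts in times.items():
        print(machine, round(weighted_mean(ts, w1), 2))
    # A 2.0, B 10.09, C 20.0, matching the Arith Mean (w1) row above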
Recap (09/10/2024)
• Course organization – Term project proposal due 9/26/24
• We use execution time, throughput, and power as our performance metrics
• A suite of benchmarks such as those produced by SPEC is
often used to measure and compare performance
• Need to summarize the performance measurements across
benchmark suite and across machines.
• Arithmetic means
• Applications are run disproportionately
• Longer programs have more weights in arithmetic means
• Weighted arithmetic means
• Difficult to determine the weights
• Geometric means
Measure III - Normalized Execution Time
• Normalize execution time to a reference machine
• Take the average of the normalized execution time
Normalized against machine A
Machine A(S) Machine B Machine C
Program 1 1 (1) 10 (10) 20 (20)
Program 2 1000 (1) 100 (0.1) 20 (0.02)
Normalized against machine B
Machine A Machine B(S) Machine C
Program 1 1 (0.1) 10 (1) 20 (2)
Program 2 1000 (10) 100 (1) 20 (0.2)
Normalized against machine C
Machine A Machine B Machine C(S)
Program 1 1(0.05) 10 (0.5) 20 (1)
Program 2 1000(50) 100 (5) 20 (1)
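The normalization step itself is mechanical; a small sketch (machine and
program labels taken from the tables above):

    # Sketch: normalize execution times to a chosen reference machine.
    times = {
        "A": {"P1": 1, "P2": 1000},   # seconds
        "B": {"P1": 10, "P2": 100},
        "C": {"P1": 20, "P2": 20},
    }

    def normalize(times, ref):
        """Each machine's time on each program, relative to machine `ref`."""
        return {m: {p: t / times[ref][p] for p, t in progs.items()}
                for m, progs in times.items()}

    print(normalize(times, "A"))
    # {'A': {'P1': 1.0, 'P2': 1.0}, 'B': {'P1': 10.0, 'P2': 0.1},
    #  'C': {'P1': 20.0, 'P2': 0.02}}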
Geometric Means

    Geometric Mean of rates (relative to a reference)

      = ( Π_{i=1}^{n} (Rate_i / Rate_ref,i) )^(1/n)

      = ( Π_{i=1}^{n} Rate_i )^(1/n) / ( Π_{i=1}^{n} Rate_ref,i )^(1/n)
Normalized Execution Time

                  Normalized to A       Normalized to B       Normalized to C
                  A     B      C        A     B     C         A      B     C
Program 1         1.0   10     20       0.1   1.0   2.0       0.05   0.5   1.0
Program 2         1.0   0.1    0.02     10    1.0   0.2       50     5     1.0
Arithmetic Mean   1.0   5.05   10.01    5.05  1.0   1.1       25.03  2.75  1.0
Geometric Mean    1.0   1.0    0.63     1.0   1.0   0.63      1.58   1.58  1.0
                                                             (1.0)  (1.0) (0.63)

(The parenthesized row re-normalizes the C-referenced geometric means to
machine A: 1.58/1.58 = 1.0 and 1.0/1.58 = 0.63, the same values obtained
with A as the reference. The arithmetic means, by contrast, rank the
machines differently under each reference: each machine looks best when it
is chosen as the reference.)
Calculating Geometric Means

n = # of programs in the benchmark suite

Geometric Means:

    ( Π_{i=1}^{n} (Rate_i / Rate_ref,i) )^(1/n)
      = ( Π_{i=1}^{n} Rate_i )^(1/n) / ( Π_{i=1}^{n} Rate_ref,i )^(1/n)

Arithmetic Means:

    (1/n) Σ_{i=1}^{n} (Time_ref,i / Time_i)
      ≠ ( (1/n) Σ_{i=1}^{n} Time_ref,i ) / ( (1/n) Σ_{i=1}^{n} Time_i )

• For a geometric mean, the ratio of geometric means and the geometric
mean of the ratios are the same, i.e. we can choose any machine as the
reference machine.
• The arithmetic mean does not have this property.
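A short sketch checking this property on the three machines used earlier
(helper names are mine, not from the slides):

    from math import prod

    # Sketch: geometric mean of per-program rate ratios.
    def geomean(xs):
        return prod(xs) ** (1.0 / len(xs))

    times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}   # secs per program

    # Try every machine as the reference:
    for ref in times:
        means = {m: geomean([t / r for t, r in zip(ts, times[ref])])
                 for m, ts in times.items()}
        print("ref =", ref, {m: round(v, 2) for m, v in means.items()})
    # The relative ordering, and every pairwise ratio (e.g. B/A = 1.0),
    # is the same no matter which machine is the reference.

Note that a 2 sec to 1 sec improvement and a 1000 sec to 500 sec
improvement both contribute the same factor-of-2 ratio to a geometric
mean, which is exactly the disadvantage discussed below.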
Pros & Cons of Geometric Mean
Advantages:
• The ratio of geometric means and the geometric mean of the ratios are
the same
• Geometric means of normalized execution times are consistent regardless
of the choice of reference machine
Disadvantages:
• Geometric means do not predict execution time
• They encourage optimizations on benchmarks that are easy to improve
• Example:
  • An optimization that improves one benchmark from 2 sec to 1 sec
  • An optimization that improves another benchmark from 1000 sec to 500 sec
  • Which one did a better optimization job?
  Both look the same under a geometric mean (each is a 2× speedup)!
  But saving 500 sec is more challenging than saving 1 sec
Observations
• Arithmetic mean and geometric mean can come to
different conclusions
• The arithmetic mean performance varies depending on the system you
normalize to, i.e. depending on the reference machine chosen
  → Not ideal for ratios, especially speedups!
• The geometric mean is consistent no matter which
machine is the reference, but
• Does not predict runtime because it normalizes
execution time
• Each application now counts equally

How Does SPEC Report Performance?
• A Sun Ultra Enterprise 2 workstation with a 296-MHz
UltraSPARC II processor is the current reference machine
for SPEC (It was a DEC machine originally)
• It uses the Geometric Mean to report SPEC ratios

Figure 1.19 SPEC2006Cint execution times (in seconds) for the Sun Ultra 5
(the reference computer of SPEC2006) and execution times and SPEC ratios
for the AMD A10 and Intel Xeon E5-2690.
• The final two columns show the ratios of execution times and of SPEC
ratios.
• The ratio of the execution times is identical to the ratio of the SPEC
ratios, and the ratio of the geometric means (63.72/31.91 = 2.00) is
identical to the geometric mean of the ratios (2.00).
• This shows the irrelevance of the reference computer in relative
performance comparisons.
Other Metrics to Measure
• Cost?
• Power and energy?
• Dependability?
Trends in Cost
• Cost driven down by learning curve
• Improving yield
• More good chips produced from the same number of wafers

• DRAM: price closely tracks cost

• Microprocessors: price depends on volume


• 10% less for each doubling of volume
• Amortizing initial engineering costs
Other Metrics to Measure
• Cost?
• Power and energy?
• Dependability?

Growth in Clock Rate on Power

• Intel 80386
consumed ~ 2 W
• 3.3 GHz Intel Core i7
consumes ~130 W
• Heat must be
dissipated from
1.5 x 1.5 cm chip
• This is the limit
of what can be
cooled by air
Power and Energy
• Power: amount of energy used per unit time
• Energy: 1 joule
  = 1 ampere through 1 ohm for 1 second
  = energy required to accelerate a 1 kg mass at 1 m/s² through a distance
    of 1 meter

• Challenges: Get more power in, get more heat out


• Thermal Design Power (TDP)
• Characterizes sustained power consumption
• Used as target for power supply and cooling system
• Lower than peak power (which can be 1.5X higher than TDP), higher than
average power consumption
• Energy-per-task is often a better measurement w.r.t battery life
• There are two components in the power consumed
• Static Power and Dynamic Power
Static Power
• Static power consumption
• 25-50% of total power
• Powerstatic = Currentstatic x Voltage
• Scales with number of transistors
• To reduce: use power gating, i.e. cut off the power
Define and Quantify Dynamic Power
• For CMOS chips, dominant energy consumption has been in
switching transistors on and off, called dynamic power
    Power_dynamic = 1/2 × CapacitiveLoad × Voltage² × FrequencySwitched

• For mobile devices, energy is a better metric for battery life

    Energy_dynamic = CapacitiveLoad × Voltage²
• Capacitive load - a function of number of transistors
connected to output and technology, which determines
capacitance of wires and transistors
• Dropping voltage helps both, so went from 5V to recent 1V
• For a fixed task, slowing clock rate (frequency switched)
reduces power, but not energy
• To save energy & dynamic power, most CPUs now turn off
clock of inactive modules (e.g. Fl. Pt. Unit) – Clock Gating
Example of Quantifying Power
• Suppose a 15% reduction in voltage results in a 15% reduction in
frequency. What is the impact on dynamic power?

    Power_dynamic = 1/2 × CapacitiveLoad × Voltage² × FrequencySwitched

    NewPower_dynamic = 1/2 × CapacitiveLoad × (0.85 × Voltage)²
                           × (0.85 × FrequencySwitched)
                     = (0.85)³ × OldPower_dynamic
                     ≈ 0.6 × OldPower_dynamic
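The same calculation as a sketch (the 0.85 factors are the 15% reductions
from the example; names are mine):

    # Dynamic power = 1/2 * C * V^2 * f; apply 15% cuts to voltage and frequency.
    def dynamic_power(c_load, voltage, freq):
        return 0.5 * c_load * voltage**2 * freq

    old = dynamic_power(1.0, 1.0, 1.0)     # normalized baseline
    new = dynamic_power(1.0, 0.85, 0.85)   # 15% lower voltage and frequency
    print(new / old)                       # 0.85**3 = 0.614..., ~60% of old power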
Leakage Current and Static Power
• Because leakage current still flows even when the clock to a
transistor is switched off (not powered off), now static
power is important too
    Power_static = Current_static × Voltage

• Leakage current increases in processors with smaller


transistor sizes
• Increasing the number of transistors increases static power
even if they are idle
• In 2006, goal for leakage is 25% of total power
consumption; high performance designs at 40%
• Very low power systems even gate voltage to inactive
modules to control loss due to leakage, i.e. power gating
Reducing Power
• Techniques for reducing power:
  • Power Gating (“dark silicon” approach) – turn off power to components
    on chip not in use
  • Dynamic Voltage-Frequency Scaling (DVFS) – changing/scaling voltage
    and clock frequency (rate) to tune performance/power
  • Low-power states for processor, DRAM, disks
  • Overclocking (increasing clock rate briefly) to boost performance

Figure 1.12 Energy savings for a server using an AMD Opteron
microprocessor, 8 GB of DRAM, and one ATA disk.
• At 1.8 GHz, the server can handle at most 2/3 of the workload without
causing service-level violations
• At 1 GHz, it can safely handle only 1/3 of the workload
Other Metrics to Measure
• Cost?
• Power?
• Dependability?

Define and Quantify Dependability (1/2)
• How to decide when a system is operating properly?
• Infrastructure providers now offer Service Level
Agreements (SLA) to guarantee that their networking or
power service would be dependable
• Systems alternate between 2 states of service with respect
to an SLA:
State 1: Service accomplishment, where the service is
delivered as specified in SLA
State 2: Service interruption, where the delivered
service is different from the SLA
• Failure = transition from state 1 to state 2
• Restoration = transition from state 2 to state 1
Define and Quantify Dependability (2/2)
• Module reliability = measure of continuous service accomplishment
(or expected time-to-failure).
• There are 2 metrics:
  • Mean Time To Failure (MTTF) measures Reliability
  • Failure Rate = 1/MTTF; often reported as Failures In Time (FIT), the
    number of failures per billion (10⁹) hours of operation
  • Mean Time To Repair (MTTR) measures Service Interruption
• Mean Time Between Failures (MTBF) = MTTF + MTTR
• Module availability measures service as alternate between states of
accomplishment and interruption (value between 0 and 1, e.g. 0.9)
• Module availability = MTTF / ( MTTF + MTTR) = MTTF / MTBF

Time -->
     |<------- MTTF ------->|<--- MTTR --->|
     |<-------------- MTBF -------------->|
Example-1 of Calculating MTTF
• A disk subsystem has the following components and MTTFs. Assume failures
are independent and lifetimes are exponentially distributed.
  10 disks → MTTF is 1×10⁶ hours for each disk
  1 ATA disk controller → MTTF is 0.5×10⁶ hours
  1 power supply → MTTF is 0.2×10⁶ hours
  1 cooling fan → MTTF is 0.2×10⁶ hours
  1 ATA cable → MTTF is 1×10⁶ hours
What is the MTTF of the disk system as a whole?
Answer:
  Failure Rate_system = 10 × 1/(1×10⁶) + 1/(0.5×10⁶) + 1/(0.2×10⁶)
                        + 1/(0.2×10⁶) + 1/(1×10⁶)
                      = (10 + 2 + 5 + 5 + 1)/(1×10⁶)
                      = 23,000 failures per 10⁹ hours = 23,000 FIT
  MTTF = 1/Failure Rate ≈ 43,000 hours (~5 years)
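A sketch reproducing this arithmetic (component counts and MTTFs from the
example; the 24-hour MTTR used for availability is a hypothetical value,
not from the slides):

    # Sketch: system failure rate = sum of component failure rates
    # (independent failures, exponentially distributed lifetimes).
    components = [   # (count, MTTF in hours)
        (10, 1_000_000),   # disks
        (1,    500_000),   # ATA controller
        (1,    200_000),   # power supply
        (1,    200_000),   # cooling fan
        (1,  1_000_000),   # ATA cable
    ]
    failure_rate = sum(n / mttf for n, mttf in components)   # failures/hour
    print(failure_rate * 1e9)   # 23000.0 FIT (failures per 10^9 hours)

    mttf_sys = 1 / failure_rate
    print(round(mttf_sys))      # 43478 hours, ~5 years

    mttr = 24                   # hypothetical repair time in hours
    print(mttf_sys / (mttf_sys + mttr))   # availability ≈ 0.99945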
Summary

• Performance is important to measure


• For computer architects -> comparing different
design/implementation options
• For developers of software optimizing their code -> applications
• For users -> deciding which machine to use, or to buy

• Performance metrics are subtle


• Easy to mess up the “machine A is XXX times faster than machine
B” numerical performance comparison
• You need to know exactly what you are measuring: time, rate,
throughput, CPI, cycles, etc
• You need to know how combining these to give aggregate numbers
does different kinds of “distortions” to the individual numbers
• No metric is perfect, so lots of emphasis on standard benchmarks
and several different metrics to get a better overall picture
And in conclusion …
• Tracking and extrapolating technology are part of
computer architect’s responsibility
• Expect bandwidth in disks, DRAM, network, and
processors to improve by at least as much as the square
of the improvement in latency
• Quantify dynamic and static power
• Power_dynamic = 1/2 × Capacitance × Voltage² × frequency = 1/2 × C × V² × f
• Energy vs. power

• Quantify dependability
• Reliability (MTTF, FIT), Availability (99.9…)

• Quantify and summarize performance
  • Arithmetic Mean, Ratios, and Geometric Mean
