Da Ci

Chapter 2 discusses comprehensive performance assessment across computer architectures, focusing on optimization targets, functional requirements, and performance measurement. It emphasizes the importance of understanding system tuning reports, benchmarking, and the impact of various factors like clock speed and execution time on performance. Additionally, it introduces Amdahl's Law and energy considerations in system design.

Uploaded by

Ammar Dridi

Chapter 2: Comprehensive Performance Assessment Across Architectures

Dr. A. Djenadi

Chapter Objectives
The knowledge provided in this chapter will prove valuable to you, whether you
are tasked with choosing a new system or aiming to enhance the performance
of an existing one.
Additionally, the chapter explores various factors influencing performance.
By the end of this chapter, you will have a clear understanding of what to
examine in system tuning reports and how each piece of information contributes
to the broader perspective of overall system performance.

Introduction
The word architecture covers all three aspects of computer design: software,
instruction set architecture, and hardware.

Optimization Targets
• Software
• Instruction set architecture (ISA)
• Hardware
• Programming language
• Compiler
• Microarchitecture
• Transistor

Functional Requirements
Definition
This refers to the intended functionality and capabilities of the computer system.

Application Area
• Personal mobile device: Real-time performance, graphics, video and
audio, energy efficiency.
• Desktop computer: Real-time performance, graphics, video and audio.
• Servers: Support for databases and transaction processing; enhancements
for reliability and availability; support for scalability.
• Cluster computers: Throughput performance for many independent
tasks; error correction for memory; energy proportionality.
• Internet of things / Embedded computing: Special support for
graphics, video, or other application-specific extensions; power limitations
and power control may be required; real-time constraints.

Level of Software Compatibility

• Operating system requirements: the features necessary to support the
chosen OS.
• Standards: certain standards may be required by the marketplace.
• Floating-point format and arithmetic: IEEE 754 standard, special
arithmetic for graphics or signal processing.
• I/O interfaces for I/O devices: Serial ATA, Serial Attached SCSI, PCI
Express.
• Networks: support required for different networks, such as Ethernet.
• Programming languages: languages (ANSI C, C++, Java, Fortran) affect
the instruction set.

Trends in Technology
Computer architects must stay updated on swiftly changing implementation
technologies, including:
• Integrated circuit logic technology: Transistor density and increases in die
size. However, this increase does not follow Moore’s law.
• Semiconductor DRAM (dynamic random-access memory).
• Semiconductor Flash (electrically erasable programmable read-only
memory). This nonvolatile semiconductor memory is the standard storage
device in personal mobile devices (PMDs).
• Magnetic disk technology.
• Network technology.

Performance Measurement and Analysis
Question 1
What does it mean when we say that computer X has better performance than
computer Y?

Answer 1
Computer X is faster than computer Y.

Question 2
What does it mean that computer X is faster than computer Y?

Answer 2
It depends on the perspectives of the users and on both external and internal
considerations of the machine.

User Perspective
The user of a desktop computer may say a computer is faster when a program
runs in less time, while a computer center administrator may say a computer is
faster when it completes more transactions per unit time.

Metrics
• Response time (execution time): Defined as the time between the
start and the completion of an event.
• Throughput: Defined as the total amount of work done in a given time.

Important
The only consistent and reliable measure of performance is the
execution time of real programs.

Time & Computer: The Clock System

The actions carried out by a processor, such as retrieving an instruction,
interpreting the instruction, loading and storing data, and executing arithmetic
operations, are controlled by a system clock.
Typically, all operations begin with the pulse of the clock.
At the most fundamental level, the speed of a processor is dictated by the
pulse frequency produced by the clock, measured in cycles per second, or Hertz
(Hz).

Clock Signal Generation
• Quartz crystal
• Analog to Digital conversion

Example 1
A 1-GHz processor receives 1 billion pulses per second.
The rate of pulses is known as the clock rate, or clock speed (frequency).
One increment, or pulse, of the clock is referred to as a clock tick.
The time between pulses is the cycle time, also called the clock period or clock cycle.

CPU Time (Execution Time): The Processor Performance Equation
CPU time (execution time) for a program can be expressed in seconds in two
ways:

• CPU time = CPU clock cycles for a program × Clock cycle time (period)
• CPU time = (CPU clock cycles for a program) / (Clock rate)

Definitions
• CPU Time (execution time): This is the total time the CPU spends
executing a specific program. It is often measured in seconds.
• CPU Clock Cycles for a Program: This refers to the number of
clock cycles (periods) the CPU takes to execute all the instructions in the
program.

• Clock Cycle Time (period): This is the duration of a single clock
cycle, measured in seconds. It represents the time it takes for the CPU to
complete one clock cycle.
• Clock rate: This is the clock frequency (the number of clock cycles per
second).

Example 2
A program P1 consists of 30 instructions.
Clock frequency = 1 GHz
Number of cycles per instruction = 3 cycles
Cycle time = 1 / (1000 MHz) = 0.001 µs = 1 ns
CPU time for P1 = Execution time for P1 = 30 × 3 × 1 ns = 90 ns
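
The arithmetic of Example 2 can be checked with a short Python sketch. The function name `cpu_time_seconds` and its parameter names are illustrative assumptions, not part of the chapter; the function only applies the relation used in the example (instructions × cycles per instruction ÷ clock rate).

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = (instruction count x cycles per instruction) / clock rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz


# Values from Example 2: 30 instructions, 3 cycles each, 1 GHz clock.
p1_time = cpu_time_seconds(30, 3, 1e9)
print(f"CPU time for P1: {p1_time * 1e9:.0f} ns")
```

Dividing by the clock rate instead of multiplying by the cycle time gives the same result, since the cycle time is the reciprocal of the clock rate.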

Expressing the Initial Formula in Terms of Units of Measurement
\[
\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}}
\]

As this formula demonstrates, processor performance depends on
three characteristics:
• Clock cycle time (period): Hardware technology and organization.
• Clock cycles per instruction (CPI): Organization and instruction set
architecture.
• Instruction count: Instruction set architecture and compiler technology.

Remarks
Executing an instruction involves multiple steps, such as retrieving it from
memory, decoding, and performing operations. Thus, most instructions on most
processors require multiple clock cycles to complete. Some instructions may take
only a few cycles, while others require dozens.
On any given processor, the number of clock cycles required varies for
different types of instructions, such as load, store, branch, and so on.
A straight comparison of clock speeds (frequency) on different processors
does not tell the whole story about performance.

The Overall CPI or the Global CPI

• The global CPI is the weighted sum of the per-class CPIs:
\[
\text{Global CPI} = \frac{\sum_{i} \left( IC_i \times CPI_i \right)}{\text{Instruction count}}
\]

• The overall version of the CPI calculation considers each specific CPI_i and
its frequency in a program (i.e., IC_i / Instruction count).

• Because it must include pipeline effects, cache misses, and any other
memory system inefficiencies, CPI_i should be measured and not just calculated
from a table in the back of a reference manual.

Example 3
Suppose we made the following measurements:

• Frequency of floating point (FP) operations: 25%
• Average CPI of FP operations: 4 cycles
• Average CPI of other instructions: 1.33 cycles
What is the global CPI?
Global CPI = 0.25 × 4 + 0.75 × 1.33 = 1.9975 ≈ 2 cycles
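
As a minimal sketch of the weighted-sum calculation in Example 3, the following Python function takes the instruction mix as (fraction, CPI) pairs. The name `global_cpi` and the pair-based input format are assumptions made for illustration.

```python
def global_cpi(mix):
    """Weighted CPI for a list of (fraction_of_instructions, cpi) pairs.

    Each fraction is IC_i / Instruction count, so the fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cpi for fraction, cpi in mix)


# Measurements from Example 3: 25% FP operations at CPI 4, the rest at CPI 1.33.
example3_cpi = global_cpi([(0.25, 4.0), (0.75, 1.33)])
print(f"Global CPI: {example3_cpi:.4f}")
```

The exact value is 1.9975, which the example rounds to 2 cycles.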

Performance Comparison
We often compare the performance of two different computers, X and Y, by
using the assessment "X is faster than Y", which means that execution time is
lower on X than on Y for the given task.
In particular, "X is n times as fast as Y" will mean:
\[
\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n
\]
Because execution time is taken to be the reciprocal of performance,
we have the following relationship:
\[
n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}
\]

Throughput Metric
The execution time can be replaced by the throughput metric to compare the
performance of X and Y in terms of the amount of work done in a given
time.

Example
"The throughput of X is 5.2 times that of Y" means that the number of
tasks completed per unit time on computer X is 5.2 times the number completed
on Y.

Remarks
• Execution time is expressed in seconds. Depending on the definition used,
it may or may not include instruction processing, memory access, I/O,
interrupts, and operating system overhead.
• Throughput is expressed in units such as instructions per second
(for a processor), queries processed per hour (for a server),
MIPS (Million Instructions Per Second), and MFLOPS (Million
Floating-point Operations Per Second).

Benchmarks
Definition
Performance benchmarking involves objectively evaluating the performance of
one system (e.g., computer, software, component) in comparison to another.
Reliable benchmarks play a crucial role in cutting through marketing
exaggerations and statistical manipulations. In essence, effective benchmarks help
pinpoint systems that deliver optimal performance at a reasonable cost.

Benchmark Types
• Kernels: Small, key pieces of real applications, such as Quicksort.
• Synthetic benchmarks: Fake programs invented to imitate
the behavior of real applications, such as Dhrystone.

Flaws and Limitations

• The compiler writer and architect can manipulate the test results by
making the computer appear faster on these surrogate programs than on real
applications.
• Benchmark-specific compiler flags can be used to improve the performance
of a benchmark. These flags often cause transformations that would be
illegal on many programs or would slow down performance on others.

• Policies on modifying the source code of the benchmarks vary:

– No modifications are allowed.
– Modifications are allowed but are essentially impossible to make (e.g.,
database benchmarks).
– Source modifications are allowed, as long as the altered version
produces the same output.

Better Benchmarking Solution: Benchmark Suites

An accepted solution for performance assessment is the use of collections of
benchmark applications, called benchmark suites.
A key advantage of such suites is that the weakness of any one benchmark
is lessened by the presence of the other benchmarks.

SPEC: Standard Performance Evaluation Corporation
The most widely recognized standardized benchmark suites are those of
SPEC (Standard Performance Evaluation Corporation).
The first version of the benchmark suite was developed in the late 1980s to
benchmark workstations. Currently, there are SPEC benchmarks covering many
application classes. All the SPEC benchmark suites and their reported results
are found at http://www.spec.org.

SPEC Benchmarks
• Cloud: Cloud_IaaS 2016
• CPU: CPU2017
• Graphics and Workstation: SPECviewperf 12, SPECwpc V2.0, SPECapc
for 3ds Max 2015, SPECapc for Maya 2012, SPECapc for PTC
Creo 3.0, SPECapc for Siemens NX 9.0 and 10.0, SPECapc for
SolidWorks 2015
• High Performance Computing: ACCEL, MPI2007, OMP2012
• Java client/server: SPECjbb2015
• Power: SPECpower_ssj2008
• Server (SFS): SFS2014, SPECsfs2008
• Virtualization: SPECvirt_sc2013

Reporting Performance Results

The key principle in reporting performance measurements is
reproducibility: another experimenter should be able to replicate the results.
A SPEC benchmark report requires an extensive description of the computer
and the compiler flags, as well as the publication of both the baseline and the
optimized results.
Alongside hardware, software, and baseline tuning details, a SPEC report
includes performance times displayed in tables and graphs.

SPEC Results Comparison: SPECRatio

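SPEC normalizes execution times to a reference computer by dividing
the time on the reference computer by the time on the computer being rated,
yielding a ratio proportional to performance. This ratio is called the SPECRatio.

For example, suppose that the SPECRatio of computer A on a benchmark
is 2.56 times that of computer B; then we know:
\[
2.56 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B}
     = \frac{\text{Execution time}_{\text{reference}} / \text{Execution time}_A}{\text{Execution time}_{\text{reference}} / \text{Execution time}_B}
     = \frac{\text{Execution time}_B}{\text{Execution time}_A}
     = \frac{\text{Performance}_A}{\text{Performance}_B}
\]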

Geometric Mean
After choosing a benchmark suite, the performance results of the suite are
summarized in a single number: the geometric mean of the SPECRatios of
the programs in the suite.
\[
\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} \text{sample}_i}
\]

In the case of SPEC, sample_i is the SPECRatio for program i.

Why Use Geometric Mean

• The geometric mean of the ratios is the same as the ratio of the geometric
means.
• The ratio of the geometric means is equal to the geometric mean of the
performance ratios, which implies that the choice of the reference computer
is irrelevant.
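
The claim that the reference computer does not matter can be checked numerically. The sketch below is illustrative only: the execution times and both reference machines are made-up values, and the helper names (`geometric_mean`, `spec_ratios`, `relative_performance`) are assumptions rather than part of any SPEC tooling.

```python
import math


def geometric_mean(samples):
    """n-th root of the product of n positive samples, computed via logarithms."""
    if not samples or any(s <= 0 for s in samples):
        raise ValueError("samples must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


def spec_ratios(reference_times, measured_times):
    """SPECRatio per program: reference time divided by measured time."""
    return [ref / t for ref, t in zip(reference_times, measured_times)]


# Hypothetical execution times (seconds) of computers A and B on three programs.
times_a = [10.0, 40.0, 5.0]
times_b = [20.0, 50.0, 20.0]


def relative_performance(reference_times):
    """Ratio of the geometric means of A's and B's SPECRatios."""
    mean_a = geometric_mean(spec_ratios(reference_times, times_a))
    mean_b = geometric_mean(spec_ratios(reference_times, times_b))
    return mean_a / mean_b


# Two very different hypothetical reference computers.
with_reference_1 = relative_performance([100.0, 200.0, 50.0])
with_reference_2 = relative_performance([7.0, 900.0, 33.0])
print(with_reference_1, with_reference_2)
```

Both calls give the same value (up to floating-point rounding), namely the geometric mean of the per-program time ratios B/A, because the reference times cancel in the ratio.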

Performance Enhancement: Amdahl’s Law

Objective
Enhancing the performance by improving a portion of a computer.

Definition
Amdahl’s Law states that the performance improvement to be gained from using
some faster mode of execution is limited by the fraction of the time the faster
mode can be used.

Speedup
Amdahl’s Law defines the speedup that can be gained by using a particular
feature. Speedup is the ratio given by:
\[
\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}
\]
Or, in terms of the execution times:
\[
\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}
\]

Amdahl’s Law Factors
• Fraction_enhanced: The fraction of the computation time in the original
computer that can be converted to take advantage of the enhancement.
• Speedup_enhanced: The improvement gained by the enhanced execution
mode. This value is the time of the original mode over the time of the
enhanced mode.

The New Enhanced Execution Time

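The execution time using the original computer with the enhanced mode will
be the time spent using the unenhanced portion of the computer plus the time
spent using the enhancement:
\[
\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]
\]

The overall speedup is given by:

\[
\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]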

Example: Amdahl’s Law

Suppose that we want to enhance the processor used for web serving. The new
processor is 10 times faster on computation in the web serving application than
the old processor. Assume that the original processor is busy with computation
40% of the time and is waiting for I/O 60% of the time.
What is the overall speedup gained by incorporating the enhancement?

Fraction_enhanced = 0.4
Speedup_enhanced = 10
\[
\text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56
\]
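
The web-serving example can be reproduced with a small Python sketch of Amdahl’s Law. The function name `amdahl_speedup` and the input validation are illustrative assumptions; the formula is the overall-speedup expression 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is accelerated."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced_part = 1.0 - fraction_enhanced
    enhanced_part = fraction_enhanced / speedup_enhanced
    return 1.0 / (unenhanced_part + enhanced_part)


# Web-serving example: 40% of the time is computation, made 10 times faster.
web_speedup = amdahl_speedup(0.4, 10.0)
print(f"Overall speedup: {web_speedup:.4f}")

# Upper bound if the computation part became infinitely fast: 1 / (1 - 0.4).
upper_bound = 1.0 / (1.0 - 0.4)
print(f"Upper bound: {upper_bound:.4f}")
```

The expression evaluates to 1 / 0.64 = 1.5625. The upper bound of about 1.67 illustrates the law’s main point: the 60% of time spent waiting for I/O limits the gain no matter how fast the computation becomes.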

Power and Energy

Introduction
In today’s energy-constrained world, energy is considered the most significant
design aspect for every class of computer. Two main challenges arise from
this aspect:
• Power supply: Power must be efficiently brought in and distributed
around the chip.
• Cooling solutions: The power dissipated as heat must be effectively
managed and removed.

System Architect Perspective
• Thermal Design Power (TDP): A metric that quantifies the maximum
amount of heat generated through power consumption by a computer
component under normal operating conditions. It is expressed in watts and
serves as a guideline for system designers to understand the amount of
heat dissipation that the cooling system must manage.
• Energy and Energy Efficiency: Power is energy per unit time: 1 watt
= 1 joule per second. Energy is the better metric because it is tied
to a specific task and the time needed to accomplish that task. The energy
to complete a workload is equal to the average power times the execution
time for the workload.
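
To see why energy can rank two designs differently than power does, consider the following Python sketch. The two processors and their numbers are hypothetical values chosen for illustration; only the relation energy = average power × execution time comes from the text.

```python
def energy_joules(average_power_watts: float, execution_time_seconds: float) -> float:
    """Energy for a workload = average power x execution time."""
    return average_power_watts * execution_time_seconds


# Hypothetical processors running the same workload:
# A draws 20% more power than B but finishes in 70% of the time.
energy_b = energy_joules(100.0, 10.0)
energy_a = energy_joules(120.0, 7.0)
print(f"A: {energy_a:.0f} J, B: {energy_b:.0f} J")
```

Under these assumed numbers, A has the higher power draw but uses less energy for the task (840 J against 1000 J), which is why energy is the more informative metric for a fixed workload.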

Energy and Power Within a Microprocessor

For CMOS chips, energy consumption has traditionally occurred mostly during
transistor switching; this is called dynamic energy. The energy required per
transistor for one pulse of the logic transition 0 → 1 → 0 or 1 → 0 → 1 is given
by:
\[
\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2
\]

Remarks
• For a specific task, slowing the frequency reduces power, but not energy.
• The dynamic power and energy are reduced by lowering the voltage.
• The capacitive load depends on the number of transistors connected to
an output and on the technology (i.e., the capacitance of the wires and the
transistors).
• Dynamic power is the primary source of power dissipation in CMOS.
However, static power is also an important issue because leakage current
flows even when a transistor is off. The static power is given by:

\[
\text{Power}_{\text{static}} = \text{Current}_{\text{static}} \times \text{Voltage}
\]

• The static power is proportional to the number of devices.
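
The effect of voltage on the two formulas above can be illustrated with a short Python sketch. The capacitance, current, and voltage values are arbitrary placeholders, and the function names are assumptions; the code only evaluates Energy_dynamic = Capacitive load × Voltage² and Power_static = Current_static × Voltage.

```python
def dynamic_energy(capacitive_load_farads: float, voltage_volts: float) -> float:
    """Energy of one 0 -> 1 -> 0 (or 1 -> 0 -> 1) pulse: C x V^2."""
    return capacitive_load_farads * voltage_volts ** 2


def static_power(static_current_amps: float, voltage_volts: float) -> float:
    """Leakage power: static current x voltage."""
    return static_current_amps * voltage_volts


# Hypothetical device: same capacitive load, voltage lowered from 1.0 V to 0.85 V.
load = 1e-15  # farads, placeholder value
energy_ratio = dynamic_energy(load, 0.85) / dynamic_energy(load, 1.0)
print(f"Dynamic energy ratio after lowering the voltage: {energy_ratio:.4f}")

# Static power scales linearly with voltage for a fixed leakage current.
power_ratio = static_power(1e-6, 0.85) / static_power(1e-6, 1.0)
print(f"Static power ratio: {power_ratio:.2f}")
```

Because voltage enters the dynamic energy quadratically, a 15% voltage reduction cuts dynamic energy to about 72% of its original value in this sketch, while static power (for an assumed fixed leakage current) falls only to 85%.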

Energy, Power and Performance Enhancement

During the evolution of computer architecture, the increase in the number of
transistors and the frequency has dominated the decrease in load capacitance and
voltage, leading to an overall growth in power consumption and energy.

Examples
• The first microprocessors consumed 1 watt.
• The Intel Core i9-9900K (9th Gen) consumes 95 watts (168.48 watts at full
workload).

Consequences
• The limits of air cooling are nearly reached.
• The slowdown in clock rates has led to a period of slow performance
improvement.
• Distributing the power, removing the heat, and preventing hot spots have
become increasingly difficult challenges.

Methods for Improving Energy Efficiency

• Do nothing well: Consists of turning off the clock of inactive modules
to save energy and dynamic power. For example, if some cores are idle,
their clocks are stopped.
• Dynamic voltage-frequency scaling (DVFS): Consists of scaling down
the working voltage and/or frequency to use less power and energy. For
example: energy saving mode in a laptop.
• Design for the typical case: Design components with an energy saving
mode. For example: DRAM designed with a low power mode, or disks that
have a mode that spins more slowly when unused to save power. However,
you cannot access DRAMs or disks in these modes, so you must return to
fully active mode to read or write.
• Overclocking (e.g., Intel Turbo mode): Consists of running a chip at a
higher clock rate for a short time. For example, for single-threaded code,
the microprocessor can turn off all cores but one and run it faster.

Remarks
• Today’s microprocessors contain so many transistors that they cannot
all be turned on at the same time: this is the dark silicon phenomenon.

• The importance of power and energy has led to a new metric for evaluation:
tasks per joule or performance per watt, rather than performance per mm²
of silicon as in the past.

Relative Energy Cost
• 8b Add: 0.03 pJ
• 16b Add: 0.05 pJ
• 32b Add: 0.1 pJ
• 16b FP Add: 0.4 pJ
• 32b FP Add: 0.9 pJ
• 8b Mult: 0.2 pJ
• 32b Mult: 3.1 pJ
• 16b FP Mult: 1.1 pJ
• 32b FP Mult: 3.7 pJ
• 32b SRAM Read (8KB): 5 pJ
• 32b DRAM Read: 640 pJ
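
The list is easier to interpret as ratios. The Python sketch below copies a subset of the entries (integer operations and memory reads) into a dictionary and expresses each cost relative to a 32-bit add; the dictionary layout and the function name `cost_relative_to` are assumptions made for illustration.

```python
# Energy per operation in picojoules, copied from the list above (subset).
ENERGY_PJ = {
    "8b Add": 0.03,
    "32b Add": 0.1,
    "8b Mult": 0.2,
    "32b Mult": 3.1,
    "32b SRAM Read 8KB": 5.0,
    "32b DRAM Read": 640.0,
}


def cost_relative_to(operation: str, baseline: str = "32b Add") -> float:
    """How many times more energy `operation` needs than `baseline`."""
    return ENERGY_PJ[operation] / ENERGY_PJ[baseline]


for name in ENERGY_PJ:
    print(f"{name}: {cost_relative_to(name):.1f}x a 32b add")
```

With these figures, a 32-bit DRAM read costs roughly 6400 times as much energy as a 32-bit add, and about 128 times as much as a read from the small SRAM, which is why memory accesses matter so much for energy efficiency.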
