
Advanced

Computer Architecture
ECE 6373
Pauline Markenscoff
N320 Engineering Building 1
E-mail: [email protected]

1
Introduction
Improvements in Computer Performance

• Advances in Technology

• Innovations in Computer Design

2
Introduction
1945-1970:
• Both forces contributed to performance improvements
(25% to 30% /year)

Early 70’s:
• Emergence of Microprocessors
• Minicomputers
• Mainframes
• Performance improvements are due mainly to
improvements in technology (35% /year)

3
Introduction
Early 80’s: New Set of Architectures

• Reduced need for object code compatibility (due to


elimination of assembly language programming).

• Creation of standardized, vendor-independent OS (Unix,


Linux) lowered the cost of bringing out a new technology.

• Use of microprocessor technology

4
Introduction
• RISC (Reduced Instruction Set Architectures)
• Instruction Level Parallelism
- Pipelining
- Multiple Instruction Issue
• Cache Organizations
- Simple
- More Sophisticated

Mid-80s:
• Higher performance growth rates (over 50%/year)

5
The highest-performance microprocessors outperform the supercomputers of less than 10 years ago.

A PC that costs less than a thousand dollars has


more performance, more main memory and more
disk storage than a computer bought in 1980 for a
million dollars.

6
Dominance of
microprocessor-based computers
• Minicomputers have been replaced by servers made
using microprocessors.

• Mainframes have been replaced by multiprocessors


consisting of small numbers of off-the-shelf
microprocessors.

• High-end supercomputers are being built with


collections of microprocessors.

7
Older architectures, such as the x86 (or IA-32), adopted many of the innovations of the RISC designs.

Front-end processor interprets the x86 instructions and


maps them into operations that can be executed by
a RISC-style pipelined processor.

As transistor counts soared in the 90’s the hardware overhead


of translating x86 instructions became negligible.

8
From mid-80s to 2002
• A renaissance in computer design
(performance improvements of 52%/year) based on
- Both architectural innovation and
- Efficient use of technology improvements.

By 2002 high-performance microprocessors were about seven times faster than they would have been had designers relied on technology improvements alone.

9
Since 2002 processor performance improvement has
dropped to about 20% per year

• Maximum Power Dissipation

• Little Instruction Level Parallelism left to exploit

• Unchanged Memory Latency

10
 In 2004 Intel canceled its high performance
uniprocessor project
 Road to higher performance would be via
• Multiple processors per chip
and not via
• Faster uniprocessors

11
Shift from relying solely on

• Instruction Level Parallelism (ILP)


to

• Thread Level Parallelism (TLP) and

• Data Level Parallelism (DLP)

12
Instruction Level Parallelism (ILP)
• exploited implicitly by compiler and hardware
(no need for programmer’s attention).

Thread Level Parallelism (TLP) and Data Level


Parallelism (DLP)
• Are explicitly parallel requiring the programmer to write
parallel code to gain performance.

13
Growth in processor performance
Fig. 1.1

14
Changing Face of computing
1960’s: Mainframes

1970’s: Minicomputers

Supercomputers

1980’s: Desktop Computers

1990’s: Internet and the World Wide Web

15
Three classes of computing systems

 Desktop Computers

 Servers

 Embedded Computers

16
Characteristics of the three
computing classes
Fig. 1.2

17
Desktop Computers

Desktop Computers in the 1980s replaced


time-sharing:

• Personal Computers

• Workstations

18
Desktop Computers

 Largest market in dollar terms

 Low-end PC (under $500)

 Workstations (over $5,000)

 Optimize for price performance (price matters)

19
Servers
Provide large scale services such as

• reliable long term file storage access

• larger memory and

• more computing power.

20
Servers

• Characteristics:
• Dependability
• Scalability
• Memory
• Storage
• I/O Bandwidth

• Throughput (how many requests can be handled in a unit of time)


• Transactions per minute
• Web pages served per second

• Responsiveness

21
Supercomputers
 Cost tens of millions of dollars

 Emphasize FP performance

 Most often clusters of desktop computers

22
Availability vs. Reliability

• Reliability: system never fails

• Availability: the system can reliably and effectively


provide service, even if some component(s) fails.

• Cost of a system not being available can be very high.

23
Cost of unavailability

Fig. 1.3

24
Embedded Computers

They are in other devices and their presence is not


immediately obvious.
• Microwaves, washing machines, cars, cell phones,
networking switches, etc.

• They may be user programmable, or
• the only programming may occur with the initial loading of the application.

25
Embedded Computers

Widest range of processing power and cost

• 8-bit and 16-bit processors that cost less than a dime

• 32-bit, 100MIPS processors that cost less than $5

• High-end processors that execute a billion instructions per second and


cost $100 (processors for video games and network switches)

26
Embedded Computers

Price is the key factor

Meet performance at a minimum cost rather than


achieving higher performance at a higher price.

Often performance is a real-time requirement

27
Embedded Computers
What is real-time performance requirement?

Hard real-time:
• Absolute max. execution time allowed for a segment of the
application.

Soft real-time:
• Average time of a particular task is constrained, as well as the
number of instances when the maximum time is exceeded.

28
Embedded Computers
Other requirements for some embedded computers:

• Minimize memory
- Sometimes the memory is entirely on the processor chip, other times in a small off-chip memory
- Emphasis on code size (data size dictated by the application)
• Minimize power
- (Use of batteries, less expensive packaging, absence of a cooling fan)

29
Embedded Computers
Approaches for the design of embedded systems:

 Custom software running on an off-the shelf embedded


processor.
 The designer uses a digital signal processor (DSP) and custom software.
• DSPs are processors for signal-processing applications.

 Combined hardware/software solution that includes some custom hardware and an embedded processor core that is integrated with the custom hardware, often on the same chip.

30
Levels of Computer Design

 Instruction set architecture


(programmer visible instruction set)
 Functional organization
 Hardware
• Logic Design
• IC design, packaging, power and cooling

31
Instruction Set Architecture (ISA)
 Programmer’s visible instruction set

 Boundary between software and hardware

32
Class of ISA
 General-purpose Register Architectures
• Operands are either registers or memory locations

• Register-Memory Architectures
- 80x86 has 16 general purpose registers and 16 floating point
registers

• Load-Store Architectures
- MIPS has 32 general purpose registers and 32 floating point registers

33
Memory Addressing
To access memory operands
• Byte addressing (virtually all desktop and server computers)

Memory Alignment

• An object of size s bytes at byte address A is aligned if


A mod s = 0.

• MIPS requires alignment.

• 80x86 does not require alignment, but accesses are faster if operands are aligned.
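A minimal Python sketch (hypothetical helper, not from the slides) of the A mod s = 0 alignment rule described above:

def is_aligned(address: int, size: int) -> bool:
    # An object of `size` bytes at byte address `address` is aligned
    # if the address is a multiple of the size (A mod s = 0).
    return address % size == 0

assert is_aligned(0x1000, 8)      # 4096 mod 8 == 0 -> aligned
assert not is_aligned(0x1003, 4)  # 4099 mod 4 != 0 -> misaligned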

34
Addressing Modes
MIPS
• Register
• Immediate (constants)
• Displacement (a constant offset is added to a register to form the memory address)
80x86
• 3 variations for Displacement
- No register (absolute)
- Two registers (based indexed with displacement)
- Two registers where one register is multiplied by the size of the operand in bytes (based
with scaled index and displacement)
• 3 variations without Displacement
- Register indirect
- Indexed
- Based with scaled index

35
Types and Sizes of Operands
MIPS and 80x86 support
• 8-bit (ASCII character)
• 16 bit (Unicode character or half word)
• 32 bit (integer or word)
• 64-bit (double word or long integer)
• IEEE 754 floating point
- 32 bit (single precision)
- 64 bit (double precision)
• 80x86 also supports 80 bit floating point (extended double
precision)

36
Operations

• Data transfer
• Arithmetic-Logical
• Floating Point
• Control

 MIPS simple and easy to pipeline instruction set

 80x86 has a much richer instruction set

37
Control Flow Instructions
All ISAs (including MIPS and 80x86) provide support for
• Conditional Branches
• Unconditional Jumps
• Procedure Calls and Returns
 PC-relative addressing
• Branch address is specified by an address field that is added to the PC
 MIPS conditional branches test the contents of registers
 80x86 branches test condition code bits set as side effects of previous
arithmetic/logic operations
 MIPS procedure call (JAL) places the return address in a register
 80x86 procedure call (CALLF) places the return address on a stack in
memory

38
Encoding an ISA
 Fixed Length versus Variable Length
• Fixed Length encoding simplifies decoding of instructions
• Variable length instructions take less space than fixed length instructions

• MIPS uses fixed length encoding (all instructions are 32 bits).


• 80x86 uses variable length encoding (instruction lengths vary from 1 to 18
bytes).
 Number of registers and number of addressing modes have a significant
impact on the size of instructions

39
Subset of the
Instructions in
MIPS64
Fig. 1.5

40
MIPS64 instruction set architecture format
Fig. 1.6

41
Functional Organization

 High-level aspects of Computer Design

• Memory System

• Memory Interconnect

• Design of CPU

42
Two machines can have
• the same instruction set but
• different functional organizations.
- AMD Opteron 64 and Intel Pentium 4
- The embedded processors NEC VR 5432 and NEC VR 4122

Two machines can have


• the same instruction set
• the same functional organization but
• different implementations.
• Ex: Pentium 4 and the Mobile Pentium 4 (suitable for low-end computers) are
nearly identical, but offer different clock rates and different memory systems.

43
The task of the computer designer

 Meet functional requirements


• The functional requirements may be specific features inspired
by the market.

 Optimize design
• Maximize performance while meeting cost, power and
availability constraints.
• Requires familiarity with a wide range of technologies, from
compilers and OS to logic design and packaging.

44
Summary of some of the most important
functional requirements
Fig. 1.7

45
A computer designer must follow

• Technology trends

• Cost trends

These trends affect future cost and the longevity of an


architecture.

46
Technology trends
 A successful architecture must be designed to survive
rapid changes in technology
• Core of the IBM mainframe (in use for more than 40 years)

 Designer must be aware of rapid changes in


implementation technologies.

47
Technology trends
 Implementation technologies

• Integrated circuit logic

• Semiconductor DRAM

• Magnetic Disk

• Network

48
Technology trends
Integrated Circuit Technology:

• Transistor density increases by about 35% per year

• Die size increases 10% to 20% per year

• Transistor count increases 40%-55% per year

49
Technology trends
Semiconductor DRAM

• Density increases by about 40% per year, doubling roughly every two years.

50
Technology trends

Magnetic Disk Technology

• Prior to 1990, density increased by about 30% per year.

• From 1990 to 1996 density increased to 60% per year.


- Change from inductive heads to thin film

• From 1996 to 2004 density increased to 100% per year.


- Giant magnetoresistive effect heads

• Since 2004 density increases dropped back to 30% per year.

51
Technology trends
Network technology

• Network performance depends on both the performance of


switches and the performance of the transmission medium.

52
Technology trends

 The design of a microprocessor may, with speed and technology enhancements, have a lifetime of 5 or more years.

 Designers often design for the next technology.

 Cost decreases at about the rate at which density increases.

53
Technology trends
 Technology Thresholds
• Although technology improves fairly continuously, the
impact of the technology improvements can be seen in
discrete steps
• Example:
- When MOS technology reached the point of 25,000-50,000
transistors on a chip in the early 1980s, it became possible to
build a 32-bit microprocessor.
- By the late 1980s, first level caches could go on the chip.

54
Technology trends

By eliminating chip crossings within the processor


and between the processor and the cache, a dramatic
increase in performance was possible.

55
Performance Trends:
Bandwidth over Latency
Bandwidth or Throughput
• Total amount of work done in a given time
- Ex: Megabytes per second for a disk transfer

Latency or Response Time


• Time between the start and the completion of an event
- Ex: Milliseconds for disk access

56
Log-Log Plot of Bandwidth and Latency
Fig. 1.8

Rule of Thumb: Bandwidth grows by at least the square of the improvement in latency.

Microprocessors and networks: 1000-2000X improvements in bandwidth, 20-40X improvements in latency.

Bandwidth has outpaced latency across these technologies.


57
Performance milestones for
microprocessors, memory, networks, and
disks

Fig. 1.9

Capacity is more important


than performance for
memory and disks.

58
Scaling of Transistor Performance
Feature size: Minimum size of a transistor or a wire in either
the x or y dimension.
• Decrease from 10 microns in 1971 to 0.09 microns (90 nanometers) in
2006.
• 65 nanometer processes are underway.

The density of transistors increases quadratically with a linear


decrease in feature size
(since transistor count per square millimeter of silicon is
determined by the surface area of a transistor).
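For example, under this quadratic relationship, moving from a 90 nm to a 65 nm feature size would increase transistor density by roughly (90/65)² ≈ 1.9×.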

59
Scaling of Transistor Performance
 As feature sizes shrink, devices shrink quadratically in the horizontal dimension and also shrink in the vertical dimension.

 The shrink in the vertical direction requires a reduction in


operating voltage to maintain correct operation and reliability.

 Transistor performance improves linearly with decreasing


feature size.

60
Scaling of Transistor Performance

Density improvements made it possible to move


quickly from 4-bit to 8-bit, to 16-bit, to 32-bit and
more recently to 64-bit microprocessors as well as
include pipelining, caches, etc.

61
Scaling of Transistor Performance
 Transistors generally improve in performance with decreased
feature size.

 The performance of wires in an IC circuit does not improve with


decreased feature size.

 As feature size shrinks, wires get shorter, but the resistance and
capacitance per unit length get worse.

 The signal delay for a wire increases in proportion to the product of


its resistance and capacitance.

62
Resistance and capacitance depend on
• Detailed aspects of the process
• Geometry of a wire
• Loading on a wire
• Adjacency to other structures

Occasionally there are process enhancements that improve wire delay, such as the introduction of copper.

63
Scaling of Transistor Performance

Wire delay has become a major design limitation for large ICs.

It is often more critical than transistor switching delay.

Larger and larger fractions of the clock cycle have been


consumed by the propagation delay of signals on wires.

64
Scaling of Transistor Performance

Interconnect has replaced transistors as the main determinant of chip performance.

“Beyond Moore’s Law: The Interconnect Era”


IEEE Computing in Science and Engineering,
Jan. Feb. 2003.

65
Scaling of Transistor Performance
In 2001 the Pentium 4
• Allocated 2 stages of its 20+ stage pipeline just for propagating
signals across the chip.

66
Trends in Power in Integrated Circuits

• Power provides challenges as devices are scaled.

• Power must be brought in and distributed around the chip.


- Hundreds of pins and multiple interconnect layers for just power
and ground.

• Power is dissipated as heat and must be removed.

67
Trends in Power in Integrated Circuits

 For CMOS microprocessors the dominant energy consumption


is in switching transistors (Dynamic Power).

 The required energy is proportional to the product of


• load capacitance of the transistor
• frequency of switching and
• square of the voltage.

Power dynamic = 1/2 × Capacitive load × Voltage² × Frequency switched

68
Mobile devices care more about battery life than power, so
energy (measured in joules) is the proper metric:

Energy dynamic = Capacitive load × Voltage²

69
Power dynamic = 1/2 × Capacitive load × Voltage² × Frequency switched

Energy dynamic = Capacitive load × Voltage²

Dynamic power and energy are greatly reduced by lowering the voltage.
• Voltages have dropped from 5V to just over 1V in 20 years.

Slowing clock rate reduces power but not energy.

The capacitive load is a function of the number of transistors connected to an output


and the technology which determines the capacitance of the wires and the transistors.

70
Example
Some microprocessors today are designed to have
adjustable voltage
• A 15% reduction in voltage may result in a 15% reduction
in frequency
What would the impact be on dynamic power?

Power new / Power old = ((Voltage × 0.85)² × (Frequency switched × 0.85)) / (Voltage² × Frequency switched) = 0.85³ ≈ 0.61

The 15% reduction in voltage (and the resulting 15% reduction in frequency) reduces power to about 60% of the original.
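A minimal Python sketch of this calculation (illustrative values only; the capacitive load cancels in the ratio):

def dynamic_power(cap_load, voltage, freq):
    # Power_dynamic = 1/2 * Capacitive load * Voltage^2 * Frequency switched
    return 0.5 * cap_load * voltage ** 2 * freq

# 15% reductions in both voltage and frequency scale power by 0.85^3.
ratio = dynamic_power(1.0, 0.85, 0.85) / dynamic_power(1.0, 1.0, 1.0)
print(round(ratio, 2))  # 0.61 -> about 60% of the original power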

71
Trends in Power in Integrated Circuits
Power dynamic = 1/2 × Capacitive load × Voltage² × Frequency switched

Energy dynamic = Capacitive load × Voltage²

The increase in the number of transistors switching and the frequency with which they switch dominates the decrease in load capacitance and voltage, leading to an increase in power consumption.

72
Trends in Power in Integrated Circuits
• The first microprocessor consumed tenths of a watt.
• A 3.2 GHz Pentium 4 Extreme Edition consumes 135 watts.

• The heat must be dissipated from a chip that is about 1 cm on a side.


• We are reaching the limits of what can be cooled by air.
• Several Intel microprocessors have temperature diodes to reduce
activity automatically if the chip gets too hot.
- They may reduce voltage, clock frequency or the instruction issue rate.

73
Trends in Power in Integrated Circuits

Power is becoming a major limitation.

 Distributing the power, removing the heat, and


preventing hot spots have become increasingly difficult
challenges.

74
Static Power:
• Due to leakage current which flows even when a transistor
is off.
Power static = Current static × Voltage

Increasing the number of transistors increases power even


if they are turned off, and leakage current increases in
processors with smaller transistor sizes.

75
Goal for static power :
• 25% of the total power consumption

In high performance designs static power far exceeds that


goal.

To meet this goal, designers try to achieve high performance using multiple processors per chip, running at lower voltages and clock rates.

76
Trends in Cost

 Supercomputers
• Design for performance, cost tends to be less important

 Cost-sensitive designs are of growing significance.

 Understanding of cost and its factors is essential for designers


• Make decisions about whether or not a new feature should be included
in designs where cost is an issue.

77
The impact of Time, Volume and
Commodification on Cost

78
The impact of Time on Cost
Learning Curve:
Manufacturing costs decrease over time even without major
improvements in the basic implementation technology because
of increase in the yield.

• Yield: Percentage of manufactured devices that pass the testing


procedure.

• Whether it is a chip, a board, or a system


- Designs that have twice the yield will have half the cost.

79
The impact of Time on Cost

• Price per megabyte of DRAM dropped by 40% per year.

• DRAMs tend to be priced close to the cost with the exception of


periods when there is a shortage or an oversupply.

• Microprocessor prices drop over time, but because microprocessors are less
standardized than DRAMs, the relationship between price and cost is more
complex.

80
Fig. 1.10

The price of an Intel Pentium 4 and Pentium M at a given frequency decreases over time as
yield enhancements decrease the cost of a good die and competition forces price reductions.

81
The impact of Volume on Cost
Volume: A second key factor in determining cost.

• It decreases the time needed to increase the yield.


• It increases purchasing and manufacturing efficiency
• It decreases the amount of development cost that must be amortized by
each machine, thus allowing cost and selling price to be closer.

As a rule of thumb the cost decreases about 10% for each


doubling of the volume.
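For example, if this rule of thumb holds, increasing volume by a factor of 8 (three doublings) would lower unit cost by roughly 1 − 0.9³ ≈ 27%.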

82
The impact of Commodification on Cost

Commodities: Products that are sold by multiple


vendors in large volumes and are essentially identical.

• DRAMs, disks, monitors, keyboards.

• Over the past 15 years, the low end of the PC business has become a commodity business.

83
The impact of Commodification on Cost
 In Commodity market cost decreases because of
• Volume
• A clear product definition that allows multiple suppliers to compete.

 Competition decreases gap between cost and selling price.

 Tremendous price pressure.

 Very limited profits per unit.

84
Cost of an Integrated Circuit

85
Cost of an Integrated Circuit
 IC costs are becoming a greater portion of the cost of
the system.
 A wafer is tested and chopped into dies that are
packaged.
 The cost of a packaged IC is:
Cost of integrated circuit =
(Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
86
Cost of an Integrated Circuit

Number of good chips per wafer


• How many dies fit on a wafer

• Percentage of those that are good (Die yield)

Cost of die = Cost of wafer / (Dies per wafer × Die yield)

87
“The square peg in a round hole” problem
• Rectangular dies near the periphery of round wafers

Fig. 1.12

88
Cost of an Integrated Circuit

Dies per wafer = π × (Wafer diameter / 2)² / Die area − π × Wafer diameter / √(2 × Die area)

The second term compensates for the "square peg in a round hole" problem:

Dividing the circumference (π × Wafer diameter) by the diagonal of a square die gives approximately the number of dies along the edge.

89
Cost of an Integrated Circuit
Example:

Find the number of dies per 300 mm (30cm) wafer for a die that
is 1.5 cm on a side.

Die area: 2.25 cm2

Dies per wafer = π × (30/2)² / 2.25 − π × 30 / √(2 × 2.25) = 706.9/2.25 − 94.2/2.12 ≈ 270

90
Cost of an Integrated Circuit
Die Yield:
• What is the percentage of good dies on a wafer?
Empirical Model of IC Yield:
• It assumes defects are randomly distributed over the wafer
and that yield is inversely proportional to the complexity of
the fabrication process.

Die yield = Wafer yield × (1 + Defects per unit area × Die area / a)^(−a)
91
Cost of an Integrated Circuit
Wafer yield: Percentage of good wafers
• We will assume that wafer yield is 100%

Defects per unit area depend on the maturity of the


manufacturing process.

• In 2006 this value is typically 0.4 defects per cm2 for 90nm technology
- Depends on the maturity of the process.

92
Cost of an Integrated Circuit

a is a parameter that depends on

• number of masking levels


(a measure of manufacturing complexity, critical to die yield)

• For today’s multilevel metal CMOS processes a good estimate is


a = 4.0

93
Cost of an Integrated Circuit
Assume a defect density of 0.4 per cm2 and a = 4.0.

Die yields for dies that are 1.5 cm on a side:

0.4  2.25 4
Die yield = (1 + )  0.44
4.0
Die yields for dies that are 1 cm on a side:
0.4  1.00 4
Die yield = (1 + )  0.68
4.0
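These figures can be reproduced with a short Python sketch of the empirical yield model (wafer yield assumed to be 100%):

def die_yield(defect_density, die_area, alpha=4.0):
    # (1 + defects_per_cm2 * die_area / alpha) ** -alpha
    return (1 + defect_density * die_area / alpha) ** -alpha

print(round(die_yield(0.4, 2.25), 2))  # ~0.44 for a 1.5 cm-on-a-side die
print(round(die_yield(0.4, 1.00), 2))  # ~0.68 for a 1 cm-on-a-side die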
94
Cost of an Integrated Circuit

How many good dies per wafer?


Number of good dies per wafer=Number of dies* Die yield

From a 300 mm (30 cm) wafer


• 120 good 2.25 cm² dies
• 435 good 1.00 cm² dies

95
Cost of an Integrated Circuit
 Die size of most 32-bit and 64-bit microprocessors (in a 90 nm technology):
• Between 0.49cm2 and 2.25 cm2.

 Die size of embedded processors:


• Low-end 32-bit processors are sometimes as small as 0.25
cm2.
• Processors used for embedded control are less than 0.1
cm2.

96
Cost of an Integrated Circuit
Tremendous pressures to lower costs in DRAMs and
SRAMs.

Include redundancy to raise yield


• so that a certain number of flaws can be accommodated.

97
Cost of an Integrated Circuit
Processing a 300 mm (12 inch) diameter wafer in a leading
technology with 4 to 6 metal layers had a cost of between
$5000 and $6000 in 2006.

Cost of die = Cost of wafer / (Dies per wafer × Die yield)

Assuming a wafer cost of $5,500:
• Cost of a 2.25 cm² die: $46
• Cost of a 1 cm² die: $13
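These figures can be checked with a self-contained Python sketch that combines the dies-per-wafer and die-yield formulas (assumed inputs: $5,500 wafer, 30 cm diameter, 0.4 defects/cm², a = 4):

import math

def die_cost(wafer_cost, wafer_diameter, die_area, defect_density, alpha=4.0):
    dies = (math.pi * (wafer_diameter / 2) ** 2 / die_area
            - math.pi * wafer_diameter / math.sqrt(2 * die_area))
    yield_fraction = (1 + defect_density * die_area / alpha) ** -alpha
    return wafer_cost / (dies * yield_fraction)

print(round(die_cost(5500, 30, 2.25, 0.4)))  # ~$46
print(round(die_cost(5500, 30, 1.00, 0.4)))  # ~$13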

98
Cost of an Integrated Circuit
Cost of die = Cost of wafer / (Dies per wafer × Die yield)

Dies per wafer = π × (Wafer diameter / 2)² / Die area − π × Wafer diameter / √(2 × Die area)

Die yield = Wafer yield × (1 + Defects per unit area × Die area / a)^(−a)
99
Cost of an Integrated Circuit
The manufacturing process dictates
• Wafer cost
• Wafer yield
• Defects per unit area

100
Cost of an Integrated Circuit
Computer designer controls
• Die area
Hence the designer affects cost through
 What functions are included on or excluded from the
die and
 The number of I/O pins.

101
Cost of an Integrated Circuit
Cost of High Volume ICs:
• Variable cost of producing a functional die.

Cost of Low Volume ICs (less than a million units):


• Cost of a mask set (modern processes have 4 to 6 layers)
- This large fixed cost is a significant part of the production cost
- Mask costs exceed $1,000,000
• Cost of masks is likely to continue to increase, so designers may use
reconfigurable logic or gate arrays that have fewer custom mask levels.

102
Cost versus Price
 Margin between the cost of manufacturing a product and the
price at which the product sells has been shrinking.
 These margins pay for
• R&D
• Marketing
• Sales
• Manufacturing Equipment Maintenance
• Building Rental
• Cost of Financing
• Pretax Profits and Taxes

103
Cost vs. Price

 R&D spending in commodity PC business: 4%

 R&D spending in high-end server business:12%

 Companies with R&D spending of 15-20% do poorly.

104
Dependability
 Integrated circuits were one of the most reliable components
of a computer (error rate inside the chip was very low) but this
is changing as we head to feature sizes of 65 nm and smaller.

 Difficult question is deciding when a system is operating


correctly.

 Service Level Agreement (SLA) or Service Level Objective (SLO): the infrastructure provider pays the customer a penalty if it fails to meet the agreed service level for more than some number of hours per month.

105
Dependability
Systems alternate between two states of service with respect
to a SLA:

1. Service Accomplishment: service is delivered as specified in the SLA.

2. Service Interruption: the delivered service is different from the SLA.

Transitions from state 1 to state 2 are caused by failures.

Transitions from state 2 to state 1 are caused by restorations.

106
Dependability
Module reliability: Measure of the continuous service
accomplishment (time to failure).

MTTF (Mean Time to Failure): Reciprocal is a rate of failures.

MTTR(Mean Time to Repair): Measures service interruption

MTBF (Mean Time between Failures) = MTTF + MTTR

107
Dependability
If a collection of modules has exponentially
distributed lifetimes (i.e., the age of a module is not
important in probability of failure), the overall failure
rate of the collection is the sum of the failure rates of
the modules.

108
Dependability
Module Availability: Measure of service
accomplishment.
For nonredundant systems with repair:

Module availability = MTTF / (MTTF + MTTR)
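For example, a module with a hypothetical MTTF of 1,000,000 hours and an MTTR of 24 hours would have an availability of 1,000,000 / 1,000,024 ≈ 0.99998.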

109
Dependability
Assumption: failures are independent

Estimate reliability and availability of system from


reliability of components:

Failure rate of the system is equal to the sum of the


failure rates of all the components.

110
Example 1
Assume a disk subsystem with the following components and MTTFs:
• 10 disks, each rated at 1,000,000-hour MTTF
• 1 SCSI controller, 500,000-hour MTTF
• 1 power supply, 200,000-hour MTTF
• 1 fan, 200,000-hour MTTF
• 1 SCSI cable, 1,000,000-hour MTTF

111
Example 1
Failure rate of the system is equal to the sum of the
failure rates of all the components:
Failure rate system = 10 × 1/1,000,000 + 1/500,000 + 1/200,000 + 1/200,000 + 1/1,000,000
= (10 + 2 + 5 + 5 + 1) / 1,000,000 hours = 23 / 1,000,000 hours = 23,000 / 1,000,000,000 hours

MTTF system = 1 / Failure rate system = 1,000,000,000 hours / 23,000 ≈ 43,500 hours
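A short Python sketch of this calculation (component MTTFs in hours, as listed on the previous slide):

component_mttfs = [1_000_000] * 10 + [500_000, 200_000, 200_000, 1_000_000]

failure_rate = sum(1 / mttf for mttf in component_mttfs)  # 23 failures per million hours
mttf_system = 1 / failure_rate
print(round(mttf_system))  # ~43,478 hours (about 43,500)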

112

Redundancy
Primary way to cope with failure:

• Redundancy in time
- Repeat the operation to see if it is still erroneous

• Redundancy in resources
- Once the component is replaced, the dependability of the system is
assumed to be as good as new.

113
Example 2
Disk subsystems often have redundant power supplies to improve
dependability.

• Assume one power supply is sufficient to run the disk subsystem and that
we are adding one redundant power supply.

• MTTF for two power supplies is the mean time until one power supply fails
divided by the chance that the other will fail before the first one is replaced.

MTTF power supply pair = (MTTF power supply / 2) / (MTTR power supply / MTTF power supply) = MTTF power supply² / (2 × MTTR power supply)
114
Example 2
Assume it takes 24 hours for a human operator to notice that a power
supply has failed and replace it (MTTR = 24):

MTTF power supply pair = MTTF power supply² / (2 × MTTR power supply) = 200,000² / (2 × 24) ≈ 830,000,000 hours

The pair is about 4150 times more reliable (830,000,000 / 200,000 = 4150) than the single power supply!
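A small Python sketch of Example 2 (MTTF and MTTR in hours):

def mttf_pair(mttf_single, mttr):
    # MTTF of the pair = MTTF_single^2 / (2 * MTTR)
    return mttf_single ** 2 / (2 * mttr)

pair = mttf_pair(200_000, 24)
print(round(pair))            # ~833,333,333 hours (about 830,000,000)
print(round(pair / 200_000))  # ~4167, i.e. roughly the 4150x quoted above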


115
Quantitative Approach to Computer Design

 Empirical observations of programs

 Experimentation

 Simulation

116
Measuring, Reporting and
Summarizing Performance

117
Measuring Performance
A computer user interested in
Response time:
Time between the start and the completion of a job.

A manager of a processing center interested in


Throughput:
Total number of jobs done per unit of time.

118
Measuring Performance
Computer X is n times faster than Y:

n = Execution time Y / Execution time X

Execution time is the reciprocal of performance:

n = Execution time Y / Execution time X = (1 / Performance Y) / (1 / Performance X) = Performance X / Performance Y
119
Measuring Performance

Performance and execution time are reciprocals.


Increasing performance decreases execution time.

• “Improve performance” means “Increase performance”.


• “Improve execution time” means “Decrease execution
time”.

120
Measuring Performance

Performance is measured by the execution time of


real programs.

The computer that performs the same amount of work


in the least time is the fastest.
• Response time (when we measure one task)
• Throughput (when we measure many tasks)

121
Measuring Performance
Execution time can be defined in different ways:
• Wall-clock time
• Response time or Elapsed Time
- Latency to complete a task, including disk accesses, memory accesses,
input/output activities, operating system overhead.
- In multiprogramming the processor works on another program while waiting
for I/O and may not necessarily minimize the elapsed time of one program.
• CPU Time
- Time the processor is computing;
Does not include the time waiting for I/O or running other programs.
• Response time seen by the user is the elapsed time of the program, not
the CPU time.

122
Comparing Performance
Comparing performance of computers is not easy!

Even if one agrees on the

• Programs

• experimental environments

• definition of faster

123
Comparing Performance

A is 10 times faster than B for program P1. B is 10 times faster than A for program P2.

B is 2 times faster than C for program P1. C is 5 times faster than B for program P2.

A is 20 times faster than C for program P1. C is 50 times faster than A for program P2.

124
Execution times of two programs
on three machines

Assume programs P1 and P2 are run on each machine an equal number of times:


B is 9.1 (1001/110) times faster than A for programs P1 and P2.
C is 25 (1001/40) times faster than A for programs P1 and P2.
C is 2.75 (110/40) times faster than B for programs P1 and P2.
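A small Python sketch of these comparisons, using per-program execution times (in seconds) consistent with the ratios and totals quoted above (the figure itself is not reproduced here, so the individual times are assumptions):

times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}  # [P1, P2] per machine

totals = {m: sum(t) for m, t in times.items()}  # A: 1001, B: 110, C: 40
print(round(totals["A"] / totals["B"], 1))  # 9.1  -> B is 9.1x faster than A
print(round(totals["A"] / totals["C"], 1))  # 25.0 -> C is 25x faster than A
print(round(totals["B"] / totals["C"], 2))  # 2.75 -> C is 2.75x faster than B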

125
Comparing Performance

Computer users who routinely run the same programs would simply compare the execution time of their workloads: the mixture of programs and OS commands that users run on a computer.

126
Choosing Programs to
Evaluate Performance

Ideally, users would use their workload, but usually


they rely on benchmarks, hoping that these will
predict performance for their usage of the new
machine.
Benchmarks: a collection (suite) of programs that tries to measure the performance of processors with a variety of applications.

127
Choosing Programs to
Evaluate Performance

Best Choice: Real applications

Simpler programs than real applications that have been used


as benchmarks:
• Kernels
• Toy benchmarks
• Synthetic benchmarks

128
Choosing Programs to
Evaluate Performance
 Kernels
• Small key pieces of code from real programs that are used to evaluate performance.
- For example, “Livermore Loops” and Linpack.
- They were used to isolate the performance of individual features of a machine and to explain differences in the
performance of real programs.
 Toy benchmarks
• They are typically between 10 and 100 lines of code and produce a result the user already knows
before running the program.
- Examples: Quicksort, Sieve of Eratosthenes, Puzzle etc.
 Synthetic benchmarks
• Fake programs invented to try to match the profile and behavior of real applications, such as
Dhrystone.

129
Choosing Programs to
Evaluate Performance
All three are discredited today.
Attempts at using those have led to performance
pitfalls.
• The compiler writer and architect can conspire to make the
computer appear faster on these than on real applications.

130
Benchmark Suites

 Popular measure of performance of processors:


• Benchmark suites (collection of benchmarks)

 Key advantage: weakness of any one benchmark is lessened by the


presence of the other benchmarks

 Goal: A benchmark suite will characterize the relative performance


of two computers, particularly for programs not in the suite that
customers are likely to run.

131
Benchmark Suites

SPEC (Standard Performance Evaluation Corporation) is a


company that develops benchmark suites for different
application classes for workstations (since the 1980s).

SPEC benchmark suites are documented together with


reported results at www.spec.org

132
Desktop Benchmarks
• CPU-intensive benchmarks:

- SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006 (5 generations)

- SPEC2006:
• 12 integer programs
• 17 FP programs

- Real programs that are portable and vary from a C compiler to a chess program to
a quantum computer simulation.
- Useful for processor benchmarking for both desktop and single-processor servers.

• Graphics-intensive benchmarks:

133
Evolution of the SPEC benchmarks
Fig. 1.13

Integer programs are above the line


Floating-point programs are below the line.

134
Server Benchmarks
The SPEC CPU throughput-oriented benchmark SPECCPU2000 uses the SPEC CPU benchmarks to construct a throughput benchmark that measures SPEC rate:
• The processing rate of a multiprocessor is measured by running multiple
copies (usually as many as there are CPUs) of each SPEC CPU benchmark
and converting the CPU time into a rate.

135
Server Benchmarks
 Benchmarks for file servers
• SPECSFS: a benchmark for measuring NFS (Network File
System) performance using a script of file server requests.
- It tests performance of the I/O (both disk and network I/O) and the
CPU.
 Benchmarks for web servers
• SPECWeb
- It simulates multiple clients requesting pages from a server, as well
as clients posting data to the server.

136
Server Benchmarks

 Benchmarks for database and transaction processing


systems (TP benchmarks):
• They measure performance in transactions per second.
• In addition they include response time requirement, so that
throughput is measured only when the response time limit
is met.
• Examples: Airline reservation system, bank ATM system.

137
Server Benchmarks

Transaction Processing Council (TPC): a vendor-independent organization that
• Created realistic and fair benchmarks for transaction
processing (TPC benchmarks), described at www.tpc.org.

138
Embedded Benchmarks
Benchmarks for embedded computing systems are in
a far more primitive state than those for either
desktops or servers.
The use of a single benchmark is unrealistic due to:
• the variety in embedded applications as well as
• differences in performance requirements (hard real-time,
soft real-time and overall cost-performance)

139
Embedded Benchmarks
EEMBC (Embedded Microprocessor Benchmark Consortium)
• Set of 41 kernels to predict performance of different embedded applications
- Automotive/industrial
- Consumer
- Networking
- Office Automation
- Telecommunications

EEMBC does not have a reputation of being a good predictor of the relative
performance of different embedded computers.

140
Reporting Performance Results
Guiding principle: reproducibility

A SPEC benchmark report requires


• a fairly complete description of the machine and
• the compiler flags
• actual performance times in tabular form and as a graph.

141
Reporting Performance Results
Tremendous pressure on improving performance of
programs widely used in evaluating machines.
• This has led companies to add optimizations that improve
performance of synthetic programs, toy programs, kernels
or even real programs.
• Adding such optimizations is more difficult in real
programs.
• This fact has led benchmark providers to specify the rules
under which compilers must operate.

142
Reporting Performance Results
A system’s software configuration can significantly
affect the performance results for a benchmark.
 OS performance and support can be very important
for server benchmarks.
• These benchmarks are sometimes run in single-user mode.

 Compiler technology can play a big role in the


performance of compute-oriented benchmarks.

143
Reporting Performance Results
The impact of compiler technology can be especially
large
• when modification of the source is allowed or
• when a benchmark is particularly susceptible to an
optimization.
It is very important to describe exactly the software
system, as well as any special modifications.

144
Reporting Performance Results
To customize the software to improve the
performance of a benchmark
• Benchmark-specific flags.
These flags often caused transformations that
• would be illegal on many programs or
• would slow down performance on others.

145
Reporting Performance Results
Baseline performance:
To increase the significance of results benchmark
developers often require the vendor to use one
compiler and one set of flags for all the programs in
the same language (C or Fortran).

146
Reporting Performance Results
Key issue

• To allow or not to allow source code modifications

147
Reporting Performance Results

Three different approaches:


• Source code modifications are not allowed
• Source code modifications are allowed, but are difficult or
impossible to make
- For example database benchmarks rely on standard database programs that are tens
of millions of lines of code.
The database companies are unlikely to make changes to enhance the performance
of one particular computer.
• Source modifications are allowed

148
Reporting Performance Results

When are modifications allowed by benchmark


designers?
When such modifications reflect real practice.

149
