Chapter 11

The document discusses performance issues in computer organization and architecture. It covers designing for performance through increasing processor speed and improving chip organization. Some techniques to increase processor speed include pipelining, branch prediction, superscalar execution, and speculative execution. The document also discusses balancing performance between components and improving chip organization through larger caches and multicore processors.

Uploaded by

halilkuyuk

William Stallings
Computer Organization
and Architecture
10th Edition
© 2016 Pearson Education, Inc., Hoboken,
NJ. All rights reserved.
Chapter 2
Performance Issues

Designing for Performance
• The cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically
• Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
• Processors are so inexpensive that we now have microprocessors we throw away
• Desktop applications that require the great power of today’s microprocessor-based systems include:
  • Image processing
  • Three-dimensional rendering
  • Speech recognition
  • Videoconferencing
  • Multimedia authoring
  • Voice and video annotation of files
  • Simulation modeling
• Businesses are relying on increasingly powerful servers to handle transaction and database processing and to support massive client/server networks that have replaced the huge mainframe computer centers of yesteryear
• Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-transaction-rate applications for a broad spectrum of clients
Microprocessor Speed
Techniques built into contemporary processors include:

• Pipelining: Processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously
• Branch prediction: Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
• Superscalar execution: This is the ability to issue more than one instruction in every processor clock cycle. (In effect, multiple parallel pipelines are used.)
• Data flow analysis: Processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions
• Speculative execution: Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
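As a rough illustration of why pipelining helps, the sketch below compares cycle counts under an idealized model. The model is an assumption made here, not something the slides state: every stage takes one clock cycle, there are no stalls, and a k-stage pipeline finishes n instructions in k + (n - 1) cycles instead of n × k. The function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Idealized pipeline (assumed: one cycle per stage, no stalls).

    The first instruction needs n_stages cycles to fill the pipe; after that
    one instruction completes on every cycle.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5
    print("unpipelined:", cycles_unpipelined(n, k))
    print("pipelined:  ", cycles_pipelined(n, k))
```

Real processors fall short of this bound because of branches and data dependencies, which is where branch prediction and data flow analysis come in.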

Performance Balance
• Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
• Architectural examples include:
  • Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  • Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  • Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  • Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow

Figure 2.1 Typical I/O Device Data Rates (horizontal axis: data rate in bps, logarithmic scale from 10^1 to 10^11; devices shown: Ethernet modem (max speed), graphics display, Wi-Fi modem (max speed), hard disk, optical disc, laser printer, scanner, mouse, keyboard)
Improvements in Chip Organization and Architecture
• Increase hardware speed of processor
  • Fundamentally due to shrinking logic gate size
  • More gates, packed more tightly, increasing clock rate
  • Propagation time for signals reduced
• Increase size and speed of caches
  • Dedicating part of processor chip
  • Cache access times drop significantly
• Change processor organization and architecture
  • Increase effective speed of instruction execution
  • Parallelism

Problems with Clock Speed and Logic Density
• Power
  • Power density increases with density of logic and clock speed
  • Dissipating heat
• RC delay
  • Speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them
  • Delay increases as the RC product increases
  • As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance
  • Also, the wires are closer together, increasing capacitance
• Memory latency
  • Memory speeds lag processor speeds
Figure 2.2 Processor Trends (logarithmic vertical axis from 0.1 to 10^7; series: Transistors (Thousands), Frequency (MHz), Power (W), Cores; horizontal axis: years 1970 to 2010)

Multicore
• The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
• Strategy is to use two simpler processors on the chip rather than one more complex processor
• With two processors larger caches are justified
• As caches became larger it made performance sense to create two and then three levels of cache on a chip

Many Integrated Core (MIC)
Graphics Processing Unit (GPU)

MIC
• Leap in performance as well as the challenges in developing software to exploit such a large number of cores
• The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip

GPU
• Core designed to perform parallel operations on graphics data
• Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
• Used as vector processors for a variety of applications that require repetitive computations

Amdahl’s Law
• Gene Amdahl
• Deals with the potential speedup of a program using multiple processors compared to a single processor
• Illustrates the problems facing industry in the development of multi-core machines
  • Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
• Can be generalized to evaluate and design technical improvement in a computer system

Single processor: $T = (1 - f)T + fT$
$N$ processors: $(1 - f)T + \frac{fT}{N} = \left[1 - f\left(1 - \frac{1}{N}\right)\right]T$

Figure 2.3 Illustration of Amdahl’s Law

Figure 2.4 Amdahl’s Law for Multiprocessors (speedup versus number of processors; curves for f = 0.5, f = 0.75, and f = 0.90)
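The relationship in Figure 2.3 can be sketched in a few lines of Python. Here `f` is taken to be the fraction of execution time that can be spread across processors and `n` the number of processors, so speedup is the single-processor time `T` divided by `(1 - f)T + fT/N`. The function names are illustrative, not from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when a fraction f of the work is parallelizable.

    Speedup = T / ((1 - f)T + fT/n) = 1 / ((1 - f) + f/n)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as n grows without limit: 1 / (1 - f)."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    # The same values of f as the curves listed for Figure 2.4.
    for f in (0.5, 0.75, 0.90):
        row = [round(amdahl_speedup(f, n), 2) for n in (1, 2, 8, 64)]
        print(f"f={f}: speedups for N=1,2,8,64 -> {row}, limit {amdahl_limit(f):.1f}")
```

The limit function makes the practical message concrete: with f = 0.75, no number of processors can deliver more than a fourfold speedup.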
Little’s Law
• Fundamental and simple relation with broad applications
• Can be applied to almost any system that is statistically in steady state, and in which there is no leakage
• Queuing system
  • If server is idle an item is served immediately, otherwise an arriving item joins a queue
  • There can be a single queue for a single server or for multiple servers, or multiple queues with one being for each of multiple servers
• Average number of items in a queuing system equals the average rate at which items arrive multiplied by the time that an item spends in the system
  • Relationship requires very few assumptions
  • Because of its simplicity and generality it is extremely useful
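Little’s Law is usually written L = λW, where L is the average number of items in the system, λ the average arrival rate, and W the average time an item spends in the system. The sketch below applies it in both directions. The symbols, function names, and example numbers are illustrative assumptions, not values from the slides.

```python
def average_items(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: L = lambda * W."""
    return arrival_rate * time_in_system


def average_time(avg_items: float, arrival_rate: float) -> float:
    """Little's Law rearranged: W = L / lambda."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return avg_items / arrival_rate


if __name__ == "__main__":
    # Hypothetical server: 200 requests arrive per second and each spends
    # 0.05 s in the system on average (queueing plus service).
    print("items in system:", average_items(200.0, 0.05))
    # Hypothetical reverse question: 10 items in the system on average at
    # 200 arrivals per second implies this average time per item.
    print("time in system:", average_time(10.0, 200.0))
```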

Figure 2.5 System Clock (diagram labels: quartz crystal; analog to digital conversion. From Computer Desktop Encyclopedia, 1998, The Computer Language Co.)

Table 2.1 Performance Factors and System Attributes

System attribute | Ic | p | m | k | τ
Instruction set architecture | X | X | | |
Compiler technology | X | X | X | |
Processor implementation | | X | | | X
Cache and memory hierarchy | | | | X | X
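The five factors in Table 2.1 combine into the execution-time relation T = Ic × [p + (m × k)] × τ. The definitions used below are assumptions stated here because the slide shows only the symbols: Ic is the instruction count, p the processor cycles needed to decode and execute an instruction, m the memory references per instruction, k the ratio of memory cycle time to processor cycle time, and τ the processor cycle time. The example numbers are hypothetical.

```python
def cycles_per_instruction(p: float, m: float, k: float) -> float:
    """Average cycles per instruction: p + (m * k)."""
    return p + m * k


def execution_time(ic: float, p: float, m: float, k: float, tau: float) -> float:
    """Processor time T = Ic * [p + (m * k)] * tau, in the units of tau."""
    return ic * cycles_per_instruction(p, m, k) * tau


if __name__ == "__main__":
    # Hypothetical program: 2 million instructions, 2 cycles each to decode
    # and execute, 0.5 memory references per instruction, memory 4x slower
    # than the processor cycle, and a 1 ns cycle time (1 GHz clock).
    t = execution_time(ic=2_000_000, p=2, m=0.5, k=4, tau=1e-9)
    print(f"T = {t:.6f} s")
```

Separating the factors this way shows where each system attribute acts: a better cache hierarchy lowers k, while a better compiler can lower Ic, p, and m.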

The use of benchmarks to compare systems involves calculating the mean value of a set of data points related to execution time

The three common formulas used for calculating a mean are:
• Arithmetic
• Geometric
• Harmonic
Figure 2.6 Comparison of Means on Various Data Sets (each set has a maximum data point value of 11)

Each panel (a) to (g) plots MD, AM, GM, and HM for one data set on a scale from 0 to 11:
(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)

MD = median, AM = arithmetic mean, GM = geometric mean, HM = harmonic mean
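The four statistics compared in Figure 2.6 can be computed with Python’s standard `statistics` module (Python 3.8 or later). The sketch below uses three of the figure’s data sets: uniform, upper outlier, and lower outlier. The helper name `summarize` is illustrative.

```python
from statistics import fmean, geometric_mean, harmonic_mean, median

# Data sets (c), (f) and (g) from Figure 2.6.
DATA_SETS = {
    "(c) uniform": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "(f) upper outlier": [11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "(g) lower outlier": [1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
}


def summarize(xs):
    """Return the median and the three means for a list of positive numbers."""
    return {
        "MD": median(xs),
        "AM": fmean(xs),
        "GM": geometric_mean(xs),
        "HM": harmonic_mean(xs),
    }


if __name__ == "__main__":
    for name, xs in DATA_SETS.items():
        stats = {key: round(value, 2) for key, value in summarize(xs).items()}
        print(name, stats)
```

For any set of positive values the harmonic mean is at most the geometric mean, which is at most the arithmetic mean. The three coincide only when all values are equal, as in data set (a).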
Arithmetic Mean
• An Arithmetic Mean (AM) is an appropriate measure if the sum of all the measurements is a meaningful and interesting value
• The AM is a good candidate for comparing the execution time performance of several systems

For example, suppose we were interested in using a system for large-scale simulation studies and wanted to evaluate several alternative products. On each system we could run the simulation multiple times with different input values for each run, and then take the average execution time across all runs. The use of multiple runs with different inputs should ensure that the results are not heavily biased by some unusual feature of a given input set. The AM of all the runs is a good measure of the system’s performance on simulations, and a good number to use for system comparison.

• The AM used for a time-based variable, such as program execution time, has the important property that it is directly proportional to the total time
  • If the total time doubles, the mean value doubles
Table 2.2 A Comparison of Arithmetic and Harmonic Means for Rates

Measure | Computer A time (secs) | Computer B time (secs) | Computer C time (secs) | Computer A rate (MFLOPS) | Computer B rate (MFLOPS) | Computer C rate (MFLOPS)
Program 1 (10^8 FP ops) | 2.0 | 1.0 | 0.75 | 50 | 100 | 133.33
Program 2 (10^8 FP ops) | 0.75 | 2.0 | 4.0 | 133.33 | 50 | 25
Total execution time | 2.75 | 3.0 | 4.75 | | |
Arithmetic mean of times | 1.38 | 1.5 | 2.38 | | |
Inverse of total execution time (1/sec) | | | | 0.36 | 0.33 | 0.21
Arithmetic mean of rates | | | | 91.67 | 75.00 | 79.17
Harmonic mean of rates | | | | 72.72 | 66.67 | 42.11
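The figures in Table 2.2 can be reproduced from the execution times alone. The sketch below assumes, as the row labels indicate, that each program performs 10^8 floating-point operations. It shows that the harmonic mean of the rates tracks total execution time while the arithmetic mean of the rates does not. Helper names are illustrative.

```python
from statistics import fmean, harmonic_mean

FP_OPS = 1e8  # floating-point operations per program, per the table's row labels

# Execution times in seconds for Program 1 and Program 2.
TIMES = {"A": [2.0, 0.75], "B": [1.0, 2.0], "C": [0.75, 4.0]}


def rates_mflops(times):
    """Convert execution times to rates in millions of FP operations per second."""
    return [FP_OPS / t / 1e6 for t in times]


def summarize(times):
    rates = rates_mflops(times)
    return {
        "total_time": sum(times),
        "am_rate": fmean(rates),
        "hm_rate": harmonic_mean(rates),
    }


if __name__ == "__main__":
    for name, times in TIMES.items():
        s = summarize(times)
        print(name, {key: round(value, 2) for key, value in s.items()})
```

Computer C has the longest total execution time, yet its arithmetic mean rate is higher than Computer B’s. The harmonic mean ranks the machines A, B, C, matching total execution time. For Computer A it computes to 72.73 when rounded; the table prints 72.72.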

Table 2.3 A Comparison of Arithmetic and Geometric Means for Normalized Results

(a) Results normalized to Computer A

Measure | Computer A time | Computer B time | Computer C time
Program 1 | 2.0 (1.0) | 1.0 (0.5) | 0.75 (0.38)
Program 2 | 0.75 (1.0) | 2.0 (2.67) | 4.0 (5.33)
Total execution time | 2.75 | 3.0 | 4.75
Arithmetic mean of normalized times | 1.00 | 1.58 | 2.85
Geometric mean of normalized times | 1.00 | 1.15 | 1.41

(b) Results normalized to Computer B

Measure | Computer A time | Computer B time | Computer C time
Program 1 | 2.0 (2.0) | 1.0 (1.0) | 0.75 (0.75)
Program 2 | 0.75 (0.38) | 2.0 (1.0) | 4.0 (2.0)
Total execution time | 2.75 | 3.0 | 4.75
Arithmetic mean of normalized times | 1.19 | 1.00 | 1.38
Geometric mean of normalized times | 0.87 | 1.00 | 1.22
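The point of Table 2.3 is easier to see when the normalization is computed for both reference machines. The sketch below uses the table’s execution times and ranks the machines by each mean. The helper names are illustrative.

```python
from statistics import fmean, geometric_mean

# Execution times for Program 1 and Program 2, from Table 2.3.
TIMES = {"A": [2.0, 0.75], "B": [1.0, 2.0], "C": [0.75, 4.0]}


def normalized_means(reference):
    """Map each machine to (arithmetic mean, geometric mean) of its times
    divided by the reference machine's times, program by program."""
    ref = TIMES[reference]
    means = {}
    for name, times in TIMES.items():
        ratios = [t / r for t, r in zip(times, ref)]
        means[name] = (fmean(ratios), geometric_mean(ratios))
    return means


def ranking(means, index):
    """Machines ordered from best (smallest mean) to worst; index 0 = AM, 1 = GM."""
    return sorted(means, key=lambda name: means[name][index])


if __name__ == "__main__":
    for reference in ("A", "B"):
        means = normalized_means(reference)
        print("normalized to", reference,
              "| AM ranking:", ranking(means, 0),
              "| GM ranking:", ranking(means, 1))
```

With the arithmetic mean, the apparent winner depends on the reference machine: A is fastest when normalizing to A, B when normalizing to B. The geometric mean gives the order A, B, C in both cases.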
Table 2.4 Another Comparison of Arithmetic and Geometric Means for Normalized Results

(a) Results normalized to Computer A

Measure | Computer A time | Computer B time | Computer C time
Program 1 | 2.0 (1.0) | 1.0 (0.5) | 0.20 (0.1)
Program 2 | 0.4 (1.0) | 2.0 (5.0) | 4.0 (10)
Total execution time | 2.4 | 3.00 | 4.2
Arithmetic mean of normalized times | 1.00 | 2.75 | 5.05
Geometric mean of normalized times | 1.00 | 1.58 | 1.00

(b) Results normalized to Computer B

Measure | Computer A time | Computer B time | Computer C time
Program 1 | 2.0 (2.0) | 1.0 (1.0) | 0.20 (0.2)
Program 2 | 0.4 (0.2) | 2.0 (1.0) | 4.0 (2)
Total execution time | 2.4 | 3.00 | 4.2
Arithmetic mean of normalized times | 1.10 | 1.00 | 1.10
Geometric mean of normalized times | 0.63 | 1.00 | 0.63
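Table 2.4 also shows a limitation that a short calculation makes explicit: the geometric mean is consistent across reference machines, but it need not agree with total execution time. The sketch below, with an illustrative helper name, checks this for Computers A and C using the table’s times.

```python
from statistics import geometric_mean

# Execution times for Program 1 and Program 2, from Table 2.4.
TIMES = {"A": [2.0, 0.4], "B": [1.0, 2.0], "C": [0.20, 4.0]}


def gm_normalized(name, reference):
    """Geometric mean of one machine's times divided by the reference's times."""
    ratios = [t / r for t, r in zip(TIMES[name], TIMES[reference])]
    return geometric_mean(ratios)


if __name__ == "__main__":
    for reference in ("A", "B"):
        print("normalized to", reference,
              {name: round(gm_normalized(name, reference), 2) for name in TIMES})
    print("total times:", {name: round(sum(times), 2) for name, times in TIMES.items()})
```

Under either normalization A and C receive the same geometric mean, even though C’s total execution time (4.2) is well above A’s (2.4).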
Benchmark Principles
• Desirable characteristics of a benchmark program:
  1. It is written in a high-level language, making it portable across different machines
  2. It is representative of a particular kind of programming domain or paradigm, such as systems programming, numerical programming, or commercial programming
  3. It can be measured easily
  4. It has wide distribution

System Performance Evaluation Corporation (SPEC)
• Benchmark suite
  • A collection of programs, defined in a high-level language
  • Together attempt to provide a representative test of a computer in a particular application or system programming area
• SPEC
  • An industry consortium
  • Defines and maintains the best known collection of benchmark suites aimed at evaluating computer systems
  • Performance measurements are widely used for comparison and research purposes

SPEC CPU2006
• Best known SPEC benchmark suite
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
• Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
• Suite contains over 3 million lines of code
• Fifth generation of processor intensive suites from SPEC

Table 2.5 SPEC CPU2006 Integer Benchmarks

Benchmark | Reference time (hours) | Instr count (billion) | Language | Application Area | Brief Description
400.perlbench | 2.71 | 2,378 | C | Programming Language | PERL programming language interpreter, applied to a set of three programs.
401.bzip2 | 2.68 | 2,472 | C | Compression | General-purpose data compression with most work done in memory, rather than doing I/O.
403.gcc | 2.24 | 1,064 | C | C Compiler | Based on gcc Version 3.2, generates code for Opteron.
429.mcf | 2.53 | 327 | C | Combinatorial Optimization | Vehicle scheduling algorithm.
445.gobmk | 2.91 | 1,603 | C | Artificial Intelligence | Plays the game of Go, a simply described but deeply complex game.
456.hmmer | 2.59 | 3,363 | C | Search Gene Sequence | Protein sequence analysis using profile hidden Markov models.
458.sjeng | 3.36 | 2,383 | C | Artificial Intelligence | A highly ranked chess program that also plays several chess variants.
462.libquantum | 5.76 | 3,555 | C | Physics / Quantum Computing | Simulates a quantum computer, running Shor's polynomial-time factorization algorithm.
464.h264ref | 6.15 | 3,731 | C | Video Compression | H.264/AVC (Advanced Video Coding) Video compression.
471.omnetpp | 1.74 | 687 | C++ | Discrete Event Simulation | Uses the OMNet++ discrete event simulator to model a large Ethernet campus network.
473.astar | 1.95 | 1,200 | C++ | Path-finding Algorithms | Pathfinding library for 2D maps.
483.xalancbmk | 1.92 | 1,184 | C++ | XML Processing | A modified version of Xalan-C++, which transforms XML documents to other document types.

(Table can be found on page 69 in the textbook.)
Table 2.6 SPEC CPU2006 Floating-Point Benchmarks

Benchmark | Reference time (hours) | Instr count (billion) | Language | Application Area | Brief Description
410.bwaves | 3.78 | 1,176 | Fortran | Fluid Dynamics | Computes 3D transonic transient laminar viscous flow.
416.gamess | 5.44 | 5,189 | Fortran | Quantum Chemistry | Quantum chemical computations.
433.milc | 2.55 | 937 | C | Physics / Quantum Chromodynamics | Simulates behavior of quarks and gluons
434.zeusmp | 2.53 | 1,566 | Fortran | Physics / CFD | Computational fluid dynamics simulation of astrophysical phenomena.
435.gromacs | 1.98 | 1,958 | C, Fortran | Biochemistry / Molecular Dynamics | Simulate Newtonian equations of motion for hundreds to millions of particles.
436.cactusADM | 3.32 | 1,376 | C, Fortran | Physics / General Relativity | Solves the Einstein evolution equations.
437.leslie3d | 2.61 | 1,273 | Fortran | Fluid Dynamics | Model fuel injection flows.
444.namd | 2.23 | 2,483 | C++ | Biology / Molecular Dynamics | Simulates large biomolecular systems.
447.dealII | 3.18 | 2,323 | C++ | Finite Element Analysis | Program library targeted at adaptive finite elements and error estimation.
450.soplex | 2.32 | 703 | C++ | Linear Programming, Optimization | Test cases include railroad planning and military airlift models.
453.povray | 1.48 | 940 | C++ | Image Ray-tracing | 3D Image rendering.
454.calculix | 2.29 | 3,04` | C, Fortran | Structural Mechanics | Finite element code for linear and nonlinear 3D structural applications.
459.GemsFDTD | 2.95 | 1,320 | Fortran | Computational Electromagnetics | Solves the Maxwell equations in 3D.
465.tonto | 2.73 | 2,392 | Fortran | Quantum Chemistry | Quantum chemistry package, adapted for crystallographic tasks.
470.lbm | 3.82 | 1,500 | C | Fluid Dynamics | Simulates incompressible fluids in 3D.
481.wrf | 3.10 | 1,684 | C, Fortran | Weather | Weather forecasting model
482.sphinx3 | 5.41 | 2,472 | C | Speech recognition | Speech recognition software.

(Table can be found on page 70 in the textbook.)
Terms Used in SPEC Documentation
• Benchmark
  • A program written in a high-level language that can be compiled and executed on any computer that implements the compiler
• System under test
  • This is the system to be evaluated
• Reference machine
  • This is a system used by SPEC to establish a baseline performance for all benchmarks
  • Each benchmark is run and measured on this machine to establish a reference time for that benchmark
• Base metric
  • These are required for all reported results and have strict guidelines for compilation
• Peak metric
  • This enables users to attempt to optimize system performance by optimizing the compiler output
• Speed metric
  • This is simply a measurement of the time it takes to execute a compiled benchmark
  • Used for comparing the ability of a computer to complete single tasks
• Rate metric
  • This is a measurement of how many tasks a computer can accomplish in a certain amount of time
  • This is called a throughput, capacity, or rate measure
  • Allows the system under test to execute simultaneous tasks to take advantage of multiple processors

1. Start
2. Get next program
3. Run program three times
4. Select median value
5. Ratio(prog) = Tref(prog)/TSUT(prog)
6. More programs? If yes, return to step 2; if no, compute geometric mean of all ratios
7. End

Figure 2.7 SPEC Evaluation Flowchart
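The procedure in Figure 2.7 maps directly onto a short function: take the median of three runs, divide the reference time by it, and combine the per-program ratios with a geometric mean. The sketch below uses the first four rows of the Sun Blade 1000 results in this chapter as sample input. The function names are illustrative, and the four-program subset is only a demonstration, not the published suite score.

```python
from statistics import geometric_mean, median

# benchmark -> (three measured execution times, reference time),
# taken from the first four rows of the Sun Blade 1000 results.
RUNS = {
    "400.perlbench": ([3077, 3076, 3080], 9770),
    "401.bzip2": ([3260, 3263, 3260], 9650),
    "403.gcc": ([2711, 2701, 2702], 8050),
    "429.mcf": ([2356, 2331, 2301], 9120),
}


def spec_ratio(times, t_ref):
    """Ratio(prog) = Tref(prog) / TSUT(prog), with TSUT the median of the runs."""
    return t_ref / median(times)


def spec_metric(results):
    """Geometric mean of the per-program ratios."""
    return geometric_mean([spec_ratio(times, t_ref) for times, t_ref in results.values()])


if __name__ == "__main__":
    for name, (times, t_ref) in RUNS.items():
        print(f"{name}: ratio {spec_ratio(times, t_ref):.2f}")
    print(f"geometric mean over these four programs: {spec_metric(RUNS):.2f}")
```

Because the ratio divides the reference time by the measured time, a larger ratio means a faster system under test.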

(a) Sun Blade 1000

Benchmark | Execution time (run 1) | Execution time (run 2) | Execution time (run 3) | Reference time | Ratio
400.perlbench | 3077 | 3076 | 3080 | 9770 | 3.18
401.bzip2 | 3260 | 3263 | 3260 | 9650 | 2.96
403.gcc | 2711 | 2701 | 2702 | 8050 | 2.98
429.mcf | 2356 | 2331 | 2301 | 9120 | 3.91
445.gobmk | 3319 | 3310 | 3308 | 10490 | 3.17
456.hmmer | 2586 | 2587 | 2601 | 9330 | 3.61
458.sjeng | 3452 | 3449 | 3449 | 12100 | 3.51
462.libquantum | 10318 | 10319 | 10273 | 20720 | 2.01
464.h264ref | 5246 | 5290 | 5259 | 22130 | 4.21
471.omnetpp | 2565 | 2572 | 2582 | 6250 | 2.43
473.astar | 2522 | 2554 | 2565 | 7020 | 2.75
483.xalancbmk | 2014 | 2018 | 2018 | 6900 | 3.42
(b) Sun Blade X6250

Benchmark | Execution time (run 1) | Execution time (run 2) | Execution time (run 3) | Reference time | Ratio | Rate
400.perlbench | 497 | 497 | 497 | 9770 | 19.66 | 78.63
401.bzip2 | 613 | 614 | 613 | 9650 | 15.74 | 62.97
403.gcc | 529 | 529 | 529 | 8050 | 15.22 | 60.87
429.mcf | 472 | 472 | 473 | 9120 | 19.32 | 77.29
445.gobmk | 637 | 637 | 637 | 10490 | 16.47 | 65.87
456.hmmer | 446 | 446 | 446 | 9330 | 20.92 | 83.68
458.sjeng | 631 | 632 | 630 | 12100 | 19.18 | 76.70
462.libquantum | 614 | 614 | 614 | 20720 | 33.75 | 134.98
464.h264ref | 830 | 830 | 830 | 22130 | 26.66 | 106.65
471.omnetpp | 619 | 620 | 619 | 6250 | 10.10 | 40.39
473.astar | 580 | 580 | 580 | 7020 | 12.10 | 48.41
483.xalancbmk | 422 | 422 | 422 | 6900 | 16.35 | 65.40
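The Rate column for the Sun Blade X6250 is numerically consistent with four times the Ratio column. This suggests a rate metric of the form N × Tref / TSUT with N = 4 simultaneous copies. That reading is an inference from the numbers, not something the slides state, so the sketch below labels N as an assumption. The function name is illustrative.

```python
from statistics import median

N_COPIES = 4  # assumption inferred from the table: Rate is about 4 x Ratio

# benchmark -> (three measured execution times, reference time, tabulated rate),
# from the Sun Blade X6250 results.
X6250 = {
    "400.perlbench": ([497, 497, 497], 9770, 78.63),
    "401.bzip2": ([613, 614, 613], 9650, 62.97),
    "429.mcf": ([472, 472, 473], 9120, 77.29),
}


def spec_rate(times, t_ref, n_copies=N_COPIES):
    """Assumed rate metric: n_copies * Tref / TSUT, with TSUT the median run time."""
    return n_copies * t_ref / median(times)


if __name__ == "__main__":
    for name, (times, t_ref, tabulated) in X6250.items():
        print(f"{name}: computed {spec_rate(times, t_ref):.2f}, table {tabulated}")
```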

Summary
Performance Issues
Chapter 2

• Designing for performance
  • Microprocessor speed
  • Performance balance
  • Improvements in chip organization and architecture
• Multicore
• MICs
• GPGPUs
• Amdahl’s Law
• Little’s Law
• Basic measures of computer performance
  • Clock speed
  • Instruction execution rate
• Calculating the mean
  • Arithmetic mean
  • Harmonic mean
  • Geometric mean
• Benchmark principles
• SPEC benchmarks
