CH02 COA10e.performance Issues

The document discusses performance issues in computer organization and architecture, highlighting the dramatic improvements in computing power and the cost reduction of systems. It covers various techniques to enhance performance, such as pipelining, branch prediction, and multicore processors, as well as principles like Amdahl's Law and Little's Law for evaluating system performance. Additionally, it introduces benchmark principles and the SPEC benchmark suite for measuring and comparing computer system performance.

Uploaded by

chamso Abou

William Stallings
Computer Organization
and Architecture
10th Edition

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Chapter 2
Performance Issues

Designing for Performance
 The cost of computer systems continues to drop dramatically, while the performance
and capacity of those systems continue to rise equally dramatically
 Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years
ago
 Processors are so inexpensive that we now have microprocessors we throw away
 Desktop applications that require the great power of today’s microprocessor-based
systems include:
 Image processing
 Three-dimensional rendering
 Speech recognition
 Videoconferencing
 Multimedia authoring
 Voice and video annotation of files
 Simulation modeling

 Businesses are relying on increasingly powerful servers to handle transaction
and database processing and to support massive client/server networks that
have replaced the huge mainframe computer centers of yesteryear
 Cloud service providers use massive high-performance banks of servers to
satisfy high-volume, high-transaction-rate applications for a broad spectrum of
clients
Performance Balance
 Adjust the organization and architecture to compensate for the mismatch
among the capabilities of the various components
 Architectural examples include:
 Increase the number of bits that are retrieved at one time by making DRAMs
“wider” rather than “deeper” and by using wide bus data paths
 Reduce the frequency of memory access by incorporating increasingly
complex and efficient cache structures between the processor and main memory
 Change the DRAM interface to make it more efficient by including a cache or
other buffering scheme on the DRAM chip
 Increase the interconnect bandwidth between processors and memory by
using higher speed buses and a hierarchy of buses to buffer and structure data flow

Microprocessor Speed
Techniques built into contemporary processors include:

• Pipelining: Processor moves data or instructions into a conceptual pipe with
all stages of the pipe processing simultaneously

• Branch prediction: Processor looks ahead in the instruction code fetched from
memory and predicts which branches, or groups of instructions, are likely to
be processed next

• Superscalar execution: This is the ability to issue more than one instruction
in every processor clock cycle. (In effect, multiple parallel pipelines are used.)

• Data flow analysis: Processor analyzes which instructions are dependent on
each other’s results, or data, to create an optimized schedule of instructions

• Speculative execution: Using branch prediction and data flow analysis, some
processors speculatively execute instructions ahead of their actual appearance
in the program execution, holding the results in temporary locations, keeping
execution engines as busy as possible
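
The benefit of keeping all pipeline stages busy can be estimated with a small calculation. The sketch below assumes an idealized pipeline that is not described on the slide: every stage takes one clock cycle and there are no stalls or mispredicted branches. The function names and the example figures (100 instructions, 5 stages) are hypothetical.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction passes through all stages alone."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles for an ideal pipeline (assumed: one cycle per stage, no stalls).

    The first instruction needs num_stages cycles to fill the pipe; each
    later instruction then completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    n, k = 100, 5  # hypothetical workload and pipeline depth
    before = unpipelined_cycles(n, k)
    after = pipelined_cycles(n, k)
    print(f"unpipelined: {before} cycles, pipelined: {after} cycles")
    print(f"ideal speedup: {before / after:.2f}")
```

For long instruction streams the ideal speedup approaches the number of stages; real processors fall short of this because of dependencies and branches, which is what branch prediction and data flow analysis try to mitigate.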

Improvements in Chip
Organization and Architecture
 Increase hardware speed of processor
 Fundamentally due to shrinking logic gate size
 More gates, packed more tightly, increasing clock rate
 Propagation time for signals reduced
 Increase size and speed of caches
 Dedicating part of processor chip
 Cache access times drop significantly
 Change processor organization and architecture
 Increase effective speed of instruction execution
 Parallelism


Multicore
 The use of multiple processors on the same chip provides the potential to
increase performance without increasing the clock rate
 Strategy is to use two simpler processors on the chip rather than one more
complex processor
 With two processors larger caches are justified
 As caches became larger it made performance sense to create two and then
three levels of cache on a chip

Many Integrated Core (MIC)
Graphics Processing Unit (GPU)

MIC
 Leap in performance as well as the challenges in developing software to
exploit such a large number of cores
 The multicore and MIC strategy involves a homogeneous collection of
general purpose processors on a single chip

GPU
 Core designed to perform parallel operations on graphics data
 Traditionally found on a plug-in graphics card, it is used to encode and render
2D and 3D graphics as well as process video
 Used as vector processors for a variety of applications that require repetitive
computations
Amdahl’s Law
 Proposed by Gene Amdahl
 Deals with the potential speedup of a program using multiple processors
compared to a single processor
 Illustrates the problems facing industry in the development of multi-core
machines
 Software must be adapted to a highly parallel execution environment to
exploit the power of parallel processing
 Can be generalized to evaluate and design technical improvement in a
computer system

If f is the proportion of a system or program that can be made
parallel, and 1-f is the proportion that remains serial, then the
maximum speedup S(N) that can be achieved using N
processors is: S(N)=1/((1-f)+(f/N)).
As N grows the speedup tends to 1/(1-f).
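
The formula above is easy to evaluate directly. The following is a minimal sketch that computes S(N) = 1/((1-f) + f/N) and its limit 1/(1-f); the function names and the example value f = 0.9 are assumptions chosen for illustration.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Maximum speedup S(N) for parallel fraction f on n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as N grows without limit: 1 / (1 - f)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    f = 0.9  # hypothetical: 90% of the program can run in parallel
    for n in (1, 2, 8, 64, 1024):
        print(f"N={n:5d}  S(N)={amdahl_speedup(f, n):.2f}")
    print(f"limit as N grows: {amdahl_limit(f):.2f}")
```

Even with 90% of the work parallelizable, the speedup cannot exceed about 10 regardless of how many processors are added, which is why the serial fraction dominates multicore software design.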

Little’s Law
 Fundamental and simple relation with broad applications
 Can be applied to almost any system that is statistically
in steady state, and in which there is no leakage
 Queuing system
 If server is idle an item is served immediately, otherwise an
arriving item joins a queue
 There can be a single queue for a single server or for multiple
servers, or multiple queues with one being for each of
multiple servers

 Average number of items in a queuing system equals the average rate at
which items arrive multiplied by the time that an item spends in the system
 Relationship requires very few assumptions
 Because of its simplicity and generality it is extremely useful
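
The relation stated above is commonly written L = λW, where L is the average number of items in the system, λ the average arrival rate, and W the average time an item spends in the system. These symbols do not appear on the slide and are introduced here as an assumption, as are the function names and the server figures in this minimal sketch.

```python
def items_in_system(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: L = lambda * W (steady state, no leakage assumed)."""
    return arrival_rate * time_in_system


def time_in_system(items: float, arrival_rate: float) -> float:
    """Little's Law rearranged: W = L / lambda."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return items / arrival_rate


if __name__ == "__main__":
    # Hypothetical server: 200 requests arrive per second on average and
    # each request spends 0.05 seconds in the system (queueing + service).
    in_flight = items_in_system(200.0, 0.05)
    print(f"average requests in the system: {in_flight:.1f}")

    # Rearranged: if 40 requests are in the system on average at the same
    # arrival rate, each request spends this long in the system.
    print(f"average time in system: {time_in_system(40.0, 200.0):.2f} s")
```

Because the law needs only averages, the same two functions apply whether the "items" are instructions in a pipeline, memory requests in a buffer, or transactions at a server.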
Benchmark Principles

 Desirable characteristics of a benchmark program:

1. It is written in a high-level language, making it portable across different
machines
2. It is representative of a particular kind of
programming domain or paradigm, such as
systems programming, numerical
programming, or commercial programming
3. It can be measured easily
4. It has wide distribution
Standard Performance Evaluation
Corporation (SPEC)
 Benchmark suite
 A collection of programs, defined in a high-level language
 Together attempt to provide a representative test of a
computer in a particular application or system
programming area

 SPEC
 An industry consortium
 Defines and maintains the best known collection of
benchmark suites aimed at evaluating computer systems
 Performance measurements are widely used for comparison
and research purposes

SPEC CPU2006
 Best known SPEC benchmark suite
 Industry standard suite for processor intensive applications
 Appropriate for measuring performance for applications that spend most of
their time doing computation rather than I/O
 Consists of 17 floating point programs written in C, C++, and Fortran and
12 integer programs written in C and C++
 Suite contains over 3 million lines of code
 Fifth generation of processor intensive suites from SPEC
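
A suite of many programs has to be condensed into one comparable figure. The sketch below follows the usual SPEC CPU approach, which the slide itself does not spell out: each benchmark's ratio is the reference machine's run time divided by the measured run time, and the ratios are combined with a geometric mean. The benchmark names and timings are hypothetical and are not SPEC data.

```python
import math


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Ratio of reference run time to measured run time (higher is faster)."""
    return reference_seconds / measured_seconds


def geometric_mean(values: list[float]) -> float:
    """n-th root of the product of n positive values, computed via logs."""
    if not values:
        raise ValueError("values must not be empty")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical (reference time, measured time) pairs in seconds.
    timings = {
        "bench_a": (9000.0, 450.0),
        "bench_b": (12000.0, 800.0),
        "bench_c": (7000.0, 500.0),
    }
    ratios = {name: spec_ratio(ref, run) for name, (ref, run) in timings.items()}
    for name, ratio in ratios.items():
        print(f"{name}: ratio {ratio:.1f}")
    print(f"overall (geometric mean): {geometric_mean(list(ratios.values())):.2f}")
```

The geometric mean is used rather than the arithmetic mean so that no single benchmark with a very large ratio dominates the overall figure.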
