SP23 CS 212 Week 2

Uploaded by

Adeena Asif

+

William Stallings
Computer Organization
and Architecture
10th Edition

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Chapter 2
Performance Issues
+
Designing for Performance
• The cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically

• Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago

• Processors are so inexpensive that we now have microprocessors we throw away

• Desktop applications that require the great power of today’s microprocessor-based systems include:
  • Image processing
  • Three-dimensional rendering
  • Speech recognition
  • Videoconferencing
  • Multimedia authoring
  • Voice and video annotation of files
  • Simulation modeling

• Businesses are relying on increasingly powerful servers to handle transaction and database processing and to support massive client/server networks that have replaced the huge mainframe computer centers of yesteryear

• Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-transaction-rate applications for a broad spectrum of clients



+
Microprocessor Speed
Techniques built into contemporary processors include:

Pipelining
• Processor moves data or instructions into a conceptual pipe, with all stages of the pipe processing simultaneously

Branch prediction
• Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next

Superscalar execution
• The ability to issue more than one instruction in every processor clock cycle (in effect, multiple parallel pipelines are used)

Data flow analysis
• Processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions

Speculative execution
• Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
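To make the pipelining payoff concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not from the slides) comparing an ideal k-stage pipeline against unpipelined execution:

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Without pipelining, each instruction occupies all stages in turn."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """Ideal k-stage pipeline: k - 1 cycles to fill, then one
    instruction completes per cycle (no stalls or branch flushes)."""
    return (n_stages - 1) + n_instructions

# 100 instructions through a 5-stage pipeline:
speedup = unpipelined_cycles(100, 5) / pipelined_cycles(100, 5)
print(round(speedup, 2))  # approaches 5x as the instruction count grows
```

In practice branch mispredictions and data hazards flush or stall the pipe, which is exactly why the branch prediction and data flow analysis techniques above exist.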



+
Performance Balance

• Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components

• Architectural examples include:
  • Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  • Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  • Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  • Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow

+
Improvements in Chip Organization
and Architecture
• Increase hardware speed of processor
  • Fundamentally due to shrinking logic gate size
  • More gates, packed more tightly, increasing clock rate
  • Propagation time for signals reduced

• Increase size and speed of caches
  • Dedicating part of processor chip
  • Cache access times drop significantly

• Change processor organization and architecture
  • Increase effective speed of instruction execution
  • Parallelism



+
Problems with Clock Speed and Logic
Density
• Power
  • Power density increases with density of logic and clock speed
  • Dissipating heat

• RC delay
  • Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them
  • Delay increases as the RC product increases
  • As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance
  • Also, the wires are closer together, increasing capacitance

• Memory latency
  • Memory speeds lag processor speeds


Multicore

• The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate

• Strategy is to use two simpler processors on the chip rather than one more complex processor

• With two processors, larger caches are justified

• As caches became larger it made performance sense to create two and then three levels of cache on a chip



+
Many Integrated Core (MIC)
Graphics Processing Unit (GPU)
MIC
• Leap in performance as well as the challenges in developing software to exploit such a large number of cores
• The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip

GPU
• Core designed to perform parallel operations on graphics data
• Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
• Used as vector processors for a variety of applications that require repetitive computations



+ Amdahl’s Law

• Proposed by Gene Amdahl

• Deals with the potential speedup of a program using multiple processors compared to a single processor

• Illustrates the problems facing industry in the development of multi-core machines
  • Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing

• Can be generalized to evaluate and design technical improvement in a computer system



+

• Consider a program running on a single processor such that:
  • a fraction (1 - f) of the execution time involves code that is inherently sequential
  • and a fraction f involves code that is infinitely parallelizable with no scheduling overhead

• Let T be the total execution time of the program using a single processor
+
Speed up

With N processors, the parallelizable fraction runs N times faster, so the total execution time becomes T(1 - f) + Tf/N. The speedup is therefore:

  Speedup = T / ( T(1 - f) + Tf/N ) = 1 / ( (1 - f) + f/N )
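Amdahl's Law can be checked numerically; a minimal sketch, assuming the standard form S = 1/((1 - f) + f/N):

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: f is the parallelizable fraction of execution
    time, n is the number of processors."""
    return 1.0 / ((1.0 - f) + f / n)

# Even with 90% of the program parallelizable, speedup is capped
# at 1 / (1 - f) = 10 no matter how many processors are added.
for n in (2, 8, 64, 1_000_000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Note how quickly the sequential fraction dominates: going from 64 processors to a million barely moves the result.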


+
Conclusions from the Law

• When f is small, the use of parallel processors has little effect

• As N approaches infinity, speedup is bounded by 1/(1 - f)
  • There are diminishing returns to using more processors
+
Speedup

  Speedup = (execution time before enhancement) / (execution time after enhancement)




+
Speedup after optimization

If a fraction f of the execution time is affected by an optimization that speeds that portion up by a factor SUf, the overall speedup is:

  Speedup = 1 / ( (1 - f) + f/SUf )


+
Example:
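As an illustration with assumed numbers (f = 0.4 and SUf = 10 are my own choices, not from the source), using the generalized form Speedup = 1/((1 - f) + f/SUf):

```python
def overall_speedup(f, su_f):
    """Generalized Amdahl's Law: a fraction f of execution time is
    sped up by a factor su_f; the remaining (1 - f) is unchanged."""
    return 1.0 / ((1.0 - f) + f / su_f)

# Speeding up 40% of the workload by 10x:
print(overall_speedup(0.4, 10))  # about 1.56x overall
```

The unaffected 60% of the execution time limits the benefit, no matter how large SUf becomes.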



+
Little’s Law
• Fundamental and simple relation with broad applications

• Can be applied to almost any system that is statistically in steady state, and in which there is no leakage

• Queuing system
  • If the server is idle, an arriving item is served immediately; otherwise it joins a queue
  • There can be a single queue for a single server or for multiple servers, or multiple queues with one for each of multiple servers

• Average number of items in a queuing system equals the average rate at which items arrive multiplied by the time that an item spends in the system
  • Relationship requires very few assumptions
  • Because of its simplicity and generality it is extremely useful
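The relation above (L = λW) can be sanity-checked with a simple deterministic arrival stream; a minimal sketch of my own, not from the slides:

```python
def average_in_system(arrival_rate, time_in_system, n_items=100_000):
    """Time-average number of items in the system when items arrive
    every 1/arrival_rate seconds and each spends time_in_system seconds.
    Little's Law predicts this approaches arrival_rate * time_in_system."""
    last_arrival = (n_items - 1) / arrival_rate
    last_departure = last_arrival + time_in_system
    # Total item-seconds spent in the system, averaged over the
    # observation window [0, last_departure].
    total_time = n_items * time_in_system
    return total_time / last_departure

# lambda = 4 items/s, W = 2 s  ->  L should be close to 8
print(round(average_in_system(4, 2), 3))
```

The check uses evenly spaced arrivals only for simplicity; Little's Law itself holds for any steady-state arrival process.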



+
Elaboration of Little’s Law

  L = λW

where L is the average number of items in the system, λ is the average arrival rate, and W is the average time an item spends in the system.

