SP23 CS 212 Week 2

Uploaded by

Adeena Asif

+

William Stallings
Computer Organization
and Architecture
10th Edition

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
+ Chapter 2
Performance Issues
+
Designing for Performance
• The cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically

• Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago

• Processors are so inexpensive that we now have microprocessors we throw away

• Desktop applications that require the great power of today’s microprocessor-based systems include:
  • Image processing
  • Three-dimensional rendering
  • Speech recognition
  • Videoconferencing
  • Multimedia authoring
  • Voice and video annotation of files
  • Simulation modeling

• Businesses are relying on increasingly powerful servers to handle transaction and database processing and to support massive client/server networks that have replaced the huge mainframe computer centers of yesteryear

• Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-transaction-rate applications for a broad spectrum of clients



+
Microprocessor Speed
Techniques built into contemporary processors include:

Pipelining
• Processor moves data or instructions into a conceptual pipe, with all stages of the pipe processing simultaneously

Branch prediction
• Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next

Superscalar execution
• The ability to issue more than one instruction in every processor clock cycle (in effect, multiple parallel pipelines are used)

Data flow analysis
• Processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions

Speculative execution
• Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
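To make the pipelining payoff concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not from the slides) comparing an ideal k-stage pipeline against unpipelined execution:

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Without pipelining, each instruction occupies all stages in turn."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """Ideal k-stage pipeline: k - 1 cycles to fill, then one
    instruction completes per cycle (no stalls or branch flushes)."""
    return (n_stages - 1) + n_instructions

# 100 instructions through a 5-stage pipeline:
speedup = unpipelined_cycles(100, 5) / pipelined_cycles(100, 5)
print(round(speedup, 2))  # approaches 5x as the instruction count grows
```

In practice branch mispredictions and data hazards flush or stall the pipe, which is exactly why the branch prediction and data flow analysis techniques above exist.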



+
Performance Balance

• Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components

• Architectural examples include:
  • Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  • Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  • Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  • Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow

+
Improvements in Chip Organization
and Architecture
• Increase hardware speed of processor
  • Fundamentally due to shrinking logic gate size
  • More gates, packed more tightly, increasing clock rate
  • Propagation time for signals reduced

• Increase size and speed of caches
  • Dedicating part of processor chip
  • Cache access times drop significantly

• Change processor organization and architecture
  • Increase effective speed of instruction execution
  • Parallelism



+
Problems with Clock Speed and Logic
Density
• Power
  • Power density increases with density of logic and clock speed
  • Dissipating heat

• RC delay
  • Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them
  • Delay increases as the RC product increases
  • As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance
  • Also, the wires are closer together, increasing capacitance

• Memory latency
  • Memory speeds lag processor speeds


Multicore

• The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate

• Strategy is to use two simpler processors on the chip rather than one more complex processor

• With two processors, larger caches are justified

• As caches became larger it made performance sense to create two and then three levels of cache on a chip



+
Many Integrated Core (MIC)
Graphics Processing Unit (GPU)
MIC
• Leap in performance as well as the challenges in developing software to exploit such a large number of cores
• The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip

GPU
• Core designed to perform parallel operations on graphics data
• Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
• Used as vector processors for a variety of applications that require repetitive computations



+ Amdahl’s Law

• Proposed by Gene Amdahl

• Deals with the potential speedup of a program using multiple processors compared to a single processor

• Illustrates the problems facing industry in the development of multi-core machines
  • Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing

• Can be generalized to evaluate and design technical improvement in a computer system



+

• Consider a program running on a single processor such that:
  • a fraction (1 - f) of the execution time involves code that is inherently sequential
  • and a fraction f involves code that is infinitely parallelizable with no scheduling overhead

• Let T be the total execution time of the program using a single processor
+
Speed up

With N processors, the parallelizable fraction runs N times faster, so the total execution time becomes T(1 - f) + Tf/N. The speedup is therefore:

  Speedup = T / ( T(1 - f) + Tf/N ) = 1 / ( (1 - f) + f/N )
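Amdahl's Law can be checked numerically; a minimal sketch, assuming the standard form S = 1/((1 - f) + f/N):

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: f is the parallelizable fraction of execution
    time, n is the number of processors."""
    return 1.0 / ((1.0 - f) + f / n)

# Even with 90% of the program parallelizable, speedup is capped
# at 1 / (1 - f) = 10 no matter how many processors are added.
for n in (2, 8, 64, 1_000_000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Note how quickly the sequential fraction dominates: going from 64 processors to a million barely moves the result.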


+
Conclusions from the Law

• When f is small, the use of parallel processors has little effect

• As N approaches infinity, speedup is bounded by 1/(1 - f)
  • There are diminishing returns to using more processors
+
Speedup

  Speedup = (execution time before enhancement) / (execution time after enhancement)




+
Speedup after optimization

If a fraction f of the execution time is affected by an optimization that speeds that portion up by a factor SUf, the overall speedup is:

  Speedup = 1 / ( (1 - f) + f/SUf )


+
Example:
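As an illustration with assumed numbers (f = 0.4 and SUf = 10 are my own choices, not from the source), using the generalized form Speedup = 1/((1 - f) + f/SUf):

```python
def overall_speedup(f, su_f):
    """Generalized Amdahl's Law: a fraction f of execution time is
    sped up by a factor su_f; the remaining (1 - f) is unchanged."""
    return 1.0 / ((1.0 - f) + f / su_f)

# Speeding up 40% of the workload by 10x:
print(overall_speedup(0.4, 10))  # about 1.56x overall
```

The unaffected 60% of the execution time limits the benefit, no matter how large SUf becomes.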



+
Little’s Law
• Fundamental and simple relation with broad applications

• Can be applied to almost any system that is statistically in steady state, and in which there is no leakage

• Queuing system
  • If the server is idle, an arriving item is served immediately; otherwise it joins a queue
  • There can be a single queue for a single server or for multiple servers, or multiple queues with one for each of multiple servers

• Average number of items in a queuing system equals the average rate at which items arrive multiplied by the time that an item spends in the system
  • Relationship requires very few assumptions
  • Because of its simplicity and generality it is extremely useful
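The relation above (L = λW) can be sanity-checked with a simple deterministic arrival stream; a minimal sketch of my own, not from the slides:

```python
def average_in_system(arrival_rate, time_in_system, n_items=100_000):
    """Time-average number of items in the system when items arrive
    every 1/arrival_rate seconds and each spends time_in_system seconds.
    Little's Law predicts this approaches arrival_rate * time_in_system."""
    last_arrival = (n_items - 1) / arrival_rate
    last_departure = last_arrival + time_in_system
    # Total item-seconds spent in the system, averaged over the
    # observation window [0, last_departure].
    total_time = n_items * time_in_system
    return total_time / last_departure

# lambda = 4 items/s, W = 2 s  ->  L should be close to 8
print(round(average_in_system(4, 2), 3))
```

The check uses evenly spaced arrivals only for simplicity; Little's Law itself holds for any steady-state arrival process.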



+
Elaboration of Little’s Law

  L = λW

where L is the average number of items in the system, λ is the average arrival rate, and W is the average time an item spends in the system.

