CSC 2111
Computer Organisation and Architecture 2

Unit 4: Intended Learning Outcomes
By the end of this unit, you should be able to:
- Understand the key performance issues that relate to computer design.
- Distinguish among multicore, MIC, and GPGPU organisations.
- Perform basic measures of computer performance.

Designing for Performance

Unit Introduction
Chipmakers have been busy learning how to fabricate chips of greater and greater density. But the raw speed of the microprocessor will not achieve its potential unless it is fed a constant stream of work to perform in the form of computer instructions. Processor designers must therefore come up with ever more elaborate techniques for feeding the processor with instructions. Among the techniques built into contemporary processors are the following:
- Pipelining
- Branch prediction
- Superscalar execution
- Data flow analysis
- Speculative execution

Pipelining Recap
The execution of an instruction involves multiple stages of operation: fetching the instruction, decoding the opcode, fetching operands, performing a calculation, and so on.

Pipelining
A processor works simultaneously on multiple instructions by performing a different phase for each of them at the same time. It overlaps operations by moving data or instructions into a conceptual pipe, with all stages of the pipe processing simultaneously. For example, while one instruction is being executed, the computer is decoding the next instruction. This is the same principle as an assembly line.

Assembly Line Example
Consider a water bottle packaging plant. Let there be 3 stages that a bottle must pass through: inserting the bottle (I), filling water in the bottle (F), and sealing the bottle (S). Consider these as stage 1, stage 2 and stage 3 respectively, and let each stage take 1 minute to complete its operation.

Without (left) vs. With (right) Pipelining

Execution in a Pipelined Processor
Consider a processor having 4 stages to execute an instruction, and let there be 2 instructions to be executed.
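The bottling analogy above can be checked with a small calculation. This sketch (not from the slides; it just encodes the 3-stage, 1-minute-per-stage example) compares total time without and with pipelining:

```python
def time_without_pipeline(n_items, n_stages, stage_time):
    # Each item passes through every stage before the next item starts.
    return n_items * n_stages * stage_time

def time_with_pipeline(n_items, n_stages, stage_time):
    # The first item takes n_stages ticks to fill the pipe; after that,
    # one item completes per tick.
    return (n_stages + n_items - 1) * stage_time

# 3 bottling stages (I, F, S), 1 minute each, 100 bottles:
print(time_without_pipeline(100, 3, 1))  # 300 minutes
print(time_with_pipeline(100, 3, 1))     # 102 minutes
```

The gap widens as the number of items grows: in the long run a full pipeline approaches one completed item per stage time.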
Hyper-Threading vs. Pipelining
In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically part of the operating system. The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process. The multiple threads of a given process may be executed concurrently (via multithreading capabilities), sharing resources such as memory, while different processes do not share these resources. Pipelining works on a single thread; hyper-threading works on multiple threads.

Hyper-Threading vs. Multithreading

Branch Prediction
A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence, and thus deviate from its default behavior of executing instructions in order.
A branch predictor is a digital circuit that tries to guess which way a branch will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline.

Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter the fetch stage in the pipeline. The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional jump is most likely to be taken or not taken. The branch that is guessed to be the most likely is then fetched and speculatively executed. If it is later detected that the guess was wrong, the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, incurring a delay.

Superscalar Execution
This is the ability to issue more than one instruction in every processor clock cycle. In effect, multiple parallel pipelines are used. A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. Each execution unit is not a separate processor (or a core, if the processor is a multicore processor), but an execution resource within a single CPU, such as an arithmetic logic unit.

Superscalar Execution vs. Pipelining
While a superscalar CPU is typically also pipelined, superscalar execution and pipelining are considered different performance enhancement techniques:
- Superscalar execution runs multiple instructions in parallel by using multiple execution units.
- Pipelining runs multiple instructions in the same execution unit in parallel by dividing the execution unit into different phases.
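The branch predictor described earlier can be sketched in software. This example uses a 2-bit saturating counter, a common hardware scheme; the counter design is an illustrative assumption, since the slides describe branch prediction only in general terms:

```python
class TwoBitPredictor:
    """2-bit saturating-counter branch predictor (a common scheme;
    an assumption here, not a design given in the slides).
    States 0-1 predict "not taken"; states 2-3 predict "taken"."""

    def __init__(self):
        self.state = 2  # start weakly predicting "taken"

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Nudge the counter toward the observed outcome,
        # saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch: taken 9 times, then falls through once at loop exit.
predictor = TwoBitPredictor()
hits = 0
for taken in [True] * 9 + [False]:
    if predictor.predict() == taken:
        hits += 1
    predictor.update(taken)
print(hits)  # 9 correct predictions out of 10
```

The two-bit counter needs two consecutive mispredictions before it flips its guess, so a single loop exit does not destroy its confidence in a branch that is almost always taken.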
Superscalar Execution vs. Pipeline Execution (diagram)

Data Flow Analysis
The processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions. In fact, instructions are scheduled to be executed when ready, independent of the original program order. This prevents unnecessary delay.

Speculative Execution
Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations. This enables the processor to keep its execution engines as busy as possible by executing instructions that are likely to be needed.

Performance Balance
Why is performance balance needed? Processor power has raced ahead at breakneck speed, while other critical components of the computer have not kept up. The result is a need to adjust the organisation and architecture to compensate for the mismatch among the capabilities of the various components, especially at the interface between the processor and main memory or I/O devices.

Processor-Memory Interface
- Increase the number of bits that are retrieved at one time from DRAMs: make them "wider" rather than "deeper", using wide bus data paths.
- Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory.
- Increase the interconnect bandwidth between processors and memory by using higher-speed buses and a hierarchy of buses to buffer and structure data flow.

Processor-I/O Interface
As computers become faster and more capable, more sophisticated applications are developed that support the use of peripherals with intensive I/O demands. Strategies for moving I/O data between the processor and peripherals include:
- Caching and buffering schemes.
- Use of higher-speed interconnection buses and interconnection structures.
- Use of multiple-processor configurations to satisfy I/O demands.

Designers constantly strive to balance the throughput and processing demands of the processor components, main memory, I/O devices, and the interconnection structures.

Multicore, MICs, and GPGPUs
New approaches to improving performance:
- Multicore: improving performance by placing multiple processors on the same chip, with a large shared cache.
- Many Integrated Core (MIC): chips on which the number of cores is large, more than 50 cores per chip.
- GPGPU: a chip with multiple general-purpose processors plus graphics processing units (GPUs) and specialised cores for video processing and other tasks. When a broad range of applications is supported by such a processor, the term general-purpose computing on GPUs (GPGPU) is used.

Basic Measures of Computer Performance

Amdahl's Law
Computer system designers look for ways to improve system performance through advances in technology or changes in design. However, a speedup in one aspect of the technology or design does not necessarily result in a corresponding improvement in overall performance. Amdahl's law, first proposed by Gene Amdahl in 1967, deals with the potential speedup of a program using multiple processors compared to a single processor.

Consider a program running on a single processor such that:
- a fraction (1 - f) of the execution time involves code that is inherently sequential, and
- a fraction f involves code that is infinitely parallelizable with no scheduling overhead.

Let T be the total execution time of the program using a single processor.
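Under these assumptions, Amdahl's law gives Speedup = 1 / ((1 - f) + f/N). A quick numeric sketch (the fraction and processor-count values here are illustrative, not from the unit):

```python
def amdahl_speedup(f, n):
    """Speedup of a program whose parallelizable fraction is f,
    run on n processors, per Amdahl's law."""
    return 1.0 / ((1.0 - f) + f / n)

# 90% parallelizable code on 8 processors:
print(round(amdahl_speedup(0.9, 8), 2))       # 4.71
# Even with (effectively) unlimited processors, the sequential
# 10% caps the speedup at 1 / (1 - f) = 10:
print(round(amdahl_speedup(0.9, 10**9), 2))   # 10.0
```

This is the key lesson of the law: the inherently sequential fraction, not the processor count, sets the ceiling on achievable speedup.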
Then the speedup using a parallel processor with N processors that fully exploits the parallel portion of the program is:

    Speedup = T / (T(1 - f) + T f / N) = 1 / ((1 - f) + f/N)

Illustration of Amdahl's Law

Amdahl's Law for Multiprocessors

Clock Speed
Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on, are governed by a system clock. Typically, all operations begin with a pulse of the clock. Thus, at the most fundamental level, the speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or hertz (Hz):
- MHz (megahertz): millions of pulses per second
- GHz (gigahertz): billions of pulses per second

System Clock
Typically, clock signals are generated by a quartz crystal, which generates a constant signal wave while power is applied. This wave is converted into a digital voltage pulse stream that is provided in a constant flow to the processor circuitry. The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle time.

Instruction Cycle, Machine Cycle and T-State
The execution of an instruction involves a number of discrete steps: fetching the instruction from memory, decoding the various portions of the instruction, loading and storing data, and performing arithmetic and logical operations. Most instructions on most processors require multiple clock cycles to complete.

Instruction Cycle
The fetching, decoding and execution of a single instruction. It typically consists of one to five read or write operations between the processor and memory or input/output devices.
Machine Cycle
The time period required by each memory or I/O operation. In other words, to move a byte of data into or out of the microprocessor, a machine cycle is required.

T-State
Each machine cycle consists of 3 to 6 clock periods/cycles, referred to as T-states. Typically, one instruction cycle consists of one to five machine cycles, and one machine cycle consists of three to six T-states, i.e. three to six clock periods.

Instruction Execution Rate: Parameters
- Instruction count (Ic): for a program, the number of machine instructions executed for that program until it runs to completion, or for some defined time interval.
- Average cycles per instruction (CPI): the number of clock cycles required per machine instruction. On any given processor, the number of clock cycles required varies for different types of instructions, such as load, store, branch, and so on. Hence, the average CPI for a program is an important parameter.

Class Exercise
Consider the execution of a program that results in the execution of 2 million instructions. The program consists of 4 major types of instructions; the instruction mix and the CPI for each instruction type are given below. What is the average CPI?

CPI Formula
Let CPIi be the number of cycles required for instruction type i, and Ii be the number of executed instructions of type i for a given program. Then we can calculate the average CPI as follows:

    CPI = [ Σ (CPIi × Ii) ] / Ic

Processor Execution Time Formula
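The instruction-mix table for the class exercise is not reproduced in these notes, so this sketch of the CPI formula uses an illustrative mix of 4 instruction types over the exercise's 2 million instructions (the mix fractions and per-type CPIs are assumptions):

```python
# Illustrative mix: (fraction of Ic, CPI) for each instruction type.
# The actual class-exercise table is not shown in these notes.
mix = [
    (0.60, 1),   # e.g. ALU operations
    (0.18, 2),   # e.g. loads
    (0.12, 4),   # e.g. stores
    (0.10, 12),  # e.g. branches and others
]
ic = 2_000_000   # total instruction count Ic

# Average CPI = sum(CPI_i * I_i) / Ic
total_cycles = sum(frac * ic * cpi for frac, cpi in mix)
avg_cpi = total_cycles / ic
print(avg_cpi)  # 2.64
```

Note that because every I_i is a fixed fraction of Ic, the Ic terms cancel: the average CPI is just the mix-weighted sum of the per-type CPIs.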
A processor is driven by a clock with a constant frequency f or, equivalently, a constant cycle time τ, where τ = 1/f. The processor time T needed to execute a given program can then be expressed as:

    T = Ic × CPI × τ

Class Exercise
The processor runs at a clock rate of 400 MHz. What is the processor execution time?

Instruction Execution Rate
A common measure of performance for a processor is the rate at which instructions are executed, expressed as millions of instructions per second (MIPS): the MIPS rate. It can be expressed in terms of the clock rate and CPI as follows:

    MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)
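Putting the formulas together: at the stated 400 MHz clock, and reusing an assumed average CPI of 2.64 (the exercise's own CPI comes from the instruction-mix table, which is not reproduced in these notes), the execution time and MIPS rate work out as:

```python
ic = 2_000_000           # instruction count Ic, from the exercise
cpi = 2.64               # assumed average CPI (illustrative)
f = 400e6                # clock rate: 400 MHz
tau = 1 / f              # cycle time, tau = 1/f

t = ic * cpi * tau       # execution time, T = Ic * CPI * tau
mips = f / (cpi * 1e6)   # MIPS rate = f / (CPI * 10^6)

print(f"{t * 1000:.1f} ms")  # 13.2 ms
print(f"{mips:.1f} MIPS")    # 151.5 MIPS
```

As a sanity check, the two formulas agree: Ic / (T × 10^6) = 2e6 / (0.0132 × 10^6) ≈ 151.5, the same MIPS rate obtained from f / (CPI × 10^6).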