Processor's Performance: Parth Shah Parthshah - Ce@charusat - Ac.in

This document discusses processor performance. It defines performance in terms of execution time and throughput. Faster execution time means higher performance while higher throughput means more work is getting done. Response time measures how long it takes for a single job to run, while throughput measures how many jobs can be run at once. Performance can be measured by comparing the execution times of the same program on different processors.

Uploaded by

Harshal Jethwa

Processor’s Performance

IT254 Computer Organization & Microprocessor Interfacing

Parth Shah
parthshah.ce@charusat.ac.in
Outline

Introduction
Defining Performance
The Iron Law of Processor Performance
Processor performance enhancement
Performance Evaluation Approaches
Performance Reporting
Amdahl’s Law

03/17/2022 IT254 Computer Organization & Microprocessor Interfacing 2


Defining Performance

• Parameters that could be considered
• Speed
• Passenger capacity
• Cruising range


Performance example

Time of Concorde vs. Boeing 747?
Concorde is 1350 mph / 610 mph = 2.2 times faster (120% faster)
Throughput of Concorde vs. Boeing 747?
Concorde is 178,200 passenger-mph / 286,700 passenger-mph = 0.62 times the capacity
Boeing is 286,700 passenger-mph / 178,200 passenger-mph = 1.6 times the capacity (60% more)
Our focus for processor performance = execution time for a
single job – why?
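As a rough check of the ratios above, the sketch below recomputes them in Python. The passenger counts are not stated on the slide; they are inferred here by dividing each throughput figure by the cruising speed, and the dictionary layout and names are only illustrative.

```python
# Figures taken from the slide: cruising speed (mph) and throughput (passenger-mph).
planes = {
    "Concorde": {"speed_mph": 1350, "throughput_pmph": 178_200},
    "Boeing 747": {"speed_mph": 610, "throughput_pmph": 286_700},
}

# Inferred, not stated on the slide: passengers = throughput / speed.
passengers = {
    name: p["throughput_pmph"] / p["speed_mph"] for name, p in planes.items()
}

concorde, boeing = planes["Concorde"], planes["Boeing 747"]
speed_ratio = concorde["speed_mph"] / boeing["speed_mph"]
throughput_ratio = concorde["throughput_pmph"] / boeing["throughput_pmph"]

print(f"Concorde speed vs. 747: {speed_ratio:.1f}x")
print(f"Concorde throughput vs. 747: {throughput_ratio:.2f}x")
print(f"Inferred passenger counts: {passengers}")
```

The faster plane wins on time for a single trip, while the larger plane wins on work done per hour, which is the same split as response time versus throughput.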


Introduction

Performance measurement is important:


Helps us to determine if one processor (or computer) works faster
than another
Helps us to know how much performance improvement has
taken place after incorporating some performance enhancement
feature
Helps to see through the marketing hype!
Performance measurement provides answers to the following
questions:
Why is some hardware better than others for different programs?
What factors affect system performance?
• Hardware, OS or Compiler?
How does the machine's instruction set affect performance?


Defining Performance in Terms of Time

Time is the final measure of computer performance
A computer exhibits higher performance if it executes
programs faster
Individual user concerns…
Response Time (elapsed time, latency):
how long does it take for my job to run?
how long does it take to execute (start to finish) my job?
how long must I wait for the database query?
Systems manager concerns…
Throughput (bandwidth, tasks per unit time):
How many jobs can the machine run at once?
what is the average execution rate?
How much work is getting done?
Throughput = 1 / Response time (This is not always true)


Response time vs. Throughput

Response time and throughput often are in opposition – why?
When is throughput more important than response time?
When is response time more important than throughput?


Throughput and Response Time

Do the following changes to a computer system increase
throughput, decrease response time, or both?
1. Replacing the processor in a computer with a faster version
2. Adding additional processors to a system that uses multiple
processors for separate tasks, for example, searching the web
In many real computer systems, changing either response time
or throughput often affects the other.


Understanding performance

How are the following likely to affect response time and
throughput?
Upgrading a machine with a new processor with a faster clock
Increasing the number of jobs (e.g., having a single computer service
multiple users)
Adding a new machine in the lab
Increasing the number of processors (e.g., in a network of ATM
machines)


Tutorial… Latency and Throughput

Website for ordering Penguin based USB drives
2 servers
Order request assigned to a server
Server takes 1 millisecond to process the order
Server cannot do anything else while processing the order
This web site has
Throughput (orders/second): 2000 orders/second (2 servers x 1000 orders/second each)
Latency (milliseconds): 1 millisecond


Tutorial… Latency and Throughput

Consider a website providing file processing
Website is hosted on 3 servers
Each server takes 2 milliseconds to process the file
Server cannot do anything else while processing the file
The web site has
Throughput (files processed/second): 1500 files processed/second (3 servers x 500 files/second each)
Latency (milliseconds): 2 milliseconds
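The two website tutorials above can be sketched in a few lines of Python. The function name `site_metrics` and its parameters are illustrative, and the model assumes every server is kept fully busy and handles exactly one request at a time, as the slides state.

```python
def site_metrics(servers, service_time_ms):
    """Return (throughput in requests/second, latency in ms) for identical servers."""
    per_server_rate = 1000.0 / service_time_ms  # requests per second per server
    throughput = servers * per_server_rate      # servers work in parallel
    latency_ms = service_time_ms                # one request still takes the full service time
    return throughput, latency_ms

orders = site_metrics(servers=2, service_time_ms=1)
files = site_metrics(servers=3, service_time_ms=2)

print("USB drive site:", orders)
print("File processing site:", files)

# With more than one server, throughput is no longer 1 / latency.
naive_orders = 1000.0 / orders[1]
print("1 / latency for the USB drive site:", naive_orders, "orders/second")
```

Adding servers raises throughput without changing the latency of any single request, which is why throughput equals 1 / latency only for a single server processing one job at a time.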


Execution Time

Elapsed Time
counts everything (disk and memory accesses, waiting for I/O,
running other programs, etc.) from start to finish
a useful number, but often not good for comparison purposes
elapsed time = CPU time + wait time (I/O, other programs, etc.)
CPU time
doesn't count waiting for I/O or time spent running other programs
can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
elapsed time = user CPU time + system CPU time + wait time
Our focus: user CPU time
(CPU execution time or, simply, execution time): time spent
executing the lines of code that are in our program
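A minimal sketch of the elapsed time versus CPU time distinction, using Python's standard `time` module. `time.perf_counter()` measures wall-clock (elapsed) time and `time.process_time()` measures user plus system CPU time of the current process; the `job` function and the 0.2 second sleep standing in for an I/O wait are assumptions made for illustration.

```python
import time

def measure(work):
    """Return (elapsed seconds, CPU seconds) spent running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu

def job():
    sum(i * i for i in range(200_000))  # CPU-bound part: counts as CPU time
    time.sleep(0.2)                     # stand-in for waiting on I/O: no CPU time used

elapsed, cpu_time = measure(job)
wait_time = elapsed - cpu_time  # elapsed time = CPU time + wait time

print(f"elapsed: {elapsed:.3f} s, CPU: {cpu_time:.3f} s, wait: {wait_time:.3f} s")
```

The exact numbers vary from run to run and machine to machine, which is one reason elapsed time alone is a weak basis for comparing processors.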


Measuring Performance

For some program running on machine X:
PerformanceX = 1 / Execution timeX
X is n times faster than Y means:
Speedup = PerformanceX / PerformanceY = Execution timeY / Execution timeX = n
Speedup = Execution time (old) / Execution time (new) OR Speedup = Throughput (new) / Throughput (old)


Example

If computer A runs a program in 10 seconds and computer B
runs the same program in 15 seconds, how much faster is A
than B?


Tutorial… Performance comparison

Mobile takes 4 hours to compress the video
New mobile can do it in 10 minutes
Speedup = 240 minutes / 10 minutes = 24


Tutorial… Performance comparison

New mobile takes 10 minutes to compress the video
It falls down a storm drain, so we have to use the older mobile (4 hours)
Speedup = 10 minutes / 240 minutes = 0.04
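The comparisons above all use the same ratio of execution times. A small sketch follows; the helper name `speedup` is illustrative, and the 4 hours in the mobile tutorials is converted to 240 minutes so both times share a unit.

```python
def speedup(old_time, new_time):
    """Speedup of the new configuration over the old one (same units for both times)."""
    return old_time / new_time

a_vs_b = speedup(old_time=15, new_time=10)           # computer B (15 s) replaced by A (10 s)
new_mobile = speedup(old_time=4 * 60, new_time=10)    # 4 hours down to 10 minutes
back_to_old = speedup(old_time=10, new_time=4 * 60)   # 10 minutes back up to 4 hours

print(f"A vs. B: {a_vs_b}")
print(f"New mobile: {new_mobile}")
print(f"Back to the old mobile: {back_to_old:.2f}")
```

Under this definition computer A is 1.5 times faster than computer B, and a speedup below 1 simply means the change made things slower.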


Speedup

Speedup > 1 (Improved Performance)
Shorter execution time
Higher throughput
Speedup < 1 (Worse Performance)
Longer execution time
Lower throughput
Performance ≈ Throughput


Tutorial… Speedup averaging

Consider we have an old laptop and a new laptop
For the homework formatting program we get a 2x speedup
using the new laptop
For the virus scan program we get an 8x speedup using the new
laptop
So overall speedup = sqrt(2 x 8) = 4 (the geometric mean, not the arithmetic mean of 5)
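The overall speedup of 4 quoted above matches the geometric mean of the two per-program speedups; the arithmetic mean would give 5 instead. The sketch below shows both, with `geometric_mean` as an illustrative helper name (it relies on `math.prod`, available in Python 3.8 and later).

```python
import math

def geometric_mean(speedups):
    """n-th root of the product of n speedup ratios."""
    return math.prod(speedups) ** (1.0 / len(speedups))

speedups = [2.0, 8.0]  # homework formatting, virus scan

geo = geometric_mean(speedups)
arith = sum(speedups) / len(speedups)

print(f"geometric mean: {geo}, arithmetic mean: {arith}")
```

The geometric mean is the usual choice for averaging ratios because the result does not depend on which machine is taken as the reference.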


Understanding Performance

If a Pentium III runs a program in 8 seconds and a PowerPC runs
the same program in 10 seconds, how many times faster is the
Pentium III?
n = 10 / 8 = 1.25 times faster (or 25% faster)
Why might someone choose to buy the PowerPC in this case?


Measuring Performance

CPU execution time: Also called CPU time. The actual time the
CPU spends computing for a specific task.
User CPU time: The CPU time spent in a program itself.
System CPU time: The CPU time spent in the operating system
performing tasks on behalf of the program.
Clock cycle: A single period of an oscillating clock signal.
Clock speed, rate, and frequency are used to describe the same
thing: the number of clock cycles per second, measured in Hertz
(Hz) (e.g., 4 gigahertz, or 4 GHz)
Clock period: The time required to complete a single clock cycle,
i.e., 1 / clock rate (e.g., 250 picoseconds for a 4 GHz clock)


The Iron Law of Processor Performance

Time/Program = Instructions/Program x Cycles/Instruction x Time/Cycle

Instructions/Program   Cycles/Instruction   Time/Cycle
(code size)            (CPI)                (cycle time)
Architecture -->       Implementation -->   Realization
Compiler Designer      Processor Designer   Chip Designer
Algorithm              Instruction Set      Circuit Designer


Tutorial… Example of computing CPU time

If the clock rate = 50 MHz, the program has 1,000 instructions,
and the CPI for the program = 3.5, what is
the CPU (execution) time?
CPU time = instruction count x CPI / clock rate
CPU time = (1000 x 3.5 / (50 x 10^6)) sec = 70 microseconds
If the clock rate increases from 50 MHz to 250 MHz and the other
factors remain the same, how many times faster will the
computer be?
Speedup = old CPU time / new CPU time = 250 MHz / 50 MHz = 5


Tutorial… Example of computing CPU time

Program executes 3 billion instructions
Processor spends 2 cycles on each instruction
Processor clock speed is 3 GHz
The execution time = (3 x 10^9 x 2) / (3 x 10^9) = 2 seconds
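Both CPU time tutorials apply the same formula, CPU time = instruction count x CPI / clock rate. A minimal sketch, with `cpu_time` as an illustrative function name and all clock rates given in Hz:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions x cycles per instruction / cycles per second."""
    return instruction_count * cpi / clock_rate_hz

# Tutorial 1: 1,000 instructions, CPI 3.5, 50 MHz clock.
t_50mhz = cpu_time(1_000, 3.5, 50e6)
t_250mhz = cpu_time(1_000, 3.5, 250e6)
clock_speedup = t_50mhz / t_250mhz

# Tutorial 2: 3 billion instructions, 2 cycles each, 3 GHz clock.
t_tutorial2 = cpu_time(3e9, 2, 3e9)

print(f"50 MHz: {t_50mhz * 1e6:.0f} microseconds")
print(f"Speedup from 50 MHz to 250 MHz: {clock_speedup:.1f}")
print(f"Tutorial 2: {t_tutorial2} seconds")
```

Because instruction count and CPI are held fixed in the first tutorial, the speedup equals the ratio of the clock rates.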


The Iron Law of Processor Performance

Instructions/Program (Instruction count)
Instructions executed, not static code size
Determined by algorithm, compiler, ISA (Instruction Set Architecture)
Cycles/Instruction (CPI)
Determined by ISA and CPU organization
Overlap among instructions reduces this term
Time/cycle (Cycle time)
Determined by technology, organization, clever circuit design


Different numbers of cycles for different instructions

Multiplication takes longer than addition
Floating point operations take longer than integer ones
Accessing memory takes more time than accessing registers
Changing cycle time often changes number of cycles required
for various instructions


Example

Our favorite program runs in 10 seconds on computer A, which
has a 2 GHz clock. We are trying to help a computer designer
build a computer, B, which will run this program in 6 seconds.
The designer has determined that a substantial increase in the
clock rate is possible, but this increase will affect the rest of the
CPU design, causing computer B to require 1.2 times as many
clock cycles as computer A for this program. What clock rate
should we tell the designer to target?


Example (Continue…)
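The worked solution for this slide did not survive extraction. One way to work the problem, sketched in Python with illustrative variable names: first recover the cycle count on A from its execution time and clock rate, then scale it by 1.2 and divide by the 6 second target.

```python
# Computer A: known execution time and clock rate.
time_a_s = 10.0
clock_a_hz = 2e9
cycles_a = time_a_s * clock_a_hz      # CPU time = clock cycles / clock rate

# Computer B: 1.2 times as many cycles, must finish in 6 seconds.
cycles_b = 1.2 * cycles_a
target_time_b_s = 6.0
clock_b_hz = cycles_b / target_time_b_s

print(f"Cycles on A: {cycles_a:.2e}")
print(f"Required clock rate for B: {clock_b_hz / 1e9:.1f} GHz")
```

Under these figures B needs a 4 GHz clock, twice the clock rate of A, to offset the extra cycles and still finish in 6 seconds.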



Example

Suppose we have two implementations of the same instruction
set architecture. Computer A has a clock cycle time of 250 ps
and a CPI of 2.0 for some program, and computer B has a clock
cycle time of 500 ps and a CPI of 1.2 for the same program.
Which computer is faster for this program and by how much?


Example (Continue…)
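The solution slide is missing from this copy. A sketch of the calculation follows: because both computers implement the same instruction set and run the same program, the instruction count is the same on both and cancels, so comparing the time per instruction (CPI x cycle time) is enough. Variable names are illustrative.

```python
# Time per instruction = CPI x clock cycle time (picoseconds).
cpi_a, cycle_a_ps = 2.0, 250.0
cpi_b, cycle_b_ps = 1.2, 500.0

time_per_instr_a_ps = cpi_a * cycle_a_ps   # 500 ps per instruction on A
time_per_instr_b_ps = cpi_b * cycle_b_ps   # 600 ps per instruction on B

# Same instruction count on both, so it cancels out of the ratio.
a_over_b = time_per_instr_b_ps / time_per_instr_a_ps

print(f"A: {time_per_instr_a_ps} ps/instruction, B: {time_per_instr_b_ps} ps/instruction")
print(f"A is {a_over_b:.1f} times faster than B")
```

Computer A comes out 1.2 times faster for this program, even though B has the lower CPI.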



CPI Example

Two implementations (A and B) of the same instruction set
architecture (ISA)
For some program
A has clock cycle time = 10 ns, CPI = 2.0
B has clock cycle time = 20 ns, CPI = 1.2
If two machines have the same ISA, which of the following quantities will
not always be identical?
clock rate,
CPI,
execution time,
# of instructions,
MIPS
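To make the question concrete, the sketch below computes clock rate, time per instruction and MIPS for A and B. MIPS is taken in its standard sense, millions of instructions per second, so MIPS = clock rate / (CPI x 10^6); the helper name `machine_stats` is illustrative, and the instruction count is assumed equal on both machines because the ISA and program are the same.

```python
def machine_stats(cycle_time_ns, cpi):
    """Return clock rate (Hz), time per instruction (ns) and MIPS for one machine."""
    clock_rate_hz = 1e9 / cycle_time_ns          # one cycle every cycle_time_ns nanoseconds
    time_per_instr_ns = cpi * cycle_time_ns
    mips = clock_rate_hz / (cpi * 1e6)           # millions of instructions per second
    return clock_rate_hz, time_per_instr_ns, mips

clock_a, tpi_a, mips_a = machine_stats(cycle_time_ns=10, cpi=2.0)
clock_b, tpi_b, mips_b = machine_stats(cycle_time_ns=20, cpi=1.2)

print(f"A: {clock_a / 1e6:.0f} MHz, {tpi_a} ns/instruction, {mips_a:.2f} MIPS")
print(f"B: {clock_b / 1e6:.0f} MHz, {tpi_b} ns/instruction, {mips_b:.2f} MIPS")
```

Clock rate, CPI, execution time and MIPS all differ between the two implementations; only the number of instructions is fixed by the shared ISA and program.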


Example



Continue…



MIPS Example

Two compilers being tested for a 100 MHz machine with 3 classes
of instructions:
A (1 cycle), B (2 cycles), and C (3 cycles)
Compiler 1: 5 A, 1 B, 1 C instructions
Compiler 2: 10 A, 1 B, 1 C instructions
Which sequence has higher MIPS?
Which sequence has lower execution time?
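The slide leaves both questions open. A sketch that works them out, treating the instruction counts as plain counts (the slide gives no unit, and any common scale factor leaves the MIPS rating and the ordering of the execution times unchanged). Names such as `evaluate` are illustrative.

```python
CLOCK_HZ = 100e6
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

def evaluate(counts):
    """Return (execution time in seconds, MIPS) for a mix of instruction counts."""
    instructions = sum(counts.values())
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())
    exec_time = cycles / CLOCK_HZ
    mips = instructions / (exec_time * 1e6)
    return exec_time, mips

time1, mips1 = evaluate({"A": 5, "B": 1, "C": 1})
time2, mips2 = evaluate({"A": 10, "B": 1, "C": 1})

print(f"Compiler 1: {time1 * 1e9:.0f} ns, {mips1:.0f} MIPS")
print(f"Compiler 2: {time2 * 1e9:.0f} ns, {mips2:.0f} MIPS")
```

Compiler 2 achieves the higher MIPS rating, yet compiler 1 produces the shorter execution time, which is why MIPS alone can mislead.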


Computing CPI

• CPI = avg. no. of cycles per instruction.
• CPI = sum over i of (CPIi x Fi), where Fi is the fraction of instructions of type i

Operation   Fi      CPIi   CPIi x Fi   % Time
ALU         50%     1      0.5         23%
Load        20%     5      1.0         45%
Store       10%     3      0.3         14%
Branch      20%     2      0.4         18%
Total       100%           2.2         100%
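The table above can be reproduced with a short weighted-sum calculation. In this sketch the dictionary `mix` holds (Fi, CPIi) pairs copied from the table, and the % Time column is computed as each row's share of the total CPI; the names are illustrative.

```python
# (fraction of instructions Fi, cycles per instruction CPIi) from the table.
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}

# Average CPI is the frequency-weighted sum of the per-class CPIs.
cpi = sum(f * c for f, c in mix.values())

# Each class's share of execution time is its contribution divided by the total CPI.
percent_time = {op: round(100 * f * c / cpi) for op, (f, c) in mix.items()}

print(f"CPI = {cpi:.1f}")
print(percent_time)
```

Loads are only 20% of the instructions but account for nearly half of the execution time, so they are the natural first target for optimization.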


Tutorial… Iron Law of Processors Performance

50 billion instructions
10 billion are Branches (CPI = 4)
15 billion are Loads (CPI = 2)
5 billion are Stores (CPI = 3)
The rest (20 billion) are Integer ADD (CPI = 1)
Clock rate = 4 GHz
Execution time = (40 + 30 + 15 + 20) billion cycles / 4 GHz = 26.25 seconds
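A sketch of this tutorial in Python. It assumes a CPI of 1 for the integer ADD instructions, since that is the value that reproduces the stated 26.25 second result; a CPI of 4 for ADD would give 41.25 seconds instead. The function and variable names are illustrative.

```python
TOTAL_INSTRUCTIONS = 50e9
CLOCK_HZ = 4e9

def execution_time(add_cpi):
    """Execution time in seconds for the tutorial's mix with a given integer ADD CPI."""
    classes = {               # instruction count, CPI
        "branch": (10e9, 4),
        "load": (15e9, 2),
        "store": (5e9, 3),
    }
    counted = sum(n for n, _ in classes.values())
    classes["add"] = (TOTAL_INSTRUCTIONS - counted, add_cpi)  # the rest: 20 billion
    total_cycles = sum(n * c for n, c in classes.values())
    return total_cycles / CLOCK_HZ

time_add_cpi_1 = execution_time(add_cpi=1)   # assumption that matches the stated answer
time_add_cpi_4 = execution_time(add_cpi=4)

print(f"ADD CPI = 1: {time_add_cpi_1} s")
print(f"ADD CPI = 4: {time_add_cpi_4} s")
```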


Processor Performance Enhancement

All processor performance enhancement techniques boil down
to reducing one or more of these three terms (instruction count, CPI, cycle time)
Some techniques can be used to reduce one term without
affecting the others
Improved hardware technology
Compiler optimization techniques
These kinds of performance optimization techniques are preferred
Some techniques can reduce one of the terms, but may
increase other terms (inter-related)
CISC ISA reduces instruction count but increases CPI
Loop unrolling reduces instruction count but increases CPI


Measuring Performance

PerformanceX = 1 / Execution timeX
Actual users’ workload
Many programs
Not representative of other users
How do we get workload data?


Measuring Performance Using Benchmarks

1. Real Applications
Examples: compilers/editors, scientific applications, graphics, etc.
Problem: Portability due to dependence on OS and Compiler
2. Modified Applications
Real applications modified/tailored to improve portability or to test
specific features of the CPU
3. Kernels
Small, key pieces of real applications; programs that are much
simpler than real applications
Examples:
• Livermore Loops: 24 loop kernels
• Linpack: linear algebra package


Measuring Performance Using Benchmarks

4. Toy benchmarks
10 to 100 lines of simple programs
Easy to type and run on almost all computers
Examples: Quick sort, Merge sort, etc.
5. Synthetic Benchmarks
Basic Principle:
• Analyze the distribution of instructions over a large number of practical
programs.
Synthesize a program that has the same instruction distribution as a
typical program:
• Need not compute something meaningful.
Dhrystone, Khornerstone, Linpack are some of the older synthetic
benchmarks


SPEC

A recently popular approach is to put together collections
of benchmarks measuring the performance of a variety of
applications
SPEC: System Performance Evaluation Cooperative (now the Standard Performance Evaluation Corporation):
A non-profit organization (www.spec.org)
CPU-intensive benchmarks for evaluating processor
performance of workstations:
Generations: SPEC89, SPEC92, SPEC95, and SPEC2000 …
Emphasizing memory system performance in SPEC2000.


SPECINTC2006 benchmarks running on a 2.66 GHz Intel Core i7 920


Amdahl’s Law

Quantifies the overall performance gain due to an improvement in a part
of a computation.
Amdahl’s Law:
Performance improvement gained from using some faster mode of
execution is limited by the amount of time the enhancement is
actually used.
Speedup = Execution time without the enhancement / Execution time with the enhancement


Amdahl’s Law and Speedup

Speedup tells us:
How much faster a machine will run due to an enhancement.
To use Amdahl’s law, two things should be considered:
1st : Fraction of the computation time in the original machine that
can use the enhancement
• If a program executes in 30 seconds and 15 seconds of the execution uses the
enhancement, the fraction is 1/2
2nd : Improvement gained by the enhancement
• If the enhanced task takes 3.5 seconds and the original task took 7 seconds, we say the
speedup is 2.


Amdahl’s Law Equations



Tutorial… Amdahl’s Law

If the part that can be improved is 25% of the overall system
and its performance can be doubled, then
Speedup = 1 / ((1 - 0.25) + 0.25 / 2) = 1 / 0.875 ≈ 1.14


Amdahl’s Law Implication

Speedup = 1 / ((1 - FracEnhc) + FracEnhc / SpeedupEnhc)

Enhancement 1:
Speedup of 20 on 10% of Time
1 / ((1 – 0.1) + 0.1 / 20) = 1.105
Enhancement 2:
Speedup of 1.6 on 80% of Time
1 / ((1 – 0.8) + 0.8 / 1.6) = 1.43
Limiting case:
Infinite speedup on 10% of Time
1 / ((1 – 0.1) + 0) = 1.111
Make the common case fast
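The cases above can be checked with a direct implementation of the formula. In this sketch `amdahl_speedup` is an illustrative name, `frac_enh` corresponds to FracEnhc and `speedup_enh` to SpeedupEnhc; the infinite speedup case uses `math.inf`, which makes the enhanced term vanish.

```python
import math

def amdahl_speedup(frac_enh, speedup_enh):
    """Overall speedup when frac_enh of the original time is sped up by speedup_enh."""
    return 1.0 / ((1.0 - frac_enh) + frac_enh / speedup_enh)

tutorial = amdahl_speedup(0.25, 2)          # 25% of the system made twice as fast
enh1 = amdahl_speedup(0.10, 20)             # big speedup on a small part
enh2 = amdahl_speedup(0.80, 1.6)            # modest speedup on the common case
limit = amdahl_speedup(0.10, math.inf)      # best case for the 10% part

for name, value in [("tutorial", tutorial), ("enhancement 1", enh1),
                    ("enhancement 2", enh2), ("infinite on 10%", limit)]:
    print(f"{name}: {value:.3f}")
```

Even an infinite speedup on 10% of the time cannot match a modest 1.6x speedup on 80% of the time, which is the point behind "make the common case fast".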


Tutorial… Amdahl’s Law
Inst Type   % of Time   CPI
Int         40          1
Br          20          4
Ld          30          2
St          10          3

Possible improvements (which is the best?)
Branch CPI 4 -> 3
Increase clock frequency 2 -> 2.3 GHz
Store CPI 3 -> 2
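The slide does not give the answer. The sketch below compares the three options with Amdahl's law under two assumptions: the percentages are shares of execution time, as the column header says, and the clock increase speeds up the whole program uniformly (no memory latency effects are modelled). Names such as `options` are illustrative.

```python
def amdahl_speedup(frac_enh, speedup_enh):
    """Overall speedup when frac_enh of the original time is sped up by speedup_enh."""
    return 1.0 / ((1.0 - frac_enh) + frac_enh / speedup_enh)

# (fraction of execution time affected, local speedup of that part)
options = {
    "branch CPI 4 -> 3": (0.20, 4 / 3),
    "clock 2 -> 2.3 GHz": (1.00, 2.3 / 2.0),   # assumed to affect all instructions
    "store CPI 3 -> 2": (0.10, 3 / 2),
}

results = {name: amdahl_speedup(f, s) for name, (f, s) in options.items()}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"{name}: {value:.3f}")
print("Best option:", best)
```

If the percentages are instead read as shares of the instruction count, the individual numbers change but the clock frequency increase still gives the largest overall speedup.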




Summary…
