CS-3006 4: Performance Analysis
Example:
n_instr(A) = 4 million instructions
TU_CPU(A) = 0.05 seconds
Example:
r_cycle = 600 MHz (Mega = 10^6)
CPI(A) = 3
Example:
n_flp_op(A) = 90 million floating-point operations
TU_CPU(A) = 3.5 seconds
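These three fragments supply inputs to the standard single-processor metrics; the questions themselves are not shown in this extract, so the sketch below assumes the usual ones: the MIPS rate from instruction count and CPU time, the MIPS rate from clock rate and CPI, and the MFLOPS rate. Function names are illustrative.

# Assumed formulas (standard definitions, not shown on the slide):
#   MIPS   = n_instr / (TU_CPU * 10^6)
#   MIPS   = r_cycle / (CPI * 10^6)
#   MFLOPS = n_flp_op / (TU_CPU * 10^6)

def mips_from_time(n_instr, t_cpu_s):
    # million instructions per second from instruction count and CPU time
    return n_instr / (t_cpu_s * 1e6)

def mips_from_clock(r_cycle_hz, cpi):
    # MIPS from clock rate (Hz) and average cycles per instruction
    return r_cycle_hz / (cpi * 1e6)

def mflops(n_flp_op, t_cpu_s):
    # million floating-point operations per second
    return n_flp_op / (t_cpu_s * 1e6)

print(mips_from_time(4e6, 0.05))   # 80.0 MIPS    (example 1)
print(mips_from_clock(600e6, 3))   # 200.0 MIPS   (example 2)
print(mflops(90e6, 3.5))           # ~25.7 MFLOPS (example 3)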
• Macrobenchmarks
– Application execution time
• Measures overall performance using a single application
• A suite of applications is needed for representative results
Popular Benchmark Suites
• Desktop
– SPEC CPU2000 - CPU-intensive integer & floating-point applications
– SPECviewperf, SPECapc - Graphics benchmarks
– SysMark, Winstone, Winbench
• Embedded
– EEMBC - Collection of kernels from 6 application areas
– Dhrystone - Old synthetic benchmark
• Servers
– SPECweb, SPECfs
– TPC-C - Transaction processing system
– TPC-H, TPC-R - Decision support system
– TPC-W - Transactional web benchmark
• Parallel Computers
– SPLASH - Scientific applications & kernels
– Linpack
Limitations of Memory System Performance
• Example
• Consider a processor operating at 1 GHz (1 ns clock) connected to a
DRAM with a latency of 100 ns (no caches). Assume that the processor
has two multiply-add units and can execute four instructions in each
cycle of 1 ns.
• The peak processor rating is 4 GFLOPS (two multiply-add units, each
delivering 2 FLOPs per 1 ns cycle). Since the memory latency equals
100 cycles (each cycle is 1 ns) and the block size is one word, every
time a memory request is made, the processor must wait 100 cycles
before it can process the data.
• Consider the problem of computing the dot-product of two vectors on
such a platform. A dot-product computation performs one multiply-add
on a single pair of vector elements, i.e., each floating point operation
requires one data fetch.
• It is easy to see that the peak speed of this computation is limited to one
floating point operation every 100 ns, or a speed of 10 MFLOPS.
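A back-of-the-envelope sketch of this bound, using only the numbers given above:

# Dot-product on the example machine (no cache): every FLOP needs one
# word from DRAM, so every FLOP pays the full 100 ns memory latency.
mem_latency_s = 100e-9              # DRAM latency per word
flops_per_fetch = 1                 # one FLOP per fetched word
effective_rate = flops_per_fetch / mem_latency_s
print(effective_rate / 1e6)         # 10.0 MFLOPS

peak_rate = 4e9                     # 4 GFLOPS peak rating
print(effective_rate / peak_rate)   # 0.0025 -> 0.25% of peak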
Impact of Cache on System Performance
• Example
• Consider the same processor with a 1 GHz clock (1 ns cycle) and 100 ns
latency DRAM, now augmented with a 32 KB cache of 1 ns latency, and
used to multiply two matrices.
• The cache stores two matrices A and B of dimensions 32 x 32. Assume an
ideal cache placement strategy. Fetching the two matrices
(corresponding to 2K words) takes approximately 200 µs (2048 words x 100 ns).
• Multiplying two n x n matrices takes 2n^3 operations. Therefore,
2 x 32^3 = 64K operations, executed in 16K cycles or 16 µs (four
instructions per cycle).
• Total computation time = 200 µs + 16 µs = 216 µs, i.e., 64K operations
in 216 µs, or roughly 303 MFLOPS.
• This is a 30x improvement over the previous example. However, it is
still only about 7.5% of the peak performance (4 GFLOPS).
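The same arithmetic as a sketch (exact values, before the slide's rounding):

# Cache example: fetch cost + compute cost for a 32 x 32 matrix multiply.
n = 32
words = 2 * n * n                   # two input matrices = 2K words
fetch_time = words * 100e-9         # 100 ns per word -> ~204.8 us
ops = 2 * n**3                      # 2n^3 FLOPs = 64K operations
compute_time = (ops / 4) * 1e-9     # 4 instructions per 1 ns cycle -> ~16.4 us
print(ops / (fetch_time + compute_time) / 1e6)  # ~296 MFLOPS (~303 with 216 us)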
Performance Metrics – Parallel Systems
Amdahl's Law & Speedup Factor
Amdahl's Law
Amdahl's Law states that the potential program
speedup is bounded by the fraction of code (P)
that can be parallelized:

Max. speedup = 1 / (1 - P)

where f = 1 - P is the serial fraction.
E.g., a 5% serial fraction gives 1/0.05 = 20 (maximum speedup)
Maximum Speedup (Amdahl's Law)
The maximum speedup with p processors is usually p
(linear speedup).
E.g., if f = 1 (all the code is serial), then the speedup will be 1
no matter how many processors are used.
Speedup (with N CPUs or Machines)
• Introducing the number of processors N performing the
parallel fraction of the work, the relationship can be
modelled by:

speedup = 1 / (f_S + f_P / N)

where f_S is the serial fraction and f_P = 1 - f_S is the
parallel fraction.
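A minimal numeric sketch of this model (function name is my own):

def amdahl_speedup(f_serial, n_procs):
    # speedup = 1 / (f_S + f_P / N), with f_P = 1 - f_S
    return 1.0 / (f_serial + (1.0 - f_serial) / n_procs)

for n in (4, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.05, n), 2))
# Approaches the 1/0.05 = 20x ceiling: 3.48, 9.14, 15.42, 19.63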
• Superlinear Speedup
– Speedup of >N, for N processors
• Theoretically not possible
• How is this achievable on real machines?
– Think about the physical resources (cache, memory,
etc.) of N processors: the aggregate cache of N processors
may hold the entire working set
Super-linear Speedup
Super-linear Speedup Example - Searching
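The worked example itself does not survive in this extract; the following is a minimal sketch of the standard searching argument: split the search space across N processors, and if the solution lies near the start of one partition, that processor finds it almost immediately, whereas a sequential scan must first examine every preceding element. All values are assumed for illustration.

# Partitioned search (illustrative values, not from the slide).
space = 1_000_000                  # size of the search space
pos = 750_005                      # index of the single solution
n = 4                              # processors, each scanning one chunk
chunk = space // n

t_seq = pos                        # elements a sequential scan examines
t_par = pos % chunk                # elements the finding processor examines
print(t_seq / t_par)               # 150001.0 -> far greater than N = 4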
Efficiency
• Efficiency is the ability to avoid wasting materials,
energy, effort, money, and time in doing something or in
producing a desired result; in parallel computing, it measures
how effectively the processors are utilized
[Figure: speedup (S) and efficiency (E) as functions of problem size]
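Efficiency for a parallel system is conventionally defined as E = S / N; the formula itself is not legible in this extract, so the sketch below states the standard form and combines it with the Amdahl model above:

def amdahl_speedup(f_serial, n_procs):
    return 1.0 / (f_serial + (1.0 - f_serial) / n_procs)

def efficiency(speedup, n_procs):
    # E = S / N: fraction of ideal linear speedup actually achieved
    return speedup / n_procs

s = amdahl_speedup(0.05, 16)
print(round(s, 2), round(efficiency(s, 16), 2))   # 9.14 0.57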
Gustafson’s Law
Is Amdahl’s Law Sufficient?
• Amdahl’s law works on a fixed problem size
– Shows how execution time decreases as number of
processors increases
– Limits maximum speedup achievable
– So, does it mean large parallel machines are not
useful?
– Ignores performance overheads (e.g., communication,
load imbalance)
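Gustafson's answer is to scale the problem with the machine: hold the parallel work per processor fixed instead of the total problem size. The scaled-speedup formula does not survive in this extract, so the sketch below states its standard form, S = N - f(N - 1) with serial fraction f:

def gustafson_speedup(f_serial, n_procs):
    # scaled speedup S = N - f * (N - 1) for a problem grown with N
    return n_procs - f_serial * (n_procs - 1)

for n in (4, 16, 64, 1024):
    print(n, gustafson_speedup(0.05, n))
# Grows almost linearly: 3.85, 15.25, 60.85, 972.85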