Module 2 (26-10-2024)
Topic – 2:
Assessing and Understanding Performance
Course Teacher:
Dr. Md. Tarek Habib
Assistant Professor
Department of Computer Science and Engineering
Reference Books
• “Computer Organization and Design – The
Hardware/Software Interface,” David A.
Patterson and John L. Hennessy, 5th
Edt., Elsevier.
☞ Chapter 1
• “Computer Organization and Architecture –
Design for Performance,” W. Stallings,
10th Edt., Prentice Hall.
☞ Chapter 2
1.6 Performance
Performance…
■ Performance is the key to understanding the underlying motivation
for the hardware and its organization
■ Measure, report, and summarize performance to enable users to
  ■ make intelligent choices
  ■ see through the marketing hype!
What do we measure?
Define performance….
Airplane Passengers Range (mi) Speed (mph)
Definition of Performance
■ For some program running on machine X:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
Computer Performance:
TIME, TIME, TIME!!!
■ Response Time (elapsed time, latency):
  ■ how long does it take for my job to run?
  ■ how long does it take to execute (start to finish) my job?
  (Individual user concerns…)
CPU Time
■ CPU Time
■ The actual time the CPU spends computing for a specific task.
■ CPU time is also known as CPU execution time.
■ CPU execution time ≠ response time
Clock Cycle
■ Clock Cycle
■ A clock cycle is the time for one clock period, usually of the
processor clock, which runs at a constant rate.
■ A clock cycle is also called a tick, clock tick, clock period, clock, or
cycle.
■ Clock Period
■ The clock period is the length of each clock cycle.
Continuing Clock Cycle
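■ Clock Cycles and Clock Cycle time
$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
■ Alternatively, because clock rate and clock cycle time are inverses,
$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$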
■ Example Problem I
■ Our favorite program runs in 10 seconds on computer A, which has
a 2 GHz clock. We are trying to help a computer designer build a
computer, B, which will run this program in 6 seconds. The designer
has determined that a substantial increase in the clock rate is
possible, but this increase will affect the rest of the CPU design,
causing computer B to require 1.2 times as many clock cycles as
computer A for this program. What clock rate should we tell the
designer to target?
Continuing Clock Cycle
■ Solution of Example Problem
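■ Let’s first find the number of clock cycles required for the program
on A:
$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A}$$
$$10\ \text{s} = \frac{\text{CPU clock cycles}_A}{2 \times 10^9\ \text{cycles/s}}$$
$$\text{CPU clock cycles}_A = 10\ \text{s} \times 2 \times 10^9\ \text{cycles/s} = 20 \times 10^9\ \text{cycles}$$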
Continuing Clock Cycle
■ Solution of Example Problem Continued
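■ CPU time for B can be found using this equation:
$$\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B}$$
$$6\ \text{s} = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{Clock rate}_B}$$
$$\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\ \text{s}} = 4 \times 10^9\ \text{cycles/s} = 4\ \text{GHz}$$
■ To run the program in 6 seconds, B must have twice the clock rate
of A.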
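The arithmetic of Example Problem I can be checked with a minimal Python sketch. The variable names are illustrative assumptions; only the numbers come from the problem statement.

```python
# Numbers from Example Problem I; all variable names are illustrative.
time_a = 10.0          # seconds the program takes on computer A
clock_rate_a = 2e9     # A's clock rate: 2 GHz, in cycles per second
time_b_target = 6.0    # seconds the program should take on computer B
cycle_factor_b = 1.2   # B needs 1.2 times as many clock cycles as A

# CPU time = CPU clock cycles / clock rate, so cycles = time * rate.
cycles_a = time_a * clock_rate_a
cycles_b = cycle_factor_b * cycles_a

# Solve time_b_target = cycles_b / clock_rate_b for B's clock rate.
clock_rate_b = cycles_b / time_b_target

print(f"Clock cycles on A: {cycles_a:.3g}")
print(f"Target clock rate for B: {clock_rate_b / 1e9:.2f} GHz")
print(f"Ratio to A's clock rate: {clock_rate_b / clock_rate_a:.2f}")
```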
Performance Equation I
equivalently
Performance Equation II
$$\text{CPU execution time for a program} = \text{Instruction count for a program} \times \text{average CPI} \times \text{Clock cycle time}$$
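As a rough illustration, the performance equation can be written as a small Python function. The function names and the sample values (10^9 instructions, CPI 2.0, 250 ps cycle) are assumptions chosen for this sketch, not figures from the slides.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time: float) -> float:
    """CPU execution time = instruction count x average CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time


def cpu_time_from_rate(instruction_count: float, cpi: float, clock_rate: float) -> float:
    """Same equation, using clock rate (the inverse of clock cycle time)."""
    return instruction_count * cpi / clock_rate


# Hypothetical program: 1e9 instructions, CPI 2.0, 250 ps cycle time (4 GHz clock).
t_cycle = cpu_time(1e9, 2.0, 250e-12)
t_rate = cpu_time_from_rate(1e9, 2.0, 4e9)
print(f"Using cycle time: {t_cycle:.3f} s, using clock rate: {t_rate:.3f} s")
```

Both forms give the same time because the clock rate is the inverse of the clock cycle time.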
CPI
■ Clock Cycles Per Instruction
■ The term Clock Cycles Per Instruction, which is the average
number of clock cycles each instruction takes to execute, is often
abbreviated as CPI.
■ Example Problem II
■ Suppose we have two implementations of the same instruction set
architecture. Computer A has a clock cycle time of 250 ps and a
CPI of 2.0 for some program, and computer B has a clock cycle
time of 500 ps and a CPI of 1.2 for the same program. Which
computer is faster for this program and by how much?
CPI Continuing
■ Solution of Example Problem
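■ We know that each computer executes the same number of
instructions for the program; let’s call this number I. First, find the
number of processor clock cycles for each computer:
$$\text{CPU clock cycles}_A = I \times 2.0 \qquad \text{CPU clock cycles}_B = I \times 1.2$$
■ Now we can compute the CPU time for A:
$$\text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$$
CPI Continuing
■ Solution of Example Problem Continued
■ Likewise, for B:
$$\text{CPU time}_B = \text{CPU clock cycles}_B \times \text{Clock cycle time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$$
■ Computer A is faster. The amount faster is given by the ratio of the
execution times:
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{CPU time}_B}{\text{CPU time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$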
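The comparison in Example Problem II can be sketched in Python as follows. The instruction count I cancels in the ratio, so the placeholder value below is an assumption made only so the code runs; the variable names are also illustrative.

```python
# Values from Example Problem II. The instruction count I is the same on both
# computers and cancels in the ratio, so any positive placeholder works.
instruction_count = 1.0  # placeholder for I (assumption for illustration)

cycle_time_a, cpi_a = 250e-12, 2.0  # computer A: 250 ps cycle, CPI 2.0
cycle_time_b, cpi_b = 500e-12, 1.2  # computer B: 500 ps cycle, CPI 1.2

# CPU time = instruction count x CPI x clock cycle time
cpu_time_a = instruction_count * cpi_a * cycle_time_a
cpu_time_b = instruction_count * cpi_b * cycle_time_b

faster = "A" if cpu_time_a < cpu_time_b else "B"
ratio = max(cpu_time_a, cpu_time_b) / min(cpu_time_a, cpu_time_b)
print(f"Computer {faster} is faster by a factor of {ratio:.2f}")
```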
Instruction
■ Instruction Count
■ Instruction count is the number of instructions executed by the
program.
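■ We can now write this basic performance equation in terms of
instruction count (the number of instructions executed by the
program), CPI, and clock cycle time:
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
■ Or, since the clock rate is the inverse of clock cycle time:
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$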
■ Instruction mix
■ A measure of the dynamic frequency of instructions across one or
many programs.
CPI Example I
■ Suppose we have two implementations of the same instruction
set architecture (ISA). For some program:
■ machine A has a clock cycle time of 10 ns and a CPI of 2.0
■ machine B has a clock cycle time of 20 ns and a CPI of 1.2
CPI Example II
■ A compiler designer is trying to decide between two code
sequences for a particular machine.
■ Based on the hardware implementation, there are three
different classes of instructions: Class A, Class B, and Class C,
and they require 1, 2 and 3 cycles (respectively).
■ The first code sequence has 5 instructions:
2 of A, 1 of B, and 2 of C.
■ The second code sequence has 6 instructions:
4 of A, 1 of B, and 1 of C.
■ Which sequence will be faster, and by how much? What is the CPI for each
sequence?
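One way to work this example is to total the clock cycles of each sequence and divide by its instruction count. The sketch below does that in Python; the helper names are assumptions for illustration, and the cycle counts per class come from the problem statement.

```python
# Cycles required by each instruction class, as stated in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(mix: dict) -> int:
    """Sum of (instruction count x cycles per instruction) over all classes."""
    return sum(CYCLES_PER_CLASS[cls] * count for cls, count in mix.items())


def average_cpi(mix: dict) -> float:
    """Average CPI = total clock cycles / total instruction count."""
    return total_cycles(mix) / sum(mix.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
sequence_2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

cycles_1, cycles_2 = total_cycles(sequence_1), total_cycles(sequence_2)
cpi_1, cpi_2 = average_cpi(sequence_1), average_cpi(sequence_2)

print(f"Sequence 1: {cycles_1} cycles, CPI = {cpi_1:.2f}")
print(f"Sequence 2: {cycles_2} cycles, CPI = {cpi_2:.2f}")
print(f"Cycle ratio (sequence 1 / sequence 2): {cycles_1 / cycles_2:.2f}")
```

Both sequences run on the same machine, so the clock cycle time is identical and the sequence with fewer total cycles is faster, even though it executes more instructions.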
Check Yourself
■ A given application written in Java runs for 15 seconds on a
desktop processor. A new Java compiler is released that requires
only 0.6 as many instructions as the old compiler. Unfortunately, it
increases the CPI by 1.1. How fast can we expect the application to
run using this new compiler? Pick the right answer from the three
choices below:
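The answer choices are not reproduced here, but the expected run time can be estimated with a short Python sketch. It assumes that "increases the CPI by 1.1" means the CPI is multiplied by 1.1 and that the clock rate is unchanged; both are assumptions, as are the variable names.

```python
# Assumptions: the CPI grows by a FACTOR of 1.1, and the clock cycle time
# is the same before and after the compiler change.
old_time = 15.0            # seconds with the old compiler
instruction_factor = 0.6   # new compiler needs 0.6 as many instructions
cpi_factor = 1.1           # new CPI relative to old CPI (assumed multiplicative)

# CPU time = instruction count x CPI x clock cycle time, so the new time
# scales by the product of the instruction-count and CPI factors.
new_time = old_time * instruction_factor * cpi_factor
print(f"Expected run time with the new compiler: {new_time:.1f} s")
```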
2.5 Calculating the Mean
1.9 Real Stuff: Benchmarking the Intel Core i7
SPEC CPU Benchmark
■ Workload
■ A set of programs run on a computer that is either the actual
collection of applications run by a user or constructed from
real programs to approximate such a mix. A typical workload
specifies both the programs and the relative frequencies.
■ Benchmark
■ A program selected for use in comparing computer
performance.
SPEC CPU Benchmark…
■ SPEC (Standard Performance Evaluation Corporation)
■ SPEC is an effort funded and supported by a number of computer vendors
to create standard sets of benchmarks for modern computer systems.
■ In 1989, SPEC originally created a benchmark set focusing on processor
performance (now called SPEC89), which has evolved through five
generations. The latest generation covered in the reference textbook is
SPEC CPU2006, which consists of a set of 12 integer benchmarks (CINT2006)
and 17 floating-point benchmarks (CFP2006); SPEC has since released
SPEC CPU2017.
SPEC CPU Benchmark…
■ Elaboration
■ When comparing two computers using SPEC ratios, use the
geometric mean so that it gives the same relative answer no matter
what computer is used to normalize the results. If we averaged the
normalized execution time values with an arithmetic mean, the
results would vary depending on the computer we choose as the
reference.
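■ The formula for the geometric mean is
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
where Execution time ratio_i is the execution time, normalized to the
reference computer, for the i-th program of a total of n in the workload.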
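The point about the choice of reference computer can be illustrated with a small Python sketch. The execution times below are made-up values for two hypothetical programs on two hypothetical computers, and the function name is an assumption.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times (seconds) of two programs on computers A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Execution time ratios of B normalized to A, and of A normalized to B.
b_norm_a = [b / a for a, b in zip(times_a, times_b)]
a_norm_b = [a / b for a, b in zip(times_a, times_b)]

gm_b_norm_a = geometric_mean(b_norm_a)
gm_a_norm_b = geometric_mean(a_norm_b)
am_b_norm_a = sum(b_norm_a) / len(b_norm_a)
am_a_norm_b = sum(a_norm_b) / len(a_norm_b)

print(f"Geometric means:  B vs A = {gm_b_norm_a:.2f}, A vs B = {gm_a_norm_b:.2f}")
print(f"Arithmetic means: B vs A = {am_b_norm_a:.2f}, A vs B = {am_a_norm_b:.2f}")
```

With these numbers the arithmetic mean of the normalized times makes each computer look slower than the other, depending on which one is the reference, while the two geometric means are reciprocals of each other and so give a consistent answer.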
SPEC CPU Benchmark…
2.3 Two Laws that Provide Insight
Amdahl’s Law
■ Amdahl’s Law is named after Gene Amdahl, an American computer
architect and high-tech entrepreneur.
■ The performance gain that can be obtained by improving
some portion of a computer can be calculated using
Amdahl’s Law.
■ Amdahl’s Law states that the performance improvement to
be gained from using some faster mode of execution is
limited by the fraction of the time the faster mode can be
used.
■ Suppose that we can make an enhancement to a
computer that will improve performance when it is used.
Amdahl’s Law…
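■ The execution time using the original computer with the enhanced
mode will be the time spent using the unenhanced portion of the
computer plus the time spent using the enhancement:
$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right)$$
where Fraction_enhanced is the fraction of the original execution time
that can use the enhancement, and Speedup_enhanced is how much faster
the enhanced mode runs than the original mode.
Amdahl’s Law…
■ The overall speedup is the ratio of the execution times:
$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$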
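A minimal Python sketch of Amdahl's Law follows. The function and parameter names are assumptions, and the sample values (40% of the time enhanced, 10 times faster) are hypothetical.

```python
def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Amdahl's Law: old execution time divided by new execution time.

    fraction_enhanced: fraction of the original time that can use the enhancement
    speedup_enhanced:  how much faster the enhanced mode is than the original
    """
    new_time_fraction = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_fraction


# Hypothetical case: the enhancement applies to 40% of the time and is 10x faster.
sample = overall_speedup(0.4, 10.0)
# Even an enormous enhancement is bounded by the unenhanced fraction: 1 / (1 - 0.4).
bound = 1.0 / (1.0 - 0.4)
nearly_infinite = overall_speedup(0.4, 1e12)
print(f"Speedup: {sample:.4f}, with a huge enhancement: {nearly_infinite:.4f}, bound: {bound:.4f}")
```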
Example I
💀 Suppose a program runs in 100 seconds on a computer,
with multiply operations responsible for 90 seconds of this
time. How much do we have to improve the speed of
multiplication if we want the program to run five times
faster?
Example I
💡 Solution:
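One way to work the solution is to split the run time into the part affected by the improvement and the part that is not, then solve for the improvement factor. The Python sketch below does this; the function name and its `None` return for the impossible case are assumptions made for illustration.

```python
def required_improvement(total_time, affected_time, target_speedup):
    """Factor by which the affected part must speed up to reach the target.

    Uses: new time = affected_time / n + unaffected_time.
    Returns None when no finite n can reach the target.
    """
    unaffected_time = total_time - affected_time
    target_time = total_time / target_speedup
    time_left_for_affected = target_time - unaffected_time
    if time_left_for_affected <= 0:
        return None  # the unaffected part alone already uses up the target time
    return affected_time / time_left_for_affected


# Example I: 100 s total, 90 s of multiplication, target of 5 times faster.
n = required_improvement(100.0, 90.0, 5.0)
print(f"Multiplication must become {n:.1f} times faster")
```

Here the target time is 100 / 5 = 20 seconds, of which 10 seconds is unaffected, so the 90 seconds of multiplication must shrink to 10 seconds.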
Example II
💀 Suppose a program runs in 100 seconds on a computer,
with multiply operations responsible for 80 seconds of this
time. How much does someone have to improve the speed of
multiplication if he wants the program to run five times
faster?
Example II…
💡 Solution:
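A short Python sketch, with illustrative variable names and an arbitrary set of trial improvement factors, suggests why no finite improvement of multiplication reaches the target in this case.

```python
total_time = 100.0     # seconds for the whole program
multiply_time = 80.0   # seconds spent in multiply operations
other_time = total_time - multiply_time   # 20 s unaffected by the improvement
target_time = total_time / 5.0            # 20 s for a 5x overall speedup

# Try increasingly large improvement factors n for multiplication.
trial_factors = (2, 10, 100, 1_000_000)
for n in trial_factors:
    new_time = multiply_time / n + other_time
    print(f"n = {n:>9}: new time = {new_time:.5f} s, speedup = {total_time / new_time:.4f}")

# The new time always exceeds other_time, which already equals the target.
target_reached = any(multiply_time / n + other_time <= target_time for n in trial_factors)
print(f"Target reached for any trial factor: {target_reached}")
```

The unaffected 20 seconds already equals the 20-second target, so 80 / n would have to be zero; the overall speedup approaches 5 but never reaches it.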
1.10 Fallacies and Pitfalls
Fallacies
■ Fallacies are commonly held misconceptions that
someone might encounter.
■ Cost/performance fallacies have ensnared many
computer architects!
■ A common fallacy is as follows:
Pitfalls
■ Pitfalls, or easily made mistakes, often are
generalizations of principles that are only true in a
limited context.
■ Cost/performance pitfalls also have ensnared many
computer architects!
■ A pitfall that traps many designers is as follows:
MIPS
■ MIPS (million instructions per second) is a measurement
of program execution speed based on the number of millions of
instructions. MIPS is computed as the instruction count divided
by the product of the execution time and 10^6.
■ MIPS is simply:
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$
Check Yourself
■ Two different compilers are being tested for a 500 MHz machine with
three different classes of instructions: Class A, Class B, and Class C,
which require 1, 2 and 3 cycles (respectively). Both compilers are
used to produce code for a large piece of software.
■ Compiler 1 generates code with 5 billion Class A instructions, 1 billion
Class B instructions, and 1 billion Class C instructions.
■ Compiler 2 generates code with 10 billion Class A instructions, 1
billion Class B instructions, and 1 billion Class C instructions.
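The question for this exercise is not reproduced here. Assuming it asks which compiler's code is faster according to MIPS and according to execution time, the comparison can be sketched in Python as follows; the function and variable names are illustrative.

```python
CLOCK_RATE_HZ = 500e6                       # 500 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def summarize(instruction_counts: dict):
    """Return (execution time in seconds, MIPS) for a given instruction mix."""
    instructions = sum(instruction_counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in instruction_counts.items())
    execution_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (execution_time * 1e6)
    return execution_time, mips


compiler_1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler_2 = {"A": 10e9, "B": 1e9, "C": 1e9}

time_1, mips_1 = summarize(compiler_1)
time_2, mips_2 = summarize(compiler_2)
print(f"Compiler 1: {time_1:.1f} s, {mips_1:.0f} MIPS")
print(f"Compiler 2: {time_2:.1f} s, {mips_2:.0f} MIPS")
```

Under these assumptions the code from compiler 2 has the higher MIPS rating, yet the code from compiler 1 finishes sooner, which shows why MIPS alone can mislead.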
THANKS…