Module 2 (26-10-2024)

The document outlines the course content for 'Computer Organization and Architecture (CSE 214)' focusing on performance assessment and understanding. It includes topics such as benchmarking, performance metrics, Amdahl's Law, and the significance of CPU time and clock cycles. Reference materials include books by Patterson and Hennessy, and Stallings, with specific sections highlighted for study.

Spring 2025

Computer Organization and Architecture


(CSE 214)

Topic – 2:
Assessing and Understanding Performance

Course Teacher:
Dr. Md. Tarek Habib
Assistant Professor
Department of Computer Science and Engineering
Reference Books
• “Computer Organization and Design – The Hardware/Software Interface,” David A. Patterson and John L. Hennessy, 5th Edn., Elsevier.
☞ Chapter 1
• “Computer Organization and Architecture – Designing for Performance,” W. Stallings, 10th Edn., Prentice Hall.
☞ Chapter 2

Topic Contents
• Performance [Sec. – 1.6]
• The Sea Change: The Switch from Uniprocessors to Multiprocessors [Sec. – 1.8]
• Calculating the Mean [Sec. – 2.5]
• Real Stuff: Benchmarking the Intel Core i7 [Sec. – 1.9] + Benchmarks and SPEC [Sec. – 2.6]
• Two Laws that Provide Insight [Sec. – 2.3]
• Fallacies and Pitfalls [Sec. – 1.10]

1.6 Performance

Performance…
■ Performance is the key to understanding the underlying motivation for the hardware and its organization.
■ Measure, report, and summarize performance to enable users to:
  ■ make intelligent choices
  ■ see through the marketing hype!
■ Why is some hardware better than others for different programs?
■ What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)
■ How does the machine's instruction set affect performance?

What do we measure?
Define performance….
Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100         101           630          598
Boeing 747             470          4150          610
BAC/Sud Concorde       132          4000         1350
Douglas DC-8-50        146          8720          544
■ How much faster is the Concorde compared to the
747?
■ How much bigger is the Boeing 747 than the Douglas
DC-8?

■ So which of these airplanes has the best performance?!
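
The questions above can be explored with a small sketch. The figures are copied from the airplane table; the function names and the "passenger throughput" metric (passengers × speed) are illustrative assumptions, chosen only to show that the "best" airplane depends on which metric is used.

```python
# Data from the airplane table: (passengers, range in miles, speed in mph).
AIRPLANES = {
    "Boeing 737-100": (101, 630, 598),
    "Boeing 747": (470, 4150, 610),
    "BAC/Sud Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}


def speed_ratio(a, b):
    """How many times faster airplane a cruises than airplane b."""
    return AIRPLANES[a][2] / AIRPLANES[b][2]


def capacity_ratio(a, b):
    """How many times more passengers airplane a carries than airplane b."""
    return AIRPLANES[a][0] / AIRPLANES[b][0]


def passenger_throughput(name):
    """Assumed metric: passengers x mph (passenger-miles per hour)."""
    passengers, _range_mi, speed = AIRPLANES[name]
    return passengers * speed


def best_by(metric):
    """Name of the airplane that maximizes the given metric function."""
    return max(AIRPLANES, key=metric)


if __name__ == "__main__":
    print(round(speed_ratio("BAC/Sud Concorde", "Boeing 747"), 2))
    print(round(capacity_ratio("Boeing 747", "Douglas DC-8-50"), 2))
    print(best_by(lambda n: AIRPLANES[n][2]))  # fastest
    print(best_by(lambda n: AIRPLANES[n][1]))  # longest range
    print(best_by(passenger_throughput))       # highest throughput
```

Under these assumptions the Concorde wins on speed, the DC-8-50 on range, and the 747 on passenger throughput, so "best performance" has no single answer until the metric is fixed.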

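Definition of Performance
■ For some program running on machine X:

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

■ "X is n times faster than Y" means:

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
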
Computer Performance: TIME, TIME, TIME!!!
■ Response Time (elapsed time, latency): individual user concerns…
  ■ how long does it take for my job to run?
  ■ how long does it take to execute (start to finish) my job?
  ■ how long must I wait for the database query?
■ Throughput: systems manager concerns…
  ■ how many jobs can the machine run at once?
  ■ what is the average execution rate?
  ■ how much work is getting done?
■ If we upgrade a machine with a new processor, what do we increase?
■ If we add a new machine to the lab, what do we increase?

Execution Time
■ Execution Time
  ■ As an individual computer user, you are interested in reducing response time: the time between the start and completion of a task. Response time is also called execution time. It is the total time required for the computer to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on.
■ Program execution time is measured in seconds per program.

CPU Time
■ CPU Time
  ■ The actual time the CPU spends computing for a specific task.
  ■ CPU time is also known as CPU execution time.
  ■ CPU execution time ≠ response time
■ User CPU Time
  ■ The CPU time spent in the program itself.
■ System CPU Time
  ■ The CPU time spent in the operating system performing tasks on behalf of the program.

Clock Cycle
■ Clock Cycle
  ■ The time for one clock period, usually of the processor clock, which runs at a constant rate.
  ■ A clock cycle is also called a tick, clock tick, clock period, clock, or cycle.
■ Clock Period
  ■ The length of each clock cycle.
■ Clock cycles and clock cycle time
  ■ A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time:

$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

Continuing Clock Cycle
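■ Clock Cycles and Clock Cycle time
  ■ Alternatively, because clock rate and clock cycle time are inverses:

$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$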

■ Example Problem I
■ Our favorite program runs in 10 seconds on computer A, which has
a 2 GHz clock. We are trying to help a computer designer build a
computer, B, which will run this program in 6 seconds. The designer
has determined that a substantial increase in the clock rate is
possible, but this increase will affect the rest of the CPU design,
causing computer B to require 1.2 times as many clock cycles as
computer A for this program. What clock rate should we tell the
designer to target?

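Continuing Clock Cycle
■ Solution of Example Problem I
  ■ Let’s first find the number of clock cycles required for the program on A:

$$\text{CPU clock cycles}_A = \text{CPU time}_A \times \text{Clock rate}_A = 10\,\text{s} \times 2 \times 10^9\,\tfrac{\text{cycles}}{\text{s}} = 20 \times 10^9\ \text{cycles}$$

  ■ CPU time for B can be found using this equation:

$$\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B} \;\Rightarrow\; 6\,\text{s} = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{Clock rate}_B}$$

$$\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\,\text{s}} = 4 \times 10^9\,\tfrac{\text{cycles}}{\text{s}} = 4\ \text{GHz}$$

  ■ To run the program in 6 seconds, B must have twice the clock rate of A.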
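
As a quick check of Example Problem I, the calculation can be sketched in a few lines. The function name and parameter names are hypothetical; the numbers (10 s, 2 GHz, 1.2× cycles, 6 s) come from the problem statement.

```python
def required_clock_rate_hz(time_a_s, clock_rate_a_hz, cycle_factor, target_time_b_s):
    """Clock rate B needs, given that B uses cycle_factor times A's clock cycles."""
    cycles_a = time_a_s * clock_rate_a_hz      # CPU clock cycles on A
    cycles_b = cycle_factor * cycles_a         # B needs more cycles for the same program
    return cycles_b / target_time_b_s          # clock rate = cycles / CPU time


if __name__ == "__main__":
    rate_b = required_clock_rate_hz(10.0, 2e9, 1.2, 6.0)
    print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
    print(f"Ratio to A's clock rate: {rate_b / 2e9:.1f}")
```

The result is about 4 GHz, twice the clock rate of A, in line with the conclusion of the example.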

Performance Equation I

$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$

equivalently

$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$

■ So, to improve performance, one can either:
  ■ reduce the number of cycles for a program, or
  ■ reduce the clock cycle time, or, equivalently,
  ■ increase the clock rate

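Performance Equation II

$$\text{CPU execution time for a program} = \text{Instruction count for a program} \times \text{average CPI} \times \text{Clock cycle time}$$

■ Derive the above equation from Performance Equation I, using CPU clock cycles = Instruction count × average CPI.
■ The clock rate is the inverse of clock cycle time. So,

$$\text{CPU execution time for a program} = \frac{\text{Instruction count for a program} \times \text{average CPI}}{\text{Clock rate}}$$
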
CPI
■ Clock Cycles Per Instruction
■ The term Clock Cycles Per Instruction, which is the average
number of clock cycles each instruction takes to execute, is often
abbreviated as CPI.

■ Example Problem II
■ Suppose we have two implementations of the same instruction set
architecture. Computer A has a clock cycle time of 250 ps and a
CPI of 2.0 for some program, and computer B has a clock cycle
time of 500 ps and a CPI of 1.2 for the same program. Which
computer is faster for this program and by how much?

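CPI Continuing
■ Solution of Example Problem II
  ■ We know that each computer executes the same number of instructions for the program; let’s call this number I. First, find the number of processor clock cycles for each computer:

$$\text{CPU clock cycles}_A = I \times 2.0 \qquad \text{CPU clock cycles}_B = I \times 1.2$$

  ■ Now we can compute the CPU time for each computer:

$$\text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$$

  ■ Likewise, for B:

$$\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$$

  ■ Clearly, computer A is faster. The amount faster is given by the ratio of the execution times:

$$\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$

  ■ We can conclude that computer A is 1.2 times as fast as computer B for this program.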
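
The comparison in Example Problem II can be sketched as follows. Because both computers execute the same instruction count I, it cancels in the ratio, so the sketch works with time per instruction (CPI × clock cycle time). The function names are illustrative, not part of the course material.

```python
def time_per_instruction_ps(cpi, clock_cycle_time_ps):
    """Average time one instruction takes: CPI x clock cycle time (picoseconds)."""
    return cpi * clock_cycle_time_ps


def times_as_fast(cpi_x, cycle_x_ps, cpi_y, cycle_y_ps):
    """How many times as fast X is compared with Y for the same instruction count."""
    return time_per_instruction_ps(cpi_y, cycle_y_ps) / time_per_instruction_ps(cpi_x, cycle_x_ps)


if __name__ == "__main__":
    # Computer A: CPI 2.0, 250 ps cycle. Computer B: CPI 1.2, 500 ps cycle.
    print(time_per_instruction_ps(2.0, 250.0))   # A: about 500 ps per instruction
    print(time_per_instruction_ps(1.2, 500.0))   # B: about 600 ps per instruction
    print(times_as_fast(2.0, 250.0, 1.2, 500.0))
```

The same helper applies to any pair of machines that share an ISA and run the same program, since only then is the instruction count guaranteed to cancel.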

Instruction
■ Instruction Count
  ■ Instruction count is the number of instructions executed by the program.
  ■ We can now write the basic performance equation in terms of instruction count, CPI, and clock cycle time:

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

  ■ Or, since the clock rate is the inverse of clock cycle time:

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

■ Instruction mix
  ■ A measure of the dynamic frequency of instructions across one or many programs.

CPI Example I
■ Suppose we have two implementations of the same instruction set architecture (ISA). For some program:
  ■ machine A has a clock cycle time of 10 ns and a CPI of 2.0
  ■ machine B has a clock cycle time of 20 ns and a CPI of 1.2
■ Which machine is faster for this program, and by how much?
■ If two machines have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

CPI Example II
■ A compiler designer is trying to decide between two code
sequences for a particular machine.
■ Based on the hardware implementation, there are three
different classes of instructions: Class A, Class B, and Class C,
and they require 1, 2 and 3 cycles (respectively).
■ The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
■ The second code sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

■ Which sequence will be faster? How much? What is the CPI for each
sequence?
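
One way to work through CPI Example II is sketched below. The cycle costs (1, 2, 3) and instruction counts come from the example; the dictionary layout and function names are illustrative assumptions. Both sequences run on the same machine, so the clock cycle time is identical and total cycles decide which is faster.

```python
# Cycles required by each instruction class, as given in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts):
    """Total clock cycles for a sequence given per-class instruction counts."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Average CPI = total clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

if __name__ == "__main__":
    for name, seq in (("Sequence 1", SEQUENCE_1), ("Sequence 2", SEQUENCE_2)):
        print(name, total_cycles(seq), "cycles, CPI", average_cpi(seq))
    # Ratio of cycle counts = how many times as fast sequence 2 is.
    print(total_cycles(SEQUENCE_1) / total_cycles(SEQUENCE_2))
```

Sequence 2 executes more instructions but needs fewer cycles (9 versus 10), which illustrates why instruction count alone is a poor predictor of execution time.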

Check Yourself
■ A given application written in Java runs for 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 as many instructions as the old compiler. Unfortunately, it increases the CPI by a factor of 1.1. How fast can we expect the application to run using this new compiler? Pick the right answer from the three choices below:
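
A minimal sketch of the calculation, assuming the CPI change is a multiplicative factor of 1.1 and the clock rate is unchanged. Since CPU time = instruction count × CPI × clock cycle time, scaling the first two factors scales the CPU time by their product. The function name is hypothetical.

```python
def new_cpu_time_s(old_time_s, instruction_factor, cpi_factor):
    """Scale CPU time by the change in instruction count and in CPI.

    Assumes the clock cycle time stays the same.
    """
    return old_time_s * instruction_factor * cpi_factor


if __name__ == "__main__":
    print(f"{new_cpu_time_s(15.0, 0.6, 1.1):.1f} s")
```

Under these assumptions the application would run in roughly 9.9 seconds.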

2.5 Calculating the Mean

1.9 Real Stuff: Benchmarking the Intel Core i7

2.6 Benchmarks and SPEC

SPEC CPU Benchmark
■ Workload
■ A set of programs run on a computer that is either the actual
collection of applications run by a user or constructed from
real programs to approximate such a mix. A typical workload
specifies both the programs and the relative frequencies.
■ Benchmark
■ A program selected for use in comparing computer
performance.

SPEC CPU Benchmark…
■ SPEC (Standard Performance Evaluation Corporation)
  ■ SPEC is an effort funded and supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems.
  ■ In 1989, SPEC originally created a benchmark set focusing on processor performance (now called SPEC89), which has evolved through five generations. The latest covered in the reference text is SPEC CPU2006, which consists of a set of 12 integer benchmarks (CINT2006) and 17 floating-point benchmarks (CFP2006).

SPEC CPU Benchmark…
■ Elaboration
■ When comparing two computers using SPEC ratios, use the
geometric mean so that it gives the same relative answer no matter
what computer is used to normalize the results. If we averaged the
normalized execution time values with an arithmetic mean, the
results would vary depending on the computer we choose as the
reference.
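■ The formula for the geometric mean is

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

■ where Execution time ratio_i is the execution time, normalized to the reference computer, for the i-th program of a total of n in the workload, and

$$\prod_{i=1}^{n} a_i \ \text{means the product}\ a_1 \times a_2 \times \cdots \times a_n$$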
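
The claim that the geometric mean gives the same relative answer regardless of the reference computer can be illustrated with a small sketch. The execution times below are made-up numbers for two hypothetical computers A and B running a two-program workload; the function names are also illustrative. `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


def normalize(times, reference_times):
    """Execution time ratios relative to a reference computer."""
    return [t / r for t, r in zip(times, reference_times)]


# Hypothetical execution times (seconds) for programs 1 and 2.
TIMES_A = [1.0, 1000.0]
TIMES_B = [10.0, 100.0]


def relative_score(mean, reference_times):
    """mean(normalized A) / mean(normalized B) for the chosen reference."""
    return (mean(normalize(TIMES_A, reference_times))
            / mean(normalize(TIMES_B, reference_times)))


if __name__ == "__main__":
    for ref_name, ref in (("A", TIMES_A), ("B", TIMES_B)):
        print("reference", ref_name,
              "geometric:", relative_score(geometric_mean, ref),
              "arithmetic:", relative_score(arithmetic_mean, ref))
```

With these numbers the geometric-mean comparison is the same (about 1) for either reference, while the arithmetic-mean comparison flips from favoring A to favoring B when the reference changes.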

2.3 Two Laws that Provide Insight

Amdahl’s Law
■ The law is named after Gene Amdahl, an American computer architect and high-tech entrepreneur.
■ The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl’s Law.
■ Amdahl’s Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
■ Suppose that we can make an enhancement to a computer that will improve performance when it is used.

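Amdahl’s Law…
■ Let Fraction_enhanced be the fraction of the original execution time that can use the enhancement, and Speedup_enhanced be how much faster the enhanced mode runs than the original mode.
■ The execution time using the original computer with the enhanced mode will be the time spent using the unenhanced portion of the computer plus the time spent using the enhancement:

$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left[(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}\right]$$

■ The overall speedup is the ratio of the execution times:

$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$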
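
A minimal sketch of Amdahl's Law in its usual form: overall speedup = 1 / ((1 − f) + f / s), where f is the fraction of the original execution time that can use the enhancement and s is the speedup of the enhanced mode. The function names and the sample values (f = 0.4, s = 10) are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def speedup_limit(fraction_enhanced):
    """Upper bound on overall speedup as the enhanced mode becomes infinitely fast."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # Hypothetical case: 40% of the time is enhanced by a factor of 10.
    print(amdahl_speedup(0.4, 10.0))
    print(speedup_limit(0.4))
```

Even with an arbitrarily fast enhancement, the hypothetical 40% case can never exceed a speedup of about 1.67, because the unenhanced 60% still takes its full time.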

Example I
💀 Suppose a program runs in 100 seconds on a computer,
with multiply operations responsible for 90 seconds of this
time. How much do we have to improve the speed of
multiplication if we want the program to run five times
faster?

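Example I
💡 Solution: Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected. To run five times faster, the program must take 100/5 = 20 seconds:

$$20\,\text{s} = \frac{90\,\text{s}}{n} + (100 - 90)\,\text{s} \;\Rightarrow\; \frac{90}{n} = 10 \;\Rightarrow\; n = 9$$

Multiplication must be made 9 times faster.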

Example II
💀 Suppose a program runs in 100 seconds on a computer,
with multiply operations responsible for 80 seconds of this
time. How much does someone have to improve the speed of
multiplication if he wants the program to run five times
faster?

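Example II…
💡 Solution: To run five times faster, the program must take 100/5 = 20 seconds:

$$20\,\text{s} = \frac{80\,\text{s}}{n} + (100 - 80)\,\text{s} \;\Rightarrow\; \frac{80}{n} = 0$$

No finite n satisfies this: the 20 seconds not spent on multiplication already use the whole time budget, so no improvement to multiplication alone can make the program five times faster.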
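
Both multiplication examples can be checked with one small helper. This is a sketch under the assumption that only the multiply time can be improved and everything else is unchanged; the function name and the convention of returning `None` when the target is unreachable are choices made here for illustration.

```python
def required_improvement(total_s, affected_s, target_speedup):
    """Factor by which the affected part must be sped up to hit the target.

    Returns None when the target cannot be reached by improving only
    the affected part.
    """
    target_time = total_s / target_speedup   # time the whole program may take
    unaffected = total_s - affected_s        # time that cannot be improved
    budget = target_time - unaffected        # time left for the affected part
    if budget <= 0:
        return None
    return affected_s / budget


if __name__ == "__main__":
    print(required_improvement(100.0, 90.0, 5.0))  # Example I
    print(required_improvement(100.0, 80.0, 5.0))  # Example II
```

For Example I the helper reports a factor of 9; for Example II it reports that the target is unreachable, because the 20 seconds of non-multiply work already equal the 20-second goal.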

1.10 Fallacies and Pitfalls

Fallacies
■ Fallacies are commonly held misconceptions that someone might encounter.
■ Cost/performance fallacies have ensnared many computer architects!
■ A common fallacy is as follows:

Designing for performance and designing for energy efficiency are unrelated goals.

Pitfalls
■ Pitfalls, or easily made mistakes, often are generalizations of principles that are only true in a limited context.
■ Cost/performance pitfalls also have ensnared many computer architects!
■ A pitfall that traps many designers is as follows:

Expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement.

Pitfalls…
■ Remedy: Amdahl’s Law.
■ Amdahl’s Law is a rule stating that the performance
enhancement possible with a given improvement is
limited by the amount that the improved feature is
used.
■ It is a quantitative version of the law of diminishing
returns.

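MIPS
■ MIPS (million instructions per second) is a measurement of program execution speed based on the number of millions of instructions. MIPS is computed as the instruction count divided by the product of the execution time and 10^6.
■ MIPS is simply:

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$

■ We see the relationship between MIPS, clock rate, and CPI:

$$\text{MIPS} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
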
Check Yourself
■ Two different compilers are being tested for a 500 MHz machine with three different classes of instructions: Class A, Class B, and Class C, which require 1, 2, and 3 cycles (respectively). Both compilers are used to produce code for a large piece of software.
  ■ Compiler 1 generates code with 5 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
  ■ Compiler 2 generates code with 10 billion Class A instructions, 1 billion Class B instructions, and 1 billion Class C instructions.
■ Which sequence will be faster according to MIPS?
■ Which sequence will be faster according to execution time?
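
The two questions can be explored with the sketch below, which uses the clock rate, cycle costs, and instruction counts from the exercise. The function and variable names are illustrative. MIPS is computed as instruction count / (execution time × 10^6).

```python
CLOCK_RATE_HZ = 500_000_000                  # 500 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class

COMPILER_1 = {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
COMPILER_2 = {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}


def execution_time_s(counts):
    """CPU time = total clock cycles / clock rate."""
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """Millions of instructions per second for the generated code."""
    return sum(counts.values()) / (execution_time_s(counts) * 1e6)


if __name__ == "__main__":
    for name, counts in (("Compiler 1", COMPILER_1), ("Compiler 2", COMPILER_2)):
        print(name, "time:", execution_time_s(counts), "s, MIPS:", mips(counts))
```

Compiler 2's code has the higher MIPS rating (about 400 versus 350), yet compiler 1's code finishes sooner (about 20 s versus 30 s), which shows how MIPS can rank programs in the opposite order from execution time.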

THANKS…
