Cse - 321 - 2

Computer Architecture
Chapter - 2
• Discusses how to measure, report, and summarize performance
• Describes the major factors that determine the performance of a computer
Why is examining performance important?
• Hardware performance is often key to the effectiveness of an
entire system.
• Computer components are updated and improved frequently and rapidly, and their prices rise as a result. Hardware and software cost an organization a lot of money, so it is very important that the IT department choose and buy the most appropriate and cost-effective computer hardware.
Why is assessing performance challenging?
• The scale and intricacy of modern software systems, together with the wide range of performance improvement techniques employed by hardware designers, have made performance assessment much more difficult.
• For different types of applications, different performance metrics may be appropriate, and different aspects of a computer system may be the most significant in determining overall performance.
Measuring Performance
• Time is the measure of computer performance.
• Program execution time is measured in seconds per program.
• Wall-clock time / response time / elapsed time / execution time – the total time to complete a task, including disk accesses, memory accesses, I/O activity, and OS overhead.
• Throughput – the total amount of work done in a given time.
Performance analysis
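$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
If the performance of X is greater than the performance of Y:
$$\text{Performance}_X > \text{Performance}_Y$$
$$\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y}$$
$$\text{Execution time}_Y > \text{Execution time}_X$$
That is, X is faster than Y.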
Continuation
• "X is n times faster than Y" means
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
Relative performance
• Example: If machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds, how much faster is A than B?
– A is n times faster than B if
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$
$$n = \frac{15}{10} = 1.5$$
– A is 1.5 times faster than B
Continuation
• We could also say that machine B is 1.5 times slower than machine A, since
$$\frac{\text{Performance}_A}{\text{Performance}_B} = n$$
means
$$\text{Performance}_B = \frac{\text{Performance}_A}{n}$$
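The 10 second versus 15 second comparison can be checked with a minimal Python sketch. The function name `relative_performance` is an illustrative choice, not something from the course material.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Machine A runs the program in 10 s, machine B in 15 s.
n = relative_performance(time_x=10.0, time_y=15.0)
print(n)  # 1.5
```

A result below 1 would mean that X is the slower machine.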
CPU execution time / CPU time
• CPU time is the time the CPU spends computing for a task; it does not include time spent waiting for I/O or running other programs.
CPU execution time (CPU time) < Response time

Continuation
CPU time = User CPU time + System CPU time
• User CPU time – the CPU time spent in the program
• System CPU time – the CPU time spent in the OS performing tasks on behalf of the program
Continuation
Execution time = Time for I/O and others + CPU time
CPU time = User CPU time + System CPU time
Continuation
• Example: output of the Unix time command
90.7u 12.9s 2:39 65%
– User CPU time: 90.7 seconds
– System CPU time: 12.9 seconds
– Elapsed time: 2:39 = 2 × 60 + 39 = 159 seconds
– Fraction of elapsed time that is CPU time: (90.7 + 12.9) / 159 ≈ 0.65
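The 65% figure in the time output can be reproduced with a short Python sketch. The helper name `cpu_fraction` is hypothetical; the numbers are the ones from the example.

```python
def cpu_fraction(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed time that is CPU time (user + system)."""
    return (user_s + system_s) / elapsed_s


elapsed = 2 * 60 + 39  # "2:39" converted to seconds
frac = cpu_fraction(user_s=90.7, system_s=12.9, elapsed_s=elapsed)
print(f"{frac:.0%}")  # 65%
```

The remaining 35% or so of the elapsed time went to I/O waits and other programs.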
Continuation
• Clock cycle – Almost all computers are
constructed using a clock that determines
when events take place. These discrete time
intervals are called clock cycles (ticks / clock
ticks / clock periods / clocks / cycles).
• Clock rate – Inverse of clock period.

Relating the Metrics
$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
A hardware designer can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program.
Improving Performance
Our favorite program runs in 10 seconds on
computer A, which has a 400 MHz clock. We are
trying to help a computer designer build a machine
B, that will run this program in 6 seconds. The
designer has determined that a substantial increase
in the clock rate is possible, but this increase will
affect the rest of the CPU design, causing machine
B to require 1.2 times as many clock cycles as
machine A for this program. What clock rate
should we tell the designer to target?
Improving Performance (Cont.)
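$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A}$$
$$10\ \text{seconds} = \frac{\text{CPU clock cycles}_A}{400 \times 10^6\ \text{cycles/sec}}$$
$$\text{CPU clock cycles}_A = 10\ \text{seconds} \times 400 \times 10^6\ \text{cycles/sec} = 4000 \times 10^6\ \text{cycles}$$
$$\text{CPU time}_B = \frac{\text{CPU clock cycles}_B}{\text{Clock rate}_B} = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B}$$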
Improving Performance (Cont.)
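$$6\ \text{seconds} = \frac{1.2 \times 4000 \times 10^6\ \text{cycles}}{\text{Clock rate}_B}$$
$$\text{Clock rate}_B = \frac{1.2 \times 4000 \times 10^6\ \text{cycles}}{6\ \text{seconds}} = 800 \times 10^6\ \text{cycles/sec} = 800\ \text{MHz}$$
Machine B must therefore have twice the clock rate of A to run the program in 6 seconds.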
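The same calculation can be written as a small Python sketch. The function names are illustrative assumptions; the values (10 s, 400 MHz, a factor of 1.2, a 6 s target) come from the problem statement.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """CPU clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate = CPU clock cycles / CPU time."""
    return cycles / target_time_s


cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=400e6)
cycles_b = 1.2 * cycles_a  # machine B needs 1.2 times as many cycles
rate_b = required_clock_rate(cycles_b, target_time_s=6.0)
print(f"{rate_b / 1e6:.0f} MHz")  # 800 MHz
```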
Hardware Software Interface
• Since the machine has to execute instructions to run the program, the execution time must depend on the number of instructions in the program.
$$\text{CPU clock cycles for a program} = \text{Instructions for a program} \times \text{Average clock cycles per instruction (CPI)}$$
Using the Performance Equation
• Suppose we have two implementations of the same instruction set architecture. Machine A has a clock cycle time of 1 ns and a CPI of 2.0 for some program, and machine B has a clock cycle time of 2 ns and a CPI of 1.2 for the same program. Which machine is faster for this program, and by how much?
Continuation
Let the number of instructions of the program be I
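$$\text{CPU clock cycles}_A = I \times 2.0$$
$$\text{CPU clock cycles}_B = I \times 1.2$$
$$\text{CPU time}_A = \text{CPU clock cycles}_A \times \text{Clock cycle time}_A = I \times 2.0 \times 1\ \text{ns} = 2I\ \text{ns}$$
$$\text{CPU time}_B = \text{CPU clock cycles}_B \times \text{Clock cycle time}_B = I \times 1.2 \times 2\ \text{ns} = 2.4I\ \text{ns}$$
$$\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{2.4I\ \text{ns}}{2I\ \text{ns}} = 1.2$$
A is 1.2 times faster than B.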
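A brief Python sketch of the comparison is given here. The instruction count cancels in the ratio, so the value used for it is an arbitrary placeholder and not part of the original problem.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


I = 1_000_000  # placeholder instruction count; any positive value gives the same ratio

time_a = cpu_time(I, cpi=2.0, cycle_time_s=1e-9)  # machine A: 1 ns cycle
time_b = cpu_time(I, cpi=1.2, cycle_time_s=2e-9)  # machine B: 2 ns cycle

ratio = time_b / time_a  # Performance_A / Performance_B
print(f"A is {ratio:.1f} times faster than B")  # A is 1.2 times faster than B
```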
Continuation
• Basic performance equation
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
Continuation
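• It is possible to compute the CPU clock cycles by looking at the different types of instructions and using their individual clock cycle counts.
• In such cases,
$$\text{CPU clock cycles} = \sum_{i} (\text{CPI}_i \times C_i)$$
where $C_i$ is the number of instructions of class $i$ executed and $\text{CPI}_i$ is the average number of cycles per instruction for that class.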
Comparing Code Segments
• Example
– The hardware designer supplied:
Instruction class   CPI for this class
A                   1
B                   2
C                   3
– Two code sequences require the following instruction counts per instruction class:
Code sequence   A   B   C
1               2   1   2
2               4   1   1
– Which code sequence executes the most instructions?
– Which will be faster?
– What is the CPI for each sequence?
Solution
• Sequence 1 executes 2 + 1 + 2 = 5
instructions.
• Sequence 2 executes 4 + 1 + 1 = 6
instructions.
• So sequence 2 executes the most instructions.
Solution
• CPU clock cycles for sequence 1 = (2×1) + (1×2) + (2×3) = 2 + 2 + 6 = 10 cycles
• CPU clock cycles for sequence 2 = (4×1) + (1×2) + (1×3) = 4 + 2 + 3 = 9 cycles
• So code sequence 2 is faster, even though it executes more instructions.

Solution
$$\text{CPI}_1 = \frac{\text{CPU clock cycles}_1}{\text{Instruction count}_1} = \frac{10}{5} = 2$$
$$\text{CPI}_2 = \frac{\text{CPU clock cycles}_2}{\text{Instruction count}_2} = \frac{9}{6} = 1.5$$
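The three answers for the code-sequence example can be reproduced with a short Python sketch. The dictionary layout and function names are illustrative choices; the CPI values and instruction counts are those in the tables.

```python
# CPI for each instruction class, as supplied by the hardware designer.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """CPU clock cycles = sum over classes of CPI_i x C_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Average CPI = CPU clock cycles / instruction count."""
    return total_cycles(counts) / sum(counts.values())


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("sequence 1", seq1), ("sequence 2", seq2)):
    print(name, sum(seq.values()), total_cycles(seq), average_cpi(seq))
# sequence 1 5 10 2.0
# sequence 2 6 9 1.5
```

Sequence 2 executes more instructions but needs fewer cycles, because more of its instructions come from the cheap class A.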
When comparing two machines, we must look at all three components, which combine to form execution time.
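Processor   Clock rate   CPI
P1          4 GHz        1.25
P2          3 GHz        0.75
Instruction count = 10^6

Show that "the highest clock rate gives the highest performance" is a fallacy.
Here,
$$\text{CPU execution time}_{P1} = \frac{\text{CPI} \times \text{Instruction count}}{\text{Clock rate}} = \frac{1.25 \times 10^6}{4 \times 10^9} = 3.125 \times 10^{-4}\ \text{s}$$
$$\text{CPU execution time}_{P2} = \frac{0.75 \times 10^6}{3 \times 10^9} = 2.5 \times 10^{-4}\ \text{s}$$
$$\frac{\text{Performance}_{P1}}{\text{Performance}_{P2}} = \frac{\text{CPU execution time}_{P2}}{\text{CPU execution time}_{P1}} = \frac{2.5 \times 10^{-4}}{3.125 \times 10^{-4}} = 0.8$$
So Performance of P1 = 0.8 × Performance of P2.
P1 has the higher clock rate but the lower performance, so the claim is false.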
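A minimal Python sketch makes the same point by ranking the two processors once by clock rate and once by CPU time. The data structure and names are assumptions made for illustration; the figures are those in the table.

```python
INSTRUCTION_COUNT = 1e6

processors = {
    "P1": {"clock_rate_hz": 4e9, "cpi": 1.25},
    "P2": {"clock_rate_hz": 3e9, "cpi": 0.75},
}


def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


times = {
    name: cpu_time(INSTRUCTION_COUNT, p["cpi"], p["clock_rate_hz"])
    for name, p in processors.items()
}

highest_clock = max(processors, key=lambda n: processors[n]["clock_rate_hz"])
fastest = min(times, key=times.get)

print(highest_clock, fastest)                # P1 P2
print(f"{times['P2'] / times['P1']:.1f}")    # 0.8
```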
Check yourself:
Processor   Clock rate   CPI
P1          2 GHz        1.5
P2          1.5 GHz      1.0
P3          3 GHz        2.5
The instruction set is the same for all three processors.
1. Which processor has the highest performance?
2. If the processors each execute a program in 10 s, find the number of cycles and the number of instructions.
3. If the execution time is to be reduced by 30% while the CPI increases by 20%, what clock rate is required?
MIPS (Millions of instructions per second)
A measurement of program execution speed based on the number of millions of instructions executed per second.
Limitations of MIPS:
Firstly, MIPS specifies the instruction execution rate but does not take into account the capabilities of the instructions.
Secondly, MIPS varies between programs on the same computer. Thus, a machine cannot have a single MIPS rating.
Finally, MIPS can vary inversely with performance.
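The second limitation can be illustrated with a hedged Python sketch. It uses the standard definition MIPS = instruction count / (execution time × 10^6), which these slides do not write out, and it reuses the two code sequences from the earlier example (5 instructions in 10 cycles, 6 instructions in 9 cycles). The 1 GHz clock rate is a hypothetical value chosen only to obtain concrete numbers.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


CLOCK_RATE_HZ = 1e9  # hypothetical clock rate, not from the slides

# Same machine, two different code sequences.
seq1_mips = mips(instruction_count=5, execution_time_s=10 / CLOCK_RATE_HZ)
seq2_mips = mips(instruction_count=6, execution_time_s=9 / CLOCK_RATE_HZ)

print(f"{seq1_mips:.0f} {seq2_mips:.0f}")  # 500 667
```

The same hardware gets two different MIPS ratings depending on the instruction mix of the program being run.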

MIPS as a Performance Measure
MFLOPS (Millions of floating-point operations per second) – Performance Metric
$$\text{MFLOPS} = \frac{\text{Number of floating-point operations in a program}}{\text{Execution time} \times 10^6}$$
Amdahl’s Law (self)
Earlier version of Amdahl’s law:
Latest version (second law) of Amdahl’s law:
$$\text{Speedup} = \frac{\text{Performance after improvement}}{\text{Performance before improvement}} = \frac{\text{Execution time before improvement}}{\text{Execution time after improvement}}$$
