0% found this document useful (0 votes)
89 views21 pages

CS104: Computer Organization: Lecture 08, 2 March 2020

The document discusses CPU performance evaluation and measuring execution time. It introduces the concept of cycles per instruction (CPI) which is the average number of clock cycles needed to complete one instruction. Execution time can be calculated using the CPU equation: Time = Instructions x CPI x Clock Cycle. This allows comparing performance of different CPUs by calculating their speedup based on the ratio of execution times. Factors that impact performance are the number of instructions, CPI, and clock cycle time.

Uploaded by

Om Prakash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
89 views21 pages

CS104: Computer Organization: Lecture 08, 2 March 2020

The document discusses CPU performance evaluation and measuring execution time. It introduces the concept of cycles per instruction (CPI) which is the average number of clock cycles needed to complete one instruction. Execution time can be calculated using the CPU equation: Time = Instructions x CPI x Clock Cycle. This allows comparing performance of different CPUs by calculating their speedup based on the ratio of execution times. Factors that impact performance are the number of instructions, CPI, and clock cycle time.

Uploaded by

Om Prakash
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

L08

02/03/2020

CS104: Computer Organization


Lecture 08, 2nd March 2020
Manojit Ghose
Computer Organization (IS F242)
Dip Sankar
Lecture Banerjee
1 : Introductory Thoughts

Dip Sankar Banerjee


[email protected]
Department
Indian Institute of CS &Technology,
of Information IS Guwahati
Jan-Apr 2020
L08
02/03/2020

CPU Performance Evaluation (CPI)


• Most computers run synchronously utilizing a CPU clock running at a
constant clock rate: Or clock frequency: f Clock cycle

where: Clock rate = 1 / clock cycle


f = 1 /C cycle 1 cycle 2 cycle 3

• The CPU clock rate depends on the specific CPU organization (design) and hardware
implementation technology (VLSI) used.
• A computer machine (ISA) instruction is comprised of a number of elementary or
micro operations which vary in number and complexity depending on the the
instruction and the exact CPU organization (Design).
– A micro operation is an elementary hardware operation that can be performed
during one CPU clock cycle.
– This corresponds to one micro-instruction in microprogrammed CPUs.
– Examples: register operations: shift, load, clear, increment, ALU operations:
add , subtract, etc.
• Thus: A single machine instruction may take one or more CPU cycles to complete
termed as the Cycles Per Instruction (CPI). Instructions Per Cycle = IPC = 1/CPI

• Average (or effective) CPI of a program: The average CPI of all instructions executed
in the program on a given CPU design.
Cycles/sec = Hertz = Hz
MHz = 106 Hz GHz = 109 Hz
L08
02/03/2020
Generic CPU Machine Instruction Processing Steps
Obtain instruction from program memory
Instruction
The Program Counter (PC) points to the instruction to be processed
Fetch

Instruction Determine required actions and instruction size


Decode

Operand Locate and obtain operand data


From data memory or registers
Fetch
Compute result value or status
Execute

Deposit results in storage (data memory or


Result
register) for later use
Store

Next Determine successor or next instruction


(i.e Update PC to fetch next instruction to be processed)
Instruction
CPI = Cycles per instruction
L08
02/03/2020
Computer Performance Measures:
Program Execution Time
• For a specific program compiled to run on a specific machine (CPU)
“A”, has the following parameters:
– The total executed instruction count of the program. I
– The average number of cycles per instruction (average CPI). CPI
– Clock cycle of machine “A” C/CC Or effective CPI

• How can one measure the performance of this machine (CPU) running this
program?
– Intuitively the machine (or CPU) is said to be faster or has better
performance running this program if the total execution time is
shorter.
– Thus the inverse of the total measured program execution time is a
possible performance measure or metric:
Seconds/program
Programs/second PerformanceA = 1 / Execution TimeA
How to compare performance of different machines?
What factors affect performance? How to improve performance?
L08
02/03/2020
Comparing Computer Performance Using
Execution Time
• To compare the performance of two machines (or CPUs) “A”, “B” running a
given specific program: The two CPUs may target
PerformanceA = 1 / Execution TimeA different ISAs provided
the program is written in a high
PerformanceB = 1 / Execution TimeB
level language (HLL)
• Machine A is n times faster than machine B means (or slower? if n < 1) :
PerformanceA Execution TimeB
Speedup = n = =
PerformanceB Execution TimeA
• Example: (i.e Speedup is ratio of performance, no units)
For a given program:
Execution time on machine A: ExecutionA = 1 second
Execution time on machine B: ExecutionB = 10 seconds
Speedup= PerformanceA / PerformanceB = Execution TimeB / Execution TimeA

= 10 / 1 = 10
The performance of machine A is 10 times the performance of
machine B when running this program, or: Machine A is said to be 10
times faster than machine B when running this program.
L08
02/03/2020
CPU Execution Time: The CPU Equation
• A program is comprised of a number of instructions executed , I
– Measured in: instructions/program AKA Dynamic Executed Instruction Count
• The average instruction executed takes a number of cycles per
instruction (CPI) to be completed.
Or Instructions Per Cycle (IPC):
– Measured in: cycles/instruction, CPI IPC = 1/CPI
• CPU has a fixed clock cycle time C = 1/clock rate C = 1/f
– Measured in: seconds/cycle

• CPU execution time is the product of the above three


parameters as follows: Executed

CPU time = Seconds = Instructions x Cycles x Seconds


Program Program Instruction Cycle

T = I x CPI x C
execution Time Number of Average CPI for program CPU Clock Cycle
per program in seconds instructions executed

(This equation is commonly known as the CPU performance equation)


L08
02/03/2020

CPU Average CPI/Execution Time


For a given program executed on a given machine (CPU):
CPI = Total program execution cycles / Instructions count
(i.e average or effective CPI) Executed (I)

 CPU clock cycles = Instruction count x CPI


(executed, I)

CPU execution time =

= CPU clock cycles x Clock cycle


= Instruction count x CPI x Clock cycle
T = I x CPI x C
Average
execution Time Number of or effective CPU Clock Cycle
per program in seconds instructions executed CPI for
program

(This equation is commonly known as the CPU performance equation)

CPI = Cycles Per Instruction


L08
02/03/2020
CPU Execution Time: Example
• A Program is running on a specific machine (CPU) with the
following parameters: I

– Total executed instruction count: 10,000,000


instructions
– Average CPI for the program: 2.5 cycles/instruction.
i.e 5 nanoseconds

– CPU clock rate: 200 MHz. (clock cycle = C = 5x10-9


seconds)
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
• What is the execution time for this program:
CPU time = Instruction count x CPI x Clock cycle
= 10,000,000 x 2.5 x 1 / clock rate
= 10,000,000 x 2.5 x 5x10-9
= 0.125 seconds
Nanosecond = nsec =ns = 10-9 second
MHz = 106 Hz T = I x CPI x C
L08
02/03/2020

Factors Affecting CPU Performance


CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
Average
T = I x CPI x C
Instruction Cycles per Clock Rate
Count Instruction (1/C)
Program

Compiler

Instruction Set
Architecture (ISA)

Organization
(CPU Design)

Technology
(VLSI)

T = I x CPI x C
L08
02/03/2020

Aspects of CPU Execution Time


CPU Time = Instruction count executed x CPI x Clock cycle

T = I x CPI x C Depends on:


Program Used
Compiler
ISA

Instruction Count I
(executed)

Depends on:
Program Used Depends on:
Compiler CPI Clock CPU Organization
ISA (Average Cycle Technology (VLSI)
CPU Organization
CPI) C
L08
02/03/2020
Performance Comparison: Example
• From the previous example: A Program is running on a specific machine
(CPU) with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz. Thus: C = 1/(200x106)= 5x10-9 seconds

• Using the same program with these changes:


– A new compiler used: New executed instruction count, I: 9,500,000
New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHz
Thus: C = 1/(300x106)= 3.33x10-9 seconds
• What is the speedup with the changes?
Speedup = Old Execution Time = Iold x CPIold x Clock cycleold
New Execution Time Inew x CPInew x Clock Cyclenew

Speedup = (10,000,000 x 2.5 x 5x10-9) / (9,500,000 x 3 x 3.33x10-9 )


= .125 / .095 = 1.32
or 32 % faster after changes.
Clock Cycle = C = 1/ Clock Rate T = I x CPI x C
L08
02/03/2020
Instruction Types & CPI
• Given a program with n types or classes of instructions
executed on a given CPU with the following characteristics:
e.g ALU, Branch etc.
Ci = Count of instructions of typei executed
i = 1, 2, …. n
CPIi = Cycles per instruction for typei
Then: Depends on CPU Design

CPI = CPU Clock Cycles / Instruction Count I


i.e average or effective CPI

Where:
Executed

 CPI 
n
CPU clock cycles  i
 Ci
i 1
Executed Instruction Count I = S Ci

T = I x CPI x C
L08
02/03/2020
Instruction Types & CPI: An Example
• An instruction set has three instruction classes:
Instruction class CPI
A 1 For a specific
e.g ALU, Branch etc. B 2 CPU design
C 3
• Two code sequences have the following instruction counts:
Program Instruction counts for instruction class
Code Sequence A B C
1 2 1 2
2 4 1 1
• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
CPI for sequence 1 = clock cycles / instruction count
i.e average or effective CPI = 10 /5 = 2
• CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
CPI for sequence 2 = 9 / 6 = 1.5

CPI  C 
n
CPU clock cycles   i i
CPI = CPU Cycles / I
i 1
L08
02/03/2020
Instruction Frequency & CPI
• Given a program with n types or classes of instructions
with the following characteristics:
i = 1, 2, …. n
Ci = Count of instructions of typei executed
CPIi = Average cycles per instruction of typei
Fi = Frequency or fraction of instruction typei executed
= Ci/ total executed instruction count = Ci/ I
Where: Executed Instruction Count I = S Ci
Then:

CPI   CPI i  F i 
n

i 1
i.e average or effective CPI

Fraction of total execution time for instructions of type i = CPIi x Fi


CPI

T = I x CPI x C
L08
02/03/2020

Instruction Type Frequency & CPI


Program Profile or Executed Instructions Mix CPIi x Fi
CPI
Base Machine (Reg / Reg) Depends on CPU Design

Op Freq, Fi CPIi CPIi x Fi % Time


ALU 50% 1 .5 23% = .5/2.2
Given
Load 20% 5 1.0 45% = 1/2.2
Store 10% 3 .3 14% = .3/2.2
Branch 20% 2 .4 18% = .4/2.2

Sum = 2.2
Typical Mix

CPI   CPI i  F i 
n

i.e average or effective CPI i 1

CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2 = 2.2
= .5 + 1 + .3 + .4
T = I x CPI x C
L08
02/03/2020

Metrics of Computer Performance


(Measures)

Application Execution time: Target workload,


SPEC, etc.
Programming
Language

Compiler
(millions) of Instructions per second – MIPS
(millions) of (F.P.) operations per second – MFLOP/s
ISA

Datapath
Control Megabytes per second.

Function Units
Transistors Wires Pins Cycles per second (clock rate).

Each metric has a purpose, and each can be misused.


L08
02/03/2020
Choosing Programs To Evaluate Performance
Levels of programs or benchmarks that could be used to evaluate
performance:
– Actual Target Workload: Full applications that run on the target
machine.
– Real Full Program-based Benchmarks:
• Select a specific mix or suite of programs that are typical of targeted
applications or workload (e.g SPEC95, SPEC CPU2000).
– Small “Kernel” Benchmarks: Also called synthetic benchmarks
• Key computationally-intensive pieces extracted from real programs.
– Examples: Matrix factorization, FFT, tree search, etc.
• Best used to test specific aspects of the machine.

– Microbenchmarks:
• Small, specially written programs to isolate a specific aspect of
performance characteristics: Processing: integer, floating point, local
memory, input/output, etc.
L08
02/03/2020
Types of Benchmarks
Pros Cons
• Very specific.
• Representative Actual Target Workload • Non-portable.
• Complex: Difficult
to run, or measure.

• Portable.
• Widely used. • Less representative
Full Application Benchmarks
• Measurements than actual workload.
useful in reality.

Small “Kernel” • Easy to “fool” by


• Easy to run, early in designing hardware
the design cycle. Benchmarks
to run them well.

• Peak performance
• Identify peak results may be a long
performance and Microbenchmarks
way from real application
potential bottlenecks. performance
L08
02/03/2020
SPEC: System Performance Evaluation Corporation
The most popular and industry-standard set of CPU benchmarks.
• SPECmarks, 1989: Programs application domain: Engineering and scientific computation

– 10 programs yielding a single number (“SPECmarks”).


• SPEC92, 1992:
– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs).
• SPEC95, 1995:
– SPECint95 (8 integer programs):
• go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
– SPECfp95 (10 floating-point intensive programs):
• tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fppp, wave5
– Performance relative to a Sun SuperSpark I (50 MHz) which is given a score of SPECint95 =
SPECfp95 = 1
• SPEC CPU2000, 1999:
– CINT2000 (11 integer programs). CFP2000 (14 floating-point intensive programs)
– Performance relative to a Sun Ultra5_10 (300 MHz) which is given a score of SPECint2000 =
SPECfp2000 = 100

• SPEC CPU2006, 2006:


– CINT2006 (12 integer programs). CFP2006 (17 floating-point intensive programs)
– Performance relative to a Sun Ultra Enterprise 2 workstation with a 296-MHz UltraSPARC II
processor which is given a score of SPECint2006 = SPECfp2006 = 1
All based on execution time and give speedup over a reference CPU
L08
02/03/2020

Example Problem
• A 1 GHz processor takes 100 seconds to execute a program,
while consuming 70 W of dynamic power and 30 W of
leakage power. Does the program consume less energy
in Turbo boost mode when the frequency is increased to
1.2 GHz?

20
L08
02/03/2020

Example Problem
• A 1 GHz processor takes 100 seconds to execute a program,
while consuming 70 W of dynamic power and 30 W of
leakage power. Does the program consume less energy
in Turbo boost mode when the frequency is increased to
1.2 GHz?

Normal mode energy = 100 W x 100 s = 10,000 J


Turbo mode energy = (70 x 1.2 + 30) x 100/1.2 = 9,500 J

Note:
Frequency only impacts dynamic power, not leakage power.
We assume that the program’s CPI is unchanged when
frequency is changed, i.e., exec time varies linearly
with cycle time. 21

You might also like