Processor's Performance
Parth Shah
parthshah.ce@charusat.ac.in
Outline
Introduction
Defining Performance
The Iron Law of Processor Performance
Processor performance enhancement
Performance Evaluation Approaches
Performance Reporting
Amdahl’s Law
Elapsed Time
counts everything (disk and memory accesses, waiting for I/O,
running other programs, etc.) from start to finish
a useful number, but often not good for comparison purposes
elapsed time = CPU time + wait time (I/O, other programs, etc.)
CPU time
doesn't count waiting for I/O or time spent running other programs
can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
elapsed time = user CPU time + system CPU time + wait time
Our focus: user CPU time
(CPU execution time or, simply, execution time): time spent
executing the lines of code that are in our program
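The split of elapsed time into user CPU time, system CPU time, and wait time can be observed directly. The following is a minimal sketch, assuming a standard Python 3 interpreter; `measure` and `workload` are hypothetical names introduced only for this illustration, and the wait time is simply estimated as whatever part of the elapsed time is not CPU time.

```python
import os
import time


def measure(fn):
    """Run fn() once and split its elapsed time into user, system, and wait time."""
    cpu_before = os.times()          # cumulative user/system CPU time of this process
    wall_before = time.perf_counter()
    fn()
    wall_after = time.perf_counter()
    cpu_after = os.times()

    user = cpu_after.user - cpu_before.user
    system = cpu_after.system - cpu_before.system
    elapsed = wall_after - wall_before
    # Assumption: everything that is not CPU time counts as wait time
    # (I/O, sleeping, time given to other programs).
    wait = max(0.0, elapsed - (user + system))
    return {"elapsed": elapsed, "user": user, "system": system, "wait": wait}


def workload():
    sum(i * i for i in range(200_000))  # CPU-bound part: shows up as user CPU time
    time.sleep(0.05)                    # waiting: adds to elapsed time only


if __name__ == "__main__":
    for name, seconds in measure(workload).items():
        print(f"{name:8s} {seconds:.4f} s")
```

The sleep adds to elapsed time but not to user CPU time, which is why elapsed time is often a poor basis for comparing processors.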
Speedup = PerformanceX / PerformanceY OR Speedup = Execution timeY / Execution timeX
CPU execution time: Also called CPU time. The actual time the
CPU spends computing for a specific task.
User CPU time: The CPU time spent in a program itself.
System CPU time: The CPU time spent in the operating system
performing tasks on behalf of the program.
Clock cycle: A single period of an oscillating clock signal.
Clock speed, rate, and frequency all describe the same
thing: the number of clock cycles per second, measured in Hertz
(Hz) (e.g., 4 gigahertz, or 4 GHz)
Clock period: The time required to complete a single clock cycle
(e.g., 250 picoseconds). Clock period = 1 / clock rate.
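The two examples above describe the same clock: a 4 GHz clock rate corresponds to a 250 ps clock period, because the period is the reciprocal of the rate. A minimal sketch of the conversion (the function name is hypothetical):

```python
def clock_period_seconds(clock_rate_hz):
    """Clock period is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


period = clock_period_seconds(4e9)   # 4 GHz
print(f"{period * 1e12:.0f} ps")     # prints "250 ps"
```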
[Figure: contributors to processor performance. Labels: Compiler Designer, Processor Designer, Chip Designer; Algorithm, Instruction Set, Circuit Designer]
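For reference, the Iron Law of Processor Performance named in the outline is usually written as a product of three terms. The form below is the standard textbook statement, not text recovered from the slides. Associating the compiler, processor, and chip designer labels with the three terms in this order is an assumption about the original figure.

```latex
\[
\text{CPU time}
  = \frac{\text{Instructions}}{\text{Program}}
    \times \frac{\text{Cycles}}{\text{Instruction}}
    \times \frac{\text{Time}}{\text{Cycle}}
  = \text{Instruction count} \times \text{CPI} \times \text{Clock period}
\]
```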
Two compilers are being tested for a 100 MHz machine with 3 classes
of instructions:
A (1 cycle), B (2 cycles), and C (3 cycles)
Compiler 1: 5 A, 1 B, 1 C instructions
Compiler 2: 10 A, 1 B, 1 C instructions
Which sequence has higher MIPS?
Which sequence has lower execution time?
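The two questions can be answered by counting cycles for each sequence. The sketch below uses the instruction counts exactly as given above and takes MIPS to be instruction count / (execution time × 10^6). The names `evaluate`, `compiler_1`, and `compiler_2` are introduced only for this illustration.

```python
CLOCK_RATE_HZ = 100e6                         # 100 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}   # cycles per instruction class


def evaluate(counts):
    """Return (execution time in seconds, MIPS) for a mix of instruction counts."""
    instructions = sum(counts.values())
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / exec_time / 1e6
    return exec_time, mips


compiler_1 = {"A": 5, "B": 1, "C": 1}    # 7 instructions, 10 cycles
compiler_2 = {"A": 10, "B": 1, "C": 1}   # 12 instructions, 15 cycles

for name, counts in (("Compiler 1", compiler_1), ("Compiler 2", compiler_2)):
    exec_time, mips = evaluate(counts)
    print(f"{name}: {exec_time * 1e6:.2f} microseconds, {mips:.0f} MIPS")
```

Under these assumptions Compiler 2 has the higher MIPS rating (80 versus 70), yet Compiler 1 has the lower execution time (0.10 versus 0.15 microseconds). MIPS alone can therefore be misleading.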
50 billion instructions
10 billion are Branches (CPI = 4)
15 billion are Loads (CPI = 2)
5 billion are Stores (CPI = 3)
The rest are Integer ADD (CPI = 1)
Clock rate = 4 GHz
Execution time = (10×4 + 15×2 + 5×3 + 20×1) billion cycles / 4 GHz = 105 / 4 = 26.25 seconds
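The execution time in this example can be checked by summing cycles per instruction class and dividing by the clock rate. The sketch below takes the CPI of integer ADD as 1, because that is the value that reproduces the stated 26.25 s. With CPI = 4 for integer ADD the total would be 41.25 s. The function name is hypothetical.

```python
CLOCK_RATE_HZ = 4e9            # 4 GHz
TOTAL_INSTRUCTIONS = 50e9      # 50 billion instructions


def execution_time(mix, clock_rate_hz):
    """mix: list of (instruction count, CPI). Returns (seconds, average CPI)."""
    cycles = sum(count * cpi for count, cpi in mix)
    instructions = sum(count for count, _ in mix)
    return cycles / clock_rate_hz, cycles / instructions


mix = [
    (10e9, 4),   # branches
    (15e9, 2),   # loads
    (5e9, 3),    # stores
]
rest = TOTAL_INSTRUCTIONS - sum(count for count, _ in mix)   # 20 billion
mix.append((rest, 1))   # integer ADD; CPI = 1 is an assumption (see lead-in)

seconds, average_cpi = execution_time(mix, CLOCK_RATE_HZ)
print(f"execution time = {seconds:.2f} s, average CPI = {average_cpi:.2f}")
```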
PerformanceX = 1 / Execution timeX
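Performance is conventionally taken as the reciprocal of execution time, so a ratio of performances equals the inverse ratio of execution times. A minimal sketch follows. The machines X and Y and their times (10 s and 15 s) are made-up values for illustration only.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def speedup(time_y_s, time_x_s):
    """How many times faster X is than Y, from their execution times."""
    return time_y_s / time_x_s


time_x, time_y = 10.0, 15.0   # hypothetical execution times in seconds
ratio_from_performance = performance(time_x) / performance(time_y)
ratio_from_times = speedup(time_y, time_x)
print(f"X is {ratio_from_times:.1f} times faster than Y")
```

Both ratios describe the same quantity, which is why speedup can be stated with either performance or execution time.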
Actual users’ workload
Many programs
Not representative of other users
How do we get workload data?
4. Toy benchmarks
10 to 100 lines of simple programs
Easy to type and run on almost all computers
Example: Quick sort, Merge sort, etc.
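A toy benchmark in this sense might look like the sketch below: a short quick sort timed with CPU time rather than elapsed time. The input size, the seed, and the function names are arbitrary choices made for this illustration.

```python
import random
import time


def quick_sort(items):
    """Simple (not in-place) quick sort."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)


def run_benchmark(n=10_000, seed=0):
    """Sort n pseudo-random integers and return the CPU seconds used."""
    rng = random.Random(seed)              # fixed seed keeps runs comparable
    data = [rng.randint(0, 1_000_000) for _ in range(n)]
    start = time.process_time()            # CPU time of this process only
    result = quick_sort(data)
    cpu_seconds = time.process_time() - start
    assert result == sorted(data)          # sanity check on the sorted output
    return cpu_seconds


if __name__ == "__main__":
    print(f"quick sort of 10,000 integers: {run_benchmark():.4f} CPU seconds")
```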
5. Synthetic Benchmarks
Basic Principle:
• Analyze the distribution of instructions over a large number of practical
programs.
• Synthesize a program that has the same instruction distribution as a
typical program:
  • It need not compute anything meaningful.
Dhrystone, Khornerstone, and Linpack are some of the older synthetic
benchmarks.
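The basic principle can be illustrated at the level of abstract operations. The sketch below is not how Dhrystone or any real benchmark is built. `TARGET_MIX` is a made-up distribution, and the three operation kinds only stand in for instruction classes.

```python
import random
from collections import Counter

# Hypothetical operation mix. A real synthetic benchmark would derive it
# from measurements over many practical programs.
TARGET_MIX = {"add": 0.5, "load": 0.3, "branch": 0.2}


def synthesize(mix, length, seed=0):
    """Build a shuffled operation sequence whose distribution matches mix."""
    ops = []
    for op, fraction in mix.items():
        ops.extend([op] * round(fraction * length))
    random.Random(seed).shuffle(ops)
    return ops


def run(ops):
    """Execute the sequence; the result is meaningless by design."""
    memory = list(range(16))
    acc = 0
    for i, op in enumerate(ops):
        if op == "add":
            acc += i
        elif op == "load":
            acc ^= memory[i % len(memory)]
        elif op == "branch":
            if acc & 1:
                acc += 1
    return acc


def measured_mix(ops):
    """Fraction of each operation kind in the sequence."""
    counts = Counter(ops)
    return {op: counts[op] / len(ops) for op in counts}


if __name__ == "__main__":
    program = synthesize(TARGET_MIX, 1000)
    run(program)
    print(measured_mix(program))
```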