Lecture 1-2
Instructor: L. N. Bhuyan
9/23/2004
Lec 1-2
Instructor Information
Laxmi Narayan Bhuyan
Office: Engg. II, Room 441
E-mail: [email protected]
Tel: (909) 787-2347
Office hours: W, Th 2-3 pm
Course Syllabus
Instruction-level parallelism, dynamic scheduling, branch prediction and speculation: Ch. 3 of the text
ILP with software approaches: Ch. 4
Memory hierarchy: Ch. 5
VLIW, multithreading, CMP and network processor architectures: from papers
Text: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers
Course Details
Grading: based on a curve
Test 1: 30 points
Test 2: 40 points
Project: 30 points
software
instruction set
hardware
functional appearance to its immediate user/system programmer Opcodes, addressing modes, architected registers, IEEE floating point
Realization
chip/system designer view physical structure that embodies the implementation Gates, cells, transistors, wires
(chip)
Hardware
Machine specifics:
Feature size (10 microns in 1971 to 0.18 microns in 2001)
Minimum size of a transistor or a wire in either the x or y dimension
Processors with an identical ISA and nearly identical organization can still differ in their hardware details.
e.g., the Pentium II and Celeron are nearly identical but differ in clock rates and memory systems.
Classes of Computers
High performance (supercomputers)
Balanced cost/performance
Supercomputers: Cray T-90
Massively parallel computers: Cray T3E
Workstations: SPARCstations
Servers: SGI Origin, UltraSPARC
High-end PCs: Pentium quads
Low-end PCs, laptops, PDAs: mobile Pentiums
Low cost/power
Designs change even if requirements are fixed. But the requirements are not fixed.
Measuring Performance
Latency (response time, execution time)
Minimize time to wait for a computation
Performance Terminology
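"X is n times faster than Y" means:
$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$
"X is m% faster than Y" means:
$\frac{\text{Execution time}_Y - \text{Execution time}_X}{\text{Execution time}_X} \times 100\% = m$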
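The two comparisons can be checked numerically. This is a minimal sketch, assuming that "m% faster" is measured relative to X's execution time; the function names and the sample timings are hypothetical and chosen only for illustration.

```python
def times_faster(time_y: float, time_x: float) -> float:
    """Return n such that X is n times faster than Y (n = time_Y / time_X)."""
    return time_y / time_x


def percent_faster(time_y: float, time_x: float) -> float:
    """Return m such that X is m% faster than Y (assumed relative to X's time)."""
    return (time_y - time_x) / time_x * 100.0


# Hypothetical measurements: Y takes 15 s and X takes 10 s on the same program.
n = times_faster(15.0, 10.0)
m = percent_faster(15.0, 10.0)
print(f"X is {n} times faster than Y, i.e. {m}% faster")
```

Both functions take execution times, not rates, so a smaller time for X yields n > 1 and m > 0.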
Speedup(E) = $\frac{\text{ExTime}_{\text{before}}}{\text{ExTime}_{\text{after}}}$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. What are the execution time after the enhancement and Speedup(E)?
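Amdahl's Law
$\text{ExTime}_{\text{after}} = \text{ExTime}_{\text{before}} \times \left[(1 - F) + \frac{F}{S}\right]$
$\text{Speedup}(E) = \frac{\text{ExTime}_{\text{before}}}{\text{ExTime}_{\text{after}}} = \frac{1}{(1 - F) + \frac{F}{S}}$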
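Amdahl's Law is easy to explore numerically. The sketch below is one possible helper, not part of the lecture material; the name `amdahl_speedup` and the sample values of F and S are assumptions made for illustration.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the task is accelerated by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    # Speedup(E) = 1 / [(1 - F) + F / S]
    return 1.0 / ((1.0 - f) + f / s)


# Hypothetical case: 40% of the task is made 10 times faster.
print(amdahl_speedup(0.4, 10.0))

# Even a huge S cannot beat the bound 1 / (1 - F) set by the unaffected part.
print(amdahl_speedup(0.4, 1e12), 1.0 / (1.0 - 0.4))
```

The second call illustrates why the unaffected fraction (1 - F) limits the achievable speedup regardless of how large S becomes.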
$\text{Speedup} = \frac{\text{ExTime}_{\text{before}}}{\text{ExTime}_{\text{after}}} = \frac{1}{0.95} \approx 1.053$
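The inputs that lead to the factor 0.95 are not shown here. As one illustration (an assumption, not necessarily the values used in the lecture), an enhancement with $F = 0.1$ and $S = 2$ produces exactly that factor:

```latex
\begin{align*}
\text{ExTime}_{\text{after}} &= \text{ExTime}_{\text{before}} \times \left[(1 - 0.1) + \frac{0.1}{2}\right] = 0.95 \times \text{ExTime}_{\text{before}} \\
\text{Speedup} &= \frac{\text{ExTime}_{\text{before}}}{\text{ExTime}_{\text{after}}} = \frac{1}{0.95} \approx 1.053
\end{align*}
```

Doubling the speed of a tenth of the task buys only about a 5% overall improvement under these assumed values.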
CPU Performance
The Fundamental Law
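$\text{CPU time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$
Three components of CPU performance:
Instruction count
CPI
Clock cycle time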
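The three components multiply directly into CPU time. A minimal sketch follows; the function name `cpu_time` and the program parameters are hypothetical values picked only to show the arithmetic.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = instructions x (cycles / instruction) x (seconds / cycle)."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 2 billion instructions, CPI of 1.5, 2 ns cycle (500 MHz clock).
t = cpu_time(2e9, 1.5, 2e-9)
print(f"CPU time: {t:.2f} s")

# Halving any one factor halves the CPU time, provided the other two stay fixed.
print(f"With CPI halved: {cpu_time(2e9, 0.75, 2e-9):.2f} s")
```

In practice the three factors are not independent: a change that lowers one of them often raises another.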
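$\text{CPI} = \frac{\text{Total cycles}}{\text{Total instruction count}} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$, where $F_i = \frac{\text{IC}_i}{\text{Instruction count}}$
$\text{CPU time} = \text{Cycle time} \times \sum_{i=1}^{n} (\text{CPI}_i \times \text{IC}_i)$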
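The per-class form of CPU time agrees with the three-factor form of the fundamental law. The short derivation below uses $\text{IC}$ for the total instruction count, a notational assumption introduced here:

```latex
\begin{align*}
\text{CPU time} &= \text{Cycle time} \times \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i \\
&= \text{Cycle time} \times \text{IC} \times \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}} \\
&= \text{Cycle time} \times \text{IC} \times \sum_{i=1}^{n} \text{CPI}_i \times F_i \\
&= \text{IC} \times \text{CPI} \times \text{Cycle time}
\end{align*}
```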
Example:
Inst.    Freq.  C.C.
Store    12%    2
Branch   24%    2
Example
Instruction mix of a RISC architecture.
Inst.    Freq.  C.C.
ALU      50%    1
Load     20%    2
Store    10%    2
Branch   20%    2
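The average CPI of this mix is the frequency-weighted sum of the per-class cycle counts. A small sketch using exact fractions; the dictionary layout and the name `weighted_cpi` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Instruction mix from the example: class -> (frequency, clock cycles).
mix = {
    "ALU": (Fraction(50, 100), 1),
    "Load": (Fraction(20, 100), 2),
    "Store": (Fraction(10, 100), 2),
    "Branch": (Fraction(20, 100), 2),
}


def weighted_cpi(instruction_mix):
    """Average CPI = sum over instruction classes of CPI_i * F_i."""
    return sum(freq * cycles for freq, cycles in instruction_mix.values())


cpi = weighted_cpi(mix)
print(float(cpi))  # 1.5
```

Exact fractions avoid floating-point rounding, so the result can be compared directly against the hand calculation.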
Solution
Instr.     Fi    CPIi   CPIixFi    |  Ii      CPIi   CPIixIi
ALU        .5    1      .5         |  .5-X    1      .5-X
Load       .2    2      .4         |  .2-X    2      .4-2X
Store      .1    2      .2         |  .1      2      .2
Branch     .2    2      .4         |  .2      3      .6
Reg/Mem                            |  X       2      2X
Total      1.0          CPI=1.5    |  1-X            (1.7-X)/(1-X)
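The solution values can be turned into a break-even check. The sketch below assumes the clock cycle time is unchanged, so CPU time is proportional to instruction count times CPI: 1.0 x 1.5 for the original mix and (1-X) x (1.7-X)/(1-X) = 1.7-X for the modified one. The function names are hypothetical.

```python
from fractions import Fraction

OLD_CYCLES = Fraction(3, 2)  # original mix: instruction count 1.0 x CPI 1.5


def new_cycles(x: Fraction) -> Fraction:
    """Total cycles of the modified mix: sum of CPI_i * I_i from the solution columns."""
    alu = (Fraction(1, 2) - x) * 1
    load = (Fraction(1, 5) - x) * 2
    store = Fraction(1, 10) * 2
    branch = Fraction(1, 5) * 3
    reg_mem = x * 2
    return alu + load + store + branch + reg_mem  # simplifies to 1.7 - X


def new_cpi(x: Fraction) -> Fraction:
    """CPI of the modified mix: total cycles divided by the new instruction count 1 - X."""
    return new_cycles(x) / (1 - x)


# Break-even (assuming equal cycle time): 1.7 - X = 1.5
break_even = Fraction(17, 10) - OLD_CYCLES
print(break_even)                  # 1/5
print(float(new_cpi(break_even)))  # 1.875
```

Under these assumptions the break-even X equals the entire load fraction (.2), so the change improves CPU time only if X exceeds that value, even though the CPI itself rises.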
Cache
Memory
Disk/Tape
Benchmarks
program as unit of work
There are millions of programs.
Not all are the same; most are very different.
Which ones to use?
Benchmarks
Standard programs for measuring or comparing performance
Representative of programs people care about
Repeatable
Synthetic benchmarks
Kernels
Real programs
e.g., gcc, spice, SPEC89, SPEC92, SPEC95, SPEC2000 (Standard Performance Evaluation Corporation), TPC-C, TPC-D
Less compiler dependent than MIPS.
Not all FP ops are implemented in h/w on all machines.
Not all FP ops have the same latencies.
Normalized MFLOPS: uses an equivalence table to even out the various latencies of FP ops.
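The effect of an equivalence table can be sketched in a few lines. MFLOPS is taken here as millions of floating-point operations per second; the weights in `WEIGHTS`, the operation counts and the run time are hypothetical values chosen for illustration, not the table used by any particular benchmark.

```python
# Hypothetical equivalence table: each real FP operation counts as this many
# normalized operations. These weights are illustrative assumptions only.
WEIGHTS = {"add": 1, "mul": 1, "div": 4, "sqrt": 4}


def mflops(op_counts: dict, exec_time_s: float) -> float:
    """Native MFLOPS: raw FP operation count / (execution time x 10^6)."""
    return sum(op_counts.values()) / (exec_time_s * 1e6)


def normalized_mflops(op_counts: dict, exec_time_s: float, weights: dict = WEIGHTS) -> float:
    """Normalized MFLOPS: weight each operation type before dividing by time."""
    total = sum(weights[op] * count for op, count in op_counts.items())
    return total / (exec_time_s * 1e6)


# Hypothetical run: 10 million FP operations executed in 2 seconds.
counts = {"add": 6_000_000, "mul": 3_000_000, "div": 1_000_000}
native = mflops(counts, 2.0)
normalized = normalized_mflops(counts, 2.0)
print(native, normalized)
```

With these assumed weights a program heavy in slow operations such as divides gets a higher normalized rating than its raw operation count suggests.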
Performance Contd.
SPEC CINT2000, SPEC CFP2000, and TPC-C figures are plotted in Figs. 1.19, 1.20 and 1.22 for various machines.
EEMBC performance of 5 different embedded processors (Table 1.24) is plotted in Fig. 1.25; performance/watt is plotted in Fig. 1.27.
Fig. 1.30 lists the programs and changes in the SPEC89, SPEC92, SPEC95 and SPEC2000 benchmarks.