Lecture 1: Introduction
Daniel A. Jiménez
The University of Texas at San Antonio
http://www.cs.utsa.edu/~dj
http://www.cs.utsa.edu/~dj/cs5513
Outline
[Figure: processor performance growth, 1978 to 2006, on a log scale from 1 to 1000, with trend lines at 25%/year and 52%/year]
• VAX : 25%/year 1978 to 1986
• RISC + x86: 52%/year 1986 to 2002
• RISC + x86: 18%/year 2002 to 2008
Sea Change in Chip Design
• Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz,
10 micron PMOS, 11 mm² chip
[Figure: the instruction set layered above the hardware]
[Figure: design cycle in which creativity produces ideas and cost/performance analysis sorts them into good, mediocre, and bad ideas]
CS 5513 Administrivia
Instructor: Prof. Daniel A. Jiménez
Office: SB 4.01.58
Office Hours: By appointment
TA: Xinran Yu
Office: Office Hours in Main Lab
Office Hours: Monday, 3:00pm to 4:30pm
Class: Tuesday/Thursday 5:30pm to 6:45pm, HSS 3.04.28
Text: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th Edition
Web page: http://www.cs.utsa.edu/~dj/cs5513
[Figure: pipelined execution; instructions in program order each pass through the Ifetch, Reg, ALU, DMem, and Reg stages, overlapped in time]
Limits to pipelining
[Figure: pipeline diagram; instructions in program order each pass through the Ifetch, Reg, ALU, DMem, and Reg stages]
2) The Principle of Locality
• The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
– Temporal Locality (Locality in Time): If an item is referenced, it will tend to be
referenced again soon (e.g., loops, reuse)
– Spatial Locality (Locality in Space): If an item is referenced, items whose addresses
are close by tend to be referenced soon
(e.g., straight-line code, array access)
• For the last 30 years, hardware has relied on locality for memory performance
[Figure: processor (P), cache ($), and memory (MEM)]
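The effect of the two kinds of locality can be sketched with a toy cache model. The Python below is a minimal illustration, not anything from the lecture: the direct-mapped organization, the 64-line and 64-byte-block sizes, the 4-byte element size, and the name count_hits are all assumptions chosen for the example.

```python
def count_hits(addresses, num_lines=64, block_size=64):
    """Count hits in a toy direct-mapped cache (sizes are illustrative assumptions)."""
    tags = [None] * num_lines           # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_size      # memory block that contains this byte
        index = block % num_lines       # cache line that block maps to
        if tags[index] == block:
            hits += 1                   # block already cached
        else:
            tags[index] = block         # miss: bring the block in
    return hits


N = 4096
# Spatial locality: adjacent 4-byte elements, as in a sequential array scan.
sequential = [4 * i for i in range(N)]
# Poor locality: each access lands 4096 bytes after the previous one.
strided = [4096 * i for i in range(N)]
# Temporal locality: the same 16 elements are reused over and over.
repeated = [4 * (i % 16) for i in range(N)]

seq_hits = count_hits(sequential)
strided_hits = count_hits(strided)
rep_hits = count_hits(repeated)
print(seq_hits, strided_hits, rep_hits)
```

In this model the sequential scan misses only on the first touch of each block, the repeated accesses miss once, and the large-stride pattern never hits, which is the behavior the principle of locality predicts.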
Levels of the Memory Hierarchy
(upper levels are faster; lower levels are larger)

Level             Capacity            Access Time                Cost
CPU Registers     100s Bytes          300 - 500 ps (0.3-0.5 ns)
L1 and L2 Cache   10s-100s K Bytes    ~1 ns - ~10 ns             $1000s / GByte
Main Memory       G Bytes             80 ns - 200 ns             ~$100 / GByte
Disk              10s T Bytes         10 ms (10,000,000 ns)      ~$1 / GByte
Tape              infinite            sec-min                    ~$1 / GByte

Staging / Xfer Unit between levels        Size             Managed by
Registers <-> L1 Cache: Instr. Operands   1-8 bytes        prog./compiler
L1 Cache <-> L2 Cache: Blocks             32-64 bytes      cache cntl
L2 Cache <-> Memory: Blocks               64-128 bytes     cache cntl
Memory <-> Disk: Pages                    4K-8K bytes      OS
Disk <-> Tape: Files                      Mbytes           user/operator
3) Focus on the Common Case
• Common sense guides computer design
– Since it's engineering, common sense is valuable
• In making a design trade-off, favor the frequent
case over the infrequent case
– E.g., Instruction fetch and decode unit used more frequently
than multiplier, so optimize it 1st
– E.g., If database server has 50 disks / processor, storage
dependability dominates system dependability, so optimize it 1st
• Frequent case is often simpler and can be done
faster than the infrequent case
– E.g., overflow is rare when adding 2 numbers, so improve
performance by optimizing more common case of no overflow
– May slow down overflow, but overall performance improved by
optimizing for the normal case
• What the frequent case is, and how much performance improves
by making that case faster => Amdahl’s Law
4) Amdahl’s Law
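\[
\mathrm{ExTime}_{\mathrm{new}} = \mathrm{ExTime}_{\mathrm{old}} \times \left[ \left(1 - \mathrm{Fraction}_{\mathrm{enhanced}}\right) + \frac{\mathrm{Fraction}_{\mathrm{enhanced}}}{\mathrm{Speedup}_{\mathrm{enhanced}}} \right]
\]

\[
\mathrm{Speedup}_{\mathrm{overall}} = \frac{\mathrm{ExTime}_{\mathrm{old}}}{\mathrm{ExTime}_{\mathrm{new}}} = \frac{1}{\left(1 - \mathrm{Fraction}_{\mathrm{enhanced}}\right) + \dfrac{\mathrm{Fraction}_{\mathrm{enhanced}}}{\mathrm{Speedup}_{\mathrm{enhanced}}}}
\]

Example, with Fraction_enhanced = 0.4 and Speedup_enhanced = 10:

\[
\mathrm{Speedup}_{\mathrm{overall}} = \frac{1}{\left(1 - 0.4\right) + \dfrac{0.4}{10}} = \frac{1}{0.64} = 1.56
\]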
• Apparently, it's human nature to be attracted by 10X
faster, vs. keeping in perspective it's just 1.6X faster
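The arithmetic behind the 10X versus 1.6X comparison can be checked with a few lines of Python. This is a minimal sketch of Amdahl's Law as stated in this section; the function name amdahl_speedup and its argument checks are assumptions made for the example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # The unenhanced part runs at the old speed; the enhanced part shrinks.
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# 40% of the time benefits from a 10X enhancement.
overall = amdahl_speedup(0.4, 10)
print(f"overall speedup: {overall:.2f}")
```

Trying other values of fraction_enhanced shows how quickly the untouched part of the execution time comes to dominate.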
5) Processor performance equation
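\[
\mathrm{CPU\ time} = \frac{\mathrm{Seconds}}{\mathrm{Program}} = \underbrace{\frac{\mathrm{Instructions}}{\mathrm{Program}}}_{\text{inst count}} \times \underbrace{\frac{\mathrm{Cycles}}{\mathrm{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\mathrm{Seconds}}{\mathrm{Cycle}}}_{\text{cycle time}}
\]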
                Inst Count    CPI    Clock Rate
Program             X
Compiler            X         (X)
Inst. Set           X          X
Organization                   X         X
Technology                               X
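The processor performance equation can be tried out numerically. The Python below is a minimal sketch; the function name cpu_time and the two machine configurations are hypothetical values picked for illustration, not measurements from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    cycle_time = 1.0 / clock_rate_hz    # seconds per cycle
    return instruction_count * cpi * cycle_time


# Hypothetical design A: more instructions, each taking fewer cycles.
time_a = cpu_time(instruction_count=1_000_000_000, cpi=2.0, clock_rate_hz=1e9)
# Hypothetical design B: fewer instructions, but a higher CPI at the same clock.
time_b = cpu_time(instruction_count=600_000_000, cpi=3.0, clock_rate_hz=1e9)

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
```

The comparison illustrates why no single factor can be judged alone: a change that lowers instruction count may raise CPI, and only the product of all three terms determines CPU time.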
What’s a Clock Cycle?
[Figure: a latch or register feeding combinational logic]