CIS775: Computer Architecture: Chapter 1: Fundamentals of Computer Design
Chapter 1: Fundamentals of
Computer Design
Course Objectives
To evaluate the issues involved in choosing and designing an instruction set.
To learn the concepts behind advanced pipelining techniques.
To understand the "memory wall" problem and the current state of the art in memory system design.
To understand the qualitative and quantitative tradeoffs in the design of modern computer systems.
[Figure: Computer Architecture as the combination of Interface Design (ISA), Hardware Organization, and Measurement & Evaluation. Surrounding labels: Applications, OS, Programming Language, Interface, Parallelism.]
[Figure: topic map. Labels: Emerging Technologies; Interleaving Memories; DRAM; Memory Hierarchy; VLSI; Coherence, Bandwidth, Latency; L2 Cache; L1 Cache; Instruction Set Architecture; RAID; Addressing, Protection, Exception Handling; Interconnection Network; Processor-Memory-Switch; Multiprocessors; Networks and Interconnections; Shared Memory, Message Passing, Data Parallelism; Network Interfaces; Topologies, Routing, Bandwidth, Latency, Reliability.]
[Figure: a cycle of Design, Analysis, and Creativity. Cost/Performance Analysis sorts the results into Good Ideas, Mediocre Ideas, and Bad Ideas.]
OS requirements
Address space issues, memory management, protection
Conformance to Standards
Languages, OS, Networks, I/O, IEEE floating point
[Figure: classes of computers. First group: Supercomputers, Massively Parallel Processors, Mini-supercomputers, Minicomputers, Workstations, PCs. Group labeled 2002: Powerful PCs and SMP Workstations, Network of SMP Workstations, Mainframes, Supercomputers, Embedded Computers.]
Higher volumes
Lower margins by class of computer, due to fewer services
Growth in Microprocessor
Performance
Technology   Capacity        Speed (latency)
Logic        4x in 4 years   2x in 3 years
DRAM         4x in 3 years   2x in 10 years
Disk         4x in 2 years   2x in 10 years
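The "k-fold in n years" figures in the trend table above are easier to compare once converted to a compound annual rate. The following is a minimal sketch of that conversion; the helper name `annual_rate` and the dictionary layout are illustrative choices, not part of the slides.

```python
def annual_rate(factor: float, years: float) -> float:
    """Compound annual growth rate implied by 'factor-fold in `years` years'."""
    return factor ** (1.0 / years) - 1.0


# (technology, metric) -> (improvement factor, period in years), copied from the table.
trends = {
    ("Logic", "capacity"): (4, 4),
    ("Logic", "speed"): (2, 3),
    ("DRAM", "capacity"): (4, 3),
    ("DRAM", "speed"): (2, 10),
    ("Disk", "capacity"): (4, 2),
    ("Disk", "speed"): (2, 10),
}

for (tech, metric), (factor, years) in trends.items():
    rate = annual_rate(factor, years)
    print(f"{tech:5s} {metric:8s}: {rate:6.1%} per year")
```

Read this way, the gap between capacity growth and latency improvement for DRAM and disk (roughly 59% or more per year versus about 7% per year) is what the table is pointing at.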
Defects_per_unit_area * Die_Area
Performance Trends
(Summary)
Workstation performance (measured in SPECmarks) improves roughly 50% per year (2X every 18 months).
Improvement in cost-performance is estimated at 70% per year.
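To relate an annual improvement rate to a doubling period, one can solve (1 + r)^t = 2 for t. The sketch below does this for the two rates quoted above; the function name `doubling_time_years` is an illustrative assumption.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


perf = doubling_time_years(0.50)       # performance: 50% per year
cost_perf = doubling_time_years(0.70)  # cost-performance: 70% per year

print(f"50% per year doubles in {perf * 12:.1f} months")
print(f"70% per year doubles in {cost_perf * 12:.1f} months")
```

At exactly 50% per year the doubling period comes out a little over 20 months, so "2X every 18 months" is best read as a rounded rule of thumb; doubling in exactly 18 months would correspond to about 59% per year.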
Computer Engineering
Methodology
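[Figure: computer engineering methodology as a cycle.
Evaluate Existing Systems for Bottlenecks (input: Benchmarks)
Simulate New Designs and Organizations (input: Workloads)
Implement Next Generation System (input: Implementation Complexity)
Technology Trends feed into the cycle.]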
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200
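The airplane comparison separates latency (time for one trip) from throughput (passenger-miles per hour, pmph). A small sketch that recomputes the throughput column and the two ratios from the table's own numbers; the dictionary layout and function name are illustrative.

```python
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput_pmph(plane: dict) -> int:
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["mph"]


b747 = planes["Boeing 747"]
concorde = planes["Concorde"]

# Latency view: how much sooner does one passenger arrive on the Concorde?
latency_speedup = b747["hours"] / concorde["hours"]

# Throughput view: how much more passenger traffic does the 747 move per hour?
throughput_ratio = throughput_pmph(b747) / throughput_pmph(concorde)

print(f"Concorde is {latency_speedup:.2f}x faster per trip")
print(f"747 has {throughput_ratio:.2f}x the throughput")
```

Which plane "performs better" therefore depends on whether the metric is response time or throughput.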
$$\frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
Measurement Tools
Benchmarks, Traces, Mixes
Hardware: Cost, delay, area, power estimation
Simulation (many levels)
ISA, RT, Gate, Circuit
Queuing Theory
Rules of Thumb
Fundamental Laws/Principles
Understanding the limitations of any
measurement tool is crucial.
Metrics of Performance
[Figure: metrics at each level of the system.
Levels, top to bottom: Application; Programming Language; Compiler; ISA; Datapath; Control; Function Units; Transistors, Wires, Pins.
Metrics shown: (millions of) instructions per second, MIPS; (millions of) floating-point operations per second, MFLOP/s; megabytes per second; cycles per second (clock rate).]
I/O level
SPEC92 spreadsheet program (sp)
Companies noticed that the output was always written to a file, so they stored the results in a memory buffer and expunged it at the end, which was not measured.
One company eliminated the I/O altogether.
Real Programs
Real Kernels
Toy Benchmarks
Synthetic Benchmarks
How to Summarize
Performance
Arithmetic mean (weighted arithmetic mean) tracks execution time.
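As a minimal sketch of the two summaries named here, the code below computes an arithmetic mean and a weighted arithmetic mean over per-program execution times. The times and weights are made-up illustration values, not measurements from the slides.

```python
def arithmetic_mean(times):
    """Plain average of execution times; proportional to total run time."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Average of execution times where weights (summing to 1) reflect
    how often each program runs in the workload."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Hypothetical execution times (seconds) of three programs on one machine.
times = [2.0, 4.0, 6.0]
# Hypothetical workload mix: the first program runs half the time.
weights = [0.5, 0.25, 0.25]

print(arithmetic_mean(times))                    # equal weighting
print(weighted_arithmetic_mean(times, weights))  # workload weighting
```

Because the arithmetic mean is the total time divided by the number of programs, a machine with a lower mean also finishes the whole (equally weighted) suite sooner.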
Performance Evaluation
For better or worse, benchmarks shape a field
Good products are created when we have:
Good benchmarks
Good ways to summarize performance
Simulations
When are simulations useful?
What are their limitations, i.e., what real-world phenomena do they not account for?
The larger the simulation trace, the less
tractable the post-processing analysis.
Queueing Theory
What are the distributions of arrival rates
and values for other parameters?
Are they realistic?
What happens when the parameters or
distributions are changed?
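One way to explore the last question is to pick a simple model and vary a parameter. The sketch below assumes an M/M/1 queue (Poisson arrivals, exponential service times, a single server), which the slides do not specify; under that assumption the mean time a request spends in the system is 1 / (service rate - arrival rate). The service rate of 100 requests per second is a hypothetical value.

```python
def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue (assumed model), in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


service_rate = 100.0  # hypothetical: requests served per second

for utilization in (0.5, 0.8, 0.9, 0.95):
    arrival_rate = utilization * service_rate
    t = mm1_mean_response_time(arrival_rate, service_rate)
    print(f"utilization {utilization:.2f}: mean response time {t * 1000:.0f} ms")
```

Even in this simple model, response time grows sharply as utilization approaches 1, which is why the realism of the assumed arrival distribution matters.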
Principles of Locality
Take advantage of Parallelism
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
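Amdahl's Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$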
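Amdahl's Law
Floating-point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$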
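The floating-point example can be checked numerically. This is a minimal sketch of Amdahl's Law as a function; the name `amdahl_speedup` and its parameter names are illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is sped up by a factor of `speedup_enhanced`."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# FP instructions run 2x faster, but account for only 10% of the time.
overall = amdahl_speedup(0.10, 2.0)
print(f"overall speedup: {overall:.3f}")

# Upper bound: even an infinitely fast FP unit cannot beat 1 / (1 - 0.10).
bound = amdahl_speedup(0.10, float("inf"))
print(f"limit as FP speedup grows: {bound:.3f}")
```

The second call shows the usual reading of the law: the unenhanced 90% caps the overall gain at about 1.11x, no matter how fast the enhanced part becomes.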
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
[Table: aspects of CPU time (including Clock Rate) affected by each of Program, Compiler, Inst. Set., Organization, and Technology.]
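The CPU time equation (instructions per program, times cycles per instruction, times seconds per cycle) can be written as a one-line function. The sketch below uses clock rate, the reciprocal of seconds per cycle; the function name and all input values are hypothetical.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion instructions, CPI of 1.5, 500 MHz clock.
base = cpu_time_seconds(1e9, 1.5, 500e6)
print(f"baseline CPU time: {base:.2f} s")

# Changing one factor at a time shows how each term scales the result.
print(f"10% fewer instructions: {cpu_time_seconds(0.9e9, 1.5, 500e6):.2f} s")
print(f"CPI lowered to 1.2:     {cpu_time_seconds(1e9, 1.2, 500e6):.2f} s")
print(f"clock raised to 600MHz: {cpu_time_seconds(1e9, 1.5, 600e6):.2f} s")
```

In practice the three factors are not independent: a change that lowers instruction count may raise CPI or lengthen the cycle, so the product is what has to be compared.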
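$$\text{CPU time} = \text{Cycle time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
Instruction Frequency:
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where } F_i = \frac{I_i}{\text{Instruction Count}}$$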
[Table: Typical Mix. The (% Time) column reads 33%, 27%, 13%, 27%.]
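The frequency-weighted CPI formula can be illustrated with a small instruction mix. Only the (% Time) column survives in the text above, so the operation classes, frequencies, and cycle counts below are assumptions, chosen because they reproduce those percentages.

```python
# Assumed mix (not from the slides): class -> (frequency F_i, cycles CPI_i).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI = sum over classes of CPI_i * F_i
cpi = sum(freq * cycles for freq, cycles in mix.values())

# Share of execution time spent in each class: CPI_i * F_i / CPI
time_share = {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}

print(f"CPI = {cpi:.2f}")
for op, share in time_share.items():
    print(f"{op:6s} {share:5.0%} of time")
```

Note that the class with the highest frequency need not dominate execution time; the time share depends on frequency and cycle count together.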
Chapter Summary, #1
Designing to Last through Trends
Technology   Capacity        Speed
Logic        2x in 3 years   2x in 3 years
DRAM         4x in 3 years   2x in 10 years
Disk         4x in 3 years   2x in 10 years
$$\frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
Chapter Summary, #2
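Amdahl's Law:
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
CPI Law:
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$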
$$\frac{1}{0.95} \approx 1.053$$