
CIS775: Computer Architecture
Chapter 1: Fundamentals of Computer Design
1

Course Objectives
To evaluate the issues involved in choosing and designing an instruction set.
To learn the concepts behind advanced pipelining techniques.
To understand the "hitting the memory wall" problem and the current state of the art in memory system design.
To understand the qualitative and quantitative tradeoffs in the design of modern computer systems.

What is Computer Architecture?


The functional operation of the individual HW units within a computer system, and the flow of information and control among them.

[Diagram: Computer Architecture at the intersection of Technology, Parallelism, Hardware Organization, Measurement & Evaluation, the Programming Language Interface, Interface Design (ISA), Applications, and the OS.]
3

Computer Architecture Topics


[Diagram: the major topic areas and the hardware they cover]

Input/Output and Storage: disks, WORM, tape; RAID; emerging technologies; interleaved memories
Memory Hierarchy: DRAM, L2 cache, L1 cache; VLSI; coherence, bandwidth, latency
Instruction Set Architecture: addressing, protection, exception handling
Pipelining and Instruction Level Parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, DSP

Computer Architecture Topics


[Diagram: processors (P), memories, and an interconnection network (processor-memory-switch organization)]

Multiprocessors: shared memory, message passing, data parallelism
Networks and Interconnections: network interfaces; topologies, routing, bandwidth, latency, reliability

Measurement and Evaluation


Architecture is an iterative process:
  searching the space of possible designs
  at all levels of computer systems

[Diagram: the design loop: creativity produces new designs; cost/performance analysis sorts them into good, bad, and mediocre ideas; the results feed the next round of design and analysis.]
6

Issues for a Computer Designer


Functional requirements analysis (target):
  Scientific computing: high-performance floating point
  Business: transactional support, decimal arithmetic
  General purpose: balanced performance for a range of tasks
Level of software compatibility:
  Programming-language level: flexible, but needs a new compiler; portability is an issue
  Binary level (e.g., the x86 architecture): little flexibility, but minimal portability requirements
OS requirements:
  Address space issues, memory management, protection
Conformance to standards:
  Languages, OS, networks, I/O, IEEE floating point

Computer Systems: Technology Trends

1988:
  Supercomputers
  Massively parallel processors
  Mini-supercomputers
  Minicomputers
  Workstations
  PCs

2002:
  Powerful PCs and SMP workstations
  Networks of SMP workstations
  Mainframes
  Supercomputers
  Embedded computers

Why Such Change in 10 Years?

Performance
  Technology advances: CMOS (complementary metal oxide semiconductor) VLSI dominates older technologies such as TTL (transistor-transistor logic) in both cost and performance.
  Computer architecture advances improve the low end: RISC, pipelining, superscalar, RAID, ...
Price: lower costs due to
  Simpler development; CMOS VLSI means smaller systems and fewer components
  Higher volumes
  Lower margins by class of computer, due to fewer services
Function: the rise of networking and local interconnection technology

Growth in Microprocessor Performance

10

Six Generations of DRAMs

11

Updated Technology Trends (Summary)

                      Capacity           Speed (latency)
Logic                 4x in 4 years      2x in 3 years
DRAM                  4x in 3 years      2x in 10 years
Disk                  4x in 2 years      2x in 10 years
Network (bandwidth)   10x in 5 years

Updates during your study period?
  BS (4 yrs), MS (2 yrs), PhD (5 yrs)
  (a quick way to compound these rates is sketched below)

12
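A growth rate of the form "Nx every M years" compounds as N^(years/M). A minimal Python sketch of the "study period" question above, using the rates quoted on this slide (the study lengths are the examples given):

    # Compound a rate of "n_times every every_years" over a study period.
    def growth(n_times, every_years, period_years):
        return n_times ** (period_years / every_years)

    rates = {
        "Logic capacity (4x / 4 yrs)":     (4, 4),
        "DRAM capacity (4x / 3 yrs)":      (4, 3),
        "Disk capacity (4x / 2 yrs)":      (4, 2),
        "Network bandwidth (10x / 5 yrs)": (10, 5),
    }

    for period in (4, 2, 5):   # BS, MS, PhD study lengths from the slide
        print(f"--- {period}-year study period ---")
        for name, (n, m) in rates.items():
            print(f"  {name}: ~{growth(n, m, period):.1f}x")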

13

Integrated Circuit Costs

    IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield

    Die cost = Wafer cost / (Dies per wafer * Die yield)

    Dies per wafer = [pi * (Wafer_diam / 2)^2 / Die_Area] - [pi * Wafer_diam / sqrt(2 * Die_Area)] - Test dies

    Die yield = Wafer yield * (1 + Defects_per_unit_area * Die_Area / alpha)^(-alpha)

where alpha is an empirical process parameter (around 3).

Die cost goes roughly with (die area)^4.
(a small cost sketch follows below)

14
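A minimal Python sketch of the die-cost formulas above (the wafer size, wafer cost, defect density, and alpha below are illustrative values, not figures from the slides):

    import math

    def dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies=0):
        # Gross dies from wafer area, minus edge loss and test dies.
        return (math.pi * (wafer_diam_cm / 2) ** 2 / die_area_cm2
                - math.pi * wafer_diam_cm / math.sqrt(2 * die_area_cm2)
                - test_dies)

    def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha=3.0):
        return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

    def die_cost(wafer_cost, wafer_diam_cm, die_area_cm2,
                 wafer_yield=1.0, defects_per_cm2=1.0, alpha=3.0):
        return wafer_cost / (dies_per_wafer(wafer_diam_cm, die_area_cm2)
                             * die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha))

    # Doubling the die area far more than doubles the cost.
    for area in (0.5, 1.0, 2.0):   # die area in cm^2
        print(f"die area {area} cm^2 -> cost ~${die_cost(1000, 20, area):.2f}")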


Performance Trends (Summary)

Workstation performance (measured in SPECmarks) improves roughly 50% per year (2x every 18 months).
Improvement in cost/performance is estimated at 70% per year.

15

Computer Engineering Methodology

[Diagram: an iterative loop: evaluate existing systems for bottlenecks (driven by benchmarks), simulate new designs and organizations (driven by workloads), and implement the next-generation system (guided by technology trends and implementation complexity), then repeat.]
16

How to Quantify Performance?


Plane               DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747          6.5 hours     610 mph    470          286,700
BAC/Sud Concorde    3 hours       1350 mph   132          178,200

Time to run the task (ExTime): execution time, response time, latency
Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth

17

The Bottom Line: Performance and Cost, or Cost and Performance?

"X is n times faster than Y" means:

    n = ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y)

Speed of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde
Cost is also an important parameter in the equation, which is why Concordes are being put out to pasture! (a quick check of the ratios is sketched below)
18
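A minimal Python check of the two comparisons above, using the numbers from the plane table (a sketch; the dictionary layout and names are mine):

    # Speed favors the Concorde; throughput (passenger-mph) favors the 747.
    planes = {
        "Boeing 747":       {"speed_mph": 610,  "passengers": 470},
        "BAC/Sud Concorde": {"speed_mph": 1350, "passengers": 132},
    }

    for name, p in planes.items():
        p["throughput_pmph"] = p["speed_mph"] * p["passengers"]
        print(f"{name}: throughput = {p['throughput_pmph']:,} pmph")

    speed_ratio = planes["BAC/Sud Concorde"]["speed_mph"] / planes["Boeing 747"]["speed_mph"]
    tput_ratio = planes["Boeing 747"]["throughput_pmph"] / planes["BAC/Sud Concorde"]["throughput_pmph"]
    print(f"Concorde is {speed_ratio:.1f}x faster by speed")
    print(f"The 747 is {tput_ratio:.1f}x 'faster' by throughput")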

Measurement Tools
Benchmarks, Traces, Mixes
Hardware: Cost, delay, area, power estimation
Simulation (many levels)
ISA, RT, Gate, Circuit

Queuing Theory
Rules of Thumb
Fundamental Laws/Principles
Understanding the limitations of any
measurement tool is crucial.
19

Metrics of Performance

[Diagram: levels of the system and the metric typically quoted at each level]
Application: answers per month, operations per second
Programming language / compiler: (millions of) instructions per second (MIPS)
ISA: (millions of) floating-point operations per second (MFLOP/s)
Datapath / control: megabytes per second
Function units: cycles per second (clock rate)
Transistors, wires, pins

20

Cases of Benchmark Engineering


The motivation is to tune the system to the benchmark to achieve peak performance.
At the architecture level:
  Specialized instructions
At the compiler level (compiler flags):
  Blocking in SPEC89: a factor-of-9 speedup
  Incorrect compiler optimizations/reordering that work fine on the benchmark but not on other programs
At the I/O level:
  SPEC92 spreadsheet program (sp): companies noticed that the output was always written to a file, so they kept the results in a memory buffer and only flushed it at the end, which was not measured. One company eliminated the I/O altogether.

21

After putting in a blazing performance on the benchmark test, Sun issued a glowing press release claiming that it had outperformed Windows NT systems on the test.
Pendragon president Ivan Phillips cried foul, saying the results weren't representative of real-world Java performance and that Sun had gone so far as to duplicate the test's code within Sun's Just-In-Time compiler. That's cheating, says Phillips, who claims that benchmark tests and real-world applications aren't the same thing.
Did Sun issue a denial or a mea culpa? Initially, Sun neither denied optimizing for the benchmark test nor apologized for it. "If the test results are not representative of real-world Java applications, then that's a problem with the benchmark," Sun's Brian Croll said.
After taking a beating in the press, though, Sun retreated and issued an apology for the optimization. [Excerpted from PC Online, 1997]

22

Issues with Benchmark Engineering

Motivated by the bottom dollar: good performance on classic suites means more customers and better sales.
Benchmark engineering limits the longevity of benchmark suites.
Technology and applications also limit the longevity of benchmark suites.
23

SPEC: System Performance Evaluation Cooperative

First round, 1989: 10 programs yielding a single number (SPECmarks).
Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs); compiler flags unlimited (March 93).
Third round, 1995: a new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating point); benchmarks intended to be useful for 3 years; a single flag setting for all programs: SPECint_base95, SPECfp_base95.
SPEC CPU2000: 11 integer benchmarks (CINT2000) and 14 floating-point benchmarks (CFP2000).

24

SPEC 2000 (CINT2000) Results

25

SPEC 2000 (CFP2000) Results

26

Reporting Performance Results


Reproducibility: apply them to publicly available benchmarks.

Pecking/picking order:
  Real programs
  Real kernels
  Toy benchmarks
  Synthetic benchmarks
27

How to Summarize Performance

Arithmetic mean (weighted arithmetic mean) tracks execution time: sum(Ti)/n or sum(Wi*Ti).
Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) also tracks execution time: n/sum(1/Ri) or 1/sum(Wi/Ri).
Normalized execution time is handy for scaling performance (e.g., "X times faster than a SPARCstation 10").
But do not take the arithmetic mean of normalized execution times; use the geometric mean = (Product(Ri))^(1/n). (a sketch of these means follows below)

28
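A minimal Python sketch of these summary statistics (the example times, weights, and rates are made up purely for illustration):

    import math

    times = [2.0, 4.0, 8.0]        # execution times Ti (seconds)
    weights = [0.5, 0.3, 0.2]      # workload weights Wi (sum to 1)
    rates = [100.0, 200.0, 400.0]  # rates Ri (e.g., MFLOPS)
    norm = [0.5, 2.0, 1.0]         # execution times normalized to a reference machine

    arith_mean = sum(times) / len(times)
    weighted_arith_mean = sum(w * t for w, t in zip(weights, times))
    harmonic_mean = len(rates) / sum(1.0 / r for r in rates)
    weighted_harmonic_mean = 1.0 / sum(w / r for w, r in zip(weights, rates))
    geo_mean = math.prod(norm) ** (1.0 / len(norm))  # the right mean for normalized times

    print(arith_mean, weighted_arith_mean, harmonic_mean,
          weighted_harmonic_mean, geo_mean)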

Performance Evaluation

For better or worse, benchmarks shape a field.
Good products are created when you have:
  Good benchmarks
  Good ways to summarize performance
Given that sales are partly a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
If the benchmarks or summary are inadequate, a company must choose between improving its product for real programs and improving it to get more sales; sales almost always win!
Execution time is the measure of computer performance!

29

Simulations

When are simulations useful?
What are their limitations, i.e., what real-world phenomena do they not account for?
The larger the simulation trace, the less tractable the post-processing analysis.
30

Queueing Theory
What are the distributions of arrival rates
and values for other parameters?
Are they realistic?
What happens when the parameters or
distributions are changed?
31

Quantitative Principles of Computer Design

Make the common case fast
  Amdahl's Law
CPU performance equation
  Clock cycle time
  CPI
  Instruction count
Principle of locality
Take advantage of parallelism
32

Amdahl's Law

Speedup due to enhancement E:

    Speedup(E) = ExTime(without E) / ExTime(with E) = Performance(with E) / Performance(without E)

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
33

Amdahl's Law

    ExTime_new = ExTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

    Speedup_overall = ExTime_old / ExTime_new
                    = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

(a code sketch of this formula follows below)

34
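A minimal Python sketch of the overall-speedup formula above (function and variable names are mine, not from the slides):

    def amdahl_speedup(fraction_enhanced, speedup_enhanced):
        # Overall speedup when only `fraction_enhanced` of the original execution
        # time benefits from a local speedup of `speedup_enhanced`.
        new_time_ratio = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
        return 1.0 / new_time_ratio

    # The FP example worked later in these slides: 10% of the time, 2x faster.
    print(amdahl_speedup(0.10, 2.0))    # ~1.053

    # Even an unbounded local speedup is capped by the untouched 90%.
    print(amdahl_speedup(0.10, 1e12))   # ~1.111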

Amdahl's Law (example)

Floating-point instructions improved to run 2x, but only 10% of actual instructions are FP.

ExTime_new =
Speedup_overall =

35

CPU Performance Equation

    CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

Which design levels affect each factor:

                  Inst Count   CPI    Clock Rate
    Program           X
    Compiler          X        (X)
    Inst. Set         X         X
    Organization                X         X
    Technology                            X

(a small numeric sketch follows below)
36
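A minimal Python sketch of the CPU time equation (the instruction count, CPI, and clock rate are illustrative numbers only):

    def cpu_time(instruction_count, cpi, clock_rate_hz):
        # CPU time = Instructions x (Cycles/Instruction) x (Seconds/Cycle)
        return instruction_count * cpi / clock_rate_hz

    # Illustrative: 1 billion instructions, CPI of 1.5, 2 GHz clock -> 0.75 s.
    print(cpu_time(1_000_000_000, 1.5, 2_000_000_000))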

Cycles Per Instruction

Average cycles per instruction:

    CPI = (CPU time * Clock rate) / Instruction count = Cycles / Instruction count

    CPU time = CycleTime * sum_i (CPI_i * I_i)

Instruction frequency:

    CPI = sum_i (CPI_i * F_i),   where F_i = I_i / Instruction count

Invest resources where time is spent!


37

Example: Calculating CPI

Base machine (Reg / Reg), typical mix:

    Op       Freq   Cycles   CPI(i)   (% Time)
    ALU      50%    1        0.5      (33%)
    Load     20%    2        0.4      (27%)
    Store    10%    2        0.2      (13%)
    Branch   20%    2        0.4      (27%)
                    Total    1.5

(the sketch below reproduces these numbers)

38
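A minimal Python sketch reproducing the CPI and %-of-time columns from the table above:

    # (frequency, cycles) per instruction class, taken from the table.
    mix = {
        "ALU":    (0.50, 1),
        "Load":   (0.20, 2),
        "Store":  (0.10, 2),
        "Branch": (0.20, 2),
    }

    cpi = sum(freq * cycles for freq, cycles in mix.values())
    print(f"Overall CPI = {cpi}")   # 1.5

    for op, (freq, cycles) in mix.items():
        contribution = freq * cycles
        print(f"{op}: CPI contribution {contribution:.1f}, "
              f"% of time {100 * contribution / cpi:.0f}%")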

Chapter Summary, #1

Designing to last through trends:

            Capacity         Speed
    Logic   2x in 3 years    2x in 3 years
    DRAM    4x in 3 years    2x in 10 years
    Disk    4x in 3 years    2x in 10 years

6 years to graduate => 16x CPU speed, DRAM/disk size

Time to run the task (ExTime): execution time, response time, latency
Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth

"X is n times faster than Y" means:

    n = ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y)

39

Chapter Summary, #2

Amdahl's Law:

    Speedup_overall = ExTime_old / ExTime_new
                    = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]

CPI Law:

    CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

Execution time is the REAL measure of computer performance!
Good products are created when you have good benchmarks and good ways to summarize performance.
Die cost goes roughly with (die area)^4.

40

Food for Thought

Two companies report results on two benchmarks: one on a Fortran benchmark suite and the other on a C++ benchmark suite.
Company A's product outperforms Company B's on the Fortran suite; the reverse holds true for the C++ suite. Assume the performance differences are similar in both cases.
Do you have enough information to compare the two products? What information would you need?
41

Food for Thought II

In the CISC vs. RISC debate, a key argument of the RISC movement was that, because of its simplicity, RISC would always remain ahead:
  If there were enough transistors to implement a CISC on chip, then those same transistors could implement a pipelined RISC.
  If there were enough to allow for a pipelined CISC, there would be enough to have an on-chip cache for RISC. And so on.
After 20 years of this debate, what do you think?
Hint: think of commercial PCs, Moore's law, and some of the data in the first chapter of the book (and on these slides).
42

Amdahl's Law (answer)

Floating-point instructions improved to run 2x, but only 10% of actual instructions are FP.

    ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old

    Speedup_overall = 1 / 0.95 = 1.053

43
