Lecture 02 CH01 Performance Power

Uploaded by

zB H
Lecture 2

• Assessing and Understanding Performance & Power

1
Evaluation is the first step
◼ To build a good computer, the first thing you have to do is
evaluate how good it is.

Compiler

Interface

Evaluating
performance
2
Metrics Matter

◼ Performance
❑ How fast a program can execute

◼ Power
❑ How much energy is consumed

◼ Other metrics
❑ Yield
❑ Cost
❑ Etc.
3
Outline of this lecture

◼ Performance
❑ Basics of performance evaluations
❑ Basic idea of benchmarks
❑ Making the common case fast!
❑ Reporting performance results
◼ Power
❑ Basics of power/energy evaluations
❑ Reducing energy by going from uniprocessor to multiprocessors

4
Performance
◼ Why do we care about performance evaluation?
❑ Purchasing perspective
◼ given a collection of machines, which has the
❑ best performance ?
❑ least cost ?
❑ best performance / cost ?
❑ best performance / energy ?
❑ Design perspective
◼ faced with design options, which has the
❑ best performance improvement ?
❑ least cost ?
❑ best performance / cost ?

◼ How to measure, report, and summarize performance?


❑ What kind of performance metrics should be utilized?
❑ How to get the performance values?
5
Two Notions of “Performance”

◼ Response Time (latency)


— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
◼ Throughput
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?

◼ If we upgrade a machine with a new processor what do we increase?


◼ If we add a new machine to the lab what do we increase?

6
Definitions
◼ Performance is in units of things-per-second
❑ bigger is better
◼ If we are primarily concerned with response time
❑ $\text{Performance}(X) = \dfrac{1}{\text{Execution\_Time}(X)}$

"X is n times faster than Y" means

$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_Time}(Y)}{\text{Execution\_Time}(X)}$$

7
Example of Relative Performance

◼ If computer A runs a program in 10 seconds and


computer B runs the same program in 15 seconds, how
much faster is A than B?

(1) PerformanceA / PerformanceB = Execution timeB / Execution timeA = n
(2) Performance ratio: 15/10 = 1.5
(3) A is 1.5 times faster than B

We will focus primarily on execution time for a single job !
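The relative-performance definition can be sketched in a few lines of Python. This is only an illustration of the ratio above; the function name `relative_performance` is an assumption made for this sketch, not something from the lecture.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so
    Performance(X) / Performance(Y) = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Computer A takes 10 s and computer B takes 15 s for the same program.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} times faster than B")
```

Note that the slower machine's time goes in the numerator, which is easy to get backwards when working from execution times instead of performance.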

8
Metrics for Performance Evaluation

◼ Program execution time


❑ Seconds for a program
❑ Elapsed time
◼ Total time to complete a task, including disk access, I/O, etc
◼ CPU execution time
❑ doesn't count I/O or time spent running other
programs
❑ can be broken up into system time, and user time

◼ Our focus: user CPU time


❑ time spent executing the lines of code that are "in" our
program
9
CPU Clocking
◼ Operation of digital hardware governed by a
constant-rate clock
[Figure: clock signal over time, marking the clock period; each cycle consists of data transfer and computation followed by a state update]

◼ Clock period: duration of a clock cycle


◼ e.g., 250 ps = 0.25 ns = 250 x 10^-12 s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0 GHz = 4000 MHz = 4.0 x 10^9 Hz
Clock Cycles
◼ Instead of reporting execution time in seconds, we often use cycles
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

◼ cycle time = time between ticks = seconds per cycle


◼ clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
A 200 MHz clock has a cycle time of $\frac{1}{200 \times 10^{6}} \times 10^{9} = 5$ nanoseconds

11
How to Improve Performance
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

So, to improve performance (everything else being equal) you can

either

________ the # of required cycles for a program, or

________ the clock cycle time or, said another way,

________ the clock rate.

12
Example of Improving Performance
◼ A program runs in 10 seconds on computer A, which has a 4 GHz clock. We are
trying to help a computer designer build a computer, B, that will run
this program in 6 seconds. The designer has determined that a
substantial increase in the clock rate is possible, but this increase
will affect the rest of the CPU design, causing computer B to
require 1.2 times as many clock cycles as computer A for this
program. What clock rate should we tell the designer to target?

(1) CPU execution timeA = CPU clock cyclesA / Clock rateA
10 sec = CPU clock cyclesA / 4 GHz
CPU clock cyclesA = 4 GHz x 10 sec = 40G cycles

(2) CPU execution timeB = 1.2 x CPU clock cyclesA / Clock rateB
6 sec = 1.2 x 40G cycles / Clock rateB
Clock rateB = 1.2 x 40G cycles / 6 sec = 8 GHz
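The two steps of this example can be checked with a short Python sketch. The function and parameter names are assumptions chosen for illustration; the numbers are the ones given in the problem.

```python
def required_clock_rate(time_a: float, clock_rate_a: float,
                        cycle_factor: float, target_time: float) -> float:
    """Clock rate (Hz) machine B needs to finish in target_time seconds."""
    cycles_a = time_a * clock_rate_a      # step (1): cycles = time x clock rate
    cycles_b = cycle_factor * cycles_a    # B needs cycle_factor times as many cycles
    return cycles_b / target_time         # step (2): rate = cycles / time


rate_b = required_clock_rate(time_a=10.0, clock_rate_a=4e9,
                             cycle_factor=1.2, target_time=6.0)
print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
```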
13
How many cycles are required for a program?

◼ Could assume that # of cycles = # of instructions

[Figure: timeline in which the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions each take one clock cycle]

This assumption is incorrect: different instructions take different amounts of time on different machines.

14
Different numbers of cycles for different
instructions

[Figure: instructions of different lengths laid out along a time axis]

• Multiplication takes more time than addition

• Floating point operations take longer than integer ones

• Accessing memory takes more time than accessing registers

• CPI (cycles per instruction)

CPU clock cycles = Instruction count for a program x Average CPI

15
Performance Equation
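$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$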

◼ Performance is determined by execution time


◼ Do any of the other variables equal performance?
❑ # of cycles to execute program?

❑ # of instructions in program?

❑ # of cycles per second?

❑ average # of cycles per instruction?

❑ average # of instructions per second?

◼ MIPS (million instructions per second)

◼ When is it fair to compare two processors using MIPS?

◼ Common pitfall: thinking one of the variables is indicative of
performance when it really isn’t.
16
Example #1
◼ Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for a
program, and machine B has a clock cycle time of 500 ps and a CPI of
1.2 for the same program. Which machine is faster for this program, and
by how much?

(1) CPU clock cyclesA = I x 2.0
CPU clock cyclesB = I x 1.2
# I is the number of instructions for this program

(2) CPU timeA = I x 2.0 x 250 ps = 500 x I ps
CPU timeB = I x 1.2 x 500 ps = 600 x I ps

(3) CPU performanceA = 1 / timeA
CPU performanceB = 1 / timeB

(4) Speedup = performanceA / performanceB = timeB / timeA = 600 / 500 = 1.2, so A is 1.2 times faster than B
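As a rough check of this example, the performance equation can be written as a one-line Python function. The name `cpu_time` and the choice of one instruction as the count are assumptions for this sketch; the instruction count cancels in the ratio, so any positive value gives the same speedup.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instructions x cycles per instruction x seconds per cycle."""
    return instruction_count * cpi * cycle_time_s


I = 1.0  # instruction count; identical on both machines, so it cancels out
time_a = cpu_time(I, cpi=2.0, cycle_time_s=250e-12)  # 500 ps per instruction
time_b = cpu_time(I, cpi=1.2, cycle_time_s=500e-12)  # 600 ps per instruction

speedup = time_b / time_a
print(f"A is {speedup:.1f} times faster than B")
```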


17
How Metrics Mislead

◼ MIPS
❑ Million Instructions per Second
❑ A metric commonly used by vendors to show how
fast their CPUs are

18
Example #2 MIPS Performance Measure

Instruction counts (in billions) for each instruction class

Code from      A     B     C
Compiler 1     5     1     1
Compiler 2     10    1     1

CPI: A = 1, B = 2, C = 3
Clock rate: 4 GHz

• Which code sequence will execute faster according to MIPS (Million Instructions Per Second)?
• Which code sequence will execute faster according to execution time?

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$$

where $C_i$ is the number of instructions of class $i$ and $\text{CPI}_i$ is the CPI of that class.
19
Example of MIPS Performance Measure (cont.)

(1) CPU clock cycles1 = (5x1 + 1x2 + 1x3) x 10^9 = 10 x 10^9
CPU clock cycles2 = (10x1 + 1x2 + 1x3) x 10^9 = 15 x 10^9

(2) Execution time1 = (10 x 10^9) / (4 x 10^9) = 2.5 sec
Execution time2 = (15 x 10^9) / (4 x 10^9) = 3.75 sec

(3) MIPS1 = ((5+1+1) x 10^9) / (2.5 x 10^6) = 2800
MIPS2 = ((10+1+1) x 10^9) / (3.75 x 10^6) = 3200

So, the code from compiler 2 has a higher MIPS rating, but the code from compiler 1 runs faster!
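The disagreement between MIPS and execution time can be reproduced with a small Python sketch. The helper name `evaluate` and the dictionary layout are assumptions for illustration; the instruction counts, CPIs, and clock rate come from the example.

```python
CPI = {"A": 1, "B": 2, "C": 3}   # cycles per instruction for each class
CLOCK_RATE_HZ = 4e9              # 4 GHz


def evaluate(counts: dict) -> tuple:
    """Return (execution time in seconds, MIPS rating) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time * 1e6)
    return exec_time, mips


compiler1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler2 = {"A": 10e9, "B": 1e9, "C": 1e9}

time1, mips1 = evaluate(compiler1)
time2, mips2 = evaluate(compiler2)
print(f"Compiler 1: {time1:.2f} s, {mips1:.0f} MIPS")
print(f"Compiler 2: {time2:.2f} s, {mips2:.0f} MIPS")
```

Compiler 2 earns the higher MIPS rating only because it executes many more cheap class-A instructions, which is why execution time is the measure to trust.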

20
Example #4
◼ Suppose we have two implementations of the same instruction set
architecture (ISA).
For some program,
Machine A has a clock cycle time of 10 ns. and a CPI of 2.0
Machine B has a clock cycle time of 20 ns. and a CPI of 1.2
What machine is faster for this program, and by how much?

◼ If two machines have the same ISA, which of our quantities


(e.g., clock rate, CPI, execution time, # of instructions, MIPS)
will always be identical?

21
Example #4 (cont.)

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

$$\frac{\text{Execution\_Time}(A)}{\text{Execution\_Time}(B)} = \frac{I \times 2.0 \times 10}{I \times 1.2 \times 20} = \frac{20}{24} \approx 0.83$$

So machine A is 24/20 = 1.2 times faster than machine B for this program.

22
Example #5
Instruction class     CPI for this instruction class
A                     1
B                     2
C                     3

Instruction counts for each instruction class

Code sequence     A     B     C
1                 2     1     2
2                 4     1     1

• Which code sequence executes the most instructions?
• Which will be faster?
• What is the CPI for each sequence?
23
Example #5

(1) Seq1 = 2 + 1 + 2 = 5
Seq2 = 4 + 1 + 1 = 6

(2) CPU clock cycles1 = (2x1) + (1x2) + (2x3) = 10
CPU clock cycles2 = (4x1) + (1x2) + (1x3) = 9

(3) CPI1 = CPU clock cycles1 / Instruction count1 = 10/5 = 2
CPI2 = CPU clock cycles2 / Instruction count2 = 9/6 = 1.5

When comparing 2 machines, these “3 components” must be considered!
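A minimal Python sketch of the average-CPI calculation for the two code sequences follows. The function name `sequence_stats` is an assumption for this sketch; the per-class CPIs and instruction counts are the ones in the tables.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def sequence_stats(counts: dict) -> tuple:
    """Return (instruction count, clock cycles, average CPI) for a sequence."""
    instruction_count = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instruction_count, cycles, cycles / instruction_count


seq1 = sequence_stats({"A": 2, "B": 1, "C": 2})
seq2 = sequence_stats({"A": 4, "B": 1, "C": 1})
print("Sequence 1 (instructions, cycles, CPI):", seq1)
print("Sequence 2 (instructions, cycles, CPI):", seq2)
```

Sequence 2 executes more instructions yet needs fewer cycles, so on the same clock it finishes first.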

24
Aspects of CPU Performance
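$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                                       Inst Count    CPI    Clock Rate
Algorithm                                  X          X
Programming language                       X          X
Compiler                                   X          X
ISA (instruction set architecture)         X          X         X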

25
Performance Improvement
$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

The three factors are the instruction count for a program, the CPI (clock cycles per instruction), and the clock cycle time (the inverse of the clock rate).

◼ As shown in the formula, given an ISA (Instruction Set
Architecture), increases in CPU performance can
come from three sources:
1. Increases in clock rate
2. Improvements in processor organization that lower the CPI
3. Compiler enhancements that lower the instruction count or
generate instructions with a lower average CPI (e.g., by
using simpler instructions)
26
Now, you can answer this question..
◼ CPU frequency ? Performance

[Figure: SPECint versus clock rate (50-250 MHz) for the Pentium and Pentium Pro]
27
How to get IC and CPI values?

◼ Which programs should be run to get the numbers?
❑ Applications/programs that are commonly used on the
target machines

◼ Should there be only one program?
❑ More than one application/program may operate on
the target machines
❑ Workload
◼ Set of applications that commonly run on the target
machines
28
Benchmarks or Benchmark Suites
◼ Set of applications/programs used for evaluating a
computer

◼ Small benchmarks
❑ nice for architects and designers
❑ easy to standardize
❑ can be abused
◼ SPEC (System Performance Evaluation Cooperative)
❑ companies have agreed on a set of real programs and inputs
❑ can still be abused
❑ valuable indicator of performance (and compiler technology)
❑ latest CPU suite: SPEC CPU 2017

29
SPEC Benchmarks
◼ CPU
❑ Computation-intensive workload for testing different
CPU architectures
❑ Two major sets:
◼ Integer (SPEC CINT)
◼ Floating point (SPEC CFP)
◼ High Performance Computing, OpenMP, MPI
❑ For testing parallel applications
◼ Power
◼ Web server
◼ More information at http://www.spec.org/
30
SPEC CINT2000

31
Now you have several results…
◼ Usually you will have results of
❑ Running the workload on the new machine
❑ Running the workload on the old/reference machine
❑ Execution times of all programs in the workload
◼ How do you report this clearly?
Programs of      Exe. time of      Exe. time of     Speedup
the workload     reference (s)     new (s)
A                1000              500              2
B                90                20               4.5
C                600               150              4
D                10                1                10
E                12600             300              42
F                1200              60               20
How to report results clearly?

◼ Which machine is better? By how much?


◼ Which application achieves the best
performance improvement on the new machine?
By how much?
◼ How much performance gain is there for the whole
workload?

33
Comparisons of Reporting Methods
Programs of      Exe. time of      Exe. time of     Speedup
the workload     reference (s)     new (s)
A                1000              500              2
B                90                20               4.5
C                600               150              4
D                10                1                10
E                12600             300              42
F                1200              60               20

[Figure: bar chart "Comparison of Execution Times" (in seconds) for the reference and new machines across programs A-F and the average, annotated "Huge differences!"]

• Huge differences in execution times!
• Summarizing the overall workload by simply reporting the average
time is not fair.
34
Comparisons of Reporting Methods
Programs of      Exe. time of      Exe. time of     Speedup
the workload     reference (s)     new (s)
A                1000              500              2
B                90                20               4.5
C                600               150              4
D                10                1                10
E                12600             300              42
F                1200              60               20
Avg.             2583              171              13.75

[Figure: bar chart of speedup for programs A-F and the average]

Better! The average is meaningful.
Huge differences still.

35
Normalize to the reference machine
◼ Normalization
❑ adjusting values measured on different scales to a
notionally common scale
❑ Normalized to the reference machine
→ exe.new / exe.reference

Programs of      Exe. time of      Exe. time of     Speedup     Normalized to
the workload     reference (s)     new (s)                      reference
A                1000              500              2           0.5
B                90                20               4.5         0.22
C                600               150              4           0.25
D                10                1                10          0.1
E                12600             300              42          0.024
F                1200              60               20          0.05
Avg.             2583              171              13.75       0.19

[Figure: bar chart "normalized to reference" (0%-60%) for programs A-F and the average]
Normalize to the reference machine

We can easily summarize the performance as

1. On average, the new machine reduces execution time by about 80%
(average normalized time 0.19).
2. The new machine works best on Program E, where execution time
drops by about 98% (normalized time 0.024).
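The speedup and normalized columns, along with their averages, can be recomputed with a short Python sketch. The variable names are assumptions for illustration; the execution times are the ones in the workload table, and the averages are plain arithmetic means as used on the slides.

```python
from statistics import mean

# program -> (execution time on reference, execution time on new), in seconds
times = {
    "A": (1000, 500), "B": (90, 20), "C": (600, 150),
    "D": (10, 1), "E": (12600, 300), "F": (1200, 60),
}

speedup = {p: ref / new for p, (ref, new) in times.items()}
normalized = {p: new / ref for p, (ref, new) in times.items()}

avg_speedup = mean(list(speedup.values()))
avg_normalized = mean(list(normalized.values()))
best_program = max(speedup, key=speedup.get)

print(f"Average speedup: {avg_speedup:.2f}")
print(f"Average normalized time: {avg_normalized:.2f}")
print(f"Largest improvement: program {best_program}")
```

Normalizing first keeps a long-running program such as E from dominating the average the way it does when raw execution times are averaged.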
Outline of this lecture

◼ Performance
❑ Basics of performance evaluations
❑ Basic idea of benchmarks
❑ Reporting performance results
❑ Making the common case fast!
◼ Power
❑ Basics of power/energy evaluations
❑ Reducing energy by going from uniprocessor to multiprocessors

38
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a
factor S, and the remainder of the task is unaffected. Then:

$$\text{ExTime}(\text{with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$

$$\text{Speedup}(E) = \frac{1}{(1-F) + \frac{F}{S}}$$

39
Amdahl’s Law
◼ Floating point instructions improved to run 2X faster,
but only 10% of actual instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$
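Amdahl's Law is easy to experiment with in code. The sketch below assumes a helper named `amdahl_speedup` (not from the lecture) and reproduces the floating-point example, then shows the upper bound when the enhanced part becomes almost free.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the task is sped up by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# 10% of the instructions are floating point and now run 2x faster.
fp_speedup = amdahl_speedup(fraction=0.1, factor=2.0)
print(f"Overall speedup: {fp_speedup:.3f}")

# Even an enormous FP speedup cannot beat 1 / (1 - 0.1), about 1.11.
limit = amdahl_speedup(fraction=0.1, factor=1e12)
print(f"Upper bound with a near-infinite FP speedup: {limit:.3f}")
```

The bound illustrates why making the common case fast matters: the untouched 90% of the work caps the benefit.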

40
Eight Great Ideas

◼ Design for Moore’s Law
◼ Use abstraction to simplify design
◼ Make the common case fast
◼ Performance via parallelism
◼ Performance via pipelining
◼ Performance via prediction
◼ Hierarchy of memories
◼ Dependability via redundancy

All of these will be covered in this semester’s lectures.

41
Example #6
◼ Our favorite program runs in 10 seconds on computer A, which has a
400 MHz clock. We are trying to help a computer designer build a new
machine B that will run this program in 6 seconds. The designer can use
new (or perhaps more expensive) technology to substantially increase the
clock rate, but has informed us that this increase will affect the rest of the
CPU design, causing machine B to require 1.2 times as many clock cycles
as machine A for the same program. What clock rate should we tell the
designer to target?

$$\frac{\text{Execution\_Time}(A)}{\text{Execution\_Time}(B)} = \frac{10}{6} = \frac{C \times \frac{1}{400 \times 10^{6}}}{1.2\,C \times \frac{1}{x}}$$

where C is the number of clock cycles on machine A and x is the clock rate of machine B, using

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

Solving gives x = 1.2 x (10/6) x 400 x 10^6 Hz = 800 MHz.

42
Outline of this lecture

◼ Performance
❑ Basics of performance evaluations
❑ Basic idea of benchmarks
❑ Reporting performance results
❑ Making the common case fast!
◼ Power
❑ Basics of power/energy evaluations
❑ Reducing energy by going from uni-processor to
multi-processors

43
With Moore’s Law…The Power Wall
Moore’s Law (1965): “The density of transistors in an
integrated circuit will double every year.” (18 months in fact)
Processor performance increases
[Figure: processor performance, 1984-2000, for the MIPS R2000, IBM Power1, HP 9000, DEC Alpha, and Intel Pentium, with growth-rate annotations of 1.35x and 1.58x per year]

Power dissipation also increases…
[Figure: power density (W/cm2), 1970-2010, for Intel processors from the 4004, 8008, 8080, 8085, 8086, 286, 386, and 486 to the Pentium and P6, compared against a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface]
44
The Power Wall Problem
◼ The Pentium 4 sharply increased frequency and power
consumption but delivered little increase in performance
◼ The Core series lowered power consumption while keeping
the frequency

45
Basics of Power Consumption

$$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage} = \alpha_{0 \to 1} C_L V_{dd}^{2} f_{clk} + I_{sc} V_{dd} + I_{leakage} V_{dd}$$

where $\alpha_{0 \to 1}$ is the switching frequency (0 → 1 transitions)

1. Switching power or dynamic power:
Switching: charging and discharging at transitions from 0->1 or 1->0
Short circuit: power due to the brief short-circuit current during a transition
2. Leakage power or static power: leakage power, a per-cycle energy cost

Energy (in J): E = t x P

[Figure: "Leakage vs. Dynamic Power", comparing leakage power and dynamic power at the 250 nm, 180 nm, 130 nm, 100 nm, and 50 nm technology nodes]
Switch from Uniprocessor to Multiprocessor

◼ Utilizing massive on-chip resources to improve throughput
◼ Decrease power consumption
❑ $P_{dynamic} = \alpha_{0 \to 1} C_L V_{dd}^{2} f_{clk}$
❑ Voltage and frequency are positively related

Case 1: one processor runs for time t at voltage V and frequency f
Energy = t x α C V^2 f

Case 2: two processors each run for time t at voltage 0.5V and frequency 0.5f
Energy = 2t x (α C (0.5V)^2 (0.5f))
= 2t x 0.125 α C V^2 f
= 0.25t x α C V^2 f
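The two cases can be compared numerically with a small Python sketch. It assumes, as the slide does, that only dynamic power matters and that both configurations run for the same time t; the function name and the normalized unit values are assumptions for illustration.

```python
def dynamic_energy(time: float, alpha: float, cap: float,
                   vdd: float, freq: float, processors: int = 1) -> float:
    """Energy = processors x time x (alpha x C x Vdd^2 x f), dynamic power only."""
    return processors * time * alpha * cap * vdd ** 2 * freq


# Normalized units: t = alpha = C = V = f = 1 for the single processor.
case1 = dynamic_energy(time=1.0, alpha=1.0, cap=1.0, vdd=1.0, freq=1.0)

# Two processors, each at half the voltage and half the frequency.
case2 = dynamic_energy(time=1.0, alpha=1.0, cap=1.0, vdd=0.5, freq=0.5,
                       processors=2)

ratio = case2 / case1
print(f"Two slow processors use {ratio:.2f} of the single-processor energy")
```

The saving comes mostly from the squared voltage term; leakage power, which the slide's formula leaves out here, would reduce the benefit in practice.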
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance

Constrained by power, instruction-level parallelism,
memory latency
Chapter 1 — Computer
Abstractions and
Technology — 48
Multiprocessors

◼ Multicore microprocessors
❑ More than one processor per chip
◼ Requires explicitly parallel programming
❑ Compare with instruction level parallelism
◼ Hardware executes multiple instructions at once
◼ Hidden from the programmer
❑ Hard to do
◼ Programming for performance
◼ Load balancing
◼ Optimizing communication and synchronization

Multiprocessor Trend

Product        AMD Opteron X4     Intel Nehalem     IBM Power 6     Sun Ultra SPARC T2
               (Barcelona)                                          (Niagara 2)
Cores/Chip     4                  4                 2               8
Clock Rate     2.5 GHz            ~2.5 GHz?         4.7 GHz         1.4 GHz
Power          120 W              ~100 W?           ~100 W?         94 W

Plan of these companies:
Double the number of cores per microprocessor every 2 years.

51
Design Challenges of Multicore Architecture

◼ Parallel programming
❑ Rewrite the originally sequential program to take
advantage of multiple processors
❑ OpenMP, POSIX threads, CUDA
◼ Load balance of processors
❑ How to schedule tasks onto processors?
◼ Communication & synchronization issues

52
Evaluations with SPECspeed 2017 Integer benchmarks on a
1.8 GHz Intel Xeon E5-2650L

SPEC Power Benchmark
◼ Power consumption of server at different
workload levels
❑ Performance: ssj_ops/sec
❑ Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$

SPECpower_ssj2008 for Xeon E5-2650L

Fallacy: Low Power at Idle

◼ Look back at i7 power benchmark


❑ At 100% load: 258W
❑ At 50% load: 170W (66%)
❑ At 10% load: 121W (47%)
◼ Google data center
❑ Mostly operates at 10% – 50% load
❑ At 100% load less than 1% of the time
◼ Consider designing processors to make power
proportional to load
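The percentages in parentheses are each load level's power as a share of peak power, which a few lines of Python can confirm. The function name `percent_of_peak` is an assumption for this sketch; the wattages are the ones listed above.

```python
# load level (%) -> measured power (W), from the i7 power benchmark figures
power_watts = {100: 258, 50: 170, 10: 121}


def percent_of_peak(load: int) -> float:
    """Power drawn at `load` as a percentage of the power drawn at 100% load."""
    return 100.0 * power_watts[load] / power_watts[100]


for load in sorted(power_watts, reverse=True):
    print(f"{load:3d}% load: {power_watts[load]} W "
          f"({percent_of_peak(load):.0f}% of peak power)")
```

A power-proportional design would draw close to 10% of peak power at 10% load, instead of nearly half.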
§1.12 Concluding Remarks
Concluding Remarks

◼ Cost/performance is improving
❑ Due to underlying technology development
◼ Hierarchical layers of abstraction
❑ In both hardware and software
◼ Instruction set architecture
❑ The hardware/software interface
◼ Execution time: the best performance measure
◼ Power is a limiting factor
❑ Use parallelism to improve performance

Reading Assignment

◼ 2.1 ~ 2.7

58
Which of these airplanes has the best performance?
Airplane             Passenger     Cruising range     Cruising speed     Passenger throughput
                     capacity      (miles)            (m.p.h.)           (passengers x m.p.h.)
Boeing 777           375           4630               610                228,750
Boeing 747           470           4150               610                286,700
BAC/Sud Concorde     132           4000               1350               178,200
Douglas DC-8-50      146           8720               544                79,424

• What metric defines performance?
• Capacity, cruising range, or speed?
• Speed? Still has two possible definitions…
• Taking one passenger from one point to another in the least time
• Transporting 450 passengers from one point to another

59
Which one is faster? Concorde or Boeing 747

• Response time of Concorde vs. Boeing 747?
• Concorde is 1350 mph / 610 mph = 2.2 times faster

• Throughput of Concorde vs. Boeing 747?
• Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”

• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time
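The airplane comparison shows that the "best" machine depends on the metric, which the following Python sketch makes concrete. The dictionary layout and variable names are assumptions for illustration; capacities and speeds are taken from the table.

```python
# airplane -> (passenger capacity, cruising speed in m.p.h.)
planes = {
    "Boeing 777": (375, 610),
    "Boeing 747": (470, 610),
    "BAC/Sud Concorde": (132, 1350),
    "Douglas DC-8-50": (146, 544),
}

# Passenger throughput = capacity x speed (passengers x m.p.h.)
throughput = {name: cap * speed for name, (cap, speed) in planes.items()}

fastest = max(planes, key=lambda name: planes[name][1])   # best response time
highest_throughput = max(throughput, key=throughput.get)  # best throughput

speed_ratio = planes["BAC/Sud Concorde"][1] / planes["Boeing 747"][1]
throughput_ratio = throughput["Boeing 747"] / throughput["BAC/Sud Concorde"]

print(f"Fastest: {fastest} ({speed_ratio:.1f}x the 747's speed)")
print(f"Highest throughput: {highest_throughput} "
      f"({throughput_ratio:.1f}x the Concorde's throughput)")
```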

60
FOR YOUR READING

62
Benchmarking the Intel Core i7

◼ Performance & power evaluation by proper
workloads
❑ Workload
◼ set of programs that would run on the processor
❑ For general purpose processors, workloads are
usually formed by programs in SPEC benchmarks

63
SPECINT 2006 running on Intel Core i7

64
SPECpower_ssj2008 running on Intel Xeon
X5650

65
Fallacies and Pitfalls

66
Further Reading

◼ Section 1.13
◼ Section 1.14
❑ Self study

67
Evolution of Intel Microprocessors : 4004

◼ First microprocessor (1971)


❑ For Busicom calculator
◼ Characteristics
❑ 10 µm process
❑ 2300 transistors
❑ 400 – 800 kHz
❑ 4-bit word size

Courtesy of Intel Museum 68


8008

◼ 8-bit follow-on (1972)


❑ Mark 8: Dumb terminals
◼ Characteristics
❑ 10 µm process
❑ 3500 transistors
❑ 500 – 800 kHz
❑ 8-bit word size

Courtesy of Intel Museum 69


8080
◼ 16-bit address bus (1974)
❑ Altair : first personal computer
[Figure caption: Jon's Mark-8 prototype, as seen on the cover of July 1974 Radio Electronics.]
◼ Characteristics

❑ 6 µm process
❑ 4500 transistors
❑ 2 MHz
❑ 8-bit word size

Courtesy of Intel Museum 70


8086 / 8088

◼ 16-bit processor (1978-9)


❑ IBM PC and PC XT
❑ Revolutionary products
❑ Introduced x86 ISA
◼ Characteristics
❑ 3 µm process
❑ 29k transistors
❑ 5-10 MHz
❑ 16-bit word size

Courtesy of Intel Museum 71


80286

◼ Virtual memory (1982)


❑ IBM PC AT
◼ Characteristics
❑ 1.5 µm process
❑ 134k transistors
❑ 6-12 MHz
❑ 16-bit word size

Courtesy of Intel Museum 72


80386

◼ 32-bit processor (1985)


❑ Modern x86 ISA
◼ Characteristics
❑ 1.5-1 µm process
❑ 275k transistors
❑ 16-33 MHz
❑ 32-bit word size

Courtesy of Intel Museum 73


80486

◼ Pipelining (1989)
❑ Floating point unit
❑ 8 KB cache
◼ Characteristics
❑ 1-0.6 µm process
❑ 1.2M transistors
❑ 25-100 MHz
❑ 32-bit word size

Courtesy of Intel Museum 74


Pentium

◼ Superscalar (1993)
❑ 2 instructions per cycle

❑ Separate 8KB I$ & D$

◼ Characteristics
❑ 0.8-0.35 µm process

❑ 3.2M transistors

❑ 60-300 MHz

❑ 32-bit word size

Courtesy of Intel Museum


75
Pentium Pro / II / III

◼ Dynamic execution (1995-9)


❑ 3 micro-ops / cycle
❑ Out of order execution
❑ 16-32 KB I$ & D$
❑ Multimedia instructions
❑ PIII adds 256+ KB L2$
◼ Characteristics
❑ 0.6-0.18 µm process
❑ 5.5M-28M transistors
❑ 166-1000 MHz
❑ 32-bit word size

Courtesy of Intel Museum


76
Pentium 4
◼ Deep pipeline (2001)
❑ Very fast clock
❑ 256-1024 KB L2$
◼ Characteristics
❑ 180 – 90 nm process
❑ 42-125M transistors

❑ 1.4-3.4 GHz

❑ Extended Memory 64 Technology
❑ HyperThreading

Courtesy of Intel Museum


77
Pentium D
◼ Dual core
❑ 2 Pentium 4 cores
❑ 1 MB L2 cache per core
◼ Characteristics
❑ 90 nm process technology
❑ 230 million transistors
❑ 3.2 GHz, 3 GHz, 2.8 GHz, 2.66 GHz
❑ Extended Memory 64 Technology
❑ HyperThreading

Courtesy of Intel Museum
