Lecture 02 CH01 Performance Power

Uploaded by

zB H

Lecture 2

• Assessing and Understanding Performance & Power
Evaluation is the first step
◼ To build a good computer, the first thing you have to do is evaluate how good it is.

[Figure: diagram labeled "Compiler", "Interface", and "Evaluating performance"]
Metrics Matter

◼ Performance
❑ How fast a program can execute

◼ Power
❑ How much energy is consumed

◼ Other metrics
❑ Yield
❑ Cost
❑ Etc.
Outline of this lecture

◼ Performance
❑ Basics of performance evaluations
❑ Basic idea of benchmarks
❑ Making the common case fast!
❑ Reporting performance results
◼ Power
❑ Basics of power/energy evaluations
❑ Reducing energy by going from a uniprocessor to multiprocessors
Performance
◼ Why do we care about performance evaluation?
❑ Purchasing perspective
◼ Given a collection of machines, which has the
❑ best performance?
❑ least cost?
❑ best performance / cost?
❑ best performance / energy?
❑ Design perspective
◼ Faced with design options, which has the
❑ best performance improvement?
❑ least cost?
❑ best performance / cost?

◼ How to measure, report, and summarize performance?

❑ What kind of performance metrics should be utilized?
❑ How to get the performance values?
Two Notions of “Performance”

◼ Response Time (latency)

— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
◼ Throughput
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?

◼ If we upgrade a machine with a new processor, what do we increase?

◼ If we add a new machine to the lab, what do we increase?
Definitions
◼ Performance is in units of things-per-second
❑ bigger is better
◼ If we are primarily concerned with response time
❑ $\text{Performance}(X) = \dfrac{1}{\text{Execution\_Time}(X)}$

"X is n times faster than Y" means

$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_Time}(Y)}{\text{Execution\_Time}(X)}$$
Example of Relative Performance

◼ If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?

(1) Performance_A / Performance_B = Execution_Time_B / Execution_Time_A = n
(2) Performance ratio: 15 / 10 = 1.5
(3) A is 1.5 times faster than B

We will focus primarily on execution time for a single job!
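As a quick check of the definition, the Python sketch below computes n as a ratio of performances for the 10 s vs. 15 s example. The helper names `performance` and `times_faster` are hypothetical, chosen only for illustration; the result matches the inverse ratio of execution times, 15/10.

```python
def performance(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time (bigger is better)."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """n in "X is n times faster than Y" = Performance(X) / Performance(Y)."""
    return performance(time_x_s) / performance(time_y_s)


# Computer A takes 10 s and computer B takes 15 s for the same program.
n = times_faster(10.0, 15.0)
print(f"A is {n:.2f} times faster than B")  # A is 1.50 times faster than B
```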
Metrics for Performance Evaluation

◼ Program execution time
❑ Seconds for a program
❑ Elapsed time
◼ Total time to complete a task, including disk access, I/O, etc.
◼ CPU execution time
❑ Doesn't count I/O or time spent running other programs
❑ Can be broken up into system time and user time

◼ Our focus: user CPU time

❑ Time spent executing the lines of code that are "in" our program
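To see the difference between elapsed time and CPU time in practice, here is a minimal Python sketch using the standard library's `time.perf_counter()` (wall-clock time) and `os.times()` (user and system CPU time of the current process). The workload, a short CPU-bound loop followed by `time.sleep()` as a stand-in for waiting on I/O, is an assumption made for illustration; exact numbers depend on the machine and on the OS clock resolution.

```python
import os
import time


def mixed_workload(iterations: int, sleep_s: float) -> int:
    """CPU-bound loop followed by a sleep (a stand-in for waiting on I/O)."""
    total = 0
    for i in range(iterations):
        total += i * i
    time.sleep(sleep_s)  # the process is not charged CPU time while sleeping
    return total


def measure(fn, *args) -> dict:
    """Return elapsed (wall-clock) time and user/system CPU time, in seconds."""
    start_cpu = os.times()
    start_wall = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - start_wall
    end_cpu = os.times()
    return {
        "elapsed": elapsed,
        "user": end_cpu.user - start_cpu.user,
        "system": end_cpu.system - start_cpu.system,
    }


if __name__ == "__main__":
    r = measure(mixed_workload, 200_000, 0.2)
    print(f"elapsed {r['elapsed']:.3f} s, user {r['user']:.3f} s, system {r['system']:.3f} s")
```

Elapsed time includes the sleep, while user + system CPU time covers only the loop, which is why the lecture focuses on user CPU time when comparing processors.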
CPU Clocking
◼ Operation of digital hardware is governed by a constant-rate clock

[Figure: clock waveform (clock period, clock cycles); within each cycle, data transfer and computation take place, then state is updated]

◼ Clock period: duration of a clock cycle
◼ e.g., 250 ps = 0.25 ns = 250×10^–12 s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
Clock Cycles
◼ Instead of reporting execution time in seconds, we often use cycles

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

◼ cycle time = time between ticks = seconds per cycle

◼ clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)

A 200 MHz clock has a cycle time of $\dfrac{1}{200 \times 10^{6}}\,\text{s} = 5 \times 10^{-9}\,\text{s} = 5$ nanoseconds.
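The relationships above are plain reciprocals. This Python sketch (function names are hypothetical) converts between clock rate and clock period and applies seconds/program = cycles/program × seconds/cycle:

```python
def cycle_time_s(rate_hz: float) -> float:
    """Clock period (seconds per cycle) = 1 / clock rate."""
    return 1.0 / rate_hz


def clock_rate_hz(period_s: float) -> float:
    """Clock rate (cycles per second) = 1 / clock period."""
    return 1.0 / period_s


def execution_time_s(cycles: float, rate_hz: float) -> float:
    """seconds/program = cycles/program x seconds/cycle."""
    return cycles * cycle_time_s(rate_hz)


print(f"200 MHz -> {cycle_time_s(200e6) * 1e9:.1f} ns per cycle")  # 200 MHz -> 5.0 ns per cycle
print(f"250 ps -> {clock_rate_hz(250e-12) / 1e9:.1f} GHz")          # 250 ps -> 4.0 GHz
```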
How to Improve Performance

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

So, to improve performance (everything else being equal) you can either

________ the # of required cycles for a program, or

________ the clock cycle time or, said another way,

________ the clock rate.
Example of Improving Performance
◼ A program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a computer, B, that will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?

(1) CPU execution time_A = CPU clock cycles_A / Clock rate_A
10 s = CPU clock cycles_A / 4 GHz
CPU clock cycles_A = 4 GHz × 10 s = 40 × 10^9 cycles

(2) CPU execution time_B = 1.2 × CPU clock cycles_A / Clock rate_B
6 s = 1.2 × 40 × 10^9 cycles / Clock rate_B
Clock rate_B = 1.2 × 40 × 10^9 cycles / 6 s = 8 GHz
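The two-step calculation above fits in one small function. This Python sketch (the name `required_clock_rate_hz` is hypothetical) assumes, as the example does, that B's cycle count is a fixed multiple of A's:

```python
def required_clock_rate_hz(time_a_s: float, clock_rate_a_hz: float,
                           cycle_factor_b: float, target_time_b_s: float) -> float:
    """Clock rate B must reach to run the program in target_time_b_s."""
    cycles_a = time_a_s * clock_rate_a_hz   # CPU clock cycles_A = time_A x rate_A
    cycles_b = cycle_factor_b * cycles_a    # B needs cycle_factor_b times as many cycles
    return cycles_b / target_time_b_s       # Clock rate_B = cycles_B / time_B


rate_b = required_clock_rate_hz(10.0, 4e9, 1.2, 6.0)
print(f"target clock rate for B: {rate_b / 1e9:.1f} GHz")  # target clock rate for B: 8.0 GHz
```

With a 400 MHz clock for A instead (as in Example #6), the same call returns 800 MHz.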
How many cycles are required for a program?

◼ Could assume that # of cycles = # of instructions

[Figure: timeline in which the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions each take one clock cycle]

This assumption is incorrect: different instructions take different amounts of time on different machines.
Different numbers of cycles for different instructions

[Figure: instruction timeline along a time axis]

• Multiplication takes more time than addition
• Floating-point operations take longer than integer ones
• Accessing memory takes more time than accessing registers
• CPI (cycles per instruction)

CPU clock cycles = Instructions for a program × Average CPI
Performance Equation

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

◼ Performance is determined by execution time

◼ Do any of the other variables equal performance?
❑ # of cycles to execute program?
❑ # of instructions in program?
❑ # of cycles per second?
❑ average # of cycles per instruction?
❑ average # of instructions per second?

◼ MIPS (million instructions per second)

◼ When is it fair to compare two processors using MIPS?

◼ Common pitfall: thinking one of the variables is indicative of performance when it really isn't.
Example #1
◼ Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for a program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program, and by how much?

(1) CPU clock cycles_A = I × 2.0
CPU clock cycles_B = I × 1.2
where I is the number of instructions for this program

(2) CPU time_A = I × 2.0 × 250 ps = 500 × I ps
CPU time_B = I × 1.2 × 500 ps = 600 × I ps

(3) CPU performance_A = 1 / CPU time_A
CPU performance_B = 1 / CPU time_B

(4) Speedup = performance_A / performance_B = CPU time_B / CPU time_A = 600 / 500 = 1.2, so A is 1.2 times faster than B
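The performance equation translates directly into code. In this Python sketch (function and variable names are hypothetical), the instruction count is an arbitrary placeholder because it cancels in the ratio, just as I does in Example #1:

```python
def cpu_time_s(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = Instructions x (Cycles / Instruction) x (Seconds / Cycle)."""
    return instruction_count * cpi * cycle_time_s


instructions = 1e9  # placeholder; any positive value gives the same ratio
time_a = cpu_time_s(instructions, 2.0, 250e-12)  # computer A
time_b = cpu_time_s(instructions, 1.2, 500e-12)  # computer B
speedup = time_b / time_a                        # = Performance_A / Performance_B
print(f"A is {speedup:.1f} times faster than B")  # A is 1.2 times faster than B
```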
How Metrics Mislead

◼ MIPS
❑ Million Instructions per Second
❑ A metric commonly used by vendors to show how high-performance their CPUs are
Example #2 MIPS Performance Measure

Instruction counts (in billions) for each instruction class:

Code from     A    B    C
Compiler 1    5    1    1
Compiler 2   10    1    1

CPI: A = 1, B = 2, C = 3
Clock rate: 4 GHz

• Which code sequence will execute faster according to MIPS (Million Instructions Per Second)?
• Which code sequence will execute faster according to execution time?

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} \qquad \text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$

where C_i is the number of instructions in class i.
Example of MIPS Performance Measure (cont.)

(1) CPU clock cycles_1 = (5×1 + 1×2 + 1×3) × 10^9 = 10 × 10^9
CPU clock cycles_2 = (10×1 + 1×2 + 1×3) × 10^9 = 15 × 10^9

(2) Execution time_1 = 10 × 10^9 / (4 × 10^9) = 2.5 s
Execution time_2 = 15 × 10^9 / (4 × 10^9) = 3.75 s

(3) MIPS_1 = (5 + 1 + 1) × 10^9 / (2.5 × 10^6) = 2800
MIPS_2 = (10 + 1 + 1) × 10^9 / (3.75 × 10^6) = 3200

So, the code from compiler 2 has a higher MIPS rating, but the code from compiler 1 runs faster!
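The MIPS trap can be reproduced in a few lines. This Python sketch (names are hypothetical) encodes the Example #2 table and computes cycles, execution time, and MIPS for each compiler's code:

```python
CLOCK_RATE_HZ = 4e9
CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class, in billions, from the Example #2 table.
COUNTS_BILLIONS = {
    "compiler 1": {"A": 5, "B": 1, "C": 1},
    "compiler 2": {"A": 10, "B": 1, "C": 1},
}


def evaluate(counts_billions: dict) -> tuple:
    """Return (CPU clock cycles, execution time in s, MIPS) for one code sequence."""
    instructions = sum(counts_billions.values()) * 10**9
    cycles = sum(CPI[cls] * n for cls, n in counts_billions.items()) * 10**9
    exec_time_s = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time_s * 10**6)
    return cycles, exec_time_s, mips


for name, counts in COUNTS_BILLIONS.items():
    cycles, exec_time_s, mips = evaluate(counts)
    print(f"{name}: {cycles:.2e} cycles, {exec_time_s:.2f} s, {mips:.0f} MIPS")
# compiler 1: 1.00e+10 cycles, 2.50 s, 2800 MIPS
# compiler 2: 1.50e+10 cycles, 3.75 s, 3200 MIPS
```

The higher MIPS figure comes from compiler 2 issuing many cheap class-A instructions, which raises the instruction rate without shortening the run.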
Example #4
◼ Suppose we have two implementations of the same instruction set architecture (ISA). For some program,
Machine A has a clock cycle time of 10 ns and a CPI of 2.0
Machine B has a clock cycle time of 20 ns and a CPI of 1.2
Which machine is faster for this program, and by how much?

◼ If two machines have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?
Example #4 (cont.)

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

$$\frac{\text{Execution\_Time}(A)}{\text{Execution\_Time}(B)} = \frac{I \times 2.0 \times 10\,\text{ns}}{I \times 1.2 \times 20\,\text{ns}} = \frac{20}{24} \approx 0.83$$

Machine A is therefore 24/20 = 1.2 times faster than machine B.
Example #5

Instruction class   CPI for this instruction class
A                   1
B                   2
C                   3

                 Instruction counts for instruction class
Code sequence    A    B    C
1                2    1    2
2                4    1    1

• Which code sequence executes the most instructions?
• Which will be faster?
• What is the CPI for each sequence?
Example #5 (cont.)

(1) Instruction count_1 = 2 + 1 + 2 = 5
Instruction count_2 = 4 + 1 + 1 = 6

(2) CPU clock cycles_1 = (2×1) + (1×2) + (2×3) = 10
CPU clock cycles_2 = (4×1) + (1×2) + (1×3) = 9

(3) CPI_1 = CPU clock cycles_1 / Instruction count_1 = 10 / 5 = 2
CPI_2 = CPU clock cycles_2 / Instruction count_2 = 9 / 6 = 1.5

Sequence 2 executes more instructions, yet it needs fewer cycles and is therefore faster. When comparing 2 machines, all "3 components" must be considered!
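The same bookkeeping for Example #5, as a small Python sketch (names are hypothetical): instruction count, total cycles, and average CPI for each code sequence.

```python
CPI = {"A": 1, "B": 2, "C": 3}
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts: dict) -> tuple:
    """Return (instruction count, CPU clock cycles, average CPI)."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for seq, counts in SEQUENCES.items():
    ic, cycles, cpi = summarize(counts)
    print(f"sequence {seq}: {ic} instructions, {cycles} cycles, CPI = {cpi}")
# sequence 1: 5 instructions, 10 cycles, CPI = 2.0
# sequence 2: 6 instructions, 9 cycles, CPI = 1.5
```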
Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                                      Inst Count   CPI   Clock Rate
Algorithm                             X            X
Programming Language                  X            X
Compiler                              X            X
ISA (instruction set architecture)    X            X     X
Performance Improvement

$$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

(instruction count for a program) × (CPI: clock cycles per instruction) × (clock cycle time = 1 / clock rate)

◼ As shown in the formula, given an ISA (Instruction Set Architecture), increases in CPU performance can come from three sources:
1. Increases in clock rate
2. Improvements in processor organization that lower the CPI
3. Compiler enhancements that lower the instruction count or generate instructions with a lower average CPI (e.g., by using simpler instructions)
Now you can answer this question:
◼ Is CPU frequency the same as performance?

[Figure: SPECint versus clock rate (MHz) for the Pentium and the Pentium Pro]
How to get IC and CPI values?

◼ Which programs should be run to get the numbers?
❑ Applications/programs that are commonly used on the target machines

◼ Should there be only one program?

❑ More than one application/program may run on the target machines
❑ Workload
◼ Set of applications that commonly run on the target machines
Benchmarks or Benchmark Suites
◼ Set of applications/programs used for evaluating a computer

◼ Small benchmarks
❑ nice for architects and designers
❑ easy to standardize
❑ can be abused
◼ SPEC (originally the System Performance Evaluation Cooperative, now the Standard Performance Evaluation Corporation)
❑ companies have agreed on a set of real programs and inputs
❑ can still be abused
❑ valuable indicator of performance (and compiler technology)
❑ latest CPU suite: SPEC CPU2017
SPEC Benchmarks
◼ CPU
❑ Computation-intensive workloads for testing different CPU architectures
❑ Two major sets:
◼ Integer (SPEC CINT)
◼ Floating point (SPEC CFP)
◼ High Performance Computing, OpenMP, MPI
❑ For testing parallel applications
◼ Power
◼ Web server
◼ More information at http://www.spec.org/
SPEC CINT2000

Now you have several results…
◼ Usually you will have results of
❑ Running the workload on the new machine
❑ Running the workload on the old/reference machine
❑ Execution times of all programs in the workload
◼ How do you report this clearly?

Program of      Exe. time of     Exe. time of   Speedup
the workload    reference (s)    new (s)
A               1000             500            2
B               90               20             4.5
C               600              150            4
D               10               1              10
E               12600            300            42
F               1200             60             20
How to report results clearly?

◼ Which machine is better? By how much?

◼ Which application achieves the best performance improvement on the new machine? By how much?
◼ How much performance gain does the whole workload get?
Comparisons of Reporting Methods

[Figure: "Comparison of Execution Times", a bar chart of the execution time (in seconds) of the reference and the new machine for programs A–F and the average, annotated "Huge differences!"]

• Huge differences in execution times!
• Summarizing the overall workload simply by reporting the average execution time is not fair.
Comparisons of Reporting Methods (cont.)

Average over programs A–F: exe. time of reference 2583 s, exe. time of new 171 s, speedup 13.75

[Figure: bar chart of the speedup for programs A–F and the average]

Better! The average is meaningful.

Huge differences still.
Normalize to the reference machine
◼ Normalization
❑ adjusting values measured on different scales to a notionally common scale
❑ Normalized to the reference machine
→ exe.new / exe.reference

Programs of     Exe. time of   Exe. time of   Speedup   Normalized to
the workload    reference      new                      reference
A               1000           500            2         0.5
B               90             20             4.5       0.22
C               600            150            4         0.25
D               10             1              10        0.1
E               12600          300            42        0.024
F               1200           60             20        0.05
Avg.            2583           171            13.75     0.19

[Figure: bar chart of execution time normalized to reference (0% to 60%) for programs A–F and the average]
Normalize to the reference machine (cont.)

We can easily summarize the performance as

1. On average, the new machine needs only about 19% of the reference execution time, i.e., it cuts execution time by about 80%.
2. The new machine works best on Program E, whose execution time drops by about 98% (normalized time 0.024).
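The three reporting methods can be compared side by side. This Python sketch (names are hypothetical) uses the execution times from the table and the arithmetic mean, as the slides do:

```python
# Execution times in seconds for programs A-F, taken from the table above.
REFERENCE_S = {"A": 1000, "B": 90, "C": 600, "D": 10, "E": 12600, "F": 1200}
NEW_S = {"A": 500, "B": 20, "C": 150, "D": 1, "E": 300, "F": 60}


def mean(values) -> float:
    """Arithmetic mean, the summary statistic used on these slides."""
    values = list(values)
    return sum(values) / len(values)


speedup = {p: REFERENCE_S[p] / NEW_S[p] for p in REFERENCE_S}
normalized = {p: NEW_S[p] / REFERENCE_S[p] for p in REFERENCE_S}  # exe.new / exe.reference
best = max(speedup, key=speedup.get)

print(f"average time: reference {mean(REFERENCE_S.values()):.1f} s, new {mean(NEW_S.values()):.1f} s")
print(f"average speedup: {mean(speedup.values()):.2f}")
print(f"average normalized time: {mean(normalized.values()):.2f}")
print(f"best program: {best} (speedup {speedup[best]:.0f}, normalized {normalized[best]:.3f})")
```

The printed averages (2583.3 s vs. 171.8 s, speedup 13.75, normalized time 0.19) agree with the table's Avg. row, which truncates 171.8 to 171.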
Outline of this lecture

◼ Performance
❑ Basics of performance evaluations
❑ Basic idea of benchmarks
❑ Reporting performance results
❑ Making the common case fast!
◼ Power
❑ Basics of power/energy evaluations
❑ Reducing energy by going from a uniprocessor to multiprocessors
Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected; then:

$$\text{ExTime}(E) = \left((1 - F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$

$$\text{Speedup}(E) = \frac{1}{(1 - F) + \dfrac{F}{S}}$$
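Amdahl’s Law
◼ Floating-point instructions are improved to run 2X faster, but only 10% of actual instructions are FP (assume they also account for 10% of the execution time)

$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{\text{old}}$$

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$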
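Amdahl's Law as a Python function (the name `amdahl_speedup` and its argument checks are illustrative choices, not from the lecture). The second call shows the ceiling 1 / (1 − F) that no amount of FP acceleration can exceed:

```python
def amdahl_speedup(fraction_enhanced: float, factor: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / factor)


print(f"{amdahl_speedup(0.10, 2):.3f}")     # 1.053
# Even an (almost) infinitely fast FP unit is capped at 1 / (1 - F):
print(f"{amdahl_speedup(0.10, 1e12):.3f}")  # 1.111
```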
Eight Great Ideas

◼ Design for Moore’s Law
◼ Use abstraction to simplify design
◼ Make the common case fast
◼ Performance via parallelism
◼ Performance via pipelining
◼ Performance via prediction
◼ Hierarchy of memories
◼ Dependability via redundancy

All of these will be covered in this semester's lectures.
Example #6
◼ Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

$$\frac{\text{Execution\_Time}(A)}{\text{Execution\_Time}(B)} = \frac{10}{6} = \frac{C \times \dfrac{1}{400 \times 10^{6}}}{1.2C \times \dfrac{1}{X}}$$

where C is the number of clock cycles machine A needs and X is the clock rate of machine B. Solving gives X = 1.2 × 400 MHz × 10/6 = 800 MHz.
With Moore’s Law… The Power Wall
Moore’s Law (1965): “The density of transistors in an integrated circuit will double every year.” (in practice, about every 18 months)
◼ Processor performance increases
[Figure: processor performance, 1984–2000 (MIPS R2000, IBM Power1, HP 9000, DEC Alpha, Intel Pentium), with growth rates of 1.35x and 1.58x per year marked]
◼ Power dissipation also increases…
[Figure: power density (W/cm2) of Intel processors from the 4004, 8008, 8080, 8085, 8086, 286, 386, and 486 to the Pentium and P6, 1970–2010 on a log scale, compared with a hot plate, a nuclear reactor, a rocket nozzle, and the Sun’s surface]
The Power Wall Problem
◼ The Pentium 4 sharply increased frequency and power consumption, but gained little in performance
◼ The Core series has lower power consumption while keeping the frequency
Basics of Power Consumption

$$P_{\text{avg}} = P_{\text{switching}} + P_{\text{short-circuit}} + P_{\text{leakage}} = \alpha_{0 \to 1} C_L V_{dd}^{2} f_{clk} + I_{sc} V_{dd} + I_{\text{leakage}} V_{dd}$$

where α_{0→1} is the 0→1 switching frequency (activity factor)

1. Switching power or dynamic power:
Switching: charging and discharging at transitions from 0->1 or 1->0
Short circuit: power due to a brief short-circuit current during a transition
2. Leakage power or static power: caused by leakage current, which flows even when no switching occurs (a per-cycle energy cost)

[Figure: "Leakage vs. Dynamic", leakage power and dynamic power for process technologies from 250 nm to 50 nm, with energy E = t × P]
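Switch from Uniprocessor to Multiprocessor

◼ Utilize massive on-chip resources to improve throughput
◼ Decrease power consumption
❑ $P_{\text{dynamic}} = \alpha_{0 \to 1} C_L V_{dd}^{2} f_{clk}$
❑ Voltage and frequency are positively related

Case 1: Processor 1 alone runs the task for time t

$$\text{Energy} = t \times \alpha C V^{2} f$$

Case 2: Processor 1 and Processor 2 split the task, each running at half the voltage and half the frequency (assuming the work splits evenly, both finish in time t)

$$\text{Energy} = 2t \times \left(\alpha C (0.5V)^{2} (0.5f)\right) = 2t \times 0.125\,\alpha C V^{2} f = 0.25t \times \alpha C V^{2} f$$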
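The Case 1 vs. Case 2 comparison can be checked numerically. This Python sketch (function names are hypothetical) evaluates P = αCV²f and energy = processors × time × power in normalized units, under the same assumptions as the slide: halving the frequency allows halving the voltage, and two processors finish the evenly split work in the same time t.

```python
def dynamic_power(alpha: float, capacitance: float, voltage: float, frequency: float) -> float:
    """P_dynamic = alpha * C * V^2 * f."""
    return alpha * capacitance * voltage**2 * frequency


def energy(num_processors: int, time_s: float, power_per_processor: float) -> float:
    """Energy = (number of processors) * time * power per processor."""
    return num_processors * time_s * power_per_processor


# Normalized units (alpha = C = V = f = t = 1); only the ratio matters here.
alpha, c, v, f, t = 1.0, 1.0, 1.0, 1.0, 1.0
case1 = energy(1, t, dynamic_power(alpha, c, v, f))
case2 = energy(2, t, dynamic_power(alpha, c, 0.5 * v, 0.5 * f))
print(case2 / case1)  # 0.25
```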
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance

Constrained by power, instruction-level parallelism, and memory latency
Chapter 1 — Computer
Abstractions and
Technology — 48
Multiprocessors

◼ Multicore microprocessors
❑ More than one processor per chip
◼ Requires explicitly parallel programming
❑ Compare with instruction level parallelism
◼ Hardware executes multiple instructions at once
◼ Hidden from the programmer
❑ Hard to do
◼ Programming for performance
◼ Load balancing
◼ Optimizing communication and synchronization

Multiprocessor Trend

Product      AMD Opteron X4   Intel Nehalem   IBM Power 6   Sun UltraSPARC T2
             (Barcelona)                                    (Niagara 2)
Cores/Chip   4                4               2             8
Clock Rate   2.5 GHz          ~2.5 GHz?       4.7 GHz       1.4 GHz
Power        120 W            ~100 W?         ~100 W?       94 W

Plan of these companies:

Double the number of cores per microprocessor every 2 years.
Design Challenges of Multicore Architecture

◼ Parallel programming
❑ Rewrite originally sequential programs to take advantage of multiple processors
❑ OpenMP, POSIX threads, CUDA
◼ Load balance of processors
❑ How to schedule tasks onto processors?
◼ Communication & synchronization issues
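Load balancing is easiest to see with a toy example. The Python sketch below rests on stated assumptions: the task run times are hypothetical, tasks are independent, and communication costs are ignored. It compares a naive contiguous split of the task list with a greedy longest-task-first assignment, a standard scheduling heuristic swapped in here for illustration rather than a method this lecture prescribes. The parallel run time is the load of the busiest processor.

```python
import heapq


def block_partition_makespan(task_times, processors):
    """Split the task list into contiguous, equal-count chunks (no balancing)."""
    if not task_times:
        return 0
    chunk = -(-len(task_times) // processors)  # ceiling division
    loads = [sum(task_times[i:i + chunk]) for i in range(0, len(task_times), chunk)]
    return max(loads)


def greedy_makespan(task_times, processors):
    """Longest task first, each assigned to the currently least-loaded processor."""
    heap = [(0, p) for p in range(processors)]  # (load, processor id); already a valid heap
    for t in sorted(task_times, reverse=True):
        load, p = heapq.heappop(heap)
        heapq.heappush(heap, (load + t, p))
    return max(load for load, _ in heap)


tasks = [8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical task run times
print(sum(tasks), block_partition_makespan(tasks, 2), greedy_makespan(tasks, 2))  # 36 26 18
```

With 36 units of work on 2 processors, the contiguous split finishes at 26 (a speedup of only about 1.38), while the greedy assignment reaches the ideal 18 (a speedup of 2).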
Evaluations with SPECspeed 2017 Integer benchmarks on a 1.8 GHz Intel Xeon E5-2650L
SPEC Power Benchmark
◼ Power consumption of a server at different workload levels
❑ Performance: ssj_ops/sec
❑ Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Bigg/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$
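The overall metric is a ratio of sums, not an average of per-level ratios. This Python sketch (the function name is hypothetical) implements the formula for any number of load levels; the input lists are toy numbers, not measurements from any real server.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of ssj_ops over all load levels divided by the sum of power over those levels."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_watts)


# Toy values for three load levels (hypothetical, NOT measured data).
ops = [0, 200, 400]
watts = [100, 150, 250]
print(overall_ssj_ops_per_watt(ops, watts))  # 1.2
```

Note that an idle level contributes power but no operations, so a server that draws a lot of power at idle is penalized by this metric.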
SPECpower_ssj2008 for Xeon E5-2650L

Fallacy: Low Power at Idle

◼ Look back at the i7 power benchmark
❑ At 100% load: 258 W
❑ At 50% load: 170 W (66%)
❑ At 10% load: 121 W (47%)
◼ Google data center
❑ Mostly operates at 10% – 50% load
❑ At 100% load less than 1% of the time
◼ Consider designing processors to make power proportional to load
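The gap between measured and load-proportional power is simple arithmetic on the three readings above. In this Python sketch (names are hypothetical), "proportional" means power scales linearly from zero at idle up to the measured peak, an idealization used only for comparison:

```python
PEAK_WATTS = 258  # measured power at 100% load, from the slide above

# Measured power (W) at each load level, as listed above.
MEASURED_WATTS = {1.00: 258, 0.50: 170, 0.10: 121}


def fraction_of_peak(watts: float, peak_watts: float = PEAK_WATTS) -> float:
    """Measured power as a fraction of the power at 100% load."""
    return watts / peak_watts


def proportional_watts(load: float, peak_watts: float = PEAK_WATTS) -> float:
    """Power an ideally load-proportional server would draw at this load."""
    return load * peak_watts


for load, watts in MEASURED_WATTS.items():
    print(f"{load:4.0%} load: {watts} W measured ({fraction_of_peak(watts):.0%} of peak), "
          f"{proportional_watts(load):.1f} W if proportional")
```

At 10% load the server still draws 121 W, versus 25.8 W for a perfectly proportional design.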
§1.12 Concluding Remarks
Concluding Remarks

◼ Cost/performance is improving
❑ Due to underlying technology development
◼ Hierarchical layers of abstraction
❑ In both hardware and software
◼ Instruction set architecture
❑ The hardware/software interface
◼ Execution time: the best performance measure
◼ Power is a limiting factor
❑ Use parallelism to improve performance

Reading Assignment

◼ 2.1 ~ 2.7

Which of these airplanes has the best performance?

Airplane           Passenger   Cruising range   Cruising speed   Passenger throughput
                   capacity    (miles)          (m.p.h.)         (passengers × m.p.h.)
Boeing 777         375         4630             610              228,750
Boeing 747         470         4150             610              286,700
BAC/Sud Concorde   132         4000             1350             178,200
Douglas DC-8-50    146         8720             544              79,424

[Figure: Concorde]

• What metric defines performance?
• Capacity, cruising range, or speed?
• Speed? Still has two possible definitions…
• Taking one passenger from one point to another in the least time
• Transporting 450 passengers from one point to another
Which one is faster? Concorde or Boeing 747

• Response time of Concorde vs. Boeing 747?
• Concorde is 1350 mph / 610 mph = 2.2 times faster
• Throughput of Concorde vs. Boeing 747?
• Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time
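The two notions of "faster" give opposite winners, which this Python sketch makes explicit using the capacity and cruising-speed columns of the airplane table (names are hypothetical). Response time for a fixed trip is distance / speed, so the speed ratio equals the response-time ratio.

```python
# (passenger capacity, cruising speed in m.p.h.) from the airplane table
PLANES = {
    "Boeing 747": (470, 610),
    "BAC/Sud Concorde": (132, 1350),
}


def throughput(capacity: int, speed_mph: float) -> float:
    """Passenger throughput in passengers x m.p.h."""
    return capacity * speed_mph


concorde_cap, concorde_speed = PLANES["BAC/Sud Concorde"]
boeing_cap, boeing_speed = PLANES["Boeing 747"]

latency_ratio = concorde_speed / boeing_speed  # Concorde's advantage per trip
throughput_ratio = throughput(boeing_cap, boeing_speed) / throughput(concorde_cap, concorde_speed)
print(f"Concorde: {latency_ratio:.1f}x faster per trip")          # Concorde: 2.2x faster per trip
print(f"Boeing 747: {throughput_ratio:.1f}x higher throughput")   # Boeing 747: 1.6x higher throughput
```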
FOR YOUR READING

Benchmarking the Intel Core i7

◼ Performance & power evaluation using proper workloads
❑ Workload
◼ Set of programs that would run on the processor
❑ For general-purpose processors, workloads are usually formed by programs in the SPEC benchmarks
SPECINT 2006 running on Intel Core i7

SPECpower_ssj2008 running on Intel Xeon X5650
Fallacies and Pitfalls

Further Reading

◼ Section 1.13
◼ Section 1.14
❑ Self study

Evolution of Intel Microprocessors: 4004

◼ First microprocessor (1971)
❑ For Busicom calculator
◼ Characteristics
❑ 10 µm process
❑ 2300 transistors
❑ 400 – 800 kHz
❑ 4-bit word size

Courtesy of Intel Museum

8008

◼ 8-bit follow-on (1972)
❑ Mark 8: Dumb terminals
◼ Characteristics
❑ 10 µm process
❑ 3500 transistors
❑ 500 – 800 kHz
❑ 8-bit word size

Courtesy of Intel Museum

8080
◼ 16-bit address bus (1974)
❑ Altair: first personal computer
◼ Characteristics
❑ 6 µm process
❑ 4500 transistors
❑ 2 MHz
❑ 8-bit word size

Jon's Mark-8 prototype, as seen on the cover of July 1974 Radio Electronics.
Courtesy of Intel Museum

8086 / 8088

◼ 16-bit processor (1978-9)
❑ IBM PC and PC XT
❑ Revolutionary products
❑ Introduced x86 ISA
◼ Characteristics
❑ 3 µm process
❑ 29k transistors
❑ 5-10 MHz
❑ 16-bit word size

Courtesy of Intel Museum

80286

◼ Virtual memory (1982)
❑ IBM PC AT
◼ Characteristics
❑ 1.5 µm process
❑ 134k transistors
❑ 6-12 MHz
❑ 16-bit word size

Courtesy of Intel Museum

80386

◼ 32-bit processor (1985)
❑ Modern x86 ISA
◼ Characteristics
❑ 1.5-1 µm process
❑ 275k transistors
❑ 16-33 MHz
❑ 32-bit word size

Courtesy of Intel Museum

80486

◼ Pipelining (1989)
❑ Floating point unit
❑ 8 KB cache
◼ Characteristics
❑ 1-0.6 µm process
❑ 1.2M transistors
❑ 25-100 MHz
❑ 32-bit word size

Courtesy of Intel Museum

Pentium

◼ Superscalar (1993)
❑ 2 instructions per cycle
❑ Separate 8KB I$ & D$
◼ Characteristics
❑ 0.8-0.35 µm process
❑ 3.2M transistors
❑ 60-300 MHz
❑ 32-bit word size

Courtesy of Intel Museum
Pentium Pro / II / III

◼ Dynamic execution (1995-9)
❑ 3 micro-ops / cycle
❑ Out of order execution
❑ 16-32 KB I$ & D$
❑ Multimedia instructions
❑ PIII adds 256+ KB L2$
◼ Characteristics
❑ 0.6-0.18 µm process
❑ 5.5M-28M transistors
❑ 166-1000 MHz
❑ 32-bit word size

Courtesy of Intel Museum
Pentium 4
◼ Deep pipeline (2001)
❑ Very fast clock
❑ 256-1024 KB L2$
◼ Characteristics
❑ 180 – 90 nm process
❑ 42-125M transistors
❑ 1.4-3.4 GHz
❑ Extended Memory 64 Technology
❑ HyperThreading

Courtesy of Intel Museum
Pentium D
◼ Dual core
❑ 2 Pentium 4 cores
❑ 1 MB L2 cache per core
◼ Characteristics
❑ 90 nm process technology
❑ 230 million transistors
❑ 3.2 GHz, 3 GHz, 2.8 GHz, 2.66 GHz
❑ Extended Memory 64 Technology
❑ HyperThreading

Courtesy of Intel Museum
