Chapter 6 Full
6. PENTIUM 4
80386                                           8086
1) It has six types of memory segments, i.e.    1) It has four types of memory segments, i.e.
   CS, DS, ES, FS, GS and SS                       CS, DS, ES and SS
2) Segment size is variable from 1 byte         2) Segment size is variable from 1 byte
   to 4GB                                          to 64KB
7) 80386 can access 4GB of memory               7) 8086 can access 1MB of memory
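The memory limits in the comparison follow from address widths. The sketch below is illustrative only: the 20-bit and 32-bit address widths and the real-mode rule (segment times 16 plus offset) are standard facts about these processors but are not stated on the slide, and the function names are hypothetical.

```python
# Illustrative sketch. Address widths (20-bit for the 8086, 32-bit for the
# 80386) are assumptions added here; the slide states only the 1MB and 4GB limits.

def addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with the given address width."""
    return 1 << address_bits


def real_mode_address(segment: int, offset: int) -> int:
    """8086 physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(addressable_bytes(20) // 2**20, "MB for the 8086")
    print(addressable_bytes(32) // 2**30, "GB for the 80386")
    # A 16-bit offset is what limits an 8086 segment to 64KB.
    print(addressable_bytes(16) // 2**10, "KB maximum 8086 segment")
    print(hex(real_mode_address(0x1234, 0x0010)))
```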
PENTIUM PROCESSOR BASIC FEATURES
The Pentium microprocessor is almost identical to
the earlier 80386 and 80486 microprocessors
[Figure: Pentium II cartridge, showing the cache (512K/1M/2M) and the Pentium II internal bus]
A TYPICAL PENTIUM II SYSTEM
[Figure: block diagram showing the Pentium II cartridge, chipset, AGP slot, SDRAM or DRAM, PCI bus, USB bus, bridge, and ISA bus]
PENTIUM II (CONT...)
L2 cache is no longer inside the µP IC
But placed very close to µP IC
This change makes the µP less expensive
PENTIUM II (CONT...)
Various versions of P II are available
Standard P II
L2 cache operates at half the processor speed
Celeron: does not contain L2 cache in the cartridge
Rather, it is on the main board
Operates at processor speed
Xeon: contains up to 2M (512K/1M/2M) of L2 cache
Operates at processor speed
PENTIUM II (CONT...)
Early P II requires
5.0 V,
3.3 V, and
a variable-voltage power supply for operation
The variable voltage may range from 3.5 V to as low as 1.8 V
Requires 8.4 to 14.2 A, depending on operating
frequency
PENTIUM II (CONT...) MEMORY SYSTEM
36-bit address
64-bit data
RAM used has an access time of 8 ns to 10 ns
Also includes ECC
Parity checking is available, though the P II system does not use it
Transfers between the P II and the memory system are controlled by the
chipset
In fact, the chipset controls the P II, which is a departure from the
traditional role of the processor
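As a quick check of what the 36-bit address and 64-bit data widths imply, the following minimal sketch computes the addressable range and the bytes moved per bus transfer. The constant and function names are hypothetical and chosen only for illustration.

```python
# Minimal sketch: what a 36-bit address bus and a 64-bit data bus imply.
# Names are illustrative; only the two widths come from the slide.

ADDRESS_BITS = 36
DATA_BITS = 64


def address_space_bytes(address_bits: int) -> int:
    """Total bytes addressable with the given number of address bits."""
    return 1 << address_bits


def bytes_per_transfer(data_bits: int) -> int:
    """Bytes moved in one transfer over a data bus of the given width."""
    return data_bits // 8


if __name__ == "__main__":
    print(address_space_bytes(ADDRESS_BITS) // 2**30, "GB addressable")
    print(bytes_per_transfer(DATA_BITS), "bytes per transfer")
```

The 64 GB result matches the upper limit of a 36-bit physical address space.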
PENTIUM II (CONT...)
MEMORY MAP OF A PII BASED SYSTEM
Conventional Memory                      0 – 1M
Application Area                         0 – 640K
System Area                              640K – 1M
Main Memory                              1M – 1G
Optional ISA Memory                      15M – 16M
Remapped AGP Data
PCI Memory                               1G – 4G
AGP Aperture Texture and Instructions
PCI Access to AGP Frame Buffer
PCI Access to AGP Registers
For future expansion                     4G – 64G
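The memory map can be read as a lookup from a physical address to a region. The sketch below encodes only the ranges listed above; treating each range as half-open (start included, end excluded) and modelling the optional ISA window with a flag are assumptions made for illustration, and the names `MEMORY_MAP` and `classify` are hypothetical.

```python
# Sketch of the P II memory map as an address classifier.
# Assumptions: ranges are half-open [start, end); the optional ISA window
# is modelled with a flag. Entries without a stated range are omitted.

K, M, G = 2**10, 2**20, 2**30

MEMORY_MAP = [
    (0, 640 * K, "Application Area"),
    (640 * K, 1 * M, "System Area"),
    (1 * M, 1 * G, "Main Memory"),
    (1 * G, 4 * G, "PCI Memory"),
    (4 * G, 64 * G, "For future expansion"),
]
ISA_WINDOW = (15 * M, 16 * M)


def classify(address: int, isa_window_enabled: bool = False) -> str:
    """Return the name of the region that contains the physical address."""
    if not 0 <= address < 64 * G:
        raise ValueError("address outside the 36-bit physical range")
    if isa_window_enabled and ISA_WINDOW[0] <= address < ISA_WINDOW[1]:
        return "Optional ISA Memory"
    for start, end, name in MEMORY_MAP:
        if start <= address < end:
            return name
    raise AssertionError("unreachable: the map covers the whole range")


if __name__ == "__main__":
    for addr in (0x9FFFF, 0xA0000, 15 * M, 2 * G, 4 * G):
        print(hex(addr), classify(addr))
```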
PENTIUM III
Improved version of the P II, but based on the Pentium Pro
architecture rather than on the P II
Two versions of the P III are available
Packaged in a Slot 1 cartridge, as the P II is, rather than as an
IC chip, with a non-blocking 512K cache running at half
the speed of the processor
Packaged in a 370-pin IC, known as Coppermine, with a
256K advanced transfer cache within the IC
running at processor speed
It has been observed that increasing the cache size
from 256K to 512K improves performance by
only a few percent
PENTIUM III (CONT...)
Chipset differs from that of the P II
Coppermine increases the bus speed to either
100MHz or 133MHz
Bus speed cannot be increased arbitrarily because
of radiation (electromagnetic interference) problems
PENTIUM III (CONT...)
As with the P II, various versions of the P III are available
Standard P III
Celeron PIII uses 66MHz bus speed
Xeon PIII allows larger cache for server
applications
PENTIUM IV
[Figure: block diagram of the Pentium 4]
Based on Pro architecture, not on P II or P III
Released initially in Nov’00 with 1.3GHz speed
Later available with speeds of more than 3GHz
Available in two IC packages using 0.18 micron technology
423-pin PGA
478-pin FC-PGA2
More recent versions use 0.13 or 0.09 micron
technology
These use physically smaller transistors,
making the processor much smaller and faster than the P III
Uses a 100MHz bus clock, quad-pumped to an effective 400MHz transfer rate.
PENTIUM IV (CONT...)
MEMORY INTERFACE
Typically uses the Intel 850 chipset
The 850 provides a dual-pipe memory bus to the
processor
Each pipe interfaces to a 32-bit-wide section of
memory
The two pipes function together to form the 64-bit
data bus
PENTIUM 4
Still translates 80x86 instructions to micro-ops
P4 has a better branch predictor and more functional units (FUs)
Instruction cache holds micro-operations rather than 80x86 instructions
No 80x86 decode stages on a cache hit
Called the “trace cache” (TC)
Faster memory bus: 400 MHz (P4) v. 133 MHz (PIII)
Caches
Pentium III: L1I 16KB, L1D 16KB, L2 256 KB
Pentium 4: L1I 12K uops, L1D 8 KB, L2 256 KB
Block size: PIII 32B v. P4 128B; 128 v. 256 bits/clock
Clock rates:
Pentium III 1 GHz v. Pentium IV 1.5 GHz
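The bus and cache figures above can be turned into rough comparisons. The sketch below assumes a 64-bit data bus for both processors, treats the quoted MHz figures as transfer rates, and uses decimal megabytes; the function names are hypothetical. It gives peak numbers only, not measured performance.

```python
# Back-of-the-envelope comparison using the figures on the slide.
# Assumptions: 64-bit data bus on both processors, MHz figures are
# transfer rates, 1 MB = 10**6 bytes.

def peak_bandwidth_mb_s(bus_width_bits: int, transfer_rate_mhz: float) -> float:
    """Peak bus bandwidth in MB/s: bytes per transfer times transfers per microsecond."""
    return bus_width_bits / 8 * transfer_rate_mhz


def cache_blocks(cache_bytes: int, block_bytes: int) -> int:
    """Number of blocks (lines) in a cache of the given size."""
    return cache_bytes // block_bytes


if __name__ == "__main__":
    p3_bw = peak_bandwidth_mb_s(64, 133)
    p4_bw = peak_bandwidth_mb_s(64, 400)
    print("PIII peak bus bandwidth:", p3_bw, "MB/s")
    print("P4 peak bus bandwidth:  ", p4_bw, "MB/s")
    print("Ratio:", round(p4_bw / p3_bw, 2))
    # Same 256 KB L2 capacity, but different block sizes.
    print("PIII L2 blocks:", cache_blocks(256 * 1024, 32))
    print("P4 L2 blocks:  ", cache_blocks(256 * 1024, 128))
```

With the same 256 KB capacity, the larger P4 block size means fewer, larger blocks, which favors sequential access patterns.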
PENTIUM 4 FEATURES
Multimedia instructions 128 bits wide vs. 64 bits wide => 144 new instructions
When will programs use them?
Faster floating point: executes two 64-bit FP operations per clock
Memory FU: one 128-bit load and one 128-bit store per clock to MMX regs
Uses RAMBUS DRAM
Higher bandwidth, same latency as SDRAM
Cost 2X-3X vs. SDRAM
ALUs operate at 2X clock rate for many ops
Pipeline doesn’t stall at this clock rate: uops replay
Rename registers: 40 (PIII) vs. 128 (P4); Window: 40 (PIII) v. 126 (P4)
BTB: 512 (PIII) vs. 4096 (P4) entries (Intel: 1/3 improvement)
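To see why a larger BTB helps, the toy model below counts hits when a program with many branches is executed twice. It is a sketch under loud assumptions: the table is direct-mapped and indexed by the branch address divided by 4, and branches sit at 4-byte intervals. None of this is taken from the slide or from Intel documentation, and real 80x86 instructions are variable length; only the 512 and 4096 entry counts come from the slide.

```python
# Toy branch target buffer (BTB) model.
# Assumptions (not from the source): direct-mapped organization, index taken
# from the branch address divided by 4, branches placed every 4 bytes.

class DirectMappedBTB:
    def __init__(self, entries: int) -> None:
        self.entries = entries
        # Each slot holds (branch_pc, target) or None when empty.
        self.table = [None] * entries

    def lookup_and_update(self, pc: int, target: int) -> bool:
        """Return True on a hit, then store this branch in its slot."""
        index = (pc >> 2) % self.entries
        hit = self.table[index] == (pc, target)
        self.table[index] = (pc, target)
        return hit


def hits_on_second_pass(entries: int, num_branches: int) -> int:
    """Run the same branch sequence twice and count hits on the second pass."""
    btb = DirectMappedBTB(entries)
    branches = [(4 * i, 4 * i + 64) for i in range(num_branches)]
    for pc, target in branches:          # first pass warms the table
        btb.lookup_and_update(pc, target)
    return sum(btb.lookup_and_update(pc, target) for pc, target in branches)


if __name__ == "__main__":
    for size in (512, 4096):
        print(size, "entries:", hits_on_second_pass(size, 2048), "hits of 2048")
```

In this model the 2048 branches overflow a 512-entry table and evict each other before they are reused, while a 4096-entry table retains all of them.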
BASIC PENTIUM 4 PIPELINE
BTB = Branch Target Buffer (branch predictor)
I-TLB = Instruction TLB, Trace Cache = Instruction cache
RF = Register File; AGU = Address Generation Unit
"Double pumped ALU" means ALU clock rate 2X => 2X ALU F.U.s
From “Pentium 4 (Partially) Previewed,” Microprocessor Report, 8/28/00