
Module 4 (1)

The document provides an overview of the Pentium microprocessor, detailing its architecture, features, and operational capabilities, including its superscalar design and dual integer pipelines. It highlights the advantages of separate caches for code and data, branch prediction mechanisms, and the integrated floating point unit. Additionally, it compares the Pentium processor with its predecessor, the 80386, emphasizing improvements in performance and functionality.

Uploaded by

hppatilhpp

MODULE 4:

PENTIUM PROCESSOR
MODULE 4 CONTENTS

1. Pentium Architecture
2. Superscalar Operation
3. Integer & Floating-Point Pipeline Stages
4. Branch Prediction Logic
5. Cache Organization
6. MESI Protocol
Pentium Microprocessor

• The Pentium microprocessor is one of the most powerful members of Intel's x86 microprocessor family. It is an advanced superscalar 32-bit microprocessor, introduced in 1993, that contains around 3.1 million transistors.
• It has a 64-bit data bus and a 32-bit address bus that offers 4 GB of physical memory space.
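
The 4 GB figure follows directly from the width of the address bus, and the 64-bit data bus fixes how many bytes move per bus transfer. The short sketch below only reproduces that arithmetic; the helper name addressable_bytes is illustrative and does not come from the slides.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given bus width."""
    return 2 ** address_bits

GB = 2 ** 30  # one gigabyte in the binary sense used for memory sizes

pentium_space = addressable_bytes(32)   # 32-bit address bus
bytes_per_transfer = 64 // 8            # 64-bit data bus moves 8 bytes at a time

print(pentium_space // GB, "GB of physical address space")
print(bytes_per_transfer, "bytes per bus transfer")
```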


FEATURES OF PENTIUM PROCESSOR
• Separate instruction and data caches.
• Dual integer pipelines, i.e. U-pipeline and V-pipeline.
• Branch prediction using the branch target buffer (BTB).
• Pipelined floating point unit.
• 64-bit external data bus.
• Even-parity checking is implemented for the data bus, caches and TLBs.


SALIENT FEATURES
• 32-bit superscalar and super-pipelined architecture CISC processor.
• 32-bit address bus can address up to 4 GB of physical memory.
• 64-bit data bus, so 8 bytes can be transferred to or from memory in a single bus cycle.
• Two integer pipelines, U and V, with two ALUs provide one-clock execution for core instructions, which improves instruction execution time.
• Five-stage pipeline enables multiple instructions to execute in parallel with high efficiency.
• Two 8 KB cache memories, one for data and the other for code.
• On-chip pipelined floating point coprocessor.
• Supports power management features, i.e. System Management Mode and clock control.
• Enhanced branch prediction buffer.
• On-chip memory management unit.
• Supports multiprogramming and multitasking.
• Internal error detection features.
• 4 MB pages for increased TLB hit rate.
• Support for a second-level cache and the write-back MESI protocol in the data cache.
• Supports bus cycle pipelining, address parity and internal parity checking, functional redundancy checking, execution tracing and performance monitoring.
Pentium Registers

• Four 32-bit registers can be used as
∗ Four 32-bit registers (EAX, EBX, ECX, EDX)
∗ Four 16-bit registers (AX, BX, CX, DX)
∗ Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL)
• Some registers have special use
∗ ECX for count in loop instructions
• Two index registers
∗ 16- or 32-bit registers
∗ Used in string instructions
» Source (SI) and destination (DI)
∗ Can be used as general-purpose data registers
• Two pointer registers
∗ 16- or 32-bit registers
∗ Used exclusively to maintain the stack
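
The way one 32-bit register doubles as a 16-bit register and two 8-bit registers can be sketched with bit masks. The class Reg32 and its method names are hypothetical helpers made up for this illustration; only the register layout itself (for example EAX containing AX, AH and AL) comes from the x86 architecture.

```python
class Reg32:
    """A 32-bit register with 16-bit and 8-bit views (illustrative helper)."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    @property
    def x(self):
        """Low 16 bits, e.g. AX inside EAX."""
        return self.value & 0xFFFF

    @property
    def h(self):
        """Bits 8 to 15, e.g. AH."""
        return (self.value >> 8) & 0xFF

    @property
    def l(self):
        """Bits 0 to 7, e.g. AL."""
        return self.value & 0xFF

    def set_l(self, byte):
        """Write the low byte without disturbing the upper 24 bits."""
        self.value = (self.value & 0xFFFFFF00) | (byte & 0xFF)


eax = Reg32(0x12345678)
print(hex(eax.x), hex(eax.h), hex(eax.l))   # AX, AH and AL views of EAX
eax.set_l(0xFF)                             # writing AL changes only the low byte
print(hex(eax.value))
```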
Superscalar Execution

• The Pentium supports a superscalar pipeline architecture.
• The Pentium processor sends two instructions in parallel to two independent integer pipelines, known as the U and V pipelines, so that multiple instructions execute concurrently.
• A processor capable of executing multiple instructions in parallel is known as a superscalar machine.
• Each of these pipelines, i.e. U and V, has 5 stages of execution, i.e.
⚫ Prefetch
⚫ First Decode
⚫ Second Decode
⚫ Execute
⚫ Write Back
1. Pre-fetch (PF): In this stage, instructions are prefetched from the on-chip instruction cache into the prefetch buffer, which feeds the U and V pipelines.
2. First Decode (D1): In this stage, two decoders decode the instructions to generate a control word and try to pair them so they can run in parallel.
3. Second Decode (D2): In this stage, the CPU decodes the control word and calculates the address of the memory operand.
4. Execute (EX): The instruction is executed in the ALU and the data cache is accessed at this stage. Instructions that need both an ALU operation and a data cache access require more than one clock.
5. Write Back (WB): In this stage, the CPU stores the result and updates the flags.
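
The benefit of running two five-stage pipelines side by side can be estimated with a small sketch. It assumes an ideal case with no stalls, and the pairing test is deliberately simplified to a register dependency check; the real Pentium pairing rules are more restrictive. The function names and the instruction format (a dict with "dest" and "srcs") are assumptions made for this example.

```python
import math

STAGES = ["PF", "D1", "D2", "EX", "WB"]

def clocks_single_pipe(n):
    """Clocks to complete n instructions in one 5-stage pipeline, no stalls."""
    return len(STAGES) + (n - 1) if n > 0 else 0

def clocks_dual_pipe(n):
    """Best case for U and V together: every instruction pairs with its neighbour."""
    return len(STAGES) + (math.ceil(n / 2) - 1) if n > 0 else 0

def can_pair(first, second):
    """Simplified pairing test: the second instruction must neither read
    nor write the register that the first instruction writes."""
    dest = first["dest"]
    return dest not in second["srcs"] and dest != second["dest"]

add_eax = {"dest": "eax", "srcs": ["ebx"]}
add_ecx = {"dest": "ecx", "srcs": ["edx"]}
use_eax = {"dest": "edx", "srcs": ["eax"]}

print(clocks_single_pipe(10), clocks_dual_pipe(10))
print(can_pair(add_eax, add_ecx))   # independent, so they may issue together
print(can_pair(add_eax, use_eax))   # second reads eax, so it must wait
```

With ten instructions the single pipeline needs 14 clocks and the ideal dual issue needs 9, which shows why the D1 stage tries to pair instructions whenever it can.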
SEPARATE CODE AND DATA CACHES:
• The Pentium has two separate 8 KB caches for code and data, as the superscalar design and branch prediction need more bandwidth than a unified cache can provide.
• The data cache has a dedicated TLB to translate linear addresses into the physical addresses used by the data cache.
• The code cache is responsible for getting raw instructions into the execution units of the Pentium processor; hence instructions are fetched from the code cache.
ADVANTAGES OF HAVING SEPARATE CODE AND DATA CACHES
• Separate caches support efficient branch prediction.
• Caches raise system performance, i.e. an internal read request is performed more quickly than a bus cycle to memory.
• Separate caches also reduce the processor's use of the external bus when the same locations are accessed multiple times.
• Separate caches for instructions and data allow simultaneous cache look-up.
• Up to two data references and up to 32 bytes of raw op-codes can be accessed in one clock.
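
To make the 8 KB cache size concrete, the sketch below splits a 32-bit address into tag, set index and byte offset. The slides give only the cache size; the 32-byte line and the 2-way set-associative organisation are assumptions taken from the commonly documented Pentium cache parameters, and the function name split_address is illustrative.

```python
CACHE_BYTES = 8 * 1024   # from the slides: 8 KB per cache
LINE_BYTES = 32          # assumption: 32-byte cache line
WAYS = 2                 # assumption: 2-way set associative

SETS = CACHE_BYTES // (LINE_BYTES * WAYS)   # number of sets in the cache
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2 of the line size
INDEX_BITS = SETS.bit_length() - 1          # log2 of the set count

def split_address(addr):
    """Return (tag, set index, byte offset) for a 32-bit address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(SETS, "sets;", "tag", hex(tag), "index", index, "offset", offset)
```

Because the code cache and the data cache each have their own sets and tags, an instruction fetch and a data access can be looked up in the same clock without competing for one array.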


BRANCH PREDICTION
• Branch prediction is used to predict the most likely set of instructions to be executed and to prefetch them, so that they are available to the pipeline when they are needed.
• For this purpose the Pentium processor incorporates a branch target buffer (BTB), an associative memory that improves performance when a branch instruction is taken.
• Branch instructions change the normal sequential control flow of program execution and may stall the pipeline.
• Branch instructions are of two types, i.e. conditional and unconditional.
• For a conditional branch, the CPU has to wait until the execute stage to determine whether the condition is satisfied or not.
• Rather than stalling until then, the CPU uses the branch prediction algorithm to guess whether the branch will be taken, which speeds up instruction execution.
• In the Pentium processor, the BTB has 256 entries, which contain the branch target addresses of previously executed branches.
• The BTB is a four-way set-associative on-chip memory.
• Whenever a branch occurs, the CPU checks the branch instruction address and the destination address in the BTB.
• When the instruction is decoded, the CPU searches the branch target buffer to decide whether an entry exists for the corresponding branch instruction.
• On a BTB hit, i.e. if such an entry exists, the CPU uses the history to decide whether the branch will be taken or not.
• If the history in the BTB entry predicts that the branch will be taken, the CPU fetches instructions from the target address and decodes them.
• If the branch prediction is correct, execution continues; otherwise the CPU clears the pipeline and fetches from the appropriate target address.
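
The look-up and update cycle described above can be sketched as a small model. The 256 entries and four ways come from the slides; the 2-bit history counter, the starting counter value, the oldest-first replacement and the class name BranchTargetBuffer are assumptions chosen to keep the example short, not a description of the actual hardware.

```python
ENTRIES = 256            # from the slides
WAYS = 4                 # from the slides: four-way set associative
SETS = ENTRIES // WAYS   # 64 sets

class BranchTargetBuffer:
    def __init__(self):
        # one dict per set, mapping branch address -> target and history
        self.sets = [dict() for _ in range(SETS)]

    def predict(self, branch_addr):
        """Return the predicted target, or None on a miss or 'not taken'."""
        entry = self.sets[branch_addr % SETS].get(branch_addr)
        if entry is not None and entry["history"] >= 2:   # 2 or 3 means taken
            return entry["target"]
        return None

    def update(self, branch_addr, taken, target):
        """Record the real outcome once the branch has executed."""
        ways = self.sets[branch_addr % SETS]
        entry = ways.get(branch_addr)
        if entry is None:
            if not taken:
                return                        # only taken branches are allocated
            if len(ways) == WAYS:
                del ways[next(iter(ways))]    # assumed policy: evict the oldest
            ways[branch_addr] = {"target": target, "history": 2}
        elif taken:
            entry["target"] = target
            entry["history"] = min(3, entry["history"] + 1)
        else:
            entry["history"] = max(0, entry["history"] - 1)

btb = BranchTargetBuffer()
first = btb.predict(0x1000)            # miss: branch not seen before
btb.update(0x1000, True, 0x2000)       # branch was taken, entry allocated
second = btb.predict(0x1000)           # hit: history predicts taken
btb.update(0x1000, False, 0x2000)      # branch fell through this time
third = btb.predict(0x1000)            # history now predicts not taken
print(first, hex(second), third)
```

A wrong prediction in this model corresponds to the case where the CPU must clear the pipeline and refetch from the correct address.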
FLOATING POINT UNIT
• The Pentium contains an on-chip floating point unit that provides a significant floating point performance advantage over previous generations of processors.
• By placing the coprocessor on the same chip as the processor, the Pentium allows faster communication and quicker execution.
• Thus many floating point instructions require fewer clock cycles than on the previous 80x87.


FLOATING POINT PIPELINE
• The floating point pipeline of the Pentium consists of eight stages, which are used to speed up execution in the floating point unit.
• These are prefetch, first decode, second decode, operand fetch, first execute, second execute, write float and error reporting.
• The eight-stage pipeline gives single-cycle execution for many floating point instructions, such as floating point add, subtract, multiply and compare.
• The stages and their functions are given below:
1. PF: Instructions are prefetched from the on-chip instruction cache.
2. D1: The instruction is decoded to generate a control word. A single control word causes direct execution of a simple instruction, while complex instructions require a micro-coded control sequence.
3. D2: The addresses of memory-resident operands are calculated.
4. EX: In this stage, a register read, memory read or memory write operation is performed to access an operand as required by the instruction.
5. X1: In this stage, the floating point data from a register or memory is written into a floating point register and converted to floating point format before being loaded into the floating point unit.
6. X2: In this stage, the floating point operation is performed by the floating point unit.
7. WF: In this stage, the results of the floating point operation are rounded and then written to the destination floating point register.
8. ER: In this stage, if any error occurred during floating point execution, it is reported and the FPU status word is updated.
FLOATING POINT EXCEPTION
• Six floating point exception conditions can arise while executing floating point instructions; they are reported in the status word register of the Pentium FPU.
• These exceptions are:

1. Invalid operation:
⚫ Stack overflow or underflow
⚫ Invalid arithmetic operation
The stack fault flag (SF) of the FPU status word indicates the type of invalid operation: SF=1 for a stack overflow or underflow, and SF=0 when an arithmetic instruction has encountered an invalid operand.
2. Divide by zero: This exception occurs whenever an instruction attempts to divide a finite non-zero operand by 0.
3. De-normalized operand exception: The de-normal operand exception occurs if an arithmetic instruction attempts to operate on a de-normal operand, or if an attempt is made to load a de-normal single or double real value into an FPU register.
4. Numeric overflow exception: This exception occurs whenever the rounded result of an arithmetic instruction exceeds the largest possible finite value that will fit into the real format of the destination operand.
5. Numeric underflow exception: This exception occurs whenever the rounded result of an arithmetic instruction is less than the smallest possible normalized, finite value that will fit into the real format of the destination operand.
6. Inexact result (precision) exception: This exception occurs if the result of an operation is not exactly representable in the destination format.
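
How software might read these conditions out of the status word can be sketched as a small decoder. The bit positions used here (exception flags in bits 0 to 5, stack fault flag in bit 6) follow the x87 status word layout rather than anything stated in the slides, and the function name decode_status is illustrative.

```python
# Exception flag bits in the low byte of the x87 FPU status word
EXCEPTION_FLAGS = {
    0: "IE (invalid operation)",
    1: "DE (de-normalized operand)",
    2: "ZE (divide by zero)",
    3: "OE (numeric overflow)",
    4: "UE (numeric underflow)",
    5: "PE (inexact result / precision)",
}
SF_BIT = 6  # stack fault flag, meaningful when IE is set

def decode_status(status_word):
    """List the exceptions flagged in a status word value."""
    names = [name for bit, name in EXCEPTION_FLAGS.items()
             if (status_word >> bit) & 1]
    if status_word & 1:  # IE set: SF tells the two invalid-operation cases apart
        if (status_word >> SF_BIT) & 1:
            names.append("cause: stack overflow or underflow")
        else:
            names.append("cause: invalid arithmetic operand")
    return names

print(decode_status(0b1000001))   # IE with SF=1
print(decode_status(0b0100100))   # ZE and PE
```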
COMPARISON OF 80386 AND PENTIUM

80386 | Pentium
32-bit integer core CPU with 32-bit data bus | 32-bit CPU with 64-bit data bus
No superscalar architecture and no single-cycle execution | Superscalar architecture, i.e. two pipelined integer units capable of 2 instructions per clock
No internal cache available for data and code | Separate 8 KB code and 8 KB data caches available
Does not support branch prediction | Advanced design feature, i.e. dynamic branch prediction
One integer ALU | Two integer ALUs
Operating frequency is 20 MHz to 66 MHz | Operating frequency is 60 MHz and more
FPU is non-pipelined as it is an external device (80387) | FPU is pipelined as it is built into the Pentium
PENTIUM PRO FEATURES
• The Pentium Pro has a performance about 50% higher than a Pentium of the same clock speed.
• Super-pipelining: 14-stage pipeline, compared with the 5 stages of the Pentium processor.
• Integrated Level 2 cache: 256 KB static RAM on-chip, coupled to the core processor through a full-clock-speed, 64-bit cache bus.
• 32-bit optimization: Optimized for running the 32-bit code used in Windows NT.
• Wider address bus: 36-bit address bus, which can address 2^36 bytes = 64 GB of physical address space.
• Greater multiprocessing: Multi-processor systems of up to 4 Pentium Pro processors.
• Out-of-order completion: An out-of-order execution mechanism called dynamic execution.
• Superior branch prediction unit: The branch target buffer (BTB) is double the size of that in the Pentium processor, which increases its accuracy.
• Register renaming: Improves parallel performance of the pipelines.
• Speculative execution: Reduces pipeline stall time in its RISC core.
• Dynamic data flow analysis: Real-time analysis of the flow of data through the processor to determine data and register dependencies and to detect opportunities for out-of-order instruction execution.
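
The idea behind data flow analysis and register renaming can be illustrated with a toy dependency finder. It is a simplified sketch: instructions are modelled as (destination, sources) tuples, every earlier instruction is compared with every later one, and the function name find_dependencies is made up for this example. It does not model how the Pentium Pro hardware actually tracks dependencies.

```python
def find_dependencies(program):
    """Return (earlier, later, kind) for each register dependency.

    RAW = read after write (a true data dependency),
    WAR = write after read, WAW = write after write
    (the two kinds that register renaming can remove).
    """
    deps = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            if dest_i in srcs_j:
                deps.append((i, j, "RAW"))
            if dest_j in srcs_i:
                deps.append((i, j, "WAR"))
            if dest_i == dest_j:
                deps.append((i, j, "WAW"))
    return deps

program = [
    ("eax", ("ebx", "ecx")),   # 0: eax = ebx op ecx
    ("edx", ("eax",)),         # 1: edx = f(eax), needs the result of 0
    ("ebx", ("esi",)),         # 2: overwrites ebx, which 0 reads
    ("eax", ("edi",)),         # 3: overwrites eax again
]

deps = find_dependencies(program)
true_deps = [d for d in deps if d[2] == "RAW"]
print(deps)
print(true_deps)
```

Only the RAW dependency between instructions 0 and 1 forces an ordering; once renaming gives instructions 2 and 3 fresh physical registers, they are free to execute out of order.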
