Module 4 part-II (1)

The document outlines the features and architecture of the Pentium processor, highlighting its 32-bit microprocessor capabilities, including a 66-99 MHz frequency and advanced pipelining techniques. It explains the concept of superscalar processors, which can execute multiple instructions per clock cycle, and details the various functional units within the Pentium architecture. Additionally, it covers branch prediction mechanisms, cache memory types, and their performance, emphasizing the importance of cache in improving data retrieval efficiency.


MODULE 4:

PENTIUM PROCESSOR
Features of Pentium Processor

 The Pentium processor offers the following features:
1. It is a 32-bit microprocessor.
2. It operates at a 66-99 MHz clock frequency.
3. It has a 64-bit data bus.
4. It has eight memory banks.
5. It has a 32-bit address bus: 2^32 = 4 GB of physical memory.
6. It has a 5-stage integer pipeline.
7. It is 2-way superscalar, with two integer pipelines (U and V).
8. It has an on-chip floating-point unit.
9. It has an 8-stage floating-point pipeline.
10. It has branch prediction logic.
11. It has an on-chip split L1 cache (separate code and data caches).
What are Superscalar Processors?

 Superscalar processors are a category of microprocessors that take a parallel approach to instruction execution: through instruction-level parallelism, more than one instruction is executed in a single clock cycle. They are often described as second-generation RISC processors, because RISC designs are the ones that operate faster with reduced instruction sets.

 Unlike scalar processors that have the ability to execute


maximal one instruction per clock cycle, the superscalar
processor uses the approach of simultaneously executing
two instructions in one clock cycle. The superscalar
processors perform this task by sending multiple
instructions to various execution units at the same time.
Hence this provides high throughput.
 Note that superscalar processors are generally also pipelined. However, pipelining differs from superscalar execution: a superscalar design executes multiple instructions in parallel on multiple execution units, whereas pipelining divides a single execution unit into multiple stages so that several instructions can be in flight at once.
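As a rough illustration of the throughput difference described above, the following sketch counts clock cycles under idealized assumptions (no hazards, no stalls, perfect instruction pairing), which real code rarely achieves:

```python
import math

def pipelined_cycles(n, stages=5):
    # Scalar 5-stage pipeline: fill the pipeline once,
    # then retire one instruction per cycle.
    return stages + (n - 1)

def superscalar_cycles(n, stages=5, width=2):
    # 2-way superscalar (U and V pipes): up to `width`
    # instructions retire per cycle after the pipeline fills.
    return stages + math.ceil(n / width) - 1

print(pipelined_cycles(100))    # 104 cycles
print(superscalar_cycles(100))  # 54 cycles
```

For a long run of instructions the ideal speedup approaches the issue width (here, 2).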
Architecture of Pentium Microprocessor
 The various functional units are as follows:
1. Bus unit
2. Paging unit
3. Control ROM
4. Prefetch buffer
5. Execution unit with two integer pipelines (U-pipe
and V-pipe)
6. Code cache
7. Data cache
8. Instruction decode
9. Branch target buffer
10. Dual processing logic
11. Advanced programmable interrupt controller
 Let us now understand how the architectural operation takes place.
 The Pentium processor has two primary operating modes:

 Protected Mode - In this mode all instructions and architectural features are available, providing the highest performance and capability. This is the recommended mode that all new applications and operating systems should target.

 Real-Address Mode - This mode provides the programming environment of the Intel 8086 processor, with a few extensions. Reset initialization places the processor in real mode, from where it can switch to protected mode with a single instruction.
Integer Pipeline Stages

 The Pentium's basic integer pipeline is five stages long, broken down as follows:
 Prefetch/Fetch: Instructions are fetched from the instruction cache and aligned in prefetch buffers for decoding.
 Decode1: Instructions are decoded into the Pentium's internal instruction format. Branch prediction also takes place at this stage.
 Decode2: Decoding continues, and the microcode ROM kicks in here if necessary. Address computations also take place at this stage.
 Execute: The integer hardware executes the instruction.
 Write-back: The results of the computation are written back to the register file.
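The overlap of these five stages can be sketched as a simple schedule. The stage abbreviations below follow the list above; the scheduling model is an idealized one-instruction-per-stage-per-cycle assumption:

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]  # the five stages listed above

def pipeline_schedule(n_instr):
    """For each clock cycle, report which instruction (by index)
    occupies each stage of the five-stage integer pipeline."""
    total_cycles = len(STAGES) + n_instr - 1
    schedule = []
    for cycle in range(total_cycles):
        # instruction i is in stage s during cycle i + s
        row = {stage: cycle - s for s, stage in enumerate(STAGES)
               if 0 <= cycle - s < n_instr}
        schedule.append(row)
    return schedule

for cycle, row in enumerate(pipeline_schedule(3)):
    print(cycle, row)
```

Three instructions occupy the pipeline for 5 + 3 - 1 = 7 cycles, with instruction 0 reaching Write-back at cycle 4.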
Floating Point Unit:

 There are 8 general-purpose 80-bit floating-point registers. The floating-point unit (FPU) has 8 pipeline stages; the first five are similar to those of the integer unit. Since the possibility of error is greater in the FPU than in the integer unit, the FPU has an additional error-checking stage. The abbreviations used for the FPU's functional blocks are:
 FRD - Floating Point Rounding
 FDD - Floating Point Division
 FADD - Floating Point Addition
 FEXP - Floating Point Exponent
 FAND - Floating Point And
 FMUL - Floating Point Multiply
Branch Prediction in Pentium

 Why do we need branch prediction?

1. The gain produced by pipelining can be reduced by the presence of program-transfer instructions, e.g. JMP, CALL, RET, etc.
2. These instructions change the execution sequence, invalidating all the instructions that entered the pipeline after them.
3. Thus no useful work is done while the pipeline stages are reloaded.
Branch prediction logic

 To avoid this problem, the Pentium uses a scheme called Dynamic Branch Prediction. In this scheme, a prediction is made for the branch instruction currently in the pipeline: either taken or not taken. If the prediction is correct, the pipeline is not flushed and no clock cycles are lost. If the prediction is wrong, the pipeline is flushed and restarts with the correct instruction.
 It is implemented using a 4-way set-associative cache with 256 entries, called the Branch Target Buffer (BTB). The directory entry for each line consists of:
 Valid bit: indicates whether the entry is valid or not.
 History bits: track how often the branch has been taken.
 Source memory address: the address from which the branch instruction was fetched. If the directory entry is valid, the target address of the branch is stored in the corresponding data entry in the BTB.
Working of Branch Prediction

1. The BTB is a lookaside cache that sits beside the Decode Instruction (DI) stage of the two pipelines and monitors for branch instructions.
2. The first time a branch instruction enters the pipeline, the BTB uses the instruction's source memory address to perform a lookup in the cache.
3. Since the instruction has never been seen before, this is a BTB miss. The branch is predicted not taken, even if it is an unconditional jump instruction.
4. When the instruction reaches the execution unit (EU), the branch is either taken or not taken. If taken, the next instruction to be executed is fetched from the branch target address. If not taken, instructions continue to be fetched sequentially.
5. When a branch is taken for the first time, the execution unit provides feedback to the branch prediction logic: the branch target address is sent back and recorded in the BTB.
6. A directory entry is made containing the source memory address, and the history bits are set to "strongly taken".
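The steps above can be sketched as a small model. This is a simplification, not the exact Pentium hardware: it ignores the 4-way set-associative organization and models the history bits as a single 2-bit saturating counter per entry (values 0-3, where 2 or more predicts "taken"); the dictionary-based BTB structure is an assumption for illustration:

```python
class BranchTargetBuffer:
    def __init__(self):
        # source address -> {"target": branch target, "history": 0..3}
        self.entries = {}

    def predict(self, source_addr):
        """Return (taken?, target). A miss predicts not-taken (step 3)."""
        entry = self.entries.get(source_addr)
        if entry is None:
            return False, None
        return entry["history"] >= 2, entry["target"]

    def update(self, source_addr, taken, target):
        """Feedback from the execution unit (steps 5 and 6)."""
        entry = self.entries.get(source_addr)
        if entry is None:
            if taken:  # first taken branch: record entry as strongly taken
                self.entries[source_addr] = {"target": target, "history": 3}
            return
        if taken:
            entry["history"] = min(3, entry["history"] + 1)
        else:
            entry["history"] = max(0, entry["history"] - 1)

btb = BranchTargetBuffer()
print(btb.predict(0x1000))        # (False, None): first sight, a BTB miss
btb.update(0x1000, True, 0x2000)  # execution unit reports the branch taken
print(btb.predict(0x1000))        # (True, 0x2000): now predicted taken
```

The saturating counter means a single not-taken outcome does not immediately flip a "strongly taken" prediction.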
Cache Memory

 Cache memory is a chip-based computer component that makes retrieving data from the computer's memory more efficient. It acts as a temporary storage area from which the computer's processor can retrieve data easily. This temporary storage area, known as a cache, is more readily available to the processor than the computer's main memory source, typically some form of DRAM.
 Cache memory is a special, very high-speed memory. It is used to speed up the system and to keep pace with the high-speed CPU.
 Cache memory is costlier than main memory or disk memory but more economical than CPU registers. It is an extremely fast memory type that acts as a buffer between RAM and the CPU.
 It holds frequently requested data and instructions so that they are immediately available to the CPU when needed.
 Cache memory is used to reduce the average time to access data from main memory. The cache is a smaller, faster memory which stores copies of the data from frequently used main memory locations. There are various independent caches in a CPU, which store instructions and data.
Levels of memory/Types of Memory

 Level 1 or Registers –
Registers are memory locations built into the CPU itself, in which data is stored for immediate access. The most commonly used registers are the accumulator, program counter, address registers, etc.
 Level 2 or Cache memory –
It is the fastest memory, with the shortest access time, where data is temporarily stored for faster access.
 Level 3 or Main Memory –
It is the memory on which the computer currently works. It is small in size, and once power is off the data no longer stays in this memory.
 Level 4 or Secondary Memory –
It is external memory which is not as fast as main memory, but data stays permanently in this memory.
Cache Performance:

 When the processor needs to read or write a location in main memory, it first checks for a corresponding entry in the cache.
 If the processor finds that the memory location is in the cache, a cache hit has occurred, and the data is read from the cache.
 If the processor does not find the memory location in the cache, a cache miss has occurred. On a miss, the cache allocates a new entry and copies in the data from main memory; the request is then fulfilled from the contents of the cache.
 The performance of cache memory is frequently measured in terms of a quantity called the hit ratio: the fraction of accesses that are hits.
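These ideas can be put into two one-line formulas. The latencies below (1 ns cache, 100 ns main memory) are illustrative assumptions, and the model assumes a miss pays for the cache check plus a full main-memory access:

```python
def hit_ratio(hits, misses):
    # hit ratio = number of hits / total number of accesses
    return hits / (hits + misses)

def avg_access_time(h, cache_ns, memory_ns):
    # A hit costs cache_ns; a miss costs the cache check
    # plus the main-memory access (the miss penalty).
    return h * cache_ns + (1 - h) * (cache_ns + memory_ns)

h = hit_ratio(hits=95, misses=5)   # 0.95
print(avg_access_time(h, 1, 100))  # ~6.0 ns on average
```

Even a 95% hit ratio brings the average access time close to the cache latency rather than the memory latency, which is why the hit ratio is the key performance figure.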
 Cache memory mapping
 Caching configurations continue to evolve, but cache memory traditionally
works under three different configurations:
 Direct mapped cache has each block mapped to exactly one cache memory location. Conceptually, a direct mapped cache is like rows in a table with three columns: the cache block that contains the actual data fetched and stored, a tag with all or part of the address of the data that was fetched, and a flag bit that indicates whether the entry holds valid data.
 Fully associative cache mapping is similar to direct mapping in structure
but allows a memory block to be mapped to any cache location rather than to a
prespecified cache memory location as is the case with direct mapping.
 Set associative cache mapping can be viewed as a compromise between
direct mapping and fully associative mapping in which each block is mapped to
a subset of cache locations. It is sometimes called N-way set associative
mapping, which provides for a location in main memory to be cached to any of
"N" locations in the L1 cache.
 Data writing policies
 Data can be written to memory using a variety of techniques, but the two main
ones involving cache memory are:
 Write-through. Data is written to both the cache and main memory at the
same time.
 Write-back. Data is written only to the cache initially. Data may then be written to main memory, but this does not need to happen immediately and does not hold up the interaction.
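The difference between the two policies can be seen by counting main-memory writes for the same stream of CPU writes. This is a deliberately simplified model (each letter stands for a cache line, and write-back is assumed to flush each dirty line exactly once):

```python
def write_through(writes):
    # Every CPU write goes to both the cache and main memory,
    # so memory sees one write per CPU write.
    return len(writes)

def write_back(writes):
    # Only dirty lines reach memory, each flushed once
    # (simplified: one memory write per distinct line).
    return len(set(writes))

stream = ["A", "A", "A", "B", "A"]
print(write_through(stream))  # 5 memory writes
print(write_back(stream))     # 2 memory writes (lines A and B)
```

Write-back reduces memory traffic when the same line is written repeatedly, at the cost of tracking dirty lines and flushing them on eviction.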
 Cache memory is important because it improves the efficiency of data retrieval. It stores program instructions and data that are used repeatedly in the operation of programs, or information that the CPU is likely to need next. The processor can access this information more quickly from the cache than from main memory. Fast access to these instructions increases the overall speed of the program.
