Microprocessor Module-5 Question Answers
Uploaded by Shubham Barge

Vidyavardhini's College of Engineering and Technology

K.T. Marg, Vasai Road (West)

Module - 5
Pentium Processor

May-2023
1) Explain in brief cache organization of Pentium processor. (5m)
2) Explain MESI protocol. (10m)
December 2022
1) Explain the Floating-point pipeline of Pentium processor. (5m)
2) Explain the Branch Prediction Mechanism of Pentium processor. (10m)
December 2019
1) Explain how flushing of pipeline problem is minimized in Pentium architecture.
(10m)
May-2019
1) Explain an instruction issue algorithm of Pentium processor. (5m)
2) Explain cache organization of Pentium processor. (10m)
December-2018
1) Explain the Branch Prediction logic used in Pentium processor. (10m)
2) Explain an instruction issue algorithm of Pentium processor. (5m)

SE Sem-IV - CSC405 Microprocessor Prof. Sunil Katkar



Module - 5
Pentium Processor
Q1) Explain in brief cache organization of Pentium processor.
Ans - The cache organization of the Pentium processor involves the following aspects:
Cache Hierarchy: The Pentium replaced the 80486's single unified on-chip cache with two separate on-chip (L1) caches — an 8 KB code cache and an 8 KB data cache — backed by an optional external L2 cache. Keeping frequently accessed code and data on chip minimizes memory access latency, and the split design lets instruction fetches and data accesses proceed in parallel.
L1 Data Cache: The data cache stores recently accessed data from memory to speed up subsequent accesses. It operates on the principle of temporal and spatial locality, meaning it caches data that is frequently accessed or located close together in memory.
Cache Associativity: Both on-chip caches are 2-way set-associative with 32-byte lines. Set associativity gives more flexibility in placing and retrieving lines than a direct-mapped design, which reduces cache conflicts and improves hit rates.
Cache Coherency: In multiprocessor systems, caches must stay consistent with each other and with main memory. The Pentium's data cache implements the MESI (Modified, Exclusive, Shared, Invalid) protocol to maintain this coherency; the code cache, being read-only, needs only a simpler valid/invalid scheme.
Write Policies: The data cache supports both write-through and write-back policies. Write-through immediately updates data in both the cache and main memory, while write-back holds modified ("dirty") data in the cache until the line is replaced or invalidated, reducing memory traffic at the cost of additional logic to manage the dirty data.
Overall, the cache organization of the Pentium is designed to optimize memory access performance by efficiently storing and managing frequently accessed code and data, minimizing latency, and reducing traffic on the external bus.
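The way a set-associative cache locates a line can be illustrated with a short sketch of the address breakdown. The parameters below (8 KB capacity, 2-way, 32-byte lines) and all function and variable names are assumptions chosen for illustration, not taken from a datasheet:

```python
# Sketch: how a set-associative cache splits an address into
# byte offset, set index, and tag. Assumed parameters: 8 KB total,
# 2-way set-associative, 32-byte lines (giving 128 sets).

CACHE_SIZE = 8 * 1024   # total bytes
LINE_SIZE = 32          # bytes per line
WAYS = 2                # associativity

NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 128 sets
OFFSET_BITS = LINE_SIZE.bit_length() - 1      # 5 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1        # 7 bits select the set

def split_address(addr: int):
    """Return (tag, set_index, byte_offset) for a physical address."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x0001_2A64))
```

On a lookup, the set index picks one of the 128 sets, and the tag is compared against both ways of that set to detect a hit.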

Q2) Explain the Integer pipeline of Pentium processor.


Ans: Pentium performs integer instructions in a five-stage pipeline -
Stage-1: Prefetch
• Here instructions are fetched from the L1 cache and stored into the Prefetch queue.
• The Prefetch queue is 32 bytes as it needs at least two full instructions to be present
inside for feeding the two pipelines, and maximum size of an instruction is 15 bytes.
• There are two Prefetch queues but only one of them is active at a time.
• It supplies the instructions to the two pipes.
• The other one is used when branch prediction logic predicts a branch to be “taken”.
• Since the bus from L1 cache to the prefetcher is 256 bits (32 bytes), the entire queue
can be fetched in 1 Cycle. (T State)
Stage-2: Instruction Decode 1
• The decode stage decodes the instruction opcode.
• It also checks for instruction pairing and performs branch prediction.


• Certain rules are provided for instruction pairing. Not all instructions are pairable.
• If the two instructions can be paired, the first one is given to the U pipe and the
second one to the V pipe. If not, then the first one is given to the U pipe and the
second one is held back and then paired with the forthcoming instruction.
Stage-3: Instruction Decode 2 or Address Generation Stage
• It performs address generation where it generates the physical address of the required
memory operand using segment translation and page translation.
• Even protection checks are performed at this stage.
• The address calculation is fast due to segment descriptor caches and TLB.
• In most cases the address translation is performed in 1 cycle itself.
Stage-4: Execution Stage
• The Execution stage mainly uses the ALU.
• The U pipeline’s ALU has a barrel shifter, while the V pipeline’s does not.
• Hence shift and rotate instructions (SHL, SHR, ROL, etc.), as well as MUL and DIV, can only be executed by the U pipeline.
• Operands are either provided by registers or by data cache.
• Both U and V pipes can access the data cache simultaneously.
• During execution, if the U pipe instruction stalls, the V pipe one must also stall.
• But if the V pipe instruction stalls, the U pipe one can continue.
Stage-5: Write-Back Stage
• As the name suggests, the result is written back into the appropriate registers.
• The flags are updated accordingly.
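The stall coupling between the two pipes can be sketched as a tiny model. This is illustrative only; the function name and its boolean arguments are assumptions, not processor terminology:

```python
# Sketch: the U/V stall rule described above. If the U-pipe instruction
# stalls, the paired V-pipe instruction is held as well; a V-pipe stall
# alone lets the U pipe continue.

def pipes_advance(u_stalled: bool, v_stalled: bool):
    """Return (u_advances, v_advances) for one clock."""
    u_advances = not u_stalled
    v_advances = not v_stalled and not u_stalled  # V is held whenever U stalls
    return u_advances, v_advances

print(pipes_advance(False, False))  # both advance
print(pipes_advance(True, False))   # U stall drags V along
print(pipes_advance(False, True))   # U continues while V waits
```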

Q3) Explain the Floating-point pipeline of Pentium processor.


Ans: Most floating-point instructions are issued singly to the U pipeline and cannot be
paired with integer instructions. The floating-point pipeline consists of eight stages.
The first four stages are shared with the integer pipeline, and the last four reside within the
floating-point unit itself.
Stages 1–3: Prefetch, Instruction Decode 1, and Instruction Decode 2 (Address Generation)
• These three stages are identical to the corresponding stages of the integer pipeline described above: instructions are prefetched from the L1 cache into the 32-byte prefetch queue, decoded and checked for pairing (with branch prediction applied), and the physical addresses of any memory operands are generated using segment and page translation, with protection checks performed.
Stage-4: Execution Stage
This stage performs register read, memory read, or memory write as required by the
instruction.
Stage-5: FP Execution 1 Stage
In this stage information from register or memory is written into a floating-point register.
Data is converted to floating point format before being loaded into the floating-point unit.
Stage-6: FP Execution 2 Stage
In this stage floating-point operation is performed within floating-point unit.
Stage-7: Write FP Result
In this stage floating-point results are rounded, and the result is written to the target floating-
point register.
Stage-8: Error Reporting
In this stage if an error is detected, an error reporting stage is entered where the error is
reported, and the floating-point unit status word is updated.

Q4) Explain MESI protocol.


Ans -


The MESI protocol, which stands for Modified, Exclusive, Shared, and Invalid, is a cache
coherence protocol used by the Pentium's data cache and by many later multiprocessor
systems. It is designed to keep multiple caches consistent with each other and with main
memory. Each cache line is in one of the following four states:
Modified (M):
When a cache line is in the Modified state, it means that the cache holds a copy of the data
that has been modified compared to the data in main memory. This state indicates that the
data in the cache is the most up-to-date, and no other caches hold a copy of this data. If
another cache requests this data, the cache holding it in the Modified state must write it back
to main memory before fulfilling the request.
Exclusive (E):
In the Exclusive state, the cache holds a clean copy of the data that matches the data in main
memory, and no other caches in the system hold a copy of this data. This state is similar to the
Modified state, but the data has not been modified in this cache. If the processor modifies this
data, it transitions to the Modified state. If another cache requests this data, it can be provided
directly without needing to write it back to main memory.
Shared (S):
The Shared state indicates that the cache line is present in multiple caches, and the data
matches the data in main memory. This state allows multiple processors to read the data
concurrently without causing inconsistencies. If a processor modifies the data in a cache
holding it in the Shared state, it must invalidate other caches holding the same data to
maintain coherence.
Invalid (I):
In the Invalid state, the cache line is not valid, meaning it does not contain a copy of any
meaningful data. This state is typically entered when a cache line is evicted or when the
processor determines that the data is stale and needs to be reloaded from main memory.
Processors in the Invalid state cannot fulfill read or write requests for the associated data.
The MESI protocol ensures cache coherence by governing the transitions between these
states based on cache operations and communication between caches in a multi-core system.
By maintaining coherence, the MESI protocol allows multiple processors to share data
efficiently while ensuring data integrity and consistency.
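The state transitions can be sketched as a small lookup table. This is a simplified model under assumed event names (local_read, snoop_write, and so on), not a complete protocol implementation — real protocols also handle write-backs, interventions, and bus arbitration:

```python
# Sketch: a minimal MESI transition table for one cache line, driven by
# local reads/writes and bus snoops of other caches' accesses.

TRANSITIONS = {
    # (state, event) -> next state
    ("I", "local_read"):  "S",   # read miss; line fetched (could be E if no other sharer)
    ("I", "local_write"): "M",   # write miss: fetch for ownership, then modify
    ("E", "local_write"): "M",   # clean exclusive copy modified silently
    ("E", "snoop_read"):  "S",   # another cache reads: demote to Shared
    ("S", "local_write"): "M",   # must invalidate the other sharers first
    ("S", "snoop_write"): "I",   # another cache writes: our copy is stale
    ("M", "snoop_read"):  "S",   # supply the dirty data, keep a shared copy
    ("M", "snoop_write"): "I",   # another cache takes ownership
}

def next_state(state: str, event: str) -> str:
    # Events with no entry leave the state unchanged (e.g. local_read in S).
    return TRANSITIONS.get((state, event), state)

line = "I"
for ev in ["local_read", "snoop_write", "local_write", "snoop_read"]:
    line = next_state(line, ev)
    print(ev, "->", line)
```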

Q5) Explain the Branch Prediction Mechanism of Pentium processor. OR


Q6) Explain how flushing of pipeline problem is minimized in Pentium architecture.
Ans:


• The Pentium processor includes branch prediction logic, allowing it to minimize pipeline flushing.
• When a branch operation is correctly predicted, no performance penalty is incurred.
• However, when the branch prediction is wrong, a three-cycle penalty is incurred if the
branch was executing in the U pipeline, and up to a four-cycle penalty (3 + 1 extra cycle)
if it was in the V pipeline.
• The prediction mechanism is implemented using a four-way, set-associative Cache
with 256 entries.
• This is referred to as the Branch Target Buffer (BTB)
• The directory entry for each line contains the following information –
1. A Valid bit that indicates whether the entry is in use or not.
2. Two History bits that record how the branch behaved the previous times it
entered the pipeline.
3. The Memory Address of the branch instruction for identification.
• The Branch Target Buffer, or BTB, is a look-aside cache that sits off to the side of the
D1 stages of the two pipelines and monitors for branch instructions.
• During D1 stage, when an instruction is decoded and identified as a branch
instruction, the address of the instruction is searched in the BTB for a previous
history.
• If no history exists, the prediction is made that the branch will not be taken.
• If there is a history (BTB hit), then prediction is made as follows: -
▪ If the History bits are 00 or 01 (Strongly Not Taken or Weakly Not Taken), then
the prediction is that the branch will not be taken.
▪ If the History bits are 10 or 11 (Weakly Taken or Strongly Taken), then the
prediction is that the branch will be taken.
• If the branch is predicted to be taken, then the active queue is no longer used. Instead,
the prefetcher starts fetching instructions from the branch address and stores them into
the second queue which now becomes the active queue. This queue now starts feeding
instructions into the two pipes.
• If the branch is predicted to be not taken, then nothing changes, and the active queue
remains active, and instructions are fetched from the sequentially next locations.
• When the instruction reaches the execution stage, the branch will either be taken or
not taken. If taken, the next instruction to be executed should be the one fetched from
the branch target address.
• If the branch is not taken the next instruction executed should be the one fetched from
the next sequential memory address after the branch instruction.
• When the branch is taken for the first time, the execution unit provides feedback to
the prediction logic. The branch target address is sent back and recorded in the BTB.
• A directory entry is made containing the source memory address that the branch
instruction was fetched from, and history bits are set to indicate that the branch has
been strongly taken.

History Bits | Description        | Prediction made          | If actually taken                 | If actually not taken
00           | Strongly Not Taken | Branch will not be taken | Upgrades to Weakly Not Taken (01) | Remains Strongly Not Taken
01           | Weakly Not Taken   | Branch will not be taken | Upgrades to Weakly Taken (10)     | Downgrades to Strongly Not Taken (00)
10           | Weakly Taken       | Branch will be taken     | Upgrades to Strongly Taken (11)   | Downgrades to Weakly Not Taken (01)
11           | Strongly Taken     | Branch will be taken     | Remains Strongly Taken            | Downgrades to Weakly Taken (10)
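The history-bit behaviour in the table above amounts to a 2-bit saturating counter, which can be sketched as follows (the function names are illustrative):

```python
# Sketch: a 2-bit saturating counter modeling the BTB history bits.
# 00/01 predict not-taken, 10/11 predict taken; each actual outcome
# moves the counter one step toward the observed direction.

def predict(counter: int) -> bool:
    """True = predict taken (history bits 10 or 11)."""
    return counter >= 2

def update(counter: int, taken: bool) -> int:
    """Saturating increment on taken, decrement on not-taken."""
    if taken:
        return min(counter + 1, 3)
    return max(counter - 1, 0)

counter = 3   # first actual taken branch is recorded as Strongly Taken (11)
for taken in [True, True, False, False, False]:
    print("predicted taken:", predict(counter), "- actually taken:", taken)
    counter = update(counter, taken)
```

Note how one misprediction from the Strongly Taken state only downgrades to Weakly Taken, so a single anomalous outcome (e.g. a loop exit) does not flip the prediction.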

Q7) Explain an instruction issue algorithm of Pentium processor.


Ans: Register contention results if two instructions attempt to access the same register during
parallel execution. Register contentions are classified as –
Explicit Register Contention
The different cases of Explicit Register Contention are as follows –
1. RAW (Read after Write)
MOV AX,004BH
MOV [BP],AX
i.e. a register write followed by a read of the same register.
If both instructions are paired, we get an ERRATIC result: AX is updated by the ‘U’
pipeline in its Write-Back stage, while the ‘V’ pipeline reads it earlier, in its ALU
(Execute) stage, so the ‘V’ pipeline would pick up the old value of AX.
2. WAW (Write after Write)
MOV AX,004BH
XLAT
i.e. Register write followed by same register write.
If paired, we get an ERRATIC result.
3. Writing to different parts of the same register.
MOV AL,05H
MOV AH,0AH
i.e. both instructions write to different parts of same register.
These Explicit Register Contentions are avoided by issuing only the first instruction
(to the ‘U’ pipeline) and holding the second back. In addition, the ‘U’ pipeline is
allowed to write back first while the ‘V’ pipeline stalls.
Implicit Register Contention –
It occurs when two instructions implicitly reference the same register, e.g.
MOV AX,[SI]
MOV BX,[SI+4]
where both instructions implicitly read SI. There are two exceptions for which Implicit
Register Contention is still allowed –
1. Flags References – Compare and Branch operations.
eg. CMP and JC or
ADD and JNZ
These pairs of instructions both reference the flags (a contention), but are still allowed
to execute simultaneously.
2. Stack pointer Reference: Pushes and Pops
eg. PUSH and PUSH or
PUSH and CALL or
POP and POP


Based on the above Register Contentions, we can formulate the instruction issue algorithm as
below –
Consider two consecutive instructions I1 and I2, decoded by the microprocessor.
If all the following are true:
    I1 is a Simple instruction,
    I2 is a Simple instruction,
    I1 is not a Jump instruction,
    Destination of I1 is not the same as Source of I2,
    Destination of I1 is not the same as Destination of I2,
Then
    Issue I1 to U-Pipe and I2 to V-Pipe
Else
    Issue I1 to U-Pipe only
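The algorithm above can be sketched in Python. The tuple-based instruction model and the "simple"/"jump" sets below are illustrative assumptions, not the Pentium's full pairing rules:

```python
# Sketch of the issue check. Instructions are modeled as
# (mnemonic, destination, source) tuples; SIMPLE and JUMPS are
# small illustrative sets, not the complete pairing tables.

SIMPLE = {"MOV", "ADD", "SUB", "CMP", "INC", "DEC", "PUSH", "POP"}
JUMPS = {"JMP", "JC", "JNZ", "CALL"}

def can_pair(i1, i2) -> bool:
    """Return True if I1 may issue to the U pipe with I2 in the V pipe."""
    op1, dest1, _ = i1
    op2, dest2, src2 = i2
    return (op1 in SIMPLE and op2 in SIMPLE
            and op1 not in JUMPS
            and dest1 != src2      # no RAW dependency
            and dest1 != dest2)    # no WAW dependency

# RAW hazard: AX written by I1, read by I2 -> not pairable
print(can_pair(("MOV", "AX", "004BH"), ("MOV", "[BP]", "AX")))   # False
# Independent destinations and sources -> pairable
print(can_pair(("MOV", "AX", "[SI]"), ("MOV", "BX", "[SI+4]")))  # True
```

When can_pair returns False, only I1 issues and I2 is retried for pairing with the following instruction, matching the Else branch of the algorithm.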
