Lec 8 Performance Enhancement - Computer Architecture
PERFORMANCE ENHANCEMENT
LECTURE 8
Performance enhancement
Can be achieved through:
■ Cache memory
■ Pipelining
■ Instruction prefetch
Cache Memory:
Cache memory is a semiconductor storage device placed between the CPU and main memory. It is a fast memory device, faster than main memory.
A cache hit occurs when the requested data can be found in the cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than re-computing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
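The effect of the hit rate can be quantified with the standard average memory access time relation, AMAT = hit time + miss rate × miss penalty. A minimal check in C (the 1 ns and 50 ns timings and the 5% miss rate are illustrative assumptions, not measurements):

    #include <stdio.h>

    /* Average memory access time with illustrative numbers:
       AMAT = hit_time + miss_rate * miss_penalty. */
    int main(void) {
        double hit_time = 1.0;        /* ns: cache access */
        double miss_penalty = 50.0;   /* ns: main-memory access on a miss */
        double miss_rate = 0.05;      /* 5% of accesses miss */
        double amat = hit_time + miss_rate * miss_penalty;
        printf("AMAT = %.1f ns\n", amat);  /* 1 + 0.05 * 50 = 3.5 ns */
        return 0;
    }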
A cache's sole purpose is to reduce accesses to the underlying slower
storage. Cache is also usually an abstraction layer that is designed to be
invisible from the perspective of neighboring layers.
Cache Mapping Techniques
There are three different types of mapping used for cache memory, which are as follows: direct mapping, associative mapping, and set-associative mapping.
Direct Mapping
The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. In other words, direct mapping assigns each memory block to a specific line in the cache. If that line is already occupied by a memory block when a new block needs to be loaded, the old block is overwritten.
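A minimal sketch in C of how a direct-mapped cache decomposes an address, assuming a hypothetical geometry of 64-byte blocks and 128 cache lines; the line index is simply the block number modulo the number of lines:

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 64   /* bytes per block (assumed) */
    #define NUM_LINES  128  /* cache lines (assumed) */

    int main(void) {
        uint32_t addr  = 0x0001ABCDu;          /* example address */
        uint32_t block = addr / BLOCK_SIZE;    /* main-memory block number */
        uint32_t line  = block % NUM_LINES;    /* the ONE line this block can occupy */
        uint32_t tag   = block / NUM_LINES;    /* remaining bits are stored as the tag */
        printf("block=%u line=%u tag=%u\n", block, line, tag);
        return 0;
    }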
Associative Mapping
In this type of mapping, associative memory is used to store both the content and the address of the memory word. Any block can go into any line of the cache. This means that the word ID bits are used to identify which word in the block is needed, while the tag becomes all of the remaining bits. This enables the placement of any block at any line in the cache memory. It is considered to be the fastest and the most flexible mapping form.
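By contrast with direct mapping, a fully associative lookup must compare the tag against every line, since a block may reside anywhere. A sketch under an assumed 128-line geometry:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 128

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NUM_LINES];

    /* In fully associative mapping the whole block number serves as the
       tag, so a lookup searches all lines. */
    bool lookup(uint32_t block) {
        for (int i = 0; i < NUM_LINES; i++)
            if (cache[i].valid && cache[i].tag == block)
                return true;   /* hit */
        return false;          /* miss: the block may be placed in ANY free line */
    }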
Set-associative Mapping
This form of mapping is an enhanced form of direct mapping that removes its drawbacks. Set-associative mapping addresses the problem of possible thrashing in the direct mapping method: instead of having exactly one line that a block can map to in the cache, a few lines are grouped together to create a set. A block in memory can then map to any one of the lines of a specific set.
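A sketch of the corresponding set-associative lookup, assuming a hypothetical 4-way cache with 32 sets; only the four lines of the selected set are searched:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS 32
    #define WAYS     4   /* 4-way set associative: 4 lines per set */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NUM_SETS][WAYS];

    bool lookup(uint32_t block) {
        uint32_t set = block % NUM_SETS;   /* a block maps to exactly one set... */
        uint32_t tag = block / NUM_SETS;
        for (int w = 0; w < WAYS; w++)     /* ...but to any line within that set */
            if (cache[set][w].valid && cache[set][w].tag == tag)
                return true;   /* hit */
        return false;
    }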
Pipelining
■ It is observed that organizational enhancements to the CPU can improve performance. We have already seen that the use of multiple registers rather than a single accumulator, and the use of cache memory, improve performance considerably. Another organizational approach, which is quite common, is instruction pipelining.
PIPELINING
■ A pipeline is a set of data processing elements connected in series,
where the output of one element is the input of the next one.
■ It allows storing and executing instructions in an orderly process. It is
also known as pipeline processing.
■ Pipelining increases the overall instruction throughput.
■ In a pipeline system, each segment consists of an input register followed by a combinational circuit. The register holds data and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment (a toy simulation follows this list).
■ A pipeline system is like a modern-day assembly line in a factory. For example, in a car manufacturing plant, huge assembly lines are set up and, at each point, robotic arms perform a certain task; the car then moves on to the next arm.
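The register-plus-combinational-circuit structure can be mimicked in software. In the following toy simulation (the two operations and the input values are purely illustrative), each "clock tick" latches every segment's output into the next segment's input register, so two items are in flight at once:

    #include <stdio.h>

    enum { EMPTY = -1 };   /* marker for an empty pipeline register */

    int main(void) {
        int input[] = {10, 20, 30};
        int n = 3;
        int reg1 = EMPTY, reg2 = EMPTY;   /* input registers of the two segments */
        for (int t = 0, fed = 0; t < n + 2; t++) {
            int done = (reg2 != EMPTY) ? reg2 * 2 : EMPTY; /* segment 2's combinational circuit */
            reg2 = (reg1 != EMPTY) ? reg1 + 1 : EMPTY;     /* segment 1's output is latched onward */
            reg1 = (fed < n) ? input[fed++] : EMPTY;       /* a new item enters segment 1 */
            if (done != EMPTY)
                printf("t=%d result=%d\n", t, done);
        }
        return 0;
    }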
Pipelining cont’d
■ To apply the concept of pipelining to instruction execution, it is required to break the instruction into different tasks. Each task is executed in a different processing element of the CPU.
■ As we know that there are two distinct phases of instruction
execution: one is instruction fetch and the other one is instruction
execution. Therefore, the processor executes a program by fetching
and executing instructions, one after another.
Let us refer to the fetch and execute steps for an instruction. Execution of a program consists of a sequence of fetch and execute steps, as shown on the next slide.
Pipelining cont’d
As a simple approach, consider subdividing instruction processing into two
stages:
■ fetch instruction and
■ execute instruction.
There are times during the execution of an instruction when main memory is
not being accessed. This time could be used to fetch the next instruction in
parallel with the execution of the current one.
The pipeline has two independent stages.
The first stage fetches an instruction and buffers it. When the second stage is
free, the first stage passes it the buffered instruction. While the second stage
is executing the instruction, the first stage takes advantage of any unused
memory cycles to fetch and buffer the next instruction. This is called instruction prefetch or fetch overlap.
Note that this approach, which involves instruction buffering, requires more registers. In general, pipelining requires registers to store data between stages.
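A toy illustration of fetch overlap, assuming a one-slot instruction buffer: while instruction i is "executing", the fetch stage refills the buffer with instruction i+1 (the program array and its contents are illustrative):

    #include <stdio.h>

    int main(void) {
        int program[] = {11, 22, 33, 44};
        int n = 4;
        int buffer = program[0];            /* stage 1 fetches and buffers the first instruction */
        for (int i = 0; i < n; i++) {
            int current = buffer;           /* stage 2 takes the buffered instruction */
            if (i + 1 < n)
                buffer = program[i + 1];    /* stage 1 uses the idle memory cycle to prefetch */
            printf("executing instruction %d\n", current);
        }
        return 0;
    }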
Instruction decomposition
To gain further speedup, the pipeline must have more stages. Let us consider the following decomposition of the instruction processing:
■ Fetch instruction (FI): Read the next expected instruction into a
buffer.
■ Decode instruction (DI): Determine the opcode and the operand
specifiers.
■ Calculate operands (CO): Calculate the effective address of each
source operand. This may involve displacement, register indirect,
indirect, or other forms of address calculation.
■ Fetch operands (FO): Fetch each operand from memory. Operands in
registers need not be fetched.
■ Execute instruction (EI): Perform the indicated operation and store the result, if any, in the specified destination operand location.
■ Write operand (WO): Store the result in memory.
Pipelining cont’d
■ With this decomposition, the various stages will be of more nearly equal duration. For the sake of illustration, let us assume equal duration. Using this assumption, Figure 12.10 shows that a six-stage pipeline can reduce the execution time for 9 instructions from 54 time units to 14 time units (an arithmetic check of this figure follows below).
■ Several comments are in order: The diagram assumes that each
instruction goes through all six stages of the pipeline. This will not
always be the case. For example, a load instruction does not need the
WO stage. However, to simplify the pipeline hardware, the timing is
set up assuming that each instruction requires all six stages. Also, the
diagram assumes that all of the stages can be performed in parallel.
In particular, it is assumed that there are no memory conflicts.
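The 54-versus-14 figure follows from the usual pipeline timing relation: with k stages of unit duration and n instructions, the unpipelined time is n × k, while the pipelined time is k + (n − 1). A quick check in C:

    #include <stdio.h>

    int main(void) {
        int k = 6;                    /* stages: FI DI CO FO EI WO */
        int n = 9;                    /* instructions */
        int unpipelined = n * k;      /* 9 * 6 = 54 time units */
        int pipelined = k + (n - 1);  /* fill the pipe once, then one result per unit: 14 */
        printf("unpipelined=%d pipelined=%d\n", unpipelined, pipelined);
        return 0;
    }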
Pipeline Hazards
A pipeline hazard occurs when the pipeline, or some portion of the
pipeline, must stall because conditions do not permit continued
execution. Such a pipeline stall is also referred to as a pipeline bubble.
There are three types of hazards: resource, data, and control.
■ RESOURCE HAZARDS/STRUCTURAL HAZARD.
A resource hazard occurs when two (or more) instructions that are already in the pipeline need the same resource. The result is that the instructions must be executed serially rather than in parallel for a portion of the pipeline. A resource hazard is sometimes referred to as a structural hazard.
■ CONTROL HAZARDS/BRANCH HAZARD.
A control hazard, also known as a branch hazard, occurs when the pipeline makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded. We discuss approaches to dealing with control hazards next.
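A small C analogue of the problem (the function and values are illustrative): which instruction executes after the branch depends on a condition that is resolved only late in the pipeline, so the fetch stage must either stall or guess, and a wrong guess means discarding the fetched instructions:

    /* The comparison result is not known until the branch is evaluated,
       yet the fetch stage must already choose which path's instructions
       to bring into the pipeline. */
    int branch_example(int r1) {
        if (r1 == 0)    /* branch condition resolves late */
            return 10;  /* taken path */
        return 20;      /* fall-through path */
    }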
Pipeline Hazards
■ DATA HAZARDS
A data hazard occurs when there is a conflict in the access of an operand
location. In general terms, we can state the hazard in this form: Two
instructions in a program are to be executed in sequence and both
access a particular memory or register operand. If the two instructions
are executed in strict sequence, no problem occurs. However, if the
instructions are executed in a pipeline, then it is possible for the operand
value to be updated in such a way as to produce a different result than
would occur with strict sequential execution. In other words, the program
produces an incorrect result because of the use of pipelining.
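A minimal read-after-write (RAW) example, with C statements standing in for two machine instructions (the register names r1..r5 are illustrative):

    /* Instruction 2 reads r1, which instruction 1 writes. In strict
       sequence this is fine; in a pipeline, instruction 2's operand
       fetch (FO) can occur before instruction 1's write-back (WO), so a
       stale r1 would be read unless the pipeline stalls or forwards. */
    int raw_example(int r2, int r3, int r5) {
        int r1 = r2 + r3;   /* instruction 1: writes r1 in its WO stage */
        int r4 = r1 + r5;   /* instruction 2: needs r1 in its FO stage */
        return r4;
    }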
Pipeline Conflicts
■ There are some factors that cause the pipeline to deviate from its normal performance. Some of these factors are given below:
1. Timing Variations
All stages cannot take the same amount of time. This problem generally occurs in instruction processing, where different instructions have different operand requirements and thus different processing times.
2. Data Hazards
When several instructions are in partial execution, a problem arises if they reference the same data. We must ensure that the next instruction does not attempt to access the data before the current instruction has finished with it, because this would lead to incorrect results.
Pipeline Conflicts
3. Branching
In order to fetch and execute the next instruction, we must know what that instruction is. If the present instruction is a conditional branch, whose result determines the next instruction, then the next instruction may not be known until the current one is processed.
4. Interrupts
Interrupts inject unwanted instructions into the instruction stream and thus affect the execution of instructions.
5. Data Dependency
It arises when an instruction depends upon the result of a previous instruction, but this result is not yet available.
Advantages of Pipelining
■ The cycle time of the processor is reduced.
■ It increases the throughput of the system.
■ It makes the system reliable.
Disadvantages of Pipelining
■ The design of a pipelined processor is complex and costly to manufacture.
■ Instruction latency increases.
Instruction prefetch
■ In computer architecture, instruction prefetch is a technique used in central processing units to speed up the execution of a program by reducing wait states.
■ Prefetching occurs when a processor requests an instruction or data block from main memory before it is actually needed. Once the block comes back from memory, it is placed in a cache. When the instruction/data block is actually needed, it can be accessed much more quickly from the cache than if the processor had to make a request to memory.
■ Since programs are generally executed sequentially, performance is likely to be best when instructions are prefetched in program order (a toy sequential prefetcher is sketched after this list).
■ Alternatively, the prefetch may be part of a complex branch prediction algorithm, where the processor tries to anticipate the result of a calculation and fetch the right instructions in advance.
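A toy next-line prefetcher in C, under the simplifying assumption that a request for block b also brings block b+1 into the cache (the sizes and the access pattern are illustrative):

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_BLOCKS 16

    static bool cached[NUM_BLOCKS];

    void access_block(int b) {
        if (cached[b])
            printf("block %d: hit (served from cache)\n", b);
        else
            printf("block %d: miss (fetched from memory)\n", b);
        cached[b] = true;
        if (b + 1 < NUM_BLOCKS)
            cached[b + 1] = true;   /* prefetch the next sequential block */
    }

    int main(void) {
        access_block(0);   /* miss, and block 1 is prefetched */
        access_block(1);   /* hit thanks to the prefetch */
        return 0;
    }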
Types of prefetching
Prefetching can be classified in many ways:
■ Data or instruction prefetching. As the name implies, prefetching can be performed for either data blocks or instruction blocks. Since data access patterns show less regularity than instruction patterns, accurate data prefetching is generally more challenging than instruction prefetching (a software data-prefetch sketch follows).
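Data prefetching can also be requested explicitly by software. A sketch assuming a GCC-compatible compiler, which provides the __builtin_prefetch hint (the look-ahead distance of 16 elements is an arbitrary illustrative choice):

    /* Sums an array while hinting the hardware to start loading data
       about 16 elements ahead of the current position. */
    long sum_with_prefetch(const long *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            if (i + 16 < n)
                __builtin_prefetch(&a[i + 16]);  /* request the block before it is needed */
            s += a[i];
        }
        return s;
    }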