
Module 3

Sections: 5.1, 5.2, 5.3, 5.4, 5.5 (5.5.1, 5.5.2), 5.6
5.1 BASIC CONCEPTS
• The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
• If MAR is k bits long, then
→ the memory may contain up to 2^k addressable locations.
• If MDR is n bits long, then
→ n bits of data are transferred between the memory and the processor (see the sketch below).
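A quick sketch of these two relations (the widths k = 16 and n = 8 are assumed values for illustration, not from the text):

```python
# Sketch: memory limits implied by MAR/MDR widths (assumed k = 16, n = 8).
k = 16                      # assumed MAR width in bits
n = 8                       # assumed MDR width in bits
print(2 ** k)               # 65536 addressable locations (64K)
print(n, "bits transferred per memory access")
```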

• The data-transfer takes place over the processor-bus.

• The processor-bus has
1) Address-lines
2) Data-lines
3) Control-lines (R/W’, MFC – Memory Function Completed).
The control-lines are used for coordinating data-transfer.
• The processor reads the data from the memory by
→ loading the address of the required memory-location into MAR and
→ setting the R/W’ line to 1.

• The memory responds by


→ placing the data from the addressed-location onto the data-lines and
→ confirming this action by asserting the MFC signal.

• Upon receipt of MFC signal, the processor loads the data from the data-lines into
MDR.
• The processor writes the data into the memory-location by
→ loading the address of this location into MAR &
→ setting the R/W’ line to 0
• Memory Access Time: It is the time that elapses between
→ initiation of an operation &
→ completion of that operation.

• Memory Cycle Time: It is the minimum time delay required between the initiation of two successive memory-operations.
• RAM (Random Access Memory)
• In RAM, any location can be accessed for a Read/Write-operation in fixed
amount of time.

• Cache Memory
It is a small, fast memory that is inserted between the larger, slower main-memory and the processor.

• It holds the currently active segments of a program and their data.


• Virtual Memory
• The address generated by the processor is referred to as a virtual/logical address.
• The virtual-address-space is mapped onto the physical-memory where data are
actually stored.
• The mapping-function is implemented by MMU. (MMU = memory management
unit).
• Only the active portion of the address-space is mapped into locations in the
physical-memory.
• The remaining virtual-addresses are mapped onto the bulk storage devices such
as magnetic disk.
• As the active portion of the virtual-address-space changes during program
execution, the MMU
→ changes the mapping-function &
→ transfers the data between disk and memory
• During every memory-cycle, the MMU determines whether the addressed page is in the memory.
• If the page is in the memory, then the proper word is accessed and execution proceeds.
• Otherwise, a page containing the desired word is transferred from disk to memory.
• Memory can be classified as follows:
1) RAM which can be further classified as follows:
i) Static RAM
ii) Dynamic RAM (DRAM) which can be further classified as
synchronous DRAM
asynchronous DRAM
2) ROM which can be further classified as follows:
i) PROM
ii) EPROM
iii) EEPROM
iv) Flash Memory which can be further classified as
Flash Cards
Flash Drives
5.2 SEMICONDUCTOR RAM MEMORIES
5.2.1 INTERNAL ORGANIZATION OF MEMORY-CHIPS
• Memory-cells are organized in the form of array.
• Each cell is capable of storing 1-bit of information.
• Each row of cells forms a memory-word.
• All cells of a row are connected to a common line called as Word-Line.
• The cells in each column are connected to a Sense/Write circuit by two bit-lines.
• The Sense/Write circuits are connected to data-input or output lines of
the chip.
• During a write-operation, the Sense/Write circuits
→ receive input information &
→ store it in the cells of the selected word.
• The data-input and data-output of each Sense/Write circuit are connected to
a single bidirectional data-line.

• Data-line can be connected to a data-bus of the computer.

• The following 2 control lines are also used:


1) R/W’
• Specifies the required operation.
2) CS’
• Chip Select input selects a given chip in the multi-chip memory-system
• Large chips have essentially the same organization but use a larger memory-cell array and have more external connections.

• Eg: a 4M-bit chip may have a 512K×8 organization, in which case 19 address and 8 data input/output pins are needed.
5.2.2 STATIC RAM (OR MEMORY)
• Memories consist of circuits capable of retaining their state as long as
power is applied are known.
• 2 inverters are cross connected to form a latch.

• The latch is connected to 2-bit-lines by transistors T1 and T2.

• The transistors act as switches that can be opened/closed under the control of the word-line.

• When the word-line is at ground level, the transistors are turned off and the latch retains its state.
Read Operation
• To read the state of the cell, the word-line is activated to close switches
T1 and T2.

• If the cell is in state 1, the signal on bit-line b is high and the signal on the
bit-line b’ is low.

• Thus, b and b’ are complement of each other.

• Sense/Write circuit
→ monitors the state of b & b’ and
→ sets the output accordingly.
Write Operation
• The state of the cell is set by
→ placing the appropriate value on bit-line b and its complement on b’
→ then activating the word-line.

This forces the cell into the corresponding state.


• The required signal on the bit-lines is generated by Sense/Write circuit
CMOS Cell
• Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch.
• In state 1, the voltage at point X is high because T3 and T6 are ON while T4 and T5 are OFF.
• Thus, if T1 and T2 are turned ON (closed), bit-lines b and b’ will have high and low signals respectively.
Advantages:
1) It has low power consumption, because current flows in the cell only when the cell is being accessed.
2) Static RAMs can be accessed quickly; their access time is a few nanoseconds.
Disadvantage:
1) SRAMs are volatile memories: their contents are lost when power is interrupted.
5.2.3 ASYNCHRONOUS DRAM
• Less expensive RAMs can be implemented if simple cells are used.

• Such cells cannot retain their state indefinitely.

• Hence they are called Dynamic RAM (DRAM).

• The information in a dynamic memory-cell is stored in the form of a charge on a capacitor.

• This charge can be maintained only for tens of milliseconds.

• The contents must be periodically refreshed by restoring this capacitor charge to its
full value.
• In order to store information in the cell, the transistor T is turned ON.

• The appropriate voltage is applied to the bit-line which charges the capacitor.

• After the transistor is turned off, the capacitor begins to discharge.

• Hence, the information stored in the cell can be retrieved correctly only if it is read before the charge on the capacitor drops below a threshold value.

• During a read-operation,
→ transistor is turned ON
→ a sense amplifier detects whether the charge on the capacitor is above the threshold
value.
• If (charge on capacitor) > (threshold value) -> the bit-line will have logic value 1.
• If (charge on capacitor) < (threshold value) -> the bit-line will be set to logic value 0.
ASYNCHRONOUS DRAM CONTD..
• Consider a 16-megabit DRAM chip configured as 2M×8.

• The cells are organized in the form of a 4K×4K array.

• The 4096 cells in each row are divided into 512 groups of 8, so that a row can store 512 bytes of data.

• A 21-bit address is needed to access a byte in this memory.

• The 21 bits are divided as follows:
1) 12 address bits are needed to select a row.
2) 9 bits are needed to specify a group of 8 bits in the selected row.

• The high-order 12 bits and the low-order 9 bits of the address constitute the row and column addresses of a byte, respectively (see the sketch below).
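A minimal sketch of this address split (the helper name split_dram_address is hypothetical):

```python
# Sketch: splitting the 21-bit byte address of the 2M x 8 DRAM above
# into a 12-bit row address and a 9-bit column address.
def split_dram_address(addr):
    row = (addr >> 9) & 0xFFF        # high-order 12 bits select a row
    col = addr & 0x1FF               # low-order 9 bits select a byte in the row
    return row, col

print(split_dram_address(0x1FFFFF))  # (4095, 511): last byte of the last row
```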
• During a Read/Write-operation,
→ the row-address is applied first.
→ the row-address is loaded into the row-latch in response to a signal pulse on the RAS’ input of the chip. (RAS = Row-Address Strobe, CAS = Column-Address Strobe)
• When a Read-operation is initiated, all cells on the selected row are read and refreshed.
• Shortly after the row-address is loaded, the column-address is
→ applied to the address pins &
→ loaded into the column-latch under control of the CAS’ signal.
• The information in the latch is decoded.
• The appropriate group of 8 Sense/Write circuits is selected.
• R/W’=1(read-operation) -> Output values of selected circuits are transferred to data-lines
D7-D0.
• R/W’=0(write-operation) ->Information on D7-D0 are transferred to the selected circuits.

• RAS’ & CAS’ are active-low so that they cause latching of address when they
change from high to low.

• To ensure that the contents of DRAMs are maintained, each row of cells is
accessed periodically.

• A special memory-circuit provides the necessary control signals RAS’ & CAS’ that
govern the timing.

• The processor must take into account the delay in the response of the memory.
Fast Page Mode
• Transferring the bytes in sequential order is achieved by applying the
consecutive sequence of column-address under the control of
successive CAS’ signals.

• This scheme allows transferring a block of data at a faster rate.

• This block-transfer capability is called fast page mode.


5.2.4 SYNCHRONOUS DRAMs
• The operations are directly synchronized with clock signal.

• The address and data connections are buffered by means of registers.

• The output of each sense amplifier is connected to a latch.

• A Read-operation causes the contents of all cells in the selected row to be loaded in
these latches.

• Data held in the latches that correspond to the selected columns are transferred into the data-output register.

• Thus, the data become available on the data-output pins.


• First, the row-address is latched under control of the RAS’ signal.

• The memory typically takes 2 or 3 clock cycles to activate the selected row.

• Then, the column-address is latched under the control of CAS’ signal.

• After a delay of one clock cycle, the first set of data bits is placed on the
data-lines.

• The SDRAM automatically increments the column-address to access the next 3 sets of bits in the selected row.
LATENCY & BANDWIDTH
• A good indication of performance is given by 2 parameters:
1) Latency
2) Bandwidth
Latency
• It refers to the amount of time it takes to transfer a word of data to or from
the memory.
• For a transfer of a single word, the latency provides a complete indication of memory performance.
• For a block transfer, the latency denotes the time it takes to transfer the
first word of data.
Bandwidth
• It is defined as the number of bits or bytes that can be transferred in one second.
• Bandwidth mainly depends on
1) The speed of access to the stored data
2) The number of bits that can be accessed in parallel
DOUBLE DATA RATE SDRAM (DDR-SDRAM)
• The standard SDRAM performs all actions on the rising edge of the clock signal.

• The DDR-SDRAM transfers data on both edges of the clock (leading edge and trailing edge).

• The bandwidth of DDR-SDRAM is therefore doubled for long burst transfers.

• To make it possible to access the data at high rate, the cell array is organized into 2 banks.

• Each bank can be accessed separately.

• Consecutive words of a given block are stored in different banks.

• Such interleaving of words allows simultaneous access to 2 words.

• The 2 words are transferred on successive edges of the clock.


5.2.5 STRUCTURE OF LARGER MEMORIES
Static memory systems
• Consider a memory system consisting of 2M (2,097,152) words of 32 bits each, built from 512K×8 static memory chips.
• Each column consists of 4 chips, each implementing one byte position.
• 4 of these sets provide the required 2M×32 memory.
• Each chip has a control input called Chip Select.
• When this input is 1, it enables the chip to accept data from or place data on its data-lines.
• The data output of each chip is of the 3-state type.
• Only the selected chip places data on the data-output line, while all other outputs are in the high-impedance state.
• 21 address bits are needed to select a 32-bit word in this memory.
• The high-order 2 bits of the address are decoded to determine which of the 4 chip-select control signals should be activated.
• The remaining 19 address bits are used to access specific byte locations inside each chip of the selected row (see the sketch below).
• The R/W’ inputs of all chips are tied together to provide a common R/W’ control.
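A minimal sketch of the address decoding just described (decode_static is a hypothetical helper name):

```python
# Sketch: decoding the 21-bit word address of the 2M x 32 memory above.
def decode_static(addr):
    chip_row = (addr >> 19) & 0x3   # high-order 2 bits: one of 4 chip-select signals
    within   = addr & 0x7FFFF       # low-order 19 bits: location inside each chip
    return chip_row, within

print(decode_static(0))             # (0, 0): first word, chip row 0
print(decode_static(2**21 - 1))     # (3, 524287): last word, chip row 3
```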
Dynamic Memory System
• The physical implementation is done in the form of memory-modules.

• If a large memory is built by placing DRAM chips directly on the motherboard, it will occupy a large amount of space on the board.

• These packaging considerations have led to the development of larger memory units known as SIMMs & DIMMs.
1) SIMM → Single Inline Memory-Module
2) DIMM → Dual Inline Memory-Module

• A SIMM/DIMM consists of many memory-chips on a small board that plugs into a socket on the motherboard.
5.2.6 MEMORY-SYSTEM CONSIDERATIONS
MEMORY CONTROLLER
• To reduce the number of pins, dynamic memory-chips use multiplexed address inputs.
• The address is divided into 2 parts:
1) High-Order Address Bits → select a row in the cell array.
→ They are provided first and latched into the memory-chips under the control of the RAS’ signal.
2) Low-Order Address Bits → select a column.
→ They are provided on the same address pins and latched using the CAS’ signal.
• The Multiplexing of address bit is usually done by Memory Controller Circuit .
• The Controller accepts a complete address & R/W’ signal from the processor.

• A Request signal indicates a memory access operation is needed.

• Then, the Controller

→ forwards the row & column portions of the address to the memory.
→ generates RAS’ & CAS’ signals
→ sends R/W’ & CS’ signals to the memory.
Refresh overhead
• All dynamic memories have to be refreshed.
• In older DRAMs, a typical period for refreshing all rows was 16 ms.
• In typical SDRAMs, the period is 64 ms.
• Consider an SDRAM whose cells are arranged in 8K (= 8192) rows.
• Suppose that it takes 4 clock cycles to access (read) each row.
• Then it takes 8192 × 4 = 32,768 cycles to refresh all rows.
• At a clock rate of 133 MHz, the time needed to refresh all rows is 32,768/(133 × 10^6) ≈ 246 × 10^-6 seconds.
• The refreshing process thus occupies 0.246 ms in each 64 ms time interval.
• The refresh overhead is 0.246/64 = 0.0038, which is less than 0.4 percent of the total time available for accessing the memory.
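The same arithmetic, reproduced as a small sketch:

```python
# Sketch reproducing the refresh-overhead calculation above.
rows = 8192                     # 8K rows
cycles_per_row = 4              # clock cycles to access (refresh) one row
clock_hz = 133e6                # 133 MHz clock
period_s = 64e-3                # all rows must be refreshed every 64 ms

refresh_s = rows * cycles_per_row / clock_hz       # ~246e-6 s
print(f"{refresh_s * 1e3:.3f} ms per 64 ms")       # 0.246 ms
print(f"overhead = {refresh_s / period_s:.4f}")    # 0.0038 (< 0.4 percent)
```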
5.2.7 RAMBUS MEMORY
• The usage of a wide bus is expensive.

• Rambus therefore developed an implementation that uses a narrow bus.

• Rambus technology is a fast signaling method used to transfer information between chips.

• The signals consist of much smaller voltage swings around a reference voltage Vref.
• The reference voltage is about 2 V.

• The two logical values are represented by 0.3V swings above and below Vref.

• This type of signaling is generally known as Differential Signaling.


• Rambus provides a complete specification for the design of a communication link called the Rambus Channel.

• Rambus memory has a clock frequency of 400 MHz.

• The data are transmitted on both edges of the clock, so that the effective data-transfer rate is 800 MHz.

• Circuitry needed to interface to Rambus channel is included on chip.

• Such chips are called RDRAM. (RDRAM = Rambus DRAMs).


• Rambus channel has:
1) 9 Data-lines (lines 1–8 transfer the data, line 9 is used for parity checking)
2) Control-Line
3) Power line

• A two-channel Rambus has 18 data-lines; there are no separate address-lines.

• Communication between the processor and RDRAM modules is carried out by means of packets transmitted on the data-lines.

• There are 3 types of packets:


1) Request
2) Acknowledge
3) Data
5.3 READ ONLY MEMORY (ROM)
• Both SRAM and DRAM chips are volatile, i.e. they lose the stored information if power is turned
off.

• Many applications require non-volatile memory, which retains the stored information when power is turned off.

• For ex: the boot software that loads the OS from disk into memory must be stored in non-volatile memory.

• Non-volatile memory is used in embedded system.

• Since the normal operation involves only reading of stored data, a memory of this type is called
ROM.

At Logic value ‘0’
→ The transistor (T) is connected to the ground point (P).
→ The transistor switch is closed & the voltage on the bit-line drops nearly to zero.

At Logic value ‘1’
→ The transistor switch is open.
→ The bit-line remains at high voltage.
5.3.1 ROM
• Logic 0 is stored in the cell if the transistor is connected to ground at point P; otherwise, 1 is stored.

• The bit line is connected through a resistor to the power supply.

• To read the state of the cell, the word-line is activated to close the transistor switch.

• The voltage on the bit-line drops to zero if there is a connection between the transistor and ground.

• If there is no connection to ground, the bit line remains in the high voltage,
indicating a 1.

• A sense circuit at the end of the bit line generates the proper output value.

• Data are written in to ROM when it is manufactured.


• TYPES OF ROM

• Different types of non-volatile memory are


1) PROM
2) EPROM
3) EEPROM
4) Flash Memory
Flash Cards
Flash Drives
5.3.2 PROM (PROGRAMMABLE ROM)
• PROM allows the data to be loaded by the user.

• Programmability is achieved by inserting a “fuse” at point P in a ROM cell.

• Before PROM is programmed, the memory contains all 0’s.

• The user can insert 1’s at the required locations by burning out the fuse using a high current pulse.

• This process is irreversible.

• Advantages:
1) It provides flexibility.
2) It is faster.
3) It is less expensive because it can be programmed directly by the user.
5.3.3 EPROM (ERASABLE REPROGRAMMABLE ROM)
• EPROM allows
→ stored data to be erased and
→ new data to be loaded.
• In the cell, a connection to ground is always made at point P, and a special transistor is used.

• The transistor has the ability to function as


→ a normal transistor or
→ a disabled transistor that is always turned “off”.

• The transistor can be programmed to behave as a permanently open switch by injecting charge into it.

• Erasure requires dissipating the charges trapped in the transistors of the memory-cells.


• This can be done by exposing the chip to ultra-violet light.
• Advantages:
1) It provides flexibility during the development-phase of a digital system.
2) It is capable of retaining the stored information for a long time.

• Disadvantages:
1) The chip must be physically removed from the circuit for
reprogramming.
2) The entire contents need to be erased by UV light.
5.3.4 EEPROM (ELECTRICALLY ERASABLE PROGRAMMABLE ROM)
• Advantages:
1) It can be both programmed and erased electrically.
2) It allows cell contents to be erased selectively.

• Disadvantage:
1) It requires different voltages for erasing, writing, and reading the stored data.
5.3.5 FLASH MEMORY
• In EEPROM, it is possible to read & write the contents of a single cell.
• In a Flash device, it is possible to read the contents of a single cell, but writes must update the entire contents of a block.
• Prior to writing, the previous contents of the block are erased.
• Eg. In MP3 player, the flash memory stores the data that represents sound.
• A single flash chip cannot provide sufficient storage capacity for an embedded-system.
• Advantages:
1) Flash drives have greater density, which leads to higher capacity & lower cost per bit.
2) It requires single power supply voltage & consumes less power.
• There are 2 methods for implementing larger memory:
1) Flash Cards
2) Flash Drive
1) Flash Cards
→ One way of constructing a larger module is to mount flash-chips on a small card.
→ Such flash-cards have a standard interface.
→ The card is simply plugged into a conveniently accessible slot.
→ The memory-size of a card can be 8, 32 or 64 MB.
→ Eg: A minute of music can be stored in about 1 MB of memory; hence a 64 MB flash card can store an hour of music.
2) Flash Drives
→ Larger flash memory modules have been developed to replace hard disk-drives.
→ The flash drives are designed to fully emulate hard disks.
→ The flash drives are solid-state electronic devices that have no movable parts.
Advantages:
1) They have shorter seek & access times, which results in a faster response.
2) They have low power consumption, so they are attractive for battery-driven applications.
3) They are insensitive to vibration.
Disadvantages:
1) The capacity of a flash drive (around 1 GB) is small compared to a hard disk.
2) It leads to higher cost per bit.
3) Flash memory will weaken after it has been written a large number of times (typically at least one million times).
5.4 SPEED, SIZE, AND COST
• SRAM: very fast memory, but expensive because its basic cell has 6 transistors.
• DRAM chips have much simpler basic cells and are less expensive, but such memories are slower.
• Secondary storage devices (magnetic disks) are used to implement large memory spaces.
• Large disks are available at a reasonable price, but they are slower than semiconductor memory units.
• A huge amount of cost-effective storage can be provided by magnetic disks.
• A large main memory can be built with dynamic RAM technology.

• Registers:
• The fastest access is to data held in processor registers; they are at the top in terms of speed of access.
• Next level:
• A relatively small amount of memory that can be implemented directly on the processor chip, called the processor cache.
• The Cache-memory is of 2 types:
1) Primary/Processor Cache (Level1 or L1 cache)
->It is always located on the processor-chip.
->It is small because it competes for space on the processor chip.
2) Secondary Cache (Level2 or L2 cache)
-> It is placed between the primary-cache and the rest of the memory.
->Implemented using SRAM chips.
• Next level in the hierarchy:
• Main memory
• Implemented using dynamic memory components in the form of SIMMs,
DIMMs, RIMMs
• Larger
• Slower than the cache memory.

• Disk:
• Provides a huge amount of inexpensive storage
• Very slow compared to semiconductor devices used to implement the main
memory.
5.5 CACHE MEMORIES
• The effectiveness of cache mechanism is based on the property of “Locality of Reference”.
• Locality of Reference
• Many instructions in the localized areas of program are executed repeatedly during some time period
• Remainder of the program is accessed relatively infrequently
• There are 2 types:
1) Temporal
->The recently executed instructions are likely to be executed again very soon.
2) Spatial
->Instructions in close proximity to a recently executed instruction are also likely to be executed soon.
• If active segment of program is placed in cache-memory, then total execution time can be
reduced.
• Block refers to the set of contiguous address locations of some size.
• The cache-line is used to refer to the cache-block.
• The Cache-memory stores a reasonable number of blocks at a given time.
• This number of blocks is small compared to the total number of blocks available in
main-memory.
• Correspondence b/w main-memory-block & cache-memory-block is specified by
mapping-function.
• Cache control hardware decides which block should be removed to create space for
the new block.
• The collection of rules for making this decision is called the Replacement Algorithm.
• The cache control-circuit determines whether the requested-word currently exists
in the cache.
• The write-operation is done in 2 ways:
1) Write-through protocol &
2) Write-back protocol.
• Write-Through Protocol
→ Here the cache-location and the main-memory-location are updated simultaneously.
• Write-Back Protocol
→ This technique is to
→ update only the cache-location &
→ mark the cache-location with an associated flag bit called the Dirty/Modified Bit.
→ The word in memory will be updated later, when the marked-block is removed from the cache. (A sketch contrasting the two protocols follows.)
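A minimal sketch contrasting the two protocols (a hypothetical dict-based model, not a hardware description; cache, memory, and dirty are illustrative names):

```python
# Sketch: write-through vs. write-back, plus eviction of a dirty block.
cache, memory, dirty = {}, {}, set()

def write_through(addr, value):
    cache[addr] = value         # cache-location and ...
    memory[addr] = value        # ... main-memory updated simultaneously

def write_back(addr, value):
    cache[addr] = value         # update only the cache-location
    dirty.add(addr)             # set the Dirty/Modified bit

def evict(addr):
    if addr in dirty:           # marked block: memory is updated on removal
        memory[addr] = cache[addr]
        dirty.discard(addr)
    cache.pop(addr, None)
```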
During Read-operation
• If the requested word does not currently exist in the cache, a read-miss occurs.
• To reduce the cost of a read miss, the Load-Through/Early-Restart protocol is used.
• Load-Through Protocol
→ The block of words that contains the requested word is copied from the memory into the cache.
→ The requested word is forwarded to the processor as soon as it is read from the memory, without waiting for the entire block to be loaded.
During Write-operation
• If the requested word does not exist in the cache, a write-miss occurs.
1) If Write Through Protocol is used, the information is written directly into main-
memory.
2) If Write Back Protocol is used,
→ then block containing the addressed word is first brought into the cache &
→ then the desired word in the cache is over-written with the new information
5.5.1 MAPPING-FUNCTION
• 3 different mapping-function:
1) Direct Mapping
2) Associative Mapping
3) Set-Associative Mapping
DIRECT MAPPING
• Block-j of the main-memory maps onto block (j modulo 128) of the cache.

• Thus, whenever one of the memory-blocks 0, 128, or 256 is loaded into the cache, it is stored in cache-block 0.

• Similarly, memory-blocks 1, 129, and 257 are stored in cache-block 1.

• Contention may arise
1) when the cache is full, or
2) when more than one memory-block is mapped onto a given cache-block position.

• The contention is resolved by allowing the new block to overwrite the currently resident block.

• Memory-address determines placement of block in the cache.


• The memory-address is divided into 3 fields:
1) Low-order 4-bit field
→ Selects one of 16 words in a block.
2) 7-bit cache-block field
→ Determines the cache-position in which the new block must be stored.
3) 5-bit tag field
→ The high-order 5 bits of the memory-address of the block are stored in 5 tag-bits associated with the cache-location.
• As execution proceeds, the 5-bit tag field of the memory-address is compared with the tag-bits associated with the cache-location.
• If they match, then the desired word is in that block of the cache.
• Otherwise, the block containing the required word must first be read from the memory and then loaded into the cache. (The field extraction is sketched below.)
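A minimal sketch of the field extraction for this direct-mapped cache (direct_fields is a hypothetical helper; the 16-bit address layout follows the 5/7/4 split above):

```python
# Sketch: 16-bit address = 5-bit tag | 7-bit block | 4-bit word.
def direct_fields(addr):
    word  = addr & 0xF              # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F      # 7-bit field: cache-block position (j mod 128)
    tag   = (addr >> 11) & 0x1F     # high-order 5 bits: tag
    return tag, block, word

# memory-blocks 1 and 129 both map to cache-block 1, with different tags
print(direct_fields(1 * 16), direct_fields(129 * 16))   # (0, 1, 0) (1, 1, 0)
```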
ASSOCIATIVE MAPPING
• The memory-block can be placed into any cache-block position.

• 12 tag-bits identify a memory-block when it is resident in the cache.

• The tag-bits of an address received from the processor are compared to the tag-bits of each block of the cache.

• This comparison is done to see if the desired block is present.


• It gives complete freedom in choosing the cache-location.

• A new block that has to be brought into the cache has to replace an existing
block if the cache is full.

• To determine whether a given block is in the cache, the tags of all cache-blocks must be searched.

• Advantage: It is more flexible than direct mapping technique.

• Disadvantage: Its cost is high.


SET-ASSOCIATIVE MAPPING
• It is the combination of direct and associative mapping.
• The blocks of the cache are grouped into sets.
• The mapping allows a block of the main-memory to reside in any block of the specified
set.
• The cache has 2 blocks per set, so the memory-blocks 0, 64, 128, …, 4032 map into cache set 0.
• Such a block can occupy either of the two block positions within the set.
• 6-bit set field
→ Determines which set of the cache contains the desired block.
• 6-bit tag field
→ The tag field of the address is compared to the tags of the two blocks of the set.
→ This comparison is done to check if the desired block is present. (The field split is sketched at the end of this section.)
• A cache that contains 1 block per set is direct-mapped.
• A cache that has ‘k’ blocks per set is called a ‘k-way set-associative cache’.
• Each block contains a control-bit called a valid-bit.
• The Valid-bit indicates that whether the block contains valid-data.
• The dirty bit indicates that whether the block has been modified during its cache
residency.
• Valid-bit = 0 when power is initially applied to the system.
• Valid-bit = 1 when the block is loaded from the main-memory for the first time.
• If the main-memory-block is updated by a source and the block already exists in the cache, the valid-bit is cleared to 0.
• When the processor and a DMA device operate on copies of the same data, keeping the copies identical is called the Cache-Coherence Problem.
• Advantages:
1) The contention problem of direct mapping is eased by having a few choices for block placement.
2) The hardware cost is reduced by decreasing the size of the associative search.
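A minimal sketch of the address split for this 2-way set-associative cache (set_assoc_fields is a hypothetical helper; the layout follows the 6/6/4 split above):

```python
# Sketch: 16-bit address = 6-bit tag | 6-bit set | 4-bit word.
def set_assoc_fields(addr):
    word = addr & 0xF               # word within the block
    set_ = (addr >> 4) & 0x3F       # 6-bit set field: one of 64 sets
    tag  = (addr >> 10) & 0x3F      # 6-bit tag, compared with both blocks of the set
    return tag, set_, word

# memory-blocks 0, 64, 128 all fall in set 0, distinguished by their tags
print([set_assoc_fields(b * 16) for b in (0, 64, 128)])
# [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
```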
5.5.2 REPLACEMENT ALGORITHM
• In direct mapping method, the position of each block is pre-determined and there is no need of
replacement strategy.

• In associative & set associative method,

• The block position is not pre-determined.

• If the cache is full and if new blocks are brought into the cache, then the cache-controller must decide
which of the old blocks has to be replaced.

• When a block is to be overwritten, the block that has gone the longest time without being referenced is overwritten.

• This block is called Least recently Used (LRU) block & the technique is called LRU algorithm.

• The cache-controller tracks the references to all blocks with the help of block-counter.

• Advantage: The performance of LRU can be improved by introducing a small amount of randomness in deciding which block to overwrite.


• Eg: Consider 4 blocks/set in a set-associative cache.
→ A 2-bit counter can be used for each block.
→ When a ‘hit’ occurs:
• The counter of the referenced block is set to 0.
• Counters with values originally lower than the referenced one are incremented by 1; all others remain unchanged.
→ When a ‘miss’ occurs & the set is full:
• The block with counter value 3 is removed.
• The new block is put in its place, its counter is set to 0, and the other block counters are incremented by 1. (These rules are sketched below.)
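A minimal sketch of these counter rules for one 4-block set (hit and miss_full_set are hypothetical helper names):

```python
# Sketch: 2-bit-counter LRU for one 4-block set, following the rules above.
counters = [0, 1, 2, 3]                # one 2-bit counter per block

def hit(block):
    ref = counters[block]
    for b in range(4):
        if counters[b] < ref:          # only lower counters are incremented
            counters[b] += 1
    counters[block] = 0                # referenced block becomes most recent

def miss_full_set():
    victim = counters.index(3)         # block with counter value 3 is removed
    for b in range(4):
        counters[b] = 0 if b == victim else counters[b] + 1
    return victim                      # the new block takes this position

hit(2);  print(counters)               # [1, 2, 0, 3]
print(miss_full_set(), counters)       # 3 [2, 3, 1, 0]
```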
5.6 PERFORMANCE CONSIDERATIONS
• 2 key factors in the commercial success are
1) Performance
2) Cost

• The objective is the best possible performance at low cost.

• A common measure of success is called the Price Performance ratio.

• Performance depends on
→ how fast the machine instructions are brought to the processor &
→ how fast the machine instructions are executed.

• To achieve parallelism, interleaving is used, which allows several memory-modules to be accessed at the same time.
5.6.1 INTERLEAVING
• The main-memory of a computer is structured as a collection of physically separate
modules.

• Each module has its own


1) ABR (address buffer register) &
2) DBR (data buffer register).

• So, memory access operations may proceed in more than one module at the same time.

• Thus, the aggregate rate of transmission of words to/from the main-memory can be increased.
• The low-order k bits of the memory-address select a module, while the high-order m bits name a location within the module.

• In this way, consecutive addresses are located in successive modules.

• Any component of the system can keep several modules busy at any one time.

• This results in both


→ faster access to a block of data and
→ higher average utilization of the memory-system as a whole.

• To implement the interleaved structure, there must be 2^k modules; otherwise, there will be gaps of non-existent locations in the address-space (see the sketch below).
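A minimal sketch of the interleaved address decoding (k = 2, i.e. 4 modules, is an assumed value):

```python
# Sketch: low-order k bits select the module, high-order m bits the word in it.
K = 2                                    # assumed: 2**K = 4 modules

def interleave(addr):
    module = addr & ((1 << K) - 1)       # low-order k bits select a module
    within = addr >> K                   # high-order m bits: location in module
    return module, within

# consecutive addresses are located in successive modules
print([interleave(a)[0] for a in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]
```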
5.6.2 Hit Rate & Miss Penalty
• The number of hits stated as a fraction of all attempted accesses is called the Hit
Rate.
• The extra time needed to bring the desired information into the cache is called the
Miss Penalty.
• High hit rates well over 0.9 are essential for high-performance computers.
• Performance is adversely affected by the actions that need to be taken when a miss
occurs.
• A performance penalty is incurred because of the extra time needed to bring a block
of data from a slower unit to a faster unit.
• During that period, the processor is stalled waiting for instructions or data.
• We refer to the total access time seen by the processor when a miss occurs as the
miss penalty.
• Let h be the hit rate, M the miss penalty, and C the time to access information in
the cache. The average access time experienced by the processor is
tavg = hC + (1 − h)M
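The formula can be checked with a tiny sketch (the hit rate, cache access time, and miss penalty below are assumed example values, not from the text):

```python
# Sketch: t_avg = h*C + (1 - h)*M, evaluated for assumed values.
def t_avg(h, C, M):
    return h * C + (1 - h) * M

# e.g. hit rate 0.95, 1-cycle cache access, 17-cycle miss penalty (assumed)
print(t_avg(0.95, 1, 17))   # ~1.8 cycles on average
```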
5.6.3 Caches on the Processor Chip
• When information is transferred between different chips, considerable delays occur in driver and receiver gates
on the chips.

• It is best to implement the cache on the processor chip.

• Most processor chips include at least one L1 cache.

• There are 2 separate L1 caches, one for instructions and another for data.

• In high-performance processors, 2 levels of caches are normally used, separate L1 caches for instructions and
data and a larger L2 cache.

• These caches are often implemented on the processor chip.

• In this case, the L1 caches must be very fast, as they determine the memory access time seen by the processor.

• The L2 cache can be slower, but it should be much larger than the L1 caches to ensure a high hit rate.

• Its speed is less critical because it only affects the miss penalty of the L1 caches.

• A typical computer may have L1 caches with capacities of tens of kilobytes and an L2 cache of hundreds of
kilobytes or possibly several megabytes.
5.6.4 Other Enhancements

• In addition to the main design issues just discussed, several other possibilities exist for enhancing performance.
Write Buffer
• To improve performance, a Write buffer can be included for temporary storage of Write requests.
• The processor places each Write request into this buffer and continues execution of the next
instruction.
• The Write requests stored in the Write buffer are sent to the main memory whenever the memory is
not responding to Read requests.
• It is important that the Read requests be serviced quickly, because the processor usually cannot
proceed before receiving the data being read from the memory.
• Hence, these requests are given priority over Write requests.
• The Write buffer may hold a number of Write requests.
• Thus, it is possible that a subsequent Read request may refer to data that are still in the Write buffer.
• To ensure correct operation, the addresses of data to be read from the memory are always compared
with the addresses of the data in the Write buffer.
• In the case of a match, the data in the Write buffer are used.
Prefetching
• To avoid stalling the processor, it is possible to prefetch the data into the cache before they are needed.
• The simplest way to do this is through software.
• A special prefetch instruction may be provided in the instruction set of the processor.
• Executing this instruction causes the addressed data to be loaded into the cache, as in the case of a Read miss.
• A prefetch instruction is inserted in a program to cause the data to be loaded in the cache shortly before they
are needed in the program.
• Then, the processor will not have to wait for the referenced data as in the case of a Read miss.
• The hope is that prefetching will take place while the processor is busy executing instructions that do not result
in a Read miss, thus allowing accesses to the main memory to be overlapped with computation in the
processor.
• Prefetch instructions can be inserted into a program either by the programmer or by the compiler.
• Compilers are able to insert these instructions with good success for many applications.
• Software prefetching entails a certain overhead because inclusion of prefetch instructions increases the length
of programs.
• Some prefetches may load into the cache data that will not be used by the instructions that follow.
Lockup-Free Cache
• A cache that can support multiple outstanding misses is called lockup-
free.
• Such a cache must include circuitry that keeps track of all outstanding
misses.
• This may be done with special registers that hold the pertinent
information about these misses.
• Software prefetching motivates the need for a cache that is not locked up by a Read miss.
• In a pipelined processor, which overlaps the execution of several
instructions, a Read miss caused by one instruction could stall the
execution of other instructions.
• A lockup-free cache reduces the likelihood of such stalls.
