Module 3
Module 3
5.1,5.2,5.3,5.4,5.5(5.5.1,5.5.2),5.6
5.1 BASIC CONCEPTS
• The maximum size of memory that can be used in any computer is determined by the
addressing scheme.
• If the MAR is k bits long, then
→ the memory may contain up to 2^k addressable locations
• Upon receipt of MFC signal, the processor loads the data from the data-lines into
MDR.
• The processor writes the data into the memory-location by
→ loading the address of this location into MAR &
→ setting the R/W’ line to 0
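The relation between MAR width and addressable memory size can be checked with a tiny sketch (plain Python; the widths used are chosen only for illustration):

```python
def addressable_locations(mar_bits):
    """A k-bit MAR can address up to 2**k distinct memory locations."""
    return 2 ** mar_bits

# A 16-bit MAR addresses 64K locations; a 32-bit MAR addresses 4G locations.
print(addressable_locations(16))  # 65536
print(addressable_locations(32))  # 4294967296
```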
• Memory Access Time: It is the time that elapses between
→ initiation of an operation &
→ completion of that operation.
• Memory Cycle Time: It is the minimum time delay required between
the initiation of two successive memory-operations.
• RAM (Random Access Memory)
• In RAM, any location can be accessed for a Read/Write-operation in fixed
amount of time.
• Cache Memory
It is a small, fast memory that is inserted between
larger slower main-memory and
Processor
• When the word-line is at ground level, the transistors are turned off
and the latch retains its state.
Read Operation
• To read the state of the cell, the word-line is activated to close switches
T1 and T2.
• If the cell is in state 1, the signal on bit-line b is high and the signal on the
bit-line b’ is low.
• Sense/Write circuit
→ monitors the state of b & b’ and
→ sets the output accordingly.
Write Operation
• The state of the cell is set by
→ placing the appropriate value on bit-line b and its complement on b’
→ then activating the word-line.
• The contents must be periodically refreshed by restoring this capacitor charge to its
full value.
• In order to store information in the cell, the transistor T is turned ON.
• The appropriate voltage is applied to the bit-line which charges the capacitor.
• Hence, the information stored in the cell can be retrieved correctly only if it is read
before the charge on the capacitor drops below the threshold value.
• During a read-operation,
→ transistor is turned ON
→ a sense amplifier detects whether the charge on the capacitor is above the threshold
value.
• If (charge on capacitor) > (threshold value) → bit-line will have logic value 1
• If (charge on capacitor) < (threshold value) → bit-line will be set to logic value 0
ASYNCHRONOUS DRAM CONTD..
• Consider a 16-Mbit DRAM chip configured as 2M × 8.
• The 4096 cells in each row are divided into 512 groups of 8 so that a row can store 512
bytes of data.
• The high order 12 bits and the low order 9 bits of the address constitute the row and
column addresses of a byte respectively.
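The 12-bit/9-bit address split described above can be sketched in Python (the address value used in the example is purely illustrative):

```python
ROW_BITS, COL_BITS = 12, 9   # 2M x 8 chip: 21-bit byte address

def split_address(addr):
    """Split a 21-bit address into (row, column):
    high-order 12 bits select the row, low-order 9 bits select the column."""
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    col = addr & ((1 << COL_BITS) - 1)
    return row, col

# Example: an address built from row 5, column 3 splits back into (5, 3).
assert split_address((5 << COL_BITS) | 3) == (5, 3)
```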
• During Read/Write-operation,
→ row-address is applied first.
→ row-address is loaded into row-latch in response to a signal pulse on
RAS input of chip. (RAS = Row-address Strobe, CAS = Column-address
Strobe)
• When a Read-operation is initiated, all cells on the selected row are
read and refreshed.
• Shortly after the row-address is loaded, the column-address is
→ applied to the address pins &
→ loaded into the column-latch under control of a signal pulse on the CAS input.
• The information in the latch is decoded.
• The appropriate group of 8 Sense/Write circuits is selected.
• R/W’=1(read-operation) -> Output values of selected circuits are transferred to data-lines
D7-D0.
• R/W’=0(write-operation) ->Information on D7-D0 are transferred to the selected circuits.
• RAS’ & CAS’ are active-low so that they cause latching of address when they
change from high to low.
• To ensure that the contents of DRAMs are maintained, each row of cells is
accessed periodically.
• A special memory-circuit provides the necessary control signals RAS’ & CAS’ that
govern the timing.
• The processor must take into account the delay in the response of the memory.
Fast Page Mode
• Transferring the bytes in sequential order is achieved by applying the
consecutive sequence of column-address under the control of
successive CAS’ signals.
• A Read-operation causes the contents of all cells in the selected row to be loaded in
these latches.
• Data held in latches that correspond to selected columns are transferred into data-
output register.
• The memory typically takes 2 or 3 clock cycles to activate the selected row.
• After a delay of one clock cycle, the first set of data bits is placed on the
data-lines.
• The DDR-SDRAM transfer data on both the edges (leading edge, trailing edge).
• To make it possible to access the data at high rate, the cell array is organized into 2 banks.
• The memory-controller
→ forwards the row & column portions of the address to the memory,
→ generates the RAS’ & CAS’ signals, &
→ sends the R/W’ & CS’ signals to the memory.
Refresh overhead
• All dynamic memories have to be refreshed.
• In older DRAMs, a typical period for refreshing all rows was 16 ms.
• In typical SDRAMs, the refresh period is 64 ms.
• Consider an SDRAM whose cells are arranged in 8K (= 8192) rows.
• Suppose that it takes 4 clock cycles to access (read) each row.
• Then it takes 8192 × 4 = 32,768 cycles to refresh all rows.
• At a clock rate of 133 MHz, the time needed to refresh all rows is
32,768/(133×10^6) = 246×10^-6 seconds.
• The refreshing process thus occupies 0.246 ms in each 64 ms time interval.
• The refresh overhead is 0.246/64=0.0038 which is less than 0.4 percent of the
total time available for accessing the memory.
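The refresh-overhead figures above can be reproduced directly:

```python
ROWS = 8192              # 8K rows
CYCLES_PER_ROW = 4       # clock cycles to access (read) one row
CLOCK_HZ = 133e6         # 133 MHz clock
REFRESH_PERIOD = 64e-3   # every row must be refreshed within 64 ms

refresh_time = ROWS * CYCLES_PER_ROW / CLOCK_HZ   # ~246 microseconds
overhead = refresh_time / REFRESH_PERIOD          # ~0.0038, i.e. < 0.4%
print(refresh_time, overhead)
```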
5.2.7 RAMBUS MEMORY
• The use of a wide bus is expensive.
• The signals consist of much smaller voltage swings around a reference voltage
Vref.
• The reference voltage is about 2V.
• The two logical values are represented by 0.3V swings above and below Vref.
• The data are transmitted on both edges of the clock, so the effective
data-transfer rate is 800 MHz.
• Many applications require non-volatile memory which retains the stored information if power is
turned off.
• For ex: OS software has to be loaded from disk to memory i.e. it requires non-volatile memory.
• Since the normal operation involves only reading of stored data, a memory of this type is called
ROM.
• If there is no connection to ground, the bit line remains in the high voltage,
indicating a 1.
• A sense circuit at the end of the bit line generates the proper output value.
• The user can insert 1’s at the required locations by burning out fuses using high current-pulses.
• Advantages:
1) It provides flexibility.
2) It is faster.
3) It is less expensive because it can be programmed directly by the user.
5.3.3 EPROM (ERASABLE REPROGRAMMABLE ROM)
• EPROM allows
→ stored data to be erased and
→ new data to be loaded.
• In the cell, a connection to ground is always made at point P, and a special transistor is used.
• Disadvantages:
1) The chip must be physically removed from the circuit for
reprogramming.
2) The entire contents need to be erased by UV light.
5.3.4 EEPROM (ELECTRICALLY ERASABLE
ROM)
• Advantages:
1) It can be both programmed and erased electrically.
2) It allows cell contents to be erased selectively.
• Disadvantage:
1) It requires different voltage for erasing, writing and reading the stored
data.
5.3.5 FLASH MEMORY
• In EEPROM, it is possible to read & write the contents of a single cell.
• In a Flash device, it is possible to read the contents of a single cell, but writing is done
only for an entire block.
• Prior to writing, the previous contents of the block are erased.
• Eg. In MP3 player, the flash memory stores the data that represents sound.
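The erase-before-write rule can be illustrated with a toy model (a hypothetical sketch, not a real flash API; block size and values are invented for illustration):

```python
BLOCK_SIZE = 4  # words per block (tiny, for illustration only)

class FlashBlock:
    """Toy model of a flash block: it must be erased (set to the all-1s
    erased state) before new contents can be written."""
    def __init__(self):
        self.words = [0xFF] * BLOCK_SIZE   # erased state

    def erase(self):
        self.words = [0xFF] * BLOCK_SIZE

    def write(self, data):
        self.erase()               # prior contents are erased before writing
        self.words = list(data)

blk = FlashBlock()
blk.write([1, 2, 3, 4])            # whole-block write, never a single word
```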
• Single flash chips cannot provide sufficient storage capacity for embedded-system.
• Advantages:
1) Flash drives have greater density which leads to higher capacity & low cost per
bit.
2) It requires single power supply voltage & consumes less power.
• There are 2 methods for implementing larger memory:
1) Flash Cards
2) Flash Drive
1) Flash Cards
• DRAM chips have much simpler basic cells and are less expensive.
• But such memories are slower.
• Disk:
• Provides a huge amount of inexpensive storage
• Very slow compared to semiconductor devices used to implement the main
memory.
5.5 CACHE MEMORIES
• The effectiveness of cache mechanism is based on the property of “Locality of Reference”.
• Locality of Reference
• Many instructions in the localized areas of program are executed repeatedly during some time period
• Remainder of the program is accessed relatively infrequently
• There are 2 types:
1) Temporal
->The recently executed instructions are likely to be executed again very soon.
2) Spatial
->Instructions in close proximity to recently executed instruction are also likely to be executed
soon.
• If active segment of program is placed in cache-memory, then total execution time can be
reduced.
• Block refers to the set of contiguous address locations of some size.
• The term cache-line is also used to refer to a cache-block.
• The Cache-memory stores a reasonable number of blocks at a given time.
• This number of blocks is small compared to the total number of blocks available in
main-memory.
• Correspondence b/w main-memory-block & cache-memory-block is specified by
mapping-function.
• Cache control hardware decides which block should be removed to create space for
the new block.
• The collection of rules for making this decision is called the Replacement Algorithm.
• The cache control-circuit determines whether the requested-word currently exists
in the cache.
• The write-operation is done in 2 ways:
1) Write-through protocol &
2) Write-back protocol.
• Write-Through Protocol
Here the cache-location and the main-memory-locations are updated
simultaneously.
• Write-Back Protocol
This technique is to
→ update only the cache-location &
→ mark the cache-location with an associated flag bit called the Dirty/Modified Bit.
The word in memory will be updated later, when the marked-block is removed from the
cache.
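The two write policies can be contrasted with a toy single-line cache (a sketch only; the class and method names below are invented for illustration):

```python
class Cache:
    """Toy cache contrasting the write-through and write-back protocols."""
    def __init__(self, policy):
        self.policy = policy   # 'write-through' or 'write-back'
        self.memory = {}       # models main-memory contents
        self.line = {}         # models cache contents
        self.dirty = set()     # dirty/modified bits (write-back only)

    def write(self, addr, value):
        self.line[addr] = value
        if self.policy == 'write-through':
            self.memory[addr] = value   # memory updated simultaneously
        else:
            self.dirty.add(addr)        # memory updated only on removal

    def evict(self, addr):
        if addr in self.dirty:          # marked block: write back to memory
            self.memory[addr] = self.line[addr]
            self.dirty.discard(addr)
        self.line.pop(addr, None)

wt = Cache('write-through'); wt.write(10, 99)   # memory sees 99 at once
wb = Cache('write-back');    wb.write(10, 99)   # memory updated on evict
```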
During Read-operation
• If the requested-word does not currently exist in the cache, a read-miss occurs.
• Normally, the block of words that contains the requested-word is copied from the memory
into the cache, and only after the entire block is loaded is the requested-word
forwarded to the processor.
• To reduce the miss penalty, the Load-Through/Early-Restart protocol is used.
• Load-Through Protocol
The requested-word is forwarded to the processor as soon as it is read from the
memory, without waiting for the entire block to be loaded into the cache.
During Write-operation
• If the requested-word does not exist in the cache, a write-miss occurs.
1) If Write Through Protocol is used, the information is written directly into main-
memory.
2) If Write Back Protocol is used,
→ then block containing the addressed word is first brought into the cache &
→ then the desired word in the cache is over-written with the new information
5.5.1 MAPPING-FUNCTION
• 3 different mapping-function:
1) Direct Mapping
2) Associative Mapping
3) Set-Associative Mapping
DIRECT MAPPING
• Block-j of the main-memory maps onto block (j modulo 128) of the cache.
• When memory-blocks 0, 128, & 256 are loaded into the cache, each of them is stored in
cache-block 0.
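The direct-mapping rule can be sketched as follows (128 cache blocks, as in the description above):

```python
CACHE_BLOCKS = 128

def cache_block(j):
    """Direct mapping: main-memory block j maps to cache block j mod 128."""
    return j % CACHE_BLOCKS

# Memory-blocks 0, 128 and 256 all contend for cache-block 0.
assert cache_block(0) == cache_block(128) == cache_block(256) == 0
```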
• The contention is resolved by allowing the new blocks to overwrite the currently resident-
block.
• A new block that has to be brought into the cache has to replace an existing
block if the cache is full.
• If the cache is full and if new blocks are brought into the cache, then the cache-controller must decide
which of the old blocks has to be replaced.
• When a block is to be overwritten, the block with longest time w/o being referenced is over-written.
• This block is called Least recently Used (LRU) block & the technique is called LRU algorithm.
• The cache-controller tracks the references to all blocks with the help of block-counter.
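A minimal sketch of LRU replacement using Python's `OrderedDict` (the capacity and block numbers are illustrative, not from the notes):

```python
from collections import OrderedDict

class LRUCache:
    """Sketch of LRU replacement: on a miss with a full cache, the block
    that has gone longest without being referenced is overwritten."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # least recently used block first

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # mark as most recently used
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the LRU block
            self.blocks[block] = True

c = LRUCache(2)
for b in [1, 2, 1, 3]:   # block 2 is least recently used when 3 arrives
    c.access(b)
```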
• Performance depends on
→ how fast the machine instructions are brought to the processor &
→ how fast the machine instructions are executed.
• Performance can also be improved by parallelism, i.e., by allowing more than one
memory access to proceed at the same time.
5.6.1 INTERLEAVING
• The main-memory of a computer is structured as a collection of physically separate
modules.
• So, memory access operations may proceed in more than one module at the same time.
• Any component of the system can keep several modules busy at any one time.
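Low-order interleaving, in which consecutive addresses fall in successive modules, can be sketched as follows (the module count is chosen only for illustration):

```python
MODULES = 4       # 2**m modules, here m = 2
MODULE_BITS = 2

def module_of(addr):
    """Low-order interleaving: the low-order m bits of the address select
    the module; the remaining bits address a word within that module."""
    module = addr & (MODULES - 1)
    offset = addr >> MODULE_BITS
    return module, offset

# Four consecutive addresses hit four different modules,
# so their accesses can proceed at the same time.
assert [module_of(a)[0] for a in range(4)] == [0, 1, 2, 3]
```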
• There are 2 separate L1 caches, one for instructions and another for data.
• In high-performance processors, 2 levels of caches are normally used, separate L1 caches for instructions and
data and a larger L2 cache.
• In this case, the L1 caches must be very fast, as they determine the memory access time seen by the processor.
• The L2 cache can be slower, but it should be much larger than the L1 caches to ensure a high hit rate.
• Its speed is less critical because it only affects the miss penalty of the L1 caches.
• A typical computer may have L1 caches with capacities of tens of kilobytes and an L2 cache of hundreds of
kilobytes or possibly several megabytes.
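The effect of the two cache levels on the average access time can be illustrated with the standard hit-rate formula (the hit rates and cycle counts below are assumed for illustration, not taken from these notes):

```python
def avg_access_time(h1, h2, c1, c2, mem):
    """Average access time with L1 and L2 caches:
    t_avg = h1*c1 + (1 - h1) * (h2*c2 + (1 - h2)*mem),
    where h1, h2 are hit rates and c1, c2, mem are access times in cycles."""
    return h1 * c1 + (1 - h1) * (h2 * c2 + (1 - h2) * mem)

# Assumed numbers: 95% L1 hits at 1 cycle, 90% L2 hits at 10 cycles,
# and a 100-cycle main-memory access on an L2 miss.
t = avg_access_time(0.95, 0.90, 1, 10, 100)   # -> 1.9 cycles on average
```

Even with a 100-cycle memory, the high L1 hit rate keeps the average close to the L1 access time, which is why L1 speed dominates and L2 speed is less critical.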
5.6.4 Other Enhancements