Memory and Cache
INTRODUCTION
A memory unit enables us to store data inside the computer. Computer memory always
adheres to the principle of locality.
The principle of locality, or locality of reference, is the tendency of a processor to access the same set of
memory locations repeatedly over a short period of time.
Cache memory (CPU memory) is high-speed SRAM that a computer microprocessor can
access more quickly than it can access regular RAM. This memory is typically integrated
directly into the CPU chip or placed on a separate chip that has a separate bus
interconnect with the CPU.
Data is transferred between the various levels of memory in blocks. A block is the minimum
unit of information transferred. If the data requested by the processor appears in some
block in the upper level, this is called a hit. If the data is not found in the upper level, the request is
called a miss. The lower level in the hierarchy is then accessed to retrieve the block
containing the requested data.
The fraction of memory accesses found in a cache is termed the hit rate or hit ratio.
The miss rate is the fraction of memory accesses not found in a level of the memory hierarchy.
Hit time is the time required to access a level of the memory hierarchy, including the time
needed to determine whether the access is a hit or a miss.
Because the upper level is smaller and built using faster memory parts, the hit time will be
much smaller than the time to access the next level in the hierarchy, which is the major
component of the miss penalty.
MEMORY HIERARCHY
A memory unit is a collection of semiconductor storage cells with circuits to access the data
stored in them. Data is stored in memory in words. The number of bits in a word
depends on the architecture of the computer. Generally, the word size is a multiple of 8 bits.
Memory is accessed through a unique system-assigned address. Access to data in
memory is based on the principle of locality.
Principle of Locality
The locality of reference or the principle of locality is the term applied to situations where
the same value or related storage locations are frequently accessed. There are three basic
types of locality of reference:
Temporal locality: Here a resource that is referenced at one point in time is referenced
again soon afterwards.
Spatial locality: Here the likelihood of referencing a storage location is greater if a storage
location near it has been recently referenced.
Sequential locality: Here storage is accessed sequentially, in descending or ascending
order.
Locality of reference leads to the memory hierarchy.
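The kinds of locality described above can be made concrete with a small sketch. It scans a hypothetical address trace and counts accesses that repeat a recently used address (temporal) or fall next to one (spatial; an ascending or descending walk shows up here as well). The function name and the `window` and `distance` parameters are assumptions chosen for illustration, not part of the text.

```python
def locality_profile(trace, window=4, distance=1):
    """Count accesses in an address trace that show temporal or spatial locality.

    window   -- how many previous accesses count as "recent" (illustrative value)
    distance -- how close two addresses must be to count as neighbours (illustrative value)
    """
    temporal = 0
    spatial = 0
    for i, addr in enumerate(trace):
        recent = trace[max(0, i - window):i]   # the last `window` accesses
        if addr in recent:
            temporal += 1                      # same location referenced again soon
        elif any(abs(addr - r) <= distance for r in recent):
            spatial += 1                       # a nearby location was referenced recently
    return temporal, spatial


# Hypothetical trace: a loop re-reads a counter at address 100 while walking an array at 0, 1, 2, 3
trace = [100, 0, 100, 1, 100, 2, 100, 3]
temporal, spatial = locality_profile(trace)
print("temporal:", temporal, "spatial:", spatial)
```

In this trace the repeated reads of address 100 count as temporal locality, and the walk through addresses 1, 2, 3 counts as spatial locality.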
Miss: If the requested data is not found in the upper levels of the memory hierarchy, it is called a
miss.
Hit rate or hit ratio: The fraction of memory accesses found in the upper level. It is a
performance metric. Hit ratio = Hits / (Hits + Misses)
Miss rate: The fraction of memory accesses not found in the upper level (1 - hit rate).
Hit Time: The time required for accessing a level of memory hierarchy, including the time
needed for finding whether the memory access is a hit or miss.
Miss penalty: The time required for fetching a block into a level of the memory hierarchy
from the lower level, including the time to access it, transmit it, insert it in the new level, and pass
the block to the requestor.
Bandwidth: The rate at which the memory transfers data.
Latency or access time: Memory latency is the length of time between the memory's
receipt of a read request and its release of the data corresponding to the request.
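As a quick worked example of the hit ratio and miss rate definitions above, the following sketch computes both from raw counts. The counts (950 hits, 50 misses) are made-up numbers for illustration only.

```python
def hit_ratio(hits, misses):
    """Hit ratio = hits / (hits + misses)."""
    total = hits + misses
    if total == 0:
        raise ValueError("no memory accesses recorded")
    return hits / total


def miss_rate(hits, misses):
    """Miss rate = 1 - hit ratio."""
    return 1.0 - hit_ratio(hits, misses)


# Hypothetical counts, not measurements from the text
hits, misses = 950, 50
print(f"hit ratio = {hit_ratio(hits, misses):.2f}")
print(f"miss rate = {miss_rate(hits, misses):.2f}")
```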
Memory access time increases as the level in the hierarchy increases. Because the CPU registers are located
in very close proximity to the CPU, they can be accessed very quickly, and they are the most
costly. As the level increases, the memory access time increases and the cost
decreases.
The instructions and data stored in the memory unit of the computer system are divided into
the following main groups:
Main or primary memory
Secondary memory
Primary Memory:
Primary memory is the main area in a computer in which data is stored for quick access by
the computer’s processor. It is divided into two parts:
i) Random Access Memory (RAM):
RAM is a type of primary memory. Any piece of data in it can be accessed at any time.
RAM stores data for as long as the computer is switched on or is in use. This type of
memory is volatile. The two types of RAM are:
Static RAM: This type of RAM is static in nature, as it does not have to be refreshed at
regular intervals. Static RAM is made of a large number of flip-flops on an IC. It is costlier
and has a lower packing density than dynamic RAM.
Dynamic RAM: This type of RAM holds each bit of data in an individual capacitor in an
integrated circuit. It is dynamic in the sense that the capacitor charge is repeatedly
refreshed to ensure the data remains intact.
ii) Read Only Memory (ROM):
ROM is nonvolatile memory. It retains stored data and information when the power is
turned off. In ROM, data is stored permanently and can't be altered by the programmer. There
are four types of ROM:
MROM (mask ROM): MROM is manufacturer-programmed ROM in which
data is burnt in by the manufacturer of the electronic equipment in which it is used; it is
not possible for a user to modify programs or data stored inside the ROM chip.
PROM (programmable ROM): PROM is one in which the user can load and store
"read-only" programs and data. In PROM, the programs or data are stored only the first time, and
the user cannot modify the stored data.
EPROM (erasable programmable ROM): EPROM is one in which is possible to erase
Secondary Memory:
Secondary memory is where programs and data are kept on a long-term basis. It is a cheaper
form of memory and slower than main or primary memory. It is non-volatile, and its data cannot
be accessed directly by the computer processor. It is the external memory of the computer
system.
Examples: hard disk drive, floppy disk, optical disk / CD-ROM.
A memory consists of cells in the form of an array. The basic element of semiconductor
memory is the cell. Each cell is capable of storing one bit of information. Each
row of cells constitutes a memory word, and all cells of a row are connected to a
common line referred to as a word line. A w × b memory has w words, each word having b
bits.
The basic memory element, the cell, can be in one of two states (0 or 1). Data can be written
into the cell and read from it.
In the above diagram there are 16 memory locations named w0, w1, w2, … w15. Each location
can store at most 8 bits of data (b0, b1, b2, … b7). Each location (wn) is a word line. The word
length in Fig 4.4 is 8.
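The w × b organization can be sketched as a small model in which each word is a row of one-bit cells. This is only an illustrative sketch: the class name `Memory` and its methods are hypothetical, and the 16 × 8 size simply mirrors the locations w0 to w15 and bits b0 to b7 described above.

```python
class Memory:
    """Toy model of a w x b memory: w words, each made of b one-bit cells."""

    def __init__(self, w, b):
        self.w = w
        self.b = b
        # One row of b cells per word; every cell starts in state 0
        self.cells = [[0] * b for _ in range(w)]

    def _check_address(self, address):
        if not 0 <= address < self.w:
            raise IndexError(f"address {address} is outside 0..{self.w - 1}")

    def write(self, address, value):
        """Store value in the word at address; cell i of the row holds bit b_i."""
        self._check_address(address)
        if not 0 <= value < (1 << self.b):
            raise ValueError(f"value does not fit in {self.b} bits")
        self.cells[address] = [(value >> i) & 1 for i in range(self.b)]

    def read(self, address):
        """Reassemble the word at address from its individual cells."""
        self._check_address(address)
        return sum(bit << i for i, bit in enumerate(self.cells[address]))


mem = Memory(w=16, b=8)        # 16 words of 8 bits each
mem.write(3, 0b10110001)       # select word line w3 and write 8 bits
print(mem.read(3))             # prints 177
```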
The cache memory stores instructions and data that are more frequently used or data that is
likely to be used next. The processor looks first in the cache memory for the data. If it finds
the instructions or data there, it does not have to perform a more time-consuming read from
larger main memory or other data storage devices.
The processor does not need to know the exact location of the cache. It can simply issue
read and write instructions. The cache control circuitry determines whether the requested data
resides in the cache.
Cache and temporal reference: When data is requested by the processor, the data should
be loaded in the cache and should be retained till it is needed again.
Cache and spatial reference: Instead of fetching a single data item, a contiguous block of data is
loaded into the cache.
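The benefit of loading a contiguous block can be illustrated with a rough simulation of a sequential scan. The sketch assumes a cache large enough that nothing is ever evicted, so it isolates the effect of block size alone; the function name and the sizes used are illustrative choices.

```python
def sequential_hit_rate(num_accesses, block_size):
    """Hit rate for a scan of addresses 0..num_accesses-1 when each miss loads a whole block."""
    cached_blocks = set()      # assumption: the cache never fills, so no block is evicted
    hits = 0
    for address in range(num_accesses):
        block = address // block_size      # block that contains this address
        if block in cached_blocks:
            hits += 1                      # a neighbour already brought this block in
        else:
            cached_blocks.add(block)       # miss: fetch the whole contiguous block
    return hits / num_accesses


for block_size in (1, 4, 8):
    print(block_size, sequential_hit_rate(64, block_size))
```

With a block size of 1 every access misses, while with a block size of 4 only the first access to each block misses, so three out of every four accesses hit.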
Terminologies in Cache
Split cache: It has a separate data cache and a separate instruction cache. The two caches
work in parallel, one transferring data and the other transferring instructions.
Dual or unified cache: The data and the instructions are stored in the same cache. A
combined cache with a total size equal to the sum of the two split caches will usually have a
better hit rate.
Mapping function: The correspondence between the main memory blocks and those in the
cache is specified by a mapping function.
Cache Replacement: When the cache is full and a memory word that is not in the cache is
referenced, the cache control hardware must decide which block should be removed to
create space for the new block that contains the referenced word. The collection of rules for
making this decision is the replacement algorithm.
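The text does not name a particular replacement algorithm. As one common example, the sketch below uses least recently used (LRU) replacement: when the cache is full, the block that has gone unused the longest is removed. The class name, the two-block capacity, and the access sequence are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache of block numbers with least-recently-used replacement (one possible policy)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()    # keys are block numbers, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, block):
        """Reference a block; return True on a hit and False on a miss."""
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)       # mark as most recently used
            return True
        self.misses += 1
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)      # cache full: remove the least recently used block
        self.blocks[block] = None                # bring the referenced block in
        return False


cache = LRUCache(num_blocks=2)
results = [cache.access(b) for b in [1, 2, 1, 3, 2]]
print(results)                 # [False, False, True, False, False]
print(list(cache.blocks))      # [3, 2]
```

When block 3 is referenced the cache already holds blocks 1 and 2; block 2 was used least recently, so it is the one removed, and the later reference to block 2 misses again.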
Hit ratio = hits / (hits + misses) = Number of hits / Total accesses to the cache
Miss penalty or cache penalty is the sum of the time to place a block in the cache and the time to
deliver the block to the CPU.
Miss penalty = time for block replacement + time to deliver the block to the CPU
Cache performance can be enhanced by using a larger cache block size and higher associativity,
and by reducing the miss rate, the miss penalty, and the time to hit in the cache.
The CPU execution time of a given task is defined as the time spent by the system executing that task,
including the time spent executing run-time or system services.
CPU execution time = (CPU clock cycles + memory stall cycles (if any)) x Clock cycle time
The memory stall cycles are a count of the cycles during which the
CPU is waiting for memory accesses. This depends on the number of cache misses and the cost per miss
(cache penalty).
Memory stall cycles = Number of cache misses x Miss penalty
= IC x (Misses / Instruction) x Miss penalty
= IC x (Memory accesses / Instruction) x Miss rate x Miss penalty
= IC x Reads per instruction x Read miss rate x Read miss penalty
+ IC x Writes per instruction x Write miss rate x Write miss penalty
where IC is the instruction count.
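A short worked example may help tie these formulas together. It computes memory stall cycles as the product of instruction count, memory accesses per instruction, miss rate, and miss penalty, and then the CPU execution time. All of the input numbers (instruction count, base cycles per instruction, accesses per instruction, miss rate, miss penalty, clock period) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def memory_stall_cycles(instruction_count, accesses_per_instruction, miss_rate, miss_penalty):
    """Stall cycles = IC x (memory accesses / instruction) x miss rate x miss penalty."""
    return instruction_count * accesses_per_instruction * miss_rate * miss_penalty


def cpu_execution_time(cpu_clock_cycles, stall_cycles, clock_cycle_time):
    """CPU execution time = (CPU clock cycles + memory stall cycles) x clock cycle time."""
    return (cpu_clock_cycles + stall_cycles) * clock_cycle_time


# Hypothetical workload, not taken from the text
ic = 1_000_000               # instructions executed
base_cpi = 2                 # assumed cycles per instruction, ignoring memory stalls
accesses_per_instr = 1.5     # memory accesses per instruction
miss_rate = 0.02             # 2% of accesses miss in the cache
miss_penalty = 100           # cycles per miss
clock_cycle_time = 1e-9      # 1 ns clock period

stalls = memory_stall_cycles(ic, accesses_per_instr, miss_rate, miss_penalty)
exec_time = cpu_execution_time(ic * base_cpi, stalls, clock_cycle_time)
print(f"stall cycles   = {stalls:,.0f}")
print(f"execution time = {exec_time * 1e3:.1f} ms")
```

With these numbers the stall cycles (about 3 million) exceed the 2 million cycles spent on the instructions themselves, which shows why reducing the miss rate or the miss penalty has such a large effect on execution time.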