Chapter 3
i. Location
Internal memory is often equated with main memory.
The processor requires its own local memory in the form of registers (e.g., PC and IR).
The control unit portion of the processor may also require its own internal
memory.
Cache is another form of internal memory.
External memory consists of peripheral storage devices (e.g., disk and tape) that are accessible to the processor via an I/O controller.
ii. Capacity
For internal memory, capacity is typically expressed in terms of bytes or words; the word is the natural unit of organization.
Common word lengths are 8, 16, and 32 bits.
External memory capacity is typically expressed in terms of bytes (e.g., 1 GB, 20 MB).
iii. Unit of transfer
- The number of bits read out of or written into memory at a time (64 bits on the Pentium, for example).
- For external memory, data are often transferred in much larger units than a word; these are referred to as blocks.
iv. Access method
Direct:
- Individual blocks have a unique address.
- Access time depends on the location of the data and the previous location.
- E.g., disk.
Random:
- Individual addresses identify locations exactly.
- Access time is independent of location or previous access.
- E.g., RAM, cache.
Associative:
- Data are located by a comparison with the contents of a portion of the store.
- Access time is independent of location or previous access.
- E.g., cache.
v. Performance
Three performance parameters are used:
Access time (latency): for random-access memory, this is the time it takes to perform a read or write operation, that is, the time from the instant that an address is presented to the memory to the instant that data have been stored or made available for use. For non-random-access memory, it is the time it takes to position the read/write mechanism at the desired location.
Memory cycle time: applied to random-access memory, this consists of the access time plus any additional time required before a second access can commence. This additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively.
Transfer rate: the rate at which data can be transferred into or out of a memory unit.
Transfer rate calculation
For random-access memory:
- the transfer rate is 1/(cycle time), where the cycle time is the clock period.
For non-random-access memory:
- Tn = TA + N/R
Where:
Tn = average time to read or write N bits;
TA = average access time;
N = number of bits;
R = transfer rate, in bits per second (bps).
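As a quick worked example, the C sketch below evaluates Tn = TA + N/R for a hypothetical disk; the access time, transfer rate, and block size are assumed figures, not values from these notes.

    #include <stdio.h>

    int main(void) {
        double ta = 0.008;      /* TA: average access time, seconds */
        double r  = 100e6;      /* R: transfer rate, bits per second */
        double n  = 4096 * 8;   /* N: bits in one 4 KB block */

        double tn = ta + n / r; /* Tn = TA + N/R */
        printf("Average time to read the block: %.6f s\n", tn);
        return 0;
    }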
Registers:
- In the CPU.
Internal or main memory:
- May include one or more levels of cache.
- RAM.
External memory:
- Backing store.
A typical hierarchy is illustrated in Figure 3.1. As one goes down the hierarchy, the following occur:
a) Decreasing cost per bit;
b) Increasing capacity;
c) Increasing access time;
d) Decreasing frequency of access of the memory by the processor.
Locality of Reference
Two or more levels of memory can be used to produce an average access time approaching that of the highest (fastest) level.
The reason that this works well is called locality of reference.
In practice, memory references (both instructions and data) tend to cluster.
- Instructions: iterative loops and repetitive subroutine calls.
- Data: tables, arrays, etc. Over the short run, memory references cluster.
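A small C illustration of this clustering (illustrative only, not from the notes): the loop below is fetched repeatedly from the same few instruction addresses, and it touches consecutive array elements, so successive data references fall in the same region of memory.

    #include <stddef.h>

    /* Summing an array: the loop body exhibits temporal locality
       (the same instructions run over and over), and a[i], a[i+1]
       exhibit spatial locality (adjacent words in memory). */
    double sum(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }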
3.2 CACHE MEMORY PRINCIPLES
- A small amount of fast memory.
- Sits between normal main memory and the CPU.
- May be located on the CPU chip or in a separate module.
1. Intended to give a memory speed approaching that of the fastest memories available, but with a large size, at close to the price of slower memories.
2. The cache is checked first for all memory references.
3. If the reference is not found there, the entire main-memory block in which it resides is stored in a cache slot, called a line.
4. Each line includes a tag (usually a portion of the main memory address) which identifies which particular block is being stored.
5. Locality of reference implies that future references will likely come from this block of memory, so that cache line will probably be utilized repeatedly.
6. The proportion of memory references that are found already stored in the cache is called the hit ratio.
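To see what the hit ratio buys, the sketch below computes an effective access time for a single cache level, assuming that a miss costs the failed cache probe plus a main-memory access; the hit ratio and timings are assumptions, not figures from these notes.

    #include <stdio.h>

    int main(void) {
        double h  = 0.95;  /* hit ratio */
        double tc = 2.0;   /* cache access time, ns */
        double tm = 60.0;  /* main-memory access time, ns */

        /* hits cost tc; misses cost the probe plus a memory access */
        double t_avg = h * tc + (1.0 - h) * (tc + tm);
        printf("Effective access time: %.2f ns\n", t_avg); /* 5.00 ns */
        return 0;
    }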
Figure 3.2b depicts the use of multiple levels of cache memory: level 1 (L1), level 2 (L2), and level 3 (L3). The L2 cache is slower and typically larger than the L1 cache, and the L3 cache is slower and larger than the L2 cache.
MAPPING FUNCTION
- An algorithm is needed for mapping main memory blocks into cache lines because there are fewer cache lines than main memory blocks.
- Three techniques are used:
i. Direct mapping
The simplest technique.
Maps each block of main memory into only one possible cache line (an address-decomposition sketch follows this list).
ii. Associative mapping
Permits each main memory block to be loaded into any cache line.
The cache control logic interprets a memory address simply as a Tag field and a Word field.
To determine whether a block is in the cache, the cache control logic must simultaneously examine every line's Tag for a match.
iii. Set-associative mapping
A compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages.
The cache consists of a number of sets.
Each set contains a number of lines.
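The address-decomposition sketch promised above: under direct mapping, the cache control logic treats a main memory address as tag, line, and word fields. The geometry below (16-byte blocks, 128 lines) is a made-up example, not taken from these notes.

    #include <stdio.h>
    #include <stdint.h>

    #define WORD_BITS 4   /* 16-byte blocks  -> 4 word (offset) bits */
    #define LINE_BITS 7   /* 128 cache lines -> 7 line bits */

    int main(void) {
        uint32_t addr = 0x0001A2B4;   /* arbitrary example address */
        uint32_t word = addr & ((1u << WORD_BITS) - 1);
        uint32_t line = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
        uint32_t tag  = addr >> (WORD_BITS + LINE_BITS);

        /* the block can live only in cache line `line`; the stored
           tag distinguishes which memory block currently occupies it */
        printf("tag=0x%X line=%u word=%u\n", tag, line, word);
        return 0;
    }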
VICTIM CACHE
- Originally proposed as an approach to reduce the conflict misses of direct-mapped caches without affecting their fast access time.
- A fully associative cache.
- Typical size is 4 to 16 cache lines.
- Resides between the direct-mapped L1 cache and the next level of memory.
REPLACEMENT ALGORITHMS
- Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced.
- For direct mapping, there is only one possible line for any particular block, so no choice is possible.
- For the associative and set-associative techniques, a replacement algorithm is needed.
- To achieve high speed, the algorithm must be implemented in hardware (a software sketch of one common policy follows).
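These notes do not name a specific algorithm; least recently used (LRU) is one common choice, sketched below in C for a single 4-way set. The age counters are the simplest software model of what a real cache does in hardware.

    #include <stdio.h>

    #define WAYS 4

    typedef struct { int valid; unsigned tag; unsigned age; } Line;

    /* Touch `tag` in the set; on a miss, evict the least recently
       used (or an invalid) line. Returns the way holding the block. */
    int access_set(Line set[WAYS], unsigned tag) {
        for (int i = 0; i < WAYS; i++)
            set[i].age++;                     /* every line ages */
        for (int i = 0; i < WAYS; i++)
            if (set[i].valid && set[i].tag == tag) {
                set[i].age = 0;               /* hit: most recent */
                return i;
            }
        int victim = 0;                       /* miss: choose a victim */
        for (int i = 0; i < WAYS; i++) {
            if (!set[i].valid) { victim = i; break; }
            if (set[i].age > set[victim].age) victim = i;
        }
        set[victim] = (Line){ 1, tag, 0 };
        return victim;
    }

    int main(void) {
        Line set[WAYS] = { 0 };
        for (unsigned t = 0; t < 6; t++)      /* 6 blocks, 4 ways */
            printf("tag %u -> way %d\n", t, access_set(set, t));
        return 0;
    }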
WRITE POLICY
- When a block that is resident in the cache is to be replaced, there are two cases to consider:
1. If the old block in the cache has not been altered, it may be overwritten with the new block without first writing out the old block.
2. If at least one write operation has been performed on a word in that line of the cache, main memory must be updated by writing the line of the cache out to the block of memory before bringing in the new block.
- There are two problems to contend with:
1. More than one device may have access to main memory.
2. A more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache: if a word is altered in one cache, it could conceivably invalidate a word in other caches.
WRITE THROUGH AND WRITE BACK
Write through
- The simplest technique.
- All write operations are made to main memory as well as to the cache.
- The main disadvantage of this technique is that it generates substantial memory traffic and may create a bottleneck.
Write back
- Minimizes memory writes.
- Updates are made only in the cache (see the dirty-bit sketch below).
- Portions of main memory are invalid, and hence accesses by I/O modules can be allowed only through the cache.
- This makes for complex circuitry and a potential bottleneck.
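A minimal C sketch of the dirty-bit bookkeeping that separates the two policies (the structures are hypothetical): under write back a store merely marks the line dirty, and main memory is updated only when a dirty line is evicted; under write through every store would update memory immediately.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { bool valid, dirty; unsigned tag; } Line;

    static void write_block_to_memory(unsigned tag) {
        printf("writing back block with tag 0x%X\n", tag);
    }

    /* CPU writes a word in this line: write back defers the update */
    static void store(Line *l) {
        l->dirty = true;
        /* write through would call write_block_to_memory(l->tag) here
           as well, keeping memory consistent at the cost of traffic */
    }

    /* Replacing the line: only an altered block must be written out */
    static void evict(Line *l, unsigned new_tag) {
        if (l->valid && l->dirty)
            write_block_to_memory(l->tag);   /* case 2 above */
        l->valid = true;
        l->dirty = false;
        l->tag   = new_tag;
    }

    int main(void) {
        Line l = { true, false, 0x12 };
        store(&l);         /* a word in the line is altered */
        evict(&l, 0x34);   /* dirty, so the old block is written back */
        return 0;
    }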
LINE SIZE
- When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved.
- As the block size increases, the hit ratio will at first increase because of the principle of locality.
- As the block size increases, more useful data are brought into the cache.
- The hit ratio begins to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced.
- Two specific effects come into play:
Larger blocks reduce the number of blocks that fit into the cache.
As a block becomes larger, each additional word is farther from the requested word.
MULTILEVEL CACHES
- As logic density has increased, it has become possible to have a cache on the same chip as the processor.
- The on-chip cache reduces the processor's external bus activity, speeds up execution time, and increases overall system performance.
When the requested instruction or data is found in the on-chip cache, the bus access is eliminated.
On-chip cache accesses will complete appreciably faster than would even zero-wait-state bus cycles.
During this period the bus is free to support other transfers.
- Two-level cache:
The internal cache is designated level 1 (L1).
The external cache is designated level 2 (L2).
- The potential savings due to the use of an L2 cache depend on the hit rates in both the L1 and L2 caches (see the worked example after this list).
- The use of multilevel caches complicates all of the design issues related to caches, including size, replacement algorithm, and write policy.
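As an illustration of that hit-rate dependence (the worked example promised above; every number is an assumption), this sketch extends the earlier single-level calculation to two levels, assuming misses probe each level in turn:

    #include <stdio.h>

    int main(void) {
        double h1 = 0.90;  /* L1 hit rate */
        double h2 = 0.95;  /* L2 hit rate, measured on L1 misses */
        double t1 = 1.0;   /* L1 access time, ns */
        double t2 = 5.0;   /* L2 access time, ns */
        double tm = 60.0;  /* main-memory access time, ns */

        double t_avg = h1 * t1
                     + (1 - h1) * (h2 * (t1 + t2)
                                 + (1 - h2) * (t1 + t2 + tm));
        printf("Two-level effective access time: %.2f ns\n", t_avg);
        return 0;
    }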
UNIFIED VERSUS SPLIT CACHES
- It has become common to split the cache:
One cache dedicated to instructions.
One cache dedicated to data.
Both exist at the same level, typically as two L1 caches.
- Advantages of a unified cache:
A higher hit rate.
It balances the load between instruction and data fetches automatically.
Only one cache needs to be designed and implemented.
- The trend is toward split caches at L1 and unified caches for higher levels.
- Advantages of a split cache:
It eliminates cache contention between the instruction fetch/decode unit and the execution unit.
This is important in pipelining.
Physical types:
- The most common today are semiconductor memory; magnetic surface memory, used for disk and tape; and optical and magneto-optical memory.
Physical characteristics:
- In a volatile memory, information decays naturally or is lost when electrical power is switched off.
- In a nonvolatile memory, information once recorded remains without deterioration until deliberately changed; no electrical power is needed to retain the information. Magnetic-surface memories are nonvolatile.
- Semiconductor memory (memory on integrated circuits) may be either volatile or nonvolatile.
- Nonerasable memory cannot be altered, except by destroying the storage unit. Semiconductor memory of this type is known as read-only memory (ROM).
- Of necessity, a practical nonerasable memory must also be nonvolatile.
Capacity:
- For internal memory, this is typically expressed in terms of bytes (1 byte = 8 bits) or words.
- Common word lengths are 8, 16, and 32 bits.
- External memory capacity is typically expressed in terms of bytes.
3.7 With the aid of a suitable diagram, describe the factors to be considered when
designing a memory system, in terms of capacity, access time, frequency of access and
cost.
It can be concluded that smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories. The key to the success of this organization is the decreasing frequency of access at the lower levels. The use of two levels of memory to reduce average access time works in principle, but only if conditions (a) through (d) apply. Fortunately, condition (d) is also generally valid.
3.8 Explain what is meant by the term cache memory, and how it differs from the main
memory.
Cache memory is designed to combine the memory access time of expensive, high-speed memory with the large memory size of less expensive, lower-speed memory.
Cache memory differs from main memory in that it contains a copy of portions of main memory. The cache is closer to the CPU, so it is faster; it is also smaller than main memory.
3.9 Describe the operations of a single and a multiple level cache memory.
A single-level cache memory is faster than a multiple-level one, but it is smaller in size.
A multiple-level cache memory is a bit slower. It has three levels: level 1 (L1), level 2 (L2), and level 3 (L3). The L2 cache is slower and typically larger than the L1 cache, and the L3 cache is slower and larger than the L2 cache.