Access Type
A memory level can be characterized by:
Access type,
Capacity,
Cycle time,
Latency,
Bandwidth,
Cost.
Random access: any access to any memory location takes the same fixed amount of time. For example, if a write operation to memory location 100 takes 15 ns, then a read operation from memory location 3000 also takes 15 ns.
Compare this with sequential access, in which the access time depends on the location; for example:
Access to location 100 takes 500 ns,
Access to location 101 takes 505 ns,
Access to location 300 may take 1500 ns.
The effectiveness of a memory hierarchy depends on the principle of moving
information into the fast memory infrequently and accessing it many times before
replacing it with new information. This principle is possible due to a phenomenon
called locality of reference; that is, within a given period of time, programs tend to
reference a relatively confined area of memory repeatedly. There exist two forms of
locality: temporal locality, whereby recently referenced items are likely to be
referenced again soon, and spatial locality, whereby items near a referenced item
are likely to be referenced soon after.
The sequence of events that takes place when the processor makes a
request for an item is as follows.
First, the item is sought in the first memory level of the memory
hierarchy.
The probability of finding the requested item in the first level is
called the hit ratio, h1.
The probability of not finding (missing) the requested item in the
first level of the memory hierarchy is called the miss ratio, (1- h1).
When the requested item causes a miss, it is sought in the next
subsequent memory level.
The probability of finding the requested item in the second memory
level, the hit ratio of the second level, is h2.
The miss ratio of the second memory level is (1 - h2).
The process is repeated until the item is found. Upon finding the
requested item, it is brought and sent to the processor.
In a memory hierarchy that consists of three levels, the average memory
access time can be expressed as follows:
tav = h1 × t1 + (1 − h1) × h2 × t2 + (1 − h1) × (1 − h2) × t3
where t1, t2, and t3 are the access times of the three levels, and the requested item is assumed to always be found in the third level.
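As a rough numerical sketch of the three-level average access time (the hit ratios and access times below are illustrative assumptions, not values from the text):

```python
# Average access time for a three-level hierarchy:
# t_av = h1*t1 + (1 - h1)*h2*t2 + (1 - h1)*(1 - h2)*t3
# (the item is assumed to always be found in the third level)
def avg_access_time(h1, h2, t1, t2, t3):
    return h1 * t1 + (1 - h1) * h2 * t2 + (1 - h1) * (1 - h2) * t3

# Illustrative numbers: cache 10 ns, main memory 100 ns, disk 10,000 ns
print(avg_access_time(0.9, 0.95, 10, 100, 10_000))
```

Note how the slow third level contributes heavily even though it is reached only 0.5% of the time.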
CACHE MEMORY
A distinction can be made between two types of memory:
the conventional memory and the slave memory.
Slave memory is a second, unconventional level of high-speed memory: the cache
memory.
(The term cache means a safe place for hiding or storing things).
The idea behind using a cache as the first level of the memory hierarchy is to keep
the information expected to be used more frequently by the CPU in the cache (a
small high-speed memory that is near the CPU).
The end result is that at any given time some active portion of the main memory is duplicated in the
cache.
When the processor makes a request for a memory reference, the request is first
sought in the cache.
A cache hit ratio, hc, is defined as the probability of finding the requested element in
the cache. A cache miss ratio (1 - hc) is defined as the probability of not finding the
requested element in the cache.
In the case that the requested element is not found in the cache, then it has to be
brought from a subsequent memory level in the memory hierarchy.
If the element exists in the next memory level, that is, the main memory, then it has
to be brought and placed in the cache.
In expectation that the next requested element will reside in the neighboring
locality of the currently requested element (spatial locality), upon a cache miss
what is actually brought from the main memory to the cache is a block of elements
that contains the requested element.
The advantage of transferring a block from the main memory to the cache is
most visible if the entire block can be transferred in one main memory
access time. This can be achieved by increasing the rate at which
information is transferred between the main memory and the cache. One
technique used to increase this bandwidth is memory interleaving, whereby
successive memory addresses are stored in successive memory modules. The
block brought from the main memory to the cache upon a cache miss then
consists of elements stored in different memory modules. Figure 6.2
illustrates the simple case of a main memory consisting of eight memory
modules; it is assumed in this case that the block consists of 8 bytes.
Having introduced the basic idea leading to the use of a
cache memory, we would like to assess the impact of temporal and spatial locality
on the performance of the memory hierarchy. In order to make such an assessment,
we will limit our deliberation to the simple case of a hierarchy consisting only of two
levels, that is, the cache and the main memory. We assume that the main memory
access time is tm and the cache access time is tc. We will measure the impact of
locality in terms of the average access time, defined as the average time required to
access an element (a word) requested by the processor in such a two-level
hierarchy.
Assume that:
The size of the block transferred from the main memory to the cache,
upon a cache miss, is m elements.
All m elements were requested, one at a time, by the processor.
Based on these assumptions, the average access time, tav, is given by
tav = (tm + m × tc) / m
leading to:
tav = tm / m + tc
That is, as the block size m increases, the average access time approaches the cache access time tc.
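The effect of the block size m can be sketched numerically (the tc and tm values below are illustrative assumptions):

```python
# Two-level average access time under the stated assumptions: one miss
# brings a block of m elements at cost tm, after which all m elements
# are served from the cache at cost tc each:
#   t_av = (tm + m*tc) / m = tm/m + tc
def avg_access_time(m, tc, tm):
    return (tm + m * tc) / m

for m in (1, 4, 16, 64):
    print(m, avg_access_time(m, tc=10, tm=100))
# As m grows, t_av approaches tc.
```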
Cache-Mapping Function
The cache-mapping function defines the interface between the cache and the main memory.
A request for accessing a memory element is made by the processor
through issuing the address of the requested element.
The address issued by the processor may correspond to that of an
element that exists currently in the cache (cache hit); otherwise,
It may correspond to an element that is currently residing in the main
memory.
Therefore, address translation has to be made in order to determine the
whereabouts of the requested element.
This is one of the functions performed by the memory management unit
(MMU).
A schematic of the address mapping function is shown in Figure 6.3.
If the element is not currently in the cache, then it will be brought (as part of a
block) from the main memory and placed in the cache, and the requested
element is made available to the processor.
Direct Mapping
It places an incoming main memory block into a specific fixed
cache block location.
The placement is done based on a fixed relation between the
incoming block number, i, the cache block number, j, and the
number of cache blocks, N:
j = i mod N
Example 1
A main memory consisting of 4K blocks, a cache memory consisting of
128 blocks, and a block size of 16 words.
Figure 6.4 shows the division of the main memory and the cache
according to the direct-mapped cache technique.
There are a total of 32 main memory blocks that map to a given cache
block. For example, main memory blocks 0, 128, 256, 384, . . . , 3968
map to cache block 0.
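As a minimal sketch, the mapping rule and the numbers from this example can be checked directly:

```python
# Direct mapping: main-memory block i goes to cache block j = i mod N.
# Numbers from Example 1: N = 128 cache blocks.
N = 128
for i in (0, 128, 256, 384, 3968):
    print(i, "->", i % N)  # all five map to cache block 0
```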
The search made in step 1 above requires matching the tag field of the
address with each and every entry in the tag memory.
Such a search, if done sequentially, could lead to a long delay. Therefore, the tags
are stored in an associative (content-addressable) memory. This allows the
entire contents of the tag memory to be searched in parallel
(associatively), hence the name associative mapping.
Each line of the cache contains its own compare circuitry that is able to
discern in an instant whether or not the block is contained at that line.
With all of the lines performing this compare in parallel, the correct line is
identified quickly.
When a computer system is powered up, all valid bits are made equal to
0, indicating that they carry invalid information.
As blocks are brought to the cache, their statuses are changed
accordingly to indicate the validity of the information contained.
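A software sketch of this parallel tag search follows; the hardware compares all lines at once, whereas here it is modeled as a scan, and the line contents and field names are invented for illustration:

```python
# Sketch of a fully associative lookup: every line's tag is compared
# against the requested tag (in hardware, all comparisons run in parallel).
lines = [
    {"valid": True,  "tag": 0x3A, "data": [0x11, 0x22, 0x33, 0x44]},
    {"valid": True,  "tag": 0x7F, "data": [0xAA, 0xBB, 0xCC, 0xDD]},
    {"valid": False, "tag": 0x00, "data": [0, 0, 0, 0]},  # invalid at power-up
]

def lookup(tag, word):
    for line in lines:                     # hardware does this in parallel
        if line["valid"] and line["tag"] == tag:
            return line["data"][word]      # cache hit
    return None                            # cache miss

print(lookup(0x7F, 2))  # hit
print(lookup(0x55, 0))  # miss
```

Note that the invalid line is skipped even though its tag would match a request for tag 0, which is exactly what the valid bits are for.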
Set-Associative Mapping
The cache is divided into a number of sets. Each set consists of a
number of blocks.
A given main memory block maps to a specific cache set based on the
equation
s = i mod S
where S is the number of sets in the cache, i is the main memory block
number, and s is the specific cache set to which block i maps.
An incoming block maps to any block in the assigned cache set.
Therefore, the address issued by the processor is divided into three
distinct fields. These are the Tag, Set, and Word fields.
The Set field is used to uniquely identify the specific cache set that
ideally should hold the targeted block.
The Tag field uniquely identifies the targeted block within the determined
set.
The Word field identifies the element (word) within the block that is
requested by the processor.
According to the set-associative mapping technique, the MMU interprets
the address issued by the processor by dividing it into three fields as
shown in Figure 6.9. The length, in bits, of each of the fields of Figure
6.9 is given by
1. Word field = log2 B, where B is the size of the block in words
2. Set field = log2 S, where S is the number of sets in the cache
3. Tag field = log2 (M/S), where M is the size of the main memory in
blocks.
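These field widths can be computed directly. The sizes below are illustrative assumptions (a 32-set cache with 16-word blocks and a 4096-block main memory):

```python
import math

# Field widths for the set-associative address, per the formulas above.
B, S, M = 16, 32, 4096           # block size (words), sets, memory size (blocks)
word_bits = int(math.log2(B))    # log2(B)  = 4
set_bits = int(math.log2(S))     # log2(S)  = 5
tag_bits = int(math.log2(M // S))  # log2(M/S) = 7
print(word_bits, set_bits, tag_bits)
```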
1. Use the Set field (5 bits) to determine (directly) the specified set (1 of
the 32 sets).
2. Use the Tag field to find a match with any of the (four) blocks in the
determined set. A match in the tag memory indicates that the
specified set determined in step 1 is currently holding the targeted
block, that is, a cache hit.
3. Among the 16 words (elements) contained in the hit cache block, the
requested word is selected using a selector with the help of the Word
field.
4. If in step 2, no match is found, then this indicates a cache miss.
Therefore, the required block has to be brought from the main
memory, deposited in the specified set first, and the targeted element
(word) is made available to the processor. The cache Tag memory
and the cache block memory have to be updated accordingly.
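The four steps above can be sketched in software; the cache sizes and stored values below are illustrative assumptions, and the miss path is reduced to returning None rather than modeling the block fetch:

```python
# Sketch of a set-associative lookup: S = 32 sets, 4 ways, B = 16 words/block.
S, WAYS, B = 32, 4, 16
cache = [[{"valid": False, "tag": 0, "data": [0] * B}
          for _ in range(WAYS)] for _ in range(S)]

def access(address):
    word = address % B             # Word field
    s = (address // B) % S         # Set field (step 1: select the set)
    tag = address // (B * S)       # Tag field
    for line in cache[s]:          # step 2: compare tags within the set
        if line["valid"] and line["tag"] == tag:
            return line["data"][word]  # step 3: select the word (hit)
    return None                    # step 4: miss; block must come from memory

# Simulate a block being brought into one way of its set, then re-accessed.
addr = 0x1234
cache[(addr // B) % S][0] = {"valid": True, "tag": addr // (B * S),
                             "data": list(range(B))}
print(access(addr))
```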
Example 1
Assume a cache system has been designed such that each block contains 4 words and the
cache has 1024 lines, i.e., the cache can store up to 1024 blocks. What line of the cache is
supposed to hold the block that contains the word from the 20-bit address 3A456₁₆? In
addition, what is the tag number that will be stored with the block?
Solution
Start by dividing the address into its word id, line id, and tag bits. Since 4 = 2², the two
least significant bits identify the word, i.e., w = 2. Since the cache has 1024 = 2¹⁰ lines, the
next 10 bits identify the line number where the data is supposed to be stored in the cache, i.e.,
l = 10. The remaining t = 20 − w − l = 8 bits are the tag bits. This partitions the address
3A456₁₆ = 0011 1010 0100 0101 0110₂ as follows:
Therefore, the block from address 3A454₁₆ to 3A457₁₆ will be stored in line 0100010101₂ =
277₁₀ of the cache with the tag 00111010₂.
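The partition in this solution can be verified with a short calculation, using the w = 2, l = 10, t = 8 split above:

```python
# Partition the 20-bit address 3A456 (hex): 2 word bits, 10 line bits, 8 tag bits.
addr = 0x3A456
word = addr & 0b11           # 2 least significant bits
line = (addr >> 2) & 0x3FF   # next 10 bits
tag = addr >> 12             # remaining 8 bits
print(word, line, tag)
```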
Example 2
The first 10 lines of a 256-line cache are shown in the table below. Identify the address of the
data that is shaded (D8₁₆). For this cache, a block contains 4 words. The tags are given in
binary in the table.
Example 3
Using the table from the previous example, determine if the data stored in main memory at
address 101C₁₆ is contained in this cache, and if it is, retrieve the data.
Converting 101C₁₆ to binary gives us 0001 0000 0001 1100₂.
From this we see that the line in the cache where this data should be stored is 00000111₂ =
7₁₀. The tag currently stored in this line is 000100₂, which equals the tag from the above
partitioned address, so this is a cache hit.
The word is FE₁₆.
Example 4
Using the same table from the previous two examples, determine if the data from address
9827₁₆ is in the cache.
9827₁₆ = 1001 1000 0010 0111₂.
The tag is 100110₂, the line number is 00001001₂ = 9₁₀, and the word offset into the block
is 11₂.
Looking at line number 9, we see that the tag stored there equals 101000₂.
Since this does not equal 100110₂, the data from that address is not contained in this cache,
and we will have to get it from the main memory.
Example 5
The table below represents five lines from a cache that uses fully associative mapping with a
block size of eight. Identify the address of the shaded data (C9₁₆).
The tag for C9₁₆ is 0100011010101₂. When combining this with the word id of 001₂, the
address in main memory from which C9₁₆ was retrieved is 0100011010101001₂ = 46A9₁₆.
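In a fully associative cache the address is simply the tag concatenated with the word id, so this example can be checked quickly:

```python
# Block size 8 -> 3 word bits; for a 16-bit address the tag is 13 bits.
tag = 0b0100011010101    # tag stored with the line
word = 0b001             # position of the shaded data within the block
address = (tag << 3) | word
print(hex(address))      # 0x46a9
```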
Example 6
Is the data from memory address 1E65₁₆ contained in the table from the previous example?
Since 1E65₁₆ = 0001111001100101₂, the tag is the upper 13 bits, 0001111001100₂, and the
word id is 101₂. This tag is found in the table, so this is a cache hit.
The word is 9E₁₆.
Example 7
Identify the set number where the block containing the address 29ABCDE8₁₆ will be stored.
In addition, identify the tag and the lower and upper addresses of the block. Assume the
cache is a 4-way set-associative cache with 4K lines (blocks), each block containing 16 words,
with a main memory of size 1 Gig.
Solution
First, we need to identify the partitioning of the bits in the memory address.
A 1 Gig = 2³⁰ memory space requires 30 address lines (bits).
Four of those address lines will be used to identify one out of the 16 words within the block.
Since the cache is 4-way set associative, its 4K lines form 4K/4 = 1K = 2¹⁰ sets, so the next
10 bits identify the set, and the remaining 30 − 4 − 10 = 16 bits are the tag.
For 29ABCDE8₁₆ = 10 1001 1010 1011 1100 1101 1110 1000₂, the set field is
0011011110₂ = 222₁₀ and the tag is 1010011010101111₂ = A6AF₁₆.
Notice: a set-associative cache that has k lines per set is referred to as a k-way set-associative
cache.
The lowest address of the block will be the one where word 0000₂ is stored, and the highest
address will be the one where word 1111₂ is stored.
Replacing the last four bits of 29ABCDE8₁₆ with 0000₂ gives us a low address of
29ABCDE0₁₆, while replacing them with 1111₂ gives us a high address of 29ABCDEF₁₆.
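The field extraction in this example can be sketched with bit operations, as a quick check:

```python
# Example 7 partition: 30-bit address, 16 words/block -> 4 word bits;
# 4K lines / 4 ways = 1K sets -> 10 set bits; tag = 30 - 4 - 10 = 16 bits.
addr = 0x29ABCDE8
block = addr >> 4            # drop the 4-bit Word field
s = block & 0x3FF            # 10-bit Set field
tag = block >> 10            # 16-bit Tag field
low = addr & ~0xF            # address of word 0000 of the block
high = addr | 0xF            # address of word 1111 of the block
print(s, hex(tag), hex(low), hex(high))
```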
The following figure shows an example of a 1 Meg memory space divided into four-word blocks.
Identify the line number, tag, and word position for each of the 30-bit addresses shown
below if they are stored in a cache using the direct mapping method.
a.) Address: 23D94EA6₁₆ Lines in cache: 4K Block size: 2
b.) Address: 1A54387F₁₆ Lines in cache: 8K Block size: 4
c.) Address: 3FE9704A₁₆ Lines in cache: 16K Block size: 16
d.) Address: 54381A5₁₆ Lines in cache: 1K Block size: 8
4. True or False: A block from main memory could possibly be stored in any line of a cache
using fully associative mapping.
5. What problem are the fully associative and set-associative mapping methods for caches
supposed to solve over the direct mapping method?
6. What is the easiest replacement algorithm to use with a 2-way set-associative cache?
7. The table below represents five lines from a cache that uses fully associative mapping with
a block size of eight. Identify the address of the shaded data (3B₁₆).
8. Using the table from the previous problem, identify the data value represented by each of
the following addresses.
a.) 76359₁₆
b.) 386AF₁₆
c.) BC5CC₁₆
9. Identify the set number, tag, and word position for each of the 30-bit addresses stored in
an 8K-line set-associative cache.
a.) Address: 23D94EA6₁₆
b.) Address: 1A54387F₁₆
c.) Address: 3FE9704A₁₆
d.) Address: 54381A5₁₆