Topics Covered: Memory Subsystem
Example #1: Effect of Interleaving
Consider a cache which has 8 words per block. On a read miss, the block that
contains the desired word must be copied from the memory into the cache. Assume
that the hardware has the following properties. It takes 1 clock cycle to send an address
to the main memory. The first word is accessed in 8 clock cycles, and subsequent
words are accessed in 4 clock cycles. Also, one clock cycle is necessary to send the
word to the cache. How many clock cycles does it take to send the block of words to
the cache?
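The total time taken is 1 + 8 + (7x4) + 1 = 38 clock cycles: 1 cycle to send the
address, 8 cycles for the first word, 4 cycles for each of the remaining 7 words,
and 1 cycle to send the last word to the cache.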
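The arithmetic above can be checked with a minimal sketch. The function name and parameter names are hypothetical, chosen only for illustration; the default values are the ones stated in the example.

```python
def block_transfer_cycles(words_per_block=8, addr_cycles=1,
                          first_word_cycles=8, next_word_cycles=4,
                          send_cycles=1):
    """Cycles to load one block from a single (non-interleaved) memory module.

    Assumes, as in the example, that only the final word's transfer to the
    cache adds to the total; earlier transfers overlap with the next access.
    """
    remaining_words = words_per_block - 1
    return (addr_cycles
            + first_word_cycles
            + remaining_words * next_word_cycles
            + send_cycles)


print(block_transfer_cycles())  # 38
```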
If the memory is constructed as four interleaved modules, then when the starting
address of the block arrives at the memory, all four modules begin accessing the
required data using the high order bits of the address. After 8 clock cycles, each
module has one word of data in its DBR. These words are transferred to the cache
one word at a time during the next 4 clock cycles. During this time, the next word
in each module is accessed. Then it takes another 4 clock cycles to transfer these
words to the cache.
Therefore, the total time taken is 1 + 8 + 4 + 4 = 17 clock cycles.
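The interleaved case can be sketched the same way. This is an illustrative model, not a general memory simulator: the function and parameter names are hypothetical, and it assumes the block size is a multiple of the number of modules.

```python
def interleaved_block_cycles(words_per_block=8, modules=4, addr_cycles=1,
                             first_access_cycles=8, next_access_cycles=4,
                             send_cycles=1):
    """Cycles to load one block from an interleaved memory.

    Assumption: words_per_block is a multiple of modules, so the block is
    fetched in whole rounds of `modules` words each.
    """
    rounds = words_per_block // modules
    transfer = modules * send_cycles  # cycles to send one round, word by word
    # Every round except the last overlaps its transfer with the next access,
    # so it costs whichever of the two is longer.
    overlapped = (rounds - 1) * max(transfer, next_access_cycles)
    return addr_cycles + first_access_cycles + overlapped + transfer


print(interleaved_block_cycles())  # 17
```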
Example #2: Effect of cache on processor chip
Consider the impact of the cache on the overall performance of the computer. Let
h be the hit rate, M be the miss penalty, that is, the time to access information in the
main memory, and C the time to access information in the cache. Then, the average
access time experienced by the processor is given by:
t_avg = hC + (1 - h)M
(Refer to page 332 of the textbook.)
Let us consider the following example. If the computer has no cache, then it takes 10
clock cycles for every memory read access. For a computer which has a cache that
holds 8 word blocks and an interleaved main memory, it takes 17 clock cycles to
transfer a block from the main memory to the cache. Assume that 30% of the instructions
require a memory access for data, so there are 130 memory accesses (100 instruction
fetches plus 30 data accesses) for every 100 instructions executed. Assume that the
hit rates in the cache are 0.95 for instructions and 0.9 for data. Then, the
improvement in performance is:
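The ratio can be worked out with a short sketch. Two assumptions are made here that the example does not state explicitly: a cache hit costs 1 clock cycle, and the miss penalty equals the 17-cycle block transfer time. The average time per access is taken as the weighted sum h*C + (1 - h)*M. All function and parameter names are hypothetical.

```python
def average_access_time(hit_rate, hit_cycles, miss_cycles):
    """Weighted average access time: h*C + (1 - h)*M."""
    return hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles


def cycles_without_cache(instr_fetches=100, data_accesses=30, mem_cycles=10):
    """Every access goes to main memory."""
    return (instr_fetches + data_accesses) * mem_cycles


def cycles_with_cache(instr_fetches=100, data_accesses=30,
                      h_instr=0.95, h_data=0.9,
                      hit_cycles=1, miss_cycles=17):
    """Assumes a 1-cycle hit and a 17-cycle miss penalty (see lead-in)."""
    return (instr_fetches * average_access_time(h_instr, hit_cycles, miss_cycles)
            + data_accesses * average_access_time(h_data, hit_cycles, miss_cycles))


improvement = cycles_without_cache() / cycles_with_cache()
print(f"{improvement:.2f}")  # 5.04
```

Under these assumptions the computer with the cache is roughly five times faster for memory accesses: 1300 cycles without the cache against about 258 cycles with it.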
Example #3: Effect of L1 & L2 cache.
Consider the impact of L1 and L2 cache on the overall performance of the processor.
Let h1 be the hit rate in cache L1, h2 the hit rate in cache L2, C1 the time to access
information in the L1 cache, C2 the time to access information in the L2 cache, and M
the time to access information in the main memory. Then, the average access time of the
processor is given by:
t_avg = h1 C1 + (1 - h1) h2 C2 + (1 - h1)(1 - h2) M
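A small sketch of the two-level calculation follows. It uses one common form of the average access time, h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M, where h2 is the hit rate among accesses that miss in L1. The numeric values in the usage line are hypothetical and are not taken from the example.

```python
def two_level_access_time(h1, h2, c1, c2, m):
    """Average access time with L1 and L2 caches.

    h1: L1 hit rate; h2: L2 hit rate for accesses that miss in L1;
    c1, c2, m: access times of L1, L2 and main memory.
    """
    l1_part = h1 * c1                   # served by L1
    l2_part = (1 - h1) * h2 * c2        # missed L1, served by L2
    mem_part = (1 - h1) * (1 - h2) * m  # missed both, served by main memory
    return l1_part + l2_part + mem_part


# Hypothetical values, for illustration only.
t_avg = two_level_access_time(h1=0.95, h2=0.9, c1=1, c2=10, m=100)
print(round(t_avg, 2))
```

With these illustrative numbers, the main memory term stays small because only (1 - h1)(1 - h2) = 0.5% of accesses reach main memory.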
Example #4: Set-associative cache
A computer system has a main memory of 64K 16-bit words. It has a cache of
128 blocks with 16 words per block, organized in a block-set-associative manner
with 2 blocks per set.
(a) Calculate the number of bits in each of the TAG, SET and WORD fields of the
main memory address format.
(b) Assume that the cache is initially empty. Suppose that the processor fetches 2080
words from locations 0, 1, ..., 2079, in that order. It then repeats this fetch sequence
nine more times. If the cache is 10 times faster than the main memory, estimate
the improvement factor resulting from the use of the cache. Assume that the LRU
algorithm is used for block replacement.
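For part (a), the field widths follow from the sizes given above. The sketch below assumes a word-addressable memory and power-of-two sizes; the function and parameter names are hypothetical.

```python
def address_fields(memory_words=64 * 1024, cache_blocks=128,
                   words_per_block=16, blocks_per_set=2):
    """Return (tag_bits, set_bits, word_bits) for a set-associative cache.

    Assumes word addressing and that all sizes are powers of two.
    """
    address_bits = (memory_words - 1).bit_length()  # 64K words -> 16 bits
    word_bits = (words_per_block - 1).bit_length()  # 16 words  -> 4 bits
    num_sets = cache_blocks // blocks_per_set       # 128 / 2   -> 64 sets
    set_bits = (num_sets - 1).bit_length()          # 64 sets   -> 6 bits
    tag_bits = address_bits - set_bits - word_bits  # 16 - 6 - 4 -> 6 bits
    return tag_bits, set_bits, word_bits


print(address_fields())  # (6, 6, 4)
```

Under these assumptions the 16-bit address splits into a 6-bit TAG, a 6-bit SET and a 4-bit WORD field.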
Words 0, 1, 2, ..., 2079 occupy blocks 0 to 129 in the main memory. After blocks 0,
1, ..., 127 have been read from the main memory into the cache on the first pass, the cache
is full. Because the replacement algorithm is LRU, main memory blocks that occupy
the first two sets of the 64 cache sets are always overwritten before they can be used
on a successive pass. In particular main memory blocks 0, 64 and 128 continually
displace each other in competing for the 2 block positions in cache set 0. Similarly,
main memory blocks 1, 65 and 129 continually displace each other in competing for
the 2 block positions in cache set 1. Main memory blocks that occupy the last 62 sets
are fetched once in the first pass and remain in the cache for the next 9 passes. On the
first pass all 130 blocks must be fetched from the main memory. On each of the 9
subsequent passes, the blocks in the last 62 sets of the cache (62x2=124) are found in
the cache. The remaining 6 blocks (130-124) must be fetched from the main memory.
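Let t be the cache access time, so that a main memory access takes 10t and a miss
costs 11t (10t for the main memory plus 1t for the cache).
Improvement factor = Time without cache / Time with cache
= 10x130x10t/(1x130x11t + 9(124x1t + 6x11t))
= 4.14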
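The hit and miss counts used above can be reproduced with a small block-level LRU simulation. This is a sketch under the same simplifications as the worked solution: accesses are counted per block rather than per word, a hit costs t, and a miss costs 11t. All names are hypothetical.

```python
from collections import OrderedDict


def simulate_lru(num_blocks=130, num_sets=64, ways=2, passes=10):
    """Count block-level hits and misses for repeated sequential passes."""
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for _ in range(passes):
        for block in range(num_blocks):
            cache_set = sets[block % num_sets]
            if block in cache_set:
                cache_set.move_to_end(block)       # mark as most recently used
                hits += 1
            else:
                if len(cache_set) >= ways:
                    cache_set.popitem(last=False)  # evict least recently used
                cache_set[block] = True
                misses += 1
    return hits, misses


hits, misses = simulate_lru()
time_without_cache = 10 * 130 * 10          # in units of t
time_with_cache = hits * 1 + misses * 11    # in units of t
improvement = time_without_cache / time_with_cache
print(hits, misses, f"{improvement:.2f}")   # 1116 184 4.14
```

The 184 misses are the 130 on the first pass plus 6 on each of the 9 later passes, which matches the denominator 130x11t + 9(124x1t + 6x11t) = 3140t.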