Module2 Part2 Memory
Module 2
Basic Concepts
The maximum size of the memory that can be
used in any computer is determined by the
addressing scheme.
For example, a computer that generates 16-bit
addresses is capable of addressing up to 2^16 =
64K memory locations.
Machines whose instructions generate 32-bit
addresses can utilize a memory that contains up
to 2^32 = 4G (giga) locations.
The number of locations represents the size of
the address space of the computer.
Basic Concepts..
1K = 2^10 = 1024      1M = 2^20 = 1048576
2K = 2^11 = 2048      1G = 2^30 = 1073741824
4K = 2^12 = 4096
8K = 2^13 = 8192
16K = 2^14 = 16384
32K = 2^15 = 32768
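The relation between address width and address-space size can be checked with a short sketch (Python; illustrative, not part of the slides):

```python
# Number of addressable locations for a given address width.
def address_space(bits):
    return 2 ** bits

# 16-bit addresses reach 64K locations; 32-bit addresses reach 4G.
assert address_space(16) == 64 * 1024       # 64K = 65536
assert address_space(32) == 4 * 1024 ** 3   # 4G  = 4294967296
```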
• When a new block enters the cache, the 7-bit block field of the
memory address determines the cache block in which the main
memory block is to be stored (2^7 = 128 blocks in the cache, out
of which one block is selected).
• The high-order 5 bits of the memory address of the block are
stored in 5 tag bits associated with its location in the cache.
• The tag bits identify which of the 32 main memory blocks
mapped into this cache position is currently resident in the
cache (e.g., for block 0 of the cache, which of main memory
blocks 0, 128, 256, 384, … is currently in the cache).
Cache Memory Mapping DIRECT MAPPING
When the processor generates a memory address for a read or write, the
following sequence of actions takes place:
• The high-order 5 bits of the address are compared with the tag bits
associated with that cache location.
• If they match, then the desired word is in that block of the cache.
• If there is no match, then the block containing the required word must
first be read from the main memory and loaded into the cache.
Since more than one memory block is mapped onto a given cache block
position, contention may arise for that position even when the cache is not
full.
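The lookup sequence above can be sketched in a few lines of Python. The 5-bit tag and 7-bit block fields come from the slides; the 4-bit word field (16 words per block, completing a 16-bit address) is an assumption added to make the sketch concrete:

```python
# Direct-mapped address layout: | 5-bit tag | 7-bit block | 4-bit word |
def split_address(addr):
    word  = addr & 0xF           # low 4 bits: word within the block (assumed width)
    block = (addr >> 4) & 0x7F   # next 7 bits: select 1 of 128 cache blocks
    tag   = (addr >> 11) & 0x1F  # high 5 bits: which of 32 mapped blocks
    return tag, block, word

tags = [None] * 128              # tag store: one tag per cache block

def lookup(addr):
    tag, block, _ = split_address(addr)
    if tags[block] == tag:
        return "hit"
    tags[block] = tag            # on a miss, load the block and record its tag
    return "miss"
```

Main memory blocks 0 and 128 both map to cache block 0, so accessing them alternately evicts each other even when every other cache block is empty; this is the contention noted above.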
Cache Memory Mapping ASSOCIATIVE MAPPING
This is the most flexible mapping method.
SYNCHRONOUS BUS
• The slave removes its data signals from the bus and returns
its Slave-ready signal to the low level at the end of cycle 3.
• The bus transfer operation is now complete, and a new
transfer might start in clock cycle 4.
• The Slave-ready signal is an acknowledgment from the slave
to the master, confirming that the requested data have been
placed on the bus.
• In the example in Figure 7.5, the slave responds in cycle 3.
• A different device may respond in an earlier or a later cycle.
ASYNCHRONOUS BUS
t3: The master loads the data into its register. Then, it drops
the Master-ready signal, indicating that it has received the data.
• In this case, the master places the output data on the data
lines at the same time that it transmits the address and
command information.
• The selected slave loads the data into its data register
when it receives the Master-ready signal and indicates that
it has done so by setting the Slave-ready signal to 1.
ASYNCHRONOUS BUS
• Because of the handshake protocol, delays in the signals do not
cause errors (the circuit design becomes simple).
• The data transfer rate is slow because of waiting for the
Master-ready and Slave-ready signals.
SYNCHRONOUS BUS
• The clock signal duration should be properly designed to take
care of delays.
• Faster transfer rates (if slow devices are present, the
number of clock cycles to perform each operation can be
increased).
Problems
Problem: A computer system uses 32-bit memory addresses
and it has a main memory consisting of 1G bytes. It has a 4K-
byte cache organized in the block-set-associative manner,
with 4 blocks per set and 64 bytes per block. Calculate the
number of bits in each of the Tag, Set, and Word fields of the
memory address.
Solution: Consecutive addresses refer to bytes.
A block has 64 bytes; hence the Word field is 6 bits long.
With 4 × 64 = 256 bytes in a set (given: 1 set=4 blocks)
there are 4K/256 = 16 sets, requiring a Set field of 4 bits. This
leaves 32 − 4 − 6 = 22 bits for the Tag field.
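The field widths derived above can be checked mechanically; a small Python sketch with the problem's numbers:

```python
import math

cache_bytes = 4 * 1024   # 4K-byte cache
block_bytes = 64         # 64 bytes per block
assoc       = 4          # 4 blocks per set
addr_bits   = 32         # 32-bit memory addresses

word_bits = int(math.log2(block_bytes))           # 6: selects a byte in a block
num_sets  = cache_bytes // (assoc * block_bytes)  # 16 sets of 4 x 64 = 256 bytes
set_bits  = int(math.log2(num_sets))              # 4: selects a set
tag_bits  = addr_bits - set_bits - word_bits      # 22: the rest is the tag

print(tag_bits, set_bits, word_bits)  # 22 4 6
```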
Problems
Problem: Describe a structure similar to the one in
Figure 8.10 for an 8M × 32 memory using 512K × 8
memory chips.
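The slide does not show the solution, but the chip counts behind such a structure can be sketched as follows (Python; the wiring itself is what Figure 8.10 depicts, this only checks the arithmetic):

```python
# Building an 8M x 32 memory from 512K x 8 chips.
total_words, word_bits = 8 * 2**20, 32
chip_words, chip_bits  = 512 * 2**10, 8

chips_per_row = word_bits // chip_bits      # 4 chips side by side give 32 bits
rows_of_chips = total_words // chip_words   # 16 rows cover the 8M addresses
total_chips   = chips_per_row * rows_of_chips

print(rows_of_chips, chips_per_row, total_chips)  # 16 4 64
```

Each chip receives 19 address bits directly; the remaining 23 − 19 = 4 address bits drive a 4-to-16 decoder whose outputs act as chip-selects for the 16 rows.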
(c) Suppose that the L2 cache has an ideal hit rate of 1. By what factor
would this reduce the average memory access time as seen by the
processor?
(a) The fraction of memory accesses that miss in both the L1 and L2
caches is
(1 − h1)(1 − h2) = (1 − 0.96)(1 − 0.80) = 0.008
(b) The average memory access time using two cache levels is
tavg = 0.96τ + 0.04(0.80 × 15τ + 0.20 × 100τ)
= 2.24τ
Problem – continued from previous slide
(c) With no misses in the L2 cache, we get:
tavg(ideal) = 0.96τ + 0.04 × 15τ = 1.56τ
Therefore,
tavg(actual)/tavg(ideal) = 2.24τ/1.56τ ≈ 1.44
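The arithmetic in parts (a)–(c) can be verified with a short sketch (Python; τ taken as 1 time unit, all numbers from the problem):

```python
h1, h2 = 0.96, 0.80        # L1 and L2 hit rates
t_l2, t_mem = 15, 100      # L2 and main-memory access times, in units of tau

# (a) fraction of accesses that miss in both caches
miss_both = (1 - h1) * (1 - h2)                               # 0.008

# (b) average access time with two cache levels
t_avg = h1 * 1 + (1 - h1) * (h2 * t_l2 + (1 - h2) * t_mem)    # 2.24 tau

# (c) ideal L2 (hit rate 1): every L1 miss is served by L2
t_ideal = h1 * 1 + (1 - h1) * t_l2                            # 1.56 tau

print(round(t_avg / t_ideal, 2))  # speedup factor, about 1.44
```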