MEMORY SYSTEM

Module 2
Basic Concepts
 The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
 For example, a computer that generates 16-bit addresses is capable of addressing up to 2^16 = 64K memory locations.
 Machines whose instructions generate 32-bit addresses can utilize a memory that contains up to 2^32 = 4G (giga) locations.
 The number of locations represents the size of
the address space of the computer.
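As a quick check of these numbers, here is a minimal Python sketch (illustrative only; the address widths are the ones quoted above):

def address_space(bits):
    # Number of distinct locations an n-bit address can name.
    return 2 ** bits

print(address_space(16))   # 65536 = 64K locations
print(address_space(32))   # 4294967296 = 4G locations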
Basic Concepts..

 Most modern computers are byte-addressable.
 The memory is usually designed to store and retrieve data in word-length quantities (32 bits).
 When a 32-bit address is sent from the
processor to the memory unit, the high-order 30
bits determine which word will be accessed
 The low-order 2 bits of the address specify
which byte location is involved
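A small Python sketch of this address split, assuming 4-byte words as stated above (the sample address is made up for illustration):

def split_byte_address(addr):
    word = addr >> 2       # high-order 30 bits: which word is accessed
    byte = addr & 0b11     # low-order 2 bits: which byte within the word
    return word, byte

print(split_byte_address(0x00001007))   # -> (1025, 3), i.e. word 0x401, byte 3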
Basic Concepts..
 The connection between the processor and the
memory is shown in Figure 8.1
Basic Concepts..

 The processor uses the address lines to specify the memory location involved in a data transfer operation.
 Data lines are used to transfer the data.
 The control lines carry the command
indicating
 a Read or a Write operation
 whether a byte or a word is to be
transferred.
Basic Concepts..
 The control lines also provide the necessary timing
information and are used by the memory to indicate
when it has completed the requested operation.

 When the processor-memory interface receives the memory’s response, it asserts the MFC signal.
 It indicates that the requested memory operation has been completed.
 When MFC=1, the processor proceeds to the next step in its execution sequence.
(MFC means Memory Function Complete)
Basic Concepts..

 Measures for the speed of a memory:


 Memory access time - time that elapses
between the initiation of an operation to
transfer a word of data and the completion of
that operation
 Memory cycle time - minimum time delay
required between the initiation of two
successive memory operations
 for example, the time between two
successive Read operations
Basic Concepts..

 A memory unit is called random-access memory (RAM) if any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location's address.
 This distinguishes such memory units from
serial, or partly serial, access storage devices
such as magnetic disks and tapes.
 Access time on the latter devices depends on the
address or position of the data.
Basic Concepts..

 The processor of a computer can usually process instructions and data faster than they can be fetched from a reasonably priced memory unit.
 The memory cycle time, then, is the bottleneck in the
system.
 An important design issue is to provide a
computer system with as large and fast a
memory as possible, within a given cost
target.
Basic Concepts..
 Several techniques to increase the
effective size and speed of the memory.
 Cache memory - increases the effective speed
 Virtual memory - increases the effective size

 One way to reduce the memory access time is to use a cache memory.
 This is a small, fast memory that is inserted
between the larger, slower main memory
and the processor.
 It holds the currently active segments of a
program and their data.
Basic Concepts..
 Virtual memory
 In this technique, only the active portions of a
program are stored in the main memory, and the
remainder is stored on the much larger secondary
storage device.
 Sections of the program are transferred back and
forth between the main memory and the secondary
storage device in a manner that is transparent to the
application program.
 As a result, the application program sees a memory
that is much larger than the computer’s physical main
memory.
Semiconductor RAM Memories

 Semiconductor memories are available in a wide range of speeds.
 Their cycle times range from 100 ns to less than 10 ns.
 When first introduced in the late 1960s, they were much
more expensive than the magnetic-core memories they
replaced.
 Because of rapid advances in VLSI (Very Large Scale
Integration) technology, the cost of semiconductor
memories has dropped dramatically.
 As a result, they are now used almost exclusively in
implementing memories.
Internal Organization of Memory Chips
 Each memory cell can hold one bit of
information.
 Memory cells are organized in the
form of an array.
 One row is one memory word.
 All cells of a row are connected to a
common line, known as the word line.
 Word line is connected to the address
decoder.
Internal Organization of Memory Chips
 The cells in each column are connected to a
Sense/Write circuit by two bit lines
 Sense/Write circuits are connected to the data
input/output lines of the chip.
 During a Read operation, these circuits sense,
or read, the information stored in the cells
selected by a word line and place this
information on the output data lines.
 During a Write operation, the Sense/Write
circuits receive input data and store them in the
cells of the selected word.
Internal Organization of Memory Chips
Internal Organization of Memory Chips..
 Figure 8.2 is an example of a very small memory
circuit consisting of 16 words of 8 bits each.
 This is referred to as a 16 × 8 organization.
 The data input and the data output of each
Sense/Write circuit are connected to a single
bidirectional data line that can be connected to
the data lines of a computer.
Internal Organization of Memory Chips..

 Two control lines, R/W and CS, are provided.
 The R/W (Read/Write) input specifies the required operation.
 CS (Chip Select) input selects a given chip in a multichip memory system.
Internal Organization of Memory Chips..

 The memory circuit in Figure 8.2 stores 128 bits and requires 14 external connections for address, data, and control lines (4+8+2).
 It also needs two lines for power supply and ground connections.
 If the circuit has 1K (1024) memory cells, this circuit can be organized as a 128 × 8 memory, requiring a total of 19 external connections (7+8+2+2).
 Alternatively, the same number of cells can be organized into a 1K × 1 format (Fig 8.3).
 In this case, a 10-bit address is needed, but there is only one data line, resulting in 15 external connections.
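The pin counts quoted above can be reproduced with a short sketch (assumed convention, following the text: two control lines, R/W and CS, plus two lines for power and ground):

import math

def connections(words, data_bits):
    addr = int(math.log2(words))      # address lines
    signal = addr + data_bits + 2     # plus R/W and CS control lines
    return signal, signal + 2         # without / with power and ground

print(connections(16, 8))      # (14, 16): the 16 x 8 chip of Figure 8.2
print(connections(128, 8))     # (17, 19): the 128 x 8 organization
print(connections(1024, 1))    # (13, 15): the 1K x 1 organization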
Internal Organization of Memory Chips..

 Figure 8.3 shows such an organization.
 The required 10-bit address is divided into two
groups of 5 bits each to form the row and
column addresses for the cell array.
 A row address selects a row of 32 cells, all of
which are accessed in parallel.
 But, only one of these cells is connected to the
external data line, based on the column address.
Internal Organization of Memory Chips
Internal Organization of Memory Chips..

1K = 2^10 = 1024      1M = 2^20 = 1048576
2K = 2^11 = 2048      1G = 2^30 = 1024M
4K = 2^12 = 4096
8K = 2^13 = 8192
16K = 2^14 = 16384
32K = 2^15 = 32768

Different ways of organizing a 1K memory:
• 1024 x 1 = 1024 rows, 1 column (10 address lines + 1 data line required)
• 1024 x 1 = 32 x 32 x 1 (5 row address lines + 5 column address lines + 1 data line)
• 128 x 8 = 128 rows, 8 columns (7 address lines + 8 data lines)
Internal Organization of Memory Chips..
 Commercially available memory chips contain a
much larger number of memory cells than the
examples shown in Figures 8.2 and 8.3.
 Large chips have essentially the same
organization as Figure 8.3, but use a larger
memory cell array and have more external
connections.
 For example, a 1G-bit chip may have a
256M × 4 organization, in which case a 28-bit
address is needed and 4 bits are used for
data.
Static Memories

 Memories that consist of circuits capable of retaining their state as long as power is applied are known as static memories.
SRAM Cell
Static RAM (SRAM) CELL
 Figure 8.4 illustrates how a static RAM (SRAM) cell may
be implemented.
 Two inverters are cross-connected to form a latch.
 The latch is connected to two bit lines by transistors T1
and T2.
 These transistors act as switches that can be opened or
closed under control of the word line.
 When the word line is at ground level, the transistors are
turned off and the latch retains its state.
 For example, if the logic value at point X is 1 and at point
Y is 0, this state is maintained as long as the signal on
the word line is at ground level.
 Assume that this state represents the value 1.
Static RAM (SRAM)..
 Read operation:
 The word line is activated to close switches T1 and T2.
 If the cell is in state 1, the signal on bit line b is high and the signal on bit line b′ is low.
 The opposite is true if the cell is in state 0.
 Thus, b and b′ are always complements of each other.
 The Sense/Write circuit at the end of the two bit lines monitors their state and sets the corresponding output accordingly.
Static RAM (SRAM)..
 Write operation:
 The Sense/Write circuit drives bit lines b and b′.
• It places the appropriate value on bit line b and its complement on b′, and activates the word line.
 This forces the cell into the corresponding state, which the cell retains when the word line is deactivated.
CMOS Cell
 A CMOS realization of the cell in Figure 8.4 is given
in Figure 8.5.
 Transistor pairs (T3, T5) and (T4, T6) form the
inverters in the latch.
 The state of the cell is read or written as follows:
 For example, in state 1, the voltage at point X is
maintained high by having transistors T3 and T6 on,
while T4 and T5 are off.
 If T1 and T2 are turned on, bit lines b and b′ will have high and low signals, respectively.
CMOS Cell..
Static RAM (SRAM)..
 SRAMs are said to be volatile memories because their
contents are lost when power is interrupted.
 Advantage of CMOS SRAMs is their very low power
consumption, because current flows in the cell only
when the cell is being accessed.
 Otherwise, T1, T2, and one transistor in each inverter are
turned off, ensuring that there is no continuous electrical
path between Vsupply and ground.
 Static RAMs can be accessed very quickly.
 Access times on the order of a few nanoseconds are found
in commercially available chips.
 SRAMs are used in applications where speed is of
critical concern.
SRAMs vs. DRAMs
 Static RAMs (SRAMs):
 Consist of circuits that are capable of retaining their
state as long as the power is applied.
 Volatile memories, because their contents are lost
when power is interrupted.
 Access times of static RAMs are in the range of a few nanoseconds.
 However, the cost is usually high.

 Dynamic RAMs (DRAMs):


 Do not retain their state indefinitely.
 Contents must be periodically refreshed.
 Contents may be refreshed while accessing them for reading.
Asynchronous DRAM

 Information is stored in a dynamic memory cell in the form of a charge on a capacitor.
 This charge can be maintained for only tens of
milliseconds.
 Since the cell is required to store information for a
much longer time, its contents must be periodically
refreshed by restoring the capacitor charge to its full
value.
 This occurs when the contents of the cell are read or when new
information is written into it.
 An example of a dynamic memory cell that consists of a
capacitor, C, and a transistor, T, is shown in Figure 8.6.
DRAM
Asynchronous DRAMs
 To store information in this cell, transistor T is
turned on and an appropriate voltage is applied
to the bit line.
 This causes a known amount of charge to be stored in
the capacitor.
 After the transistor is turned off, the capacitor
begins to discharge.
 The information stored in the cell can be
retrieved correctly only if it is read before the
charge in the capacitor drops below some
threshold value.
Asynchronous DRAMs

 During a Read operation, the transistor in a selected cell is turned on.
 A sense amplifier connected to the bit line
detects whether the charge stored in the
capacitor is above or below the threshold value.
 If the charge is above the threshold, the sense
amplifier drives the bit line to the full voltage
representing the logic value 1.
 As a result, the capacitor is recharged to the full
charge corresponding to the logic value 1.
Asynchronous DRAMs

 If the sense amplifier detects that the charge in the capacitor is below the threshold value, it pulls the bit line to ground level to discharge the capacitor fully.
 Thus, reading the contents of a cell
automatically refreshes its contents.
 Since the word line is common to all cells in a
row, all cells in a selected row are read and
refreshed at the same time.
Asynchronous DRAMs

 A 256-Megabit DRAM chip, configured as 32M × 8, is shown in Figure 8.7.
 The cells are organized in the form of a 16K x 16K array.
 The 16384 cells in each row are divided into 2048
groups of 8.
 A row can store 2048 bytes of data.
 14 address bits are needed to select a row.
 Another 11 bits are needed to specify a group of 8 bits in
the selected row.
 Thus, a 25-bit address is needed to access a byte in this
memory.
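A hedged sketch of this 25-bit address split (the 14-bit row / 11-bit column division is from the text; the sample address is arbitrary):

def dram_byte_address(addr25):
    row = addr25 >> 11                # high-order 14 bits select one of 16K rows
    col = addr25 & ((1 << 11) - 1)    # low-order 11 bits select a group of 8 bits
    return row, col

addr = (5000 << 11) | 123             # row 5000, byte group 123
print(dram_byte_address(addr))        # -> (5000, 123)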
Asynchronous DRAM
Asynchronous DRAMs..

 The high-order 14 bits constitute the row address and the low-order 11 bits constitute the column address of a byte.
 To reduce the number of pins needed for external
connections, the row and column addresses are
multiplexed on 14 pins.
 During a Read or a Write operation, the row address is
applied first.
 It is loaded into the row address latch in response to a
signal pulse on the Row Address Strobe (RAS) input of
the chip.
Asynchronous DRAMs..
 Then a Read operation is initiated, in which all cells on the
selected row are read and refreshed.
 Shortly after the row address is loaded, the column address is
applied to the address pins and loaded into the column address
latch under control of the Column Address Strobe (CAS) signal.
 The information in this latch is decoded and the appropriate
group of 8 Sense/Write circuits are selected.
 If the R/W control signal indicates a Read operation, the output values of the selected circuits are transferred to the data lines, D7-0.
 For a Write operation, the information on the D7-0 lines is
transferred to the selected circuits.
 This information is then used to overwrite the contents of the selected cells
in the corresponding 8 columns.
Asynchronous DRAMs..

 In commercial DRAM chips, the RAS and CAS control signals are active low.
 To indicate this fact, these signals are shown on diagrams with an overbar, as RAS and CAS.
 To ensure that the contents of a DRAM are
maintained, each row of cells must be accessed
periodically.
 A refresh circuit usually performs this function
automatically.
Asynchronous DRAMs..
 The timing of the memory device is controlled
asynchronously.
 A specialized memory controller circuit provides the
necessary control signals, RAS and CAS, that govern
the timing.
 The processor must take into account the delay in the
response of the memory.
 Such memories are referred to as asynchronous DRAMs.
 When the DRAM in Figure 8.7 is accessed, the contents
of all 16,384 cells in the selected row are sensed, but only
8 bits are placed on the data lines, D7−0.
This byte is selected by the column address, bits A10−0.
Fast Page Mode
 Suppose we want to access consecutive bytes in the selected row.
 This can be done without having to reselect the row.
 Add a latch at the output of the sense circuits in each row.
 All the latches are loaded when the row is selected.
 Different column addresses can be applied to select and place
different bytes on the data lines.
 Consecutive sequence of column addresses can be applied
under the control signal CAS, without reselecting the row.
 Allows a block of data to be transferred at a much faster rate than
random accesses.
 Small groups of bytes are referred to as blocks, and larger groups as pages.
 This transfer capability is referred to as the fast page mode
feature.
Synchronous DRAM
• Its operation is synchronized with a clock signal.
• Its structure is shown in Figure 8.8.
• The cell array is the same as in asynchronous DRAMs.
• The distinguishing feature of an SDRAM is the use of
a clock signal, the availability of which makes it
possible to incorporate control circuitry on the chip
that provides many useful features.
• For example, SDRAMs have built-in refresh circuitry,
with a refresh counter to provide the addresses of the
rows to be selected for refreshing.
• As a result, the dynamic nature of these memory chips
is almost invisible to the user
Synchronous DRAM
• The address and data connections of an SDRAM may be buffered
by means of registers, as shown in the figure 8.8
• Internally, the Sense/Write amplifiers function as latches, as in
asynchronous DRAMs.
• A Read operation causes the contents of all cells in the selected
row to be loaded into these latches.
• The data in the latches of the selected column are transferred into
the data register, thus becoming available on the data output pins.
• The buffer registers are useful when transferring large blocks of
data at very high speed.
• By isolating external connections from the chip’s internal
circuitry, it becomes possible to start a new access operation
while data are being transferred to or from the registers.
Synchronous DRAM
• SDRAMs have several different modes of operation,
which can be selected by writing control information
into a mode register.
• For example, burst operations of different lengths can
be specified.
• It is not necessary to provide externally-generated
pulses on the CAS line to select successive columns.
• The necessary control signals are generated internally
using a column counter and the clock signal.
• New data are placed on the data lines at the rising edge
of each clock pulse
Structure of Larger memories
Consider a memory consisting of 2M words of 32 bits each.
• Figure 8.10 shows how this memory can be implemented using 512K × 8 static memory chips.
• 512K + 512K + 512K + 512K = 2M (4 rows)
• 8+8+8+8 = 32 bits (4 columns)
• Each column in the figure implements one byte position in a word, with four chips providing 2M bytes.
• Four columns implement the required 2M × 32 memory.
• Each chip has a control input called Chip-select.
• When this input is set to 1, it enables the chip to accept data from or to place data on its data lines.
• The data output for each chip is of the tri-state type.
Structure of Larger memories
• Only the selected chip places data on the data output line,
while all other outputs are electrically disconnected from
the data lines.
• Twenty-one address bits are needed to select a 32-bit
word in this memory. (19+2)
• The high-order two bits of the address are decoded to
determine which of the four rows should be selected.
• The remaining 19 address bits are used to access specific
byte locations inside each chip in the selected row.
• (2^19=2^10 x 2^9=1K x 512=512K)
• The R/W inputs of all chips are tied together to provide a
common Read/Write control line (not shown in the fig.)
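A minimal sketch of this address decoding (the 2-bit row / 19-bit chip-address split is from the text; names are illustrative):

def decode_word_address(addr21):
    chip_row = addr21 >> 19                   # high-order 2 bits: one of 4 chip rows
    within_chip = addr21 & ((1 << 19) - 1)    # low-order 19 bits: location in a 512K chip
    return chip_row, within_chip

addr = (2 << 19) | 70000                      # a word in chip row 2
print(decode_word_address(addr))              # -> (2, 70000)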
Memory Hierarchy
• The fastest access is to data held in processor registers
• The processor registers are at the top in terms of speed of
access
• At the next level of the hierarchy is Processor Cache memory
• It is a relatively small amount of memory that can be
implemented directly on the processor chip.
• It holds copies of the instructions and data stored in an external memory
• There are often two or more levels of cache.
• A primary cache is always located on the processor chip.
• This cache is small and its access time is comparable to that of
processor registers.
• The primary cache is referred to as the level 1 (L1) cache.
Memory Hierarchy
• Secondary Cache is somewhat slower memory
placed between the primary cache and the rest of the
memory.
• Secondary cache is referred to as the level 2 (L2)
cache.
• Often, the L2 cache is also housed on the processor
chip
• Some computers have a level 3 (L3) cache of even
larger size, in addition to the L1 and L2 caches.
• An L3 cache, also implemented in SRAM technology,
may or may not be on the same chip with the
processor and the L1 and L2 caches
Memory Hierarchy
• The next level in the hierarchy is the main memory.
• This is a large memory implemented using dynamic
memory components
• The main memory is much larger but significantly
slower than cache memories.
• Disk devices provide a very large amount of
inexpensive memory, and they are widely used as
secondary storage in computer systems.
• They are very slow compared to the main memory.
• They represent the bottom level in the memory
hierarchy
Cache Memories
 Processor is much faster than the main memory.
 Cache memory is a small, fast memory placed between the main memory and the processor
 It makes the main memory appear faster to the
processor than it really is.
 Cache memory is based on the property of
computer programs known as “locality of reference”.
Operation of Cache Memories..
Locality of Reference
 Analysis of programs shows that most of their
execution time is spent on routines in which
many instructions are executed repeatedly.
 These instructions may constitute a simple loop,
nested loops, or a few procedures that
repeatedly call each other.
 Many instructions in localized areas of the
program are executed repeatedly during some
time period, and the remainder of the program is
accessed relatively infrequently.
 This is called “locality of reference”.
Locality of Reference..
 Temporal locality of reference
 Recently executed instruction is likely to be executed
again very soon (whenever an information item,
instruction or data, is first needed, this item should be
brought into the cache)
 Spatial locality of reference
 Instructions with addresses close to a recently executed
instruction are likely to be executed soon. (instead of
fetching just one item from the main memory to the cache,
it is useful to fetch several items that are located at adjacent
addresses as well)
Operation of Cache Memories..
 When the processor issues a Read request, a block of words is transferred from the main memory to the cache, one word at a time.
 Subsequent references to the data in this block of words
are found in the cache.
 At any given time, only some blocks in the main memory
are held in the cache.
 Which blocks in the main memory are in the cache is determined
by a “mapping function”.
 When the cache is full, and a block of words needs to be
transferred from the main memory, some block of words
in the cache must be replaced.
 This is determined by a “replacement algorithm”.
Cache Hit

 The processor does not need to know explicitly about the existence of the cache.
 It simply issues Read and Write requests using
addresses that refer to locations in the memory.
 The cache control circuitry determines whether the
requested word currently exists in the cache.
 If it does, the Read or Write operation is performed on the
appropriate cache location.
 In this case, a read hit or write hit is said to have occurred.
 In a Read operation, the main memory is not involved.
 Data is obtained from the cache.
Cache Hit..
 For a Write operation, the system can proceed in
two ways.
 Write-through protocol
 The cache location and the main memory location are
updated simultaneously.
 Write-back or copy-back protocol
 Only the cache location is updated and it is marked as
updated with an associated flag bit, often called the
dirty or modified bit.
 The main memory location of the word is updated
later, when the block containing this marked word is to
be removed from the cache to make room for a new
block.
Cache Miss
 When the addressed word in a Read operation is
not in the cache, a read miss occurs.
 The block of words that contains the requested word
is copied from the main memory into the cache.
 After the entire block is loaded into the cache, the
particular word requested is forwarded to the
processor.
 Alternatively, this word may be sent to the processor
as soon as it is read from the main memory.
 This approach is called load-through, or early restart.
 It reduces the processor's waiting period, but at the expense of more complex circuitry.
Cache Miss..

 During a Write operation, if the addressed word is not in the cache, a write miss occurs.
 Then, if the write-through protocol is used, the
information is written directly into the main memory.
 In the case of the write-back protocol, the block
containing the addressed word is first brought into
the cache, and then the desired word in the cache is
overwritten with the new information.
MAPPING FUNCTIONS
Consider a cache consisting of 128 blocks of 16 words each, for a total of 2048 (2K) words, and assume that the main memory is addressable by a 16-bit address. The main memory has 64K words, which we will view as 4K blocks of 16 words each.

[Figure: the cache holds Block 0 to Block 127, each of 16 words (16 words × 128 blocks = 2048 words). The main memory holds 64K words, viewed as 4K blocks (Block 0 to Block 4095) of 16 words each, and is addressable by 16 bits.]
MAPPING FUNCTIONS: Direct Mapping
Block j of the main memory maps onto block j modulo 128 of the cache.
 Whenever one of the main memory blocks 0, 128, 256, . . . is loaded into the cache, it is stored in cache block 0 (0 mod 128 = 128 mod 128 = 256 mod 128 = 0).
 Blocks 1, 129, 257, . . . are stored in cache block 1 (1 mod 128 = 129 mod 128 = 257 mod 128 = 1), and so on.
Cache Memory Mapping DIRECT MAPPING
Since more than one memory block is mapped onto a given cache block position, contention may arise for that position even when the cache is not full.
 Placement of a block in the cache is determined by its memory address. The memory address can be divided into three fields, as shown in Figure 8.16.
Cache Memory Mapping DIRECT MAPPING
• The low-order 4 bits select one of 16 words in a cache block.
• When a new block enters the cache, the 7-bit cache block field determines the cache block in which the main memory block is to be stored (2^7 = 128 blocks in the cache, out of which one is selected).
• The high-order 5 bits of the memory address of the block are stored in 5 tag bits associated with its location in the cache.
• The tag bits identify which of the 32 main memory blocks mapped into this cache position is currently resident in the cache (e.g., for block 0 of the cache, which of main memory blocks 0/128/256/384/… is currently present).
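A sketch of this direct-mapped address split (16-bit address; 5-bit tag, 7-bit block, and 4-bit word fields as above):

def direct_mapped_fields(addr16):
    word  = addr16 & 0xF            # low-order 4 bits: word within the block
    block = (addr16 >> 4) & 0x7F    # next 7 bits: cache block position
    tag   = addr16 >> 11            # high-order 5 bits: tag
    return tag, block, word

# Main memory block 129 maps to cache block 129 mod 128 = 1:
addr = 129 * 16                     # first word of main memory block 129
print(direct_mapped_fields(addr))   # -> (1, 1, 0): tag 1, cache block 1, word 0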
Cache Memory Mapping DIRECT MAPPING
When the processor generates a memory address for a read or write, the following sequence of actions takes place:
• The high-order 5 bits of the address are compared with the tag bits associated with that cache location.
• If they match, then the desired word is in that block of the cache.
• If there is no match, then the block containing the required word must first be read from the main memory and loaded into the cache.
 The direct-mapping technique is easy to implement, but it is not very flexible.
 Since more than one memory block is mapped onto a given cache block position, contention may arise for that position even when the cache is not full.
Cache Memory Mapping ASSOCIATIVE MAPPING
This is the most flexible mapping method.
 A main memory block can be placed into any cache block position.
 In this case, 12 tag bits are required to identify a memory block when it is resident in the cache (2^12 = 4096 blocks in main memory).
Cache Memory Mapping ASSOCIATIVE MAPPING
• The tag bits of an address received from the processor are
compared to the tag bits of each block of the cache to see if
the desired block is present.
• This is called the associative-mapping technique.
• It gives complete freedom in choosing the cache location in
which to place the memory block, resulting in a more
efficient use of the space in the cache.
• When a new block is brought into the cache, it replaces (ejects) an existing block only if the cache is full.
• In this case, we need an algorithm to select the block to be replaced.
• The complexity of an associative cache is higher than that of a direct-mapped cache, because of the need to search all 128 tag patterns to determine whether a given block is in the cache.
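A sketch of an associative lookup (the 12-bit tag is the address with the 4-bit word field stripped; a real cache performs the 128 tag comparisons in parallel in hardware, while this software stand-in uses a set):

def associative_hit(addr16, stored_tags):
    tag = addr16 >> 4            # 12-bit tag identifies the memory block
    return tag in stored_tags    # stand-in for the parallel tag compare

print(associative_hit(129 * 16, {129, 4095}))   # True: block 129 is resident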
Cache Memory Mapping SET ASSOCIATIVE MAPPING
• It is the combination of the direct- and associative-
mapping techniques.
• The blocks of the cache are grouped into sets,
and the mapping allows a block of the main
memory to reside in any block of a specific set.
• Hence, the contention problem of the direct
method is eased by having a few choices for block
placement.
• At the same time, the hardware cost is reduced by
decreasing the size of the associative search.
Cache Memory Mapping SET ASSOCIATIVE MAPPING
• An example of this set-
associative-mapping
technique is shown in Figure
8.18
• Here the cache has two blocks per set.
• In this case, memory blocks
0, 64, 128, . . . , 4032 map
into cache set 0, and they can
occupy either of the two block
positions within this set.
Cache Memory Mapping SET ASSOCIATIVE MAPPING
• 64 sets means that the 6-bit set field of the address
determines which set of the cache might contain the
desired block.
• The tag field of the address must then be associatively
compared to the tags of the two blocks of the set to check
if the desired block is present.
• This two-way associative search is simple to implement.
• If it is required to have four blocks per set, then a 5-bit set field is required (128/4 = 32 sets).
• Eight blocks per set can be accommodated by a 4-bit set
field (128/8=16 sets)
• A cache that has k blocks per set is referred to as a k-way
set-associative cache.
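A sketch of the set-associative field split for this 128-block cache (set-field widths of 6, 5, and 4 bits for 2-, 4-, and 8-way follow from 128/k sets, as stated above):

def set_assoc_fields(addr16, k):
    sets = 128 // k                        # k blocks per set
    set_bits = (sets - 1).bit_length()     # 64 sets -> 6 bits, 32 -> 5, 16 -> 4
    word = addr16 & 0xF
    s    = (addr16 >> 4) & (sets - 1)
    tag  = addr16 >> (4 + set_bits)
    return tag, s, word

print(set_assoc_fields(129 * 16, 2))   # -> (2, 1, 0): block 129 falls in set 129 mod 64 = 1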
Performance considerations
 Two key factors in the commercial success of a
computer are performance and cost;
 A common measure of success is the
price/performance ratio.
 Performance depends on how fast machine
instructions can be brought into the processor
and how fast they can be executed.
Performance considerations – Hit Rate and Miss Penalty
 A successful access to data in a cache is called a hit.
 Hit rate=(number of hits/ number of all attempted
accesses)
 Miss rate=(number of misses/ number of all attempted
accesses)
 Performance is adversely affected by the actions that
need to be taken when a miss occurs.
 A performance penalty is incurred because of the extra
time needed to bring a block of data from a slower unit
in the memory hierarchy to a faster unit.
 Total access time seen by the processor when a miss occurs is called the miss penalty.
Performance considerations – Hit Rate and Miss Penalty
 Consider a system with only one level of cache.
 In this case, the miss penalty consists
 almost entirely of the time to access a block of data
in the main memory
 Average access time experienced by the processor = tavg
 tavg = hC + (1 − h)M
where h is the hit rate, M is the miss penalty, and C is the time to access information in the cache.
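As a one-line sketch of this formula (the sample values h = 0.95, C = 1, M = 19 are borrowed from the worked example later in this module):

def t_avg(h, C, M):
    # average access time = hit_rate*cache_time + miss_rate*miss_penalty
    return h * C + (1 - h) * M

print(t_avg(0.95, 1, 19))   # ≈1.9, in units of the cache access time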
Performance considerations – Caches on processor chip
 If there are two levels of cache, L1 (separate instruction and data caches) and L2 (instructions and data together), the performance increases.
 Thus, the average access time experienced by the
processor in such a system is:
 tavg = h1C1 + (1 − h1)(h2C2 + (1 − h2)M )
 where
 h1 is the hit rate in the L1 caches.
 h2 is the hit rate in the L2 cache.
 C1 is the time to access information in the L1 caches.
 C2 is the miss penalty to transfer information from the L2 cache to
an L1 cache.
 M is the miss penalty to transfer information from the main memory
to the L2 cache.
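The two-level formula as a sketch (the sample values 0.96, 0.80, 15 and 100 are taken from the additional problem at the end of this module; times are in units of C1):

def t_avg_2level(h1, C1, h2, C2, M):
    return h1 * C1 + (1 - h1) * (h2 * C2 + (1 - h2) * M)

print(t_avg_2level(0.96, 1, 0.80, 15, 100))   # ≈2.24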
Performance considerations – Example
Consider a computer that has the following parameters:
Access time to the cache memory = ζ
Access time to main memory = 10ζ
a) Find the miss penalty, if a cache miss occurs when a block
of 8 words is transferred from main memory to the cache
b) Find the improvement in performance due to the
availability of cache memory, if the hit rates in cache are
0.95 for instructions and 0.9 for data
Performance considerations – Example
Solution: (a)
• Time for initial access to cache = ζ (but it is a miss, hence read from main memory)
• Time taken to transfer the first word = 10ζ (memory access)
• Time taken to transfer the remaining 7 words, at a rate of one word every ζ seconds = 7ζ
• Time to transfer the requested word to the processor after the block is loaded into the cache = ζ (assuming no load-through).
• Thus, the miss penalty in this computer is given by:
M = ζ + 10ζ + 7ζ + ζ = 19ζ
Performance considerations – Example
Solution: (b)
• Let us assume that 30% of the instructions perform a read/write operation on data; hence every 100 instructions generate 130 memory accesses (100 instruction fetches + 30 data accesses).
• Given: hit rates in the cache are 0.95 for instructions and 0.9 for data.
• Assume the miss penalty is the same for read and write operations.
• tavg = hC + (1 − h)M
• Time with cache = 100(0.95ζ + 0.05 × 19ζ) + 30(0.9ζ + 0.1 × 19ζ) = 190ζ + 84ζ = 274ζ
• Time without cache = 130 × 10ζ = 1300ζ
• Improvement in performance due to use of cache = 1300ζ / 274ζ ≈ 4.7
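The arithmetic above can be checked with a short sketch (assumptions as stated: 130 memory accesses per 100 instructions, C = ζ, M = 19ζ, and 10ζ per access without a cache):

C, M, mem = 1, 19, 10                  # times in units of zeta
with_cache = 100 * (0.95*C + 0.05*M) + 30 * (0.9*C + 0.1*M)
without_cache = 130 * mem
print(with_cache, without_cache)       # ≈274, 1300
print(without_cache / with_cache)      # ≈4.74: improvement factor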
BUS STRUCTURES
• The bus shown in
Figure 7.1 is a simple
structure that
implements the
interconnection
between various
components.
• Only one
source/destination pair
of units can use this
bus to transfer data at
a time.
BUS STRUCTURES
• The bus consists of three sets of lines used to carry address,
data, and control signals.
• I/O device interfaces are connected to these lines, as shown in
Figure 7.2
• Each I/O device is assigned a unique set of addresses for the
registers in its interface.
• When the processor places a particular address on the address
lines, it is examined by the address decoders of all devices on the
bus.
• The device that recognizes this address responds to the
commands issued on the control lines.
• The processor uses the control lines to request either a Read or a
Write operation, and the requested data are transferred over the
data lines.
BUS STRUCTURES
BUS STRUCTURES
• When I/O devices and the memory share the same
address space, the arrangement is called memory-mapped I/O
• Any machine instruction that can access memory
can be used to transfer data to or from an I/O
device.
• For example:
 If the input device is a keyboard and if DATAIN
is its data register
 If display device is the output and DATAOUT is
its data register
BUS STRUCTURES
• Load R2, DATAIN; reads the data from DATAIN
and stores them into processor register R2.
• Store R2, DATAOUT; sends the contents of
register R2 to location DATAOUT
• The status and control registers contain
information relevant to the operation of the I/O
device.
• The device’s interface circuit contains:
 The address decoder
 The data and status registers
 The control circuitry required to coordinate I/O transfers
BUS OPERATIONS
A bus requires a set of rules, often called a bus protocol, that
govern how the bus is used by various devices.
The bus protocol determines when a device may place
information on the bus, when it may load the data on the bus
into one of its registers, and so on.
These rules are implemented by control signals
One control line, usually labelled R/W, specifies whether a
Read or a Write operation is to be performed.
It specifies Read when set to 1
Write when set to 0.
BUS OPERATIONS
• Examples of control line:
R/W, specifies whether a Read or a Write operation is
to be performed.
It specifies Read when set to 1
Write when set to 0.
DataSize: When several data sizes are possible, such
as byte, halfword, or word, the required size is indicated
by another control line.
• The bus control lines also carry timing information.
• They specify the times at which the processor and the I/O
devices may place data on or receive data from the data
lines.
SYNCHRONOUS BUS
• On a synchronous bus, all devices derive timing
information from a control line called the bus clock,
shown in Figure below.
• The signal on this line has two phases:
high level
low level.
• The two phases constitute a clock cycle.
SYNCHRONOUS BUS
• The address and data lines in Figure 7.3 are shown as if
they are carrying both high and low signal levels at the same
time.
• This is a common convention for indicating that some lines
are high and some low, depending on the particular address
or data values being transmitted.
• The crossing points indicate the times at which these
patterns change.
• A signal line at a level half-way between the low and high
signal levels indicates periods during which the signal is
unreliable, and must be ignored by all devices.
SYNCHRONOUS BUS
SYNCHRONOUS BUS – Read operation
Sequence of signal events during an input (Read)
operation
• At time t0, the master places the device address on the address lines and sends the Read control signal.
• The command may also specify the length of the operand to
be read.
• Information travels over the bus at a speed determined by its
physical and electrical characteristics
• All devices decode the address and control signals, but only the addressed device (the slave) responds at time t1 by placing the requested input data on the data lines.
• At the end of the clock cycle, at time t2, the master loads the
data on the data lines into one of its registers.
SYNCHRONOUS BUS – Write operation
Sequence of signal events during an output (Write)
operation
• At time t0, the master places the device address on the address lines and sends the Write control signal.
• The master also places the output data on
the data lines at time t0
• At time t2, the addressed device loads the
data into its data register.
SYNCHRONOUS BUS – Read with delays
SYNCHRONOUS BUS – Read with delays
• Because signals take time to travel from one device to another, a
given signal transition is seen by different devices at different
times
• The master sends the address and command signals on the rising
edge of the clock at the beginning of the clock cycle (at t0).
• Due to internal delay, these signals do not actually appear on the bus until tAM.
• A short while later, at tAS, the signals reach the slave.
• The slave decodes the address, and at t1 sends the requested
data.
• Here again, the data signals do not appear on the bus until tDS.
• They travel toward the master and arrive at tDM.
• At t2, the master loads the data into its register.
SYNCHRONOUS BUS – One clock cycle
The disadvantages of Read/Write in one clock cycle:
 The clock period, t2 − t0, must be chosen to accommodate
the longest delays on the bus and the slowest device
interface. This forces all devices to operate at the speed of
the slowest device.
 The processor has no way of determining whether the
addressed device has actually responded.
 At t2, it simply assumes that the input data are
available on the data lines in a Read operation, or that
the output data have been received by the I/O device in
a Write operation.
 If, because of a malfunction, a device does not operate
correctly, the error will not be detected.
SYNCHRONOUS BUS – Multiple clock cycle Read
SYNCHRONOUS BUS – Multiple clock cycle Read
With respect to the Fig 7.5
• During clock cycle 1, the master sends address and
command information on the bus, requesting a Read
operation.
• The slave receives this information and decodes it.
• The slave starts accessing the data at the beginning of clock
cycle 2.
• But it places the data on the bus during clock cycle 3.
• The slave asserts a control signal called Slave-ready at the
same time.
• The master, which has been waiting for this signal, loads the
data into its register at the end of the clock cycle.
SYNCHRONOUS BUS – Multiple clock cycle Read – continued (Fig 7.5)

• The slave removes its data signals from the bus and returns
its Slave-ready signal to the low level at the end of cycle 3.
• The bus transfer operation is now complete, and a new
transfer might start in clock cycle 4.
• The Slave-ready signal is an acknowledgment from the slave
to the master, confirming that the requested data have been
placed on the bus.
• In the example in Figure 7.5, the slave responds in cycle 3.
• A different device may respond in an earlier or a later cycle
ASYNCHRONOUS BUS

• It does not use a clock signal.
• Handshake protocol is used between the master
and the slave.
• A handshake is an exchange of command and
response signals between the master and the
slave.
• A control line called Master-ready is asserted by
the master to indicate that it is ready to start a data
transfer.
• The Slave responds by asserting Slave-ready.
ASYNCHRONOUS BUS

A data transfer controlled by a handshake protocol proceeds as follows.
• The master places the address and command information on the
bus.
• Then it indicates to all devices that it has done so by activating
the Master-ready line.
• This causes all devices to decode the address.
• The selected slave performs the required operation and activates
the Slave-ready line.
• The master waits till it receives Slave-ready signal as high
• In the case of a Read operation, it also loads the data into one of
its registers
ASYNCHRONOUS BUS (Read operation)
ASYNCHRONOUS BUS Read operation (Input) (Figure 7.6)

Sequence of events during Input (Read operation)


t0—The master places the address and command
information on the bus, and all devices on the bus decode
this information.

 t1—The master sets the Master-ready line to 1 to inform the devices that the address and command information is ready (all the slaves decode the information sent by the master).

 t2—The selected slave places data on the data lines. At the same time, it sets the Slave-ready signal to 1 (indicating to the master that the requested data is available on the data lines).
ASYNCHRONOUS BUS Read operation (Input) (Fig 7.6) continued

 t3—The master loads the data into its register. Then, it drops the Master-ready signal, indicating that it has received the data.
 t4—The master removes the address and command information from the bus.
 t5—When the Master-ready signal becomes low at t3, the interval t3–t5 is allowed for the input device interface to remove the data and the Slave-ready signal from the bus.

This completes the input transfer.
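As a toy sketch of this handshake ordering (purely illustrative; the event labels follow Figure 7.6):

handshake_read = [
    ("t0", "master places address and command on the bus"),
    ("t1", "master raises Master-ready"),
    ("t2", "slave places data on the bus and raises Slave-ready"),
    ("t3", "master loads the data and drops Master-ready"),
    ("t4", "master removes address and command"),
    ("t5", "slave removes data and drops Slave-ready"),
]
for t, event in handshake_read:
    print(t, "-", event)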


ASYNCHRONOUS BUS (Write operation or Output)
ASYNCHRONOUS BUS Write operation (Output) (Fig 7.7)

• In this case, the master places the output data on the data
lines at the same time that it transmits the address and
command information.

• The selected slave loads the data into its data register
when it receives the Master-ready signal and indicates that
it has done so by setting the Slave-ready signal to 1.

• The remainder of the cycle is similar to the input operation.


(The remaining steps, t0–t5, should be explained by the students if asked in the exam.)
SYNCHRONOUS BUS - ASYNCHRONOUS BUS
COMPARISON

ASYNCHRONOUS BUS
• Because of handshake protocol, delay in signal doesn’t
cause errors (circuit design becomes simple)
• Data transfer rate is slower because of waiting for the Master-ready and Slave-ready signals
SYNCHRONOUS BUS
• The clock signal duration should be properly designed to take care of delays
• Faster transfer rates (if slow devices are present, the number of clock cycles to perform each operation can be increased)
Problems
Problem: A computer system uses 32-bit memory addresses and it has a main memory consisting of 1 Gbytes. It has a 4K-byte cache organized in the block-set-associative manner, with 4 blocks per set and 64 bytes per block. Calculate the number of bits in each of the Tag, Set, and Word fields of the memory address.
Solution: Consecutive addresses refer to bytes.
A block has 64 bytes; hence the Word field is 6 bits long.
With 4 × 64 = 256 bytes in a set (given: 1 set=4 blocks)
there are 4K/256 = 16 sets, requiring a Set field of 4 bits. This
leaves 32 − 4 − 6 = 22 bits for the Tag field.
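The field widths can be verified with a short sketch (parameters taken directly from the problem statement):

import math

block_bytes, blocks_per_set = 64, 4
cache_bytes, addr_bits = 4096, 32

word_bits = int(math.log2(block_bytes))                  # 6
sets = cache_bytes // (block_bytes * blocks_per_set)     # 16 sets
set_bits = int(math.log2(sets))                          # 4
tag_bits = addr_bits - set_bits - word_bits              # 22
print(tag_bits, set_bits, word_bits)                     # 22 4 6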
Problems
Problem: Describe a structure similar to the one in
Figure 8.10 for an 8M × 32 memory using 512K × 8
memory chips.

 Solution: The required structure is essentially the same as in Figure 8.10, except that 16 rows are needed, each with four 512K × 8 chips. Address lines A18−0 should be connected to all chips. Address lines A22−19 should be connected to a 4-bit decoder to select one of the 16 rows.
Problems
1. Describe a structure similar to that in Figure 8.10 for a
16M × 32 memory using 1M × 4 memory chips.

2. A block-set-associative cache consists of a total of 64 blocks, divided into 4-block sets. The main memory contains 4096 blocks, each consisting of 32 words. Assuming a 32-bit byte-addressable address space, how many bits are there in each of the Tag, Set and Word fields?
Problem -- Additional
Problem: Suppose that a computer has a processor with two
L1 caches, one for instructions and one for data, and an L2
cache. Let τ be the access time for the two L1 caches. The
miss penalties are approximately 15τ for transferring a block
from L2 to L1, and 100τ for transferring a block from the main
memory to L2. For the purpose of this problem, assume that
the hit rates are the same for instructions and data and that
the hit rates in the L1 and L2 caches are 0.96 and 0.80,
respectively.
Problem – continued from previous slide
(a) What fraction of accesses miss in both the L1 and L2 caches, thus
requiring access to the main memory?

(b) What is the average access time as seen by the processor?

(c) Suppose that the L2 cache has an ideal hit rate of 1. By what factor
would this reduce the average memory access time as seen by the
processor?

(d) Consider the following change to the memory hierarchy. The L2


cache is removed and the size of the L1 caches is increased so that their
miss rate is cut in half. What is the average memory access time as seen
by the processor in this case?
Problem – continued from previous slide
Solution: The average memory access time with one cache level is given as
tavg = hC + (1 − h)M

With L1 and L2 caches, the average memory access time is given as
tavg = h1C1 + (1 − h1)(h2C2 + (1 − h2)M)

(a) The fraction of memory accesses that miss in both the L1 and L2 caches is
(1 − h1)(1 − h2) = (1 − 0.96)(1 − 0.80) = 0.008

(b) The average memory access time using two cache levels is
tavg = 0.96τ + 0.04(0.80 × 15τ + 0.20 × 100τ) = 2.24τ
Problem – continued from previous slide
(c) With no misses in the L2 cache, we get:
tavg(ideal) = 0.96τ + 0.04 × 15τ = 1.56τ
Therefore,
[tavg(actual)/ tavg(ideal)] = [2.24τ / 1.56τ ]= 1.44

(d) With larger L1 caches and the L2 cache removed, the access time is
tavg = 0.98τ + 0.02 × 100τ = 2.98τ
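All four parts can be reproduced with a short sketch (values as given in the problem; times in units of τ):

h1, h2 = 0.96, 0.80
C1, C2, M = 1, 15, 100

miss_both = (1 - h1) * (1 - h2)                     # (a) 0.008
tavg = h1*C1 + (1 - h1)*(h2*C2 + (1 - h2)*M)        # (b) 2.24
ideal = h1*C1 + (1 - h1)*C2                         # (c) 1.56
no_l2 = 0.98*C1 + 0.02*M                            # (d) 2.98
print(miss_both, tavg, ideal, tavg / ideal, no_l2)  # ≈ 0.008 2.24 1.56 1.44 2.98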
