Module 4 Memory

1

Memory Concept
• Digital computers work on the stored-program concept
introduced by von Neumann.

• The memory is used to store information, which includes
both programs and data.

• Ideally, we need a memory that is fast, large and
inexpensive and can be used for every purpose.

• Increased speed and size are achieved at increased cost.

• As one memory cannot be used both for execution and for
permanent storage, we use different kinds of memories at
different levels.

2
Memory Hierarchy

3
Some Basic Concepts
• The maximum size of the main memory is determined by the
addressing scheme.
• A machine whose instructions generate 32-bit addresses is
capable of addressing up to 4G memory locations.
• Byte-addressable
• CPU-Main Memory Connection

[Figure: processor-main memory connection. The processor's MAR drives a
k-bit address bus (up to 2^k addressable locations) and its MDR connects
to an n-bit data bus (word length = n bits); control lines carry R/W,
MFC, etc.]

4
Some Basic Concepts (cont...)
 Measures for the speed of a memory:
 memory access time.
 memory cycle time.

 An important design issue is to provide a computer system
with as large and fast a memory as possible, within a given
cost target.

 Several techniques to increase the effective size and
speed of the memory:
 Cache memory (to increase the effective speed).
 Virtual memory (to increase the effective size).

5
• Each memory cell can hold one bit of information.

• Memory cells are organized in the form of an array.

• One row is one memory word.

• All cells of a row are connected to a common line, known as
the "word line".

• The word line is connected to the address decoder.

• Sense/Write circuits are connected to the data input/output
lines of the memory chip.

6
16 x 8 organization

[Figure: internal organization of a 16 x 8 memory chip. A 4-bit address
(A0-A3) feeds an address decoder that drives word lines W0-W15; each row
of memory cells connects through Sense/Write circuits (controlled by R/W
and CS) to the data input/output lines b7-b0.]

7
Static RAMs (SRAMs)
• Retain their state as long as power is on.
• Two transistor inverters are cross-connected to implement a basic flip-flop.
• The cell is connected to one word line and two bit lines by transistors T1 and T2.
• When the word line is at ground level, the transistors are turned off and the
latch retains its state.
• Read operation: in order to read the state of the SRAM cell, the word line is
activated to close switches T1 and T2. Sense/Write circuits at the bottom
monitor the states of b and b'.

[Figure: SRAM cell. Two cross-coupled inverters (nodes X and Y) connect
through transistors T1 and T2 to the bit lines b and b'; the word line
gates T1 and T2.]

8
Dynamic RAMs (DRAMs):
• Do not retain their state indefinitely.
• Contents must be periodically refreshed.
• Contents may be refreshed while accessing them for reading.

[Figure: DRAM cell, consisting of a single transistor and a capacitor.]

9
SRAM vs DRAM

1. An SRAM cell is complex and bigger than a DRAM cell; a DRAM cell is
simpler and smaller than an SRAM cell.

2. SRAM packing density is low, i.e. fewer cells per unit area; DRAM
packing density is high, i.e. more cells per unit area.

3. SRAM is more expensive than DRAM, as more transistors are used; DRAM
is less expensive than SRAM, as fewer transistors are used.

4. SRAM is faster than DRAM; DRAM is slower than SRAM.

5. SRAM is normally used for cache memory as it is faster; DRAM is
normally used for main memory as it costs less.

10
Construction of Larger Static Memories
[Figure: a 2M x 32 memory constructed from 512K x 8 SRAM chips. The 21-bit
address is split into a 19-bit internal chip address and 2 high-order bits
(A19, A20) that feed a 2-bit decoder generating the four Chip Select
signals; each chip in a row supplies one byte (D31-24, D23-16, D15-8,
D7-0) of the 32-bit word, with a 19-bit address and 8-bit data
input/output per chip.]

11
Construction of Static Memories (cont.)
• Implement a memory unit of 2M words of 32 bits each
by using 512K x 8 static memory chips.

 Each column consists of 4 chips.

 Each chip implements one byte position.

 A chip is selected by setting its chip select control line to 1.

 21 bits are needed to address a 32-bit word.

 The high-order 2 bits select the row, by activating
the four Chip Select signals.

 19 bits are used to access specific byte locations inside the
selected chip.

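The address split described above can be sketched in code (a minimal sketch; the helper name is assumed, not part of the slides):

```python
def decode_21bit_address(addr):
    """Split a 21-bit word address of the 2M x 32 memory into
    (row, internal_chip_address)."""
    row = (addr >> 19) & 0b11            # high-order 2 bits drive the 2-bit
                                         # decoder / Chip Select lines
    chip_addr = addr & ((1 << 19) - 1)   # low-order 19 bits address one of
                                         # the 512K locations in each chip
    return row, chip_addr

print(decode_21bit_address(0))              # (0, 0): first row of chips
print(decode_21bit_address((1 << 21) - 1))  # (3, 524287): last row, last word
```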
12
Cache Memories
• The processor is much faster than the main memory.
 As a result, the processor has to spend much of its time
waiting while instructions and data are being fetched
from the main memory.
 This is a major obstacle to achieving good performance.

• The speed of the main memory cannot be increased
beyond a certain point.
• Cache memory is an architectural arrangement
which makes the main memory appear faster to
the processor than it really is.
• Cache memory is based on a property of computer
programs known as "locality of reference".

13
• Analysis of programs indicates that many instructions in localized
areas of a program are executed repeatedly during some period
of time, while the others are accessed relatively less frequently.
 These instructions may be the ones in a loop, nested loop or a few
procedures calling each other repeatedly.
 This is called "locality of reference".

• Temporal locality of reference:
 A recently executed instruction is likely to be executed again
very soon.

• Spatial locality of reference:
 Instructions with addresses close to a recently executed instruction
are likely to be executed soon.

14
Placement of cache

[Figure: the cache sits between the processor and the main memory:
Processor - Cache - Main memory.]

15
• When the processor issues a Read request, a block of words is
transferred from the main memory to the cache, one word
at a time.
 Subsequent references to the data in this block of
words are then found in the cache.
• At any given time, only some blocks in the main memory
are held in the cache.
• Which blocks in the main memory are in the cache is
determined by a "mapping function".
• When the cache is full, and a block of words needs to be
transferred from the main memory, some block of words
in the cache must be replaced.
 This is determined by a "replacement algorithm".

16
• If the data is in the cache it is called a Read or Write hit.

• Read hit:
 The data is obtained from the cache.

• Write hit:
 The cache has a replica of the contents of the main memory.
 Contents of the cache and the main memory may be updated
simultaneously. This is the write-through protocol.
 Alternatively, update only the contents of the cache, and mark the block
as updated by setting a bit known as the dirty bit or modified bit. The
contents of the main memory are updated when this block is replaced.
This is the write-back or copy-back protocol.

17
• A bit called the "valid bit" is provided for each block.
• If the block contains valid data, then the bit is set to 1, else it is 0.
• Valid bits are set to 0 when the power is just turned on.
• When a block is loaded into the cache for the first time, the valid bit is set to 1.

• Data transfers between the main memory and disk occur directly, bypassing the
cache.
• When the data on a disk changes, the main memory block is also updated.
• However, if the data is also resident in the cache, then the valid bit is set to 0.

• What happens if the data in the disk and main memory changes and the write-
back protocol is being used?
• In this case, the data in the cache may also have changed, as indicated by
the dirty bit.
• The copies of the data in the cache and the main memory are then different.
This is called the cache coherence problem.
• One option is to force a write-back before the main memory is updated from
the disk.

18
Mapping Functions
 Mapping functions determine how memory blocks are
placed in the cache.

 A simple processor example:
 Cache consisting of 128 blocks of 16 words each.
 Total size of the cache is 2048 (2K) words.
 Main memory is addressable by a 16-bit address.
 Main memory has 64K words.
 Main memory has 4K blocks of 16 words each.

 Three mapping functions:
 Direct mapping
 Associative mapping
 Set-associative mapping.

19
Direct Mapping
• Block j of the main memory maps to block (j modulo 128) of the cache:
memory block 0 maps to cache block 0, and memory block 129 maps to
cache block 1.

• More than one memory block is mapped onto the same position
in the cache.

• This may lead to contention for cache blocks even if the cache is not
full.

• The contention is resolved by allowing the new block to replace the
old block, leading to a trivial replacement algorithm.

• The 16-bit main memory address is divided into three fields:
Tag (5 bits), Block (7 bits) and Word (4 bits).

[Figure: direct mapping. Main memory blocks 0-4095 map onto cache blocks
0-127; each cache block carries a tag identifying which memory block it
currently holds.]

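For the running example (16-bit addresses, 128 blocks of 16 words), the direct-mapping address split can be sketched as follows (the function name is assumed for illustration):

```python
def direct_map(addr):
    """Split a 16-bit main-memory address into the (tag, block, word)
    fields of the 5-7-4 direct-mapping scheme."""
    word  = addr & 0xF           # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block = (memory block) mod 128
    tag   = (addr >> 11) & 0x1F  # high-order 5 bits: tag
    return tag, block, word

# Main-memory block 129 starts at address 129 * 16; it maps to cache block 1:
print(direct_map(129 * 16))      # (1, 1, 0)
```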
20
Associative Mapping
• A main memory block can be placed into any cache position.

• The memory address is divided into two fields:
 The low-order 4 bits identify the word within a block.
 The high-order 12 bits, or tag bits, identify a memory block when it
is resident in the cache.

• Flexible, and uses cache space efficiently.

• Replacement algorithms can be used to replace an existing block in the
cache when the cache is full.

• Cost is higher than for a direct-mapped cache because of the need to
search all 128 tag patterns to determine whether a given block is in
the cache.

[Figure: associative mapping. Any of the main memory blocks 0-4095 can
occupy any of the 128 cache blocks; each cache block carries a 12-bit tag.]
21
Set-Associative Mapping
• Set-associative mapping is a combination of
direct and associative mapping.
• Blocks of the cache are grouped into sets.
• The mapping function allows a block of the
main memory to reside in any block of a
specific set.

• Divide the cache into 64 sets, with two
blocks per set.
• Memory blocks 0, 64, 128, etc. map to
set 0, and they can occupy either of
its two positions.

• The memory address is divided into three
fields:
 A 6-bit field determines the set number.
 The high-order 6 bits are compared
to the tag fields of the two blocks in the
set.
 The low-order 4 bits identify the word within a block.

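For this two-way example (64 sets), the 6-6-4 address split can be sketched as follows (names assumed for illustration):

```python
def set_assoc_map(addr):
    """Split a 16-bit address into (tag, set, word) for a cache of
    64 sets with two blocks per set."""
    word   = addr & 0xF           # 4-bit word field
    set_no = (addr >> 4) & 0x3F   # 6-bit set number
    tag    = (addr >> 10) & 0x3F  # 6-bit tag, compared against both blocks
    return tag, set_no, word

# Memory blocks 0, 64 and 128 (block size 16 words) all fall in set 0:
for b in (0, 64, 128):
    print(set_assoc_map(b * 16))  # tags 0, 1, 2; set 0 each time
```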
22
• A key design objective of a computer system is to
achieve the best possible performance at the lowest
possible cost.

• The price/performance ratio is a common measure of
success.

• Performance of a processor depends on:
 How fast machine instructions can be brought into
the processor for execution.
 How fast the instructions can be executed.

23
• Hit rate or hit ratio (h) = total number of cache hits / total number of
memory accesses.

• Miss rate or miss ratio = (1 - h).

• The hit rate can be improved by increasing the block size, while
keeping the cache size constant.

• Block sizes that are neither very small nor very large give the
best results.

24
• Let,
 h be the hit ratio or hit rate,
 M be the miss penalty (i.e. the time to access the main memory),
 c be the time to access information from the cache.

• The average access time Tavg seen by the processor is

Tavg = h*c + (1-h)*M

• In high-performance processors, 2 levels of caches are normally used.

• The average access time in a system with 2 levels of caches is

Tavg = h1*c1 + (1-h1)*h2*c2 + (1-h1)*(1-h2)*M

where h1 and h2 are the hit ratios of the L1 and L2 caches respectively,
c1 and c2 are the times to access the L1 and L2 caches
respectively, and M is the time to access the main memory.

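The two formulas above can be checked numerically; a sketch (function names assumed), where the second call uses h1 = 0.9, c1 = 1 ns, h2 = 0.95, c2 = 20 ns, M = 220 ns:

```python
def t_avg_one_level(h, c, M):
    """Average access time with a single cache: h*c + (1-h)*M."""
    return h * c + (1 - h) * M

def t_avg_two_level(h1, c1, h2, c2, M):
    """Average access time with L1 and L2 caches."""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * M

print(round(t_avg_one_level(0.9, 1, 100), 2))            # 10.9
print(round(t_avg_two_level(0.9, 1, 0.95, 20, 220), 2))  # 3.9
```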
25
Performance Measurements

In a two-level cache system, the level one cache has a hit time of 1 ns (inside
the CPU), hit rate of 90%, and a miss penalty of 20 ns. The level two cache
has a hit rate of 95% and a miss penalty of 220 ns. What is the average
memory access time?

Tavg = h1c1 + (1-h1)h2c2 + (1-h1)(1-h2)M
     = {0.9*1 + (1-0.9)*(0.95)*20 + (1-0.9)*(1-0.95)*220} ns
     = {0.9 + 1.9 + 1.1} ns
     = 3.9 ns

26
Performance Measurements

The memory access time is 1 nanosecond for a read operation with a hit in cache, 5
nanoseconds for a read operation with a miss in cache, 2 nanoseconds for a write
operation with a hit in cache and 10 nanoseconds for a write operation with a miss in
cache. Execution of a sequence of instructions involves 100 instruction fetch
operations, 60 memory operand read operations and 40 memory operand write
operations. The cache hit-ratio is 0.9. The average memory access time (in
nanoseconds) in executing the sequence of instructions is __________.

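One way to work this out is to weight the read and write access times by their operation counts; a solution sketch (all variable names assumed):

```python
reads  = 100 + 60   # instruction fetches + operand reads (both are reads)
writes = 40
h = 0.9             # cache hit ratio

read_time  = h * 1 + (1 - h) * 5    # 1 ns on a read hit, 5 ns on a read miss
write_time = h * 2 + (1 - h) * 10   # 2 ns on a write hit, 10 ns on a write miss

t_avg = (reads * read_time + writes * write_time) / (reads + writes)
print(round(t_avg, 2))   # 1.68
```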
27
Performance Measurements
Q)
A memory system has a two-level cache memory. Their access times are given to be 15
ns & 45 ns respectively and their hit ratios are 80% & 90% respectively. The main memory
access time is given to be 200ns. Assuming that 60% of the accesses are for read and
main memory write accesses are 50% slower than read accesses. Find out the average
access time. (Assume 200ns is for memory read access).

Data:
 Read operation: C1 = 15, C2 = 45, M = 200
 Read operations = 60%
 Write operation: C1 = 15, C2 = 45,
 M = 200 + (200 * 50%) = 300
 Write operations = 40%

Calculation:
 T = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M

 Read:  T = 0.8*15 + 0.2*0.9*45 + 0.2*0.1*200
          = 12 + 8.1 + 4 = 24.1

 Write: T = 0.8*15 + 0.2*0.9*45 + 0.2*0.1*300
          = 12 + 8.1 + 6 = 26.1

 Tavg = 0.6*24.1 + 0.4*26.1 = 24.9 ns

28
Replacement Algorithms

 Replacement algorithms are only needed for the associative and
set-associative techniques.
1. Least Recently Used (LRU)
2. First-in First-out (FIFO)
3. Least Frequently Used (LFU)
4. Random Replacement (RR)

29
Least Recently Used (LRU) –
 Replace the cache line that has been in the cache the
longest with no references to it.
 This algorithm discards the least recently used item
from the cache in order to make space for the new data item.
 To achieve this, a history of all data items (which data item
was used when) is kept.
 A variable known as an aging bit is used to store this
information.
 Although this algorithm provides better performance,
its cost of implementation is much higher.

30
Example:

Suppose we have the reference sequence 7, 0, 1, 2, 0, 3, 0, 4, 2, 3 and the cache memory has 3 lines.

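The example can be traced with a small simulation (a sketch; the list is kept in recency order, with the front being the least recently used):

```python
def simulate_lru(refs, num_lines):
    """Simulate LRU on a fully associative cache with num_lines lines."""
    cache, hits = [], 0
    for r in refs:
        if r in cache:
            hits += 1
            cache.remove(r)          # re-inserted at the MRU end below
        elif len(cache) == num_lines:
            cache.pop(0)             # evict the least recently used item
        cache.append(r)
    return cache, hits

print(simulate_lru([7, 0, 1, 2, 0, 3, 0, 4, 2, 3], 3))  # ([4, 2, 3], 2)
```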
31
First-in First-out (FIFO) –

 Replace the cache line/block that has been in the
cache the longest.
 The first-in first-out algorithm removes the block that was
loaded earliest, regardless of how recently it was used.
 It treats the cache as a circular buffer, and blocks
are removed in a round-robin fashion.

32
Suppose we have the reference sequence 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, and the cache memory has 4 lines.

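The same sequence under FIFO with 4 lines can be simulated as follows (a sketch; note that a hit does not change the eviction order):

```python
def simulate_fifo(refs, num_lines):
    """Simulate FIFO replacement on a fully associative cache."""
    cache, hits = [], 0
    for r in refs:
        if r in cache:
            hits += 1                # a hit leaves the FIFO order unchanged
        else:
            if len(cache) == num_lines:
                cache.pop(0)         # evict the oldest-loaded block
            cache.append(r)
    return cache, hits

print(simulate_fifo([7, 0, 1, 2, 0, 3, 0, 4, 2, 3], 4))  # ([1, 2, 3, 4], 4)
```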
33
Least Frequently Used (LFU) –

 Replace the cache line that has experienced the fewest
references.
 This algorithm counts how often each data item has been
used.
 The data items which are used least are deleted from the
cache first.
 If all items have the same frequency, then this algorithm
randomly discards a data item.

34
Random Replacement (RR) –

• Pick a line at random from the candidate lines.
• This algorithm randomly selects any of the data items in the
cache and replaces it with the desired one.
• It does not need to keep track of the history of the
data contents and does not need any data structure.
• Because of this it consumes fewer resources, so its cost is
lower compared to the other algorithms.

35
Virtual Memory
 Recall that an important challenge in the design of a
computer system is to provide a large, fast memory
system at an affordable cost.

 Architectural solutions exist to increase the effective speed and
size of the memory system:
 Cache memories were developed to increase the
effective speed of the memory system.
 Virtual memory is an architectural solution to
increase the effective size of the memory system.

36
 Recall that the addressable memory space depends on the
number of address bits in a computer.
 For example, if a computer issues 32-bit addresses, the
addressable memory space is 4G bytes.

 Physical main memory in a computer is generally not as
large as the entire possible addressable space.
 Physical memory typically ranges from a few hundred
megabytes to 1G bytes.

 Large programs that cannot fit completely into the main
memory have their parts stored on secondary storage
devices such as magnetic disks.
 Pieces of programs must be transferred to the main memory
from secondary storage before they can be executed.

37
 When a new piece of a program is to be transferred to the
main memory, and the main memory is full, then some
other piece in the main memory must be replaced.
 Recall that this is very similar to what we studied in the case of
cache memories.

 The operating system automatically transfers data between
the main memory and secondary storage.
 The application programmer need not be concerned with this
transfer.
 Also, the application programmer does not need to be aware of
the limitations imposed by the available physical memory.

38
 Techniques that automatically move program and data
between the main memory and secondary storage when they
are required for execution are called virtual-memory
techniques.

 The processor issues binary addresses for instructions and
data.
 These binary addresses are called logical or virtual
addresses.

 Virtual addresses are translated into physical addresses by
a combination of hardware and software subsystems.

39
[Figure: the processor issues a virtual address to the MMU, which produces
a physical address used to access the cache and main memory; data moves
between main memory and disk storage by DMA transfer.]

• The memory management unit (MMU)
translates virtual addresses into
physical addresses.

• If the desired data or instructions are
in the main memory, they are fetched
as described previously.

• If the desired data or instructions are
not in the main memory, they must
be transferred from secondary
storage to the main memory.

• The MMU causes the operating system to
bring the data from secondary
storage into the main memory.

40
Address translation
 Assume that program and data are composed of fixed-
length units called pages.
 A page consists of a block of words that occupy
contiguous locations in the main memory.
 Page is a basic unit of information that is transferred
between secondary storage and main memory.
 The size of a page commonly ranges from 2K to 16K bytes.
 Pages should not be too small, because the access
time of a secondary storage device is much larger
than that of the main memory.
 Pages should not be too large, else a large portion of
the page may not be used, and it will occupy valuable
space in the main memory.

41
Address translation (contd..)
Concepts of virtual memory are similar to the concepts
of cache memory.
•Cache memory:
- Introduced to bridge the speed gap between the
processor and the main memory.
- Implemented in hardware.
•Virtual memory:
- Introduced to bridge the speed gap between the
main memory and secondary storage.
- Implemented in part by software.

42
 Each virtual or logical address generated by a
processor is interpreted as a virtual page number (high-
order bits) plus an offset (low-order bits) that specifies
the location of a particular byte within that page.
 Information about the main memory location of each
page is kept in the page table.
 Main memory address where the page is stored.
 Current status of the page.
 Area of the main memory that can hold a page is called
as page frame.
 Starting address of the page table is kept in a page
table base register.

43
•Virtual page number generated by the processor is added
to the contents of the page table base register.

•This provides the address of the corresponding entry in


the page table.

•The contents of this location in the page table give the


starting address of the page if the page is currently in the
main memory.

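The table lookup described above can be sketched as follows (the page size, table contents and function names are all assumptions for illustration; a real page table is indexed by adding the virtual page number to the page table base register):

```python
PAGE_SIZE = 4096   # assumed 4K-byte pages, giving a 12-bit offset

# Assumed page table: virtual page number -> page frame number,
# or None if the page is not currently resident in the main memory.
page_table = {0: 5, 1: 2, 2: None}

def translate(virtual_addr):
    """Translate a virtual address into a physical main-memory address."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table.get(vpn)
    if frame is None:
        raise LookupError("page fault: page not in main memory")
    return frame * PAGE_SIZE + offset

print(translate(4100))   # virtual page 1, offset 4 -> frame 2 -> 8196
```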
44
[Figure: virtual-address translation. The virtual address from the
processor is interpreted as a virtual page number plus an offset. The
page table base register (PTBR) holds the address of the page table;
PTBR + virtual page number gives the entry of the page in the page
table. That entry holds control bits and the page frame in memory; the
page frame concatenated with the offset forms the physical address in
main memory.]

45
 Page table entry for a page also includes some control bits
which describe the status of the page while it is in the main
memory.
 One bit indicates the validity of the page.
 Indicates whether the page is actually loaded into the main
memory.
 Allows the operating system to invalidate the page without
actually removing it.
 One bit indicates whether the page has been modified during
its residency in the main memory.
 This bit determines whether the page should be written back
to the disk when it is removed from the main memory.
 Similar to the dirty or modified bit in the case of cache memory.

46
 Where should the page table be located?
 Recall that the page table is used by the MMU for every read
and write access to the memory.
 Ideal location for the page table is within the MMU.
 Page table is quite large.
 MMU is implemented as part of the processor chip.
 Impossible to include a complete page table on the chip.
 Page table is kept in the main memory.
 A copy of a small portion of the page table can be
accommodated within the MMU.
 Portion consists of page table entries that correspond to the
most recently accessed pages.

47
THANK YOU

48
