CHAPTER 3: MEMORY
3.1 Memory
An embedded system's functionality consists of three aspects:
1. Processing: transformation of data
2. Storage: retention of data for later use
3. Communication: transfer of data
A memory stores a large number of items, commonly referred to as words. Each word consists of
a specific number of bits, e.g., 8 bits. Therefore, a memory can be viewed as m words of n
bits each, for a total of m x n bits, as shown in the figure below:
If a memory has k address inputs, it can have up to 2^k words (m = 2^k), which implies k = log2(m).
This means that log2(m) address input signals are required to identify a particular word. Also, n data
signals are required to output a selected word.
For example, a 4096 x 8 memory can store 4096 x 8 = 32768 bits and requires k = log2(4096) = 12
address signals and eight I/O data signals.
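The example above can be checked with a short helper (a sketch for illustration; the function name is mine, not from the text):

```python
import math

def memory_signals(words, bits_per_word):
    """Return (address lines, data lines, total bits) for an m x n memory."""
    k = int(math.log2(words))          # k address inputs select one of 2^k words
    return k, bits_per_word, words * bits_per_word

# The 4096 x 8 example: 12 address signals, 8 data signals, 32768 bits
print(memory_signals(4096, 8))  # (12, 8, 32768)
```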
A register is made up of flip-flops. Writing is easily accomplished: just put the data on the data lines
and enable load. Similarly, RAMs, which are built around basic transistor storage cells, also have
fast write ability. These memories are rewritable.
Storage Permanence:
It is the ability to hold the stored bits; this ability may be temporary or permanent.
The range of storage permanence is:
• High end: This range of memory essentially never loses bits, e.g., mask-programmed ROM,
OTP ROM. The programs of an embedded system are placed in this range of memory.
• Middle range: This range of memory holds bits over a certain period of time, such as days,
months, or years after the memory's power source is turned off, e.g., NVRAM (a battery-backed
RAM) or flash memory.
• Lower range: This range of memory holds bits as long as power is supplied to the memory, e.g., SRAM.
• Low end: It begins to lose bits almost immediately after they are written, i.e., a refreshing circuit is
needed to hold the data correctly, e.g., DRAM.
The programming of a programmable connection depends on the type of ROM being used. Common
ROM types are:
1. Mask-programmed ROM
2. One-time programmable ROM (OTP ROM)
3. Erasable Programmable ROM (EPROM)
4. Electrically Erasable Programmable ROM (EEPROM)
5. Flash memory
1. Mask-programmed ROM
The connections of this ROM are programmed at fabrication by creating an appropriate set of masks.
It can be written only once (in the factory), so it has the lowest write ability. But it stores data forever;
thus, it has the highest storage permanence. The bits never change unless damaged. These ROMs are
typically used for the final design of high-volume systems.
2. OTP ROM:
It is a one-time programmable ROM. The connections of this ROM are programmed after
manufacture by the user. The user provides a file of the desired ROM contents, which is input to a
machine called a ROM programmer. Each programmable connection is a fuse, and the ROM
programmer blows the fuses where connections should not exist by passing a large current. General
characteristics are:
• Very low write ability: typically written only once, and requires a ROM programmer device
• Very high storage permanence: bits don't change unless the device is reconnected to the
programmer and more fuses are blown
• Commonly used in final products: cheaper, harder to inadvertently modify
5. Flash Memory:
It is an extension of EEPROM, with the same floating-gate principle and the same write ability and
storage permanence. It can be erased at a faster rate, i.e., large blocks of memory are erased at once,
rather than one word at a time. The blocks are typically several thousand bytes in size.
• Writes to single words may be slower: the entire block must be read, the word updated, and
then the entire block written back
• Used in embedded systems storing large data items in nonvolatile memory,
e.g., digital cameras, TV set-top boxes, cell phones
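The read-update-write sequence for a single word can be sketched as follows (a simplified model of block-erasable flash; the block size and names are assumptions, and real flash controllers also handle erase cycles and wear leveling):

```python
BLOCK_SIZE = 4096  # bytes per erase block ("several thousand bytes")

def write_word(flash, addr, value):
    """Update one byte: read the whole block, patch the word, write it back."""
    start = (addr // BLOCK_SIZE) * BLOCK_SIZE
    block = flash[start:start + BLOCK_SIZE]   # read the entire block
    block[addr - start] = value               # update the single word
    flash[start:start + BLOCK_SIZE] = block   # erase + rewrite the whole block

flash = bytearray(2 * BLOCK_SIZE)
write_word(flash, 5000, 0xAB)
print(hex(flash[5000]))  # 0xab
```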
• The internal structure of RAM is more complex than that of ROM. Each word consists of several
memory cells, each storing 1 bit. Each input and output data line connects to each cell in its
column, and the r/w line is connected to every cell.
• Each word-enable line from the decoder connects to every cell in its row.
• When a row is enabled by the decoder, each cell has logic that stores the input data bit when r/w
indicates write, or outputs the stored bit when r/w indicates read.
A low-power SRAM cell may be designed by using cross-coupled CMOS inverters. The most
important advantage of this circuit topology is that static power dissipation is very small; essentially,
it is limited by a small leakage current. Other advantages of this design are high noise immunity due to
larger noise margins, and the ability to operate at a lower power supply voltage.
The major disadvantage of this topology is the larger cell size. The circuit structure of the full CMOS
static RAM cell is shown in the figure above.
The memory cell consists of two CMOS inverters connected back to back, and two access
transistors. The access transistors are turned on whenever a word line is activated for a read or write
operation, connecting the cell to the complementary bit-line columns.
To select a cell, the two access transistors must be on so that the elementary cell (flip-flop) can be
connected to the internal SRAM circuitry. These two access transistors are connected to the
word line (also called the row or X address); the selected row is set to Vcc. The two flip-flop sides
are connected to a pair of lines, data and data' (also called the columns or Y address).
The figure above shows the internal structure of a DRAM memory cell. DRAM stores each bit in a storage
cell consisting of a capacitor and a transistor. Capacitors tend to lose their charge rather quickly;
thus the need for recharging, or refreshing.
The presence or absence of charge in the capacitor determines whether the cell contains a 1 or a 0. A
typical DRAM row must be refreshed at least once every 15.625 microseconds. Usually the refresh
circuitry consists of a refresh counter, which holds the address of the row to be refreshed and is
applied to the chip's row address lines, and a timer that increments the counter to step through the
rows. This counter may be part of the memory controller circuitry, or on the memory chip itself. Two
refreshing strategies may be employed: burst and distributed.
• Burst refresh: a series of refresh cycles is performed one after another until all the rows
have been refreshed.
• Distributed refresh: refresh cycles are performed at regular intervals, interspersed with
memory accesses.
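The 15.625-microsecond figure quoted above is consistent with distributed refresh over a 64 ms retention window and 4096 rows, as this small check shows (the window and row count are assumed parameters, not stated in the text):

```python
def distributed_refresh_interval_us(retention_ms, rows):
    """Interval between successive row refreshes in distributed refresh,
    so that every row is refreshed once per retention window."""
    return retention_ms * 1000 / rows

print(distributed_refresh_interval_us(64, 4096))  # 15.625
```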
RAM variations
1. Pseudo-Static RAM (PSRAM):
These are RAMs with a built-in memory refresh controller. A PSRAM appears to behave much like an
SRAM. In contrast to true SRAM, however, a PSRAM may be busy refreshing itself when accessed, which
could slow access time and add some system complexity. It is a popular low-cost, high-density
alternative to SRAM.
• The faster the access time, the greater the cost per bit
• The greater the storage capacity, the smaller the cost per bit
• The greater the storage capacity, the slower the access time
To meet performance requirements, the designer would like to use expensive, relatively low-capacity
memories with short access times. But then the cost increases. So, the designer should instead employ
a memory hierarchy. A typical hierarchy is illustrated in the figure below:
Cache Memory
• Cache memory is a small, fast type of volatile computer memory that provides high-speed
data access to a processor and stores frequently used programs, applications, and data.
• It is the fastest memory in a computer, and is typically integrated directly into the processor
chip or placed between the processor and main memory (RAM).
• Cache is usually designed using static RAM rather than dynamic RAM, which is one reason
that cache is more expensive but faster than main memory.
• When the processor attempts to read a word of memory, a check is made to determine if the
word is in the cache. If so, the word is delivered to the processor. If not, a block of main
memory, consisting of a fixed number of words, is read into the cache and then the word is
delivered to the processor.
• The cache connects to the processor via data, control, and address lines.
• The data and address lines are also attached to data and address buffers, which attach to a
system bus from which main memory is reached.
• When a cache hit occurs, the data and address buffers are disabled and communication is
only between the processor and the cache, with no system bus traffic.
• When a cache miss occurs, the desired word is first read into the cache and then transferred
from the cache to the processor. In this case, the cache is physically interposed between the
processor and main memory for all data, address, and control lines.
1. Direct Mapping
• It is the simplest mapping technique.
• It maps a given block of main memory into only one possible cache line, i.e., a given main
memory block can be placed in one and only one place in the cache.
• It implements a many-to-one function, i.e., many blocks of memory map to one cache line
(which can hold only one of them at a time).
• The cache line number can be calculated as:
i = j MOD m
Where,
i = cache line number; j = main memory block number; m = number of lines in the
cache
For example, suppose there are 16 blocks of main memory and 4 cache lines, and we need to
identify which cache line maps block no. 14 of the main memory.
Here, i = 14 MOD 4 = 2
So, block no. 14 of the main memory is mapped to cache line no. 2.
This tells us to look at cache line no. 2 and check whether its tag matches the tag of the
desired address (here 14 DIV 4 = 3, i.e., binary 11). Since 16 blocks share 4 lines,
a 2-bit tag is enough to identify which block of memory currently occupies a line.
• If the tags match, this is a cache hit; if not, it is a miss.
• Advantage: easy to implement
• Disadvantage: not flexible; contention problem
• The figure below represents a direct cache mapping.
[Figure: direct mapping — a 4-line cache (Line 0-3, each line with a tag field) and a 16-block
main memory (Block 0-15, each block holding 4 words, words 0-63 in total)]
Here, the memory consists of 16 blocks, each containing 4 words (Block 0 has
words 0, 1, 2, 3; Block 1 has 4, 5, 6, 7; ...; Block 15 has 60, 61, 62, 63). Similarly, each cache line
maps 4 blocks of memory.
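The i = j MOD m rule and the tag computation can be sketched directly (the function name is illustrative, not from the text):

```python
def direct_map(block_no, num_lines):
    """Direct mapping: line i = j MOD m; the tag is the remaining high bits."""
    return block_no % num_lines, block_no // num_lines

# Block 14 with a 4-line cache: line 2, tag 3 (binary 11)
print(direct_map(14, 4))  # (2, 3)
```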
2. Associative Mapping
• This is a much more flexible mapping technique.
• Any main memory block can be loaded into any cache line position.
• It implements a many-to-many function.
• Each physical address generated by the CPU is viewed as two fields:
tag no. | word
 0001   |  01
This tells us to look for a cache line tagged 0001; if present, it contains word 01 of block
0001 of the main memory.
In this case, a 4-bit tag is required to identify each of the 16 blocks of main memory;
the tag no. is simply the block no. of the main memory.
• Here, we need to search all lines for tags from 0000 to 1111 to determine whether
a given block is in the cache or not. Therefore, the cost of the hardware
implementation (i.e., the comparators) is higher.
• Advantage: flexibility
• Disadvantage: increased hardware cost, increased access time
• The figure below represents the associative mapping:
[Figure: associative mapping — a 4-line cache (Line 0-3, each line with a tag field) and a 16-block
main memory (Block 0-15, each block holding 4 words, words 0-63 in total)]
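Splitting an address into the tag and word fields can be sketched as follows (a minimal helper, assuming 4 words per block as in the figure):

```python
def split_associative(addr, words_per_block):
    """Fully associative: address = [tag | word]; the tag is the block number."""
    return addr // words_per_block, addr % words_per_block

# Word 5 of memory falls in block 1 (tag 0001), word offset 01:
tag, word = split_associative(5, 4)
print(f"tag={tag:04b} word={word:02b}")  # tag=0001 word=01
```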
3. Set-associative Mapping
• This is a combination of the direct and associative techniques. It overcomes the
limitations of contention and increased hardware cost.
• In this case, the lines of the cache are grouped into sets, and the mapping allows a
block to reside in any line of a particular set.
• It implements a many-to-many function within each set.
• Each physical address generated by the CPU is viewed as three fields: tag, set, and word.
For example, with the 4 cache lines grouped into 2 sets of 2 lines each, the set number
is i = j MOD v (where v is the number of sets), so block 0001 of main memory maps to
set 1 MOD 2 = 1 and may occupy either line of that set. The cache first selects the set
from the set field, then compares the tags within that set, and finally indexes the
word within the matching line.
• In this case, with 16 blocks and 2 sets, a 3-bit tag (the block no. divided by the
number of sets) together with a 1-bit set field identifies a block of main memory.
[Figure: set-associative mapping — a 4-line cache grouped into 2 sets (Set 0: Lines 0-1,
Set 1: Lines 2-3, each line with a tag field) and a 16-block main memory (Block 0-15, each
block holding 4 words, words 0-63 in total)]
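Assuming the standard set-mapping rule i = j MOD v (with v the number of sets), the set and tag fields can be sketched as:

```python
def set_associative_map(block_no, num_sets):
    """Set-associative: set i = j MOD v; tag = remaining high-order bits."""
    return block_no % num_sets, block_no // num_sets

# 16 blocks, 4 cache lines grouped into 2 sets of 2 lines:
print(set_associative_map(1, 2))   # (1, 0): block 1 -> set 1, tag 0
print(set_associative_map(14, 2))  # (0, 7): block 14 -> set 0, tag 7
```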
Write Through:
In write-through, data is simultaneously updated in the cache and in memory. This process
is simpler and more reliable, and is used when there are not frequent writes to the cache
(i.e., the number of write operations is low).
It helps with data recovery (in case of a power outage or system failure), and it solves the
inconsistency problem. However, every data write experiences latency (delay), as we have to
write to two locations (both memory and cache). This undermines the advantage of having a
cache for write operations (as the whole point of using a cache is to avoid multiple accesses to the
main memory).
Write Back:
The data is updated only in the cache, and updated in memory at a later time. Data
is updated in memory only when the cache line is ready to be replaced (cache line
replacement is done using the Least Recently Used algorithm, FIFO, LIFO, and others, depending on
the application).
Writes are done only to the cache. An UPDATE bit (called the dirty bit) is set when there is a
write. Before a cache line is replaced, if its UPDATE bit is set, its contents are written back to main
memory. The problem is that portions of main memory are then invalid for a certain period of
time; if other devices access those locations, they will get wrong contents.
Therefore, access to main memory by I/O modules can only be allowed through the cache. This
makes the circuitry complex and creates a potential bottleneck.
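The dirty-bit bookkeeping described above can be sketched with a toy single-line model (the class and names are illustrative, not from the text):

```python
class WriteBackLine:
    """One cache line with an UPDATE (dirty) bit."""
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False

    def write(self, tag, data, memory):
        # Before replacing modified contents, flush them to main memory.
        if self.dirty and self.tag != tag:
            memory[self.tag] = self.data
        self.tag, self.data, self.dirty = tag, data, True

memory = {}
line = WriteBackLine()
line.write(3, "A", memory)   # write goes to the cache only
line.write(7, "B", memory)   # replacing tag 3 writes "A" back to memory
print(memory)  # {3: 'A'}
```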
Memory Fragmentation:
In a computer system, processes and resources are continuously loaded into and released
from memory; because of this, free memory space is broken into small pieces. This creates
small, unused, inefficient memory spaces, so small that normal processes cannot fit into them,
degrading system capacity or performance.
This problem is known as fragmentation. The degree of fragmentation depends on the system of
memory allocation. In most cases, memory space is simply wasted, which is known as memory
fragmentation.
Internal Fragmentation
When a process is assigned to a memory block and that process is smaller than the
memory block allocated to it, a vacant space is left in the assigned memory block. The difference
between the assigned and requested memory space is called internal fragmentation. It commonly
occurs when memory is divided into fixed-sized blocks.
Solution: Memory ought to be partitioned into variable-sized blocks, and the most suitable
block assigned to the requesting process. In basic terms, internal fragmentation can be decreased by
allocating the smallest partition that is still sufficient for the process. However, the issue will not
be solved completely, but it can be reduced to some extent.
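The waste can be illustrated numerically (the block and request sizes here are hypothetical):

```python
def internal_fragmentation(block_size, request_sizes):
    """Bytes wasted when each request is rounded up to fixed-size blocks."""
    waste = 0
    for size in request_sizes:
        blocks = -(-size // block_size)            # ceiling division
        waste += blocks * block_size - size
    return waste

# 4 KB fixed blocks: 3000 B wastes 1096 B, 9000 B needs 3 blocks and wastes 3288 B
print(internal_fragmentation(4096, [3000, 4096, 9000]))  # 4384
```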
External Fragmentation
Typically, external fragmentation occurs in the case of dynamic or variable-size segmentation.
Although the total space available in memory is sufficient to hold the process, this
memory space is not contiguous, which prevents the process from executing.
The solution to external fragmentation is compaction, i.e., rearrangement of memory contents. In this
technique, all the contents of memory are moved so that all the free memory is put together
in one big block.
The memory size needed often differs from the sizes of readily available memories.
When the available memory is larger, we can simply ignore the unneeded high-order address bits and
higher data lines. When the available memory is smaller, we need to compose several smaller memories
into one larger memory.
a) To increase the number of bits per word (the column size), simply connect the available
memories side by side, as shown in the figure.
b) To increase the number of words (the row size), simply connect the available memories top to
bottom, as shown in the figure below.
c) If the available memories have a smaller word width as well as fewer words than required,
we combine the above two techniques:
• First, create the number of columns of memories necessary to achieve the needed
word width, and
• Then create the number of rows of memories necessary, along with a
decoder, to achieve the needed number of words, as shown in the figure below.
Steps:
Samples:
Now:
• No. of required given ROMs = (size of required ROM) / (size of given ROM) = (1K x 32) / (1K x 8) = 4
• No. of required address lines = log2 (no. of words in required ROM) = log2 (1024 x 1) = 10
Now:
• No. of required given ROMs = (size of required ROM) / (size of given ROM) = (4K x 8) / (1K x 8) = 4
• No. of required address lines = log2 (no. of words in required ROM) = log2 (1024 x 4) = 12
Now:
• No. of required given ROMs = (size of required ROM) / (size of given ROM) = (8K x 8) / (1K x 8) = 8
• No. of required address lines = log2 (no. of words in required ROM) = log2 (1024 x 8) = 13
Now:
• No. of required given ROMs = (size of required ROM) / (size of given ROM) = (2K x 16) / (1K x 8) = 4
• No. of required address lines = log2 (no. of words in required ROM) = log2 (1024 x 2) = 11
• Here both the word count and the word width differ (1K vs 2K, and 8 vs 16), so we
use both horizontal and vertical alignment of the ROMs.
• No. of rows = 2K / 1K = 2
• No. of columns = 16 / 8 = 2
Here;
First part: 2^k x n ------> 2^(k+1) x n
Now:
• No. of required given ROMs = (size of required ROM) / (size of given ROM) = (2^(k+1) x n) / (2^k x n) = 2
• No. of required address lines = log2 (no. of words in required ROM) = log2 (2^(k+1)) = k+1
• No. of required data lines = no. of bits in a word = n
• Here the word count differs (2^k vs 2^(k+1)), so we use vertical alignment of the ROMs.
Second part: 2^k x n ------> 2^k x 4n
• No. of required given ROMs = (size of required ROM) / (size of given ROM) = (2^k x 4n) / (2^k x n) = 4
• No. of required address lines = log2 (no. of words in required ROM) = log2 (2^k) = k
• No. of required data lines = no. of bits in a word = 4n
• Here the word width differs (n vs 4n), so we use horizontal alignment of the ROMs.
• Hence, the required memory is as shown in the figure.
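All the worked examples above follow the same arithmetic, which can be collected into one helper (a sketch; the names are mine, not from the text):

```python
import math

def compose(req_words, req_bits, given_words, given_bits):
    """Chips, rows, columns, and address lines to build the required memory."""
    rows = req_words // given_words    # vertical stacking extends the word count
    cols = req_bits // given_bits      # horizontal stacking extends the word width
    return rows * cols, rows, cols, int(math.log2(req_words))

print(compose(1024, 32, 1024, 8))  # (4, 1, 4, 10)  1K x 32 from 1K x 8
print(compose(4096, 8, 1024, 8))   # (4, 4, 1, 12)  4K x 8
print(compose(8192, 8, 1024, 8))   # (8, 8, 1, 13)  8K x 8
print(compose(2048, 16, 1024, 8))  # (4, 2, 2, 11)  2K x 16
```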
Reference: https://www.youtube.com/watch?v=ldIpuFnFgZw
Embedded Systems © Er. Shiva Ram Dam, 2022
Exam questions.
***