Memory Unit Bindu Agarwalla

The document covers memory unit concepts: main memory can address up to 2^k locations via a k-bit address bus with an n-bit data bus; cache memory and virtual memory are techniques to increase the effective speed and size of memory; memory chips are organized as arrays of cells selected by word lines and an address decoder; SRAM uses six transistors per bit while DRAM uses one transistor and a capacitor; and DRAM must be refreshed periodically to restore the charge on its capacitors.
Memory Unit

Bindu Agarwalla
Some Basic concepts

Maximum size of the Main Memory

Byte-addressable

CPU-Main Memory Connection

[Figure: Processor-main memory connection. The processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus, giving access to up to 2^k addressable locations with a word length of n bits. Control lines (R/W, MFC, etc.) coordinate the transfers.]
Some Basic concepts
 Measures for the speed of a memory:
 memory access time.
 memory cycle time.

 An important design issue is to provide a computer system with as large and fast a memory as possible, within a given cost target.

 Several techniques to increase the effective size and speed of the memory:
 Cache memory (to increase the effective speed).
 Virtual memory (to increase the effective size).
Internal organization of memory chips
Each memory cell can hold one bit of information.

Memory cells are organized in the form of an array.

One row is one memory word.

All cells of a row are connected to a common line, known as the “word
line”.

Word line is connected to the address decoder.

Sense/write circuits are connected to the data input/output lines of the memory chip.
Number of external pins required to connect a memory chip
For 16 x 8 chip

4 address lines
8 data lines
2 (R/W +CS)
2 (Power Supply+GND )
16 in Total

For 128 x 8 chip

For 1K x 1 chip

For 64 x 16 chip
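The pin-count arithmetic above generalizes directly: address lines = log2 of the number of locations, data lines = word width, plus two control pins (R/W, CS) and two supply pins. A quick Python sketch (a hypothetical helper, not part of the slides) covers the remaining chips:

```python
import math

def pin_count(locations, word_bits):
    """External pins: address lines (log2 of locations), data lines
    (word width), R/W + CS, and power + ground."""
    address_lines = int(math.log2(locations))
    return address_lines + word_bits + 2 + 2

print(pin_count(16, 8))     # 16 x 8 chip:  4 + 8 + 2 + 2 = 16
print(pin_count(128, 8))    # 128 x 8 chip: 7 + 8 + 2 + 2 = 19
print(pin_count(1024, 1))   # 1K x 1 chip: 10 + 1 + 2 + 2 = 15
print(pin_count(64, 16))    # 64 x 16 chip: 6 + 16 + 2 + 2 = 26
```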
Internal organization of 1K x 1 memory chips
Implementation of a SRAM Cell

Two transistor inverters are cross-connected to implement a basic flip-flop.

The cell is connected to one word line and two bit lines by transistors T1 and T2.

When the word line is at ground level, the transistors are turned off and the latch retains its state.
Implementation of a SRAM Cell

Read operation:
1. In order to read the state of the SRAM cell, the word line is activated to close switches T1 and T2.

2. Sense/Write circuits at the bottom monitor the state of b and b’ and set the output accordingly.

3. If the cell is in state 1, the signal on bit line b is high and the signal on bit line b’ is
low.
4. The opposite is true if the cell is in state 0.
Implementation of a SRAM Cell

Write operation:
1. The state of the cell is set by placing the appropriate value on bit line b and its
complement on b’, and then activating the word line.

2. This forces the cell into the corresponding state.

3. The required signals on the bit lines are generated by the Sense/Write circuit.
Implementation of a SRAM Cell

Static RAM (SRAM) for Cache

Requires 6 transistors per bit.

Requires low power to retain bit.


Implementation of a DRAM Cell
Dynamic RAM (DRAM): slow, cheap, and dense memory.

Typical choice for main memory.


Cell implementation:
1-transistor cell (a pass transistor) with a trench capacitor that stores the bit.

The bit is stored as a charge on the capacitor.

Must be refreshed periodically, because charge leaks from the tiny capacitor.

[Figure: typical DRAM cell, showing the word line, pass transistor, capacitor, and bit line.]
Refreshing for all memory rows
Reading each row and writing it back to restore the charge.
SRAM Vs DRAM Cell
Static RAMs (SRAMs):
Consist of circuits that are capable of retaining their state as long as the power is
applied.

Volatile memories, because their contents are lost when power is interrupted.

Access times of static RAMs are in the range of a few nanoseconds.


Require low power to retain the bit, since power is consumed only when the cell is accessed.

However, the cost is usually high.

Dynamic RAMs (DRAMs):

Do not retain their state indefinitely.


Contents must be periodically refreshed.

Contents can be refreshed while they are being accessed for reading.


DRAM Refresh Cycles
The refresh period is on the order of tens of milliseconds.

Refreshing is done for the entire memory.

Each row is read and written back to restore the charge.

Some of the memory bandwidth is lost to refresh cycles.

[Figure: cell voltage vs. time. A stored 1 is written at full voltage, decays toward the threshold voltage as charge leaks, and is restored by each refresh cycle; a stored 0 remains at the low voltage.]
2M X 8 Memory Design
Each row can store 512 bytes: 12 bits select a row, and 9 bits select a group (byte) within a row, for a total of 21 bits.

First the row address is applied; the RAS signal latches it.

Then the column address is applied; the CAS signal latches it.

Timing of the memory unit is controlled by a specialized unit which generates RAS and CAS.

This is asynchronous DRAM.
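The 21-bit address split described above can be sketched as follows (hypothetical helper names, following the slide's 12-bit row / 9-bit column division):

```python
def split_dram_address(addr, row_bits=12, col_bits=9):
    """Split a 21-bit address for the 2M x 8 example: the high 12 bits
    select a row, the low 9 bits select a byte within that row."""
    assert 0 <= addr < 1 << (row_bits + col_bits)
    row = addr >> col_bits            # latched first, by RAS
    col = addr & ((1 << col_bits) - 1)  # latched second, by CAS
    return row, col

print(split_dram_address(513))   # (1, 1): row 1, byte 1 within the row
```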


Burst Mode Operation
Block Transfer
Row address is latched and decoded.

A read operation causes all cells in a selected row to be read.

Selected row is latched internally inside the SDRAM chip

Column address is latched and decoded.

Selected column data is placed in the data output register.

Column address is incremented automatically.

Multiple data items are read depending on the block length.

Fast transfer of blocks between memory and cache.

Fast transfer of pages between memory and disk.


SDRAM and DDR SDRAM
SDRAM is Synchronous Dynamic RAM
Added clock to DRAM interface

SDRAM is synchronous with the system clock


Older DRAM technologies were asynchronous

DDR is Double Data Rate SDRAM


Like SDRAM, DDR is synchronous with the system clock, but the difference is
that DDR reads data on both the rising and falling edges of the clock signal.
2M X 32 Using 512K X 8 Chips
Step 1: Find how many smaller chips are required to meet the required size:
Divide the total required size by the size of the smaller chip:
2M x 32 / (512K x 8) = (2^21 x 2^5) / (2^19 x 2^3) = 2^26 / 2^22 = 2^4 = 16 chips.

Step 2: Find how many smaller chips must be connected in parallel to meet the required data width (i.e., the number of columns in the matrix arrangement).

Here, in 2M x 32, 32 bits of data are required, and a 512K x 8 chip can communicate 8 bits from one location.

So four 512K x 8 chips must be connected in parallel to provide the 32-bit data width.

That is, the 512K x 8 chips are connected in a matrix, where the number of columns = (word size of the larger memory) / (word size of the smaller chip).

Number of columns = 32 / 8 = 4.
2M X 32 Using 512K X 8 Chips
Step 3: Find the number of rows:

Number of rows x Number of columns = Total number of chips
#rows x 4 = 16
#rows = 16 / 4 = 4

How to connect the address lines:

For the 2M x 32 memory, 21 (2M = 2^21) address lines are required, and for 512K x 8, 19 (512K = 2^19) address lines are required. So, out of the 21 address lines, the lower-order 19 lines are connected to all the 512K x 8 memory chips.

Then, to select one of the 4 rows of 512K x 8 chips, the higher-order 2 address lines (out of the 21) are connected to a decoder, and the decoder output selects a particular row. From the selected row, each of the four 512K x 8 chips gives/takes 8 bits of data, meeting the required width of 32 bits.
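The three steps can be condensed into a small sketch (an assumed helper, not from the slides; it also handles the later 4M x 32 design exercise):

```python
def chip_array(total_locations, total_width, chip_locations, chip_width):
    """Organize small memory chips into a rows x columns matrix:
    columns give the data width, rows are selected by a decoder."""
    columns = total_width // chip_width
    total_chips = (total_locations * total_width) // (chip_locations * chip_width)
    rows = total_chips // columns
    return rows, columns, total_chips

# 2M x 32 from 512K x 8 chips:
print(chip_array(2 * 2**20, 32, 512 * 2**10, 8))  # (4, 4, 16)
# 4M x 32 from 512K x 8 chips:
print(chip_array(4 * 2**20, 32, 512 * 2**10, 8))  # (8, 4, 32)
```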
Typical Memory Hierarchy
Registers are at the top of the hierarchy; levels get bigger and slower toward the bottom:

Registers (inside the microprocessor): typical size < 1 KB, access time < 0.5 ns
Level 1 Cache: 8 – 64 KB, access time ~1 ns
Level 2 Cache: 512 KB – 8 MB, access time 3 – 10 ns (below it, the memory bus)
Main Memory: 4 – 16 GB, access time 50 – 100 ns (below it, the I/O bus)
Disk Storage (magnetic or flash): > 200 GB, access time 5 – 10 ms
Why Cache Memory is required?

Processor is much faster than the main memory.


 As a result, the processor has to spend much of its time waiting while instructions
and data are being fetched from the main memory.
 Major obstacle towards achieving good performance.

Speed of the main memory cannot be increased beyond a certain point.

Cache memory is an architectural arrangement which makes the main memory appear
faster to the processor than it really is.

Cache memory is based on the property of computer programs known as “locality of reference”.
Why Cache Memory has become possible?
Analysis of programs indicates that many instructions in localized areas of a program
are executed repeatedly during some period of time, while the others are accessed
relatively less frequently.

These instructions may be the ones in a loop, a nested loop, or a few procedures calling each other repeatedly.
This is called “locality of reference”.

 Temporal locality of reference:


 Recently executed instruction is likely to be executed again very soon.
 If an instruction is executed at time instant t, then most likely the same
instruction will be executed at time instant t + ∆t.

 Spatial locality of reference:

 Instructions with addresses close to a recently executed instruction are likely to be executed soon.
 If an instruction at address ‘i’ is executed at time instant t, then most likely the instruction at address ‘i+1’ will be executed at time instant t + ∆t.
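A minimal sketch of a trace that exhibits both kinds of locality (a hypothetical loop over a word-addressed array at an arbitrary base address):

```python
def data_trace(base, n, iterations):
    """Data addresses touched by a loop like 'for i: total += a[i]'
    run several times: consecutive addresses within a pass show
    spatial locality; repeating the pass shows temporal locality."""
    trace = []
    for _ in range(iterations):
        for i in range(n):
            trace.append(base + i)
    return trace

print(data_trace(1000, 4, 2))
# [1000, 1001, 1002, 1003, 1000, 1001, 1002, 1003]
# 1000..1003 are adjacent (spatial); the whole run repeats (temporal).
```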
What is a Cache Memory ?
Small and fast (SRAM) memory technology
Stores the subset of instructions & data currently being accessed.

Used to reduce average access time to memory.

Caches exploit temporal locality by …


Keeping recently accessed data closer to the processor.

Caches exploit spatial locality by …


Moving blocks consisting of multiple contiguous words.

Goals:
Achieve the fast access speed of cache memory.
Balance the overall cost of the memory system.
The Basics of Caches

[Figure: the processor is connected to the cache, which is connected to the main memory.]

Processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time.

Subsequent references to the data in this block of words are found in the cache.

At any given time, only some blocks in the main memory are held in the cache.
Which blocks in the main memory are in the cache is determined by a “mapping
function”.

• When the cache is full, and a block of words needs to be transferred from the main
memory, some block of words in the cache must be replaced. This is determined by a
“replacement algorithm”.
The Basics of Caches: Cache Hit
Existence of a cache is transparent to the processor. The processor issues Read and
Write requests in the same manner.

If the data is in the cache it is called a Read or Write hit.

Read hit:
 The data is obtained from the cache.

Write hit:
 Cache has a replica of the contents of the main memory.
 Contents of the cache and the main memory may be updated simultaneously.
This is the write-through protocol.
 Update the contents of the cache, and mark it as updated by setting a bit known as
the dirty bit or modified bit. The contents of the main memory are updated when
this block is replaced. This is write-back or copy-back protocol.
The Basics of Caches: Cache Miss
If the data is not present in the cache, then a Read miss or Write miss occurs.

Read Miss:
 Block of words containing this requested word is transferred from the memory.

 After the block is transferred, the desired word is forwarded to the processor.

 The desired word may also be forwarded to the processor as soon as it is


transferred without waiting for the entire block to be transferred. This is called
load-through or early-restart.
The Basics of Caches: Cache Miss
What happens on a write miss?

Write Allocate:
Allocate new block in cache.
Write miss acts like a read miss, block is fetched and updated.

No Write Allocate:
Send data to lower-level memory.
Cache is not modified.

Typically, write back caches use write allocate


Hoping subsequent writes will be captured in the cache

Write-through caches often use no-write allocate


Reasoning: writes must still go to lower level memory
Mapping functions
 Mapping functions determine how memory blocks are placed in the cache.

A simple processor example:

 Cache consisting of 128 blocks of 16 words each.


 Total size of cache is 2048 (2K) words.
 Main memory is addressable by a 16-bit address.
 Main memory has 64K words.
 Main memory has 4K blocks of 16 words each.

 Three mapping functions:


 Direct mapping
 Associative mapping
 Set-associative mapping.
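For the example processor above, the address field widths follow directly from the sizes (a quick check in Python, using only numbers stated in the example):

```python
import math

# Example: cache of 128 blocks, 16 words per block, 16-bit addresses.
cache_blocks, words_per_block, addr_bits = 128, 16, 16

word_bits = int(math.log2(words_per_block))    # 4 bits: word within block
block_bits = int(math.log2(cache_blocks))      # 7 bits: cache block (index)
tag_bits = addr_bits - block_bits - word_bits  # 5 bits: tag

print(word_bits, block_bits, tag_bits)  # 4 7 5
```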
Block Placement: Direct Mapped
• Block: unit of data transfer between cache and memory.
Direct Mapped Cache:
A block can be placed in exactly one location in the cache.

In this example, the cache index = the least significant 3 bits of the memory block address.

[Figure: direct-mapped placement. Cache blocks 000–111; main memory blocks 00000–11111 each map to the cache block given by their 3 least significant bits, e.g. memory blocks 00101 and 10101 both map to cache block 101.]
Direct-Mapped Cache
A memory address is divided into:
– Block address: identifies the block in memory.
– Block offset: used to access bytes within a block.

The block address is further divided into:
– Index: used for direct cache access.
– Tag: the most significant bits of the block address.

Index = Block address mod Number of cache blocks

The tag must also be stored inside the cache, for block identification.
A valid bit (V) is also required, to indicate whether a cache block is valid or not.

[Figure: the address is split into Tag, Index, offset; each cache entry holds V, Tag, Block Data; the stored tag is compared (=) against the address tag to produce Hit, and the block supplies Data.]
Direct-Mapped Cache
Address length = s + w bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = m = 2^r
Size of cache = 2^(r+w) words or bytes
Size of tag = (s - r) bits, since 2^s / 2^r = 2^(s-r) main-memory blocks map to each cache line.

[Figure: the same Tag / Index / offset address split, with V, Tag, Block Data cache entries, a comparator (=), Data, and Hit.]
Direct Mapped Cache – cont’d
• Cache hit: the block is stored inside the cache.
– The index is used to access the cache block.
– The address tag is compared against the stored tag.
– If they are equal and the cache block is valid, it is a hit.
– Otherwise: cache miss.
• If the number of cache blocks is 2^n, then n bits are used for the cache index.
• If the number of bytes in a block is 2^b, then b bits are used for the block offset.
• If 32 bits are used for an address, the tag field has 32 - n - b bits.
Direct Mapping
Block j of the main memory maps to cache block j modulo 128.

So, block 0 maps to 0, and block 129 maps to 1.

More than one memory block is mapped onto the same position in the cache.

This may lead to contention for cache blocks even if the cache is not full.

Contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm.
Direct Mapping
Memory address is divided into three fields:

1. The low-order 4 bits determine one of the 16 words in a block.

2. When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in.

3. The high-order 5 bits determine which of the possible 32 blocks is currently present in the cache. These are the tag bits.

Simple to implement but not very flexible.


Problem
A computer system uses 16-bit memory addresses. It has a 2K-byte cache organized in a
direct-mapped manner with 64 bytes per cache block. Assume that the size of each memory
word is 1 byte.

(a) Calculate the number of bits in each of the Tag, Block, and Word fields of the memory address.
(b)When a program is executed, the processor reads data sequentially from the following word
addresses:

128, 144, 2176, 2180, 128, 2176

All the above addresses are shown in decimal values. Assume that the cache is initially empty.
For each of the above addresses, indicate whether the cache access will result in a hit or a miss.
Problem: Solution
Block size = 64 bytes = 2^6 bytes = 2^6 words (since 1 word = 1 byte)
Therefore, number of bits in the Word field = 6

Cache size = 2K bytes = 2^11 bytes
Number of cache blocks = Cache size / Block size = 2^11 / 2^6 = 2^5
Therefore, number of bits in the Block field = 5

Total number of address bits = 16
Therefore, number of bits in the Tag field = 16 - 6 - 5 = 5

For a given 16-bit address, the 5 most significant bits represent the Tag, the next 5 bits represent the Block, and the 6 least significant bits represent the Word.
Problem: Solution
b) The cache is initially empty. Therefore, all the cache blocks are invalid.

Access # 1:
Address = (128)₁₀ = (0000000010000000)₂
(Note: Address is shown as a 16-bit number, because the computer uses 16-bit addresses)

For this address, Tag = 00000, Block = 00010, Word = 000000

Since the cache is empty before this access, this will be a cache miss

After this access, Tag field for cache block 00010 is set to 00000
Problem: Solution

Access # 2:
Address = (144)₁₀ = (0000000010010000)₂

For this address, Tag = 00000, Block = 00010, Word = 010000

Since tag field for cache block 00010 is 00000 before this access, this will be a cache hit
(because address tag = block tag)
Problem: Solution
Access # 3:
Address = (2176)₁₀ = (0000100010000000)₂

For this address, Tag = 00001, Block = 00010, Word = 000000

Since tag field for cache block 00010 is 00000 before this access, this will be a cache
miss (address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00001

Access # 4:
Address = (2180)₁₀ = (0000100010000100)₂

For this address, Tag = 00001, Block = 00010, Word = 000100

Since tag field for cache block 00010 is 00001 before this access, this will be a cache hit
(address tag = block tag)
Problem: Solution
Access # 5:
Address = (128)₁₀ = (0000000010000000)₂

For this address, Tag = 00000, Block = 00010, Word = 000000

Since tag field for cache block 00010 is 00001 before this access, this will be a cache miss
(address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00000

Access # 6:
Address = (2176)₁₀ = (0000100010000000)₂

For this address, Tag = 00001, Block = 00010, Word = 000000

Since tag field for cache block 00010 is 00001 before this access, this will be a cache
miss (address tag ≠ block tag)
After this access, Tag field for cache block 00010 is set to 00001
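The whole access sequence can be replayed with a minimal direct-mapped cache model (a sketch assuming the 2K-byte cache with 64-byte blocks from the problem; helper name is hypothetical):

```python
import math

def simulate_direct_mapped(addresses, cache_bytes=2048, block_bytes=64):
    """Replay a byte-addressed trace through an initially empty
    direct-mapped cache and report 'hit'/'miss' per access."""
    n_blocks = cache_bytes // block_bytes          # 32 cache blocks
    offset_bits = int(math.log2(block_bytes))      # 6-bit Word field
    tags = {}                                      # cache block -> stored tag
    results = []
    for addr in addresses:
        block = addr >> offset_bits
        index = block % n_blocks                   # Block field
        tag = block // n_blocks                    # Tag field
        if tags.get(index) == tag:
            results.append("hit")
        else:
            results.append("miss")
            tags[index] = tag                      # load block, update tag
    return results

print(simulate_direct_mapped([128, 144, 2176, 2180, 128, 2176]))
# ['miss', 'hit', 'miss', 'hit', 'miss', 'miss']
```

The last two accesses miss because 128 and 2176 keep evicting each other from cache block 00010, exactly as the worked solution shows.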
Example on Cache Placement & Misses

• Consider a small direct-mapped cache with 32 blocks.
– The cache is initially empty; block size = 16 bytes.
– Address format (32-bit address): Tag = 23 bits, Index = 5 bits, Block offset = 4 bits.
– The following memory addresses (in decimal) are referenced: 1000, 1004, 1008, 2548, 2552, 2556.
– Map the addresses to cache blocks and indicate whether each access is a hit or a miss.

• Solution:
– 1000 = 0x3E8, cache index = 0x1E: Miss (first access)
– 1004 = 0x3EC, cache index = 0x1E: Hit (same block as 1000)
– 1008 = 0x3F0, cache index = 0x1F: Miss
– 2548 = 0x9F4, cache index = 0x1F: Miss (a different tag replaces the block loaded for 1008)
– 2552 = 0x9F8, cache index = 0x1F: Hit
– 2556 = 0x9FC, cache index = 0x1F: Hit
Fully Associative Cache
• A block can be placed anywhere in the cache → no indexing.
• If m blocks exist, then:
– m comparators are needed to match the tag.
– Cache data size = m × 2^b bytes (with 2^b bytes per block).

[Figure: the address is split into Tag and offset; each of the m entries holds V, Tag, Block Data; all m tags are compared (=) in parallel, and a mux selects the matching block, producing Data and Hit.]
Set-Associative Cache
• A set is a group of blocks that can be indexed.
• A block is first mapped onto a set:
– Set index = Block number (in main memory) mod Number of sets in cache
• If there are m blocks in a set (m-way set associative), then m tags are checked in parallel using m comparators.
• If 2^n sets exist, then the set index consists of n bits.
• Cache data size = m × 2^(n+b) bytes (with 2^b bytes per block).
Set-Associative Cache Diagram
[Figure: m-way set-associative cache. The address is split into Tag, Index, offset; the index selects a set; the m entries of the set (each holding V, Tag, Block Data) are compared (=) in parallel, and a mux selects the matching block, producing Data and Hit.]
Problem
A computer system uses 16-bit memory addresses. It has a 2K-byte cache organized in a 2-way set-associative manner with 64 bytes per cache block. Assume that the size of each memory word is 1 byte.

(a) Calculate the number of bits in each of the Tag, Set, and Word fields of the memory address.
(b)When a program is executed, the processor reads data sequentially from the following word
addresses:

128, 144, 2176, 2180, 128, 2176

All the above addresses are shown in decimal values. Assume that the cache is initially empty.
For each of the above addresses, indicate whether the cache access will result in a hit or a miss.
Problem: Solution
Block size = 64 bytes = 2^6 bytes = 2^6 words (since 1 word = 1 byte)
Therefore, number of bits in the Word field = 6

Cache size = 2K bytes = 2^11 bytes
Number of cache blocks = Cache size / Block size = 2^11 / 2^6 = 2^5
Number of cache sets = Total number of cache blocks / Number of ways = 2^5 / 2 = 2^4
Therefore, number of bits in the Set field = 4

Total number of address bits = 16
Therefore, number of bits in the Tag field = 16 - 6 - 4 = 6

For a given 16-bit address, the 6 most significant bits represent the Tag, the next 4 bits represent the Set, and the 6 least significant bits represent the Word.
Problem: Solution
b) The cache is initially empty. Therefore, all the cache blocks are invalid.

Access # 1:
Address = (128)₁₀ = (0000000010000000)₂
(Note: Address is shown as a 16-bit number, because the computer uses 16-bit addresses)

For this address, Tag = 000000, Set = 0010, Word = 000000

Since the cache is empty before this access, this will be a cache miss

After this access, Tag field for the first block in the cache set 0010 is set to 000000
Problem: Solution

Access # 2:
Address = (144)₁₀ = (0000000010010000)₂

For this address, Tag = 000000, Set = 0010, Word = 010000

Since the tag field for the first cache block in set 0010 is 000000 before this access, this will be a cache hit (because address tag = block tag)
Problem: Solution
Access # 3:
Address = (2176)₁₀ = (0000100010000000)₂

For this address, Tag = 000010, Set = 0010, Word = 000000

The tag field for this address does not match the tag field for the first block in set 0010.
The second block in set 0010 is empty. Therefore, this access will be a cache miss.
After this access, Tag field for the second block in set 0010 is set to 000010

Access # 4:
Address = (2180)₁₀ = (0000100010000100)₂

For this address, Tag = 000010, Set = 0010, Word = 000100

Since the tag field for the second cache block in set 0010 is 000010 before this access, this will be a cache hit (address tag = block tag)
Problem: Solution
Access # 5:
Address = (128)₁₀ = (0000000010000000)₂

For this address, Tag = 000000, Set = 0010, Word = 000000

The tag field for this address matches the tag field for the first block in set 0010. Therefore,
this access will be a cache hit.

Access # 6:
Address = (2176)₁₀ = (0000100010000000)₂

For this address, Tag = 000010, Set = 0010, Word = 000000

The tag field for this address matches the tag field for the second block in set 0010.
Therefore, this access will be a cache hit.
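The same trace can be replayed through a minimal 2-way set-associative model (a sketch with LRU replacement, which this short trace never actually triggers; helper name is hypothetical):

```python
import math

def simulate_set_assoc(addresses, cache_bytes=2048, block_bytes=64, ways=2):
    """Replay a byte-addressed trace through an initially empty
    set-associative cache; each set keeps its tags in LRU order."""
    n_sets = cache_bytes // block_bytes // ways    # 16 sets (4-bit Set field)
    offset_bits = int(math.log2(block_bytes))      # 6-bit Word field
    sets = {i: [] for i in range(n_sets)}          # set index -> tags, LRU first
    results = []
    for addr in addresses:
        block = addr >> offset_bits
        idx, tag = block % n_sets, block // n_sets
        tags = sets[idx]
        if tag in tags:
            results.append("hit")
            tags.remove(tag)                       # re-insert below as MRU
        else:
            results.append("miss")
            if len(tags) == ways:
                tags.pop(0)                        # evict the LRU tag
        tags.append(tag)
    return results

print(simulate_set_assoc([128, 144, 2176, 2180, 128, 2176]))
# ['miss', 'hit', 'miss', 'hit', 'hit', 'hit']
```

With two ways, the blocks for 128 and 2176 coexist in set 0010, so the last two accesses hit, matching the worked solution.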
Numericals on Mapping
Cache/Memory Layout: A computer has an 8 GByte memory with 64 bit word sizes.
Each block of memory stores 16 words. The computer has a direct-mapped cache of 128
blocks. The computer uses word level addressing. What is the address format? If we
change the cache to a 4-way set associative cache, what is the new address format?
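One possible worked sketch for this exercise (assuming word-level addressing as stated, so 8 GB of 8-byte words gives 2^30 addressable words):

```python
import math

words = (8 * 2**30) // 8             # 8 GB / 8-byte words = 2^30 words
addr_bits = int(math.log2(words))    # 30-bit word address
offset = int(math.log2(16))          # 4 bits: word within a 16-word block

# Direct-mapped, 128 blocks:
index = int(math.log2(128))              # 7-bit index
tag_direct = addr_bits - index - offset  # 19-bit tag

# 4-way set associative: 128 / 4 = 32 sets
set_bits = int(math.log2(128 // 4))       # 5-bit set field
tag_4way = addr_bits - set_bits - offset  # 21-bit tag

print(addr_bits, (tag_direct, index, offset), (tag_4way, set_bits, offset))
```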
Numericals on Mapping
Direct Mapping Question: Assume a computer has 32 bit addresses. Each block stores 16
words. A direct-mapped cache has 256 blocks. In which block (line) of the cache would we
look for each of the following addresses? Addresses are given in hexadecimal for
convenience.
a. 1A2BC012 b. FFFF00FF c. 12345678 d. C109D532
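A sketch of the computation, assuming the given values are word addresses (so the 4 offset bits for the 16-word block are dropped, and the low 8 bits of the block address select one of the 256 lines):

```python
def cache_line(addr_hex, block_words=16, cache_blocks=256):
    """Direct-mapped line for a word address given in hex:
    line = (address / words-per-block) mod number-of-blocks."""
    addr = int(addr_hex, 16)
    return (addr // block_words) % cache_blocks

for a in ["1A2BC012", "FFFF00FF", "12345678", "C109D532"]:
    print(a, "-> line", cache_line(a))
# a. line 0x01 = 1   b. line 0x0F = 15
# c. line 0x67 = 103 d. line 0x53 = 83
```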
Numericals on Mapping
A two-way set-associative cache memory uses blocks of 4 words. The cache can hold a total of 2048 words from main memory. The main memory size is 128K X 32.
i) Draw the format of the main memory address.
ii) What is the size of the cache, including the tag bits?
Numericals on Mapping
A cache consists of a total of 128 blocks. The main memory contains 2K
blocks, each consisting of 32 words.
( I )How many bits are there in each of the TAG, BLOCK and WORD
field in case of direct mapping?
( ii )How many bits are there in each of the TAG, SET, and WORD
field in case of 4-way set-associative mapping?
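A worked sketch for this exercise (assuming word-level addressing over the 2K-block, 32-words-per-block main memory, which gives a 16-bit address):

```python
import math

addr_bits = int(math.log2(2 * 2**10 * 32))  # 2K blocks x 32 words = 2^16
word = int(math.log2(32))                   # 5-bit WORD field

# (i) direct mapping with a 128-block cache:
block = int(math.log2(128))                 # 7-bit BLOCK field
tag_direct = addr_bits - block - word       # 4-bit TAG field

# (ii) 4-way set-associative: 128 / 4 = 32 sets
set_bits = int(math.log2(128 // 4))         # 5-bit SET field
tag_4way = addr_bits - set_bits - word      # 6-bit TAG field

print((tag_direct, block, word), (tag_4way, set_bits, word))
```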
Numericals
Design a 4M X 32 bits memory using 512KX8 bits memory chip.

How many external connections are required to design 32MX32 memory chip?
Numericals
A computer employs RAM chips of 256 x 8 and ROM chips of 1024 x 8. The computer system needs 2K bytes of RAM and 4K bytes of ROM. Design the memory module of the above configuration and interface it with the CPU.

A computer uses RAM chips of 256 x 4 capacity. Design a memory of 1KB capacity using the available chips.
Thank You
