
Chapter 6: Memory

Ngo Lam Trung, Pham Ngoc Hung

[with materials from Computer Organization and Design, MK, and M.J. Irwin's presentation, PSU 2008]


Content
❑ Memory hierarchy
❑ Principle of locality
❑ Cache
❑ Virtual memory


Memory
❑ Memory: where data are stored.

Why is memory critical to performance?


Memory technology (2012)
❑ Static RAM (SRAM): 0.5ns – 2.5ns, $500 – $1000 per GB
❑ Dynamic RAM (DRAM): 50ns – 70ns, $10 – $20 per GB
❑ Flash memory: 5,000ns – 50,000ns, $0.75 – $1 per GB
❑ Magnetic disk: 5,000,000ns – 20,000,000ns, $0.05 – $0.1 per GB
❑ Fact:
Large memories are slow.
Fast memories are small (and expensive).


A Typical Memory Hierarchy

[Figure: on-chip components (register file, instruction and data TLBs, separate L1 instruction and data caches, datapath and control) backed by a second-level cache (SRAM), main memory (DRAM), and secondary memory (disk).]

Speed (cycles): ½'s → 1's → 10's → 100's → 10,000's
Size (bytes): 100's → 10K's → M's → G's → T's
Cost per byte: highest → lowest

❑ How to get an ideal memory:
as fast as SRAM,
as cheap as disk?
The Memory Hierarchy: Locality Principle
❑ C program

int x[1000], temp;
int i, j;
for (i = 0; i < 999; i++)
    for (j = i + 1; j < 1000; j++)
        if (x[i] < x[j]) {
            temp = x[i];
            x[i] = x[j];
            x[j] = temp;
        }

Data memory at the locations of temp and x is accessed multiple times.
Instruction memory at the location of the two for loops is used repeatedly.


The Memory Hierarchy: Locality Principle

❑ Temporal Locality (locality in time)
If a memory location is referenced, then it will tend to be referenced again soon.
 Keep most recently accessed data items closer to the processor.

❑ Spatial Locality (locality in space)
If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon.
 Move blocks consisting of contiguous words closer to the processor.


Hierarchical memory access

❑ Data are stored in multiple levels.
High level: fast but small.
Low level: slow but large.
❑ Data are transferred between levels, through the hierarchy, in units of blocks (of multiple words).
❑ Frequently used data are stored closer to the processor.


Hierarchical memory access

❑ Associative data access:
The processor accesses data in a lower level.
Data transfer from the lower level to the processor goes via the upper level(s).
❑ If the accessed data is present in the upper level
Hit: access satisfied by the upper level.
- Hit ratio: hits/accesses
❑ If the accessed data is absent
Miss: block copied from the lower level.
- Time taken: miss penalty
- Miss ratio: misses/accesses = 1 – hit ratio
Then the accessed data is supplied from the upper level.
The Memory Hierarchy: Terminology
❑ Hit: data is in some block in the upper level (Blk X)
Hit Rate: fraction of memory accesses found in the upper level.
Hit Time: time to access the upper level, which consists of
- RAM access time + time to determine hit/miss

[Figure: the processor exchanges data with the upper-level memory (holding Blk X); the upper level exchanges blocks with the lower-level memory (holding Blk Y).]

❑ Miss: data is not in the upper level, so it needs to be retrieved from a block in the lower level (Blk Y)
Miss Rate = 1 – Hit Rate
Miss Penalty: time to bring in a block from the lower level and replace a block in the upper level with it + time to deliver the block to the processor.
Hit Time << Miss Penalty
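These quantities combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate × Miss Penalty. AMAT is not spelled out on this slide, so the C sketch below is only a standard-formula illustration; the sample numbers are made up.

#include <stdio.h>

/* AMAT = hit_time + miss_rate * miss_penalty (standard formula;
   the numbers below are illustrative, not from the slides). */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* e.g., 1-cycle hit, 5% miss rate, 100-cycle miss penalty */
    printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 100.0));
    return 0;
}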
Cache
❑ The memory hierarchy level between the processor and main memory
The CPU fetches instructions and data from the cache; if found (cache hit) → fast access.
If not found (cache miss) → load a block from main memory into the cache, then access it in the cache → slower access time (miss penalty).

[Figure: the CPU issues instruction fetches and memory reads/writes to the cache, which exchanges blocks of data with main memory.]


Cache Basics
❑ The CPU needs to access a data item in memory
➔ Two questions to answer (in hardware):
Q1: How does the CPU know if the data item is in the cache?
Q2: If it is, how does the CPU find it?

❑ Direct mapped
Each memory block is mapped to exactly one block in the cache
- lots of lower-level blocks must share blocks in the cache.
Address mapping (to answer Q2):
(block address) modulo (# of blocks in the cache)
The tag field: associated with each cache block; it contains the address information (the upper portion of the address) required to identify the block (to answer Q1).
The valid bit: indicates whether there is data in the block or not.
(These fields are illustrated in the sketch below.)
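To make the direct-mapped fields concrete, here is a minimal C sketch of decomposing a 32-bit byte address into tag, index, and byte offset. The parameters match the 4-block example that follows; all names are my own.

#include <stdint.h>
#include <stdio.h>

/* Illustrative parameters: 4 one-word (4-byte) blocks. */
#define NUM_BLOCKS  4           /* cache blocks              */
#define INDEX_BITS  2           /* log2(NUM_BLOCKS)          */
#define OFFSET_BITS 2           /* byte within a 32-bit word */

int main(void) {
    uint32_t addr = 0x000000B4;                    /* any byte address */
    uint32_t block_addr = addr >> OFFSET_BITS;
    uint32_t index = block_addr % NUM_BLOCKS;      /* (block address) mod (# blocks) */
    uint32_t tag   = block_addr >> INDEX_BITS;     /* upper portion of the address   */
    printf("addr=0x%08x -> tag=0x%x index=%u\n", addr, tag, index);
    return 0;
}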


Caching: A Simple First Example

[Figure: a 4-block cache (Index 00–11, each entry with Valid, Tag, Data fields) mapped onto a 16-word main memory (addresses 0000xx–1111xx). One-word blocks; the two low-order bits define the byte in the word (32-bit words).]

Q1: Is it there?
Compare the cache tag to the high-order 2 memory address bits to tell if the memory block is in the cache.

Q2: How does the CPU find it?
Use the next 2 low-order memory address bits – the index – to determine which cache block, i.e.,
(block address) modulo (# of blocks in the cache)


Direct Mapped Cache
❑ Consider the main memory word reference string: 0 1 2 3 4 3 4 15
Start with an empty cache – all blocks initially marked as not valid.

0 miss, 1 miss, 2 miss, 3 miss: Mem(0)…Mem(3) are loaded into indices 00–11, each with tag 00.
4 miss: 4 maps to index 00, so Mem(4) (tag 01) replaces Mem(0).
3 hit, 4 hit: both are now resident.
15 miss: 15 maps to index 11, so Mem(15) (tag 11) replaces Mem(3).

8 requests, 6 misses.
What if we repeat the reference string 1,000,000 times?


Cache performance
❑ Given a MIPS CPU running a program with an instruction-cache miss rate of 2% and a data-cache miss rate of 4%. The processor has a CPI of 2 without any memory stalls, and the miss penalty is 100 cycles for all misses.
❑ Determine how much faster the processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%.
❑ Solution:
❑ Given instruction count I:
Instruction miss cycles = I × 2% × 100 = 2.00 I
Data miss cycles = I × 36% × 4% × 100 = 1.44 I
❑ Total memory-stall cycles = 2.00 I + 1.44 I = 3.44 I
❑ CPI with stalls = 2 + 3.44 = 5.44, so the processor with a perfect cache is 5.44 / 2 = 2.72 times faster.
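The same calculation as a small C sketch (variable names are my own):

#include <stdio.h>

/* Per-instruction memory-stall cycles and speedup vs. a perfect cache,
   mirroring the worked example above. */
int main(void) {
    double base_cpi     = 2.0;
    double miss_penalty = 100.0;
    double i_miss_rate  = 0.02;   /* instruction cache */
    double d_miss_rate  = 0.04;   /* data cache        */
    double mem_fraction = 0.36;   /* loads + stores    */

    double stall_cpi = i_miss_rate * miss_penalty
                     + mem_fraction * d_miss_rate * miss_penalty;
    double speedup = (base_cpi + stall_cpi) / base_cpi;
    printf("stall CPI = %.2f, speedup with perfect cache = %.2f\n",
           stall_cpi, speedup);
    return 0;
}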


Cache performance
❑ For the same CPU and program as above, what is the speedup of a perfect cache if the CPU now has a faster CPI of 1 (instead of 2)?


MIPS Direct Mapped Cache Example
❑ One-word blocks, cache size = 1K words (or 4KB)

[Figure: a 32-bit address split into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a 2-bit byte offset (bits 1–0). The index selects one of 1024 cache entries (Valid, Tag, Data); a comparator checks the stored 20-bit tag against the address tag to produce Hit, and the 32-bit data word is returned.]

What kind of locality are we taking advantage of?
MIPS Direct Mapped Cache Example
❑ Same one-word-block, 1K-word cache as above.

Calculate the total size of this cache in Kilobits.
Exercise
❑ How many total bits are required for a direct-mapped cache with 16 KiB of data and 1-word blocks, assuming a 32-bit address?


Multiword Block Direct Mapped Cache
❑ Four words/block, cache size = 1K words

[Figure: a 32-bit address split into a 20-bit tag (bits 31–12), an 8-bit index (bits 11–4), a 2-bit block offset (bits 3–2), and a 2-bit byte offset (bits 1–0). The index selects one of 256 cache entries (Valid, Tag, four Data words); the tag comparison produces Hit, and the block offset selects the requested word from the block.]

What kind of locality are we taking advantage of?


Taking Advantage of Spatial Locality
❑ Let a cache block hold more than one word (here: two blocks of two words each).
Consider the reference string 0 1 2 3 4 3 4 15, starting with an empty cache – all blocks initially marked as not valid.

0 miss: block Mem(1)–Mem(0) loaded.
1 hit.
2 miss: block Mem(3)–Mem(2) loaded.
3 hit.
4 miss: Mem(5)–Mem(4) (tag 01) replaces Mem(1)–Mem(0).
3 hit, 4 hit.
15 miss: Mem(15)–Mem(14) (tag 11) replaces Mem(3)–Mem(2).

8 requests, 4 misses.


Miss Rate vs Block Size vs Cache Size

[Figure: miss rate (%) versus block size (bytes), one curve per cache size (8 KB, 16 KB, 64 KB, 256 KB).]

❑ Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses).
Cache Field Sizes
❑ The number of bits in a cache includes both the storage for data and for the tags
32-bit byte address.
For a direct-mapped cache with 2^n blocks, n bits are used for the index.
For a block size of 2^m words (2^(m+2) bytes), m bits are used to address the word within the block and 2 bits are used to address the byte within the word.
❑ What is the size of the tag field? 32 − (n + m + 2) bits.
❑ The total number of bits in a direct-mapped cache is then
2^n × (block size + tag field size + valid field size)
❑ How many total bits are required for a direct-mapped cache with 16KB of data and 4-word blocks, assuming a 32-bit address? (See the sketch below.)
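A small C sketch of the bit-count formula above; it also evaluates the 16 KB / 4-word-block case (function name and layout are my own):

#include <stdio.h>

/* Total bits in a direct-mapped cache with 32-bit addresses:
   2^n blocks x (data bits + tag bits + 1 valid bit). */
static unsigned long cache_bits(unsigned n_index, unsigned m_word) {
    unsigned long blocks = 1UL << n_index;
    unsigned long data   = (1UL << m_word) * 32;        /* bits per block */
    unsigned long tag    = 32 - n_index - m_word - 2;   /* remaining bits */
    return blocks * (data + tag + 1);
}

int main(void) {
    /* 16KB data = 4K words = 1K four-word blocks: n = 10, m = 2 */
    printf("total = %lu bits\n", cache_bits(10, 2));   /* 1024*(128+18+1) */
    return 0;
}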
Exercise
❑ How many total bits are required for a direct-mapped cache with 16 KiB of data and 4-word blocks, assuming a 32-bit address?


Sources of Cache Misses
❑ Compulsory (cold start, first reference):
First access to a block.
We cannot do much about this.
Solution: increase block size (but this also increases miss penalty).

❑ Capacity:
The cache cannot contain all blocks accessed by the program.
Solution: increase cache size (may increase access time).

❑ Conflict (collision):
Multiple memory locations mapped to the same cache location.
Solution 1: increase cache size.
Solution 2: increase associativity (may increase access time).


Reducing Cache Miss Rates #1
➔ Allow more flexible block placement
❑ Direct mapped cache: a memory block maps to exactly one cache block.
❑ Fully associative cache: allows a memory block to be mapped to any cache block.
❑ A compromise is to divide the cache into sets, each of which consists of n "ways" (n-way set associative). A memory block maps to a unique set (specified by the index field) and can be placed in any way of that set (so there are n choices):
(block address) modulo (# sets in the cache)
A lookup sketch follows below.
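A minimal C sketch of an n-way set-associative lookup, under illustrative assumptions (2 sets × 2 ways, one-word blocks; the structure and names are my own):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 2
#define NUM_WAYS 2

struct line { bool valid; uint32_t tag; uint32_t data; };
static struct line cache[NUM_SETS][NUM_WAYS];

/* Index selects the set; every way's tag in that set is compared
   (in parallel, in real hardware). */
static bool lookup(uint32_t addr, uint32_t *out) {
    uint32_t block_addr = addr >> 2;            /* one-word (4-byte) blocks  */
    uint32_t set = block_addr % NUM_SETS;       /* (block address) mod #sets */
    uint32_t tag = block_addr / NUM_SETS;
    for (int way = 0; way < NUM_WAYS; way++) {
        if (cache[set][way].valid && cache[set][way].tag == tag) {
            *out = cache[set][way].data;
            return true;
        }
    }
    return false;                               /* miss */
}

int main(void) {
    cache[0][1] = (struct line){true, 2, 0xBEEF}; /* word address 4 -> set 0, tag 2 */
    uint32_t w;
    printf("hit=%d\n", lookup(16, &w));           /* byte address 16 */
    return 0;
}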




Another Reference String Mapping
❑ Consider the main memory word reference string 0 4 0 4 0 4 0 4 on the direct-mapped cache above, starting with an empty cache – all blocks initially marked as not valid.

0 miss: Mem(0) (tag 00) loaded into index 00.
4 miss: Mem(4) (tag 01) replaces Mem(0).
0 miss: Mem(0) replaces Mem(4) … and so on for every reference.

8 requests, 8 misses.
❑ Ping-pong effect due to conflict misses: two memory locations that map into the same cache block keep evicting each other.
Set Associative Cache Example

[Figure: a two-way set-associative cache with 2 sets (ways 0–1 × sets 0–1, each entry with V, Tag, Data) mapped onto a 16-word main memory (addresses 0000xx–1111xx). One-word blocks; the two low-order bits define the byte in the word (32-bit words).]

Q1: Is it there?
Compare all the cache tags in the set to the high-order 3 memory address bits to tell if the memory block is in the cache.

Q2: How do we find it?
Use the next 1 low-order memory address bit to determine which cache set (i.e., modulo the number of sets in the cache).
Another Reference String Mapping
❑ Consider the main memory word reference string 0 4 0 4 0 4 0 4 on the two-way set-associative cache, starting with an empty cache – all blocks initially marked as not valid.

0 miss: Mem(0) (tag 000) loaded into set 0.
4 miss: Mem(4) (tag 010) loaded into the other way of set 0.
0 hit, 4 hit, 0 hit, 4 hit, 0 hit, 4 hit.

8 requests, 2 misses.
❑ Solves the ping-pong effect in a direct-mapped cache due to conflict misses, since two memory locations that map into the same cache set can now co-exist!


Four-Way Set Associative Cache
❑ 2^8 = 256 sets, each with four ways (each with one block)

[Figure: the 32-bit address is split into a 22-bit tag, an 8-bit index, and a 2-bit byte offset. The index selects one set; all four ways (V, Tag, Data, for entries 0–255) are compared in parallel, any way match asserts Hit, and a 4-to-1 multiplexer selects the 32-bit data word from the matching way.]
Range of Set Associative Caches
❑ For a fixed-size cache, increasing the number of blocks per set decreases the number of sets.

Address layout: Tag (used for tag compare) | Index (selects the set) | Block offset (selects the word in the block) | Byte offset

Decreasing associativity → direct mapped (only one way): smaller tags, only a single comparator.
Increasing associativity → fully associative (only one set): the tag is all the bits except the block and byte offset.


Benefits of Set Associative Caches
❑ The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation.

[Figure: miss rate versus associativity (1-way, 2-way, 4-way, 8-way) for cache sizes from 4KB to 512KB; miss rate falls as associativity and cache size grow. Data from Hennessy & Patterson, Computer Architecture, 2003.]

❑ Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate).
Block replacement
❑ Cache miss: a new block is loaded into the cache and will replace an old block.
➔ Which block should be replaced?
❑ Direct-mapped cache: exactly one choice.
❑ Associative cache: one of multiple blocks in the set must be selected.
➔ LRU scheme: the least recently used block – the one that has been unused for the longest time – is selected for replacement. A mechanism for tracking relative last use is necessary (see the sketch below).
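One simple mechanism is a per-way counter updated on every access; a hedged C sketch (a common textbook scheme, not necessarily what real hardware implements; all names are my own):

#include <stdint.h>
#include <stdio.h>

#define NUM_WAYS 4

struct line { int valid; uint32_t tag; uint32_t last_used; };
static struct line set[NUM_WAYS];
static uint32_t now;                      /* global access counter */

/* Record a use of `way` (call on every hit or fill). */
static void touch(int way) { set[way].valid = 1; set[way].last_used = ++now; }

/* Victim choice: any invalid way first, else the least recently used. */
static int lru_victim(void) {
    int victim = 0;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!set[w].valid) return w;                    /* free slot    */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                                 /* oldest entry */
    }
    return victim;
}

int main(void) {
    for (int w = 0; w < NUM_WAYS; w++) touch(w);  /* fill all ways       */
    touch(0);                                     /* way 0 recently used */
    printf("victim = %d\n", lru_victim());        /* -> 1, the LRU way   */
    return 0;
}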


LRU block replacement
❑ Consider the main memory word reference string 0 4 2 4 0 0 0 4 on the two-way set-associative cache, starting with an empty cache – all blocks initially marked as not valid.

0 miss: Mem(0) loaded.
4 miss: Mem(4) loaded into the other way.
2 miss: Mem(2) replaces Mem(0), the least recently used block.
4 hit.
0 miss: Mem(0) replaces Mem(2), the least recently used block.
0 hit, 0 hit, 4 hit.

8 requests, 4 misses.


Reducing Cache Miss Rates #2
➔ Use multiple levels of caches
Very costly in the 1990s: US$100,000 or above.
Common in the 2020s: ~US$500 machines.
❑ Normally a unified L2 cache (holding both instructions and data, for each core) and a unified L3 cache shared by all cores.


Multilevel Cache Design Considerations
❑ Design considerations for L1 and L2 caches are very different
The primary cache should focus on minimizing hit time in support of a shorter clock cycle:
- smaller, with smaller block sizes.
Secondary cache(s) should focus on reducing miss rate to reduce the penalty of long main memory access times:
- larger, with larger block sizes,
- higher levels of associativity.

❑ The miss penalty of the L1 cache is significantly reduced by the presence of an L2 cache – so L1 can be smaller but have a higher miss rate.
❑ For the L2 cache, hit time is less important than miss rate:
The L2$ hit time determines L1$'s miss penalty.
L2$ local miss rate >> the global miss rate.
Example
❑ Given a processor with a base CPI of 1.0 and a clock rate of 4 GHz. Main memory access time is 100 ns.
All data references hit in the primary cache (L1).
Instruction miss rate of 2% in the primary cache (L1).
❑ A new L2 is added:
Access time from L1 to L2 is 5 ns.
Instruction miss rate (to main memory) reduced to 0.5%.
❑ What is the speed-up after adding the L2?


Answer

[Figure: CPU (CPI = 1, f = 4 GHz) → L1 (2% of instructions miss) → L2 (5 ns access; 0.5% of instructions miss to memory) → main memory (100 ns).]

❑ CPI = BaseCPI + StallCPI = BaseCPI + IStall + DStall
❑ BaseCPI = 1, DStall = 0
❑ 5 ns = 20 cycles, 100 ns = 400 cycles
❑ Without L2: IStall = 2% × 400 = 8
❑ With L2: IStall = IStall1 + IStall2 = 2% × 20 + 0.5% × 400 = 2.4
❑ Speedup = (1 + 8) / (1 + 2.4) ≈ 2.6
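The same two-level calculation as a C sketch (variable names are my own):

#include <stdio.h>

int main(void) {
    double base_cpi   = 1.0;
    double l2_cycles  = 20.0;   /*   5 ns at 4 GHz */
    double mem_cycles = 400.0;  /* 100 ns at 4 GHz */

    double stall_no_l2   = 0.02 * mem_cycles;      /* all misses go to memory */
    double stall_with_l2 = 0.02 * l2_cycles        /* L1 misses go to L2      */
                         + 0.005 * mem_cycles;     /* L2 misses go to memory  */
    printf("speedup = %.2f\n",
           (base_cpi + stall_no_l2) / (base_cpi + stall_with_l2));
    return 0;
}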


Handling Cache Hits
❑ Read hits (I$ and D$)
This is what we want!

❑ Write hits (D$ only)
Require the cache and memory to be consistent:
- always write the data into both the cache block and the next level in the memory hierarchy (write-through);
- writes run at the speed of the next level in the memory hierarchy – so slow! – or can use a write buffer and stall only if the write buffer is full.
Allow cache and memory to be inconsistent:
- write the data only into the cache block (write-back the cache block to the next level in the memory hierarchy when that cache block is "evicted");
- need a dirty bit for each data cache block to tell if it needs to be written back to memory when it is evicted – can use a write buffer to help "buffer" write-backs of dirty blocks.
The two policies are contrasted in the sketch below.
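A toy C sketch contrasting the two write-hit policies (the stub for the next level and all names are my own; real caches do this in hardware):

#include <stdint.h>
#include <stdio.h>

struct line { int valid, dirty; uint32_t tag; uint32_t data; };

/* Stub standing in for the next level of the hierarchy. */
static void next_level_write(uint32_t addr, uint32_t word) {
    printf("next level: mem[0x%08x] <- 0x%08x\n", addr, word);
}

/* Write-through: update the cache block AND the next level
   (slow, unless the write is absorbed by a write buffer). */
static void write_hit_through(struct line *l, uint32_t addr, uint32_t word) {
    l->data = word;
    next_level_write(addr, word);
}

/* Write-back: update only the cache block and mark it dirty;
   the block goes to the next level only when it is evicted. */
static void write_hit_back(struct line *l, uint32_t word) {
    l->data = word;
    l->dirty = 1;
}

static void evict(struct line *l, uint32_t addr) {
    if (l->dirty) next_level_write(addr, l->data);  /* write-back on eviction */
    l->valid = l->dirty = 0;
}

int main(void) {
    struct line a = {1, 0, 0, 0}, b = {1, 0, 0, 0};
    write_hit_through(&a, 0x100, 0xAB);   /* memory updated immediately   */
    write_hit_back(&b, 0xCD);             /* memory updated only on evict */
    evict(&b, 0x200);
    return 0;
}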


Handling Cache Misses (Single Word Blocks)
❑ Read misses (I$ and D$)
Stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache and send the requested word to the processor, then let the pipeline resume.
❑ Write misses (D$ only)
1. Stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache (which may involve having to evict a dirty block if using a write-back cache), write the word from the processor to the cache, then let the pipeline resume; or
2. Write allocate – just write the word into the cache, updating both the tag and data; no need to check for a cache hit, no need to stall; or
3. No-write allocate – skip the cache write (but must invalidate that cache block since it will now hold stale data) and just write the word to the write buffer (and eventually to the next memory level); no need to stall if the write buffer isn't full.
Multiword Block Considerations
❑ Read misses (I$ and D$)
Processed the same as for single-word blocks – a miss returns the entire block from memory.
Miss penalty grows as block size grows:
- Early restart – the processor resumes execution as soon as the requested word of the block is returned.
- Requested word first – the requested word is transferred from the memory to the cache (and processor) first.
Nonblocking cache – allows the processor to continue to access the cache while the cache is handling an earlier miss.
❑ Write misses (D$)
If using write allocate, must first fetch the block from memory and then write the word to the block (or could end up with a "garbled" block in the cache, e.g., for 4-word blocks: a new tag, one word of data from the new block, and three words of data from the old block).
Exercise
❑ Given a CPU with a 32-bit address and the word reference string below:
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253
❑ Identify the binary address, tag field, block index field, and hit ratio in the following cases:
The CPU has a direct-mapped cache of 16 one-word blocks.
The CPU has a direct-mapped cache of 8 two-word blocks.


Exercise
❑ Given a CPU with a 32-bit address and the word reference string below:
3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253
❑ The CPU has a direct-mapped cache with a total of 8 data words. The miss penalty is 25 cycles.
❑ Which of the following designs is optimal for the above reference string?
8× one-word blocks, access time of 2 cycles.
4× two-word blocks, access time of 3 cycles.
2× four-word blocks, access time of 5 cycles.


Virtual Memory
❑ Main memory (RAM) can be used as a "cache" for secondary storage (disk), but not mainly for performance.

[Figure: CPU ↔ cache (transfers words) ↔ main memory (transfers blocks) ↔ secondary memory (transfers pages); main memory plus secondary memory appear as a virtual memory – a very large main memory. The cache's purpose is improving performance; the purpose of virtual memory is the question taken up next.]
Virtual Memory
❑ Multiple programs (processes) share one main memory.
❑ Large programs can run on a computer with a small main memory.

[Figure (Chris Terman, MIT 6.004): several processes' code and data live in a large virtual address space (e.g., 4 GB); physical memory (e.g., 1 GB RAM) holds the active parts, with the rest on disk in a page file or swap space.]
Relocation and Address translation
❑ Programs are located and run in virtual memory.
Each program has its own contiguous address space (virtual addresses).
Virtual addresses are mapped to physical addresses via translation.
Memory is organized in pages of fixed size (4KB – 64KB).

Do programs need to be allocated in contiguous physical pages?
Example: a CPU with 32-bit addresses, but the computer has only 1GB of physical memory.
Address Translation
❑ The CPU accesses a memory location based on a virtual address: virtual page number + page offset.
❑ If the virtual page number can be translated to a physical page number (hit) → the memory access can be done properly.
❑ Otherwise (miss): page fault → a very expensive operation.
A new physical page is allocated for the running process.
- If no free physical page is available, move an "old" page to disk to make space for the new page ➔ page replacement.
Content for the new page is loaded from disk.


Page Tables
❑ Stores placement information of each program (process)
An array of Page Table Entries (PTEs), indexed by virtual page number.
Located in main memory.
A page table register in the CPU points to the page table in physical memory.
❑ If the page is present in memory
The PTE stores the physical page number,
plus status bits (referenced, dirty, …).
❑ If the page is not present
The PTE can refer to a location in swap space on disk.
A translation sketch follows below.
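A minimal C sketch of this translation, assuming 4 KB pages and a flat array of PTEs (the PTE layout and all names are illustrative):

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_BITS 12                      /* 4KB pages        */
#define NUM_VPAGES (1u << 20)             /* 32-bit addresses */

struct pte { bool valid; uint32_t ppn; }; /* + status bits in reality */
static struct pte page_table[NUM_VPAGES]; /* lives in main memory     */

/* Translate a virtual address; returns false on page fault. */
static bool translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    if (!page_table[vpn].valid)
        return false;                     /* page fault: OS takes over */
    *paddr = (page_table[vpn].ppn << PAGE_BITS) | offset;
    return true;
}

int main(void) {
    page_table[3] = (struct pte){true, 42};      /* map virtual page 3 */
    uint32_t pa;
    if (translate((3u << PAGE_BITS) | 0x123, &pa))
        printf("paddr = 0x%08x\n", pa);          /* (42 << 12) | 0x123 */
    return 0;
}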


Translation Using a Page Table

[Figure: the page table register points to the page table in memory; the virtual page number indexes a PTE, whose physical page number is concatenated with the page offset to form the physical address.]


Page Fault Penalty and Storage Mapping
❑ On a page fault, the page must be fetched from disk
Usually together with page replacement.
Takes millions of clock cycles.
Handled by OS code.
❑ Memory pages can be stored in a disk page file or swap space
Managed by the OS.


Issues in virtual memory design
❑ Minimize the cost of page faults and data writes: minimize the page fault rate, and minimize disk write frequency
Fully associative placement.
Smart replacement algorithms.
Write-back approach.
❑ Fast address translation: this happens for every memory access, so it must be as fast as possible
Caching the page table: the Translation Look-aside Buffer (TLB).


Page Replacement and Writes
❑ Least-recently used (LRU) for page replacement
Can be quite slow when the number of pages is large.
A reference bit (aka use bit) in the PTE is set to 1 on access to the page,
and periodically cleared to 0 by the OS.
Pages with reference bit = 0 are considered for replacement.
❑ Disk writes take millions of cycles
Disk writes are slow and should be done in batches of data.
→ Write-through is impractical; use write-back.
A dirty bit in the PTE is set when the page is written.


Fast Translation Using TLB
❑ Address translation requires two consecutive memory references:
one to access the PTE, then the actual memory access.
Translation has good locality → the page table can be cached.
❑ TLB (Translation Look-aside Buffer)
A new component inside the CPU.
Provides fast access to the most recent PTEs.
Typical: 16–512 PTEs, 0.5–1 cycle for a hit, 10–100 cycles for a miss, 0.01%–1% miss rate.
Only contains PTEs corresponding to physical pages (see the sketch below).
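A hedged C sketch of a TLB lookup in front of a page-table walk (a fully associative toy TLB; the stub walk and all names are my own):

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define TLB_ENTRIES 16
#define PAGE_BITS   12

struct tlb_entry { bool valid; uint32_t vpn, ppn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Stub page-table walk (see the sketch on the Page Tables slide). */
static bool walk_page_table(uint32_t vpn, uint32_t *ppn) {
    (void)vpn; (void)ppn;
    return false;   /* pretend the page is not mapped */
}

/* TLB first; only on a miss fall back to the page table. */
static bool tlb_translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++)          /* parallel compare in HW */
        if (tlb[i].valid && tlb[i].vpn == vpn) {   /* TLB hit                */
            *paddr = (tlb[i].ppn << PAGE_BITS) | offset;
            return true;
        }
    uint32_t ppn;                                  /* TLB miss               */
    if (!walk_page_table(vpn, &ppn)) return false; /* page fault             */
    *paddr = (ppn << PAGE_BITS) | offset;          /* (real HW also refills  */
    return true;                                   /*  the TLB here)         */
}

int main(void) {
    tlb[0] = (struct tlb_entry){true, 5, 99};
    uint32_t pa;
    printf("hit=%d\n", tlb_translate((5u << PAGE_BITS) | 0x10, &pa));
    return 0;
}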


Fast Translation Using a TLB

[Figure: the virtual page number is looked up in the TLB first; on a TLB hit the physical page number comes straight from the TLB, on a miss it comes from the page table in memory.]


TLB Miss Handler
❑ A TLB miss indicates either:
the page is present, but the PTE is not in the TLB; or
the page is not present.
❑ Page present:
The handler copies the PTE from memory to the TLB,
then restarts the instruction.
❑ If the page is not present: a page fault will occur.


Page Fault Handler
❑ Use the faulting virtual address to find the PTE (currently not valid).
❑ Locate the page on disk.
❑ Choose a page in physical memory to replace
If dirty, write back the chosen page to disk first.
❑ Read the page into memory and update the page table.
❑ Make the process runnable again
Restart from the faulting instruction.


TLB and Cache Interaction

[Figure: address flow from a virtual address through translation (TLB, then page table) to physical memory, raising the question: should the cache sit on the physical or the virtual side of translation?]

❑ Physically addressed cache
The cache uses physical addresses.
Need to translate before the cache lookup.
Slow performance.
❑ Virtually addressed cache
Skips the TLB in normal cache access.
Aliasing problem:
- different virtual addresses for a shared physical address.
❑ Compromise: virtually indexed but physically tagged
No aliases, but complicated physical design.


Process and Memory protection
❑ Process: an instance of a program in execution
(take the IT3070E OS course for more details)
With a separate (virtual) memory space.
Shares the common physical memory.
Important data of a process: PC, register values, page table.
❑ Memory must be protected
Read protection: processes are not able to read each other's memory.
Write protection: processes are prohibited from writing to other processes' memory.
❑ Super process: the OS
Memory Protection
❑ Read protection
Virtual pages of separate processes map to disjoint physical pages.
Placing page tables in the protected address space of the OS → processes are not allowed to modify page tables.
❑ Sharing data
The OS creates a page table entry for a virtual page of one process to point to a physical page of another process.
Write protection: use the write-protection bit.
❑ Hardware support for protection (used by the OS)
A special privileged supervisor mode (aka kernel mode) and privileged instructions.
Page tables and other state information only accessible in supervisor mode.
A system call exception (e.g., syscall in MIPS) to go from user mode to supervisor mode.
Summary
❑ Memory hierarchy and the locality principle
❑ Cache design
Direct mapped
Set associative
Memory access on cache hit and miss
❑ Virtual memory
Address translation
TLB
Protection
