Chapter 5 discusses the principles of locality in memory access, emphasizing temporal and spatial locality to optimize memory hierarchy through various levels including SRAM, DRAM, and disk storage. It explains cache memory operations, including direct-mapped cache, hit/miss mechanisms, and the impact of block size on cache performance. The chapter also covers strategies for handling cache misses and write operations, such as write-through and write-back techniques.


COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
5th Edition

Chapter 5
Large and Fast:
Exploiting Memory
Hierarchy
§5.1 Introduction
Principle of Locality
■ Programs access a small proportion of
their address space at any time
■ Temporal locality
■ Items accessed recently are likely to be
accessed again soon
■ e.g., instructions in a loop, induction variables
■ Spatial locality
■ Items near those accessed recently are likely
to be accessed soon
■ E.g., sequential instruction access, array data
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 2
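To make the two kinds of locality concrete, here is a minimal C sketch (illustrative, not from the slides): the loop variable and the running sum are reused on every iteration (temporal locality), while the array elements are touched in consecutive order (spatial locality).

#include <stdio.h>

int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++)
        a[i] = i;

    long sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += a[i];   /* i and sum are reused each iteration (temporal locality);
                          a[0], a[1], ... are accessed sequentially (spatial locality),
                          so each cache block fetched for a[] is fully used */

    printf("sum = %ld\n", sum);
    return 0;
}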
Taking Advantage of Locality
■ Memory hierarchy
■ Store everything on disk
■ Copy recently accessed (and nearby)
items from disk to smaller DRAM memory
■ Main memory
■ Copy more recently accessed (and
nearby) items from DRAM to smaller
SRAM memory
■ Cache memory attached to CPU

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 3


Memory Hierarchy Levels
■ Block (aka line): unit of copying
■ May be multiple words
■ If accessed data is present in
upper level
■ Hit: access satisfied by upper level
■ Hit ratio: hits/accesses
■ If accessed data is absent
■ Miss: block copied from lower level
■ Time taken: miss penalty
■ Miss ratio: misses/accesses
= 1 – hit ratio
■ Then accessed data supplied from
upper level

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 4


§5.2 Memory Technologies
Memory Technology
■ Static RAM (SRAM)
■ 0.5ns – 2.5ns, $2000 – $5000 per GB
■ Dynamic RAM (DRAM)
■ 50ns – 70ns, $20 – $75 per GB
■ Magnetic disk
■ 5ms – 20ms, $0.20 – $2 per GB
■ Ideal memory
■ Access time of SRAM
■ Capacity and cost/GB of disk

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 5


§6.3 Disk Storage
Disk Storage
■ Nonvolatile, rotating magnetic storage

Chapter 6 — Storage and Other I/O Topics — 6


Disk Sectors and Access
■ Each sector records
■ Sector ID
■ Data (512 bytes, 4096 bytes proposed)
■ Error correcting code (ECC)
■ Used to hide defects and recording errors
■ Synchronization fields and gaps
■ Access to a sector involves
■ Queuing delay if other accesses are pending
■ Seek: move the heads
■ Rotational latency
■ Data transfer
■ Controller overhead

Chapter 6 — Storage and Other I/O Topics — 7


Disk Access Example
■ Given
■ 512B sector, 15,000rpm, 4ms average seek
time, 100MB/s transfer rate, 0.2ms controller
overhead, idle disk
■ Average read time
■ 4ms seek time
+ ½ / (15,000/60) = 2ms rotational latency
+ 512 / 100MB/s = 0.005ms transfer time
+ 0.2ms controller delay
= 6.2ms
■ If actual average seek time is 1ms
■ Average read time = 3.2ms

Chapter 6 — Storage and Other I/O Topics — 8
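The same arithmetic as a small C sketch, with the example's values hard-coded; the half-revolution average rotational latency is the usual assumption.

#include <stdio.h>

int main(void) {
    double seek_ms       = 4.0;                           /* average seek time            */
    double rpm           = 15000.0;
    double rotation_ms   = 0.5 / (rpm / 60.0) * 1000.0;   /* half a revolution = 2 ms     */
    double transfer_ms   = 512.0 / 100e6 * 1000.0;        /* 512 B at 100 MB/s ~ 0.005 ms */
    double controller_ms = 0.2;

    printf("average read time = %.3f ms\n",
           seek_ms + rotation_ms + transfer_ms + controller_ms);   /* ~ 6.2 ms */
    return 0;
}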


§5.3 The Basics of Caches
Cache Memory
■ Cache memory
■ The level of the memory hierarchy closest to
the CPU
■ Given accesses X1, …, Xn–1, Xn

■ How do we know if
the data is present?
■ Where do we look?

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 9


Direct Mapped Cache
■ Location determined by address
■ Direct mapped: only one choice
■ (Block address) modulo (#Blocks in cache)

■ #Blocks is a
power of 2
■ Use low-order
address bits

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 10
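A minimal C sketch of the mapping: because the block count is a power of 2, the modulo operation is just a selection of low-order address bits. The 4-bit offset and 10-bit index below are illustrative choices, not values fixed by the slide.

#include <stdio.h>
#include <stdint.h>

/* Illustrative geometry: 16-byte blocks (4 offset bits), 1024 blocks (10 index bits). */
#define OFFSET_BITS 4
#define INDEX_BITS  10

int main(void) {
    uint32_t addr   = 0x12345678u;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                 /* byte within the block            */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* low-order bits of block address  */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);               /* remaining high-order bits        */
    printf("addr 0x%08x -> tag 0x%x, index %u, offset %u\n", addr, tag, index, offset);
    return 0;
}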


Tags and Valid Bits
■ How do we know which particular block is
stored in a cache location?
■ Store block address as well as the data
■ Actually, only need the high-order bits
■ Called the tag
■ What if there is no data in a location?
■ Valid bit: 1 = present, 0 = not present
■ Initially 0

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 11


Cache Example
■ 8 blocks, 1 word/block, direct mapped
■ Access sequence (word addresses): 22, 26, 22, 26, 16, 3, 16, 18, 16
■ Initial state

  Index  V  Tag  Data
  000    N
  001    N
  010    N
  011    N
  100    N
  101    N
  110    N
  111    N

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 12


Cache Example
  Word addr  Binary addr  Hit/miss  Cache block
  22         10 110       Miss      110

  Index  V  Tag  Data
  000    N
  001    N
  010    N
  011    N
  100    N
  101    N
  110    Y  10   Mem[10110]
  111    N

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 13


Cache Example
  Word addr  Binary addr  Hit/miss  Cache block
  26         11 010       Miss      010

  Index  V  Tag  Data
  000    N
  001    N
  010    Y  11   Mem[11010]
  011    N
  100    N
  101    N
  110    Y  10   Mem[10110]
  111    N

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 14


Cache Example
  Word addr  Binary addr  Hit/miss  Cache block
  22         10 110       Hit       110
  26         11 010       Hit       010

  Index  V  Tag  Data
  000    N
  001    N
  010    Y  11   Mem[11010]
  011    N
  100    N
  101    N
  110    Y  10   Mem[10110]
  111    N

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 15


Cache Example
  Word addr  Binary addr  Hit/miss  Cache block
  16         10 000       Miss      000
  3          00 011       Miss      011
  16         10 000       Hit       000

  Index  V  Tag  Data
  000    Y  10   Mem[10000]
  001    N
  010    Y  11   Mem[11010]
  011    Y  00   Mem[00011]
  100    N
  101    N
  110    Y  10   Mem[10110]
  111    N

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 16


Cache Example
  Word addr  Binary addr  Hit/miss  Cache block
  18         10 010       Miss      010

  Index  V  Tag  Data
  000    Y  10   Mem[10000]
  001    N
  010    Y  10   Mem[10010]  (replaces tag 11, Mem[11010])
  011    Y  00   Mem[00011]
  100    N
  101    N
  110    Y  10   Mem[10110]
  111    N

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 17


Address Subdivision

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 18




Total Bits in a Cache (Direct-Mapped)
■ 32-bit byte addresses
■ The cache size is 2^n blocks, so n bits are used for the index
■ The block size (b) is 2^m words (2^(m+2) bytes = 2^(m+5) bits)
  ■ m bits are used for the word within the block
  ■ two bits are used for the byte part of the address
■ Tag field size: t = 32 - (n + m + 2)
■ Valid field size: v = 1
■ Total number of bits:
  C = 2^n × (b + t + v)
    = 2^n × (2^(m+5) + 32 - (n + m + 2) + 1)
    = 2^n × (2^m × 32 + 32 - n - m - 1)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 19


Total Bits in a Cache: Example
■ How many total bits are required for a direct-mapped cache with 16 KiB of data and 4-word blocks (i.e., 16 bytes), assuming a 32-bit address?
■ 16 KiB = 4 Ki words = 1 Ki blocks = 2^10 blocks, so n = 10
■ 4 words per block, so m = 2
■ C = 1024 × (b + t + v)
    = 1024 × (4 × 32 + t + 1)
    = 1024 × (4 × 32 + 18 + 1)
    = 147 Kbits
■ Address fields: Tag = bits 31-14 (18 bits), Index = bits 13-4 (10 bits), Offset = bits 3-0 (4 bits)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 20
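A quick C check of the formula for this configuration (n = 10, m = 2), confirming the 147 Kbit total.

#include <stdio.h>

int main(void) {
    int  n = 10;                          /* 2^10 = 1024 blocks        */
    int  m = 2;                           /* 2^2  = 4 words per block  */
    long blocks    = 1L << n;
    long data_bits = (1L << m) * 32;      /* 128 data bits per block   */
    long tag_bits  = 32 - (n + m + 2);    /* 18 tag bits               */
    long total     = blocks * (data_bits + tag_bits + 1 /* valid bit */);

    printf("total = %ld bits = %ld Kbits\n", total, total / 1024);      /* 147 Kbits */
    return 0;
}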


Example: Larger Block Size
■ 64 blocks, 16 bytes/block
■ To what block number does byte address 1200 map?
■ Block address = ⌊byte address / bytes per block⌋ = ⌊1200/16⌋ = 75
■ Block number = 75 modulo 64 = 11
■ Address fields: Tag = bits 31-10 (22 bits), Index = bits 9-4 (6 bits, 64 blocks), Offset = bits 3-0 (4 bits, 16 bytes)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 21


Example: Larger Block Size
■ 64 blocks, 16 bytes/block
■ To what block number does byte address 1200 map?
■ Block address = ⌊1200/16⌋ = 75
■ Block number = 75 modulo 64 = 11
■ In fact, block 11 maps all addresses between 1200 and 1215
■ Address fields: Tag = 22 bits, Index = 6 bits, Offset = 4 bits
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 22
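The block-mapping arithmetic from the last two slides as a quick C check (geometry taken from the example).

#include <stdio.h>

int main(void) {
    unsigned bytes_per_block = 16, num_blocks = 64;
    unsigned addr       = 1200;
    unsigned block_addr = addr / bytes_per_block;     /* = 75 */
    unsigned block_num  = block_addr % num_blocks;    /* = 11 */

    printf("address %u -> block address %u -> cache block %u\n", addr, block_addr, block_num);
    printf("that block covers byte addresses %u..%u\n",           /* 1200..1215 */
           block_addr * bytes_per_block,
           block_addr * bytes_per_block + bytes_per_block - 1);
    return 0;
}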


Block Size Considerations
■ Larger blocks should reduce miss rate
■ Due to spatial locality
■ But in a fixed-sized cache
■ Larger blocks ⇒ fewer of them
■ More competition ⇒ increased miss rate
■ Larger miss penalty
■ Larger blocks ⇒ Larger transfer time
■ Can override benefit of reduced miss rate
■ Early restart and critical-word-first can help

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 23




Early Restart
■ Resume execution as soon as the requested word of the block is returned; do not wait for the entire block
■ Works best for instruction accesses
  ■ Instruction accesses are largely sequential
  ■ If the memory system can deliver a word every clock cycle, the processor may be able to restart operation when the requested word is returned, with the memory system delivering new instruction words just in time
■ This technique is usually less effective for data caches, because the words are likely to be requested from the block in a less predictable order

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 24


Critical Word First
■ Organizes the memory so that the
requested word is transferred from the
memory to the cache first.
■ The remainder of the block is then
transferred, starting with the address after
the requested word and wrapping
around to the beginning of the block.
■ Can be slightly faster than early restart
■ but it is limited by the same properties that
limit early restart.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 25
Cache Misses
■ On cache hit, CPU proceeds normally
■ On cache miss
■ Stall the CPU pipeline
■ Fetch block from next level of hierarchy
■ Instruction cache miss
■ Restart instruction fetch
■ Data cache miss
■ Complete data access

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 26


Write-Through
■ On data-write hit, could just update the block in
cache
■ But then cache and memory would be inconsistent
■ Write through: also update memory
■ But makes writes take longer
■ e.g., if base CPI = 1, 10% of instructions are stores,
write to memory takes 100 cycles
■ Effective CPI = 1 + 0.1×100 = 11
■ Solution: write buffer
■ Holds data waiting to be written to memory
■ CPU continues immediately
■ Only stalls on write if write buffer is already full

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 27


Write-Back
■ Alternative: On data-write hit, just update
the block in cache
■ Keep track of whether each block is dirty
■ When a dirty block is replaced
■ Write it back to memory
■ Can use a write buffer to allow replacing block
to be read first

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 28


Write Allocation
■ What should happen on a write miss?
■ Alternatives for write-through:
  ■ Write allocate (allocate on miss): fetch the block
  ■ No write allocate (write around): don't fetch the block
    ■ Since programs often write a whole block before reading it (e.g., initialization)
■ For write-back
  ■ Usually fetch the block
■ (See next slides)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 29


Advantage for Write-Through
■ We can write the data into the cache and then read the tag
■ If the tag mismatches, a miss has occurred
■ Because the cache is write-through, overwriting the block in the cache is not catastrophic
  ■ Memory still has the correct value
■ THIS CANNOT BE DONE FOR WRITE-BACK
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 30
Write Back
■ If we have a cache miss, we must first
write the block back to memory if the data
in the cache is modified.
■ stores require two cycles:
■ a cycle to check for a hit
■ followed by a cycle to actually perform the
write
■ Alternative: Write Buffer to hold that data

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 31


Write-Back: Write Buffer
■ A write buffer holds the data to be written
■ This effectively allows the store to take only one cycle by pipelining it:
■ When a store buffer is used, the processor
does the cache lookup and places the data in
the store buffer during the normal cache
access cycle.
■ Assuming a cache hit, the new data is written
from the store buffer into the cache on the
next unused cache access cycle.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 32


Write-Back: Write Buffer for a Miss
■ the modified block is moved to a
write-back buffer associated with the
cache in case of a miss
■ while the requested block is read
from memory.
■ The write-back buffer is later written back to
memory.
■ Assuming another miss does not occur
immediately, this technique reduces the
miss penalty
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 33
Example: Intrinsity FastMATH
■ Embedded MIPS processor
■ 12-stage pipeline
■ Instruction and data access on each cycle
■ Split cache: separate I-cache and D-cache
■ Each 16KB: 256 blocks × 16 words/block
■ D-cache: write-through or write-back
■ SPEC2000 miss rates
■ I-cache: 0.4%
■ D-cache: 11.4%
■ Weighted average: 3.2%

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 34


Example: Intrinsity FastMATH
256 blocks × 16 words/block

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 35


§5.4 Measuring and Improving Cache Performance
Measuring Cache Performance
■ CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 36


Measuring Cache Performance
■ For a write-through cache:
  ■ Memory-stall clock cycles = Read-stall cycles + Write-stall cycles
  ■ Read-stall cycles = (Reads/Program) × Read miss rate × Read miss penalty
  ■ Write-stall cycles = (Writes/Program) × Write miss rate × Write miss penalty + Write buffer stalls
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 37


Notes: Write Buffer Stalls
■ Write buffer stalls depend on the proximity of writes, not just their frequency
  ■ So it is not easy to deduce a simple equation for them
■ Usually we can ignore write buffer stalls in systems with
  ■ a write buffer of 4 or more words depth, and
  ■ a memory capable of accepting writes at a rate that significantly exceeds the average write frequency in programs (by a factor of 2)
■ If a system did not meet these criteria, it would not be well designed; instead, the designer should have used either a deeper write buffer or a write-back organization
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 38
§5.4 Measuring and Improving Cache Performance
Measuring Cache Performance
■ With simplifying assumptions:
■ read and write miss penalties are same
■ In most write-through schemes this is the case
■ Write buffer stalls are negligible

■ Memory-stall cycles:
  ■ For data: Data-miss cycles = (Data accesses/Program) × Data miss rate × Miss penalty
  ■ For instructions: Instruction-miss cycles = (Instructions/Program) × Instruction miss rate × Miss penalty

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 39


Cache Performance Example
■ Given:
  ■ Miss rates: I-cache = 2%, D-cache = 4%
  ■ Miss penalty = 100 cycles
  ■ Base CPI (ideal cache) = 2
  ■ Loads and stores are 36% of instructions
■ How much faster is a processor with a perfect cache?
■ Say the total instruction count is I
  ■ I-cache miss cycles: I × 0.02 × 100 = 2.00 × I
  ■ D-cache miss cycles: I × 0.36 × 0.04 × 100 = 1.44 × I
■ Actual CPI = 2 + 2 + 1.44 = 5.44
■ Ideal CPU is 5.44/2 = 2.72 times faster

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 40
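The CPI arithmetic above as a quick C check (miss rates and penalty from the example).

#include <stdio.h>

int main(void) {
    double base_cpi = 2.0;
    double i_stalls = 0.02 * 100;           /* I-cache stall cycles per instruction = 2.00 */
    double d_stalls = 0.36 * 0.04 * 100;    /* D-cache stall cycles per instruction = 1.44 */
    double cpi      = base_cpi + i_stalls + d_stalls;

    printf("actual CPI = %.2f, perfect-cache speedup = %.2f\n",
           cpi, cpi / base_cpi);            /* 5.44 and 2.72 */
    return 0;
}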


Amdahl’s Law
■ A rule stating that the performance
enhancement possible with a given
improvement is limited by the amount that
the improved feature is used.
■ What happens if the processor is made
faster, but the memory system is not?
■ The amount of time spent on memory stalls
will take up an increasing fraction of the
execution time

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 41


Cache Performance Example 2
■ Given:
  ■ Miss rates: I-cache = 2%, D-cache = 4%
  ■ Miss penalty = 100 cycles
  ■ Base CPI (ideal cache) = 1
  ■ Loads and stores are 36% of instructions
■ How much faster is a processor with a perfect cache?
■ Say the total instruction count is I
  ■ I-cache miss cycles: I × 0.02 × 100 = 2.00 × I
  ■ D-cache miss cycles: I × 0.36 × 0.04 × 100 = 1.44 × I
■ Actual CPI = 1 + 2 + 1.44 = 4.44
■ Ideal CPU is 4.44/1 = 4.44 times faster

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 42


Lesson (re)Learned
■ Actual CPI dropped from 5.44 to 4.44, but the slowdown relative to a perfect cache grew from 2.72× to 4.44×
■ The faster the processor (lower base CPI), the larger the fraction of performance lost to memory stalls

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 43


Average Access Time
■ Hit time is also important for performance
■ Average memory access time (AMAT)
■ AMAT = Hit time + Miss rate × Miss penalty
■ Example
■ CPU with 1ns clock, hit time = 1 cycle, miss
penalty = 20 cycles, I-cache miss rate = 5%
■ AMAT = 1 + 0.05 × 20 = 2ns
■ 2 cycles per instruction

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 44
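The AMAT formula as a small C helper, applied to the example values (times expressed in cycles, 1 ns per cycle).

#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    double cycles = amat(1.0, 0.05, 20.0);                        /* = 2 cycles */
    printf("AMAT = %.1f cycles = %.1f ns with a 1 ns clock\n", cycles, cycles);
    return 0;
}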


Performance Summary
■ When CPU performance increased
■ Miss penalty becomes more significant
■ Decreasing base CPI
■ Greater proportion of time spent on memory
stalls
■ Increasing clock rate
■ Memory stalls account for more CPU cycles
■ Can’t neglect cache behavior when
evaluating system performance

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 45


Associative Caches
■ Fully associative
■ Allow a given block to go in any cache entry
■ Requires all entries to be searched at once
■ Comparator per entry (expensive)
■ n-way set associative
■ Each set contains n entries
■ Block number determines which set
■ (Block number) modulo (#Sets in cache)
■ Search all entries in a given set at once
■ n comparators (less expensive)
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 46
Associative Cache Example

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 47


1-Way Set Associative

How many bits? 3 bits


■ A cache block can only go in one spot in the cache.
■ It makes a cache block very easy to find
■ but it's not very flexible about where to put the blocks.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 48


2-Way Set Associative

How many bits? 2 bits


■ This cache is made up of sets that can fit two blocks
each.
■ The index is now used to find the set
■ The tag helps find the block within the set.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 49
4-Way Set Associative

How many bits? 1 bit

■ Each set here fits four blocks,


■ So there are fewer sets.
■ As such, fewer index bits are needed.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 50
Fully Associative

How many bits? 0 bit


■ No index is needed, since a cache block can go anywhere in
the cache.
■ Every tag must be compared when finding a block in the
cache
■ but block placement is very flexible!
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 51
Spectrum of Associativity
■ For a cache with 8 entries

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 52


Associativity Example
■ Compare 4-block caches
■ Direct mapped, 2-way set associative,
fully associative
■ Block access sequence: 0, 8, 0, 6, 8
■ Direct mapped
  Block addr  Index (mod 4)  Hit/miss  Cache content after access (indexes 0..3)
  0           0              miss      Mem[0], -, -, -
  8           0              miss      Mem[8], -, -, -
  0           0              miss      Mem[0], -, -, -
  6           2              miss      Mem[0], -, Mem[6], -
  8           0              miss      Mem[8], -, Mem[6], -

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 53


Associativity Example
■ Compare 4-block caches
■ Direct mapped, 2-way set associative,
fully associative
■ Block access sequence: 0, 8, 0, 6, 8
■ 2-way set associative
  Block addr  Set (mod 2)  Hit/miss  Cache content after access (set 0 | set 1)
  0           0            miss      Mem[0]         |
  8           0            miss      Mem[0], Mem[8] |
  0           0            hit       Mem[0], Mem[8] |
  6           0            miss      Mem[0], Mem[6] |
  8           0            miss      Mem[8], Mem[6] |

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 54


Associativity Example
■ Compare 4-block caches
■ Direct mapped, 2-way set associative,
fully associative
■ Block access sequence: 0, 8, 0, 6, 8
■ Fully associative

  Block addr  Hit/miss  Cache content after access
  0           miss      Mem[0]
  8           miss      Mem[0], Mem[8]
  0           hit       Mem[0], Mem[8]
  6           miss      Mem[0], Mem[8], Mem[6]
  8           hit       Mem[0], Mem[8], Mem[6]

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 55
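A compact C sketch that simulates the direct-mapped case for the sequence 0, 8, 0, 6, 8 and reproduces the 5 misses in the first table; the 2-way and fully associative variants behave as in the other tables but need LRU bookkeeping, which is omitted here.

#include <stdio.h>

#define CACHE_BLOCKS 4

int main(void) {
    int seq[]   = {0, 8, 0, 6, 8};
    int tag[CACHE_BLOCKS];
    int valid[CACHE_BLOCKS] = {0};
    int misses  = 0;

    for (int i = 0; i < 5; i++) {
        int blk = seq[i];
        int idx = blk % CACHE_BLOCKS;          /* direct mapped: only one possible slot */
        if (valid[idx] && tag[idx] == blk) {
            printf("block %d: hit  (index %d)\n", blk, idx);
        } else {
            printf("block %d: miss (index %d)\n", blk, idx);
            valid[idx] = 1;
            tag[idx]   = blk;                  /* store the block address as the tag */
            misses++;
        }
    }
    printf("%d misses out of 5 accesses\n", misses);   /* 5, matching the table */
    return 0;
}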


Set Associative Cache Organization

■ 4-way set-associative organization
■ An alternate implementation: remove the multiplexor and use enable signals

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 56


Replacement Policy
■ Direct mapped: no choice
■ Set associative
■ Prefer non-valid entry, if there is one
■ Otherwise, choose among entries in the set
■ Least-recently used (LRU)
■ Choose the one unused for the longest time
■ Simple for 2-way, manageable for 4-way, too hard
beyond that
■ Random
■ Gives approximately the same performance as
LRU for high associativity

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 57


Tags versus Set Associativity
■ Cache of 4096 blocks
■ a 4-word block size
■ 32-bit address,
■ Find the total number of sets and the total
number of tag bits for caches that are
■ direct mapped
■ two-way set associative
■ four-way set associative
■ fully associative.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 58


Tags versus Set Associativity
■ 4096 blocks; 32-bit address
■ 4-word block = 16 (2^4) bytes per block
■ So tag + index = 32 - 4 = 28 bits
■ Address fields: 16-bit tag | 12-bit index | 4-bit offset
■ Direct mapped:
  ■ 4096 (2^12) 1-way sets => 12-bit index
  ■ 16-bit tag (28 - 12)
  ■ 4096 entries × 16-bit tag = 64 Kbits of tag

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 59


Tags versus Set Associativity
■ 4096 blocks; 32-bit address
■ 4-word block = 16 (2^4) bytes per block
■ So tag + index = 32 - 4 = 28 bits
■ Address fields: 17-bit tag | 11-bit index | 4-bit offset
■ 2-way set associative:
  ■ 4096/2 = 2048 (2^11) sets => 11-bit index
  ■ 17-bit tag (28 - 11)
  ■ 4096 entries × 17-bit tag = 68 Kbits of tag

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 60


Tags versus Set Associativity
■ 4096 blocks; 32-bit address
■ 4-word block = 16 (2^4) bytes per block
■ So tag + index = 32 - 4 = 28 bits
■ Address fields: 18-bit tag | 10-bit index | 4-bit offset
■ 4-way set associative:
  ■ 4096/4 = 1024 (2^10) sets => 10-bit index
  ■ 18-bit tag (28 - 10)
  ■ 4096 entries × 18-bit tag = 72 Kbits of tag

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 61


Tags versus Set Associativity
■ 4096 blocks; 32-bit address
■ 4-word block = 16 (2^4) bytes per block
■ So tag + index = 32 - 4 = 28 bits
■ Address fields: 28-bit tag | 4-bit offset
■ Fully associative:
  ■ No index
  ■ 28-bit tag (28 - 0)
  ■ 4096 entries × 28-bit tag = 112 Kbits of tag

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 62


Tags versus Set Associativity
■ 4096 blocks; 4-word blocks; 32-bit address
■ 4-word block = 16 (2^4) bytes per block
■ So tag + index = 32 - 4 = 28 bits (plus a 4-bit offset)

  Associativity              Total tag bits (Kbits)
  Direct mapped (1-way)       64
  2-way set associative       68
  4-way set associative       72
  Fully associative          112

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 63
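A small C sketch reproducing the table: with 4096 blocks and 28 bits to split between tag and index, each doubling of associativity removes one index bit and adds one tag bit per entry.

#include <stdio.h>

int main(void) {
    int blocks = 4096;
    int tag_plus_index = 28;                     /* 32-bit address minus 4-bit block offset */
    int ways[] = {1, 2, 4, 4096};                /* 4096-way == fully associative           */

    for (int i = 0; i < 4; i++) {
        int sets = blocks / ways[i];
        int index_bits = 0;
        for (int s = sets; s > 1; s >>= 1)       /* log2(sets): 12, 11, 10, 0 */
            index_bits++;
        int tag_bits = tag_plus_index - index_bits;
        printf("%4d-way: %2d index bits, %2d tag bits, %3d Kbits of tags\n",
               ways[i], index_bits, tag_bits, blocks * tag_bits / 1024);
    }
    return 0;
}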


Multilevel Caches
■ Primary cache attached to CPU
■ Small, but fast
■ Level-2 cache services misses from
primary cache
■ Larger, slower, but still faster than main
memory
■ Main memory services L-2 cache misses
■ Some high-end systems include L-3 cache

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 64


Multilevel Cache Example
■ Given
■ CPU base CPI = 1, clock rate = 4GHz
■ Miss rate/instruction @ primary cache = 2%
■ Main memory access time = 100ns
■ 4 GHz => 0.25ns cycle length
■ With just primary cache
■ Miss penalty = 100ns/0.25ns = 400 cycles
■ Effective CPI = 1 + 0.02 × 400 = 9

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 65


Example (cont.)
■ Now add an L-2 cache (CPU base CPI = 1, clock rate = 4 GHz)
  ■ Access time = 5 ns
  ■ Global miss rate to main memory = 0.5%
■ Primary miss with L-2 hit
  ■ Penalty = 5 ns / 0.25 ns = 20 cycles
■ Primary miss with L-2 miss
  ■ Extra penalty = 400 cycles (primary miss rate = 2%)
■ CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
■ Performance ratio = 9/3.4 = 2.6

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 66
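The two-level CPI arithmetic as a quick C check (values from the example; the 400-cycle main-memory penalty comes from the previous slide).

#include <stdio.h>

int main(void) {
    double base_cpi       = 1.0;
    double l1_miss_rate   = 0.02;
    double l2_hit_penalty = 20.0;     /* 5 ns / 0.25 ns   */
    double mem_miss_rate  = 0.005;    /* global miss rate */
    double mem_penalty    = 400.0;    /* 100 ns / 0.25 ns */

    double cpi_l1_only = base_cpi + l1_miss_rate * mem_penalty;          /* = 9   */
    double cpi_with_l2 = base_cpi + l1_miss_rate * l2_hit_penalty
                                  + mem_miss_rate * mem_penalty;         /* = 3.4 */

    printf("CPI (L1 only) = %.1f, CPI (with L2) = %.1f, ratio = %.1f\n",
           cpi_l1_only, cpi_with_l2, cpi_l1_only / cpi_with_l2);         /* 2.6x  */
    return 0;
}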


Multilevel Cache Considerations
■ Primary cache
■ Focus on minimal hit time
■ L-2 cache
■ Focus on low miss rate to avoid main memory
access
■ Hit time has less overall impact
■ Results
■ L-1 cache usually smaller than a single cache
■ L-1 block size smaller than L-2 block size

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 67


Interactions with Advanced CPUs
■ Out-of-order CPUs can execute
instructions during cache miss
■ Pending store stays in load/store unit
■ Dependent instructions wait in reservation
stations
■ Independent instructions continue
■ Effect of miss depends on program data
flow
■ Much harder to analyse
■ Use system simulation

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 68


§5.5 Dependable Memory Hierarchy
Dependability
■ Two service states:
  ■ Service accomplishment: service delivered as specified
  ■ Service interruption: deviation from specified service
  ■ A failure moves the system from accomplishment to interruption; a restoration moves it back
■ Fault: failure of a component
  ■ May or may not lead to system failure

Chapter 6 — Storage and Other I/O Topics — 69


Dependability Measures
■ Reliability: mean time to failure (MTTF)
■ Service interruption: mean time to repair (MTTR)
■ Mean time between failures
■ MTBF = MTTF + MTTR
■ Availability = MTTF / (MTTF + MTTR)
■ Improving Availability
■ Increase MTTF: fault avoidance, fault tolerance, fault
forecasting
■ Reduce MTTR: improved tools and processes for
diagnosis and repair

Chapter 6 — Storage and Other I/O Topics — 70


Nines of Availability
■ We want availability to be very high.
■ One shorthand is to quote the number of
“nines of availability” per year.

# of Nines % of uptime
One 90
Two 99
Three 99.9
Four 99.99
Five 99.999

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 71


Nines of Availability
■ Given 365 days per year, which is 365 * 24
* 60 = 525,600 minutes.
■ Then the shorthand is decoded as follows:
■ 90% => 525,600 * 0.1 downtime = 52560
minutes = 52560/(60*24) = 36.5 days.
■ # of Nines % of uptime Downtime/Year
One 90 36.5 days
Two 99 3.65 days
Three 99.9 526 min
Four 99.99 52.6 min
Five 99.999 5.26 min

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 72


MTTF and AFR
■ MTTF is a reliability measure.
■ A related term is annual failure rate (AFR)
■ The percentage of devices that would be
expected to fail in a year for a given MTTF.
■ Hours in a year / MTTF
■ When MTTF gets large it can be misleading
■ while AFR leads to better intuition

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 73


High MTTF and AFR
■ Some disks today are quoted to have a
1,000,000-hour MTTF.
■ 1,000,000/(365 * 24) = 114 years
■ they practically never fail???
■ Warehouse scale computers that run
Internet services such as Search might
have 50,000 servers.
■ Assume each server has 2 disks.
■ How many disks would we expect to fail per year?
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 74
High MTTF and AFR
■ One year => 365 * 24 = 8760 hours.
■ A 1,000,000-hour MTTF means an AFR
of 8760/1,000,000 = 0.876%.
■ We have 50000 * 2 = 100,000 disks
■ we would expect 0.00876 * 100,000 = 876
disks to fail per year
■ On average more than (876/365) 2 disk
failures per day.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 75
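The same AFR arithmetic as a C sketch (MTTF and fleet size from the example).

#include <stdio.h>

int main(void) {
    double hours_per_year = 365.0 * 24.0;                /* 8760          */
    double mttf_hours     = 1000000.0;
    double afr            = hours_per_year / mttf_hours; /* 0.876%        */
    double disks          = 50000.0 * 2.0;
    double failures       = afr * disks;                 /* 876 per year  */

    printf("AFR = %.3f%%, expected failures = %.0f per year (%.1f per day)\n",
           afr * 100.0, failures, failures / 365.0);
    return 0;
}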


§5.7 Virtual Memory
Virtual Memory
■ Use main memory as a “cache” for
secondary (disk) storage
■ Managed jointly by CPU hardware and the
operating system (OS)
■ Programs share main memory
■ Each gets a private virtual address space
holding its frequently used code and data
■ Protected from other programs
■ CPU and OS translate virtual addresses to
physical addresses
■ VM “block” is called a page
■ VM translation “miss” is called a page fault

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 76


Address Translation
■ Fixed-size pages (e.g., 4K)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 77


Page Fault Penalty
■ On page fault, the page must be fetched
from disk
■ Takes millions of clock cycles
■ Handled by OS code
■ Try to minimize page fault rate
■ Fully associative placement
■ But this needs costly search!!!
■ Smart replacement algorithms

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 78


Page Tables
■ Stores placement information
■ Array of page table entries, indexed by virtual
page number
■ Page table register in CPU points to page
table in physical memory
■ If page is present in memory
■ PTE stores the physical page number
■ Plus other status bits (referenced, dirty, …)
■ If page is not present
■ PTE can refer to location in swap space on
disk
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 79
Translation Using a Page Table

■ With a 32-bit virtual address and 4 KiB pages, the page table has 2^20 (= 1 M) entries
■ What is the size of the page table?
  ■ Each entry needs about 19 bits of information, but entries are usually made 32 bits wide

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 80


Page Table Size Issues
■ Size of the page table: 2^20 entries × 4 bytes => 4 MB (per process)
■ What about 100 processes, each with its own page table?
■ What will happen if we have 64-bit addresses (by the same calculation)?
  ■ 2^(64 - 12) = 2^52 entries!!!
  ■ Address fields: virtual page number = bits 63-12 (52 bits), page offset = bits 11-0 (12 bits)
■ There are techniques to reduce the amount of storage required for the page table
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 81
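The page-table-size arithmetic as a small C sketch; the 4-byte entry size is the usual assumption carried over from the previous slide.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int page_offset_bits = 12;                                   /* 4 KiB pages            */
    int pte_bytes        = 4;                                    /* assumed 32-bit entries */

    uint64_t entries32 = 1ull << (32 - page_offset_bits);        /* 2^20 entries           */
    printf("32-bit addresses: %llu entries, %llu MB per process\n",
           (unsigned long long)entries32,
           (unsigned long long)((entries32 * pte_bytes) >> 20)); /* 4 MB                   */

    uint64_t entries64 = 1ull << (64 - page_offset_bits);        /* 2^52 entries           */
    printf("64-bit addresses: %llu entries -- a flat table is infeasible\n",
           (unsigned long long)entries64);
    return 0;
}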
Techniques: Limit Registers
■ Keep a limit register that restricts the size
of the page table for a given process.
■ If the virtual page number becomes larger
than the contents of the limit register, entries
must be added to the page table.
■ This technique allows the page table to grow
as a process consumes more space.
■ Thus, the page table will only be large if the
process is using many pages of
virtual address space.
■ This technique requires that the address
space expand in only one direction.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 82
Techniques: Limit Registers
Limit Register

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 83


Techniques: Two Limits
■ Allowing growth in only one direction is not sufficient
■ Use two separate page tables and two separate limits
  ■ Stack: grows from the highest address down
  ■ Heap: grows from the lowest address up
■ So, the address space is divided into 2 segments
■ The high-order bit of an address usually determines which segment, i.e., which page table to use for that address
■ So, each segment can be as large as one-half of the address space
■ A limit register for each segment specifies the current size of the segment, which grows in units of pages

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 84


Techniques: Inverted Page Table
■ Keep only one entry per physical block
(i.e., frame)
■ Such a structure is called an inverted page
table.
■ we can no longer just index the page table.
■ So, the lookup process is slightly more
complex
■ May apply a hashing function to the virtual
address
■ To make the lookups faster
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 85
Techniques: Multiple Levels
■ The first level maps large fixed-size blocks of virtual address space (sometimes called segments)
■ Each entry in the segment table:
■ indicates whether any pages in that segment
are allocated
■ if so, points to a page table for that segment.
■ Address translation happens:
■ by first looking in the segment table, using the
highest-order bits of the address.
■ If the segment address is valid, the next set of
high-order bits is used to index the page table
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 86
Techniques: Multiple Levels

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 87


Mapping Pages to Storage

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 88


Page Fault
■ If the valid bit for a virtual page is off, a
page fault occurs.
■ The operating system must be given
control.
■ This transfer is done with the exception
mechanism
■ OS must find the page in the next level of the
hierarchy
■ and decide where to place the requested page
in main memory.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 89


Swap Space
■ Virtual addr. alone does not immediately tell us
where the page is on disk.
■ OS usually creates the space on flash memory/disk for all the pages of a process: the swap space


■ OS creates a data structure to record where
each virtual page is stored on disk.
■ may be part of the page table
■ or an auxiliary data structure indexed in the same way
as the page table
■ OS also creates a data structure that tracks
■ which processes and which virtual addresses use
each physical page
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 90
Replacement and Writes
■ To reduce page fault rate, prefer
least-recently used (LRU) replacement
■ Reference bit (aka use bit) in PTE set to 1 on
access to page
■ Periodically cleared to 0 by OS
■ A page with reference bit = 0 has not been
used recently
■ Disk writes take millions of cycles
■ Block at once, not individual locations
■ Write through is impractical
■ Use write-back
■ Dirty bit in PTE set when page is written

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 91


Fast Translation Using a TLB
■ Address translation would appear to require extra memory references (page tables are in main memory)
  ■ One to access the PTE
  ■ Then the actual memory access
■ But access to page tables has good locality
■ So use a fast cache of PTEs within the CPU
■ Called a Translation Look-aside Buffer (TLB)
■ Typical: 16–512 PTEs
■ Misses could be handled by hardware or software

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 92


Fast Translation Using a TLB

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 93


TLB Misses
■ If page is in memory
■ Load the PTE from memory and retry
■ Could be handled in hardware
■ Can get complex for more complicated page table
structures
■ Or in software
■ Raise a special exception, with optimized handler
■ If page is not in memory (page fault)
■ OS handles fetching the page and updating
the page table
■ Then restart the faulting instruction

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 94


TLB Miss Handler
■ A TLB miss indicates either
  ■ Page present, but PTE not in TLB
  ■ Page not present (a true page fault)
■ Handler copies the PTE from memory to the TLB
  ■ Then restarts the instruction
  ■ If the page is not present, a page fault will occur
■ The reference and dirty bits may change in the TLB
  ■ So they must also be copied back to the PTE when that TLB entry is replaced (a write-back scheme)
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 95
TLB: Associativity
■ Some systems use small, fully associative TLBs
  ■ Higher associativity lowers the miss rate
  ■ With few entries, the search cost is not too high
  ■ But the replacement choice becomes tricky
    ■ A hardware LRU scheme is too expensive
    ■ An expensive software algorithm is also not feasible, because TLB misses are much more frequent than page faults
  ■ Many systems provide some support for randomly choosing an entry to replace
■ Other systems use large TLBs, often with small associativity

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 96


Page Fault Handler
■ Use faulting virtual address to find PTE
■ Locate page on disk
■ Choose page to replace
■ If dirty, write to disk first
■ Read page into memory and update page
table
■ Make process runnable again
■ Restart from faulting instruction

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 97


Intrinsity FastMATH TLB
■ 4 KiB (212) pages; 32-bit address space
■ So, virtual page number is (32 – 12 =) 20 bits long
■ Physical address is same size as virtual address.
■ TLB: 16 entries; fully associative
■ shared between the instruction and data
■ Each entry is 64 bits wide
■ a 20-bit tag (virtual page number for that TLB entry)
■ the corresponding physical page number (also 20 bits),
■ a valid bit, a dirty bit, and other bookkeeping bits.
■ Like most MIPS systems, it uses software to handle
TLB misses.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 98


TLB and Cache Interaction
■ TLB implementation: CAM (content addressable memory)
■ Cache: 256 blocks × 16 words/block

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 99


Content Addressable Memory (CAM)
■ CAM is a circuit that combines comparison
and storage in a single device.
■ Unlike a RAM, you do not supply an address and read a word
■ Instead, you supply the data, and the CAM looks to see if it has a copy and returns the index of the matching row
■ With CAMs higher set associativity in cache
can be implemented

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 100


Processing a read or a write-through in the
Intrinsity FastMATH TLB and cache

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 101


TLB, Cache and VM Events Combined

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 102


Physically Addressed Cache
■ Here, cache is
“physically addressed”
and “physically tagged”
■ Time to access memory for a cache hit includes:
■ TLB access time
■ Cache access time
■ Of course, these
accesses can be
pipelined.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 103
Virtually Indexed Cache
■ Alternatively, the processor can index the
cache with a virtual address VIVT
■ Virtually indexed and Virtually tagged cache
■ Here, TLB is unused during the normal
cache access
■ Reduce cache latency
■ Cache miss=> the processor needs to
translate the address to a physical address
■ so that it can fetch the cache block from main
memory.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 104
Aliasing in VIVT Cache

■ Aliasing occurs when the same object has two names


■ Two virtual addresses for the same page.
■ May happen for shared pages between processes
■ This ambiguity creates a problem:
■ A word on such a page may be cached in two different locations,
each corresponding to different virtual addresses.
■ One program may write the data without the other program being
aware that the data had changed.
■ Solution to Aliasing Issues
■ either introduce design limitations on the cache and TLB to
reduce aliases
■ or require the operating system, and possibly the user, to take
steps to ensure that aliases do not occur

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 105


Virtually Indexed Physically Tagged
■ Physical Tag using just the page-offset
portion of the address, which is really
a physical address since it is not translated
■ These designs, which are virtually indexed
but physically tagged, attempt to achieve
the performance advantages of virtually
indexed caches with the architecturally
simpler advantages of a physically
addressed cache.
■ There is no alias problem in this case.
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 106
Memory Protection
■ Different tasks can share parts of their
virtual address spaces
■ But need to protect against errant access
■ Requires OS assistance
■ Hardware support for OS protection
■ Privileged supervisor mode (aka kernel mode)
■ Privileged instructions
■ Page tables and other state information only
accessible in supervisor mode
■ System call exception (e.g., syscall in MIPS)
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 107
Memory Protection: H/W Support
■ Need to Support:
■ at least two modes:
■ Privileged supervisor mode (aka kernel mode)
■ User mode
■ Different Processor states:
■ Allow for a process to read only; No Write allowed
■ To write, privileged instructions are needed that are only
available in supervisor mode
■ Page tables and other state information only accessible in
supervisor mode
■ System call exception (e.g., syscall in MIPS):
■ This allows a process to change mode:
■ User mode => supervisor mode (and vice versa)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 108


Memory Protection
■ Each process has its own virtual address space.
■ OS can keep the page tables organized so that the
independent virtual pages map to disjoint physical
pages
■ one process will not be able to access another’s data.
■ But what if the Process changes the mapping?
■ Page tables are placed in the protected address
space of the OS
■ So, a user process is prevented from changing the page table mapping.
■ But OS is able to modify the page tables.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 109


Memory Sharing and Protection
■ OS assists processes for limited information sharing
■ OS can change the Page table as needed
■ The write access bit is used to restrict write
■ Can be changed only by OS

Example:
■ P2 wants P1 to access its page
■ P2 asks OS to create a page table entry for a virtual page in
P1’s address space that points to the same physical page that
P2 wants to share.
■ Any bits that determine the access rights for a page must be
included in both the page table and the TLB because the page
table is accessed only on a TLB miss.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 110


Memory Protection: Context Switch
■ Context Switch: A changing of the internal state of the
processor to allow a different process to use the
processor
■ Suppose a context switch has occurred:
■ P1 was running; now P2 will run
■ OS must ensure that P2 cannot get access to P1’s page tables
■ Page Table register is changed
■ What about the TLB?
■ OS must clear the TLB entries that belong to P1
■ to protect the data of P1 and
■ to force the TLB to load the entries for P2.
■ If process switches are frequent, this flushing could be quite inefficient.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 111


Memory Protection: Context Switch
A common alternative:
■ Extend the virtual address space by adding a process identifier or task identifier
■ The Intrinsity FastMATH has an 8-bit address space ID (ASID) field for this purpose
  ■ This small field identifies the currently running process
  ■ It is kept in a register loaded by the OS when it switches processes
■ The process identifier is concatenated to the tag portion of the TLB
  ■ A TLB hit occurs only if both the page number and the process identifier match
■ This eliminates the need to clear the TLB on a context switch
■ Similar problems can occur for a cache, since on a process switch the cache will contain data from the running process

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 112


Exception Enable/Disable
■ Suppose we have a page fault exception and OS is handling
it
■ What will happen, if a second exception occurs?
■ The control unit would overwrite the exception program counter, making it
impossible to return to the instruction that caused the page fault!
■ We need the ability to disable and enable exceptions.
■ When an exception first occurs, the processor sets a bit that
disables all other exceptions;
■ this could happen at the same time the processor sets the
supervisor mode bit.
■ The OS will then save just enough state to allow it to recover if
another exception occurs
■ the exception program counter (EPC) and Cause registers
■ The operating system can then re-enable exceptions.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 113


Special Control Registers for Exceptions, TLB Misses, and Page Faults
■ When a TLB miss occurs, the MIPS hardware saves the page number of the reference in one of these registers

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 114


TLB Miss in MIPS
■ TLB Miss exception invokes OS, which handles the miss in
software.
■ Control is transferred to 8000 0000hex (TLB Miss Handler Address)
■ To find the physical address for the missing page, the TLB miss
routine indexes the page table using the page number of the virtual
address and the page table register
■ To make this indexing fast, MIPS hardware places the address of
the Page Table Entry in a special Context Register
■ Thus, the first two instructions copy the Context register into the
kernel temporary register $k1 and then load the page table entry
from that address into $k1.
■ Recall that $k0 and $k1 are reserved for the operating system

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 115


TLB Miss in MIPS
■ The TLB miss handler does not check whether the page table entry is valid
■ If it is invalid, another and different exception occurs, and the OS recognizes the page fault
  ■ It transfers control to 8000 0180hex
■ The (frequent) TLB miss becomes fast
  ■ at a slight performance penalty for the (infrequent) page fault
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 116
§5.8 A Common Framework for Memory Hierarchies
The Memory Hierarchy
The BIG
Picture
■ Common principles apply at all levels of
the memory hierarchy
■ Based on notions of caching
■ At each level in the hierarchy
■ Block placement
■ Finding a block
■ Replacement on a miss
■ Write policy

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 117


Block Placement
■ Determined by associativity
■ Direct mapped (1-way associative)
■ One choice for placement
■ n-way set associative
■ n choices within a set
■ Fully associative
■ Any location
■ Higher associativity reduces miss rate
■ Increases complexity, cost, and access time

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 118


Finding a Block
  Associativity        Location method                              Tag comparisons
  Direct mapped        Index                                        1
  n-way set assoc.     Set index, then search entries in the set    n
  Fully associative    Search all entries                           #entries
  Fully associative    Full lookup table                            0

■ Hardware caches
■ Reduce comparisons to reduce cost
■ Virtual memory
■ Full table lookup makes full associativity feasible
■ Benefit in reduced miss rate

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 119


Replacement
■ Choice of entry to replace on a miss
■ Least recently used (LRU)
■ Complex and costly hardware for high associativity
■ Random
■ Close to LRU, easier to implement
■ Virtual memory
■ LRU approximation with hardware support

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 120


Write Policy
■ Write-through
■ Update both upper and lower levels
■ Simplifies replacement, but may require write
buffer
■ Write-back
■ Update upper level only
■ Update lower level when block is replaced
■ Need to keep more state
■ Virtual memory
■ Only write-back is feasible, given disk write
latency

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 121


Sources of Misses
■ Compulsory misses (aka cold start misses)
■ First access to a block
■ Capacity misses
■ Due to finite cache size
■ A replaced block is later accessed again
■ Conflict misses (aka collision misses)
■ In a non-fully associative cache
■ Due to competition for entries in a set
■ Would not occur in a fully associative cache of
the same total size

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 122


Cache Design Trade-offs
  Design change            Effect on miss rate            Negative performance effect
  Increase cache size      Decreases capacity misses      May increase access time
  Increase associativity   Decreases conflict misses      May increase access time
  Increase block size      Decreases compulsory misses    Increases miss penalty; a very large
                                                          block could increase the miss rate

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 123


§5.9 Using a Finite State Machine to Control A Simple Cache
Cache Control
■ Example cache characteristics
■ Direct-mapped, write-back, write allocate
■ Block size: 4 words (16 bytes)
■ Cache size: 16 KB (1024 blocks)
■ 32-bit byte addresses
■ Valid bit and dirty bit per block
■ Blocking cache
■ CPU waits until access is complete

Address fields: Tag = bits 31-14 (18 bits), Index = bits 13-4 (10 bits), Offset = bits 3-0 (4 bits)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 124


Interface Signals

■ CPU <-> Cache: Read/Write, Valid, Address (32 bits), Write Data (32 bits), Read Data (32 bits), Ready
■ Cache <-> Memory: Read/Write, Valid, Address (32 bits), Write Data (128 bits), Read Data (128 bits), Ready
■ Memory: multiple cycles per access

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 125


Finite State Machines
■ Use an FSM to
sequence control steps
■ Set of states, transition
on each clock edge
■ State values are binary
encoded
■ Current state stored in a
register
■ Next state
= fn (current state,
current inputs)
■ Control output signals
= fo (current state)
Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 126
Cache Controller FSM

Could partition
into separate
states to
reduce clock
cycle time

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 127
