
ELT3047 Computer Architecture

Lecture 13: Memory (cont.)

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Last lecture review (1)
❑ The Memory Hierarchy
➢ Take advantage of the principle of locality to present the user with as
much memory as is available in the cheapest technology at the speed
offered by the fastest technology.

[Figure: the memory hierarchy. Moving away from the processor – L1$, L2$, Main Memory, Secondary Memory – each level is (relatively) larger and has a longer access time; transfer units grow from 4-8 bytes (word) to 8-32 bytes (block), 1 to 4 blocks, and 1,024+ bytes (disk sector = page). The hierarchy is inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory.]


Last lecture review (2)
❑ Temporal Locality
➢ Keep most recently accessed data items closer to the processor.
❑ Spatial Locality
➢ Move blocks consisting of contiguous words to the upper levels.
[Figure: blocks move between an upper-level block (Blk X) and a lower-level block (Blk Y) on the path to/from the processor.]
❑ Hit Time << Miss Penalty
➢ Hit: data appears in some block in the upper level (Blk X)
▪ Hit rate: fraction of accesses found in the upper level
▪ Hit Time: RAM access time + time to determine hit/miss
➢ Miss: data needs to be retrieve from a lower level block (Blk Y)
▪ Miss rate = #misses / #accesses = 1 – Hit rate
▪ Miss penalty: Time to replace a block in the upper level with a block
from the lower level + Time to deliver this block’s word to the processor
▪ Miss types: Compulsory, Conflict, Capacity
Measuring Cache Performance
❑ The processor stalls on a cache miss
➢ When fetching instructions from the Instruction Cache (I-cache)
➢ When loading data from or storing data to the Data Cache (D-cache)
➢ Miss penalty is assumed equal for I-cache & D-cache
➢ Miss penalty is assumed equal for Load and Store

❑ Components of CPU time:


➢ Program execution cycles (includes cache hit time)
➢ Memory stall cycles (mainly from cache misses)
➢ CPU time = IC × CPIstall × CC = IC × (CPIideal + Memory-stall cycles per instruction) × CC
▪ CPIideal = CPI for an ideal cache (no cache misses)
▪ CPIstall = CPI in the presence of memory stalls
▪ Memory stall cycles increase the CPI!
Memory Stall Cycles
❑ Sum of read-stalls and write-stalls (due to cache misses)
➢ Read-stall cycles = reads/program × read miss rate × read miss penalty
➢ Write-stall cycles = (writes/program × write miss rate × write miss penalty)
+ write buffer stalls

❑ Memory stall cycles = (I-Cache Misses + D-Cache Misses) × Miss Penalty
➢ I-Cache Misses = I-Count × I-Cache Miss Rate
➢ D-Cache Misses = LS-Count × D-Cache Miss Rate
▪ LS-Count (loads & stores) = I-Count × LS Frequency

❑ With simplifying assumptions:
Memory stall cycles = I-Count × misses/instruction × miss penalty
➢ misses/instruction = I-Cache Miss Rate + LS Frequency × D-Cache Miss Rate
➢ Memory stall cycles/instruction = I-Cache Miss Rate × Miss Penalty + LS Frequency × D-Cache Miss Rate × Miss Penalty
➢ For write-through caches: Memory-stall cycles = miss rate × miss penalty
Memory Stall Cycles: example
❑ Example: Compute misses/instruction and memory stall cycles
for a program with the given characteristics
▪ Instruction count (I-Count) = 10⁶ instructions
▪ 30% of instructions are loads and stores
▪ D-cache miss rate is 5% and I-cache miss rate is 1%
▪ Miss penalty is 100 clock cycles for instruction and data caches

❑ Solution:
➢ misses/instruction = 1% + 30% × 5% = 0.025
➢ memory stall cycles/instruction = 0.025 × 100 = 2.5 cycles
➢ total memory stall cycles = 2.5 × 10⁶ = 2,500,000 cycles
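The arithmetic above can be checked with a short Python sketch (the variable names are illustrative, not part of the lecture):

```python
# Sketch of the memory-stall calculation above (illustrative names).
i_count = 1_000_000          # instruction count
ls_frequency = 0.30          # fraction of instructions that are loads/stores
i_miss_rate = 0.01           # I-cache miss rate
d_miss_rate = 0.05           # D-cache miss rate
miss_penalty = 100           # cycles

misses_per_instr = i_miss_rate + ls_frequency * d_miss_rate      # 0.025
stall_cycles_per_instr = misses_per_instr * miss_penalty         # 2.5
total_stall_cycles = i_count * stall_cycles_per_instr            # 2,500,000

print(misses_per_instr, stall_cycles_per_instr, total_stall_cycles)
```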
Impacts of Cache Performance
❑ Relative cache penalty increases as processor performance
improves (faster clock rate and/or lower CPI)
➢ Memory speed is unlikely to improve as fast as processor cycle time → when
calculating CPIstall, the cache miss penalty is measured in processor clock
cycles needed to handle a miss.
➢ The lower the CPIideal, the more pronounced the impact of stalls

❑ Example: Given
▪ I-cache miss rate = 2%, D-cache miss rate = 4%
▪ Miss penalty = 100 cycles
▪ Base CPI (ideal cache) = 2
▪ Load & stores are 36% of instructions
Questions:
➢ What is CPIstall? 2 + (2% + 36% × 4%) × 100 = 5.44; % of time spent on memory stalls = 63%
➢ What if CPIideal is reduced to 1? % of time spent on memory stalls = 77%
➢ What if the processor clock rate is doubled? Miss penalty = 200 cycles, CPIstall = 8.88
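A minimal sketch reproducing these numbers (the function and its parameter names are my own):

```python
# CPIstall = CPIideal + (I-miss rate + LS frequency x D-miss rate) x miss penalty
def cpi_stall(cpi_ideal, i_miss, d_miss, ls_freq, penalty):
    stalls = (i_miss + ls_freq * d_miss) * penalty
    return cpi_ideal + stalls, stalls

cpi, stalls = cpi_stall(2, 0.02, 0.04, 0.36, 100)
print(cpi, stalls / cpi)        # 5.44, ~0.63 -> 63% of time on memory stalls

cpi, stalls = cpi_stall(1, 0.02, 0.04, 0.36, 100)
print(cpi, stalls / cpi)        # 4.44, ~0.77 -> 77% of time on memory stalls

cpi, _ = cpi_stall(2, 0.02, 0.04, 0.36, 200)   # doubled clock rate doubles the penalty
print(cpi)                      # 8.88
```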
Average Memory Access Time (AMAT)
❑ Hit time is also important for performance
➢ A larger cache will have a longer access time → an increase in hit time will
likely add another stage to the pipeline.
➢ At some point, the increase in hit time for a larger cache will overcome the
improvement in hit rate leading to a decrease in performance.

❑ Average Memory Access Time (AMAT) is the average time to access memory considering both hits and misses:
AMAT = Hit time + Miss rate × Miss penalty
❑ Example: Find the AMAT for a cache with
▪ Cache access time (Hit time) of 1 cycle = 2 ns
▪ Miss penalty of 20 clock cycles
▪ Miss rate of 0.05 per access

❑ Solution:
➢ AMAT = 1 + 0.05 × 20 = 2 cycles = 4 ns
➢ Without the cache, AMAT will be equal to miss penalty = 20 cycles = 40 ns
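A quick check of the AMAT numbers, assuming the 2 ns cycle time from the example:

```python
# AMAT = Hit time + Miss rate x Miss penalty (values from the example above).
cycle_time_ns = 2.0
hit_time, miss_rate, miss_penalty = 1, 0.05, 20      # all in clock cycles

amat_cycles = hit_time + miss_rate * miss_penalty    # 2 cycles
print(amat_cycles, amat_cycles * cycle_time_ns)      # 2 cycles = 4 ns
print(miss_penalty * cycle_time_ns)                  # without the cache: 20 cycles = 40 ns
```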
Reducing cache miss rates #1: cache associativity
❑ Allow more flexible block placement
➢ In a direct mapped cache a memory block maps to exactly one cache block
➢ At the other extreme, could allow a memory block to be mapped to any cache
block → fully associative cache (no indexing)

❑ A compromise is to divide the cache into sets, each of which consists of n “ways” (n-way set associative).
➢ A memory block maps to a unique set (specified by the index field) and can
be placed in any way of that set (so there are n choices).
Set index = (block address) modulo (# sets in the cache)
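As a minimal illustration of this mapping (sizes chosen to match the running example: one-word blocks, so the block address equals the word address, and 2 sets), the set index and tag follow directly from the block address:

```python
# Illustrative set-index/tag split: index = block address mod #sets,
# tag = the remaining high-order bits of the block address.
def map_block(block_address, num_sets):
    set_index = block_address % num_sets
    tag = block_address // num_sets
    return set_index, tag

print(map_block(0, 2))   # (0, 0): word 0 -> set 0, tag 000
print(map_block(4, 2))   # (0, 2): word 4 -> set 0, tag 010
```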
❑ Example: consider the main memory word reference for the
following string
0 4 0 4 0 4 0 4
➢ Start with an empty cache - all blocks initially marked as not valid
Set Associative Cache: Example
[Figure: a two-way set-associative cache (2 sets × 2 ways, one-word blocks) next to main memory word addresses 0000xx–1111xx; the two low-order address bits select the byte within the 32-bit word.
Q1: Is it there? Compare all the cache tags in the selected set against the high-order 3 bits of the memory address.
Q2: How do we find it? Use the next low-order memory address bit to select the cache set (i.e., block address modulo the number of sets in the cache).]
Set associative cache example: reference string mapping
0 4 0 4 0 4 0 4
[Figure: the first reference to 0 and the first reference to 4 miss; every later reference hits. Set 0 ends up holding Mem(0) (tag 000) in one way and Mem(4) (tag 010) in the other.]

❑ 8 requests, 2 misses
❑ Solves the ping pong effect in a direct mapped cache due to
conflict misses since now two memory locations that map into
the same cache set can co-exist!
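The miss count can be reproduced with a tiny simulation of a 2-way set-associative cache using LRU replacement within each set (structure and names are illustrative only, not the lecture's code):

```python
# Tiny 2-way set-associative cache simulation for the reference string above.
def simulate(refs, num_sets=2, ways=2):
    sets = [[] for _ in range(num_sets)]   # each set: list of tags, LRU first
    misses = 0
    for addr in refs:                      # one-word blocks: address = block address
        index, tag = addr % num_sets, addr // num_sets
        s = sets[index]
        if tag in s:
            s.remove(tag)                  # hit: re-insert as most recently used
        else:
            misses += 1
            if len(s) == ways:             # set full: evict the least recently used tag
                s.pop(0)
        s.append(tag)                      # most recently used tag goes at the end
    return misses

print(simulate([0, 4, 0, 4, 0, 4, 0, 4]))  # 2 misses out of 8 references
```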
Four-Way Set Associative Cache Organization

[Figure: a four-way set-associative cache organization with 2⁸ = 256 sets, each with four ways (each way holding one block).]
❑ Content Addressable Memory (CAM): a circuit that combines comparison and storage in a single device – supply the data and it looks for a copy and returns the index of the matching row → CAM allows much higher set associativity (8-way and above) than the standard hardware of SRAMs + comparators.
Range of Set Associative Caches
[Figure: the address is split into Tag (used for tag compare), Index (selects the set), Block offset (selects the word in the block), and Byte offset. Associativity increases from direct mapped (only one way: smaller tags, a single comparator) to fully associative (only one set: no index at all – the tag is all the bits except the block and byte offsets).]

❑ For a fixed-size cache, each increase by a factor of two in associativity doubles the number of blocks per set (= the number of ways) and halves the number of sets – it decreases the size of the index by 1 bit and increases the size of the tag by 1 bit, as the sketch below illustrates.
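A small sketch of how the field widths shift (the cache and block sizes here are illustrative assumptions: 32-bit byte addresses, 4-byte words, a 1KB cache with 16-byte blocks):

```python
import math

# Address-field widths for a fixed-size cache as associativity varies.
def field_widths(cache_bytes=1024, block_bytes=16, ways=1, addr_bits=32):
    num_sets = (cache_bytes // block_bytes) // ways
    byte_offset = 2                                   # byte within a 32-bit word
    block_offset = int(math.log2(block_bytes // 4))   # word within the block
    index = int(math.log2(num_sets))                  # selects the set
    tag = addr_bits - index - block_offset - byte_offset
    return tag, index, block_offset, byte_offset

for ways in (1, 2, 4, 8):
    print(ways, field_widths(ways=ways))
# Doubling the ways halves the sets: the index loses 1 bit, the tag gains 1 bit.
```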
Replacement Policy
❑ On a miss, which way’s block do we pick for replacement?
➢ Direct mapped: no choice.
➢ Set associative: non-valid entry, then choose among entries in the set.

❑ First In First Out (FIFO): replace the oldest block in the set
➢ Use one counter per set to specify the oldest block. On a cache miss, replace the block specified by the counter and increment the counter.

❑ Least Recently Used (LRU): replace the one that has been
unused for the longest time
➢ Requires hardware to keep track of when each way’s block was used relative
to the other blocks in the set. For 2-way set associative, takes one bit per set
→ set the bit when a block is referenced (and reset the other way’s bit)
➢ Manageable for 4-way, too hard beyond that.

❑ Random
➢ Gives approximately the same performance as LRU for high associativity.
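A sketch of the one-LRU-bit-per-set scheme described above for 2-way associativity (names and structure are illustrative):

```python
# One LRU bit per set for a 2-way set-associative cache: on a reference,
# record which way was used; on a miss, evict the other way.
num_sets = 4
last_used = [0] * num_sets            # the LRU bit: which way was referenced last

def on_access(set_index, way):
    last_used[set_index] = way        # set the bit for the referenced way

def victim_way(set_index):
    return 1 - last_used[set_index]   # the other way is the LRU victim
```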
How Much Associativity?
❑ Increased associativity decreases miss rate
➢ But with diminishing returns
[Figure: miss rate vs. associativity (1-way, 2-way, 4-way, 8-way) for cache sizes from 4KB to 512KB.]
❑ The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation.
❑ N-way set associative cache costs
➢ N comparators (delay and area)
➢ MUX delay (set selection) before data is available
➢ Data available after set selection and Hit/Miss decision (c.f. direct mapped
cache: the cache block is available before the Hit/Miss decision) → can be
an important consideration (why?).
Reducing Cache Miss Rates #2: multi-level caches
❑ Use multiple levels of caches
➢ Primary (L1) cache attached to CPU
➢ A larger, slower L2 cache services misses from the primary cache. With advancing technology there is more than enough room on the die for an L2, normally a unified cache (i.e., it holds both instructions and data), and in some cases even a unified L3 cache.

❑ Example: Given
▪ CPU base CPI = 1, clock rate = 4GHz
▪ Miss rate/instruction = 2%
▪ Main memory access time = 100ns
Questions:
➢ Compute the actual CPI with just primary cache.
➢ Compute the performance gain if we add L2 cache with
▪ Access time = 5ns
▪ Global miss rate to main memory = 0.5%
Multi-level cache: example solution
❑ With just primary cache
➢ Miss penalty = 100ns/0.25ns = 400 cycles
➢ CPIstall = 1 + 0.02 × 400 = 9

❑ With added L2 cache


➢ Primary miss with L2 hit: penalty = 5ns/0.25ns = 20 cycles
➢ Primary miss with L2 miss: penalty = L2 access stall + Main memory stall =
20 + 400 = 420 cycles
➢ CPIstall = 1 + (0.02 - 0.005) × 20 + 0.005 x 420 = 3.4 cycles
➢ [Alternatively, CPIstall = 1 + L1 stalls/instruction + L2 stalls/instruction = 1 +
0.02 x 20 + 0.005 x 400 = 3.4 cycles]
➢ Performance gain = 9/3.4 ≈ 2.6 times.
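A quick check of this two-level example (variable names are my own):

```python
# Two-level cache example: CPI with L1 only vs. CPI with L1 + L2.
clock_ns = 0.25                  # 4 GHz clock -> 0.25 ns per cycle
base_cpi = 1
l1_miss = 0.02                   # misses per instruction
l2_global_miss = 0.005           # misses per instruction that go to main memory
mem_ns, l2_ns = 100, 5

mem_penalty = mem_ns / clock_ns  # 400 cycles
l2_penalty = l2_ns / clock_ns    # 20 cycles

cpi_l1_only = base_cpi + l1_miss * mem_penalty                               # 9.0
cpi_l1_l2 = base_cpi + l1_miss * l2_penalty + l2_global_miss * mem_penalty   # 3.4
print(cpi_l1_only, cpi_l1_l2, cpi_l1_only / cpi_l1_l2)                       # speed-up ~2.6
```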
Multilevel Cache Design Considerations
❑ Design considerations for L1 and L2 caches are very different
➢ Primary cache should focus on minimizing hit time in support of a shorter
clock cycle → smaller with smaller block sizes.
➢ Secondary cache(s) should focus on reducing miss rate to reduce the
penalty of long main memory access times → larger with larger block sizes &
higher levels of associativity.

❑ The miss penalty of the L1 cache is significantly reduced by the presence of an L2 cache – so the L1 can be smaller (i.e., faster) but have a higher miss rate
❑ For the L2 cache, hit time is less important than miss rate
➢ The L2$ hit time determines L1$’s miss penalty
➢ L2$ local miss rate >> the global miss rate
▪ Local miss rate = fraction of references to one level of a cache that miss
▪ Global miss rate = fraction of references that miss in all levels of a multi-
level cache → dictates how often we must access the main memory.
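For instance, using the numbers from the earlier example, the L2 local miss rate is much larger than its global miss rate (a small illustrative check):

```python
# Only references that miss in L1 ever reach L2, so
# L2 local miss rate = L2 global miss rate / L1 miss rate.
l1_miss_rate = 0.02
l2_global_miss_rate = 0.005

l2_local_miss_rate = l2_global_miss_rate / l1_miss_rate
print(l2_local_miss_rate)   # 0.25 -> 25%, far larger than the 0.5% global rate
```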
Multi-level cache parameters: two real-life examples

L1 cache organization & size – Nehalem: split I$ and D$, 32KB each per core, 64B blocks; Barcelona: split I$ and D$, 64KB each per core, 64B blocks
L1 associativity – Nehalem: 4-way (I), 8-way (D) set assoc., ~LRU replacement; Barcelona: 2-way set assoc., LRU replacement
L1 write policy – Nehalem: write-back, write-allocate; Barcelona: write-back, write-allocate
L2 cache organization & size – Nehalem: unified, 256KB (0.25MB) per core, 64B blocks; Barcelona: unified, 512KB (0.5MB) per core, 64B blocks
L2 associativity – Nehalem: 8-way set assoc., ~LRU; Barcelona: 16-way set assoc., ~LRU
L2 write policy – Nehalem: write-back, write-allocate; Barcelona: write-back, write-allocate
L3 cache organization & size – Nehalem: unified, 8192KB (8MB) shared by cores, 64B blocks; Barcelona: unified, 2048KB (2MB) shared by cores, 64B blocks
L3 associativity – Nehalem: 16-way set assoc.; Barcelona: 32-way set assoc., evicts the block shared by the fewest cores
L3 write policy – Nehalem: write-back, write-allocate; Barcelona: write-back, write-allocate
The Cache Design Space
❑ Several interacting dimensions
➢ cache size
➢ block size
➢ associativity
➢ replacement policy
➢ write-through vs write-back
➢ write allocation
[Figure: the cache design space sketched along the cache size, associativity, and block size axes.]
❑ The optimal choice is a compromise
➢ depends on access characteristics
▪ workload
▪ use (I-cache, D-cache, TLB)
➢ depends on technology / cost
❑ Simplicity often wins
[Figure: a generic trade-off curve from “Good” to “Bad” as a design factor goes from less to more.]
Memory: the next hierarchy

[Figure: the memory hierarchy again. Moving away from the processor – L1$, L2$, Main Memory, Secondary Memory – each level is (relatively) larger and has a longer access time; transfer units grow from 4-8 bytes (word) to 8-32 bytes (block), 1 to 4 blocks, and 1,024+ bytes (disk sector = page). The hierarchy is inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory.]


Virtual Memory
❑ A technique that uses RAM as a “cache” for secondary storage
➢ Allows efficient and safe sharing of memory among multiple programs
➢ Provides the ability to run programs larger than the size of physical memory
➢ Simplifies loading a program for execution by enabling code relocation.

❑ What makes it work? – again the Principle of Locality


➢ A program is likely to access a relatively small portion of its address space
during any period of time

❑ Each program is compiled into its own address space – a “virtual” address space
➢ The processor generates virtual addresses while the memory is accessed
using physical addresses (real locations in main memory) → each virtual
address must be translated to a physical address.
➢ Some chunks of virtual memory can be present on disk, not in main memory.
➢ Multiple programs can use (different chunks of physical) memory at same
time.
Virtual memory: two programs sharing physical memory
❑ A program’s address space is divided into pages (all one fixed
size) or segments (variable sizes)
➢ The starting location of each page (either in main memory or in secondary
memory) is contained in the program’s page table
[Figure: pages of Program 1’s and Program 2’s virtual address spaces map to different physical pages scattered throughout the shared main memory.]
Virtual memory: address translation
❑ Assuming fixed-size pages, each memory request first requires
an address translation from virtual space to physical space
➢ Done by a combination of hardware and software
➢ Page fault: a virtual memory miss (i.e., the page is not in physical memory). The page fault penalty is very costly, often millions of clock cycles.
Address Translation Mechanisms (1)
[Figure: address translation through the page table. The virtual address is split into a virtual page number and a page offset. The page table register points to the page table in main memory; the virtual page number indexes the page table, and each entry holds a valid bit (V) plus the physical page base address. If the valid bit is off, the page is not present in memory and resides on disk. Each page table entry is 32 bits wide: V + 18-bit physical page number + extra bits. The physical page number is concatenated with the offset to form the physical address.]
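A minimal sketch of a single-level page-table lookup along the lines of the figure above (4 KB pages assumed; the page-table contents are made up for illustration):

```python
# Single-level page-table translation sketch (illustrative contents only).
PAGE_SIZE = 4096
OFFSET_BITS = 12

# Each entry: (valid bit, physical page number). An invalid entry would
# point to the page's location on disk in a real system.
page_table = {0: (1, 7), 1: (1, 3), 2: (0, None)}

def translate(virtual_address):
    vpn = virtual_address >> OFFSET_BITS          # virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)    # offset within the page
    valid, ppn = page_table.get(vpn, (0, None))
    if not valid:
        raise RuntimeError("page fault: OS must bring the page in from disk")
    return (ppn << OFFSET_BITS) | offset          # physical page number + offset

print(hex(translate(0x0000_1234)))   # VPN 1 -> PPN 3 -> 0x3234
```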
Replacement and Writes
❑ To reduce page fault rate, prefer least-recently used (LRU)
replacement
➢ Reference bit (aka use bit) in the page table entry set to 1 on access to
page
➢ Periodically cleared to 0 by OS
➢ A page with reference bit = 0 means it has not been used recently

❑ Disk writes take millions of cycles


➢ Write a whole block (page) at a time, not individual locations
➢ Write through is impractical
➢ Use write-back
➢ Dirty bit in the page table entry set when page is written
Handling page fault & space optimization
❑ A page fault is like a cache miss
➢ Must find page in lower level of hierarchy
➢ If valid bit is zero, the Physical Page Number points to a page on disk

❑ When OS starts new process, it creates space on disk for all the
pages of the process (all valid bits in page table = zero)
➢ called Demand Paging - pages of the process are loaded from disk only as
needed

❑ Page Table too big!
➢ 4GB virtual address space ÷ 4KB pages = 1 million page table entries
➢ At 4 bytes (32 bits) per entry, that is ≈ 4 MB just for the page table of a single process! (Checked in the sketch below.)

❑ Variety of solutions to trade off Page Table size for slower performance
➢ E.g., Multi-level page table, Paging page tables, etc.
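A back-of-the-envelope check of that size (assuming 4-byte page table entries, consistent with the 32-bit-wide entries shown earlier):

```python
# Page-table size for one process with a flat, single-level page table.
virtual_space = 2**32          # 4 GB virtual address space
page_size = 4 * 1024           # 4 KB pages
pte_size = 4                   # 4 bytes (32 bits) per page-table entry

entries = virtual_space // page_size       # 1,048,576 (~1 million) entries
table_bytes = entries * pte_size           # 4 MB per process
print(entries, table_bytes / 2**20)
```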
Address translation optimization
❑ Virtual Memory would appear to require extra memory
references
➢ one to translate Virtual Address into Physical Address (page table lookup) -
Page Table is in physical memory
➢ one to transfer the actual data (hopefully cache hit)

❑ But access to page tables has good locality


➢ So use a fast cache of page table entries within the CPU
[Figure: a TLB entry holds a valid bit, the virtual page number, the physical page number, and dirty/reference/access bits.]
➢ Called a Translation Look-aside Buffer (TLB)


➢ Typical: 16–512 entries, 0.5–1 cycle for hit, 10–100 cycles for miss, 0.01%–
1% miss rate
➢ Misses could be handled by hardware or software
Making Address Translation Fast (2)
A TLB in the Memory Hierarchy
[Figure: the CPU presents a virtual address (VA) to the TLB. On a TLB hit, the physical address (PA) goes straight to the cache (and on to main memory if the cache misses); on a TLB miss, the translation must first be fetched from the page table before the data access proceeds.]

❑ A TLB miss – is it a page fault or merely a TLB miss?


➢ If the page is loaded into main memory → TLB miss can be handled by
loading the translation information from the page table into the TLB (takes
10’s of cycles to find and load the translation info into the TLB)
➢ If the page is not in main memory, then it’s a true page fault (takes millions of
cycles to service a page fault)

❑ TLB misses are much more frequent than true page faults
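A sketch of this TLB-first lookup path (a tiny fully associative TLB modeled as a dictionary; structure and sizes are illustrative only):

```python
# TLB-first address translation sketch (illustrative, not a real TLB design).
tlb = {}            # VPN -> PPN: a small cache of recent translations
OFFSET_BITS = 12

def translate_with_tlb(virtual_address, page_table):
    vpn = virtual_address >> OFFSET_BITS
    if vpn in tlb:                                 # TLB hit: no page-table access needed
        ppn = tlb[vpn]
    else:                                          # TLB miss: walk the page table (10s of cycles)
        valid, ppn = page_table.get(vpn, (0, None))
        if not valid:
            raise RuntimeError("page fault")       # millions of cycles, handled by the OS
        tlb[vpn] = ppn                             # refill the TLB with the translation
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    return (ppn << OFFSET_BITS) | offset

# Example: first access misses in the TLB, a repeat access would hit.
print(hex(translate_with_tlb(0x1234, {1: (1, 3)})))   # 0x3234
```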
Summary: steps in memory access
