
CSE 132: Computer Architecture
Dr. Gamal Fahmy
Lecture 8: Memory Hierarchy Design

Memory Hierarchy Design
• As time goes on, programmers want effectively unlimited memory
• Large memories are expensive, inefficient, and slow
• A memory hierarchy is the practical solution to the cost-performance
  trade-off among memory technologies
• It exploits the principle of locality
Memory Hierarchy Design
• Principle of locality
• Amdahl's law states that the performance improvement to be gained
  from using some faster mode of execution is limited by the fraction
  of the time the faster mode can be used

  Overall speedup = 1 / ((1 - f) + f / s)

  where f is the fraction of the time the faster mode can be used and
  s is the speedup of the faster mode

• For a computer that runs 10% of its code 90 times faster,
  speedup = 1.1097
• For a computer that runs 90% of its code 10 times faster,
  speedup = 5.2631
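To make these numbers concrete, here is a minimal sketch (not part of the
original slides; Python is my choice) that evaluates Amdahl's law for both
cases:

    def amdahl_speedup(f, s):
        """Overall speedup when a fraction f of execution time runs s times faster."""
        return 1.0 / ((1.0 - f) + f / s)

    print(amdahl_speedup(0.10, 90))   # 10% of the time sped up 90x -> ~1.1097
    print(amdahl_speedup(0.90, 10))   # 90% of the time sped up 10x -> ~5.2631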
Memory Hierarchy Design
• 90/10 rule comes from empirical observation:
"A program spends 90% of its time in 10% of its
code"
• we can predict with reasonable accuracy what instructions and data
a program will use in the near future based on its accesses in the
recent past.

• Two types of locality:
  • Temporal locality: recently accessed items are likely to be
    accessed again in the near future.
  • Spatial locality: items whose addresses are near one another tend
    to be referenced close together in time.
Memory Hierarchy Design
• Smaller, faster memories hold the most recently accessed items close
  to the CPU, backed by successively larger (and slower, but less
  expensive per byte) memories as we move away from the CPU.

• This type of organization is called a memory hierarchy.
• Two important levels of the memory
hierarchy are the cache and virtual
memory.
Memory Hierarchy Design

CPU (registers) -> Cache -> Memory -> I/O devices

        Registers   Cache    Memory   I/O devices
Size    200 B       64 KB    32 MB    2 GB
Speed   5 ns        10 ns    100 ns   5 ms

Upper level (closest to CPU)  ->  Lower level

Hits, misses?
Memory Hierarchy Design
• To evaluate the effectiveness of the
memory hierarchy we can use the formula:
• Memory_stall_cycles =

IC * Mem_Refs * Miss_Rate * Miss_Penalty

• where IC = Instruction count


• Mem_Refs = Memory References per Instruction
• Miss_Rate = the fraction of accesses that are not in the cache
• Miss_Penalty = the additional time to service the miss
Example
• A computer has a CPI of 1.0, memory accesses (loads and stores) make
  up 50% of the instructions, the miss penalty is 25 clock cycles, and
  the miss rate is 2%. How much faster would it be with no misses?

• CPU execution time = (CPU clock cycles + Memory stall cycles)
  * clock cycle time
• Memory stall cycles = IC * (Memory accesses / Instruction)
  * miss rate * miss penalty
• CPU execution time with no misses = (CPU clock cycles)
  * clock cycle time
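A short sketch (my own addition, continuing the Python style used above)
that plugs the example's numbers into these formulas:

    cpi = 1.0
    mem_refs_per_instr = 0.5
    miss_rate = 0.02
    miss_penalty = 25          # clock cycles

    # memory stall cycles per instruction = Mem_Refs * Miss_Rate * Miss_Penalty
    stalls_per_instr = mem_refs_per_instr * miss_rate * miss_penalty   # 0.25
    cpi_with_misses = cpi + stalls_per_instr                           # 1.25

    # the machine with no misses runs at CPI = 1.0, so it is 1.25x faster
    speedup_of_perfect_cache = cpi_with_misses / cpi
    print(cpi_with_misses, speedup_of_perfect_cache)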
Memory Hierarchy Design
• Four main issues to consider when designing a hierarchical memory:

  – Block placement
  – Block identification
  – Block replacement
  – Cache-main memory interaction on writes
Block Place
• Three methods to place blocks in the cache (a small sketch follows
  the placement figures below):
  – Direct mapped: a block has exactly one possible frame in the cache
    » cache frame = (Block address) MOD (Number of blocks in cache)
  – Fully associative: a block can be placed in any frame in the cache
  – Set associative: a block can be placed in any frame within one set
    » set = (Block address) MOD (Number of sets in cache)
Block Place - Direct mapped
[Figure: each main-memory block maps to exactly one of the eight cache
frames (frame0-frame7)]

Block Place - Fully associative
[Figure: a main-memory block may be placed in any of the eight cache
frames]

Block Place - Set associative
[Figure: the eight cache frames are grouped into four sets (Set 0-Set 3)
of two frames each; a memory block maps to one set but may occupy either
frame of that set]
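As promised above, a small sketch (my own; it assumes the 8-frame cache
drawn in the figures, with 4 sets of 2 frames for the set-associative case)
that computes the candidate frame(s) for a block address under each
placement scheme:

    NUM_FRAMES = 8   # cache frames in the figures (assumption from the diagrams)
    NUM_SETS = 4     # set-associative case: 4 sets of 2 frames each

    def direct_mapped_frame(block_address):
        return block_address % NUM_FRAMES            # exactly one candidate frame

    def fully_associative_frames(block_address):
        return list(range(NUM_FRAMES))               # any frame is a candidate

    def set_associative_frames(block_address):
        s = block_address % NUM_SETS                 # the set the block maps to
        return [2 * s, 2 * s + 1]                    # both frames of that set

    print(direct_mapped_frame(12))        # 12 mod 8 = frame 4
    print(set_associative_frames(12))     # 12 mod 4 = set 0 -> frames [0, 1]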
Block Identification
• Cache memory consists of two portions:
  • Directory
    - Address tags (checked against the block address from the CPU)
    - Control bits (indicate whether the content of a block frame is valid)
  • RAM
    - Block frames (contain the data of the blocks)

• For a system with a 16 MB memory, a 512 KB cache, and a 32 B block
  size, what does the address look like for the fully associative,
  direct mapped, and set associative (set size of 2 blocks) organizations?
Block Identification
• Direct mapped
  » Memory size = 16 MB = 2**24 bytes, so the address is 24 bits
  » Block size = 32 B = 2**5 bytes, so the offset is 5 bits
  » Number of blocks in cache = cache size / block size
    = 512 KB / 32 B = 2**14, so the index is 14 bits
  » Number of bits in tag = 24 - 14 - 5 = 5

  | tag (5 bits) | index (14 bits) | offset (5 bits) |   (24-bit address)
Block Identification
• Fully associative
  » Memory size = 16 MB = 2**24 bytes, so the address is 24 bits
  » Block size = 32 B = 2**5 bytes, so the offset is 5 bits
  » Number of bits in tag = 24 - 5 = 19

  | tag (19 bits) | offset (5 bits) |   (24-bit address)
Block Identification
• Set associative (2 blocks per set)
  » Memory size = 16 MB = 2**24 bytes, so the address is 24 bits
  » Block size = 32 B = 2**5 bytes, so the offset is 5 bits
  » Number of sets in cache = cache size / (set size * block size)
    = 512 KB / (2 * 32 B) = 2**13, so the index is 13 bits
  » Number of bits in tag = 24 - 13 - 5 = 6

  | tag (6 bits) | index (13 bits) | offset (5 bits) |   (24-bit address)
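A small sketch (my own addition; the helper name is made up) that derives
these field widths from the memory, cache, block, and set-size parameters:

    from math import log2

    def address_fields(memory_bytes, cache_bytes, block_bytes, blocks_per_set):
        """Return (tag, index, offset) bit widths for a byte-addressed memory.
        blocks_per_set = 1 gives direct mapped; blocks_per_set equal to all
        blocks in the cache gives fully associative (index shrinks to 0 bits)."""
        address_bits = int(log2(memory_bytes))
        offset_bits = int(log2(block_bytes))
        num_sets = cache_bytes // (blocks_per_set * block_bytes)
        index_bits = int(log2(num_sets))
        tag_bits = address_bits - index_bits - offset_bits
        return tag_bits, index_bits, offset_bits

    MB, KB = 2**20, 2**10
    print(address_fields(16*MB, 512*KB, 32, 1))              # direct mapped: (5, 14, 5)
    print(address_fields(16*MB, 512*KB, 32, 512*KB // 32))   # fully associative: (19, 0, 5)
    print(address_fields(16*MB, 512*KB, 32, 2))              # 2-way set associative: (6, 13, 5)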
Block Replacement
• When a miss occurs, the cache controller must select a block to be
  replaced with the desired data.
• With direct-mapped placement the decision is simple: there is no
  choice, since only one block frame is checked for a hit and only
  that block can be replaced.
• With fully-associative or set-associative placement, there is more
  than one block to choose from on a miss.
• Strategies
» First In First Out (FIFO)
» Most-Recently Used (MRU)
» Least-Frequently Used (LFU)
» Most-Frequently Used (MFU)
» Random
» Least-Recently Used (LRU)
Block Replacement
• The two most popular strategies:
  – Random: candidate blocks are selected at random, spreading
    allocation uniformly.
    Advantage: simple to implement in hardware
    Disadvantage: ignores the principle of locality

  – Least-Recently Used (LRU): the block replaced is the one that has
    been unused for the longest time.
    Advantage: takes locality into account
    Disadvantage: as the number of blocks to keep track of grows, LRU
    becomes more expensive (harder to implement, slower, and often only
    approximated).
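For illustration, a small sketch (my own, not from the slides; the
reference string is made up) showing that FIFO and LRU can evict different
blocks from the same fully associative 3-frame cache:

    from collections import deque, OrderedDict

    refs = [1, 2, 3, 1, 4]          # hypothetical block reference string
    FRAMES = 3

    fifo, fifo_misses = deque(), 0
    lru, lru_misses = OrderedDict(), 0

    for b in refs:
        # FIFO: evict the block that entered the cache first
        if b not in fifo:
            fifo_misses += 1
            if len(fifo) == FRAMES:
                fifo.popleft()
            fifo.append(b)
        # LRU: evict the block unused for the longest time
        if b in lru:
            lru.move_to_end(b)
        else:
            lru_misses += 1
            if len(lru) == FRAMES:
                lru.popitem(last=False)
            lru[b] = True

    print(fifo_misses, list(fifo))   # FIFO evicts block 1 when 4 arrives -> [2, 3, 4]
    print(lru_misses, list(lru))     # LRU evicts block 2 (least recently used) -> [3, 1, 4]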
Example
• A computer has a 256 KW main memory and a 4 KW cache organized as
  set associative with 4 block frames per set and 64 words per block.
  The cache is 10 times faster than main memory. Starting with an
  empty cache, we fetch 4352 words from locations 0, 1, …, 4351 in
  order, then repeat the same sequence 14 more times. Specify the tag,
  index, and offset fields and estimate the speedup from the cache
  using LRU replacement.

  Memory size = 256 KW = 2**18 words, so the address is 18 bits
  Block size = 64 W = 2**6 words, so the offset is 6 bits
  Number of sets in cache = 4 KW / (4 * 64 W) = 2**4, so the index is 4 bits
  Number of bits in tag = 18 - 4 - 6 = 8
Example
Cache structure: 16 sets (S0-S15), each with 4 block frames (F0-F3)

The cache is 10 times faster than main memory. Assuming cache access
time = t, memory access time = 10*t.
We have 15 iterations; the 4352 words span 4352 / 64 = 68 blocks.
Total time for fetches without cache = Ninstr * memory access time
* Niter = 4352 * 10*t * 15 = 652800*t
Total time for fetches with cache = time for fetches from cache
(on hits) + time for fetches from memory (on misses)
Example
• Time for fetches from cache (on hits) =
  Ninstr * cache access time * Niter
• Time for fetches from memory (on misses) =
  Nmisses * miss penalty
• Miss penalty = block size * memory access time = 64 * 10 * t = 640*t

• First iteration: Nmiss = 68 (every block is fetched once)
• Second iteration: Nmiss = 20 (the 68 blocks spread over 16 sets, so
  sets 0-3 receive 5 blocks each, one more than the 4 frames available;
  with LRU each of those 5 blocks misses again, giving 4 * 5 = 20 misses)
• Third iteration: Nmiss = 20, and so on
• Nmisses = 68 + 20*14 = 348
• Total time for fetches with cache = 4352*15*t + 348*(64*10*t)
  = 65280*t + 222720*t = 288000*t
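As a cross-check on these counts, here is a small simulation (my own
addition, not from the slides) of the example's 16-set, 4-way LRU cache;
it reproduces the 348 misses:

    from collections import OrderedDict

    BLOCK_WORDS = 64                 # words per block
    NUM_SETS = 16                    # 4 KW / (4 frames * 64 words)
    ASSOC = 4                        # block frames per set

    # one OrderedDict per set: keys are tags, ordered from LRU to MRU
    cache = [OrderedDict() for _ in range(NUM_SETS)]
    misses = 0

    for _ in range(15):                          # 15 passes over the same addresses
        for addr in range(4352):                 # word addresses 0..4351
            block = addr // BLOCK_WORDS
            index = block % NUM_SETS
            tag = block // NUM_SETS
            s = cache[index]
            if tag in s:
                s.move_to_end(tag)               # hit: mark block most recently used
            else:
                misses += 1                      # miss: fetch block from memory
                if len(s) == ASSOC:
                    s.popitem(last=False)        # evict the least recently used block
                s[tag] = True

    print(misses)                                # 68 + 20*14 = 348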
Example
• Speedup = total time for fetches without cache / total time for
  fetches with cache = 652800*t / 288000*t ≈ 2.27

• Repeat the same problem using MRU replacement (take-home quiz)

Memory Interaction with Cache
• Reads dominate processor cache accesses: all instruction accesses
  are reads, and most instructions do not write to memory.
• The read policies on a miss are:
  Read Through - the requested word is read from main memory directly
  to the CPU
  No Read Through - the whole block is read from main memory into the
  cache, and the word is then supplied from the cache to the CPU
Memory Interaction with Cache
• The write policies on write hit often distinguish
cache designs:
• Write Through - the information is written to both
the block in the cache and to the block in the
lower-level memory.
• Advantage:
- read miss never results in writes to main memory
- easy to implement
- main memory always has the most current copy of the data
(consistent)

• Disadvantage:
- write is slower
- every write needs a main memory access
- as a result uses more memory bandwidth
Memory write on a hit (cont.)
• Write back - the information is written only to
the block in the cache. The modified cache
block is written to main memory only when it is
replaced.

• Advantage:
- writes occur at the speed of the cache memory
- multiple writes within a block require only one write to main
memory
- as a result uses less memory bandwidth
• Disadvantage:
- harder to implement
- main memory is not always consistent with cache
- reads that result in replacement may cause writes to main
memory
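As a rough illustration (my own, not from the slides) of the bandwidth
difference, consider how many main-memory writes a run of write hits to a
single cached block generates under each policy:

    def memory_writes_write_through(writes_to_block):
        # write through: every write hit is also written to main memory
        return writes_to_block

    def memory_writes_write_back(writes_to_block):
        # write back: hits only set the dirty bit; main memory is written once,
        # when the dirty block is eventually replaced
        return 1 if writes_to_block > 0 else 0

    for n in (1, 4, 16):
        print(n, "writes ->",
              memory_writes_write_through(n), "memory writes (write through) vs",
              memory_writes_write_back(n), "memory write on eviction (write back)")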
Memory write on a miss (cont.)
• Write Allocate - the block is loaded on a write miss,
followed by the write-hit action.

• No Write Allocate - the block is modified in the main


memory and not loaded into the cache.

• Although either write-miss policy could be used with


write through or write back, write-back caches generally
use write allocate (hoping that subsequent writes to that
block will be captured by the cache) and
write-through caches often use no-write allocate
(since subsequent writes to that block will still have to go
to memory).
Memory Interaction with Cache
Possible combinations of interaction policies
with main memory on write.

Write hit policy Write miss policy

Write Through Write Allocate

Write Through No Write Allocate

Write Back Write Allocate

Write Back No Write Allocate


Memory Interaction with Cache
• Write Through with Write Allocate:

• on hits it writes to cache and main memory

• on misses it updates the block in main memory and


brings the block to the cache

• Bringing the block to cache on a miss does not make a


lot of sense in this combination because the next hit to
this block will generate a write to main memory anyway
(according to Write Through policy)
Memory Interaction with Cache
• Write Through with No Write Allocate:

• on hits it writes to cache and main memory;

• on misses it updates the block in main memory not


bringing that block to the cache;

• Subsequent writes to the block will update main memory anyway,
  because the Write Through policy is employed. So some time is saved
  by not bringing the block into the cache on a miss, since doing so
  would provide little benefit.
Memory Interaction with Cache
• Write Back with Write Allocate:
• on hits it writes to cache setting “dirty” bit for the block,
main memory is not updated;

• on misses it updates the block in main memory and


brings the block to the cache;

• Subsequent writes to the same block (the one that originally caused
  the miss) will hit in the cache, setting the dirty bit for the block.
  That eliminates extra memory accesses and results in very efficient
  execution compared with the Write Through with Write Allocate
  combination.
Memory Interaction with Cache
• Write Back with No Write Allocate:

• on hits it writes to cache setting “dirty” bit for the block,


main memory is not updated;

• on misses it updates the block in main memory not


bringing that block to the cache;

• Subsequent writes to the same block (the one that originally caused
  the miss) will keep missing, resulting in very inefficient execution.
Memory Interaction with Cache
Possible combinations of interaction policies
with main memory on write.

Write hit policy Write miss policy

4 Write Through Write Allocate

1 Write Through No Write Allocate

2 Write Back Write Allocate

3 Write Back No Write Allocate
