
Memory Hierarchy-II

1
Memory Hierarchy

o Motivation
m Exploiting locality to provide a large, fast and
inexpensive memory
2
Cache Basics
o Cache is a high speed buffer between
CPU and main memory
o Memory is divided into blocks
m Q1: Where can a block be placed in the upper
level? (Block placement)
m Q2: How is a block found if it is in the upper
level? (Block identification)
m Q3: Which block should be replaced on a
miss? (Block replacement)
m Q4: What happens on a write? (Write strategy)

3
Q1: Block Placement
o Fully associative, direct mapped, set
associative
m Example: Block 12 placed in 8 block cache:
n Mapping = Block Number Modulo Number Sets
n Direct mapped: (12 mod 8) = 4, so block 12 can go only in cache block 4
n 2-way set associative: (12 mod 4) = 0, so block 12 can go in either block of set 0
n Fully associative: block 12 can go in any of the 8 cache blocks

[Figure: an 8-block cache (blocks 0-7) above a 32-block memory (blocks 0-31), showing the three placements of block 12]
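A minimal sketch of the placement arithmetic (the C program and its constants are illustrative, not from the slides):

#include <stdio.h>

/* Where memory block 12 lands in an 8-block cache under the three policies.
 * Mapping = block number modulo number of sets. */
int main(void) {
    unsigned block = 12, cache_blocks = 8;

    unsigned dm_block = block % cache_blocks;        /* direct mapped: 8 sets of 1 block */
    unsigned sa_set   = block % (cache_blocks / 2);  /* 2-way: 4 sets of 2 blocks        */

    printf("direct mapped: block %u -> cache block %u\n", block, dm_block); /* 4 */
    printf("2-way assoc:   block %u -> set %u\n", block, sa_set);           /* 0 */
    printf("fully assoc:   block %u -> any of the %u blocks\n", block, cache_blocks);
    return 0;
}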

4
Q2: Block Identification
o Tag on each block
m No need to check index or block offset
o Increasing associativity ⇒ shrinks index ⇒ expands tag

[Figure: address breakdown — the Block Address consists of Tag and Index, followed by the Block Offset]
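A small sketch of splitting an address into tag, index and offset (the 32-bit address and the 6-bit offset / 7-bit index widths are assumptions for illustration):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x12345678;        /* example memory address          */
    const uint32_t offset_bits = 6;    /* 64-byte blocks (assumed)        */
    const uint32_t index_bits  = 7;    /* 128 sets (assumed)              */

    uint32_t offset = addr & ((1u << offset_bits) - 1);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);

    /* only the tag is stored and compared; the index selects the set, the offset the byte */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}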

5
Q3: Block Replacement
o Easy for direct-mapped caches
o Set associative or fully associative:
m Random
n Easy to implement
m LRU (Least Recently Used)
n Relying on past to predict future, hard to implement
m FIFO
n Sort of approximate LRU
m Not Recently Used
n Maintain reference bits and dirty bits; clear reference bits
periodically; divide all blocks into four categories; choose one
from the lowest category
m Optimal replacement?
n Label the blocks in cache by the number of instructions to be
executed before that block is referenced. Then choose a
block with the highest label
n Unrealizable!
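A minimal sketch of LRU within one set, using a per-block timestamp (the 4-way layout and the timestamp field are assumptions, not the slides' design):

#include <stdbool.h>
#include <stdint.h>

#define WAYS 4

struct cache_line { bool valid; uint32_t tag; uint64_t last_used; };

/* Return the way to evict: the least recently used block in the set. */
int lru_victim(const struct cache_line set[WAYS]) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (set[w].last_used < set[victim].last_used)
            victim = w;
    return victim;
}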
6
Q4: Write Strategy
Policy
  Write-Through: data written to the cache block is also written to lower-level memory
  Write-Back: write data only to the cache; update the lower level when the block falls out of the cache
Implementation
  Write-Through: easy; Write-Back: hard
Do read misses produce writes?
  Write-Through: no; Write-Back: yes
Do repeated writes make it to the lower level?
  Write-Through: yes; Write-Back: no
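A toy sketch of the two policies on a write hit (the one-block "cache" and flat memory array are assumptions made only to keep the example small):

#include <stdbool.h>
#include <stdint.h>

static uint32_t memory[1024];
static struct { uint32_t addr, data; bool dirty; } cache_block;

void write_hit(uint32_t addr, uint32_t data, bool write_through) {
    cache_block.addr = addr;
    cache_block.data = data;           /* always update the cache block         */
    if (write_through)
        memory[addr] = data;           /* write-through: update lower level too */
    else
        cache_block.dirty = true;      /* write-back: defer until eviction      */
}

void evict_block(void) {
    if (cache_block.dirty)             /* write-back pays the write here        */
        memory[cache_block.addr] = cache_block.data;
    cache_block.dirty = false;
}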

7
Write Buffers
[Diagram: Processor → Cache → Write Buffer → Lower-Level Memory]

Write-through cache: the write buffer holds data awaiting write-through to lower-level memory.

Q. Why a write buffer? A. So the CPU doesn't stall.
Q. Why a buffer, why not just one register? A. Bursts of writes are common.
Q. Are Read After Write (RAW) hazards an issue for the write buffer? A. Yes! Drain the buffer before the next read, or check the write buffer before the read and perform the read only when there is no conflict.
8
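A sketch of the "check the write buffer before the read" option mentioned above (the 4-entry buffer and its field names are assumptions):

#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4

struct wb_entry { bool valid; uint32_t addr; uint32_t data; };
static struct wb_entry write_buffer[WB_ENTRIES];

/* On a read miss, scan the write buffer first; if the address is pending,
 * forward its data and skip the memory read, avoiding the RAW hazard. */
bool wb_forward(uint32_t addr, uint32_t *data_out) {
    for (int i = 0; i < WB_ENTRIES; i++) {
        if (write_buffer[i].valid && write_buffer[i].addr == addr) {
            *data_out = write_buffer[i].data;
            return true;               /* conflict found: serve from the buffer */
        }
    }
    return false;                      /* no conflict: safe to read lower level */
}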
Cache Performance
o Average memory access time
m Time_total mem access = N_hit × T_hit + N_miss × T_miss
                        = N_mem access × T_hit + N_miss × T_miss penalty

m AMAT = T_hit + miss rate × T_miss penalty

o Miss penalty: time to replace a block from the lower
level, including the time to deliver it to the CPU
m Access time: time to reach the lower level (latency)
m Transfer time: time to transfer the block (bandwidth)
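A one-line helper mirroring the AMAT formula (a sketch; the values in the comment match the example on the next slide):

/* AMAT in ns: amat(1, 0.44, 60) == 27.4 and amat(2, 0.37, 60) == 24.2 */
double amat(double t_hit, double miss_rate, double t_miss_penalty) {
    return t_hit + miss_rate * t_miss_penalty;
}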

9
Performance Example
o Two data caches (assume one clock cycle for hit)
m I: 8KB, 44% miss rate, 1ns hit time
m II: 64KB, 37% miss rate, 2ns hit time
m Miss penalty: 60ns; 30% of instructions are memory accesses

m AMAT_I = 1ns + 44% × 60ns = 27.4ns

m AMAT_II = 2ns + 37% × 60ns = 24.2ns

m Larger cache ⇒ smaller miss rate but longer T_hit ⇒ reduced AMAT

10
Miss Penalty in OOO Environment
o In processors with out-of-order execution
m Memory accesses can overlap with other
computation
m Latency of memory accesses is not always
fully exposed

m E.g. 8KB cache, 44% miss rate, 1ns hit time,
miss penalty: 60ns, only 70% exposed on average

m AMAT = 1ns + 44% × (60ns × 70%) = 19.5ns

11
Cache Performance Optimizations
o Performance formulas
m AMAT = T_hit + miss rate × T_miss penalty
o Reducing miss rate
m Change cache configurations, compiler optimizations
o Reducing hit time
m Simple cache, fast access and address translation
o Reducing miss penalty
m Multilevel caches, read and write policies
o Taking advantage of parallelism
m Cache serving multiple requests simultaneously
m Prefetching

12
Cache Miss Rate
o Three C’s
o Compulsory misses (cold misses)
m The first access to a block: miss regardless of cache
size
o Capacity misses
m Cache too small to hold all data needed
o Conflict misses
m More blocks mapped to a set than the associativity
o Reducing miss rate
m Larger block size (compulsory)
m Larger cache size (capacity, conflict)
m Higher associativity (conflict)
m Compiler optimizations (all three)
13
Miss Rate vs. Block Size

o Larger blocks: compulsory misses reduced, but may increase conflict misses or even capacity misses if the cache is small; may also increase miss penalty
14
Reducing Cache Miss Rate
o Larger cache
m Fewer capacity misses
m Fewer conflict misses
n Implies less competition for the same set (a similar effect to higher associativity)
m Has to balance hit time, energy consumption, and cost
o Higher associativity
m Fewer conflict misses
m Miss rate (2-way, size X) ≈ Miss rate (direct-mapped, size 2X)

m Similarly, need to balance hit time and energy consumption: diminishing returns on reducing conflict misses

15
Reducing Cache Miss Penalty
o A difficult decision is
m whether to make the cache hit time fast, to keep pace with the high
clock rate of processors,
m or to make the cache large to reduce the gap between the
processor accesses and main memory accesses.

o Solution:
m Use multi-level cache:
n The first-level cache can be small enough to match a fast clock
cycle time.
n Higher-level caches can be large enough to capture many
accesses that would otherwise go to main memory.
n Multilevel caches are more power-efficient than a single
aggregated cache.

16
Compiler Optimizations for Cache
o Increasing locality of programs
m Temporal locality, spatial locality
o Rearrange code
m Targeting instruction cache directly
m Reorder instructions based on the set of data accessed
o Reorganize data
m Padding to eliminate conflicts (see the sketch after this list):
n Change the address of two variables such that they do not map to
the same cache location
n Change the size of an array via padding
m Group data that tend to be accessed together in one block
o Example optimizations
m Merging arrays, loop interchange, loop fusion
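A minimal sketch of the padding idea from the list above (the sizes and the 16-int pad are assumptions, and it assumes the two arrays are laid out consecutively):

#define N   (64 * 1024)
#define PAD 16                /* 16 ints = one 64-byte block (assumed)      */

int a[N + PAD];               /* the pad shifts where b starts, so a[i] and */
int b[N];                     /* b[i] no longer map to the same cache sets  */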

17
Merging Arrays
/* Before: 2 sequential arrays */
int val[SIZE];
int key[SIZE];

/* After: 1 array of structures */
struct merge {
    int val;
    int key;
};
struct merge merged_array[SIZE];
o Improve spatial locality
m If val[i] and key[i] tend to be accessed together
o Reducing conflicts between val & key
18
Loop Interchange
o Idea: switching the nesting order of two or
more loops

m Sequential accesses instead of striding through memory; improved spatial locality (see the sketch below)
o Safety of loop interchange
m Need to preserve true data dependences
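A representative sketch in the style of the other examples (the array x, its size N, and the doubling operation are assumptions):

#define N 1024
double x[N][N];

/* Before: the inner loop strides through memory (C arrays are row-major) */
void scale_before(void) {
    for (int j = 0; j < N; j = j+1)
        for (int i = 0; i < N; i = i+1)
            x[i][j] = 2 * x[i][j];
}

/* After interchange: sequential accesses, improved spatial locality */
void scale_after(void) {
    for (int i = 0; i < N; i = i+1)
        for (int j = 0; j < N; j = j+1)
            x[i][j] = 2 * x[i][j];
}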

19
Loop Fusion
o Takes multiple compatible loop nests and
combines their bodies into one loop nest
m Is legal if no data dependences are reversed
o Improves locality directly by merging accesses to
the same cache line into one loop iteration
/* Before */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1)
        a[i][j] = 1/b[i][j] * c[i][j];
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1)
        d[i][j] = a[i][j] + c[i][j];

/* After */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1) {
        a[i][j] = 1/b[i][j] * c[i][j];
        d[i][j] = a[i][j] + c[i][j];
    }
20
Seminar

o Pipelining Cache

o Prefetching Cache

21
Main Memory Background
o Main memory performance
m Latency: cache miss penalty
n Access time: time between request and word arrives
n Cycle time: time between requests
m Bandwidth: matters for multiprocessors, I/O, and large-block miss penalties
o Main memory technology
m Memory is DRAM: Dynamic Random Access Memory
n Dynamic since needs to be refreshed periodically
n Requires data to be written back after being read
n Concerned with cost per bit and capacity
m Cache is SRAM: Static Random Access Memory
n Concerned with speed and capacity

22
Memory vs. Virtual Memory
o Analogy to cache
m Size: cache << memory << address space
m Both provide big and fast memory - exploit locality

m Both need a policy - 4 memory hierarchy questions

o Difference from cache


m Cache primarily focuses on speed
m VM facilitates transparent memory management
n Providing large address space
n Sharing, protection in multi-programming environment

23
Four Memory Hierarchy Questions
o Where can a block be placed in main memory?
m OS allows block to be placed anywhere: fully
associative
n No conflict misses;
o Which block should be replaced?
m An approximation of LRU: true LRU too costly and
adds little benefit
n A reference bit is set if a page is accessed
n The bit is shifted into a history register periodically
n When replacing, choose the page with the smallest value in its
history register
o What happens on a write?
m Write back: write through is prohibitively expensive

24
Four Memory Hierarchy Questions
o How is a block found in main memory?
m Use page table to translate virtual address into
physical address
• 32-bit virtual address, page size: 4KB, 4 bytes per page table entry; page table size?
• (2^32 / 2^12) × 2^2 = 2^22 bytes, or 4MB

25
Fast Address Translation
o Motivation
m Page table is too large to be stored in cache
n May even span multiple pages itself
m Multiple page table levels
o Solution: exploit locality and cache recent
translations

[Figure: example address translation through four page table levels]

26
Fast Address Translation
o TLB: translation look-aside buffer
m A special fully-associative cache for recent translation
m Tag: virtual address
m Data: physical page frame number, protection field,
valid bit, use bit, dirty bit

o Translation
m Send the virtual address to all tags
m Check for protection violations
m The matching tag sends the physical page frame number
m Combine with the page offset to get the full physical address
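A sketch of a TLB entry and a fully associative lookup (the 64-entry size, 4KB pages, and field widths are assumptions, not from the slides):

#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64

struct tlb_entry {
    bool     valid, use, dirty;   /* valid, use and dirty bits     */
    uint8_t  prot;                /* protection field              */
    uint32_t vpn;                 /* tag: virtual page number      */
    uint32_t pfn;                 /* data: physical frame number   */
};
static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a hit and writes the full physical address. */
bool tlb_translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn = vaddr >> 12, offset = vaddr & 0xFFF;   /* 4KB pages        */
    for (int i = 0; i < TLB_ENTRIES; i++) {               /* compare all tags */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].use = true;
            *paddr = (tlb[i].pfn << 12) | offset;          /* combine with offset */
            return true;
        }
    }
    return false;                                          /* miss: walk the page table */
}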
27
Virtual Memory and Cache
o Physical cache: index cache using physical
address
m Always address translation before accessing cache
m Simple implementation, performance issue

o Virtual cache: index cache using virtual address to avoid translation
m Address translation only @ cache misses
m Issues
n Protection: copy protection info to each block
n Context switch: add PID to address tag
n Synonym/alias -- different virtual addresses map to the same
physical address
l Check multiple places, or enforce aliases to be identical in a
fixed number of address bits (page coloring)

28
Virtual Memory and Cache
o Physical cache (PIPT)
[Diagram: Processor Core → VA → TLB → PA → Physical Cache → miss → Main Memory; a hit returns the cache line to the core]
• Slow: translation on every access

o Virtual cache (VIVT)
[Diagram: Processor Core → VA → Virtual Cache → miss → TLB → Main Memory; a hit returns the cache line to the core]
• Protection bits missing from the cache
• Context switch enforces a cache flush
• Aliasing issue


29
Virtually-Indexed Physically-Tagged
o Virtually-indexed physically-tagged cache
m Use the page offset (identical in virtual & physical
addresses) to index the cache
m Associate physical address of the block as the
verification tag
m Perform cache reading and tag matching with the
physical address at the same time
m Issue: cache size is limited by page size (the length of
offset bits)
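One way to express that limit as a check (a sketch; the 4KB page, 32KB cache, and 8 ways are assumptions):

#include <assert.h>

#define PAGE_SIZE  4096
#define CACHE_SIZE (32 * 1024)
#define WAYS       8

int main(void) {
    /* the index + offset bits must fit in the page offset,
     * so each way can cover at most one page */
    assert(CACHE_SIZE / WAYS <= PAGE_SIZE);
    return 0;
}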

30
Advantages of Virtual Memory
o Translation
m Program can be given a consistent view of memory,
even though physical memory is scrambled
m Only the most important part of program (“Working
Set”) must be in physical memory
o Protection
m Different threads/processes protected from each other
m Different pages can be given special behavior
n Read only, invisible to user programs, etc.
m Kernel data protected from user programs
m Very important for protection from malicious programs

o Sharing
m Can map same physical page to multiple users
31
