Memory Hierarchy
Haresh Dagale
Dept of ESE
Motivation
Memory Technologies
Access Time vs Cost
• SRAM: levels closer to the CPU
• DRAM: main memory
• Magnetic disk: largest and slowest level

    Technology   Access Time   Cost (ratio)
    SRAM         5 – 25 ns     100
    DRAM         60 – 120 ns   5
    Disk         10 – 20 ms    0.1
Memory
▪ Programmer’s dream:
• An unlimited amount of fast memory (one that works at the same speed as the processor)
▪ Hardware designer’s response:
• Create the illusion of a vast memory that can be accessed without making the processor wait (on average)
▪ How could this illusion be created?
• Programs access a relatively small portion of the address space at any instant of time
- “principle of locality”
▪ Temporal locality
• If an item is referenced, it will tend to be referenced again soon.
▪ Spatial locality
• If an item is referenced, items whose addresses are close by will tend to be referenced soon.
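Both kinds of locality are easiest to see in a loop nest. A minimal C sketch (the array and its size are illustrative, not from the slides):

```c
#include <stdio.h>

#define N 1024

/* Temporal locality: `sum` and the loop counters are reused every iteration.
 * Spatial locality: a[i][j] and a[i][j+1] are adjacent in memory, so one
 * fetched cache block serves several consecutive iterations. */
static int a[N][N];

int main(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)   /* row-major: walks memory sequentially */
            sum += a[i][j];
    /* Swapping the two loops (column-major traversal) would touch a new
     * cache block on almost every access and typically runs much slower. */
    printf("%ld\n", sum);
    return 0;
}
```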
Memory Hierarchy
• To take advantage of temporal locality, memory is built as a hierarchy of levels
• Faster and smaller memory is placed close to the processor
• Slower and larger (less expensive) memory sits below the first level
Memory Organization
• All data is stored at the lowest level
• A level closer to the processor is a subset of any level further away
• Data is copied between two adjacent levels at a time
• The minimum unit of information that is transferred from one level to another is called a block
• The block size must be larger than one word
• to take advantage of spatial locality
Cache
▪ Intermediary between processor and memory
▪ A standard feature in all modern processors
▪ Most CPU designs use two levels of cache:
▪ “Level 1” or “Primary” cache (also called internal cache when it is implemented on-chip)
• Usually implemented on-chip and runs at the same clock rate as the processor
• In some processors, the L1 cache is divided into separate I-cache and D-cache
• L1 caches vary in size from 2 KB up to 64 KB
▪ “Level 2” or “Secondary” cache (also called external cache when it is implemented off-chip)
• L2 cache is usually implemented separately from the processor using fast static RAM (SRAM)
• Varies in size from 2 KB up to (?) MB
• The communication between this cache and the CPU is usually via a dedicated bus to ease traffic congestion with other subsystems
▪ Recent trend is to build the L2 cache on-chip as well, and yet another level (L3) off-chip.
Cache Organization
▪ The cache is divided into slots (or lines), each containing a block of data and a Tag field.
Cache line bits:
▪ Data field: a block of data (a multiple of the word size)
▪ Tag field: the upper portion of the address
• the bits that are not used as an index into the cache
• required to identify whether a word in the cache corresponds to the requested word
▪ Dirty bit: data has been written to the cache but not yet to external memory
• Instruction cache lines do not have this bit because the I-cache is read-only
▪ Valid bit: the cache line is not empty and has not been invalidated
▪ Lock bit: the cache line can be accessed but not replaced
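These fields map naturally onto a struct. A sketch of one cache line in C, with an assumed 64-byte block and illustrative field widths:

```c
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE 64          /* bytes of data per line (assumed) */

/* One cache line, mirroring the fields listed above. */
struct cache_line {
    uint32_t tag;              /* upper address bits, identifies the block */
    bool     valid;            /* line holds live data                     */
    bool     dirty;            /* written in cache but not yet in memory   */
    bool     lock;             /* line may be read but not replaced        */
    uint8_t  data[BLOCK_SIZE]; /* the cached block itself                  */
};
```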
Direct-mapped Cache
Cache Associativity
2-way Set Associative Cache
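In both organizations, a lookup splits the address into tag, index, and offset, then probes every way of the selected set. A minimal C sketch under assumed geometry (32-byte blocks, 64 sets, 2 ways, 32-bit addresses); a direct-mapped cache is the special case with a single way:

```c
#include <stdint.h>
#include <stdbool.h>

#define OFFSET_BITS 5                  /* log2(32-byte block)  (assumed) */
#define INDEX_BITS  6                  /* log2(64 sets)        (assumed) */
#define NUM_SETS    (1u << INDEX_BITS)
#define NUM_WAYS    2                  /* 1 would be direct-mapped */

struct line { uint32_t tag; bool valid; };
static struct line cache[NUM_SETS][NUM_WAYS];

/* Split the address and check each way of the indexed set for a tag match. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int way = 0; way < NUM_WAYS; way++)
        if (cache[index][way].valid && cache[index][way].tag == tag)
            return true;               /* hit  */
    return false;                      /* miss */
}
```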
Cache Miss
▪ Types of misses
• Compulsory misses (or cold-start misses)
• remedy: increase the block size
• Capacity misses
• remedy: increase the cache size (at the cost of additional hardware and address-resolution logic)
• Conflict misses
• remedy: reduce the swapping of competing blocks in and out (e.g., by increasing associativity)
▪ Design considerations
• Block size
• Replacement policy
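For the replacement policy, least-recently-used (LRU) is the common textbook choice. A minimal sketch for one 4-way set using per-way age counters (the function names are my own; real hardware often uses cheaper pseudo-LRU bits):

```c
#define WAYS 4

static unsigned age[WAYS];             /* per-way age: 0 = most recently used */

/* On a hit to `way`: everything younger than it ages by one, it becomes 0. */
static void touch(int way) {
    for (int w = 0; w < WAYS; w++)
        if (age[w] < age[way]) age[w]++;
    age[way] = 0;
}

/* On a miss: evict the oldest way in the set. */
static int victim(void) {
    int oldest = 0;
    for (int w = 1; w < WAYS; w++)
        if (age[w] > age[oldest]) oldest = w;
    return oldest;
}
```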
Updating Memory
▪ How to update main memory if cached data is modified?
▪ Write-through
• data is written immediately to the main memory
• causes more traffic on the bus
▪ Write-back (or copy-back)
• the memory update is delayed until the block is replaced
• more complex to implement
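The two policies differ only in when a store reaches main memory. A minimal C sketch; `memory_write` is a hypothetical bus operation and the line layout is assumed:

```c
#include <stdint.h>
#include <stdbool.h>

struct line { uint8_t data[64]; uint32_t tag; bool valid, dirty; };
extern void memory_write(uint32_t addr, uint8_t value);  /* hypothetical bus op */

/* Write-through: update the cache AND main memory on every store. */
void store_write_through(struct line *l, uint32_t addr, uint8_t v) {
    l->data[addr & 63] = v;
    memory_write(addr, v);          /* extra bus traffic on every write */
}

/* Write-back: update only the cache; memory sees the data when the
 * dirty block is eventually replaced. */
void store_write_back(struct line *l, uint32_t addr, uint8_t v) {
    l->data[addr & 63] = v;
    l->dirty = true;                /* remember to flush at replacement */
}
```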
Write Through
[Figure: on write-through, a write buffer holds pending memory updates so the processor need not stall for main memory; on a write-back (write-allocate) cache miss, if the old block is dirty it must be written back to memory before the new block is allocated]
Cache Coherency Problem
▪ Main memory is shared among the processors and I/O subsystems
• Individual caches improve performance by storing frequently used data in faster
memory
▪ The view of memory through the cache could be different from the view of
memory through the I/O subsystem
▪ Since all processors share the same address space, it is possible for more
than one processor to cache an address (or data item) at a time
▪ If one processor updates the data item without informing the other processors, inconsistencies may result, causing incorrect execution
Coherency /Consistency
▪ Coherence and Consistency are two complementary issues, though both define the behaviour of reads and writes to memory locations
▪ The Coherence model defines what value can be returned by a read
▪ The Consistency model defines when a written value must be seen by a read
▪ A simple definition of coherency: a read of a memory location returns the value most recently written to that location
[Figure: three-state (M/S/I) coherence diagram: processor LOAD and STORE operations move a line between I, S and M; L_REQ and S_REQ requests snooped from other cache controllers downgrade or invalidate the line, with FLUSH writing a modified block back to memory]
MESI Protocol
[Figure: MESI state diagram: on a processor Load, an Invalid line becomes Exclusive if no other cache holds the block, or Shared if the snooped L_REQ reports a sharer (S); a Store from E, S or I moves the line to Modified]

Transitions on requests from other cache controllers:

    Current State   Request from other Cache Controller   Next State
    M               Load  (L_REQ)                         S  (FLUSH block to memory)
    M               Store (S_REQ)                         I  (FLUSH block to memory)
    E               Load  (L_REQ)                         S
    E               Store (S_REQ)                         I
    S               Load  (L_REQ)                         S
    S               Store (S_REQ)                         I
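The snoop-side half of the protocol (the table above) is small enough to state directly in code. A minimal C sketch; `snoop`, `remote_store`, and `flush` are illustrative names of my own choosing:

```c
#include <stdbool.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

/* What this cache controller does with its own copy of a block when it
 * observes another controller's load (L_REQ) or store (S_REQ) for it. */
mesi_t snoop(mesi_t current, bool remote_store, bool *flush) {
    *flush = (current == MODIFIED);   /* a Modified copy must be written back */
    if (current == INVALID)
        return INVALID;               /* nothing cached, nothing to do */
    return remote_store ? INVALID     /* remote Store: our copy becomes stale */
                        : SHARED;     /* remote Load: downgrade M/E to S      */
}
```

The processor-side transitions (Loads and Stores issued by this CPU) would be a second function of the same shape.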
Cache Design
• Design a cache-memory system for a processor with an 8-bit data bus. It has 4 MBytes of RAM and 16 KBytes of on-chip cache. The cache is 4-way set associative. Assume that a cache line (cache block) is 128 bytes long.
[Figure: 4-way set-associative cache between the processor and main memory, with one set highlighted]
• Minimum address bus width?
• The tag field?
• Index?
• Offset?
• Number of sets?
• Number of possible (competing) memory blocks per set?
• Bits required to address 4 MB?
Cache Design Solution
▪ RAM: minimum 22 bits required to address 4 MBytes of memory (2^22 = 4 M)
▪ Number of bits required to identify a byte within a cache block = offset
▪ Offset = number of bits required to address a byte in a 128-byte block = 7 bits
▪ Number of sets =
• cache size / (number of cache lines per set × cache block length)
• = 16 K / (4 × 128) = 32
• Therefore, index field length = log2 32 = 5 bits
▪ Tag field = 22 − 5 − 7 = 10 bits
▪ We have 32 sets, each holding 4 cache blocks.
• Therefore, for a particular set, the possible cache blocks:
• Total memory blocks = 4 MB / 128 bytes = 32768
• Number of competing memory blocks for a particular set:
• 32768 / 32 (total memory blocks / total number of sets available) = 1024
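The same arithmetic can be checked mechanically. A small C program recomputing the figures above (variable names are my own):

```c
#include <stdio.h>

/* Recompute the worked example: 4 MB RAM, 16 KB 4-way cache, 128 B lines. */
int main(void) {
    unsigned ram_bytes   = 4u << 20;                    /* 4 MB          */
    unsigned cache_bytes = 16u << 10;                   /* 16 KB         */
    unsigned line_bytes  = 128, ways = 4;

    unsigned sets   = cache_bytes / (ways * line_bytes); /* 32           */
    unsigned offset = 7;                                 /* log2(128)    */
    unsigned index  = 5;                                 /* log2(32)     */
    unsigned addr   = 22;                                /* log2(4 MB)   */
    unsigned tag    = addr - index - offset;             /* 10 bits      */

    unsigned mem_blocks = ram_bytes / line_bytes;        /* 32768        */
    unsigned competing  = mem_blocks / sets;             /* 1024 per set */

    printf("sets=%u tag=%u mem_blocks=%u competing=%u\n",
           sets, tag, mem_blocks, competing);
    return 0;
}
```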