cache_concepts_memory

The document provides an overview of cache memory, including its definition, structure, and operations such as block read and write. It discusses cache mapping algorithms, specifically fully associative and direct mapping, along with their respective advantages and examples. Additionally, it covers replacement algorithms for managing cache when it is full, emphasizing the importance of locality in memory access.


System Fundamentals
Cache memories and principle of locality
[Figure: CPU (CU, ALU, Clock, Registers, Cache) connected over the I/O BUS to Memory]

Definition and Cache Concepts
Cache: Definition
cache (kash), n.
• A hiding place used especially for storing provisions.
• A place for concealment and safekeeping, as of
valuables.
• The store of goods or valuables concealed in a hiding
place.
• Computer Science. A fast storage buffer in the central
processing unit (CPU) of a computer. In this sense, also
called cache memory.
Cache Memory
Small, fast SRAM-based memory managed automatically in hardware (i.e., not by software)
• Located on the CPU
• Holds frequently accessed blocks from main memory (RAM)

[Figure: CPU (CU, ALU, Clock, Registers, Cache) connected over the BUS to the RAM controller; data moves between the cache and RAM]
General Cache Concept
Main memory is partitioned into equal-size chunks called blocks
• Not physically partitioned
• A block is a contiguous range of physical address locations

For example, RAM partitioned into N blocks, numbered 0 to N-1
Block Read Operation
Basic steps (for a block not in cache):
• CPU sends the RAM controller the start address of the block.
• RAM controller puts a copy of the block on the BUS.
• CPU controller reads the BUS and puts a copy of the block in cache.

[Figure: a copy of block 3 travels from RAM over the BUS into the cache]
Block Write Operation
Basic steps:
• CPU controller puts a copy of the block (in cache) on the BUS.
• RAM controller reads the BUS and replaces the block with the copy.

[Figure: a copy of block 3 travels from the cache over the BUS back into RAM]
Main Memory Block Partitioning

[Figure: RAM controller in front of memory divided into blocks 0 through N-1]
Memory Partition: Blocks
Simple example:
• Physical address (m) is 4 bits
• Block size = 4 bytes

Block address ranges (one block each):
• 0000 to 0011
• 0100 to 0111
• 1000 to 1011
• 1100 to 1111

[Table: 16 one-byte storage locations, addresses 0000-1111, grouped into four 4-byte blocks]
Block Offset Bits
Block address bits for the first block: 00 00, 00 01, 00 10, 00 11

The low-order bit values (shown in red on the slide) are the block offset (b) bits:
• They reference a specific byte in the block, e.g., the byte at offset 01
• Block size = 2^b bytes
• Total number of blocks = 2^m / 2^b

[Table: the same 16 addresses; in each address the low b = 2 bits are the offset and the high bits are the block address]
Cache Mapping: Block Placement Algorithms

[Figure: RAM controller and cache, with blocks 0 through N-1 mapped into cache lines]
Cache Mapping Algorithms
Three types:
• Fully associative
• Direct mapping
• Set associative

We'll focus on fully associative and direct mapping.
• Set associative is covered in Comp 530 and 630.
• Set-associative cache memories are used in modern processors.
Fully Associative (FA)
Important concepts:
1. Block data could be anywhere in the cache
2. Flexible block storage strategy
3. Expensive to evict and replace a block
• Block replacement algorithm
FA Read Example
memory address (m) bits = 4
Block size = 4 bytes
Tag bits (t) = 2 (the msb's); block offset (b) bits = 2 (the lsb's)
Valid bit: 0 = invalid
Three-line cache design

RAM contents (Address -> Data, 1 byte each):
0000-0011: 0xA1 0xA2 0xA3 0xA4
0100-0111: 0xB1 0xB2 0xB3 0xB4
1000-1011: 0xC1 0xC2 0xC3 0xC4
1100-1111: 0xD1 0xD2 0xD3 0xD4

Cache (initially empty):
line | valid | tag (t) | data at block offset (b) 00 01 10 11
  0  |   0   |         |
  1  |   0   |         |
  2  |   0   |         |
FA Read Example (Cont.)
CPU: Load data instruction, put data at address 0111 in register $8
• Search tags -> cache miss (cache is empty)
• Place a copy of the block in any open line in cache (i.e., valid = 0)
• Valid = 1
• Tag bits = 01
• Put 0xB4 in register $8

Cache (RAM as before):
line | valid | tag (t) | 00   01   10   11
  0  |   1   |   01    | 0xB1 0xB2 0xB3 0xB4
  1  |   0   |         |
  2  |   0   |         |
FA Read Example (Cont.)
CPU: Load data instruction, put data at address 0101 in register $9
• Search tags -> cache hit
• Line 0 holds tag 01 (valid = 1)
• Put 0xB2 in register $9

Cache (unchanged):
line | valid | tag (t) | 00   01   10   11
  0  |   1   |   01    | 0xB1 0xB2 0xB3 0xB4
  1  |   0   |         |
  2  |   0   |         |
FA Read Example (Cont.)
CPU: Load data instruction, put data at address 1111 in register $8
• Search tags -> cache miss
• Place a copy of the block in any open line in cache (i.e., valid = 0)
• Valid = 1
• Tag bits = 11
• Put 0xD4 in register $8

Cache:
line | valid | tag (t) | 00   01   10   11
  0  |   1   |   01    | 0xB1 0xB2 0xB3 0xB4
  1  |   1   |   11    | 0xD1 0xD2 0xD3 0xD4
  2  |   0   |         |
FA Read Example (Cont.)
CPU: Load data instruction, put data at address 1011 in register $8
• Search tags -> cache miss
• Oh snap, cache is full!!
• Must evict a valid line (i.e., invalidate) and replace it with the new block data!

Cache (all three lines valid):
line | valid | tag (t) | 00   01   10   11
  0  |   1   |   01    | 0xB1 0xB2 0xB3 0xB4
  1  |   1   |   11    | 0xD1 0xD2 0xD3 0xD4
  2  |   1   |   00    | 0xA1 0xA2 0xA3 0xA4
FA Replacement Algorithm
When the cache is full and a line must be evicted, how do we pick which line to replace?
• LRU (Least-recently used)
• replaces the line that has gone UNACCESSED the LONGEST
• favors the most recently accessed data
• FIFO/LRR (first-in, first-out/least-recently replaced)
• replaces the OLDEST line in cache
• favors recently loaded items over older STALE items
• Random
• replace some line at RANDOM
• no favoritism – uniform distribution
Direct Mapping (DM)
Important concepts:
1. Line bits determine the exact location of the block
data in cache.
2. Fairly rigid storage strategy (see 1 above)
3. Simple to evict and replace a block (see 1 above)
• No block replacement algorithm
DM Read Example
memory address (m) bits = 4
Block size = 4 bytes
Tag bits (t) = 1 (the msb); line bits (s) = 1; block offset (b) bits = 2 (the lsb's)
Valid bit: 0 = invalid
Two-line cache design

RAM contents (Address -> Data, 1 byte each):
0000-0011: 0xA1 0xA2 0xA3 0xA4
0100-0111: 0xB1 0xB2 0xB3 0xB4
1000-1011: 0xC1 0xC2 0xC3 0xC4
1100-1111: 0xD1 0xD2 0xD3 0xD4

Cache (initially empty):
line (s) | valid | tag (t) | data at block offset (b) 00 01 10 11
    0    |   0   |         |
    1    |   0   |         |
DM Read Example (Cont.)
CPU: Load data instruction, put data at address 0111 in register $8
• Go to the line and check the tag -> cache miss
• Put a copy of the block at line s = 1 in cache (the line is empty, valid = 0)
• Valid = 1
• Tag bit = 0
• Put 0xB4 in register $8

Cache (RAM as before):
line (s) | valid | tag (t) | 00   01   10   11
    0    |   0   |         |
    1    |   1   |    0    | 0xB1 0xB2 0xB3 0xB4
DM Read Example (Cont.)
CPU: Load data instruction, put data at address 0101 in register $9
• Go to the line and check the tag -> cache hit
• Line 1 holds tag 0 (valid = 1)
• Put 0xB2 in register $9

Cache (unchanged):
line (s) | valid | tag (t) | 00   01   10   11
    0    |   0   |         |
    1    |   1   |    0    | 0xB1 0xB2 0xB3 0xB4
DM Read Example (Cont.)
CPU: Load data instruction, put data at address 0000 in register $8
• Go to the line and check the tag -> cache miss
• Put a copy of the block at line s = 0 in cache (the line is empty, valid = 0)
• Valid = 1
• Tag bit = 0
• Put 0xA1 in register $8

Cache:
line (s) | valid | tag (t) | 00   01   10   11
    0    |   1   |    0    | 0xA1 0xA2 0xA3 0xA4
    1    |   1   |    0    | 0xB1 0xB2 0xB3 0xB4
DM Read Example (Cont.)
CPU: Load data instruction, put data at address 1011 in register $9
• Go to the line and check the tag -> cache miss
• Line 0 is being used (valid = 1)
• Must evict line 0 (i.e., invalidate) and replace it with the new block data!
• Put 0xC4 in register $9

Cache:
line (s) | valid | tag (t) | 00   01   10   11
    0    |   1   |    1    | 0xC1 0xC2 0xC3 0xC4
    1    |   1   |    0    | 0xB1 0xB2 0xB3 0xB4
Key Ideas
Keep data used often in a small fast SRAM
• "cache"
• accessed frequently
• on the CPU (fast!)

Keep all data in a bigger but slower DRAM
• "main memory"
• accessed rarely
• BUS transfers between CPU and RAM (slow!)
Cache R/W Operations
Read operations: very straightforward!
• Most (80+%) memory operations are reads.

Write operations: not straightforward; there are two different policies:
• WRITE-THROUGH: CPU writes are cached, but also written to main memory (stalling the CPU until the write is completed). Memory always holds the latest values.
• WRITE-BACK: CPU writes are cached, but not immediately written to main memory. Main memory contents can become "stale". Only when a value has to be evicted from the cache, and only if it has been modified (i.e., is "dirty"), is it written back to main memory.
Cache: Bytes, Shorts, and Words
In general, the size of a block in physical memory is one (or more) words!
• Never put a single short or byte from DRAM into cache
• Instead, put the entire word; much more efficient!
• This is why memory alignment is important!

Once the word is in cache, the byte or short can be accessed through hardware operations:
• i.e., bit masking and shifting
Principle of Locality: Block Organization

[Figure: memory references plotted as address vs. time cluster within the program, data, and stack regions]
Principle of Locality
Def: Programs tend to use data and instructions in memory that have addresses near or equal to those they have used recently!

Temporal locality: Recently referenced blocks are likely to be referenced again in the near future.

Spatial locality: Blocks with nearby addresses tend to be referenced close together in time. This is why blocks work!
Locality Example
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

Data references
• Reference array elements in succession (stride-1 reference pattern). -> Spatial locality
• Reference variable sum each iteration. -> Temporal locality

Instruction references
• Reference instructions in sequence. -> Spatial locality
• Cycle through loop repeatedly. -> Temporal locality
Summary
Programs that repeatedly reference the same variables enjoy good temporal locality.

For programs with stride-k reference patterns, the smaller the stride the better the spatial locality.
• Programs with stride-1 reference patterns have good spatial locality.
• Programs that hop around memory with large strides have poor spatial locality.

Loops have good temporal and spatial locality with respect to variables and instruction fetches.
• The smaller the loop body and the greater the number of loop iterations, the better the locality.
