Lectures wk11
139
Capacity
• In general, bigger is better
• The more data you can store in the cache, the less
often you have to go out to the main memory
140
Cache Line Length
• Cache groups contiguous addresses into lines
• Lines almost always aligned on their size
• Caches fetch or write back an entire line of data on a
miss
• Spatial Locality
• Reading/Writing a Line
• Typically it takes much longer to fetch the first word of a
line than subsequent words
• Page Mode memories
141
What causes a MISS?
• Three Major Categories of Cache Misses:
• Compulsory Misses: first access to a block
• Capacity Misses: cache cannot contain all blocks needed to
execute the program
• Conflict Misses: block replaced by another block and then later
retrieved (affects set-associative or direct-mapped caches)
• Nightmare Scenario: ping pong effect!
142
3/13/2024
Block Size and Spatial Locality
Block is unit of transfer between the cache and memory
[Figure: miss rate (0% to 35%) vs. block size (4, 16, 64, 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB. Miss rate falls as block size grows, but climbs again in the smallest caches once blocks become too large a fraction of the cache.]
144
Hit Rate isn’t Everything
• Average access time is a better performance
indicator than hit rate
Tavg = Phit * Thit + Pmiss * Tmiss
Tmiss = Tfetch = Tfirst + (line length / fetch width) * Tsubsequent
145
Block Size Tradeoff
[Figure: as block size increases, miss rate first falls (exploits spatial locality) and then rises again (fewer blocks compromises temporal locality). Miss penalty grows steadily with block size. Average access time is therefore minimized at an intermediate block size, where increased miss penalty and miss rate begin to outweigh the spatial-locality benefit.]
147
Contents of a direct mapped cache
148
ADDRESS
Direct-mapped cache address (showing bit positions):
• Tag: bits [31:12] (20 bits)
• Index: bits [11:2] (10 bits)
• Byte offset: bits [1:0]
Separate the address into fields:
• Byte offset within the word
• Index for the row of the cache
• Tag identifying the block
[Figure: cache with rows indexed 0 to 1023; each row holds a Valid bit, a 20-bit Tag, and 32 bits of Data. The stored tag is compared with the address tag; a match on a valid row asserts Hit and drives out the Data word.]
A cache of 2^n words, a block being a 4-byte word, has 2^n * (63 - n) bits
of storage for a 32-bit address:
#rows = 2^n
#bits/row = 32 (data) + (32 - 2 - n) (tag) + 1 (valid) = 63 - n
149
Example: 1 KB Direct Mapped Cache with 32 Byte Blocks
[Figure: 32-bit address split for this cache. Cache Tag: bits [31:10] (Example: 0x50, stored as part of the cache “state”). Cache Index: bits [9:5] (Ex: 0x01). Byte Select: bits [4:0] (Ex: 0x00). The row at index 1 holds tag 0x50 and a 32-byte block (Byte 32 through Byte 63).]
150
Extreme Example: single big line
151
A Two-way Set Associative Cache
[Figure: the Cache Index selects one set; the set holds two blocks (Cache Block 0 in each way), each with its own Valid bit and Cache Tag. The address tag (Adr Tag) is compared against both stored tags in parallel; the two Compare results are ORed to produce Hit, and Sel1/Sel0 drive a mux that selects the matching way's Cache Block.]
152
Another Extreme Example: Fully Associative
[Figure: no index field. The Cache Tag occupies bits [31:5] (27 bits) and Byte Select bits [4:0] (Ex: 0x01). Every entry's stored tag is compared against the address tag in parallel; each row holds a 32-byte block (Byte 32 through Byte 63 shown).]
153
Which Block Should be Replaced on a Miss?
• Easy for Direct Mapped
• Set Associative or Fully Associative:
• Random: easier to implement
• Least Recently Used (LRU): harder to implement; may be approximated
• Miss rates for caches with different sizes, associativities, and
replacement algorithms:
Associativity: 2-way 4-way 8-way
Size LRU Random LRU Random LRU Random
16 KB 5.18% 5.69% 4.67% 5.29% 4.39% 4.96%
64 KB 1.88% 2.01% 1.54% 1.66% 1.39% 1.53%
256 KB 1.15% 1.17% 1.13% 1.13% 1.12% 1.12%
For caches with low miss rates, random is almost as good as LRU.
Q4: What Happens on a Write?
• Write through: The information is written to both the block in
the cache and to the block in the lower-level memory.
• Write back: The information is written only to the block in the
cache. The modified cache block is written to main memory only
when it is replaced.
• Is the block clean or dirty? (add a dirty bit to each block)
• Pros and Cons of each:
• Write through
• Read misses cannot result in writes to memory
• Easier to implement
• Always combine with write buffers to avoid memory latency
• Write back
• Less memory traffic
• Perform writes at the speed of the cache
Q4: What Happens on a Write?
• Since data does not have to be brought into the cache on a write
miss, there are two options:
• Write allocate
• The block is brought into the cache on a write miss
• Used with write-back caches
• Hope subsequent writes to the block hit in cache
• No-write allocate
• The block is modified in memory, but not brought into the cache
• Used with write-through caches
• Writes have to go to memory anyway, so why bring the block into the cache
ARM9 – Split Cache
• ARM9TDMI
• ARM 32-bit and Thumb 16-bit instructions (v4T ISA).
• Code compatibility with ARM7TDMI:
• Portable to 0.25, 0.18 µm CMOS and below.
• Harvard 5-stage pipeline implementation:
• Higher performance (CPI = 1.5)
• Coprocessor interface for on-chip coprocessors:
• Allows floating point, DSP, graphics accelerators.
• EmbeddedICE debug capability
CPU Pipeline structure with Cache
[Figure: six pipeline phases: Fetch, Decode, Read, E1, E2, Write.
• Fetch phase: PC and fetch-address generation feed a 128-bit ICache into the I-buffer.
• Decode phase: variable-size instruction decoding and immediate generation.
• Read phase: register-file access and bypassing.
• E1/E2 phases: IU and Mul execution, Load/Store unit accessing the DCache, branch unit and exception generation.
• Write phase: register-file write-back and rollback.]
158
ARM Cortex-A9 MPCore