
CMSC 611: Advanced Computer Architecture

Cache
Introduction
• Why do designers need to know about Memory technology?
– Processor performance is usually limited by memory bandwidth
– As IC densities increase, lots of memory will fit on chip
• What are the different types of memory?
• How to maximize memory performance at the least cost?

[Figure: computer organization: Processor (Control, Datapath), Memory, and Devices (Input, Output)]
Processor-Memory Performance

[Figure: processor vs. DRAM performance, 1980-2000, log scale. Processor performance ("Moore's Law") grows at ~60%/yr (2X/1.5 yr); DRAM performance grows at ~9%/yr (2X/10 yrs); the resulting processor-memory performance gap grows ~50%/yr.]
Problem: Memory can be a bottleneck for processor performance
Solution: Rely on a memory hierarchy of faster memories to bridge the gap
Memory Hierarchy
• Temporal Locality (Locality in Time):
⇒ Keep most recently accessed data items closer to the processor

• Spatial Locality (Locality in Space):
⇒ Move blocks consisting of contiguous words to the faster levels
[Figure: memory hierarchy: processor registers and datapath, on-chip cache (SRAM), second-level cache (SRAM), main memory (DRAM), secondary storage (disk). Registers and caches are managed by the compiler and hardware; main memory and disk by the operating system. Speed: fastest to slowest. Size: smallest to biggest. Cost: highest to lowest.]
Memory Hierarchy Terminology
• Hit: data appears in some block in the faster level (example: Block X)
– Hit Rate: the fraction of memory accesses found in the faster level
– Hit Time: time to access the faster level, which consists of:
• Memory access time + Time to determine hit/miss
• Miss: data needs to be retrieved from a block in the slower level (example: Block Y)
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty: Time to replace a block in the upper (faster) level + Time to deliver the block to the processor
• Hit Time << Miss Penalty

[Figure: on a hit, Block X is delivered to the processor from the faster-level memory; on a miss, Block Y is brought into the faster level from the slower-level memory]

Slide: Dave Patterson


Memory Hierarchy Design Issues
• Block identification
– How is a block found if it is in the upper (faster) level?
• Tag/Block
• Block placement
– Where can a block be placed in the upper (faster) level?
• Fully Associative, Set Associative, Direct Mapped
• Block replacement
– Which block should be replaced on a miss?
• Random, LRU
• Write strategy
– What happens on a write?
• Write Back or Write Through (with Write Buffer)

Slide: Dave Patterson


The Basics of Cache
• Cache: the level of the hierarchy closest to the processor
• Caches first appeared in research machines in the early 1960s
• Virtually every general-purpose computer produced today includes cache
• Requesting Xn generates a miss, and the word Xn is brought from main memory into the cache

[Figure: cache contents (a) before the reference to Xn, holding X1 .. Xn-1 but not Xn, and (b) after the reference, with Xn added]

• Issues:
– How do we know that a data item is in the cache?
– If so, how do we find it?
Direct-Mapped Cache
• Memory words can be mapped only to one cache block
• Worst case is to keep replacing a block followed by a miss for it: the Ping-Pong Effect
• To reduce misses:
– make the cache size bigger
– provide multiple entries for the same Cache Index

[Figure: a direct-mapped cache entry holds a Valid Bit, a Cache Tag, and Cache Data (Byte 3 .. Byte 0); memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 each map to the cache block given by their low-order bits]

• Cache block address = (Block address) modulo (Number of cache blocks)
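A minimal C sketch of this mapping (the 8-block geometry and the addresses come from the figure; the program itself is illustrative):

#include <stdio.h>

/* Direct-mapped placement: with a power-of-two number of cache
   blocks, (block address) mod (number of blocks) is just the
   low-order bits of the address. The 8-block cache and the 5-bit
   memory addresses (00001, 00101, ..., 11101) match the figure. */
#define NUM_BLOCKS 8

int main(void) {
    unsigned addrs[] = { 0x01, 0x05, 0x09, 0x0D, 0x11, 0x15, 0x19, 0x1D };
    for (int i = 0; i < 8; i++) {
        unsigned block = addrs[i] % NUM_BLOCKS;    /* same as addrs[i] & 0x7 */
        printf("memory address %2u -> cache block %u\n", addrs[i], block);
    }
    return 0;
}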


Accessing Cache

[Figure: a 32-bit address split into a 20-bit Tag (bits 31-12), a 10-bit Index (bits 11-2), and a 2-bit byte offset (bits 1-0); the Index selects one of 1024 (Valid, Tag, Data) entries, the stored Tag is compared with the address Tag to produce Hit, and the Data field supplies the 32-bit word]

• Cache size depends on:
– # cache blocks
– # address bits
– word size
• Example: for an n-bit address, 4-byte words, and 1024 cache blocks:
cache size = 1024 × [(n - 10 - 2) + 1 + 32] bits
(per entry: tag bits + valid bit + word size)
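A short C sketch of this sizing formula (the function name and the log2 loops are my own; the numbers instantiate the slide's 32-bit, 1024-block example):

#include <stdio.h>

/* The slide's sizing formula for a direct-mapped cache with one word
   per block: total bits = #blocks * (tag bits + valid bit + data bits),
   where tag bits = address bits - index bits - byte-offset bits. */
long cache_bits(int addr_bits, int num_blocks, int word_bytes) {
    int index_bits = 0, offset_bits = 0;
    for (int b = num_blocks; b > 1; b >>= 1) index_bits++;   /* log2(#blocks)    */
    for (int w = word_bytes; w > 1; w >>= 1) offset_bits++;  /* log2(word bytes) */
    int tag_bits = addr_bits - index_bits - offset_bits;
    return (long)num_blocks * (tag_bits + 1 + 8 * word_bytes);
}

int main(void) {
    /* The slide's example: 32-bit addresses, 1024 blocks, 4-byte words:
       1024 * [(32 - 10 - 2) + 1 + 32] = 1024 * 53 = 54272 bits */
    printf("%ld bits\n", cache_bits(32, 1024, 4));
    return 0;
}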
Cache with Multi-Word Blocks

[Figure: a 32-bit address split into a 16-bit Tag (bits 31-16), a 12-bit Index (bits 15-4), a 2-bit block offset (bits 3-2), and a 2-bit byte offset (bits 1-0); the Index selects one of 4K (V, Tag, 128-bit Data) entries, the Tag comparison produces Hit, and a 4-to-1 multiplexor picks the requested 32-bit word]

• Takes advantage of spatial locality to improve performance
• Cache block address = (Block address) modulo (Number of cache blocks)
• Block address = (Byte address) / (Bytes per block)
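A C sketch of this address decomposition, using the field widths from the figure (the example address is arbitrary):

#include <stdio.h>

/* Decomposing a 32-bit byte address for the cache in the figure:
   16-byte (4-word) blocks, 4K entries. The address value is made up. */
int main(void) {
    unsigned addr       = 0x12345678;            /* hypothetical byte address      */
    unsigned byte_off   =  addr        & 0x3;    /* bits 1:0  - byte within word   */
    unsigned block_off  = (addr >> 2)  & 0x3;    /* bits 3:2  - word within block  */
    unsigned index      = (addr >> 4)  & 0xFFF;  /* bits 15:4 - one of 4K entries  */
    unsigned tag        =  addr >> 16;           /* bits 31:16                     */
    unsigned block_addr =  addr / 16;            /* byte address / bytes per block */
    printf("tag=%#x index=%#x word=%u byte=%u block address=%#x\n",
           tag, index, block_off, byte_off, block_addr);
    return 0;
}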
Determining Block Size
• Larger block sizes take advantage of spatial locality, BUT:
– a larger block size means a larger miss penalty: it takes longer to fill the block
– if the block size is too big relative to the cache size, the miss rate goes up: too few cache blocks
• Average Access Time =
Hit Time × (1 - Miss Rate) + Miss Penalty × Miss Rate

[Figure: three sketches vs. Block Size: Miss Penalty rises with block size; Miss Rate first falls (exploits spatial locality) and then rises (fewer blocks compromises temporal locality); Average Access Time therefore reaches a minimum at an intermediate block size, beyond which increased miss penalty and miss rate dominate]
Slide: Dave Patterson
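To make the trade-off concrete, a small C evaluation of the Average Access Time formula above (the 1-cycle hit time, 40-cycle miss penalty, and the two miss rates are invented for illustration):

#include <stdio.h>

/* Evaluating the slide's formula:
   AAT = Hit Time * (1 - Miss Rate) + Miss Penalty * Miss Rate */
double avg_access_time(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time * (1.0 - miss_rate) + miss_penalty * miss_rate;
}

int main(void) {
    printf("5%% miss rate: %.2f cycles\n", avg_access_time(1.0, 0.05, 40.0));
    printf("2%% miss rate: %.2f cycles\n", avg_access_time(1.0, 0.02, 40.0));
    return 0;
}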
Block Placement
• Moving from direct mapped to set associative to fully associative increases hardware complexity but improves cache utilization

[Figure: an 8-block cache under the three schemes: direct mapped (Block # 0-7, one candidate block), set associative (Set # 0-3, search the tags within one set), and fully associative (search all tags); each organization stores Data and Tag and differs in how wide the tag search is]

• Set number = (Block number) modulo (Number of sets in the cache)
• Increased flexibility of block placement reduces the probability of cache misses
Fully Associative Cache
• Forget about the Cache Index
• Compare the Cache Tags of all cache entries in parallel
• Example: with 32-byte blocks, we need N 27-bit comparators
• By definition: Conflict Misses = 0 for a fully associative cache

[Figure: a 32-bit address split into a 27-bit Cache Tag (bits 31-5) and a Byte Select (bits 4-0, e.g. 0x01); every (Cache Tag, Valid Bit, Cache Data) entry holds a 32-byte block (Byte 0 .. Byte 31), and all tags are compared at once]

Slide: Dave Patterson
N-way Set Associative Cache
• N entries for each Cache Index
• Example: two-way set associative cache
– The Cache Index selects a “set” from the cache
– The two tags in the set are compared in parallel
– Data is selected based on the result of the tag comparison

[Figure: the Cache Index selects one set, i.e. Cache Block 0 in each of the two ways (Valid, Cache Tag, Cache Data); two comparators check the Adr Tag against both stored tags, a mux (Sel1/Sel0) picks the matching way's Cache Block, and the OR of the comparators drives Hit]

Slide: Dave Patterson
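A minimal C sketch of the two-way lookup just described (the 64-set geometry, one-word blocks, and names are my own assumptions, not from the slides):

#include <stdint.h>
#include <stdbool.h>

/* Two-way set-associative lookup: the index selects a set, both
   stored tags are checked (the hardware compares them in parallel;
   C does it in a loop), and the hit signal is effectively the OR of
   the two comparisons. */
#define NUM_SETS 64

typedef struct {
    bool     valid[2];
    uint32_t tag[2];
    uint32_t data[2];          /* one word per block, for brevity */
} cache_set_t;

static cache_set_t cache[NUM_SETS];

bool lookup(uint32_t addr, uint32_t *out) {
    uint32_t index = (addr >> 2) % NUM_SETS;   /* skip the 2-bit byte offset  */
    uint32_t tag   = addr >> 8;                /* above 2 offset + 6 index bits */
    cache_set_t *set = &cache[index];
    for (int way = 0; way < 2; way++) {
        if (set->valid[way] && set->tag[way] == tag) {
            *out = set->data[way];             /* the mux selects this way */
            return true;
        }
    }
    return false;                              /* miss */
}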
Locating a Block in Associative Cache

[Figure: a 32-bit address split into a 22-bit Tag (bits 31-10) and an 8-bit Index (bits 9-2); the Index selects one of 256 sets, each with four (V, Tag, Data) entries; the four tags are compared in parallel and a 4-to-1 multiplexor selects the Data of the hitting way]

• Tag size increases with higher levels of associativity
Handling Cache Misses
• Misses on read accesses always bring blocks from main memory
• Write accesses require careful maintenance of consistency between the cache and main memory
• Two possible strategies for handling write accesses:
– Write through: the information is written to both the block in the cache and the block in the slower memory
• Read misses cannot result in writes
• No allocation of a cache block is needed
• Always combined with write buffers so that the processor doesn't wait for the slower memory
– Write back: the information is written only to the block in the cache; the modified cache block is written to main memory only when it is replaced
• Is the block clean or dirty?
• No writes to slow memory for repeated write accesses
• Requires allocation of a cache block
Write Through via Buffering

[Figure: Processor → Cache → DRAM, with a Write Buffer between the cache and DRAM]

• The processor writes data into the cache and the write buffer
• The memory controller writes the contents of the buffer to memory
• Increased write frequency can saturate the write buffer
• If the CPU cycle time is too fast and/or too many store instructions occur in a row:
– the store buffer will overflow no matter how big you make it
– this happens as the CPU cycle time gets close to the DRAM write cycle time
• Write buffer saturation can be handled by installing a second-level (L2) cache

[Figure: Processor → Cache → L2 Cache → DRAM, with the Write Buffer draining into the L2 cache]

Slide: Dave Patterson
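A sketch of the write buffer as a small FIFO between the cache and DRAM (the 4-entry depth, the names, and the hypothetical dram_write call are illustrative assumptions):

#include <stdint.h>
#include <stdbool.h>

/* The processor enqueues stores; the memory controller drains them
   into DRAM. If stores arrive faster than DRAM retires them, the
   queue fills and the processor must stall -- the saturation case
   described above. */
#define WB_DEPTH 4

typedef struct { uint32_t addr, data; } wb_entry_t;

static wb_entry_t buf[WB_DEPTH];
static int head = 0, count = 0;

bool wb_enqueue(uint32_t addr, uint32_t data) {   /* processor side */
    if (count == WB_DEPTH)
        return false;                             /* buffer full: CPU stalls */
    buf[(head + count) % WB_DEPTH] = (wb_entry_t){ addr, data };
    count++;
    return true;
}

bool wb_drain_one(void) {                         /* memory-controller side */
    if (count == 0)
        return false;
    /* dram_write(buf[head].addr, buf[head].data);   hypothetical DRAM call */
    head = (head + 1) % WB_DEPTH;
    count--;
    return true;
}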


Block Replacement Strategy
• Straightforward for Direct Mapped, since every block has only one location
• Set Associative or Fully Associative:
– Random: pick any block
– LRU (Least Recently Used)
• requires tracking block references
• for a two-way set associative cache, a reference bit is attached to every block
• more complex hardware is needed for higher levels of cache associativity

Miss rates by cache size, associativity, and replacement strategy:

Associativity:   2-way            4-way            8-way
Size             LRU     Random   LRU     Random   LRU     Random
16 KB            5.2%    5.7%     4.7%    5.3%     4.4%    5.0%
64 KB            1.9%    2.0%     1.5%    1.7%     1.4%    1.5%
256 KB           1.15%   1.17%    1.13%   1.13%    1.12%   1.12%

• Empirical results indicate that the replacement strategy becomes less significant as cache size increases

Slide: Dave Patterson
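A minimal sketch of LRU tracking for the two-way case (names and the 64-set geometry are mine; one common simplification stores a single bit per set rather than one per block):

#include <stdbool.h>

/* For a two-way set, one bit per set recording which way was
   referenced last is enough: the victim is always the other way.
   Higher associativity needs more state per set, which is the
   hardware cost the slide refers to. Random replacement, by
   contrast, needs no per-set state at all. */
#define NUM_SETS 64

static bool way1_used_last[NUM_SETS];

void on_access(int set, int way) { way1_used_last[set] = (way == 1); }

int pick_victim(int set) {         /* evict the least recently used way */
    return way1_used_last[set] ? 0 : 1;
}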


Measuring Cache Performance
• To enhance cache performance, one can:
– reduce the miss rate (e.g. by diminishing the probability of block collisions)
– reduce the miss penalty (e.g. by adding multi-level caching)
– reduce the hit time (e.g. with a simple and small cache)

CPU time = (CPU execution clock cycles + Memory-stall clock cycles) × Clock cycle time

Memory-stall clock cycles = Read-stall cycles + Write-stall cycles

Read-stall cycles = (Reads / Program) × Read miss rate × Read miss penalty

For a write-through scheme (write buffer stalls are hard to control; assume the buffer is large enough to ignore them):

Write-stall cycles = (Writes / Program) × Write miss rate × Write miss penalty + Write buffer stalls
Example
Assume an instruction cache miss rate for gcc of 2% and a data cache miss rate of 4%.
If a machine has a CPI of 2 without any memory stalls and the miss penalty is 40 cycles
for all misses, determine how much faster the machine would run with a perfect cache that
never missed. Assume a 36% combined frequency for load and store instructions.
Answer:
Assume the number of instructions = I
Instruction miss cycles = I × 2% × 40 = 0.80 × I
Data miss cycles = I × 36% × 4% × 40 = 0.56 × I
Total memory-stall cycles = 0.80 I + 0.56 I = 1.36 I
CPI with memory stalls = 2 + 1.36 = 3.36

CPU time with stalls / CPU time with perfect cache
= (I × CPIstall × Clock cycle) / (I × CPIperfect × Clock cycle)
= CPIstall / CPIperfect
= 3.36 / 2 = 1.68

What happens if the CPU gets faster?
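The slide's arithmetic, reproduced as a tiny C program. Note the exact data-miss term is 0.36 × 0.04 × 40 = 0.576 cycles per instruction; the slide rounds the 1.44% per-instruction miss frequency down, giving 0.56 and a CPI of 3.36:

#include <stdio.h>

/* 2% I-cache misses, 4% D-cache misses, 36% loads/stores,
   40-cycle miss penalty, base CPI of 2. */
int main(void) {
    double inst_stalls = 0.02 * 40;           /* 0.80 cycles/instruction  */
    double data_stalls = 0.36 * 0.04 * 40;    /* 0.576 cycles/instruction */
    double cpi_stall   = 2.0 + inst_stalls + data_stalls;
    printf("CPI with stalls = %.3f (slide: 3.36), perfect-cache speedup = %.2fx\n",
           cpi_stall, cpi_stall / 2.0);
    return 0;
}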
Multi-level Cache Performance
Suppose we have a 500 MHz processor with a base CPI of 1.0 with no cache misses.
Assume the memory access time is 200 ns and the average cache miss rate is 5%. Compare
performance after adding a second-level cache, with an access time of 20 ns, that reduces
the miss rate to main memory to 2%.
Answer:
The miss penalty to main memory = 200 ns / cycle time = 200 × 500 / 1000 = 100 clock cycles

With one level of cache:
Effective CPI = Base CPI + memory-stall cycles/instruction = 1 + 5% × 100 = 6.0

With two levels of cache:
The miss penalty for accessing the 2nd-level cache = 20 × 500 / 1000 = 10 clock cycles
Total CPI = Base CPI + main-memory stall cycles/instruction + secondary-cache stall cycles/instruction
= 1 + 2% × 100 + 5% × 10 = 3.5
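The same calculation as a small C program (all constants are the slide's own):

#include <stdio.h>

/* Two-level cache example: 500 MHz clock (2 ns cycle), 200 ns main
   memory, 20 ns L2, 5% L1 miss rate, 2% miss rate to main memory,
   base CPI of 1.0. */
int main(void) {
    double cycle_ns    = 1000.0 / 500.0;                   /* 2 ns       */
    double mem_penalty = 200.0 / cycle_ns;                 /* 100 cycles */
    double l2_penalty  =  20.0 / cycle_ns;                 /* 10 cycles  */
    double cpi_one     = 1.0 + 0.05 * mem_penalty;                      /* 6.0 */
    double cpi_two     = 1.0 + 0.02 * mem_penalty + 0.05 * l2_penalty;  /* 3.5 */
    printf("one level: CPI = %.1f\ntwo levels: CPI = %.1f\nspeedup = %.2fx\n",
           cpi_one, cpi_two, cpi_one / cpi_two);
    return 0;
}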
