Exploiting Memory Hierarchy
Locality
Locality is a principle that makes having a memory hierarchy a
good idea
If an item is referenced, then:
temporal locality: it will tend to be referenced again soon
spatial locality: nearby items will tend to be referenced soon
Why does code have locality? Consider both instruction and data accesses.
Hit and Miss
Focus on any two adjacent levels in the memory hierarchy – called
upper (closer to CPU) and lower (farther from CPU) – because
each block copy is always between two adjacent levels
Terminology:
block: minimum unit of data to move between levels
hit: data requested is in upper level
miss: data requested is not in upper level
hit rate: fraction of memory accesses that are hits (i.e.,
found at upper level)
miss rate: fraction of memory accesses that are not hits
miss rate = 1 – hit rate
hit time: time to determine if the access is indeed a hit +
time to access and deliver the data from the upper level to
the CPU
miss penalty: time to determine if the access is a miss + time
to replace block at upper level with corresponding block at
lower level + time to deliver the block to the CPU
Caches
A simple example: assume block size = one word of data
Figure: cache contents before (a) and after (b) a reference to Xn –
the cache initially holds X1, X2, X3, X4, Xn−2, and Xn−1; the
reference to Xn causes a miss, so Xn is fetched from memory and
placed in the cache
Issues:
how do we know if a data item is in the cache?
if it is, how do we find it?
if not, what do we do?
Solution depends on cache addressing scheme…
Direct Mapped Cache
MIPS style:
Figure: direct-mapped cache – the 32-bit address (bit positions 31–0)
splits into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a
2-bit byte offset (bits 1–0); the index selects one of 1024 entries
(0–1023), each holding a valid bit, a 20-bit tag, and 32 bits of data; a
comparator matches the stored tag against the address tag to produce
Hit, and the selected entry supplies Data
Cache with 1024 1-word blocks: the byte offset (2 least-significant
bits) is ignored and the next 10 bits index into the cache
What kind of locality are we taking advantage of?
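As a rough sketch of the address split just described (constants match the 1024-block figure; variable names are illustrative, not from the slides):

    #include <stdio.h>
    #include <stdint.h>

    /* Address split for a direct-mapped cache with 1024 one-word blocks:
       [31:12] tag (20 bits) | [11:2] index (10 bits) | [1:0] byte offset */
    int main(void) {
        uint32_t addr = 0x1234ABCD;               /* arbitrary example address */
        unsigned byte_off = addr & 0x3;           /* bits [1:0], ignored on word access */
        unsigned index    = (addr >> 2) & 0x3FF;  /* bits [11:2], 1 of 1024 entries */
        unsigned tag      = addr >> 12;           /* bits [31:12], compared with stored tag */
        printf("tag=0x%05X index=%u offset=%u\n", tag, index, byte_off);
        return 0;
    }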
DECStation 3100 Cache (MIPS R2000 processor)
Figure: the 32-bit address splits into a 16-bit tag (bits 31–16), a
14-bit index (bits 15–2), and a 2-bit byte offset (bits 1–0); the index
selects one of 16K entries, each holding a valid bit, a 16-bit tag, and
32 bits of data; tag comparison produces Hit and the entry supplies Data
Cache with 16K 1-word blocks: the byte offset (2 least-significant
bits) is ignored and the next 14 bits index into the cache
Cache Read Hit/Miss
Cache read hit: no action needed
Instruction cache read miss:
1. Send original PC value (current PC – 4, as PC has already
been incremented in first step of instruction cycle) to
memory
2. Instruct main memory to perform read and wait for
memory to complete access – stall on read
3. After read completes write cache entry
4. Restart instruction execution at first step to refetch
instruction
Data cache read miss:
Similar to instruction cache miss
To reduce the data miss penalty, allow the processor to keep
executing instructions while waiting for the read to complete,
stalling only when the missing word is actually required – stall on use
Cache Write Hit/Miss
Write-through scheme
on write hit: update the data in both cache and memory with
every write to avoid inconsistency
on write miss: write the word into both cache and memory –
obviously no need to read the missed word from memory!
Write-through is slow because every write requires a memory write
performance is improved with a write buffer where words are
stored while waiting to be written to memory – the processor
can continue execution unless the write buffer is full
when a word in the write buffer completes its write into main
memory, that buffer slot is freed and becomes available for
future writes
DEC 3100 write buffer has 4 words
Write-back scheme
write the data only into the cache, and write the block back to
main memory only when it is replaced in the cache
more efficient than write-through, but more complex to implement
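A minimal C sketch contrasting the two write-hit policies (array names and sizes are illustrative assumptions, not from the slides):

    #include <stdio.h>

    #define NENTRIES 1024
    static unsigned cache_data[NENTRIES];      /* one word per cache entry */
    static unsigned memory[NENTRIES * 16];     /* toy main memory */
    static int      dirty[NENTRIES];           /* write-back bookkeeping */

    /* write-through: update cache and memory on every write */
    static void write_through(unsigned idx, unsigned mem_word, unsigned val) {
        cache_data[idx]  = val;
        memory[mem_word] = val;                /* the always-required memory write */
    }

    /* write-back: update only the cache; memory is written at eviction */
    static void write_back(unsigned idx, unsigned val) {
        cache_data[idx] = val;
        dirty[idx] = 1;                        /* block must be written back later */
    }

    int main(void) {
        write_through(5, 5, 42);
        write_back(7, 99);
        printf("memory[5]=%u (written now), dirty[7]=%d (write deferred)\n",
               memory[5], dirty[7]);
        return 0;
    }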
Direct Mapped Cache: Taking Advantage of Spatial Locality
Taking advantage of spatial locality with larger blocks:
Figure: direct-mapped cache with 4K four-word blocks – the 32-bit
address splits into a 16-bit tag (bits 31–16), a 12-bit index (bits
15–4), a 2-bit block offset (bits 3–2), and a 2-bit byte offset (bits
1–0); each of the 4K entries holds a valid bit, a 16-bit tag, and 128
bits (four words) of data; the block offset drives a 4-to-1 multiplexor
that selects one 32-bit word, and tag comparison produces Hit
Cache with 4K 4-word blocks: the byte offset (2 least-significant bits) is ignored,
the next 2 bits are the block offset, and the next 12 bits index into the cache
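The same split in C for this 4K-entry, four-word-block cache (a sketch; the address value is made up):

    #include <stdio.h>
    #include <stdint.h>

    /* [31:16] tag | [15:4] index | [3:2] block offset | [1:0] byte offset */
    int main(void) {
        uint32_t addr = 0x00ABCDEF;               /* arbitrary example address */
        unsigned byte_off  = addr & 0x3;          /* bits [1:0] */
        unsigned block_off = (addr >> 2) & 0x3;   /* bits [3:2], selects word via mux */
        unsigned index     = (addr >> 4) & 0xFFF; /* bits [15:4], 1 of 4096 entries */
        unsigned tag       = addr >> 16;          /* bits [31:16] */
        printf("tag=0x%04X index=%u word=%u byte=%u\n",
               tag, index, block_off, byte_off);
        return 0;
    }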
Direct Mapped Cache: Taking Advantage of Spatial Locality
Cache replacement in large (multiword) blocks:
word read miss: read entire block from main memory
word write miss: cannot simply write word and tag! Why?!
writing in a write-through cache:
if write hit, i.e., tag of requested address and cache
entry are equal, continue as for 1-word blocks by
replacing word and writing block to both cache and
memory
if write miss, i.e., tags are unequal, fetch block from
memory, replace word that caused miss, and write block
to both cache and memory
therefore, unlike case of 1-word blocks, a write miss with
a multiword block causes a memory read
Direct Mapped Cache: Taking Advantage of Spatial Locality
Miss rate falls at first with increasing block size, as expected,
but, as block size becomes a large fraction of total cache size,
miss rate may go up because
there are fewer blocks
competition for blocks increases
blocks get ejected before most of their words are accessed
(thrashing in the cache)
Figure: miss rate vs. block size for various cache sizes – block size
ranges from 4 to 256 bytes on the x-axis, miss rate from 0% to 40% on
the y-axis, with one curve per cache size (1 KB, 8 KB, 16 KB, 64 KB,
256 KB)
Example
How many total bits are required for a direct-mapped cache
with 128 KB of data and 1-word block size, assuming a 32-bit
address?
Cache data = 128 KB = 2^17 bytes = 2^15 words = 2^15 blocks
Cache entry size = block data bits + tag bits + valid bit
= 32 + (32 − 15 − 2) + 1 = 48 bits
Therefore, cache size = 2^15 × 48 bits
= 2^15 × (1.5 × 32) bits = 1.5 × 2^20 bits = 1.5 Mbits
data bits in cache = 128 KB × 8 = 1 Mbit
total cache size / actual cache data = 1.5
Example Problem
How many total bits are required for a direct-mapped cache with
128 KB of data and 4-word block size, assuming a 32-bit
address?
Cache data = 128 KB = 2^17 bytes = 2^15 words = 2^13 blocks
Cache entry size = block data bits + tag bits + valid bit
= 128 + (32 − 13 − 2 − 2) + 1 = 144 bits
Therefore, cache size = 2^13 × 144 bits
= 2^13 × (1.125 × 128) bits = 1.125 × 2^20 bits = 1.125 Mbits
data bits in cache = 128 KB × 8 = 1 Mbit
total cache size / actual cache data = 1.125
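Both examples follow the same formula, which a short C helper can reproduce (a sketch written for these slides; cache_bits is an assumed name, not a standard routine):

    #include <stdio.h>

    /* total bits = #blocks x (block data bits + tag bits + valid bit) */
    static long cache_bits(long data_bytes, int block_words, int addr_bits) {
        long blocks = data_bytes / (block_words * 4);        /* 4 bytes per word */
        int index = 0, blk_off = 0;
        while ((1L << index) < blocks)        index++;       /* log2(#blocks)     */
        while ((1  << blk_off) < block_words) blk_off++;     /* log2(block words) */
        int tag = addr_bits - index - blk_off - 2;           /* 2-bit byte offset */
        return blocks * (block_words * 32L + tag + 1);
    }

    int main(void) {
        printf("1-word blocks: %ld bits\n",
               cache_bits(128 * 1024, 1, 32));  /* 1,572,864 = 1.5 Mbit   */
        printf("4-word blocks: %ld bits\n",
               cache_bits(128 * 1024, 4, 32));  /* 1,179,648 = 1.125 Mbit */
        return 0;
    }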
Example Problem
Consider a cache with 64 blocks and a block size of 16 bytes. What
block number does byte address 1200 map to?
As block size = 16 bytes:
byte address 1200 → block address 1200/16 = 75
As cache size = 64 blocks:
block address 75 → cache block (75 mod 64) = 11
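The same two-step mapping in C (numbers straight from the example):

    #include <stdio.h>

    int main(void) {
        unsigned byte_addr  = 1200;
        unsigned block_addr = byte_addr / 16;  /* block size 16 bytes: 1200/16 = 75 */
        unsigned cache_blk  = block_addr % 64; /* 64 cache blocks: 75 mod 64 = 11   */
        printf("byte %u -> block %u -> cache block %u\n",
               byte_addr, block_addr, cache_blk);
        return 0;
    }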
Block Size Considerations
Larger blocks should reduce miss rate
Due to spatial locality
But in a fixed-sized cache
Larger blocks → fewer of them
More competition → increased miss rate
Larger miss penalty
Can override benefit of reduced miss rate
Early restart and critical-word-first can help
Performance
Simplified model assuming equal read and write miss penalties:
CPU time = (execution cycles + memory stall cycles) × cycle time
memory stall cycles = number of memory accesses × miss rate
× miss penalty
Therefore, two ways to improve performance in cache:
decrease miss rate
decrease miss penalty
what happens if we increase block size?
Example
Assume for a given machine and program:
instruction cache miss rate 2%
data cache miss rate 4%
miss penalty always 40 cycles
CPI of 2 without memory stalls
frequency of load/stores 36% of instructions
1. How much faster is a machine with a perfect cache that never
misses?
2. What happens if we speed up the machine by reducing its CPI to 1
without changing the clock rate?
3. What happens if we speed up the machine by doubling its clock rate,
but the absolute time for a miss penalty remains the same?
Solution
1.
Assume instruction count = I
Instruction miss cycles = I × 2% × 40 = 0.8 I
Data miss cycles = I × 36% × 4% × 40 = 0.576 I
So, total memory-stall cycles = 0.8 I + 0.576 I = 1.376 I
in other words, 1.376 stall cycles per instruction
Therefore, CPI with memory stalls = 2 + 1.376 = 3.376
Assuming instruction count and clock rate remain same for a
perfect cache and a cache that misses:
CPU time with stalls / CPU time with perfect cache
= 3.376 / 2 = 1.688
Performance with a perfect cache is better by a factor of 1.688
Solution (cont.)
2. What happens if we speed up the machine by reducing its CPI to 1
without changing the clock rate?
CPI without stall = 1
CPI with stall = 1 + 1.376 = 2.376 (clock has not changed so
stall cycles per instruction
remains same)
CPU time with stalls / CPU time with perfect cache
= CPI with stall / CPI without stall
= 2.376
Performance with a perfect cache is better by a factor of 2.376
Conclusion: the lower the CPI, the more pronounced the impact of
stall cycles
Solution (cont.)
3. What happens if we speed up the machine by doubling its clock rate, but
the absolute time for a miss penalty remains the same?
With doubled clock rate, miss penalty = 2 × 40 = 80 clock cycles
Stall cycles per instruction = (I × 2% × 80) + (I × 36% × 4% × 80)
= 2.752 I
So, faster machine with cache miss has CPI = 2 + 2.752 = 4.752
CPU time with stalls / CPU time with perfect cache
= CPI with stall / CPI without stall
= 4.752 / 2 = 2.376
Performance with a perfect cache is better by a factor of 2.376
Conclusion: with higher clock rate cache misses “hurt more” than
with lower clock rate
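The three scenarios can be checked with a few lines of C (all values from the example above):

    #include <stdio.h>

    int main(void) {
        double i_miss = 0.02, d_miss = 0.04, ld_st = 0.36;
        double penalty = 40.0, base_cpi = 2.0;
        /* stall cycles per instruction = 0.02*40 + 0.36*0.04*40 = 1.376 */
        double stalls = i_miss * penalty + ld_st * d_miss * penalty;

        printf("1. base CPI 2:    slowdown = %.3f\n",
               (base_cpi + stalls) / base_cpi);              /* 1.688 */
        printf("2. base CPI 1:    slowdown = %.3f\n",
               (1.0 + stalls) / 1.0);                        /* 2.376 */
        printf("3. doubled clock: slowdown = %.3f\n",
               (base_cpi + 2.0 * stalls) / base_cpi);        /* penalty 80: 2.376 */
        return 0;
    }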
Decreasing Miss Rates with Associative Block Placement
Direct mapped: one unique cache location for each memory block
cache block address = memory block address mod number of blocks in cache
Fully associative: each memory block can be placed anywhere in cache
all cache entries are searched (in parallel) to locate the block
Set associative: each memory block maps to a unique set of cache
locations – if each set holds n blocks, the cache is n-way set-associative
cache set address = memory block address mod number of sets in cache
all cache entries in the corresponding set are searched (in parallel)
to locate the block
Increasing degree of associativity
reduces miss rate
increases hit time because of the parallel search and then fetch
Decreasing Miss Rates with Associative Block Placement
Figure: direct mapped – block 12 can go only in cache block
12 mod 8 = 4 (one tag compared); 2-way set associative – block 12 goes
in either block of set 12 mod 4 = 0 (two tags searched); fully
associative – block 12 can go anywhere (all eight tags searched)
Location of a memory block with address 12 in a cache with 8
blocks with different degrees of associativity
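In C, the placement rules for this example are one line each (a sketch mirroring the figure):

    #include <stdio.h>

    int main(void) {
        int block = 12, cache_blocks = 8;
        printf("direct mapped:     cache block %d\n",
               block % cache_blocks);                 /* 12 mod 8 = 4 */
        printf("2-way associative: set %d (either way)\n",
               block % (cache_blocks / 2));           /* 12 mod 4 = 0 */
        printf("fully associative: any of the %d blocks\n", cache_blocks);
        return 0;
    }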
Decreasing Miss Rates with Associative Block Placement
Figure: one-way set associative (direct mapped) – 8 sets of one block
each; two-way set associative – 4 sets of two blocks; four-way set
associative – 2 sets of four blocks; eight-way set associative (fully
associative) – one set of eight blocks; each block holds a tag and data
Configurations of an 8-block cache with different degrees of associativity
Example
Find the number of misses for a cache with four 1-word blocks given
the following sequence of memory block accesses:
0, 8, 0, 6, 8
for each of the following cache configurations
1. direct mapped
2. 2-way set associative (use LRU replacement policy)
3. fully associative
Note about LRU replacement
in a 2-way set-associative cache, LRU replacement can be
implemented with one bit per set whose value indicates the
most recently referenced block (the other block is the victim)
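A minimal sketch of that one-bit scheme (names are illustrative, not from the slides):

    #include <stdio.h>

    /* One LRU bit per set in a 2-way set-associative cache: the bit records
       the most recently used way, so the other way is the replacement victim. */
    #define NSETS 4
    static int mru[NSETS];

    static void touch(int set, int way) { mru[set] = way; }      /* on every access */
    static int  victim(int set)         { return 1 - mru[set]; } /* on a miss */

    int main(void) {
        touch(0, 1);                                   /* way 1 of set 0 just used */
        printf("replace way %d in set 0\n", victim(0)); /* prints 0 */
        return 0;
    }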
Solution
1 (direct-mapped)
Block address Cache block
0 0 (= 0 mod 4)
6 2 (= 6 mod 4)
8 0 (= 8 mod 4)
Block address translation in direct-mapped cache
Block address  Hit/miss  Block 0    Block 1    Block 2    Block 3
0              miss      Memory[0]
8              miss      Memory[8]
0              miss      Memory[0]
6              miss      Memory[0]             Memory[6]
8              miss      Memory[8]             Memory[6]
Cache contents after each reference – each miss adds or replaces an entry
5 misses
Solution (cont.)
2 (two-way set-associative)
Block address Cache set
0 0 (= 0 mod 2)
6 0 (= 6 mod 2)
8 0 (= 8 mod 2)
Block address translation in a two-way set-associative cache
Block address  Hit/miss  Set 0      Set 0      Set 1      Set 1
0              miss      Memory[0]
8              miss      Memory[0]  Memory[8]
0              hit       Memory[0]  Memory[8]
6              miss      Memory[0]  Memory[6]
8              miss      Memory[8]  Memory[6]
Cache contents after each reference – each miss adds or replaces an entry
4 misses
Solution (cont.)
3 (fully associative)
Block address  Hit/miss  Block 0    Block 1    Block 2    Block 3
0              miss      Memory[0]
8              miss      Memory[0]  Memory[8]
0              hit       Memory[0]  Memory[8]
6              miss      Memory[0]  Memory[8]  Memory[6]
8              hit       Memory[0]  Memory[8]  Memory[6]
Cache contents after each reference – each miss adds a new entry
3 misses
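All three answers can be reproduced with a small LRU cache simulator in C (a checking aid written for these slides, not part of them):

    #include <stdio.h>

    #define NBLOCKS 4   /* four 1-word blocks, as in the example */

    /* Count misses for a stream of block addresses in a 4-block cache with
       the given associativity (1 = direct mapped, 4 = fully associative),
       using LRU replacement and preferring invalid ways. */
    static int count_misses(int assoc, const int *refs, int n) {
        int sets = NBLOCKS / assoc;
        int tag[NBLOCKS], last_use[NBLOCKS], valid[NBLOCKS];
        int misses = 0;
        for (int i = 0; i < NBLOCKS; i++) { valid[i] = 0; last_use[i] = -1; }
        for (int t = 0; t < n; t++) {
            int set = refs[t] % sets;
            int hit = -1, victim = set * assoc;
            for (int w = 0; w < assoc; w++) {
                int e = set * assoc + w;
                if (valid[e] && tag[e] == refs[t]) hit = e;
                if (!valid[e] && valid[victim]) victim = e;          /* prefer empty way */
                else if (valid[e] && valid[victim] &&
                         last_use[e] < last_use[victim]) victim = e; /* else LRU way */
            }
            if (hit < 0) { misses++; hit = victim; valid[hit] = 1; tag[hit] = refs[t]; }
            last_use[hit] = t;
        }
        return misses;
    }

    int main(void) {
        int refs[] = { 0, 8, 0, 6, 8 };
        printf("direct mapped:     %d misses\n", count_misses(1, refs, 5)); /* 5 */
        printf("2-way LRU:         %d misses\n", count_misses(2, refs, 5)); /* 4 */
        printf("fully associative: %d misses\n", count_misses(4, refs, 5)); /* 3 */
        return 0;
    }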
Implementation of a Set-Associative Cache
Figure: 4-way set-associative cache – the 32-bit address splits into a
22-bit tag (bits 31–10), an 8-bit index (bits 9–2), and a 2-bit byte
offset; the index selects one of 256 sets (0–255), each holding four
(valid, tag, data) entries; four comparators check the tags in parallel
and a 4-to-1 multiplexor selects the matching Data, producing Hit
4-way set-associative cache with 4 comparators and one 4-to-1
multiplexor: cache size is 1K blocks = 256 sets × 4 blocks per set
Performance with Set-Associative Caches
Figure: miss rate (0% to 15%) vs. associativity (one-way, two-way,
four-way, eight-way) for eight cache sizes (1 KB, 2 KB, 4 KB, 8 KB,
16 KB, 32 KB, 64 KB, 128 KB) – data generated from SPEC92 benchmarks
with 32-byte block size for all caches
Replacement Policy
Direct mapped: no choice
Set associative
Prefer non-valid entry, if there is one
Otherwise, choose among entries in the set
Least-recently used (LRU)
Choose the one unused for the longest time
Simple for 2-way, manageable for 4-way, too hard
beyond that
Random
Gives approximately the same performance as
LRU for high associativity
Multilevel Caches
Primary cache attached to CPU
Small, but fast
Level-2 cache services misses from primary
cache
Larger, slower, but still faster than main memory
Main memory services L-2 cache misses
Some high-end systems include L-3 cache
Decreasing Miss Penalty with Multilevel Caches
Add a second-level cache
primary cache is on the same chip as the processor
use SRAMs to add a second-level cache, between main
memory and the first-level cache
if a miss occurs in the primary cache → the second-level cache
is accessed
if data is found in the second-level cache → the miss penalty is
the access time of the second-level cache, which is much less
than main memory access time
if the access misses again at the second level → main memory
access is required and a large miss penalty is incurred
Design considerations with two levels of caches:
try to optimize the hit time on the 1st-level cache, to reduce
the clock cycle
try to optimize the miss rate on the 2nd-level cache, to reduce
memory access penalties
In other words, the 2nd level allows the 1st level to go for speed
without “worrying” about failure…
Example Problem
Assume a 500 MHz machine with
base CPI 1.0
main memory access time 200 ns.
miss rate 5%
How much faster will the machine be if we add a second-level
cache with 20ns access time that decreases the miss rate to 2%?
Solution
Miss penalty to main = 200 ns / (2 ns / clock cycle) = 100 clock cycles
Effective CPI with one level of cache
= Base CPI + memory-stall cycles per instruction
= 1.0 + 5% × 100 = 6.0
With two levels of cache, miss penalty to second-level cache
= 20 ns / (2 ns / clock cycle) = 10 clock cycles
Effective CPI with two levels of cache
= Base CPI + primary stalls per instruction
+ secondary stalls per instruction
= 1 + 5% × 10 + 2% × 100 = 3.5
(equivalently, 1 + (5% − 2%) × 10 + 2% × (10 + 100) = 3.5)
Therefore, machine with secondary cache is faster by a factor of
6.0 / 3.5 = 1.71
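The same computation in C (all constants from the problem statement):

    #include <stdio.h>

    int main(void) {
        double cycle    = 2.0;                 /* ns, 500 MHz clock         */
        double main_pen = 200.0 / cycle;       /* 100 cycles to main memory */
        double l2_pen   = 20.0 / cycle;        /* 10 cycles to L2           */
        double cpi_one  = 1.0 + 0.05 * main_pen;                 /* 6.0 */
        double cpi_two  = 1.0 + 0.05 * l2_pen + 0.02 * main_pen; /* 3.5 */
        printf("speedup with L2 = %.2f\n", cpi_one / cpi_two);   /* 1.71 */
        return 0;
    }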
Multilevel On-Chip Caches
Intel Nehalem 4-core processor
Per core: 32KB L1 I-cache, 32KB L1 D-cache, 512KB L2 cache
3-Level Cache Organization
              Intel Nehalem                        AMD Opteron X4
L1 caches     I-cache: 32KB, 64-byte blocks,       I-cache: 32KB, 64-byte blocks,
(per core)    4-way, approx LRU replacement,       2-way, LRU replacement,
              hit time n/a                         hit time 3 cycles
              D-cache: 32KB, 64-byte blocks,       D-cache: 32KB, 64-byte blocks,
              8-way, approx LRU replacement,       2-way, LRU replacement,
              write-back/allocate, hit time n/a    write-back/allocate, hit time 9 cycles
L2 unified    512KB, 64-byte blocks, 8-way,        512KB, 64-byte blocks, 16-way,
cache         approx LRU replacement,              approx LRU replacement,
(per core)    write-back/allocate, hit time n/a    write-back/allocate, hit time n/a
L3 unified    8MB, 64-byte blocks, 16-way,         2MB, 64-byte blocks, 32-way,
cache         replacement n/a,                     replace block shared by fewest cores,
(shared)      write-back/allocate, hit time n/a    write-back/allocate, hit time 32 cycles
n/a: data not available
Sources of Misses (3C’s)
Compulsory misses (aka cold start misses)
First access to a block
Capacity misses
Due to finite cache size
A replaced block is later accessed again
Conflict misses (aka collision misses)
In a non-fully associative cache
Due to competition for entries in a set
Would not occur in a fully associative cache of the
same total size
Cache Design Trade-offs
Design change            Effect on miss rate           Negative performance effect
Increase cache size      Decrease capacity misses      May increase access time
Increase associativity   Decrease conflict misses      May increase access time
Increase block size      Decrease compulsory misses    Increases miss penalty