Cache Performance Research Paper

The document discusses cache performance and memory hierarchy. It provides examples of direct-mapped cache, calculating cache hits and misses. The key aspects covered are cache block placement, identification and replacement strategies on misses.


CSCE430/830 Computer Architecture

Memory Hierarchy: Set-Associative Cache

Lecturer: Prof. Hong Jiang

Courtesy of Yifeng Zhu (U. Maine)

Fall, 2006

Portions of these slides are derived from: Dave Patterson © UCB

CSCE430/830 Memory: Set-Associative $
Cache performance
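• Miss-oriented Approach to Memory Access:

\[ \text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{Execution}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{MissRate} \times \text{MissPenalty} \right) \times \text{CycleTime} \]

\[ \text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{Execution}} + \frac{\text{MemMisses}}{\text{Inst}} \times \text{MissPenalty} \right) \times \text{CycleTime} \]

– CPI_Execution includes ALU and Memory instructions

• Separating out Memory component entirely

– AMAT = Average Memory Access Time
– CPI_AluOps does not include memory instructions

\[ \text{CPUtime} = \text{IC} \times \left( \frac{\text{AluOps}}{\text{Inst}} \times \text{CPI}_{\text{AluOps}} + \frac{\text{MemAccess}}{\text{Inst}} \times \text{AMAT} \right) \times \text{CycleTime} \]

\[ \text{AMAT} = \text{HitTime} + \text{MissRate} \times \text{MissPenalty} \]
\[ = \left( \text{HitTime}_{\text{Inst}} + \text{MissRate}_{\text{Inst}} \times \text{MissPenalty}_{\text{Inst}} \right) + \left( \text{HitTime}_{\text{Data}} + \text{MissRate}_{\text{Data}} \times \text{MissPenalty}_{\text{Data}} \right) \]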
Cache Performance Example
• Assume we have a computer where the cycles per instruction (CPI) is 1.0
when all memory accesses hit in the cache. The only data accesses are
loads and stores, and these total 50% of the instructions. If the miss penalty
is 25 clock cycles and the miss rate is 2% (unified instruction and data
cache), how much faster would the computer be if all instruction and
data accesses were cache hits?

\[ \text{CPUtime} = (\text{CPUClockCycles} + \text{MemoryStalls}) \times \text{ClockCycleTime} = (\text{IC} \times \text{CPI} + \text{MemoryStalls}) \times \text{ClockCycleTime} \]

When all accesses hit:

\[ \text{CPUtime}_{\text{Ideal}} = (\text{IC} \times 1.0 + 0) \times \text{ClockCycleTime} = \text{IC} \times \text{ClockCycleTime} \]

In reality:

\[ \text{MemoryStallCycles} = \text{IC} \times \frac{\text{MemAccess}}{\text{Inst}} \times \text{MissRate} \times \text{MissPenalty} = \text{IC} \times (1 + 0.5) \times 0.02 \times 25 = \text{IC} \times 0.75 \]

\[ \text{CPUtime}_{\text{Cache}} = (\text{IC} \times 1.0 + \text{IC} \times 0.75) \times \text{ClockCycleTime} = 1.75 \times \text{IC} \times \text{ClockCycleTime} \]

So the computer with no cache misses is 1.75 times faster.
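The arithmetic of this example can be checked with a short script. This is a minimal sketch in Python; the function name `cpu_time_cycles` and the instruction count of one million are assumptions made for illustration (the instruction count cancels in the ratio).

```python
def cpu_time_cycles(ic, cpi_base, mem_accesses_per_inst, miss_rate, miss_penalty):
    """Total clock cycles: IC * (CPI + accesses/inst * miss rate * miss penalty)."""
    stall_cycles_per_inst = mem_accesses_per_inst * miss_rate * miss_penalty
    return ic * (cpi_base + stall_cycles_per_inst)


ic = 1_000_000  # hypothetical instruction count; it cancels in the ratio

# 1 instruction fetch + 0.5 data accesses per instruction = 1.5 accesses
ideal = cpu_time_cycles(ic, 1.0, 1.5, 0.0, 25)   # every access hits
real = cpu_time_cycles(ic, 1.0, 1.5, 0.02, 25)   # 2% miss rate, 25-cycle penalty

speedup = real / ideal
print(f"speedup with a perfect cache: {speedup:.2f}")
```

The clock cycle time is left out because it multiplies both CPU times equally.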
Performance Example Problem
Assume:
– For gcc, the frequency for all loads and stores is 36%.
– instruction cache miss rate for gcc = 2%
– data cache miss rate for gcc = 4%.
– If a machine has a CPI of 2 without memory stalls
– and the miss penalty is 40 cycles for all misses,
how much faster is a machine with a perfect cache?

Instruction miss cycles = IC x 2% x 40 = 0.80 x IC

Data miss cycles = IC x 36% x 4% x 40 = 0.576 x IC

CPIstall = 2 + (0.80 + 0.576) = 2 + 1.376 = 3.376

(IC x CPIstall x Clock period) / (IC x CPIperfect x Clock period) = 3.376 / 2 = 1.69

Performance Example Problem

Assume: we increase the performance of the previous machine by
doubling its clock rate. Since the main memory speed is unlikely to
change, assume that the absolute time to handle a cache miss does not
change. How much faster will the machine be with the faster clock?

For gcc, the frequency for all loads and stores is 36%.
The miss penalty becomes 80 cycles of the faster clock.
Instruction miss cycles = IC x 2% x 80 = 1.600 x IC
Data miss cycles = IC x 36% x 4% x 80 = 1.152 x IC
Total miss cycles = 2.752 x IC, so CPIfastClk = 2 + 2.752 = 4.752

(IC x CPIslowClk x Clock period) / (IC x CPIfastClk x Clock period x 0.5)
= 3.376 / (4.752 x 0.5) = 1.42 (not 2)
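The two gcc calculations (perfect cache, then doubled clock rate) follow the same formula, so they can be reproduced with one helper. This is a sketch only; the function name `cpi_with_stalls` and its parameter names are assumptions made for illustration.

```python
def cpi_with_stalls(cpi_base, load_store_freq, inst_miss_rate, data_miss_rate, miss_penalty):
    """CPI including memory stall cycles per instruction."""
    inst_stalls = inst_miss_rate * miss_penalty                     # every instruction is fetched
    data_stalls = load_store_freq * data_miss_rate * miss_penalty   # only loads and stores access data
    return cpi_base + inst_stalls + data_stalls


cpi_slow = cpi_with_stalls(2, 0.36, 0.02, 0.04, 40)  # original clock, 40-cycle penalty
cpi_fast = cpi_with_stalls(2, 0.36, 0.02, 0.04, 80)  # doubled clock, same miss time = 80 cycles

perfect_cache_speedup = cpi_slow / 2              # same clock period in both cases
fast_clock_speedup = cpi_slow / (cpi_fast * 0.5)  # the fast clock period is half as long

print(f"CPI with stalls: {cpi_slow:.3f} (slow clock), {cpi_fast:.3f} (fast clock)")
print(f"perfect cache speedup: {perfect_cache_speedup:.2f}")
print(f"speedup from doubling the clock: {fast_clock_speedup:.2f}")
```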

Fundamental Questions

• Q1: Where can a block be placed in the upper level?

(Block placement)

• Q2: How is a block found if it is in the upper level?

(Block identification)

• Q3: Which block should be replaced on a miss?

(Block replacement)

• Q4: What happens on a write?

(Write strategy)


Q1: Block Placement

• Where can a block be placed in the cache?

– In one predetermined place - direct-mapped
» Use part of address to calculate block location in cache
» Compare the tag of that cache block with the address tag to check if block is present
– Anywhere in cache - fully associative
» Compare tag to every block in cache
– In a limited set of places - set-associative
» Use portion of address to calculate set (like direct-mapped)
» Place in any block in the set
» Compare tag to every block in set
» Hybrid of direct mapped and fully associative

Direct Mapped Block Placement

Address maps to block:

location = (block address MOD # blocks in cache)

[Figure: a cache with locations 0, 4, 8, C drawn above a memory with addresses 00, 04, 08, ... 4C]

Example: Accessing A Direct-Mapped Cache
• DM cache contains 4 one-word blocks. Find the # Misses for each
cache given this sequence of memory block accesses: 0, 8, 0, 6, 8

Access  Mem Block  Mapping      Hit/Miss  Action
1       0          0 mod 4 = 0  miss      Set 0 is empty: write Mem[0]
2       8          8 mod 4 = 0  miss      Set 0 contains Mem[0]: overwrite with Mem[8]
3       0          0 mod 4 = 0  miss      Set 0 contains Mem[8]: overwrite with Mem[0]
4       6          6 mod 4 = 2  miss      Set 2 is empty: write Mem[6]
5       8          8 mod 4 = 0  miss      Set 0 contains Mem[0]: overwrite with Mem[8]

Final cache contents: Block 0 = Mem[8], Block 1 empty, Block 2 = Mem[6], Block 3 empty
Total: 5 misses, 0 hits
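The direct-mapped walk-through can be reproduced with a few lines of code. This is a minimal sketch, assuming one-word blocks and block-address accesses as in the example; the function name `simulate_direct_mapped` is made up for illustration.

```python
def simulate_direct_mapped(num_blocks, accesses):
    """Return (hit/miss list, final cache contents) for a direct-mapped cache."""
    cache = [None] * num_blocks        # None marks an empty (invalid) block
    results = []
    for block in accesses:
        index = block % num_blocks     # location = block address MOD # blocks in cache
        if cache[index] == block:
            results.append("hit")
        else:
            results.append("miss")
            cache[index] = block       # fill an empty block or overwrite the old one
    return results, cache


results, cache = simulate_direct_mapped(4, [0, 8, 0, 6, 8])
print(results)   # hit/miss outcome of each access
print(cache)     # memory block held in cache blocks 0..3
```

Blocks 0 and 8 both map to index 0, so they keep evicting each other even though cache blocks 1 and 3 stay empty.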
Direct-Mapped Cache with n one-word blocks
• Pro: find data fast
• Con: What if we access 00001 and 10001 repeatedly?
=> We always miss…

[Figure: an 8-block cache (indices 000 to 111) and memory blocks 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101, which map to cache indices 001 and 101]
Fully Associative Block Placement

Arbitrary block mapping:

location = any

[Figure: any memory address (00, 04, 08, ... 4C) can be placed in any cache block]

Example: Accessing A Fully-Associative Cache

• Fully-Associative cache contains 4 one-word blocks. Find the # Misses
for each cache given this sequence of memory block accesses: 0, 8,
0, 6, 8

FA Block Replacement Rule: replace least recently used block in set

Access  Mem Block  Hit/Miss  Action                                       Set 0 after access
1       0          miss      Set 0 is empty: write Mem[0] to Block 0      Mem[0]
2       8          miss      Blocks 1-3 are LRU: write Mem[8] to Block 1  Mem[0] Mem[8]
3       0          hit       Block 0 contains Mem[0]                      Mem[0] Mem[8]
4       6          miss      Blocks 2-3 are LRU: write Mem[6] to Block 2  Mem[0] Mem[8] Mem[6]
5       8          hit       Block 1 contains Mem[8]                      Mem[0] Mem[8] Mem[6]

Total: 3 misses, 2 hits
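The fully-associative walk-through with LRU replacement can be sketched as follows. This is an illustration only: it uses Python's `collections.OrderedDict` to keep blocks in recency order, and the function name `simulate_fully_associative` is an assumption.

```python
from collections import OrderedDict


def simulate_fully_associative(num_blocks, accesses):
    """Return the hit/miss list for a fully-associative cache with LRU replacement."""
    lru = OrderedDict()                    # keys ordered from least to most recently used
    results = []
    for block in accesses:
        if block in lru:
            lru.move_to_end(block)         # mark as most recently used
            results.append("hit")
        else:
            if len(lru) == num_blocks:     # cache full: evict the least recently used block
                lru.popitem(last=False)
            lru[block] = True
            results.append("miss")
    return results


fa_results = simulate_fully_associative(4, [0, 8, 0, 6, 8])
print(fa_results)
print("misses:", fa_results.count("miss"))
```

With four blocks and only three distinct addresses, nothing is ever evicted, so only the first reference to each block misses.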

Fully-Associative Cache Basics
1 set, n blocks: no mapping restrictions on how blocks are
stored in cache. Replacement can be done in many ways, e.g. the least
recently used block is replaced (LRU)

Example: 1-set, 8-block FA cache

Set 0: Block 0 | Block 1 | Block 2 | Block 3 | Block 4 | Block 5 | Block 6 | Block 7

[Figure: memory blocks 0…0000 through 0…1111, … ; any of them can be stored in any block of Set 0]

PRO: Less likely to replace needed data

CON: Must search entire cache for hit/miss
Set-Associative Block Placement

Address maps to set:

location = (block address MOD # sets in cache)
(arbitrary location in set)

[Figure: a cache organized as Set 0 to Set 3 with two blocks per set (*0 *0, *4 *4, *8 *8, *C *C), drawn above a memory with addresses 00, 04, 08, ... 4C]

Set-Associative Cache Basics
n/m sets, m blocks per set (m-way): blocks are mapped from
memory location to a specific set in cache

Mapping: Mem Address % (n/m). If n/m is a power of 2, log2(n/m) = # low-order bits
of memory address = cache set index

Example: 4-set, 2-way SA cache (ADD mod 4)

Set   Block 0   Block 1
00
01
10
11

[Figure: memory blocks 0…0000 (Mem block 0) through 0…1111, … ; Mem block 0 and Mem block 8 (0…1000) are labeled]

Example: Accessing A Set-Associative Cache
• 2-way Set-Associative cache contains 2 sets, 2 one-word blocks each.
Find the # Misses for each cache given this sequence of memory block
accesses: 0, 8, 0, 6, 8

SA Block Replacement Rule: replace least recently used block in set

Access  Mem Block  Mapping      Hit/Miss  Action                                          Set 0 after access
1       0          0 mod 2 = 0  miss      Set 0 is empty: write Mem[0] to Block 0         Mem[0]
2       8          8 mod 2 = 0  miss      Set 0, Block 1 is LRU: write Mem[8]             Mem[0] Mem[8]
3       0          0 mod 2 = 0  hit       Set 0, Block 0 contains Mem[0]                  Mem[0] Mem[8]
4       6          6 mod 2 = 0  miss      Set 0, Block 1 is LRU: overwrite with Mem[6]    Mem[0] Mem[6]
5       8          8 mod 2 = 0  miss      Set 0, Block 0 is LRU: overwrite with Mem[8]    Mem[8] Mem[6]

Set 1 stays empty throughout.
Total: 4 misses, 1 hit
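The set-associative walk-through can be sketched with one simulator that takes the number of sets and ways as parameters. This is a minimal illustration assuming LRU replacement within each set and one-word blocks; the function name `simulate_set_associative` is an assumption.

```python
from collections import OrderedDict


def simulate_set_associative(num_sets, ways, accesses):
    """Return the hit/miss list for an LRU set-associative cache."""
    sets = [OrderedDict() for _ in range(num_sets)]   # each set ordered LRU -> MRU
    results = []
    for block in accesses:
        target = sets[block % num_sets]    # set index = block address MOD # sets
        if block in target:
            target.move_to_end(block)      # refresh recency on a hit
            results.append("hit")
        else:
            if len(target) == ways:        # set full: evict its least recently used block
                target.popitem(last=False)
            target[block] = True
            results.append("miss")
    return results


trace = [0, 8, 0, 6, 8]
sa_results = simulate_set_associative(2, 2, trace)   # 2 sets, 2-way
dm_results = simulate_set_associative(4, 1, trace)   # 4 sets, 1-way (direct-mapped)
fa_results = simulate_set_associative(1, 4, trace)   # 1 set, 4-way (fully associative)

for name, res in [("2-way SA", sa_results), ("DM", dm_results), ("FA", fa_results)]:
    print(name, res, "misses:", res.count("miss"))
```

Setting the number of ways to 1 or the number of sets to 1 reproduces the direct-mapped and fully-associative examples, which shows the three organizations are one design with a different associativity.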
Set-Associative Cache Basics
n/m sets, m blocks per set (m-way): blocks are mapped from
memory location to a specific set in cache

Mapping: Mem Address % (n/m). If n/m is a power of 2, log2(n/m) = # low-order bits
of memory address = cache set index

Example: 4-set, 2-way SA cache (X mod 4)

Set   Block 0   Block 1
00
01
10
11

[Figure: memory blocks 0…0000 (Mem block 0) through 0…1111, … ; Mem block 0 and Mem block 8 (0…1000) are labeled]

PRO: Easier to find but won't always overwrite

CON: Must search set for hit/miss
Associativity Considerations
• DM and FA are special cases of SA cache
– Set-Associative: n/m sets; m blocks/set (associativity=m)
– Direct-Mapped: m=1 (1-way set-associative, associativity=1)
– Fully-Associative: m=n (n-way set-associative, associativity=n)

• Advantage of Associativity: as associativity increases,
miss rate decreases (because there are more blocks per set, so
we're less likely to overwrite a block that is still needed)
• Disadvantage of Associativity: as associativity increases,
hit time increases (because we have to search more blocks;
more HW required)
• Block Replacement: LRU or random. Random is easier to
implement and often not much worse

Q2: Block Identification

• Every cache block has an address tag that
identifies its location in memory
• Hit when tag and address of desired word
match (comparison by hardware)
• Q: What happens when a cache block is empty?
A: Mark this condition with a valid bit (0 if empty)

Valid  Tag        Data
1      0x00001C0  0xff083c2d

Q2: Block Identification?

• Tag on each block
– No need to check index or block offset
• Increasing associativity shrinks index, expands tag

Address layout:

|        Block Address        | Block Offset |
|      Tag      |    Index    |              |

Fully Associative: No index

Direct Mapped: Large index

Direct-Mapped Cache Design

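[Figure: direct-mapped cache lookup. The ADDRESS is split into Tag, Index, and Byte Offset; the Index selects one row (ADDR) of the CACHE SRAM, and the row's tag is compared with the address Tag to produce HIT and DATA.]

Example ADDRESS: Tag = 0x0000000, Index = 3, Byte Offset = 0; HIT = 1

CACHE SRAM row layout: DATA[59] = V, DATA[58:32] = Tag, DATA[31:0] = Data

V  Tag        Data
1  0x00001C0  0xff083c2d
0
1  0x0000000  0x00000021
1  0x0000000  0x00000103
0
0
1
0  0x23F0210  0x00000009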

Set Associative Cache Design

• Key idea:
– Divide cache into sets
– Allow block anywhere in a set

• Advantages:
– Better hit rate

• Disadvantages:
– More tag bits
– More hardware
– Higher access time

[Figure: A Four-Way Set-Associative Cache. The 32-bit address supplies a 22-bit tag and an 8-bit index that selects one of 256 sets (0 to 255); each of the four ways holds V, Tag (22 bits), and Data (32 bits); a 4-to-1 multiplexor selects the Data, and Hit is signaled.]

Fully Associative Cache Design

• Key idea: set size of one block

– 1 comparator required for each block
– No address decoding
– Practical only for small caches due to hardware demands

tag in: 11110111    data out: 1111000011110000101011

     tag        data
=    00011100   0000111100001111111101
=    11110111   1111000011110000101011
=    11111110   0000000000001111111100
=    00000011   1110111100001110000001
=    11100110   1111111111111111111111

("=" marks the comparator on each block; the entry whose tag equals "tag in" supplies "data out")


Calculating Bits in Cache
• How many total bits are needed for a direct-mapped cache with
64 KBytes of data and one-word blocks, assuming a 32-bit
address?
– 64 Kbytes = 16 K words = 2^14 words = 2^14 blocks
– block size = 4 bytes => offset size = 2 bits
– #sets = #blocks = 2^14 => index size = 14 bits
– tag size = address size - index size - offset size = 32 - 14 - 2 = 16 bits
– bits/block = data bits + tag bits + valid bit = 32 + 16 + 1 = 49
– bits in cache = #blocks x bits/block = 2^14 x 49 = 784 Kbits = 98 Kbytes
• How many total bits would be needed for a 4-way set-associative
cache to store the same amount of data?
– block size and #blocks do not change
– #sets = #blocks/4 = (2^14)/4 = 2^12 => index size = 12 bits
– tag size = address size - index size - offset size = 32 - 12 - 2 = 18 bits
– bits/block = data bits + tag bits + valid bit = 32 + 18 + 1 = 51
– bits in cache = #blocks x bits/block = 2^14 x 51 = 816 Kbits = 102 Kbytes
• Increase associativity => increase bits in cache

Calculating Bits in Cache

• How many total bits are needed for a direct-mapped
cache with 64 KBytes of data and 8-word blocks,
assuming a 32-bit address?
– 64 Kbytes = 2^14 words = (2^14)/8 blocks = 2^11 blocks
– block size = 32 bytes => offset size = 5 bits
– #sets = #blocks = 2^11 => index size = 11 bits
– tag size = address size - index size - offset size = 32 - 11 - 5 = 16 bits
– bits/block = data bits + tag bits + valid bit = 8x32 + 16 + 1 = 273 bits
– bits in cache = #blocks x bits/block = 2^11 x 273 = 546 Kbits = 68.25 Kbytes
• Increase block size => decrease bits in cache
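The bit-count calculations follow one recipe, so they can be checked with a single function. This is a sketch under the same assumptions as the slides (32-bit byte addresses, 4-byte words, one valid bit per block, power-of-two sizes); the function name `cache_total_bits` is made up for illustration.

```python
def cache_total_bits(data_bytes, words_per_block, ways, addr_bits=32, word_bytes=4):
    """Total storage bits (data + tag + valid) for a cache; all sizes are powers of two."""
    block_bytes = words_per_block * word_bytes
    num_blocks = data_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1     # log2(block size in bytes)
    index_bits = num_sets.bit_length() - 1         # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1  # data bits + tag bits + valid bit
    return num_blocks * bits_per_block


configs = [
    ("direct-mapped, 1-word blocks", 1, 1),
    ("4-way, 1-word blocks", 1, 4),
    ("direct-mapped, 8-word blocks", 8, 1),
]
for label, words, ways in configs:
    bits = cache_total_bits(64 * 1024, words, ways)
    print(f"{label}: {bits} bits = {bits / 8 / 1024} Kbytes")
```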

Q3: Block Replacement

• On a miss, data must be read from memory.

• So, where do we put the new data?
– Direct-mapped cache: must place in fixed location
– Set-associative, fully-associative - can pick within set


Replacement Algorithms
• When a block is fetched, which block in the target set should be
replaced?
• Optimal algorithm:
» replace the block that will not be used for the longest time
(must know the future)
• Usage based algorithms:
– Least recently used (LRU)
» replace the block that has been referenced least recently
» hard to implement
• Non-usage based algorithms:
– First-in First-out (FIFO)
» treat the set as a circular queue, replace head of queue.
» easy to implement
– Random (RAND)
» replace a random block in the set
» even easier to implement
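The three practical policies differ only in how the victim is chosen and in whether a hit updates any state. The sketch below models a single set; the function name `count_misses`, the policy strings, and the fixed random seed are assumptions made for illustration.

```python
import random


def count_misses(policy, ways, accesses, seed=0):
    """Count misses in one cache set under "LRU", "FIFO", or "RAND" replacement."""
    rng = random.Random(seed)      # fixed seed so the RAND run is repeatable
    blocks = []                    # LRU: least recent first; FIFO: oldest first
    misses = 0
    for block in accesses:
        if block in blocks:
            if policy == "LRU":    # only LRU updates its state on a hit
                blocks.remove(block)
                blocks.append(block)
            continue
        misses += 1
        if len(blocks) == ways:
            # LRU and FIFO evict the head of the list; RAND picks any position
            victim = rng.randrange(ways) if policy == "RAND" else 0
            blocks.pop(victim)
        blocks.append(block)
    return misses


trace = [0, 8, 0, 6, 8]
lru_misses = count_misses("LRU", 2, trace)
fifo_misses = count_misses("FIFO", 2, trace)
rand_misses = count_misses("RAND", 2, trace)
print("LRU:", lru_misses, "FIFO:", fifo_misses, "RAND:", rand_misses)
```

On this short trace FIFO happens to miss less often than LRU, a reminder that neither is the optimal algorithm; LRU needs the extra bookkeeping on every hit, which is why it is harder to implement.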


Q4: Write Strategy

• What happens on a write?

– Write through - write to memory, stall processor until
done
– Write buffer - place in buffer (allows pipeline to continue*)
– Write back - delay write to memory until block is replaced
in cache
• Special considerations when using DMA,
multiprocessors (coherence between caches)


Write Through
• Store by processor updates cache and memory
• Memory always consistent with cache
• ~2X more loads than stores
• WT is always combined with write buffers so that the processor does not
wait for the lower level memory

[Figure: write-through datapath between Processor, Cache, and Memory; arrows labeled Store, Load, and Cache Load]
Write Back

• Store by processor only updates cache line
• Modified line written to memory only when it is evicted
– Requires "dirty bit" for each line
» Set when line in cache is modified
» Indicates that line in memory is stale
• Memory not always consistent with cache
• Repeated writes to the same line do not each cause a write to memory

[Figure: write-back datapath between Processor, Cache, and Memory; arrows labeled Store, Load, Cache Load, and Write Back]

Store Miss?

• Write-Allocate
– Bring written block into cache
– Update word in block
– Anticipate further use of block
• No-write Allocate
– Main memory is updated
– Cache contents unmodified
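The difference in memory traffic between the two write policies can be illustrated with a small counter. This is a sketch under stated assumptions, none of which come from the slides: a direct-mapped cache with one-word blocks, a store-only trace, write-allocate for the write-back case, and a final flush of dirty lines so that both policies leave memory up to date. The function name `memory_writes` is made up.

```python
def memory_writes(stores, num_blocks, write_back):
    """Count writes that reach memory for a sequence of stored block addresses."""
    cache = {}        # index -> (block, dirty bit)
    writes = 0
    for block in stores:
        if not write_back:
            writes += 1                   # write-through: every store goes to memory
            continue
        index = block % num_blocks
        resident = cache.get(index)
        if resident is not None and resident[0] != block and resident[1]:
            writes += 1                   # evicting a dirty line writes it back
        cache[index] = (block, True)      # write-allocate, then set the dirty bit
    if write_back:
        # assumed final flush: write remaining dirty lines back to memory
        writes += sum(1 for _, dirty in cache.values() if dirty)
    return writes


stores = [0, 0, 0, 1, 1]
wt_writes = memory_writes(stores, 4, write_back=False)
wb_writes = memory_writes(stores, 4, write_back=True)
print("write-through:", wt_writes, "write-back:", wb_writes)
```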


Cache Basics

• Cache: level of temporary memory storage between
CPU and main memory. Improves overall memory
speed by taking advantage of the principle of locality

• Cache is divided into sets; each set holds blocks from a
particular group of main memory locations

• Cache parameters
– Cache size, block size, associativity

• 3 types of Cache (w/ n total blocks):
– Direct-mapped: n sets, each holds 1 block
– Fully-associative: 1 set, holds n blocks
– Set-associative: n/m sets, each holds m blocks

Classifying Misses: 3C

– Compulsory: The first access to a block is not in the cache, so
the block must be brought into the cache. Also called cold start
misses or first reference misses.
(Misses in even an Infinite Cache)
– Capacity: If the cache cannot contain all the blocks needed
during execution of a program, capacity misses will occur due
to blocks being discarded and later retrieved.
(Misses in Fully Associative Size X Cache)
– Conflict: If block-placement strategy is set associative or direct
mapped, conflict misses (in addition to compulsory & capacity
misses) will occur because a block can be discarded and later
retrieved if too many blocks map to its set. Also called collision
misses or interference misses.
(Misses in N-way Associative, Size X Cache)
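The parenthetical definitions suggest a way to classify each miss mechanically: run the trace through the real cache and, in parallel, through a fully-associative LRU cache of the same size. The sketch below does this for a direct-mapped cache with one-word blocks; the function name `classify_misses` and the choice of LRU for the reference cache are assumptions.

```python
from collections import OrderedDict


def classify_misses(num_blocks, accesses):
    """Split the misses of a direct-mapped cache into the three C's."""
    dm = [None] * num_blocks       # the direct-mapped cache being studied
    fa = OrderedDict()             # same-size fully-associative LRU reference cache
    seen = set()                   # blocks referenced at least once (infinite cache)
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in accesses:
        # update the fully-associative reference cache
        fa_hit = block in fa
        if fa_hit:
            fa.move_to_end(block)
        else:
            if len(fa) == num_blocks:
                fa.popitem(last=False)
            fa[block] = True
        # update the direct-mapped cache
        index = block % num_blocks
        dm_hit = dm[index] == block
        dm[index] = block
        if not dm_hit:
            if block not in seen:
                counts["compulsory"] += 1   # would miss even in an infinite cache
            elif not fa_hit:
                counts["capacity"] += 1     # also misses in the fully-associative cache
            else:
                counts["conflict"] += 1     # misses only because of placement
        seen.add(block)
    return counts


counts = classify_misses(4, [0, 8, 0, 6, 8])
print(counts)
```

For the trace 0, 8, 0, 6, 8 used in the earlier examples, the two misses beyond the first references are conflict misses, which is exactly what the fully-associative cache removed.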

Classifying Misses: 3C

[Figure: 3Cs Absolute Miss Rate (SPEC92). Miss Rate per Type (0 to 0.14) versus Cache Size (KB) (1 to 128) for 1-way, 2-way, 4-way, and 8-way caches, split into Conflict, Capacity, and Compulsory misses.]

Compulsory misses are vanishingly small.
2:1 Cache Rule

miss rate of a 1-way associative cache of size X
= miss rate of a 2-way associative cache of size X/2

[Figure: Miss Rate per Type (0 to 0.14) versus Cache Size (KB) (1 to 128) for 1-way, 2-way, 4-way, and 8-way caches, split into Conflict, Capacity, and Compulsory misses.]
3C Relative Miss Rate

[Figure: 3C Relative Miss Rate. Miss Rate per Type (0% to 100%) versus Cache Size (KB) (1 to 128) for 1-way, 2-way, 4-way, and 8-way caches, split into Conflict, Capacity, and Compulsory misses.]

Flaws: for fixed block size
Good: insight => invention

Improve Cache Performance
Improve cache and memory access times:

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty
(Hit Time: Section 5.5; Miss Rate: Section 5.3; Miss Penalty: Section 5.4)

\[ \text{CPUtime} = \text{IC} \times \left( \text{CPI}_{\text{Execution}} + \frac{\text{MemoryAccess}}{\text{Instruction}} \times \text{MissRate} \times \text{MissPenalty} \right) \times \text{ClockCycleTime} \]

• Improve performance by:

1. Reducing the miss rate,
2. Reducing the miss penalty, or
3. Reducing the time to hit in the cache.

Reducing Cache Misses: 1. Larger Block Size

Using the principle of locality: the larger the block, the greater the
chance that parts of it will be used again.

[Figure: Miss Rate (0% to 25%) versus Block Size (bytes) (16 to 256) for cache sizes 1K, 4K, 16K, 64K, and 256K]

Increasing Block Size

• One way to reduce the miss rate is to
increase the block size
– Take advantage of spatial locality
– Decreases compulsory misses
• However, larger blocks have disadvantages
– May increase the miss penalty (need to get more data)
– May increase hit time (need to read more data from cache
and larger mux)
– May increase miss rate, since fewer blocks fit in the cache and
conflict misses increase
• Increasing the block size can help, but don't
overdo it.

Block Size vs. Cache Measures
• Increasing Block Size generally increases
Miss Penalty and decreases Miss Rate
• As the block size increases the AMAT starts
to decrease, but eventually increases

[Figure: three plots against Block Size illustrating Miss Penalty x Miss Rate = Avg. Memory Access Time]
Reducing Cache Misses: 2. Higher Associativity

• Increasing associativity helps reduce conflict misses
• 2:1 Cache Rule:
– The miss rate of a direct mapped cache of size N is about
equal to the miss rate of a 2-way set associative cache of
size N/2
– For example, the miss rate of a 32 Kbyte direct mapped
cache is about equal to the miss rate of a 16 Kbyte 2-way
set associative cache
• Disadvantages of higher associativity
– Need to do large number of comparisons
– Need n-to-1 multiplexor for n-way set associative
– Could increase hit time


AMAT vs. Associativity

Cache Size   Associativity
(KB)         1-way   2-way   4-way   8-way
  1          7.65    6.60    6.22    5.44
  2          5.90    4.90    4.62    4.09
  4          4.60    3.95    3.57    3.19
  8          3.30    3.00    2.87    2.59
 16          2.45    2.20    2.12    2.04
 32          2.00    1.80    1.77    1.79
 64          1.70    1.60    1.57    1.59
128          1.50    1.45    1.42    1.44

Red (color not preserved here) means A.M.A.T. not improved by more associativity;
in this table that is the 8-way column at 32, 64, and 128 KB.
Does not take into account effect of slower clock on rest of program

Reducing Cache Misses: 3. Victim Cache
• Data discarded from cache is placed in an extra small buffer (victim cache).
• On a cache miss, check the victim cache for data before going to main memory
• Jouppi [1990]: A 4-entry victim cache removed 20% to 95% of conflicts for a 4 KB
direct mapped data cache
• Used in Alpha, HP PA-RISC CPUs.

[Figure: the CPU address is compared (=?) against the tags of the data cache and of the victim cache; a write buffer sits between the caches and the lower level memory]

Reducing Cache Misses:
4. Way Prediction and Pseudoassociative Caches

• Way prediction helps select one block among those in a set,
thus requiring only one tag comparison (if hit).
– Preserves advantages of direct-mapping (why?);
– In case of a miss, other block(s) are checked.
• Pseudoassociative (also called column associative) caches
– Operate exactly as direct-mapping caches when hit, thus
again preserving advantages of the direct-mapping;
– In case of a miss, another block is checked (as if in set-associative
caches), by simply inverting the most
significant bit of the index field to find the other block in
the "pseudoset".
» real hit time < pseudo-hit time
» too many pseudo hits would defeat the purpose
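The index manipulation used by a pseudoassociative cache is a single XOR. A minimal sketch, assuming a 3-bit index for the example; the function name `pseudo_set_partner` is made up for illustration.

```python
def pseudo_set_partner(index, index_bits):
    """Return the other block of the pseudoset: the index with its top bit inverted."""
    return index ^ (1 << (index_bits - 1))


partner = pseudo_set_partner(0b001, 3)
print(f"{partner:03b}")                          # second location checked after a miss at 001
print(f"{pseudo_set_partner(partner, 3):03b}")   # inverting again returns to the first location
```

Because the operation is its own inverse, the two locations always form a fixed pair, which is what lets the cache behave like a 2-way set on the slower second probe.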


Reducing Cache Misses:
5. Compiler Optimizations
• Blocking: improve temporal and spatial locality
a) multiple arrays are accessed in both ways (i.e., row-major and column-
major), namely, orthogonal accesses that cannot be helped by earlier
methods
b) concentrate on submatrices, or blocks

c) All N*N elements of Y and Z are accessed N times and each element of X
is accessed once. Thus, there are N^3 operations and 2N^3 + N^2 reads!
Capacity misses are a function of N and cache size in this case.

Reducing Cache Misses:
5. Compiler Optimizations
• Blocking: improve temporal and spatial locality
a) To ensure that elements being accessed can fit in the cache, the original
code is changed to compute a submatrix of size B*B, where B is called the
blocking factor.
b) The total number of memory words accessed is 2N^3/B + N^2
c) Blocking exploits a combination of spatial (Y) and temporal (Z) locality.
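The loop structure of a blocked matrix multiply X = Y * Z can be sketched as follows. This is an illustration of the transformation only: the code slides are not preserved in this text, so the loop order, the names `blocked_matmul` and `naive_matmul`, and the small test matrices are assumptions, and Python itself will not show the cache benefit.

```python
def naive_matmul(y, z, n):
    """Reference triple loop: x[i][j] = sum over k of y[i][k] * z[k][j]."""
    return [[sum(y[i][k] * z[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def blocked_matmul(y, z, n, b):
    """Same product, computed on B x B submatrices of z (b is the blocking factor)."""
    x = [[0] * n for _ in range(n)]
    for jj in range(0, n, b):                  # block of columns of z and x
        for kk in range(0, n, b):              # block of rows of z
            for i in range(n):
                for j in range(jj, min(jj + b, n)):
                    r = 0
                    for k in range(kk, min(kk + b, n)):
                        r += y[i][k] * z[k][j]
                    x[i][j] += r               # accumulate the partial sum for this block
    return x


n = 5
y = [[i + j for j in range(n)] for i in range(n)]
z = [[i * j + 1 for j in range(n)] for i in range(n)]
product = blocked_matmul(y, z, n, 2)
print(product == naive_matmul(y, z, n))
```

The inner three loops touch only a B x B piece of z and a 1 x B strip of y and x, so with a suitable B those pieces stay in the cache while they are reused.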

Reducing Cache Miss Penalty:
1. Multi-level Cache
a) To keep up with the widening gap between CPU and main
memory, try to:
i. make cache faster, and
ii. make cache larger
by adding another, larger but slower cache between cache and
the main memory.



Adding an L2 Cache
• If a direct mapped cache has a hit rate of 95%, a hit time of
4 ns, and a miss penalty of 100 ns, what is the AMAT?
AMAT = Hit time + Miss rate x Miss penalty = 4 + 0.05 x 100 = 9 ns

• If an L2 cache is added with a hit time of 20 ns and a hit rate
of 50%, what is the new AMAT?
AMAT = Hit Time_L1 + Miss Rate_L1 x (Hit Time_L2 + Miss Rate_L2 x Miss Penalty_L2)
= 4 + 0.05 x (20 + 0.5 x 100) = 7.5 ns
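The two-level formula is the one-level formula applied twice: the L1 miss penalty is the AMAT of the L2 cache. A minimal sketch, with the function name `amat` assumed for illustration:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


l1_only = amat(4, 0.05, 100)          # L1 backed directly by main memory
l2_as_penalty = amat(20, 0.5, 100)    # average time to service an L1 miss through L2
with_l2 = amat(4, 0.05, l2_as_penalty)

print(f"L1 only: {l1_only:.1f} ns")
print(f"L1 + L2: {with_l2:.1f} ns")
```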

Reducing Cache Miss Penalty:
2. Handling Misses Judiciously
• Critical Word First and Early Restart
– CPU needs just one word of the block at a time:
   – critical word first: fetch the required word first, and
   – early restart: as soon as the required word arrives, send it to the CPU.
• Giving Priority to Read Misses over Write Misses
– Serve reads before earlier writes have completed:
   – while write buffers improve write-through performance, they
complicate memory accesses by potentially delaying updates to
memory;
   – instead of waiting for the write buffer to become empty before
processing a read miss, the write buffer is checked for content that
might satisfy the missing read.
– in a write-back scheme, the dirty copy upon replacing is first written to
the write buffer instead of the memory, thus improving performance.
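The read-priority idea can be sketched with a toy model in which a read miss consults the write buffer before going to memory. This is only an illustration under simplifying assumptions: the class `WriteBuffer`, the function `read_on_miss`, and the dictionary standing in for main memory are all hypothetical, and word-granularity addresses are assumed.

```python
from collections import OrderedDict


class WriteBuffer:
    """FIFO of pending writes (address -> value) not yet written to memory."""

    def __init__(self):
        self.pending = OrderedDict()

    def add(self, addr, value):
        # A newer write to the same address supersedes the older one.
        self.pending.pop(addr, None)
        self.pending[addr] = value

    def drain_one(self, memory):
        # Retire the oldest pending write to memory, if there is one.
        if self.pending:
            addr, value = self.pending.popitem(last=False)
            memory[addr] = value


def read_on_miss(addr, memory, write_buffer):
    """Service a read miss without first draining the write buffer."""
    if addr in write_buffer.pending:
        return write_buffer.pending[addr]   # newest value is still buffered
    return memory[addr]
```

The read returns the buffered value even though memory still holds the stale copy, so the read does not have to wait for the buffer to empty.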

Reducing Cache Miss Penalty:
3. Compiler-Controlled Prefetching
• Compiler inserts prefetch instructions
• An Example
for (i := 0; i < 3; i := i+1)
    for (j := 0; j < 100; j := j+1)
        a[i][j] := b[j][0] * b[j+1][0];
• 16-byte blocks, 8KB direct-mapped (1-way) write-back cache, 8-byte
elements. What kind of locality, if any, exists for a and b?
a. For a, 3 rows of 100 elements each are visited; spatial locality:
even-indexed elements miss and odd-indexed elements hit,
leading to 3*100/2 = 150 misses
b. b has 101 rows and 3 columns, and only column 0 is visited; no
spatial locality, but there is temporal locality: the same element is
used in consecutive j iterations, and the same elements are accessed
in each i iteration (outer loop). 100 misses on b[j+1][0] for i = 0,
plus 1 miss on b[0][0] when j = 0, for a total of 101 misses
• Assume a miss penalty large enough (50 cycles) that prefetches must
be issued at least 7 iterations in advance. Splitting the loop into
two, we have:
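The miss counts for a and b can be checked with a small direct-mapped cache simulation. This sketch makes assumptions the slide does not state: a and b are laid out contiguously in row-major order starting at address 0, the cache is write-allocate, and the function name `count_misses` is hypothetical.

```python
def count_misses(block_bytes=16, cache_bytes=8 * 1024, elem_bytes=8):
    """Count misses of a and b for the loop nest in a direct-mapped cache."""
    num_lines = cache_bytes // block_bytes
    lines = [None] * num_lines                  # block number held by each line
    base_a = 0                                  # a[3][100], row-major (assumed)
    base_b = base_a + 3 * 100 * elem_bytes      # b[101][3], row-major (assumed)
    misses = {"a": 0, "b": 0}

    def access(name, addr):
        block = addr // block_bytes
        index = block % num_lines
        if lines[index] != block:               # miss: fetch block (write-allocate)
            lines[index] = block
            misses[name] += 1

    for i in range(3):
        for j in range(100):
            access("b", base_b + (j * 3) * elem_bytes)          # b[j][0]
            access("b", base_b + ((j + 1) * 3) * elem_bytes)    # b[j+1][0]
            access("a", base_a + (i * 100 + j) * elem_bytes)    # a[i][j]
    return misses
```

Under these assumptions both arrays fit in the cache without conflicts, and the simulation reproduces the 150 misses for a and 101 misses for b.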

Reducing Cache Miss Penalty:
3. Compiler-Controlled Prefetching
• An Example (continued)
for (j := 0; j < 100; j := j+1) {
    prefetch(b[j+7][0]);
    prefetch(a[0][j+7]);
    a[0][j] := b[j][0] * b[j+1][0];
}
for (i := 1; i < 3; i := i+1)
    for (j := 0; j < 100; j := j+1) {
        prefetch(a[i][j+7]);
        a[i][j] := b[j][0] * b[j+1][0];
    }
• Assuming that each iteration of the original (unsplit) loop
takes 7 cycles and there are no conflict or capacity misses,
it takes a total of 7*300 + 251*50 = 14650 cycles
(total iteration cycles plus total cache miss cycles).

Reducing Cache Miss Penalty:
3. Compiler-Controlled Prefetching
• An Example (continued)
– the first loop takes 9 cycles per iteration (7 plus the
two prefetch instructions)
– the second loop takes 8 cycles per iteration (7 plus
the single prefetch instruction)
– during the first 7 iterations of the first loop, array a incurs
4 cache misses and array b incurs 7 cache misses
– during the first 7 iterations of the second loop, for i = 1
and i = 2, array a incurs 4 cache misses each
– array b does not incur any cache misses in the second
loop
– the split loop takes a total of
(1+1+7)*100 + (4+7)*50 + (1+7)*200 + (4+4)*50 = 3450 cycles
• Prefetching improves performance by a factor of 14650/3450 = 4.25
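The cycle counts in this example are plain arithmetic and can be reproduced directly. The sketch below only restates the slide's numbers; the constant and function names are hypothetical.

```python
MISS_PENALTY = 50  # cycles per cache miss, as assumed in the example


def original_loop_cycles():
    """Unsplit loop: 7 cycles per iteration plus 251 misses."""
    iterations = 3 * 100
    misses = 150 + 101                  # misses of a plus misses of b
    return 7 * iterations + misses * MISS_PENALTY


def split_loop_cycles():
    """Split loop with prefetching: prefetch overhead plus the remaining misses."""
    first = (1 + 1 + 7) * 100 + (4 + 7) * MISS_PENALTY    # i = 0, two prefetches
    second = (1 + 7) * 200 + (4 + 4) * MISS_PENALTY       # i = 1, 2, one prefetch
    return first + second


def speedup():
    return original_loop_cycles() / split_loop_cycles()
```

The two totals come out to 14650 and 3450 cycles, a ratio of about 4.25.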

Reducing Cache Hit Time:

• Small and simple caches
– smaller is faster:
   – small index, less address translation time
   – small cache can fit on the same chip as the CPU
– low associativity: in addition to a simpler/shorter tag
check, a 1-way (direct-mapped) cache allows overlapping the tag
check with transmission of data, which is not possible with any
higher associativity
• Avoid address translation during indexing
– Make the common case fast:
   – use virtual addresses for the cache, because most memory
accesses (more than 90%) take place in the cache, resulting
in a virtual cache

Reducing Cache Hit Time:
• Make the common case fast (continued):
– there are at least three important performance aspects that
directly relate to virtual-to-physical translation:
1) improperly organized or insufficiently sized TLBs may create
excess not-in-TLB faults, adding to program execution time
2) for a physical cache, the TLB access must occur before the
cache access, extending the cache access time
3) two line addresses (e.g., an I-line and a D-line address) may be
independent of each other in virtual address space yet collide in
the real address space, when they draw pages whose lower page
address bits (and upper cache address bits) are identical
– problems with virtual cache:
1) Page-level protection must be enforced no matter what during
address translation (solution: copy protection info from TLB on a
miss and hold it in a field for future virtual indexing/tagging)
2) when a process is switched in/out, the entire cache has to be
flushed because the physical addresses will be different each time,
i.e., the problem of context switching (solution: process identifier
tag -- PID)

Reducing Cache Hit Time:
• Avoid address translation during indexing (continued)
– problems with virtual cache:
3) different virtual addresses may refer to the same physical
address, i.e., the problem of synonyms/aliases
   – HW solution: guarantee every cache block a unique physical
address
   – SW solution: force aliases to share some address bits (e.g.,
page coloring)
– Virtually indexed and physically tagged
• Pipelined cache writes
– the solution is to reduce the clock cycle time (CCT) and increase the
number of pipeline stages, which increases instruction throughput
• Trace caches
– Finds a dynamic sequence of instructions, including taken branches, to
load into a cache block:
   – Put traces of the executed instructions into cache blocks as
determined by the CPU
   – Branch prediction is folded into the cache and must be validated
along with the addresses to have a valid fetch.
   – Disadvantage: stores the same instructions multiple times

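For a virtually indexed, physically tagged cache, a commonly cited condition (not stated on the slide, so treat it as an assumption here) is that the index and block-offset bits come entirely from the page offset, which holds when the cache size divided by the associativity does not exceed the page size. A minimal sketch with a hypothetical function name and example sizes:

```python
def index_fits_in_page_offset(cache_bytes, associativity, page_bytes):
    """True if the index and block-offset bits lie within the page offset.

    When this holds, the cache can be indexed with untranslated address
    bits while the TLB translates the tag bits in parallel.
    """
    bytes_per_way = cache_bytes // associativity
    return bytes_per_way <= page_bytes
```

For example, an 8KB direct-mapped cache with 4KB pages fails the check, while an 8KB 2-way cache passes it.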
Cache Performance Measures

• Hit rate: fraction found in the cache
– So high that we usually talk about Miss rate = 1 - Hit Rate
• Hit time: time to access the cache
• Miss penalty: time to replace a block from the lower level,
including time to deliver it to the CPU
– access time: time to access lower level
– transfer time: time to transfer block
• Average memory-access time (AMAT)
= Hit time + Miss rate x Miss penalty (ns or clocks)

Cache performance
• Miss-oriented Approach to Memory Access:
CPUtime = IC x (CPI_Execution + MemAccess/Inst x MissRate x MissPenalty) x CycleTime
CPUtime = IC x (CPI_Execution + MemMisses/Inst x MissPenalty) x CycleTime
– CPI_Execution includes ALU and Memory instructions

• Separating out Memory component entirely
– AMAT = Average Memory Access Time
– CPI_AluOps does not include memory instructions
CPUtime = IC x (AluOps/Inst x CPI_AluOps + MemAccess/Inst x AMAT) x CycleTime
AMAT = HitTime + MissRate x MissPenalty
= (HitTime_Inst + MissRate_Inst x MissPenalty_Inst) + (HitTime_Data + MissRate_Data x MissPenalty_Data)

Calculating AMAT
• If a direct mapped cache has a hit rate of 95%, a hit
time of 4 ns, and a miss penalty of 100 ns, what is the
AMAT?
AMAT = Hit time + Miss rate x Miss penalty = 4 + 0.05 x 100 = 9 ns

• If replacing the cache with a 2-way set associative
increases the hit rate to 97%, but increases the hit
time to 5 ns, what is the new AMAT?
AMAT = Hit time + Miss rate x Miss penalty = 5 + 0.03 x 100 = 8 ns

Impact on Performance

• Suppose a processor executes at
– Clock Rate = 200 MHz (5 ns per cycle), Ideal (no misses) CPI = 1.1
– 50% arith/logic, 30% ld/st, 20% control
• Suppose that 10% of data memory operations get 50 cycle miss penalty
• Suppose that 1% of instructions get same miss penalty

• CPI = ideal CPI + average stalls per instruction
= 1.1 (cycles/ins)
+ [0.30 (DataMops/ins) x 0.10 (miss/DataMop) x 50 (cycle/miss)]
+ [1 (InstMop/ins) x 0.01 (miss/InstMop) x 50 (cycle/miss)]
= (1.1 + 1.5 + 0.5) cycle/ins = 3.1

• AMAT = (1/1.3) x [1 + 0.01 x 50] + (0.3/1.3) x [1 + 0.1 x 50] = 2.54 cycles

AMAT = HitTime + MissRate x MissPenalty
= (HitTime_Inst + MissRate_Inst x MissPenalty_Inst) + (HitTime_Data + MissRate_Data x MissPenalty_Data),
with each term weighted by its share of memory accesses (1/1.3 for instruction fetches, 0.3/1.3 for data accesses)

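The CPI and AMAT figures in this example can be reproduced with a short calculation. This is a sketch of the arithmetic only; the function names `cpi_with_stalls` and `weighted_amat` are hypothetical, and one instruction fetch per instruction is assumed, as in the example.

```python
def cpi_with_stalls(ideal_cpi, data_ops_per_inst, data_miss_rate,
                    inst_miss_rate, miss_penalty):
    """CPI = ideal CPI + data-miss stalls + instruction-miss stalls."""
    data_stalls = data_ops_per_inst * data_miss_rate * miss_penalty
    inst_stalls = 1.0 * inst_miss_rate * miss_penalty   # one fetch per instruction
    return ideal_cpi + data_stalls + inst_stalls


def weighted_amat(data_ops_per_inst, hit_time, inst_miss_rate,
                  data_miss_rate, miss_penalty):
    """AMAT weighted by each access type's share of all memory accesses."""
    accesses = 1.0 + data_ops_per_inst                  # fetches + data accesses
    inst_part = (1.0 / accesses) * (hit_time + inst_miss_rate * miss_penalty)
    data_part = (data_ops_per_inst / accesses) * (
        hit_time + data_miss_rate * miss_penalty)
    return inst_part + data_part
```

With the example's inputs, `cpi_with_stalls(1.1, 0.30, 0.10, 0.01, 50)` gives 3.1 and `weighted_amat(0.30, 1, 0.01, 0.10, 50)` gives about 2.54 cycles.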
Unified vs Split Caches
• Unified vs Separate I&D
[Figure: two organizations. Unified: Proc, Unified Cache-1, Unified Cache-2. Split: Proc, I-Cache-1 and D-Cache-1, Unified Cache-2.]

• Example:
– 16KB I&D: Inst miss rate=0.64%, Data miss rate=6.47%
– 32KB unified: Aggregate miss rate=1.99%
• Which is better (ignore L2 cache)?
– Assume 33% data ops => 75% accesses from instructions (1.0/1.33)
– hit time=1, miss penalty=50
– Note that data hit has 1 stall for unified cache (only one port)

Unified vs Split Caches
AMAT_Harvard = 75% x (1 + 0.64% x 50) + 25% x (1 + 6.47% x 50) = 2.05
AMAT_Unified = 75% x (1 + 1.99% x 50) + 25% x (1 + 1 + 1.99% x 50) = 2.24

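The split-versus-unified comparison can be reproduced as follows. This is a sketch of the slide's arithmetic with hypothetical function names; the extra cycle on unified data accesses models the single-port stall mentioned in the example.

```python
def amat_split(inst_frac, inst_miss, data_miss, hit_time=1, penalty=50):
    """AMAT for separate instruction and data caches (Harvard organization)."""
    data_frac = 1 - inst_frac
    return (inst_frac * (hit_time + inst_miss * penalty)
            + data_frac * (hit_time + data_miss * penalty))


def amat_unified(inst_frac, miss, hit_time=1, penalty=50, port_stall=1):
    """AMAT for a single-ported unified cache; data accesses stall one cycle."""
    data_frac = 1 - inst_frac
    return (inst_frac * (hit_time + miss * penalty)
            + data_frac * (hit_time + port_stall + miss * penalty))
```

With 75% instruction accesses, the split caches come out near 2.05 cycles and the unified cache near 2.24 to 2.25 cycles (the exact value is 2.245), so the split organization is faster here despite its higher combined miss rate.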
Cache Performance Summary

• AMAT = Hit time + Miss rate x Miss penalty

• Split vs. Unified Cache
• 3C’s of misses
– compulsory
– capacity
– conflict
• Methods for improving performance
– Reduce miss rate: increase cache size, block size,
associativity, compiler optimization, way-prediction, victim
cache, etc.
– Reduce miss penalty: multi-level cache, handling misses
judiciously, compiler-controlled prefetching, etc.
– Reduce hit time: smaller and simpler caches, avoiding
address translation in indexing, pipelining cache writes,
trace cache, etc.

