Lectures wk11

Describing Caches

We characterize a cache using several parameters:

• Access Time: Thit
• Capacity: the total amount of data the cache can hold
  • # of lines * line length
• Line Length / Block Size: the amount of data that gets moved into or out of the cache as a chunk
  • Analogous to page size in virtual memory
• Write Policy: what happens on a write?
• Replacement Policy: what data is replaced on a miss?
• Associativity: how many locations in the cache is a given address eligible to be placed in?
• Unified, Instruction, Data: what type of data is kept in the cache?

139
Capacity
• In general, bigger is better
• The more data you can store in the cache, the less
often you have to go out to the main memory

• However, bigger caches tend to be slower


• Need to understand how both Thit and Phit change as
you change the capacity of the cache.
• Declining return on investment as cache size goes up

140
Cache Line Length
• Cache groups contiguous addresses into lines
• Lines almost always aligned on their size
• Caches fetch or write back an entire line of data on a
miss
• Spatial Locality
• Reading/Writing a Line
• Typically, takes much longer to fetch the first word of a
line than subsequent words
• Page Mode memories

Tfetch = Tfirst + (line length / fetch width) * Tsubsequent
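
As a quick illustration of this formula, here is a minimal Python sketch with made-up timing numbers (Tfirst, Tsubsequent, line length and fetch width below are illustrative assumptions, not values from the lecture):

    # Sketch: time to fetch one cache line, following the Tfetch formula above.
    t_first = 30       # cycles to get the first word of a line (illustrative)
    t_subsequent = 2   # cycles per word once the row is open (page mode, illustrative)
    line_length = 32   # bytes per line (assumed)
    fetch_width = 4    # bytes transferred per beat (assumed)

    t_fetch = t_first + (line_length // fetch_width) * t_subsequent
    print(t_fetch)     # 30 + 8 * 2 = 46 cycles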

141
What causes a MISS?
• Three Major Categories of Cache Misses:
• Compulsory Misses: first access to a block
• Capacity Misses: cache cannot contain all blocks needed to
execute the program
• Conflict Misses: block replaced by another block and then later
retrieved - (affects set assoc. or direct mapped caches)
• Nightmare Scenario: ping pong effect!

142
Block Size and Spatial Locality
Block is unit of transfer between the cache and memory

[Figure: a cache line holding a Tag plus Word0-Word3, i.e. a 4-word block with b = 2 offset bits]

Split the CPU address into a block address (32 - b bits) and a block offset (b bits), where 2^b = block size, a.k.a. line size (in bytes).
Larger block size has distinct hardware advantages
• less tag overhead
• exploit fast burst transfers from DRAM
• exploit fast burst transfers over wide busses

What are the disadvantages of increasing block size?


Fewer blocks => more conflicts. Can waste bandwidth.
143
Miss Rates Vs Block Size
[Figure: miss rate (0% to 40%) plotted against block size (4 to 256 bytes) for cache sizes of 1 KB, 8 KB, 16 KB, 64 KB and 256 KB]
144
Hit Rate isn’t Everything
• Average access time is a better performance indicator than hit rate

Tavg = Phit * Thit + Pmiss * Tmiss
Tmiss = Tfetch = Tfirst + (line length / fetch width) * Tsubsequent

• Trade-off: increasing line length usually increases the hit rate, but also increases the fetch time
• As lines get bigger, the increase in fetch time starts to outweigh the reduction in miss rate
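
To make the trade-off concrete, here is a small Python sketch using the formulas above (hit rates, timings, line lengths and fetch width are all illustrative assumptions):

    # Sketch: average access time for two line lengths.
    t_hit = 1          # cycles for a hit (assumed)
    t_first = 30       # cycles for the first word of a line on a miss (assumed)
    t_subsequent = 2   # cycles per additional transfer (assumed)
    fetch_width = 4    # bytes per transfer (assumed)

    def t_avg(p_hit, line_length):
        t_miss = t_first + (line_length // fetch_width) * t_subsequent
        return p_hit * t_hit + (1 - p_hit) * t_miss

    # A longer line raises the hit rate here, but also the miss penalty:
    print(t_avg(0.95, 16))   # 0.95*1 + 0.05*(30 + 4*2)  = 2.85 cycles
    print(t_avg(0.97, 64))   # 0.97*1 + 0.03*(30 + 16*2) = 2.83 cycles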

145
Block Size Tradeoff

• In general, a larger block size takes advantage of spatial locality, BUT:


• Larger block size means larger miss penalty:
• It takes longer to fill the block
• If block size is too big relative to cache size, miss rate will go up
• Too few cache blocks
• In general, Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate

[Figure: three sketches plotted against block size. Miss penalty grows with block size; miss rate first falls as spatial locality is exploited, then rises once there are too few blocks and temporal locality is compromised; average access time therefore has a minimum, rising at large block sizes because of the increased miss penalty and miss rate.]
146
Basics of caches
• How do we determine if the data is in the cache?
• If data is in the cache, how is it found?

• We only have information on:


• address of data
• how the cache is organised
• Direct mapped cache:
• the data can only be at a specific place

147
Contents of a direct mapped cache

• Data == Cached block


• TAG == Most significant bits of the cached block's address, which distinguish the block in that cache row from other blocks that map to the same row
• VALID == Flag bit to indicate the cache content is valid

148
Direct Mapped Cache Address (showing bit positions)

[Figure: a 32-bit address is split into a 20-bit Tag (bits 31-12), a 10-bit Index (bits 11-2) and a 2-bit Byte offset (bits 1-0); the Index selects one of 1024 rows (0-1023), each holding a Valid bit, a 20-bit Tag and a 32-bit Data word; Hit is asserted when the stored tag matches the address tag and the valid bit is set]

Separate the address into fields:
• Byte offset within the word
• Index for the row of the cache
• Tag identifier of the block

A cache of 2^n words, with a block being a 4-byte word, has 2^n * (63 - n) bits of storage for a 32-bit address:
#rows = 2^n
#bits/row = 32 (data) + (32 - 2 - n) (tag) + 1 (valid) = 63 - n
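
A minimal Python sketch of that bit count, assuming the same one-word blocks and 32-bit addresses (the function name and the n = 10 example are mine):

    # Sketch: total storage bits for a direct mapped cache of 2^n one-word blocks.
    def direct_mapped_bits(n):
        rows = 2 ** n
        tag_bits = 32 - 2 - n              # address minus byte offset minus index
        bits_per_row = 32 + tag_bits + 1   # data + tag + valid = 63 - n
        return rows * bits_per_row

    print(direct_mapped_bits(10))          # 1024 rows * 53 bits = 54272 bits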
149
Example: 1 KB Direct Mapped Cache with 32 Byte Blocks

• For a 2 ** N byte cache:


• The uppermost (32 - N) bits are always the Cache Tag
• The lowest M bits are the Byte Select (Block Size = 2 ** M)

[Figure: the 32-bit address is split into a Cache Tag (bits 31-10, Example: 0x50), a Cache Index (bits 9-5, Ex: 0x01) and a Byte Select (bits 4-0, Ex: 0x00); the tag is stored as part of the cache "state" together with a Valid Bit, and each of the 32 rows of Cache Data holds one 32-byte block (row 0 holds Bytes 0-31, row 1 Bytes 32-63, ..., row 31 Bytes 992-1023)]
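
Here is a minimal Python sketch of that split (the function name is mine; the field widths follow the 1 KB cache with 32-byte blocks described above):

    # Sketch: split a 32-bit address for a 1 KB direct mapped cache, 32-byte blocks.
    def split_address(addr):
        byte_select = addr & 0x1F       # bits 4-0
        index = (addr >> 5) & 0x1F      # bits 9-5
        tag = addr >> 10                # bits 31-10
        return tag, index, byte_select

    # Reproduces the example fields: tag 0x50, index 0x01, byte select 0x00.
    print([hex(f) for f in split_address((0x50 << 10) | (0x01 << 5))])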
150
Extreme Example: single big line

[Figure: a single cache row holding a Valid Bit, a Cache Tag and one 4-byte block of Cache Data (Byte 3 ... Byte 0)]

• Cache Size = 4 bytes, Block Size = 4 bytes


• Only ONE entry in the cache
• If an item is accessed, it is likely to be accessed again soon
• But it is unlikely to be accessed again immediately!!!
• The next access will likely be a miss again
• We continually load data into the cache but discard (force out) it before it is used again
• Worst nightmare of a cache designer: Ping Pong Effect
• Conflict Misses are misses caused by:
• Different memory locations mapped to the same cache index
• Solution 1: make the cache size bigger
• Solution 2: Multiple entries for the same Cache Index

151
A Two-way Set Associative Cache

• N-way set associative: N entries for each Cache Index


• N direct mapped caches operate in parallel
• Example: Two-way set associative cache
• Cache Index selects a “set” from the cache
• The two tags in the set are compared in parallel
• Data is selected based on the tag result

[Figure: two-way set associative cache. The Cache Index selects one set from two parallel arrays of (Valid, Cache Tag, Cache Data) entries; the two stored tags are compared against the address tag (Adr Tag) in parallel, the comparator outputs are ORed to form Hit, and a mux (Sel1/Sel0) selects the Cache Block from the matching way.]
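
A minimal software sketch of the same lookup (the set count, block size and data layout are assumptions made up for illustration, not taken from the slide):

    # Sketch: 2-way set associative lookup; both ways of the indexed set are checked.
    INDEX_BITS = 7                 # 128 sets (assumed)
    BLOCK_BITS = 5                 # 32-byte blocks (assumed)
    NUM_SETS = 1 << INDEX_BITS

    sets = [[{"valid": False, "tag": None, "data": None} for _ in range(2)]
            for _ in range(NUM_SETS)]

    def lookup(addr):
        index = (addr >> BLOCK_BITS) & (NUM_SETS - 1)
        tag = addr >> (BLOCK_BITS + INDEX_BITS)
        for way in sets[index]:            # compared "in parallel" in hardware
            if way["valid"] and way["tag"] == tag:
                return way["data"]         # hit: the mux selects this way
        return None                        # miss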
152
Another Extreme Example: Fully Associative

• Fully Associative Cache, N blocks of 32 bytes each


• Forget about the Cache Index
• Compare the Cache Tags of all cache entries in parallel
• Example: Block Size = 32 Byte blocks, we need N 27-bit comparators

[Figure: the address carries only a 27-bit Cache Tag (bits 31-5) and a Byte Select (bits 4-0, Ex: 0x01); every entry holds a Cache Tag, a Valid Bit and a 32-byte block of Cache Data, and all stored tags are compared against the address tag in parallel]
153
Which Block Should be Replaced on a Miss?
• Easy for Direct Mapped
• Set Associative or Fully Associative:
• Random - easier to implement
• Least Recently used - harder to implement - may approximate
• Miss rates for caches with different sizes, associativities and replacement algorithms:
Associativity: 2-way 4-way 8-way
Size LRU Random LRU Random LRU Random
16 KB 5.18% 5.69% 4.67% 5.29% 4.39% 4.96%
64 KB 1.88% 2.01% 1.54% 1.66% 1.39% 1.53%
256 KB 1.15% 1.17% 1.13% 1.13% 1.12% 1.12%

For caches with low miss rates, random is almost as good as LRU.
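
A minimal Python sketch of LRU bookkeeping for one cache set (a software analogy; as noted above, real hardware usually only approximates LRU):

    from collections import OrderedDict

    # Sketch: one cache set with true LRU replacement.
    class LRUSet:
        def __init__(self, ways):
            self.ways = ways
            self.blocks = OrderedDict()          # tag -> data, oldest first

        def access(self, tag, data=None):
            if tag in self.blocks:               # hit: mark as most recently used
                self.blocks.move_to_end(tag)
                return True
            if len(self.blocks) >= self.ways:    # miss and the set is full:
                self.blocks.popitem(last=False)  # evict the least recently used
            self.blocks[tag] = data
            return False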
Q4: What Happens on a Write?
• Write through: The information is written to both the block in
the cache and to the block in the lower-level memory.
• Write back: The information is written only to the block in the
cache. The modified cache block is written to main memory only
when it is replaced.
• Is the block clean or dirty? (add a dirty bit to each block)
• Pros and Cons of each:
• Write through
• read misses cannot result in writes to memory,
• easier to implement
• Always combine with write buffers to avoid memory latency
• Write back
• Less memory traffic
• Perform writes at the speed of the cache
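
A minimal sketch of the two write-hit policies (a dictionary-based toy model, not the lecture's hardware; the block and memory layout are my assumptions):

    # Sketch: what a write hit does under write-back vs write-through.
    def write_hit(block, value, memory, addr, write_back=True):
        block["data"] = value
        if write_back:
            block["dirty"] = True        # memory is updated later, on eviction
        else:
            memory[addr] = value         # write through: update memory immediately

    def evict(block, memory, addr):
        if block.get("dirty"):           # write-back caches flush dirty blocks
            memory[addr] = block["data"]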
Q4: What Happens on a Write?
• Since data does not have to be brought into the cache on a write
miss, there are two options:
• Write allocate
• The block is brought into the cache on a write miss
• Used with write-back caches
• Hope subsequent writes to the block hit in cache
• No-write allocate
• The block is modified in memory, but not brought into the cache
• Used with write-through caches
• Writes have to go to memory anyway, so why bring the block into the cache
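
A matching toy sketch of the two write-miss options (memory here holds whole blocks indexed by block address, which is my assumption):

    # Sketch: what a write miss does under write allocate vs no-write allocate.
    def write_miss(cache, memory, block_addr, offset, value, write_allocate=True):
        if write_allocate:
            cache[block_addr] = list(memory[block_addr])  # bring the block into the cache
            cache[block_addr][offset] = value             # then perform the write in the cache
        else:
            memory[block_addr][offset] = value            # no-write allocate: update memory only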
ARM9 – Split Cache
• ARM9TDMI
• ARM 32-bit and Thumb 16-bit instructions (v4T ISA).
• Code compatibility with ARM7TDMI:
• Portable to 0.25, 0.18 µm CMOS and below.
• Harvard 5-stage pipeline implementation:
• Higher performance (CPI = 1.5)
• Coprocessor interface for on-chip coprocessors:
• Allows floating point, DSP, graphics accelerators.
• EmbeddedICE debug capability
CPU Pipeline structure with Cache
[Figure: six-phase CPU pipeline with caches. Fetch, Decode, Read, E1, E2 and Write phases: PC and fetch address generation with a 128-bit ICache feed an instruction buffer (I-buffer); the Decode phase performs variable word size decoding and immediate generation; the Read phase accesses the register file with bypassing; the execute phases contain the multiplier, load/store address generation, the DCache, the branch unit and exception generation; the Write phase performs register write-back and rollback.]

158
ARM Cortex-A9 MPCore

Multicore processor providing up to 4 cache-coherent Cortex-A9 cores, each implementing the ARM v7 instruction set architecture.
