Course Code: CS 283, Course Title: Computer Architecture, Class Day: Friday, Timing: 12:00 to 1:30

CS 283 is a computer architecture course taught on Fridays from 12:00 to 1:30 pm. The lecture covers cache design, including write policies and write buffers. Caches exploit spatial and temporal locality to improve average memory access time. There are three types of cache misses: compulsory, capacity, and conflict. Write policies determine when written data is propagated to lower levels of the memory hierarchy: write-back writes only on eviction, while write-through writes on every cache write. Write buffers hold outgoing writes to reduce write stalls and overlap writes with other cache operations.


Course Code: CS 283

Course Title: Computer Architecture


Class Day: Friday Timing: 12:00 to 1:30

Lecture / Week No. : Lecture 9

Instructor Name: Sameen Fatima

Department of Software Engineering


Contents

1. General review of caches

2. Write Policies and Write Buffers


Reference No. 1 Topic: Cache Design  

What is a cache?
Small, fast storage used to improve average access time
to slow memory
• Hold subset of the instructions and data used by program
• Exploits spatial and temporal locality

Program locality is why caches work


• Memory hierarchy exploits program locality:
– Programs tend to reference parts of their address space that are local in time and space
– Temporal locality: recently referenced addresses are likely to be referenced again
(reuse)
– Spatial locality: If an address is referenced, nearby addresses are likely to be
referenced soon
• Programs that don’t exploit locality won’t benefit from caches
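As an illustrative sketch (not from the lecture), the two kinds of locality can be made concrete by listing the byte addresses a matrix traversal visits; the 4×4 matrix and 8-byte elements are assumptions for the example:

```python
# Hypothetical sketch: spatial locality of row-major vs column-major traversal.
ROWS, COLS, ELEM = 4, 4, 8   # illustrative matrix shape and element size

def addr(r, c):
    """Byte address of element (r, c) in a row-major layout starting at 0."""
    return (r * COLS + c) * ELEM

row_major = [addr(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [addr(r, c) for c in range(COLS) for r in range(ROWS)]

# Row-major order touches consecutive bytes (stride 8): strong spatial locality,
# so one fetched block serves several accesses.
row_strides = [b - a for a, b in zip(row_major, row_major[1:])]
# Column-major jumps a whole row (stride 32) each step: weaker spatial locality.
col_strides = [b - a for a, b in zip(col_major, col_major[1:])]
```

With 16-byte blocks, the row-major walk gets one miss per two elements, while the column-major walk can miss on every access if the rows do not all fit in the cache.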

Where do misses come from?


Classifying Misses: 3 Cs
– Compulsory—The first access to a block is not in the cache, so the
block must be brought into the cache. Also called cold start misses or first
reference misses. (Misses in even an Infinite Cache)
– Capacity—If the cache cannot contain all the blocks needed during
execution of a program, capacity misses will occur due to blocks being
discarded and later retrieved. (Misses in Fully Associative Size X Cache)
– Conflict—If block-placement strategy is set associative or direct
mapped, conflict misses (in addition to compulsory & capacity misses)
will occur because a block can be discarded and later retrieved if too many
blocks map to its set. Also called collision misses or interference misses.
(Misses in N-way Associative, Size X Cache)
Cache Example, Cycles 1–5: Spatial Locality!

General View of Caches


• Cache is made of frames
– Frame = data + tag + state bits
– State bits: Valid (tag/data there), Dirty (wrote into data)
• Cache Algorithm
– Find frame(s)
– If frame is invalid or incoming tag != stored tag then Miss
• Evict block currently in frame
• Replace with block from memory (or L2 cache)
– Return appropriate word within block
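The algorithm above can be sketched as a toy direct-mapped lookup; the geometry (256 sets, 16-byte blocks) and all names are illustrative, and write-back of dirty data on eviction is omitted:

```python
# Minimal sketch of the cache algorithm above (direct-mapped, illustrative).
NUM_SETS, OFFSET_BITS, INDEX_BITS = 256, 4, 8

class Frame:
    def __init__(self):
        self.valid, self.tag, self.data = False, None, None

frames = [Frame() for _ in range(NUM_SETS)]

def access(addr, fetch):
    """Return ('hit'|'miss', data). `fetch` models a refill from memory/L2."""
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    f = frames[index]
    if f.valid and f.tag == tag:     # valid frame with matching tag: hit
        return "hit", f.data
    # Miss: evict the block currently in the frame, refill from below.
    f.valid, f.tag, f.data = True, tag, fetch(addr)
    return "miss", f.data
```

Two addresses that differ by a multiple of the cache size map to the same frame, which is exactly how conflict misses arise in the direct-mapped case.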

Basic Cache Organization

Block Frames organized into sets


Number of Frames (ways) in each set is associativity
• One Frame per set (1 column) = Direct Mapped

Mapping Addresses to Frames


Divide the address into offset, index, and tag
– Offset: finds the word within a cache block
• O offset bits => 2^O-byte block size
– Index: finds the set containing the block frame
• N index bits => 2^N sets in cache
• Direct-mapped cache: the index finds the frame directly
– Tag: remaining bits not implied by the block frame; must match
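The three fields fall out of simple shifts and masks; a sketch (the function name is mine, and the bit widths in the check below assume 16-byte blocks and 256 sets):

```python
# Sketch: slicing an address into (tag, index, offset) fields,
# given 2^offset_bits-byte blocks and 2^index_bits sets.
def split_address(addr, offset_bits, index_bits):
    offset = addr & ((1 << offset_bits) - 1)              # low bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)              # remaining high bits
    return tag, index, offset
```

Reassembling `(tag << (offset_bits + index_bits)) | (index << offset_bits) | offset` recovers the original address, which is why the tag alone suffices to identify a block within its set.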

Direct Mapped Caches

Partition Memory Address into three regions


– C = Cache Size
– M = Numbers of bits in memory address
– B = Block Size

Set Associative Caches


• Partition Memory Address into three regions
– C = Cache Size, B=Block Size, A=number of members per set

Cache Example
• 32-bit machine
• 4KB, 16B Blocks, direct-mapped cache
– 16B Blocks => 4 Offset Bits
– 4KB / 16B Blocks => 256 Frames
– 256 Frames / 1-way (DM) => 256 Sets => 8 index bits
– 32-bit address – 4 offset bits – 8 index bits => 20 tag bits
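The arithmetic above can be re-derived in a few lines (a sketch; variable names are mine):

```python
import math

# Re-deriving the 4KB direct-mapped example's bit counts.
cache_size = 4 * 1024     # 4KB cache
block_size = 16           # 16B blocks
addr_bits = 32

offset_bits = int(math.log2(block_size))         # bits to pick a byte in a block
frames = cache_size // block_size                # number of block frames
index_bits = int(math.log2(frames))              # direct-mapped: sets == frames
tag_bits = addr_bits - offset_bits - index_bits  # whatever is left over
```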

Another Example
• 32-bit machine
• 64KB, 32B Block, 2-Way Set Associative
• Compute Total Size of Tag Array
– 64KB/ 32B blocks => 2K Blocks
– 2K Blocks / 2-way set-associative => 1K Sets
– 32B Blocks => 5 Offset Bits
– 1K Sets => 10 index bits
– 32-bit address – 5 offset bits – 10 index bits = 17 tag bits
– 17 tag bits * 2K Blocks => 34 Kbits => 4.25KB
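The same computation generalizes to any geometry; a sketch (the function name and signature are mine):

```python
import math

# Sketch: total tag-array size in bits for a set-associative cache.
def tag_array_bits(cache_bytes, block_bytes, ways, addr_bits=32):
    blocks = cache_bytes // block_bytes          # total block frames
    sets = blocks // ways                        # frames grouped into sets
    offset_bits = int(math.log2(block_bytes))
    index_bits = int(math.log2(sets))
    tag_bits = addr_bits - offset_bits - index_bits
    return tag_bits * blocks                     # one tag per block frame
```

Note that halving the number of sets (by doubling associativity) adds one tag bit per frame: index bits traded for tag bits.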

Summary of Set Associativity


• Direct Mapped
– One place in cache, One Comparator, No Muxes
• Set Associative Caches
– Restricted set of places
– N-way set associativity
– Number of comparators = number of blocks per set
– N:1 mux
• Fully Associative
– Anywhere in cache
– Number of comparators = number of blocks in cache
– N:1 mux needed

More Detailed Questions


• Block placement policy?
– Where does a block go when it is fetched?
• Block identification policy?
– How do we find a block in the cache?
• Block replacement policy?
– When fetching a block into a full cache, how do we
decide what other block gets kicked out?
• Write strategy?
– Does any of this differ for reads vs. writes?
Block Placement + ID
• Placement
– Invariant: block always goes in exactly one set
– Fully-Associative: Cache is one set, block goes anywhere
– Direct-Mapped: Block goes in exactly one frame
– Set-Associative: Block goes in one of a few frames
• Identification
– Find Set
– Search ways in parallel (compare tags, check valid bits)

Block Replacement
• Cache miss requires a replacement
• No decision needed in direct mapped cache
• More than one place for memory blocks in set associative
• Replacement Strategies
– Optimal
• Replace Block used furthest ahead in time (oracle)
– Least Recently Used (LRU)
• Optimized for temporal locality
– (Pseudo) Random
• Nearly as good as LRU, simpler
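LRU for one set can be sketched with an ordered dict (an illustrative software model, not how hardware tracks recency; class and method names are mine):

```python
from collections import OrderedDict

# Sketch of LRU replacement within a single cache set.
class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tag -> data; least recently used first

    def access(self, tag):
        """Return True on hit; on miss, insert tag, evicting the LRU block."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[tag] = None
        return False
```

Real hardware approximates this with a few status bits per set (pseudo-LRU), since exact recency ordering is expensive for high associativity.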

Write Policies
• Writes are only about 21% of data cache traffic
• Optimize cache for reads, do writes “on the side”
– Reads can do tag check/data read in parallel
– Writes must be sure we are updating the correct data and the correct amount of data (1-8 byte writes)
– Serial process => slow
• What to do on a write hit?
• What to do on a write miss?

Write Hit Policies


• Q1: When to propagate new values to memory?
• Write back – Information is only written to the cache.
– Next lower level only updated when it is evicted (dirty bits say when data has been modified)
– Can write at speed of cache
– Caches become temporarily inconsistent with lower-levels of hierarchy.
– Uses less memory bandwidth/power (multiple consecutive writes may require only 1 final write)
– Multiple writes within a block can be merged into one write
– Evictions are longer latency now (must write back)

Write Hit Policies


• Q1: When to propagate new values to memory?
• Write through – Information is written to cache
and to the lower-level memory
– Main memory is always “consistent/coherent”
– Easier to implement – no dirty bits
– Reads never result in writes to lower levels (cheaper)
– Higher bandwidth needed
– Write buffers used to avoid write stalls
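The bandwidth difference between the two policies can be seen in a toy one-block model (my own sketch; the counters and names are illustrative):

```python
# Toy model contrasting write-back and write-through memory traffic.
class Block:
    def __init__(self):
        self.data, self.dirty = None, False

mem_writes = {"wb": 0, "wt": 0}   # writes reaching the next level

def write_back(block, data):
    block.data, block.dirty = data, True     # memory not touched yet

def evict_wb(block):
    if block.dirty:                # dirty bit says the data was modified
        mem_writes["wb"] += 1      # single write-back on eviction
        block.dirty = False

def write_through(block, data):
    block.data = data
    mem_writes["wt"] += 1          # every store also goes to memory
```

Three consecutive stores to the same block cost write-back one memory write (at eviction) but cost write-through three, which is the bandwidth/power point made above.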

Write buffers
• Small chunks of memory to buffer outgoing writes
• Processor can continue when data written to buffer
• Allows overlap of processor execution with memory update

• Write buffers are essential for write-through caches



Write buffers
• Writes can now be pipelined (rather than serial)
• Check tag + Write store data into Write Buffer
• Write data from Write buffer to L2 cache (tags ok)
• Loads must check write buffer for pending stores to same address
• Loads Check:
• Write Buffer
• Cache
• Subsequent Levels of Memory
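The load-servicing order above can be sketched with dicts standing in for the real structures (all names are illustrative):

```python
# Sketch: loads check the write buffer before the cache so they see
# pending stores to the same address.
write_buffer = {}   # address -> pending store data
cache = {}          # address -> cached data
memory = {}         # address -> data in the next level

def store(addr, data):
    write_buffer[addr] = data    # processor continues immediately

def load(addr):
    if addr in write_buffer:     # 1. pending stores to the same address win
        return write_buffer[addr]
    if addr in cache:            # 2. then the cache
        return cache[addr]
    return memory.get(addr)      # 3. then subsequent levels of memory
```

Skipping step 1 would let a load read a stale value while the newer store is still sitting in the buffer.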

Write Merging

Write buffer policies:


Performance/Complexity Tradeoffs

• Allow merging of multiple stores? (“coalescing”)


• “Flush Policy” – How and when are buffer entries flushed?
• “Load Servicing Policy” – What happens when a load occurs to data currently in the write buffer?
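The coalescing tradeoff can be sketched with a dict keyed by block-aligned address (my own toy model; the 16-byte entry size is an assumption):

```python
# Sketch of a coalescing ("write merging") buffer: stores to the same
# block merge into one entry instead of occupying a new slot.
BLOCK = 16   # bytes per buffer entry (illustrative)

class WriteBuffer:
    def __init__(self):
        self.entries = {}   # block-aligned address -> {offset: byte}

    def store(self, addr, byte):
        blk = addr & ~(BLOCK - 1)                 # align to block boundary
        self.entries.setdefault(blk, {})[addr & (BLOCK - 1)] = byte
```

Four word-sized stores to one block occupy a single entry instead of four, freeing slots and letting the merged block drain to L2 as one write.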
References / Resources

• Digital Design and Computer Architecture by David Money Harris and Sarah L. Harris
