
ADVANCED COMPUTER ARCHITECTURE
BY
DR. RADWA M. TAWFEEK
LAST LECTURE

• Pipelining
• Working with Hazards and Datapath Control Structures
• Hazards:
• Data, Control and Structural
MEMORY
Ch 5
THIS LECTURE

• Ideally, computer memory would be large and fast


• Unfortunately, memory implementation involves tradeoffs

• Memory Hierarchy
• Includes caches, main memory, and disks

• Caches
• Small and fast
• Contain subset of data from main memory
• Generally close to the processor

• Terminology
• Cache blocks, hit rate, miss rate
MEMORY TECHNOLOGY TRADE-OFFS
MEMORY BASICS

• Users want large and fast memories


• Fact
• Large memories are slow
• Fast memories are small

• Large memories use DRAM technology: Dynamic Random Access Memory


• High density, low power, cheap, slow
• Dynamic: needs to be “refreshed” regularly

• DRAM access times are 50-70ns at cost of $10 to $20 per GB


• Fast memories use SRAM: Static Random Access Memory
• Low density, high power, expensive, fast
• Static: content lasts “forever” (until power is lost)

• SRAM access times are 0.5 – 5ns at cost of $400 to $1,000 per GB


MEMORY TECHNOLOGY

• Static RAM (SRAM)


• 0.5ns – 2.5ns, $2000 – $5000 per GB

• Dynamic RAM (DRAM)


• 50ns – 70ns, $20 – $75 per GB

• Magnetic disk
• 5ms – 20ms, $0.20 – $2 per GB

• Ideal memory
• Access time of SRAM
• Capacity and cost/GB of disk
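• As a rough illustration using the midpoints of the price ranges above: 16 GB of SRAM at about $3,500 per GB would cost roughly $56,000, while the same 16 GB of DRAM at about $50 per GB costs roughly $800 – hence large memories are built from DRAM and only small, fast caches from SRAM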
ASSIGNMENT 1

Choose one of the two papers provided on the course site about the DRAM refreshing problem and write a review of it.
MEMORY HIERARCHY
• Capacity: Register << SRAM << DRAM
• Latency: Register << SRAM << DRAM
• Bandwidth: on-chip >> off-chip
• On a data access:
• if data is in fast memory -> low-latency access to SRAM
• if data is not in fast memory -> long-latency access to DRAM

• Memory hierarchies only work if the small, fast memory actually stores
data that is reused by the processor
MEMORY REFERENCE PATTERN
THE GOAL: ILLUSION OF LARGE, FAST, CHEAP
MEMORY
• How do we create a memory that is large, cheap and fast (most of the
time)?
• Strategy: Provide a Small, Fast Memory which holds a subset of the main
memory – called cache
• Keep frequently-accessed locations in fast cache
• Cache retrieves more than one word at a time
• Sequential accesses are faster after first access
TAKING ADVANTAGE OF LOCALITY

• Memory hierarchy
• Store everything on disk
• Copy recently accessed (and nearby) items from disk to smaller DRAM
memory
• Main memory

• Copy more recently accessed (and nearby) items from DRAM to smaller
SRAM memory
• Cache memory attached to CPU
MEMORY HIERARCHY LEVELS

• Block : unit of copying


• May be multiple words

• If accessed data is present in upper level


• Hit: access satisfied by upper level
• Hit ratio: fraction of memory accesses found in upper level = hits/accesses

• If accessed data is absent


• Miss: block copied from lower level
• Time taken: miss penalty
• Miss ratio: misses/accesses = 1 – hit ratio
• Then the accessed data is supplied from the upper level
• Miss Penalty: time to bring in a block from the lower level and replace a block in
the upper level with it + time to deliver the block to the processor
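• A small worked example (numbers assumed for illustration): out of 1,000 memory accesses, if 950 are satisfied by the upper level, the hit ratio is 950/1000 = 0.95 and the miss ratio is 1 - 0.95 = 0.05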
HOW IS THE HIERARCHY MANAGED?

• Registers ↔ Memory
• By the compiler (or assembly language programmer)

• Cache ↔ Main Memory

• By hardware (Cache Controller)

• Main Memory ↔ Disks

• By a combination of hardware and the operating system
• virtual memory
CLASSIFYING CACHES

• Block Placement: Where can a block be placed in the cache?


• Block Identification: How is a block found if it is in the cache?
• Block Replacement: Which block should be replaced on a miss?
• Write Strategy: What happens on a write?
BLOCK PLACEMENT: WHERE TO PLACE A BLOCK IN THE CACHE?
DIRECT MAPPED CACHE

• Location determined by address


• Direct mapped: only one choice (each memory location is
mapped to exactly one location in the cache.)
• (Block address) modulo (#Blocks in cache)
1 modulo 8 = 1     5 modulo 8 = 5
9 modulo 8 = 1    13 modulo 8 = 5
17 modulo 8 = 1   21 modulo 8 = 5
25 modulo 8 = 1   29 modulo 8 = 5
• #Blocks is a power of 2
• Use (log2 #blocks) low-order address bits
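Since #Blocks is a power of two, the modulo reduces to keeping the low-order bits. A minimal C sketch (not from the lecture; the 8-block cache matches the examples above) showing that both forms compute the same index:

#include <stdio.h>

int main(void) {
    const unsigned n_blocks = 8;                 /* must be a power of 2 */
    unsigned addrs[] = {1, 9, 17, 25, 5, 13, 21, 29};
    for (int i = 0; i < 8; i++) {
        unsigned by_modulo = addrs[i] % n_blocks;        /* modulo form   */
        unsigned by_mask   = addrs[i] & (n_blocks - 1);  /* low-bits form */
        printf("block %2u -> cache index %u (mask gives %u)\n",
               addrs[i], by_modulo, by_mask);
    }
    return 0;
}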
TAGS AND VALID BITS

• How do we know which particular block is stored in a cache location?


• Store block address as well as the data
• Actually, only need the high-order bits
• Called the tag

• What if there is no data in a location?


• Valid bit: 1 = present, 0 = not present
• Initially 0
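A hypothetical C sketch of this lookup for a direct-mapped cache; the names (line_t, cache_lookup) and the 1024-block size are illustrative, not from the lecture:

#include <stdbool.h>
#include <stdint.h>

#define N_BLOCKS 1024              /* direct-mapped, one word per block */

typedef struct {
    bool     valid;                /* 0 at reset: nothing cached yet    */
    uint32_t tag;                  /* high-order bits of block address  */
    uint32_t data;                 /* the cached word                   */
} line_t;

static line_t cache[N_BLOCKS];     /* static storage: valid bits start 0 */

/* Returns true on a hit and fills *word; false means a miss. */
bool cache_lookup(uint32_t block_addr, uint32_t *word) {
    uint32_t index = block_addr % N_BLOCKS;  /* where the block must live */
    uint32_t tag   = block_addr / N_BLOCKS;  /* which block it would be   */
    if (cache[index].valid && cache[index].tag == tag) {
        *word = cache[index].data;
        return true;                         /* hit */
    }
    return false;                            /* miss: fetch from memory */
}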
ADDRESS SUBDIVISION

1024 Cache Blocks

Each block contains one word
Word stored as bytes in main memory
Main memory: 2^32 words
ADDRESS SUBDIVIDING

• The total number of bits needed for a cache is a function of the cache size and the
address size, because the cache includes both the storage for the data and the tags.
• The size of the block above was one word, but normally it is several. For the
following situation:
• 32-bit addresses
• A direct-mapped cache
• The cache size is 2^n blocks, so n bits are used for the index
• The block size is 2^m words (2^(m+2) bytes), so m bits are used for the word within
the block, and two bits are used for the byte part of the address
• The size of the tag field is 32 - (n + m + 2)
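A small C sketch (the values n = 10 and m = 2 are assumed for illustration) that carves a 32-bit address into tag, index, word offset, and byte offset exactly as described:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    const uint32_t n = 10, m = 2;        /* 2^10 blocks, 2^2 words/block */
    uint32_t addr = 0x12345678u;         /* an arbitrary example address */

    uint32_t byte_off = addr & 0x3u;                         /* 2 bits */
    uint32_t word_off = (addr >> 2) & ((1u << m) - 1);       /* m bits */
    uint32_t index    = (addr >> (2 + m)) & ((1u << n) - 1); /* n bits */
    uint32_t tag      = addr >> (2 + m + n);   /* 32-(n+m+2) = 18 bits */

    printf("tag=0x%x index=%u word=%u byte=%u (tag width = %u bits)\n",
           (unsigned)tag, (unsigned)index, (unsigned)word_off,
           (unsigned)byte_off, (unsigned)(32 - (n + m + 2)));
    return 0;
}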
BLOCK REPLACEMENT: WHICH BLOCK TO REPLACE?

• No choice in a direct mapped cache


• In an associative cache, which block from the set should be evicted when the set
becomes full?
• Random
• Least Recently Used (LRU)
• LRU cache state must be updated on every access
• True implementation only feasible for small sets (2-way) – see the sketch below

• First In, First Out (FIFO) aka Round-Robin
• Used in highly associative caches
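A minimal sketch of 2-way LRU bookkeeping, the "small sets" case where true LRU is feasible: a single state value per set suffices. All names here are illustrative, not from the lecture:

typedef struct {
    int lru_way;              /* index (0 or 1) of the least recently used way */
} set_state_t;

/* Call on every access that hits (or fills) way w of this set. */
void touch(set_state_t *s, int w) {
    s->lru_way = 1 - w;       /* the other way is now least recently used */
}

/* On a miss with the set full, evict the least recently used way. */
int victim(const set_state_t *s) {
    return s->lru_way;
}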
WRITE STRATEGY: HOW ARE WRITES HANDLED?
• Cache Hit
• Write Through – write both cache and memory; generally higher traffic but
simpler to design
• Write Back – write cache only; memory is written when the block is evicted; a dirty
bit per block avoids unnecessary write backs; more complicated to design

• Cache Miss
• No Write Allocate – only write to main memory
• Write Allocate – fetch block into cache, then write
WRITE-THROUGH

• On data-write hit, could just update the block in cache


• But then cache and memory would be inconsistent
• Write through: also update memory
• But makes writes take longer
• e.g., if base CPI = 1, 10% of instructions are stores, write to memory
takes 100 cycles
• Effective CPI = 1 + 0.1×100 = 11
• Solution: write buffer
• Holds data waiting to be written to memory
• CPU continues immediately
• Only stalls on write if write buffer is already full
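A hedged C sketch of such a write buffer as a small FIFO; the four-entry depth and all names are assumptions, not from the lecture:

#include <stdbool.h>
#include <stdint.h>

#define WB_SLOTS 4                      /* buffer depth is illustrative */

typedef struct { uint32_t addr, data; } entry_t;

static entry_t buf[WB_SLOTS];           /* circular FIFO of pending writes */
static int head, count;

/* CPU side: returns false (stall) only when the buffer is full. */
bool wb_push(uint32_t addr, uint32_t data) {
    if (count == WB_SLOTS)
        return false;                   /* buffer full: CPU must stall */
    buf[(head + count) % WB_SLOTS] = (entry_t){addr, data};
    count++;
    return true;                        /* CPU continues immediately   */
}

/* Memory side: drains one pending write per slow memory-write latency. */
bool wb_drain(entry_t *out) {
    if (count == 0)
        return false;                   /* nothing pending */
    *out = buf[head];
    head = (head + 1) % WB_SLOTS;
    count--;
    return true;
}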
WRITE-BACK

• Alternative: On data-write hit, just update the block in cache


• Keep track of whether each block is dirty

• When a dirty block is replaced


• Write it back to memory
• Can use a write buffer: the evicted dirty block goes to the buffer so the replacing block can be read first

It is harder to implement a write-back cache than a write-through cache.
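A hedged C sketch of the dirty-bit bookkeeping described above; write_to_memory is a stand-in for the real memory interface, and all names are illustrative:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool     valid, dirty;
    uint32_t tag, data;
} wb_line_t;

/* Stand-in for the real memory interface. */
static void write_to_memory(uint32_t block_addr, uint32_t data) {
    printf("write back: block %u <- %u\n", block_addr, data);
}

/* Write hit: update the cache only and mark the block dirty. */
void write_hit(wb_line_t *line, uint32_t value) {
    line->data  = value;
    line->dirty = true;       /* memory is now stale */
}

/* Eviction: only dirty blocks are written back, saving memory traffic. */
void evict(wb_line_t *line, uint32_t old_block_addr) {
    if (line->valid && line->dirty)
        write_to_memory(old_block_addr, line->data);
    line->dirty = false;      /* the incoming block starts clean */
}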
CACHE MISSES

• On cache hit, CPU proceeds normally


• On cache miss
• Stall the CPU pipeline
• Fetch block from next level of hierarchy
• Instruction cache miss
• Restart instruction fetch
• Data cache miss
• Complete data access
AVERAGE MEMORY ACCESS TIME
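By the standard definition:

AMAT = Hit time + (Miss rate × Miss penalty)

For example (values assumed purely for illustration): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 × 100 = 6 cycles.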
CATEGORIZING MISSES: THE THREE C’S
• Compulsory – first reference to a block; occurs even with an infinite cache
• Capacity – cache is too small to hold all data needed by the program; occurs even
under a perfect replacement policy
• Conflict – misses that occur because of collisions due to less than full
associativity
REDUCE MISS RATE: LARGE BLOCK SIZE
REDUCE MISS RATE: LARGE CACHE SIZE
IMPROVING CACHE PERFORMANCE
Reduce the hit time:
• smaller cache
• direct mapped cache
• smaller blocks
• for writes:
• no write allocate – just write to write buffer
• write allocate – write to a delayed write buffer that then writes to the cache

Reduce the miss penalty:
• smaller blocks
• for large blocks, fetch critical word first
• use a write buffer – check write buffer on read miss; may get lucky
• use multiple cache levels – L2 cache not tied to CPU clock rate
• early restart

Reduce the miss rate:
• bigger cache
• associative cache
• larger blocks (16 to 64 bytes)
• use a victim cache – a small buffer that holds the most recently discarded blocks
