
COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface, 5th Edition

Chapter 5
Large and Fast: Exploiting Memory Hierarchy
§5.1 Introduction
Principle of Locality
- Programs access a small proportion of their address space at any time
- Temporal locality
  - Items accessed recently are likely to be accessed again soon
  - e.g., instructions in a loop, induction variables
- Spatial locality
  - Items near those accessed recently are likely to be accessed soon
  - e.g., sequential instruction access, array data (both kinds are sketched in the C example below)
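To make the two kinds of locality concrete, here is a minimal C sketch (the array name and size are arbitrary, not from the slides): the loop instructions and the running sum show temporal locality, while the sequential array walk shows spatial locality.

#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];
    long sum = 0;                 /* sum is reused every iteration: temporal locality */

    for (int i = 0; i < N; i++)   /* the loop body is re-fetched each pass: temporal locality */
        a[i] = i;                 /* a[0], a[1], ... are adjacent in memory: spatial locality */

    for (int i = 0; i < N; i++)
        sum += a[i];              /* sequential array walk: spatial locality */

    printf("%ld\n", sum);
    return 0;
}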
Taking Advantage of Locality
- Memory hierarchy
  - Store everything on disk
  - Copy recently accessed (and nearby) items from disk to smaller DRAM memory
    - Main memory
  - Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
    - Cache memory attached to CPU



Memory Hierarchy Levels
- Block (aka line): unit of copying
  - May be multiple words
- If accessed data is present in the upper level
  - Hit: access satisfied by the upper level
  - Hit ratio: hits/accesses
- If accessed data is absent
  - Miss: block copied from the lower level
    - Time taken: miss penalty
    - Miss ratio: misses/accesses = 1 - hit ratio
  - Then the accessed data is supplied from the upper level
- (Both ratios are computed in the sketch below.)
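A minimal sketch of that bookkeeping in C; the struct and function names are illustrative, not from the textbook.

/* per-level counters for cache accesses */
struct cache_stats {
    unsigned long hits;
    unsigned long accesses;
};

double hit_ratio(const struct cache_stats *s) {
    return s->accesses ? (double)s->hits / s->accesses : 0.0;
}

/* miss ratio = misses/accesses = 1 - hit ratio */
double miss_ratio(const struct cache_stats *s) {
    return 1.0 - hit_ratio(s);
}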


§5.2 Memory Technologies
Memory Technology
- Static RAM (SRAM)
  - 0.5 ns - 2.5 ns, $2000 - $5000 per GB
- Dynamic RAM (DRAM)
  - 50 ns - 70 ns, $20 - $75 per GB
- Magnetic disk
  - 5 ms - 20 ms, $0.20 - $2 per GB
- Ideal memory
  - Access time of SRAM
  - Capacity and cost/GB of disk


DRAM Technology
- Data stored as a charge in a capacitor
  - A single transistor is used to access the charge
  - Must be refreshed periodically
    - Read contents and write back
    - Performed on a DRAM "row"


Advanced DRAM Organization
- Bits in a DRAM are organized as a rectangular array
  - DRAM accesses an entire row
  - Burst mode: supply successive words from a row with reduced latency
- Double data rate (DDR) DRAM
  - Transfer on rising and falling clock edges
- Quad data rate (QDR) DRAM
  - Separate DDR inputs and outputs


DRAM Generations

Year   Capacity   $/GB
1980   64 Kbit    $1,500,000
1983   256 Kbit   $500,000
1985   1 Mbit     $200,000
1989   4 Mbit     $50,000
1992   16 Mbit    $15,000
1996   64 Mbit    $10,000
1998   128 Mbit   $4,000
2000   256 Mbit   $1,000
DRAM Performance Factors
- Row buffer
  - Allows several words to be read and refreshed in parallel
- Synchronous DRAM
  - Allows consecutive accesses in bursts without needing to send each address
  - Improves bandwidth
- DRAM banking
  - Allows simultaneous access to multiple DRAMs
  - Improves bandwidth


Increasing Memory Bandwidth
- Assume 1 bus cycle to send the address, 15 bus cycles per DRAM access, and 1 bus cycle per word transferred
- 4-word-wide memory
  - Miss penalty = 1 + 15 + 1 = 17 bus cycles
  - Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
- 4-bank interleaved memory
  - Miss penalty = 1 + 15 + 4×1 = 20 bus cycles
  - Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
- (Both calculations are reproduced in the sketch below.)
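Both miss penalties follow from the stated bus timing; the C sketch below encodes just that cost model. The function and parameter names are illustrative, and the interleaved case assumes at least as many banks as words in the block.

#include <stdio.h>

/* cost model from the slide: 1 cycle to send the address,
   15 cycles per DRAM access, 1 cycle per word transferred */

unsigned wide_memory_penalty(unsigned block_words, unsigned bus_words) {
    unsigned rounds = block_words / bus_words;   /* accesses needed to fetch the block */
    return 1 + rounds * 15 + rounds * 1;
}

unsigned interleaved_penalty(unsigned block_words) {
    /* all banks start their 15-cycle access in parallel,
       then words return one bus transfer at a time */
    return 1 + 15 + block_words * 1;
}

int main(void) {
    printf("4-word wide:        %u cycles\n", wide_memory_penalty(4, 4)); /* 17 */
    printf("4-bank interleaved: %u cycles\n", interleaved_penalty(4));    /* 20 */
    return 0;
}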
§6.4 Flash Storage
Flash Storage
- Nonvolatile semiconductor storage
  - 100× - 1000× faster than disk
  - Smaller, lower power, more robust
  - But more $/GB (between disk and DRAM)


Flash Types
- NOR flash: bit cell like a NOR gate
  - Random read/write access
  - Used for instruction memory in embedded systems
- NAND flash: bit cell like a NAND gate
  - Denser (bits/area), but block-at-a-time access
  - Cheaper per GB
  - Used for USB keys, media storage, …
- Flash bits wear out after 1000s of accesses
  - Not suitable as a direct RAM or disk replacement
  - Wear leveling: remap data to less-used blocks (sketched below)

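A highly simplified wear-leveling sketch in C, assuming a flash translation layer that maps logical blocks to physical blocks; all names here are illustrative, and a real FTL would also track free blocks, garbage-collect, and migrate cold data.

#define NBLOCKS 64

static int l2p[NBLOCKS];              /* logical -> physical block map */
static unsigned erase_count[NBLOCKS]; /* wear per physical block */

/* pick the physical block with the fewest erases
   (simplified: ignores whether the block is currently free) */
static int least_worn(void) {
    int best = 0;
    for (int p = 1; p < NBLOCKS; p++)
        if (erase_count[p] < erase_count[best])
            best = p;
    return best;
}

/* on a logical rewrite, steer the data to a lightly used block */
void rewrite_block(int logical) {
    int target = least_worn();
    erase_count[target]++;   /* the erase-before-program cycle is what wears a block */
    l2p[logical] = target;
}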


§6.3 Disk Storage
Disk Storage
- Nonvolatile, rotating magnetic storage



Disk Sectors and Access
- Each sector records
  - Sector ID
  - Data (512 bytes; 4096 bytes proposed)
  - Error correcting code (ECC)
    - Used to hide defects and recording errors
  - Synchronization fields and gaps
- Access to a sector involves
  - Queuing delay if other accesses are pending
  - Seek: move the heads
  - Rotational latency
  - Data transfer
  - Controller overhead


Disk Access Example
- Given: 512 B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms controller overhead, idle disk
- Average read time (reproduced in the sketch below):
    4 ms seek time
  + 0.5 rotation / (15,000/60 rotations per second) = 2 ms rotational latency
  + 512 B / 100 MB/s = 0.005 ms transfer time
  + 0.2 ms controller delay
  = 6.2 ms
- If the actual average seek time is 1 ms, the average read time drops to 3.2 ms
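The same calculation in C, a sketch with illustrative parameter names:

#include <stdio.h>

/* average read time in ms: seek + half a rotation + transfer + controller */
double avg_read_ms(double seek_ms, double rpm, double sector_bytes,
                   double xfer_mb_per_s, double ctrl_ms) {
    double rot_ms  = 0.5 / (rpm / 60.0) * 1000.0;                 /* half-rotation latency */
    double xfer_ms = sector_bytes / (xfer_mb_per_s * 1e6) * 1000.0;
    return seek_ms + rot_ms + xfer_ms + ctrl_ms;
}

int main(void) {
    printf("%.2f ms\n", avg_read_ms(4.0, 15000, 512, 100, 0.2));  /* 6.2 ms */
    printf("%.2f ms\n", avg_read_ms(1.0, 15000, 512, 100, 0.2));  /* 3.2 ms */
    return 0;
}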


Disk Performance Issues
- Manufacturers quote average seek time
  - Based on all possible seeks
  - Locality and OS scheduling lead to smaller actual average seek times
- Smart disk controllers allocate physical sectors on the disk
  - Present a logical sector interface to the host
  - SCSI, ATA, SATA
- Disk drives include caches
  - Prefetch sectors in anticipation of access
  - Avoid seek and rotational delay


§5.3 The Basics of Caches
Cache Memory
- Cache memory
  - The level of the memory hierarchy closest to the CPU
- Given accesses X1, …, Xn–1, Xn:
  - How do we know if the data is present?
  - Where do we look?


Direct Mapped Cache
- Location determined by address
- Direct mapped: only one choice
  - (Block address) modulo (#Blocks in cache)
- Since #Blocks is a power of 2, the modulo uses only the low-order address bits (see the sketch below)
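A minimal C sketch of the mapping, sized to match the 8-block example on the following slides; because the block count is a power of 2, the modulo reduces to masking the low-order bits.

#include <stdint.h>

#define NBLOCKS 8u   /* must be a power of 2 */

uint32_t cache_index(uint32_t block_addr) {
    /* modulo by a power of 2 keeps only the low-order bits */
    return block_addr % NBLOCKS;   /* equivalent: block_addr & (NBLOCKS - 1) */
}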


Cache Organizations I: Direct-Mapped Cache
- For a 2^N-byte cache with 2^M-byte blocks:
  - The uppermost (32 - N) bits are always the Cache Tag
  - The lowest M bits are the Byte Select
- On a cache miss, pull in a complete "Cache Block" (or "Cache Line")
- Example: 1 KB direct-mapped cache with 32-byte lines
  - Cache Tag: address bits 31-10 (example: 0x50), stored as part of the cache "state" alongside the Valid Bit
  - Cache Index: address bits 9-5 (example: 0x01)
  - Byte Select: address bits 4-0 (example: 0x00)

[Figure: the cache's 32 lines, numbered 0-31, each holding a Valid Bit, a Cache Tag, and 32 bytes of Cache Data (Bytes 0-31 in line 0, up through Bytes 992-1023 in line 31); the example tag 0x50 is stored in line 1, which holds Bytes 32-63.]
Tags and Valid Bits
- How do we know which particular block is stored in a cache location?
  - Store the block address as well as the data
  - Actually, only the high-order bits are needed
  - Called the tag
- What if there is no data in a location?
  - Valid bit: 1 = present, 0 = not present
  - Initially 0
- (A minimal cache-line layout is sketched below.)

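A minimal C sketch of one cache line and the hit check; the field names and the one-word block size are illustrative.

#include <stdbool.h>
#include <stdint.h>

struct cache_line {
    bool     valid;   /* initially 0: nothing stored here yet */
    uint32_t tag;     /* high-order bits of the block address */
    uint32_t data;    /* one word per block in this sketch */
};

/* hit if the indexed slot is valid and its tag matches the address tag */
bool lookup(const struct cache_line *cache, uint32_t index, uint32_t tag) {
    return cache[index].valid && cache[index].tag == tag;
}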


Cache Example
- 8 blocks, 1 word/block, direct mapped
- Initial state:

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Miss      110

Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N


Cache Example

Word addr  Binary addr  Hit/miss  Cache block
26         11 010       Miss      010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N


Cache Example

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010

Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N


Cache Example

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N


Cache Example

Word addr  Binary addr  Hit/miss  Cache block
18         10 010       Miss      010

Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

- Addresses 18 (10 010) and 26 (11 010) map to the same index, so Mem[11010] is replaced by Mem[10010] (the sketch below replays the whole sequence)
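The whole access sequence above can be replayed with a short C simulation; it models only valid bits and tags (no data) and prints the same hit/miss outcomes as the tables.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 8

struct line { bool valid; uint32_t tag; };

int main(void) {
    struct line cache[NBLOCKS] = {0};   /* all valid bits start at 0 */
    uint32_t addrs[] = {22, 26, 22, 26, 16, 3, 16, 18};

    for (unsigned i = 0; i < sizeof addrs / sizeof addrs[0]; i++) {
        uint32_t a     = addrs[i];
        uint32_t index = a % NBLOCKS;   /* low-order 3 bits */
        uint32_t tag   = a / NBLOCKS;   /* remaining high-order bits */
        bool hit = cache[index].valid && cache[index].tag == tag;
        if (!hit) {                     /* miss: fill (or replace) the block */
            cache[index].valid = true;
            cache[index].tag   = tag;
        }
        printf("addr %2u -> block %u: %s\n", a, index, hit ? "hit" : "miss");
    }
    return 0;
}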


Address Subdivision

[Figure: cache datapath showing a 32-bit address split into tag, index, and byte offset; the stored tag at the indexed line is compared with the address tag and ANDed with the valid bit to produce the hit signal.]


Example: Larger Block Size
- 64 blocks, 16 bytes/block
- To what block number does address 1200 map?
  - Block address = 1200 / 16 = 75
  - Block number = 75 modulo 64 = 11
- Address fields (extracted in the sketch below):

  Tag: bits 31-10 (22 bits) | Index: bits 9-4 (6 bits) | Offset: bits 3-0 (4 bits)

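The field extraction for this cache in C, a sketch with masks derived from the widths above:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr   = 1200;
    uint32_t offset = addr & 0xF;          /* bits 3-0, 4 bits    */
    uint32_t index  = (addr >> 4) & 0x3F;  /* bits 9-4, 6 bits    */
    uint32_t tag    = addr >> 10;          /* bits 31-10, 22 bits */
    printf("tag=%u index=%u offset=%u\n", tag, index, offset);
    /* prints tag=1 index=11 offset=0: block address 75, block number 11 */
    return 0;
}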


Block Size Considerations
- Larger blocks should reduce miss rate
  - Due to spatial locality
- But in a fixed-sized cache
  - Larger blocks → fewer of them
    - More competition → increased miss rate
  - Larger blocks → pollution
- Larger miss penalty
  - Can override the benefit of reduced miss rate
  - Early restart and critical-word-first can help
