0% found this document useful (0 votes)
36 views43 pages

Modulo18 RiscV DDCArv Ch8

This chapter discusses memory systems and caches. It begins with an introduction to caches and explains that computer performance depends on both processor and memory performance. It then describes the processor-memory gap and challenges in bridging this gap. The chapter introduces the memory hierarchy of caches, main memory, and disk. It discusses how caches aim to make memory access as fast as the processor by exploiting data locality through caching recently and spatially related data. The chapter covers cache design issues like cache size, block size, associativity, and cache hit and miss rates. It also provides examples of calculating cache performance metrics.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views43 pages

Modulo18 RiscV DDCArv Ch8

This chapter discusses memory systems and caches. It begins with an introduction to caches and explains that computer performance depends on both processor and memory performance. It then describes the processor-memory gap and challenges in bridging this gap. The chapter introduces the memory hierarchy of caches, main memory, and disk. It discusses how caches aim to make memory access as fast as the processor by exploiting data locality through caching recently and spatially related data. The chapter covers cache design issues like cache size, block size, associativity, and cache hit and miss rates. It also provides examples of calculating cache performance metrics.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 43

Digital Design &

Computer Architecture
Sarah Harris & David Harris

Chapter 8:
Memory Systems
Chapter 8 :: Topics
• Introduction
• Memory System Performance
Analysis
• Caches
• Virtual Memory
• Memory­Mapped I/O
• Summary

2 Digital Design & Computer Architecture Memory Systems


Introduction
• Computer performance depends on:
– Processor performance
– Memory system performance

Processor / Memory Interface:

CLK CLK

MemWrite WE
Address ReadData
Processor Memory
WriteData

3 Digital Design & Computer Architecture Memory Systems


Processor­Memory Gap
• In prior chapters, assumed access memory in 1
clock cycle – but hasn’t been true since the
1980’s.

4 Digital Design & Computer Architecture Memory Systems


Memory System Challenge
• Make memory system appear as fast as
processor
• Use hierarchy of memories
• Ideal memory:
– Fast
– Cheap (inexpensive)
– Large (capacity)

• But can only choose two!

5 Digital Design & Computer Architecture Memory Systems


Memory Hierarchy
CLK Processor Chip

Main
Hard
CPU Cache Memory
Disk

Technology Price / GB Access Time (ns) Bandwidth (GB/s)

SRAM $100 0.2 ­ 3 100+


Cache

DRAM $3 10 ­ 50 30
Speed

Main Memory

SSD $0.10 20,000 0.05 ­ 3


HDD $0.03 5,000,000 0.001 ­ 0.1
Virtual Memory
Capacity

6 Digital Design & Computer Architecture Memory Systems


Locality
Exploit locality to make memory accesses fast:
• Temporal Locality:
– Locality in time
– If data used recently, likely to use it again soon
– How to exploit: keep recently accessed data in higher
levels of memory hierarchy
• Spatial Locality:
– Locality in space
– If data used recently, likely to use nearby data soon
– How to exploit: when access data, bring nearby data
into higher levels of memory hierarchy too
7 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture

Memory Performance
Memory Performance
• Hit: data found in that level of memory hierarchy
• Miss: data not found (must go to next level)
Hit Rate = # hits / # memory accesses
= 1 – Miss Rate

Miss Rate = # misses / # memory accesses


= 1 – Hit Rate
• Average memory access time (AMAT): average
time for processor to access data
AMAT = tcache + MRcache[tMM + MRMM(tVM)]
9 Digital Design & Computer Architecture Memory Systems
Memory Performance Example 1
• A program has 2,000 loads and stores
• 1,250 of these data values in cache
• Rest supplied by other levels of memory
hierarchy
• What are the cache hit and miss rates?

Hit Rate = 1250/2000 = 0.625


Miss Rate = 750/2000 = 0.375 = 1 – Hit Rate

10 Digital Design & Computer Architecture Memory Systems


Memory Performance Example 2
• Suppose processor has 2 levels of hierarchy:
cache and main memory
• tcache = 1 cycle, tMM = 100 cycles
• What is the AMAT (average memory access
time) of the program from Example 1?

AMAT = tcache + MRcache(tMM)


= [1 + 0.375(100)] cycles
= 38.5 cycles

11 Digital Design & Computer Architecture Memory Systems


Gene Amdahl, 1922 ­
• Amdahl’s Law: the effort
spent increasing the
performance of a
subsystem is wasted
unless the subsystem
affects a large percentage
of overall performance
• Co­founded 3 companies,
including one called
Amdahl Corporation in
1970
12 Digital Design & Computer Architecture Memory Systems
Chapter 7: Microarchitecture

Caches
Cache
• Highest level in memory hierarchy
• Fast (typically ~ 1 cycle access time)
• Ideally supplies most data to processor
• Usually holds most recently accessed data

CLK Processor Chip

Main
Hard
CPU Cache Memory
Disk

14 Digital Design & Computer Architecture Memory Systems


Cache Design Questions
• What data is held in the cache?
• How is data found?
• What data is replaced?

We focus on data loads, but stores follow the same principles.

15 Digital Design & Computer Architecture Memory Systems


What data is held in the cache?
• Ideally, cache anticipates needed data and
puts it in cache
• But impossible to predict future
• Use past to predict future – temporal and
spatial locality:
– Temporal locality: copy newly accessed data into
cache
– Spatial locality: copy neighboring data into cache
too

16 Digital Design & Computer Architecture Memory Systems


Cache Terminology
• Capacity (C):
– number of data bytes in cache
• Block size (b):
– bytes of data brought into cache at once
• Number of blocks (B = C/b):
– number of blocks in cache: B = C/b
• Degree of associativity (N):
– number of blocks in a set
• Number of sets (S = B/N):
– each memory address maps to exactly one cache set

17 Digital Design & Computer Architecture Memory Systems


How is data found?
• Cache organized into S sets
• Each memory address maps to exactly one set
• Caches categorized by # of blocks in a set:
– Direct mapped: 1 block per set
– N­way set associative: N blocks per set
– Fully associative: all cache blocks in 1 set
• Examine each organization for a cache with:
– Capacity (C = 8 words)
– Block size (b = 1 word)
– So, number of blocks (B = 8)

18 Digital Design & Computer Architecture Memory Systems


Example Cache Parameters
• C = 8 words (capacity)
• b = 1 word (block size)
• So, B = 8 (# of blocks)

Ridiculously small, but will illustrate organizations

19 Digital Design & Computer Architecture Memory Systems


Chapter 7: Microarchitecture

Direct­Mapped Caches
Direct Mapped Cache
Address
11...11111100 mem[0xFF...FC]
11...11111000 mem[0xFF...F8]
11...11110100 mem[0xFF...F4]
11...11110000 mem[0xFF...F0]
11...11101100 mem[0xFF...EC]
11...11101000 mem[0xFF...E8]
11...11100100 mem[0xFF...E4]
11...11100000 mem[0xFF...E0]

00...00100100 mem[0x00...24]
00...00100000 mem[0x00..20] Set Number
00...00011100 mem[0x00..1C] 7 (111)
00...00011000 mem[0x00...18] 6 (110)
00...00010100 mem[0x00...14] 5 (101)
00...00010000 mem[0x00...10] 4 (100)
00...00001100 mem[0x00...0C] 3 (011)
00...00001000 mem[0x00...08] 2 (010)
00...00000100 mem[0x00...04] 1 (001)
00...00000000 mem[0x00...00] 0 (000)

230 Word Main Memory 23 Word Cache


21 Digital Design & Computer Architecture Memory Systems
Direct Mapped Cache Hardware
Byte
Tag Set Offset
Memory
00
Address
27 3
V Tag Data

8-entry x
(1+27+32)-bit
SRAM

27 32

Hit Data

22 Digital Design & Computer Architecture Memory Systems


Direct Mapped Cache Performance
Byte
Tag Set Offset
Memory
00...00 001 00
Address 3
V Tag Data
0 Set 7 (111)
# RISC-V assembly code 0 Set 6 (110)
addi s0, zero, 5 0 Set 5 (101)
addi s1, zero, 0 0 Set 4 (100)
1 00...00 mem[0x00...0C] Set 3 (011)
LOOP: beq s0, zero, DONE
1 00...00 mem[0x00...08] Set 2 (010)
lw s2, 4(s1) 1 00...00 mem[0x00...04] Set 1 (001)
lw s3, 12(s1) 0 Set 0 (000)
lw s4, 8(s1)
addi s0, s0, -1 Miss Rate = 3/15
j LOOP = 20%
DONE:
Temporal Locality
Compulsory Misses
23 Digital Design & Computer Architecture Memory Systems
Direct Mapped Cache: Conflict Miss
Byte
Tag Set Offset
Memory
00...01 001 00
Address 3
V Tag Data
# RISC-V assembly code 0 Set 7 (111)
0 Set 6 (110)
addi s0, zero, 5
0 Set 5 (101)
addi s1, zero, 0 0 Set 4 (100)
LOOP: beq s0, zero, DONE 0 Set 3 (011)
lw s2, 0x4(s1) 0 Set 2 (010)
mem[0x00...04] Set 1 (001)
1 00...00
lw s4, 0x24(s1) mem[0x00...24]
0 Set 0 (000)
addi s0, s0, -1
j LOOP Miss Rate = 10/10
DONE:
= 100%
Conflict Misses

24 Digital Design & Computer Architecture Memory Systems


Chapter 7: Microarchitecture

Associative Caches
N­Way Set Associative Cache
Byte
Tag Set Offset
Memory
00
Address Way 1 Way 0
28 2
V Tag Data V Tag Data

28 32 28 32

= =

0
Hit1 Hit0 Hit1
32

Hit
Conflict Misses
Data

26 Digital Design & Computer Architecture Memory Systems


N­Way Set Assoc. Cache Performance
# RISC-V assembly code
addi s0, zero, 5
addi s1, zero, 0
LOOP: beq s0, zero, DONE Miss Rate = 2/10
lw s2, 0x4(s1) = 20%
lw s4, 0x24(s1)
addi s0, s0, -1 Associativity reduces
j LOOP
conflict misses
DONE:
Way 1 Way 0
V Tag Data V Tag Data
0 0 Set 3
0 0 Set 2
1 00...10 mem[0x00...24] 1 00...00 mem[0x00...04] Set 1
0 0 Conflict Misses Set 0

27 Digital Design & Computer Architecture Memory Systems


Fully Associative Cache

V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data V Tag Data

Reduces conflict misses


Expensive to build

Conflict Misses

28 Digital Design & Computer Architecture Memory Systems


Chapter 7: Microarchitecture

Spatial Locality
Spatial Locality
• Increase block size:
– Block size, b = 4 words
– C = 8 words
– Direct mapped (1 block per set)
– Number of blocks, B = 2 (C/b = 8/4 = 2)
Block Byte
Tag Set Offset Offset
Memory
00
Address
27 2
V Tag Data
Set 1
Set 0
27 32 32 32 32
11

10

01

00
32
=

Hit Data

30 Digital Design & Computer Architecture Memory Systems


Cache with Larger Block Size

Block Byte
Tag Set Offset Offset
Memory
00
Address
27 2
V Tag Data
Set 1
Set 0
27 32 32 32 32
11

10

01

00
32
=

Hit Data

31 Digital Design & Computer Architecture Memory Systems


Cache Perf. with Spatial Locality
addi s0, zero, 5
addi s1, zero, 0 Miss Rate = 1/15
LOOP: beq s0, zero, DONE
= 6.67%
lw s2, 4(s1)
lw s3, 12(s1)
Larger blocks reduce
lw s4, 8(s1)
addi s0, s0, -1 compulsory misses
j LOOP through spatial locality
DONE: Tag
Block Byte
Set Offset Offset
Memory
00...00 0 11 00
Address
27 2
V Tag Data
0 Set 1
1 00...00 mem[0x00...0C] mem[0x00...08] mem[0x00...04] mem[0x00...00] Set 0
27 32 32 32 32

11

10

01

00
32
=

Hit Data
32 Digital Design & Computer Architecture Memory Systems
Types of Misses
• Compulsory: first time data accessed
• Capacity: cache too small to hold all data of
interest
• Conflict: data of interest maps to same
location in cache

Miss penalty: time it takes to retrieve a block from


lower level of hierarchy

33 Digital Design & Computer Architecture Memory Systems


Cache Organization Recap
• Capacity: C
• Block size: b
• Number of blocks in cache: B = C/b
• Number of blocks in a set: N
• Number of sets: S = B/N

Number of Ways Number of Sets


Organization (N) (S = B/N)
Direct Mapped 1 B
N-Way Set Associative 1 < N < B B/N
Fully Associative B 1

34 Digital Design & Computer Architecture Memory Systems


Chapter 7: Microarchitecture

Cache Replacement
Policy
Replacement Policy
• Cache is too small to hold all data of interest at
once
• If cache full: program accesses data X and evicts
data Y
• Capacity miss when access Y again
• How to choose Y to minimize chance of needing it
again?
– Least recently used (LRU) replacement: the least
recently used block in a set evicted

36 Digital Design & Computer Architecture Memory Systems


LRU Replacement
# RISC-V assembly
lw s1, 0x04(zero)
lw s2, 0x24(zero)
lw s3, 0x54(zero)

Way 1 Way 0

V U Tag Data V Tag Data


0 0 0 Set 3 (11)
0 0 0 Set 2 (10)
01 0 00...010 mem[0x00...24] 01 00...000 mem[0x00...04] Set 1 (01)
0 0 0 Set 0 (00)
(a)
Way 1 Way 0

V U Tag Data V Tag Data


0 0 0 Set 3 (11)
0 0 0 Set 2 (10)
1 1 00...010 mem[0x00...24] 1 00...101 mem[0x00...54] Set 1 (01)
0 0 0 Set 0 (00)
(b)

37 Digital Design & Computer Architecture Memory Systems


Chapter 7: Microarchitecture

Cache Summary
Cache Summary
• What data is held in the cache?
– Recently used data (temporal locality)
– Nearby data (spatial locality)
• How is data found?
– Set is determined by address of data
– Word within block also determined by address
– In associative caches, data could be in one of several
ways
• What data is replaced?
– Least­recently used way in the set

39 Digital Design & Computer Architecture Memory Systems


Miss Rate Trends
• Bigger caches reduce capacity misses
• Greater associativity reduces conflict misses

Adapted from Patterson & Hennessy, Computer Architecture: A Quantitative Approach, 2011
40 Digital Design & Computer Architecture Memory Systems
Miss Rate Trends

• Bigger blocks reduce compulsory misses


• Bigger blocks increase conflict misses
41 Digital Design & Computer Architecture Memory Systems
Multilevel Caches
• Larger caches have lower miss rates, longer
access times
• Expand memory hierarchy to multiple levels of
caches
• Level 1: small and fast (e.g. 16 KB, 1 cycle)
• Level 2: larger and slower (e.g. 256 KB, 2­6
cycles)
• Most modern PCs have L1, L2, and L3 cache

42 Digital Design & Computer Architecture Memory Systems


Intel Pentium III Die

© Intel Corp.

43 Digital Design & Computer Architecture Memory Systems

You might also like