
CMP3010: Computer Architecture

L08: Memory Hierarchy

Dina Tantawy
Computer Engineering Department
Cairo University
Agenda
• Introduction
• The principle of locality
  – Temporal locality
  – Spatial locality
• Memory Hierarchy
• The basics of caches
• Direct-mapped caches
Is it a “Computing” Machine ??

[Figure: the processor computes, fetching instructions from memory and loading/storing data between memory and registers.]
Introduction
• Large memories are slow but cheap.
• Small memories are fast but expensive.
• Make the average access time small by servicing most accesses from a small, fast memory.
Principle of locality
States that programs access a relatively small portion of their address space at any instant of time.

• Temporal locality: if an item is referenced, it will tend to be referenced again soon.
• Spatial locality: if an item is referenced, items whose addresses are close by will tend to be referenced soon.
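As an illustration (a sketch not taken from the slides), a simple C loop exhibits both kinds of locality at once:

    /* Minimal sketch: summing an array shows both kinds of locality. */
    #include <stdio.h>

    int main(void) {
        int a[1024];
        for (int i = 0; i < 1024; i++)
            a[i] = i;

        long sum = 0;
        for (int i = 0; i < 1024; i++)
            sum += a[i];   /* spatial locality: a[i] sits next to a[i+1];
                              temporal locality: sum and i are reused on
                              every iteration */
        printf("%ld\n", sum);
        return 0;
    }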
Memory Hierarchy
• A structure that uses multiple levels of memories; as the distance from the CPU increases, the size of the memories and the access time both increase.
• Main memory is implemented from DRAM, while levels closer to the processor (caches) use SRAM.
• Memory hierarchies take advantage of temporal locality by keeping more recently accessed data items closer to the processor. They take advantage of spatial locality by moving blocks consisting of multiple contiguous words in memory to upper levels of the hierarchy.
Memory Hierarchy
• The data is similarly hierarchical: a level closer to the processor is generally a subset of any level further away, and all the data is stored at the lowest level.
The basic structure of memory hierarchy

[Figure: the memory hierarchy from the processor outward; levels near the processor are smaller and faster, levels further away are larger and slower.]
Memory Hierarchy: Principles of Operation
• A memory hierarchy can consist of multiple levels, but data is copied between only two adjacent levels at a time.
• Upper level (cache): the one closer to the processor.
  – Smaller, faster, and uses more expensive technology.
• Lower level (memory): the one further away from the processor.
  – Bigger, slower, and uses less expensive technology.
• Block (line): the minimum unit of information that can be either present or not present in a cache.
Memory Hierarchy: Terminologies
• Hit: data appears in some block in the upper level.
  – Hit rate: the fraction of memory accesses found in the upper level.
  – Hit time: time to access the upper level, which consists of cache access time + time to determine hit/miss.
• Miss: data needs to be retrieved from a block in the lower level.
  – Miss rate = 1 - (hit rate)
  – Miss penalty = time to replace a block in the upper level + time to deliver the block to the processor
• Hit time << miss penalty
The Basics of Caches

The Basics of Caches
• How do we know if a data item is in the cache?
• How do we find it?

Direct Mapped Cache
Direct-mapped cache: a cache structure in which each memory location is mapped to exactly one location in the cache.
Direct Mapped Cache
Which block should I search?

  (Block address) mod (Number of blocks in cache)

e.g., address 11001 → 25, and 25 % 8 = 1, so it maps to block 1 of an 8-block cache (see the sketch below).
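As a minimal sketch (not from the slides), the mapping can be written in C; the block count of 8 is an assumption matching the example above:

    #include <stdio.h>

    #define NUM_BLOCKS 8   /* assumed, matching the 8-block example */

    /* Map a block address to its only possible cache location. */
    unsigned cache_index(unsigned block_addr) {
        return block_addr % NUM_BLOCKS;   /* equals block_addr & (NUM_BLOCKS - 1)
                                             when NUM_BLOCKS is a power of two */
    }

    int main(void) {
        printf("%u\n", cache_index(25));  /* prints 1, as in the slide */
        return 0;
    }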
Direct Mapped Cache
• Because each cache location can contain the contents of a number of different memory locations, how do we know whether the data in the cache corresponds to a requested word?

  Addresses 11001 and 10001 both map to block 1.

• Tag: a field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.
Direct Mapped Cache
• How do we recognize that a cache block does not have valid information?
• Valid bit: a field in the tables of a memory hierarchy that indicates that the associated block in the hierarchy contains valid data.
Accessing Caches

[Figure sequence: a series of memory references is traced through a direct-mapped cache, producing the outcomes MISS, MISS, HIT, HIT, MISS, MISS, HIT, MISS, HIT as lines are filled, reused, and replaced; the sequence is reproduced in the sketch below.]
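A minimal lookup-and-fill sketch in C (an illustration, not the slides' figure), combining the index, tag, and valid bit described above. The geometry and the reference sequence are assumptions chosen to reproduce the slide's HIT/MISS pattern:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_BLOCKS 8                  /* assumed cache geometry */

    struct cache_line {
        bool     valid;                   /* valid bit */
        uint32_t tag;                     /* identifies which block is cached */
        uint32_t data;                    /* one word per block, for brevity */
    };

    static struct cache_line cache[NUM_BLOCKS];

    /* Stand-in for the lower level: "memory" just returns the address. */
    static uint32_t memory_fetch(uint32_t block_addr) { return block_addr; }

    uint32_t cache_access(uint32_t block_addr) {
        uint32_t index = block_addr % NUM_BLOCKS;   /* which line to search */
        uint32_t tag   = block_addr / NUM_BLOCKS;   /* remaining address bits */
        struct cache_line *l = &cache[index];
        if (l->valid && l->tag == tag) {
            puts("HIT");
            return l->data;
        }
        puts("MISS");                               /* fetch and fill the line */
        l->valid = true;
        l->tag   = tag;
        l->data  = memory_fetch(block_addr);
        return l->data;
    }

    int main(void) {
        /* Prints MISS MISS HIT HIT MISS MISS HIT MISS HIT. */
        uint32_t refs[] = { 22, 26, 22, 26, 16, 3, 16, 18, 16 };
        for (int i = 0; i < 9; i++)
            cache_access(refs[i]);
        return 0;
    }

Note the reference to address 18: it maps to the same line as 26 but carries a different tag, so it evicts that line and misses.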
Definition of a Cache Block
• Cache block: the cache data that has its own cache tag.
• Example: a 4-byte direct-mapped cache with block size = 1 byte.
  – Takes advantage of temporal locality: if a byte is referenced, it will tend to be referenced again soon.
  – Does not take advantage of spatial locality: if a byte is referenced, its adjacent bytes will tend to be referenced soon.
• To take advantage of spatial locality: increase the block size.

[Figure: a 4-entry direct-mapped cache, one line per byte (Byte 0 to Byte 3), each line with a valid bit and a cache tag.]
Example: 1 KB Direct Mapped Cache with 32-Byte Blocks
• For a 2^N byte cache, the address is split as follows (see the sketch below):
  – The uppermost (32 - N) bits are always the cache tag.
  – The lowest M bits are the byte select (block size = 2^M), i.e. the byte within the block.
  – The bits in between are the cache index.
• For N = 10 and M = 5, a 32-bit address splits into:
  – Bits 31-10: cache tag (example: 0x50), stored as part of the cache "state"
  – Bits 9-5: cache index (example: 0x01)
  – Bits 4-0: byte offset (example: 0x00)

[Figure: 32 cache lines (index 0 to 31), each holding a valid bit, a cache tag, and 32 bytes of data; line 0 holds bytes 0-31 of its block, line 1 with tag 0x50 holds bytes 32-63, and so on up to byte 1023 in line 31.]
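A small C sketch of this address split (the address value is hypothetical, chosen to reproduce the slide's example fields of tag 0x50, index 1, offset 0):

    #include <stdint.h>
    #include <stdio.h>

    /* 1 KB cache with 32-byte blocks: 32 lines, 5 offset bits, 5 index bits. */
    #define OFFSET_BITS 5
    #define INDEX_BITS  5

    int main(void) {
        uint32_t addr   = 0x00014020u;   /* hypothetical example address */
        uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
        /* Prints: tag=0x50 index=1 offset=0 */
        printf("tag=0x%x index=%u offset=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }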
Direct-Mapped Cache

[Figure: the address is divided into a t-bit tag, a k-bit index, and a b-bit byte offset; the index selects one of 2^k lines, each holding a valid bit, a tag, and a data block; a comparator checks the stored tag against the address tag to produce HIT and the selected data word or byte.]
Block Size Tradeoff
• In general, a larger block size takes advantage of spatial locality, BUT:
  – A larger block size means a larger miss penalty: it takes longer to fill up the block.
  – If the block size is too big relative to the cache size, the miss rate will go up: fewer blocks compromises temporal locality.
• Average access time = hit time + miss penalty x miss rate (a worked example follows the figure note)

[Figure: three plots against block size. Miss penalty grows with block size; miss rate follows a U shape, first exploiting spatial locality and then rising as fewer blocks compromise temporal locality; average access time combines the two, increasing at large block sizes due to the increased miss penalty and miss rate.]
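To make the formula concrete, a worked example with illustrative numbers (not from the slides): with a hit time of 1 cycle, a miss rate of 5%, and a miss penalty of 100 cycles,

  Average access time = 1 + 100 x 0.05 = 6 cycles

Halving the miss rate to 2.5% brings this down to 3.5 cycles, which is why block size choices that affect miss rate and miss penalty dominate the average.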
Cache Size
• The total number of bits needed for a cache is a function of the cache size and the address size, because the cache includes both the storage for the data and the tags.
Cache Size
• For the following situation:
  – 32-bit byte addresses
  – A direct-mapped cache
  – The cache size is 2^n blocks, so n bits are used for the index.
  – The block size is 2^m words, so m bits are used for the word within the block, and two bits are used for the byte part of the address.
• The size of the tag field is 32 - (n + m + 2).
• The total number of bits in a direct-mapped cache is:

  2^n x (block data bits + tag bits + valid bit) = 2^n x (2^m x 32 + (32 - n - m - 2) + 1)
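As an illustration of the formula (numbers assumed, since the exercise figures did not survive the transcript): a direct-mapped cache holding 16 KiB of data in 4-word (16-byte) blocks with 32-bit addresses has 2^10 blocks, so n = 10 and m = 2:

  Tag bits   = 32 - (10 + 2 + 2) = 18
  Total bits = 2^10 x (2^2 x 32 + 18 + 1) = 2^10 x 147 = 147 Kibits (about 18.4 KiB)

so the tags and valid bits add roughly 15% on top of the 16 KiB of data storage.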
Exercise

[Exercise figures not recoverable from this transcript.]
What happens if we replace a block that already has data ??
Read and Write Policies
• Cache reads are much easier to handle than cache writes:
  – An instruction cache is much easier to design than a data cache.
• Cache writes:
  – How do we keep the data in the cache and memory consistent?
Read and Write Policies
• Two write options when the data block is in the cache (see the sketch below):
  – Write through: write to the cache and memory at the same time.
    Isn't memory too slow for this?
  – Write back: write to the cache only. Write the cache block to memory when that cache block is being replaced on a cache miss.
    Needs a "dirty" bit for each cache block.
    Control can be complex.
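A minimal sketch of the two write-hit policies in C (illustrative; memory_write and the line layout are assumed names, with the memory write stubbed so the unit compiles):

    #include <stdbool.h>
    #include <stdint.h>

    struct line { bool valid, dirty; uint32_t tag, data; };

    /* Hypothetical lower-level write (e.g., to DRAM), stubbed here. */
    static void memory_write(uint32_t addr, uint32_t word) {
        (void)addr; (void)word;          /* model: the write reaches memory */
    }

    /* Write through: update the cache and memory at the same time. */
    void write_hit_through(struct line *l, uint32_t addr, uint32_t word) {
        l->data = word;
        memory_write(addr, word);        /* memory is always up to date */
    }

    /* Write back: update the cache only and mark the line dirty; the
     * block goes to memory when the line is replaced on a later miss. */
    void write_hit_back(struct line *l, uint32_t word) {
        l->data  = word;
        l->dirty = true;
    }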
Write Buffer for Write Through

[Figure: the processor writes into the cache and into a write buffer sitting between the cache and DRAM; the memory controller drains the buffer into memory.]

• A write buffer is needed between the cache and memory.
  – Processor: writes data into the cache and the write buffer.
  – Memory controller: writes the contents of the buffer to memory.
• The write buffer is just a FIFO (a sketch follows):
  – Typical number of entries: 4
  – Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
• Memory system designer's nightmare:
  – Store frequency (w.r.t. time) -> 1 / DRAM write cycle
  – Write buffer saturation
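A sketch of the buffer itself, a fixed-size ring FIFO with the 4 entries mentioned above (names and structure are illustrative, not from the slides):

    #include <stdbool.h>
    #include <stdint.h>

    #define WB_ENTRIES 4   /* typical size from the slide */

    struct wb_entry { uint32_t addr, data; };

    static struct wb_entry buf[WB_ENTRIES];
    static unsigned head, tail, count;

    /* Processor side: returns false when the buffer is saturated,
     * which would stall the store. */
    bool wb_push(uint32_t addr, uint32_t data) {
        if (count == WB_ENTRIES) return false;   /* write buffer saturation */
        buf[tail] = (struct wb_entry){ addr, data };
        tail = (tail + 1) % WB_ENTRIES;
        count++;
        return true;
    }

    /* Memory controller side: drains one entry per DRAM write cycle. */
    bool wb_pop(struct wb_entry *out) {
        if (count == 0) return false;
        *out = buf[head];
        head = (head + 1) % WB_ENTRIES;
        count--;
        return true;
    }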
What if the data block we are writing to is not in the cache ??
Write Miss Policies
• Write allocate (also called fetch on write): the data at the missed-write location is loaded into the cache, followed by a write-hit operation. In this approach, write misses are like read misses.
• No-write allocate (also called write-no-allocate or write around): the data at the missed-write location is not loaded into the cache, and is written directly to the backing store. In this approach, data is loaded into the cache on read misses only.
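A sketch of the two miss policies in C (illustrative; fetch_block and memory_write are hypothetical helpers, stubbed so the unit compiles):

    #include <stdbool.h>
    #include <stdint.h>

    struct line { bool valid, dirty; uint32_t tag, data; };

    static struct line the_line;                      /* stand-in cache line */
    static struct line *fetch_block(uint32_t addr) {  /* fill line from memory */
        the_line.valid = true;
        the_line.tag   = addr;                        /* simplified tagging */
        the_line.dirty = false;
        return &the_line;
    }
    static void memory_write(uint32_t addr, uint32_t word) {
        (void)addr; (void)word;                       /* write to backing store */
    }

    /* Write allocate: load the block, then proceed as a write hit. */
    void write_miss_allocate(uint32_t addr, uint32_t word) {
        struct line *l = fetch_block(addr);
        l->data  = word;
        l->dirty = true;              /* the natural pairing with write back */
    }

    /* No-write allocate (write around): skip the cache entirely. */
    void write_miss_no_allocate(uint32_t addr, uint32_t word) {
        memory_write(addr, word);     /* the cache fills on read misses only */
    }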
Write Miss Policies
• Both write-through and write-back policies can use either of these write-miss policies, but usually they are paired in this way:
  – A write-back cache uses write allocate, hoping for subsequent writes (or even reads) to the same location, which is now cached.
  – A write-through cache uses no-write allocate. Here, subsequent writes have no advantage, since they still need to be written directly to the backing store.
Thank you
