Lecture 04 IS064

Uploaded by fredrickodara

Memory and Caching

The Memory Hierarchy


Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape
As one goes down the hierarchy:
– Decreasing cost per bit
– Increasing capacity
– Increasing access time
– Decreasing frequency of access of the memory by the
processor (locality of reference)
So you want fast?
• It is possible to build a computer which uses
only static RAM (see later)
• This would be very fast
• This would need no cache
– How can you cache cache?
• This would cost a very large amount
Locality of Reference
• Temporal Locality
– Programs tend to reference the same memory locations
at a future point in time
– Due to loops and iteration, programs spend a lot of
time in one section of code
• Spatial Locality
– Programs tend to reference memory locations that are
near other recently-referenced memory locations
– Due to the way contiguous memory is referenced, e.g.
an array or the instructions that make up a program
• Locality of reference does not always hold, but it
usually holds
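Both kinds of locality show up in even the simplest loop over an array; a minimal Python sketch (the array `data` is illustrative, not from the slides):

```python
# Summing an array exhibits both kinds of locality:
data = list(range(1000))

total = 0
for i in range(len(data)):   # temporal locality: the loop body and the
    total += data[i]         #   variable `total` are re-referenced every pass
                             # spatial locality: data[i] walks through
                             #   consecutive memory locations
print(total)  # 499500
```

A cache exploits exactly this pattern: the loop instructions stay resident (temporal), and fetching one array element brings its neighbours into the cache (spatial).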
Cache Example
• Consider a Level 1 cache capable of holding 1000
words with a 0.1 s access time. Level 2 is memory
with a 1 s access time.
• If 95% of memory accesses hit the cache:
– T = (0.95)*(0.1 s) + (0.05)*(0.1+1 s) = 0.15 s
• If 5% of memory accesses hit the cache:
– T = (0.05)*(0.1 s) + (0.95)*(0.1+1 s) = 1.05 s
• Want as many cache hits as possible!
[Figure: average access time vs. cache hit ratio, falling from 1.1 s at a 0% hit ratio to 0.1 s at 100%]
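The two cases above come from one formula; a small sketch that reproduces them (units follow the slide, and the miss path is assumed to cost the Level 1 time plus the Level 2 time):

```python
# Average access time for a two-level memory:
# hits cost t_l1; misses cost t_l1 (failed lookup) + t_l2 (memory access).
def avg_access_time(hit_rate, t_l1=0.1, t_l2=1.0):
    """Average access time, in the same unit as t_l1 and t_l2."""
    return hit_rate * t_l1 + (1 - hit_rate) * (t_l1 + t_l2)

print(round(avg_access_time(0.95), 3))  # 0.15 -> 95% of accesses hit
print(round(avg_access_time(0.05), 3))  # 1.05 -> only 5% hit
```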
Semiconductor Memory
• RAM – Random Access Memory
– Misnamed as all semiconductor memory is
random access
– Read/Write
– Volatile
– Temporary storage
– Two main types: Static or Dynamic
Dynamic RAM
• Bits stored as charge in semiconductor capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Need refresh circuits (every few milliseconds)
• Slower
• Main memory
Static RAM
• Bits stored as on/off switches via flip-flops
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Cache
Read Only Memory (ROM)
• Permanent storage
• Microprogramming
• Library subroutines
• Systems programs (BIOS)
• Function tables
Types of ROM
• Written during manufacture
– Very expensive for small runs
• Programmable (once)
– PROM
– Needs special equipment to program
• Read “mostly”
– Erasable Programmable (EPROM)
• Erased by UV
– Electrically Erasable (EEPROM)
• Takes much longer to write than read
– Flash memory
• Erase whole memory electrically
Characteristics of Memory
“Access method”
• Based on the hardware implementation of
the storage device
• Four types
– Sequential
– Direct
– Random
– Associative
Sequential Access Method
• Start at the beginning and read through in
order
• Access time depends on location of data and
previous location
• Example: tape
Direct Access Method
• Individual blocks have unique address
• Access is by jumping to vicinity then
performing a sequential search
• Access time depends on location of data
within "block" and previous location
• Example: hard disk
Random Access Method
• Individual addresses identify locations exactly
• Access time is consistent across all locations
and is independent of previous access
• Example: RAM
Cache
• Small amount of fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module
– An entire block of data is copied from memory to the cache
because the principle of locality tells us that once a byte is accessed,
it is likely that a nearby data element will be needed soon.
Cache operation - overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main
memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of
main memory is in each cache slot
Cache Structure
• Cache includes tags to identify the address of the
block of main memory contained in a line of the
cache
• Each word in main memory has a unique n-bit
address
• There are M = 2^n / K blocks of K words in main
memory
• Cache contains C lines of K words each, plus a tag
uniquely identifying the block of K words
Cache Structure (continued)
[Figure: cache laid out as lines numbered 0 to C−1, each line holding a tag plus a block of K words (the block length)]
Memory Divided into Blocks
[Figure: main memory addresses 0 to 2^n − 1, one word per address, grouped into blocks of K words each]
Cache Performance Metrics
Miss Rate
– Fraction of memory references not found in cache (misses / accesses)
• Miss rate = 1 – hit rate
– Typical numbers (in percentages):
• 3-10% for L1
• can be quite small (e.g., < 1%) for L2, depending on size,
etc.
Hit Time
– Time to deliver a line in the cache to the processor
• includes time to determine whether the line is in the cache
– Typical numbers:
• 1-2 clock cycles for L1
• 5-20 clock cycles for L2
Cache Performance Metrics
Miss Penalty
– Additional time required because of a miss
• typically 50-200 cycles for main memory (Trend:
increasing!)
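The three metrics combine into the average memory access time (AMAT = hit time + miss rate × miss penalty); a quick sketch using figures in the typical ranges quoted above (the specific cycle counts are illustrative assumptions, not measurements of any particular CPU):

```python
# Average memory access time from the three cache performance metrics.
def amat(hit_time, miss_rate, miss_penalty):
    """Average cycles per access = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed L1 figures: 2-cycle hit, 5% miss rate, 100-cycle penalty
# to reach main memory.
print(amat(hit_time=2, miss_rate=0.05, miss_penalty=100))  # 7.0
```

Note how a small miss rate still dominates the average when the penalty is large, which is why the trend of increasing miss penalties matters.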
Associative Access Method
• Addressing information must be stored with
data in a general data location
• A specific data element is located by
comparing the desired address with the address
portion of stored elements
• Access time is independent of location or
previous access
• Example: cache
Mapping Functions
• A mapping function is the method used to locate a
memory address within a cache
• It is used when copying a block from main
memory to the cache and it is used again when
trying to retrieve data from the cache
• There are three kinds of mapping functions
– Direct
– Associative
– Set Associative
Mapping Function
• Suppose we have the following configuration
– Word size of 1 byte
– Cache of 16 bytes
– Cache line / Block size is 2 bytes
• i.e. cache is 16/2 = 8 (2^3) lines of 2 bytes per line
• Will need 3 bits to address one of the 8 lines in the cache
– Main memory of 64 bytes
• 6 bit address needed to reference 64 bytes
• (2^6 = 64)
• 64 bytes / 2 bytes-per-block = 32 memory blocks
– Somehow we have to map the 32 memory blocks to the
8 lines in the cache. Multiple memory blocks will have
to map to the same line in the cache!
Mapping Function – 64K Cache
Example
• Suppose we have the following configuration
– Word size of 1 byte
– Cache of 64 KBytes
– Cache line / Block size is 4 bytes
• i.e. cache is 64 KB / 4 bytes = 16K (2^14) lines of 4 bytes
– Main memory of 16 MBytes
• 24 bit address
• (2^24 = 16M)
• 16 MB / 4 bytes-per-block = 4M memory blocks
– Somehow we have to map the 4M blocks in
memory onto the 16K lines in the cache. Multiple
memory blocks will have to map to the same line in the
cache!
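The derived counts in this example follow mechanically from the three sizes; a quick sketch that checks the arithmetic (sizes taken from the slide):

```python
# Checking the 64 KB / 16 MB example's arithmetic.
cache_bytes  = 64 * 1024          # cache size
line_bytes   = 4                  # cache line / block size
memory_bytes = 16 * 1024 * 1024   # main memory size

lines     = cache_bytes // line_bytes        # lines in the cache
blocks    = memory_bytes // line_bytes       # blocks in main memory
addr_bits = memory_bytes.bit_length() - 1    # bits to address every byte

print(lines)      # 16384   (16K lines)
print(blocks)     # 4194304 (4M blocks)
print(addr_bits)  # 24
```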
Direct Mapping
• Simplest mapping technique - each block of main
memory maps to only one cache line
– i.e. if a block is in cache, it must be in one specific
place
• Formula to map a memory block to a cache line:
– i = j mod c
• i=Cache Line Number
• j=Main Memory Block Number
• c=Number of Lines in Cache
– i.e. we divide the memory block number by the number of
cache lines and the remainder is the cache line number
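The mapping formula is a one-liner; a minimal sketch using the earlier 8-line cache:

```python
# Direct mapping: block j of main memory always lands in line j mod c.
def cache_line(j, c):
    """Cache line for memory block j in a cache with c lines."""
    return j % c

# With c = 8 lines (the 16-byte cache example), memory blocks 3, 11
# and 19 all compete for the same line:
print([cache_line(j, 8) for j in (3, 11, 19)])  # [3, 3, 3]
```

This collision is exactly what causes the thrashing problem described below: two hot blocks that share a line evict each other on every access.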
Direct Mapping with C=4
• Shrinking our example to a cache of 4
slots (each slot/line still contains a block of 4 words):
– Cache Line Memory Block Held
• 0 0, 4, 8, …
• 1 1, 5, 9, …
• 2 2, 6, 10, …
• 3 3, 7, 11, …
– In general:
• 0 0, C, 2C, 3C, …
• 1 1, C+1, 2C+1, 3C+1, …
• 2 2, C+2, 2C+2, 3C+2, …
• 3 3, C+3, 2C+3, 3C+3, …
Direct Mapping with C=4
[Figure: 4-slot cache, each slot holding Valid, Dirty and Tag bits plus a block of K words, alongside main memory blocks 0-7; the tag identifies which memory block is currently in the slot]
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
– If a program repeatedly accesses 2 blocks that map to
the same line, cache misses are very
high – a condition called thrashing
Fully Associative Mapping
• A fully associative mapping scheme can overcome the
problems of the direct mapping scheme
– A main memory block can load into any line of cache
– Memory address is interpreted as tag and word
– Tag uniquely identifies block of memory
– Every line’s tag is examined for a match
– Also need a Dirty and Valid bit
• But Cache searching gets expensive!
– Ideally need circuitry that can simultaneously examine all tags
for a match
– Lots of circuitry needed, high cost
• Need replacement policies now that anything can get
thrown out of the cache (will look at this shortly)
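In hardware every line's tag is compared simultaneously; in software a dictionary keyed by tag models the same "any block can go in any line" behaviour. A minimal sketch (class and tag values are illustrative, and the replacement policy is deliberately naive):

```python
# Software model of a fully associative cache: any tag can occupy any line.
class FullyAssociativeCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = {}             # tag -> block data

    def lookup(self, tag):
        """Return the cached block, or None on a miss."""
        return self.lines.get(tag)

    def fill(self, tag, block):
        """Load a block; evict something if the cache is full."""
        if len(self.lines) >= self.num_lines and tag not in self.lines:
            # A replacement policy must pick a victim; evict an arbitrary
            # line here for simplicity (real caches use LRU, random, etc.)
            self.lines.pop(next(iter(self.lines)))
        self.lines[tag] = block

cache = FullyAssociativeCache(num_lines=4)
cache.fill(tag=0x2C, block=b"\xde\xad\xbe\xef")
print(cache.lookup(0x2C))  # the block -> a hit
print(cache.lookup(0x01))  # None -> a miss
```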
Associative Mapping Example
[Figure: 4-slot cache (Valid, Dirty and Tag bits per slot) alongside main memory blocks 0-7; a block can map to any slot, the tag identifies which block is in which slot, and all slots are searched in parallel for the target]
Associative Mapping Traits
• A main memory block can load into any line of
cache
• Memory address is interpreted as:
– Least significant w bits = word position within block
– Most significant s bits = tag used to identify which
block is stored in a particular line of cache
• Every line's tag must be examined for a match
• Cache searching gets expensive and slower
Associative Mapping Address Structure
Example

Tag – s bits (22 in example) | Word – w bits (2 in example)

• 22 bit tag stored with each 32 bit block of data
• Compare tag field with tag entry in cache to
check for hit
• Least significant 2 bits of address identify which
of the four 8 bit words is required from the 32 bit
data block
Fully Associative Cache Organization
[Figure: fully associative cache organization]
Fully Associative Mapping Example
Assume that a portion of the tags in the cache in our example
looks like the table below. Which of the following addresses are
contained in the cache?

a.) 438EE8₁₆  b.) F18EFF₁₆  c.) 6B8EF3₁₆  d.) AD8EF3₁₆

Tag (binary)                  Word positions within block (00 01 10 11)
0101 0011 1000 1110 1110 10
1110 1101 1100 1001 1011 01
1010 1101 1000 1110 1111 00
0110 1011 1000 1110 1111 11
1011 0101 0101 1001 0010 00
1111 0001 1000 1110 1111 11
Associative Mapping Summary

• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or
bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits
Set Associative Mapping
• Compromise between fully-associative and direct-
mapped cache
– Cache is divided into a number of sets
– Each set contains a number of lines
– A given block maps to any line in a specific set
• Use direct-mapping to determine which set in the cache
corresponds to a set in memory
• Memory block could then be in any line of that set
– e.g. 2 lines per set
• 2 way associative mapping
• A given block can be in either of 2 lines in a specific set
– e.g. K lines per set
• K way associative mapping
• A given block can be in one of K lines in a specific set
• Much easier to simultaneously search one set than all lines
Set Associative Mapping
• To compute the cache set number:
– SetNum = j mod v
• j = main memory block number
• v = number of sets in cache
[Figure: 2-way set associative cache with Set 0 (slots 0-1) and Set 1 (slots 2-3) alongside main memory blocks 0-5]
Set Associative Mapping
64K Cache Example
Tag: 9 bits | Set: 13 bits | Word: 2 bits

• E.g. given our 64 KB cache with a line size of 4 bytes, we have
16384 lines. Say that we decide to create 8192 sets, where each set
contains 2 lines. Then we need 13 bits to identify a set (2^13 = 8192)
• Use set field to determine cache set to look in
• Compare tag field of all slots in the set to see if we have a hit, e.g.:
– Address = 16339C = 0001 0110 0011 0011 1001 1100
• Tag = 0 0010 1100 = 02C
• Set = 0 1100 1110 0111 = 0CE7
• Word = 00 = 0
– Address = 008004 = 0000 0000 1000 0000 0000 0100
• Tag = 0 0000 0001 = 001
• Set = 0 0000 0000 0001 = 0001
• Word = 00 = 0
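The field extraction above is just shifting and masking; a small sketch that reproduces both worked addresses (field widths are the 9/13/2 split from this slide):

```python
# Decompose a 24-bit address into tag / set / word fields for the
# 2-way set associative 64 KB cache (9-bit tag, 13-bit set, 2-bit word).
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2

def split_address(addr):
    """Return (tag, set, word) for a 24-bit byte address."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_ = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag  = addr >> (WORD_BITS + SET_BITS)
    return tag, set_, word

# The two worked addresses from the slide:
print([hex(f) for f in split_address(0x16339C)])  # ['0x2c', '0xce7', '0x0']
print([hex(f) for f in split_address(0x008004)])  # ['0x1', '0x1', '0x0']
```

On a lookup, the set field selects one set and only the 2 tags in that set are compared, which is what makes set associative search so much cheaper than a fully associative one.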
K-Way Set Associative
• Two-way set associative gives much better
performance than direct mapping
– Just one extra slot avoids the thrashing problem
• Four-way set associative gives only slightly
better performance over two-way
• Further increases in the size of the set have
little effect other than increased cost of the
hardware!
