Cache

CIT 595
Spring 2007


• The purpose of cache memory is to speed up data accesses for the processor by storing information in faster memory made of SRAM
  – SRAM access time is 3 ns to 10 ns
  – DRAM access time is 30 ns to 90 ns
• The data stored in the cache is data that the processor is likely to use in the very near future
  – SRAM is fast but small, so only a subset of the data in main memory is stored in the cache

Basic Terminology

• Memory is divided into blocks
• Each block contains a fixed number of words
• Word = size of the data stored in one location, e.g. 8 bits, 16 bits, etc.
• One block is used as the minimum unit of transfer between main memory and the cache
• Hence, each location in the cache stores 1 block

(Figure: a main memory of four words, Word 0–Word 3, grouped into Block 0 (Words 0–1) and Block 1 (Words 2–3); each cache location holds one whole block)


Cache Mapping Scheme

• If the CPU generates an address for a particular word in main memory, and that data happens to be in the cache, the same main memory address cannot be used to access the cache
• Hence a mapping scheme is required that converts the generated main memory address into a cache location
• The mapping scheme also determines where the block will be placed when it is originally copied into the cache

Address Conversion to Cache Location

• Address conversion is done by giving special significance to the bits of the main memory address
• The address is split into distinct groups called fields
  – Just like instruction decoding is done based on certain bit fields
• The fields are a way to find:
  – Which cache location?
  – Which word in the block?
  – Whether it is the right data we are looking for? Some kind of unique identifier


Mapping Scheme 1: Direct Mapped Cache

• In a direct mapped cache consisting of N blocks of cache (i.e. N locations), block X of main memory maps to cache block Y = X mod N
• E.g. if we have 10 blocks of cache, cache block 7 may hold main memory blocks 7, 17, 27, 37, ...
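
As a quick illustration (not part of the original slides), here is a minimal C sketch of the direct mapped placement rule Y = X mod N; the function name and the 16-block cache size are assumptions made for the example:

    #include <stdio.h>

    #define CACHE_BLOCKS 16   /* N: number of blocks (locations) in the cache -- assumed */

    /* Direct mapped placement: memory block X always lands in cache block X mod N */
    unsigned direct_mapped_index(unsigned mem_block)
    {
        return mem_block % CACHE_BLOCKS;
    }

    int main(void)
    {
        /* Memory blocks 7, 23 and 39 all compete for cache block 7 (each mod 16 is 7) */
        printf("%u %u %u\n", direct_mapped_index(7),
                             direct_mapped_index(23),
                             direct_mapped_index(39));
        return 0;
    }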

Direct Mapped Scheme: Address Conversion

• The n-bit main memory address is split into three fields:

  Tag | Block | Word

  – Word = which word in the block?
  – Block = which location in the cache?
  – Tag = unique identifier w.r.t. one block
• Note: the tag is used to distinguish whether main memory block 7 or block 17 is stored in cache block 7
• Note: why the breakup is done this way will be explained later


Cache with 4 blocks and 8 words per block

(Figure: a cache table with columns Block No., Tag, and Data, showing blocks 0–3)

Example of Direct Mapped Scheme

• Suppose our memory consists of 2^14 locations (or words), the cache has 16 = 2^4 blocks, and each block holds 8 words
• Thus main memory is divided into 2^14 / 2^3 = 2^11 blocks
• Of the 14-bit address, we need 4 bits for the block field, 3 bits for the word field, and the tag is what's left over (7 bits)


Direct Mapped Cache with 16 blocks

(Figure: a cache table with columns Block No., Tag, and Data, showing blocks 0–15)
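
A compilable C sketch (my own addition) of the 7/4/3 field split from this example; the mask-and-shift approach is standard, and the address 0x1AA anticipates the example on the next slides:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t addr  = 0x1AA;               /* example address used in the slides   */
        unsigned word  =  addr       & 0x07;  /* lowest 3 bits: word within the block */
        unsigned block = (addr >> 3) & 0x0F;  /* next 4 bits: cache block number      */
        unsigned tag   = (addr >> 7) & 0x7F;  /* remaining 7 bits: tag                */

        printf("addr 0x%03X -> tag %u, block %u, word %u\n", addr, tag, block, word);
        /* Prints: addr 0x1AA -> tag 3, block 5, word 2 */
        return 0;
    }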

Cache Indexing and Data Retrieval

(Figure: the block field of the n-bit address indexes into the cache table (Block No., Tag, Data); the stored tag is compared with the 7-bit tag from the address, and a multiplexer uses the 3-bit word field to select the desired word, which is returned only if the tags match)


Example of Direct Mapped Cache (contd..)

• Suppose a program generates the address 1AA
• In 14-bit binary, this number is: 00 0001 1010 1010
• The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and the final 3 bits indicate the word within the block

Direct Mapped Cache Example

(Figure: the 16-block cache table after the reference to address 1AA; cache block 5 (binary 0101) now holds the block with tag 0000011, the tag shared by addresses 1AA and 1AB)


Direct Mapped Cache Example (contd..)

• However, suppose the program then generates the address 3AB
• 3AB also maps to block 0101, but we will not find data for 3AB in the cache
  – The tags will not match, i.e. 0000111 (of addr 3AB) is not equal to 0000011 (of addr 1AB)
• Hence we get it from main memory
• The block loaded for address 1AA would be evicted (removed) from the cache and replaced by the block associated with the 3AB reference
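
A quick worked check of the three addresses involved (my own breakdown, consistent with the 7/4/3 field split above):

    1AA = 00 0001 1010 1010  ->  tag 0000011 | block 0101 (5) | word 010
    1AB = 00 0001 1010 1011  ->  tag 0000011 | block 0101 (5) | word 011
    3AB = 00 0011 1010 1011  ->  tag 0000111 | block 0101 (5) | word 011

All three share the block field 0101, but 3AB carries a different tag, so it displaces whatever 1AA/1AB loaded into cache block 5.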

Direct Mapped Cache with address 3AB

(Figure: the 16-block cache table after the reference to address 3AB; cache block 5 now holds the block with tag 0000111, replacing the block that was loaded for address 1AA)


Disadvantage of Direct Mapped Cache

• Suppose a program generates a series of memory references such as: 1AB, 3AB, 1AB, 3AB, ...
  – The cache will continually evict and replace blocks
• The theoretical advantage offered by the cache is lost in this extreme case
• Other cache mapping schemes are designed to prevent this kind of thrashing

Address Breakup

• Why is the address broken up in this particular manner?

  Tag | Block | Word

• This is done if we use high-order memory address interleaving
  – Due to spatial locality, data from consecutive addresses are brought into the cache
• If the higher-order bits (i.e. the bits used for the tag) were used for determining the cache location, then values from consecutive addresses would map to the same location in the cache
  – The middle bits are preferred for the block location as they cause less thrashing


Valid Cache Block

• How do we know whether the block in the cache is valid or not?
• For example:
  – When the processor just starts up, the cache will be empty and the tag fields in each location will be meaningless
  – Thus tag fields must be ignored initially when the cache is starting to fill up
• For validity, another bit called the valid bit is added to each cache block to indicate whether the block contains valid information
  – 0 – not valid, 1 – valid
  – All blocks at start-up are not valid
  – If data from main memory is brought into the cache for a particular block, then the valid bit for that block is set
  – The valid bit contributes to the cache size

Direct Mapped Cache with Valid (V) Field

(Figure: the 16-block cache table with columns Block No., Tag, Data, and V; address 3AB is referenced for the first time, the entire block is brought into cache block 5, its tag is set to 0000111 and its valid bit to 1, while all other blocks still have V = 0)


Calculating Cache Size

Suppose our memory consists of 2^14 locations (or words), the cache has 16 = 2^4 blocks, and each block holds 8 words

• There are 16 locations in the cache
• Each row has 7 bits for the tag + 8 words + 1 valid bit
• Assuming 1 word is 8 bits, the total number of bits in a row is (8 × 8) + 7 + 1 = 72
• 72 bits = 9 bytes
• Cache size = 16 × 9 bytes = 144 bytes
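
The same arithmetic as a small C sketch (my own addition; the variable names and the general formula blocks × (data bits + tag bits + valid bit) simply make the calculation above explicit):

    #include <stdio.h>

    int main(void)
    {
        int num_blocks      = 16;   /* 2^4 blocks in the cache   */
        int words_per_block = 8;    /* 2^3 words per block       */
        int word_bits       = 8;    /* assume 1 word = 8 bits    */
        int tag_bits        = 7;    /* 14 - 4 (block) - 3 (word) */
        int valid_bits      = 1;

        int row_bits   = words_per_block * word_bits + tag_bits + valid_bits;  /* 72   */
        int total_bits = num_blocks * row_bits;                                /* 1152 */

        printf("row = %d bits, cache = %d bytes\n", row_bits, total_bits / 8); /* 144 bytes */
        return 0;
    }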

Hit or Miss in the Cache

• Hit means that we actually found the data in the cache
• A hit occurs when valid bit = 1 AND the tag in the cache matches the tag field of the address
• If both conditions don't hold, then we did not find the data in the cache
  – This is known as a miss in the cache
• On a miss, the data is brought from main memory into the cache, and the valid bit is set


Some More Terminology

• The hit rate is the percentage of time data is found at a given memory level
• The miss rate is the percentage of time it is not
• Miss rate = 1 - hit rate
• The hit time is the time required to access data at a given memory level
• The miss penalty is the time required to process a miss
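
A minimal C sketch (my own addition) of the hit test for the direct mapped cache described above; the struct layout and names are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    struct cache_block {
        bool    valid;
        uint8_t tag;       /* 7-bit tag              */
        uint8_t data[8];   /* 8 words of 8 bits each */
    };

    static struct cache_block cache[16];   /* 16 blocks, direct mapped */

    /* Returns true on a hit and copies the requested word out; on a miss the
       caller must fetch the block from main memory into cache[block] and set
       its valid bit. */
    bool cache_lookup(uint16_t addr, uint8_t *word_out)
    {
        unsigned word  =  addr       & 0x07;   /* low 3 bits       */
        unsigned block = (addr >> 3) & 0x0F;   /* next 4 bits      */
        uint8_t  tag   = (addr >> 7) & 0x7F;   /* remaining 7 bits */

        /* Hit condition from the slide: valid bit set AND tags match */
        if (cache[block].valid && cache[block].tag == tag) {
            *word_out = cache[block].data[word];
            return true;
        }
        return false;
    }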

Scheme 2: Fully Associative Cache

• Instead of placing memory blocks in specific cache locations based on the memory address, we could allow a block to go anywhere in the cache
• This way, the cache would have to fill up before any blocks are evicted
• This is how a fully associative cache works
• A memory address is partitioned into only two fields: the tag and the word


Fully Associative Cache: Address Conversion

• Suppose, as before, we have 14-bit memory addresses and a cache with 16 blocks, each block of size 8 words. The field format of a memory reference is:

  Tag (11 bits) | Word (3 bits)

• When the cache is searched, all tags are searched in parallel to retrieve the data quickly
  – More hardware cost than direct mapped
  – Basically we need "n" comparators, where n = # of blocks in the cache
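
A software sketch (my own addition) of what the parallel tag search computes; hardware compares all n tags at once, while this illustrative C loop simply checks them one by one:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BLOCKS 16

    struct fa_block {
        bool     valid;
        uint16_t tag;      /* 11-bit tag for a 14-bit address with a 3-bit word field */
        uint8_t  data[8];
    };

    static struct fa_block cache[NUM_BLOCKS];

    bool fa_lookup(uint16_t addr, uint8_t *word_out)
    {
        unsigned word = addr & 0x07;     /* 3-bit word field  */
        uint16_t tag  = addr >> 3;       /* remaining 11 bits */

        for (int i = 0; i < NUM_BLOCKS; i++) {        /* hardware: n comparators in parallel */
            if (cache[i].valid && cache[i].tag == tag) {
                *word_out = cache[i].data[word];
                return true;                          /* hit */
            }
        }
        return false;                                 /* miss */
    }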

Fully Associative: Which block to replace if the cache is full?

• Recall that a direct mapped cache evicts a block whenever another memory reference needs that cache location
• With a fully associative cache, we have no such mapping, thus we must devise an algorithm to determine which block to evict from the cache
• The block that is evicted is called the victim block
• There are a number of ways to pick a victim; we will discuss them shortly...


Scheme 3: Set Associative

• Set associative cache combines the ideas of direct mapped cache and fully associative cache
• A set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in the cache
• But that cache location can hold more than one main memory block. The cache location is then called a set.
  – Instead of mapping anywhere in the entire cache (fully associative), a memory reference can map only to a subset of the cache

Scheme 3: Set Associative (contd..)

• The number of blocks per set in a set associative cache varies according to the overall system design
• For example, a 2-way set associative cache can be conceptualized as shown in the schematic below
  – Each set contains two different memory blocks
• A K-way set associative cache will have K blocks per set

(Figure: schematic of a 2-way set associative cache, with each set holding two blocks)


Scheme 3: Address Conversion

  Tag | Set | Word

• Like the direct-mapped cache, except the middle bits of the main memory address indicate the set in the cache

K-Way Set Associative Cache Example

• Suppose we have a main memory of 2^14 locations
• This memory is mapped to a 2-way set associative cache having 16 blocks, where each block contains 8 words
• Number of sets = number of blocks in cache / K
  – Since this is a 2-way cache, each set consists of 2 blocks, and there are 16/2 = 8 sets
• Thus, we need 3 bits for the set, 3 bits for the word, giving 8 leftover bits for the tag


Advantage & Disadvantage of Set Associative

• Advantage
  – Unlike direct mapped cache, if an address maps to a set, there is a choice for placing the new block
  – If both slots are filled, then we need an algorithm that will decide which old block to evict (like fully associative)
• Disadvantage
  – The tags of each block in a set need to be matched (in parallel) to figure out whether the data is present in the cache. Need K comparators.
  – Although the hardware cost for matching is less than fully associative (which needs n comparators, where n = # of blocks), it is more than direct mapped (which needs only one comparator)
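
A C sketch (my own addition) of the lookup for this 2-way example; the set field picks one set and only the K blocks in that set are tag-checked. Struct and function names are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BLOCKS 16
    #define WAYS        2
    #define NUM_SETS   (NUM_BLOCKS / WAYS)    /* 8 sets -> 3 set bits */

    struct sa_block {
        bool    valid;
        uint8_t tag;       /* 8-bit tag: 14 - 3 (set) - 3 (word) */
        uint8_t data[8];
    };

    static struct sa_block cache[NUM_SETS][WAYS];

    bool sa_lookup(uint16_t addr, uint8_t *word_out)
    {
        unsigned word =  addr       & 0x07;   /* 3-bit word field */
        unsigned set  = (addr >> 3) & 0x07;   /* 3-bit set field  */
        uint8_t  tag  =  addr >> 6;           /* 8-bit tag        */

        for (int way = 0; way < WAYS; way++) {    /* K comparators in hardware */
            if (cache[set][way].valid && cache[set][way].tag == tag) {
                *word_out = cache[set][way].data[word];
                return true;                      /* hit */
            }
        }
        return false;                             /* miss */
    }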

Which block to replace?

• With fully associative and set associative caches, a replacement algorithm/policy needs to be used when it becomes necessary to evict a block from the cache
• The replacement policy that we choose depends upon the locality that we are trying to optimize
• E.g. if we are interested in temporal locality, i.e. referenced memory is likely to be referenced again soon (e.g. code within a loop), then we will keep the most recently used blocks


Replacement Algorithm/Policy

LRU - Least Recently Used
• Evicts the block that has been unused for the longest period of time
• The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which will slow down the cache
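
A minimal counter-based LRU sketch in C (my own illustration; real caches usually use cheaper approximations of the access history):

    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS 2

    struct lru_block {
        bool     valid;
        uint8_t  tag;
        uint32_t last_used;   /* per-block access history: timestamp of last use */
    };

    static uint32_t now;      /* incremented on every cache access */

    /* Record a use of a block (called on every hit and on every fill). */
    void lru_touch(struct lru_block *blk)
    {
        blk->last_used = ++now;
    }

    /* Choose a victim within one set: prefer an invalid block, otherwise
       evict the block whose last use is the oldest (least recently used). */
    int lru_victim(struct lru_block set[WAYS])
    {
        int victim = 0;
        for (int way = 0; way < WAYS; way++) {
            if (!set[way].valid)
                return way;                               /* free slot, nothing to evict  */
            if (set[way].last_used < set[victim].last_used)
                victim = way;                             /* older than current candidate */
        }
        return victim;
    }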

Replacement Algorithm/Policy (contd..)

FIFO - First-In, First-Out
• In FIFO, the block that has been in the cache the longest is evicted, regardless of when it was last used

Random Replacement
• Does what its name implies: it picks a block at random and replaces it with a new block
• Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes (like in the case of direct-mapped cache)


What about blocks that have been written to?

• While your program is running, it will modify some memory locations
• We need to keep main memory and the cache consistent if we are modifying data
• We have two options
  – Should we update the cache and memory at the same time? OR
  – Update the cache and then main memory at a later time
  – The two choices are known as cache write policies

Cache Write Policies

Write-Through
• Update cache and main memory simultaneously on every write
• Advantage
  – Keeps cache and main memory consistent at the same time
• Disadvantage
  – All writes require a main memory access (bus transaction)
  – Slows down the system - if there is another read request to main memory due to a miss in the cache, the read request has to wait until the earlier write has been serviced


Cache Write Policies (contd..)

Write Back or Copy Back
• Data that is modified is written back to main memory only when the cache block is going to be evicted (removed) from the cache
• Advantage
  – Faster than write-through; time is not spent accessing main memory on every write
  – Writes to multiple words within a block require only one write to main memory
• Disadvantage
  – Need an extra bit in the cache to indicate which block has been modified
  – Like the valid bit, another bit called the dirty bit is introduced to indicate a modified cache block
  – 0 – not dirty, 1 – dirty (modified)
  – Adds to the size of the cache
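
A C sketch (my own addition) of the write-back behaviour with a dirty bit; the memory write is left as a stub and the names are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    struct wb_block {
        bool    valid;
        bool    dirty;     /* set when the cached copy has been modified */
        uint8_t tag;
        uint8_t data[8];
    };

    /* Stand-in for the bus transaction a real controller would perform. */
    static void write_block_to_memory(const struct wb_block *blk)
    {
        (void)blk;
    }

    /* Write-back store: only the cached copy is updated and the block is
       marked dirty; main memory is not touched yet. */
    void wb_store(struct wb_block *blk, unsigned word, uint8_t value)
    {
        blk->data[word] = value;
        blk->dirty = true;
    }

    /* On eviction, a dirty block must first be copied back to main memory. */
    void wb_evict(struct wb_block *blk)
    {
        if (blk->valid && blk->dirty)
            write_block_to_memory(blk);
        blk->valid = false;
        blk->dirty = false;
    }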

Direct Mapped Cache with Valid and Dirty Bit

(Figure: the cache table with columns Block No., Tag, Data, V, and D; block 1 holds a valid, unmodified block (V = 1, D = 0), while block 3 holds a valid block with dirty words, i.e. modified data (V = 1, D = 1). D = dirty bit, V = valid bit)


What affects Performance of Cache?

• Programs that exhibit bad locality
• E.g. spatial locality with matrix operations
  – Suppose the matrix data is kept in memory by rows (known as row-major), i.e. offset = row*NUMCOLS + column
• Poor code:
  – for (j = 0; j < numcols; j++)
        for (i = 0; i < numrows; i++)
  – i.e. x[i][j] is followed by x[i + 1][j]
  – The array is being accessed by column, and we are going to miss in the cache every time
• Solution: switch the for loops (see the sketch below)
• C/C++ are row-major; FORTRAN & MATLAB are column-major
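
A compilable version of the two loop orders over a row-major C array (the matrix size and the summing body are my own illustrative choices):

    #include <stdio.h>

    #define NUMROWS 512
    #define NUMCOLS 512

    static double x[NUMROWS][NUMCOLS];   /* C stores this row-major */

    int main(void)
    {
        double sum = 0.0;

        /* Poor locality: the inner loop walks down a column, so consecutive
           accesses are NUMCOLS elements apart and rarely share a cache block. */
        for (int j = 0; j < NUMCOLS; j++)
            for (int i = 0; i < NUMROWS; i++)
                sum += x[i][j];

        /* Better locality: loops switched, the inner loop walks along a row, so
           consecutive accesses usually fall within the same cache block. */
        for (int i = 0; i < NUMROWS; i++)
            for (int j = 0; j < NUMCOLS; j++)
                sum += x[i][j];

        printf("%f\n", sum);
        return 0;
    }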

Multi-level Cache

• Most of today's systems employ multilevel cache hierarchies
• The levels of cache form their own small memory hierarchy

A current day processor uses:
• A Level 1 cache (8KB to 64KB) situated on the processor itself
  – Access time is typically about 4 ns
• A Level 2 cache (64KB to 2MB) located external to the processor
  – Access time is usually around 15-20 ns


Multi-level Caches (contd..)

• In a multi-level cache:
  – If the cache system uses an inclusive cache, the same data may be present at multiple levels of cache
  – Strictly inclusive caches guarantee that all data in a smaller cache also exists at the next higher level
  – Exclusive caches permit only one copy of the data
• The tradeoffs in choosing one over the other involve weighing the variables of access time, memory size, and circuit complexity

Instruction and Data Caches

• A unified or integrated cache is one where both instructions and data are cached
• Many modern systems employ separate caches for data and instructions
  – This is called a Harvard cache
• Advantage:
  – Allows accesses to be less random and more clustered
  – Less access time than a unified cache (which is typically larger)


Review of Cache Organization

Q1: Where can a block be placed in the cache level?
    Mapping scheme

Q2: How is a block found if it is in the cache?
    Mapping scheme

Q3: If the cache is full, then where do we put the new block, i.e. which old block should we replace?
    Block replacement policy

Q4: If we write to a block in the cache, should we update the main memory at the same time?
    Write policy

Looking Forward

• Studied the interaction between the cache and main memory
  – That memory is managed by hardware
• Next we study the interaction between main memory and disk
  – That memory is managed by hardware and the compiler
