Cache Memory - Concept

The document discusses cache memory and its role in the memory hierarchy. It explains that cache memory sits between the CPU and main memory, storing frequently accessed data from main memory to allow for faster access by the CPU. When the CPU requests data, it first checks the cache and retrieves the data from there if it exists; otherwise, it must access slower main memory. The document covers principles of cache memory such as locality of reference, and mapping techniques such as direct mapping and associative mapping that determine how data is stored in the cache.

25/03/2020

BVM ENGG. COLLEGE
AY: 2019-20  Sem: Even  Div: 9 (EC)

Cache Memory
Microprocessor and Computer Architecture
Dr. Bhargav Goradiya
The Memory Hierarchy
This storage organization can be thought of as a pyramid, with small, fast memories (registers, cache) at the top and large, slow storage (main memory, disk) at the bottom.
The Memory Hierarchy
To access a particular piece of data, the CPU first sends a request to its nearest memory, usually the cache.

If the data is not in the cache, then main memory is queried. If the data is not in main memory, then the request goes to disk.

Once the data is located, the data and a number of its nearby data elements are fetched into cache memory.


Principle of Locality of Reference
Programs tend to reuse data and instructions they have used recently.

Instructions in localized areas are executed repeatedly.

"Make the common case fast" → favor accesses to such data.

Keep recently accessed data in the fastest memory.
Principle of Locality of Reference
Most program execution time spent on routines in
which many instructions are executed repeatedly

Temporal
– Recently executed instruction is likely to be executed again
very soon
– Whenever an item is first needed, it is first brought to the
cache, where it will hopefully remain until it is needed
again. Also influences choice on which item to discard when
cache is full

Spatial
– Instructions in close proximity to a recently executed
instruction are also likely to be executed soon
– Instead of fetching just one item into the cache, fetch
several adjacent data items as well (block/cache line)
Cache Memory
Cache Memory is intended to give:
– Memory speed approaching that of the fastest
memories available.
– Large memory size at the price of less expensive
types of semiconductor memories.

Small amount of fast memory.

Sits between normal main memory and CPU.

May be located on CPU chip or module.


Conceptual Operation
Relatively large and slow main memory together
with faster, smaller cache.
Cache contains a copy of portions of main memory.
When processor attempts to read a word from
memory, a check is made to determine if the word
exists in cache.
– If it is, the word is delivered to the processor.
– If not, a block of main memory is read into the cache, then the
word is delivered to the processor.
[Diagram: CPU ↔ Cache (word transfer) ↔ Main Memory (block transfer)]


Hit Ratio
A measure of the efficiency of the cache structure.
– When the CPU refers to memory and the word is found in the cache, this is called a hit.
– When the word is not found in the cache, this is called a miss.

The hit ratio is the total number of hits divided by the total number of access attempts (hits + misses).
– It has been shown in practice that hit ratios higher than 0.9 are possible.


Cache vs. Main Memory Structure
[Figure: the cache holds C lines (0 to C−1), each with a tag and a block of K words; main memory holds 2^n words grouped into blocks of K words.]


Main Memory and Cache Memory
Main memory consists of 2^n words.
– Each word has a unique n-bit address.

We can consider that main memory is made up of blocks of K words each.

Cache consists of C lines of K words each.

A block of main memory is copied into a line of the cache.
– The "tag" field of the line identifies which main memory block the cache line currently holds.


Memory Hierarchy Design
Block placement:
– Where can a block be placed in the upper
level?
Block identification:
– How is a block found if it is in the upper
level?
Block replacement:
– Which block should be replaced on a miss?
Write strategy:
– What happens on a write?
Where can a block be placed in a cache?

The mapping function determines how a block is placed in the cache.


Mapping Functions
Three Types
– Direct Mapping
– Associative Mapping
– Set-Associative Mapping
Examples assume a 64K-word main memory and a 2K-word cache
1 block consists of 16 words
65536 / 16 = 4096 blocks in main memory
2048 / 16 = 128 blocks in cache


Where can a block be placed in a cache? How is a block found?
[Figure: a mapping function places main memory blocks 0..4095 into cache blocks 0..127, each cache block carrying a tag. The 16-bit address is split into a 12-bit block field and a 4-bit word field.]
Direct Mapping
Block j of main memory maps onto block (j modulo 128) of the cache.
Example:
– Block 2103 of main memory maps to block (2103 mod 128) = block 55
Each main memory block has only one possible place in the cache
Cache block 0
– Memory blocks 0, 128, 256, ...
Cache block 1
– Memory blocks 1, 129, 257, ...
Contention may occur even when the cache is not full
Replacement algorithm is trivial
Simplest scheme, but very inflexible

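The modulo placement above can be sketched in a few lines of Python (a hypothetical helper; the 128-block cache matches the running example):

```python
CACHE_BLOCKS = 128  # cache size from the running example (2K words / 16-word blocks)

def cache_block(j):
    """Direct mapping: main memory block j has exactly one cache position."""
    return j % CACHE_BLOCKS

print(cache_block(2103))                                   # 55, as on the slide
print(cache_block(0), cache_block(128), cache_block(256))  # all contend for cache block 0
```

Because blocks 0, 128, 256, ... all map to the same cache block, they evict each other even when the rest of the cache is empty.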


Direct Mapping
16-bit address (64K words)
16 words per block → lower 4 bits select the word
Cache block position → middle 7 bits
32 main memory blocks map to the same cache block
The higher 5 bits tell which of the 32 blocks is currently mapped
The higher 5 bits are stored in 5 tag bits associated with the cache location


How is a block found if it is in the cache? Direct Mapping
The middle 7 bits determine which location in the cache is used
The higher-order 5 bits are matched against the tag bits in the cache to check whether the desired block is the one stored there


Direct Mapping
[Figure: main memory blocks 0..4095 map into cache blocks 0..127. The 16-bit address is split into a 5-bit tag, a 7-bit block field, and a 4-bit word field.]
Direct Mapping Cache Organization



Example: Direct Mapping
Main memory: 2M words = 2^21 words
Cache: 64K words = 2^16 words
Words per block: 8 words = 2^3 words
Number of blocks in the cache:
– Cache size / block size = 2^16 / 2^3 = 2^13
Number of bits in the tag field:
– Total bits − block bits − word bits = 21 − 13 − 3 = 5

Address: Tag (5 bits) | Block (13 bits) | Word (3 bits)
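The field widths above can be checked mechanically. A small sketch (the function name is my own) that derives the tag/block/word split of a direct-mapped cache from the three sizes:

```python
import math

def direct_mapped_fields(mem_words, cache_words, block_words):
    # all sizes assumed to be powers of two, as in the slide's examples
    word_bits  = int(math.log2(block_words))                 # selects a word within a block
    block_bits = int(math.log2(cache_words // block_words))  # selects a cache block
    tag_bits   = int(math.log2(mem_words)) - block_bits - word_bits
    return tag_bits, block_bits, word_bits

# 2M-word memory, 64K-word cache, 8-word blocks:
print(direct_mapped_fields(2**21, 2**16, 2**3))   # (5, 13, 3)
```

The same function reproduces the earlier 64K/2K/16-word example, which split the 16-bit address into a 5-bit tag, 7-bit block, and 4-bit word field.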


Associative Mapping
To improve the hit ratio of the cache, another mapping technique is often used: associative mapping.

A block of main memory may be mapped into ANY line of the cache.
– A block of memory is no longer restricted to a single line of the cache.

The higher 12 bits of the address are stored in the tag bits.


How is a block found if it is in the cache? Associative Mapping
The tag bits (higher-order 12 bits) of an address are compared with the tag bits of each block to check whether the desired block is present
Higher cost than direct mapping due to the need to search all 128 tags
Tags must be searched in parallel for performance reasons


Associative Mapping
[Figure: any main memory block 0..4095 can map into any cache block 0..127. The 16-bit address is split into a 12-bit tag and a 4-bit word field.]


Set-Associative Mapping
Cache blocks are grouped into sets
A main memory block can reside in any block of a specific set
Less contention than direct mapping
Less cost than associative mapping
Set = (block address) mod (number of sets in cache)
k-way set-associative cache: k blocks per set


How is a block found if it is in the cache? Set-Associative Mapping
Example: the cache groups two blocks per set → 64 sets (6-bit set field)
64 main memory blocks map onto each set
The tag bits in each cache block store the upper 6 bits of the address to tell which of the 64 blocks is currently in the cache


Set-Associative Mapping
[Figure: cache blocks are grouped two per set (Set 0 ... Set 63); each main memory block 0..4095 maps into a specific set. The 16-bit address is split into a 6-bit tag, a 6-bit set field, and a 4-bit word field.]


Example: Set-Associative Mapping
Main memory: 32M words = 2^25 words
Cache: 128K words = 2^17 words
Words per block: 16 words = 2^4 words
8 blocks per set = 2^3 blocks
Number of sets in the cache:
– 2^17 / (2^3 × 2^4) = 2^10
Number of tag bits: 25 − 10 − 4 = 11

Address: Tag (11 bits) | Set (10 bits) | Word (4 bits)
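The set-associative split can be checked the same way as the direct-mapped one; this sketch (function name my own) adds the associativity as a parameter:

```python
import math

def set_assoc_fields(mem_words, cache_words, block_words, ways):
    # sizes assumed to be powers of two; `ways` = blocks per set
    word_bits = int(math.log2(block_words))
    n_sets    = cache_words // (block_words * ways)
    set_bits  = int(math.log2(n_sets))
    tag_bits  = int(math.log2(mem_words)) - set_bits - word_bits
    return tag_bits, set_bits, word_bits

# 32M-word memory, 128K-word cache, 16-word blocks, 8-way:
print(set_assoc_fields(2**25, 2**17, 2**4, 8))   # (11, 10, 4)
```

With ways = 2 and the earlier 64K/2K/16-word parameters it also reproduces the 6/6/4 split of the two-way example.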


Levels of Set Associativity
Direct mapping: 1 block per set → 128 sets
Fully associative mapping: 128 blocks per set → 1 set
Set-associative mapping is in between direct and fully associative
The different mappings are just different degrees of set associativity


Which block should be replaced on a cache miss?
Replacement algorithm
– Determines which block in the cache is to be replaced when a cache miss occurs and the cache is full
Trivial for direct-mapped caches


Which block should be replaced on a cache miss?
Replacement algorithms
– Random replacement
– First-In First-Out (FIFO)
– Optimal algorithm
– Least Recently Used (LRU)
– Least Frequently Used (LFU)
– Most Frequently Used (MFU)


What happens on a write?
Write policies
– Write-through
– Write-back



Write-Through
Cache location and main memory
location are updated simultaneously
Simpler but results in unnecessary
write operations if word is updated
many times during its cache residency
Requires only valid bit



Valid Bit
Indicate if block stored in cache is still valid
Set to 1 when block is initially loaded to cache
Transfers from disk to main memory use DMA,
bypass cache
When main memory block is updated by a
source that bypasses the cache, if block is also
in cache, its valid bit is set to 0



Write-Back
Update only the cache location and mark it
updated using dirty bit/modified bit
Main memory location is updated later when
block is replaced
Writes at the speed of the cache
Also results in unnecessary writes because
whole block is written back to memory even
if only one word is updated
Requires valid bit and dirty bit
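A minimal write-back sketch (a hypothetical single-line cache; all names are my own) showing that main memory is touched only when a dirty line is evicted:

```python
memory = {0x100: 1, 0x200: 2}          # toy main memory
line = {"valid": False, "dirty": False, "tag": None, "data": None}

def write(addr, value):
    """Write-allocate + write-back: update the cache copy, defer memory."""
    if not (line["valid"] and line["tag"] == addr):
        if line["valid"] and line["dirty"]:
            memory[line["tag"]] = line["data"]   # write the dirty victim back
        line.update(valid=True, dirty=False, tag=addr, data=memory[addr])
    line["data"] = value
    line["dirty"] = True                         # the memory copy is now stale

write(0x100, 42)
print(memory[0x100])   # still 1: only the cache was updated
write(0x200, 99)       # evicts the dirty line for 0x100 ...
print(memory[0x100])   # ... which is written back: now 42
```

Repeated writes to the same cached word cost no memory traffic at all; the single write-back happens at eviction, which is exactly the trade-off the slide describes.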



Dirty Bit
Tells whether block in cache has been
modified/has newer data than main
memory block
Problem: Transfer from main memory to
disk bypassing the cache
Solution: Flush the cache (write back all
dirty blocks) before DMA transfer begins



What happens on a write miss?
No-write-allocate: the data is written directly to main memory
Write-allocate: the block is first loaded from main memory into the cache, then the cache block is written to



Write Buffer
Used as temporary holding location for
data to be written to memory
Processor need not wait for write to
finish
Data in write buffer will be written
when memory is available for writing
Works for both write-through and
write-back caches



Example of Mapping Techniques
Consider a data cache with 8 blocks of data
Each block of data consists of only one word
These are greatly simplified parameters
Consider a 4 × 10 array of numbers, stored in column order
The 40 elements (28h) are stored from 7A00h to 7A27h


Example of Mapping Techniques

Address   Array element
7A00h     A(0,0)
7A01h     A(1,0)
7A02h     A(2,0)
7A03h     A(3,0)
7A04h     A(0,1)
7A05h     A(1,1)
7A06h     A(2,1)
7A07h     A(3,1)
...       ...
7A24h     A(0,9)
7A25h     A(1,9)
7A26h     A(2,9)
7A27h     A(3,9)

Tag-field width of the 16-bit address: direct-mapped 13 bits, set-associative 15 bits, associative 16 bits.


Example of Mapping Techniques
Consider the following algorithm:

SUM := 0
for j := 0 to 9 do
    SUM := SUM + A(0,j)
end
AVE := SUM / 10
for i := 9 downto 0 do
    A(0,i) := A(0,i) / AVE
end

This computes the average of the first row (row 0), then replaces each element of that row with its value divided by the average.
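The access pattern of the two loops can be replayed with a tiny simulator (a sketch; with one-word blocks, the direct-mapped index is simply the word address mod 8):

```python
addresses = [0x7A00 + 4 * j for j in range(10)]   # A(0,0) .. A(0,9)

cache = [None] * 8                                # 8 one-word blocks

def access(addr):
    idx = addr % 8                                # direct-mapped index
    hit = cache[idx] == addr
    cache[idx] = addr
    return hit

first_loop  = [access(a) for a in addresses]           # j = 0 .. 9
second_loop = [access(a) for a in reversed(addresses)] # i = 9 .. 0

print(sum(first_loop), sum(second_loop))   # 0 hits, then only A(0,9) and A(0,8) hit
```

Because the addresses are 4 apart, only cache indices 0 and 4 are ever used, which is exactly the behaviour the following slides describe.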


Example of Mapping Techniques

Addresses of the row-0 elements accessed by the loops:

Array element   Address
A(0,0)          7A00h
A(0,1)          7A04h
A(0,2)          7A08h
A(0,3)          7A0Ch
A(0,4)          7A10h
A(0,5)          7A14h
A(0,6)          7A18h
A(0,7)          7A1Ch
A(0,8)          7A20h
A(0,9)          7A24h


Direct-Mapped Cache

Cache contents after selected iterations (block index vs. loop iteration):

Block | j=1    j=3    j=5    j=7    j=9    i=6    i=4    i=2    i=0
0     | A(0,0) A(0,2) A(0,4) A(0,6) A(0,8) A(0,6) A(0,4) A(0,2) A(0,0)
1-3   | (unused)
4     | A(0,1) A(0,3) A(0,5) A(0,7) A(0,9) A(0,7) A(0,5) A(0,3) A(0,1)
5-7   | (unused)


Direct-Mapped Cache
Only two cache locations are used, due to the way the array is arranged
First loop (j = 0 to 9): all references result in cache misses
Second loop (i = 9 to 0): the references to A(0,9) and A(0,8) hit in the cache; the rest miss


Associative-Mapped Cache

j=7 j=8 j=9 i=1 i=0


0 A(0,0) A(0,8) A(0,8) A(0,8) A(0,0)

1 A(0,1) A(0,1) A(0,9) A(0,1) A(0,1)

2 A(0,2) A(0,2) A(0,2) A(0,2) A(0,2)

3 A(0,3) A(0,3) A(0,3) A(0,3) A(0,3)

4 A(0,4) A(0,4) A(0,4) A(0,4) A(0,4)

5 A(0,5) A(0,5) A(0,5) A(0,5) A(0,5)

6 A(0,6) A(0,6) A(0,6) A(0,6) A(0,6)

7 A(0,7) A(0,7) A(0,7) A(0,7) A(0,7)



Associative-Mapped Cache
All cache locations are used
First loop (j = 0 to 9): all references result in cache misses
Second loop (i = 9 to 0): the references from A(0,9) down to A(0,2) hit in the cache; the rest miss
Good utilization of the cache because of the order of the loop


Set-Associative-Mapped Cache

Cache contents after selected iterations (two sets of four blocks, LRU within a set):

        Block | j=3    j=7    j=9    i=4    i=2    i=0
Set 0   0     | A(0,0) A(0,4) A(0,8) A(0,4) A(0,4) A(0,0)
        1     | A(0,1) A(0,5) A(0,9) A(0,5) A(0,5) A(0,1)
        2     | A(0,2) A(0,6) A(0,6) A(0,6) A(0,2) A(0,2)
        3     | A(0,3) A(0,7) A(0,7) A(0,7) A(0,3) A(0,3)
Set 1   4-7   | (unused)


Set-Associative-Mapped Cache
Only half of the cache was used → only one set
First loop (j = 0 to 9): all references result in cache misses
Second loop (i = 9 to 0): the references from A(0,9) down to A(0,6) hit in the cache; the rest miss


Example of Mapping Techniques
In general, fully associative mapping performs best
Second is set-associative mapping
Worst performance: direct mapping
But fully associative mapping is expensive to implement
Compromise: set-associative mapping


Measuring Cache Performance
Hit rate: h
– Ratio of the number of hits to the number of all attempted accesses
Miss rate: 1 − h
– Ratio of the number of misses to the number of all attempted accesses
Hit rate + miss rate = 1
Miss penalty: M
– Extra time to bring the desired block into the cache
Hit time: C
– Time to hit in the cache; time to access data from the cache


Measuring Cache Performance
Average memory access time: t_avg
– Hit rate × hit time + miss rate × miss penalty

t_avg = hC + (1 − h)M


Measuring Cache Performance
Example
– 10 clock cycles for a memory access
– 17 clock cycles to load a block into the cache (miss penalty)
– 1 clock cycle to load a word from the cache (hit time)
– Assume 30% of instructions perform a read/write → 130 memory accesses for every 100 instructions
– Hit rates: 0.95 for instructions, 0.9 for data

Time without cache = 130 × 10 = 1300
Time with cache = 100 × (0.95 × 1 + 0.05 × 17) + 30 × (0.9 × 1 + 0.1 × 17) = 258
Ratio: 1300 / 258 ≈ 5.04
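The arithmetic in this example is easy to re-check:

```python
t_no_cache = 130 * 10                                  # 130 accesses at 10 cycles each
t_cache = 100 * (0.95 * 1 + 0.05 * 17) \
        + 30  * (0.9  * 1 + 0.1  * 17)                 # instruction + data accesses
print(t_no_cache, round(t_cache), round(t_no_cache / t_cache, 2))   # 1300 258 5.04
```

The cache brings the average cost per instruction fetch from 10 cycles down to well under 2, hence the roughly 5× speedup.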


Cache Examples
Intel 386 has no internal cache
Intel 486 – on-chip 8KB unified cache
Intel Pentium – 8KB/8KB split L1 cache
Intel Pentium II – 8KB/16KB split L1
cache / 256KB L2 Cache



Pentium III Cache
Two cache levels L1 and L2
L1
– 16-KB data cache
• Four-way set-associative / write-back or write-
through
– 16-KB instruction cache
• Two-way set-associative
L2
– Unified cache
– 256K (Coppermine), 512K (Tualatin)



Pentium III Cache
[Figure: the processing units connect to split L1 instruction and data caches; a bus interface unit connects via the cache bus to the L2 cache, and via the system bus to main memory and input/output.]


Pentium IV Cache
Up to 3 levels of caches
L1
– 8K data cache, 4-way set-associative, write-through
– 12K-µop instruction (trace) cache
L2
– Unified cache: 256 KB or 512 KB, write-back
L1 and L2 implemented on chip


Memory Interleaving
Each memory module has its own ABR
(address buffer register) and DBR
(data buffer register)
Consecutive addresses are located in
successive modules
Lower-order address bits select a
module
Higher-order address bits select a
location within the module



Memory Interleaving
Consecutive words in a module:
[Figure: the main memory address is split into a k-bit module field (high-order) and an m-bit address-in-module field; each module has its own ABR and DBR.]


Memory Interleaving
Consecutive words in consecutive modules (interleaved):
[Figure: the main memory address is split into an m-bit address-in-module field (high-order) and a k-bit module field (low-order), so consecutive addresses fall in successive modules; each module has its own ABR and DBR.]


Example
Assume a cache with 8-word blocks is used.
On a read miss, the block that contains the desired word must be copied from main memory into the cache.
The hardware takes 1 cycle to send the address.
Main memory is, say, DRAM that allows the first word to be accessed in 8 cycles, but subsequent words in 4 cycles each.


If consecutive words are in a single module, the time required to load the desired block into the cache is
1 + 8 + (7 × 4) + 1 = 38 cycles
If consecutive words are in consecutive modules, the time required to load the desired block into the cache is
1 + 8 + 4 + 4 = 17 cycles
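The two block-load times follow directly from the stated cycle counts (1 cycle to send the address, 8 cycles for the first DRAM access, 4 for each later word, plus a final transfer cycle in the single-module case):

```python
single_module = 1 + 8 + 7 * 4 + 1   # address + first word + 7 more words + transfer
interleaved   = 1 + 8 + 4 + 4       # later word accesses overlap across modules
print(single_module, interleaved)   # 38 17
```

Interleaving cuts the miss penalty by more than half because the per-word DRAM latencies of different modules proceed in parallel.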


Example Based on Cache Memory
A program consists of two nested loops. The general structure of the program is given below. The program is to be run on a computer that has an instruction cache organized in the direct-mapped manner, with the following parameters:
Main memory size: 64K words
Cache size: 1K words
Block size: 128 words


Program structure (word addresses):
START = 17
Outer loop: 23 to 1200, executed 10 times
Inner loop: 165 to 239, executed 20 times on each pass of the outer loop
END = 1500

The cycle time of the main memory is 10τ and that of the cache is 1τ.
(a) Specify the number of bits in the TAG, BLOCK, and WORD fields of the main memory address.
(b) Compute the total time needed for instruction fetching during the execution of the program.
Solution
Main memory address length is 16 bits.
WORD field: 7 bits (128 words per block).
BLOCK field: 3 bits (1K-word cache / 128-word blocks = 8 cache blocks).
TAG field: 16 − 3 − 7 = 6 bits.


Solution
The sequence of reads from main memory blocks into cache blocks: the program occupies main memory blocks 0 to 11, which map (mod 8) onto the 8 cache blocks, so MM blocks 8-11 share cache blocks 0-3 with MM blocks 0-3.

As this sequence shows, both the beginning and the end of the outer loop use blocks 0 and 1 in the cache. They overwrite each other on each pass through the loop. Blocks 2 to 7 remain resident in the cache until the outer loop is completed.


Solution
The total time for reading the blocks from the main memory into the cache is
(10 + 4 × 9 + 2) × 128 × 10τ = 61,440τ

Executing the program out of the cache:
Outer loop minus inner loop: [(1200 − 22) − (239 − 164)] × 10 × 1τ = 11,030τ
Inner loop: (239 − 164) × 200 × 1τ = 15,000τ
End section of the program: (1500 − 1200) × 1τ = 300τ

Total execution time = 61,440τ + 11,030τ + 15,000τ + 300τ = 87,770τ
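Summing the four components reproduces the stated total:

```python
TAU = 1  # express everything in units of tau

mem_fetch   = (10 + 4 * 9 + 2) * 128 * 10 * TAU            # block transfers from main memory
outer_only  = ((1200 - 22) - (239 - 164)) * 10 * 1 * TAU   # outer loop minus inner loop
inner_loop  = (239 - 164) * 200 * 1 * TAU                  # inner loop runs 20 x 10 times
end_section = (1500 - 1200) * 1 * TAU
total = mem_fetch + outer_only + inner_loop + end_section
print(mem_fetch, outer_only, inner_loop, end_section, total)
# 61440 11030 15000 300 87770
```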


Example Set-associative Mapping
Problem 5.10



Solution
TAG field: 10 bits.
SET field: 4 bits.
WORD field: 6 bits.

Words 0, 1, 2, ..., 4351 occupy blocks 0 to 67 in the main memory (MM).
After blocks 0, 1, 2, ..., 63 have been read from MM into the cache on the first pass, the cache is full. Because the replacement algorithm is LRU, the MM blocks that occupy the first four of the 16 cache sets are always overwritten before they can be used on a successive pass.


Solution
In particular, MM blocks 0, 16, 32, 48, and 64 continually displace each other in competing for the 4 block positions in cache set 0. The same thing occurs in cache set 1 (MM blocks 1, 17, 33, 49, 65), cache set 2 (MM blocks 2, 18, 34, 50, 66), and cache set 3 (MM blocks 3, 19, 35, 51, 67).
MM blocks that occupy the last 12 sets (sets 4 through 15) are fetched once on the first pass and remain in the cache for the next 9 passes.
On the first pass, all 68 blocks of the loop must be fetched from the MM. On each of the 9 successive passes, the blocks in the last 12 sets of the cache (4 × 12 = 48) are found in the cache, and the remaining 20 (68 − 48) blocks must be fetched from the MM.


Solution
Time with cache:
= 1 × 68 × 11τ + 9 × (20 × 11τ + 48 × 1τ) = 3,160τ

Time without cache:
= 10 × 68 × 10τ = 6,800τ

Improvement factor = 6,800τ / 3,160τ ≈ 2.15
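Evaluating the stated expressions confirms the improvement factor of 2.15:

```python
t_with    = 1 * 68 * 11 + 9 * (20 * 11 + 48 * 1)   # first pass + 9 further passes
t_without = 10 * 68 * 10                           # every fetch from main memory
print(t_with, t_without, round(t_without / t_with, 2))   # 3160 6800 2.15
```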
