
“Cache Memory”

in
[ Microprocessor Systems and Interfacing ]

Lecture-21

M. M. Yasin
[email protected]
“Computer Memory System”
Computer memory is organized into a hierarchy:
At the highest level (closest to the processor) are the processor Registers.
Next comes one or more levels of Cache. When multiple levels are used, they are denoted L1, L2, and so on.
Next comes Main Memory, which is usually made of dynamic random-access memory (DRAM).
All of these are considered internal to the computer system.

The hierarchy continues with external memory, with the next level typically being a fixed Hard Disk, and one or more levels below that consisting of removable media such as Optical Disks and Tape.



“Computer Memory System”
Going down the hierarchy (Register -> Cache -> Main Memory -> Hard Disk -> Optical Disk and Tape):
frequency of access by the processor decreases,
cost/bit decreases,
capacity increases,
access time increases.

Note: It would be nice to use only the fastest memory, but because
that is the most expensive memory, we trade off access time for cost by
using more of the slower memory.
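
To make this tradeoff concrete, here is a minimal sketch, in C, of the standard two-level access-time model; the latencies and hit ratios are hypothetical examples, not figures from the lecture:

    #include <stdio.h>

    /* Effective access time of a two-level hierarchy: on a hit the word
       comes from the fast level; on a miss we pay for the fast-level
       check plus the slow-level access. Latencies are hypothetical. */
    int main(void) {
        const double t_fast = 2.0;   /* ns, e.g., cache */
        const double t_slow = 100.0; /* ns, e.g., main memory */
        for (double h = 0.80; h <= 0.99; h += 0.05) {
            double t_eff = h * t_fast + (1.0 - h) * (t_fast + t_slow);
            printf("hit ratio %.2f -> %6.2f ns effective\n", h, t_eff);
        }
        return 0;
    }

With a high hit ratio, the effective access time stays close to that of the fast level, which is why the slower, cheaper levels can supply most of the capacity without dominating the average access time.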



The key to the success of this organization is:
“decreasing frequency of access
by the processor”.



“Computer Memory System”
A Few Characteristics of Memory:
1. Capacity
Internal: bytes/words
External: bytes
2. Unit of Transfer
Internal: number of data lines into/out of memory, e.g., 64 or 128
External: Blocks
3. Method of Accessing
Sequential: Tapes
Direct: Disks
Random: Main Memory and some Cache
Associative: some Cache



“Computer Memory System”
Sequential:
Memory is organized into units of data, called records.
Access must be made in a specific linear sequence.

Direct:
Individual blocks or records have a unique address based on
physical location. Access is accomplished by direct access to
reach a general vicinity plus sequential searching, counting,
or waiting to reach the final location.



“Computer Memory System”
Random Access:
Each addressable location in memory has a unique, physically wired-in addressing mechanism. Thus, any location can be selected at random and directly addressed and accessed.

Associative:
This is a random access type of memory that enables one to
make a comparison of desired bit locations within a word for
a specified match. Thus, a word is retrieved based on a
portion of its contents rather than its address. As with
ordinary random-access memory, each location has its own
addressing mechanism, and retrieval time is constant,
independent of location.
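
A minimal sketch, in C, of this masked-match retrieval (all names and values are hypothetical; real associative hardware compares every entry in parallel, which the loop below only models sequentially):

    #include <stdint.h>
    #include <stdio.h>

    #define ENTRIES 4

    /* Compare each stored word against a key on the bit positions
       selected by the mask; return the first matching word. */
    static int cam_lookup(const uint32_t mem[], uint32_t key,
                          uint32_t mask, uint32_t *out) {
        for (int i = 0; i < ENTRIES; i++) {
            if ((mem[i] & mask) == (key & mask)) {
                *out = mem[i];   /* retrieved by content, not address */
                return 1;
            }
        }
        return 0;                /* no word matched */
    }

    int main(void) {
        uint32_t mem[ENTRIES] = {0xAB12, 0xCD34, 0xAB56, 0xEF78};
        uint32_t word;
        /* Find the word whose upper byte (of the low 16 bits) is 0xCD. */
        if (cam_lookup(mem, 0xCD00, 0xFF00, &word))
            printf("matched: 0x%04X\n", (unsigned)word); /* 0xCD34 */
        return 0;
    }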
“Cache Memory Principles”
Cache memory is intended to give memory speed
approaching that of the fastest memories available, and at
the same time provide a large memory size at the price of
less expensive memories.

The cache contains a copy of portions of main memory.


When the processor attempts to read a word of memory, a
check is made to determine if the word is in the cache. If so,
the word is delivered to the processor.
If not, a block of main memory, consisting of some fixed
number of words, is read into the cache and then the word is
delivered to the processor.
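
The following is a minimal sketch, in C, of this read flow. The lecture specifies only the hit/miss behavior; the direct-mapped placement, the sizes, and all names below are illustrative assumptions:

    #include <stdint.h>
    #include <string.h>

    #define LINES          16
    #define WORDS_PER_LINE 4
    #define MEM_WORDS      1024

    /* One cache line: a valid bit, a tag, and a block of words. */
    struct line { int valid; uint32_t tag; uint32_t data[WORDS_PER_LINE]; };

    static struct line cache[LINES];          /* lines start invalid */
    static uint32_t main_memory[MEM_WORDS];   /* backing store */

    /* Read one word (addr is a word address): check the cache; on a
       miss, fetch the whole block containing the word, then deliver
       the requested word. Writes and byte addressing are omitted. */
    uint32_t read_word(uint32_t addr) {
        uint32_t offset = addr % WORDS_PER_LINE;  /* word within block */
        uint32_t block  = addr / WORDS_PER_LINE;  /* block number */
        uint32_t index  = block % LINES;          /* line it maps to */
        uint32_t tag    = block / LINES;          /* identifies block */
        struct line *l  = &cache[index];
        if (!l->valid || l->tag != tag) {         /* miss */
            memcpy(l->data, &main_memory[block * WORDS_PER_LINE],
                   sizeof l->data);
            l->tag = tag;
            l->valid = 1;
        }
        return l->data[offset];                   /* deliver the word */
    }

A subsequent read_word of any word in the same block now hits, which is where the locality argument on the next slide comes in.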



“Cache Memory Principles”
Because of the phenomenon of locality of reference,
when a block of data is fetched into the cache to satisfy a
single memory reference, it is likely that there will be future
references to that same memory location or to other words
in the block.
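
For example (an illustrative sketch, not from the lecture), a loop that scans an array in order exploits exactly this spatial locality:

    #include <stdio.h>

    /* With 4-word blocks, a[0] misses and the fetch brings in a[0..3],
       so a[1..3] then hit: roughly one miss per block for the scan. */
    static long sum(const int *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];    /* consecutive words share fetched blocks */
        return s;
    }

    int main(void) {
        int a[16];
        for (int i = 0; i < 16; i++) a[i] = i;
        printf("%ld\n", sum(a, 16));    /* prints 120 */
        return 0;
    }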

Figure 4.3b depicts the use of multiple levels of cache. The L2 cache is slower and typically larger than the L1 cache, and the L3 cache is slower and typically larger than the L2 cache.



[Figure 4.3b: multiple levels of cache]
“A Few Elements of Cache Design”
(1) Cache Size
We would like the size of the cache to be
small enough so that the overall average cost per bit is close
to that of main memory alone and
large enough so that the overall average access time is close
to that of the cache alone.
The larger the cache, the larger the number of gates
involved in addressing the cache. The result is that large
caches tend to be slightly slower than small ones.
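
The first of these goals can be checked with the usual two-level cost model (a common textbook model, not taken from the lecture; all numbers are hypothetical):

    #include <stdio.h>

    /* Average cost per bit of a two-level memory:
         c_avg = (c1*s1 + c2*s2) / (s1 + s2)
       With s2 >> s1, c_avg approaches c2 (the cheap level) even though
       the fast level costs far more per bit. */
    int main(void) {
        double c1 = 100.0, s1 = 64e3;   /* cache: cost/bit, size (bits) */
        double c2 = 1.0,   s2 = 64e6;   /* main memory */
        double c_avg = (c1 * s1 + c2 * s2) / (s1 + s2);
        printf("average cost/bit: %.3f (cache alone: %.1f)\n", c_avg, c1);
        return 0;
    }

The output is about 1.099: a modest cache barely moves the average cost per bit, while (per the earlier access-time sketch) a good hit ratio keeps the average access time near the cache's.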



“A Few Elements of Cache Design”
(2) Number of Caches
When caches were originally introduced, the typical system
had a single cache. More recently, the use of multiple caches
has become the norm.
Two aspects of this design issue concern the number of
levels of caches and the use of unified versus split caches.

[2-1] Multilevel Cache: As logic density has increased, it has become possible to have a cache on the same chip as the processor: the on-chip cache. Compared with a cache reachable via an external bus, the on-chip cache reduces the processor’s external bus activity and therefore speeds up execution times and increases overall system performance.



“A Few Elements of Cache Design”
The inclusion of an off-chip or external cache is still desirable. Nowadays, designs include both on-chip and external caches. The simplest such organization is known as a two-level cache, with the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2).
The reason for including an L2 cache is the following:
If there is no L2 cache and the processor makes an access
request for a memory location not in the L1 cache, then the
processor must access Main Memory across the bus. Due to
the typically slow bus speed and slow memory access time,
this results in poor performance. On the other hand, if an L2
cache is used, then frequently the missing information can be
quickly retrieved.
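
A minimal sketch, in C, of this comparison (all latencies and hit ratios are hypothetical illustrations):

    #include <stdio.h>

    /* Average time to service an L1 miss, without and with an L2 cache. */
    int main(void) {
        double t_l2 = 10.0, t_mem = 100.0;  /* ns */
        double h_l2 = 0.9;   /* fraction of L1 misses that L2 catches */
        double without_l2 = t_mem;
        double with_l2 = h_l2 * t_l2 + (1.0 - h_l2) * (t_l2 + t_mem);
        printf("L1 miss penalty: %.0f ns without L2, %.0f ns with L2\n",
               without_l2, with_l2);
        return 0;
    }

With these numbers the L1 miss penalty drops from 100 ns to 20 ns, which is the performance argument for the second level.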



“A Few Elements of Cache Design”
With the increasing availability of on-chip area for cache, recent microprocessors have moved the L2 cache onto the processor chip and added an L3 cache.
Originally, the L3 cache was accessible over the external bus.
More recently, most microprocessors have incorporated an
on-chip L3 cache.
In either case, there appears to be a performance advantage
to adding the third level.



“A Few Elements of Cache Design”
[2-2] Unified versus Split Cache: Initially, a single on-chip cache was used for both data and instructions.
More recently, it has become common to split the cache into
two: one dedicated to instructions and one dedicated to data.
These two caches both exist at the same level, typically as
two L1 caches.
When the processor attempts to fetch an instruction from
main memory, it first consults the instruction L1 cache, and
when the processor attempts to fetch data from main
memory, it first consults the data L1 cache.
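
A structural sketch of this dispatch (all names are hypothetical; the per-cache lookup would follow the earlier read_word sketch):

    #include <stdint.h>

    /* Two caches at the same level: one for instructions, one for data. */
    struct cache { int valid; /* tags, lines, ... omitted for brevity */ };
    static struct cache l1_icache, l1_dcache;

    static uint32_t cache_read(struct cache *c, uint32_t addr) {
        (void)c; (void)addr;
        return 0;   /* placeholder: hit/miss handling as in read_word */
    }

    /* Instruction fetches and data accesses consult separate caches. */
    uint32_t fetch_instruction(uint32_t pc) { return cache_read(&l1_icache, pc); }
    uint32_t load_data(uint32_t addr)       { return cache_read(&l1_dcache, addr); }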



“A Few Elements of Cache Design”
There are two potential advantages of a unified cache:
1. For a given cache size, a unified cache has a higher hit rate
than split caches because it balances the load between
instruction and data fetches automatically. That is, if an
execution pattern involves many more instruction fetches
than data fetches, then the cache will tend to fill up with
instructions, and if an execution pattern involves relatively
more data fetches, the opposite will occur.
2. Only one cache needs to be designed and implemented.



“A Few Elements of Cache Design”
The key advantage of a split cache is:
1. It eliminates contention for the cache between the instruction fetch/decode unit and the execution unit.
Detail: When the execution unit performs a memory access to load or store data, the request is submitted to the unified cache. If, at the same time, the instruction pre-fetcher issues a read request to the cache for an instruction, that request will be temporarily blocked so that the cache can service the execution unit first, enabling it to complete the currently executing instruction.
This cache contention can degrade performance by interfering with efficient use of the instruction pipeline. The split cache structure overcomes this difficulty.



“A Few Elements of Cache Design”
Despite the advantages of a unified cache, the trend is toward split caches, particularly for superscalar machines such as the Pentium and PowerPC, which emphasize parallel instruction execution and the pre-fetching of predicted future instructions.



“Computer Memory System”
Typical memory access times:

Registers - work at the clock speed of the CPU
L1 cache - a few nanoseconds
L2 cache - a few more nanoseconds
L3 cache - (on some processors) slower than L2
RAM - 10s to 100s of nanoseconds
Disk controller memory - midway in speed between RAM and the disk
Hard disk itself - 10s of milliseconds
Offline storage (tape) - seconds to minutes

