VIRTUAL-ADDRESS CACHES
Part 1: Problems and Solutions in Uniprocessors
Michel Cekleov
Sun Microsystems
Michel Dubois
University of Southern California
IEEE Micro
Virtual memory
In a typical virtual memory system, each
program is compiled in a virtual space,
which is dynamically mapped onto the
physical memory of the machine at runtime.
Thus all processes have separate virtual
spaces. A process context may host multiple
execution threads, which can be executed
concurrently on different processors. In this
case, the threads share the resources of the
context, including its virtual memory space.
Paging. At the lowest level, virtual memory is divided into equal chunks of consecutive memory locations called virtual pages
(or simply, pages). Pages are dynamically
mapped onto physical pages (or page
frames) in main memory through a set of
translation tables called page tables. Pages
are brought into page frames on demand
(demand-paging) as processes need them.
An access to a page not resident in memory triggers a page fault, which the processor
treats as an exception. A software page fault
handler swaps the missing page into memory
and validates the new virtual-to-physical
Glossary
Cache indexing: First phase of the cache access in which the least significant bits of the block address are used to select and fetch the cache directory entries in the accessed set
Page: Unit of memory allocation in a virtual memory system
Page table: Dictionary of virtual-to-physical address translations accessed with the virtual page number and yielding the physical page number
Virtual address: Processor-computed address used to access the page tables
Physical address: Address in physical memory obtained after translation of the virtual address in the page tables
P/P cache: Physical cache indexed and tagged with bits from the physical address
V/P cache: Cache indexed with virtual-address bits but tagged with physical bits
P/V cache: Cache indexed with physical bits but tagged with virtual bits
V/V cache: Cache indexed and tagged with virtual bits
Physical cache: P/P cache
Virtual cache: V/P, P/V, or V/V cache
Superset bits: Bits used to index the cache and which are part of the page number
Page color: Number defined by the superset bits (either physical or virtual)
Victim block: Block selected for replacement on a cache miss
Cache coherence: Property of a shared-memory system that all cached copies of the same memory block contain the latest value of the block
address translation.3

[Figure: Virtual-to-physical address translation. The virtual page number selects a page table entry (PTE), which supplies the physical page number together with S/U and RWX protection bits; the displacement passes through unchanged, and the protection bits are checked on every access.]

Besides its primary role of transparent memory management, the virtual-address translation mechanism conveniently supports protection because it is in the required path of all memory accesses. Protection enforces the access rights of processes.
September/October 1997
[Figure: Two cache-access organizations. The virtual address splits into a virtual page number (VPN) and a displacement (Disp). In one organization, the TLB entry (p_id, RWX, misc.) translates the VPN to a physical page number (PPN), which is compared with the cache's physical tags; in the other, the tags are compared directly and the TLB is consulted afterward. A tag match selects the word in the data array (DATA).]
Several hardware-based solutions relieve the software of the task of maintaining consistency in the presence of synonyms. These solutions enforce the rule that only one copy
of a block is present in a cache at any one time, even if the
block is read only. To achieve this, the hardware removes
synonyms from the cache on a miss. The basic approach
does not rely on any special-purpose hardware.
At the time of a cache miss, the cache controller must search
for a synonym in every set in the superset (including the
indexed set). The controller visits the sets one by one until it
finds a synonym or until it has visited all the sets. In a V/P
cache this search is efficient. It exploits the hardware available for normal cache accesses: all the physical tags in one
set are matched against the TLB entry in one cache cycle. In
contrast, in a V/V cache every virtual tag in each set of the
superset must be translated one by one in the TLB, and the
tag's physical address must be compared with the physical
address of the processor. If, during this search, the controller
finds the block in the cache under a synonym, we say that the
cache miss is a short miss. The controller simply retags and/or
moves the block within the cache, and aborts the memory
fetch.
The operating system kernel should strive to align as many
synonyms as possible to lessen the miss penalty. In fact, an
access to a block present in a V/P cache under a different
(but aligned) synonym always hits, and the presence of a
synonym is transparent to the hardware. In a V/V cache,
when the controller finds an aligned synonym in the indexed
set, it simply retags the block. In the ideal case, when all synonyms are constrained to be aligned, the miss penalty is further cut, as the search for synonyms is limited to the indexed
set. In particular, in a V/P cache, the hardware can simply
ignore synonyms altogether when they are all aligned.
Dynamic synonym detection with reverse maps. The
basic approach to detect synonyms may be very slow, especially when some synonyms are not aligned and the superset and the cache are large. In practical terms, the size of the
cache is still limited because the search through the superset must be completed before the memory block is retrieved.
Thus, researchers have proposed many ways to find synonyms fast, independently of the cache size. They all rely
on a reverse map indexed with physical addresses.
One solution is based on a copy of the main cache directory called the dual directory and accessed with physical
addresses. Ideally, we can imagine a fully associative dual
directory with as many entries as there are block frames in the
virtual cache. Each entry of the dual directory contains a backpointer to the block frame in the cache where the block is
stored. On a miss in the cache, the dual directory is accessed
with the physical address obtained from the TLB. A valid
backpointer to the main cache points to the location of a synonym. Overall this solution is very expensive in uniprocessor
systems, but may make sense in bus-based multiprocessors
with snooping protocols,17 where a dual directory is present
to maintain coherence. We will therefore examine this solution more closely in the context of multiprocessors.
A second organization for the reverse map is an unaligned-synonym cache. We have seen that the miss penalty is
reduced or even eliminated when all synonyms are aligned.
However, it may be neither possible nor desirable to align all synonyms.
Page-mapping changes
Since these changes create homonyms and aliases in virtual caches, we need to clarify the consistency issues and
the possible solutions.
V/P caches. In a V/P cache, homonyms created by mapping changes are not a problem since the TLB is checked on
every access. When pages are not V-P aligned, the hardware
must take care of unaligned aliases by dynamic synonym
detection. Consider the example of Figure 5b. If v_1 and v_2
index in different sets, stale copies of some blocks of p_1
may exist. These stale copies may overwrite page frame p_1
on replacement when they are written back later. Moreover,
if p_1 is eventually remapped to a page indexing in the same
set as v_1, accesses to p_1 could hit on a stale copy.
Say that no hardware support exists for unaligned synonyms and pages are not V-P aligned. Then the software
must flush (write-back cache) or purge (write-through cache) the remapped page from the cache.
Software restrictions and the action each cache type requires, for synonyms and for page-mapping changes.

Synonyms:

Software restriction          P/P     V/P              P/V              V/V
No synonym + V-P alignment    Ignore  Ignore           Ignore           Ignore
No synonym                    Ignore  Ignore           Ignore           Ignore
V-P alignment                 Ignore  Ignore           Flush or detect  Flush or detect
Unaligned synonyms            Ignore  Flush or detect  Flush or detect  Flush or detect
Aligned synonyms              Ignore  Ignore           Flush or detect  Flush or detect

Mapping change:

Software restriction          P/P     V/P              P/V     V/V
No synonym + V-P alignment    Ignore  Ignore           Flush   Flush
No synonym                    Ignore  Flush or detect  Flush   Flush
V-P alignment                 Ignore  Ignore           Flush   Flush
Unaligned synonyms            Ignore  Flush or detect  Flush   Flush
Aligned synonyms              Ignore  Flush or detect  Flush   Flush
sor servers.
Cekleov received his engineering degree from École Supérieure d'Ingénieurs de Marseille and his doctorate from École Nationale Supérieure des Télécommunications.