0% found this document useful (0 votes)
210 views62 pages

Intel Segmentation and Paging PDF

Uploaded by

Soumitra Das
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
210 views62 pages

Intel Segmentation and Paging PDF

Uploaded by

Soumitra Das
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 62

CHAPTER 3

PROTECTED-MODE MEMORY MANAGEMENT

This chapter describes the I ntel 64 and I A-32 architecture’s protected-mode memory management facilities,
including the physical memory requirements, segmentation mechanism, and paging mechanism.
See also: Chapter 5, “Protection” (for a description of the processor’s protection mechanism) and Chapter 20,
“8086 Emulation” (for a description of memory addressing protection in real-address and virtual-8086 modes).

3.1 MEMORY MANAGEMENT OVERVIEW


The memory management facilities of the I A-32 architecture are divided into two parts: segmentation and paging.
Segmentation provides a mechanism of isolating individual code, data, and stack modules so that multiple
programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mech-
anism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s
execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation
between multiple tasks. When operating in protected mode, some form of segmentation must be used. There is
no mode bit to disable segmentation. The use of paging, however, is optional.
These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single-
task) systems, multitasking systems, or multiple-processor systems that used shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor’s addressable memory
space (called the linear address space) into smaller protected address spaces called segments. Segments can
be used to hold the code, data, and stack for a program or to hold system data structures (such as a TSS or LDT).
I f more than one program (or task) is running on a processor, each program can be assigned its own set of
segments. The processor then enforces the boundaries between these segments and insures that one program
does not interfere with the execution of another program by writing into the other program’s segments. The
segmentation mechanism also allows typing of segments so that the operations that may be performed on a partic-
ular type of segment can be restricted.
All the segments in a system are contained in the processor’s linear address space. To locate a byte in a particular
segment, a logical address (also called a far pointer) must be provided. A logical address consists of a segment
selector and an offset. The segment selector is a unique identifier for a segment. Among other things it provides an
offset into a descriptor table (such as the global descriptor table, GDT) to a data structure called a segment
descriptor. Each segment has a segment descriptor, which specifies the size of the segment, the access rights and
privilege level for the segment, the segment type, and the location of the first byte of the segment in the linear
address space (called the base address of the segment). The offset part of the logical address is added to the base
address for the segment to locate a byte within the segment. The base address plus the offset thus forms a linear
address in the processor’s linear address space.

Vol. 3A 3-1
PROTECTED-MODE MEMORY MANAGEMENT

Logical Address
(or Far Pointer)

Segment
Selector Offset Linear Address
Space

Linear Address
Global Descriptor
Dir Table Offset Physical
Table (GDT)
Address
Space
Segment
Page Table Page
Segment
Descriptor
Page Directory Phy. Addr.
Lin. Addr.
Entry
Entry
Segment
Base Address

Page

Segmentation Paging

Figure 3-1. Segmentation and Paging

I f paging is not used, the linear address space of the processor is mapped directly into the physical address space
of processor. The physical address space is defined as the range of addresses that the processor can generate on
its address bus.
Because multitasking computing systems commonly define a linear address space much larger than it is economi-
cally feasible to contain all at once in physical memory, some method of “virtualizing” the linear address space is
needed. This virtualization of the linear address space is handled through the processor’s paging mechanism.
Paging supports a “virtual memory” environment where a large linear address space is simulated with a small
amount of physical memory (RAM and ROM) and some disk storage. When using paging, each segment is divided
into pages (typically 4 KBytes each in size), which are stored either in physical memory or on the disk. The oper-
ating system or executive maintains a page directory and a set of page tables to keep track of the pages. When a
program (or task) attempts to access an address location in the linear address space, the processor uses the page
directory and page tables to translate the linear address into a physical address and then performs the requested
operation (read or write) on the memory location.
I f the page being accessed is not currently in physical memory, the processor interrupts execution of the program
(by generating a page-fault exception). The operating system or executive then reads the page into physical
memory from the disk and continues executing the program.
When paging is implemented properly in the operating-system or executive, the swapping of pages between phys-
ical memory and the disk is transparent to the correct execution of a program. Even programs written for 16-bit I A-
32 processors can be paged (transparently) when they are run in virtual-8086 mode.

3.2 USING SEGMENTS


The segmentation mechanism supported by the I A-32 architecture can be used to implement a wide variety of
system designs. These designs range from flat models that make only minimal use of segmentation to protect

3-2 Vol. 3A
PROTECTED-MODE MEMORY MANAGEMENT

programs to multi-segmented models that employ segmentation to create a robust operating environment in
which multiple programs and tasks can be executed reliably.
The following sections give several examples of how segmentation can be employed in a system to improve
memory management performance and reliability.

3.2.1 Basic Flat Model


The simplest memory model for a system is the basic “flat model,” in which the operating system and application
programs have access to a continuous, unsegmented address space. To the greatest extent possible, this basic flat
model hides the segmentation mechanism of the architecture from both the system designer and the application
programmer.
To implement a basic flat memory model with the I A-32 architecture, at least two segment descriptors must be
created, one for referencing a code segment and one for referencing a data segment (see Figure 3-2). Both of
these segments, however, are mapped to the entire linear address space: that is, both segment descriptors have
the same base address value of 0 and the same segment limit of 4 GBytes. By setting the segment limit to 4
GBytes, the segmentation mechanism is kept from generating exceptions for out of limit memory references, even
if no physical memory resides at a particular address. ROM (EPROM) is generally located at the top of the physical
address space, because the processor begins execution at FFFF_ FFF0H. RAM (DRAM) is placed at the bottom of the
address space because the initial base address for the DS data segment after reset initialization is 0.

3.2.2 Protected Flat Model


The protected flat model is similar to the basic flat model, except the segment limits are set to include only the
range of addresses for which physical memory actually exists (see Figure 3-3). A general-protection exception
(#GP) is then generated on any attempt to access nonexistent memory. This model provides a minimum level of
hardware protection against some kinds of program bugs.

Linear Address Space


(or Physical Memory)
Segment FFFFFFFFH
Registers Code
CS Code- and Data-Segment
Descriptors Not Present
SS

DS Access Limit Data and


ES Base Address Stack 0

FS

GS

Figure 3-2. Flat Model

Vol. 3A 3-3
PROTECTED-MODE MEMORY MANAGEMENT

Segment Linear Address Space


Descriptors (or Physical Memory)
Segment Access Limit
Registers Code FFFFFFFFH
Base Address
CS
Not Present
ES
Memory I/O
SS Access Limit
DS Base Address
Data and
FS Stack

GS 0

Figure 3-3. Protected Flat Model

More complexity can be added to this protected flat model to provide more protection. For example, for the paging
mechanism to provide isolation between user and supervisor code and data, four segments need to be defined:
code and data segments at privilege level 3 for the user, and code and data segments at privilege level 0 for the
supervisor. Usually these segments all overlay each other and start at address 0 in the linear address space. This
flat segmentation model along with a simple paging structure can protect the operating system from applications,
and by adding a separate paging structure for each task or process, it can also protect applications from each other.
Similar designs are used by several popular multitasking operating systems.

3.2.3 Multi-Segment Model


A multi-segment model (such as the one shown in Figure 3-4) uses the full capabilities of the segmentation mech-
anism to provided hardware enforced protection of code, data structures, and programs and tasks. Here, each
program (or task) is given its own table of segment descriptors and its own segments. The segments can be
completely private to their assigned programs or shared among programs. Access to all segments and to the
execution environments of individual programs running on the system is controlled by hardware.

3-4 Vol. 3A
PROTECTED-MODE MEMORY MANAGEMENT

Segment Segment Linear Address Space


Registers Descriptors (or Physical Memory)
Access Limit
CS
Base Address Stack
Access Limit
SS
Base Address

Access Limit
DS
Base Address Code
Access Limit
ES
Base Address
Data
Access Limit
FS
Base Address
Data
Access Limit
GS
Base Address
Data
Access Limit
Base Address

Access Limit
Base Address
Data
Access Limit
Base Address

Access Limit
Base Address

Figure 3-4. Multi-Segment Model

Access checks can be used to protect not only against referencing an address outside the limit of a segment, but
also against performing disallowed operations in certain segments. For example, since code segments are desig-
nated as read-only segments, hardware can be used to prevent writes into code segments. The access rights infor-
mation created for segments can also be used to set up protection rings or levels. Protection levels can be used to
protect operating-system procedures from unauthorized access by application programs.

3.2.4 Segmentation in IA-32e Mode


I n I A-32e mode of I ntel 64 architecture, the effects of segmentation depend on whether the processor is running
in compatibility mode or 64-bit mode. I n compatibility mode, segmentation functions just as it does using legacy
16-bit or 32-bit protected mode semantics.
I n 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address
space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to
the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment
base) can be used as an additional base registers in linear address calculations. They facilitate addressing local
data and certain operating system data structures.
Note that the processor does not perform segment limit checks at runtime in 64-bit mode.

3.2.5 Paging and Segmentation


Paging can be used with any of the segmentation models described in Figures 3-2, 3-3, and 3-4. The processor’s
paging mechanism divides the linear address space (into which segments are mapped) into pages (as shown in
Figure 3-1). These linear-address-space pages are then mapped to pages in the physical address space. The
paging mechanism offers several page-level protection facilities that can be used with or instead of the segment-

Vol. 3A 3-5
PROTECTED-MODE MEMORY MANAGEMENT

protection facilities. For example, it lets read-write protection be enforced on a page-by-page basis. The paging
mechanism also provides two-level user-supervisor protection that can also be specified on a page-by-page basis.

3.3 PHYSICAL ADDRESS SPACE


I n protected mode, the I A-32 architecture provides a normal physical address space of 4 GBytes (2 32 bytes). This
is the address space that the processor can address on its address bus. This address space is flat (unsegmented),
with addresses ranging continuously from 0 to FFFFFFFFH. This physical address space can be mapped to read-
write memory, read-only memory, and memory mapped I / O. The memory mapping facilities described in this
chapter can be used to divide this physical memory up into segments and/ or pages.
Starting with the Pentium Pro processor, the I A-32 architecture also supports an extension of the physical address
space to 236 bytes (64 GBytes); with a maximum physical address of FFFFFFFFFH. This extension is invoked in
either of two ways:
• Using the physical address extension (PAE) flag, located in bit 5 of control register CR4.
• Using the 36-bit page size extension (PSE-36) feature (introduced in the Pentium III processors).
Physical address support has since been extended beyond 36 bits. See Chapter 4, “Paging” for more information
about 36-bit physical addressing.

3.3.1 Intel® 64 Processors and Physical Address Space


On processors that support I ntel 64 architecture (CPUI D.80000001: EDX[29] = 1), the size of the physical address
range is implementation-specific and indicated by CPUI D.80000008H: EAX[bits 7-0].
For the format of information returned in EAX, see “CPUI D—CPU I dentification” in Chapter 3 of the I ntel® 64 and
I A-32 Architectures Software Developer’s Manual, Volume 2A. See also: Chapter 4, “Paging.”

3.4 LOGICAL AND LINEAR ADDRESSES


At the system-architecture level in protected mode, the processor uses two stages of address translation to arrive
at a physical address: logical-address translation and linear address space paging.
Even with the minimum use of segments, every byte in the processor’s address space is accessed with a logical
address. A logical address consists of a 16-bit segment selector and a 32-bit offset (see Figure 3-5). The segment
selector identifies the segment the byte is located in and the offset specifies the location of the byte in the segment
relative to the base address of the segment.
The processor translates every logical address into a linear address. A linear address is a 32-bit address in the
processor’s linear address space. Like the physical address space, the linear address space is a flat (unsegmented),
232-byte address space, with addresses ranging from 0 to FFFFFFFFH. The linear address space contains all the
segments and system tables defined for a system.
To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the segment in the GDT or LDT and
reads it into the processor. (This step is needed only when a new segment selector is loaded into a segment
register.)
2. Examines the segment descriptor to check the access rights and range of the segment to insure that the
segment is accessible and that the offset is within the limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset to form a linear address.

3-6 Vol. 3A
PROTECTED-MODE MEMORY MANAGEMENT

15 0 31(63) 0
Logical Seg. Selector Offset (Effective Address)
Address

Descriptor Table

Segment Base Address


Descriptor
+

31(63) 0
Linear Address

Figure 3-5. Logical Address to Linear Address Translation

I f paging is not used, the processor maps the linear address directly to a physical address (that is, the linear
address goes out on the processor’s address bus). I f the linear address space is paged, a second level of address
translation is used to translate the linear address into a physical address.
See also: Chapter 4, “Paging.”

3.4.1 Logical Address Translation in IA-32e Mode


I n I A-32e mode, an I ntel 64 processor uses the steps described above to translate a logical address to a linear
address. I n 64-bit mode, the offset and base address of the segment are 64-bits instead of 32 bits. The linear
address format is also 64 bits wide and is subject to the canonical form requirement.
Each code segment descriptor provides an L bit. This bit allows a code segment to execute 64-bit code or legacy
32-bit code by code segment.

3.4.2 Segment Selectors


A segment selector is a 16-bit identifier for a segment (see Figure 3-6). I t does not point directly to the segment,
but instead points to the segment descriptor that defines the segment. A segment selector contains the following
items:
I ndex (Bits 3 through 15) — Selects one of 8192 descriptors in the GDT or LDT. The processor multiplies
the index value by 8 (the number of bytes in a segment descriptor) and adds the result to the base
address of the GDT or LDT (from the GDTR or LDTR register, respectively).
TI (table indicator) flag
(Bit 2) — Specifies the descriptor table to use: clearing this flag selects the GDT; setting this flag
selects the current LDT.

15 3 2 1 0

Index T RPL
I

Table Indicator
0 = GDT
1 = LDT
Requested Privilege Level (RPL)

Figure 3-6. Segment Selector

Vol. 3A 3-7
PROTECTED-MODE MEMORY MANAGEMENT

Requested Privilege Level (RPL)


(Bits 0 and 1) — Specifies the privilege level of the selector. The privilege level can range from 0 to
3, with 0 being the most privileged level. See Section 5.5, “Privilege Levels”, for a description of the
relationship of the RPL to the CPL of the executing program (or task) and the descriptor privilege
level (DPL) of the descriptor the segment selector points to.
The first entry of the GDT is not used by the processor. A segment selector that points to this entry of the GDT (that
is, a segment selector with an index of 0 and the TI flag set to 0) is used as a “null segment selector.” The processor
does not generate an exception when a segment register (other than the CS or SS registers) is loaded with a null
selector. I t does, however, generate an exception when a segment register holding a null selector is used to access
memory. A null selector can be used to initialize unused segment registers. Loading the CS or SS register with a null
segment selector causes a general-protection exception (#GP) to be generated.
Segment selectors are visible to application programs as part of a pointer variable, but the values of selectors are
usually assigned or modified by link editors or linking loaders, not application programs.

3.4.3 Segment Registers


To reduce address translation time and coding complexity, the processor provides registers for holding up to 6
segment selectors (see Figure 3-7). Each of these segment registers support a specific kind of memory reference
(code, stack, or data). For virtually any kind of program execution to take place, at least the code-segment (CS),
data-segment (DS), and stack-segment (SS) registers must be loaded with valid segment selectors. The processor
also provides three additional data-segment registers (ES, FS, and GS), which can be used to make additional data
segments available to the currently executing program (or task).
For a program to access a segment, the segment selector for the segment must have been loaded in one of the
segment registers. So, although a system can define thousands of segments, only 6 can be available for immediate
use. Other segments can be made available by loading their segment selectors into these registers during program
execution.

Visible Part Hidden Part


Segment Selector Base Address, Limit, Access Information CS
SS
DS
ES
FS
GS

Figure 3-7. Segment Registers

Every segment register has a “visible” part and a “hidden” part. (The hidden part is sometimes referred to as a
“descriptor cache” or a “shadow register.”) When a segment selector is loaded into the visible part of a segment
register, the processor also loads the hidden part of the segment register with the base address, segment limit, and
access control information from the segment descriptor pointed to by the segment selector. The information cached
in the segment register (visible and hidden) allows the processor to translate addresses without taking extra bus
cycles to read the base address and limit from the segment descriptor. I n systems in which multiple processors
have access to the same descriptor tables, it is the responsibility of software to reload the segment registers when
the descriptor tables are modified. I f this is not done, an old segment descriptor cached in a segment register might
be used after its memory-resident version has been modified.
Two kinds of load instructions are provided for loading the segment registers:
1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS instructions. These instructions
explicitly reference the segment registers.

3-8 Vol. 3A
PROTECTED-MODE MEMORY MANAGEMENT

2. I mplied load instructions such as the far pointer versions of the CALL, J MP, and RET instructions, the SYSENTER
and SYSEXI T instructions, and the I RET, I NTn, I NTO and I NT3 instructions. These instructions change the
contents of the CS register (and sometimes other segment registers) as an incidental part of their operation.
The MOV instruction can also be used to store visible part of a segment register in a general-purpose register.

3.4.4 Segment Loading Instructions in IA-32e Mode


Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields (base, limit, and attribute) in
segment descriptor registers are ignored. Some forms of segment load instructions are also invalid (for example,
LDS, POP ES). Address calculations that reference the ES, DS, or SS segments are treated as if the segment base
is zero.
The processor checks that all linear-address references are in canonical form instead of performing limit checks.
Mode switching does not change the contents of the segment registers or the associated descriptor registers.
These registers are also not changed during 64-bit mode execution, unless explicit segment loads are performed.
I n order to set up compatibility mode for an application, segment-load instructions (MOV to Sreg, POP Sreg) work
normally in 64-bit mode. An entry is read from the system descriptor table (GDT or LDT) and is loaded in the
hidden portion of the segment descriptor register. The descriptor-register base, limit, and attribute fields are all
loaded. However, the contents of the data and stack segment selector and the descriptor registers are ignored.
When FS and GS segment overrides are used in 64-bit mode, their respective base addresses are used in the linear
address calculation: (FS or GS).base + index + displacement. FS.base and GS.base are then expanded to the full
linear-address size supported by the implementation. The resulting effective address calculation can wrap across
positive and negative addresses; the resulting linear address must be canonical.
I n 64-bit mode, memory accesses using FS-segment and GS-segment overrides are not checked for a runtime limit
nor subjected to attribute-checking. Normal segment loads (MOV to Sreg and POP Sreg) into FS and GS load a
standard 32-bit base value in the hidden portion of the segment descriptor register. The base address bits above
the standard 32 bits are cleared to 0 to allow consistency for implementations that use less than 64 bits.
The hidden descriptor register fields for FS.base and GS.base are physically mapped to MSRs in order to load all
address bits supported by a 64-bit implementation. Software with CPL = 0 (privileged software) can load all
supported linear-address bits into FS.base or GS.base using WRMSR. Addresses written into the 64-bit FS.base
and GS.base registers must be in canonical form. A WRMSR instruction that attempts to write a non-canonical
address to those registers causes a #GP fault.
When in compatibility mode, FS and GS overrides operate as defined by 32-bit mode behavior regardless of the
value loaded into the upper 32 linear-address bits of the hidden descriptor register base field. Compatibility mode
ignores the upper 32 bits when calculating an effective address.
A new 64-bit mode instruction, SWAPGS, can be used to load GS base. SWAPGS exchanges the kernel data struc-
ture pointer from the I A32_ KernelGSbase MSR with the GS base register. The kernel can then use the GS prefix on
normal memory references to access the kernel data structures. An attempt to write a non-canonical value (using
WRMSR) to the I A32_ KernelGSBase MSR causes a #GP fault.

3.4.5 Segment Descriptors


A segment descriptor is a data structure in a GDT or LDT that provides the processor with the size and location of
a segment, as well as access control and status information. Segment descriptors are typically created by
compilers, linkers, loaders, or the operating system or executive, but not application programs. Figure 3-8 illus-
trates the general descriptor format for all types of segment descriptors.

Vol. 3A 3-9
PROTECTED-MODE MEMORY MANAGEMENT

31 24 23 22 21 20 19 16 15 14 13 12 11 8 7 0

D A Seg. D
Base 31:24 G / L V Limit P P S Type Base 23:16 4
B L 19:16 L

31 16 15 0

Base Address 15:00 Segment Limit 15:00 0

L — 64-bit code segment (IA-32e mode only)


AVL — Available for use by system software
BASE — Segment base address
D/B — Default operation size (0 = 16-bit segment; 1 = 32-bit segment)
DPL — Descriptor privilege level
G — Granularity
LIMIT — Segment Limit
P — Segment present
S — Descriptor type (0 = system; 1 = code or data)
TYPE — Segment type

Figure 3-8. Segment Descriptor

The flags and fields in a segment descriptor are as follows:


Segment limit field
Specifies the size of the segment. The processor puts together the two segment limit fields to form
a 20-bit value. The processor interprets the segment limit in one of two ways, depending on the
setting of the G (granularity) flag:
• I f the granularity flag is clear, the segment size can range from 1 byte to 1 MByte, in byte incre-
ments.
• I f the granularity flag is set, the segment size can range from 4 KBytes to 4 GBytes, in 4-KByte
increments.
The processor uses the segment limit in two different ways, depending on whether the segment is
an expand-up or an expand-down segment. See Section 3.4.5.1, “Code- and Data-Segment
Descriptor Types”, for more information about segment types. For expand-up segments, the offset
in a logical address can range from 0 to the segment limit. Offsets greater than the segment limit
generate general-protection exceptions (#GP, for all segment other than SS) or stack-fault excep-
tions (#SS for the SS segment). For expand-down segments, the segment limit has the reverse
function; the offset can range from the segment limit plus 1 to FFFFFFFFH or FFFFH, depending on
the setting of the B flag. Offsets less than or equal to the segment limit generate general-protection
exceptions or stack-fault exceptions. Decreasing the value in the segment limit field for an expand-
down segment allocates new memory at the bottom of the segment's address space, rather than at
the top. I A-32 architecture stacks always grow downwards, making this mechanism convenient for
expandable stacks.
Base address fields
Defines the location of byte 0 of the segment within the 4-GByte linear address space. The
processor puts together the three base address fields to form a single 32-bit value. Segment base
addresses should be aligned to 16-byte boundaries. Although 16-byte alignment is not required,
this alignment allows programs to maximize performance by aligning code and data on 16-byte
boundaries.
Type field I ndicates the segment or gate type and specifies the kinds of access that can be made to the
segment and the direction of growth. The interpretation of this field depends on whether the
descriptor type flag specifies an application (code or data) descriptor or a system descriptor. The
encoding of the type field is different for code, data, and system descriptors (see Figure 5-1). See
Section 3.4.5.1, “Code- and Data-Segment Descriptor Types”, for a description of how this field is
used to specify code and data-segment types.

3-10 Vol. 3A
PROTECTED-MODE MEMORY MANAGEMENT

S (descriptor type) flag


Specifies whether the segment descriptor is for a system segment (S flag is clear) or a code or data
segment (S flag is set).
DPL (descriptor privilege level) field
Specifies the privilege level of the segment. The privilege level can range from 0 to 3, with 0 being
the most privileged level. The DPL is used to control access to the segment. See Section 5.5, “Priv-
ilege Levels”, for a description of the relationship of the DPL to the CPL of the executing code
segment and the RPL of a segment selector.
P (segment-present) flag
I ndicates whether the segment is present in memory (set) or not present (clear). I f this flag is clear,
the processor generates a segment-not-present exception (#NP) when a segment selector that
points to the segment descriptor is loaded into a segment register. Memory management software
can use this flag to control which segments are actually loaded into physical memory at a given
time. I t offers a control in addition to paging for managing virtual memory.
Figure 3-9 shows the format of a segment descriptor when the segment-present flag is clear. When
this flag is clear, the operating system or executive is free to use the locations marked “Available” to
store its own data, such as information regarding the whereabouts of the missing segment.
D/ B (default operation size/ default stack pointer size and/ or upper bound) flag
Performs different functions depending on whether the segment descriptor is an executable code
segment, an expand-down data segment, or a stack segment. (This flag should always be set to 1
for 32-bit code and data segments and to 0 for 16-bit code and data segments.)
• Executable code segment. The flag is called the D flag and it indicates the default length for
effective addresses and operands referenced by instructions in the segment. I f the flag is set,
32-bit addresses and 32-bit or 8-bit operands are assumed; if it is clear, 16-bit addresses and
16-bit or 8-bit operands are assumed.
The instruction prefix 66H can be used to select an operand size other than the default, and the
prefix 67H can be used select an address size other than the default.
• Stack segment (data segment pointed to by the SS register). The flag is called the B (big)
flag and it specifies the size of the stack pointer used for implicit stack operations (such as
pushes, pops, and calls). I f the flag is set, a 32-bit stack pointer is used, which is stored in the
32-bit ESP register; if the flag is clear, a 16-bit stack pointer is used, which is stored in the 16-
bit SP register. I f the stack segment is set up to be an expand-down data segment (described in
the next paragraph), the B flag also specifies the upper bound of the stack segment.
• Expand-down data segment. The flag is called the B flag and it specifies the upper bound of
the segment. I f the flag is set, the upper bound is FFFFFFFFH (4 GBytes); if the flag is clear, the
upper bound is FFFFH (64 KBytes).

31 16 15 14 13 12 11 8 7 0

D
Available 0 P S Type Available 4
L

31 0

Available 0

Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear

G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is clear, the segment
limit is interpreted in byte units; when flag is set, the segment limit is interpreted in 4-KByte units.
(This flag does not affect the granularity of the base address; it is always byte granular.) When the
granularity flag is set, the twelve least significant bits of an offset are not tested when checking the

Vol. 3A 3-11
PROTECTED-MODE MEMORY MANAGEMENT

offset against the segment limit. For example, when the granularity flag is set, a limit of 0 results in
valid offsets from 0 to 4095.
L (64-bit code segment) flag
I n I A-32e mode, bit 21 of the second doubleword of the segment descriptor indicates whether a
code segment contains native 64-bit code. A value of 1 indicates instructions in this code segment
are executed in 64-bit mode. A value of 0 indicates the instructions in this code segment are
executed in compatibility mode. I f L-bit is set, then D-bit must be cleared. When not in I A-32e mode
or for non-code segments, bit 21 is reserved and should always be set to 0.
Available and reserved bits
Bit 20 of the second doubleword of the segment descriptor is available for use by system software.

3.4.5.1 Code- and Data-Segment Descriptor Types


When the S (descriptor type) flag in a segment descriptor is set, the descriptor is for either a code or a data
segment. The highest order bit of the type field (bit 11 of the second double word of the segment descriptor) then
determines whether the descriptor is for a data segment (clear) or a code segment (set).
For data segments, the three low-order bits of the type field (bits 8, 9, and 10) are interpreted as accessed (A),
write-enable (W), and expansion-direction (E). See Table 3-1 for a description of the encoding of the bits in the
type field for code and data segments. Data segments can be read-only or read/ write segments, depending on the
setting of the write-enable bit.

Table 3-1. Code- and Data-Segment Types


Type Field Descriptor Description
Type
Decimal 11 10 9 8
E W A
0 0 0 0 0 Data Read-Only
1 0 0 0 1 Data Read-Only, accessed
2 0 0 1 0 Data Read/Write
3 0 0 1 1 Data Read/Write, accessed
4 0 1 0 0 Data Read-Only, expand-down
5 0 1 0 1 Data Read-Only, expand-down, accessed
6 0 1 1 0 Data Read/Write, expand-down
7 0 1 1 1 Data Read/Write, expand-down, accessed
C R A
8 1 0 0 0 Code Execute-Only
9 1 0 0 1 Code Execute-Only, accessed
10 1 0 1 0 Code Execute/Read
11 1 0 1 1 Code Execute/Read, accessed
12 1 1 0 0 Code Execute-Only, conforming
13 1 1 0 1 Code Execute-Only, conforming, accessed
14 1 1 1 0 Code Execute/Read, conforming
15 1 1 1 1 Code Execute/Read, conforming, accessed

Stack segments are data segments which must be read/ write segments. Loading the SS register with a segment
selector for a nonwritable data segment generates a general-protection exception (#GP). I f the size of a stack
segment needs to be changed dynamically, the stack segment can be an expand-down data segment (expansion-
direction flag set). Here, dynamically changing the segment limit causes stack space to be added to the bottom of

3-12 Vol. 3A
PROTECTED-MODE MEMORY MANAGEMENT

the stack. I f the size of a stack segment is intended to remain static, the stack segment may be either an expand-
up or expand-down type.
The accessed bit indicates whether the segment has been accessed since the last time the operating-system or
executive cleared the bit. The processor sets this bit whenever it loads a segment selector for the segment into a
segment register, assuming that the type of memory that contains the segment descriptor supports processor
writes. The bit remains set until explicitly cleared. This bit can be used both for virtual memory management and
for debugging.
For code segments, the three low-order bits of the type field are interpreted as accessed (A), read enable (R), and
conforming (C). Code segments can be execute-only or execute/ read, depending on the setting of the read-enable
bit. An execute/ read segment might be used when constants or other static data have been placed with instruction
code in a ROM. Here, data can be read from the code segment either by using an instruction with a CS override
prefix or by loading a segment selector for the code segment in a data-segment register (the DS, ES, FS, or GS
registers). I n protected mode, code segments are not writable.
Code segments can be either conforming or nonconforming. A transfer of execution into a more-privileged
conforming segment allows execution to continue at the current privilege level. A transfer into a nonconforming
segment at a different privilege level results in a general-protection exception (#GP), unless a call gate or task
gate is used (see Section 5.8.1, “Direct Calls or J umps to Code Segments”, for more information on conforming and
nonconforming code segments). System utilities that do not access protected facilities and handlers for some types
of exceptions (such as, divide error or overflow) may be loaded in conforming code segments. Utilities that need to
be protected from less privileged programs and procedures should be placed in nonconforming code segments.

NOTE
Execution cannot be transferred by a call or a jump to a less-privileged (numerically higher
privilege level) code segment, regardless of whether the target segment is a conforming or
nonconforming code segment. Attempting such an execution transfer will result in a general-
protection exception.

All data segments are nonconforming, meaning that they cannot be accessed by less privileged programs or proce-
dures (code executing at numerically high privilege levels). Unlike code segments, however, data segments can be
accessed by more privileged programs or procedures (code executing at numerically lower privilege levels) without
using a special access gate.
I f the segment descriptors in the GDT or an LDT are placed in ROM, the processor can enter an indefinite loop if
software or the processor attempts to update (write to) the ROM-based segment descriptors. To prevent this
problem, set the accessed bits for all segment descriptors placed in a ROM. Also, remove operating-system or
executive code that attempts to modify segment descriptors located in ROM.

3.5 SYSTEM DESCRIPTOR TYPES


When the S (descriptor type) flag in a segment descriptor is clear, the descriptor type is a system descriptor. The
processor recognizes the following types of system descriptors:
• Local descriptor-table (LDT) segment descriptor.
• Task-state segment (TSS) descriptor.
• Call-gate descriptor.
• I nterrupt-gate descriptor.
• Trap-gate descriptor.
• Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate descriptors. System-
segment descriptors point to system segments (LDT and TSS segments). Gate descriptors are in themselves
“gates,” which hold pointers to procedure entry points in code segments (call, interrupt, and trap gates) or which
hold segment selectors for TSS’s (task gates).

Vol. 3A 3-13
PROTECTED-MODE MEMORY MANAGEMENT

Table 3-2 shows the encoding of the type field for system-segment descriptors and gate descriptors. Note that
system descriptors in I A-32e mode are 16 bytes instead of 8 bytes.

Table 3-2. System-Segment and Gate-Descriptor Types


Type Field Description
Decimal 11 10 9 8 32-Bit Mode IA-32e Mode
0 0 0 0 0 Reserved Upper 8 byte of an 16-byte
descriptor
1 0 0 0 1 16-bit TSS (Available) Reserved
2 0 0 1 0 LDT LDT
3 0 0 1 1 16-bit TSS (Busy) Reserved
4 0 1 0 0 16-bit Call Gate Reserved
5 0 1 0 1 Task Gate Reserved
6 0 1 1 0 16-bit Interrupt Gate Reserved
7 0 1 1 1 16-bit Trap Gate Reserved
8 1 0 0 0 Reserved Reserved
9 1 0 0 1 32-bit TSS (Available) 64-bit TSS (Available)
10 1 0 1 0 Reserved Reserved
11 1 0 1 1 32-bit TSS (Busy) 64-bit TSS (Busy)
12 1 1 0 0 32-bit Call Gate 64-bit Call Gate
13 1 1 0 1 Reserved Reserved
14 1 1 1 0 32-bit Interrupt Gate 64-bit Interrupt Gate
15 1 1 1 1 32-bit Trap Gate 64-bit Trap Gate

See also: Section 3.5.1, “Segment Descriptor Tables”, and Section 7.2.2, “TSS Descriptor” (for more information
on the system-segment descriptors); see Section 5.8.3, “Call Gates”, Section 6.11, “I DT Descriptors”, and Section
7.2.5, “Task-Gate Descriptor” (for more information on the gate descriptors).

3.5.1 Segment Descriptor Tables


A segment descriptor table is an array of segment descriptors (see Figure 3-10). A descriptor table is variable in
length and can contain up to 8192 (213) 8-byte descriptors. There are two kinds of descriptor tables:
• The global descriptor table (GDT)
• The local descriptor tables (LDT)

3-14 Vol. 3A
PROTECTED-MODE MEMORY MANAGEMENT

Global Local
Descriptor Descriptor
Table (GDT) Table (LDT)

T
I TI = 0 TI = 1
Segment
Selector
56 56

48 48

40 40

32 32

24 24

16 16

8 8
First Descriptor in
GDT is Not Used 0 0

GDTR Register LDTR Register


Limit Limit
Base Address Base Address
Seg. Sel.

Figure 3-10. Global and Local Descriptor Tables

Each system must have one GDT defined, which may be used for all programs and tasks in the system. Optionally,
one or more LDTs can be defined. For example, an LDT can be defined for each separate task being run, or some or
all tasks can share the same LDT.
The GDT is not a segment itself; instead, it is a data structure in linear address space. The base linear address and
limit of the GDT must be loaded into the GDTR register (see Section 2.4, “Memory-Management Registers”). The
base addresses of the GDT should be aligned on an eight-byte boundary to yield the best processor performance.
The limit value for the GDT is expressed in bytes. As with segments, the limit value is added to the base address to
get the address of the last valid byte. A limit value of 0 results in exactly one valid byte. Because segment descrip-
tors are always 8 bytes long, the GDT limit should always be one less than an integral multiple of eight (that is, 8N
– 1).
The first descriptor in the GDT is not used by the processor. A segment selector to this “null descriptor” does not
generate an exception when loaded into a data-segment register (DS, ES, FS, or GS), but it always generates a
general-protection exception (#GP) when an attempt is made to access memory using the descriptor. By initializing
the segment registers with this segment selector, accidental reference to unused segment registers can be guar-
anteed to generate an exception.
The LDT is located in a system segment of the LDT type. The GDT must contain a segment descriptor for the LDT
segment. I f the system supports multiple LDTs, each must have a separate segment selector and segment
descriptor in the GDT. The segment descriptor for an LDT can be located anywhere in the GDT. See Section 3.5,
“System Descriptor Types”, information on the LDT segment-descriptor type.
An LDT is accessed with its segment selector. To eliminate address translations when accessing the LDT, the
segment selector, base linear address, limit, and access rights of the LDT are stored in the LDTR register (see
Section 2.4, “Memory-Management Registers”).
When the GDTR register is stored (using the SGDT instruction), a 48-bit “pseudo-descriptor” is stored in memory
(see top diagram in Figure 3-11). To avoid alignment check faults in user mode (privilege level 3), the pseudo-

Vol. 3A 3-15
PROTECTED-MODE MEMORY MANAGEMENT

descriptor should be located at an odd word address (that is, address MOD 4 is equal to 2). This causes the
processor to store an aligned word, followed by an aligned doubleword. User-mode programs normally do not store
pseudo-descriptors, but the possibility of generating an alignment check fault can be avoided by aligning pseudo-
descriptors in this way. The same alignment should be used when storing the I DTR register using the SI DT instruc-
tion. When storing the LDTR or task register (using the SLDT or STR instruction, respectively), the pseudo-
descriptor should be located at a doubleword address (that is, address MOD 4 is equal to 0).

47 16 15 0
32-bit Base Address Limit

79 16 15 0
64-bit Base Address Limit

Figure 3-11. Pseudo-Descriptor Formats

3.5.2 Segment Descriptor Tables in IA-32e Mode


I n I A-32e mode, a segment descriptor table can contain up to 8192 (2 13) 8-byte descriptors. An entry in the
segment descriptor table can be 8 bytes. System descriptors are expanded to 16 bytes (occupying the space of two
entries).
GDTR and LDTR registers are expanded to hold 64-bit base address. The corresponding pseudo-descriptor is 80
bits. (see the bottom diagram in Figure 3-11).
The following system descriptors expand to 16 bytes:
— Call gate descriptors (see Section 5.8.3.1, “I A-32e Mode Call Gates”)
— I DT gate descriptors (see Section 6.14.1, “64-Bit Mode I DT”)
— LDT and TSS descriptors (see Section 7.2.3, “TSS Descriptor in 64-bit mode”).

3-16 Vol. 3A
CHAPTER 4
PAGING

Chapter 3 explains how segmentation converts logical addresses to linear addresses. Paging (or linear-address
translation) is the process of translating linear addresses so that they can be used to access memory or I / O
devices. Paging translates each linear address to a physical address and determines, for each translation, what
accesses to the linear address are allowed (the address’s access rights) and the type of caching used for such
accesses (the address’s memory type).
I ntel-64 processors support three different paging modes. These modes are identified and defined in Section 4.1.
Section 4.2 gives an overview of the translation mechanism that is used in all modes. Section 4.3, Section 4.4, and
Section 4.5 discuss the three paging modes in detail.
Section 4.6 details how paging determines and uses access rights. Section 4.7 discusses exceptions that may be
generated by paging (page-fault exceptions). Section 4.8 considers data which the processor writes in response to
linear-address accesses (accessed and dirty flags).
Section 4.9 describes how paging determines the memory types used for accesses to linear addresses. Section 4.10
provides details of how a processor may cache information about linear-address translation. Section 4.11 outlines
interactions between paging and certain VMX features. Section 4.12 gives an overview of how paging can be used
to implement virtual memory.

4.1 PAGING MODES AND CONTROL BITS


Paging behavior is controlled by the following control bits:
• The WP and PG flags in control register CR0 (bit 16 and bit 31, respectively).
• The PSE, PAE, PGE, PCI DE, and SMEP flags in control register CR4 (bit 4, bit 5, bit 7, bit 17, and bit 20 respec-
tively).
• The LME and NXE flags in the I A32_ EFER MSR (bit 8 and bit 11, respectively).
Software enables paging by using the MOV to CR0 instruction to set CR0.PG. Before doing so, software should
ensure that control register CR3 contains the physical address of the first paging structure that the processor will
use for linear-address translation (see Section 4.2) and that structure is initialized as desired. See Table 4-3,
Table 4-7, and Table 4-12 for the use of CR3 in the different paging modes.
Section 4.1.1 describes how the values of CR0.PG, CR4.PAE, and I A32_ EFER.LME determine whether paging is in
use and, if so, which of three paging modes is in use. Section 4.1.2 explains how to manage these bits to establish
or make changes in paging modes. Section 4.1.3 discusses how CR0.WP, CR4.PSE, CR4.PGE, CR4.PCI DE,
CR4.SMEP, and I A32_ EFER.NXE modify the operation of the different paging modes.

4.1.1 Three Paging Modes


I f CR0.PG = 0, paging is not used. The logical processor treats all linear addresses as if they were physical
addresses. CR4.PAE and I A32_ EFER.LME are ignored by the processor, as are CR0.WP, CR4.PSE, CR4.PGE,
CR4.SMEP, and I A32_ EFER.NXE.
Paging is enabled if CR0.PG = 1. Paging can be enabled only if protection is enabled (CR0.PE = 1). I f paging is
enabled, one of three paging modes is used. The values of CR4.PAE and I A32_ EFER.LME determine which paging
mode is used:
• I f CR0.PG = 1 and CR4.PAE = 0, 32-bit paging is used. 32-bit paging is detailed in Section 4.3. 32-bit paging
uses CR0.WP, CR4.PSE, CR4.PGE, and CR4.SMEP as described in Section 4.1.3.
• I f CR0.PG = 1, CR4.PAE = 1, and I A32_ EFER.LME = 0, PAE paging is used. PAE paging is detailed in Section
4.4. PAE paging uses CR0.WP, CR4.PGE, CR4.SMEP, and I A32_ EFER.NXE as described in Section 4.1.3.

Vol. 3A 4-1
PAGING

• I f CR0.PG = 1, CR4.PAE = 1, and I A32_ EFER.LME = 1, I A-32e paging is used. 1 I A-32e paging is detailed in
Section 4.5. I A-32e paging uses CR0.WP, CR4.PGE, CR4.PCI DE, CR4.SMEP, and I A32_ EFER.NXE as described
in Section 4.1.3. I A-32e paging is available only on processors that support the I ntel 64 architecture.
The three paging modes differ with regard to the following details:
• Linear-address width. The size of the linear addresses that can be translated.
• Physical-address width. The size of the physical addresses produced by paging.
• Page size. The granularity at which linear addresses are translated. Linear addresses on the same page are
translated to corresponding physical addresses on the same page.
• Support for execute-disable access rights. I n some paging modes, software can be prevented from fetching
instructions from pages that are otherwise readable.
• Support for PCI Ds. I n some paging modes, software can enable a facility by which a logical processor caches
information for multiple linear-address spaces. The processor may retain cached information when software
switches between different linear-address spaces.
Table 4-1 illustrates the key differences between the three paging modes.

Table 4-1. Properties of Different Paging Modes

Lin.- Phys.- Supports


Paging PG in PAE in LME in Page Supports
Addr. Addr. Execute-
Mode CR0 CR4 IA32_EFER Sizes PCIDs?
Width Width1 Disable?

None 0 N/A N/A 32 32 N/A No No

Up to 4 KB
32-bit 1 0 02 32 No No
403 4 MB4

Up to 4 KB
PAE 1 1 0 32 Yes5 No
52 2 MB

4 KB
Up to
IA-32e 1 1 1 48 2 MB Yes5 Yes7
52
1 GB6

NOTES:
1. The physical-address width is always bounded by MAXPHYADDR; see Section 4.1.4.
2. The processor ensures that IA32_EFER.LME must be 0 if CR0.PG = 1 and CR4.PAE = 0.
3. 32-bit paging supports physical-address widths of more than 32 bits only for 4-MByte pages and only if the PSE-36 mechanism is
supported; see Section 4.1.4 and Section 4.3.
4. 4-MByte pages are used with 32-bit paging only if CR4.PSE = 1; see Section 4.3.
5. Execute-disable access rights are applied only if IA32_EFER.NXE = 1; see Section 4.6.
6. Not all processors that support IA-32e paging support 1-GByte pages; see Section 4.1.4.
7. PCIDs are used only if CR4.PCIDE = 1; see Section 4.10.1.

Because they are used only if I A32_ EFER.LME = 0, 32-bit paging and PAE paging is used only in legacy protected
mode. Because legacy protected mode cannot produce linear addresses larger than 32 bits, 32-bit paging and PAE
paging translate 32-bit linear addresses.
Because it is used only if I A32_ EFER.LME = 1, I A-32e paging is used only in I A-32e mode. (I n fact, it is the use of
I A-32e paging that defines I A-32e mode.) I A-32e mode has two sub-modes:
• Compatibility mode. This mode uses only 32-bit linear addresses. I A-32e paging treats bits 47: 32 of such an
address as all 0.

1. The LMA flag in the IA32_EFER MSR (bit 10) is a status bit that indicates whether the logical processor is in IA-32e mode (and thus
using IA-32e paging). The processor always sets IA32_EFER.LMA to CR0.PG & IA32_EFER.LME. Software cannot directly modify
IA32_EFER.LMA; an execution of WRMSR to the IA32_EFER MSR ignores bit 10 of its source operand.

4-2 Vol. 3A
PAGING

• 64-bit mode. While this mode produces 64-bit linear addresses, the processor ensures that bits 63: 47 of such
an address are identical. 1 I A-32e paging does not use bits 63: 48 of such addresses.

4.1.2 Paging-Mode Enabling


I f CR0.PG = 1, a logical processor is in one of three paging modes, depending on the values of CR4.PAE and
I A32_ EFER.LME. Figure 4-1 illustrates how software can enable these modes and make transitions between them.
The following items identify certain limitations and other details:

#GP #GP

Set LME
Set LME

No Paging Set PG 32-bit Paging Set PAE


PAE Paging
PG = 0 PG = 1
PAE = 0 PG = 1
PAE = 0
LME = 0 PAE = 1
Clear PG LME = 0 Clear PAE LME = 0
Setr LME

#GP
Clear LME

Clear PAE Set PG


Set PAE Clear PAE
Clear PG

No Paging No Paging IA-32e Paging


PG = 0 PG = 0 PG = 1
PAE = 0 PAE = 1 PAE = 1
LME = 1 LME = 0 LME = 1
Set PG
Setr LME

Clear LME
Clear LME

Set PG

Clear PAE Clear PG


Set PAE
#GP
#GP

No Paging
PG = 0
PAE = 1
LME = 1

Figure 4-1. Enabling and Changing Paging Modes

• I A32_ EFER.LME cannot be modified while paging is enabled (CR0.PG = 1). Attempts to do so using WRMSR
cause a general-protection exception (#GP(0)).
• Paging cannot be enabled (by setting CR0.PG to 1) while CR4.PAE = 0 and I A32_ EFER.LME = 1. Attempts to do
so using MOV to CR0 cause a general-protection exception (#GP(0)).
• CR4.PAE cannot be cleared while I A-32e paging is active (CR0.PG = 1 and I A32_ EFER.LME = 1). Attempts to
do so using MOV to CR4 cause a general-protection exception (#GP(0)).
• Regardless of the current paging mode, software can disable paging by clearing CR0.PG with MOV to CR0. 2

1. Such an address is called canonical. Use of a non-canonical linear address in 64-bit mode produces a general-protection exception
(#GP(0)); the processor does not attempt to translate non-canonical linear addresses using IA-32e paging.
2. If CR4.PCIDE = 1, an attempt to clear CR0.PG causes a general-protection exception (#GP); software should clear CR4.PCIDE before
attempting to disable paging.

Vol. 3A 4-3
PAGING

• Software can make transitions between 32-bit paging and PAE paging by changing the value of CR4.PAE with
MOV to CR4.
• Software cannot make transitions directly between I A-32e paging and either of the other two paging modes. I t
must first disable paging (by clearing CR0.PG with MOV to CR0), then set CR4.PAE and I A32_ EFER.LME to the
desired values (with MOV to CR4 and WRMSR), and then re-enable paging (by setting CR0.PG with MOV to
CR0). As noted earlier, an attempt to clear either CR4.PAE or I A32_ EFER.LME cause a general-protection
exception (#GP(0)).
• VMX transitions allow transitions between paging modes that are not possible using MOV to CR or WRMSR. This
is because VMX transitions can load CR0, CR4, and I A32_ EFER in one operation. See Section 4.11.1.

4.1.3 Paging-Mode Modifiers


Details of how each paging mode operates are determined by the following control bits:
• The WP flag in CR0 (bit 16).
• The PSE, PGE, PCI DE, and SMEP flags in CR4 (bit 4, bit 7, bit 17, and bit 20, respectively).
• The NXE flag in the I A32_ EFER MSR (bit 11).
CR0.WP allows pages to be protected from supervisor-mode writes. I f CR0.WP = 0, supervisor-mode write
accesses are allowed to linear addresses with read-only access rights; if CR0.WP = 1, they are not. (User-mode
write accesses are never allowed to linear addresses with read-only access rights, regardless of the value of
CR0.WP.) Section 4.6 explains how access rights are determined, including the definition of supervisor-mode and
user-mode accesses.
CR4.PSE enables 4-MByte pages for 32-bit paging. I f CR4.PSE = 0, 32-bit paging can use only 4-KByte pages; if
CR4.PSE = 1, 32-bit paging can use both 4-KByte pages and 4-MByte pages. See Section 4.3 for more information.
(PAE paging and I A-32e paging can use multiple page sizes regardless of the value of CR4.PSE.)
CR4.PGE enables global pages. I f CR4.PGE = 0, no translations are shared across address spaces; if CR4.PGE = 1,
specified translations may be shared across address spaces. See Section 4.10.2.4 for more information.
CR4.PCI DE enables process-context identifiers (PCI Ds) for I A-32e paging (CR4.PCI DE can be 1 only when I A-32e
paging is in use). PCI Ds allow a logical processor to cache information for multiple linear-address spaces. See
Section 4.10.1 for more information.
CR4.SMEP allows pages to be protected from supervisor-mode instruction fetches. I f CR4.SMEP = 1, software
operating in supervisor mode cannot fetch instructions from linear addresses that are accessible in user mode.
Section 4.6 explains how access rights are determined, including the definition of supervisor-mode accesses and
user-mode accessibility.
I A32_ EFER.NXE enables execute-disable access rights for PAE paging and I A-32e paging. I f I A32_ EFER.NXE = 1,
instructions fetches can be prevented from specified linear addresses (even if data reads from the addresses are
allowed). Section 4.6 explains how access rights are determined. (I A32_ EFER.NXE has no effect with 32-bit
paging. Software that wants to use this feature to limit instruction fetches from readable pages must use either PAE
paging or I A-32e paging.)

4.1.4 Enumeration of Paging Features by CPUID


Software can discover support for different paging features using the CPUI D instruction:
• PSE: page-size extensions for 32-bit paging.
I f CPUI D.01H: EDX.PSE [bit 3] = 1, CR4.PSE may be set to 1, enabling support for 4-MByte pages with 32-bit
paging (see Section 4.3).
• PAE: physical-address extension.
I f CPUI D.01H: EDX.PAE [bit 6] = 1, CR4.PAE may be set to 1, enabling PAE paging (this setting is also required
for I A-32e paging).
• PGE: global-page support.
I f CPUI D.01H: EDX.PGE [bit 13] = 1, CR4.PGE may be set to 1, enabling the global-page feature (see Section
4.10.2.4).

4-4 Vol. 3A
PAGING

• PAT: page-attribute table.


I f CPUI D.01H: EDX.PAT [bit 16] = 1, the 8-entry page-attribute table (PAT) is supported. When the PAT is
supported, three bits in certain paging-structure entries select a memory type (used to determine type of
caching used) from the PAT (see Section 4.9.2).
• PSE-36: page-size extensions with 40-bit physical-address extension.
I f CPUI D.01H: EDX.PSE-36 [bit 17] = 1, the PSE-36 mechanism is supported, indicating that translations using
4-MByte pages with 32-bit paging may produce physical addresses with up to 40 bits (see Section 4.3).
• PCI D: process-context identifiers.
I f CPUI D.01H: ECX.PCI D [bit 17] = 1, CR4.PCI DE may be set to 1, enabling process-context identifiers (see
Section 4.10.1).
• SMEP: supervisor-mode execution prevention.
I f CPUI D.(EAX=07H,ECX=0H): EBX.SMEP [bit 7] = 1, CR4.SMEP may be set to 1, enabling supervisor-mode
execution prevention (see Section 4.6).
• NX: execute disable.
I f CPUI D.80000001H: EDX.NX [bit 20] = 1, I A32_ EFER.NXE may be set to 1, allowing PAE paging and I A-32e
paging to disable execute access to selected pages (see Section 4.6). (Processors that do not support CPUI D
function 80000001H do not allow I A32_ EFER.NXE to be set to 1.)
• Page1GB: 1-GByte pages.
I f CPUI D.80000001H: EDX.Page1GB [bit 26] = 1, 1-GByte pages are supported with I A-32e paging (see
Section 4.5).
• LM: I A-32e mode support.
I f CPUI D.80000001H: EDX.LM [bit 29] = 1, I A32_ EFER.LME may be set to 1, enabling I A-32e paging.
(Processors that do not support CPUI D function 80000001H do not allow I A32_ EFER.LME to be set to 1.)
• CPUI D.80000008H: EAX[7: 0] reports the physical-address width supported by the processor. (For processors
that do not support CPUI D function 80000008H, the width is generally 36 if CPUI D.01H: EDX.PAE [bit 6] = 1
and 32 otherwise.) This width is referred to as MAXPHYADDR. MAXPHYADDR is at most 52.
• CPUI D.80000008H: EAX[15: 8] reports the linear-address width supported by the processor. Generally, this
value is 48 if CPUI D.80000001H: EDX.LM [bit 29] = 1 and 32 otherwise. (Processors that do not support CPUI D
function 80000008H, support a linear-address width of 32.)

4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW


All three paging modes translate linear addresses use hierarchical paging structures. This section provides an
overview of their operation. Section 4.3, Section 4.4, and Section 4.5 provide details for the three paging modes.
Every paging structure is 4096 Bytes in size and comprises a number of individual entries. With 32-bit paging,
each entry is 32 bits (4 bytes); there are thus 1024 entries in each structure. With PAE paging and I A-32e paging,
each entry is 64 bits (8 bytes); there are thus 512 entries in each structure. (PAE paging includes one exception, a
paging structure that is 32 bytes in size, containing 4 64-bit entries.)
The processor uses the upper portion of a linear address to identify a series of paging-structure entries. The last of
these entries identifies the physical address of the region to which the linear address translates (called the page
frame). The lower portion of the linear address (called the page offset) identifies the specific address within that
region to which the linear address translates.
Each paging-structure entry contains a physical address, which is either the address of another paging structure or
the address of a page frame. I n the first case, the entry is said to reference the other paging structure; in the
latter, the entry is said to map a page.
The first paging structure used for any translation is located at the physical address in CR3. A linear address is
translated using the following iterative procedure. A portion of the linear address (initially the uppermost bits)
select an entry in a paging structure (initially the one located using CR3). I f that entry references another paging
structure, the process continues with that paging structure and with the portion of the linear address immediately
below that just used. I f instead the entry maps a page, the process completes: the physical address in the entry is
that of the page frame and the remaining lower portion of the linear address is the page offset.

Vol. 3A 4-5
PAGING

The following items give an example for each of the three paging modes (each example locates a 4-KByte page
frame):
• With 32-bit paging, each paging structure comprises 1024 = 2 10 entries. For this reason, the translation
process uses 10 bits at a time from a 32-bit linear address. Bits 31: 22 identify the first paging-structure entry
and bits 21: 12 identify a second. The latter identifies the page frame. Bits 11: 0 of the linear address are the
page offset within the 4-KByte page frame. (See Figure 4-2 for an illustration.)
• With PAE paging, the first paging structure comprises only 4 = 2 2 entries. Translation thus begins by using
bits 31: 30 from a 32-bit linear address to identify the first paging-structure entry. Other paging structures
comprise 512 =29 entries, so the process continues by using 9 bits at a time. Bits 29: 21 identify a second
paging-structure entry and bits 20: 12 identify a third. This last identifies the page frame. (See Figure 4-5 for
an illustration.)
• With I A-32e paging, each paging structure comprises 512 = 2 9 entries and translation uses 9 bits at a time
from a 48-bit linear address. Bits 47: 39 identify the first paging-structure entry, bits 38: 30 identify a second,
bits 29: 21 a third, and bits 20: 12 identify a fourth. Again, the last identifies the page frame. (See Figure 4-8
for an illustration.)
The translation process in each of the examples above completes by identifying a page frame; the page frame is
part of the translation of the original linear address. I n some cases, however, the paging structures may be
configured so that translation process terminates before identifying a page frame. This occurs if process encoun-
ters a paging-structure entry that is marked “not present” (because its P flag — bit 0 — is clear) or in which a
reserved bit is set. I n this case, there is no translation for the linear address; an access to that address causes a
page-fault exception (see Section 4.7).
I n the examples above, a paging-structure entry maps a page with 4-KByte page frame when only 12 bits remain
in the linear address; entries identified earlier always reference other paging structures. That may not apply in
other cases. The following items identify when an entry maps a page and when it references another paging struc-
ture:
• I f more than 12 bits remain in the linear address, bit 7 (PS — page size) of the current paging-structure entry
is consulted. I f the bit is 0, the entry references another paging structure; if the bit is 1, the entry maps a page.
• I f only 12 bits remain in the linear address, the current paging-structure entry always maps a page (bit 7 is
used for other purposes).
I f a paging-structure entry maps a page when more than 12 bits remain in the linear address, the entry identifies
a page frame larger than 4 KBytes. For example, 32-bit paging uses the upper 10 bits of a linear address to locate
the first paging-structure entry; 22 bits remain. I f that entry maps a page, the page frame is 2 22 Bytes = 4 MBytes.
32-bit paging supports 4-MByte pages if CR4.PSE = 1. PAE paging and I A-32e paging support 2-MByte pages
(regardless of the value of CR4.PSE). I A-32e paging may support 1-GByte pages (see Section 4.1.4).
Paging structures are given different names based their uses in the translation process. Table 4-2 gives the names
of the different paging structures. I t also provides, for each structure, the source of the physical address used to
locate it (CR3 or a different paging-structure entry); the bits in the linear address used to select an entry from the
structure; and details of about whether and how such an entry can map a page.

4.3 32-BIT PAGING


A logical processor uses 32-bit paging if CR0.PG = 1 and CR4.PAE = 0. 32-bit paging translates 32-bit linear
addresses to 40-bit physical addresses. 1 Although 40 bits corresponds to 1 TByte, linear addresses are limited to
32 bits; at most 4 GBytes of linear-address space may be accessed at any given time.
32-bit paging uses a hierarchy of paging structures to produce a translation for a linear address. CR3 is used to
locate the first paging-structure, the page directory. Table 4-3 illustrates how CR3 is used with 32-bit paging.

1. Bits in the range 39:32 are 0 in any physical address used by 32-bit paging except those used to map 4-MByte pages. If the proces-
sor does not support the PSE-36 mechanism, this is true also for physical addresses used to map 4-MByte pages. If the processor
does support the PSE-36 mechanism and MAXPHYADDR < 40, bits in the range 39:MAXPHYADDR are 0 in any physical address used
to map a 4-MByte page. (The corresponding bits are reserved in PDEs.) See Section 4.1.4 for how to determine MAXPHYADDR and
whether the PSE-36 mechanism is supported.

4-6 Vol. 3A
PAGING

Table 4-2. Paging Structures in the Different Paging Modes

Physical
Entry Bits Selecting
Paging Structure Paging Mode Address of Page Mapping
Name Entry
Structure

32-bit, PAE N/A


PML4 table PML4E
IA-32e CR3 47:39 N/A (PS must be 0)

32-bit N/A
Page-directory-
PDPTE PAE CR3 31:30 N/A (PS must be 0)
pointer table
IA-32e PML4E 38:30 1-GByte page if PS=11

32-bit CR3 31:22 4-MByte page if PS=12


Page directory PDE
PAE, IA-32e PDPTE 29:21 2-MByte page if PS=1

32-bit 21:12 4-KByte page


Page table PTE PDE
PAE, IA-32e 20:12 4-KByte page

NOTES:
1. Not all processors allow the PS flag to be 1 in PDPTEs; see Section 4.1.4 for how to determine whether 1-GByte pages are supported.
2. 32-bit paging ignores the PS flag in a PDE (and uses the entry to reference a page table) unless CR4.PSE = 1. Not all processors allow
CR4.PSE to be 1; see Section 4.1.4 for how to determine whether 4-MByte pages are supported with 32-bit paging.

32-bit paging may map linear addresses to either 4-KByte pages or 4-MByte pages. Figure 4-2 illustrates the
translation process when it uses a 4-KByte page; Figure 4-3 covers the case of a 4-MByte page. The following
items describe the 32-bit paging process in more detail as well has how the page size is determined:
• A 4-KByte naturally aligned page directory is located at the physical address specified in bits 31: 12 of CR3 (see
Table 4-3). A page directory comprises 1024 32-bit entries (PDEs). A PDE is selected using the physical address
defined as follows:
— Bits 39: 32 are all 0.
— Bits 31: 12 are from CR3.
— Bits 11: 2 are bits 31: 22 of the linear address.
— Bits 1: 0 are 0.
Because a PDE is identified using bits 31: 22 of the linear address, it controls access to a 4-Mbyte region of the
linear-address space. Use of the PDE depends on CR.PSE and the PDE’s PS flag (bit 7):
• I f CR4.PSE = 1 and the PDE’s PS flag is 1, the PDE maps a 4-MByte page (see Table 4-4). The final physical
address is computed as follows:
— Bits 39: 32 are bits 20: 13 of the PDE.
— Bits 31: 22 are bits 31: 22 of the PDE. 1
— Bits 21: 0 are from the original linear address.
• I f CR4.PSE = 0 or the PDE’s PS flag is 0, a 4-KByte naturally aligned page table is located at the physical
address specified in bits 31: 12 of the PDE (see Table 4-5). A page table comprises 1024 32-bit entries (PTEs).
A PTE is selected using the physical address defined as follows:
— Bits 39: 32 are all 0.
— Bits 31: 12 are from the PDE.
— Bits 11: 2 are bits 21: 12 of the linear address.

1. The upper bits in the final physical address do not all come from corresponding positions in the PDE; the physical-address bits in the
PDE are not all contiguous.

Vol. 3A 4-7
PAGING

— Bits 1: 0 are 0.
• Because a PTE is identified using bits 31: 12 of the linear address, every PTE maps a 4-KByte page (see
Table 4-6). The final physical address is computed as follows:
— Bits 39: 32 are all 0.
— Bits 31: 12 are from the PTE.
— Bits 11: 0 are from the original linear address.
I f a paging-structure entry’s P flag (bit 0) is 0 or if the entry sets any reserved bit, the entry is used neither to refer-
ence another paging-structure entry nor to map a page. There is no translation for a linear address whose transla-
tion would use such a paging-structure entry; a reference to such a linear address causes a page-fault exception
(see Section 4.7).
With 32-bit paging, there are reserved bits only if CR4.PSE = 1:
• I f the P flag and the PS flag (bit 7) of a PDE are both 1, the bits reserved depend on MAXPHYADDR whether the
PSE-36 mechanism is supported: 1
— I f the PSE-36 mechanism is not supported, bits 21: 13 are reserved.
— I f the PSE-36 mechanism is supported, bits 21: (M– 19) are reserved, where M is the minimum of 40 and
MAXPHYADDR.
• I f the PAT is not supported: 2
— I f the P flag of a PTE is 1, bit 7 is reserved.
— I f the P flag and the PS flag of a PDE are both 1, bit 12 is reserved.
(I f CR4.PSE = 0, no bits are reserved with 32-bit paging.)
A reference using a linear address that is successfully translated to a physical address is performed only if allowed
by the access rights of the translation; see Section 4.6.

Linear Address
31 22 21 12 11 0
Directory Table Offset

12 4-KByte Page

10 10 Page Table Physical Address


Page Directory

PTE
20
PDE with PS=0
20
32
CR3

Figure 4-2. Linear-Address Translation to a 4-KByte Page using 32-Bit Paging

1. See Section 4.1.4 for how to determine MAXPHYADDR and whether the PSE-36 mechanism is supported.
2. See Section 4.1.4 for how to determine whether the PAT is supported.

4-8 Vol. 3A
PAGING

Linear Address
31 22 21 0
Directory Offset

22 4-MByte Page

10 Page Directory
Physical Address

PDE with PS=1


18

32
CR3

Figure 4-3. Linear-Address Translation to a 4-MByte Page using 32-Bit Paging

Figure 4-4 gives a summary of the formats of CR3 and the paging-structure entries with 32-bit paging. For the
paging structure entries, it identifies separately the format of entries that map pages, those that reference other
paging structures, and those that do neither because they are “not present”; bit 0 (P) and bit 7 (PS) are high-
lighted because they determine how such an entry is used.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
P PW
Address of page directory1 Ignored C Ignored CR3
D T

Bits 31:22 of address Reserved Bits 39:32 of P P PW U R PDE:


of 4MB page frame (must be 0) address2 A Ignored G 1 D A C / / 1 4MB
T D T S W page
I P PW U R PDE:
Address of page table Ignored 0 g A C / / 1 page
n D T S W table
PDE:
Ignored 0 not
present

P P U R PTE:
Address of 4KB page frame Ignored G A D A C PW / / 1 4KB
T
T D S W page
PTE:
Ignored 0 not
present

Figure 4-4. Formats of CR3 and Paging-Structure Entries with 32-Bit Paging
NOTES:
1. CR3 has 64 bits on processors supporting the Intel-64 architecture. These bits are ignored with 32-bit paging.
2. This example illustrates a processor in which MAXPHYADDR is 36. If this value is larger or smaller, the number of bits reserved in
positions 20:13 of a PDE mapping a 4-MByte will change.

Vol. 3A 4-9
PAGING

Table 4-3. Use of CR3 with 32-Bit Paging

Bit Contents
Position(s)

2:0 Ignored

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the page directory during linear-
address translation (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the page directory during linear-
address translation (see Section 4.9)

11:5 Ignored

31:12 Physical address of the 4-KByte aligned page directory used for linear-address translation

63:32 Ignored (these bits exist only on processors supporting the Intel-64 architecture)

4-10 Vol. 3A
PAGING

Table 4-4. Format of a 32-Bit Page-Directory Entry that Maps a 4-MByte Page

Bit Contents
Position(s)

0 (P) Present; must be 1 to map a 4-MByte page

1 (R/W) Read/write; if 0, writes may not be allowed to the 4-MByte page referenced by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 4-MByte page referenced by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the 4-MByte page referenced by
this entry (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the 4-MByte page referenced by
this entry (see Section 4.9)

5 (A) Accessed; indicates whether software has accessed the 4-MByte page referenced by this entry (see Section 4.8)

6 (D) Dirty; indicates whether software has written to the 4-MByte page referenced by this entry (see Section 4.8)

7 (PS) Page size; must be 1 (otherwise, this entry references a page table; see Table 4-5)

8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section 4.10); ignored otherwise

11:9 Ignored

12 (PAT) If the PAT is supported, indirectly determines the memory type used to access the 4-MByte page referenced by this
entry (see Section 4.9.2); otherwise, reserved (must be 0)1

(M–20):13 Bits (M–1):32 of physical address of the 4-MByte page referenced by this entry 2

21:(M–19) Reserved (must be 0)

31:22 Bits 31:22 of physical address of the 4-MByte page referenced by this entry

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.
2. If the PSE-36 mechanism is not supported, M is 32, and this row does not apply. If the PSE-36 mechanism is supported, M is the min-
imum of 40 and MAXPHYADDR (this row does not apply if MAXPHYADDR = 32). See Section 4.1.4 for how to determine MAXPHYA-
DDR and whether the PSE-36 mechanism is supported.

Vol. 3A 4-11
PAGING

Table 4-5. Format of a 32-Bit Page-Directory Entry that References a Page Table

Bit Contents
Position(s)

0 (P) Present; must be 1 to reference a page table

1 (R/W) Read/write; if 0, writes may not be allowed to the 4-MByte region controlled by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 4-MByte region controlled by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the page table referenced by this
entry (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the page table referenced by this
entry (see Section 4.9)

5 (A) Accessed; indicates whether this entry has been used for linear-address translation (see Section 4.8)

6 Ignored

7 (PS) If CR4.PSE = 1, must be 0 (otherwise, this entry maps a 4-MByte page; see Table 4-4); otherwise, ignored

11:8 Ignored

31:12 Physical address of 4-KByte aligned page table referenced by this entry

Table 4-6. Format of a 32-Bit Page-Table Entry that Maps a 4-KByte Page

Bit Contents
Position(s)

0 (P) Present; must be 1 to map a 4-KByte page

1 (R/W) Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 4-KByte page referenced by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the 4-KByte page referenced by this
entry (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the 4-KByte page referenced by this
entry (see Section 4.9)

5 (A) Accessed; indicates whether software has accessed the 4-KByte page referenced by this entry (see Section 4.8)

6 (D) Dirty; indicates whether software has written to the 4-KByte page referenced by this entry (see Section 4.8)

7 (PAT) If the PAT is supported, indirectly determines the memory type used to access the 4-KByte page referenced by this
entry (see Section 4.9.2); otherwise, reserved (must be 0) 1

8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section 4.10); ignored otherwise

11:9 Ignored

31:12 Physical address of the 4-KByte page referenced by this entry

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.

4-12 Vol. 3A
PAGING

4.4 PAE PAGING


A logical processor uses PAE paging if CR0.PG = 1, CR4.PAE = 1, and I A32_ EFER.LME = 0. PAE paging translates
32-bit linear addresses to 52-bit physical addresses. 1 Although 52 bits corresponds to 4 PBytes, linear addresses
are limited to 32 bits; at most 4 GBytes of linear-address space may be accessed at any given time.
With PAE paging, a logical processor maintains a set of four (4) PDPTE registers, which are loaded from an address
in CR3. Linear address are translated using 4 hierarchies of in-memory paging structures, each located using one
of the PDPTE registers. (This is different from the other paging modes, in which there is one hierarchy referenced
by CR3.)
Section 4.4.1 discusses the PDPTE registers. Section 4.4.2 describes linear-address translation with PAE paging.

4.4.1 PDPTE Registers


When PAE paging is used, CR3 references the base of a 32-Byte page-directory-pointer table. Table 4-7 illus-
trates how CR3 is used with PAE paging.

Table 4-7. Use of CR3 with PAE Paging

Bit Contents
Position(s)

4:0 Ignored

31:5 Physical address of the 32-Byte aligned page-directory-pointer table used for linear-address translation

63:32 Ignored (these bits exist only on processors supporting the Intel-64 architecture)

The page-directory-pointer-table comprises four (4) 64-bit entries called PDPTEs. Each PDPTE controls access to a
1-GByte region of the linear-address space. Corresponding to the PDPTEs, the logical processor maintains a set of
four (4) internal, non-architectural PDPTE registers, called PDPTE0, PDPTE1, PDPTE2, and PDPTE3. The logical
processor loads these registers from the PDPTEs in memory as part of certain operations:
• I f PAE paging would be in use following an execution of MOV to CR0 or MOV to CR4 (see Section 4.1.1) and the
instruction is modifying any of CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then the
PDPTEs are loaded from the address in CR3.
• I f MOV to CR3 is executed while the logical processor is using PAE paging, the PDPTEs are loaded from the
address being loaded into CR3.
• I f PAE paging is in use and a task switch changes the value of CR3, the PDPTEs are loaded from the address in
the new CR3 value.
• Certain VMX transitions load the PDPTE registers. See Section 4.11.1.
Table 4-8 gives the format of a PDPTE. I f any of the PDPTEs sets both the P flag (bit 0) and any reserved bit, the
MOV to CR instruction causes a general-protection exception (#GP(0)) and the PDPTEs are not loaded. 2 As shown
in Table 4-8, bits 2: 1, 8: 5, and 63: MAXPHYADDR are reserved in the PDPTEs.

1. If MAXPHYADDR < 52, bits in the range 51:MAXPHYADDR will be 0 in any physical address used by PAE paging. (The corresponding
bits are reserved in the paging-structure entries.) See Section 4.1.4 for how to determine MAXPHYADDR.
2. On some processors, reserved bits are checked even in PDPTEs in which the P flag (bit 0) is 0.

Vol. 3A 4-13
PAGING

Table 4-8. Format of a PAE Page-Directory-Pointer-Table Entry (PDPTE)

Bit Contents
Position(s)

0 (P) Present; must be 1 to reference a page directory

2:1 Reserved (must be 0)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the page directory referenced by
this entry (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the page directory referenced by
this entry (see Section 4.9)

8:5 Reserved (must be 0)

11:9 Ignored

(M–1):12 Physical address of 4-KByte aligned page directory referenced by this entry 1

63:M Reserved (must be 0)

NOTES:
1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.

4.4.2 Linear-Address Translation with PAE Paging


PAE paging may map linear addresses to either 4-KByte pages or 2-MByte pages. Figure 4-5 illustrates the trans-
lation process when it produces a 4-KByte page; Figure 4-6 covers the case of a 2-MByte page. The following items
describe the PAE paging process in more detail as well has how the page size is determined:
• Bits 31: 30 of the linear address select a PDPTE register (see Section 4.4.1); this is PDPTEi, where i is the value
of bits 31: 30. 1 Because a PDPTE register is identified using bits 31: 30 of the linear address, it controls access
to a 1-GByte region of the linear-address space. I f the P flag (bit 0) of PDPTEi is 0, the processor ignores bits
63: 1, and there is no mapping for the 1-GByte region controlled by PDPTEi. A reference using a linear address
in this region causes a page-fault exception (see Section 4.7).
• I f the P flag of PDPTEi is 1, 4-KByte naturally aligned page directory is located at the physical address specified
in bits 51: 12 of PDPTEi (see Table 4-8 in Section 4.4.1) A page directory comprises 512 64-bit entries (PDEs).
A PDE is selected using the physical address defined as follows:
— Bits 51: 12 are from PDPTEi.
— Bits 11: 3 are bits 29: 21 of the linear address.
— Bits 2: 0 are 0.
Because a PDE is identified using bits 31: 21 of the linear address, it controls access to a 2-Mbyte region of the
linear-address space. Use of the PDE depends on its PS flag (bit 7):
• I f the PDE’s PS flag is 1, the PDE maps a 2-MByte page (see Table 4-9). The final physical address is computed
as follows:
— Bits 51: 21 are from the PDE.
— Bits 20: 0 are from the original linear address.
• I f the PDE’s PS flag is 0, a 4-KByte naturally aligned page table is located at the physical address specified in
bits 51: 12 of the PDE (see Table 4-10). A page directory comprises 512 64-bit entries (PTEs). A PTE is selected
using the physical address defined as follows:
— Bits 51: 12 are from the PDE.

1. With PAE paging, the processor does not use CR3 when translating a linear address (as it does the other paging modes). It does not
access the PDPTEs in the page-directory-pointer table during linear-address translation.

4-14 Vol. 3A
PAGING

— Bits 11: 3 are bits 20: 12 of the linear address.


— Bits 2: 0 are 0.
• Because a PTE is identified using bits 31: 12 of the linear address, every PTE maps a 4-KByte page (see
Table 4-11). The final physical address is computed as follows:
— Bits 51: 12 are from the PTE.
— Bits 11: 0 are from the original linear address.
I f the P flag (bit 0) of a PDE or a PTE is 0 or if a PDE or a PTE sets any reserved bit, the entry is used neither to
reference another paging-structure entry nor to map a page. There is no translation for a linear address whose
translation would use such a paging-structure entry; a reference to such a linear address causes a page-fault
exception (see Section 4.7).
The following bits are reserved with PAE paging:
• I f the P flag (bit 0) of a PDE or a PTE is 1, bits 62: MAXPHYADDR are reserved.
• I f the P flag and the PS flag (bit 7) of a PDE are both 1, bits 20: 13 are reserved.
• I f I A32_ EFER.NXE = 0 and the P flag of a PDE or a PTE is 1, the XD flag (bit 63) is reserved.
• I f the PAT is not supported: 1
— I f the P flag of a PTE is 1, bit 7 is reserved.
— I f the P flag and the PS flag of a PDE are both 1, bit 12 is reserved.
A reference using a linear address that is successfully translated to a physical address is performed only if allowed
by the access rights of the translation; see Section 4.6.

Linear Address
31 30 29 21 20 12 11 0
Directory Pointer Directory Table Offset

12 4-KByte Page

Page Table Physical Address


Page Directory 9
9 PTE
40
PDE with PS=0
2 40

PDPTE Registers
40

PDPTE value

Figure 4-5. Linear-Address Translation to a 4-KByte Page using PAE Paging

1. See Section 4.1.4 for how to determine whether the PAT is supported.

Vol. 3A 4-15
PAGING

Linear Address
31 30 29 21 20 0
Directory Offset
Pointer Directory

21 2-MByte Page

9
Page Directory Physical Address

PDPTE Registers
2
PDE with PS=1
31
PDPTE value
40

Figure 4-6. Linear-Address Translation to a 2-MByte Page using PAE Paging

Table 4-9. Format of a PAE Page-Directory Entry that Maps a 2-MByte Page

Bit Contents
Position(s)

0 (P) Present; must be 1 to map a 2-MByte page

1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte page referenced by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 2-MByte page referenced by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the 2-MByte page referenced by
this entry (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the 2-MByte page referenced by this
entry (see Section 4.9)

5 (A) Accessed; indicates whether software has accessed the 2-MByte page referenced by this entry (see Section 4.8)

6 (D) Dirty; indicates whether software has written to the 2-MByte page referenced by this entry (see Section 4.8)

7 (PS) Page size; must be 1 (otherwise, this entry references a page table; see Table 4-10)

8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section 4.10); ignored otherwise

11:9 Ignored

12 (PAT) If the PAT is supported, indirectly determines the memory type used to access the 2-MByte page referenced by this
entry (see Section 4.9.2); otherwise, reserved (must be 0)1

20:13 Reserved (must be 0)

(M–1):21 Physical address of the 2-MByte page referenced by this entry

62:M Reserved (must be 0)

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 2-MByte page controlled by
this entry; see Section 4.6); otherwise, reserved (must be 0)

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.

4-16 Vol. 3A
PAGING

Table 4-10. Format of a PAE Page-Directory Entry that References a Page Table

Bit Contents
Position(s)

0 (P) Present; must be 1 to reference a page table

1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte region controlled by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 2-MByte region controlled by this entry (see
Section 4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the page table referenced by this
entry (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the page table referenced by this
entry (see Section 4.9)

5 (A) Accessed; indicates whether this entry has been used for linear-address translation (see Section 4.8)

6 Ignored

7 (PS) Page size; must be 0 (otherwise, this entry maps a 2-MByte page; see Table 4-9)

11:8 Ignored

(M–1):12 Physical address of 4-KByte aligned page table referenced by this entry

62:M Reserved (must be 0)

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 2-MByte region controlled
by this entry; see Section 4.6); otherwise, reserved (must be 0)

Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page

Bit Contents
Position(s)

0 (P) Present; must be 1 to map a 4-KByte page

1 (R/W) Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 4-KByte page referenced by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the 4-KByte page referenced by
this entry (see Section 4.9)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the 4-KByte page referenced by this
entry (see Section 4.9)

5 (A) Accessed; indicates whether software has accessed the 4-KByte page referenced by this entry (see Section 4.8)

6 (D) Dirty; indicates whether software has written to the 4-KByte page referenced by this entry (see Section 4.8)

7 (PAT) If the PAT is supported, indirectly determines the memory type used to access the 4-KByte page referenced by this
entry (see Section 4.9.2); otherwise, reserved (must be 0)1

8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section 4.10); ignored otherwise

Vol. 3A 4-17
PAGING

Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page (Contd.)

Bit Contents
Position(s)

11:9 Ignored

(M–1):12 Physical address of the 4-KByte page referenced by this entry

62:M Reserved (must be 0)

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 4-KByte page controlled by
this entry; see Section 4.6); otherwise, reserved (must be 0)

NOTES:
1. See Section 4.1.4 for how to determine whether the PAT is supported.

Figure 4-7 gives a summary of the formats of CR3 and the paging-structure entries with PAE paging. For the paging
structure entries, it identifies separately the format of entries that map pages, those that reference other paging
structures, and those that do neither because they are “not present”; bit 0 (P) and bit 7 (PS) are highlighted
because they determine how a paging-structure entry is used.

6666555555555 33322222222221111111111
M1 M-1
3210987654321 210987654321098765432109876543210

Ignored2 Address of page-directory-pointer table Ignored CR3

P P Rs
PDPTE:
Reserved3 Address of page directory Ign. Rsvd. CW 1
D T vd present

PDTPE:
Ignored 0 not
present
X P PP R PDE:
Address of
D Reserved Reserved A Ign. G 1 D A C W U / 1 2MB
2MB page frame
4 T D T /S W page

X I PPUR PDE:
Reserved Address of page table Ign. 0 g A C W /S / 1 page
D n DT W table
PDE:
Ignored 0 not
present

X P PPUR PTE:
Reserved Address of 4KB page frame Ign. G A D A C W /S / 1 4KB
D T DT W page
PTE:
Ignored 0 not
present

Figure 4-7. Formats of CR3 and Paging-Structure Entries with PAE Paging
NOTES:
1. M is an abbreviation for MAXPHYADDR.
2. CR3 has 64 bits only on processors supporting the Intel-64 architecture. These bits are ignored with PAE paging.
3. Reserved fields must be 0.
4. If IA32_EFER.NXE = 0 and the P flag of a PDE or a PTE is 1, the XD flag (bit 63) is reserved.

4-18 Vol. 3A
PAGING

4.5 IA-32E PAGING


A logical processor uses I A-32e paging if CR0.PG = 1, CR4.PAE = 1, and I A32_ EFER.LME = 1. With I A-32e paging,
linear address are translated using a hierarchy of in-memory paging structures located using the contents of CR3.
I A-32e paging translates 48-bit linear addresses to 52-bit physical addresses. 1 Although 52 bits corresponds to 4
PBytes, linear addresses are limited to 48 bits; at most 256 TBytes of linear-address space may be accessed at any
given time.
I A-32e paging uses a hierarchy of paging structures to produce a translation for a linear address. CR3 is used to
locate the first paging-structure, the PML4 table. Use of CR3 with I A-32e paging depends on whether process-
context identifiers (PCI Ds) have been enabled by setting CR4.PCI DE:
• Table 4-12 illustrates how CR3 is used with I A-32e paging if CR4.PCI DE = 0.

Table 4-12. Use of CR3 with IA-32e Paging and CR4.PCIDE = 0

Bit Contents
Position(s)

2:0 Ignored

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the PML4 table during linear-
address translation (see Section 4.9.2)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the PML4 table during linear-address
translation (see Section 4.9.2)

11:5 Ignored

M–1:12 Physical address of the 4-KByte aligned PML4 table used for linear-address translation 1

63:M Reserved (must be 0)

NOTES:
1. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.

• Table 4-13 illustrates how CR3 is used with I A-32e paging if CR4.PCI DE = 1.

Table 4-13. Use of CR3 with IA-32e Paging and CR4.PCIDE = 1

Bit Contents
Position(s)

11:0 PCID (see Section 4.10.1)1

M–1:12 Physical address of the 4-KByte aligned PML4 table used for linear-address translation 2

63:M Reserved (must be 0)3

NOTES:
1. Section 4.9.2 explains how the processor determines the memory type used to access the PML4 table during linear-address transla-
tion with CR4.PCIDE = 1.
2. M is an abbreviation for MAXPHYADDR, which is at most 52; see Section 4.1.4.
3. See Section 4.10.4.1 for use of bit 63 of the source operand of the MOV to CR3 instruction.

After software modifies the value of CR4.PCI DE, the logical processor immediately begins using CR3 as specified
for the new value. For example, if software changes CR4.PCI DE from 1 to 0, the current PCI D immediately changes

1. If MAXPHYADDR < 52, bits in the range 51:MAXPHYADDR will be 0 in any physical address used by IA-32e paging. (The correspond-
ing bits are reserved in the paging-structure entries.) See Section 4.1.4 for how to determine MAXPHYADDR.

Vol. 3A 4-19
PAGING

from CR3[11: 0] to 000H (see also Section 4.10.4.1). I n addition, the logical processor subsequently determines
the memory type used to access the PML4 table using CR3.PWT and CR3.PCD, which had been bits 4: 3 of the PCI D.
I A-32e paging may map linear addresses to 4-KByte pages, 2-MByte pages, or 1-GByte pages. 1 Figure 4-8 illus-
trates the translation process when it produces a 4-KByte page; Figure 4-9 covers the case of a 2-MByte page, and
Figure 4-10 the case of a 1-GByte page.

Linear Address
47 39 38 30 29 21 20 12 11 0
PML4 Directory Ptr Directory Table Offset

9 9
9 12 4-KByte Page
Physical Addr

PTE
Page-Directory- PDE with PS=0
40
Pointer Table 40 Page Table
Page-Directory
PDPTE 40

40
PML4E

40
CR3

Figure 4-8. Linear-Address Translation to a 4-KByte Page using IA-32e Paging

1. Not all processors support 1-GByte pages; see Section 4.1.4.

4-20 Vol. 3A
PAGING

Linear Address
47 39 38 30 29 21 20 0
PML4 Directory Ptr Directory Offset

9 21
9
2-MByte Page

Physical Addr
Page-Directory- PDE with PS=1
Pointer Table 31
Page-Directory
PDPTE
40
9

40
PML4E

40

CR3

Figure 4-9. Linear-Address Translation to a 2-MByte Page using IA-32e Paging

Linear Address
47 39 38 30 29 0
PML4 Directory Ptr Offset

30
9

1-GByte Page
Page-Directory-
Pointer Table
Physical Addr
PDPTE with PS=1
22
9

40
PML4E

40

CR3

Figure 4-10. Linear-Address Translation to a 1-GByte Page using IA-32e Paging

Vol. 3A 4-21
PAGING

The following items describe the I A-32e paging process in more detail as well has how the page size is determined.
• A 4-KByte naturally aligned PML4 table is located at the physical address specified in bits 51: 12 of CR3 (see
Table 4-12). A PML4 table comprises 512 64-bit entries (PML4Es). A PML4E is selected using the physical
address defined as follows:
— Bits 51: 12 are from CR3.
— Bits 11: 3 are bits 47: 39 of the linear address.
— Bits 2: 0 are all 0.
Because a PML4E is identified using bits 47: 39 of the linear address, it controls access to a 512-GByte region of
the linear-address space.
• A 4-KByte naturally aligned page-directory-pointer table is located at the physical address specified in
bits 51: 12 of the PML4E (see Table 4-14). A page-directory-pointer table comprises 512 64-bit entries
(PDPTEs). A PDPTE is selected using the physical address defined as follows:
— Bits 51: 12 are from the PML4E.
— Bits 11: 3 are bits 38: 30 of the linear address.
— Bits 2: 0 are all 0.
Because a PDPTE is identified using bits 47: 30 of the linear address, it controls access to a 1-GByte region of the
linear-address space. Use of the PDPTE depends on its PS flag (bit 7): 1
• I f the PDPTE’s PS flag is 1, the PDPTE maps a 1-GByte page (see Table 4-15). The final physical address is
computed as follows:
— Bits 51: 30 are from the PDPTE.
— Bits 29: 0 are from the original linear address.
• I f the PDE’s PS flag is 0, a 4-KByte naturally aligned page directory is located at the physical address specified
in bits 51: 12 of the PDPTE (see Table 4-16). A page directory comprises 512 64-bit entries (PDEs). A PDE is
selected using the physical address defined as follows:
— Bits 51: 12 are from the PDPTE.
— Bits 11: 3 are bits 29: 21 of the linear address.
— Bits 2: 0 are all 0.
Because a PDE is identified using bits 47: 21 of the linear address, it controls access to a 2-MByte region of the
linear-address space. Use of the PDE depends on its PS flag:
• I f the PDE’s PS flag is 1, the PDE maps a 2-MByte page. The final physical address is computed as shown in
Table 4-17.
— Bits 51: 21 are from the PDE.
— Bits 20: 0 are from the original linear address.
• I f the PDE’s PS flag is 0, a 4-KByte naturally aligned page table is located at the physical address specified in
bits 51: 12 of the PDE (see Table 4-18). A page table comprises 512 64-bit entries (PTEs). A PTE is selected
using the physical address defined as follows:
— Bits 51: 12 are from the PDE.
— Bits 11: 3 are bits 20: 12 of the linear address.
— Bits 2: 0 are all 0.
• Because a PTE is identified using bits 47: 12 of the linear address, every PTE maps a 4-KByte page (see
Table 4-19). The final physical address is computed as follows:
— Bits 51: 12 are from the PTE.
— Bits 11: 0 are from the original linear address.

1. The PS flag of a PDPTE is reserved and must be 0 (if the P flag is 1) if 1-GByte pages are not supported. See Section 4.1.4 for how
to determine whether 1-GByte pages are supported.

4-22 Vol. 3A
PAGING

I f a paging-structure entry’s P flag (bit 0) is 0 or if the entry sets any reserved bit, the entry is used neither to refer-
ence another paging-structure entry nor to map a page. There is no translation for a linear address whose transla-
tion would use such a paging-structure entry; a reference to such a linear address causes a page-fault exception
(see Section 4.7).
The following bits are reserved with I A-32e paging:
• I f the P flag of a paging-structure entry is 1, bits 51: MAXPHYADDR are reserved.
• I f the P flag of a PML4E is 1, the PS flag is reserved.
• I f 1-GByte pages are not supported and the P flag of a PDPTE is 1, the PS flag is reserved. 1
• I f the P flag and the PS flag of a PDPTE are both 1, bits 29: 13 are reserved.
• I f the P flag and the PS flag of a PDE are both 1, bits 20: 13 are reserved.
• I f I A32_ EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.
A reference using a linear address that is successfully translated to a physical address is performed only if allowed
by the access rights of the translation; see Section 4.6.
Figure 4-11 gives a summary of the formats of CR3 and the I A-32e paging-structure entries. For the paging struc-
ture entries, it identifies separately the format of entries that map pages, those that reference other paging struc-
tures, and those that do neither because they are “not present”; bit 0 (P) and bit 7 (PS) are highlighted because
they determine how a paging-structure entry is used.

Table 4-14. Format of an IA-32e PML4 Entry (PML4E) that References a Page-Directory-Pointer Table

Bit Contents
Position(s)

0 (P) Present; must be 1 to reference a page-directory-pointer table

1 (R/W) Read/write; if 0, writes may not be allowed to the 512-GByte region controlled by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 512-GByte region controlled by this entry (see
Section 4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the page-directory-pointer table
referenced by this entry (see Section 4.9.2)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the page-directory-pointer table
referenced by this entry (see Section 4.9.2)

5 (A) Accessed; indicates whether this entry has been used for linear-address translation (see Section 4.8)

6 Ignored

7 (PS) Reserved (must be 0)

11:8 Ignored

M–1:12 Physical address of 4-KByte aligned page-directory-pointer table referenced by this entry

51:M Reserved (must be 0)

62:52 Ignored

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 512-GByte region
controlled by this entry; see Section 4.6); otherwise, reserved (must be 0)

1. See Section 4.1.4 for how to determine whether 1-GByte pages are supported.

Vol. 3A 4-23
PAGING

Table 4-15. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that Maps a 1-GByte Page

Bit Contents
Position(s)

0 (P) Present; must be 1 to map a 1-GByte page

1 (R/W) Read/write; if 0, writes may not be allowed to the 1-GByte page referenced by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 1-GByte page referenced by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the 1-GByte page referenced by this
entry (see Section 4.9.2)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the 1-GByte page referenced by this
entry (see Section 4.9.2)

5 (A) Accessed; indicates whether software has accessed the 1-GByte page referenced by this entry (see Section 4.8)

6 (D) Dirty; indicates whether software has written to the 1-GByte page referenced by this entry (see Section 4.8)

7 (PS) Page size; must be 1 (otherwise, this entry references a page directory; see Table 4-16)

8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section 4.10); ignored otherwise

11:9 Ignored

12 (PAT) Indirectly determines the memory type used to access the 1-GByte page referenced by this entry (see Section
4.9.2)1

29:13 Reserved (must be 0)

(M–1):30 Physical address of the 1-GByte page referenced by this entry

51:M Reserved (must be 0)

62:52 Ignored

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 1-GByte page controlled by
this entry; see Section 4.6); otherwise, reserved (must be 0)

NOTES:
1. The PAT is supported on all processors that support IA-32e paging.

4-24 Vol. 3A
PAGING

Table 4-16. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that References a Page Directory

Bit Contents
Position(s)

0 (P) Present; must be 1 to reference a page directory

1 (R/W) Read/write; if 0, writes may not be allowed to the 1-GByte region controlled by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 1-GByte region controlled by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the page directory referenced by
this entry (see Section 4.9.2)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the page directory referenced by
this entry (see Section 4.9.2)

5 (A) Accessed; indicates whether this entry has been used for linear-address translation (see Section 4.8)

6 Ignored

7 (PS) Page size; must be 0 (otherwise, this entry maps a 1-GByte page; see Table 4-15)

11:8 Ignored

(M–1):12 Physical address of 4-KByte aligned page directory referenced by this entry

51:M Reserved (must be 0)

62:52 Ignored

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 1-GByte region controlled
by this entry; see Section 4.6); otherwise, reserved (must be 0)

Table 4-17. Format of an IA-32e Page-Directory Entry that Maps a 2-MByte Page

Bit Contents
Position(s)

0 (P) Present; must be 1 to map a 2-MByte page

1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte page referenced by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 2-MByte page referenced by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the 2-MByte page referenced by
this entry (see Section 4.9.2)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the 2-MByte page referenced by
this entry (see Section 4.9.2)

5 (A) Accessed; indicates whether software has accessed the 2-MByte page referenced by this entry (see Section 4.8)

6 (D) Dirty; indicates whether software has written to the 2-MByte page referenced by this entry (see Section 4.8)

7 (PS) Page size; must be 1 (otherwise, this entry references a page table; see Table 4-18)

8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section 4.10); ignored otherwise

Vol. 3A 4-25
PAGING

Table 4-17. Format of an IA-32e Page-Directory Entry that Maps a 2-MByte Page (Contd.)

Bit Contents
Position(s)

11:9 Ignored

12 (PAT) Indirectly determines the memory type used to access the 2-MByte page referenced by this entry (see Section
4.9.2)

20:13 Reserved (must be 0)

(M–1):21 Physical address of the 2-MByte page referenced by this entry

51:M Reserved (must be 0)

62:52 Ignored

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 2-MByte page controlled by
this entry; see Section 4.6); otherwise, reserved (must be 0)

Table 4-18. Format of an IA-32e Page-Directory Entry that References a Page Table

Bit Contents
Position(s)

0 (P) Present; must be 1 to reference a page table

1 (R/W) Read/write; if 0, writes may not be allowed to the 2-MByte region controlled by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 2-MByte region controlled by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the page table referenced by this
entry (see Section 4.9.2)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the page table referenced by this
entry (see Section 4.9.2)

5 (A) Accessed; indicates whether this entry has been used for linear-address translation (see Section 4.8)

6 Ignored

7 (PS) Page size; must be 0 (otherwise, this entry maps a 2-MByte page; see Table 4-17)

11:8 Ignored

(M–1):12 Physical address of 4-KByte aligned page table referenced by this entry

51:M Reserved (must be 0)

62:52 Ignored

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 2-MByte region controlled
by this entry; see Section 4.6); otherwise, reserved (must be 0)

4-26 Vol. 3A
PAGING

Table 4-19. Format of an IA-32e Page-Table Entry that Maps a 4-KByte Page

Bit Contents
Position(s)

0 (P) Present; must be 1 to map a 4-KByte page

1 (R/W) Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this entry (see Section 4.6)

2 (U/S) User/supervisor; if 0, user-mode accesses are not allowed to the 4-KByte page referenced by this entry (see Section
4.6)

3 (PWT) Page-level write-through; indirectly determines the memory type used to access the 4-KByte page referenced by
this entry (see Section 4.9.2)

4 (PCD) Page-level cache disable; indirectly determines the memory type used to access the 4-KByte page referenced by this
entry (see Section 4.9.2)

5 (A) Accessed; indicates whether software has accessed the 4-KByte page referenced by this entry (see Section 4.8)

6 (D) Dirty; indicates whether software has written to the 4-KByte page referenced by this entry (see Section 4.8)

7 (PAT) Indirectly determines the memory type used to access the 4-KByte page referenced by this entry (see Section 4.9.2)

8 (G) Global; if CR4.PGE = 1, determines whether the translation is global (see Section 4.10); ignored otherwise

11:9 Ignored

(M–1):12 Physical address of the 4-KByte page referenced by this entry

51:M Reserved (must be 0)

62:52 Ignored

63 (XD) If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 4-KByte page controlled by
this entry; see Section 4.6); otherwise, reserved (must be 0)

Vol. 3A 4-27
PAGING

6666555555555 33322222222221111111111
M1 M-1
3210987654321 210987654321098765432109876543210
PP
Reserved2 Address of PML4 table Ignored C W Ign. CR3
DT
X
Rs gI A P PUR
D Ignored Rsvd. Address of page-directory-pointer table Ign. C W / 1 PML4E:
3 vd n D T /S W present

PML4E:
Ignored 0 not
present

X P PPUR PDPTE:
Address of
Ignored Rsvd. Reserved A Ign. G 1 D A C W /S / 1 1GB
D 1GB page frame T DT W page

X I PPUR PDPTE:
Ignored Rsvd. Address of page directory Ign. 0 g A C W /S / 1 page
D n DT W directory
PDTPE:
Ignored 0 not
present

X P PPUR PDE:
Address of
Ignored Rsvd. Reserved A Ign. G 1 D A C W /S / 1 2MB
D 2MB page frame T DT W page

X I PPUR PDE:
Ignored Rsvd. Address of page table Ign. 0 g A C W /S / 1 page
D n DT W table
PDE:
Ignored 0 not
present

X P PPUR PTE:
Ignored Rsvd. Address of 4KB page frame Ign. G A D A C W /S / 1 4KB
D T DT W page
PTE:
Ignored 0 not
present

Figure 4-11. Formats of CR3 and Paging-Structure Entries with IA-32e Paging
NOTES:
1. M is an abbreviation for MAXPHYADDR.
2. Reserved fields must be 0.
3. If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.

4.6 ACCESS RIGHTS


There is a translation for a linear address if the processes described in Section 4.3, Section 4.4.2, and Section 4.5
(depending upon the paging mode) completes and produces a physical address. Whether an access is permitted by
a translation is determined by the access rights specified by the paging-structure entries controlling the transla-
tion; 1 paging-mode modifiers in CR0, CR4, and the I A32_ EFER MSR; and the mode of the access.

1. With PAE paging, the PDPTEs do not determine access rights.

4-28 Vol. 3A
PAGING

Every access to a linear address is either a supervisor-mode access or a user-mode access. All accesses
performed while the current privilege level (CPL) is less than 3 are supervisor-mode accesses. I f CPL = 3, accesses
are generally user-mode accesses. However, some operations implicitly access system data structures with linear
addresses; the resulting accesses to those data structures are supervisor-mode accesses regardless of CPL. Exam-
ples of such implicit supervisor accesses include the following: accesses to the global descriptor table (GDT) or
local descriptor table (LDT) to load a segment descriptor; accesses to the interrupt descriptor table (I DT) when
delivering an interrupt or exception; and accesses to the task-state segment (TSS) as part of a task switch or
change of CPL.
The following items detail how paging determines access rights:
• For supervisor-mode accesses:
— Data reads.
Data may be read from any linear address with a translation.
— Data writes.
• I f CR0.WP = 0, data may be written to any linear address with a translation.
• I f CR0.WP = 1, data may be written to any linear address with a translation for which the R/ W flag
(bit 1) is 1 in every paging-structure entry controlling the translation.
— I nstruction fetches.
• For 32-bit paging or if I A32_ EFER.NXE = 0, access rights depend on the value of CR4.SMEP:
— I f CR4.SMEP = 0, instructions may be fetched from any linear address with a translation.
— I f CR4.SMEP = 1, instructions may be fetched from any linear address with a translation for which
the U/ S flag (bit 2) is 0 in at least one of the paging-structure entries controlling the translation.
• For PAE paging or I A-32e paging with I A32_ EFER.NXE = 1, access rights depend on the value of
CR4.SMEP:
— I f CR4.SMEP = 0, instructions may be fetched from any linear address with a translation for which
the XD flag (bit 63) is 0 in every paging-structure entry controlling the translation.
— I f CR4.SMEP = 1, instructions may be fetched from any linear address with a translation for which
(1) the U/ S flag is 0 in at least one of the paging-structure entries controlling the translation; and
(2) the XD flag is 0 in every paging-structure entry controlling the translation.
• For user-mode accesses:
— Data reads.
Data may be read from any linear address with a translation for which the U/ S flag (bit 2) is 1 in every
paging-structure entry controlling the translation.
— Data writes.
Data may be written to any linear address with a translation for which both the R/ W flag and the U/ S flag
are 1 in every paging-structure entry controlling the translation.
— I nstruction fetches.
• For 32-bit paging or if I A32_ EFER.NXE = 0, instructions may be fetched from any linear address with a
translation for which the U/ S flag is 1 in every paging-structure entry controlling the translation.
• For PAE paging or I A-32e paging with I A32_ EFER.NXE = 1, instructions may be fetched from any linear
address with a translation for which the U/ S flag is 1 and the XD flag is 0 in every paging-structure entry
controlling the translation.
A processor may cache information from the paging-structure entries in TLBs and paging-structure caches (see
Section 4.10). These structures may include information about access rights. The processor may enforce access
rights based on the TLBs and paging-structure caches instead of on the paging structures in memory.
This fact implies that, if software modifies a paging-structure entry to change access rights, the processor might
not use that change for a subsequent access to an affected linear address (see Section 4.10.4.3). See Section
4.10.4.2 for how software can ensure that the processor uses the modified access rights.

Vol. 3A 4-29
PAGING

4.7 PAGE-FAULT EXCEPTIONS


Accesses using linear addresses may cause page-fault exceptions (#PF; exception 14). An access to a linear
address may cause page-fault exception for either of two reasons: (1) there is no translation for the linear address;
or (2) there is a translation for the linear address, but its access rights do not permit the access.
As noted in Section 4.3, Section 4.4.2, and Section 4.5, there is no translation for a linear address if the translation
process for that address would use a paging-structure entry in which the P flag (bit 0) is 0 or one that sets a
reserved bit. I f there is a translation for a linear address, its access rights are determined as specified in Section
4.6.
Figure 4-12 illustrates the error code that the processor provides on delivery of a page-fault exception. The
following items explain how the bits in the error code describe the nature of the page-fault exception:

31 4 3 2 1 0

RSVD
I/D

U/S
W/R
P
Reserved

P 0 The fault was caused by a non-present page.


1 The fault was caused by a page-level protection violation.

W/R 0 The access causing the fault was a read.


1 The access causing the fault was a write.

U/S 0 A supervisor-mode access caused the fault.


1 A user-mode access caused the fault.

RSVD 0 The fault was not caused by reserved bit violation.


1 The fault was caused by a reserved bit set to 1 in some
paging-structure entry.

I/D 0 The fault was not caused by an instruction fetch.


1 The fault was caused by an instruction fetch.

Figure 4-12. Page-Fault Error Code

• P flag (bit 0).


This flag is 0 if there is no translation for the linear address because the P flag was 0 in one of the paging-
structure entries used to translate that address.
• W/ R (bit 1).
I f the access causing the page-fault exception was a write, this flag is 1; otherwise, it is 0. This flag describes
the access causing the page-fault exception, not the access rights specified by paging.
• U/ S (bit 2).
I f a user-mode access caused the page-fault exception, this flag is 1; it is 0 if a supervisor-mode access did so.
This flag describes the access causing the page-fault exception, not the access rights specified by paging. User-
mode and supervisor-mode accesses are defined in Section 4.6.
• RSVD flag (bit 3).
This flag is 1 if there is no translation for the linear address because a reserved bit was set in one of the paging-
structure entries used to translate that address. (Because reserved bits are not checked in a paging-structure
entry whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also set.)
Bits reserved in the paging-structure entries are reserved for future functionality. Software developers should
be aware that such bits may be used in the future and that a paging-structure entry that causes a page-fault
exception on one processor might not do so in the future.
• I / D flag (bit 4).
This flag is 1 if (1) the access causing the page-fault exception was an instruction fetch; and (2) either
(a) CR4.SMEP = 1; or (b) both (i) CR4.PAE = 1 (either PAE paging or I A-32e paging is in use); and
(ii) I A32_ EFER.NXE = 1. Otherwise, the flag is 0. This flag describes the access causing the page-fault
exception, not the access rights specified by paging.

4-30 Vol. 3A
PAGING

Page-fault exceptions occur only due to an attempt to use a linear address. Failures to load the PDPTE registers
with PAE paging (see Section 4.4.1) cause general-protection exceptions (#GP(0)) and not page-fault exceptions.

4.8 ACCESSED AND DIRTY FLAGS


For any paging-structure entry that is used during linear-address translation, bit 5 is the accessed flag. 1 For
paging-structure entries that map a page (as opposed to referencing another paging structure), bit 6 is the dirty
flag. These flags are provided for use by memory-management software to manage the transfer of pages and
paging structures into and out of physical memory.
Whenever the processor uses a paging-structure entry as part of linear-address translation, it sets the accessed
flag in that entry (if it is not already set).
Whenever there is a write to a linear address, the processor sets the dirty flag (if it is not already set) in the paging-
structure entry that identifies the final physical address for the linear address (either a PTE or a paging-structure
entry in which the PS flag is 1).
Memory-management software may clear these flags when a page or a paging structure is initially loaded into
physical memory. These flags are “sticky,” meaning that, once set, the processor does not clear them; only soft-
ware can clear them.
A processor may cache information from the paging-structure entries in TLBs and paging-structure caches (see
Section 4.10). This fact implies that, if software changes an accessed flag or a dirty flag from 1 to 0, the processor
might not set the corresponding bit in memory on a subsequent access using an affected linear address (see
Section 4.10.4.3). See Section 4.10.4.2 for how software can ensure that these bits are updated as desired.

NOTE
The accesses used by the processor to set these flags may or may not be exposed to the
processor’s self-modifying code detection logic. I f the processor is executing code from the same
memory area that is being used for the paging structures, the setting of these flags may or may not
result in an immediate change to the executing code stream.

4.9 PAGING AND MEMORY TYPING


The memory type of a memory access refers to the type of caching used for that access. Chapter 11, “Memory
Cache Control” provides many details regarding memory typing in the I ntel-64 and I A-32 architectures. This
section describes how paging contributes to the determination of memory typing.
The way in which paging contributes to memory typing depends on whether the processor supports the Page
Attribute Table (PAT; see Section 11.12). 2 Section 4.9.1 and Section 4.9.2 explain how paging contributes to
memory typing depending on whether the PAT is supported.

4.9.1 Paging and Memory Typing When the PAT is Not Supported (Pentium Pro and
Pentium II Processors)

NOTE
The PAT is supported on all processors that support I A-32e paging. Thus, this section applies only
to 32-bit paging and PAE paging.

1. With PAE paging, the PDPTEs are not used during linear-address translation but only to load the PDPTE registers for some execu-
tions of the MOV CR instruction (see Section 4.4.1). For this reason, the PDPTEs do not contain accessed flags with PAE paging.
2. The PAT is supported on Pentium III and more recent processor families. See Section 4.1.4 for how to determine whether the PAT is
supported.

Vol. 3A 4-31
PAGING

I f the PAT is not supported, paging contributes to memory typing in conjunction with the memory-type range regis-
ters (MTRRs) as specified in Table 11-6 in Section 11.5.2.1.
For any access to a physical address, the table combines the memory type specified for that physical address by
the MTRRs with a PCD value and a PWT value. The latter two values are determined as follows:
• For an access to a PDE with 32-bit paging, the PCD and PWT values come from CR3.
• For an access to a PDE with PAE paging, the PCD and PWT values come from the relevant PDPTE register.
• For an access to a PTE, the PCD and PWT values come from the relevant PDE.
• For an access to the physical address that is the translation of a linear address, the PCD and PWT values come
from the relevant PTE (if the translation uses a 4-KByte page) or the relevant PDE (otherwise).
• With PAE paging, the UC memory type is used when loading the PDPTEs (see Section 4.4.1).

4.9.2 Paging and Memory Typing When the PAT is Supported (Pentium III and More Recent
Processor Families)
I f the PAT is supported, paging contributes to memory typing in conjunction with the PAT and the memory-type
range registers (MTRRs) as specified in Table 11-7 in Section 11.5.2.2.
The PAT is a 64-bit MSR (I A32_ PAT; MSR index 277H) comprising eight (8) 8-bit entries (entry i comprises
bits 8i+7: 8i of the MSR).
For any access to a physical address, the table combines the memory type specified for that physical address by
the MTRRs with a memory type selected from the PAT. Table 11-11 in Section 11.12.3 specifies how a memory type
is selected from the PAT. Specifically, it comes from entry i of the PAT, where i is defined as follows:
• For an access to an entry in a paging structure whose address is in CR3 (e.g., the PML4 table with I A-32e
paging):
— For I A-32e paging with CR4.PCI DE = 1, i = 0.
— Otherwise, i = 2* PCD+PWT, where the PCD and PWT values come from CR3.
• For an access to a PDE with PAE paging, i = 2* PCD+PWT, where the PCD and PWT values come from the
relevant PDPTE register.
• For an access to a paging-structure entry X whose address is in another paging-structure entry Y, i =
2* PCD+PWT, where the PCD and PWT values come from Y.
• For an access to the physical address that is the translation of a linear address, i = 4* PAT+2* PCD+PWT, where
the PAT, PCD, and PWT values come from the relevant PTE (if the translation uses a 4-KByte page), the relevant
PDE (if the translation uses a 2-MByte page or a 4-MByte page), or the relevant PDPTE (if the translation uses
a 1-GByte page).
• With PAE paging, the WB memory type is used when loading the PDPTEs (see Section 4.4.1). 1

4.9.3 Caching Paging-Related Information about Memory Typing


A processor may cache information from the paging-structure entries in TLBs and paging-structure caches (see
Section 4.10). These structures may include information about memory typing. The processor may use memory-
typing information from the TLBs and paging-structure caches instead of from the paging structures in memory.
This fact implies that, if software modifies a paging-structure entry to change the memory-typing bits, the
processor might not use that change for a subsequent translation using that entry or for access to an affected linear
address. See Section 4.10.4.2 for how software can ensure that the processor uses the modified memory typing.

1. Some older IA-32 processors used the UC memory type when loading the PDPTEs. Some processors may use the UC memory type if
CR0.CD = 1 or if the MTRRs are disabled. These behaviors are model-specific and not architectural.

4-32 Vol. 3A
PAGING

4.10 CACHING TRANSLATION INFORMATION


The I ntel-64 and I A-32 architectures may accelerate the address-translation process by caching data from the
paging structures on the processor. Because the processor does not ensure that the data that it caches are always
consistent with the structures in memory, it is important for software developers to understand how and when the
processor may cache such data. They should also understand what actions software can take to remove cached
data that may be inconsistent and when it should do so. This section provides software developers information
about the relevant processor operation.
Section 4.10.1 introduces process-context identifiers (PCI Ds), which a logical processor may use to distinguish
information cached for different linear-address spaces. Section 4.10.2 and Section 4.10.3 describe how the
processor may cache information in translation lookaside buffers (TLBs) and paging-structure caches, respectively.
Section 4.10.4 explains how software can remove inconsistent cached information by invalidating portions of the
TLBs and paging-structure caches. Section 4.10.5 describes special considerations for multiprocessor systems.

4.10.1 Process-Context Identifiers (PCIDs)


Process-context identifiers (PCI Ds) are a facility by which a logical processor may cache information for multiple
linear-address spaces. The processor may retain cached information when software switches to a different linear-
address space with a different PCI D (e.g., by loading CR3; see Section 4.10.4.1 for details).
A PCI D is a 12-bit identifier. Non-zero PCI Ds are enabled by setting the PCI DE flag (bit 17) of CR4. I f CR4.PCI DE =
0, the current PCI D is always 000H; otherwise, the current PCI D is the value of bits 11: 0 of CR3. Not all processors
allow CR4.PCI DE to be set to 1; see Section 4.1.4 for how to determine whether this is allowed.
The processor ensures that CR4.PCI DE can be 1 only in I A-32e mode (thus, 32-bit paging and PAE paging use only
PCI D 000H). I n addition, software can change CR4.PCI DE from 0 to 1 only if CR3[11: 0] = 000H. These require-
ments are enforced by the following limitations on the MOV CR instruction:
• MOV to CR4 causes a general-protection exception (#GP) if it would change CR4.PCI DE from 0 to 1 and either
I A32_ EFER.LMA = 0 or CR3[11: 0] ≠ 000H.
• MOV to CR0 causes a general-protection exception if it would clear CR0.PG to 0 while CR4.PCI DE = 1.
When a logical processor creates entries in the TLBs (Section 4.10.2) and paging-structure caches (Section
4.10.3), it associates those entries with the current PCI D. When using entries in the TLBs and paging-structure
caches to translate a linear address, a logical processor uses only those entries associated with the current PCI D
(see Section 4.10.2.4 for an exception).
I f CR4.PCI DE = 0, a logical processor does not cache information for any PCI D other than 000H. This is because
(1) if CR4.PCI DE = 0, the logical processor will associate any newly cached information with the current PCI D,
000H; and (2) if MOV to CR4 clears CR4.PCI DE, all cached information is invalidated (see Section 4.10.4.1).

NOTE
I n revisions of this manual that were produced when no processors allowed CR4.PCI DE to be set to
1, Section 4.10 discussed the caching of translation information without any reference to PCI Ds.
While the section now refers to PCI Ds in its specification of this caching, this documentation change
is not intended to imply any change to the behavior of processors that do not allow CR4.PCI DE to
be set to 1.

4.10.2 Translation Lookaside Buffers (TLBs)


A processor may cache information about the translation of linear addresses in translation lookaside buffers
(TLBs). I n general, TLBs contain entries that map page numbers to page frames; these terms are defined in
Section 4.10.2.1. Section 4.10.2.2 describes how information may be cached in TLBs, and Section 4.10.2.3 gives
details of TLB usage. Section 4.10.2.4 explains the global-page feature, which allows software to indicate that
certain translations should receive special treatment when cached in the TLBs.

Vol. 3A 4-33
PAGING

4.10.2.1 Page Numbers, Page Frames, and Page Offsets


Section 4.3, Section 4.4.2, and Section 4.5 give details of how the different paging modes translate linear
addresses to physical addresses. Specifically, the upper bits of a linear address (called the page number) deter-
mine the upper bits of the physical address (called the page frame); the lower bits of the linear address (called the
page offset) determine the lower bits of the physical address. The boundary between the page number and the
page offset is determined by the page size. Specifically:
• 32-bit paging:
— I f the translation does not use a PTE (because CR4.PSE = 1 and the PS flag is 1 in the PDE used), the page
size is 4 MBytes and the page number comprises bits 31: 22 of the linear address.
— I f the translation does use a PTE, the page size is 4 KBytes and the page number comprises bits 31: 12 of
the linear address.
• PAE paging:
— I f the translation does not use a PTE (because the PS flag is 1 in the PDE used), the page size is 2 MBytes
and the page number comprises bits 31: 21 of the linear address.
— I f the translation does uses a PTE, the page size is 4 KBytes and the page number comprises bits 31: 12 of
the linear address.
• I A-32e paging:
— I f the translation does not use a PDE (because the PS flag is 1 in the PDPTE used), the page size is 1 GBytes
and the page number comprises bits 47: 30 of the linear address.
— I f the translation does use a PDE but does not uses a PTE (because the PS flag is 1 in the PDE used), the
page size is 2 MBytes and the page number comprises bits 47: 21 of the linear address.
— I f the translation does use a PTE, the page size is 4 KBytes and the page number comprises bits 47: 12 of
the linear address.

4.10.2.2 Caching Translations in TLBs


The processor may accelerate the paging process by caching individual translations in translation lookaside
buffers (TLBs). Each entry in a TLB is an individual translation. Each translation is referenced by a page number.
I t contains the following information from the paging-structure entries used to translate linear addresses with the
page number:
• The physical address corresponding to the page number (the page frame).
• The access rights from the paging-structure entries used to translate linear addresses with the page number
(see Section 4.6):
— The logical-AND of the R/ W flags.
— The logical-AND of the U/ S flags.
— The logical-OR of the XD flags (necessary only if I A32_ EFER.NXE = 1).
• Attributes from a paging-structure entry that identifies the final page frame for the page number (either a PTE
or a paging-structure entry in which the PS flag is 1):
— The dirty flag (see Section 4.8).
— The memory type (see Section 4.9).
(TLB entries may contain other information as well. A processor may implement multiple TLBs, and some of these
may be for special purposes, e.g., only for instruction fetches. Such special-purpose TLBs may not contain some of
this information if it is not necessary. For example, a TLB used only for instruction fetches need not contain infor-
mation about the R/ W and dirty flags.)
As noted in Section 4.10.1, any TLB entries created by a logical processor are associated with the current PCI D.
Processors need not implement any TLBs. Processors that do implement TLBs may invalidate any TLB entry at any
time. Software should not rely on the existence of TLBs or on the retention of TLB entries.

4-34 Vol. 3A
PAGING

4.10.2.3 Details of TLB Use


Because the TLBs cache entries only for linear addresses with translations, there can be a TLB entry for a page
number only if the P flag is 1 and the reserved bits are 0 in each of the paging-structure entries used to translate
that page number. I n addition, the processor does not cache a translation for a page number unless the accessed
flag is 1 in each of the paging-structure entries used during translation; before caching a translation, the processor
sets any of these accessed flags that is not already 1.
The processor may cache translations required for prefetches and for accesses that are a result of speculative
execution that would never actually occur in the executed code path.
I f the page number of a linear address corresponds to a TLB entry associated with the current PCI D, the processor
may use that TLB entry to determine the page frame, access rights, and other attributes for accesses to that linear
address. I n this case, the processor may not actually consult the paging structures in memory. The processor may
retain a TLB entry unmodified even if software subsequently modifies the relevant paging-structure entries in
memory. See Section 4.10.4.2 for how software can ensure that the processor uses the modified paging-structure
entries.
I f the paging structures specify a translation using a page larger than 4 KBytes, some processors may choose to
cache multiple smaller-page TLB entries for that translation. Each such TLB entry would be associated with a page
number corresponding to the smaller page size (e.g., bits 47: 12 of a linear address with I A-32e paging), even
though part of that page number (e.g., bits 20: 12) are part of the offset with respect to the page specified by the
paging structures. The upper bits of the physical address in such a TLB entry are derived from the physical address
in the PDE used to create the translation, while the lower bits come from the linear address of the access for which
the translation is created. There is no way for software to be aware that multiple translations for smaller pages
have been used for a large page.
I f software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses
changes, the TLBs may subsequently contain multiple translations for the address range (one for each page size).
A reference to a linear address in the address range may use any of these translations. Which translation is used
may vary from one execution to another, and the choice may be implementation-specific.

4.10.2.4 Global Pages


The I ntel-64 and I A-32 architectures also allow for global pages when the PGE flag (bit 7) is 1 in CR4. I f the G flag
(bit 8) is 1 in a paging-structure entry that maps a page (either a PTE or a paging-structure entry in which the PS
flag is 1), any TLB entry cached for a linear address using that paging-structure entry is considered to be global.
Because the G flag is used only in paging-structure entries that map a page, and because information from such
entries are not cached in the paging-structure caches, the global-page feature does not affect the behavior of the
paging-structure caches.
A logical processor may use a global TLB entry to translate a linear address, even if the TLB entry is associated with
a PCI D different from the current PCI D.

4.10.3 Paging-Structure Caches


I n addition to the TLBs, a processor may cache other information about the paging structures in memory.

4.10.3.1 Caches for Paging Structures


A processor may support any or of all the following paging-structure caches:
• PML4 cache (I A-32e paging only). Each PML4-cache entry is referenced by a 9-bit value and is used for linear
addresses for which bits 47: 39 have that value. The entry contains information from the PML4E used to
translate such linear addresses:
— The physical address from the PML4E (the address of the page-directory-pointer table).
— The value of the R/ W flag of the PML4E.
— The value of the U/ S flag of the PML4E.
— The value of the XD flag of the PML4E.

Vol. 3A 4-35
PAGING

— The values of the PCD and PWT flags of the PML4E.


The following items detail how a processor may use the PML4 cache:
— I f the processor has a PML4-cache entry for a linear address, it may use that entry when translating the
linear address (instead of the PML4E in memory).
— The processor does not create a PML4-cache entry unless the P flag is 1 and all reserved bits are 0 in the
PML4E in memory.
— The processor does not create a PML4-cache entry unless the accessed flag is 1 in the PML4E in memory;
before caching a translation, the processor sets the accessed flag if it is not already 1.
— The processor may create a PML4-cache entry even if there are no translations for any linear address that
might use that entry (e.g., because the P flags are 0 in all entries in the referenced page-directory-pointer
table).
— I f the processor creates a PML4-cache entry, the processor may retain it unmodified even if software subse-
quently modifies the corresponding PML4E in memory.
• PDPTE cache (I A-32e paging only). 1 Each PDPTE-cache entry is referenced by an 18-bit value and is used for
linear addresses for which bits 47: 30 have that value. The entry contains information from the PML4E and
PDPTE used to translate such linear addresses:
— The physical address from the PDPTE (the address of the page directory). (No PDPTE-cache entry is created
for a PDPTE that maps a 1-GByte page.)
— The logical-AND of the R/ W flags in the PML4E and the PDPTE.
— The logical-AND of the U/ S flags in the PML4E and the PDPTE.
— The logical-OR of the XD flags in the PML4E and the PDPTE.
— The values of the PCD and PWT flags of the PDPTE.
The following items detail how a processor may use the PDPTE cache:
— I f the processor has a PDPTE-cache entry for a linear address, it may use that entry when translating the
linear address (instead of the PML4E and the PDPTE in memory).
— The processor does not create a PDPTE-cache entry unless the P flag is 1, the PS flag is 0, and the reserved
bits are 0 in the PML4E and the PDPTE in memory.
— The processor does not create a PDPTE-cache entry unless the accessed flags are 1 in the PML4E and the
PDPTE in memory; before caching a translation, the processor sets any accessed flags that are not already
1.
— The processor may create a PDPTE-cache entry even if there are no translations for any linear address that
might use that entry.
— I f the processor creates a PDPTE-cache entry, the processor may retain it unmodified even if software
subsequently modifies the corresponding PML4E or PDPTE in memory.
• PDE cache. The use of the PDE cache depends on the paging mode:
— For 32-bit paging, each PDE-cache entry is referenced by a 10-bit value and is used for linear addresses for
which bits 31: 22 have that value.
— For PAE paging, each PDE-cache entry is referenced by an 11-bit value and is used for linear addresses for
which bits 31: 21 have that value.
— For I A-32e paging, each PDE-cache entry is referenced by a 27-bit value and is used for linear addresses for
which bits 47: 21 have that value.
A PDE-cache entry contains information from the PML4E, PDPTE, and PDE used to translate the relevant linear
addresses (for 32-bit paging and PAE paging, only the PDE applies):
— The physical address from the PDE (the address of the page table). (No PDE-cache entry is created for a
PDE that maps a page.)

1. With PAE paging, the PDPTEs are stored in internal, non-architectural registers. The operation of these registers is described in Sec-
tion 4.4.1 and differs from that described here.

4-36 Vol. 3A
PAGING

— The logical-AND of the R/ W flags in the PML4E, PDPTE, and PDE.


— The logical-AND of the U/ S flags in the PML4E, PDPTE, and PDE.
— The logical-OR of the XD flags in the PML4E, PDPTE, and PDE.
— The values of the PCD and PWT flags of the PDE.
The following items detail how a processor may use the PDE cache (references below to PML4Es and PDPTEs
apply on to I A-32e paging):
— I f the processor has a PDE-cache entry for a linear address, it may use that entry when translating the
linear address (instead of the PML4E, the PDPTE, and the PDE in memory).
— The processor does not create a PDE-cache entry unless the P flag is 1, the PS flag is 0, and the reserved
bits are 0 in the PML4E, the PDPTE, and the PDE in memory.
— The processor does not create a PDE-cache entry unless the accessed flag is 1 in the PML4E, the PDPTE,
and the PDE in memory; before caching a translation, the processor sets any accessed flags that are not
already 1.
— The processor may create a PDE-cache entry even if there are no translations for any linear address that
might use that entry.
— I f the processor creates a PDE-cache entry, the processor may retain it unmodified even if software subse-
quently modifies the corresponding PML4E, the PDPTE, or the PDE in memory.
I nformation from a paging-structure entry can be included in entries in the paging-structure caches for other
paging-structure entries referenced by the original entry. For example, if the R/ W flag is 0 in a PML4E, then the R/ W
flag will be 0 in any PDPTE-cache entry for a PDPTE from the page-directory-pointer table referenced by that
PML4E. This is because the R/ W flag of each such PDPTE-cache entry is the logical-AND of the R/ W flags in the
appropriate PML4E and PDPTE.
The paging-structure caches contain information only from paging-structure entries that reference other paging
structures (and not those that map pages). Because the G flag is not used in such paging-structure entries, the
global-page feature does not affect the behavior of the paging-structure caches.
The processor may create entries in paging-structure caches for translations required for prefetches and for
accesses that are a result of speculative execution that would never actually occur in the executed code path.
As noted in Section 4.10.1, any entries created in paging-structure caches by a logical processor are associated
with the current PCI D.
A processor may or may not implement any of the paging-structure caches. Software should rely on neither their
presence nor their absence. The processor may invalidate entries in these caches at any time. Because the
processor may create the cache entries at the time of translation and not update them following subsequent modi-
fications to the paging structures in memory, software should take care to invalidate the cache entries appropri-
ately when causing such modifications. The invalidation of TLBs and the paging-structure caches is described in
Section 4.10.4.

4.10.3.2 Using the Paging-Structure Caches to Translate Linear Addresses


When a linear address is accessed, the processor uses a procedure such as the following to determine the physical
address to which it translates and whether the access should be allowed:
• I f the processor finds a TLB entry that is for the page number of the linear address and that is associated with
the current PCI D (or which is global), it may use the physical address, access rights, and other attributes from
that entry.
• I f the processor does not find a relevant TLB entry, it may use the upper bits of the linear address to select an
entry from the PDE cache that is associated with the current PCI D (Section 4.10.3.1 indicates which bits are
used in each paging mode). I t can then use that entry to complete the translation process (locating a PTE, etc.)
as if it had traversed the PDE (and, for I A-32e paging, the PDPTE and PML4) corresponding to the PDE-cache
entry.
• The following items apply when I A-32e paging is used:
— I f the processor does not find a relevant TLB entry or a relevant PDE-cache entry, it may use bits 47: 30 of
the linear address to select an entry from the PDPTE cache that is associated with the current PCI D. I t can

Vol. 3A 4-37
PAGING

then use that entry to complete the translation process (locating a PDE, etc.) as if it had traversed the
PDPTE and the PML4 corresponding to the PDPTE-cache entry.
— I f the processor does not find a relevant TLB entry, a relevant PDE-cache entry, or a relevant PDPTE-cache
entry, it may use bits 47: 39 of the linear address to select an entry from the PML4 cache that is associated
with the current PCI D. I t can then use that entry to complete the translation process (locating a PDPTE,
etc.) as if it had traversed the corresponding PML4.
(Any of the above steps would be skipped if the processor does not support the cache in question.)
I f the processor does not find a TLB or paging-structure-cache entry for the linear address, it uses the linear
address to traverse the entire paging-structure hierarchy, as described in Section 4.3, Section 4.4.2, and Section
4.5.

4.10.3.3 Multiple Cached Entries for a Single Paging-Structure Entry


The paging-structure caches and TLBs and paging-structure caches may contain multiple entries associated with a
single PCI D and with information derived from a single paging-structure entry. The following items give some
examples for I A-32e paging:
• Suppose that two PML4Es contain the same physical address and thus reference the same page-directory-
pointer table. Any PDPTE in that table may result in two PDPTE-cache entries, each associated with a different
set of linear addresses. Specifically, suppose that the n1th and n2th entries in the PML4 table contain the same
physical address. This implies that the physical address in the m th PDPTE in the page-directory-pointer table
would appear in the PDPTE-cache entries associated with both p1 and p2, where (p1 » 9) = n1, (p2 » 9) = n2,
and (p1 & 1FFH) = (p2 & 1FFH) = m. This is because both PDPTE-cache entries use the same PDPTE, one
resulting from a reference from the n1th PML4E and one from the n2th PML4E.
• Suppose that the first PML4E (i.e., the one in position 0) contains the physical address X in CR3 (the physical
address of the PML4 table). This implies the following:
— Any PML4-cache entry associated with linear addresses with 0 in bits 47: 39 contains address X.
— Any PDPTE-cache entry associated with linear addresses with 0 in bits 47: 30 contains address X. This is
because the translation for a linear address for which the value of bits 47: 30 is 0 uses the value of
bits 47: 39 (0) to locate a page-directory-pointer table at address X (the address of the PML4 table). I t then
uses the value of bits 38: 30 (also 0) to find address X again and to store that address in the PDPTE-cache
entry.
— Any PDE-cache entry associated with linear addresses with 0 in bits 47: 21 contains address X for similar
reasons.
— Any TLB entry for page number 0 (associated with linear addresses with 0 in bits 47: 12) translates to page
frame X » 12 for similar reasons.
The same PML4E contributes its address X to all these cache entries because the self-referencing nature of the
entry causes it to be used as a PML4E, a PDPTE, a PDE, and a PTE.

4.10.4 Invalidation of TLBs and Paging-Structure Caches


As noted in Section 4.10.2 and Section 4.10.3, the processor may create entries in the TLBs and the paging-struc-
ture caches when linear addresses are translated, and it may retain these entries even after the paging structures
used to create them have been modified. To ensure that linear-address translation uses the modified paging struc-
tures, software should take action to invalidate any cached entries that may contain information that has since
been modified.

4.10.4.1 Operations that Invalidate TLBs and Paging-Structure Caches


The following instructions invalidate entries in the TLBs and the paging-structure caches:
• I NVLPG. This instruction takes a single operand, which is a linear address. The instruction invalidates any TLB
entries that are for a page number corresponding to the linear address and that are associated with the current
PCI D. I t also invalidates any global TLB entries with that page number, regardless of PCI D (see Section

4-38 Vol. 3A
PAGING

4.10.2.4). 1 I NVLPG also invalidates all entries in all paging-structure caches associated with the current PCI D,
regardless of the linear addresses to which they correspond.
• I NVPCI D. The operation of this instruction is based on instruction operands, called the I NVPCI D type and the
I NVPCI D descriptor. Four I NVPCI D types are currently defined:
— I ndividual-address. I f the I NVPCI D type is 0, the logical processor invalidates mappings—except global
translations—associated with the PCI D specified in the I NVPCI D descriptor and that would be used to
translate the linear address specified in the I NVPCI D descriptor. (The instruction may also invalidate global
translations, as well as mappings associated with other PCI Ds and for other linear addresses.)
— Single-context. I f the I NVPCI D type is 1, the logical processor invalidates all mappings—except global
translations—associated with the PCI D specified in the I NVPCI D descriptor. (The instruction may also
invalidate global translations, as well as mappings associated with other PCI Ds.)
— All-context, including globals. I f the I NVPCI D type is 2, the logical processor invalidates
mappings—including global translations—associated with all PCI Ds.
— All-context. I f the I NVPCI D type is 3, the logical processor invalidates mappings—except global transla-
tions—associated with all PCI Ds. (The instruction may also invalidate global translations.)
See Chapter 3 of the I ntel 64 and I A-32 Architecture Software Developer’s Manual, Volume 2A for details of the
I NVPCI D instruction.
• MOV to CR0. The instruction invalidates all TLB entries (including global entries) and all entries in all paging-
structure caches (for all PCI Ds) if it changes the value of CR0.PG from 1 to 0.
• MOV to CR3. The behavior of the instruction depends on the value of CR4.PCI DE:
— I f CR4.PCI DE = 0, the instruction invalidates all TLB entries associated with PCI D 000H except those for
global pages. I t also invalidates all entries in all paging-structure caches associated with PCI D 000H.
— I f CR4.PCI DE = 1 and bit 63 of the instruction’s source operand is 0, the instruction invalidates all TLB
entries associated with the PCI D specified in bits 11: 0 of the instruction’s source operand except those for
global pages. I t also invalidates all entries in all paging-structure caches associated with that PCI D. I t is not
required to invalidate entries in the TLBs and paging-structure caches that are associated with other PCI Ds.
— I f CR4.PCI DE = 1 and bit 63 of the instruction’s source operand is 1, the instruction is not required to
invalidate any TLB entries or entries in paging-structure caches.
• MOV to CR4. The behavior of the instruction depends on the bits being modified:
— The instruction invalidates all TLB entries (including global entries) and all entries in all paging-structure
caches (for all PCI Ds) if (1) it changes the value of CR4.PGE; 2 or (2) it changes the value of the CR4.PCI DE
from 1 to 0.
— The instruction invalidates all TLB entries and all entries in all paging-structure caches for the current PCI D
if (1) it changes the value of CR4.PAE; or (2) it changes the value of CR4.SMEP from 0 to 1.
• Task switch. I f a task switch changes the value of CR3, it invalidates all TLB entries associated with PCI D 000H
except those for global pages. I t also invalidates all entries in all paging-structure caches for associated with
PCI D 000H. 3
• VMX transitions. See Section 4.11.1.
The processor is always free to invalidate additional entries in the TLBs and paging-structure caches. The following
are some examples:
• I NVLPG may invalidate TLB entries for pages other than the one corresponding to its linear-address operand. I t
may invalidate TLB entries and paging-structure-cache entries associated with PCI Ds other than the current
PCI D.

1. If the paging structures map the linear address using a page larger than 4 KBytes and there are multiple TLB entries for that page
(see Section 4.10.2.3), the instruction invalidates all of them.
2. If CR4.PGE is changing from 0 to 1, there were no global TLB entries before the execution; if CR4.PGE is changing from 1 to 0, there
will be no global TLB entries after the execution.
3. Task switches do not occur in IA-32e mode and thus cannot occur with IA-32e paging. Since CR4.PCIDE can be set only with IA-32e
paging, task switches occur only with CR4.PCIDE = 0.

Vol. 3A 4-39
PAGING

• I NVPCI D may invalidate TLB entries for pages other than the one corresponding to the specified linear address.
I t may invalidate TLB entries and paging-structure-cache entries associated with PCI Ds other than the specified
PCI D.
• MOV to CR0 may invalidate TLB entries even if CR0.PG is not changing. For example, this may occur if either
CR0.CD or CR0.NW is modified.
• MOV to CR3 may invalidate TLB entries for global pages. I f CR4.PCI DE = 1 and bit 63 of the instruction’s source
operand is 0, it may invalidate TLB entries and entries in the paging-structure caches associated with PCI Ds
other than the current PCI D. I t may invalidate entries if CR4.PCI DE = 1 and bit 63 of the instruction’s source
operand is 1.
• MOV to CR4 may invalidate TLB entries when changing CR4.PSE or when changing CR4.SMEP from 1 to 0.
• On a processor supporting Hyper-Threading Technology, invalidations performed on one logical processor may
invalidate entries in the TLBs and paging-structure caches used by other logical processors.
(Other instructions and operations may invalidate entries in the TLBs and the paging-structure caches, but the
instructions identified above are recommended.)
I n addition to the instructions identified above, page faults invalidate entries in the TLBs and paging-structure
caches. I n particular, a page-fault exception resulting from an attempt to use a linear address will invalidate any
TLB entries that are for a page number corresponding to that linear address and that are associated with the
current PCI D. it also invalidates all entries in the paging-structure caches that would be used for that linear address
and that are associated with the current PCI D. 1 These invalidations ensure that the page-fault exception will not
recur (if the faulting instruction is re-executed) if it would not be caused by the contents of the paging structures
in memory (and if, therefore, it resulted from cached entries that were not invalidated after the paging structures
were modified in memory).
As noted in Section 4.10.2, some processors may choose to cache multiple smaller-page TLB entries for a transla-
tion specified by the paging structures to use a page larger than 4 KBytes. There is no way for software to be aware
that multiple translations for smaller pages have been used for a large page. The I NVLPG instruction and page
faults provide the same assurances that they provide when a single TLB entry is used: they invalidate all TLB
entries corresponding to the translation specified by the paging structures.

4.10.4.2 Recommended Invalidation


The following items provide some recommendations regarding when software should perform invalidations:
• I f software modifies a paging-structure entry that identifies the final page frame for a page number (either a
PTE or a paging-structure entry in which the PS flag is 1), it should execute I NVLPG for any linear address with
a page number whose translation uses that PTE. 2
(I f the paging-structure entry may be used in the translation of different page numbers — see Section 4.10.3.3
— software should execute I NVLPG for linear addresses with each of those page numbers; alternatively, it could
use MOV to CR3 or MOV to CR4.)
• I f software modifies a paging-structure entry that references another paging structure, it may use one of the
following approaches depending upon the types and number of translations controlled by the modified entry:
— Execute I NVLPG for linear addresses with each of the page numbers with translations that would use the
entry. However, if no page numbers that would use the entry have translations (e.g., because the P flags are
0 in all entries in the paging structure referenced by the modified entry), it remains necessary to execute
I NVLPG at least once.
— Execute MOV to CR3 if the modified entry controls no global pages.
— Execute MOV to CR4 to modify CR4.PGE.
• I f CR4.PCI DE = 1 and software modifies a paging-structure entry that does not map a page or in which the G
flag (bit 8) is 0, additional steps are required if the entry may be used for PCI Ds other than the current one. Any
one of the following suffices:

1. Unlike INVLPG, page faults need not invalidate all entries in the paging-structure caches, only those that would be used to translate
the faulting linear address.
2. One execution of INVLPG is sufficient even for a page with size greater than 4 KBytes.

4-40 Vol. 3A
PAGING

— Execute MOV to CR4 to modify CR4.PGE, either immediately or before again using any of the affected
PCI Ds. For example, software could use different (previously unused) PCI Ds for the processes that used the
affected PCI Ds.
— For each affected PCI D, execute MOV to CR3 to make that PCI D current (and to load the address of the
appropriate PML4 table). I f the modified entry controls no global pages and bit 63 of the source operand to
MOV to CR3 was 0, no further steps are required. Otherwise, execute I NVLPG for linear addresses with each
of the page numbers with translations that would use the entry; if no page numbers that would use the
entry have translations, execute I NVLPG at least once.
• I f software using PAE paging modifies a PDPTE, it should reload CR3 with the register’s current value to ensure
that the modified PDPTE is loaded into the corresponding PDPTE register (see Section 4.4.1).
• I f the nature of the paging structures is such that a single entry may be used for multiple purposes (see Section
4.10.3.3), software should perform invalidations for all of these purposes. For example, if a single entry might
serve as both a PDE and PTE, it may be necessary to execute I NVLPG with two (or more) linear addresses, one
that uses the entry as a PDE and one that uses it as a PTE. (Alternatively, software could use MOV to CR3 or
MOV to CR4.)
• As noted in Section 4.10.2, the TLBs may subsequently contain multiple translations for the address range if
software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses
changes. A reference to a linear address in the address range may use any of these translations.
Software wishing to prevent this uncertainty should not write to a paging-structure entry in a way that would
change, for any linear address, both the page size and either the page frame, access rights, or other attributes.
I t can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (e.g.,
PDE); then invalidate any translations for the affected linear addresses (see above); and then modify the
relevant paging-structure entry to set the P flag and establish modified translation(s) for the new page size.
• Software should clear bit 63 of the source operand to a MOV to CR3 instruction that establishes a PCI D that had
been used earlier for a different linear-address space (e.g., with a different value in bits 51: 12 of CR3). This
ensures invalidation of any information that may have been cached for the previous linear-address space.
This assumes that both linear-address spaces use the same global pages and that it is thus not necessary to
invalidate any global TLB entries. I f that is not the case, software should invalidate those entries by executing
MOV to CR4 to modify CR4.PGE.

4.10.4.3 Optional Invalidation


The following items describe cases in which software may choose not to invalidate and the potential consequences
of that choice:
• I f a paging-structure entry is modified to change the P flag from 0 to 1, no invalidation is necessary. This is
because no TLB entry or paging-structure cache entry is created with information from a paging-structure
entry in which the P flag is 0. 1
• I f a paging-structure entry is modified to change the accessed flag from 0 to 1, no invalidation is necessary
(assuming that an invalidation was performed the last time the accessed flag was changed from 1 to 0). This is
because no TLB entry or paging-structure cache entry is created with information from a paging-structure
entry in which the accessed flag is 0.
• I f a paging-structure entry is modified to change the R/ W flag from 0 to 1, failure to perform an invalidation
may result in a “spurious” page-fault exception (e.g., in response to an attempted write access) but no other
adverse behavior. Such an exception will occur at most once for each affected linear address (see Section
4.10.4.1).
• I f CR4.SMEP = 0 and a paging-structure entry is modified to change the U/ S flag from 0 to 1, failure to perform
an invalidation may result in a “spurious” page-fault exception (e.g., in response to an attempted user-mode
access) but no other adverse behavior. Such an exception will occur at most once for each affected linear
address (see Section 4.10.4.1).
• I f a paging-structure entry is modified to change the XD flag from 1 to 0, failure to perform an invalidation may
result in a “spurious” page-fault exception (e.g., in response to an attempted instruction fetch) but no other

1. If it is also the case that no invalidation was performed the last time the P flag was changed from 1 to 0, the processor may use a
TLB entry or paging-structure cache entry that was created when the P flag had earlier been 1.

Vol. 3A 4-41
PAGING

adverse behavior. Such an exception will occur at most once for each affected linear address (see Section
4.10.4.1).
• I f a paging-structure entry is modified to change the accessed flag from 1 to 0, failure to perform an invali-
dation may result in the processor not setting that bit in response to a subsequent access to a linear address
whose translation uses the entry. Software cannot interpret the bit being clear as an indication that such an
access has not occurred.
• I f software modifies a paging-structure entry that identifies the final physical address for a linear address
(either a PTE or a paging-structure entry in which the PS flag is 1) to change the dirty flag from 1 to 0, failure
to perform an invalidation may result in the processor not setting that bit in response to a subsequent write to
a linear address whose translation uses the entry. Software cannot interpret the bit being clear as an indication
that such a write has not occurred.
• The read of a paging-structure entry in translating an address being used to fetch an instruction may appear to
execute before an earlier write to that paging-structure entry if there is no serializing instruction between the
write and the instruction fetch. Note that the invalidating instructions identified in Section 4.10.4.1 are all
serializing instructions.
• Section 4.10.3.3 describes situations in which a single paging-structure entry may contain information cached
in multiple entries in the paging-structure caches. Because all entries in these caches are invalidated by any
execution of I NVLPG, it is not necessary to follow the modification of such a paging-structure entry by
executing I NVLPG multiple times solely for the purpose of invalidating these multiple cached entries. (I t may be
necessary to do so to invalidate multiple TLB entries.)

4.10.4.4 Delayed Invalidation


Required invalidations may be delayed under some circumstances. Software developers should understand that,
between the modification of a paging-structure entry and execution of the invalidation instruction recommended in
Section 4.10.4.2, the processor may use translations based on either the old value or the new value of the paging-
structure entry. The following items describe some of the potential consequences of delayed invalidation:
• I f a paging-structure entry is modified to change from 1 to 0 the P flag from 1 to 0, an access to a linear address
whose translation is controlled by this entry may or may not cause a page-fault exception.
• I f a paging-structure entry is modified to change the R/ W flag from 0 to 1, write accesses to linear addresses
whose translation is controlled by this entry may or may not cause a page-fault exception.
• I f a paging-structure entry is modified to change the U/ S flag from 0 to 1, user-mode accesses to linear
addresses whose translation is controlled by this entry may or may not cause a page-fault exception.
• I f a paging-structure entry is modified to change the XD flag from 1 to 0, instruction fetches from linear
addresses whose translation is controlled by this entry may or may not cause a page-fault exception.
As noted in Section 8.1.1, an x87 instruction or an SSE instruction that accesses data larger than a quadword may
be implemented using multiple memory accesses. I f such an instruction stores to memory and invalidation has
been delayed, some of the accesses may complete (writing to memory) while another causes a page-fault excep-
tion. 1 I n this case, the effects of the completed accesses may be visible to software even though the overall instruc-
tion caused a fault.
I n some cases, the consequences of delayed invalidation may not affect software adversely. For example, when
freeing a portion of the linear-address space (by marking paging-structure entries “not present”), invalidation
using I NVLPG may be delayed if software does not re-allocate that portion of the linear-address space or the
memory that had been associated with it. However, because of speculative execution (or errant software), there
may be accesses to the freed portion of the linear-address space before the invalidations occur. I n this case, the
following can happen:
• Reads can occur to the freed portion of the linear-address space. Therefore, invalidation should not be delayed
for an address range that has read side effects.
• The processor may retain entries in the TLBs and paging-structure caches for an extended period of time.
Software should not assume that the processor will not use entries associated with a linear address simply
because time has passed.

1. If the accesses are to different pages, this may occur even if invalidation has not been delayed.

4-42 Vol. 3A
PAGING

• As noted in Section 4.10.3.1, the processor may create an entry in a paging-structure cache even if there are
no translations for any linear address that might use that entry. Thus, if software has marked “not present” all
entries in page table, the processor may subsequently create a PDE-cache entry for the PDE that references
that page table (assuming that the PDE itself is marked “present”).
• I f software attempts to write to the freed portion of the linear-address space, the processor might not generate
a page fault. (Such an attempt would likely be the result of a software error.) For that reason, the page frames
previously associated with the freed portion of the linear-address space should not be reallocated for another
purpose until the appropriate invalidations have been performed.

4.10.5 Propagation of Paging-Structure Changes to Multiple Processors


As noted in Section 4.10.4, software that modifies a paging-structure entry may need to invalidate entries in the
TLBs and paging-structure caches that were derived from the modified entry before it was modified. I n a system
containing more than one logical processor, software must account for the fact that there may be entries in the
TLBs and paging-structure caches of logical processors other than the one used to modify the paging-structure
entry. The process of propagating the changes to a paging-structure entry is commonly referred to as “TLB shoot-
down.”
TLB shootdown can be done using memory-based semaphores and/ or interprocessor interrupts (I PI ). The
following items describe a simple but inefficient example of a TLB shootdown algorithm for processors supporting
the I ntel-64 and I A-32 architectures:
1. Begin barrier: Stop all but one logical processor; that is, cause all but one to execute the HLT instruction or to
enter a spin loop.
2. Allow the active logical processor to change the necessary paging-structure entries.
3. Allow all logical processors to perform invalidations appropriate to the modifications to the paging-structure
entries.
4. Allow all logical processors to resume normal operation.
Alternative, performance-optimized, TLB shootdown algorithms may be developed; however, software developers
must take care to ensure that the following conditions are met:
• All logical processors that are using the paging structures that are being modified must participate and perform
appropriate invalidations after the modifications are made.
• I f the modifications to the paging-structure entries are made before the barrier or if there is no barrier, the
operating system must ensure one of the following: (1) that the affected linear-address range is not used
between the time of modification and the time of invalidation; or (2) that it is prepared to deal with the conse-
quences of the affected linear-address range being used during that period. For example, if the operating
system does not allow pages being freed to be reallocated for another purpose until after the required invalida-
tions, writes to those pages by errant software will not unexpectedly modify memory that is in use.
• Software must be prepared to deal with reads, instruction fetches, and prefetch requests to the affected linear-
address range that are a result of speculative execution that would never actually occur in the executed code
path.
When multiple logical processors are using the same linear-address space at the same time, they must coordinate
before any request to modify the paging-structure entries that control that linear-address space. I n these cases,
the barrier in the TLB shootdown routine may not be required. For example, when freeing a range of linear
addresses, some other mechanism can assure no logical processor is using that range before the request to free it
is made. I n this case, a logical processor freeing the range can clear the P flags in the PTEs associated with the
range, free the physical page frames associated with the range, and then signal the other logical processors using
that linear-address space to perform the necessary invalidations. All the affected logical processors must complete
their invalidations before the linear-address range and the physical page frames previously associated with that
range can be reallocated.

Vol. 3A 4-43
PAGING

4.11 INTERACTIONS WITH VIRTUAL-MACHINE EXTENSIONS (VMX)


The architecture for virtual-machine extensions (VMX) includes features that interact with paging. Section 4.11.1
discusses ways in which VMX-specific control transfers, called VMX transitions specially affect paging. Section
4.11.2 gives an overview of VMX features specifically designed to support address translation.

4.11.1 VMX Transitions


The VMX architecture defines two control transfers called VM entries and VM exits; collectively, these are called
VMX transitions. VM entries and VM exits are described in detail in Chapter 26 and Chapter 27, respectively, in
the I ntel® 64 and I A-32 Architectures Software Developer’s Manual, Volume 3C. The following items identify
paging-related details:
• VMX transitions modify the CR0 and CR4 registers and the I A32_ EFER MSR concurrently. For this reason, they
allow transitions between paging modes that would not otherwise be possible:
— VM entries allow transitions from I A-32e paging directly to either 32-bit paging or PAE paging.
— VM exits allow transitions from either 32-bit paging or PAE paging directly to I A-32e paging.
• VMX transitions that result in PAE paging load the PDPTE registers (see Section 4.4.1) as follows:
— VM entries load the PDPTE registers either from the physical address being loaded into CR3 or from the
virtual-machine control structure (VMCS); see Section 26.3.2.4.
— VM exits load the PDPTE registers from the physical address being loaded into CR3; see Section 27.5.4.
• VMX transitions invalidate the TLBs and paging-structure caches based on certain control settings. See Section
26.3.2.5 and Section 27.5.5 in the I ntel® 64 and I A-32 Architectures Software Developer’s Manual, Volume
3C.

4.11.2 VMX Support for Address Translation


Chapter 28, “VMX Support for Address Translation,” in the I ntel® 64 and I A-32 Architectures Software Developer’s
Manual, Volume 3C describe two features of the virtual-machine extensions (VMX) that interact directly with
paging. These are virtual-processor identifiers (VPI Ds) and the extended page table mechanism (EPT).
VPI Ds provide a way for software to identify to the processor the address spaces for different “virtual processors.”
The processor may use this identification to maintain concurrently information for multiple address spaces in its
TLBs and paging-structure caches, even when non-zero PCI Ds are not being used. See Section 28.1 for details.
When EPT is in use, the addresses in the paging-structures are not used as physical addresses to access memory
and memory-mapped I / O. I nstead, they are treated as guest-physical addresses and are translated through a set
of EPT paging structures to produce physical addresses. EPT can also specify its own access rights and memory
typing; these are used on conjunction with those specified in this chapter. See Section 28.2 for more information.
Both VPI Ds and EPT may change the way that a processor maintains information in TLBs and paging structure
caches and the ways in which software can manage that information. Some of the behaviors documented in Section
4.10 may change. See Section 28.3 for details.

4.12 USING PAGING FOR VIRTUAL MEMORY


With paging, portions of the linear-address space need not be mapped to the physical-address space; data for the
unmapped addresses can be stored externally (e.g., on disk). This method of mapping the linear-address space is
referred to as virtual memory or demand-paged virtual memory.
Paging divides the linear address space into fixed-size pages that can be mapped into the physical-address space
and/ or external storage. When a program (or task) references a linear address, the processor uses paging to trans-
late the linear address into a corresponding physical address if such an address is defined.
I f the page containing the linear address is not currently mapped into the physical-address space, the processor
generates a page-fault exception as described in Section 4.7. The handler for page-fault exceptions typically

4-44 Vol. 3A
PAGING

directs the operating system or executive to load data for the unmapped page from external storage into physical
memory (perhaps writing a different page from physical memory out to external storage in the process) and to
map it using paging (by updating the paging structures). When the page has been loaded into physical memory, a
return from the exception handler causes the instruction that generated the exception to be restarted.
Paging differs from segmentation through its use of fixed-size pages. Unlike segments, which usually are the same
size as the code or data structures they hold, pages have a fixed size. I f segmentation is the only form of address
translation used, a data structure present in physical memory will have all of its parts in memory. I f paging is used,
a data structure can be partly in memory and partly in disk storage.

4.13 MAPPING SEGMENTS TO PAGES


The segmentation and paging mechanisms provide in the support a wide variety of approaches to memory
management. When segmentation and paging are combined, segments can be mapped to pages in several ways.
To implement a flat (unsegmented) addressing environment, for example, all the code, data, and stack modules
can be mapped to one or more large segments (up to 4-GBytes) that share same range of linear addresses (see
Figure 3-2 in Section 3.2.2). Here, segments are essentially invisible to applications and the operating-system or
executive. I f paging is used, the paging mechanism can map a single linear-address space (contained in a single
segment) into virtual memory. Alternatively, each program (or task) can have its own large linear-address space
(contained in its own segment), which is mapped into virtual memory through its own paging structures.
Segments can be smaller than the size of a page. I f one of these segments is placed in a page which is not shared
with another segment, the extra memory is wasted. For example, a small data structure, such as a 1-Byte sema-
phore, occupies 4 KBytes if it is placed in a page by itself. I f many semaphores are used, it is more efficient to pack
them into a single page.
The I ntel-64 and I A-32 architectures do not enforce correspondence between the boundaries of pages and
segments. A page can contain the end of one segment and the beginning of another. Similarly, a segment can
contain the end of one page and the beginning of another.
Memory-management software may be simpler and more efficient if it enforces some alignment between page and
segment boundaries. For example, if a segment which can fit in one page is placed in two pages, there may be
twice as much paging overhead to support access to that segment.
One approach to combining paging and segmentation that simplifies memory-management software is to give
each segment its own page table, as shown in Figure 4-13. This convention gives the segment a single entry in the
page directory, and this entry provides the access control information for paging the entire segment.

Page Frames

LDT Page Directory Page Tables

PTE
PTE
PTE
Seg. Descript. PDE
Seg. Descript. PDE

PTE
PTE

Figure 4-13. Memory Management Convention That Assigns a Page Table


to Each Segment

Vol. 3A 4-45
PAGING

4-46 Vol. 3A

You might also like