Intel Architecture Software Developer's Manuals, Volume 3: System Programming
Software Developer’s
Manual
Volume 3:
System Programming
1999
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel
or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s Terms and
Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied
warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular
purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are
not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or
“undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or
incompatibilities arising from future changes to them.
Intel’s Intel Architecture processors (e.g., Pentium®, Pentium® II, Pentium® III, and Pentium® Pro processors) may
contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your
product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature,
may be obtained by calling 1-800-548-4725, or by visiting Intel's literature center at http://www.intel.com.
CHAPTER 1
ABOUT THIS MANUAL
1.1. P6 FAMILY PROCESSOR TERMINOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER’S MANUAL,
VOLUME 3: SYSTEM PROGRAMMING GUIDE . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.3. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER’S MANUAL,
VOLUME 1: BASIC ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.4. OVERVIEW OF THE INTEL ARCHITECTURE SOFTWARE DEVELOPER’S MANUAL,
VOLUME 2: INSTRUCTION SET REFERENCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
1.5. NOTATIONAL CONVENTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
1.5.1. Bit and Byte Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-6
1.5.2. Reserved Bits and Software Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-6
1.5.3. Instruction Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-7
1.5.4. Hexadecimal and Binary Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-7
1.5.5. Segmented Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-7
1.5.6. Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-8
1.6. RELATED LITERATURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
2.1. OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE . . . . . . . . . . . . . . . . . . . 2-1
2.1.1. Global and Local Descriptor Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3
2.1.2. System Segments, Segment Descriptors, and Gates . . . . . . . . . . . . . . . . . . . . . .2-3
2.1.3. Task-State Segments and Task Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4
2.1.4. Interrupt and Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4
2.1.5. Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-5
2.1.6. System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-5
2.1.7. Other System Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-6
2.2. MODES OF OPERATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.3. SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER . . . . . . . . . . . . . . . . 2-8
2.4. MEMORY-MANAGEMENT REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
2.4.1. Global Descriptor Table Register (GDTR). . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-10
2.4.2. Local Descriptor Table Register (LDTR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
2.4.3. IDTR Interrupt Descriptor Table Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
2.4.4. Task Register (TR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
2.5. CONTROL REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
2.5.1. CPUID Qualification of Control Register Flags . . . . . . . . . . . . . . . . . . . . . . . . . .2-18
2.6. SYSTEM INSTRUCTION SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
2.6.1. Loading and Storing System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-20
2.6.2. Verifying Access Privileges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-20
2.6.3. Loading and Storing Debug Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-21
2.6.4. Invalidating Caches and TLBs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-21
2.6.5. Controlling the Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-22
2.6.6. Reading Performance-Monitoring and Time-Stamp Counters . . . . . . . . . . . . . .2-22
2.6.7. Reading and Writing Model-Specific Registers . . . . . . . . . . . . . . . . . . . . . . . . . .2-23
2.6.8. Loading and Storing the Streaming SIMD Extensions Control/Status Word . . . .2-23
TABLE OF CONTENTS
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
3.1. MEMORY MANAGEMENT OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.2. USING SEGMENTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
3.2.1. Basic Flat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3
3.2.2. Protected Flat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-4
3.2.3. Multisegment Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-5
3.2.4. Paging and Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6
3.3. PHYSICAL ADDRESS SPACE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
3.4. LOGICAL AND LINEAR ADDRESSES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
3.4.1. Segment Selectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-7
3.4.2. Segment Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-8
3.4.3. Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-9
3.4.3.1. Code- and Data-Segment Descriptor Types. . . . . . . . . . . . . . . . . . . . . . . . . .3-13
3.5. SYSTEM DESCRIPTOR TYPES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
3.5.1. Segment Descriptor Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-16
3.6. PAGING (VIRTUAL MEMORY) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-18
3.6.1. Paging Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-19
3.6.2. Page Tables and Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-20
3.6.2.1. Linear Address Translation (4-KByte Pages) . . . . . . . . . . . . . . . . . . . . . . . . .3-20
3.6.2.2. Linear Address Translation (4-MByte Pages). . . . . . . . . . . . . . . . . . . . . . . . .3-21
3.6.2.3. Mixing 4-KByte and 4-MByte Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-22
3.6.3. Base Address of the Page Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-23
3.6.4. Page-Directory and Page-Table Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-23
3.6.5. Not Present Page-Directory and Page-Table Entries . . . . . . . . . . . . . . . . . . . . .3-28
3.7. TRANSLATION LOOKASIDE BUFFERS (TLBS) . . . . . . . . . . . . . . . . . . . . . . . . . 3-28
3.8. PHYSICAL ADDRESS EXTENSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-29
3.8.1. Linear Address Translation With Extended
Addressing Enabled (4-KByte Pages) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-30
3.8.2. Linear Address Translation With Extended Addressing Enabled
(2-MByte or 4-MByte Pages) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-32
3.8.3. Accessing the Full Extended Physical Address Space With the
Extended Page-Table Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-32
3.8.4. Page-Directory and Page-Table Entries With Extended Addressing Enabled . .3-33
3.9. 36-BIT PAGE SIZE EXTENSION (PSE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-35
3.9.1. Description of the 36-bit PSE Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-36
3.9.2. Fault Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-39
3.10. MAPPING SEGMENTS TO PAGES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-40
CHAPTER 4
PROTECTION
4.1. ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION . . . . . . . . . . 4-2
4.2. FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND
PAGE-LEVEL PROTECTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.3. LIMIT CHECKING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
4.4. TYPE CHECKING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
4.4.1. Null Segment Selector Checking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7
4.5. PRIVILEGE LEVELS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
4.6. PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS . . . . . . 4-9
4.6.1. Accessing Data in Code Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-12
4.7. PRIVILEGE LEVEL CHECKING WHEN LOADING THE SS REGISTER . . . . . . . 4-12
CHAPTER 5
INTERRUPT AND EXCEPTION HANDLING
5.1. INTERRUPT AND EXCEPTION OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
5.1.1. Sources of Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
5.1.1.1. External Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
5.1.1.2. Maskable Hardware Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
5.1.1.3. Software-Generated Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.1.2. Sources of Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.1.2.1. Program-Error Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.1.2.2. Software-Generated Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
5.1.2.3. Machine-Check Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
5.2. EXCEPTION AND INTERRUPT VECTORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
5.3. EXCEPTION CLASSIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
5.4. PROGRAM OR TASK RESTART. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
5.5. NONMASKABLE INTERRUPT (NMI). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.5.1. Handling Multiple NMIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.6. ENABLING AND DISABLING INTERRUPTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.6.1. Masking Maskable Hardware Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
5.6.2. Masking Instruction Breakpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
5.6.3. Masking Exceptions and Interrupts When Switching Stacks . . . . . . . . . . . . . . . 5-10
5.7. PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND INTERRUPTS . . . . . 5-10
5.8. INTERRUPT DESCRIPTOR TABLE (IDT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
5.9. IDT DESCRIPTORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13
5.10. EXCEPTION AND INTERRUPT HANDLING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5.10.1. Exception- or Interrupt-Handler Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5.10.1.1. Protection of Exception- and Interrupt-Handler Procedures . . . . . . . . . . . . . 5-17
5.10.1.2. Flag Usage By Exception- or Interrupt-Handler Procedure. . . . . . . . . . . . . . 5-18
CHAPTER 6
TASK MANAGEMENT
6.1. TASK MANAGEMENT OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.1.1. Task Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-1
6.1.2. Task State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-2
6.1.3. Executing a Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-3
6.2. TASK MANAGEMENT DATA STRUCTURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
6.2.1. Task-State Segment (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-4
6.2.2. TSS Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-6
6.2.3. Task Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-8
6.2.4. Task-Gate Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-8
6.3. TASK SWITCHING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6.4. TASK LINKING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
6.4.1. Use of Busy Flag To Prevent Recursive Task Switching . . . . . . . . . . . . . . . . . .6-16
6.4.2. Modifying Task Linkages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-16
6.5. TASK ADDRESS SPACE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-17
6.5.1. Mapping Tasks to the Linear and Physical Address Spaces. . . . . . . . . . . . . . . .6-17
6.5.2. Task Logical Address Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-18
6.6. 16-BIT TASK-STATE SEGMENT (TSS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19
CHAPTER 7
MULTIPLE-PROCESSOR MANAGEMENT
7.1. LOCKED ATOMIC OPERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
7.1.1. Guaranteed Atomic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-2
7.1.2. Bus Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-3
7.1.2.1. Automatic Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-3
7.1.2.2. Software Controlled Bus Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-4
7.1.3. Handling Self- and Cross-Modifying Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-5
7.1.4. Effects of a LOCK Operation on Internal Processor Caches. . . . . . . . . . . . . . . . .7-6
7.2. MEMORY ORDERING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
7.2.1. Memory Ordering in the Pentium® and Intel486™ Processors. . . . . . . . . . . . . . .7-7
7.2.2. Memory Ordering in the P6 Family Processors. . . . . . . . . . . . . . . . . . . . . . . . . . .7-7
7.2.3. Out of Order Stores From String Operations in P6 Family Processors . . . . . . . . .7-9
7.2.4. Strengthening or Weakening the Memory Ordering Model . . . . . . . . . . . . . . . . . .7-9
7.3. PROPAGATION OF PAGE TABLE ENTRY CHANGES TO
MULTIPLE PROCESSORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11
7.4. SERIALIZING INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11
7.5. ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC). . . . . . . . . 7-13
7.5.1. Presence of APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-14
7.5.2. Enabling or Disabling the Local APIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-14
7.5.3. APIC Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-14
7.5.4. Valid Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-15
7.5.5. Interrupt Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-15
7.5.6. Bus Arbitration Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-15
7.5.7. The Local APIC Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-16
7.5.8. Relocation of the APIC Registers Base Address. . . . . . . . . . . . . . . . . . . . . . . . .7-19
7.5.9. Interrupt Destination and APIC ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-20
7.5.9.1. Physical Destination Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-20
CHAPTER 8
PROCESSOR MANAGEMENT AND INITIALIZATION
8.1. INITIALIZATION OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1
8.1.1. Processor State After Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.1.2. Processor Built-In Self-Test (BIST) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.1.3. Model and Stepping Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
8.1.4. First Instruction Executed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
8.2. FPU INITIALIZATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
8.2.1. Configuring the FPU Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
8.2.2. Setting the Processor for FPU Software Emulation. . . . . . . . . . . . . . . . . . . . . . . 8-8
8.3. CACHE ENABLING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
8.4. MODEL-SPECIFIC REGISTERS (MSRS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
8.5. MEMORY TYPE RANGE REGISTERS (MTRRS) . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9
8.6. SOFTWARE INITIALIZATION FOR REAL-ADDRESS MODE OPERATION . . . . 8-10
CHAPTER 9
MEMORY CACHE CONTROL
9.1. INTERNAL CACHES, TLBS, AND BUFFERS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9.2. CACHING TERMINOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4
9.3. METHODS OF CACHING AVAILABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.3.1. Buffering of Write Combining Memory Locations . . . . . . . . . . . . . . . . . . . . . . . . .9-7
9.3.2. Choosing a Memory Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-8
9.4. CACHE CONTROL PROTOCOL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9
9.5. CACHE CONTROL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9
9.5.1. Precedence of Cache Controls (P6 Family Processor) . . . . . . . . . . . . . . . . . . . .9-13
9.5.2. Preventing Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-14
9.6. CACHE MANAGEMENT INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15
9.7. SELF-MODIFYING CODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15
9.8. IMPLICIT CACHING (P6 FAMILY PROCESSORS) . . . . . . . . . . . . . . . . . . . . . . . 9-16
9.9. EXPLICIT CACHING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-16
9.10. INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS) . . . . . . . . 9-17
9.11. WRITE BUFFER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
9.12. MEMORY TYPE RANGE REGISTERS (MTRRS) . . . . . . . . . . . . . . . . . . . . . . . . . 9-18
9.12.1. MTRR Feature Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-20
9.12.2. Setting Memory Ranges with MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-21
CHAPTER 10
MMX™ TECHNOLOGY SYSTEM PROGRAMMING
10.1. EMULATION OF THE MMX™ INSTRUCTION SET . . . . . . . . . . . . . . . . . . . . . . . 10-1
10.2. THE MMX™ STATE AND MMX™ REGISTER ALIASING . . . . . . . . . . . . . . . . . . 10-1
10.2.1. Effect of MMX™ and Floating-Point Instructions on the FPU Tag Word . . . . . . 10-3
10.3. SAVING AND RESTORING THE MMX™ STATE AND REGISTERS . . . . . . . . . . 10-4
10.4. DESIGNING OPERATING SYSTEM TASK AND CONTEXT
SWITCHING FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-5
10.4.1. Using the TS Flag in Control Register CR0 to Control MMX™/FPU
State Saving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-5
10.5. EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING
MMX™ INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
10.5.1. Effect of MMX™ Instructions on Pending Floating-Point Exceptions . . . . . . . . 10-8
10.6. DEBUGGING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8
CHAPTER 11
STREAMING SIMD EXTENSIONS SYSTEM PROGRAMMING
11.1. EMULATION OF THE STREAMING SIMD EXTENSIONS . . . . . . . . . . . . . . . . . . 11-1
11.2. MMX™ STATE AND STREAMING SIMD EXTENSIONS . . . . . . . . . . . . . . . . . . . 11-1
11.3. NEW PENTIUM® III PROCESSOR REGISTERS . . . . . . . . . . . . . . . . . . . . . . . . . 11-1
11.3.1. SIMD Floating-point Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
11.3.2. SIMD Floating-point Control/Status Registers . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
11.3.2.1. Rounding Control Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
11.3.2.2. Flush-to-Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-5
11.4. ENABLING STREAMING SIMD EXTENSIONS SUPPORT. . . . . . . . . . . . . . . . . . 11-6
11.4.1. Enabling Streaming SIMD Extensions Support . . . . . . . . . . . . . . . . . . . . . . . . . 11-6
11.4.2. Device Not Available (DNA) Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-6
11.4.3. FXSAVE/FXRSTOR as a Replacement for FSAVE/FRSTOR. . . . . . . . . . . . . . 11-7
11.4.4. Numeric Error flag and IGNNE# . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
11.5. SAVING AND RESTORING THE STREAMING SIMD EXTENSIONS STATE . . . 11-7
11.6. DESIGNING OPERATING SYSTEM TASK AND CONTEXT
SWITCHING FACILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
11.6.1. Using the TS Flag in Control Register CR0 to Control SIMD Floating-Point
State Saving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-8
11.7. EXCEPTIONS THAT CAN OCCUR WHEN EXECUTING STREAMING SIMD
EXTENSIONS INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11
11.7.1. SIMD Floating-point Non-Numeric Exceptions . . . . . . . . . . . . . . . . . . . . . . . . .11-12
11.7.2. SIMD Floating-point Numeric Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-13
11.7.2.1. Exception Priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-13
11.7.2.2. Automatic Masked Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-14
11.7.2.3. Software Exception Handling - Unmasked Exceptions. . . . . . . . . . . . . . . . .11-15
11.7.2.4. Interaction with x87 numeric exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . .11-16
11.7.3. SIMD Floating-point Numeric Exception Conditions and
Masked/Unmasked Responses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-16
11.7.3.1. Invalid Operation Exception (#IA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-17
11.7.3.2. Division-By-Zero Exception (#Z). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-18
11.7.3.3. Denormal Operand Exception (#D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-19
11.7.3.4. Numeric Overflow Exception (#O) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-19
11.7.3.5. Numeric Underflow Exception (#U) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-20
11.7.3.6. Inexact Result (Precision) Exception (#P) . . . . . . . . . . . . . . . . . . . . . . . . . .11-21
11.7.4. Effect of Streaming SIMD Extensions Instructions on Pending
Floating-Point Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-22
11.8. DEBUGGING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-22
CHAPTER 12
SYSTEM MANAGEMENT MODE (SMM)
12.1. SYSTEM MANAGEMENT MODE OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
12.2. SYSTEM MANAGEMENT INTERRUPT (SMI) . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
12.3. SWITCHING BETWEEN SMM AND THE OTHER PROCESSOR
OPERATING MODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
12.3.1. Entering SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-2
12.3.1.1. Exiting From SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-3
12.4. SMRAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4
12.4.1. SMRAM State Save Map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-5
12.4.2. SMRAM Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-7
12.5. SMI HANDLER EXECUTION ENVIRONMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-8
12.6. EXCEPTIONS AND INTERRUPTS WITHIN SMM . . . . . . . . . . . . . . . . . . . . . . . 12-10
12.7. NMI HANDLING WHILE IN SMM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-11
12.8. SAVING THE FPU STATE WHILE IN SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-11
12.9. SMM REVISION IDENTIFIER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-12
12.10. AUTO HALT RESTART . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-13
12.10.1. Executing the HLT Instruction in SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-14
12.11. SMBASE RELOCATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-14
12.11.1. Relocating SMRAM to an Address Above 1 MByte. . . . . . . . . . . . . . . . . . . . . .12-15
12.12. I/O INSTRUCTION RESTART . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-15
12.12.1. Back-to-Back SMI Interrupts When I/O Instruction Restart Is Being Used . . . .12-16
12.13. SMM MULTIPLE-PROCESSOR CONSIDERATIONS. . . . . . . . . . . . . . . . . . . . . 12-17
CHAPTER 13
MACHINE-CHECK ARCHITECTURE
13.1. MACHINE-CHECK EXCEPTIONS AND ARCHITECTURE . . . . . . . . . . . . . . . . . . 13-1
13.2. COMPATIBILITY WITH PENTIUM® PROCESSOR . . . . . . . . . . . . . . . . . . . . . . . 13-1
13.3. MACHINE-CHECK MSRS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
13.3.1. Machine-Check Global Control MSRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-2
CHAPTER 14
CODE OPTIMIZATION
14.1. CODE OPTIMIZATION GUIDELINES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-1
14.1.1. General Code Optimization Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-1
14.1.2. Guidelines for Optimizing MMX™ Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
14.1.3. Guidelines for Optimizing Floating-Point Code . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
14.1.4. Guidelines for Optimizing SIMD Floating-point Code . . . . . . . . . . . . . . . . . . . . 14-3
14.2. BRANCH PREDICTION OPTIMIZATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-4
14.2.1. Branch Prediction Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-4
14.2.2. Optimizing Branch Predictions in Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-5
14.2.3. Eliminating and Reducing the Number of Branches . . . . . . . . . . . . . . . . . . . . . 14-5
14.3. REDUCING PARTIAL REGISTER STALLS ON P6 FAMILY PROCESSORS. . . . 14-7
14.4. ALIGNMENT RULES AND GUIDELINES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-9
14.4.1. Alignment Penalties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-9
14.4.2. Code Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-9
14.4.3. Data Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-9
14.4.3.1. Alignment of Data Structures and Arrays Greater Than 32 Bytes . . . . . . . 14-10
14.4.3.2. Alignment of Data in Memory and on the Stack . . . . . . . . . . . . . . . . . . . . . 14-10
14.5. INSTRUCTION SCHEDULING OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.1. Instruction Pairing Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.1.1. General Pairing Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-12
14.5.1.2. Integer Pairing Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
14.5.1.3. MMX™ Instruction Pairing Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-17
14.5.2. Pipelining Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-18
14.5.2.1. MMX™ Instruction Pipelining Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 14-18
14.5.2.2. Floating-Point Pipelining Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-18
14.5.3. Scheduling Rules for P6 Family Processors . . . . . . . . . . . . . . . . . . . . . . . . . . 14-22
14.6. ACCESSING MEMORY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-24
14.6.1. Using MMX™ Instructions That Access Memory. . . . . . . . . . . . . . . . . . . . . . . 14-24
14.6.2. Partial Memory Accesses With MMX™ Instructions . . . . . . . . . . . . . . . . . . . . 14-25
14.6.3. Write Allocation Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-27
TABLE OF CONTENTS
CHAPTER 15
DEBUGGING AND PERFORMANCE MONITORING
15.1. OVERVIEW OF THE DEBUGGING SUPPORT FACILITIES . . . . . . . . . . . . . . . . 15-1
15.2. DEBUG REGISTERS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-2
15.2.1. Debug Address Registers (DR0-DR3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-4
15.2.2. Debug Registers DR4 and DR5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-4
15.2.3. Debug Status Register (DR6) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-4
15.2.4. Debug Control Register (DR7) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-5
15.2.5. Breakpoint Field Recognition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-6
15.3. DEBUG EXCEPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-7
15.3.1. Debug Exception (#DB)—Interrupt Vector 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-8
15.3.1.1. Instruction-Breakpoint Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . .15-8
15.3.1.2. Data Memory and I/O Breakpoint Exception Conditions . . . . . . . . . . . . . . . .15-9
15.3.1.3. General-Detect Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-10
15.3.1.4. Single-Step Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-10
15.3.1.5. Task-Switch Exception Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-11
15.3.2. Breakpoint Exception (#BP)—Interrupt Vector 3 . . . . . . . . . . . . . . . . . . . . . . . .15-11
15.4. LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING . . . . . . . . . . . . 15-11
15.4.1. DebugCtlMSR Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-11
15.4.2. Last Branch and Last Exception MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-13
15.4.3. Monitoring Branches, Exceptions, and Interrupts . . . . . . . . . . . . . . . . . . . . . . .15-13
15.4.4. Single-Stepping on Branches, Exceptions, and Interrupts . . . . . . . . . . . . . . . .15-14
15.4.5. Initializing Last Branch or Last Exception/Interrupt Recording . . . . . . . . . . . . .15-14
15.5. TIME-STAMP COUNTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-14
15.6. PERFORMANCE-MONITORING COUNTERS . . . . . . . . . . . . . . . . . . . . . . . . . . 15-15
15.6.1. P6 Family Processor Performance-Monitoring Counters . . . . . . . . . . . . . . . . .15-15
15.6.1.1. PerfEvtSel0 and PerfEvtSel1 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-16
15.6.1.2. PerfCtr0 and PerfCtr1 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-18
15.6.1.3. Starting and Stopping the Performance-Monitoring Counters . . . . . . . . . . .15-18
15.6.1.4. Event and Time-Stamp Monitoring Software . . . . . . . . . . . . . . . . . . . . . . . .15-18
15.6.2. Monitoring Counter Overflow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-19
15.6.3. Pentium® Processor Performance-Monitoring Counters. . . . . . . . . . . . . . . . . .15-20
15.6.3.1. Control and Event Select Register (CESR) . . . . . . . . . . . . . . . . . . . . . . . . .15-20
15.6.3.2. Use of the Performance-Monitoring Pins . . . . . . . . . . . . . . . . . . . . . . . . . . .15-21
15.6.3.3. Events Counted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-22
CHAPTER 16
8086 EMULATION
16.1. REAL-ADDRESS MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-1
16.1.1. Address Translation in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-3
16.1.2. Registers Supported in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-4
16.1.3. Instructions Supported in Real-Address Mode . . . . . . . . . . . . . . . . . . . . . . . . . .16-4
16.1.4. Interrupt and Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-6
16.2. VIRTUAL-8086 MODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-9
16.2.1. Enabling Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-9
16.2.2. Structure of a Virtual-8086 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-9
16.2.3. Paging of Virtual-8086 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-10
CHAPTER 17
MIXING 16-BIT AND 32-BIT CODE
17.1. DEFINING 16-BIT AND 32-BIT PROGRAM MODULES . . . . . . . . . . . . . . . . . . . . 17-2
17.2. MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A CODE SEGMENT. . . . . . 17-2
17.3. SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS . . . . . . . . . . . . . . . . 17-3
17.4. TRANSFERRING CONTROL AMONG MIXED-SIZE CODE SEGMENTS . . . . . . 17-4
17.4.1. Code-Segment Pointer Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-5
17.4.2. Stack Management for Control Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-5
17.4.2.1. Controlling the Operand-Size Attribute For a Call. . . . . . . . . . . . . . . . . . . . . 17-7
17.4.2.2. Passing Parameters With a Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-7
17.4.3. Interrupt Control Transfers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
17.4.4. Parameter Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
17.4.5. Writing Interface Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
CHAPTER 18
INTEL ARCHITECTURE COMPATIBILITY
18.1. INTEL ARCHITECTURE FAMILIES AND CATEGORIES . . . . . . . . . . . . . . . . . . . 18-1
18.2. RESERVED BITS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-1
18.3. ENABLING NEW FUNCTIONS AND MODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-2
18.4. DETECTING THE PRESENCE OF NEW FEATURES THROUGH SOFTWARE . 18-2
18.5. MMX™ TECHNOLOGY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-3
18.6. STREAMING SIMD EXTENSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-3
18.7. NEW INSTRUCTIONS IN THE PENTIUM® AND LATER INTEL
ARCHITECTURE PROCESSORS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-3
18.7.1. Instructions Added Prior to the Pentium® Processor. . . . . . . . . . . . . . . . . . . . . 18-5
18.8. OBSOLETE INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-5
18.9. UNDEFINED OPCODES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-6
APPENDIX A
PERFORMANCE-MONITORING EVENTS
A.1. P6 FAMILY PROCESSOR PERFORMANCE-MONITORING EVENTS . . . . . . . . . A-1
A.2. PENTIUM® PROCESSOR PERFORMANCE-MONITORING EVENTS . . . . . . . . A-12
APPENDIX B
MODEL-SPECIFIC REGISTERS
APPENDIX C
DUAL-PROCESSOR (DP) BOOTUP SEQUENCE EXAMPLE (SPECIFIC TO PENTIUM®
PROCESSORS)
C.1. PRIMARY PROCESSOR’S SEQUENCE OF EVENTS . . . . . . . . . . . . . . . . . . . . . . C-1
C.2. SECONDARY PROCESSOR’S SEQUENCE OF EVENTS FOLLOWING
RECEIPT OF START-UP IPI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-4
APPENDIX D
MULTIPLE-PROCESSOR (MP) BOOTUP SEQUENCE EXAMPLE (SPECIFIC TO P6 FAMILY
PROCESSORS)
D.1. BSP’S SEQUENCE OF EVENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
D.2. AP’S SEQUENCE OF EVENTS FOLLOWING RECEIPT OF START-UP IPI . . . . . D-3
APPENDIX E
PROGRAMMING THE LINT0 AND LINT1 INPUTS
E.1. CONSTANTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-1
E.2. LINT[0:1] PINS PROGRAMMING PROCEDURE . . . . . . . . . . . . . . . . . . . . . . . . . . E-1
TABLE OF FIGURES
Figure 9-8. Page Attribute Table Index Scheme for Paging Hierarchy . . . . . . . . . . . . . . 9-36
Figure 10-1. Mapping of MMX™ Registers to Floating-Point Registers . . . . . . . . . . . . . . 10-2
Figure 10-2. Example of MMX™/FPU State Saving During an
Operating System-Controlled Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6
Figure 10-3. Mapping of MMX™ Registers to Floating-Point (FP) Registers . . . . . . . . . . 10-9
Figure 11-1. Streaming SIMD Extensions Control/Status Register Format. . . . . . . . . . . . 11-3
Figure 11-2. Example of SIMD Floating-Point State Saving During an
Operating System-Controlled Task Switch . . . . . . . . . . . . . . . . . . . . . . . . . . 11-9
Figure 12-1. SMRAM Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-5
Figure 12-2. SMM Revision Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-13
Figure 12-3. Auto HALT Restart Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-13
Figure 12-4. SMBASE Relocation Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-15
Figure 12-5. I/O Instruction Restart Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-16
Figure 13-1. Machine-Check MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Figure 13-2. MCG_CAP Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-3
Figure 13-3. MCG_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-3
Figure 13-4. MCi_CTL Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4
Figure 13-5. MCi_STATUS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-5
Figure 13-6. Machine-Check Bank Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-6
Figure 14-1. Stack and Memory Layout of Static Variables . . . . . . . . . . . . . . . . . . . . . . 14-11
Figure 14-2. Pipeline Example of AGI Stall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-29
Figure 15-1. Debug Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-3
Figure 15-2. DebugCtlMSR Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-12
Figure 15-3. PerfEvtSel0 and PerfEvtSel1 MSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-17
Figure 15-4. CESR MSR (Pentium® Processor Only) . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-21
Figure 16-1. Real-Address Mode Address Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-4
Figure 16-2. Interrupt Vector Table in Real-Address Mode. . . . . . . . . . . . . . . . . . . . . . . . 16-7
Figure 16-3. Entering and Leaving Virtual-8086 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 16-12
Figure 16-4. Privilege Level 0 Stack After Interrupt or Exception in Virtual-8086 Mode . 16-18
Figure 16-5. Software Interrupt Redirection Bit Map in TSS . . . . . . . . . . . . . . . . . . . . . . 16-25
Figure 17-1. Stack after Far 16- and 32-Bit Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-6
Figure 18-1. I/O Map Base Address Differences. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-30
TABLE OF TABLES
Table 2-1. Action Taken for Combinations of EM, MP, TS, CR4.OSFXSR,
and CPUID.XMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-15
Table 2-2. Summary of System Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-19
Table 3-1. Code- and Data-Segment Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-14
Table 3-2. System-Segment and Gate-Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . .3-16
Table 3-3. Page Sizes and Physical Address Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-20
Table 3-4. Paging Modes and Physical Address Size . . . . . . . . . . . . . . . . . . . . . . . . . . .3-37
Table 4-1. Privilege Check Rules for Call Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-19
Table 4-2. Combined Page-Directory and Page-Table Protection. . . . . . . . . . . . . . . . . .4-33
Table 5-1. Protected-Mode Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . .5-6
Table 5-2. SIMD Floating-Point Exceptions Priority. . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-11
Table 5-3. Priority Among Simultaneous Exceptions and Interrupts . . . . . . . . . . . . . . . .5-12
Table 5-4. Interrupt and Exception Classes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-32
Table 5-5. Conditions for Generating a Double Fault . . . . . . . . . . . . . . . . . . . . . . . . . . .5-33
Table 5-6. Invalid TSS Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-35
Table 5-7. Alignment Requirements by Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-50
Table 6-1. Exception Conditions Checked During a Task Switch . . . . . . . . . . . . . . . . . .6-13
Table 6-2. Effect of a Task Switch on Busy Flag, NT Flag, Previous Task Link Field,
and TS Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-15
Table 7-1. Local APIC Register Address Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-18
Table 7-2. Valid Combinations for the APIC Interrupt Command Register . . . . . . . . . . .7-29
Table 7-3. EOI Message (14 Cycles) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-37
Table 7-4. Short Message (21 Cycles) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-38
Table 7-5. Nonfocused Lowest Priority Message (34 Cycles) . . . . . . . . . . . . . . . . . . . .7-39
Table 7-6. APIC Bus Status Cycles Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-40
Table 7-7. Types of Boot Phase IPIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-47
Table 7-8. Boot Phase IPI Message Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-47
Table 8-1. 32-Bit Intel Architecture Processor States Following Power-up,
Reset, or INIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-3
Table 8-2. Recommended Settings of EM and MP Flags on Intel
Architecture Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-7
Table 8-3. Software Emulation Settings of EM, MP, and NE Flags . . . . . . . . . . . . . . . . . .8-8
Table 8-4. Main Initialization Steps in STARTUP.ASM Source Listing . . . . . . . . . . . . . .8-18
Table 8-5. Relationship Between BLD Item and ASM Source File . . . . . . . . . . . . . . . . .8-31
Table 8-6. P6 Family Processor MSR Register Components . . . . . . . . . . . . . . . . . . . . .8-33
Table 8-7. Microcode Update Encoding Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-34
Table 8-8. Microcode Update Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-43
Table 8-9. Parameters for the Presence Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-44
Table 8-10. Parameters for the Write Update Data Function. . . . . . . . . . . . . . . . . . . . . . .8-45
Table 8-11. Parameters for the Control Update Sub-function . . . . . . . . . . . . . . . . . . . . . .8-48
Table 8-12. Mnemonic Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-48
Table 8-13. Parameters for the Read Microcode Update Data Function. . . . . . . . . . . . . .8-49
Table 8-14. Return Code Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-50
Table 9-1. Characteristics of the Caches, TLBs, and Write Buffer in
Intel Architecture Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-3
Table 9-2. Methods of Caching Available in P6 Family, Pentium®,
and Intel486™ Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-6
Table 9-3. MESI Cache Line States. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-9
Table 9-4. Cache Operating Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-11
Table 9-5. Effective Memory Type Depending on MTRR, PCD, and PWT Settings . . . .9-14
Table 9-6. MTRR Memory Types and Their Properties . . . . . . . . . . . . . . . . . . . . . . . . . .9-19
Table 9-7. Address Mapping for Fixed-Range MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . .9-23
Table 9-8. PAT Indexing and Values After Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-35
Table 9-9. Effective Memory Type Depending on MTRRs and PAT . . . . . . . . . . . . . . . .9-37
Table 9-10. PAT Memory Types and Their Properties . . . . . . . . . . . . . . . . . . . . . . . . . . .9-38
Table 10-1. Effects of MMX™ Instructions on FPU State . . . . . . . . . . . . . . . . . . . . . . . . .10-3
Table 10-2. Effect of the MMX™ and Floating-Point Instructions on the
FPU Tag Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-3
Table 11-1. SIMD Floating-point Register Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-2
Table 11-2. Rounding Control Field (RC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-4
Table 11-3. Rounding of Positive Numbers Greater than the
Maximum Positive Finite Value. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-5
Table 11-4. Rounding of Negative Numbers Smaller than the
Maximum Negative Finite Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-5
Table 11-5. CPUID Bits for Streaming SIMD Extensions Support . . . . . . . . . . . . . . . . . .11-6
Table 11-6. CR4 Bits for Streaming SIMD Extensions Support . . . . . . . . . . . . . . . . . . . .11-6
Table 11-7. Streaming SIMD Extensions Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-12
Table 11-8. Invalid Arithmetic Operations and the Masked Responses to Them . . . . . .11-18
Table 11-9. Masked Responses to Numeric Overflow . . . . . . . . . . . . . . . . . . . . . . . . . .11-20
Table 12-1. SMRAM State Save Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-5
Table 12-2. Processor Register Initialization in SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-9
Table 12-3. Auto HALT Restart Flag Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-14
Table 12-4. I/O Instruction Restart Field Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-16
Table 13-1. Simple Error Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-9
Table 13-2. General Forms of Compound Error Codes. . . . . . . . . . . . . . . . . . . . . . . . . . .13-9
Table 13-3. Encoding for TT (Transaction Type) Sub-Field. . . . . . . . . . . . . . . . . . . . . . .13-10
Table 13-4. Level Encoding for LL (Memory Hierarchy Level) Sub-Field . . . . . . . . . . . .13-10
Table 13-5. Encoding of Request (RRRR) Sub-Field . . . . . . . . . . . . . . . . . . . . . . . . . . .13-10
Table 13-6. Encodings of PP, T, and II Sub-Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-11
Table 13-7. Encoding of the MCi_STATUS Register for External Bus Errors . . . . . . . .13-11
Table 14-1. Small and Large General-Purpose Register Pairs . . . . . . . . . . . . . . . . . . . . .14-7
Table 14-2. Pairable Integer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-14
Table 15-1. Breakpointing Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-7
Table 15-2. Debug Exception Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-8
Table 16-1. Real-Address Mode Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . .16-8
Table 16-2. Software Interrupt Handling Methods While in Virtual-8086 Mode . . . . . . . .16-24
Table 17-1. Characteristics of 16-Bit and 32-Bit Program Modules. . . . . . . . . . . . . . . . . .17-1
Table 18-1. New Instructions in the Pentium® and Later Intel Architecture Processors . .18-3
Table 18-1. Recommended Values of the FP Related Bits for Intel486™ SX
Microprocessor/Intel 487 SX Math Coprocessor System . . . . . . . . . . . . . . .18-20
Table 18-2. EM and MP Flag Interpretation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18-20
Table A-1. Events That Can Be Counted with the P6 Family Performance-
Monitoring Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Table A-2. Events That Can Be Counted with the Pentium® Processor Performance-
Monitoring Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-12
Table B-1. Model-Specific Registers (MSRs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
CHAPTER 1
ABOUT THIS MANUAL
The Intel Architecture Software Developer’s Manual, Volume 3: System Programming Guide
(Order Number 243192) is part of a three-volume set that describes the architecture and
programming environment of all Intel Architecture processors. The other two volumes in this
set are:
• The Intel Architecture Software Developer’s Manual, Volume 1: Basic Architecture (Order
Number 243190).
• The Intel Architecture Software Developer’s Manual, Volume 2: Instruction Set Reference
(Order Number 243191).
The Intel Architecture Software Developer’s Manual, Volume 1, describes the basic architecture
and programming environment of an Intel Architecture processor; the Intel Architecture
Software Developer’s Manual, Volume 2, describes the instruction set of the processor and the
opcode structure. These two volumes are aimed at application programmers who are writing
programs to run under existing operating systems or executives. The Intel Architecture Software
Developer’s Manual, Volume 3, describes the operating-system support environment of an Intel
Architecture processor, including memory management, protection, task management, interrupt
and exception handling, and system management mode. It also provides Intel Architecture
processor compatibility information. This volume is aimed at operating-system and BIOS
designers and programmers.
manuals and lists related Intel manuals and documentation of interest to programmers and
hardware designers.
Chapter 2 — Introduction to the Intel Architecture. Introduces the Intel Architecture and the
families of Intel processors that are based on this architecture. It also gives an overview of the
common features found in these processors and a brief history of the Intel Architecture.
Chapter 3 — Basic Execution Environment. Introduces the models of memory organization
and describes the register set used by applications.
Chapter 4 — Procedure Calls, Interrupts, and Exceptions. Describes the procedure stack
and the mechanisms provided for making procedure calls and for servicing interrupts and
exceptions.
Chapter 5 — Data Types and Addressing Modes. Describes the data types and addressing
modes recognized by the processor.
Chapter 6 — Instruction Set Summary. Gives an overview of all the Intel Architecture
instructions except those executed by the processor’s floating-point unit. The instructions are
presented in functionally related groups.
Chapter 7 — Floating-Point Unit. Describes the Intel Architecture floating-point unit,
including the floating-point registers and data types; gives an overview of the floating-point
instruction set; and describes the processor’s floating-point exception conditions.
Chapter 8 — Programming with the Intel MMX™ Technology. Describes the Intel MMX™
technology, including MMX™ registers and data types, and gives an overview of the MMX™
instruction set.
Chapter 9 — Programming with the Streaming SIMD Extensions. Describes the Intel
Streaming SIMD Extensions, including the registers and data types.
Chapter 10 — Input/Output. Describes the processor’s I/O architecture, including I/O port
addressing, the I/O instructions, and the I/O protection mechanism.
Chapter 11 — Processor Identification and Feature Determination. Describes how to determine
the CPU type and the features that are available in the processor.
Appendix A — EFLAGS Cross-Reference. Summarizes how the Intel Architecture instructions
affect the flags in the EFLAGS register.
Appendix B — EFLAGS Condition Codes. Summarizes how the conditional jump, move, and
byte set on condition code instructions use the condition code flags (OF, CF, ZF, SF, and PF) in
the EFLAGS register.
Appendix C — Floating-Point Exceptions Summary. Summarizes the exceptions that can be
raised by floating-point instructions.
Appendix D — SIMD Floating-Point Exceptions Summary. Provides the Streaming SIMD
Extensions mnemonics, and the exceptions that each instruction can cause.
Appendix E — Guidelines for Writing FPU Exception Handlers. Describes how to design
and write MS-DOS* compatible exception handling facilities for FPU and SIMD floating-point
exceptions, including both software and hardware requirements and assembly-language code
examples. This appendix also describes general techniques for writing robust FPU exception
handlers.
Appendix F — Guidelines for Writing SIMD-FP Exception Handlers. Provides guidelines
for the Streaming SIMD Extensions instructions that can generate numeric (floating-point)
exceptions, and gives an overview of the necessary support for handling such exceptions.
NOTE
Avoid any software dependence upon the state of reserved bits in Intel Architecture
registers. Depending upon the values of reserved register bits will
make software dependent upon the unspecified manner in which the
processor handles these bits. Programs that depend upon reserved values risk
incompatibility with future processors.
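One common way to follow this guidance is a read-modify-write update that changes only the defined bits and writes every other bit back unchanged. The following is a minimal sketch under stated assumptions: the register value, the flag position, and the helper name update_field are hypothetical and do not describe any particular Intel Architecture register.

```python
def update_field(current: int, field_mask: int, new_bits: int) -> int:
    """Return `current` with only the bits selected by `field_mask` replaced.

    Bits outside `field_mask` (for example, reserved bits) are copied
    through unchanged, so the code never depends on or alters their state.
    """
    if new_bits & ~field_mask:
        raise ValueError("new_bits touches bits outside field_mask")
    return (current & ~field_mask) | new_bits


# Hypothetical 32-bit register: bit 0 is a defined flag and the remaining
# bits are treated as reserved for the purpose of this illustration.
HYPOTHETICAL_FLAG = 0x0000_0001

register_value = 0xA5A5_0000  # value as read, reserved bits in unknown state
updated = update_field(register_value, HYPOTHETICAL_FLAG, HYPOTHETICAL_FLAG)
```

The reserved bits of updated are identical to those of register_value; only the flag bit differs.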
[Figure: data structure layout. Bit offsets run from 31 (left) to 0 (right) across Byte 3, Byte 2, Byte 1, and Byte 0; byte offsets run from 0 at the lowest address to 28 at the highest address in steps of 4.]
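The byte ordering in the data-structure figure, with Byte 0 at the lowest address, can be illustrated with a short sketch. It assumes a 32-bit doubleword stored in little-endian order, which is the ordering used by Intel Architecture processors; the value 12345678H is an arbitrary example.

```python
import struct

value = 0x12345678

# "<I" packs an unsigned 32-bit integer in little-endian order, so the
# least-significant byte (Byte 0) is placed at the lowest address.
memory = struct.pack("<I", value)

# Index 0 is the lowest address; index 3 is the highest.
byte0, byte1, byte2, byte3 = memory

# Reassemble the doubleword from its bytes to confirm the ordering.
reassembled = byte0 | (byte1 << 8) | (byte2 << 16) | (byte3 << 24)
```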
refer to the code space, and stack addresses would always refer to the stack space. The following
notation is used to specify a byte address within a segment:
Segment-register:Byte-address
For example, the following segment address identifies the byte at address FF79H in the segment
pointed to by the DS register:
DS:FF79H
The following segment address identifies an instruction address in the code segment. The CS
register points to the code segment and the EIP register contains the address of the instruction.
CS:EIP
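The Segment-register:Byte-address notation can be read mechanically. The following sketch is an illustration only: the SegmentAddress type and the parse_segment_address function are hypothetical names, and the parser accepts only the literal hexadecimal form (such as DS:FF79H), not the symbolic form CS:EIP, whose byte address is held in a register.

```python
import re
from typing import NamedTuple


class SegmentAddress(NamedTuple):
    segment_register: str  # one of CS, DS, ES, FS, GS, SS
    byte_address: int      # offset of the byte within the segment


# Segment register, a colon, hexadecimal digits, and the H suffix that
# this manual uses to mark hexadecimal numbers.
_PATTERN = re.compile(r"^(CS|DS|ES|FS|GS|SS):([0-9A-F]+)H$")


def parse_segment_address(text: str) -> SegmentAddress:
    """Parse a literal segment address such as 'DS:FF79H'."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a literal segment address: {text!r}")
    return SegmentAddress(match.group(1), int(match.group(2), 16))


address = parse_segment_address("DS:FF79H")
```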
1.5.6. Exceptions
An exception is an event that typically occurs when an instruction causes an error. For example,
an attempt to divide by zero generates an exception. However, some exceptions, such as
breakpoints, occur under other conditions. Some types of exceptions may provide error codes. An
error code reports additional information about the error. An example of the notation used to
show an exception and error code is shown below.
#PF(fault code)
This example refers to a page-fault exception under conditions where an error code naming a
type of fault is reported. Under some conditions, exceptions which produce error codes may not
be able to report an accurate code. In this case, the error code is zero, as shown below for a
general-protection exception.
#GP(0)
Refer to Chapter 5, Interrupt and Exception Handling, for a list of exception mnemonics and
their descriptions.
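The exception notation can be split into a mnemonic and an optional error code. The sketch below is an illustration under stated assumptions: ExceptionRef and parse_exception are hypothetical names, and the VECTORS table lists only four mnemonics (#DB and #BP with the vector numbers given in this manual's contents, #GP and #PF with their standard vector numbers).

```python
import re
from typing import NamedTuple, Optional


class ExceptionRef(NamedTuple):
    mnemonic: str               # for example "PF" or "GP"
    error_code: Optional[str]   # text inside the parentheses, if any


# Partial table for illustration; Chapter 5 gives the complete list.
VECTORS = {"DB": 1, "BP": 3, "GP": 13, "PF": 14}

# "#", a two-letter mnemonic, and an optional parenthesized error code.
_PATTERN = re.compile(r"^#([A-Z]{2})(?:\((.*)\))?$")


def parse_exception(text: str) -> ExceptionRef:
    """Parse notation such as '#GP(0)' or '#PF(fault code)'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not exception notation: {text!r}")
    return ExceptionRef(match.group(1), match.group(2))


general_protection = parse_exception("#GP(0)")
page_fault = parse_exception("#PF(fault code)")
```

An error_code of None means the notation carried no parenthesized code, which is distinct from an error code of zero.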
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
The 32-bit members of the Intel Architecture family of processors provide extensive support for
operating-system and system-development software. This support is part of the processor’s
system-level architecture and includes features to assist in the following operations:
• Memory management
• Protection of software modules
• Multitasking
• Exception and interrupt handling
• Multiprocessing
• Cache management
• Hardware resource and power management
• Debugging and performance monitoring
This chapter provides a brief overview of the processor’s system-level architecture; a detailed
description of each part of this architecture is given in the following chapters. This chapter also
describes the system registers that are used to set up and control the processor at the system level
and gives a brief overview of the processor’s system-level (operating system) instructions.
Many of the system-level architectural features of the processor are used only by system
programmers. Application programmers may need to read this chapter, and the following
chapters which describe the use of these features, in order to understand the hardware facilities used
by system programmers to create a reliable and secure environment for application programs.
NOTE
This overview and most of the subsequent chapters of this book focus on the
“native” or protected-mode operation of the 32-bit Intel Architecture
processors. As described in Chapter 8, Processor Management and Initialization,
all Intel Architecture processors enter real-address mode following a
power-up or reset. Software must then initiate a switch from real-address
mode to protected mode.
[Figure: system-level registers and data structures. Labels recoverable from the original: the GDTR, IDTR, and LDTR registers; the Interrupt Descriptor Table (IDT) with interrupt, task, and trap gates; the Local Descriptor Table (LDT) with segment descriptors and call gates; the Task-State Segment (TSS) with TSS descriptors; interrupt handler, exception handler, and protected procedure code, each with its current TSS stack; and translation of a linear address to a physical address through page directory entries, page table entries, and pages.]
1. The word “procedure” is commonly used in this document as a general term for a logical unit or block of
code (such as a program, procedure, function, or routine). The term is not restricted to the definition of a
procedure in the Intel Architecture assembly language.
If the call requires a change in privilege level, the processor also switches to the stack for that
privilege level. (The segment selector for the new stack is obtained from the TSS for the
currently running task.) Gates also facilitate transitions between 16-bit and 32-bit code
segments, and vice versa.
registers varies among the different members of the Intel Architecture processor families.
Refer to Section 8.4., “Model-Specific Registers (MSRs)” in Chapter 8, Processor Management
and Initialization for more information about the MSRs and to Appendix B, Model-Specific
Registers for a complete list of the MSRs.
Most systems restrict access to all system registers (other than the EFLAGS register) by appli-
cation programs. Systems can be designed, however, where all programs and procedures run at
the most privileged level (privilege level 0), in which case application programs are allowed to
modify the system registers.
system management interrupt (SMI). In SMM, the processor switches to a separate address
space while saving the context of the currently running program or task. SMM-specific
code may then be executed transparently. Upon returning from SMM, the processor is
placed back into its state prior to the SMI.
• Virtual-8086 mode. In protected mode, the processor supports a quasi-operating mode
known as virtual-8086 mode. This mode allows the processor to execute 8086 software in
a protected, multitasking environment.
Figure 2-2 shows how the processor moves among these operating modes.
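[Figure 2-2: transitions among the operating modes. A reset places the processor in real-address
mode. PE=1 switches from real-address mode to protected mode; PE=0 or a reset switches back.
VM=1 switches from protected mode to virtual-8086 mode; VM=0 switches back. SMI# moves the
processor from real-address, protected, or virtual-8086 mode into system management mode;
RSM or a reset leaves system management mode.]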
The processor is placed in real-address mode following power-up or a reset. Thereafter, the PE
flag in control register CR0 controls whether the processor is operating in real-address or
protected mode (refer to Section 2.5., “Control Registers”). Refer to Section 8.8., “Mode
Switching” in Chapter 8, Processor Management and Initialization for detailed information on
switching between real-address mode and protected mode.
The VM flag in the EFLAGS register determines whether the processor is operating in protected
mode or virtual-8086 mode. Transitions between protected mode and virtual-8086 mode are
generally carried out as part of a task switch or a return from an interrupt or exception handler
(refer to Section 16.2.5., “Entering Virtual-8086 Mode” in Chapter 16, 8086 Emulation).
The processor switches to SMM whenever it receives an SMI while the processor is in real-
address, protected, or virtual-8086 modes. Upon execution of the RSM instruction, the
processor always returns to the mode it was in when the SMI occurred.
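The role of the PE and VM flags described above can be illustrated with a short C sketch. It classifies the operating mode from saved images of CR0 and EFLAGS; the type and function names are hypothetical, and SMM is not modeled because it is entered through an SMI rather than through a flag setting.

```c
#include <stdint.h>

#define CR0_PE    (1u << 0)    /* Protection Enable, bit 0 of CR0 */
#define EFLAGS_VM (1u << 17)   /* Virtual-8086 Mode, bit 17 of EFLAGS */

/* Hypothetical names used only for this illustration. */
typedef enum {
    MODE_REAL_ADDRESS,
    MODE_PROTECTED,
    MODE_VIRTUAL_8086
} cpu_mode_t;

/* Classify the mode from register images (not from the live registers). */
cpu_mode_t classify_mode(uint32_t cr0, uint32_t eflags)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL_ADDRESS;   /* PE = 0 */
    if (eflags & EFLAGS_VM)
        return MODE_VIRTUAL_8086;   /* PE = 1, VM = 1 */
    return MODE_PROTECTED;          /* PE = 1, VM = 0 */
}
```

The VM flag is examined only after the PE flag, which reflects the description of virtual-8086 mode as a quasi-operating mode of protected mode.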
[Figure: system flags in the EFLAGS register. Bits 31 through 22 are reserved (set to 0).]
  Bit 21        ID      Identification Flag
  Bit 20        VIP     Virtual Interrupt Pending
  Bit 19        VIF     Virtual Interrupt Flag
  Bit 18        AC      Alignment Check
  Bit 17        VM      Virtual-8086 Mode
  Bit 16        RF      Resume Flag
  Bit 14        NT      Nested Task Flag
  Bits 13:12    IOPL    I/O Privilege Level
  Bit 9         IF      Interrupt Enable Flag
  Bit 8         TF      Trap Flag
The remaining defined bits hold the OF, DF, SF, ZF, AF, PF, and CF flags; all other bits are
reserved.
IF Interrupt enable (bit 9). Controls the response of the processor to maskable hardware
interrupt requests (refer to Section 5.1.1.2., “Maskable Hardware Interrupts” in
Chapter 5, Interrupt and Exception Handling). Set to respond to maskable hardware
interrupts; cleared to inhibit maskable hardware interrupts. The IF flag does not affect
the generation of exceptions or nonmaskable interrupts (NMI interrupts). The CPL,
IOPL, and the state of the VME flag in control register CR4 determine whether the IF
flag can be modified by the CLI, STI, POPF, POPFD, and IRET instructions.
IOPL I/O privilege level field (bits 12 and 13). Indicates the I/O privilege level (IOPL) of
the currently running program or task. The CPL of the currently running program or
task must be less than or equal to the IOPL to access the I/O address space. This field
can only be modified by the POPF and IRET instructions when operating at a CPL of
0. Refer to Chapter 10, Input/Output, of the Intel Architecture Software Developer’s
Manual, Volume 1, for more information on the relationship of the IOPL to I/O opera-
tions.
The IOPL is also one of the mechanisms that controls the modification of the IF flag
and the handling of interrupts in virtual-8086 mode when the virtual mode extensions
are in effect (the VME flag in control register CR4 is set).
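As an illustration of the IOPL comparison described above, the following C sketch extracts the IOPL field from an EFLAGS image and applies the rule that the CPL must be less than or equal to the IOPL. The function names are hypothetical, and the sketch models only this comparison, not any other I/O protection mechanism.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IOPL_SHIFT 12u
#define EFLAGS_IOPL_MASK  (3u << EFLAGS_IOPL_SHIFT)   /* bits 12 and 13 */

/* Return the I/O privilege level (0 through 3) held in an EFLAGS image. */
unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* True when the CPL is numerically less than or equal to the IOPL. */
bool io_allowed_by_iopl(unsigned cpl, uint32_t eflags)
{
    return cpl <= eflags_iopl(eflags);
}
```

With an IOPL of 3, code at any privilege level passes the comparison; with an IOPL of 0, only code at CPL 0 does.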
NT Nested task (bit 14). Controls the chaining of interrupted and called tasks. The
processor sets this flag on calls to a task initiated with a CALL instruction, an interrupt,
or an exception. It examines and modifies this flag on returns from a task initiated with
the IRET instruction. The flag can be explicitly set or cleared with the POPF/POPFD
instructions; however, changing the state of this flag can generate unexpected exceptions
in application programs. Refer to Section 6.4., “Task Linking” in Chapter 6, Task
Management for more information on nested tasks.
RF Resume (bit 16). Controls the processor’s response to instruction-breakpoint conditions.
When set, this flag temporarily disables debug exceptions (#DB) from being
generated for instruction breakpoints, although other exception conditions can still
cause an exception to be generated. When clear, instruction breakpoints will generate
debug exceptions.
The primary function of the RF flag is to allow the restarting of an instruction following
a debug exception that was caused by an instruction breakpoint condition. Here,
debugger software must set this flag in the EFLAGS image on the stack just prior to
returning to the interrupted program with the IRETD instruction, to prevent the instruc-
tion breakpoint from causing another debug exception. The processor then automati-
cally clears this flag after the instruction returned to has been successfully executed,
enabling instruction breakpoint faults again.
Refer to Section 15.3.1.1., “Instruction-Breakpoint Exception Condition”, in Chapter
15, Debugging and Performance Monitoring, for more information on the use of this
flag.
VM Virtual-8086 mode (bit 17). Set to enable virtual-8086 mode; clear to return to
protected mode. Refer to Section 16.2.1., “Enabling Virtual-8086 Mode” in Chapter
16, 8086 Emulation for a detailed description of the use of this flag to switch to virtual-
8086 mode.
AC Alignment check (bit 18). Set this flag and the AM flag in the CR0 register to enable
alignment checking of memory references; clear the AC flag and/or the AM flag to
disable alignment checking. An alignment-check exception is generated when refer-
ence is made to an unaligned operand, such as a word at an odd byte address or a
doubleword at an address which is not an integral multiple of four. Alignment-check
exceptions are generated only in user mode (privilege level 3). Memory references that
default to privilege level 0, such as segment descriptor loads, do not generate this
exception even when caused by instructions executed in user-mode.
The alignment-check exception can be used to check alignment of data. This is useful
when exchanging data with other processors, which require all data to be aligned. The
alignment-check exception can also be used by interpreters to flag some pointers as
special by misaligning the pointer. This eliminates overhead of checking each pointer
and only handles the special pointer when used.
VIF Virtual Interrupt (bit 19). Contains a virtual image of the IF flag. This flag is used in
conjunction with the VIP flag. The processor only recognizes the VIF flag when either
the VME flag or the PVI flag in control register CR4 is set and the IOPL is less than 3.
(The VME flag enables the virtual-8086 mode extensions; the PVI flag enables the
protected-mode virtual interrupts.) Refer to Section 16.3.3.5., “Method 6: Software
Interrupt Handling” and Section 16.4., “Protected-Mode Virtual Interrupts” in Chapter
16, 8086 Emulation for detailed information about the use of this flag.
VIP Virtual interrupt pending (bit 20). Set by software to indicate that an interrupt is
pending; cleared to indicate that no interrupt is pending. This flag is used in conjunc-
tion with the VIF flag. The processor reads this flag but never modifies it. The
processor only recognizes the VIP flag when either the VME flag or the PVI flag in
control register CR4 is set and the IOPL is less than 3. (The VME flag enables the
virtual-8086 mode extensions; the PVI flag enables the protected-mode virtual inter-
rupts.) Refer to Section 16.3.3.5., “Method 6: Software Interrupt Handling” and
Section 16.4., “Protected-Mode Virtual Interrupts” in Chapter 16, 8086 Emulation for
detailed information about the use of this flag.
ID Identification (bit 21). The ability of a program or procedure to set or clear this flag
indicates support for the CPUID instruction.
the limit is set to FFFFH. A new base address must be loaded into the GDTR as part of the
processor initialization process for protected-mode operation. Refer to Section 3.5.1., “Segment
Descriptor Tables” in Chapter 3, Protected-Mode Memory Management for more information
on the base address and limit fields.
address, limit, and descriptor attributes from the TSS descriptor are automatically loaded into
the task register. On power up or reset of the processor, the base address is set to the default value
of 0 and the limit is set to FFFFH.
When a task switch occurs, the task register is automatically loaded with the segment selector
and descriptor for the TSS for the new task. The contents of the task register are not automati-
cally saved prior to writing the new TSS information into the register.
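[Figure: control registers]
  CR4   Bits 31:11 reserved (set to 0); OSXMMEXCPT (bit 10), OSFXSR (bit 9), PCE (bit 8),
        PGE (bit 7), MCE (bit 6), PAE (bit 5), PSE (bit 4), DE (bit 3), TSD (bit 2),
        PVI (bit 1), VME (bit 0)
  CR3   Page-Directory Base (bits 31:12, PDBR); PCD (bit 4); PWT (bit 3)
  CR1   Bits 31:0; no flag fields shown
  CR0   PG (bit 31), CD (bit 30), NW (bit 29), AM (bit 18), WP (bit 16), NE (bit 5),
        ET (bit 4), TS (bit 3), EM (bit 2), MP (bit 1), PE (bit 0); other bits reserved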
Table 9-4, in Chapter 9, Memory Cache Control, for detailed information about the
effect of the NW flag on caching for other settings of the CD and NW flags.
AM Alignment Mask (bit 18 of CR0). Enables automatic alignment checking when set;
disables alignment checking when clear. Alignment checking is performed only when
the AM flag is set, the AC flag in the EFLAGS register is set, the CPL is 3, and the
processor is operating in either protected or virtual-8086 mode.
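The conditions for alignment checking listed above can be collected into a single predicate. The following C sketch is illustrative only; the function name and the use of register images are assumptions, and it treats an operand as unaligned when its address is not a multiple of its size (for example, a word at an odd address or a doubleword at an address that is not a multiple of four).

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE    (1u << 0)    /* Protection Enable */
#define CR0_AM    (1u << 18)   /* Alignment Mask */
#define EFLAGS_AC (1u << 18)   /* Alignment Check */

/* True when a reference of 'size' bytes at 'addr' would be reported as
 * unaligned: AM and AC set, CPL of 3, protected or virtual-8086 mode
 * (PE set), and the address not a multiple of the operand size. */
bool alignment_check_applies(uint32_t cr0, uint32_t eflags,
                             unsigned cpl, uint32_t addr, unsigned size)
{
    bool checking_enabled = (cr0 & CR0_AM) && (eflags & EFLAGS_AC) &&
                            (cr0 & CR0_PE) && cpl == 3;
    bool unaligned = size > 1 && (addr % size) != 0;
    return checking_enabled && unaligned;
}
```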
WP Write Protect (bit 16 of CR0). Inhibits supervisor-level procedures from writing into
user-level read-only pages when set; allows supervisor-level procedures to write into
user-level read-only pages when clear. This flag facilitates implementation of the copy-
on-write method of creating a new process (forking) used by operating systems such as
UNIX*.
NE Numeric Error (bit 5 of CR0). Enables the native (internal) mechanism for reporting
FPU errors when set; enables the PC-style FPU error reporting mechanism when clear.
When the NE flag is clear and the IGNNE# input is asserted, FPU errors are ignored.
When the NE flag is clear and the IGNNE# input is deasserted, an unmasked FPU error
causes the processor to assert the FERR# pin to generate an external interrupt and to
stop instruction execution immediately before executing the next waiting floating-
point instruction or WAIT/FWAIT instruction. The FERR# pin is intended to drive an
input to an external interrupt controller (the FERR# pin emulates the ERROR# pin of
the Intel 287 and Intel 387 DX math coprocessors). The NE flag, IGNNE# pin, and
FERR# pin are used with external logic to implement PC-style error reporting. (Refer
to “Software Exception Handling” in Chapter 7, and Appendix D in the Intel Architec-
ture Software Developer’s Manual, Volume 1, for more information about FPU error
reporting and for detailed information on when the FERR# pin is asserted, which is
implementation dependent.)
ET Extension Type (bit 4 of CR0). Reserved in the P6 family and Pentium® processors.
(In the P6 family processors, this flag is hardcoded to 1.) In the Intel386™ and
Intel486™ processors, this flag indicates support of Intel 387 DX math coprocessor
instructions when set.
TS Task Switched (bit 3 of CR0). Allows the saving of FPU context on a task switch to
be delayed until the FPU is actually accessed by the new task. The processor sets this
flag on every task switch and tests it when interpreting floating-point arithmetic
instructions.
• If the TS flag is set, a device-not-available exception (#NM) is raised prior to the
execution of a floating-point instruction.
• If the TS flag and the MP flag (also in the CR0 register) are both set, an #NM
exception is raised prior to the execution of a floating-point instruction or a
WAIT/FWAIT instruction.
Table 2-1 shows the actions taken for floating-point, WAIT/FWAIT, MMX™, and
Streaming SIMD Extensions based on the settings of the TS, EM, and MP flags.
Table 2-1. Action Taken for Combinations of EM, MP, TS, CR4.OSFXSR, and CPUID.XMM

    CR0 Flags       CR4      CPUID    Instruction Type
EM    MP    TS    OSFXSR    XMM       Floating-Point    WAIT/FWAIT       MMX™ Technology    Streaming SIMD Extensions
0     0     0     -         -         Execute           Execute          Execute            -
0     0     1     -         -         #NM Exception     Execute          #NM Exception      -
0     1     0     -         -         Execute           Execute          Execute            -
0     1     1     -         -         #NM Exception     #NM Exception    #NM Exception      -
1     0     0     -         -         #NM Exception     Execute          #UD Exception      -
1     0     1     -         -         #NM Exception     Execute          #UD Exception      -
1     1     0     -         -         #NM Exception     Execute          #UD Exception      -
1     1     1     -         -         #NM Exception     #NM Exception    #UD Exception      -
1     -     -     -         -         -                 -                -                  #UD Interrupt 6
0     -     1     1         1         -                 -                -                  #NM Interrupt 7
-     -     -     0         -         -                 -                -                  #UD Interrupt 6
-     -     -     -         0         -                 -                -                  #UD Interrupt 6
The processor does not automatically save the context of the FPU on a task switch.
Instead it sets the TS flag, which causes the processor to raise an #NM exception when-
ever it encounters a floating-point instruction in the instruction stream for the new task.
The fault handler for the #NM exception can then be used to clear the TS flag (with the
CLTS instruction) and save the context of the FPU. If the task never encounters a
floating-point instruction, the FPU context is never saved.
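The first eight rows of Table 2-1 reduce to three small rules, which the following C sketch expresses as functions of the EM, MP, and TS flags. The type and function names are hypothetical. The Streaming SIMD Extensions column is not modeled, because the table does not list an outcome for every combination of its inputs.

```c
#include <stdbool.h>

/* Hypothetical result type for this illustration. */
typedef enum { ACT_EXECUTE, ACT_NM, ACT_UD } fpu_action_t;

/* Floating-point instructions: #NM when either EM or TS is set. */
fpu_action_t fp_action(bool em, bool mp, bool ts)
{
    (void)mp;   /* MP does not affect floating-point instructions */
    return (em || ts) ? ACT_NM : ACT_EXECUTE;
}

/* WAIT/FWAIT: #NM only when both MP and TS are set. */
fpu_action_t wait_action(bool em, bool mp, bool ts)
{
    (void)em;   /* EM does not affect WAIT/FWAIT */
    return (mp && ts) ? ACT_NM : ACT_EXECUTE;
}

/* MMX instructions: #UD when EM is set; otherwise #NM when TS is set. */
fpu_action_t mmx_action(bool em, bool mp, bool ts)
{
    (void)mp;
    if (em)
        return ACT_UD;
    return ts ? ACT_NM : ACT_EXECUTE;
}
```

Keeping all three flags in each signature lets a caller pass the same CR0 fields to every function, even where a flag has no effect on the result.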
EM Emulation (bit 2 of CR0). Indicates that the processor does not have an internal or
external FPU when set; indicates an FPU is present when clear. When the EM flag is
set, execution of a floating-point instruction generates a device-not-available exception
(#NM). This flag must be set when the processor does not have an internal FPU or is
not connected to a math coprocessor. If the processor does have an internal FPU,
setting this flag would force all floating-point instructions to be handled by software
emulation. Table 8-2 in Chapter 8, Processor Management and Initialization shows the
recommended setting of this flag, depending on the Intel Architecture processor and
FPU or math coprocessor present in the system. Table 2-1 shows the interaction of the
EM, MP, and TS flags.
Note that the EM flag also affects the execution of the MMX™ instructions (refer to
Table 2-1). When this flag is set, execution of an MMX™ instruction causes an invalid
opcode exception (#UD) to be generated. Thus, if an Intel Architecture processor
incorporates MMX™ technology, the EM flag must be set to 0 to enable execution of
MMX™ instructions.
Similarly for the Streaming SIMD Extensions, when this flag is set, execution of a Streaming
SIMD Extensions instruction causes an invalid opcode exception (#UD) to be generated. Thus,
if an Intel Architecture processor incorporates Streaming SIMD Extensions, the EM flag must
be set to 0 to enable execution of Streaming SIMD Extensions. The exceptions to this are the
PREFETCH and SFENCE instructions, which are not affected by the EM flag.
MP Monitor Coprocessor (bit 1 of CR0). Controls the interaction of the WAIT (or
FWAIT) instruction with the TS flag (bit 3 of CR0). If the MP flag is set, a WAIT
instruction generates a device-not-available exception (#NM) if the TS flag is set. If the
MP flag is clear, the WAIT instruction ignores the setting of the TS flag. Table 8-2 in
Chapter 8, Processor Management and Initialization shows the recommended setting
of this flag, depending on the Intel Architecture processor and FPU or math copro-
cessor present in the system. Table 2-1 shows the interaction of the MP, EM, and TS
flags.
PE Protection Enable (bit 0 of CR0). Enables protected mode when set; enables real-
address mode when clear. This flag does not enable paging directly. It only enables
segment-level protection. To enable paging, both the PE and PG flags must be set.
Refer to Section 8.8., “Mode Switching” in Chapter 8, Processor Management and
Initialization for information on using the PE flag to switch between real-address mode
and protected mode.
PCD Page-level Cache Disable (bit 4 of CR3). Controls caching of the current page direc-
tory. When the PCD flag is set, caching of the page-directory is prevented; when the
flag is clear, the page-directory can be cached. This flag affects only the processor’s
internal caches (both L1 and L2, when present). The processor ignores this flag if
paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag
in CR0 is set. Refer to Chapter 9, Memory Cache Control, for more information about
the use of this flag. Refer to Section 3.6.4., “Page-Directory and Page-Table Entries”
in Chapter 3, Protected-Mode Memory Management for a description of a companion
PCD flag in the page-directory and page-table entries.
PWT Page-level Writes Transparent (bit 3 of CR3). Controls the write-through or write-
back caching policy of the current page directory. When the PWT flag is set, write-
through caching is enabled; when the flag is clear, write-back caching is enabled. This
flag affects only the internal caches (both L1 and L2, when present). The processor
ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD
(cache disable) flag in CR0 is set. Refer to Section 9.5., “Cache Control”, in Chapter
9, Memory Cache Control, for more information about the use of this flag. Refer to
Section 3.6.4., “Page-Directory and Page-Table Entries” in Chapter 3, Protected-Mode
Table 2-2. Summary of System Instructions

                                                          Useful to       Protected from
Instruction      Description                              Application?    Application?
LLDT             Load LDT Register                        No              Yes
SLDT             Store LDT Register                       No              No
LGDT             Load GDT Register                        No              Yes
SGDT             Store GDT Register                       No              No
LTR              Load Task Register                       No              Yes
STR              Store Task Register                      No              No
LIDT             Load IDT Register                        No              Yes
SIDT             Store IDT Register                       No              No
MOV CRn          Load and store control registers         Yes             Yes (load only)
SMSW             Store MSW                                Yes             No
LMSW             Load MSW                                 No              Yes
CLTS             Clear TS flag in CR0                     No              Yes
ARPL             Adjust RPL                               Yes (1)         No
LAR              Load Access Rights                       Yes             No
LSL              Load Segment Limit                       Yes             No
VERR             Verify for Reading                       Yes             No
VERW             Verify for Writing                       Yes             No
MOV DBn          Load and store debug registers           No              Yes
INVD             Invalidate cache, no writeback           No              Yes
WBINVD           Invalidate cache, with writeback         No              Yes
INVLPG           Invalidate TLB entry                     No              Yes
HLT              Halt Processor                           No              Yes
LOCK (Prefix)    Bus Lock                                 Yes             No
RSM              Return from system management mode       No              Yes
RDMSR (3)        Read Model-Specific Registers            No              Yes
WRMSR (3)        Write Model-Specific Registers           No              Yes
RDPMC (4)        Read Performance-Monitoring Counter      Yes             Yes (2)
RDTSC (3)        Read Time-Stamp Counter                  Yes             Yes (2)
LDMXCSR (5)      Load MXCSR Register                      Yes             No
STMXCSR (5)      Store MXCSR Register                     Yes             No

NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application
programs running at a CPL of 3.
3. These instructions were introduced into the Intel Architecture with the Pentium® processor.
4. This instruction was introduced into the Intel Architecture with the Pentium® Pro processor and the
Pentium processor with MMX™ technology.
5. This instruction was introduced into the Intel Architecture with the Pentium® III processor.
duplicate some of the automatic access rights and type checking done by the processor, thus
allowing operating-system or executive software to prevent exceptions from being generated.
The ARPL (adjust RPL) instruction adjusts the RPL (requestor privilege level) of a segment
selector to match that of the program or procedure that supplied the segment selector. Refer to
Section 4.10.4., “Checking Caller Access Privileges (ARPL Instruction)” in Chapter 4, Protec-
tion for a detailed explanation of the function and use of this instruction.
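The adjustment performed by ARPL can be modeled on 16-bit selector values as in the following C sketch. The function name is hypothetical; the boolean return value stands in for the ZF flag, which the instruction sets when it raises the RPL of the destination selector and clears otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEL_RPL_MASK 3u   /* RPL occupies bits 1:0 of a segment selector */

/* Model of ARPL: if the RPL of *dest is numerically lower (more
 * privileged) than the RPL of src, raise it to match and return true;
 * otherwise leave *dest unchanged and return false. */
bool arpl_model(uint16_t *dest, uint16_t src)
{
    unsigned dest_rpl = *dest & SEL_RPL_MASK;
    unsigned src_rpl = src & SEL_RPL_MASK;

    if (dest_rpl < src_rpl) {
        *dest = (uint16_t)((*dest & ~SEL_RPL_MASK) | src_rpl);
        return true;
    }
    return false;
}
```

In typical use, src would be the caller's code-segment selector, so a selector passed in by a less privileged caller cannot carry more privilege than the caller itself has.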
The LAR (load access rights) instruction verifies the accessibility of a specified segment and
loads the access rights information from the segment’s segment descriptor into a general-
purpose register. Software can then examine the access rights to determine if the segment type
is compatible with its intended use. Refer to Section 4.10.1., “Checking Access Rights (LAR
Instruction)” in Chapter 4, Protection for a detailed explanation of the function and use of this
instruction.
The LSL (load segment limit) instruction verifies the accessibility of a specified segment and
loads the segment limit from the segment’s segment descriptor into a general-purpose register.
Software can then compare the segment limit with an offset into the segment to determine
whether the offset lies within the segment. Refer to Section 4.10.3., “Checking That the Pointer
Offset Is Within Limits (LSL Instruction)” in Chapter 4, Protection for a detailed explanation of
the function and use of this instruction.
The VERR (verify for reading) and VERW (verify for writing) instructions verify if a selected
segment is readable or writable, respectively, at the CPL. Refer to Section 4.10.2., “Checking
Read/Write Rights (VERR and VERW Instructions)” in Chapter 4, Protection for a detailed
explanation of the function and use of these instructions.
The HLT (halt processor) instruction stops the processor until an enabled interrupt (such as NMI
or SMI, which are normally enabled), the BINIT# signal, the INIT# signal, or the RESET#
signal is received. The processor generates a special bus cycle to indicate that the halt mode has
been entered. Hardware may respond to this signal in a number of ways. An indicator light on
the front panel may be turned on. An NMI interrupt for recording diagnostic information may
be generated. Reset initialization may be invoked. (Note that the BINIT# pin was introduced
with the Pentium® Pro processor.)
The LOCK prefix invokes a locked (atomic) read-modify-write operation when modifying a
memory operand. This mechanism is used to allow reliable communications between processors
in multiprocessor systems. In the Pentium® and earlier Intel Architecture processors, the LOCK
prefix causes the processor to assert the LOCK# signal during the instruction, which always
causes an explicit bus lock to occur. In the P6 family processors, the locking operation is handled
with either a cache lock or bus lock. If a memory access is cacheable and affects only a single
cache line, a cache lock is invoked and the system bus and the actual memory location in system
memory are not locked during the operation. Here, other P6 family processors on the bus write-
back any modified data and invalidate their caches as necessary to maintain system memory
coherency. If the memory access is not cacheable and/or it crosses a cache line boundary, the
processor’s LOCK# signal is asserted and the processor does not respond to requests for bus
control during the locked operation.
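From a high-level language, a locked read-modify-write is usually requested through an atomic operation rather than by writing the LOCK prefix directly. The following C11 sketch shows an atomic increment of a shared counter. It is an assumption of this example, not a statement of this manual, that a compiler targeting the Intel Architecture implements atomic_fetch_add with a LOCK-prefixed instruction; the function name is hypothetical.

```c
#include <stdatomic.h>

/* Atomically add 1 to *counter and return the new value.  No other
 * processor can observe or modify the counter between the read and the
 * write of this read-modify-write operation. */
unsigned counter_increment(atomic_uint *counter)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(counter, 1u) + 1u;
}
```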
The RSM (return from SMM) instruction restores the processor (from a context dump) to the
state it was in prior to a system management mode (SMM) interrupt.
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
This chapter describes the Intel Architecture’s protected-mode memory management facilities,
including the physical memory requirements, the segmentation mechanism, and the paging
mechanism. Refer to Chapter 4, Protection for a description of the processor’s protection mech-
anism. Refer to Chapter 16, 8086 Emulation for a description of memory addressing protection
in real-address and virtual-8086 modes.
address space.
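[Figure: segmentation and paging. Segmentation: a logical address (or far pointer) consists of a
segment selector and an offset. The selector locates a segment descriptor in the global
descriptor table (GDT); the segment base address from the descriptor and the offset give a
linear address within a segment of the linear address space. Paging: the linear address is
divided into Dir, Table, and Offset fields, which select a page directory entry, a page table
entry, and a location within a page of the physical address space.]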
If paging is not used, the linear address space of the processor is mapped directly into the physical
address space of the processor. The physical address space is defined as the range of addresses
that the processor can generate on its address bus.
Because multitasking computing systems commonly define a linear address space much larger
than it is economically feasible to contain all at once in physical memory, some method of
“virtualizing” the linear address space is needed. This virtualization of the linear address space
is handled through the processor’s paging mechanism.
Paging supports a “virtual memory” environment where a large linear address space is simulated
with a small amount of physical memory (RAM and ROM) and some disk storage. When using
paging, each segment is divided into pages (ordinarily 4 KBytes each in size), which are stored
either in physical memory or on the disk. The operating system or executive maintains a page
directory and a set of page tables to keep track of the pages. When a program (or task) attempts
to access an address location in the linear address space, the processor uses the page directory
and page tables to translate the linear address into a physical address and then performs the
requested operation (read or write) on the memory location. If the page being accessed is not
currently in physical memory, the processor interrupts execution of the program (by generating
a page-fault exception). The operating system or executive then reads the page into physical
memory from the disk and continues executing the program.
When paging is implemented properly in the operating-system or executive, the swapping of
pages between physical memory and the disk is transparent to the correct execution of a
program. Even programs written for 16-bit Intel Architecture processors can be paged (transpar-
ently) when they are run in virtual-8086 mode.
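For 4-KByte pages, the processor treats a 32-bit linear address as three fields: a 10-bit index into the page directory, a 10-bit index into a page table, and a 12-bit offset within the page. The following C sketch shows the split; the structure and function names are hypothetical, and the sketch covers only the field extraction, not the table lookups or the page-fault path.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of a linear address. */
typedef struct {
    uint32_t dir;      /* bits 31:22, selects a page-directory entry */
    uint32_t table;    /* bits 21:12, selects a page-table entry */
    uint32_t offset;   /* bits 11:0, byte offset within the 4-KByte page */
} linear_fields_t;

linear_fields_t split_linear_address(uint32_t linear)
{
    linear_fields_t f;
    f.dir = linear >> 22;
    f.table = (linear >> 12) & 0x3FFu;
    f.offset = linear & 0xFFFu;
    return f;
}
```

For example, linear address 00403004H splits into directory index 1, table index 3, and page offset 4.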
More complexity can be added to this protected flat model to provide more protection. For
example, for the paging mechanism to provide isolation between user and supervisor code and
data, four segments need to be defined: code and data segments at privilege level 3 for the user,
and code and data segments at privilege level 0 for the supervisor. Usually these segments all
overlay each other and start at address 0 in the linear address space. This flat segmentation
model along with a simple paging structure can protect the operating system from applications,
and by adding a separate paging structure for each task or process, it can also protect applica-
tions from each other. Similar designs are used by several popular multitasking operating
systems.
[Figure: multisegment model. Each segment register (including DS, ES, FS, and GS) is associated
with a segment descriptor that holds access, limit, and base address fields, and each
descriptor maps a separate code or data segment.]
Access checks can be used to protect not only against referencing an address outside the limit
of a segment, but also against performing disallowed operations in certain segments. For
example, since code segments are designated as read-only segments, hardware can be used to
prevent writes into code segments. The access rights information created for segments can also
be used to set up protection rings or levels. Protection levels can be used to protect operating-
system procedures from unauthorized access by application programs.
FFFFFFFFH. The linear address space contains all the segments and system tables defined for a
system.
To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the segment in
the GDT or LDT and reads it into the processor. (This step is needed only when a new
segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the segment to
ensure that the segment is accessible and that the offset is within the limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset to form a
linear address.
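The three steps above can be sketched in C as follows. This is a simplified model under stated assumptions: the descriptor table is an array of hypothetical simplified descriptors holding only a base address and a byte-granular limit, every segment is treated as expand-up, and the access-rights checks of step 2 are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified descriptor: base address and effective limit. */
typedef struct {
    uint32_t base;
    uint32_t limit;   /* highest valid offset (expand-up segment) */
} simple_desc_t;

/* Translate selector:offset into a linear address.  Returns false where
 * the processor would raise an exception in this simplified model. */
bool logical_to_linear(const simple_desc_t *table, size_t entries,
                       uint16_t selector, uint32_t offset,
                       uint32_t *linear)
{
    size_t index = (size_t)(selector >> 3);   /* step 1: locate descriptor */
    if (index >= entries)
        return false;

    const simple_desc_t *desc = &table[index];
    if (offset > desc->limit)                 /* step 2: limit check only */
        return false;

    *linear = desc->base + offset;            /* step 3: base + offset */
    return true;
}
```

In the processor, step 1 is performed only when a segment register is loaded; later references reuse the descriptor information held in the hidden part of the segment register.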
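[Figure: logical-to-linear address translation. The segment selector (bits 15:0) of the logical
address selects a segment descriptor in a descriptor table; the base address from that
descriptor is added to the offset (bits 31:0) to form the 32-bit linear address.]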
If paging is not used, the processor maps the linear address directly to a physical address (that
is, the linear address goes out on the processor’s address bus). If the linear address space is
paged, a second level of address translation is used to translate the linear address into a physical
address. Page translation is described in Section 3.6., “Paging (Virtual Memory)”.
[Figure: segment selector]
  Bits 15:3    Index
  Bit 2        TI, Table Indicator (0 = GDT, 1 = LDT)
  Bits 1:0     RPL, Requested Privilege Level
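The selector layout (index in bits 15:3, table indicator in bit 2, RPL in bits 1:0) can be decoded with a few shifts and masks, as in the following C sketch. The structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a segment selector. */
typedef struct {
    unsigned index;   /* bits 15:3, descriptor index within the table */
    unsigned ti;      /* bit 2, 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1:0, requested privilege level */
} selector_fields_t;

selector_fields_t decode_selector(uint16_t selector)
{
    selector_fields_t f;
    f.index = selector >> 3;
    f.ti = (selector >> 2) & 1u;
    f.rpl = selector & 3u;
    return f;
}
```

For example, selector 001BH decodes to index 3 in the GDT with an RPL of 3.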
can be available for immediate use. Other segments can be made available by loading their
segment selectors into these registers during program execution.
Every segment register has a “visible” part and a “hidden” part. (The hidden part is sometimes
referred to as a “descriptor cache” or a “shadow register.”) When a segment selector is loaded
into the visible part of a segment register, the processor also loads the hidden part of the segment
register with the base address, segment limit, and access control information from the segment
descriptor pointed to by the segment selector. The information cached in the segment register
(visible and hidden) allows the processor to translate addresses without taking extra bus cycles
to read the base address and limit from the segment descriptor. In systems in which multiple
processors have access to the same descriptor tables, it is the responsibility of software to reload
the segment registers when the descriptor tables are modified. If this is not done, an old segment
descriptor cached in a segment register might be used after its memory-resident version has been
modified.
Two kinds of load instructions are provided for loading the segment registers:
1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS instruc-
tions. These instructions explicitly reference the segment registers.
2. Implied load instructions such as the far pointer versions of the CALL, JMP, and RET
instructions and the IRET, INTn, INTO and INT3 instructions. These instructions change
the contents of the CS register (and sometimes other segment registers) as an incidental
part of their operation.
The MOV instruction can also be used to store the visible part of a segment register in a
general-purpose register.
utive, but not application programs. Figure 3-8 illustrates the general descriptor format for all
types of segment descriptors.
The flags and fields in a segment descriptor are as follows:
Segment limit field
Specifies the size of the segment. The processor puts together the two segment
limit fields to form a 20-bit value. The processor interprets the segment limit
in one of two ways, depending on the setting of the G (granularity) flag:
• If the granularity flag is clear, the segment size can range from 1 byte to 1
MByte, in byte increments.
• If the granularity flag is set, the segment size can range from 4 KBytes to
4 GBytes, in 4-KByte increments.
The processor uses the segment limit in two different ways, depending on
whether the segment is an expand-up or an expand-down segment. Refer to
Section 3.4.3.1., “Code- and Data-Segment Descriptor Types” for more infor-
mation about segment types. For expand-up segments, the offset in a logical
address can range from 0 to the segment limit. Offsets greater than the segment
limit generate general-protection exceptions (#GP). For expand-down
segments, the segment limit has the reverse function; the offset can range from
the segment limit to FFFFFFFFH or FFFFH, depending on the setting of the B
flag. Offsets less than the segment limit generate general-protection excep-
tions. Decreasing the value in the segment limit field for an expand-down
segment allocates new memory at the bottom of the segment's address space,
rather than at the top. Intel Architecture stacks always grow downwards,
making this mechanism convenient for expandable stacks.
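The two interpretations of the segment limit can be sketched as a single range check. The following is a minimal illustration, not the processor's actual logic; the function name and parameters are hypothetical, and the limit passed in is assumed to be already scaled by the granularity flag. The sketch treats the limit itself as lying outside an expand-down segment, which is how the limit-checking rules in Chapter 4, Protection define the boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is a one-byte access at `offset` inside the segment?
 * `limit` is the effective (already scaled) segment limit in bytes. */
bool offset_in_segment(uint32_t offset, uint32_t limit,
                       bool expand_down, bool b_flag)
{
    if (!expand_down) {
        /* Expand-up: valid offsets run from 0 through the limit. */
        return offset <= limit;
    }
    /* Expand-down: valid offsets run from just above the limit up to
     * FFFFFFFFH (B flag set) or FFFFH (B flag clear). */
    uint32_t upper = b_flag ? 0xFFFFFFFFu : 0xFFFFu;
    return offset > limit && offset <= upper;
}
```

Lowering the limit of an expand-down segment widens the accepted range at its low end, which is the stack-growth behavior described above.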
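Figure 3-8. Segment descriptor layout
Second doubleword (offset 4):
  Bits 31:24  Base 31:24
  Bit 23      G
  Bit 22      D/B
  Bit 21      0
  Bit 20      AVL
  Bits 19:16  Seg. Limit 19:16
  Bit 15      P
  Bits 14:13  DPL
  Bit 12      S
  Bits 11:8   Type
  Bits 7:0    Base 23:16
First doubleword (offset 0):
  Bits 31:16  Base Address 15:00
  Bits 15:0   Segment Limit 15:00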
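The descriptor layout can be unpacked with a few shifts and masks. The sketch below is illustrative only; the structure and function names are hypothetical, and the two arguments are assumed to be the doublewords at offsets 0 and 4 of an 8-byte descriptor.

```c
#include <stdint.h>

/* Hypothetical unpacked view of a segment descriptor. */
struct seg_desc_fields {
    uint32_t base;   /* 32-bit base address */
    uint32_t limit;  /* raw 20-bit limit field, before G scaling */
    unsigned type;   /* bits 11:8 of the second doubleword */
    unsigned s;      /* bit 12 */
    unsigned dpl;    /* bits 14:13 */
    unsigned p;      /* bit 15 */
    unsigned avl;    /* bit 20 */
    unsigned db;     /* bit 22 */
    unsigned g;      /* bit 23 */
};

/* `low` is the doubleword at offset 0, `high` the doubleword at offset 4. */
struct seg_desc_fields unpack_descriptor(uint32_t low, uint32_t high)
{
    struct seg_desc_fields f;
    f.base  = (high & 0xFF000000u)          /* Base 31:24 */
            | ((high & 0x000000FFu) << 16)  /* Base 23:16 */
            | (low >> 16);                  /* Base 15:00 */
    f.limit = (high & 0x000F0000u)          /* Limit 19:16 */
            | (low & 0x0000FFFFu);          /* Limit 15:00 */
    f.type  = (high >> 8) & 0xFu;
    f.s     = (high >> 12) & 0x1u;
    f.dpl   = (high >> 13) & 0x3u;
    f.p     = (high >> 15) & 0x1u;
    f.avl   = (high >> 20) & 0x1u;
    f.db    = (high >> 22) & 0x1u;
    f.g     = (high >> 23) & 0x1u;
    return f;
}
```

The base address and limit are each split across the two doublewords, so both must be reassembled before use.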
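Figure 3-9. Segment descriptor layout when the segment-present (P) flag is clear
Second doubleword (offset 4): bits 31:16 Available; bit 15 = 0 (P); bits 14:13 DPL; bit 12 S;
bits 11:8 Type; bits 7:0 Available
First doubleword (offset 0): bits 31:0 Available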
G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is
clear, the segment limit is interpreted in byte units; when the flag is set, the
segment limit is interpreted in 4-KByte units. (This flag does not affect the
granularity of the base address; it is always byte granular.) When the granu-
larity flag is set, the twelve least significant bits of an offset are not tested when
checking the offset against the segment limit. For example, when the granu-
larity flag is set, a limit of 0 results in valid offsets from 0 to 4095.
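The scaling performed by the granularity flag can be expressed as a short computation. This is a sketch under the assumption that the 20-bit limit field has already been assembled from its two parts; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the highest valid offset implied by a 20-bit limit
 * field in an expand-up segment. */
uint32_t effective_limit(uint32_t limit20, bool g_flag)
{
    limit20 &= 0xFFFFFu;              /* the field is 20 bits wide */
    if (!g_flag)
        return limit20;               /* byte units */
    /* 4-KByte units: the twelve low-order offset bits are not tested,
     * so they are filled with ones. */
    return (limit20 << 12) | 0xFFFu;
}
```

With the flag set, a limit field of 0 yields 4095 and a limit field of FFFFFH yields FFFFFFFFH, matching the 4-KByte and 4-GByte bounds given for the segment limit field.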
Available and reserved bits
Bit 20 of the second doubleword of the segment descriptor is available for use
by system software; bit 21 is reserved and should always be set to 0.
Stack segments are data segments which must be read/write segments. Loading the SS register
with a segment selector for a nonwritable data segment generates a general-protection exception
(#GP). If the size of a stack segment needs to be changed dynamically, the stack segment can be
an expand-down data segment (expansion-direction flag set). Here, dynamically changing the
segment limit causes stack space to be added to the bottom of the stack. If the size of a stack
segment is intended to remain static, the stack segment may be either an expand-up or expand-
down type.
The accessed bit indicates whether the segment has been accessed since the last time the oper-
ating-system or executive cleared the bit. The processor sets this bit whenever it loads a segment
selector for the segment into a segment register. The bit remains set until explicitly cleared. This
bit can be used both for virtual memory management and for debugging.
For code segments, the three low-order bits of the type field are interpreted as accessed (A), read
enable (R), and conforming (C). Code segments can be execute-only or execute/read, depending
on the setting of the read-enable bit. An execute/read segment might be used when constants or
other static data have been placed with instruction code in a ROM. Here, data can be read from
the code segment either by using an instruction with a CS override prefix or by loading a
segment selector for the code segment in a data-segment register (the DS, ES, FS, or GS regis-
ters). In protected mode, code segments are not writable.
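The type-field encodings described above can be tested with simple bit masks. The following sketch assumes the 4-bit type field of a code- or data-segment descriptor (S flag set), in which the high-order bit distinguishes code from data and the three low-order bits are E, W, A for data segments and C, R, A for code segments. All function names are hypothetical.

```c
#include <stdbool.h>

/* Bit 3 of the type field: 1 = code segment, 0 = data segment. */
bool seg_type_is_code(unsigned type)     { return (type & 0x8u) != 0; }

/* Bit 0: accessed (A), for both code and data segments. */
bool seg_type_accessed(unsigned type)    { return (type & 0x1u) != 0; }

/* Code segments: bit 1 = read enable (R), bit 2 = conforming (C). */
bool code_seg_readable(unsigned type)    { return seg_type_is_code(type) && (type & 0x2u) != 0; }
bool code_seg_conforming(unsigned type)  { return seg_type_is_code(type) && (type & 0x4u) != 0; }

/* Data segments: bit 1 = write enable (W), bit 2 = expansion direction (E). */
bool data_seg_writable(unsigned type)    { return !seg_type_is_code(type) && (type & 0x2u) != 0; }
bool data_seg_expand_down(unsigned type) { return !seg_type_is_code(type) && (type & 0x4u) != 0; }

/* A stack segment must be a read/write data segment; any other type
 * loaded into SS causes a general-protection exception. */
bool ss_load_allowed(unsigned type)      { return data_seg_writable(type); }
```

For example, type 0110B (expand-down, read/write data) is acceptable for SS, while type 1010B (execute/read code) is not.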
Code segments can be either conforming or nonconforming. A transfer of execution into a more-
privileged conforming segment allows execution to continue at the current privilege level. A
transfer into a nonconforming segment at a different privilege level results in a general-protec-
tion exception (#GP), unless a call gate or task gate is used (refer to Section 4.8.1., “Direct Calls
or Jumps to Code Segments” in Chapter 4, Protection for more information on conforming and
nonconforming code segments). System utilities that do not access protected facilities and
handlers for some types of exceptions (such as divide error or overflow) may be loaded in
conforming code segments. Utilities that need to be protected from less privileged programs and
procedures should be placed in nonconforming code segments.
NOTE
Execution cannot be transferred by a call or a jump to a less-privileged
(numerically higher privilege level) code segment, regardless of whether the
target segment is a conforming or nonconforming code segment. Attempting
such an execution transfer will result in a general-protection exception.
All data segments are nonconforming, meaning that they cannot be accessed by less privileged
programs or procedures (code executing at numerically higher privilege levels). Unlike code
segments, however, data segments can be accessed by more privileged programs or procedures
(code executing at numerically lower privilege levels) without using a special access gate.
The processor may update the Type field when a segment is accessed, even if the access is a read
cycle. If the descriptor tables have been put in ROM, it may be necessary for hardware to prevent
the ROM from being enabled onto the data bus during a write cycle. It also may be necessary to
return the READY# signal to the processor when a write cycle to ROM occurs, otherwise
the cycle will not terminate. These features of the hardware design are necessary for using
ROM-based descriptor tables with the Intel386™ DX processor, which always sets the
Accessed bit when a segment descriptor is loaded. The P6 family, Pentium®, and Intel486™
processors, however, only set the accessed bit if it is not already set. Writes to descriptor tables
in ROM can be avoided by setting the accessed bits in every descriptor.
For more information on the system-segment descriptors, refer to Section 3.5.1., “Segment
Descriptor Tables”, and Section 6.2.2., “TSS Descriptor” in Chapter 6, Task Management. For
more information on the gate descriptors, refer to Section 4.8.2., “Gate Descriptors” in Chapter
4, Protection; Section 5.9., “IDT Descriptors” in Chapter 5, Interrupt and Exception Handling;
and Section 6.2.4., “Task-Gate Descriptor” in Chapter 6, Task Management.
Figure 3-10. Global and local descriptor tables: the TI flag of a segment selector selects the
GDT (TI = 0) or an LDT (TI = 1). Descriptors sit at byte offsets 0, 8, 16, and so on through 56
in each table; the first descriptor in the GDT is not used.
Each system must have one GDT defined, which may be used for all programs and tasks in the
system. Optionally, one or more LDTs can be defined. For example, an LDT can be defined for
each separate task being run, or some or all tasks can share the same LDT.
The GDT is not a segment itself; instead, it is a data structure in the linear address space. The
base linear address and limit of the GDT must be loaded into the GDTR register (refer to Section
2.4., “Memory-Management Registers” in Chapter 2, System Architecture Overview). The base
address of the GDT should be aligned on an eight-byte boundary to yield the best processor
performance. The limit value for the GDT is expressed in bytes. As with segments, the limit
value is added to the base address to get the address of the last valid byte. A limit value of 0
results in exactly one valid byte. Because segment descriptors are always 8 bytes long, the GDT
limit should always be one less than an integral multiple of eight (that is, 8N – 1).
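The 8N – 1 rule follows directly from the limit being the offset of the last valid byte. A minimal sketch, assuming a hypothetical helper name and a descriptor count between 1 and 8192 (the most that a 16-bit limit can cover):

```c
#include <stdint.h>

/* Hypothetical helper: GDT limit for a table of n 8-byte descriptors.
 * The caller is assumed to pass n in the range 1..8192. */
uint16_t gdt_limit_for(unsigned n_descriptors)
{
    return (uint16_t)(n_descriptors * 8u - 1u);
}
```

A table holding only the null descriptor has a limit of 7; a full table of 8192 descriptors has a limit of FFFFH.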
The first descriptor in the GDT is not used by the processor. A segment selector to this “null
descriptor” does not generate an exception when loaded into a data-segment register (DS, ES,
FS, or GS), but it always generates a general-protection exception (#GP) when an attempt is
made to access memory using the descriptor. By initializing the segment registers with this
segment selector, accidental reference to unused segment registers can be guaranteed to generate
an exception.
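A segment selector can be decomposed to find which table it refers to and where the descriptor lies in that table. The sketch below assumes the selector layout used throughout this chapter (RPL in bits 0 and 1, the TI flag in bit 2, and the index in bits 3 through 15); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Requested privilege level: bits 0 and 1. */
unsigned selector_rpl(uint16_t sel)          { return sel & 0x3u; }

/* Table indicator: 0 selects the GDT, 1 selects the current LDT. */
bool selector_in_ldt(uint16_t sel)           { return (sel & 0x4u) != 0; }

/* Byte offset of the descriptor in its table: index * 8. */
unsigned selector_table_offset(uint16_t sel) { return sel & 0xFFF8u; }

/* A null selector has index 0 and TI = 0; the RPL bits may hold any value. */
bool selector_is_null(uint16_t sel)          { return (sel & 0xFFFCu) == 0; }
```

Masking off the three low-order bits yields the descriptor offset directly, because descriptors are 8 bytes long.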
The LDT is located in a system segment of the LDT type. The GDT must contain a segment
descriptor for the LDT segment. If the system supports multiple LDTs, each must have a sepa-
rate segment selector and segment descriptor in the GDT. The segment descriptor for an LDT
can be located anywhere in the GDT. Refer to Section 3.5., “System Descriptor Types” for infor-
mation on the LDT segment-descriptor type.
An LDT is accessed with its segment selector. To eliminate address translations when accessing
the LDT, the segment selector, base linear address, limit, and access rights of the LDT are stored
in the LDTR register (refer to Section 2.4., “Memory-Management Registers” in Chapter 2,
System Architecture Overview).
When the GDTR register is stored (using the SGDT instruction), a 48-bit “pseudo-descriptor”
is stored in memory (refer to Figure 3-11). To avoid alignment check faults in user mode (priv-
ilege level 3), the pseudo-descriptor should be located at an odd word address (that is, address
MOD 4 is equal to 2). This causes the processor to store an aligned word, followed by an aligned
doubleword. User-mode programs normally do not store pseudo-descriptors, but the possibility
of generating an alignment check fault can be avoided by aligning pseudo-descriptors in this
way. The same alignment should be used when storing the IDTR register using the SIDT instruc-
tion. When storing the LDTR or task register (using the SLDT or STR instruction, respectively),
the pseudo-descriptor should be located at a doubleword address (that is, address MOD 4 is
equal to 0).
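The 48-bit pseudo-descriptor image and the suggested alignment can be illustrated as follows. This sketch builds the six bytes explicitly in little-endian order rather than executing SGDT; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Write the 6-byte pseudo-descriptor image: the limit occupies bits 15:0
 * and the base address occupies bits 47:16. */
void pack_pseudo_descriptor(uint8_t out[6], uint32_t base, uint16_t limit)
{
    out[0] = (uint8_t)(limit & 0xFFu);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xFFu);
    out[3] = (uint8_t)((base >> 8) & 0xFFu);
    out[4] = (uint8_t)((base >> 16) & 0xFFu);
    out[5] = (uint8_t)(base >> 24);
}

/* Alignment suggested for SGDT and SIDT targets: address MOD 4 equals 2,
 * so the 16-bit limit and the 32-bit base are each naturally aligned. */
bool pseudo_descriptor_aligned(uintptr_t addr)
{
    return (addr % 4u) == 2u;
}
```

At an address such as 1002H, the limit lands on a word boundary (1002H) and the base on a doubleword boundary (1004H).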
Figure 3-11 layout: bits 47:16 hold the base address; bits 15:0 hold the limit.
out to disk in the process). When the page has been loaded in physical memory, a return from
the exception handler causes the instruction that generated the exception to be restarted. The
information that the processor uses to map linear addresses into the physical address space and
to generate page-fault exceptions (when necessary) is contained in page directories and page
tables stored in memory.
Paging differs from segmentation in its use of fixed-size pages. Segments usually are the
same size as the code or data structures they hold; pages are not.
If segmentation is the only form of address translation used, a data structure present in physical
memory will have all of its parts in memory. If paging is used, a data structure can be partly in
memory and partly in disk storage.
To minimize the number of bus cycles required for address translation, the most recently
accessed page-directory and page-table entries are cached in the processor in devices called
translation lookaside buffers (TLBs). The TLBs satisfy most requests for reading the current
page directory and page tables without requiring a bus cycle. Extra bus cycles occur only when
the TLBs do not contain a page-table entry, which typically happens when a page has not been
accessed for a long time. Refer to Section 3.7., “Translation Lookaside Buffers (TLBs)” for
more information on the TLBs.
Figure 3-12. Linear address translation (4-KByte pages): linear address bits 31:22 (Directory)
select a page-directory entry, bits 21:12 (Table) select a page-table entry, and bits 11:0
(Offset) select a byte in the 4-KByte page.
To select the various table entries, the linear address is divided into three sections:
• Page-directory entry—Bits 22 through 31 provide an offset to an entry in the page
directory. The selected entry provides the base physical address of a page table.
• Page-table entry—Bits 12 through 21 of the linear address provide an offset to an entry in
the selected page table. This entry provides the base physical address of a page in physical
memory.
• Page offset—Bits 0 through 11 provide an offset to a physical address in the page.
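The three-way division of the linear address can be sketched with shifts and masks. The function names below are hypothetical; the field boundaries are those given in the list above.

```c
#include <stdint.h>

/* Bits 31:22: index of the entry in the page directory (0..1023). */
unsigned pde_index_4k(uint32_t linear)    { return linear >> 22; }

/* Bits 21:12: index of the entry in the selected page table (0..1023). */
unsigned pte_index_4k(uint32_t linear)    { return (linear >> 12) & 0x3FFu; }

/* Bits 11:0: byte offset within the 4-KByte page. */
unsigned page_offset_4k(uint32_t linear)  { return linear & 0xFFFu; }
```

For the linear address 00403025H, the directory index is 1, the table index is 3, and the page offset is 25H.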
Memory management software has the option of using one page directory for all programs and
tasks, one page directory for each task, or some combination of the two.
Figure 3-13. Linear address translation (4-MByte pages): linear address bits 31:22 (Directory)
select a page-directory entry, which supplies the upper 10 bits of the physical address; bits
21:0 (Offset) select a byte in the 4-MByte page.
The 4-MByte page size is selected by setting the PSE flag in control register CR4 and setting
the page size (PS) flag in a page-directory entry (refer to Figure 3-14). With these flags set, the
linear address is divided into two sections:
• Page directory entry—Bits 22 through 31 provide an offset to an entry in the page
directory. The selected entry provides the base physical address of a 4-MByte page.
• Page offset—Bits 0 through 21 provide an offset to a physical address in the page.
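With 4-MByte pages, translation reduces to combining the upper 10 bits of the page-directory entry with the lower 22 bits of the linear address. The following sketch assumes 32-bit physical addressing and a page-directory entry whose PS flag is set; the function names are hypothetical.

```c
#include <stdint.h>

/* Bits 31:22 of the linear address: index into the page directory. */
unsigned pde_index_4m(uint32_t linear) { return linear >> 22; }

/* Physical address: page base from PDE bits 31:22, offset from linear
 * address bits 21:0. */
uint32_t translate_4m(uint32_t pde, uint32_t linear)
{
    return (pde & 0xFFC00000u) | (linear & 0x003FFFFFu);
}
```

For example, a page-directory entry with a page base of 00800000H maps the linear address 00412345H to the physical address 00812345H.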
NOTE
(For the Pentium® processor only.) When enabling or disabling large page
sizes, the TLBs must be invalidated (flushed) after the PSE flag in control
register CR4 has been set or cleared. Otherwise, incorrect page translation
might occur due to the processor using outdated page translation information
stored in the TLBs. Refer to Section 9.10., “Invalidating the Translation
Lookaside Buffers (TLBs)”, in Chapter 9, Memory Cache Control, for
information on how to invalidate the TLBs.
TLBs. So, placing often-used code such as the kernel in a large page frees up 4-KByte-page
TLB entries for application programs and tasks.
Figure 3-14. Format of Page-Directory and Page-Table Entries for 4-KByte Pages
and 32-Bit Physical Addresses
Figure 3-15. Format of Page-Directory Entries for 4-MByte Pages and 32-Bit Addresses
The functions of the flags and fields in the entries in Figures 3-14 and 3-15 are as follows:
Page base address, bits 12 through 31
(Page-table entries for 4-KByte pages.) Specifies the physical address of the
first byte of a 4-KByte page. The bits in this field are interpreted as the 20 most-
significant bits of the physical address, which forces pages to be aligned on
4-KByte boundaries.
(Page-directory entries for 4-KByte page tables.) Specifies the physical
address of the first byte of a page table. The bits in this field are interpreted as
the 20 most-significant bits of the physical address, which forces page tables to
be aligned on 4-KByte boundaries.
(Page-directory entries for 4-MByte pages.) Specifies the physical address of
the first byte of a 4-MByte page. Only bits 22 through 31 of this field are used
(and bits 12 through 21 are reserved and must be set to 0, for Intel Architecture
processors through the Pentium® II processor). The base address bits are inter-
preted as the 10 most-significant bits of the physical address, which forces 4-
MByte pages to be aligned on 4-MByte boundaries.
Present (P) flag, bit 0
Indicates whether the page or page table being pointed to by the entry is
currently loaded in physical memory. When the flag is set, the page is in phys-
ical memory and address translation is carried out. When the flag is clear, the
page is not in memory and, if the processor attempts to access the page, it
generates a page-fault exception (#PF).
The processor does not set or clear this flag; it is up to the operating system or
executive to maintain the state of the flag.
disabled for pages that contain memory-mapped I/O ports or that do not
provide a performance benefit when cached. The processor ignores this flag
(assumes it is set) if the CD (cache disable) flag in CR0 is set. Refer to Chapter
9, Memory Cache Control, for more information about the use of this flag.
Refer to Section 2.5. in Chapter 2, System Architecture Overview for a descrip-
tion of a companion PCD flag in control register CR3.
Accessed (A) flag, bit 5
Indicates whether a page or page table has been accessed (read from or written
to) when set. Memory management software typically clears this flag when a
page or page table is initially loaded into physical memory. The processor then
sets this flag the first time a page or page table is accessed. This flag is a
“sticky” flag, meaning that once set, the processor does not implicitly clear it.
Only software can clear this flag. The accessed and dirty flags are provided for
use by memory management software to manage the transfer of pages and page
tables into and out of physical memory.
Dirty (D) flag, bit 6
Indicates whether a page has been written to when set. (This flag is not used in
page-directory entries that point to page tables.) Memory management soft-
ware typically clears this flag when a page is initially loaded into physical
memory. The processor then sets this flag the first time a page is accessed for
a write operation. This flag is “sticky,” meaning that once set, the processor
does not implicitly clear it. Only software can clear this flag. The dirty and
accessed flags are provided for use by memory management software to
manage the transfer of pages and page tables into and out of physical memory.
Page size (PS) flag, bit 7
Determines the page size. This flag is only used in page-directory entries.
When this flag is clear, the page size is 4 KBytes and the page-directory entry
points to a page table. When the flag is set, the page size is 4 MBytes for normal
32-bit addressing (and 2 MBytes if extended physical addressing is enabled)
and the page-directory entry points to a page. If the page-directory entry points
to a page table, all the pages associated with that page table will be 4-KByte
pages.
Global (G) flag, bit 8
(Introduced in the Pentium® Pro processor.) Indicates a global page when set.
When a page is marked global and the page global enable (PGE) flag in register
CR4 is set, the page-table or page-directory entry for the page is not invalidated
in the TLB when register CR3 is loaded or a task switch occurs. This flag is
provided to prevent frequently used pages (such as pages that contain kernel or
other operating system or executive code) from being flushed from the TLB.
Only software can set or clear this flag. For page-directory entries that point to
page tables, this flag is ignored and the global characteristics of a page are set
in the page-table entries. Refer to Section 3.7., “Translation Lookaside Buffers
(TLBs)” for more information about the use of this flag. (This bit is reserved in
Pentium® and earlier Intel Architecture processors.)
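The flag positions described above can be collected into a set of masks. The constant and function names in this sketch are hypothetical; the bit positions are those given in the flag descriptions and in Figures 3-14 and 3-15, with R/W, U/S, PWT, and PCD in bits 1 through 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the flag bits of a 32-bit page-directory or
 * page-table entry. */
enum {
    PG_P   = 1u << 0,  /* present */
    PG_RW  = 1u << 1,  /* read/write */
    PG_US  = 1u << 2,  /* user/supervisor */
    PG_PWT = 1u << 3,  /* page-level write-through */
    PG_PCD = 1u << 4,  /* page-level cache disable */
    PG_A   = 1u << 5,  /* accessed; set by the processor, cleared by software */
    PG_D   = 1u << 6,  /* dirty; set by the processor, cleared by software */
    PG_PS  = 1u << 7,  /* page size; page-directory entries only */
    PG_G   = 1u << 8   /* global */
};

/* Build a page-table entry for a 4-KByte page: the 20 most-significant
 * address bits plus the low 12 flag bits. */
uint32_t make_pte_4k(uint32_t page_base, uint32_t flags)
{
    return (page_base & 0xFFFFF000u) | (flags & 0xFFFu);
}

/* Memory-management software might consult the sticky flags like this. */
bool pte_was_accessed(uint32_t pte) { return (pte & PG_A) != 0; }
bool pte_is_dirty(uint32_t pte)     { return (pte & PG_D) != 0; }

/* Only software clears the accessed and dirty flags. */
uint32_t pte_clear_a_d(uint32_t pte) { return pte & ~(uint32_t)(PG_A | PG_D); }
```

Because the page base address occupies only the upper 20 bits, any low-order bits in the address passed to make_pte_4k are discarded.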
All (nonglobal) TLBs are automatically invalidated any time the CR3 register is loaded (unless
the G flag for a page or page-table entry is set, as described later in this section). The CR3 register
can be loaded in either of two ways:
• Explicitly, using the MOV instruction, for example:
MOV CR3, EAX
CR3 fields: bits 31:5 Page-Directory-Pointer-Table Base Address; bit 4 PCD; bit 3 PWT;
bits 2:0 set to 0.
Figure 3-17. Register CR3 Format When the Physical Address Extension is Enabled
Linear address fields: Directory Pointer (bits 31:30), Directory (bits 29:21), Table (bits 20:12),
Offset (bits 11:0). CR3 (PDBR) supplies a 32-bit page-directory-pointer-table address aligned on
a 32-byte boundary. 4 PDPTEs * 512 PDEs * 512 PTEs = 2^20 pages.
Figure 3-18. Linear Address Translation With Extended Physical Addressing Enabled
(4-KByte Pages)
To select the various table entries, the linear address is divided into four sections:
• Page-directory-pointer-table entry—Bits 30 and 31 provide an offset to one of the 4 entries
in the page-directory-pointer table. The selected entry provides the base physical address
of a page directory.
• Page-directory entry—Bits 21 through 29 provide an offset to an entry in the selected page
directory. The selected entry provides the base physical address of a page table.
• Page-table entry—Bits 12 through 20 provide an offset to an entry in the selected page
table. This entry provides the base physical address of a page in physical memory.
• Page offset—Bits 0 through 11 provide an offset to a physical address in the page.
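The four-way division of the linear address used with extended physical addressing can be sketched in the same way as the two-level case. The function names are hypothetical; the field boundaries are those listed above.

```c
#include <stdint.h>

/* Bits 31:30: one of the 4 page-directory-pointer-table entries. */
unsigned pae_pdpte_index(uint32_t linear) { return linear >> 30; }

/* Bits 29:21: one of 512 entries in the selected page directory. */
unsigned pae_pde_index(uint32_t linear)   { return (linear >> 21) & 0x1FFu; }

/* Bits 20:12: one of 512 entries in the selected page table. */
unsigned pae_pte_index(uint32_t linear)   { return (linear >> 12) & 0x1FFu; }

/* Bits 11:0: byte offset within the 4-KByte page. */
unsigned pae_page_offset(uint32_t linear) { return linear & 0xFFFu; }
```

The index fields shrink from 10 bits to 9 because each table still fills one 4-KByte page but its entries are 8 bytes long instead of 4.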
Linear address fields: Directory Pointer (bits 31:30), Directory (bits 29:21), Offset (bits 20:0).
The directory-pointer entry selects a page directory; the directory entry supplies the base
physical address of the page.
Figure 3-19. Linear Address Translation With Extended Physical Addressing Enabled
(2-MByte or 4-MByte Pages)
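Figure 3-20 entry layouts (4-KByte pages, extended physical addressing):
Page-directory-pointer-table entry
  Bits 63:36 Reserved (set to 0); bits 35:32 Base Addr.; bits 31:12 Page-Directory Base Address;
  bits 11:9 Avail.; bits 8:5 Reserved; bit 4 PCD; bit 3 PWT; bits 2:1 Res.; bit 0 = 1
Page-directory entry, low doubleword
  Bits 31:12 Page-Table Base Address; bits 11:9 Avail.; bits 8:6 = 0; bit 5 A; bit 4 PCD;
  bit 3 PWT; bit 2 U/S; bit 1 R/W; bit 0 P
Page-table entry, low doubleword
  Bits 31:12 Page Base Address; bits 11:9 Avail.; bit 8 G; bit 7 = 0; bit 6 D; bit 5 A;
  bit 4 PCD; bit 3 PWT; bit 2 U/S; bit 1 R/W; bit 0 P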
The base physical address in an entry specifies the following, depending on the type of entry:
• Page-directory-pointer-table entry—the physical address of the first byte of a
4-KByte page directory.
• Page-directory entry—the physical address of the first byte of a 4-KByte page table or a
2-MByte page.
• Page-table entry—the physical address of the first byte of a 4-KByte page.
For all table entries (except for page-directory entries that point to 2-MByte or 4-MByte pages),
the bits in the page base address are interpreted as the 24 most-significant bits of a 36-bit phys-
ical address, which forces page tables and pages to be aligned on 4-KByte boundaries. When a
page-directory entry points to a 2-MByte or 4-MByte page, the base address is interpreted as the
15 most-significant bits of a 36-bit physical address, which forces pages to be aligned on 2-
MByte or 4-MByte boundaries.
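The two interpretations of the base address field can be sketched as masks applied to a 64-bit table entry. The function names are hypothetical; the field widths (24 bits for 4-KByte alignment, 15 bits for 2-MByte alignment) follow from the 36-bit physical address described above.

```c
#include <stdint.h>

/* Entry that points to a page directory, page table, or 4-KByte page:
 * physical address bits 35:12 come from entry bits 35:12 (24 bits). */
uint64_t pae_base_4k(uint64_t entry)
{
    return entry & 0x0000000FFFFFF000ULL;
}

/* Page-directory entry that points to a 2-MByte page:
 * physical address bits 35:21 come from entry bits 35:21 (15 bits). */
uint64_t pae_base_2m(uint64_t entry)
{
    return entry & 0x0000000FFFE00000ULL;
}
```

In both cases the low-order bits that are masked off hold flags or reserved bits, not address bits.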
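Figure 3-21 entry layouts (2-MByte or 4-MByte pages, extended physical addressing):
Page-directory-pointer-table entry
  Bits 63:36 Reserved (set to 0); bits 35:32 Base Addr.; bits 31:12 Page Directory Base Address;
  bits 11:9 Avail.; bits 8:5 Reserved; bit 4 PCD; bit 3 PWT; bits 2:1 Res.; bit 0 = 1
Page-directory entry, low doubleword
  Bits 31:21 Page Base Address; bits 20:12 Reserved (set to 0); bits 11:9 Avail.; bit 8 G;
  bit 7 = 1; bit 6 D; bit 5 A; bit 4 PCD; bit 3 PWT; bit 2 U/S; bit 1 R/W; bit 0 P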
The present (P) flag (bit 0) in all page-directory-pointer-table entries must be set to 1 anytime
extended physical addressing mode is enabled; that is, whenever the PAE flag (bit 5 in register
CR4) and the PG flag (bit 31 in register CR0) are set. If the P flag is not set in all 4 page-direc-
tory-pointer-table entries in the page-directory-pointer table when extended physical addressing
is enabled, a general-protection exception (#GP) is generated.
The page size (PS) flag (bit 7) in a page-directory entry determines if the entry points to a page
table or a 2-MByte or 4-MByte page. When this flag is clear, the entry points to a page table;
when the flag is set, the entry points to a 2-MByte or 4-MByte page. This flag allows 4-KByte,
2-MByte, or 4-MByte pages to be mixed within one set of paging tables.
Accessed (A) and dirty (D) flags (bits 5 and 6) are provided for table entries that point to pages.
Bits 9, 10, and 11 in all the table entries for the physical address extension are available for use
by software. (When the present flag is clear, bits 1 through 63 are available to software.) All bits
in Figure 3-14 that are marked reserved or 0 should be set to 0 by software and not accessed by
software. When the PSE and/or PAE flags in control register CR4 are set, the processor gener-
ates a page fault (#PF) if reserved bits in page-directory and page-table entries are not set to 0,
and it generates a general-protection exception (#GP) if reserved bits in a page-directory-
pointer-table entry are not set to 0.
vendors to address physical memory above 4-GBytes without requiring major design changes,
but has practical limitations with respect to demand paging.
The P6 family of processors’ physical address extension (PAE) feature provides generic access
to a 36-bit physical address space. However, it requires expansion of the page-directory and
page-table entries to an 8-byte format (64 bit), and the addition of a page-directory-pointer table,
resulting in another level of indirection to address translation.
For P6-family processors that support the 36-bit PSE feature, the virtual memory architecture is
extended to support 4-MByte page size granularity in combination with 36-bit physical
addressing. Note that some P6-family processors do not support this feature. For information
about determining a processor’s feature support, refer to the following documents:
• AP-485, Intel Processor Identification and the CPUID Instruction
• Addendum—Intel Architecture Software Developer’s Manual, Volume 1: Basic Architecture
For information about the virtual memory architecture features of P6-family processors, refer to
Chapter 3 of the Intel Architecture Software Developer’s Manual, Volume 3: System Programming
Guide.
To use the 36-bit PSE feature, the PAE feature must be cleared (as indicated in Table 3-4).
However, the 36-bit PSE in no way affects the PAE feature. Existing operating systems and
software that use the PAE will continue to have compatible functionality and features with
P6-family processors that support 36-bit PSE. Specifically, the Page-Directory Entry (PDE) format
when PAE is enabled for 2-MByte or 4-MByte pages is exactly as depicted in Figure 3-21 of the
Intel Architecture Software Developer’s Manual, Volume 3: System Programming Guide.
No matter which 36-bit addressing feature is used (PAE or 36-bit PSE), the linear address space
of the processor remains at 32 bits. Applications must partition the address space of their
workloads across multiple operating system processes to take advantage of the additional physical
memory provided in the system.
The 36-bit PSE feature extends the PDE format of the Intel Architecture for 4-MByte pages and
32-bit addresses by utilizing bits 16-13 (formerly reserved bits that were required to be zero) to
extend the physical address without requiring an 8-byte page-directory entry. Therefore, with
the 36-bit PSE feature, a page directory can contain up to 1024 entries, each pointing to a 4-
MByte page that can exist anywhere in the 36-bit physical address space of the processor.
Figure 3-22 shows the difference between PDE formats for 4-MByte pages on P6-family proces-
sors that support the 36-bit PSE feature compared to P6-family processors that do not support
the 36-bit PSE feature (i.e., 32-bit addressing).
Figure 3-22 also shows the linear address mapping to 4-MByte pages when the 36-bit PSE is
enabled. The base physical address of the 4-MByte page is contained in the PDE. PA-2 (bits 13-
16) is used to provide the upper four bits (bits 32-35) of the 36-bit physical address. PA-1 (bits
22-31) continues to provide the next ten bits (bits 22-31) of the physical address for the 4-MByte
page. The offset into the page is provided by the lower 22 bits of the linear address. This scheme
eliminates the second level of indirection caused by the use of 4-KByte page tables.
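The way PA-1 and PA-2 combine into a 36-bit page base can be sketched as follows. The function names are hypothetical, and the PAT bit (bit 12) is left clear in this illustration.

```c
#include <stdint.h>

/* Recover the 36-bit base physical address of a 4-MByte page:
 * PA-2 (PDE bits 16:13) supplies address bits 35:32 and
 * PA-1 (PDE bits 31:22) supplies address bits 31:22. */
uint64_t pse36_page_base(uint32_t pde)
{
    uint64_t upper = (uint64_t)((pde >> 13) & 0xFu) << 32;
    return upper | (pde & 0xFFC00000u);
}

/* Build a PDE for a 4-MByte page anywhere in the 36-bit physical address
 * space. `flags` supplies bits 11:8 and 6:0; the PS flag (bit 7) is
 * always set. */
uint32_t pse36_make_pde(uint64_t page_base, uint32_t flags)
{
    uint32_t pa1 = (uint32_t)(page_base & 0xFFC00000u);
    uint32_t pa2 = (uint32_t)((page_base >> 32) & 0xFu) << 13;
    return pa1 | pa2 | (1u << 7) | (flags & 0xF7Fu);
}
```

For a page below 4 GBytes, PA-2 is zero and the resulting entry is identical to an ordinary 4-MByte page-directory entry.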
PDE format for processors that support 36-bit addressing for 4-MByte pages:
  Bits 31:22 PA-1; bits 21:17 Reserved; bits 16:13 PA-2; bit 12 PAT; bits 11:8 as in Figure 3-15;
  bit 7 PS=1; bits 6:0 as in Figure 3-15
PDE format for processors that support 32-bit addressing for 4-MByte pages:
  Bits 31:22 Base Page Address; bits 21:12 Reserved; bits 11:8 as in Figure 3-15; bit 7 PS=1;
  bits 6:0 as in Figure 3-15
Figure 3-22. PDE Format Differences between 36-bit and 32-bit addressing
Notes:
1. PA-2 = Bits 35-32 of the base physical address for the 4-MByte page (correspond to bits 16-13)
2. PA-1 = Bits 31-22 of the base physical address for the 4-MByte page
3. PAT = Bit 12 used as the Most Significant Bit of the index into Page Attribute Table (PAT); see Section
10.2.
4. PS = Bit 7 is the Page Size Bit—indicates 4-MByte page (must be set to 1)
5. Reserved = Bits 21-17 are reserved for future expansion
6. No change in format or meaning of bits 11-8 and 6-0; refer to Figure 3-15 for details.
The PSE-36 feature is transparent to existing operating systems that utilize 4-MByte pages,
because unused bits in PA-2 are currently enforced as zero by Intel processors. The feature
requires 4-MByte pages aligned on a 4-MByte boundary and 4 MBytes of physically contiguous
memory. Therefore, the ten bits of PA-1 are sufficient to specify the base physical address of any
4-MByte page below 4 GBytes. An operating system can easily support addresses greater than
4 GBytes simply by providing the upper 4 bits of the physical address in PA-2 when creating a
PDE for a 4-MByte page.
Figure 3-23 shows the linear address mapping to 4-MByte pages when the 36-bit PSE is enabled.
Figure 3-23 diagram: CR3 locates the page directory; the selected page-directory entry uses the
field boundaries 31:22, 21:17, 16:13, 12, 11:8, 7, and 6:0.
Diagram: segment descriptors (Seg. Descript.) paired with page-directory entries (PDE), each
PDE leading to page-table entries (PTE) that map page frames.
Figure 3-24. Memory Management Convention That Assigns a Page Table to Each
Segment
CHAPTER 4
PROTECTION
In protected mode, the Intel Architecture provides a protection mechanism that operates at both
the segment level and the page level. This protection mechanism provides the ability to limit
access to certain segments or pages based on privilege levels (four privilege levels for segments
and two privilege levels for pages). For example, critical operating-system code and data can be
protected by placing them in more privileged segments than those that contain applications
code. The processor’s protection mechanism will then prevent application code from accessing
the operating-system code and data in any but a controlled, defined manner.
Segment and page protection can be used at all stages of software development to assist in local-
izing and detecting design problems and bugs. It can also be incorporated into end-products to
offer added robustness to operating systems, utilities software, and applications software.
When the protection mechanism is used, each memory reference is checked to verify that it
satisfies various protection checks. All checks are made before the memory cycle is started; any
violation results in an exception. Because checks are performed in parallel with address transla-
tion, there is no performance penalty. The protection checks that are performed fall into the
following categories:
• Limit checks.
• Type checks.
• Privilege level checks.
• Restriction of addressable domain.
• Restriction of procedure entry-points.
• Restriction of instruction set.
All protection violations result in an exception being generated. Refer to Chapter 5, Interrupt
and Exception Handling for an explanation of the exception mechanism. This chapter describes
the protection mechanism and the violations which lead to exceptions.
The following sections describe the protection mechanism available in protected mode. Refer to
Chapter 16, 8086 Emulation for information on protection in real-address and virtual-8086
mode.
• Current privilege level (CPL) field. (Bits 0 and 1 of the CS segment register.) Indicates the
privilege level of the currently executing program or procedure. The term current privilege
level (CPL) refers to the setting of this field.
• User/supervisor (U/S) flag. (Bit 2 of a page-directory or page-table entry.) Determines the
type of page: user or supervisor.
• Read/write (R/W) flag. (Bit 1 of a page-directory or page-table entry.) Determines the type
of access allowed to a page: read only or read-write.
Figure 4-1 shows the location of the various fields and flags in the data, code, and system-
segment descriptors; Figure 3-6 in Chapter 3, Protected-Mode Memory Management shows the
location of the RPL (or CPL) field in a segment selector (or the CS register); and Figure 3-14 in
Chapter 3, Protected-Mode Memory Management shows the location of the U/S and R/W flags
in the page-directory and page-table entries.
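The bit positions listed above can be read with simple masks. The following is a minimal sketch in C, not processor or operating-system code; the function names (selector_privilege, selector_with_rpl, entry_is_writable, entry_is_user) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 0 and 1 of a segment selector hold the RPL; in the CS register
   the same two bits hold the CPL. */
static unsigned selector_privilege(uint16_t selector)
{
    return selector & 0x3u;
}

/* Return the selector with its RPL field replaced by rpl (0 through 3). */
static uint16_t selector_with_rpl(uint16_t selector, unsigned rpl)
{
    assert(rpl <= 3);
    return (uint16_t)((selector & ~0x3u) | rpl);
}

/* R/W flag, bit 1 of a page-directory or page-table entry:
   0 = read only, 1 = read-write. */
static bool entry_is_writable(uint32_t entry)
{
    return ((entry >> 1) & 1u) != 0;
}

/* U/S flag, bit 2 of a page-directory or page-table entry:
   0 = supervisor, 1 = user. */
static bool entry_is_user(uint32_t entry)
{
    return ((entry >> 2) & 1u) != 0;
}
```

For instance, a selector value of 001BH carries a privilege field of 3, and an entry whose low bits are 101B describes a read-only user page.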
[Figure 4-1, summarized from the original diagram. Bit positions are within the second doubleword (offset 4) of each descriptor.]
Data-segment descriptor: Base 31:24 (bits 31:24), G (23), B (22), 0 (21), AVL (20), Limit 19:16 (bits 19:16), P (15), DPL (14:13), S = 1 (12), Type = 0 E W A (bits 11:8), Base 23:16 (bits 7:0).
Code-segment descriptor: same layout, with D in bit 22 and Type = 1 C R A (bits 11:8).
System-segment descriptor: Base 31:24 (bits 31:24), G (23), 0 (21), Limit 19:16 (bits 19:16), P (15), DPL (14:13), S = 0 (12), Type (bits 11:8), Base 23:16 (bits 7:0).
Many different styles of protection schemes can be implemented with these fields and flags.
When the operating system creates a descriptor, it places values in these fields and flags in
keeping with the particular protection style chosen for an operating system or executive.
Application programs do not generally access or modify these fields and flags.
The following sections describe how the processor uses these fields and flags to perform the
various categories of checks described in the introduction to this chapter.
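As an illustration of how software might read these fields, the sketch below packs and unpacks the P, DPL, S, and Type fields of the second doubleword of a descriptor. It is a hedged example only: the helper names (access_bits, desc_type, desc_dpl, and so on) are hypothetical, and the bit positions are those of the standard segment-descriptor layout (Type in bits 11:8, S in bit 12, DPL in bits 14:13, P in bit 15).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build bits 15:8 of the second doubleword from P, DPL, S, and Type. */
static uint32_t access_bits(bool present, unsigned dpl, bool s, unsigned type)
{
    assert(dpl <= 3);
    assert(type <= 0xF);
    return ((present ? 1u : 0u) << 15) | ((uint32_t)dpl << 13) |
           ((s ? 1u : 0u) << 12) | ((uint32_t)type << 8);
}

/* Field extractors for the second doubleword (offset 4) of a descriptor. */
static unsigned desc_type(uint32_t high)    { return (high >> 8) & 0xFu; }
static bool     desc_s(uint32_t high)       { return ((high >> 12) & 1u) != 0; }
static unsigned desc_dpl(uint32_t high)     { return (high >> 13) & 0x3u; }
static bool     desc_present(uint32_t high) { return ((high >> 15) & 1u) != 0; }

/* Code segment: S = 1 and the most significant type bit set. */
static bool desc_is_code(uint32_t high)
{
    return desc_s(high) && (desc_type(high) & 0x8u) != 0;
}

/* Conforming code segment: code segment with the C flag (type bit 2) set. */
static bool desc_is_conforming_code(uint32_t high)
{
    return desc_is_code(high) && (desc_type(high) & 0x4u) != 0;
}

/* Writable data segment: S = 1, most significant type bit clear, W flag set. */
static bool desc_is_writable_data(uint32_t high)
{
    return desc_s(high) && (desc_type(high) & 0x8u) == 0 &&
           (desc_type(high) & 0x2u) != 0;
}
```

A second doubleword of 00CF9A00H, for example, decodes as a present, nonconforming code segment with a DPL of 0.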
— The LAR instruction must reference a segment or gate descriptor for an LDT, TSS,
call gate, task gate, code segment, or data segment.
— The LSL instruction must reference a segment descriptor for an LDT, TSS, code
segment, or data segment.
— IDT entries must be interrupt, trap, or task gates.
• During certain internal operations. For example:
— On a far call or far jump (executed with a far CALL or far JMP instruction), the
processor determines the type of control transfer to be carried out (call or jump to
another code segment, a call or jump through a gate, or a task switch) by checking the
type field in the segment (or gate) descriptor pointed to by the segment (or gate)
selector given as an operand in the CALL or JMP instruction. If the descriptor type is
for a code segment or call gate, a call or jump to another code segment is indicated; if
the descriptor type is for a TSS or task gate, a task switch is indicated.
— On a call or jump through a call gate (or on an interrupt- or exception-handler call
through a trap or interrupt gate), the processor automatically checks that the segment
descriptor being pointed to by the gate is for a code segment.
— On a call or jump to a new task through a task gate (or on an interrupt- or exception-
handler call to a new task through a task gate), the processor automatically checks that
the segment descriptor being pointed to by the task gate is for a TSS.
— On a call or jump to a new task by a direct reference to a TSS, the processor automati-
cally checks that the segment descriptor being pointed to by the CALL or JMP
instruction is for a TSS.
— On return from a nested task (initiated by an IRET instruction), the processor checks
that the previous task link field in the current TSS points to a TSS.
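The type-based dispatch that the processor performs on a far CALL or far JMP can be pictured as a small classifier. The sketch below is illustrative only; the enum and function names are hypothetical. The 4-bit system-descriptor type values it uses (1, 3, 9, and 11 for a TSS, 4 and 12 for a call gate, 5 for a task gate) are the architecture's standard encodings and do not appear in the text above. Further checks that follow the classification, such as privilege checks or the busy state of a TSS, are outside the scope of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum transfer_kind {
    TRANSFER_CODE_SEGMENT,  /* call or jump to another code segment */
    TRANSFER_CALL_GATE,     /* call or jump through a call gate */
    TRANSFER_TASK_SWITCH,   /* TSS or task gate */
    TRANSFER_INVALID        /* any other descriptor type is rejected */
};

/* Classify the target of a far CALL or far JMP from the S flag and the
   4-bit type field of the descriptor that the selector points to. */
static enum transfer_kind classify_far_transfer(bool s_flag, unsigned type)
{
    assert(type <= 0xF);
    if (s_flag) {
        /* S = 1: code or data segment; only code segments are valid targets. */
        return (type & 0x8u) ? TRANSFER_CODE_SEGMENT : TRANSFER_INVALID;
    }
    switch (type) {
    case 0x4: case 0xC:                 /* 16-bit and 32-bit call gates */
        return TRANSFER_CALL_GATE;
    case 0x1: case 0x3:                 /* 16-bit TSS (available, busy) */
    case 0x9: case 0xB:                 /* 32-bit TSS (available, busy) */
    case 0x5:                           /* task gate */
        return TRANSFER_TASK_SWITCH;
    default:                            /* LDT, interrupt gate, trap gate, reserved */
        return TRANSFER_INVALID;
    }
}
```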
[Figure: Protection Rings. Level 0: operating system kernel. Levels 1 and 2: operating system services. Level 3: applications.]
The processor uses privilege levels to prevent a program or task operating at a lesser privilege
level from accessing a segment with a greater privilege, except under controlled situations.
When the processor detects a privilege level violation, it generates a general-protection excep-
tion (#GP).
To carry out privilege-level checks between code segments and data segments, the processor
recognizes the following three types of privilege levels:
• Current privilege level (CPL). The CPL is the privilege level of the currently executing
program or task. It is stored in bits 0 and 1 of the CS and SS segment registers. Normally,
the CPL is equal to the privilege level of the code segment from which instructions are
being fetched. The processor changes the CPL when program control is transferred to a
code segment with a different privilege level. The CPL is treated slightly differently when
accessing conforming code segments. Conforming code segments can be accessed from
any privilege level that is equal to or numerically greater (less privileged) than the DPL of
the conforming code segment. Also, the CPL is not changed when the processor accesses a
conforming code segment that has a different privilege level than the CPL.
• Descriptor privilege level (DPL). The DPL is the privilege level of a segment or gate. It is
stored in the DPL field of the segment or gate descriptor for the segment or gate. When the
currently executing code segment attempts to access a segment or gate, the DPL of the
segment or gate is compared to the CPL and RPL of the segment or gate selector (as
described later in this section). The DPL is interpreted differently, depending on the type of
segment or gate being accessed:
— Data segment. The DPL indicates the numerically highest privilege level that a
program or task can have to be allowed to access the segment. For example, if the DPL
of a data segment is 1, only programs running at a CPL of 0 or 1 can access the
segment.
— Nonconforming code segment (without using a call gate). The DPL indicates the
privilege level that a program or task must be at to access the segment. For example, if
the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0
can access the segment.
— Call gate. The DPL indicates the numerically highest privilege level that the currently
executing program or task can be at and still be able to access the call gate. (This is the
same access rule as for a data segment.)
— Conforming code segment and nonconforming code segment accessed through a
call gate. The DPL indicates the numerically lowest privilege level that a program or
task can have to be allowed to access the segment. For example, if the DPL of a
conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the
segment.
— TSS. The DPL indicates the numerically highest privilege level that the currently
executing program or task can be at and still be able to access the TSS. (This is the
same access rule as for a data segment.)
• Requested privilege level (RPL). The RPL is an override privilege level that is assigned
to segment selectors. It is stored in bits 0 and 1 of the segment selector. The processor
checks the RPL along with the CPL to determine if access to a segment is allowed. Even if
the program or task requesting access to a segment has sufficient privilege to access the
segment, access is denied if the RPL is not of sufficient privilege level. That is, if the RPL
of a segment selector is numerically greater than the CPL, the RPL overrides the CPL, and
vice versa. The RPL can be used to ensure that privileged code does not access a segment
on behalf of an application program unless the program itself has access privileges for that
segment. Refer to Section 4.10.4., “Checking Caller Access Privileges (ARPL
Instruction)” for a detailed description of the purpose and typical use of the RPL.
Privilege levels are checked when the segment selector of a segment descriptor is loaded into a
segment register. The checks used for data access differ from those used for transfers of program
control among code segments; therefore, the two kinds of accesses are considered separately in
the following sections.
(Segment registers can be loaded with the MOV, POP, LDS, LES, LFS, LGS, and LSS instruc-
tions.) Before the processor loads a segment selector into a segment register, it performs a priv-
ilege check (refer to Figure 4-3) by comparing the privilege levels of the currently running
program or task (the CPL), the RPL of the segment selector, and the DPL of the segment’s
segment descriptor. The processor loads the segment selector into the segment register if the
DPL is numerically greater than or equal to both the CPL and the RPL. Otherwise, a general-
protection fault is generated and the segment register is not loaded.
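The rule in the preceding paragraph reduces to two comparisons. The following is a minimal sketch of that rule in C; the function name data_segment_load_allowed is hypothetical, and the sketch models only the privilege comparison, not the type or limit checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege check made before a data-segment selector is loaded into a
   segment register: the DPL must be numerically greater than or equal to
   both the CPL and the RPL. A false result corresponds to a
   general-protection fault. */
static bool data_segment_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    return dpl >= cpl && dpl >= rpl;
}
```

With a data segment whose DPL is 2, a program at CPL 1 using a selector with an RPL of 1 passes the check, while a program at CPL 0 using a selector with an RPL of 3 fails it because of the RPL alone.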
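[Figure 4-3 diagram labels: CPL (CS register) and RPL (segment selector for data segment), which the privilege check compares with the DPL of the data segment's descriptor.]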
Figure 4-4 shows four procedures (located in code segments A, B, C, and D), each running at
a different privilege level and each attempting to access the same data segment.
• The procedure in code segment A is able to access data segment E using segment selector
E1, because the CPL of code segment A and the RPL of segment selector E1 are equal to
the DPL of data segment E.
• The procedure in code segment B is able to access data segment E using segment selector
E2, because the CPL of code segment B and the RPL of segment selector E2 are both
numerically lower (more privileged) than the DPL of data segment E. A code segment
B procedure can also access data segment E using segment selector E1.
• The procedure in code segment C is not able to access data segment E using segment
selector E3 (dotted line), because the CPL of code segment C and the RPL of segment
selector E3 are both numerically greater (less privileged) than the DPL of data
segment E. Even if a code segment C procedure were to use segment selector E1 or E2,
such that the RPL would be acceptable, it still could not access data segment E because its
CPL is not privileged enough.
• The procedure in code segment D should be able to access data segment E because code
segment D’s CPL is numerically less than the DPL of data segment E. However, the RPL
of segment selector E3 (which the code segment D procedure is using to access data
segment E) is numerically greater than the DPL of data segment E, so access is not
allowed. If the code segment D procedure were to use segment selector E1 or E2 to access
the data segment, access would be allowed.
[Diagram values: data segment E has DPL=2. Code segment A runs at CPL=2 and uses segment selector E1 (RPL=2); code segment B runs at CPL=1 and uses segment selector E2 (RPL=1); code segment C runs at CPL=3 and uses segment selector E3 (RPL=3); code segment D runs at CPL=0. Level 3 is the lowest privilege and level 0 the highest.]
Figure 4-4. Examples of Accessing Data Segments From Various Privilege Levels
As demonstrated in the previous examples, the addressable domain of a program or task varies
as its CPL changes. When the CPL is 0, data segments at all privilege levels are accessible; when
the CPL is 1, only data segments at privilege levels 1 through 3 are accessible; when the CPL is
3, only data segments at privilege level 3 are accessible.
The RPL of a segment selector can always override the addressable domain of a program or task.
When properly used, RPLs can prevent problems caused by accidental (or intentional) use of
segment selectors for privileged data segments by less privileged programs or procedures.
It is important to note that the RPL of a segment selector for a data segment is under software
control. For example, an application program running at a CPL of 3 can set the RPL for a data-
segment selector to 0. With the RPL set to 0, only the CPL checks, not the RPL checks, will
provide protection against deliberate, direct attempts to violate privilege-level security for the
data segment. To prevent these types of privilege-level-check violations, a program or procedure
can check access privileges whenever it receives a data-segment selector from another proce-
dure (refer to Section 4.10.4., “Checking Caller Access Privileges (ARPL Instruction)”).
• The target operand points to a TSS, which contains the segment selector for the target code
segment.
• The target operand points to a task gate, which points to a TSS, which in turn contains the
segment selector for the target code segment.
The following sections describe the first two types of references. Refer to Section 6.3., “Task
Switching” in Chapter 6, Task Management for information on transferring program control
through a task gate and/or TSS.
[Diagram labels: CPL (CS register), RPL (segment selector for code segment), and DPL and C flag (destination code-segment descriptor), all feeding the privilege check.]
Figure 4-5. Privilege Check for Control Transfer Without Using a Gate
• The DPL of the segment descriptor for the destination code segment that contains the
called procedure.
• The RPL of the segment selector of the destination code segment.
• The conforming (C) flag in the segment descriptor for the destination code segment, which
determines whether the segment is a conforming (C flag is set) or nonconforming (C flag is
clear) code segment. (Refer to Section 3.4.3.1., “Code- and Data-Segment Descriptor
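[Diagram for the control-transfer examples (Figure 4-6 in the text below); only part of its labels survive: code segment B (CPL=3) with segment selectors C2 and D2 (both RPL=3), and code segment D, a conforming code segment. Level 3 is the lowest privilege and level 0 the highest.]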
The RPL of the segment selector that points to a nonconforming code segment has a limited
effect on the privilege check. The RPL must be numerically less than or equal to the CPL of the
calling procedure for a successful control transfer to occur. So, in the example in Figure 4-6, the
RPLs of segment selectors C1 and C2 could legally be set to 0, 1, or 2, but not to 3.
When the segment selector of a nonconforming code segment is loaded into the CS register, the
privilege level field is not changed; that is, it remains at the CPL (which is the privilege level of
the calling procedure). This is true, even if the RPL of the segment selector is different from the
CPL.
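The privilege check for a direct far call or jump (one that does not go through a gate) can be summarized in a few lines. This is a hedged sketch, not the processor's algorithm: the function name direct_transfer_allowed is hypothetical, and the statement that the RPL is not checked for a conforming target is taken from the architecture's general rules for conforming segments rather than from the paragraphs above.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege check for a far CALL or JMP directly to a code segment.
   In both permitted cases the CPL is left unchanged by the transfer. */
static bool direct_transfer_allowed(unsigned cpl, unsigned rpl,
                                    unsigned dpl, bool conforming)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    if (conforming) {
        /* Conforming: callable from the same or a less privileged level.
           Assumption: the RPL plays no part in this case. */
        return dpl <= cpl;
    }
    /* Nonconforming: the CPL must equal the DPL, and the RPL must be
       numerically less than or equal to the CPL. */
    return dpl == cpl && rpl <= cpl;
}
```

For a caller at CPL 2 and a nonconforming target with a DPL of 2, RPL values of 0, 1, and 2 are accepted and an RPL of 3 is rejected, which matches the statement above about segment selectors C1 and C2.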
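[Call-gate descriptor layout, summarized from the original diagram.]
Second doubleword (offset 4): Offset in Segment 31:16 (bits 31:16), P (15), DPL (14:13), Type = 0 1 1 0 0 (bits 12:8), 0 0 0 (bits 7:5), Param. Count (bits 4:0).
First doubleword (offset 0): Segment Selector (bits 31:16), Offset in Segment 15:00 (bits 15:0).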
The segment selector field in a call gate specifies the code segment to be accessed. The offset
field specifies the entry point in the code segment. This entry point is generally to the first
instruction of a specific procedure. The DPL field indicates the privilege level of the call gate,
which in turn is the privilege level required to access the selected procedure through the gate.
The P flag indicates whether the call-gate descriptor is valid. (The presence of the code segment
to which the gate points is indicated by the P flag in the code segment’s descriptor.) The parameter
count field indicates the number of parameters to copy from the calling procedure’s stack to
the new stack if a stack switch occurs (refer to Section 4.8.5., “Stack Switching”). The parameter
count specifies the number of words for 16-bit call gates and doublewords for 32-bit call gates.
Note that the P flag in a gate descriptor is normally set to 1. If it is set to 0, a not-present
(#NP) exception is generated when a program attempts to access the descriptor. The operating
system can use the P flag for special purposes. For example, it could be used to track the number
of times the gate is used. Here, the P flag is initially set to 0, causing a trap to the not-present
exception handler. The exception handler then increments a counter and sets the P flag to 1, so
that on returning from the handler, the gate descriptor will be valid.
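To make the field layout concrete, the sketch below assembles a 32-bit call-gate descriptor from its fields. It is an illustration under stated assumptions: the function name make_call_gate32 is hypothetical, the descriptor is returned as a 64-bit value with the second doubleword in the upper half, and the type value 1100B (with bit 12 clear) is the 32-bit call-gate encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble a 32-bit call-gate descriptor.
   selector:    segment selector of the destination code segment
   offset:      entry point within that code segment
   dpl:         privilege level of the gate (0 through 3)
   param_count: doublewords to copy on a stack switch (0 through 31)
   present:     value of the P flag */
static uint64_t make_call_gate32(uint16_t selector, uint32_t offset,
                                 unsigned dpl, unsigned param_count,
                                 bool present)
{
    assert(dpl <= 3);
    assert(param_count <= 31);

    /* First doubleword: selector in bits 31:16, offset 15:00 in bits 15:0. */
    uint32_t low = ((uint32_t)selector << 16) | (offset & 0xFFFFu);

    /* Second doubleword: offset 31:16, P, DPL, type 01100B, parameter count. */
    uint32_t high = (offset & 0xFFFF0000u)
                  | ((present ? 1u : 0u) << 15)
                  | ((uint32_t)dpl << 13)
                  | (0xCu << 8)
                  | (uint32_t)param_count;

    return ((uint64_t)high << 32) | low;
}
```

An operating system that wants to count gate usage, as described above, could create the gate with present set to false and rebuild it with present set to true inside its not-present handler.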
[Diagram labels: Descriptor Table; Call-Gate Descriptor (Segment Selector, Offset); Procedure Entry Point. The segment selector and offset held in the call-gate descriptor locate the procedure entry point.]
[Diagram labels: CPL (CS register), RPL (call-gate selector), DPL (destination code-segment descriptor).]
Figure 4-9. Privilege Check for Control Transfer with Call Gate
The privilege checking rules are different depending on whether the control transfer was initi-
ated with a CALL or a JMP instruction, as shown in Table 4-1.
The DPL field of the call-gate descriptor specifies the numerically highest privilege level from
which a calling procedure can access the call gate; that is, to access a call gate, the CPL of a
calling procedure must be equal to or less than the DPL of the call gate. For example, in Figure
4-12, call gate A has a DPL of 3. So calling procedures at all CPLs (0 through 3) can access this
call gate, which includes calling procedures in code segments A, B, and C. Call gate B has a
DPL of 2, so only calling procedures at a CPL of 0, 1, or 2 can access call gate B, which includes
calling procedures in code segments B and C. The dotted line shows that a calling procedure in
code segment A cannot access call gate B.
The RPL of the segment selector to a call gate must satisfy the same test as the CPL of the calling
procedure; that is, the RPL must be less than or equal to the DPL of the call gate. In the example
in Figure 4-12, a calling procedure in code segment C can access call gate B using gate selector
B2 or B1, but it could not use gate selector B3 to access call gate B.
If the privilege checks between the calling procedure and call gate are successful, the processor
then checks the DPL of the code-segment descriptor against the CPL of the calling procedure.
Here, the privilege check rules vary between CALL and JMP instructions. Only CALL
instructions can use call gates to transfer program control to more privileged (numerically lower
privilege level) nonconforming code segments; that is, to nonconforming code segments with a DPL
less than the CPL. A JMP instruction can use a call gate only to transfer program control to a
nonconforming code segment with a DPL equal to the CPL. CALL and JMP instructions can both
transfer program control to a more privileged conforming code segment; that is, to a conforming
code segment with a DPL less than or equal to the CPL.
If a call is made to a more privileged (numerically lower privilege level) nonconforming desti-
nation code segment, the CPL is lowered to the DPL of the destination code segment and a stack
switch occurs (refer to Section 4.8.5., “Stack Switching”). If a call or jump is made to a more
privileged conforming destination code segment, the CPL is not changed and no stack switch
occurs.
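The call-gate rules in the preceding paragraphs can be collected into one decision function. The sketch below is a simplified model, not the processor's microcode; the type and function names (gate_op, gate_result, call_gate_check) are hypothetical, and type, limit, and present checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>

enum gate_op { GATE_CALL, GATE_JMP };

struct gate_result {
    bool     allowed;       /* false corresponds to a protection exception */
    unsigned new_cpl;       /* CPL after the transfer */
    bool     stack_switch;  /* true when a stack switch occurs */
};

static struct gate_result call_gate_check(enum gate_op op, unsigned cpl,
                                          unsigned rpl, unsigned gate_dpl,
                                          unsigned code_dpl, bool conforming)
{
    struct gate_result r = { false, cpl, false };
    assert(cpl <= 3 && rpl <= 3 && gate_dpl <= 3 && code_dpl <= 3);

    /* Access to the gate itself: both CPL and RPL must be numerically
       less than or equal to the DPL of the call gate. */
    if (cpl > gate_dpl || rpl > gate_dpl)
        return r;

    if (conforming) {
        /* CALL or JMP: DPL <= CPL; the CPL is unchanged, no stack switch. */
        r.allowed = code_dpl <= cpl;
        return r;
    }
    if (op == GATE_CALL) {
        /* CALL to a nonconforming segment: DPL <= CPL. */
        if (code_dpl > cpl)
            return r;
        r.allowed = true;
        if (code_dpl < cpl) {
            r.new_cpl = code_dpl;   /* the CPL is lowered to the DPL */
            r.stack_switch = true;
        }
        return r;
    }
    /* JMP to a nonconforming segment: DPL must equal the CPL. */
    r.allowed = code_dpl == cpl;
    return r;
}
```

A CALL from CPL 1 through a gate with a DPL of 2 to a nonconforming segment with a DPL of 0 is allowed, lowers the CPL to 0, and switches stacks; the same transfer attempted with a JMP is rejected.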
[Diagram values from the call-gate example: code segment B (CPL=2) uses gate selector B1 (RPL=2) and code segment C (CPL=1) uses gate selector B2 (RPL=1) to reach call gate B (DPL=2). Code segment D (DPL=0) is a conforming code segment, for which no stack switch occurs; code segment E (DPL=0) is a nonconforming code segment, for which a stack switch occurs. Level 0 is the highest privilege.]
Call gates allow a single code segment to have procedures that can be accessed at different priv-
ilege levels. For example, an operating system located in a code segment may have some
services which are intended to be used by both the operating system and application software
(such as procedures for handling character I/O). Call gates for these procedures can be set up
that allow access at all privilege levels (0 through 3). More privileged call gates (with DPLs of
0 or 1) can then be set up for other operating system services that are intended to be used only
by the operating system (such as procedures that initialize device drivers).
The stack must have enough space to contain many frames of these items, because
procedures often call other procedures, and an operating system may support nesting of multiple
interrupts. Each stack should be large enough to allow for the worst-case nesting scenario at its
privilege level.
(If the operating system does not use the processor’s multitasking mechanism, it still must create
at least one TSS for this stack-related purpose.)
When a procedure call through a call gate results in a change in privilege level, the processor
performs the following steps to switch stacks and begin execution of the called procedure at a
new privilege level:
1. Uses the DPL of the destination code segment (the new CPL) to select a pointer to the new
stack (segment selector and stack pointer) from the TSS.
2. Reads the segment selector and stack pointer for the stack to be switched to from the
current TSS. Any limit violations detected while reading the stack-segment selector, stack
pointer, or stack-segment descriptor cause an invalid TSS (#TS) exception to be generated.
3. Checks the stack-segment descriptor for the proper privileges and type and generates an
invalid TSS (#TS) exception if violations are detected.
4. Temporarily saves the current values of the SS and ESP registers.
5. Loads the segment selector and stack pointer for the new stack in the SS and ESP registers.
6. Pushes the temporarily saved values for the SS and ESP registers (for the calling
procedure) onto the new stack (refer to Figure 4-11).
7. Copies the number of parameters specified in the parameter count field of the call gate from
the calling procedure’s stack to the new stack. If the count is 0, no parameters are copied.
8. Pushes the return instruction pointer (the current contents of the CS and EIP registers) onto
the new stack.
9. Loads the segment selector for the new code segment and the new instruction pointer from
the call gate into the CS and EIP registers, respectively, and begins execution of the called
procedure.
Refer to the description of the CALL instruction in Chapter 3, Instruction Set Reference, in the
Intel Architecture Software Developer’s Manual, Volume 2, for a detailed description of the priv-
ilege level checks and other protection checks that the processor performs on a far call through
a call gate.
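Steps 6 through 8 above determine the contents of the new stack. The following sketch models them for a 32-bit call gate using an array as the stack; it is a simplified illustration in which the names (stack32, push32, build_interlevel_frame) are hypothetical, all items are pushed as doublewords, and the TSS lookup and the limit and privilege checks of steps 1 through 3 are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

/* A downward-growing stack of doublewords. top is the index of the most
   recently pushed item, or STACK_WORDS when the stack is empty. */
struct stack32 {
    uint32_t mem[STACK_WORDS];
    size_t   top;
};

static void push32(struct stack32 *s, uint32_t value)
{
    assert(s->top > 0);   /* a real processor would raise an exception */
    s->mem[--s->top] = value;
}

/* Build the frame on the new stack: calling SS and ESP, the copied
   parameters, then the return CS and EIP. params[0] is the parameter at
   the lowest address of the calling procedure's stack. */
static void build_interlevel_frame(struct stack32 *new_stack,
                                   uint16_t old_ss, uint32_t old_esp,
                                   const uint32_t *params, unsigned param_count,
                                   uint16_t old_cs, uint32_t return_eip)
{
    assert(param_count <= 31);
    push32(new_stack, old_ss);
    push32(new_stack, old_esp);
    /* Push from the highest address down so that the parameters keep the
       same relative order they had on the calling procedure's stack. */
    for (unsigned i = param_count; i > 0; i--)
        push32(new_stack, params[i - 1]);
    push32(new_stack, old_cs);
    push32(new_stack, return_eip);
}
```

After the call, the top of the new stack holds the return EIP, followed at increasing addresses by the return CS, the parameters, and the saved ESP and SS of the calling procedure.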
[Figure 4-11 diagram labels for the called procedure's stack: Calling SS, Parameter 1, Parameter 2, Parameter 3, Calling CS.]
The parameter count field in a call gate specifies the number of data items (up to 31) that the
processor should copy from the calling procedure’s stack to the stack of the called procedure. If
more than 31 data items need to be passed to the called procedure, one of the parameters can be
a pointer to a data structure, or the saved contents of the SS and ESP registers may be used to
access parameters in the old stack space. The size of the data items passed to the called
procedure depends on the call gate size, as described in Section 4.8.3., “Call Gates”.
A far return that requires a privilege-level change is only allowed when returning to a less priv-
ileged level (that is, the DPL of the return code segment is numerically greater than the CPL).
The processor uses the RPL field from the CS register value saved for the calling procedure
(refer to Figure 4-11) to determine if a return to a numerically higher privilege level is required.
If the RPL is numerically greater (less privileged) than the CPL, a return across privilege levels
occurs.
The processor performs the following steps when performing a far return to a calling procedure
(refer to Figures 4-2 and 4-4 in the Intel Architecture Software Developer’s Manual, Volume 1,
for an illustration of the stack contents prior to and after a return):
1. Checks the RPL field of the saved CS register value to determine if a privilege level
change is required on the return.
2. Loads the CS and EIP registers with the values on the called procedure’s stack. (Type and
privilege level checks are performed on the code-segment descriptor and RPL of the code-
segment selector.)
3. (If the RET instruction includes a parameter count operand and the return requires a
privilege level change.) Adds the parameter count (in bytes obtained from the RET
instruction) to the current ESP register value (after popping the CS and EIP values), to step
past the parameters on the called procedure’s stack. The resulting value in the ESP register
points to the saved SS and ESP values for the calling procedure’s stack. (Note that the byte
count in the RET instruction must be chosen to match the parameter count in the call gate
that the calling procedure referenced when it made the original call multiplied by the size
of the parameters.)
4. (If the return requires a privilege level change.) Loads the SS and ESP registers with the
saved SS and ESP values and switches back to the calling procedure’s stack. The SS and
ESP values for the called procedure’s stack are discarded. Any limit violations detected
while loading the stack-segment selector or stack pointer cause a general-protection
exception (#GP) to be generated. The new stack-segment descriptor is also checked for
type and privilege violations.
5. (If the RET instruction includes a parameter count operand.) Adds the parameter count (in
bytes obtained from the RET instruction) to the current ESP register value, to step past the
parameters on the calling procedure’s stack. The resulting ESP value is not checked against
the limit of the stack segment. If the ESP value is beyond the limit, that fact is not
recognized until the next stack operation.
6. (If the return requires a privilege level change.) Checks the contents of the DS, ES, FS, and
GS segment registers. If any of these registers refer to segments whose DPL is less than the
new CPL (excluding conforming code segments), the segment register is loaded with a null
segment selector.
Refer to the description of the RET instruction in Chapter 3, Instruction Set Reference, of the
Intel Architecture Software Developer’s Manual, Volume 2, for a detailed description of the priv-
ilege level checks and other protection checks that the processor performs on a far return.
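Three of the decisions made during a far return lend themselves to a short sketch: whether the return is legal, whether it crosses privilege levels, and whether a data-segment register must be nulled in step 6. The helper names below are hypothetical, and the sketch covers only these privilege comparisons, not the type and limit checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A far return may stay at the same level or go to a less privileged one;
   it may never return to a more privileged level. */
static bool far_return_is_legal(uint16_t saved_cs, unsigned cpl)
{
    assert(cpl <= 3);
    return (saved_cs & 0x3u) >= cpl;
}

/* The return crosses privilege levels when the RPL of the saved CS value
   is numerically greater than the CPL. */
static bool far_return_changes_privilege(uint16_t saved_cs, unsigned cpl)
{
    assert(cpl <= 3);
    return (saved_cs & 0x3u) > cpl;
}

/* Step 6: DS, ES, FS, or GS is loaded with a null selector when it refers
   to a segment whose DPL is less than the new CPL, unless that segment is
   a conforming code segment. */
static bool must_null_segment_register(unsigned seg_dpl, bool conforming_code,
                                       unsigned new_cpl)
{
    assert(seg_dpl <= 3 && new_cpl <= 3);
    return !conforming_code && seg_dpl < new_cpl;
}
```

Returning from CPL 0 with a saved CS value of 001BH (RPL 3) crosses privilege levels, and a data-segment register that still points to a DPL 0 segment would then be nulled.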
not have sufficient privilege, but the operating system (located in code segment C) can. So, in
an attempt to access data segment D, the application program executes a call to the operating
system and passes segment selector D1 to the operating system as a parameter on the stack.
Before passing the segment selector, the (well behaved) application program sets the RPL of the
segment selector to its current privilege level (which in this example is 3). If the operating
system attempts to access data segment D using segment selector D1, the processor compares
the CPL (which is now 0 following the call), the RPL of segment selector D1, and the DPL of
data segment D (which is 0). Since the RPL is greater than the DPL, access to data segment D
is denied. The processor’s protection mechanism thus protects data segment D from access by
the operating system, because the application program’s privilege level (represented by the RPL of
segment selector D1) is greater than the DPL of data segment D.
[Figure 4-12 diagram values: the application program in code segment A (CPL=3) calls through gate selector B (RPL=3) and call gate B (DPL=3) to the operating system in code segment C (DPL=0). Segment selector D1 (RPL=3) is passed as a parameter on the stack, and access to data segment D (DPL=0) through it is not allowed; access through segment selector D2 (RPL=0) is allowed. Level 3 is the lowest privilege and level 0 the highest.]
Now assume that instead of setting the RPL of the segment selector to 3, the application program
sets the RPL to 0 (segment selector D2). The operating system can now access data segment D,
because its CPL and the RPL of segment selector D2 are both equal to the DPL of data segment
D. Because the application program is able to change the RPL of a segment selector to any value,
it can potentially use a procedure operating at a numerically lower privilege level to access a
protected data structure. This ability to lower the RPL of a segment selector breaches the
processor’s protection mechanism.
Because a called procedure cannot rely on the calling procedure to set the RPL correctly, oper-
ating-system procedures (executing at numerically lower privilege-levels) that receive segment
selectors from numerically higher privilege-level procedures need to test the RPL of the segment
selector to determine if it is at the appropriate level. The ARPL (adjust requested privilege level)
instruction is provided for this purpose. This instruction adjusts the RPL of one segment selector
to match that of another segment selector.
The example in Figure 4-12 demonstrates how the ARPL instruction is intended to be used.
When the operating-system receives segment selector D2 from the application program, it uses
the ARPL instruction to compare the RPL of the segment selector with the privilege level of the
application program (represented by the code-segment selector pushed onto the stack). If the
RPL is less than the application program’s privilege level, the ARPL instruction changes the RPL
of the segment selector to match the privilege level of the application program (segment
selector D1). Using this instruction thus prevents a procedure running at a numerically higher
privilege level from accessing numerically lower privilege-level (more privileged) segments by
lowering the RPL of a segment selector.
Note that the privilege level of the application program can be determined by reading the RPL
field of the segment selector for the application-program’s code segment. This segment selector
is stored on the stack as part of the call to the operating system. The operating system can copy
the segment selector from the stack into a register for use as an operand for the ARPL
instruction.
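The effect of the ARPL instruction can be modeled in a few lines of C. This is a sketch of the instruction's observable behavior, not a replacement for it: the names arpl_result and arpl are hypothetical, dest stands for the selector received from the caller, and src stands for the caller's code-segment selector read from the stack. The zf member models the ZF flag, which the instruction sets when it adjusts the RPL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct arpl_result {
    uint16_t selector;  /* dest, possibly with its RPL raised */
    bool     zf;        /* true when the RPL was adjusted */
};

/* If the RPL of dest is numerically less than the RPL of src, raise it to
   match the RPL of src; otherwise leave dest unchanged. */
static struct arpl_result arpl(uint16_t dest, uint16_t src)
{
    struct arpl_result r = { dest, false };
    if ((dest & 0x3u) < (src & 0x3u)) {
        r.selector = (uint16_t)((dest & ~0x3u) | (src & 0x3u));
        r.zf = true;
    }
    /* After the adjustment, the selector is never more privileged than
       the caller it came from. */
    assert((r.selector & 0x3u) >= (src & 0x3u));
    return r;
}
```

In the scenario above, a data-segment selector passed with an RPL of 0 by a caller whose code-segment selector has an RPL of 3 comes back with an RPL of 3, so a later access to a DPL 0 data segment through it is denied.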
exception being generated. Because checks are performed in parallel with address translation,
there is no performance penalty.
The processor performs two page-level protection checks:
• Restriction of addressable domain (supervisor and user modes).
• Page type (read only or read/write).
A violation of either of these checks results in a page-fault exception being generated. Refer to
Chapter 5, Interrupt and Exception Handling for an explanation of the page-fault exception
mechanism. This chapter describes the protection violations that lead to page-fault
exceptions.
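The two page-level checks can be sketched as a single predicate. The code below is an illustration resting on several assumptions that are not spelled out in the surrounding text: supervisor mode is taken to be a CPL of 0, 1, or 2 and user mode a CPL of 3; a page counts as a user page or a writable page only when both its page-directory entry and its page-table entry have the corresponding flag set; and a supervisor-mode write to a read-only page is permitted unless the WP flag of CR0 is set. The names page_access and page_access_allowed are hypothetical, and the present flag is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum page_access { PAGE_READ, PAGE_WRITE };

/* Returns false where the processor would generate a page-fault exception
   for a protection violation. pde and pte are the page-directory and
   page-table entries that map the page. */
static bool page_access_allowed(uint32_t pde, uint32_t pte, unsigned cpl,
                                enum page_access access, bool cr0_wp)
{
    assert(cpl <= 3);
    bool user_page  = ((pde >> 2) & 1u) && ((pte >> 2) & 1u);  /* U/S, bit 2 */
    bool writable   = ((pde >> 1) & 1u) && ((pte >> 1) & 1u);  /* R/W, bit 1 */
    bool supervisor = cpl < 3;

    /* Restriction of addressable domain: user mode cannot touch
       supervisor pages. */
    if (!supervisor && !user_page)
        return false;

    /* Page type: reads are allowed on any accessible page. */
    if (access == PAGE_READ)
        return true;

    /* Supervisor writes ignore the R/W flags unless CR0.WP is set. */
    if (supervisor && !cr0_wp)
        return true;

    return writable;
}
```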
NOTE:
* If the WP flag of CR0 is set, the access type is determined by the R/W flags of the page-directory and
page-table entries.
CHAPTER 5
INTERRUPT AND EXCEPTION HANDLING
This chapter describes the processor’s interrupt and exception-handling mechanism when
operating in protected mode. Most of the information provided here also applies to the interrupt and
exception mechanism used in real-address or virtual-8086 mode. Refer to Chapter 16, 8086
Emulation for a description of the differences in the interrupt and exception mechanism for
real-address and virtual-8086 mode.
NOTES:
1. The UD2 instruction was introduced in the Pentium® Pro processor.
2. Intel Architecture processors after the Intel386™ processor do not generate this exception.
3. This exception was introduced in the Intel486™ processor.
4. This exception was introduced in the Pentium® processor and enhanced in the P6 family processors.
5. This exception was introduced in the Pentium® III processor.
code sample, the signaling of exceptions will occur uniformly when the code is executed on any
family of Intel Architecture processors (except where new exceptions or new opcodes have been
defined).
external interrupts. The IF flag does not affect nonmaskable interrupts (NMIs) delivered to the
NMI pin or delivery mode NMI messages delivered through the APIC serial bus, nor does it
affect processor-generated exceptions. As with the other flags in the EFLAGS register, the
processor clears the IF flag in response to a hardware reset.
The fact that the group of maskable hardware interrupts includes the reserved interrupt and
exception vectors 0 through 32 can potentially cause confusion. Architecturally, when the IF
flag is set, an interrupt for any of the vectors from 0 through 32 can be delivered to the processor
through the INTR pin and any of the vectors from 16 through 32 can be delivered through the
local APIC. The processor will then generate an interrupt and call the interrupt or exception
handler pointed to by the vector number. So for example, it is possible to invoke the page-fault
handler through the INTR pin (by means of vector 14); however, this is not a true page-fault
exception. It is an interrupt. As with the INT n instruction (refer to Section 5.1.2.2., “Software-
Generated Exceptions”), when an interrupt is generated through the INTR pin to an exception
vector, the processor does not push an error code on the stack, so the exception handler may not
operate correctly.
The IF flag can be set or cleared with the STI (set interrupt-enable flag) and CLI (clear interrupt-
enable flag) instructions, respectively. These instructions may be executed only if the CPL is
equal to or less than the IOPL. A general-protection exception (#GP) is generated if they are
executed when the CPL is greater than the IOPL. (The effect of the IOPL on these instructions
is modified slightly when the virtual mode extension is enabled by setting the VME flag in
control register CR4; refer to Section 16.3., “Interrupt and Exception Handling in Virtual-8086
Mode” in Chapter 16, 8086 Emulation.)
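The privilege rule for CLI and STI can be sketched in C as follows. This is only an illustrative model, not processor microcode: the function names are hypothetical, the bit positions of IF (bit 9) and IOPL (bits 13:12) are taken from the EFLAGS register layout rather than from this section, and the VME modification mentioned above is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF         (1u << 9)   /* interrupt-enable flag */
#define EFLAGS_IOPL_SHIFT 12          /* IOPL occupies bits 13:12 */

/* Extracts the 2-bit I/O privilege level from an EFLAGS image. */
unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags >> EFLAGS_IOPL_SHIFT) & 3u;
}

/* Returns true if maskable hardware interrupts are enabled. */
bool eflags_interrupts_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}

/* Hypothetical helper: true if CLI or STI would raise #GP in protected
   mode, that is, when the CPL is greater than the IOPL. */
bool cli_sti_raises_gp(unsigned cpl, uint32_t eflags)
{
    assert(cpl <= 3);
    return cpl > eflags_iopl(eflags);
}
```

With IOPL = 0, for example, only code running at CPL 0 can execute these instructions without faulting.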
The IF flag is also affected by the following operations:
• The PUSHF instruction stores all flags on the stack, where they can be examined and
modified. The POPF instruction can be used to load the modified flags back into the
EFLAGS register.
• Task switches and the POPF and IRET instructions load the EFLAGS register; therefore,
they can be used to modify the setting of the IF flag.
• When an interrupt is handled through an interrupt gate, the IF flag is automatically cleared,
which disables maskable hardware interrupts. (If an interrupt is handled through a trap
gate, the IF flag is not cleared.)
Refer to the descriptions of the CLI, STI, PUSHF, POPF, and IRET instructions in Chapter 3,
Instruction Set Reference, of the Intel Architecture Software Developer’s Manual, Volume 2, for
a detailed description of the operations these instructions are allowed to perform on the IF flag.
to prevent the processor from going into a debug exception loop on an instruction-breakpoint.
Refer to Section 15.3.1.1., “Instruction-Breakpoint Exception Condition”, in Chapter 15,
Debugging and Performance Monitoring, for more information on the use of this flag.
If an interrupt or exception occurs after the segment selector has been loaded into the SS register
but before the ESP register has been loaded, these two parts of the logical address into the stack
space are inconsistent for the duration of the interrupt or exception handler.
To prevent this situation, the processor inhibits interrupts, debug exceptions, and single-step trap
exceptions after either a MOV to SS instruction or a POP to SS instruction, until the instruction
boundary following the next instruction is reached. All other faults may still be generated. If the
LSS instruction is used to modify the contents of the SS register (which is the recommended
method of modifying this register), this problem does not occur.
order in which interrupts will be recognized by the processor if received simultaneously at the
processor pins.
2 QNaN operand (see note 1)
NOTES:
1. Though this is not an exception, the handling of a QNaN operand has precedence over lower priority
exceptions. For example, a QNaN divided by zero results in a QNaN, not a zero-divide exception.
2. If masked, then instruction execution continues, and a lower priority exception can occur as well.
NOTE:
1. For the Pentium® and Intel486™ processors, the Code Segment Limit Violation and the Code Page Fault
exceptions are assigned to priority 7.
The base address of the IDT should be aligned on an 8-byte boundary to maximize performance
of cache line fills. The limit value is expressed in bytes and is added to the base address
to get the address of the last valid byte. A limit value of 0 results in exactly 1 valid byte. Because
IDT entries are always eight bytes long, the limit should always be one less than an integral
multiple of eight (that is, 8N – 1).
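The limit arithmetic above can be written out as a short C sketch. The helper names are hypothetical, and the cap of 256 gates is an assumption based on vector numbers being 8 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { IDT_GATE_SIZE = 8 };   /* every IDT entry is eight bytes long */

/* Limit for an IDT holding n_gates descriptors: 8N - 1. */
uint16_t idt_limit_for(unsigned n_gates)
{
    assert(n_gates >= 1 && n_gates <= 256);
    return (uint16_t)(n_gates * IDT_GATE_SIZE - 1);
}

/* Address of the last valid byte: base plus limit. */
uint32_t idt_last_valid_byte(uint32_t base, uint16_t limit)
{
    return base + limit;
}

/* True if the base address sits on the recommended 8-byte boundary. */
bool idt_base_is_aligned(uint32_t base)
{
    return (base & 7u) == 0;
}
```

A full table of 256 gates therefore gets a limit of 2047 (7FFH).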
The IDT may reside anywhere in the linear address space. As shown in Figure 5-1, the processor
locates the IDT using the IDTR register. This register holds both a 32-bit base address and 16-bit
limit for the IDT.
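[Figure 5-1: The IDTR register holds the IDT base address in bits 47 through 16 and the IDT limit
in bits 15 through 0. The base address locates the start of the interrupt descriptor table (IDT):
the gate for interrupt #1 is at offset 0, the gate for interrupt #2 at offset 8, the gate for
interrupt #3 at offset 16, and the gate for interrupt #n at offset (n−1)∗8.]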
The LIDT (load IDT register) and SIDT (store IDT register) instructions load and store the
contents of the IDTR register, respectively. The LIDT instruction loads the IDTR register with
the base address and limit held in a memory operand. This instruction can be executed only
when the CPL is 0. It normally is used by the initialization code of an operating system when
creating an IDT. An operating system also may use it to change from one IDT to another. The
SIDT instruction copies the base and limit value stored in IDTR to memory. This instruction can
be executed at any privilege level.
If a vector references a descriptor beyond the limit of the IDT, a general-protection exception
(#GP) is generated.
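As a rough illustration of the two preceding paragraphs, the sketch below packs the 6-byte memory operand used by LIDT and SIDT (16-bit limit followed by 32-bit base, little-endian) and tests whether a vector's 8-byte descriptor lies within the IDT limit. The function names are hypothetical, and the check assumes the whole descriptor (bytes vector*8 through vector*8 + 7) must be within the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Builds the memory image that LIDT reads and SIDT writes:
   bytes 0-1 hold the limit, bytes 2-5 hold the base address. */
void idtr_image_pack(uint32_t base, uint16_t limit, uint8_t out[6])
{
    assert(out != NULL);
    out[0] = (uint8_t)(limit & 0xFFu);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xFFu);
    out[3] = (uint8_t)((base >> 8) & 0xFFu);
    out[4] = (uint8_t)((base >> 16) & 0xFFu);
    out[5] = (uint8_t)(base >> 24);
}

/* True if the descriptor for this vector ends at or before the limit.
   A vector that fails this test would lead to #GP. */
bool vector_within_idt_limit(uint8_t vector, uint16_t limit)
{
    uint32_t last_byte = (uint32_t)vector * 8u + 7u;
    return last_byte <= limit;
}
```

For example, with a limit of 255 (32 gates), vector 31 is the last one that can be referenced.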
Figure 5-2 shows the formats for the task-gate, interrupt-gate, and trap-gate descriptors. The
format of a task gate used in an IDT is the same as that of a task gate used in the GDT or an LDT
(refer to Section 6.2.4., “Task-Gate Descriptor” in Chapter 6, Task Management). The task gate
contains the segment selector for a TSS for an exception and/or interrupt handler task.
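[Figure 5-2: Formats of the IDT gate descriptors. Each descriptor consists of two doublewords,
at byte offsets 0 and 4.
Task gate: the doubleword at offset 4 holds P (bit 15), DPL (bits 14..13), and the type field
0 0 1 0 1 (bits 12..8); the doubleword at offset 0 holds the TSS segment selector (bits 31..16).
Interrupt gate: the doubleword at offset 4 holds Offset 31..16 (bits 31..16), P (bit 15), DPL
(bits 14..13), the type field 0 D 1 1 0 (bits 12..8), and 0 0 0 (bits 7..5); the doubleword at
offset 0 holds the segment selector (bits 31..16) and Offset 15..0 (bits 15..0).
Trap gate: same layout as the interrupt gate, with the type field 0 D 1 1 1 (bits 12..8).]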
Interrupt and trap gates are very similar to call gates (refer to Section 4.8.3., “Call Gates” in
Chapter 4, Protection). They contain a far pointer (segment selector and offset) that the
processor uses to transfer execution to a handler procedure in an exception- or interrupt-handler
code segment. These gates differ in the way the processor handles the IF flag in the EFLAGS
register (refer to Section 5.10.1.2., “Flag Usage By Exception- or Interrupt-Handler Proce-
dure”).
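The gate formats described above can be assembled in C roughly as follows. This is a sketch only: the struct and function names are hypothetical, the type values assume 32-bit gates (D = 1), and the placement of the selector and low offset half in the first doubleword is taken from the descriptor layout rather than from the surviving figure text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Five-bit type field (bits 12..8 of the second doubleword). */
enum gate_type {
    GATE_TASK        = 0x05,  /* 0 0 1 0 1 */
    GATE_INTERRUPT32 = 0x0E,  /* 0 D 1 1 0 with D = 1 */
    GATE_TRAP32      = 0x0F   /* 0 D 1 1 1 with D = 1 */
};

typedef struct {
    uint32_t low;   /* doubleword at byte offset 0 */
    uint32_t high;  /* doubleword at byte offset 4 */
} gate_descriptor;

/* Builds an interrupt, trap, or task gate. For a task gate the offset
   is not used and should be passed as 0. */
gate_descriptor make_gate(uint16_t selector, uint32_t offset,
                          enum gate_type type, unsigned dpl, bool present)
{
    gate_descriptor g;
    assert(dpl <= 3);
    g.low  = ((uint32_t)selector << 16) | (offset & 0xFFFFu);
    g.high = (offset & 0xFFFF0000u)
           | ((present ? 1u : 0u) << 15)
           | ((uint32_t)dpl << 13)
           | ((uint32_t)type << 8);
    return g;
}
```

The only encoding difference between an interrupt gate and a trap gate is the lowest type bit; the difference in IF handling follows from the gate type the processor reads.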
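[Figure: Interrupt procedure call. The interrupt vector selects an interrupt or trap gate in the
IDT. The segment selector in the gate selects a segment descriptor in the GDT or LDT; the base
address from that descriptor is added to the offset from the gate to locate the interrupt
procedure in the destination code segment.]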
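[Figure: Stack usage on transfer to a handler. In the first layout, the processor pushes EFLAGS,
CS, EIP, and the error code (if any) onto the stack, moving ESP from its value before the
transfer to its value after the transfer. In the second layout, it pushes SS, ESP, EFLAGS, CS,
EIP, and the error code (if any), so that the stack pointer of the interrupted procedure is
also saved.]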
handlers, such as the page-fault handler, providing that those handlers are placed in more
privileged code segments (numerically lower privilege level). For hardware-generated
interrupts and processor-detected exceptions, the processor ignores the DPL of interrupt
and trap gates.
Because exceptions and interrupts generally do not occur at predictable times, these privilege
rules effectively impose restrictions on the privilege levels at which exception and interrupt-
handling procedures can run. Either of the following techniques can be used to avoid privilege-
level violations.
• The exception or interrupt handler can be placed in a conforming code segment. This
technique can be used for handlers that only need to access data available on the stack (for
example, divide error exceptions). If the handler needs data from a data segment, the data
segment needs to be accessible from privilege level 3, which would make it unprotected.
• The handler can be placed in a nonconforming code segment with privilege level 0. This
handler would always run, regardless of the CPL that the interrupted program or task is
running at.
• The handler can be further isolated from other tasks by giving it a separate address space.
This is done by giving it a separate LDT.
The disadvantage of handling an interrupt with a separate task is that the amount of machine
state that must be saved on a task switch makes it slower than using an interrupt gate, resulting
in increased interrupt latency.
A task gate in the IDT references a TSS descriptor in the GDT (refer to Figure 5-5). A switch to
the handler task is handled in the same manner as an ordinary task switch (refer to Section 6.3.,
“Task Switching” in Chapter 6, Task Management). The link back to the interrupted task is
stored in the previous task link field of the handler task’s TSS. If an exception caused an error
code to be generated, this error code is copied to the stack of the new task.
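[Figure 5-5: The interrupt vector selects a task gate in the IDT; the TSS selector in the task
gate references a TSS descriptor in the GDT, which locates the TSS for the interrupt-handling
task.]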
When exception- or interrupt-handler tasks are used in an operating system, there are actually
two mechanisms that can be used to dispatch tasks: the software scheduler (part of the operating
system) and the hardware scheduler (part of the processor’s interrupt mechanism). The software
scheduler needs to accommodate interrupt tasks that may be dispatched when interrupts are
enabled.
[Figure: Error code format. Bits 31..16: reserved; bits 15..3: segment selector index; bit 2: TI;
bit 1: IDT; bit 0: EXT.]
The segment selector index field provides an index into the IDT, GDT, or current LDT to the
segment or gate selector being referenced by the error code. In some cases the error code is null
(that is, all bits in the lower word are clear). A null error code indicates that the error was not
caused by a reference to a specific segment or that a null segment descriptor was referenced in
an operation.
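A handler might decode this error code along the following lines. The sketch is illustrative only: the names are hypothetical, and the flag meanings (EXT marks an event external to the program, IDT marks an index into the IDT, and TI selects the LDT rather than the GDT when IDT is clear) are the standard definitions of these bits rather than text from this passage.

```c
#include <stdbool.h>
#include <stdint.h>

enum descriptor_table { TABLE_GDT, TABLE_LDT, TABLE_IDT };

typedef struct {
    bool ext;        /* bit 0 */
    bool idt;        /* bit 1 */
    bool ti;         /* bit 2 */
    unsigned index;  /* bits 15..3 */
} selector_error_code;

selector_error_code decode_selector_error(uint32_t code)
{
    selector_error_code e;
    e.ext   = (code & 1u) != 0;
    e.idt   = (code & 2u) != 0;
    e.ti    = (code & 4u) != 0;
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}

/* Table that the index refers to. */
enum descriptor_table error_code_table(selector_error_code e)
{
    if (e.idt)
        return TABLE_IDT;
    return e.ti ? TABLE_LDT : TABLE_GDT;
}

/* A null error code has all bits of the lower word clear. */
bool error_code_is_null(uint32_t code)
{
    return (code & 0xFFFFu) == 0;
}
```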
The format of the error code is different for page-fault exceptions (#PF); refer to “Interrupt
14—Page-Fault Exception (#PF)” in this chapter.
The error code is pushed on the stack as a doubleword or word (depending on the default inter-
rupt, trap, or task gate size). To keep the stack aligned for doubleword pushes, the upper half of
the error code is reserved. Note that the error code is not popped when the IRET instruction is
executed to return from an exception handler, so the handler must remove the error code before
executing a return.
Error codes are not pushed on the stack for exceptions that are generated externally (with the
INTR or LINT[1:0] pins) or the INT n instruction, even if an error code is normally produced
for those exceptions.
Description
Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the result cannot be repre-
sented in the number of bits specified for the destination operand.
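The two conditions can be modeled for the 32-bit forms of DIV and IDIV, which divide a 64-bit dividend (EDX:EAX) by a 32-bit divisor and must fit the quotient in 32 bits. This is a hedged sketch with hypothetical function names; it predicts the fault in portable C rather than executing the instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned divide: faults on a zero divisor or a quotient above 2^32 - 1. */
bool div32_raises_de(uint64_t dividend, uint32_t divisor)
{
    if (divisor == 0)
        return true;
    return dividend / divisor > UINT32_MAX;
}

/* Signed divide: faults on a zero divisor or a quotient outside the
   signed 32-bit range. */
bool idiv32_raises_de(int64_t dividend, int32_t divisor)
{
    int64_t quotient;
    if (divisor == 0)
        return true;
    /* INT64_MIN / -1 is undefined in C; its quotient cannot fit anyway. */
    if (dividend == INT64_MIN && divisor == -1)
        return true;
    quotient = dividend / divisor;
    return quotient > INT32_MAX || quotient < INT32_MIN;
}
```

Note that a nonzero divisor does not guarantee success: dividing 2^32 by 1 overflows the 32-bit destination.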
Exception Class Trap or Fault. The exception handler can distinguish between traps or
faults by examining the contents of DR6 and the other debug registers.
Description
Indicates that one or more of several debug-exception conditions has been detected. Whether the
exception is a fault or a trap depends on the condition, as shown below:
Refer to Chapter 15, Debugging and Performance Monitoring, for detailed information about
the debug exceptions.
Description
The nonmaskable interrupt (NMI) is generated externally by asserting the processor’s NMI pin
or through an NMI request set by the I/O APIC to the local APIC on the APIC serial bus. This
interrupt causes the NMI interrupt handler to be called.
Description
Indicates that a breakpoint instruction (INT 3) was executed, causing a breakpoint trap to be
generated. Typically, a debugger sets a breakpoint by replacing the first opcode byte of an
instruction with the opcode for the INT 3 instruction. (The INT 3 instruction is one byte long,
which makes it easy to replace an opcode in a code segment in RAM with the breakpoint
opcode.) The operating system or a debugging tool can use a data segment mapped to the same
physical address space as the code segment to place an INT 3 instruction in places where it is
desired to call the debugger.
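The opcode-replacement technique can be sketched on an ordinary byte buffer standing in for a writable alias of the code segment. The type and function names are hypothetical; CCH is the one-byte INT 3 opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPCODE_INT3 = 0xCC };  /* one-byte breakpoint opcode */

typedef struct {
    size_t  offset;  /* where the breakpoint was planted */
    uint8_t saved;   /* original first opcode byte */
} breakpoint;

/* Replaces the first opcode byte at the given offset with INT 3 and
   remembers the byte that was there. */
breakpoint breakpoint_set(uint8_t *code, size_t offset)
{
    breakpoint bp;
    assert(code != NULL);
    bp.offset = offset;
    bp.saved = code[offset];
    code[offset] = OPCODE_INT3;
    return bp;
}

/* Restores the original opcode byte. */
void breakpoint_clear(uint8_t *code, breakpoint bp)
{
    assert(code != NULL);
    code[bp.offset] = bp.saved;
}
```

Because only one byte is overwritten, the instructions that follow the patched one are left untouched, whatever the length of the replaced instruction.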
With the P6 family, Pentium®, Intel486™, and Intel386™ processors, it is more convenient to
set breakpoints with the debug registers. (Refer to Section 15.3.2., “Breakpoint Exception
(#BP)—Interrupt Vector 3”, in Chapter 15, Debugging and Performance Monitoring, for infor-
mation about the breakpoint exception.) If more breakpoints are needed beyond what the debug
registers allow, the INT 3 instruction can be used.
The breakpoint (#BP) exception can also be generated by executing the INT n instruction with
an operand of 3. The action of this two-byte form (opcode CD 03) is slightly different from that of the one-byte INT
3 instruction (refer to “INTn/INTO/INT3—Call to Interrupt Procedure” in Chapter 3 of the Intel
Architecture Software Developer’s Manual, Volume 2).
Description
Indicates that an overflow trap occurred when an INTO instruction was executed. The INTO
instruction checks the state of the OF flag in the EFLAGS register. If the OF flag is set, an over-
flow trap is generated.
Some arithmetic instructions (such as the ADD and SUB) perform both signed and unsigned
arithmetic. These instructions set the OF and CF flags in the EFLAGS register to indicate signed
overflow and unsigned overflow, respectively. When performing arithmetic on signed operands,
the OF flag can be tested directly or the INTO instruction can be used. The benefit of using the
INTO instruction is that if the overflow exception is detected, an exception handler can be called
automatically to handle the overflow condition.
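The way a 32-bit ADD reports unsigned overflow in CF and signed overflow in OF can be reproduced in C. This is an illustrative model with hypothetical names; it computes the flags arithmetically instead of reading EFLAGS.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t sum;
    bool cf;  /* carry out of bit 31: unsigned overflow */
    bool of;  /* operands share a sign that the result lacks: signed overflow */
} add_result;

add_result add32_with_flags(uint32_t a, uint32_t b)
{
    add_result r;
    r.sum = a + b;                                   /* wraps modulo 2^32 */
    r.cf = r.sum < a;
    r.of = ((~(a ^ b) & (a ^ r.sum)) >> 31) != 0;
    return r;
}

/* INTO generates the overflow trap only when OF is set. */
bool into_would_trap(add_result r)
{
    return r.of;
}
```

Adding 1 to 7FFFFFFFH sets OF but not CF, while adding 1 to FFFFFFFFH sets CF but not OF, which is why the two flags must be interpreted according to the signedness of the operands.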
Description
Indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction was
executed. The BOUND instruction checks that a signed array index is within the upper and
lower bounds of an array located in memory. If the array index is not within the bounds of the
array, a BOUND-range-exceeded fault is generated.
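The check performed by BOUND can be expressed in C as below. This is a sketch with hypothetical names for the 32-bit operand size; it assumes both bounds are inclusive and that the lower bound is stored first in memory, followed by the upper bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory operand of BOUND: lower bound, then upper bound. */
typedef struct {
    int32_t lower;
    int32_t upper;
} bound_pair;

/* True if the signed index lies outside [lower, upper], the condition
   under which a BOUND-range-exceeded fault is generated. */
bool bound_raises_br(int32_t index, const bound_pair *bounds)
{
    assert(bounds != NULL);
    return index < bounds->lower || index > bounds->upper;
}
```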
Description
Indicates that the processor did one of the following things:
• Attempted to execute a Streaming SIMD Extensions instruction in an Intel Architecture
processor that does not support the Streaming SIMD Extensions.
• Attempted to execute a Streaming SIMD Extensions instruction when the OSFXSR bit is
not set (0) in CR4. Note this does not include the following Streaming SIMD Extensions:
PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB, PMINSW, PMINUB,
PMOVMSKB, PMULHUW, PSADBW, PSHUFW, MASKMOVQ, MOVNTQ,
PREFETCH and SFENCE.
• Attempted to execute a Streaming SIMD Extensions instruction in an Intel Architecture
processor which causes a numeric exception when the OSXMMEXCPT bit is not set (0) in
CR4.
• Attempted to execute an invalid or reserved opcode, including any MMX™ instruction in
an Intel Architecture processor that does not support the MMX™ architecture.
• Attempted to execute an MMX™ instruction or SIMD floating-point instruction when the
EM flag in register CR0 is set. Note this does not include the following Streaming SIMD
Extensions: SFENCE and PREFETCH.
• Attempted to execute an instruction with an operand type that is invalid for its accompa-
nying opcode; for example, the source operand for a LES instruction is not a memory
location.
• Executed a UD2 instruction.
• Detected a LOCK prefix that precedes an instruction that may not be locked or one that
may be locked but the destination operand is not a memory location.
• Attempted to execute an LLDT, SLDT, LTR, STR, LSL, LAR, VERR, VERW, or ARPL
instruction while in real-address or virtual-8086 mode.
• Attempted to execute the RSM instruction when not in system management mode (SMM).
In the P6 family processors, this exception is not generated until an attempt is made to retire the
result of executing an invalid instruction; that is, decoding and speculatively attempting to
execute an invalid opcode does not generate this exception. Likewise, in the Pentium® processor
and earlier Intel Architecture processors, this exception is not generated as the result of
prefetching and preliminary decoding of an invalid instruction. (Refer to Section 5.4., “Program
or Task Restart” for general rules for taking of interrupts and exceptions.)
The opcodes D6 and F1 are undefined opcodes that are reserved by Intel. These opcodes, even
though undefined, do not generate an invalid opcode exception.
Description
The device-not-available fault is generated by any of the following three conditions:
• The processor executed a floating-point instruction while the EM flag of register CR0 was
set.
• The processor executed a floating-point, MMX™ or SIMD floating-point (excluding
prefetch, sfence or streaming store instructions) instruction while the TS flag of register
CR0 was set.
• The processor executed a WAIT or FWAIT instruction while the MP and TS flags of
register CR0 were set.
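The three conditions can be combined into one decision function. The following is a simplified sketch, not a complete model: the names are hypothetical, the CR0 bit positions (MP = bit 1, EM = bit 2, TS = bit 3) come from the control register layout, the excluded instructions (prefetch, sfence, streaming stores) are not modeled, and it assumes that an MMX or SIMD floating-point instruction executed with EM set takes the invalid-opcode path instead of this one.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)
#define CR0_EM (1u << 2)
#define CR0_TS (1u << 3)

enum insn_kind {
    INSN_X87,             /* floating-point instruction */
    INSN_MMX_OR_SIMD_FP,  /* MMX or SIMD floating-point instruction */
    INSN_WAIT             /* WAIT or FWAIT */
};

/* True if executing an instruction of this kind with the given CR0
   value would generate a device-not-available exception (#NM). */
bool raises_device_not_available(uint32_t cr0, enum insn_kind kind)
{
    bool mp = (cr0 & CR0_MP) != 0;
    bool em = (cr0 & CR0_EM) != 0;
    bool ts = (cr0 & CR0_TS) != 0;

    switch (kind) {
    case INSN_X87:
        return em || ts;
    case INSN_MMX_OR_SIMD_FP:
        return ts && !em;   /* assumption: EM set leads to #UD instead */
    case INSN_WAIT:
        return mp && ts;
    }
    return false;
}
```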
The EM flag is set when the processor does not have an internal floating-point unit. An excep-
tion is then generated each time a floating-point instruction is encountered, allowing an excep-
tion handler to call floating-point instruction emulation routines.
The TS flag indicates that a context switch (task switch) has occurred since the last time a
floating-point, MMX™ or SIMD floating-point (excluding prefetch, sfence or streaming store
instructions) instruction was executed, but that the context of the FPU was not saved. When the
TS flag is set, the processor generates a device-not-available exception each time a floating-
point, MMX™ or SIMD floating-point (excluding prefetch, sfence or streaming store instruc-
tions) instruction is encountered. The exception handler can then save the context of the FPU
before it executes the instruction. Refer to Section 2.5., “Control Registers”, in Chapter 2,
System Architecture Overview, for more information about the TS flag.
The MP flag in control register CR0 is used along with the TS flag to determine if WAIT or
FWAIT instructions should generate a device-not-available exception. It extends the function of
the TS flag to the WAIT and FWAIT instructions, giving the exception handler an opportunity
to save the context of the FPU before the WAIT or FWAIT instruction is executed. The MP flag
is provided primarily for use with the Intel286 and Intel386™ DX processors. For programs
running on the P6 family, Pentium®, or Intel486™ DX processors, or the Intel 487 SX coproces-
sors, the MP flag should always be set; for programs running on the Intel486™ SX processor,
the MP flag should be clear.
Description
Indicates that the processor detected a second exception while calling an exception handler for
a prior exception. Normally, when the processor detects another exception while trying to call
an exception handler, the two exceptions can be handled serially. If, however, the processor
cannot handle them serially, it signals the double-fault exception. To determine when two faults
need to be signaled as a double fault, the processor divides the exceptions into three classes:
benign exceptions, contributory exceptions, and page faults (refer to Table 5-4).
Table 5-5 shows the various combinations of exception classes that cause a double fault to be
generated. A double-fault exception falls in the abort class of exceptions. The program or task
cannot be restarted or resumed. The double-fault handler can be used to collect diagnostic infor-
mation about the state of the machine and/or, when possible, to shut the application and/or
system down gracefully or restart the system.
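The classification can be sketched as a small lookup. Tables 5-4 and 5-5 are not reproduced in this text, so the vector assignments and combinations below are stated from the commonly documented rules and should be checked against those tables: vectors 0, 10, 11, 12, and 13 are treated as contributory, vector 14 as the page fault, and all others as benign. The names are hypothetical.

```c
#include <stdbool.h>

enum exception_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Assumed class membership; verify against Table 5-4. */
enum exception_class classify_vector(unsigned vector)
{
    switch (vector) {
    case 0: case 10: case 11: case 12: case 13:
        return EXC_CONTRIBUTORY;
    case 14:
        return EXC_PAGE_FAULT;
    default:
        return EXC_BENIGN;
    }
}

/* Assumed combinations; verify against Table 5-5. Returns true when the
   second exception, detected while calling the handler for the first,
   is signaled as a double fault instead of being handled serially. */
bool is_double_fault(enum exception_class first, enum exception_class second)
{
    if (first == EXC_CONTRIBUTORY)
        return second == EXC_CONTRIBUTORY;
    if (first == EXC_PAGE_FAULT)
        return second != EXC_BENIGN;
    return false;
}
```

Under these rules the order matters: a page fault followed by a general-protection exception is a double fault, but a general-protection exception followed by a page fault is handled serially.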
A segment or page fault may be encountered while prefetching instructions; however, this
behavior is outside the domain of Table 5-5. Any further faults generated while the processor is
attempting to transfer control to the appropriate fault handler could still lead to a double-fault
sequence.
If another exception occurs while attempting to call the double-fault handler, the processor
enters shutdown mode. This mode is similar to the state following execution of an HLT instruc-
tion. In this mode, the processor stops executing instructions until an NMI interrupt, SMI inter-
rupt, hardware reset, or INIT# is received. The processor generates a special bus cycle to
indicate that it has entered shutdown mode. Software designers may need to be aware of the
response of hardware to receiving this signal. For example, hardware may turn on an indicator
light on the front panel, generate an NMI interrupt to record diagnostic information, invoke reset
initialization, generate an INIT initialization, or generate an SMI.
If the shutdown occurs while the processor is executing an NMI interrupt handler, then only a
hardware reset can restart the processor.
Exception Class Abort. (Intel reserved; do not use. Recent Intel Architecture proces-
sors do not generate this exception.)
Description
Indicates that an Intel386™ CPU-based system with an Intel 387 math coprocessor detected a
page or segment violation while transferring the middle portion of an Intel 387 math coprocessor
operand. The P6 family, Pentium®, and Intel486™ processors do not generate this exception;
instead, this condition is detected with a general-protection exception (#GP), interrupt 13.
Description
Indicates that a task switch was attempted and that invalid information was detected in the TSS
for the target task. Table 5-6 shows the conditions that will cause an invalid-TSS exception to
be generated. In general, these invalid conditions result from protection violations for the TSS
descriptor; the LDT pointed to by the TSS; or the stack, code, or data segments referenced by
the TSS.
This exception can be generated either in the context of the original task or in the context of the
new task (refer to Section 6.3., “Task Switching” in Chapter 6, Task Management). Until the
processor has completely verified the presence of the new TSS, the exception is generated in the
context of the original task. Once the existence of the new TSS is verified, the task switch is
considered complete. Any invalid-TSS conditions detected after this point are handled in the
context of the new task. (A task switch is considered complete when the task register is loaded
with the segment selector for the new TSS and, if the switch is due to a procedure call or inter-
rupt, the previous task link field of the new TSS references the old TSS.)
To ensure that a valid TSS is available to process the exception, the invalid-TSS exception
handler must be a task called using a task gate.
Description
Indicates that the present flag of a segment or gate descriptor is clear. The processor can generate
this exception during any of the following operations:
• While attempting to load CS, DS, ES, FS, or GS registers. [Detection of a not-present
segment while loading the SS register causes a stack fault exception (#SS) to be
generated.] This situation can occur while performing a task switch.
• While attempting to load the LDTR using an LLDT instruction. Detection of a not-present
LDT while loading the LDTR during a task switch operation causes an invalid-TSS
exception (#TS) to be generated.
• When executing the LTR instruction and the TSS is marked not present.
• While attempting to use a gate descriptor or TSS that is marked segment-not-present, but is
otherwise valid.
An operating system typically uses the segment-not-present exception to implement virtual
memory at the segment level. If the exception handler loads the segment and returns, the inter-
rupted program or task resumes execution.
A not-present indication in a gate descriptor, however, does not indicate that a segment is not
present (because gates do not correspond to segments). The operating system may use the
present flag for gate descriptors to trigger exceptions of special significance to the operating
system.
Description
Indicates that one of the following stack-related conditions was detected:
• A limit violation is detected during an operation that refers to the SS register. Operations
that can cause a limit violation include stack-oriented instructions such as POP, PUSH,
CALL, RET, IRET, ENTER, and LEAVE, as well as other memory references which
implicitly or explicitly use the SS register (for example, MOV AX, [BP+6] or MOV AX,
SS:[EAX+6]). The ENTER instruction generates this exception when there is not enough
stack space for allocating local variables.
• A not-present stack segment is detected when attempting to load the SS register. This
violation can occur during the execution of a task switch, a CALL instruction to a different
privilege level, a return to a different privilege level, an LSS instruction, or a MOV or POP
instruction to the SS register.
Recovery from this fault is possible by either extending the limit of the stack segment (in the
case of a limit violation) or loading the missing stack segment into memory (in the case of a
not-present violation).
type checks) before it generates the exception. The stack fault handler should thus not rely on
being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers
without causing another exception. The exception handler should check all segment registers
before trying to resume the new task; otherwise, general protection faults may result later under
conditions that are more difficult to diagnose. (Refer to the Program State Change description
for “Interrupt 10—Invalid TSS Exception (#TS)” in this chapter for additional information on
how to handle this situation.)
Description
Indicates that the processor detected one of a class of protection violations called “general-
protection violations.” The conditions that cause this exception to be generated comprise all the
protection violations that do not cause other exceptions to be generated (such as invalid-TSS,
segment-not-present, stack-fault, or page-fault exceptions). The following conditions cause
general-protection exceptions to be generated:
• Exceeding the segment limit when accessing the CS, DS, ES, FS, or GS segments.
• Exceeding the segment limit when referencing a descriptor table (except during a task
switch or a stack switch).
• Transferring execution to a segment that is not executable.
• Writing to a code segment or a read-only data segment.
• Reading from an execute-only code segment.
• Loading the SS register with a segment selector for a read-only segment (unless the
selector comes from a TSS during a task switch, in which case an invalid-TSS exception
occurs).
• Loading the SS, DS, ES, FS, or GS register with a segment selector for a system segment.
• Loading the DS, ES, FS, or GS register with a segment selector for an execute-only code
segment.
• Loading the SS register with the segment selector of an executable segment or a null
segment selector.
• Loading the CS register with a segment selector for a data segment or a null segment
selector.
• Accessing memory using the DS, ES, FS, or GS register when it contains a null segment
selector.
• Switching to a busy task during a call or jump to a TSS.
• Switching to an available (nonbusy) task during the execution of an IRET instruction.
• Using a segment selector on task switch that points to a TSS descriptor in the current LDT.
TSS descriptors can only reside in the GDT.
• Violating any of the privilege rules described in Chapter 4, Protection.
• Exceeding the instruction length limit of 15 bytes (this can occur only when redundant
prefixes are placed before an instruction).
• Loading the CR0 register with a set PG flag (paging enabled) and a clear PE flag
(protection disabled).
• Loading the CR0 register with a set NW flag and a clear CD flag.
• Referencing an entry in the IDT (following an interrupt or exception) that is not an
interrupt, trap, or task gate.
• Attempting to access an interrupt or exception handler through an interrupt or trap gate
from virtual-8086 mode when the handler’s code segment DPL is greater than 0.
• Attempting to write a 1 into a reserved bit of CR4.
• Attempting to execute a privileged instruction when the CPL is not equal to 0 (refer to
Section 4.9., “Privileged Instructions” in Chapter 4, Protection for a list of privileged
instructions).
• Writing to a reserved bit in an MSR.
• Accessing a gate that contains a null segment selector.
• Executing the INT n instruction when the CPL is greater than the DPL of the referenced
interrupt, trap, or task gate.
• The segment selector in a call, interrupt, or trap gate does not point to a code segment.
• The segment selector operand in the LLDT instruction is a local type (TI flag is set) or
does not point to a segment descriptor of the LDT type.
• The segment selector operand in the LTR instruction is local or points to a TSS that is not
available.
• The target code-segment selector for a call, jump, or return is null.
• If the PAE and/or PSE flag in control register CR4 is set and the processor detects any
reserved bits in a page-directory-pointer-table entry set to 1. These bits are checked during
a write to control registers CR0, CR3, or CR4 that causes a reloading of the page-
directory-pointer-table entry.
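A few of the conditions in this list reduce to simple predicates, sketched below in C. The function names are hypothetical, and the CR0 bit positions (PE = bit 0, NW = bit 29, CD = bit 30, PG = bit 31) come from the control register layout rather than from this section. The gate check assumes, as stated earlier in the chapter, that the gate DPL is ignored for hardware-generated interrupts and processor-detected exceptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR0_PE (1u << 0)
#define CR0_NW (1u << 29)
#define CR0_CD (1u << 30)
#define CR0_PG (1u << 31)

/* Loading CR0 faults if paging is enabled without protection, or if
   NW is set while CD is clear. */
bool cr0_load_raises_gp(uint32_t value)
{
    bool pg_without_pe = (value & CR0_PG) && !(value & CR0_PE);
    bool nw_without_cd = (value & CR0_NW) && !(value & CR0_CD);
    return pg_without_pe || nw_without_cd;
}

/* INT n faults when the CPL is greater than the DPL of the gate;
   the DPL is not checked for events that are not software interrupts. */
bool gate_access_raises_gp(unsigned cpl, unsigned gate_dpl, bool is_int_n)
{
    return is_int_n && cpl > gate_dpl;
}

/* An instruction longer than 15 bytes, prefixes included, faults. */
bool instruction_length_raises_gp(size_t length_in_bytes)
{
    return length_in_bytes > 15;
}
```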
A program or task can be restarted following any general-protection exception. If the exception
occurs while attempting to call an interrupt handler, the interrupted program can be restarted,
but the interrupt may be lost.
Description
Indicates that, with paging enabled (the PG flag in the CR0 register is set), the processor detected
one of the following conditions while using the page-translation mechanism to translate a linear
address to a physical address:
• The P (present) flag in a page-directory or page-table entry needed for the address
translation is clear, indicating that a page table or the page containing the operand is not
present in physical memory.
• The procedure does not have sufficient privilege to access the indicated page (that is, a
procedure running in user mode attempts to access a supervisor-mode page).
• Code running in user mode attempts to write to a read-only page. In the Intel486™ and
later processors, if the WP flag is set in CR0, the page fault will also be triggered by code
running in supervisor mode that tries to write to a read-only user-mode page.
The exception handler can recover from page-not-present conditions and restart the program or
task without any loss of program continuity. It can also restart the program or task after a privi-
lege violation, but the problem that caused the privilege violation may be uncorrectable.
[Figure: Page-fault error code format. Bits 31..4: reserved; bit 3: RSVD; bit 2: U/S; bit 1: R/W;
bit 0: P.]
U/S   0 The access causing the fault originated when the processor
        was executing in supervisor mode.
      1 The access causing the fault originated when the processor
        was executing in user mode.
RSVD  0 The fault was not caused by a reserved bit violation.
      1 The page fault occurred because a 1 was detected in one of the
        reserved bit positions of a page-table entry or page-directory
        entry that was marked present.
• The contents of the CR2 register. The processor loads the CR2 register with the 32-bit
linear address that generated the exception. The page-fault handler can use this address to
locate the corresponding page directory and page-table entries. If another page fault can
potentially occur during execution of the page-fault handler, the handler must push the
contents of the CR2 register onto the stack before the second page fault occurs.
If a page fault is caused by a page-level protection violation, the access flag in the page-directory
entry is set when the fault occurs. The behavior of Intel Architecture processors regarding the
access flag in the corresponding page-table entry is model specific and not architecturally
defined.
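The error code and CR2 value described above can be unpacked with a few bit tests. The following C sketch is illustrative only: the names `pf_info`, `decode_page_fault`, `pde_index`, and `pte_index` are hypothetical, and the index helpers assume 32-bit paging with 4-KByte pages (no 4-MByte pages or physical address extension). The meanings given for the P and R/W bits follow the standard Intel Architecture definition of the page-fault error code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the page-fault error code (bits 31:4 are reserved). */
#define PF_P    (1u << 0)  /* 0 = page not present, 1 = protection violation */
#define PF_RW   (1u << 1)  /* 0 = read access, 1 = write access */
#define PF_US   (1u << 2)  /* 0 = supervisor mode, 1 = user mode */
#define PF_RSVD (1u << 3)  /* 1 = reserved bit set in a present entry */

/* Hypothetical container for a decoded fault; not an Intel-defined type. */
struct pf_info {
    uint32_t linear_address;        /* value the processor loaded into CR2 */
    bool protection_violation;
    bool write_access;
    bool user_mode;
    bool reserved_bit_violation;
};

struct pf_info decode_page_fault(uint32_t error_code, uint32_t cr2)
{
    struct pf_info info;
    info.linear_address = cr2;
    info.protection_violation = (error_code & PF_P) != 0;
    info.write_access = (error_code & PF_RW) != 0;
    info.user_mode = (error_code & PF_US) != 0;
    info.reserved_bit_violation = (error_code & PF_RSVD) != 0;
    return info;
}

/* Assumes 4-KByte pages: bits 31:22 select the page-directory entry. */
uint32_t pde_index(uint32_t linear_address)
{
    return linear_address >> 22;
}

/* Assumes 4-KByte pages: bits 21:12 select the page-table entry. */
uint32_t pte_index(uint32_t linear_address)
{
    return (linear_address >> 12) & 0x3FFu;
}
```

A handler that can itself fault would copy the CR2 value into `linear_address` (or onto its stack) before touching any memory that might not be present.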
When a page-fault exception is generated during a task switch, the program state may change.
During a task switch, a page-fault exception can occur during any of the following
operations:
• While writing the state of the original task into the TSS of that task.
• While reading the GDT to locate the TSS descriptor of the new task.
• While reading the TSS of the new task.
• While reading segment descriptors associated with segment selectors from the new task.
• While reading the LDT of the new task to verify the segment registers stored in the new
TSS.
In the last two cases the exception occurs in the context of the new task. The instruction pointer
refers to the first instruction of the new task, not to the instruction which caused the task switch
(or the last instruction to be executed, in the case of an interrupt). If the design of the operating
system permits page faults to occur during task-switches, the page-fault handler should be called
through a task gate.
If a page fault occurs during a task switch, the processor will load all the state information from
the new TSS (without performing any additional limit, present, or type checks) before it gener-
ates the exception. The page-fault handler should thus not rely on being able to use the segment
selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another exception.
(Refer to the Program State Change description for “Interrupt 10—Invalid TSS Exception
(#TS)” in this chapter for additional information on how to handle this situation.)
When executing this code on one of the 32-bit Intel Architecture processors, it is possible to get
a page fault, general-protection fault (#GP), or alignment check fault (#AC) after the segment
selector has been loaded into the SS register but before the ESP register has been loaded. At this
point, the two parts of the stack pointer (SS and ESP) are inconsistent. The new stack segment
is being used with the old stack pointer.
The processor does not use the inconsistent stack pointer if the exception handler switches to a
well defined stack (that is, the handler is a task or a more privileged procedure). However, if the
exception handler is called at the same privilege level and from the same task, the processor will
attempt to use the inconsistent stack pointer.
In systems that handle page-fault, general-protection, or alignment check exceptions within the
faulting task (with trap or interrupt gates), software executing at the same privilege level as the
exception handler should initialize a new stack by using the LSS instruction rather than a pair
of MOV instructions, as described earlier in this note. When the exception handler is running at
privilege level 0 (the normal case), the problem is limited to procedures or tasks that run at priv-
ilege level 0, typically the kernel of the operating system.
Description
Indicates that the FPU has detected a floating-point-error exception. The NE flag in the register
CR0 must be set and the appropriate exception must be unmasked (clear mask bit in the control
register) for an interrupt 16, floating-point-error exception to be generated. (Refer to Section
2.5., “Control Registers” in Chapter 2, System Architecture Overview for a detailed description
of the NE flag.)
While executing floating-point instructions, the FPU detects and reports six types of floating-
point errors:
• Invalid operation (#I)
— Stack overflow or underflow (#IS)
— Invalid arithmetic operation (#IA)
• Divide-by-zero (#Z)
• Denormalized operand (#D)
• Numeric overflow (#O)
• Numeric underflow (#U)
• Inexact result (precision) (#P)
For each of these error types, the FPU provides a flag in the FPU status register and a mask bit
in the FPU control register. If the FPU detects a floating-point error and the mask bit for the error
is set, the FPU handles the error automatically by generating a predefined (default) response and
continuing program execution. The default responses have been designed to provide a reason-
able result for most floating-point applications.
If the mask for the error is clear and the NE flag in register CR0 is set, the FPU does the
following:
1. Sets the necessary flag in the FPU status register.
2. Waits until the next “waiting” floating-point instruction or WAIT/FWAIT instruction is
encountered in the program’s instruction stream. (The FPU checks for pending floating-
point exceptions on “waiting” instructions prior to executing them. All the floating-point
instructions except the FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW,
FNSTENV, and FNSAVE instructions are “waiting” instructions.)
3. Generates an internal error signal that causes the processor to generate a floating-point-
error exception.
All of the floating-point-error conditions can be recovered from. The floating-point-error excep-
tion handler can determine the error condition that caused the exception from the settings of the
flags in the FPU status word. Refer to “Software Exception Handling” in Chapter 7 of the Intel
Architecture Software Developer’s Manual, Volume 1, for more information on handling
floating-point-error exceptions.
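As a rough illustration of how a handler might find the error conditions that are both flagged and unmasked, the following C sketch compares the low six bits of the FPU status word with the mask bits in the FPU control word, which occupy the same bit positions. The function names are hypothetical. The stack-fault (SF) bit at position 6 of the status word is used here to separate stack overflow or underflow (#IS) from invalid arithmetic operations (#IA).

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception flags in the FPU status word. The corresponding mask bits
 * in the FPU control word use the same bit positions. */
#define FPU_IE (1u << 0)  /* invalid operation (#I) */
#define FPU_DE (1u << 1)  /* denormalized operand (#D) */
#define FPU_ZE (1u << 2)  /* divide-by-zero (#Z) */
#define FPU_OE (1u << 3)  /* numeric overflow (#O) */
#define FPU_UE (1u << 4)  /* numeric underflow (#U) */
#define FPU_PE (1u << 5)  /* inexact result (#P) */
#define FPU_SF (1u << 6)  /* stack fault, status word only */
#define FPU_EXCEPTION_BITS 0x3Fu

/* Returns the exception flags that are set and whose mask bit is clear. */
uint16_t fpu_unmasked_exceptions(uint16_t status_word, uint16_t control_word)
{
    return (uint16_t)(status_word & ~control_word & FPU_EXCEPTION_BITS);
}

/* True when an invalid-operation flag was caused by a stack fault (#IS). */
bool fpu_is_stack_fault(uint16_t status_word)
{
    return (status_word & FPU_IE) && (status_word & FPU_SF);
}
```

With the control word at its initialization value of 037FH all six exceptions are masked, so `fpu_unmasked_exceptions` returns 0 regardless of the status flags.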
Description
Indicates that the processor detected an unaligned memory operand when alignment checking
was enabled. Alignment checks are only carried out in data (or stack) segments (not in code or
system segments). An example of an alignment-check violation is a word stored at an odd byte
address, or a doubleword stored at an address that is not an integer multiple of 4. Table 5-7 lists
the alignment requirements for the various data types recognized by the processor.
Table 5-7. Alignment Requirements by Data Type
Data Type                                      Address Must Be Divisible By
Word                                           2
Doubleword                                     4
Single Real                                    4
Double Real                                    8
Extended Real                                  8
Segment Selector                               2
32-bit Far Pointer                             2
48-bit Far Pointer                             4
32-bit Pointer                                 4
GDTR, IDTR, LDTR, or Task Register Contents    4
FSTENV/FLDENV Save Area                        4 or 2, depending on operand size
FSAVE/FRSTOR Save Area                         4 or 2, depending on operand size
Bit String                                     2 or 4, depending on the operand-size attribute
128-bit (see note 1)                           16
1. 128-bit datatype introduced with the Pentium® III processor. This type of alignment check is done for
operands less than 128-bits in size: 32-bit scalar single and 16-bit/32-bit/64-bit integer MMX™ technol-
ogy; 2, 4, or 8 byte alignments checks are possible when #AC is enabled. Some exceptional cases are:
• The MOVUPS instruction, which performs a 128-bit unaligned load or store. In this case, 2/4/8-byte
misalignments will be detected, but detection of 16-byte misalignment is not guaranteed and may
vary with implementation.
• The FXSAVE/FXRSTOR instructions - refer to instruction descriptions
To enable alignment checking, the following conditions must be true:
• AM flag in CR0 register is set.
• AC flag in the EFLAGS register is set.
• The CPL is 3 (protected mode or virtual-8086 mode).
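The three enabling conditions and the divisibility rule of Table 5-7 can be combined into a simple predicate. The C sketch below is a simplified model, not processor behavior: the function names are hypothetical, the divisor is supplied by the caller from the table, and the model ignores memory references that default to privilege level 0. The AM flag is bit 18 of CR0 and the AC flag is bit 18 of EFLAGS.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18)  /* alignment mask flag in CR0 */
#define EFLAGS_AC (1u << 18)  /* alignment check flag in EFLAGS */

/* All three conditions must hold for alignment checking to be enabled. */
bool alignment_checking_enabled(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}

/* divisor is the "Address Must Be Divisible By" value; it must be nonzero. */
bool is_aligned(uint32_t address, uint32_t divisor)
{
    return address % divisor == 0;
}

/* Simplified model: a misaligned operand faults only when checking is on. */
bool would_raise_ac(uint32_t cr0, uint32_t eflags, unsigned cpl,
                    uint32_t address, uint32_t divisor)
{
    return alignment_checking_enabled(cr0, eflags, cpl) &&
           !is_aligned(address, divisor);
}
```

For example, a doubleword at address 1002H is misaligned (divisor 4), but the model reports a fault only when the access is made at CPL 3 with both flags set.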
Alignment-check faults are generated only when operating at privilege level 3 (user mode).
Memory references that default to privilege level 0, such as segment descriptor loads, do not
generate alignment-check faults, even when caused by a memory reference made from privilege
level 3.
Storing the contents of the GDTR, IDTR, LDTR, or task register in memory while at privilege
level 3 can generate an alignment-check fault. Although application programs do not normally
store these registers, the fault can be avoided by aligning the information stored on an even
word-address.
FSAVE and FRSTOR instructions generate unaligned references which can cause alignment-
check faults. These instructions are rarely needed by application programs.
Description
Indicates that the processor detected an internal machine error or a bus error, or that an external
agent detected a bus error. The machine-check exception is model-specific, available only on
the P6 family and Pentium® processors. The implementation of the machine-check exception is
different between the P6 family and Pentium® processors, and these implementations may not
be compatible with future Intel Architecture processors. (Use the CPUID instruction to deter-
mine whether this feature is present.)
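A feature test along these lines might look like the C sketch below. It assumes the caller has already executed CPUID with EAX = 1 and passes in the feature flags returned in EDX; how that value is obtained (inline assembly or a compiler intrinsic) is outside the sketch, and the function names are hypothetical. Bit 7 of the feature flags reports the machine-check exception (MCE) and bit 14 reports the machine-check architecture (MCA).

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EDX_MCE (1u << 7)   /* machine-check exception supported */
#define CPUID_EDX_MCA (1u << 14)  /* machine-check architecture supported */

/* cpuid1_edx: feature flags returned in EDX by CPUID with EAX = 1. */
bool has_machine_check_exception(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID_EDX_MCE) != 0;
}

bool has_machine_check_architecture(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID_EDX_MCA) != 0;
}
```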
Bus errors detected by external agents are signaled to the processor on dedicated pins: the
BINIT# pin on the P6 family processors and the BUSCHK# pin on the Pentium® processor.
When one of these pins is enabled, asserting the pin causes error information to be loaded into
machine-check registers and a machine-check exception is generated.
The machine-check exception and machine-check architecture are discussed in detail in Chapter
13, Machine-Check Architecture. Also, refer to the data books for the individual processors for
processor-specific hardware information.
Description
Indicates the processor has detected a SIMD floating-point execution unit exception. The appro-
priate status flag in the MXCSR register must be set and the particular exception unmasked for
this interrupt to be generated.
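There are six classes of numeric exception conditions that can occur while executing Streaming
SIMD Extensions: invalid operation (#I), divide-by-zero (#Z), denormalized operand (#D),
numeric overflow (#O), numeric underflow (#U), and inexact result (precision) (#P). Each
condition can be handled in one of two ways: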
• The processor can handle the exception by itself, producing the most reasonable result and
allowing numeric program execution to continue undisturbed (i.e., masked exception
response).
• A software exception handler can be invoked to handle the exception (i.e., unmasked
exception response).
Each of the six exception conditions described above has corresponding flag and mask bits in
the MXCSR. If an exception is masked (the corresponding mask bit in MXCSR = 1), the
processor takes an appropriate default action and continues with the computation. If the excep-
tion is unmasked (mask bit = 0) and the OS supports SIMD floating-point exceptions (i.e.
CR4.OSXMMEXCPT = 1), a software exception handler is invoked immediately through
SIMD floating-point exception interrupt vector 19. If the exception is unmasked (mask bit = 0)
and the OS does not support SIMD floating-point exceptions (i.e. CR4.OSXMMEXCPT = 0),
an invalid opcode exception is signaled instead of a SIMD floating-point exception.
Note that because SIMD floating-point exceptions are precise and occur immediately, the situ-
ation does not arise where an x87-FP instruction, an FWAIT instruction, or another Streaming
SIMD Extensions instruction will catch a pending unmasked SIMD floating-point exception.
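The three outcomes described above (default action, vector 19, or invalid opcode) depend only on the mask bit for the condition and on CR4.OSXMMEXCPT. The C sketch below models that decision. The enumeration and function names are hypothetical; the bit positions used are the MXCSR exception flags in bits 5:0, the matching mask bits in bits 12:7, and OSXMMEXCPT in bit 10 of CR4.

```c
#include <stdint.h>

#define MXCSR_EXCEPTION_BITS 0x3Fu      /* flag bits 5:0 of MXCSR */
#define MXCSR_MASK_SHIFT     7          /* mask bits occupy bits 12:7 */
#define CR4_OSXMMEXCPT       (1u << 10)

enum simd_response {
    SIMD_DEFAULT_ACTION,      /* masked: processor supplies a default result */
    SIMD_EXCEPTION_VECTOR_19, /* unmasked and the OS supports the exception */
    SIMD_INVALID_OPCODE       /* unmasked but CR4.OSXMMEXCPT = 0 */
};

/* exception_bit: exactly one of bits 5:0, in MXCSR flag order. */
enum simd_response simd_fp_response(uint32_t exception_bit, uint32_t mxcsr,
                                    uint32_t cr4)
{
    uint32_t masks = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_EXCEPTION_BITS;

    if (masks & exception_bit)
        return SIMD_DEFAULT_ACTION;
    return (cr4 & CR4_OSXMMEXCPT) ? SIMD_EXCEPTION_VECTOR_19
                                  : SIMD_INVALID_OPCODE;
}
```

With MXCSR at 1F80H every condition is masked, so the sketch always selects the default action.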
Description
Indicates that the processor did one of the following things:
• Executed an INT n instruction where the instruction operand is one of the vector numbers
from 32 through 255.
• Responded to an interrupt request at the INTR pin or from the local APIC when the
interrupt vector number associated with the request is from 32 through 255.
CHAPTER 6
TASK MANAGEMENT
This chapter describes the Intel Architecture’s task management facilities. These facilities are
only available when the processor is running in protected mode.
NOTE
This chapter describes primarily 32-bit tasks and the 32-bit TSS structure.
For information on 16-bit tasks and the 16-bit TSS structure, refer to Section
6.6., “16-Bit Task-State Segment (TSS)”.
A task is identified by the segment selector for its TSS. When a task is loaded into the processor
for execution, the segment selector, base address, limit, and segment descriptor attributes for the
TSS are loaded into the task register (refer to Section 2.4.4., “Task Register (TR)” in Chapter 2,
System Architecture Overview).
If paging is implemented for the task, the base address of the page directory used by the task is
loaded into control register CR3.
[Figure: structure of a task. The task register references the task-state segment (TSS). The
figure also shows the task's code segment, data segment, stack segment for the current
privilege level, stack segments for privilege levels 0, 1, and 2, and control register CR3.]
Use of task management facilities for handling multitasking applications is optional. Multi-
tasking can be handled in software, with each software defined task executed in the context of
a single Intel Architecture task.
(which is sometimes called the back link field) permits a task switch back to
the previous task to be initiated with an IRET instruction.
The processor reads the static fields, but does not normally change them. These fields are set up
when a task is created. The following are static fields:
LDT segment selector field
Contains the segment selector for the task’s LDT.
32-bit TSS layout (one doubleword per row, bits 31:0):
Byte Offset   Contents
100           I/O Map Base Address (upper word); T flag (bit 0)
96            LDT Segment Selector
92            GS
88            FS
84            DS
80            SS
76            CS
72            ES
68            EDI
64            ESI
60            EBP
56            ESP
52            EBX
48            EDX
44            ECX
40            EAX
36            EFLAGS
32            EIP
28            CR3 (PDBR)
24            SS2
20            ESP2
16            SS1
12            ESP1
8             SS0
4             ESP0
0             Previous Task Link
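One way to express the layout above in software is a C structure whose member offsets match the byte offsets in the figure. The sketch below is an assumption-laden illustration: the structure and member names are hypothetical, 16-bit selectors are assumed to occupy the low word of their doubleword with the upper word reserved, and the compiler is assumed to use natural alignment without inserting padding (the compile-time checks fail otherwise).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 32-bit TSS; names are illustrative. */
struct tss32 {
    uint16_t prev_task_link, reserved0;   /* offset 0 */
    uint32_t esp0;                        /* offset 4 */
    uint16_t ss0, reserved1;              /* offset 8 */
    uint32_t esp1;                        /* offset 12 */
    uint16_t ss1, reserved2;              /* offset 16 */
    uint32_t esp2;                        /* offset 20 */
    uint16_t ss2, reserved3;              /* offset 24 */
    uint32_t cr3;                         /* offset 28, PDBR */
    uint32_t eip;                         /* offset 32 */
    uint32_t eflags;                      /* offset 36 */
    uint32_t eax, ecx, edx, ebx;          /* offsets 40 to 52 */
    uint32_t esp, ebp, esi, edi;          /* offsets 56 to 68 */
    uint16_t es, reserved4;               /* offset 72 */
    uint16_t cs, reserved5;               /* offset 76 */
    uint16_t ss, reserved6;               /* offset 80 */
    uint16_t ds, reserved7;               /* offset 84 */
    uint16_t fs, reserved8;               /* offset 88 */
    uint16_t gs, reserved9;               /* offset 92 */
    uint16_t ldt_selector, reserved10;    /* offset 96 */
    uint16_t debug_trap;                  /* offset 100, bit 0 is the T flag */
    uint16_t io_map_base;                 /* offset 102 */
};

/* Compile-time checks that the structure matches the documented offsets. */
_Static_assert(offsetof(struct tss32, cr3) == 28, "CR3 must be at offset 28");
_Static_assert(offsetof(struct tss32, ldt_selector) == 96, "LDT at offset 96");
_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
```

An operating system that stores extra per-task data would append it after `io_map_base` and enlarge the segment limit accordingly.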
flag set (which indicates the current LDT) causes a general-protection exception (#GP) to be
generated. A general-protection exception is also generated if an attempt is made to load a
segment selector for a TSS into a segment register.
The busy flag (B) in the type field indicates whether the task is busy. A busy task is currently
running or is suspended. A type field with a value of 1001B indicates an inactive task; a value
of 1011B indicates a busy task. Tasks are not recursive. The processor uses the busy flag to
detect an attempt to call a task whose execution has been interrupted. To ensure that only
one busy flag is associated with a task, each TSS should have only one TSS descriptor that points
to it.
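TSS Descriptor

Doubleword at byte offset 4:
  Bits 31:24   Base 31:24
  Bit 23       G
  Bits 22:21   0
  Bit 20       AVL
  Bits 19:16   Limit 19:16
  Bit 15       P
  Bits 14:13   DPL
  Bit 12       0
  Bits 11:8    Type (1 0 B 1, where B is the busy flag)
  Bits 7:0     Base 23:16

Doubleword at byte offset 0:
  Bits 31:16   Base Address 15:00
  Bits 15:0    Segment Limit 15:00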
The base, limit, and DPL fields and the granularity and present flags have functions similar to
their use in data-segment descriptors (refer to Section 3.4.3., “Segment Descriptors” in Chapter
3, Protected-Mode Memory Management). The limit field must have a value equal to or greater
than 67H (for a 32-bit TSS), one byte less than the minimum size of a TSS. Attempting to switch
to a task whose TSS descriptor has a limit less than 67H generates an invalid-TSS exception
(#TS). A larger limit is required if an I/O permission bit map is included in the TSS. An even
larger limit would be required if the operating system stores additional data in the TSS. The
processor does not check for a limit greater than 67H on a task switch; however, it does when
accessing the I/O permission bit map or interrupt redirection bit map.
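The field packing of the TSS descriptor can be sketched in C as follows. This is a minimal illustration under stated assumptions: the type and function names are hypothetical, the G and AVL bits are left clear (byte granularity), and the descriptor is created in the available state (type 1001B).

```c
#include <stdbool.h>
#include <stdint.h>

#define TSS_TYPE_AVAILABLE 0x9u   /* 1001B: inactive task */
#define TSS_TYPE_BUSY      0xBu   /* 1011B: busy task */
#define TSS32_MIN_LIMIT    0x67u  /* minimum limit for a 32-bit TSS */

/* Hypothetical raw descriptor: low is byte offset 0, high is byte offset 4. */
struct descriptor {
    uint32_t low;
    uint32_t high;
};

/* Builds an available TSS descriptor with G = 0 and AVL = 0. */
struct descriptor make_tss_descriptor(uint32_t base, uint32_t limit,
                                      unsigned dpl, bool present)
{
    struct descriptor d;
    d.low  = ((base & 0xFFFFu) << 16) | (limit & 0xFFFFu);
    d.high = (base & 0xFF000000u)            /* Base 31:24 */
           | (limit & 0x000F0000u)           /* Limit 19:16 */
           | ((present ? 1u : 0u) << 15)     /* P */
           | ((uint32_t)(dpl & 3u) << 13)    /* DPL */
           | (TSS_TYPE_AVAILABLE << 8)       /* Type */
           | ((base >> 16) & 0xFFu);         /* Base 23:16 */
    return d;
}

uint32_t tss_base(struct descriptor d)
{
    return (d.high & 0xFF000000u) | ((d.high & 0xFFu) << 16) | (d.low >> 16);
}

uint32_t tss_limit(struct descriptor d)
{
    return (d.high & 0x000F0000u) | (d.low & 0xFFFFu);
}

bool tss_is_busy(struct descriptor d)
{
    return ((d.high >> 8) & 0xFu) == TSS_TYPE_BUSY;
}
```

A caller would normally reject any `limit` below `TSS32_MIN_LIMIT` before installing the descriptor, since a switch to such a task generates an invalid-TSS exception.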
Any program or procedure with access to a TSS descriptor (that is, whose CPL is numerically
equal to or less than the DPL of the TSS descriptor) can dispatch the task with a call or a jump.
In most systems, the DPLs of TSS descriptors should be set to values less than 3, so that only
privileged software can perform task switching. However, in multitasking applications, DPLs
for some TSS descriptors can be set to 3 to allow task switching at the application (or user) priv-
ilege level.
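Task-Gate Descriptor

Doubleword at byte offset 4:
  Bits 31:16   Reserved
  Bit 15       P
  Bits 14:13   DPL
  Bit 12       0
  Bits 11:8    Type (0 1 0 1)
  Bits 7:0     Reserved

Doubleword at byte offset 0:
  Bits 31:16   TSS Segment Selector
  Bits 15:0    Reserved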
A task can be accessed either through a task-gate descriptor or a TSS descriptor. Both of these
structures are provided to satisfy the following needs:
• The need for a task to have only one busy flag. Because the busy flag for a task is stored in
the TSS descriptor, each task should have only one TSS descriptor. There may, however,
be several task gates that reference the same TSS descriptor.
• The need to provide selective access to tasks. Task gates fill this need, because they can
reside in an LDT and can have a DPL that is different from the TSS descriptor’s DPL. A
program or procedure that does not have sufficient privilege to access the TSS descriptor
for a task in the GDT (which usually has a DPL of 0) may be allowed access to the task
through a task gate with a higher DPL. Task gates give the operating system greater
latitude for limiting access to specific tasks.
• The need for an interrupt or exception to be handled by an independent task. Task gates
may also reside in the IDT, which allows interrupts and exceptions to be handled by
handler tasks. When an interrupt or exception vector points to a task gate, the processor
switches to the specified task.
Figure 6-6 illustrates how a task gate in an LDT, a task gate in the GDT, and a task gate in the
IDT can all point to the same task.
2. Checks that the current (old) task is allowed to switch to the new task. Data-access
privilege rules apply to JMP and CALL instructions. The CPL of the current (old) task and
the RPL of the segment selector for the new task must be less than or equal to the DPL of
the TSS descriptor or task gate being referenced. Exceptions, interrupts (except for
interrupts generated by the INT n instruction), and the IRET instruction are permitted to
switch tasks regardless of the DPL of the destination task-gate or TSS descriptor. For
interrupts generated by the INT n instruction, the DPL is checked.
3. Checks that the TSS descriptor of the new task is marked present and has a valid limit
(greater than or equal to 67H).
4. Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET
return).
5. Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task
switch are paged into system memory.
6. If the task switch was initiated with a JMP or IRET instruction, the processor clears the
busy (B) flag in the current (old) task’s TSS descriptor; if initiated with a CALL
instruction, an exception, or an interrupt, the busy (B) flag is left set. (Refer to Table 6-2.)
7. If the task switch was initiated with an IRET instruction, the processor clears the NT flag
in a temporarily saved image of the EFLAGS register; if initiated with a CALL or JMP
instruction, an exception, or an interrupt, the NT flag is left unchanged in the saved
EFLAGS image.
8. Saves the state of the current (old) task in the current task’s TSS. The processor finds the
base address of the current TSS in the task register and then copies the states of the
following registers into the current TSS: all the general-purpose registers, segment
selectors from the segment registers, the temporarily saved image of the EFLAGS register,
and the instruction pointer register (EIP).
NOTE
At this point, if all checks and saves have been carried out successfully, the
processor commits to the task switch. If an unrecoverable error occurs in
steps 1 through 8, the processor does not complete the task switch and ensures
that the processor is returned to its state prior to the execution of the
instruction that initiated the task switch. If an unrecoverable error occurs after
the commit point (in steps 9 through 14), the processor completes the task
switch (without performing additional access and segment availability
checks) and generates the appropriate exception prior to beginning execution
of the new task. If exceptions occur after the commit point, the exception
handler must finish the task switch itself before allowing the processor to
begin executing the task. Refer to Chapter 5, Interrupt and Exception
Handling for more information about the effect of exceptions on a task when
they occur after the commit point of a task switch.
9. If the task switch was initiated with a CALL instruction, an exception, or an interrupt, the
processor sets the NT flag in the EFLAGS image stored in the new task’s TSS; if initiated
with an IRET instruction, the processor restores the NT flag from the EFLAGS image
stored on the stack. If initiated with a JMP instruction, the NT flag is left unchanged.
(Refer to Table 6-2.)
10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or
an interrupt, the processor sets the busy (B) flag in the new task’s TSS descriptor; if
initiated with an IRET instruction, the busy (B) flag is left set.
11. Sets the TS flag in the control register CR0 image stored in the new task’s TSS.
12. Loads the task register with the segment selector and descriptor for the new task's TSS.
13. Loads the new task’s state from its TSS into the processor. Any errors associated with the
loading and qualification of segment descriptors in this step occur in the context of the new
task. The task state information that is loaded here includes the LDTR register, the PDBR
(control register CR3), the EFLAGS register, the EIP register, the general-purpose
registers, and the segment descriptor parts of the segment registers.
14. Begins executing the new task. (To an exception handler, the first instruction of the new
task appears not to have been executed.)
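The checks in steps 2 through 4 can be modeled as a pure function over a few descriptor fields. The C sketch below is a simplified software model, not a description of processor internals: all names are hypothetical, the same `dpl` value stands for whichever TSS descriptor or task gate is being referenced, and for INT n only the CPL is compared with the DPL (an assumption, since the text states only that the DPL is checked). The sketch reports which check failed and deliberately does not say which exception results.

```c
#include <stdbool.h>

enum switch_kind {
    SWITCH_JMP,
    SWITCH_CALL,
    SWITCH_INT_N,                  /* software interrupt, INT n */
    SWITCH_EXCEPTION_OR_INTERRUPT, /* exceptions and other interrupts */
    SWITCH_IRET
};

enum switch_check {
    CHECK_OK,
    CHECK_PRIVILEGE,   /* step 2 failed */
    CHECK_NOT_PRESENT, /* step 3 failed: descriptor not present */
    CHECK_LIMIT,       /* step 3 failed: limit below 67H */
    CHECK_BUSY_STATE   /* step 4 failed */
};

struct switch_request {
    enum switch_kind kind;
    unsigned cpl;    /* CPL of the current (old) task */
    unsigned rpl;    /* RPL of the selector for the new task */
    unsigned dpl;    /* DPL of the TSS descriptor or task gate referenced */
    bool present;    /* P flag of the new task's TSS descriptor */
    unsigned limit;  /* limit of the new task's TSS descriptor */
    bool busy;       /* B flag of the new task's TSS descriptor */
};

enum switch_check check_task_switch(const struct switch_request *r)
{
    /* Step 2: exceptions, other interrupts, and IRET skip the DPL check. */
    if (r->kind == SWITCH_JMP || r->kind == SWITCH_CALL) {
        if (r->cpl > r->dpl || r->rpl > r->dpl)
            return CHECK_PRIVILEGE;
    } else if (r->kind == SWITCH_INT_N) {
        if (r->cpl > r->dpl)
            return CHECK_PRIVILEGE;
    }

    /* Step 3: descriptor must be present with a limit of at least 67H. */
    if (!r->present)
        return CHECK_NOT_PRESENT;
    if (r->limit < 0x67u)
        return CHECK_LIMIT;

    /* Step 4: the new task must be busy for IRET, available otherwise. */
    if (r->busy != (r->kind == SWITCH_IRET))
        return CHECK_BUSY_STATE;

    return CHECK_OK;
}
```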
The state of the currently executing task is always saved when a successful task switch occurs.
If the task is resumed, execution starts with the instruction pointed to by the saved EIP value,
and the registers are restored to the values they held when the task was suspended.
When switching tasks, the new task does not inherit its privilege level from
the suspended task. The new task begins executing at the privilege level specified in the CPL
field of the CS register, which is loaded from the TSS. Because tasks are isolated by their sepa-
rate address spaces and TSSs and because privilege rules control access to a TSS, software does
not need to perform explicit privilege checks on a task switch.
Table 6-1 shows the exception conditions that the processor checks for when switching tasks. It
also shows the exception that is generated for each check if an error is detected and the segment
that the error code references. (The order of the checks in the table is the order used in the P6
family processors. The exact order is model specific and may be different for other Intel Archi-
tecture processors.) Exception handlers designed to handle these exceptions may be subject to
recursive calls if they attempt to reload the segment selector that generated the exception. The
cause of the exception (or the first of multiple causes) should be fixed before reloading the
selector.
NOTES:
1. #NP is segment-not-present exception, #GP is general-protection exception, #TS is invalid-TSS excep-
tion, and #SF is stack-fault exception.
2. The error code contains an index to the segment descriptor referenced in this column.
3. A segment selector is valid if it is in a compatible type of table (GDT or LDT), occupies an address within
the table’s segment limit, and refers to a compatible type of descriptor (for example, a segment selector in
the CS register only is valid when it points to a code-segment descriptor).
The TS (task switched) flag in the control register CR0 is set every time a task switch occurs.
System software uses the TS flag to coordinate the actions of the floating-point unit when
generating floating-point exceptions with the rest of the processor. The TS flag indicates that the
context of the floating-point unit may be different from that of the current task. Refer to Section
2.5., “Control Registers” in Chapter 2, System Architecture Overview for a detailed description
of the function and use of the TS flag.
NOTE
When a JMP instruction causes a task switch, the new task is not nested; that
is, the NT flag is set to 0 and the previous task link field is not used. A JMP
instruction is used to dispatch a new task when nesting is not desired.
Table 6-2 summarizes the uses of the busy flag (in the TSS segment descriptor), the NT flag, the
previous task link field, and TS flag (in control register CR0) during a task switch. Note that the
NT flag may be modified by software executing at any privilege level. It is possible for a
program to set its NT flag and execute an IRET instruction, which would have the effect of
invoking the task specified in the previous link field of the current task’s TSS. To keep spurious
task switches from succeeding, the operating system should initialize the previous task link field
for every TSS it creates to 0.
Table 6-2. Effect of a Task Switch on Busy Flag, NT Flag, Previous Task Link Field,
and TS Flag
Flag or Field | Effect of JMP Instruction | Effect of CALL Instruction or Interrupt | Effect of IRET Instruction
Busy (B) flag of new task. | Flag is set. Must have been clear before. | Flag is set. Must have been clear before. | No change. Must have been set.
Busy flag of old task. | Flag is cleared. | No change. Flag is currently set. | Flag is cleared.
NT flag of new task. | No change. | Flag is set. | Restored to value from TSS of new task.
NT flag of old task. | No change. | No change. | Flag is cleared.
Previous task link field of new task. | No change. | Loaded with selector for old task’s TSS. | No change.
Previous task link field of old task. | No change. | No change. | No change.
TS flag in control register CR0. | Flag is set. | Flag is set. | Flag is set.
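The rows of Table 6-2 translate directly into a small state-update routine. The C sketch below is a software model for illustration only: the `task_model` structure and all names are hypothetical, each task's NT flag is modeled as a single boolean, and the IRET case leaves the new task's NT flag untouched to represent its being restored from that task's own saved state.

```c
#include <stdbool.h>
#include <stdint.h>

enum ts_kind { TS_JMP, TS_CALL_OR_INTERRUPT, TS_IRET };

/* Hypothetical per-task model of the flags and field covered by Table 6-2. */
struct task_model {
    uint16_t tss_selector;    /* selector that identifies the task's TSS */
    bool busy;                /* B flag in the task's TSS descriptor */
    bool nt;                  /* NT flag in the task's EFLAGS image */
    uint16_t prev_task_link;  /* previous task link field of the TSS */
};

/* Applies Table 6-2. Returns false, changing nothing, when the new task's
 * busy flag does not have the value the table requires beforehand. */
bool apply_task_switch(enum ts_kind kind, struct task_model *old_task,
                       struct task_model *new_task, bool *cr0_ts)
{
    bool new_must_be_busy = (kind == TS_IRET);
    if (new_task->busy != new_must_be_busy)
        return false;

    switch (kind) {
    case TS_JMP:
        new_task->busy = true;
        old_task->busy = false;
        break;
    case TS_CALL_OR_INTERRUPT:
        new_task->busy = true;   /* old task stays busy: tasks are nested */
        new_task->nt = true;
        new_task->prev_task_link = old_task->tss_selector;
        break;
    case TS_IRET:
        old_task->busy = false;
        old_task->nt = false;    /* new task keeps its own saved NT value */
        break;
    }

    *cr0_ts = true;              /* TS is set on every task switch */
    return true;
}
```

Calling the routine with `TS_CALL_OR_INTERRUPT` and then `TS_IRET` in the opposite direction returns both tasks' busy flags to their starting values, which mirrors the nesting behavior the table describes.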
With either method of mapping task linear address spaces, the TSSs for all tasks must lie in a
shared area of the physical space, which is accessible to all tasks. This mapping is required so
that the mapping of TSS addresses does not change while the processor is reading and updating
the TSSs during a task switch. The linear address space mapped by the GDT also should be
mapped to a shared area of the physical space; otherwise, the purpose of the GDT is defeated.
Figure 6-8 shows how the linear address spaces of two tasks can overlap in the physical space
by sharing page tables.
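[Figure 6-8: the TSS of Task A and the TSS of Task B each supply a PDBR that locates that
task's page directory. Each page directory contains a PDE for a private page table, whose
PTEs map that task's own pages, and a PDE for a shared page table (Shared PT), whose PTEs
map the shared pages.]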
to specific tasks. Other tasks in the system may have different LDTs that do not give them
access to the shared segments.
• Through segment descriptors in distinct LDTs that are mapped to common addresses in the
linear address space. If this common area of the linear address space is mapped to the same
area of the physical address space for each task, these segment descriptors permit the tasks
to share segments. Such segment descriptors are commonly called aliases. This method of
sharing is even more selective than those listed above, because, other segment descriptors
in the LDTs may point to independent linear addresses which are not shared.
16-bit TSS layout (one word per row, bits 15:0):
Byte Offset   Contents
42            Task LDT Selector
40            DS Selector
38            SS Selector
36            CS Selector
34            ES Selector
32            DI
30            SI
28            BP
26            SP
24            BX
22            DX
20            CX
18            AX
16            FLAG Word
14            IP (Entry Point)
12            SS2
10            SP2
8             SS1
6             SP1
4             SS0
2             SP0
0             Previous Task Link
CHAPTER 7
MULTIPLE-PROCESSOR MANAGEMENT
The Intel Architecture provides several mechanisms for managing and improving the perfor-
mance of multiple processors connected to the same system bus. These mechanisms include:
• Bus locking and/or cache coherency management for performing atomic operations on
system memory.
• Serializing instructions. (These instructions apply only to the Pentium® and P6 family
processors.)
• Advanced programmable interrupt controller (APIC) located on the processor chip. (The
APIC architecture was introduced into the Intel Architecture with the Pentium® processor.)
• A secondary (level 2, L2) cache. For the P6 family processors, the L2 cache is included in
the processor package and is tightly coupled to the processor. For the Pentium® and
Intel486™ processors, pins are provided to support an external L2 cache.
These mechanisms are particularly useful in symmetric-multiprocessing systems; however, they
can also be used in applications where an Intel Architecture processor and a special-purpose
processor (such as a communications, graphics, or video processor) share the system bus.
The main goals of these multiprocessing mechanisms are as follows:
• To maintain system memory coherency—When two or more processors are attempting
simultaneously to access the same address in system memory, some communication
mechanism or memory access protocol must be available to promote data coherency and,
in some instances, to allow one processor to temporarily lock a memory location.
• To maintain cache consistency—When one processor accesses data cached in another
processor, it must not receive incorrect data. If it modifies data, all other processors that
access that data must receive the modified data.
• To allow predictable ordering of writes to memory—In some circumstances, it is important
that memory writes be observed externally in precisely the same order as programmed.
• To distribute interrupt handling among a group of processors—When several processors
are operating in a system in parallel, it is useful to have a centralized mechanism for
receiving interrupts and distributing them to available processors for servicing.
The Intel Architecture’s caching mechanism and cache consistency are discussed in Chapter 9,
Memory Cache Control. Bus and memory locking, serializing instructions, memory ordering,
and the processor’s internal APIC are discussed in the following sections.
Accesses to cacheable memory that are split across bus widths, cache lines, and page boundaries
are not guaranteed to be atomic by the Intel486™, Pentium®, or P6 family processors. The P6
family processors provide bus control signals that permit external memory subsystems to make
split accesses atomic; however, nonaligned data accesses will seriously impact the performance
of the processor and should be avoided where possible.
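Software can detect accesses that would be split before issuing them. The C sketch below is a hypothetical helper, not part of any Intel interface: it reports whether an access of `size` bytes starting at `address` crosses a multiple of `boundary`, which must be a power of two. The values 32 (the P6 family cache-line size) and 4096 (a 4-KByte page) in the usage note are examples; other processors may use different line sizes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when [address, address + size) spans more than one
 * boundary-sized block. boundary must be a nonzero power of two. */
bool crosses_boundary(uintptr_t address, size_t size, uintptr_t boundary)
{
    uintptr_t block_mask = ~(boundary - 1);

    if (size == 0)
        return false;
    return (address & block_mask) != ((address + size - 1) & block_mask);
}
```

For instance, a doubleword at 101EH crosses a 32-byte line boundary, and a doubleword at 0FFEH crosses a 4096-byte page boundary, so neither access is guaranteed to be atomic.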
— Use a locked operation to modify the access-rights byte to indicate that the segment
descriptor is valid and present.
Note that the Intel386™ processor always updates the accessed flag in the segment
descriptor, whether it is clear or not. The P6 family, Pentium®, and Intel486™ processors
only update this flag if it is not already set.
• When updating page-directory and page-table entries. When updating page-directory
and page-table entries, the processor uses locked cycles to set the accessed and dirty flag in
the page-directory and page-table entries.
• Acknowledging interrupts. After an interrupt request, an interrupt controller may use the
data bus to send the interrupt vector for the interrupt to the processor. The processor
follows the LOCK semantics during this time to ensure that no other data appears on the
data bus when the interrupt vector is being transmitted.
NOTE
The locked instructions for the current versions of the Intel486™, Pentium®,
and P6 family processors will allow data written to be fetched as instructions.
However, Intel recommends that developers who require the use of self-
modifying code use a different synchronizing mechanism, described in the
following sections.
(* OPTION 2 *)
Store modified code (as data) into code segment;
Execute a serializing instruction; (* For example, CPUID instruction *)
Execute new code;
(The use of one of these options is not required for programs intended to run on the Pentium® or
Intel486™ processors, but is recommended to ensure compatibility with the P6 family processors.)
It should be noted that self-modifying code will execute at a lower level of performance than
nonself-modifying or normal code. The degree of the performance deterioration will depend
upon the frequency of modification and specific characteristics of the code.
The act of one processor writing data into the currently executing code segment of a second
processor with the intent of having the second processor execute that data as code is called
cross-modifying code. As with self-modifying code, Intel Architecture processors exhibit
model-specific behavior when executing cross-modifying code, depending upon how far ahead
of the executing processor’s current execution pointer the code has been modified. To write
cross-modifying code and ensure that it is compliant with current and future Intel Architectures,
the following processor synchronization algorithm should be implemented.
; Action of Modifying Processor
Store modified code (as data) into code segment;
Memory_Flag ← 1;
where reads and writes are issued on the system bus in the order they occur in the instruction
stream under all circumstances.
To allow optimizing of instruction execution, the Intel Architecture allows departures from the
strong-ordering model, called processor ordering, in the P6 family processors. These
processor-ordering variations allow performance-enhancing operations such as allowing reads to go
ahead of writes by buffering writes. The goal of any of these variations is to increase instruction
execution speeds while maintaining memory coherency, even in multiple-processor systems.
The following sections describe the memory ordering models used by the Intel486™, Pentium®,
and P6 family processors.
6. Data from buffered writes can be forwarded to waiting reads within the processor.
7. Reads or writes cannot pass (be carried out ahead of) I/O instructions, locked instructions,
or serializing instructions.
The second rule allows a read to pass a write. However, if the write is to the same memory loca-
tion as the read, the processor’s internal “snooping” mechanism will detect the conflict and
update the already cached read before the processor executes the instruction that uses the value.
The sixth rule constitutes an exception to an otherwise write ordered model.
In a multiple-processor system, the following ordering rules apply:
• Individual processors use the same ordering rules as in a single-processor system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from the individual processors on the system bus are globally observed and are
NOT ordered with respect to each other.
The latter rule can be clarified by the example in Figure 7-1. Consider three processors in a
system and each processor performs three writes, one to each of three defined locations (A, B,
and C). Individually, the processors perform the writes in the same program order, but because
of bus arbitration and other memory access mechanisms, the order that the three processors write
the individual memory locations can differ each time the respective code sequences are executed
on the processors. The final values in location A, B, and C would possibly vary on each execu-
tion of the write sequence.
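The effect of the last rule can be illustrated with a small deterministic replay. The following is a minimal sketch, not part of the architecture definition: the function name replay_writes, the schedule array, and the convention that each processor writes its own number plus one are all assumptions made for illustration. Each processor issues its writes in program order (A, then B, then C), while the schedule stands in for the outcome of bus arbitration.

```c
#include <assert.h>
#include <stddef.h>

enum { LOC_A, LOC_B, LOC_C, NUM_LOCS };
enum { NUM_CPUS = 3 };

/* Replay one (hypothetical) bus arbitration outcome.
 * schedule[i] names the processor granted the bus at step i. Every
 * processor performs its own writes in program order (A, B, C) and
 * stores the value cpu + 1, so the final memory contents show which
 * processor wrote each location last. */
static inline void replay_writes(const int *schedule, size_t steps,
                                 int mem[NUM_LOCS])
{
    int next_write[NUM_CPUS] = {0, 0, 0};

    for (size_t i = 0; i < steps; i++) {
        int cpu = schedule[i];
        assert(cpu >= 0 && cpu < NUM_CPUS);
        assert(next_write[cpu] < NUM_LOCS);   /* at most three writes each */
        mem[next_write[cpu]] = cpu + 1;
        next_write[cpu]++;
    }
}
```

Replaying two different schedules over the same three programs leaves different final values in A, B, and C, even though no processor ever reorders its own writes.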
The processor-ordering model described in this section is virtually identical to that used by the
Pentium® and Intel486™ processors. The only enhancements in the P6 family processors are:
• Added support for speculative reads.
• Store-buffer forwarding, when a read passes a write to the same memory location.
• Out of order store from long string store and string move operations (refer to Section
7.2.3., “Out of Order Stores From String Operations in P6 Family Processors” below).
Range Registers (MTRRs)”, in Chapter 9, Memory Cache Control). MTRRs are available
only in the P6 family processors.
These mechanisms can be used as follows.
Memory mapped devices and other I/O devices on the bus are often sensitive to the order of
writes to their I/O buffers. I/O instructions (the IN and OUT instructions) can be used to impose
strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the
processor waits for all previous instructions in the program to complete and for all buffered
writes to drain to memory. Only instruction fetches and page table walks can pass I/O
instructions. Execution of subsequent instructions does not begin until the processor determines
that the I/O instruction has been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a strong
memory-ordering model. Here, a program can use a locking instruction such as the XCHG
instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is
carried out atomically. Locking operations typically operate like I/O operations in that they wait
for all previous instructions to complete and for all buffered writes to drain to memory (refer to
Section 7.1.2., “Bus Locking”).
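A lock built on an atomic exchange might look like the following. This is a minimal sketch in C11, assuming a compiler that provides <stdatomic.h>; the type and function names (xchg_lock_t, xchg_lock, and so on) are hypothetical. That the compiler emits an XCHG instruction with a memory operand for atomic_exchange on Intel Architecture targets is typical but is an assumption about code generation, not something the C language guarantees.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 = free, 1 = held. */
typedef struct { atomic_int locked; } xchg_lock_t;

static inline void xchg_lock_init(xchg_lock_t *l)
{
    atomic_init(&l->locked, 0);
}

/* One atomic read-modify-write: store 1 and return the previous value.
 * The lock was acquired only if the previous value was 0. */
static inline int xchg_trylock(xchg_lock_t *l)
{
    return atomic_exchange(&l->locked, 1) == 0;
}

static inline void xchg_lock(xchg_lock_t *l)
{
    while (!xchg_trylock(l)) {
        /* spin until the holder releases the lock */
    }
}

static inline void xchg_unlock(xchg_lock_t *l)
{
    assert(atomic_load(&l->locked) == 1);   /* must be held */
    atomic_store(&l->locked, 0);
}
```

A caller brackets its critical region with xchg_lock() and xchg_unlock(); xchg_trylock() is provided for code that prefers not to spin.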
Program synchronization can also be carried out with serializing instructions (refer to Section
7.4., “Serializing Instructions”). These instructions are typically used at critical procedure or
task boundaries to force completion of all previous instructions before a jump to a new section
of code or a context switch occurs. Like the I/O and locking instructions, the processor waits
until all previous instructions have been completed and all buffered writes have been drained to
memory before executing the serializing instruction.
The MTRRs were introduced in the P6 family processors to define the cache characteristics for
specified areas of physical memory. The following are two examples of how memory types set
up with MTRRs can be used to strengthen or weaken memory ordering for the P6 family
processors:
• The uncached (UC) memory type forces a strong-ordering model on memory accesses.
Here, all reads and writes to the UC memory region appear on the bus and out-of-order or
speculative accesses are not performed. This memory type can be applied to an address
range dedicated to memory mapped I/O devices to force strong memory ordering.
• For areas of memory where weak ordering is acceptable, the write back (WB) memory
type can be chosen. Here, reads can be performed speculatively and writes can be buffered
and combined. For this type of memory, cache locking is performed on atomic (locked)
operations that do not split across cache lines, which helps to reduce the performance
penalty associated with the use of the typical synchronization instructions, such as XCHG,
that lock the bus during the entire read-modify-write operation. With the WB memory
type, the XCHG instruction locks the cache instead of the bus if the memory access is
contained within a cache line.
It is recommended that software written to run on P6 family processors assume the processor-ordering
model or a weaker memory-ordering model. The P6 family processors do not implement
a strong memory-ordering model, except when using the UC memory type. Despite the
fact that P6 family processors support processor ordering, Intel does not guarantee that future
processors will support this model. To make software portable to future processors, it is
recommended that operating systems provide critical-region and resource-control constructs and APIs
(application program interfaces) based on I/O, locking, and/or serializing instructions, and that
these constructs be used to synchronize access to shared areas of memory in multiple-processor
systems. Also, software should not depend on processor ordering in situations where the system
hardware does not support this memory-ordering model.
[Figure omitted. Recoverable labels: APIC bus, I/O APIC, external interrupts, I/O chip set.]
Section 7.5.3., “APIC Bus” describes the existing arbitration protocols and bus message
formats, while Section 7.5.12., “Interprocessor and Self-Interrupts” describes the INIT level de-
assert message, used to resynchronize all local APICs’ arbitration IDs. Note that except for start-
up (refer to Section 7.5.11., “Local Vector Table”), all bus messages failing during delivery are
automatically retried. The software should avoid situations in which interrupt messages may be
“ignored” by disabled or nonexistent “target” local APICs, and messages are being resent
repeatedly.
NOTE
For P6 family processors, the APIC handles all memory accesses to addresses
within the 4-KByte APIC register space and no external bus cycles are
produced. For the Pentium® processors with an on-chip APIC, bus cycles are
produced for accesses to the 4-KByte APIC register space. Thus, for software
intended to run on Pentium® processors, system software should explicitly
not map the APIC register space to regular system memory. Doing so can
result in an invalid opcode exception (#UD) being generated or unpredictable
execution.
The 4-KByte APIC register address space should be mapped as uncacheable (UC); refer to
Section 9, “Memory Cache Control”, in Chapter 9, Memory Cache Control.
[Figure omitted: local APIC block diagram. Recoverable labels: version register; timer; LINT0/1
local interrupts 0,1; performance-monitoring counters; error; TMR, ISR, and IRR registers;
software-transparent registers; Vec[3:0] and TMR bit select; acceptance logic; priority logic;
APIC ID register; logical destination register; destination format register; destination mode
and vector; INIT, NMI, and SMI; APIC bus send/receive logic.]
Within the 4-KByte APIC register area, the register address allocation scheme is shown in Table
7-1. Register offsets are aligned on 128-bit boundaries. All registers must be accessed using 32-
bit loads and stores. Wider registers (64-bit or 256-bit) are defined and accessed as independent
multiple 32-bit registers. If a LOCK prefix is used with a MOV instruction that accesses the
APIC address space, the prefix is ignored; that is, a locking operation does not take place.
NOTES:
1. Introduced into the APIC Architecture in the Pentium® Pro processor.
2. Introduced into the APIC Architecture in the Pentium® processor.
The destination format register (DFR) defines the interpretation of the logical destination
information (refer to Figure 7-7). The DFR can be programmed for flat model or cluster model
interrupt delivery modes.
Broadcast to all local APICs is achieved by setting all destination bits to one. This guarantees a
match on all clusters, and selects all APICs in each cluster.
In the hierarchical cluster connection model, an arbitrary hierarchical network can be created by
connecting different flat clusters via independent APIC buses. This scheme requires a cluster
manager within each cluster, responsible for handling message passing between APIC buses.
One cluster contains up to 4 agents. Thus 15 cluster managers, each with 4 agents, can form a
network of up to 60 APIC agents. Note that hierarchical APIC networks require a special
cluster manager device, which is not part of the local or the I/O APIC units.
[Figure 7-8, the local vector table, omitted. Recoverable information:
Entries: Timer vector, LINT0 vector, LINT1 vector, ERROR vector, PCINT vector
Trigger mode: 0: edge, 1: level
LINT0 address: FEE0 0350H
LINT1 address: FEE0 0360H
ERROR address: FEE0 0370H
PCINT address: FEE0 0340H
Value after reset: 0001 0000H]
[Figure omitted: Interrupt Command Register (ICR). Recoverable information:
Bits 63 through 56: destination field; bits 55 through 32: reserved
Bits 7 through 0: vector
Destination mode: 0: physical, 1: logical
Delivery status: 0: idle, 1: send pending
Level: 0: de-assert, 1: assert
Trigger mode: 0: edge, 1: level
Address: FEE0 0310H
Value after reset: 0H]
All fields of the ICR are read-write by software with the exception of the delivery status field,
which is read-only. Writing to the 32-bit word that contains the interrupt vector causes the inter-
rupt message to be sent. The ICR consists of the following fields.
Vector The vector identifying the interrupt being sent. The local APIC
register addresses are summarized in Table 7-1.
Delivery Mode Specifies how the APICs listed in the destination field should act
upon reception of the interrupt. Note that all interprocessor interrupts
behave as edge triggered interrupts (except for INIT level de-assert
message) even if they are programmed as level triggered interrupts.
000 (Fixed) Deliver the interrupt to all processors listed in the
destination field according to the information pro-
vided in the ICR. The fixed interrupt is treated as
Table 7-2. Valid Combinations for the APIC Interrupt Command Register

Trigger  Destination          Delivery Mode                  Valid/            Destination
Mode     Mode                                                Invalid           Shorthand
Edge     Physical or Logical  Fixed, Lowest Priority, NMI,   Valid             Dest. Field
                              SMI, INIT, Start-Up
Level    Physical or Logical  Fixed, Lowest Priority, NMI    Note 1            Dest. Field
Level    Physical or Logical  INIT                           Note 2            Dest. Field
Level    x (note 4)           SMI, Start-Up                  Invalid (note 3)  x
Edge     x                    Fixed                          Valid             Self
Level    x                    Fixed                          Note 1            Self
x        x                    Lowest Priority, NMI, INIT,    Invalid (note 3)  Self
                              SMI, Start-Up
Edge     x                    Fixed                          Valid             All inc Self
Level    x                    Fixed                          Note 1            All inc Self
x        x                    Lowest Priority, NMI, INIT,    Invalid (note 3)  All inc Self
                              SMI, Start-Up
Edge     x                    Fixed, Lowest Priority, NMI,   Valid             All excl Self
                              INIT, SMI, Start-Up
Level    x                    Fixed, Lowest Priority, NMI    Note 1            All excl Self
Level    x                    SMI, Start-Up                  Invalid (note 3)  All excl Self
Level    x                    INIT                           Note 2            All excl Self

NOTES:
1. Valid. Treated as edge triggered if Level = 1 (assert), otherwise ignored.
2. Valid. Treated as edge triggered when Level = 1 (assert); when Level = 0 (deassert), treated as “INIT
Level Deassert” message. Only INIT level deassert messages are allowed to have level = deassert. For
all other messages the level must be “assert.”
3. Invalid. The behavior of the APIC is undefined.
4. x: Don’t care.
[Figure omitted: the IRR, ISR, and TMR registers, each 256 bits wide (bits 255 through 0), each
with a reserved field and a field boundary between bits 16 and 15.]
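[Flow chart omitted. The decisions recoverable from it are:
1. Wait to receive a bus message.
2. Does the message belong to this destination? If no, discard the message.
3. Is it NMI, SMI, INIT, or ExtINT? If yes, accept the message.
4. If the delivery mode is fixed: accept the message if an interrupt slot is available; otherwise
set the status to retry.
5. If the delivery mode is lowest priority: accept the message if this APIC is the focus.
Otherwise, if no interrupt slot is available, set the status to retry; if a slot is available,
arbitrate, and accept the message if this APIC is the winner.]
Figure 7-11. Interrupt Acceptance Flow Chart for the Local APIC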
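The decision sequence of Figure 7-11 can be sketched as a single function. This is an illustrative reading of the flow chart, not a description of the hardware implementation: the type names, the apic_state_t fields, and the MSG_NOT_WINNER outcome (used here for an APIC that arbitrates and loses) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DM_FIXED, DM_LOWEST_PRIORITY, DM_NMI, DM_SMI, DM_INIT, DM_EXTINT
} delivery_mode_t;

typedef enum {
    MSG_DISCARD, MSG_ACCEPT, MSG_RETRY, MSG_NOT_WINNER
} outcome_t;

typedef struct {
    bool is_destination;    /* message belongs to this APIC */
    bool slot_available;    /* an interrupt slot is free */
    bool is_focus;          /* this APIC is the focus processor */
    bool wins_arbitration;  /* result of lowest priority arbitration */
} apic_state_t;

static inline outcome_t accept_message(delivery_mode_t mode,
                                       const apic_state_t *s)
{
    assert(s != NULL);
    if (!s->is_destination)
        return MSG_DISCARD;

    switch (mode) {
    case DM_NMI: case DM_SMI: case DM_INIT: case DM_EXTINT:
        return MSG_ACCEPT;                      /* accepted unconditionally */
    case DM_FIXED:
        return s->slot_available ? MSG_ACCEPT : MSG_RETRY;
    case DM_LOWEST_PRIORITY:
        if (s->is_focus)
            return MSG_ACCEPT;
        if (!s->slot_available)
            return MSG_RETRY;
        return s->wins_arbitration ? MSG_ACCEPT : MSG_NOT_WINNER;
    }
    return MSG_DISCARD;                         /* unknown mode */
}
```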
[Figure omitted: Task Priority Register (TPR). Bits 31 through 8: reserved; bits 7 through 0:
task priority.
Address: FEE0 0080H
Value after reset: 0H]
The Task Priority is specified in the TPR. The 4 most-significant bits of the task priority corre-
spond to the 16 interrupt priorities, while the 4 least-significant bits correspond to the sub-class
priority. The TPR value is generally denoted as x:y, where x is the main priority and y provides
more precision within a given priority class. When the x-value of the TPR is 15, the APIC will
not accept any interrupts.
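The x:y notation maps directly onto the two nibbles of the task priority field. The following is a minimal sketch with hypothetical helper names; it covers only the split of the field and the x = 15 case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a TPR value from the main priority x and the sub-class y. */
static inline uint8_t tpr_make(unsigned x, unsigned y)
{
    assert(x < 16 && y < 16);
    return (uint8_t)((x << 4) | y);
}

/* The 4 most-significant bits: the main priority (x). */
static inline unsigned tpr_class(uint8_t tpr)    { return tpr >> 4; }

/* The 4 least-significant bits: the sub-class priority (y). */
static inline unsigned tpr_subclass(uint8_t tpr) { return tpr & 0x0Fu; }

/* With x equal to 15 the APIC accepts no interrupts. */
static inline bool tpr_blocks_all(uint8_t tpr)
{
    return tpr_class(tpr) == 15;
}
```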
Upon receiving end-of-interrupt, the APIC clears the highest priority bit in the ISR and selects
the next highest priority interrupt for posting to the CPU. If the terminated interrupt was a level-
triggered interrupt, the local APIC sends an end-of-interrupt message to all I/O APICs. Note that
EOI command is supplied for the above two interrupt delivery modes regardless of the interrupt
source (that is, as a result of either the I/O APIC interrupts or those issued on local pins or using
the ICR). For future compatibility, the software is requested to issue the end-of-interrupt
command by writing a value of 0H into the EOI register.
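The effect of an end-of-interrupt on the ISR can be modeled as follows. This sketch is a software model only, with hypothetical names; it assumes that a higher vector number means a higher priority, and it does not model the EOI message sent to the I/O APICs for level-triggered interrupts.

```c
#include <assert.h>
#include <stdint.h>

/* The 256-bit ISR modeled as eight 32-bit words, one bit per vector. */
typedef struct { uint32_t isr[8]; } apic_isr_t;

/* Mark a vector as in service. */
static inline void isr_set(apic_isr_t *a, unsigned vector)
{
    assert(vector >= 16 && vector <= 255);
    a->isr[vector / 32] |= 1u << (vector % 32);
}

/* Model of an EOI: clear the highest priority in-service bit.
 * Returns the vector that was cleared, or -1 if the ISR was empty. */
static inline int apic_eoi(apic_isr_t *a)
{
    for (int word = 7; word >= 0; word--) {
        if (a->isr[word] == 0)
            continue;
        for (int bit = 31; bit >= 0; bit--) {
            if (a->isr[word] & (1u << bit)) {
                a->isr[word] &= ~(1u << bit);
                return word * 32 + bit;
            }
        }
    }
    return -1;
}
```

Nested interrupts are retired in priority order: with vectors 31H and 82H both in service, the first EOI clears 82H and the second clears 31H.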
• Pending interrupts in the IRR and ISR registers are held and require masking or handling
by the CPU.
• A disabled local APIC does not affect the sending of APIC messages. It is software’s
responsibility to avoid issuing ICR commands if no sending of interrupts is desired.
• Disabling a local APIC does not affect the message in progress. The local APIC will
complete the reception/transmission of the current message and then enter the disabled
state.
• A disabled local APIC automatically sets all mask bits in the LVT entries. Trying to reset
these bits in the local vector table will be ignored.
• A software-disabled local APIC listens to all bus messages in order to keep its arbitration
ID synchronized with the rest of the system, in the event that it is re-enabled.
For the Pentium® processor, the local APIC is enabled and disabled through a hardware mecha-
nism. (Refer to the Pentium® Processor Data Book for a description of this mechanism.)
[Figure omitted: register layout with fields at bits 31 through 10 (reserved), 9, 8, 7 through 4,
and 3 through 0 (1111).]
Spurious Vector    Released during an INTA cycle when all pending interrupts are
                   masked or when no interrupt is pending. Bits 4 through 7 of this
                   field are programmable by software, and bits 0 through 3 are
                   hardwired to logical ones. Software writes to bits 0 through 3 have
                   no effect.
APIC Enable        Allows software to enable (1) or disable (0) the local APIC. To
                   bypass APIC completely, use the APIC_BASE_MSR in Figure 7-4.
Focus Processor    Determines if focus processor checking is enabled during the lowest
Checking           priority delivery: (0) enabled and (1) disabled.
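A value for this register can be composed as shown below. This is a sketch under stated assumptions: the function and macro names are hypothetical, and the placement of APIC Enable at bit 8 and Focus Processor Checking at bit 9 is taken from the bit layout of the register rather than from the field descriptions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SVR_APIC_ENABLE   (1u << 8)  /* assumed bit position */
#define SVR_FOCUS_DISABLE (1u << 9)  /* assumed bit position; 1 = disabled */

/* Compose the register value as the hardware would report it. Only
 * bits 4 through 7 of the vector are programmable; bits 0 through 3
 * always read as ones. */
static inline uint32_t svr_value(uint8_t vector, bool apic_enable,
                                 bool focus_checking)
{
    uint32_t value = (vector & 0xF0u) | 0x0Fu;

    if (apic_enable)
        value |= SVR_APIC_ENABLE;
    if (!focus_checking)
        value |= SVR_FOCUS_DISABLE;

    assert((value & 0x0Fu) == 0x0Fu);
    return value;
}
```

Because the low four bits are hardwired, a requested vector of 20H yields an effective spurious vector of 2FH.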
If the APICs are set up to use “lowest priority” arbitration (refer to Section 7.5.10., “Interrupt
Distribution Mechanisms”) and multiple APICs are currently executing at the lowest priority
(the value in the APR register), the arbitration priorities (unique values in the Arb ID register)
are used to break ties. All 8 bits of the APR are used for the lowest priority arbitration.
If the physical delivery mode is being used, then cycles 15 and 16 represent the APIC ID and
cycles 13 and 14 are considered don’t care by the receiver. If the logical delivery mode is being
used, then cycles 13 through 16 are the 8-bit logical destination field. For shorthands of “all-
incl-self” and “all-excl-self,” the physical delivery mode and an arbitration priority of 15
(D0:D3 = 1111) are used. The agent sending the message is the only one required to distinguish
between the two cases. It does so using internal information.
When using lowest priority delivery with an existing focus processor, the focus processor iden-
tifies itself by driving 10 during cycle 19 and accepts the interrupt. This is an indication to other
APICs to terminate arbitration. If the focus processor has not been found, the short message is
extended on-the-fly to the non-focused lowest-priority message. Note that except for the EOI
message, messages generating a checksum or an acceptance error (refer to Section 7.5.17.,
“Error Handling”) terminate after cycle 21.
Nonfocused Lowest Priority Message. These 34-cycle messages (refer to Table 7-5) are used
in the lowest priority delivery mode when a focus processor is not present. Cycles 1 through 20
are the same as for the short message. If during the status cycle (cycle 19) the state of the (A:A) flags
is 10B, a focus processor has been identified, and the short message format is used (refer to
Table 7-4). If the (A:A) flags are set to 00B, lowest priority arbitration is started and the 34
cycles of the nonfocused lowest priority message are completed. For other combinations of status
flags, refer to Section 7.5.16.2., “APIC Bus Status Cycles”.
Cycles 21 through 28 are used to arbitrate for the lowest priority processor. The processors
participating in the arbitration drive their inverted processor priority on the bus. Only the local
APICs having free interrupt slots participate in the lowest priority arbitration. If no such APIC
exists, the message will be rejected, requiring it to be tried at a later time.
Cycles 29 through 32 are also used for arbitration in case two or more processors have the same
lowest priority. In the lowest priority delivery mode, all combinations of errors in cycle 33 (A2
A2) will set the “accept error” bit in the error status register (refer to Figure 7-16). Arbitration
priority update is performed in cycle 20, and is not affected by errors detected in cycle 33. Only
the local APIC that wins in the lowest priority arbitration drives cycle 33. An error in cycle 33
will force the sender to resend the message.
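The outcome of this arbitration can be modeled in software as a simple selection. The sketch below is not the bus protocol itself: it ignores the cycle-by-cycle signaling, the names are hypothetical, and it assumes that among agents with equal processor priority the one with the higher arbitration ID wins the tie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t processor_priority;  /* value compared in cycles 21 through 28 */
    uint8_t arb_id;              /* unique; used in cycles 29 through 32 */
    bool    free_slot;           /* only these agents participate */
} apic_agent_t;

/* Returns the index of the winning agent, or -1 if no agent has a free
 * interrupt slot (the message is rejected and must be retried). */
static inline int lowest_priority_winner(const apic_agent_t *agents,
                                         size_t count)
{
    int winner = -1;

    assert(agents != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const apic_agent_t *a = &agents[i];
        if (!a->free_slot)
            continue;
        if (winner < 0 ||
            a->processor_priority < agents[winner].processor_priority ||
            (a->processor_priority == agents[winner].processor_priority &&
             a->arb_id > agents[winner].arb_id)) {   /* assumed tie-break */
            winner = (int)i;
        }
    }
    return winner;
}
```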
00: CS_OK, NoFocus   11: Do Lowest       0X: Error   Yes, 20   34 Cycle   Yes
00: CS_OK, NoFocus   10: End and Retry   XX:         Yes, 20   34 Cycle   Yes
00: CS_OK, NoFocus   0X: Error           XX:         No        34 Cycle   Yes
10: CS_OK, Focus     XX:                 XX:         Yes, 20   34 Cycle   No
11: CS_Error         XX:                 XX:         No        21 Cycle   Yes
01: Error            XX:                 XX:         No        21 Cycle   Yes
Send CS Error Set when the local APIC detects a check sum error for a message
that was sent by it.
Receive CS Error Set when the local APIC detects a check sum error for a message
that was received by it.
Send Accept Error Set when the local APIC detects that a message it sent was not
accepted by any APIC on the bus.
Receive Accept Error Set when the local APIC detects that the message it received was not
accepted by any APIC on the bus, including itself.
Send Illegal Vector Set when the local APIC detects an illegal vector in the message that
it is sending on the bus.
Receive Illegal Vector Set when the local APIC detects an illegal vector in the message it
received, including an illegal vector code in the local vector table
interrupts and self-interrupts from ICR.
Illegal Reg. Address (P6 Family Processors Only)
                      Set when the processor is trying to access a register that is not
                      implemented in the P6 family processors’ local APIC register
                      address space; that is, within FEE00000H (the APICBase MSR)
                      through FEE003FFH (the APICBase MSR plus 4K Bytes).
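An error handler typically turns the contents of the error status register into a list of conditions. The sketch below shows one way to do that. The names are hypothetical, and the bit assignments (bits 0 through 3 and 5 through 7, in the order the fields are listed above, with bit 4 reserved) are an assumption that should be checked against the register figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t mask; const char *name; } esr_flag_t;

/* Assumed bit positions; bit 4 is taken to be reserved. */
static const esr_flag_t ESR_FLAGS[] = {
    { 1u << 0, "Send CS Error" },
    { 1u << 1, "Receive CS Error" },
    { 1u << 2, "Send Accept Error" },
    { 1u << 3, "Receive Accept Error" },
    { 1u << 5, "Send Illegal Vector" },
    { 1u << 6, "Receive Illegal Vector" },
    { 1u << 7, "Illegal Reg. Address" },
};

#define ESR_FLAG_COUNT (sizeof ESR_FLAGS / sizeof ESR_FLAGS[0])

/* Store the names of all flags set in 'esr' into out[] (which must have
 * room for ESR_FLAG_COUNT entries) and return how many were stored. */
static inline size_t esr_decode(uint32_t esr, const char **out)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < ESR_FLAG_COUNT; i++) {
        if (esr & ESR_FLAGS[i].mask)
            out[n++] = ESR_FLAGS[i].name;
    }
    return n;
}
```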
7.5.18. Timer
The local APIC unit contains a 32-bit programmable timer for use by the local processor. This
timer is configured through the timer register in the local vector table (refer to Figure 7-8). The
time base is derived from the processor’s bus clock, divided by a value specified in the divide
configuration register (refer to Figure 7-17). After reset, the timer is initialized to zero. The timer
supports one-shot and periodic modes. The timer can be configured to interrupt the local
processor with an arbitrary vector.
[Figure 7-17, the divide configuration register, omitted. Recoverable information: bits 31
through 4 are reserved.]
The timer is started by programming its initial-count register (refer to Figure 7-18). The initial
count value is copied into the current-count register and count-down is begun. After the timer
reaches zero in one-shot mode, an interrupt is generated and the timer remains at its 0 value until
reprogrammed. In periodic mode, the current-count register is automatically reloaded from the
initial-count register when the count reaches 0 and the count-down is repeated. If during the
count-down process the initial-count register is set, the counting will restart and the new value
will be used. The initial-count register is read-write by software, while the current-count register
is read only.
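The count-down behavior described above can be modeled in a few lines. This is a software model for illustration only; the structure and function names are hypothetical, each call to timer_tick stands for one divided bus clock, and the sketch assumes that periodic mode raises an interrupt on each reload in the same way that one-shot mode does when it reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t initial_count;  /* read-write by software */
    uint32_t current_count;  /* read-only to software */
    bool     periodic;       /* false: one-shot mode */
    unsigned interrupts;     /* interrupts generated so far */
} apic_timer_t;

/* Writing the initial-count register (re)starts the count-down. */
static inline void timer_set_initial(apic_timer_t *t, uint32_t count)
{
    assert(t != NULL);
    t->initial_count = count;
    t->current_count = count;
}

/* Advance the timer by one divided bus clock. */
static inline void timer_tick(apic_timer_t *t)
{
    assert(t != NULL);
    if (t->current_count == 0)
        return;                         /* stopped until reprogrammed */
    if (--t->current_count == 0) {
        t->interrupts++;
        if (t->periodic)
            t->current_count = t->initial_count;   /* automatic reload */
    }
}
```

In one-shot mode the model generates a single interrupt and stays at zero; in periodic mode it generates one interrupt every initial_count ticks.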
[Figure 7-18 omitted: the initial-count and current-count registers, each occupying bits 31
through 0.]
Appendix E, Programming the LINT0 and LINT1 Inputs describes (with code) how to program
the LINT[0:1] pins of the processor’s local APICs after a dual-processor configuration has been
completed.
IPI), then the MP protocol is not re-executed. Instead, each processor examines its BSP
flag to determine whether the processor should boot or wait for a STARTUP IPI.
Table 7-8 describes the various fields of each boot phase IPI.
NOTE:
* For all P6 family processors.
For BIPI and FIPI messages, the lower 4 bits of the vector field are equal to the APIC ID of the
processor issuing the message. The upper 4 bits of the vector field of a BIPI or FIPI can be
thought of as the “generation ID” of the message. All processors that run symmetric to a P6
family processor will have a generation ID of 0100B or 4H. BIPIs in a system based on the P6
family processors will therefore use vector values ranging from 40H to 4EH (4FH cannot be
used because FH is not a valid APIC ID).
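The vector encoding just described reduces to packing two nibbles. The following is a minimal sketch with hypothetical names; it covers only the P6 family generation ID of 4H given above.

```c
#include <assert.h>
#include <stdint.h>

#define P6_GENERATION_ID 0x4u   /* 0100B */

/* Vector used by a processor when it issues its BIPI. */
static inline uint8_t bipi_vector(uint8_t apic_id)
{
    assert(apic_id < 0xF);      /* FH is not a valid APIC ID */
    return (uint8_t)((P6_GENERATION_ID << 4) | apic_id);
}

/* Upper 4 bits: the generation ID of the message. */
static inline uint8_t ipi_generation(uint8_t vector)
{
    return (uint8_t)(vector >> 4);
}

/* Lower 4 bits: the APIC ID of the processor that issued the message. */
static inline uint8_t ipi_sender(uint8_t vector)
{
    return (uint8_t)(vector & 0x0Fu);
}
```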
observes the BNR# (block next request) pin to guarantee that the initial BIPI is not issued
on the APIC bus until the BIST sequence is completed for all processors in the system.
2. When the first BIPI completes (at time t=1), the APIC hardware (in each processor)
propagates an interrupt to the processor core to indicate the arrival of the BIPI.
3. The processor compares the four least significant bits of the BIPI’s vector field to the
processor's APIC ID. A match indicates that the processor should be the BSP and continue
the initialization sequence. If the APIC ID fails to match the BIPI’s vector field, the
processor is essentially the “loser” or not the BSP. The processor then becomes an
application processor and should enter a “wait for SIPI” loop.
4. The winner (the BSP) issues an FIPI. The FIPI is issued to “all including self” and is
guaranteed to be the last IPI on the APIC bus during the initialization sequence. This is due
to the fact that the round-robin priority mechanism forces the winning APIC agent's (the
BSP’s) arbitration priority to 0. The FIPI is therefore issued by a priority 0 agent and has to
wait until all other agents have issued their BIPIs. When the BSP receives the FIPI that it
issued (t=5), it will start fetching code at the reset vector (Intel Architecture address).
[Figure omitted: APIC bus timeline from t=0 through t=5.]
5. All application processors (non-BSP processors) remain in a “halted” state and can only be
woken up by SIPIs issued by another processor (note that an AP in the startup IPI loop will also
respond to BINIT and snoops).
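The comparison made in step 3 can be sketched as follows. The names are hypothetical and the code models only the decision each processor makes when the first BIPI completes; it does not model the APIC bus arbitration that determines which BIPI completes first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ROLE_BSP, ROLE_AP } mp_role_t;

/* Decision taken by one processor when the first BIPI completes. */
static inline mp_role_t role_after_first_bipi(uint8_t own_apic_id,
                                              uint8_t first_bipi_vector)
{
    assert(own_apic_id < 0xF);
    return ((first_bipi_vector & 0x0Fu) == own_apic_id) ? ROLE_BSP
                                                        : ROLE_AP;
}

/* Count how many processors in the system would become the BSP. With
 * unique APIC IDs, exactly one processor matches the sender of the
 * first BIPI; all others become application processors. */
static inline size_t count_bsps(const uint8_t *apic_ids, size_t count,
                                uint8_t first_bipi_vector)
{
    size_t bsps = 0;

    for (size_t i = 0; i < count; i++) {
        if (role_after_first_bipi(apic_ids[i], first_bipi_vector) == ROLE_BSP)
            bsps++;
    }
    return bsps;
}
```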
CHAPTER 8
PROCESSOR MANAGEMENT AND INITIALIZATION
This chapter describes the facilities provided for managing processor-wide functions and for
initializing the processor. The subjects covered include: processor initialization, FPU
initialization, processor configuration, feature determination, mode switching, the MSRs (in the
Pentium® and P6 family processors), and the MTRRs (in the P6 family processors).
At this point, for MP (or DP) systems, the BSP (or primary) processor wakes up each AP (or
secondary) processor to enable those processors to execute self-configuration code.
When all processors are initialized, configured, and synchronized, the BSP or primary processor
begins executing an initial operating-system or executive task.
The floating-point unit (FPU) is also initialized to a known state during hardware reset. FPU
software initialization code can then be executed to perform operations such as setting the preci-
sion of the FPU and the exception masks. No special initialization of the FPU is required to
switch operating modes.
Asserting the INIT# pin on the processor invokes a similar response to a hardware reset. The
major difference is that during an INIT, the internal caches, MSRs, MTRRs, and FPU state are
left unchanged (although the TLBs and BTB are invalidated as with a hardware reset). An INIT
provides a method for switching from protected to real-address mode while maintaining the
contents of the internal caches.
NOTES:
1. The 10 most-significant bits of the EFLAGS register are undefined following a reset. Software should not
depend on the states of any of these bits.
2. The CD and NW flags are unchanged, bit 4 is set to 1, all other bits are cleared.
3. If Built-In Self-Test (BIST) is invoked on power up or reset, EAX is 0 only if all tests passed. (BIST cannot
be invoked during an INIT.)
4. The FPU state and MMX™ registers are not changed by the execution of an INIT.
5. Available in the Pentium® III processor and Pentium® III Xeon™ processor only. The state of the SIMD
floating-point registers is not changed by the execution of an INIT.
[Figure omitted: contents of CR0 after reset. Recoverable annotations: paging disabled: 0;
caching disabled: 1; not write-through disabled: 1; one further bit shown as 1.]
[Figure omitted: EDX register layout after reset.
Bits 31 through 14: reserved
Bits 13 and 12: processor type
Bits 11 through 8: family (0110B for the Pentium® Pro processor family)
Bits 7 through 4: model (beginning with 0001B)
Bits 3 through 0: stepping ID]
Figure 8-2. Processor Type and Signature in the EDX Register after Reset
The stepping ID field contains a unique identifier for the processor’s stepping ID or revision
level. The upper word of EDX is reserved following reset.
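Initialization code that saves EDX at reset can split the signature into its fields as sketched below. The structure and function names are hypothetical; the bit positions follow the layout of Figure 8-2 (stepping ID in bits 3 through 0, model in bits 7 through 4, family in bits 11 through 8, and processor type in bits 13 and 12).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned type;      /* bits 13:12 */
    unsigned family;    /* bits 11:8  */
    unsigned model;     /* bits 7:4   */
    unsigned stepping;  /* bits 3:0   */
} cpu_signature_t;

/* Split the value found in EDX after reset into its fields. */
static inline cpu_signature_t decode_signature(uint32_t edx)
{
    cpu_signature_t s;

    s.stepping = edx & 0xFu;
    s.model    = (edx >> 4) & 0xFu;
    s.family   = (edx >> 8) & 0xFu;
    s.type     = (edx >> 12) & 0x3u;
    return s;
}

/* Inverse operation, useful for building test values. */
static inline uint32_t encode_signature(cpu_signature_t s)
{
    assert(s.type < 4 && s.family < 16 && s.model < 16 && s.stepping < 16);
    return (s.type << 12) | (s.family << 8) | (s.model << 4) | s.stepping;
}
```

For example, a value of 0619H in the lower word decodes to family 6 (0110B), model 1, stepping ID 9.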
ized. Initialization code can test for the type of processor present before setting or clearing these
flags.
NOTE:
* The setting of the NE flag depends on the operating system being used.
The EM flag determines whether floating-point instructions are executed by the FPU (EM is
cleared) or generate a device-not-available exception (#NM) so that an exception handler can
emulate the floating-point operation (EM = 1). Ordinarily, the EM flag is cleared when an FPU
or math coprocessor is present and set if they are not present. If the EM flag is set and no FPU,
math coprocessor, or floating-point emulator is present, the system will hang when a floating-
point instruction is executed.
The MP flag determines whether WAIT/FWAIT instructions react to the setting of the TS flag.
If the MP flag is clear, WAIT/FWAIT instructions ignore the setting of the TS flag; if the MP
flag is set, they will generate a device-not-available exception (#NM) if the TS flag is set. Gener-
ally, the MP flag should be set for processors with an integrated FPU and clear for processors
without an integrated FPU and without a math coprocessor present. However, an operating
system can choose to save the floating-point context at every context switch, in which case there
would be no need to set the MP bit.
Table 2-1 in Chapter 2, System Architecture Overview shows the actions taken for floating-point
and WAIT/FWAIT instructions based on the settings of the EM, MP, and TS flags.
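The interaction of the three flags can be summarized in two predicates. This is a sketch of the behavior described above, with hypothetical function names; the rule that a floating-point instruction also generates the exception when TS is set and EM is clear is taken to be what Table 2-1 specifies and is not restated in this section.

```c
#include <stdbool.h>

/* Does a floating-point instruction generate a device-not-available
 * exception (#NM)? It does when emulation is requested (EM = 1) or
 * when a task switch has occurred since the FPU context was last
 * handled (TS = 1, assumed from Table 2-1). */
static inline bool fp_instruction_raises_nm(bool em, bool ts)
{
    return em || ts;
}

/* Does a WAIT/FWAIT instruction generate #NM? Only when MP is set and
 * TS is set; with MP clear the TS flag is ignored. */
static inline bool wait_raises_nm(bool mp, bool ts)
{
    return mp && ts;
}
```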
The NE flag determines whether unmasked floating-point exceptions are handled by generating
a floating-point error exception internally (NE is set, native mode) or through an external inter-
rupt (NE is cleared). In systems where an external interrupt controller is used to invoke numeric
exception handlers (such as MS-DOS-based systems), the NE bit should be cleared.
Regardless of the value of the EM bit, the Intel486™ SX processor generates a device-not-avail-
able exception (#NM) upon encountering any floating-point instruction.
supported on future Intel Architecture processors and/or to have the same functions. The MSRs
are provided to control a variety of hardware- and software-related features, including:
• The performance-monitoring counters (refer to Section 15.6., “Performance-Monitoring
Counters”, in Chapter 15, Debugging and Performance Monitoring).
• (P6 family processors only.) Debug extensions (refer to Section 15.4., “Last Branch,
Interrupt, and Exception Recording”, in Chapter 15, Debugging and Performance
Monitoring).
• (P6 family processors only.) The machine-check exception capability and its accompa-
nying machine-check architecture (refer to Chapter 13, Machine-Check Architecture).
• (P6 family processors only.) The MTRRs (refer to Section 9.12., “Memory Type Range
Registers (MTRRs)”, in Chapter 9, Memory Cache Control).
The MSRs can be read and written to using the RDMSR and WRMSR instructions, respectively.
When performing software initialization of a Pentium® Pro or Pentium® processor, many of the
MSRs will need to be initialized to set up things like performance-monitoring events, run-time
machine checks, and memory types for physical memory.
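RDMSR and WRMSR select the register with ECX and transfer the 64-bit MSR value through the EDX:EAX register pair (high doubleword in EDX, low doubleword in EAX). The sketch below only illustrates that split and join in C; the type and function names are hypothetical, and the privileged instructions themselves are not executed.

```c
#include <stdint.h>

/* Hypothetical container for the two register halves of an MSR value. */
struct msr_halves {
    uint32_t eax;  /* low 32 bits  */
    uint32_t edx;  /* high 32 bits */
};

/* Split a 64-bit MSR image into the EDX:EAX pair expected by WRMSR. */
struct msr_halves msr_split(uint64_t value)
{
    struct msr_halves h;
    h.eax = (uint32_t)(value & 0xFFFFFFFFu);
    h.edx = (uint32_t)(value >> 32);
    return h;
}

/* Join the EDX:EAX pair returned by RDMSR into one 64-bit value. */
uint64_t msr_join(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}
```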
Systems configured to implement FRC mode must write all of the processors’ internal MSRs to
deterministic values before performing either a read or read-modify-write operation using these
registers. The following is a list of MSRs that are not initialized by the processors’ reset
sequences.
• All fixed and variable MTRRs.
• All Machine Check Architecture (MCA) status registers.
• Microcode update signature register.
• All L2 cache initialization MSRs.
The list of available performance-monitoring counters for the Pentium® Pro and Pentium®
processors is given in Appendix A, Performance-Monitoring Events, and the list of available
MSRs for the Pentium® Pro processor is given in Appendix B, Model-Specific Registers. The
references earlier in this section show where the functions of the various groups of MSRs are
described in this manual.
“Memory Type Range Registers (MTRRs)”, in Chapter 9, Memory Cache Control, for detailed
information on the MTRRs.
Here are two examples of how NMIs can be handled during the initial stages of processor
initialization:
• A simple IDT and NMI interrupt handler can be provided in EPROM. This allows an NMI
interrupt to be handled immediately after reset initialization.
• The system hardware can provide a mechanism to enable and disable NMIs by passing the
NMI# signal through an AND gate controlled by a flag in an I/O port. Hardware can clear
the flag when the processor is reset, and software can set the flag when it is ready to handle
NMI interrupts.
• Software must load at least one page directory and one page table into physical memory.
The page table can be eliminated if the page directory contains a directory entry pointing to
itself (here, the page directory and page table reside in the same page), or if only 4-MByte
pages are used.
• Control register CR3 (also called the PDBR register) is loaded with the physical base
address of the page directory.
• (Optional) Software may provide one set of code and data descriptors in the GDT or in an
LDT for supervisor mode and another set for user mode.
With this paging initialization complete, paging is enabled and the processor is switched to
protected mode at the same time by loading control register CR0 with an image in which the PG
and PE flags are set. (Paging cannot be enabled before the processor is switched to protected
mode.)
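The register images involved in this step can be sketched as follows. This is an illustration in C under stated assumptions, not startup code: the function names are hypothetical, 4-KByte alignment of the page directory is assumed, and the directory entry sets only the present and read/write bits (bits 0 and 1). PE is bit 0 and PG is bit 31 of CR0.

```c
#include <stdint.h>

#define CR0_PE      (1u << 0)    /* protection enable */
#define CR0_PG      (1u << 31)   /* paging enable     */
#define PDE_PRESENT (1u << 0)
#define PDE_RW      (1u << 1)
#define PAGE_MASK   0xFFFFF000u  /* upper 20 bits: 4-KByte aligned base */

/* CR0 image with PE and PG set together, other flags unchanged. */
uint32_t cr0_with_paging(uint32_t cr0)
{
    return cr0 | CR0_PE | CR0_PG;
}

/* CR3 (PDBR) image: physical base of the page directory, PWT/PCD clear. */
uint32_t cr3_from_page_directory(uint32_t pd_phys)
{
    return pd_phys & PAGE_MASK;
}

/* Directory entry that points back at the page directory itself, so the
 * same page serves as both page directory and page table. */
uint32_t self_map_pde(uint32_t pd_phys)
{
    return (pd_phys & PAGE_MASK) | PDE_PRESENT | PDE_RW;
}
```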
— Perform a JMP or CALL instruction to a new task, which automatically resets the
values of the segment registers and branches to a new code segment.
8. Execute the LIDT instruction to load the IDTR register with the address and limit of the
protected-mode IDT.
9. Execute the STI instruction to enable maskable hardware interrupts and perform the
necessary hardware operation to enable NMI interrupts.
Random failures can occur if other instructions exist between steps 3 and 4 above. Failures will
be readily seen in some situations, such as when instructions that reference memory are inserted
between steps 3 and 4 while in System Management mode.
reloaded, execution continues using the descriptor attributes loaded during protected
mode.
5. Execute an LIDT instruction to point to a real-address mode interrupt table that is within
the 1-MByte real-address mode address range.
6. Clear the PE flag in the CR0 register to switch to real-address mode.
7. Execute a far JMP instruction to jump to a real-address mode program. This operation
flushes the instruction queue and loads the appropriate base and access rights values in the
CS register.
8. Load the SS, DS, ES, FS, and GS registers as needed by the real-address mode code. If any
of the registers are not going to be used in real-address mode, write 0s to them.
9. Execute the STI instruction to enable maskable hardware interrupts and perform the
necessary hardware operation to enable NMI interrupts.
NOTE
All the code that is executed in steps 1 through 9 must be in a single page and
the linear addresses in that page must be identity mapped to physical
addresses.
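[Figure: memory map after reset. The 64K EPROM occupies the top of the address space, ending
at FFFF FFFFH; after reset, CS.BASE + EIP points to FFFF FFF0H.]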
LINE SOURCE
1 NAME STARTUP
2
3 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
4 ;
5 ; ASSUMPTIONS:
6 ;
7 ; 1. The bottom 64K of memory is ram, and can be used for
8 ; scratch space by this module.
9 ;
10 ; 2. The system has sufficient free usable ram to copy the
11 ; initial GDT, IDT, and TSS
12 ;
13 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
14
15 ; configuration data - must match with build definition
16
17 CS_BASE EQU 0FFFF0000H
18
19 ; CS_BASE is the linear address of the segment STARTUP_CODE
20 ; - this is specified in the build language file
21
22 RAM_START EQU 400H
23
24 ; RAM_START is the start of free, usable ram in the linear
25 ; memory space. The GDT, IDT, and initial TSS will be
26 ; copied above this space, and a small data segment will be
27 ; discarded at this linear address. The 32-bit word at
28 ; RAM_START will contain the linear address of the first
29 ; free byte above the copied tables - this may be useful if
30 ; a memory manager is used.
31
32 TSS_INDEX EQU 10
33
34 ; TSS_INDEX is the index of the TSS of the first task to
35 ; run after startup
36
37
38 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
39
40 ; ------------------------- STRUCTURES and EQU ---------------
41 ; structures for system data
42
43 ; TSS structure
44 TASK_STATE STRUC
45 link DW ?
46 link_h DW ?
47 ESP0 DD ?
48 SS0 DW ?
49 SS0_h DW ?
50 ESP1 DD ?
51 SS1 DW ?
52 SS1_h DW ?
53 ESP2 DD ?
54 SS2 DW ?
55 SS2_h DW ?
56 CR3_reg DD ?
57 EIP_reg DD ?
58 EFLAGS_reg DD ?
59 EAX_reg DD ?
60 ECX_reg DD ?
61 EDX_reg DD ?
62 EBX_reg DD ?
63 ESP_reg DD ?
64 EBP_reg DD ?
65 ESI_reg DD ?
66 EDI_reg DD ?
67 ES_reg DW ?
68 ES_h DW ?
69 CS_reg DW ?
70 CS_h DW ?
71 SS_reg DW ?
72 SS_h DW ?
73 DS_reg DW ?
74 DS_h DW ?
75 FS_reg DW ?
76 FS_h DW ?
77 GS_reg DW ?
78 GS_h DW ?
79 LDT_reg DW ?
80 LDT_h DW ?
81 TRAP_reg DW ?
82 IO_map_base DW ?
83 TASK_STATE ENDS
84
85 ; basic structure of a descriptor
86 DESC STRUC
87 lim_0_15 DW ?
88 bas_0_15 DW ?
89 bas_16_23 DB ?
90 access DB ?
91 gran DB ?
92 bas_24_31 DB ?
93 DESC ENDS
94
95 ; structure for use with LGDT and LIDT instructions
96 TABLE_REG STRUC
97 table_lim DW ?
98 table_linear DD ?
99 TABLE_REG ENDS
100
101 ; offset of GDT and IDT descriptors in builder generated GDT
102 GDT_DESC_OFF EQU 1*SIZE(DESC)
103 IDT_DESC_OFF EQU 2*SIZE(DESC)
104
105 ; equates for building temporary GDT in RAM
Figure 8-4. Constructing Temporary GDT and Switching to Protected Mode (Lines
162-172 of List File)
[Figure labels: FFFF FFFFH, Base, Limit, GDT_SCRATCH.]
Figure 8-5. Moving the GDT, IDT and TSS from ROM to RAM (Lines 196-261 of List File)
[Figure labels: the TSS, IDT, and GDT below FFFF FFFFH; TSS RAM, IDT RAM, and GDT RAM above
RAM_START.]
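[Figure: task switching. Sequence shown in the figure: SS = TSS.SS; ESP = TSS.ESP;
PUSH TSS.EFLAG; PUSH TSS.CS; PUSH TSS.EIP; ES = TSS.ES; DS = TSS.DS; IRET. Register labels:
EIP, EFLAGS, ESP, ES, CS, SS, DS. Memory labels, top to bottom: GDT, TSS RAM, IDT Alias,
IDT RAM, GDT Alias, GDT RAM, RAM_START, 0.]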
SEGMENT
*SEGMENTS(DPL = 0)
, startup.startup_code(BASE = 0FFFF0000H)
;
TASK
BOOT_TASK(OBJECT = startup, INITIAL,DPL = 0,
NOT INTENABLED)
, PROTECTED_MODE_TASK(OBJECT = main_module,DPL = 0,
NOT INTENABLED)
;
TABLE
GDT (
LOCATION = GDT_EPROM
, ENTRY = (
10: PROTECTED_MODE_TASK
, startup.startup_code
, startup.startup_data
, main_module.data
, main_module.code
, main_module.stack
)
),
IDT (
LOCATION = IDT_EPROM
);
MEMORY
(
RESERVE = (0..3FFFH
-- Area for the GDT, IDT, TSS copied from ROM
, 60000H..0FFFEFFFFH)
, RANGE = (ROM_AREA = ROM (0FFFF0000H..0FFFFFFFFH))
-- Eprom size 64K
, RANGE = (RAM_AREA = RAM (4000H..05FFFFH))
);
END
Table 8-5 shows the relationship of each build item with an ASM source file.
Table 8-5. Relationship Between BLD Item and ASM Source File

Item                  ASM386 and Startup.A58      BLD386 Controls and BLD file    Effect
--------------------  --------------------------  ------------------------------  --------------------------------
Bootstrap             public startup              bootstrap start(startup)        Near jump at 0FFFFFFF0H to start
                      startup:

GDT location          public GDT_EPROM            TABLE GDT(location =            The location of the GDT will be
                      GDT_EPROM TABLE_REG <>      GDT_EPROM)                      programmed into the GDT_EPROM
                                                                                  location

IDT location          public IDT_EPROM            TABLE IDT(location =            The location of the IDT will be
                      IDT_EPROM TABLE_REG <>      IDT_EPROM)                      programmed into the IDT_EPROM
                                                                                  location

RAM start             RAM_START equ 400H          memory (reserve =               RAM_START is used as the ram
                                                  (0..3FFFH))                     destination for moving the
                                                                                  tables. It must be excluded
                                                                                  from the application’s segment
                                                                                  area.

Location of the       TSS_INDEX EQU 10            TABLE GDT( ENTRY=( 10:          Put the descriptor of the
application TSS in                                PROTECTED_MODE_TASK))           application TSS in GDT entry 10
the GDT

EPROM size and        size and location of the    SEGMENT startup.code            Initialization code size must
location              initialization code         (base= 0FFFF0000H)              be less than 64K and must
                                                  ...memory (RANGE(               reside in the uppermost 64K of
                                                  ROM_AREA = ROM(x..y))           the 4GB memory space.
Register Name: BBL_CR_OVRD
MSR Address: 017h
Access: Read Only
BBL_CR_OVRD is a 64-bit register accessed only when referenced as a Qword through an
RDMSR instruction.
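The calling-program pseudo-code later in this section reads the platform ID from bits 52 through 50 of this register (written there as BBL_CR_OVRD[52:50]). A minimal sketch of that extraction in C follows; the function name is hypothetical and the 64-bit value is assumed to have been obtained with RDMSR beforehand.

```c
#include <stdint.h>

/* Hypothetical helper: extract the 3-bit platform ID, bits 52:50,
 * from a 64-bit BBL_CR_OVRD value previously read with RDMSR. */
unsigned platform_id_from_bbl_cr_ovrd(uint64_t msr_value)
{
    return (unsigned)((msr_value >> 50) & 0x7u);
}
```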
The microcode update is a data block that is exactly 2048 bytes in length. The initial 48 bytes
of the update contain a header with information used to identify the update. The update header
and its reserved fields are interpreted by software based upon the header version. The initial
version of the header is 00000001h. An encoding scheme also guards against tampering of the
update data and provides a means for determining the authenticity of any given update. Table
8-7 defines each of the fields and Figure 8-8 shows the format of the microcode update data
block.
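[Figure 8-8: format of the microcode update data block. Each header field is 32 bits wide.
Fields as drawn, top to bottom: Processor Flags (Reserved: 24 bits, followed by flags P7
through P1); Loader Revision; Checksum; Processor; Date (Month: 8 bits, Day: 8 bits,
Year: 16 bits); Update Revision; Header Revision.]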
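As an illustration, the update block can be described by a C structure. This is a sketch under assumptions rather than a definition taken from Table 8-7: the field offsets assume that the figure is drawn with the lowest address at the bottom (Header Revision at offset 0), that the remainder of the 48-byte header is reserved, and that the Date field holds the month in its most-significant byte as the figure's bit scale suggests. All identifiers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the 2048-byte update block (48-byte header + data). */
struct microcode_update {
    uint32_t header_revision;   /* initial version is 00000001h        */
    uint32_t update_revision;
    uint32_t date;              /* Month: 8, Day: 8, Year: 16          */
    uint32_t processor;
    uint32_t checksum;
    uint32_t loader_revision;
    uint32_t processor_flags;
    uint8_t  reserved[20];      /* assumed padding up to 48 bytes      */
    uint8_t  data[2000];        /* update data follows the header      */
};

/* Field extraction for the Date doubleword (assumed bit positions). */
unsigned update_date_month(uint32_t date) { return (date >> 24) & 0xFFu; }
unsigned update_date_day(uint32_t date)   { return (date >> 16) & 0xFFu; }
unsigned update_date_year(uint32_t date)  { return date & 0xFFFFu; }
```

With this layout the structure is exactly 2048 bytes and the update data begins at offset 48, which agrees with the sizes stated in the text.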
The code below represents the update loader with a loader revision of 00000001h:
mov  ecx,79h           ; MSR to write in ECX
xor  eax,eax           ; clear EAX
xor  ebx,ebx           ; clear EBX
mov  ax,cs             ; Segment of microcode update
shl  eax,4
mov  bx,offset Update  ; Offset of microcode update
add  eax,ebx           ; Linear Address of Update in EAX
add  eax,48d           ; Offset of the Update Data within the Update
xor  edx,edx           ; Zero in EDX
WRMSR                  ; microcode update trigger
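The address arithmetic performed by the loader before WRMSR can be checked in isolation. The following C sketch mirrors the real-address mode calculation shown above (segment shifted left by 4, plus offset, plus the 48-byte header); the function name is hypothetical.

```c
#include <stdint.h>

#define UPDATE_HEADER_BYTES 48u

/* Linear address of the update data as computed by the loader:
 * (CS << 4) + offset of the update + size of the update header. */
uint32_t update_data_linear_address(uint16_t cs, uint16_t update_offset)
{
    return ((uint32_t)cs << 4) + update_offset + UPDATE_HEADER_BYTES;
}
```

For example, with CS = F000h and an update at offset 1000h, the value placed in EAX is F1030h.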
value returned in the EAX register, uniquely identifies a particular update. The signature ID can
be directly compared with the update revision field in the microcode update header for verifica-
tion of a correct update load. No consecutive updates released for a given stepping of the P6
family processor may share the same signature. Updates for different steppings are differenti-
ated by the CPUID value.
returned an image identical to the one that was written. The following pseudo-code represents
a calling program.
INT 15 D042 Calling Program Pseudo-code
//
// We must be in real mode
//
If the system is not in Real mode
    then Exit
//
// Detect the presence of Genuine Intel processor(s) that can be updated (CPUID)
//
If no Intel processors exist that can be updated
    then Exit
//
// Detect the presence of the Intel microcode update extensions
//
If the BIOS fails the PresenceTest
    then Exit
//
// If the APIC is enabled, see if any other processors are out there
//
Read APICBaseMSR
If APIC enabled {
    Send Broadcast Message to all processors except self via APIC;
    Have all processors execute CPUID and record Type, Family, Model, Stepping
    Have all processors read BBL_CR_OVRD[52:50] and record platform ID bits
    If current processor is not updatable
        then Exit
}
//
// Determine the number of unique update slots needed for this system
//
NumSlots = 0;
For each processor {
    If ((this is a unique processor stepping) and
        (we have an update in the database for this processor)) {
        Checksum the update from the database;
        If Checksum fails
            then Exit;
        Increment NumSlots;
    }
}
//
// Do we have enough update slots for all CPUs?
//
If there are more unique processor steppings than update slots provided by the BIOS
    then Exit
//
// Do we need any update slots at all? If not, then we’re all done
//
If (NumSlots == 0)
    then Exit
//
// Record updates for processors in NVRAM.
//
For (I=0; I<NumSlots; I++) {
    //
    // Load each Update
    //
    Issue the WriteUpdate function
    If (STORAGE_FULL) returned {
        Display Error -- BIOS is not managing NVRAM appropriately
        exit
    }
    If (INVALID_REVISION) returned {
        Display Message: More recent update already loaded in NVRAM for this stepping
        continue;
    }
    If an error occurred {
        Display Diagnostic
        exit
    }
    //
    // Compare the Update read to that written
    //
    if (Update read != Update written) {
        Display Diagnostic
        exit
    }
}
//
In order to assure that the BIOS function is present, the caller must verify the Carry Flag, the
Return Code, and the 64-bit signature. Each update block is exactly 2048 bytes in length. The
update count reflects the number of update blocks available for storage within non-volatile
RAM. The update count must return with a value greater than or equal to the number of unique
processor steppings currently installed within the system.
The loader version number refers to the revision of the update loader program that is included
in the system BIOS image.
The BIOS is responsible for selecting an appropriate update block in the non-volatile storage for
storing the new update. The BIOS is also responsible for ensuring the integrity of the
information provided by the caller, including authenticating the proposed update before
incorporating it into storage.
Before writing the update block into NVRAM, the BIOS should ensure that the update structure
meets the following criteria in the following order:
1. The update header version should be equal to an update header version recognized by the
BIOS.
2. The update loader version in the update header should be equal to the update loader
version contained within the BIOS image.
3. The update block should checksum to zero. This checksum is computed as a 32-bit
summation of all 512 double words in the structure, including the header.
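The checksum criterion in step 3 can be sketched in C as follows. The function name is hypothetical; the block is assumed to be already in memory as 512 doublewords, and the 32-bit summation wraps modulo 2^32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UPDATE_DWORDS 512u   /* 2048 bytes / 4 */

/* True if the 32-bit sum of all 512 doublewords, header included,
 * is zero. Unsigned arithmetic wraps, which is the intended behavior. */
bool update_checksum_ok(const uint32_t *block)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < UPDATE_DWORDS; i++)
        sum += block[i];
    return sum == 0;
}
```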
The BIOS selects an update block in non-volatile storage for storing the candidate update. The
BIOS can select any available update block as long as it guarantees that only a single update
exists for any given processor stepping in non-volatile storage. If the update block selected
already contains an update, the following additional criteria apply to overwrite it:
• The processor signature in the proposed update should be equal to the processor signature
in the header of the current update in NVRAM (CPUID + platform ID bits).
• The update revision in the proposed update should be greater than the update revision in
the header of the current update in NVRAM.
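The two overwrite criteria can be expressed as a single predicate. The C sketch below is illustrative only: the structure and function names are hypothetical, the processor signature is modeled as the CPUID value plus the platform ID flags compared for exact equality, and update revisions are compared as unsigned values, which is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the header fields used by the comparison. */
struct update_id {
    uint32_t processor;        /* CPUID value           */
    uint32_t processor_flags;  /* platform ID bits      */
    uint32_t update_revision;
};

/* True if 'proposed' may overwrite 'stored': same processor signature
 * and a strictly greater update revision. */
bool may_overwrite(const struct update_id *stored,
                   const struct update_id *proposed)
{
    bool same_signature =
        stored->processor == proposed->processor &&
        stored->processor_flags == proposed->processor_flags;
    return same_signature &&
           proposed->update_revision > stored->update_revision;
}
```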
If no unused update blocks are available and the above criteria are not met, the BIOS can over-
write an update block for a processor stepping that is no longer present in the system. This can
be done by scanning the update blocks and comparing the processor steppings, identified in the
MP Specification table, to the processor steppings that currently exist in the system.
Finally, before storing the proposed update into NVRAM, the BIOS should verify the authen-
ticity of the update via the mechanism described in Section 8.10.2., “Microcode Update
Loader”. This includes loading the update into the current processor, executing the CPUID
instruction, reading MSR 08Bh, and comparing a calculated value with the update revision in
the proposed update header for equality.
When performing the write update function, the BIOS should record the entire update, including
the header and the update data. When writing an update, the original contents may be over-
written, assuming the above criteria have been met. It is the responsibility of the BIOS to ensure
that more recent updates are not overwritten through the use of this BIOS call, and that only a
single update exists within the NVRAM for any processor stepping.
Figure 8-9 shows the process the BIOS follows to choose an update block and ensure the integ-
rity of the data when it stores the new microcode update.
[Figure 8-9: flowchart of the write operation. Decision boxes recoverable from the figure,
in order:
  Valid update header version? No: return INVALID_HEADER.
  Does loader revision match BIOS’s loader? No: return INVALID_HEADER.
  Update matching CPU already in NVRAM? No: update space available in NVRAM?
    No: return STORAGE_FULL.
  Update revision newer than NVRAM update? No: return INVALID_REVISION.
  Otherwise: return SUCCESS.]
This control is provided on a global basis for all updates and processors. The caller can deter-
mine the current status of update loading (enabled or disabled) without changing the state. The
function does not allow the caller to disable loading of binary updates, as this poses a security
risk.
The caller specifies the requested operation by placing one of the values from Table 8-6 in the
BH register. After successfully completing this function the BL register contains either the
enable or the disable designator. Note that if the function fails, the update status return value is
undefined.
The READ_FAILURE error code returned by this function has meaning only if the control func-
tion is implemented in the BIOS NVRAM. The state of this feature (enabled/disabled) can also
be implemented using CMOS RAM bits where READ failure errors cannot occur.
The read function enables the caller to read any update data that already exists in a BIOS and
make decisions about the addition of new updates. As a result of a successful call, the BIOS
copies exactly 2048 bytes into the location pointed to by ES:DI, with the contents of the update
block represented by update number.
An update block is considered unused and available for storing a new update if its header version
contains the value 0FFFFFFFFh after return from this function call. The actual implementation
of NVRAM storage management is not specified here and is BIOS dependent. As an example,
the actual data value used to represent an empty block by the BIOS may be zero, rather than
0FFFFFFFFh. The BIOS is responsible for translating this information into the header provided
by this function.
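A caller-side test for an unused block, together with the kind of translation the BIOS is expected to perform, might look like the following C sketch. Both function names and the idea of a single BIOS-internal "empty" marker value are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define UPDATE_BLOCK_UNUSED 0xFFFFFFFFu

/* Caller side: a block is free if the returned header version
 * is 0FFFFFFFFh. */
bool update_block_is_unused(uint32_t header_version)
{
    return header_version == UPDATE_BLOCK_UNUSED;
}

/* BIOS side (hypothetical): map the BIOS's internal empty-block marker,
 * for example zero, to the value the read function must report. */
uint32_t reported_header_version(uint32_t stored, uint32_t bios_empty_marker)
{
    return stored == bios_empty_marker ? UPDATE_BLOCK_UNUSED : stored;
}
```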
CHAPTER 9
MEMORY CACHE CONTROL
This chapter describes the Intel Architecture’s memory cache and cache control mechanisms,
the TLBs, and the write buffer. It also describes the memory type range registers (MTRRs)
found in the P6 family processors and how they are used to control caching of physical memory
locations.
NOTES (Figure 9-1):
1. For the Intel486™ processor, the L1 Cache is a unified instruction and data cache.
2. For the Pentium® and Intel486™ processors, the L2 Cache is external to the processor
package and there is no cache bus (that is, the L2 cache interfaces with the system bus).
3. For the Pentium® Pro, Pentium® II and Pentium® III processors, the L2 Cache is internal
to the processor package and there is a separate cache bus.
The Intel Architecture defines two separate caches: the level 1 (L1) cache and the level 2 (L2)
cache (see Figure 9-1). The L1 cache is closely coupled to the instruction fetch unit and execu-
tion units of the processor. For the Pentium® and P6 family processors, the L1 cache is divided
into two sections: one dedicated to caching instructions and one to caching data. For the
Intel486™ processor, the L1 cache is a unified instruction and data cache.
NOTES:
1. In the Intel486™ processor, the L1 cache is a unified instruction and data cache, and the TLB is a unified
instruction and data TLB.
2. In the Intel486™ and Pentium® processors, the L2 cache is external to the processor package and
optional.
3. In the Pentium® Pro, Pentium® II, and Pentium® III processors, the L2 cache is internal to the processor
package.
The L2 cache is a unified cache for storage of both instructions and data. It is closely coupled to
the L1 cache through the processor’s cache bus (for the P6 family processors) or the system bus
(for the Pentium® and Intel486™ processors).
The cache lines for the P6 family and Pentium® processors’ L1 and L2 caches are 32 bytes wide.
The processor always reads a cache line from system memory beginning on a 32-byte boundary.
(A 32-byte aligned cache line begins at an address with its 5 least-significant bits clear.) A cache
line can be filled from memory with a 4-transfer burst transaction. The caches do not support
partially-filled cache lines, so caching even a single doubleword requires caching an entire line.
(The cache line size for the Intel486™ processor is 16 bytes.)
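The alignment rule can be illustrated with a few lines of C. The helper names are hypothetical; the line size is passed as a parameter (32 for the P6 family and Pentium® processors, 16 for the Intel486™ processor) and is assumed to be a power of two.

```c
#include <stdint.h>

/* Base address of the cache line containing 'addr'. For a 32-byte
 * line this clears the 5 least-significant address bits. */
uint32_t cache_line_base(uint32_t addr, uint32_t line_size)
{
    return addr & ~(line_size - 1u);
}

/* Number of cache lines touched by an access of 'len' bytes at 'addr'.
 * 'len' must be at least 1. */
uint32_t cache_lines_touched(uint32_t addr, uint32_t len, uint32_t line_size)
{
    uint32_t first = cache_line_base(addr, line_size);
    uint32_t last  = cache_line_base(addr + len - 1u, line_size);
    return (last - first) / line_size + 1u;
}
```

A doubleword at address 1FEH straddles two 32-byte lines (1E0H and 200H), so caching it requires two line fills.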
The L1 and L2 caches are available in all execution modes. Using these caches greatly improves
the performance of the processor both in single- and multiple-processor systems. Caching can
also be used in system management mode (SMM); however, it must be handled carefully. For
more information, see Section 12.4.2., “SMRAM Caching”, in Chapter 12, System Management
Mode (SMM).
The TLBs store the most recently used page-directory and page-table entries. They speed up
memory accesses when paging is enabled by reducing the number of memory accesses that are
required to read the page tables stored in system memory. The TLBs are divided into four
groups: instruction TLBs for 4-KByte pages, data TLBs for 4-KByte pages; instruction TLBs
for large pages (2-MByte or 4-MByte pages), and data TLBs for large pages. (Only 4-KByte
pages are supported for Intel386™ and Intel486™ processors.) The TLBs are normally active
only in protected mode with paging enabled. When paging is disabled or the processor is in real-
address mode, the TLBs maintain their contents until explicitly or implicitly flushed. For more
information, see Section 9.10., “Invalidating the Translation Lookaside Buffers (TLBs)”.
The write buffer is associated with the processor’s instruction execution units. It allows writes to
system memory and/or the internal caches to be saved and in some cases combined to optimize
the processor’s bus accesses. The write buffer is always enabled in all execution modes.
The processor’s caches are for the most part transparent to software. When enabled, instructions
and data flow through these caches without the need for explicit software control. However,
knowledge of the behavior of these caches may be useful in optimizing software performance.
For example, knowledge of cache dimensions and replacement algorithms gives an indication
of how large a data structure can be operated on at once without causing cache thrashing.
In multiprocessor systems, maintenance of cache consistency may, in rare circumstances,
require intervention by system software. For these rare cases, the processor provides privileged
cache control instructions for use in flushing caches.
When the processor recognizes that an operand being read from memory is cacheable, the
processor reads an entire cache line into the appropriate cache (L1, L2, or both). This operation
is called a cache line fill. If the memory location containing that operand is still cached the next
time the processor attempts to access the operand, the processor can read the operand from the
cache instead of going back to memory. This operation is called a cache hit.
When the processor attempts to write an operand to a cacheable area of memory, it first checks
if a cache line for that memory location exists in the cache. If a valid cache line does exist, the
processor (depending on the write policy currently in force) can write the operand into the cache
instead of writing it out to system memory. This operation is called a write hit. If a write misses
the cache (that is, a valid cache line is not present for the area of memory being written to), the
processor performs a cache line fill (write allocation). Then it writes the operand into the cache
line and (depending on the write policy currently in force) can also write it out to memory. If the
operand is to be written out to memory, it is written first into the write buffer, and then written
from the write buffer to memory when the system bus is available. (Note that for the Intel486™
and Pentium® processors, write misses do not result in a cache line fill; they always result in a
write to memory. For these processors, only read misses result in cache line fills.)
When operating in a multiple-processor system, Intel Architecture processors (beginning with
the Intel486™ processor) have the ability to snoop other processors’ accesses to system
memory and to their internal caches. They use this snooping ability to keep their internal caches
consistent both with system memory and with the caches in other processors on the bus. For
example, in the Pentium® and P6 family processors, if through snooping one processor detects
that another processor intends to write to a memory location that it currently has cached in
shared state, the snooping processor will invalidate its cache line forcing it to perform a cache
line fill the next time it accesses the same memory location.
Beginning with the P6 family processors, if a processor detects (through snooping) that another
processor is trying to access a memory location that it has modified in its cache, but has not yet
written back to system memory, the snooping processor will signal the other processor (by
means of the HITM# signal) that the cache line is held in modified state and will perform an
implicit write-back of the modified data. The implicit write-back is transferred directly to the
initial requesting processor and snooped by the memory controller to assure that system memory
has been updated. Here, the processor with the valid data may pass the data to the other proces-
sors without actually writing it to system memory; however, it is the responsibility of the
memory controller to snoop this operation and update memory.
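The snoop responses described above can be modeled as a small state function. The sketch below is a textbook MESI-style model written in C, not a description of the actual P6 bus protocol: the state and type names are hypothetical, and the choice that a snooped read leaves a modified or exclusive line in the shared state is an assumption of the model.

```c
#include <stdbool.h>

enum line_state { LINE_INVALID, LINE_SHARED, LINE_EXCLUSIVE, LINE_MODIFIED };
enum snoop_kind { SNOOP_READ, SNOOP_WRITE };

struct snoop_result {
    enum line_state next;  /* state of the snooping processor's line    */
    bool hitm;             /* true if an implicit write-back is needed  */
};

/* Response of a snooping cache to another processor's access. */
struct snoop_result snoop(enum line_state current, enum snoop_kind kind)
{
    struct snoop_result r = { current, false };
    if (current == LINE_INVALID)
        return r;                      /* nothing cached, nothing to do  */
    if (current == LINE_MODIFIED)
        r.hitm = true;                 /* signal HITM#, supply the data  */
    /* A snooped write invalidates the line; a snooped read (assumed)
     * leaves a shared copy. */
    r.next = (kind == SNOOP_WRITE) ? LINE_INVALID : LINE_SHARED;
    return r;
}
```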
memory accesses, page-table walks, or prefetches of speculated branch targets are made.
This type of cache-control is useful for memory-mapped I/O devices. When used with
normal RAM, it greatly reduces processor performance.
NOTES:
1. Requires programming of MTRRs to implement.
2. Speculative reads not supported.
• Write Combining (WC)—System memory locations are not cached (as with uncacheable
memory) and coherency is not enforced by the processor’s bus coherency protocol.
Speculative reads are allowed. Writes may be delayed and combined in the write buffer to
reduce memory accesses. The writes may be delayed until the next occurrence of a buffer
or processor serialization event, e.g., CPUID execution, a read or write to uncached
memory, interrupt occurrence, LOCKed instruction execution, etc. if the WC buffer is
partially filled. This type of cache-control is appropriate for video frame buffers, where the
order of writes is unimportant as long as the writes update memory so they can be seen on
the graphics display. See Section 9.3.1., “Buffering of Write Combining Memory
Locations”, for more information about caching the WC memory type. The preferred
method is to use the new SFENCE (store fence) instruction introduced in the Pentium® III
processor. The SFENCE instruction ensures weakly ordered writes are written to memory
in order, i.e., it serializes only the store operations.
• Write-through (WT)—Writes and reads to and from system memory are cached. Reads
come from cache lines on cache hits; read misses cause cache fills. Speculative reads are
allowed. All writes are written to a cache line (when possible) and through to system
memory. When writing through to memory, invalid cache lines are never filled, and valid
cache lines are either filled or invalidated. Write combining is allowed. This type of cache-
control is appropriate for frame buffers or when there are devices on the system bus that
access system memory, but do not perform snooping of memory accesses. It enforces
coherency between caches in the processors and system memory.
• Write-back (WB)—Writes and reads to and from system memory are cached. Reads come
from cache lines on cache hits; read misses cause cache fills. Speculative reads are
allowed. Write misses cause cache line fills (in the P6 family processors), and writes are
performed entirely in the cache, when possible. Write combining is allowed. The write-
back memory type reduces bus traffic by eliminating many unnecessary writes to system
memory. Writes to a cache line are not immediately forwarded to system memory; instead,
they are accumulated in the cache. The modified cache lines are written to system memory
later, when a write-back operation is performed. Write-back operations are triggered when
cache lines need to be deallocated, such as when new cache lines are being allocated in a
cache that is already full. They also are triggered by the mechanisms used to maintain
cache consistency. This type of cache-control provides the best performance, but it requires
that all devices that access system memory on the system bus be able to snoop memory
accesses to ensure system memory and cache coherency.
• Write protected (WP)—Reads come from cache lines when possible, and read misses
cause cache fills. Writes are propagated to the system bus and cause corresponding cache
lines on all processors on the bus to be invalidated. Speculative reads are allowed. This
caching option is available in the P6 family processors by programming the MTRRs
(see Table 9-5).
subject to the weak ordering semantics of its definition. Ordering is not maintained between the
successive allocation/deallocation of WC buffers (for example, writes to WC buffer 1 followed
by writes to WC buffer 2 may appear as buffer 2 followed by buffer 1 on the system bus). When
a WC buffer is propagated to memory as partial writes, there is no guaranteed ordering between
successive partial writes (for example, a partial write for chunk 2 may appear on the bus before
the partial write for chunk 1 or vice versa). The only elements of WC propagation to the system
bus that are guaranteed are those provided by transaction atomicity. For the P6 family
processors, a completely full WC buffer will always be propagated as a single burst transaction
using any of the chunk orders. In a WC buffer propagation where the data will be propagated as
partials, all data contained in the same chunk (0 mod 8 aligned) will be propagated
simultaneously.
NOTE
The effect of setting the CD flag is somewhat different for the P6 family, Pentium®,
and Intel486™ processors (see Table 9-4). To ensure memory coherency after the
CD flag is set, the caches should be explicitly flushed. For more information, see
Section 9.5.2., “Preventing Caching”. Setting the CD flag for the P6 family
processors modifies cache line fill and update behavior. Also for the P6 family
processors, setting the CD flag does not force strict ordering of memory accesses
unless the MTRRs are disabled and/or all memory is referenced as uncached. For
more information, see Section 7.2.4., “Strengthening or Weakening the Memory
Ordering Model”, in Chapter 7, Multiple-Processor Management.
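[Figure: Cache-control mechanisms. CR4: PGE flag, which enables global pages designated with
the G flag. CR3: PCD and PWT flags, which control caching of the page directory. Page-directory
or page-table entry: PCD, PWT, and G flags. CR0: CD and NW flags. MTRRs. Physical memory,
with upper bound FFFFFFFFH.]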
NOTE:
1. The P6 family processors are the only Intel Architecture processors that contain an integrated L2 cache.
The L2 column in this table is definitive for the P6 family processors. It is intended to represent what could
be implemented in a Pentium® processor based system with a platform specific write-back L2 cache.
• NW flag, bit 29 of control register CR0—Controls the write policy for system memory
locations. For more information, see Section 2.5., “Control Registers”, in Chapter 2,
System Architecture Overview. If the NW and CD flags are clear, write-back is enabled for
the whole of system memory (write-through for the Intel486™ processor), but may be
restricted for individual pages or regions of memory by other cache-control mechanisms.
Table 9-4 shows how the other combinations of CD and NW flags affect caching.
NOTE
For the Pentium® processor, when the L1 cache is disabled (the CD and NW
flags in control register CR0 are set), external snoops are accepted in DP
(dual-processor) systems and inhibited in uniprocessor systems. When
snoops are inhibited, address parity is not checked and APCHK# is not
asserted for a corrupt address; however, when snoops are accepted, address
parity is checked and APCHK# is asserted for corrupt addresses.
• PCD flag in the page-directory and page-table entries—Controls caching for individual
page tables and pages, respectively. For more information, see Section 3.6.4., “Page-
Directory and Page-Table Entries”, in Chapter 3, Protected-Mode Memory Management.
This flag only has effect when paging is enabled and the CD flag in control register CR0 is
clear. The PCD flag enables caching of the page table or page when clear and prevents
caching when set.
• PWT flag in the page-directory and page-table entries—Controls the write policy for
individual page tables and pages, respectively. For more information, see Section 3.6.4.,
“Page-Directory and Page-Table Entries”, in Chapter 3, Protected-Mode Memory
Management. This flag only has effect when paging is enabled and the NW flag in control
register CR0 is clear. The PWT flag enables write-back caching of the page table or page
when clear and write-through caching when set.
• PCD and PWT flags in control register CR3. Control the global caching and write policy
for the page directory. For more information, see Section 2.5., “Control Registers”, in
Chapter 2, System Architecture Overview. The PCD flag enables caching of the page
directory when clear and prevents caching when set. The PWT flag enables write-back
caching of the page directory when clear and write-through caching when set. These flags
do not affect the caching and write policy for individual page tables. These flags only have
effect when paging is enabled and the CD flag in control register CR0 is clear.
• G (global) flag in the page-directory and page-table entries (introduced to the Intel Archi-
tecture in the P6 family processors)—Controls the flushing of TLB entries for individual
pages. See Section 3.7., “Translation Lookaside Buffers (TLBs)”, in Chapter 3, Protected-
Mode Memory Management, for more information about this flag.
• PGE (page global enable) flag in control register CR4—Enables the establishment of
global pages with the G flag. See Section 3.7., “Translation Lookaside Buffers (TLBs)”, in
Chapter 3, Protected-Mode Memory Management, for more information about this flag.
• Memory type range registers (MTRRs) (introduced in the P6 family processors)—Control
the type of caching used in specific regions of physical memory. Any of the caching types
described in Section 9.3., “Methods of Caching Available”, can be selected. See Section
9.12., “Memory Type Range Registers (MTRRs)”, for a detailed description of the
MTRRs.
• KEN# and WB/WT# pins on Pentium® processor and KEN# pin alone on the Intel486™
processor—These pins allow external hardware to control the caching method used for
specific areas of memory. They perform similar (but not identical) functions to the MTRRs
in the P6 family processors.
• PCD and PWT pins on the Pentium® and Intel486™ processors—These pins (which are
associated with the PCD and PWT flags in control register CR3 and in the page-directory
and page-table entries) permit caching in an external L2 cache to be controlled on a page-
by-page basis, consistent with the control exercised on the L1 cache of these processors.
The P6 family processors do not provide these pins because the L2 cache is internal to the
chip package.
Table 9-5. Effective Memory Type Depending on MTRR, PCD, and PWT Settings
MTRR Memory Type    PCD Value    PWT Value    Effective Memory Type
UC                  X            X            UC
WC                  0            0            WC
WC                  0            1            WC
WC                  1            0            WC
WC                  1            1            UC
WT                  0            X            WT
WT                  1            X            UC
WP                  0            0            WP
WP                  0            1            WP
WP                  1            0            WC
WP                  1            1            UC
WB                  0            0            WB
WB                  0            1            WT
WB                  1            X            UC
NOTE:
This table assumes that the CD and NW flags in register CR0 are set to 0. The effective memory types in the
grey areas are implementation defined and may be different in future Intel Architecture processors.
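The lookup in Table 9-5 can be modeled as a small function. The following is only an illustrative sketch: the names TABLE_9_5 and effective_memory_type are hypothetical, the rows are transcribed from the table above, and the model assumes the CD and NW flags are both 0, as the table note requires.

```python
# Rows of Table 9-5 as (MTRR type, PCD, PWT, effective type).
# None stands for "X" (don't care). Names are hypothetical.
TABLE_9_5 = [
    ("UC", None, None, "UC"),
    ("WC", 0, 0, "WC"),
    ("WC", 0, 1, "WC"),
    ("WC", 1, 0, "WC"),
    ("WC", 1, 1, "UC"),
    ("WT", 0, None, "WT"),
    ("WT", 1, None, "UC"),
    ("WP", 0, 0, "WP"),
    ("WP", 0, 1, "WP"),
    ("WP", 1, 0, "WC"),
    ("WP", 1, 1, "UC"),
    ("WB", 0, 0, "WB"),
    ("WB", 0, 1, "WT"),
    ("WB", 1, None, "UC"),
]


def effective_memory_type(mtrr_type, pcd, pwt):
    """Return the effective memory type, assuming CR0.CD = CR0.NW = 0."""
    for row_type, row_pcd, row_pwt, effective in TABLE_9_5:
        if (row_type == mtrr_type
                and row_pcd in (None, pcd)
                and row_pwt in (None, pwt)):
            return effective
    raise ValueError(f"no row for {mtrr_type!r}, PCD={pcd}, PWT={pwt}")
```

For example, effective_memory_type("WB", 0, 1) selects the WB row with PCD = 0 and PWT = 1.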
Because of speculative execution in the P6 family processors, the last MOV instruction
performed would place the value at physical location B000H into EBX, rather than the value at
the new physical address A000H. This situation is remedied by placing a TLB invalidation
between the load and the store.
to fetch the data now or as soon as possible. It will be used soon. The prefetch instruction has
different variations that allow the programmer to control into which cache level the data will be
read. For more information on the variations of the prefetch instruction refer to Section 9.5.3.1.,
“Cacheability Hint Instructions”, Chapter 9, Programming with the Streaming SIMD
Extensions, of the Intel Architecture Software Developer’s Manual, Volume 2.
In general, the existence of the write buffer is transparent to software, even in systems that use
multiple processors. The processor ensures that write operations are always carried out in
program order. It also ensures that the contents of the write buffer are always drained to memory
in the following situations:
• When an exception or interrupt is generated.
• (P6 family processors only.) When a serializing instruction is executed.
• When an I/O instruction is executed.
• When a LOCK operation is performed.
• (P6 family processors only.) When a BINIT operation is performed.
• (Pentium® III processors only.) When using SFENCE to order stores.
The discussion of write ordering in Section 7.2., “Memory Ordering”, in Chapter 7, Multiple-
Processor Management, gives a detailed description of the operation of the write buffer.
NOTE:
* Using these encodings results in a general-protection exception (#GP) being generated.
Physical memory map covered by the MTRRs:
Address Range           MTRR Coverage                          Total Size
100000H to FFFFFFFFH    8 variable ranges (from 4 KBytes to
                        maximum size of physical memory)
C0000H to FFFFFH        64 fixed ranges (4 KBytes each)        256 KBytes
80000H to BFFFFH        16 fixed ranges (16 KBytes each)       256 KBytes
0 to 7FFFFH             8 fixed ranges (64 KBytes each)        512 KBytes
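As a rough illustration of the fixed-range layout in this memory map, the sketch below finds which fixed sub-range covers a physical address below 1 MByte. The names FIXED_REGIONS and fixed_range_for are hypothetical, and the sketch does not model how sub-ranges are packed into individual registers.

```python
# (start, end_exclusive, sub-range size) for the three fixed-range regions
# shown in the memory map. Names are hypothetical.
FIXED_REGIONS = [
    (0x00000, 0x80000, 64 * 1024),    # 8 ranges of 64 KBytes
    (0x80000, 0xC0000, 16 * 1024),    # 16 ranges of 16 KBytes
    (0xC0000, 0x100000, 4 * 1024),    # 64 ranges of 4 KBytes
]


def fixed_range_for(addr):
    """Return (region_start, sub_range_index, sub_range_size).

    Returns None for addresses at or above 1 MByte, which only the
    variable ranges can describe.
    """
    for start, end, size in FIXED_REGIONS:
        if start <= addr < end:
            return start, (addr - start) // size, size
    return None
```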
MTRRcap register fields:
Bits 63-11   Reserved
Bit 10       WC
Bit 9        Reserved
Bit 8        FIX
Bits 7-0     VCNT
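A minimal sketch of extracting these fields from a raw 64-bit value follows. The function name decode_mtrrcap is hypothetical, and the field meanings in the comments (count of variable ranges, fixed-range support, write-combining support) are the commonly documented ones rather than text from this excerpt.

```python
def decode_mtrrcap(value):
    """Split a 64-bit MTRRcap value into its three defined fields."""
    return {
        "VCNT": value & 0xFF,       # bits 7-0: number of variable ranges
        "FIX": (value >> 8) & 1,    # bit 8: fixed-range MTRRs supported
        "WC": (value >> 10) & 1,    # bit 10: write-combining type supported
    }
```

With a hypothetical raw value of 0x508, this yields VCNT = 8 with the FIX and WC flags both set.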
Intel recommends the use of the UC (uncached) memory type for all physical
memory addresses where memory does not exist. To assign the UC type to
nonexistent memory locations, it can either be specified as the default type in
the Type field or be explicitly assigned with the fixed and variable MTRRs.
MTRRdefType register fields:
Bits 63-12   Reserved
Bit 11       E: MTRR enable/disable
Bit 10       FE: Fixed-range MTRRs enable/disable
Bits 9-8     Reserved
Bits 7-0     Type: Default memory type
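The field layout above can be illustrated with a short sketch that builds and inspects an MTRRdefType value. The helper names are hypothetical, and the example assumes the usual encoding in which a Type value of 0 selects UC, which is not stated in this excerpt.

```python
E_BIT = 1 << 11     # E: MTRR enable
FE_BIT = 1 << 10    # FE: fixed-range MTRR enable
TYPE_MASK = 0xFF    # Type: default memory type, bits 7-0


def make_mtrr_def_type(enable, fixed_enable, default_type):
    """Build an MTRRdefType value; reserved bits are left at 0."""
    value = default_type & TYPE_MASK
    if enable:
        value |= E_BIT
    if fixed_enable:
        value |= FE_BIT
    return value


def mtrrs_enabled(value):
    """Return True when the E flag (bit 11) is set."""
    return bool(value & E_BIT)
```

Following the recommendation to use UC for nonexistent memory, make_mtrr_def_type(True, True, 0) would enable both MTRR sets with UC as the assumed default type.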
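MTRRphysBasen Register
Bits 63-36   Reserved
Bits 35-12   PhysBase
Bits 11-8    Reserved
Bits 7-0     Type
MTRRphysMaskn Register
Bits 63-36   Reserved
Bits 35-12   PhysMask
Bit 11       V
Bits 10-0    Reserved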
NOTE
Some mask values can result in discontinuous ranges. In a discontinuous
range, the area not mapped by the mask value is set to the default memory
type. Intel does not encourage the use of discontinuous ranges, because they
could require physical memory to be present throughout the entire 4-GByte
physical memory map. If memory is not provided for the complete memory
map, the behavior of the processor is undefined.
The following settings for the MTRRs will yield the proper mapping of the physical address
space for this system configuration. The x0_0x notation is used below to add clarity to the large
numbers represented.
MTRRPhysBase0 = 0000_0000_0000_0006h
MTRRPhysMask0 = 0000_000F_FC00_0800h Caches 0-64 MB as WB cache type.
MTRRPhysBase1 = 0000_0000_0400_0006h
MTRRPhysMask1 = 0000_000F_FE00_0800h Caches 64-96 MB as WB cache type.
MTRRPhysBase2 = 0000_0000_0600_0006h
MTRRPhysMask2 = 0000_000F_FFC0_0800h Caches 96-100 MB as WB cache type.
MTRRPhysBase3 = 0000_0000_0400_0000h
MTRRPhysMask3 = 0000_000F_FFC0_0800h Caches 64-68 MB as UC cache type.
MTRRPhysBase4 = 0000_0000_00F0_0000h
MTRRPhysMask4 = 0000_000F_FFF0_0800h Caches 15-16 MB as UC cache type.
MTRRPhysBase5 = 0000_0000_A000_0001h
MTRRPhysMask5 = 0000_000F_FF80_0800h Caches A0000000h-A0800000h as WC type.
This MTRR setup uses the ability to overlap any two memory ranges (as long as the ranges are
mapped to WB and UC memory types) to minimize the number of MTRR registers that are
required to configure the memory environment. This setup also fulfills the requirement that two
register pairs are left for operating system usage.
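The register values in this example can be checked mechanically. The sketch below is illustrative only: the helper decode_variable_mtrr and the table TYPE_NAMES are hypothetical names, the type encodings (0 = UC, 1 = WC, 4 = WT, 5 = WP, 6 = WB) are the usual MTRR encodings rather than text from this excerpt, and the size calculation is valid only for contiguous masks.

```python
TYPE_NAMES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}  # assumed encodings
ADDR_FIELD = 0xF_FFFF_F000   # bits 35-12 of a 36-bit physical address
VALID_BIT = 1 << 11          # valid flag in the mask register
MB = 1 << 20


def decode_variable_mtrr(phys_base, phys_mask):
    """Return (base, size_in_bytes, type_name), or None if the pair is invalid.

    Assumes a contiguous mask, as in the example above.
    """
    if not phys_mask & VALID_BIT:
        return None
    base = phys_base & ADDR_FIELD
    mask = phys_mask & ADDR_FIELD
    size = (~mask & 0xF_FFFF_FFFF) + 1   # contiguous masks only
    return base, size, TYPE_NAMES[phys_base & 0xFF]


# The six register pairs from the example, as (base, mask).
EXAMPLE_PAIRS = [
    (0x0000_0000_0000_0006, 0x0000_000F_FC00_0800),
    (0x0000_0000_0400_0006, 0x0000_000F_FE00_0800),
    (0x0000_0000_0600_0006, 0x0000_000F_FFC0_0800),
    (0x0000_0000_0400_0000, 0x0000_000F_FFC0_0800),
    (0x0000_0000_00F0_0000, 0x0000_000F_FFF0_0800),
    (0x0000_0000_A000_0001, 0x0000_000F_FF80_0800),
]
DECODED = [decode_variable_mtrr(base, mask) for base, mask in EXAMPLE_PAIRS]
```

Each entry of DECODED gives the start address, length, and type of one range, which can be compared against the comments next to the register values.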
2. Otherwise, the processor attempts to match the physical address with a memory type range
set with a pair of variable-range MTRRs:
a. If one variable memory range matches, the processor uses the memory type stored in
the MTRRphysBasen register for that range.
b. If two or more variable memory ranges match and the memory types are identical,
then that memory type is used.
c. If two or more variable memory ranges match and one of the memory types is UC, the
UC memory type is used.
d. If two or more variable memory ranges match and the memory types are WT and WB,
the WT memory type is used.
e. If two or more variable memory ranges match and the memory types are other than UC
and WB, the behavior of the processor is undefined.
3. If no fixed or variable memory range matches, the processor uses the default memory type.
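Rules 2a through 2e for overlapping variable ranges can be expressed as a short function. This is a sketch under stated assumptions: the name resolve_variable_overlap is hypothetical, the input is the list of type names of every matching variable range, and None is used here to represent the undefined case.

```python
def resolve_variable_overlap(types):
    """Combine the types of all matching variable ranges (rules 2a to 2e).

    `types` is a non-empty list such as ["WB", "UC"]. Returns the resulting
    type name, or None where the text says the behavior is undefined.
    """
    distinct = set(types)
    if len(distinct) == 1:           # rules a and b: one type in play
        return distinct.pop()
    if "UC" in distinct:             # rule c: UC wins over everything
        return "UC"
    if distinct == {"WT", "WB"}:     # rule d: WT wins over WB
        return "WT"
    return None                      # rule e: undefined combination
```

The case where no range matches (rule 3, the default type) is outside this function and would be handled by the caller.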
3. A memory type that views write data as not necessarily stored and read back by a
subsequent read, such as the write-protected type, can only be mapped to another type with
the same behavior (and there are no others for the P6 family processors) or to the
uncacheable type.
In many specific cases, a system designer can have additional information about how a memory
type is used, allowing additional mappings. For example, write-through memory with no
associated write side effects can be mapped into write-back memory.
FI;
IF TYPE is invalid for P6 family processors
THEN return UNSUPPORTED;
FI;
IF TYPE is WC and not supported
THEN return UNSUPPORTED;
FI;
IF MTRRcap.FIX is set AND range can be mapped using a fixed-range MTRR
THEN
pre_mtrr_change();
update affected MTRR;
post_mtrr_change();
FI;
pre_mtrr_change()
BEGIN
disable interrupts;
Save current value of CR4;
disable and flush caches;
flush TLBs;
disable MTRRs;
IF multiprocessing
THEN maintain consistency through IPIs;
FI;
END
post_mtrr_change()
BEGIN
flush caches and TLBs;
enable MTRRs;
enable caches;
restore value of CR4;
enable interrupts;
END
The physical address to variable range mapping algorithm in the MemTypeSet function detects
conflicts with current variable range registers by cycling through them and determining whether
the physical address in question matches any of the current ranges. During this scan, the
algorithm can detect whether any current variable ranges overlap and can be concatenated into a
single range.
The pre_mtrr_change() function disables interrupts prior to changing the MTRRs, to avoid
executing code with a partially valid MTRR setup. The algorithm disables caching by setting
the CD flag and clearing the NW flag in control register CR0. The caches are invalidated using
the WBINVD instruction. The algorithm disables the page global flag (PGE) in control register
CR4, if necessary, then flushes all TLB entries by updating control register CR3. Finally, it
disables MTRRs by clearing the E flag in the MTRRdefType register.
After the memory type is updated, the post_mtrr_change() function re-enables the MTRRs and
again invalidates the caches and TLBs. This second invalidation is required because of the
processor’s aggressive prefetch of both instructions and data. The algorithm restores interrupts
and re-enables caching by clearing the CD flag.
An operating system can batch multiple MTRR updates so that only a single pair of cache
invalidations occurs.
4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag
to 0.)
5. Flush all caches using the WBINVD instruction.
6. Clear the PGE flag in control register CR4 (if set).
7. Flush all TLBs. (Execute a MOV from control register CR3 to another register and then a
MOV from that register back to CR3.)
8. Disable all range registers (by clearing the E flag in register MTRRdefType). If only
variable ranges are being modified, software may clear the valid bits for the affected
register pairs instead.
9. Update the MTRRs.
10. Enable all range registers (by setting the E flag in register MTRRdefType). If only
variable-range registers were modified and their individual valid bits were cleared, then set
the valid bits for the affected ranges instead.
11. Flush all caches and all TLBs a second time. (The TLB flush is required for P6 family
processors. Executing the WBINVD instruction is not needed when using P6 family
processors, but it may be needed in future systems.)
12. Enter the normal cache mode to re-enable caching. (Set the CD and NW flags in control
register CR0 to 0.)
13. Set PGE flag in control register CR4, if previously cleared.
14. Wait for all processors to reach this point.
15. Enable interrupts.
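The CR0 manipulations in steps 4 and 12 amount to simple bit operations. The sketch below models them on a plain integer; it does not touch any real register. The function names are hypothetical, and the position of the CD flag (bit 30) is the standard CR0 layout rather than text from this excerpt, which only gives the NW flag as bit 29.

```python
CR0_NW = 1 << 29   # NW flag, bit 29 of CR0
CR0_CD = 1 << 30   # CD flag, bit 30 of CR0 (standard layout, assumed here)


def enter_no_fill_mode(cr0):
    """Step 4: return a CR0 value with CD = 1 and NW = 0."""
    return (cr0 | CR0_CD) & ~CR0_NW


def enter_normal_cache_mode(cr0):
    """Step 12: return a CR0 value with CD = 0 and NW = 0."""
    return cr0 & ~(CR0_CD | CR0_NW)
```

All other CR0 bits pass through unchanged, so the two helpers can be applied to whatever value was read from the register.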
The P6 family processors provide special support for the physical memory range from 0 to 4
MBytes, which is potentially mapped by both the fixed and variable MTRRs. This support is
invoked when a P6 family processor detects a large page overlapping the first 1 MByte of this
memory range with a memory type that conflicts with the fixed MTRRs. Here, the processor
maps the memory range as multiple 4-KByte pages within the TLB. This operation ensures
correct behavior at the cost of performance. To avoid this performance penalty, operating-
system software should reserve the large page option for regions of memory at addresses greater
than or equal to 4 MBytes.
NOTE
In multiple processor systems, the operating system(s) must maintain MTRR
consistency between all the processors in the system. The P6 family
processors provide no hardware support for maintaining this consistency. In
general, all processors must have the same MTRR values.
9.13.1. Background
The P6 family of processors supports the assignment of specific memory types to physical
addresses. Memory type support is provided through the use of Memory Type Range Registers
(MTRRs). Currently there are two interacting mechanisms that work together to set the effective
memory type: the MTRRs and the page tables. Refer to the Intel Architecture Software
Developer’s Manual, Volume 3: System Programming Guide.
The MTRRs define the memory types for physical address ranges. MTRRs have specific
alignment and length requirements for the memory regions they describe. Therefore, they are useful
for statically describing memory types for physical ranges, and are typically set up by the system
BIOS. However, they are incapable of describing memory types for the dynamic, linearly
addressed data structures of programs. The MTRRs are an expandable and programmable way
to encode memory types, but are inflexible because they can only apply those memory types to
physical address ranges.
The page tables allow memory types to be assigned dynamically to linearly addressed pages of
memory. This gives the operating system the maximum amount of flexibility in applying
memory types to any data structure. However, the page tables only offer three of the five basic
P6 processor family memory type encodings: Write-back (WB), Write-through (WT) and
Uncached (UC). The PAT extends the existing page-table format to enable the specification of
additional memory types.
PAT register fields:
Bits 63-59   Rsvd
Bits 58-56   PA7
Bits 55-51   Rsvd
Bits 50-48   PA6
Bits 47-43   Rsvd
Bits 42-40   PA5
Bits 39-35   Rsvd
Bits 34-32   PA4
Bits 31-27   Rsvd
Bits 26-24   PA3
Bits 23-19   Rsvd
Bits 18-16   PA2
Bits 15-11   Rsvd
Bits 10-8    PA1
Bits 7-3     Rsvd
Bits 2-0     PA0
NOTES:
1. PA0-7 = Specifies the eight page attribute locations contained within the PAT
2. Rsvd = Most significant bits for each Page Attribute are reserved for future expansion
Each of the eight page attribute fields can contain any of the available memory type encodings,
or indexes, as specified in Table 9-1.
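Because each page attribute field occupies the low three bits of one byte of the register, reading or replacing a field is a shift and a mask. The helpers below are a hypothetical sketch operating on a plain 64-bit integer, not on the register itself.

```python
def pat_get(pat, index):
    """Return the 3-bit memory-type encoding stored in PAn (index 0..7)."""
    return (pat >> (8 * index)) & 0x7


def pat_set(pat, index, encoding):
    """Return a copy of the 64-bit PAT value with field PAn replaced."""
    shift = 8 * index
    return (pat & ~(0x7 << shift)) | ((encoding & 0x7) << shift)
```

For instance, pat_set(pat, 1, 1) writes encoding 1 into PA1 and leaves the other seven fields and the reserved bits as they were.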
NOTES:
1. PATi bit is defined as bit 7 for 4 KB PTEs, bit 12 for PDEs mapping 2 MB/4 MB pages.
2. UC- is the page encoding PCD, PWT = 10 on P6 family processors that do not support this feature. UC-
in the page table is overridden by WC in the MTRRs.
3. UC is the page encoding PCD, PWT = 11 on P6 family processors that do not support this feature. UC in
the page-table overrides WC in the MTRRs.
In P6 family processors that do not support the PAT, the PCD and PWT bits are used to
determine the page-table memory types of a given physical page. The PAT feature redefines these two
bits and combines them with a newly defined PAT-index bit (PATi) in the page-directory and
page-table entries. These three bits create an index into the 8-entry Page Attribute Table. The
memory type from the PAT is used in place of PCD and PWT for computing the effective
memory type.
The bit used for PATi differs depending upon the level of the paging hierarchy. PATi is bit 7 for
page-table entries, and bit 12 for page-directory entries that map to large pages. Reserved bit
faults are disabled for nonzero values for PATi, but remain present for all other reserved bits.
This is true for 4 KB/2 MB pages when PAE is enabled. The PAT index scheme for each level
of the paging hierarchy is shown in Figure 9-8.
Structure                             PATi      PCD      PWT      Index into the PAT
Page-Directory Base Register (CR3)    none      bit 4    bit 3    2 bits, first 4 entries
Page-Directory Pointer Table Entry    none      bit 4    bit 3    2 bits, first 4 entries
4 KB Page-Directory Entry             none      bit 4    bit 3    2 bits, first 4 entries
2 MB/4 MB Page-Directory Entry        bit 12    bit 4    bit 3    3 bits, all 8 entries
4 KB Page-Table Entry                 bit 7     bit 4    bit 3    3 bits, all 8 entries
Figure 9-8. Page Attribute Table Index Scheme for Paging Hierarchy
NOTE:
This figure only shows the format of the lower 32 bits of the PDE, PDEPTR, and PTEs when in PAE mode.
Refer to Figure 3-21 from Chapter 3, Protected-Mode Memory Management of the Intel Architecture
Software Developer’s Manual, Volume 3: System Programming Guide. Additionally, the formats shown in this
figure are not meant to accurately represent the entire structure, but only the labeled bits.
Figure 9-8 shows that the PAT bit is not defined in CR3, the Page-Directory-Pointer Tables when
PAE is enabled, or the Page Directory when it doesn’t describe a large page. In these cases, only
PCD and PWT are used to index into the PAT, limiting the operating system to using only the
first 4 entries of PAT for describing the memory attributes of the paging hierarchy. Note that all
8 PAT entries are available for describing a 4 KB/2 MB/4 MB page.
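The index computation described here can be sketched as follows. The function and constant names are hypothetical. The sketch takes the raw entry value plus the bit position of PATi for that level, and it assumes PATi is the most significant bit of the index, which is consistent with PCD and PWT alone selecting the first four entries.

```python
PWT_BIT = 3
PCD_BIT = 4
PATI_BIT_PTE = 7           # 4-KByte page-table entry
PATI_BIT_LARGE_PDE = 12    # PDE mapping a 2-MByte or 4-MByte page


def pat_index(entry, pati_bit=None):
    """Return the PAT index selected by a paging-structure entry.

    Pass pati_bit=None for CR3, page-directory-pointer-table entries, and
    PDEs that point to a page table; only PCD and PWT are then used and
    the index is limited to 0..3.
    """
    pwt = (entry >> PWT_BIT) & 1
    pcd = (entry >> PCD_BIT) & 1
    pati = (entry >> pati_bit) & 1 if pati_bit is not None else 0
    return (pati << 2) | (pcd << 1) | pwt
```

The same entry value can therefore select different PAT entries depending on whether it is interpreted as a page-table entry or as a structure without a PATi bit.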
The memory type as now defined by PAT interacts with the MTRR memory type to determine
the effective memory type as outlined in Table 9-9. Compare this to Table 9-5.
NOTES:
• This table assumes that the CD and NW flags in register CR0 are set to 0. If CR0.CD = 1, then the
effective memory type returned is UC, regardless of what is indicated in the table. However, this does
not force strict ordering. To ensure strict ordering, the MTRRs also must be disabled.
• The effective memory types in the gray areas are implementation dependent and may be different
between implementations of Intel Architecture processors.
• UC_MTRR indicates that the UC attribute came from the MTRRs and the processor(s) are not required to
snoop their caches since the data could never have been cached. This is preferred for performance
reasons.
• UC_PAGE indicates that the UC attribute came from the page tables and processors are required to
check their caches because the data may be cached due to page aliasing, which is not recommended.
• UC- is the page encoding PCD, PWT = 10 on P6 family processors that do not support this feature. UC- in the
PTE/PDE is overridden by WC in the MTRRs.
• UC is the page encoding PCD, PWT = 11 on P6 family processors that do not support this feature. UC in the
PTE/PDE overrides WC in the MTRRs.
Whenever the MTRRs are disabled, via bit 11 (E) in the MTRRdefType register, the effective
memory type is UC for all memory ranges.
An operating system can program the PAT and select the 8 most useful attribute combinations.
The PAT allows an operating system to offer performance-enhancing memory types to
applications.
The page attribute for addresses containing a page directory or page table supports only the first
four entries in the PAT, since a PAT-index bit is not defined for these mappings. The page
attribute is determined by using the two-bit value specified by PCD and PWT in CR3 (for page
directory) or the page-directory entry (for page tables). The same applies to Page-Directory-
Pointer Tables when PAE is enabled.
The operating system is responsible for ensuring that changes to a PAT entry occur in a manner
that maintains the consistency of the processor caches and translation lookaside buffers (TLB).
This is accomplished by following the procedure as specified in the Intel Architecture Software
Developer’s Manual, Volume 3: System Programming Guide, for changing the value of an
MTRR. It involves a specific sequence of operations that includes flushing the processors’
caches and TLBs. An operating system must ensure that the PATs of all processors in a
multiprocessing system have the same values.
The PAT allows any memory type to be specified in the page tables, and therefore it is possible
to have a single physical page mapped by two different linear addresses with differing memory
types. This practice is strongly discouraged by Intel and should be avoided as it may lead to
undefined results. In particular, a WC page must never be aliased to a cacheable page because
WC writes may not check the processor caches. When remapping a page that was previously
mapped as a cacheable memory type to a WC page, an operating system can avoid this type of
aliasing by:
• Removing the previous mapping to a cacheable memory type in the page tables; that is,
make them not present.
• Flushing the TLBs of processors that may have used the mapping, even speculatively.
• Creating a new mapping to the same physical address with a new memory type, for
instance, WC.
• Flushing the caches on all processors that may have used the mapping previously.
Operating systems that use a Page Directory as a Page Table and enable Page Size Extensions
must carefully scrutinize the use of the PATi index bit for the 4 KB Page-Table Entries. The PATi
index bit for a PTE (bit 7) corresponds to the page size bit in a PDE. Therefore, the operating
system can only utilize PAT entries PA0-3 when setting the caching type for a page table that is
also used as a page directory. If the operating system attempts to use PAT entries PA4-7 when
using this memory as a page table, it effectively sets the PS bit for the access to this memory as
a page directory.
CHAPTER 10
MMX™ TECHNOLOGY SYSTEM PROGRAMMING
This chapter describes those features of the MMX™ technology that must be considered when
designing or enhancing an operating system to support MMX™ technology. It covers MMX™
instruction set emulation, the MMX™ state, aliasing of MMX™ registers, saving MMX™ state,
task and context switching considerations, exception handling, and debugging.
• Each time an MMX™ instruction is executed, the TOS value is set to 000B.
Execution of MMX™ instructions does not affect the other bits in the FPU status word (bits 0
through 10 and bits 14 and 15) or the contents of the other FPU registers that comprise the FPU
state (the FPU control word, instruction pointer, data pointer, or opcode registers).
Table 10-1 summarizes the effects of the MMX™ instructions on the FPU state.
NOTE:
MMn refers to one MMX™ register; Rn refers to corresponding floating-point register.
The values in the fields of the FPU tag word do not affect the contents of the MMX™ registers
or the execution of MMX™ instructions. However, the MMX™ instructions do modify the
contents of the FPU tag word, as is described in Section 10.2., “The MMX™ State and MMX™
Register Aliasing”. These modifications may affect the operation of the FPU when executing
floating-point instructions, if the FPU state is not initialized or restored prior to beginning
floating-point instruction execution.
Note that the FXSAVE/FSAVE and FSTENV instructions (which save FPU state information)
read the FPU tag register and contents of each of the floating-point registers, determine the
actual tag values for each register (empty, nonzero, zero, or special), and store the updated tag
word in memory. After executing these instructions, all the tags in the FPU tag word are set to
empty (11B). Likewise, the EMMS instruction clears MMX™ state from the MMX™/floating-
point registers by setting all the tags in the FPU tag word to 11B.
NOTE
Intel does not support scanning the FPU tag word and then only saving valid
entries.
The TS flag in control register CR0 is provided to allow the operating system to delay saving
the MMX™/FPU state until the FPU is actually accessed in the new task. When this flag is set,
the processor monitors the instruction stream for MMX™ or floating-point instructions. When
the processor detects an MMX™ or floating-point instruction, it raises a device-not-available
exception (#NM) prior to executing the instruction. The device-not-available exception handler
can then be used to save the MMX™/FPU state for the previous task (using an FXSAVE/FSAVE
instruction) and load the MMX™/FPU state for the current task (using an FXRSTOR/FRSTOR
instruction). If the task never encounters an MMX™ or floating-point instruction, the
device-not-available exception will not be raised and the MMX™/FPU state will not be saved
unnecessarily.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0)
or implicitly (using the processor’s native task switching mechanism). When the native task
switching mechanism is used, the processor automatically sets the TS flag on a task switch.
After the device-not-available handler has saved the MMX™/FPU state, it should execute the
CLTS instruction to clear the TS flag in CR0.
Figure 10-2 gives an example of an operating system that implements MMX™/FPU state saving
using the TS flag. In this example, task A is the currently running task and task B is the task
being switched to.
[Figure 10-2: MMX™/FPU state saving during a switch from task A to task B. At the application
level, task A is the MMX™/FPU state owner. At the operating-system level, the task switching
code runs first. When CR0.TS=1 and a floating-point or MMX™ instruction is encountered, the
device-not-available exception handler saves task A’s MMX™/FPU state to the task A MMX™/FPU
state save area and loads task B’s MMX™/FPU state from the task B MMX™/FPU state save area.]
The operating system maintains an MMX™/FPU save area for each task and defines a variable
(MMX™/FPUStateOwner) that indicates which task “owns” the MMX™/FPU state. In this
example, task A is the current MMX™/FPU state owner.
On a task switch, the operating system task switching code must execute the following
pseudo-code to set the TS flag according to who is the current MMX™/FPU state owner. If the new
task (task B in this example) is not the current MMX™/FPU state owner, the TS flag is set to 1;
otherwise, it is set to 0.
IF Task_Being_Switched_To ≠ MMX/FPUStateOwner
THEN
CR0.TS ← 1;
ELSE
CR0.TS ← 0;
FI;
If a new task attempts to use an MMX™ or floating-point instruction while the TS flag is set to
1, a device-not-available exception (#NM) is generated and the device-not-available exception
handler executes the following pseudo-code.
CR0.TS ← 0;
FSAVE “To MMX/FPU State Save Area for Current MMX/FPU State Owner”;
FRSTOR “MMX/FPU State From Current Task’s MMX/FPU State Save Area”;
MMX/FPUStateOwner ← Current_Task;
This handler code performs the following tasks:
• Clears the TS flag.
• Saves the MMX™/FPU state in the state save area for the current MMX™/FPU state
owner.
• Restores the MMX™/FPU state from the new task’s MMX™/FPU state save area.
• Updates the current MMX™/FPU state owner to be the current task.
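The interaction between the task-switch code and the device-not-available handler can be simulated with a toy model. Everything in the sketch below is hypothetical (the class LazyFpuModel, its attribute names, and the use of strings to stand in for register contents); it only mirrors the two pseudo-code fragments above and counts how many state saves actually occur.

```python
class LazyFpuModel:
    """Toy model of TS-flag-driven MMX/FPU state saving (no real registers)."""

    def __init__(self, tasks, first_task):
        self.save_area = {t: f"initial state of {t}" for t in tasks}
        self.live_state = self.save_area[first_task]  # what the FPU holds now
        self.state_owner = first_task
        self.current_task = first_task
        self.ts = 0
        self.saves = 0

    def task_switch(self, new_task):
        # Task-switch pseudo-code: TS = 1 unless the new task owns the state.
        self.current_task = new_task
        self.ts = 0 if new_task == self.state_owner else 1

    def use_fpu(self, new_state):
        """Execute an MMX or floating-point instruction in the current task."""
        if self.ts:
            self._device_not_available()
        self.live_state = new_state

    def _device_not_available(self):
        self.ts = 0                                           # clear TS
        self.save_area[self.state_owner] = self.live_state    # FSAVE
        self.saves += 1
        self.live_state = self.save_area[self.current_task]   # FRSTOR
        self.state_owner = self.current_task
```

Switching to task B and back to task A without task B touching the FPU leaves the save counter at zero, which is the saving the TS flag is meant to provide.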
— Device not available (#NM), if an MMX™ instruction is executed when the TS flag in
control register CR0 is set. (Refer to Section 10.4.1., “Using the TS Flag in
Control Register CR0 to Control MMX™/FPU State Saving”.)
• Floating-point error (#MF). (Refer to Section 10.5.1., “Effect of MMX™ Instructions
on Pending Floating-Point Exceptions”.)
• Other exceptions can occur indirectly due to the faulty execution of the exception handlers
for the above exceptions. For example, if a stack-segment fault (#SS) occurs due to
MMX™ instructions, the interrupt gate for the stack-segment fault can direct the processor
to an invalid TSS, causing an invalid TSS exception (#TS) to be generated.
10.6. DEBUGGING
The debug facilities of the Intel Architecture operate in the same manner when executing
MMX™ instructions as when executing other Intel Architecture instructions. These facilities
enable debuggers to debug MMX™ technology code.
To correctly interpret the contents of the MMX™ or FPU registers from the FXSAVE/FSAVE
image in memory, a debugger needs to take account of the relationship between the
floating-point registers’ logical locations relative to TOS and the MMX™ registers’ physical locations.
In the floating-point context, STn refers to a floating-point register at location n relative to the
TOS. However, the tags in the FPU tag word are associated with the physical locations of the
floating-point registers (R0 through R7). The MMX™ registers always refer to the physical
locations of the registers (with MM0 through MM7 being mapped to R0 through R7).
In Figure 10-2, the inner circle refers to the physical location of the floating-point and MMX™
registers. The outer circle refers to the floating-point registers’ location relative to the current
TOS.
When the TOS equals 0 (case A in Figure 10-2), ST0 points to the physical location R0 on the
floating-point stack. MM0 maps to ST0, MM1 maps to ST1, and so on.
When the TOS equals 2 (case B in Figure 10-2), ST0 points to the physical location R2. MM0
maps to ST6, MM1 maps to ST7, MM2 maps to ST0, and so on.
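A debugger can express this mapping with modulo-8 arithmetic on the register index and the TOS value. The following is a minimal sketch; the function names `st_index_for_mmx` and `physical_for_st` are hypothetical, and `tos` is assumed to be the 3-bit top-of-stack value taken from the saved FPU status word.

```c
/* MMn always names physical register Rn, so its stack-relative name is
   ST((n - TOS) mod 8). Unsigned wraparound plus the mask gives mod 8. */
unsigned st_index_for_mmx(unsigned mm, unsigned tos)
{
    return (mm - tos) & 7u;
}

/* STn names the physical register R((n + TOS) mod 8). */
unsigned physical_for_st(unsigned st, unsigned tos)
{
    return (st + tos) & 7u;
}
```

With `tos` equal to 2, `st_index_for_mmx(0, 2)` yields 6 and `st_index_for_mmx(2, 2)` yields 0, matching case B above.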
CHAPTER 11
STREAMING SIMD EXTENSIONS SYSTEM
PROGRAMMING
This chapter describes those features of the Streaming SIMD Extensions that must be considered
when designing or enhancing an operating system to support the Pentium® III processor. It
covers extensions emulation, the new SIMD floating-point architectural state, similarities to
MMX™ technology, task and context switching considerations, exception handling, and debug-
ging.
Bits:   31-16      15   14-13   12   11   10   9    8    7    6      5    4    3    2    1    0
Field:  Reserved   FZ   RC      PM   UM   OM   ZM   DM   IM   Rsvd   PE   UE   OE   ZE   DE   IE
Bits 5-0 indicate whether a Streaming SIMD Extensions numerical exception has been detected.
They are “sticky” flags, and can be cleared by using the LDMXCSR instruction to write zeroes
to these fields. If a LDMXCSR instruction clears a mask bit and sets the corresponding excep-
tion flag bit, an exception will not be generated because of this change. This type of exception
will occur only upon the next Streaming SIMD Extensions instruction to cause it. Streaming
SIMD Extensions use only one exception flag for each exception. There is no provision for indi-
vidual exception reporting within a packed data type. In situations where multiple identical
exceptions occur within the same instruction, the associated exception flag is updated and indi-
cates that at least one of these conditions happened. These flags are cleared upon reset.
Bits 12-7 configure numerical exception masking; an exception type is masked if the corre-
sponding bit is set and it is unmasked if the bit is clear. These bits are set upon reset, meaning
that all numerical exceptions are masked.
Bits 14-13 encode the rounding control, which provides for the common round to nearest mode,
as well as directed rounding and true chop (refer to Section 11.3.2.1., “Rounding Control
Field”). The rounding control is set to round to nearest upon reset.
Bit 15 (FZ) is used to turn on the flush-to-zero mode (refer to Section 11.3.2.2., “Flush-to-
Zero”). This bit is cleared upon reset, disabling the flush-to-zero mode.
The other bits of MXCSR (bits 31-16 and bit 6) are defined as reserved and cleared; attempting
to write a non-zero value to these bits, using either the FXRSTOR or LDMXCSR instructions,
will result in a general protection exception.
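The field boundaries described above translate directly into bit masks. The sketch below decodes an MXCSR image (for example, one read from an FXSAVE area) and checks whether a value could be loaded without touching reserved bits. The macro and function names are hypothetical; the reset value 1F80H is derived from the reset behavior stated above (all masks set, everything else clear).

```c
#include <stdint.h>

#define MXCSR_FLAG_BITS   0x0000003Fu  /* bits 5-0: exception flags        */
#define MXCSR_MASK_BITS   0x00001F80u  /* bits 12-7: exception masks       */
#define MXCSR_RC_BITS     0x00006000u  /* bits 14-13: rounding control     */
#define MXCSR_FZ_BIT      0x00008000u  /* bit 15: flush-to-zero            */
#define MXCSR_RESERVED    0xFFFF0040u  /* bits 31-16 and bit 6             */
#define MXCSR_RESET_VALUE 0x00001F80u  /* masks set, flags/RC/FZ clear     */

/* Exception flags as a 6-bit value (bit 0 = IE ... bit 5 = PE). */
uint32_t mxcsr_flags(uint32_t mxcsr) { return mxcsr & MXCSR_FLAG_BITS; }

/* Exception masks as a 6-bit value (bit 0 = IM ... bit 5 = PM). */
uint32_t mxcsr_masks(uint32_t mxcsr) { return (mxcsr & MXCSR_MASK_BITS) >> 7; }

/* Two-bit rounding control field; 0 is round to nearest (the reset state). */
uint32_t mxcsr_rounding_control(uint32_t mxcsr) { return (mxcsr & MXCSR_RC_BITS) >> 13; }

/* Nonzero when flush-to-zero mode is enabled. */
int mxcsr_flush_to_zero(uint32_t mxcsr) { return (mxcsr & MXCSR_FZ_BIT) != 0; }

/* Nonzero when no reserved bit is set, i.e. loading the value with
   LDMXCSR or FXRSTOR would not raise a general protection exception. */
int mxcsr_is_loadable(uint32_t mxcsr) { return (mxcsr & MXCSR_RESERVED) == 0; }
```

An operating system that builds an initial FXSAVE image for a new task could run such a check on the MXCSR value before the first FXRSTOR.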
The round up and round down modes are termed directed rounding and can be used to imple-
ment interval arithmetic. Interval arithmetic is used to determine upper and lower bounds for the
true result of a multistep computation, when the intermediate results of the computation are
subject to rounding.
The round toward zero mode (sometimes called the “chop” mode) is commonly used when
performing integer arithmetic with the processor.
Whenever possible, the processor produces an infinitely precise result. However, it is often the
case that the infinitely precise result of an arithmetic or store operation cannot be encoded
exactly in the format of the destination operand. For example, the following value (a) has a
24-bit fraction. The least-significant bit of this fraction (the rightmost bit) cannot be encoded
exactly in the single-real format (which has only a 23-bit fraction):
(a) 1.0001 0000 1000 0011 1001 0111 E₂ 101
To round this result (a), the processor first selects two representable fractions b and c that most
closely bracket a in value (b < a < c).
(b) 1.0001 0000 1000 0011 1001 011 E₂ 101
(c) 1.0001 0000 1000 0011 1001 100 E₂ 101
The processor then sets the result to b or to c according to the rounding mode selected in the RC
field. Rounding introduces an error in a result that is less than one unit in the last place to which
the result is rounded.
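The choice between b and c can be illustrated with integer arithmetic on the fraction bits. The sketch below is a simplified model, not the processor's implementation: it handles only the case in the example (one extra fraction bit), the enum and function names are hypothetical, and a carry out of the 23-bit fraction is not renormalized. The 24-bit fraction of (a) is 108397H, so b is 0841CBH and c is 0841CCH.

```c
#include <stdint.h>

enum rounding_mode { ROUND_NEAREST, ROUND_DOWN, ROUND_UP, ROUND_TOWARD_ZERO };

/* Round a 24-bit fraction (magnitude) to 23 bits. 'negative' is the sign of
   the value. Returns b (truncated) or c (b + 1) according to the mode. */
uint32_t round_fraction_24_to_23(uint32_t frac24, int negative,
                                 enum rounding_mode mode)
{
    uint32_t b = frac24 >> 1;        /* nearest fraction toward zero        */
    uint32_t dropped = frac24 & 1u;  /* the bit that cannot be encoded      */

    if (!dropped)
        return b;                    /* exact: no rounding needed           */

    switch (mode) {
    case ROUND_NEAREST:              /* one dropped bit set is always a tie */
        return (b & 1u) ? b + 1 : b; /* ties go to the even fraction        */
    case ROUND_DOWN:                 /* toward minus infinity               */
        return negative ? b + 1 : b;
    case ROUND_UP:                   /* toward plus infinity                */
        return negative ? b : b + 1;
    case ROUND_TOWARD_ZERO:
    default:
        return b;                    /* chop                                */
    }
}
```

For the positive value (a), round to nearest and round up select c, while round down and round toward zero select b.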
The rounded result is called the inexact result. When the processor produces an inexact result,
the floating-point precision (inexact) flag (PE) is set in MXCSR.
When the infinitely precise result is between the largest positive finite value allowed in a partic-
ular format and +∞, the processor rounds the result as shown in Table 11-3.
When the infinitely precise result is between the largest negative finite value allowed in a partic-
ular format and −∞, the processor rounds the result as shown in Table 11-4.
The rounding modes have no effect on comparison operations, operations that produce exact
results, or operations that produce NaN results.
11.3.2.2. FLUSH-TO-ZERO
Turning on the Flush-To-Zero mode has the following effects when tiny results occur (i.e. when
the infinitely precise result rounded to the destination precision with an unbounded exponent, is
smaller in absolute value than the smallest normal number that can be represented; this is similar
to the underflow condition when underflow traps are unmasked):
• Zero results are returned with the sign of the true result
• Precision and underflow exception flags are set
The IEEE mandated masked response to underflow is to deliver the denormalized result (i.e.,
gradual underflow); consequently, the flush-to-zero mode is not compatible with IEEE Standard
754. It is provided primarily for performance reasons. At the cost of a slight precision loss, faster
execution can be achieved for applications where underflow is common. Underflow for flush-
to-zero is defined to occur when the exponent for a computed result, prior to denormalization
scaling, falls in the denormal range; this is regardless of whether a loss of accuracy has occurred.
Unmasking the underflow exception takes precedence over flush-to-zero mode; this means that
an exception handler will be invoked for a Streaming SIMD Extensions instruction that gener-
ates an underflow condition while this exception is unmasked, regardless of whether flush-to-
zero is enabled.
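The interaction between the underflow mask and flush-to-zero mode can be summarized as a small decision function. This is a sketch under stated assumptions: the names are hypothetical, the result is modeled as raw single-precision bits, and only the flush-to-zero path updates the flags; the other two paths are left to the surrounding text.

```c
#include <stdint.h>

#define MXCSR_UE (1u << 4)    /* underflow exception flag */
#define MXCSR_PE (1u << 5)    /* precision (inexact) flag */
#define MXCSR_UM (1u << 11)   /* underflow exception mask */
#define MXCSR_FZ (1u << 15)   /* flush-to-zero enable     */

enum tiny_action {
    TINY_TRAP,       /* underflow unmasked: exception handler is invoked */
    TINY_FLUSH,      /* underflow masked, FZ set: signed zero returned   */
    TINY_DENORMAL    /* underflow masked, FZ clear: gradual underflow    */
};

/* Decide how a tiny result is handled for a given MXCSR value. */
enum tiny_action tiny_result_action(uint32_t mxcsr)
{
    if (!(mxcsr & MXCSR_UM))
        return TINY_TRAP;        /* unmasking takes precedence over FZ */
    if (mxcsr & MXCSR_FZ)
        return TINY_FLUSH;
    return TINY_DENORMAL;
}

/* Flush-to-zero response: set the underflow and precision flags and return
   the bit pattern of a zero carrying the sign of the true result. */
uint32_t flush_to_zero(uint32_t *mxcsr, int negative)
{
    *mxcsr |= MXCSR_UE | MXCSR_PE;
    return negative ? 0x80000000u : 0x00000000u;
}
```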
tion of any Streaming SIMD Extensions instruction will cause an invalid opcode fault regardless
of the state of CR0.TS.
instruction to read the saved contents of the MXCSR register from memory into the
MXCSR register.
The TS flag in control register CR0 is provided to allow the operating system to delay saving
the SIMD floating-point state until the SIMD floating-point registers are actually accessed in the
new task. When this flag is set, the processor monitors the instruction stream for Streaming
SIMD Extensions instructions. When the processor detects a Streaming SIMD Extensions
instruction, it raises a device-not-available exception (#NM) prior to executing the instruction.
The device-not-available exception handler can then be used to save the SIMD floating-point
state for the previous task (using an FXSAVE instruction) and load the SIMD floating-point state
for the current task (using an FXRSTOR instruction). If the task never encounters a Streaming
SIMD Extensions instruction, the device-not-available exception will not be raised and the
SIMD floating-point state will not be saved unnecessarily.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0)
or implicitly (using the processor’s native task switching mechanism). When the native task
switching mechanism is used, the processor automatically sets the TS flag on a task switch.
After the device-not-available handler has saved the SIMD floating-point state, it should execute
the CLTS instruction to clear the TS flag in CR0.
Figure 11-2 gives an example of an operating system that implements SIMD floating-point state
saving using the TS flag. In this example, task A is the currently running task and task B is the
task being switched to.
[Figure 11-2 shows task A and task B, each with its own SIMD floating-point state save area, and the operating system's task switching code. When CR0.TS = 1 and a Streaming SIMD Extensions instruction is encountered, the device-not-available exception handler saves task A's SIMD floating-point state and loads task B's SIMD floating-point state.]
Figure 11-2. Example of SIMD Floating-Point State Saving During an Operating System-Controlled Task Switch
The operating system maintains a SIMD floating-point save area for each task and defines a
variable (SIMD-fpStateOwner) that indicates which task “owns” the SIMD floating-point state.
In this example, task A is the current SIMD floating-point state owner.
On a task switch, the operating system task switching code must execute the following pseudo-
code to set the TS flag according to the current SIMD floating-point state owner. If the new task
(task B in this example) is not the current SIMD floating-point state owner, the TS flag is set to
1; otherwise, it is set to 0.
IF Task_Being_Switched_To ≠ SIMD-fpStateOwner
THEN
CR0.TS ← 1;
ELSE
CR0.TS ← 0;
FI;
If a new task attempts to use a Streaming SIMD Extensions instruction while the TS flag is set
to 1, a device-not-available exception (#NM) is generated and the device-not-available excep-
tion handler executes the following pseudo-code.
CR0.TS ← 0;
FXSAVE “To SIMD floating-point State Save Area for Current SIMD Floating-point State
Owner”;
FXRSTOR “SIMD floating-point State From Current Task’s SIMD Floating-point State Save
Area”;
SIMD-fpStateOwner ← Current_Task;
the destination. The prioritization policy also applies for unmasked exceptions; if both invalid
and divide-by-zero are unmasked for the previous example, only the invalid flag will be set.
Prioritization of exceptions is performed only on an individual sub-operand basis, and not
between sub-operands; for example, an invalid exception generated by one sub-operand will not
prevent the reporting of a divide-by-zero exception generated by another sub-operand.
The precedence for SIMD floating-point numeric exceptions is as follows:
1. Invalid operation exception due to NaN operands (refer to Table 11-8).
2. QNaN operand. Though this is not an exception, the handling of a QNaN operand has
precedence over lower-priority exceptions. For example, a QNaN divided by zero results
in a QNaN, not a zero-divide exception.
3. Any other invalid operation exception not mentioned above or a divide-by-zero exception
(refer to Table 11-8).
4. Denormal operand exception. If masked, then instruction execution continues, and a
lower-priority exception can occur as well.
5. Numeric overflow and underflow exceptions possibly in conjunction with the inexact
result exception.
6. Inexact result exception.
Updating of exception flags is generated by a logical-OR of exception conditions for all sub-
operand computations, where the OR is done independently for each type of exception; for
packed computations this means 4 sub-operands and for scalar computations this means 1 sub-
operand (the lowest one).
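The logical-OR update can be pictured as follows. This is a minimal sketch for the all-masked case, with a hypothetical function name: each array element is assumed to hold the flag bits (in MXCSR bit positions 5-0) raised by one sub-operand computation, and `count` is 4 for a packed instruction or 1 for a scalar instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* OR the per-sub-operand exception conditions into the sticky MXCSR flags.
   Each exception type is accumulated independently because each has its
   own bit; flags that are already set stay set. */
uint32_t merge_exception_flags(uint32_t mxcsr, const uint8_t sub_flags[],
                               size_t count)
{
    for (size_t i = 0; i < count; i++)
        mxcsr |= (uint32_t)(sub_flags[i] & 0x3Fu);
    return mxcsr;
}
```

Because only one flag exists per exception type, the merged value shows that a condition occurred in at least one sub-operand, not which one.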
exception; for packed computations this means 4 sub-operands and for scalar computations
this means 1 sub-operand (the lowest one).
• In the case of only masked exception conditions, all flags will be updated.
• In the case of an unmasked pre-computation type of exception condition (e.g., denormal
input), all flags relating to all pre-computation conditions (masked or unmasked) will be
updated, and no subsequent computation is performed (i.e., no post-computation condition
can occur if there is an unmasked pre-computation condition).
• In the case of an unmasked post-computation exception condition, all flags relating to all
post-computation conditions (masked or unmasked) will be updated; all pre-computation
conditions, which must be masked-only, will also be reported.
Table 11-8. Invalid Arithmetic Operations and the Masked Responses to Them

Condition                                         Masked Response
------------------------------------------------  ------------------------------------------------
ADDPS/ADDSS/DIVPS/DIVSS/MULPS/MULSS/SUBPS/SUBSS   Return the signaling NaN converted to a quiet
with a SNaN operand.                              NaN (refer to Table 7-18, in Chapter 7,
                                                  Floating-Point Unit, for more details); set #IA
                                                  flag.
CMPPS/CMPSS with QNaN/SNaN operands (QNaN         Return a mask of all 0’s for predicates "eq",
applies only for predicates "lt", "le", "nlt",    "lt", "le", and "ord", and a mask of all 1’s for
"nle")                                            predicates "neq", "nlt", "nle", and "unord"; set
                                                  #IA flag.
COMISS with QNaN/SNaN operand(s).                 Set EFLAGS values to ’not comparable’; set #IA
                                                  flag.
UCOMISS with SNaN operand(s).                     Set EFLAGS values to ’not comparable’; set #IA
                                                  flag.
SQRTPS/SQRTSS with SNaN operand(s).               Return the SNaN converted to a QNaN; set #IA
                                                  flag.
Addition of opposite-signed infinities or         Return the QNaN Indefinite; set #IA flag.
subtraction of like-signed infinities.
Multiplication of infinity by zero.               Return the QNaN Indefinite; set #IA flag.
Divide of (0/0) or (∞/∞).                         Return the QNaN Indefinite; set #IA flag.
SQRTPS/SQRTSS of negative operands (except        Return the QNaN Indefinite; set #IA flag.
negative zero).
Conversion to integer when the source register    Return the Integer Indefinite; set #IA flag.
is a NaN, Infinity or exceeds the representable
range.

NOTE:
RCPPS/RCPSS/RSQRTPS/RSQRTSS with QNaN/SNaN operand(s) do not raise an invalid exception.
They return either the SNaN operand converted to QNaN, or the original QNaN operand.
RSQRTPS/RSQRTSS with negative operands (but not for negative zero) do not raise an invalid
exception, and return QNaN Indefinite.
Note that the overflow status flag is not set by RCPPS/RCPSS, since these instructions are
combinatorial and are not affected by exception masks.
Table 11-9. Masked Responses to Numeric Overflow

Rounding Mode   Sign of True Result   Result
-------------   -------------------   ------------------------------
To nearest      +                     +∞
                –                     –∞
Toward –∞       +                     Largest finite positive number
                –                     –∞
Toward +∞       +                     +∞
                –                     Largest finite negative number
Toward zero     +                     Largest finite positive number
                –                     Largest finite negative number
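Table 11-9 can be read as a function of the rounding mode and the sign of the true result. The sketch below encodes it for single-precision values using `FLT_MAX` and `INFINITY` from the standard C headers; the enum and function names are hypothetical and chosen for illustration only.

```c
#include <float.h>
#include <math.h>

enum rounding_mode { ROUND_NEAREST, ROUND_DOWN, ROUND_UP, ROUND_TOWARD_ZERO };

/* Masked response to numeric overflow for a single-precision result.
   'negative' is nonzero when the true result is negative. */
float masked_overflow_result(enum rounding_mode mode, int negative)
{
    switch (mode) {
    case ROUND_NEAREST:                       /* to nearest   */
        return negative ? -INFINITY : INFINITY;
    case ROUND_DOWN:                          /* toward -inf  */
        return negative ? -INFINITY : FLT_MAX;
    case ROUND_UP:                            /* toward +inf  */
        return negative ? -FLT_MAX : INFINITY;
    case ROUND_TOWARD_ZERO:                   /* toward zero  */
    default:
        return negative ? -FLT_MAX : FLT_MAX;
    }
}
```

The pattern is that a directed rounding mode never moves the result away from the direction it rounds toward, so overflow saturates at the largest finite value on that side.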
left unaltered, and a software exception handler is invoked (see Section 11.7.2.3.,
“Software Exception Handling - Unmasked Exceptions”).
If underflow is masked and flush-to-zero mode is enabled, an underflow condition will set the
underflow (#U) and inexact (#P) status flags UE and PE in MXCSR and a correctly signed zero
result will be returned; this will avoid the performance penalty associated with generating a
denormalized result. If underflow is unmasked, the flush-to-zero mode is ignored and an under-
flow condition will be handled as described above.
Note that the underflow status flag is not set by RCPPS/RCPSS, since these instructions are
combinatorial and are not affected by exception masks.
The flag (UE) for the numeric underflow exception is bit 4 of MXCSR and the mask bit (UM)
is bit 11 of MXCSR.
In flush-to-zero mode, the inexact result exception is reported along with the underflow excep-
tion (the latter must be masked).
11.8. DEBUGGING
The debug facilities of the Intel Architecture operate in the same manner when executing
Streaming SIMD Extensions as when executing other Intel Architecture instructions. These
facilities enable debuggers to debug code utilizing these instructions.
To correctly interpret the contents of the Pentium® III processor registers from the FXSAVE
image in memory, a debugger needs to take account of the relationship between the
floating-point registers’ logical locations relative to TOS and the MMX™ registers’ physical locations
(refer to Section 10.6., “Debugging”, Chapter 10, MMX™ Technology System Programming).
In addition it needs to have knowledge of the SIMD floating-point registers and the state save
data area used by the FXSAVE instruction.
Comparisons of the Streaming SIMD Extensions and x87 results can be performed within the
Pentium® III processor at the internal single precision format and/or externally at the memory
single precision format. The internal format comparison is required to allow the partitioning of
the data space to reduce test time.
CHAPTER 12
SYSTEM MANAGEMENT MODE (SMM)
This chapter describes the Intel Architecture’s System Management Mode (SMM) architecture.
SMM was introduced into the Intel Architecture in the Intel386™ SL processor (a mobile
specialized version of the Intel386™ processor). It is also available in the Intel486™ processors
(beginning with the Intel486™ SL and Intel486™ enhanced versions) and in the Intel Pentium®
and P6 family processors. For a detailed description of the hardware that supports SMM, refer
to the developer’s manuals for each of the Intel Architecture processors.
NOTE
The physical address extension (PAE) mechanism available in the P6 family
processors is not supported when a processor is in SMM.
NOTE
In the P6 family processors, when a processor that is designated as the
application processor during an MP initialization protocol is waiting for a
startup IPI, it is in a mode where SMIs are masked.
(that is, after SMM has been acknowledged to external hardware) is latched and serviced when
the processor exits SMM with the RSM instruction. The processor will latch only one SMI while
in SMM.
Refer to Section 12.5., “SMI Handler Execution Environment” for a detailed description of the
execution environment when in SMM.
12.4. SMRAM
While in SMM, the processor executes code and stores data in the SMRAM space. The SMRAM
space is mapped to the physical address space of the processor and can be up to 4 GBytes in size.
The processor uses this space to save the context of the processor and to store the SMI handler
code, data and stack. It can also be used to store system management information (such as the
system configuration and specific information about powered-down devices) and OEM-specific
information.
The default SMRAM size is 64 KBytes beginning at a base physical address in physical memory
called the SMBASE (refer to Figure 12-1). The SMBASE default value following a hardware
reset is 30000H. The processor looks for the first instruction of the SMI handler at the address
[SMBASE + 8000H]. It stores the processor’s state in the area from [SMBASE + FE00H] to
[SMBASE + FFFFH]. Refer to Section 12.4.1., “SMRAM State Save Map” for a description of
the mapping of the state save area.
The system logic is minimally required to decode the physical address range for the SMRAM
from [SMBASE + 8000H] to [SMBASE + FFFFH]. A larger area can be decoded if needed. The
size of this SMRAM can be between 32 KBytes and 4 GBytes.
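The address arithmetic in this section can be checked with a few helper functions. This is an illustrative sketch; the macro and function names are hypothetical, and the constants are the offsets quoted above.

```c
#include <stdint.h>

#define SMBASE_DEFAULT    0x30000u  /* SMBASE after a hardware reset        */
#define SMI_ENTRY_OFFSET  0x8000u   /* first instruction of the SMI handler */
#define STATE_SAVE_FIRST  0xFE00u   /* start of the state save area         */
#define STATE_SAVE_LAST   0xFFFFu   /* end of the state save area           */

/* Physical address of the first SMI handler instruction. */
uint32_t smi_entry_point(uint32_t smbase)  { return smbase + SMI_ENTRY_OFFSET; }

/* First and last physical addresses of the processor state save area. */
uint32_t state_save_start(uint32_t smbase) { return smbase + STATE_SAVE_FIRST; }
uint32_t state_save_end(uint32_t smbase)   { return smbase + STATE_SAVE_LAST; }

/* Size of the range that system logic must decode at a minimum:
   [SMBASE + 8000H] through [SMBASE + FFFFH], inclusive. */
uint32_t min_decode_bytes(void)
{
    return STATE_SAVE_LAST - SMI_ENTRY_OFFSET + 1u;
}
```

With the default SMBASE, the handler entry point is 38000H, the state save area spans 3FE00H through 3FFFFH, and the minimum decoded range is 32 KBytes.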
The location of the SMRAM can be changed by changing the SMBASE value (refer to Section
12.11., “SMBASE Relocation”). It should be noted that all processors in a multiple-processor
system are initialized with the same SMBASE value (30000H). Initialization software must
sequentially place each processor in SMM and change its SMBASE so that it does not overlap
those of other processors.
The actual physical location of the SMRAM can be in system memory or in a separate RAM
memory. The processor generates an SMI acknowledge transaction (P6 family processors) or
asserts the SMIACT# pin (Pentium® and Intel486™ processors) when the processor receives an
SMI (refer to Section 12.3.1., “Entering SMM”). System logic can use the SMI acknowledge
transaction or the assertion of the SMIACT# pin to decode accesses to the SMRAM and redirect
them (if desired) to specific SMRAM memory. If a separate RAM memory is used for SMRAM,
system logic should provide a programmable method of mapping the SMRAM into system
memory space when the processor is not in SMM. This mechanism will enable start-up proce-
dures to initialize the SMRAM space (that is, load the SMI handler) before executing the SMI
handler during SMM.
[Figure 12-1 shows the SMRAM region from SMBASE up to SMBASE + FFFFH, with the start of the state save area at the top of the region (SMBASE + FFFFH).]
Figure 12-1. SMRAM Usage
NOTE:
* Upper two bytes are reserved.
The following registers are saved (but not readable) and restored upon exiting SMM:
• Control register CR4 (CR4 is set to “0” while in the SMM handler).
• The hidden segment descriptor information stored in segment registers CS, DS, ES, FS,
GS, and SS.
If an SMI request is issued for the purpose of powering down the processor, the values of all
reserved locations in the SMM state save must be saved to nonvolatile memory.
The following state is not automatically saved and restored following an SMI and the RSM
instruction, respectively:
• Debug registers DR0 through DR3.
• The FPU registers.
• The MTRRs.
• Control register CR2.
• The model-specific registers (for the P6 family and Pentium® processors) or test registers
TR3 through TR7 (for the Pentium® and Intel486™ processors).
NOTE
A small subset of the MSRs (such as the time-stamp counter and
performance-monitoring counters) is not arbitrarily writable and therefore
cannot be saved and restored. SMM-based power-down and restoration
should only be performed with operating systems that do not use or rely on
the values of these registers. Operating system developers should be aware of
this fact and ensure that their operating-system-assisted power-down and
restoration software is immune to unexpected changes in these register
values.
Intel Architecture processors do not write back or invalidate their internal caches upon leaving
SMM. For this reason, references to the SMRAM area must not be cached if any part of the
SMRAM shadows (overlays) non-SMRAM memory; that is, system DRAM or video RAM. It
is the obligation of the system to ensure that all memory references to overlapped areas are
uncached; that is, the KEN# pin is sampled inactive during all references to the SMRAM area
for the Pentium® processor. The WBINVD instruction should be used to ensure cache coherency
at the end of a cached SMM execution in systems that have a protected SMM memory region
provided by the chipset.
The P6 family of processors has no external equivalent of the KEN# pin. All memory accesses
are typed via the MTRRs. It is therefore not practical to have memory access to a certain address
be cached in one access and not cached in another. Intel does not recommend the caching of
SMM space in any overlapping memory environment on the P6 family of processors.
• Data and the stack can be located anywhere in the 4-GByte address space, but can be
accessed only with a 32-bit address-size override if they are located above 1 MByte. As
with the code segment, the base address for a data or stack segment cannot be more than 20
bits.
The value in segment register CS is automatically set to the default of 30000H for the SMBASE
shifted 4 bits to the right; that is, 3000H. The EIP register is set to 8000H. When the EIP value
is added to the shifted CS value (the SMBASE), the resulting linear address points to the first
instruction of the SMI handler.
The other segment registers (DS, SS, ES, FS, and GS) are cleared to 0 and their segment limits
are set to 4 GBytes. In this state, the SMRAM address space may be treated as a single flat
4-GByte linear address space. If a segment register is loaded with a 16-bit value, that value is then
shifted left by 4 bits and loaded into the segment base (hidden part of the segment register). The
limits and attributes are not modified.
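The segment arithmetic described above is the usual real-address-mode calculation and can be sketched as follows. The function names are hypothetical, and the sketch covers only SMBASE values that fit in 20 bits, such as the default.

```c
#include <stdint.h>

/* CS selector loaded on SMM entry: the SMBASE shifted right by 4 bits. */
uint16_t smm_cs_selector(uint32_t smbase)
{
    return (uint16_t)(smbase >> 4);
}

/* Segment base produced by loading a 16-bit value into a segment register:
   the value shifted left by 4 bits. */
uint32_t segment_base_from_selector(uint16_t selector)
{
    return (uint32_t)selector << 4;
}

/* Linear address formed from a segment selector and an offset such as EIP. */
uint32_t linear_address(uint16_t selector, uint32_t offset)
{
    return segment_base_from_selector(selector) + offset;
}
```

For the default SMBASE of 30000H, `smm_cs_selector` returns 3000H and `linear_address(0x3000, 0x8000)` returns 38000H, the address of the first SMI handler instruction.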
Maskable hardware interrupts, exceptions, NMI interrupts, SMI interrupts, A20M interrupts,
single-step traps, breakpoint traps, and INIT operations are inhibited when the processor enters
SMM. Maskable hardware interrupts, exceptions, single-step traps, and breakpoint traps can be
enabled in SMM if the SMM execution environment provides and initializes an interrupt table
and the necessary interrupt and exception handlers (refer to Section 12.6., “Exceptions and
Interrupts Within SMM”).
• If an SMI handler needs access to the debug trap facilities, it must ensure that an
SMM-accessible debug handler is available and save the current contents of debug registers DR0
through DR3 (for later restoration). Debug registers DR0 through DR3 and DR7 must then
be initialized with the appropriate values.
• If an SMI handler needs access to the single-step mechanism, it must ensure that an
SMM-accessible single-step handler is available, and then set the TF flag in the EFLAGS
register.
• If the SMI design requires the processor to respond to maskable hardware interrupts or
software-generated interrupts while in SMM, it must ensure that SMM-accessible interrupt
handlers are available and then set the IF flag in the EFLAGS register (using the STI
instruction). Software interrupts are not blocked upon entry to SMM, so they do not need
to be enabled.
safest way to perform this task is to place the processor in 32-bit protected mode before saving
the FPU state. The reason for this is as follows.
The FSAVE instruction saves the FPU context in any of four different formats, depending on
which mode the processor is in when FSAVE is executed (refer to Figures 7-13 through 7-16 in
the Intel Architecture Software Developer’s Manual, Volume 1). When in SMM, by default, the
16-bit real-address mode format is used (shown in Figure 7-16). If an SMI interrupt occurs while
the processor is in a mode other than 16-bit real-address mode, FSAVE and FRSTOR will be
unable to save and restore all the relevant FPU information, and this situation may result in a
malfunction when the interrupted program is resumed. To avoid this problem, the processor
should be in 32-bit protected mode when executing the FSAVE and FRSTOR instructions.
The following guidelines should be used when going into protected mode from an SMI handler
to save and restore the FPU state:
• Use the CPUID instruction to ensure that the processor contains an FPU.
• Create a 32-bit code segment in SMRAM space that contains procedures or routines to
save and restore the FPU using the FSAVE and FRSTOR instructions, respectively. A
GDT with an appropriate code-segment descriptor (D bit is set to 1) for the 32-bit code
segment must also be placed in SMRAM.
• Write a procedure or routine that can be called by the SMI handler to save and restore the
FPU state. This procedure should do the following:
— Place the processor in 32-bit protected mode as described in Section 8.8.1., “Switching
to Protected Mode” in Chapter 8, Processor Management and Initialization.
— Execute a far JMP to the 32-bit code segment that contains the FPU save and restore
procedures.
— Place the processor back in 16-bit real-address mode before returning to the SMI
handler (refer to Section 8.8.2., “Switching Back to Real-Address Mode” in Chapter 8,
Processor Management and Initialization).
The SMI handler may continue to execute in protected mode after the FPU state has been saved
and return safely to the interrupted program from protected mode. However, it is recommended
that the handler execute primarily in 16- or 32-bit real-address mode.
SMM revision identifier field (register offset 7EFCH):
Bits:   31-18      17                  16                        15-0
Field:  Reserved   SMBASE Relocation   I/O Instruction Restart   Revision identifier (lower word)
The upper word of the SMM revision identifier refers to the extensions available. If the I/O
instruction restart flag (bit 16) is set, the processor supports the I/O instruction restart (refer to
Section 12.12., “I/O Instruction Restart”); if the SMBASE relocation flag (bit 17) is set,
SMRAM base address relocation is supported (refer to Section 12.11., “SMBASE Relocation”).
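An SMI handler can test the two extension flags with simple bit masks once it has read the 32-bit SMM revision identifier from the state save area. The sketch below uses hypothetical macro and function names; how the value is read from SMRAM is left out.

```c
#include <stdint.h>

#define SMM_REV_IO_RESTART    (1u << 16)  /* I/O instruction restart supported */
#define SMM_REV_SMBASE_RELOC  (1u << 17)  /* SMBASE relocation supported       */

/* Nonzero when the processor supports I/O instruction restart. */
int supports_io_restart(uint32_t revision_id)
{
    return (revision_id & SMM_REV_IO_RESTART) != 0;
}

/* Nonzero when the processor supports SMRAM base address relocation. */
int supports_smbase_relocation(uint32_t revision_id)
{
    return (revision_id & SMM_REV_SMBASE_RELOC) != 0;
}
```

Checking these flags before writing the I/O instruction restart or SMBASE fields keeps the handler from relying on an extension the processor does not provide.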
Auto HALT restart field (register offset 7F02H):
Bits:   15-1       0
Field:  Reserved   Auto HALT Restart
These options are summarized in Table 12-3. Note that if the processor was not in a HALT state
when the SMI was received (the auto HALT restart flag is cleared), setting the flag to 1 will
cause unpredictable behavior when the RSM instruction is executed.
If the HLT instruction is restarted, the processor will generate a memory access to fetch the HLT
instruction (if it is not in the internal cache), and execute a HLT bus transaction. This behavior
results in multiple HLT bus transactions for the same HLT instruction.
SMBASE value for each processor so that the SMRAM state save areas for each processor do
not overlap. (For Pentium® and Intel486™ processors, the SMBASE values must be aligned on
a 32-KByte boundary or the processor will enter shutdown state during the execution of a RSM
instruction.)
SMBASE relocation field (register offset 7EF8H):
Bits:   31-0
Field:  SMM Base
If the SMBASE relocation flag in the SMM revision identifier field is set, it indicates the ability
to relocate the SMBASE (refer to Section 12.9., “SMM Revision Identifier”).
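Initialization software that assigns a distinct SMBASE to each processor can validate its choices before writing them. The following sketch uses hypothetical function names and assumes the state save area occupies [SMBASE + FE00H] through [SMBASE + FFFFH], as described in the SMRAM section; it checks the 32-KByte alignment required by Pentium® and Intel486™ processors and verifies that no two state save areas overlap.

```c
#include <stddef.h>
#include <stdint.h>

#define STATE_SAVE_FIRST 0xFE00u
#define STATE_SAVE_LAST  0xFFFFu

/* Nonzero when the SMBASE lies on a 32-KByte boundary. */
int smbase_is_32k_aligned(uint32_t smbase)
{
    return (smbase & 0x7FFFu) == 0;
}

/* Nonzero when the state save areas of all n processors are disjoint.
   64-bit arithmetic avoids wraparound for SMBASE values near 4 GBytes. */
int state_save_areas_disjoint(const uint32_t smbase[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t lo_i = (uint64_t)smbase[i] + STATE_SAVE_FIRST;
        uint64_t hi_i = (uint64_t)smbase[i] + STATE_SAVE_LAST;
        for (size_t j = i + 1; j < n; j++) {
            uint64_t lo_j = (uint64_t)smbase[j] + STATE_SAVE_FIRST;
            uint64_t hi_j = (uint64_t)smbase[j] + STATE_SAVE_LAST;
            if (lo_i <= hi_j && lo_j <= hi_i)
                return 0;   /* ranges intersect */
        }
    }
    return 1;
}
```

Two processors left at the default SMBASE of 30000H fail the second check, which is why each processor must be relocated in turn.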
A stack located above the 1-MByte boundary can be accessed in the same manner.
The I/O instruction restart field (at offset 7F00H in the SMM state-save area, refer to Figure
12-5) controls I/O instruction restart. When an RSM instruction is executed, if this field contains
the value FFH, then the EIP register is modified to point to the I/O instruction that received the
SMI request. The processor will then automatically re-execute the I/O instruction that the SMI
trapped. (The processor saves the necessary machine state to ensure that re-execution of the
instruction is handled coherently.)
If the I/O instruction restart field contains the value 00H when the RSM instruction is executed,
then the processor begins program execution with the instruction following the I/O instruction.
(When a repeat prefix is being used, the next instruction may be the next I/O instruction in the
repeat loop.) Not re-executing the interrupted I/O instruction is the default behavior; the
processor automatically initializes the I/O instruction restart field to 00H upon entering SMM.
Table 12-4 summarizes the states of the I/O instruction restart field.
Note that the I/O instruction restart mechanism does not indicate the cause of the SMI. It is the
responsibility of the SMI handler to examine the state of the processor to determine the cause of
the SMI and to determine if an I/O instruction was interrupted and should be restarted upon
exiting SMM. If an SMI interrupt is signaled on a non-I/O instruction boundary, setting the I/O
instruction restart field to FFH prior to executing the RSM instruction will likely result in a
program error.
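As a rough illustration of the restart field described above, the following C sketch models how an SMI handler might request or decline I/O instruction restart. It operates on an ordinary byte buffer standing in for SMRAM. The names IO_RESTART_OFFSET and set_io_restart and the io_instruction_trapped parameter are hypothetical, and the sketch assumes that the caller passes a pointer to the base to which the 7F00H offset applies. Deciding whether an I/O instruction was actually interrupted remains the handler's responsibility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the I/O instruction restart field in the SMM state-save area. */
#define IO_RESTART_OFFSET  0x7F00u
#define IO_RESTART_ENABLE  0xFFu   /* re-execute the trapped I/O instruction */
#define IO_RESTART_DISABLE 0x00u   /* resume with the following instruction  */

/*
 * Hypothetical helper. state_save points at the base to which the 7F00H
 * offset applies (an assumption of this sketch). The field is set to FFH
 * only when the handler has determined that an I/O instruction was trapped;
 * otherwise it is set to 00H, the value the processor writes on SMM entry.
 */
void set_io_restart(uint8_t *state_save, int io_instruction_trapped)
{
    assert(state_save != NULL);
    state_save[IO_RESTART_OFFSET] =
        io_instruction_trapped ? IO_RESTART_ENABLE : IO_RESTART_DISABLE;
}
```

A handler that cannot confirm that an I/O instruction was trapped would pass 0, which matches the default behavior.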
when I/O instruction restart is being used and ensure that the handler sets the I/O instruction
restart field to 00H prior to returning from the second invocation of the SMI handler.
CHAPTER 13
MACHINE-CHECK ARCHITECTURE
This chapter describes the P6 family’s machine-check architecture and machine-check exception
mechanism. Refer to Chapter 5, “Interrupt and Exception Handling”, for more information on
the machine-check exception. A brief description of the Pentium® processor’s machine-check
capability is also given.
[Figure: machine-check registers, each 64 bits wide (bits 63 through 0): MCG_STATUS, MCG_CTL*, MCi_STATUS, and MCi_ADDR.]
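[Figure: MCG_CAP register. Bits 7 through 0: Count; bit 8: a single-bit flag; bits 63 through 9: Reserved.]
[Figure: MCG_STATUS register. Bit 0: RIPV; bit 1: EIPV; bit 2: MCIP; bits 63 through 3: Reserved.]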
NOTE
Operating system or executive software must not modify the contents of the
MC0_CTL register. The MC0_CTL register is internally aliased to the
EBL_CR_POWERON register and as such controls system-specific error
handling features. These features are platform specific. System-specific
firmware (the BIOS) is responsible for the appropriate initialization of
MC0_CTL. The P6 family processors allow only all 1s or all 0s to be
written to the MCi_CTL registers.
[Figure: MCi_CTL register. Bits 63 through 0 hold the error-enable flags EE63 through EE00, one flag per bit.]
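[Figure: MCi_STATUS register. Bits 15 through 0: MCA Error Code; bits 31 through 16: Model-Specific Error Code; bits 56 through 32: Other Information; bits 63 through 57: status flags, including VAL, O, UC, EN, and PCC.]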
[Figure: MCi_ADDR register. Bits 35 through 0: Address; bits 63 through 36: Reserved.]
The 2-bit LL sub-field (refer to Table 13-4) indicates the level in the memory hierarchy where
the error occurred (level 0, level 1, level 2, or generic). The LL sub-field also applies to the TLB,
cache, and interconnect error conditions. The P6 family processors support two levels in the
cache hierarchy and one level in the TLBs. Again, the generic type is reported when the
processor cannot determine the hierarchy level.
The 4-bit RRRR sub-field (refer to Table 13-5) indicates the type of action associated with the
error. Actions include read and write operations, prefetches, cache evictions, and snoops.
Generic error is returned when the type of error cannot be determined. Generic read and generic
write are returned when the processor cannot determine the type of instruction or data request
that caused the error. Eviction and Snoop requests apply only to the caches. All of the other
requests apply to TLBs, caches and interconnects.
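The sub-field decoding described above can be sketched in C as follows. This is an illustrative sketch only: the bit positions (LL in bits 1:0 and RRRR in bits 7:4 of the MCA error code) and the encodings are assumptions taken from the layouts that Tables 13-4 and 13-5 are understood to define, and the function names are hypothetical. Verify both against those tables before relying on the result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: LL in bits 1:0 and RRRR in bits 7:4 of the MCA error code. */
unsigned mca_ll(uint16_t code)   { return code & 0x3u; }
unsigned mca_rrrr(uint16_t code) { return (code >> 4) & 0xFu; }

/* Name of the memory hierarchy level encoded by the 2-bit LL sub-field. */
const char *mca_ll_name(unsigned ll)
{
    static const char *const names[4] = {
        "level 0", "level 1", "level 2", "generic"
    };
    assert(ll < 4);
    return names[ll];
}

/* Name of the request type encoded by the 4-bit RRRR sub-field. */
const char *mca_rrrr_name(unsigned rrrr)
{
    static const char *const names[] = {
        "generic error", "generic read", "generic write", "data read",
        "data write", "instruction fetch", "prefetch", "eviction", "snoop"
    };
    /* Encodings above 1000B are treated as undefined in this sketch. */
    return rrrr < sizeof names / sizeof names[0] ? names[rrrr] : "undefined";
}
```

Portable handlers would call these helpers only on the MCA error code field, leaving the model-specific fields to processor-specific code.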
The bus and interconnect errors are defined with the 2-bit PP (participation), 1-bit T (time-out),
and 2-bit II (memory or I/O) sub-fields, in addition to the LL and RRRR sub-fields (refer to
Table 13-6). The bus error conditions are implementation dependent and related to the type of
bus implemented by the processor. Likewise, the interconnect error conditions are predicated on
a specific implementation-dependent interconnect model that describes the connections
between the different levels of the storage hierarchy. The type of bus is implementation depen-
dent, and as such is not specified in this document. A bus or interconnect transaction consists of
a request involving an address and a response.
Table 13-7. Encoding of the MCi_STATUS Register for External Bus Errors
Bit No.  Bit Function               Bit Description
0-1      MCA Error Code             Undefined.
2-3      MCA Error Code             Bit 2 is set to 1 if the access was a special cycle.
                                    Bit 3 is set to 1 if the access was a special cycle OR an I/O cycle.
4-7      MCA Error Code             00WR; W = 1 for writes, R = 1 for reads.
8-9      MCA Error Code             Undefined.
10       MCA Error Code             Set to 0 for all EBL errors.
                                    Set to 1 for internal watch-dog timer time-out.
                                    For a watch-dog timer time-out, all the MCACOD bits except this bit are
                                    set to 0. A watch-dog timer time-out only occurs if the BINIT driver is
                                    enabled.
11       MCA Error Code             Set to 1 for EBL errors.
                                    Set to 0 for internal watch-dog timer time-out.
12-15    MCA Error Code             Reserved.
16-18    Model-Specific Error Code  Reserved.
19-24    Model-Specific Error Code  000000 for BQ_DCU_READ_TYPE error.
                                    000010 for BQ_IFU_DEMAND_TYPE error.
                                    000011 for BQ_IFU_DEMAND_NC_TYPE error.
                                    000100 for BQ_DCU_RFO_TYPE error.
                                    000101 for BQ_DCU_RFO_LOCK_TYPE error.
                                    000110 for BQ_DCU_ITOM_TYPE error.
                                    001000 for BQ_DCU_WB_TYPE error.
                                    001010 for BQ_DCU_WCEVICT_TYPE error.
                                    001011 for BQ_DCU_WCLINE_TYPE error.
                                    001100 for BQ_DCU_BTM_TYPE error.
                                    001101 for BQ_DCU_INTACK_TYPE error.
                                    001110 for BQ_DCU_INVALL2_TYPE error.
                                    001111 for BQ_DCU_FLUSHL2_TYPE error.
                                    010000 for BQ_DCU_PART_RD_TYPE error.
                                    010010 for BQ_DCU_PART_WR_TYPE error.
                                    010100 for BQ_DCU_SPEC_CYC_TYPE error.
                                    011000 for BQ_DCU_IO_RD_TYPE error.
                                    011001 for BQ_DCU_IO_WR_TYPE error.
                                    011100 for BQ_DCU_LOCK_RD_TYPE error.
                                    011110 for BQ_DCU_SPLOCK_RD_TYPE error.
                                    011101 for BQ_DCU_LOCK_WR_TYPE error.
25-27    Model-Specific Error Code  000 for BQ_ERR_HARD_TYPE error.
                                    001 for BQ_ERR_DOUBLE_TYPE error.
                                    010 for BQ_ERR_AERR2_TYPE error.
                                    100 for BQ_ERR_SINGLE_TYPE error.
                                    101 for BQ_ERR_AERR1_TYPE error.
28       Model-Specific Error Code  1 if FRC error is active.
29       Model-Specific Error Code  1 if BERR is driven.
30       Model-Specific Error Code  1 if BINIT is driven for this processor.
31       Model-Specific Error Code  Reserved.
32-34    Other Information          Reserved.
35       Other Information          1 if BINIT is received from external bus.
         BINIT
36       Other Information          This bit is asserted in the MCi_STATUS register if this component has
         RESPONSE PARITY ERROR      received a parity error on the RS[2:0]# pins for a response transaction.
                                    The RS signals are checked by the RSP# external pin.
37       Other Information          This bit is asserted in the MCi_STATUS register if this component has
         BUS BINIT                  received a hard error response on a split transaction (one access that
                                    has needed to be split across the 64-bit external bus interface into two
                                    accesses).
38       Other Information          This bit is asserted in the MCi_STATUS register if this component has
         TIMEOUT BINIT              experienced a ROB time-out, which indicates that no microinstruction has
                                    been retired for a predetermined period of time. A ROB time-out occurs
                                    when the 15-bit ROB time-out counter carries a 1 out of its high order bit.
                                    The ROB time-out counter is prescaled by the 8-bit PIC timer which is a
                                    divide by 128 of the bus clock (the bus clock is 1:2, 1:3, 1:4 the core
                                    clock). When a carry out of the 8-bit PIC timer occurs, the ROB counter
                                    counts up by one.
45       Other Information          Uncorrectable ECC error bit is asserted in the MCi_STATUS register for
         UECC                       uncorrected ECC errors. While this bit is asserted, the ECC syndrome field
                                    will not be overwritten.
46       Other Information          The correctable ECC error bit is asserted in the MCi_STATUS register for
         CECC                       corrected ECC errors.
47-54    Other Information          The ECC syndrome field in the MCi_STATUS register contains the 8-bit ECC
         SYNDROME                   syndrome only if the error was a correctable/uncorrectable ECC error,
                                    and there wasn’t a previous valid ECC error syndrome logged in the
                                    MCi_STATUS register.
                                    A previous valid ECC error in MCi_STATUS is indicated by
                                    MCi_STATUS.bit45 (uncorrectable error occurred) being asserted. After
                                    processing an ECC error, machine-check handling software should clear
                                    MCi_STATUS.bit45 so that future ECC error syndromes can be logged.
55-56    Other Information          Reserved.
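The field positions listed in Table 13-7 can be extracted with simple shifts and masks. The following C sketch is illustrative rather than definitive: the structure, member, and function names are hypothetical, only a subset of the fields is decoded, and the status value is assumed to have been read from the appropriate MCi_STATUS register by other code.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of selected fields; struct and member names are hypothetical. */
struct bus_error_info {
    unsigned request_type;   /* bits 19-24, e.g. 000100B = BQ_DCU_RFO_TYPE    */
    unsigned error_type;     /* bits 25-27, e.g. 100B = BQ_ERR_SINGLE_TYPE    */
    int      watchdog;       /* bit 10: internal watch-dog timer time-out     */
    int      ebl_error;      /* bit 11: EBL error                             */
    int      uecc;           /* bit 45: uncorrectable ECC error               */
    int      cecc;           /* bit 46: correctable ECC error                 */
    unsigned syndrome;       /* bits 47-54: ECC syndrome                      */
};

/* Extract 'width' bits of 'value' starting at bit position 'lsb'. */
static unsigned bits(uint64_t value, unsigned lsb, unsigned width)
{
    assert(width > 0 && width < 32 && lsb + width <= 64);
    return (unsigned)((value >> lsb) & ((1u << width) - 1u));
}

/* Decode a raw MCi_STATUS value reported for an external bus error. */
struct bus_error_info decode_bus_error(uint64_t status)
{
    struct bus_error_info info;
    info.watchdog     = (int)bits(status, 10, 1);
    info.ebl_error    = (int)bits(status, 11, 1);
    info.request_type = bits(status, 19, 6);
    info.error_type   = bits(status, 25, 3);
    info.uecc         = (int)bits(status, 45, 1);
    info.cecc         = (int)bits(status, 46, 1);
    info.syndrome     = bits(status, 47, 8);
    return info;
}
```

Because these fields are model specific, a decoder of this kind belongs in processor-specific code and not in a portable handler.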
Virtually all the machine-check conditions detected with the P6 family processors cannot be
recovered from (they result in abort-type exceptions). The logging of status and error informa-
tion is therefore a baseline implementation. Refer to Section 13.7., “Guidelines for Writing
Machine-Check Software” for more information on logging errors.
For future P6 family processor implementations, where recovery may be possible, the following
things should be considered when writing a machine-check exception handler:
• To determine the nature of the error, the handler must read each of the error-reporting
register banks. The count field in the MCG_CAP register gives the number of register banks.
The first register of register bank 0 is at address 400H.
• The VAL (valid) flag in each MCi_STATUS register indicates whether the error
information in the register is valid. If this flag is clear, the registers in that bank do not
contain valid error information and do not need to be checked.
• To write a portable exception handler, only the MCA error code field in the MCi_STATUS
register should be checked. Refer to Section 13.6., “Interpreting the MCA Error Codes” for
information that can be used to write an algorithm to interpret this field.
• The PCC and OVER flags in each MCi_STATUS register indicate whether recovery from the
error is possible. If either of these flags is set, recovery is not possible. The OVER flag
indicates that two or more machine-check errors occurred. (The RIPV flag, described below,
resides in the MCG_STATUS register.) When recovery is not possible, the handler typically
records the error information and signals an abort to the operating system.
• Corrected errors will have been corrected automatically by the processor. The UC flag in
each MCi_STATUS register indicates whether the processor automatically corrected the
error: the flag is clear when the error was corrected and set when it was not.
• The RIPV flag in the MCG_STATUS register indicates whether the program can be
restarted at the instruction pointed to by the instruction pointer pushed on the stack when
the exception was generated. If this flag is clear, the processor may still be able to be
restarted (for debugging purposes), but not without loss of program continuity.
• For unrecoverable errors, the EIPV flag in the MCG_STATUS register indicates whether
the instruction pointed to by the instruction pointer pushed on the stack when the exception
was generated is related to the error. If this flag is clear, the pushed instruction may not be
related to the error.
• The MCIP flag in the MCG_STATUS register indicates whether a machine-check
exception was generated. Before returning from the machine-check exception handler,
software should clear this flag so that it can be used reliably by an error logging utility. The
MCIP flag also detects recursion. The machine-check architecture does not support
recursion. When the processor detects machine-check recursion, it enters the shutdown
state.
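The considerations listed above can be condensed into a small decision routine. The following C sketch is a simplified illustration under stated assumptions: the flag positions (VAL in bit 63, OVER in bit 62, and PCC in bit 57 of MCi_STATUS; RIPV in bit 0 of MCG_STATUS) are taken from the register layouts, the register values are assumed to have been read into ordinary variables beforehand, and the names mc_classify, MC_RESTART, and MC_ABORT are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag positions assumed from the register layouts. */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_OVER  (1ULL << 62)
#define MCI_STATUS_PCC   (1ULL << 57)
#define MCG_STATUS_RIPV  (1ULL << 0)

enum mc_action { MC_RESTART, MC_ABORT };

/*
 * Decide whether execution may be restarted. bank_status[] holds a copy of
 * each bank's MCi_STATUS value; nbanks comes from the MCG_CAP count field.
 */
enum mc_action mc_classify(const uint64_t *bank_status, size_t nbanks,
                           uint64_t mcg_status)
{
    assert(bank_status != NULL || nbanks == 0);
    for (size_t i = 0; i < nbanks; i++) {
        uint64_t s = bank_status[i];
        if (!(s & MCI_STATUS_VAL))
            continue;                   /* no valid error in this bank */
        if (s & (MCI_STATUS_PCC | MCI_STATUS_OVER))
            return MC_ABORT;            /* context corrupt or errors lost */
    }
    /* RIPV clear: the interrupted program cannot be resumed reliably. */
    return (mcg_status & MCG_STATUS_RIPV) ? MC_RESTART : MC_ABORT;
}
```

A real handler would also log the bank contents and clear the MCIP flag before returning; those steps are omitted here.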
Example 13-2 gives typical steps carried out by a machine-check exception handler:
• A user-initiated application that polls the register banks and records the exceptions. Here,
the actual polling service is provided by an operating-system driver or through the system
call interface.
Example 13-3 gives pseudocode for an error logging utility.
If the processor supports the machine-check architecture, the utility reads through the banks of
error-reporting registers looking for valid register entries, and then saves the values of the
MCi_STATUS, MCi_ADDR, MCi_MISC and MCG_STATUS registers for each bank that is
valid. The routine minimizes processing time by recording the raw data into a system data struc-
ture or file, reducing the overhead associated with polling. User utilities analyze the collected
data in an off-line environment.
When the MCIP flag is set in the MCG_STATUS register, a machine-check exception is in
progress and the machine-check exception handler has called the exception logging routine.
Once the logging process has been completed the exception-handling routine must determine
whether execution can be restarted, which is usually possible when damage has not occurred
(the PCC flag is clear in the MCi_STATUS register) and when the processor can guarantee that
execution is restartable (the RIPV flag is set in the MCG_STATUS register). If execution cannot
be restarted, the system is not recoverable and the exception-handling routine should signal the
console appropriately before returning the error status to the operating system kernel for
subsequent shutdown.
The machine-check architecture allows buffering of exceptions from a given error-reporting
bank, although the P6 family processors do not implement this feature. The error logging routine
should provide compatibility with future processors by reading each hardware error-reporting
bank’s MCi_STATUS register and then writing 0s to clear the OVER and VAL flags in this
register. The error logging utility should then re-read the MCi_STATUS register for the bank
to ensure that the VAL flag is clear. The processor will write the next error into the register bank
and set the VAL flag.
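The read, save, and clear sequence described above can be sketched as follows. The sketch keeps each bank's registers in ordinary memory so that it can run anywhere; on hardware these values would be read and written as model-specific registers by privileged code. The structure layouts, the function name mc_log_banks, and the VAL flag position (bit 63) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MC_VAL (1ULL << 63)   /* VAL flag position: an assumption here */

/* One bank's registers as plain memory; on hardware these would be MSRs. */
struct mc_bank { uint64_t status, addr, misc; };

/* Raw record saved for later off-line analysis (layout is hypothetical). */
struct mc_record { size_t bank; uint64_t status, addr, misc; };

/*
 * Copy every valid bank into out[] (at most max records), then write 0s to
 * the status register and re-read it to confirm that the VAL flag is clear.
 * Returns the number of records written.
 */
size_t mc_log_banks(struct mc_bank *banks, size_t nbanks,
                    struct mc_record *out, size_t max)
{
    size_t n = 0;
    assert(banks != NULL && out != NULL);
    for (size_t i = 0; i < nbanks && n < max; i++) {
        if (!(banks[i].status & MC_VAL))
            continue;                           /* nothing logged here */
        out[n].bank   = i;
        out[n].status = banks[i].status;
        out[n].addr   = banks[i].addr;
        out[n].misc   = banks[i].misc;
        n++;
        banks[i].status = 0;                    /* clears OVER and VAL */
        assert(!(banks[i].status & MC_VAL));    /* re-read and verify  */
    }
    return n;
}
```

Recording only raw values keeps the routine short, which matches the goal of minimizing the overhead of polling; interpretation is left to off-line utilities.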
Additional information that should be stored by the exception-logging routine includes the
processor’s time-stamp counter value, which provides a mechanism to indicate the frequency of
exceptions. A multiprocessing operating system stores the identity of the processor node incur-
ring the exception using a unique identifier, such as the processor’s APIC ID (refer to Section
7.5.9., “Interrupt Destination and APIC ID”).
The basic algorithm given in Example 13-3 can be modified to provide more robust recovery
techniques. For example, software has the flexibility to attempt recovery using information
unavailable to the hardware. Specifically, the machine-check exception handler can, after
logging, carefully analyze the error-reporting registers when the error-logging routine reports an
error that does not allow execution to be restarted. These recovery techniques can use external
bus related model-specific information provided with the error report to localize the source of
the error within the system and determine the appropriate recovery strategy.
CHAPTER 14
CODE OPTIMIZATION
This chapter describes the more important code optimization techniques for Intel Architecture
processors with and without MMX™ technology, as well as with and without Streaming SIMD
Extensions. The chapter begins with general code-optimization guidelines and continues with a
brief overview of the more important blended techniques for optimizing integer, MMX™ tech-
nology, floating-point, and SIMD floating-point code. A comprehensive discussion of code opti-
mization techniques can be found in the Intel Architecture Optimization Manual, Order Number
242816.
• Pay attention to the branch prediction algorithm for the target processor. This optimization
is particularly important for P6 family processors. Code that optimizes branch predict-
ability will spend fewer clocks fetching instructions.
• Take advantage of the SIMD capabilities of MMX™ technology and Streaming SIMD
Extensions.
• Avoid partial register stalls.
• Align all data.
• Organize code to minimize instruction cache misses and optimize instruction prefetches.
• Schedule code to maximize pairing on Pentium® processors.
• Avoid prefixed opcodes other than 0FH.
• When possible, load and store data to the same area of memory using the same data sizes
and address alignments; that is, avoid small loads after large stores to the same area of
memory, and avoid large loads after small stores to the same area of memory.
• Use software pipelining.
• Always pair CALL and RET (return) instructions.
• Avoid self-modifying code.
• Do not place data in the code segment.
• Calculate store addresses as soon as possible.
• Avoid instructions that contain 4 or more micro-ops or instructions that are more than 7
bytes long. If possible, use instructions that require 1 micro-op.
• Cleanse partial registers before calling callee-save procedures.
• Understand how the compiler handles floating-point code. Look at the assembly dump and
see what transforms are already performed on the program. Study the loop nests in the
application that dominate the execution time.
• Determine why the compiler is not creating the fastest code. For example, look for
dependences that can be resolved by rearranging code.
• Look for and correct situations known to cause slow execution of floating-point code, such
as:
— Large memory bandwidth requirements.
— Poor cache locality.
— Long-latency floating-point arithmetic operations.
• Do not use more precision than is necessary. Single precision (32 bits) is faster on some
operations and consumes only half the memory space of double precision (64 bits) and
less than half that of double extended precision (80 bits).
• Use a library that provides fast floating-point to integer routines. Many library routines do
more work than is necessary.
• Ensure whenever possible that computations stay in range. Out-of-range numbers cause
very high overhead.
• Schedule code in assembly language using the FXCH instruction. When possible, unroll
loops and pipeline code.
• Perform transformations to improve memory access patterns. Use loop fusion or
compression to keep as much of the computation in the cache as possible.
• Break dependency chains.
use them with no iteration. If near full accuracy is needed, use a Newton-Raphson
iteration. If full accuracy is needed, then use divide and square root which provide more
accuracy, but slow down performance.
• Exceptions: mask exceptions to achieve higher performance. Unmasked exceptions may
cause a reduction in the retirement rate.
• Utilize the flush-to-zero mode for higher performance to avoid the penalty of dealing with
denormals and underflows.
• Incorporate the prefetch instruction whenever possible (for details, refer to Chapter 6,
“Optimizing Cache Utilization for Pentium® III processors”).
• Try to emulate conditional moves by masked compares and logicals instead of using
conditional jumps.
• Utilize MMX™ technology instructions if the computations can be done in SIMD-integer
or for shuffling data or copying data that is not used later in SIMD floating-point computa-
tions.
• If the algorithm requires extended precision, then conversion to SIMD floating-point code
is not advised because the SIMD floating-point instructions are single-precision.
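The Newton-Raphson refinement mentioned in the list above can be sketched in scalar C. This is an illustrative sketch only. It assumes that the value being refined is a reciprocal estimate x0 that approximates 1/d; the text does not name the approximation, so that choice is an assumption, and the function name refine_reciprocal is hypothetical. One step computes x1 = x0 * (2 - d * x0), which roughly doubles the number of correct bits in the estimate.

```c
#include <assert.h>

/* One Newton-Raphson step for f(x) = 1/x - d: x1 = x0 * (2 - d * x0). */
float refine_reciprocal(float d, float x0)
{
    assert(d != 0.0f);
    return x0 * (2.0f - d * x0);
}
```

For d = 3 and an initial estimate of 0.3, one step gives about 0.33 and a second step gives about 0.3333. Each step costs two multiplications and one subtraction, which is the trade-off against using a full-accuracy divide.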
cmp A, B ; condition
jge L30 ; conditional branch
mov ebx, CONST1
jmp L31 ; unconditional branch
L30:
mov ebx, CONST2
L31:
By replacing the JGE instruction in the previous example with a SETcc instruction, the EBX
register can be set to either CONST1 or CONST2 without branching, as shown in the
following code:
xor ebx, ebx ; clear ebx
cmp A, B
setge bl ; ebx = 1 if A >= B, otherwise 0
; (or use the complement condition)
dec ebx ; ebx = 00...00 if A >= B, otherwise 11...11
and ebx, (CONST1-CONST2) ; ebx = 0 or (CONST1-CONST2)
add ebx, CONST2 ; ebx = CONST2 or CONST1
The optimized code sets register EBX to 0, then compares A and B. If A is greater than or equal
to B, then EBX is set to 1. EBX is then decremented and ANDed with the difference of the
constant values. This sets EBX to either 0 (when A is greater than or equal to B) or the
difference CONST1 − CONST2. Adding CONST2 then writes the correct value to EBX: CONST2
when A is greater than or equal to B, and CONST1 otherwise. When CONST2 is equal to zero,
the last instruction can be deleted because the correct value has already been written to EBX.
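The arithmetic behind the branch-free sequence can be checked with a small C model. This sketch is illustrative and the function name select_branchless is hypothetical. It masks the difference CONST1 − CONST2 and then adds CONST2, a formulation that selects the intended constant regardless of which constant is smaller. Unsigned arithmetic is used so that wraparound in the subtraction is well defined. A compiler may or may not translate the comparison into a SETcc instruction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Branch-free selection modeled on the SETcc/DEC/AND/ADD sequence:
 * returns const2 when a >= b and const1 otherwise.
 */
uint32_t select_branchless(int32_t a, int32_t b,
                           uint32_t const1, uint32_t const2)
{
    uint32_t mask = (uint32_t)(a >= b);   /* setge: 1 or 0              */
    mask -= 1u;                           /* dec:   0 or 0xFFFFFFFF     */
    mask &= const1 - const2;              /* and:   0 or const1-const2  */
    return mask + const2;                 /* add:   const2 or const1    */
}
```

For example, select_branchless(5, 3, 10, 20) follows the taken branch of the original code and yields 20, while select_branchless(3, 5, 10, 20) yields 10.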
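When ABS(CONST1-CONST2) is one of {2,3,5,9}, the following example applies:
xor ebx, ebx
cmp A, B
setge bl ; or the complement condition
lea ebx, [ebx*D+ebx+min(CONST1,CONST2)]
where D stands for ABS(CONST1 − CONST2) − 1. Because EBX holds 0 or 1 after the SETcc
instruction, the LEA instruction yields either the smaller constant or the smaller constant plus
ABS(CONST1 − CONST2), which is the larger constant; choose the condition so that the value 1
selects the larger constant.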
A second way to remove branches on P6 family processors is to use the new CMOVcc and
FCMOVcc instructions. The following example shows how to use the CMOVcc instruction to
eliminate the branch from a test and branch instruction sequence. If the test sets the zero flag
(ZF), then the value in register EBX will be moved to register EAX. This branch is data dependent,
and is representative of an unpredictable branch.
test ecx, ecx
jne 1h
mov eax, ebx
1h:
To change the code, the JNE and the MOV instructions are combined into one CMOVcc instruction,
which checks the zero flag. The optimized code is shown below:
test ecx, ecx ; test the flags
cmove eax, ebx ; if the zero flag is set, move ebx to eax
1h:
The label 1h: is no longer needed unless it is the target of another branch instruction. These
instructions will generate invalid opcodes when used on previous generation Intel Architecture
processors. Therefore, use the CPUID instruction to check feature bit 15 of the EDX register,
which when set indicates presence of the CMOVcc family of instructions. Do not use the family
and model codes returned by CPUID to test for the presence of specific features.
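The feature-bit test described above reduces to a single bit check. The following C sketch is a minimal illustration: it takes the EDX feature flags as a parameter and does not execute CPUID itself, so it runs on any machine. The names has_feature and has_cmov are hypothetical. Obtaining the EDX value is left to the caller; with GCC or Clang, one option is the __get_cpuid helper declared in cpuid.h, queried with leaf 1.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_EDX_CMOV_BIT 15u   /* feature bit 15 of EDX, as stated above */

/* Returns 1 when the given feature bit is set in the EDX feature flags. */
int has_feature(uint32_t edx_features, unsigned bit)
{
    assert(bit < 32);
    return (int)((edx_features >> bit) & 1u);
}

/* edx_features: the EDX value returned by CPUID feature enumeration. */
int has_cmov(uint32_t edx_features)
{
    return has_feature(edx_features, CPUID_EDX_CMOV_BIT);
}
```

Code would typically perform this check once at startup and select between a CMOVcc path and a branching fallback.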
Additional information on branch optimization can be found in the Intel Architecture Optimiza-
tion Manual.
Because the P6 family processors can execute code out of order, the instructions need not be
immediately adjacent for the stall to occur. The following example also contains a partial stall:
MOV AL, 8
MOV EDX, 0x40
MOV EDI, new_value
• Align 80-bit data on a 128-bit boundary (that is, any boundary that is a multiple of 16
bytes).
• Align 128-bit SIMD floating-point data on a 128-bit boundary (that is, any boundary that is
a multiple of 16 bytes).
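The boundary rules above can be checked at run time with simple mask arithmetic. The following C sketch is illustrative, with hypothetical function names; it assumes that the alignment is a power of two, which holds for the 16-byte boundary mentioned in the list.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when addr is a multiple of alignment (a power of two). */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Rounds addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

A buffer for 128-bit SIMD floating-point data can be over-allocated by 15 bytes and its start address passed through align_up with an alignment of 16.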
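[Figure: placement of the variables declared as static float a; float b; static float c;. Variable b appears in the stack region; variables a and c appear in the memory region.]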
NOTE
Pairing of instructions improves Pentium® processor performance signifi-
cantly. It does not slow and sometimes improves the performance of P6
family processors.
The following subsections describe the Pentium® processor pairing rules for integer, MMX™,
and floating-point instructions. The pairing rules are grouped into types, as follows:
• General pairing rules.
• Integer instruction pairing rules.
• MMX™ instruction pairing rules.
• Floating-point instruction pairing rules.
NOTES:
ALU—Arithmetic or logical instruction such as ADD, SUB, or AND. In general, most simple ALU instructions
are pairable.
imm—Immediate.
reg—Register.
mem—Memory location.
r/m—Register or memory location.
acc—Accumulator (EAX or AX register).
— Prefixed instructions.
— Shift with immediate instructions.
• PV instructions—The following instructions when issued to the V-pipe can be paired with
a suitable instruction in the U-Pipe. The simple control transfer instructions, such as the
CALL near, JMP near, or Jcc instructions, can execute in either the U-pipe or the V-pipe,
but they can be paired with other instructions only when they are in the V-pipe. Since these
instructions change the instruction pointer (EIP), they cannot pair in the U-pipe since the
next instruction may not be adjacent. The PV instructions include both Jcc short and Jcc
near (which have a 0FH prefix) versions of the Jcc instruction.
Unpairability Due to Register Dependencies
Instruction pairing is also affected by instruction operands. The following instruction pairings
will not result in parallel execution because of register contention. Exceptions to these rules are
given in “Special Pairs”, in Section 14.5.1.2., “Integer Pairing Rules”.
• Flow Dependence—The first instruction writes to a register that the second one reads
from, as in the following example:
mov eax, 8
mov [ebp], eax
• Output Dependence—Both instructions write to the same register, as in the following
example.
mov eax, 8
mov eax, [ebp]
This output dependence limitation does not apply to a pair of instructions that write to the
EFLAGS register (for example, two ALU operations that change the condition codes). The
condition code after the paired instructions execute will have the condition from the V-pipe
instruction.
Note that a pair of instructions in which the first reads a register and the second writes to the
same register (anti-dependence) may be paired, as in the following example:
mov eax, ebx
mov ebx, [ebp]
For purposes of determining register contention, a reference to a byte or word register is treated
as a reference to the containing 32-bit register. Therefore, the following instruction pair does not
execute in parallel because of output dependencies on the contents of the EAX register.
mov al, 1
mov ah, 0
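The register-contention rules above can be expressed as a check over register sets. The following C sketch is a simplified model, not a complete pairing predicate: it covers only flow and output dependences, it does not track the EFLAGS register, and it ignores the special pairs. The type and function names are hypothetical. Byte and word registers are mapped to their containing 32-bit register, as the text requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per 32-bit register; AL, AH, and AX all map to REG_EAX. */
enum { REG_EAX = 1 << 0, REG_EBX = 1 << 1, REG_ECX = 1 << 2,
       REG_EDX = 1 << 3, REG_EBP = 1 << 4 };

struct insn_regs {
    uint32_t reads;    /* registers read, including address registers */
    uint32_t writes;   /* registers written (EFLAGS is not tracked)   */
};

/* Returns 1 if register contention alone does not prevent pairing. */
int regs_allow_pairing(const struct insn_regs *u, const struct insn_regs *v)
{
    assert(u != NULL && v != NULL);
    if (u->writes & v->reads)
        return 0;                  /* flow dependence   */
    if (u->writes & v->writes)
        return 0;                  /* output dependence */
    return 1;                      /* anti-dependence is allowed */
}
```

With this model, the pair mov al, 1 and mov ah, 0 is rejected because both instructions are recorded as writing REG_EAX.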
Special Pairs
Some integer instructions can be paired in spite of the previously described general integer-
instruction rules. These special pairs overcome register dependencies, and most involve implicit
reads/writes to the ESP register or implicit writes to the condition codes:
• Stack Pointer.
push reg/imm ; push reg/imm
push reg/imm ; call
pop reg ; pop reg
• Condition Codes.
cmp ; jcc
add ; jne
Note that the special pairs that consist of PUSH/POP instructions may have only immediate or
register operands, not memory operands.
Restrictions On Pair Execution
Some integer-instruction pairs may be issued simultaneously but will not execute in parallel:
• Data-Cache Conflict—If both instructions access the same data-cache memory bank then
the second request (V-pipe) must wait for the first request to complete. A bank conflict
occurs when bits 2 through 4 of the two physical addresses are the same. A bank conflict
results in a 1-clock penalty on the V-pipe instruction.
• Inter-Pipe Concurrency—Parallel execution of integer instruction pairs preserves memory-
access ordering. A multiclock instruction in the U-pipe will execute alone until its last
memory access.
For example, the following instructions add the contents of the register and the value at the
memory location, then put the result in the register. An add with a memory operand takes 2
clocks to execute. The first clock loads the value from the data cache, and the second clock
performs the addition. Since there is only one memory access in the U-pipe instruction, the add
in the V-pipe can start in the same clock.
U-pipe              V-pipe
add eax, mem1       add ebx, mem2       ; 1
(add)               (add)               ; 2   2-cycle
The following instructions add the contents of the register to the memory location and store the
result at the memory location. An add with a memory result takes 3 clocks to execute. The first
clock loads the value, the second performs the addition, and the third stores the result. When
paired, the last clock of the U-pipe instruction overlaps with the first clock of the V-pipe instruc-
tion execution.
U-pipe              V-pipe
add mem1, eax                           ; 1
(add)                                   ; 2
(add)               add mem2, ebx       ; 3
                    (add)               ; 4
                    (add)               ; 5
No other instructions may begin execution until the instructions already executing have
completed.
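The data-cache bank conflict condition given under the restrictions on pair execution (bits 2 through 4 of the two physical addresses are the same) can be tested with an exclusive-OR and a mask. The following C sketch is illustrative and its names are hypothetical; it takes physical addresses as plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_MASK 0x1Cu   /* bits 2 through 4 of the physical address */

/* Returns 1 when two accesses map to the same data-cache bank. */
int bank_conflict(uint32_t phys_u, uint32_t phys_v)
{
    return ((phys_u ^ phys_v) & BANK_MASK) == 0;
}
```

Two accesses whose addresses differ by a multiple of 32 bytes therefore conflict, while accesses to adjacent doublewords do not.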
To expose the opportunities for scheduling and pairing, it is better to issue a sequence of simple
instructions rather than a complex instruction that takes the same number of clocks. The simple
instruction sequence can take advantage of more issue slots. The load/store style code genera-
tion requires more registers and increases code size. This impacts Intel486™ processor perfor-
mance, although only as a second order effect. To compensate for the extra registers needed,
extra effort should be put into register allocation and instruction scheduling so that extra regis-
ters are only used when parallelism increases.
• The U-pipe integer instruction is a pairable U-pipe integer instruction (see Table 14-2).
Pairing an MMX™ Instruction in the U-Pipe with an Integer Instruction in the V-Pipe
Use the following guidelines for pairing an MMX™ instruction in the U-pipe and an integer
instruction in the V-pipe:
• The U-pipe MMX™ instruction does not access either memory or a general-purpose
register.
• The V-pipe instruction is a pairable integer V-pipe instruction (see Table 14-2).
bility of the P6 family processors, stalls will not necessarily occur on an instruction or micro-op
basis. However, if an instruction has a very long latency such as an FDIV, then scheduling can
improve the throughput of the overall application. The following sections list considerations for
floating-point pipelining on Pentium® processors.
Pairing of Floating-Point Instructions
In a Pentium® processor, pairing floating-point instructions with one another (with one excep-
tion) does not result in a performance enhancement because the processor has only one floating-
point unit (FPU). However, some floating-point instructions can be paired with integer instruc-
tions or the FXCH instruction to improve execution times. The following are some general
pairing rules and restrictions for floating-point instructions:
• All floating-point instructions can be executed in the V-pipe and paired with suitable
instructions (generally integer instructions) in the U-pipe.
• The only floating-point instruction that can be executed in the U-pipe is the FXCH
instruction. The FXCH instruction, if executed in the U-pipe can be paired with another
floating-point instruction executing in the V-pipe.
• The floating-point instructions FSCALE, FLDCW, and FST cannot be paired with any
instruction (integer instruction or the FXCH instruction).
Using Integer Instructions to Hide Latencies and Schedule Floating-Point Instructions
When a floating-point instruction depends on the result of the immediately preceding instruc-
tion, and that instruction is also a floating-point instruction, performance can be improved by
placing one or more integer instructions between the two floating-point instructions. This is true
even if the integer instructions perform loop control. The following example restructures a loop
in this manner:
for (i=0; i<Size; i++)
array1 [i] += array2 [i];
; assume eax=Size-1, esi=array1, edi=array2
PENTIUM(R) PROCESSOR CLOCKS
LoopEntryPoint:
fld real4 ptr [esi+eax*4] ; 2 - AGI
fadd real4 ptr [edi+eax*4] ;1
fstp real4 ptr [esi+eax*4] ; 5 - waits for fadd
dec eax ;1
jnz LoopEntryPoint
; assume eax=Size-1, esi=array1, edi=array2
jmp LoopEntryPoint
Align 16
TopOfLoop:
fstp real4 ptr [esi+eax*4+4] ; 4 - waits for fadd + AGI
LoopEntryPoint:
fld real4 ptr [esi+eax*4] ;1
Integer instructions generally overlap with the floating-point operations except when the last
floating-point operation was FXCH. In this case there is a 1 clock delay:
U-pipe              V-pipe
fadd                fxch                ; 1
                                        ; 2 fxch delay
mov eax, 1          inc edx
Using the FILD and FADDP instructions in place of FIADD yields 2 free clocks for executing
other instructions.
FSTSW Instruction
The FSTSW instruction that usually appears after a floating-point comparison instruction
(FCOM, FCOMP, FCOMPP) delays for 3 clocks. Other instructions may be inserted after the
comparison instruction to hide this latency. On the P6 family processors the FCMOVcc instruc-
tion can be used instead.
Transcendental Instructions
Transcendental instructions execute in the U-pipe and nothing can be overlapped with them, so
an integer instruction following a transcendental instruction will wait until the previous instruc-
tion completes.
Transcendental instructions execute on the Pentium® processor (and later Intel Architecture
processors) much faster than the software emulations of these instructions found in most math
libraries. Therefore, it may be worthwhile in-lining transcendental instructions in place of math
library calls to transcendental functions. Software emulations of transcendental instructions will
execute faster than the equivalent instructions only if accuracy is sacrificed.
FXCH Guidelines
The FXCH instruction costs no extra clocks on the Pentium® processor when all of the following
conditions occur, allowing the instruction to execute in the V-pipe in parallel with another
floating-point instruction executing in the U-pipe:
• A floating-point instruction follows the FXCH instruction.
• A floating-point instruction from the following list immediately precedes the FXCH
instruction: FADD, FSUB, FMUL, FLD, FCOM, FUCOM, FCHS, FTST, FABS, or FDIV.
• An FXCH instruction has already been executed. This is because the instruction boundaries
in the cache are marked the first time the instruction is executed, so pairing only happens
the second time this instruction is executed from the cache.
When the above conditions are true, the instruction is almost “free” and can be used to access
elements in the deeper levels of the floating-point stack instead of storing them and then loading
them again.
The macro instructions entering the decoder travel through the pipe in order; therefore, if a
macro instruction will not fit in the next available decoder then the instruction must wait until
the next clock to be decoded. It is possible to schedule instructions for the decoder such that the
instructions in the in-order pipeline are less likely to be stalled.
Consider the following examples:
• If the next available decoder for a multimicro-op instruction is not decoder 0, the
multimicro-op instruction will wait for decoder 0 to be available, usually in the next clock,
leaving the other decoders empty during the current clock. Hence, the following two
instructions will take 2 clocks to decode.
add eax, ecx ; 1 uop instruction (decoder 0)
add edx, [ebx] ; 2 uop instruction (stall 1 cycle wait till
; decoder 0 is available)
• During the beginning of the decoding clock, if two consecutive instructions are more than
1 micro-op, decoder 0 will decode one instruction and the next instruction will not be
decoded until the next clock.
add eax, [ebx] ; 2 uop instruction (decoder 0)
mov ecx, [eax] ; 2 uop instruction (stall 1 cycle to wait until
; decoder 0 is available)
add ebx, 8 ; 1 uop instruction (decoder 1)
Instructions of the opcode reg, mem form produce two micro-ops: the load from memory and
the operation micro-op. Scheduling for the decoder template (4-1-1) can improve the decoding
throughput of your application.
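The decoding behavior described above can be approximated with a small model. The following C sketch is a simplification, not a description of the actual decoder hardware: it assumes decoder 0 accepts any instruction of up to 4 micro-ops, decoders 1 and 2 accept only single micro-op instructions, and instructions are decoded strictly in program order. The function name decode_clocks is hypothetical.

```c
#include <stddef.h>

/* Simplified model of the 4-1-1 decoder template (an assumption for
 * illustration): decoder 0 takes the next instruction in program order;
 * decoders 1 and 2 can take only single micro-op instructions.
 * Instructions of more than 4 micro-ops are not modeled.
 * uops[i] is the micro-op count of instruction i. */
static unsigned decode_clocks(const unsigned *uops, size_t count)
{
    unsigned clocks = 0;
    size_t next = 0;

    while (next < count) {
        clocks++;
        next++;                     /* decoder 0 takes one instruction */
        for (int slot = 1; slot <= 2 && next < count; slot++) {
            if (uops[next] != 1)
                break;              /* must wait for decoder 0 next clock */
            next++;                 /* decoder 1 or 2 takes it */
        }
    }
    return clocks;
}
```

Under this model, the sequence {1, 2} from the first example needs 2 clocks, while a sequence ordered as {2, 1, 1} decodes in a single clock.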
In general, the opcode reg, mem forms of instructions are used to reduce register pressure in code
that is not memory bound, and when the data is in the cache. Use simple instructions for
improved speed on the Pentium® and P6 family processors.
The following rules should be observed while using the opcode reg, mem instruction on
Pentium® processors with MMX™ technology:
• Schedule for minimal stalls in the Pentium® processor pipe. Use as many simple instruc-
tions as possible. Generally, 32-bit assembly code that is well optimized for the Pentium®
processor pipeline will execute well on the P6 family processors.
• When scheduling for Pentium® processors, keep in mind the primary stall conditions and
decoder (4-1-1) template on the P6 family processors, as shown in the example below.
pmaddwd mm6, [ebx] ; 2 uops instruction (decoder 0)
paddd mm7, mm6 ; 1 uop instruction (decoder 1)
add ebx, 8 ; 1 uop instruction (decoder 2)
Instructions / 2 < MemoryAccesses + NonMMXInstructions / 2
For memory-bound MMX™ code, Intel recommends merging loads whenever the same
memory address is used more than once to reduce memory accesses. For example, the following
code sequence can be sped up by using a MOVQ instruction in place of the opcode reg,
mem forms of the MMX™ instructions:
OPCODE MM0, [address A]
OPCODE MM1, [address A]
; optimized by use of a MOVQ instruction and opcode reg, reg forms
; of the MMX(TM) instructions
MOVQ MM2, [address A]
OPCODE MM0, MM2
OPCODE MM1, MM2
Another alternative is to incorporate the prefetch instruction introduced in the Pentium® III
processor. Prefetching the data preloads the cache prior to actually needing the data. Proper use
of prefetch can improve performance if the application is not memory bandwidth bound or the
data does not already fit into cache. For more information on proper usage of the prefetch
instruction see the Intel Architecture Optimization Manual order number 245127-001.
For MMX™ code that is not memory-bound, load merging is recommended only if the same
memory address is used more than twice. Where load merging is not possible, usage of the
opcode reg, mem instructions is recommended to minimize instruction count and code size. For
example, the following code sequence can be shortened by removing the MOVQ instruction and
using an opcode reg, mem form of the MMX™ instruction:
MOVQ mm0, [address A]
OPCODE mm1, mm0
; optimized by removing the MOVQ instruction and using an
; opcode reg, mem form of the MMX(TM) instructions
OPCODE mm1, [address A]
In many cases, a MOVQ reg, reg and opcode reg, mem can be replaced by a MOVQ reg, mem and
the opcode reg, reg. This should be done where possible, since it saves one micro-op on the
Pentium® II and Pentium® III processors. The following example is one where the opcode is a
symmetric operation:
MOVQ mm1, mm0 (1 micro-op)
OPCODE mm1, [address A] (2 micro-ops)
One clock can be saved by rewriting the code as follows:
MOVQ mm1, [address A] (1 micro-op)
OPCODE mm1, mm0 (1 micro-op)
These transformations, in general, increase the number of instructions required to perform the
desired operation. For the Pentium® II and Pentium® III processors, the performance penalty due
to the increased number of instructions is more than offset by the number of clocks saved. For
the Pentium® processor with MMX™ technology, however, the increased number of instruc-
tions can negatively impact performance. For this reason, careful and efficient coding of these
transformations is necessary to minimize any potential negative impact to Pentium® processor
performance.
NOTE
This is a very simplistic example used only to demonstrate cache effects.
Many other optimizations are possible in this code.
boolean array[max];
for(i=2;i<max;i++) {
    array[i] = 1;
}
for(i=2;i<max;i++) {
    if( array[i] ) {
        for(j=i+i;j<max;j+=i) {
            array[j] = 0; /* here we assign memory to 0, causing
                             the cache line fetch within the j
                             loop */
        }
    }
}
Two optimizations are available for this specific example:
• Optimization 1: In this example, “boolean” is a “char” type, so the array is a “char” array. It may well be
better to make the “boolean” array into an array of bits, thereby reducing the size of the
array, which in turn reduces the number of cache line fetches. The array is packed so that
read-modify-writes are done (since the cache protocol makes every read into a read-
modify-write). Unfortunately, in this example, the vast majority of strides are greater than
256 bits (one cache line of bits), so the performance increase is not significant.
• Optimization 2: Another optimization is to check if the value is already zero before
writing (as shown in the following example), thereby reducing the number of writes to
memory (dirty cache lines).
boolean array[max];
for(i=2;i<max;i++) {
    array[i] = 1;
}
for(i=2;i<max;i++) {
    if( array[i] ) {
        for(j=i+i;j<max;j+=i) {
            if( array[j] != 0 ) { /* check to see if value is
                                     already 0 */
                array[j] = 0;
            }
        }
    }
}
The external bus activity is reduced by half because most of the time in the Sieve program the
data is already zero. By checking first, you need only 1 burst bus cycle for the read and you save
the burst bus cycle for every line you do not write. The actual write back of the modified line is
no longer needed, therefore saving the extra cycles.
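Both optimizations can be combined. The following C sketch is one possible arrangement, not the manual's own code: it packs the flags into 32-bit words (Optimization 1) and tests each bit before clearing it (Optimization 2). The function name sieve_count_primes and the choice to return a prime count are assumptions made so the result can be checked.

```c
#include <stdint.h>
#include <stdlib.h>

/* Counts the primes below max using a bit-packed sieve.
 * Returns -1 if the bit array cannot be allocated. */
static long sieve_count_primes(uint32_t max)
{
    if (max <= 2)
        return 0;                           /* no primes below 2 */

    size_t words = ((size_t)max + 31) / 32;
    uint32_t *bits = malloc(words * sizeof *bits);
    if (bits == NULL)
        return -1;
    for (size_t w = 0; w < words; w++)
        bits[w] = 0xFFFFFFFFu;              /* mark every number as candidate */

    long count = 0;
    for (uint32_t i = 2; i < max; i++) {
        if (!(bits[i / 32] & (1u << (i % 32))))
            continue;                       /* already crossed out */
        count++;
        for (uint64_t j = (uint64_t)i + i; j < max; j += i) {
            uint32_t mask = 1u << (j % 32);
            if (bits[j / 32] & mask)        /* write only if still set */
                bits[j / 32] &= ~mask;
        }
    }
    free(bits);
    return count;
}
```

Skipping the store when the bit is already clear avoids dirtying the cache line, which is the effect the text attributes to the reduced bus activity.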
NOTE
This operation benefits the P6 family processors, but it may not enhance the
performance of Pentium® processors. As such, it should not be considered
generic.
[Figure: AGI penalty. Pipeline diagram showing successive instructions advancing through the
PF, D1, D2, E, and WB stages, with an AGI stall inserted.]
Note that some instructions have implicit reads/writes to registers. Instructions that generate
addresses implicitly through ESP (such as PUSH, POP, RET, CALL) also suffer from the AGI
penalty, as shown in the following example:
sub esp, 24
; 1 clock cycle stall
push ebx
mov esp, ebp
; 1 clock cycle stall
pop ebp
The PUSH and POP instructions also implicitly write to the ESP register. These writes, however,
do not cause an AGI when the next instruction addresses through the ESP register. Pentium®
processors “rename” the ESP register from PUSH and POP instructions to avoid the AGI
penalty (see the following example):
push edi ; no stall
mov ebx, [esp]
On Pentium® processors, instructions that include both an immediate and a displacement field
are pairable in the U-pipe. When it is necessary to use constants, it is usually more efficient to
use immediate data instead of loading the constant into a register first. If the same immediate
data is used more than once, however, it is faster to load the constant in a register and then use
the register multiple times, as illustrated in the following example:
mov result, 555 ; 555 is immediate, result is
; displacement
mov word ptr [esp+4], 1 ; 1 is immediate, 4 is displacement
Since MMX™ instructions have 2-byte opcodes (0FH opcode map), any MMX™ instruction
that uses base or index addressing with a 4-byte displacement to access memory will have a
length of 8 bytes. Instructions over 7 bytes can slow macro instruction decoding and should be
avoided where possible. It is often possible to reduce the size of such instructions by adding the
displacement value to the value in the base or index register, thus removing the displacement field.
It is recommended that, whenever possible, prefixed instructions not be used or that they be
scheduled behind instructions which themselves stall the pipe for some other reason.
so, the MOVZX instruction is a better choice for the P6 family processors than the
alternative sequences.
• PUSH Mem. The PUSH mem instruction takes 4 clocks for the Intel486™ processor. It is
recommended that the following sequence be used in place of a PUSH mem instruction
because it takes only 2 clocks for the Intel486™ processor and increases pairing
opportunity for the Pentium® processor.
mov reg, mem
push reg
• Short Opcodes. Use 1 byte long instructions as much as possible. This will reduce code
size and help increase instruction density in the instruction cache. The most common
example is using the INC and DEC instructions rather than adding or subtracting the
constant 1 with an ADD or SUB instruction. Another common example is using the PUSH
and POP instructions instead of the equivalent sequence.
• 8/16 Bit Operands. With 8-bit operands, try to use the byte opcodes, rather than using 32-
bit operations on sign and zero extended bytes. Prefixes for operand size override apply to
16-bit operands, not to 8-bit operands.
Sign Extension is usually quite expensive. Often, the semantics can be maintained by
zero extending 16-bit operands. Specifically, the C code in the following example does
not need sign extension nor does it need prefixes for operand size overrides.
static short int a, b;
if (a==b) {
...
}
Code for comparing these 16-bit operands might be:
U Pipe V Pipe
xor eax, eax xor ebx, ebx ;1
Of course, this can only be done under certain circumstances, but the circumstances
tend to be quite common. This would not work if the compare was for greater than, less
than, greater than or equal, and so on, or if the values in EAX or EBX were to be used
in another operation where sign extension was required.
The P6 family processors provide special support for the XOR reg, reg instruction
where both operands point to the same register, recognizing that clearing a register does
not depend on the old value of the register. Additionally, special support is provided for
the above specific code sequence to avoid the partial stall.
In routines that do not call other routines (leaf routines), use ESP as the base register
to free up EBP. If you are not using the 32-bit flat model, remember that EBP cannot
be used as a general purpose base register because it references the stack segment.
• Avoid Compares with Immediate Zero. Often when a value is compared with zero, the
operation producing the value sets condition codes that can be tested directly by a Jcc
instruction. The most notable exceptions are the MOV and LEA instructions. In these
cases, use the TEST instruction.
• Epilog Sequence. If only 4 bytes were allocated in the stack frame for the current function,
instead of incrementing the stack pointer by 4, use POP instructions to prevent AGIs. For
the Pentium® processor, use two pops for eight bytes.
15
Debugging and
Performance
Monitoring
CHAPTER 15
DEBUGGING AND PERFORMANCE MONITORING
The Intel Architecture provides extensive debugging facilities for use in debugging code and
monitoring code execution and processor performance. These facilities are valuable for debug-
ging applications software, system software, and multitasking operating systems.
The debugging support is accessed through the debug registers (DR0 through DR7) and two
model-specific registers (MSRs). The debug registers of the Intel Architecture processors hold
the addresses of memory and I/O locations, called breakpoints. Breakpoints are user-selected
locations in a program, a data-storage area in memory, or specific I/O ports where a programmer
or system designer wishes to halt execution of a program and examine the state of the processor
by invoking debugger software. A debug exception (#DB) is generated when a memory or I/O
access is made to one of these breakpoint addresses. A breakpoint is specified for a particular
form of memory or I/O access, such as a memory read and/or write operation or an I/O read
and/or write operation. The debug registers support both instruction breakpoints and data break-
points. The MSRs (which were introduced into the Intel Architecture in the P6 family proces-
sors) monitor branches, interrupts, and exceptions and record the addresses of the last branch,
interrupt or exception taken and the last branch taken before an interrupt or exception.
[Figure: Debug registers]
DR7 (debug control): LEN3 (bits 31-30), R/W3 (29-28), LEN2 (27-26), R/W2 (25-24),
LEN1 (23-22), R/W1 (21-20), LEN0 (19-18), R/W0 (17-16), reserved 0 (15-14), GD (13),
reserved 0 (12-11), reserved 1 (10), GE (9), LE (8), G3 (7), L3 (6), G2 (5), L2 (4),
G1 (3), L1 (2), G0 (1), L0 (0).
DR6 (debug status): reserved, set to 1 (31-16), BT (15), BS (14), BD (13), reserved 0 (12),
reserved, set to 1 (11-4), B3 (3), B2 (2), B1 (1), B0 (0).
DR5 and DR4: reserved.
DR3 through DR0 (bits 31-0): breakpoint linear addresses.
The primary function of the debug registers is to set up and monitor from 1 to 4 breakpoints,
numbered 0 through 3. For each breakpoint, the following information can be specified and
detected with the debug registers:
• The linear address where the breakpoint is to occur.
• The length of the breakpoint location (1, 2, or 4 bytes).
• The operation that must be performed at the address for a debug exception to be generated.
• Whether the breakpoint is enabled.
• Whether the breakpoint condition was present when the debug exception was generated.
The following paragraphs describe the functions of flags and fields in the debug registers.
currently executing on the processor.) The processor clears the GD flag upon
entering the debug exception handler, to allow the handler access to the
debug registers.
R/W0 through R/W3 (read/write) fields (bits 16, 17, 20, 21, 24, 25, 28, and 29)
Specifies the breakpoint condition for the corresponding breakpoint. The DE
(debug extensions) flag in control register CR4 determines how the bits in the
R/Wn fields are interpreted. When the DE flag is set, the processor interprets
these bits as follows:
00—Break on instruction execution only.
01—Break on data writes only.
10—Break on I/O reads or writes.
11—Break on data reads or writes but not instruction fetches.
When the DE flag is clear, the processor interprets the R/Wn bits the same as
for the Intel386™ and Intel486™ processors, which is as follows:
00—Break on instruction execution only.
01—Break on data writes only.
10—Undefined.
11—Break on data reads or writes but not instruction fetches.
LEN0 through LEN3 (Length) fields (bits 18, 19, 22, 23, 26, 27, 30, and 31)
Specify the size of the memory location at the address specified in the corre-
sponding breakpoint address register (DR0 through DR3). These fields are
interpreted as follows:
00—1-byte length
01—2-byte length
10—Undefined
11—4-byte length
If the corresponding R/Wn field in register DR7 is 00 (instruction execution),
then the LENn field should also be 00. The effect of using any other length is
undefined. Refer to Section 15.2.5., “Breakpoint Field Recognition” for further
information on the use of these fields.
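The field encodings above can be combined into a DR7 value as sketched below in C. The bit positions for the R/Wn and LENn fields come from the field descriptions; the Ln and Gn enable flags are taken to be bits 2n and 2n+1. The function and enumerator names are hypothetical, and actually loading DR7 requires a MOV to the debug register at privilege level 0, which is not shown.

```c
#include <stdint.h>

/* R/Wn encodings (DR7_RW_IO is meaningful only when CR4.DE is set). */
enum dr7_rw  { DR7_RW_EXECUTE = 0, DR7_RW_WRITE = 1,
               DR7_RW_IO = 2, DR7_RW_READ_WRITE = 3 };
/* LENn encodings (10 is undefined and therefore omitted). */
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/* Returns dr7 with breakpoint n (0 through 3) reprogrammed. */
static uint32_t dr7_set_breakpoint(uint32_t dr7, unsigned n,
                                   enum dr7_rw rw, enum dr7_len len,
                                   int local, int global)
{
    unsigned field = 16 + 4 * n;            /* R/Wn at field, LENn at field+2 */

    dr7 &= ~(0xFu << field);                /* clear R/Wn and LENn */
    dr7 &= ~(0x3u << (2 * n));              /* clear Ln and Gn */
    dr7 |= (uint32_t)rw << field;
    dr7 |= (uint32_t)len << (field + 2);
    if (local)
        dr7 |= 1u << (2 * n);               /* Ln */
    if (global)
        dr7 |= 1u << (2 * n + 1);           /* Gn */
    return dr7;
}
```

For an instruction breakpoint, the caller would pass DR7_RW_EXECUTE together with DR7_LEN_1, since any other length is undefined for that case.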
A data breakpoint for reading or writing data is triggered if any of the bytes participating in an
access is within the range defined by a breakpoint address register and its LENn field. Table 15-1
gives an example setup of the debug registers and the data accesses that would subsequently trap
or not trap on the breakpoints.
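The matching rule can be expressed as a range-overlap test. The following C sketch is illustrative only: it assumes the breakpoint range starts at the breakpoint address with its low bits masked according to the length (so a 2-byte breakpoint covers an aligned word and a 4-byte breakpoint an aligned doubleword). The function name data_breakpoint_hit is hypothetical.

```c
#include <stdint.h>

/* bp_len is the decoded LENn length in bytes (1, 2, or 4).
 * Returns nonzero if any byte of the access falls inside the
 * breakpoint range. Assumption: the low address bits of bp_addr
 * are masked to the breakpoint length. */
static int data_breakpoint_hit(uint32_t bp_addr, uint32_t bp_len,
                               uint32_t access_addr, uint32_t access_size)
{
    uint64_t bp_start  = bp_addr & ~(bp_len - 1);
    uint64_t bp_end    = bp_start + bp_len;            /* exclusive */
    uint64_t acc_start = access_addr;
    uint64_t acc_end   = acc_start + access_size;      /* exclusive */

    return acc_start < bp_end && bp_start < acc_end;
}
```

With a 2-byte breakpoint at A0002H, a 1-byte access at A0001H does not match, while a 2-byte access at A0001H does because its second byte lies in the range.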
A data breakpoint for an unaligned operand can be constructed using two breakpoints, where
each breakpoint is byte-aligned, and the two breakpoints together cover the operand. These
breakpoints generate exceptions only for the operand, not for any neighboring bytes.
Instruction breakpoint addresses must have a length specification of 1 byte (the LENn field is
set to 00). The behavior of code breakpoints for other operand sizes is undefined. The processor
recognizes an instruction breakpoint address only when it points to the first byte of an instruc-
tion. If the instruction has any prefixes, the breakpoint address must point to the first prefix.
sections describe how these exceptions are generated and typical exception handler operations
for handling these exceptions.
for the breakpoint. Instruction breakpoints are the highest priority debug exceptions and are
guaranteed to be serviced before any other exceptions that may be detected during the decoding
or execution of an instruction.
Because the debug exception for an instruction breakpoint is generated before the instruction is
executed, if the instruction breakpoint is not removed by the exception handler, the processor
will detect the instruction breakpoint again when the instruction is restarted and generate another
debug exception. To prevent looping on an instruction breakpoint, the Intel Architecture
provides the RF flag (resume flag) in the EFLAGS register (refer to Section 2.3., “System Flags
and Fields in the EFLAGS Register” in Chapter 2, System Architecture Overview). When the RF
flag is set, the processor ignores instruction breakpoints.
All Intel Architecture processors manage the RF flag as follows. The processor sets the RF flag
automatically prior to calling an exception handler for any fault-class exception except a debug
exception that was generated in response to an instruction breakpoint. For debug exceptions
resulting from instruction breakpoints, the processor does not set the RF flag prior to calling the
debug exception handler. The debug exception handler then has the option of disabling the
instruction breakpoint or setting the RF flag in the EFLAGS image on the stack. If the RF flag
in the EFLAGS image is set when the processor returns from the exception handler, it is copied
into the RF flag in the EFLAGS register by the IRETD or task switch instruction that causes the
return. The processor then ignores instruction breakpoints for the duration of the next instruc-
tion. (Note that the POPF, POPFD, and IRET instructions do not transfer the RF image into the
EFLAGS register.) Setting the RF flag does not prevent other types of debug-exception condi-
tions (such as, I/O or data breakpoints) from being detected, nor does it prevent nondebug excep-
tions from being generated. After the instruction is successfully executed, the processor clears
the RF flag in the EFLAGS register, except after an IRETD instruction or after a JMP, CALL,
or INT n instruction that causes a task switch. (Note that the processor also does not set the RF
flag when calling exception or interrupt handlers for trap-class exceptions, for hardware inter-
rupts, or for software-generated interrupts.)
For the Pentium® processor, when an instruction breakpoint coincides with another fault-type
exception (such as a page fault), the processor may generate one spurious debug exception after
the second exception has been handled, even though the debug exception handler set the RF flag
in the EFLAGS image. To prevent this spurious exception with Pentium® processors, all fault-
class exception handlers should set the RF flag in the EFLAGS image.
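As a minimal sketch of the handler-side step described above, the following C fragment sets the RF flag (bit 16) in the saved EFLAGS image. The struct fault_frame layout is an assumption for illustration: it models a 32-bit exception frame without an error code and without a privilege-level change, where EIP, CS, and EFLAGS sit at ascending stack addresses. The names are hypothetical.

```c
#include <stdint.h>

#define EFLAGS_RF (1u << 16)    /* resume flag */

/* Assumed view of the stack on handler entry (no error code,
 * no stack switch): lowest address first. */
struct fault_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* Called by a handler that leaves the instruction breakpoint enabled;
 * the IRETD that ends the handler then copies RF into EFLAGS. */
static void frame_set_resume_flag(struct fault_frame *frame)
{
    frame->eflags |= EFLAGS_RF;
}
```

A handler that removes or disables the breakpoint instead would not need this step.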
The Intel486™ and later Intel Architecture processors ignore the GE and LE flags in DR7. In
the Intel386™ processor, exact data breakpoint matching does not occur unless it is enabled by
setting the LE and/or the GE flags.
The P6 family processors, however, are unable to report data breakpoints exactly for the REP
MOVS and REP STOS instructions until the completion of the iteration after the iteration in
which the breakpoint occurred.
For repeated INS and OUTS instructions that generate an I/O-breakpoint debug exception, the
processor generates the exception after the completion of the first iteration. Repeated INS and
OUTS instructions generate a memory-breakpoint debug exception after the iteration in which the
memory address breakpoint location is accessed.
[Figure: Debug control MSR layout]
Reserved (bits 31-7), TR (6), PB3 (5), PB2 (4), PB1 (3), PB0 (2), BTF (1), LBR (0).
following the branch instruction. If the processor faults on the branch, the
address stored in the LastBranchFromIP MSR is again the address of the
branch instruction and that same address is sent out on the bus.
the control transfer to calculate the linear address to be placed in the breakpoint-address regis-
ters. The segment base address can be determined by reading the segment selector for the code
segment from the stack and using it to locate the segment descriptor for the segment in the GDT
or LDT. The segment base address can then be read from the segment descriptor.
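The calculation can be sketched in C as follows. This is an illustration under stated assumptions, not handler code from the manual: the 8-byte descriptor is assumed to have already been copied from the GDT or LDT into a byte array, and the function names are hypothetical. The base address is scattered across descriptor bytes 2, 3, 4, and 7.

```c
#include <stdint.h>

/* Descriptor table index held in a segment selector (bits 15-3). */
static unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}

/* Table indicator (bit 2): 0 selects the GDT, 1 selects the LDT. */
static int selector_uses_ldt(uint16_t selector)
{
    return (selector >> 2) & 1;
}

/* Reassembles the 32-bit segment base from an 8-byte descriptor. */
static uint32_t descriptor_base(const uint8_t desc[8])
{
    return (uint32_t)desc[2]
         | (uint32_t)desc[3] << 8
         | (uint32_t)desc[4] << 16
         | (uint32_t)desc[7] << 24;
}

/* Linear address = segment base + offset from the recorded address. */
static uint32_t linear_address(const uint8_t desc[8], uint32_t offset)
{
    return descriptor_base(desc) + offset;
}
```

The resulting linear address is the value a handler would place in a breakpoint-address register.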
Before resuming program execution from a debug-exception handler, the handler should set the
LBR flag again to re-enable last branch and last exception/interrupt recording.
is incremented every processor clock cycle, even when the processor is halted by the HLT
instruction or the external STPCLK# pin.
The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotoni-
cally increasing unique value whenever executed, except for 64-bit counter wraparound. Intel
guarantees, architecturally, that the time-stamp counter frequency and configuration will be such
that it will not wraparound within 10 years after being reset to 0. The period for counter wrap is
several thousands of years in the Pentium® and P6 family processors.
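The wraparound period follows directly from the counter width and the clock rate. The C sketch below computes it for a 64-bit counter; the function name tsc_wrap_years and the 365.25-day year are assumptions made for illustration.

```c
/* Years until a 64-bit counter incremented at clock_hz wraps to 0. */
static double tsc_wrap_years(double clock_hz)
{
    const double counts = 18446744073709551616.0;      /* 2^64 */
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;

    return counts / clock_hz / seconds_per_year;
}
```

At an assumed clock rate of 100 MHz, for example, the result is roughly 5,800 years.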
Normally, the RDTSC instruction can be executed by programs and procedures running at any
privilege level and in virtual-8086 mode. The TSD flag in control register CR4 (bit 2) allows
use of this instruction to be restricted to only programs and procedures running at privilege level
0. A secure operating system would set the TSD flag during system initialization to disable user
access to the time-stamp counter. An operating system that disables user access to the time-
stamp counter should emulate the instruction through a user-accessible programming interface.
The RDTSC instruction is not serializing or ordered with other instructions. Thus, it does not
necessarily wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the RDTSC instruction operation
is performed.
The RDMSR and WRMSR instructions can read and write the time-stamp counter, respectively,
as a model-specific register (TSC). The ability to read and write the time-stamp counter with the
RDMSR and WRMSR instructions is not an architectural feature, and may not be supported by
future Intel Architecture processors. Writing to the time-stamp counter with the WRMSR
instruction resets the count. Only the low-order 32 bits of the time-stamp counter can be written
to; the high-order 32 bits are 0 extended (cleared to all 0s).
number of processor clocks that occur while a specified condition is true. The counters can count
events or measure durations that occur at any privilege level. Table A-1 in Appendix A, Perfor-
mance-Monitoring Events lists the events that can be counted with the P6 family performance
monitoring counters.
The performance-monitoring counters are supported by four MSRs: the performance event
select MSRs (PerfEvtSel0 and PerfEvtSel1) and the performance counter MSRs (PerfCtr0 and
PerfCtr1). These registers can be read from and written to using the RDMSR and WRMSR
instructions, respectively. They can be accessed using these instructions only when operating at
privilege level 0. The PerfCtr0 and PerfCtr1 MSRs can be read from any privilege level using
the RDPMC (read performance-monitoring counters) instruction.
NOTE
The PerfEvtSel0, PerfEvtSel1, PerfCtr0, and PerfCtr1 MSRs and the events
listed in Table A-1 in Appendix A, Performance-Monitoring Events are
model-specific for P6 family processors. They are not guaranteed to be
available in future Intel Architecture processors.
[Figure: PerfEvtSel0 and PerfEvtSel1 MSR layout]
Counter Mask (bits 31-24), INV (23), EN (22), reserved (21), INT (20), PC (19), E (18),
OS (17), USR (16), Unit Mask (15-8), Event Select (7-0).
counted during a single cycle. If the event count is greater than or equal to this
mask, the counter is incremented by one. Otherwise the counter is not incre-
mented. This mask can be used to count events only if multiple occurrences
happen per clock (for example, two or more instructions retired per clock). If
the counter-mask field is 0, then the counter is incremented each cycle by the
number of events that occurred that cycle.
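The counter-mask rule can be modeled per clock cycle, as in the C sketch below. It is a simplified illustration: the effect of the INV flag is not modeled, the function and macro names are hypothetical, and the field positions used by perfevtsel_pack are those of the P6 family PerfEvtSel registers (event select in bits 7-0, unit mask in bits 15-8, counter mask in bits 31-24).

```c
#include <stdint.h>

#define PERFEVTSEL_USR (1u << 16)   /* count at privilege levels 1-3 */
#define PERFEVTSEL_OS  (1u << 17)   /* count at privilege level 0 */
#define PERFEVTSEL_EN  (1u << 22)   /* enable counters */

/* Builds a PerfEvtSel value from its fields. */
static uint32_t perfevtsel_pack(uint8_t event, uint8_t unit_mask,
                                uint32_t flags, uint8_t counter_mask)
{
    return (uint32_t)event
         | (uint32_t)unit_mask << 8
         | flags
         | (uint32_t)counter_mask << 24;
}

/* Returns the counter value after one clock cycle in which
 * events_this_cycle occurrences of the selected event were seen. */
static uint64_t counter_step(uint64_t counter, unsigned events_this_cycle,
                             unsigned counter_mask)
{
    if (counter_mask == 0)
        return counter + events_this_cycle;     /* count every event */
    if (events_this_cycle >= counter_mask)
        return counter + 1;                     /* threshold met: +1 */
    return counter;                             /* threshold not met */
}
```

With a counter mask of 2, a cycle in which three events occur advances the counter by one, and a cycle with a single event leaves it unchanged.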
• Stop counters.
• Read the event counters.
• Read the time-stamp counter.
The event monitor feature determination procedure must determine whether the current
processor supports the performance-monitoring counters and time-stamp counter. This proce-
dure compares the family and model of the processor returned by the CPUID instruction with
those of processors known to support performance monitoring. (The Pentium® and P6 family
processors support performance counters.) The procedure also checks the MSR and TSC flags
returned to register EDX by the CPUID instruction to determine if the MSRs and the RDTSC
instruction are supported.
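The checks in the feature determination procedure can be sketched in C as shown below. The sketch assumes the caller has already executed CPUID with EAX set to 1 and passes in the returned EAX and EDX values; how CPUID is invoked is compiler specific and not shown. The function names are hypothetical. The TSC and MSR flags are bits 4 and 5 of EDX, and the family and model are bits 11-8 and 7-4 of EAX.

```c
#include <stdint.h>

#define CPUID_EDX_TSC (1u << 4)     /* RDTSC supported */
#define CPUID_EDX_MSR (1u << 5)     /* RDMSR and WRMSR supported */

static unsigned cpuid_family(uint32_t eax) { return (eax >> 8) & 0xF; }
static unsigned cpuid_model(uint32_t eax)  { return (eax >> 4) & 0xF; }

/* Nonzero if both the MSRs and the RDTSC instruction are available. */
static int supports_msr_and_tsc(uint32_t edx)
{
    uint32_t needed = CPUID_EDX_TSC | CPUID_EDX_MSR;
    return (edx & needed) == needed;
}

/* Nonzero for the families named in the text: Pentium (family 5)
 * and P6 (family 6). A real utility would also compare the model. */
static int family_has_perf_counters(uint32_t eax)
{
    unsigned family = cpuid_family(eax);
    return family == 5 || family == 6;
}
```

An event monitor utility would enable its counter procedures only when both checks pass.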
The initialize and start counters procedure sets the PerfEvtSel0 and/or PerfEvtSel1 MSRs for
the events to be counted and the method used to count them and initializes the counter MSRs
(PerfCtr0 and PerfCtr1) to starting counts. The stop counters procedure stops the performance
counters. (Refer to Section 15.6.1.3., “Starting and Stopping the Performance-Monitoring
Counters” for more information about starting and stopping the counters.)
The read counters procedure reads the values in the PerfCtr0 and PerfCtr1 MSRs, and a read
time-stamp counter procedure reads the time-stamp counter. These procedures would be
provided in lieu of enabling the RDTSC and RDPMC instructions that allow application code
to read the counters.
• Reset the counter to its initial setting and return from the interrupt.
An event monitor application utility or another application program can read the information
collected for analysis of the performance of the profiled application.
NOTE
The CESR, CTR0, and CTR1 MSRs and the events listed in Table A-1 in
Appendix A, Performance-Monitoring Events are model-specific for the
Pentium® processor.
CCn Meaning
000 Count nothing (counter disabled)
001 Count the selected event while CPL is 0, 1, or 2
010 Count the selected event while CPL is 3
011 Count the selected event regardless of CPL
100 Count nothing (counter disabled)
101 Count clocks (duration) while CPL is 0, 1, or 2
110 Count clocks (duration) while CPL is 3
111 Count clocks (duration) regardless of CPL
Note that the highest order bit selects between counting events and counting
clocks (duration); the middle bit enables counting when the CPL is 3; and the
low-order bit enables counting when the CPL is 0, 1, or 2.
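The bit roles described above can be decoded as in the following C sketch. The struct and function names are hypothetical; the logic simply restates the table, with bit 2 selecting clocks, bit 1 enabling counting at CPL 3, and bit 0 enabling counting at CPL 0, 1, or 2.

```c
/* Decoded view of a 3-bit CCn field. */
struct cc_setting {
    int enabled;        /* nonzero if the counter counts at any CPL */
    int count_clocks;   /* 1 = count clocks (duration), 0 = count events */
    int at_cpl3;        /* counting enabled while CPL is 3 */
    int at_cpl012;      /* counting enabled while CPL is 0, 1, or 2 */
};

static struct cc_setting decode_cc(unsigned cc)
{
    struct cc_setting s;

    s.count_clocks = (cc >> 2) & 1;
    s.at_cpl3      = (cc >> 1) & 1;
    s.at_cpl012    = cc & 1;
    s.enabled      = s.at_cpl3 || s.at_cpl012;  /* 000 and 100 count nothing */
    return s;
}
```

For example, the value 101 decodes to a clock count taken only while the CPL is 0, 1, or 2.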
[Figure: CESR MSR layout]
Reserved (bits 31-26), PC1 (25), CC1 (24-22), ES1 (21-16), reserved (15-10), PC0 (9),
CC0 (8-6), ES0 (5-0).
PC1: Pin control 1
CC1: Counter control 1
ES1: Event select 1
PC0: Pin control 0
CC0: Counter control 0
ES0: Event select 0
the pins can only indicate that the event occurred. Moreover, since the internal clock frequency
may be higher than the external clock frequency, a single external clock may correspond to
multiple internal clocks.
A “count up to” function may be provided when the event pin is programmed to signal an over-
flow of the counter. Because the counters are 40 bits, a carry out of bit 39 indicates an overflow.
A counter may be preset to a specific value less than 2^40 − 1. After the counter has been enabled
and the prescribed number of events has transpired, the counter will overflow. Approximately 5
clocks later, the overflow is indicated externally and appropriate action, such as signaling an
interrupt, may then be taken.
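The preset value for a “count up to” function can be derived as sketched below in C. The sketch assumes that overflow means a carry out of bit 39, that is, the count reaching 2^40, and that each event advances the counter by one. The function names are hypothetical.

```c
#include <stdint.h>

#define COUNTER_MODULUS (1ULL << 40)    /* counters are 40 bits wide */

/* Preset that makes the counter overflow on the events-th event.
 * events must be in the range 1 through 2^40 - 1. */
static uint64_t preset_for_overflow_after(uint64_t events)
{
    return COUNTER_MODULUS - events;
}

/* Nonzero if a counter starting at preset has carried out of bit 39
 * after the given number of events. */
static int counter_has_overflowed(uint64_t preset, uint64_t events)
{
    return preset + events >= COUNTER_MODULUS;
}
```

To signal after 1000 events, for example, the counter would be preset to 2^40 − 1000 (FFFFFFFC18H).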
The PM0/BP0 and PM1/BP1 pins also serve to indicate breakpoint matches during in-circuit
emulation, during which time the counter increment or overflow function of these pins is not
available. After RESET, the PM0/BP0 and PM1/BP1 pins are configured for performance
monitoring; however, a hardware debugger may reconfigure these pins to indicate breakpoint
matches.
16
8086 Emulation
8086 EMULATION
CHAPTER 16
8086 EMULATION
Intel Architecture processors (beginning with the Intel386™ processor) provide two ways to
execute new or legacy programs that are assembled and/or compiled to run on an Intel 8086
processor:
• Real-address mode.
• Virtual-8086 mode.
Figure 2-2 in Chapter 2, System Architecture Overview shows the relationship of these operating
modes to protected mode and system management mode (SMM).
When the processor is powered up or reset, it is placed in the real-address mode. This operating
mode almost exactly duplicates the execution environment of the Intel 8086 processor, with
some extensions. Virtually any program assembled and/or compiled to run on an Intel 8086
processor will run on an Intel Architecture processor in this mode.
When running in protected mode, the processor can be switched to virtual-8086 mode to run
8086 programs. This mode also duplicates the execution environment of the Intel 8086
processor, with extensions. In virtual-8086 mode, an 8086 program runs as a separate protected-
mode task. Legacy 8086 programs are thus able to run under an operating system (such as
Microsoft Windows*) that takes advantage of protected mode and to use protected-mode facil-
ities, such as the protected-mode interrupt- and exception-handling facilities. Protected-mode
multitasking permits multiple virtual-8086 mode tasks (with each task running a separate 8086
program) to be run on the processor along with other nonvirtual-8086 mode tasks.
This section describes both the basic real-address mode execution environment and the virtual-
8086-mode execution environment, available on the Intel Architecture processors beginning
with the Intel386™ processor.
• The processor supports a nominal 1-MByte physical address space (refer to Section
16.1.1., “Address Translation in Real-Address Mode” for specific details). This address
space is divided into segments, each of which can be up to 64 KBytes in length. The base
of a segment is specified with a 16-bit segment selector, which is shifted left by 4 bits to form a
20-bit base address (an offset from address 0 in the address space). An operand within a segment is
addressed with a 16-bit offset from the base of the segment. A physical address is thus
formed by adding the offset to the 20-bit segment base (refer to Section 16.1.1., “Address
Translation in Real-Address Mode”).
• All operands in “native 8086 code” are 8-bit or 16-bit values. (Operand size override
prefixes can be used to access 32-bit operands.)
• Eight 16-bit general-purpose registers are provided: AX, BX, CX, DX, SP, BP, SI, and DI.
The extended 32-bit registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI) are
accessible to programs that explicitly perform a size override operation.
• Four segment registers are provided: CS, DS, SS, and ES. (The FS and GS registers are
accessible to programs that explicitly access them.) The CS register contains the segment
selector for the code segment; the DS and ES registers contain segment selectors for data
segments; and the SS register contains the segment selector for the stack segment.
• The 8086 16-bit instruction pointer (IP) is mapped to the lower 16 bits of the EIP register.
Note that EIP is a 32-bit register, so unintentional address wrapping may occur.
• The 16-bit FLAGS register contains status and control flags. (This register is mapped to
the 16 least significant bits of the 32-bit EFLAGS register.)
• All of the Intel 8086 instructions are supported (refer to Section 16.1.3., “Instructions
Supported in Real-Address Mode”).
• A single, 16-bit-wide stack is provided for handling procedure calls and invocations of
interrupt and exception handlers. This stack is contained in the stack segment identified
with the SS register. The SP (stack pointer) register contains an offset into the stack
segment. The stack grows down (toward lower segment offsets) from the stack pointer.
The BP (base pointer) register also contains an offset into the stack segment that can be
used as a pointer to a parameter list. When a CALL instruction is executed, the processor
pushes the current instruction pointer (the 16 least-significant bits of the EIP register and,
on far calls, the current value of the CS register) onto the stack. On a return, initiated with
a RET instruction, the processor pops the saved instruction pointer from the stack into the
EIP register (and CS register on far returns). When an implicit call to an interrupt or
exception handler is executed, the processor pushes the EIP, CS, and EFLAGS (low-order
16 bits only) registers onto the stack. On a return from an interrupt or exception handler,
initiated with an IRET instruction, the processor pops the saved instruction pointer and
EFLAGS image from the stack into the EIP, CS, and EFLAGS registers.
• A single interrupt table, called the “interrupt vector table” or “interrupt table,” is provided
for handling interrupts and exceptions (refer to Figure 16-2). The interrupt table (which has
4-byte entries) takes the place of the interrupt descriptor table (IDT, with 8-byte entries)
used when handling protected-mode interrupts and exceptions. Interrupt and exception
vector numbers provide an index to entries in the interrupt table. Each entry provides a
pointer (called a “vector”) to an interrupt- or exception-handling procedure. Refer to
Section 16.1.4., “Interrupt and Exception Handling” for more details. It is possible for
software to relocate the IDT by means of the LIDT instruction on Intel Architecture
processors beginning with the Intel386™ processor.
• The floating-point unit (FPU) is active and available to execute FPU instructions in real-
address mode. Programs written to run on the Intel 8087 and Intel 287 math coprocessors
can be run in real-address mode without modification.
The following extensions to the Intel 8086 execution environment are available in the Intel
Architecture’s real-address mode. If backwards compatibility to Intel 286 and Intel 8086 proces-
sors is required, these features should not be used in new programs written to run in real-address
mode.
• Two additional segment registers (FS and GS) are available.
• Many of the integer and system instructions that have been added to P6-family processors
can be executed in real-address mode (refer to Section 16.1.3., “Instructions Supported in
Real-Address Mode”).
• The 32-bit operand prefix can be used in real-address mode programs to execute the 32-bit
forms of instructions. This prefix also allows real-address mode programs to use the
processor’s 32-bit general-purpose registers.
• The 32-bit address prefix can be used in real-address mode programs, allowing 32-bit
offsets.
The following sections describe address formation, registers, available instructions, and inter-
rupt and exception handling in real-address mode. For information on I/O in real-address mode,
refer to Chapter 9, Input/Output, in the Intel Architecture Software Developer’s Manual, Volume
1.
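[Figure: real-address mode address translation. The 16-bit segment selector occupies bits 19 through 4
of a 20-bit base (bits 3 through 0 are 0); the 16-bit offset occupies bits 15 through 0 (bits 19
through 16 are 0); their sum is the 20-bit linear address (bits 19 through 0).]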
The Intel Architecture processors beginning with the Intel386™ processor can generate 32-bit
offsets using an address override prefix; however, in real-address mode, the value of a 32-bit
offset may not exceed FFFFH without causing an exception.
For full compatibility with Intel 286 real-address mode, pseudo-protection faults (interrupt 12
or 13) occur if a 32-bit offset is generated outside the range 0 through FFFFH.
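The address formation described above can be sketched as follows. This is a minimal illustrative model in Python, not processor code; the function names `real_mode_linear` and `wrap_to_8086` are hypothetical. The second function models the 1-MByte wraparound of the Intel 8086 processor, which has only 20 address bits.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Form a linear address from a 16-bit segment selector and a 16-bit offset."""
    if not 0 <= segment <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset   # selector shifted left 4 bits, plus offset


def wrap_to_8086(linear: int) -> int:
    """Truncate a linear address to 20 bits, as the Intel 8086 processor does."""
    return linear & 0xFFFFF
```

With a selector of FFFFH and an offset of FFFFH, the sum is 10FFEFH, the highest address that can be formed this way; truncated to 20 bits, it becomes 0FFEFH.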
• BOUND instruction.
• CPU identification (CPUID) instruction.
• System instructions CLTS, INVD, WBINVD, INVLPG, LGDT, SGDT, LIDT, SIDT,
LMSW, SMSW, RDMSR, WRMSR, RDTSC, and RDPMC.
Execution of any of the other Intel Architecture instructions (not given in the previous two lists)
in real-address mode results in an invalid-opcode exception (#UD) being generated.
backward compatibility to Intel 8086 processors, the default base address and limit of the inter-
rupt vector table should not be changed.)
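[Figure 16-2: interrupt vector table in real-address mode. Starting at the base address held in the
IDTR, the table holds 4-byte entries: interrupt vector 0 at byte offset 0, entry 1 at offset 4,
entry 2 at offset 8, entry 3 at offset 12, and so on up to entry 255. Each entry consists of a
16-bit offset (at byte 0 of the entry) followed by a 16-bit segment selector (at byte 2).]
* Interrupt vector number 0 selects entry 0 (called “interrupt vector 0”) in the interrupt
vector table. Interrupt vector 0 in turn points to the start of the interrupt handler
for interrupt 0.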
Table 16-1 shows the interrupt and exception vectors that can be generated in real-address mode
and virtual-8086 mode, and in the Intel 8086 processor. Refer to Chapter 5, Interrupt and Excep-
tion Handling for a description of the exception conditions.
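The lookup of a handler through the real-address mode interrupt table can be sketched as follows. This is an illustrative model in Python only; the names `read_vector` and `handler_linear_address` are hypothetical, the `memory` argument stands in for the first bytes of the address space, and `table_base` defaults to 0 on the assumption that the table has not been relocated.

```python
import struct


def read_vector(memory, vector: int, table_base: int = 0):
    """Return the (segment, offset) pair stored in the 4-byte entry for `vector`."""
    if not 0 <= vector <= 255:
        raise ValueError("vector number must be 0 through 255")
    entry = table_base + 4 * vector
    # Each entry holds a 16-bit offset followed by a 16-bit segment selector,
    # both stored little-endian.
    offset, segment = struct.unpack_from("<HH", memory, entry)
    return segment, offset


def handler_linear_address(memory, vector: int, table_base: int = 0) -> int:
    """Compute the linear address of the handler that the entry points to."""
    segment, offset = read_vector(memory, vector, table_base)
    return (segment << 4) + offset
```

For example, an entry for vector 21H that contains offset 0100H and selector F000H points to a handler at linear address F0100H.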
NOTE:
* In the real-address mode, vector 13 is the segment overrun exception. In protected and virtual-8086
modes, this exception covers all general-protection error conditions, including traps to the virtual-8086
monitor from virtual-8086 mode.
The TSS of the new task must be a 32-bit TSS, not a 16-bit TSS, because the 16-bit TSS does
not load the most-significant word of the EFLAGS register, which contains the VM flag. All
TSSs, stacks, data, and code used to handle exceptions when in virtual-8086 mode must also be
32-bit segments.
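The reason a 16-bit TSS cannot be used can be illustrated with a short model. This is a sketch in Python, not a description of the processor's internal operation; the function name `load_eflags_from_tss` is hypothetical. It relies on the VM flag being bit 17 of the EFLAGS register, which places it in the most-significant word.

```python
VM_FLAG = 1 << 17   # EFLAGS.VM lies in the upper word of EFLAGS


def load_eflags_from_tss(saved_eflags: int, tss_is_32_bit: bool) -> int:
    """Model the EFLAGS value that results from loading a TSS image."""
    if tss_is_32_bit:
        return saved_eflags & 0xFFFFFFFF
    # A 16-bit TSS holds only the low word (FLAGS); the upper word,
    # and with it the VM flag, is loaded as 0.
    return saved_eflags & 0xFFFF
```

In this model, an image with VM set keeps the flag only when it is loaded from a 32-bit TSS.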
The processor enters virtual-8086 mode to run the 8086 program and returns to protected mode
to run the virtual-8086 monitor.
The virtual-8086 monitor is a 32-bit protected-mode code module that runs at a CPL of 0. The
monitor consists of initialization, interrupt- and exception-handling, and I/O emulation proce-
dures that emulate a personal computer or other 8086-based platform. Typically, the monitor is
either part of or closely associated with the protected-mode general-protection (#GP) exception
handler, which also runs at a CPL of 0. As with any protected-mode code module, code-segment
descriptors for the virtual-8086 monitor must exist in the GDT or in the task’s LDT. The virtual-
8086 monitor also may need data-segment descriptors so it can examine the IDT or other parts
of the 8086 program in the first 1 MByte of the address space. The linear addresses above
10FFEFH are available for the monitor, the operating system, and other system software.
The 8086 operating-system services consist of a kernel and/or operating-system procedures
that the 8086 program makes calls to. These services can be implemented in either of the
following two ways:
• They can be included in the 8086 program. This approach is desirable for either of the
following reasons:
— The 8086 program code modifies the 8086 operating-system services.
— There is not sufficient development time to merge the 8086 operating-system services
into the main operating system or executive.
• They can be implemented or emulated in the virtual-8086 monitor. This approach is
desirable for any of the following reasons:
— The 8086 operating-system procedures can be more easily coordinated among several
virtual-8086 tasks.
— Memory can be saved by not duplicating 8086 operating-system procedure code for
several virtual-8086 tasks.
— The 8086 operating-system procedures can be easily emulated by calls to the main
operating system or executive.
The approach chosen for implementing the 8086 operating-system services may result in
different virtual-8086-mode tasks using different 8086 operating-system services.
is used, it is transparent to the program running in virtual-8086 mode just as it is for any task
running on the processor.
Paging is not necessary for a single virtual-8086-mode task, but paging is useful or necessary in
the following situations:
• When running multiple virtual-8086-mode tasks. Here, paging allows the lower 1 MByte
of the linear address space for each virtual-8086-mode task to be mapped to a different
physical address location.
• When emulating the 8086 address-wraparound that occurs at 1 MByte. When using 8086-
style address translation, it is possible to specify addresses larger than 1 MByte. These
addresses automatically wrap around in the Intel 8086 processor (refer to Section 16.1.1.,
“Address Translation in Real-Address Mode”). If any 8086 programs depend on address
wraparound, the same effect can be achieved in a virtual-8086-mode task by mapping the
linear addresses between 100000H and 110000H and linear addresses between 0 and
10000H to the same physical addresses.
• When sharing the 8086 operating-system services or ROM code that is common to several
8086 programs running as different 8086-mode tasks.
• When redirecting or trapping references to memory-mapped I/O devices.
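The page mapping that emulates the 1-MByte address wraparound can be sketched as follows. This is an illustrative model in Python that uses a dictionary in place of real page tables; the names `build_wraparound_map` and `translate` and the `phys_base` parameter are hypothetical, and a 4-KByte page size is assumed.

```python
PAGE_SIZE = 0x1000                     # 4-KByte pages (assumed)
ONE_MBYTE_PAGES = 0x100000 // PAGE_SIZE


def build_wraparound_map(phys_base: int) -> dict:
    """Map linear pages 0 through 10FH so that 100000H-10FFFFH aliases 0-FFFFH."""
    if phys_base % PAGE_SIZE:
        raise ValueError("physical base must be page aligned")
    page_map = {}
    for linear_page in range(0x110000 // PAGE_SIZE):
        wrapped_page = linear_page % ONE_MBYTE_PAGES   # pages above 1 MByte wrap
        page_map[linear_page] = phys_base + wrapped_page * PAGE_SIZE
    return page_map


def translate(page_map: dict, linear: int) -> int:
    """Translate a linear address to a physical address through the map."""
    return page_map[linear // PAGE_SIZE] + linear % PAGE_SIZE
```

Giving each virtual-8086-mode task a different `phys_base` also models the first situation listed above, in which the lower 1 MByte of each task is mapped to a different physical location.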
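[Figure: state diagram for entering and leaving virtual-8086 mode. Recoverable labels: Real-Address
Mode (real mode code); PE=1; PE=0 or RESET; Task Switch (note 1); VM=0; VM=1; Interrupt or
Exception (note 2); #GP Exception (note 3); IRET (note 4); IRET (note 5); RESET; Virtual-8086 Mode
(virtual-8086 mode tasks running 8086 programs).]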
NOTES:
1. Task switch carried out in either of two ways:
- CALL or JMP where the VM flag in the EFLAGS image is 1.
- IRET where VM is 1 and NT is 1.
2. Hardware interrupt or exception; software interrupt (INT n) when IOPL is 3.
3. General-protection exception caused by software interrupt (INT n), IRET,
POPF, PUSHF, IN, or OUT when IOPL is less than 3.
4. Normal return from protected-mode interrupt or exception handler.
5. A return from the 8086 monitor to redirect an interrupt or exception back
to an interrupt or exception handler in the 8086 program running in virtual-
8086 mode.
6. Internal redirection of a software interrupt (INT n) when VME is 1,
IOPL is <3, and the redirection bit is 1.
When a task switch is used to enter virtual-8086 mode, the TSS for the virtual-8086-mode task
must be a 32-bit TSS. (If the new TSS is a 16-bit TSS, the upper word of the EFLAGS register
is not in the TSS, causing the processor to clear the VM flag when it loads the EFLAGS register.)
The processor updates the VM flag prior to loading the segment registers from their images in
the new TSS. The new setting of the VM flag determines whether the processor interprets the
Intel Architecture processors that incorporate the virtual mode extension (enabled with the
VME flag in control register CR4) are capable of redirecting software-generated interrupts
back to the program’s interrupt handlers without leaving virtual-8086 mode. Refer to
Section 16.3.3.4., “Method 5: Software Interrupt Handling” for more information on this
mechanism.
• A hardware reset initiated by asserting the RESET or INIT pin is a special kind of
interrupt. When a RESET or INIT is signaled while the processor is in virtual-8086 mode,
the processor leaves virtual-8086 mode and enters real-address mode.
• Execution of the HLT instruction in virtual-8086 mode will cause a general-protection
(#GP) fault, which the protected-mode handler generally sends to the virtual-8086 monitor.
The virtual-8086 monitor then determines the correct execution sequence after verifying
that it was entered as a result of a HLT execution.
Refer to Section 16.3., “Interrupt and Exception Handling in Virtual-8086 Mode” for informa-
tion on leaving virtual-8086 mode to handle an interrupt or exception generated in virtual-8086
mode.
In virtual-8086 mode, the interrupts and exceptions are divided into three classes for the
purposes of handling:
• Class 1—All processor-generated exceptions and all hardware interrupts, including the
NMI interrupt and the hardware interrupts sent to the processor’s external interrupt
delivery pins. All class 1 exceptions and interrupts are handled by the protected-mode
exception and interrupt handlers.
• Class 2—Special case for maskable hardware interrupts (Section 5.1.1.2., “Maskable
Hardware Interrupts”, in Chapter 5, Interrupt and Exception Handling) when the virtual
mode extensions are enabled.
• Class 3—All software-generated interrupts, that is, interrupts generated with the INT n
instruction (see footnote 1).
The method the processor uses to handle class 2 and 3 interrupts depends on the setting of the
following flags and fields:
• IOPL field (bits 12 and 13 in the EFLAGS register)—Controls how class 3 software
interrupts are handled when the processor is in virtual-8086 mode (refer to Section 2.3.,
“System Flags and Fields in the EFLAGS Register”, in Chapter 2, System Architecture
Overview). This field also controls the enabling of the VIF and VIP flags in the EFLAGS
register when the VME flag is set. The VIF and VIP flags are provided to assist in the
handling of class 2 maskable hardware interrupts.
• VME flag (bit 0 in control register CR4)—Enables the virtual mode extension for the
processor when set (refer to Section 2.5., “Control Registers”, in Chapter 2, System Archi-
tecture Overview).
• Software interrupt redirection bit map (32 bytes in the TSS, refer to Figure
16-5)—Contains 256 flags that indicate how class 3 software interrupts should be handled
when they occur in virtual-8086 mode. A software interrupt can be directed either to the
interrupt and exception handlers in the currently running 8086 program or to the protected-
mode interrupt and exception handlers.
• The virtual interrupt flag (VIF) and virtual interrupt pending flag (VIP) in the EFLAGS
register—Provide virtual interrupt support for the handling of class 2 maskable
hardware interrupts (refer to Section 16.3.2., “Class 2—Maskable Hardware Interrupt
Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism”).
NOTE
The VME flag, software interrupt redirection bit map, and VIF and VIP flags
are only available in Intel Architecture processors that support the virtual
mode extensions. These extensions were introduced in the Intel Architecture
with the Pentium® processor.
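The flags and fields listed above can be extracted from register images as sketched below. This is an illustrative Python model; the function name `decode_v86_interrupt_controls` is hypothetical. The IOPL and VME bit positions are those given in the text; the positions used for VIF (bit 19) and VIP (bit 20) are not stated in this section and are taken from the EFLAGS register layout.

```python
IOPL_SHIFT = 12          # IOPL field: bits 12 and 13 of EFLAGS
VIF_BIT = 1 << 19        # virtual interrupt flag
VIP_BIT = 1 << 20        # virtual interrupt pending flag
CR4_VME = 1 << 0         # VME flag: bit 0 of control register CR4


def decode_v86_interrupt_controls(eflags: int, cr4: int) -> dict:
    """Return the settings that govern class 2 and class 3 interrupt handling."""
    return {
        "iopl": (eflags >> IOPL_SHIFT) & 0x3,
        "vme": bool(cr4 & CR4_VME),
        "vif": bool(eflags & VIF_BIT),
        "vip": bool(eflags & VIP_BIT),
    }
```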
Footnote 1: The INT 3 instruction is a special case (refer to the description of the INT n instruction in Chapter 3,
Instruction Set Reference, of the Intel Architecture Software Developer’s Manual, Volume 2).
The following sections describe the actions that the processor takes and the possible actions of
interrupt and exception handlers for the three classes of interrupts described in the previous
paragraphs. These sections describe three possible types of interrupt and exception handlers:
• Protected-mode interrupt and exception handlers—These are the handlers that the
processor calls through the protected-mode IDT.
• Virtual-8086 monitor interrupt and exception handlers—These handlers are resident in the
virtual-8086 monitor, and they are commonly accessed through a general-protection
exception (#GP, interrupt 13) that is directed to the protected-mode general-protection
exception handler.
• 8086 program interrupt and exception handlers—These handlers are part of the 8086
program that is running in virtual-8086 mode.
The following sections describe how these handlers are used, depending on the selected class
and method of interrupt and exception handling.
expect values in the segment registers or that return values in the segment registers must
use the register images saved on the stack for privilege level 0.
4. Clears the VM flag in the EFLAGS register.
5. Begins executing the selected interrupt or exception handler.
[Figure: privilege level 0 stack contents, shown for two cases; legible fields in each: Old FS,
Old DS, Old ES, Old SS, Old CS.]
Figure 16-4. Privilege Level 0 Stack After Interrupt or Exception in Virtual-8086 Mode
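A handler's view of the register images saved on the privilege level 0 stack can be sketched as follows. This is an illustrative Python model, and the field order is an assumption: it follows the usual IA-32 layout for an interrupt taken in virtual-8086 mode (from the top of the stack: optional error code, EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, GS, each in a 32-bit slot), which should be checked against Figure 16-4. The name `parse_v86_frame` is hypothetical.

```python
import struct

FIELDS = ("eip", "cs", "eflags", "esp", "ss", "es", "ds", "fs", "gs")
SEGMENT_FIELDS = {"cs", "ss", "es", "ds", "fs", "gs"}


def parse_v86_frame(stack: bytes, has_error_code: bool = False) -> dict:
    """Decode the saved images, starting at the top of the privilege level 0 stack."""
    count = len(FIELDS) + (1 if has_error_code else 0)
    values = struct.unpack_from("<%dI" % count, stack, 0)
    frame = {}
    if has_error_code:
        frame["error_code"] = values[0]
        values = values[1:]
    for name, value in zip(FIELDS, values):
        # Segment register images occupy only the low 16 bits of their slots.
        frame[name] = value & 0xFFFF if name in SEGMENT_FIELDS else value
    return frame
```

A handler that needs the DS value of the interrupted 8086 program would, in this model, read `frame["ds"]` rather than the DS register itself.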
If the interrupt or exception is handled with a protected-mode handler, the handler can return to
the interrupted program in virtual-8086 mode by executing an IRET instruction. This instruction
loads the EFLAGS and segment registers from the images saved in the privilege level 0 stack
(refer to Figure 16-4). A set VM flag in the EFLAGS image causes the processor to switch back
to virtual-8086 mode. The CPL at the time the IRET instruction is executed must be 0, otherwise
the processor does not change the state of the VM flag.
The virtual-8086 monitor runs at privilege level 0, like the protected-mode interrupt and excep-
tion handlers. It is commonly closely tied to the protected-mode general-protection exception
(#GP, vector 13) handler. If the protected-mode interrupt or exception handler calls the virtual-
8086 monitor to handle the interrupt or exception, the return from the virtual-8086 monitor to
the interrupted virtual-8086 mode program requires two return instructions: a RET instruction
to return to the protected-mode handler and an IRET instruction to return to the interrupted
program.
The virtual-8086 monitor has the option of directing the interrupt and exception back to an inter-
rupt or exception handler that is part of the interrupted 8086 program, as described in Section
16.3.1.2., “Handling an Interrupt or Exception With an 8086 Program Interrupt or Exception
Handler”.
7. Execute an IRET instruction to pass control back to the interrupted 8086 program.
Note that if an operating system intends to support all 8086 MS-DOS-based programs, it is
necessary to use the actual 8086 interrupt and exception handlers supplied with the program.
The reason for this is that some programs modify their own interrupt vector table to substitute
(or hook in series) their own specialized interrupt and exception handlers.
These flags provide the virtual-8086 monitor with more efficient control over handling
maskable hardware interrupts that occur during virtual-8086 mode tasks. They also reduce
interrupt-handling overhead by eliminating the need for all IF-related operations (such as PUSHF,
POPF, CLI, and STI instructions) to trap to the virtual-8086 monitor. The purpose and use of
these flags are as follows.
NOTE
The VIF and VIP flags are only available in Intel Architecture processors that
support the virtual mode extensions. These extensions were introduced in the
Intel Architecture with the Pentium® processor. When this mechanism is
either not available or not enabled, maskable hardware interrupts are handled
as class 1 interrupts. Here, if VIF and VIP flags are needed, the virtual-8086
monitor can implement them in software.
Existing 8086 programs commonly set and clear the IF flag in the EFLAGS register to enable
and disable maskable hardware interrupts, respectively; for example, to disable interrupts while
handling another interrupt or an exception. This practice works well in single-task environments,
but can cause problems in multitasking and multiple-processor environments, where it is often
desirable to prevent an application program from having direct control over the handling of
hardware interrupts. When using earlier Intel Architecture processors, this problem was often
solved by creating a virtual IF flag in software. The Intel Architecture processors (beginning
with the Pentium® processor) provide hardware support for this virtual IF flag through the VIF
and VIP flags.
The VIF flag is a virtualized version of the IF flag, which an application program running from
within a virtual-8086 task can use to control the handling of maskable hardware interrupts.
When the VIF flag is enabled, the CLI and STI instructions operate on the VIF flag instead of
the IF flag. When an 8086 program executes the CLI instruction, the processor clears the VIF
flag to request that the virtual-8086 monitor inhibit maskable hardware interrupts from inter-
rupting program execution; when it executes the STI instruction, the processor sets the VIF flag
requesting that the virtual-8086 monitor enable maskable hardware interrupts for the 8086
program. The IF flag itself, which is managed by the operating system, always controls whether
maskable hardware interrupts are actually enabled. Also, if under these circumstances an 8086 program
tries to read or change the IF flag using the PUSHF or POPF instructions, the processor will
read or change the VIF flag instead, leaving IF unchanged.
The VIP flag provides software a means of recording the existence of a deferred (or pending)
maskable hardware interrupt. This flag is read by the processor but never explicitly written by
the processor; it can only be written by software.
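The behavior of the VIF and VIP flags described above can be modeled as follows. This is a simplified Python sketch, not a complete description of the instructions: it models only the substitution of VIF for IF and omits every other effect of CLI, STI, and PUSHF. The function names are hypothetical, and the bit positions (IF bit 9, VIF bit 19, VIP bit 20) are taken from the EFLAGS register layout.

```python
IF_BIT = 1 << 9     # real interrupt-enable flag, managed by the operating system
VIF_BIT = 1 << 19   # virtual interrupt flag
VIP_BIT = 1 << 20   # virtual interrupt pending flag, written only by software


def cli(eflags: int) -> int:
    """CLI in the 8086 program: clears VIF and leaves IF unchanged."""
    return eflags & ~VIF_BIT


def sti(eflags: int) -> int:
    """STI in the 8086 program: sets VIF and leaves IF unchanged."""
    return eflags | VIF_BIT


def flags_image_seen_by_program(eflags: int) -> int:
    """16-bit flags image in which the IF position reflects VIF, not IF."""
    image = eflags & 0xFFFF & ~IF_BIT
    if eflags & VIF_BIT:
        image |= IF_BIT
    return image


def mark_interrupt_pending(eflags: int) -> int:
    """Monitor software records a deferred interrupt by setting VIP."""
    return eflags | VIP_BIT
```

In this model, a program that executes CLI sees interrupts as disabled in its own flags image even though the real IF flag remains set.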
If the IF flag is set and the VIF and VIP flags are enabled, and the processor receives a maskable
hardware interrupt (interrupt vector 0 through 255), the processor performs, and the interrupt
handler software should perform, the following operations:
1. The processor invokes the protected-mode interrupt handler for the interrupt received, as
described in the following steps. These steps are almost identical to those described for
handle the pending interrupt for the virtual-8086 mode task for which the VIF flag is enabled.
Note that this situation can only occur immediately following execution of a POPF or IRET
instruction or upon entering a virtual-8086 mode task through a task switch.
Note that the states of the VIF and VIP flags are not modified in real-address mode or during
transitions between real-address and protected modes.
NOTE
The virtual interrupt mechanism described in this section is also available for
use in protected mode; refer to Section 16.4., “Protected-Mode Virtual
Interrupts”.
NOTE
The software interrupt redirection bit map does not affect hardware-generated
interrupts and exceptions. Hardware-generated interrupts and exceptions are
always handled by the protected-mode interrupt and exception handlers.
NOTE:
* When set to 0, software interrupt is redirected back to the 8086 program interrupt handler; when set to 1,
interrupt is directed to protected-mode handler.
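A lookup in the software interrupt redirection bit map can be sketched as follows. This is an illustrative Python model; the function name `redirect_to_8086_handler` is hypothetical, and the bit ordering (vector n at bit n mod 8 of byte n div 8, with byte 0 at the lowest address) is an assumption that should be checked against Figure 16-5.

```python
def redirect_to_8086_handler(bitmap: bytes, vector: int) -> bool:
    """Return True if the flag for `vector` is 0 (redirect to the 8086 program)."""
    if len(bitmap) != 32:
        raise ValueError("the redirection bit map is 32 bytes (256 flags)")
    if not 0 <= vector <= 255:
        raise ValueError("vector number must be 0 through 255")
    flag = (bitmap[vector // 8] >> (vector % 8)) & 1
    # 0: redirected back to the 8086 program interrupt handler
    # 1: directed to the protected-mode handler
    return flag == 0
```

An operating system might, for example, clear only the flag for the vector that an 8086 operating system uses for its service calls and leave all other flags set.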
Redirecting software interrupts back to the 8086 program potentially speeds up interrupt
handling because a switch back and forth between virtual-8086 mode and protected mode is not
required. This latter interrupt-handling technique is particularly useful for 8086 operating
systems (such as MS-DOS) that use the INT n instruction to call operating system procedures.
The CPUID instruction can be used to verify that the virtual mode extension is implemented on
the processor. Bit 1 of the feature flags register (EDX) indicates the availability of the virtual
mode extension (refer to “CPUID—CPU Identification” in Chapter 3 of the Intel Architecture
Software Developer’s Manual, Volume 2).
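The feature-flag test can be sketched as follows. Python cannot execute the CPUID instruction directly, so this sketch assumes that the EDX value returned by CPUID has already been obtained by other means; the function name `supports_vme` is hypothetical.

```python
CPUID_EDX_VME = 1 << 1   # bit 1 of the feature flags returned in EDX


def supports_vme(feature_flags_edx: int) -> bool:
    """Return True if the feature flags report the virtual mode extension."""
    return bool(feature_flags_edx & CPUID_EDX_VME)
```

System software would normally perform this check before setting the VME flag in control register CR4.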
The following sections describe the six methods (or mechanisms) for handling software inter-
rupts in virtual-8086 mode. Refer to Section 16.3.2., “Class 2—Maskable Hardware Interrupt
Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism” for a description of the
use of the VIF and VIP flags in the EFLAGS register for handling maskable hardware interrupts.
high-order bits are set to 0. The interrupt vector table is assumed to be at linear address 0 of
the current virtual-8086 task.
7. Begins executing the selected interrupt handler.
An IRET instruction at the end of the handler procedure reverses these steps to return program
control to the interrupted 8086 program.
Note that with method 5 handling, a mode switch from virtual-8086 mode to protected mode
does not occur. The processor remains in virtual-8086 mode throughout the interrupt-handling
operation.
The method 5 handling actions are virtually identical to the actions the processor takes when
handling software interrupts in real-address mode. The benefit of using method 5 handling to
access the 8086 program handlers is that it avoids the overhead of methods 2 and 3 handling,
which require first going to the virtual-8086 monitor, then to the 8086 program handler, then
back again to the virtual-8086 monitor, before returning to the interrupted 8086 program (refer
to Section 16.3.1.2., “Handling an Interrupt or Exception With an 8086 Program Interrupt or
Exception Handler”).
NOTE
Methods 1 and 4 handling can handle a software interrupt in a virtual-8086
task with a regular protected-mode handler, but this approach requires all
virtual-8086 tasks to use the same software interrupt handlers, which
generally does not give sufficient latitude to the programs running in the
virtual-8086 tasks, particularly MS-DOS programs.
virtual interrupt) flag in the CR4 register. Setting the PVI flag allows applications running at
privilege level 3 to execute the CLI and STI instructions without causing a general-protection
exception (#GP) or affecting hardware interrupts.
When the PVI flag is set to 1, the CPL is 3, and the IOPL is less than 3, the STI and CLI instruc-
tions set and clear the VIF flag in the EFLAGS register, leaving IF unaffected. In this mode of
operation, an application running in protected mode and at a CPL of 3 can inhibit interrupts in
the same manner as is described in Section 16.3.2., “Class 2—Maskable Hardware Interrupt
Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism” for a virtual-8086
mode task. When the application executes the CLI instruction, the processor clears the VIF flag.
If the processor receives a maskable hardware interrupt when the VIF flag is clear, the processor
invokes the protected-mode interrupt handler. This handler checks the state of the VIF flag in
the EFLAGS register. If the VIF flag is clear (indicating that the active task does not want to
have interrupts handled now), the handler sets the VIP flag in the EFLAGS image on the stack
and returns to the privilege-level 3 application, which continues program execution. When the
application executes a STI instruction to set the VIF flag, the processor automatically invokes
the general-protection exception handler, which can then handle the pending interrupt. After
handling the pending interrupt, the handler typically sets the VIF flag and clears the VIP flag in
the EFLAGS image on the stack and executes a return to the application program. The next time
the processor receives a maskable hardware interrupt, the processor will handle it in the normal
manner for interrupts received while the processor is operating at a CPL of 3.
As with the virtual mode extension (enabled with the VME flag in the CR4 register), the
protected-mode virtual interrupt extension only affects maskable hardware interrupts (interrupt
vectors 32 through 255). NMI interrupts and exceptions are handled in the normal manner.
When protected-mode virtual interrupts are disabled (that is, when the PVI flag in control
register CR4 is set to 0, the CPL is less than 3, or the IOPL value is 3), then the CLI and STI
instructions execute in a manner compatible with the Intel486™ processor. That is, if the CPL
is greater (less privileged) than the I/O privilege level (IOPL), a general-protection exception
occurs. If the IOPL value is 3, CLI and STI clear or set the IF flag, respectively.
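The conditions described above can be summarized in a small decision function. This is an illustrative Python sketch, not a full model of the instructions: it reports only which flag a CLI or STI instruction executed in protected mode would affect, and it does not model the general-protection exception raised when STI is executed while an interrupt is pending. The function name `cli_sti_effect` is hypothetical.

```python
def cli_sti_effect(pvi: bool, cpl: int, iopl: int) -> str:
    """Return 'VIF', 'IF', or '#GP' for CLI or STI executed in protected mode."""
    if not (0 <= cpl <= 3 and 0 <= iopl <= 3):
        raise ValueError("CPL and IOPL must be 0 through 3")
    if pvi and cpl == 3 and iopl < 3:
        return "VIF"    # protected-mode virtual interrupts are in effect
    if cpl > iopl:
        return "#GP"    # less privileged than the I/O privilege level
    return "IF"         # CPL is at or above the required privilege
```

For example, with PVI set, a CPL of 3, and an IOPL of 0, the instructions operate on VIF; with PVI clear and the same privilege levels, they cause a general-protection exception.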
PUSHF, POPF, and IRET are executed as in the Intel486™ processor, regardless of whether
protected-mode virtual interrupts are enabled.
It is only possible to enter virtual-8086 mode through a task switch or the execution of an IRET
instruction, and it is only possible to leave virtual-8086 mode by faulting to a protected-mode
interrupt handler (typically the general-protection exception handler, which in turn calls the
virtual-8086 monitor). In both cases, the EFLAGS register is saved and restored. This is
not true, however, in protected mode when the PVI flag is set and the processor is not in virtual-
8086 mode. Here, it is possible to call a procedure at a different privilege level, in which case
the EFLAGS register is not saved or modified. However, the states of VIF and VIP flags are
never examined by the processor when the CPL is not 3.
CHAPTER 17
MIXING 16-BIT AND 32-BIT CODE
Program modules written to run on Intel Architecture processors can be either 16-bit modules
or 32-bit modules. Table 17-1 shows the characteristics of 16-bit and 32-bit modules.
The Intel Architecture processors function most efficiently when executing 32-bit program
modules. They can, however, also execute 16-bit program modules, in any of the following
ways:
• In real-address mode.
• In virtual-8086 mode.
• In system management mode (SMM).
• As a protected-mode task, when the code, data, and stack segments for the task are all
configured as 16-bit segments.
• By integrating 16-bit and 32-bit segments into a single protected-mode task.
• By integrating 16-bit operations into 32-bit code segments.
Real-address mode, virtual-8086 mode, and SMM are native 16-bit modes. A legacy program
assembled and/or compiled to run on an Intel 8086 or Intel 286 processor should run in real-
address mode or virtual-8086 mode without modification. Sixteen-bit program modules can also
be written to run in real-address mode for handling system initialization or to run in SMM for
handling system management functions. Refer to Chapter 16, 8086 Emulation for detailed infor-
mation on real-address mode and virtual-8086 mode; refer to Chapter 12, System Management
Mode (SMM) for information on SMM.
This chapter describes how to integrate 16-bit program modules with 32-bit program modules
when operating in protected mode and how to mix 16-bit and 32-bit code within 32-bit code
segments.
These prefixes reverse the default size selected by the D flag in the code-segment descriptor. For
example, the processor can interpret the (MOV mem, reg) instruction in any of four ways:
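The four interpretations follow from treating each prefix as an independent toggle on the default chosen by the D flag. The following is a minimal sketch of that rule, not processor documentation; the function name and its parameters are hypothetical, and Python is used only as a convenient model.

```python
def effective_sizes(d_flag, operand_prefix=False, address_prefix=False):
    """Return (operand_bits, address_bits) for a single instruction.

    d_flag models the D flag of the current code-segment descriptor
    (True selects 32-bit defaults, False selects 16-bit defaults).
    Each prefix reverses the default for its own attribute only.
    """
    default_bits = 32 if d_flag else 16
    reversed_bits = 16 if d_flag else 32
    operand_bits = reversed_bits if operand_prefix else default_bits
    address_bits = reversed_bits if address_prefix else default_bits
    return operand_bits, address_bits


# Enumerate the four ways MOV mem, reg can be interpreted when D = 1.
for op_prefix in (False, True):
    for addr_prefix in (False, True):
        sizes = effective_sizes(True, op_prefix, addr_prefix)
        print(op_prefix, addr_prefix, sizes)
```

With the D flag clear, the same function yields the mirror-image set of combinations.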
A stack that spans less than 64 KBytes can be shared by both 16- and 32-bit code segments. This
class of stacks includes:
• Stacks in expand-up segments with the G (granularity) and B (big) flags in the stack-
segment descriptor clear.
• Stacks in expand-down segments with the G and B flags clear.
• Stacks in expand-up segments with the G flag set and the B flag clear and where the stack
is contained completely within the lower 64 KBytes. (Offsets greater than FFFFH can be
used for data, other than the stack, which is not shared.)
Refer to Section 3.4.3., “Segment Descriptors” in Chapter 3, Protected-Mode Memory Manage-
ment for a description of the G and B flags and the expand-down stack type.
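The three classes of shareable stack listed above can be expressed as a single predicate. This is a sketch under the assumption that only those three classes qualify; the function and parameter names are hypothetical.

```python
def stack_is_shareable(expand_down, g_flag, b_flag, highest_stack_offset=0xFFFF):
    """Return True if the stack falls in one of the three listed classes.

    expand_down, g_flag and b_flag model fields of the stack-segment
    descriptor. highest_stack_offset is the largest offset the stack
    itself occupies (data above it in the segment is not counted).
    """
    if b_flag:
        # None of the listed classes has the B flag set.
        return False
    if not g_flag:
        # Expand-up or expand-down with G and B both clear.
        return True
    # G set, B clear: only expand-up segments qualify, and only when the
    # stack lies completely within the lower 64 KBytes.
    return (not expand_down) and highest_stack_offset <= 0xFFFF
```

For example, `stack_is_shareable(False, True, False, 0x8000)` models an expand-up, page-granular segment whose stack stays below 64 KBytes.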
The B flag cannot, in general, be used to change the size of stack used by a 16-bit code segment.
This flag controls the size of the stack pointer only for implicit stack references such as those
caused by interrupts, exceptions, and the PUSH, POP, CALL, and RET instructions. It does not
control explicit stack references, such as accesses to parameters or local variables. A 16-bit code
segment can use a 32-bit stack only if the code is modified so that all explicit references to the
stack are preceded by the 32-bit address-size prefix, causing those references to use 32-bit
addressing, and all explicit writes to the stack pointer are preceded by a 32-bit operand-size prefix.
In 32-bit, expand-down segments, all offsets may be greater than 64 KBytes; therefore, 16-bit
code cannot use this kind of stack segment unless the code segment is modified to use 32-bit
addressing.
These methods of transferring program control overcome the following architectural limitations
imposed on calls between 16-bit and 32-bit code segments:
• Pointers from 16-bit code segments (which by default can only be 16 bits) cannot be used
to address data or code located beyond FFFFH in a 32-bit segment.
• The operand-size attributes for a CALL and its companion RETURN instruction must be
the same to maintain stack coherency. This is also true for implicit calls to interrupt and
exception handlers and their companion IRET instructions.
• A 32-bit parameter (particularly a pointer parameter) greater than FFFFH cannot be
squeezed into a 16-bit parameter location on a stack.
• The size of the stack pointer (SP or ESP) changes when switching between 16-bit and
32-bit code segments.
These limitations are discussed in greater detail in the following sections.
[Figure omitted: stack diagram with entries labeled PARM 1, CS, EIP, SS, ESP, and SP; one region is marked Undefined.]
While executing 32-bit code, if a call is made to a 16-bit code segment which is at the same or
a more privileged level (that is, the DPL of the called code segment is less than or equal to the
CPL of the calling code segment) through a 16-bit call gate, then the upper 16 bits of the ESP
register may be unreliable upon returning to the 32-bit code segment (that is, after executing a
RET in the 16-bit code segment).
When the CALL instruction and its matching RET instruction are in code segments that have D
flags with the same values (that is, both are 32-bit code segments or both are 16-bit code
segments), the default settings may be used. When the CALL instruction and its matching RET
instruction are in segments which have different D-flag settings, an operand-size prefix must be
used.
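The reason a mismatched CALL and RET break stack coherency can be seen by counting bytes. The sketch below assumes a far call at the same privilege level with no parameters, where CS and the instruction pointer each occupy one operand-sized stack slot; the function names are hypothetical.

```python
def far_call_bytes(operand_bits):
    """Bytes pushed by a far CALL, or popped by a far RET, of this operand size."""
    # Two slots (CS and IP/EIP), each 2 bytes for a 16-bit operand size
    # or 4 bytes for a 32-bit operand size.
    return 2 * (operand_bits // 8)


def stack_imbalance(call_operand_bits, ret_operand_bits):
    """Bytes left on the stack after a CALL/RET pair; 0 means coherent."""
    return far_call_bytes(call_operand_bits) - far_call_bytes(ret_operand_bits)


def needs_operand_size_prefix(d_flag_of_call, d_flag_of_ret):
    """A prefix is needed when the two segments have different D flags."""
    return d_flag_of_call != d_flag_of_ret
```

A 32-bit call returned from with a 16-bit RET would leave 4 bytes behind in this model, which is why the prefix is required whenever the D flags differ.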
The interface procedure becomes more complex if any of these rules are violated. For example,
if a 16-bit procedure calls a 32-bit procedure with an entry point beyond FFFFH, the interface
procedure will need to provide the offset to the entry point. The mapping between 16- and 32-bit
addresses is only performed automatically when a call gate is used, because the gate descriptor
for a call gate contains a 32-bit address. When a call gate is not used, the interface code must
provide the 32-bit address.
The structure of the interface procedure depends on the types of calls it is going to support, as
follows:
• Calls from 16-bit procedures to 32-bit procedures. Calls to the interface procedure from
a 16-bit code segment are made with 16-bit CALL instructions (by default, because the D
flag for the calling code-segment descriptor is clear), and 16-bit operand-size prefixes are
used with RET instructions to return from the interface procedure to the calling procedure.
Calls from the interface procedure to 32-bit procedures are performed with 32-bit CALL
instructions (by default, because the D flag for the interface procedure’s code segment is
set), and returns from the called procedures to the interface procedure are performed with
32-bit RET instructions (also by default).
• Calls from 32-bit procedures to 16-bit procedures. Calls to the interface procedure from
a 32-bit code segment are made with 32-bit CALL instructions (by default), and returns to
the calling procedure from the interface procedure are made with 32-bit RET instructions
(also by default). Calls from the interface procedure to 16-bit procedures require the CALL
instructions to have the operand-size prefixes, and returns from the called procedures to the
interface procedure are performed with 16-bit RET instructions (by default).
CHAPTER 18
INTEL ARCHITECTURE COMPATIBILITY
All Intel Architecture processors are binary compatible. Compatibility means that, within
certain limited constraints, programs that execute on previous generations of Intel Architecture
processors will produce identical results when executed on later Intel Architecture processors.
The compatibility constraints and any implementation differences between the Intel Architec-
ture processors are described in this chapter.
Each new Intel Architecture processor has enhanced the software visible architecture from that
found in earlier Intel Architecture processors. Those enhancements have been defined with
consideration for compatibility with previous and future processors. This chapter also summa-
rizes the compatibility considerations for those extensions.
• Do not depend on the states of any reserved bits when testing the values of registers or
memory locations that contain such bits. Mask out the reserved bits before testing.
• Do not depend on the states of any reserved bits when storing them to memory or to a
register.
• Do not depend on the ability to retain information written into any reserved bits.
• When loading a register, always load the reserved bits with the values indicated in the
documentation, if any, or reload them with values previously read from the same register.
Software written for an existing Intel Architecture processor that handles reserved bits correctly
will port to future Intel Architecture processors without generating protection exceptions.
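The guidelines above amount to two small operations: mask reserved bits out before testing, and carry previously read reserved bits through unchanged when writing. The following sketch models a 32-bit register as an integer; the reserved-bit mask is a made-up example and does not describe any real register.

```python
WIDTH_MASK = 0xFFFFFFFF           # model of a 32-bit register
EXAMPLE_RESERVED = 0xFFFF0000     # hypothetical: pretend the upper 16 bits are reserved


def flag_is_set(value, flag_mask, reserved_mask=EXAMPLE_RESERVED):
    """Test a flag after masking out the reserved bits."""
    defined_bits = value & ~reserved_mask & WIDTH_MASK
    return (defined_bits & flag_mask) != 0


def merge_for_write(previous_value, new_value, reserved_mask=EXAMPLE_RESERVED):
    """Build a value to load: new defined bits, reserved bits as previously read."""
    keep = previous_value & reserved_mask
    change = new_value & ~reserved_mask
    return (keep | change) & WIDTH_MASK
```

In this model `merge_for_write(0xABCD0001, 0x12340002)` reloads the reserved half as read and changes only the defined half.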
Table 18-1. New Instructions in the Pentium® and Later Intel Architecture Processors
Instruction                                     CPUID Identification Bits                    Introduced In
Streaming SIMD Extensions                       EDX, Bit 25                                  Pentium® III processor
SYSENTER/SYSEXIT (fast system call)             EDX, Bit 11                                  Pentium® II processor
FXSAVE/FXRSTOR (fast save/restore)              EDX, Bit 24                                  Pentium® II processor
CMOVcc (conditional move)                       EDX, Bit 15                                  Pentium® Pro processor
FCMOVcc (floating-point conditional move)       EDX, Bits 0 and 15
FCOMI (floating-point compare and set EFLAGS)   EDX, Bits 0 and 15
RDPMC (read performance monitoring counters)    EAX, Bits 8-11, set to 6H; refer to Note 1
UD2 (undefined)                                 EAX, Bits 8-11, set to 6H
CMPXCHG8B (compare and exchange 8 bytes)        EDX, Bit 8                                   Pentium® processor
CPUID (CPU identification)                      None; refer to Note 2
RDTSC (read time-stamp counter)                 EDX, Bit 4
RDMSR (read model-specific register)            EDX, Bit 5
WRMSR (write model-specific register)           EDX, Bit 5
MMX™ Instructions                               EDX, Bit 23
NOTES:
1. The RDPMC instruction was introduced in the P6 family of processors and added to later model Pentium®
processors. This instruction is model specific in nature and not architectural.
2. The CPUID instruction is available in all Pentium® and P6 family processors and in later models of the
Intel486™ processors. The ability to set and clear the ID flag (bit 21) in the EFLAGS register indicates the
availability of the CPUID instruction.
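The identification bits in Table 18-1 can be tested with simple shifts and masks once the register values are in hand. The sketch below assumes the EAX and EDX values were obtained by executing CPUID with EAX set to 1, which this Python model cannot do itself; the dictionary and function names are hypothetical.

```python
# EDX bit positions taken from Table 18-1 (bit 0 is the on-chip FPU flag).
FEATURE_BITS = {
    "FPU": 0,
    "RDTSC": 4,
    "RDMSR/WRMSR": 5,
    "CMPXCHG8B": 8,
    "SYSENTER/SYSEXIT": 11,
    "CMOVcc": 15,
    "MMX": 23,
    "FXSAVE/FXRSTOR": 24,
    "Streaming SIMD Extensions": 25,
}


def decode_features(edx):
    """Return the set of feature names whose EDX bit is set."""
    return {name for name, bit in FEATURE_BITS.items() if (edx >> bit) & 1}


def has_fcmov_fcomi(edx):
    """FCMOVcc and FCOMI require both bit 0 and bit 15 of EDX."""
    required = (1 << 0) | (1 << 15)
    return (edx & required) == required


def family(eax):
    """Family field, EAX bits 8-11 (6H identifies the P6 family in the table)."""
    return (eax >> 8) & 0xF
```

For RDPMC and UD2 the table points at the family field rather than a feature bit, so a check such as `family(eax) == 6` is the corresponding test in this model.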
• RSM (resume from SMM). This instruction was introduced in the Intel386™ SL and
Intel486™ SL processors.
The following instructions were added in the Intel 387 math coprocessor:
• FPREM1.
• FUCOM, FUCOMP, and FUCOMPP.
The AC flag (bit 18) was added to the EFLAGS register in the Intel486™ processor.
• Bit 18 (the AC flag) can be used to distinguish an Intel386™ processor from the P6 family,
Pentium®, and Intel486™ processors. Since it is not implemented on the Intel386™
processor, it will always be clear.
• Bit 21 (the ID flag) indicates whether an application can execute the CPUID instruction.
The ability to set and clear this bit indicates that the processor is a P6 family or Pentium®
processor, or a later model of the Intel486™ processor. The CPUID instruction can then be
used to determine which processor.
• Bits 19 (the VIF flag) and 20 (the VIP flag) will always be zero on processors that do not
support virtual mode extensions, which includes all 32-bit processors prior to the Pentium®
processor.
Refer to Chapter 10, Processor Identification and Feature Determination, in the Intel Architec-
ture Software Developer’s Manual, Volume 1, for more information on identifying processors.
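The flag tests described above share one pattern: try to change a bit in EFLAGS and see whether the change sticks. The following is a model of that decision logic only; the `writable_mask` argument is a hypothetical stand-in for the hardware (the set of EFLAGS bits the processor actually implements), and the returned strings are illustrative labels.

```python
AC_FLAG = 1 << 18   # alignment check flag
ID_FLAG = 1 << 21   # identification flag


def can_toggle(writable_mask, flag):
    """Model: write the flag as 1 starting from 0, then read it back."""
    attempted = flag
    read_back = attempted & writable_mask   # unimplemented bits read as 0
    return read_back != 0


def classify(writable_mask):
    """Apply the AC test first, then the ID test."""
    if not can_toggle(writable_mask, AC_FLAG):
        return "Intel386 (AC flag always clear)"
    if not can_toggle(writable_mask, ID_FLAG):
        return "CPUID not available"
    return "CPUID available"
```

On real hardware the same sequence is carried out with PUSHFD and POPFD around the bit change; this sketch captures only the branching.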
18.11. STACK OPERATIONS
This section identifies the differences in stack implementation between the various Intel Archi-
tecture processors.
18.11.1. PUSH SP
The P6 family, Pentium®, Intel486™, Intel386™, and Intel 286 processors push a different value
on the stack for a PUSH SP instruction than the 8086 processor. These processors push the
value of the SP register before it is decremented as part of the push operation; the 8086 processor
pushes the value of the SP register after it is decremented. If the value pushed is important,
replace PUSH SP instructions with the following three instructions:
PUSH BP
MOV BP, SP
XCHG BP, [BP]
This code functions as the 8086 processor PUSH SP instruction on the P6 family, Pentium®,
Intel486™, Intel386™, and Intel 286 processors.
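The effect of the three-instruction sequence can be checked with a small model. This sketch tracks only SP, BP, and a word-addressed memory keyed by stack offset (the SS segment is ignored); the state dictionary and function names are hypothetical.

```python
def push16(state, value):
    """Generic 16-bit push: decrement SP by 2, then store the value."""
    state["sp"] = (state["sp"] - 2) & 0xFFFF
    state["mem"][state["sp"]] = value & 0xFFFF


def push_sp_8086(state):
    """8086 behavior: the value stored is SP after the decrement."""
    state["sp"] = (state["sp"] - 2) & 0xFFFF
    state["mem"][state["sp"]] = state["sp"]


def push_sp_286_and_later(state):
    """Intel 286 and later: the value stored is SP before the decrement."""
    push16(state, state["sp"])


def push_sp_replacement(state):
    """PUSH BP / MOV BP, SP / XCHG BP, [BP]."""
    push16(state, state["bp"])                # PUSH BP
    state["bp"] = state["sp"]                 # MOV BP, SP
    saved_bp = state["mem"][state["bp"]]      # XCHG BP, [BP]: read the saved BP,
    state["mem"][state["bp"]] = state["bp"]   # store the current SP value there,
    state["bp"] = saved_bp                    # and restore BP


def new_state():
    return {"sp": 0x0100, "bp": 0x1234, "mem": {}}
```

Starting from SP = 0100H, the replacement leaves 00FEH on top of the stack and BP unchanged, which is what the 8086 model produces.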
18.12. FPU
This section addresses the issues that must be faced when porting floating-point software
designed to run on earlier Intel Architecture processors and math coprocessors to a Pentium® or
P6 family processor with integrated FPU. To software, a P6 family processor looks very much
like a Pentium® processor. Floating-point software which runs on a Pentium® or Intel486™ DX
processor, or on an Intel486™ SX processor/Intel 487 SX math coprocessor system or an
Intel386™ processor/Intel 387 math coprocessor system, will run with at most minor modifica-
tions on a P6 family processor. To port code directly from an Intel 286 processor/Intel 287
math coprocessor system or an Intel 8086 processor/8087 math coprocessor system to the
Pentium® and P6 family processors, certain additional issues must be addressed.
In the following sections, the term “32-bit Intel Architecture FPUs” refers to the P6 family,
Pentium®, and Intel486™ DX processors, and to the Intel 487 SX and Intel 387 math coproces-
sors; the term “16-bit Intel Architecture math coprocessors” refers to the Intel 287 and 8087
math coprocessors.
The following information pertains to differences in the use of the condition code flags (C0
through C3) located in bits 8, 9, 10, and 14 of the FPU status word.
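Because C3 is not adjacent to C0 through C2, extracting the condition code flags takes four separate bit tests. A minimal sketch, with a hypothetical function name, operating on a status word value such as one stored by FSTSW:

```python
def condition_codes(status_word):
    """Extract C0-C3 from a 16-bit FPU status word value."""
    return {
        "C0": (status_word >> 8) & 1,
        "C1": (status_word >> 9) & 1,
        "C2": (status_word >> 10) & 1,
        "C3": (status_word >> 14) & 1,
    }
```

For example, a status word of 4100H has C3 and C0 set and C1 and C2 clear.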
After execution of an FINIT instruction or a hardware reset on a 32-bit Intel Architecture FPU,
the condition code flags are set to 0. The same operations on a 16-bit Intel Architecture math
coprocessor leave these flags intact (they contain their prior value). This difference in operation
has no impact on software and provides a consistent state after reset.
Transcendental instruction results in the core range of the P6 family and Pentium® processors
may differ from the Intel486™ DX processor and Intel 487 SX math coprocessor by 2 to 3 units
in the last place (ulps); refer to “Transcendental Instruction Accuracy” in Chapter 7 of the Intel
Architecture Software Developer’s Manual, Volume 1. As a result, the value saved in the C1 flag
may also differ.
After an incomplete FPREM/FPREM1 instruction, the C0, C1, and C3 flags are set to 0 on the
32-bit Intel Architecture FPUs. After the same operation on a 16-bit Intel Architecture math
coprocessor, these flags are left intact.
On the 32-bit Intel Architecture FPUs, the C2 flag serves as an incomplete flag for the FPTAN
instruction. On the 16-bit Intel Architecture math coprocessors, the C2 flag is undefined for the
FPTAN instruction. This difference has no impact on software, because Intel 287 or 8087
programs do not check C2 after an FPTAN instruction. The use of this flag on later processors
allows fast checking of operand range.
FXSAVE (Pentium® III processor only) instructions examine the nonempty registers and put the
correct values in the tags before storing the tag word.
The corresponding tag for a 16-bit Intel Architecture math coprocessor is checked before each
register access to determine the class of operand in the register; the tag is updated after every
change to a register so that the tag always reflects the most recent status of the register. Software
can load a tag with a value that disagrees with the contents of a register (for example, the register
contains a valid value, but the tag says special). Here, the 16-bit Intel Architecture math copro-
cessors honor the tag and do not examine the register.
Software written to run on a 16-bit Intel Architecture math coprocessor may not operate
correctly on a 32-bit Intel Architecture FPU if it uses FLDENV, FRSTOR, or FXRSTOR
(Pentium® III processor only) to change tags to values (other than to empty) that are different
from actual register contents.
The encoding in the tag word for the 32-bit Intel Architecture FPUs for unsupported data
formats (including pseudo-zero and unnormal) is special (10B), to comply with the IEEE Stan-
dard 754. The encoding in the 16-bit Intel Architecture math coprocessors for pseudo-zero and
unnormal is valid (00B) and the encoding for other unsupported data formats is special (10B).
Code that recognizes the pseudo-zero or unnormal format as valid must therefore be changed if
it is ported to a 32-bit Intel Architecture FPU.
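Code that inspects tags typically decodes the 16-bit tag word two bits at a time. The sketch below uses the valid (00B) and special (10B) encodings given above together with the standard zero (01B) and empty (11B) encodings; the function name is hypothetical, and the list index is the physical register number.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word):
    """Return the tag name for each of the eight physical FPU registers."""
    return [TAG_NAMES[(tag_word >> (2 * reg)) & 0b11] for reg in range(8)]


def registers_tagged(tag_word, name):
    """Physical register numbers whose tag has the given name."""
    return [reg for reg, tag in enumerate(decode_tag_word(tag_word)) if tag == name]
```

Ported code that relied on pseudo-zero or unnormal operands being tagged valid would, in this model, look for those registers under "special" instead.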
18.12.5.1. NaNs
The 32-bit Intel Architecture FPUs distinguish between signaling NaNs (SNaNs) and quiet
NaNs (QNaNs). These FPUs only generate QNaNs and normally do not generate an exception
upon encountering a QNaN. An invalid operation exception (#I) is generated only upon encountering
an SNaN, except for the FCOM, FIST, and FBSTP instructions, which also generate an
invalid operation exception for QNaNs. This behavior matches the IEEE Standard 754.
The 16-bit Intel Architecture math coprocessors only generate one kind of NaN (the equivalent
of a QNaN), but they raise an invalid operation exception upon encountering any kind of NaN.
When porting software written to run on a 16-bit Intel Architecture math coprocessor to a 32-bit
Intel Architecture FPU, uninitialized memory locations that contain QNaNs should be changed
to SNaNs to cause the FPU or math coprocessor to fault when uninitialized memory locations
are referenced.
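The distinction between the two kinds of NaN is carried by the most significant fraction bit. The following sketch classifies single-real (32-bit) bit patterns and builds an SNaN fill pattern for uninitialized memory; the particular fill value 7FA00000H is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import struct

QUIET_BIT = 0x00400000      # most significant fraction bit of a single-real value
SNAN_FILL = 0x7FA00000      # exponent all ones, quiet bit clear, fraction nonzero


def classify_single(bits):
    """Classify a 32-bit single-real pattern as QNaN, SNaN, or not a NaN."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x007FFFFF
    if exponent != 0xFF or fraction == 0:
        return "not a NaN"          # finite value or infinity
    return "QNaN" if fraction & QUIET_BIT else "SNaN"


def snan_fill(count):
    """Little-endian bytes holding `count` single-real SNaNs."""
    return struct.pack("<%dI" % count, *([SNAN_FILL] * count))
```

A loader or allocator could write `snan_fill(n)` into uninitialized floating-point storage so that a later reference faults instead of silently propagating a QNaN.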
tion, they raise an invalid operation exception. The 16-bit Intel Architecture math coprocessors
define and support special handling for these formats. Support for these formats was dropped to
conform with the IEEE Standard 754.
This change should not impact software ported from 16-bit Intel Architecture math coprocessors
to 32-bit Intel Architecture FPUs. The 32-bit Intel Architecture FPUs do not generate these
formats, and therefore will not encounter them unless software explicitly loads them in the data
registers. The only effect may be in how software handles the tags in the tag word (refer to
Section 18.12.4., “FPU Tag Word”).
On the 16-bit Intel Architecture math coprocessors, the precision exception is not flagged and
the significand is not rounded. The impact on existing software is that if the result is stored on
the stack, a program running on a 32-bit Intel Architecture FPU produces a different result under
overflow conditions than on a 16-bit Intel Architecture math coprocessor. The difference is
apparent only to the exception handler. This difference is for IEEE Standard 754 compatibility.
coprocessors does. If an 8086 processor uses another exception for the 8087 interrupt, both
exception vectors should call the floating-point-error exception handler. Some instructions in a
floating-point-error exception handler may need to be deleted if they use the interrupt controller.
The P6 family, Pentium®, and Intel486™ processors have signals that, with the addition of
external logic, support reporting for emulation of the interrupt mechanism used in many
personal computers.
On the P6 family, Pentium®, and Intel486™ processors, an undefined floating-point opcode will
cause an invalid-opcode exception (#UD, interrupt vector 6). Undefined floating-point opcodes,
like legal floating-point opcodes, cause a device not available exception (#NM, interrupt vector
7) when either the TS or EM flag in control register CR0 is set. The P6 family, Pentium®, and
Intel486™ processors do not check for floating-point error conditions on encountering an unde-
fined floating-point opcode.
operand is restricted to (| ST(0) | < π/4) on the 16-bit Intel Architecture math coprocessors; the
operand must be reduced to this range using FPREM. This change has no impact on existing
software.
On the 32-bit Intel Architecture FPUs, loading an SNaN that is in single- or double-real format
causes the FPU to generate an invalid operation exception. The 16-bit Intel Architecture math
coprocessors do not raise an exception when loading a signaling NaN. The invalid operation
exception handler for 16-bit math coprocessor software needs to be updated to handle this condi-
tion when porting software to 32-bit FPUs. This change was made for IEEE Standard 754
compatibility.
The EM and MP flags in register CR0 are interpreted as shown in Table 18-2.
Following is an example code sequence to initialize the system and check for the presence of
Intel486™ SX processor/Intel 487 SX math coprocessor.
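fninit                                      ; initialize the FPU or math coprocessor, if any
fstcw mem_loc                               ; store the control word to memory
mov ax, mem_loc                             ; fetch the stored value for comparison
cmp ax, 037fh                               ; 037fh is the control word after FNINIT
jz Intel487_SX_Math_CoProcessor_present     ; ax=037fh
jmp Intel486_SX_microprocessor_present      ; ax=ffffh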
If the Intel 487 SX math coprocessor is not present, the following code can be run to set the CR0
register for the Intel486™ SX processor.
This initialization will cause any floating-point instruction to generate a device not available
exception (#NM), interrupt 7. The software emulation will then take control to execute these
instructions. This code is not required if an Intel 487 SX math coprocessor is present in the
system. In that case, the typical initialization routine for the Intel486™ SX microprocessor will
be adequate.
Also, when designing an Intel486™ SX processor based system with an Intel 487 SX math
coprocessor, timing loops should be independent of clock speed and clocks per instruction. One
way to attain this is to implement these loops in hardware and not in software (for example,
BIOS).
The Pentium® Pro processor introduced three new control flags in control register CR4:
• PAE (bit 5)—Physical address extension. Enables paging mechanism to reference 36-bit
physical addresses when set; restricts physical addresses to 32 bits when clear (refer to
Section 18.16.1.1., “Physical Memory Addressing Extension” in Chapter 18, Intel Archi-
tecture Compatibility).
• PGE (bit 7)—Page global enable. Inhibits flushing of frequently-used or shared pages on
task switches (refer to Section 18.16.1.2., “Global Pages” in Chapter 18, Intel Architecture
Compatibility).
• PCE (bit 8)—Performance-monitoring counter enable. Enables execution of the RDPMC
instruction at any protection level.
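The three flags can be modeled as bit tests on a CR4 value. This sketch assumes paging is enabled and that, with PCE clear, RDPMC is restricted to privilege level 0; the constant and function names are hypothetical.

```python
CR4_PAE = 1 << 5   # physical address extension
CR4_PGE = 1 << 7   # page global enable
CR4_PCE = 1 << 8   # performance-monitoring counter enable


def physical_address_bits(cr4):
    """36-bit physical addresses with PAE set, 32-bit with PAE clear."""
    return 36 if cr4 & CR4_PAE else 32


def addressable_bytes(cr4):
    """Size of the physical address space reachable through paging."""
    return 1 << physical_address_bits(cr4)


def rdpmc_allowed(cr4, cpl):
    """RDPMC executes at any privilege level only when PCE is set."""
    return cpl == 0 or bool(cr4 & CR4_PCE)
```

Setting PAE raises the reachable physical space in this model from 4 GBytes to 64 GBytes.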
Control register CR4 was introduced in the Pentium® processor. This register contains flags that
enable certain new extensions provided in the Pentium® processor:
The Intel486™ processor introduced five new flags in control register CR0:
• NE—Numeric error. Enables the normal mechanism for reporting floating-point numeric
errors.
• WP—Write protect. Write-protects user-level pages against supervisor-mode accesses.
• AM—Alignment mask. Controls whether alignment checking is performed. Operates in
conjunction with the AC (Alignment Check) flag.
• NW—Not write-through. Enables write-throughs and cache invalidation cycles when clear
and disables invalidation cycles and write-throughs that hit in the cache when set.
• CD—Cache disable. Enables the internal cache when clear and disables the cache when
set.
The Intel486™ processor introduced two new flags in control register CR3:
• PCD—Page-level cache disable. The state of this flag is driven on the PCD# pin during
bus cycles that are not paged, such as interrupt acknowledge cycles, when paging is
enabled. The PCD# pin is used to control caching in an external cache on a cycle-by-cycle
basis.
• PWT—Page-level write-through. The state of this flag is driven on the PWT# pin during
bus cycles that are not paged, such as interrupt acknowledge cycles, when paging is
enabled. The PWT# pin is used to control write through in an external cache on a cycle-by-
cycle basis.
to Table 9-4, in Chapter 9, Memory Cache Control for a comparison of these bits on the P6
family, Pentium®, and Intel486™ processors. For complete information on caching, refer to
Chapter 9, Memory Cache Control.
On the P6 family and Pentium® processors, reserved bits 11, 12, 14 and 15 are hard-wired to 0.
On the Intel486™ processor, however, bit 12 can be set. Refer to Table 8-1 in Chapter 8,
Processor Management and Initialization for the different settings of this register following a
power-up or hardware reset.
No new exceptions were added to the Pentium® II and Pentium® Pro processors. The set of avail-
able exceptions is the same as for the Pentium® processor. However, the following exception
condition was added to the Intel Architecture with the Pentium® Pro processor:
• Machine-check exception (#MC, interrupt 18)—New exception conditions. Many
exception conditions have been added to the machine-check exception and a new archi-
tecture has been added for handling and reporting on hardware errors. Refer to Chapter 13,
Machine-Check Architecture for a detailed description of the new conditions.
The following exceptions and/or exception conditions were added to the Intel Architecture with
the Pentium® processor:
• Machine-check exception (#MC, interrupt 18)—New exception. This exception reports
parity and other hardware errors. It is a model-specific exception and may not be
implemented, or may be implemented differently, in future processors. The MCE flag in control
register CR4 enables the machine-check exception. When this bit is clear (which it is at
reset), the processor inhibits generation of the machine-check exception.
• General-protection exception (#GP, interrupt 13)—New exception condition added. An
attempt to write a 1 to a reserved bit position of a special register causes a general-
protection exception to be generated.
• Page-fault exception (#PF, interrupt 14)—New exception condition added. When a 1 is
detected in any of the reserved bit positions of a page-table entry, page-directory entry, or
page-directory pointer during address translation, a page-fault exception is generated.
The following exceptions and/or exception conditions were added to the Intel386™ processor:
• Divide-error exception (#DE, interrupt 0)
— Change in exception handling. Divide-error exceptions on the Intel386™ processors
always leave the saved CS:IP value pointing to the instruction that failed. On the 8086
processor, the CS:IP value points to the next instruction.
— Change in exception handling. The Intel386™ processors can generate the largest
negative number as a quotient for the IDIV instruction (80H and 8000H). The 8086
processor generates a divide-error exception instead.
• Invalid-opcode exception (#UD, interrupt 6)—New exception condition added. Improper
use of the LOCK instruction prefix can generate an invalid-opcode exception.
• Page-fault exception (#PF, interrupt 14)—New exception condition added. If paging is
enabled in a 16-bit program, a page-fault exception can be generated as follows. Paging
can be used in a system with 16-bit tasks if all tasks use the same page directory. Because
there is no place in a 16-bit TSS to store the PDBR register, switching to a 16-bit task does
not change the value of the PDBR register. Tasks ported from the Intel 286 processor
should be given 32-bit TSSs so they can make full use of paging.
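The IDIV difference in the divide-error entry above can be illustrated with a model of the byte form of the instruction (a 16-bit dividend divided by an 8-bit divisor). The function, its `allow_min_negative` parameter, and the `DivideError` class are hypothetical; the flag stands for the Intel386™ behavior when True and the 8086 behavior when False.

```python
class DivideError(Exception):
    """Stands in for the divide-error exception (#DE)."""


def idiv8(dividend, divisor, allow_min_negative):
    """Signed divide with the quotient truncated toward zero.

    Returns (quotient, remainder); raises DivideError when the divisor
    is zero or the quotient does not fit the allowed 8-bit range.
    """
    if divisor == 0:
        raise DivideError("divide by zero")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor   # takes the sign of the dividend
    lowest = -128 if allow_min_negative else -127
    if not lowest <= quotient <= 127:
        raise DivideError("quotient out of range")
    return quotient, remainder
```

Dividing -128 by 1 returns a quotient of -128 (80H) with the flag set and raises the modeled exception with it clear.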
18.20. INTERRUPTS
The following differences in handling interrupts are found among the Intel Architecture
processors.
[Figure: I/O map base address handling for I/O validation, two cases, each showing a TSS segment spanning offsets 0H through FFFFH.]
Case with wraparound (FFFFH + 10H = FH): I/O access at port 10H checks bitmap at I/O map base
address FFFFH + 10H = offset 10H. Offset FH from beginning of TSS segment results because
wraparound occurs.
Case without wraparound: I/O access at port 10H checks bitmap at I/O address FFFFH + 10H,
which exceeds segment limit. Wraparound does not occur; general-protection exception (#GP)
occurs.
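The arithmetic behind the two cases can be sketched as follows. The model follows the figure's own numbers (a base of FFFFH plus 10H); the function name, the `wraparound` parameter, and the `GeneralProtection` class are hypothetical.

```python
class GeneralProtection(Exception):
    """Stands in for the general-protection exception (#GP)."""


def checked_offset(io_map_base, displacement, segment_limit, wraparound):
    """Offset within the TSS that is consulted for I/O validation."""
    offset = io_map_base + displacement
    if wraparound:
        # 16-bit arithmetic: the carry out of bit 15 is discarded.
        return offset & 0xFFFF
    if offset > segment_limit:
        raise GeneralProtection("offset %XH exceeds segment limit" % offset)
    return offset
```

With wraparound, `checked_offset(0xFFFF, 0x10, 0xFFFF, True)` yields FH; without it, the same inputs raise the modeled exception.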
for more information about hardware control of the Pentium® processor caches. In the P6 family
processors, the MTRRs can be used to override the CD and NW flags (refer to Table 9-6, in
Chapter 9, Memory Cache Control).
The P6 family and Pentium® processors support page-level cache management in the same
manner as the Intel486™ processor by using the PCD and PWT flags in control register CR3,
the page-directory entries, and the page-table entries. The Intel486™ processor, however, is not
affected by the state of the PWT flag since the internal cache of the Intel486™ processor is a
write-through cache.
NOTE
The check on linear addresses described above is not in practice a concern for
compatibility. Applications that include self-modifying code use the same
linear address for modifying and fetching the instruction. System software,
such as a debugger, that might possibly modify an instruction using a
different linear address than that used to fetch the instruction must execute a
serializing operation, such as IRET, before the modified instruction is
executed.
18.23. PAGING
This section identifies enhancements made to the paging mechanism and implementation differ-
ences in the paging mechanism for various Intel Architecture processors.
1. Execute a MOV CR0, REG instruction to either set (enable paging) or clear (disable
paging) the PG flag.
2. Execute a near JMP instruction.
The sequence bounded by the MOV and JMP instructions should be identity mapped (that is,
the instructions should reside on a page whose linear and physical addresses are identical).
For the P6 family processors, the MOV CR0, REG instruction is serializing, so the jump oper-
ation is not required. However, for backwards compatibility, the JMP instruction should still be
included.
is a fault from a branch instruction occurring from a segment limit or access rights violation. If
a branch fault is taken, the Intel486™ and Pentium® processors will have corrupted memory
below the stack pointer. However, the ESP register is backed up to make the instruction restart-
able. The P6 family processors issue the branch before the pushes. Therefore, if a branch fault
does occur, these processors do not corrupt memory below the stack pointer. This implementa-
tion difference, however, does not constitute a compatibility problem, as only values at or above
the stack pointer are considered to be valid.
• Base Address—The upper 8 bits of the 32-bit base address are clear, which limits base
addresses to 24 bits.
• Limit—The upper 4 bits of the limit field are clear, restricting the value of the limit field to
64 Kbytes.
• Granularity bit—The G (granularity) flag is clear, indicating the value of the 16-bit limit is
interpreted in units of 1 byte.
• Big bit—In a data-segment descriptor, the B flag is clear in the segment descriptor used by
the 32-bit processors, indicating the segment is no larger than 64 Kbytes.
• Default bit—In a code-segment descriptor, the D flag is clear, indicating 16-bit addressing
and operands are the default. In a stack-segment descriptor, the D flag is clear, indicating
use of the SP register (instead of the ESP register) and a 64-Kbyte maximum segment
limit.
For information on mixing 16- and 32-bit code in applications, refer to Chapter 17, Mixing 16-
Bit and 32-Bit Code.
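How a 32-bit processor reads a 16-bit descriptor can be illustrated by packing one with a zero upper word and decoding it with the 32-bit field layout. This is a sketch: the function names are hypothetical, the descriptor is modeled as a 64-bit integer, and the access byte value in the usage note is an arbitrary example.

```python
def make_16bit_descriptor(base, limit, access):
    """Pack a descriptor with a 24-bit base, a 16-bit limit, and a zero upper word."""
    if base >> 24 or limit >> 16:
        raise ValueError("16-bit descriptors hold a 24-bit base and a 16-bit limit")
    low = (limit & 0xFFFF) | ((base & 0xFFFF) << 16)
    high = ((base >> 16) & 0xFF) | ((access & 0xFF) << 8)   # bits 16-31 stay 0
    return (high << 32) | low


def decode_descriptor(descriptor):
    """Decode using the 32-bit processors' field layout."""
    low = descriptor & 0xFFFFFFFF
    high = descriptor >> 32
    return {
        "base": (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000),
        "limit": (low & 0xFFFF) | (high & 0x000F0000),
        "G": (high >> 23) & 1,      # granularity flag
        "D/B": (high >> 22) & 1,    # default / big flag
    }
```

Decoding `make_16bit_descriptor(0x123456, 0xFFFF, 0x93)` gives G = 0 and D/B = 0, while OR-ing a nonzero value into bits 48-63 changes the decoded flags, which shows why the upper word of a 16-bit descriptor must be left clear.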
modify-write access and, if the cache line has been modified, writes the contents back to
memory before locking the bus. The P6 family processors write to their cache on a read-modify-write
operation (if the access does not split across a cache line) and do not write back to system
memory. If the access does split across a cache line, it locks the bus and accesses system
memory.
I/O reads are never reordered in front of buffered memory writes on an Intel Architecture
processor. This ensures an update of all memory locations before reading the status from an I/O
device.
• Porting selected 16-bit applications to run in a 32-bit processor environment with a 32-bit
operating system, loader, and system builder. Here, the TSSs used to represent 286 tasks
should be changed to 32-bit TSSs. It is possible to mix 16 and 32-bit TSSs, but the benefits
are small and the problems are great. All tasks in a 32-bit software system should have 32-
bit TSSs. It is not necessary to change the 16-bit object modules themselves; TSSs are
usually constructed by the operating system, by the loader, or by the system builder. Refer
to Chapter 17, Mixing 16-Bit and 32-Bit Code for more detailed information about mixing
16-bit and 32-bit code.
Because the 32-bit processors use the contents of the reserved word of 16-bit segment descriptors,
16-bit programs that place values in this word may not run correctly on the 32-bit
processors.
APPENDIX A
PERFORMANCE-MONITORING EVENTS
This appendix contains a list of the performance-monitoring events that can be monitored with the
Intel Architecture processors. In the Intel Architecture processors, the ability to monitor
performance events and the events that can be monitored are model specific. Section A.1., “P6 Family
Processor Performance-Monitoring Events” lists and describes the events that can be monitored
with the P6 family of processors. Section A.2., “Pentium® Processor Performance-Monitoring
Events” lists and describes the events that can be monitored with Pentium® processors.
Table A-1. Events That Can Be Counted with the P6 Family Performance-
Monitoring Counters
Unit  Event Num.  Mnemonic Event Name  Unit Mask  Description  Comments
Data Cache Unit (DCU)
43H DATA_MEM_REFS 00H All loads from any memory type. All stores to any memory type.
Each part of a split is counted separately. The internal logic counts not only memory loads
and stores, but also internal retries. Includes UC accesses.
85H ITLB_MISS 00H Number of ITLB misses.
86H IFU_MEM_STALL 00H Number of cycles instruction
fetch is stalled, for any reason.
26H L2_LINES_OUT 00H Number of lines removed from
the L2 for any reason.
25H L2_M_LINES_INM 00H Number of modified lines
allocated in the L2.
27H L2_M_LINES_OUTM 00H Number of modified lines
removed from the L2 for any
reason.
2EH L2_RQSTS MESI 0FH Total number of L2 requests.
21H L2_ADS 00H Number of L2 address strobes.
22H L2_DBUS_BUSY 00H Number of cycles during which
the L2 cache data bus was busy.
23H L2_DBUS_BUSY_RD 00H Number of cycles during which
the data bus was busy
transferring read data from L2 to
the processor.
External Bus Logic (EBL) (see Note 2)
62H BUS_DRDY_CLOCKS 00H (Self), 20H (Any) Number of clocks during which DRDY# is asserted.
Utilization of the external system data bus during data transfers.
Comments: Unit Mask = 00H counts bus clocks when the processor is driving DRDY#.
Unit Mask = 20H counts in processor clocks when any agent is driving DRDY#.
63H BUS_LOCK_CLOCKS 00H (Self), 20H (Any) Number of clocks during which LOCK# is asserted
on the external system bus (see Note 3).
Comments: Always counts in processor clocks.
60H BUS_REQ_OUTSTANDING 00H (Self) Number of bus requests outstanding. This counter is
incremented by the number of cacheable read bus requests outstanding in any given cycle.
Comments: Counts only DCU full-line cacheable reads, not RFOs, writes, instruction fetches,
or anything else. Counts “waiting for bus to complete” (last data chunk received).
65H BUS_TRAN_BRD 00H (Self), 20H (Any) Number of burst read transactions.
66H BUS_TRAN_RFO 00H (Self), 20H (Any) Number of completed read for ownership transactions.
67H BUS_TRANS_WB 00H (Self), 20H (Any) Number of completed write back transactions.
68H BUS_TRAN_IFETCH 00H (Self), 20H (Any) Number of completed instruction fetch transactions.
69H BUS_TRAN_INVAL 00H (Self), 20H (Any) Number of completed invalidate transactions.
6AH BUS_TRAN_PWR 00H (Self), 20H (Any) Number of completed partial write transactions.
6BH BUS_TRANS_P 00H (Self), 20H (Any) Number of completed partial transactions.
6CH BUS_TRANS_IO 00H (Self), 20H (Any) Number of completed I/O transactions.
6DH BUS_TRAN_DEF 00H (Self), 20H (Any) Number of completed deferred transactions.
6EH BUS_TRAN_BURST 00H (Self), 20H (Any) Number of completed burst transactions.
70H BUS_TRAN_ANY 00H (Self), 20H (Any) Number of all completed bus transactions.
Address bus utilization can be calculated knowing the minimum address bus occupancy.
7AH BUS_HIT_DRV 00H (Self) Number of bus clock cycles during which this processor is
driving the HIT# pin.
Comments: Includes cycles due to snoop stalls. The event counts correctly, but the BPMi
pins function as follows based on the setting of the PC bits (bit 19 in the PerfEvtSel0 and
PerfEvtSel1 registers):
Floating-Point Unit
C1H FLOPS 00H Number of computational floating-point operations retired. Excludes
floating-point computational operations that cause traps or assists. Includes floating-point
computational operations executed by the assist handler.
Comments: Counter 0 only.
Memory Ordering
03H LD_BLOCKS 00H Number of store buffer blocks. Includes counts caused by preceding
stores whose addresses are unknown, preceding stores whose addresses are known but whose
data is unknown, and preceding stores that conflict with the load but which incompletely
overlap the load.
04H SB_DRAINS 00H Number of store buffer drain
cycles.
Instruction Decoding and Retirement
C0H INST_RETIRED 00H Number of instructions retired.
Comments: A hardware interrupt received during/after the last iteration of the REP STOS
flow causes the counter to undercount by 1 instruction.
C2H UOPS_RETIRED 00H Number of UOPs retired.
D0H INST_DECODED 00H Number of instructions decoded.
D8H EMON_KNI_INST_RETIRED 00H, 01H Number of Streaming SIMD extensions retired.
Unit Mask 00H: packed and scalar; 01H: scalar.
Comments: Counters 0 and 1. Pentium® III processor only.
D9H EMON_KNI_COMP_INST_RET 00H, 01H Number of Streaming SIMD extensions computation
instructions retired. Unit Mask 00H: packed and scalar; 01H: scalar.
Comments: Counters 0 and 1. Pentium® III processor only.
Interrupts C8H HW_INT_RX 00H Number of hardware interrupts
received.
C6H CYCLES_INT_ 00H Number of processor cycles for
MASKED which interrupts are disabled.
C7H CYCLES_INT_ 00H Number of processor cycles for
PENDING_ which interrupts are disabled
AND_MASKED and interrupts are pending.
Branches C4H BR_INST_RETIRED 00H Number of branch instructions
retired.
C5H BR_MISS_PRED_ 00H Number of mispredicted
RETIRED branches retired.
C9H BR_TAKEN_ 00H Number of taken branches
RETIRED retired.
CAH BR_MISS_PRED_ 00H Number of taken mispredicted
TAKEN_RET branches retired.
E0H BR_INST_DECODED 00H Number of branch instructions
decoded.
E2H BTB_MISSES 00H Number of branches for which
the BTB did not produce a
prediction.
E4H BR_BOGUS 00H Number of bogus branches.
E6H BACLEARS 00H Number of times BACLEAR is
asserted.
Stalls A2H RESOURCE_STALLS 00H Incremented by 1 during every
cycle for which there is a
resource related stall.
CCH FP_MMX_TRANS 00H Transitions from MMX™ instructions to floating-point instructions.
01H Transitions from floating-point instructions to MMX™ instructions.
Comments: Available in Pentium® II & Pentium® III processors only.
CDH MMX_ASSIST 00H Number of MMX™ Assists (that is, the number of EMMS instructions
executed).
Comments: Available in Pentium® II & Pentium® III processors only.
CEH MMX_INSTR_RET 00H Number of MMX™ Instructions Retired.
Comments: Available in Pentium® II processor only.
Segment Register Renaming
D4H SEG_RENAME_STALLS Number of Segment Register Renaming Stalls:
01H Segment register ES
02H Segment register DS
04H Segment register FS
08H Segment register GS
0FH Segment registers ES + DS + FS + GS
Comments: Available in Pentium® II & Pentium® III processors only.
D5H SEG_REG_RENAMES Number of Segment Register Renames:
01H Segment register ES
02H Segment register DS
04H Segment register FS
08H Segment register GS
0FH Segment registers ES + DS + FS + GS
Comments: Available in Pentium® II & Pentium® III processors only.
D6H RET_SEG_RENAMES 00H Number of segment register rename events retired.
Comments: Available in Pentium® II & Pentium® III processors only.
NOTES:
1. Several L2 cache events, where noted, can be further qualified using the Unit Mask (UMSK) field in the
PerfEvtSel0 and PerfEvtSel1 registers. The lower 4 bits of the Unit Mask field are used in conjunction
with L2 events to indicate the cache state or cache states involved. The P6 family processors identify
cache states using the “MESI” protocol and consequently each bit in the Unit Mask field represents one of
the four states: UMSK[3] = M (8H) state, UMSK[2] = E (4H) state, UMSK[1] = S (2H) state, and UMSK[0]
= I (1H) state. UMSK[3:0] = “MESI” (FH) should be used to collect data for all states; UMSK = 0H, for the
applicable events, will result in nothing being counted.
2. All of the external bus logic (EBL) events, except where noted, can be further qualified using the Unit
Mask (UMSK) field in the PerfEvtSel0 and PerfEvtSel1 registers. Bit 5 of the UMSK field is used in con-
junction with the EBL events to indicate whether the processor should count transactions that are self-
generated (UMSK[5] = 0) or transactions that result from any processor on the bus (UMSK[5] = 1).
3. L2 cache locks, so it is possible to have a zero count.
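Notes 1 and 2 can be illustrated by composing a PerfEvtSel value in software. The following Python sketch is not part of the manual. The unit-mask bit values come from the notes above; the register field positions (event select in bits 7:0, unit mask in bits 15:8, USR at bit 16, OS at bit 17, enable at bit 22) are an assumption taken from the PerfEvtSel description in Chapter 15, and the helper name perfevtsel is hypothetical.

```python
# Unit-mask bits for L2 events (Note 1).
UMSK_M, UMSK_E, UMSK_S, UMSK_I = 0x8, 0x4, 0x2, 0x1
UMSK_MESI = UMSK_M | UMSK_E | UMSK_S | UMSK_I  # FH: collect data for all states

# Unit-mask bit for EBL events (Note 2).
UMSK_SELF = 0x00       # UMSK[5] = 0: self-generated transactions
UMSK_ANY_AGENT = 0x20  # UMSK[5] = 1: transactions from any processor on the bus


def perfevtsel(event: int, umask: int, usr: bool = True, os: bool = True,
               enable: bool = True) -> int:
    """Compose a PerfEvtSel value (field positions are an assumption)."""
    value = (event & 0xFF) | ((umask & 0xFF) << 8)
    if usr:
        value |= 1 << 16   # count at privilege levels 1, 2, and 3
    if os:
        value |= 1 << 17   # count at privilege level 0
    if enable:
        value |= 1 << 22   # enable counters
    return value


if __name__ == "__main__":
    # L2_RQSTS (2EH) for all four MESI states.
    print(hex(perfevtsel(0x2E, UMSK_MESI)))
    # BUS_TRAN_ANY (70H) for any agent on the bus.
    print(hex(perfevtsel(0x70, UMSK_ANY_AGENT)))
```

Writing the composed value to the register requires the WRMSR instruction at privilege level 0, which this sketch does not attempt.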
NOTE
The events in the table that are shaded are implemented only in the Pentium®
processor with MMX technology.
Table A-2. Events That Can Be Counted with the Pentium® Processor Performance-
Monitoring Counters
Event Num.  Mnemonic Event Name  Description  Comments
00H DATA_READ Number of memory data Split cycle reads are counted
reads (internal data cache individually. Data Memory Reads that
hit and miss combined). are part of TLB miss processing are not
included. These events may occur at a
maximum of two per clock. I/O is not
included.
01H DATA_WRITE Number of memory data Split cycle writes are counted
writes (internal data cache individually. These events may occur at
hit and miss combined), a maximum of two per clock. I/O is not
I/O is not included. included.
02H DATA_TLB_MISS Number of misses to the
data cache translation
look-aside buffer.
03H DATA_READ_MISS Number of memory read Additional reads to the same cache line
accesses that miss the after the first BRDY# of the burst line fill
internal data cache is returned but before the final (fourth)
whether or not the access BRDY# has been returned, will not
is cacheable or cause the counter to be incremented
noncacheable. additional times. Data accesses that
are part of TLB miss processing are not
included. Accesses directed to I/O
space are not included.
04H DATA WRITE MISS Number of memory write Data accesses that are part of TLB
accesses that miss the miss processing are not included.
internal data cache Accesses directed to I/O space are not
whether or not the access included.
is cacheable or
noncacheable.
05H WRITE_HIT_TO_ Number of write hits to These are the writes that may be held
M-_OR_E- exclusive or modified lines up if EWBE# is inactive. These events
STATE_LINES in the data cache. may occur a maximum of two per clock.
06H DATA_CACHE_ Number of dirty lines (all) Replacements and internal and external
LINES_ that are written back, snoops can all cause writeback and are
WRITTEN_BACK regardless of the cause. counted.
07H EXTERNAL_ Number of accepted Assertions of EADS# outside of the
SNOOPS external snoops whether sampling interval are not counted, and
they hit in the code cache no internal snoops are counted.
or data cache or neither.
08H EXTERNAL_DATA_ Number of external Snoop hits to a valid line in either the
CACHE_SNOOP_ snoops to the data cache. data cache, the data line fill buffer, or
HITS one of the write back buffers are all
counted as hits.
09H MEMORY Number of data memory These accesses are not necessarily run
ACCESSES IN reads or writes that are in parallel due to cache misses, bank
BOTH PIPES paired in both pipes of the conflicts, etc.
pipeline.
0AH BANK CONFLICTS Number of actual bank
conflicts.
0BH MISALIGNED DATA Number of memory or I/O A 2- or 4-byte access is misaligned
MEMORY OR I/O reads or writes that are when it crosses a 4-byte boundary; an
REFERENCES misaligned. 8-byte access is misaligned when it
crosses an 8-byte boundary. Ten byte
accesses are treated as two separate
accesses of 8 and 2 bytes each.
0CH CODE READ Number of instruction Individual 8-byte noncacheable
reads whether the read is instruction reads are counted.
cacheable or
noncacheable.
0DH CODE TLB MISS Number of instruction Individual 8-byte noncacheable
reads that miss the code instruction reads are counted.
TLB whether the read is
cacheable or
noncacheable.
0EH CODE CACHE MISS Number of instruction Individual 8-byte noncacheable
reads that miss the instruction reads are counted.
internal code cache
whether the read is
cacheable or
noncacheable.
0FH ANY SEGMENT Number of writes into any Segment loads are caused by explicit
REGISTER LOADED segment register in real or segment register load instructions, far
protected mode including control transfers, and task switches. Far
the LDTR, GDTR, IDTR, control transfers and task switches
and TR. causing a privilege level change will
signal this event twice. Note that
interrupts and exceptions may initiate a
far control transfer.
10H Reserved
11H Reserved
12H Branches Number of taken and not Also counted as taken branches are
taken branches, including serializing instructions, VERR and
conditional branches, VERW instructions, some segment
jumps, calls, returns, descriptor loads, hardware interrupts
software interrupts, and (including FLUSH#), and programmatic
interrupt returns. exceptions that invoke a trap or fault
handler. The pipe is not necessarily
flushed. The number of branches
actually executed is measured, not the
number of predicted branches.
13H BTB_HITS Number of BTB hits that Hits are counted only for those
occur. instructions that are actually executed.
14H TAKEN_BRANCH_ Number of taken This event type is a logical OR of taken
OR_BTB_HIT branches or BTB hits that branches and BTB hits. It represents an
occur. event that may cause a hit in the BTB.
Specifically, it is either a candidate for a
space in the BTB or it is already in the
BTB.
15H PIPELINE FLUSHES Number of pipeline The counter will not be incremented for
flushes that occur. serializing instructions (serializing
Pipeline flushes are instructions cause the prefetch queue
caused by BTB misses on to be flushed but will not trigger the
taken branches, Pipeline Flushed event counter) and
mispredictions, software interrupts (software interrupts
exceptions, interrupts, do not flush the pipeline).
and some segment
descriptor loads.
16H INSTRUCTIONS_ Number of instructions Invocations of a fault handler are
EXECUTED executed (up to two per considered instructions. All hardware
clock). and software interrupts and exceptions
will also cause the count to be
incremented. Repeat prefixed string
instructions will only increment this
counter once despite the fact that the
repeat loop executes the same
instruction multiple times until the loop
criteria is satisfied. This applies to all
the Repeat string instruction prefixes
(i.e., REP, REPE, REPZ, REPNE, and
REPNZ). This counter will also only
increment once per each HLT
instruction executed regardless of how
many cycles the processor remains in
the HALT state.
17H INSTRUCTIONS_ Number of instructions This event is the same as the 16H
EXECUTED_ V PIPE executed in the V_pipe. It event except it only counts the number
indicates the number of of instructions actually executed in the
instructions that were V-pipe.
paired.
18H BUS_CYCLE_ Number of clocks while a The count includes HLDA, AHOLD, and
DURATION bus cycle is in progress. BOFF# clocks.
This event measures bus
use.
19H WRITE_BUFFER_ Number of clocks while Full write buffers stall data memory
FULL_STALL_ the pipeline is stalled due read misses, data memory write
DURATION to full write buffers. misses, and data memory write hits to
S-state lines. Stalls on I/O accesses are
not included.
1AH WAITING_FOR_ Number of clocks while Data TLB Miss processing is also
DATA_MEMORY_ the pipeline is stalled included in the count. The pipeline stalls
READ_STALL_ while waiting for data while a data memory read is in progress
DURATION memory reads. including attempts to read that are not
bypassed while a line is being filled.
1BH STALL ON WRITE Number of stalls on writes
TO AN E- OR M- to E- or M-state lines
STATE LINE
1CH LOCKED BUS Number of locked bus Only the read portion of the locked
CYCLE cycles that occur as the read-modify-write is counted. Split
result of the LOCK prefix locked cycles (SCYC active) count as
or LOCK instruction, two separate accesses. Cycles
page-table updates, and restarted due to BOFF# are not re-
descriptor table updates. counted.
1DH I/O READ OR Number of bus cycles Misaligned I/O accesses will generate
WRITE CYCLE directed to I/O space. two bus cycles. Bus cycles restarted
due to BOFF# are not re-counted.
1EH NONCACHEABLE_ Number of noncacheable Cycles restarted due to BOFF# are not
MEMORY_READS instruction or data re-counted.
memory read bus cycles.
Count includes read
cycles caused by TLB
misses, but does not
include read cycles to I/O
space.
1FH PIPELINE_AGI_ Number of address An AGI occurs when the instruction in
STALLS generation interlock (AGI) the execute stage of either of U- or V-
stalls. An AGI occurring in pipelines is writing to either the index or
both the U- and V- base address register of an instruction
pipelines in the same in the D2 (address generation) stage of
clock signals this event either the U- or V- pipelines.
twice.
20H Reserved
21H Reserved
22H FLOPS Number of floating-point Number of floating-point adds,
operations that occur. subtracts, multiplies, divides,
remainders, and square roots are
counted. The transcendental
instructions consist of multiple adds and
multiplies and will signal this event
multiple times. Instructions generating
the divide-by-zero, negative square
root, special operand, or stack
exceptions will not be counted.
Instructions generating all other
floating-point exceptions will be
counted. The integer multiply
instructions and other instructions
which use the FPU will be counted.
23H BREAKPOINT Number of matches on The counter is incremented regardless
MATCH ON DR0 register DR0 breakpoint. of whether the breakpoints are enabled.
REGISTER However, if breakpoints are not
enabled, code breakpoint matches will
not be checked for instructions
executed in the V-pipe and will not
cause this counter to be incremented.
(They are checked on instructions
executed in the U-pipe only when
breakpoints are not enabled.) These
events correspond to the signals driven
on the BP[3:0] pins. Refer to Chapter
15, Debugging and Performance
Monitoring, for more information.
24H BREAKPOINT Number of matches on Refer to comment for 23H event.
MATCH ON DR1 register DR1 breakpoint.
REGISTER
25H BREAKPOINT Number of matches on Refer to comment for 23H event.
MATCH ON DR2 register DR2 breakpoint.
REGISTER
26H BREAKPOINT Number of matches on Refer to comment for 23H event.
MATCH ON DR3 register DR3 breakpoint.
REGISTER
27H HARDWARE Number of taken INTR
INTERRUPTS and NMI interrupts.
28H DATA_READ_OR_ Number of memory data Split cycle reads and writes are counted
WRITE reads and/or writes individually. Data Memory Reads that
(internal data cache hit are part of TLB miss processing are not
and miss combined). included. These events may occur at a
maximum of two per clock. I/O is not
included.
29H DATA_READ_MISS Number of memory read Additional reads to the same cache line
OR_WRITE MISS and/or write accesses that after the first BRDY# of the burst line fill
miss the internal data is returned but before the final (fourth)
cache whether or not the BRDY# has been returned, will not
access is cacheable or cause the counter to be incremented
noncacheable. additional times. Data accesses that
are part of TLB miss processing are not
included. Accesses directed to I/O
space are not included.
2AH BUS_OWNERSHIP_ The time from LRM bus The ratio of the 2AH events counted on
LATENCY (Counter ownership request to bus counter 0 and counter 1 is the average
0) ownership granted (that stall time due to bus ownership conflict.
is, the time from the
earlier of a PBREQ (0),
PHITM# or HITM#
assertion to a PBGNT
assertion).
2AH BUS OWNERSHIP The number of bus The ratio of the 2AH events counted on
TRANSFERS ownership transfers (that counter 0 and counter 1 is the average
(Counter 1) is, the number of PBREQ stall time due to bus ownership conflict.
(0) assertions).
2BH MMX_ Number of MMX™
INSTRUCTIONS_ instructions executed in
EXECUTED_ the U-pipe.
U-PIPE (Counter 0)
2BH MMX_ Number of MMX™
INSTRUCTIONS_ instructions executed in
EXECUTED_ the V-pipe.
V-PIPE (Counter 1)
2CH CACHE_M- Number of times a If the average memory latencies of the
STATE_LINE_ processor identified a hit system are known, this event enables
SHARING to a modified line due to a the user to count the Write Backs on
(Counter 0) memory access in the PHITM(O) penalty and the Latency on
other processor (PHITM Hit Modified(I) penalty.
(O)).
2CH CACHE_LINE_ Number of shared data
SHARING lines in the L1 cache
(Counter 1) (PHIT (O)).
2DH EMMS_ Number of EMMS
INSTRUCTIONS_ instructions executed.
EXECUTED
(Counter 0)
2DH TRANSITIONS_ Number of transitions This event counts the first floating-point
BETWEEN_MMX_ between MMX™ and instruction following an MMX™
AND_FP_ floating-point instructions instruction or first MMX™ instruction
INSTRUCTIONS or vice versa. An even following a floating-point instruction.
(Counter 1) count indicates the The count may be used to estimate the
processor is in MMX™ penalty in transitions between floating-
state. An odd count point state and MMX™ state.
indicates it is in FP state.
2EH BUS_UTILIZATION_ Number of clocks the bus
DUE_TO_ is busy due to the
PROCESSOR_ processor’s own activity,
ACTIVITY i.e., the bus activity that is
(Counter 0) caused by the processor.
2EH WRITES_TO_ Number of write accesses The count includes write cycles caused
NONCACHEABLE_ to noncacheable memory. by TLB misses and I/O write cycles.
MEMORY Cycles restarted due to BOFF# are not
(Counter 1) re-counted.
2FH SATURATING_ Number of saturating
MMX_ MMX™ instructions
INSTRUCTIONS_ executed, independently
EXECUTED of whether they actually
(Counter 0) saturated.
2FH SATURATIONS_ Number of MMX™ If an MMX™ instruction operating on 4
PERFORMED instructions that used doublewords saturated in three out of
(Counter 1) saturating arithmetic and the four results, the counter will be
that at least one of its incremented by one only.
results actually saturated.
30H NUMBER_OF_ Number of cycles the This event will enable the user to
CYCLES_NOT_IN_ processor is not idle due calculate “net CPI”. Note that during the
HALT_STATE to HLT instruction. time that the processor is executing the
(Counter 0) HLT instruction, the Time-Stamp
Counter is not disabled. Since this
event is controlled by the Counter
Controls CC0, CC1 it can be used to
calculate the CPI at CPL=3, which the
TSC cannot provide.
30H DATA_CACHE_ Number of clocks the
TLB_MISS_ pipeline is stalled due to a
STALL_DURATION data cache translation
(Counter 1) look-aside buffer (TLB)
miss.
31H MMX_ Number of MMX™
INSTRUCTION_ instruction data reads.
DATA_READS
(Counter 0)
31H MMX_ Number of MMX™
INSTRUCTION_ instruction data read
DATA_READ_ misses.
MISSES
(Counter 1)
32H FLOATING_POINT_ Number of clocks while
STALLS_DURATION pipe is stalled due to a
(Counter 0) floating-point freeze.
32H TAKEN_BRANCHES Number of taken
(Counter 1) branches.
33H D1_STARVATION_ Number of times D1 stage The D1 stage can issue 0, 1, or 2
AND_FIFO_IS_ cannot issue ANY instructions per clock if those are
EMPTY instructions since the available in an instructions FIFO buffer.
(Counter 0) FIFO buffer is empty.
33H D1_STARVATION_ Number of times the D1 The D1 stage can issue 0, 1, or 2
AND_ONLY_ONE_ stage issues just a single instructions per clock if those are
INSTRUCTION_IN_ instruction since the FIFO available in an instructions FIFO buffer.
FIFO buffer had just one When combined with the previously
(Counter 1) instruction ready. defined events, Instruction Executed
(16H) and Instruction Executed in the V-
pipe (17H), this event enables the user
to calculate the number of times pairing
rules prevented issuing of two
instructions.
34H MMX_ Number of data writes
INSTRUCTION_ caused by MMX™
DATA_WRITES instructions.
(Counter 0)
34H MMX_ Number of data write
INSTRUCTION_ misses caused by MMX™
DATA_WRITE_ instructions.
MISSES
(Counter 1)
35H PIPELINE_ Number of pipeline The count includes any pipeline flush
FLUSHES_DUE_ flushes due to wrong due to a branch that the pipeline did not
TO_WRONG_ branch predictions follow correctly. It includes cases where
BRANCH_ resolved in either the E- a branch was not in the BTB, cases
PREDICTIONS stage or the WB-stage. where a branch was in the BTB but was
(Counter 0) mispredicted, and cases where a
branch was correctly predicted but to
the wrong address. Branches are
resolved in either the Execute stage (E-
stage) or the Writeback stage (WB-
stage). In the latter case, the
misprediction penalty is larger by one
clock. The difference between the 35H
event count in counter 0 and counter 1
is the number of E-stage resolved
branches.
35H PIPELINE_ Number of pipeline Refer to note for event 35H (Counter 0).
FLUSHES_DUE_ flushes due to wrong
TO_WRONG_ branch predictions
BRANCH_ resolved in the WB-stage.
PREDICTIONS_
RESOLVED_IN_
WB-STAGE (Counter
1)
36H MISALIGNED_ Number of misaligned
DATA_MEMORY_ data memory references
REFERENCE_ON_ when executing MMX™
MMX_ instructions.
INSTRUCTIONS
(Counter 0)
36H PIPELINE_ Number of clocks during
ISTALL_FOR_MMX_ pipeline stalls caused by
INSTRUCTION_ waits for MMX™
DATA_MEMORY_ instruction data memory
READS reads.
(Counter 1)
37H MISPREDICTED_ Number of returns The count is the difference between the
OR_ predicted incorrectly or total number of executed returns and
UNPREDICTED_ not predicted at all. the number of returns that were
RETURNS correctly predicted. Only RET
(Counter 0) instructions are counted (for example,
IRET instructions are not counted).
37H PREDICTED_ Number of predicted Only RET instructions are counted (for
RETURNS returns (whether they are example, IRET instructions are not
(Counter 1) predicted correctly or counted).
incorrectly).
38H MMX_MULTIPLY_ Number of clocks the pipe The counter will not be incremented if
UNIT_INTERLOCK is stalled since the there is another cause for a stall. For
(Counter 0) destination of previous each occurrence of a multiply interlock
MMX™ multiply this event will be counted twice (if the
instruction is not ready stalled instruction comes on the next
yet. clock after the multiply) or once (if the
stalled instruction comes two clocks
after the multiply).
38H MOVD/MOVQ_ Number of clocks a
STORE_STALL_ MOVD/MOVQ instruction
DUE_TO_ store is stalled in D2 stage
PREVIOUS_MMX_ due to a previous MMX™
OPERATION operation with a
(Counter 1) destination to be used in
the store instruction.
39H RETURNS Number of returns Only RET instructions are counted;
(Counter 0) executed. IRET instructions are not counted. Any
exception taken on a RET instruction
and any interrupt recognized by the
processor on the instruction boundary
prior to the execution of the RET
instruction will also cause this counter
to be incremented.
39H Reserved
3AH BTB_FALSE_ Number of false entries in False entries are causes for
ENTRIES the Branch Target Buffer. misprediction other than a wrong
(Counter 0) prediction.
3AH BTB_MISS_ Number of times the BTB
PREDICTION_ON_ predicted a not-taken
NOT-TAKEN_ branch as taken.
BRANCH
(Counter 1)
3BH FULL_WRITE_ Number of clocks while
BUFFER_STALL_ the pipeline is stalled due
DURATION_ to full write buffers while
WHILE_ executing MMX™
EXECUTING_MMX_ instructions.
INSTRUCTIONS
(Counter 0)
3BH STALL_ON_MMX_ Number of clocks during
INSTRUCTION_ stalls on MMX™
WRITE_TO E-_OR_ instructions writing to E-
M-STATE_LINE or M-state lines.
(Counter 1)
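Several comments in Table A-2 describe values derived from a pair of counter readings: the ratio of the two 2AH counts and the difference of the two 35H counts. The following Python sketch, which is not part of the manual, shows that arithmetic. The function names and the sample readings are hypothetical, and reading the counters themselves is outside the scope of the sketch.

```python
from typing import Optional


def average_bus_ownership_stall(latency_clocks: int, transfers: int) -> Optional[float]:
    """Event 2AH: counter 0 (latency) divided by counter 1 (transfers)."""
    if transfers == 0:
        return None  # no ownership transfers observed, so the ratio is undefined
    return latency_clocks / transfers


def e_stage_resolved_flushes(all_flushes: int, wb_stage_flushes: int) -> int:
    """Event 35H: counter 0 (E- or WB-stage) minus counter 1 (WB-stage only)."""
    return all_flushes - wb_stage_flushes


if __name__ == "__main__":
    # Hypothetical counter readings, for illustration only.
    print(average_bus_ownership_stall(1200, 40))
    print(e_stage_resolved_flushes(500, 120))
```

Both values are meaningful only when the two counters of a pair are sampled over the same interval.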
APPENDIX B
MODEL-SPECIFIC REGISTERS
Table B-1 lists the model-specific registers (MSRs) that can be read with the RDMSR and written
with the WRMSR instructions. Register addresses are given in both hexadecimal and decimal;
the register name is the mnemonic register name; the bit description describes individual
bits in registers.
NOTE
The registers with addresses 0H, 1H, 10H, 11H, 12H, and 13H in Table B-1
are available only in the Pentium® processor. Code that accesses
registers 0H, 1H, and 10H will run on a P6 family processor without
generating exceptions; however, code that accesses registers 11H, 12H, and
13H will generate exceptions on a P6 family processor. The MSRs in this
table that are shaded are available only in the Pentium® II and later processors
in the P6 family.
BBL_CR_CTL[63:22] Reserved
BBL_CR_CTL[21] Processor number (see note 2)
Disable = 1
Enable = 0
BBL_CR_CTL[20:19] Reserved
BBL_CR_CTL[18] User supplied ECC
BBL_CR_CTL[17] Reserved
BBL_CR_CTL[16] L2 Hit
BBL_CR_CTL[15:14] Reserved
BBL_CR_CTL[13:12] State from L2
Modified - 11, Exclusive - 10, Shared - 01, Invalid - 00
BBL_CR_CTL[11:10] Way from L2
Way 0 - 00, Way 1 - 01, Way 2 - 10, Way 3 - 11
BBL_CR_CTL[9:8] Way to L2
BBL_CR_CTL[7] Reserved
BBL_CR_CTL[6:5] State to L2
BBL_CR_CTL[4:0] L2 Command
01100 Data Read w/ LRU update (RLU)
01110 Tag Read w/ Data Read (TRR)
01111 Tag Inquire (TI)
00010 L2 Control Register Read (CR)
00011 L2 Control Register Write (CW)
010 + MESI encode Tag Write w/ Data Read (TWR)
111 + MESI encode Tag Write w/ Data Write (TWW)
100 + MESI encode Tag Write (TW)
11AH 282 BBL_CR_TRIG Trigger register: used to initiate a cache configuration
access. Write only, with Data = 0.
11BH 283 BBL_CR_BUSY Busy register: indicates when a cache configuration
access (L2 command) is in progress. D[0] = 1 indicates BUSY.
BBL_CR_CTL3[63:26] Reserved
BBL_CR_CTL3[25] Cache bus fraction (read only)
BBL_CR_CTL3[24] Reserved
BBL_CR_CTL3[23] L2 Hardware Disable (read only)
BBL_CR_CTL3[22:20] L2 Physical Address Range support
111 64Gbytes
110 32Gbytes
101 16Gbytes
100 8Gbytes
011 4Gbytes
010 2Gbytes
001 1Gbytes
000 512Mbytes
BBL_CR_CTL3[19] Reserved
BBL_CR_CTL3[18] Cache State error checking enable (read/write)
BBL_CR_CTL3[17:13] Cache size per bank (read/write)
00001 256Kbytes
00010 512Kbytes
00100 1Mbyte
01000 2Mbytes
10000 4Mbytes
BBL_CR_CTL3[12:11] Number of L2 banks (read only)
BBL_CR_CTL3[10:9] L2 Associativity (read only)
00 Direct Mapped
01 2 Way
10 4 Way
11 Reserved
BBL_CR_CTL3[8] L2 Enabled (read/write)
BBL_CR_CTL3[7] CRTN Parity Check Enable (read/write)
BBL_CR_CTL3[6] Address Parity Check Enable (read/write)
BBL_CR_CTL3[5] ECC Check Enable (read/write)
BBL_CR_CTL3[4:1] L2 Cache Latency (read/write)
BBL_CR_CTL3[0] L2 Configured (read/write)
179H 377 MCG_CAP
17AH 378 MCG_STATUS
17BH 379 MCG_CTL
186H 390 EVNTSEL0
7:0 Event Select
(Refer to Performance Counter section for a list of
event encodings)
15:8 UMASK:
Unit mask register. Set to 0 to enable all count options.
16 USER:
Controls the counting of events at privilege levels 1,
2, and 3.
NOTES:
1. Bit 0 of this register has been redefined several times, and is no longer used in Pentium® Pro processors.
2. The processor number feature may be disabled by setting bit 21 of the BBL_CR_CTL MSR (model-specific
register address 119H) to “1”. Once set, bit 21 of the BBL_CR_CTL MSR cannot be cleared; this bit is
write-once. The processor number feature remains disabled until the processor is reset.
3. The Pentium® III processor prevents FSB frequency overclocking with a new shutdown mechanism. If
the FSB frequency selected is greater than the internal FSB frequency, the processor will shut down. If the
FSB frequency selected is less than the internal FSB frequency, the BIOS may choose to use bit 11 to
implement its own shutdown policy.
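
The BBL_CR_CTL3 bit fields listed in Table B-1 can be decoded with plain shifts and masks once the 64-bit value has been read with RDMSR. The following is a minimal sketch in C rather than a definitive implementation: it assumes the MSR value is already available in a variable, and the function names (msr_bits, l2_bank_size_kbytes, l2_phys_range_mbytes, l2_ways, l2_enabled) are hypothetical helpers introduced only for illustration. The encodings themselves are taken from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract bits [hi:lo] of a 64-bit MSR value. */
uint64_t msr_bits(uint64_t value, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi < 64);
    unsigned width = hi - lo + 1;
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (value >> lo) & mask;
}

/* BBL_CR_CTL3[17:13]: cache size per bank, returned in KBytes.
 * Returns 0 for an encoding that Table B-1 does not list. */
unsigned l2_bank_size_kbytes(uint64_t ctl3)
{
    switch (msr_bits(ctl3, 17, 13)) {
    case 0x01: return 256;
    case 0x02: return 512;
    case 0x04: return 1024;
    case 0x08: return 2048;
    case 0x10: return 4096;
    default:   return 0;
    }
}

/* BBL_CR_CTL3[22:20]: L2 physical address range, returned in MBytes.
 * 000 is 512 MBytes and each step doubles the range, up to 64 GBytes. */
uint64_t l2_phys_range_mbytes(uint64_t ctl3)
{
    return 512ULL << msr_bits(ctl3, 22, 20);
}

/* BBL_CR_CTL3[10:9]: L2 associativity as a number of ways.
 * 00 is direct mapped (1 way); 11 is reserved and reported as 0. */
unsigned l2_ways(uint64_t ctl3)
{
    unsigned enc = (unsigned)msr_bits(ctl3, 10, 9);
    return (enc == 3) ? 0 : (1u << enc);
}

/* BBL_CR_CTL3[8]: L2 enabled flag. */
int l2_enabled(uint64_t ctl3)
{
    return (int)msr_bits(ctl3, 8, 8);
}
```

A caller would read MSR BBL_CR_CTL3 at privilege level 0, combine EDX:EAX into one 64-bit value, and pass that value to these helpers. The read-only fields can be inspected this way but must not be modified when the register is written back.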
APPENDIX C
DUAL-PROCESSOR (DP) BOOTUP SEQUENCE
EXAMPLE (SPECIFIC TO PENTIUM®
PROCESSORS)
The following example shows the DP protocol for booting two Pentium® processors (a primary
processor and a secondary processor) in a DP system and initializing their APICs. For
dual-processor systems based on Pentium® processors, the APIC ID of the primary processor is always 0.
The following constants and data definitions are used in the accompanying code examples. They
are based on the addresses of the APIC registers as defined in Table 7-1 in Chapter 7.
ICR_LOW EQU 0FEE00300H
ICR_HI EQU 0FEE00310H
SVR EQU 0FEE000F0H
APIC_ID EQU 0FEE00020H
LVT3 EQU 0FEE00370H
APIC_ENABLED EQU 100H
BOOT_ID DW ?
UPGRD_ID DW ?
8. Convert the base address of the 4-KByte page for the secondary processor’s bootup code
into an 8-bit vector. The 8-bit vector defines the address of a 4-KByte page in the real-address
mode address space (1-MByte space). For example, a vector of 0BDH specifies a start-up
memory address of 000BD000H.
Use steps 9 and 10 to set up the LVT APIC error handling entry to deal with unsuccessful
delivery of the start-up IPI.
9. Enable the local APIC by writing to the spurious vector register (SVR). This is required to do
APIC error handling via the local vector table.
MOV ESI, SVR            ; address of SVR
MOV EAX, [ESI]
OR EAX, APIC_ENABLED    ; set bit 8 to enable (0 on reset)
MOV [ESI], EAX
10. Program LVT3 (APIC error interrupt vector) of the local vector table with an 8-bit vector
for handling APIC errors.
MOV ESI, LVT3           ; address of LVT3
MOV EAX, [ESI]
AND EAX, 0FFFFFF00H     ; clear out previous vector
OR EAX, 000000xxH       ; xx is the 8-bit vector for APIC error
                        ; handling
MOV [ESI], EAX
11. Write APIC ICRH with the address of the secondary processor’s APIC.
MOV ESI, ICR_HI         ; address of ICR high dword
MOV EAX, [ESI]          ; get high dword of ICR
AND EAX, 0F0FFFFFFH     ; zero out ID bits
OR EAX, SECOND_ID       ; write ID into appropriate bits - don’t
                        ; affect reserved bits
MOV [ESI], EAX          ; write upgrade ID to destination field
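
The arithmetic behind steps 8 through 10 can be checked on a host machine before it is committed to boot code. The sketch below is written in C and only models the values: it does not touch the memory-mapped APIC registers. The function names are hypothetical and introduced for illustration; the APIC_ENABLED constant and the 0FFFFFF00H mask are the ones used in the listings above.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ENABLED 0x100u  /* SVR bit 8, same value as the EQU above */

/* Step 8: convert the base address of a 4-KByte page in the 1-MByte
 * real-address space into the 8-bit start-up vector. */
uint8_t startup_vector_from_page(uint32_t page_base)
{
    assert((page_base & 0xFFFu) == 0);  /* must be 4-KByte aligned */
    assert(page_base < 0x100000u);      /* must lie below 1 MByte  */
    return (uint8_t)(page_base >> 12);
}

/* Inverse of step 8: the start-up address selected by a vector. */
uint32_t startup_page_from_vector(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* Step 9: value to write back to the SVR so that bit 8 is set
 * while all other bits keep their current contents. */
uint32_t svr_with_apic_enabled(uint32_t svr)
{
    return svr | APIC_ENABLED;
}

/* Step 10: value to write back to LVT3 with a new 8-bit error
 * vector; the upper 24 bits are preserved. */
uint32_t lvt3_with_vector(uint32_t lvt3, uint8_t vector)
{
    return (lvt3 & 0xFFFFFF00u) | vector;
}
```

With the manual's own example, startup_vector_from_page(0x000BD000) yields 0BDH, and startup_page_from_vector(0xBD) gives back 000BD000H.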
C.2. SECONDARY PROCESSOR’S SEQUENCE OF EVENTS
FOLLOWING RECEIPT OF START-UP IPI
If the secondary processor’s APIC is to be used for symmetric multiprocessing, the secondary
processor must undertake the following steps:
1. Switch to protected mode to access the APIC addresses.
2. Initialize its local APIC by setting bit 8 of the spurious vector register (SVR) and
programming its LVT3 for error handling.
3. Configure the APIC as appropriate.
4. Enable interrupts.
5. (Optional.) Execute the CPUID instruction and write the results into the configuration
RAM.
6. Do either of the following:
— Execute a HLT (halt) instruction and wait for an IPI from the operating system.
— Continue execution.
APPENDIX D
MULTIPLE-PROCESSOR (MP) BOOTUP
SEQUENCE EXAMPLE (SPECIFIC TO P6 FAMILY
PROCESSORS)
The following example illustrates the use of the MP protocol to boot two P6 family processors
in a multiple-processor (MP) system and initialize their APICs. The primary processor (the
processor that won the “race for the flag”) is called the bootstrap processor (BSP) and the
secondary processor is called the application processor (AP).
The following constants and data definitions are used in the accompanying code examples. They
are based on the addresses of the APIC registers as defined in Table 7-1 in Chapter 7.
ICR_LOW EQU 0FEE00300H
ICR_HI EQU 0FEE00310H
SVR EQU 0FEE000F0H
APIC_ID EQU 0FEE00020H
LVT3 EQU 0FEE00370H
APIC_ENABLED EQU 100H
BOOT_ID DW ?
SECOND_ID DW ?
5. Switch to protected mode (to access the APIC address space above 1 MByte) or change the
APIC base to less than 1 MByte and ensure it is mapped to an uncached (UC) memory
type.
6. Determine the BSP’s APIC ID from the local APIC ID register (default is 0):
MOV ESI, APIC_ID        ; address of local APIC ID register
MOV EAX, [ESI]
AND EAX, 0F000000H      ; zero out all other bits except APIC ID
MOV BOOT_ID, EAX        ; save in memory
8. Convert the base address of the 4-KByte page for the AP’s bootup code into an 8-bit vector.
The 8-bit vector defines the address of a 4-KByte page in the real-address mode address
space (1-MByte space). For example, a vector of 0BDH specifies a start-up memory
address of 000BD000H.
Use steps 9 and 10 to set up the LVT APIC error handling entry to deal with unsuccessful
delivery of the start-up IPI.
9. Enable the local APIC by writing to the spurious vector register (SVR). This is required to do
APIC error handling via the local vector table.
MOV ESI, SVR            ; address of SVR
MOV EAX, [ESI]
OR EAX, APIC_ENABLED    ; set bit 8 to enable (0 on reset)
MOV [ESI], EAX
10. Program LVT3 (APIC error interrupt vector) of the local vector table with an 8-bit vector
for handling APIC errors.
MOV ESI, LVT3           ; address of LVT3
MOV EAX, [ESI]
AND EAX, 0FFFFFF00H     ; clear out previous vector
OR EAX, 000000xxH       ; xx is the 8-bit vector for APIC error
                        ; handling
MOV [ESI], EAX
11. Write APIC ICRH with the address of the AP’s APIC.
MOV ESI, ICR_HI         ; address of ICR high dword
MOV EAX, [ESI]          ; get high dword of ICR
AND EAX, 0F0FFFFFFH     ; zero out ID bits
OR EAX, SECOND_ID       ; write ID into appropriate bits - don’t
                        ; affect reserved bits
MOV [ESI], EAX          ; write AP’s ID to destination field
12. Initialize the memory location into which the AP will write to signal its presence.
13. Set the timer with an appropriate value (~100 milliseconds).
14. Write APIC ICRL to send a start-up IPI message to the AP via the APIC.
MOV ESI, ICR_LOW        ; address of ICR low dword
MOV EAX, [ESI]          ; get low dword of ICR
AND EAX, 0FFF0F800H     ; zero out delivery mode and vector fields
OR EAX, 000006xxH       ; 6 selects delivery mode 110 (StartUp IPI)
                        ; xx should be vector of 4-KByte page as
                        ; computed in Step 8.
MOV [ESI], EAX
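
The mask-and-merge operations in the ID, ICRH, and ICRL listings above can be modeled as pure functions, which makes the bit positions easy to verify. The following C sketch is illustrative only. The function and macro names are hypothetical, and it assumes that the destination ID passed in is already positioned in bits 27:24, in the same form in which the listing saves BOOT_ID after masking with 0F000000H.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ID_MASK      0x0F000000u  /* bits 27:24, as in AND EAX, 0F000000H */
#define ICR_LOW_KEEP_MASK 0xFFF0F800u  /* mask applied to ICR low in step 14   */
#define DELIVERY_STARTUP  0x00000600u  /* delivery mode 110b in bits 10:8      */

/* Isolate the APIC ID field of the local APIC ID register. */
uint32_t apic_id_field(uint32_t apic_id_reg)
{
    return apic_id_reg & APIC_ID_MASK;
}

/* ICR high dword: replace the destination field and leave the
 * reserved bits as they were read.
 * Assumption: dest_id_field is already aligned to bits 27:24. */
uint32_t icr_high_with_dest(uint32_t icr_high, uint32_t dest_id_field)
{
    assert((dest_id_field & ~APIC_ID_MASK) == 0);
    return (icr_high & ~APIC_ID_MASK) | dest_id_field;
}

/* Step 14, ICR low dword: clear the fields removed by the mask, then
 * select the start-up delivery mode and insert the page vector. */
uint32_t icr_low_startup_ipi(uint32_t icr_low, uint8_t vector)
{
    return (icr_low & ICR_LOW_KEEP_MASK) | DELIVERY_STARTUP | vector;
}
```

For the start-up page at 000BD000H, icr_low_startup_ipi(0, 0xBD) evaluates to 000006BDH, which matches the 000006xxH pattern in the listing with xx = BD.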
APPENDIX E
PROGRAMMING THE LINT0 AND LINT1 INPUTS
The following procedure describes how to program the LINT0 and LINT1 local APIC pins on
a processor after multiple processors have been booted and initialized (as described in Appendix
C and Appendix D). In this example, LINT0 is programmed to be the ExtINT pin and LINT1 is
programmed to be the NMI pin.
E.1. CONSTANTS
The following constants are defined:
LVT1 EQU 0FEE00350H
LVT2 EQU 0FEE00360H
LVT3 EQU 0FEE00370H
SVR EQU 0FEE000F0H
3. Program LVT1 as an ExtINT, which delivers the signal to the INTR signal of all processor
cores listed in the destination as an interrupt that originated in an externally connected
interrupt controller.
MOV ESI, LVT1
MOV EAX, [ESI]
AND EAX, 0FFFE58FFH     ; mask off bits 8-10, 13, 15 and 16
OR EAX, 700H            ; Bit 16=0 for not masked, Bit 15=0 for edge
                        ; triggered, Bit 13=0 for high active input
                        ; polarity, Bits 8-10 are 111b for ExtINT
MOV [ESI], EAX          ; write to LVT1
4. Program LVT2 as NMI, which delivers the signal on the NMI signal of all processor cores
listed in the destination.
MOV ESI, LVT2
MOV EAX, [ESI]
AND EAX, 0FFFE58FFH     ; mask off bits 8-10, 13, 15 and 16
OR EAX, 000000400H      ; Bit 16=0 for not masked, Bit 15=0 for edge
                        ; triggered, Bit 13=0 for high active input
                        ; polarity, Bits 8-10 are 100b for NMI
MOV [ESI], EAX          ; write to LVT2
; Unmask 8259 interrupts and allow NMI.
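
The 0FFFE58FFH mask used for both LVT1 and LVT2 can be derived from the individual bit positions named in the comments (bits 8-10, 13, 15, and 16). The short C sketch below rebuilds the mask from those positions and models the AND/OR sequence; the macro and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LVT_DELIVERY_MASK   (0x7u << 8)   /* bits 10:8, delivery mode       */
#define LVT_POLARITY_LOW    (1u << 13)    /* bit 13 = 1: low active input   */
#define LVT_TRIGGER_LEVEL   (1u << 15)    /* bit 15 = 1: level triggered    */
#define LVT_MASKED          (1u << 16)    /* bit 16 = 1: interrupt masked   */

#define LVT_DELIVERY_NMI    (0x4u << 8)   /* 100b */
#define LVT_DELIVERY_EXTINT (0x7u << 8)   /* 111b */

/* All bits that the listings clear before selecting a delivery mode. */
#define LVT_LINT_CLEAR (LVT_DELIVERY_MASK | LVT_POLARITY_LOW | \
                        LVT_TRIGGER_LEVEL | LVT_MASKED)

/* The AND mask: every bit kept except those in LVT_LINT_CLEAR. */
uint32_t lvt_lint_keep_mask(void)
{
    return ~LVT_LINT_CLEAR;
}

/* Model of the AND/OR sequence: unmasked, edge triggered, high active,
 * with the requested delivery mode in bits 10:8. */
uint32_t lvt_program(uint32_t lvt, uint32_t delivery_mode)
{
    assert((delivery_mode & ~LVT_DELIVERY_MASK) == 0);
    return (lvt & ~LVT_LINT_CLEAR) | delivery_mode;
}
```

Calling lvt_program with LVT_DELIVERY_EXTINT reproduces the value written to LVT1, and LVT_DELIVERY_NMI reproduces the value written to LVT2; lvt_lint_keep_mask() returns 0FFFE58FFH.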
INDEX
B0-B3 (breakpoint condition detected) Event select field, PerfEvtSel0 and PerfEvtSel1
flags. . . . . . . . . . . . . . . . . . . . . . . . . . . .15-4 MSRs (P6 family processors) . . . . 15-16
BD (debug register access detected) flag. .15-4 Exception handler
BS (single step) flag . . . . . . . . . . . . . . . . . .15-5 calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
BT (task switch) flag . . . . . . . . . . . . . . . . . .15-5 defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
debug exception (#DB) . . . . . . . . . . . . . . . .5-23 flag usage by handler procedure. . . . . . . . 5-18
reserved bits . . . . . . . . . . . . . . . . . . . . . . .18-24 machine-check exceptions (#MC). . . . . . 13-14
DR7 debug control register . . . . . . . . . . . . . . .15-5 procedures . . . . . . . . . . . . . . . . . . . . . . . . 5-15
G0-G3 (global breakpoint enable) flags . . .15-5 protection of handler procedures . . . . . . . 5-17
GD (general detect enable) flag . . . . . . . . .15-5 task . . . . . . . . . . . . . . . . . . . . . . . . . . .5-18, 6-3
GE (global exact breakpoint enable) flag . .15-5 Exception priority, FPU exceptions. . .11-13, 18-12
L0-L3 (local breakpoint enable) flags . . . . .15-5 Exceptions
LE (local exact breakpoint enable) flag . . .15-5 alignment check . . . . . . . . . . . . . . . . . . . 18-13
LEN0-LEN3 (Length) fields. . . . . . . . . . . . .15-6 classifications . . . . . . . . . . . . . . . . . . . . . . . 5-4
R/W0-R/W3 (read/write) fields . . . . 15-6, 18-24 conditions checked during a task switch . . 6-13
D/B (default operation size/default stack pointer coprocessor segment overrun. . . . . . . . . 18-14
size and/or upper bound) flag, segment description of. . . . . . . . . . . . . . . . . . . . .2-4, 5-1
descriptor . . . . . . . . . . . . . . . . . 3-12, 4-5 device not available. . . . . . . . . . . . . . . . . 18-14
double fault . . . . . . . . . . . . . . . . . . . . . . . . 5-32
error code . . . . . . . . . . . . . . . . . . . . . . . . . 5-20
E floating-point error . . . . . . . . . . . . . . . . . . 18-14
E (edge detect) flag, PerfEvtSel0 and PerfEvtSel1 general protection . . . . . . . . . . . . . . . . . . 18-14
MSRs (P6 family processors) . . . .15-17 handler mechanism. . . . . . . . . . . . . . . . . . 5-15
E (enable/disable APIC) flag, handler procedures . . . . . . . . . . . . . . . . . . 5-15
APIC_BASE_MSR . . . . . . . . . . . . . .7-19 handling. . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
E (expansion direction) flag, segment handling in real-address mode . . . . . . . . . 16-6
descriptor . . . . . . . . . . . . . . . . . . 4-2, 4-5 handling in SMM . . . . . . . . . . . . . . . . . . . 12-10
E (MTRRs enabled) flag, MTRRdefType handling in virtual-8086 mode . . . . . . . . . 16-15
register . . . . . . . . . . . . . . . . . . 7-19, 9-22 handling through a task gate in virtual-8086
EFLAGS register mode . . . . . . . . . . . . . . . . . . . . . . . . . 16-20
introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-5 handling through a trap or interrupt gate in
new flags. . . . . . . . . . . . . . . . . . . . . . . . . . .18-6 virtual-8086 mode . . . . . . . . . . . . . . . 16-17
saved in TSS . . . . . . . . . . . . . . . . . . . . . . . .6-4 IDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
saving on call to exception or interrupt initializing for protected-mode operation . . 8-12
handler . . . . . . . . . . . . . . . . . . . . . . . . .5-15 invalid opcode . . . . . . . . . . . . . . . . . . . . . . 18-6
using flags to distinguish between 32-bit Intel masking debug exceptions . . . . . . . . . . . . . 5-9
Architecture processors. . . . . . . . . . . . .18-6 masking when switching stack segments . 5-10
EIP register . . . . . . . . . . . . . . . . . . . . . . . . . .18-12 notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
saved in TSS . . . . . . . . . . . . . . . . . . . . . . . .6-4 overview of . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
saving on call to exception or interrupt priorities among simultaneous exceptions and
handler . . . . . . . . . . . . . . . . . . . . . . . . .5-15 interrupts . . . . . . . . . . . . . . . . . . . . . . . 5-10
state following initialization . . . . . . . . . . . . . .8-6 priority of . . . . . . . . . . . . . . . . . . . . . . . . . 18-27
EM (emulation) flag, CR0 control register . . . 2-15, reference information on all exceptions . . 5-21
5-30, 8-6, 8-8 restarting a task or program . . . . . . . . . . . . 5-7
EOI (end-of-interrupt register), local APIC . . . .7-33 segment not present . . . . . . . . . . . . . . . . 18-14
Error code sources of . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
exception, description of . . . . . . . . . . . . . . .5-20 summary of . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
pushing on stack. . . . . . . . . . . . . . . . . . . .18-33 vectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
Error signals . . . . . . . . . . . . . . . . . . . . 18-12, 18-13 Executable code segment, size . . . . . . . . . . . 3-12
ERROR# input . . . . . . . . . . . . . . . . . . . . . . . .18-19 Expand-down data segment type . . . . . . . . . . 3-12
ERROR# output . . . . . . . . . . . . . . . . . . . . . . .18-19 External bus errors, detected with machine-check
ES0 and ES1 (event select) fields, CESR MSR architecture. . . . . . . . . . . . . . . . . . 13-11
(Pentium processor). . . . . . .15-20, A-12
ESP register, saving on call to exception or interrupt
handler . . . . . . . . . . . . . . . . . . . . . . .5-15 F
ESR (error status register), local APIC . . . . . .7-42 F2XM1 instruction. . . . . . . . . . . . . . . . . . . . . 18-16
ET (extension type) flag, CR0 control register .2-14 Fast string operations . . . . . . . . . . . . . . . . . . . . 7-9
ET (extension type) flag, CR0 register . . . . . . .18-8
INT (APIC interrupt enable) flag, PerfEvtSel0 and handling through a trap or interrupt gate in
PerfEvtSel1 MSRs (P6 family processors) virtual-8086 mode . . . . . . . . . . . . . . . 16-17
15-17 IDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
INT3 instruction . . . . . . . . . . . . . . . . . . . . . 3-9, 5-3 IDTR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
Intel 287 math coprocessor . . . . . . . . . . . . . . .18-7 initializing for protected-mode operation . . 8-12
Intel 387 math coprocessor system . . . . . . . . .18-7 interrupt descriptor table register (see IDTR)
Intel 487 SX math coprocessor . . . . . . 18-7, 18-20 interrupt descriptor table (see IDT)
Intel 8086 processor. . . . . . . . . . . . . . . . . . . . .18-7 local APIC . . . . . . . . . . . . . . . . . . . . . . . . . 7-13
Intel Architecture local APIC sources . . . . . . . . . . . . . . . . . . 7-15
compatibility . . . . . . . . . . . . . . . . . . . . . . . .18-1 maskable hardware interrupts. . . . . . .2-8, 7-23
processors . . . . . . . . . . . . . . . . . . . . . . . . .18-1 masking maskable hardware interrupts . . . 5-8
Intel286 processor . . . . . . . . . . . . . . . . . . . . . .18-7 masking when switching stack segments . 5-10
Intel386 DX processor . . . . . . . . . . . . . . . . . . .18-7 overview of . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
Intel486 DX processor . . . . . . . . . . . . . . . . . . .18-7 priorities among simultaneous exceptions and
Intel486 SX processor . . . . . . . . . . . . . 18-7, 18-20 interrupts . . . . . . . . . . . . . . . . . . . . . . . 5-10
Interprivilege level calls propagation delay . . . . . . . . . . . . . . . . . . 18-27
call mechanism . . . . . . . . . . . . . . . . . . . . . .4-17 restarting a task or program . . . . . . . . . . . . 5-7
stack switching . . . . . . . . . . . . . . . . . . . . . .4-21 software. . . . . . . . . . . . . . . . . . . . . . . . . . . 5-55
Interrupt command register (ICR), local APIC .7-25 summary of . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Interrupt gates user defined . . . . . . . . . . . . . . . . . . . .5-4, 5-55
16-bit, interlevel return from . . . . . . . . . . .18-34 valid APIC interrupts . . . . . . . . . . . . . . . . . 7-15
clearing IF flag . . . . . . . . . . . . . . . . . . 5-9, 5-18 vectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
difference between interrupt and trap gates . . INTn instruction . . . . . . . . . . . . . . . . . . . . . . 15-10
5-18 INTO instruction . . . . . . . . . . 3-9, 5-3, 5-26, 15-10
for 16-bit and 32-bit code modules . . . . . . .17-2 INTR# pin . . . . . . . . . . . . . . . . . . . . . . . . . .5-2, 5-8
handling a virtual-8086 mode interrupt or Invalid arithmetic operand exception (#IA), FPU
exception through . . . . . . . . . . . . . . . .16-17 description of. . . . . . . . . . . . . . . . . . . . . . 11-17
in IDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-13 Invalid opcode exception (#UD) . 5-28, 12-3, 15-4,
introduction to . . . . . . . . . . . . . . . . . . . . 2-3, 2-4 18-6, 18-13
layout of . . . . . . . . . . . . . . . . . . . . . . . . . . .5-13 Invalid operation exception. . . . . . . . . . . . . . 11-17
Interrupt handler Invalid operation exception, FPU . . . .18-13, 18-17
calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-15 Invalid TSS exception (#TS). . . . . . . . . . .5-35, 6-7
defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1 Invalid-opcode exception (#UD) . . . . .18-25, 18-26
flag usage by handler procedure . . . . . . . .5-18 INVD instruction . . . . 2-21, 4-25, 7-12, 9-15, 18-5
procedures . . . . . . . . . . . . . . . . . . . . . . . . .5-15 INVLPG instruction . . . . . . . 2-21, 4-25, 7-12, 18-5
protection of handler procedures . . . . . . . .5-17 IOPL (I/O privilege level) field, EFLAGS register
task . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-18, 6-3 description of. . . . . . . . . . . . . . . . . . . . . . . . 2-8
Interrupt redirection bit map field (in TSS) . . .16-16 restoring on return from exception or interrupt
Interrupts handler. . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
acceptance, local APIC. . . . . . . . . . . . . . . .7-30 sensitive instructions in virtual-8086
APIC priority levels . . . . . . . . . . . . . . . . . . .7-15 mode . . . . . . . . . . . . . . . . . . . . . . . . . 16-14
automatic bus locking when IRET instruction . . 3-9, 5-8, 5-9, 5-15, 5-18, 6-10,
acknowledging. . . . . . . . . . . . . . . . . . .18-37 6-12, 7-12, 16-6, 16-27
control transfers between 16- and 32-bit code IRETD instruction . . . . . . . . . . . . . . . . . . . . . . 7-12
modules. . . . . . . . . . . . . . . . . . . . . . . . .17-8 IRR (interrupt request register), local APIC . . 7-30
description of . . . . . . . . . . . . . . . . . . . . 2-4, 5-1 ISR (in-service register), local APIC . . . . . . . . 7-30
distribution mechanism, local APIC . . . . . .7-22 I/O
enabling and disabling . . . . . . . . . . . . . . . . .5-8 breakpoint exception conditions . . . . . . . . 15-9
handler mechanism . . . . . . . . . . . . . . . . . .5-15 in virtual-8086 mode . . . . . . . . . . . . . . . . 16-14
handler procedures. . . . . . . . . . . . . . . . . . .5-15 instruction restart flag, SMM revision identifier
handling . . . . . . . . . . . . . . . . . . . . . . . . . . .5-15 field . . . . . . . . . . . . . . . . . . . . .12-15, 12-16
handling in real-address mode . . . . . . . . . .16-6 instructions, restarting following an SMI
handling in SMM . . . . . . . . . . . . . . . . . . . .12-10 interrupt . . . . . . . . . . . . . . . . . . . . . . . 12-15
handling in virtual-8086 mode. . . . . . . . . .16-15 I/O permission bit map, TSS . . . . . . . . . . . . 6-6
handling multiple NMIs . . . . . . . . . . . . . . . . .5-8 map base address field, TSS . . . . . . . . . . . 6-6
handling through a task gate in virtual-8086 I/O APIC
mode . . . . . . . . . . . . . . . . . . . . . . . . . .16-20 bus arbitration . . . . . . . . . . . . . . . . . . . . . . 7-15
description of. . . . . . . . . . . . . . . . . . . . . . . 7-13
MCi_MISC MSRs . . . . . . . . . . . . . . . . . 13-7, 13-17 MTRR flag, EDX feature information register . 9-20
MCi_STATUS MSRs . . . . . . . . 13-5, 13-15, 13-17 MTRRcap register . . . . . . . . . . . . . . . . . . . . . 9-20
MDA (message destination address), local MTRRdefType register . . . . . . . . . . . . . . . . . . 9-21
APIC. . . . . . . . . . . . . . . . . . . . . . . . .7-20 MTRRfix16K_80000 and MTRRfix16K_A0000
Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-1 (fixed range) MTRRs . . . . . . . . . . . 9-23
Memory management MTRRfix4K_C0000 and MTRRfix4K_F8000 (fixed
introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-5 range) MTRRs . . . . . . . . . . . . . . . . 9-23
overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1 MTRRfix64K_00000 (fixed range) MTRR. . . . 9-22
paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1 MTRRphysBasen (variable range) MTRRs . . 9-23
segmentation . . . . . . . . . . . . . . . . . . . . . . . .3-1 MTRRphysMaskn (variable range) MTRRs . . 9-23
Memory ordering MTRRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
in Intel Architecture processors . . . . . . . .18-36 address mapping for fixed-range MTRRs . 9-23
overview . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-6 cache control. . . . . . . . . . . . . . . . . . . . . . . 9-12
processor ordering . . . . . . . . . . . . . . . . . . . .7-6 description of. . . . . . . . . . . . . . . . . . . .8-9, 9-18
snooping mechanism . . . . . . . . . . . . . . . . . .7-8 enabling caching . . . . . . . . . . . . . . . . . . . . . 8-8
write forwarding . . . . . . . . . . . . . . . . . . . . . .7-8 example of base and mask calculations . . 9-25
write ordering . . . . . . . . . . . . . . . . . . . . . . . .7-6 feature identification . . . . . . . . . . . . . . . . . 9-20
Memory type range registers (see MTRRs) fixed-range registers . . . . . . . . . . . . . . . . . 9-22
Memory types initialization of . . . . . . . . . . . . . . . . . . . . . . 9-27
caching methods, defined. . . . . . . . . . . . . . .9-5 introduction of in Intel Architecture
choosing . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-8 processors . . . . . . . . . . . . . . . . . . . . 18-39
MTRR types . . . . . . . . . . . . . . . . . . . . . . . .9-19 large page size considerations . . . . . . . . . 9-32
UC (uncacheable). . . . . . . . . . . . . . . . . . . . .9-5 mapping physical memory with . . . . . . . . . 9-20
WB (write back) . . . . . . . . . . . . . . . . . . . . . .9-6 memory types and their properties . . . . . . 9-19
WC (write combining) . . . . . . . . . . . . . . . . . .9-6 MemTypeGet() function . . . . . . . . . . . . . . 9-28
WP (write protected) . . . . . . . . . . . . . . . . . . .9-7 MemTypeSet() function. . . . . . . . . . . . . . . 9-29
WT (write through) . . . . . . . . . . . . . . . . . . . .9-6 MTRRcap register . . . . . . . . . . . . . . . . . . . 9-20
MemTypeGet() function . . . . . . . . . . . . . . . . . .9-28 MTRRdefType register . . . . . . . . . . . . . . . 9-21
MemTypeSet() function . . . . . . . . . . . . . . . . . .9-29 multiple-processor considerations. . . . . . . 9-31
MESI cache protocol precedence of cache controls . . . . . . . . . . 9-13
described . . . . . . . . . . . . . . . . . . . . . . . 9-4, 9-9 precedences . . . . . . . . . . . . . . . . . . . . . . . 9-26
Mixing 16-bit and 32-bit code programming interface . . . . . . . . . . . . . . . 9-28
on Intel Architecture processors . . . . . . . .18-34 remapping memory types . . . . . . . . . . . . . 9-27
overview . . . . . . . . . . . . . . . . . . . . . . . . . . .17-1 setting memory ranges . . . . . . . . . . . . . . . 9-21
MMX instructions state of following a hardware reset . . . . . . 9-18
pairing guidelines . . . . . . . . . . . . . . . . . . .14-17 variable-range registers . . . . . . . . . . . . . . 9-23
Mode switching Multiple-processor initialization
between real-address and protected mode 8-13 MP protocol . . . . . . . . . . . . . . . . . . . .7-45, 7-46
example . . . . . . . . . . . . . . . . . . . . . . . . . . .8-16 procedure . . . . . . . . . . . . . . . . . . . . . . . . . 7-48
to SMM . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-2 Multiple-processor management
Model and stepping information, following bus locking . . . . . . . . . . . . . . . . . . . . . . . . . 7-3
processor initialization or reset . . . . .8-5 guaranteed atomic operations. . . . . . . . . . . 7-2
Model-specific registers (see MSRs) interprocessor and self-interrupts . . . . . . . 7-25
MOV instruction . . . . . . . . . . . . . . . . . . . . 3-9, 4-10 local APIC . . . . . . . . . . . . . . . . . . . . . . . . . 7-13
MOV (control registers) instructions. . . 2-20, 4-25, memory ordering . . . . . . . . . . . . . . . . . . . . . 7-6
7-12, 8-14 MP protocol . . . . . . . . . . . . . . . . . . . .7-45, 7-46
MOV (debug registers) instructions . . . 2-21, 4-25, overview of . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7-12, 15-10 SMM considerations . . . . . . . . . . . . . . . . 12-17
MP (monitor coprocessor) flag, CR0 control register Multiple-processor system
2-16, 5-30, 8-6, 8-8 MP protocol . . . . . . . . . . . . . . . . . . . .7-45, 7-46
MP (monitor coprocessor) flag, CR0 register. .18-8 relationship of local and I/O APICs . . . . . . 7-14
MSRs Multisegment model . . . . . . . . . . . . . . . . . . . . . 3-5
description of . . . . . . . . . . . . . . . . . . . . . . . .8-8 Multitasking
introduction of in Intel Architecture processors initialization for . . . . . . . . . . . . . . . . . . . . . 8-13
18-38 linking tasks. . . . . . . . . . . . . . . . . . . . . . . . 6-14
introduction to . . . . . . . . . . . . . . . . . . . . . . . .2-5 mechanism, description of . . . . . . . . . . . . . 6-3
machine-check architecture . . . . . . . . . . . .13-2 overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
reading and writing . . . . . . . . . . . . . . . . . . .2-23 setting up TSS. . . . . . . . . . . . . . . . . . . . . . 8-13
Previous task link field, TSS. . . . . . 6-4, 6-14, 6-16 CR3 control register . 2-16, 9-12, 18-22, 18-31
Priority levels, APIC interrupts . . . . . . . . . . . . .7-15 page-directory entries . . . . . . . . 8-8, 9-12, 9-32
Privilege levels page-table entries . . . . . 8-8, 9-12, 9-32, 18-32
checking when accessing data segments . .4-9 page-table entry . . . . . . . . . . . . . . . . . . . . 3-26
checking, for call gates . . . . . . . . . . . . . . . .4-17
checking, when transferring program control
between code segments . . . . . . . . . . . .4-12 Q
description of . . . . . . . . . . . . . . . . . . . . . . . .4-8 QNaN
protection rings . . . . . . . . . . . . . . . . . . . . . . .4-9 compatibility, Intel Architecture
Privileged instructions . . . . . . . . . . . . . . . . . . .4-25 processors . . . . . . . . . . . . . . . . . . . . . 18-10
Processor identification
earlier Intel architecture processors . . . . . .9-33 R
Processor management
RC (rounding control) field, FPU control
initialization . . . . . . . . . . . . . . . . . . . . . . . . . .8-1
word . . . . . . . . . . . . . . . . . . . .11-3, 11-4
local APIC . . . . . . . . . . . . . . . . . . . . . . . . . .7-13
RDMSR instruction . . 2-23, 4-25, 9-20, 15-13, 15-15,
overview of . . . . . . . . . . . . . . . . . . . . . . . . . .7-1
15-16, 15-18, 15-20, 18-4, 18-38
snooping mechanism . . . . . . . . . . . . . . . . . .7-8
RDPMC instruction . . 2-22, 4-25, 15-16, 15-18, 18-3,
processor number . . . . . . . . . . . . . . . . . . B-4, B-9
18-21, 18-40
Processor ordering, description of . . . . . . . . . . .7-7
RDTSC instruction . . . . . . 2-22, 4-25, 15-15, 18-4
Protected mode
Read/write
IDT initialization . . . . . . . . . . . . . . . . . . . . .8-12
protection, page level . . . . . . . . . . . . . . . . 4-32
initialization for . . . . . . . . . . . . . . . . . . . . . .8-11
rights, checking . . . . . . . . . . . . . . . . . . . . . 4-27
mixing 16-bit and 32-bit code modules . . . .17-2
Real-address mode
mode switching . . . . . . . . . . . . . . . . . . . . . .8-13
8086 emulation . . . . . . . . . . . . . . . . . . . . . 16-1
PE flag, CR0 register . . . . . . . . . . . . . . . . . .4-2
address translation in . . . . . . . . . . . . . . . . 16-3
switching to . . . . . . . . . . . . . . . . . . . . . 4-2, 8-14
system data structures required during description of. . . . . . . . . . . . . . . . . . . . . . . 16-1
initialization . . . . . . . . . . . . . . . . . 8-11, 8-12 exceptions and interrupts . . . . . . . . . . . . . 16-8
Protection IDT initialization. . . . . . . . . . . . . . . . . . . . . 8-10
combining segment and page-level IDT, changing base and limit of. . . . . . . . . 16-6
protection. . . . . . . . . . . . . . . . . . . . . . . .4-33 IDT, structure of . . . . . . . . . . . . . . . . . . . . 16-7
disabling . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2 IDT, use of. . . . . . . . . . . . . . . . . . . . . . . . . 16-6
enabling . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2 initialization . . . . . . . . . . . . . . . . . . . . . . . . 8-10
flags used for page-level protection . . . . . . .4-2 instructions supported . . . . . . . . . . . . . . . . 16-4
flags used for segment-level protection . . . .4-2 interrupt and exception handling . . . . . . . . 16-6
of exception- and interrupt-handler procedures mode switching . . . . . . . . . . . . . . . . . . . . . 8-13
5-17 native 16-bit mode. . . . . . . . . . . . . . . . . . . 17-1
overview of . . . . . . . . . . . . . . . . . . . . . . . . . .4-1 overview of . . . . . . . . . . . . . . . . . . . . . . . . 16-1
page level . . . . . . . . . . . . . . . . . . . . . . 4-2, 4-32 registers supported . . . . . . . . . . . . . . . . . . 16-4
page level, overriding . . . . . . . . . . . . . . . . .4-32 switching to . . . . . . . . . . . . . . . . . . . . . . . . 8-15
page level, overview . . . . . . . . . . . . . . . . . .4-30 Related literature . . . . . . . . . . . . . . . . . . . . . . . 1-9
page-level protection flags . . . . . . . . . . . . .4-31 Requested privilege level (see RPL)
read/write, page level . . . . . . . . . . . . . . . . .4-32 Reserved bits . . . . . . . . . . . . . . . . . . . . . .1-6, 18-1
RESET# pin . . . . . . . . . . . . . . . . . . . . . .5-2, 18-19
segment level . . . . . . . . . . . . . . . . . . . . . . . .4-2
RESET# signal . . . . . . . . . . . . . . . . . . . . . . . . 2-22
user/supervisor type . . . . . . . . . . . . . . . . . .4-31
Reset, hardware
Protection rings . . . . . . . . . . . . . . . . . . . . . . . . .4-9
receiving when processor is shutdown . . . 5-33
PS (page size) flag, page-table entry. . . . . . . .3-27
Restarting program or task, following an exception
PSE (page size extension) flag, CR4 control
or interrupt . . . . . . . . . . . . . . . . . . . . 5-7
register . . . 2-17, 3-19, 3-21, 3-22, 9-17,
18-22, 18-23 Restricting addressable domain . . . . . . . . . . . 4-31
Pseudo-infinity . . . . . . . . . . . . . . . . . . . . . . . .18-10 RET instruction . . . . . . . . . . 4-12, 4-13, 4-23, 17-7
Returning
Pseudo-NaN. . . . . . . . . . . . . . . . . . . . . . . . . .18-10
from a called procedure . . . . . . . . . . . . . . 4-23
Pseudo-zero. . . . . . . . . . . . . . . . . . . . . . . . . .18-10
from an interrupt or exception handler . . . 5-15
PUSH instruction . . . . . . . . . . . . . . . . . . . . . . .18-7
RF (resume) flag, EFLAGS register . 2-9, 5-9, 15-2
PUSHF instruction . . . . . . . . . . . . . . . . . . 5-9, 18-7
Rounding
PVI (protected-mode virtual interrupts) flag, CR4
control, RC field of FPU control word . . . . 11-3
control register . . . . . . . . . . . 2-17 , 18-22
PWT (page-level write-through) flag modes, FPU . . . . . . . . . . . . . . . . . . .11-3, 11-4
VIF (virtual interrupt) flag, EFLAGS register . . . 2-10
VIP (virtual interrupt pending) flag, EFLAGS register . . . 2-10, 18-6
Virtual memory . . . 2-5, 3-1
Virtual-8086 mode
    8086 emulation . . . 16-1
    description of . . . 16-9
    emulating 8086 operating system calls . . . 16-25
    enabling . . . 16-9
    entering . . . 16-11
    exception and interrupt handling, overview . . . 16-15
    exceptions and interrupts, handling through a task gate . . . 16-19
    exceptions and interrupts, handling through a trap or interrupt gate . . . 16-17
    handling exceptions and interrupts through a task gate . . . 16-20
    IOPL sensitive instructions . . . 16-14
    I/O-port-mapped I/O . . . 16-15
    leaving . . . 16-13
    memory mapped I/O . . . 16-15
    native 16-bit mode . . . 17-1
    overview of . . . 16-1
    paging of virtual-8086 tasks . . . 16-10
    protection within a virtual-8086 task . . . 16-11
    special I/O buffers . . . 16-15
    structure of a virtual-8086 task . . . 16-9
    virtual I/O . . . 16-14
Virtual-8086 tasks
    paging of . . . 16-10
    protection within . . . 16-11
    structure of . . . 16-9
VM (virtual-8086 mode) flag, EFLAGS register . . . 2-9
VME (virtual-8086 mode extensions) flag, CR4 control register . . . 2-17, 18-22
W
WAIT instruction . . . 5-30
WAIT/FWAIT instructions . . . 18-8, 18-18, 18-19
WB (write back) memory type . . . 9-6, 9-8
WBINVD instruction . . . 2-21, 4-25, 7-12, 9-15, 18-5
WC (write combining)
    flag, MTRRcap register . . . 9-21
    memory type . . . 9-6, 9-8
WP (write protected) memory type . . . 9-7
WP (write protect) flag, CR0 control register . . . 2-14, 4-32, 18-22
Write
    forwarding . . . 7-8
    hit . . . 9-5
Write back (WB) memory type . . . 7-10
Write buffer
    description of . . . 9-4
    in Intel Architecture processors . . . 18-36
    operation of . . . 9-17
Write-back caching . . . 9-5
WRMSR instruction . . . 2-22, 2-23, 4-25, 7-12, 15-11, 15-15, 15-16, 15-18, 15-20, 18-4, 18-38
WT (write through) memory type . . . 9-6, 9-8
X
XADD instruction . . . 7-4, 18-5
XCHG instruction . . . 7-3, 7-4, 7-10
XOR instruction . . . 7-4
Z
ZF flag, EFLAGS register . . . 4-27