Intel Architecture Basics
Software Developer’s
Manual
Volume 1:
Basic Architecture
1997
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel
or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s Terms and
Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied
warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular
purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are
not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or
“undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or
incompatibilities arising from future changes to them.
Intel’s Intel Architecture processors (e.g., Pentium and Pentium Pro processors) may contain design defects or errors
known as errata. Current characterized errata are available on request.
Third-party brands and names are the property of their respective owners.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your
product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature,
may be obtained from:
Intel Corporation
P.O. Box 7641
Mt. Prospect IL 60056-7641
or call 1-800-879-4683
or visit Intel’s website at http://www.intel.com
CHAPTER 2
INTRODUCTION TO THE INTEL ARCHITECTURE
2.1. BRIEF HISTORY OF THE INTEL ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . .2-1
2.2. INCREASING INTEL ARCHITECTURE PERFORMANCE AND MOORE’S LAW . . 2-4
2.3. BRIEF HISTORY OF THE INTEL ARCHITECTURE FLOATING-POINT UNIT. . . . . 2-5
2.4. INTRODUCTION TO THE PENTIUM® PRO PROCESSOR’S ADVANCED MICROARCHITECTURE . . . . . . . . . . . .2-5
2.5. DETAILED DESCRIPTION OF THE PENTIUM® PRO PROCESSOR MICROARCHITECTURE . . . . . . . . . . . .2-8
2.5.1. Memory Subsystem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-8
2.5.2. The Fetch/Decode Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-10
2.5.3. Instruction Pool (Reorder Buffer). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-10
2.5.4. Dispatch/Execute Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
2.5.5. Retirement Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-12
CHAPTER 3
BASIC EXECUTION ENVIRONMENT
3.1. MODES OF OPERATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-1
3.2. OVERVIEW OF THE BASIC EXECUTION ENVIRONMENT . . . . . . . . . . . . . . . . . . 3-2
3.3. MEMORY ORGANIZATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-2
3.4. MODES OF OPERATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-4
3.5. 32-BIT VS. 16-BIT ADDRESS AND OPERAND SIZES. . . . . . . . . . . . . . . . . . . . . . .3-5
3.6. REGISTERS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-5
3.6.1. General-Purpose Data Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-5
3.6.2. Segment Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-7
3.6.3. EFLAGS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-10
3.6.3.1. Status Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
3.6.3.2. DF Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-12
3.6.4. System Flags and IOPL Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-13
3.7. INSTRUCTION POINTER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
3.8. OPERAND-SIZE AND ADDRESS-SIZE ATTRIBUTES. . . . . . . . . . . . . . . . . . . . . 3-14
TABLE OF CONTENTS
PAGE
CHAPTER 4
PROCEDURE CALLS, INTERRUPTS, AND EXCEPTIONS
4.1. PROCEDURE CALL TYPES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
4.2. STACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
4.2.1. Setting Up a Stack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2
4.2.2. Stack Alignment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3
4.2.3. Address-Size Attributes for Stack Accesses . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3
4.2.4. Procedure Linking Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3
4.2.4.1. Stack-Frame Base Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-4
4.2.4.2. Return Instruction Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-4
4.3. CALLING PROCEDURES USING CALL AND RET . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.3.1. Near CALL and RET Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-5
4.3.2. Far CALL and RET Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-6
4.3.3. Parameter Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-6
4.3.3.1. Passing Parameters Through the General-Purpose Registers . . . . . . . . . . . .4-6
4.3.3.2. Passing Parameters on the Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-6
4.3.3.3. Passing Parameters in an Argument List . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7
4.3.4. Saving Procedure State Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7
4.3.5. Calls to Other Privilege Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7
4.3.6. CALL and RET Operation Between Privilege Levels . . . . . . . . . . . . . . . . . . . . . . 4-9
4.4. INTERRUPTS AND EXCEPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
4.4.1. Call and Return Operation for Interrupt or Exception Handling Procedures . . . .4-11
4.4.2. Calls to Interrupt or Exception Handler Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . .4-14
4.4.3. Interrupt and Exception Handling in Real-Address Mode . . . . . . . . . . . . . . . . . .4-15
4.4.4. INT n, INTO, INT 3, and BOUND Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . .4-15
4.5. PROCEDURE CALLS FOR BLOCK-STRUCTURED LANGUAGES. . . . . . . . . . . 4-16
4.5.1. ENTER Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-16
4.5.2. LEAVE Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-21
CHAPTER 5
DATA TYPES AND ADDRESSING MODES
5.1. FUNDAMENTAL DATA TYPES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1
5.1.1. Alignment of Words, Doublewords, and Quadwords. . . . . . . . . . . . . . . . . . . . . . .5-1
5.2. NUMERIC, POINTER, BIT FIELD, AND STRING DATA TYPES . . . . . . . . . . . . . . . 5-2
5.2.1. Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2
5.2.2. Unsigned Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4
5.2.3. BCD Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4
5.2.4. Pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
5.2.5. Bit Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4
5.2.6. Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4
5.2.7. Floating-Point Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5
5.2.8. MMX™ Technology Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5
5.3. OPERAND ADDRESSING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5
5.3.1. Immediate Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5
5.3.2. Register Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-5
5.3.3. Memory Operands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-6
5.3.3.1. Specifying a Segment Selector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-6
5.3.3.2. Specifying an Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-7
5.3.3.3. Assembler and Compiler Addressing Modes . . . . . . . . . . . . . . . . . . . . . . . . . .5-9
5.3.4. I/O Port Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-10
CHAPTER 6
INSTRUCTION SET SUMMARY
6.1. NEW INTEL ARCHITECTURE INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.1.1. New Instructions Introduced with the MMX™ Technology . . . . . . . . . . . . . . . . . 6-1
6.1.2. New Instructions in the Pentium® Pro Processor . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.1.3. New Instructions in the Pentium® Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.1.4. New Instructions in the Intel486™ Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.2. INSTRUCTION SET LIST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.2.1. Integer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
6.2.1.1. Data Transfer Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
6.2.1.2. Binary Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
6.2.1.3. Decimal Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
6.2.1.4. Logic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.2.1.5. Shift and Rotate Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.2.1.6. Bit and Byte Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.2.1.7. Control Transfer Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.2.1.8. String Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
6.2.1.9. Flag Control Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
6.2.1.10. Segment Register Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
6.2.1.11. Miscellaneous Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.2.2. MMX™ Technology Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.2.2.1. MMX™ Data Transfer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.2.2.2. MMX™ Conversion Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6.2.2.3. MMX™ Packed Arithmetic Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6.2.2.4. MMX™ Comparison Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6.2.2.5. MMX™ Logic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6.2.2.6. MMX™ Shift and Rotate Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
6.2.2.7. MMX™ State Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
6.2.3. Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
6.2.3.1. Data Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
6.2.3.2. Basic Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
6.2.3.3. Comparison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13
6.2.3.4. Transcendental . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13
6.2.3.5. Load Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13
6.2.3.6. FPU Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
6.2.4. System Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
6.3. DATA MOVEMENT INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
6.3.1. General-Purpose Data Movement Instructions . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
6.3.1.1. Move Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
6.3.1.2. Conditional Move Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
6.3.1.3. Exchange Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-17
6.3.2. Stack Manipulation Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19
6.3.2.1. Type Conversion Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
6.3.2.2. Simple Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
6.3.2.3. Move and Convert. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
6.4. BINARY ARITHMETIC INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
6.4.1. Addition and Subtraction Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
6.4.2. Increment and Decrement Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
6.4.3. Comparison and Sign Change Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
6.4.4. Multiplication and Divide Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
6.5. DECIMAL ARITHMETIC INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
6.5.1. Packed BCD Adjustment Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24
6.5.2. Unpacked BCD Adjustment Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-24
6.6. LOGICAL INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-25
6.7. SHIFT AND ROTATE INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-25
6.7.1. Shift Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-25
6.7.2. Double-Shift Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-27
6.7.3. Rotate Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-27
6.8. BIT AND BYTE INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-29
6.8.1. Bit Test and Modify Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-29
6.8.2. Bit Scan Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-29
6.8.3. Byte Set On Condition Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-29
6.8.4. Test Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-30
6.9. CONTROL TRANSFER INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-30
6.9.1. Unconditional Transfer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-30
6.9.1.1. Jump Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-30
6.9.1.2. Call and Return Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-31
6.9.1.3. Return From Interrupt Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-31
6.9.2. Conditional Transfer Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-31
6.9.2.1. Conditional Jump Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-32
6.9.2.2. Loop Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-33
6.9.2.3. Jump If Zero Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-33
6.9.3. Software Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-34
6.10. STRING OPERATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-34
6.10.1. Repeating String Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-35
6.11. I/O INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-36
6.12. ENTER AND LEAVE INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-36
6.13. EFLAGS INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-37
6.13.1. Carry and Direction Flag Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-37
6.13.2. Interrupt Flag Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-37
6.13.3. EFLAGS Transfer Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-37
6.13.4. Interrupt Flag Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-38
6.14. SEGMENT REGISTER INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-38
6.14.1. Segment-Register Load and Store Instructions. . . . . . . . . . . . . . . . . . . . . . . . . .6-38
6.14.2. Far Control Transfer Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-39
6.14.3. Software Interrupt Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-39
6.14.4. Load Far Pointer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-39
6.15. MISCELLANEOUS INSTRUCTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-39
6.15.1. Address Computation Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-39
6.15.2. Table Lookup Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-40
6.15.3. Processor Identification Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-40
6.15.4. No-Operation and Undefined Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-40
CHAPTER 7
FLOATING-POINT UNIT
7.1. COMPATIBILITY AND EASE OF USE OF THE INTEL ARCHITECTURE FPU . . . . 7-1
7.2. REAL NUMBERS AND FLOATING-POINT FORMATS. . . . . . . . . . . . . . . . . . . . . . . 7-2
7.2.1. Real Number System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-3
7.2.2. Floating-Point Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-3
7.2.2.1. Normalized Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-4
7.2.2.2. Biased Exponent. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-5
7.2.3. Real Number and Non-number Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-5
7.2.3.1. Signed Zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-6
7.2.3.2. Normalized and Denormalized Finite Numbers . . . . . . . . . . . . . . . . . . . . . . . .7-6
7.2.3.3. Signed Infinities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.2.3.4. NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.2.4. Indefinite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.3. FPU ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.3.1. The FPU Data Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
7.3.1.1. Parameter Passing With the FPU Register Stack. . . . . . . . . . . . . . . . . . . . . 7-11
7.3.2. FPU Status Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
7.3.2.1. Top of Stack (TOP) Pointer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
7.3.2.2. Condition Code Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
7.3.2.3. Exception Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-14
7.3.2.4. Stack Fault Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15
7.3.3. Branching and Conditional Moves on FPU Condition Codes . . . . . . . . . . . . . . 7-15
7.3.4. FPU Control Word . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
7.3.4.1. Exception-Flag Masks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
7.3.4.2. Precision Control Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
7.3.4.3. Rounding Control Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
7.3.5. Infinity Control Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20
7.3.6. FPU Tag Word. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20
7.3.7. The FPU Instruction and Operand (Data) Pointers . . . . . . . . . . . . . . . . . . . . . . 7-21
7.3.8. Last Instruction Opcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21
7.3.9. Saving the FPU’s State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21
7.4. FLOATING-POINT DATA TYPES AND FORMATS . . . . . . . . . . . . . . . . . . . . . . . . 7-24
7.4.1. Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-25
7.4.2. Binary Integers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-27
7.4.3. Decimal Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-28
7.4.4. Unsupported Extended-Real Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-28
7.5. FPU INSTRUCTION SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-29
7.5.1. Escape (ESC) Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-30
7.5.2. FPU Instruction Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31
7.5.3. Data Transfer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31
7.5.4. Load Constant Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-33
7.5.5. Basic Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-33
7.5.6. Comparison and Classification Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-34
7.5.6.1. Branching on the FPU Condition Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-36
7.5.7. Trigonometric Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-37
7.5.8. Pi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-37
7.5.9. Logarithmic, Exponential, and Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-38
7.5.10. Transcendental Instruction Accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-39
7.5.11. FPU Control Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-39
7.5.12. Waiting Vs. Non-waiting Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-40
7.5.13. Unsupported FPU Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-41
7.6. OPERATING ON NANS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-41
7.6.1. Uses for Signaling NANs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-42
7.6.2. Uses for Quiet NANs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-42
7.7. FLOATING-POINT EXCEPTION HANDLING . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-42
7.7.1. Arithmetic vs. Non-arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-43
7.7.2. Automatic Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-43
7.7.3. Software Exception Handling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-45
7.7.3.1. Native Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-45
7.7.3.2. MS-DOS* Compatibility Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-45
7.7.3.3. Typical Floating-Point Exception Handler Actions . . . . . . . . . . . . . . . . . . . . 7-46
7.8. FLOATING-POINT EXCEPTION CONDITIONS . . . . . . . . . . . . . . . . . . . . . . . . . . 7-47
7.8.1. Invalid Operation Exception. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-47
7.8.1.1. Stack Overflow or Underflow Exception (#IS). . . . . . . . . . . . . . . . . . . . . . . . .7-48
7.8.1.2. Invalid Arithmetic Operand Exception (#IA) . . . . . . . . . . . . . . . . . . . . . . . . . .7-48
7.8.2. Divide-By-Zero Exception (#Z) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-49
7.8.3. Denormal Operand Exception (#D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-50
7.8.4. Numeric Overflow Exception (#O) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-50
7.8.5. Numeric Underflow Exception (#U) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-52
7.8.6. Inexact-Result (Precision) Exception (#P). . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-53
7.8.7. Exception Priority. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-53
7.9. FLOATING-POINT EXCEPTION SYNCHRONIZATION . . . . . . . . . . . . . . . . . . . . . 7-54
CHAPTER 8
PROGRAMMING WITH THE INTEL MMX™ TECHNOLOGY
8.1. OVERVIEW OF THE MMX™ TECHNOLOGY PROGRAMMING ENVIRONMENT . . . . . . . . . . . . . . . . . . . . .8-1
8.1.1. MMX™ Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
8.1.2. MMX™ Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
8.1.3. Single Instruction, Multiple Data (SIMD) Execution Model . . . . . . . . . . . . . . . . . .8-3
8.1.4. Memory Data Formats. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-4
8.1.5. Data Formats for MMX™ Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-4
8.2. MMX™ INSTRUCTION SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.2.1. Saturation Arithmetic and Wraparound Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-5
8.2.2. Instruction Operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-6
8.3. OVERVIEW OF THE MMX™ INSTRUCTION SET . . . . . . . . . . . . . . . . . . . . . . . . . .8-6
8.3.1. Data Transfer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-6
8.3.2. Arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8
8.3.2.1. Packed Addition And Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8
8.3.2.2. Packed Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8
8.3.2.3. Packed Multiply Add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8
8.3.3. Comparison Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8
8.3.4. Conversion Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-9
8.3.5. Logical Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-9
8.3.6. Shift Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-9
8.3.7. EMMS (Empty MMX™ State) Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-9
8.4. COMPATIBILITY WITH FPU ARCHITECTURE . . . . . . . . . . . . . . . . . . . . . . . . . . .8-10
8.4.1. MMX™ Instructions and the Floating-Point Tag Word . . . . . . . . . . . . . . . . . . . .8-10
8.4.2. Effect of Instruction Prefixes on MMX™ Instructions . . . . . . . . . . . . . . . . . . . . .8-10
8.5. WRITING APPLICATIONS WITH MMX™ CODE . . . . . . . . . . . . . . . . . . . . . . . . . .8-10
8.5.1. Detecting Support for MMX™ Technology Using the CPUID Instruction . . . . . . 8-11
8.5.2. Using the EMMS Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-11
8.5.3. Interfacing with MMX™ Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-12
8.5.4. Writing Code with MMX™ and Floating-Point Instructions . . . . . . . . . . . . . . . . .8-13
8.5.4.1. RECOMMENDATIONS AND GUIDELINES . . . . . . . . . . . . . . . . . . . . . . . . . .8-13
8.5.5. Using MMX™ Code in a Multitasking Operating System Environment . . . . . . . .8-14
8.5.5.1. COOPERATIVE MULTITASKING OPERATING SYSTEM . . . . . . . . . . . . . . 8-14
8.5.5.2. PREEMPTIVE MULTITASKING OPERATING SYSTEM . . . . . . . . . . . . . . . . 8-14
8.5.6. Exception Handling in MMX™ Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-15
8.5.7. Register Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-15
CHAPTER 9
INPUT/OUTPUT
9.1. I/O PORT ADDRESSING. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-1
9.2. I/O PORT HARDWARE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9.3. I/O ADDRESS SPACE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.3.1. Memory-Mapped I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.4. I/O INSTRUCTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
9.5. PROTECTED-MODE I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4
9.5.1. I/O Privilege Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4
9.5.2. I/O Permission Bit Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.6. ORDERING I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
CHAPTER 10
PROCESSOR IDENTIFICATION AND FEATURE DETERMINATION
10.1. PROCESSOR IDENTIFICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1
10.2. IDENTIFICATION OF EARLIER INTEL ARCHITECTURE PROCESSORS . . . . . 10-3
APPENDIX A
EFLAGS CROSS-REFERENCE
APPENDIX B
EFLAGS CONDITION CODES
APPENDIX C
FLOATING-POINT EXCEPTIONS SUMMARY
APPENDIX D
GUIDELINES FOR WRITING FPU EXCEPTION HANDLERS
D.1. ORIGIN OF THE MS-DOS* COMPATIBILITY MODE FOR
HANDLING FPU EXCEPTIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-2
D.2. IMPLEMENTATION OF THE MS-DOS* COMPATIBILITY MODE
IN THE INTEL486™, PENTIUM®, AND PENTIUM PRO PROCESSORS . . . . . . . D-3
D.2.1. MS-DOS* Compatibility Mode in the Intel486™ and Pentium® Processors . . . . D-3
D.2.1.1. Basic Rules: When FERR# Is Generated . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-4
D.2.1.2. Recommended External Hardware to Support the
MS-DOS* Compatibility Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-5
D.2.1.3. No-Wait FPU Instructions Can Get FPU Interrupt in Window . . . . . . . . . . . . . D-7
D.2.2. MS-DOS* Compatibility Mode in the Pentium® Pro Processor . . . . . . . . . . . . . . D-9
D.3. RECOMMENDED PROTOCOL FOR MS-DOS* COMPATIBILITY HANDLERS . . D-10
D.3.1. Floating-Point Exceptions and Their Defaults . . . . . . . . . . . . . . . . . . . . . . . . . . D-10
D.3.2. Two Options for Handling Numeric Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . D-11
D.3.2.1. Automatic Exception Handling: Using Masked Exceptions. . . . . . . . . . . . . . D-11
D.3.2.2. Software Exception Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-13
D.3.3. Synchronization Required for Use of FPU Exception Handlers. . . . . . . . . . . . . D-14
D.3.3.1. Exception Synchronization: What, Why and When. . . . . . . . . . . . . . . . . . . . D-14
D.3.3.2. Exception Synchronization Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-15
D.3.3.3. Proper Exception Synchronization in General . . . . . . . . . . . . . . . . . . . . . . . D-16
D.3.4. FPU Exception Handling Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-16
D.3.5. Need for Storing State of IGNNE# Circuit If Using FPU and SMM . . . . . . . . . . D-20
D.3.6. Considerations When FPU Shared Between Tasks . . . . . . . . . . . . . . . . . . . . . D-21
D.3.6.1. Speculatively Deferring FPU Saves, General Overview . . . . . . . . . . . . . . . . D-22
D.3.6.2. Tracking FPU Ownership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-22
D.3.6.3. Interaction of FPU State Saves and Floating Point Exception Association. . D-23
D.3.6.4. Interrupt Routing From the Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-26
D.4. DIFFERENCES FOR HANDLERS USING NATIVE MODE. . . . . . . . . . . . . . . . . . D-26
D.4.1. Origin With the Intel 286 and Intel 287, and Intel386™ and
Intel 387 Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-27
D.4.2. Changes with Intel486™, Pentium® and Pentium Pro Processors
with CR0.NE=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-27
D.4.3. Considerations When FPU Shared Between Tasks Using Native Mode . . . . . D-28
TABLE OF FIGURES
PAGE
Figure 7-13. Protected Mode FPU State Image in Memory, 32-Bit Format . . . . . . . . . . . . 7-22
Figure 7-14. Real Mode FPU State Image in Memory, 32-Bit Format . . . . . . . . . . . . . . . . 7-23
Figure 7-15. Protected Mode FPU State Image in Memory, 16-Bit Format . . . . . . . . . . . . 7-23
Figure 7-16. Real Mode FPU State Image in Memory, 16-Bit Format . . . . . . . . . . . . . . . . 7-23
Figure 7-17. Floating-Point Unit Data Type Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-24
Figure 8-1. MMX™ Register Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
Figure 8-2. MMX™ Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-3
Figure 8-3. Eight Packed Bytes in Memory (at address 1000H) . . . . . . . . . . . . . . . . . . . . .8-4
Figure 9-1. Memory-Mapped I/O. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-3
Figure 9-2. I/O Permission Bit Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-5
Figure D-1. Recommended Circuit for MS-DOS* Compatibility FPU
Exception Handling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-6
Figure D-2. Behavior of Signals During FPU Exception Handling . . . . . . . . . . . . . . . . . . . D-7
Figure D-3. Timing of Receipt of External Interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-8
Figure D-4. Arithmetic Example Using Infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-12
Figure D-5. General Program Flow for DNA Exception Handler . . . . . . . . . . . . . . . . . . . D-25
Figure D-6. Program Flow for a Numeric Exception Dispatch Routine . . . . . . . . . . . . . . D-25
TABLE OF TABLES
PAGE
Table 2-1. Processor Performance Over Time and Other Key Features of the
Intel Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4
Table 3-1. Effective Operand- and Address-Size Attributes . . . . . . . . . . . . . . . . . . . . . .3-15
Table 4-1. Exceptions and Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-12
Table 5-1. Default Segment Selection Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-7
Table 6-1. Move Instruction Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-17
Table 6-2. Conditional Move Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-18
Table 6-3. Bit Test and Modify Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-29
Table 6-4. Conditional Jump Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-32
Table 6-5. Information Provided by the CPUID Instruction . . . . . . . . . . . . . . . . . . . . . . .6-40
Table 7-1. Real Number Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-5
Table 7-2. Denormalization Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-7
Table 7-3. FPU Condition Code Interpretation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-14
Table 7-4. Precision Control Field (PC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-17
Table 7-5. Rounding Control Field (RC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-18
Table 7-6. Rounding of Positive Numbers With Masked Overflow . . . . . . . . . . . . . . . . .7-19
Table 7-7. Rounding of Negative Numbers With Masked Overflow. . . . . . . . . . . . . . . . .7-19
Table 7-8. Length, Precision, and Range of FPU Data Types. . . . . . . . . . . . . . . . . . . . . 7-25
Table 7-9. Real Number and NaN Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-26
Table 7-10. Binary Integer Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-27
Table 7-11. Packed Decimal Integer Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-29
Table 7-12. Unsupported Extended-Real Encodings. . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-30
Table 7-13. Data Transfer Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-31
Table 7-14. Floating-Point Conditional Move Instructions . . . . . . . . . . . . . . . . . . . . . . . . .7-32
Table 7-15. Setting of FPU Condition Code Flags for Real Number Comparisons . . . . . . 7-35
Table 7-16. Setting of EFLAGS Status Flags for Real Number Comparisons. . . . . . . . . . 7-35
Table 7-17. TEST Instruction Constants for Conditional Branching . . . . . . . . . . . . . . . . .7-36
Table 7-18. Rules for Generating QNaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-41
Table 7-19. Arithmetic and Non-arithmetic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . .7-44
Table 7-20. Invalid Arithmetic Operations and the Masked Responses to Them . . . . . . .7-49
Table 7-21. Divide-By-Zero Conditions and the Masked Responses to Them . . . . . . . . .7-50
Table 7-22. Masked Responses to Numeric Overflow. . . . . . . . . . . . . . . . . . . . . . . . . . . .7-51
Table 8-1. Data Range Limits for Saturation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-5
Table 8-2. MMX™ Instruction Set Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-7
Table 8-3. Effect of Prefixes on MMX™ Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-10
Table 9-1. I/O Instruction Serialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-7
Table A-1. EFLAGS Cross-Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
Table B-1. EFLAGS Condition Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Table C-1. Floating-Point Exceptions Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-1
CHAPTER 1
ABOUT THIS MANUAL
The Intel Architecture Software Developer’s Manual, Volume 1: Basic Architecture (Order
Number 243190) is part of a three-volume set that describes the architecture and programming
environment of all Intel Architecture processors. The other two volumes in this set are:
• The Intel Architecture Software Developer’s Manual, Volume 2: Instruction Set Reference
(Order Number 243191).
• The Intel Architecture Software Developer’s Manual, Volume 3: System Programming
Guide (Order Number 243192).
The Intel Architecture Software Developer’s Manual, Volume 1, describes the basic architecture
and programming environment of an Intel Architecture processor; the Intel Architecture Soft-
ware Developer’s Manual, Volume 2, describes the instruction set of the processor and the
opcode structure. These two volumes are aimed at application programmers who are writing
programs to run under existing operating systems or executives. The Intel Architecture Software
Developer’s Manual, Volume 3 describes the operating-system support environment of an Intel
Architecture processor, including memory management, protection, task management, interrupt
and exception handling, and system management mode. It also provides Intel Architecture
processor compatibility information. This volume is aimed at operating-system and BIOS
designers and programmers.
Chapter 5 — Data Types and Addressing Modes. Describes the data types and addressing
modes recognized by the processor.
Chapter 6 — Instruction Set Summary. Gives an overview of all the Intel Architecture
instructions except those executed by the processor’s floating-point unit. The instructions are
presented in functionally related groups.
Chapter 7 — Floating-Point Unit. Describes the Intel Architecture floating-point unit,
including the floating-point registers and data types; gives an overview of the floating-point
instruction set; and describes the processor's floating-point exception conditions.
Chapter 8 — Programming with Intel MMX™ Technology. Describes the Intel MMX™
technology, including MMX registers and data types, and gives an overview of the MMX
instruction set.
Chapter 9 — Input/Output. Describes the processor’s I/O architecture, including I/O port
addressing, the I/O instructions, and the I/O protection mechanism.
Chapter 10 — Processor Identification and Feature Determination. Describes how to deter-
mine the CPU type and the features that are available in the processor.
Appendix A — EFLAGS Cross-Reference. Summarizes how the Intel Architecture instructions
affect the flags in the EFLAGS register.
Appendix B — EFLAGS Condition Codes. Summarizes how the conditional jump, move, and
byte set on condition code instructions use the condition code flags (OF, CF, ZF, SF, and PF) in
the EFLAGS register.
Appendix C — Floating-Point Exceptions Summary. Summarizes the exceptions that can be
raised by floating-point instructions.
Appendix D — Guidelines for Writing FPU Exception Handlers. Describes how to design
and write MS-DOS* compatible exception handling facilities for FPU exceptions, including
both software and hardware requirements and assembly-language code examples. This appendix
also describes general techniques for writing robust FPU exception handlers.
Chapter 3 — Instruction Set Reference. Describes each of the Intel Architecture instructions
in detail, including an algorithmic description of operations, the effect on flags, the effect of
operand- and address-size attributes, and the exceptions that may be generated. The instructions
are arranged in alphabetical order. The FPU and MMX instructions are included in this chapter.
Appendix A — Opcode Map. Gives an opcode map for the Intel Architecture instruction set.
Appendix B — Instruction Formats and Encodings. Gives the binary encoding of each form
of each Intel Architecture instruction.
how to set up an Intel Architecture processor for real-address mode operation and protected
mode operation, and how to switch between modes.
Chapter 9 — Memory Cache Control. Describes the general concept of caching and the
caching mechanisms supported by the Intel Architecture. This chapter also describes the
memory type range registers (MTRRs) and how they can be used to map memory types of phys-
ical memory. MTRRs were introduced into the Intel Architecture with the Pentium® Pro
processor.
Chapter 10 — MMX™ Technology System Programming Model. Describes those aspects
of the Intel MMX technology that must be handled and considered at the system programming
level, including task switching, exception handling, and compatibility with existing system envi-
ronments.
Chapter 11 — System Management Mode (SMM). Describes the Intel Architecture’s system
management mode (SMM), which can be used to implement power management functions.
Chapter 12 — Machine Check Architecture. Describes the machine check architecture,
which was introduced into the Intel Architecture with the Pentium processor.
Chapter 13 — Code Optimization. Discusses general optimization techniques for program-
ming an Intel Architecture processor.
Chapter 14 — Debugging and Performance Monitoring. Describes the debugging registers
and other debug mechanisms provided in the Intel Architecture. This chapter also describes the
time-stamp counter and the performance monitoring counters.
Chapter 15 — 8086 Emulation. Describes the real-address and virtual-8086 modes of the Intel
Architecture.
Chapter 16 — Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and 32-bit code
modules within the same program or task.
Chapter 17 — Intel Architecture Compatibility. Describes the programming differences
between the Intel 286, Intel386™, Intel486™, Pentium, and Pentium Pro processors. The differ-
ences among the 32-bit Intel Architecture processors (the Intel386, Intel486, Pentium, and
Pentium Pro processors) are described throughout the three volumes of the Intel Architecture
Software Developer’s Manual, as relevant to particular features of the architecture. This chapter
provides a collection of all the relevant compatibility information for all Intel Architecture
processors and also describes the basic differences with respect to the 16-bit Intel Architecture
processors (the Intel 8086 and Intel 286 processors).
Appendix A — Performance-Monitoring Counters. Lists the events that can be counted with
the performance-monitoring counters and the codes used to select these events.
Appendix B — Model Specific Registers (MSRs). Lists the MSRs available in the Pentium Pro
processor and their functions.
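Data Structure
[Figure: data structure diagram. Each row is four bytes wide, with bit offsets 31 through 0
running from left to right across Byte 3, Byte 2, Byte 1, and Byte 0. Byte offsets rise from 0
at the lowest address (bottom row) through 4, 8, 12, 16, 20, 24, and 28 at the highest address
(top row).]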
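The bit and byte numbering in this figure can be tried out with a short Python sketch. It
assumes the little-endian byte order used by Intel Architecture processors, in which Byte 0
(bits 7 through 0) is stored at the lowest address. The value 12345678H and the function name
are illustrative only.

```python
import struct

def bytes_in_memory(dword):
    """Return the four bytes of a 32-bit value in ascending address order."""
    # "<I" packs an unsigned 32-bit integer in little-endian order,
    # so index 0 is the byte stored at the lowest address (Byte 0).
    return list(struct.pack("<I", dword))

layout = bytes_in_memory(0x12345678)
for offset, value in enumerate(layout):
    # Byte n of the doubleword holds bits 8n+7 through 8n.
    print(f"byte offset {offset}: {value:02X}H (bits {8 * offset + 7}..{8 * offset})")
```

Byte offset 0 holds 78H, the least significant byte of the value, and byte offset 3 holds 12H.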
NOTE
Avoid any software dependence upon the state of reserved bits in Intel Archi-
tecture registers. Depending upon the values of reserved register bits will
make software dependent upon the unspecified manner in which the
processor handles these bits. Programs that depend upon reserved values risk
incompatibility with future processors.
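One common way to follow this advice is to mask out reserved bits before testing a register
value and to write reserved bits back exactly as they were read. The Python sketch below
models that idea on a plain integer. The register layout (only bits 0 through 7 defined) is
hypothetical and chosen purely for illustration; the reserved bits of a real register must be
taken from that register's description.

```python
# Hypothetical 32-bit register: bits 0-7 are defined, bits 8-31 are reserved.
# This layout is an assumption for the example, not a real register.
DEFINED_MASK = 0x000000FF
RESERVED_MASK = ~DEFINED_MASK & 0xFFFFFFFF

def read_defined(reg_value):
    """Mask out reserved bits before testing or comparing a register value."""
    return reg_value & DEFINED_MASK

def update_defined(reg_value, new_bits):
    """Replace the defined bits only; reserved bits keep the values just read."""
    return (reg_value & RESERVED_MASK) | (new_bits & DEFINED_MASK)

old = 0xABCD0012                 # reserved bits happen to read as ABCD00H
new = update_defined(old, 0x34)
print(f"{read_defined(old):02X} {new:08X}")
```

Because the program never tests or alters the reserved field, its behavior does not change if
a later processor assigns meaning to those bits.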
where:
• A label is an identifier which is followed by a colon.
• A mnemonic is a reserved name for a class of instruction opcodes which have the same
function.
• The operands argument1, argument2, and argument3 are optional. There may be from
zero to three operands, depending on the opcode. When present, they take the form of
either literals or identifiers for data items. Operand identifiers are either reserved names of
registers or are assumed to be assigned to data items declared in another part of the
program (which may not be shown in the example).
When two operands are present in an arithmetic or logical instruction, the right operand is the
source and the left operand is the destination.
For example:
LOADREG: MOV EAX, SUBTOTAL
In this example LOADREG is a label, MOV is the mnemonic identifier of an opcode, EAX is
the destination operand, and SUBTOTAL is the source operand. Some assembly languages put
the source and destination in reverse order.
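The statement layout described above can be illustrated by splitting an example line into its
parts. This Python sketch is only an illustration of the notation, not an assembler: the
function name is hypothetical, and it treats any colon as the end of a label, so it would
misread operands that themselves contain a colon.

```python
def parse_statement(line):
    """Split 'label: mnemonic dest, src' into (label, mnemonic, operands)."""
    label = None
    if ":" in line:
        # Simplification: everything before the first colon is the label.
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.strip().split(None, 1)
    mnemonic = parts[0]
    # Zero to three operands, separated by commas.
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return label, mnemonic, operands

label, mnemonic, operands = parse_statement("LOADREG: MOV EAX, SUBTOTAL")
destination, source = operands   # left operand is the destination, right is the source
print(label, mnemonic, destination, source)
```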
The following segment address identifies an instruction address in the code segment. The CS
register points to the code segment and the EIP register contains the address of the instruction.
CS:EIP
1.4.6. Exceptions
An exception is an event that typically occurs when an instruction causes an error. For example,
an attempt to divide by zero generates an exception. However, some exceptions, such as break-
points, occur under other conditions. Some types of exceptions may provide error codes. An
error code reports additional information about the error. An example of the notation used to
show an exception and error code is shown below.
#PF(fault code)
This example refers to a page-fault exception under conditions where an error code naming a
type of fault is reported. Under some conditions, exceptions which produce error codes may not
be able to report an accurate code. In this case, the error code is zero, as shown below for a
general-protection exception.
#GP(0)
See Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer’s
Manual, Volume 3, for a list of exception mnemonics and their descriptions.
CHAPTER 2
INTRODUCTION TO THE INTEL ARCHITECTURE
A strong case can be made that the exponential growth of both the power and breadth of usage
of the computer has made it the most important force that is reshaping human technology, busi-
ness, and society in the second half of the twentieth century. Further, the computer promises to
continue to dominate technological growth well into the twenty-first century, in part since other
powerful technological forces that are just emerging are strongly dependent on the growth of
computing power for their own existence and growth (such as the Internet, and genetics devel-
opments like recombinant DNA research and development). The Intel Architecture is clearly
today’s preferred computer architecture, as measured by number of computers in use and total
computing power available in the world. Thus it is hard to overestimate the importance of the
Intel Architecture.
ibility. A new virtual-8086 mode was provided to yield greater efficiency when executing
programs created for the 8086 and 8088 processors on the new 32-bit machine. The 32-bit
addressing was supported with an external 32-bit address bus, giving a 4-GByte address space,
and also allowed each segment to be as large as 4 GBytes. The original instructions were
enhanced with new 32-bit operand and addressing forms, and completely new instructions were
provided, including those for bit manipulation. The Intel386 processor also introduced paging
into the Intel Architecture, with the fixed 4-KByte page size providing a method for virtual
memory management that was significantly superior to using segments for the
purpose (it was much more efficient for operating systems, and completely transparent to the
applications without significant sacrifice of execution speed). Furthermore, the ability to define
segments as large as the 4 GBytes physical address space, together with paging, allowed the
creation of protected “flat model”1 addressing systems in the architecture, including complete
implementations of the widely used main-frame operating system UNIX.
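The arithmetic behind these figures is straightforward. The Python sketch below works out the
size of a 32-bit address space and splits a linear address into a page number and an offset
within a 4-KByte page. The function names and the sample address are illustrative assumptions.

```python
PAGE_SIZE = 4 * 1024                        # fixed 4-KByte page
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 12 bits select a byte within a page

def address_space_bytes(address_bits):
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits

def split_linear_address(linear):
    """Return (page number, offset within page) for a linear address."""
    return linear >> OFFSET_BITS, linear & (PAGE_SIZE - 1)

space = address_space_bytes(32)
print(space // 2**30, "GBytes,", space // PAGE_SIZE, "pages")
page, offset = split_linear_address(0x00403A10)
print(f"page {page:X}H, offset {offset:X}H")
```

A 32-bit address therefore divides into a 20-bit page number and a 12-bit offset.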
The Intel Architecture has been and is committed to the task of maintaining backward compat-
ibility at the object code level to preserve our customers’ very large investment in software, but
at the same time, in each generation of the architecture the latest, most effective microprocessor
architecture and silicon fabrication technologies have been used to produce the fastest, most
powerful processors possible. Intel has worked over the generations to adapt and incorporate
increasingly sophisticated techniques from main-frame architecture into microprocessor archi-
tecture. Various forms of parallel processing have been the most performance enhancing of these
techniques, and the Intel386 processor was the first Intel Architecture processor to include
multiple parallel stages, six in all. These are the Bus Interface Unit (accesses memory and I/O for
the other units), the Code Prefetch Unit (receives object code from the Bus Unit and puts it into
a 16 byte queue), the Instruction Decode Unit (decodes object code from the Prefetch unit into
microcode), the Execution Unit (executes the microcode instructions), the Segment Unit (trans-
lates logical addresses to linear addresses and does protection checks), and the Paging Unit
(translates linear addresses to physical addresses, does page based protection checks, and
contains a cache with information for up to 32 of the most recently accessed pages).
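The page cache in the Paging Unit can be pictured with a toy model. The Python sketch below
keeps up to 32 page translations and falls back to a page-table lookup on a miss. It is only a
sketch under stated assumptions: the class name, the dictionary standing in for the page
tables, and the least-recently-used replacement policy are all illustrative, and the Segment
Unit's logical-to-linear step is left out.

```python
from collections import OrderedDict

class ToyPageCache:
    """Caches page translations; LRU replacement is an assumption of this sketch."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()    # page number -> page frame number
        self.hits = 0
        self.misses = 0

    def translate(self, linear, page_table):
        page, offset = linear >> 12, linear & 0xFFF
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)            # mark as most recently used
        else:
            self.misses += 1
            self.entries[page] = page_table[page]     # stands in for a page-table walk
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)      # evict least recently used
        return (self.entries[page] << 12) | offset

page_table = {0x403: 0x1F2}             # hypothetical mapping
cache = ToyPageCache()
first = cache.translate(0x00403A10, page_table)
second = cache.translate(0x00403FFC, page_table)
print(f"{first:X}H {second:X}H hits={cache.hits} misses={cache.misses}")
```

The second access falls in the same page as the first, so it is served from the cache without
another table lookup.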
The Intel486 processor added more parallel execution capability by (basically) expanding the
Intel386 processor’s Instruction Decode and Execution Units into five pipelined stages, where
each stage (when needed) operates in parallel with the others on up to five instructions in
different stages of execution. Each stage can do its work on one instruction in one clock, and so
the Intel486 processor can execute as rapidly as one instruction per CPU clock. An 8-KByte on-chip
L1 cache was added to the Intel486 processor to greatly increase the percentage of instructions
that could execute at the scalar rate of one per clock: memory access instructions were now
included if the operand was in the L1 cache. The Intel486 processor also for the first time inte-
grated the Floating-Point math Unit onto the same chip as the CPU (see section 2.3 below) and
added new pins, bits and instructions to support more complex and powerful systems (L2 cache
support and multiprocessor support).
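The rate of one instruction per clock is a best case that holds once the pipeline is full. A
rough Python estimate, assuming an ideal five-stage pipeline with no stalls, cache misses, or
multi-clock instructions (assumptions of this sketch, not measured processor behavior), is:

```python
def pipelined_clocks(instructions, stages=5):
    """Clocks to complete a run of instructions in an ideal pipeline."""
    if instructions == 0:
        return 0
    # The first instruction needs one clock per stage; each later
    # instruction completes one clock after the one ahead of it.
    return stages + instructions - 1

def unpipelined_clocks(instructions, stages=5):
    """Clocks needed if each instruction passed through all stages alone."""
    return stages * instructions

n = 1000
print(pipelined_clocks(n), unpipelined_clocks(n))
print(round(n / pipelined_clocks(n), 3), "instructions per clock")
```

For long instruction sequences the pipeline fill time becomes negligible and the throughput
approaches one instruction per clock.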
Late in the Intel486 processor generation, Intel incorporated features designed to support energy
savings and other system management capabilities into the Intel Architecture mainstream with
the Intel486 SL Enhanced processors. These features were developed in the Intel386 SL and
Intel486 SL processors, which were specialized for the rapidly growing battery-operated notebook
PC market. The features include the new System Management Mode, triggered by its own
dedicated interrupt pin, which allows complex system management features (such as power
management of various subsystems within the PC) to be added to a system transparently to the
main operating system and all applications. The Stop Clock and Auto Halt Powerdown features
allow the CPU itself to execute at a reduced clock rate to save power, or to be shut down (with
state preserved) to save even more power.
1. Requires only one 32-bit address component to access anywhere in the address space.
The Intel Pentium processor added a second execution pipeline to achieve superscalar perfor-
mance (two pipelines, known as u and v, together can execute two instructions per clock). The
on-chip L1 cache has also been doubled, with 8 KBytes devoted to code, and another 8 KBytes
to data. The data cache uses the MESI protocol to support the more efficient write-back mode,
as well as the write-through mode that is used by the Intel486 processor. Branch prediction with
an on-chip branch table has been added to increase performance in looping constructs. Exten-
sions have been added to make the virtual-8086 mode more efficient, and to allow for 4-MByte
as well as 4-KByte pages. The main registers are still 32 bits, but internal data paths of 128 and
256 bits have been added to speed internal data transfers, and the burstable external data bus has
been increased to 64 bits. The Advanced Programmable Interrupt Controller (APIC) has been
added to support systems with multiple Pentium processors, and new pins and a special mode
(dual processing) have been designed in to support glueless two-processor systems.
The Intel Pentium Pro processor is the latest and most powerful member of the Intel Architec-
ture. It has three-way superscalar architecture, which means that it can execute three instructions
per CPU clock. It does this by incorporating even more parallelism than the Pentium processor.
The Pentium Pro processor provides Dynamic Execution (micro-data flow analysis, out-of-
order execution, superior branch prediction, and speculative execution) in a superscalar imple-
mentation. Three instruction decode units work in parallel to decode object code into smaller
operations called “micro-ops.” These go into an instruction pool, and (when interdependencies
don’t prevent) can be executed out of order by the five parallel execution units (two integer, two
FPU and one memory interface unit). The Retirement Unit retires completed micro-ops in their
original program order, taking account of any branches. The power of the Pentium Pro processor
is further enhanced by its caches: it has the same two on-chip 8-KByte L1 caches as does the
Pentium processor, and also has a 256-KByte L2 cache that’s in the same package as, and closely
coupled to, the CPU, using a dedicated 64-bit (“backside”) full clock speed bus. The L1 cache
is dual ported, the L2 cache supports up to 4 concurrent accesses, and the 64-bit external data
bus is transaction-oriented, meaning that each access is handled as a separate request and
response, with numerous requests allowed while awaiting a response. These parallel features for
data access work with the parallel execution capabilities to provide a “non-blocking” architec-
ture in which the processor is more fully utilized and performance is enhanced. The Pentium Pro
processor also has an expanded 36-bit address bus, giving a maximum physical address space of
64 GBytes.
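The combination of out-of-order execution and in-order retirement can be pictured with a small
model. The Python sketch below is a deliberate simplification: it assumes unlimited execution
units, hypothetical micro-op names and latencies, and no branches or speculation. Each
micro-op starts as soon as the results it depends on are available, while retirement follows
the original program order.

```python
def simulate(micro_ops):
    """micro_ops: list of (name, dependencies, latency) in program order.

    Returns (names in completion order, [(name, retire clock)] in program order).
    """
    finish = {}
    for name, deps, latency in micro_ops:
        # A micro-op may start once every micro-op it depends on has finished.
        ready = max((finish[d] for d in deps), default=0)
        finish[name] = ready + latency
    completed = sorted(finish, key=finish.get)    # ties keep program order
    retired, clock = [], 0
    for name, _, _ in micro_ops:
        # Retirement is in program order: a micro-op cannot retire
        # before the ones ahead of it have retired.
        clock = max(clock, finish[name])
        retired.append((name, clock))
    return completed, retired

ops = [
    ("load", [], 4),          # slow memory access (latency is made up)
    ("add", ["load"], 1),     # must wait for the load
    ("mul", [], 2),           # independent of the load
    ("sub", ["mul"], 1),
]
completed, retired = simulate(ops)
print(completed)
print(retired)
```

In this run the independent "mul" and "sub" micro-ops complete before the load, but they retire
only after "load" and "add" have retired.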
Since the Pentium Pro processor is currently the most advanced of the Intel Architecture family,
a more detailed description of its architecture is provided in Sections 2.4. and 2.5. More detailed
hardware and architectural information on each of the generations of the Intel Architecture
family is available in the separate data books for the processor generations (see Section 1.5.,
Related Literature).
NOTES:
1. Performance here is indicated by Dhrystone MIPs (Millions of Instructions per Second) because even
though MIPs are no longer considered a preferred measure of CPU performance, they are the only
benchmarks that span all six generations of the Intel Architecture. The MIPs and frequency values given
here correspond to the maximum CPU frequency available at product introduction.
2. Main CPU register size and external data bus size are given in bits. Note also that there are 8 and 16-bit
data registers in all of the CPUs, there are eight 80-bit registers in the FPUs integrated into the Intel386™
chip and beyond, and there are internal data paths that are 2 to 4 times wider than the external data bus
for each processor.
3. In addition to the large general purpose caches listed in the table for the Intel486™ processor (8 KBytes
of combined code and data) and the Intel Pentium® and Pentium Pro processors (8 KBytes each for sep-
arate code cache and data cache), there are smaller special purpose caches. The Intel 286 has 6 byte
descriptor caches for each segment register. The Intel386 has 8 byte descriptor caches for each segment
register, and also a 32 entry, 4 way set associative Translation Lookaside Buffer (cache) to store access
information for recently used pages on the chip. The Intel486 has the same caches described for the
Intel386, as well as its 8K L1 general purpose cache. The Intel Pentium and Pentium Pro processors have
their general purpose caches, descriptor caches, and two Translation Lookaside Buffers each (one for
each 8K L1 cache).
goals of the Intel chip architects was to exceed the performance of the Pentium processor
significantly while still using the same 0.6-micrometer, four-layer metal BiCMOS manufacturing
process. Using the same manufacturing process as the Pentium processor meant that perfor-
mance gains could only be achieved through substantial advances in the microarchitecture.
The resulting Pentium Pro processor microarchitecture is a three-way superscalar, pipelined
architecture. The term “three-way superscalar” means that using parallel processing techniques,
the processor is able on average to decode, dispatch, and complete execution of (retire) three
instructions per clock cycle. To handle this level of instruction throughput, the Pentium Pro
processor uses a decoupled, 12-stage superpipeline that supports out-of-order instruction execu-
tion. Figure 2-1 shows a conceptual view of this pipeline, with the pipeline divided into four
processing units (the fetch/decode unit, the dispatch/execute unit, the retire unit, and the instruc-
tion pool). Instructions and data are supplied to these units through the bus interface unit.
[Block diagram. Labeled blocks: System Bus, L2 Cache, Cache Bus, Fetch/Decode Unit, Dispatch/Execute Unit,
Retire Unit, Instruction Pool, Intel Architecture Registers.]
Figure 2-1. The Processing Units in the Pentium® Pro Processor Microarchitecture
and Their Interface with the Memory Subsystem
To ensure a steady supply of instructions and data to the instruction execution pipeline, the
Pentium Pro processor microarchitecture incorporates two cache levels. The L1 cache provides
an 8-KByte instruction cache and an 8-KByte data cache, both closely coupled to the pipeline.
The L2 cache is a 256-KByte static RAM that is coupled to the core processor through a
full-clock-speed, 64-bit cache bus.
[Block diagram. Labeled blocks: Cache Bus, Next IP, Instruction Fetch Unit, Instruction Cache (L1),
Branch Target Buffer, Memory Reorder Buffer, Instruction Decoder (two Simple Instruction Decoders and one
Complex Instruction Decoder), Microcode Instruction Sequencer, Register Alias Table, Reorder Buffer
(Instruction Pool), Reservation Station, Retirement Unit, Retirement Register File (Intel Arch. Registers),
Data Cache Unit (L1). Labeled paths: From Integer Unit, To Branch Target Buffer.]
Figure 2-2. Functional Block Diagram of the Pentium® Pro Processor Microarchitecture
Memory requests to the L2 cache or system memory go through the memory reorder buffer,
which functions as a scheduling and dispatch station. This unit keeps track of all memory
requests and is able to reorder some requests to prevent blocks and improve throughput. For
example, the memory reorder buffer allows loads to pass stores. It also issues speculative loads.
(Stores are always dispatched in order, and speculative stores are never issued.)
The reorder buffer is an array of content-addressable memory, arranged into 40 micro-op regis-
ters. It contains micro-ops that are waiting to be executed, as well as those that have already been
executed but not yet committed to machine state. The dispatch/execute unit can execute instruc-
tions from the reorder buffer in any order.
CHAPTER 3
BASIC EXECUTION ENVIRONMENT
This chapter describes the basic execution environment of an Intel Architecture processor as
seen by assembly-language programmers. It describes how the processor executes instructions
and how it stores and manipulates data. The parts of the execution environment described here
include memory (the address space), the general-purpose data registers, the segment registers,
the EFLAGS register, and the instruction pointer register.
The execution environment for the floating-point unit (FPU) is described in Chapter 7, Floating-
Point Unit.
[Diagram: address space running from 0 to 2^32 − 1.]
Figure 3-1. Pentium® Pro Processor Basic Execution Environment
With the flat memory model (see Figure 3-2), memory appears to a program as a single, continuous
address space, called a linear address space. Code (a program’s instructions), data, and
the procedure stack are all contained in this address space. The linear address space is byte
addressable, with addresses running contiguously from 0 to 2^32 − 1. An address for any byte in
the linear address space is called a linear address.
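[Diagram. Flat Model: a linear address points directly into the linear address space. Segmented Model: a
logical address (segment selector and offset) selects a byte within one of several segments, which are
mapped into the linear address space.]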
With the segmented memory model, memory appears to a program as a group of independent
address spaces called segments. When using this model, code, data, and stacks are typically
contained in separate segments. To address a byte in a segment, a program must issue a logical
address, which consists of a segment selector and an offset. (A logical address is often referred
to as a far pointer.) The segment selector identifies the segment to be accessed and the offset
identifies a byte in the address space of the segment. The programs running on an Intel Architecture
processor can address up to 16,383 segments of different sizes and types, and each
segment can be as large as 2^32 bytes.
Internally, all the segments that are defined for a system are mapped into the processor’s linear
address space. So, the processor translates each logical address into a linear address to access a
memory location. This translation is transparent to the application program.
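The translation from a logical address to a linear address can be sketched as follows. This is only an illustration, assuming a hypothetical table that maps each segment selector to a base address and a limit; the names SEGMENTS and logical_to_linear are invented for this sketch, and the processor itself takes this information from segment descriptors (see Volume 3).

```python
# Hypothetical segment table: selector -> (base linear address, limit).
# The limit is treated as the highest valid offset in the segment
# (an assumption made for this sketch).
SEGMENTS = {
    0x08: (0x0000_0000, 0x000F_FFFF),  # e.g. a code segment
    0x10: (0x0010_0000, 0x0000_FFFF),  # e.g. a data segment
}

LINEAR_MASK = 0xFFFF_FFFF  # linear addresses run from 0 to 2**32 - 1


def logical_to_linear(selector: int, offset: int) -> int:
    """Translate a logical address (selector, offset) to a linear address."""
    base, limit = SEGMENTS[selector]
    if not 0 <= offset <= limit:
        raise ValueError("offset is outside the segment")
    return (base + offset) & LINEAR_MASK
```

For example, with the table above, the logical address (0x10, 0x20) maps to linear address 0x100020.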
The primary reason for using segmented memory is to increase the reliability of programs and
systems. For example, placing a program’s stack in a separate segment prevents the stack from
growing into the code or data space and overwriting instructions or data, respectively. And
placing the operating system’s or executive’s code, data, and stack in separate segments protects
them from the application program and vice versa.
With either the flat or segmented model, the Intel Architecture provides facilities for dividing
the linear address space into pages and mapping the pages into virtual memory. If an operating
system/executive uses the Intel Architecture’s paging mechanism, the existence of the pages is
transparent to an application program.
The real-address mode model uses the memory model for the Intel 8086 processor, the first
Intel Architecture processor. It was provided in all the subsequent Intel Architecture processors
for compatibility with existing programs written to run on the Intel 8086 processor. The
real-address mode uses a specific implementation of segmented memory in which the linear address
space for the program and the operating system/executive consists of an array of segments of up
to 64K bytes in size each. The maximum size of the linear address space in real-address mode
is 2^20 bytes. (See Chapter 15, 8086 Emulation, in the Intel Architecture Software Developer’s
Manual, Volume 3, for more information on this memory model.)
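As a rough illustration of real-address mode addressing, the sketch below forms a linear address from a 16-bit segment value and a 16-bit offset. The shift-by-4 rule is the 8086 convention and is not spelled out in this section; the function name real_mode_linear and the wrap to 20 bits are assumptions of this sketch (processors later than the 8086 can behave differently at the 1-MByte boundary).

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Linear address for a 16-bit segment:offset pair in real-address mode."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # The segment value is shifted left by 4 bits (multiplied by 16) and
    # added to the offset. The sum is truncated to 20 bits here, modeling
    # a 2**20-byte address space.
    return ((segment << 4) + offset) & 0xF_FFFF
```

Each segment value therefore names a 64-KByte window that starts on a 16-byte boundary, which is why the segments overlap.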
3.6. REGISTERS
The processor provides 16 registers for use in general system and application programming. As
shown in Figure 3-3, these registers can be grouped as follows:
• General-purpose data registers. These eight registers are available for storing operands
and pointers.
• Segment registers. These registers hold up to six segment selectors.
• Status and control registers. These registers report and allow modification of the state of
the processor and of the program being executed.
Although all of these registers are available for general storage of operands, results, and pointers,
caution should be used when referencing the ESP register. The ESP register holds the stack
pointer and as a general rule should not be used for any other purpose.
Many instructions assign specific registers to hold operands. For example, string instructions use
the contents of the ECX, ESI, and EDI registers as operands. When using a segmented memory
model, some instructions assume that pointers in certain registers are relative to specific
segments. For instance, some instructions assume that a pointer in the EBX register points to a
memory location in the DS segment.
The special uses of general-purpose registers by instructions are described in Chapter 6, Instruc-
tion Set Summary, in this volume and Chapter 3, Instruction Set Reference, in the Intel Architec-
ture Software Developer’s Manual, Volume 2. The following is a summary of these special uses:
• EAX—Accumulator for operands and results data.
• EBX—Pointer to data in the DS segment.
[Diagram. General-Purpose Registers (bits 31 through 0): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP.
Segment Registers (bits 15 through 0): CS, DS, SS, ES, FS, GS.]
General-Purpose Registers

Bits 15-8    Bits 7-0    16-bit (bits 15-0)    32-bit (bits 31-0)
AH           AL          AX                    EAX
BH           BL          BX                    EBX
CH           CL          CX                    ECX
DH           DL          DX                    EDX
                         BP                    EBP
                         SI                    ESI
                         DI                    EDI
                         SP                    ESP
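The relationship between the 32-bit, 16-bit, and 8-bit register names can be illustrated with plain bit masks. The helper names sub_registers and write_al below are hypothetical and exist only for this sketch.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its 16-bit and 8-bit aliases."""
    eax &= 0xFFFF_FFFF
    ax = eax & 0xFFFF              # bits 15..0
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,    # bits 15..8
        "AL": ax & 0xFF,           # bits 7..0
    }


def write_al(eax: int, value: int) -> int:
    """Return EAX after writing AL; bits 31..8 are left unchanged."""
    return (eax & 0xFFFF_FF00) | (value & 0xFF)
```

The same masks apply to EBX, ECX, and EDX; the EBP, ESI, EDI, and ESP registers have only a 16-bit alias.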
How segment registers are used depends on the type of memory management model that the
operating system or executive is using. When using the flat (unsegmented) memory model, the
segment registers are loaded with segment selectors that point to overlapping segments, each of
which begins at address 0 of the linear address space (as shown in Figure 3-5). These overlap-
ping segments then comprise the linear-address space for the program. (Typically, two overlap-
ping segments are defined: one for code and another for data and stacks. The CS segment
register points to the code segment and all the other segment registers point to the data and stack
segment.)
When using the segmented memory model, each segment register is ordinarily loaded with a
different segment selector so that each segment register points to a different segment within the
linear-address space (as shown in Figure 3-6). At any time, a program can thus access up to six
segments in the linear-address space. To access a segment not pointed to by one of the segment
registers, a program must first load the segment selector for the segment to be accessed into a
segment register.
[Diagrams for Figures 3-5 and 3-6. Flat model: the segment registers CS, DS, SS, ES, FS, and GS all refer to
overlapping segments that are mapped to the same linear address space for the program. Segmented model: CS
refers to the code segment, SS to the stack segment, and DS, ES, FS, and GS to separate data segments.]
Each of the segment registers is associated with one of three types of storage: code, data, or
stack. For example, the CS register contains the segment selector for the code segment, where
the instructions being executed are stored. The processor fetches instructions from the code
segment, using a logical address that consists of the segment selector in the CS register and the
contents of the EIP register. The EIP register contains the offset within the code segment
of the next instruction to be executed. The CS register cannot be loaded explicitly by an application
program. Instead, it is loaded implicitly by instructions or internal processor operations
that change program control (such as procedure calls, interrupt handling, or task switching).
The DS, ES, FS, and GS registers point to four data segments. The availability of four data
segments permits efficient and secure access to different types of data structures. For example,
four separate data segments might be created: one for the data structures of the current module,
another for the data exported from a higher-level module, a third for a dynamically created data
structure, and a fourth for data shared with another program. To access additional data segments,
the application program must load segment selectors for these segments into the DS, ES, FS, and
GS registers, as needed.
The SS register contains the segment selector for a stack segment, where the procedure stack is
stored for the program, task, or handler currently being executed. All stack operations use the
SS register to find the stack segment. Unlike the CS register, the SS register can be loaded explic-
itly, which permits application programs to set up multiple stacks and switch among them.
See Section 3.3., “Memory Organization”, for an overview of how the segment registers are used
in real-address mode.
The four segment registers CS, DS, SS, and ES are the same as the segment registers found in
the Intel 8086 and Intel 286 processors; the FS and GS registers were introduced into the
Intel Architecture with the Intel386 family of processors.
EFLAGS register layout (bits 31 through 0):

Bit(s)   Flag
31-22    Reserved (0)
21       ID Flag (ID)
20       Virtual Interrupt Pending (VIP)
19       Virtual Interrupt Flag (VIF)
18       Alignment Check (AC)
17       Virtual-8086 Mode (VM)
16       Resume Flag (RF)
15       Reserved (0)
14       Nested Task (NT)
13-12    I/O Privilege Level (IOPL)
11       Overflow Flag (OF)
10       Direction Flag (DF)
9        Interrupt Enable Flag (IF)
8        Trap Flag (TF)
7        Sign Flag (SF)
6        Zero Flag (ZF)
5        Reserved (0)
4        Auxiliary Carry Flag (AF)
3        Reserved (0)
2        Parity Flag (PF)
1        Reserved (1)
0        Carry Flag (CF)
As the Intel Architecture has evolved, flags have been added to the EFLAGS register, but the
function and placement of existing flags have remained the same from one family of the Intel
Architecture processors to the next. As a result, code that accesses or modifies these flags for
one family of Intel Architecture processors works as expected when run on later families of
processors.
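Because the flag positions are fixed, a value read from the EFLAGS register can be decoded with constant bit masks. The sketch below is illustrative only; EFLAGS_BITS and decode_eflags are hypothetical names, and the bit positions are those of the EFLAGS register layout.

```python
# Bit positions of the single-bit EFLAGS flags.
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> dict:
    """Return each single-bit flag as a bool, plus the 2-bit IOPL field."""
    flags = {name: bool((value >> bit) & 1) for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0b11  # bits 13..12
    return flags
```

For instance, the value 0x246 decodes to PF, ZF, and IF set, with all other flags clear and an IOPL of 0.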
PF (bit 2) Parity flag. Set if the least-significant byte of the result contains an even
number of 1 bits; cleared otherwise.
AF (bit 4) Adjust flag. Set if an arithmetic operation generates a carry or a borrow
out of bit 3 of the result; cleared otherwise. This flag is used in binary-
coded decimal (BCD) arithmetic.
ZF (bit 6) Zero flag. Set if the result is zero; cleared otherwise.
SF (bit 7) Sign flag. Set equal to the most-significant bit of the result, which is the
sign bit of a signed integer. (0 indicates a positive value and 1 indicates a
negative value.)
OF (bit 11) Overflow flag. Set if the integer result is too large a positive number or
too small a negative number (excluding the sign-bit) to fit in the destina-
tion operand; cleared otherwise. This flag indicates an overflow condition
for signed-integer (two’s complement) arithmetic.
Of these status flags, only the CF flag can be modified directly, using the STC, CLC, and CMC
instructions. Also the bit instructions (BT, BTS, BTR, and BTC) copy a specified bit into the CF
flag.
The status flags allow a single arithmetic operation to produce results for three different data
types: unsigned integers, signed integers, and BCD integers. If the result of an arithmetic oper-
ation is treated as an unsigned integer, the CF flag indicates an out-of-range condition (carry or
a borrow); if treated as a signed integer (two’s complement number), the OF flag indicates a
carry or borrow; and if treated as a BCD digit, the AF flag indicates a carry or borrow. The SF
flag indicates the sign of a signed integer. The ZF flag indicates either a signed- or an unsigned-
integer zero.
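To make the flag definitions concrete, the following sketch computes the status flags for an addition of two operands. It is a simplified model, not a description of the processor's internal logic; the function name add_with_flags and the default 8-bit operand width are assumptions chosen to keep the examples small.

```python
def add_with_flags(a: int, b: int, bits: int = 8) -> tuple:
    """Add two `bits`-wide values; return (result, status flags)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "CF": total > mask,                            # unsigned carry out
        "PF": bin(result & 0xFF).count("1") % 2 == 0,  # even parity of low byte
        "AF": ((a & 0xF) + (b & 0xF)) > 0xF,           # carry out of bit 3
        "ZF": result == 0,
        "SF": bool(result & sign),                     # copy of the sign bit
        # Signed overflow: both operands differ in sign from the result.
        "OF": bool((a ^ result) & (b ^ result) & sign),
    }
    return result, flags
```

Adding 0x7F and 0x01 gives 0x80 with OF and SF set but CF clear: the result is valid as an unsigned integer (128) but out of range as a signed byte. Adding 0xFF and 0x01 gives 0x00 with CF and ZF set but OF clear: the reverse situation.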
When performing multiple-precision arithmetic on integers, the CF flag is used in conjunction
with the add with carry (ADC) and subtract with borrow (SBB) instructions to propagate a carry
or borrow from one computation to the next.
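Carry propagation in multiple-precision addition can be sketched as a loop over words, least-significant word first. The helper name add_multiword and the list-of-words representation are assumptions of this sketch; the first iteration corresponds to an ADD (no carry in) and each later iteration to an ADC.

```python
def add_multiword(a_words, b_words, bits=32):
    """Add two equal-length word lists (least-significant word first).

    Returns (sum_words, final_carry).
    """
    if len(a_words) != len(b_words):
        raise ValueError("operands must have the same number of words")
    mask = (1 << bits) - 1
    carry = 0
    out = []
    for a, b in zip(a_words, b_words):
        total = (a & mask) + (b & mask) + carry
        out.append(total & mask)
        carry = total >> bits  # 0 or 1, playing the role of the CF flag
    return out, carry
```

A 64-bit addition on a 32-bit processor is the two-word case of this loop.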
The condition instructions Jcc (jump on condition code cc), SETcc (byte set on condition code
cc), LOOPcc, and CMOVcc (conditional move) use one or more of the status flags as condition
codes and test them for branch, set-byte, or end-loop conditions.
3.6.3.2. DF FLAG
The direction flag (DF, located in bit 10 of the EFLAGS register) controls the string instructions
(MOVS, CMPS, SCAS, LODS, and STOS). Setting the DF flag causes the string instructions to
auto-decrement (that is, to process strings from high addresses to low addresses). Clearing the
DF flag causes the string instructions to auto-increment (process strings from low addresses
to high addresses).
The STD and CLD instructions set and clear the DF flag, respectively.
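The effect of the DF flag can be illustrated with a small model of a repeated byte move (in the style of REP MOVSB). The function name movsb and the use of a bytearray as memory are assumptions of this sketch.

```python
def movsb(memory: bytearray, esi: int, edi: int, ecx: int, df: bool):
    """Copy ecx bytes from memory[esi] to memory[edi], one byte at a time.

    df=False advances both pointers (low to high addresses); df=True
    decrements them (high to low addresses). Returns the final (esi, edi).
    """
    step = -1 if df else 1
    for _ in range(ecx):
        memory[edi] = memory[esi]
        esi += step
        edi += step
    return esi, edi
```

Processing from high to low addresses is useful when the source and destination overlap and the destination lies above the source: starting both pointers at the last byte and setting DF copies the bytes without overwriting source data that has not yet been read.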
NOTES:
Y Yes, this instruction prefix is present.
N No, this instruction prefix is not present.
CHAPTER 4
PROCEDURE CALLS, INTERRUPTS, AND
EXCEPTIONS
This chapter describes the facilities in the Intel Architecture for executing calls to procedures or
subroutines. It also describes how interrupts and exceptions are handled from the perspective of
an application programmer.
4.2. STACK
The stack (see Figure 4-1) is a contiguous array of memory locations. It is contained in a
segment and identified by the segment selector in the SS register. (When using the flat memory
model, the stack can be located anywhere in the linear address space for the program.) A stack
can be up to 4 gigabytes long, the maximum size of a segment.
The next available memory location on the stack is called the top of stack. At any given time,
the stack pointer (contained in the ESP register) gives the address (that is, the offset from the base
of the SS segment) of the top of the stack.
Items are placed on the stack using the PUSH instruction and removed from the stack using the
POP instruction. When an item is pushed onto the stack, the processor decrements the ESP
register, then writes the item at the new top of stack. When an item is popped off the stack, the
processor reads the item from the top of stack, then increments the ESP register. In this manner,
the stack grows down in memory (towards lesser addresses) when items are pushed on the stack
and shrinks up (towards greater addresses) when the items are popped from the stack.
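The push and pop sequences described above can be modeled in a few lines. This is a toy model, assuming 32-bit (doubleword) items, a small bytearray as the stack segment, and no limit checking; the class name Stack is hypothetical.

```python
class Stack:
    """Toy model of a stack that grows toward lower addresses."""

    def __init__(self, size: int = 64):
        self.memory = bytearray(size)
        self.esp = size  # initial ESP value: the bottom of the stack

    def push(self, value: int) -> None:
        self.esp -= 4  # decrement ESP first ...
        data = (value & 0xFFFF_FFFF).to_bytes(4, "little")
        self.memory[self.esp:self.esp + 4] = data  # ... then write the item

    def pop(self) -> int:
        data = self.memory[self.esp:self.esp + 4]  # read the item first ...
        self.esp += 4  # ... then increment ESP
        return int.from_bytes(data, "little")
```

After two pushes on a 64-byte stack, ESP has moved from 64 down to 56; two pops return the items in reverse order and restore ESP to 64.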
A program or operating system/executive can set up many stacks. For example, in multitasking
systems, each task can be given its own stack. The number of stacks in a system is limited by the
maximum number of segments and the available physical memory. When a system sets up many
stacks, only one stack—the current stack—is available at a time. The current stack is the one
contained in the segment referenced by the SS register.
[Diagram of the stack segment, from higher to lower addresses: Bottom of Stack (Initial ESP Value); Local
Variables for Calling Procedure; Parameters Passed to Called Procedure; Frame Boundary; Return Instruction
Pointer (the EBP register is typically set to point to the return instruction pointer); Top of Stack (ESP
register). The stack can be 16 or 32 bits wide.]
The processor references the SS register automatically for all stack operations. For example,
when the ESP register is used as a memory address, it automatically points to an address in the
current stack. Also, the CALL, RET, PUSH, POP, ENTER, and LEAVE instructions all perform
operations on the current stack.
3. Load the stack pointer for the stack into the ESP register using a MOV, POP, or LSS
instruction. (The LSS instruction can be used to load the SS and ESP registers in one
operation.)
See “Segment Descriptors” in Chapter 3 of the Intel Architecture Software Developer’s Manual,
Volume 3, for information on how to set up a segment descriptor and segment limits for a stack
segment.
stack is determined by an optional argument (n) to the RET instruction. See “RET—Return from
Procedure” in Chapter 3 of the Intel Architecture Software Developer’s Manual, Volume 2, for a
detailed description of the RET instruction.
[Diagram of the protection rings, privilege levels 0 (highest) through 3 (lowest). Level 0: Operating System
Kernel. Levels 1 and 2: Operating System Services (Device Drivers, Etc.). Level 3: Applications.]
If an operating system or executive uses this multilevel protection mechanism, a call to a proce-
dure that is in a more privileged protection level than the calling procedure is handled in a similar
manner as a far call (see Section 4.3.2., “Far CALL and RET Operation”). The differences are
as follows:
• The segment selector provided in the CALL instruction references a special data structure
called a call gate descriptor. Among other things, the call gate descriptor provides the
following:
— Access rights information.
— The segment selector for the code segment of the called procedure.
— An offset into the code segment (that is, the instruction pointer for the called
procedure).
• The processor switches to a new stack to execute the called procedure. Each privilege level
has its own stack. The segment selector and stack pointer for the privilege level 3 stack are
stored in the SS and ESP registers, respectively, and are automatically saved when a call to
a more privileged level occurs. The segment selectors and stack pointers for the privilege
level 2, 1, and 0 stacks are stored in a system segment called the task state segment (TSS).
The use of a call gate and the TSS during a stack switch are transparent to the calling procedure,
except when a general-protection exception is raised.
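[Diagram of the stack switch on a call to a different privilege level.
Stack frame before call (calling procedure's stack): Param 1, Param 2, Param 3 (ESP before call).
Stack frame after call (called procedure's stack): Calling SS, Calling ESP, Param 1, Param 2, Param 3,
Calling CS, Calling EIP (ESP after call).
On return: ESP before return points to Calling EIP on the called procedure's stack; ESP after return points
into the calling procedure's stack.]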
3. Loads the segment selector and stack pointer for the new stack (that is, the stack for the
privilege level being called) from the TSS into the SS and ESP registers and switches to the
new stack.
4. Pushes the temporarily saved SS and ESP values for the calling procedure’s stack onto the
new stack.
5. Copies the parameters from the calling procedure’s stack to the new stack. (A value in the
call gate descriptor determines how many parameters to copy to the new stack.)
6. Pushes the temporarily saved CS and EIP values for the calling procedure to the new stack.
7. Loads the segment selector for the new code segment and the new instruction pointer from
the call gate into the CS and EIP registers, respectively.
8. Begins execution of the called procedure at the new privilege level.
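The order in which items land on the new stack during a call through a call gate can be sketched as follows. This is a simplified model of the steps above, with privilege checks omitted: stacks are Python lists whose last element is the top of stack, and the function name privileged_call and the dictionary keys (selector, offset, param_count) are hypothetical names for this sketch.

```python
def privileged_call(caller_stack, caller_regs, new_stack, gate):
    """Return (new stack contents, new CS:EIP) for a call through a gate."""
    n = gate["param_count"]
    # The n topmost entries of the caller's stack are the parameters.
    params = caller_stack[len(caller_stack) - n:]
    new_stack = list(new_stack)
    new_stack += [caller_regs["SS"], caller_regs["ESP"]]   # saved SS and ESP
    new_stack += params                                    # copied parameters
    new_stack += [caller_regs["CS"], caller_regs["EIP"]]   # saved CS and EIP
    return new_stack, (gate["selector"], gate["offset"])   # new CS and EIP
```

With three parameters, the new stack holds, from bottom to top: calling SS, calling ESP, the three parameters, calling CS, and calling EIP.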
When executing a return from the privileged procedure, the processor performs these actions:
1. Performs a privilege check.
2. Restores the CS and EIP registers to their values prior to the call.
3. (If the RET instruction has an optional n argument.) Increments the stack pointer by the
number of bytes specified with the n operand to release parameters from the stack. If the
call gate descriptor specifies that one or more parameters be copied from one stack to the
other, a RET n instruction must be used to release the parameters from both stacks. Here,
the n operand specifies the number of bytes occupied on each stack by the parameters. On
a return, the processor increments ESP by n for each stack to step over (effectively remove)
these parameters from the stacks.
4. Restores the SS and ESP registers to their values prior to the call, which causes a switch
back to the stack of the calling procedure.
5. (If the RET instruction has an optional n argument.) Increments the stack pointer by the
number of bytes specified with the n operand to release parameters from the stack (see
explanation in step 3).
6. Resumes execution of the calling procedure.
See Chapter 4, Protection, in the Intel Architecture Software Developer’s Manual, Volume 3, for
detailed information on calls to privileged levels and the call gate descriptor.
through assembly-language calls. The remainder of this section gives a brief overview of the
processor’s interrupt and exception handling mechanism. See Chapter 5, Interrupt and Excep-
tion Handling in the Intel Architecture Software Developer’s Manual, Volume 3, for a detailed
description of this mechanism.
The Intel Architecture defines 17 predefined interrupts and exceptions and 224 user-defined
interrupts, which are associated with entries in the IDT. Each interrupt and exception in the IDT
is identified with a number, called a vector. Table 4-1 lists the interrupts and exceptions with
entries in the IDT and their respective vector numbers. Vectors 0 through 8, 10 through 14, and
16 through 18 are the predefined interrupts and exceptions, and vectors 32 through 255 are the
user-defined interrupts, called maskable interrupts.
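The vector ranges given above can be captured in a small lookup. The names PREDEFINED, USER_DEFINED, and classify_vector are hypothetical, and the "other" category simply covers vectors that fall outside both ranges listed in the text.

```python
# Vector ranges as listed in the text.
PREDEFINED = set(range(0, 9)) | set(range(10, 15)) | set(range(16, 19))
USER_DEFINED = range(32, 256)


def classify_vector(vector: int) -> str:
    """Classify an IDT vector number according to the ranges above."""
    if not 0 <= vector <= 255:
        raise ValueError("vector numbers run from 0 to 255")
    if vector in PREDEFINED:
        return "predefined"
    if vector in USER_DEFINED:
        return "user-defined"
    return "other"  # outside both ranges listed in the text
```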
Note that the processor defines several additional interrupts that do not point to entries in the
IDT; the most notable of these interrupts is the SMI interrupt. See “Exception and Interrupt
Vectors” in Chapter 5 of the Intel Architecture Software Developer’s Manual, Volume 3, for more
information about the interrupts and exceptions that the Intel Architecture supports.
When the processor detects an interrupt or exception, it does one of the following things:
• Executes an implicit call to a handler procedure.
• Executes an implicit call to a handler task.
If no stack switch occurs, the processor does the following when calling an interrupt or excep-
tion handler (see Figure 4-5):
1. Pushes the current contents of the EFLAGS, CS, and EIP registers (in that order) on the
stack.
2. Pushes an error code (if appropriate) on the stack.
3. Loads the segment selector for the new code segment and the new instruction pointer (from
the interrupt gate or trap gate) into the CS and EIP registers, respectively.
4. If the call is through an interrupt gate, clears the IF flag in the EFLAGS register.
5. Begins execution of the handler procedure at the new privilege level.
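[Diagram of stack contents, listed from higher to lower addresses.
Without a stack switch (interrupted procedure's stack): ESP before transfer to handler; EFLAGS; CS; EIP;
Error Code (ESP after transfer to handler).
With a stack switch (handler's stack; ESP before transfer to handler points into the interrupted procedure's
stack): SS; ESP; EFLAGS; CS; EIP; Error Code (ESP after transfer to handler).]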
Figure 4-5. Stack Usage on Transfers to Interrupt and Exception Handling Routines
2. Loads the segment selector and stack pointer for the new stack (that is, the stack for the
privilege level being called) from the TSS into the SS and ESP registers and switches to the
new stack.
3. Pushes the temporarily saved SS, ESP, EFLAGS, CS, and EIP values for the interrupted
procedure’s stack onto the new stack.
4. Pushes an error code on the new stack (if appropriate).
5. Loads the segment selector for the new code segment and the new instruction pointer (from
the interrupt gate or trap gate) into the CS and EIP registers, respectively.
6. If the call is through an interrupt gate, clears the IF flag in the EFLAGS register.
7. Begins execution of the handler procedure at the new privilege level.
A return from an interrupt or exception handler is initiated with the IRET instruction. The IRET
instruction is similar to the far RET instruction, except that it also restores the contents of the
EFLAGS register for the interrupted procedure.
When executing a return from an interrupt or exception handler from the same privilege level as
the interrupted procedure, the processor performs these actions:
1. Restores the CS and EIP registers to their values prior to the interrupt or exception.
2. Restores the EFLAGS register.
3. Increments the stack pointer appropriately.
4. Resumes execution of the interrupted procedure.
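The transfer to a handler without a stack switch, and the matching same-privilege return, can be sketched as a round trip. This is a simplified model: the stack is a Python list whose last element is the top of stack, registers live in a dictionary, and the names enter_handler, iret, and the gate keys are hypothetical. An error code, if one was pushed, is assumed to have been removed from the stack before the return.

```python
IF_MASK = 1 << 9  # interrupt enable flag, bit 9 of EFLAGS


def enter_handler(regs, stack, gate, error_code=None):
    """Transfer to a handler without a stack switch."""
    stack += [regs["EFLAGS"], regs["CS"], regs["EIP"]]  # pushed in that order
    if error_code is not None:
        stack.append(error_code)
    regs["CS"], regs["EIP"] = gate["selector"], gate["offset"]
    if gate["type"] == "interrupt":
        regs["EFLAGS"] &= ~IF_MASK  # interrupt gates clear the IF flag


def iret(regs, stack):
    """Return to the interrupted procedure at the same privilege level."""
    regs["EIP"] = stack.pop()
    regs["CS"] = stack.pop()
    regs["EFLAGS"] = stack.pop()  # also restores the saved IF flag
```

Because the saved EFLAGS image is restored on return, the IF flag returns to its value at the time of the interrupt even though the interrupt gate cleared it on entry.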
When executing a return from an interrupt or exception handler from a different privilege level
than the interrupted procedure, the processor performs these actions:
1. Performs a privilege check.
2. Restores the CS and EIP registers to their values prior to the interrupt or exception.
3. Restores the EFLAGS register.
4. Restores the SS and ESP registers to their values prior to the interrupt or exception,
resulting in a stack switch back to the stack of the interrupted procedure.
5. Resumes execution of the interrupted procedure.
of the task switch, the processor saves complete state information for the interrupted program or
task. Upon returning from the handler task, the state of the interrupted program or task is
restored and execution continues. See Chapter 5, Interrupt and Exception Handling, in the Intel
Architecture Software Developer’s Manual, Volume 3, for a detailed description of the
processor’s mechanism for handling interrupts and exceptions through handler tasks.
The lexical nesting level determines the number of stack frame pointers to copy into the new
stack frame from the preceding frame. A stack frame pointer is a doubleword used to access the
variables of a procedure. The set of stack frame pointers used by a procedure to access the
variables of other procedures is called the display. The first doubleword in the display is a pointer
to the previous stack frame. This pointer is used by a LEAVE instruction to undo the effect of an
ENTER instruction by discarding the current stack frame.
After the ENTER instruction creates the display for a procedure, it allocates the dynamic local
variables for the procedure by decrementing the contents of the ESP register by the number of
bytes specified in the first parameter. This new value in the ESP register serves as the initial top-
of-stack for all PUSH and POP operations within the procedure.
To allow a procedure to address its display, the ENTER instruction leaves the EBP register
pointing to the first doubleword in the display. Because stacks grow down, this is actually the
doubleword with the highest address in the display. Data manipulation instructions that specify
the EBP register as a base register automatically address locations within the stack segment
instead of the data segment.
The ENTER instruction can be used in two ways: nested and non-nested. If the lexical level is
0, the non-nested form is used. The non-nested form pushes the contents of the EBP register on
the stack, copies the contents of the ESP register into the EBP register, and subtracts the first
operand from the contents of the ESP register to allocate dynamic storage. The non-nested form
differs from the nested form in that no stack frame pointers are copied. The nested form of the
ENTER instruction occurs when the second parameter (lexical level) is not zero.
The following pseudo code shows the formal definition of the ENTER instruction. STORAGE
is the number of bytes of dynamic storage to allocate for local variables, and LEVEL is the
lexical nesting level.
PUSH EBP;
FRAME_PTR ← ESP;
IF LEVEL > 0
    THEN
        DO (LEVEL − 1) times
            EBP ← EBP − 4;
            PUSH Pointer(EBP); (* doubleword pointed to by EBP *)
        OD;
        PUSH FRAME_PTR;
FI;
EBP ← FRAME_PTR;
ESP ← ESP − STORAGE;
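The pseudo code above can be transcribed into a runnable model for experimentation. The sketch below assumes a dictionary mapping addresses to doublewords as memory and a dictionary holding the ESP and EBP registers; the function name enter and these data structures are choices made for this sketch and are not part of the instruction definition.

```python
def enter(mem: dict, regs: dict, storage: int, level: int) -> None:
    """Apply ENTER storage, level to regs (ESP, EBP) and mem (addr -> dword)."""

    def push(value):
        regs["ESP"] -= 4
        mem[regs["ESP"]] = value

    push(regs["EBP"])
    frame_ptr = regs["ESP"]
    if level > 0:
        for _ in range(level - 1):
            regs["EBP"] -= 4
            push(mem[regs["EBP"]])  # doubleword pointed to by EBP
        push(frame_ptr)
    regs["EBP"] = frame_ptr
    regs["ESP"] -= storage
```

With ESP = 0x1000 and EBP = 0x2000, a non-nested ENTER with 12 bytes of storage (level 0) stores the old EBP at 0xFFC and leaves EBP = 0xFFC and ESP = 0xFF0.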
The main procedure (in which all other procedures are nested) operates at the highest lexical
level, level 1. The first procedure it calls operates at the next deeper lexical level, level 2. A level
2 procedure can access the variables of the main program, which are at fixed locations specified
by the compiler. In the case of level 1, the ENTER instruction allocates only the requested
dynamic storage on the stack because there is no previous display to copy.
A procedure which calls another procedure at a lower lexical level gives the called procedure
access to the variables of the caller. The ENTER instruction provides this access by placing a
pointer to the calling procedure's stack frame in the display.
A procedure which calls another procedure at the same lexical level should not give access to its
variables. In this case, the ENTER instruction copies only that part of the display from the
calling procedure which refers to previously nested procedures operating at higher lexical levels.
The new stack frame does not include the pointer for addressing the calling procedure’s stack
frame.
The ENTER instruction treats a re-entrant procedure as a call to a procedure at the same lexical
level. In this case, each succeeding iteration of the re-entrant procedure can address only its own
variables and the variables of the procedures within which it is nested. A re-entrant procedure
always can address its own variables; it does not require pointers to the stack frames of previous
iterations.
By copying only the stack frame pointers of procedures at higher lexical levels, the ENTER
instruction makes certain that procedures access only those variables of higher lexical levels, not
those at parallel lexical levels (see Figure 4-6).
Block-structured languages can use the lexical levels defined by ENTER to control access to the
variables of nested procedures. In Figure 4-6, for example, if procedure A calls procedure B
which, in turn, calls procedure C, then procedure C will have access to the variables of the MAIN
procedure and procedure A, but not those of procedure B because they are at the same lexical
level. The following definition describes the access to variables for the nested procedures in
Figure 4-6.
1. MAIN has variables at fixed locations.
2. Procedure A can access only the variables of MAIN.
3. Procedure B can access only the variables of procedure A and MAIN. Procedure B cannot
access the variables of procedure C or procedure D.
4. Procedure C can access only the variables of procedure A and MAIN. Procedure C cannot
access the variables of procedure B or procedure D.
5. Procedure D can access the variables of procedure C, procedure A, and MAIN. Procedure
D cannot access the variables of procedure B.
In Figure 4-7, an ENTER instruction at the beginning of the MAIN procedure creates three
doublewords of dynamic storage for MAIN, but copies no pointers from other stack frames. The
first doubleword in the display holds a copy of the last value in the EBP register before the
ENTER instruction was executed. The second doubleword holds a copy of the contents of the
EBP register following the ENTER instruction. After the instruction is executed, the EBP
register points to the first doubleword pushed on the stack, and the ESP register points to the last
doubleword in the stack frame.
When MAIN calls procedure A, the ENTER instruction creates a new display (see Figure 4-8).
The first doubleword is the last value held in MAIN's EBP register. The second doubleword is a
pointer to MAIN's stack frame which is copied from the second doubleword in MAIN's display.
This happens to be another copy of the last value held in MAIN’s EBP register. Procedure A can
access variables in MAIN because MAIN is at level 1. Therefore the base address for the
dynamic storage used in MAIN is the current address in the EBP register, plus four bytes to
account for the saved contents of MAIN’s EBP register. All dynamic variables for MAIN are at
fixed, positive offsets from this value.
[Figure: stack frame diagram. Recoverable labels: Old EBP, Main’s EBP, Dynamic Storage, ESP.]
When procedure A calls procedure B, the ENTER instruction creates a new display (see Figure
4-9). The first doubleword holds a copy of the last value in procedure A’s EBP register. The
second and third doublewords are copies of the two stack frame pointers in procedure A’s
display. Procedure B can access variables in procedure A and MAIN by using the stack frame
pointers in its display.
[Figure: stack frame diagram. Recoverable labels, from the top of the figure down: Old EBP, Main’s EBP, Main’s EBP, Main’s EBP, Procedure A’s EBP, Dynamic Storage, ESP.]
When procedure B calls procedure C, the ENTER instruction creates a new display for procedure C
(see Figure 4-10). The first doubleword holds a copy of the last value in procedure B’s
EBP register. This is used by the LEAVE instruction to restore procedure B’s stack frame. The
second and third doublewords are copies of the two stack frame pointers in procedure A’s
display. If procedure C were at the next deeper lexical level from procedure B, a fourth doubleword
would be copied, which would be the stack frame pointer to procedure B’s local variables.
Note that procedure B and procedure C are at the same level, so procedure C is not intended to
access procedure B’s variables. This does not mean that procedure C is completely isolated from
procedure B; procedure C is called by procedure B, so the pointer to the returning stack frame
is a pointer to procedure B's stack frame. In addition, procedure B can pass parameters to procedure C
either on the stack or through variables global to both procedures (that is, variables in the
scope of both procedures).
[Figure: stack frame diagram. Recoverable labels, from the top of the figure down: Old EBP, Main’s EBP, Main’s EBP, Main’s EBP, Procedure A’s EBP, Dynamic Storage, ESP.]
CHAPTER 5
DATA TYPES AND ADDRESSING MODES
This chapter describes data types and addressing modes available to programmers of the Intel
Architecture processors.
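Data type     Bits    Layout (N is the address of the operand)
Byte          7-0     single byte at N
Word          15-0    high byte (bits 15-8) at N+1, low byte (bits 7-0) at N
Doubleword    31-0    high word (bits 31-16) at N+2, low word (bits 15-0) at N
Quadword      63-0    high doubleword (bits 63-32) at N+4, low doubleword (bits 31-0) at N
Figure 5-1. Fundamental Data Types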
Figure 5-2 shows the byte order of each of the fundamental data types when referenced as operands
in memory. The low byte (bits 0 through 7) of each data type occupies the lowest address
in memory and that address is also the address of the operand.
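The byte ordering described above can be checked with a short Python sketch. The byte values are those of the memory example in this section (a byte 1FH at address 9H and the quadword 7AFE06361FA4230BH at address 6H); the `memory` dictionary and the `read_operand` helper are names introduced here for illustration.

```python
# Bytes at addresses 6H through DH, taken from the memory example in this section.
memory = {0x6: 0x0B, 0x7: 0x23, 0x8: 0xA4, 0x9: 0x1F,
          0xA: 0x36, 0xB: 0x06, 0xC: 0xFE, 0xD: 0x7A}


def read_operand(address, size):
    """Read a little-endian operand of `size` bytes whose low byte is at `address`."""
    raw = bytes(memory[address + i] for i in range(size))
    return int.from_bytes(raw, "little")


print(hex(read_operand(0x9, 1)))   # byte at address 9H
print(hex(read_operand(0x6, 8)))   # quadword at address 6H
```

Because the low byte sits at the lowest address, reading eight bytes starting at 6H reproduces the quadword value quoted in the example.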
aligned accesses require only one memory access. A word or doubleword operand that crosses
a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered
unaligned and requires two separate memory bus cycles to access it; a word that starts on an odd
address but does not cross a 4-byte boundary is considered aligned and can still be accessed in
one bus cycle.
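[Figure: bytes, words, doublewords, and quadwords in memory. Recoverable labels: the byte at address 9H contains 1FH; the quadword at address 6H contains 7AFE06361FA4230BH (A4H at 8H, 1FH at 9H, 36H at AH, 7AH at DH).]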
5.2.1. Integers
Integers are signed binary numbers held in a byte, word, or doubleword. All operations assume
a two's complement representation. The sign bit is located in bit 7 in a byte integer, bit 15 in a
word integer, and bit 31 in a doubleword integer. The sign bit is set for negative integers and
cleared for positive integers and zero. Integer values range from –128 to +127 for a byte integer,
from –32,768 to +32,767 for a word integer, and from –2^31 to +2^31 – 1 for a doubleword integer.
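The ranges above follow directly from the position of the sign bit. The Python sketch below is illustrative; `integer_range` and `to_signed` are helper names introduced here, not part of any instruction set.

```python
def integer_range(bits):
    """Return (minimum, maximum) of a two's complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)          # bit 7, 15, or 31
    return value - (1 << bits) if value & sign_bit else value


for width in (8, 16, 32):
    print(width, integer_range(width))
```

For example, the byte FFH has its sign bit (bit 7) set and is interpreted as –1, while 7FH is +127.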
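[Figure: numeric, pointer, and bit field data type formats. Recoverable labels:]
Word unsigned integer: bits 15-0
Doubleword unsigned integer: bits 31-0
BCD integers: one BCD digit in bits 3-0 of each byte (bits 7-4 marked X)
Packed BCD integers: two BCD digits per byte (bits 7-4 and bits 3-0)
Near pointer: offset or linear address (bits 31-0)
Far pointer or logical address: segment selector (bits 47-32) and offset (bits 31-0)
Bit field: field length, measured from the least significant bit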
5.2.4. Pointers
Pointers are addresses of locations in memory. The Pentium Pro processor recognizes two types
of pointers: a near pointer (32 bits) and a far pointer (48 bits). A near pointer is a 32-bit offset
(also called an effective address) within a segment. Near pointers are used for all memory references
in a flat memory model or for references in a segmented model where the identity of the
segment being accessed is implied. A far pointer is a 48-bit logical address, consisting of a 16-bit
segment selector and a 32-bit offset. Far pointers are used for memory references in a segmented
memory model where the identity of a segment being accessed must be specified explicitly.
5.2.6. Strings
Strings are continuous sequences of bits, bytes, words, or doublewords. A bit string can begin
at any bit position of any byte and can contain up to 2^32 – 1 bits. A byte string can contain bytes,
words, or doublewords and can range from zero to 2^32 – 1 bytes (4 gigabytes).
All the arithmetic instructions (except the DIV and IDIV instructions) allow the source operand
to be an immediate value. The maximum value allowed for an immediate operand varies among
instructions, but can never be greater than the maximum value of an unsigned doubleword
integer (2^32 – 1).
• The 8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, or DL).
• The segment registers (CS, DS, SS, ES, FS, and GS).
• The EFLAGS register.
• System registers, such as the global descriptor table register (GDTR) or the interrupt
descriptor table register (IDTR).
Some instructions (such as the DIV and MUL instructions) use quadword operands contained
in a pair of 32-bit registers. Register pairs are represented with a colon separating them. For
example, in the register pair EDX:EAX, EDX contains the high order bits and EAX contains the
low order bits of a quadword operand.
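The register-pair convention can be sketched in Python as follows. The helper names (`join_pair`, `split_pair`, `mul32`) are introduced here for illustration; `mul32` models only the unsigned 32-bit form of a multiply whose product lands in EDX:EAX.

```python
MASK32 = 0xFFFFFFFF


def join_pair(edx, eax):
    """Combine EDX (high order bits) and EAX (low order bits) into a quadword."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def split_pair(quadword):
    """Split a quadword into (EDX, EAX)."""
    return (quadword >> 32) & MASK32, quadword & MASK32


def mul32(eax, operand):
    """Unsigned 32-bit multiply; the 64-bit product is returned as (EDX, EAX)."""
    return split_pair((eax & MASK32) * (operand & MASK32))


edx, eax = mul32(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(edx), hex(eax))
```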
Several instructions (such as the PUSHFD and POPFD instructions) are provided to load and
store the contents of the EFLAGS register or to set or clear individual flags in this register. Other
instructions (such as the Jcc instructions) use the state of the status flags in the EFLAGS register
as condition codes for branching or other decision making operations.
The processor contains a selection of system registers that are used to control memory management,
interrupt and exception handling, task management, processor management, and debugging
activities. Some of these system registers are accessible by an application program, the
operating system, or the executive through a set of system instructions. When accessing a system
register with a system instruction, the register is generally an implied operand of the instruction.
[Figure: memory operand address. Segment selector (bits 15-0) and offset or linear address (bits 31-0).]
When storing data in or loading data from memory, the DS segment default can be overridden
to allow other segments to be accessed. Within an assembler, the segment override is generally
handled with a colon “:” operator. For example, the following MOV instruction moves a value
from register EAX into the segment pointed to by the ES register. The offset into the segment is
contained in the EBX register:
MOV ES:[EBX], EAX;
(At the machine level, a segment override is specified with a segment-override prefix, which is
a byte placed at the beginning of an instruction.) The following default segment selections
cannot be overridden:
• Instruction fetches must be made from the code segment.
• Destination strings in string instructions must be stored in the data segment pointed to by
the ES register.
• Push and pop operations must always reference the SS segment.
Some instructions require a segment selector to be specified explicitly. In these cases, the 16-bit
segment selector can be located in a memory location or in a 16-bit register. For example, the
following MOV instruction moves a segment selector located in register BX into segment
register DS:
MOV DS, BX
Segment selectors can also be specified explicitly as part of a 48-bit far pointer in memory. Here,
the first doubleword in memory contains the offset and the next word contains the segment
selector.
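The in-memory layout of a 48-bit far pointer described above (offset doubleword first, selector word next) can be sketched with Python's `struct` module. The function names and the sample selector and offset values are assumptions chosen for the example.

```python
import struct


def pack_far_pointer(selector, offset):
    """Lay out a far pointer as it appears in memory: 32-bit offset, then 16-bit selector."""
    return struct.pack("<IH", offset, selector)   # little-endian, no padding


def unpack_far_pointer(raw):
    """Return (selector, offset) from six bytes of memory."""
    offset, selector = struct.unpack("<IH", raw)
    return selector, offset


raw = pack_far_pointer(0x0010, 0x12345678)
print(raw.hex())
```

The six bytes appear with the low byte of the offset at the lowest address and the selector in the last two bytes.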
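[Figure: offset (effective address) computation]
Offset = Base + (Index * Scale) + Displacement
Base: EAX, EBX, ECX, EDX, ESP, EBP, ESI, or EDI
Index: EAX, EBX, ECX, EDX, EBP, ESI, or EDI
Scale: 1, 2, 4, or 8
Displacement: none, 8-bit, 16-bit, or 32-bit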
The uses of general-purpose registers as base or index components are restricted in the following
manner:
• The ESP register cannot be used as an index register.
• When the ESP or EBP register is used as the base, the SS segment is the default segment.
In all other cases, the DS segment is the default segment.
The base, index, and displacement components can be used in any combination, and any of these
components can be null. A scale factor may be used only when an index also is used. Each
possible combination is useful for data structures commonly used by programmers in high-level
languages and assembly language. The following addressing modes suggest uses for common
combinations of address components.
Displacement
A displacement alone represents a direct (uncomputed) offset to the operand. Because the
displacement is encoded in the instruction, this form of an address is sometimes called an absolute
or static address. It is commonly used to access a statically allocated scalar operand.
Base
A base alone represents an indirect offset to the operand. Since the value in the base register can
change, it can be used for dynamic storage of variables and data structures.
Base + Displacement
A base register and a displacement can be used together for two distinct purposes:
• As an index into an array when the element size is not 2, 4, or 8 bytes—The displacement
component encodes the static offset to the beginning of the array. The base register holds
the results of a calculation to determine the offset to a specific element within the array.
• To access a field of a record—The base register holds the address of the beginning of the
record, while the displacement is a static offset to the field.
An important special case of this combination is access to parameters in a procedure activation
record. A procedure activation record is the stack frame created when a procedure is entered.
Here, the EBP register is the best choice for the base register, because it automatically selects
the stack segment. This is a compact encoding for this common function.
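The offset computation and the register restrictions described in this section can be summarized in a short Python sketch. The function names, the register dictionary, and the sample values are assumptions made for illustration; the scale factors 1, 2, 4, and 8 are the ones the 32-bit addressing forms accept.

```python
VALID_SCALES = (1, 2, 4, 8)


def effective_address(regs, base=None, index=None, scale=1, disp=0):
    """Compute Offset = Base + (Index * Scale) + Displacement, modulo 2^32."""
    if index is None and scale != 1:
        raise ValueError("a scale factor may be used only with an index")
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4, or 8")
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    offset = disp
    if base is not None:
        offset += regs[base]
    if index is not None:
        offset += regs[index] * scale
    return offset & 0xFFFFFFFF


def default_segment(base=None):
    """SS is the default segment when ESP or EBP is the base; otherwise DS."""
    return "SS" if base in ("ESP", "EBP") else "DS"


regs = {"EBX": 0x1000, "ESI": 3, "EBP": 0x2000}
print(hex(effective_address(regs, base="EBX", index="ESI", scale=4, disp=8)))
print(default_segment("EBP"))
```

The last case in the usage lines corresponds to the activation-record access described above: EBP as the base selects the stack segment by default.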
CHAPTER 6
INSTRUCTION SET SUMMARY
This chapter lists all the instructions in the Intel Architecture instruction set, divided into three
functional groups: integer, floating-point, and system. It also briefly describes each of the integer
instructions.
Brief descriptions of the floating-point instructions are given in Chapter 7, Floating-Point Unit;
brief descriptions of the system instructions are given in the Intel Architecture Software
Developer’s Manual, Volume 3.
Detailed descriptions of all the Intel Architecture instructions are given in Intel Architecture
Software Developer’s Manual, Volume 2. Included in this volume are a description of each
instruction’s encoding and operation, the effect of an instruction on the EFLAGS flags, and the
exceptions an instruction may generate.
Volume 2). (This instruction is also available in all Pentium® processors that implement the
MMX™ technology.)
• UD2—Undefined instruction (see Section 6.15.4., “No-Operation and Undefined Instructions”).
JO Jump if overflow
JNO Jump if not overflow
JS Jump if sign (negative)
JNS Jump if not sign (non-negative)
JPO/JNP Jump if parity odd/Jump if not parity
JPE/JP Jump if parity even/Jump if parity
JCXZ/JECXZ Jump register CX zero/Jump register ECX zero
LOOP Loop with ECX counter
LOOPZ/LOOPE Loop with ECX and zero/Loop with ECX and equal
LOOPNZ/LOOPNE Loop with ECX and not zero/Loop with ECX and not equal
CALL Call procedure
RET Return
IRET Return from interrupt
INT Software interrupt
INTO Interrupt on overflow
BOUND Detect value out of range
ENTER High-level procedure entry
LEAVE High-level procedure exit
6.2.3.3. COMPARISON
FCOM Compare real
FCOMP Compare real and pop
FCOMPP Compare real and pop twice
FUCOM Unordered compare real
FUCOMP Unordered compare real and pop
FUCOMPP Unordered compare real and pop twice
FICOM Compare integer
FICOMP Compare integer and pop
FCOMI Compare real and set EFLAGS
FUCOMI Unordered compare real and set EFLAGS
FCOMIP Compare real, set EFLAGS, and pop
FUCOMIP Unordered compare real, set EFLAGS, and pop
FTST Test real
FXAM Examine real
6.2.3.4. TRANSCENDENTAL
FSIN Sine
FCOS Cosine
FSINCOS Sine and cosine
FPTAN Partial tangent
FPATAN Partial arctangent
F2XM1 2^x − 1
FYL2X y ∗ log2(x)
FYL2XP1 y ∗ log2(x + 1)
FLDPI Load π
FLDL2E Load log2(e)
FLDLN2 Load loge(2)
FLDL2T Load log2(10)
FLDLG2 Load log10(2)
Table 6-4 shows the mnemonics for the CMOVcc instructions and the conditions being tested for
each instruction. The condition code mnemonics are appended to the letters “CMOV” to form the
mnemonics for the CMOVcc instructions. The instructions listed in Table 6-4 as pairs (for
example, CMOVA/CMOVNBE) are alternate names for the same instruction. The assembler
provides these alternate names to make it easier to read program listings.
The CMOVcc instructions are useful for optimizing small IF constructions. They also help eliminate
branching overhead for IF statements and the possibility of branch mispredictions by the
processor.
These instructions may not be supported on some processors in the Pentium Pro processor family.
Software can check if the CMOVcc instructions are supported by checking the processor’s
feature information with the CPUID instruction (see “CPUID—CPU Identification” in Chapter
3 of the Intel Architecture Software Developer’s Manual, Volume 2).
value as before. The BSWAP instruction is useful for converting between “big-endian” and
“little-endian” data formats. This instruction also speeds execution of decimal arithmetic. (The
XCHG instruction can be used to swap the bytes in a word.)
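The byte reversal performed by BSWAP, and the two-byte swap that XCHG can perform on a word, can be sketched in Python. The function names are introduced here for illustration only.

```python
def bswap32(value):
    """Reverse the byte order of a 32-bit value, as BSWAP does for a doubleword register."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def swap_word_bytes(word):
    """Exchange the two bytes of a 16-bit word (the effect of XCHG on AL and AH)."""
    word &= 0xFFFF
    return ((word & 0xFF) << 8) | (word >> 8)


print(hex(bswap32(0x12345678)))
print(hex(swap_word_bytes(0x1234)))
```

Applying `bswap32` twice returns the original value, which is why the same operation converts in both directions between big-endian and little-endian formats.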
The XADD (exchange and add) instruction swaps two operands and then stores the sum of the
two operands in the destination operand. The status flags in the EFLAGS register indicate the
result of the addition. This instruction can be combined with the LOCK prefix (see
“LOCK—Assert LOCK# Signal Prefix” in Chapter 3 of the Intel Architecture Software
Developer’s Manual, Volume 2) in a multiprocessing system to allow multiple processors to execute
one DO loop.
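The effect of XADD on its two operands can be sketched as follows. The `xadd` function and the shared loop counter scenario are illustrative assumptions; atomicity, which the LOCK prefix provides on real hardware, is not modeled.

```python
def xadd(dest, src, bits=32):
    """Return (new_dest, new_src): dest receives the sum, src receives the old dest."""
    mask = (1 << bits) - 1
    old_dest = dest & mask
    return (old_dest + src) & mask, old_dest


# Two hypothetical processors claim successive iterations from a shared counter.
counter = 0
counter, first_claim = xadd(counter, 1)
counter, second_claim = xadd(counter, 1)
print(first_claim, second_claim, counter)
```

Each caller receives the value the counter held before its own addition, so no two callers obtain the same iteration number when the operation is performed atomically.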
The CMPXCHG (compare and exchange) and CMPXCHG8B (compare and exchange 8 bytes)
instructions are used to synchronize operations in systems that use multiple processors. The
CMPXCHG instruction requires three operands: a source operand in a register, another source
operand in the EAX register, and a destination operand. If the values contained in the destination
operand and the EAX register are equal, the destination operand is replaced with the value of the
other source operand (the value not in the EAX register). Otherwise, the original value of the
destination operand is loaded in the EAX register. The status flags in the EFLAGS register
reflect the result that would have been obtained by subtracting the destination operand from the
value in the EAX register.
The CMPXCHG instruction is commonly used for testing and modifying semaphores. It checks
to see if a semaphore is free. If the semaphore is free it is marked allocated, otherwise it gets the
ID of the current owner. This is all done in one uninterruptible operation. In a single-processor
system, the CMPXCHG instruction eliminates the need to switch to protection level 0 (to disable
interrupts) before executing multiple instructions to test and modify a semaphore. For multiple
processor systems, CMPXCHG can be combined with the LOCK prefix to perform the compare
and exchange operation atomically. (See “Locked Atomic Operations” in Chapter 7 of the Intel
Architecture Software Developer’s Manual, Volume 3, for more information on atomic operations.)
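The compare-and-exchange behavior and its use on a semaphore can be sketched in Python. The `cmpxchg` and `try_acquire` names, and the convention that 0 means "free", are assumptions made for the example; the sketch models the data movement only, not atomicity.

```python
def cmpxchg(dest, eax, src):
    """Return (new_dest, new_eax, zf) following the CMPXCHG description."""
    if dest == eax:
        return src, eax, True       # equal: destination replaced by the source operand
    return dest, dest, False        # not equal: EAX receives the destination value


def try_acquire(semaphore, owner_id, free=0):
    """Attempt to mark a free semaphore as owned; report the current owner on failure."""
    new_semaphore, eax, acquired = cmpxchg(semaphore, free, owner_id)
    return new_semaphore, acquired, eax


print(try_acquire(0, 7))   # free semaphore: acquired by owner 7
print(try_acquire(3, 7))   # held by owner 3: not acquired, EAX reports 3
```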
The CMPXCHG8B instruction also requires three operands: a 64-bit value in EDX:EAX, a
64-bit value in ECX:EBX, and a destination operand in memory. The instruction compares the
64-bit value in the EDX:EAX registers with the destination operand. If they are equal, the 64-bit
value in the ECX:EBX register is stored in the destination operand. If the EDX:EAX register
and the destination are not equal, the destination is loaded in the EDX:EAX register. The
CMPXCHG8B instruction can be combined with the LOCK prefix to perform the operation
atomically.
[Figure: stack before and after pushing a doubleword. Before the push, ESP points to address n; afterward, the doubleword value is stored at n-4 and ESP points to it.]
The PUSHA instruction saves the contents of the eight general-purpose registers on the stack
(see Figure 6-2). This instruction simplifies procedure calls by reducing the number of instructions
required to save the contents of the general-purpose registers. The registers are pushed on
the stack in the following order: EAX, ECX, EDX, EBX, the initial value of ESP before EAX
was pushed, EBP, ESI, and EDI.
[Figure 6-2: stack before and after PUSHA. The registers occupy successive doublewords at descending addresses in the order EAX, ECX, EDX, EBX, old ESP, EBP, ESI, EDI; afterward, ESP points to the doubleword holding EDI.]
The POP instruction copies the word or doubleword at the current top of stack (indicated by the
ESP register) to the location specified with the destination operand, and then increments the ESP
register to point to the new top of stack (see Figure 6-3). The destination operand may specify a
general-purpose register, a segment register, or a memory location.
[Figure 6-3: stack before and after popping a doubleword. Before the pop, ESP points to the doubleword value; afterward, ESP points 4 bytes higher.]
The POPA instruction reverses the effect of the PUSHA instruction. It pops the top eight words
or doublewords from the top of the stack into the general-purpose registers, except for the ESP
register (see Figure 6-4). If the operand-size attribute is 32, the doublewords on the stack are
transferred to the registers in the following order: EDI, ESI, EBP, ignore doubleword, EBX,
EDX, ECX, and EAX. The ESP register is restored by the action of popping the stack. If the
operand-size attribute is 16, the words on the stack are transferred to the registers in the
following order: DI, SI, BP, ignore word, BX, DX, CX, and AX.
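The push and pop orders given for PUSHA and POPA can be traced with a small simulation. The dictionary-based register file and memory, and the function names, are illustrative assumptions; only the 32-bit operand size is modeled.

```python
PUSHA_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def pusha(regs, mem):
    """Push the eight general-purpose registers; the ESP slot holds the initial ESP."""
    initial_esp = regs["ESP"]
    for name in PUSHA_ORDER:
        value = initial_esp if name == "ESP" else regs[name]
        regs["ESP"] -= 4
        mem[regs["ESP"]] = value


def popa(regs, mem):
    """Pop in reverse order; the doubleword in the ESP slot is read and ignored."""
    for name in reversed(PUSHA_ORDER):
        value = mem[regs["ESP"]]
        regs["ESP"] += 4
        if name != "ESP":
            regs[name] = value


regs = {"EAX": 1, "ECX": 2, "EDX": 3, "EBX": 4,
        "ESP": 0x1000, "EBP": 5, "ESI": 6, "EDI": 7}
mem = {}
pusha(regs, mem)
print(hex(regs["ESP"]))
popa(regs, mem)
print(hex(regs["ESP"]))
```

ESP returns to its starting value purely through the eight pops, which is why the saved ESP doubleword can be skipped.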
[Figure 6-4: stack before and after POPA. The stack holds, from the highest address down, EAX, ECX, EDX, EBX, an ignored doubleword, EBP, ESI, and EDI; ESP points to the EDI doubleword before the instruction and above the EAX doubleword afterward.]
[Figure: sign extension. Before: a 16-bit value with sign bit S in bit 15 and value bits N in bits 14-0. After: a 32-bit value in which bits 31-15 all hold S, followed by the original 15 N bits.]
The CWD instruction copies the sign (bit 15) of the word in the AX register into every bit position
in the DX register. The CDQ instruction copies the sign (bit 31) of the doubleword in the
EAX register into every bit position in the EDX register. The CWD instruction can be used to
produce a doubleword dividend from a word before a word division, and the CDQ instruction
can be used to produce a quadword dividend from a doubleword before doubleword division.
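The sign replication performed by CWD and CDQ can be sketched in Python. The function names are introduced for illustration; each returns the register pair as a tuple (high register, low register).

```python
def cwd(ax):
    """Return (DX, AX): DX is filled with copies of bit 15 of AX."""
    ax &= 0xFFFF
    return (0xFFFF if ax & 0x8000 else 0x0000), ax


def cdq(eax):
    """Return (EDX, EAX): EDX is filled with copies of bit 31 of EAX."""
    eax &= 0xFFFFFFFF
    return (0xFFFFFFFF if eax & 0x80000000 else 0x00000000), eax


print([hex(v) for v in cwd(0x8000)])
print([hex(v) for v in cdq(0xFFFFFFFE)])
```

Taken as a pair, EDX:EAX after `cdq(0xFFFFFFFE)` is FFFFFFFFFFFFFFFEH, the quadword representation of –2, which is the form a signed doubleword division expects for its dividend.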
[Figure: SHL/SAL example]
Initial state:          CF = X   operand = 1000 1000 1000 1000 1000 1000 1000 1111
After 1-bit SHL/SAL:    CF = 1   operand = 0001 0001 0001 0001 0001 0001 0001 1110
After 10-bit SHL/SAL:   CF = 0   operand = 0010 0010 0010 0010 0011 1100 0000 0000
The SHR instruction shifts the source operand right by from 1 to 31 bit positions (see Figure
6-7). As with the SHL/SAL instruction, the empty bit positions are cleared and the CF flag is
loaded with the last bit shifted out of the operand.
The SAR instruction shifts the source operand right by from 1 to 31 bit positions (see Figure
6-8). This instruction differs from the SHR instruction in that it preserves the sign of the source
operand by clearing empty bit positions if the operand is positive or setting the empty bits if the
operand is negative. Again, the CF flag is loaded with the last bit shifted out of the operand.
The SAR and SHR instructions can also be used to perform division by powers of 2 (see
“SAL/SAR/SHL/SHR—Shift Instructions” in Chapter 3 of the Intel Architecture Software
Developer’s Manual, Volume 2).
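The difference between SHR and SAR can be seen in a short Python sketch. The function names are illustrative, only 32-bit operands and shift counts from 1 to 31 are modeled, and each function returns the result together with the CF value.

```python
MASK32 = 0xFFFFFFFF


def shr(value, count):
    """Logical right shift: vacated bits are cleared; CF is the last bit shifted out."""
    value &= MASK32
    return value >> count, (value >> (count - 1)) & 1


def sar(value, count):
    """Arithmetic right shift: vacated bits are filled with copies of the sign bit."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32, (value >> (count - 1)) & 1


print([hex(v) for v in shr(0x80000000, 1)])
print([hex(v) for v in sar(0x80000000, 1)])
print([hex(v) for v in sar(0xFFFFFFF9, 1)])   # -7 shifted right by 1
```

The last line shifts –7 and yields FFFFFFFCH (–4): an arithmetic shift rounds toward negative infinity, whereas a signed division by 2 would truncate toward zero and give –3.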
[Figure: SAR examples for a positive and a negative operand; the bit patterns are not recoverable.]
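[Figure: SHLD and SHRD instructions. SHLD: bits from the source (register) enter the low end of the destination (memory or register), and the last bit shifted out of the destination goes to CF. SHRD: bits from the source (register) enter the high end of the destination (memory or register), and the last bit shifted out of the destination goes to CF.]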
The SHLD instruction shifts the bits in the destination operand to the left and fills the empty bit
positions (in the destination operand) with bits shifted out of the source operand. The destination
and source operands must be the same length (either words or doublewords). The shift count can
range from 0 to 31 bits. The result of this shift operation is stored in the destination operand, and
the source operand is not modified. The CF flag is loaded with the last bit shifted out of the
destination operand.
The SHRD instruction operates the same as the SHLD instruction except bits are shifted to the
right in the destination operand, with the empty bit positions filled with bits shifted out of the
source operand.
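The double-shift operations can be sketched in Python for 32-bit operands. The function names are illustrative. A shift count of 0 leaves the destination and the flags unchanged and is not modeled here; the sketch accepts counts from 1 to 31, with SHLD shifting toward the more significant end and SHRD toward the less significant end.

```python
MASK32 = 0xFFFFFFFF


def shld(dest, src, count):
    """Shift dest left; the vacated low bits come from the high end of src."""
    if not 1 <= count <= 31:
        raise ValueError("this sketch models counts 1 through 31 only")
    dest &= MASK32
    src &= MASK32
    result = ((dest << count) | (src >> (32 - count))) & MASK32
    cf = (dest >> (32 - count)) & 1          # last bit shifted out of dest
    return result, cf


def shrd(dest, src, count):
    """Shift dest right; the vacated high bits come from the low end of src."""
    if not 1 <= count <= 31:
        raise ValueError("this sketch models counts 1 through 31 only")
    dest &= MASK32
    src &= MASK32
    result = ((dest >> count) | (src << (32 - count))) & MASK32
    cf = (dest >> (count - 1)) & 1           # last bit shifted out of dest
    return result, cf


print([hex(v) for v in shld(0x12345678, 0x9ABCDEF0, 8)])
print([hex(v) for v in shrd(0x12345678, 0x9ABCDEF0, 8)])
```

The source value is only read, which matches the statement that the source operand is not modified.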
[Figure: ROL, ROR, RCL, and RCR instructions. Each diagram shows the destination operand (memory or register, bits 31-0) and the CF flag.]
The ROL instruction rotates the bits in the operand to the left (toward more significant bit locations).
The ROR instruction rotates the operand right (toward less significant bit locations).
The RCL instruction rotates the bits in the operand to the left, through the CF flag. This instruction
treats the CF flag as a one-bit extension on the upper end of the operand. Each bit which
exits from the most significant bit location of the operand moves into the CF flag. At the same
time, the bit in the CF flag enters the least significant bit location of the operand.
The RCR instruction rotates the bits in the operand to the right through the CF flag.
For all the rotate instructions, the CF flag always contains the value of the last bit rotated out of
the operand, even if the instruction does not use the CF flag as an extension of the operand. The
value of this flag can then be tested by a conditional jump instruction (JC or JNC).
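The rotate behavior described above can be sketched in Python. The function names are illustrative, the operand width defaults to 32 bits, and rotate counts from 1 to 31 are assumed. Each function returns the result and the resulting CF value.

```python
def rol(value, count, bits=32):
    """Rotate left; CF receives the last bit rotated out of the most significant end."""
    mask = (1 << bits) - 1
    value &= mask
    result = ((value << count) | (value >> (bits - count))) & mask
    return result, result & 1


def ror(value, count, bits=32):
    """Rotate right; CF receives the last bit rotated out of the least significant end."""
    mask = (1 << bits) - 1
    value &= mask
    result = ((value >> count) | (value << (bits - count))) & mask
    return result, (result >> (bits - 1)) & 1


def rcl(value, cf, count=1, bits=32):
    """Rotate left through CF, treating CF as a one-bit extension above the operand."""
    mask = (1 << bits) - 1
    value &= mask
    for _ in range(count):
        msb = (value >> (bits - 1)) & 1
        value = ((value << 1) | cf) & mask   # old CF enters the least significant bit
        cf = msb                             # bit leaving the top moves into CF
    return value, cf


print([hex(v) for v in rol(0x80000001, 1)])
print([hex(v) for v in rcl(0x80000000, 0)])
```

With ROL the bit leaving the top reappears at the bottom of the operand; with RCL it is held in CF and reappears one rotation later.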
The destination operand specifies a relative address (a signed offset with respect to the address
in the EIP register) that points to an instruction in the current code segment. The Jcc instructions
do not support far transfers; however, far transfers can be accomplished with a combination of
a Jcc and a JMP instruction (see “Jcc—Jump if Condition Is Met” in Chapter 3 of the Intel
Architecture Software Developer’s Manual, Volume 2).
Table 6-4 shows the mnemonics for the Jcc instructions and the conditions being tested for each
instruction. The condition code mnemonics are appended to the letter “J” to form the mnemonic
for a Jcc instruction. The instructions are divided into two groups: unsigned and signed conditional
jumps. These groups correspond to the results of operations performed on unsigned and
signed integers, respectively. Those instructions listed as pairs (for example, JA/JNBE) are alternate
names for the same instruction. The assembler provides these alternate names to make it
easier to read program listings.
The JCXZ and JECXZ instructions test the CX and ECX registers, respectively, instead of one
or more status flags. See Section 6.9.2.3., “Jump If Zero Instructions” for more information
about these instructions.
instructions decrement the contents of the ECX register before testing for zero. If the value in
the ECX register is zero initially, it will be decremented to FFFFFFFFH on the first loop instruction,
causing the loop to be executed 2^32 times. To prevent this problem, a JECXZ instruction
can be inserted at the beginning of the code block for the loop, causing a jump out of the loop if the
ECX register count is initially zero. When used with repeated string scan and compare instructions,
the JECXZ instruction can determine whether the loop terminated because the count
reached zero or because the scan or compare conditions were satisfied.
The JCXZ (jump if CX is zero) instruction operates the same as the JECXZ instruction when the
16-bit address-size attribute is used. Here, the CX register is tested for zero.
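The wraparound behavior of a zero count, and the effect of guarding the loop with a jump-if-zero test, can be demonstrated with a scaled-down model. The function name and the `bits` parameter are assumptions made for illustration; an 8-bit counter is used in the usage lines so the example finishes quickly, but the same logic gives 2^32 iterations for a 32-bit counter.

```python
def count_loop_iterations(count, bits=32, guard_zero=False):
    """Count how many times a loop body runs when a LOOP-style instruction closes it."""
    mask = (1 << bits) - 1
    count &= mask
    if guard_zero and count == 0:        # JECXZ-style test before entering the loop
        return 0
    iterations = 0
    while True:
        iterations += 1                  # loop body executes
        count = (count - 1) & mask       # counter is decremented before the zero test
        if count == 0:
            break
    return iterations


print(count_loop_iterations(3, bits=8))
print(count_loop_iterations(0, bits=8))
print(count_loop_iterations(0, bits=8, guard_zero=True))
```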
source and destination strings can be located in the same segment. (This latter condition can also
be achieved by loading the DS and ES segment registers with the same segment selector and
allowing the ESI register to default to the DS register.)
The MOVS instruction moves the string element addressed by the ESI register to the location
addressed by the EDI register. The assembler recognizes three “short forms” of this instruction,
which specify the size of the string to be moved: MOVSB (move byte string), MOVSW (move
word string), and MOVSD (move doubleword string).
The CMPS instruction subtracts the destination string element from the source string element
and updates the status flags (CF, ZF, OF, SF, PF, and AF) in the EFLAGS register according to
the results. Neither string element is written back to memory. The assembler recognizes three
“short forms” of the CMPS instruction: CMPSB (compare byte strings), CMPSW (compare
word strings), and CMPSD (compare doubleword strings).
The SCAS instruction subtracts the destination string element from the contents of the EAX,
AX, or AL register (depending on operand length) and updates the status flags according to the
results. The string element and register contents are not modified. The following “short forms”
of the SCAS instruction specify the operand length: SCASB (scan byte string), SCASW (scan
word string), and SCASD (scan doubleword string).
The LODS instruction loads the source string element identified by the ESI register into the
EAX register (for a doubleword string), the AX register (for a word string), or the AL register
(for a byte string). The “short forms” for this instruction are LODSB (load byte string), LODSW
(load word string), and LODSD (load doubleword string). This instruction is usually used in a
loop, where other instructions process each element of the string after it is loaded into the
target register.
The STOS instruction stores the source string element from the EAX (doubleword string), AX
(word string), or AL (byte string) register into the memory location identified with the EDI
register. The “short forms” for this instruction are STOSB (store byte string), STOSW (store
word string), and STOSD (store doubleword string). This instruction is also normally used in a
loop. Here a string is commonly loaded into the register with a LODS instruction, operated
on by other instructions, and then stored again in memory with a STOS instruction.
The I/O instructions (see Section 6.11., “I/O Instructions”) also perform operations on strings in
memory.
the EFLAGS register controls whether the registers are incremented (DF=0) or decremented
(DF=1). The STD and CLD instructions set and clear this flag, respectively.
The following repeat prefixes can be used in conjunction with a count in the ECX register to
cause a string instruction to repeat:
• REP—Repeat while the ECX register is not zero.
• REPE/REPZ—Repeat while the ECX register is not zero and the ZF flag is set.
• REPNE/REPNZ—Repeat while the ECX register is not zero and the ZF flag is clear.
When a string instruction has a repeat prefix, the operation executes until one of the termination
conditions specified by the prefix is satisfied. The REPE/REPZ and REPNE/REPNZ prefixes
are used only with the CMPS and SCAS instructions. Also, note that a REP STOS instruction
is the fastest way to initialize a large block of memory.
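The interaction of the repeat prefixes, the count register, and the DF flag can be sketched for byte strings in Python. The function names and the `bytearray` memory model are illustrative assumptions. In `repne_scasb`, the returned flag is reported as clear when the count is initially zero, which is a simplification: the processor leaves ZF unchanged in that case.

```python
def rep_movsb(mem, esi, edi, ecx, df=0):
    """REP MOVSB: copy ECX bytes from [ESI] to [EDI], stepping by +1 (DF=0) or -1 (DF=1)."""
    step = -1 if df else 1
    while ecx != 0:
        mem[edi] = mem[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx


def repne_scasb(mem, edi, ecx, al, df=0):
    """REPNE SCASB: scan while ECX is not zero and the element differs from AL."""
    step = -1 if df else 1
    zf = False
    while ecx != 0:
        zf = mem[edi] == al              # compare the string element with AL
        edi += step
        ecx -= 1
        if zf:                           # REPNE stops once ZF is set
            break
    return edi, ecx, zf


mem = bytearray(b"hello" + bytes(11))
print(rep_movsb(mem, 0, 8, 5), bytes(mem[8:13]))
print(repne_scasb(bytearray(b"abcdef"), 0, 6, ord("d")))
```

After the scan, the returned ZF value distinguishes the two ways the repetition can end: ZF set means a match was found, and ZF clear with a zero count means the string was exhausted.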
PUSHFD/POPFD: bits 31-0; PUSHF/POPF: bits 15-0

Bit(s)   Flag
31-22    0 (reserved)
21       ID
20       VIP
19       VIF
18       AC
17       VM
16       RF
15       0
14       NT
13-12    IOPL
11       OF
10       DF
9        IF
8        TF
7        SF
6        ZF
5        0
4        AF
3        0
2        PF
1        1
0        CF
Figure 6-11. Flags Affected by the PUSHF, POPF, PUSHFD, and POPFD instructions
The POPF instruction pops a word from the stack into the EFLAGS register. Only bits 11, 10,
8, 7, 6, 4, 2, and 0 of the EFLAGS register are affected with all uses of this instruction. If the
current privilege level (CPL) of the current code segment is 0 (most privileged), the IOPL bits
(bits 13 and 12) also are affected. If the I/O privilege level (IOPL) is greater than or equal to the
CPL, numerically, the IF flag (bit 9) also is affected.
The POPFD instruction pops a doubleword into the EFLAGS register. This instruction can
change the state of the AC bit (bit 18) and the ID bit (bit 21), as well as the bits affected by a
POPF instruction. The restrictions for changing the IOPL bits and the IF flag that were given for
the POPF instruction also apply to the POPFD instruction.
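The privilege rules above amount to computing a mask of the EFLAGS bits that a pop may change. The sketch below restates those rules in Python; the function name and the doubleword parameter are hypothetical, and operating modes not covered by the text above are left out.

```python
# Bits affected by every use of POPF, as listed in the text above.
BASE_BITS = (11, 10, 8, 7, 6, 4, 2, 0)

def pop_flags_writable_mask(cpl, iopl, doubleword=False):
    """Return a mask of the EFLAGS bits that POPF (or POPFD) may change."""
    mask = 0
    for bit in BASE_BITS:
        mask |= 1 << bit
    if cpl == 0:                    # most privileged: IOPL bits 13 and 12
        mask |= 0b11 << 12
    if iopl >= cpl:                 # numeric comparison: IF flag, bit 9
        mask |= 1 << 9
    if doubleword:                  # POPFD adds AC (bit 18) and ID (bit 21)
        mask |= (1 << 18) | (1 << 21)
    return mask

user_mask = pop_flags_writable_mask(cpl=3, iopl=0)
kernel_mask = pop_flags_writable_mask(cpl=0, iopl=0)
```

With CPL 3 and IOPL 0, neither the IOPL bits nor the IF flag appear in the mask, so an application's attempt to change them through POPF has no effect on those bits.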
The POP and MOV instructions cannot place a value in the CS register. Only the far control-
transfer versions of the JMP, CALL, and RET instructions (see Section 6.14.2., “Far Control
Transfer Instructions”) affect the CS register directly.
registers before the execution of string instructions or for initializing the EBX register before an
XLAT instruction.
CHAPTER 7
FLOATING-POINT UNIT
$$\frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$$
If a does not equal 0, the formula is numerically unstable when the roots are nearly coincident
or when their magnitudes are wildly different. The formula is also vulnerable to spurious
over/underflows when the coefficients a, b, and c are all very big or all very tiny. When single-
precision (4-byte) floating-point coefficients are given as data and the formula is evaluated in the
FPU's normal way, keeping all intermediate results in its stack, the FPU produces impeccable
single-precision roots. This happens because, by default and with no effort on the programmer's
part, the FPU evaluates all those sub-expressions with so much extra precision and range as to
overwhelm almost any threat to numerical integrity.
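The effect of wider intermediate results can be imitated on any machine. The sketch below is an illustration under stated assumptions, not a model of the FPU: Python's 64-bit floats stand in for the wider internal format (the FPU itself uses extended precision), to_single is a hypothetical helper that rounds a value to single precision, and the coefficients are chosen so that the roots differ greatly in magnitude.

```python
import math
import struct

def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def small_root_narrow(a, b, c):
    """Evaluate the formula, rounding every intermediate result to single."""
    b2 = to_single(b * b)
    four_ac = to_single(to_single(4.0 * a) * c)
    disc = to_single(b2 - four_ac)
    root = to_single(math.sqrt(disc))
    return to_single(to_single(-b + root) / to_single(2.0 * a))

def small_root_wide(a, b, c):
    """Keep intermediates in the wider format; round only the final result."""
    disc = b * b - 4.0 * a * c
    return to_single((-b + math.sqrt(disc)) / (2.0 * a))

a, b, c = 1.0, 10000.0, 1.0     # roots near -1e-4 and -1e4
narrow = small_root_narrow(a, b, c)
wide = small_root_wide(a, b, c)
```

In the narrow evaluation, b*b - 4ac rounds back to b*b, the numerator cancels to zero, and the small root is lost entirely. The wide evaluation returns a value close to -1.0e-4.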
If double-precision data and results were at issue, a better formula would have to be used, and
once again the FPU's default evaluation of that formula would provide substantially enhanced
numerical integrity over mere double-precision evaluation.
On most machines, straightforward algorithms will not deliver consistently correct results (and
will not indicate when they are incorrect). To obtain correct results on traditional machines under
all conditions usually requires sophisticated numerical techniques that go beyond typical
programming practice. General application programmers using straightforward algorithms will
produce much more reliable programs using the Intel architectures. This simple fact greatly
reduces the software investment required to develop safe, accurate computation-based products.
Beyond traditional numeric support for scientific applications, the Intel architectures have
built-in facilities for commercial computing. They can process decimal numbers of up to 18 digits
without round-off errors, performing exact arithmetic on integers as large as 2^64 (or 10^18).
Exact arithmetic is vital in accounting applications where rounding errors may introduce mone-
tary losses that cannot be reconciled.
The Intel FPUs contain a number of optional numerical facilities that can be invoked by
sophisticated users. These advanced features include directed rounding, gradual underflow, and
programmed exception-handling facilities.
These automatic exception-handling facilities permit a high degree of flexibility in numeric
processing software, without burdening the programmer. While performing numeric calculations,
the processor automatically detects exception conditions that can potentially damage a
calculation (for example, X ÷ 0 or √X when X < 0). By default, on-chip exception logic handles
these exceptions so that a reasonable result is produced and execution may proceed without
program interruption. Alternatively, the processor can invoke a software exception handler to
provide special results whenever various types of exceptions are detected.
[Figure 7-1: number-line illustration. Surviving labels: +10; 10.0000000000000000000000;
1.11111111111111111111111; Precision 24 Binary Digits.]
Because the size and number of registers that any computer can have is limited, only a subset of
the real-number continuum can be used in real-number calculations. As shown at the bottom of
Figure 7-1, the subset of real numbers that a particular FPU supports represents an approxima-
tion of the real number system. The range and precision of this real-number subset is determined
by the format that the FPU uses to represent real numbers.
parts: a sign, a significand, and an exponent. Figure 7-2 shows the binary floating-point format
that the Intel Architecture FPU uses. This format conforms to the IEEE standard.
The sign is a binary value that indicates whether the number is positive (0) or negative (1).
The significand has two parts: a 1-bit binary integer (also referred to as the J-bit) and a binary
fraction. The J-bit is often not represented, but instead is an implied value. The exponent is a
binary integer that represents the power of 2 by which the significand is multiplied.
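[Figure 7-2: binary floating-point format. Fields: Sign, Exponent, and Significand; the
significand consists of the Integer (or J-Bit) and the Fraction.]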
Table 7-1 shows how the real number 178.125 (in ordinary decimal format) is stored in floating-
point format. The table lists a progression of real number notations that leads to the single-real,
32-bit floating-point format (which is one of the floating-point formats that the FPU supports).
In this format, the significand is normalized (see Section 7.2.2.1., “Normalized Numbers”) and
the exponent is biased (see Section 7.2.2.2., “Biased Exponent”). For the single-real format, the
biasing constant is +127.
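The same progression can be checked mechanically. The following Python sketch (the helper name decode_single is an assumption) packs 178.125 as a 32-bit single-real value and splits the result into its sign, biased exponent, and fraction fields.

```python
import struct

def decode_single(value):
    """Split a value, stored in single-real format, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF      # 23 bits; the integer (J) bit is implied
    return sign, biased_exponent, fraction

sign, biased_exponent, fraction = decode_single(178.125)
true_exponent = biased_exponent - 127   # remove the single-real bias

# 178.125 = 10110010.001B = 1.0110010001B x 2^7
print(sign, biased_exponent, format(fraction, "023b"))
```

The fraction field holds only the bits to the right of the binary point of the normalized significand, which is why the leading 1 does not appear in the stored value.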
Representing numbers in normalized form maximizes the number of significant digits that can
be accommodated in a significand of a given width. To summarize, a normalized real number
consists of a normalized significand that represents a real number between 1 and 2 and an expo-
nent that specifies the number’s binary point.
Figure 7-3 shows how the encodings for these numbers and non-numbers fit into the real number
continuum. The encodings shown here are for the IEEE single-precision (32-bit) format, where
the term “S” indicates the sign bit, “E” the biased exponent, and “F” the fraction. (The exponent
values are given in decimal.)
The FPU can operate on and/or return any of these values, depending on the type of computation
being performed. The following sections describe these number and non-number classes.
[Figure 7-3: number line showing, from left to right: NaN, −∞, −Normalized Finite,
−Denormalized Finite, −0, +0, +Denormalized Finite, +Normalized Finite, +∞, NaN.
Encodings that survive from the figure:]
Class                    S    E          F
−Denormalized Finite     1    0          0.XXX (note 2)
+Denormalized Finite     0    0          0.XXX (note 2)
−Normalized Finite       1    1...254    Any Value
+Normalized Finite       0    1...254    Any Value
−∞                       1    255        0
+∞                       0    255        0
NOTES:
1. Sign bit ignored.
2. Fractions must be non-zero.
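The encodings in the figure translate directly into a classification routine. The sketch below is illustrative only; the function name is hypothetical, and the zero case (exponent 0 with fraction 0) follows the IEEE single-precision format rather than any row that survives in the figure above.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision encoding by its S, E, and F fields."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 255:
        # Maximum exponent: infinity if F is zero, otherwise a NaN
        # (the sign bit is ignored for NaNs).
        return "NaN" if fraction else sign + "infinity"
    if exponent == 0:
        # Zero exponent: signed zero if F is zero, otherwise denormalized.
        return sign + ("denormalized finite" if fraction else "0")
    return sign + "normalized finite"       # exponent 1...254

print(classify_single(0x43322000))   # encoding of 178.125
```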
When real numbers become very close to zero, the normalized-number format can no longer be
used to represent the numbers. This is because the range of the exponent is not large enough to
compensate for shifting the binary point to the right to eliminate leading zeros.
When the biased exponent is zero, smaller numbers can only be represented by making the
integer bit (and perhaps other leading bits) of the significand zero. The numbers in this range are
called denormalized (or tiny) numbers. The use of leading zeros with denormalized numbers
allows smaller numbers to be represented. However, this denormalization causes a loss of preci-
sion (the number of significant bits in the fraction is reduced by the leading zeros).
When performing normalized floating-point computations, an FPU normally operates on
normalized numbers and produces normalized numbers as results. Denormalized numbers
represent an underflow condition.
A denormalized number is computed through a technique called gradual underflow. Table 7-2
gives an example of gradual underflow in the denormalization process. Here the single-real
format is being used, so the minimum exponent (unbiased) is −126 (decimal). The true result in
this example requires an exponent of −129 (decimal) in order to have a normalized number. Since
−129 is beyond the allowable exponent range, the result is denormalized by inserting leading
zeros until the minimum exponent of −126 is reached.
NOTE:
* Expressed as an unbiased, decimal number.
In the extreme case, all the significant bits are shifted out to the right by leading zeros, creating
a zero result.
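Gradual underflow can be observed by storing a very small value in single-real format. The sketch below uses 1.0 x 2^-129 as a stand-in for the true result in the example above; the helper name is an assumption, and the rounding is performed by the host's conversion to single precision rather than by FPU-specific logic.

```python
import struct

def single_fields(value):
    """Return (biased exponent, fraction) of value stored in single-real format."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF

# 1.0B x 2^-129 needs three leading zeros to reach the minimum exponent of -126:
# it is stored as 0.001B x 2^-126 with a biased exponent of zero.
exp_field, fraction = single_fields(2.0 ** -129)
print(exp_field, format(fraction, "023b"))

# Extreme case: every significant bit is shifted out, leaving a zero result.
vanished = single_fields(2.0 ** -151)
```

Each leading zero costs one bit of precision, so a denormal with three leading zeros has three fewer significant bits than a normalized number.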
The FPU deals with denormal values in the following ways:
• It avoids creating denormals by normalizing numbers whenever possible.
• It provides the floating-point underflow exception to permit programmers to detect cases
when denormals are created.
• It provides the floating-point denormal-operand exception to permit procedures or
programs to detect when denormals are being used as source operands for computations.
When a denormal number in single- or double-real format is used as a source operand and the
denormal exception is masked, the FPU automatically normalizes the number when it is
converted to extended-real format.
7.2.3.4. NANS
Since NaNs are non-numbers, they are not part of the real number line. In Figure 7-3, the
encoding space for NaNs in the FPU floating-point formats is shown above the ends of the real
number line. This space includes any value with the maximum allowable biased exponent and a
non-zero fraction. (The sign bit is ignored for NaNs.)
The IEEE standard defines two classes of NaN: quiet NaNs (QNaNs) and signaling NaNs
(SNaNs). A QNaN is a NaN with the most significant fraction bit set; an SNaN is a NaN with
the most significant fraction bit clear. QNaNs are allowed to propagate through most arithmetic
operations without signaling an exception. SNaNs generally signal an invalid-operation excep-
tion whenever they appear as operands in arithmetic operations. Exceptions are discussed in
Section 7.7., “Floating-Point Exception Handling”.
See Section 7.6., “Operating on NaNs”, for detailed information on how the FPU handles NaNs.
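The distinction between the two NaN classes rests on a single bit. A minimal sketch for the single-precision format follows, assuming a hypothetical helper named nan_class that takes the raw 32-bit encoding.

```python
def nan_class(bits):
    """Return "QNaN", "SNaN", or None for a 32-bit single-precision encoding."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        return None                 # not a NaN (finite value or infinity)
    # Most significant fraction bit set: quiet; clear: signaling.
    return "QNaN" if fraction & (1 << 22) else "SNaN"

print(nan_class(0x7FC00000), nan_class(0x7F800001))
```

The sign bit plays no part in the test, consistent with the statement above that it is ignored for NaNs.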
7.2.4. Indefinite
For each FPU data type, one unique encoding is reserved for representing the special value
indefinite. For example, when operating on real values, the real indefinite value is a QNaN
(see Section 7.4.1., “Real Numbers”). The FPU produces indefinite values as responses to
masked floating-point exceptions.
[Figure 7-4. Relationship Between the Integer Unit and the FPU. Block diagram showing the
Instruction Decoder and Sequencer, the Integer Unit, the FPU, and the Data Bus.]
The instruction execution environment of the FPU (see Figure 7-5) consists of 8 data registers
(called the FPU data registers) and the following special-purpose registers:
• The status register.
• The control register.
• The tag word register.
• Instruction pointer register.
• Last operand (data pointer) register.
• Opcode register.
These registers are described in the following sections.
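[Figure 7-5: FPU execution environment. Surviving labels: Control Register (bits 15 through 0),
Tag Register, FPU Instruction Pointer (bits 47 through 0), and Opcode (bits 10 through 0).]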
If a load operation is performed when TOP is at 0, register wraparound occurs and the new value
of TOP is set to 7. The floating-point stack-overflow exception indicates when wraparound
might cause an unsaved value to be overwritten (see Section 7.8.1.1., “Stack Overflow or Under-
flow Exception (#IS)”).
Many floating-point instructions have several addressing modes that permit the programmer to
implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to
the TOP. Assemblers support these register addressing modes, using the expression ST(0), or
simply ST, to represent the current stack top and ST(i) to specify the ith register from TOP in
the stack (0 ≤ i ≤ 7). For example, if TOP contains 011B (register 3 is the top of the stack), the
following instruction would add the contents of two registers in the stack (registers 3 and 5):
FADD ST, ST(2);
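The mapping from a stack-relative name to a physical register is modulo-8 arithmetic on TOP. The following Python sketch illustrates it; both function names are assumptions introduced here.

```python
def physical_register(top, i):
    """Physical data register (0 through 7) addressed by ST(i) for a given TOP."""
    return (top + i) & 7

def top_after_load(top):
    """A load decrements TOP; from 0 it wraps around to 7."""
    return (top - 1) & 7

# With TOP = 011B, "FADD ST, ST(2)" operates on physical registers 3 and 5.
operands = (physical_register(3, 0), physical_register(3, 2))
```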
Figure 7-7 shows an example of how the stack structure of the FPU registers and instructions are
typically used to perform a series of computations. Here, a two-dimensional dot product is
computed, as follows:
1. The first instruction (FLD value1) decrements the stack register pointer (TOP) and loads
the value 5.6 from memory into ST(0). The result of this operation is shown in snap-shot
(a).
2. The second instruction multiplies the value in ST(0) by the value 2.4 from memory and
stores the result in ST(0), shown in snap-shot (b).
3. The third instruction decrements TOP and loads the value 3.8 in ST(0).
4. The fourth instruction multiplies the value in ST(0) by the value 10.3 from memory and
stores the result in ST(0), shown in snap-shot (c).
5. The fifth instruction adds the value in ST(0) and the value in ST(1) and stores the result
in ST(0), shown in snap-shot (d).
The style of programming demonstrated in this example is supported by the floating-point
instruction set. In cases where the stack structure causes computation bottlenecks, the FXCH
(exchange FPU register contents) instruction can be used to streamline a computation.
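[Figure 7-7: dot product computation; the register snap-shots (a) through (d) are not reproduced.]
Computation:
Dot Product = (5.6 x 2.4) + (3.8 x 10.3)
Code:
FLD  value1     ;(a) value1=5.6
FMUL value2     ;(b) value2=2.4
FLD  value3     ;    value3=3.8
FMUL value4     ;(c) value4=10.3
FADD ST(1)      ;(d)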
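The five instructions can be traced with a small software model of the register stack. This is a sketch only: the FpuStack class and its method names are hypothetical, TOP is assumed to start at 0, and ordinary Python floats replace the extended-real registers.

```python
class FpuStack:
    """Toy model of the eight FPU data registers and the TOP pointer."""

    def __init__(self):
        self.regs = [None] * 8
        self.top = 0                        # assumed initial value

    def st(self, i):
        return self.regs[(self.top + i) & 7]

    def fld(self, value):                   # FLD mem: decrement TOP, load ST(0)
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value

    def fmul_mem(self, value):              # FMUL mem: ST(0) = ST(0) * mem
        self.regs[self.top] *= value

    def fadd_st(self, i):                   # FADD ST(i): ST(0) = ST(0) + ST(i)
        self.regs[self.top] += self.st(i)

fpu = FpuStack()
fpu.fld(5.6)            # (a)
fpu.fmul_mem(2.4)       # (b) ST(0) = 13.44
fpu.fld(3.8)
fpu.fmul_mem(10.3)      # (c) ST(0) = 39.14, ST(1) = 13.44
fpu.fadd_st(1)          # (d) ST(0) = 52.58
```

The first product stays in ST(1) after the final addition, because FADD here does not pop the stack.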
point instructions set the condition code flags. These condition code bits are used principally for
conditional branching and for storage of information used in exception handling (see Section
7.3.3., “Branching and Conditional Moves on FPU Condition Codes”).
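[Figure: FPU status word layout, bits 15 through 0]
Bit 15       B      FPU Busy
Bit 14       C3     Condition Code
Bits 13-11   TOP    Top of Stack Pointer
Bit 10       C2     Condition Code
Bit 9        C1     Condition Code
Bit 8        C0     Condition Code
Bit 7        ES     Error Summary Status
Bit 6        SF     Stack Fault
Exception Flags:
Bit 5        PE     Precision
Bit 4        UE     Underflow
Bit 3        OE     Overflow
Bit 2        ZE     Zero Divide
Bit 1        DE     Denormalized Operand
Bit 0        IE     Invalid Operation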
As shown in Table 7-3, the C1 condition code flag is used for a variety of functions. When both
the IE and SF flags in the FPU status word are set, indicating a stack overflow or underflow
exception (#IS), the C1 flag distinguishes between overflow (C1=1) and underflow (C1=0).
When the PE flag in the status word is set, indicating an inexact (rounded) result, the C1 flag is
set to 1 if the last rounding by the instruction was upward. The FXAM instruction sets C1 to the
sign of the value being examined.
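A handler that has stored the status word in memory can separate its fields with shifts and masks. The sketch below assumes the standard bit positions of the FPU status word (IE in bit 0 through B in bit 15); the function names are hypothetical.

```python
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
             "SF": 6, "ES": 7, "C0": 8, "C1": 9, "C2": 10, "C3": 14, "B": 15}

def decode_status_word(sw):
    """Split a 16-bit FPU status word into named fields."""
    fields = {name: (sw >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["TOP"] = (sw >> 11) & 0b111
    return fields

def stack_fault_kind(sw):
    """Apply the C1 rule above: with IE and SF both set, C1 picks the kind."""
    f = decode_status_word(sw)
    if f["IE"] and f["SF"]:
        return "overflow" if f["C1"] else "underflow"
    return None

# Hypothetical status word: TOP = 3, C1 = 1, SF = 1, IE = 1.
sample = (3 << 11) | (1 << 9) | (1 << 6) | 1
```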
The C2 condition code flag is used by the FPREM and FPREM1 instructions to indicate an
incomplete reduction (or partial remainder). When a successful reduction has been completed,
the C0, C3, and C1 condition code flags are set to the three least-significant bits of the quotient
(Q2, Q1, and Q0, respectively). See “FPREM1—Partial Remainder” in Chapter 3, Instruction
Set Reference, of the Intel Architecture Software Developer’s Manual, Volume 2, for more infor-
mation on how these instructions use the condition code flags.
The FPTAN, FSIN, FCOS, and FSINCOS instructions set the C2 flag to 1 to indicate that the
source operand is beyond the allowable range of ±2^63.
Where the states of the condition code flags are listed as undefined in Table 7-3, do not rely on
any specific value in these flags.
set, the FPU exception handler is invoked, using one of the techniques described in Section
7.7.3., “Software Exception Handling”. (Note that if an exception flag is masked, the FPU will
still set the flag if its associated exception occurs, but it will not set the ES flag.)
The exception flags are “sticky” bits, meaning that once set, they remain set until explicitly
cleared. They can be cleared by executing the FCLEX/FNCLEX (clear exceptions) instructions,
by reinitializing the FPU with the FINIT/FNINIT or FSAVE/FNSAVE instructions, or by over-
writing the flags with an FRSTOR or FLDENV instruction.
The B-bit (bit 15) is included for 8087 compatibility only. It reflects the contents of the ES flag.
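[Figure 7-9. Moving the FPU Condition Codes to the EFLAGS Register. The SAHF instruction
loads the condition codes into the low byte (bits 7 through 0) of the EFLAGS register, where
they appear in the ZF, PF, and CF flags.]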
The new mechanism is available only in the Pentium Pro processor. Using this mechanism, the
new floating-point compare and set EFLAGS instructions (FCOMI, FCOMIP, FUCOMI, and
FUCOMIP) compare two floating-point values and set the ZF, PF, and CF flags in the EFLAGS
register directly. A single instruction thus replaces the three instructions required by the old
mechanism.
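The bit movement performed by the old mechanism can be written out explicitly. The sketch below assumes the usual sequence for that mechanism, which this excerpt does not spell out: a compare, then FSTSW AX to copy the status word into AX, then SAHF to load AH into the low byte of EFLAGS. The function name is hypothetical.

```python
def eflags_from_status_word(sw):
    """Model FSTSW AX followed by SAHF: C3 -> ZF, C2 -> PF, C0 -> CF."""
    ah = (sw >> 8) & 0xFF       # FSTSW AX leaves status word bits 15-8 in AH
    return {
        "ZF": (ah >> 6) & 1,    # status word bit 14 (C3) lands in EFLAGS bit 6
        "PF": (ah >> 2) & 1,    # status word bit 10 (C2) lands in EFLAGS bit 2
        "CF": ah & 1,           # status word bit 8  (C0) lands in EFLAGS bit 0
    }

unordered = eflags_from_status_word(0x4500)     # C3 = C2 = C0 = 1
```

Because the condition codes land on ZF, PF, and CF, the conditional jumps intended for unsigned integer comparisons can then be used to branch on the floating-point result.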
Note also that the FCMOVcc instructions (also new in the Pentium Pro processor) allow condi-
tional moves of floating-point values (values in the FPU data registers) based on the setting of
the status flags (ZF, PF, and CF) in the EFLAGS register. These instructions eliminate the need
for an IF statement to perform conditional moves of floating-point values.
[Figure: FPU control word layout, bits 15 through 0]
Bits 15-13   Reserved
Bit 12       X      Infinity Control
Bits 11-10   RC     Rounding Control
Bits 9-8     PC     Precision Control
Bits 7-6     Reserved
Exception Masks:
Bit 5        PM     Precision
Bit 4        UM     Underflow
Bit 3        OM     Overflow
Bit 2        ZM     Zero Divide
Bit 1        DM     Denormalized Operand
Bit 0        IM     Invalid Operation
The precision-control (PC) field (bits 8 and 9 of the FPU control word) determines the precision
(64, 53, or 24 bits) of floating-point calculations made by the FPU (see Table 7-4). The default
precision is extended precision, which uses the full 64-bit significand available with the
extended-real format of the FPU data registers. This setting is best suited for most applications,
because it allows applications to take full advantage of the precision of the extended-real format.
NOTE:
* Includes the implied integer bit.
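The control fields can be extracted from a stored control word in the same way as any other bit field. In the sketch below, the PC encodings (00B single, 10B double, 11B extended) and the RC encodings are taken from the standard definitions of these fields, since Tables 7-4 and 7-5 are not reproduced in this excerpt; the function name and the sample value 037FH (the usual initialized state) are likewise assumptions.

```python
PRECISION_BITS = {0b00: 24, 0b10: 53, 0b11: 64}     # 01B is reserved
ROUNDING_MODE = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}

def decode_control_word(cw):
    """Extract the PC and RC fields and the six exception mask bits."""
    pc = (cw >> 8) & 0b11
    rc = (cw >> 10) & 0b11
    return {
        "precision_bits": PRECISION_BITS.get(pc),   # None for the reserved value
        "rounding": ROUNDING_MODE[rc],
        "masks": cw & 0x3F,
    }

default_state = decode_control_word(0x037F)
```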
The double precision and single precision settings reduce the size of the significand to 53 bits
and 24 bits, respectively. These settings are provided to support the IEEE standard and to allow
exact replication of calculations which were done using the lower precision data types. Using
these settings nullifies the advantages of the extended-real format's 64-bit significand length.
When reduced precision is specified, the rounding of the significand value clears the unused bits
on the right to zeros.
The precision-control bits only affect the results of the following floating-point instructions:
FADD, FADDP, FSUB, FSUBP, FSUBR, FSUBRP, FMUL, FMULP, FDIV, FDIVP, FDIVR,
FDIVRP, and FSQRT.
The round up and round down modes are termed directed rounding and can be used to imple-
ment interval arithmetic. Interval arithmetic is used to determine upper and lower bounds for the
true result of a multistep computation, when the intermediate results of the computation are
subject to rounding.
The round toward zero mode (sometimes called the “chop” mode) is commonly used when
performing integer arithmetic with the FPU.
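The idea behind interval arithmetic can be shown without FPU access. The sketch below substitutes Python's decimal module, with 6 significant decimal digits, for the FPU's binary directed rounding; this substitution, the helper names, and the restriction to positive operands are all assumptions made for illustration.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_FLOOR

round_down = Context(prec=6, rounding=ROUND_FLOOR)      # toward minus infinity
round_up = Context(prec=6, rounding=ROUND_CEILING)      # toward plus infinity

def interval_divide(x, y):
    """Bounds on x / y for positive x and y."""
    return round_down.divide(x, y), round_up.divide(x, y)

def interval_add(a, b):
    """Bounds on the sum of two intervals."""
    return round_down.add(a[0], b[0]), round_up.add(a[1], b[1])

third = interval_divide(Decimal(1), Decimal(3))
total = interval_add(third, interval_add(third, third))  # bounds on 3 * (1/3)
```

The lower bound is always computed with round down and the upper bound with round up, so the true result (here exactly 1) is guaranteed to lie inside the final interval even though every step was rounded.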
Whenever possible, the FPU produces an infinitely precise result in the destination format
(single, double, or extended real). However, it is often the case that the infinitely precise result
of an arithmetic or store operation cannot be encoded exactly in the format of the destination
operand.
For example, the following value (a) has a 24-bit fraction. The least-significant bit of this
fraction (the final bit shown) cannot be encoded exactly in the single-real format (which has
only a 23-bit fraction):
(a) 1.0001 0000 1000 0011 1001 0111 E₂ 101
To round this result (a), the FPU first selects two representable fractions b and c that most closely
bracket a in value (b < a < c).
(b) 1.0001 0000 1000 0011 1001 011 E₂ 101
(c) 1.0001 0000 1000 0011 1001 100 E₂ 101
The FPU then sets the result to b or to c according to the rounding mode selected in the RC field.
Rounding introduces an error in a result that is less than one unit in the last place to which the
result is rounded.
The rounded result is called the inexact result. When the FPU produces an inexact result, the
floating-point precision (inexact) flag (PE) is set in the FPU status word.
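The choice between b and c can be sketched for the example value. The code below is an illustration that handles only a positive value with exactly one extra fraction bit; the function name and mode strings are hypothetical, and a carry out of the fraction (when b is all ones) is not handled.

```python
A_FRACTION = int("0001 0000 1000 0011 1001 0111".replace(" ", ""), 2)   # 24 bits

def round_to_23_bits(fraction24, mode):
    """Round a positive value's 24-bit fraction to 23 bits."""
    b = fraction24 >> 1             # nearest representable fraction below
    c = b + 1                       # nearest representable fraction above
    if fraction24 & 1 == 0:
        return b                    # dropped bit is zero: the result is exact
    if mode == "nearest":
        # The dropped bit puts a exactly halfway between b and c,
        # so choose whichever has a zero in its last place.
        return b if b & 1 == 0 else c
    if mode in ("down", "toward zero"):
        return b
    if mode == "up":
        return c
    raise ValueError(mode)

nearest = round_to_23_bits(A_FRACTION, "nearest")
print(format(nearest, "023b"))
```

For this value, round to nearest and round up both select c, while round down and round toward zero select b.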
When the overflow exception is masked and the infinitely precise result is between the largest
positive finite value allowed in a particular format and +∞, the FPU rounds the result as shown
in Table 7-6.
When the overflow exception is masked and the infinitely precise result is between the largest
negative finite value allowed in a particular format and −∞, the FPU rounds the result as shown
in Table 7-7.
The rounding modes have no effect on comparison operations, operations that produce exact
results, or operations that produce NaN results.
[Figure: FPU tag word, bits 15 through 0, holding one 2-bit tag for each physical data register]
TAG Values:
00: Valid
01: Zero
10: Special: invalid (NaN, unsupported), infinity, or denormal
11: Empty
Each tag in the FPU tag word corresponds to a physical register (numbers 0 through 7). The
current top-of-stack (TOP) pointer stored in the FPU status word can be used to associate tags
with registers relative to ST(0).
The FPU uses the tag values to detect stack overflow and underflow conditions. Stack overflow
occurs when the TOP pointer is decremented (due to a register load or push operation) to point
to a non-empty register. Stack underflow occurs when the TOP pointer is incremented (due to a
save or pop operation) to point to an empty register or when an empty register is also referenced
as a source operand. A non-empty register is defined as a register containing a zero (01), a valid
value (00), or a special (10) value.
Application programs and exception handlers can use this tag information to check the contents
of an FPU data register without performing complex decoding of the actual data in the register.
To read the tag register, it must be stored in memory using either the FSTENV/FNSTENV or
FSAVE/FNSAVE instructions. The location of the tag word in memory after being saved with
one of these instructions is shown in Figures 7-13 through 7-16.
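Once the tag word has been stored in memory, a handler can look up the tag for any register. The sketch below assumes that the tag for physical register 0 occupies bits 1 and 0 of the tag word and the tag for register 7 occupies bits 15 and 14, a layout taken from the figure that is not reproduced here. The function names are hypothetical.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}

def tag_of_physical(tag_word, reg):
    """Tag of physical data register reg (0 through 7)."""
    return TAG_NAMES[(tag_word >> (2 * reg)) & 0b11]

def tag_of_st(tag_word, top, i):
    """Tag of ST(i), found by way of the TOP field of the status word."""
    return tag_of_physical(tag_word, (top + i) & 7)

# Hypothetical state after one load into an empty stack: TOP = 7, and only
# physical register 7 holds a valid value.
tag_word, top = 0x3FFF, 7
```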
Software cannot directly load or modify the tags in the tag register. The FLDENV and FRSTOR
instructions load an image of the tag register into the FPU; however, the FPU uses those tag
values only to determine if the data registers are empty (11B) or non-empty (00B, 01B, or 10B).
If the tag register image indicates that a data register is empty, the tag in the tag register for that
data register is marked empty (11B); if the tag register image indicates that the data register is
non-empty, the FPU reads the actual value in the data register and sets the tag for the register
accordingly. This action prevents a program from setting the values in the tag register to incor-
rectly represent the actual contents of non-empty data registers.
information plus the contents of the FPU data registers. Note that the FSAVE/FNSAVE instruc-
tion also initializes the FPU to default values (just as the FINIT/FNINIT instruction does) after
it has saved the original state of the FPU.
The manner in which this information is stored in memory depends on the operating mode of
the processor (protected mode or real-address mode) and on the operand-size attribute in effect
(32-bit or 16-bit). See Figures 7-13 through 7-16. In virtual-8086 mode or SMM, the
real-address mode format shown in Figure 7-16 is used. See “Using the FPU in SMM” in Chapter
11 of the Intel Architecture Software Developer’s Manual, Volume 3, for special considerations
for using the FPU while in SMM.
[Figure 7-13. Protected Mode FPU State Image in Memory, 32-Bit Format]
[Figure 7-14. Real Mode FPU State Image in Memory, 32-Bit Format]
[Figure 7-15. Protected Mode FPU State Image in Memory, 16-Bit Format]
[Figure 7-16. Real Mode FPU State Image in Memory, 16-Bit Format]
The FLDENV and FRSTOR instructions allow FPU state information to be loaded from
memory into the FPU. Here, the FLDENV instruction loads only the status, control, tag, FPU
instruction pointer, FPU operand pointer, and opcode registers, and the FRSTOR instruction
loads all the FPU registers, including the FPU stack registers.
[Figure: FPU data type formats]
Format           Sign      Exponent      Integer bit   Fraction
Single Real      bit 31    bits 30-23    implied       bits 22-0
Double Real      bit 63    bits 62-52    implied       bits 51-0
Extended Real    bit 79    bits 78-64    bit 63        bits 62-0

Format           Sign      Value
Word Integer     bit 15    bits 14-0
Short Integer    bit 31    bits 30-0
Long Integer     bit 63    bits 62-0

Packed BCD Integers: sign in bit 79, X in bits 78-72, and digits D17 through D0 in
bits 71-0 (4 bits = 1 BCD digit).
When stored in memory, the least significant byte of an FPU data-type value is stored at the
initial address specified for the value. Successive bytes from the value are then stored in succes-
sively higher addresses in memory. The floating-point instructions load and store memory oper-
ands using only the initial address of the operand.
The exponent of each real data type is encoded in biased format. The biasing constant is 127 for
the single-real format, 1023 for the double-real format, and 16,383 for the extended-real format.
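Each biasing constant is one less than half the number of values the exponent field can hold, that is, 2^(k−1) − 1 for a k-bit exponent. The short sketch below checks the three constants against exponent widths of 8, 11, and 15 bits; the names are hypothetical.

```python
EXPONENT_BITS = {"single": 8, "double": 11, "extended": 15}

def bias(exponent_bits):
    """Biasing constant for an exponent field of the given width."""
    return 2 ** (exponent_bits - 1) - 1

def biased_exponent(true_exponent, fmt):
    """Value stored in the exponent field for a given true (unbiased) exponent."""
    return true_exponent + bias(EXPONENT_BITS[fmt])

constants = {fmt: bias(width) for fmt, width in EXPONENT_BITS.items()}
```

For example, the true exponent 7 of the value 178.125 is stored as 134 in single-real format.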
Table 7-9 shows the encodings for all the classes of real numbers (that is, zero, denormalized-
finite, normalized-finite, and ∞) and NaNs for each of the three real data-types. It also gives the
format for the real indefinite value.
When storing real values in memory, single-real values are stored in 4 consecutive bytes in
memory; double-real values are stored in 8 consecutive bytes; and extended-real values are
stored in 10 consecutive bytes.
As a general rule, values should be stored in memory in double-real format. This format provides
sufficient range and precision to return correct results with a minimum of programmer attention.
The single-real format is appropriate for applications that are constrained by memory; however,
it provides less precision and a greater chance of overflow. The single-real format is also useful
for debugging algorithms, because rounding problems will manifest themselves more quickly in
this format. The extended-real format is normally reserved for holding intermediate results in
the FPU registers and constants. Its extra length is designed to shield final results from the
effects of rounding and overflow/underflow in intermediate calculations. However, when an
application requires the maximum range and precision of the FPU (for data storage, computa-
tions, and results), values can be stored in memory in extended-real format.
The real indefinite value is a QNaN encoding that is stored by several floating-point instructions
in response to a masked floating-point invalid-operation exception (see Table 7-20).
The most significant bit of each format is the sign bit (0 for positive and 1 for negative). Negative
values are represented in standard two's complement notation. The quantity zero is represented
with all bits (including the sign bit) set to zero. Note that the FPU’s word-integer data type is
identical to the word-integer data type used by the processor’s integer unit and the short-integer
format is identical to the integer unit’s doubleword-integer data type.
Word-integer values are stored in 2 consecutive bytes in memory; short-integer values are stored
in 4 consecutive bytes; and long-integer values are stored in 8 consecutive bytes. When loaded
into the FPU’s data registers, all the binary integers are exactly representable in the extended-
real format.
The binary integer encoding 100..00B represents either of two things, depending on the circum-
stances of its use:
• The largest negative number supported by the format (–2^15, –2^31, or –2^63).
• The integer indefinite value.
If this encoding is used as a source operand (as in an integer load or integer arithmetic instruc-
tion), the FPU interprets it as the largest negative number representable in the format being used.
If the FPU detects an invalid operation when storing an integer value in memory with an
FIST/FISTP instruction and the invalid-operation exception is masked, the FPU stores the
integer indefinite encoding in the destination operand as a masked response to the exception. In
situations where the origin of a value with this encoding may be ambiguous, the invalid-opera-
tion exception flag can be examined to see if the value was produced as a response to an
exception.
If the integer indefinite is stored in memory and is later loaded back into an FPU data register,
it is interpreted as the largest negative number supported by the format.
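The dual meaning of the 100..00B encoding follows from two's complement arithmetic. A brief sketch (function names are assumptions) shows that the encoding, read back as a source operand, yields the largest negative number for each integer width.

```python
def indefinite_encoding(width):
    """The 100..00B bit pattern for an integer format of the given width."""
    return 1 << (width - 1)

def as_signed(bits, width):
    """Interpret a bit pattern as a two's complement integer."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

word_value = as_signed(indefinite_encoding(16), 16)
```

This is why the invalid-operation flag, rather than the stored value, must be examined to learn whether the encoding was produced as a masked response to an exception.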
The encodings formerly known as pseudo-denormal numbers are not generated by the Intel 387
math coprocessor and the internal FPUs in the Intel486, Pentium, and Pentium Pro processors;
however, they are used correctly when encountered as operands. The exponent is treated as if it
were 00..01B and the mantissa is unchanged. The denormal exception is generated.
See Section 6.2.3., “Floating-Point Instructions”, for a list of the floating-point instructions by
category.
The following section briefly describes the instructions in each category. Detailed descriptions
of the floating-point instructions are given in Chapter 3, Instruction Set Reference, in the Intel
Architecture Software Developer’s Manual, Volume 2.
Operands are normally stored in the FPU data registers in extended-real format (see Section
7.3.4.2., “Precision Control Field”). The FLD (load real) instruction pushes a real operand from
memory onto the top of the FPU data-register stack. If the operand is in single- or double-real
format, it is automatically converted to extended-real format. This instruction can also be used
to push the value in a selected FPU data register onto the top of the register stack.
The FILD (load integer) instruction converts an integer operand in memory into extended-real
format and pushes the value onto the top of the register stack. The FBLD (load packed decimal)
instruction performs the same load operation for a packed BCD operand in memory.
The FST (store real) and FIST (store integer) instructions store the value in register ST(0) in
memory in the destination format (real or integer, respectively). Again, the format conversion is
carried out automatically.
The FSTP (store real and pop), FISTP (store integer and pop), and FBSTP (store packed decimal
and pop) instructions store the value in the ST(0) register into memory in the destination format
(real, integer, or packed BCD), then perform a pop operation on the register stack. A pop
operation causes the ST(0) register to be marked empty and the stack pointer (TOP) in the FPU
status word to be incremented by 1. The FSTP instruction can also be used to copy the value
in the ST(0) register to another FPU register [ST(i)].
The FXCH (exchange register contents) instruction exchanges the value in a selected register in
the stack [ST(i)] with the value in ST(0).
The FCMOVcc (conditional move) instructions move the value in a selected register in the stack
[ST(i)] to register ST(0). These instructions move the value only if the conditions specified with
a condition code (cc) are satisfied (see Table 7-14). The conditions being tested with the
FCMOVcc instructions are represented by the status flags in the EFLAGS register. The condi-
tion code mnemonics are appended to the letters “FCMOV” to form the mnemonic for a
FCMOVcc instruction.
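The conditional move can be modeled as a selection driven by the three status flags. In the sketch below, the mnemonics and the flag tests attached to them are the standard FCMOVcc conditions, stated from general knowledge because Table 7-14 is not reproduced in this excerpt; the function fcmov and its arguments are hypothetical.

```python
# Condition tested by each mnemonic, as a function of CF, ZF, and PF.
CONDITIONS = {
    "FCMOVB":   lambda cf, zf, pf: cf == 1,                 # below
    "FCMOVNB":  lambda cf, zf, pf: cf == 0,                 # not below
    "FCMOVE":   lambda cf, zf, pf: zf == 1,                 # equal
    "FCMOVNE":  lambda cf, zf, pf: zf == 0,                 # not equal
    "FCMOVBE":  lambda cf, zf, pf: cf == 1 or zf == 1,      # below or equal
    "FCMOVNBE": lambda cf, zf, pf: cf == 0 and zf == 0,     # not below or equal
    "FCMOVU":   lambda cf, zf, pf: pf == 1,                 # unordered
    "FCMOVNU":  lambda cf, zf, pf: pf == 0,                 # not unordered
}

def fcmov(mnemonic, st0, sti, cf, zf, pf):
    """Return the new ST(0): ST(i) if the condition holds, else ST(0) unchanged."""
    return sti if CONDITIONS[mnemonic](cf, zf, pf) else st0

# After a comparison that found ST(0) < ST(i) (CF = 1), FCMOVB takes ST(i).
moved = fcmov("FCMOVB", st0=1.0, sti=2.0, cf=1, zf=0, pf=0)
```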
Like the CMOVcc instructions, the FCMOVcc instructions are useful for optimizing small IF
constructions. They also help eliminate branching overhead for IF operations and the possibility
of branch mispredictions by the processor.
NOTE
The FCMOVcc instructions may not be supported on some processors in the
Pentium Pro processor family. Software can check if the FCMOVcc instruc-
tions are supported by checking the processor’s feature information with the
CPUID instruction (see “CPUID—CPU Identification” in Chapter 3 of the
Intel Architecture Software Developer’s Manual, Volume 2).
The FCOM, FCOMP, and FCOMPP instructions compare the value in register ST(0) with a real
source operand and set the condition code flags (C0, C2, and C3) in the FPU status word
according to the results (see Table 7-15). If an unordered condition is detected (one or both of
the values is a NaN or in an undefined format), a floating-point invalid-operation exception is
generated.
The pop versions of the instruction pop the FPU register stack once or twice after the comparison
operation is complete.
The FUCOM, FUCOMP, and FUCOMPP instructions operate the same as the FCOM, FCOMP,
and FCOMPP instructions. The only difference is that with the FUCOM, FUCOMP, and
FUCOMPP instructions, if an unordered condition is detected because one or both of the oper-
ands is a QNaN, the floating-point invalid-operation exception is not generated.
Table 7-15. Setting of FPU Condition Code Flags for Real Number Comparisons
Condition | C3 | C2 | C0
ST(0) > Source Operand | 0 | 0 | 0
ST(0) < Source Operand | 0 | 0 | 1
ST(0) = Source Operand | 1 | 0 | 0
Unordered | 1 | 1 | 1
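As an illustration of Table 7-15, the following Python sketch returns the condition code bits as they would appear in the FPU status word. It assumes the usual status word layout (C0 in bit 8, C2 in bit 10, C3 in bit 14); the function name is hypothetical and Python floats stand in for the FPU operands.

```python
import math

C0 = 1 << 8    # condition code flag C0 (status word bit 8)
C2 = 1 << 10   # condition code flag C2 (status word bit 10)
C3 = 1 << 14   # condition code flag C3 (status word bit 14)


def fcom_condition_bits(st0, src):
    """Return the C3/C2/C0 bits that Table 7-15 lists for comparing st0 with src."""
    if math.isnan(st0) or math.isnan(src):
        return C3 | C2 | C0   # unordered
    if st0 > src:
        return 0              # all three flags clear
    if st0 < src:
        return C0
    return C3                 # equal
```

The sketch models only the flag settings; it does not model the invalid-operation exception that FCOM raises for an unordered result.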
The FICOM and FICOMP instructions also operate the same as the FCOM and FCOMP instruc-
tions, except that the source operand is an integer value in memory. The integer value is auto-
matically converted into an extended real value prior to making the comparison. The FICOMP
instruction pops the FPU register stack following the comparison operation.
The FTST instruction performs the same operation as the FCOM instruction, except that the
value in register ST(0) is always compared with the value 0.0.
The FCOMI and FCOMIP instructions are new in the Intel Pentium Pro processor. They perform
the same comparison as the FCOM and FCOMP instructions, except that they set the status flags
(ZF, PF, and CF) in the EFLAGS register to indicate the results of the comparison (see Table
7-16) instead of the FPU condition code flags. The FCOMI and FCOMIP instructions allow
conditional branch instructions (Jcc) to be executed directly from the results of their comparison.
Table 7-16. Setting of EFLAGS Status Flags for Real Number Comparisons
Comparison Results | ZF | PF | CF
ST(0) > ST(i) | 0 | 0 | 0
ST(0) < ST(i) | 0 | 0 | 1
ST(0) = ST(i) | 1 | 0 | 0
Unordered | 1 | 1 | 1
The FUCOMI and FUCOMIP instructions operate the same as the FCOMI and FCOMIP
instructions, except that they do not generate a floating-point invalid-operation exception if the
unordered condition is the result of one or both of the operands being a QNaN. The FCOMIP
and FUCOMIP instructions pop the FPU register stack following the comparison operation.
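To see why the flag settings in Table 7-16 let Jcc instructions branch directly on the comparison, the following Python sketch models the flags and evaluates four common branch conditions. The function names are hypothetical; the conditions used for JA, JB, JE, and JP are the standard EFLAGS tests for those mnemonics.

```python
import math


def fcomi_flags(st0, sti):
    """Return the ZF/PF/CF settings that Table 7-16 lists for comparing st0 with sti."""
    if math.isnan(st0) or math.isnan(sti):
        return {"ZF": 1, "PF": 1, "CF": 1}   # unordered
    if st0 > sti:
        return {"ZF": 0, "PF": 0, "CF": 0}
    if st0 < sti:
        return {"ZF": 0, "PF": 0, "CF": 1}
    return {"ZF": 1, "PF": 0, "CF": 0}       # equal


def jcc_taken(flags):
    """Report which of four conditional branches would be taken for these flags."""
    return {
        "JA": flags["CF"] == 0 and flags["ZF"] == 0,   # above
        "JB": flags["CF"] == 1,                        # below
        "JE": flags["ZF"] == 1,                        # equal
        "JP": flags["PF"] == 1,                        # parity, used here for unordered
    }
```

Because an unordered result sets all three flags, JB and JE are both taken in that case. Code that must distinguish the unordered case would therefore test PF (JP) before testing the other conditions.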
The FXAM instruction determines the classification of the real value in the ST(0) register (that
is, whether the value is zero, a denormal number, a normal finite number, ∞, a NaN, or an unsup-
ported format) or that the register is empty. It sets the FPU condition code flags to indicate the
classification (see “FXAM—Examine” in Chapter 3, Instruction Set Reference, of the Intel
Architecture Software Developer’s Manual, Volume 2). It also sets the C1 flag to indicate the sign
of the value.
2. Check ordered comparison result. Use the constants given in Table 7-17 in the TEST
instruction to test for a less than, equal to, or greater than result, then use the corresponding
conditional branch instruction to transfer program control to the appropriate procedure or
section of code.
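The mask-and-branch technique can be sketched in Python as follows. The masks are derived from the status word bit positions of C0 (bit 8), C2 (bit 10), and C3 (bit 14); they are assumed to correspond to the constants of Table 7-17, which is not reproduced in this section. The function name is hypothetical, and the argument stands for a status word that has already been stored (for example, in the AX register).

```python
def classify_comparison(status_word):
    """Classify a stored FPU status word after a compare instruction."""
    if status_word & 0x0400:            # C2 set: unordered, checked first
        return "unordered"
    if (status_word & 0x4500) == 0:     # C3, C2, and C0 all clear
        return "greater"
    if status_word & 0x0100:            # C0 set
        return "less"
    if status_word & 0x4000:            # C3 set
        return "equal"
    return "unordered"                  # not reachable; kept as a safe default
```

Each test corresponds to one TEST instruction with a constant mask followed by one conditional branch. Bits of the status word outside the mask do not affect the result.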
If a program or procedure has been thoroughly tested and it incorporates periodic checks for
QNaN results, then it is not necessary to check for the unordered result every time a comparison
is made.
See Section 7.3.3., “Branching and Conditional Moves on FPU Condition Codes”, for another
technique for branching on FPU condition codes.
Some non-comparison FPU instructions update the condition code flags in the FPU status word.
To ensure that the status word is not altered inadvertently, store it immediately following a
comparison operation.
7.5.8. Pi
When the argument (source operand) of a trigonometric function is within the range of the func-
tion, the argument is automatically reduced by the appropriate multiple of 2π through the same
reduction mechanism used by the FPREM and FPREM1 instructions. The internal value of π
that the Intel Architecture FPU uses for argument reduction and other computations is as
follows:
$\pi = 0.f \times 2^{2}$
where:
f = C90FDAA2 2168C234 C
(The spaces in the fraction above indicate 32-bit boundaries.)
This internal π value has a 66-bit mantissa, which is 2 bits more than is allowed in the significand
of an extended-real value. (Since 66 bits is not an even number of hexadecimal digits, two addi-
tional zeros have been added to the value so that it can be represented in hexadecimal
format. The least-significant hexadecimal digit (C) is thus 1100B, where the two least-
significant bits represent bits 67 and 68 of the mantissa.)
This value of π has been chosen to guarantee no loss of significance in a source operand,
provided the operand is within the specified range for the instruction.
If the results of computations that explicitly use π are to be used in the FSIN, FCOS, FSINCOS,
or FPTAN instructions, the full 66-bit fraction of π should be used. This ensures that the results
are consistent with the argument-reduction algorithms that these instructions use. Using a
rounded version of π can cause inaccuracies in result values, which, if propagated through several
calculations, might produce meaningless results.
A common method of representing the full 66-bit fraction of π is to separate the value into two
numbers (highπ and lowπ) that when added together give the value for π shown earlier in this
section with the full 66-bit fraction:
π = highπ + lowπ
For example, the following two values (given in scientific notation with the fraction in hexadec-
imal and the exponent in decimal) represent the 33 most-significant and the 33 least-significant
bits of the fraction:
highπ (unnormalized) = $0.\mathrm{C90FDAA20} \times 2^{+2}$
lowπ (unnormalized) = $0.\mathrm{42D184698} \times 2^{-31}$
These values encoded in standard IEEE double-real format are as follows:
highπ = 400921FB 54400000
lowπ = 3DE0B461 1A600000
(Note that in the IEEE double-real format, the exponents are biased (by 1023) and the fractions
are normalized.)
Similar versions of π can also be written in extended-real format.
When using this two-part π value in an algorithm, parallel computations should be performed
on each part, with the results kept separate. When all the computations are complete, the two
results can be added together to form the final result.
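The two double-real encodings given above can be checked against the 66-bit fraction with exact rational arithmetic. The following Python sketch is one way to do so; the variable and function names are illustrative, and only the hexadecimal constants come from this section.

```python
import math
import struct
from fractions import Fraction

HIGH_PI_BITS = 0x400921FB54400000   # double-real encoding of high-pi
LOW_PI_BITS = 0x3DE0B4611A600000    # double-real encoding of low-pi


def double_from_bits(bits):
    """Reinterpret a 64-bit integer as an IEEE double-real value."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]


high_pi = double_from_bits(HIGH_PI_BITS)
low_pi = double_from_bits(LOW_PI_BITS)

# 66-bit fraction f = C90FDAA2 2168C234 C; only the top two bits of the
# final hexadecimal digit (1100B) belong to the fraction.
F66 = (0xC90FDAA2 << 34) | (0x2168C234 << 2) | 0b11

# pi = 0.f * 2^2 = F66 / 2^66 * 4 = F66 / 2^64, held exactly as a rational.
PI_66 = Fraction(F66, 2**64)

# Fraction(float) is exact, so this sum carries no rounding error.
exact_sum = Fraction(high_pi) + Fraction(low_pi)
```

Here `exact_sum` equals `PI_66`, which shows that the two parts together carry all 66 bits. Adding the parts in ordinary double-real arithmetic (`high_pi + low_pi`) rounds the sum to 53 bits, which is why the text recommends keeping the two partial results separate until the end of a computation.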
The complications of maintaining a consistent value of π for argument reduction can be avoided,
either by applying the trigonometric functions only to arguments within the range of the
automatic reduction mechanism, or by performing all argument reductions (down to a magni-
tude less than π/4) explicitly in software.
The F2XM1 instruction computes the exponential ($2^{x} - 1$). This instruction only operates on
source values in the range −1.0 to +1.0.
The FSCALE instruction multiplies the source operand by a power of 2.
where k is an integer such that $1 \le 2^{-k} f(x) < 2$.
With the Pentium and Pentium Pro processors, the worst case error on transcendental func-
tions is less than 1 ulp when rounding to the nearest-even and less than 1.5 ulps when rounding
in other modes. The functions are guaranteed to be monotonic, with respect to the input oper-
ands, throughout the domain supported by the instruction.
With the Intel486 processor and Intel 387 math coprocessor, the worst-case, transcendental-
function error is typically 3 or 3.5 ulps, but is sometimes as large as 4.5 ulps.
The FINIT/FNINIT instructions initialize the FPU and its internal registers to default values.
The FLDCW instruction loads the FPU control word register with a value from memory. The
FSTCW/FNSTCW and FSTSW/FNSTSW instructions store the FPU control and status words,
respectively, in memory (or, for an FSTSW/FNSTSW instruction, in a general-purpose register).
The FSTENV/FNSTENV and FSAVE/FNSAVE instructions save the FPU environment and
state, respectively, in memory. The FPU environment includes all the FPU’s control and status
registers; the FPU state includes the FPU environment and the data registers in the FPU register
stack. (The FSAVE/FNSAVE instruction also initializes the FPU to default values, like the
FINIT/FNINIT instruction, after it saves the original state of the FPU.)
The FLDENV and FRSTOR instructions load the FPU environment and state, respectively, from
memory into the FPU. These instructions are commonly used when switching tasks or contexts.
The WAIT/FWAIT instructions are synchronization instructions. (They are actually mnemonics
for the same opcode.) These instructions check the FPU status word for pending unmasked FPU
exceptions. If any pending unmasked FPU exceptions are found, they are handled before the
processor resumes execution of the instructions (integer, floating-point, or system instruction)
in the instruction stream. The WAIT/FWAIT instructions are provided to allow synchronization
of instruction execution between the FPU and the processor’s integer unit. See Section 7.9.,
“Floating-Point Exception Synchronization” for more information on the use of the
WAIT/FWAIT instructions.
NOTE
When operating a Pentium or Intel486 processor in MS-DOS compatibility
mode, it is possible (under unusual circumstances) for a non-waiting
instruction to be interrupted prior to being executed to handle a pending FPU
exception. The circumstances where this can happen and the resulting action
of the processor are described in Section D.2.1.3., “No-Wait FPU Instructions
Can Get FPU Interrupt in Window”. When operating a Pentium Pro processor
in MS-DOS compatibility mode, non-waiting instructions cannot be
interrupted in this way (see Section D.2.2., “MS-DOS* Compatibility Mode
in the Pentium® Pro Processor”).
The notation of a “#” symbol followed by one or two letters (for example, #IS) is used in this
manual to indicate exception conditions. It is merely a shorthand form and is not related to
assembler mnemonics.
Each of the six exception classes has a corresponding flag bit in the FPU status word and a mask
bit in the FPU control word (see Section 7.3.2., “FPU Status Register” and Section 7.3.4., “FPU
Control Word”, respectively). In addition, the exception summary (ES) flag in the status word
indicates when any of the exceptions has been detected, and the stack fault (SF) flag (also in the
status word) distinguishes between the two types of invalid-operation exceptions.
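The relationship between the flag bits and the mask bits can be sketched in Python as follows. The bit assignments assumed here (IE, DE, ZE, OE, UE, and PE in bits 0 through 5 of the status word, with the matching mask bits in bits 0 through 5 of the control word, SF in bit 6, and ES in bit 7) follow the usual FPU layout; the function name is hypothetical.

```python
EXCEPTION_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}
SF_BIT = 6   # stack fault flag in the status word
ES_BIT = 7   # exception summary flag in the status word


def pending_unmasked(status_word, control_word):
    """List the exceptions whose flag is set and whose mask bit is clear."""
    flags = status_word & 0x3F
    masks = control_word & 0x3F
    unmasked = flags & ~masks
    return [name for name, bit in EXCEPTION_BITS.items() if (unmasked >> bit) & 1]
```

For example, with the control word 037FH (all six exceptions masked), a status word that has OE and PE set yields an empty list. Clearing the overflow mask bit (control word 0377H) makes the function report OE.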
When the FPU detects a floating-point exception, it sets the appropriate flags in the FPU status
word, then takes one of two possible courses of action:
• Handles the exception automatically, producing a predefined (and oftentimes usable)
result, while allowing program execution to continue undisturbed.
• Invokes a software exception handler to handle the exception.
The following sections describe how the FPU handles exceptions (either automatically or by
calling a software exception handler), how the FPU detects the various floating-point excep-
tions, and the automatic (masked) response to the floating-point exceptions.
Note that when exceptions are masked, the FPU may detect multiple exceptions in a single
instruction, because it continues executing the instruction after performing its masked response.
For example, the FPU can detect a denormalized operand, perform its masked response to this
exception, and then detect numeric underflow.
2. If the IGNNE# pin is deasserted, the FPU then asserts the FERR# pin either immediately,
or else delayed (deferred) until just before the execution of the next waiting floating-point
instruction or MMX™ instruction. Whether the FERR# pin is asserted immediately or
delayed depends on the type of processor, the instruction, and the type of exception.
3. If a preceding floating-point instruction has set the exception flag for an unmasked FPU
exception, the processor freezes just before executing the next WAIT instruction, waiting
floating-point instruction, or MMX instruction. Whether the FERR# pin was asserted at
the preceding floating-point instruction or is just now being asserted, the freezing of the
processor assures that the FPU exception handler will be invoked before the new floating-
point (or MMX) instruction gets executed.
4. The FERR# pin is connected through external hardware to IRQ13 of a cascaded, program-
mable interrupt controller (PIC). When the FERR# pin is asserted, the PIC is programmed
to generate an interrupt 75H.
5. The PIC asserts the INTR pin on the processor to signal the interrupt 75H.
6. The BIOS for the PC system handles the interrupt 75H by branching to the interrupt 2
(NMI) interrupt handler.
7. The interrupt 2 handler determines if the interrupt is the result of an NMI interrupt or a
floating-point exception.
8. If a floating-point exception is detected, the interrupt 2 handler branches to the floating-
point exception handler.
If the IGNNE# pin is asserted, the processor ignores floating-point error conditions. This pin is
provided to inhibit floating-point exceptions from being generated while the floating-point
exception handler is servicing a previously signaled floating-point exception.
Appendix D, Guidelines for Writing FPU Exception Handlers, describes the MS-DOS compat-
ibility mode in much greater detail. This mode is somewhat more complicated in the Intel486
and Pentium processor implementations, as described in Appendix D.
Table 7-20. Invalid Arithmetic Operations and the Masked Responses to Them
Condition | Masked Response
Any arithmetic operation on an operand that is in an unsupported format. | Return the real indefinite value to the destination operand.
Any arithmetic operation on a SNaN. | Return a QNaN to the destination operand (see Section 7.6., “Operating on NaNs”).
Compare and test operations: one or both operands are NaNs. | Set the condition code flags (C0, C2, and C3) in the FPU status word to 111B (not comparable).
Addition: operands are opposite-signed infinities. Subtraction: operands are like-signed infinities. | Return the real indefinite value to the destination operand.
Multiplication: ∞ by 0; 0 by ∞. | Return the real indefinite value to the destination operand.
Division: ∞ by ∞; 0 by 0. | Return the real indefinite value to the destination operand.
Remainder instructions FPREM, FPREM1: modulus (divisor) is 0 or dividend is ∞. | Return the real indefinite; clear condition code flag C2 to 0.
Trigonometric instructions FCOS, FPTAN, FSIN, FSINCOS: source operand is ∞. | Return the real indefinite; clear condition code flag C2 to 0.
FSQRT: negative operand (except FSQRT (–0) = –0); FYL2X: negative operand (except FYL2X (–0) = –∞); FYL2XP1: operand more negative than –1. | Return the real indefinite value to the destination operand.
FBSTP: source register is empty or it contains a NaN, ∞, or a value that cannot be represented in 18 decimal digits. | Store BCD integer indefinite value in the destination operand.
FXCH: one or both registers are tagged empty. | Load empty registers with the real indefinite value, then perform the exchange.
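Several of the masked responses in Table 7-20 can be observed with ordinary IEEE arithmetic. The following Python sketch is only an analogy: Python floats are double-real values computed by whatever floating-point hardware the host provides, not necessarily by the FPU described here, and Python raises its own errors for some cases (such as 0.0 / 0.0) instead of returning a NaN. The constant name is illustrative.

```python
import math
import struct

inf = math.inf

# Invalid operations from Table 7-20 that Python evaluates without raising.
invalid_results = {
    "(+inf) + (-inf)": inf + (-inf),
    "(+inf) - (+inf)": inf - inf,
    "inf * 0": inf * 0.0,
    "inf / inf": inf / inf,
}

# Double-real encoding of the real indefinite value: sign bit 1, exponent
# all ones, fraction 1000...0B. This is one particular QNaN.
REAL_INDEFINITE_DOUBLE = 0xFFF8000000000000
real_indefinite = struct.unpack(">d", REAL_INDEFINITE_DOUBLE.to_bytes(8, "big"))[0]
```

Every value in `invalid_results` is a NaN, as is `real_indefinite`. Whether the NaN produced by the host matches the real indefinite encoding bit for bit depends on the platform, so the sketch does not assume it.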
FSTP instructions), where a within-range value in a data register is stored in memory in a single-
or double-real format. The overflow threshold range for the single-real format is $-1.0 \times 2^{128}$ to
$1.0 \times 2^{128}$; the range for the double-real format is $-1.0 \times 2^{1024}$ to $1.0 \times 2^{1024}$.
The numeric-overflow exception cannot occur when a value is stored in an integer or BCD
integer format. If such a store overflows, the invalid-arithmetic-operand exception is signaled instead.
The flag (OE) for the numeric-overflow exception is bit 3 of the FPU status word, and the mask
bit (OM) is bit 3 of the FPU control word.
When a numeric-overflow exception occurs and the exception is masked, the FPU sets the OE
flag and returns one of the values shown in Table 7-22. The value returned depends on the
current rounding mode of the FPU (see Section 7.3.4.3., “Rounding Control Field”).
Table 7-22. Masked Responses to Numeric Overflow
Rounding Mode | Sign of True Result | Result
To nearest | + | +∞
To nearest | – | –∞
Toward –∞ | + | Largest finite positive number
Toward –∞ | – | –∞
Toward +∞ | + | +∞
Toward +∞ | – | Largest finite negative number
Toward zero | + | Largest finite positive number
Toward zero | – | Largest finite negative number
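Table 7-22 can be expressed as a small lookup function. In the following Python sketch the function name and the rounding-mode strings are hypothetical, and the largest finite double-real value stands in for the largest finite number of whatever format the destination operand uses.

```python
import math
import sys

# Stand-in for the largest finite value of the destination format.
LARGEST = sys.float_info.max


def masked_overflow_result(rounding_mode, negative):
    """Return the masked overflow response for a rounding mode and result sign."""
    sign = -1.0 if negative else 1.0
    if rounding_mode == "nearest":
        return sign * math.inf
    if rounding_mode == "toward_zero":
        return sign * LARGEST
    if rounding_mode == "toward_neg_inf":
        return -math.inf if negative else LARGEST
    if rounding_mode == "toward_pos_inf":
        return -LARGEST if negative else math.inf
    raise ValueError(f"unknown rounding mode: {rounding_mode}")
```

The pattern is that an infinity is returned only when the rounding direction points away from zero for the sign of the true result; otherwise the largest finite number of that sign is returned.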
The action that the FPU takes when numeric overflow occurs and the numeric-overflow excep-
tion is not masked, depends on whether the instruction is supposed to store the result in memory
or on the register stack.
If the destination is a memory location, the OE flag is set and a software exception handler is
invoked (see Section 7.7.3., “Software Exception Handling”). The top-of-stack pointer (TOP)
and source and destination operands remain unchanged.
If the destination is the register stack, the exponent of the rounded result is adjusted by dividing the result by $2^{24576}$, and
the adjusted exponent is stored along with the significand in the destination operand. Condition code bit C1
in the FPU status word (called in this situation the “round-up bit”) is set if the significand was
rounded upward and cleared if the result was rounded toward 0. After the result is stored, the OE
flag is set and a software exception handler is invoked.
The scaling bias value 24,576 is equal to $3 \times 2^{13}$. Biasing the exponent by 24,576 normally translates
the number as nearly as possible to the middle of the extended-real exponent range so that,
if desired, it can be used in subsequent scaled operations with less risk of causing further
exceptions.
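The effect of the bias adjustment can be checked with simple exponent arithmetic. The following Python sketch assumes the extended-real format's unbiased exponent range for normalized numbers, −16382 to +16383; the names are illustrative.

```python
BIAS_ADJUST = 3 * 2**13          # the scaling bias value, 24,576

EXT_MIN_EXPONENT = -16382        # smallest normalized extended-real exponent
EXT_MAX_EXPONENT = 16383         # largest extended-real exponent


def adjust_overflowed_exponent(true_exponent):
    """Return the exponent stored after dividing the result by 2**24576."""
    return true_exponent - BIAS_ADJUST


# Multiplying two values near 2**16383 gives a true exponent near 32766,
# which overflows the format before the adjustment.
adjusted = adjust_overflowed_exponent(32766)
```

The adjusted exponent, 8190, lies well inside the representable range. An exception handler that knows the bias can recover the true exponent by adding 24,576 back.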
When using the FSCALE instruction, massive overflow can occur, where the result is too large
to be represented, even with a bias-adjusted exponent. Here, if overflow occurs again, after the
result has been biased, a properly signed ∞ is stored in the destination operand.
The inexact-result exception flag (PE) is bit 5 of the FPU status word, and the mask bit (PM) is
bit 5 of the FPU control word.
If the inexact-result exception is masked when an inexact-result condition occurs and a numeric
overflow or underflow condition has not occurred, the FPU sets the PE flag and stores the
rounded result in the destination operand. The current rounding mode determines the method
used to round the result (see Section 7.3.4.3., “Rounding Control Field”). The C1 (round-up) bit
in the FPU status word indicates whether the inexact result was rounded up (C1 is set) or “not
rounded up” (C1 is cleared). In the “not rounded up” case, the least-significant bits of the inexact
result are truncated so that the result fits in the destination format.
If the inexact-result exception is not masked when an inexact result occurs and numeric overflow
or underflow has not occurred, the FPU performs the same operation described in the previous
paragraph and, in addition, invokes a software exception handler (see Section 7.7.3., “Software
Exception Handling”).
If an inexact result occurs in conjunction with numeric overflow or underflow, one of the
following operations is carried out:
• If an inexact result occurs along with masked overflow or underflow, the OE or UE flag
and the PE flag are set and the result is stored as described for the overflow or underflow
exceptions (see Section 7.8.4., “Numeric Overflow Exception (#O)” or Section 7.8.5.,
“Numeric Underflow Exception (#U)”). If the inexact-result exception is unmasked, the
FPU also invokes the software exception handler.
• If an inexact result occurs along with unmasked overflow or underflow and the destination
operand is a register, the OE or UE flag and the PE flag are set, the result is stored as
described for the overflow or underflow exceptions, and the software exception handler is
invoked.
• If an inexact result occurs along with unmasked overflow or underflow and the destination
operand is a memory location, the inexact-result condition is ignored.
When a floating-point exception is unmasked and the exception condition occurs, the FPU stops
further execution of the floating-point instruction and signals the exception event. On the next
occurrence of a floating-point instruction or a WAIT/FWAIT instruction in the instruction
stream, the processor checks the ES flag in the FPU status word for pending floating-point
exceptions. If floating-point exceptions are pending, the FPU makes an implicit call (traps) to
the floating-point software exception handler. The exception handler can then execute recovery
procedures for selected or all floating-point exceptions.
Synchronization problems occur in the time frame between when the exception is signaled and
when it is actually handled. Because of concurrent execution, integer or system instructions can
be executed during this time frame. It is thus possible for the source or destination operands for
a floating-point instruction that faulted to be overwritten in memory, making it impossible for
the exception handler to analyze or recover from the exception.
To solve this problem, an exception synchronizing instruction (either a floating-point instruction
or a WAIT/FWAIT instruction) can be placed immediately after any floating-point instruction
that might present a situation where state information pertaining to a floating-point exception
might be lost or corrupted. Floating-point instructions that store data in memory are prime candi-
dates for synchronization. For example, the following three lines of code have the potential for
exception synchronization problems:
FILD COUNT ; Floating-point instruction
INC COUNT ; Integer instruction
FSQRT ; Subsequent floating-point instruction
In this example, the INC instruction modifies the source operand of the floating-point instruction (FILD).
If an exception is signaled during the execution of the FILD instruction, the value in the
COUNT memory location might be overwritten before the exception handler is called.
Rearranging the instructions, as follows, so that the FSQRT instruction follows the FILD
instruction, synchronizes the exception handling and eliminates the possibility of the exception
being handled incorrectly.
FILD COUNT ; Floating-point instruction
FSQRT ; Subsequent floating-point instruction synchronizes
; any exceptions generated by the FILD instruction.
INC COUNT ; Integer instruction
The FSQRT instruction does not require any synchronization, because the results of this instruc-
tion are stored in the FPU data registers and will remain there, undisturbed, until the next
floating-point or WAIT/FWAIT instruction is executed. To absolutely ensure that any exceptions
emanating from the FSQRT instruction are handled (for example, prior to a procedure call), a
WAIT instruction can be placed directly after the FSQRT instruction.
Note that some floating-point instructions (non-waiting instructions) do not check for pending
unmasked exceptions (see Section 7.5.11., “FPU Control Instructions”). They include the
FNINIT, FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX instructions. When an
FNINIT, FNSTENV, FNSAVE, or FNCLEX instruction is executed, all pending exceptions are
essentially lost (either the FPU status register is cleared or all exceptions are masked). The
FNSTSW and FNSTCW instructions do not check for pending interrupts, but they do not
modify the FPU status and control registers. A subsequent “waiting” floating-point instruction
can then handle any pending exceptions.
CHAPTER 8
PROGRAMMING WITH THE INTEL
MMX™ TECHNOLOGY
The Intel MMX technology comprises a set of extensions to the Intel Architecture that are
designed to greatly enhance the performance of advanced media and communications applica-
tions. These extensions (which include new registers, data types, and instructions) are combined
with a single-instruction, multiple-data (SIMD) execution model to accelerate the performance
of applications such as motion video, combined graphics with video, image processing, audio
synthesis, speech synthesis and compression, telephony, video conferencing, and 2D and 3D
graphics, which typically use compute-intensive algorithms to perform repetitive operations on
large arrays of simple, native data elements.
The MMX technology defines a simple and flexible software model, with no new mode or oper-
ating-system visible state. All existing software will continue to run correctly, without modifi-
cation, on Intel Architecture processors that incorporate the MMX technology, even in the
presence of existing and new applications that incorporate this technology.
The following sections of this chapter describe the MMX technology’s basic programming envi-
ronment, including the MMX register set, data types, and instruction set. Detailed descriptions
of the MMX instructions are provided in Chapter 3, Instruction Set Reference, of the Intel Archi-
tecture Software Developer’s Manual, Volume 2. The manner in which the MMX technology is
integrated into the Intel Architecture system programming model is described in Chapter 10,
MMX™ Technology System Programming Model, in the Intel Architecture Software Developer’s
Manual, Volume 3.
[Figure: the eight 64-bit MMX registers, MM7 through MM0 (bits 63 through 0).]
Although the MMX registers are defined in the Intel Architecture as separate registers, they are
aliased to the registers in the FPU data register stack (R0 through R7). (See Chapter 10, MMX™
Technology System Programming Model, in the Intel Architecture Software Developer’s
Manual, Volume 3, for a more detailed discussion of the aliasing of MMX registers.)
The MMX instructions move the packed data types (packed bytes, packed words, or packed
doublewords) and the quadword data type to-and-from memory or to-and-from the Intel Archi-
tecture general-purpose registers in 64-bit blocks. However, when performing arithmetic or
logical operations on the packed data types, the MMX instructions operate in parallel on the
individual bytes, words, or doublewords contained in a 64-bit MMX register, as described in the
following section (Section 8.1.3., “Single Instruction, Multiple Data (SIMD) Execution
Model”).
When operating on the bytes, words, and doublewords within packed data types, the MMX
instructions recognize and operate on both signed and unsigned byte integers, word integers, and
doubleword integers.
The SIMD execution model supported in the MMX technology directly addresses the needs of
modern media, communications, and graphics applications, which often use sophisticated algo-
rithms that perform the same operations on a large number of small data types (bytes, words, and
doublewords). For example, most audio data is represented in 16-bit (word) quantities. The
MMX instructions can operate on 4 of these words simultaneously with one instruction. Video
and graphics information is commonly represented as palettized 8-bit (byte) quantities. Here,
one MMX instruction can operate on 8 of these bytes simultaneously.
[Figure: layout of eight packed bytes in a 64-bit quantity. Byte 7 occupies bits 63 to 56, Byte 6 bits 55 to 48, Byte 5 bits 47 to 40, Byte 4 bits 39 to 32, Byte 3 bits 31 to 24, Byte 2 bits 23 to 16, Byte 1 bits 15 to 8, and Byte 0 bits 7 to 0.]
• Comparison instructions
• Conversion instructions
• Logical instructions
• Shift instructions
• Empty MMX™ state instruction (EMMS)
When operating on packed data within an MMX register, the data is cast by the type specified
by the instruction. For example, the PADDB (add packed bytes) instruction treats the packed
data in an MMX register as 8 packed bytes; whereas, the PADDW (add packed words) instruc-
tion treats the packed data as 4 packed words.
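The difference between the two interpretations can be illustrated with a small software model. The following Python sketch adds two 64-bit quantities lane by lane with wraparound; the function names mirror the instruction mnemonics but are only a model of the arithmetic, not of the instructions themselves.

```python
def padd(a, b, lane_bits):
    """Add two 64-bit values lane by lane, wrapping around within each lane."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        # The low lane_bits of the sum depend only on the low lane_bits of
        # each addend, so masking after the add isolates this lane.
        lane = ((a >> shift) + (b >> shift)) & mask
        result |= lane << shift
    return result


def paddb(a, b):
    """Treat the operands as 8 packed bytes."""
    return padd(a, b, 8)


def paddw(a, b):
    """Treat the operands as 4 packed words."""
    return padd(a, b, 16)
```

With a = 00FF00FF00FF00FFH and b = 0001000100010001H, the byte interpretation wraps each FFH + 01H lane to 00H and yields 0, while the word interpretation adds 00FFH + 0001H in each lane and yields 0100010001000100H. The same bits give different results because the carry out of bit 7 is discarded in one case and propagated in the other.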
For example, when the result exceeds the data range limit for signed bytes, it is saturated to 7FH
(FFH for unsigned bytes). If a value is less than the data range limit, it is saturated to 80H for
signed bytes (00H for unsigned bytes).
Saturation provides a useful feature of avoiding wraparound artifacts. In the example of color
calculations, saturation causes a color to remain pure black or pure white without allowing for
an inversion.
MMX instructions do not indicate overflow or underflow occurrence by generating exceptions
or setting flags.
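The saturation limits given above for bytes can be modeled as follows. This Python sketch works on single 8-bit lanes passed as encodings in the range 00H to FFH; the function names are hypothetical.

```python
def to_signed_byte(x):
    """Interpret an 8-bit encoding (0..255) as a two's-complement value."""
    return x - 256 if x & 0x80 else x


def add_signed_saturate_byte(a, b):
    """Add two signed bytes, clamping the result to the range 80H..7FH."""
    total = to_signed_byte(a) + to_signed_byte(b)
    total = max(-128, min(127, total))
    return total & 0xFF


def add_unsigned_saturate_byte(a, b):
    """Add two unsigned bytes, clamping the result at FFH."""
    return min(0xFF, a + b)


def sub_unsigned_saturate_byte(a, b):
    """Subtract two unsigned bytes, clamping the result at 00H."""
    return max(0x00, a - b)
```

Adding 1 to the signed byte 7FH leaves 7FH instead of wrapping to 80H, and subtracting 1 from the unsigned byte 00H leaves 00H instead of wrapping to FFH. No flag or exception records that the clamp occurred, which matches the statement above.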
implement a packed conditional move operation without a branch or a set of branch instructions.
No flags are set.
These instructions support packed byte, packed word and packed doubleword data types.
See the section titled “Instruction Prefixes” in Chapter 2 of the Intel Architecture Software
Developer’s Manual, Volume 2, for detailed information on prefixes.
NOTE
The CPUID instruction will continue to report the existence of the MMX
technology if the CR0.EM bit is set (which signifies that the CPU is
configured to generate exception interrupt 7 that can be used to emulate
floating point instructions). In this case, executing an MMX instruction
results in an invalid opcode exception.
Example 8-1 illustrates how to use the CPUID instruction. This example does not represent the
entire CPUID sequence, but shows the portion used for detection of MMX technology.
Example 8-1. Partial Routine for Detecting MMX™ Technology with the CPUID Instruction
... ; identify existence of CPUID instruction
...
... ; identify Intel processor
....
mov EAX, 1 ; request for feature flags
CPUID ; 0Fh, 0A2h CPUID instruction
test EDX, 00800000h ; Is IA MMX technology bit (Bit 23 of EDX)
; in feature flags set?
jnz MMX_Technology_Found
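The bit test in Example 8-1 can be restated in a high-level form. The following Python sketch assumes that the EDX value returned by CPUID with EAX = 1 has already been obtained by some other means; the function name and the sample values are hypothetical.

```python
MMX_FEATURE_BIT = 23
MMX_FEATURE_MASK = 1 << MMX_FEATURE_BIT   # 00800000H, the constant used with TEST


def has_mmx(edx_feature_flags):
    """Return True if bit 23 of the CPUID feature flags (EDX) is set."""
    return bool(edx_feature_flags & MMX_FEATURE_MASK)
```

For instance, `has_mmx(0x00800000)` is true and `has_mmx(0x007FFFFF)` is false. As the note above explains, a set bit reports only that the processor implements the MMX technology; it does not reflect the setting of the CR0.EM bit.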
point tag word as empty. Therefore, it is imperative to use the EMMS instruction at the end of
every MMX routine, if the next routine may contain FPU code.
The EMMS instruction must be used in each of the following cases:
• When an application using the floating-point instructions calls an MMX™ technology
library/DLL. (Use the EMMS instruction at the end of the MMX code.)
• When an application using MMX instructions calls a floating-point library/DLL. (Use the
EMMS instruction before calling the floating-point code.)
• When a switch is made between MMX code in a task/thread and other tasks/threads in
cooperative operating systems, unless it is certain that more MMX instructions will be
executed before any FPU code.
If the EMMS instruction is not used when trying to execute a floating-point instruction, the
following may occur:
• Depending on the exception mask bits of the floating-point control word, a floating-point
exception event may be generated.
• A “soft exception” may occur. In this case floating-point code continues to execute, but
generates incorrect results. This happens when the floating-point exceptions are masked
and no visible exceptions occur. The internal exception handler (microcode, not user
visible) loads a NaN (Not a Number) with an exponent of 11..11B onto the floating-point
stack. The NaN is used for further calculations, yielding incorrect results.
• A potential error may occur only if the operating system does NOT manage floating-point
context across task switches. These operating systems are usually cooperative operating
systems. It is imperative that the EMMS instruction execute at the end of all the MMX™
routines that may enable a task switch immediately after they end execution (explicit yield
API or implicit yield API).
• Passing parameters to an MMX™ routine by passing a pointer to a structure via the integer
stack.
• Returning a value from a function by returning the pointer to a structure.
FP_code 1:
..
.. (*leave the FPU stack empty*)
CHAPTER 9
INPUT/OUTPUT
In addition to transferring data to and from external memory, Intel Architecture processors can
also transfer data to and from input/output ports (I/O ports). I/O ports are created in system hardware
by circuitry that decodes the control, data, and address pins on the processor. These I/O
ports are then configured to communicate with peripheral devices. An I/O port can be an input
port, an output port, or a bidirectional port. Some I/O ports are used for transmitting data, such
as to and from the transmit and receive registers, respectively, of a serial interface device. Other
I/O ports are used to control peripheral devices, such as the control registers of a disk controller.
This chapter describes the processor’s I/O architecture. The topics discussed include:
• I/O port addressing.
• I/O instructions.
• I/O protection mechanism.
[Figure: physical memory map (top address FFFF FFFFH) showing EPROM, I/O port, and RAM regions]
All the Intel Architecture processors that have on-chip caches also provide the PCD (page-level
cache disable) flag in page table and page directory entries. This flag allows caching to be
disabled on a page-by-page basis. See “Page-Directory and Page-Table Entries” in Chapter 3 of
the Intel Architecture Software Developer’s Manual, Volume 3.
When used with one of the repeat prefixes (such as REP), the INS and OUTS instructions
perform string (or block) input or output operations. The repeat prefix REP modifies the INS and
OUTS instructions to transfer blocks of data between an I/O port and memory. Here, the ESI or
EDI register is incremented or decremented (according to the setting of the DF flag in the
EFLAGS register) after each byte, word, or doubleword is transferred between the selected I/O
port and memory.
See the individual references for the IN, INS, OUT, and OUTS instructions in Chapter 3,
Instruction Set Reference, of the Intel Architecture Software Developer’s Manual, Volume 2, for
more information on these instructions.
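The effect of the REP prefix on INS can be sketched in C. This is a minimal model under stated assumptions, not processor documentation: read_port_byte is a hypothetical stand-in for the selected I/O port, mem stands for memory, and the edi, ecx, and df parameters model the EDI register, the ECX repeat count, and the DF flag for a byte-sized transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an input port: returns 10H, 11H, 12H, ... */
static uint8_t fake_port_next = 0x10;
static uint8_t read_port_byte(void) { return fake_port_next++; }

/* Model of REP INSB: perform ecx byte transfers from the port to memory.
   After each transfer the modeled EDI is incremented (DF = 0) or
   decremented (DF = 1). Returns the final value of the modeled EDI. */
static size_t rep_insb(uint8_t *mem, size_t edi, size_t ecx, int df)
{
    assert(mem != NULL);
    while (ecx != 0) {
        mem[edi] = read_port_byte();
        edi = df ? edi - 1 : edi + 1;
        ecx--;
    }
    return edi;
}
```

In this model, four transfers starting at index 0 with DF clear leave the modeled EDI at 4; with DF set, the index moves downward by one per transfer instead.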
The I/O permission bit map in the TSS can be used to modify the effect of the IOPL on I/O sensitive
instructions, allowing access to some I/O ports by less privileged programs or tasks (see
Section 9.5.2., “I/O Permission Bit Map”).
A program or task can change its IOPL only with the POPF and IRET instructions; however,
such changes are privileged. No procedure may change the current IOPL unless it is running at
privilege level 0. An attempt by a less privileged procedure to change the IOPL does not result
in an exception; the IOPL simply remains unchanged.
The POPF instruction also may be used to change the state of the IF flag (as can the CLI and
STI instructions); however, the POPF instruction in this case is also I/O sensitive. A procedure
may use the POPF instruction to change the setting of the IF flag only if the CPL is less than or
equal to the current IOPL. An attempt by a less privileged procedure to change the IF flag does
not result in an exception; the IF flag simply remains unchanged.
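The two POPF rules above can be summarized in a small C model. It is a sketch that covers only the IOPL and IF fields in protected mode and leaves every other EFLAGS bit unchanged, which is a simplification; the function name popf_model and its parameters are illustrative, not part of any Intel interface. The bit positions used (IF at bit 9, IOPL at bits 12 and 13) are the architectural EFLAGS positions.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF        (1u << 9)   /* interrupt enable flag */
#define EFLAGS_IOPL_MASK (3u << 12)  /* I/O privilege level field */

static unsigned iopl_of(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> 12;
}

/* Returns the EFLAGS image after POPF loads 'popped' at privilege 'cpl'.
   Fields the procedure may not change keep their old value; no exception
   is modeled, matching the "simply remains unchanged" behavior. */
static uint32_t popf_model(uint32_t eflags, uint32_t popped, unsigned cpl)
{
    assert(cpl <= 3);
    uint32_t writable = 0;
    if (cpl == 0)
        writable |= EFLAGS_IOPL_MASK;   /* IOPL changes only at CPL 0 */
    if (cpl <= iopl_of(eflags))
        writable |= EFLAGS_IF;          /* IF changes only if CPL <= IOPL */
    return (eflags & ~writable) | (popped & writable);
}
```

A procedure at CPL 3 with IOPL 0 therefore gets its old IOPL and IF back regardless of the popped value, while the same pop at CPL 0 takes effect.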
Because each task has its own TSS, each task has its own I/O permission bit map. Access to individual
I/O ports can thus be granted to individual tasks.
If in protected mode and the CPL is less than or equal to the current IOPL, the processor allows
all I/O operations to proceed. If the CPL is greater than the IOPL or if the processor is operating
in virtual-8086 mode, the processor checks the I/O permission bit map to determine if access to
a particular I/O port is allowed. Each bit in the map corresponds to an I/O port byte address. For
example, the control bit for I/O port address 29H in the I/O address space is found at bit position
1 of the sixth byte in the bit map. Before granting I/O access, the processor tests all the bits corresponding
to the I/O port being addressed. For a doubleword access, for example, the processor
tests the four bits corresponding to the four adjacent 8-bit port addresses. If any tested bit is set,
a general-protection exception (#GP) is signaled. If all tested bits are clear, the I/O operation is
allowed to proceed.
Because I/O port addresses are not necessarily aligned to word and doubleword boundaries, the
processor reads two bytes from the I/O permission bit map for every access to an I/O port. To
prevent exceptions from being generated when the ports with the highest addresses are accessed,
an extra byte needs to be included in the TSS immediately after the table. This byte must have all
of its bits set, and it must be within the segment limit.
It is not necessary for the I/O permission bit map to represent all the I/O addresses. I/O addresses
not spanned by the map are treated as if they had set bits in the map. For example, if the TSS
segment limit is 10 bytes past the bit-map base address, the map has 11 bytes and the first 80 I/O
ports are mapped. Higher addresses in the I/O address space generate exceptions.
If the I/O bit map base address is greater than or equal to the TSS segment limit, there is no I/O
permission map, and all I/O instructions generate exceptions when the CPL is greater than the
current IOPL. The I/O bit map base address must be less than or equal to DFFFH.
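The bit-map test described in this section can be sketched in C as follows. This is an illustrative model only: the function name io_access_allowed and its parameters are assumptions, map_len counts the bytes of the map proper (excluding the trailing all-ones byte), and a return value of 0 stands for the general-protection exception.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every port in [port, port + width) has a clear bit in the
   map, 0 if the access would be refused. Ports not spanned by the map are
   treated as if their bits were set. */
static int io_access_allowed(const uint8_t *map, size_t map_len,
                             uint32_t port, unsigned width)
{
    assert(map != NULL);
    assert(width == 1 || width == 2 || width == 4);
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = port + i;
        size_t byte = p / 8;      /* port 29H -> byte index 5 (sixth byte) */
        unsigned bit = p % 8;     /* port 29H -> bit position 1 */
        if (byte >= map_len)
            return 0;             /* beyond the map: treated as set */
        if (map[byte] & (1u << bit))
            return 0;             /* set bit: access refused */
    }
    return 1;
}
```

With a 10-byte map (80 ports) in which only the bit for port 29H is set, a byte access to port 28H is allowed, a doubleword access starting at port 26H is refused because it touches port 29H, and any access to port 80 or above is refused.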
Another method of enforcing program order is to insert one of the serializing instructions, such
as the CPUID instruction, between operations. See Chapter 7, Multiple Processor Management,
in the Intel Architecture Software Developer’s Manual, Volume 3, for more information on serialization
of instructions.
It should be noted that the chip set being used to support the processor (bus controller, memory
controller, and/or I/O controller) may post writes to uncacheable memory which can lead to out-
of-order execution of memory accesses. In situations where out-of-order processing of memory
accesses by the chip set can potentially cause faulty memory-mapped I/O processing, code must
be written to force synchronization and ordering of I/O operations. Serializing instructions can
often be used for this purpose.
When the I/O address space is used instead of memory-mapped I/O, the situation is different in
two respects:
• The processor never buffers I/O writes. Therefore, strict ordering of I/O operations is
enforced by the processor. (As with memory-mapped I/O, it is possible for a chip set to
post writes in certain I/O ranges.)
• The processor synchronizes I/O instruction execution with external bus activity (see Table
9-1).
IN Yes Yes
INS Yes Yes
REP INS Yes Yes
OUT Yes Yes Yes
OUTS Yes Yes Yes
REP OUTS Yes Yes Yes
CHAPTER 10
PROCESSOR IDENTIFICATION AND FEATURE
DETERMINATION
When writing software intended to run on several different types of Intel Architecture processors,
it is generally necessary to identify the type of processor present in a system and the
processor features that are available to an application. This chapter describes how to identify the
processor that is executing the code and determine the features the processor supports. It also
shows how to determine if an FPU or NPX is present. See Chapter 17, Intel Architecture
Compatibility, in the Intel Architecture Software Developer’s Manual, Volume 3, for a complete
list of the features that are available for the different Intel Architecture processors.
APPENDIX A
EFLAGS CROSS-REFERENCE
The cross-reference in Table A-1 summarizes how the flags in the processor’s EFLAGS register
are affected by each instruction. For detailed information on how flags are affected, see Chapter
3, Instruction Set Reference in the Intel Architecture Software Developer’s Manual, Volume 2.
The following codes describe how the flags are affected:
Instruction             OF  SF  ZF  AF  PF  CF  TF  IF  DF  NT  RF
RDMSR
RDPMC
RDTSC
REP/REPE/REPNE
RET
ROL/ROR 1               M                   M
ROL/ROR count           —                   M
RSM                     M   M   M   M   M   M   M   M   M   M   M
SAHF                        R   R   R   R   R
SAL/SAR/SHL/SHR 1       M   M   M   —   M   M
SAL/SAR/SHL/SHR count   —   M   M   —   M   M
SBB                     M   M   M   M   M   TM
SCAS                    M   M   M   M   M   M           T
SETcc                   T   T   T       T   T
SGDT/SIDT/SLDT/SMSW
SHLD/SHRD               —   M   M   —   M   M
STC                                         1
STD                                                     1
STI                                                 1
STOS                                                    T
STR
SUB                     M   M   M   M   M   M
TEST                    0   M   M   —   M   0
UD2
VERR/VERRW                      M
WAIT
WBINVD
WRMSR
XADD                    M   M   M   M   M   M
XCHG
XLAT
XOR                     0   M   M   —   M   0
APPENDIX B
EFLAGS CONDITION CODES
Table B-1 gives all the condition codes that can be tested for by the CMOVcc, FCMOVcc, Jcc
and SETcc instructions. The condition codes refer to the setting of one or more status flags (CF,
OF, SF, ZF, and PF) in the EFLAGS register. The “Mnemonic” column gives the suffix (cc) added
to the instruction to specify the test condition. The “Condition Tested For” column describes
the condition specified in the “Status Flags Setting” column. The “Instruction Subcode” column
gives the opcode suffix added to the main opcode to specify a test condition.
Mnemonic (cc)  Condition Tested For       Instruction Subcode  Status Flags Setting
O              Overflow                   0000                 OF = 1
NO             No overflow                0001                 OF = 0
B              Below                      0010                 CF = 1
NAE            Neither above nor equal
NB             Not below                  0011                 CF = 0
AE             Above or equal
E              Equal                      0100                 ZF = 1
Z              Zero
NE             Not equal                  0101                 ZF = 0
NZ             Not zero
BE             Below or equal             0110                 (CF OR ZF) = 1
NA             Not above
NBE            Neither below nor equal    0111                 (CF OR ZF) = 0
A              Above
S              Sign                       1000                 SF = 1
NS             No sign                    1001                 SF = 0
P              Parity                     1010                 PF = 1
PE             Parity even
NP             No parity                  1011                 PF = 0
PO             Parity odd
L              Less                       1100                 (SF XOR OF) = 1
NGE            Neither greater nor equal
NL             Not less                   1101                 (SF XOR OF) = 0
GE             Greater or equal
Many of the test conditions are described in two different ways. For example LE (less or equal)
and NG (not greater) describe the same test condition. Alternate mnemonics are provided to
make code more intelligible.
The terms “above” and “below” are associated with the CF flag and refer to the relation between
two unsigned integer values. The terms “greater” and “less” are associated with the SF and OF
flags and refer to the relation between two signed integer values.
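The relationship between the 4-bit instruction subcode and the status flags can be sketched in C. This is an illustrative model, not an Intel-supplied routine: the name condition_holds is hypothetical, and the FLAG_* masks are the architectural EFLAGS bit positions. The upper three bits of the subcode select a flag expression and the low bit inverts it. Subcodes 1110 and 1111 (LE/NG and NLE/G) are not among the rows listed here; their handling in the default branch follows the standard Intel Architecture encoding and should be read as an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Returns 1 if the condition selected by 'subcode' (0000B..1111B) is true
   for the given EFLAGS image, 0 otherwise. */
static int condition_holds(unsigned subcode, uint32_t eflags)
{
    assert(subcode <= 0xF);
    int cf = (eflags & FLAG_CF) != 0;
    int pf = (eflags & FLAG_PF) != 0;
    int zf = (eflags & FLAG_ZF) != 0;
    int sf = (eflags & FLAG_SF) != 0;
    int of = (eflags & FLAG_OF) != 0;
    int result;

    switch (subcode >> 1) {
    case 0:  result = of;             break;  /* O / NO */
    case 1:  result = cf;             break;  /* B / NB */
    case 2:  result = zf;             break;  /* E / NE */
    case 3:  result = cf | zf;        break;  /* BE / NBE */
    case 4:  result = sf;             break;  /* S / NS */
    case 5:  result = pf;             break;  /* P / NP */
    case 6:  result = sf ^ of;        break;  /* L / NL */
    default: result = (sf ^ of) | zf; break;  /* LE / NLE (assumed) */
    }
    return (subcode & 1) ? !result : result;  /* low bit negates the test */
}
```

Because the unsigned tests read CF and the signed tests read SF and OF, the same comparison result can satisfy B (below) without satisfying L (less), and vice versa.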
APPENDIX C
FLOATING-POINT EXCEPTIONS SUMMARY
Table C-1 lists the floating-point instruction mnemonics in alphabetical order. For each
mnemonic, it summarizes the exceptions that the instruction may cause. See Section 7.8.,
“Floating-Point Exception Conditions”, for a detailed discussion of the floating-point exceptions.
The following codes indicate the floating-point exceptions:
APPENDIX D
GUIDELINES FOR WRITING FPU
EXCEPTION HANDLERS
As described in Chapter 7, Floating-Point Unit, the Intel Architecture supports two mechanisms
for accessing exception handlers to handle unmasked FPU exceptions: native mode and
MS-DOS compatibility mode. The primary purpose of this appendix is to provide detailed information
to help software engineers design and write FPU exception-handling facilities to run on PC
systems that use the MS-DOS compatibility mode (see footnote 1) for handling FPU exceptions. Some of the
information in this appendix will also be of interest to engineers who are writing native-mode
FPU exception handlers. The information provided is as follows:
• Discussion of the origin of the MS-DOS* FPU exception handling mechanism and its
relationship to the FPU’s native exception handling mechanism.
• Description of the Intel Architecture flags and processor pins that control the MS-DOS
FPU exception handling mechanism.
• Description of the external hardware typically required to support MS-DOS exception
handling mechanism.
• Description of the FPU’s exception handling mechanism and the typical protocol for FPU
exception handlers.
• Code examples that demonstrate various levels of FPU exception handlers.
• Discussion of FPU considerations in multitasking environments.
• Discussion of native mode FPU exception handling.
The information given is oriented toward the most recent generations of Intel Architecture
processors, starting with the Intel486. It is intended to augment the reference information given
in Chapter 7, Floating-Point Unit.
A more extensive version of this appendix is available in the application note AP-578, Software
and Hardware Considerations for FPU Exception Handlers for Intel Architecture Processors
(Order Number 242415-001), which is available from Intel.
1. Microsoft Windows* 95 and Windows* 3.1 (and earlier versions) operating systems use almost the same
FPU exception handling interface as the MS-DOS* operating system. The recommendations in this appendix for an
MS-DOS* compatible exception handler thus apply to all three operating systems.
forward for the Pentium Pro processors, as described in Section D.2.2., “MS-DOS* Compatibility
Mode in the Pentium® Pro Processor”.
For Pentium and Pentium Pro processors, it is important to note that the special DP (Dual
Processing) mode for Pentium Processors and also the more general Intel MultiProcessor Specification
for systems with multiple Pentium or Pentium Pro processors support FPU exception
handling only in the native mode. Intel does not recommend using the MS-DOS compatibility
FPU mode for systems using more than one processor.
If NE=0 but the IGNNE# input is active while an unmasked FPU exception is in effect, the
processor disregards the exception, does not assert FERR#, and continues. If IGNNE# is then
de-asserted and the FPU exception has not been cleared, the processor will respond as described
above. (That is, an immediate exception case will assert FERR# immediately. A deferred exception
case will assert FERR# and freeze just before the next FPU or WAIT instruction.) The assertion
of IGNNE# is intended for use only inside the FPU exception handler, where it is needed if
one wants to execute non-control FPU instructions for diagnosis, before clearing the exception
condition. When IGNNE# is asserted inside the exception handler, a preceding FPU exception
has already caused FERR# to be asserted, and the external interrupt hardware has responded,
but IGNNE# assertion still prevents the freeze at FPU instructions. Note that if IGNNE# is left
active outside of the FPU exception handler, additional FPU instructions may be executed after
a given instruction has caused an FPU exception. In this case, if the FPU exception handler ever
did get invoked, it could not determine which instruction caused the exception.
To properly manage the interface between the processor’s FERR# output, its IGNNE# input, and
the IRQ13 input of the PIC, additional external hardware is needed. A recommended configuration
is described in the following section.
[Figure D-1 diagram not reproduced. Labeled elements: Intel486, Pentium, or Pentium Pro
processor; FF #1; FF #2; FP_IRQ. Legend: FF #n = Flip Flop #n; CLR = Clear or Reset.]
Figure D-1. Recommended Circuit for MS-DOS* Compatibility FPU Exception Handling
In the circuit in Figure D-1, when the FPU exception handler accesses I/O port 0F0H it clears
the IRQ13 interrupt request output from Flip Flop #1 and also clocks out the IGNNE# signal
(active) from Flip Flop #2. So the handler can activate IGNNE#, if needed, by doing this 0F0H
access before clearing the FPU exception condition (which de-asserts FERR#). However, the
circuit does not depend on the order of actions by the FPU exception handler to guarantee the
correct hardware state upon exit from the handler. Flip Flop #2, which drives IGNNE# to the
processor, has its CLEAR input attached to the inverted FERR#. This ensures that IGNNE# can
never be active when FERR# is inactive. So if the handler clears the FPU exception condition
before the 0F0H access, IGNNE# does not get activated and left on after exit from the handler.
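The order-independence argument above can be checked with a small state model in C. This is a sketch of the behavior described in the text, not a description of any particular board: the struct and function names are hypothetical, and the assumption that FF #1 latches IRQ13 when FERR# becomes active is inferred from the surrounding discussion rather than stated in this paragraph.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fpu_irq_circuit {
    bool ferr;   /* true while the processor asserts FERR# */
    bool irq13;  /* output of Flip Flop #1 (interrupt request) */
    bool ignne;  /* output of Flip Flop #2 (IGNNE# active) */
};

/* An unmasked FPU error asserts FERR#; FF #1 latches the request
   (assumption of this sketch). */
static void raise_fpu_error(struct fpu_irq_circuit *c)
{
    assert(c != NULL);
    c->ferr = true;
    c->irq13 = true;
}

/* Access to I/O port 0F0H: clears FF #1 and clocks IGNNE# active out of
   FF #2. FF #2 is held clear while FERR# is inactive. */
static void access_port_f0(struct fpu_irq_circuit *c)
{
    assert(c != NULL);
    c->irq13 = false;
    c->ignne = c->ferr;
}

/* Clearing the FPU exception de-asserts FERR#, which clears FF #2. */
static void clear_fpu_error(struct fpu_irq_circuit *c)
{
    assert(c != NULL);
    c->ferr = false;
    c->ignne = false;
}
```

Whichever of access_port_f0 and clear_fpu_error the handler performs first, the model ends with IRQ13 and IGNNE# both inactive, which is the property the circuit is designed to guarantee.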
All the FPU instructions are implemented such that during their execution, there is a window in
which the processor will sample and accept external interrupts. If there is a pending interrupt,
the processor services the interrupt first before resuming the execution of the instruction. Consequently,
it is possible that the no-wait floating-point instruction may accept the external interrupt
caused by its own assertion of the FERR# pin in the event of a pending unmasked numeric
exception, which is not an explicitly documented behavior of a no-wait instruction. This process
is illustrated in Figure D-3.
[Figure D-3 timing diagram not reproduced. Labeled events: exception-generating floating-point
instruction; assertion of FERR# by the processor; system-dependent delay; assertion of INTR pin
by the system; start of the “no-wait” floating-point instruction; external interrupt sampling
window; window closed. Two cases are shown (Case 1 and Case 2).]
There are two other ways, in addition to Case 1 above, in which a no-wait floating-point instruction
can service a numeric exception inside its interrupt window. First, the first floating-point
error condition could be of the “immediate” category (as defined in Section D.2.1.1., “Basic
Rules: When FERR# Is Generated”) that asserts FERR# immediately. If the system delay before
asserting INTR is long enough, relative to the time elapsed before the no-wait floating-point instruction,
INTR can be asserted inside the interrupt window for the latter. Second, consider two
no-wait FPU instructions in close sequence, and assume that a previous FPU instruction has
caused an unmasked numeric exception. Then if the INTR timing is too long for an FERR# signal
triggered by the first no-wait instruction to hit the first instruction’s interrupt window, it
could catch the interrupt window of the second.
The possible malfunction of a no-wait FPU instruction explained above cannot happen if the instruction
is being used in the manner for which Intel originally designed it. The no-wait instructions
were intended to be used inside the FPU exception handler, to allow manipulation of the
FPU before the error condition is cleared, without hanging the processor because of the FPU error
condition, and without the need to assert IGNNE#. They will perform this function correctly,
since before the error condition is cleared, the assertion of FERR# that caused the FPU error
handler to be invoked is still active. Thus the logic that would assert FERR# briefly at a no-wait
instruction causes no change since FERR# is already asserted. The no-wait instructions may also
be used without problem in the handler after the error condition is cleared, since now they will
not cause FERR# to be asserted at all.
If a no-wait instruction is used outside of the FPU exception handler, it may malfunction as explained
above, depending on the details of the hardware interface implementation and which
particular processor is involved. The actual interrupt inside the window in the no-wait instruction
may be blocked by surrounding it with the instructions: PUSHFD, CLI, no-wait, then
POPFD. (CLI blocks interrupts, and the push and pop of flags preserves and restores the original
value of the interrupt flag.) However, if FERR# was triggered by the no-wait, its latched value
and the PIC response will still be in effect. Further code can be used to check for and correct
such a condition, if needed. Section D.3.6., “Considerations When FPU Shared Between Tasks”,
discusses an important example of this type of problem and gives a solution.
Although FERR# is asserted immediately upon detection of an unmasked FPU error, this
certainly does not mean that the requested interrupt will always be serviced before the next
instruction in the code sequence is executed. To begin with, the Pentium Pro processor executes
several instructions simultaneously. There also will be a delay, which depends on the external
hardware implementation, between the FERR# assertion from the processor and the responding
INTR assertion to the processor. Further, the interrupt request to the PICs (IRQ13) may be
temporarily blocked by the operating system, or delayed by higher priority interrupts, and
processor response to INTR itself is blocked if the operating system has cleared the IF bit in
EFLAGS.
However, just as with the Intel486 and Pentium processors, if the IGNNE# input is inactive, a
floating point exception which occurred in the previous FPU instruction and is unmasked causes
the processor to freeze immediately when encountering the next WAIT or FPU instruction (except
for no-wait instructions). This means that if the FPU exception handler has not already been
invoked due to the earlier exception (and therefore, the handler has not cleared that exception
state from the FPU), the processor is forced to wait for the handler to be invoked and handle the
exception, before the processor can execute another WAIT or FPU instruction.
As explained in Section D.2.1.3., “No-Wait FPU Instructions Can Get FPU Interrupt in
Window”, if a no-wait instruction is used outside of the FPU exception handler, in the Intel486
and Pentium processors, it may accept an unmasked exception from a previous FPU instruction
which happens to fall within the external interrupt sampling window that is opened near the
beginning of execution of all FPU instructions. This will not happen in the Pentium Pro
processor, because this sampling window has been removed from the no-wait group of FPU
instructions.
2. #Z — Divide-by-zero
3. #D — Denormalized operand
4. #O — Numeric overflow
5. #U — Numeric underflow
6. #P — Inexact result (precision)
For complete details on these exceptions and their defaults, see Section 7.7., “Floating-Point
Exception Handling” and Section 7.8., “Floating-Point Exception Conditions”.
single instruction, because it continues executing the instruction after performing its masked
response. For example, the FPU could detect a denormalized operand, perform its masked
response to this exception, and then detect an underflow.
As an example of how even severe exceptions can be handled safely and automatically using the
default exception responses, consider a calculation of the parallel resistance of several values
using only the standard formula (see Figure D-4). If R1 becomes zero, the circuit resistance
becomes zero. With the divide-by-zero and precision exceptions masked, the processor will
produce the correct result. FDIV of R1 into 1 gives infinity, and then FDIV of (infinity + 1/R2 + 1/R3)
into 1 gives zero.
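[Figure D-4: resistors R1, R2, and R3 connected in parallel]

$$\text{Equivalent Resistance} = \frac{1}{\dfrac{1}{R_1} + \dfrac{1}{R_2} + \dfrac{1}{R_3}}$$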
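The parallel-resistance calculation can be reproduced in C as a sanity check on the masked-exception behavior. The sketch below assumes an implementation that follows IEEE 754 arithmetic with the default (masked) floating-point environment and is compiled without value-changing optimizations such as fast-math modes; the function name parallel_resistance is illustrative.

```c
#include <assert.h>

/* Equivalent resistance of three parallel resistors using only the
   standard formula. With divide-by-zero masked, 1/0 yields +infinity,
   and 1/infinity yields 0, so a zero-ohm branch gives a zero result
   without any special-case code. */
static double parallel_resistance(double r1, double r2, double r3)
{
    assert(r1 >= 0.0 && r2 >= 0.0 && r3 >= 0.0);
    volatile double one = 1.0;  /* keeps the divisions at run time */
    double sum = one / r1 + one / r2 + one / r3;
    return one / sum;
}
```

Calling parallel_resistance(0.0, 10.0, 20.0) follows the same path as the FDIV sequence in the text: the first reciprocal is infinity, the sum stays infinity, and the final reciprocal is zero.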
• Aborting further execution, or using the exception pointers to build an instruction that will
run without exception and executing it.
Applications programmers should consult their operating system's reference manuals for the
appropriate system response to numerical exceptions. For systems programmers, some details
on writing software exception handlers are provided in Chapter 5, Interrupt and Exception
Handling, in the Intel Architecture Software Developer’s Manual, Volume 3, as well as in
Section D.3.4., “FPU Exception Handling Examples”, in this appendix.
As discussed in Section D.2.1.2., “Recommended External Hardware to Support the MS-DOS*
Compatibility Mode”, some early FERR# to INTR hardware interface implementations are less
robust than the recommended circuit. This is because they depended on the exception handler
to clear the FPU exception interrupt request to the PIC (by accessing port 0F0H) before the
handler causes FERR# to be de-asserted by clearing the exception from the FPU itself. To eliminate
the chance of a problem with this early hardware, Intel recommends that FPU exception
handlers always access port 0F0H before clearing the error condition from the FPU.
been tested and debugged, but in a different system or numeric environment, exceptions may
occur regularly nonetheless. An obvious example would be use of the program with some
numbers beyond the range for which it was designed and tested. The example in Section
D.3.3.2., “Exception Synchronization Examples”, shows a more subtle way in which unexpected
exceptions can occur.
As described in Section D.3.1., “Floating-Point Exceptions and Their Defaults”, depending on
options determined by the software system designer, the processor can perform one of two
possible courses of action when a numeric exception occurs.
• The FPU can provide a default fix-up for selected numeric exceptions. If the FPU performs
its default action for all exceptions, then the need for exception synchronization is not
manifest. However, code is often ported to contexts and operating systems for which it was
not originally designed. The example below illustrates that it is safest to always consider
exception synchronization when designing code that uses the FPU.
• Alternatively, a software exception handler can be invoked to handle the exception. When
a numeric exception is unmasked and the exception occurs, the FPU stops further
execution of the numeric instruction and causes a branch to a software exception handler.
When an FPU exception handler will be invoked, synchronization must always be
considered to assure reliable performance.
The following examples illustrate the need to always consider exception synchronization when
writing numeric code, even when the code is initially intended for execution with exceptions
masked.
In some operating systems supporting the FPU, the numeric register stack is extended to
memory. To extend the FPU stack to memory, the invalid exception is unmasked. A push to a
full register or pop from an empty register sets SF (Stack Fault flag) and causes an invalid operation
exception. The recovery routine for the exception must recognize this situation, fix up the
stack, then perform the original operation. The recovery routine will not work correctly in the
first example shown in the figure. The problem is that the value of COUNT is incremented
before the exception handler is invoked, so that the recovery routine will load an incorrect value
of COUNT, causing the program to fail or behave unreliably.
tion. (The no-wait instructions are discussed in Section 7.5.12., “Waiting Vs. Non-waiting
Instructions”.) Note that the handler must still clear the exception flag(s) before executing the
IRET. If the exception handler uses neither of these techniques the system will be caught in an
endless loop of nested floating point exceptions, and hang.
The body of the exception handler examines the diagnostic information and makes a response
that is necessarily application-dependent. This response may range from halting execution, to
displaying a message, to attempting to repair the problem and proceed with normal execution.
The epilogue essentially reverses the actions of the prologue, restoring the processor so that
normal execution can be resumed. The epilogue must not load an unmasked exception flag into
the FPU or another exception will be requested immediately.
The following code examples show the ASM386/486 coding of three skeleton exception
handlers, with the save spaces given as correct for 32 bit protected mode. They show how
prologues and epilogues can be written for various situations, but the application dependent
exception handling body is just indicated by comments showing where it should be placed.
The first two are very similar; their only substantial difference is their choice of instructions to
save and restore the FPU. The trade-off here is between the increased diagnostic information
provided by FNSAVE and the faster execution of FNSTENV. (Also, after saving the original
contents, FNSAVE re-initializes the FPU, while FNSTENV only masks all FPU exceptions.)
For applications that are sensitive to interrupt latency or that do not need to examine register
contents, FNSTENV reduces the duration of the “critical region,” during which the processor
does not recognize another interrupt request. (See Section 7.3.9., “Saving the FPU’s State”,
for a complete description of the FPU save image.)
After the exception handler body, the epilogues prepare the processor to resume execution from
the point of interruption (i.e., the instruction following the one that generated the unmasked
exception). Notice that the exception flags in the memory image that is loaded into the FPU are
cleared to zero prior to reloading (in fact, in these examples, the entire status word image is
cleared).
Examples D-1 and D-2 assume that the exception handler itself will not cause an unmasked
exception. Where this is a possibility, the general approach shown in Example D-3 can be
employed. The basic technique is to save the full FPU state and then to load a new control word
in the prologue. Note that considerable care should be taken when designing an exception
handler of this type to prevent the handler from being reentered endlessly.
SAVE_ALL PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU STATE IMAGE
PUSH EBP
.
.
MOV EBP, ESP
SUB ESP, 108 ; ALLOCATES 108 BYTES (32-bit PROTECTED MODE SIZE)
SAVE_ENVIRONMENT PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU ENVIRONMENT
PUSH EBP
.
.
MOV EBP, ESP
SUB ESP, 28 ; ALLOCATES 28 BYTES (32-bit PROTECTED MODE SIZE)
;SAVE ENVIRONMENT, RESTORE INTERRUPT ENABLE FLAG (IF)
FNSTENV [EBP-28]
PUSH [EBP + OFFSET_TO_EFLAGS] ; COPY OLD EFLAGS TO STACK TOP
POPFD ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY)
; RESTORE MODIFIED ENVIRONMENT IMAGE
MOV BYTE PTR [EBP-24], 0H
FLDENV [EBP-28]
; DE-ALLOCATE STACK SPACE, RESTORE REGISTERS
MOV ESP, EBP
.
.
POP EBP
;
; RETURN TO INTERRUPTED CALCULATION
IRETD
SAVE_ENVIRONMENT ENDP
.
.
LOCAL_CONTROL DW ? ; ASSUME INITIALIZED
.
.
REENTRANT PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE FOR FPU STATE IMAGE
PUSH EBP
.
.
MOV EBP, ESP
SUB ESP, 108 ; ALLOCATES 108 BYTES (32-bit PROTECTED MODE SIZE)
; SAVE STATE, LOAD NEW CONTROL WORD, RESTORE INTERRUPT ENABLE FLAG (IF)
FNSAVE [EBP-108]
FLDCW LOCAL_CONTROL
PUSH [EBP + OFFSET_TO_EFLAGS] ; COPY OLD EFLAGS TO STACK TOP
POPFD ; RESTORE IF TO VALUE BEFORE FPU EXCEPTION
.
.
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING CODE GOES HERE. AN UNMASKED
; EXCEPTION
;
.
.
; CLEAR EXCEPTION FLAGS IN STATUS WORD (WHICH IS IN MEMORY)
; RESTORE MODIFIED STATE IMAGE
MOV BYTE PTR [EBP-104], 0H
FRSTOR [EBP-108]
; DE-ALLOCATE STACK SPACE, RESTORE REGISTERS
MOV ESP, EBP
.
.
POP EBP
;
; RETURN TO POINT OF INTERRUPTION
IRETD
REENTRANT ENDP
D.3.5. Need for Storing State of IGNNE# Circuit If Using FPU and SMM
The recommended circuit (see Figure D-1) for MS-DOS compatibility FPU exception handling
for Intel486 processors and beyond contains two flip flops. When the FPU exception handler
accesses I/O port 0F0H it clears the IRQ13 interrupt request output from Flip Flop #1 and also
clocks out the IGNNE# signal (active) from Flip Flop #2. The assertion of IGNNE# may be used
by the handler if needed to execute any FPU instruction while ignoring the pending FPU errors.
The problem here is that the state of Flip Flop #2 is effectively an additional (but hidden) status
bit that can affect processor behavior, and so ideally should be saved upon entering SMM, and
restored before resuming normal operation. If this is not done, and the SMM code also saves
the FPU state, and an FPU error handler is in use that relies on IGNNE# assertion, then
(very rarely) the FPU handler will nest inside itself and malfunction. The following example
shows how this can happen.
Suppose that the FPU exception handler includes the following sequence:
FCLEX ; clear the FPU error conditions & thus turn off
      ; FERR# & reset the IGNNE# FF
The problem occurs only if the processor enters SMM between the OUT and the FLDCW
instructions. If that happens, and the SMM code saves the FPU state using FNSAVE, then
the IGNNE# Flip Flop is cleared (because FNSAVE clears the FPU errors and thus
deasserts FERR#). When the processor returns from SMM, it restores the FPU state with
FRSTOR, which reasserts FERR#, but the IGNNE# Flip Flop does not get set. Then, when
the FPU error handler executes the FLDCW instruction, the active error condition causes the
processor to re-enter the FPU error handler from the beginning. This may cause the handler to
malfunction.
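The failure sequence just described can be traced with a small boolean model. The C sketch below is a simplification, not a description of the actual circuit: it reduces FERR# and the IGNNE# flip flop to two flags, ignores timing, and uses hypothetical names throughout (fpu_model, out_port_f0, run_handler, and so on).

```c
#include <stdbool.h>

/* Simplified boolean model of the FERR#/IGNNE# behavior described above.
   All names are hypothetical; real hardware timing is not modeled. */
typedef struct {
    bool error_pending;  /* unmasked FPU error recorded: FERR# asserted */
    bool ignne;          /* output of Flip Flop #2: IGNNE# asserted     */
    int  handler_entries;
} fpu_model;

/* OUT 0F0H: clocks IGNNE# active while FERR# is asserted. */
static void out_port_f0(fpu_model *m) { m->ignne = m->error_pending; }

/* FNSAVE (no-wait): stores the state, then clears the FPU errors, which
   deasserts FERR# and so resets the IGNNE# flip flop. */
static bool fnsave(fpu_model *m)
{
    bool saved = m->error_pending;
    m->error_pending = false;
    m->ignne = false;
    return saved;
}

/* FRSTOR: reloads the saved errors (FERR# reasserted); IGNNE# is untouched. */
static void frstor(fpu_model *m, bool saved) { m->error_pending = saved; }

/* A waiting instruction such as FLDCW: re-enters the handler when an error
   is pending and IGNNE# is not asserted. Returns true if it faulted. */
static bool waiting_instruction(fpu_model *m)
{
    if (m->error_pending && !m->ignne) {
        m->handler_entries++;
        return true;
    }
    return false;
}

/* FCLEX: clears the error conditions, FERR#, and the IGNNE# flip flop. */
static void fclex(fpu_model *m) { m->error_pending = false; m->ignne = false; }

/* Runs the handler sequence once; returns how many times the handler was
   entered (1 = normal, 2 = nested re-entry). */
static int run_handler(bool smm_between_out_and_fldcw)
{
    fpu_model m = { true, false, 1 };   /* entered once with an error pending */
    out_port_f0(&m);
    if (smm_between_out_and_fldcw) {
        bool saved = fnsave(&m);        /* SMM code saves the FPU state */
        frstor(&m, saved);              /* and restores it before resuming */
    }
    if (!waiting_instruction(&m))       /* FLDCW */
        fclex(&m);
    return m.handler_entries;
}
```

In this model, run_handler(false) enters the handler once, while run_handler(true) enters it twice, which is the nesting the text warns about.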
To avoid this problem, Intel recommends two measures:
1. Do not use the FPU for calculations inside SMM code. (The normal power management,
and sometimes security, functions provided by SMM have no need for FPU calculations; if
they are needed for some special case, use scaling or emulation instead.) This eliminates
the need to do FNSAVE/FRSTOR inside SMM code, except when going into a 0 V
suspend state (in which, in order to save power, the CPU is turned off completely, requiring
its complete state to be saved.)
2. The system should not call upon SMM code to put the processor into 0 V suspend while
the processor is running FPU calculations, or just after an interrupt has occurred. Normal
power management protocol avoids this by going into power down states only after timed
intervals in which no system activity occurs.
1. In a software task switch, the operating system uses a sequence of instructions to save the suspending
thread’s state and restore the resuming thread’s state, instead of the single long non-interruptible task
switch operation provided by the Intel Architecture.
2. Although CR0, bit 2, the emulation flag (EM), also causes a DNA exception, do not use the EM bit as a
surrogate for TS. EM means that no floating point unit is available and that floating point instructions must
be emulated. Using EM to trap on task switches is not compatible with the Intel Architecture’s MMX™
technology. If the EM flag is set, MMX instructions raise the invalid opcode exception.
2. Save the FPU contents to the old thread’s save area, typically using an FNSAVE
instruction.
3. Set the FPU Owner variable to identify the currently executing thread.
4. Reload the FPU contents from the new thread’s save area, typically using an FRSTOR
instruction.
5. Clear TS using the CLTS instruction and exit the DNA exception handler.
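Steps 2 through 5 above can be sketched as a small C simulation. This is a minimal model under stated assumptions, not operating system code: structure assignments stand in for FNSAVE and FRSTOR, a boolean stands in for the TS flag, and the ownership test that guards the swap is an assumption of the sketch. All type and function names (fpu_context, fpu_thread, cpu_model, task_switch, dna_handler) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: fpu_context models an FNSAVE image, and cpu_model
   holds the live FPU contents, the TS flag, and the FPU Owner variable. */
typedef struct { unsigned short status_word; double st0; } fpu_context;

typedef struct { fpu_context save_area; } fpu_thread;

typedef struct {
    fpu_context fpu;        /* live FPU contents */
    bool        ts;         /* CR0.TS */
    fpu_thread *fpu_owner;  /* thread whose state is in the FPU */
} cpu_model;

/* Task switch: the FPU state is left in place and TS is set, so the next
   floating point instruction raises the DNA exception. */
static void task_switch(cpu_model *cpu) { cpu->ts = true; }

/* DNA handler. The ownership test is an assumption of this sketch. */
static void dna_handler(cpu_model *cpu, fpu_thread *current)
{
    if (cpu->fpu_owner != current) {
        if (cpu->fpu_owner != NULL)
            cpu->fpu_owner->save_area = cpu->fpu;  /* step 2: FNSAVE */
        cpu->fpu_owner = current;                  /* step 3: set FPU Owner */
        cpu->fpu = current->save_area;             /* step 4: FRSTOR */
    }
    cpu->ts = false;                               /* step 5: CLTS */
}
```

The point of the design is that the swap cost is paid only when a thread other than the current FPU owner actually executes a floating point instruction.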
While this flow covers the basic requirements for speculatively deferred FPU state swaps, there
are some additional subtleties that need to be handled in a robust implementation.
execute a floating point instruction the instruction will fault with the DNA exception because
TS is set.
At this point the handler is entered, and eventually it finds that the current FPU Owner is not the
currently executing thread. To guard the FPU state swap from extraneous numeric exceptions,
the FPU Owner is set to be the kernel. The old owner’s FPU state is saved with FNSAVE, and
the current thread’s FPU state is restored with FRSTOR. Before exiting, the FPU owner is set to
thread B, and the TS bit is cleared.
On exit, thread B resumes execution of the faulting floating point instruction and continues.
Case #2: FPU State Swap with Discarded Numeric Exception
Again, assume two threads A and B, both using the floating point unit. Let A be the thread to
have most recently executed a floating point instruction, but this time let there be a pending
numeric exception. Let B be the currently executing thread. When B starts to execute a floating
point instruction the instruction will fault with the DNA exception and enter the DNA handler.
(If both numeric and DNA exceptions are pending, the DNA exception takes precedence, in
order to support handling the numeric exception in its own context.)
[Flowchart fragments. DNA exception handler: "Current Thread same as FPU Owner?" (Yes/No); on No, "FPU Owner := Kernel". Numeric exception handler: "Is Kernel FPU Owner?" (Yes/No); "Normal Dispatch to Numeric Exception Handler"; "Exit".]
When the FNSAVE starts, it will trigger an interrupt via FERR# because of the pending numeric
exception. After some system dependent delay, the numeric exception handler is entered. It may
be entered before the FNSAVE starts to execute, or it may be entered shortly after execution of
the FNSAVE. Since the FPU Owner is the kernel, the numeric exception handler simply exits,
discarding the exception. The DNA handler resumes execution, completing the FNSAVE of the
old floating point context of thread A and the FRSTOR of the floating point context for thread B.
Thread A eventually gets an opportunity to handle the exception that was discarded during the
task switch. After some time, thread B is suspended, and thread A resumes execution. When
thread A starts to execute a floating point instruction, once again the DNA exception handler
is entered. B’s FPU state is saved with FNSAVE, and A’s FPU state is restored with FRSTOR.
Note that in restoring the FPU state from A’s save area, the pending numeric exception flags
are reloaded into the floating
point status word. Now when the DNA exception handler returns, thread A resumes execution
of the faulting floating point instruction just long enough to immediately generate a numeric
exception, which now gets handled in the normal way. The net result is that the task switch and
resulting FPU state swap via the DNA exception handler cause an extra numeric exception,
which can be safely discarded.
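Case #2 can be traced with a short C simulation. The sketch below is illustrative only: it reduces the FPU state to a single pending-exception flag, delivers the FERR# interrupt as a direct function call at the FNSAVE step, and uses hypothetical names (sim, OWNER_KERNEL, simulate_case2, and so on).

```c
#include <stdbool.h>

enum { THREAD_A = 0, THREAD_B = 1, OWNER_KERNEL = -1 };

/* Hypothetical stand-in for an FNSAVE image: one pending-exception flag. */
typedef struct { bool pending_exception; } fpu_image;

typedef struct {
    fpu_image fpu;           /* live FPU state */
    fpu_image save_area[2];  /* per-thread save areas */
    int  owner;              /* FPU Owner variable */
    int  current;            /* currently executing thread */
    bool ts;                 /* CR0.TS */
    int  discarded, handled; /* counters for this sketch */
} sim;

/* Numeric exception handler: exits at once if the kernel owns the FPU. */
static void numeric_exception_handler(sim *s)
{
    if (s->owner == OWNER_KERNEL) { s->discarded++; return; }
    s->handled++;
    s->fpu.pending_exception = false;   /* normal handling clears the flags */
}

/* DNA handler with the kernel-owner guard around the state swap. */
static void dna_handler(sim *s)
{
    if (s->owner != s->current) {
        int old = s->owner;
        s->owner = OWNER_KERNEL;              /* guard the swap */
        if (s->fpu.pending_exception)
            numeric_exception_handler(s);     /* interrupt raised at FNSAVE */
        if (old != OWNER_KERNEL)
            s->save_area[old] = s->fpu;       /* FNSAVE keeps pending flags */
        s->fpu = s->save_area[s->current];    /* FRSTOR */
        s->owner = s->current;
    }
    s->ts = false;                            /* CLTS */
}

static void switch_to(sim *s, int thread) { s->current = thread; s->ts = true; }

/* A floating point instruction: DNA first if TS is set, then any pending
   numeric exception is delivered in the current thread's context. */
static void execute_fp_instruction(sim *s)
{
    if (s->ts) dna_handler(s);
    if (s->fpu.pending_exception) numeric_exception_handler(s);
}

/* Thread A leaves a pending exception, B runs, then A resumes. */
static sim simulate_case2(void)
{
    sim s = { {true}, { {false}, {false} }, THREAD_A, THREAD_A, false, 0, 0 };
    switch_to(&s, THREAD_B);
    execute_fp_instruction(&s);
    switch_to(&s, THREAD_A);
    execute_fp_instruction(&s);
    return s;
}
```

In this model the exception is discarded once, during the swap to thread B, and handled once, after thread A's state (with its pending flags) has been restored.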
D.4.1. Origin With the Intel 286 and Intel 287, and Intel386™ and Intel 387 Processors
The Intel 286 and Intel 287, and Intel386 and Intel 387 processor/coprocessor pairs are each
provided with ERROR# pins that are recommended to be connected between the processor and
FPU. If this is done, when an unmasked FPU exception occurs, the FPU records the exception,
and asserts its ERROR# pin. The processor recognizes this active condition of the ERROR#
status line immediately before execution of the next WAIT or FPU instruction (except for the
no-wait type) in its instruction stream, and branches to the routine at interrupt vector 16. Thus
an FPU exception is handled before any other FPU instruction (after the one causing the
error) is executed. No-wait instructions are the only exception to this rule: they execute without
triggering the FPU exception interrupt, and the FPU exception remains pending.
Using the dedicated interrupt 16 for FPU exception handling is referred to as the native mode.
It is the simplest approach, and the one recommended most highly by Intel.
Index
Numerics
16-bit
    address size . . . . . 3-5
    operand size . . . . . 3-5
32-bit
    address size . . . . . 3-5
    operand size . . . . . 3-5
A
AAA instruction . . . . . 6-24
AAD instruction . . . . . 6-24
AAM instruction . . . . . 6-24
AAS instruction . . . . . 6-24
AC (alignment check) flag, EFLAGS register . . . . . 3-13
Access rights, segment descriptor . . . . . 4-8, 4-11
ADC instruction . . . . . 6-22
ADD instruction . . . . . 6-22
Address size attribute
    code segment . . . . . 3-14
    description of . . . . . 3-14
    of stack . . . . . 4-3
Address sizes . . . . . 3-5
Addressing modes
    assembler . . . . . 5-9
    base . . . . . 5-8, 5-9
    base plus displacement . . . . . 5-9
    base plus index plus displacement . . . . . 5-9
    base plus index times scale plus displacement . . . . . 5-9
    displacement . . . . . 5-8
    effective address . . . . . 5-8
    immediate operands . . . . . 5-5
    index . . . . . 5-8
    index times scale plus displacement . . . . . 5-9
    memory operands . . . . . 5-6
    register operands . . . . . 5-5
    scale factor . . . . . 5-8
    specifying a segment selector . . . . . 5-6
    specifying an offset . . . . . 5-7
Addressing, segments . . . . . 1-7
Advanced programmable interrupt controller (see APIC)
AF (adjust) flag, EFLAGS register . . . . . 3-11
AH register . . . . . 3-7
Alignment
    of words, doublewords, and quadwords . . . . . 5-1
AND instruction . . . . . 6-25
APIC, presence of . . . . . 10-1
Arctangent, FPU operation . . . . . 7-37
Arithmetic instructions, FPU . . . . . 7-43
Assembler, addressing modes . . . . . 5-9
AX register . . . . . 3-7
B
B (default size) flag, segment descriptor . . . . . 3-14, 4-3
Base (operand addressing) . . . . . 5-8, 5-9
Basic execution environment . . . . . 3-2
B-bit, FPU status word . . . . . 7-15
BCD . . . . . 5-4
BCD integers . . . . . 5-4
    FPU encoding . . . . . 7-28, 7-29
    packed . . . . . 5-4, 6-24
    relationship to status flags . . . . . 3-12
    unpacked . . . . . 5-4, 6-24
BH register . . . . . 3-7
Bias value
    numeric overflow . . . . . 7-51
    numeric underflow . . . . . 7-52
Biased exponent . . . . . 7-5
Binary numbers . . . . . 1-6
Binary-coded decimal (see BCD)
Bit fields . . . . . 5-4
Bit order . . . . . 1-5
BOUND instruction . . . . . 4-15, 6-34, 6-39
BOUND range exceeded exception (#BR) . . . . . 4-15
BP register . . . . . 3-7
Branch prediction . . . . . 2-7
Branching, on FPU condition codes . . . . . 7-15, 7-36
BSF instruction . . . . . 6-29
BSR instruction . . . . . 6-29
BSWAP instruction . . . . . 6-2, 6-17
BT instruction . . . . . 3-10, 3-12, 6-29
BTC instruction . . . . . 3-10, 3-12, 6-29
BTR instruction . . . . . 3-10, 3-12, 6-29
BTS instruction . . . . . 3-10, 3-12, 6-29
Bus interface unit . . . . . 2-8
BX register . . . . . 3-7
Byte . . . . . 5-1
Byte order . . . . . 1-5
C
C1 flag, FPU status word . . . . . 7-13, 7-48, 7-51, 7-53
C2 flag, FPU status word . . . . . 7-13
Call gate . . . . . 4-7
CALL instruction . . . . . 3-14, 4-4, 4-8, 6-31, 6-39
Calls (see Procedure calls)
CBW instruction . . . . . 6-21
CDQ instruction . . . . . 6-22
CF (carry) flag, EFLAGS register . . . . . 3-11
CH register . . . . . 3-7
CLC instruction . . . . . 3-12, 6-37
CLD instruction . . . . . 3-12, 6-37
CLI instruction . . . . . 6-38, 9-4
CMC instruction . . . . . 3-12, 6-37
CMOVcc instructions . . . . . 6-1, 6-16
U
UD2 instruction. . . . . . . . . . . . . . . . . . . . . 6-2, 6-40
UE (numeric underflow exception) flag, FPU status word . . . . . . . . . . 7-14, 7-52
Underflow, FPU exception (see Numeric underflow exception)
Underflow, FPU stack . . . . . . . . . . . . . . 7-47, 7-48
Underflow, numeric . . . . . . . . . . . . . . . . . . . . . .7-7
Un-normal number . . . . . . . . . . . . . . . . . . . . . .7-28
Unsigned integers . . . . . . . . . . . . . 5-4, 6-22, 6-23
Unsupported floating-point formats . . . . . . . . .7-28
Unsupported FPU instructions . . . . . . . . . . . . .7-41
V
Vector (see Interrupt vector)
VIF (virtual interrupt) flag, EFLAGS register . .3-13
VIP (virtual interrupt pending) flag, EFLAGS register . . . . . . . . . . 3-13
Virtual 8086 mode
    description of . . . . . . . . . . . . . . . . . . . . . . . 3-13
    memory model . . . . . . . . . . . . . . . . . . . . . . . 3-4
VM (virtual 8086 mode) flag, EFLAGS register . . . . . . . . . . 3-13
W
Waiting instructions . . . . . . . . . . . . . . . . . . . . .7-40
WAIT/FWAIT instructions . . . . . . . . . . . . 7-40, 7-55
WBINVD instruction . . . . . . . . . . . . . . . . . . . . . .6-2
Word. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1
Wraparound mode (MMX instructions) . . . . . . .8-5
WRMSR instruction . . . . . . . . . . . . . . . . . 6-2, 10-1
X
XADD instruction . . . . . . . . . . . . . . . . . . . 6-2, 6-18
XCHG instruction . . . . . . . . . . . . . . . . . . . . . . .6-17
XLAT/XLATB instruction . . . . . . . . . . . . . . . . .6-40
XOR instruction . . . . . . . . . . . . . . . . . . . . . . . .6-25
Z
ZE (division-by-zero exception) flag, FPU status word . . . . . . . . . . 7-14
Zero, floating-point format . . . . . . . . . . . . . . . . .7-6
ZF (zero) flag, EFLAGS register . . . . . . . . . . .3-11