Chapter 6 - Using Floating-Poin
CHAPTER 6
Math coprocessors (the 8087, 80287, and 80387 chips) work with the main processor to handle real-number calculations. The 80486 processor performs floating-point operations directly. All information in this chapter pertaining to the 80387 coprocessor applies to the 80486DX processor as well. It does not apply to the 80486SX, which does not provide an on-chip coprocessor.
This chapter begins with a summary of the directives and formats of floating-point data that you need to allocate memory storage and initialize variables before you can work with floating-point numbers.
The chapter then explains how to use a math coprocessor for floating-point operations. It covers:
The next main section describes emulation libraries. The emulation routines provided with all Microsoft high-level languages enable you to use coprocessor instructions as though your computer had a math coprocessor. However, some coprocessor instructions are not handled by emulation, as this section explains.
Finally, because math coprocessor and emulation routines can also operate on BCD numbers, this chapter includes the instruction set for these numbers.
Before using floating-point data in your program, you need to allocate the memory storage for the data. You can then initialize variables either as real numbers in decimal form or as encoded hexadecimals. The assembler stores the allocated data in IEEE format, at the size given by the directive. This section covers floating-point declarations and floating-point data formats.
Directive Size
Table 6.1 lists the possible ranges for floating-point variables. The number of significant digits can vary in an arithmetic operation as the least-significant digit may be lost through rounding errors. This occurs regularly for short and long real numbers, so you should assume the lesser value of significant digits shown in Table 6.1. Ten-byte real numbers are much less susceptible to rounding errors for reasons described in the next section. However, under certain circumstances, 10-byte real operations can have a precision of only 18 digits.
With versions of MASM prior to 6.0, the DD, DQ, and DT directives could allocate real constants. MASM 6.1 still supports these directives, but the variables are integers rather than floating-point values. Although this makes no difference in the assembly code, CodeView displays the values incorrectly.
You can specify floating-point constants either as decimal constants or as encoded hexadecimal constants. You can express decimal real-number constants in the form:
For example, the numbers 2.523E1 and -3.6E-2 are written in the correct decimal format. You can use these numbers as initializers for real-number variables.
The assembler always evaluates digits of real numbers as base 10. It converts real-number constants given in decimal format to a binary format. The sign, exponent, and decimal part of the real number are encoded as bit fields within the number.
You can also specify the encoded format directly with hexadecimal digits (0–9 plus A–F). The number must begin with a decimal digit (0–9) and end with the real-number designator (R). It cannot be signed. For example, the hexadecimal number 3F800000r can serve as an initializer for a doubleword-sized variable.
The maximum range of exponent values and the number of digits required in the hexadecimal number depend on the directive. The number of digits for encoded numbers used with REAL4, REAL8, and REAL10 must be 8, 16, and 20 digits, respectively. If the number has a leading zero, the number must be 9, 17, or 21 digits.
Examples of decimal constant and hexadecimal specifications are shown here:
; Real numbers
short REAL4 25.23 ; IEEE format
double REAL8 2.523E1 ; IEEE format
tenbyte REAL10 2523.0E-2 ; 10-byte real format
; Encoded as hexadecimals
ieeeshort REAL4 3F800000r ; 1.0 as IEEE short
ieeedouble REAL8 3FF0000000000000r ; 1.0 as IEEE long
temporary REAL10 3FFF8000000000000000r ; 1.0 as 10-byte real
The section “Storing Numbers in Floating-Point Format,” following, explains the IEEE formats, which is the way the assembler actually stores the data.
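The encoded constants above can be cross-checked outside the assembler. The following Python sketch is not part of the MASM toolchain; it uses the standard struct module, and the function names are hypothetical, chosen only for illustration. It assumes the hexadecimal digits are written most-significant digit first, as in the listing.

```python
import struct

def decode_short(hex_digits):
    """Interpret 8 hexadecimal digits as an IEEE short real (REAL4)."""
    return struct.unpack(">f", bytes.fromhex(hex_digits))[0]

def decode_long(hex_digits):
    """Interpret 16 hexadecimal digits as an IEEE long real (REAL8)."""
    return struct.unpack(">d", bytes.fromhex(hex_digits))[0]

def encode_short(value):
    """Return the 8 hexadecimal digits that encode value as a short real."""
    return struct.pack(">f", value).hex().upper()

print(decode_short("3F800000"))          # 1.0
print(decode_long("3FF0000000000000"))   # 1.0
print(encode_short(1.0))                 # 3F800000
```

The sketch stops at the short and long formats because struct has no format code for the 10-byte real.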
Pascal or C programmers may prefer to create language-specific TYPEDEF declarations, as illustrated in this example:
; C-language specific
float TYPEDEF REAL4
double TYPEDEF REAL8
long_double TYPEDEF REAL10
; Pascal-language specific
SINGLE TYPEDEF REAL4
DOUBLE TYPEDEF REAL8
EXTENDED TYPEDEF REAL10
For applications of TYPEDEF, see “Defining Pointer Types with TYPEDEF,” page 75.
The following list explains how the parts of a real number are stored in the IEEE format. Each item in the list refers to an item in Figure 6.1.
Sign bit (0 for positive or 1 for negative) in the upper bit of the first byte.
Exponent in the next bits in sequence (8 bits for a short real number, 11 bits for a long real number, and 15 bits for a 10-byte real number).
The integer part of the significand in bit 63 for the 10-byte real format. By absorbing carry values, this bit allows 10-byte real operations to preserve precision to 19 digits. The integer part is always 1 in short and long real numbers; consequently, these formats do not provide a bit for the integer, since there is no point in storing it.
Decimal part of the significand in the remaining bits. The length is 23 bits for short real numbers, 52 bits for long real numbers, and 63 bits for 10-byte real numbers.
The exponent field represents a multiplier 2^n. To accommodate negative exponents (such as 2^-6), the value in the exponent field is biased; that is, the actual exponent is determined by subtracting the appropriate bias value from the value in the exponent field. For example, the bias for short real numbers is 127. If the value in the exponent field is 130, the exponent represents a value of 2^(130-127), or 2^3. The bias for long real numbers is 1,023. The bias for 10-byte real numbers is 16,383.
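As a rough illustration of the field layout and the bias, the Python sketch below splits a short real number into its sign, exponent field, and fraction. It is an illustrative check only; the function name short_real_fields and the constant SHORT_BIAS are hypothetical, and the field widths (1, 8, and 23 bits) are those given above for short real numbers.

```python
import struct

SHORT_BIAS = 127  # bias for short real numbers, as stated in the text

def short_real_fields(value):
    """Return (sign, exponent_field, fraction) for value stored as a short real."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                     # upper bit
    exponent_field = (bits >> 23) & 0xFF  # next 8 bits, still biased
    fraction = bits & 0x7FFFFF            # remaining 23 bits
    return sign, exponent_field, fraction

sign, exponent_field, fraction = short_real_fields(8.0)   # 8.0 = 2^3
print(sign, exponent_field, exponent_field - SHORT_BIAS, fraction)  # 0 130 3 0
```

For 2^-6 the same function reports an exponent field of 121, which is 127 minus 6.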
Once you have declared floating-point data for your program, you can use coprocessor or emulator instructions to access the data. The next section focuses on the coprocessor architecture, instructions, and operands required for floating-point operations.
This section describes how to transfer data to and from the coprocessor, coordinate processor and coprocessor operations, and control program flow.
Coprocessor Architecture
The coprocessor accesses memory as the CPU does, but it has its own data and control registers: eight data registers organized as a stack and seven control registers similar to the 8086 flag registers. The coprocessor’s instruction set provides direct access to these registers.
The eight 80-bit data registers of the 8087-based coprocessors are organized as a stack, although they need not be used as a stack. As data items are pushed into the top register, previous data items move into higher-numbered registers, which are lower on the stack. Register 0 is the top of the stack; register 7 is the bottom. The syntax for specifying registers is:
ST [[(number)]]
The number must be a digit between 0 and 7 or a constant expression that evaluates to a number from 0 to 7. ST is another way to refer to ST(0).
All coprocessor data is stored in registers in the 10-byte real format. The registers and the register format are shown in Figure 6.2.
Internally, all calculations are done on numbers of the same type. Since 10-byte real numbers have the greatest precision, lower-precision numbers are guaranteed not to lose precision as a result of calculations. The instructions that transfer values between the main memory and the coprocessor automatically convert numbers to and from the 10-byte real format.
You can easily recognize coprocessor instructions because, unlike all 8086-family instruction mnemonics, they start with the letter F. Coprocessor instructions can never have immediate operands and, with the exception of the FSTSW instruction, they cannot have processor registers as operands.
Classical-Stack Format
Instructions in the classical-stack format treat the coprocessor registers like items on a stack, thus its name. Items are pushed onto or popped off the top elements of the stack. Since only the top item can be accessed on a traditional stack, there is no need to specify operands. The first (top) register (and the second, if the instruction needs two operands) is always assumed.
ST (the top of the stack) is the source operand in coprocessor arithmetic operations. ST(1), the second register, is the destination. The result of the operation replaces the destination operand, and the source is popped off the stack. This leaves the result at the top of the stack.
The following example illustrates the classical-stack format; Figure 6.3 shows the status of the register stack after each instruction.
Memory Format
Instructions that use the memory format, such as data transfer instructions, also treat coprocessor registers like items on a stack. However, with this format, items are pushed from memory onto the top element of the stack, or popped from the top element to memory. You must specify the memory operand.
Some instructions that use the memory format specify how a memory operand is to be interpreted: as an integer (I) or as a binary coded decimal (B). The letter I or B follows the initial F in the syntax. For example, FILD interprets its operand as an integer and FBLD interprets its operand as a BCD number. If the instruction name does not include a type letter, the instruction works on real numbers.
You can also use memory operands in calculation instructions that operate on two values (see “Using Coprocessor Instructions,” later in this section). The memory operand is always the source. The stack top (ST) is always the implied destination.
The result of the operation replaces the destination without changing its stack position, as shown in this example and in Figure 6.4:
.DATA
m1 REAL4 1.0
m2 REAL4 2.0
.CODE
.
.
.
fld m1 ; Push m1 into first position
fld m2 ; Push m2 into first position
fadd m1 ; Add m1 to first position
fstp m1 ; Pop first position into m1
fst m2 ; Copy first position to m2
Figure 6.4 Status of the Register Stack and Memory Locations
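Since the figure is not reproduced here, the effect of the five instructions can be traced with a small model. The Python sketch below is a toy simulation, not coprocessor code: the class RegisterStack and its method names are hypothetical, it handles only the four instructions used in the example, and it ignores the 10-byte internal format.

```python
class RegisterStack:
    """Toy model of the register stack; index 0 plays the role of ST."""

    def __init__(self):
        self.regs = []

    def fld(self, value):
        """Push a value onto the stack top."""
        if len(self.regs) == 8:
            raise OverflowError("all eight registers are in use")
        self.regs.insert(0, value)

    def fadd(self, value):
        """Add a memory operand (the source) to ST (the destination)."""
        self.regs[0] += value

    def fst(self):
        """Copy ST without popping it."""
        return self.regs[0]

    def fstp(self):
        """Pop ST and return its value."""
        return self.regs.pop(0)

memory = {"m1": 1.0, "m2": 2.0}
stack = RegisterStack()
stack.fld(memory["m1"])        # fld  m1
stack.fld(memory["m2"])        # fld  m2
stack.fadd(memory["m1"])       # fadd m1
memory["m1"] = stack.fstp()    # fstp m1
memory["m2"] = stack.fst()     # fst  m2
print(memory, stack.regs)      # {'m1': 3.0, 'm2': 1.0} [1.0]
```

The trace shows that the sum replaces the stack top in place, FSTP removes it, and FST leaves the remaining item on the stack.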
Register Format
Instructions that use the register format treat coprocessor registers as registers rather than as stack elements. Instructions that use this format require two register operands; one of them must be the stack top (ST).
In the register format, specify all operands by name. The first operand is the destination; its value is replaced with the result of the operation. The second operand is the source; it is not affected by the operation. The stack positions of the operands do not change.
The only instructions that use the register operand format are the FXCH instruction and arithmetic instructions for calculations on two values. With the FXCH instruction, the stack top is implied and need not be specified, as shown in this example and in Figure 6.5:
Register-Pop Format
The register-pop format treats coprocessor registers as a modified stack. The source register must always be the stack top. Specify the destination with the register’s name.
Instructions with this format place the result of the operation into the destination operand, and the top pops off the stack. The register-pop format is used only for instructions for calculations on two values, as in this example and in Figure 6.6:
The main processor and the coprocessor have their own registers, which are separate and inaccessible to each other. They exchange data through memory, since memory is available to both.
Step 2, processing the data, can occur while the main processor is handling other tasks. Steps 1 and 3 must be coordinated with the main processor so that the processor and coprocessor do not try to access the same memory at the same time; otherwise, problems of coordinating memory access can occur. Since the processor and coprocessor work independently, they may not finish working on memory in the order in which you give instructions. The two potential timing conflicts that can occur are handled in different ways.
One timing conflict results from a coprocessor instruction following a processor instruction. The processor may have to wait until the coprocessor finishes if the next processor instruction requires the result of the coprocessor’s calculation. You do not have to write your code to avoid this conflict, however. The assembler coordinates this timing automatically for the 8088 and 8086 processors, and the processor coordinates it automatically on the 80186–80486 processors. This is the case shown in the first example that follows.
Another conflict results from a processor instruction that accesses memory following a coprocessor instruction that accesses the same memory. The processor can try to load a variable that is still being used by the coprocessor. You need careful synchronization to control the timing, and this synchronization is not automatic on the 8087 coprocessor. For code to run correctly on the 8087, you must include WAIT or FWAIT (mnemonics for the same instruction) to ensure that the coprocessor finishes before the processor begins, as shown in the second example.
In this situation, the processor does not generate the FWAIT instruction automatically.
When generating code for the 8087 coprocessor, the assembler automatically inserts a WAIT instruction before the coprocessor instruction. However, if you use the .286 or .386 directive, the assembler assumes that the coprocessor instructions are for the 80287 or 80387 and does not insert the WAIT instruction. If your code does not need to run on an 8086 or 8088 processor, you can make your programs smaller and more efficient by using the .286 or .386 directive.
The following sections explain the available instructions and show how to use them for each of these operations. For general syntax information, see “Instruction and Operand Formats,” earlier in this section.
The choice of instruction determines whether a value in memory is considered an integer, a BCD number, or a real number. The value is always considered a 10-byte real number once transferred to the coprocessor.
The size of the operand determines the size of a value in memory. Values in the coprocessor always take up 10 bytes.
You can transfer data to stack registers using load commands. These commands push data onto the stack from memory or from coprocessor registers. Store commands remove data. Some store commands pop data off the register stack into memory or coprocessor registers; others simply copy the data without changing it on the stack.
If you use constants as operands, you cannot load them directly into coprocessor registers. You must allocate memory and initialize a variable to a constant value. That variable can then be loaded by using one of the load instructions in the following list.
The math coprocessor offers a few special instructions for loading certain constants. You can load 0, 1, pi, and several common logarithmic values directly. Using these instructions is faster and often more precise than loading the values from initialized variables.
All instructions that load constants have the stack top as the implied destination operand. The constant to be loaded is the implied source operand.
The coprocessor data area, or parts of it, can also be moved to memory and later loaded back. You may want to do this to save the current state of the coprocessor before executing a procedure. After the procedure ends, restore the previous status. Saving coprocessor data is also useful when you want to modify coprocessor behavior by writing certain data to main memory, operating on the data with 8086-family instructions, and then loading it back to the coprocessor data area.
Use the following instructions for transferring numbers to and from registers:
Instruction(s) Description
The following example and Figure 6.7 illustrate some of these instructions:
.DATA
m1 REAL4 1.0
m2 REAL4 2.0
.CODE
fld m1 ; Push m1 into first item
fld st(2) ; Push third item into first
fst m2 ; Copy first item to m2
fxch st(2) ; Exchange first and third items
fstp m1 ; Pop first item into m1
Figure 6.7 Status of the Register Stack: Main Memory and Coprocessor
instruction if both operands are stack registers, since register values are always 10-byte real numbers. In most of the arithmetic instructions listed here, the result replaces the destination register. The instructions include:
Instruction Description
80387 Only
Instruction Description
The following example illustrates several arithmetic instructions. The code solves quadratic equations, but does no error checking and fails for some values because it attempts to find the square root of a negative number. Both Help and the MATH.ASM sample file show a complete version of this procedure. The complete form uses the FTST (Test for Zero) instruction to check for a negative number or 0 before calculating the square root.
.DATA
a REAL4 3.0
b REAL4 7.0
cc REAL4 2.0
posx REAL4 0.0
negx REAL4 0.0
.CODE
.
.
.
; Solve quadratic equation - no error checking
; The formula is: (-b +/- squareroot(b^2 - 4ac)) / (2a)
fld1 ; Get constants 2 and 4
fadd st,st ; 2 at bottom
fld st ; Copy it
fmul a ; = 2a
fmul st(1),st ; = 4a
fxch ; Exchange
fmul cc ; = 4ac
fld b ; Load b
fmul st,st ; = b^2
fsubr ; = b^2 - 4ac
; Negative value here produces error
fsqrt ; = square root(b^2 - 4ac)
fld b ; Load b
fchs ; Make it negative
fxch ; Exchange
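The listing breaks off at this point. As a reference for the values the data segment should produce (a = 3.0, b = 7.0, cc = 2.0), the following Python sketch evaluates the same formula. It is a plain restatement of the arithmetic, not a translation of the coprocessor code; the function name quadratic_roots is hypothetical, and the explicit check for a negative discriminant is an addition that the listing, by its own comment, does not perform.

```python
import math

def quadratic_roots(a, b, c):
    """Return both real roots of a*x^2 + b*x + c = 0."""
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0.0:
        # The assembly listing would take the square root of a negative number here.
        raise ValueError("negative discriminant: no real roots")
    root = math.sqrt(discriminant)
    return (-b + root) / (2.0 * a), (-b - root) / (2.0 * a)

posx, negx = quadratic_roots(3.0, 7.0, 2.0)
print(posx, negx)
```

With these inputs the discriminant is 25, so the two roots are -1/3 and -2.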
An easy way to use the status word with conditional jumps is to move its upper byte into the lower byte of the processor flags, as shown in this example:
You can save several steps by loading the status word directly to AX on the 80287 with the FSTSW and FNSTSW instructions. This is the only case in which data can be transferred directly between processor and coprocessor registers, as shown in this example:
fstsw ax
The coprocessor control flags and their relationship to the status word are described in “Control Registers,” following.
The 8087-family coprocessors provide several instructions for comparing operands and testing control flags. All these instructions compare the stack top (ST) to a source operand, which may either be specified or implied as ST(1).
The compare instructions affect the C3, C2, and C0 control flags, but not the C1 flag. Table 6.3 shows the flags’ settings for each possible result of a comparison or test.
Variations on the compare instructions allow you to pop the stack once or twice and to compare integers and zero. For each instruction, the stack top is always the implied destination operand. If you do not give an operand, ST(1) is the implied source. With some compare instructions, you can specify the source as a memory or register operand.
All instructions summarized in the following list have implied operands: either ST as a single-destination operand or ST as the destination and ST(1) as the source. Each instruction in the list has implied operands. Some instructions have a wait version and a no-wait version. The no-wait versions have N as the second letter. The instructions for comparing and testing flags include:
Instruction Description
The following example illustrates some of these instructions. Notice how conditional blocks are used to enhance 80287 code.
.DATA
down REAL4 10.35 ; Sides of a rectangle
across REAL4 13.07
diamtr REAL4 12.93 ; Diameter of a circle
status WORD ?
P287 EQU (@Cpu AND 00111y)
.CODE
.
.
.
; Get area of rectangle
fld across ; Load one side
fmul down ; Multiply by the other
Additional instructions for the 80387/486 are FLDENVD and FLDENVW for loading the environment; FNSTENVD, FNSTENVW, FSTENVD, and FSTENVW for storing the environment state; FNSAVED, FNSAVEW, FSAVED, and FSAVEW for saving the coprocessor state; and FRSTORD and FRSTORW for restoring the coprocessor state.
The size of the code segment, not the operand size, determines the number of bytes loaded or stored with these instructions. The instructions ending with W store the 16-bit form of the control register data, and the instructions ending with D store the 32-bit form. For example, in 16-bit mode FSAVEW saves the 16-bit control register data. If you need to store the 32-bit form of the control register data, use FSAVED.
Control Registers
Some of the flags of the seven 16-bit control registers control coprocessor operations, while others maintain the current status of the coprocessor. In this sense, they are much like the 8086-family flags registers (see Figure 6.8).
Figure 6.8 Coprocessor Control Registers
The status word register is the only commonly used control register. (The others are used mostly by systems programmers.) The format of the status word register is shown in Figure 6.9, which shows how the coprocessor control flags align with the processor flags. C3 overwrites the zero flag, C2 overwrites the parity flag, and C0 overwrites the carry flag. C1 overwrites an undefined bit, so it cannot be used directly with conditional jumps, although you can use the TEST instruction to check C1 in memory or in a register. The status word register also overwrites the sign and auxiliary-carry flags, so you cannot count on their being unchanged after the operation.
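The alignment described above can be modeled with a few bit masks. The Python sketch below is illustrative only. The bit positions are assumptions taken from the standard x87 status word and 8086 flags layouts rather than from Figure 6.9, which is not reproduced here: C0, C1, C2, and C3 are taken to be bits 8, 9, 10, and 14 of the status word, and CF, PF, and ZF bits 0, 2, and 6 of the flags. The function names are hypothetical.

```python
# Assumed status word bit positions for the condition codes.
C0, C1, C2, C3 = 1 << 8, 1 << 9, 1 << 10, 1 << 14
# Assumed positions of the carry, parity, and zero flags in the low flags byte.
CF, PF, ZF = 1 << 0, 1 << 2, 1 << 6

def flags_after_transfer(status_word):
    """Model moving the upper byte of the status word into the low flags byte."""
    upper = (status_word >> 8) & 0xFF
    return {"ZF": bool(upper & ZF), "PF": bool(upper & PF), "CF": bool(upper & CF)}

def compare_result(status_word):
    """Classify a compare using the flags a conditional jump would test."""
    flags = flags_after_transfer(status_word)
    if flags["PF"]:
        return "unordered"       # C2 is set only when the operands cannot be compared
    if flags["ZF"]:
        return "ST = source"     # C3 set
    if flags["CF"]:
        return "ST < source"     # C0 set
    return "ST > source"         # C3, C2, and C0 all clear

print(compare_result(C0))            # ST < source
print(compare_result(C3))            # ST = source
print(compare_result(C3 | C2 | C0))  # unordered
```

Because C1 lands on a bit that none of the three tested flags occupy, setting it does not change the result, which matches the statement that C1 cannot be used directly with conditional jumps.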
To use emulator functions, first write your assembly-language procedure using coprocessor instructions. Then assemble the module with the /FPi option and link it with your high-level-language modules. You can enter options in the Programmer’s WorkBench (PWB) environment, or you can use the OPTION EMULATOR directive in your source code.
In emulation mode, the assembler generates instructions for the linker that the Microsoft emulator can use. The form of the OPTION directive in the following example tells the assembler to use emulation mode. This option (introduced in Chapter 1) can be defined only once in a module.
OPTION EMULATOR
You can use emulator functions in a stand-alone assembler program by assembling with the /Cx command-line option and linking with the appropriate emulator library. The following fragment outlines a small-model program that contains floating-point instructions served by an emulator:
.MODEL small, c
OPTION EMULATOR
.
.
.
PUBLIC main
.CODE
main: ; Program entry point must
.STARTUP ; have name 'main'
.
fadd st, st ; Floating-point instructions
fldpi ; emulated
Emulator libraries do not allow for all of the coprocessor instructions. The following floating-point instructions are not emulated:
FBLD
FBSTP
FCOS
FDECSTP
FINCSTP
FINIT
FLDENV
FNOP
FPREM1
FRSTOR
FRSTORW
FRSTORD
FSAVE
FSAVEW
FSAVED
FSETPM
FSIN
FSINCOS
FSTENV
FUCOM
FUCOMP
FUCOMPP
FXTRACT
For information about writing assembly-language procedures for high-level languages, see Chapter 12, “Mixed-Language Programming.”
This section explains how to define BCD numbers, how to access them with a math coprocessor or emulator, and how to perform simple BCD calculations on the main processor.
Packed BCD numbers are encoded in the 8087 coprocessor’s packed BCD format. They can be up to 18 digits long, packed two digits per byte. The assembler zero-pads BCDs initialized with fewer than 18 digits. Digit 20 is the sign bit, and digit 19 is reserved.
When you define an integer constant with the TBYTE directive and the current radix is decimal (t), the assembler interprets the number as a packed BCD number.
The syntax for specifying packed BCDs is the same as for other integers.
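To make the packed layout concrete, the Python sketch below builds the 10 bytes for an integer: nine bytes holding 18 digits, two per byte, followed by a tenth byte that carries the sign. It is an illustration under stated assumptions, not the assembler's own algorithm. The function name pack_bcd is hypothetical, and the byte order (least-significant digits first) and the use of the top bit of the tenth byte for the sign are assumptions based on the x87 packed BCD format.

```python
def pack_bcd(value):
    """Encode an integer as 10 bytes of packed BCD, least-significant byte first."""
    digits = str(abs(value)).rjust(18, "0")   # zero-pad to 18 digits
    if len(digits) > 18:
        raise ValueError("packed BCD holds at most 18 digits")
    packed = bytearray()
    for i in range(16, -2, -2):               # digit pairs, lowest pair first
        high, low = int(digits[i]), int(digits[i + 1])
        packed.append((high << 4) | low)      # two digits per byte
    packed.append(0x80 if value < 0 else 0x00)  # tenth byte: sign in the top bit
    return bytes(packed)

print(pack_bcd(1234).hex())    # 34120000000000000000
print(pack_bcd(-1234).hex())   # 34120000000000000080
```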
Unpacked BCD numbers are stored one digit to a byte, with the value in the lower 4 bits. They can be defined using the BYTE directive. For example, an unpacked BCD number could be defined and initialized as follows:
As these two lines show, you can arrange digits backward or forward, depending on how you write the calculation routines that handle the numbers.
fbld bcd1
pushes the packed BCD number at bcd1 onto the coprocessor stack. When your code completes calculations on the number, place the result back into memory in BCD format with the instruction
fbstp bcd1
The main processor provides instructions specifically designed to translate to and from BCD format. These instructions are called “ASCII-adjust” and “decimal-adjust” instructions. They get their names from Intel mnemonics that use the term “ASCII” to refer to unpacked BCD numbers and “decimal” to refer to packed BCD numbers.
Instruction Description
; To divide 25 by 2:
mov ax, 205h ; Load 25
mov bl, 2 ; and 2 as unpacked BCDs
aad ; Adjust 0205h in AX
; to get 19h in AX
div bl ; Divide by 2 to get
; quotient 0Ch in AL
; remainder 1 in AH
aam ; Adjust 0Ch in AL
; to 12 (unpacked BCD in AX)
; (remainder destroyed)
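The register values in the comments above can be checked with a short model of the three instructions. The Python sketch below is illustrative only: the function names are hypothetical, each function takes and returns the 16-bit AX value, and flag effects are ignored.

```python
def aad(ax):
    """Model AAD: AL = AH * 10 + AL, then AH = 0."""
    ah, al = ax >> 8, ax & 0xFF
    return (ah * 10 + al) & 0xFF

def div8(ax, divisor):
    """Model DIV with an 8-bit operand: quotient in AL, remainder in AH."""
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:
        raise OverflowError("quotient does not fit in AL")
    return (remainder << 8) | quotient

def aam(ax):
    """Model AAM: AH = AL // 10, AL = AL % 10 (the old AH is discarded)."""
    al = ax & 0xFF
    return ((al // 10) << 8) | (al % 10)

ax = aad(0x0205)     # unpacked 25 becomes binary 19h
print(hex(ax))       # 0x19
ax = div8(ax, 2)     # quotient 0Ch in AL, remainder 1 in AH
print(hex(ax))       # 0x10c
ax = aam(ax)         # unpacked 12 in AX
print(hex(ax))       # 0x102
```

The last step shows why the comment says the remainder is destroyed: AAM recomputes AH from AL alone.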
If you process multidigit BCD numbers in loops, each digit is processed and adjusted in turn.
For processor calculations on packed BCD numbers, you must do the 8-bit arithmetic calculations on each byte separately, placing the result in the AL register. After each operation, use the corresponding decimal-adjust