Nasmdoc
===========================
Chapter 1: Introduction
-----------------------
(*) `a86' is good, but not free, and in particular you don't get any
32-bit capability until you pay. It's DOS only, too.
(*) `gas' is free, and ports over DOS and Unix, but it's not very
good, since it's designed to be a back end to `gcc', which
always feeds it correct code. So its error checking is minimal.
Also, its syntax is horrible, from the point of view of anyone
trying to actually _write_ anything in it. Plus you can't write
16-bit code in it (properly).
(*) MASM isn't very good, and it's expensive, and it runs only under
DOS.
(*) TASM is better, but still strives for MASM compatibility, which
means millions of directives and tons of red tape. And its
syntax is essentially MASM's, with the contradictions and quirks
that entails (although it sorts out some of those by means of
Ideal mode). It's expensive too. And it's DOS-only.
1.3 Installation
Once you've obtained the DOS archive for NASM, `nasmXXX.zip' (where
`XXX' denotes the version number of NASM contained in the archive),
unpack it into its own directory (for example `c:\nasm').
The archive will contain four executable files: the NASM executable
files `nasm.exe' and `nasmw.exe', and the NDISASM executable files
`ndisasm.exe' and `ndisasmw.exe'. In each case, the file whose name
ends in `w' is a Win32 executable, designed to run under Windows 95
or Windows NT Intel, and the other one is a 16-bit DOS executable.
The only file NASM needs to run is its own executable, so copy (at
least) one of `nasm.exe' and `nasmw.exe' to a directory on your
PATH, or alternatively edit `autoexec.bat' to add the `nasm'
directory to your `PATH'. (If you're only installing the Win32
version, you may wish to rename it to `nasm.exe'.)
Once NASM has auto-configured, you can type `make' to build the
`nasm' and `ndisasm' binaries, and then `make install' to install
them in `/usr/local/bin' and install the man pages `nasm.1' and
`ndisasm.1' in `/usr/local/man/man1'. Alternatively, you can give
options such as `--prefix' to the `configure' script (see the file
`INSTALL' for more details), or install the programs yourself.
NASM also comes with a set of utilities for handling the RDOFF
custom object-file format, which are in the `rdoff' subdirectory of
the NASM archive. You can build these with `make rdf' and install
them with `make rdf_install', if you want them.
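For example,
        nasm -f elf myfile.asm
will assemble `myfile.asm' into an ELF object file `myfile.o'.
To produce a listing file, with the hex codes output from NASM
displayed on the left of the original sources, use the `-l' option
to give a listing file name, for example:
        nasm -f coff myfile.asm -l myfile.lst
To get further usage instructions from NASM, try typing
        nasm -h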
This will also list the available output file formats, and what they
are.
If you use Linux but aren't sure whether your system is `a.out' or
ELF, type
file nasm
(in the directory in which you put the NASM binary when you
installed it). If it says something like
nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1
then your system is ELF, and you should use the option `-f elf' when
you want NASM to produce Linux object files. If it instead reports
an `a.out' executable, your system is `a.out', and you should use
`-f aout'.
NASM will normally choose the name of your output file for you;
precisely how it does this is dependent on the object file format.
For Microsoft object file formats (`obj' and `win32'), it will
remove the `.asm' extension (or whatever extension you like to use -
NASM doesn't care) from your source file name and substitute `.obj'.
For Unix object file formats (`aout', `coff', `elf' and `as86') it
will substitute `.o'. For `rdf', it will use `.rdf', and for the
`bin' format it will simply remove the extension, so that
`myfile.asm' produces the output file `myfile'.
If the output file already exists, NASM will overwrite it, unless it
has the same name as the input file, in which case it will give a
warning and use `nasm.out' as the output file name instead.
As with `-o', the space between `-f' and the output file format is
optional, so `-f elf' and `-felf' are both valid.
If you supply the `-l' option to NASM, followed (with the usual
optional space) by a file name, NASM will generate a source-listing
file for you, in which addresses and generated code are listed on
the left, and the actual source code, with expansions of multi-line
macros (except those which specifically request no expansion in
source listings: see section 4.2.9) on the right. For example:
        nasm -f elf myfile.asm -l myfile.lst
(As usual, a space between `-i' and the path name is allowed, and
optional).
For consistency with the `-I', `-D' and `-U' options, this option
can also be specified as `-P'.
at the start of the file. You can also omit the macro value:
the option `-dFOO' is equivalent to coding `%define FOO'. This form
of the directive may be useful for selecting assembly-time options
which are then tested using `%ifdef', for example `-dDEBUG'.
NASM can observe many conditions during the course of assembly which
are worth mentioning to the user, but which are not severe enough to
justify NASM refusing to generate an output file. These
conditions are reported like errors, but come up with the word
`warning' before the message. Warnings do not prevent NASM from
generating an output file and returning a success status to the
operating system.
Some conditions are even less severe than that: they are only
sometimes worth mentioning to the user. Therefore NASM supports the
`-w' command-line option, which enables or disables certain classes
of assembly warning. Such warning classes are described by a name,
for example `orphan-labels'; you can enable warnings of this class
with the command-line option `-w+orphan-labels' and disable them with
`-w-orphan-labels'.
The suppressible warning classes are:
To get round this, NASM provides a feature whereby, if you begin the
`NASM' environment variable with some character that isn't a minus
sign, then NASM will treat this character as the separator character
for options. So setting the `NASM' variable to the value
`!-s!-ic:\nasmlib' is equivalent to setting it to `-s -ic:\nasmlib',
but `!-dNAME="my name"' will work.
foo equ 1
bar dw 2
mov ax,foo
mov ax,bar
This also means that NASM has no need for MASM's `OFFSET' keyword,
since the MASM code `mov ax,offset bar' means exactly the same thing
as NASM's `mov ax,bar'. If you're trying to get large amounts of
MASM code to assemble sensibly under NASM, you can always code
`%idefine offset' to make the preprocessor treat the `OFFSET'
keyword as a no-op.
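As a small sketch of that trick (the label `bar' is simply reused
from the example above), the following lines should all assemble,
with the first two `MOV' lines producing the same code as the third:

        %idefine offset                 ; `offset', in any case, expands to nothing

        bar:    dw 2

                mov ax,offset bar       ; MASM style: preprocessed to `mov ax,bar'
                mov ax,OFFSET bar       ; %idefine is case-insensitive
                mov ax,bar              ; native NASM: the address of bar
                mov ax,[bar]            ; native NASM: the contents of bar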
For this reason, NASM doesn't support the `LODS', `MOVS', `STOS',
`SCAS', `CMPS', `INS', or `OUTS' instructions, but only supports the
forms such as `LODSB', `MOVSW', and `SCASD', which explicitly
specify the size of the components of the strings being manipulated.
As part of NASM's drive for simplicity, it also does not support the
`ASSUME' directive. NASM will not keep track of what values you
choose to put in your segment registers, and will never
_automatically_ generate a segment override prefix.
NASM also does not have any directives to support different 16-bit
memory models. The programmer has to keep track of which functions
are supposed to be called with a far call and which with a near
call, and is responsible for putting the correct form of `RET'
instruction (`RETN' or `RETF'; NASM accepts `RET' itself as an
alternate form for `RETN'); in addition, the programmer is
responsible for coding CALL FAR instructions where necessary when
calling _external_ functions, and must also keep track of which
external variable definitions are far and which are near.
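A rough sketch of what this bookkeeping looks like in practice,
assuming the `obj' output format and two hypothetical routines
`nearfn' and `farfn' (neither name comes from NASM itself):

        segment code

        nearfn: ; called with a near call
                retn                    ; near return; plain `ret' is the same

        farfn:  ; called with a far call
                retf                    ; far return: pops offset and segment

        caller: call nearfn             ; near call, pushes the offset only
                call far farfn          ; far call, pushes segment and offset
                retn

Nothing stops you from mixing these up: the pairing of each call with
the matching return is entirely the programmer's job.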
For historical reasons, NASM uses the keyword `TWORD' where MASM and
compatible assemblers use `TBYTE'.
NASM does not declare uninitialised storage in the same way as MASM:
where a MASM programmer might use `stack db 64 dup (?)', NASM
requires `stack resb 64', intended to be read as `reserve 64 bytes'.
For a limited amount of compatibility, since NASM treats `?' as a
valid character in symbol names, you can code `? equ 0' and then
writing `dw ?' will at least do something vaguely useful. `DUP' is
still not a supported syntax, however.
Valid characters in labels are letters, numbers, `_', `$', `#', `@',
`~', `.', and `?'. The only characters which may be used as the
_first_ character of an identifier are letters, `.' (with special
meaning: see section 3.8), `_' and `?'. An identifier may also be
prefixed with a `$' to indicate that it is intended to be read as an
identifier and not a reserved word; thus, if some other module you
are linking with defines a symbol called `eax', you can refer to
`$eax' in NASM code to distinguish the symbol from the register.
3.2 Pseudo-Instructions
`DB', `DW', `DD', `DQ' and `DT' are used, much as in MASM, to
declare initialised data in the output file. They can be invoked in
a wide range of ways:
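A few representative forms, offered as an illustration (the byte
values in the comments assume ASCII and little-endian storage):

        db 0x55                 ; just the byte 0x55
        db 0x55,0x56,0x57       ; three bytes in succession
        db 'a',0x55             ; character constants are OK
        db 'hello',13,10,'$'    ; so are string constants
        dw 0x1234               ; 0x34 0x12
        dw 'a'                  ; 0x61 0x00 (it's just a number)
        dw 'ab'                 ; 0x61 0x62 (character constant)
        dw 'abc'                ; 0x61 0x62 0x63 0x00 (string)
        dd 0x12345678           ; 0x78 0x56 0x34 0x12
        dd 1.234567e20          ; floating-point constant
        dq 1.234567e20          ; double-precision float
        dt 1.234567e20          ; extended-precision float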
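The `TIMES' prefix causes an instruction to be assembled multiple
times. For example:
        zerobuf:        times 64 db 0
The argument to `TIMES' is not just a numeric constant but a numeric
expression, so you can also write
        buffer: db 'hello, world'
                times 64-$+buffer db ' '
which will store exactly enough spaces to make the total length of
`buffer' up to 64. Finally, `TIMES' can be applied to ordinary
instructions, so you can code trivial unrolled loops in it:
        times 100 movsb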
Note also that `TIMES' can't be applied to macros: the reason for
this is that `TIMES' is processed after the macro phase, which
allows the argument to `TIMES' to contain expressions such as
`64-$+buffer' as above. To repeat more than one line of code, or a
complex macro, use the preprocessor `%rep' directive.
wordvar dw 123
mov ax,[wordvar]
mov ax,[wordvar+1]
mov ax,[es:wordvar+bx]
mov eax,[ebx*2+ecx+offset]
mov ax,[bp+di+8]
Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can. For
example, there are distinct assembled forms for the 32-bit effective
addresses `[eax*2+0]' and `[eax+eax]', and NASM will generally
generate the latter on the grounds that the former requires four
bytes to store a zero offset.
3.4 Constants
Some examples:
mov eax,'abcd'
Some examples:
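The following lines gather a few numeric and floating-point
constants in one place, as an illustration of the notations NASM
accepts:

        mov ax,100              ; decimal
        mov ax,0a2h             ; hex
        mov ax,$0a2             ; hex again: the 0 is required
        mov ax,0xa2             ; hex yet again
        mov ax,777q             ; octal
        mov ax,10010011b        ; binary

        dd 1.2                  ; an easy one
        dq 1.e10                ; 10,000,000,000
        dq 1.e+10               ; synonymous with 1.e10
        dq 1.e-10               ; 0.000 000 000 1
        dt 3.141592653589793238462 ; pi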
3.5 Expressions
NASM does not guarantee the size of the integers used to evaluate
expressions at compile time: since NASM can compile and run on 64-
bit systems quite happily, don't assume that expressions are
evaluated in 32-bit registers and so try to make deliberate use of
integer overflow. It might not always work. The only thing NASM will
guarantee is what's guaranteed by ANSI C: you always have _at least_
32 bits to work in.
3.5.6 `*', `/', `//', `%' and `%%': Multiplication and Division
`*' is the multiplication operator. `/' and `//' are both division
operators: `/' is unsigned division and `//' is signed division.
Similarly, `%' and `%%' provide unsigned and signed modulo operators
respectively.
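A minimal illustration with positive operands, where the signed and
unsigned operators agree:

        db 17 / 5               ; unsigned division: 3
        db 17 % 5               ; unsigned modulo: 2
        db 17 // 5              ; signed division: also 3 here
        db 17 %% 5              ; signed modulo: also 2 here

The modulo operators are followed by white space here so that the
preprocessor does not mistake `%' for the start of one of its own
tokens.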
Things can be more complex than this: since 16-bit segments and
groups may overlap, you might occasionally want to refer to some
symbol using a different segment base from the preferred one. NASM
lets you do this, by the use of the `WRT' (With Reference To)
keyword. So you can do things like
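For instance, assuming `weird_seg' is a segment base defined
elsewhere and `symbol' is a label reachable from it (both names are
purely illustrative), this sketch loads `ES:BX' with a valid pointer
to `symbol':

        mov ax,weird_seg        ; weird_seg is a segment base
        mov es,ax
        mov bx,symbol wrt weird_seg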
NASM supports the syntax `call far procedure' as a synonym for the
first of the above usages. `JMP' works identically to `CALL' in
these examples.
NASM supports no convenient synonym for this, though you can always
invent one using the macro processor.
The first pass is used to determine the size of all the assembled
code and data, so that the second pass, when generating all the
code, knows all the symbol addresses the code refers to. So one
thing NASM can't handle is code whose size depends on the value of a
symbol declared after the code in question. For example,
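        times (label-$) db 0    ; rejected: count depends on a later label
label:  db 'Where am I?'

        times (label-$+1) db 0  ; rejected: no count could ever be right
label:  db 'NOW where am I?'

        mov ax,symbol1          ; symbol1 cannot be resolved in time...
symbol1 equ symbol2             ; ...as it is defined via a forward reference
symbol2:

        mov eax,[ebx+offset]    ; `offset' is not yet known on the first pass
offset  equ 10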
In the above code fragment, each `JNE' instruction jumps to the line
immediately before it, because the two definitions of `.loop' are
kept separate by virtue of each being associated with the previous
non-local label.
This form of local label handling is borrowed from the old Amiga
assembler DevPac; however, NASM goes one step further, in allowing
access to local labels from other parts of the code. This is
achieved by means of _defining_ a local label in terms of the
previous non-local label: the first definition of `.loop' above is
really defining a symbol called `label1.loop', and the second
defines a symbol called `label2.loop'. So, if you really needed to,
you could write
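The whole arrangement can be sketched as follows; the labels
`label1', `label2' and `label3' stand for ordinary non-local labels
in your own code:

        label1: ; some code
        .loop:  ; some more code
                jne .loop               ; jumps to label1.loop
                ret

        label2: ; some code
        .loop:  ; some more code
                jne .loop               ; jumps to label2.loop
                ret

        label3: ; some more code
                jmp label1.loop         ; the first `.loop', by its full name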
NASM has the capacity to define other special symbols beginning with
a double period: for example, `..start' is used to specify the entry
point in the `obj' output format (see section 6.2.6).
will evaluate in the expected way to `mov ax,1+2*8', even though the
macro `b' wasn't defined at the time of definition of `a'.
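A short sketch of definitions that behave this way, with `a' defined
before the macro `b' it refers to:

        %define a(x)    1+b(x)
        %define b(x)    2*x

                mov ax,a(8)             ; expands to mov ax,1+2*8

This works because a single-line macro is expanded when it is
invoked, not when it is defined.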
You can pre-define single-line macros using the `-d' option on the
NASM command line: see section 2.1.8.
will expand to the instruction `mov eax, foo', since after `%undef'
the macro `foo' is no longer defined.
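A sequence of this kind, with `foo' and `bar' as placeholder names,
shows the effect:

        %define foo bar
        %undef  foo

                mov eax, foo            ; `foo' is left alone: no longer a macro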
%assign i i+1
Multi-line macros are much more like the type of macro seen in MASM
and TASM: a multi-line macro definition in NASM looks something like
this.
        %macro  prologue 1
                push ebp
                mov ebp,esp
                sub esp,%1
        %endmacro
This defines a C-like function prologue as a macro, so you would
invoke it with a call such as
        myfunc: prologue 12
The number `1' after the macro name in the `%macro' line defines the
number of parameters the macro `prologue' expects to receive. The
use of `%1' inside the macro definition refers to the first
parameter to the macro call. With a macro taking more than one
parameter, subsequent parameters would be referred to as `%2', `%3'
and so on.
%macro silly 2
%2: db %1
%endmacro
silly 'a', letter_a ; letter_a: db 'a'
silly 'ab', string_ab ; string_ab: db 'ab'
silly {13,10}, crlf ; crlf: db 13,10
%macro prologue 0
push ebp
mov ebp,esp
%endmacro
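        %macro  push 2
                push %1
                push %2
        %endmacro
With this macro defined, you can code
        push ebx                ; this line is not a macro call
        push eax,ecx            ; but this one is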
Ordinarily, NASM will give a warning for the first of the above two
lines, since `push' is now defined to be a macro, and is being
invoked with a number of parameters for which no definition has been
given. The correct code will still be generated, but the assembler
will give a warning. This warning can be disabled by the use of the
`-w-macro-params' command-line option (see section 2.1.12).
%macro retz 0
jnz %%skip
ret
%%skip:
%endmacro
You can call this macro as many times as you want, and every time
you call it NASM will make up a different `real' name to substitute
for the label `%%skip'. The names NASM invents are of the form
`..@2345.skip', where the number 2345 changes with every macro call.
The `..@' prefix prevents macro-local labels from interfering with
the local label mechanism, as described in section 3.8. You should
avoid defining your own labels in this form (the `..@' prefix, then
a number, then another period) in case they interfere with macro-
local labels.
%macro writefile 2+
jmp %%endstr
%%str: db %2
%%endstr: mov dx,%%str
mov cx,%%endstr-%%str
mov bx,%1
mov ah,0x40
int 0x21
%endmacro
If you define a greedy macro, you are effectively telling NASM how
it should expand the macro given _any_ number of parameters from the
actual number specified up to infinity; in this case, for example,
NASM now knows what to do when it sees a call to `writefile' with 2,
3, 4 or more parameters. NASM will take this into account when
overloading macros, and will not allow you to define another form of
`writefile' taking 4 parameters (for example).
See section 5.2.1 for a better way to write the above macro.
4.2.4 Default Macro Parameters
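NASM also allows you to define a multi-line macro with a range of
allowable parameter counts, and to supply default values for the
optional ones. If a macro definition began with the line
        %macro foobar 1-3 eax,[ebx+2]
then it could be called with between one and three parameters, and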
`%1' would always be taken from the macro call. `%2', if not
specified by the macro call, would default to `eax', and `%3' if not
specified would default to `[ebx+2]'.
You may omit parameter defaults from the macro definition, in which
case the parameter default is taken to be blank. This can be useful
for macros which can take a variable number of parameters, since the
`%0' token (see section 4.2.5) allows you to determine how many
parameters were really passed to the macro call.
Note also the use of `*' as the maximum parameter count, indicating
that there is no upper limit on the number of parameters you may
supply to the `multipush' macro.
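A `multipush' macro of the kind referred to here might be written as
follows. This is a sketch: it relies on `%0' for the parameter count
and on the `%rotate' directive to bring each parameter in turn into
the `%1' position.

        %macro  multipush 1-*
          %rep  %0
                push %1
          %rotate 1
          %endrep
        %endmacro

                multipush ax,bx,cx      ; push ax / push bx / push cx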
%macro keytab_entry 2
keypos%1 equ $-keytab
db %2
%endmacro
keytab:
keytab_entry F1,128+1
keytab_entry F2,128+2
keytab_entry Return,13
which would expand as
keytab:
keyposF1 equ $-keytab
db 128+1
keyposF2 equ $-keytab
db 128+2
keyposReturn equ $-keytab
db 13
Far more usefully, though, you can refer to the macro parameter by
means of `%-1', which NASM will expand as the _inverse_ condition
code. So the `retz' macro defined in section 4.2.2 can be replaced
by a general conditional-return macro like this:
%macro retc 1
j%-1 %%skip
ret
%%skip:
%endmacro
This macro can now be invoked using calls like `retc ne', which will
cause the conditional-jump instruction in the macro expansion to
come out as `JE', or `retc po' which will make the jump a `JPE'.
Conditional assembly takes the following general form:
%if<condition>
; some code which only appears if <condition> is met
%elif<condition2>
; only appears if <condition> is not met but <condition2> is
%else
; this appears if neither <condition> nor <condition2> was met
%endif
For example, when debugging a program, you might want to write code
such as
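A sketch of such code, assuming the `writefile' macro from section
4.2.3 is available and that `DEBUG' is the (arbitrary) name chosen
for the switch:

                ; perform some function
        %ifdef DEBUG
                writefile 2,"Function performed successfully",13,10
        %endif
                ; go and do something else

Assembling with `-dDEBUG' then includes the diagnostic, and
assembling without it leaves the diagnostic out entirely.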
You can test for a macro _not_ being defined by using `%ifndef'
instead of `%ifdef'. You can also test for macro definitions in
`%elif' blocks by using `%elifdef' and `%elifndef'.
For more details of the context stack, see section 4.6. For a sample
use of `%ifctx', see section 4.6.5.
%macro pushparam 1
%ifidni %1,ip
call %%label
%%label:
%else
push %1
%endif
%endmacro
Note the use of `%if' inside the `%ifstr': this is to detect whether
the macro was passed two arguments (so the string would be a single
string constant, and `db %2' would be adequate) or more (in which
case, all but the first two would be lumped together into `%3', and
`db %2,%3' would be required).
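A version of `writefile' matching this description could be sketched
as below. It accepts either a string to write, or a pointer in `%2'
and a length in `%3':

        %macro  writefile 2-3+
          %ifstr %2
                jmp %%endstr
            %if %0 = 3
          %%str:        db %2,%3
            %else
          %%str:        db %2
            %endif
          %%endstr:     mov dx,%%str
                        mov cx,%%endstr-%%str
          %else
                        mov dx,%2       ; %2 is a pointer to the data
                        mov cx,%3       ; %3 is its length
          %endif
                        mov bx,%1
                        mov ah,0x40
                        int 0x21
        %endmacro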
%ifdef SOME_MACRO
; do some setup
%elifdef SOME_OTHER_MACRO
; do some different setup
%else
%error Neither SOME_MACRO nor SOME_OTHER_MACRO was defined.
%endif
Then any user who fails to understand the way your code is supposed
to be assembled will be quickly warned of their mistake, rather than
having to wait until the program crashes on being run and then not
knowing what went wrong.
%assign i 0
%rep 64
inc word [table+2*i]
%assign i i+1
%endrep
fibonacci:
%assign i 0
%assign j 1
%rep 100
%if j > 65535
%exitrep
%endif
dw j
%assign k j+i
%assign i j
%assign j k
%endrep
fib_number equ ($-fibonacci)/2
This produces a list of all the Fibonacci numbers that will fit in
16 bits. Note that a maximum repeat count must still be given to
`%rep'. This is to prevent the possibility of NASM getting into an
infinite loop in the preprocessor, which (on multitasking or multi-
user systems) would typically cause all the system memory to be
gradually used up and other applications to start crashing.
%include "macros.mac"
will include the contents of the file `macros.mac' into the source
file containing the `%include' directive.
The standard C idiom for preventing a file being included more than
once is just as applicable in NASM: if the file `macros.mac' has the
form
%ifndef MACROS_MAC
%define MACROS_MAC
; now define some macros
%endif
then including the file more than once will not cause errors: the
second time the file is included nothing will happen, because the
macro `MACROS_MAC' will already be defined.
%push foobar
This pushes a new context called `foobar' on the stack. You can have
several contexts on the stack with the same name: they can still be
distinguished.
%macro repeat 0
%push repeat
%$begin:
%endmacro
%macro until 1
j%-1 %$begin
%pop
%endmacro
        mov di,string           ; SCASB reads through ES:DI
        repeat
        add di,3
        scasb
        until e
which would scan every fourth byte of a string in search of the byte
in `AL'.
NASM also allows you to define single-line macros which are local to
a particular context, in just the same way:
%define %$localmac 3
If you need to change the name of the top context on the stack (in
order, for example, to have it respond differently to `%ifctx'), you
can execute a `%pop' followed by a `%push'; but this will have the
side effect of destroying all context-local labels and macros
associated with the context that was just popped.
%pop
%push newname
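A non-destructive alternative is the `%repl' directive, which renames
the context on top of the stack while leaving its context-local
labels and macros intact. As a one-line sketch (`newname' being the
same placeholder as above):

        %repl newname           ; rename the top context, keep its contents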
%macro if 1
%push if
j%-1 %$ifnot
%endmacro
%macro else 0
%ifctx if
%repl else
jmp %$ifend
%$ifnot:
%else
%error "expected `if' before `else'"
%endif
%endmacro
%macro endif 0
%ifctx if
%$ifnot:
%pop
%elifctx else
%$ifend:
%pop
%else
%error "expected `if' or `else' before `endif'"
%endif
%endmacro
This code is more robust than the `REPEAT' and `UNTIL' macros given
in section 4.6.2, because it uses conditional assembly to check that
the macros are issued in the right order (for example, not calling
`endif' before `if') and issues a `%error' if they're not.
In addition, the `endif' macro has to be able to cope with the two
distinct cases of either directly following an `if', or following an
`else'. It achieves this, again, by using conditional assembly to do
different things depending on whether the context on top of the
stack is `if' or `else'.
The `else' macro has to preserve the context on the stack, in order
to have the `%$ifnot' referred to by the `if' macro be the same as
the one defined by the `endif' macro, but has to change the
context's name so that `endif' will know there was an intervening
`else'. It does this by the use of `%repl'.
cmp ax,bx
if ae
cmp bx,cx
if ae
mov ax,cx
else
mov ax,bx
endif
else
cmp ax,cx
if ae
mov ax,cx
endif
endif
Like the C preprocessor, NASM allows the user to find out the file
name and line number containing the current instruction. The macro
`__FILE__' expands to a string constant giving the name of the
current input file (which may change through the course of assembly
if `%include' directives are used), and `__LINE__' expands to a
numeric constant giving the current line number in the input file.
%macro notdeadyet 0
push eax
mov eax,__LINE__
call stillhere
pop eax
%endmacro
and then pepper your code with calls to `notdeadyet' until you find
the crash point.
`STRUC' takes one parameter, which is the name of the data type.
This name is defined as a symbol with the value zero, and also has
the suffix `_size' appended to it and is then defined as an `EQU'
giving the size of the structure. Once `STRUC' has been issued, you
are defining the structure, and should define fields using the
`RESB' family of pseudo-instructions, and then invoke `ENDSTRUC' to
finish the definition.
struc mytype
mt_long: resd 1
mt_word: resw 1
mt_byte: resb 1
mt_str: resb 32
endstruc
The above code defines six symbols: `mt_long' as 0 (the offset from
the beginning of a `mytype' structure to the longword field),
`mt_word' as 4, `mt_byte' as 6, `mt_str' as 7, `mytype_size' as 39,
and `mytype' itself as zero.
The reason why the structure type name is defined at zero is a side
effect of allowing structures to work with the local label
mechanism: if your structure members tend to have the same names in
more than one structure, you can define the above structure like
this:
struc mytype
.long: resd 1
.word: resw 1
.byte: resb 1
.str: resb 32
endstruc
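With this form the field offsets are named `mytype.long',
`mytype.word', `mytype.byte' and `mytype.str'. A brief usage sketch,
where `mystruc' is a hypothetical label marking an instance of the
structure:

                mov eax,[mystruc+mytype.long]
                mov ax,[mystruc+mytype.word]
                mov al,[mystruc+mytype.byte]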
Having defined a structure type, the next thing you typically want
to do is to declare instances of that structure in your data
segment. NASM provides an easy way to do this in the `ISTRUC'
mechanism. To declare a structure of type `mytype' in a program, you
code something like this:
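A sketch of such a declaration, using the `mt_' field names from the
first `mytype' definition above and an arbitrary instance label
`mystruc':

        mystruc:
                istruc mytype
                    at mt_long, dd 123456
                    at mt_word, dw 1024
                    at mt_byte, db 'x'
                    at mt_str,  db 'hello, world', 13, 10, 0
                iend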
The function of the `AT' macro is to make use of the `TIMES' prefix
to advance the assembly position to the correct point for the
specified structure field, and then to declare the specified data.
Therefore the structure fields must be declared in the same order as
they were specified in the structure definition.
at mt_str, db 123,134,145,156,167,178,189
db 190,100,0
Depending on personal taste, you can also omit the code part of the
`AT' line completely, and start the structure field on the next
line:
at mt_str
db 'hello, world'
db 13,10,0
`ALIGNB' (or `ALIGN' with a second argument of `RESB 1') can be used
within structure definitions:
struc mytype2
mt_byte: resb 1
alignb 2
mt_word: resw 1
alignb 4
mt_long: resd 1
mt_str: resb 32
endstruc
This will ensure that the structure members are sensibly aligned
relative to the base of the structure.
In most cases, you should not need to use `BITS' explicitly. The
`aout', `coff', `elf' and `win32' object formats, which are designed
for use in 32-bit operating systems, all cause NASM to select 32-bit
mode by default. The `obj' object format allows you to specify each
segment you define as either `USE16' or `USE32', and NASM will set
its operating mode accordingly, so the use of the `BITS' directive
is once again unnecessary.
The most likely reason for using the `BITS' directive is to write
32-bit code in a flat binary file; this is because the `bin' output
format defaults to 16-bit mode in anticipation of it being used most
frequently to write DOS `.COM' programs, DOS `.SYS' device drivers
and boot loader software.
You do _not_ need to specify `BITS 32' merely in order to use 32-bit
instructions in a 16-bit DOS program; if you do, the assembler will
generate incorrect code because it will be writing code targeted at
a 32-bit platform, to be run on a 16-bit one.
When NASM is in `BITS 16' state, instructions which use 32-bit data
are prefixed with an 0x66 byte, and those referring to 32-bit
addresses have an 0x67 prefix. In `BITS 32' state, the reverse is
true: 32-bit instructions require no prefixes, whereas instructions
using 16-bit data need an 0x66 and those working in 16-bit addresses
need an 0x67.
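The effect can be seen by assembling the same instructions in both
modes. The byte sequences in the comments are what one would expect
from the rule just stated; a listing file produced with `-l' is the
way to confirm them for your version of NASM.

                bits 16
                mov ax,bx               ; 89 D8     (no prefix)
                mov eax,ebx             ; 66 89 D8  (32-bit data)
                mov ax,[ebx]            ; 67 8B 03  (32-bit address)

                bits 32
                mov eax,ebx             ; 89 D8     (no prefix)
                mov ax,bx               ; 66 89 D8  (16-bit data)
                mov eax,[bx]            ; 67 8B 07  (16-bit address)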
The `BITS' directive has an exactly equivalent primitive form,
`[BITS 16]' and `[BITS 32]'. The user-level form is a macro which
has no function other than to call the primitive form.
The Unix object formats, and the `bin' object format, all support
the standardised section names `.text', `.data' and `.bss' for the
code, data and uninitialised-data sections. The `obj' format, by
contrast, does not recognise these section names as being special,
and indeed will strip off the leading period of any section name
that has one.
SECTION .text
Users may find it useful to make use of this in their own macros.
For example, the `writefile' macro defined in section 4.2.3 can be
usefully rewritten in the following more sophisticated form:
%macro writefile 2+
[section .data]
%%str: db %2
%%endstr:
__SECT__
mov dx,%%str
mov cx,%%endstr-%%str
mov bx,%1
mov ah,0x40
int 0x21
%endmacro
absolute 0x1A
kbuf_chr resw 1
kbuf_free resw 1
kbuf resw 16
This defines some variables `on top of' the setup code, so that
after the setup has finished running, the space it took up can be
re-used as data storage for the running TSR. The symbol `tsr_end'
can be used to calculate the total size of the part of the TSR that
needs to be made resident.
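The layout being described is roughly the following sketch of a
`.COM' TSR, in which `runtimevar1' and `runtimevar2' are
illustrative names for the run-time variables:

                org 100h                ; it's a .COM program
                jmp setup               ; setup code comes last

                ; the resident part of the TSR goes here

        setup:  ; now write the code that installs the TSR here

                absolute setup
        runtimevar1     resw 1
        runtimevar2     resd 20
        tsr_end: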
extern _printf
extern _sscanf,_fscanf
The primitive form of `EXTERN' differs from the user-level form only
in that it can take only one argument at a time: the support for
multiple arguments is implemented at the preprocessor level.
You can declare the same variable as `EXTERN' more than once: NASM
will quietly ignore the second and later redeclarations. You can't
declare a variable as `EXTERN' as well as something else, though.
`GLOBAL' uses the same syntax as `EXTERN', except that it must refer
to symbols which _are_ defined in the same module as the `GLOBAL'
directive. For example:
global _main
_main: ; some code
Like `EXTERN', the primitive form of `GLOBAL' differs from the user-
level form only in that it can take only one argument at a time.
common intvar 4
is similar in function to
global intvar
section .bss
intvar resd 1
The difference is that if more than one module defines the same
common variable, then at link time those variables will be _merged_,
and references to `intvar' in all modules will point at the same
piece of memory.
The `bin' format does not produce object files: it generates nothing
in the output file except the code you wrote. Such `pure binary'
files are used by MS-DOS: `.COM' executables and `.SYS' device
drivers are pure binary files. Pure binary output is also useful for
operating-system and boot loader development.
org 0x100
dd label
label:
The parameter to `ALIGN' specifies how many low bits of the section
start address must be forced to zero. The alignment value given may
be any power of two.
The `obj' file format (NASM calls it `obj' rather than `omf' for
historical reasons) is the one produced by MASM and TASM, which is
typically fed to 16-bit DOS linkers to produce `.EXE' files. It is
also the format used by OS/2.
The `obj' format does not define any special segment names: you can
call your segments anything you like. Typical names for segments in
`obj' format files are `CODE', `DATA' and `BSS'.
When you define a segment in an `obj' file, NASM defines the segment
name as a symbol as well, so that you can access the segment address
of the segment. So, for example:
segment data
dvar: dw 1234
segment code
function: mov ax,data ; get segment address of data
mov ds,ax ; and move it into DS
inc word [dvar] ; now this reference will work
ret
The `obj' format also enables the use of the `SEG' and `WRT'
operators, so that you can write code which does things like
extern foo
mov ax,seg foo ; get preferred segment of foo
mov ds,ax
mov ax,data ; a different segment
mov es,ax
mov ax,[ds:foo] ; this accesses `foo'
mov [es:foo wrt data],bx ; so does this
(*) `ALIGN' is used, as shown above, to specify how many low bits of
the segment start address must be forced to zero. The alignment
value given may be any power of two from 1 to 4096; in reality,
the only values supported are 1, 2, 4, 16, 256 and 4096, so if 8
is specified it will be rounded up to 16, and 32, 64 and 128
will all be rounded up to 256, and so on. Note that alignment to
4096-byte boundaries is a PharLap extension to the format and
may not be supported by all linkers.
(*) `CLASS' can be used to specify the segment class; this feature
indicates to the linker that segments of the same class should
be placed near each other in the output file. The class name can
be any word, e.g. `CLASS=CODE'.
(*) When writing OS/2 object files, you should declare 32-bit
segments as `FLAT', which causes the default segment base for
anything in the segment to be the special group `FLAT', and also
defines the group if it is not already defined.
segment data
; some data
segment bss
; some uninitialised data
group dgroup data bss
NASM will allow a segment to be part of more than one group, but
will generate a warning if you do this. Variables declared in a
segment which is part of more than one group will default to being
relative to the first group that was defined to contain the segment.
A group does not have to contain any segments; you can still make
`WRT' references to a group which does not contain the variable you
are referring to. OS/2, for example, defines the special group
`FLAT' with no segments in it.
Although NASM itself is case sensitive, some OMF linkers are not;
therefore it can be useful for NASM to output single-case object
files. The `UPPERCASE' format-specific directive causes all segment,
group and symbol names that are written to the object file to be
forced to upper case just before being written. Within a source
file, NASM is still case-sensitive; but the object file can be
written entirely in upper case if desired.
For example:
export myfunc
export myfunc TheRealMoreFormalLookingFunctionName
export myfunc myfunc 1234 ; export by ordinal
export myfunc myfunc resident parm=23 nodata
OMF linkers require exactly one of the object files being linked to
define the program entry point, where execution will begin when the
program is run. If the object file that defines the entry point is
assembled using NASM, you specify the entry point by declaring the
special symbol `..start' at the point where you wish execution to
begin.
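A minimal sketch of a program laid out this way, with segment names
`code' and `data' chosen only for illustration and the DOS
terminate call used as the exit path:

        segment code

        ..start:
                mov ax,data             ; execution begins here
                mov ds,ax
                ; the body of the program goes here
                mov ax,0x4C00           ; DOS: terminate with exit code 0
                int 0x21

        segment data
                ; program data goes here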
extern foo
then references such as `mov ax,foo' will give you the offset of
`foo' from its preferred segment base (as specified in whichever
module `foo' is actually defined in). So to access the contents of
`foo' you will usually need to do something like
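The usual sequence, sketched here with `ES' as the segment register
(any free segment register would do):

                mov ax,seg foo          ; get preferred segment base
                mov es,ax               ; move it into ES
                mov ax,[es:foo]         ; and use offset `foo' from it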
This form causes NASM to pretend that the preferred segment base of
`foo' is in fact `dgroup'; so the expression `seg foo' will now
return `dgroup', and the expression `foo' is equivalent to
`foo wrt dgroup'.
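A sketch of the form under discussion, assuming the group `dgroup' has
been built from two illustrative segments in the same file:

```nasm
segment data
segment bss
group   dgroup data bss

extern  foo:wrt dgroup          ; treat `dgroup' as the preferred base of `foo'

segment code
        mov     ax,dgroup
        mov     ds,ax
        mov     ax,[foo]        ; same as [foo wrt dgroup]
```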
Far common variables may be greater in size than 64K, and so the
OMF specification says that they are declared as a number of
_elements_ of a given size. So a 10-byte far common variable could
be declared as ten one-byte elements, five two-byte elements, two
five-byte elements or one ten-byte element.
Some OMF linkers require the element size, as well as the variable
size, to match when resolving common variables declared in more than
one module. Therefore NASM must allow you to specify the element
size on your far common variables. This is done by the following
syntax:
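For instance, the 10-byte variable mentioned above could be declared
in either of these ways (the symbol names are illustrative):

```nasm
common  c_5by2  10:far 5        ; two five-byte elements
common  c_2by5  10:far 2        ; five two-byte elements
```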
Note that although Microsoft say that Win32 object files follow the
COFF (Common Object File Format) standard, the object files produced
by Microsoft Win32 compilers are not compatible with COFF linkers
such as DJGPP's, and vice versa. This is due to a difference of
opinion over the precise semantics of PC-relative relocations. To
produce COFF files suitable for DJGPP, use NASM's `coff' output
format; conversely, the `coff' format does not produce object files
that Win32 linkers can generate correct output from.
The `coff' output type produces COFF object files suitable for
linking with the DJGPP linker.
(*) `exec' defines the section to be one which should have execute
permission when the program is run. `noexec' defines it as one
which should not.
`elf' defines five special symbols which you can use as the right-
hand side of the `WRT' operator to obtain PIC relocation types. They
are `..gotpc', `..gotoff', `..got', `..plt' and `..sym'. Their
functions are summarised here:
(*) Referring to the symbol marking the global offset table base
using `wrt ..gotpc' will end up giving the distance from the
beginning of the current section to the global offset table.
(`_GLOBAL_OFFSET_TABLE_' is the standard symbol name used to
refer to the GOT.) So you would then need to add `$$' to the
result to get the real address of the GOT.
You can also specify the size of the data associated with the
symbol, as a numeric expression (which may involve labels, and even
forward references) after the type specifier. Like this:
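A sketch of such a declaration follows. The table contents are
placeholder values; the point is the size expression, which uses a
forward reference to the local label `.end'.

```nasm
global  hashtable:data (hashtable.end - hashtable)

hashtable:
        db      1,2,3           ; some data here (placeholder values)
.end:
```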
This makes NASM automatically calculate the length of the table and
place that information into the ELF symbol table.
This declares the total size of the array to be 128 bytes, and
requires that it be aligned on a 4-byte boundary.
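A declaration matching that description would be along these lines,
with `dwordarray' as an illustrative name:

```nasm
common  dwordarray 128:4        ; 128 bytes in total, aligned to 4 bytes
```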
The `aout' format generates `a.out' object files, in the form used
by early Linux systems. (These differ from other `a.out' object
files in that the magic number in the first four bytes of the file
is different. Also, some implementations of `a.out', for example
NetBSD's, support position-independent code, which Linux's
implementation doesn't.)
The `aoutb' format generates `a.out' object files, in the form used
by the various free BSD Unix clones, NetBSD, FreeBSD and OpenBSD.
For simple object files, this object format is exactly the same as
`aout' except for the magic number in the first four bytes of the
file. However, the `aoutb' format supports position-independent code
in the same way as the `elf' format, so you can use it to write BSD
shared libraries.
The Linux 16-bit assembler `as86' has its own non-standard object
file format. Although its companion linker `ld86' produces something
close to ordinary `a.out' binaries as output, the object file format
used to communicate between `as86' and `ld86' is not itself `a.out'.
`as86' is a very simple object format (from the NASM user's point of
view). It supports no special directives, no special symbols, no use
of `SEG' or `WRT', and no extensions to any standard directives. It
supports only the three standard section names `.text', `.data' and
`.bss'.
The Unix NASM archive, and the DOS archive which includes sources,
both contain an `rdoff' subdirectory holding a set of RDOFF
utilities: an RDF linker, an RDF static-library manager, an RDF file
dump utility, and a program which will load and execute an RDF
executable under Linux.
`rdf' supports only the standard section names `.text', `.data' and
`.bss'.
        library mylib.rdl
The `dbg' output format is not built into NASM in the default
configuration. If you are building your own NASM executable from the
sources, you can define `OF_DBG' in `outform.h' or on the compiler
command line, and obtain the `dbg' output format.
The `dbg' format does not output an object file as such; instead, it
outputs a text file which contains a complete list of all the
transactions between the main body of NASM and the output-format
back end module. It is primarily intended to aid people who want to
write their own output drivers, so that they can get a clearer idea
of the various requests the main program makes of the output driver,
and in what order they happen.
For simple files, one can easily use the `dbg' format like this:
This workaround will still typically not work for programs intended
for `obj' format, because the `obj' `SEGMENT' and `GROUP' directives
have side effects of defining the segment and group names as
symbols; `dbg' will not do this, so the program will not assemble.
You will have to work around that by defining the symbols yourself
(using `EXTERN', for example) if you really need to get a `dbg'
trace of an `obj'-specific source file.
`dbg' accepts any section name and any directives at all, and logs
them all to its output file.
When linking several `.OBJ' files into a `.EXE' file, you should
ensure that exactly one of them has a start point defined (using the
`..start' special symbol defined by the `obj' format: see section
6.2.6). If no module defines a start point, the linker will not know
what value to give the entry-point field in the output file header;
if more than one defines a start point, the linker will not know
_which_ value to use.
        segment code
                mov     dx,hello
                mov     ah,9
                int     0x21
The above is the main program: load `DS:DX' with a pointer to the
greeting message (`hello' is implicitly relative to the segment
`data', which was loaded into `DS' in the setup code, so the full
pointer is valid), and call the DOS print-string function.
                mov     ax,0x4c00
                int     0x21
        segment data
        hello:  db      'hello, world', 13, 10, '$'
The above file, when assembled into a `.OBJ' file, will link on its
own to a valid `.EXE' file, which when run will print `hello, world'
and then exit.
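Putting the pieces together, the complete source might look like the
sketch below. The setup code at `..start' and the `stack' segment are
not shown in the fragments above, so their exact form here (including
the 64-byte stack size) is an assumption.

```nasm
segment code

..start:
        mov     ax,data
        mov     ds,ax           ; DS now addresses the segment `data'
        mov     ax,stack
        mov     ss,ax
        mov     sp,stacktop     ; set up a stack of our own

        mov     dx,hello        ; DS:DX points at the message
        mov     ah,9            ; DOS print-string function
        int     0x21

        mov     ax,0x4c00       ; DOS terminate-program function
        int     0x21

segment data
hello:  db      'hello, world', 13, 10, '$'

segment stack stack
        resb    64
stacktop:
```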
The `.EXE' file format is simple enough that it's possible to build
a `.EXE' file by writing a pure-binary program and sticking a 32-
byte header on the front. This header is simple enough that it can
be generated using `DB' and `DW' commands by NASM itself, so that
you can use the `bin' output format to directly generate `.EXE'
files.
Included in the NASM archives, in the `misc' subdirectory, is a file
`exebin.mac' of macros. It defines three macros: `EXE_begin',
`EXE_stack' and `EXE_end'.
In this model, the code you end up writing starts at `0x100', just
like a `.COM' file - in fact, if you strip off the 32-byte header
from the resulting `.EXE' file, you will have a valid `.COM'
program. All the segment bases are the same, so you are limited to a
64K program, again just like a `.COM' file. Note that an `ORG'
directive is issued by the `EXE_begin' macro, so you should not
explicitly issue one of your own.
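A rough sketch of how the three macros might be used is given below.
The argument to `EXE_stack' (taken here to be a stack size in bytes)
and the absence of arguments to the other two are assumptions about
`exebin.mac', not something stated above, so check the macro file
itself before relying on them.

```nasm
%include "exebin.mac"

        EXE_begin               ; emits the header and issues the ORG

        mov     ax,0x4c00       ; code goes here, as for a `.COM' file
        int     0x21

        EXE_stack 64            ; assumed form: requested stack size in bytes
        EXE_end                 ; marks the end so the header sizes resolve
```

The result would be assembled with the `bin' output format, exactly as
for a `.COM' file.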
While large DOS programs must be written as `.EXE' files, small ones
are often better written as `.COM' files. `.COM' files are pure
binary, and therefore most easily produced using the `bin' output
format.
                org 100h
        section .text
        start:  ; put your code here
        section .data
                ; put data items here
        section .bss
                ; put uninitialised data here
The `bin' format puts the `.text' section first in the file, so you
can declare data or BSS items before beginning to write code if you
want to, and the code will still end up at the front of the file
where it belongs.
The BSS (uninitialised data) section does not take up space in the
`.COM' file itself: instead, addresses of BSS items are resolved to
point at space beyond the end of the file, on the grounds that this
will be free memory when the program is run. Therefore you should
not rely on your BSS being initialised to all zeros when you run.
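If a program does need zeroed BSS, it can clear the area itself on
entry. The sketch below assumes the usual `.COM' start-up state, in
which `ES' equals `DS' and `CS'; the labels `bss_start', `bss_end' and
`buffer' are illustrative.

```nasm
        org     100h

section .text
start:  mov     di,bss_start    ; ES:DI points at the start of the BSS
        mov     cx,bss_end - bss_start
        xor     al,al
        cld
        rep     stosb           ; zero every byte of the BSS area

        mov     ax,0x4c00       ; rest of the program; here, just exit
        int     0x21

section .bss
bss_start:
buffer: resb    64              ; hypothetical BSS item
bss_end:
```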
To assemble the above program, you should use a command line like
If you are writing a `.COM' program as more than one module, you may
wish to assemble several `.OBJ' files and link them together into a
`.COM' program. You can do this, provided you have a linker capable
of outputting `.COM' files directly (TLINK does this), or
alternatively a converter program such as `EXE2BIN' to transform the
`.EXE' file output from the linker into a `.COM' file.
(*) The first object file containing code should start its code
segment with a line like `RESB 100h'. This is to ensure that the
code begins at offset `100h' relative to the beginning of the
code segment, so that the linker or converter program does not
have to adjust address references within the file when
generating the `.COM' file. Other assemblers use an `ORG'
directive for this purpose, but `ORG' in NASM is a format-
specific directive to the `bin' output format, and does not mean
the same thing as it does in MASM-compatible assemblers.
(*) All your segments should be in the same group, so that every
time your code or data references a symbol offset, all offsets
are relative to the same segment base. This is because, when a
`.COM' file is loaded, all the segment registers contain the
same value.
For more information on the format of `.SYS' files, and the data
which has to go in the header structure, a list of books is given in
the Frequently Asked Questions list for the newsgroup
`comp.os.msdos.programmer'.
C compilers have the convention that the names of all global symbols
(functions or data) they define are formed by prefixing an
underscore to the name as it appears in the C program. So, for
example, the function a C programmer thinks of as `printf' appears
to an assembly language programmer as `_printf'. This means that in
your assembly programs, you can define symbols without a leading
underscore, and not have to worry about name clashes with C symbols.
        %macro  cglobal 1
                global  _%1
                %define %1 _%1
        %endmacro
        %macro  cextern 1
                extern  _%1
                %define %1 _%1
        %endmacro
With these macros defined, a declaration such as
        cextern printf
expands to
        extern  _printf
        %define printf _printf
The `cglobal' macro works similarly. You must use `cglobal' before
defining the symbol in question, but you would have had to do that
anyway if you used `GLOBAL'.
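As a short sketch of `cglobal' in use (the function name `myfunc' is
illustrative), the macro is invoked first and the label is then
written without its underscore:

```nasm
%macro  cglobal 1
        global  _%1
        %define %1 _%1
%endmacro

        cglobal myfunc          ; declare before defining, as with GLOBAL
myfunc:                         ; assembled as the label `_myfunc'
        ret
```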
(*) In models using a single code segment (tiny, small and compact),
functions are near. This means that function pointers, when
stored in data segments or pushed on the stack as function
arguments, are 16 bits long and contain only an offset field
(the `CS' register never changes its value, and always gives the
segment part of the full function address), and that functions
are called using ordinary near `CALL' instructions and return
using `RETN' (which, in NASM, is synonymous with `RET' anyway).
This means both that you should write your own routines to
return with `RETN', and that you should call external C routines
with near `CALL' instructions.
(*) In models using more than one code segment (medium, large and
huge), functions are far. This means that function pointers are
32 bits long (consisting of a 16-bit offset followed by a 16-bit
segment), and that functions are called using `CALL FAR' (or
`CALL seg:offset') and return using `RETF'. Again, you should
therefore write your own routines to return with `RETF' and use
`CALL FAR' to call external routines.
(*) In models using a single data segment (tiny, small and medium),
data pointers are 16 bits long, containing only an offset field
(the `DS' register doesn't change its value, and always gives
the segment part of the full data item address).
(*) In models using more than one data segment (compact, large and
huge), data pointers are 32 bits long, consisting of a 16-bit
offset followed by a 16-bit segment. You should still be careful
not to modify `DS' in your routines without restoring it
afterwards, but `ES' is free for you to use to access the
contents of 32-bit data pointers you are passed.
(*) The huge memory model allows single data items to exceed 64K in
size. In all other memory models, you can access the whole of a
data item just by doing arithmetic on the offset field of the
pointer you are given, whether a segment field is present or
not; in huge model, you have to be more careful of your pointer
arithmetic.
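The differences between the models can be sketched in a few
instructions. This fragment mixes near and far forms purely for
comparison; a real module would use whichever form matches its memory
model. The function names are hypothetical, and the stack offset used
for the far data pointer assumes a far function whose first parameter
is that pointer.

```nasm
extern  _nearfunc               ; hypothetical C function, single code segment
extern  _farfunc                ; hypothetical C function, multiple code segments

        call    _nearfunc       ; near call: pushes a 16-bit return offset
        call    far _farfunc    ; far call: pushes segment, then offset

        ; dereferencing a far (32-bit) data pointer through ES
        les     bx,[bp+6]       ; ES:BX = the pointer, offset word first
        mov     ax,[es:bx]      ; read the word it points to
```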
(*) The caller pushes the function's parameters on the stack, one
after another, in reverse order (right to left, so that the
first argument specified to the function is pushed last).
(*) The callee receives control, and typically (although this is not
actually necessary, in functions which do not need to access
their parameters) starts by saving the value of `SP' in `BP' so
as to be able to use `BP' as a base pointer to find its
parameters on the stack. However, the caller was probably doing
this too, so part of the calling convention states that `BP'
must be preserved by any C function. Hence the callee, if it is
going to set up `BP' as a _frame pointer_, must push the
previous value first.
(*) The callee may then access its parameters relative to `BP'. The
word at `[BP]' holds the previous value of `BP' as it was
pushed; the next word, at `[BP+2]', holds the offset part of the
return address, pushed implicitly by `CALL'. In a small-model
(near) function, the parameters start after that, at `[BP+4]';
in a large-model (far) function, the segment part of the return
address lives at `[BP+4]', and the parameters begin at `[BP+6]'.
The leftmost parameter of the function, since it was pushed
last, is accessible at this offset from `BP'; the others follow,
at successively greater offsets. Thus, in a function such as
`printf' which takes a variable number of parameters, the
pushing of the parameters in reverse order means that the
function knows where to find its first parameter, which tells it
the number and type of the remaining ones.
(*) Once the callee has finished processing, it restores `SP' from
`BP' if it had allocated local stack space, then pops the
previous value of `BP', and returns via `RETN' or `RETF'
depending on memory model.
(*) When the caller regains control from the callee, the function
parameters are still on the stack, so it typically adds an
immediate constant to `SP' to remove them (instead of executing
a number of slow `POP' instructions). Thus, if a function is
accidentally called with the wrong number of parameters due to a
prototype mismatch, the stack will still be returned to a
sensible state since the caller, which _knows_ how many
parameters it pushed, does the removing.
        global  _myfunc
        _myfunc:
                push    bp
                mov     bp,sp
                sub     sp,0x40         ; 64 bytes of local stack space
                mov     bx,[bp+4]       ; first parameter to function
                ; some more code
                mov     sp,bp           ; undo "sub sp,0x40" above
                pop     bp
                ret
        extern  _printf
                ; and then, further down...
                push    word [myint]    ; one of my integer variables
                push    word mystring   ; pointer into my data segment
                call    _printf
                add     sp,byte 4       ; `byte' saves space
                ; then those data items...
        segment _DATA
        myint           dw      1234
        mystring        db      'This number -> %d <- should be 1234',10,0
In large model, the function-call code might look more like this. In
this example, it is assumed that `DS' already holds the segment base
of the segment `_DATA'. If not, you would have to initialise it
first.
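A sketch of that large-model call, reusing the names `myint' and
`mystring' from the small-model example, follows. The instruction
sequence is reconstructed from the description in the next paragraph.

```nasm
extern  _printf

        ; and then, further down...
        push    word [myint]
        push    word seg mystring   ; push the segment first...
        push    word mystring       ; ... then the offset of `mystring'
        call    far _printf
        add     sp,byte 6           ; three words of parameters

        ; then those data items...
segment _DATA
myint           dw      1234
mystring        db      'This number -> %d <- should be 1234',10,0
```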
The integer value still takes up one word on the stack, since large
model does not affect the size of the `int' data type. The first
argument (pushed last) to `printf', however, is a data pointer, and
therefore has to contain a segment and offset part. The segment
should be stored second in memory, and therefore must be pushed
first. (Of course, `PUSH DS' would have been a shorter instruction
than `PUSH WORD SEG mystring', if `DS' was set up as the above
example assumed.) Then the actual call becomes a far call, since
functions expect far calls in large model; and `SP' has to be
increased by 6 rather than 4 afterwards to make up for the extra
word of parameters.
        extern  _i
                mov     ax,[_i]
And to declare your own integer variable which C programs can access
as `extern int j', you do this (making sure you are assembling in
the `_DATA' segment, if necessary):
        global  _j
        _j      dw      0
To access a C data structure, you need to know the offset from the
base of the structure to the field you are interested in. You can
either do this by converting the C structure definition into a NASM
structure definition (using `STRUC'), or by calculating the one
offset and using just that.
For example, in 16-bit code a structure holding a `char' followed by
an `int' might be four bytes long rather than three, since the `int' field
would be aligned to a two-byte boundary. However, this sort of
feature tends to be a configurable option in the C compiler, either
using command-line options or `#pragma' lines, so you have to find
out how your own compiler does it.
        proc    _nearproc
        %$i     arg
        %$j     arg
                mov     ax,[bp + %$i]
                mov     bx,[bp + %$j]
                add     ax,[bx]
        endproc
Note that the `arg' macro has an `EQU' as the first line of its
expansion, and since the label before the macro call gets prepended
to the first line of the expanded macro, the `EQU' works, defining
`%$i' to be an offset from `BP'. A context-local variable is used,
local to the context pushed by the `proc' macro and popped by the
`endproc' macro, so that the same argument name can be used in later
procedures. Of course, you don't _have_ to do that.
The macro set produces code for near functions (tiny, small and
compact-model code) by default. You can have it generate far
functions (medium, large and huge-model code) by means of coding
`%define FARCODE'. This changes the kind of return instruction
generated by `endproc', and also changes the starting point for the
argument offsets. The macro set contains no intrinsic dependency on
whether data pointers are far or not.
        %define FARCODE
        proc    _farproc
        %$i     arg
        %$j     arg     4
                mov     ax,[bp + %$i]
                mov     bx,[bp + %$j]
                mov     es,[bp + %$j + 2]
                add     ax,[es:bx]      ; dereference the far pointer via ES
        endproc
(*) The memory model is always large: functions are far, data
pointers are far, and no data item can be more than 64K long.
(Actually, some functions are near, but only those functions
that are local to a Pascal unit and never called from outside
it. All assembly functions that Pascal calls, and all Pascal
functions that assembly routines are able to call, are far.)
However, all static data declared in a Pascal program goes into
the default data segment, which is the one whose segment address
will be in `DS' when control is passed to your assembly code.
The only things that do not live in the default data segment are
local variables (they live in the stack segment) and dynamically
allocated variables. All data _pointers_, however, are far.
(*) There are restrictions on the segment names you are allowed to
use - Borland Pascal will ignore code or data declared in a
segment it doesn't like the name of. The restrictions are
described below.
(*) The caller pushes the function's parameters on the stack, one
after another, in normal order (left to right, so that the first
argument specified to the function is pushed first).
(*) The callee receives control, and typically (although this is not
actually necessary, in functions which do not need to access
their parameters) starts by saving the value of `SP' in `BP' so
as to be able to use `BP' as a base pointer to find its
parameters on the stack. However, the caller was probably doing
this too, so part of the calling convention states that `BP'
must be preserved by any function. Hence the callee, if it is
going to set up `BP' as a frame pointer, must push the previous
value first.
(*) The callee may then access its parameters relative to `BP'. The
word at `[BP]' holds the previous value of `BP' as it was
pushed. The next word, at `[BP+2]', holds the offset part of the
return address, and the next one at `[BP+4]' the segment part.
The parameters begin at `[BP+6]'. The rightmost parameter of the
function, since it was pushed last, is accessible at this offset
from `BP'; the others follow, at successively greater offsets.
(*) Once the callee has finished processing, it restores `SP' from
`BP' if it had allocated local stack space, then pops the
previous value of `BP', and returns via `RETF'. It uses the form
of `RETF' with an immediate parameter, giving the number of
bytes taken up by the parameters on the stack. This causes the
parameters to be removed from the stack as a side effect of the
return instruction.
(*) When the caller regains control from the callee, the function
parameters have already been removed from the stack, so it needs
to do nothing further.
        global  myfunc
        myfunc: push    bp
                mov     bp,sp
                sub     sp,0x40         ; 64 bytes of local stack space
                mov     bx,[bp+8]       ; first parameter to function
                mov     bx,[bp+6]       ; second parameter to function
                ; some more code
                mov     sp,bp           ; undo "sub sp,0x40" above
                pop     bp
                retf    4               ; total size of params is 4
At the other end of the process, to call a Pascal function from your
assembly code, you would do something like this:
        extern  SomeFunc
                ; and then, further down...
                push    word seg mystring   ; Now push the segment, and...
                push    word mystring       ; ... offset of "mystring"
                push    word [myint]        ; one of my variables
                call    far SomeFunc
(*) Any other segments in the object file are completely ignored.
`GROUP' directives and segment attributes are also ignored.
Defining `PASCAL' does not change the code which calculates the
argument offsets; you must declare your function's arguments in
reverse order. For example:
        %define PASCAL
        proc    _pascalproc
        %$j     arg     4
        %$i     arg
                mov     ax,[bp + %$i]
                mov     bx,[bp + %$j]
                mov     es,[bp + %$j + 2]
                add     ax,[es:bx]      ; dereference the far pointer via ES
        endproc
Almost all 32-bit code, and in particular all code running under
Win32, DJGPP or any of the PC Unix variants, runs in _flat_ memory
model. This means that the segment registers and paging have already
been set up to give you the same 32-bit 4GB address space no matter
what segment you work relative to, and that you should ignore all
segment registers completely. When writing flat-model application
code, you never need to use a segment override or modify any segment
register, and the code-section addresses you pass to `CALL' and
`JMP' live in the same address space as the data-section addresses
you access your variables by and the stack-section addresses you
access local variables and procedure parameters by. Every address is
32 bits long and contains only an offset part.
The older Linux `a.out' C compiler, all Win32 compilers, DJGPP, and
NetBSD and FreeBSD, all use the leading underscore; for these
compilers, the macros `cextern' and `cglobal', as given in section
7.4.1, will still work. For ELF, though, the leading underscore
should not be used.
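For an ELF target, then, a module might look like the following
sketch, in which the names appear exactly as a C programmer would
write them. The function `myfunc' and the message are illustrative.

```nasm
; assembled with -f elf: no leading underscores
extern  printf
global  myfunc

section .text
myfunc: push    dword msg       ; pointer to the format string
        call    printf
        add     esp,byte 4      ; caller removes the parameter
        ret

section .data
msg:    db      'hello from myfunc',10,0
```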
(*) The caller pushes the function's parameters on the stack, one
after another, in reverse order (right to left, so that the
first argument specified to the function is pushed last).
(*) The callee receives control, and typically (although this is not
actually necessary, in functions which do not need to access
their parameters) starts by saving the value of `ESP' in `EBP'
so as to be able to use `EBP' as a base pointer to find its
parameters on the stack. However, the caller was probably doing
this too, so part of the calling convention states that `EBP'
must be preserved by any C function. Hence the callee, if it is
going to set up `EBP' as a frame pointer, must push the previous
value first.
(*) The callee may then access its parameters relative to `EBP'. The
doubleword at `[EBP]' holds the previous value of `EBP' as it
was pushed; the next doubleword, at `[EBP+4]', holds the return
address, pushed implicitly by `CALL'. The parameters start after
that, at `[EBP+8]'. The leftmost parameter of the function,
since it was pushed last, is accessible at this offset from
`EBP'; the others follow, at successively greater offsets. Thus,
in a function such as `printf' which takes a variable number of
parameters, the pushing of the parameters in reverse order means
that the function knows where to find its first parameter, which
tells it the number and type of the remaining ones.
(*) Once the callee has finished processing, it restores `ESP' from
`EBP' if it had allocated local stack space, then pops the
previous value of `EBP', and returns via `RET' (equivalently,
`RETN').
(*) When the caller regains control from the callee, the function
parameters are still on the stack, so it typically adds an
immediate constant to `ESP' to remove them (instead of executing
a number of slow `POP' instructions). Thus, if a function is
accidentally called with the wrong number of parameters due to a
prototype mismatch, the stack will still be returned to a
sensible state since the caller, which _knows_ how many
parameters it pushed, does the removing.
        global  _myfunc
        _myfunc:
                push    ebp
                mov     ebp,esp
                sub     esp,0x40        ; 64 bytes of local stack space
                mov     ebx,[ebp+8]     ; first parameter to function
                ; some more code
                leave                   ; mov esp,ebp / pop ebp
                ret
        extern  _printf
                ; and then, further down...
                push    dword [myint]   ; one of my integer variables
                push    dword mystring  ; pointer into my data segment
                call    _printf
                add     esp,byte 8      ; `byte' saves space
                ; then those data items...
        segment _DATA
        myint           dd      1234
        mystring        db      'This number -> %d <- should be 1234',10,0
        extern  _i
                mov     eax,[_i]
And to declare your own integer variable which C programs can access
as `extern int j', you do this (making sure you are assembling in
the `_DATA' segment, if necessary):
        global  _j
        _j      dd      0
To access a C data structure, you need to know the offset from the
base of the structure to the field you are interested in. You can
either do this by converting the C structure definition into a NASM
structure definition (using `STRUC'), or by calculating the one
offset and using just that.
To do either of these, you should read your C compiler's manual to
find out how it organises data structures. NASM gives no special
alignment to structure members in its own `STRUC' macro, so you have
to specify alignment yourself if the C compiler generates it.
Typically, you might find that a structure like
        struct {
            char c;
            int i;
        } foo;
might be eight bytes long rather than five, since the `int' field
would be aligned to a four-byte boundary. However, this sort of
feature is sometimes a configurable option in the C compiler, either
using command-line options or `#pragma' lines, so you have to find
out how your own compiler does it.
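Assuming the compiler does pad `foo' to eight bytes as described, the
structure could be mirrored with `STRUC' roughly as follows. The name
`foo_t' is hypothetical, and the three bytes of padding are written
out by hand because `STRUC' inserts none of its own.

```nasm
struc   foo_t                   ; hypothetical name for the layout of `foo'
.c:     resb    1               ; char c, at offset 0
        resb    3               ; padding, assuming four-byte alignment of `int'
.i:     resd    1               ; int i, at offset 4
endstruc                        ; foo_t_size evaluates to 8

extern  _foo

        mov     eax,[_foo + foo_t.i]    ; load foo.i
```

If the compiler packs the structure instead, the padding line is simply
dropped and `foo_t.i' becomes 1.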
        proc    _proc32
        %$i     arg
        %$j     arg
                mov     eax,[ebp + %$i]
                mov     ebx,[ebp + %$j]
                add     eax,[ebx]
        endproc
Note that the `arg' macro has an `EQU' as the first line of its
expansion, and since the label before the macro call gets prepended
to the first line of the expanded macro, the `EQU' works, defining
`%$i' to be an offset from `EBP'. A context-local variable is used,
local to the context pushed by the `proc' macro and popped by the
`endproc' macro, so that the same argument name can be used in later
procedures. Of course, you don't _have_ to do that.
ELF replaced the older `a.out' object file format under Linux
because it contains support for position-independent code (PIC),
which makes writing shared libraries much easier. NASM supports the
ELF position-independent code features, so you can write Linux ELF
shared libraries in NASM.
NetBSD, and its close cousins FreeBSD and OpenBSD, take a different
approach by hacking PIC support into the `a.out' format. NASM
supports this as the `aoutb' output format, so you can write BSD
shared libraries in NASM too.
The _data_ section of a PIC shared library does not have these
restrictions: since the data section is writable, it has to be
copied into memory anyway rather than just paged in from the library
file, so as long as it's being copied it can be relocated too. So
you can put ordinary types of relocation in the data section without
too much worry (but see section 8.2.4 for a caveat).
Each code module in your shared library should define the GOT as an
external symbol:
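A sketch of the declaration, together with the kind of function
prologue and epilogue that the following paragraph discusses, is given
here for the ELF case. The function name `func' is illustrative, and
saving `EBX' on the stack is an assumption based on its being needed to
hold the GOT address.

```nasm
extern  _GLOBAL_OFFSET_TABLE_   ; the ELF name for the GOT

func:   push    ebp
        mov     ebp,esp
        push    ebx             ; preserve the caller's EBX
        call    .get_GOT
.get_GOT:
        pop     ebx
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc

        ; the function body comes here, with EBX holding the GOT address

        mov     ebx,[ebp-4]     ; restore the saved EBX
        mov     esp,ebp
        pop     ebp
        ret
```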
The interesting bit is the `CALL' instruction and the following two
lines. The `CALL' and `POP' combination obtains the address of the
label `.get_GOT', without having to know in advance where the
program was loaded (since the `CALL' instruction is encoded relative
to the current position). The `ADD' instruction makes use of one of
the special PIC relocation types: GOTPC relocation. With the
`WRT ..gotpc' qualifier specified, the symbol referenced (here
`_GLOBAL_OFFSET_TABLE_', the special symbol assigned to the GOT) is
given as an offset from the beginning of the section. (Actually, ELF
encodes it as the offset from the operand field of the `ADD'
instruction, but NASM simplifies this deliberately, so you do things
the same way for both ELF and BSD.) So the instruction then _adds_
the beginning of the section, to get the real address of the GOT,
and subtracts the value of `.get_GOT' which it knows is in `EBX'.
Therefore, by the time that instruction has finished, `EBX' contains
the address of the GOT.
        %macro  get_GOT 0
                call    %%getgot
        %%getgot:
                pop     ebx
                add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc
        %endmacro
Having got the GOT, you can then use it to obtain the addresses of
your data items. Most variables will reside in the sections you have
declared; they can be accessed using the `..gotoff' special `WRT'
type. It works like this:
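In sketch form, assuming `EBX' already holds the address of the GOT
and `myvar' is a variable declared in one of this module's own
sections:

```nasm
section .data
myvar:  dd      0               ; hypothetical variable in this module

section .text
        ; EBX is assumed to hold the address of the GOT
        lea     eax,[ebx+myvar wrt ..gotoff]    ; EAX = address of myvar
        mov     ecx,[eax]                       ; ECX = contents of myvar
```

The expression `myvar wrt ..gotoff' is the distance from the GOT to
`myvar', so adding it to `EBX' yields the variable's real address.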
Note that due to a peculiarity of the way BSD `a.out' format handles
this relocation type, there must be at least one non-local symbol in
the same section as the address you're trying to access.
8.2.3 Finding External and Common Data Items
This loads the address of `extvar' out of an entry in the GOT. The
linker, when it builds the shared library, collects together every
relocation of type `..got', and builds the GOT so as to ensure it
has every necessary entry present.
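The load described above can be sketched as follows, again assuming
`EBX' holds the address of the GOT:

```nasm
extern  extvar                  ; defined in some other module or library

        ; EBX is assumed to hold the address of the GOT
        mov     eax,[ebx+extvar wrt ..got]      ; EAX = address of extvar
        mov     ecx,[eax]                       ; ECX = contents of extvar
```

Note the extra level of indirection compared with `..gotoff': the GOT
entry holds the address of the variable, not the variable itself.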
If you want to export symbols to the user of the library, you have
to declare whether they are functions or data, and if they are data,
you have to give the size of the data item. This is because the
dynamic linker has to build procedure linkage table entries for any
exported functions, and also moves exported data items away from the
library's data section in which they were declared.
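For a function, the declaration might be sketched like this (the name
`func' and its body are illustrative):

```nasm
global  func:function           ; declare it as a function

func:   push    ebp
        mov     ebp,esp
        ; body of the function
        pop     ebp
        ret
```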
And to export a data item such as an array, you would have to code
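A declaration of roughly this shape fits that description. The array
name and its length of 128 doublewords are illustrative; the size is
computed from a local label placed at the end of the array.

```nasm
section .bss

global  array:data array.end-array      ; give the size too

array:  resd    128
.end:
```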
To store the address of an exported global in one of your data
sections, use the special `WRT' type `..sym', which instructs NASM
to search the symbol table for a particular symbol at that address,
rather than just relocating by section base.
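A sketch of a `..sym' reference, with `global_data_item' and `dataptr'
as illustrative names:

```nasm
section .data

global  global_data_item:data 4

global_data_item:
        dd      0
dataptr:
        dd      global_data_item wrt ..sym      ; relocated by symbol lookup
```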
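Function pointers need the same care: referring to one of your
functions by means of
        funcptr: dd my_function
will give the user the address of the code you wrote, whereas
        funcptr: dd my_function wrt ..sym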
will give the address of the procedure linkage table for the
function, which is where the calling program will _believe_ the
function lives. Either address is a valid way to call the function.
Having written some code modules and assembled them to `.o' files,
you then generate your shared library with a command such as
You would then copy `library.so.1.2' into the library directory, and
create `library.so.1' as a symbolic link to it.
The most common form of mixed-size instruction is the one used when
writing a 32-bit OS: having done your setup in 16-bit mode, such as
loading the kernel, you then have to boot it by switching into
protected mode and jumping to the 32-bit kernel start address. In a
fully 32-bit OS, this tends to be the _only_ mixed-size instruction
you need, since everything before it can be done in pure 16-bit
code, and everything after it can be pure 32-bit.
This jump must specify a 48-bit far address, since the target
segment is a 32-bit one. However, it must be assembled in a 16-bit
segment, so just coding, for example,
will not work, since the offset part of the address will be
truncated to `0x9ABC' and the jump will be an ordinary 16-bit far
one.
The Linux kernel setup code gets round the inability of `as86' to
generate the required instruction by coding it manually, using `DB'
instructions. NASM can go one better than that, by actually
generating the right instruction itself. Here's how to do it right:
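The sketch below contrasts the two forms. The segment and offset
values are illustrative, chosen so that the low word of the offset is
`0x9ABC' as in the discussion above.

```nasm
bits 16

        ; jmp   0x1234:0x56789ABC       ; wrong: assembled as a 16-bit far jump
        jmp     dword 0x1234:0x56789ABC ; right: 16-bit segment, 32-bit offset
```

The `DWORD' keyword applies to the offset part of the address, forcing
it to be treated as 32 bits wide even though the code is being
assembled in a 16-bit segment.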
If the data you are trying to access in a 32-bit segment lies within
the first 64K of the segment, you may be able to get away with using
an ordinary 16-bit addressing operation for the purpose; but sooner
or later, you will want to do 32-bit addressing from 16-bit mode.
The easiest way to do this is to make sure you use a register for
the address, since any effective address containing a 32-bit
register is forced to be a 32-bit address. So you can do
mov eax,offset_into_32_bit_segment_specified_by_fs
mov dword [fs:eax],0x11223344
It can. As in section 9.1, you need only prefix the address with the
`DWORD' keyword, and it will be forced to be a 32-bit address:
Also as in section 9.1, NASM is not fussy about whether the `DWORD'
prefix comes before or after the segment override, so arguably a
nicer-looking way to code the above instruction is
You can also specify `WORD' or `DWORD' prefixes along with the `FAR'
prefix to indirect far jumps or calls. For example:
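A sketch of the forms described above, as they might be assembled in
a 16-bit segment. The symbol `my_offset', the address `0x4321' and
the stored value are placeholders chosen for illustration:

```nasm
        bits 16
my_offset equ 0x12345           ; hypothetical offset beyond the first 64K

        mov     dword [fs:dword my_offset],0x11223344   ; DWORD after the override
        mov     dword [dword fs:my_offset],0x11223344   ; DWORD before the override

        call    dword far [fs:word 0x4321]  ; 16-bit address, 48-bit far pointer
```

The last line addresses memory with a 16-bit offset, but loads a
48-bit far pointer (32-bit offset plus 16-bit segment) from it.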
The other way you might want to access data is by using the
string instructions (`LODSx', `STOSx' and so on) or the `XLATB'
instruction. These instructions, since they take no parameters,
might seem to have no easy way to make them perform 32-bit
addressing when assembled in a 16-bit segment.
This is the purpose of NASM's `a16' and `a32' prefixes. If you are
coding `LODSB' in a 16-bit segment but it is supposed to be
accessing a string in a 32-bit segment, you should load the desired
address into `ESI' and then code
a32 lodsb
o16 push ss
o16 push ds
(You can also use the `o32' prefix to force the 32-bit behaviour
when in 16-bit mode, but this seems less useful.)
This chapter describes some of the common problems that users have
been known to encounter with NASM, and answers them. It also gives
instructions for reporting bugs in NASM if you find a difficulty
that isn't listed here.
ORG 0
; some boot sector code
ORG 510
DW 0xAA55
This is not the intended use of the `ORG' directive in NASM, and
will not work. The correct way to solve this problem in NASM is to
use the `TIMES' directive, like this:
ORG 0
; some boot sector code
TIMES 510-($-$$) DB 0
DW 0xAA55
The `TIMES' directive will insert exactly enough zero bytes into the
output to move the assembly point up to 510. This method also has
the advantage that if you accidentally fill your boot sector too
full, NASM will catch the problem at assembly time and report it, so
you won't end up with a boot sector that you have to disassemble to
find out what's wrong with it.
Another common mistake with the above code is to write the `TIMES'
line as
TIMES 510-$ DB 0
by reasoning that `$' should be a pure number, just like 510, so the
difference between them is also a pure number and can happily be fed
to `TIMES'.
NASM is a _modular_ assembler: the various component parts are
designed to be easily separable for re-use, so they don't exchange
information unnecessarily. In consequence, the `bin' output format,
even though it has been told by the `ORG' directive that the `.text'
section should start at 0, does not pass that information back to
the expression evaluator. So from the evaluator's point of view, `$'
isn't a pure number: it's an offset from a section base. Therefore
the difference between `$' and 510 is also not a pure number, but
involves a section base. Values involving section bases cannot be
passed as arguments to `TIMES'.
TIMES 510-($-$$) DB 0
in which `$' and `$$' are offsets from the same section base, and so
their difference is a pure number. This will solve the problem and
generate sensible code.
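Putting the pieces together, a complete source file might look like
the following sketch. The boot code itself is a placeholder (it just
halts), and the file name in the usage note is hypothetical:

```nasm
        bits 16
        org 0

start:  cli                         ; placeholder boot code: just halt
.hang:  hlt
        jmp     short .hang

        times 510-($-$$) db 0       ; pad with zeros up to offset 510
        dw 0xAA55                   ; signature in the last two bytes
```

Assembling this with `nasm -f bin boot.asm -o boot.bin' should give a
file of exactly 512 bytes. If the code before the `TIMES' line ever
grows past 510 bytes, the repeat count becomes negative and NASM
reports an error instead of producing an oversized sector.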
10.2 Bugs
We have never yet released a version of NASM with any _known_ bugs.
That doesn't usually stop there being plenty we didn't know about,
though. Any that you find should be reported to `[email protected]'.
Please read section 2.2 first, and don't report the bug if it's
listed in there as a deliberate feature. (If you think the feature
is badly thought out, feel free to send us reasons why you think it
should be changed, but don't just send us mail saying `This is a
bug' if the documentation says we did it on purpose.) Then read
section 10.1, and don't bother reporting the bug if it's listed
there.
(*) What operating system you're running NASM under. DOS, Linux,
NetBSD, Win16, Win32, VMS (I'd be impressed), whatever.
(*) Which version of NASM you're using, and exactly how you invoked
it. Give us the precise command line, and the contents of the
`NASM' environment variable if any.
(*) If you believe the output file from NASM to be faulty, send it
to us. That allows us to determine whether our own copy of NASM
generates the same file, or whether the problem is related to
portability issues between our development platforms and yours.
We can handle binary files mailed to us as MIME attachments,
uuencoded, and even BinHex. Alternatively, we may be able to
provide an FTP site you can upload the suspect files to; but
mailing them is easier for us.
(*) Any other information or data files that might be helpful. If,
for example, the problem involves NASM failing to generate an
object file while TASM can generate an equivalent file without
trouble, then send us _both_ object files, so we can see what
TASM is doing differently from us.
This appendix also provides the opcodes which NASM will generate for
each form of each instruction. The opcodes are listed in the
following way:
(*) The code `/r' combines the above two: it indicates that one of
the operands is a memory address or `r/m', and another is a
register, and that an effective address should be generated with
the spare (register) field in the ModR/M byte being equal to the
`register value' of the register operand. The encoding of
effective addresses is given in section A.2.3; register values
are given in section A.2.1.
(*) The codes `ib', `iw' and `id' indicate that one of the operands
to the instruction is an immediate value, and that this is to be
encoded as a byte, little-endian word or little-endian
doubleword respectively.
(*) The codes `rb', `rw' and `rd' indicate that one of the operands
to the instruction is an immediate value, and that the
_difference_ between this value and the address of the end of
the instruction is to be encoded as a byte, word or doubleword
respectively. Where the form `rw/rd' appears, it indicates that
either `rw' or `rd' should be used according to whether assembly
is being performed in `BITS 16' or `BITS 32' state respectively.
(*) The codes `ow' and `od' indicate that one of the operands to the
instruction is a reference to the contents of a memory address
specified as an immediate value: this encoding is used in some
forms of the `MOV' instruction in place of the standard
effective-address mechanism. The displacement is encoded as a
word or doubleword. Again, `ow/od' denotes that `ow' or `od'
should be chosen according to the `BITS' setting.
(*) The codes `o16' and `o32' indicate that the given form of the
instruction should be assembled with operand size 16 or 32 bits.
In other words, `o16' indicates a `66' prefix in `BITS 32'
state, but generates no code in `BITS 16' state; and `o32'
indicates a `66' prefix in `BITS 16' state but generates nothing
in `BITS 32'.
(*) The codes `a16' and `a32', similarly to `o16' and `o32',
indicate the address size of the given form of the instruction.
Where this does not match the `BITS' setting, a `67' prefix is
required.
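As an illustration of how these codes interact with the `BITS'
setting, here is a small sketch. The byte values in the comments are
what one would expect from the rules above; a listing file (the `-l'
option) can be used to confirm them:

```nasm
        bits 16
        mov     ax,bx           ; 89 D8      o16 form, no prefix needed
        mov     eax,ebx         ; 66 89 D8   o32 form needs the 66 prefix

        bits 32
        mov     ax,bx           ; 66 89 D8   o16 form needs the 66 prefix
        mov     eax,ebx         ; 89 D8      o32 form, no prefix needed
        mov     al,[bx]         ; 67 8A 07   a16 address needs the 67 prefix
```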
The available condition codes are given here, along with their
numeric representations as part of opcodes. Many of these condition
codes have synonyms, so several will be listed at a time.
(*) `E' and `Z' are 4 (trigger if the zero flag is set); `NE' and
`NZ' are 5.
(*) `BE' and `NA' are 6 (trigger if either of the carry or zero
flags is set); `A' and `NBE' are 7.
(*) `P' and `PE' are 10 (trigger if the parity flag is set); `NP'
and `PO' are 11.
(*) `L' and `NGE' are 12 (trigger if exactly one of the sign and
overflow flags is set); `GE' and `NL' are 13.
(*) `LE' and `NG' are 14 (trigger if either the zero flag is set, or
exactly one of the sign and overflow flags is set); `G' and
`NLE' are 15.
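To show how these numbers end up in opcodes, here is a sketch using
the short conditional jumps, whose opcode byte is `70' plus the
condition code, followed by an `rb' displacement. The label `done' is
hypothetical:

```nasm
        bits 32
        jz      short done      ; 74 rb   (70 + 4,  `Z' or `E')
        jnz     short done      ; 75 rb   (70 + 5,  `NZ' or `NE')
        jbe     short done      ; 76 rb   (70 + 6,  `BE' or `NA')
        jl      short done      ; 7C rb   (70 + 12, `L' or `NGE')
        jg      short done      ; 7F rb   (70 + 15, `G' or `NLE')
done:
```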
The ModR/M byte consists of three fields: the `mod' field, ranging
from 0 to 3, in the upper two bits of the byte, the `r/m' field,
ranging from 0 to 7, in the lower three bits, and the spare
(register) field in the middle (bit 3 to bit 5). The spare field is
not relevant to the effective address being encoded, and either
contains an extension to the instruction opcode or the register
value of another operand.
(*) In a 16-bit effective address, the `mod' field gives the length
of the displacement field: 0 means no displacement, 1 means one
byte, and 2 means two bytes.
(*) In a 32-bit effective address, the `mod' field gives the length
of the displacement field: 0 means no displacement, 1 means one
byte, and 2 means four bytes.
(*) The `base' field encodes the register value of the base
register.
(*) The `index' field encodes the register value of the index
register, unless it is 4, in which case no index register is
used (so `ESP' cannot be used as an index register).
(*) The `scale' field encodes the multiplier by which the index
register is scaled before adding it to the base and
displacement: 0 encodes a multiplier of 1, 1 encodes 2, 2
encodes 4 and 3 encodes 8.
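The following sketch works through a few 32-bit effective addresses
by hand. It assumes the usual register values (`EAX' is 0, `ECX' is
1, `EBX' is 3) and that an `r/m' field of 4 signals a following SIB
byte; the byte values in the comments are derived from those rules:

```nasm
        bits 32
        mov     eax,[ebx]               ; 8B 03        mod=0 reg=0 r/m=3
        mov     ecx,[ebx+8]             ; 8B 4B 08     mod=1 reg=1 r/m=3, disp8
        lea     eax,[ebx+ecx*4+100]     ; 8D 44 8B 64  mod=1 reg=0 r/m=4, then
                                        ;   SIB 8B: scale=2 index=1 base=3,
                                        ;   then the one-byte displacement 64
```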
(*) `8086', `186', `286', `386', `486', `PENT' and `P6' denote the
lowest processor type that supports the instruction. Most
instructions run on all processors above the given type; those
that do not are documented. The Pentium II contains no
additional instructions beyond the P6 (Pentium Pro); from the
point of view of its instruction set, it can be thought of as a
P6 with MMX capability.
(*) `MMX' indicates that the instruction is an MMX one, and will run
on MMX-capable Pentium processors and the Pentium II.
AAA ; 37 [8086]
AAS ; 3F [8086]
AAD ; D5 0A [8086]
AAD imm ; D5 ib [8086]
AAM ; D4 0A [8086]
AAM imm ; D4 ib [8086]
`AAM' is for use after you have multiplied two decimal digits
together and left the result in `AL': it divides `AL' by ten and
stores the quotient in `AH', leaving the remainder in `AL'. The
divisor 10 can be changed by specifying an operand to the
instruction: a particularly handy use of this is `AAM 16', causing
the two nibbles in `AL' to be separated into `AH' and `AL'.
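A short sketch of the nibble-splitting trick, with an arbitrary
example value in `AL':

```nasm
        bits 16
        mov     al,0x5A         ; arbitrary example value
        aam     16              ; AH = 0x05 (high nibble), AL = 0x0A (low nibble)
```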
To add two numbers without also adding the contents of the carry
flag, use `ADD' (section A.6).
The MMX instruction `PAND' (see section A.116) performs the same
operation on the 64-bit MMX registers.
A.8 `ARPL': Adjust RPL Field of Selector
`BSR' performs the same function, but searches from the top instead,
so it finds the most significant set bit.
`BSWAP' swaps the order of the four bytes of a 32-bit register: bits
0-7 exchange places with bits 24-31, and bits 8-15 swap with bits
16-23. There is no explicit 16-bit equivalent: to byte-swap `AX',
`BX', `CX' or `DX', `XCHG' can be used.
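For example, with arbitrary values chosen for illustration (note that
`BSWAP' itself needs a 486 or later):

```nasm
        bits 32
        mov     eax,0x11223344
        bswap   eax             ; EAX = 0x44332211

        mov     ax,0x1234
        xchg    al,ah           ; AX = 0x3412: the 16-bit byte swap
```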
These instructions all test one bit of their first operand, whose
index is given by the second operand, and store the value of that
bit into the carry flag. Bit indices are from 0 (least significant)
to 15 or 31 (most significant).
In addition to storing the original value of the bit into the carry
flag, `BTR' also resets (clears) the bit in the operand itself.
`BTS' sets the bit, and `BTC' complements the bit. `BT' does not
modify its operands.
The bit offset should be less than the size of the operand in bits,
so that it falls in the range 0 to 15 or 0 to 31 given above.
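A sketch of all four instructions acting on a register, with an
arbitrary starting value:

```nasm
        bits 32
        mov     eax,0x00000008  ; only bit 3 is set
        bt      eax,3           ; CF = 1, EAX unchanged
        btr     eax,3           ; CF = 1, EAX = 0x00000000
        bts     eax,0           ; CF = 0, EAX = 0x00000001
        btc     eax,0           ; CF = 1, EAX = 0x00000000
```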
You can choose between the two immediate far call forms
(`CALL imm:imm') by the use of the `WORD' and `DWORD' keywords:
`CALL WORD 0x1234:0x5678' or `CALL DWORD 0x1234:0x56789abc'.
The `CALL FAR mem' forms execute a far call by loading the
destination address out of memory. The address loaded consists of 16
or 32 bits of offset (depending on the operand size), and 16 bits of
segment. The operand size may be overridden using
`CALL WORD FAR mem' or `CALL DWORD FAR mem'.
The `CALL r/m' forms execute a near call (within the same segment),
loading the destination address out of memory or out of a register.
The keyword `NEAR' may be specified, for clarity, in these forms,
but is not necessary. Again, operand size can be overridden using
`CALL WORD mem' or `CALL DWORD mem'.
The `CALL r/m' forms given above are near calls; NASM will accept
the `NEAR' keyword (e.g. `CALL NEAR [address]'), even though it is
not strictly necessary.
`CBW' extends `AL' into `AX' by repeating the top bit of `AL' in
every bit of `AH'. `CWD' extends `AX' into `DX:AX' by repeating the
top bit of `AX' throughout `DX'. `CWDE' extends `AX' into `EAX', and
`CDQ' extends `EAX' into `EDX:EAX'.
CLC ; F8 [8086]
CLD ; FC [8086]
CLI ; FA [8086]
CLTS ; 0F 06 [286,PRIV]
These instructions clear various flags. `CLC' clears the carry flag;
`CLD' clears the direction flag; `CLI' clears the interrupt flag
(thus disabling interrupts); and `CLTS' clears the task-switched
(`TS') flag in `CR0'.
CMC ; F5 [8086]
Although the `CMOV' instructions are flagged `P6' above, they may
not be supported by all Pentium Pro processors; the `CPUID'
instruction (section A.22) will return a bit which indicates whether
conditional moves are supported.
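A minimal sketch of such a check. It assumes, from Intel's `CPUID'
documentation rather than from this manual, that the conditional-move
feature is reported in bit 15 of `EDX' when `EAX' is 1 on input; the
label `no_cmov' is hypothetical:

```nasm
        bits 32
        mov     eax,1           ; request the feature flags
        cpuid                   ; overwrites EAX, EBX, ECX and EDX
        test    edx,1 << 15     ; assumed CMOV feature bit (see above)
        jz      no_cmov         ; bit clear: avoid CMOVcc
        ; CMOVcc may be used from here on
no_cmov:
```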
CMPSB ; A6 [8086]
CMPSW ; o16 A7 [8086]
CMPSD ; o32 A7 [386]
The registers used are `SI' and `DI' if the address size is 16 bits,
and `ESI' and `EDI' if it is 32 bits. If you need to use an address
size not equal to the current `BITS' setting, you can use an
explicit `a16' or `a32' prefix.
`CMPSW' and `CMPSD' work in the same way, but they compare a word or
a doubleword instead of a byte, and increment or decrement the
addressing registers by 2 or 4 instead of 1.
CPUID ; 0F A2 [PENT]
`CPUID' returns various information about the processor it is being
executed on. It fills the four registers `EAX', `EBX', `ECX' and
`EDX' with information, which varies depending on the input contents
of `EAX'.
(*) If `EAX' is two on input, `EAX', `EBX', `ECX' and `EDX' all
contain information about caches and TLBs (Translation Lookaside
Buffers).
For more information on the data returned from `CPUID', see the
documentation on Intel's web site.
DAA ; 27 [8086]
DAS ; 2F [8086]
These instructions are used in conjunction with the add and subtract
instructions to perform binary-coded decimal arithmetic in _packed_
(one BCD digit per nibble) form. For the unpacked equivalents, see
section A.4.
`DEC' subtracts 1 from its operand. It does _not_ affect the carry
flag: to affect the carry flag, use `SUB something,1' (see section
A.159). See also `INC' (section A.79).
(*) For `DIV r/m8', `AX' is divided by the given operand; the
quotient is stored in `AL' and the remainder in `AH'.
(*) For `DIV r/m16', `DX:AX' is divided by the given operand; the
quotient is stored in `AX' and the remainder in `DX'.
(*) For `DIV r/m32', `EDX:EAX' is divided by the given operand; the
quotient is stored in `EAX' and the remainder in `EDX'.
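For instance, a 16-bit unsigned division might be sketched as
follows, with arbitrary operands. The high half of the dividend has
to be set up explicitly:

```nasm
        bits 16
        mov     dx,0            ; high word of the dividend
        mov     ax,1000         ; low word of the dividend
        mov     bx,7
        div     bx              ; AX = 142 (quotient), DX = 6 (remainder)
```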
EMMS ; 0F 77 [PENT,MMX]
`EMMS' sets the FPU tag word (marking which floating-point registers
are available) to all ones, meaning all registers are available for
the FPU to use. It should be used after executing MMX instructions
and before executing any subsequent floating-point operations.
F2XM1 ; D9 F0 [8086,FPU]
FABS ; D9 E1 [8086,FPU]
`FABS' computes the absolute value of `ST0', storing the result back
in `ST0'.
`FADD', given one operand, adds the operand to `ST0' and stores the
result back in `ST0'. If the operand has the `TO' modifier, the
result is stored in the register given rather than in `ST0'.
`FADDP' performs the same function as `FADD TO', but pops the
register stack after storing the result.
The given two-operand forms are synonyms for the one-operand forms.
A.31 `FBLD', `FBSTP': BCD Floating-Point Load and Store
FCHS ; D9 E0 [8086,FPU]
FCLEX ; 9B DB E2 [8086,FPU]
FNCLEX ; DB E2 [8086,FPU]
The conditions are not the same as the standard condition codes used
with conditional jump instructions. The conditions `B', `BE', `NB',
`NBE', `E' and `NE' are exactly as normal, but none of the other
standard ones are supported. Instead, the condition `U' and its
counterpart `NU' are provided; the `U' condition is satisfied if the
last two floating-point numbers compared were _unordered_, i.e. they
were not equal but neither one could be said to be greater than the
other, for example if they were NaNs. (The flag state which signals
this is the setting of the parity flag: so the `U' condition is
notionally equivalent to `PE', and `NU' is equivalent to `PO'.)
The `FCMOV' conditions test the main processor's status flags, not
the FPU status flags, so using `FCMOV' directly after `FCOM' will
not work. Instead, you should either use `FCOMI' which writes
directly to the main CPU flags word, or use `FSTSW' to extract the
FPU flags.
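Both routes can be sketched as follows. The label `st0_less' is
hypothetical, and the flag mapping in the comments (C0 to carry, C2
to parity, C3 to zero) is the standard effect of `SAHF' on a stored
status word rather than something stated in this section:

```nasm
        bits 32
        ; Route 1: FCOMI writes the CPU flags directly (P6 and later).
        fcomi   st0,st1
        fcmovb  st0,st1         ; if ST0 < ST1, copy ST1 into ST0

        ; Route 2: copy the FPU flags across by hand.
        fcom    st1             ; result goes to the FPU status word
        fstsw   ax              ; status word into AX
        sahf                    ; AH into the flags: C0->CF, C2->PF, C3->ZF
        jb      st0_less        ; taken if ST0 < ST1
        ; ST0 >= ST1 (or unordered) falls through to here
st0_less:
```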
Although the `FCMOV' instructions are flagged `P6' above, they may
not be supported by all Pentium Pro processors; the `CPUID'
instruction (section A.22) will return a bit which indicates whether
conditional moves are supported.
FCOMPP ; DE D9 [8086,FPU]
`FCOM' compares `ST0' with the given operand, and sets the FPU flags
accordingly. `ST0' is treated as the left-hand side of the
comparison, so that the carry flag is set (for a `less-than' result)
if `ST0' is less than the given operand.
`FCOMP' does the same as `FCOM', but pops the register stack
afterwards. `FCOMPP' compares `ST0' with `ST1' and then pops the
register stack twice.
`FCOMI' and `FCOMIP' work like the corresponding forms of `FCOM' and
`FCOMP', but write their results directly to the CPU flags register
rather than the FPU status word, so they can be immediately followed
by conditional jump or conditional move instructions.
FCOS ; D9 FF [386,FPU]
`FCOS' computes the cosine of `ST0' (in radians), and stores the
result in `ST0'. See also `FSINCOS' (section A.61).
FDECSTP ; D9 F6 [8086,FPU]
FDISI ; 9B DB E1 [8086,FPU]
FNDISI ; DB E1 [8086,FPU]
FENI ; 9B DB E0 [8086,FPU]
FNENI ; DB E0 [8086,FPU]
`FDIV' divides `ST0' by the given operand and stores the result back
in `ST0', unless the `TO' qualifier is given, in which case it
divides the given operand by `ST0' and stores the result in the
operand.
`FDIVR' does the same thing, but does the division the other way up:
so if `TO' is not given, it divides the given operand by `ST0' and
stores the result in `ST0', whereas if `TO' is given it divides
`ST0' by its operand and stores the result in the operand.
`FDIVP' operates like `FDIV TO', but pops the register stack once it
has finished. `FDIVRP' operates like `FDIVR TO', but pops the
register stack once it has finished.
`FIADD' adds the 16-bit or 32-bit integer stored in the given memory
location to `ST0', storing the result in `ST0'.
FINCSTP ; D9 F7 [8086,FPU]
FINIT ; 9B DB E3 [8086,FPU]
FNINIT ; DB E3 [8086,FPU]
FLD1 ; D9 E8 [8086,FPU]
FLDL2E ; D9 EA [8086,FPU]
FLDL2T ; D9 E9 [8086,FPU]
FLDLG2 ; D9 EC [8086,FPU]
FLDLN2 ; D9 ED [8086,FPU]
FLDPI ; D9 EB [8086,FPU]
FLDZ ; D9 EE [8086,FPU]
`FLDCW' loads a 16-bit value out of memory and stores it into the
FPU control word (governing things like the rounding mode, the
precision, and the exception masks). See also `FSTCW' (section
A.64).
`FMUL' multiplies `ST0' by the given operand, and stores the result
in `ST0', unless the `TO' qualifier is used in which case it stores
the result in the operand. `FMULP' performs the same operation as
`FMUL TO', and then pops the register stack.
FPATAN ; D9 F3 [8086,FPU]
FPTAN ; D9 F2 [8086,FPU]
`FPTAN' computes the tangent of the value in `ST0' (in radians), and
stores the result back into `ST0'. It then pushes the constant 1.0
onto the register stack, so the tangent ends up in `ST1'.
FPREM ; D9 F8 [8086,FPU]
FPREM1 ; D9 F5 [386,FPU]
FRNDINT ; D9 FC [8086,FPU]
`FNSAVE' does the same as `FSAVE', without first waiting for pending
floating-point exceptions to clear.
FSCALE ; D9 FD [8086,FPU]
FSETPM ; DB E4 [286,FPU]
FSIN ; D9 FE [386,FPU]
FSINCOS ; D9 FB [386,FPU]
`FSIN' calculates the sine of `ST0' (in radians) and stores the
result in `ST0'. `FSINCOS' does the same, but then pushes the cosine
of the same value on the register stack, so that the sine ends up in
`ST1' and the cosine in `ST0'. `FSINCOS' is faster than executing
`FSIN' and `FCOS' (see section A.36) in succession.
FSQRT ; D9 FA [8086,FPU]
`FSQRT' calculates the square root of `ST0' and stores the result in
`ST0'.
`FST' stores the value in `ST0' into the given memory location or
other FPU register. `FSTP' does the same, but then pops the register
stack.
A.64 `FSTCW': Store Floating-Point Control Word
`FSTCW' stores the FPU control word (governing things like the
rounding mode, the precision, and the exception masks) into a 2-byte
memory area. See also `FLDCW' (section A.51).
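A common use of the pair is to change the rounding mode. The sketch
below assumes, from Intel's FPU documentation rather than from this
manual, that the rounding-control field occupies bits 10 and 11 of
the control word and that setting both selects truncation; the
variable `cw' is hypothetical:

```nasm
        bits 32
        section .bss
cw:     resw 1                  ; hypothetical 2-byte scratch area

        section .text
        fstcw   [cw]            ; store the current control word
        or      word [cw],0x0C00 ; set bits 10 and 11 (assumed: truncate)
        fldcw   [cw]            ; load the modified control word
```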
`FNSTCW' does the same thing as `FSTCW', without first waiting for
pending floating-point exceptions to clear.
`FNSTENV' does the same thing as `FSTENV', without first waiting for
pending floating-point exceptions to clear.
`FSTSW' stores the FPU status word into `AX' or into a 2-byte memory
area.
`FNSTSW' does the same thing as `FSTSW', without first waiting for
pending floating-point exceptions to clear.
`FSUB' subtracts the given operand from `ST0' and stores the result
back in `ST0', unless the `TO' qualifier is given, in which case it
subtracts `ST0' from the given operand and stores the result in the
operand.
`FSUBR' does the same thing, but does the subtraction the other way
up: so if `TO' is not given, it subtracts `ST0' from the given
operand and stores the result in `ST0', whereas if `TO' is given it
subtracts its operand from `ST0' and stores the result in the
operand.
`FSUBP' operates like `FSUB TO', but pops the register stack once it
has finished. `FSUBRP' operates like `FSUBR TO', but pops the
register stack once it has finished.
FTST ; D9 E4 [8086,FPU]
`FTST' compares `ST0' with zero and sets the FPU flags accordingly.
`ST0' is treated as the left-hand side of the comparison, so that a
`less-than' result is generated if `ST0' is negative.
FUCOMPP ; DA E9 [386,FPU]
`FUCOM' compares `ST0' with the given operand, and sets the FPU
flags accordingly. `ST0' is treated as the left-hand side of the
comparison, so that the carry flag is set (for a `less-than' result)
if `ST0' is less than the given operand.
`FUCOMP' does the same as `FUCOM', but pops the register stack
afterwards. `FUCOMPP' compares `ST0' with `ST1' and then pops the
register stack twice.
FXAM ; D9 E5 [8086,FPU]
`FXAM' sets the FPU flags C3, C2 and C0 depending on the type of
value stored in `ST0': 000 (respectively) for an unsupported format,
001 for a NaN, 010 for a normal finite number, 011 for an infinity,
100 for a zero, 101 for an empty register, and 110 for a denormal.
It also sets the C1 flag to the sign of the number.
FXCH ; D9 C9 [8086,FPU]
FXCH fpureg ; D9 C8+r [8086,FPU]
FXCH fpureg,ST0 ; D9 C8+r [8086,FPU]
FXCH ST0,fpureg ; D9 C8+r [8086,FPU]
FXTRACT ; D9 F4 [8086,FPU]
FYL2X ; D9 F1 [8086,FPU]
FYL2XP1 ; D9 F9 [8086,FPU]
`FYL2XP1' works the same way, but replacing the base-2 log of `ST0'
with that of `ST0' plus one. This time, `ST0' must have magnitude no
greater than 1 minus half the square root of two.
HLT ; F4 [8086]
`HLT' puts the processor into a halted state, where it will perform
no more operations until restarted by an interrupt or a reset.
(*) For `IDIV r/m8', `AX' is divided by the given operand; the
quotient is stored in `AL' and the remainder in `AH'.
(*) For `IDIV r/m16', `DX:AX' is divided by the given operand; the
quotient is stored in `AX' and the remainder in `DX'.
(*) For `IDIV r/m32', `EDX:EAX' is divided by the given operand; the
quotient is stored in `EAX' and the remainder in `EDX'.
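Because the dividend is signed, its high half is normally produced by
sign extension rather than by zeroing. A sketch with arbitrary
operands:

```nasm
        bits 16
        mov     ax,-7
        cwd                     ; sign-extend AX into DX:AX
        mov     bx,2
        idiv    bx              ; AX = -3 (quotient), DX = -1 (remainder)
```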
(*) For `IMUL r/m8', `AL' is multiplied by the given operand; the
product is stored in `AX'.
(*) For `IMUL r/m16', `AX' is multiplied by the given operand; the
product is stored in `DX:AX'.
(*) For `IMUL r/m32', `EAX' is multiplied by the given operand; the
product is stored in `EDX:EAX'.
The two-operand form multiplies its two operands and stores the
result in the destination (first) operand. The three-operand form
multiplies its last two operands and stores the result in the first
operand.
IN AL,imm8 ; E4 ib [8086]
IN AX,imm8 ; o16 E5 ib [8086]
IN EAX,imm8 ; o32 E5 ib [386]
IN AL,DX ; EC [8086]
IN AX,DX ; o16 ED [8086]
IN EAX,DX ; o32 ED [386]
`IN' reads a byte, word or doubleword from the specified I/O port,
and stores it in the given destination register. The port number may
be specified as an immediate value if it is between 0 and 255, and
otherwise must be stored in `DX'. See also `OUT' (section A.111).
`INC' adds 1 to its operand. It does _not_ affect the carry flag: to
affect the carry flag, use `ADD something,1' (see section A.6). See
also `DEC' (section A.24).
INSB ; 6C [186]
INSW ; o16 6D [186]
INSD ; o32 6D [386]
`INSB' inputs a byte from the I/O port specified in `DX' and stores
it at `[ES:DI]' or `[ES:EDI]'. It then increments or decrements
(depending on the direction flag: increments if the flag is clear,
decrements if it is set) `DI' or `EDI'.
The register used is `DI' if the address size is 16 bits, and `EDI'
if it is 32 bits. If you need to use an address size not equal to
the current `BITS' setting, you can use an explicit `a16' or `a32'
prefix.
`INSW' and `INSD' work in the same way, but they input a word or a
doubleword instead of a byte, and increment or decrement the
addressing register by 2 or 4 instead of 1.
The `REP' prefix may be used to repeat the instruction `CX' (or
`ECX' - again, the address size chooses which) times.
INT1 ; F1 [P6]
ICEBP ; F1 [P6]
INT01 ; F1 [P6]
INT3 ; CC [8086]
`INT3' is not precisely equivalent to `INT 3': the short form, since
it is designed to be used as a breakpoint, bypasses the normal IOPL
checks in virtual-8086 mode, and also does not go through interrupt
redirection.
INTO ; CE [8086]
INVD ; 0F 08 [486]
IRET ; CF [8086]
IRETW ; o16 CF [8086]
IRETD ; o32 CF [386]
`IRETW' pops `IP', `CS' and the flags as 2 bytes each, taking 6
bytes off the stack in total. `IRETD' pops `EIP' as 4 bytes, pops a
further 4 bytes of which the top two are discarded and the bottom
two go into `CS', and pops the flags as 4 bytes as well, taking 12
bytes off the stack.
`JCXZ' performs a short jump (with maximum range 128 bytes) if and
only if the contents of the `CX' register is 0. `JECXZ' does the
same thing, but with `ECX'.
`JMP SHORT imm' has a maximum range of 128 bytes, since the
displacement is specified as only 8 bits, but takes up less code
space. NASM does not choose when to generate `JMP SHORT' for you:
you must explicitly code `SHORT' every time you want a short jump.
You can choose between the two immediate far jump forms
(`JMP imm:imm') by the use of the `WORD' and `DWORD' keywords:
`JMP WORD 0x1234:0x5678' or `JMP DWORD 0x1234:0x56789abc'.
The `JMP FAR mem' forms execute a far jump by loading the
destination address out of memory. The address loaded consists of 16
or 32 bits of offset (depending on the operand size), and 16 bits of
segment. The operand size may be overridden using `JMP WORD FAR mem'
or `JMP DWORD FAR mem'.
The `JMP r/m' forms execute a near jump (within the same segment),
loading the destination address out of memory or out of a register.
The keyword `NEAR' may be specified, for clarity, in these forms,
but is not necessary. Again, operand size can be overridden using
`JMP WORD mem' or `JMP DWORD mem'.
The `JMP r/m' forms given above are near jumps; NASM will accept
the `NEAR' keyword (e.g. `JMP NEAR [address]'), even though it is
not strictly necessary.
The ordinary form of the instructions has only a 128-byte range; the
`NEAR' form is a 386 extension to the instruction set, and can span
the full size of a segment. NASM will not override your choice of
jump instruction: if you want `Jcc NEAR', you have to use the `NEAR'
keyword.
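A sketch of the two forms side by side. The labels are hypothetical,
and the block of `NOP's merely stands in for more code than a short
jump can span:

```nasm
        bits 32
        jz      short close_by  ; 74 rb:     target within short range
        jz      near far_away   ; 0F 84 rd:  386 form, full segment range
close_by:
        times 200 nop           ; filler standing in for real code
far_away:
```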
LAHF ; 9F [8086]
`LAHF' sets the `AH' register according to the contents of the low
byte of the flags word. See also `SAHF' (section A.145).
`LEA', despite its syntax, does not access memory. It calculates the
effective address specified by its second operand as if it were
going to load or store data from it, but instead it stores the
calculated address into the register specified by its first operand.
This can be used to perform quite complex calculations (e.g.
`LEA EAX,[EBX+ECX*4+100]') in one instruction.
LEAVE ; C9 [186]
`LGDT' and `LIDT' both take a 6-byte memory area as an operand: they
load a 32-bit linear address and a 16-bit size limit from that area
(in the opposite order) into the GDTR (global descriptor table
register) or IDTR (interrupt descriptor table register). These are
the only instructions which directly use _linear_ addresses, rather
than segment/offset pairs.
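The 6-byte operand might be laid out as in the sketch below. The
table contents are placeholders, all label names are hypothetical,
and the limit is written as the table size minus one, following the
usual convention. Note the assumption that the data segment has base
zero, so that the offset of `gdt' is also its linear address:

```nasm
        bits 16
        section .text
        lgdt    [gdt_desc]      ; load limit and base into the GDTR

        section .data
gdt:    dq 0                    ; null descriptor; real entries would follow
gdt_end:

gdt_desc:
        dw gdt_end - gdt - 1    ; 16-bit size limit comes first in memory
        dd gdt                  ; then the 32-bit address (assumes base 0)
```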
`LMSW' loads the bottom four bits of the source operand into the
bottom four bits of the `CR0' control register (or the Machine
Status Word, on 286 processors). See also `SMSW' (section A.155).
LOADALL ; 0F 07 [386,UNDOC]
LOADALL286 ; 0F 05 [286,UNDOC]
LODSB ; AC [8086]
LODSW ; o16 AD [8086]
LODSD ; o32 AD [386]
The register used is `SI' if the address size is 16 bits, and `ESI'
if it is 32 bits. If you need to use an address size not equal to
the current `BITS' setting, you can use an explicit `a16' or `a32'
prefix.
`LOOPE' (or its synonym `LOOPZ') adds the additional condition that
it only jumps if the counter is nonzero _and_ the zero flag is set.
Similarly, `LOOPNE' (and `LOOPNZ') jumps only if the counter is
nonzero and the zero flag is clear.
`LTR' looks up the segment base and limit in the GDT or LDT
descriptor specified by the segment selector given as its operand,
and loads them into the Task Register.
`MOV' copies the contents of its source (second) operand into its
destination (first) operand.
In all forms of the `MOV' instruction, the two operands are the same
size, except for moving between a segment register and an `r/m32'
operand. These instructions are treated exactly like the
corresponding 16-bit equivalent (so that, for example, `MOV DS,EAX'
functions identically to `MOV DS,AX' but saves a prefix when in
32-bit mode), except that when a segment register is moved into a
32-bit destination, the top two bytes of the result are undefined.
`MOVD' copies 32 bits from its source (second) operand into its
destination (first) operand. When the destination is a 64-bit MMX
register, the top 32 bits are set to zero.
`MOVQ' copies 64 bits from its source (second) operand into its
destination (first) operand.
MOVSB ; A4 [8086]
MOVSW ; o16 A5 [8086]
MOVSD ; o32 A5 [386]
The registers used are `SI' and `DI' if the address size is 16 bits,
and `ESI' and `EDI' if it is 32 bits. If you need to use an address
size not equal to the current `BITS' setting, you can use an
explicit `a16' or `a32' prefix.
`MOVSW' and `MOVSD' work in the same way, but they copy a word or a
doubleword instead of a byte, and increment or decrement the
addressing registers by 2 or 4 instead of 1.
The `REP' prefix may be used to repeat the instruction `CX' (or
`ECX' - again, the address size chooses which) times.
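A sketch of a block copy built on `REP MOVSB'. The buffers and their
names are hypothetical, and the code assumes that `DS' and `ES'
already refer to the same flat segment:

```nasm
        bits 32
        section .data
source:  db 'hello'             ; hypothetical source buffer
msg_len  equ $-source

        section .bss
dest:    resb msg_len           ; hypothetical destination buffer

        section .text
        cld                     ; direction flag clear: ESI and EDI count up
        mov     esi,source
        mov     edi,dest
        mov     ecx,msg_len     ; repeat count
        rep movsb               ; copy ECX bytes from [DS:ESI] to [ES:EDI]
```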
(*) For `MUL r/m8', `AL' is multiplied by the given operand; the
product is stored in `AX'.
(*) For `MUL r/m16', `AX' is multiplied by the given operand; the
product is stored in `DX:AX'.
(*) For `MUL r/m32', `EAX' is multiplied by the given operand; the
product is stored in `EDX:EAX'.
NOP ; 90 [8086]
OR r/m8,reg8 ; 08 /r [8086]
OR r/m16,reg16 ; o16 09 /r [8086]
OR r/m32,reg32 ; o32 09 /r [386]
OR reg8,r/m8 ; 0A /r [8086]
OR reg16,r/m16 ; o16 0B /r [8086]
OR reg32,r/m32 ; o32 0B /r [386]
OR r/m8,imm8 ; 80 /1 ib [8086]
OR r/m16,imm16 ; o16 81 /1 iw [8086]
OR r/m32,imm32 ; o32 81 /1 id [386]
OR AL,imm8 ; 0C ib [8086]
OR AX,imm16 ; o16 0D iw [8086]
OR EAX,imm32 ; o32 0D id [386]
The MMX instruction `POR' (see section A.129) performs the same
operation on the 64-bit MMX registers.
OUTSB ; 6E [186]
The register used is `SI' if the address size is 16 bits, and `ESI'
if it is 32 bits. If you need to use an address size not equal to
the current `BITS' setting, you can use an explicit `a16' or `a32'
prefix.
`OUTSW' and `OUTSD' work in the same way, but they output a word or
a doubleword instead of a byte, and increment or decrement the
addressing register by 2 or 4 instead of 1.
The `REP' prefix may be used to repeat the instruction `CX' (or
`ECX' - again, the address size chooses which) times.
A.119 `PDISTIB': MMX Packed Distance and Accumulate with Implied Register
POP CS ; 0F [8086,UNDOC]
POP DS ; 1F [8086]
POP ES ; 07 [8086]
POP SS ; 17 [8086]
POP FS ; 0F A1 [386]
POP GS ; 0F A9 [386]
POPA ; 61 [186]
POPAW ; o16 61 [186]
POPAD ; o32 61 [386]
`POPAW' pops a word from the stack into each of, successively, `DI',
`SI', `BP', nothing (it discards a word from the stack which was a
placeholder for `SP'), `BX', `DX', `CX' and `AX'. It is intended to
reverse the operation of `PUSHAW' (see section A.135), but it
ignores the value for `SP' that was pushed on the stack by `PUSHAW'.
`POPAD' pops twice as much data, and places the results in `EDI',
`ESI', `EBP', nothing (placeholder for `ESP'), `EBX', `EDX', `ECX'
and `EAX'. It reverses the operation of `PUSHAD'.
Note that the registers are popped in reverse order of their numeric
values in opcodes (see section A.2.1).
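The pop order of `POPAW' can be sketched in Python as follows. The
stack is modelled as a plain list whose first element is the word at
the top of the stack; the names `POPAW_ORDER' and `popaw' are
assumptions made for this example.

```python
# None marks the placeholder slot for SP, which is popped and discarded.
POPAW_ORDER = ["DI", "SI", "BP", None, "BX", "DX", "CX", "AX"]

def popaw(stack):
    """Pop eight words from `stack' (index 0 is the top of the stack)."""
    regs = {}
    for name in POPAW_ORDER:
        word = stack.pop(0)
        if name is not None:
            regs[name] = word
    return regs

stack = [1, 2, 3, 4, 5, 6, 7, 8, 99]
regs = popaw(stack)
```

After the call, `regs' holds seven register values (the fourth word
was discarded) and only the word 99 remains on the modelled stack.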
POPF ; 9D [8086]
POPFW ; o16 9D [8086]
POPFD ; o32 9D [386]
`POPFW' pops a word from the stack and stores it in the bottom 16
bits of the flags register (or the whole flags register, on
processors below a 386). `POPFD' pops a doubleword and stores it in
the entire flags register.
`PSxxQ' perform simple bit shifts on the 64-bit MMX registers: the
destination (first) operand is shifted left or right by the number
of bits given in the source (second) operand, and the vacated bits
are filled in with zeros (for a logical shift) or copies of the
original sign bit (for an arithmetic right shift).
`PSLLx' and `PSRLx' perform logical shifts: the vacated bits at one
end of the shifted number are filled with zeros. `PSRAx' performs an
arithmetic right shift: the vacated bits at the top of the shifted
number are filled with copies of the original top (sign) bit.
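The difference between the logical and arithmetic forms can be
sketched in Python. This is a model of the arithmetic only, not of
the MMX register file: `psllq' and `psrlq' borrow the mnemonic names,
while `sra' is a hypothetical helper that shifts a single lane of the
given width. Shift counts larger than the lane are assumed to produce
all zeros (logical) or all copies of the sign bit (arithmetic).

```python
MASK64 = (1 << 64) - 1

def psllq(value, count):
    """Logical left shift of 64 bits; vacated low bits become zero."""
    return (value << count) & MASK64 if count < 64 else 0

def psrlq(value, count):
    """Logical right shift of 64 bits; vacated high bits become zero."""
    return (value & MASK64) >> count if count < 64 else 0

def sra(value, count, bits):
    """Arithmetic right shift of one `bits'-wide lane."""
    mask = (1 << bits) - 1
    value &= mask
    if value >> (bits - 1):           # sign bit set: treat as negative
        value -= 1 << bits
    count = min(count, bits - 1)      # larger counts leave only sign bits
    return (value >> count) & mask
```

For example, `sra(0x8000, 4, 16)' gives 0xF800, whereas a logical
right shift of the same word by 4 would give 0x0800.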
PUSH CS ; 0E [8086]
PUSH DS ; 1E [8086]
PUSH ES ; 06 [8086]
PUSH SS ; 16 [8086]
PUSH FS ; 0F A0 [386]
PUSH GS ; 0F A8 [386]
Unlike the undocumented and barely supported `POP CS', `PUSH CS' is
a perfectly valid and sensible instruction, supported on all
processors.
PUSHA ; 60 [186]
PUSHAD ; o32 60 [386]
PUSHAW ; o16 60 [186]
Note that the registers are pushed in order of their numeric values
in opcodes (see section A.2.1).
PUSHF ; 9C [8086]
PUSHFD ; o32 9C [386]
PUSHFW ; o16 9C [8086]
`PUSHFW' pushes the bottom 16 bits of the flags register (or the
whole flags register, on processors below a 386) onto the stack.
`PUSHFD' pushes the entire flags register onto the stack.
You can force the longer (286 and upwards, beginning with a `C1'
byte) form of `RCL foo,1' by using a `BYTE' prefix:
`RCL foo,BYTE 1'. Similarly with `RCR'.
RDMSR ; 0F 32 [PENT]
RDPMC ; 0F 33 [P6]
RDTSC ; 0F 31 [PENT]
RET ; C3 [8086]
RET imm16 ; C2 iw [8086]
RETF ; CB [8086]
RETF imm16 ; CA iw [8086]
RETN ; C3 [8086]
RETN imm16 ; C2 iw [8086]
`RET', and its exact synonym `RETN', pop `IP' or `EIP' from the
stack and transfer control to the new address. Optionally, if a
numeric operand is provided, they increment the stack pointer by a
further `imm16' bytes after popping the return address.
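The stack arithmetic of a 16-bit near return can be sketched in
Python. The model below is an illustration under simplifying
assumptions: the stack is a flat `bytearray', the stack segment is
ignored, and the function name `retn' and its parameters are invented
for the example.

```python
def retn(stack_mem, sp, imm16=0):
    """Model a 16-bit near RET; returns (new_ip, new_sp)."""
    # Pop the little-endian return address from the top of the stack.
    ip = stack_mem[sp] | (stack_mem[sp + 1] << 8)
    # SP moves past the 2-byte return address, then a further imm16 bytes.
    sp = (sp + 2 + imm16) & 0xFFFF
    return ip, sp

mem = bytearray(16)
mem[4], mem[5] = 0x34, 0x12          # return address 0x1234 at SP = 4
ip, sp = retn(mem, 4, imm16=6)       # like RET 6
```

This is how a callee can discard its own stack arguments: here six
bytes of arguments are released along with the return address.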
You can force the longer (286 and upwards, beginning with a `C1'
byte) form of `ROL foo,1' by using a `BYTE' prefix:
`ROL foo,BYTE 1'. Similarly with `ROR'.
RSM ; 0F AA [PENT]
`RSM' returns the processor to its normal operating mode when it was
in System-Management Mode.
SAHF ; 9E [8086]
`SAHF' sets the low byte of the flags word according to the contents
of the `AH' register. See also `LAHF' (section A.90).
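A minimal Python sketch of this transfer follows. It assumes that
only the five defined flag bits in the low byte (sign, zero,
auxiliary carry, parity and carry) are taken from `AH', and that the
remaining bits of the flags word are left alone; the names
`SAHF_MASK' and `sahf' are hypothetical.

```python
SAHF_MASK = 0xD5   # SF, ZF, AF, PF, CF: bits 7, 6, 4, 2 and 0

def sahf(flags, ah):
    """Return the flags word after SAHF, given the old flags and AH."""
    return (flags & ~SAHF_MASK) | (ah & SAHF_MASK)
```

For example, `sahf(0x0202, 0xFF)' gives 0x02D7: all five flags become
set, while the interrupt flag in the high byte is untouched.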
`SAL' is a synonym for `SHL' (see section A.152). NASM will assemble
either one to the same code, but NDISASM will always disassemble
that code as `SHL'.
You can force the longer (286 and upwards, beginning with a `C1'
byte) form of `SAL foo,1' by using a `BYTE' prefix:
`SAL foo,BYTE 1'. Similarly with `SAR'.
SALC ; D6 [8086,UNDOC]
SCASB ; AE [8086]
SCASW ; o16 AF [8086]
SCASD ; o32 AF [386]
The register used is `DI' if the address size is 16 bits, and `EDI'
if it is 32 bits. If you need to use an address size not equal to
the current `BITS' setting, you can use an explicit `a16' or `a32'
prefix.
`SCASW' and `SCASD' work in the same way, but they compare a word to
`AX' or a doubleword to `EAX' instead of a byte to `AL', and
increment or decrement the addressing registers by 2 or 4 instead of
1.
`SETcc' sets the given 8-bit operand to zero if its condition is not
satisfied, and to 1 if it is.
`SGDT' and `SIDT' both take a 6-byte memory area as an operand: they
store the contents of the GDTR (global descriptor table register) or
IDTR (interrupt descriptor table register) into that area as a
16-bit size limit followed by a 32-bit linear address (the limit
occupies the two lowest-addressed bytes). These are the only
instructions which directly use _linear_ addresses, rather than
segment/offset pairs.
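The layout of the 6-byte area can be illustrated with Python's
`struct' module. The sketch assumes a 32-bit operand size and the
standard layout in which the 16-bit limit occupies the two
lowest-addressed bytes and the 32-bit linear base address the next
four, both little-endian. The function names are invented for the
example.

```python
import struct

def sgdt_image(base, limit):
    """Build the 6 bytes stored by SGDT/SIDT: limit word, base dword."""
    return struct.pack("<HI", limit & 0xFFFF, base & 0xFFFFFFFF)

def parse_table_image(image):
    """Split a 6-byte SGDT/SIDT image into (base, limit)."""
    limit, base = struct.unpack("<HI", image)
    return base, limit

image = sgdt_image(base=0x00012345, limit=0x07FF)
```

Here `image' is the byte sequence FF 07 45 23 01 00, and
`parse_table_image(image)' recovers the base and limit.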
A synonym for `SHL' is `SAL' (see section A.146). NASM will assemble
either one to the same code, but NDISASM will always disassemble
that code as `SHL'.
You can force the longer (286 and upwards, beginning with a `C1'
byte) form of `SHL foo,1' by using a `BYTE' prefix:
`SHL foo,BYTE 1'. Similarly with `SHR'.
SMI ; F1 [386,UNDOC]
`SMSW' stores the bottom half of the `CR0' control register (or the
Machine Status Word, on 286 processors) into the destination
operand. See also `LMSW' (section A.96).
STC ; F9 [8086]
STD ; FD [8086]
STI ; FB [8086]
These instructions set various flags. `STC' sets the carry flag;
`STD' sets the direction flag; and `STI' sets the interrupt flag
(thus enabling interrupts).
STOSB ; AA [8086]
STOSW ; o16 AB [8086]
STOSD ; o32 AB [386]
`STOSW' and `STOSD' work in the same way, but they store the word in
`AX' or the doubleword in `EAX' instead of the byte in `AL', and
increment or decrement the addressing registers by 2 or 4 instead of
1.
The `REP' prefix may be used to repeat the instruction `CX' (or
`ECX' - again, the address size chooses which) times.
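As a sketch, `REP STOSW' behaves like a fill loop. The Python model
below assumes the usual `STOSW' behaviour (store `AX' at `[ES:DI]',
then step `DI' by 2 according to the direction flag), uses a flat
`bytearray' and ignores segments. The name `rep_stosw' and its
parameters are hypothetical.

```python
def rep_stosw(mem, di, cx, ax, df=0, addr_bits=16):
    """Model REP STOSW on a flat bytearray (segments ignored)."""
    mask = (1 << addr_bits) - 1
    step = -2 if df else 2               # a word is two bytes
    while cx != 0:
        mem[di] = ax & 0xFF              # low byte first (little-endian)
        mem[di + 1] = (ax >> 8) & 0xFF
        di = (di + step) & mask
        cx = (cx - 1) & mask
    return di, cx

mem = bytearray(8)
di, cx = rep_stosw(mem, di=0, cx=3, ax=0xABCD)
```

The call fills the first six bytes with the pattern CD AB and leaves
`DI' pointing just past the last word stored.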
`VERR' sets the zero flag if the segment specified by the selector
in its operand can be read from at the current privilege level.
`VERW' sets the zero flag if the segment can be written.
WAIT ; 9B [8086]
`WAIT', on 8086 systems with a separate 8087 FPU, waits for the FPU
to have finished any operation it is engaged in before continuing
main processor operations, so that (for example) an FPU store to
main memory can be guaranteed to have completed before the CPU tries
to read the result back out.
WBINVD ; 0F 09 [486]
`WBINVD' invalidates and empties the processor's internal caches,
and causes the processor to instruct external caches to do the same.
It writes the contents of the caches back to memory first, so no
data is lost. To flush the caches quickly without bothering to write
the data back first, use `INVD' (section A.84).
WRMSR ; 0F 30 [PENT]
`XADD' exchanges the values in its two operands, and then adds them
together and writes the result into the destination (first) operand.
This instruction can be used with a `LOCK' prefix for multi-
processor synchronisation purposes.
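The net effect of `XADD' can be sketched in Python: the destination
receives the sum, and the source receives the destination's old
value. The function name `xadd' and the `bits' parameter are
assumptions made for this illustration.

```python
def xadd(dest, src, bits=32):
    """Return (new_dest, new_src) after XADD dest,src."""
    mask = (1 << bits) - 1
    total = (dest + src) & mask      # sum wraps at the operand size
    return total, dest & mask        # source ends up with the old dest
```

This is why the instruction suits a fetch-and-add counter: after
`xadd(5, 3)' the destination holds 8 and the source holds 5, the
value the counter had before the addition.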
`XCHG' exchanges the values in its two operands. It can be used with
a `LOCK' prefix for purposes of multi-processor synchronisation.
XLATB ; D7 [8086]
The base register used is `BX' if the address size is 16 bits, and
`EBX' if it is 32 bits. If you need to use an address size not equal
to the current `BITS' setting, you can use an explicit `a16' or
`a32' prefix.
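As an illustration, the table lookup performed by `XLATB' can be
modelled in Python. The sketch assumes the usual behaviour (`AL' is
replaced by the byte at `[DS:BX+AL]', with `AL' treated as
unsigned), uses a flat `bytearray' and ignores the segment. The
function name `xlatb' is hypothetical.

```python
def xlatb(mem, bx, al, addr_bits=16):
    """Return the new AL: the byte at offset BX + AL (AL unsigned)."""
    mask = (1 << addr_bits) - 1
    return mem[(bx + (al & 0xFF)) & mask]

mem = bytearray(64)
mem[0x10:0x20] = b"0123456789ABCDEF"   # a 16-entry table at offset 0x10
digit = xlatb(mem, bx=0x10, al=0x0A)   # looks up entry 10 of the table
```

With this table the lookup converts a value from 0 to 15 into its
hexadecimal digit character.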
The MMX instruction `PXOR' (see section A.137) performs the same
operation on the 64-bit MMX registers.