EuroAssembler Manual
Czech version of this manual
Tutorial - how to write in assembler
About EuroAssembler ↓
Input/Output ↓
Structure of €ASM program ↓
Elements of source ↓
Instructions ↓
Program formats ↓
€ASM functions ↓
↑ About EuroAssembler
Product identification ↓
Short characteristics ↓
Notational typographic conventions ↓
Why Assembler ↓
Why Yet Another Assembler ↓
Why EuroAssembler ↓
Licence ↓
History
Download
Installation ↓
↑ Product identification
The name of the software is EuroAssembler. Please notice that there is no space between Euro and Assembler.
The name is often abbreviated as €ASM.
In a 7-bit ASCII environment it may also be referred to as EUROASM, and in some internal identifiers it is just ea.
The Euro character € is available on a Windows keyboard as Alt+0128 or as the HTML entity &euro;.
↑ Short Characteristics
€ASM is a macroassembler with Intel syntax for the IA-32 and x64 AMD & Intel™ architectures.
It also works as a linker, librarian, object converter and make-manager.
EUROASM is a 32-bit console application for MS-Windows and for Linux. It reads source text written in assembly language and produces a compiled object or
executable file, and a listing file.
Programs written in €ASM can run on 16-bit, 32-bit or 64-bit operating systems.
€ASM is shipped with its commented source text, macrolibraries and sample programmes for a quick start.
€ASM is available free of charge.
More than one source file can be assembled with a single Euroasm invocation. Each source produces its own listing and object files.
euroasm.exe source1.asm, source2.asm, more*.asm
A listing file successfully generated from a previous assembly session can be reused as an €ASM source code again. The listing format is compatible with assembly source
because the hexadecimal dump of the generated code is ignored by the €ASM parser.
€ASM is an iterative multipass macroassembler with full forward-reference support and partial datatype awareness.
Labels, EQUated symbols and structures may be referred to (used) before they are defined (though this is not recommended).
|0000: | ; Referring structured memory variable Today which will be defined later.
|0000:C706[1000]E007 | MOV [Today.Year],2016 ; Put immediate value to WORD memory variable.
|0006:C606[1200]0C | MOV [Today.Month],12 ; Put immediate value to BYTE memory variable.
|000B:C606[1300]1F | MOV [Today.Day],31 ; Put immediate value to BYTE memory variable.
|0010: |
|0010:00000000 |Today DS Datum ; Definition of a structured symbol whose structure will be declared later.
|0014: |
|[Datum] |Datum STRUC ; Declaration of structure Datum.
|0000:.... |.Year DW WORD
|0002:.. |.Month DB BYTE
|0003:.. |.Day DB BYTE
|0004: | ENDSTRUC Datum
Assembler instructions can be combined with HTML. Source lines that begin with <HTML tags> are treated as comments. This makes it possible to keep the assembly source close to its
rich-text documentation.
INCLUDE statements can import either another source file as a whole or only some of its divisions, which can be specified as a range of lines or as a block delimited with the
pseudoinstructions HEAD and ENDHEAD. The includable interface division HEAD..ENDHEAD of a program module does not need to be kept in a separate header file (such as “*.h”
files in the C language).
Errors and warnings are printed into the standard output and inserted also into the listing, right below the suspicious statement. Text of the error message is tailored to the
actual issue.
€ASM recovers from errors in source text. The assembly process does not stop at the first discovered error (unless it is fatal).
The emitted code always defaults to the shortest form but the programmer may choose a longer variant, using instruction modifiers:
|00000000:41 | INC ECX
|00000001:41 | INC ECX,CODE=SHORT
|00000002:FFC1 | INC ECX,CODE=LONG
|00000004: |
|00000004:83D801 | SBB EAX,1
|00000007:83D801 | SBB EAX,1,IMM=BYTE
|0000000A:81D801000000 | SBB EAX,1,IMM=DWORD
|00000010: |
|00000010:E97D000000 | JMP $+0x82, DIST=SHORT
|## W2401 Modifier "DIST=SHORT" could not be obeyed in this instruction.
|00000015: |
Data can be defined either explicitly (using pseudoinstruction D, DB, DW etc), or with a literal (ad hoc) definition.
|[DATA] |[DATA] ; Switch to data section.
|0000:4578706C~|Explicit DB "Explicit text definition.$",0
|[CODE] |[CODE] ; Switch to code section.
|0020:BA[0000] | MOV DX,Explicit
|0023:B409 | MOV AH,9 ; Write explicit string DS:DX to standard output.
|0025:CD21 | INT 21h ; Invoke DOS function.
|0027:BA[6400] | MOV DX,=B"Implicit text definition (literal).$"
|002A:B409 | MOV AH,9 ; Write implicit string DS:DX to standard output.
|002C:CD21 | INT 21h ; Invoke DOS function.
|002E: |
EuroAssembler supports Advanced vector extension set including Intel® Xeon MVEX and EVEX-encoded AVX-512 instructions.
Besides the usual subprogram blocks PROC..ENDPROC, €ASM supports semiinline procedures PROC1..ENDPROC1, which are expanded from a macro only once, during its first
invocation.
€ASM can link object modules (OMF, ELF, COFF) to executable formats (COM, EXE, DLL, ELFX) as well as to other object modules and libraries. See the table of supported
combinations.
A dynamically linked function may name its DLL in the import declaration, for instance IMPORT RegCloseKey, LIB="advapi32.dll" . Import libraries are not required
by the €ASM linker (though they are supported).
Each source file may contain more than one module (program), each such block PROGRAM..ENDPROGRAM produces its own object or executable file. A multi-module project
source could be kept in one big file, if this is preferred by the author.
Command-line options, which clutter the invocation of many other assemblers and linkers, are not necessary. If you distribute the source of your program, you don't
have to specify how to make it. Executable programs are created with a single simple command, euroasm source.asm .
EuroAssembler is written in EuroAssembler; its source code can be reviewed online.
The following example creates two variants of a Hello, world! program, “HelloL32.x” and “HelloL64.x” . Both executable files will be created from this source file “hello.asm”
with a single command euroasm hello.asm . We may run them in Linux or in its Windows emulator WSL:
EUROASM CPU=x64
We could hide most of the assembly instructions in macroinstructions from the libraries “linapi.htm” (32-bit) and “linabi.htm” (64-bit), and use literals (=B "Hello...") for the
definition of the printed strings:
EUROASM CPU=x64
TerminateProgram Errorlevel=0
ENDPROGRAM HelloL64
HelloW32 PROGRAM Format=PE, Entry=Main:, Width=32 ; HelloW32.exe works in 32-bit and 64-bit Windows.
INCLUDE winapi.htm ; Define 32-bit macros WinAPI and TerminateProgram.
Main: WinAPI MessageBox,0,="Hello, world of %^Width bits in Windows!",="Title",0, Lib=user32.dll
TerminateProgram Errorlevel=0
ENDPROGRAM HelloW32
The examples of code in macrolibraries and €ASM sources are ignored by EuroAssembler,
because their physical lines begin with an HTML tag marker < .
explaining metainformation ┐
|0000:0000| ; €ASM printed output (listing) is displayed black on white background.
|0000:0000| ; It contains assembled machine code, copy of source instructions
|0000:0000| ; and error messages.
↑ Why Assembler
The assembly programming language (ASM) gives programmers the maximal possible control of emitted machine code. Of course, having to write every instruction for the
Central Processing Unit (CPU) by hand is very tedious. That is why subprograms were invented: procedures, functions and macroinstructions.
A subprogram is like a black box with a documented purpose, input and output. The main difference between our own ASM subprogram and a HLL function is that when it doesn't
work as expected, we can easily trace down the mistake, stepping on each machine instruction in a debugger, and that there is no-one else to blame but us.
ASM subprograms can do the same job as statements of higher-level languages (HLL) or invocations of the operating system (OS) application programming interface (API). The
EuroAssembler macrolanguage makes it possible to prepare macros tailored to the problem in advance and use them to solve a task. Such macros are similar to functions from OS or HLL libraries, and
they allow programs to be developed in ASM almost as rapidly as in HLL.
The advantage of mastering assembly language shows when we are challenged with a third-party program whose source code is not available, or when some badly
written program throws an exception and exits. DrWatson, debuggers or disassemblers can only show the alien code converted to assembly instructions. People who have never met ASM
will hardly know how to interpret the disassembled code, while an ASM programmer will feel like a fish in its natural environment.
The main disadvantage of assemblers is the lack of standardized libraries, which unify programming in HLL such as C or Java. On the one hand, many ASM programmers build their own libraries, which makes
their sources non-portable unless the necessary libraries are shipped together with the source. On the other hand, making a library of our own functions is the best way to remember all the
function and parameter names, and to learn a lot about computers and operating systems.
The EuroAssembler package “euroasm.zip” contains several macrolibraries for a quick start and for inspiration.
Assembler is a universal construction kit. You may program whatever is possible to imagine, but first you have to prepare the building tools.
I always wondered why constant EQU symbols had to be declared before their first use, why I can't declare a macro in a macro, and how to solve situations when file A includes files B and C, and file C
also includes file B, duplicating its definitions.
I don't like a language which is cluttered up with free space. In HLASM a space in the operand list signalised that everything up to the end of the punched card should be ignored. €ASM isn't that
strict in this horror vacui, in fact white spaces may be put anywhere between language elements to improve readability. However, spaces are almost never required by syntax.
€ASM does not use English-word modifiers such as SHORT, NEAR, DWORD PTR, NOSPLIT, which are identified by their value only. Instead, it prefers the Name=Value paradigm with keyword
instruction modifiers such as DATA=QWORD,IMM=BYTE,MASK=K5,ZEROING=ON , which remove ambiguity and replace the ugly decorators proposed in the Intel documentation.
↑ Why EuroAssembler
1. Euro because it comes from Czechia, the heart of Europe.
2. Both Europe and €ASM are multilingual: €ASM supports national characters in identifiers and strings.
3. € is one of the few characters left unoccupied among many *ASM assemblers :-)
↑ Licence
Permission to use EuroAssembler is granted to everybody who obeys this Licence.
There are no restrictions on purpose and scope of applications created with this tool. It may be used in private, educational or commercial environments freely.
EuroAssembler is provided free of charge as-is, without any warranty guaranteed by its author.
This software may be redistributed in unmodified zipped form, as downloaded from EuroAssembler.eu. No fee may be requested for the right to use this software.
You may disseminate “euroasm.zip” on other websites, repositories, FTP archives, compact disks and similar media. Please be sure to always distribute the latest available €ASM version.
Source code of EuroAssembler was written by Pavel Šrubař, AKA vitsoft, and it is copyrighted as such.
Macrolibraries and sample projects are released as public domain and they may be modified freely.
I cannot recommend modifying the libraries, though, because they may be changed in future releases of €ASM and your enhancements would be overwritten. Create your own files with
vacant names instead.
You may modify the €ASM source code for the sole purpose of fixing a bug or enhancing it with a new function, but you may not distribute such modified software. It may only be used by
you on the same computer where it was edited, reassembled and linked.
EuroAssembler is not open source. I don't want to fork €ASM development into a bazaar of incompatible versions, where each branch provides different enhancements. Please propose your
modifications to the author or to the €ASM forum instead, so that they might be incorporated in future releases of EuroAssembler.
↑ Installation
The distribution file “euroasm.zip” contains folders and files as listed on the Sitemap page. The modification time of all files is equally set to the nominal release time. All file names
are in lower case (Linux convention) and in 8.3 size (DOS convention), so any old DOS utility can be used for unpacking.
Choose and create the EuroAssembler home directory, for instance “C:\euroasm” on Windows or “~/euroasm” on Linux, change to it and unzip the downloaded “euroasm.zip”. You
should get the directory structure as seen on the Sitemap.
If you are on Linux, move or copy the executable “euroasm.x” to some folder listed in the system environment variable $PATH, for instance with sudo mv euroasm.x /usr/local/bin/euroasm .
When it is run for the first time, for instance with sudo euroasm , it will try to create the configuration file “/etc/eurotool/euroasm.ini”.
If you are on Windows, move or copy the executable “euroasm.exe” with elevated rights to some folder listed in the system environment variable %PATH%, for instance with
copy euroasm.exe %windir% . When it is run for the first time, for instance with euroasm.exe , it will try to create the configuration file “%AppData%\eurotool\euroasm.ini”.
EuroAssembler should be able to run from everywhere with the command euroasm . You can tailor the global configuration file with a plain-text editor.
You may want to replace relative IncludePath= and LinkPath= in [EUROASM] section with an absolute path identifying the €ASM home directory.
In [PROGRAM] section you can specify your preferred target format, for instance Format=PE, Subsystem=CON and Width=32 . You could also replace IconFile="euroasm.ico" with nothing or with
your preferred personal icon, copied to “objlib” subfolder.
For the (not-recommended) bare-bone minimal installation you are now done and you could erase the whole home directory now. Yes, the executable “euroasm” itself does not
need any other supporting files, environment or registry modification.
If you prefer to read this documentation in another language, rename the default English version of this manual “eadoc\index.htm” to “eadoc\man_eng.htm” and then rename the
chosen translation, e.g. “eadoc\man_cze.htm”, to “eadoc\index.htm”.
For a development installation go to the home directory and unzip the developer scripts from the subarchive “generate.zip”. You will also need a webserver and PHP (version 5.3 or
higher) installed on your localhost.
Most EuroAssembler files are in HTML format, so you may want to incorporate €ASM into your local web server if you run one on your localhost computer.
In my Apache installation I added the following paragraph to the “httpd.conf” or “apache2.conf” :
<VirtualHost *:80>
DocumentRoot C:/euroasm/
ServerName euroasm.localhost
</VirtualHost>
I appended the statement 127.0.0.1 euroasm.localhost to the file “%SystemRoot%/SYSTEM32/drivers/etc/hosts”. Now I can write euroasm.localhost into the address line of my internet browser
and explore the €ASM documentation and other files locally.
↑ Input/Output
Standard streams ↓
Other I/O ↓
Messages ↓
Input/Output files ↓
Computer programs exchange information with users through various channels: standard streams, command-line parameters, environment variables, errorlevel value, disk files and
devices.
↑ Standard streams
The basic form of communication between programs and a human user is a stream of characters, which is by default directed to the console terminal from which the
program was launched. Streams may also be redirected to a disk file or device driver with the command-line operators > , >> , < , | .
Standard input is not used in €ASM.
Standard output prints warnings, errors and informative messages produced by €ASM.
Standard error output is not used in €ASM.
↑ Other I/O
Command-line parameters are not used. €ASM assumes that everything on the command line is the main source file name(s) intended to assemble. All options controlling the
assembly & link process are defined in the configuration files “euroasm.ini” or directly in the source file itself.
In fact there are semi-undocumented EUROASM options which are recognized on the command line; however, the preferred place for EUROASM options is the configuration file or the source file.
Command-line options are employed in test examples to suppress some variable informative messages, and their use should be kept to a minimum.
Environment variables are not used in €ASM.
Environment variables may be incorporated into the source at assembly-time using the pseudoinstruction %SETE. Of course, it is also possible to read environment variables at run-
time with the corresponding API call, such as GetEnvironmentVariable() .
€ASM does not use any other devices (I/O ports, printers, sound cards, graphic adapters, etc.) at assembly-time.
↑ Messages
Important information detected by EuroAssembler during its activity is published in the form of short text messages. They are written on standard output (console window) and to the
listing file.
Message severity ↓
Messages in standard output ↓
Messages in listing ↓
Each message is identified by a combination of a capital letter followed by four decimal digits. The complete text of messages is defined in source file msg.htm.
The letter prefix and the first digit (0..9) declare message severity. The final errorlevel value, which “euroasm.exe” terminates with, is equal to the highest message severity
encountered during the assembly session.
Message severity
Type of message          Prefix  Identifier range  Severity  Search marker
Informative              I       I0000..I0999      0         |#
Debugging                D       D1000..D1999      1         |#
Warning                  W       W2000..W3999      2..3      |##
Nonsuppressible warning  W       W4000..W4999      4         |##
User-defined error       U       U5000..U5999      5         |###
Error                    E       E6000..E8999      6..8      |###
Fatal                    F       F9000..F9999      9         |###
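The relation between identifiers, severity and the final errorlevel can be sketched in a few lines of Python. This is only an illustration of the rule stated above (severity is the first digit, errorlevel is the highest severity met), not code taken from €ASM; the function names severity and errorlevel are made up for this sketch.

```python
import re

# A message identifier is one capital letter followed by four decimal digits.
MESSAGE_ID = re.compile(r"^[IDWUEF]\d{4}$")

def severity(message_id: str) -> int:
    """Return the severity 0..9, i.e. the first digit of the identifier."""
    if not MESSAGE_ID.match(message_id):
        raise ValueError(f"not a message identifier: {message_id!r}")
    return int(message_id[1])

def errorlevel(message_ids) -> int:
    """Highest severity met in the session; 0 when there were no messages."""
    return max((severity(m) for m in message_ids), default=0)

# W2401 and E6601 are identifiers quoted elsewhere in this manual.
print(errorlevel(["W2401", "E6601"]))
```

A build script could compare the process exit code of euroasm against such a threshold, for instance treating anything above 4 as a failed build.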
EuroAssembler is verbose by default, but it may be totally silenced when launched with the parameter NOWARN=0000..0999 , provided that no error occurred in the source.
Warnings usually do not prevent the compiled target from execution; they are meant as a friendly reminder that the programmer might have forgotten something or made a
typo.
Messages with a severity level ranging from 5 to 8 indicate that some statements were not compiled due to an error. Although the target file may be valid, it will probably not work as
intended.
Fatal errors indicate an interaction failure with the operating system, resource exhaustion, file errors or internal €ASM errors. The target and listing files might not have been written at
all.
Informative, debugging and warning messages in the range I0000..W3999 can be suppressed with EUROASM option NOWARN=, but this ostrich-like policy is not a good idea. It's always better to
fix the root cause of the message. If you intend to publish your code, it should always assemble with an errorlevel 0.
↑ Messages on standard output
A typical message consists of its identifier followed by the actual tailored message text. When it is printed on standard output, the text is accompanied by a position indicator in the
form of a quoted file name followed by a physical line number in curly brackets, for instance
E6601 Symbol "UnknownSym" mentioned at "t1646.htm"{71} was not found. "t1646.htm"{71}
▲▲▲▲▲ ▲▲▲▲▲▲▲▲▲▲▲▲▲▲▲
Identifier position indicator
Usually there is just one position indicator per message, but when the error was discovered in the macro expansion phase, another indicator is added which determines the line in
the macro library. In case of a macro expanded in another macro, position indicators will be further chained.
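A tool that post-processes the standard output might split each message into identifier, text and position indicators. The following Python sketch assumes the layout shown in the example above (identifier, text, then one or more trailing "file"{line} indicators); the name parse_message and the order in which chained indicators are returned are assumptions of this sketch, not part of €ASM.

```python
import re

# Identifier at the start of the line, followed by the rest of the message.
HEAD = re.compile(r'^([IDWUEF]\d{4})\s+(.*)$', re.S)
# One or more position indicators at the very end: "file"{line} "file"{line} ...
TRAILING = re.compile(r'(?:\s*"[^"]*"\{\d+\})+\s*$')
ONE = re.compile(r'"([^"]*)"\{(\d+)\}')

def parse_message(line: str):
    """Return (identifier, text, [(file, line), ...]) or None for other lines."""
    head = HEAD.match(line.strip())
    if head is None:
        return None
    ident, rest = head.groups()
    positions = []
    tail = TRAILING.search(rest)
    if tail:
        positions = [(name, int(num)) for name, num in ONE.findall(tail.group(0))]
        rest = rest[:tail.start()]
    return ident, rest.strip(), positions

sample = ('E6601 Symbol "UnknownSym" mentioned at "t1646.htm"{71} '
          'was not found. "t1646.htm"{71}')
print(parse_message(sample))
```

Only the trailing chain is treated as position indicators, so a "file"{line} reference inside the message text itself stays part of the text.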
↑ Messages in listing
The messages printed to the listing file have a slightly different format. The position indicator is omitted, because they are inserted just below the source line which triggered the error:
|002B: | MOV SI,UnknownSym: ; E6601 expected.
|### E6601 Symbol "UnknownSym" mentioned at "t1646.htm"{71} was not found.
▲▲▲▲
marker
The message text is prefixed with a search marker which helps to find messages in listing.
So you can use the internal function Find/FindNext (Ctrl-F) of the editor or viewer used to examine the listing file.
As a matter of fact, €ASM syntax never uses multiple pound characters ## , so the search marker is unique in the listing and it helps to skip from one error or warning to the next.
You could also try the specialized €ASM listing viewer distributed as one of the sample projects.
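The same search can be scripted. The Python sketch below collects messages from a listing text by their search marker; it assumes that a message line starts with the pipe, one to three pound characters, a space and the identifier, as in the listing fragments shown in this manual. The function name listing_messages and its min_pounds parameter are hypothetical.

```python
import re

# Search marker: pipe, 1..3 pound characters, space, identifier, space, text.
MARKER = re.compile(r"^\|(#{1,3}) ([IDWUEF]\d{4}) (.*)$")

def listing_messages(listing_text: str, min_pounds: int = 2):
    """Return (line number, identifier, text) for every marked message line.

    min_pounds=2 keeps warnings and errors, min_pounds=3 keeps errors only.
    """
    found = []
    for number, line in enumerate(listing_text.splitlines(), start=1):
        match = MARKER.match(line)
        if match and len(match.group(1)) >= min_pounds:
            found.append((number, match.group(2), match.group(3)))
    return found

listing = (
    '|002B:            | MOV SI,UnknownSym: ; E6601 expected.\n'
    '|### E6601 Symbol "UnknownSym" mentioned at "t1646.htm"{71} was not found.\n'
    '|00000010:E97D000000 | JMP $+0x82, DIST=SHORT\n'
    '|## W2401 Modifier "DIST=SHORT" could not be obeyed in this instruction.\n'
)
for item in listing_messages(listing):
    print(item)
```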
Debugging messages D1??? produced by the pseudoinstruction %DISPLAY are published even when they are placed in false %IF branches or in blocks commented-out by
%COMMENT..%ENDCOMMENT.
The listing file is created only during the final assembly pass, and informative messages are not printed to listing at all, except for informative linker messages in the I056? range.
↑ Input/Output files
Configuration file ↓
Source file ↓
Object file ↓
Listing file ↓
File path ↓
There are two kinds of input files which €ASM reads: configuration and source.
There are two kinds of output files which €ASM writes: object and listing.
If the output file already exists, €ASM will overwrite it without warning.
Configuration file
The configuration file, which has the fixed (predetermined) name “euroasm.ini”, specifies default options for the assembler. €ASM queries two configuration files with identical
name and structure: global and local.
The global configuration file in the Windows version is located at “%AppData%\eurotool\euroasm.ini” and it is processed once, after €ASM has started. If the file does not exist,
€ASM tries to create it with the factory-default contents.
Similarly, the global configuration file in the Linux version is located at “/etc/eurotool/euroasm.ini” and it is processed once, after €ASM has started. If the file does not exist,
€ASM tries to create it with the factory-default contents.
The local configuration file is searched for in the same directory as the actual source file. If more than one source is specified on the command line, the local configuration file is read
each time a source gets processed.
The local “euroasm.ini” is not automatically created by €ASM. You may need to copy or clone the global file manually, and optionally erase unchanged or unused options from the local
configuration file for better performance.
Example of command line which assembles two sources:
euroasm Source1.asm D:\Temp\Source2.asm
EuroAssembler will try to read its configuration from three files: C:\Users\login\AppData\Roaming\eurotool\euroasm.ini , .\euroasm.ini , D:\Temp\euroasm.ini .
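The list of consulted files follows mechanically from the source paths. Here is a small Python sketch of that derivation for the Windows example above; config_files is a made-up helper, and the global path is simply passed in rather than resolved from %AppData%.

```python
import ntpath  # Windows path rules, usable on any platform

def config_files(global_ini: str, sources):
    """Global file first, then one local euroasm.ini per source, in order."""
    files = [global_ini]
    for source in sources:
        # A source given without a path lives in the current directory.
        directory = ntpath.dirname(source) or "."
        files.append(ntpath.join(directory, "euroasm.ini"))
    return files

consulted = config_files(
    r"C:\Users\login\AppData\Roaming\eurotool\euroasm.ini",
    ["Source1.asm", r"D:\Temp\Source2.asm"],
)
for name in consulted:
    print(name)
```

Two sources from the same directory yield the same local file twice, which matches the statement that the local configuration is read each time a source gets processed.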
The initial contents of the configuration file, which are built into “euroasm.exe” and “euroasm.x” as factory defaults, are defined in objlib/euroasm.ini. There are two sections in the file:
[EUROASM] and [PROGRAM] .
The former specifies parameters for €ASM itself, such as CPU generation, what information should go to the listing file, which warnings should be suppressed etc. The parameters
from [EUROASM] section of the configuration file can be redefined later in the source with the EUROASM pseudoinstruction, where you will find detailed explanation for each one of
the parameters.
[PROGRAM] section of configuration file specifies the default working parameters of program which is to be created by €ASM, for instance the memory model, format and name of
the object file etc. These parameters can be modified further with the PROGRAM pseudoinstruction.
The configuration parameters order is not important. Names of the parameters are case insensitive. The parameters with a boolean value accept any of the predefined enumerated
tokens such as ON, YES, TRUE, ENABLE, ENABLED as true and OFF, NO, FALSE, DISABLE, DISABLED as false. They may also accept numeric expressions which are evaluated as
boolean.
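A reader of “euroasm.ini” written outside €ASM could interpret boolean values along these lines. The Python sketch below uses exactly the tokens enumerated above. Two points are assumptions of the sketch and not confirmed by this manual: that the value tokens are matched case-insensitively, and that a numeric value counts as true when it is nonzero. Only plain decimal integers are handled here, not full €ASM numeric expressions.

```python
TRUE_TOKENS = {"ON", "YES", "TRUE", "ENABLE", "ENABLED"}
FALSE_TOKENS = {"OFF", "NO", "FALSE", "DISABLE", "DISABLED"}

def parse_boolean(value: str) -> bool:
    """Interpret a boolean option value such as the YES in AUTOALIGN=YES."""
    token = value.strip().upper()  # assumption: tokens are case insensitive
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    try:
        # Assumption: a plain decimal number is true when it is nonzero.
        return int(token) != 0
    except ValueError:
        raise ValueError(f"not a boolean value: {value!r}") from None

print(parse_boolean("YES"), parse_boolean("Disabled"), parse_boolean("0"))
```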
When you give away the source code of your programs written in EuroAssembler, you don't have to specify which command-line parameters were used to compile and link, because they can be declared in
the source itself. A typical €ASM source program begins with a configuration pseudoinstruction, such as EUROASM AUTOALIGN=YES,CPU=PENTIUM , so it is easy to tell in which assembler the
program was written.
As a developer of a program written in EuroAssembler, you shouldn't rely on users of your distributed source having the same contents of “euroasm.ini” as you have. Specify all important settings
at the beginning of the published source. A local configuration file is convenient during the development phase, when sources in the same directory do not have to explicitly specify all EUROASM and
PROGRAM parameters.
The EuroAssembler options and directives can be defined in the configuration files and in the source files (by the pseudoinstruction EUROASM). They have the following order of
precedence in their processing:
1. When euroasm.exe starts, its options are already defined with built-in factory defaults.
2. €ASM looks at the command line; if some EUROASM keyword options are detected there, they override the options currently in charge (factory defaults).
3. €ASM looks for the global configuration file and reapplies its options.
4. The command-line options are reapplied again (step 2 is repeated).
5. Then €ASM looks for source filename(s) at the command-line, and if a local configuration file exists in the same directory, it is processed and applied to the current
configuration derived from the previous steps.
6. Source file is now assembled. For each pseudoinstruction EUROASM found in the source that definition overwrites current working options.
7. If another source file is provided at the command-line level in the same assembly session, €ASM restores configuration which was saved at the end of step 4 and then
continues from step 5.
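The order of precedence in the steps above can be modelled as successive dictionary updates. The following Python sketch is only a model of the described layering, not €ASM code; the function names are hypothetical and the option values in the example are invented for illustration, they are not the real factory defaults.

```python
def session_options(factory, command_line, global_ini):
    """Steps 1 to 4: the result is saved once per assembly session."""
    options = dict(factory)        # step 1: built-in factory defaults
    options.update(command_line)   # step 2: command-line EUROASM options
    options.update(global_ini)     # step 3: global euroasm.ini
    options.update(command_line)   # step 4: command line is reapplied
    return options

def source_options(saved, local_ini, euroasm_statements):
    """Steps 5 to 7: every source starts again from the saved configuration."""
    options = dict(saved)          # step 7: restore the state saved after step 4
    options.update(local_ini)      # step 5: local euroasm.ini, if any
    for statement in euroasm_statements:
        options.update(statement)  # step 6: EUROASM pseudoinstructions in source
    return options

# Invented values, for illustration only.
saved = session_options(
    factory={"AUTOALIGN": "OFF", "DUMPWIDTH": "18"},
    command_line={"DUMPWIDTH": "30"},
    global_ini={"AUTOALIGN": "ON", "DUMPWIDTH": "24"},
)
first = source_options(saved, {"AUTOALIGN": "OFF"}, [{"DUMPWIDTH": "40"}])
second = source_options(saved, {}, [])
print(saved, first, second)
```

Copying the saved dictionary for each source keeps one source's local settings from leaking into the next, which is the effect step 7 describes.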
↑ Source file
The source file contains the instructions to be assembled. Usually it is a plain-text file or an HTML file arranged for €ASM. The file name is provided as a command-line
parameter of the command euroasm . The source file may be identified with an absolute path in the filesystem, e.g. euroasm /user/home/euroasm/MyProject/MySource.asm , or with a
relative or omitted path, which is resolved against the current directory of the shell.
The structure and syntax of source text, which €ASM is able to assemble and link, is described further down in this document.
↑ Object file
The main purpose of programming is to obtain the target file from the source code. The target file may be an object module or a library linkable to other files, or a binary file for
special purposes, or an executable file.
The format of the output file is specified by the PROGRAM parameter FORMAT=. Their layouts were standardized by their creators many, many years ago. For more details about
supported output formats see the chapter Program formats.
The final name of the target file is determined by the label used in the previously described pseudoinstruction PROGRAM, and it is appended with its default extension depending on
program format. The target name is not necessarily derived from the source filename, as in many other assemblers. For instance, if the source code file has statement
Hello PROGRAM FORMAT=COM , its output file will be created in the current directory with the name “Hello.com” , no matter what the source file is named. The default target name can
be changed by the PROGRAM parameter OUTFILE=. If the OUTFILE= name is specified with relative or omitted path, current shell directory is assumed.
↑ Listing file
Dump parameters ↓
Dump separators ↓
Dump decoration ↓
List parameters ↓
A listing file is a plain text file with two columns where EuroAssembler logs its activity:
1. The result of assembly of each statement is hexadecimally displayed in the dump column .
2. Statements, which were processed in the previous step, are copied to the source column .
The name of the listing file is derived from the name of the source file by appending the extension .lst , and the listing is created in the directory of the source file.
The default listing filename and location might be changed with the EUROASM parameter LISTFILE=.
↑ Dump parameters
Let's create the source file “Hello.asm” with the following contents:
EUROASM DUMP=ON,DUMPWIDTH=18,DUMPALL=YES
Hello PROGRAM FORMAT=COM,LISTLITERALS=ON, \
LISTMAP=OFF,LISTGLOBALS=OFF
MOV DX,=B"Hello, world!$"
MOV AH,9
INT 21h
RET
ENDPROGRAM Hello
Submitting the file to EuroAssembler with the command euroasm Hello.asm will create the listing file “Hello.asm.lst” .
The width of the dump column expressed in characters can be specified with the EUROASM option DUMPWIDTH=. Other EUROASM options which control the dump column are
the boolean DUMPALL= and DUMP=OFF, which can suppress the dump column completely.
|<-Dump column-->|<--Source column--------
<--DumpWidth=18-->
| | EUROASM DUMP=ON,DUMPWIDTH=18,DUMPALL=YES
| |Hello PROGRAM FORMAT=COM,LISTLITERALS=ON, \
| | LISTMAP=OFF,LISTGLOBALS=OFF
|[COM] ::::Section changed.
|0100:BA[0801] | MOV DX,=B"Hello, world!$"
|0103:B409 | MOV AH,9
|0105:CD21 | INT 21h
|0107:C3 | RET
|[@LT1] ====ListLiterals in section [@LT1].
|0108:48656C6C6F =B"Hello, world!$"
|010D:2C20776F72 ----Dumping all. (because of DUMPALL=YES)
|0112:6C64212400 ----Dumping all.
| | ENDPROGRAM Hello
▲
column separator
↑ Dump separators
The dump column on the left side always starts with the machine comment indicator (pipe character | ) and it is terminated with a listing column separator, which determines the
origin of this line.
Listing column separators
Character Function
| (pipe) Termination of a machine comment. Used in ordinary statements, which can be reused as €ASM source.
! (exclamation) Copy of the source line with expanded preprocessing %variables (when LISTVAR=ENABLED is used).
+ (plus) Source line generated in %FOR,%WHILE,%REPEAT expansion (when LISTREPEAT=ENABLED is used).
+ (plus) Source line generated in %MACRO expansion (when LISTMACRO=ENABLED is used).
: (colon) Inserted listing line to display a changed [section].
. (fullstop) Inserted listing line to display an autoalignment stuff (when AUTOALIGN=ENABLED is used).
- (minus) Inserted listing line to display the whole dump (when DUMPALL=ENABLED is used).
= (equal) Inserted listing line to display data literals (when LISTLITERALS=ENABLED is used).
(space) Inserted envelope PROGRAM / ENDPROGRAM line.
* (asterisk) Inserted listing line in INCLUDE* statement when filename wildcards are resolved.
As a side effect when the column separator is not | , the whole listing line has the form of a machine remark and it is ignored if the listing is submitted again as a program source.
↑ Dump decoration
The dump of an emitting statement starts with its hexadecimal address (offset in the current working section), terminated with a colon : . In a 16-bit section the offset is 16 bits wide (four
hexadecimal digits), in 32-bit and 64-bit sections it is 32 bits wide. Then the emitted bytes follow. The data contents in the dump column are always in hexadecimal notation without
an explicit number modifier. If the chosen DUMPWIDTH= is too small for all emitted bytes to fit, they are either right-trimmed and replaced with a tilde ~ (if DUMPALL=OFF ), or
additional lines with the separator - are inserted into the listing ( DUMPALL=ON ).
Some other decorators are used in the dumped bytes:
The character < followed by one decimal digit ( N ) signals that the previously dumped byte is an 8-bit displacement which will be left-shifted by N bits at run-time to obtain the
effective displacement (the so-called disp8*N compression). The digit 1..6 specifying the scaling factor N is not emitted to the assembled code.
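The disp8*N decoration can be illustrated with a small Python model. This is only a sketch of the arithmetic described above, not €ASM code; the function name is hypothetical, and treating the dumped byte as a signed two's complement value is an assumption based on how 8-bit displacements are normally interpreted.

```python
def effective_displacement(disp8: int, n: int) -> int:
    """Return the run-time displacement for a dumped byte decorated with <N."""
    if not 0 <= disp8 <= 0xFF:
        raise ValueError("disp8 must fit in one byte")
    if not 1 <= n <= 6:
        raise ValueError("scaling digit N must be in 1..6")
    # Assumption: the byte is a signed 8-bit value (two's complement).
    signed = disp8 - 0x100 if disp8 >= 0x80 else disp8
    # Left shift by N bits, which multiplies the value by 2**N.
    return signed << n
```

For instance, a dumped 02<4 would stand for a displacement of 2 shifted left by 4 bits, that is 32.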
The dump of non-emitting statements is either empty or contains auxiliary information.
↑ List parameters
A listing produced with the default (factory) configuration is more or less an exact copy of the source (except for the inserted dump column). Sometimes it is useful to check if the
high-level constructs worked as expected, and this is controlled by the following boolean EUROASM options:
LISTINCLUDE=ON unrolls the contents of the included file, which is normally hidden from the main source.
LISTVAR=ON creates a copy of each statement which contains a preprocessing %variable, and replaces the %variable name with its expanded value in the copied line.
LISTMACRO=ON inserts statements expanded by the macroinstruction.
LISTREPEAT=ON inserts all iterations of the repeating constructs %FOR..%ENDFOR, %WHILE..%ENDWHILE, %REPEAT..%ENDREPEAT . A repeated expansion is listed commented-out
by the dump column separator + . In the default state (defined by LISTREPEAT=DISABLED ) only the first expansion is listed.
A very useful trait by design of an EuroAssembler listing is to keep the generated listing re-usable as source code again, in the following assembly session. The messages generated in the listing
are ignored by the €ASM parser, so they need not be removed when we want to submit the listing file to a reassembly (nevertheless, those messages will be generated again if the cause of error
was not fixed).
I wanted to sustain this philosophy regardless of the LIST* parameters. In the default state with LISTINCLUDE=OFF the statement INCLUDE is normally listed and the contents of included file is
hidden. With option LISTINCLUDE=ON it is reversed: the original INCLUDE statement is commented out by dump column separator * but the included lines are inserted into the listing and they
become valid source statements. See also t2220.
When the options LISTVAR, LISTMACRO, LISTREPEAT are enabled, the original line is kept as is and the expanded lines are inserted below it, commented-out by dump column separator ! or + . See
also t2230.
The EUROASM option LIST=DISABLE will switch off the generating of listing lines until enabled again, or until the end of source, whichever comes first, and of course such listing will
be no longer reusable as source code.
↑ File path
Disk files can be specified by their absolute path, i. e. with a path which begins at filesystem root, e.g. C:\ProgFiles\euroasm.exe D:\Project\source.asm . Such files are
unequivocally defined.
Files may be also specified with a relative path, e. g. euroasm ..\prowin32\skeleton.asm . These relative paths are always related to the current working directory.
Files can also be specified without a path, i. e. when their name contains no colon and no slash : , \ , / . The location of such files is reviewed in the table below:
Directory used when a file is specified without a path
Direction File Directory See also
Executable “euroasm.exe” Exe-directory OS PATH
Input Global “euroasm.ini” See installation instruction
Output Global “euroasm.ini” See installation instruction
Input Local “euroasm.ini” Source directory
Input Source file Current directory
Input Included source file Include directory EUROASM INCLUDEPATH=
Output Target object file Current directory PROGRAM OUTFILE=
Output Listing file Source directory EUROASM LISTFILE=
Input Linked module file Link directory EUROASM LINKPATH=
Input Linked stub file Link directory PROGRAM STUBFILE=
Input Linked icon file Link directory PROGRAM ICONFILE=
Import Dynamically imported function OS-dependent IMPORT LIB=
The current directory is the actual folder assigned to the shell process at the moment when “euroasm.exe” was launched. It's never changed by €ASM.
The exe-directory is the folder where “euroasm.exe” was found and executed, usually it is one of the directories specified by the environment variable PATH.
The source directory is the folder where the currently assembled source file lies.
The include directory is one of the directories specified by the option EUROASM INCLUDEPATH= .
The link directory is one of the directories specified by the option EUROASM LINKPATH= .
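The rule for deciding whether a file was specified without a path, together with the unambiguous rows of the table above, can be sketched in Python. This is an illustrative model only; the function name, the dictionary and its keys are hypothetical and are not part of €ASM.

```python
def has_path(filename: str) -> bool:
    """A file name carries a path when it contains a colon or a slash."""
    return any(separator in filename for separator in (":", "\\", "/"))

# Directory searched for a pathless file, copied from the table above.
# Only rows whose directory is stated explicitly are included.
DEFAULT_DIRECTORY = {
    "source file": "current directory",
    "included source file": "include directory",
    "target object file": "current directory",
    "listing file": "source directory",
    "linked module file": "link directory",
    "linked stub file": "link directory",
    "linked icon file": "link directory",
}

def lookup_directory(role: str, filename: str) -> str:
    """Tell where a file of the given role would be looked for."""
    if has_path(filename):
        return "as specified by the path"
    return DEFAULT_DIRECTORY[role]
```

A name such as source.asm falls under the table, while ..\prowin32\skeleton.asm or D:\Project\source.asm is taken as written.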
↑ Character structure
Character width ↓
Character encoding ↓
Character case ↓
Character classification ↓
↑ Character width
Source file is a sequence of characters with 8-bit width or with a variable width 8..32 bits (in UTF-8 encoding).
This is particularly important: if the source file is written in an editor that uses WIDE (16-bit) character encoding (UTF-16), it should be saved as plain text in UTF-8 or in an 8-bit
ANSI or OEM codepage before submitting the file for assembly.
↑ Character encoding
A program written in €ASM may need to display messages and texts in other languages than English. Therefore, a string which defines the output text will contain characters with
their codepoint value above 127 (codepoint is an ordinal number of the character in the [Unicode] chart).
Many European languages are satisfied with a limited set of 256 characters. Historically the relation between their codes and the corresponding glyphs is called a code page .
Be aware that MS-Windows uses different code pages in console applications (OEM) and in GUI applications (ANSI) and it makes automatic conversion between them in some
circumstances. €ASM itself never changes the code page of the source.
A programmer, who needs to mix several human-languages in MS-Windows application, may need to use 16-bit WIDE characters instead of 8-bit ANSI in text strings at run-time. See
cpmix32 as a demo example. The wide (UTF-16) strings are declared with pseudoinstruction DU (Define data in Unichars) instead of DB (Define data in Bytes) pseudoinstruction.
The wide variant of WinAPI call must be used for a visual representation of Unichar strings at run-time, e. g. TextOutW() instead of TextOutA() . However, the in-source definition of
characters in DU statement is still 8-bit. You should tell €ASM which code page was used for writing the DU statement in the source file. This information is provided by the
EUROASM CODEPAGE= option. The codepage may change dynamically in the source, thus allowing mixing of different human-languages in one program.
The texts in your program which aim to run inside the console (using the WinAPI function WriteConsoleA() or macroinstruction StdOutput) should be written in the OEM code page.
You may want to use a DOS plain-text editor, such as “EDIT.COM” for writing console programs. As text mode editors use console fonts which are in OEM code page, the text is
displayed correctly both in editor at write-time and in the console of your program at run-time.
Conversely, text which is presented in GUI windows (using the WinAPI function TextOutA() ) should be written in the ANSI code page, using a windowed editor such as
“Notepad.exe” .
The default is EUROASM CODEPAGE=UTF-8 , where characters are encoded with a variable length of one to four bytes. Thanks to the clever [UTF8] design, all non-ASCII UTF-8
characters are encoded as consecutive bytes with values in the 128..255 range, which are treated as letters in €ASM, so any UTF-8 defined character can be used in identifiers
as is.
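The property of UTF-8 mentioned above can be checked with a few lines of Python. This is a small demonstration rather than anything taken from €ASM; the function name is hypothetical.

```python
def utf8_bytes(character: str) -> list[int]:
    """Return the byte values which encode one character in UTF-8."""
    return list(character.encode("utf-8"))

def encodes_to_letter_bytes(character: str) -> bool:
    """True when every encoded byte lies in the 128..255 range."""
    return all(128 <= byte <= 255 for byte in utf8_bytes(character))

# The Euro sign occupies three bytes, all of them above 127.
euro = utf8_bytes("\u20ac")   # [0xE2, 0x82, 0xAC]
```

An ASCII character such as A is encoded as the single byte 65 and so falls outside this range, which is why only non-ASCII characters are concerned.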
The recommended encoding of the EuroAssembler source files is UTF-8.
Unlike the 8-bit ANSI or OEM encodings, which limit the repertoire to 256 glyphs, CODEPAGE=UTF8 allows the mixing of arbitrary character codepoints defined in [Unicode] ,
including non-European alphabets. MS-Windows API does not, by design, directly support UTF-8 strings, and they need run-time reencoding to UTF-16 which is used by the WIDE
variant of the WinAPI functions, such as TextOutW(). This reencoding can be performed by WinAPI MultiByteToWideChar() or by macro DecodeUTF8. Exotic characters will be
displayed correctly only if the used font supports their glyphs, of course.
An example of a freeware text editor that supports UTF-8 encoding is [PSPad] .
Some UTF-8 text editors insert Byte Order Mark characters 0xEF, 0xBB, 0xBF at the start of source file. EuroAssembler ignores those three characters.
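A tool which preprocesses €ASM sources could skip the Byte Order Mark in the same way. The following Python sketch shows the idea under that assumption; the names are hypothetical and it does not describe how EuroAssembler itself is implemented.

```python
UTF8_BOM = b"\xEF\xBB\xBF"

def strip_bom(source: bytes) -> bytes:
    """Drop the three BOM bytes when they open the source, else return it unchanged."""
    if source.startswith(UTF8_BOM):
        return source[len(UTF8_BOM):]
    return source
```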
↑ Character case
€ASM is a case semi-sensitive assembler.
All identifiers created by you, the programmer, are case sensitive: labels, constants, user-defined %variables, structures, macro names. On the other hand, all built-in names are
case insensitive. Case insensitivity concerns all enumerations: register names, machine instructions and prefixes, built-in data types, number modifiers, pseudoinstruction names
and parameters, symbol attributes, system %^variables.
The case insensitive names are presented in UPPER CASE in this manual but they may be used in lower or mixed case as well.
↑ Character classification
Each byte (8 bits) in €ASM source is treated as a character . Many characters have special purpose in assembler syntax unless they are quoted inside double or single quotes. A
character is unquoted if zero or an even number of quotes appears between the start of the line and the character itself.
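The parity rule for unquoted characters can be written down literally in Python. This is a simplified sketch: the function name is hypothetical, and counting single and double quotes together is an assumption, since the text above does not say how a quote of one kind inside a string delimited by the other kind is handled.

```python
def is_unquoted(line: str, index: int) -> bool:
    """True when zero or an even number of quotes precedes line[index]."""
    quotes_before = sum(1 for character in line[:index] if character in "\"'")
    return quotes_before % 2 == 0

line = 'DB "a;b" ; remark'
first_semicolon = line.index(";")    # inside the string, so it is quoted
second_semicolon = line.rindex(";")  # after the closing quote, so it is unquoted
```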
EOL
End-of-line control character is Line Feed alias EOL (ASCII 10).
White spaces
All other control characters, Delete and Space are considered white spaces . White spaces are mainly used as separators which can improve readability but only seldom
have some syntactic significance. Unquoted multiple white spaces are treated the same way as a single one.
Digits
Digits 0..9 create numbers and identifiers. Hexadecimal numbers may also contain hexadecimal digits A..F, a..f .
Letters
Letters in €ASM are defined as a..z, A..Z , underscore _ , at sign @ , dollar sign $ , grave accent ` , question mark ? and all characters from the upper half of ASCII table
(128..255).
Some of them are employed in €ASM for special purposes, too:
Underscore _ is used in identifiers and numbers as a word separator instead of space.
A leading at-sign @ indicates a literal section name.
The dollar sign $ alone is used as an identifier that specifies a dynamic symbol representing the current offset in a section.
The grave ` is used as a prefix when some filename not starting with a letter should represent a valid identifier.
Punctuation
All punctuation and other characters have special semantic meaning – operators, delimiters, modifiers etc. – unless they are enclosed in a pair of single ' or double "
quotes. Punctuation characters except for the percent sign % and EOL are treated as ordinary letters when they are placed inside a quoted string.
ASCII glyph name function in €ASM
123 { left curly bracket sublist operator
124 | vertical bar (pipe) logical operator, comment separator
125 } right curly bracket sublist operator
126 ~ tilde logical operator, shortcut indicator
127 delete white space
128..255 NonASCII characters letter
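The classification described in this chapter can be summarized by a Python function which maps a byte value to its class. It is a sketch built only from the definitions above; the function name and the returned class names are hypothetical, and the special roles of individual punctuation characters are not modelled.

```python
def classify(byte: int) -> str:
    """Return the character class of one source byte (0..255)."""
    if not 0 <= byte <= 255:
        raise ValueError("a source character is a single byte")
    if byte == 10:
        return "EOL"                      # Line Feed
    if byte < 32 or byte in (32, 127):
        return "white space"              # other controls, Space, Delete
    if 48 <= byte <= 57:
        return "digit"                    # 0..9
    if 65 <= byte <= 90 or 97 <= byte <= 122:
        return "letter"                   # A..Z, a..z
    if chr(byte) in "_@$`?":
        return "letter"                   # special characters counted as letters
    if byte >= 128:
        return "letter"                   # upper half of the table
    return "punctuation"
```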
↑ Horizontal structure
Physical line ↓
Statement ↓
Machine remark field ↓
Label field ↓
Prefix field ↓
Operation field ↓
Operand field ↓
Line remark field ↓
Line continuation ↓
An assembler source is treated as a text consisting of lines which are processed from left to right, from top to bottom.
↑ Physical line
A source file consists of physical lines . A physical line is a sequence of characters terminated with a line feed (ASCII 10). The line feed (EOL) character is part of the physical line,
too.
The EOL may be omitted in the last physical line of source file.
↑ Statement
A statement is an order for €ASM to perform some action at assembly-time, that is usually to emit some code to the object file or to change its internal state. A typical statement is
equivalent to a physical line but long statements might span several lines when line continuation is used.
A statement consists of several fields which are recognized by their position in the line, by the separator or by their contents. All fields are facultative (optional), any of them may be
omitted. However, no operand can be used when the operation field is omitted.
Fields in the statement
Order Field name Termination
1. Machine remark | or EOL
2. Label : or white space
3. Prefix : or white space
4. Operation white space
5. Operand ,
6. Line comment EOL
Example of a statement:
1. A structure or a symbol name or a block identifier, for example My1stStructure , My1stLabel: , Outer
2. The name of a segment, section or group, for example [.data]
3. The name of a symbolic %variable which is being set, for example %Count
4. The colon itself : , as it is explicitly telling €ASM that an empty label is used, so the following field must be a prefix or an operation.
In the first case the symbolic name may begin with a period (point) . , making the label local . The symbol in the label field may be optionally terminated with one or more colons :
immediately following the identifier. The white space between the label field and the next field may be omitted when the colon is used.
↑ Prefix field
The machine prefix is an order for CPU to change its internal state at run-time. It is similar to a machine instruction code but it only modifies the following instruction at run-time.
Each prefix assembles to a single byte machine opcode.
Prefix table
Name Group Opcode
LOCK 1 0xF0
REP 1 0xF3
REPE 1 0xF3
REPZ 1 0xF3
REPNE 1 0xF2
REPNZ 1 0xF2
XACQUIRE 1 0xF2
XRELEASE 1 0xF3
SEGCS 2 0x2E
SEGSS 2 0x36
SEGDS 2 0x3E
SEGES 2 0x26
SEGFS 2 0x64
SEGGS 2 0x65
SELDOM 2 0x2E
OFTEN 2 0x3E
OTOGGLE 3 0x66
ATOGGLE 4 0x67
The last four mnemonic names are not known in other assemblers.
The SELDOM and OFTEN may be used in front of conditional jump instructions as hints for newer CPUs to help with predictions of the jump target.
The OTOGGLE and ATOGGLE switch between 16-bit and 32-bit width of operand and address portion of machine code. They are normally generated by the assembler internally
whenever needed, without an explicit request.
Up to four prefixes can be defined in one statement but not more than one prefix from the same group.
The names of the prefixes are case insensitive and reserved; they cannot be used as labels, regardless of character case. A prefix name may be terminated with colon(s) : (same as symbols).
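The prefix rules (at most four prefixes, at most one from each group, case insensitive names with optional trailing colons) can be modelled in Python as follows. The table is copied from above; the function name is hypothetical and raising an exception is just a convenience of this sketch, it says nothing about how €ASM reports such a statement.

```python
# name: (group, opcode), copied from the prefix table above
PREFIXES = {
    "LOCK": (1, 0xF0), "REP": (1, 0xF3), "REPE": (1, 0xF3), "REPZ": (1, 0xF3),
    "REPNE": (1, 0xF2), "REPNZ": (1, 0xF2), "XACQUIRE": (1, 0xF2),
    "XRELEASE": (1, 0xF3), "SEGCS": (2, 0x2E), "SEGSS": (2, 0x36),
    "SEGDS": (2, 0x3E), "SEGES": (2, 0x26), "SEGFS": (2, 0x64),
    "SEGGS": (2, 0x65), "SELDOM": (2, 0x2E), "OFTEN": (2, 0x3E),
    "OTOGGLE": (3, 0x66), "ATOGGLE": (4, 0x67),
}

def encode_prefixes(names: list[str]) -> bytes:
    """Return the prefix bytes in the order in which the names were given."""
    if len(names) > 4:
        raise ValueError("more than four prefixes in one statement")
    used_groups = set()
    encoded = bytearray()
    for name in names:
        key = name.rstrip(":").upper()    # colons are optional, case is ignored
        if key not in PREFIXES:
            raise ValueError(f"unknown prefix {name}")
        group, opcode = PREFIXES[key]
        if group in used_groups:
            raise ValueError(f"second prefix from group {group}: {name}")
        used_groups.add(group)
        encoded.append(opcode)
    return bytes(encoded)
```

With this model, lock: followed by SEGES yields the bytes F0 26, whereas REP together with REPNE is rejected because both belong to group 1.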
AMD and Intel 64-bit architecture introduced special prefixes REX , XOP , VEX , MVEX , EVEX . €ASM treats them as part of operation encoding and does not provide mnemonic for
their direct declaration.
[AMDSSE5] introduced another instruction prefix DREX , but DREX-encoded instructions are not supported by €ASM as they never made it to the production, as far as I know.
The segment-override prefixes SEG*S can be alternatively requested as a component of memory-variable register expression. In this case they are emitted only when they are not
redundant (when they specify a non-default segment). Explicitly specified prefixes are emitted always, in the order as they appeared in the statement.
EuroAssembler warns when a prefix is used in contradiction with the CPU specification. The warning can be avoided by placing the prefix in a separate statement.
|0000:F091 |LOCK: XCHG AX,CX ; Prefix Lock should not be used with register operands.
|## W2356 Prefix LOCK: is not expected in this instruction.
|0002:F0 |LOCK: ; This can be outperformed when the prefix is separated in extra statement,
|0003:91 | XCHG AX,CX ; for instance to investigate CPU behaviour in such situation.
|0004: |
|0004:6691 | XCHG EAX,ECX ; Operand-size prefix 0x66 is emitted internally (in 16-bit segment).
|0006:6691 |OTOGGLE: XCHG EAX,ECX ; Its explicit specification has no effect,
|0008:6691 |OTOGGLE: XCHG AX,CX ; but here it overrides the registers sizes from 16 to 32 bits.
↑ Operation field
The operation field is the most important field of an assembler statement; it tells €ASM what to do: declare something, change its internal state or emit something to the object file. It
often gives its name to the whole statement, we may say an EXTERN operation instead of a statement with EXTERN pseudoinstruction in the operation field.
€ASM recognizes three types (genders) of operation:
Machine instructions, whose mnemonic names are defined by CPU manufacturers (case insensitive),
Pseudoinstructions, which are specified by €ASM syntax (also case insensitive),
Macroinstructions, which are written by the user of €ASM (their names are case sensitive).
Some statements tell €ASM to generate assembled code or data to the object file; they are called emitting instructions:
prefixes,
machine instructions,
pseudoinstruction D and its clones,
pseudoinstruction ALIGN.
↑ Operand field
Ordinal operand ↓
Keyword operand ↓
Mixing operands ↓
The operands specify the data which the operation works with. The number of operands in a statement is not limited and it depends on the operation. The operand can be a
register name, number, expression, identifier, string, and almost any of their various combinations.
The operation field is separated from the first operand with at least one white-space. Operands are separated with an unquoted comma , from one another. There are two kinds of
operands recognised in €ASM: ordinal and keyword.
↑ Ordinal operands
The ordinal operands (or shortly ordinals) are referred to by their order in the statement. The first operand has the ordinal number one (a one-based index); in macros it is
identified as %1 . For instance, in the MOV AL,BL statement the AL register is operand number 1 and BL is number 2. The machine instruction MOV is known to copy the contents of the
second operand to the first. The comma between operands will increase the ordinal number even when the operand is empty (nothing but white-spaces).
An operand of a machine instruction may represent a register, an immediate integer number, an address, or a memory variable enclosed in square brackets, for instance MOV AL,[ES:SI+16] .
Some other assemblers allow a different syntax of address expressions, which is not supported by EuroAssembler, for instance MOV AL,ES:[SI+16] or MOV AL,[ES:16]+SI .
€ASM requires that the entire memory operand is placed inside square brackets [].
↑ Keyword operands
Besides the ordinal operands, €ASM introduces one more type of operand: the keyword operand (or shortly keyword). Keywords are referred to by name (key word) rather than by their
position in the operand list. A keyword operand has the canonical form name=value where name is an identifier immediately followed by an equal sign.
Keyword operands have many advantages: they are self-describing (if their name is chosen wisely), they don't depend on position in the operand list (no more tedious comma counting), they may be
assigned a default value and they may be completely omitted when they have the default value.
Keyword operands are best used with macroinstructions but €ASM also employs them in some pseudoinstructions and even in machine instructions, too. For instance, in INC [EDI],DATA=DWORD
the keyword parameter DATA= tells which form of the possible INC machine instruction (increment byte, word or dword variable) should be used.
There must be no space between the keyword and the equal sign, otherwise the operand is not recognized as a valid instruction modifier:
|0000: |; Let's define two memory variables (with not recommended names).
|0000:3412 |DATA: DW 1234h
|0002:7856 |WORD: DW 5678h
|0004: |
|0004:50 | PUSH AX, DATA=WORD
|0005: |; Assembled as PUSH AX .
|0005: |; Operand DATA=WORD is recognized as a redundant but valid instruction modifier.
|0005: |
|0005:506A00 | PUSH AX, DATA = WORD
|0008: |; Operand DATA = WORD is not recognized as keyword modifier
|0008: |; due to the space which follows identifier DATA.
|0008: |; €ASM sees the 2nd operand as a numerical comparison between symbols DATA and WORD,
|0008: |; which happen to exist in this program (otherwise E6601 would have been issued).
|0008: |; Their offsets (0000h and 0002h) are different, the result is boolean FALSE
|0008: |; represented with value 0. The statement is recognized as PUSH AX, 0
|0008: |; which is legal, because €ASM accepts integration of multiple ordinal operands
|0008: |; to one statement in machine instructions PUSH, POP, INC, DEC.
|0008: |; The statement is assembled as two instructions: PUSH AX and PUSH 0 .
Operation1 in the previous example has three operands with ordinal numbers 1,2 and 4. The third operand is empty and the last two commas at the end of line are ignored, as no
other nonempty operand follows.
Mixed operands are used in Operation2 and notice that Ordinal2 has an ordinal number 2 although it is the third operand on the list. Keyword operands do not count into ordinal
numbers but empty operands do.
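The numbering rules (empty operands consume an ordinal number, keyword operands do not, and name= must be written without a space) can be tried out with the following Python model. It is a simplified sketch, not the €ASM parser: all function names and the operand names in the usage note are hypothetical, identifiers are checked only roughly, and an expression containing an equal sign directly after an identifier would be mistaken for a keyword.

```python
def split_operands(field: str) -> list[str]:
    """Split the operand field at unquoted commas."""
    parts, current, quote = [], [], None
    for character in field:
        if quote:                          # inside a quoted string
            current.append(character)
            if character == quote:
                quote = None
        elif character in "\"'":
            quote = character
            current.append(character)
        elif character == ",":
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(character)
    parts.append("".join(current).strip())
    return parts

def is_identifier(text: str) -> bool:
    """Rough check: starts with a letter, continues with letters or digits."""
    def is_letter(character: str) -> bool:
        return character.isalpha() or character in "_@$`?" or ord(character) >= 128
    return bool(text) and is_letter(text[0]) and all(
        is_letter(character) or character in "0123456789" for character in text)

def number_operands(field: str) -> tuple[dict[int, str], dict[str, str]]:
    """Return ({ordinal number: operand}, {keyword name: value})."""
    ordinals, keywords = {}, {}
    ordinal = 0
    for part in split_operands(field):
        name, equal_sign, value = part.partition("=")
        if equal_sign and is_identifier(name):   # name immediately followed by =
            keywords[name] = value.strip()
            continue                              # keywords do not count
        ordinal += 1                              # empty operands do count
        if part:
            ordinals[ordinal] = part
    return ordinals, keywords
```

For example, number_operands("Op1, Op2, , Op4, ,") gives the ordinals 1, 2 and 4, while in "Op1, Key=5, Op2" the last operand gets the ordinal number 2.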
↑ Line comment field
A line comment begins with an unquoted semicolon ; and it extends to the end of the physical line. Line comments are ignored by the assembler; they are intended for the human reader of
the source code.
↑ Line continuation
A statement continues on the next physical line when line continuation character , which is an unquoted backslash \ , is used at the position where the next field would normally
begin.
aLabel: \ ; This semicolon is redundant.
MOV EAX, \ The first operand of MOV is destination
EBX ; and the second one is source.
Everything that follows the line continuation character is treated like a comment field, so the semicolon may be omitted in this case. In a multiline statement you may add comments to any physical
line.
A line continuation may appear at the beginning of any field, but not inside the field.
The whole field of any statement must fit on one physical line.
The backslash \ is also used as modulo binary operator, which cannot appear at the beginning of operation, so the confusion is avoided.
↑ Vertical structure
Block statements ↓
Switch statements ↓
Standalone statements ↓
Statements in assembler source code are processed one by one, from top to bottom. Some of them might influence successive statements but most
instructions are standalone. From this point of view there are three kinds of statements:
↑ Block statements
A block statement must appear in pair with its corresponding ending statement. The internal state of €ASM is changed only within the range between them, which is called a block .
A block is a continuous range of statements which starts with begin-block statement and ends with a matching end-block statement.
A block actually begins at the operation field of a begin-block statement and it ends at the operation field of the end-block statement.
Some block statements may be prematurely cancelled (broken) with an exit operation, for instance when an error is detected during a macro expansion.
Block statements
Label field Operation field
Obligation Represents Declares Begin block Break End block
mandatory program name program PROGRAM not used ENDPROGRAM
mandatory procedure name symbol PROC not used ENDPROC
mandatory procedure name symbol PROC1 not used ENDPROC1
mandatory structure name structure STRUC not used ENDSTRUC
optional block identifier nothing HEAD not used ENDHEAD
optional block identifier nothing %COMMENT not used %ENDCOMMENT
optional block identifier nothing %IF %ELSE %ENDIF
optional block identifier nothing %WHILE %EXITWHILE %ENDWHILE
optional ids of Begin/End swapped nothing %REPEAT %EXITREPEAT %ENDREPEAT
mandatory formal control variable %variable %FOR %EXITFOR %ENDFOR
mandatory macro name macro %MACRO %EXITMACRO %ENDMACRO
Some end-block operations can be aliased:
ENDPROC alias ENDP ,
ENDPROC1 alias ENDP1 ,
%ENDREPEAT alias %UNTIL .
The label field of a block statement specifies the name of the program, procedure, structure or macro. In the preprocessing of a %FOR loop the label field declares a formal variable
which changes its value in each loop cycle. In other preprocessing loops (%REPEAT, %WHILE) the label field is optional and it may contain identifier which optically connects the
beginning and the ending of block statements together (for nesting check) but has no further significance - it does not declare a symbol.
The same block identifier may be used as the first and only operand of the corresponding end-block statement.
Assemblers are not united in the canonical format of block pseudoinstructions. On the one hand, MASM uses the same block identifier in the label fields of both begin- and end-block statements:
MyProcedure PROC ; MASM syntax
; some code
MyProcedure ENDP
This is good when you eyeball the source code for a procedure definition, as its name is on the left so it will hit your eyes when you scan the leftmost column. On the other hand, the same label
appears in the source twice, making an ugly exception from the rule that a non-local symbol declaration may occur only once in the program.
Perhaps for that reason Borland chose a different syntax in TASM IDEAL mode:
PROC MyProcedure ; TASM syntax
; some code
ENDP MyProcedure
It solves the double label problem but the name of MyProcedure never appears in the label field, although it is a regular label.
€ASM presents a compromise solution: the name of block is defined in the label field of a begin-block statement and it may appear in the end-block statement:
MyProcedure PROC ; €ASM syntax
; some code
ENDP MyProcedure
The operand in the end-block statement may be omitted but, if used, it must be identical to the label of the corresponding begin-block statement. This helps to maintain correct block nesting
because €ASM will emit an error when block identifiers don't match.
Blocks of code can be nested, but only correctly, that is, that there is no spillover between them.
Two blocks are correctly nested when one block contains the entire other block.
A %MACRO block in the example presented below contains a correctly nested %IF block.
Incorrect block nesting is only tolerated in procedures declared with the NESTINGCHECK=OFF option.
A block identifier in an operand field of end-block and exit-block statements usually only guards the correct binding. When blocks of the same type are nested one in another, exit-
block operand can be used to identify the exiting block. As an example see t2642 where one Inner %FOR block is nested in Outer %FOR block, and the operand of %EXITFOR
statement specifies which block is exited.
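The nesting rules can be illustrated with a stack-based check in Python. This is a sketch under several assumptions: statements are given as already separated (label, operation, operand) triples, the function name and messages are hypothetical, exit-block operations and the NESTINGCHECK=OFF option are ignored, and the swapped identifiers of %REPEAT are not modelled.

```python
# Begin-block and end-block operations from the table of block statements.
BLOCK_PAIRS = {
    "PROGRAM": "ENDPROGRAM", "PROC": "ENDPROC", "PROC1": "ENDPROC1",
    "STRUC": "ENDSTRUC", "HEAD": "ENDHEAD", "%COMMENT": "%ENDCOMMENT",
    "%IF": "%ENDIF", "%WHILE": "%ENDWHILE", "%REPEAT": "%ENDREPEAT",
    "%FOR": "%ENDFOR", "%MACRO": "%ENDMACRO",
}
END_ALIASES = {"ENDP": "ENDPROC", "ENDP1": "ENDPROC1", "%UNTIL": "%ENDREPEAT"}
END_OPERATIONS = set(BLOCK_PAIRS.values())

def check_nesting(statements: list[tuple[str, str, str]]) -> list[str]:
    """Return a list of nesting problems; an empty list means correct nesting."""
    errors = []
    open_blocks = []                       # stack of (begin operation, label)
    for label, operation, operand in statements:
        name = operation.upper()
        name = END_ALIASES.get(name, name)
        if name in BLOCK_PAIRS:
            open_blocks.append((name, label))
        elif name in END_OPERATIONS:
            if not open_blocks:
                errors.append(f"{name} without a begin-block statement")
                continue
            begin, begin_label = open_blocks.pop()
            if BLOCK_PAIRS[begin] != name:
                errors.append(f"{name} closes a block opened by {begin}")
            elif operand and operand != begin_label:
                errors.append(f"{name} {operand} does not match {begin_label}")
    for begin, begin_label in open_blocks:
        errors.append(f"{begin} {begin_label} is never closed")
    return errors
```

A procedure containing a complete %IF block passes the check, while an ENDP placed between %IF and %ENDIF is reported as a spillover.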
↑ Switch statements
A switching statement changes the internal state of €ASM for all following statements until another switching statement changes the state again, or until the end of source code is
found.
There are two switching pseudoinstructions in €ASM: EUROASM, and SEGMENT. The latter has two forms:
[name] SEGMENT (define a new segment) and
[name] (define new section in current segment if it wasn't defined yet, and switch emitting to this section).
Examples of switching statements:
EUROASM AUTOSEGMENT=OFF, CPU=486 ; Change €ASM options for all following statements.
[Subprocedures] SEGMENT PURPOSE=CODE, ALIGN=BYTE ; Declare a new segment.
[.data] ; Switch emitting of following statements to previously defined segment [.data]
[StringData] ; Define a new section in the current segment (in [.data]).
↑ Standalone statements
All the remaining pseudoinstructions and machine instructions are not logically bound with others in a vertical structure of a program, so they are standalone , by definition.
Addresses ↓
Addressing space ↓
Alignment ↓
Boolean values ↓
Boolean extensions ↓
Comments ↓
Condition codes ↓
Data types ↓
Distance ↓
Enumerated values ↓
Expressions ↓
Groups ↓
Identifiers ↓
Length ↓
Literals ↓
Memory variables ↓
Namespace ↓
Numbers ↓
Operators ↓
Registers ↓
Scope ↓
Sections ↓
Segmentation ↓
Segments ↓
Size ↓
Strings ↓
Structures↓
Symbols ↓
%Variables ↓
Width ↓
↑ Comments
Block comments ↓
Line comments ↓
Machine remarks ↓
Markup comments ↓
Comments are parts of the source code which are not processed by assembler and their only purpose is to explain the code for a human reader. There are four types of comments
recognised in €ASM:
↑ Line comments
Line comments start with an unquoted semicolon; everything up to the end of line is ignored by €ASM. Line comments are copied to the listing file verbatim.
↑ Machine remarks
Machine remarks are written by €ASM into the listing file and they contain the generated machine code in hexadecimal notation.
A machine remark starts with a vertical bar | which is the first non-white character on the physical line. A machine remark ends with the second occurrence of the same vertical bar
| . If the terminating | is omitted, the whole physical line is treated as a remark. This is used for inserting error messages into the listing, just below the erroneous statement.
Machine remarks are ignored by €ASM and they are not copied to the listing. Instead, €ASM recreates them when the listing produced by previous assembly session is submitted as
a source to the assembler.
Machine remarks are not intended to be manually inserted by a programmer into the source text, use an ordinary line comment instead.
↑ Markup comments
When a physical line begins with the less-than character < , it is treated as a markup comment and ignored up to the end of line. This makes it possible to mix source code and hypertext markup
language tags. Markup comments are not copied to the listing.
Thanks to the markup comments, €ASM source code can be stored not only as plain text but also as HTML or XML hypertext.
<h2>Description of SomeProcedure</h2>
<img src="SomeImage.png"/>
SomeProcedure PROC ; See the image above for description.
All source code shipped with €ASM is completely stored in HTML format, which makes it possible to document the source with hypertext links, tables, images and better visual representation than simple line
comments could yield.
If you want to keep your source codes in HTML, make sure that ordinary assembler statements do not start with < and rearrange the source so that every markup comment line starts with some
HTML tag. You may also use void HTML tags <span/> or <!----> to start the comment line.
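How machine remarks and markup comments disappear from a line before it is parsed can be sketched in Python. The function below is a hypothetical model built only from the rules given in this chapter; it assumes that the second vertical bar always terminates the remark, so a listing line whose statement text itself contains a vertical bar is not handled.

```python
def strip_remarks(line: str) -> str:
    """Return the part of a physical line which is left for the parser."""
    if line.startswith("<"):
        return ""                          # markup comment, ignored entirely
    stripped = line.lstrip()
    if stripped.startswith("|"):           # machine remark
        end = stripped.find("|", 1)        # look for the terminating bar
        if end < 0:
            return ""                      # no terminator: the whole line is a remark
        return stripped[end + 1:]
    return line                            # ordinary source line
```

A listing line such as |0103:B409 | MOV AH,9 is reduced to the statement MOV AH,9, while an inserted message line such as |## W2356 ... is dropped completely.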
↑ Block comments
A block comment can be used to temporarily disable a portion of source code or to include documentation inside the source code.
A block comment begins with a %COMMENT statement and it ends with the corresponding %ENDCOMMENT. It can span many lines of the program, and these lines do not
have to start with semicolons.
Block comments are copied into the listing file.
€ASM does not assemble the text inside the commented-out block, but it needs to parse it anyway in order to find the corresponding %ENDCOMMENT statement, so the commented-out
text should be a valid source as well.
Block comments are nestable.
↑ Identifiers
An identifier is a human readable text which gives the name to an element of assembler program: a symbol, register, instruction, structure etc.
Each identifier is a combination of letters and digits, that begins with a letter.
The length of identifiers is not limited in €ASM and all characters are significant.
↑ Numbers
Decimal numbers ↓
Binary numbers ↓
Octal numbers ↓
Hexadecimal numbers ↓
Integer numbers overview ↓
Floating point numbers ↓
Floating point special values ↓
Character constants ↓
A number notation is the way to write a numeric value. Numeric values are kept and computed internally by €ASM as 64-bit signed integers.
A number notation is a combination of digits and number modifiers which begins with a decimal digit (0..9).
A number modifier is one of the characters B D E G H K M P Q T appended to the end of a digit sequence, or one of 0N 0O 0X 0Y (a zero followed by a letter) prefixed in front of the other digits. All number modifiers are case insensitive. Except for the decimal format, which is the default, a modifier must always be used.
Floating point numbers shall use a period (full stop) . to separate the integer and decimal parts of the number notation.
Another number modifier is the underscore character _ , which is ignored by the number parser and can be used as a digit separator instead of a space or comma for better readability of long numbers. No white space is allowed in a number notation.
↑ Decimal numbers
A decimal number is a combination of decimal digits 0..9 optionally suffixed with the decimal modifier D . There are five other decimal suffixes:
K (Kilo), which tells €ASM to multiply the number by 2¹⁰=1024,
M (Mega), which tells €ASM to multiply the number by 2²⁰=1_048_576,
G (Giga), which tells €ASM to multiply the number by 2³⁰=1_073_741_824,
T (Tera), which tells €ASM to multiply the number by 2⁴⁰=1_099_511_627_776,
P (Peta), which tells €ASM to multiply the number by 2⁵⁰=1_125_899_906_842_624.
Decimal numbers may be prefixed with the 0N modifier.
All six numbers in the following example have the same value: 1048576, 1048576d, 0n1048576, 1_048_576, 1024K, 1M .
Note that the suffixes K, M, G, T, P multiply by powers of 2, not by powers of 10 as their usual meaning would suggest.
The largest unsigned number which fits into 32 bits is 0xFFFF_FFFF=4_294_967_295.
The largest positive number which fits into 63 bits is 0x7FFF_FFFF_FFFF_FFFF=9_223_372_036_854_775_807.
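The decimal rules above can be modelled outside the assembler. The following Python sketch is only an illustration of the described notation, not €ASM's actual parser; the function name parse_decimal is hypothetical.

```python
# Binary multipliers of the decimal suffixes, as listed above.
MULTIPLIERS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}

def parse_decimal(notation):
    """Return the value of a decimal notation such as '1_048_576', '0n42' or '1M'."""
    text = notation.replace("_", "").upper()   # underscores are digit separators
    if text.startswith("0N"):                  # optional decimal prefix
        text = text[2:]
    factor = 1
    if text[-1:] in MULTIPLIERS:               # K, M, G, T or P suffix
        factor = MULTIPLIERS[text[-1]]
        text = text[:-1]
    elif text.endswith("D"):                   # optional decimal suffix
        text = text[:-1]
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError(f"not a decimal notation: {notation!r}")
    return int(text) * factor

# The six equivalent notations from the text.
SAMPLES = ["1048576", "1048576d", "0n1048576", "1_048_576", "1024K", "1M"]
values = [parse_decimal(sample) for sample in SAMPLES]
```

The sketch does not check the 64-bit range and does not handle the other radixes.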
↑ Binary numbers
A binary number is made of digits 0 1 appended with a binary number modifier B or prefixed by a modifier 0Y . Examples: 0y101, 101b, 00110010b, 1_1111_0100B are equivalent to
decimal numbers 5, 5, 50, 500 respectively.
Maximal 32-bit binary number is 1111_1111__1111_1111__1111_1111__1111_1111b.
↑ Octal numbers
Each octal digit 0..7 represents three bits of the equivalent binary notation. The number is terminated with octal suffix Q or prefixed with 0O alias 0o (digit zero followed by the
capital or small letter O ).
Example: 177_377q = 0o177_377 = 0xFEFF
The biggest 32-bit octal number is 37_777_777_777q.
The biggest 64-bit octal number is 1_777_777_777_777_777_777_777q.
↑ Hexadecimal numbers
Each hexadecimal digit encodes four bits in one character, which requires 2⁴=16 possible values. Therefore the ten decimal digits are extended with the letters A, B, C, D, E, F with values 10, 11, 12, 13, 14, 15. Hexadecimal digits (letters) A..F are case insensitive. When the first digit of a hexadecimal number is a letter A..F, an additional leading zero must be prefixed to the number notation to avoid confusion. A hexadecimal number is terminated with the suffix H or begins with the prefix 0X .
Example: 5h, 0x32, 1F4H, 0x1388, 0C350H represent the decimal numbers 5, 50, 500, 5000, 50000 respectively.
Keep in mind that all numbers in €ASM are internally kept as 64-bit signed integers. Although the instructions MOV EAX,0xFFFF_FFFF and MOV EAX,-1 assemble to identical code, their operands are internally represented as 0x0000_0000_FFFF_FFFF and 0xFFFF_FFFF_FFFF_FFFF . The Boolean expression 0xFFFF_FFFF = -1 is false.
|00000000:B8FFFFFFFF | MOV EAX, 0xFFFF_FFFF
|00000005:B8FFFFFFFF | MOV EAX, -1
|FALSE | %IF 0xFFFF_FFFF = -1
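The difference between the two operands can be reproduced with plain integer arithmetic. This Python sketch only models the 64-bit and 32-bit views described above; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def as_qword(value):
    """Two's-complement 64-bit pattern of an assembly-time number."""
    return value & MASK64

def as_dword(value):
    """The low 32 bits, which is what a 32-bit immediate operand encodes."""
    return value & 0xFFFF_FFFF

# The two numbers differ as 64-bit values ...
differ = as_qword(0xFFFF_FFFF) != as_qword(-1)
# ... but their low doublewords, emitted as B8FFFFFFFF, are the same.
same_code = as_dword(0xFFFF_FFFF) == as_dword(-1)
```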
Binary, octal and hexadecimal numbers must always be written with a prefix or a suffix (or both, although this is not recommended). There is no RADIX directive in €ASM.
For more examples of acceptable syntax see €ASM numbers tests.
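As a rough illustration of the prefix and suffix rules, the following Python sketch converts the integer examples used in this chapter. It is a simplified model under stated assumptions: the name parse_integer is hypothetical, the multiplying suffixes K..P are left out, and a notation carrying both a prefix and a suffix is not handled.

```python
PREFIXES = {"0Y": 2, "0O": 8, "0X": 16, "0N": 10}
SUFFIXES = {"B": 2, "Q": 8, "H": 16, "D": 10}

def parse_integer(notation):
    """Convert a binary, octal, decimal or hexadecimal notation to an int."""
    text = notation.replace("_", "").upper()      # drop digit separators
    if not text or text[0] not in "0123456789":
        raise ValueError("a number notation must begin with a decimal digit")
    if text[:2] in PREFIXES:                      # 0Y, 0O, 0X or 0N prefix
        return int(text[2:], PREFIXES[text[:2]])
    if text[-1] in SUFFIXES:                      # B, Q, H or D suffix
        return int(text[:-1], SUFFIXES[text[-1]])
    return int(text, 10)                          # decimal is the default

binary = [parse_integer(n) for n in ("0y101", "101b", "00110010b", "1_1111_0100B")]
octal = [parse_integer(n) for n in ("177_377q", "0o177_377")]
hexa = [parse_integer(n) for n in ("5h", "0x32", "1F4H", "0x1388", "0C350H")]
```

Checking the prefix before the suffix keeps a notation such as 0x1B hexadecimal even though it ends with the letter B.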
↑ Floating point numbers
Floating point (alias real) numbers are parsed from scientific notation with a decimal point and a power-of-ten exponent, using this syntax:
FP number notation anatomy
Order Field name Contents
1 number sign + , - or nothing
2 significand digits 0 .. 9 , digit separators _
3 decimal point .
4 fraction digits 0..9 , digit separators _
5 FP number modifier E or e
6 exponent sign + , - or nothing
7 exponent part digits 0..9 , digit separators _
For instance, the floating point number 1234.56E3 has the value 1234.56 * 10³ = 1234560.
An omitted sign is treated as + .
The decimal part can be omitted when it is zero(s), for instance 123.00E2 = 123.E2 .
The decimal point may be omitted when decimal part is omitted (it is equal to zero). The E modifier still specifies the floating point format. 123.00E2 = 123.E2 = 123E2 = 12300.
Exponent can be omitted when it is zero. The modifier E may be omitted in this case, too, and without the E modifier it is the presence of the decimal point which decides if the
number is integer or real. In our example: 12345.67E0 = 12345.67E = 12345.67
No white space is allowed within FP number notation.
The number is considered as floating point when its notation contains either decimal point . , or modifier E (capital or small letter E ), or both. Otherwise it is treated as an integer.
€ASM does not calculate with floating point numbers at assembly time.
All internal assembly-time calculations in €ASM are performed with 64-bit integers only. When an FP number is used in a mathematical expression, it is converted to an integer first. The error E6130 (number overflow) is reported if the number does not fit into 64 bits. Warning W2210 (precision lost) is reported if the FP number had a decimal part which was rounded in the conversion.
An actual FP number format [IEEE754] is maintained only when the scientific notation is used to define a static FP variable with the pseudoinstruction DD, DQ or DT.
Half-precision FP numbers (float16) are not supported by €ASM, nor are they supported by processors, with the exception of the two packed SIMD instructions VCVTPS2PH and VCVTPH2PS and a few MVEX-encoded up/down conversion operations.
Unlike integer numbers, the sign of an FP notation is inseparable from the digits which follow. If you mistakenly put a space between the sign and the number, the notation is treated not as an FP definition but as an operation (unary minus applied to a number), and therefore the FP number is converted to an integer first, before the operation is evaluated. Examples:
|00000000:001DF1C7 | DD -123.45E3 ; Single-precision FP number -123.45*10³.
|00000004:C61DFEFF | DD - 123.45E3 ; Dword signed integer number -123450.
|00000008:00000000A023FEC0 | DQ -123.45E3 ; Double-precision FP number -123.45*10³.
|00000010:C61DFEFFFFFFFFFF | DQ - 123.45E3 ; Qword signed integer number -123450.
|00000018:0000000000001DF10FC0 | DT -123.45E3 ; Extended-precision FP number -123.45*10³.
|00000022: | DT - 123.45E3 ; Tbyte integer number is not supported.
|### E6725 Datatype TBYTE expects plain floating-point number.
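The dumps in the listing above can be cross-checked with Python's standard struct module, which packs IEEE 754 single and double precision values. This is merely a verification sketch. The 80-bit extended format is left out because struct has no format code for it.

```python
import struct

value = -123.45e3                     # the notation -123.45E3

# Little-endian IEEE 754 encodings, as emitted by DD and DQ with an FP notation.
single = struct.pack("<f", value).hex().upper()
double = struct.pack("<d", value).hex().upper()

# With a space after the sign the number is converted to the integer -123450 first.
dword_int = struct.pack("<i", int(value)).hex().upper()
qword_int = struct.pack("<q", int(value)).hex().upper()
```

The four strings correspond to the dump columns of the first four listing lines.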
↑ Floating point special values
Constant Interpretation single precision (DD) double precision (DQ) extended precision (DT)
#ZERO zero 00000000 00000000_00000000 0000_00000000_00000000
+#ZERO positive zero 00000000 00000000_00000000 0000_00000000_00000000
-#ZERO negative zero 80000000 80000000_00000000 8000_00000000_00000000
#INF infinity 7F800000 7FF00000_00000000 7FFF_80000000_00000000
+#INF positive infinity 7F800000 7FF00000_00000000 7FFF_80000000_00000000
-#INF negative infinity FF800000 FFF00000_00000000 FFFF_80000000_00000000
#PINF pseudo infinity 7F800000 7FF00000_00000000 7FFF_00000000_00000000
+#PINF positive pseudo infinity 7F800000 7FF00000_00000000 7FFF_00000000_00000000
-#PINF negative pseudo infinity FF800000 FFF00000_00000000 FFFF_00000000_00000000
#NAN not a number 7FC00000 7FF80000_00000000 7FFF_C0000000_00000000
+#NAN positive not a number 7FC00000 7FF80000_00000000 7FFF_C0000000_00000000
-#NAN negative not a number FFC00000 FFF80000_00000000 FFFF_C0000000_00000000
#PNAN pseudo not a number 7F800001 7FF00000_00000001 7FFF_00000000_00000001
+#PNAN positive pseudo not a number 7F800001 7FF00000_00000001 7FFF_00000000_00000001
-#PNAN negative pseudo not a number FF800001 FFF00000_00000001 FFFF_00000000_00000001
#QNAN quiet not a number 7FC00000 7FF80000_00000000 7FFF_C0000000_00000000
+#QNAN positive quiet not a number 7FC00000 7FF80000_00000000 7FFF_C0000000_00000000
-#QNAN negative quiet not a number FFC00000 FFF80000_00000000 FFFF_C0000000_00000000
#SNAN signaling not a number 7F800001 7FF00000_00000001 7FFF_80000000_00000001
+#SNAN positive signaling not a number 7F800001 7FF00000_00000001 7FFF_80000000_00000001
-#SNAN negative signaling not a number FF800001 FFF00000_00000001 FFFF_80000000_00000001
Names of special constants are case insensitive. If the sign + or - is used, it is inseparable from the name. Examples:
FourNans DY 4 * QWORD #NaN ; Define vector of four double-precision not-a-number FP values.
MOV ESI,=8*Q#ZERO ; Define 8*8 zero bytes in literal section and set ESI to point at them.
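The single and double precision columns of the table above can be decoded with Python's struct module to see how the bit patterns are interpreted. The helper names are hypothetical, and the extended precision column is not covered because struct cannot unpack 80-bit values.

```python
import math
import struct

def decode_single(hex_digits):
    """Interpret a 32-bit pattern written as in the table (most significant digit first)."""
    return struct.unpack(">f", bytes.fromhex(hex_digits.replace("_", "")))[0]

def decode_double(hex_digits):
    """Interpret a 64-bit pattern written as in the table."""
    return struct.unpack(">d", bytes.fromhex(hex_digits.replace("_", "")))[0]

plus_inf = decode_single("7F800000")               # #INF
minus_inf = decode_double("FFF00000_00000000")     # -#INF
minus_zero = decode_single("80000000")             # -#ZERO
quiet_nan = decode_double("7FF80000_00000000")     # #QNAN
```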
↑ Character constants
A number can also be written as a character constant , which is a string containing no more than eight characters. Its numeric value is taken from the ordinal number of each character in the ASCII table. Examples of character constants and their values:
'0' = 30h = 48
'abc' = 636261h = 6513249
"4%%" = 2534h = 9524
The character with the least significant value is at the leftmost position in the string.
Assemblers do not agree on the treatment of character constants. MASM and TASM use the scriptual convention, where the order of characters in the written source code corresponds with the way we write numbers: the least significant digit is on the right side.
€ASM, as well as other newer assemblers, uses the memory convention, where the order of characters in the written source code corresponds with the order in which they are stored in memory on little-endian processors.
| | ; MASM and TASM:
|00000000:616263 | DB 'abc' ; String.
|00000003:63626100 | DD 'abc' ; Character constant.
|00000007:B863626100 | MOV EAX,'abc' ; AL='c'.
| | ; €ASM:
|00000003:61626300 | DD 'abc' ; Character constant.
|00000007:B861626300 | MOV EAX,'abc' ; AL='a'.
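The two conventions differ only in byte order, which can be shown with Python's int.from_bytes. This is an illustrative sketch: the function names are hypothetical and the argument is the string contents after any doubled % or quote has been reduced.

```python
def char_constant(contents):
    """Memory convention (as described for €ASM): the leftmost character is least significant."""
    data = contents.encode("latin-1")
    if len(data) > 8:
        raise ValueError("a character constant holds at most eight characters")
    return int.from_bytes(data, "little")

def char_constant_scriptual(contents):
    """Scriptual convention (as described for MASM and TASM): the rightmost character is least significant."""
    return int.from_bytes(contents.encode("latin-1"), "big")

memory_abc = char_constant("abc")                 # 636261h
scriptual_abc = char_constant_scriptual("abc")    # 616263h

# Bytes of a little-endian doubleword holding each value.
memory_dd = memory_abc.to_bytes(4, "little").hex()
scriptual_dd = scriptual_abc.to_bytes(4, "little").hex()
```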
↑ Enumerated values
Some operands may acquire only one of the few predefined values, e.g. the EUROASM option CPU= may be 086, 186, 286, 386, 486, 586, 686, PENTIUM, P6, X64 .
Although some enumerated values may look like a number, they are not countable, they merely represent a position in a predefined collection.
↑ Boolean values
Any number can be interpreted as a boolean (logical) value, too. Boolean values can acquire one of the two states: false or true. Number 0 is treated as boolean false in logical
expression, any nonzero number is treated as true.
↑ Boolean extended values
All built-in €ASM boolean options have an extended repertoire of possible values. Those boolean values accept
Extended boolean enumeration is used only with operands built into €ASM. The enumerated values are not symbols that could be used elsewhere, such as in MOV EAX,TRUE . To achieve similar functionality in macros, the programmer would have to define such symbols first, e.g.
FALSE EQU 0
false EQU 0
TRUE EQU -1
true EQU !false
MOV EAX,TRUE
When an extended Boolean value is used as the macro keyword operand, it can be also tested in the macro body with %IF, %WHILE, %UNTIL , for instance
Now we may invoke the macro as MacroWithBool Bool=Enable , MacroWithBool Bool=No etc.
Extended enumerated Boolean values are not allowed in logical expressions:
MacroWithBool %MACRO Bool=0
%IF ! %Bool
; Do something when Bool is set to FALSE.
%ENDIF
%ENDMACRO MacroWithBool
The previous example would not work with extended Boolean values; for instance, MacroWithBool Bool=False will complain with E6601 Symbol "False" was not found. However, reversing the logic should work well:
MacroWithBool %MACRO Bool=0
%IF %Bool
%ELSE
; Do something when Bool is set to FALSE.
%ENDIF
%ENDMACRO MacroWithBool
↑ Strings
A string is a sequence of arbitrary characters enclosed in quotes. Either double quotes " or single quotes ' (also called apostrophes) may be used to mark the borders of a string. The surrounding quotes do not count as part of the string contents. All characters within the string lose their semantic significance, with three exceptions:
1. EOL cannot be used in strings. In other words, each portion of quoted "string data" must fit on one physical line. The definition of a long string can be split, e.g.
|0000:5468697320697320 |MultilineString: DB "This is the first line",13,10, \
|0008:7468652066697273~| "and this is the second one.",13,10,0
|0036: |
2. The same quote character which is used to surround the string cannot be used inside, unless it is doubled, e.g.
|0000:4F27427269656E00 |Surname: DB 'O''Brien',0
|0008: |
3. The percent sign % keeps its function of a %variable prefix. Use two adjacent percent signs when a single % is required in a string, e.g.
|0000:313030252073617665642E00 |Status: DB "100%% saved.",0
|000C: |
No escape character is employed in €ASM; in fact the percent sign and the quote escape themselves. If you need to use any of the above mentioned characters within a string, they must be doubled. This duplication (self-escaping) concerns only the notation in the source text and it does not increase the size of the string emitted to memory.
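The self-escaping rule can be sketched as a small Python function that turns a quoted literal into its emitted contents. The name string_contents is hypothetical and the model is deliberately simplified: it does not expand %variables and does not detect a stray single quote inside the literal.

```python
def string_contents(literal):
    """Return the contents of a quoted string literal with doubled characters reduced."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("the literal must be enclosed in matching quotes")
    quote = literal[0]
    body = literal[1:-1]                      # surrounding quotes are not contents
    body = body.replace(quote * 2, quote)     # a doubled border quote is one quote
    return body.replace("%%", "%")            # a doubled percent sign is one percent

surname = string_contents("'O''Brien'")
status = string_contents('"100%% saved."')
```

The lengths of the results match the listings above: 7 and 11 characters, each followed by the terminating zero byte defined separately.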
Strings enclosed in 'single quotes' and "double quotes" are equivalent, with a single exception: if the contents of a string is a filename, only double quotes may be used, because the apostrophe is a valid character in filenames on most filesystems. More examples of string definitions:
|### E6721 Invalid data expression ""It ain't necessarilly so'".
|0011: |
↑ Addressing space
The processor, otherwise known as the Central Processing Unit (CPU), operates with data and communicates with its environment (registers, memory and devices). A typical operation reads a piece of information from a register, memory or port (I/O device), manipulates the data and writes it back to the environment. The smallest addressable unit is a single byte (1 B), and the number of addressable bytes is limited by the addressing space . A register is identified by its name, a device is identified by its port number, and a byte in memory is identified by its address.
CPU addressing space
CPU mode GPR I/O port Memory addressing
16-bit 8 * 2 B 64 KB (2¹⁶) 1 MB (2¹⁶⁺⁴)
32-bit 8 * 4 B 64 KB (2¹⁶) 4 GB (2³²)
64-bit 16 * 8 B 64 KB (2¹⁶) 16384 PB (2⁶⁴)
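The 1 MB limit of the 16-bit row follows from real-mode segmentation, where a 16-bit segment value is shifted left by four bits and added to a 16-bit offset. The Python sketch below illustrates that arithmetic. The function name is hypothetical, and the wrap to 20 bits is an assumption modelling a CPU with 20 address lines.

```python
def real_mode_physical(segment, offset):
    """Physical address formed from a 16-bit segment and a 16-bit offset in real mode."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF    # assumed wrap at 20 address bits

video = real_mode_physical(0xB800, 0x0000)        # 0xB8000
top = real_mode_physical(0xFFFF, 0x000F)          # 0xFFFFF, the last byte of 1 MB

# Sizes from the table expressed in bytes.
ONE_MB = 2 ** (16 + 4)
FOUR_GB = 2 ** 32
SPACE_64 = 2 ** 64
```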
↑ Addresses
Addressing space is limited by the CPU architecture and by the number of wires connecting the addressing pins of the CPU with the memory chips. The combination of logical zeros and ones which can be measured on those wires is called the physical address (PhA).
From an application programmer's point of view, the processor writes to or reads from a virtual address (VA). If memory segmentation is not taken into account, the virtual address is sometimes called the linear address (LA). As a matter of historical fact, virtual and physical addresses were identical only in the first generations of processors, which operated in real mode without memory cache and memory paging.
The objects in the linked image of a protected-mode program are often addressed with an offset from the beginning of an image loaded in memory (from the ImageBase). Such offset
is called relative virtual address (RVA).
Similarly, the positions of data items in file formats are sometimes identified with a file address (FA), which is defined as the distance between the start of the file and the actual position of the data item in this file.
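The relation between VA and RVA is a plain subtraction of the ImageBase. The Python sketch below only illustrates the definition given above; the ImageBase value and the function names are hypothetical.

```python
IMAGE_BASE = 0x0040_0000   # hypothetical ImageBase, chosen only for illustration

def va_to_rva(va, image_base=IMAGE_BASE):
    """Relative virtual address: the offset of VA from the start of the loaded image."""
    if va < image_base:
        raise ValueError("the address lies below the image base")
    return va - image_base

def rva_to_va(rva, image_base=IMAGE_BASE):
    """Virtual address of an object whose RVA is known."""
    return image_base + rva

entry_rva = va_to_rva(0x0040_1000)
entry_va = rva_to_va(entry_rva)
```

Converting an RVA to a file address additionally requires the section layout of the particular file format, which this sketch does not model.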
An address is a symbolic representation of some position in memory.
PhA, VA, LA, RVA and FA are plain non-negative integer numbers, but addressing objects or data at assembly time is rather more complicated. For historical reasons, the addressing space is divided into segments of memory and each segment is identified by the contents of a segment register. An address at assembly time is expressed as the number of bytes off (hence the name offset) between the position and the start of its segment, together with the segment identification. See also the chapters Address symbols and Address expressions.
↑ Alignment
Data and code are retrieved from memory faster when their address is aligned, which means that it is rounded up to a multiple of a power of two. Even though most IA-32 CPU instructions can cope with unaligned data, this takes more time because the data read from memory may not lie in the same cache line and the CPU may need to shift the information internally during fetch time.
For the best performance, memory variables should be aligned to their natural alignment, which corresponds with their size; see the Autoalign column in the Data types table. Doublewords, for instance, have the autoalign value 4, which says that the last two bits of a properly aligned address should be zero. QWORDs are aligned to 8, therefore the last three bits (8=2³) should be zero.
This alignment can be achieved explicitly with ALIGN pseudoinstruction, or with the ALIGN= keyword given in machine instruction or in PROC and PROC1 pseudoinstructions.
Memory variables are aligned by €ASM implicitly when the EUROASM option AUTOALIGN=ON is set. For instance, the statement SomeDword: DD 1234 is autoaligned by 4 (the offset of SomeDword can be divided by 4 without a remainder). An important concept is the alignment stuff, which fills the space in front of the aligned instruction. It is zero 0x00 in data segments and NOP 0x90 or a multibyte NOP in code segments.
The align value may be a numeric expression which evaluates to 1, 2, 4, 8 or a higher power of two. €ASM also accepts, without warning, a zero or an empty value, which is identical to ALIGN=1 (it has no effect). Besides numeric values, ALIGN also accepts the enumerated values BYTE, WORD, DWORD, QWORD, OWORD, YWORD, ZWORD or their short versions B, W, D, Q, O, Y, Z .
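Rounding an offset up to a power-of-two boundary is usually done with a mask. The Python sketch below illustrates the arithmetic and the fill bytes described above. The function names are hypothetical, and only single-byte NOPs are used in code segments, whereas €ASM may emit multibyte NOPs.

```python
def align_up(offset, alignment):
    """Round offset up to the nearest multiple of alignment (a power of two)."""
    if alignment in (0, 1):                   # zero behaves like ALIGN=1: no effect
        return offset
    if alignment < 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (offset + alignment - 1) & -alignment

def alignment_stuff(offset, alignment, in_code_segment):
    """Bytes which fill the gap in front of the aligned statement."""
    gap = align_up(offset, alignment) - offset
    fill = b"\x90" if in_code_segment else b"\x00"   # NOP in code, zero in data
    return fill * gap

aligned = align_up(0x1001, 16)                # 0x1010
data_fill = alignment_stuff(5, 4, False)      # three zero bytes
code_fill = alignment_stuff(6, 8, True)       # two NOP bytes
```

The expression -alignment has all bits set except the low ones, so the AND clears exactly the bits that must be zero in an aligned address.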
Alignment is always limited by the alignment of the segment in which the statement lies. If the current segment is DWORD aligned, we cannot ask for QWORD or OWORD alignment in this segment. The default segment alignment in €ASM is OWORD (10h), and it is increased to SectionAlign (usually 1000h) when the assembled program is in ELF or PE/DLL format.
Besides the instruction modifier ALIGN= , alignment may also be established with the explicit ALIGN pseudoinstruction, which allows for intentional disalignment, too.
↑ Registers
A register is a small, fast, fixed-size variable located on the CPU chip.
Though a register remembers information written to it, it is not part of the addressable memory. Registers can be referenced by their names only; they have no address.
Registers table
Family REGTYPE# Members Size (bytes)
GPR 8-bit 'B' AL, AH, BL, BH, CL, CH, DL, DH, DIB, SIB, BPB, SPB, R8B, R9B, R10B, R11B, R12B, R13B, R14B, R15B, DIL, SIL, BPL, SPL, R8L, R9L, R10L, R11L, R12L, R13L, R14L, R15L 1
GPR 16-bit 'W' AX, BX, CX, DX, BP, SP, SI, DI, R8W, R9W, R10W, R11W, R12W, R13W, R14W, R15W 2
GPR 32-bit 'D' EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI, R8D, R9D, R10D, R11D, R12D, R13D, R14D, R15D 4
GPR 64-bit 'Q' RAX, RBX, RCX, RDX, RBP, RSP, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15 8
Segment 'S' CS, SS, DS, ES, FS, GS 2
FPU 'F' ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7 10
MMX 'M' MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7 8
XMM 'X' XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM16, XMM17, XMM18, XMM19, XMM20, XMM21, XMM22, XMM23, XMM24, XMM25, XMM26, XMM27, XMM28, XMM29, XMM30, XMM31 16
AVX 'Y' YMM0, YMM1, YMM2, YMM3, YMM4, YMM5, YMM6, YMM7, YMM8, YMM9, YMM10, YMM11, YMM12, YMM13, YMM14, YMM15, YMM16, YMM17, YMM18, YMM19, YMM20, YMM21, YMM22, YMM23, YMM24, YMM25, YMM26, YMM27, YMM28, YMM29, YMM30, YMM31 32
AVX-512 'Z' ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7, ZMM8, ZMM9, ZMM10, ZMM11, ZMM12, ZMM13, ZMM14, ZMM15, ZMM16, ZMM17, ZMM18, ZMM19, ZMM20, ZMM21, ZMM22, ZMM23, ZMM24, ZMM25, ZMM26, ZMM27, ZMM28, ZMM29, ZMM30, ZMM31 64
Mask 'K' K0, K1, K2, K3, K4, K5, K6, K7 8
Bound 'N' BND0, BND1, BND2, BND3 16
Control 'C' CR0, CR2, CR3, CR4, CR8 4
Debug 'E' DR0, DR1, DR2, DR3, DR6, DR7 4
Test 'T' TR3, TR4, TR5 4
Register names are case insensitive. General Purpose Registers (GPR) are aliased; for instance AL is another name for the lower half of AX, which is the lower half of EAX, which is the lower half of RAX.
Similarly, SIMD (AVX) registers are aliased as well: XMM0 is another name for the lower half of YMM0, which is the lower half of ZMM0.
The names of 8-bit registers DIB, SIB, BPB, SPB, R8B..R15B are aliases for the least significant byte of RDI, RSI, RBP, RSP, R8..R15. They may also be referred to as DIL, SIL, BPL, SPL, R8L..R15L, as used in the Intel manual. €ASM supports both suffixes ~L and ~B. Those registers are available in 64-bit mode only.
Some other assemblers and the Intel manuals use the notation ST(0), ST(1)..ST(7) for Floating-Point Unit register names, but this syntax is not accepted in €ASM. Nor can the ST0 register be aliased as ST (top of the FPU stack).
The x86 processor contains some other registers which hold flags, descriptor tables, and FPU control and status information, but they are not listed in the table above because they are not directly accessible by name.
↑ Condition codes
General condition codes ↓
SSE condition codes ↓
The result of some CPU operations is treated as a predicate with a mnemonic shortcut that can be used as part of an instruction name.
↑ General condition codes
Some combinations of the CPU flags ZF, CF, OF, SF, PF are given special names, the so-called condition codes . They are used in the mnemonics of conditional jump instructions and of bit-manipulation general-purpose instructions.
An inverted code can be used in macroinstructions to bypass a region of code when the condition is not met. See the automatic %variable inverted condition code.
General condition codes table
Num. value  Mnemonic code  Alias  Description  Condition  Inverted mnem. code
0x4 E Z Equal ZF=1 NE
0x5 NE NZ Not Equal ZF=0 E
0x4 Z E Zero ZF=1 NZ
0x5 NZ NE Not Zero ZF=0 Z
0x2 C B Carry CF=1 NC
0x3 NC NB Not Carry CF=0 C
0x2 B C Borrow CF=1 NB
0x3 NB NC Not Borrow CF=0 B
0x0 O Overflow OF=1 NO
0x1 NO Not Overflow OF=0 O
0x8 S Sign SF=1 NS
0x9 NS Not Sign SF=0 S
0xA P PE Parity PF=1 NP
0xB NP PO Not Parity PF=0 P
0xA PE P Parity Even PF=1 PO
0xB PO NP Parity Odd PF=0 PE
0x7 A NBE Above CF=0 && ZF=0 NA
0x6 NA BE Not Above CF=1 || ZF=1 A
0x3 AE NB Above or Equal CF=0 NAE
0x2 NAE B Not Above nor Equal CF=1 AE
0x2 B NAE Below CF=1 NB
0x3 NB AE Not Below CF=0 B
0x6 BE NA Below or Equal CF=1 || ZF=1 NBE
0x7 NBE A Not Below nor Equal CF=0 && ZF=0 BE
0xF G NLE Greater SF=OF && ZF=0 NG
0xE NG LE Not Greater SF<>OF || ZF=1 G
0xD GE NL Greater or Equal SF=OF NGE
0xC NGE L Not Greater nor Equal SF<>OF GE
0xC L NGE Less SF<>OF NL
0xD NL GE Not Less SF=OF L
0xE LE NG Less or Equal SF<>OF || ZF=1 NLE
0xF NLE G Not Less nor Equal SF=OF && ZF=0 LE
CXZ CX register is Zero CX=0
ECXZ ECX register is Zero ECX=0
RCXZ RCX register is Zero RCX=0
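In the table above every inverted code differs from its original only in the lowest bit of the numeric value. The Python sketch below copies the numeric values from the table and derives the inversion with XOR 1. The function names are hypothetical. The opcode formula 0x70 + value for short conditional jumps comes from the x86 instruction encoding rather than from this manual.

```python
# Numeric values copied from the general condition codes table.
CONDITION_CODES = {
    "O": 0x0, "NO": 0x1, "B": 0x2, "C": 0x2, "NAE": 0x2, "NB": 0x3, "NC": 0x3,
    "AE": 0x3, "E": 0x4, "Z": 0x4, "NE": 0x5, "NZ": 0x5, "BE": 0x6, "NA": 0x6,
    "A": 0x7, "NBE": 0x7, "S": 0x8, "NS": 0x9, "P": 0xA, "PE": 0xA, "NP": 0xB,
    "PO": 0xB, "L": 0xC, "NGE": 0xC, "GE": 0xD, "NL": 0xD, "LE": 0xE, "NG": 0xE,
    "G": 0xF, "NLE": 0xF,
}

def inverted_value(mnemonic):
    """Numeric value of the inverted condition: flip the lowest bit."""
    return CONDITION_CODES[mnemonic.upper()] ^ 1

def short_jump_opcode(mnemonic):
    """Opcode byte of the short conditional jump Jcc rel8 (x86 encoding)."""
    return 0x70 + CONDITION_CODES[mnemonic.upper()]

je_opcode = short_jump_opcode("E")
jne_opcode = short_jump_opcode("NE")
```

The codes CXZ, ECXZ and RCXZ have no numeric value in the table and are therefore not part of this mapping.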
↑ SSE condition codes
Streaming Single Instruction Multiple Data Extension instructions (V)CMPccSS,(V)CMPccSD,(V)CMPccPS,(V)CMPccPD use different set of condition codes cc.
Only aliased mnemonic code is documented for legacy instructions CMPccSS,CMPccSD,CMPccPS,CMPccPD.
Combinations of punctuation characters are used in €ASM to prescribe various operations with numbers, addresses, strings and registers in the assembly process. Placing a binary operator between two numbers tells €ASM to replace these three elements with the result of the operation. Some operators are unary ; they modify the value of the operand which they stand in front of.
All operations implemented in €ASM are presented in the following table.
Operation table
Operation  Priority  Properties  Left operand  Operator  Right operand  Result  II (6)
Membership 16 binary noncomm. (1) identifier . identifier identifier
Attribute 15 unary noncomm. (3) attr# element number or address
Case-insens. Equal 14 binary commutative (2) string == string boolean CMPS
Case-sens. Equal 14 binary commutative string === string boolean CMPS
Case-insens. Nonequal 14 binary commutative (2) string !== string boolean CMPS
Case-sens. Nonequal 14 binary commutative string !=== string boolean CMPS
Plus 13 unary (3) + number numeric NOP
Minus 13 unary (3) - number numeric NEG
Shift Logical Left 12 binary noncommutative number << number numeric SHL
Shift Arithmetic Left 12 binary noncommutative number #<< number numeric SAL
Shift Logical Right 12 binary noncommutative number >> number numeric SHR
Shift Arithmetic Right 12 binary noncommutative number #>> number numeric SAR
Signed Division 11 binary noncommutative number #/ number numeric IDIV
Division 11 binary noncommutative number / number numeric DIV
Signed Modulo 11 binary noncommutative number #\ number numeric IDIV
Modulo 11 binary noncommutative number \ number numeric DIV
Signed Multiplication 11 binary commutative number #* number numeric IMUL
Multiplication 11 binary commutative number * number numeric MUL
Scaling 10 binary commutative (5) number * register address expression
Addition 9 binary commutative number + number numeric ADD
Subtraction 9 binary noncommutative number - number numeric SUB
Indexing 9 binary commutative (5) number + register address expression
Bitwise NOT 8 unary (3) ~ number numeric NOT
Bitwise AND 7 binary commutative number & number numeric AND
Bitwise OR 6 binary commutative number | number numeric OR
Bitwise XOR 6 binary commutative number ^ number numeric XOR
Above 5 binary noncommutative number > number boolean JA
Greater 5 binary noncommutative number #> number boolean JG
Below 5 binary noncommutative number < number boolean JB
Lower 5 binary noncommutative number #< number boolean JL
Above or Equal 5 binary noncommutative number >= number boolean JAE
Greater or Equal 5 binary noncommutative number #>= number boolean JGE
Below or Equal 5 binary noncommutative number <= number boolean JBE
Lower or Equal 5 binary noncommutative number #<= number boolean JLE
Numeric Equal 5 binary commutative number = number boolean JE
Numeric Nonequal 5 binary commutative (4) number != or <> number boolean JNE
Logical NOT 4 unary (3) ! number boolean NOT
Logical AND 3 binary commutative number && number boolean AND
Logical OR 2 binary commutative number || number boolean OR
Logical XOR 2 binary commutative number ^^ number boolean XOR
Segment separation 1 binary noncommutative number : number address expression
Data duplication 0 binary noncomm. (1) (5) number * datatype data expression
Range 0 binary noncomm. (1) number .. number range
Substring 0 binary noncomm. (1) text [ ] range text
Sublist 0 binary noncomm. (1) text { } range text
(1) Special operations Membership, Duplication, Range, Substring, Sublist are solved at parser level rather than by the €ASM expression evaluator. They are listed here only for completeness.
(2) Case insensitive string-compare operations ignore the character case of letters A..Z but not the case of accented national letters above ASCII 127.
(3) A unary operator applies to the following operand. Binary operators work with two operands. The Attribute operator applies to the following element or expression in parentheses/brackets.
(4) The Numeric Nonequal operation has two aliased operators != and <> . You can choose whichever you like.
(5) The operations Multiplication, Scaling and Duplication share the same operator * . Similarly, Addition and Indexing share the operator + . The actual operation is determined by the operand types.
(6) Column II illustrates which equivalent machine instruction is used internally to compute the operation at assembly time.
The commutative property specifies whether the operands of a binary operation can be exchanged without affecting the result.
The Priority column specifies the order in which operators are processed. Operations with higher priority are computed sooner, but this can be changed with priority parentheses ( ) . Operations with equal priority are computed in their notation order (from left to right).
Operations which calculate with signed integers have their operator prefixed with # . The operations Addition and Subtraction do not need a special "#signed" version because they compute with signed and unsigned integer numbers in the same way.
Both numeric and boolean operations return a 64-bit number. In the case of boolean operations the result has one of two possible values: 0 (FALSE) or -1 = 0xFFFF_FFFF_FFFF_FFFF (TRUE). For example the expression
'+' & %1 #>= 0 | '-' & %1 #< 0 is evaluated as
('+' & (%1 #>= 0)) | ('-' & (%1 #< 0)) and its result is the minus sign (45) if %1 is negative and plus sign (43) otherwise.
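Because TRUE has all bits set, a boolean result can serve as a bit mask. The Python sketch below mirrors the fully parenthesized form of the expression above. The names boolean and sign_character are hypothetical, and Python's arbitrary-precision -1 stands in for the 64-bit value.

```python
TRUE, FALSE = -1, 0        # -1 has all bits set, 0 has none

def boolean(condition):
    """Result of a compare operation: all ones when true, zero when false."""
    return TRUE if condition else FALSE

def sign_character(number):
    """ASCII code of '+' or '-' selected by masking, without any branching on the result."""
    return (ord("+") & boolean(number >= 0)) | (ord("-") & boolean(number < 0))

negative = sign_character(-5)
positive = sign_character(7)
```

Exactly one of the two masks is all ones, so the OR yields the character code that was kept.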
Spaces which separate operands and operators in the expression examples serve only for better readability; they are not required by €ASM syntax.
The rich set of operators allows €ASM to get rid of cloned pseudoinstructions such as IFE, IFB, IFIDN, IFIDNI, IFDIF, ERRIDNI, ERRNB...
The Shift operators family is given higher priority than in other languages because I treat shifts as a special kind of multiplication/division.
NASM evaluates the expression 4+3<<2 as (4+3)<<2 = 28 but in €ASM it is evaluated as 4+(3<<2) = 16.
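The effect of the priority column can be tried with a small precedence-climbing evaluator. The Python sketch below is not €ASM's evaluator: it takes the priorities of a few binary operators from the operation table, accepts only non-negative decimal numbers and parentheses, and the function names are hypothetical.

```python
import re

# Priorities copied from the operation table (a higher number binds tighter).
PRIORITY = {"<<": 12, ">>": 12, "*": 11, "/": 11, "+": 9, "-": 9, "&": 7, "|": 6, "^": 6}
APPLY = {
    "<<": lambda a, b: a << b, ">>": lambda a, b: a >> b,
    "*": lambda a, b: a * b, "/": lambda a, b: a // b,
    "+": lambda a, b: a + b, "-": lambda a, b: a - b,
    "&": lambda a, b: a & b, "|": lambda a, b: a | b, "^": lambda a, b: a ^ b,
}

def evaluate(text):
    """Evaluate an expression using the priorities above, left to right within one priority."""
    tokens = re.findall(r"\d+|<<|>>|[-+*/&|^()]", text)
    position = 0

    def operand():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            value = expression(0)
            position += 1                     # skip the closing parenthesis
            return value
        return int(token)

    def expression(minimal_priority):
        nonlocal position
        left = operand()
        while (position < len(tokens) and tokens[position] in PRIORITY
               and PRIORITY[tokens[position]] >= minimal_priority):
            operator = tokens[position]
            position += 1
            right = expression(PRIORITY[operator] + 1)   # only tighter operators go right
            left = APPLY[operator](left, right)
        return left

    return expression(0)

shift_first = evaluate("4+3<<2")
forced = evaluate("(4+3)<<2")
```

Python's own operator precedence puts shifts below addition, so the native Python expression 4+3<<2 gives 28, the NASM-like result.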
↑ Expressions
Numeric and logical expressions ↓
Address expressions ↓
Register expressions ↓
Data expressions ↓
Special expressions ↓
An expression is a combination of operands, operators and priority parentheses ( ) which follows the rules in the table below.
Syntax of expression
What may follow           left parenthesis  unary operator  operand  binary operator  right parenthesis  end of expression
beginning of expression   yes               yes             yes      no               no                 yes (2)
left parenthesis          yes               yes             yes      no               yes (2)            no
unary operator            yes               no              yes      no               no                 no
operand                   no                no              no       yes              yes                yes
binary operator           yes               yes (1)         yes      no               no                 no
right parenthesis         no                no              no       yes              yes                yes
(1) A unary operator is permitted after a binary operator, e.g. 5*-3 evaluates as 5*(-3) .
(2) An empty expression, empty parenthesis contents and redundant parentheses are valid.
The table shows which combinations are permitted. It should be read by rows, for instance the first line stipulates that expression may begin with the left parenthesis, unary operator
or an operand.
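The table can be read as a transition relation between kinds of elements. The Python sketch below encodes its rows and checks a sequence of element kinds. The kind names and the function is_valid are hypothetical. The balance check of parentheses is an addition of this sketch, since the table itself only describes neighbouring elements.

```python
# Each row of the table: which kinds of element may follow the given one.
FOLLOWERS = {
    "begin":   {"(", "unary", "operand", "end"},
    "(":       {"(", "unary", "operand", ")"},
    "unary":   {"(", "operand"},
    "operand": {"binary", ")", "end"},
    "binary":  {"(", "unary", "operand"},
    ")":       {"binary", ")", "end"},
}

def is_valid(kinds):
    """Check a sequence of element kinds against the table and count parentheses."""
    previous, depth = "begin", 0
    for kind in list(kinds) + ["end"]:
        if kind not in FOLLOWERS.get(previous, set()):
            return False
        if kind == "(":
            depth += 1
        elif kind == ")":
            depth -= 1
            if depth < 0:
                return False
        previous = kind
    return depth == 0

# 5*-3 : operand, binary operator, unary operator, operand
unary_after_binary = is_valid(["operand", "binary", "unary", "operand"])
```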
An expression is parsed into elementary unary and binary operations, which are calculated according to their priority. Operations with the same priority are computed from left to right. Priority can be increased using parentheses ( ) .
↑ Numeric and logical expressions
String compare ↓
Numeric compare ↓
Numeric arithmetic ↓
Shift ↓
Bitwise arithmetic ↓
Boolean algebra ↓
Numeric operations calculate internally with 64-bit integers, no matter whether the target program is intended to run in 64-bit mode or not.
The result of a numeric or logical expression is a scalar 64-bit numeric value (signed integer). It may be treated as a number or as a logical value. A zero result is treated as boolean false and any nonzero result as boolean true . Pure logical expressions, such as logical NOT, AND, OR, XOR and all compare operations, return 0 when false and 0xFFFF_FFFF_FFFF_FFFF = -1 when true. This makes it possible to use the result of a logical expression in subsequent bitwise operations with all bits.
↑ String compare
String compare expressions return a boolean value. Case insensitive versions convert both strings to the same case before the actual comparison; however this concerns ASCII letters
A..Z only. National letters with accents in any codepage are always compared case sensitively.
String compare is given the highest priority since no other assembly-time operation can be performed with strings besides the test of equality. At assembly time €ASM cannot tell
which string is "bigger".
|00000000:FFFFFFFFFFFFFFFF | DQ "EAX" == "eax" ; TRUE, the strings are equal.
|00000008:0000000000000000 | DQ "EAX" === "eax" ; FALSE, the strings differ in character case.
|00000010:FFFFFFFFFFFFFFFF | DQ "I'm OK." === 'I''m OK.' ; TRUE, their netto value is equal.
|00000018:0000000000000000 | DQ "Müller" == "MÜLLER" ; FALSE because of the different case of umlauted U's.
|00000020:0000000000000000 | DQ "012" == "12" ; FALSE, the strings are not equal.
|00000028:0000000000000000 | DQ "123" = 123 ; FALSE; the character constant "123"=3355185 which is not 123.
|00000030: | DQ "123" == 123 ; Syntax error; right operand is not a string.
|### E6321 String compare InsensEqual with non-string operand in expression ""123" == 123".
|00000030:
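The ASCII-only case folding described above can be modeled outside of €ASM. The following Python sketch is only an illustration of the rule, not €ASM code; the function names ascii_fold , insens_equal and sens_equal are hypothetical.

```python
TRUE, FALSE = 0xFFFF_FFFF_FFFF_FFFF, 0

def ascii_fold(text):
    """Upper-case the ASCII letters a..z only; accented letters stay untouched."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)

def insens_equal(left, right):
    """Model of the case-insensitive compare == ."""
    return TRUE if ascii_fold(left) == ascii_fold(right) else FALSE

def sens_equal(left, right):
    """Model of the case-sensitive compare === ."""
    return TRUE if left == right else FALSE

print(hex(insens_equal("EAX", "eax")))        # all bits set: the strings are equal
print(hex(insens_equal("Müller", "MÜLLER")))  # 0x0: the umlauted letters differ in case
```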
Case insensitive string compare should be used with built-in €ASM elements, such as register or datatype names , e.g.
%IF '%1' !== 'ECX'
%ERROR Only register ECX is expected as the first macro operand.
%ENDIF
When we are investigating the presence of punctuation, it's better to use case-sensitive compare, because it assembles faster (€ASM doesn't have to convert both sides to a
common character case):
DoSomethingWithMemoryVar %MACRO
%IF '%1[1]' !=== '[' ; Test if the 1st operand begins with a square bracket.
%ERROR The first operand should be a memory variable in [brackets].
%ENDIF
%ENDMACRO DoSomethingWithMemoryVar
The test for a square bracket in the previous example fails if the macro operand is a string or character constant in quotes, e.g. DoSomethingWithMemoryVar 'xyz' . The string compare
operation will raise E6101 Expression "''' !=== '" is followed by unexpected character "[". because of a syntax error. A trick to avoid E6101 is to compare doubled values.
In this case both single and double quotes escape themselves:
DoSomethingWithMemoryVar %MACRO
%IF '%1[1]%1[1]' !=== '[[' ; Test if the 1st operand begins with a square bracket.
%ERROR The first operand should be a memory variable in [brackets].
%ENDIF
%ENDMACRO DoSomethingWithMemoryVar
↑ Numeric compare
The numeric compare operations use a single equal sign = , optionally combined with < or > and they can compare values of two plain numbers or offsets of two addresses within
the same segment.
Numeric compare can be used to test which side of operation is bigger. Terms above/below are used when comparing unsigned numbers or addresses. Terms greater/lower are
used for comparing signed numbers. Operators which treat numbers as signed are prefixed with # modifier. Virtual addresses are always unsigned, therefore we cannot ask whether
they are greater or lower.
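The difference between the unsigned (above/below) and signed (greater/lower) view of the same 64-bit value can be sketched in Python. This is a model of the concept only; the helper names are hypothetical and the €ASM operator spelling is deliberately not reproduced here.

```python
MASK64 = (1 << 64) - 1

def as_unsigned(value):
    """Interpret the low 64 bits as an unsigned number."""
    return value & MASK64

def as_signed(value):
    """Interpret the low 64 bits as a two's complement signed number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def is_below(a, b):
    """Unsigned comparison, as used for addresses."""
    return as_unsigned(a) < as_unsigned(b)

def is_lower(a, b):
    """Signed comparison."""
    return as_signed(a) < as_signed(b)

# The bit pattern 0xFFFF_FFFF_FFFF_FFFF is the largest unsigned value, but -1 as signed.
print(is_below(0xFFFF_FFFF_FFFF_FFFF, 1), is_lower(0xFFFF_FFFF_FFFF_FFFF, 1))
```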
↑ Numeric arithmetic
Common arithmetic operations are Addition, Subtraction, Multiplication, Division and Modulo (remainder after division).
Unary minus may be applied to scalar numeric operand only. Unary plus does not change the value of operand; it is included in the operator set only for completeness. Adjacent
binary and unary numeric operators are accepted by €ASM, however weird this may seem. This is useful when evaluating expressions with a substituted value, such as 5 + %1 where the
symbolic argument %1 happens to be negative, e.g. -2 . This expression is calculated as 5 + %1 -> 5 + -2 -> 5 + (-2) -> 3 .
The greatest permitted value of an integer number in €ASM source is 0xFFFF_FFFF_FFFF_FFFF -> 18_446_744_073_709_551_615 as unsigned, or
0x7FFF_FFFF_FFFF_FFFF -> 9_223_372_036_854_775_807 as signed. Overflow at assembly time is ignored in Addition, Subtraction and Shift Logical operations. An assembly error is
reported when overflow occurs during Multiplication or Shift Arithmetic Left, or when division by zero happens during Division or Modulo. This maximum must
not be exceeded even in intermediate results during the evaluation, such as 0x7FFF_FFFF_FFFF_FFFF * 2 / 2 (€ASM reports error). However, the rearranged expression
0x7FFF_FFFF_FFFF_FFFF * (2 / 2) assembles well.
No overflow is reported in following examples of numeric expressions evaluation:
|00000000:0E00000000000000 | DQ 2 + 3 * 4 ; Result is 14.
|00000008:0200000000000000 | DQ 0xFFFF_FFFF_FFFF_FFF9 + 0x0000_0000_0000_0009 ; Result is 2.
|00000010:0200000000000000 | DQ -7 + 9 ; Result is 2 (0xFFFF_FFFF_FFFF_FFF9 + 0x0000_0000_0000_0009).
|00000018:0200010000000000 | DQ 0xFFF9 + 0x0009 ; Result is 65538 (0x0000_0000_0000_FFF9 + 0x0000_0000_0000_0009).
|00000020: |
€ASM calculates integer truncated division and Modulo at assembly time in the same way as the machine instruction IDIV.
Before the signed division applies, both dividend and divisor are internally converted to positive numbers. Then, having been divided as unsigned, the quotient is converted to
negative if one of the operands (but not both) was negative.
Remainder in signed modulo operation is converted to negative only when the dividend was negative.
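The sign rules above differ from languages which round the quotient down. A minimal Python sketch of the described procedure follows; the names signed_div and signed_mod are hypothetical, and Python's own // operator is shown only for contrast.

```python
def signed_div(dividend, divisor):
    """Truncated division: divide magnitudes, then fix the sign of the quotient."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = abs(dividend) // abs(divisor)
    # The quotient is negative when exactly one operand is negative.
    return -quotient if (dividend < 0) != (divisor < 0) else quotient

def signed_mod(dividend, divisor):
    """Remainder takes the sign of the dividend."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = abs(dividend) % abs(divisor)
    return -remainder if dividend < 0 else remainder

print(signed_div(-7, 2), signed_mod(-7, 2))   # -3 -1
print(-7 // 2, -7 % 2)                        # -4 1 (Python rounds down instead)
```

In both conventions the identity quotient * divisor + remainder = dividend holds; only the rounding direction differs.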
↑ Shift
The shift operations are not commutative. Operand on the left side is treated as a 64-bit integer and shifted to the left or right by the number of bits specified by the operand on the
right side.
Shift operations at assembly time are given higher priority than other numeric operations because they correspond with computing a power of 2 rather than with multiplication or division.
For instance 1 << 7 is equivalent to 1 * 2⁷ .
NASM evaluates the expression 4 + 3 << 2 as (4 + 3) << 2 -> 28 , but in €ASM it is evaluated as 4 + (3 << 2) -> 16 .
Bits which enter the least significant bit (LSb) during Shift Left operation are always 0. Bits which enter the most significant bit (MSb) during Shift Right operation are either 0 (Shift
Logical Right), or they copy their previous value (Shift Arithmetic Right), thus preserving the sign of operand.
Bits which leave LSb during Shift Right are discarded. Bits which leave MSb during Shift Left are discarded, too, but overflow error E6311 is reported by €ASM when the sign of result
(kept in MSb) has changed during Shift Arithmetic Left. Overflow sensitivity is the only difference between Shift Arithmetic Left and Shift Logical Left.
The right operand may be an arbitrary number; however, when it is greater than 64, the result is 0 with one exception: a negative number shifted arithmetically right by more than 64 bits results
in 0xFFFF_FFFF_FFFF_FFFF -> -1 .
Shift by 0 bits does nothing. Shift by a negative number just reverses the direction of actual shift from left to right and vice versa.
Assembly-time rotate operations are not supported.
|00000000:0000010000000000 | DQ 1 << 16 ; The result is 65536.
|00000008:F4FFFFFFFFFFFFFF | DQ -3 #<< 2 ; The result is -12.
|00000010:8078675645342312 | DQ 0x1122_3344_5566_7788 << 4 ; The result is 0x1223_3445_5667_7880.
|00000018:98A9BACBDCEDFE0F | DQ 0xFFEE_DDCC_BBAA_9988 >> 4 ; The result is 0x0FFE_EDDC_CBBA_A998.
|00000020:98A9BACBDCEDFEFF | DQ 0xFFEE_DDCC_BBAA_9988 #>> 4 ; The result is 0xFFFE_EDDC_CBBA_A998.
|00000028:0000000000000000 | DQ 0x8000_0000_0000_0000 << 1 ; The result is 0x0000_0000_0000_0000.
|00000030: | DQ 0x8000_0000_0000_0000 #<< 1 ; Overflow, MSb would have been changed.
|### E6311 ShiftArithmeticLeft 64-bit overflow in "0x8000_0000_0000_0000 #<< 1".
|00000030: |
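The four shift operations from the listing above can be modeled in Python. This is a sketch under stated assumptions: the function names shl , shr , sal and sar are hypothetical stand-ins for << , >> , #<< and #>> , and the overflow test in sal (the result must still fit into a signed 64-bit number) is my reading of "the sign has changed". How €ASM treats oversized shift counts in #<< is not modeled.

```python
MASK64 = (1 << 64) - 1

def to_signed(value):
    """Interpret the low 64 bits as a two's complement signed number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def shl(value, count):
    """Shift Logical Left; a negative count reverses the direction."""
    if count < 0:
        return shr(value, -count)
    return ((value & MASK64) << count) & MASK64

def shr(value, count):
    """Shift Logical Right; zeros enter the MSb."""
    if count < 0:
        return shl(value, -count)
    return (value & MASK64) >> count

def sal(value, count):
    """Shift Arithmetic Left; raises when the sign bit would change."""
    if count < 0:
        return sar(value, -count)
    result = to_signed(value) << count
    if not -(1 << 63) <= result < (1 << 63):
        raise OverflowError("ShiftArithmeticLeft 64-bit overflow")
    return result & MASK64

def sar(value, count):
    """Shift Arithmetic Right; copies of the sign bit enter the MSb."""
    if count < 0:
        return sal(value, -count)
    return (to_signed(value) >> count) & MASK64

print(hex(shr(0xFFEE_DDCC_BBAA_9988, 4)))   # 0xffeeddccbbaa998
print(hex(sar(0xFFEE_DDCC_BBAA_9988, 4)))   # 0xfffeeddccbbaa998
```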
↑ Bitwise arithmetic
Bitwise NOT, AND, OR, XOR perform the logical operation on whole operands, bit by bit.
|0000:FA | DB ~ 5 ; ~ 0000_0101b is 1111_1010b which is -6.
|0001:04 | DB 5 & 12 ; 0000_0101b & 0000_1100b is 0000_0100b which is 4.
|0002:0D | DB 5 | 12 ; 0000_0101b | 0000_1100b is 0000_1101b which is 13.
|0003:09 | DB 5 ^ 12 ; 0000_0101b ^ 0000_1100b is 0000_1001b which is 9.
↑ Boolean algebra
Logical NOT, AND, OR, XOR operate with the numbers as well as with the boolean values.
Each operand, which is internally stored as a nonzero 64-bit number, is converted to boolean true ( 0xFFFF_FFFF_FFFF_FFFF ) before the actual logical operation.
Operand with the value 0 is treated as false.
|0000:FF | DB 3 && 4 ; 0000_0011b && 0000_0100b is TRUE && TRUE (both operands are non-zero) which is TRUE.
|0001:00 | DB 3 & 4 ; 0000_0011b & 0000_0100b have no common bit set, result is 0000_0000b, which is FALSE.
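The contrast between 3 && 4 and 3 & 4 can be reproduced with a small Python model. The helper names are hypothetical; the only facts taken from the text are that any nonzero operand becomes 0xFFFF_FFFF_FFFF_FFFF before a logical operation and that zero stays false.

```python
TRUE = 0xFFFF_FFFF_FFFF_FFFF
FALSE = 0

def to_bool(value):
    """Normalize a 64-bit number to boolean: nonzero becomes all ones."""
    return TRUE if value & TRUE else FALSE

def logical_and(a, b):
    """Model of && : operands are normalized first."""
    return to_bool(a) & to_bool(b)

def logical_not(a):
    """Model of logical NOT."""
    return to_bool(a) ^ TRUE

def bitwise_and(a, b):
    """Model of & : bits are combined as they are."""
    return a & b & TRUE

print(hex(logical_and(3, 4) & 0xFF))   # 0xff, the byte emitted by DB 3 && 4
print(hex(bitwise_and(3, 4)))          # 0x0, no common bit is set
```

Because true is represented with all bits set, a logical result can serve directly as a mask in a following bitwise operation.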
↑ Address expressions
Numeric expressions operate with immediate numeric values, such as 1, 0x23, '4567' or with symbols representing such scalar numeric value, such as
NumericSymbolTen EQU 10 . On the other hand, most symbols in a real assembler program represent address value which points to some data in memory or to some position in the
program code.
While a plain number (scalar) is internally stored by €ASM in eight bytes, an address needs additional room to keep information of the segment it belongs to.
Imagine yourself driving a car. You're passing the milestone 123 on a highway when some friends of yours ring you up that they're passing the milestone 97 . How far are you from one another? The
answer is as easy as subtracting only when you are both driving on the same highway.
The set of operations defined with address symbols is very limited in comparison with numeric expressions. Addresses cannot be multiplied, divided, shifted or used in logical operations. Only two
kinds of operations are allowed with addresses:
1. A scalar numeric value may be added to the address symbol or subtracted from it. The result is an address symbol again; this operation affects the offset part of the address;
the segment part remains intact.
2. Two symbols may be subtracted from one another (or compared with one another) if they both belong to the same segment. The result is a scalar numeric value calculated as
the difference of their offsets.
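The two permitted operations can be expressed as a tiny Python data type. This is only a model of the rules above, with a hypothetical class name Address ; it says nothing about how €ASM stores addresses internally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    segment: str   # name of the segment the address belongs to
    offset: int    # distance from the start of that segment

    def __add__(self, scalar):
        # Rule 1: address + scalar is an address in the same segment.
        return Address(self.segment, self.offset + scalar)

    def __sub__(self, other):
        if isinstance(other, Address):
            # Rule 2: address - address is a scalar, same segment only.
            if other.segment != self.segment:
                raise ValueError("addresses belong to different segments")
            return self.offset - other.offset
        # Rule 1 again: address - scalar is an address.
        return Address(self.segment, self.offset - other)

you = Address("[HIGHWAY]", 123)
friends = Address("[HIGHWAY]", 97)
print(you - friends)   # 26, both milestones are on the same highway
```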
↑ Register expressions
Memory variables are addressed as the offset from the first byte of used memory segment ( displacement ) which may be updated at run-time with the contents of one or two
registers. Notation of such address is called register expression or memory address expression .
Unlike instructions with an immediate number embedded in the instruction code, such as ADD EAX,1234 , machine instructions which load data from memory or store data to memory must
have the entire operand enclosed in brackets [ ] . For instance ADD EAX,[1234] , where 1234 is the offset of a dword variable in the data segment from which the addend is loaded.
MASM allows square brackets to be omitted even when the operand is a variable defined in memory, for instance ADD EAX,Something . A poor reader of a MASM program has to search for the definition of
the variable to learn whether it was defined in memory ( Something DD 1 ) or as a constant ( Something EQU 1 ). Luckily, newer assemblers have abandoned this design flaw.
When the address expression is used in a machine instruction, it may be completed with register names; it then becomes a register address expression . The complete address expression
follows the schema
segment: base + scale * index + displacement where
segment is segment register CS, DS, ES, SS, FS, GS ,
base is BX, BP in 16-bit addressing mode, or EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI, R8D..R15D in 32-bit addressing mode, or
RAX, RBX, RCX, RDX, RBP, RSP, RSI, RDI, R8..R15 in 64-bit addressing mode,
scale is a numeric expression which evaluates to a scalar number 0, 1, 2, 4 or 8 ,
index is SI, DI in 16-bit addressing mode, or EAX, EBX, ECX, EDX, EBP, ESI, EDI, R8D..R15D in 32-bit addressing mode, or RAX, RBX, RCX, RDX, RBP, RSI, RDI, R8..R15 in
64-bit addressing mode,
displacement is an address or numeric expression with magnitude (width) not exceeding the addressing mode.
Some assemblers allow different syntax of memory addressing, for instance MOV EAX,Displ[ESI] , MOV EAX,dword ptr [Displ+ESI] , MOV EAX,Displ+[4*ESI] , MOV EAX,Displ+4*[ESI]+[EBX] .
EuroAssembler requires that the whole operand is enclosed in square brackets: MOV EAX,[Displ+4*ESI+EBX] .
The order of components in an addressing expression is arbitrary. Any portion of the register address expression may be omitted.
Scale is not permitted in 16-bit addressing mode, and scale cannot be used if an index register is not specified.
ESP and RSP cannot be used as index register (they cannot be scaled).
Addressing modes of different sizes cannot be mixed in the same instruction; for instance, [EBX+SI] is not valid.
16-bit addressing mode is not available in 64-bit CPU mode.
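The restrictions listed above can be collected into a small checker. The Python sketch below is an illustration with hypothetical names ( register_bits , check_register_expression ); it covers only the rules on scale, index register and mixed sizes, and it judges register width by name alone.

```python
REG16 = {"BX", "BP", "SI", "DI"}

def register_bits(name):
    """Width of an addressing register, judged by its name only."""
    if name in REG16:
        return 16
    if name.startswith("E") or (name.startswith("R") and name.endswith("D")):
        return 32          # EAX..EDI, R8D..R15D
    return 64              # RAX..RDI, R8..R15

def check_register_expression(base=None, index=None, scale=None):
    """Return the list of violated rules; an empty list means none was violated."""
    problems = []
    widths = {register_bits(reg) for reg in (base, index) if reg}
    if len(widths) > 1:
        problems.append("addressing modes of different sizes are mixed")
    if scale is not None:
        if scale not in (0, 1, 2, 4, 8):
            problems.append("scale must be 0, 1, 2, 4 or 8")
        if index is None:
            problems.append("scale requires an index register")
        if 16 in widths:
            problems.append("scale is not permitted in 16-bit addressing mode")
    if index in ("ESP", "RSP"):
        problems.append("ESP and RSP cannot be used as index register")
    return problems

print(check_register_expression("EBX", "ESI", 4))   # []
print(check_register_expression("EBX", "SI"))       # mixed sizes are reported
```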
Registers allowed in addressing modes
16-bit addressing mode in 16-bit and 32-bit segment
    base register    BX SS:BP
    index register   SI DI
    displacement     16-bit signed integer, sign-extended to segment's width at run-time
32-bit addressing mode in 16-bit and 32-bit segment
    base register    EAX EBX ECX EDX ESI EDI SS:EBP SS:ESP
    index register   EAX EBX ECX EDX ESI EDI EBP
    displacement     32-bit signed integer, sign-extended|truncated to segment's width at run-time
32-bit addressing mode in 64-bit segment
    base register    EAX EBX ECX EDX ESI EDI SS:EBP SS:ESP R8D..R15D
    index register   EAX EBX ECX EDX ESI EDI EBP R8D..R15D
    displacement     32-bit signed integer, sign-extended to segment's width at run-time
64-bit addressing mode in 64-bit segment
    base register    RAX RBX RCX RDX RSI RDI SS:RBP SS:RSP R8..R15
    index register   RAX RBX RCX RDX RSI RDI RBP R8..R15
    displacement     32-bit signed integer, sign-extended to segment's width at run-time
MOFFS addressing mode in 16-bit, 32-bit and 64-bit segment
    base register    none
    index register   none
    displacement     unsigned integer of segment's width (16|32|64 bits)
When the segment register is not explicitly specified, a default segment is used for addressing the operand. If BP, EBP, RBP, ESP or RSP is used as the base register, the default
segment is SS , otherwise it is DS . A nondefault segment register used for data retrieval may be specified either as an explicit instruction prefix
SEGCS SEGDS SEGES SEGSS SEGFS SEGGS , or as a segment register which becomes part of the register expression (implicit segment override). The segment register may be included
in the expression either with colon : (segment separator) or with plus + (indexing operator).
There is a subtle difference between implicit and explicit segment override: if it requests the same segment register which is already used as the default, €ASM emits the prefix only
when it is specified explicitly (in the prefix field of the statement).
We don't have to bother with implicit segment selection in 32-bit and 64-bit FLAT model programs, because both SS and DS are loaded with the same segment descriptor at load-time.
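The default-segment rule and the difference between implicit and explicit override can be summarized in a short Python model. The function names default_segment and emits_prefix are hypothetical; the sketch only restates the rules given in the text and does not generate any machine code.

```python
STACK_BASES = {"BP", "EBP", "RBP", "ESP", "RSP"}

def default_segment(base=None):
    """SS when the base register addresses the stack, otherwise DS."""
    return "SS" if base in STACK_BASES else "DS"

def emits_prefix(base, segment=None, explicit=False):
    """Decide whether a segment-override prefix is emitted.

    segment  -- requested segment register, or None when none was requested
    explicit -- True when it was given in the prefix field (SEGDS, SEGES, ...),
                False when it was part of the register expression
    """
    if segment is None:
        return False
    # An explicit prefix is always emitted; an implicit override
    # is emitted only when it differs from the default segment.
    return explicit or segment != default_segment(base)

print(default_segment("EBP"))                     # SS
print(emits_prefix("EBX", "DS", explicit=False))  # False, DS is the default anyway
print(emits_prefix("EBX", "DS", explicit=True))   # True
```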
Although the operators * or + in register address expression look like an ordinary multiplication or addition, they specify a very different kind of operation called Scaling or Indexing when applied to
a register. The actual multiplication or addition is performed at run-time rather than at assembly-time, because the assembler cannot know the contents of registers.
Indexing operation has lower priority than the corresponding Multiplication. Hence, the register expression [EBX + 5 + ESI * 2 * 2] is evaluated as
[EBX + 5 + ESI * (2 * 2)] -> [EBX + 5 + ESI * 4] .
↑ Data expressions
Data expression specifies static data declared with pseudoinstruction D or with literals. Format of data expression is
duplicator * type value , where duplicator is a non-negative integer number, type is primitive data type in full BYTE UNICHAR WORD DWORD QWORD TBYTE OWORD YWORD ZWORD INSTR or
short B U W D Q T S O Y Z I notation, or a structure name. Optional value defines the contents of data which is repeated duplicator times.
Duplication is not a commutative operation; the duplicator must be on the left side of the duplication operator * . The default duplicator value is 1 (the data is not duplicated). Nested duplication
is not supported in €ASM. The priority of duplication is very low, so the data expression 2 + 3 * B 4 is evaluated as five bytes, each containing the value 4.
See also pseudoinstruction D and tests t2480, t2481, t2482 for more examples.
↑ Special expressions
Membership ↓
Range ↓
Substring ↓
Sublist ↓
The remaining expressions are not calculated by the mathematical expression evaluator; they are evaluated by the parser.
↑ Membership
The full stop (point) . which joins two identifiers makes them a fully qualified name (FQN), which looks like a namespace identifier followed by the local name. FQN is
nonlocal, it never starts with a full stop. For instance, when a local symbol .bar is declared in a procedure or structure Foo , it is treated by €ASM as a symbol with FQN Foo.bar .
Namespace can be local, too, so the membership operation can nest.
↑ Range
Range is defined as two numeric expressions separated with range operator, which is .. (two adjacent fullstops) and it represents the set of integer numbers between those
values, including the first and the last value.
A range has the property slope , which can be negative, zero or positive. Slope is defined as the sign of the difference between the right and the left value.
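The slope definition translates directly into a one-line computation. The Python function below is a hypothetical illustration of that definition, not a part of €ASM.

```python
def slope(left, right):
    """Sign of (right - left): -1, 0 or +1."""
    difference = right - left
    return (difference > 0) - (difference < 0)

print(slope(1, 10))   # 1, positive slope
print(slope(3, 3))    # 0, zero slope
print(slope(5, 3))    # -1, negative slope
```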
↑ Substring
Substring is an operation which returns only part of the input text. Substring operator is a range enclosed in a pair of square brackets [] . The text is treated as a sequence of 8-bit
characters (bytes) and the range specifies which of them are used.
↑ Sublist
Sublist operation is similar to Substring with the difference that curly brackets {} are used instead of square brackets [] and that it treats the input text as an array of comma-separated items (in
case of %variable expansion), or as a sequence of physical lines (in case of file inclusion).
INCLUDE "MySource.asm"{1..10} ; Include the first ten lines of file "MySource.asm"
When applied to files, the file name must always be specified in double quotes.
Characters and items are 1-based; the first suboperable member (character/item/line) has number 1.
The ordinal number of the last character|item|line of the input text is automatically assigned by €ASM to a special preprocessing variable with the name %& . This %variable is valid only in the
suboperation, it cannot be used outside the brackets.
You can use pseudoinstruction %SETS to get the number of characters assigned to a %variable, or pseudoinstruction %SETL to get the number of items in it (array length).
You can use attribute operator FILESIZE# to get the number of bytes in a file at assembly-time.
In Substring operation the value of automatic %variable %& specifies the number of characters assigned in the %variable or it specifies the size of the included file or the object file in
bytes.
In Sublist operation it represents the ordinal number of the last non-empty item in the %variable, or the number of physical lines in the included file.
A suboperated included file must be enclosed in double quotes even when its name doesn't contain spaces. The opening square bracket must immediately follow the input value
(%variable name or the quote which terminates the filename). No white spaces are allowed between the %variable and the suboperation left bracket.
Suboperations are very tolerant about the range values. No warning is reported when they refer to a nonexisting character or item, for instance when the range member is zero or
negative. Ranges with negative slope simply return nothing. Ranges with zero slope return one character|item|line when the index is between 1 and %& , otherwise they return
nothing.
|4142434445464748 |%Sample %SET ABCDEFGH ; Variable %Sample now contains 8 characters.
|0000:4142434445 | DB "%Sample[-3..5]" ; DB "ABCDE"
|0005:434445464748 | DB "%Sample[ 3..99]" ; DB "CDEFGH"
|000B:43 | DB "%Sample[ 3..3]" ; DB "C"
|000C: | DB "%Sample[5..3]" ; DB ""
|000C:4142434445464748205B352E2E335D | DB "%Sample [5..3]" ; DB "ABCDEFGH [5..3]" ; Not a suboperation.
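The tolerant handling of range values shown in the listing can be modeled with ordinary slicing. The Python sketch below uses a hypothetical function substring ; it clips the range to 1..%& and returns nothing for a negative slope, which is all that the rules above require. It works on Python strings and ignores the byte-oriented nature of €ASM text.

```python
def substring(text, first=1, last=None):
    """Return characters first..last (1-based, inclusive), clipped to the text."""
    end = len(text)                 # plays the role of %&
    if last is None:
        last = end
    first = max(first, 1)           # zero or negative index is tolerated
    last = min(last, end)           # index behind the end is tolerated
    return text[first - 1:last] if first <= last else ""

sample = "ABCDEFGH"
print(substring(sample, -3, 5))     # ABCDE
print(substring(sample, 3, 99))     # CDEFGH
print(substring(sample, 5, 3))      # empty line, negative slope returns nothing
```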
Either value of the range may be omitted; it is then given its default value. The default minimum index is 1 . The default maximum index is %& .
|4142434445464748 |%Sample %SET ABCDEFGH ; Preprocessing variable %Sample now contains 8 characters.
|0000:4142434445 | DB "%Sample[..5]" ; -> DB "%Sample[1..5]" -> DB "ABCDE"
|0005:434445464748 | DB "%Sample[3..]" ; -> DB "%Sample[3..8]" -> DB "CDEFGH"
|000B:4142434445464748 | DB "%Sample[..]" ; -> DB "%Sample[1..8]" -> DB "ABCDEFGH"
|0013:4142434445464748 | DB "%Sample[]" ; -> DB "%Sample[1..8]" -> DB "ABCDEFGH"
The last notation in the previous example is useful when concatenating a %variable with literal text, for instance when we need to append 123 to the %variable
contents. We cannot write %variable123 because the appended digits change the name of the original %variable. The solution is to use an empty suboperation, which doesn't change the
%variable contents but separates its name from the successive text: %variable[]123 or %variable{}123 .
When the brackets contain only one index without the range operator, it is treated as both minimum and maximum value and only one character|item|line is expanded:
%Sample[3] -> %Sample[3..3] -> C .
Suboperations may be chained. The chain is processed from left to right. Example:
|4142432C4445462C2C4748492C4A4B4C |%Sample %SET ABC,DEF,,GHI,JKL ; %& is now 16 in %Sample[%&] and 5 in %Sample{%&}.
|0000:4A4B | DB "%Sample{4..5}[2..6]{2}" ; DB "JK"
The first sublist in the previous example takes items 4 and 5, giving the list of two items GHI,JKL . The next substring extracts the second to sixth characters from that sublist, giving
HI,JK . The last sublist operation expands the second item, which is JK .
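The chain from the previous example can be replayed step by step in Python. This is a sketch with hypothetical helpers ( clip , substring , sublist ); it assumes that items are separated by plain commas and that %& in a sublist is the ordinal number of the last non-empty item, as stated earlier in this chapter.

```python
def clip(members, first, last, end):
    """Select members first..last (1-based, inclusive), tolerating bad indexes."""
    first = 1 if first is None else max(first, 1)
    last = end if last is None else min(last, end)
    return members[first - 1:last] if first <= last else members[:0]

def substring(text, first=None, last=None):
    """Model of text[first..last]."""
    return clip(text, first, last, len(text))

def sublist(text, first=None, last=None):
    """Model of text{first..last} applied to comma-separated items."""
    items = text.split(",")
    # %& in a sublist: ordinal number of the last non-empty item.
    end = max((n for n, item in enumerate(items, 1) if item), default=0)
    return ",".join(clip(items, first, last, end))

sample = "ABC,DEF,,GHI,JKL"
step1 = sublist(sample, 4, 5)     # GHI,JKL
step2 = substring(step1, 2, 6)    # HI,JK
step3 = sublist(step2, 2, 2)      # JK
print(step1, step2, step3)
```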
Suboperations may be nested. Inner ranges are calculated before the outer ones:
|31323334353637383930 |%Sample %SET 1234567890
|0000:3233343536 | DB "%Sample[2..%Sample[6]]" ; -> DB "%Sample[2..6]" -> DB "23456"
↑ Sections
For each emitting statement the assembler generates some data or machine code which will be dumped to the output file in the end. Fortunately we don't have to write the whole
program in the exact sequence which is required by the output file format. Assembled data and code is tossed on demand to one of several output sections . The statement, which
will switch assembly to a different section, is quite simple: just the name of the section in square brackets [ ] in the label field of the statement.
Imagine that you (the programmer) act like a manager dictating some code and data to your secretary (EuroAssembler). You have dictated a few instructions, which were written in shorthand by
your secretary on a sheet of paper labeled [TEXT] . Then you decided to dictate other kind of data. The secretary will grab another sheet, label it [DATA] and start to write there. Later, when you
want to dictate some other instructions, your secretary takes the sheet labeled [TEXT] again, and continues from the point (origin) where it was interrupted.
You are free to open new sheets and to switch between them ad libitum. When the dictation ends, all used sheets will be stapled together (linked).
In EuroAssembler the term section is used for a named division of a segment. Each segment has one or more sections. By default any segment has just one section with an identical
name (base section) which was created at segment definition.
↑ Segments
Intel Architecture divides memory to segments controlled by segment registers. Segment is defined in €ASM by the pseudoinstruction SEGMENT.
In the dawn of the computer age, programmers demanded more memory than the mere 256 bytes or 64 kilobytes which were addressable by 8-bit and 16-bit registers. Designers at Intel in pre-32-bit times
might have chosen to use a pair of 16-bit general registers, such as DX:AX or SI:BX, and to address an inconceivable 4 GB of memory with them, but they didn't. Instead, they invented new 16-bit
segment registers specialized by the purpose of the addressed memory: register CS for machine code, DS for data, SS for machine stack, ES for extra temporary usage.
Segment registers are used for addressing 16-byte chunks of memory called paragraphs (alias octonary word, OWORD). Linear address in real CPU mode is calculated as a sum of the offset and the
paragraph address (the contents of the segment register multiplied by 16).
Using segment registers for addressing of 16-byte paragraphs yields 1 MB of memory addressable by each segment register, which seemed enough for everybody in those times.
Contents of the segment register in real processor mode represents paragraph address of the segment.
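Since the segment register holds a paragraph address in real mode, the linear address is the segment value multiplied by 16 plus the offset. A minimal Python model follows; the function name linear_address is hypothetical and the sketch ignores address wrap-around on processors with only 20 address lines.

```python
PARAGRAPH = 16   # size of one paragraph in bytes

def linear_address(segment, offset):
    """Real-mode linear address from 16-bit segment and 16-bit offset."""
    return (segment & 0xFFFF) * PARAGRAPH + (offset & 0xFFFF)

print(hex(linear_address(0x1234, 0x0010)))   # 0x12350
print(hex(linear_address(0xB800, 0x0000)))   # 0xb8000
```

Different segment:offset pairs may denote the same byte, for instance 0x1234:0x0010 and 0x1235:0x0000.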
Contents of the segment register in protected processor mode represents index to a descriptor table, which holds some auxiliary information about the addressed segment (beside
its address and size limit): access privileges and width.
Those auxiliary properties are fixed in real mode:
Segment at run-time is a continuous range of operational memory addressable with the contents of one segment register.
Segment at link-time is a named part of object file, which can be concatenated with segments of the same name from other linkable files.
In [MS_PECOFF] terminology the linkable segment is called a section. I think the term segment would be more appropriate here, because COFF "sections" are differentiated by access privileges
as they are addressed by different segment registers, ergo by different segment descriptors.
In our segment-highway parable, segments in flat protected mode are highway lanes running in parallel, so they share common milestones (offsets), but each lane is dedicated to a different kind of
vehicles.
Segment at write-time is a part of assembler source which begins with section switching statement , and which ends with another switching statement or with the end of
program.
There is no ENDS (end-of-segment) directive in €ASM. It is not possible to say this part of source code doesn't belong to any segment. When you write the very first statement of your source text, it
already belongs to the default (envelope) program, and every program implicitly defines its default segments. Nevertheless, when a structure or numeric constant is being defined, it is irrelevant
which segment is currently in charge, because structures and scalar symbols do not belong to any segment, no matter where the structure or symbol was defined in the source.
Segments and section divisions of assembler source do not have to be continuous. In fact, discontinuity is their main raison d'être. It allows data to be kept in the source text near the
code which manipulates it, and this is good for readability and understanding of the program function.
↑ Groups
When the segments of an assembler program are not too large, they may be coalesced into a segment group . The whole group of segments is addressable with one segment register.
Group can be defined with pseudoinstruction GROUP.
When a group is defined, e.g. [DGRP] GROUP [DATA],[STRINGS] , beside the group [DGRP] it automatically creates a segment with the same name [DGRP] (and consequently a
section with the same name [DGRP]). It also declares that segments [DATA] and [STRINGS] belong to group [DGRP] together with its base segment [DGRP]. Nevertheless, when
nothing is emitted to the implicitly defined segment [DGRP], it will be discarded in the end.
↑ Segmentation (more about sections, segments, groups)
Base segment and section ↓
Segmentation lifetime ↓
Implicit segments ↓
Segment naming conventions ↓
Loading segment registers ↓
Ordering of sections and segments ↓
Displaying the segment map ↓
The relation between segment and its sections in EuroAssembler is similar to the relation between group and its segments.
↑ Base section and segment
Whenever a segment is defined (with the pseudoinstruction SEGMENT), a section with the same name is automatically created in it (it is called base section ). Other sections of the
same segment may be created on demand later. This is done by the statement which has only the section name in its label field (there is no explicit SECTION directive in €ASM).
Section properties (class=, purpose=, combine=, align=) are inherited from the segment which they belong to. The alignment is not inherited when special literal sections
[@LT64] .. [@LT1], [@RT0], [@RT1].. are created; literal sections are aligned according to the type of data which they keep.
Whenever a group is defined (with the pseudoinstruction GROUP), a segment with the same name is created in it (it is called base segment ), together with other segments which
we want to incorporate to the group.
↑ Segmentation lifetime
Each segment has one or more sections. Each section belongs to exactly one segment. During assembly time all segments are assumed to be loaded at virtual address 0. At the end
of each assembly pass, sections are virtually linked to their segment, so each begins at a higher VA, where the preceding section ended. However, in pass 1 it is not yet known what
size those sections will have, so all sections are assumed to start at VA=0 in pass 1. When the last assembly pass ends, all sections are linked physically (their emitted contents and
relocations are concatenated to the segment=base section) and sections are then discarded. The linker is not aware of €ASM sections at all.
Why should we actually split a segment into sections? Well, it is not necessary, mostly we can get by with just one default section per segment. In big programs, on the other hand, it may be useful to
group similar kinds of data together; we may want to create separate sections for double word sized variables, for floating-point numbers, for text strings. This may save a few bytes of alignment stuff,
which would be necessary when variables of different sizes are mixed together. Literal sections are organized in that way, too.
Another occasion where sections are handy is fast retrieval from read-only "databases" defined statically somewhere in the data segment.
Database can be mentally visualized as a table with many rows and with columns containing data items of constant size. For fast selection of a particular row by an item of an "indexed" key value it is
profitable to emit all items from one column sequentially to a section, one after another. The data from every column will have their own section. The width of the "indexed" column should be padded to
1, 2, 4 or 8 bytes, so its items can be scanned with a single machine instruction REPNE SCAS . When an item is found, the difference between register rDI and the start of section identifies the
selected row index. Remaining items of this row then can be addressed with the knowledge of row index.
This access method was used in a sample project EuroConvertor and in EuroAssembler itself, where it assigns the address of an instruction handler to each of the two thousand mnemonics, see
DistLookupIi.
Each group has one or more segments. Each segment belongs to exactly one group (even when it wasn't explicitly grouped, a group with the segment's name will be implicitly
created at link time for the addressing purposes). When a program with executable format is linked, all groups are physically concatenated into an image and the loader of a real
mode executable image is not aware of groups and segments.
↑ Implicit segments and groups
€ASM creates implicit segments when it starts to assemble a program. Implicit segment names depend on the chosen program format:
Implicit segments
FORMAT= Implicit segment names
BIN [BIN]
BOOT [BOOT]
COM [COM]
OMF | MZ [CODE],[RODATA],[DATA],[BSS],[STACK]
COFF | PE | DLL | ELF | ELFX | ELFSO [.text],[.rodata],[.data],[.bss]
If you are not satisfied with the implicit segments created by €ASM, you may redefine them at the start of program or create a new set of segments with different names. Segments
and sections which were not used (nothing was emitted to them) will not be linked to output file and they can be ignored.
When the assembly ends and all segments from linked modules have been incorporated (combined) to the base program, €ASM looks at segments which are not part of any group,
and creates implicit group for them (name of the group is the same as the segment). Here the memory model is taken into account:
Models with single code segment (TINY, SMALL, COMPACT) link all code into a single group, no matter how many code segments are actually defined in the program.
Multicode models (MEDIUM, LARGE) keep each code segment in its own implicit group (if they weren't grouped explicitly), hence intersegment jumps, calls and returns should have
DIST=FAR.
Similarly, single data models (TINY, SMALL, MEDIUM) assume that all initialized and uninitialized data fit into one group not exceeding 64 KB, so the €ASM linker will assign all
data segments to the implicit group and register DS does not have to be changed when accessing data from various segments, which may have been defined in the base program
or in the linked modules.
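The rules above can be condensed into two yes/no questions per memory model. This is a small Python sketch under the assumption that only the five models named in the text are considered and that segments were not grouped explicitly; the function names are hypothetical and any other model name is rejected rather than guessed.

```python
SINGLE_CODE_MODELS = {"TINY", "SMALL", "COMPACT"}  # all code linked into one group
MULTI_CODE_MODELS = {"MEDIUM", "LARGE"}            # each code segment in its own implicit group
SINGLE_DATA_MODELS = {"TINY", "SMALL", "MEDIUM"}   # all data assigned to one implicit group
KNOWN_MODELS = SINGLE_CODE_MODELS | MULTI_CODE_MODELS


def _check(model: str) -> str:
    model = model.upper()
    if model not in KNOWN_MODELS:
        raise ValueError("model not covered by this sketch: " + model)
    return model


def intersegment_transfer_is_far(model: str) -> bool:
    """True when jumps, calls and returns between code segments should use DIST=FAR."""
    return _check(model) in MULTI_CODE_MODELS


def ds_covers_all_data(model: str) -> bool:
    """True when DS does not have to be changed to reach data in other segments."""
    return _check(model) in SINGLE_DATA_MODELS
```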
↑ Segment naming conventions
Name of the group, segment and section is always surrounded by square brackets in €ASM source.
Unlike symbols, the namespace is not prepended to a segment name when it starts with . (fullstop). Group, segment and section names are always nonlocal.
Number of characters in group|segment|section name is not limited by €ASM but it may be limited by the output format. In OMF object module the name of a group or segment must
not exceed 255 characters. In PE COFF executables the name in section header is truncated to 8 characters.
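The two format limits mentioned above can be illustrated with a short Python sketch. It is only a model: the function stored_name and the format keys "OMF" and "PE" are hypothetical, names are assumed to consist of one-byte characters, and the square brackets are assumed not to be part of the stored name.

```python
def stored_name(name: str, fmt: str) -> str:
    """Return the name as it would survive in the given output format."""
    if fmt == "OMF":
        # OMF: a group or segment name must not exceed 255 characters.
        if len(name) > 255:
            raise ValueError("name too long for OMF: %d characters" % len(name))
        return name
    if fmt == "PE":
        # PE COFF executable: the name in the section header is truncated to 8 characters.
        return name[:8]
    return name  # other formats are not modelled here
```

For example, a segment called .rodata_strings would appear as .rodata_ in a PE section header, so two long names which share their first eight characters become indistinguishable there.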
€ASM treats all names as case sensitive. If you want to link your segment with object module produced by an external compiler which converts segment name to uppercase or which
mangles the names by prepending underscores __ , you should adapt your naming convention to it.
A segment name must be unique; you cannot define two segments with an identical name in a program, except for the implicitly created segments, if they were not used yet.
However, it is possible to define segments with same names in different programs and link them together; their contents will be concatenated according to their COMBINE= property.
Similar rule applies to groups.
Section names cannot be duplicated on principle. When a section name appears in the source for the second time, it will only switch to that section rather than creating a new one.
Implicit literal section names begin with @LT or @RT, so it is better to avoid names which begin with this combination of letters.
Segments which have a dollar sign $ in their name are treated in a special way. If the characters on the left side of this $ match, all such segments will be linked adjacently in
alphabetic order.
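The dollar-sign rule can be pictured with the following Python sketch. It is a model under several assumptions which the text does not state: the prefix is taken up to the first $, "alphabetic order" is approximated by plain string comparison, and a family of matching segments is placed where its first member was defined. The function name link_order and the sample names are hypothetical.

```python
def link_order(names):
    """Reorder segment names so that $-segments with the same prefix are adjacent."""
    families = {}
    for name in names:
        if "$" in name:
            families.setdefault(name.split("$", 1)[0], []).append(name)

    ordered, emitted = [], set()
    for name in names:
        if "$" not in name:
            ordered.append(name)          # ordinary segment keeps its position
            continue
        prefix = name.split("$", 1)[0]
        if prefix not in emitted:         # emit the whole family once, sorted
            emitted.add(prefix)
            ordered.extend(sorted(families[prefix]))
    return ordered
```

With the input [".idata$5", ".text", ".idata$2", ".idata$4"] the three .idata$ segments end up next to each other as .idata$2, .idata$4, .idata$5, followed by .text.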
There are conventions for how "sections" are named in COFF modules; you may need to adapt to them to successfully link an €ASM program with modules created by different compilers.
↑ Loading segment registers
When €ASM creates a protected executable ELFX or PE 32-bit or 64-bit program format, we don't have to bother with segments, groups or stack at all. All segment registers are
preloaded by Linux or Windows and the stack is established automatically.
When the DOS launches a tiny COM program, it loads CS=DS=SS=ES with the paragraph address of its PSP , sets IP=100h and SP to the end of the stack segment, usually
0FFFEh. Again, we don't have to bother with segment registers at all.
When an MZ executable program is prepared to start, its segment registers have been set by the DOS loader. CS:IP is set to the program entry point, SS:SP is set to the top of
the machine stack, but both DS and ES point to the PSP , which is not our data segment.
There is no instruction in Intel architecture to load segment register with immediate value directly, so this is usually done via register or stack:
; Loading paragraph address of [DATA] to segment register
; using a general purpose register (which is faster):
MOV AX, PARA# [DATA]
MOV DS,AX
; or using the machine stack (which is shorter):
PUSH PARA# [DATA]
POP DS
It is the responsibility of the programmer to load a segment register with the address of another segment whenever it is used. €ASM makes no assumption about the contents of segment
registers; there is no ASSUME, USING or WRT directive in €ASM.
↑ Ordering of sections and segments
Order is generally based on four sorting keys:
Order of sections
At the end of each assembly pass all sections are linked to their segments in this order:
Order of segments
Segments are combined and linked at link time in this order:
Segments in each group are in the order as they were defined in the source (not as they were declared in the GROUP statement). The base segment is always the first in a group.
When an executable format is linked, every segment is assigned to some group, at least to the implicit one (with identical name).
Implicit groups of segments are used internally for relocation purposes only. Protected mode programs (MODEL=FLAT) do not care much about segment registers, so we don't have to bother with
groups in programs for Windows or Linux.
.rodata RODATA R FiAl|SeAl
.data DATA RW FiAl|SeAl
.bss BSS RW FiAl|SeAl
.dynamic DYNAMIC RW NrOfRec*(8|16) 8 | 16 4)
Remarks:
0) Special structure without its own section header.
1) Used in relocatable module only.
2) Used in executable image only.
3) Used in executable image only when EUROASM DEBUG=ENABLED.
4) Used in executable image only when linked with shared object library.
Access rights:
R Allocate memory in process address space and allow read.
W Allow write.
X Allow execute.
FiAl|SeAl maximum of File Alignment | Segment Alignment.
↑ Displaying the segment map
Pseudoinstruction %DISPLAY Sections prints to the listing file a complete map of groups, segments and sections defined so far at assembly time, one object per line represented by
a debugging message D1260 (group), D1270 (segment), D1280 (section). Segment is indented with two spaces, section is indented with four spaces.
Instead of %DISPLAY Sections we could use %DISPLAY Segment or %DISPLAY Groups , the result is identical. The entire group/segment/section map is always displayed with those statements.
At link time €ASM prints a similar map of groups and segments to the listing, with finally used virtual addresses, unless it was disabled with option PROGRAM LISTMAP=OFF .
↑ Distance
The distance is a property of the difference between two addresses. It is not just the numeric difference of two offsets; in €ASM this term represents one of three enumerated values:
FAR, NEAR, SHORT .
The distance of two addresses is FAR when they belong to different groups/segments, otherwise it is NEAR or SHORT. The difference of offsets is SHORT if it fits into an 8-bit signed
integer, i.e. -128..+127.
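The classification can be written down as a three-way decision. The Python sketch below is a model of the definition above, not €ASM code; an address is represented as a hypothetical pair of group (or segment) name and offset.

```python
def distance(group_a: str, offset_a: int, group_b: str, offset_b: int) -> str:
    """Classify the distance between two addresses as FAR, NEAR or SHORT."""
    if group_a != group_b:
        return "FAR"                      # different groups/segments
    difference = offset_b - offset_a
    if -128 <= difference <= 127:         # fits into an 8-bit signed integer
        return "SHORT"
    return "NEAR"
```

The boundary cases show the rule: a difference of +127 or -128 is still SHORT, while +128 is already NEAR.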
↑ Width
€ASM is a 64-bit assembler, but it can also compile programs for older CPUs which worked with 32-bit and 16-bit words only. The number of bits which the CPU works with simultaneously is
called width and it is either 16 , 32 or 64 .
Width is always measured in bits.
The width is a property of a segment. Some 32-bit object file formats allow segments of different widths to be mixed in one file. The width of addressing and operating mode can be changed ad hoc
with the instruction prefixes ATOGGLE, OTOGGLE.
Pseudoinstruction PROGRAM has the WIDTH= property, too. It will establish the default for all segments declared in the program. Program width is also used to select the format of
output file, for instance if the PExecutable should be created as 32-bit or 64-bit.
↑ Size
Size is a plain non-negative number which specifies the number of bytes in object (register, memory variable, structure, segment, file etc). Size of a string is specified in bytes, no
matter if the string is composed of ANSI or WIDE characters.
Size of an object can be computed at assembly time, using the attribute operator SIZE# or FILESIZE#.
Size of a preprocessing %variable contents can be retrieved with pseudoinstruction %SETS.
Size is always measured in bytes.
Size and length of €ASM elements (identifiers, numbers, structures, expressions, file contents, nesting depth, number of operands, etc.) are not limited by design, but such sizes are
internally stored as signed 32-bit integers, so the actual limitation is 2_147_483_647 characters. In practice we will be restricted by the amount of available memory, of course.
↑ Length
This term is used to count the number of comma-separated items in an array, for instance the length of operand list in the statement VPERMI2B XMM1,XMM2,XMM3,MASK=K4,ZEROING=ON
is 5.
Length of a preprocessing %variable contents can be retrieved with pseudoinstruction %SETL.
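The difference between size (bytes) and length (comma-separated items) can be shown on the operand list quoted above. This is a simplified Python model with hypothetical function names; it splits on every comma and does not handle commas inside quoted strings, and it takes WIDE characters to be 2 bytes each.

```python
MAX_SIZE = 2**31 - 1  # largest signed 32-bit integer, i.e. 2_147_483_647


def size_in_bytes(text: str, wide: bool = False) -> int:
    """Size is counted in bytes, whether the string is ANSI or WIDE."""
    return len(text.encode("utf-16-le" if wide else "latin-1"))


def length_in_items(operands: str) -> int:
    """Length is the number of comma-separated items (naive split)."""
    return len(operands.split(",")) if operands.strip() else 0


OPERANDS = "XMM1,XMM2,XMM3,MASK=K4,ZEROING=ON"
```

Here length_in_items(OPERANDS) gives 5, as in the example above, while the same three-character string "abc" has a size of 3 bytes as an ANSI string and 6 bytes as a WIDE string.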
↑ Namespace
The names of symbols and structures created in a program must be unique. In large projects it might be difficult to maintain unique names, especially when more people work on
separate parts of the program. That is why the programmer can use local identifiers which must be unique only in a division of source file called namespace . The namespace is a
range of the source specified by namespace block. There are four block-pseudoinstructions in €ASM which create the namespace: PROGRAM, PROC, PROC1, STRUC . The block name
is also the name of the namespace. An identifier is local when its name begins with fullstop . . Unlike with standard symbols, the characters following the leading fullstop may start
with a decimal digit and it is not an error when they form a reserved name. Example of valid local identifiers: .L1, .20, .AX .
Names of local identifiers are kept in €ASM internally concatenated with the namespace name, so they form a fully qualified name (FQN). Local symbols may be referred to with their .local name
only within their native namespace block; they may also be referred to with their fully qualified name anywhere in the program.
The namespace actually starts at the operation field of the block statement and it ends at the operation field of the corresponding endblock statement. Thanks to this, the namespace
itself (label of the block) may be local, too, and the namespaces may be nested.
Besides the namespace blocks there is one more occasion where a namespace is unfolded: operand fields of the structured data definition statement, which temporarily take over the
namespace of the structure which is being instantiated.
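How local names combine with nested namespaces into a fully qualified name can be sketched in Python. This is only a model with hypothetical function names; it assumes that a nonlocal block name starts a fresh namespace while a local block name (with a leading fullstop) is appended to the enclosing one, and it ignores the temporary namespace of structured data definitions.

```python
def current_namespace(block_names):
    """Fold the enclosing block names (outermost first) into one namespace name."""
    namespace = ""
    for name in block_names:
        # A local block label is qualified by the enclosing namespace,
        # a nonlocal label replaces it.
        namespace = namespace + name if name.startswith(".") else name
    return namespace


def fully_qualified(identifier, block_names):
    """Return the FQN of an identifier used inside the given nested blocks."""
    if identifier.startswith("."):
        return current_namespace(block_names) + identifier
    return identifier  # nonlocal identifiers are kept as they are
```

Under these assumptions the local label .L1 inside a block named Main becomes Main.L1, and .20 inside a local block .Inner nested in Main becomes Main.Inner.20.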
↑ Scope
Scope is the property of a symbol which specifies symbol visibility.
A symbol defined in the assembler program, such as label or memory variable, may be referred anywhere within the program at assembly time. Our program may be linked with
other programs, object modules or libraries, which might have misused the same name for their own symbols, but it's OK and no conflict occurs because programs are compiled
separately. This is the standard behaviour, such symbols have standard private scope and their visibility is limited to the inside of PROGRAM..ENDPROGRAM block.
When a symbol name begins with fullstop . , visibility of such a private local name is even narrower; it is limited to the smallest namespace block in which the symbol was defined
(PROC..ENDPROC, STRUC..ENDSTRUC).
On the other hand, executables which are linked from several programs (modules, libraries) need to access symbols outside their standard private scope, for instance to call an entry
point of a library function. Names of such global symbols should be unique among all linked programs.
Scope recognized in €ASM
Private:                Standard, local
Global (static link):   Public, Extern
Global (dynamic link):  eXport, Import
Scope of a symbol can be examined at assembly time with attribute operator SCOPE#, which returns ASCII value of uppercase scope shortcut, for instance
MySymbol EXTERN
MOV AL,SCOPE# MySymbol ; This is equivalent to MOV AL,'E'
Available shortcuts are underlined in the table above. The same shortcuts are also used when symbol properties are listed by %DISPLAY Symbols and after the link phase if
LISTGLOBALS=ENABLED.
GLOBAL, PUBLIC, EXTERN, EXPORT and IMPORT scope of a symbol can be explicitly declared by pseudoinstruction with the corresponding name. GLOBAL scope can be also
declared implicitly, using two (or more) terminating colons :: after the symbol name. A symbol declared as GLOBAL is either available as PUBLIC (if it is defined in the same
program), or it is marked as EXTERN (if it is not defined in the program).
Only the scopes for static linking (PUBLIC, EXTERN) can be declared by simplified global scope declaration (using two colons). When the symbol will be exported (if a DLL file is
created), or when it should be dynamically imported from other DLL, using two colons is not enough and either explicit declaration EXPORT/IMPORT symbol or LINK import_library
is required.
Word1: DW 1     ; Standard private scope.
Word2:: DW 2    ; Public scope declared implicitly (with double colon).
Word3 PUBLIC    ; Public scope declared explicitly.
Word4 GLOBAL    ; Public or extern scope (which depends on Word4 definition in this program).
Word5 GLOBAL    ; Public or extern scope (which depends on Word5 definition in this program).
Word6 EXTERN    ; Extern scope. Symbol Word6 must not be defined anywhere else in this program.
Word4:          ; Definition of symbol Word4.
 MOV EAX,Word5  ; Reference of external symbol Word5.
; Scope of Word1 is PRIVATE.
; Scope of Word2, Word3, Word4 is PUBLIC.
; Scope of Word5, Word6 is EXTERN.
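The way a declared scope turns into the final scope can be modelled in a few lines of Python. This is a sketch with hypothetical names, not €ASM internals: a double colon is treated as an implicit GLOBAL declaration, and GLOBAL resolves to PUBLIC or EXTERN depending on whether the symbol is defined in the program.

```python
def resolve_scope(declared, defined: bool) -> str:
    """declared: None, "GLOBAL", "PUBLIC", "EXTERN", "EXPORT" or "IMPORT"."""
    if declared is None:
        return "PRIVATE"                       # standard private scope
    if declared == "GLOBAL":
        return "PUBLIC" if defined else "EXTERN"
    return declared                            # explicit declarations are kept


# The symbols from the listing above: (declared scope, defined in this program).
SYMBOLS = {
    "Word1": (None, True),
    "Word2": ("GLOBAL", True),    # declared implicitly with ::
    "Word3": ("PUBLIC", True),
    "Word4": ("GLOBAL", True),
    "Word5": ("GLOBAL", False),
    "Word6": ("EXTERN", False),
}

RESOLVED = {name: resolve_scope(*props) for name, props in SYMBOLS.items()}
```

RESOLVED then reproduces the three closing comments of the listing: Word1 is PRIVATE, Word2 to Word4 are PUBLIC, Word5 and Word6 are EXTERN.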
↑ Data types
Information in computer memory or in a register represents code or data. An important property of stored texts and numbers is the data type , which is a rule specifying how to interpret
the information. €ASM recognizes the following types of data:
Fundamental data types
Typename  Short  Size  Autoalign  Width  Typical storage  Character string  Integer number  Floating-point number  Packed vector
BYTE      B      1     1          8      R8               ANSI              8-bit           -                      -
UNICHAR   U      2     2          16     R16              WIDE              -               -                      -
WORD      W      2     2          16     R16              -                 16-bit          -                      -
DWORD     D      4     4          32     R32,ST           -                 32-bit          Single precision       -
QWORD     Q      8     8          64     R64,ST           -                 64-bit          Double precision       -
TBYTE     T      10    8          80     ST               -                 -               Extended precision     -