Defining and Using Complex Data Types
Chapter 5
With the complex data types available in MASM 6.1 — arrays, strings, records,
structures, and unions — you can access data as a unit or as individual elements
that make up a unit. The individual elements of complex data types are often the
integer types discussed in Chapter 4, “Defining and Using Simple Data Types.”
“Arrays and Strings” reviews how to declare, reference, and initialize arrays and
strings. This section summarizes the general steps needed to process arrays and
strings and describes the MASM instructions for moving, comparing, searching,
loading, and storing.
“Structures and Unions” covers similar information for structures and unions:
how to declare structure and union types, how to define structure and union
variables, and how to reference structures and unions and their fields.
“Records” explains how to declare record types, define record variables, and use
record operators.
Initializer lists of array declarations can span multiple lines. The first initializer
must appear on the same line as the data type, all entries must be initialized,
and, if you want the array to continue to the next line, the line must end with a
comma. This example shows a legal multiple-line array declaration:
big BYTE 21, 22, 23, 24, 25,
26, 27, 28
If you do not use the LENGTHOF and SIZEOF operators discussed later in
this section, an array may span more than one logical line, although a separate
type declaration is needed on each logical line:
var1 BYTE 10, 20, 30
BYTE 40, 50, 60
BYTE 70, 80, 90
Referencing Arrays
Each element in an array is referenced with an index number, beginning with
zero. The array index appears in brackets after the array name, as in array[9].
In assembly language, however, the index is the number of bytes between the
element and the start of the array, not the element's position. For a WORD array
named wprime, wprime[4] represents the third element (5), which is 4 bytes from
the beginning of the array. Similarly, the expression wprime[6] represents the
fourth element (7), and wprime[10] represents the sixth element (13).
The following example determines an index at run time. It multiplies the position
by two (the size of a word element) by shifting it left:
mov si, cx ; CX holds position number
shl si, 1 ; Scale for word referencing
mov ax, wprime[si] ; Move element into AX
The offset required to access an array element can be calculated with the
following formula:
nth element of array = array[(n-1) * size of element]
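A short C model makes the arithmetic concrete. The function byte_index is a hypothetical helper, not part of MASM; it returns the bracketed index for the nth element, counting positions from 1.

```c
#include <assert.h>
#include <stddef.h>

/* Byte index used in an assembly reference such as array[index]:
   the nth element (n counted from 1) lies (n - 1) * element_size
   bytes from the start of the array. */
size_t byte_index(size_t n, size_t element_size)
{
    assert(n >= 1);                  /* positions start at 1 */
    return (n - 1) * element_size;
}
```

For the WORD array wprime, byte_index(3, 2) returns 4 and byte_index(6, 2) returns 10, the indexes used in wprime[4] and wprime[10].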
Referencing an array element by distance rather than position is not difficult to
master and is consistent with how assembly language works.
Recall that a variable name is a symbol that represents the contents of a
particular address in memory. Thus, if the array wprime begins at address
DS:2400h, the reference wprime[6] means to the processor “the word value
contained in the DS segment at offset 2400h-plus-6-bytes.”
Since brackets simply add a number to an address, you don’t need them when
referencing the first element. Thus, wprime and wprime[0] both refer to the
first element of the array wprime.
If your program runs only on an 80186 processor or higher, you can use the
BOUND instruction to verify that an index value is within the bounds of an
array. For a description of BOUND, see the Reference.
For data directives other than BYTE, a string may initialize only the first
element. The initializer value must fit into the specified size and conform to the
expression word size in effect (see “Integer Constants and Constant
Expressions” in Chapter 1), as shown in these examples:
wstr WORD "OK"
dstr DWORD "DATA" ; Legal under EXPR32 only
As with arrays, string initializers can span multiple lines. The line must end with
a comma if you want the string to continue to the next line.
str1 BYTE "This is a long string that does not ",
"fit on one line."
Strings must be enclosed in single (') or double (") quotation marks. To put a
single quotation mark inside a string enclosed by single quotation marks, use two
single quotation marks. Likewise, to put a double quotation mark inside a string
enclosed by double quotation marks, use two double quotation marks. These
examples show the various uses of quotation marks:
char BYTE 'a'
message BYTE "That's the message." ; That's the message.
warn BYTE 'Can''t find file.' ; Can't find file.
string BYTE "This ""value"" not found." ; This "value" not found.
You can always use single quotation marks inside a string enclosed by double
quotation marks, as the initialization for message shows, and vice versa.
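The doubling rule can be modeled in C. decode_literal is a hypothetical helper, not an assembler API: it takes a quoted literal exactly as written in source and produces the characters that the string stores, assuming the literal begins with its delimiter and ends with the matching closing delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decodes a quoted literal such as 'Can''t find file.' into out.
   The first character is the delimiter (' or "); a doubled delimiter
   inside the literal stands for one delimiter character, while the
   other kind of quotation mark is copied unchanged.
   Returns 0 on success, -1 if the literal is malformed or too long. */
int decode_literal(const char *lit, char *out, size_t out_size)
{
    char quote = lit[0];
    size_t len = strlen(lit);
    size_t j = 0;

    assert(out != NULL);
    if ((quote != '\'' && quote != '"') || out_size == 0)
        return -1;
    for (size_t i = 1; i < len; i++) {
        if (lit[i] == quote) {
            if (i + 1 < len && lit[i + 1] == quote) {
                i++;                          /* doubled delimiter: keep one */
            } else {
                out[j] = '\0';
                return i + 1 == len ? 0 : -1; /* closing quote must be last */
            }
        }
        if (j + 1 >= out_size)
            return -1;                        /* no room for terminator */
        out[j++] = lit[i];
    }
    return -1;                                /* closing delimiter missing */
}
```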
The ? Initializer
You do not have to initialize an array. The ? operator lets you allocate space for
the array without placing specific values in it. Object files contain records only
for initialized data, so space left unspecified in the object file has no record
giving its contents. Whether ? produces zeros or unspecified space depends on
how it is used. The ? initializer is treated as a zero in a DUP statement that
contains initializers in addition to the ? initializer. If the ? initializer does not
appear in a DUP statement, or if the DUP statement contains only ? initializers,
the assembler leaves the allocated space unspecified.
Processing Strings
The 8086-family instruction set has seven string instructions for fast and
efficient processing of entire strings and arrays. The term “string” in “string
instructions” refers to a sequence of elements, not just character strings. These
instructions work directly only on arrays of bytes and words on the 8086–80286
processors, and on arrays of bytes, words, and doublewords on the 80386/486
processors. Larger elements must be processed indirectly with loops.
The following list gives capsule descriptions of the five instructions discussed in
this section.
Instruction Description
MOVS Copies a string from one location to another
STOS Stores contents of the accumulator register to a string
CMPS Compares one string with another
LODS Loads values from a string to the accumulator register
SCAS Scans a string for a specified value
All of these instructions use registers in a similar way and have a similar syntax.
Most are used with the repeat instruction prefixes REP, REPE (or REPZ), and
REPNE (or REPNZ). REPZ is a synonym for REPE (Repeat While Equal) and
REPNZ is a synonym for REPNE (Repeat While Not Equal).
This section first explains the general procedures for using all string instructions.
It then illustrates each instruction with an example.
1. Set the direction flag to indicate the direction in which you want to process
the string. The CLD instruction clears the flag, and STD sets it.
If the direction flag is clear, the string is processed upward (from low
addresses to high addresses, which is from left to right through the string). If
the direction flag is set, the string is processed downward (from high
addresses to low addresses, or from right to left). Under MS-DOS, the
direction flag is normally clear if your program has not changed it.
2. Load the number of iterations for the string instruction into the CX register.
If you want to process 100 elements in a string, move 100 into CX. If you
want the string instruction to terminate conditionally (for example, when a
search finds a match), load the maximum number of iterations that can be
performed without an error.
3. Load the starting offset address of the source string into DS:SI and the
starting address of the destination string into ES:DI. Some string instructions
take only a destination or source, not both (see Table 5.1).
Normally, the segment address of the source string should be DS, but you
can use a segment override to specify a different segment for the source
operand. You cannot override the segment address for the destination string.
Therefore, you may need to change the value of ES. For information on
changing segment registers, see “Programming Segmented Addresses” in
Chapter 3.
Note Although you can use a segment override on the source operand, a
segment override combined with a repeat prefix can cause problems in certain
situations on all processors except the 80386/486. If an interrupt occurs during
the string operation, the segment override is lost and the rest of the string
operation processes incorrectly. Segment overrides can be used safely when
interrupts are turned off or with the 80386/486 processors.
You can adapt these steps to the requirements of any particular string operation.
The syntax for the string instructions is:
[[prefix]] CMPS [[segmentregister:]] source, [[ES:]] destination
LODS [[segmentregister:]] source
[[prefix]] MOVS [[ES:]] destination, [[segmentregister:]] source
[[prefix]] SCAS [[ES:]] destination
[[prefix]] STOS [[ES:]] destination
Some instructions have special forms for byte, word, or doubleword operands.
If you use the form of the instruction that ends in B (BYTE), W (WORD), or D
(DWORD) with LODS, SCAS, and STOS, the assembler knows whether the
element is in the AL, AX, or EAX register. Therefore, these instruction forms
do not require operands.
Table 5.1 lists each string instruction with the type of repeat prefix it uses and
indicates whether the instruction works on a source, a destination, or both.
Table 5.1 Requirements for String Instructions
Instruction   Repeat Prefix   Source/Destination   Register Pair
MOVS          REP             Both                 DS:SI, ES:DI
SCAS          REPE/REPNE      Destination          ES:DI
CMPS          REPE/REPNE      Both                 DS:SI, ES:DI
LODS          None            Source               DS:SI
STOS          REP             Destination          ES:DI
INS           REP             Destination          ES:DI
OUTS          REP             Source               DS:SI
The repeat prefix causes the instruction that follows it to repeat the number of
times specified in the count register (CX) or until a condition is met. After each
iteration, the instruction increments or decrements SI and DI so that they point
to the next array element. The direction flag determines whether SI and DI are
incremented (flag clear) or decremented (flag set). The operand size determines
whether SI and DI change by 1, 2, or 4 bytes each time.
Each prefix governs the number of repetitions as follows:
Prefix         Description
REP            Repeats instruction CX times
REPE, REPZ     Repeats instruction maximum CX times while values are equal
REPNE, REPNZ   Repeats instruction maximum CX times while values are not equal
The prefixes apply to only one string instruction at a time. To repeat a block of
instructions, use a loop construction. (See “Loops” in Chapter 7.)
At run time, if a string instruction is preceded by a repeat sequence, the
processor:
1. Checks the CX register and exits if CX is 0.
2. Performs the string operation once.
3. Increases SI and/or DI if the direction flag is clear. Decreases SI and/or DI if
the direction flag is set. The amount of increase or decrease is 1 for byte
operations, 2 for word operations, and 4 for doubleword operations.
4. Decrements CX without modifying the flags.
5. Checks the zero flag (for SCAS or CMPS) if the REPE or REPNE prefix is
used. If the repeat condition holds, loops back to step 1. Otherwise, the loop
ends and execution proceeds to the next instruction.
When the repeat loop ends, SI and DI point to the element after the last one
compared. With SCAS or CMPS, you therefore need to move DI (or SI) back by
one element to reach the element where the loop stopped: decrement it if the
direction flag is clear, or increment it if the flag is set.
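The five run-time steps and the pointer adjustment can be traced with a C model. This is a sketch under stated assumptions: array indexes stand in for SI and DI, the direction flag is clear, and the names rep_state, repne_scasb, and repe_cmpsb are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* State after a repeated string instruction (direction flag clear). */
typedef struct {
    size_t si, di;  /* one past the last elements compared        */
    size_t cx;      /* count remaining                             */
    int zf;         /* 1 if the last comparison was equal          */
} rep_state;

/* REPNE SCASB: scan es[] for al, stopping at the first match.
   zf_before is the zero flag on entry; it is left unchanged when
   CX is already 0. */
rep_state repne_scasb(const unsigned char *es, size_t cx,
                      unsigned char al, int zf_before)
{
    rep_state r = { 0, 0, cx, zf_before };
    assert(es != NULL || cx == 0);
    while (r.cx != 0) {              /* 1: exit if CX is 0          */
        r.zf = (es[r.di] == al);     /* 2: compare one element      */
        r.di++;                      /* 3: DF clear, so DI goes up  */
        r.cx--;                      /* 4: flags are not modified   */
        if (r.zf)                    /* 5: REPNE stops when equal   */
            break;
    }
    return r;
}

/* REPE CMPSB: compare ds[] with es[], stopping at the first mismatch. */
rep_state repe_cmpsb(const unsigned char *ds, const unsigned char *es,
                     size_t cx, int zf_before)
{
    rep_state r = { 0, 0, cx, zf_before };
    assert((ds != NULL && es != NULL) || cx == 0);
    while (r.cx != 0) {              /* 1: exit if CX is 0          */
        r.zf = (ds[r.si] == es[r.di]); /* 2: compare one pair       */
        r.si++;                      /* 3: both pointers go up      */
        r.di++;
        r.cx--;                      /* 4: flags are not modified   */
        if (!r.zf)                   /* 5: REPE stops when unequal  */
            break;
    }
    return r;
}
```

Run on the fox-and-dog strings used in this section, repne_scasb finds 'z' with di one past index 37, and repe_cmpsb stops with si one past index 16, where "fox" and "dog" first differ. Subtracting 1 gives the element where the loop stopped.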
Although string instructions (except LODS) are used most often with repeat
prefixes, they can also be used by themselves. In these cases, the SI and/or DI
registers are adjusted as specified by the direction flag and the size of operands.
Filling Arrays
The STOS instruction stores a specified value in each position of a string. The
string is the destination, so it must be pointed to by ES:DI. The value to store
must be in the accumulator.
The next example stores the character 'a' in each byte of a 100-byte string,
filling the entire string with "aaaa...." Notice how the code stores 50 words
rather than 100 bytes. This makes the fill operation faster by reducing the
number of iterations. To fill an odd number of bytes, you need to adjust for the
last byte.
.MODEL small, C
.DATA
destin BYTE 100 DUP (?)
ldestin EQU (LENGTHOF destin) / 2
.CODE
. ; Assume ES = DS
.
.
cld ; Work upward
mov ax, 'aa' ; Load character to fill
mov cx, ldestin ; Load length of string
mov di, OFFSET destin ; Load address of destination
rep stosw ; Store 'aa' into array
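The word-at-a-time strategy, including the adjustment for an odd byte count, can be modeled in C. fill_words is a hypothetical name, and memcpy stands in for the word store; the sketch reproduces the logic, not the speed benefit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores count/2 words (as REP STOSW does), then one more byte when
   count is odd. Both bytes of the pattern are the same character, so
   byte order does not matter; memcpy avoids alignment problems. */
void fill_words(unsigned char *dest, size_t count, unsigned char ch)
{
    uint16_t pattern = (uint16_t)(ch << 8 | ch);   /* like mov ax, 'aa' */
    size_t words = count / 2;

    assert(dest != NULL || count == 0);
    for (size_t i = 0; i < words; i++)             /* rep stosw        */
        memcpy(dest + 2 * i, &pattern, sizeof pattern);
    if (count % 2 != 0)                            /* odd length:      */
        dest[count - 1] = ch;                      /* store last byte  */
}
```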
Comparing Arrays
The CMPS instruction compares two strings and leaves SI and DI pointing to the
element after the one where a match or nonmatch occurred. If the values are the
same, the zero flag is set. Either string can be considered the destination or the
source unless a segment override is used. This example, which uses CMPSB,
assumes that the strings are in different segments. Each segment address must
be loaded into the appropriate segment register.
.MODEL large, C
.DATA
string1 BYTE "The quick brown fox jumps over the lazy dog"
.FARDATA
string2 BYTE "The quick brown dog jumps over the lazy fox"
lstring EQU LENGTHOF string2
.CODE
mov ax, @data ; Load data segment
mov ds, ax ; into DS
mov ax, @fardata ; Load far data segment
mov es, ax ; into ES
.
.
.
cld ; Work upward
mov cx, lstring ; Load length of string
mov si, OFFSET string1 ; Load offset of string1
mov di, OFFSET string2 ; Load offset of string2
repe cmpsb ; Compare
je allmatch ; Jump if all match
.
.
.
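allmatch: ; Special case for all match
Loading Values from Strings
The LODS instruction loads a value from the string pointed to by DS:SI into
AL, AX, or EAX and then adjusts SI. It is normally used without a repeat prefix,
inside a loop that processes each value. This example uses LODSB to display the
values in info as the digits 0 through 9:
.DATA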
info BYTE 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
linfo WORD LENGTHOF info
.CODE
.
.
.
cld ; Work upward
mov cx, linfo ; Load length
mov si, OFFSET info ; Load offset of source
mov ah, 2 ; Display character function
get:
lodsb ; Get a character
add al, '0' ; Convert to ASCII
mov dl, al ; Move to DL
int 21h ; Call DOS to display character
loop get ; Repeat
Searching Arrays
The SCAS instruction compares the value pointed to by ES:DI with the value in
the accumulator. If both values are the same, it sets the zero flag.
A repeat prefix lets SCAS work on an entire string, scanning (from which SCAS
gets its name) for a particular value called the target. REPNE SCAS sets the
zero flag if it finds the target value in the array. REPE SCAS sets the zero flag
if the scanned array contains nothing but the target value.
This example assumes that ES is not the same as DS and that the address of the
string is stored in a pointer variable. The LES instruction loads the far address
of the string into ES:DI.
.DATA
string BYTE "The quick brown fox jumps over the lazy dog"
pstring PBYTE string ; Far pointer to string
lstring EQU LENGTHOF string ; Length of string
.CODE
.
.
.
cld ; Work upward
mov cx, lstring ; Load length of string
les di, pstring ; Load address of string
mov al, 'z' ; Load character to find
repne scasb ; Search
jne notfound ; Jump if not found
. ; ES:DI points to character
. ; after first 'z'
.
notfound: ; Special case for not found
Although AL cannot contain an index value greater than 255, you can use
XLAT with arrays containing more than 256 elements by treating each
256-byte block of the array as a smaller subarray. For example, to retrieve the
260th element of an array, add 256 to BX and set AL to 3 (260 - 256 - 1).
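The block arithmetic generalizes to any position. A C sketch with the hypothetical helper split_index computes what to add to BX and what to load into AL for element n, counting from 1:

```c
#include <assert.h>
#include <stddef.h>

/* XLAT reads the byte at [BX + AL], and AL holds only 0-255. */
typedef struct {
    size_t bx_add;     /* amount to add to BX: a multiple of 256 */
    unsigned char al;  /* index within the 256-byte block        */
} xlat_index;

/* Splits the zero-based index n - 1 into a block and an offset. */
xlat_index split_index(size_t n)
{
    xlat_index x;
    assert(n >= 1);
    x.bx_add = ((n - 1) / 256) * 256;
    x.al = (unsigned char)((n - 1) % 256);
    return x;
}
```

split_index(260) yields a BX adjustment of 256 and AL = 3, the values given in the paragraph above.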
You can use the entire structure or union variable or just the individual fields as
operands in assembler statements. This section explains the allocating,
initializing, and nesting of structures and unions.
MASM 6.1 extends the functionality of structures and also makes some changes
to MASM 5.1 behavior. If you prefer, you can retain MASM 5.1 behavior by
specifying OPTION OLDSTRUCTS in your program.
Initializing Fields
If you provide initializers for the fields of a structure or union when you declare
the type, these initializers become the default value for the fields when you
define a variable of that type. “Defining Structure and Union Variables,”
following, explains default initializers.
When you initialize the fields of a union type, the type and value of the first field
become the default value and type for the union. In this example of an initialized
union declaration, the default type for the union is DWORD:
DWB UNION
d DWORD 00FFh
w WORD ?
b BYTE ?
DWB ENDS
If the size of the first member is less than the size of the union, the assembler
initializes the rest of the union to zeros. When initializing strings in a type, make
sure the initial values are long enough to accommodate the largest possible
string.
Field Names
Structure and union field names must be unique within a nesting level because
they represent the offset from the beginning of the structure to the
corresponding field.
A label elsewhere in the code may have the same name as a structure field, but
a text macro cannot. Field names in different structures also need not be unique.
However, field names must be unique if you place OPTION M510 or OPTION
OLDSTRUCTS in your code or use the /Zm command-line option, since versions
of MASM prior to 6.0 require unique field names. (See Appendix A.)
Any padding required to reach the correct offset for the field is added prior to
allocating the field. The padding consists of zeros and always precedes the
aligned field. The size of the structure must also be evenly divisible by the
structure alignment value, so zeros may be added at the end of the structure.
If neither the alignment nor the /Zp command-line option is used, the offset is
incremented by the size of each data directive. This is the same as a default
alignment equal to 1. The alignment specified in the type declaration overrides
the /Zp command-line option.
These examples show how the assembler determines offsets:
STUDENT2 STRUCT 2 ; Alignment value is 2
score WORD 1 ; Offset = 0
id BYTE 2 ; Offset = 2 (1 byte padding added)
year DWORD 3 ; Offset = 4
sname BYTE 4 ; Offset = 8 (1 byte padding added)
STUDENT2 ENDS
One byte of padding is added after the first byte-sized field (id). Otherwise,
the offset of the year field would be 3, which is not divisible by the alignment
value of 2. The fields then occupy 9 bytes. Since 9 is not evenly divisible by 2,
1 byte of padding is added at the end of STUDENT2, making its size 10 bytes.
STUDENT4 STRUCT 4 ; Alignment value is 4
sname BYTE 1 ; Offset = 0 (1 byte padding added)
score WORD 10 DUP (100) ; Offset = 2
year BYTE 2 ; Offset = 22 (1 byte padding
; added so offset of next field
; is divisible by 4)
id DWORD 3 ; Offset = 24
STUDENT4 ENDS
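The offsets in both examples follow from one rule, inferred here from the examples rather than quoted from the assembler: each field is aligned on the smaller of its element size and the structure's alignment value, and the total size is rounded up to a multiple of the alignment value. A C sketch of that rule, with hypothetical names field and layout (nested structure fields are not modeled):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t elem_size;  /* 1 = BYTE, 2 = WORD, 4 = DWORD              */
    size_t count;      /* number of elements (10 for WORD 10 DUP ..) */
} field;

static size_t round_up(size_t value, size_t align)
{
    return (value + align - 1) / align * align;
}

/* Writes each field's offset into offsets[] and returns the size. */
size_t layout(const field *fields, size_t nfields, size_t struct_align,
              size_t *offsets)
{
    size_t offset = 0;
    assert(struct_align > 0);
    for (size_t i = 0; i < nfields; i++) {
        size_t a = fields[i].elem_size < struct_align
                       ? fields[i].elem_size : struct_align;
        offset = round_up(offset, a);          /* padding before field */
        offsets[i] = offset;
        offset += fields[i].elem_size * fields[i].count;
    }
    return round_up(offset, struct_align);     /* padding at the end   */
}
```

With STUDENT2 described as {WORD, BYTE, DWORD, BYTE} and alignment 2, layout returns offsets 0, 2, 4, 8 and a size of 10; STUDENT4 with alignment 4 gives 0, 2, 22, 24 and a size of 28.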
ITEMS STRUCT
Iname BYTE 'Item Name'
Inum WORD ?
UNION ITYPE ; UNION keyword appears first
oldtype BYTE 0 ; when nested in structure.
newtype WORD ? ; (See "Nested Structures
ENDS ; and Unions," following.)
ITEMS ENDS
.
.
.
.DATA
Item1 ITEMS < > ; Accepts default initializers
Item2 ITEMS { } ; Accepts default initializers
Item3 ITEMS <'Bolts', 126> ; Overrides default value of first
; 2 fields; use default of
; the third field
Item4 ITEMS { \
'Bolts', ; Item name
126 \ ; Part number
}
The example defines (that is, allocates space for) four structures of the
ITEMS type, named Item1 through Item4. Each definition requires angle
brackets or curly braces even when no fields are initialized. If you initialize
more than one field, separate the values with commas, as shown in Item3 and
Item4.
You need not initialize all fields in a structure. If a field is blank, the assembler
uses the structure’s initial value given for that field in the declaration. If there is
no default value, the field value is left unspecified.
For nested structures or unions, however, these are equivalent:
Item5 ITEMS {'Bolts', , }
Item6 ITEMS {'Bolts', , { } }
WB UNION
w WORD ?
b BYTE ?
WB ENDS
DISKDRIVES STRUCT
a1 BYTE ?
b1 BYTE ?
c1 BYTE ?
DISKDRIVES ENDS
INFO STRUCT
buffer BYTE 100 DUP (?)
crlf BYTE 13, 10
query BYTE 'Filename: ' ; String <= can override
endmark BYTE 36
drives DISKDRIVES <0, 1, 1>
INFO ENDS
The initialization for drives gives default values for all three fields of the
structure. The fields left blank in info1 use the default values for those fields.
The info2 declaration is illegal because “DirectoryName” is longer than the
initial string for that field.
The Item7 array defined here has 30 elements of type ITEMS, with the third
field of each element (the union) initialized to 10.
You can also list array elements as shown in the following example.
Item8 ITEMS {'Bolts', 126, 10},
{'Pliers', 139, 10},
{'Saws', 414, 10}
Redeclaring a Structure
The assembler generates an error when you declare a structure more than once
unless the following are the same:
• Field names
• Offsets of named fields
• Initialization lists
• Field alignment value
This example, using the preceding data declarations, shows how to use the
LENGTHOF, SIZEOF, and TYPE operators with structures.
INFO STRUCT
buffer BYTE 100 DUP (?)
crlf BYTE 13, 10
query BYTE 'Filename: '
endmark BYTE 36
drives DISKDRIVES <0, 1, 1>
INFO ENDS
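Because every INFO field is built from bytes, its layout has no padding under the default alignment of 1. A C mirror of the two types (an analogy with hypothetical struct names, not MASM syntax) makes the totals easy to check: SIZEOF INFO corresponds to sizeof(struct info), 116 bytes, and LENGTHOF buffer to its 100 elements.

```c
#include <assert.h>
#include <stddef.h>

struct diskdrives {
    unsigned char a1, b1, c1;     /* DISKDRIVES: three BYTE fields */
};

struct info {
    unsigned char buffer[100];    /* BYTE 100 DUP (?)     */
    unsigned char crlf[2];        /* BYTE 13, 10          */
    char query[10];               /* BYTE 'Filename: '    */
    unsigned char endmark;        /* BYTE 36              */
    struct diskdrives drives;     /* DISKDRIVES <0, 1, 1> */
};

/* 100 + 2 + 10 + 1 + 3 bytes, with no padding between byte fields. */
static_assert(sizeof(struct info) == 116, "INFO should occupy 116 bytes");
static_assert(offsetof(struct info, drives) == 113, "drives follows endmark");
```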
In the following example, the two MOV statements show how you can access
the elements of an array of unions.
WB UNION
w WORD ?
b BYTE ?
WB ENDS
As the preceding code illustrates, you can use unions to access the same data in
more than one form. One application of structures and unions is to simplify the
task of reinitializing a far pointer. For a far pointer declared as
FPWORD TYPEDEF FAR PTR WORD
.DATA
WordPtr FPWORD ?
you must follow these steps to point WordPtr to a word value named
ThisWord in the current data segment.
mov WORD PTR WordPtr[2], ds
mov WORD PTR WordPtr, OFFSET ThisWord
The preceding method requires that you remember whether the segment or the
offset is stored first. Declaring a union like this one avoids that problem:
uptr UNION
dwptr FPWORD 0
STRUCT
offs WORD 0
segm WORD 0
ENDS
uptr ENDS
With a variable of this type, you can move the segment and the offset into the
pointer through the segm and offs fields and then move the whole pointer into a
register through the other field of the union, dwptr. Although this technique
does not reduce the code size, it avoids confusion about the order for loading
the segment and offset.
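C offers the same kind of overlay. The sketch below is an analogy with hypothetical names; it assumes a little-endian machine such as the x86, where the offset word stored first forms the low half of the 32-bit pointer and the segment word stored second forms the high half, as with WordPtr and WordPtr[2].

```c
#include <assert.h>
#include <stdint.h>

typedef union {
    uint32_t dwptr;          /* whole far pointer, segment:offset */
    struct {
        uint16_t offs;       /* offset word, stored first          */
        uint16_t segm;       /* segment word, stored second        */
    } parts;
} uptr;

static_assert(sizeof(uptr) == 4, "uptr must be exactly 32 bits");

/* Writes the two words, then reads the whole doubleword back
   through the other member of the union. */
uint32_t make_far_pointer(uint16_t segment, uint16_t offset)
{
    uptr p;
    p.parts.offs = offset;   /* store the offset word  */
    p.parts.segm = segment;  /* store the segment word */
    return p.dwptr;          /* read the full pointer  */
}
```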
INVENTORY STRUCT
UpDate WORD ?
oldItem ITEMS { \
100,
'AF8' \ ; Named variable of
} ; existing structure
ITEMS { ?, '94C' } ; Unnamed variable of
; existing type
STRUCT ups ; Named nested structure
source WORD ?
shipmode BYTE ?
ENDS
STRUCT ; Unnamed nested structure
f1 WORD ?
f2 WORD ?
ENDS
INVENTORY ENDS
.DATA
yearly INVENTORY { }
To nest structures and unions, you can use any of these techniques:
• The field of a structure or union can be a named variable of an existing
structure or union type, as in the oldItem field. Because INVENTORY
contains two structures of type ITEMS, the field names in oldItem are not
unique. Therefore, you must use the full field names when referencing those
fields, as in the statement
mov ax, yearly.oldItem.Inum
• As shown in the unnamed ITEMS field of INVENTORY, you also can use
unnamed variables of existing structures or unions inside another structure or
union. In these cases, you can reference fields directly:
mov yearly.Inum, 'C'
mov ax, yearly.f1
Records
Records are similar to structures, except that fields in records are bit strings.
Each bit field in a record variable can be used separately in constant operands or
expressions. The processor cannot access bits individually at run time, but it can
access bit fields with instructions that manipulate bits.
Records are bytes, words, or doublewords in which the individual bits or groups
of bits are considered fields. In general, the three steps for using record variables
are the same as those for using other complex data types:
1. Declare a record type.
2. Define one or more variables having the record type.
3. Reference record variables using shifts and masks.
Once it is defined, you can use the record variable as an operand in assembler
statements.
This section explains the record declaration syntax and the use of the MASK
and WIDTH operators. It also shows some applications of record variables and
constants.
The next example creates a record type CW that has six fields. Each record
declared with this type occupies 16 bits of memory. Initial (default) values are
given for each field. You can use them when declaring data for the record. The
bit diagram after the example shows the contents of the record type.
CW RECORD r1:3=0, ic:1=0, rc:2=0, pc:2=3, r2:2=1, masks:6=63
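The record's default value can be computed by packing the fields from the most significant bit downward, with the last field ending at bit 0. pack_record is a hypothetical C helper that follows that rule; the widths and values come from the CW declaration.

```c
#include <assert.h>
#include <stdint.h>

/* Packs fields in declaration order: the first field lands in the
   highest bits, the last field ends at bit 0. */
uint32_t pack_record(const unsigned *widths, const unsigned *values,
                     int nfields)
{
    uint32_t result = 0;
    for (int i = 0; i < nfields; i++) {
        assert(widths[i] >= 1 && widths[i] < 32);
        uint32_t mask = (1u << widths[i]) - 1;
        assert(values[i] <= mask);              /* value must fit */
        result = (result << widths[i]) | (values[i] & mask);
    }
    return result;
}
```

For CW the packed default is 037Fh. The same helper gives 8Ch for warning COLOR <1, 0, 1, 4> and 25h (00100101y) for rgb RGBCOLOR2 <1, 1, 1>.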
When you use the DUP operator to define multiple record variables, the angle
brackets and any initial values must be enclosed in parentheses. For example,
you can define an array of record variables with
xmas COLOR 50 DUP ( <1, 2, 0, 4> )
You do not have to initialize all fields in a record. If an initial value is blank, the
assembler automatically stores the default initial value of the field. If there is no
default value, the assembler clears each bit in the field.
The definition in the following example creates a variable named warning
whose type is given by the record type COLOR. The initial values of the fields in
the variable are set to the values given in the record definition. The initial values
override any default record values given in the declaration.
COLOR RECORD blink:1, back:3, intense:1, fore:3 ; Record declaration
warning COLOR <1, 0, 1, 4> ; Record definition
; Record instance: 8 bits stored in 1 byte
RGBCOLOR2 RECORD red:3, green:3, blue:2
rgb RGBCOLOR2 <1, 1, 1> ; Initialize to 00100101y
Record Operators
The WIDTH operator (used only with records) returns the width in bits of a
record or record field. The MASK operator returns a bit mask for the bit
positions occupied by the given record field. A bit in the mask contains a 1 if
that bit belongs to the given field. The following example shows how to use
MASK and WIDTH.
.DATA
COLOR RECORD blink:1, back:3, intense:1, fore:3
message COLOR <1, 5, 1, 1>
wblink EQU WIDTH blink ; "wblink" = 1
wback EQU WIDTH back ; "wback" = 3
wintens EQU WIDTH intense ; "wintens" = 1
wfore EQU WIDTH fore ; "wfore" = 3
wcolor EQU WIDTH COLOR ; "wcolor" = 8
.CODE
.
.
.
mov ah, message ; Load initial 1101 1001
and ah, NOT MASK back ; Turn off AND 1000 1111
; "back" ---------
; 1000 1001
or ah, MASK blink ; Turn on OR 1000 0000
; "blink" ---------
; 1000 1001
xor ah, MASK intense ; Toggle XOR 0000 1000
; "intense" ---------
; 1000 0001
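The same masking steps can be checked in C. field_mask is a hypothetical stand-in for MASK, taking a field's bit position and width (the value WIDTH returns); for COLOR, fore starts at bit 0, intense at bit 3, back at bit 4, and blink at bit 7.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a field `width` bits wide whose lowest bit is at `shift`. */
uint8_t field_mask(unsigned shift, unsigned width)
{
    assert(width >= 1 && shift + width <= 8);   /* COLOR fits in a byte */
    return (uint8_t)(((1u << width) - 1) << shift);
}

/* Replays the three instructions on a COLOR byte held in "AH". */
uint8_t update_message(uint8_t ah)
{
    ah &= (uint8_t)~field_mask(4, 3);   /* and ah, NOT MASK back */
    ah |= field_mask(7, 1);             /* or  ah, MASK blink    */
    ah ^= field_mask(3, 1);             /* xor ah, MASK intense  */
    return ah;
}
```

update_message(0xD9) returns 0x81, the 1000 0001 shown in the comments.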
The example continues by illustrating several ways in which record fields can
serve as operands and expressions:
; Rotate "back" of "message" without changing other values
Record variables are often used with the logical operators to perform logical
operations on the bit fields of the record, as in the previous example using the
MASK operator.
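One way to cycle a field to its next value without changing the others is to isolate it with its mask, shift it down, increment it, and shift it back. The C sketch below assumes that is what rotating "back" means, wrapping from 7 to 0; next_field_value is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Advances the field at bit `shift`, `width` bits wide, by one,
   wrapping to 0, and leaves every other bit of value unchanged. */
uint8_t next_field_value(uint8_t value, unsigned shift, unsigned width)
{
    assert(width >= 1 && shift + width <= 8);
    uint8_t mask = (uint8_t)(((1u << width) - 1) << shift);  /* MASK field   */
    uint8_t kept = value & (uint8_t)~mask;                   /* other fields */
    uint8_t field = (uint8_t)((value & mask) >> shift);      /* extract      */
    field = (uint8_t)((field + 1) & ((1u << width) - 1));    /* add 1, wrap  */
    return (uint8_t)(kept | (field << shift));               /* recombine    */
}
```

For message (1101 1001), next_field_value(0xD9, 4, 3) changes back from 5 to 6 and returns 0xE9.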