Module-1 Part2 Assemblers
Module 1
Part-2
ASSEMBLERS
Introduction
Data Structure
An assembler is a program that accepts an assembly language program as input and produces
its machine language equivalent, along with information for the loader.
1. WORD This assembler directive generates a one-word constant. It reserves one word of
storage (3 bytes) and initializes it to the value specified in the operand field of the
instruction.
Example A WORD 10
10 = 00000A (hex)
LOC VALUE
A 00
A+1 00
A+2 0A
2. RESW This assembler directive instructs the assembler to reserve the indicated number of
words for a data area.
Example A RESW 3
3 words = 9 bytes of space are reserved
3. BYTE This assembler directive instructs the assembler to generate a character
or hexadecimal constant, occupying as many bytes as needed to represent the constant.
Example STR BYTE C'HELLO'
LOC VALUE
STR H
STR+1 E
STR+2 L
STR+3 L
STR+4 O
STR+5
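The number of bytes generated by BYTE depends on the kind of constant. The following Python sketch is an illustration only; it assumes the usual SIC notation C'...' for character constants and X'...' for hexadecimal constants, and the function name byte_constant is hypothetical.

```python
def byte_constant(operand):
    """Return the bytes generated for a BYTE operand such as C'HELLO' or X'F1'.

    Assumes the C'...' / X'...' operand notation (an assumption, not defined above).
    """
    if len(operand) < 3 or operand[1] != "'" or operand[-1] != "'":
        raise ValueError("malformed BYTE operand: " + operand)
    kind, body = operand[0].upper(), operand[2:-1]
    if kind == "C":
        # One byte per character (ASCII code).
        return body.encode("ascii")
    if kind == "X":
        # Two hexadecimal digits per byte.
        if len(body) % 2 != 0:
            raise ValueError("odd number of hex digits: " + operand)
        return bytes.fromhex(body)
    raise ValueError("unknown constant type: " + operand)


print(byte_constant("C'HELLO'").hex().upper())  # 48454C4C4F (5 bytes)
print(len(byte_constant("X'F1'")))              # 1
```

The length of the returned byte string is the amount by which the location counter advances for the directive.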
4. RESB This assembler directive instructs the assembler to reserve the indicated number of
bytes for a data area.
Example A RESB 3
LOC VALUE
A (B1)
A+1 (B2)
A+2 (B3)
5. START This assembler directive specifies the name and starting address of the program.
Example COPY START 1000
This indicates that the name of the program is COPY, and the 1000 in the operand field indicates
that the program should be loaded into memory starting at location 1000.
6. END This assembler directive indicates that the end of the source program
has been reached. An optional operand specifies the instruction at which execution
is to begin.
Write a SIC program that reads records from an input device (F1) and copies them to an output
device (05).
The program contains three routines:
A main routine (lines 10 to 105),
which calls the two other routines, RDREC and WRREC, inside a loop (CLOOP).
A subroutine RDREC (lines 125 to 190),
which reads a record from the input device (identified by device code F1 on line
185) into the buffer.
A subroutine WRREC (lines 210 to 250),
which writes the record from the buffer to the output device.
Each subroutine (RDREC and WRREC) must transfer the record one character at a time
because the only I/O instructions available are RD and WD. The end of each record is marked with a null
character (hexadecimal 00). If the record is longer than the length of the buffer (4096 bytes),
only the first 4096 bytes are copied. The end of the file to be copied is indicated by a zero-length
record. When the end of file is detected, the program writes EOF on the output device and
terminates by executing an RSUB instruction. We assume that this program was called by the OS
using a JSUB instruction; thus, the RSUB will return control to the OS. The program starts at
address 1000.
Forward Reference:
The translation of a source program to object code is accomplished by processing the source
sequentially, one line at a time.
With sequential processing, assembler functions 1, 3, 4 and 5 are straightforward. Function 2,
converting symbolic operands to their equivalent machine addresses, presents a problem.
The instruction at line 10 contains a forward reference: RETADR is defined later in the
program. An assembler that translates strictly line by line cannot process this instruction,
because the address of RETADR is not yet known. For this reason a two-pass assembler is used.
First pass: scans the source program for label definitions and assigns addresses.
Second pass: performs most of the actual translation.
Intermediate file: Pass 1 writes each source line to an intermediate file, which Pass 2 reads.
Finally, the assembler must write the generated object code onto some output
device. The object program will later be loaded into memory for execution.
The simple object program format we use contains three types of records
1. Header record
2. Text record
3. End record
Header record
Col. 1 H
Col. 2~7 Program name
Col. 8~13 Starting address of object program (hex)
Col. 14-19 Length of object program in bytes (hex)
Example
H^COPY ^001000^00107A
(we use the ^ symbol to separate fields)
001000 is the starting address of object program
00107A = length of object program in bytes = (last location) - (first location) + 1 = 2079 - 1000 + 1
(all in hexadecimal)
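As a check on the arithmetic above, the header record can be built with a few lines of Python. This is a minimal sketch; the function name header_record is hypothetical, and the program name is padded with blanks to fill columns 2 to 7.

```python
def header_record(name, first_location, last_location):
    """Build a Header record, using ^ as the field separator."""
    # Program length in bytes = last location - first location + 1.
    length = last_location - first_location + 1
    # Name is left-justified and padded (or truncated) to 6 columns.
    return f"H^{name:<6.6}^{first_location:06X}^{length:06X}"


print(header_record("COPY", 0x1000, 0x2079))  # H^COPY  ^001000^00107A
```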
Text record
Col. 1 T
Col. 2~7 Starting address for object code in this record (hex)
Col. 8~9 Length of object code in this record in bytes (hex)
Col. 10~69 Object code, represented in hex (2 col. per byte)
Example
T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^00102A^0C1039^00102D
T^00101E^15^0C1036^…
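The length field of a Text record counts bytes, not hex digits. The following Python sketch (the function name text_record is hypothetical) rebuilds the first Text record above from its ten 3-byte object codes and confirms that the length is 1E (30 bytes).

```python
def text_record(start_address, object_codes):
    """Build a Text record from a start address and a list of hex object codes."""
    total_digits = sum(len(code) for code in object_codes)
    length = total_digits // 2          # two hex digits per byte
    if length > 30:
        # Columns 10~69 hold at most 60 hex digits, i.e. 30 bytes.
        raise ValueError("object code does not fit in one Text record")
    return "^".join(["T", f"{start_address:06X}", f"{length:02X}"] + list(object_codes))


codes = ["141033", "482039", "001036", "281030", "301015",
         "482061", "3C1003", "00102A", "0C1039", "00102D"]
print(text_record(0x1000, codes))
```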
End record
Col.1 E
Col.2~7 Address of first executable instruction in object program (hex)
Example
E^001000
OPTAB is used to lookup mnemonic operation codes and translate them to their machine
language equivalents. OPTAB must contain (at least) the mnemonic operation code and
its machine language equivalent. In more complex assemblers, this table also contains
information about instruction format and length.
During Pass 1, OPTAB is used to look up and validate operation codes in the source
program.
In Pass 2, it is used to translate the operation codes to machine language.
OPTAB is usually organized as a hash table, with mnemonic operation code as the key.
In most cases, OPTAB is a static table – that is, entries are not normally added to or
deleted from it.
SYMTAB is used to store the values (addresses) assigned to labels. It includes the name and
value (address) for each label in the source program, together with flags to indicate error
conditions (e.g., a symbol defined in two different places).
Format of SYMTAB is as follows
During Pass 1, labels are entered into SYMTAB as they are encountered in the source
program, along with their assigned addresses (from LOCCTR).
During Pass 2, symbols used as operands are looked up in SYMTAB to obtain the
addresses to be inserted in the assembled instruction. SYMTAB is usually organized as a
hash table for efficiency of insertion and retrieval.
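Both tables map naturally onto a hash table. The Python sketch below uses dictionaries for this purpose; it is an illustration under assumptions, not the layout used by any particular assembler. The opcode values are taken from the standard SIC instruction set, and the names SymbolEntry and define_symbol are hypothetical.

```python
from dataclasses import dataclass

# OPTAB: static table, mnemonic -> machine opcode (a small subset only).
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "COMP": 0x28,
         "JEQ": 0x30, "J": 0x3C, "JSUB": 0x48, "RSUB": 0x4C}


@dataclass
class SymbolEntry:
    address: int
    duplicate: bool = False   # error flag: symbol defined more than once


def define_symbol(symtab, label, locctr):
    """Pass 1 action: enter a label with the current LOCCTR value.

    Returns False and sets the error flag if the label already exists.
    """
    if label in symtab:
        symtab[label].duplicate = True
        return False
    symtab[label] = SymbolEntry(locctr)
    return True


symtab = {}
define_symbol(symtab, "FIRST", 0x1000)
define_symbol(symtab, "FIRST", 0x1003)      # second definition is flagged
print(symtab["FIRST"])
```

A dictionary gives average constant-time insertion and lookup, which is the property the text asks of SYMTAB.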
Example Program
Figures 2.4 (a) and (b) show the logic flow of the two passes of our assembler.
Algorithm Pass 1
begin
  read first input line
  if OPCODE = 'START' then
    begin
      save #[OPERAND] as starting address
      initialize LOCCTR to starting address
      write line to intermediate file
      read next input line
    end {if START}
  else
    initialize LOCCTR to 0
  while OPCODE != 'END' do
    begin
      if this is not a comment line then
        begin
          if there is a symbol in the LABEL field then
            begin
              search SYMTAB for LABEL
              if found then
                set error flag (duplicate symbol)
              else
                insert (LABEL, LOCCTR) into SYMTAB
            end {if symbol}
          search OPTAB for OPCODE
          if found then
            add 3 {instruction length} to LOCCTR
          else if OPCODE = 'WORD' then
            add 3 to LOCCTR
          else if OPCODE = 'RESW' then
            add 3 * #[OPERAND] to LOCCTR
          else if OPCODE = 'RESB' then
            add #[OPERAND] to LOCCTR
          else if OPCODE = 'BYTE' then
            begin
              find length of constant in bytes
              add length to LOCCTR
            end {if BYTE}
          else
            set error flag (invalid operation code)
        end {if not a comment}
      write line to intermediate file
      read next input line
    end {while not END}
  write last line to intermediate file
  save (LOCCTR - starting address) as program length
end {Pass 1}
fig 2.4(a)
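The Pass 1 logic can be sketched in Python as follows. This is a simplified illustration, not the algorithm of the figure verbatim: it assumes the source has already been split into (label, opcode, operand) tuples with comment lines removed, it skips the intermediate file, and the sample program and names (pass1, DEMO_OPTAB, PROGRAM) are hypothetical.

```python
DEMO_OPTAB = {"LDA": 0x00, "STA": 0x0C, "RSUB": 0x4C}   # small subset

PROGRAM = [
    ("COPY",  "START", "1000"),
    ("FIRST", "LDA",   "FIVE"),
    ("",      "STA",   "ALPHA"),
    ("",      "RSUB",  ""),
    ("FIVE",  "WORD",  "5"),
    ("ALPHA", "RESW",  "1"),
    ("EOF",   "BYTE",  "C'EOF'"),
    ("BUF",   "RESB",  "16"),
    ("",      "END",   "FIRST"),
]


def pass1(lines, optab):
    """Assign an address to every label; return (symtab, length, errors)."""
    symtab, errors = {}, []
    start = locctr = 0
    rest = lines
    if lines and lines[0][1] == "START":
        start = locctr = int(lines[0][2], 16)   # operand is hexadecimal
        rest = lines[1:]
    for label, opcode, operand in rest:
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append("duplicate symbol " + label)
            else:
                symtab[label] = locctr
        if opcode in optab or opcode == "WORD":
            locctr += 3                          # SIC instructions and words are 3 bytes
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            text = operand[2:-1]                 # strip C'...' or X'...'
            locctr += len(text) if operand[0] == "C" else len(text) // 2
        else:
            errors.append("invalid operation code " + opcode)
    return symtab, locctr - start, errors


symtab, length, errors = pass1(PROGRAM, DEMO_OPTAB)
print({name: f"{addr:04X}" for name, addr in symtab.items()}, f"{length:X}", errors)
```

For the sample program the labels receive addresses 1000, 1009, 100C, 100F and 1012, and the program length is 22 (hex).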
Algorithm Pass 2
begin
  read first input line {from intermediate file}
  if OPCODE = 'START' then
    begin
      write listing line
      read next input line
    end {if START}
  write Header record to object program
  initialize first Text record
  while OPCODE != 'END' do
    begin
      if this is not a comment line then
        begin
          search OPTAB for OPCODE
          if found then
            begin
              if there is a symbol in OPERAND field then
                begin
                  search SYMTAB for OPERAND
                  if found then
                    store symbol value as operand address
                  else
                    begin
                      store 0 as operand address
                      set error flag (undefined symbol)
                    end
                end {if symbol}
              else
                store 0 as operand address
              assemble the object code instruction
            end {if opcode found}
          else if OPCODE = 'BYTE' or 'WORD' then
            convert constant to object code
          if object code will not fit into the current Text record then
            begin
              write Text record to object program
              initialize new Text record
            end
          add object code to Text record
        end {if not comment}
      write listing line
      read next input line
    end {while not END}
  write last Text record to object program
  write End record to object program
  write last listing line
end {Pass 2}
fig 2.4(b)
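The central step of Pass 2, assembling one instruction from OPTAB and SYMTAB, can be sketched in Python. The sketch assumes the standard SIC instruction layout (8-bit opcode, 1-bit index flag x, 15-bit address), which is not restated in this section. The function name assemble_sic is hypothetical, and the address 1033 for RETADR is inferred from the object code 141033 in the Text record example.

```python
def assemble_sic(opcode, operand, optab, symtab):
    """Return (object_code_hex, error) for one SIC instruction."""
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    address, error = 0, None
    if symbol:
        if symbol in symtab:
            address = symtab[symbol]
        else:
            # Undefined symbol: store 0 as the address and flag the error.
            error = "undefined symbol " + symbol
    word = (optab[opcode] << 16) | (0x8000 if indexed else 0) | address
    return f"{word:06X}", error


optab = {"STL": 0x14, "STCH": 0x54, "RSUB": 0x4C, "LDA": 0x00}
symtab = {"RETADR": 0x1033, "BUFFER": 0x1039}
print(assemble_sic("STL", "RETADR", optab, symtab))     # ('141033', None)
print(assemble_sic("STCH", "BUFFER,X", optab, symtab))  # ('549039', None)
print(assemble_sic("RSUB", "", optab, symtab))          # ('4C0000', None)
```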
The source lines input to this algorithm are assumed to be in a fixed format with the fields LABEL,
OPCODE, and OPERAND. If one of these fields contains a character string that represents a
number, we denote its numeric value with the prefix # (for example, #[OPERAND]).
Prefix to operands:
@ --> indirect addressing;
# --> immediate operands;
+ --> extended instruction format.
Instructions that refer to memory are normally assembled using either program-counter
relative or base relative mode. The assembler directive BASE (Fig 2.5, line 13) is
used in conjunction with base relative addressing. The main differences between Fig 2.5
(SIC/XE) and Fig 2.1 (SIC) involve the use of register-to-register instructions (lines 150, 165).
In addition, immediate addressing and indirect addressing have been used as much as possible
(lines 25, 55, and 70).
Note that
The assembler must compute the displacement so that the correct target address results when
the displacement is added to the contents of the program counter (PC) or the base register (B).
The resulting displacement must be small enough to fit in the 12-bit field in the
instruction. This means that the displacement must be between 0 and 4095 (for base relative
mode) or between -2048 and +2047 (for program-counter relative mode). If neither
program-counter relative nor base relative addressing can be used (because the displacements are too
large), then the 4-byte extended instruction format (20-bit address field) must be used.
Example:
The assembler first attempts to translate the instruction using program-counter relative
addressing. If this is not possible (displacement out of range), the assembler then attempts to use base relative
addressing. If neither form is applicable and the extended format is not specified, then the
instruction cannot be properly assembled and the assembler must generate an error message.
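The order in which the addressing modes are tried can be sketched in Python. This is a minimal illustration; the function name choose_displacement is hypothetical, and the register values in the last call are sample values chosen for the demonstration.

```python
def choose_displacement(target, pc, base=None):
    """Return (mode, 12-bit displacement) for a format 3 instruction.

    pc is the address of the NEXT instruction; base is the value declared
    with BASE, or None if base relative addressing is not available.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF        # 2's complement in 12 bits
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    raise ValueError("displacement out of range; extended format required")


print(choose_displacement(0x0030, 0x0003))           # ('pc', 45)    i.e. 02D
print(choose_displacement(0x0006, 0x001A))           # ('pc', 4076)  i.e. FEC
print(choose_displacement(0x0036, 0x1051, 0x0033))   # ('base', 3)
```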
Example:
Displacement calculation for program-counter relative and base relative addressing modes:
Note that the program counter is advanced after each instruction is fetched and before it
is executed.
When STL is executed, the PC contains the address of the next instruction (0003);
RETADR (line 95) is assigned the address 0030.
The displacement we need in the instruction is 30 - 3 = 2D (hex); that is, target address = (PC)
+ disp = 3 + 2D = 30.
Note that bit p = 1 to indicate PC relative addressing, making the last 2 bytes of the
instruction 202D.
Solution
The operand address is CLOOP = 0006; during instruction execution, PC = 001A. Thus the
displacement = 6 - 1A = -14 (hex), which is represented in the 12-bit field in 2's complement
as FEC.
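The two PC relative examples can be verified by packing the fields of a format 3 instruction in Python. The sketch assumes the standard SIC/XE format 3 layout (6-bit opcode, flag bits n, i, x, b, p, e, 12-bit displacement) and the standard opcodes STL = 14 and J = 3C; the function name format3 is hypothetical.

```python
def format3(opcode, disp, n=1, i=1, x=0, b=0, p=0):
    """Encode a format 3 instruction as 6 hex digits (e bit is 0)."""
    first_byte = opcode | (n << 1) | i          # opcode's low 2 bits hold n and i
    flags = (x << 3) | (b << 2) | (p << 1)      # x b p e, with e = 0
    return f"{first_byte:02X}{flags:X}{disp & 0xFFF:03X}"


# STL RETADR: PC = 0003, RETADR = 0030, disp = 02D
print(format3(0x14, 0x0030 - 0x0003, p=1))   # 17202D
# J CLOOP: PC = 001A, CLOOP = 0006, disp = -14 (hex) = FEC
print(format3(0x3C, 0x0006 - 0x001A, p=1))   # 3F2FEC
```

Masking with 0xFFF yields the 12-bit 2's complement form of a negative displacement, which is how -14 (hex) becomes FEC.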
Base relative
The displacement calculation process for base relative addressing is much the same as for
PC relative addressing. The main difference is that the assembler knows what the contents of the
PC will be at execution time. On the other hand, the base register is under control of the
programmer. Therefore, the programmer must tell the assembler what the base register will
contain during execution of the program so that the assembler can compute displacements. This
is done in our example with the assembler directive BASE (line 13). In some cases, the
programmer can use another assembler directive, NOBASE, to inform the assembler that the
contents of the base register can no longer be relied upon for addressing.
Example: 1
55 0020 LDA #3 010003
Example: 2
1) In this case, the operand (4096) is too large to fit into the 12-bit displacement field, so
the extended instruction format is called for. (If the operand were too large even for this 20-bit
address field, immediate addressing could not be used.)
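The choice between format 3 and the extended format for an immediate operand can be sketched in Python. This is an illustration under assumptions: it handles only numeric immediate operands, the function name encode_immediate is hypothetical, and the instruction used for the value 4096 (LDT, opcode 74 in the standard SIC/XE instruction set) is chosen here for illustration only.

```python
def encode_immediate(opcode, value):
    """Encode an instruction with a numeric immediate operand (n=0, i=1)."""
    first_byte = opcode | 0b01                  # n = 0, i = 1
    if 0 <= value <= 0xFFF:
        # Fits the 12-bit field: format 3, flags x b p e = 0000.
        return f"{first_byte:02X}0{value:03X}"
    if 0 <= value <= 0xFFFFF:
        # Needs the 20-bit address field: format 4, e = 1.
        return f"{first_byte:02X}1{value:05X}"
    raise ValueError("operand too large for immediate addressing")


print(encode_immediate(0x00, 3))      # 010003   (LDA #3)
print(encode_immediate(0x74, 4096))   # 75101000 (extended format)
```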
Solution
b. Since the value of this symbol is the address assigned to it, this immediate
instruction has the effect of loading register B with the address of
LENGTH.
c. Note that we have combined PC relative addressing with immediate
addressing. (PC = 0006, LENGTH = 0033,
disp = 0033 – 0006 = 002D)
Register-to-register instructions
Depending on the addressing modes used, the assembler design changes. The SIC/XE machine has
specific instruction formats for the different addressing modes.
For register-to-register instructions, the assembler converts each register mnemonic to its
numeric equivalent: the register names (A, X, L, B, S, T, F, PC, SW) map to the values
(0, 1, 2, 3, 4, 5, 6, 8, 9). This mapping may be implemented as a separate table, or the
register names and values may be preloaded into SYMTAB.
Ex:
125 CLEAR X       B4 1 0
                  opcode = B4, X = 1, second register unused = 0
150 COMPR A, S    A0 0 4
                  opcode = A0, A = 0, S = 4
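The register lookup and the resulting 2-byte object code can be sketched in Python. The register numbers and the opcodes B4 and A0 are those given above; the names REGISTERS and format2 are hypothetical.

```python
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4,
             "T": 5, "F": 6, "PC": 8, "SW": 9}


def format2(opcode, r1, r2=None):
    """Encode a register-to-register instruction as 4 hex digits."""
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 is not None else 0   # unused register field is 0
    return f"{opcode:02X}{n1:X}{n2:X}"


print(format2(0xB4, "X"))        # B410  (CLEAR X)
print(format2(0xA0, "A", "S"))   # A004  (COMPR A, S)
```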
1. Generate the complete object program for the following assembly level program
Solution
Dr. C.K. SRINIVAS. Professor. DEPT OF CS&E, BITM, BALLARI
SYSTEM SOFTWARE
END FIRST
Symbol Table
SYMBOL ADDRESS
SUM 4000
FIRST 4000
LOOP 4006
TABLE 4015
COUNT 5785
ZERO 5788
TOTAL 578B
Program Relocation
a) It is desirable to load and run several programs at the same time
b) The system must be able to load programs into memory wherever there is room
c) The exact starting address of the program is not known until load time.
In the object program (Fig 2.3), this statement is translated as 00102D, specifying that register A
is to be loaded from memory address 102D. Suppose we attempt to load and execute the
program at address 2000 instead of address 1000. Address 102D will then not contain the value that
we expect. In reality, the assembler does not know the actual location where the program will be
loaded. However, the assembler can identify for the loader those parts of the object program
that need modification. An object program that contains the information necessary to perform
this kind of modification is called a relocatable program.
Relocatable program
An object program that contains modification records is called a relocatable program. SIC/XE
programs are relocatable and are assembled using a starting address of 0000 (fig a).
Fig 2.7 shows different places (0000, 5000, 7420) for locating a program. For example, in the
instruction "+JSUB RDREC", the address of RDREC is 1036, 6036, or 8456 when the program is
loaded at 0000, 5000, or 7420 respectively.
When the assembler generates the object code for JSUB instruction, it will insert the address of
RDREC relative to the start of the program. (This is the reason we initialized the location counter
to 0 for the assembly.)
The assembler will also produce a command for the loader, instructing it to add the beginning
address of the program to the address field in the JSUB instruction at load time. A modification
record has the format shown on p. 64. Note that the length field of a modification record is
stored in half-bytes (rather than bytes) because the address field to be modified may not occupy
an integral number of bytes.
For example, the address field in the +JSUB occupies 20 bits. The starting location field of a
modification record is the location of the byte containing the leftmost bits of the address field to
be modified. If this address field occupies an odd number of half-bytes, it is assumed to begin in
the middle of the first byte at the starting location.
Example:
The modification record for the +JSUB instruction would be “M00000705”. This record
specifies that the beginning address of the program is to be added to a field that begins at address
000007 (relative to the start of the program) and is 5 half-bytes in length. Thus in the assembled
instruction 4B101036, the first 12 bits (4B1) will remain unchanged. The program load address
will be added to the last 20 bits (01036) to produce the correct operand address. In Fig 2.6, only
lines 35 and 65 need to be relocated. The rest of the instructions in the program need not be
modified when the program is loaded.
In some cases, this is because the instruction operand is not a memory address at all (e.g.,
CLEAR R or LDA #3). In other cases, no modification is needed because the operand is
specified using PC relative or base relative addressing. Obviously, the only parts of the program
that require modification at load time are those that specify direct (as opposed to relative)
addresses. Fig 2.8 shows the complete object program corresponding to the source program of
Fig 2.5.
Modification record
Col. 1 M
Col. 2-7 Starting location of the address field to be modified, relative to the beginning of the
program (Hex)
Col. 8-9 Length of the address field to be modified, in half-bytes (Hex)
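The following Python sketch builds a modification record in this format and shows the adjustment a loader would make to the +JSUB instruction 4B101036. It is an illustration only, working on the hex text of one instruction rather than on memory; the function names modification_record and relocate_instruction are hypothetical.

```python
def modification_record(location, half_bytes):
    """Build a Modification record: M, 6-digit location, 2-digit length."""
    return f"M{location:06X}{half_bytes:02X}"


def relocate_instruction(object_code, half_bytes, load_address):
    """Add the load address to the last `half_bytes` hex digits of object_code."""
    keep, field = object_code[:-half_bytes], object_code[-half_bytes:]
    value = (int(field, 16) + load_address) % (16 ** half_bytes)
    return f"{keep}{value:0{half_bytes}X}"


print(modification_record(0x000007, 5))                 # M00000705
print(relocate_instruction("4B101036", 5, 0x5000))      # 4B106036
print(relocate_instruction("4B101036", 5, 0x7420))      # 4B108456
```

The first 12 bits (4B1) are untouched in each case; only the 20-bit address field changes, giving RDREC the addresses 6036 and 8456 for load addresses 5000 and 7420.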
Questions