Unit Ii Assemblers PDF
1. BASIC ASSEMBLER FUNCTIONS
6. PROGRAM RELOCATION
8. LITERALS
9. SYMBOL-DEFINING STATEMENTS
10. EXPRESSIONS
Assemblers
Basic Assembler Functions
Machine-dependent Assembler Features
Machine-independent Assembler Features
Assembler Design Options
Role of Assembler
Source Program -> Assembler -> Object Code -> Linker -> Executable Code -> Loader
Introduction to Assemblers
Fundamental functions
translating mnemonic operation codes to their machine language equivalents
assigning machine addresses to symbolic labels
Machine dependency
different machine instruction formats and codes
Example Program
Object Code
Purpose
reads records from input device (code F1)
copies them to output device (code 05)
at the end of the file, writes EOF on the output device, then RSUB to the operating system
Assembler Directives
Pseudo-Instructions
Not translated into machine instructions
Providing information to the assembler
Basic assembler directives
START
END
BYTE
WORD
RESB
RESW
Translation
Mnemonic code (or instruction name) -> Opcode
Variable names, Labels, Subroutines, Constants -> Address
Examples:
STL RETADR -> 14 10 33
STCH BUFFER,X -> 54 90 39
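As a rough illustration of the two translations above, the following Python sketch assembles a standard SIC instruction from an opcode and a symbol address. The function name `assemble_sic` and the two small dictionaries are assumptions made for this example; the addresses are the ones implied by the object code shown above.

```python
# Hypothetical helper: a standard SIC instruction is 24 bits wide:
# 8-bit opcode, 1-bit index flag (x), 15-bit address.
OPTAB = {"STL": 0x14, "STCH": 0x54}             # mnemonic -> opcode
SYMTAB = {"RETADR": 0x1033, "BUFFER": 0x1039}   # symbol -> address

def assemble_sic(mnemonic: str, operand: str) -> str:
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    address = SYMTAB[symbol]
    if indexed:
        address |= 0x8000  # set the x bit, the top bit of the 16-bit field
    return f"{OPTAB[mnemonic]:02X}{address:04X}"

print(assemble_sic("STL", "RETADR"))     # 141033
print(assemble_sic("STCH", "BUFFER,X"))  # 549039
```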
PASS-1 Algorithm
begin
  read first input line
  if OPCODE = 'START' then
    begin
      save #[OPERAND] as starting address
      initialize LOCCTR to starting address
      write line to intermediate file
      read next input line
    end {if START}
  else
    initialize LOCCTR to 0
  while OPCODE != 'END' do
    begin
      if this is not a comment line then
        begin
          if there is a symbol in the LABEL field then
            begin
              search SYMTAB for LABEL
              if found then
                set error flag (duplicate symbol)
              else
                insert (LABEL, LOCCTR) into SYMTAB
            end {if symbol}
          search OPTAB for OPCODE
          if found then
            add 3 {instruction length} to LOCCTR
          else if OPCODE = 'WORD' then
            add 3 to LOCCTR
          else if OPCODE = 'RESW' then
            add 3 * #[OPERAND] to LOCCTR
          else if OPCODE = 'RESB' then
            add #[OPERAND] to LOCCTR
          else if OPCODE = 'BYTE' then
            begin
              find length of constant in bytes
              add length to LOCCTR
            end {if BYTE}
          else
            set error flag (invalid operation code)
        end {if not a comment}
      write line to intermediate file
      read next input line
    end {while not END}
  write last line to intermediate file
  save (LOCCTR - starting address) as program length
end
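The Pass 1 algorithm can be sketched in Python as follows. This is a simplified illustration, not a full assembler: it assumes the source has already been split into (label, opcode, operand) tuples with comment lines removed, it skips the intermediate file, and the `OPTAB` set is a hypothetical subset of the real table.

```python
# Sketch of Pass 1 for standard SIC, where every instruction is 3 bytes long.
OPTAB = {"STL", "JSUB", "LDA", "COMP", "JEQ", "J", "STA", "LDL", "RSUB"}

def byte_length(operand: str) -> int:
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else len(body) // 2

def pass1(lines):
    """Return (SYMTAB, program length, error list)."""
    symtab, errors = {}, []
    start = 0
    if lines and lines[0][1] == "START":
        start = int(lines[0][2], 16)   # operand of START is a hex address
        lines = lines[1:]
    locctr = start
    for label, opcode, operand in lines:
        if opcode == "END":
            break
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
    return symtab, locctr - start, errors

program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "LDA", "ZERO"),
    ("EOF", "BYTE", "C'EOF'"),
    ("ZERO", "WORD", "0"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]
symtab, length, errors = pass1(program)
```

The sample program is made up for the sketch; it is shorter than the COPY program discussed in these notes, so its addresses differ from the SYMTAB listing.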
PASS-2 Algorithm
begin
  read first input line {from intermediate file}
  if OPCODE = 'START' then
    begin
      write listing line
      read next input line
    end {if START}
  write Header record to object program
  initialize first Text record
  while OPCODE != 'END' do
    begin
      if this is not a comment line then
        begin
          search OPTAB for OPCODE
          if found then
            begin
              if there is a symbol in OPERAND field then
                begin
                  search SYMTAB for OPERAND
                  if found then
                    store symbol value as operand address
                  else
                    begin
                      store 0 as operand address
                      set error flag (undefined symbol)
                    end
                end {if symbol}
              else
                store 0 as operand address
              assemble the object code instruction
            end {if opcode found}
          else if OPCODE = 'BYTE' or 'WORD' then
            convert constant to object code
          if object code will not fit into the current Text record then
            begin
              write Text record to object program
              initialize new Text record
            end
          add object code to Text record
        end {if not comment}
      write listing line
      read next input line
    end {while not END}
  write last Text record to object program
  write End record to object program
  write last listing line
end
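The Text record handling in Pass 2 can be sketched as below. The record layout used here (the letter T, a 6-digit starting address, a 2-digit length, then up to 30 bytes of object code) is the usual SIC object program format and is an assumption, since this section does not spell it out. The function name `text_records` is hypothetical. The sketch also starts a new record when addresses are not contiguous, as happens after RESW or RESB.

```python
MAX_BYTES = 30  # assumed limit: at most 30 bytes (60 hex digits) per Text record

def text_records(codes):
    """codes: (address, object code as hex string) pairs in address order."""
    records, start, buf = [], None, ""

    def flush():
        nonlocal start, buf
        if buf:
            records.append(f"T{start:06X}{len(buf) // 2:02X}{buf}")
        start, buf = None, ""

    for addr, code in codes:
        contiguous = start is not None and addr == start + len(buf) // 2
        too_long = (len(buf) + len(code)) // 2 > MAX_BYTES
        if not contiguous or too_long:
            flush()          # write the current record, begin a new one
            start = addr
        buf += code
    flush()
    return records

recs = text_records([
    (0x1000, "141033"),
    (0x1003, "482039"),
    (0x1006, "001036"),
    (0x2039, "041030"),   # gap in addresses forces a new Text record
])
print(recs)  # ['T00100009141033482039001036', 'T00203903041030']
```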
Data Structures
Operation Code Table (OPTAB)
Symbol Table (SYMTAB)
Location Counter(LOCCTR)
OPTAB (operation code table)
SYMTAB (symbol table), contents for the example program (symbol, hexadecimal address):
COPY 1000
FIRST 1000
CLOOP 1003
ENDFIL 1015
EOF 1024
THREE 102D
ZERO 1030
RETADR 1033
LENGTH 1036
BUFFER 1039
RDREC 2039
LOCCTR(Location counter)
– Used to help in the assignment of addresses
– Initialized to the beginning address specified in the START statement
– After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR
– Gives the address of a label
Object Program
Header
Col. 1 H
(END program_name)
E 001000
Assembler Design
Machine Dependent Assembler Features
instruction formats and addressing modes
program relocation
Machine Independent Assembler Features
literals
symbol-defining statements
expressions
program blocks
control sections and program linking
Machine-dependent Assembler Features
Instruction formats and addressing modes
Program relocation
Format 3 instruction layout:
OPCODE (6 bits) | n | i | x | b | p | e | Address (12 bits)
Instruction Format and Addressing Mode
SIC/XE
PC-relative or Base-relative addressing: op m
Indirect addressing: op @m
Immediate addressing: op #c
Extended format: +op m
Index addressing: op m,x
register-to-register instructions
larger memory -> multi-programming (program allocation)
Translation
Register translation
register name (A, X, L, B, S, T, F, PC, SW) and their values (0,1, 2, 3, 4, 5, 6, 8, 9)
preloaded in SYMTAB
Address translation
Most register-memory instructions use program counter relative or base relative addressing
Format 3: 12-bit address field
base-relative: 0~4095
pc-relative: -2048~2047
Format 4: 20-bit address field
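The choice between the two Format 3 modes can be sketched as follows. The function name `displacement` is hypothetical, and trying PC-relative before base-relative is one common policy assumed here. The sample addresses are taken from, or chosen to match, the listing fragments in these notes and should be read as illustrative.

```python
def displacement(target: int, pc: int, base=None):
    """Return (mode, 12-bit address field) for a Format 3 operand."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF          # negative values in two's complement
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    raise ValueError("displacement out of range: use extended format (+op)")

# STL RETADR at 0000: PC = 0003, RETADR = 0030
print(displacement(0x0030, 0x0003))          # ('pc', 45), i.e. 02D
# A backward jump: target 0006, PC = 001A gives -14 hex, stored as FEC
print(displacement(0x0006, 0x001A))          # ('pc', 4076), i.e. FEC
# Too far for PC-relative, so the base register (assumed to hold 0033) is used
print(displacement(0x0036, 0x1051, base=0x0033))  # ('base', 3)
```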
PC-Relative Addressing Modes
PC-relative
10 0000 FIRST STL RETADR 17202D
OPCODE  n i x b p e  Address
0001 01 1 1 0 0 1 0  (02D)16
J CLOOP assembles to 3F2FEC (negative displacement, stored in two's complement)
OPCODE  n i x b p e  Address
0011 11 1 1 0 0 1 0  (FEC)16
Base-relative
STCH BUFFER,X assembles to 57C003 (x and b bits set)
OPCODE  n i x b p e  Address
0101 01 1 1 1 1 0 0  (003)16
NOBASE informs the assembler that the contents of the base register can no longer be relied upon for addressing
Immediate Address Translation
55 0020 LDA #3 010003
OPCODE  n i x b p e  Address
0000 00 0 1 0 0 0 0  (003)16
+LDT #4096 assembles to 75101000 (extended format, 20-bit address field)
OPCODE  n i x b p e  Address
0111 01 0 1 0 0 0 1  (01000)16
LDB #LENGTH assembles to 69202D (the immediate operand is an address, computed PC-relative)
OPCODE  n i x b p e  Address
0110 10 0 1 0 0 1 0  (02D)16
LENGTH=0033=PC+displacement=0006+02D
if immediate mode is specified, the target address becomes the operand
Indirect Address Translation
Indirect addressing
target address is computed as usual (PC-relative or BASE-relative)
only the n bit is set to 1
70 002A J @RETADR 3E2003
OPCODE  n i x b p e  Address
0011 11 1 0 0 0 1 0  (003)16
TA=RETADR=0030
TA=(PC)+disp=002D+0003
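The bit layouts above can be checked with a small Python sketch that packs the opcode, the six flag bits, and the address field into object code. The function name `encode` is an assumption for this example; the opcode values are read off the bit patterns shown above.

```python
def encode(opcode, n, i, x, b, p, e, addr):
    """Pack a Format 3 (e = 0) or Format 4 (e = 1) instruction as a hex string."""
    first = (opcode & 0xFC) | (n << 1) | i       # upper 6 opcode bits, then n and i
    flags = (x << 3) | (b << 2) | (p << 1) | e   # x b p e form one hex digit
    if e:
        return f"{first:02X}{flags:X}{addr & 0xFFFFF:05X}"  # 20-bit address
    return f"{first:02X}{flags:X}{addr & 0xFFF:03X}"        # 12-bit displacement

print(encode(0x14, 1, 1, 0, 0, 1, 0, 0x02D))    # 17202D   STL RETADR
print(encode(0x3C, 1, 1, 0, 0, 1, 0, 0xFEC))    # 3F2FEC   PC-relative, negative
print(encode(0x54, 1, 1, 1, 1, 0, 0, 0x003))    # 57C003   base-relative, indexed
print(encode(0x00, 0, 1, 0, 0, 0, 0, 0x003))    # 010003   LDA #3
print(encode(0x74, 0, 1, 0, 0, 0, 1, 0x01000))  # 75101000 +LDT #4096
print(encode(0x3C, 1, 0, 0, 0, 1, 0, 0x003))    # 3E2003   J @RETADR
```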
Program Relocation
Example
Absolute program, starting address 1000
e.g. 55 101B LDA THREE 00102D
Relocate the program to 2000
e.g. 55 101B LDA THREE 00202D
Each Absolute address should be modified
Example
Except for absolute address, the rest of the instructions need not be modified
not a memory address (immediate addressing)
PC-relative, Base-relative
The only parts of the program that require modification at load time are those that specify direct
addresses
Relocatable Program
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
Modification records are added to the object files. (See pp.64-65 and Figure 2.8.)
Example:
HCOPY 000000 001077
T000000 1D 17202D…4B101036…
T00001D ……
…
M000007 05 Modification Record
……
E000000
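What the loader does with a record such as M000007 05 can be sketched as follows. The function name `apply_modification`, the use of a `bytearray` as memory, and the load address 5000 are assumptions for illustration. When the length in half-bytes is odd, the field is taken to occupy the low-order half-bytes of the bytes read.

```python
def apply_modification(memory: bytearray, start: int, half_bytes: int,
                       load_addr: int) -> None:
    """Add load_addr to the address field beginning at byte offset `start`."""
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1        # the low-order half-bytes
    value = ((field & mask) + load_addr) & mask
    field = (field & ~mask) | value           # keep the bits outside the field
    memory[start:start + nbytes] = field.to_bytes(nbytes, "big")

# The 4-byte instruction 4B101036 sits at relative address 0006, so its
# 20-bit address field starts in the second half of byte 0007.
memory = bytearray(6) + bytes.fromhex("4B101036")
apply_modification(memory, start=0x07, half_bytes=5, load_addr=0x5000)
print(memory[6:10].hex().upper())  # 4B106036
```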
Object Code
Literals
Design idea
Let programmers write the value of a constant operand as part of the instruction that uses it.
This avoids having to define the constant elsewhere in the program and make up a label for it.
Example
e.g. 45 001A ENDFIL LDA =C'EOF' 032010
93 LTORG
002D * =C'EOF' 454F46
e.g. 215 1062 WLOOP TD =X'05' E32011
Literals vs. Immediate Operands
Immediate Operands
The operand value is assembled as part of the machine instruction
e.g. 55 0020 LDA #3 010003
Literals
The assembler generates the specified value as a constant at some other memory location
e.g. 45 001A ENDFIL LDA =C'EOF' 032010
Compare (Fig. 2.6)
e.g. 45 001A ENDFIL LDA EOF 032010
80 002D EOF BYTE C'EOF' 454F46
Literal - Implementation
Literal pools
Normally literals are placed into a pool at the end of the program (see the END statement)
In some cases, it is desirable to place literals into a pool at some other location in the object program
assembler directive LTORG
reason: keep the literal operand close to the instruction
Duplicate literals
e.g. 215 1062 WLOOP TD =X'05'
e.g. 230 106B WD =X'05'
The assembler should recognize duplicate literals and store only one copy of the specified data value
Comparison of the defining expression
• The same literal name can have different values, e.g. =* (the current value of LOCCTR)
Comparison of the generated data value
• The benefits of comparing generated data values are usually not great enough to justify the additional complexity in the assembler
LITTAB
each entry holds the literal name, the operand value and length, and the address assigned to the operand
Pass 1
build LITTAB with literal name, operand value and length, leaving the address unassigned
when an LTORG statement is encountered, assign an address to each literal not yet assigned an address
Pass 2
search LITTAB for each literal operand encountered
generate data values as if by BYTE or WORD statements
generate a Modification record for literals that represent an address in the program
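The Pass 1 side of literal handling can be sketched like this. The class name `LitTab` and its methods are hypothetical. Duplicates are detected by comparing the defining expression (the literal name), which is the simpler of the two approaches described above. For brevity both literals land in one pool, which differs from the listing above.

```python
def literal_value(name: str) -> bytes:
    """Bytes for a literal such as =C'EOF' or =X'05'."""
    kind, body = name[1], name[3:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)

class LitTab:
    def __init__(self):
        self.entries = {}  # literal name -> {"value": bytes, "address": int or None}

    def add(self, name: str) -> None:
        """Record a literal operand; duplicates share a single entry."""
        if name not in self.entries:
            self.entries[name] = {"value": literal_value(name), "address": None}

    def place_pool(self, locctr: int) -> int:
        """At LTORG: assign addresses to unplaced literals, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"])
        return locctr

littab = LitTab()
for operand in ["=C'EOF'", "=X'05'", "=X'05'"]:
    littab.add(operand)
next_locctr = littab.place_pool(0x002D)
```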
Symbol-Defining Statements
Labels on instructions or data areas
the value of such a label is the address assigned to the statement
Defining symbols
symbol EQU value
value can be: constant, other symbol, expression
making the source program easier to understand
no forward reference
Example 1
MAXLEN EQU 4096
+LDT #MAXLEN
replaces
+LDT #4096
Example 3
MAXLEN EQU BUFEND-BUFFER
Counter Example
BETA EQU ALPHA
ALPHA RESW 1
BETA cannot be assigned a value when it is encountered during Pass 1 of the assembly
(because ALPHA does not yet have a value).
ORG (origin)
Indirectly assign values to symbols
Reset the location counter to the specified value
ORG value
Value can be: constant, other symbol, expression
No forward reference
Example
Symbol table STAB with 100 entries; each entry has the fields
SYMBOL: 6 bytes
VALUE: 1 word
FLAGS: 2 bytes
Indexed reference to a field: LDA VALUE,X
ORG Example
Using EQU statements
STAB RESB 1100
SYMBOL EQU STAB
VALUE EQU STAB+6
FLAGS EQU STAB+9
Using ORG statements
STAB RESB 1100
ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAGS RESB 2
ORG STAB+1100
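To see that the ORG version yields the same symbol values as the EQU version, the effect of "ORG STAB" followed by the three reservations can be mimicked in Python. The function name `define_fields` and the address chosen for STAB are assumptions for this sketch.

```python
def define_fields(base: int, fields):
    """Mimic 'ORG base' followed by RESB/RESW lines; return symbol values."""
    locctr, symbols = base, {}
    for name, directive, count in fields:
        symbols[name] = locctr                     # label gets the current LOCCTR
        locctr += count * (3 if directive == "RESW" else 1)
    return symbols

STAB = 0x1000  # assumed address of the table
fields = define_fields(STAB, [("SYMBOL", "RESB", 6),
                              ("VALUE", "RESW", 1),
                              ("FLAGS", "RESB", 2)])
print(fields)  # SYMBOL = STAB, VALUE = STAB+6, FLAGS = STAB+9
```

The final "ORG STAB+1100" then restores the location counter to the first byte after the table.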
Expressions
Expressions can be classified as absolute expressions or relative expressions
MAXLEN EQU BUFEND-BUFFER
BUFEND and BUFFER both are relative terms, representing addresses within the program
However the expression BUFEND-BUFFER represents an absolute value
When relative terms are paired with opposite signs, the dependency on the program starting
address is canceled out; the result is an absolute value
SYMTAB
None of the relative terms may enter into a multiplication or division operation
Errors:
BUFEND+BUFFER
100-BUFFER
3*BUFFER
The type of an expression
To determine it, the assembler must keep track of the type (absolute or relative) of every symbol defined in the program
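The pairing rule can be sketched as a count of signed relative terms: a net count of 0 gives an absolute expression, a net count of +1 gives a relative one, and anything else is an error. The function name `expression_type` and the term representation are assumptions. Only sums and differences are modelled; multiplication or division involving a relative term would be rejected before this step.

```python
def expression_type(terms, symbol_types):
    """terms: (sign, operand) pairs with sign +1 or -1; operand is an int
    constant or a symbol name. symbol_types maps symbol -> 'R' or 'A'."""
    net = 0
    for sign, operand in terms:
        if isinstance(operand, str) and symbol_types[operand] == "R":
            net += sign
    if net == 0:
        return "absolute"
    if net == 1:
        return "relative"
    raise ValueError("illegal expression: relative terms are not properly paired")

types = {"BUFEND": "R", "BUFFER": "R"}
print(expression_type([(+1, "BUFEND"), (-1, "BUFFER")], types))  # absolute
print(expression_type([(+1, "BUFFER"), (+1, 100)], types))       # relative
```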
Implementation
Pass 1
each program block has a separate location counter
each label is assigned an address that is relative to the start of the block that contains it
at the end of Pass 1, the latest value of the location counter for each block indicates the length of
that block
the assembler can then assign to each block a starting address in the object program
Pass 2
The address of each symbol can be computed by adding the assigned block starting address and the
relative address of the symbol to that block
Each source line is given a relative address and a block number
Block name   Block number   Address   Length
(default)    0              0000      0066
CDATA        1              0066      000B
CBLKS        2              0071      1000
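The table above follows directly from the block lengths found in Pass 1: each block starts where the previous one ends. A minimal sketch, with hypothetical function names, and assuming the blocks are listed in the order they first appear:

```python
def assign_block_addresses(lengths):
    """lengths: block name -> length; returns name -> (number, address, length)."""
    table, address = {}, 0
    for number, (name, length) in enumerate(lengths.items()):
        table[name] = (number, address, length)
        address += length
    return table

def symbol_address(table, block, relative):
    """Pass 2: block starting address plus the symbol's relative address."""
    return table[block][1] + relative

blocks = assign_block_addresses({"(default)": 0x66, "CDATA": 0x0B, "CBLKS": 0x1000})
print({name: f"{addr:04X}" for name, (_, addr, _) in blocks.items()})
# {'(default)': '0000', 'CDATA': '0066', 'CBLKS': '0071'}
```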
External Definition and References
External definition
EXTDEF name [, name]
EXTDEF names symbols that are defined in this control section and may be used by
other sections
External reference
EXTREF name [,name]
EXTREF names symbols that are used in this control section and are defined
elsewhere
Example
15 0003 CLOOP +JSUB RDREC 4B100000
160 0017 +STCH BUFFER,X 57900000
190 0028 MAXLEN WORD BUFEND-BUFFER 000000
Implementation
The assembler must include information in the object program that will cause the loader to insert
proper values where they are required
Define record
Col. 1 D
Col. 2-7 Name of external symbol defined in this control section
Col. 8-13 Relative address within this control section (hexadecimal)
Col. 14-73 Repeat information in Col. 2-13 for other external symbols
Refer record
Col. 1 R
Col. 2-7 Name of external symbol referred to in this control section
Col. 8-73 Names of other external reference symbols
Modification Record
Col. 1 M
Col. 2-7 Starting address of the field to be modified (hexadecimal)
Col. 8-9 Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10 Modification flag (+ or -)
Col. 11-16 External symbol whose value is to be added to or subtracted from the indicated field
Note: the control section name is automatically an external symbol, i.e. it is available for use in Modification records.
Example
Figure 2.17
M00000405+RDREC
M00000705+COPY
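The column layout can be made concrete by splitting one of the records above into its fields. The function name `parse_modification` and the dictionary keys are assumptions for this sketch; treating column 10 as the sign is inferred from the "+" in the example records.

```python
def parse_modification(record: str) -> dict:
    """Split a revised Modification record into its fields."""
    if record[0] != "M":
        raise ValueError("not a Modification record")
    return {
        "start": int(record[1:7], 16),       # Col. 2-7
        "half_bytes": int(record[7:9], 16),  # Col. 8-9
        "sign": record[9],                   # Col. 10
        "symbol": record[10:16].strip(),     # Col. 11-16
    }

print(parse_modification("M00000405+RDREC"))
# {'start': 4, 'half_bytes': 5, 'sign': '+', 'symbol': 'RDREC'}
```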
One-Pass Assemblers
Main problem
forward references
data items
labels on instructions
Solution
data items: require all such areas be defined before they are referenced
labels on instructions: no good solution
2. insert the symbol into SYMTAB, and mark this symbol undefined
3. the address that refers to the undefined symbol is added to a list of forward references
associated with the symbol table entry
4. when the definition for a symbol is encountered, the proper address for the symbol is then inserted into any instructions previously generated, according to the forward reference list
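The forward reference list can be sketched in Python as follows. The class name `OnePassSymtab`, the dictionary standing in for memory, and the sample addresses are all assumptions made for illustration.

```python
class OnePassSymtab:
    def __init__(self):
        self.values = {}    # symbol -> address, once defined
        self.pending = {}   # undefined symbol -> operand fields waiting for it

    def reference(self, symbol, operand_addr, memory):
        """Called when an instruction uses `symbol` as its operand."""
        if symbol in self.values:
            memory[operand_addr] = self.values[symbol]
        else:
            memory[operand_addr] = 0   # placeholder until the symbol is defined
            self.pending.setdefault(symbol, []).append(operand_addr)

    def define(self, symbol, address, memory):
        """Called when `symbol` appears in the label field."""
        self.values[symbol] = address
        for operand_addr in self.pending.pop(symbol, []):
            memory[operand_addr] = address   # patch earlier instructions

memory, symtab = {}, OnePassSymtab()
symtab.reference("RDREC", 0x2013, memory)   # forward reference
symtab.reference("RDREC", 0x201F, memory)   # another one
symtab.define("RDREC", 0x203D, memory)      # definition patches both fields
symtab.reference("RDREC", 0x2050, memory)   # backward reference, resolved at once
```

Any symbol still present in `pending` at the end of the program is an undefined symbol and should be reported as an error.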
The loader is used to complete forward references that could not be handled by the assembler
The object program records must be kept in their original order when they are presented to the
loader
Example
Multi-Pass Assemblers
Restriction on EQU and ORG
no forward references, since a symbol's value cannot be defined during the first pass
Example
Use linked lists to keep track of the symbols whose values depend on an undefined symbol
ASSEMBLERS
1. Introduction
There are two main classes of programming languages: high level (e.g., C, Pascal) and low
level. Assembly Language is a low level programming language. Programmers code symbolic
instructions, each of which generates machine instructions.
An assembler is a program that accepts as input an assembly language program (source) and
produces its machine language equivalent (object code) along with the information for the loader.
Disadvantages:
Not portable
More complex
Requires understanding of hardware details (interfaces)
Assembler:
A two-pass assembler performs two sequential scans over the source code:
Data Structures:
- Location counter (LC): points to the next location where the code will be placed
- Op-code translation table: contains symbolic instructions, their lengths and their op-codes (or
subroutine to use for translation)
- String storage buffer (SSB): contains ASCII characters for the strings
- Configuration table: contains pointer to the string in SSB and offset where its value will be inserted
in the object code
Figure: assembly language program -> Pass 1 -> Pass 2 -> machine language program. Pass 1 hands Pass 2 the symbol table, the configuration table, the string storage buffer, and a partially configured object file.
Example 1: Decrement number 5 by 1 until it is equal to zero.
Op-code Table
Mnemonic   Addressing mode   Opcode
LDA        immediate         01
SUB        immediate         1D
COMP       immediate         29
LDX        immediate         05
ADD        indexed           18
TIX        direct            2C
JLT        direct            38
JGT        direct            34
RSUB       implied           4C
Symbol Table
Symbol   Value
LOOP     0103
Example 2: Sum 6 elements of a list which starts at location 200.
Symbol Table
Symbol   Address
LOOP     0106
LIST     0112
COUNT    0115
Configuration Table
SSB
Address   Contents   Meaning
DC00      4CH        ASCII for L
DC01      49H        ASCII for I
DC02      53H        ASCII for S
DC03      54H        ASCII for T
DC04      5EH        ASCII for separation character
DC05
Pass1
All symbols are identified and put in ST
All op-codes are translated
Missing symbol values are marked
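Pass 1 flowchart, in outline:
- LC = origin
- Read the next line; skip it if it is a comment
- If the line is "END", go to Pass 2
- If the line is a pseudo-op, branch on its kind: EQU, WORD/BYTE, or RESW/RESB
- In each case, if the line has a label, enter the label in ST
- WORD/BYTE: place the constant in the machine code and advance LC
- RESW/RESB: advance LC by the number of bytes specified in the pseudo-op
- Otherwise the line is handed to the translator routine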
Translator Routine
- If more information will be needed in Pass 2, set up an entry in the Configuration Table
- Return
Pass 2
- Fills in addresses and data that were unknown during Pass 1.
- While there are more lines in the Configuration Table: compute the location in memory where the value will be placed (starting address + offset).
- When no lines remain, Pass 2 is done.
Relocatable Code
Producing object code that can be placed in any area of memory.
Direct Address Table (DAT): contains offset locations of all direct addresses in the program (e.g., 8080
instructions that specify direct addresses are LDA, STA, all conditional jumps...). To relocate the
program, the loader adds the loading point to all these locations.
Example 3: Following relocatable object code and DAT are generated for Example 1.
DAT
0007
000A
000D
Forward and backward references in the machine code are generated relative to address 0000. To
relocate the code, the loader adds the new load-point to the references in the machine code which are
pointed by the DAT.
One-Pass Assemblers
Multi-Pass Assemblers
Example 3:
A EQU B
B EQU C
C DS 1
Such references can also be resolved in two passes by entering symbol definitions that involve forward references into the symbol table. The symbol table also records which symbols depend on the values of others.
Example 4:
A EQU B
B EQU D
C EQU D
D DS 1
Symbol Table
A &1 B 0
B &1 D A 0
C &1 D 0
D 200 B C 0
Symbol Table
A 200 0
B 200 0
C 200 0
D 200 0
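The step from the first symbol table to the second can be sketched with dependency lists: once D receives its value, every symbol waiting on D is resolved, which in turn resolves the symbols waiting on those. The function name `resolve` is hypothetical, and the sketch handles only definitions of the form "X EQU Y" as in Example 4.

```python
def resolve(definitions, known):
    """definitions: symbol -> the symbol it is EQU'd to.
    known: symbol -> value for symbols defined directly (e.g. by DS)."""
    values = dict(known)
    dependents = {}                      # symbol -> symbols that depend on it
    for sym, target in definitions.items():
        dependents.setdefault(target, []).append(sym)
    worklist = list(known)
    while worklist:
        done = worklist.pop()
        for sym in dependents.get(done, []):
            values[sym] = values[done]   # the dependent takes the same value
            worklist.append(sym)
    return values

table = resolve({"A": "B", "B": "D", "C": "D"}, {"D": 200})
print(table)  # every symbol ends up with the value 200
```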