A Simple Two-Pass Assembler

This document describes the key functions of a simple two-pass assembler: 1. Translating mnemonic operation codes to their machine language equivalents and assigning machine addresses to symbolic labels. 2. How the assembler depends on the source language being assembled and the machine language produced, specifically the instruction format and addressing modes. 3. The assembler's main jobs: converting mnemonics and symbols to machine codes and addresses, using proper instruction formats and addressing modes, translating data constants, and outputting the object program.

Uploaded by

chithra smitha

Main Functions
• Translate mnemonic operation codes to
their machine language equivalents
• Assign machine addresses to symbolic
labels used by the programmers

An assembler depends heavily on the source language it translates and the machine language it produces, e.g., the instruction format and addressing modes.
Basic Functions
Example 2.1
[Figure: source listing of the COPY program, annotated with a forward reference, a subroutine call, a comment line, indexing mode, a hexadecimal number, and the subroutine entry and return points.]
Line numbers are not part of the program. They are for reference only.

Purpose of Example 2.1 (COPY)
• It is a copy function that reads some records
from a specified input device and then copies
them to a specified output device
– Reads a record from the input device (code F1)
– Copies the record to the output device (code 05)
– Repeats the above steps until encountering EOF.
– Then writes EOF to the output device
– Then executes RSUB to return to the caller
RDREC and WRREC
• Data transfer
– A record is a stream of bytes with a null character (hexadecimal 00) at the end.
– If a record is longer than 4096 bytes, only the first 4096 bytes are copied.
– EOF is indicated by a zero-length record (i.e., a byte stream with only a null character).
– Because the speed of the input and output devices may be
different, a buffer is used to temporarily store the record
• Subroutine call and return
– On line 10, "STL RETADR" is executed to save the return address that is already stored in register L.
– Otherwise, after calling RDREC or WRREC, COPY could not return to its caller.
Assembler Directives
• Assembler directives are pseudo instructions
– They will not be translated into machine instructions.
– They only provide instruction/direction/information to
the assembler.
• Basic assembler directives :
– START :
• Specify name and starting address for the program
– END :
• Indicate the end of the source program, and (optionally)
the first executable instruction in the program.
Assembler Directives (cont’d)
– BYTE :
• Generate character or hexadecimal constant, occupying
as many bytes as needed to represent the constant.
– WORD :
• Generate one-word integer constant
– RESB :
• Reserve the indicated number of bytes for a data area
– RESW :
• Reserve the indicated number of words for a data area
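As a rough illustration of the data directives above, the sketch below computes how many bytes each one occupies, which is the amount an assembler would need to reserve. The function name `directive_length` is hypothetical; the sketch assumes the usual SIC conventions of a 3-byte word and operands written as C'...' (characters) or X'...' (hexadecimal digits).

```python
def directive_length(directive: str, operand: str) -> int:
    """Return the number of bytes a SIC data directive occupies."""
    if directive == "WORD":
        return 3                          # assumption: one SIC word is 3 bytes
    if directive == "RESW":
        return 3 * int(operand)           # operand counts words
    if directive == "RESB":
        return int(operand)               # operand counts bytes
    if directive == "BYTE":
        kind, body = operand[0], operand[2:-1]   # strip C'...' or X'...'
        if kind == "C":
            return len(body)              # one byte per character
        if kind == "X":
            return len(body) // 2         # two hexadecimal digits per byte
    raise ValueError(f"not a data directive: {directive} {operand}")


print(directive_length("BYTE", "C'EOF'"))   # 3
print(directive_length("RESB", "4096"))     # 4096
```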
An Assembler’s Job
• Convert mnemonic operation codes to their
machine language codes
• Convert symbolic (e.g., jump labels, variable
names) operands to their machine addresses
• Use proper addressing modes and formats to
build efficient machine instructions
• Translate data constants into internal machine
representations
• Output the object program and provide other
information (e.g., for linker and loader)
Object Program Format
• Header
Col. 1      H
Col. 2-7    Program name
Col. 8-13   Starting address of object program (hexadecimal)
Col. 14-19  Length of object program in bytes (hexadecimal)
• Text
Col. 1      T
Col. 2-7    Starting address for object code in this record (hexadecimal)
Col. 8-9    Length of object code in this record in bytes (hexadecimal)
Col. 10-69  Object code, represented in hexadecimal (2 columns per byte)
• End
Col. 1      E
Col. 2-7    Address of first executable instruction in object program (hexadecimal)
The Object Code for COPY
H COPY 001000 00107A
T 001000 1E 141033 482039 001036 281030 301015 482061 3C1003 00102A 0C1039 00102D
T 00101E 15 0C1036 482061 081033 4C0000 454F46 000003 000000
T 002039 1E 041030 001030 E0205D 30203F D8205D 281030 302057 549039 2C205E 38203F
T 002057 1C 101036 4C0000 F1 001000 041030 E02079 302064 509039 DC2079 2C1036
T 002073 07 382064 4C0000 05
E 001000

There is no object code corresponding to addresses 1033-2038.


This storage is simply reserved by the loader for use by the
program during execution.
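The gap mentioned above can be checked from the Text records alone. The following is a small sketch, not part of the original slides: the helper name `text_gaps` is hypothetical, and the (start address, length) pairs are copied from the Text records of the COPY object program.

```python
def text_gaps(text_records):
    """Return (first, last) address ranges not covered between consecutive Text records."""
    gaps = []
    records = sorted(text_records)
    for (start, length), (next_start, _) in zip(records, records[1:]):
        end = start + length              # first address after this record
        if next_start > end:
            gaps.append((end, next_start - 1))
    return gaps


# (start address, length in bytes) of each Text record in the COPY object program
copy_text = [(0x1000, 0x1E), (0x101E, 0x15), (0x2039, 0x1E), (0x2057, 0x1C), (0x2073, 0x07)]

for first, last in text_gaps(copy_text):
    print(f"{first:04X}-{last:04X}")      # 1033-2038
```

The last record ends at 2073 + 07 = 207A, which agrees with the program length 00107A in the Header record for a program starting at 1000.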
Assembler Design
• Machine Dependent Assembler Features
– instruction formats and addressing modes
– program relocation
• Machine Independent Assembler Features
– literals
– symbol-defining statements
– expressions
– program blocks
– control sections and program linking
Two Pass Assembler
• Pass 1
– Assign addresses to all statements in the program
– Save the values (addresses) assigned to all symbols (instruction labels and variable names) for use in Pass 2 (this deals with forward references)
– Perform some processing of assembler directives (e.g., BYTE and RESW, since these can affect address assignment)
• Pass 2
– Assemble instructions (generate opcode and look up addresses)
– Generate data values defined by BYTE, WORD
– Perform processing of assembler directives not done in Pass 1
– Write the object program and the assembly listing
A Simple Two Pass Assembler Implementation
[Figure: the source program is read as (label, opcode, operand) statements and flows through Pass 1 and then Pass 2, which produces the object code. Pass 1 references the mnemonic and opcode mappings in OPTAB and enters label and address mappings into SYMTAB. Pass 2 references the mnemonic and opcode mappings in OPTAB and the label and address mappings in SYMTAB.]
Three Main Data Structures
• Operation Code Table (OPTAB)
• Location Counter (LOCCTR)
• Symbol Table (SYMTAB)
OPTAB (operation code table)
• Content
– The mapping between mnemonics and machine codes. It also includes the instruction format, available addressing modes, and length information.
• Characteristic
– Static table. The content will never change.
• Implementation
– Array or hash table. Because the content will never change,
we can optimize its search speed.
• In pass 1, OPTAB is used to look up and validate
mnemonics in the source program.
• In pass 2, OPTAB is used to translate mnemonics to
machine instructions.
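One possible way to hold such a static table is a read-only dictionary. The sketch below is an assumption about representation, not the slides' implementation: the names `OPTAB` and `lookup` and the (opcode, format) tuple layout are hypothetical, and only a handful of SIC/XE mnemonics are listed.

```python
from types import MappingProxyType

# Excerpt only: mnemonic -> (opcode, default instruction format).
# MappingProxyType gives a read-only view, matching "the content will never change".
OPTAB = MappingProxyType({
    "LDA": (0x00, 3), "STL": (0x14, 3), "JSUB": (0x48, 3), "J": (0x3C, 3),
    "STCH": (0x54, 3), "RSUB": (0x4C, 3), "CLEAR": (0xB4, 2),
})


def lookup(mnemonic):
    """Validate a mnemonic; return (opcode, format) or None if it is invalid."""
    extended = mnemonic.startswith("+")       # '+' requests extended format 4
    entry = OPTAB.get(mnemonic[1:] if extended else mnemonic)
    if entry is None:
        return None
    opcode, fmt = entry
    if extended:
        return (opcode, 4) if fmt == 3 else None   # only format 3 can be extended
    return opcode, fmt


print(lookup("STL"))     # (20, 3)
print(lookup("+JSUB"))   # (72, 4)
```

Pass 1 only needs the format (to advance LOCCTR), while Pass 2 also needs the opcode.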
Location Counter (LOCCTR)
• This variable can help in the assignment of
addresses.
• It is initialized to the beginning address
specified in the START statement.
• After each source statement is processed, the
length of the assembled instruction and data
area to be generated is added to LOCCTR.
• Thus, when we reach a label in the source
program, the current value of LOCCTR gives the
address to be associated with that label.
Symbol Table (SYMTAB)
• Content
– Includes the label name and value (address) for each label in the source program.
– Includes type and length information (e.g., int64)
– Has a flag to indicate errors (e.g., a symbol defined in two places)
• Characteristic
– Dynamic table (i.e., symbols may be inserted, deleted, or searched in the table)
• Implementation
– Hash table can be used to speed up search
– Because variable names may be very similar (e.g., LOOP1,
LOOP2), the selected hash function must perform well with
such non-random keys.
Prob 1:SIC
Prob 2 : SIC/XE
• Generate the object code for the following
SIC/XE code. Also indicate program length and
symbol table. Assume the starting address as
1000H. The following are the opcode values
• Assume
• Generate the object code for the following
SIC/XE code. Also indicate program length and
symbol table. The following are the opcode
values
Solve this
The Pseudo Code for Pass 1
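As a rough illustration (not the slide's original pseudo code), Pass 1 for the standard SIC machine can be sketched as below. The function name `pass1`, the tuple-based input, and the intermediate list are assumptions; the sketch also assumes every standard SIC instruction is 3 bytes long and that the START operand is hexadecimal.

```python
def pass1(lines, optab):
    """lines: (label, opcode, operand) tuples. Returns (symtab, intermediate, length)."""
    symtab, intermediate = {}, []
    start = locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = locctr = int(operand, 16)     # initialize LOCCTR
            intermediate.append((locctr, label, opcode, operand))
            continue
        if opcode == "END":
            intermediate.append((locctr, label, opcode, operand))
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr                # label gets the current LOCCTR
        intermediate.append((locctr, label, opcode, operand))
        if opcode in optab or opcode == "WORD":
            locctr += 3                           # instruction or one-word constant
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            body = operand[2:-1]                  # strip C'...' or X'...'
            locctr += len(body) if operand[0] == "C" else len(body) // 2
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, intermediate, locctr - start


source = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "LDA", "THREE"),
    ("", "RSUB", ""),
    ("THREE", "WORD", "3"),
    ("RETADR", "RESW", "1"),
    ("EOF", "BYTE", "C'EOF'"),
    ("", "END", "FIRST"),
]
symtab, intermediate, length = pass1(source, {"STL": 0x14, "LDA": 0x00, "RSUB": 0x4C})
print({name: hex(addr) for name, addr in symtab.items()}, hex(length))
```

RETADR is used on the first instruction before it is defined; because Pass 1 only assigns addresses, this forward reference causes no difficulty.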
The Pseudo Code for Pass 2
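Similarly, a minimal sketch of Pass 2 for standard SIC (again not the slide's original pseudo code). It assumes an intermediate list of (address, label, opcode, operand) tuples and a SYMTAB dictionary produced by an earlier pass; the name `pass2` and these data shapes are hypothetical. A standard SIC instruction is taken to be an 8-bit opcode, one index bit, and a 15-bit address.

```python
def pass2(intermediate, symtab, optab):
    """Return a list of (address, object code in hex) for standard SIC statements."""
    out = []
    for addr, _label, opcode, operand in intermediate:
        if opcode in optab:
            target, indexed = 0, False
            if operand:
                indexed = operand.endswith(",X")
                name = operand[:-2] if indexed else operand
                if name not in symtab:
                    raise ValueError(f"undefined symbol: {name}")
                target = symtab[name]             # look up the operand address
            word = (optab[opcode] << 16) | (0x8000 if indexed else 0) | target
            out.append((addr, f"{word:06X}"))
        elif opcode == "WORD":
            out.append((addr, f"{int(operand) & 0xFFFFFF:06X}"))
        elif opcode == "BYTE":
            body = operand[2:-1]
            if operand[0] == "C":
                out.append((addr, body.encode("ascii").hex().upper()))
            else:
                out.append((addr, body.upper()))
        # START, END, RESW and RESB generate no object code
    return out


demo = pass2(
    [(0x2050, "", "STCH", "BUFFER,X"), (0x2053, "EOF", "BYTE", "C'EOF'"), (0x2056, "", "RSUB", "")],
    {"BUFFER": 0x1039, "EOF": 0x2053},
    {"STCH": 0x54, "RSUB": 0x4C},
)
print(demo)   # [(8272, '549039'), (8275, '454F46'), (8278, '4C0000')]
```

The values 549039, 454F46, and 4C0000 also appear in the COPY object program shown earlier.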
Machine-Dependent Assembler
Features

1. Instruction Formats and Addressing Modes


2. Program Relocation
Instruction Formats and
Addressing Modes
• PC-relative
– 10 0000 FIRST STL RETADR 17202D
      op(6)       n i x b p e   disp(12)
      14 (hex)    1 1 0 0 1 0   02D (hex)
• displacement = RETADR - PC = 30 - 3 = 2D
– 40 0017 J CLOOP 3F2FEC
      op(6)       n i x b p e   disp(12)
      3C (hex)    1 1 0 0 1 0   FEC (hex)
• displacement = CLOOP - PC = 6 - 1A = -14 = FEC
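The two encodings above can be reproduced with a few bit operations. This is a minimal sketch; the function names `encode_format3` and `pc_relative_disp` are hypothetical, and the range check reflects the 12-bit signed displacement field.

```python
def encode_format3(opcode, n, i, x, b, p, e, disp):
    """Pack a format 3 instruction: 6-bit opcode, n i x b p e flags, 12-bit displacement."""
    first_byte = (opcode & 0xFC) | (n << 1) | i     # low two opcode bits hold n and i
    flags = (x << 3) | (b << 2) | (p << 1) | e
    return f"{(first_byte << 16) | (flags << 12) | (disp & 0xFFF):06X}"


def pc_relative_disp(target, next_instruction):
    """Displacement from the PC, which already points at the next instruction."""
    disp = target - next_instruction
    if not -2048 <= disp <= 2047:
        raise ValueError("target out of PC-relative range")
    return disp


# 10 0000 FIRST STL RETADR (RETADR = 0030, PC = 0003)
print(encode_format3(0x14, 1, 1, 0, 0, 1, 0, pc_relative_disp(0x0030, 0x0003)))  # 17202D
# 40 0017 J CLOOP (CLOOP = 0006, PC = 001A)
print(encode_format3(0x3C, 1, 1, 0, 0, 1, 0, pc_relative_disp(0x0006, 0x001A)))  # 3F2FEC
```

Masking with 0xFFF is what turns the negative displacement -14 (hex) into its 12-bit two's complement form FEC.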
Base-Relative Addressing Modes
• Base-relative
– base register is under the control of the programmer
– 12 0003 LDB #LENGTH
– 13 BASE LENGTH
– 100 0033 LENGTH RESW 1
– 160 104E STCH BUFFER,X 57C003
      op(6)       n i x b p e   disp(12)
      54 (hex)    1 1 1 1 0 0   003 (hex)
• PC-relative is not possible here: BUFFER - PC = 0036 - 1051 = -101B, which does not fit in the 12-bit displacement field
• displacement = BUFFER - B = 0036 - 0033 = 3
– NOBASE is used to inform the assembler that the contents of the base register can no longer be relied upon for addressing
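The choice between the two relative modes can be sketched as follows. The function name `choose_displacement` is hypothetical, and trying PC-relative before base-relative is an assumption about the assembler's policy; the ranges follow from the 12-bit field (signed for PC-relative, unsigned for base-relative).

```python
def choose_displacement(target, pc, base=None):
    """Return (b, p, disp) for a format 3 operand, or raise if neither mode fits."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return 0, 1, disp & 0xFFF                 # PC-relative, two's complement
    if base is not None and 0 <= target - base <= 4095:
        return 1, 0, target - base                # base-relative, unsigned
    raise ValueError("operand needs extended format (format 4)")


# 160 104E STCH BUFFER,X with BUFFER = 0036, PC = 1051, B = 0033 (set by BASE LENGTH)
print(choose_displacement(0x0036, 0x1051, base=0x0033))   # (1, 0, 3)
```

Passing `base=None` models the situation after NOBASE, when the assembler may no longer rely on register B.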
Immediate Address Translation
• Immediate addressing
– 55 0020 LDA #3 010003
      op(6)       n i x b p e   disp(12)
      00 (hex)    0 1 0 0 0 0   003 (hex)
– 133 103C +LDT #4096 75101000
      op(6)       n i x b p e   addr(20)
      74 (hex)    0 1 0 0 0 1   01000 (hex)
Immediate Address Translation (Cont.)

– 12 0003 LDB #LENGTH 69202D
      op(6)       n i x b p e   disp(12)
      68 (hex)    0 1 0 0 1 0   02D (hex)
• the immediate operand is the symbol LENGTH
• the address of this symbol LENGTH is loaded into register B
• LENGTH = 0033 = (PC) + displacement = 0006 + 02D
• disp = LENGTH - (PC) = 0033 - 0006 = 002D
• if immediate mode is specified, the target address becomes the operand
Indirect Address Translation
• Indirect addressing
– target address is computed as usual (PC-relative or base-relative)
– only the n bit is set to 1
– 70 002A J @RETADR 3E2003
      op(6)       n i x b p e   disp(12)
      3C (hex)    1 0 0 0 1 0   003 (hex)
• TA = RETADR = 0030
• TA = (PC) + disp = 002D + 0003
• disp = RETADR - (PC) = 0030 - 002D = 0003
Program Relocation
• Multiprogramming: more than one program runs concurrently, sharing memory and resources.
• It is difficult to predict which programs will execute concurrently.
• The actual starting address of a program is not known until load time.
• Therefore the program is loaded into memory wherever there is free space.

Program relocation
• Assembler does not know the actual location
where the program is loaded.
• But it can identify for the loader those parts of
the object program that need modification.
• The object program having this modification
information is the relocatable program.

Program Relocation
• Example Fig. 2.1 (the program for which we wrote the object program)
– Absolute program, starting address 1000
e.g. 55 101B LDA THREE 00102D
– Relocate the program to 2000
e.g. 55 101B LDA THREE 00202D
– Each absolute address must be modified
• Example Fig. 2.5:
– Apart from absolute addresses, the rest of the instructions need not be modified
• operands that are not memory addresses (immediate addressing)
• PC-relative and base-relative operands
– The only parts of the program that require modification at load time are those that specify direct addresses
Program relocation
• The assembler does two things:
1. It inserts the address of the labeled instruction (RDREC) relative to the start of the program.
2. It produces a command for the loader, instructing it to add the beginning address of the program to the address field of the instruction (JSUB) that must be modified at load time.
Example
Relocatable Program

• Modification record
Col 1 : M
Col 2-7 : Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 : Length of the address field to be modified, in half-bytes
• Note: Length is given in half-bytes because the address to be modified may not occupy an integral number of bytes.
Ex: 4B101036 : the load address will be added to the last 20 bits (01036).
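What the loader does with such a record can be sketched as below. The function name `relocate_field` and the load address 5000 are hypothetical; the sketch assumes that an odd half-byte count means the field starts in the low-order half of its first byte, which is left untouched in its high-order half.

```python
def relocate_field(memory, start, half_bytes, load_addr):
    """Add load_addr to a field of `half_bytes` hex digits whose first byte is at `start`."""
    n_bytes = (half_bytes + 1) // 2               # bytes that contain the field
    mask = (1 << (4 * half_bytes)) - 1            # selects only the address digits
    field = int.from_bytes(memory[start:start + n_bytes], "big")
    field = (field & ~mask) | ((field + load_addr) & mask)
    memory[start:start + n_bytes] = field.to_bytes(n_bytes, "big")


# +JSUB RDREC assembled as 4B101036; the address field is the last 5 half-bytes,
# so it begins one byte into the instruction.
memory = bytearray.fromhex("4B101036")
relocate_field(memory, start=1, half_bytes=5, load_addr=0x5000)
print(memory.hex().upper())   # 4B106036
```

Only the 20-bit address changes (01036 becomes 06036); the opcode byte and the flag half-byte keep their values.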
Object Code
Control Sections and Program Linking
• Control Sections
– are most often used for subroutines or other
logical subdivisions of a program
– the programmer can assemble, load, and
manipulate each of these control sections
separately
– instructions in one control section may need to refer to instructions or data located in another section; because of this, there must be some means for linking control sections together
External Definition and References
• External definition
– EXTDEF name [, name]
– EXTDEF names symbols that are defined in this
control section and may be used by other sections
• External reference
– EXTREF name [,name]
– EXTREF names symbols that are used in this control
section and are defined elsewhere
• Example
– 15 0003 CLOOP +JSUB RDREC 4B100000
– 160 0017 +STCH BUFFER,X 57900000
– 190 0028 MAXLEN WORD BUFEND-BUFFER 000000
Implementation
• The assembler must include information in the object program that will
cause the loader to insert proper values where they are required
• Define record
– Col. 1 D
– Col. 2-7 Name of external symbol defined in this control section
– Col. 8-13 Relative address within this control section (hexadecimal)
– Col. 14-73 Repeat information in Col. 2-13 for other external symbols
• Refer record
– Col. 1 R
– Col. 2-7 Name of external symbol referred to in this control section
– Col. 8-73 Names of other external reference symbols
Modification Record
• Modification record
– Col. 1 M
– Col. 2-7 Starting address of the field to be modified (hexadecimal)
– Col. 8-9 Length of the field to be modified, in half-bytes (hexadecimal)
– Col. 10 Modification flag (+ or -)
– Col. 11-16 External symbol whose value is to be added to or subtracted from the indicated field
– Note: a control section name is automatically an external symbol, i.e., it is available for use in Modification records.
• Example
– Figure 2.17
– M00000405+RDREC
– M00000705+COPY
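A linking loader could process records of this form roughly as follows. This is a sketch under stated assumptions: the names `apply_m_record`, `csaddr`, and `estab` are hypothetical, as are the symbol addresses used in the demonstration; the record layout is the one shown in the examples above (flag in column 10, symbol from column 11).

```python
def apply_m_record(memory, record, csaddr, estab):
    """Apply one record such as 'M00000405+RDREC' to an already loaded control section."""
    start = csaddr + int(record[1:7], 16)         # field address within loaded memory
    half_bytes = int(record[7:9], 16)
    sign = 1 if record[9] == "+" else -1          # modification flag
    value = estab[record[10:].strip()]            # address of the external symbol
    n_bytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1
    field = int.from_bytes(memory[start:start + n_bytes], "big")
    field = (field & ~mask) | ((field + sign * value) & mask)
    memory[start:start + n_bytes] = field.to_bytes(n_bytes, "big")


estab = {"RDREC": 0x4040, "BUFFER": 0x0033, "BUFEND": 0x1033}   # hypothetical addresses
memory = bytearray(0x30)
memory[0x03:0x07] = bytes.fromhex("4B100000")     # 15 0003 CLOOP +JSUB RDREC
apply_m_record(memory, "M00000405+RDREC", 0, estab)
apply_m_record(memory, "M00002806+BUFEND", 0, estab)   # 190 0028 MAXLEN WORD BUFEND-BUFFER
apply_m_record(memory, "M00002806-BUFFER", 0, estab)
print(memory[0x03:0x07].hex().upper(), memory[0x28:0x2B].hex().upper())  # 4B104040 001000
```

The expression BUFEND-BUFFER needs two records, one adding and one subtracting, which is why the assembler emits 000000 for MAXLEN and leaves the arithmetic to the loader.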
External References in Expression
• Earlier definitions
– required all of the relative terms be paired in an
expression (an absolute expression), or that all
except one be paired (a relative expression)
• New restriction
– Both terms in each pair must be relative within
the same control section
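– Ex: BUFEND-BUFFER (legal: both symbols are relative within the same control section)
– Ex: RDREC-COPY (not legal: the symbols are in different control sections)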
• In general, the assembler cannot determine whether or
not the expression is legal at assembly time. This work will
be handled by a linking loader.
Assembler Design Options

One-pass assemblers
Multi-pass assemblers
Two-pass assembler with overlay structure
Two-Pass Assembler with Overlay
Structure
• For small memory
– pass 1 and pass 2 are never required at the same
time
– three segments
• root: driver program and shared tables and subroutines
• pass 1
• pass 2
– tree structure
– overlay program
One-Pass Assemblers
• Main problem
– forward references
• data items
• labels on instructions
• Solution
– data items: require all such areas be defined
before they are referenced
– labels on instructions: no good solution
One-Pass Assemblers
• Main Problem
– forward reference
• data items
• labels on instructions
• Two types of one-pass assembler
– load-and-go
• produces object code directly in memory for immediate
execution
– the other
• produces usual kind of object code for later execution
Load-and-go Assembler
• Characteristics
– Useful for program development and testing
– Avoids the overhead of writing the object program out
and reading it back
– Both one-pass and two-pass assemblers can be designed as load-and-go.
– However, a one-pass assembler also avoids the overhead of an additional pass over the source program
– For a load-and-go assembler, the actual address must be known at assembly time, so an absolute program can be used
Forward Reference in One-pass Assembler
• For any symbol that has not yet been defined
1. omit the address translation
2. insert the symbol into SYMTAB, and mark this symbol
undefined
3. the address that refers to the undefined symbol is added to
a list of forward references associated with the symbol table
entry
4. when the definition for a symbol is encountered, the proper address for the symbol is then inserted into any instructions previously generated, according to the forward reference list
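The four steps above can be sketched with a symbol table that keeps a list of addresses to patch. The class name `OnePassSymtab` and its methods are hypothetical; the sketch assumes a load-and-go assembler for standard SIC, where the operand field is the last two bytes of a 3-byte instruction and its top bit is the index flag.

```python
class OnePassSymtab:
    """Symbol table for a load-and-go assembler that patches forward references."""

    def __init__(self, memory):
        self.memory = memory
        self.defined = {}     # symbol -> address
        self.pending = {}     # undefined symbol -> addresses of operand fields to patch

    def reference(self, symbol, operand_addr):
        """Return the symbol's address, or 0 after recording a forward reference."""
        if symbol in self.defined:
            return self.defined[symbol]
        self.pending.setdefault(symbol, []).append(operand_addr)
        return 0                                   # step 1: address translation omitted

    def define(self, symbol, address):
        """Record a definition and patch every instruction that was waiting for it."""
        self.defined[symbol] = address
        for operand_addr in self.pending.pop(symbol, []):
            old = int.from_bytes(self.memory[operand_addr:operand_addr + 2], "big")
            new = (old & 0x8000) | address         # keep the index bit, fill in the address
            self.memory[operand_addr:operand_addr + 2] = new.to_bytes(2, "big")

    def undefined(self):
        """Symbols still unresolved, e.g. to report as errors at the end of the program."""
        return sorted(self.pending)


memory = bytearray(6)
table = OnePassSymtab(memory)
memory[0] = 0x48                                   # JSUB RDREC, RDREC not yet defined
memory[1:3] = table.reference("RDREC", 1).to_bytes(2, "big")
print(table.undefined())                           # ['RDREC']
table.define("RDREC", 0x203D)
print(memory[0:3].hex().upper())                   # 48203D
```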
Load-and-go Assembler (Cont.)
• At the end of the program
– any SYMTAB entries that are still marked with *
indicate undefined symbols
– search SYMTAB for the symbol named in the END
statement and jump to this location to begin
execution
• The actual starting address must be specified at
assembly time
• Example
– Figure 2.18, 2.19
Producing Object Code
• Used when external working-storage devices are not available or are too slow (for the intermediate file between the two passes)
• Solution:
– When the definition of a symbol is encountered, the assembler must generate another Text record with the correct operand address
– The loader is used to complete forward references that could
not be handled by the assembler
– The object program records must be kept in their original order
when they are presented to the loader
• Example: Figure 2.20
Multi-Pass Assemblers
• Restriction on EQU and ORG
– no forward references, since the symbols' values can't be defined during the first pass
• Example
– Use a linked list to keep track of the symbols whose values depend on an undefined symbol
• Example shown in Figure below
• Basic Loader functions
1) Design of an absolute loader
2) A simple bootstrap loader
Design of an Absolute Loader

• Absolute Program
– Advantage
• Simple and efficient
– Disadvantage
• the need for programmer to specify the actual address
• difficult to use subroutine libraries
• Program Logic
Fig. 3.2 Algorithm for an absolute loader

begin
  read Header record
  verify program name and length
  read first Text record
  while record type is not 'E' do
    begin
      {if object code is in character form, convert into internal representation}
      move object code to specified location in memory
      read next object program record
    end
  jump to address specified in End record
end
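The algorithm above translates fairly directly into Python. In this sketch the function name `absolute_load` is hypothetical, records are assumed to be in the column format described earlier (without the spaces added for readability in the COPY listing), memory is modeled as a bytearray, and the final jump is replaced by returning the entry address.

```python
def absolute_load(object_lines, memory):
    """Copy the Text records into memory and return the address from the End record."""
    header = object_lines[0]
    if header[0] != "H":
        raise ValueError("missing Header record")
    start, length = int(header[7:13], 16), int(header[13:19], 16)
    if start + length > len(memory):              # verify the program fits
        raise ValueError("program does not fit in memory")
    for record in object_lines[1:]:
        if record[0] == "E":
            return int(record[1:7], 16)           # address to jump to
        if record[0] == "T":
            addr = int(record[1:7], 16)
            count = int(record[7:9], 16)
            # convert the character form (two hex digits per byte) to bytes
            memory[addr:addr + count] = bytes.fromhex(record[9:9 + 2 * count])
    raise ValueError("missing End record")


memory = bytearray(0x3000)
entry = absolute_load(
    ["HCOPY  00100000107A", "T002073073820644C000005", "E001000"],
    memory,
)
print(hex(entry), memory[0x2073:0x207A].hex().upper())   # 0x1000 3820644C000005
```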
A Simple Bootstrap Loader
• Bootstrap Loader
– When a computer is first turned on or restarted, a special type of absolute loader, called a bootstrap loader, is executed
– This bootstrap loader loads the first program to be run by the computer, usually an operating system
• Example (SIC bootstrap loader)
– The bootstrap itself begins at address 0
– It loads the OS starting at address 0x80
– There is no header record or control information; the object code is loaded into consecutive bytes of memory
