SYSTEM SOFTWARE

Module 1
Part-2
ASSEMBLERS

Dr. C.K. SRINIVAS. Professor. DEPT OF CS&E, BITM, BALLARI



Introduction

[Figure: Assembly language program -> Assembler (with its data structures) -> Machine language equivalent]

An assembler is a program that accepts an assembly language program as input and produces
its machine language equivalent, along with information for the loader.

Basic Assembler Functions


Assembler directives:
Assembler directives are instructions given to the assembler itself while it translates
assembly language to machine language. They are pseudo-instructions: they are not
translated into machine operation codes.

1. WORD This assembler directive generates a one-word constant. It reserves one word of
storage (3 bytes) and initializes it to the value specified in the operand field of the
instruction.

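Example A WORD 10
10 (decimal) = 00000A (hex)

LOC VALUE
A 00
A+1 00
A+2 0A

(The most significant byte is stored first, so the bytes appear in memory in the same order as in the object code 00000A.)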

2. RESW This assembler directive instructs the assembler to reserve the indicated number of
words for a data area.
Example A RESW 3
3 words = 9 bytes of space are reserved

3. BYTE This assembler directive instructs the assembler to generate a character or
hexadecimal constant, occupying as many bytes as needed to represent the constant.

Example INPUT BYTE X'F1'


LOC VALUE
INPUT F1


STR BYTE C'HELLO'

LOC VALUE
STR H
STR+1 E
STR+2 L
STR+3 L
STR+4 O

Note: The ASCII values of the characters are stored in memory.

4. RESB This assembler directive instructs the assembler to reserve the indicated number of
bytes for a data area.
Example A RESB 3

LOC VALUE
A (B1)
A+1 (B2)
A+2 (B3)

5. START This assembler directive specifies the name and starting address of the program.

Example COPY START 1000

This indicates that the name of the program is COPY, and the 1000 (hexadecimal) in the operand
field indicates that the program should be loaded into memory starting at location 1000.

6. END This assembler directive indicates that the end of the source program has been
reached. An optional operand specifies the first instruction to be executed.

Example
COPY    START   1000
        -
FIRST   -
        -
        -
        END     FIRST


Write a SIC program that reads records from an input device (device code F1) and copies them to
an output device (device code 05).
The program contains three routines:
A main routine (lines 10 to 105),
which calls the two other routines, RDREC and WRREC, inside a loop (CLOOP).
A subroutine RDREC (lines 125 to 190),
which reads a record from the input device (identified by device code F1 on line
185) into the buffer.
A subroutine WRREC (lines 210 to 250),
which writes the record from the buffer to the output device.

Each subroutine (RDREC and WRREC) must transfer the record one character at a time
because the only I/O instructions available are RD and WD. The end of each record is marked with a null
character (hexadecimal 00). If a record is longer than the length of the buffer (4096 bytes),
only the first 4096 bytes are copied. The end of the file to be copied is indicated by a zero-length
record. When the end of file is detected, the program writes EOF on the output device and
terminates by executing an RSUB instruction. We assume that this program was called by the OS
using a JSUB instruction; thus, the RSUB returns control to the OS. The program starts at
address 1000.


A simple SIC Assembler


General format of an assembly language program for the SIC machine, shown together with the
generated object code:

Line no.   Location   Source statement (Label  Opcode  Operand)   Object code

The translation of a source program to object code requires the following functions:


1) Convert mnemonic operation codes to their machine language equivalents – e.g.,
translate STL to 14 (line 10)
2) Convert symbolic operands to their equivalent machine address – e.g., translate
RETADR to 1033 (line 10)
3) Build the machine instructions in the proper format (format3, …)
4) Convert the data constants specified in the source program into their internal
machine representations – e.g., EOF to 454F46 (line 80)
5) Write the object program and the assembly listing.
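
The first four functions can be sketched in a few lines of Python. This is only an illustrative sketch: the table contents come from the examples quoted in the list (STL = 14, RETADR = 1033, EOF = 454F46), and the helper names `assemble_sic` and `char_constant` are hypothetical, not part of any real assembler.

```python
# Hypothetical miniature tables; the values are the ones quoted in the text.
OPTAB = {"STL": 0x14}
SYMTAB = {"RETADR": 0x1033}

def assemble_sic(mnemonic, operand):
    """Return the 6-hex-digit object code of a standard SIC instruction."""
    opcode = OPTAB[mnemonic]           # function 1: mnemonic -> opcode
    address = SYMTAB[operand]          # function 2: symbol -> address
    word = (opcode << 16) | address    # function 3: 8-bit opcode, x = 0, 15-bit address
    return f"{word:06X}"

def char_constant(text):
    """Function 4: convert the characters of C'...' to their hex ASCII codes."""
    return text.encode("ascii").hex().upper()

print(assemble_sic("STL", "RETADR"))   # 141033
print(char_constant("EOF"))            # 454F46
```

The lookup of RETADR only works here because SYMTAB is already filled in; the forward-reference problem discussed next is exactly the case where it is not.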

Forward Reference:
The source program is translated to object code by processing it sequentially, one line
at a time.
With sequential processing, assembler functions 1, 3, 4 and 5 are straightforward. Function 2,
converting symbolic operands to their equivalent machine addresses, presents a problem.

E.g., consider the following code:


Line no.  Loc   Source statement        Object code
10        1000  FIRST   STL   RETADR    14----
.         .
.         .
.         .
95        ?     RETADR  RESW  1

The instruction at line 10 contains a forward reference: RETADR is defined later in the
program. If the program is translated line by line, the address of RETADR is not yet known
when line 10 is processed, so the instruction cannot be assembled. For this reason a
two-pass assembler is used.

First pass: scans the source program for label definitions and assigns addresses.
Second pass: performs most of the actual translation.
Object program: finally, the assembler must write the generated object code onto some output
device. The object program will later be loaded into memory for execution.


The simple object program format we use contains three types of records

1. Header record
2. Text record
3. End record

Header record

Col. 1 H
Col. 2-7 Program name
Col. 8-13 Starting address of object program (hex)
Col. 14-19 Length of object program in bytes (hex)

Example
H^COPY  ^001000^00107A
(we use the ^ symbol only to separate fields; it is not part of the record)
001000 is the starting address of the object program
00107A = length of object program in bytes = (last location) – (first location) + 1 = 2079 – 1000 + 1
(all in hexadecimal)

Text record

Col. 1 T
Col. 2-7 Starting address for object code in this record (hex)
Col. 8-9 Length of object code in this record in bytes (hex)
Col. 10-69 Object code, represented in hex (2 columns per byte)

Example
T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^00102A^0C1039^00102D
T^00101E^15^0C1036^…

End record

Col. 1 E
Col. 2-7 Address of first executable instruction in object program (hex)

Example
E^001000
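
The three record layouts can be reproduced with simple string formatting. The following Python sketch is an assumption-laden illustration (the function names are hypothetical); it keeps the ^ separators used in these notes, although a real object program would not contain them.

```python
def header_record(name, first_loc, last_loc):
    """H record: name in col. 2-7, start address, length = last - first + 1."""
    length = last_loc - first_loc + 1
    return f"H^{name:<6}^{first_loc:06X}^{length:06X}"

def text_record(start, codes):
    """T record: start address, length in bytes, then the object codes."""
    n_bytes = sum(len(code) for code in codes) // 2   # 2 hex digits per byte
    return f"T^{start:06X}^{n_bytes:02X}^" + "^".join(codes)

def end_record(first_exec):
    """E record: address of the first executable instruction."""
    return f"E^{first_exec:06X}"

codes = ["141033", "482039", "001036", "281030", "301015",
         "482061", "3C1003", "00102A", "0C1039", "00102D"]
print(header_record("COPY", 0x1000, 0x2079))   # H^COPY  ^001000^00107A
print(text_record(0x1000, codes))              # T^001000^1E^141033^...^00102D
print(end_record(0x1000))                      # E^001000
```

Ten 3-byte instructions give 30 bytes, which is the 1E in the length field of the first text record.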


The two passes of an assembler


Pass 1 (define symbols)
1. Assign addresses to all statements in the program.
2. Save the addresses assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives, including processing that affects address
assignment, such as determining the length of data areas defined by BYTE, RESW, etc.
Pass 2 (assemble instructions and generate object program)
1. Assemble instructions (translate operation codes and look up addresses).
2. Generate data values defined by BYTE, WORD, etc.
3. Perform processing of assembler directives not done during Pass 1.
4. Write the object program and the assembly listing.

Assembler algorithm and data structures


The simple assembler uses two major internal data structures:
1. The Operation Code Table (OPTAB) and
2. The Symbol Table (SYMTAB).

OPTAB is used to look up mnemonic operation codes and translate them to their machine
language equivalents. OPTAB must contain (at least) the mnemonic operation code and
its machine language equivalent. In more complex assemblers, this table also contains
information about instruction format and length.

The format of OPTAB is as follows:

Mnemonic operation code    Machine language equivalent    Instruction format    Length


During Pass 1, OPTAB is used to look up and validate operation codes in the source
program.
In Pass 2, it is used to translate the operation codes to machine language.
OPTAB is usually organized as a hash table, with mnemonic operation code as the key.
In most cases, OPTAB is a static table – that is, entries are not normally added to or
deleted from it.

SYMTAB is used to store the values (addresses) assigned to labels. It includes the name and
value (address) of each label in the source program, together with flags to indicate error
conditions (e.g., a symbol defined in two different places).
The format of SYMTAB is as follows:

Name    Address    Flag    Other information

During Pass 1, labels are entered into SYMTAB as they are encountered in the source
program, along with their assigned addresses (from LOCCTR).
During Pass 2, symbols used as operands are looked up in SYMTAB to obtain the
addresses to be inserted in the assembled instruction. SYMTAB is usually organized as a
hash table for efficiency of insertion and retrieval.
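
As a rough illustration, both tables map naturally onto Python dictionaries, which are hash tables. The sketch below is an assumption, not a prescribed layout: the three OPTAB entries are the opcodes used in the example program later in these notes, and `define_symbol` is a hypothetical helper.

```python
# OPTAB is static: it is built once and never modified during assembly.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "DIV": 0x24}

def define_symbol(symtab, label, locctr):
    """Pass 1: enter a label with the current LOCCTR, flagging duplicates."""
    if label in symtab:
        symtab[label]["flag"] = "duplicate symbol"
        return False
    symtab[label] = {"address": locctr, "flag": None}
    return True

symtab = {}
define_symbol(symtab, "FIRST", 0x0100)
define_symbol(symtab, "NUM1", 0x0109)
define_symbol(symtab, "FIRST", 0x0200)      # second definition is rejected

print(f"{symtab['NUM1']['address']:04X}")   # Pass 2 lookup: 0109
print(symtab["FIRST"])                      # address stays 0x0100, flag is set
```

Keeping the first address and only setting the flag mirrors the "set error flag (duplicate symbol)" step of Pass 1.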

A Location Counter (LOCCTR) is a variable used to help in the assignment of
addresses. LOCCTR is initialized to the beginning address specified in the START statement.
After each source statement is processed, the length of the assembled instruction or data
area is added to LOCCTR. Whenever a label in the source program is read, the current value of
LOCCTR gives the address to be associated with that label. There is certain information (such
as location counter values and error flags for statements) that can or should be communicated
between the two passes. For this reason, Pass 1 usually writes an intermediate file that
contains each source statement together with its assigned address, error indicators, etc.
This file is used as the input to Pass 2.

Example Program

Line no.  Location  Source statement       Object code
10        0100              START   0100
20        0100      FIRST   LDA     NUM1   000109
30        0103              DIV     NUM2   24010C
40        0106              STA     NUM    0C010F
50        0109      NUM1    WORD    85     000055  (85 decimal = 55 hex)
60        010C      NUM2    WORD    05     000005
70        010F      NUM     RESW    1
                            END

SYMTAB
NAME    ADDRESS
FIRST   0100
NUM1    0109
NUM2    010C
NUM     010F

Figures 2.4 (a) and (b) show the logic flow of the two passes of our assembler.
Algorithm Pass 1
begin
  read first input line
  if OPCODE = 'START' then
    begin
      save #[OPERAND] as starting address
      initialize LOCCTR to starting address
      write line to intermediate file
      read next input line
    end {if START}
  else
    initialize LOCCTR to 0
  while OPCODE != 'END' do
    begin
      if this is not a comment line then
        begin
          if there is a symbol in the LABEL field then
            begin
              search SYMTAB for LABEL
              if found then
                set error flag (duplicate symbol)
              else
                insert (LABEL, LOCCTR) into SYMTAB
            end {if symbol}
          search OPTAB for OPCODE
          if found then
            add 3 {instruction length} to LOCCTR
          else if OPCODE = 'WORD' then
            add 3 to LOCCTR
          else if OPCODE = 'RESW' then
            add 3 * #[OPERAND] to LOCCTR
          else if OPCODE = 'RESB' then
            add #[OPERAND] to LOCCTR
          else if OPCODE = 'BYTE' then
            begin
              find length of constant in bytes
              add length to LOCCTR
            end {if BYTE}
          else
            set error flag (invalid operation code)
        end {if not a comment}
      write line to intermediate file
      read next input line
    end {while not END}
  write last line to intermediate file
  save (LOCCTR - starting address) as program length
end {Pass 1}

fig 2.4(a)
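
The Pass 1 logic above can be tried out with a small Python sketch. It makes several simplifying assumptions that are not part of the algorithm: source lines are already split into (label, opcode, operand) tuples, comment lines are absent, the intermediate file is an in-memory list, and OPTAB holds only the three opcodes of the example program. The names `pass1`, `byte_length` and `PROGRAM` are hypothetical.

```python
OPTAB = {"LDA": 0x00, "DIV": 0x24, "STA": 0x0C}

def byte_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else (len(body) + 1) // 2

def pass1(lines):
    symtab, intermediate, errors = {}, [], []
    start = locctr = 0
    source = iter(lines)                      # assumes an END line is present
    label, opcode, operand = next(source)
    if opcode == "START":
        start = locctr = int(operand, 16)     # START operand is hexadecimal
        intermediate.append((locctr, label, opcode, operand))
        label, opcode, operand = next(source)
    while opcode != "END":
        here = locctr                         # address of this statement
        if label:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
        intermediate.append((here, label, opcode, operand))
        label, opcode, operand = next(source)
    intermediate.append((locctr, label, opcode, operand))
    return symtab, intermediate, locctr - start, errors

PROGRAM = [
    ("", "START", "0100"), ("FIRST", "LDA", "NUM1"), ("", "DIV", "NUM2"),
    ("", "STA", "NUM"), ("NUM1", "WORD", "85"), ("NUM2", "WORD", "05"),
    ("NUM", "RESW", "1"), ("", "END", ""),
]
symtab, intermediate, length, errors = pass1(PROGRAM)
print({name: f"{addr:04X}" for name, addr in symtab.items()})
print(f"program length = {length:X} (hex)")
```

Run on the example program, this yields the same SYMTAB as the table shown earlier (FIRST 0100, NUM1 0109, NUM2 010C, NUM 010F) and a program length of 12 hex bytes.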


Algorithm Pass 2

begin
  read first input line {from intermediate file}
  if OPCODE = 'START' then
    begin
      write listing line
      read next input line
    end {if START}
  write Header record to object program
  initialize first Text record
  while OPCODE != 'END' do
    begin
      if this is not a comment line then
        begin
          search OPTAB for OPCODE
          if found then
            begin
              if there is a symbol in OPERAND field then
                begin
                  search SYMTAB for OPERAND
                  if found then
                    store symbol value as operand address
                  else
                    begin
                      store 0 as operand address
                      set error flag (undefined symbol)
                    end
                end {if symbol}
              else
                store 0 as operand address
              assemble the object code instruction
            end {if opcode found}
          else if OPCODE = 'BYTE' or 'WORD' then
            convert constant to object code
          if object code will not fit into the current Text record then
            begin
              write Text record to object program
              initialize new Text record
            end
          add object code to Text record
        end {if not comment}
      write listing line
      read next input line
    end {while not END}
  write last Text record to object program
  write End record to object program
  write last listing line
end {Pass 2}
fig 2.4(b)
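
A matching Python sketch of Pass 2 is given below. It is a simplified illustration under stated assumptions: the intermediate file is a list of (location, label, opcode, operand) tuples, no listing is written, the ^ separators of these notes are kept, and a Text record is also closed at RESW/RESB (a common convention that the pseudocode above does not spell out). `pass2`, `INTERMEDIATE` and `SYMTAB` are hypothetical names; the data is the example program from earlier.

```python
OPTAB = {"LDA": 0x00, "DIV": 0x24, "STA": 0x0C}

def pass2(intermediate, symtab, name, length):
    errors = []
    start = intermediate[0][0]
    records = [f"H^{name:<6}^{start:06X}^{length:06X}"]
    text_start, text_codes = None, []

    def flush():
        """Write the current Text record (if any) and start a new one."""
        nonlocal text_start, text_codes
        if text_codes:
            n_bytes = sum(len(c) for c in text_codes) // 2
            records.append(f"T^{text_start:06X}^{n_bytes:02X}^" + "^".join(text_codes))
        text_start, text_codes = None, []

    for loc, label, opcode, operand in intermediate:
        code = ""
        if opcode in OPTAB:
            address = 0
            if operand:
                symbol, _, index = operand.partition(",")
                if symbol in symtab:
                    address = symtab[symbol]
                else:
                    errors.append(f"undefined symbol {symbol}")
                if index.strip() == "X":
                    address |= 0x8000          # set the index bit x
            code = f"{OPTAB[opcode]:02X}{address:04X}"
        elif opcode == "WORD":
            code = f"{int(operand) & 0xFFFFFF:06X}"
        elif opcode == "BYTE":
            body = operand[2:-1]
            code = body.encode("ascii").hex().upper() if operand[0] == "C" else body
        elif opcode in ("RESW", "RESB"):
            flush()                            # reserved storage leaves a gap
        if code:
            if sum(len(c) for c in text_codes) + len(code) > 60:
                flush()                        # record holds at most 30 bytes
            if text_start is None:
                text_start = loc
            text_codes.append(code)
    flush()
    first_exec = symtab.get(intermediate[-1][3], start)   # END operand, if any
    records.append(f"E^{first_exec:06X}")
    return records, errors

SYMTAB = {"FIRST": 0x100, "NUM1": 0x109, "NUM2": 0x10C, "NUM": 0x10F}
INTERMEDIATE = [
    (0x100, "", "START", "0100"), (0x100, "FIRST", "LDA", "NUM1"),
    (0x103, "", "DIV", "NUM2"), (0x106, "", "STA", "NUM"),
    (0x109, "NUM1", "WORD", "85"), (0x10C, "NUM2", "WORD", "05"),
    (0x10F, "NUM", "RESW", "1"), (0x112, "", "END", ""),
]
records, errors = pass2(INTERMEDIATE, SYMTAB, "", 0x12)
print("\n".join(records))
```

The single Text record produced contains the object codes 000109, 24010C, 0C010F, 000055 and 000005 from the example listing.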


The source lines input to this algorithm are assumed to be in a fixed format with fields LABEL,
OPCODE, and OPERAND. If one of these fields contains a character string that represents a
number, we denote its numeric value with the prefix # (for example, #[OPERAND]).

Machine-Dependent Assembler Features


Fig 2.5 shows the example program from Fig 2.1 rewritten using the SIC/XE instruction set.

Prefixes to operands:
@ --> indirect addressing;
# --> immediate operands;
+ --> extended instruction format.

Instructions that refer to memory are normally assembled using either the program-counter
relative or the base relative mode. The assembler directive BASE (Fig 2.5, line 13) is
used in conjunction with base relative addressing. The main differences between Fig 2.5
(SIC/XE) and Fig 2.1 (SIC) involve the use of register-to-register instructions (lines 150, 165).
In addition, immediate addressing and indirect addressing have been used as much as possible
(lines 25, 55, and 70).

Key points of this subsection:


The translation of the source program, and the handling of different instruction formats
and different addressing modes.
Note that the START statement (assembler directive) specifies a beginning program
address of 0.
Translation of register-to-register instructions (such as CLEAR on line 125 and COMPR on
line 150):
The assembler simply converts the mnemonic operation code to machine language
(using OPTAB) and changes each register mnemonic to its numeric equivalent.


Register-to-memory instructions are assembled using either program-counter relative or
base relative addressing. In either case, the assembler must calculate a displacement to be
assembled as part of the object instruction.

Note that
The displacement is computed so that the correct target address results when the displacement
is added to the contents of the program counter (PC) or the base register (B).
The resulting displacement must be small enough to fit in the 12-bit field of the
instruction. This means that the displacement must be between 0 and 4095 (for base relative
mode) or between –2048 and +2047 (for program-counter relative mode). If neither
program-counter relative nor base relative addressing can be used (because the displacements
are too large), then the 4-byte extended instruction format (with a 20-bit address field) must
be used.

Example:

15 0006 CLOOP +JSUB RDREC 4B101036


(Bit e is set to 1 to indicate the extended instruction format.)
Note that the programmer must specify the extended format by using the prefix + (line 15). If
the extended format is not specified, the assembler first attempts to translate the instruction
using program-counter relative addressing.

If this is not possible (the displacement is out of range), the assembler then attempts to use
base relative addressing. If neither form is applicable and the extended format is not specified,
then the instruction cannot be properly assembled and the assembler must generate an error message.
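
The order of attempts described above (PC relative first, then base relative, otherwise an error) can be sketched as follows. The function name `choose_displacement` is hypothetical; `base=None` stands for the situation where no BASE directive is in effect. The sample values are the ones used in the worked examples of this section.

```python
def choose_displacement(target, pc, base=None):
    """Return (mode, 12-bit displacement) for a format 3 instruction.

    pc is the address of the NEXT instruction; base is the assumed
    content of register B, or None if base relative cannot be used.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF          # 12-bit 2's complement
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    raise ValueError("displacement out of range: extended format (+) required")

mode, disp = choose_displacement(0x0030, 0x0003)            # STL RETADR
print(mode, f"{disp:03X}")                                  # pc 02D
mode, disp = choose_displacement(0x0006, 0x001A)            # J CLOOP
print(mode, f"{disp:03X}")                                  # pc FEC
mode, disp = choose_displacement(0x0036, 0x1051, 0x0033)    # STCH BUFFER,X
print(mode, f"{disp:03X}")                                  # base 003
```

For the third call the PC relative displacement would be far below –2048, so the function falls back to base relative addressing, as the assembler does for line 160.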

Example:
The displacement calculation for program-counter relative and base relative addressing modes.

A typical example of program-counter relative assembly:

10 0000 FIRST STL RETADR 17202D

Note that the program counter is advanced after each instruction is fetched and before it
is executed.
While STL is being executed, the PC contains the address of the next instruction (0003).
RETADR (line 95) is assigned the address 0030.
The displacement needed in the instruction is 30 – 3 = 2D (hex); that is, target address = (PC)
+ disp = 3 + 2D = 30.
Bits n and i are both 1 (simple addressing) and bit p = 1 indicates PC relative addressing,
making the last 2 bytes of the instruction 202D.

Solution

10 0000 FIRST STL RETADR

opcode   n i x b p e   disp
000101   1 1 0 0 1 0   0000 0010 1101
= 0001 0111 0010 0000 0010 1101 = 17202D


Another example of PC relative addressing:

40 0017 J CLOOP 3F2FEC

The operand address is CLOOP = 0006; during execution of the instruction, PC = 001A. Thus the
displacement is 6 – 1A = –14 (hex), which is FEC as a 12-bit 2's complement number.

Solution

opcode   n i x b p e   disp
001111   1 1 0 0 1 0   1111 1110 1100
= 0011 1111 0010 1111 1110 1100 = 3F2FEC


Base relative

The displacement calculation process for base relative addressing is much the same as for
PC relative addressing. The main difference is that the assembler knows what the contents of the
PC will be at execution time, whereas the base register is under the control of the
programmer. Therefore, the programmer must tell the assembler what the base register will
contain during execution of the program so that the assembler can compute displacements. This
is done in our example with the assembler directive BASE (line 13). In some cases, the
programmer can use another assembler directive, NOBASE, to inform the assembler that the
contents of the base register can no longer be relied upon for addressing.

Example for base relative assembly:

160 104E STCH BUFFER,X 57C003


According to the BASE statement, register B = 0033 (the address of LENGTH) during
execution.
The address of BUFFER is 0036.
Thus the displacement in the instruction must be 36 – 33 = 3.
Note that bits x and b are set to 1 to indicate indexed and base relative addressing.

Solution

opcode   n i x b p e   disp
010101   1 1 1 1 0 0   0000 0000 0011
= 0101 0111 1100 0000 0000 0011 = 57C003
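
The way the opcode, the flag bits n, i, x, b, p, e and the displacement are packed into a format 3 instruction can be checked with a short Python sketch. The function `format3` is a hypothetical helper; STL = 14 is given earlier in these notes, while J = 3C and STCH = 54 are the standard SIC/XE opcodes assumed here (they agree with the object codes shown above).

```python
def format3(opcode, n, i, x, b, p, disp):
    """Pack a 3-byte SIC/XE instruction: 6-bit opcode, nixbpe, 12-bit disp."""
    flags = (n << 5) | (i << 4) | (x << 3) | (b << 2) | (p << 1)   # e = 0
    word = ((opcode >> 2) << 18) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"

# line 10: STL RETADR, PC relative, disp = 02D
print(format3(0x14, n=1, i=1, x=0, b=0, p=1, disp=0x02D))   # 17202D
# line 40: J CLOOP, PC relative, disp = -14 hex
print(format3(0x3C, n=1, i=1, x=0, b=0, p=1, disp=-0x14))   # 3F2FEC
# line 160: STCH BUFFER,X, indexed and base relative, disp = 003
print(format3(0x54, n=1, i=1, x=1, b=1, p=0, disp=0x003))   # 57C003
```

Only the upper 6 bits of the 8-bit opcode are stored; the two low-order bit positions are taken over by n and i, which is why STL (14) shows up as 17 in the object code.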


Immediate addressing mode:


To assemble an instruction with immediate addressing, the assembler converts the immediate
operand to its internal representation and inserts it into the instruction.

Example 1:
55 0020 LDA #3 010003

1) The operand stored in the instruction is 003.
2) Bit i = 1 (with n = 0) indicates immediate addressing.

Example 2:

133 103C +LDT #4096 75101000

1) In this case, the operand (4096) is too large to fit into the 12-bit displacement field, so
the extended instruction format is called for. (If the operand were too large even for the 20-bit
address field, immediate addressing could not be used.)
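
Both immediate examples can be reproduced with the following Python sketch. `assemble_immediate` is a hypothetical helper, and the opcodes LDA = 00 and LDT = 74 are the standard SIC/XE values assumed here (consistent with the object codes above). The `extended` argument models the + prefix, which the programmer has to write explicitly.

```python
def assemble_immediate(opcode, value, extended=False):
    """Assemble an instruction with a numeric immediate operand (n = 0, i = 1)."""
    if extended:
        if not 0 <= value < (1 << 20):
            raise ValueError("operand does not fit in the 20-bit address field")
        flags = 0b010001                       # n=0 i=1 x=0 b=0 p=0 e=1
        word = ((opcode >> 2) << 26) | (flags << 20) | value
        return f"{word:08X}"                   # 4-byte format 4
    if not 0 <= value <= 4095:
        raise ValueError("operand does not fit in 12 bits; use the + prefix")
    flags = 0b010000                           # n=0 i=1 x=0 b=0 p=0 e=0
    word = ((opcode >> 2) << 18) | (flags << 12) | value
    return f"{word:06X}"                       # 3-byte format 3

print(assemble_immediate(0x00, 3))                     # LDA #3     -> 010003
print(assemble_immediate(0x74, 4096, extended=True))   # +LDT #4096 -> 75101000
```

Calling `assemble_immediate(0x74, 4096)` without `extended=True` raises an error, which corresponds to the assembler rejecting `LDT #4096` when the + prefix is missing.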


Immediate & PC-relative addressing


A different way of using immediate addressing is shown in the instruction

12 0003 LDB #LENGTH 69202D

a. The immediate operand is the symbol LENGTH.


b. Since the value of this symbol is the address assigned to it, this immediate
instruction has the effect of loading register B with the address of
LENGTH.
c. Note that we have combined PC relative addressing with immediate
addressing (PC = 0006, LENGTH = 0033,
disp = 0033 – 0006 = 002D).

Indirect & PC-relative addressing


Different addressing modes may be combined. For example, line 70 shows a statement
that combines PC relative and indirect addressing.

Register-to-register instructions

The design of the assembler depends on the instruction formats and addressing modes used. The
SIC/XE machine has a specific instruction format for register-to-register instructions.

E.g., consider the statement COMPR A, S

It uses the 2-byte format 2:

opcode (8 bits)   r1 (4 bits)   r2 (4 bits)

For the above statement:

Convert the register mnemonics to their numeric equivalents, i.e. the register names (A, X, L, B,
S, T, F, PC, SW) to their values (0, 1, 2, 3, 4, 5, 6, 8, 9). This may be implemented with a
separate table or by preloading the register names and values into SYMTAB.

Ex:
Line  Source statement   opcode  r1  r2   Object code
125   CLEAR X            B4      1   0    B410
150   COMPR A, S         A0      0   4    A004
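
A minimal Python sketch of this translation is shown below, using a separate register table (one of the two options mentioned above). `format2` is a hypothetical helper; the register numbers and the opcodes B4 and A0 are the ones given in the text.

```python
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6, "PC": 8, "SW": 9}

def format2(opcode, r1, r2=None):
    """Assemble a 2-byte register-to-register instruction."""
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 is not None else 0   # unused r2 field is 0
    return f"{opcode:02X}{n1:X}{n2:X}"

print(format2(0xB4, "X"))        # CLEAR X     -> B410
print(format2(0xA0, "A", "S"))   # COMPR A, S  -> A004
```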

1. Generate the complete object program for the following assembly language program

SUM     START   4000
FIRST   LDX     ZERO
        LDA     ZERO
LOOP    ADD     TABLE, X
        TIX     COUNT
        JLT     LOOP
        STA     TOTAL
        RSUB
TABLE   RESW    2000
COUNT   RESW    1
ZERO    WORD    0
TOTAL   RESW    1
        END     FIRST

Assume the opcodes below (all in hexadecimal)

LDX – 04   LDA – 00   ADD – 18   TIX – 2C   JLT – 38   STA – 0C   RSUB – 4C

Solution

Location  Label   Opcode  Operand    Object code
4000      SUM     START   4000
4000      FIRST   LDX     ZERO       045788
4003              LDA     ZERO       005788
4006      LOOP    ADD     TABLE, X   18C015
4009              TIX     COUNT      2C5785
400C              JLT     LOOP       384006
400F              STA     TOTAL      0C578B
4012              RSUB               4C0000
4015      TABLE   RESW    2000
5785      COUNT   RESW    1
5788      ZERO    WORD    0          000000
578B      TOTAL   RESW    1
                  END     FIRST

Symbol Table
SYMBOL ADDRESS
SUM 4000
FIRST 4000
LOOP 4006
TABLE 4015
COUNT 5785
ZERO 5788
TOTAL 578B
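
The listing above gives the object code of each statement; the object program itself still has to be written as Header, Text and End records. The Python sketch below derives those records from the listing. It is an illustration under assumptions: `object_program` is a hypothetical helper, the ^ separators are those of these notes, the program is taken to end at 578E (TOTAL at 578B plus one word), and a new Text record is started wherever reserved storage leaves a gap in the addresses.

```python
LISTING = [   # (location, object code) pairs taken from the solution table
    (0x4000, "045788"), (0x4003, "005788"), (0x4006, "18C015"),
    (0x4009, "2C5785"), (0x400C, "384006"), (0x400F, "0C578B"),
    (0x4012, "4C0000"), (0x5788, "000000"),
]

def object_program(name, start, end, first_exec, listing):
    records = [f"H^{name:<6}^{start:06X}^{end - start:06X}"]
    groups = []                                   # [start address, [codes]]
    for loc, code in listing:
        if groups:
            g_start, codes = groups[-1]
            size = sum(len(c) for c in codes) // 2
            # extend the record only if contiguous and within 30 bytes
            if loc == g_start + size and size + len(code) // 2 <= 30:
                codes.append(code)
                continue
        groups.append([loc, [code]])
    for g_start, codes in groups:
        size = sum(len(c) for c in codes) // 2
        records.append(f"T^{g_start:06X}^{size:02X}^" + "^".join(codes))
    records.append(f"E^{first_exec:06X}")
    return records

for record in object_program("SUM", 0x4000, 0x578E, 0x4000, LISTING):
    print(record)
```

Two checks on the listing itself: TABLE RESW 2000 reserves 6000 bytes = 1770 hex, so COUNT is at 4015 + 1770 = 5785; and in ADD TABLE, X the index bit adds 8000 to the address 4015, giving the C015 in 18C015.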

Program Relocation
a) It is often desirable to load and run several programs at the same time.
b) The system must be able to load programs into memory wherever there is room.
c) The exact starting address of a program is therefore not known until load time.

An absolute program or absolute assembly

a) A program whose starting address is specified at assembly time.


b) Its addresses may be invalid if the program is loaded anywhere else.


c) All SIC programs are absolute assembly programs.

Example: 55 101B LDA THREE 00102D.

In the object program (Fig 2.3), this statement is translated as 00102D, specifying that register A
is to be loaded from memory address 102D. Suppose we attempt to load and execute the
program at address 2000 instead of address 1000. Address 102D will then not contain the value
that we expect. In reality, the assembler does not know the actual location where the program
will be loaded. However, the assembler can identify for the loader those parts of the object
program that need modification. An object program that contains the information necessary to
perform this kind of modification is called a relocatable program.

Relocatable program
An object program that contains modification records is called a relocatable program. SIC/XE
programs are relocatable; they are assembled using a starting address of 0000.

Consider this instruction from the SIC/XE program:

15 0006 CLOOP +JSUB RDREC 4B101036

Fig 2.7 shows the program loaded at three different starting addresses (0000, 5000, 7420). In the
instruction "+JSUB RDREC", the address of RDREC is correspondingly 1036 (loaded at 0000),
6036 (loaded at 5000) or 8456 (loaded at 7420).

[Figures 2.7(a), 2.7(b), 2.7(c): the program loaded at 0000, 5000 and 7420]


How should the address of RDREC be modified for different load addresses?

The solution to the relocation problem:

When the assembler generates the object code for the JSUB instruction, it inserts the address of
RDREC relative to the start of the program. (This is the reason we initialized the location counter
to 0 for the assembly.)
The assembler also produces a command for the loader, instructing it to add the beginning
address of the program to the address field in the JSUB instruction at load time. This command
is a modification record, whose format is given at the end of this section. Note that the length
field of a modification record is stored in half-bytes (rather than bytes) because the address
field to be modified may not occupy an integral number of bytes.
For example, the address field in the +JSUB instruction occupies 20 bits. The starting location
field of a modification record is the location of the byte containing the leftmost bits of the
address field to be modified. If this address field occupies an odd number of half-bytes, it is
assumed to begin in the middle of the first byte at the starting location.

Example:
The modification record for the +JSUB instruction is "M00000705". This record
specifies that the beginning address of the program is to be added to a field that begins at address
000007 (relative to the start of the program) and is 5 half-bytes in length. Thus in the assembled
instruction 4B101036, the first 12 bits (4B1) remain unchanged, and the program load address
is added to the last 20 bits (01036) to produce the correct operand address. Besides line 15, only
lines 35 and 65 in Fig 2.6 need to be relocated in the same way. The rest of the instructions in
the program need not be modified when the program is loaded.

In some cases, this is because the instruction operand is not a memory address at all (e.g.,
CLEAR S or LDA #3). In other cases, no modification is needed because the operand is
specified using PC relative or base relative addressing. The only parts of the program
that require modification at load time are those that specify direct (as opposed to relative)
addresses. Fig 2.8 shows the complete object program corresponding to the source program of
Fig 2.5.
Modification record
Col. 1 M
Col. 2-7 Starting location of the address field to be modified, relative to the beginning of the
program (Hex)
Col. 8-9 Length of the address field to be modified, in half-bytes (Hex)
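
What the loader does with a modification record can be sketched in Python as follows. This is an illustrative sketch, not an actual loader: `relocate` is a hypothetical helper, the program image is a `bytearray` indexed from the start of the program, and only the +JSUB instruction at location 0006 is filled in.

```python
def relocate(image, record, load_address):
    """Apply one modification record (e.g. 'M00000705') to a program image."""
    start = int(record[1:7], 16)             # col. 2-7: start of address field
    half_bytes = int(record[7:9], 16)        # col. 8-9: field length in half-bytes
    n_bytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[start:start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1       # low-order half-bytes to modify
    updated = (field & ~mask) | ((field + load_address) & mask)
    image[start:start + n_bytes] = updated.to_bytes(n_bytes, "big")

image = bytearray(10)
image[6:10] = bytes.fromhex("4B101036")      # +JSUB RDREC at location 0006

relocate(image, "M00000705", 0x5000)
print(image[6:10].hex().upper())             # 4B106036
```

With a field length of 5 half-bytes the mask covers 20 bits, so the leading half-byte of the byte at 000007 (the flag bits) is left untouched, as the text requires for an odd number of half-bytes.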


Questions

1. What are the functions of an assembler?
2. Define assembler directive. Explain the SIC assembler directives with examples.
3. Show the structure of a header record, text record, end record and modification record,
taking one example for each.
4. Briefly explain the data structures required for a simple assembler.
5. Describe how the symbol table and the operation code table are used in a two-pass assembler,
with an example.
6. List the various assembler features that are machine-dependent. Explain any one.
7. Write the Pass 1 and Pass 2 algorithms of an assembler.
8. Explain the need for program relocation. Explain how it is implemented.
9. Generate the complete object program for the following assembly language program

SUM     START   0000
FIRST   CLEAR   X
        LDA     #0
        +LDB    #TOTAL
        BASE    TOTAL
LOOP    ADD     TABLE, X
        TIX     COUNT
        JLT     LOOP
        STA     TOTAL
COUNT   RESW    1
TABLE   RESW    2000
TOTAL   RESW    1
        END     FIRST

Assume the opcodes below (all in hexadecimal)

CLEAR – B4   LDA – 00   LDB – 68   ADD – 18   TIX – 2C   JLT – 38   STA – 0C
