Assemblers II
C.K. SRINIVAS Asst.Prof. DEPT OF CSE BITM, BELLARY
Chapter 3 ASSEMBLERS-II
Literals
It is often convenient for the programmer to be able to write the value of a constant operand as part of the instruction that uses it. This avoids having to define the constant elsewhere in the program and make up a label for it. Such an operand is called a literal.
A literal is identified with the prefix =, followed by a specification of the literal value.
In the example program, the literal =C'EOF' is a 3-byte operand whose value is the character string EOF. The object code for the instruction that uses it contains the relative displacement of the location where this value is stored. In the example the value is at location 002D and hence the displacement value is 010.
As another example, the statement below uses a 1-byte literal with the hexadecimal value 05:
:
LDA =X'05'
:
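The displacement 010 can be checked with a short calculation. Assuming a SIC/XE format-3 instruction with PC-relative addressing, the displacement is the target address minus PC, where PC holds the address of the next instruction. Working backwards, 002D - 010 = 001D, so the 3-byte instruction starts at 001A. This is a minimal sketch; the function name `pc_relative_disp` is hypothetical.

```python
def pc_relative_disp(instr_addr: int, target: int) -> int:
    """Return the 12-bit PC-relative displacement for a 3-byte instruction."""
    pc = instr_addr + 3            # PC holds the address of the next instruction
    disp = target - pc
    if not -2048 <= disp <= 2047:  # signed 12-bit range of a format-3 displacement
        raise ValueError("target out of PC-relative range")
    return disp & 0xFFF            # negative values become 12-bit two's complement


disp = pc_relative_disp(0x001A, 0x002D)   # literal =C'EOF' stored at 002D
print(f"{disp:03X}")                      # 010
```

If the literal were out of the PC-relative range, a real SIC/XE assembler would fall back to base-relative addressing or the extended format; this sketch only reports the error.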
All of the literal operands used in a program are gathered together into one or more literal pools.
Normally literals are placed into a pool at the end of the program. In some cases, it is desirable to
place literals into a pool at some other location in the object program.
When the assembler encounters an LTORG statement, it creates a literal pool that contains all of
the literal operands used since the previous LTORG (or the beginning of the program). This
literal pool is placed in the object program at the location where the LTORG directive was
encountered.
Of course, literals placed in a pool by LTORG will not be repeated in the pool at the end of the
program. If we had not used the LTORG statement, the literal =C'EOF' would be placed in the
pool at the end of the program. Most assemblers recognize duplicate literals – that is, the same
literal used in more than one place in the program – and store only one copy of the specified data
value.
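Format of LITTAB
Each LITTAB entry contains the literal name, the operand value, the length of the operand, and the address assigned to the operand when it is placed in a literal pool.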
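During pass 1, the assembler searches LITTAB for the specified literal name (or value). If the literal is already present in the table, no action is needed. If it is not present, the literal is added to LITTAB (leaving the address unassigned). When an LTORG statement or the end of the program is encountered, the assembler scans LITTAB and assigns an address to each literal that does not yet have one, advancing the location counter by the length of each literal.
During pass 2, the operand address is obtained by searching LITTAB for each literal operand encountered. The assembler also generates a Modification record for any literal that represents an address in the program.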
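The LITTAB processing described above can be sketched in a few lines of Python. This is a minimal sketch, not the textbook's implementation: the class and method names (`Littab`, `pass1_reference`, `place_pool`, `pass2_address`) are hypothetical, only C and X literals are handled, and the pool addresses 002D and 1076 are taken from the LITTAB shown for the example program.

```python
def literal_bytes(lit: str) -> bytes:
    """Decode a literal such as =C'EOF' or =X'05' into its data bytes."""
    kind, body = lit[1], lit[3:-1]          # skip "=", type letter, opening quote
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal {lit}")


class Littab:
    def __init__(self):
        # literal name -> {"value": hex string, "length": bytes, "address": int or None}
        self.entries = {}

    def pass1_reference(self, lit: str) -> None:
        # Duplicate literals are stored only once.
        if lit not in self.entries:
            data = literal_bytes(lit)
            self.entries[lit] = {"value": data.hex().upper(), "length": len(data), "address": None}

    def place_pool(self, locctr: int) -> int:
        # Called at LTORG or END: give every unplaced literal an address
        # and return the advanced location counter.
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

    def pass2_address(self, lit: str) -> int:
        return self.entries[lit]["address"]


lt = Littab()
lt.pass1_reference("=C'EOF'")
lt.pass1_reference("=C'EOF'")        # duplicate: no new entry
locctr = lt.place_pool(0x002D)       # LTORG reached with LOCCTR = 002D
lt.pass1_reference("=X'05'")
lt.place_pool(0x1076)                # END: remaining literals go in the final pool
print(lt.entries)
```

Because `place_pool` skips entries that already have an address, a literal placed by LTORG is not repeated in the pool at the end of the program.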
Symbol-Defining Statements
Most assemblers provide an assembler directive that allows the programmer to define symbols and specify their values. The directive used for this is EQU (Equate).
The general form of the statement is
symbol EQU value
This statement defines the given symbol (i.e., enters it in SYMTAB) and assigns to it the value specified. The value can be a constant or an expression involving constants and any other symbols that are already defined. One common usage is to define symbolic names that can be used to improve readability in place of numeric values.
For example
+LDT #4096
This loads register T with the immediate value 4096, but it does not make clear what this value represents. If the statements are written as:
MAXLEN EQU 4096
+LDT #MAXLEN
then it is clear that the value of MAXLEN is some maximum length.
When the assembler encounters the EQU statement, it enters the symbol MAXLEN along with its value in the symbol table. When assembling the LDT instruction, the assembler searches SYMTAB for MAXLEN and uses its value as the operand in the instruction. The object code generated is the same for both options, but the second is easier to understand. It is also easier to modify: if the maximum length is changed from 4096 to 1024 and the value is written as an immediate operand wherever it is required, we have to scan the whole program and change every place where 4096 is used. If the value is referenced through the symbol defined by EQU, we need not search the whole program; we change only the value of MAXLEN in the EQU statement (only once).
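The SYMTAB substitution described above can be sketched as follows. This is a minimal sketch with hypothetical helper names (`define_equ`, `immediate_operand`); it only resolves a decimal constant or a single symbol, and "re-assembling" is simulated by calling `define_equ` again with the edited value.

```python
symtab = {}


def define_equ(symbol, value):
    symtab[symbol] = value                 # EQU: enter the symbol and its value in SYMTAB


def immediate_operand(operand):
    """Resolve '#4096' or '#MAXLEN' to the value placed in the instruction."""
    name = operand[1:] if operand.startswith("#") else operand
    return int(name) if name.isdigit() else symtab[name]


define_equ("MAXLEN", 4096)
# Both forms of +LDT produce the same operand value, hence the same object code.
same = immediate_operand("#4096") == immediate_operand("#MAXLEN")
define_equ("MAXLEN", 1024)                 # only the EQU statement is edited
print(same, immediate_operand("#MAXLEN"))  # True 1024
```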
Another common usage of the EQU statement is for defining values for the general-purpose registers. An assembler may recognize mnemonics for registers, such as A (accumulator) and X (index register). If it does not, instructions that use registers must give register numbers instead of names; for example, RMO 0,1 instead of RMO A,X. The programmer can then assign numerical values to these register names using the EQU directive:
A EQU 0
X EQU 1
L EQU 2
and so on.
These statements cause the symbols A, X, L… to be entered into the symbol table with their respective values. An instruction RMO A,X would then be allowed. As another usage, consider a machine that has many general-purpose registers named R1, R2,…, some of which may be used as base registers and some as accumulators, with their usage changing from one program to another. In this case we can define these roles using EQU statements:
BASE EQU R1
INDEX EQU R2
COUNT EQU R3
One restriction on the use of EQU is that any symbol occurring on the right-hand side of the EQU must be previously defined. For example, the following sequence is not valid:
BETA EQU ALPHA
ALPHA RESW 1
because BETA is assigned the value of ALPHA before ALPHA is defined, so the value of ALPHA is not yet known.
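The restriction can be seen in a tiny pass-1 sketch. This is a minimal illustration, not a full assembler: the function name `pass1` is hypothetical, and an EQU operand is limited to a single symbol or a decimal constant. With the two statements in the order ALPHA RESW 1, BETA EQU ALPHA, the definition is accepted; in the invalid order it is rejected.

```python
def pass1(lines):
    """lines: (label, opcode, operand) tuples. Returns SYMTAB or raises ValueError."""
    symtab, locctr = {}, 0
    for label, op, operand in lines:
        if op == "EQU":
            if operand.isdigit():
                symtab[label] = int(operand)
            elif operand in symtab:
                symtab[label] = symtab[operand]
            else:
                raise ValueError(f"{label} EQU {operand}: {operand} is not yet defined")
        elif op == "RESW":
            symtab[label] = locctr
            locctr += 3 * int(operand)     # one SIC word = 3 bytes
    return symtab


print(pass1([("ALPHA", "RESW", "1"), ("BETA", "EQU", "ALPHA")]))  # {'ALPHA': 0, 'BETA': 0}
try:
    pass1([("BETA", "EQU", "ALPHA"), ("ALPHA", "RESW", "1")])
except ValueError as err:
    print(err)                             # BETA EQU ALPHA: ALPHA is not yet defined
```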
ORG Statement:
This directive can be used to indirectly assign values to the symbols. The directive is
usually called ORG (for origin). Its general format is:
ORG value
where value is a constant or an expression involving constants and previously defined symbols.
When this statement is encountered during assembly of a program, the assembler resets its location counter (LOCCTR) to the specified value. Since the values of symbols used as labels are taken from LOCCTR, the ORG statement affects the values of all labels defined until the next ORG is encountered. ORG is thus used to control the assignment of storage in the object program. Because it alters the location counter, incorrect use of ORG can result in incorrect assembly.
ORG can be useful in label definition. Suppose we need to define a symbol table with the
following structure:
SYMBOL (6 bytes)   VALUE (3 bytes)   FLAG (2 bytes)
. . .              . . .             . . .
The SYMBOL field contains a 6-byte user-defined symbol; VALUE is a one-word representation of the value assigned to the symbol; FLAG is a 2-byte field that specifies the symbol type and other information. The space for the table can be reserved by the statement:
STAB RESB 1100
If we want to refer to entries of the table using indexed addressing, we place the offset of the desired entry from the beginning of the table in the index register. To refer to the fields SYMBOL, VALUE, and FLAG individually, we first assign their values as shown below:
SYMBOL EQU STAB
VALUE EQU STAB+6
FLAG EQU STAB+9
To retrieve the VALUE field from the table entry indicated by register X, we can write the statement:
LDA VALUE,X
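The address arithmetic behind LDA VALUE,X can be worked through numerically. In this minimal sketch the start address of STAB (1000 hex) is a hypothetical value chosen for illustration; the field offsets 0, 6 and 9 and the 11-byte entry size follow from the field lengths above.

```python
ENTRY_SIZE = 6 + 3 + 2           # SYMBOL + VALUE + FLAG = 11 bytes per entry
STAB = 0x1000                    # hypothetical start address of the table

SYMBOL = STAB                    # SYMBOL EQU STAB
VALUE = STAB + 6                 # VALUE  EQU STAB+6
FLAG = STAB + 9                  # FLAG   EQU STAB+9


def effective_address(field, x):
    """Indexed addressing: target = field address + contents of register X."""
    return field + x


entries = 1100 // ENTRY_SIZE     # STAB RESB 1100 holds 100 entries
x = 2 * ENTRY_SIZE               # X holds the offset of entry number 2 (counting from 0)
print(entries, hex(effective_address(VALUE, x)))  # 100 0x101c
```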
The same thing can also be done using ORG statements in the following way:
Using ORG:
STAB RESB 1100    Reserve space
ORG STAB
SYMBOL RESB 6
VALUE RESW 1
FLAG RESB 2
ORG STAB+1100
The first statement allocates 1100 bytes of memory, assigned to the label STAB. The first ORG statement sets the location counter to the value of STAB, so LOCCTR points to the beginning of the table. The next three lines assign appropriate storage to each of the symbols SYMBOL, VALUE and FLAG. The last ORG statement resets LOCCTR to the address just past the table (i.e., STAB+1100), skipping the memory reserved for STAB.
Notice that the two-pass assembler design requires that all symbols be defined during Pass 1; in particular, any symbol used in the operand of an ORG statement must be defined before that statement.
Example:
Another example:
ALPHA RESW 1
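The effect of ORG on label addresses can be sketched as a small pass-1 loop. This is a minimal illustration with a hypothetical function name; it supports only RESB, RESW and ORG with an operand of the form SYMBOL or SYMBOL+constant, assumes the program starts at 0, and raises an error when the ORG operand uses a symbol that is not yet defined.

```python
SIZES = {"RESB": 1, "RESW": 3}   # bytes reserved per unit


def pass1(lines, start=0):
    symtab, locctr = {}, start
    for label, op, operand in lines:
        if op == "ORG":
            base, _, offset = operand.partition("+")
            if base not in symtab:
                raise ValueError(f"ORG {operand}: {base} is not yet defined")
            locctr = symtab[base] + int(offset or 0)   # reset LOCCTR
            continue
        if label:
            symtab[label] = locctr                     # labels take the current LOCCTR
        locctr += SIZES[op] * int(operand)
    return symtab, locctr


program = [
    ("STAB", "RESB", "1100"),
    (None, "ORG", "STAB"),
    ("SYMBOL", "RESB", "6"),
    ("VALUE", "RESW", "1"),
    ("FLAG", "RESB", "2"),
    (None, "ORG", "STAB+1100"),
]
symtab, locctr = pass1(program)
print(symtab, locctr)  # {'STAB': 0, 'SYMBOL': 0, 'VALUE': 6, 'FLAG': 9} 1100
```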
Expressions
Most assemblers allow the use of expressions. Each such expression must be evaluated by the assembler to produce a single operand address or value. Assemblers generally allow arithmetic expressions formed according to the normal rules using the arithmetic operators +, -, *, and /. Division is usually defined to produce an integer result. Individual terms may be constants, user-defined symbols, or special terms. The most common special term is * (the current value of the location counter), which indicates the value of the next unassigned memory location. Thus the statement
BUFEND EQU *
assigns a value to BUFEND, which is the address of the next byte following the buffer area.
Some values in the object program are relative to the beginning of the program and some are absolute (independent of the program location, like constants). Hence, expressions are classified as either absolute expressions or relative expressions depending on the type of value they produce.
Relative terms: values relative to the beginning of the program. Labels on instructions and data areas, and references to the location counter value, are relative terms.
Absolute Expressions: An expression that uses only absolute terms is an absolute expression. An absolute expression may also contain relative terms, provided the relative terms occur in pairs with opposite signs for each pair. Example:
MAXLEN EQU BUFEND-BUFFER
Both BUFEND and BUFFER are relative terms, each representing an address within the program. However, the expression represents an absolute value: the difference between the two addresses, which is the length of the buffer area in bytes.
Note: A symbol whose value is given by EQU (or some similar assembler directive) may be either an absolute term or a relative term depending on the expression used to define its value. None of the relative terms may enter into a multiplication or division operation.
Relative Expressions: A relative expression is one in which all of the relative terms except one can be paired as described above; the remaining unpaired relative term must have a positive sign.
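The pairing rules can be checked mechanically by counting relative terms with their signs: a net count of 0 means every relative term is paired (absolute), a net count of +1 means one unpaired positive term remains (relative), and anything else is illegal. This is a minimal sketch with a hypothetical function name; it handles only + and -, treats constants as decimal, and uses the SYMTAB types and hex values listed for the example program (BUFFER R 36, BUFEND R 1036, MAXLEN A 1000).

```python
import re

SYMTAB = {"BUFFER": ("R", 0x36), "BUFEND": ("R", 0x1036), "MAXLEN": ("A", 0x1000)}


def classify(expr):
    """Return (type, value) for an expression built from + and - only."""
    value, relative_balance = 0, 0
    for sign, term in re.findall(r"([+-]?)\s*([A-Z0-9]+)", expr):
        s = -1 if sign == "-" else 1
        if term in SYMTAB:
            kind, v = SYMTAB[term]
            if kind == "R":
                relative_balance += s      # +1 for +REL, -1 for -REL
        else:
            v = int(term)                  # decimal constant: an absolute term
        value += s * v
    if relative_balance == 0:
        return "A", value                  # all relative terms paired
    if relative_balance == 1:
        return "R", value                  # one unpaired relative term with + sign
    raise ValueError(f"{expr}: illegal combination of relative terms")


print(classify("BUFEND-BUFFER"))  # ('A', 4096), i.e. 1000 hex
print(classify("BUFFER-1"))       # ('R', 53), i.e. 35 hex
```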
SYMTAB
Symbol   Type   Value
RETADR   R      30
BUFFER   R      36
BUFEND   R      1036
MAXLEN   A      1000

Name     Value
COPY     0
FIRST    0
CLOOP    6
ENDFIL   1A
RETADR   30
LENGTH   33
BUFFER   36
BUFEND   1036
MAXLEN   1000
RDREC    1036
RLOOP    1040
EXIT     1056
INPUT    105C
WREC     105D
WLOOP    1062

LITTAB
Literal  Operand value  Length  Address
C'EOF'   454F46         3       002D
X'05'    05             1       1076

Program Blocks
Program blocks are segments of code that are rearranged within a single object program unit. Program blocks allow the generated machine instructions and data to appear in the object program in a different order from the source program, for example by separating blocks for code, data, stack, and larger data areas.
Assembler Directive USE:
USE [blockname]
At the beginning, statements are assumed to be part of the unnamed (default) block. A USE statement with no block name returns to the default block. If no USE statements are included, the entire program belongs to this single block. Each program block may actually contain several separate segments of the source program. The assembler rearranges these segments to gather together the pieces of each block and assigns addresses to them. In this way the program can be separated into blocks placed in a particular order; for example, a large buffer area can be moved to the end of the object program. At the same time, program readability is better because data areas can be placed in the source program close to the statements that reference them.
The figure shows our example program as it might be written using program blocks.
Pass 1
o A separate location counter is maintained for each program block.
o LOCCTR is saved and restored when switching between blocks.
o At the beginning of a block, LOCCTR is set to 0.
o Each label is assigned an address relative to the start of its block.
o The block name or number is stored in SYMTAB along with the assigned relative address of the label.
o At the end of Pass 1, the latest value of LOCCTR for each block gives the length of that block.
o Each block is assigned a starting address in the object program by concatenating the program blocks in a particular order.
Pass 2
o The address of each symbol relative to the start of the object program is calculated by adding:
  – the location of the symbol relative to the start of its block, and
  – the starting address of this block.
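The Pass 1 and Pass 2 address arithmetic above can be sketched as follows. This is a minimal sketch with hypothetical function names. The block lengths are assumptions: the default block is 0066 bytes (CDATA starts at 0066), CBLKS holds a 4096-byte (1000 hex) buffer, and CDATA is taken as 000B bytes so that the total matches the Header record length 001071.

```python
block_lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}


def assign_block_starts(lengths):
    """End of Pass 1: concatenate the blocks in order to get each start address."""
    starts, addr = {}, 0
    for name, length in lengths.items():    # dicts keep insertion order
        starts[name] = addr
        addr += length
    return starts, addr


starts, total = assign_block_starts(block_lengths)


def absolute_address(block, rel):
    """Pass 2: address = start of the block + location relative to the block."""
    return starts[block] + rel


print({k: f"{v:04X}" for k, v in starts.items()}, f"{total:06X}")
print(f"{absolute_address('CDATA', 0x0003):04X}")   # LENGTH -> 0069
```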
Fig 2.12
Fig 2.12 shows this process applied to our sample program. Notice that the symbol MAXLEN
(line 107) is shown without a block number. It is an absolute symbol.
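Consider an example: the instruction LDA LENGTH, located at 0006 in the default block (object code 032060).
SYMTAB shows the value of the operand (LENGTH) as relative location 0003 within program block 1 (CDATA). The starting address for CDATA is 0066. Thus the desired target address for this instruction is 0003+0066=0069. The instruction is assembled with PC-relative addressing (p=1); with PC = 0009, the displacement is 0069-0009=060:
opcode   n i x b p e   disp
000000   1 1 0 0 1 0   060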
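The bit fields in the table can be packed into the hexadecimal object code with a short function. This is a minimal sketch of SIC/XE format-3 encoding (6-bit opcode, flags n i x b p e, 12-bit displacement); the function name is hypothetical, LDA's opcode is 00, and the instruction address 0006 is inferred from the displacement (0069 - 060 = 0009 = PC).

```python
def encode_format3(opcode, n, i, x, b, p, e, disp):
    word = (opcode & 0xFC) << 16                           # 6-bit opcode in the top of byte 1
    word |= (n << 17) | (i << 16)                          # n, i: low 2 bits of byte 1
    word |= (x << 15) | (b << 14) | (p << 13) | (e << 12)  # flag bits
    word |= disp & 0xFFF                                   # 12-bit displacement
    return f"{word:06X}"


target = 0x0066 + 0x0003          # CDATA start + LENGTH's offset within CDATA = 0069
pc = 0x0006 + 3                   # assumed instruction address 0006, 3-byte instruction
code = encode_format3(0x00, 1, 1, 0, 0, 1, 0, target - pc)
print(code)  # 032060
```

The same function reproduces the other PC-relative instructions in the Text records, for example STL RETADR (opcode 14, displacement 063) gives 172063.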
Object Program
It is not necessary to physically rearrange the generated code in the object program. The assembler simply inserts the proper load address in each Text record, and the loader loads the code into the correct place.
H^COPY ^000000^001071
T^000000^1E^172063^4B2021^032060^290000^332006^4B203B^3F2FEE^032055^0F2056^010003
T^00001E^09^0F2048^4B2029^3E203F
T^000027^1D^B410^B400^B440^75101000^E32038^332FFA^DB2032^A004^332008^57A02F^B850
T^000044^09^3B2FEA^13201F^4F0000
T^00006C^01^F1
T^00004D^19^B410^772017^E3201B^332FFA^53A016^DF2012^B850^3B2FEF^4F0000
T^00006D^04^454F46^05
E^000000
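Why no physical rearrangement is needed can be shown with a toy loader: each Text record carries its own load address, so records may appear in source order while their code still lands at the right place. This is a minimal sketch with a hypothetical function name. The CDATA byte INPUT lies at offset 0006 of CDATA, which starts at 0066, so its record is loaded at 006C; the literal pool follows at 006D.

```python
def load_text_records(records, size):
    memory = bytearray(size)
    for rec in records:
        kind, start, length, *code = rec.split("^")
        if kind != "T":
            raise ValueError(f"not a Text record: {rec}")
        start, length = int(start, 16), int(length, 16)
        data = bytes.fromhex("".join(code))
        if len(data) != length:
            raise ValueError(f"length field does not match object code: {rec}")
        memory[start:start + length] = data   # load address comes from the record itself
    return memory


mem = load_text_records(
    ["T^00001E^09^0F2048^4B2029^3E203F",
     "T^00006C^01^F1",                  # CDATA byte, placed after the default block
     "T^00006D^04^454F46^05"],          # CDATA literal pool: C'EOF' and X'05'
    0x1071,
)
print(mem[0x6C:0x71].hex().upper())     # F1454F4605
```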
Control Sections:
A control section is a part of the program that maintains its identity after assembly; each
control section can be loaded and relocated independently of the others. Different control
sections are most often used for subroutines or other logical subdivisions. The programmer can
assemble, load, and manipulate each of these control sections separately.
Because of this, there should be some means for linking control sections together. For
example, instructions in one control section may refer to the data or instructions of other control
sections. Since control sections are independently loaded and relocated, the assembler is unable
to process these references in the usual way. Such references between different control sections
are called external references.
The assembler generates information about each of the external references that will allow the loader to perform the required linking. When a program is written using multiple control sections, the beginning of each control section is indicated by the assembler directive CSECT. The syntax is:
secname CSECT
– A separate location counter is maintained for each control section.
Control sections differ from program blocks in that they are handled separately by the assembler. Symbols that are defined in one control section may not be used directly in another control section; they must be identified as external references for the loader to handle. The external references are indicated by two assembler directives:
– EXTDEF (external definition) names symbols that are defined in this control section and may be used by other sections.
– EXTREF (external reference) names symbols that are used in this control section and are defined elsewhere.
Case 1
Case 2
There are two external references in the expression, BUFEND and BUFFER. The assembler inserts a value of zero in the data area and passes information to the loader, through Modification records, to:
o add to this data area the address of BUFEND;
o subtract from this data area the address of BUFFER.
Case 3
On line 107, BUFEND and BUFFER are defined in the same control section, so the expression can be calculated immediately.
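The division of work in Case 2 can be sketched in two steps: the assembler stores 0 and records one (flag, symbol) modification per external term, and the loader later applies them with the actual addresses. This is a minimal sketch; the function name is hypothetical, only + and - are handled, every term is assumed to be external, and the load addresses of BUFFER and BUFEND are hypothetical values.

```python
import re


def assemble_external_word(expr, extref):
    """Return (initial field value, list of (flag, symbol) modifications)."""
    mods = []
    for sign, term in re.findall(r"([+-]?)\s*([A-Z][A-Z0-9]*)", expr):
        if term not in extref:
            raise ValueError(f"{term} is not an external reference in this sketch")
        mods.append(("-" if sign == "-" else "+", term))
    return 0, mods                             # the assembler inserts a value of zero


value, mods = assemble_external_word("BUFEND-BUFFER", {"BUFEND", "BUFFER"})

# Loader side, with hypothetical load addresses for the two external symbols.
loaded = {"BUFFER": 0x4033, "BUFEND": 0x5033}
for flag, sym in mods:
    value = value + loaded[sym] if flag == "+" else value - loaded[sym]
print(mods, f"{value & 0xFFFFFF:06X}")  # [('+', 'BUFEND'), ('-', 'BUFFER')] 001000
```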
The assembler must also include information in the object program that will cause the loader to insert the proper values where they are required. For this, the assembler uses two new record types in the object program and a changed version of the Modification record.
A Define record gives information about the external symbols that are defined in this control section, i.e., symbols named by EXTDEF.
A Refer record lists the symbols that are used as external references by the control section, i.e., symbols named by EXTREF.
The new items in the Modification record specify the modification to be performed: adding or subtracting the value of some external symbol. The symbol used for modification may be defined either in this control section or in another section.
Modification record
Col. 1       M
Col. 2-7     Starting address of the field to be modified, relative to the beginning of the control section (hexadecimal)
Col. 8-9     Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10      Modification flag (+ or -)
Col. 11-16   External symbol whose value is to be added to or subtracted from the indicated field
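The column layout above maps directly onto string slicing. This minimal sketch (hypothetical function names) builds a Modification record without the ^ separators used for readability in the listings, and parses it back. The example values, a 5-half-byte address field of an extended-format instruction starting at relative address 0004 and modified by +RDREC, are illustrative.

```python
def format_m_record(addr, half_bytes, flag, symbol):
    """Build a Modification record in the fixed columns listed above."""
    if flag not in ("+", "-"):
        raise ValueError("modification flag must be + or -")
    if len(symbol) > 6:
        raise ValueError("the symbol occupies at most 6 columns")
    return f"M{addr:06X}{half_bytes:02X}{flag}{symbol:<6}"


def parse_m_record(rec):
    addr = int(rec[1:7], 16)          # Col. 2-7
    half_bytes = int(rec[7:9], 16)    # Col. 8-9
    flag = rec[9]                     # Col. 10
    symbol = rec[10:16].rstrip()      # Col. 11-16
    return addr, half_bytes, flag, symbol


rec = format_m_record(0x000004, 5, "+", "RDREC")
print(repr(rec), parse_m_record(rec))  # 'M00000405+RDREC ' (4, 5, '+', 'RDREC')
```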
The object program is shown below. There is a separate object program for each of the control sections. The Define record and the Refer record include the symbols named in EXTDEF and EXTREF, respectively.
In the case of the Define record, the record also indicates the relative address of each external symbol within the control section.
For EXTREF symbols, no address information is available. These symbols are simply named in the Refer record.
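The difference between the two records can be sketched as fixed-column strings, assuming the usual SIC/XE layout: a Define record holds repeated pairs of a 6-column name and a 6-digit hexadecimal relative address, while a Refer record holds only 6-column names. The function names and the relative addresses in the example are hypothetical.

```python
def define_record(symbols):
    """D record: 'D', then a 6-column name and 6-digit hex relative address per symbol."""
    return "D" + "".join(f"{name:<6}{addr:06X}" for name, addr in symbols.items())


def refer_record(names):
    """R record: 'R', then 6-column names only (no address information)."""
    return "R" + "".join(f"{name:<6}" for name in names)


d = define_record({"BUFFER": 0x0033, "BUFEND": 0x1033})   # hypothetical addresses
r = refer_record(["RDREC", "WRREC"])
print(d)        # DBUFFER000033BUFEND001033
print(repr(r))  # 'RRDREC WRREC '
```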
The existence of multiple control sections that can be relocated independently of one another makes the handling of expressions more complicated. In an expression, all the relative terms must be paired (for an absolute expression), or all except one must be paired (for a relative expression).
When a program has multiple control sections, this restriction is extended:
Both terms in each pair of an expression must be within the same control section.
o If two terms represent relative locations within the same control section, their difference is an absolute value (regardless of where the control section is located).
Legal: BUFEND-BUFFER (both are in the same control section)
o If the terms are located in different control sections, their difference has a value that is unpredictable.
Illegal: RDREC-COPY (the two terms are in different control sections). This value is the difference in the load addresses of the two control sections; it depends on the way run-time storage is allocated and is unlikely to be of any use.
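The extended restriction can be checked by counting signed relative terms separately for each control section: every section must balance to 0, except that at most one section may be left with a single positive term. This is a minimal sketch; the function name and the section membership table are hypothetical illustrations, only + and - are handled, and constants are ignored because they are absolute.

```python
import re
from collections import Counter

# Hypothetical assignment of relative symbols to control sections.
SECTION = {"BUFFER": "COPY", "BUFEND": "COPY", "COPY": "COPY", "RDREC": "RDREC"}


def check_pairs(expr):
    balance = Counter()
    for sign, term in re.findall(r"([+-]?)\s*([A-Z][A-Z0-9]*)", expr):
        balance[SECTION[term]] += -1 if sign == "-" else 1
    unpaired = {sec: n for sec, n in balance.items() if n != 0}
    if not unpaired:
        return "absolute"
    if list(unpaired.values()) == [1]:
        return "relative"
    raise ValueError(f"{expr}: terms of a pair must be in the same control section")


print(check_pairs("BUFEND-BUFFER"))   # absolute
try:
    check_pairs("RDREC-COPY")
except ValueError as err:
    print(err)
```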
ASSEMBLER DESIGN
Here we discuss:
o The structure and logic of a one-pass assembler. These assemblers are used when it is necessary or desirable to avoid a second pass over the source program.
o The notion of a multi-pass assembler, an extension of the two-pass assembler that allows an assembler to handle forward references during symbol definition.
One-Pass Assembler
The main problem in designing an assembler that uses a single pass is resolving forward references. Forward references can be avoided to some extent by:
Eliminating forward references to data items, by defining all the storage reservation statements at the beginning of the program rather than at the end.
Unfortunately, forward references to labels on instructions (forward jumps) cannot be avoided.
The assembler must therefore make some provision for handling forward references to instruction labels, while forward references to data items are prohibited.
Load-and-Go Assembler
A load-and-go assembler generates its object code in memory for immediate execution.
No object program is written out, and no loader is needed.
It is useful in a system with frequent program development and testing, where programs are re-assembled nearly every time they are run, so the efficiency of the assembly process is an important consideration.
To handle forward references, the assembler:
o omits the operand address if the symbol has not yet been defined;
o enters this undefined symbol into SYMTAB and indicates that it is undefined;
o adds the address of this operand field to a list of forward references associated with the SYMTAB entry;
o when the definition for the symbol is encountered, scans the reference list and inserts the address;
o at the end of the program, reports an error if there are still SYMTAB entries indicating undefined symbols.
For a Load-and-Go assembler:
o search SYMTAB for the symbol named in the END statement and jump to this location to begin execution if there is no error.
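The bookkeeping in the list above can be sketched with a SYMTAB that keeps, for each undefined symbol, the list of operand fields waiting for its address. This is a minimal sketch with a hypothetical class name; it assumes SIC format (a 2-byte address field after the opcode, index bit 0) and uses a bytearray as "memory". The reference locations 2013, 201C and 201F come from the example that follows; the definition addresses 2024 and 203D are hypothetical.

```python
class LoadAndGoSymtab:
    def __init__(self, memory):
        self.memory = memory
        self.values = {}     # symbol -> address once defined
        self.pending = {}    # undefined symbol -> list of operand-field addresses

    def reference(self, symbol, field_addr):
        if symbol in self.values:
            self._store(field_addr, self.values[symbol])
        else:                # leave the field as 0000 and remember where it is
            self.pending.setdefault(symbol, []).append(field_addr)

    def define(self, symbol, address):
        self.values[symbol] = address
        for field_addr in self.pending.pop(symbol, []):   # scan the reference list
            self._store(field_addr, address)

    def undefined(self):
        return sorted(self.pending)                      # reported as errors at END

    def _store(self, field_addr, address):
        self.memory[field_addr:field_addr + 2] = address.to_bytes(2, "big")


mem = bytearray(0x2100)
st = LoadAndGoSymtab(mem)
st.reference("RDREC", 0x2013)
st.reference("ENDFIL", 0x201C)
st.reference("WRREC", 0x201F)
st.define("ENDFIL", 0x2024)      # hypothetical definition addresses
st.define("RDREC", 0x203D)
print(mem[0x2013:0x2015].hex().upper(), st.undefined())  # 203D ['WRREC']
```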
At this point, the symbol RDREC is referred to once, at location 2013, ENDFIL at 201C, and WRREC at 201F. None of these symbols is defined yet. The figure shows how the pending definitions, along with the addresses of their references, are included in the symbol table.
Fig: Object code in memory and symbol table entries for the program after scanning line 40.
The status after scanning line 160, by which point the definitions of RDREC and ENDFIL have been encountered, is as given below:
A one-pass assembler that produces an object program, instead of loading the code directly into memory, handles forward references as follows:
o If the operand contains an undefined symbol, use 0 as the address and write the Text record to the object program.
o Forward references are entered into lists as in the load-and-go assembler.
o When the definition of a symbol is encountered, the assembler generates another Text record with the correct operand address for each entry in the reference list.
o When the program is loaded, the incorrect address 0 is overwritten by the later Text record containing the correct address.
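The patching scheme can be sketched by emitting an extra 2-byte Text record for each queued reference and letting a loader apply records in order, so the later record overwrites the 0 address. This is a minimal sketch with hypothetical function names; it assumes SIC format (JEQ has opcode 30 and a 2-byte address field), and the locations 201B/201C and the ENDFIL address 2024 are illustrative.

```python
def patch_records(reference_list, address):
    """One Text record per queued operand field, carrying the now-known address."""
    return [f"T^{field:06X}^02^{address:04X}" for field in reference_list]


def load(records, size=0x3000):
    memory = bytearray(size)
    for rec in records:                      # later records overwrite earlier ones
        _, start, length, *code = rec.split("^")
        begin = int(start, 16)
        memory[begin:begin + int(length, 16)] = bytes.fromhex("".join(code))
    return memory


object_program = ["T^00201B^03^300000"]             # JEQ ENDFIL with address 0000
object_program += patch_records([0x201C], 0x2024)   # ENDFIL defined later
mem = load(object_program)
print(object_program[-1], mem[0x201B:0x201E].hex().upper())  # T^00201C^02^2024 302024
```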
Multi-Pass Assembler:
For a two-pass assembler, forward references in symbol definitions are not allowed, for example:
ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
o Symbol definition must be completed in pass 1.
Prohibiting forward references in symbol definitions is not a serious inconvenience.
o Forward references tend to create difficulty for a person reading the program.
Multi-Pass Assembler (Figure 2.21 of Beck): Example of forward references in symbol-defining statements:
1. HALFSZ EQU MAXLEN/2
2. MAXLEN EQU BUFEND-BUFFER
3. PREVBT EQU BUFFER-1
4. BUFFER RESB 4096
5. BUFEND EQU *
When statement 1 is read, MAXLEN has not yet been defined, so no value for HALFSZ can be computed. The defining expression for HALFSZ is stored in the symbol table in place of its value. The entry &1 indicates that one symbol in the defining expression is undefined. The SYMTAB entry for HALFSZ then simply contains a pointer to the defining expression. The symbol MAXLEN is also entered in the symbol table, with the flag * identifying it as undefined, together with a list of the symbols (here HALFSZ) that depend on it.
The same procedure is followed with the definition of MAXLEN. In this case there are two undefined symbols involved in the definition: BUFEND and BUFFER, so the entry for MAXLEN is marked &2. Both of these symbols are entered into SYMTAB with lists indicating the dependence of MAXLEN upon them. Similarly, the definition of PREVBT causes this symbol to be added to the list of dependences on BUFFER.
Let us assume that when line 4 is read, the location counter contains the hexadecimal value 1034. This is stored as the value of BUFFER. The assembler then examines the list of symbols that are dependent on BUFFER. The symbol table entry for the first symbol in this list (MAXLEN) shows that it depends on two currently undefined symbols; therefore, MAXLEN cannot be evaluated immediately. Instead, the &2 is changed to &1 to show that only one symbol in the definition (BUFEND) remains undefined. The other symbol in the list (PREVBT) can be evaluated because it depends only on BUFFER. The value of the defining expression for PREVBT is calculated and stored in SYMTAB. The result is shown in the figure.
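The dependency bookkeeping described above can be sketched in Python. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, constants are decimal, and statements 4 and 5 are modeled by calling `define` directly with the location-counter values (1034 hex from the text, and 1034 hex + 4096 for BUFEND EQU *). Continuing the same procedure at statement 5 resolves MAXLEN and then HALFSZ.

```python
import re


def evaluate(expr, values):
    """Evaluate symbols and decimal constants with +, -, *, / (integer division).

    * and / bind tighter than + and -.
    """
    tokens = re.findall(r"[A-Z][A-Z0-9]*|\d+|[-+*/]", expr)

    def term(tok):
        return values[tok] if tok[0].isalpha() else int(tok)

    total, sign, pos = 0, 1, 0
    while pos < len(tokens):
        product = term(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            rhs = term(tokens[pos + 1])
            product = product * rhs if tokens[pos] == "*" else product // rhs
            pos += 2
        total += sign * product
        if pos < len(tokens):                 # next token is + or -
            sign = 1 if tokens[pos] == "+" else -1
            pos += 1
    return total


class MultiPassSymtab:
    def __init__(self):
        self.values = {}       # defined symbol -> value
        self.pending = {}      # symbol -> (defining expression, undefined symbols); &n = len
        self.dependents = {}   # undefined symbol (flag *) -> symbols waiting on it

    def equ(self, symbol, expr):
        missing = {n for n in re.findall(r"[A-Z][A-Z0-9]*", expr) if n not in self.values}
        if not missing:
            self.define(symbol, evaluate(expr, self.values))
            return
        self.pending[symbol] = (expr, missing)
        for name in missing:
            self.dependents.setdefault(name, []).append(symbol)

    def define(self, symbol, value):
        self.values[symbol] = value
        for waiting in self.dependents.pop(symbol, []):
            expr, missing = self.pending[waiting]
            missing.discard(symbol)           # e.g. &2 becomes &1
            if not missing:                   # all symbols known: evaluate now
                del self.pending[waiting]
                self.define(waiting, evaluate(expr, self.values))


st = MultiPassSymtab()
st.equ("HALFSZ", "MAXLEN/2")                  # 1: HALFSZ &1, MAXLEN undefined
st.equ("MAXLEN", "BUFEND-BUFFER")             # 2: MAXLEN &2
st.equ("PREVBT", "BUFFER-1")                  # 3: PREVBT &1
st.define("BUFFER", 0x1034)                   # 4: BUFFER RESB 4096 with LOCCTR = 1034
maxlen_unknowns_after_line4 = len(st.pending["MAXLEN"][1])
st.define("BUFEND", 0x1034 + 4096)            # 5: BUFEND EQU * after the 4096-byte buffer
print({k: f"{v:X}" for k, v in st.values.items()})
# {'BUFFER': '1034', 'PREVBT': '1033', 'BUFEND': '2034', 'MAXLEN': '1000', 'HALFSZ': '800'}
```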
Questions
Sl.No   UNIT – 3 Assemblers-II   Marks
1. Enlist the various assembler features that are m/c dependent and m/c independent. Explain any one of them each. (Jan 2005)   10
2. In a two pass assembler, list the different data bases used in each pass. Explain the contents and uses of each data base. (Jan 2005)   10
3. Compare a two pass assembler with a single pass assembler. How are forward references handled in a one pass assembler? (Dec 2007)   10
4. What is LITORG? When is it used? Explain with an example. (Dec 2007, June 2010)   06
5. When is a multi-pass assembler required? Show the step by step procedure to evaluate the following statements. Show the symbol table after each scan. (Jan 2005, Dec09)   08
   1. HALFSZ EQU MAXLEN/2
   2. MAXLEN EQU BUFEND-BUFFER
   3. PREVBT EQU BUFFER-1
   4. BUFFER RESB 4096
   5. BUFEND EQU *
   OR. Write short notes on multi pass assemblers.
6. Explain the need for BASE and NOBASE directives with examples.   05
7. Explain program relocation. Also explain how the problems of relocation are solved. (Jan 2005, Dec 2007)   10
8. What is a program block? How are multiple program blocks handled by assemblers? (Dec 2007)   10
9. What are the different ways of specifying an operand value in a source statement? Give their formats.   12
10. Compare a two-pass assembler with a single pass assembler. How are forward references handled in a one-pass assembler?   10
11. What is the difference between a literal and an immediate operand? How does the assembler handle literal operands? (Dec09, Dec2011)   04
12. Explain the following assembler directives with an example each:   05
   (i) EQU (ii) BASE (iii) ORG (iv) USE (v) NOBASE
13. Give the difference between program blocks and control sections and explain in detail the processing of control sections.   10
   What is a control section? How are they processed? (June 2009)   08
14. With the required data structures and processing logic, explain the implementation of literals within an assembler.   10
15. Give the format for the following records necessary to obtain object code: (Jan 2005, Dec2011)   12
   i. Header record ii. Text record iii. Refer record
   iv. Define record v. Modification record (revised)
   vi. End record
16. Explain absolute and relative expressions. How are these processed by an assembler? (June 2009)   06
17. Explain the structure of a Load and Go assembler. (Dec09, June 2010)   08