Subject-System Programming Sub. Code-2150708 Unit-3 (Assemblers) by - Prof. Deepmala Sharma
Contents
• What is assembly language programming (ALP)?
• What is Assembler?
• Applications of Assembly Language
• Advantages of AL
• Disadvantages of AL
• Elements of ALP
– Statement Format and Machine Instruction Format
• Types of AL Statements
• A Simple Assembly Scheme
• Pass Structure of Assembler
• Design of a Two pass Assembler
– Pass I of the Assembler
– Intermediate code form (Variant I and Variant II)
– Pass II of the Assembler
Assembly Language Programming (ALP)
• Assembly language is a low-level programming language that uses symbolic codes, or mnemonics, as instructions.
• Some examples of mnemonics are ADD, SUB, LDA and STA, which stand for addition, subtraction, load accumulator and store accumulator, respectively.
• To process an assembly language program, we need a language translator called an assembler.
• Assembler: an assembler is a translator that translates assembly language code into machine code.
Position of Assembler
Applications of Assembly Language
• Assembly language is used for direct hardware manipulation, for access to specialized processor instructions, or to address critical performance issues.
• Typical uses are device drivers (e.g., for CD drives and hard disks), low-level embedded systems (e.g., keyboards and water-tank level indicators) and real-time systems.
Advantages and Disadvantages of ALP
• Advantages-
• Because it uses symbolic codes (mnemonics), an assembly program can be written faster than a machine language program.
• It frees the programmer from the burden of remembering operation codes and the addresses of memory locations.
• It is easier to debug than machine code.
• Disadvantages-
• Since it is a machine-oriented language, it requires familiarity with the machine architecture and an understanding of the available instruction set.
• An assembly language program takes comparatively longer to reach execution than a machine language program, because a separate language translator program (the assembler) is needed to translate it into binary machine code.
Elements of ALP
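• An assembly language provides the following three basic facilities that simplify programming:
• Mnemonic operation codes: The mnemonic operation codes for machine instructions (also called mnemonic opcodes) are easier to remember and use than numeric operation codes. Their use also enables the assembler to detect invalid operation codes in a program.
• Symbolic operands: Symbolic names can be associated with data or instructions and used as operands. The assembler binds these names to memory addresses, so the programmer need not know the actual addresses.
• Data declarations: Data can be declared in a variety of notations, including decimal notation, so the programmer need not convert constants into their internal machine representation.
Ex: + 09 0 113
• In this machine instruction format the sign comes first, and the opcode, register operand and memory operand occupy 2, 1 and 3 digits, respectively. Here 09 is the opcode, 0 the register operand and 113 the memory operand address.
• This is the machine code that the assembler produces.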
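The digit layout of the example + 09 0 113 can be made concrete with a small Python sketch. It assumes exactly the decimal format described above (sign, 2-digit opcode, 1-digit register operand, 3-digit memory operand); the function names encode and decode are hypothetical and only illustrate the field widths.

```python
def encode(opcode: int, reg: int, address: int, sign: str = "+") -> str:
    """Pack the fields into the sign / 2-digit opcode / 1-digit register /
    3-digit memory operand format (an assumed layout, per the example)."""
    if not (0 <= opcode <= 99 and 0 <= reg <= 9 and 0 <= address <= 999):
        raise ValueError("field out of range for the 2/1/3-digit format")
    return f"{sign} {opcode:02d} {reg:d} {address:03d}"


def decode(word: str) -> tuple:
    """Split a machine word such as '+ 09 0 113' back into its fields."""
    sign, opcode, reg, address = word.split()
    return sign, int(opcode), int(reg), int(address)


print(encode(9, 0, 113))      # + 09 0 113
print(decode("+ 09 0 113"))   # ('+', 9, 0, 113)
```

The fixed widths are what make the word self-describing: the assembler always knows which digits hold the opcode and which hold the address.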
Types of Assembly Language statements
• Imperative statements (IS)
– An imperative statement indicates an action to be performed during the execution of the assembled program.
– Ex: A 1, FOUR
• Declaration statements (DL)
– These statements declare storage areas or constants in the program.
– [Label] DS <constant> ex: A DS 1
– [Label] DC ‘<value>’ ex: ONE DC ‘1’
• Declaration statements reserve memory for variables. DS stands for Declare Storage and DC for Declare Constant.
• The statement A DS 1 reserves 1 word of memory for the variable A.
• The statement ONE DC ‘1’ associates the name ONE with a memory word containing the value ‘1’.
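The effect of DS and DC on memory can be sketched as follows. This is a minimal illustration, assuming one word per DC constant and n words for DS n; the function allocate and the starting address 200 are hypothetical.

```python
def allocate(statements, start):
    """Allocate memory for (label, mnemonic, operand) declaration statements.
    Returns (symtab, memory), where memory maps address -> initial value
    (None means reserved but uninitialized)."""
    lc = start
    symtab = {}
    memory = {}
    for label, mnemonic, operand in statements:
        symtab[label] = lc                        # the label names the first word
        if mnemonic == "DS":
            size = int(operand)                   # DS n reserves n words
            for addr in range(lc, lc + size):
                memory[addr] = None
        elif mnemonic == "DC":
            memory[lc] = int(operand.strip("'‘’"))  # DC stores the constant
            size = 1
        else:
            raise ValueError(f"not a declaration statement: {mnemonic}")
        lc += size
    return symtab, memory


symtab, memory = allocate([("A", "DS", "1"), ("ONE", "DC", "'1'")], start=200)
print(symtab)   # {'A': 200, 'ONE': 201}
print(memory)   # {200: None, 201: 1}
```

The key difference shows in the memory map: DS only reserves words, while DC also gives the word an initial value.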
Types of Assembly Language Statements (cont’d)
• Assembler Directives
– It instructs the assembler to perform certain actions during the assembly of a program.
– START <constant> ex: START 200
– END [<operand spec.>] ex: END
– Advanced assembler directives
– ORIGIN <address spec.> ex: ORIGIN LOOP+2
– <symbol> EQU <address spec.> ex: BACK EQU LOOP
– LTORG ex: LTORG (literals collected so far, such as =‘1’, are allocated memory at this point)
• Where <address spec.> can be a <constant> or <symbol name> + <displacement>
Advanced Assembler Directives
• ORIGIN: This directive instructs the assembler to put the address given by <address specification> in the location counter.
• EQU: The statement <symbol> EQU <address specification> simply associates the name <symbol> with the address specified by <address specification>. However, the address in the location counter is not affected.
• LTORG: The LTORG directive, which stands for 'origin for literals', allows a programmer to specify where literals should be placed.
– If a program does not use an LTORG statement, the assembler enters all literals used in the program into a single pool and allocates memory to them when it encounters the END statement.
Use of ORIGIN, EQU and LTORG
• ORIGIN: Statement 18 of the program, ORIGIN LOOP+2, puts the address 204 in the location counter because the symbol LOOP is associated with the address 202. The next statement, MULT CREG, B, is therefore given the address 204.
• EQU: On encountering the statement BACK EQU LOOP, the assembler associates the symbol BACK with the address of LOOP, i.e. with 202.
• LTORG: The literals ='5' and ='1' are added to the literal pool in statements 2 and 6, respectively. The first LTORG statement (statement 13) allocates the addresses 211 and 212 to the values '5' and '1'.
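The ORIGIN and EQU behaviour described above can be sketched in Python. This assumes an <address spec.> is either a constant or a symbol with an optional + or - displacement; the function eval_address is hypothetical, and the only data taken from the example is LOOP = 202.

```python
import re


def eval_address(spec, symtab):
    """Evaluate <constant> or <symbol> [+|- <displacement>]."""
    m = re.fullmatch(r"\s*(\w+)\s*(?:([+-])\s*(\d+))?\s*", spec)
    if m is None:
        raise ValueError(f"bad address specification: {spec!r}")
    base, sign, disp = m.groups()
    value = int(base) if base.isdigit() else symtab[base]
    if sign == "+":
        value += int(disp)
    elif sign == "-":
        value -= int(disp)
    return value


symtab = {"LOOP": 202}            # LOOP is at 202 in the example program

# ORIGIN LOOP+2: the result is written into the location counter
lc = eval_address("LOOP+2", symtab)
print(lc)                         # 204, so MULT CREG, B gets address 204

# BACK EQU LOOP: only SYMTAB changes; the LC is not affected
symtab["BACK"] = eval_address("LOOP", symtab)
print(symtab["BACK"], lc)         # 202 204
```

Both directives evaluate an address the same way. They differ only in where the result goes: ORIGIN writes it into the LC, and EQU writes it into SYMTAB.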
A Simple Assembly Scheme
Fig.-Design of assembler
Analysis phase
• The primary function performed by the analysis phase is the
building of the symbol table.
• For this purpose, it must determine the addresses of symbolic names. This function is called memory allocation.
• To implement memory allocation, a data structure called the location counter (LC) is used; it is initialized to the constant specified in the START statement.
• We refer to the processing involved in maintaining the location counter as LC processing.
• Tasks performed by the analysis phase
1. Isolate the label, mnemonic opcode, and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC content>) in a new entry of the symbol table.
3. Check the validity of the mnemonic opcode.
4. Perform LC processing.
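The four analysis-phase tasks can be sketched as a single loop. This is a minimal illustration under stated assumptions: a statement that starts in column 1 carries a label, every imperative statement and DC occupies one word, and DS n reserves n words. The OPTAB contents and the sample program are hypothetical.

```python
# Assumed instruction lengths in words (DS is handled separately)
OPTAB = {"START": 0, "END": 0, "MOVER": 1, "MOVEM": 1, "ADD": 1,
         "SUB": 1, "MULT": 1, "STOP": 1, "DC": 1, "DS": 0}


def analyse(program, optab=OPTAB):
    symtab = {}
    lc = 0
    for line in program:
        if not line.strip():
            continue                                  # skip blank lines
        # Task 1: isolate label, mnemonic opcode and operand fields
        has_label = not line[0].isspace()
        tokens = line.replace(",", " ").split()
        label = tokens.pop(0) if has_label else None
        mnemonic, operands = tokens[0], tokens[1:]
        # Task 2: enter (symbol, LC content) in SYMTAB
        if label is not None:
            symtab[label] = lc
        # Task 3: check validity of the mnemonic opcode
        if mnemonic not in optab:
            raise ValueError(f"invalid mnemonic opcode: {mnemonic}")
        # Task 4: LC processing
        if mnemonic == "START":
            lc = int(operands[0])
        elif mnemonic == "DS":
            lc += int(operands[0])
        else:
            lc += optab[mnemonic]
    return symtab, lc


program = [
    "        START 200",
    "        MOVER BREG, ONE",
    "AGAIN   ADD   BREG, ONE",
    "        STOP",
    "ONE     DC    '1'",
    "        END",
]
symtab, lc = analyse(program)
print(symtab, lc)   # {'AGAIN': 201, 'ONE': 203} 204
```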
Synthesis phase
• Consider the assembly statement,
– MOVER BREG, ONE
• We must have the following information to synthesize the machine instruction corresponding to this statement:
– 1. Address of the name ONE (from the symbol table built by the analysis phase)
– 2. Machine operation code corresponding to the mnemonic MOVER (from the mnemonics table)
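The synthesis step for MOVER BREG, ONE can be sketched as two table lookups. The opcode (04) and register code (BREG = 2) follow the Variant I tables in this unit. The address 203 for ONE is a hypothetical value, and the output uses the sign / opcode / register / address format from the + 09 0 113 example.

```python
OPTAB = {"MOVER": 4, "MOVEM": 5, "ADD": 1}          # subset of opcodes
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def synthesize(mnemonic, register, symbol, symtab):
    opcode = OPTAB[mnemonic]      # item 2: machine opcode from the mnemonics table
    address = symtab[symbol]      # item 1: operand address from SYMTAB
    return f"+ {opcode:02d} {REGISTERS[register]} {address:03d}"


print(synthesize("MOVER", "BREG", "ONE", {"ONE": 203}))  # + 04 2 203
```

Item 1 depends on the source program, so the analysis phase must supply it. Item 2 depends only on the assembly language, so it can come from a fixed table.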
Pass 1: Databases
• Input source program
• Location counter (LC), used to keep track of each instruction's address
• Machine operation table (MOT), containing the fields [mnemonic opcode, class and mnemonic information (machine code of IS, length)]
• Pseudo operation table (POT), containing the fields [mnemonic opcode, class and mnemonic information (R#, the routine number)]
• Symbol table (ST or SYMTAB), to store each label and its value
• Literal table (LT or LITTAB), to store each literal (constant) and its location
• Literal pool table (POOLTAB)
• Copy of the input, to be used later by Pass 2
MOT+POT=OPTAB
• OPTAB contains the fields mnemonic opcode, class and mnemonic info.
• The class field indicates whether the opcode belongs to an imperative statement (IS), a declaration statement (DL), or an assembler directive (AD).
• If it is an imperative statement, the mnemonic info field contains the pair (machine code, instruction length); otherwise it contains the id of a routine that handles the declaration statement or assembler directive.
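A minimal Python sketch of OPTAB follows. It uses a subset of the entries from the OPTAB table in this unit. For IS entries the info field is (machine code, length); for DL and AD entries it is the routine id. The function classify is hypothetical and only shows how the class field steers Pass I.

```python
OPTAB = {
    "START":  ("AD", "R#1"),
    "END":    ("AD", "R#2"),
    "ORIGIN": ("AD", "R#3"),
    "EQU":    ("AD", "R#4"),
    "LTORG":  ("AD", "R#5"),
    "DS":     ("DL", "R#7"),
    "ADD":    ("IS", (1, 1)),
    "MOVER":  ("IS", (4, 1)),
    "MOVEM":  ("IS", (5, 1)),
    "STOP":   ("IS", (0, 1)),
}


def classify(mnemonic):
    """Describe how Pass I would treat a mnemonic, based on its OPTAB class."""
    if mnemonic not in OPTAB:
        return "error: invalid mnemonic opcode"
    cls, info = OPTAB[mnemonic]
    if cls == "IS":
        code, length = info
        return f"imperative: machine code {code:02d}, length {length}"
    return f"{cls}: call routine {info}"


print(classify("MOVER"))  # imperative: machine code 04, length 1
print(classify("LTORG"))  # AD: call routine R#5
print(classify("MVER"))   # error: invalid mnemonic opcode
```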
Mnemonic Operation Table (OPTAB)
Mnemonic opcode   Class   Mnemonic information
START AD R#1
MOVER IS (04,1)
MOVEM IS (05,1)
ADD IS (01,1)
BC IS (07,1)
LTORG AD R#5
SUB IS (02,1)
STOP IS (00,1)
ORIGIN AD R#3
MULT IS (03,1)
DS DL R#7
EQU AD R#4
END AD R#2
Machine opcode table (MOT)
Mnemonic opcode   Class   Mnemonic information
MOVER IS (04,1)
MOVEM IS (05,1)
ADD IS (01,1)
BC IS (07,1)
SUB IS (02,1)
STOP IS (00,1)
MULT IS (03,1)
• A SYMTAB entry contains the fields symbol name, address and length.
• Some addresses can be determined directly, e.g. the address of the first instruction in the program; however, others must be inferred.
• To find the address of any other element, we must fix the addresses of all program elements preceding it. This function is called memory allocation.
Literal Table (LITTAB)
Index no.   Literal   Address
1           =‘5’      211
2           =‘1’      212
3           =‘1’      219
Pool Table (POOLTAB)
Literal no.
#1
#3
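The way LITTAB and POOLTAB cooperate can be sketched with a small Python class. The LTORG and END addresses (211 and 219) are taken from the tables above. The class LiteralPools and its method names are hypothetical, and indexes are kept 1-based to match the tables.

```python
class LiteralPools:
    def __init__(self):
        self.littab = []     # entries: [literal, address]; address None until allocated
        self.pooltab = [1]   # 1-based LITTAB index of the first literal of each pool

    def add_literal(self, literal):
        """Enter a literal in the current pool; a repeat within one pool is shared."""
        start = self.pooltab[-1] - 1
        for i in range(start, len(self.littab)):
            if self.littab[i][0] == literal:
                return i + 1
        self.littab.append([literal, None])
        return len(self.littab)

    def _allocate(self, lc):
        """Give every literal of the current pool consecutive addresses from lc."""
        start = self.pooltab[-1] - 1
        for entry in self.littab[start:]:
            entry[1] = lc
            lc += 1
        return lc

    def ltorg(self, lc):
        new_lc = self._allocate(lc)
        if len(self.littab) >= self.pooltab[-1]:   # pool was not empty
            self.pooltab.append(len(self.littab) + 1)
        return new_lc

    def end(self, lc):
        return self._allocate(lc)                  # last pool, no new POOLTAB entry


pools = LiteralPools()
pools.add_literal("='5'")          # statement 2
pools.add_literal("='1'")          # statement 6
lc = pools.ltorg(211)              # first LTORG allocates 211 and 212
pools.add_literal("='1'")          # same value, but a new pool, so a new entry
pools.end(219)
print([addr for _, addr in pools.littab])  # [211, 212, 219]
print(pools.pooltab)                       # [1, 3]
```

This explains why LITTAB holds =‘1’ twice. Each pool gets its own copy, placed at the LTORG or END that closes that pool.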
Pass 2: Databases
• Copy of the source program input to Pass 1
• Location counter (LC)
• MOT [mnemonic, length, binary machine opcode, etc.]
• POT [mnemonic and action to be taken in Pass 2]
• ST [prepared by Pass 1: label and value]
• Base table [or register table], indicating which registers are currently specified as base registers by the ‘USING’ pseudo-op and what their contents are
• Literal table prepared by Pass 1 [literal name and value]
Format of Data Structures of Pass I
• Intermediate Code
• Intermediate code consists of a set of IC units, each unit consisting of the following three fields:
• 1. Address
• 2. Representation of the mnemonic opcode
• 3. Representation of the operands
• Intermediate code can be in Variant I or Variant II form.
Variant I
• Mnemonics field
• The mnemonics field contains a pair of the form
– (statement class, code)
– where statement class can be one of IS, DL and AD
• For an imperative statement, code is the instruction opcode in the machine language.
• For declarations and assembler directives, code is an ordinal number within the class.
• (AD, 01) stands for assembler directive number 1, which is the directive START.
Variant I
Imperative statements
Mnemonic   Instruction code
STOP       00
ADD        01
SUB        02
MULT       03
MOVER      04
MOVEM      05
COMP       06
BC         07
DIV        08
READ       09
PRINT      10
JUMP       11
Declaration statements
Mnemonic   Instruction code
DC         01
DS         02
Assembler directives
Mnemonic   Instruction code
START      01
END        02
ORIGIN     03
EQU        04
LTORG      05
Variant I
Condition   Code
LT          1
LE          2
EQ          3
GT          4
GE          5
ANY         6
Register    Code
AREG        1
BREG        2
CREG        3
DREG        4
• The first operand is represented by a single-digit number, which is the code for a register or a condition code.
• The second operand, which is a memory operand, is represented by a pair of the form (operand class, code),
• where operand class is one of C, S and L, standing for constant, symbol and literal.
For a constant, the code field contains the internal representation of the constant itself.
Ex: the operand descriptor for the statement START 200 is (C,200).
For a symbol or literal, the code field contains the ordinal number of the operand’s entry in SYMTAB or LITTAB.
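The Variant I rules above can be sketched for imperative statements of the form <mnemonic> <register or condition>, <memory operand>. The codes come from the Variant I tables. As simplifying assumptions, SYMTAB and LITTAB are plain lists of names, ordinals are 1-based, and literals are not split into pools. The function names are hypothetical.

```python
IS_CODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4,
            "MOVEM": 5, "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
REG_CODES = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
COND_CODES = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


def ordinal(table, name):
    """Return the 1-based entry number of name, entering it if absent."""
    if name not in table:
        table.append(name)
    return table.index(name) + 1


def variant1(mnemonic, first, second, symtab, littab):
    # First operand: single-digit code of a register or a condition code
    first_code = REG_CODES[first] if first in REG_CODES else COND_CODES[first]
    # Second operand: (L, n) for a literal, (S, n) for a symbol
    if second.startswith("="):
        operand = ("L", ordinal(littab, second))
    else:
        operand = ("S", ordinal(symtab, second))
    return ("IS", IS_CODES[mnemonic]), first_code, operand


symtab, littab = [], []
print(variant1("MOVER", "AREG", "='5'", symtab, littab))  # (('IS', 4), 1, ('L', 1))
print(variant1("MOVEM", "AREG", "A", symtab, littab))     # (('IS', 5), 1, ('S', 1))
print(variant1("BC", "ANY", "LOOP", symtab, littab))      # (('IS', 7), 6, ('S', 2))
```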
Variant II
• This variant differs from Variant I of the intermediate code in that, in Variant II, symbols, condition codes and CPU registers are not processed.
• So no processed form is generated for them during Pass I; they are left in their source form.
Variant I: IS, DL and AD statements all contain the processed form.
Variant II: DL and AD statements contain the processed form, while for IS statements the operand field is processed only to identify literal references.
Fig.- Memory used by Pass I and Pass II (data structures and work area) under Variant I and Variant II
Var I                   Var II
(AD,1) (C,200)          (AD,1) (C,200)
(IS,4) (1) (L,1)        (IS,4) AREG (L,1)
(IS,5) (1) (S,4)        (IS,5) AREG (S,4)
(IS,4) (1) (S,4)        (IS,4) AREG (S,4)
(IS,4) (3) (S,6)        (IS,4) CREG (S,6)
(IS,1) (3) (L,2)        (IS,1) CREG (L,2)
Pass – I of ASSEMBLER
Flow Chart
Pass – II of ASSEMBLER
Flow Chart
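The core of Pass II, turning Variant I IC units into machine words, can be sketched as follows. This assumes that Pass I has already filled in SYMTAB and LITTAB addresses. The address 214 for A is hypothetical, while 211 for ='5' comes from the literal table in this unit. DL and AD units are skipped here for brevity, and the function name pass2 is hypothetical.

```python
def pass2(ic_units, symtab_addr, littab_addr):
    """Convert Variant I IC units of imperative statements into
    '+ opcode reg address' machine words."""
    words = []
    for (cls, opcode), reg, (kind, index) in ic_units:
        if cls != "IS":
            continue                     # DL/AD units are not handled in this sketch
        table = symtab_addr if kind == "S" else littab_addr
        address = table[index - 1]       # IC ordinals are 1-based
        words.append(f"+ {opcode:02d} {reg} {address:03d}")
    return words


ic = [
    (("AD", 1), None, ("C", 200)),   # START 200
    (("IS", 4), 1, ("L", 1)),        # MOVER AREG, ='5'
    (("IS", 5), 1, ("S", 1)),        # MOVEM AREG, A
]
print(pass2(ic, symtab_addr=[214], littab_addr=[211]))
# ['+ 04 1 211', '+ 05 1 214']
```

Because Variant I already replaced every symbol and literal by an ordinal, Pass II needs only indexed lookups and no further parsing of the source text.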
Error reporting of assembler
• Error reporting in Pass I
• Listing an error in the first pass has the advantage that the source program need not be preserved until Pass II.
• However, the listing produced in Pass I can report only certain errors, not all of them.
• In the program below, errors are detected at statements 9 and 21.
• Statement 9 gives an invalid opcode error because MVER does not match any mnemonic in OPTAB.
• Statement 21 gives a duplicate definition error because an entry for A already exists in the symbol table.
• The undefined symbol B at statement 10 is harder to detect during Pass I; this error can be detected only after Pass I is complete.
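The three errors discussed above can be sketched in Python. Invalid opcodes and duplicate definitions are flagged while a statement is processed, but undefined symbols are checked only after the whole program has been seen. Apart from the statement numbers 9, 10 and 21, the mnemonic MVER and the symbols A and B, the program contents (including statement 20) and the function check are hypothetical.

```python
def check(program, optab):
    """program: list of (stmt_no, label, mnemonic, referenced_symbols)."""
    errors = []
    defined = {}          # symbol -> statement that defined it (SYMTAB stand-in)
    used = []             # (statement, symbol) pairs seen as operands
    for no, label, mnemonic, symbols in program:
        if mnemonic not in optab:                 # detectable immediately
            errors.append((no, f"invalid opcode {mnemonic}"))
        if label is not None:
            if label in defined:                  # detectable immediately
                errors.append((no, f"duplicate definition of {label}"))
            else:
                defined[label] = no
        used.extend((no, s) for s in symbols)
    # Only once Pass I has finished is SYMTAB complete
    for no, s in used:
        if s not in defined:
            errors.append((no, f"undefined symbol {s}"))
    return sorted(errors)


OPTAB = {"START", "MOVER", "ADD", "DS", "END"}
program = [
    (9,  None, "MVER", []),
    (10, None, "ADD",  ["B"]),
    (20, "A",  "DS",   []),
    (21, "A",  "DS",   []),
]
for no, message in check(program, OPTAB):
    print(no, message)
# 9 invalid opcode MVER
# 10 undefined symbol B
# 21 duplicate definition of A
```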
Error Table