CSC 2209 Notes

The document discusses a course on systems programming. It provides objectives, grading policy, recommended textbooks, and an outline of topics to be covered including computer architecture, assemblers, loaders and linkers, compilers, and macroprocessors. The course will introduce students to the design of various systems software and how to implement such software on different computer systems. Grades will be based on tests, coursework assignments, and a final exam.

SCHOOL OF COMPUTING AND INFORMATICS

TECHNOLOGY

COURSE: CSC 2209 SYSTEMS PROGRAMMING

PRE-REQUISITE: CSC 1100 Computer Literacy

Course Objectives.
The course introduces students to the design of the different kinds of
systems software and considers how such software is implemented on a
variety of real machines.

Grading Policy:
Grades will be based on your performance in two in-class tests, course
work assignments and a comprehensive final examination.
40% Course work, which includes tests and course work assignments.
60% Final Examination.

N.B. Late homework decreases your overall score by 20% per day.

Recommended Text Books

1. SYSTEMS PROGRAMMING AND OPERATING SYSTEMS by D M Dhamdhere.
2. SYSTEMS SOFTWARE: An Introduction to Systems Programming by Leland L Beck
3. SYSTEMS PROGRAMMING by John J. Donovan
4. MICROCOMPUTERS FOR ENGINEERS AND SCIENTISTS by Glenn A. Gibson and Yu-cheng Liu

COURSE OUTLINE:
1. Review Micro Computer Architecture
1.1 CPU
1.2 Memory
1.3 The Intel 8085/8088 CPU
1.4 Machine Language Instructions
1.5 Instruction Formats and Addressing Modes

2. The Simplified Instructional Computer


2.1 The SIC Machine Structure
2.2 The SIC / XE Machine Structure

3. Assemblers
3.1 Assembler Tables and Logic
3.2 Instruction Formats and addressing Modes.
3.3 Program Relocation
3.4 Literals
3.5 Program Blocks and Control sections

4. Loaders and Linkers


4.1 Design of an Absolute Loader
4.2 Relocation, Program Linking
4.3 Tables for a Linking Loader
4.4 Loader Options and Overlay Programs.

5. Compilers
5.1 Basic Compiler Functions
5.1.1 Grammars
5.1.2 Lexical Analysis
5.1.3 Syntactic Analysis
5.1.4 Code Generation
5.2 Machine Dependent Compiler Features
5.2.1 Code Optimisation
5.3 Machine Independent Code Optimization
5.3.1 Storage Allocation
5.3.2 Structured Variables
5.4 Compiler Design Options

6. Macroprocessors
6.1 Macro Definitions
6.2 Macro Processor Tables and Logic
6.3 Macro Expansions

SYSTEMS PROGRAMMING
A computer system is sometimes subdivided into two functional entities:
Hardware and Software.
The hardware of a computer consists of all the electronic components
and the electro mechanical devices that comprise the physical entity of
the device.
Software consists of the instructions and data that the computer
manipulates to perform various data processing tasks.
Three types of software exist:
1. Systems Software
2. Development Software
3. Application Software
The system software of a computer consists of a collection of programs
whose purpose is to make more effective use of the computer. They
control the operation of the machine and carry out the most basic
functions the computer performs. They control the way in which the
computer receives input, produces output, manages and stores data,
carries out or executes instructions of other programs etc.
Examples of systems programs include language processors (compilers
and assemblers, which accept human-readable languages and translate
them into machine language), loaders (which prepare machine language
programs for execution), macro processors (which allow programmers to
use abbreviations), and operating systems and file systems (which allow
flexible storage and retrieval of information).
Application programs are written by the user to solve particular
problems using the computer as a tool, e.g. application packages.
Development software is used to create, update and maintain other
programs, e.g. programming languages.

Systems software supports the operation and use of the computer itself
rather than any particular application. It is therefore closely related
to the structure of the machine on which it is to run.
There are however some aspects of systems software that do not
directly depend upon the type of the computing system being supported
e.g. the general design and logic of an assembler is basically the same
on most computers.

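MICRO COMPUTER ARCHITECTURE

[Figure: block diagram of a microcomputer. The microprocessor (CPU),
driven by the timing circuitry and working through the bus control
logic, is connected by the system bus to the memory modules and to the
I/O subsystem (mass storage devices and I/O devices). Each memory module
and each device is attached to the bus through its own interface.]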

The microprocessor
At the centre of all operations is the MPU (Microprocessor Unit). In a
microcomputer the CPU is the microprocessor. Its purpose is to:
• Decode the instructions and use them to control the activities
within the system.
• Perform the arithmetic (+, -, /, *) and logical (>, >=, <, <=, =, !=)
computations.

Timing Circuitry (Clock)
Used to synchronise the activities within the microprocessor and the bus
control logic.

Memory
• Stores both the data and the instructions that are currently being
used. Memory is broken down into modules, where each module contains
several thousand locations.
• Each location is associated with an identifier called a memory
address.

The I/O Sub System
Consists of a variety of devices for communicating with the external
world and for storing large quantities of information, e.g. keyboards
and light pens for input, and CRT monitors, printers and plotters for
output.
Computer components for permanently storing programs and data are
referred to as mass storage units, e.g. magnetic tapes, disk units and
magnetic bubble memory.
N.B. Although both programs and data can be stored on mass storage
devices, they must be transferred to memory before they can be used.

System Bus.
A set of conductors that connect the CPU to its memory and I/O devices.
The bus conductors are normally separated into 3 groups:
1. The Data Lines: for transmitting information
2. Address Lines: Indicate where information is to come from or
where it is to be placed.
3. Control Lines: To regulate the activities on the bus.

Interface
Circuitry needed to connect the bus to a device. Memory interfaces
consist of logic needed to:
• Decode the address of the memory location being accessed.
• Buffer data onto/off the bus.
• Perform memory reads or writes.

I/O interfaces must:
• Buffer data onto/off the system bus.
• Receive commands from the CPU.
• Transmit information from their devices to the CPU.

Bus Control Unit
Co-ordinates the CPU activities with the external world.

THE CPU

[Figure: block diagram of a typical CPU. The control unit (with its
control memory) holds the program counter, instruction register,
processor status word and stack pointer; the working registers comprise
the address registers and the arithmetic registers; the arithmetic/logic
unit and the bus control unit complete the CPU.]

A typical CPU consists of the control unit which contains the following
registers:
1. Program Counter (PC): Holds the address of the main memory location
from which the next instruction is to be fetched.
2. Instruction Register (IR): Receives the instruction when it is
brought from memory and holds it while it is decoded and executed.
3. Processor Status Word (PSW): Contains condition flags which indicate
the current status of the CPU and the important characteristics of the
result of the previous instruction.
4. Stack Pointer (SP): Accesses a special part of memory called a
stack, which is used to temporarily store important information while
subroutines are being executed. It holds the address of the top of the
stack.

Working Registers
They are Arithmetic registers or accumulators and address registers.
(i) Arithmetic Registers: They temporarily hold the operands and the
result of the arithmetic operations.
Accessing a register is faster than accessing memory. If several
operations are to be performed it is better to have the operands in
registers than in memory and return only the result to memory. The
more arithmetic registers a CPU has the faster it can execute
computations.

(ii) Address Registers: used for addressing data and instructions in
main memory.
If a register can be used for both arithmetic operations and addressing,
it is called a general purpose register.

Arithmetic/Logic Unit
It performs arithmetic and logical operations on the contents of the
working registers, the PC, memory locations etc. It also sets and clears
the appropriate flags.

EXAMPLES OF CPU’S
The Intel 8085

[Figure: organisation of the Intel 8085. Address/data lines (16) and
control lines (20) connect to the bus control and clock. Registers:
processor status word with flags S, Z, AC, P and C; program counter (16
bits); stack pointer (16 bits); accumulator (8 bits); general purpose
registers B, C, D, E, H and L (8 bits each); ALU.]

It is an 8 bit processor, i.e. it has 8 data lines (1 byte of data can
be transmitted at a time).
It has:
• 6 general purpose registers, namely B, C, D, E, H and L, with 8 bits
each and associated in pairs.
• One 8 bit accumulator.
• One 16 bit stack pointer.
• One 16 bit program counter.
• One PSW with 5 flags:
Zero (Z): set when the result of the operation is zero.
Sign (S): set when the sign of the result is negative.
Parity (P): set when the parity of the bits in the result is even.
Carry (C): set when an addition results in a carry, or a subtraction or
comparison results in a borrow.
Auxiliary Carry (AC): carry used in BCD arithmetic.
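
The flag definitions above can be illustrated with a short Python sketch.
It is not part of the original notes: the function name add8_flags is
hypothetical, and it assumes that the auxiliary carry is the carry out of
the low-order 4 bits, which is how a carry for BCD arithmetic is usually
detected.

```python
def add8_flags(a, b):
    """Add two 8-bit values and derive the five condition flags (illustrative)."""
    total = a + b
    result = total & 0xFF                                   # keep 8 bits
    flags = {
        "Z": int(result == 0),                              # result is zero
        "S": int((result & 0x80) != 0),                     # high bit set: negative
        "P": int(bin(result).count("1") % 2 == 0),          # even number of 1 bits
        "C": int(total > 0xFF),                             # carry out of bit 7
        "AC": int((a & 0x0F) + (b & 0x0F) > 0x0F),          # assumed: carry out of bit 3
    }
    return result, flags


result, flags = add8_flags(0x9C, 0x64)
print(f"{result:02X}", flags)
```

With the operands 9C and 64 (hex) the 8 bit sum wraps round to zero, so the
zero, parity, carry and auxiliary carry flags are all set while the sign
flag is clear.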

It has 16 address and data lines (the address space is 0 to 2^16 - 1)
and 20 control lines, i.e. it has 20 control signals.
The address and data share the same bus lines and they must take
turns to use them (they are time multiplexed). The address must be
sent first and then data is sent or received.

MACHINE LANGUAGE INSTRUCTIONS


At the time of execution all instructions are made up of a sequence of
bytes (a combination of zeros and ones).
Because instructions in their 0’s and 1’s form can be directly understood
by the computer they are therefore called machine language
instructions.
All other forms of programs, assembler, high level etc must be reduced
to their machine level form.

INSTRUCTION FORMATS
The arrangement of an instruction with respect to assigning meaning to
its various groups of bits is called its format.
The portion of the instruction that specifies what the instruction does is
called the operation code (opcode).
Any address or piece of data that is required by the instruction to
complete its execution is called an operand.
An instruction therefore consists of an operation code and a number of
operands.
Most computers are designed so that not more than 2 operands are
needed by a single instruction.
Some instructions require only one operand and are called single
operand instructions; others are double operand instructions.
If a quantity is taken from a location, that location is called the
source operand. The location that is changed, i.e. where the quantity
taken from the source is placed, is called the destination.

All instruction formats reserve the first bits of the instruction for at least
part of the opcode but beyond this the formats vary considerably from
one computer to the next. The remaining bits designate the operands or
their locations.
Instructions vary in length from 1 byte to 3 or 6 bytes.
e.g.
Register to register transfer (1 byte)
    0 1 | 0 0 0 | 1 1 1
    Opcode | Destination (Register B) | Source (Register A)

Load accumulator from memory (3 bytes)
    0 0 1 1 1 0 1 0        Opcode
    Low order address
    High order address

Transfer of immediate data (2 bytes)
    0 0 | 0 0 1 | 1 1 0    Opcode (outer bits) | Destination Register C (middle bits)
    Data

Conditional branch on zero result (3 bytes)
    1 1 | 0 0 1 | 0 1 0    Opcode (outer bits) | Condition code (middle bits)
    Low order branch address
    High order branch address

ADDRESSING MODES
They are the methods used to locate and fetch an operand from an
internal CPU register or from a memory location.
Each processor has its own addressing modes.
1. Immediate Addressing: Information is part of the instruction. No
addressing is needed to get the information.
It is mostly used for quantities that are constants.
They are 2 byte instructions where the operand is the second byte.
2. Direct addressing: The address is part of the instruction.
3. Register addressing: The operand is in the register and the register’s
address is part of the instruction.
4. Indirect Addressing: The address is in the location whose address is
specified as part of the instruction. This location may be a register
(register indirect addressing) or it may be a memory location.
e.g., add contents of register R1 to the memory location whose
address is in register R2.
5. Base addressing: The address is formed by adding the contents of a
memory location or register to a number called a displacement
which is part of the instruction. It is used primarily to reference
arrays or in relocating a program in memory.
6. Indexing: The process of incrementing or decrementing an address
as the computer sequences through a set of consecutive or evenly
spaced addresses. This is done by successively changing an address
that is stored in a register called an index register, which can be
incremented or decremented.
7. Auto incrementing / decrementing: The index is automatically
incremented or decremented as part of the instruction that uses it.

ASSEMBLER LANGUAGE
It is a type of language that is closer to machine language instructions.
There is an assembler language instruction for each machine language
instruction.
An assembler converts Assembler Language into machine instructions.
There are 2 types of statements in assembler:
(i) Instructions: These are translated into machine code by the
assembler.
(ii) Directive: Gives directions to the assembler during the assembly
process but they are not translated into machine code.
Acronyms called mnemonics indicate the type of instruction.
Character strings called symbols or identifiers represent addresses and
perhaps numbers.
A typical assembler instruction would be
MOV A , M
A typical assembler directive would be
COST : DS 1
This directive causes the assembler to reserve a byte and associate a
symbol COST to it.

Example:
The assignment ANS := X + Y can be carried out on the 8085
microprocessor as follows.

LDA X
MOV B, A
LDA Y
ADD B
STA ANS
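
As an illustration that is not part of the original notes, the five
instructions above can be traced with a toy Python model. The run helper
and the use of a dictionary keyed by symbol names in place of real memory
addresses are assumptions made for this sketch; only the four mnemonics
used in the example are handled.

```python
def run(program, memory):
    """Execute (mnemonic, operands...) tuples on a toy model of the 8085."""
    regs = {"A": 0, "B": 0}
    for mnemonic, *operands in program:
        if mnemonic == "LDA":        # load accumulator from memory
            regs["A"] = memory[operands[0]]
        elif mnemonic == "MOV":      # register to register copy: MOV dst, src
            dst, src = operands
            regs[dst] = regs[src]
        elif mnemonic == "ADD":      # A <- A + register, kept to 8 bits
            regs["A"] = (regs["A"] + regs[operands[0]]) & 0xFF
        elif mnemonic == "STA":      # store accumulator into memory
            memory[operands[0]] = regs["A"]
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return memory


program = [("LDA", "X"), ("MOV", "B", "A"), ("LDA", "Y"), ("ADD", "B"), ("STA", "ANS")]
memory = run(program, {"X": 20, "Y": 22, "ANS": 0})
print(memory["ANS"])
```

The trace shows why X has to be parked in register B: LDA always loads
into the accumulator, so the first value must be moved out of the way
before the second one is loaded.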

3 points are worth noting.
(i) Most instructions involve movement of information from one part of
the computer to another.
(ii) Computers do not handle operands in a flexible manner, e.g. for
the ADD instruction, one of the operands must already be in the
accumulator.
(iii) All programs, whatever the language, involve inputting, processing
and outputting.

Assembler Instruction Format
The general format is
Label: Mnemonic Operand, Operand ; remarks
Label: A symbol assigned to the address of the first byte of the
instruction in which it appears.
Its presence is optional; if present it provides a symbolic name that can
be used in branch instructions to branch to the instruction.
If there is no label then there is no colon.
All instructions must contain a mnemonic.
The presence of operands depends on the instruction. If there is more
than one operand they are separated by commas.
Remarks are for documenting the program; they are optional.

e.g. BRADDR: MOV A, M
     Label: BRADDR   Mnemonic: MOV   Destination operand: A   Source operand: M

Register to register transfer
e.g. NOW: MOV B, D
     Label: NOW   Mnemonic: MOV   Destination operand: B   Source operand: D

Load accumulator from memory
e.g. LDA NUM
     Mnemonic: LDA   Operand: NUM, the address of the operand to be loaded into the accumulator

Transfer of immediate operand to a register
e.g. MVI E, 6
     Mnemonic: MVI   Destination register: E   Immediate data: 6

Conditional branch on non-zero result
e.g. JNZ HERE
     Mnemonic: JNZ (includes the branch condition)   Operand: HERE, the branch address

(i) All instructions are 1, 2 or 3 bytes long.
(ii) Instructions that involve only register or register indirect addressing
are 1 byte long.
(iii) Those that involve I/O or immediate operands are 2 or 3 bytes
long.

The working registers are A, B, C, D, E, H and L.
Register addresses are:

Register   Address   Register pair   Pair address
B          000       BC              00
C          001
D          010       DE              01
E          011
H          100       HL              10
L          101
A          111

The registers are sometimes considered in the pairs BC, DE and HL.
Both registers in a pair have the same two high-order bits in their
addresses, and these two bits form the address of the pair.
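
The register addresses above slot directly into the instruction formats
shown earlier. The following Python sketch is an illustration, not part of
the original notes: the helper names encode_mov and encode_mvi are
hypothetical, and the memory operand M is left out because the table gives
no address for it.

```python
# 3-bit register addresses, copied from the table above
REG = {"B": 0b000, "C": 0b001, "D": 0b010, "E": 0b011,
       "H": 0b100, "L": 0b101, "A": 0b111}


def encode_mov(dst, src):
    """MOV dst, src: one byte laid out as 01 DDD SSS."""
    return bytes([0b01_000_000 | (REG[dst] << 3) | REG[src]])


def encode_mvi(dst, value):
    """MVI dst, data: opcode byte 00 DDD 110 followed by the data byte."""
    return bytes([0b00_000_110 | (REG[dst] << 3), value & 0xFF])


for name, code in [("MOV B, A", encode_mov("B", "A")),
                   ("MOV B, D", encode_mov("B", "D")),
                   ("MVI E, 6", encode_mvi("E", 6))]:
    print(name, " ".join(f"{byte:08b}" for byte in code))
```

MOV B, A comes out as 01000111, the register to register pattern shown in
the section on instruction formats.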

Assembler Directives.
The directives direct the assembler during the assembly process.
The ASM 85 assembler has 3 directives: DS, DB and DW.
They have the format
Label: Mnemonic Operand, Operand
The label is optional.

DS (Define Storage)
It is used to reserve memory and perhaps to assign a label to the first
byte of the reserved area. e.g. ARRAY: DS 20 reserves 20
bytes and assigns the label ARRAY to the byte with the lowest address.

DB (Define Byte)
Used to put values into (pre-assign values to) memory locations, as well
as to reserve space and assign labels.
It serves the same purpose as the DATA statement in Fortran. It can
include up to 8 operands, where each operand is either a string constant
of no more than 128 characters or a constant expression that evaluates
to a 2’s complement number from -128 to 127.
e.g. NUM: DB 14H, ‘ABC’, 01101000B
reserves 5 bytes and associates the label NUM with the first byte.

NUM → 14 41 42 43 68

DW (Define Word)
Similar to DB except that it reserves words instead of bytes. Each of its
possible 8 operands should evaluate to a 16 bit number or be a single
string of one or two characters.
The low order byte of the word is stored at the lower byte address and
the high order byte at the higher byte address, e.g.

TABLE: DW TASK1, TASK2, 092AH

TASK1 and TASK2 are labels. Assuming that TASK1 and TASK2 have
been assigned memory locations 2010 and 108C (hex) respectively, the
stored bytes are:

TABLE → 10 20 8C 10 2A 09
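
The byte layouts produced by DB and DW can be reproduced with a short
Python sketch. This is an illustration only: db and dw are hypothetical
helpers rather than part of any assembler, and string operands of DW are
not handled.

```python
def db(*operands):
    """Lay out DB operands: one byte per number, one byte per character."""
    out = bytearray()
    for operand in operands:
        if isinstance(operand, str):
            out += operand.encode("ascii")
        else:
            out.append(operand & 0xFF)      # -128..127 stored in 2's complement
    return bytes(out)


def dw(*operands):
    """Lay out DW operands as 16 bit words, low order byte first."""
    out = bytearray()
    for operand in operands:
        out += (operand & 0xFFFF).to_bytes(2, "little")
    return bytes(out)


def show(data):
    return " ".join(f"{byte:02X}" for byte in data)


TASK1, TASK2 = 0x2010, 0x108C               # addresses assumed in the text
print(show(db(0x14, "ABC", 0b01101000)))    # NUM
print(show(dw(TASK1, TASK2, 0x092A)))       # TABLE
```

The two printed lines match the byte sequences shown for NUM and TABLE.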

THE SIMPLIFIED INSTRUCTIONAL COMPUTER
It is similar to a typical microcomputer. It comes in two versions: the
standard model and the XE version (XE = “Extra Equipment” or “Extra
Expensive”).

THE SIC MACHINE
Memory:
Memory consists of 8-bit bytes; any three consecutive bytes form a word
(24 bits). There are a total of 32,768 (2^15) bytes in the computer memory.

Registers
There are 5 registers, each of which has a special use. Each register is
24 bits long.

Mnemonic   Number   Use
A          0        Accumulator; used for arithmetic operations
X          1        Index register; used for addressing
L          2        Linkage register; the Jump to Subroutine (JSUB) instruction stores the return address in this register
PC         8        Program counter; contains the address of the next instruction to be fetched for execution
SW         9        Status word; contains the condition codes

Data Formats
Integers are stored as 24 bit binary numbers; 2’s complement
representation is used for negative numbers. Characters are stored
using their 8 bit ASCII codes. There is no floating point hardware on the
simple standard version.

Instruction formats
All machine instructions have the following 24 bit format.

Field   opcode   x   address
Bits    8        1   15

The flag bit x is used to indicate indexed addressing mode.

Addressing Modes
There are 2 addressing modes, selected by the bit x: when x = 1 the
addressing mode is indexed; when x = 0 it is direct.

Mode      Indication   Target address calculation
Direct    x = 0        TA = address
Indexed   x = 1        TA = address + (X)

(X) means the contents of register X.
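
A minimal Python sketch of how the 24 bit format splits into its fields
and how the target address follows from the x bit. The function name
decode_sic and the sample address 1000 (hex) are assumptions made for the
illustration; the opcode 50 is the LDCH opcode from the instruction set
table.

```python
def decode_sic(word, x_reg=0):
    """Split a 24 bit SIC instruction into fields and compute the target address."""
    opcode = (word >> 16) & 0xFF      # top 8 bits
    x = (word >> 15) & 1              # indexed addressing flag
    address = word & 0x7FFF           # low 15 bits
    ta = address + x_reg if x else address
    return opcode, x, address, ta


# LDCH (opcode 50) with x = 1 and address 1000 (hex), executed with (X) = 5
opcode, x, address, ta = decode_sic(0x509000, x_reg=5)
print(f"opcode={opcode:02X} x={x} address={address:04X} TA={ta:04X}")
```

With x = 0 the same call returns the address field unchanged as the target
address.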

Instruction Set
Instructions available include LDA, LDX, STA, STX, ADD, SUB, MUL,
DIV etc.
All arithmetic operations involve register A and a word in memory; the
result is left in the register.
The instruction COMP compares the value in register A with a word in
memory and sets the condition code CC to indicate the result (<, = or >).
The conditional jump instructions are JLT, JEQ and JGT. For subroutine
linkage there are two instructions: JSUB (jump to subroutine), which
places the return address in register L, and RSUB (return from
subroutine), which returns by jumping to the address contained in
register L.

Input and Output
This is performed by transferring one byte at a time to or from the
rightmost 8 bits of register A. Each device is assigned a unique 8 bit
code.
There are three I/O instructions, each of which specifies the device code
as an operand.
• TD (Test Device): tests whether the addressed device is ready to
send or receive a byte of data. The condition code is set to indicate
the result of this test:
    <  the device is ready to send or receive.
    =  the device is busy.
• RD (Read Data): reads a byte of data from a device when the device
is ready; otherwise the operation is delayed.
• WD (Write Data): writes a byte of data to a device.
The sequence is repeated for each byte of data to be read or written.

THE SIC/XE MACHINE
Memory:
The maximum memory is one megabyte (2^20 bytes). This increase leads
to a change in instruction formats and addressing modes.

Registers
The following additional registers are provided
Mnemonic Number Use
B 3 Base register, used for addressing
S 4 General Working register, no special use.
T 5 General Working register, no special use.
F 6 Floating Point Accumulator.

Data Formats
In addition to the data formats of the standard version there is a 48
bit floating point data type with the following format.

Field   s   exponent   fraction
Bits    1   11         36

The fraction is a value between 0 and 1. For normalized floating point
numbers, the high order bit of the fraction must be a 1. The exponent is
an unsigned binary number between 0 and 2047.
If the exponent is e and the fraction is f, the absolute value of the
number represented is f * 2^(e - 1024).
The sign of the floating point number is indicated by the value of s
(0 = positive, 1 = negative).
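
The value rule can be checked with a short Python sketch. The function
name decode_sicxe_float is hypothetical and the bit patterns are built by
hand for illustration; the 48 bits are passed as one Python integer.

```python
def decode_sicxe_float(bits):
    """Decode a 48 bit SIC/XE floating point value held in an integer."""
    sign = (bits >> 47) & 1                              # 1 bit
    exponent = (bits >> 36) & 0x7FF                      # 11 bits, 0..2047
    fraction = (bits & ((1 << 36) - 1)) / (1 << 36)      # 36 bits read as 0.f
    value = fraction * 2.0 ** (exponent - 1024)
    return -value if sign else value


# fraction 0.101 (binary) = 0.625 with exponent 1027: 0.625 * 2^3
five = (1027 << 36) | (0b101 << 33)
print(decode_sicxe_float(five))
print(decode_sicxe_float(five | (1 << 47)))              # same value, sign bit set
```

In both patterns the high order bit of the fraction is 1, so they are
normalized in the sense described above.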

Instruction formats
Format 1 (1 byte)
Field   opcode
Bits    8

Format 2 (2 bytes)
Field   opcode   r1   r2
Bits    8        4    4

Format 3 (3 bytes)
Field   opcode   n   i   x   b   p   e   disp
Bits    6        1   1   1   1   1   1   12

Format 4 (4 bytes)
Field   opcode   n   i   x   b   p   e   address
Bits    6        1   1   1   1   1   1   20

Since there is now more memory, an address no longer fits into a 15 bit
field. The extended version offers two options: use some form of
relative addressing (format 3) or extend the address field to 20 bits
(format 4). There are also instructions that do not reference memory at
all (formats 1 and 2).
Bit e is used to distinguish between formats 3 and 4
(e = 0 means format 3, e = 1 means format 4).

Addressing Modes
Two new relative addressing modes are provided for use with format 3 in
the extended version: base relative addressing and program counter
relative addressing.

Mode               Indication     Target address calculation
Base relative      b = 1, p = 0   TA = (B) + disp    (0 <= disp <= 4095)
Program counter    b = 0, p = 1   TA = (PC) + disp   (-2048 <= disp <= 2047)
relative

For base relative addressing disp is a 12 bit unsigned integer. For
program counter relative addressing it is a 12 bit signed integer with
negative values represented in 2’s complement.

If bits b and p are both set to 0, the disp field in format 3 is taken to
be the target address. For format 4, bits b and p must be 0 and the
target address is taken from the address field of the instruction. This
is called direct addressing.
Any of these addressing modes can also be combined with indexed
addressing if bit x is set to 1. In such a case the contents of register
X, (X), are added in the target address calculation.
Bits i and n are used to specify how the target address is used. If i = 1
and n = 0 the target address itself is used as the operand value. No
memory reference is made. This is immediate addressing.
If i = 0 and n = 1 the word at the location given by the target address is
fetched. The value in this word is taken as the address of the operand
value. This is indirect addressing. If bits i and n are both 0 or both 1,
the target address is taken as the location of the operand. This is
referred to as simple addressing. Indexing cannot be used with
immediate or indirect addressing.
If bits n and i are both 0 then bits b, p and e are considered to be part of
the address field of the instruction rather than flags indicating addressing
modes. This makes instruction format 3 identical to the format used on
the standard version of SIC.
b   p   Addressing mode   Target address ((X) is added only when x = 1)
0   0   Direct            TA = disp + (X)
0   1   PC relative       TA = (PC) + disp + (X)
1   0   Base relative     TA = (B) + disp + (X)
1   1   -----

n   i   Addressing mode       Operand
0   0   Simple addressing     Operand = contents of TA (bits b, p, e are part of the address bits)
0   1   Immediate             Operand = TA
1   0   Indirect addressing   Word at TA is the address of the operand
1   1   Simple addressing     Operand = contents of TA

Example
The figure below illustrates the different addressing modes available.
The contents of registers B, PC and X and of some selected memory
locations are shown.
The machine code for a series of LDA instructions is given. The target
address generated by each instruction and the value that is loaded into
register A are also shown.

(B) = 006000    (PC) = 003000    (X) = 000090

Memory address   Contents
3030             003600
3600             103000
6390             00C303
C303             003030

Hex        op       n i x b p e   disp/address               Target address   Value loaded into A
032600     000000   1 1 0 0 1 0   0110 0000 0000             3600             103000
03C300     000000   1 1 1 1 0 0   0011 0000 0000             6390             00C303
022030     000000   1 0 0 0 1 0   0000 0011 0000             3030             103000
010030     000000   0 1 0 0 0 0   0000 0011 0000             30               000030
003600     000000   0 0 0 0 1 1   0110 0000 0000             3600             103000
0310C303   000000   1 1 0 0 0 1   0000 1100 0011 0000 0011   C303             003030
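
The six rows of the example can be recomputed with a short Python sketch.
This is an illustration, not a full SIC/XE simulator: the function name
lda is hypothetical, memory is modelled as a dictionary from word address
to 24 bit contents, and the instruction format is inferred from the length
of the hex string rather than from bit e.

```python
REGS = {"B": 0x006000, "PC": 0x003000, "X": 0x000090}
MEMORY = {0x3030: 0x003600, 0x3600: 0x103000, 0x6390: 0x00C303, 0xC303: 0x003030}


def lda(instr_hex, regs=REGS, memory=MEMORY):
    """Return (target address, value loaded into A) for one LDA instruction."""
    raw = int(instr_hex, 16)
    extra = 8 if len(instr_hex) == 8 else 0          # format 4 is one byte longer
    n, i, x, b, p = ((raw >> (extra + s)) & 1 for s in (17, 16, 15, 14, 13))
    if n == 0 and i == 0:                            # standard SIC: b, p, e extend the address
        ta = raw & 0x7FFF
    elif extra:                                      # format 4: 20 bit address field
        ta = raw & 0xFFFFF
    else:                                            # format 3: 12 bit displacement
        disp = raw & 0xFFF
        if b:
            ta = regs["B"] + disp                    # base relative, unsigned disp
        elif p:
            signed = disp - 0x1000 if disp & 0x800 else disp
            ta = regs["PC"] + signed                 # PC relative, 2's complement disp
        else:
            ta = disp                                # direct
    if x:
        ta += regs["X"]                              # indexed
    if n == 0 and i == 1:
        value = ta                                   # immediate
    elif n == 1 and i == 0:
        value = memory[memory[ta]]                   # indirect
    else:
        value = memory[ta]                           # simple
    return ta, value


for code in ["032600", "03C300", "022030", "010030", "003600", "0310C303"]:
    ta, value = lda(code)
    print(f"{code:<8}  TA={ta:X}  A={value:06X}")
```

Each printed line reproduces the target address and loaded value in the
corresponding row of the example.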

Addressing Modes
The following addressing modes apply to Format 3 and 4 instructions. Combinations of
addressing bits not included in this table are treated as errors.

Notes:
4   Format 4 instruction
D   Direct addressing
A   Assembler selects either program counter relative or base relative mode

Addressing mode   n i x b p e   Calculation of target address TA   Operand   Notes
Simple            1 1 0 0 0 0   disp                               (TA)      D
                  1 1 0 0 0 1   addr                               (TA)      4 D
                  1 1 0 0 1 0   (PC) + disp                        (TA)      A
                  1 1 0 1 0 0   (B) + disp                         (TA)      A
                  1 1 1 0 0 0   disp + (X)                         (TA)      D
                  1 1 1 0 0 1   addr + (X)                         (TA)      4 D
                  1 1 1 0 1 0   (PC) + disp + (X)                  (TA)      A
                  1 1 1 1 0 0   (B) + disp + (X)                   (TA)      A
                  0 0 0 - - -   b/p/e/disp                         (TA)      D
                  0 0 1 - - -   b/p/e/disp + (X)                   (TA)      D

Indirect          1 0 0 0 0 0   disp                               ((TA))    D
                  1 0 0 0 0 1   addr                               ((TA))    4 D
                  1 0 0 0 1 0   (PC) + disp                        ((TA))    A
                  1 0 0 1 0 0   (B) + disp                         ((TA))    A

Immediate         0 1 0 0 0 0   disp                               TA        D
                  0 1 0 0 0 1   addr                               TA        4 D
                  0 1 0 0 1 0   (PC) + disp                        TA        A
                  0 1 0 1 0 0   (B) + disp                         TA        A

Instruction Set
All instructions in the standard version are still available. In addition there
are instructions to load and store the new registers (LDB, STB, etc.) and
to perform floating point arithmetic (ADDF, SUBF, MULF, DIVF).
Other instructions work on registers, e.g. RMO, ADDR, SUBR, MULR,
DIVR.
In the instruction set table below, uppercase letters refer to specific registers. The notation
m indicates a memory address, n indicates an integer between 1 and 16, and r1 and r2
represent register identifiers.
Parentheses are used to indicate the contents of a register or memory location. Thus
A ← (m..m+2) specifies that the contents of memory locations m through m+2 are
loaded into register A; m..m+2 ← (A) specifies that the contents of register A are stored in
the word that begins at address m.
The letters in the Notes column have the following meanings:
P   Privileged instruction
X   Instruction available only on the XE version
F   Floating point instruction
C   Condition code CC set to indicate the result of the operation

DIRECTIVES
RESW    Reserve the indicated number of words for a data area.
RESB    Reserve the indicated number of bytes for a data area.
WORD    Generate a one word integer constant.
BYTE    Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant.
START   Specify the name and starting address of the program.
END     Indicate the end of the source program and optionally specify the first executable instruction in the program.

SIC/XE INSTRUCTION SET

MNEMONIC       FORMAT   OPCODE   EFFECT                                                  NOTES
ADD m          3/4      18       A ← (A) + (m..m+2)
ADDF m         3/4      58       F ← (F) + (m..m+5)                                      X F
ADDR r1,r2     2        90       r2 ← (r2) + (r1)                                        X
AND m          3/4      40       A ← (A) & (m..m+2)
CLEAR r1       2        B4       r1 ← 0                                                  X
COMP m         3/4      28       (A) : (m..m+2)                                          C
COMPF m        3/4      88       (F) : (m..m+5)                                          X F C
COMPR r1,r2    2        A0       (r1) : (r2)                                             X C
DIV m          3/4      24       A ← (A) / (m..m+2)
DIVF m         3/4      64       F ← (F) / (m..m+5)                                      X F
DIVR r1,r2     2        9C       r2 ← (r2) / (r1)                                        X
FIX            1        C4       A ← (F) [convert to integer]                            X F
FLOAT          1        C0       F ← (A) [convert to floating point]                     X F
HIO            1        F4       Halt I/O channel number (A)                             P X
J m            3/4      3C       PC ← m
JEQ m          3/4      30       PC ← m if CC set to =
JGT m          3/4      34       PC ← m if CC set to >
JLT m          3/4      38       PC ← m if CC set to <
JSUB m         3/4      48       L ← (PC); PC ← m
LDA m          3/4      00       A ← (m..m+2)
LDB m          3/4      68       B ← (m..m+2)                                            X
LDCH m         3/4      50       A [rightmost byte] ← (m)
LDF m          3/4      70       F ← (m..m+5)                                            X F
LDL m          3/4      08       L ← (m..m+2)
LDS m          3/4      6C       S ← (m..m+2)                                            X
LDT m          3/4      74       T ← (m..m+2)                                            X
LDX m          3/4      04       X ← (m..m+2)
LPS m          3/4      D0       Load processor status from information beginning at     P X
                                 address m
MUL m          3/4      20       A ← (A) * (m..m+2)
MULF m         3/4      60       F ← (F) * (m..m+5)                                      X F
MULR r1,r2     2        98       r2 ← (r2) * (r1)                                        X
NORM           1        C8       F ← (F) [normalised]                                    X F
OR m           3/4      44       A ← (A) | (m..m+2)
RD m           3/4      D8       A [rightmost byte] ← data from device specified by (m)  P
RMO r1,r2      2        AC       r2 ← (r1)                                               X
RSUB           3/4      4C       PC ← (L)
SHIFTL r1,n    2        A4       r1 ← (r1); left shift n bits                            X
SHIFTR r1,n    2        A8       r1 ← (r1); right shift n bits                           X
SIO            1        F0       Start I/O channel number (A)                            P X
SSK m          3/4      EC       Protection key for address m ← (A)                      P X
STA m          3/4      0C       m..m+2 ← (A)
STB m          3/4      78       m..m+2 ← (B)                                            X
STCH m         3/4      54       m ← (A) [rightmost byte]
STF m          3/4      80       m..m+5 ← (F)                                            X F
STI m          3/4      D4       Interval timer value ← (m..m+2)                         P X
STL m          3/4      14       m..m+2 ← (L)
STS m          3/4      7C       m..m+2 ← (S)                                            X
STSW m         3/4      E8       m..m+2 ← (SW)                                           P
STT m          3/4      84       m..m+2 ← (T)                                            X
STX m          3/4      10       m..m+2 ← (X)
SUB m          3/4      1C       A ← (A) - (m..m+2)
SUBF m         3/4      5C       F ← (F) - (m..m+5)                                      X F
SUBR r1,r2     2        94       r2 ← (r2) - (r1)                                        X
SVC n          2        B0       Generate SVC interrupt                                  X
TD m           3/4      E0       Test device specified by (m)                            P C
TIO            1        F8       Test I/O channel number (A)                             P X C
TIX m          3/4      2C       X ← (X) + 1; (X) : (m..m+2)                             C
TIXR r1        2        B8       X ← (X) + 1; (X) : (r1)                                 X C
WD m           3/4      DC       Device specified by (m) ← (A) [rightmost byte]          P

EXAMPLES

1. Sample data movement operations for the Simple SIC


LDA FIVE
STA ALPHA
LDCH CHARZ
STCH C1

ALPHA: RESW 1
FIVE : WORD 5
CHARZ: BYTE C’Z’
C1: RESB 1

Same problem for the SIC/XE


LDA #5
STA ALPHA
LDCH #90 (Load ASCII code for Z)
STCH C1

ALPHA: RESW 1
C1: RESB 1

2. Sample arithmetic operation for SIC


All arithmetic operations are done using register A with the result being left in
Register A.

BETA = (ALPHA + INCR – 1); DELTA = (GAMMA + INCR – 1)

LDA ALPHA
ADD INCR
SUB ONE
STA BETA
LDA GAMMA
ADD INCR
SUB ONE
STA DELTA

ONE: WORD 1
ALPHA: RESW 1
BETA: RESW 1
GAMMA: RESW 1
DELTA: RESW 1
INCR: RESW 1

Same problem for the SIC/XE


LDS INCR
LDA ALPHA
ADDR S, A
SUB #1
STA BETA
LDA GAMMA
ADDR S, A
SUB #1
STA DELTA

ALPHA: RESW 1
BETA : RESW 1
GAMMA: RESW 1
DELTA: RESW 1
INCR : RESW 1

3. Sample looping and Indexing operations for Simple SIC


The loop copies one 11 –byte character string to another.

LDX ZERO
MOVECH: LDCH STR1,X
STCH STR2,X
TIX ELEVEN
JLT MOVECH

STR1: BYTE C’TEST STRING’


STR2: RESB 11
ZERO: WORD 0
ELEVEN: WORD 11

Same problem for the SIC/XE


LDT #11
LDX #0
MOVECH: LDCH STR1,X
STCH STR2,X
TIXR T
JLT MOVECH

STR1: BYTE C’TEST STRING’


STR2: RESB 11
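
The control flow of the string copy loop can be mirrored in Python to make
the TIX and JLT semantics explicit. This is an illustrative sketch only:
copy_string is a hypothetical helper, memory is modelled as a bytearray,
and the addresses chosen for STR1 and STR2 are arbitrary.

```python
def copy_string(memory, src, dst, count):
    """Mirror the SIC loop: LDCH/STCH with indexing, then TIX and JLT."""
    x = 0                          # LDX ZERO
    while True:
        a = memory[src + x]        # LDCH STR1,X
        memory[dst + x] = a        # STCH STR2,X
        x += 1                     # TIX: X <- (X) + 1, then compare (X) with count
        if not x < count:          # JLT MOVECH loops only while (X) < count
            break
    return x


memory = bytearray(32)
memory[0:11] = b"TEST STRING"      # STR1 placed at address 0 for the illustration
copy_string(memory, src=0, dst=11, count=11)
print(memory[11:22].decode("ascii"))
```

Because the comparison happens after the increment, the loop body runs
exactly 11 times, with X taking the values 0 through 10.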

4. Sample looping and Indexing operations for the Simple SIC


The variables ALPHA, BETA and GAMMA are arrays of 100 words each. The loop
adds the corresponding elements of ALPHA and BETA and stores the result in the
elements of GAMMA.
The value in the index register must be incremented by 3 for each iteration of the loop
because each iteration processes a 3 byte (1 word) element of the array.

LDA ZERO
STA INDEX
ADDLP: LDX INDEX
LDA ALPHA, X
ADD BETA, X
STA GAMMA, X
LDA INDEX
ADD THREE
STA INDEX
COMP K300
JLT ADDLP

INDEX: RESW 1
ALPHA: RESW 100
BETA : RESW 100
GAMMA: RESW 100
ZERO : WORD 0
THREE: WORD 3
K300: WORD 300

Same problem for the SIC/XE


LDS #3
LDT #300
LDX #0
ADDLP: LDA ALPHA, X
ADD BETA, X
STA GAMMA, X
ADDR S, X
COMPR X, T
JLT ADDLP

ALPHA: RESW 100


BETA : RESW 100
GAMMA: RESW 100

5. Sample input and output operations for SIC


The same instructions will also work on SIC/XE.
The program reads 1 byte of data from device F1 and copies it onto device 05.
If the device is ready the condition code is set to “Less than”; if not ready the
condition code is set to “equal”.

INLOOP: TD INDEV
JEQ INLOOP
RD INDEV
STCH DATA

OUTLP: TD OUTDEV
JEQ OUTLP
LDCH DATA
WD OUTDEV

INDEV: BYTE X’F1’


OUTDEV: BYTE X’05’
DATA : RESB 1

6. Sample Subroutine call and record input operations for SIC


The program reads a 100-byte record from an input device into memory. The read
operation is placed in a subroutine which is called from the main program by using the
JSUB instruction. At the end of the subroutine there is an RSUB instruction which
returns control to the instruction that follows the JSUB.
The READ subroutine consists of a loop. Each execution of this loop reads one
byte of data from the input device. The bytes that are read are stored in a 100-byte
buffer area labeled RECORD.

JSUB READ

READ: LDX ZERO


RLOOP: TD INDEV
JEQ RLOOP
RD INDEV
STCH RECORD,X
TIX K100
JLT RLOOP
RSUB

INDEV: BYTE X’F1’


RECORD: RESB 100
ZERO : WORD 0
K100 : WORD 100

Same problem for the SIC/XE


JSUB READ

READ: LDX #0
LDT #100
RLOOP: TD INDEV
JEQ RLOOP
RD INDEV
STCH RECORD,X
TIXR T
JLT RLOOP
RSUB

INDEV: BYTE X’F1’


RECORD: RESB 100

ASSEMBLERS
Basic Assembler Functions
An assembler is a program that accepts an assembler language
program as input and produces its machine language equivalent, along with
information for the loader.

[Figure: an assembler language program is the input to the assembler, which
produces machine language (passed on to the linker) and a listing.]

COPY START 1000 Copy file from input to output


FIRST STL RETADR Save Return Address
CLOOP JSUB RDREC Read input record
LDA LENGTH Test for EOF (length = 0)
COMP ZERO
JEQ ENDFIL Exit if EOF found
JSUB WRREC Write output record
J CLOOP Loop
ENDFIL LDA EOF Insert end of file marker
STA BUFFER
LDA THREE Set length = 3
STA LENGTH
JSUB WRREC Write EOF
LDL RETADR Get Return Address
RSUB Return to Caller
EOF BYTE C’EOF’
THREE WORD 3
ZERO WORD 0
RETADR RESW 1
LENGTH RESW 1 Length of Record
BUFFER RESB 4096 4096 Byte Buffer Area
Subroutine to read record into Buffer
RDREC LDX ZERO Clear Loop Counter
LDA ZERO Clear A to Zero
RLOOP TD INPUT Test input device
JEQ RLOOP Loop until ready
RD INPUT Read character into a Register
COMP ZERO Test for End of Record (X ‘00’)
JEQ EXIT Exit loop if end of record
STCH BUFFER,X Store character in buffer
TIX MAXLEN Loop unless MAX Length
JLT RLOOP Has been reached
EXIT STX LENGTH Save Record Length
RSUB Return to Caller
INPUT BYTE X’F1’ Code for Input Device
MAXLEN WORD 4096
Subroutine to write record from Buffer
WRREC LDX ZERO Clear Loop counter
WLOOP TD OUTPUT Test output device
JEQ WLOOP Loop until ready
LDCH BUFFER,X Get character from buffer
WD OUTPUT Write Character
TIX LENGTH Loop Until all characters
JLT WLOOP have been written
RSUB Return to Caller
OUTPUT BYTE X’05’ Code for output Device
END FIRST

The example above shows an assembler language program that
contains a main routine that reads records from an input device (code F1) and
copies them to an output device (code 05).
The main routine calls subroutine RDREC to read a record into a buffer
and subroutine WRREC to write the record from the buffer to the output
device. Only one character is transferred at a time. The end of each
record is marked with a null character (hexadecimal 00). If a record is
longer than the length of the buffer (4096 bytes) only the first 4096 bytes
are copied.
The end of a file to be copied is indicated by a zero length record.
When the end of file is detected the program writes EOF on the output
device and terminates by executing an RSUB instruction.

A simple SIC Assembler


The code for the program is rewritten below with the generated object
code for each statement.
The translation of the assembler program to object code needs the
following:
1. Convert mnemonic operation codes to their machine language
equivalent, e.g. translate STL to 14.
2. Convert symbolic operands to their equivalent machine addresses,
e.g. translate RETADR to 1033.
3. Build the machine instructions in the correct format.
4. Convert the data constants into their internal machine
representations, e.g. translate EOF to 454F46.
5. Write the object program and the assembly listing.
All these functions except 2 can easily be accomplished by simple
processing of the source program one line at a time. The translation of
addresses is more complicated because a symbol may be used as an operand
before it is defined (a forward reference), so the address to be assigned to
the symbol is not yet known.
Because of this, most assemblers make 2 passes over the source
program.
1000 COPY START 1000
1000 FIRST STL RETADR 141033
1003 CLOOP JSUB RDREC 482039
1006 LDA LENGTH 001036
1009 COMP ZERO 281030
100C JEQ ENDFIL 301015
100F JSUB WRREC 482061
1012 J CLOOP 3C1003
1015 ENDFIL LDA EOF 00102A
1018 STA BUFFER 0C1039
101B LDA THREE 00102D
101E STA LENGTH 0C1036
1021 JSUB WRREC 482061
1024 LDL RETADR 081033
1027 RSUB 4C0000
102A EOF BYTE C’EOF’ 454F46
102D THREE WORD 3 000003
1030 ZERO WORD 0 000000
1033 RETADR RESW 1
1036 LENGTH RESW 1
1039 BUFFER RESB 4096
Subroutine to read record into Buffer
2039 RDREC LDX ZERO 041030
203C LDA ZERO 001030
203F RLOOP TD INPUT E0205D
2042 JEQ RLOOP 30203F
2045 RD INPUT D8205D
2048 COMP ZERO 281030
204B JEQ EXIT 302057
204E STCH BUFFER,X 549039
2051 TIX MAXLEN 2C205E
2054 JLT RLOOP 38203F
2057 EXIT STX LENGTH 101036
205A RSUB 4C0000
205D INPUT BYTE X’F1’ F1
205E MAXLEN WORD 4096 001000
Subroutine to write record from Buffer
2061 WRREC LDX ZERO 041030
2064 WLOOP TD OUTPUT E02079
2067 JEQ WLOOP 302064
206A LDCH BUFFER,X 509039
206D WD OUTPUT DC2079
2070 TIX LENGTH 2C1036
2073 JLT WLOOP 382064
2076 RSUB 4C0000
2079 OUTPUT BYTE X’05’ 05
END FIRST

Pass 1 (Define symbols)


1. Scan the source program for label definitions and
assign addresses to all statements in the program.
2. Save the addresses assigned to all labels for use in pass 2.
3. Perform some processing of the assembler directives, e.g.
determining the length of data areas defined by BYTE, RESW etc.

Pass 2 (assemble the instructions and generate machine code)


1. Assemble instructions (translating operation codes and looking up
addresses)
2. Generate data values defined by BYTE, WORD, etc.
3. Perform processing of the directives not done during pass 1.
4. Write the object program and assembly listing onto some output
device which will later be loaded in memory for execution.
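Step 1 of pass 2 can be illustrated with a short sketch. The Python below is only an illustration, not part of any real assembler: the names OPTAB, SYMTAB and assemble_sic are hypothetical, and the tables hold just a few entries copied from the listing above. It builds the 24-bit SIC instruction from an 8-bit opcode, the 1-bit index flag x and a 15-bit address.

```python
# Hypothetical miniature tables; the values are copied from the SIC listing above.
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00,
         "STCH": 0x54, "LDCH": 0x50, "RSUB": 0x4C}
SYMTAB = {"RETADR": 0x1033, "RDREC": 0x2039,
          "LENGTH": 0x1036, "BUFFER": 0x1039}


def assemble_sic(mnemonic, operand=""):
    """Build a SIC instruction: 8-bit opcode, 1-bit x flag, 15-bit address."""
    opcode = OPTAB[mnemonic]                 # translate the operation code
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    address = SYMTAB[symbol] if symbol else 0  # RSUB has no operand
    if indexed:
        address |= 0x8000                    # set the x bit for indexed addressing
    return f"{opcode:02X}{address:04X}"


print(assemble_sic("STL", "RETADR"))     # 141033
print(assemble_sic("STCH", "BUFFER,X"))  # 549039
```

A real pass 2 would also report undefined symbols and invalid operation codes instead of failing on a missing table entry.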

A simple object program contains three types of records:


1. The Header: contains the program name, the starting address
of the program and the length of the whole object program.
2. The Text record contains the translated instructions and the
data of the program together with an indication of the addresses
where they are loaded.
3. The End record marks the end of the object program and
specifies the address of the program where execution is to begin.

Header Record
Col. 1 H
Col 2-7 Program Name
Col 8-13 Starting address of the object program
Col 14-19 Length of object program in bytes.

Text Record
Col. 1 T
Col 2-7 Starting address for object code in this record
Col 8-9 Length of object code in this record in bytes.
Col 10-69 Object code in hexadecimal.

End Record
Col. 1 E
Col 2-7 Address of first executable instruction in object
program.

H^COPY  ^001000^00107A
T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^00102A^0C1039^00102D
T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030^302057^549039^2C205E^38203F
T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064^509039^DC2079^2C1036
T^002073^07^382064^4C0000^05
E^001000
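The record layouts above can be checked with a small sketch. The function names below are hypothetical, and the ^ separators are kept only because the notes print them between fields; they are an assumption about presentation, not part of the column layout. The length byte of a Text record is computed from the object code it carries.

```python
def header_record(name, start, length):
    """H record: program name (6 columns), start address, program length."""
    return f"H^{name:<6}^{start:06X}^{length:06X}"


def text_record(start, object_codes):
    """T record: start address, length in bytes, then the object code."""
    nbytes = sum(len(code) for code in object_codes) // 2  # 2 hex digits per byte
    return f"T^{start:06X}^{nbytes:02X}^" + "^".join(object_codes)


def end_record(first_executable):
    """E record: address of the first executable instruction."""
    return f"E^{first_executable:06X}"


print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x2073, ["382064", "4C0000", "05"]))
print(end_record(0x1000))
```

Computing the length from the object code avoids errors in the length field of hand-written records.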

Assembler Tables and Logic


Two major internal tables are used: The Operation Code table (OPTAB)
and the Symbol Table (SYMTAB).
OPTAB is used to look up mnemonic operation codes and translate
them into their machine language equivalents. SYMTAB is used to store
addresses assigned to labels.
A location counter LOCCTR is used to help in the assignment of
addresses. It is initialized to the beginning address specified in the
START statement. After each source statement is processed the length
of the assembled instruction or data area to be generated is added to
the LOCCTR. Whenever a label in the source program is encountered
the current value of the LOCCTR gives the address to be associated
with that label and it is then said to be defined. If the same label is
defined twice in the source module, an error occurs.
During pass 1 OPTAB is used to look up and validate operation codes in
the source program. In pass 2 it is used to translate the opcodes to
machine language.
The SYMTAB includes the name and address for each label in the
source code program together with flags to indicate error conditions.
In pass 1 the labels are entered into the SYMTAB as they are
encountered in the source program along with their assigned addresses.
In the second pass symbols used as operands are looked up in
SYMTAB to obtain the addresses to be inserted in the assembler
instructions.
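The use of LOCCTR and SYMTAB in pass 1 can be sketched as follows. This is a simplified illustration under stated assumptions: the source is already split into (label, opcode, operand) tuples, every machine instruction is 3 bytes long (standard SIC), the program name on the START line is not entered into the symbol table, and the name pass1 is hypothetical.

```python
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "RSUB": 0x4C}  # small subset


def pass1(lines, optab):
    """Return (SYMTAB, program length) for (label, opcode, operand) tuples."""
    symtab = {}
    locctr = start = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            locctr = start = int(operand, 16)  # initialise LOCCTR
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr             # the label is now defined
        if opcode in optab or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            value = operand[2:-1]              # text between the quotes
            locctr += len(value) if operand[0] == "C" else len(value) // 2
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, locctr - start


fragment = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "RSUB", ""),
    ("EOF", "BYTE", "C'EOF'"),
    ("THREE", "WORD", "3"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]
symtab, length = pass1(fragment, OPTAB)
```

For this fragment RETADR is defined at 100C even though the STL instruction at 1000 already refers to it, which is why the operand address can only be filled in during pass 2.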

Machine Dependent Assembler Features


We now consider assembler features that depend on the machine
by comparing the previous program, which ran on the SIC machine, with
a version written for the SIC/XE machine. We examine the effect of the
hardware on the structure and functions of the assembler.
0000 COPY START 0
0000 FIRST STL RETADR 17202D
0003 LDB #LENGTH 69202D
BASE LENGTH
0006 CLOOP +JSUB RDREC 4B101036
000A LDA LENGTH 032026
000D COMP #0 290000
0010 JEQ ENDFIL 332007
0013 +JSUB WRREC 4B10105D
0017 J CLOOP 3F2FEC
001A ENDFIL LDA EOF 032010
001D STA BUFFER 0F2016
0020 LDA #3 010003
0023 STA LENGTH 0F200D
0026 +JSUB WRREC 4B10105D
002A J @RETADR 3E2003
002D EOF BYTE C’EOF’ 454F46
0030 RETADR RESW 1
0033 LENGTH RESW 1
0036 BUFFER RESB 4096

Subroutine to read record into Buffer


1036 RDREC CLEAR X B410
1038 CLEAR A B400
103A CLEAR S B440
103C +LDT #4096 75101000
1040 RLOOP TD INPUT E32019
1043 JEQ RLOOP 332FFA
1046 RD INPUT DB2013
1049 COMPR A,S A004
104B JEQ EXIT 332008
104E STCH BUFFER,X 57C003
1051 TIXR T B850
1053 JLT RLOOP 3B2FEA
1056 EXIT STX LENGTH 134000
1059 RSUB 4F0000
105C INPUT BYTE X’F1’ F1
Subroutine to write record from Buffer
105D WRREC CLEAR X B410
105F LDT LENGTH 774000
1062 WLOOP TD OUTPUT E32011
1065 JEQ WLOOP 332FFA
1068 LDCH BUFFER,X 53C003
106B WD OUTPUT DF2008
106E TIXR T B850
1070 JLT WLOOP 3B2FEF
1073 RSUB 4F0000
1076 OUTPUT BYTE X’05’ 05
END FIRST
The program above is the previous program rewritten using
SIC/XE features. Indirect addressing is indicated by adding the prefix
@ to the operand. Immediate operands are denoted with the prefix #.
Instructions that refer to memory are usually assembled using either
program counter relative or base relative addressing. The assembler
directive BASE is used in conjunction with base relative addressing. If
the displacement required is too large to fit into a 3-byte instruction, the
4-byte extended format (Format 4) is used. It is specified with the prefix +.
The main difference between the two versions is the use of
register-register instructions wherever possible, e.g. COMP ZERO is replaced by
COMPR A,S and TIX MAXLEN by TIXR T.
In addition immediate and indirect addressing have been used (COMP #0,
LDA #3, J @RETADR).
These changes result in increased execution speed of the program
because register-register instructions are faster than the corresponding
memory operations: they are shorter and do not require another
memory reference. The use of indirect addressing often avoids the need
for another instruction.

1. Instruction Formats and addressing Modes


To translate register-register instructions such as CLEAR and COMPR,
the assembler simply converts the mnemonic operation code to machine
language using the OPTAB. The conversion of register mnemonics to
numbers can be done with a separate table or by using SYMTAB, which
can be preloaded with the register names and their values.
Most of the register – memory instructions are assembled using either
program counter relative addressing or base relative addressing. The
assembler must in either case calculate the displacement to be
assembled as part of the object code. This is computed so that the
correct target address results when the displacement is added to the
contents of the program counter or the Base register.

If neither program counter relative nor base relative addressing can be used
(because the displacements are too large), Format 4 is used. It holds the
full address, so there is no displacement to be calculated. For example, in the instruction
0006 CLOOP +JSUB RDREC 4B101036
the operand address is 1036. This full address is stored in the instruction
with bit e set to 1 to indicate extended instruction format.
The instruction
0000 FIRST STL RETADR 17202D
is an example of a typical program counter relative assembly. During
execution the PC will contain the address of the next instruction (LDB), which is 0003.
RETADR is assigned the address 0030 (from
SYMTAB). The displacement needed is 0030 – 0003 = 2D.
At execution time the target address calculation performed will be (PC) +
disp, resulting in the correct address 0030.
Bit p is set to 1 to indicate program counter relative addressing, making
the last two bytes of the instruction 202D.
Bits n and i are both set to 1, indicating neither indirect nor immediate
addressing, which makes the first byte 17 instead of 14.
The instruction
0017 J CLOOP 3F2FEC
is another example of program counter relative assembly. The operand
address is 0006. During execution the PC will contain 001A. The
displacement required is 0006 – 001A = –14 (hexadecimal), which is FEC as a 2’s complement
number in a 12-bit field.
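The two program counter relative calculations above can be reproduced with a short sketch. The function names are hypothetical; the bit layout assumed is the Format 3 layout used in these notes (6-bit opcode, flag bits n i x b p e, 12-bit displacement).

```python
def encode_format3(opcode, disp, n=1, i=1, x=0, b=0, p=0):
    """Pack opcode, flags n i x b p (e = 0) and a 12-bit displacement."""
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1)       # e is 0 in Format 3
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"


def pc_relative(opcode, target, next_addr, **flags):
    """Assemble with disp = target - (PC), where PC holds next_addr."""
    disp = target - next_addr
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement does not fit in 12 bits")
    return encode_format3(opcode, disp, p=1, **flags)


print(pc_relative(0x14, 0x0030, 0x0003))  # STL RETADR -> 17202D
print(pc_relative(0x3C, 0x0006, 0x001A))  # J CLOOP    -> 3F2FEC
```

Masking the displacement with 0xFFF gives the 2's complement form of a negative value, which is how -14 (hexadecimal) becomes FEC.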
Displacement calculation for base relative addressing is much the same. The
difference is that the assembler knows what the contents of the program
counter will be at execution time, while the base register
is under the control of the programmer, who must therefore tell the
assembler what the base register will contain. The statement
BASE LENGTH, for example, informs the assembler that the base
register will contain the address of LENGTH, which is loaded by the
preceding instruction LDB #LENGTH.
The instruction
104E STCH BUFFER,X 57C003
is an example of base relative assembly.
Register B contains 0033. The address of BUFFER is 0036. The
displacement is therefore 0036 – 0033 = 3. Bits x and b are set to 1 to
indicate indexed and base relative addressing.
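The choice between the two relative modes can be sketched as below. The function name choose_displacement is hypothetical, and the order assumed (program counter relative first, then base relative) is one reasonable policy rather than a rule stated in these notes. Program counter relative displacements are taken as signed 12-bit values (-2048 to 2047) and base relative displacements as unsigned (0 to 4095).

```python
def choose_displacement(target, next_addr, base=None):
    """Return (b, p, disp): try PC relative first, then base relative."""
    disp = target - next_addr
    if -2048 <= disp <= 2047:
        return 0, 1, disp & 0xFFF
    if base is not None and 0 <= target - base <= 4095:
        return 1, 0, target - base
    raise ValueError("operand out of range; extended format (+) is needed")


# STCH BUFFER,X at 104E: the PC will hold 1051, register B holds 0033.
b, p, disp = choose_displacement(0x0036, 0x1051, base=0x0033)
first_byte = 0x54 | 0b11                 # opcode 54 with n = i = 1
flags = (1 << 3) | (b << 2) | (p << 1)   # x = 1 for indexed addressing
object_code = f"{first_byte:02X}{flags:X}{disp:03X}"
print(object_code)  # 57C003
```

With the same inputs, a 3-byte JSUB RDREC at address 0006 fits neither mode (RDREC is at 1036), which is why the listing uses +JSUB.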
Immediate addressing only requires converting the operand to its internal
representation and inserting it into the instruction.
The instruction
0020 LDA #3 010003
is an example of immediate addressing. The operand is stored in the
instruction itself as 003. Bit i is set to 1 (and bit n to 0) to indicate immediate addressing.
For
103C +LDT #4096 75101000
the operand is too large to fit into the 12-bit displacement field, so extended format is used.
0003 LDB #LENGTH 69202D
is also immediate addressing. The value of a symbol is the address
assigned to it, so this instruction loads the address of LENGTH into register B. Here
program counter relative addressing is combined with immediate addressing.
The instruction
002A J @RETADR 3E2003
combines program counter relative and indirect addressing.

2. Program Relocation
It is often impossible to plan in advance where a program will be loaded in
memory, because other programs may already occupy part of memory. In such
a case the actual starting address of a program is not known until load
time.
The SIC program shown earlier (the listing that begins COPY START 1000) is an
example of an absolute program. It
must be loaded at address 1000 (the address that was specified at
assembly time) in order to execute properly.

Example: In the instruction
101B LDA THREE 00102D
register A is to be loaded from memory address 102D. If an attempt is
made to load the program at address 2000 instead of 1000 the address
102D will not contain the expected value.
Some changes in the address portion of this instruction are needed
before executing the program at address 2000.
The object program that contains information necessary to perform this
kind of modification is called a relocatable program.
For the SIC/XE program shown earlier (the listing that begins COPY START 0),
the JSUB instruction is at
address 0006. The address field of this instruction contains 01036, the
address of the instruction labeled RDREC. If the
program is loaded at address 5000, the address of RDREC is then 6036. The
JSUB instruction will have to be modified as
shown to contain the new address.
No matter where the program is loaded, RDREC will always be 1036
bytes past the starting address.
Load address   +JSUB RDREC at   Object code   RDREC at   Last byte
0000           0006             4B101036      1036       1076
5000           5006             4B106036      6036       6076
7420           7426             4B108456      8456       8496

The relocation problem is solved in the following way:



1. When the assembler generates the object code for the JSUB
instruction it will insert the address of RDREC relative to the start of
the program.
2. The assembler will also produce a command for the loader
instructing it to add the beginning address of the program to the
address field in the JSUB instruction at load time.

The command for the loader must also be a part of the object program.
This is accomplished by having a modification record with the format:
Col. 1 M
Col 2-7 starting location of the address field to be modified
relative to the beginning of the program
Col 8-9 length of the address field to be modified in half-bytes

For the JSUB instruction the modification record would be M^000007^05.
The beginning address of the program is to be added to a field that
begins at address 000007 and is 5 half-bytes in length.
The same kind of modification record is needed for the +JSUB instructions at addresses
0013 and 0026.
The other instructions in the program need not be modified. In
some cases the operand is not a memory address, e.g. CLEAR S or
LDA #3. In other cases the operand is specified using program counter
relative addressing or base relative addressing.
The only parts of the program that require modification at load time are
those that specify direct addresses.
The object code for the above program would be
H^COPY ^000000^001077
T^000000^1D^17202D^69202D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010
T^00001D^13^0F2016^010003^0F200D^4B10105D^3E2003^454F46
T^001036^1D^B410^B400^B440^75101000^E32019^332FFA^DB2013^A004^332008^57C003^B850
T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850
T^001070^07^3B2FEF^4F0000^05
M^000007^05
M^000014^05
M^000027^05
E^000000
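What the loader does with a modification record can be sketched as follows. This is an illustration under assumptions: the program image is held in a Python bytearray indexed from the start of the program, each record is given as an (offset, length in half-bytes) pair, and the function name relocate is hypothetical.

```python
def relocate(memory, mod_records, load_addr):
    """Add load_addr to each address field named by an M record."""
    for offset, half_bytes in mod_records:
        nbytes = (half_bytes + 1) // 2            # 5 half-bytes span 3 bytes
        field = int.from_bytes(memory[offset:offset + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1        # low-order half-bytes only
        new_low = ((field & mask) + load_addr) & mask
        field = (field & ~mask) | new_low         # keep the flag half-byte
        memory[offset:offset + nbytes] = field.to_bytes(nbytes, "big")
    return memory


image = bytearray(10)
image[6:10] = bytes.fromhex("4B101036")           # +JSUB RDREC at 0006
relocate(image, [(7, 5)], 0x5000)                 # record M^000007^05
print(image[6:10].hex().upper())                  # 4B106036
```

The field starts at address 000007 because the first byte of the instruction (at 0006) holds the opcode and the n and i bits, which must not change.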

Machine Independent Assembler Features.


These are features that are not closely related to machine structure.
The presence or absence of these features is related more to issues such as
the programmer’s convenience and the software environment than to machine
structure.
0000 COPY START 0
0000 FIRST STL RETADR 17202D
0003 LDB #LENGTH 69202D
BASE LENGTH
0006 CLOOP +JSUB RDREC 4B101036
000A LDA LENGTH 032026
000D COMP #0 290000
0010 JEQ ENDFIL 332007
0013 +JSUB WRREC 4B10105D
0017 J CLOOP 3F2FEC
001A ENDFIL LDA =C’EOF’ 032010
001D STA BUFFER 0F2016
0020 LDA #3 010003
0023 STA LENGTH 0F200D
0026 +JSUB WRREC 4B10105D
002A J @RETADR 3E2003
LTORG
002D * =C’EOF’ 454F46
0030 RETADR RESW 1
0033 LENGTH RESW 1
0036 BUFFER RESB 4096
1036 BUFEND EQU *
1000 MAXLEN EQU BUFEND-BUFFER
Subroutine to read record into Buffer
1036 RDREC CLEAR X B410
1038 CLEAR A B400
103A CLEAR S B440
103C +LDT #4096 75101000
1040 RLOOP TD INPUT E32019
1043 JEQ RLOOP 332FFA
1046 RD INPUT DB2013
1049 COMPR A,S A004
104B JEQ EXIT 332008
104E STCH BUFFER,X 57C003
1051 TIXR T B850
1053 JLT RLOOP 3B2FEA
1056 EXIT STX LENGTH 134000
1059 RSUB 4F0000
105C INPUT BYTE X’F1’ F1
Subroutine to write record from Buffer
105D WRREC CLEAR X B410
105F LDT LENGTH 774000
1062 WLOOP TD =X’05’ E32011
1065 JEQ WLOOP 332FFA
1068 LDCH BUFFER,X 53C003
106B WD =X’05’ DF2008
106E TIXR T B850
1070 JLT WLOOP 3B2FEF
1073 RSUB 4F0000
END FIRST
1076 * =X’05’ 05

1. Literals

Writing the value of a constant operand as part of the instruction that
uses it avoids having to define the constant elsewhere in the program.
Such an operand is called a literal because the value is stated literally in
the instruction.
For example, the literal in the statement
001A ENDFIL LDA =C’EOF’ 032010
specifies a 3-byte operand whose value is the character string EOF. Similarly,
1062 WLOOP TD =X’05’ E32011
specifies a 1-byte literal with the hexadecimal value 05.
There is a difference between a literal and immediate addressing. With
immediate addressing the operand value is assembled as part of the
machine instruction. With a literal the assembler generates the specified
value as a constant at some other memory location. The address of this
generated constant is used as the target address for the machine
instruction.
All the literal operands used in a program are gathered together into one or
more literal pools. Normally the literal pool is placed at the end of the
program; the listing shows the assigned
addresses and the generated data values.
In some cases it is desirable to place literals at some other location in the
object program. This is done with the directive LTORG. When the
assembler encounters LTORG it creates a literal pool that contains all
the literal operands used since the previous LTORG or the beginning of
the program.
The assembler handles literals using a literal table, LITTAB. For each
literal used the table contains the literal name, the
operand value and length, and the address assigned to the operand when it is
placed in the literal pool.

During pass 1 the assembler gathers all the literals and puts them in the
LITTAB. During pass 2 the data values specified by the literals in each
literal pool are inserted at the appropriate places in the object program.
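The handling of LITTAB can be sketched as follows. The class and function names are hypothetical, and only the two literal forms used in these notes (=C'...' and =X'...') are assumed. Literals are recorded as they are met, and addresses are assigned only when a pool is placed, at LTORG or at the end of the program.

```python
def literal_bytes(literal):
    """Convert =C'...' or =X'...' to the hexadecimal string of its value."""
    kind, text = literal[1], literal[3:-1]
    if kind == "C":
        return text.encode("ascii").hex().upper()
    if kind == "X":
        return text.upper()
    raise ValueError(f"unsupported literal: {literal}")


class LitTab:
    def __init__(self):
        self.entries = {}  # name -> {"value": hex string, "address": int or None}

    def add(self, literal):
        """Record a literal; a literal used several times is stored once."""
        self.entries.setdefault(
            literal, {"value": literal_bytes(literal), "address": None})

    def place_pool(self, locctr):
        """Assign addresses to unplaced literals; return the new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"]) // 2
        return locctr


littab = LitTab()
littab.add("=C'EOF'")
after_ltorg = littab.place_pool(0x002D)   # pool created by LTORG
littab.add("=X'05'")
littab.add("=X'05'")                      # second use, same entry
after_end = littab.place_pool(0x1076)     # pool at the end of the program
```

With these inputs =C'EOF' is placed at 002D and =X'05' at 1076, matching the addresses in the listing above.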

2. Symbol Defining Statements


Most Assemblers provide the EQU (equate) directive that allows the
programmer to define symbols and specify their values. The general
form for that statement is
Symbol EQU Value
It defines the given symbol (enters it into the SYMTAB) and assigns it
the specified value. The value may be a constant or an expression
involving constants and previously defined symbols.
Example:
For a statement like +LDT #4096 we could include a statement
MAXLEN EQU 4096, then in the program we write
+LDT #MAXLEN
EQU is also used in defining mnemonic names for registers A, X, L etc.
e.g. for machines which have only standard general purpose registers
the Base and index registers may be defined as
BASE EQU R1
INDEX EQU R2
Another common directive that can be used to assign values to symbols
is ORG (origin). Its form is
ORG VALUE
where VALUE is a constant or an expression involving constants and
previously defined symbols.
When the assembler encounters this statement it resets the LOCCTR to
the specified value.

3. Expressions
Most assemblers allow the use of expressions wherever a single
operand is permitted. Each expression must be evaluated by the
assembler to produce a single operand address or value. Individual
terms in the expression may be constants, user-defined symbols or
special terms; e.g. the most common special term is the current value of
the location counter (designated by *). It represents the value of the next
unassigned memory location.
The statement BUFEND EQU * in the previous program gives
BUFEND the value that is the address of the next byte after the buffer
area.
Expressions are classified as either absolute expressions or relative
expressions depending upon the value they produce. An expression that
contains only absolute terms (independent of the program location like
constants) is an absolute expression. It may also contain relative terms
so long as the relative terms occur in pairs and the terms in each pair
have opposite signs.
A relative expression is one in which all the relative terms except one
can be paired. The remaining unpaired relative term must have a
positive sign. (None of the relative terms may enter into a multiplication or
division operation.) In the statement
MAXLEN EQU BUFEND – BUFFER
both BUFEND and BUFFER are relative terms but the expression
represents an absolute value: the difference between the two addresses.
Expressions such as BUFEND + BUFFER, 100 – BUFFER or 3 *
BUFFER represent neither absolute values nor locations within the
program.
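The pairing rule can be checked mechanically. The sketch below is a simplification under stated assumptions: an expression is given as a list of (sign, kind) terms joined only by + and -, multiplication and division are not modelled, and the name classify is hypothetical. Each pair of relative terms with opposite signs cancels, so it is enough to add up the signs of the relative terms.

```python
def classify(terms):
    """terms: list of (sign, kind), sign is +1 or -1, kind is 'abs' or 'rel'."""
    net = sum(sign for sign, kind in terms if kind == "rel")
    if net == 0:
        return "absolute"   # every relative term is paired off
    if net == 1:
        return "relative"   # exactly one unpaired, positive relative term
    return "invalid"


print(classify([(+1, "rel"), (-1, "rel")]))   # BUFEND - BUFFER
print(classify([(+1, "rel"), (+1, "rel")]))   # BUFEND + BUFFER
print(classify([(+1, "abs"), (-1, "rel")]))   # 100 - BUFFER
```

The first expression is absolute and the other two are invalid, in agreement with the examples in the text.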

4. Program Blocks

Program blocks are segments of code that are rearranged within a single object
program unit.
In the example below three program blocks are used. The first,
unnamed (default) block contains the executable instructions. The second block,
CDATA, contains data areas that are a few words in length. The third block,
CBLKS, contains data areas that consist of larger blocks of memory.
The assembler directive USE indicates which portions of the program
belong to the various blocks.
0000 0 COPY START 0
0000 0 FIRST STL RETADR 172063
0003 0 CLOOP JSUB RDREC 4B2021
0006 0 LDA LENGTH 032060
0009 0 COMP #0 290000
000C 0 JEQ ENDFIL 332006
000F 0 JSUB WRREC 4B203B
0012 0 J CLOOP 3F2FEE
0015 0 ENDFIL LDA =C’EOF’ 032055
0018 0 STA BUFFER 0F2056
001B 0 LDA #3 010003
001E 0 STA LENGTH 0F2048
0021 0 JSUB WRREC 4B2029
0024 0 J @RETADR 3E203F
0000 1 USE CDATA
0000 1 RETADR RESW 1
0003 1 LENGTH RESW 1
0000 2 USE CBLKS
0000 2 BUFFER RESB 4096
1000 2 BUFEND EQU *
1000 MAXLEN EQU BUFEND-BUFFER
Subroutine to read record into Buffer
0027 0 USE
0027 0 RDREC CLEAR X B410
0029 0 CLEAR A B400
002B 0 CLEAR S B440
002D 0 +LDT #4096 75101000
0031 0 RLOOP TD INPUT E32038
0034 0 JEQ RLOOP 332FFA
0037 0 RD INPUT DB2032
003A 0 COMPR A,S A004
003C 0 JEQ EXIT 332008
003F 0 STCH BUFFER,X 57A02F
0042 0 TIXR T B850
0044 0 JLT RLOOP 3B2FEA
0047 0 EXIT STX LENGTH 13201F
004A 0 RSUB 4F0000
0006 1 USE CDATA
0006 1 INPUT BYTE X’F1’ F1
Subroutine to write record from Buffer
004D 0 USE
004D 0 WRREC CLEAR X B410
004F 0 LDT LENGTH 772017
0052 0 WLOOP TD =X’05’ E3201B
0055 0 JEQ WLOOP 332FFA
0058 0 LDCH BUFFER,X 53A016
005B 0 WD =X’05’ DF2012
005E 0 TIXR T B850
0060 0 JLT WLOOP 3B2FEF
0063 0 RSUB 4F0000

0007 1 USE CDATA
LTORG
0007 1 * =C’EOF’ 454F46
000A 1 * =X’05’ 05
END FIRST

The assembler will rearrange these segments to gather together the
pieces of each block. These blocks are then assigned addresses in the
object program, with the blocks appearing in the same order in which
they were first begun in the source program.
During pass 1 a separate location counter is maintained for each block.
It is initialized to 0 when the block is first begun.
At the end of pass 1 the latest value of the location counter for each
block indicates the length of that block.
MAXLEN is shown without a block number because it is an absolute
value whose value is not relative to the start of any block.
At the end of pass 1 the assembler constructs a working table that
contains the starting addresses and lengths of all blocks
Block Name Block Number Address Length
Default 0 0000 0066
CDATA 1 0066 000B
CBLKS 2 0071 1000

For the instruction 0006 LDA LENGTH

the value of the operand has relative address 0003 within the CDATA
block. The starting address for CDATA is 0066. The desired target
address for this instruction is therefore 0003 + 0066= 0069. The
instruction is to be assembled using program counter relative
addressing.
The address of the next instruction is 0009 within the default block. The
required displacement therefore is 0069 – 0009 = 60. Similar
calculations are performed during pass 2.
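The block table and the displacement for LDA LENGTH can be reproduced with a short sketch. The function name block_table is hypothetical; the block lengths are the ones from the table above, and the only assumption is that blocks are laid out one after another in the order they were first begun.

```python
def block_table(lengths):
    """lengths: block name -> length, in order of first use.
    Returns block name -> starting address."""
    starts, address = {}, 0
    for name, length in lengths.items():
        starts[name] = address
        address += length
    return starts


starts = block_table({"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000})

# LDA LENGTH at relative address 0006 in the default block:
target = starts["CDATA"] + 0x0003                 # LENGTH is at 0003 in CDATA
next_instruction = starts["(default)"] + 0x0009   # value of the PC
disp = target - next_instruction
print(f"{target:04X} {disp:02X}")                 # 0069 60
```

The same two additions are needed for every symbol: the relative address within its block plus the starting address of that block.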

Because the large buffer area is moved to the end of the object program
there is no longer any need to use extended format for the JSUB
instructions. The base register is also no longer necessary.

5. Control Sections and Program Linking


A control section is a part of the program that maintains its identity after
assembly; each control section can be loaded and relocated independently of
the others. Different control sections are most often used for
subroutines or other logical subdivisions of a program.
0000 COPY START 0
EXTDEF BUFFER,BUFEND, LENGTH
EXTREF RDREC,WRREC
0000 FIRST STL RETADR 172027
0003 CLOOP +JSUB RDREC 4B100000
0007 LDA LENGTH 032023
000A COMP #0 290000
000D JEQ ENDFIL 332007
0010 +JSUB WRREC 4B100000
0014 J CLOOP 3F2FEC
0017 ENDFIL LDA =C’EOF’ 032016
001A STA BUFFER 0F2016
001D LDA #3 010003
0020 STA LENGTH 0F200A
0023 +JSUB WRREC 4B100000
0027 J @RETADR 3E2000
002A RETADR RESW 1
002D LENGTH RESW 1
LTORG
0030 * =C’EOF’ 454F46
0033 BUFFER RESB 4096
1033 BUFEND EQU *
1000 MAXLEN EQU BUFEND-BUFFER

0000 RDREC CSECT


Subroutine to read record into Buffer
EXTREF BUFFER,LENGTH,BUFEND
0000 CLEAR X B410
0002 CLEAR A B400
0004 CLEAR S B440
0006 LDT MAXLEN 77201F
0009 RLOOP TD INPUT E3201B
000C JEQ RLOOP 332FFA
000F RD INPUT DB2015
0012 COMPR A,S A004
0014 JEQ EXIT 332009
0017 +STCH BUFFER,X 57900000
001B TIXR T B850
001D JLT RLOOP 3B2FE9
0020 EXIT +STX LENGTH 13100000
0024 RSUB 4F0000
0027 INPUT BYTE X’F1’ F1
0028 MAXLEN WORD BUFEND-BUFFER 000000

0000 WRREC CSECT


Subroutine to write record from Buffer
EXTREF LENGTH,BUFFER
0000 CLEAR X B410
0002 +LDT LENGTH 77100000
0006 WLOOP TD =X’05’ E32012
0009 JEQ WLOOP 332FFA
000C +LDCH BUFFER,X 53900000
0010 WD =X’05’ DF2008
0013 TIXR T B850
0015 JLT WLOOP 3B2FEE
0018 RSUB 4F0000
END FIRST
001B * =X’05’ 05
When control sections form logically related parts of a program it is
necessary to provide some means for linking them together, e.g.
instructions in one section might refer to instructions or data in another
section. Because control sections are independently loaded and
relocated, the assembler cannot process these references in the
usual way: it does not know where any other
control section will be loaded at execution time.
Such references between control sections are called external references.
Control sections differ from program blocks in that they are handled
separately by the assembler. (It is not even necessary for all control
sections in a program to be assembled at the same time.) Symbols
defined in one control section may not be used directly by another
control section; they must be identified as external references for the
loader to handle.
The program above has been written using three control sections, the
main program and a section for each subroutine. The directive CSECT
signals the start of a new control section.
The EXTDEF statement names symbols called external symbols that are
defined in this control section but may also be used by other sections.
EXTREF names symbols used in this section but are defined elsewhere.
0003 CLOOP +JSUB RDREC 4B100000
The operand RDREC is an external reference. The assembler has no idea where
the section that contains RDREC will be loaded, so it cannot assemble the
address for this instruction. Instead the assembler inserts an address of zero
and passes information to the loader which will cause the proper
address to be inserted at load time. Relative addressing is not possible,
so extended format must be used to provide room for the actual
address to be inserted. This is true for all instructions involving external
references.
The instruction 0017 +STCH BUFFER,X 57900000 is also
assembled using extended format, with the x bit set to 1 to indicate
indexed addressing.
In the statement 0028 MAXLEN WORD BUFEND-BUFFER 000000 both
BUFEND and BUFFER are external references, so the word is assembled as
zeros.
For 1000 MAXLEN EQU BUFEND-BUFFER in the main control section, BUFEND and BUFFER are
defined in the same control section, so the value of the expression can
be calculated immediately by the assembler.
Since the assembler leaves room for inserting values for external
symbols it must also include information in the object program that will
cause the loader to insert the proper values where they are required.
Two new record types are defined, they are DEFINE and REFER.
A DEFINE record gives information about EXTDEF and a REFER record
lists the EXTREFs.
The Define record:
Col 1 D
Col 2-7 Name of external symbol defined in this control section
Col 8-13 Relative address of symbol within this section
Col 14-73 Repeat information in col 2-13 for other external symbols.

The Refer record:
Col 1 R
Col 2-7 Name of external symbol referred to in this control section
Col 8-73 Names of other external reference symbols.

The other needed information is added to the modification record whose
format is revised as follows:
Col 1 M
Col 2-7 Starting address of the field to be modified, relative to the beginning of the
control section
Col 8-9 Length of the field to be modified in half bytes.
Col 10 Modification flag (+ or -)
Col 11-16 External symbol whose value is to be added to or subtracted from the indicated field.

The figure below shows the object program corresponding to the source
code in the previous program. Note that there is a separate set of object
program records from Header through End for each section.
The modification record M^000004^05^+RDREC implies that the address of
RDREC is to be added onto this field in order to produce the correct
machine instruction for execution.
For the instruction at address 0028 both BUFEND and BUFFER are in a
different control section. The assembler generates an initial value of zero
for this word. The last two modification records in RDREC direct that the
address of BUFEND be added to this field and the address of BUFFER
be subtracted from it.
For an expression to be evaluated by the assembler, all of its relative
terms must lie in the same control section; if the terms are in different
sections, their difference depends on where the sections are loaded and
is unknown at assembly time.
H^COPY ^000000^001033
D^BUFFER^000033^BUFEND^001033^LENGTH^00002D
R^RDREC ^WRREC
T^000000^1D^172027^4B100000^032023^290000^332007^4B100000^3F2FEC^032016^0F2016
T^00001D^0D^010003^0F200A^4B100000^3E2000
T^000030^03^454F46
M^000004^05^+RDREC
M^000011^05^+WRREC
M^000024^05^+WRREC
E^000000

H^RDREC ^000000^00002B
R^BUFFER^LENGTH^BUFEND
T^000000^1D^B410^B400^B440^77201F^E3201B^332FFA^DB2015^A004^332009^57900000^B850
T^00001D^0E^3B2FE9^13100000^4F0000^F1^000000
M^000018^05^+BUFFER
M^000021^05^+LENGTH
M^000028^06^+BUFEND
M^000028^06^-BUFFER
E

H^WRREC ^000000^00001C
R^LENGTH^BUFFER
T^000000^1C^B410^77100000^E32012^332FFA^53900000^DF2008^B850^3B2FEE^4F0000^05
M^000003^05^+LENGTH
M^00000D^05^+BUFFER
E

ASSEMBLER DESIGN OPTIONS


Two Pass Assembler with Overlay Structure
Most assemblers divide the processing of the source program into 2
passes. The internal tables and subroutines that are used only during
pass 1 are not needed after the pass is completed; others like the
SYMTAB are needed for both passes.
Since pass 1 and pass 2 are not needed at the same time, they can
occupy the same locations in memory during execution of the
assembler.

                  Driver
        Shared Tables and Routines
           /                  \
  Pass 1 Tables          Pass 2 Tables
  and Routines           and Routines

Three program segments exist. The root segment contains a simple


driver program whose function is to call in turn the other two segments. It
also contains the tables and routines needed by both passes.
Since Pass 1 and Pass 2 segments are never needed at the same time
they can occupy the same locations in memory during execution. Initially
the root and pass 1 segments are loaded into memory and the
assembler makes the first pass. At the end of the first pass, the pass 2
segment is loaded in memory replacing the pass 1 segment.
In this way the assembler needs less memory than it would if both
passes were resident at the same time.

One Pass assemblers


The main problem in trying to assemble a program in one pass is


forward references because instruction operands are often symbols that
are not yet defined.
There are two types of one pass assemblers; one type produces object
code directly in memory for immediate execution while the other
produces the usual kind of object program for later execution.
The program below illustrates both types. It is similar to the one on page
41 but has data item definitions placed ahead of the code that
references them. In the first type of assembler no object program is
written out and no loader is needed ( load and go). It avoids the overhead
of writing the object program out and reading it back in.

Object code in memory and symbol table entries for the program below after
scanning the instruction at address 2021. A dash marks a half byte not yet
filled in, x marks reserved storage, and * marks a symbol that is still
undefined, followed by the operand addresses waiting for its value.

Memory
Address   Contents
1000      454F4600 00030000 00xxxxxx xxxxxxxx
1010      xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
...
2000      xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14
2010      100948-- --00100C 28100630 ----48--
2020      --3C2012

Symbol    Value   Forward references
LENGTH    100C
RDREC     *       2013
THREE     1003
ZERO      1006
WRREC     *       201F
EOF       1000
ENDFIL    *       201C
RETADR    1009
BUFFER    100F
CLOOP     2012
FIRST     200F

Object code in memory and symbol table entries for the program below after
scanning the instruction at address 2052.

Memory
Address   Contents
1000      454F4600 00030000 00xxxxxx xxxxxxxx
1010      xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
...
2000      xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14
2010      10094820 3D00100C 28100630 202448--
2020      --3C2012 0010000C 100F0010 030C100C
2030      48----08 10094C00 00F10010 00041006
2040      001006E0 20393020 43D82039 28100630
2050      ----5490 0F

Symbol    Value   Forward references
LENGTH    100C
RDREC     203D
THREE     1003
ZERO      1006
WRREC     *       201F, 2031
EOF       1000
ENDFIL    2024
RETADR    1009
BUFFER    100F
CLOOP     2012
FIRST     200F
MAXLEN    203A
INPUT     2039
EXIT      *       2050
RLOOP     2043

The assembler generates object code instructions as it scans the source


program. If the operand is a symbol that has not yet been defined, the
operand address is omitted when the instruction is assembled. The
symbol used as an operand is entered in the symbol table. This entry is
flagged as undefined. When the definition for the symbol is encountered
the symbol table is scanned and the proper address is inserted into any
instructions previously generated.
Any SYMTAB entries that are still marked with * at the end of the
program indicate undefined symbols and they should be flagged by the
assembler as errors.
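
The forward-reference handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries symtab and pending, the helper names emit and define, and the use of a bytearray as memory are all hypothetical and are not taken from any particular assembler.

```python
# Sketch of forward-reference handling in a load-and-go one-pass assembler.
# All names and data structures here are illustrative assumptions.

memory = bytearray(0x10000)   # simulated memory, indexed by address
symtab = {}                   # symbol name -> address, for defined symbols
pending = {}                  # symbol name -> operand-field addresses awaiting it


def emit(loc, opcode, operand):
    """Assemble a 3-byte SIC instruction at loc: opcode byte + address field."""
    memory[loc] = opcode
    if operand in symtab:
        address = symtab[operand]
    else:
        # Forward reference: leave the field empty and remember where it is.
        pending.setdefault(operand, []).append(loc + 1)
        address = 0
    memory[loc + 1:loc + 3] = address.to_bytes(2, "big")


def define(name, address):
    """Record a definition and patch every instruction that was waiting for it."""
    symtab[name] = address
    for field in pending.pop(name, []):
        memory[field:field + 2] = address.to_bytes(2, "big")


define("CLOOP", 0x2012)
emit(0x201B, 0x30, "ENDFIL")   # JEQ ENDFIL, ENDFIL not yet defined
emit(0x2021, 0x3C, "CLOOP")    # J CLOOP, CLOOP already defined
define("ENDFIL", 0x2024)       # patches the operand field at 201C

undefined = sorted(pending)    # anything left here is an undefined-symbol error
```

After the last call, the three bytes at 201B hold 302024, which matches the object code for JEQ ENDFIL in the listing below.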
1000 COPY START 1000
1000 EOF BYTE C’EOF’ 454F46
1003 THREE WORD 3 000003
1006 ZERO WORD 0 000000
1009 RETADR RESW 1
100C LENGTH RESW 1
100F BUFFER RESB 4096

200F FIRST STL RETADR 141009


2012 CLOOP JSUB RDREC 48203D
2015 LDA LENGTH 00100C
2018 COMP ZERO 281006
201B JEQ ENDFIL 302024
201E JSUB WRREC 482062
2021 J CLOOP 3C2012
2024 ENDFIL LDA EOF 001000
2027 STA BUFFER 0C100F
202A LDA THREE 001003
202D STA LENGTH 0C100C
2030 JSUB WRREC 482062
2033 LDL RETADR 081009
2036 RSUB 4C0000
Subroutine to read record into Buffer
2039 INPUT BYTE X’F1’ F1
203A MAXLEN WORD 4096 001000

203D RDREC LDX ZERO 041006


2040 LDA ZERO 001006
2043 RLOOP TD INPUT E02039
2046 JEQ RLOOP 302043
2049 RD INPUT D82039
204C COMP ZERO 281006
204F JEQ EXIT 30205B
2052 STCH BUFFER,X 54900F
2055 TIX MAXLEN 2C203A
2058 JLT RLOOP 382043
205B EXIT STX LENGTH 10100C
205E RSUB 4C0000

Subroutine to write record from Buffer


2061 OUTPUT BYTE X’05’ 05

2062 WRREC LDX ZERO 041006


2065 WLOOP TD OUTPUT E02061
2068 JEQ WLOOP 302065
206B LDCH BUFFER,X 50900F
206E WD OUTPUT DC2061
2071 TIX LENGTH 2C100C
2074 JLT WLOOP 382065
2077 RSUB 4C0000
END FIRST
The second type of a one pass assembler that produces object code as
output is needed on systems where external working storage devices
are not available.
Forward references are entered into lists as before but when the
definition of the symbol is encountered another text record with the
correct operand address is generated. When the program is loaded, this
address will be inserted into the instruction by the action of the loader.
The second text record contains the object code generated from 200F
through 2021. The operand addresses for the instructions at addresses
2012, 201B and 201E have been generated as 0000. When
the definition of ENDFIL at address 2024 is encountered the assembler
generates a third text record. This record specifies that the value 2024 is
to be loaded at location 201C. When this program is loaded the value
2024 will replace 0000 which was previously loaded.

H^COPY ^001000^00107A
T^001000^09^454F46^000003^000000
T^00200F^15^141009^480000^00100C^281006^300000^480000^3C2012
T^00201C^02^2024
T^002024^19^001000^0C100F^001003^0C100C^480000^081009^4C0000^F1^001000
T^002013^02^203D
T^00203D^1E^041006^001006^E02039^302043^D82039^281006^300000^54900F^2C203A^382043
T^002050^02^205B
T^00205B^07^10100C^4C0000^05
T^00201F^02^2062
T^002031^02^2062
T^002062^18^041006^E02061^302065^50900F^DC2061^2C100C^382065^4C0000
E^00200F

LOADERS AND LINKERS


An object program contains translated instructions and data values from
the source program and specifies addresses in memory where these
items are to be loaded.
Loading: brings the object program into memory for execution.
Relocation: modifies the object program so that it can be loaded at an
address different from the location originally specified.
Linking: Combines two or more separate object programs and supplies
the information needed to allow references between them.

A loader is a system program that performs the loading function. Many


loaders also support relocation and linking. Some machines have a
linker to perform the linking operation and a separate loader to handle
relocation and loading but in most cases one system loader or linker can
be used regardless of the original source programming language.

Basic Loader Functions.


The basic function of a loader is to bring an object program into memory and to start executing it.
For loading a simple absolute object module like the one for the simple
SIC machine on page 41 there is no linking and program relocation. All
functions are accomplished in one pass.
The header record is checked to verify that the correct program has
been presented for loading and that it will fit into the available memory.
As each text record is read the object code it contains is moved to the
indicated address in memory. When the END record is encountered the
loader jumps to the specified address to begin execution of the object
program.
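
The one-pass absolute loader described above can be sketched as follows. This is a minimal illustration, assuming records whose fields are separated by ^ as in the object programs in these notes; the function name absolute_load and the three-record sample program are hypothetical.

```python
def absolute_load(records, memory):
    """Load an absolute object program; return the address where execution starts."""
    start = None
    for record in records:
        fields = record.split("^")
        kind = fields[0]
        if kind == "H":
            # Header: program name, starting address, length (hexadecimal).
            begin, length = int(fields[2], 16), int(fields[3], 16)
            if begin + length > len(memory):
                raise MemoryError("program does not fit in available memory")
        elif kind == "T":
            # Text: starting address, byte count, then the object code itself.
            address, count = int(fields[1], 16), int(fields[2], 16)
            code = bytes.fromhex("".join(fields[3:]))
            if len(code) != count:
                raise ValueError("text record length mismatch")
            memory[address:address + count] = code
        elif kind == "E":
            # End: address of the first instruction to execute.
            start = int(fields[1], 16)
    return start


memory = bytearray(0x8000)
program = [
    "H^COPY  ^001000^00107A",
    "T^001000^09^454F46^000003^000000",
    "E^001000",
]
entry = absolute_load(program, memory)
```

A real loader would jump to the returned address; here it is simply returned so that the result can be inspected.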

Machine Dependent Loader Features


The absolute loader is simple but it has some disadvantages. For example,
we often do not know in advance where a program will be loaded, so there
is a need to write relocatable programs. Similarly, subroutine libraries
cannot be used efficiently if their routines are pre-assigned absolute
addresses.

1. Relocation
Loaders that allow program relocation are called relocating or relative
loaders. There are two methods for specifying relocation as part of the
object program.
The first method uses modification records which describe each part of
the object code that must be changed when the program is relocated.
Using the program on page 44, the only portions that must be relocated
are the address fields of the extended format instructions at addresses
0006, 0013 and 0026, which contain actual addresses.

H^COPY ^000000^001077
T^000000^1D^17202D^69202D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010
T^00001D^13^0F2016^010003^0F200D^4B10105D^3E2003^454F46
T^001036^1D^B410^B400^B440^75101000^E32019^332FFA^DB2013^A004^332008^57C003^B850
T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850
T^001070^07^3B2FEF^4F0000^05
M^000007^05^+COPY
M^000014^05^+COPY
M^000027^05^+COPY
E^000000

There is one modification record for each value that must be changed
during relocation. Each modification record specifies the starting address
and length of the field whose value is to be altered. It then specifies the
modification to be performed. Here all modifications add the value of the
symbol COPY which represents the starting address of the program.
This method is not well suited to a program that relies on direct (absolute)
addressing, because nearly every instruction would then need its own
modification record.
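
As a worked illustration of what one modification record asks the loader to do, the sketch below adds a load address to a field whose length is given in half bytes. The function name relocate_field and the load address 5000 are assumptions chosen for the example.

```python
def relocate_field(memory, address, half_bytes, delta):
    """Add delta to a field of `half_bytes` hex digits starting at `address`.

    An odd length such as 05 means the field begins in the low-order half
    of its first byte, so the high-order half byte must be preserved.
    """
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1
    value = int.from_bytes(memory[address:address + nbytes], "big")
    field = ((value & mask) + delta) & mask
    memory[address:address + nbytes] = ((value & ~mask) | field).to_bytes(nbytes, "big")


LOAD = 0x5000                       # assumed load address
memory = bytearray(0x8000)
# +JSUB RDREC from the object program above, placed at relative address 0006.
memory[LOAD + 0x0006:LOAD + 0x000A] = bytes.fromhex("4B101036")
# M^000007^05^+COPY: add the program's starting address to 5 half bytes at 0007.
relocate_field(memory, LOAD + 0x0007, 0x05, LOAD)
relocated = memory[LOAD + 0x0006:LOAD + 0x000A].hex().upper()
```

With these values the instruction 4B101036 becomes 4B106036: only the 20-bit address field changes, and the flag bits in the upper half of the second byte are left alone.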
A second method, suited to a machine that uses primarily direct
addressing and has a fixed instruction format like the SIC machine, is to
associate a relocation bit with each word of the object code.
The figure below illustrates this method. There are no modification
records. Since all SIC instructions occupy one word, there is one
relocation bit for each possible instruction. The relocation bits are
gathered together into a bit mask following the length indicator in each text
record.
0000 COPY START 0
0000 FIRST STL RETADR 140033
0003 CLOOP JSUB RDREC 481039
0006 LDA LENGTH 000036
0009 COMP ZERO 280030
000C JEQ ENDFIL 300015
000F JSUB WRREC 481061
0012 J CLOOP 3C0003
0015 ENDFIL LDA EOF 00002A
0018 STA BUFFER 0C0039
001B LDA THREE 00002D
001E STA LENGTH 0C0036
0021 JSUB WRREC 481061
0024 LDL RETADR 080033
0027 RSUB 4C0000
002A EOF BYTE C’EOF’ 454F46
002D THREE WORD 3 000003
0030 ZERO WORD 0 000000
0033 RETADR RESW 1
0036 LENGTH RESW 1
0039 BUFFER RESB 4096
Subroutine to read record into Buffer
1039 RDREC LDX ZERO 040030
103C LDA ZERO 000030
103F RLOOP TD INPUT E0105D
1042 JEQ RLOOP 30103F
1045 RD INPUT D8105D
1048 COMP ZERO 280030
104B JEQ EXIT 301057
104E STCH BUFFER,X 548039
1051 TIX MAXLEN 2C105E
1054 JLT RLOOP 38103F
1057 EXIT STX LENGTH 100036
105A RSUB 4C0000
105D INPUT BYTE X’F1’ F1
105E MAXLEN WORD 4096 001000
Subroutine to write record from Buffer
1061 WRREC LDX ZERO 040030
1064 WLOOP TD OUTPUT E01079
1067 JEQ WLOOP 301064
106A LDCH BUFFER,X 508039
106D WD OUTPUT DC1079
1070 TIX LENGTH 2C0036
1073 JLT WLOOP 381064
1076 RSUB 4C0000
1079 OUTPUT BYTE X’05’ 05
END FIRST

In the object code for the program above this mask is represented as
three hexadecimal digits (for example FFC in the first text record). If
the relocation bit corresponding to a word of object code is set to 1 the
program’s starting address is to be added to this word when the program
is relocated. A bit value of 0 indicates that no modification is
necessary. If a text record contains fewer than 12 words of object code,
the bits corresponding to the unused words are set to 0.
H^COPY ^000000^00107A
T^000000^1E^FFC^140033^481039^000036^280030^300015^481061^3C0003^00002A^0C0039^00002D
T^00001E^15^E00^0C0036^481061^080033^4C0000^454F46^000003^000000
T^001039^1E^FFC^040030^000030^E0105D^30103F^D8105D^280030^301057^548039^2C105E^38103F
T^001057^0A^800^100036^4C0000^F1^001000
T^001061^19^FE0^040030^E01079^301064^508039^DC1079^2C0036^381064^4C0000^05
E^000000
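
The use of the relocation bit mask can be sketched as below. The function name relocate_text_record and the load address 5000 are assumptions for illustration; the record itself is the second text record of the object program above.

```python
def relocate_text_record(record, load_address):
    """Return the relocated fields of a SIC text record that carries a bit mask.

    Assumed layout: T^start^length^mask^word^word...
    The leftmost mask bit belongs to the first word of the record.
    """
    fields = record.split("^")
    mask = int(fields[3], 16)
    relocated = []
    for i, word in enumerate(fields[4:]):
        value = int(word, 16)
        if mask & (0x800 >> i):
            # Relocation bit is 1: add the program's starting address.
            value = (value + load_address) & 0xFFFFFF
        relocated.append(f"{value:0{len(word)}X}")
    return relocated


words = relocate_text_record(
    "T^00001E^15^E00^0C0036^481061^080033^4C0000^454F46^000003^000000",
    0x5000,
)
```

The mask E00 is 1110 0000 0000 in binary, so only the first three words (the STA, JSUB and LDL instructions) are changed; RSUB and the three data words keep their values.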

Program Linking
Concepts of program linking were discussed under control sections.
The example below consists of three separately assembled programs,
each having a list of items (LISTA, LISTB and LISTC). The ends of the
lists are marked by ENDA, ENDB and ENDC. The labels on the beginnings and
ends of the lists are external symbols. Each program has the same set
of references to these external symbols.
In PROGA, REF1 is a reference to a label within the program, so it is
assembled using program-counter relative addressing. No modification is
necessary.
In PROGB, REF1 refers to an external symbol. The assembler uses an
extended format instruction with the address field set to 00000. There is
a modification record in the object program instructing the loader to add
the value of LISTA to this address field when the program is linked.
REF2 and REF3 are explained similarly.
In PROGA the assembler can evaluate all of the expression in REF4
except for the value of LISTC. This results in an initial value of 000014
and one modification record.

0000 PROGA START 0


EXTDEF LISTA,ENDA
EXTREF LISTB, ENDB,LISTC,ENDC
.

0020 REF1 LDA LISTA 03201D


0023 REF2 +LDT LISTB + 4 77100004
0027 REF3 LDX #ENDA - LISTA 050014

0040 LISTA EQU *

0054 ENDA EQU *


0054 REF4 WORD ENDA-LISTA+LISTC 000014
0057 REF5 WORD ENDC-LISTC-10 FFFFF6
005A REF6 WORD ENDC-LISTC+LISTA-1 00003F
005D REF7 WORD ENDA-LISTA-(ENDB-LISTB) 000014
0060 REF8 WORD LISTB-LISTA FFFFC0
END REF1

0000 PROGB START 0


EXTDEF LISTB,ENDB
EXTREF LISTA, ENDA,LISTC,ENDC
.

0036 REF1 +LDA LISTA 03100000


003A REF2 LDT LISTB + 4 772027
003D REF3 +LDX #ENDA - LISTA 05100000

0060 LISTB EQU *

0070 ENDB EQU *


0070 REF4 WORD ENDA-LISTA+LISTC 000000
0073 REF5 WORD ENDC-LISTC-10 FFFFF6
0076 REF6 WORD ENDC-LISTC+LISTA-1 FFFFFF
0079 REF7 WORD ENDA-LISTA-(ENDB-LISTB) FFFFF0
007C REF8 WORD LISTB-LISTA 000060
END

0000 PROGC START 0


EXTDEF LISTC,ENDC
EXTREF LISTA, ENDA,LISTB,ENDB
.

0018 REF1 +LDA LISTA 03100000


001C REF2 +LDT LISTB + 4 77100004
0020 REF3 +LDX #ENDA - LISTA 05100000

0030 LISTC EQU *

0042 ENDC EQU *


0042 REF4 WORD ENDA-LISTA+LISTC 000030
0045 REF5 WORD ENDC-LISTC-10 000008
0048 REF6 WORD ENDC-LISTC+LISTA-1 000011
004B REF7 WORD ENDA-LISTA-(ENDB-LISTB) 000000
004E REF8 WORD LISTB-LISTA 000000
END

H^PROGA ^000000^000063
D^LISTA ^000040^ENDA ^000054
R^LISTB ^ENDB ^LISTC ^ENDC
T^000020^0A^03201D^77100004^050014
T^000054^0F^000014^FFFFF6^00003F^000014^FFFFC0
M^000024^05^+LISTB
M^000054^06^+LISTC
M^000057^06^+ENDC
M^000057^06^-LISTC
M^00005A^06^+ENDC
M^00005A^06^-LISTC
M^00005A^06^+PROGA
M^00005D^06^-ENDB
M^00005D^06^+LISTB
M^000060^06^+LISTB
M^000060^06^-PROGA
E^000020

H^PROGB ^000000^00007F
D^LISTB ^000060^ENDB ^000070
R^LISTA ^ENDA ^LISTC ^ENDC
T^000036^0B^03100000^772027^05100000
T^000070^0F^000000^FFFFF6^FFFFFF^FFFFF0^000060
M^000037^05^+LISTA
M^00003E^05^+ENDA
M^00003E^05^-LISTA
M^000070^06^+ENDA
M^000070^06^-LISTA
M^000070^06^+LISTC
M^000073^06^+ENDC
M^000073^06^-LISTC
M^000076^06^+ENDC
M^000076^06^-LISTC
M^000076^06^+LISTA
M^000079^06^+ENDA
M^000079^06^-LISTA
M^00007C^06^+PROGB
M^00007C^06^-LISTA
E

H^PROGC ^000000^000051
D^LISTC ^000030^ENDC ^000042
R^LISTA ^ENDA ^LISTB ^ENDB
T^000018^0C^03100000^77100004^05100000
T^000042^0F^000030^000008^000011^000000^000000
M^000019^05^+LISTA
M^00001D^05^+LISTB
M^000021^05^+ENDA
M^000021^05^-LISTA
M^000042^06^+ENDA
M^000042^06^-LISTA
M^000042^06^+PROGC
M^000048^06^+LISTA
M^00004B^06^+ENDA
M^00004B^06^-LISTA
M^00004B^06^-ENDB
M^00004B^06^+LISTB
M^00004E^06^+LISTB
M^00004E^06^-LISTA
E
In PROGB the same expression (REF4) contains no terms that can be
evaluated by the assembler. The object code therefore contains an initial
value of 000000 and three modification records.
In PROGC the assembler can supply the value of LISTC only relative to the
beginning of the program, whose address is not known until the program is
loaded. The initial value for this data word is the relative address of
LISTC (000030). The modification records instruct the loader to add the
beginning address of the program (PROGC), to add the value of ENDA
and to subtract the value of LISTA.
Assume that PROGA has been loaded starting at address 4000, followed
immediately by PROGB and PROGC. REF4 through REF8 then have the same
values in all three programs after relocation and linking.
For example, REF4 in PROGA is located at address 4054. The initial value
was 000014. To this is added the address assigned to LISTC, which is 4112
(40E2 + 30). This results in the value 004126. The same value appears in
REF4 at relative address 0070 in PROGB (actual address 40D3) and at
relative address 0042 in PROGC (actual address 4124).
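
The arithmetic for REF4 can be checked directly. The short sketch below uses the load addresses assumed in the text (PROGA at 4000, followed by PROGB and PROGC) and applies the modification records of each program to its initial value; the variable names are chosen only for this example.

```python
# Load addresses and external symbol values from the example (hexadecimal).
PROGA = 0x4000
PROGB = PROGA + 0x63          # PROGA is 0063 bytes long
PROGC = PROGB + 0x7F          # PROGB is 007F bytes long
LISTA, ENDA = PROGA + 0x40, PROGA + 0x54
LISTC = PROGC + 0x30

# Initial value from the text record, then the modification records in turn.
ref4_in_proga = 0x000014 + LISTC                   # +LISTC
ref4_in_progb = 0x000000 + ENDA - LISTA + LISTC    # +ENDA, -LISTA, +LISTC
ref4_in_progc = 0x000030 + ENDA - LISTA + PROGC    # +ENDA, -LISTA, +PROGC

print(f"{ref4_in_proga:06X} {ref4_in_progb:06X} {ref4_in_progc:06X}")
```

All three expressions evaluate to 004126, even though each program starts from a different initial value and a different set of modification records.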

Tables and Logic for a Linking Loader


Modification records are used for relocation so that the linking and
relocating functions are performed using the same mechanism.
The input to a linking loader consists of a set of object programs, (i.e. the
control sections) that are to be linked together. Since it is possible for a
control section to make an external reference to a symbol whose
definition does not appear until later in the input, the required linking
cannot be performed until an address is assigned to the external symbol
involved.
A linking loader therefore makes two passes just like the assembler.
Pass 1 assigns addresses to all external symbols and pass 2 performs
the actual loading, relocation and linking.
The main data structure used is the External symbol table ESTAB which
is similar to the SYMTAB. It is used to store names and addresses of
each external symbol in the set of control sections being loaded. It also
indicates in which control section the symbol is defined.
The beginning address in memory where the linked program is loaded is
called PROGADDR. Its value is supplied to the loader by the operating
system.
CSADDR is the starting address assigned to the control section currently
being scanned by the loader. It is added to all relative addresses within
the control section to convert them to actual addresses.
During pass 1 the loader is concerned with only the Header and Define
record types. The beginning load address for the linked program
PROGADDR becomes the starting address CSADDR for the first control
section.
The control section name for the Header record is entered into ESTAB
with a value given by CSADDR. All the external symbols appearing in
the Define record are also entered into ESTAB. Their addresses are
obtained by adding the value specified in the Define record to CSADDR.
When the End record is read, the control section length CSLTH which
was saved from the Header record is added to CSADDR. This calculation
gives the starting address of the next control section.
At the end of pass 1 ESTAB contains all external symbols defined in the
control sections together with the address assigned to each.

Control    Symbol
Section    Name      Address   Length
PROGA                4000      0063
           LISTA     4040
           ENDA      4054
PROGB                4063      007F
           LISTB     40C3
           ENDB      40D3
PROGC                40E2      0051
           LISTC     4112
           ENDC      4124

A printout of ESTAB is called a load map.
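
Pass 1 as described above can be sketched as follows. This is an illustration only: the function name pass1 and the use of a Python dictionary for ESTAB are assumptions, and only the Header, Define and End records of the three example programs are supplied because pass 1 ignores the others.

```python
def pass1(control_sections, progaddr):
    """Build ESTAB from the Header and Define records of each control section."""
    estab = {}
    csaddr = progaddr                     # first section starts at PROGADDR
    for records in control_sections:
        cslth = 0
        for record in records:
            fields = [f.strip() for f in record.split("^")]
            if fields[0] == "H":
                name, cslth = fields[1], int(fields[3], 16)
                if name in estab:
                    raise ValueError(f"duplicate external symbol {name}")
                estab[name] = csaddr
            elif fields[0] == "D":
                # Define record: (symbol name, relative address) pairs.
                for name, rel in zip(fields[1::2], fields[2::2]):
                    if name in estab:
                        raise ValueError(f"duplicate external symbol {name}")
                    estab[name] = csaddr + int(rel, 16)
            elif fields[0] == "E":
                csaddr += cslth           # starting address of the next section
    return estab


sections = [
    ["H^PROGA ^000000^000063", "D^LISTA ^000040^ENDA  ^000054", "E^000020"],
    ["H^PROGB ^000000^00007F", "D^LISTB ^000060^ENDB  ^000070", "E"],
    ["H^PROGC ^000000^000051", "D^LISTC ^000030^ENDC  ^000042", "E"],
]
estab = pass1(sections, 0x4000)
```

With PROGADDR set to 4000 the resulting dictionary holds the same addresses as the ESTAB table above.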


Pass 2 performs the actual loading, relocation and linking of the
program. As each text record is read the object code is moved to the
specified address (plus the current value of CSADDR). When the
modification record is encountered the symbol whose value is to be used
for modification is looked up in ESTAB. This value is then added to or
subtracted from the indicated location in memory.
The last step performed by the loader is the transferring of control to the
loaded program to begin execution.
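
Pass 2 can be sketched in the same style. The function name pass2 is an assumption, ESTAB is written out by hand from the table above, and only the second text record of PROGA and the modification records that apply to it are shown.

```python
def pass2(records, csaddr, estab, memory):
    """Load one control section at csaddr and apply its modification records."""
    for record in records:
        fields = [f.strip() for f in record.split("^")]
        if fields[0] == "T":
            address = csaddr + int(fields[1], 16)
            code = bytes.fromhex("".join(fields[3:]))
            memory[address:address + len(code)] = code
        elif fields[0] == "M":
            address = csaddr + int(fields[1], 16)
            half_bytes = int(fields[2], 16)
            sign, symbol = fields[3][0], fields[3][1:]
            delta = estab[symbol] if sign == "+" else -estab[symbol]
            nbytes = (half_bytes + 1) // 2
            mask = (1 << (4 * half_bytes)) - 1
            value = int.from_bytes(memory[address:address + nbytes], "big")
            field = ((value & mask) + delta) & mask
            memory[address:address + nbytes] = (
                (value & ~mask) | field
            ).to_bytes(nbytes, "big")


estab = {
    "PROGA": 0x4000, "LISTA": 0x4040, "ENDA": 0x4054,
    "PROGB": 0x4063, "LISTB": 0x40C3, "ENDB": 0x40D3,
    "PROGC": 0x40E2, "LISTC": 0x4112, "ENDC": 0x4124,
}
proga = [
    "T^000054^0F^000014^FFFFF6^00003F^000014^FFFFC0",
    "M^000054^06^+LISTC",
    "M^000057^06^+ENDC", "M^000057^06^-LISTC",
    "M^00005A^06^+ENDC", "M^00005A^06^-LISTC", "M^00005A^06^+PROGA",
    "M^00005D^06^-ENDB", "M^00005D^06^+LISTB",
    "M^000060^06^+LISTB", "M^000060^06^-PROGA",
]
memory = bytearray(0x10000)
pass2(proga, 0x4000, estab, memory)
ref4_to_ref8 = memory[0x4054:0x4063].hex().upper()
```

After the call, REF4 through REF8 in PROGA hold 004126, 000008, 004051, 000004 and 000083. Negative initial values such as FFFFF6 work because the addition is done modulo the field width.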

MACHINE INDEPENDENT LOADER FEATURES


1. Automatic Library Search
This feature allows the programmer to use standard subroutines without
explicitly including them in the program to be loaded. The routines are
automatically retrieved from a library, linked with the main program and
they are loaded. The programmer just mentions the subroutine names
as external references in the source program.
Linking loaders that support automatic library search keep track of
external symbols that are referred to but not defined in the input to the
loader. This is done by entering the symbols from each Refer record into
ESTAB, marked as undefined until a definition is found. At the end of
pass 1, symbols in ESTAB that remain undefined represent unresolved
external references. The loader searches the libraries specified for
routines that contain definitions of these symbols and processes the
subroutines found by this search.
Subroutines fetched from a library in this way may themselves contain
external references. The library search process must therefore be
repeated until all external references are resolved.
The libraries to be searched by the loader usually contain assembled or
compiled versions of the subroutines.
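
The repeated search can be sketched as a simple loop. Everything here is a simplifying assumption: the library is modelled as a dictionary from a routine's entry name to the set of external symbols that routine itself references, and the names SQRT and READX are invented for the example.

```python
def resolve_with_library(defined, referenced, library):
    """Repeat the library search until no further routines can be found.

    Returns the routines loaded (in order) and the symbols still unresolved.
    """
    defined, referenced = set(defined), set(referenced)
    loaded = []
    while True:
        unresolved = referenced - defined
        found = sorted(name for name in unresolved if name in library)
        if not found:
            return loaded, unresolved
        for name in found:
            loaded.append(name)
            defined.add(name)              # the routine defines its entry name
            referenced |= library[name]    # and may bring in new references


library = {"STDEV": {"SQRT"}, "SQRT": set(), "PLOT": set()}
loaded, unresolved = resolve_with_library({"MAIN"}, {"STDEV", "READX"}, library)
```

Here STDEV is fetched first, which introduces a reference to SQRT that is resolved on the next round. READX is not in the library, so it remains unresolved and would be reported as an error.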

2. Loader Options
Many loaders allow the user to specify options that modify the standard
processing of the program. Below are a few of them:
(i) An option that allows the selection of alternative sources
of input e.g. INCLUDE program_name (library_name) directs the loader
to read the designated object program from a library and treat it as
if it were part of the primary loader input.
(ii) An option to allow the user to delete external symbols or
entire control sections e.g. DELETE csect_name instructs the
loader to delete the named control section from the set of
programs being loaded.
(iii) An option to change external symbols e.g. CHANGE
name1, name2 causes the external symbol name1 to be changed to
name2 wherever it appears in the object programs.
(iv) An option to specify alternative libraries to be searched
e.g. LIBRARY MYLIB will cause library MYLIB to be searched
before the standard system libraries.

(v) Loaders that perform automatic library search may be
asked to leave certain external references unresolved
e.g. NOCALL STDEV, PLOT, CORREL will instruct the loader not to
load the named routines. This avoids the overhead of loading
and linking the unneeded routines and saves on memory space.
(vi) Options to control outputs from the loader. The user can
specify whether an output is needed or not.
(vii) Options to specify whether external references will be
resolved by library search.
(viii) An option to specify the location at which execution is to
begin thus overriding any information given in the object program.
(ix) An option to control whether or not the loader should
attempt to execute the program if errors are detected during the
load.

3. Overlay Programs
Overlay programs are designed so that parts of the program that are never
needed in memory at the same time can occupy the same memory space: one
part executes first, and another part is later loaded over it and executes
in the same locations.
1 A
+-- 2 B
|   +-- 3 F/G
|   +-- 4 H
+-- 5 C
+-- 6 D/E
    +-- 7 I
    +-- 8 J
    +-- 9 K

Control    Length          Control    Length
Section    (bytes, hex)    Section    (bytes, hex)
A          1000            G          400
B          1800            H          800
C          4000            I          1000
D          2800            J          800
E          800             K          2000
F          1000

In the example above the letters represent control sections and the lines
show possible transfers of control between them. Control section A (the root)
can call B, C, or D/E etc. D/E means that control sections D and E are
closely related and they are always used together. The nodes in the tree
are called segments. The root segment (A) is loaded when execution of
the program begins and it remains in memory until the program ends.
The other segments are loaded as they are called.
If H is being executed both B and A should be in memory since H was
called by B, and B was called by A. Thus the three sections A, B and H
must be active. The other segments cannot be active since there is no
path from them to H. If for example the segment containing K was called
previously it must have returned to D/E and then to A before B could be
called by A.
Because segments at the same level e.g. B, C and D/E can be called
only from the level above, they cannot be required at the same time;
thus they can be assigned to the same locations in memory. If a
segment is loaded it overlays any segments at the same level and their
subordinate segments that may be in memory. The entire program
therefore can be executed in a smaller total amount of memory. This is
the main reason for the use of overlay structures.
The structure of an overlay program is defined to the loader using the
following commands:
SEGMENT seg_name(control-section….) and
PARENT seg_name

SEGMENT seg_name(control-section….) defines a segment (i.e. a
node in the tree structure), gives it a name and lists the control sections
to be included in it. The first segment defined is the root. Two
consecutive SEGMENT statements specify a parent-child relationship
between the segments defined.
PARENT seg_name identifies the (already existing) segment
that is to be the parent of the next segment defined.
The statements below define the above overlay structure.
SEGMENT SEG1(A)
SEGMENT SEG2(B)
SEGMENT SEG3(F,G)
PARENT SEG2
SEGMENT SEG4(H)
PARENT SEG1
SEGMENT SEG5(C)
PARENT SEG1
SEGMENT SEG6(D,E)
SEGMENT SEG7(I)
PARENT SEG6
SEGMENT SEG8(J)
PARENT SEG6
SEGMENT SEG9(K)

Once the overlay structure has been defined the starting addresses for
the segments can be found because each segment begins immediately
after the end of its parent.
The figure below shows the length and the relative starting address of
each segment in our example. It assumes that the beginning load
address for the program is 8000.
Segment   Starting address     Length
          Relative   Actual
1         0000       8000      1000
2         1000       9000      1800
3         2800       A800      1400
4         2800       A800      800
5         1000       9000      4000
6         1000       9000      3000
7         4000       C000      1000
8         4000       C000      800
9         4000       C000      2000
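
The table above follows mechanically from the rule that each segment begins immediately after the end of its parent. The sketch below recomputes it; the dictionary layout and the function name layout are assumptions made for the example, while the lengths and the tree come from the notes.

```python
# Control section lengths in bytes (hexadecimal), from the example.
lengths = {"A": 0x1000, "B": 0x1800, "C": 0x4000, "D": 0x2800, "E": 0x800,
           "F": 0x1000, "G": 0x400, "H": 0x800, "I": 0x1000, "J": 0x800,
           "K": 0x2000}

# segment number -> (parent segment or None, control sections in the segment)
segments = {
    1: (None, ["A"]), 2: (1, ["B"]), 3: (2, ["F", "G"]), 4: (2, ["H"]),
    5: (1, ["C"]), 6: (1, ["D", "E"]), 7: (6, ["I"]), 8: (6, ["J"]),
    9: (6, ["K"]),
}


def layout(segments, lengths, load_address):
    """Return {segment: (relative start, actual start, length)}."""
    table = {}
    for seg in sorted(segments):       # parents are numbered before children here
        parent, sections = segments[seg]
        length = sum(lengths[s] for s in sections)
        if parent is None:
            rel = 0
        else:
            parent_rel, _, parent_len = table[parent]
            rel = parent_rel + parent_len   # begins right after its parent
        table[seg] = (rel, load_address + rel, length)
    return table


table = layout(segments, lengths, 0x8000)
peak = max(rel + length for rel, _, length in table.values())
total = sum(lengths.values())
```

With these figures the largest extent ever needed is 6000 (hex) bytes, reached when segment 9 is loaded, compared with EC00 bytes if every control section were resident at once.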
During the execution of the program many different segments may be in
memory together. Below are some possibilities.

The loader can assign an actual starting address to every segment in the
overlay program once the initial load address is supplied. Thus the
addresses of all external symbols are known and all relocation and
linking operations can be performed.

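[Figure: three sample memory layouts between addresses 8000 and E000.
Segment A is at 8000 in every layout; one layout shows A, B and H loaded
together, and another shows A with D and E.]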

The root segment can be loaded directly into memory; the other
segments with their linking information are loaded into a special working
file called SEGFILE that is created by the loader.
The actual loading of the segments during program execution is handled
by an overlay manager, OVLMGR. This is a special control section which is
automatically included in the root segment of the overlay program by the
loader. OVLMGR uses a segment table SEGTAB which has all the
information about the overlay program. SEGTAB also includes a special
transfer area for each segment except the root. If a segment is currently

loaded in memory the transfer area contains a jump instruction to the


entry point of that segment. If the segment is not currently loaded the
transfer area contains instructions that invoke OVLMGR and pass to it
information concerning the segment to be loaded.

LOADER DESIGN OPTIONS


1. Linkage Editors
A linking loader performs all linking and relocation operations including
automatic library search if specified and loads the linked program directly
into memory for execution. A linkage editor on the other hand produces a
linked version of the program (called a load module or an executable image)
which is written to a file or library for later execution. When the user is
ready to run the linked program a simple relocating loader can be used
to load the program in memory. The only object code modification
required is the addition of an actual load address to relative values within
the program.
If a program is to be executed many times without being reassembled
the use of linkage editors substantially reduces the overhead required.
Resolution of external references and library searching are only done
once (when the program is link edited). In contrast a linking loader
searches libraries and resolves external references every time the
program is executed.

2. Dynamic Linking
The linking function is performed at execution time. A subroutine is
loaded and linked to the rest of the program when it is first called. It
provides for the ability to load the routines only when (and if) they are
needed.
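
The idea of loading a routine on its first call can be imitated in a few lines. This is only an analogy in Python, not how an operating system implements dynamic linking: the table loaded, the stand-in load_routine and the routine SQUARE are all hypothetical.

```python
loaded = {}        # routine name -> callable, filled in on first call
load_count = 0     # how many times the (simulated) load has happened


def load_routine(name):
    """Stand-in for fetching a routine from a library and linking it."""
    global load_count
    load_count += 1
    library = {"SQUARE": lambda x: x * x}   # hypothetical library contents
    return library[name]


def call(name, *args):
    """Call a routine by name, loading and linking it only on first use."""
    if name not in loaded:
        loaded[name] = load_routine(name)
    return loaded[name](*args)


first = call("SQUARE", 7)
second = call("SQUARE", 9)
```

The second call finds SQUARE already in the table, so the load happens only once; a routine that is never called is never loaded at all.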

COMPILERS
A compiler bridges the semantic gap between a Programming Language
domain and an execution domain. Two aspects of the compiler are:
1. To generate code to implement meaning of a source program in
the execution domain and
2. To provide diagnostics for violations of the programming language
semantics in the source program.
For purposes of compiler construction a high level language is usually
described in terms of a grammar. The grammar specifies the form or
syntax of legal statements in the language.
For example an assignment statement might be defined by the grammar
as a variable name, followed by an assignment operator (:=) followed by
an expression. The problem of compilation becomes the matching of the
statements written by the programmer to structures defined by the
grammar, and generating the appropriate object code for each
statement.
The source program statements are regarded as sequences of tokens. Tokens
are the fundamental building blocks of the language. A token might be a
keyword, a variable name, an integer, an arithmetic operator etc. The task of
scanning the source statement, recognizing and classifying the various
tokens is known as lexical analysis. The part of the compiler that
performs this analytical function is called the scanner.
After the token scan, each statement in the program must be recognized
as some language construct, such as a declaration, or an assignment
statement, described by the grammar. This process which is called
syntactic analysis or parsing is performed by part of the compiler that
is called the parser. The last step in the basic translation process is the
generation of object code.
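
A scanner for a small Pascal-like language can be sketched with a regular expression. The token classes (keyword, id, int, op), the keyword list and the function name scan are assumptions chosen for this illustration.

```python
import re

KEYWORDS = {"PROGRAM", "VAR", "BEGIN", "END", "INTEGER", "FOR", "TO", "DO",
            "READ", "WRITE", "DIV"}

TOKEN_RE = re.compile(r"""
    (?P<int>\d+)
  | (?P<word>[A-Za-z][A-Za-z0-9]*)
  | (?P<op>:=|[-+*;:,().])
  | (?P<space>\s+)
""", re.VERBOSE)


def scan(text):
    """Split source text into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "space":
            continue                      # white space separates tokens
        if kind == "word":
            # A word is a keyword if it is reserved, otherwise an identifier.
            kind = "keyword" if lexeme.upper() in KEYWORDS else "id"
        tokens.append((kind, lexeme))
    return tokens


tokens = scan("SUM := SUM + VALUE;")
```

The alternative := is listed before the single character : so that the assignment operator is recognized as one token rather than two.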

GRAMMARS
A grammar for a programming language is a formal description of the
syntax or form of programs and individual statements written in the
language. The grammar does not describe the semantics or meaning of
the various statements.
A number of different notations are used to write grammars. One of the
simplest and most widely used notations is BNF (Backus–Naur Form).
A BNF grammar consists of a set of rules each of which defines the
syntax of some construct in the programming language. Below is a BNF
grammar of a restricted Pascal Language.
1. <prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END
2. <prog-name>::= id
3. <dec-list> ::= <dec> | <dec-list> ; <dec>
4. <dec> ::= <id-list> : <type>
5. <type> ::= INTEGER
6. <id-list> ::= id | <id-list> , id
7. <stmt-list>::= <stmt> | <stmt-list> ; <stmt>
8. <stmt> ::= <assign> | <read> | <write> | <for>
9. <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11. <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp>)
13. <read> ::= READ ( <id-list> )
14. <write> ::= WRITE ( <id-list> )
15. <for> ::= FOR <index-exp> DO <body>
16. <index-exp> ::= id := <exp> TO <exp>
17. <body> ::= <stmt> | BEGIN <stmt-list> END

Below is an example of a Pascal program that conforms to the above
grammar.

1.  PROGRAM STATS
2.  VAR
3.     SUM, SUMQ, I, VALUE, MEAN, VARIANCE : INTEGER
4.  BEGIN
5.     SUM := 0;
6.     SUMQ := 0;
7.     FOR I := 1 TO 100 DO
8.        BEGIN
9.           READ (VALUE);
10.          SUM := SUM + VALUE;
11.          SUMQ := SUMQ + VALUE * VALUE
12.       END;
13.    MEAN := SUM DIV 100;
14.    VARIANCE := SUMQ DIV 100 - MEAN * MEAN;
15.    WRITE (MEAN, VARIANCE)
16. END.

Consider rule 13 in the grammar, <read> ::= READ ( <id-list> )
The symbol ::= means “is defined to be.”
Character strings enclosed between the angle brackets < and > are
called nonterminal symbols (i.e. names of the constructs defined in
the grammar). Entries not enclosed in angle brackets are terminal
symbols of the grammar (i.e. tokens).
In this rule the nonterminal symbols are <read> and <id-list>, and the
terminal symbols are the tokens READ, (, and ). Thus the rule specifies
that a <read> consists of the token READ, followed by the token “(”,
followed by a language construct <id-list>, followed by the token “)”.
To recognize a <read> we of course need the definition of <id-list>,
which is provided by rule 6.
It is often convenient to display the analysis of a source statement in
terms of a grammar as a tree called the parse tree or syntax tree.
Below are parse trees for statement number 9, READ (VALUE), and
statement 14, VARIANCE := SUMQ DIV 100 - MEAN * MEAN. Each node is
written on its own line and its children are indented beneath it.

<read>
   READ
   (
   <id-list>
      id {VALUE}
   )

<assign>
   id {VARIANCE}
   :=
   <exp>
      <exp>
         <term>
            <term>
               <factor>
                  id {SUMQ}
            DIV
            <factor>
               int {100}
      -
      <term>
         <term>
            <factor>
               id {MEAN}
         *
         <factor>
            id {MEAN}

Lexical Analysis
This involves scanning the program to be compiled and recognizing the
tokens that make up the source statements. Scanners are usually
designed to recognize keywords, operators, identifiers, integers, floating
point numbers, character strings and other similar items that are written
as part of the source program.
Items such as identifiers and integers are usually recognized directly as
single tokens. Alternatively, they could be defined as part of the
grammar, e.g.
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | D | ... | Z
<digit> ::= 0 | 1 | 2 | 3 | ... | 9

In such a case the scanner would recognize as tokens the single
characters A, B, 0, 1 etc., and the parser would have to assemble them
into an <ident>. In practice the scanner generally recognizes both
single-character and multiple-character tokens directly, which is more
efficient.
The output of a scanner consists of a sequence of tokens. Each token is
usually represented by some fixed length code such as an integer.
Below is the token coding scheme for the grammar considered:
Token     Code     Token   Code     Token   Code
PROGRAM   1        WRITE   9        -       17
VAR       2        TO      10       *       18
BEGIN     3        DO      11       DIV     19
END       4        ;       12       (       20
END.      5        :       13       )       21
INTEGER   6        ,       14       id      22
FOR       7        :=      15       int     23
READ      8        +       16

In such a coding scheme the token PROGRAM would be represented by
the integer value 1, and an identifier id would be represented by 22.
Line   Token types in order (token specifier in parentheses for id and int)
1      1  22 (STATS)
2      2
3      22 (SUM)  14  22 (SUMQ)  14  22 (I)  14  22 (VALUE)  14  22 (MEAN)  14  22 (VARIANCE)  13  6
4      3
5      22 (SUM)  15  23 (#0)  12
6      22 (SUMQ)  15  23 (#0)  12
7      7  22 (I)  15  23 (#1)  10  23 (#100)  11
8      3
9      8  20  22 (VALUE)  21  12
10     22 (SUM)  15  22 (SUM)  16  22 (VALUE)  12
11     22 (SUMQ)  15  22 (SUMQ)  16  22 (VALUE)  18  22 (VALUE)
12     4  12
13     22 (MEAN)  15  22 (SUM)  19  23 (#100)  12
14     22 (VARIANCE)  15  22 (SUMQ)  19  23 (#100)  17  22 (MEAN)  18  22 (MEAN)  12
15     9  20  22 (MEAN)  14  22 (VARIANCE)  21
16     5

In the case of an identifier or an integer it is necessary to specify the
particular identifier name or value that was scanned. A token specifier is
therefore associated with that particular type code.
The figure above shows the output from a scanner for the Pascal program
we considered.
Apart from recognizing tokens the scanner is also responsible for
reading the lines of the source program and possibly printing the source
listing.
The scanner must take into account any special format required of the
source statements. For example, in Fortran a number in columns 1-5 of a
source statement is a statement number, not an integer. The scanner must
also know whether blanks function as delimiters for tokens (as in Pascal)
or not, and whether statements can be continued freely from one line to
the next (as in Pascal) or whether special continuation flags are
necessary (as in Fortran).
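
The scanning process described in this section can be sketched in a few lines of Python. This is only an illustration, not the scanner of any real compiler: the token codes are copied from the coding scheme above, while the function name scan, the use of a regular expression, and the restriction to upper-case keywords are assumptions made for the example.

```python
import re

# Token codes copied from the coding scheme in these notes.
KEYWORDS = {"PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END": 4, "END.": 5,
            "INTEGER": 6, "FOR": 7, "READ": 8, "WRITE": 9, "TO": 10,
            "DO": 11, "DIV": 19}
SYMBOLS = {";": 12, ":": 13, ",": 14, ":=": 15, "+": 16, "-": 17,
           "*": 18, "(": 20, ")": 21}
ID, INT = 22, 23

# Longer alternatives come first so that END. and := win over END and :
TOKEN_RE = re.compile(r"END\.|[A-Za-z][A-Za-z0-9]*|[0-9]+|:=|[;:,+\-*()]")

def scan(line):
    """Return a list of (token type, token specifier) pairs for one line."""
    tokens, pos = [], 0
    while pos < len(line):
        if line[pos].isspace():          # blanks only separate tokens
            pos += 1
            continue
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"illegal character {line[pos]!r} at column {pos}")
        text = match.group()
        pos = match.end()
        if text in KEYWORDS:             # keywords carry no specifier
            tokens.append((KEYWORDS[text], None))
        elif text in SYMBOLS:
            tokens.append((SYMBOLS[text], None))
        elif text.isdigit():             # integer: specifier is the value
            tokens.append((INT, "#" + text))
        else:                            # identifier: specifier is the name
            tokens.append((ID, text))
    return tokens

print(scan("READ (VALUE);"))
```

For line 9 of the example program the sketch yields the token types 8, 20, 22 (VALUE), 21, 12, in agreement with the scanner output shown above.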

Syntactic Analysis
During syntactic analysis the source statements written by the
programmer are recognized as language constructs described by the
grammar being used. This may be regarded as building the parse tree
for the statements being translated. Parsing techniques are of two types;
bottom up and top down according to the way in which the parse tree is
being constructed.
Top down methods begin with the rule of the grammar that specifies the
goal of the analysis (i.e. the root of the tree), and attempt to construct
the tree so that the terminal nodes match the statements being
analyzed.
Bottom up methods begin with terminal nodes of the tree (the statements
being analyzed), and attempt to combine these into successively higher-
level nodes until the root is reached.

A large number of different parsing techniques have been devised. One
of the bottom-up parsing techniques is the operator-precedence method,
which is based on examining pairs of consecutive operators in the
source program and making decisions about which operation should be
performed first.
Consider A+B*C–D
Multiplication and division usually have higher precedence than addition
and subtraction.
So for the first pair of operators (+ and *), + has lower precedence
than *, written + < *; similarly * > - for the next pair.
The expression can therefore be annotated as
A + B * C - D
    <   >
This implies that the expression B * C is to be computed before either of
the other operations. In form of a parse tree this means that the *
operation appears at a lower level than the + or – operators.
During this process the statement being analyzed is scanned for a sub
expression whose operators have higher precedence than the
surrounding operators. This sub expression then is interpreted in terms
of the rules of the grammar under consideration. This process continues
until the root of the tree is reached.
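
The process just described can be sketched in Python for arithmetic expressions only. This is a simplified illustration under stated assumptions, not the full parser for the grammar: the relation function below is a hypothetical stand-in for a precedence matrix (it knows only +, -, *, DIV, operands and an end marker $), the names parse and top_terminal are invented for the example, and no syntax errors are detected.

```python
PREC = {"+": 1, "-": 1, "*": 2, "DIV": 2}   # higher number = higher precedence

def relation(left, right):
    """Precedence relation between two terminals: '<' or '>'."""
    if left == "$":
        return "<"
    if right == "$":
        return ">"
    if left not in PREC:        # an operand (id or int) is reduced at once
        return ">"
    if right not in PREC:
        return "<"
    return ">" if PREC[left] >= PREC[right] else "<"

def top_terminal(stack):
    """Topmost terminal on the stack (nonterminals are stored as tuples)."""
    return next(sym for sym in reversed(stack) if isinstance(sym, str))

def parse(tokens):
    """Return the reductions performed, in order, as (name, parts) pairs."""
    stack, reductions = ["$"], []
    tokens = tokens + ["$"]
    pos = 0
    while True:
        top, look = top_terminal(stack), tokens[pos]
        if top == "$" and look == "$":
            return reductions
        if relation(top, look) == "<":
            stack.append(look)              # keep scanning to the right
            pos += 1
            continue
        # A '>' was found: pop the portion delimited by '<' and '>'.
        handle = []
        while True:
            sym = stack.pop()
            handle.insert(0, sym)
            if isinstance(sym, str) and relation(top_terminal(stack), sym) == "<":
                break
        if not isinstance(stack[-1], str):  # a nonterminal to the left belongs too
            handle.insert(0, stack.pop())
        name = f"N{len(reductions) + 1}"
        parts = [h if isinstance(h, str) else h[0] for h in handle]
        reductions.append((name, parts))
        stack.append((name,))               # replace the portion by <Ni>

for name, parts in parse("A + B * C - D".split()):
    print(name, "<-", " ".join(parts))
```

After the three operands that are reached first have been reduced, the first operator reduction is N2 * N3, i.e. B * C, which is the subexpression whose operator has higher precedence than its neighbours.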
The first step in constructing an operator precedence parser is to
determine the precedence relations between the operators of the
grammar. In this context, operator means any terminal symbol (a token).
In the precedence matrix for this grammar, given below:
PROGRAM = VAR means that the two tokens involved have equal
precedence.
BEGIN < FOR means that BEGIN has lower precedence than FOR.
Precedence relations do not follow the ordinary rules for comparisons,
e.g. ; > END but also END > ;

Where no precedence relation exists between a pair of tokens, the two
tokens cannot appear together in any legal statement. If such a
combination occurs during parsing it should be recognized as a syntax
error.
Precedence matrix for the grammar (each row is the first token of a pair, each column the second):
          VAR BEGIN END INTEGER FOR READ WRITE TO DO ; : , := + - * DIV ( ) id int
PROGRAM   = <
VAR       = < < < <
BEGIN     = < < < < <
END       > >
INTEGER   > = > <
FOR
READ      =
WRITE     =
TO        > < < < < < < <
DO        < > < < < > <
;         > > < < < > < < <
:         > < >
,         =
:=        > = > < < < < < < <
+         > > > > > > < < < > < <
-         > > > > > > < < < > < <
*         > > > > > > > > < > < <
DIV       > > > > > > > > < > < <
(         > < < < < < < = < <
)         > > > > > > > > >
id        > > > > > > > = > > > > >
int       > > > > > > > > >
The statements are scanned from left to right one token at a time. For
each pair of operators the precedence relation between them is
determined.
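Examples:
1. READ (VALUE);
(i)   ... BEGIN READ ( id ) ;
      Relations: BEGIN < READ = ( < id > )
      The portion delimited by < and > is id {VALUE}; it is reduced to <N1>.
(ii)  ... BEGIN READ ( <N1> ) ;
      Relations: BEGIN < READ = ( = ) > ;
      The portion delimited by < and > is READ ( <N1> ); it is reduced to <N2>.
(iii) ... BEGIN <N2> ;
      The tree built so far is
      <N2>
         READ
         (
         <N1>
            id {VALUE}
         )

In part (i) the parser identifies the portion of the statement delimited by
the precedence relations < and >, which consists of the single token id.
This portion could be interpreted as a <factor> (rule 12), a <prog-name>
(rule 2) or an <id-list> (rule 6). It is simply interpreted as some
nonterminal symbol <N1>.
Precedence relations hold only between terminal symbols, so <N1> is
not involved in this process. In part (ii) the relation is therefore
taken between the two parentheses, ( = ).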
2. VARIANCE := SUMQ DIV 100 - MEAN * MEAN ;
At each step the relations between consecutive terminal symbols are
listed, followed by the reduction made for the portion delimited by
< and >. The symbol ... stands for the token preceding the statement.
(i)    id1 := id2 DIV int - id3 * id4 ;
       Relations: ... < id1 = := < id2 > DIV
       Reduction: <N1> from id2 {SUMQ}
(ii)   id1 := <N1> DIV int - id3 * id4 ;
       Relations: ... < id1 = := < DIV < int > -
       Reduction: <N2> from int {100}
(iii)  id1 := <N1> DIV <N2> - id3 * id4 ;
       Relations: ... < id1 = := < DIV > -
       Reduction: <N3> from <N1> DIV <N2>
(iv)   id1 := <N3> - id3 * id4 ;
       Relations: ... < id1 = := < - < id3 > *
       Reduction: <N4> from id3 {MEAN}
(v)    id1 := <N3> - <N4> * id4 ;
       Relations: ... < id1 = := < - < * < id4 > ;
       Reduction: <N5> from id4 {MEAN}
(vi)   id1 := <N3> - <N4> * <N5> ;
       Relations: ... < id1 = := < - < * > ;
       Reduction: <N6> from <N4> * <N5>
(vii)  id1 := <N3> - <N6> ;
       Relations: ... < id1 = := < - > ;
       Reduction: <N7> from <N3> - <N6>
(viii) id1 := <N7> ;
       Relations: ... < id1 = := > ;
       Reduction: <N8> from id1 := <N7>
(ix)   <N8> ;
       The complete tree is
       <N8>
          id1 {VARIANCE}
          :=
          <N7>
             <N3>
                <N1>
                   id2 {SUMQ}
                DIV
                <N2>
                   int {100}
             -
             <N6>
                <N4>
                   id3 {MEAN}
                *
                <N5>
                   id4 {MEAN}

Note that each portion of the parse tree is constructed from the terminal
nodes up towards the root, hence the term bottom up parsing.
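There are a few differences between these parse trees and the first
ones. For example, in the earlier tree the id SUMQ is interpreted first
as a <factor> and then as a <term>, whereas here it is reduced to the
single nonterminal <N1>. This is because the operator precedence parser
is not concerned with the names of the nonterminals, so it is not
necessary to perform such additional steps in the recognition process.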

Code Generation

After the syntax has been analysed the last task of the compilation is the
generation of object code. A simple code generation technique is the
one that creates the object code for each part of the program as soon as
its syntax has been recognized.
The technique involves a set of routines one for each rule or alternative
rule in the grammar. When the parser recognizes a portion of the source
program according to some rule of the grammar, the corresponding
routine is executed. Such routines are called semantic routines because
the processing performed is related to the meaning associated with the
corresponding construct in the language. These semantic routines
generate object code directly so they can also be called code generation
routines.
The code to be generated depends upon the computer for which the
program is being compiled. We will illustrate the generation of object
code for the SIC/XE machine.
The code generation routines create segments of object code for the
compiled program which will be represented here using SIC assembler
language. The actual code generated is machine language not
assembler. As each piece of object code is generated a location counter
is updated to reflect the next available address in the compiled program.
Regardless of the method used to generate the parse tree, the parser
will always recognize at each step the left most substring of the input
that can be interpreted according to the rule of the grammar. In the
operator precedence method this recognition occurs when a substring of
the input is reduced to some non terminal <Ni>. The assembler code
below shows the symbolic representation of the object code to be
generated for the READ statement.

        +JSUB   XREAD
         WORD   1
         WORD   VALUE

This code consists of a call to the subroutine XREAD, which would be part
of a standard library associated with the compiler. XREAD can be called
by any program that wants to perform a READ operation. It is linked
together with the generated object program by a linking loader or a
linking editor.
Since XREAD may be used to perform any READ operation it must be
passed parameters that specify the details of the READ. In this case the
parameter list for XREAD is defined immediately after the JSUB that
calls it. The first word in this parameter list contains a value that specifies
the number of variables that will be assigned values by the READ. The
following words give the addresses of these variables. Thus the second
line specifies that one variable is to be read and the third line gives the
address of this variable.
The parser in generating the parse tree recognizes first <id-list> and
then <read>. At each step the parser calls the appropriate code
generation routine.
For the assignment statement
VARIANCE := SUMQ DIV 100 – MEAN * MEAN
most of the work involves the analysis of the <exp> statement on the
right hand side of the :=. The parser first recognizes the id SUMQ as a
<factor> and a <term>; then it recognizes the int 100 as a <factor>; then it
recognizes SUMQ DIV 100 as a <term> and so on. As each portion of
the statement is recognized a code-generation routine is called to create
the corresponding object code.
The assembler code below shows the symbolic representation of the
object code to be generated for the assignment statement
VARIANCE := SUMQ DIV 100 – MEAN * MEAN

         LDA    SUMQ
         DIV    #100
         STA    T1
         LDA    MEAN
         MUL    MEAN
         STA    T2
         LDA    T1
         SUB    T2
         STA    VARIANCE
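
The way such code can be produced one construct at a time can be sketched in Python. This is a minimal illustration, assuming that the parser has already produced a tree of nested tuples (operator, left, right) with variable names and immediate operands such as #100 as leaves. The names gen_expr, gen_assign and IN_A are hypothetical, and the routine is a simplification rather than the semantic routines of an actual compiler.

```python
from itertools import count

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "DIV": "DIV"}
IN_A = object()     # marker: the value is currently held in register A

def gen_expr(node, code, temps):
    """Emit code for an expression; return an operand name or IN_A."""
    if isinstance(node, str):            # id or immediate operand: no code yet
        return node
    op, left, right = node
    lhs = gen_expr(left, code, temps)
    if lhs is IN_A and not isinstance(right, str):
        lhs = f"T{next(temps)}"          # register A is needed for the right side
        code.append(f"STA {lhs}")
    rhs = gen_expr(right, code, temps)
    if rhs is IN_A:
        rhs = f"T{next(temps)}"          # save right operand, left must be in A
        code.append(f"STA {rhs}")
    if lhs is not IN_A:
        code.append(f"LDA {lhs}")
    code.append(f"{OPCODES[op]} {rhs}")
    return IN_A

def gen_assign(target, expr):
    """Emit code for  target := expr  and return it as a list of lines."""
    code, temps = [], count(1)
    value = gen_expr(expr, code, temps)
    if value is not IN_A:
        code.append(f"LDA {value}")
    code.append(f"STA {target}")
    return code

tree = ("-", ("DIV", "SUMQ", "#100"), ("*", "MEAN", "MEAN"))
print("\n".join(gen_assign("VARIANCE", tree)))
```

For the statement VARIANCE := SUMQ DIV 100 - MEAN * MEAN the sketch prints the same nine instructions as listed above, including the two temporaries T1 and T2.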

Below is the symbolic representation of the object code generated from
the Pascal program STATS given earlier.
Line   Symbolic representation of the generated code
1      STATS     START   0              {Program header}
                 EXTREF  XREAD,XWRITE
                 STL     RETADR         {Save return address}
                 J       {EXADDR}
       RETADR    RESW    1
3      SUM       RESW    1              {Variable declarations}
       SUMQ      RESW    1
       I         RESW    1
       VALUE     RESW    1
       MEAN      RESW    1
       VARIANCE  RESW    1
5      {EXADDR}  LDA     #0             {SUM := 0}
                 STA     SUM
6                LDA     #0             {SUMQ := 0}
                 STA     SUMQ
7                LDA     #1             {FOR I := 1 TO 100}
       {L1}      STA     I
                 COMP    #100
                 JGT     {L2}
9                +JSUB   XREAD          {READ(VALUE)}
                 WORD    1
                 WORD    VALUE
10               LDA     SUM            {SUM := SUM + VALUE}
                 ADD     VALUE
                 STA     SUM
11               LDA     VALUE          {SUMQ := SUMQ + VALUE * VALUE}
                 MUL     VALUE
                 ADD     SUMQ
                 STA     SUMQ
                 LDA     I              {End of FOR loop}
                 ADD     #1
                 J       {L1}
13     {L2}      LDA     SUM            {MEAN := SUM DIV 100}
                 DIV     #100
                 STA     MEAN
14               LDA     SUMQ           {VARIANCE := SUMQ DIV 100 - MEAN * MEAN}
                 DIV     #100
                 STA     T1
                 LDA     MEAN
                 MUL     MEAN
                 STA     T2
                 LDA     T1
                 SUB     T2
                 STA     VARIANCE
15               +JSUB   XWRITE         {WRITE(MEAN,VARIANCE)}
                 WORD    2
                 WORD    MEAN
                 WORD    VARIANCE
                 LDL     RETADR         {Return}
                 RSUB
       T1        RESW    1              {Working variables used}
       T2        RESW    1
                 END

Machine Dependent Compiler Features.



Most high level programming languages are designed to be relatively
independent of the machine being used. This means that the process of
analyzing the syntax of the program should also be machine
independent. The main machine dependencies of a compiler are related
to the generation and optimization of the object code.
The code optimization is done using an intermediate form of the program
being analysed. In the intermediate form the syntax and the semantics of
the source statements have been completely analysed but the actual
translation into machine code has not yet been performed. It is much
easier to analyse and manipulate the intermediate form of the program
for the purposes of code optimization than to perform the corresponding
operations on either the source program or the machine code.

Intermediate form of the program


One of the methods used in representing a program in an intermediate
form represents the executable instructions of the program with a
sequence of quadruples. Each quadruple is of the form
Operation, op1, op2, result
where operation is the function to be performed by the object code, op1
and op2 are the operands and result is where the resulting value is to be
placed.
For example, the statement
SUM := SUM + VALUE
could be represented with quadruples as
+ , SUM, VALUE, i1
:=, i1 ,      , SUM
where i1 represents the intermediate result (SUM + VALUE); the second
quadruple assigns the value of this intermediate result to SUM.
Assignment is treated as a separate operation (:=).
Similarly
VARIANCE := SUMQ DIV 100 - MEAN * MEAN
could be represented with quadruples as
DIV, SUMQ, #100, i1
* , MEAN, MEAN, i2
- , i1  , i2  , i3
:=, i3  ,     , VARIANCE

Many types of analysis and manipulation can be performed on the
quadruples for code optimization purposes, e.g. the intermediate results
i1, i2, ... can be assigned to registers or to temporary variables to
make their use more efficient. After optimization has been performed the
modified quadruples are translated into machine code.
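
A short Python sketch shows how quadruples of this form can be produced from an expression tree. It assumes, purely for illustration, that the expression is given as nested tuples (operator, left, right) with names and immediate operands as leaves; the function name to_quads is hypothetical.

```python
from itertools import count

def to_quads(target, expr):
    """Translate  target := expr  into a list of quadruples
    (operation, op1, op2, result)."""
    quads, numbers = [], count(1)

    def walk(node):
        if isinstance(node, str):        # a variable or an immediate operand
            return node
        op, left, right = node
        op1, op2 = walk(left), walk(right)
        result = f"i{next(numbers)}"     # name of a new intermediate result
        quads.append((op, op1, op2, result))
        return result

    value = walk(expr)
    quads.append((":=", value, "", target))   # assignment is its own operation
    return quads

tree = ("-", ("DIV", "SUMQ", "#100"), ("*", "MEAN", "MEAN"))
for quad in to_quads("VARIANCE", tree):
    print(quad)
```

For VARIANCE := SUMQ DIV 100 - MEAN * MEAN the sketch produces the four quadruples DIV, *, - and := with intermediate results i1, i2 and i3.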
Below is the intermediate code for the Pascal program.
       Operation  Op1     Op2     Result
(1)    :=         #0              SUM        {SUM := 0}
(2)    :=         #0              SUMQ       {SUMQ := 0}
(3)    :=         #1              I          {FOR I := 1 TO 100}
(4)    JGT        I       #100    (15)
(5)    CALL       XREAD                      {READ(VALUE)}
(6)    PARAM      VALUE
(7)    +          SUM     VALUE   i1         {SUM := SUM + VALUE}
(8)    :=         i1              SUM
(9)    *          VALUE   VALUE   i2         {SUMQ := SUMQ + VALUE * VALUE}
(10)   +          SUMQ    i2      i3
(11)   :=         i3              SUMQ
(12)   +          I       #1      i4         {End of FOR loop}
(13)   :=         i4              I
(14)   J                          (4)
(15)   DIV        SUM     #100    i5         {MEAN := SUM DIV 100}
(16)   :=         i5              MEAN
(17)   DIV        SUMQ    #100    i6         {VARIANCE := SUMQ DIV 100 - MEAN * MEAN}
(18)   *          MEAN    MEAN    i7
(19)   -          i6      i7      i8
(20)   :=         i8              VARIANCE
(21)   CALL       XWRITE                     {WRITE(MEAN,VARIANCE)}
(22)   PARAM      MEAN
(23)   PARAM      VARIANCE

The READ and WRITE statements are represented with a CALL
operation followed by PARAM quadruples that specify the parameters of
the READ and WRITE.

Code-Optimization
Machine instructions that use registers as operands are usually faster
than the corresponding instructions that refer to locations in memory. It is
therefore better, where possible, to keep in registers the variables and
intermediate results that will be used later in the program.
Consider the variable VALUE, which is used once in quadruple 7 and
twice in quadruple 9. It is possible to fetch this value once and retain it in
a register for use by the code generated from quadruple 9. Similarly,
quadruple 16 stores the value of i5 into the variable MEAN. If i5 is kept
in a register, its value can be used wherever MEAN is required, as in
quadruple 18.
Another possibility for code optimization involves rearranging quadruples
before machine code is generated.
DIV SUMQ #100 i1
* MEAN MEAN i2
- i1 i2 i3
:= i3 VARIANCE

This corresponds to
LDA SUMQ
DIV #100
STA T1
LDA MEAN
MUL MEAN
STA T2
LDA T1
SUB T2
STA VARIANCE

The value of the intermediate result i1 is calculated first and stored in a
temporary variable T1, then i2 is calculated. The third quadruple calls
for subtracting i2 from i1. Since i2 has just been computed its value is in
register A. It is necessary to store the value of i2 in another temporary
variable T2 and then load the value of i1 from T1 into register A before
performing the subtraction.
An optimizing compiler can rearrange the quadruples so that the second
operand of the subtraction is computed first, as shown below.
* MEAN MEAN i2
DIV SUMQ #100 i1
- i1 i2 i3
:= i3 VARIANCE

corresponding to
LDA MEAN
MUL MEAN
STA T1
LDA SUMQ
DIV #100
SUB T1
STA VARIANCE
The resulting machine code requires fewer instructions and uses only
one temporary variable instead of two.
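
The effect of the rearrangement can be checked with a small Python sketch that translates quadruples into SIC code while keeping track of what register A holds. It is a deliberately naive illustration: the function name gen_code is hypothetical, only register A is used, and an intermediate result is always saved to a new temporary when A is needed for something else, even if that result is never used again.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "DIV": "DIV"}

def gen_code(quads):
    """Translate (operation, op1, op2, result) quadruples into SIC code."""
    code, home = [], {}     # home: intermediate result -> temporary variable
    in_a = None             # intermediate result currently in register A
    for op, op1, op2, result in quads:
        if in_a != op1:
            if in_a is not None:             # save what A holds before reuse
                home[in_a] = f"T{len(home) + 1}"
                code.append(f"STA {home[in_a]}")
            code.append(f"LDA {home.get(op1, op1)}")
        if op == ":=":
            code.append(f"STA {result}")
            in_a = None
        else:
            code.append(f"{OPCODES[op]} {home.get(op2, op2)}")
            in_a = result
    return code

original = [("DIV", "SUMQ", "#100", "i1"), ("*", "MEAN", "MEAN", "i2"),
            ("-", "i1", "i2", "i3"), (":=", "i3", "", "VARIANCE")]
rearranged = [original[1], original[0], original[2], original[3]]

print(len(gen_code(original)), "instructions before rearranging")
print(len(gen_code(rearranged)), "instructions after rearranging")
```

With the original order the sketch emits nine instructions and two temporaries; with the rearranged order it emits seven instructions and one temporary, matching the two listings above.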

Machine Independent Compiler Features.


Storage Allocation
The type of storage assignment where all programmer defined variables
are assigned fixed addresses within the program is called static
allocation. It is often used for languages like FORTRAN that do not allow
recursive use of procedures or subroutines.

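Figure: Recursive invocation of a procedure using static storage
allocation. (a) The system calls MAIN (call 1), which keeps its return
address in RETADR within MAIN. (b) MAIN calls SUB (call 2), which keeps
its return address in RETADR within SUB. (c) SUB calls itself (call 3).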
If procedures may be called recursively, as in Pascal, static allocation
cannot be used. In the figure, the program MAIN has been called by the
operating system (call 1). MAIN stores its return address at a fixed
location RETADR within MAIN.
MAIN calls SUB (call 2). The return address of this call is stored at a fixed
location RETADR within SUB. If SUB calls itself recursively, as in fig (c),
a problem occurs because SUB stores the return address for call 3 into
RETADR from register L. This destroys the return address for call 2, and
as a result there is no possibility of ever making a correct return to MAIN.
A similar difficulty occurs with respect to any variables used by SUB.
When recursive calls are made, variables within SUB may be set to new
values; however the previous values may be needed by call 2 of SUB
after the return from the recursive call.
It is therefore necessary to preserve the previous values of any variables
used in SUB including parameters, temporaries, return addresses,
register save areas etc.
This is usually accomplished by the dynamic storage allocation
technique where each procedure call creates an activation record that
contains storage for all the variables used by the procedure. If the
procedure is called recursively another activation record is created. Each
activation record is associated with a particular invocation of the
procedure. An activation record is not deleted until a return has been
made from the corresponding invocation. The starting address for the
current activation record is usually contained in a base register which is
used by the procedure to address its variables. In this way the values of
variables used by the different invocations of a procedure are kept
separate from one another.
Activation records are typically allocated on a stack, with the current
record on top of the stack.
In the diagram below, (a) shows the stack after MAIN has been called: its
activation record appears on the stack, and the base register B is set to
indicate the starting address of this current activation record. The first
word in an activation record normally contains a pointer PREV to the
previous record on the stack. Since this record is the first, the pointer
value is null.
Figure: Recursive invocation of a procedure using automatic storage
allocation. Each activation record on the stack holds, starting from its
first word, PREV, NEXT, RETADR and the variables of the procedure;
register B points to the current record. (a) The system has called MAIN
(call 1); PREV in the record for MAIN is 0. (b) MAIN has called SUB
(call 2); a record for SUB lies on top of the record for MAIN. (c) SUB
has called itself (call 3); a second record for SUB lies on top of the
first. (d) After the return from call 3, B again points to the first
record for SUB.

The second word of the activation record contains a pointer NEXT to the
first unused word of the stack, which will be the starting address for the
next activation record created. The third word contains the return
address for this invocation of the procedure, and the remaining words
contain the values of all the variables used by the procedure.
In diagram (b) MAIN has called SUB. On the top of the stack a new
activation record has been created with register B set to indicate the new
current record. The pointers PREV and NEXT are set as shown.
In (c) SUB has called itself recursively and another activation record has
been created.
When a procedure returns to its caller the current activation record is
deleted. The pointer PREV in the deleted record is used to reestablish
the previous activation record as the current one and execution
continues.
Fig (d) shows how the stack would appear after SUB returns from the
recursive call. Register B has been reset to point to the activation record
for the previous invocation of SUB. The return address and all the
variable values in this activation record are exactly the same as they
were before the recursive call.
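
The behaviour of the stack of activation records can be imitated with a short Python sketch. It is only a model under stated assumptions: memory is a Python list of words, a null pointer is represented by None, return addresses are written as descriptive strings, and the class and method names (ActivationStack, prologue, epilogue, var) are invented for the example.

```python
PREV, NEXT, RETADR, VARS = 0, 1, 2, 3    # word offsets inside a record

class ActivationStack:
    def __init__(self, size=32):
        self.mem = [None] * size
        self.b = None                    # register B: start of current record

    def prologue(self, retadr, nvars):
        """Create a new record on top of the stack and make it current."""
        start = 0 if self.b is None else self.mem[self.b + NEXT]
        self.mem[start + PREV] = self.b              # link to previous record
        self.mem[start + NEXT] = start + VARS + nvars
        self.mem[start + RETADR] = retadr
        self.b = start

    def epilogue(self):
        """Delete the current record; return the saved return address."""
        retadr = self.mem[self.b + RETADR]
        self.b = self.mem[self.b + PREV]             # previous record is current
        return retadr

    def var(self, offset):
        """Address of a variable, relative to the current record."""
        return self.b + VARS + offset

stack = ActivationStack()
stack.prologue("return to system", 2)    # call 1: system calls MAIN
stack.prologue("return to MAIN", 1)      # call 2: MAIN calls SUB
stack.mem[stack.var(0)] = 10             # SUB sets its variable
stack.prologue("return to SUB", 1)       # call 3: SUB calls itself
stack.mem[stack.var(0)] = 99             # the new invocation has its own copy
print(stack.epilogue())                  # return from the recursive call
print(stack.mem[stack.var(0)])           # value seen by call 2 is unchanged
```

After the return from call 3 the variable of the earlier invocation of SUB still holds 10 and the return address for call 2 is intact, which is exactly what static allocation could not guarantee.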
This technique is called automatic allocation of storage. In this technique
the compiler generates code for references to variables using some sort
of relative addressing. The compiler assigns each variable an address
which is relative to the beginning of the activation record instead of an
actual location within the program. The address of the current activation
record is contained in register B, so a reference to a variable is
translated into an instruction that uses base relative addressing. The
displacement in this instruction is the relative address of the variable
within the activation record.
The compiler also generates additional code to manage the activation
records themselves. At the beginning of each procedure there must be
code to create a new activation record, link it to the previous one and
set the appropriate pointers. This code is often called the prologue for
the procedure. At the end of the procedure there must be code to delete
the current activation record and reset pointers as needed. This code
is called the epilogue.
Other types of dynamic storage allocation allow the programmer to
specify when storage is to be assigned. In PL/I the statement
ALLOCATE (A) allocates storage for the variable A while FREE (A)
releases the storage assigned to A by the previous ALLOCATE. This
feature is called controlled storage in PL/I.
In Pascal the statement NEW(P) allocates storage for a variable and
sets the pointer P to indicate the variable just created. The statement
DISPOSE(P) releases the storage that was previously assigned to the
variable pointed to by P.

Structured Variables
These include arrays, records, strings, sets etc.
Consider an array A: ARRAY[1..10] of integer. If each integer variable
occupies one word of memory, then ten words have to be allocated to
store this array.
In general an array ARRAY[l..u] of integer needs an allocation of u-l+1
words of storage.
For a two-dimensional array like B: ARRAY[0..3,1..6] of integer, the first
subscript can take on 4 different values (0-3) and the second subscript
can take on 6 values. We need to allocate a total of 4 * 6 = 24 words to
store the array. In general an array ARRAY[l1..u1, l2..u2] of integer
needs to be allocated (u1-l1+1)*(u2-l2+1) words of storage.
To generate code for array references it is important to know which array
element corresponds to each word of allocated storage. For a
one-dimensional array there is an obvious correspondence, e.g. in the
array A above the first word contains A[1], the second word A[2] etc.
A two-dimensional array has two possible ways of storing its elements.
In row-major order, all array elements that have the same value of the
first subscript are stored in contiguous locations:
0,1 0,2 0,3 0,4 0,5 0,6 1,1 1,2 1,3 1,4 1,5 1,6 2,1 2,2 2,3 2,4 2,5 2,6 3,1 3,2 3,3 3,4 3,5 3,6

The rightmost subscript varies most rapidly.

In column-major order, all elements that have the same value of the
second subscript are stored together:
0,1 1,1 2,1 3,1 0,2 1,2 2,2 3,2 0,3 1,3 2,3 3,3 0,4 1,4 2,4 3,4 0,5 1,5 2,5 3,5 0,6 1,6 2,6 3,6

The leftmost subscript varies most rapidly.

Compilers for most high level languages store arrays using row-major
order; FORTRAN compilers however store arrays in column-major order.
To refer to an array element we calculate the address of the referenced
element relative to the base address of the array. The compiler will
generate code to place this relative address in an index register.
Assume a one dimensional array A: ARRAY[1..10] of integer and
suppose that a statement refers to an array element A[6]. There are five
array elements preceding A[6]; on a SIC machine each element will
occupy 3 bytes, thus the address of A[6] relative to the starting address
of the array is given by 5 x 3 = 15.
In general, for an element A[s] of a one-dimensional array A:
ARRAY[l..u] where each array element occupies w bytes of storage, the
address of the element relative to the start of the array is
w * (s - l)
For a multi-dimensional array the calculation depends on whether
row-major or column-major order is used.
Assume row-major order and an array B: ARRAY[0..3,1..6] of integer. To
reach the array element B[2,5] we skip two complete rows, row 0 and
row 1. Each row contains 6 elements, so this involves 2 x 6 = 12
elements. We also skip the first 4 elements in row 2. This makes a total
of 16 array elements preceding B[2,5]. Each element occupies 3 bytes, so
the array element B[2,5] is at address 48 relative to the beginning of
the array.
In general, for an array declaration B: ARRAY[l1..u1, l2..u2], the relative
address of element B[s1,s2] is given by
w * [(s1 - l1) * (u2 - l2 + 1) + (s2 - l2)]
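
The address calculations can be checked with a few lines of Python. The function names are hypothetical and w defaults to 3 bytes, the SIC word size used in the examples. The column-major version is not given in these notes; it is obtained here by exchanging the roles of the two subscripts and should be read as an assumption.

```python
def rel_addr_1d(s, l, w=3):
    """Relative address of A[s] in A: ARRAY[l..u]."""
    return w * (s - l)

def rel_addr_row_major(s1, s2, l1, l2, u2, w=3):
    """Relative address of B[s1,s2] in B: ARRAY[l1..u1, l2..u2], row-major."""
    return w * ((s1 - l1) * (u2 - l2 + 1) + (s2 - l2))

def rel_addr_col_major(s1, s2, l1, u1, l2, w=3):
    """Same element in column-major order (derived by symmetry)."""
    return w * ((s2 - l2) * (u1 - l1 + 1) + (s1 - l1))

print(rel_addr_1d(6, 1))                    # A[6] in A: ARRAY[1..10]
print(rel_addr_row_major(2, 5, 0, 1, 6))    # B[2,5] in B: ARRAY[0..3,1..6]
print(rel_addr_col_major(2, 5, 0, 3, 1))    # B[2,5] stored by columns
```

The first two calls reproduce the worked values 15 and 48. In column-major order B[2,5] is preceded by 18 elements, so the third call gives 54.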

MACROPROCESSORS
A macro instruction (often abbreviated as a macro) represents a
commonly used group of statements in the source program language.
The macro processor replaces each macro instruction with the
corresponding group of source language statements. This is called
expanding the macros. Macro instructions therefore allow the
programmer to write a short hand version of a program. The functions of
a macro processor essentially involve the substitution of one group of
characters or lines for another.

Macro Definition and Expansion.


A macro consists of a name, a set of formal parameters and a body.
The listing below shows an example of a SIC/XE program using macro
instructions. It defines and uses two macro instructions, RDBUFF and
WRBUFF, whose functions are similar to those of the subroutines RDREC
and WRREC.
Two new assembler directives (MACRO and MEND) are used in macro
definitions. MACRO identifies the beginning of a macro definition. On
line 10, RDBUFF is the name of the macro and the operands are the
parameters of the macro instruction. Each parameter begins with the
character &. Following the MACRO directive are the statements that make
up the body of the macro definition (lines 15 to 90). These are the
statements that will be generated as the expansion of the macro.
MEND marks the end of the macro definition.
The main program itself begins on line 180. The statement on line 190
is a macro invocation (macro call).

5 COPY START 0 COPY FILE FROM INPUT TO OUTPUT


10 RDBUFF MACRO &INDEV,&BUFADR,&RECLTH
15 .
20 . MACRO TO READ RECORD INTO BUFFER
25 .
30 CLEAR X CLEAR LOOP COUNTER
35 CLEAR A
40 CLEAR S
45 +LDT #4096 SET MAXIMUM RECORD LENGTH
50 TD =X’&INDEV’ TEST INPUT DEVICE
55 JEQ *-3 LOOP UNTIL READY
60 RD =X’&INDEV’ READ CHARACTER INTO REGISTER A
65 COMPR A,S TEST FOR END OF RECORD
70 JEQ *+11 EXIT LOOP IF EOR
75 STCH &BUFADR,X STORE CHARACTER IN BUFFER
80 TIXR T LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85 JLT *-19
90 STX &RECLTH SAVE RECORD LENGTH
95 MEND

100 WRBUFF MACRO &OUTDEV,&BUFADR,&RECLTH


105 .
110 . MACRO TO WRITE RECORD FROM BUFFER
115 .
120 CLEAR X CLEAR LOOP COUNTER
125 LDT &RECLTH
130 LDCH &BUFADR,X GET CHARACTER FROM BUFFER
135 TD =X’&OUTDEV’ TEST OUTPUT DEVICE
140 JEQ *-3 LOOP UNTIL READY
145 WD =X’&OUTDEV’ WRITE CHARACTER
150 TIXR T LOOP UNTIL ALL CHARACTERS HAVE BEEN WRITTEN
155 JLT *-14
160 MEND
165 .
170 .
175 . MAIN PROGRAM
180 FIRST STL RETADR SAVE RETURN ADDRESS
190 CLOOP RDBUFF F1, BUFFER, LENGTH READ RECORD INTO BUFFER
195 LDA LENGTH TEST FOR END OF FILE
200 COMP #0
205 JEQ ENDFIL EXIT IF EOF FOUND
210 WRBUFF 05, BUFFER, LENGTH WRITE OUTPUT RECORD
215 J CLOOP LOOP
220 ENDFIL WRBUFF 05, EOF, THREE INSERT EOF MARKER
225 J @RETADR
230 EOF BYTE C‘EOF’
235 THREE WORD 3
240 RETADR RESW 1
245 LENGTH RESW 1 LENGTH OF RECORD
250 BUFFER RESB 4096 4096 - - BYTE BUFFER AREA
255 END FIRST

5 COPY START 0 COPY FILE FROM INPUT TO OUTPUT


180 FIRST STL RETADR SAVE RETURN ADDRESS
190 .CLOOP RDBUFF F1,BUFFER,LENGTH READ RECORD INTO BUFFER
190a CLOOP CLEAR X CLEAR LOOP COUNTER
190b CLEAR A
190c CLEAR S
190d +LDT #4096 SET MAXIMUM RECORD LENGTH
190e TD =X’F1’ TEST INPUT DEVICE
190f JEQ *-3 LOOP UNTIL READY
190g RD =X’F1’ READ CHARACTER INTO REGISTER A
190h COMPR A,S TEST FOR END OF RECORD
190i JEQ *+11 EXIT LOOP IF EOR
190j STCH BUFFER,X STORE CHARACTER IN BUFFER
190k TIXR T LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
190l JLT *-19
190m STX LENGTH SAVE RECORD LENGTH
195 LDA LENGTH TEST FOR END OF FILE
200 COMP #0
205 JEQ ENDFIL EXIT IF EOF FOUND
210 . WRBUFF 05,BUFFER,LENGTH WRITE OUTPUT RECORD
210a CLEAR X CLEAR LOOP COUNTER
210b LDT LENGTH
210c LDCH BUFFER,X GET CHARACTER FROM BUFFER
210d TD =X’05’ TEST OUTPUT DEVICE
210e JEQ *-3 LOOP UNTIL READY
210f WD =X’05’ WRITE CHARACTER
210g TIXR T LOOP UNTIL ALL CHARACTERS HAVE BEEN WRITTEN
210h JLT *-14
215 J CLOOP LOOP
220 .ENDFIL WRBUFF 05,EOF,THREE INSERT EOF MARKER
220a ENDFIL CLEAR X CLEAR LOOP COUNTER
220b LDT THREE
220c LDCH EOF,X GET CHARACTER FROM BUFFER
220d TD =X’05’ TEST OUTPUT DEVICE
220e JEQ *-3 LOOP UNTIL READY
220f WD =X’05’ WRITE CHARACTER
220g TIXR T LOOP UNTIL ALL CHARACTERS HAVE BEEN WRITTEN
220h JLT *-14
225 J @RETADR
230 EOF BYTE C‘EOF’
235 THREE WORD 3
240 RETADR RESW 1
245 LENGTH RESW 1 LENGTH OF RECORD
250 BUFFER RESB 4096 4096 - - BYTE BUFFER AREA
255 END FIRST

The figure above shows the expanded output that the macro processor
would generate. In the expanded form:
• The macro instruction definitions have been deleted.
• Each macro instruction has been expanded into the statements
that form the body of the macro, with the arguments from the macro
invocation substituted for the parameters in the macro prototype.
• The macro invocation statement itself has been included as a
comment line.
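
The three points above can be sketched in a few lines of Python. This is only an illustration, not the macro processor described in these notes: the function name expand, the tuple layout of the body, and the WRBUFF parameter names other than &OUTDEV are assumptions made for the example.

```python
# Body of WRBUFF as (opcode, operand) pairs, following lines 210a to 210h.
# Parameter names other than &OUTDEV are assumed to mirror RDBUFF.
WRBUFF_PARAMS = ["&OUTDEV", "&BUFADR", "&RECLTH"]
WRBUFF_BODY = [
    ("CLEAR", "X"),
    ("LDT", "&RECLTH"),
    ("LDCH", "&BUFADR,X"),
    ("TD", "=X'&OUTDEV'"),
    ("JEQ", "*-3"),
    ("WD", "=X'&OUTDEV'"),
    ("TIXR", "T"),
    ("JLT", "*-14"),
]


def expand(name, params, body, label, args):
    """Expand one macro call by substituting arguments for parameters."""
    if len(args) != len(params):
        raise ValueError("argument count does not match the prototype")
    # Substitute longer parameter names first so that one name can never
    # overwrite part of a longer name that starts with the same characters.
    pairs = sorted(zip(params, args), key=lambda pair: len(pair[0]), reverse=True)
    # The invocation itself is kept only as a comment line (leading '.').
    lines = [f".{label:<7} {name:<7} {','.join(args)}"]
    for index, (opcode, operand) in enumerate(body):
        for param, arg in pairs:
            operand = operand.replace(param, arg)
        # The label of the invocation moves to the first generated statement.
        stmt_label = label if index == 0 else ""
        lines.append(f"{stmt_label:<8} {opcode:<7} {operand}")
    return lines


for line in expand("WRBUFF", WRBUFF_PARAMS, WRBUFF_BODY,
                   "ENDFIL", ["05", "EOF", "THREE"]):
    print(line)
```

Calling expand twice with different arguments produces two separate copies of the body, which is the behaviour discussed in the next section.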

Differences between Macros and Subroutine calls


The statements from the body of the macro WRBUFF are generated
twice, i.e. lines 210a to 210h and lines 220a to 220h in the above figure.
In the figure on page 40 the corresponding statements appear only
once. In general the statements that form the expansion of a macro
are generated (and assembled) each time the macro is invoked.
Statements in a subroutine appear only once regardless of how many
times the subroutine is called.
Macro instructions are written with no labels in the body of the macro.
Line 140 uses "JEQ *-3" and line 155 uses "JLT *-14" instead of
JEQ WLOOP and JLT WLOOP, where WLOOP would be a label on the
instruction being jumped to. If such a label appeared on line 135 of the
macro body, on the TD instruction that tests the output device, it would
be generated twice, on lines 210d and 220d, and the assembler would
report a duplicate label definition.
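
The relative operands can be checked by adding up instruction lengths. The sketch below assumes that * stands for the address of the jump instruction itself and uses the standard SIC/XE lengths (2 bytes for the register instructions CLEAR, COMPR and TIXR, 4 bytes for extended-format instructions written with +, 3 bytes otherwise). The function names are hypothetical.

```python
# Register-to-register (format 2) instructions used in these macros.
FORMAT2 = {"CLEAR", "COMPR", "TIXR"}


def length(opcode):
    """Length in bytes of one instruction, by SIC/XE instruction format."""
    if opcode.startswith("+"):
        return 4                      # extended format
    return 2 if opcode in FORMAT2 else 3


def relative_target(opcodes, jump_index, target_index):
    """Build the '*+n' or '*-n' operand for a jump between two statements."""
    start, end = sorted((jump_index, target_index))
    distance = sum(length(op) for op in opcodes[start:end])
    if distance == 0:
        return "*"
    sign = "+" if target_index > jump_index else "-"
    return f"*{sign}{distance}"


WRBUFF = ["CLEAR", "LDT", "LDCH", "TD", "JEQ", "WD", "TIXR", "JLT"]
RDBUFF = ["CLEAR", "CLEAR", "CLEAR", "+LDT", "TD", "JEQ", "RD",
          "COMPR", "JEQ", "STCH", "TIXR", "JLT", "STX"]

print(relative_target(WRBUFF, 4, 3))    # JEQ back to TD
print(relative_target(WRBUFF, 7, 2))    # JLT back to LDCH
print(relative_target(RDBUFF, 8, 12))   # JEQ forward to STX
print(relative_target(RDBUFF, 11, 4))   # JLT back to TD
```

Every offset has to be recomputed by hand whenever a statement is added to or removed from the macro body, which is why this notation becomes error prone for longer jumps.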

Macro processor Tables and Logic.


There are three main data structures involved in macro processors.
• The macro definitions are stored in a definition table
DEFTAB, which contains the macro prototype and the statements
that make up the macro body. Comment lines from the macro
definition are not entered into DEFTAB.
• Macro names are entered into NAMTAB, which serves as
an index to DEFTAB. For each defined macro NAMTAB contains
pointers to the beginning and end of the definition in DEFTAB.
• The third structure is the argument table, ARGTAB, used
during the expansion of macro calls. The arguments are stored in
ARGTAB according to their positions in the argument list.

NAMTAB                    DEFTAB

RDBUFF  (begin, end) -->  RDBUFF  &INDEV, &BUFADR, &RECLTH
                          CLEAR   X
                          CLEAR   A
                          CLEAR   S
                          +LDT    #4096
                          TD      =X’?1’
                          JEQ     *-3
                          RD      =X’?1’
                          COMPR   A,S
                          JEQ     *+11
                          STCH    ?2,X
                          TIXR    T
                          JLT     *-19
                          STX     ?3
                          MEND

ARGTAB

1   F1
2   BUFFER
3   LENGTH

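In DEFTAB the parameters have been converted to positional notation:
&INDEV becomes ?1, &BUFADR becomes ?2 and &RECLTH becomes ?3. During
expansion each ?n is replaced by entry n of ARGTAB. In the figure above
the first argument is F1, so ?1 is replaced by F1.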
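
A minimal Python sketch of how the three tables could work together is given below. The class and method names are hypothetical, and the tuple representation of statements is an assumption made for the example; only the roles of DEFTAB, NAMTAB and ARGTAB come from the description above.

```python
class MacroProcessor:
    """Toy model of DEFTAB, NAMTAB and ARGTAB (illustrative only)."""

    def __init__(self):
        self.deftab = []   # prototype and body statements, in order
        self.namtab = {}   # macro name -> (first, last) index into deftab
        self.argtab = []   # arguments of the call being expanded

    def define(self, name, params, body):
        first = len(self.deftab)
        self.deftab.append((name, ",".join(params)))      # prototype
        numbered = sorted(enumerate(params, 1),
                          key=lambda item: len(item[1]), reverse=True)
        for opcode, operand in body:
            if opcode == ".":
                continue          # comment lines are not entered
            # Convert parameters to positional notation: &INDEV -> ?1, ...
            for position, param in numbered:
                operand = operand.replace(param, f"?{position}")
            self.deftab.append((opcode, operand))
        self.deftab.append(("MEND", ""))
        self.namtab[name] = (first, len(self.deftab) - 1)

    def expand(self, name, args):
        first, last = self.namtab[name]
        self.argtab = list(args)  # stored by position in the argument list
        expanded = []
        for opcode, operand in self.deftab[first + 1:last]:
            # Highest position first, so ?1 never matches inside ?10.
            for position in range(len(self.argtab), 0, -1):
                operand = operand.replace(f"?{position}",
                                          self.argtab[position - 1])
            expanded.append((opcode, operand))
        return expanded


RDBUFF_BODY = [
    (".", "read a record from the input device"),
    ("CLEAR", "X"), ("CLEAR", "A"), ("CLEAR", "S"), ("+LDT", "#4096"),
    ("TD", "=X'&INDEV'"), ("JEQ", "*-3"), ("RD", "=X'&INDEV'"),
    ("COMPR", "A,S"), ("JEQ", "*+11"), ("STCH", "&BUFADR,X"),
    ("TIXR", "T"), ("JLT", "*-19"), ("STX", "&RECLTH"),
]

mp = MacroProcessor()
mp.define("RDBUFF", ["&INDEV", "&BUFADR", "&RECLTH"], RDBUFF_BODY)
for opcode, operand in mp.expand("RDBUFF", ["F1", "BUFFER", "LENGTH"]):
    print(f"{opcode:<7} {operand}")
```

Storing the body in positional form means that expansion needs only an index lookup in ARGTAB and never has to search for parameter names.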

Generation of Unique Labels.


Since the body of a macro instruction cannot contain ordinary labels,
relative addressing is used. However, for large jumps over many
instructions such notation is inconvenient, error prone and difficult
to read. Special types of labels are therefore used. Labels within the
macro body begin with the special character $. In the expansion each
symbol beginning with $ is modified by replacing $ with $XX, where XX
is a two character alphanumeric counter of the number of macro
instructions expanded. For the first macro expansion in a program XX
will have the value AA, so $LOOP becomes $AALOOP. For succeeding macro
expansions XX will be set to AB, AC etc.

25 RDBUFF MACRO &INDEV,&BUFADR,&RECLTH
30 CLEAR X CLEAR LOOP COUNTER
35 CLEAR A
40 CLEAR S
45 +LDT #4096 SET MAXIMUM RECORD LENGTH
50 $LOOP TD =X’&INDEV’ TEST INPUT DEVICE
55 JEQ $LOOP LOOP UNTIL READY
60 RD =X’&INDEV’ READ CHARACTER INTO REGISTER A
65 COMPR A,S TEST FOR END OF RECORD
70 JEQ $EXIT EXIT LOOP IF EOR
75 STCH &BUFADR,X STORE CHARACTER IN BUFFER
80 TIXR T LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85 JLT $LOOP
90 $EXIT STX &RECLTH SAVE RECORD LENGTH
95 MEND

RDBUFF F1, BUFFER, LENGTH

30 CLEAR X CLEAR LOOP COUNTER
35 CLEAR A
40 CLEAR S
45 +LDT #4096 SET MAXIMUM RECORD LENGTH
50 $AALOOP TD =X’F1’ TEST INPUT DEVICE
55 JEQ $AALOOP LOOP UNTIL READY
60 RD =X’F1’ READ CHARACTER INTO REGISTER A
65 COMPR A,S TEST FOR END OF RECORD
70 JEQ $AAEXIT EXIT LOOP IF EOR
75 STCH BUFFER,X STORE CHARACTER IN BUFFER
80 TIXR T LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85 JLT $AALOOP
90 $AAEXIT STX LENGTH SAVE RECORD LENGTH
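
The label rewriting can be sketched in Python as follows. The notes only give the sequence AA, AB, AC; the ordering of the counter after AZ (letters followed by digits) and the function names are assumptions made for this example.

```python
import string

# Assumed ordering of the two-character counter: A..Z followed by 0..9.
ALPHABET = string.ascii_uppercase + string.digits


def label_prefix(expansion_number):
    """Map 0, 1, 2, ... to the two-character counter AA, AB, AC, ..."""
    if not 0 <= expansion_number < len(ALPHABET) ** 2:
        raise ValueError("counter out of range for two characters")
    high, low = divmod(expansion_number, len(ALPHABET))
    return ALPHABET[high] + ALPHABET[low]


def uniquify(body, expansion_number):
    """Replace $ with $XX in every field of every statement."""
    prefix = "$" + label_prefix(expansion_number)
    return [tuple(field.replace("$", prefix) for field in statement)
            for statement in body]


# Statements are (label, opcode, operand) triples taken from RDBUFF.
BODY = [
    ("$LOOP", "TD", "=X'F1'"),
    ("", "JEQ", "$LOOP"),
    ("", "JEQ", "$EXIT"),
    ("$EXIT", "STX", "LENGTH"),
]

for number in (0, 1):             # first and second expansion
    for label, opcode, operand in uniquify(BODY, number):
        print(f"{label:<8} {opcode:<5} {operand}")
```

Because the counter advances with every expansion, two invocations of the same macro never produce the same label, so the duplicate definition problem of ordinary labels does not arise.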

Conditional Macro Expansion


Most macro processors can modify the sequence of statements
generated during a macro expansion depending on the arguments
supplied in the invocation. The first figure below shows a definition of a
macro RDBUFF with two additional parameters: &EOR (a hexadecimal
character code that marks the end of the record) and &MAXLTH
(specifying the maximum record length that can be read). It is possible
for either or both of these parameters to be omitted in an invocation of
RDBUFF.

25 RDBUFF MACRO &INDEV,&BUFADR,&RECLTH,&EOR,&MAXLTH
26 IF (&EOR NE ‘’)
27 &EORCK SET 1
28 ENDIF
30 CLEAR X CLEAR LOOP COUNTER
35 CLEAR A
38 IF (&EORCK EQ 1)
40 LDCH =X’&EOR’ SET EOR CHARACTER
42 RMO A,S
43 ENDIF
44 IF (&MAXLTH EQ ‘’)
45 +LDT #4096 SET MAXIMUM RECORD LENGTH
46 ELSE
47 +LDT #&MAXLTH SET MAXIMUM RECORD LENGTH
48 ENDIF
50 $LOOP TD =X’&INDEV’ TEST INPUT DEVICE
55 JEQ $LOOP LOOP UNTIL READY
60 RD =X’&INDEV’ READ CHARACTER INTO REGISTER A
65 COMPR A,S TEST FOR END OF RECORD
70 JEQ $EXIT EXIT LOOP IF EOR
75 STCH &BUFADR,X STORE CHARACTER IN BUFFER
80 TIXR T LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85 JLT $LOOP
90 $EXIT STX &RECLTH SAVE RECORD LENGTH
95 MEND

RDBUFF F3, BUF, RECL, 04, 2048

30 CLEAR X CLEAR LOOP COUNTER
35 CLEAR A
40 LDCH =X’04’ SET EOR CHARACTER
42 RMO A,S
47 +LDT #2048 SET MAXIMUM RECORD LENGTH
50 $AALOOP TD =X’F3’ TEST INPUT DEVICE
55 JEQ $AALOOP LOOP UNTIL READY
60 RD =X’F3’ READ CHARACTER INTO REGISTER A
65 COMPR A,S TEST FOR END OF RECORD
70 JEQ $AAEXIT EXIT LOOP IF EOR
75 STCH BUF,X STORE CHARACTER IN BUFFER
80 TIXR T LOOP UNLESS MAXIMUM LENGTH HAS BEEN REACHED
85 JLT $AALOOP
90 $AAEXIT STX RECL SAVE RECORD LENGTH

Statements on lines 44 to 48 of this definition illustrate a
simple macro-time conditional structure. If the value of the expression
is TRUE, the statements following the IF are generated until an ELSE is
encountered; otherwise those statements are skipped and the statements
following the ELSE are generated. ENDIF ends the conditional structure.
If the parameter &MAXLTH is equal to the null string, the
statement on line 45 is generated; otherwise the statement on line 47 is
generated. A similar structure appears on lines 26-28, where the
macro-time variable &EORCK is set to 1 only if &EOR is not null.
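
The macro-time IF, ELSE, ENDIF and SET statements can be modelled with a short Python sketch. It handles only a single level of conditionals, and the function names, the tuple layout of statements, and the rule that an unset macro-time variable reads as the null string are assumptions made for this example.

```python
def resolve(token, values):
    """Value of one token in a condition: '' is the null string."""
    if token == "''":
        return ""
    if token.startswith("&"):
        return values.get(token, "")   # unset variables read as null
    return token


def evaluate(condition, values):
    """Evaluate a condition of the form (LEFT EQ RIGHT) or (LEFT NE RIGHT)."""
    left, relation, right = condition.strip("()").split()
    equal = resolve(left, values) == resolve(right, values)
    return equal if relation == "EQ" else not equal


def expand_conditional(body, values):
    """Expand (label, opcode, operand) statements, one IF level deep."""
    values = dict(values)          # arguments plus macro-time variables
    generating = True
    output = []
    for label, opcode, operand in body:
        if opcode == "IF":
            generating = evaluate(operand, values)
        elif opcode == "ELSE":
            generating = not generating
        elif opcode == "ENDIF":
            generating = True
        elif not generating:
            continue               # statement skipped by the conditional
        elif opcode == "SET":
            values[label] = operand   # assign a macro-time variable
        else:
            # Longest names first, so &EORCK is handled before &EOR.
            for name in sorted(values, key=len, reverse=True):
                operand = operand.replace(name, values[name])
            output.append((label, opcode, operand))
    return output


BODY = [
    ("", "IF", "(&EOR NE '')"), ("&EORCK", "SET", "1"), ("", "ENDIF", ""),
    ("", "CLEAR", "X"), ("", "CLEAR", "A"),
    ("", "IF", "(&EORCK EQ 1)"), ("", "LDCH", "=X'&EOR'"),
    ("", "RMO", "A,S"), ("", "ENDIF", ""),
    ("", "IF", "(&MAXLTH EQ '')"), ("", "+LDT", "#4096"),
    ("", "ELSE", ""), ("", "+LDT", "#&MAXLTH"), ("", "ENDIF", ""),
]

with_both = expand_conditional(BODY, {"&EOR": "04", "&MAXLTH": "2048"})
with_none = expand_conditional(BODY, {"&EOR": "", "&MAXLTH": ""})
for label, opcode, operand in with_both:
    print(f"{opcode:<6} {operand}")
```

With both arguments supplied the sketch generates the statements on lines 30, 35, 40, 42 and 47 of the expansion above; with both omitted it generates lines 30, 35 and 45 only.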
