Assemblers, Table Processing, and Macro Processors: A Comprehensive Overview
1. Assemblers
An assembler is a system software tool that translates programs written in assembly language (which use
human-readable mnemonics) into machine code (binary instructions). This translation is necessary because
computers execute only machine code. Assemblers also handle address allocation for labels and generate
information for loaders/linkers. Modern assemblers range from basic one-to-one translators to more advanced
programs offering macro facilities and optimizations.
Design of an Assembler
Assemblers traditionally operate in one or more passes over the source program. A one-pass assembler scans
the source code once and attempts to translate on the fly, handling forward-references by backpatching or
generating fix-up records for the loader. In contrast, a two-pass assembler (the most common design) makes
two passes: Pass 1 reads the source, processing declarations and pseudo-ops (assembler directives), and builds tables (like the
symbol table for labels and a literal table for constants) while computing addresses; Pass 2 then uses the
tables from Pass 1 to translate instructions into machine code. The diagram below illustrates a typical
two-pass assembler flow, where an Intermediate Representation (IR) and a symbol table produced in Pass 1
are used by Pass 2 to generate the final machine code output.
Figure: Working of Pass 1 and Pass 2 (own)
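The backpatching idea used by a one-pass assembler can be sketched in Python. This is only an illustration under stated assumptions: the `LABEL:` syntax, the one-word-per-statement sizing, addresses starting at 0, and the names `one_pass_assemble` and `pending` are all hypothetical and not taken from any particular assembler.

```python
def one_pass_assemble(lines):
    """One-pass sketch: returns (code, symtab, pending).

    code is a list of (mnemonic, operand_address) pairs, one per word.
    Assumption: every statement occupies exactly one word, starting at 0.
    """
    symtab = {}    # label -> address
    pending = {}   # undefined label -> indices of code entries awaiting a patch
    code = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].endswith(":"):
            label = tokens[0][:-1]
            symtab[label] = len(code)
            # Backpatch every earlier instruction that referenced this label.
            for index in pending.pop(label, []):
                code[index] = (code[index][0], symtab[label])
            tokens = tokens[1:]
            if not tokens:
                continue
        mnemonic = tokens[0]
        operand = tokens[1] if len(tokens) > 1 else None
        if operand is None:
            code.append((mnemonic, None))
        elif operand in symtab:
            code.append((mnemonic, symtab[operand]))
        else:
            # Forward reference: emit a hole and remember where it is.
            pending.setdefault(operand, []).append(len(code))
            code.append((mnemonic, None))
    return code, symtab, pending


program = ["JMP DONE", "LOOP: ADD X", "JMP LOOP", "DONE: STOP", "X: DATA"]
code, symtab, pending = one_pass_assemble(program)
for address, word in enumerate(code):
    print(address, word)
```

Any label still left in `pending` at the end was never defined; a real assembler would report it as an error or hand it to the loader as a fix-up record.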
Data structures central to assembler design include the symbol table, which stores each label (symbolic
address) along with its assigned address (and sometimes additional attributes like data type or section),
and the literal table, which holds literal constants encountered (e.g. numeric or string constants) and
their assigned memory addresses. Assemblers also use an opcode table (machine instruction table) mapping
mnemonics to binary opcode values and instruction formats (geeksforgeeks.org), and may use a pseudo-op
table for assembler directives (like START, END, BYTE, WORD, etc.). During assembly, a location counter
(LC) tracks the current instruction address as the assembler parses the program. The LC is incremented
according to the size of each assembled instruction or reserved data area, thereby computing addresses for
subsequent symbols. These data structures and the two-pass algorithm form the core of assembler design.
To illustrate the assembler’s workflow, consider a simple assembly snippet with labels and data declarations.
Example:
Assembly Code:
        START   100
LOOP    MOVER   AREG, A
        ADD     AREG, B
        MOVEM   AREG, C
        STOP
A       DC      5
B       DC      3
C       DS      1
        END
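Symbol Table (assuming each statement occupies one word, so the four instructions fill 100 to 103 and the data follows):
Symbol  Address
LOOP    100
A       104
B       105
C       106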
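The way Pass 1 arrives at these addresses can be sketched in a few lines of Python. This is a minimal sketch, not a full assembler: it assumes every instruction and every DC constant occupies one word, that DS reserves the stated number of words, and that any leading token that is not a known mnemonic or directive is a label. The names `pass1`, `MNEMONICS`, and `DIRECTIVES` are hypothetical.

```python
# Assumed instruction set and directives for the example program only.
MNEMONICS = {"MOVER", "ADD", "MOVEM", "STOP"}
DIRECTIVES = {"START", "END", "DC", "DS"}


def pass1(lines):
    """Build the symbol table while advancing the location counter (LC)."""
    symtab = {}
    lc = 0
    for line in lines:
        tokens = line.replace(",", " ").split()
        if not tokens:
            continue
        if tokens[0] == "START":
            lc = int(tokens[1])      # START sets the initial LC value
            continue
        if tokens[0] == "END":
            break
        # A first token that is neither a mnemonic nor a directive is a label.
        if tokens[0] not in MNEMONICS and tokens[0] not in DIRECTIVES:
            symtab[tokens[0]] = lc
            tokens = tokens[1:]
        if tokens[0] == "DS":
            lc += int(tokens[1])     # reserve the requested number of words
        else:
            lc += 1                  # one word per instruction or DC constant
    return symtab


SOURCE = """\
START 100
LOOP MOVER AREG, A
ADD AREG, B
MOVEM AREG, C
STOP
A DC 5
B DC 3
C DS 1
END
"""

symtab = pass1(SOURCE.splitlines())
print(symtab)
```

Pass 2 would then walk the same source again, replacing each operand symbol with the address recorded here.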
Table Processing
Searching Techniques
Efficient methods to access and retrieve data from tables:
• Binary Search: Efficient on sorted data; repeatedly halves the search interval, so a lookup takes O(log n) comparisons. (study in detail)
• Hashing: Maps keys directly to table indices, giving near constant-time access on average.
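Both techniques can be tried on a small symbol table. The sketch below is illustrative only: `binary_search` is a hypothetical helper, the table contents are taken from the example program earlier in these notes, and Python's built-in `dict` stands in for a hand-written hash table.

```python
def binary_search(entries, name):
    """Look up name in a list of (symbol, address) pairs sorted by symbol."""
    lo, hi = 0, len(entries) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        symbol, address = entries[mid]
        if symbol == name:
            return address
        if symbol < name:
            lo = mid + 1     # target lies in the upper half
        else:
            hi = mid - 1     # target lies in the lower half
    return None              # symbol is not in the table


# Binary search requires the table to be sorted by key.
entries = sorted([("LOOP", 100), ("A", 104), ("B", 105), ("C", 106)])

# Hashing: a dict maps each key to its slot without needing sorted order.
hashed = dict(entries)

print(binary_search(entries, "C"), hashed["C"])
print(binary_search(entries, "X"), hashed.get("X"))
```

The trade-off is that binary search needs the table kept in sorted order, while hashing needs extra space and a strategy for collisions (handled internally by `dict` here).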
Sorting Techniques
Methods for arranging data into a particular order:
• Bubble Sort: Repeatedly steps through the list, compares adjacent elements, and swaps them if
necessary.
• Insertion Sort: Builds the sorted array incrementally, placing each new element into its correct
position.
• Quick Sort: Uses divide-and-conquer, selecting a pivot and partitioning the array around it.
(Also read the benefits of each.)
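The three methods can be compared side by side in Python. These are teaching sketches rather than tuned implementations: each function returns a new sorted list, and the quick sort shown builds new sublists around the pivot instead of partitioning in place, which is a simplification of the usual algorithm.

```python
def bubble_sort(items):
    """Repeatedly swap adjacent out-of-order elements."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:      # no swaps means the list is already sorted
            break
    return result


def insertion_sort(items):
    """Grow a sorted prefix, inserting each new element in its place."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]    # shift larger elements right
            j -= 1
        result[j + 1] = current
    return result


def quick_sort(items):
    """Divide and conquer around a pivot (simplified, not in place)."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)


symbols = ["LOOP", "C", "A", "B", "A"]
print(bubble_sort(symbols))
print(insertion_sort(symbols))
print(quick_sort(symbols))
```

Bubble sort and insertion sort take O(n^2) comparisons in the worst case but are simple and work well on small or nearly sorted tables; quick sort averages O(n log n).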
Macro Instructions
A macro instruction can be thought of as a template or pattern for generating a sequence of assembly
instructions. The programmer defines a macro with a name and a body of code. Later, using the macro’s
name (a macro call) in the assembly code will cause the assembler’s macro processor to replace that single
macro call with the full sequence of instructions defined in the macro body (this is called macro expansion).
This all happens at assembly time (before actual machine code generation), so by the time the assembler
proper is translating to machine code, the macros have been expanded into ordinary assembly instructions.
For example, a simple macro that adds two numbers might look like:
ADD_TWO MACRO &X, &Y
        MOV R1, &X ; load X into register R1
        ADD R1, &Y ; add Y to R1
        MEND
(read in detail)
Implementation of a Macro Processor (Pass I and II Design)
The implementation of macro processing in assemblers can be done in different ways. One common design
is a two-pass macro processor which is often integrated with the assembler’s passes. Alternatively, some
assemblers handle macros in a single pass on the fly, but the two-pass approach is conceptually cleaner to
explain.
In a two-pass macro processor design:
Pass 1 (Macro Definition Pass): The assembler (or pre-assembler) scans the source code looking primarily
for macro definitions. When it encounters a MACRO directive, it knows a macro is being defined. It then
takes all lines until the matching MEND and stores them in a Macro Definition Table (MDT). It also creates
an entry in the Macro Name Table (MNT) for this macro name, which includes pointers or indices into
the MDT where that macro’s definition is stored. The macro’s parameters are noted as well (some macro
processors store a Parameter Name Table for default values or keyword parameters). Essentially, Pass 1
does not expand any macros; it only gathers and remembers macro definitions. Any normal assembly code
lines that are not part of macro definitions might be written to an intermediate file unchanged, but macro
definitions themselves are not passed to the next stage (the macro processor consumes them and emits
nothing or a placeholder in their place). By the end of Pass 1, the macro processor knows about all
macros (names and bodies) defined in the program. This allows macro calls to be resolved even if they
occur before the definition in the source file, as long as the definition exists somewhere, because
Pass 1 has already collected every definition.
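A minimal Python sketch of this definition pass is shown below. It assumes the header form `NAME MACRO params` used in the example earlier, a single-token macro name (written here as `ADD_TWO`, an assumption), and no nested definitions. The function name `macro_pass1` and the dictionary layout of the MNT are hypothetical.

```python
def macro_pass1(lines):
    """Collect macro definitions; return (mnt, mdt, remaining_source)."""
    mnt = {}       # macro name -> {"params": [...], "mdt_index": int}
    mdt = []       # body lines of all macros, each body ended by "MEND"
    output = []    # ordinary lines, passed on unchanged
    in_definition = False
    for line in lines:
        tokens = line.split()
        if in_definition:
            mdt.append(line.strip())
            if tokens and tokens[0] == "MEND":
                in_definition = False
        elif len(tokens) >= 2 and tokens[1] == "MACRO":
            name = tokens[0]
            params = [p.strip() for p in " ".join(tokens[2:]).split(",") if p.strip()]
            # Record where this macro's body starts in the MDT.
            mnt[name] = {"params": params, "mdt_index": len(mdt)}
            in_definition = True
        else:
            output.append(line)    # not part of any definition
    return mnt, mdt, output


source = [
    "ADD_TWO MACRO &X, &Y",
    "MOV R1, &X",
    "ADD R1, &Y",
    "MEND",
    "START 100",
    "ADD_TWO A, B",
    "END",
]
mnt, mdt, remaining = macro_pass1(source)
print(mnt)
print(mdt)
print(remaining)
```

Note that the call `ADD_TWO A, B` is left untouched in `remaining`; expanding it is the job of the second pass.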
Pass 2 (Macro Expansion Pass): In the second pass, the assembler actually processes the program for
translation, but now it is equipped with knowledge of macros. When it encounters a macro call (an opcode
that matches a name in the Macro Name Table), it will pause reading from the main input and instead
expand the macro: it looks up the macro’s body in the MDT and, for each line of the body, performs
argument substitution (replacing formal parameters with the actual arguments from the
call). These substituted lines are then inserted into the output (or into the assembly input stream) as if
they had come from the original source. The macro processor may use an Argument Table (ARGTAB) or
similar structure during expansion: when a macro call is recognized, it takes the arguments from the call and
stores them in ARGTAB indexed by the parameters. Then as it writes out the macro’s body from MDT,
whenever it sees a parameter symbol (like &X), it replaces it with the corresponding ARGTAB value (the
actual argument). If the macro body contains nested macro calls, the macro processor will recursively expand
those as well by looking up those macro names in MNT and processing similarly. The expansion process
often uses a Macro Expansion Counter (MEC) or similar to keep track of where in the macro definition it is,
and perhaps a stack to handle nested expansions. Once the macro expansion is done, the macro processor
resumes scanning the main input where it left off.
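The expansion pass can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the MNT and MDT are written out by hand in a hypothetical dictionary and list layout, parameters are matched positionally only, and nested calls are handled by plain recursion rather than an explicit stack (so a macro that calls itself would recurse without end).

```python
import re

# Hand-written tables, as Pass 1 might have left them (layout is an assumption).
MNT = {"ADD_TWO": {"params": ["&X", "&Y"], "mdt_index": 0}}
MDT = ["MOV R1, &X", "ADD R1, &Y", "MEND"]


def expand(lines, mnt, mdt):
    """Replace every macro call in lines with its substituted body."""
    output = []
    for line in lines:
        tokens = line.split(None, 1)
        if not tokens or tokens[0] not in mnt:
            output.append(line)          # ordinary statement, copy through
            continue
        entry = mnt[tokens[0]]
        actuals = [a.strip() for a in tokens[1].split(",")] if len(tokens) > 1 else []
        # ARGTAB: formal parameter -> actual argument for this call.
        argtab = dict(zip(entry["params"], actuals))
        body = []
        index = entry["mdt_index"]       # current position within the MDT
        while mdt[index] != "MEND":
            substituted = re.sub(
                r"&\w+",
                lambda m: argtab.get(m.group(0), m.group(0)),
                mdt[index],
            )
            body.append(substituted)
            index += 1
        # Recursing over the body expands any nested macro calls.
        output.extend(expand(body, mnt, mdt))
    return output


expanded = expand(["START 100", "ADD_TWO A, B", "END"], MNT, MDT)
print("\n".join(expanded))
```

The variable `index` plays the role of the counter that tracks the current position in the macro definition, and the Python call stack stands in for the explicit stack a macro processor would keep for nested expansions.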
Effectively, after Pass 2 (macro expansion), the assembler has a fully expanded source program with no
macros – just regular assembly instructions. At this point, the normal assembly translation (into machine
code) continues. In some designs, the macro expansion Pass 2 is interwoven with the actual code generation
pass of the assembler. In other designs, the macro processor is a distinct preprocessing step that outputs an
expanded source file which is then fed into a separate assembly phase.
One-pass macro processors: It’s worth noting that it’s possible to design a one-pass assembler that also
handles macros in one pass, but it requires that macros be defined before use (so definitions appear earlier
in the file than any call to them). If that rule is followed, then the assembler can expand macros on the fly:
whenever it sees a macro definition, store it; whenever it sees a macro call, expand it immediately. However,
if macros can be forward-referenced or especially if macros can be defined within other macros, one-pass
expansion becomes difficult. That’s why the two-pass approach is more general. For instance, if Macro A is
defined inside Macro B’s expansion, a one-pass assembler does not know about A until it has expanded B,
which is too late if A was already called earlier in the source. The two-pass macro processor cleanly separates
definition and use phases, similar to how a two-pass assembler separates label definition and usage.
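The define-before-use restriction is easy to see in a small one-pass sketch. The code below is hypothetical and simplified: it stores each definition as soon as it is read, expands calls immediately, does not expand calls nested inside macro bodies, and uses a made-up `INCR` macro for illustration.

```python
import re


def one_pass(lines):
    """Store definitions and expand calls in a single scan of the source."""
    macros = {}    # name -> (params, body lines)
    output = []
    source = iter(lines)
    for line in source:
        tokens = line.split(None, 2)
        if len(tokens) >= 2 and tokens[1] == "MACRO":
            params = [p.strip() for p in tokens[2].split(",")] if len(tokens) > 2 else []
            body = []
            for body_line in source:     # consume lines up to MEND
                if body_line.strip() == "MEND":
                    break
                body.append(body_line.strip())
            macros[tokens[0]] = (params, body)
        elif tokens and tokens[0] in macros:
            params, body = macros[tokens[0]]
            rest = line.split(None, 1)
            actuals = [a.strip() for a in rest[1].split(",")] if len(rest) > 1 else []
            argtab = dict(zip(params, actuals))
            for body_line in body:
                output.append(
                    re.sub(r"&\w+", lambda m: argtab.get(m.group(0), m.group(0)), body_line)
                )
        else:
            output.append(line)          # includes calls to not-yet-defined macros
    return output


defined_first = one_pass(["INCR MACRO &V", "ADD &V, ONE", "MEND", "INCR A"])
called_first = one_pass(["INCR A", "INCR MACRO &V", "ADD &V, ONE", "MEND"])
print(defined_first)
print(called_first)
```

When the call precedes the definition, the line `INCR A` passes through unexpanded because the macro is still unknown at that point, which is exactly the case a separate definition pass avoids.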
Data Structures for Macro Processor: As mentioned:
• MNT (Macro Name Table): stores macro names and pointers/indices to MDT entries, plus perhaps
parameter info (number of params, etc.).
• MDT (Macro Definition Table): stores the actual lines of macro bodies. Often, a special indicator (like
“MEND”) is stored as well to mark the end of each macro in this table.
• ALA (Argument List Array) or ARGTAB: used during expansion to map the macro definition’s parameters
to the invocation’s arguments.
• Optionally, KPDTAB (Keyword Parameter Default Table) and EVTAB (Expansion Time Variable Table)
if the macro language supports default parameters or special variables.