Assemblers, Table Processing, and Macro Processors: A Comprehensive Overview

The document provides a comprehensive overview of assemblers, table processing, and macro processors, detailing how assemblers translate assembly language into machine code through a two-pass system involving symbol and literal tables. It also discusses efficient searching and sorting techniques for data tables, as well as the implementation and features of macro processors that enhance code reusability and flexibility in assembly language. Key data structures such as the Macro Name Table and Macro Definition Table are highlighted in the context of macro expansion during assembly.

Uploaded by

nilakshbe
1. Assemblers
An assembler is a system software tool that translates programs written in assembly language (which use
human-readable mnemonics) into machine code (binary instructions). This translation is necessary because
computers execute only machine code. Assemblers also handle address allocation for labels and generate
information for loaders/linkers. Modern assemblers range from basic one-to-one translators to more advanced
programs offering macro facilities and optimizations.

Design of an Assembler
Assemblers traditionally operate in one or more passes over the source program. A one-pass assembler scans
the source code once and attempts to translate on the fly, handling forward-references by backpatching or
generating fix-up records for the loader. In contrast, a two-pass assembler (the most common design) makes
two passes: Pass 1 reads the source, processing declarations and pseudo-ops, and builds tables (like the
symbol table for labels and a literal table for constants) while computing addresses; Pass 2 then uses the
tables from Pass 1 to translate instructions into machine code. The diagram below illustrates a typical
two-pass assembler flow, where an Intermediate Representation (IR) and a symbol table produced in Pass 1
are used by Pass 2 to generate the final machine code output.
Figure: Working of Pass 1 and Pass 2 (own diagram)

Data structures central to assembler design include the symbol table, which stores each label (symbolic
address) along with its assigned address (and sometimes additional attributes like data type or section),
and the literal table, which holds literal constants encountered (e.g. numeric or string constants) and
their assigned memory addresses. Assemblers also use an opcode table (machine instruction table) mapping
mnemonics to binary opcode values and instruction formats (geeksforgeeks.org), and may use a pseudo-op
table for assembler directives (like START, END, BYTE, WORD, etc.). During assembly, a location counter
(LC) tracks the current instruction address as the assembler parses the program. The LC is incremented
according to the size of each assembled instruction or reserved data area, thereby computing addresses for
subsequent symbols. These data structures and the two-pass algorithm form the core of assembler design.
To illustrate the assembler’s workflow, consider a simple assembly snippet with labels and data declarations.

Example:
Assembly Code:

        START   100
LOOP    MOVER   AREG, A
        ADD     AREG, B
        MOVEM   AREG, C
        STOP
A       DC      5
B       DC      3
C       DS      1
        END

Symbol Table:

Symbol  Address
LOOP    100
A       104
B       105
C       106
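The addresses in the symbol table can be reproduced with a short Pass 1 sketch. It is only an illustration under stated assumptions: every instruction and every DC constant is taken to occupy one word, DS reserves as many words as its operand says, and the names `pass1`, `SOURCE`, and `OPERATIONS` are made up for this example.

```python
# Operations the sketch recognizes; anything else in the first field is a label.
OPERATIONS = {"START", "END", "DC", "DS", "MOVER", "ADD", "MOVEM", "STOP"}

SOURCE = """\
        START   100
LOOP    MOVER   AREG, A
        ADD     AREG, B
        MOVEM   AREG, C
        STOP
A       DC      5
B       DC      3
C       DS      1
        END
"""


def pass1(source):
    """Return (symbol table, intermediate records of (LC, operation, operands))."""
    symtab, intermediate, lc = {}, [], 0
    for line in source.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A leading field that is not a known operation is treated as a label.
        label = fields.pop(0) if fields[0] not in OPERATIONS else None
        op, operands = fields[0], " ".join(fields[1:])
        if op == "START":
            lc = int(operands)          # set the location counter
            continue
        if op == "END":
            break
        if label is not None:
            symtab[label] = lc          # label gets the current LC value
        intermediate.append((lc, op, operands))
        # Assumption: one word per statement, except DS which reserves N words.
        lc += int(operands) if op == "DS" else 1
    return symtab, intermediate


symtab, intermediate = pass1(SOURCE)
for name, address in symtab.items():
    print(name, address)
```

Running it on the snippet assigns LOOP the START address and places A, B, and C directly after the four instructions, which matches the table above.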

Symbol Table, Literal Table, and Intermediate Code


We have already touched on the roles of the symbol table and literal table in assembler design. The symbol
table (SYMTAB) is a critical data structure that stores each symbol (usually labels in the assembly code,
which mark memory addresses of instructions or data) along with attributes like its address, data type,
length, etc. During Pass 1, when the assembler encounters a label (e.g. LABEL1:), it enters it into the
symbol table with the current location counter value as its address. If a symbol is referenced in an operand
but not yet defined, an entry is made with a placeholder address and a note that it’s undefined (a forward
reference). Later, when the symbol is defined, the table is updated with the actual address, and any
instruction already emitted with the placeholder is patched to match (a process called backpatching). In Pass 2,
any time a symbol appears as an operand, the assembler simply looks up its address in the symbol table and
inserts that into the machine code instruction.
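The forward-reference handling described above can be sketched as a one-pass variant that keeps a fix-up list per undefined symbol. This is a minimal sketch, not a real assembler: addresses start at 0, every statement is assumed to occupy one word, and the mnemonics (JMP, ADD, STOP, DC) and the function name `assemble_one_pass` are hypothetical.

```python
def assemble_one_pass(statements):
    """statements: list of (label or None, opcode, operand symbol or None)."""
    symtab = {}   # symbol -> address
    fixups = {}   # undefined symbol -> indices of code words waiting for it
    code = []     # each entry is [opcode, operand address or None]
    for address, (label, opcode, operand) in enumerate(statements):
        if label is not None:
            symtab[label] = address
            # Backpatch every earlier instruction that referenced this label.
            for index in fixups.pop(label, []):
                code[index][1] = address
        if operand is None:
            code.append([opcode, None])
        elif operand in symtab:
            code.append([opcode, symtab[operand]])
        else:
            code.append([opcode, None])                     # placeholder address
            fixups.setdefault(operand, []).append(address)  # remember to patch it
    return code, symtab, fixups


program = [
    (None, "JMP", "DONE"),   # forward reference to DONE
    ("TOP", "ADD", "X"),     # forward reference to X
    (None, "JMP", "TOP"),    # backward reference, resolved immediately
    ("DONE", "STOP", None),
    ("X", "DC", None),
]
code, symtab, fixups = assemble_one_pass(program)
print(code)
```

Any symbol still left in `fixups` at the end was never defined, which is how such a design would detect an undefined-symbol error.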
The literal table (LITTAB) similarly stores literal constants that appear in the code. In many assembly
languages, a literal is indicated by a prefix such as = (for example MOV R1, =5 uses the constant 5 as
an operand). The assembler collects these literals because it typically needs to allocate storage for them at
some point (often at the end of the program or at literal pool directives like LTORG). The literal table maps
each literal value to a location in memory where that value will be stored. During assembly, when a literal
is first encountered, it’s added to LITTAB with no address. When a literal pool is processed (either at an
explicit LTORG or at the end of the program), the assembler assigns each pending literal an address (often
at the current end of the program) and writes the literal value to the object code at that location. In Pass
2, references to the literal are replaced with the literal’s address from the table. This way, multiple uses of
the same literal value can refer to one memory location (saving space), and the actual binary value is stored
in the object code at a known place.
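A literal table with pool assignment might look like the sketch below. The class name `LiteralTable` and its methods are invented for illustration, each literal is assumed to take one word, and the sketch simplifies real behavior by letting a literal that reappears after a pool has been placed reuse its earlier address.

```python
class LiteralTable:
    """Minimal LITTAB sketch: literals get addresses only when a pool is placed."""

    def __init__(self):
        self.entries = []   # [literal text, address or None], in order of first use
        self.index = {}     # literal text -> position in entries

    def add(self, literal):
        """Record a literal on first sight; repeated uses share one entry."""
        if literal not in self.index:
            self.index[literal] = len(self.entries)
            self.entries.append([literal, None])
        return self.index[literal]

    def assign_pool(self, lc):
        """At LTORG or END: address every pending literal, return the new LC."""
        for entry in self.entries:
            if entry[1] is None:
                entry[1] = lc
                lc += 1     # assumption: one word per literal
        return lc

    def address_of(self, literal):
        return self.entries[self.index[literal]][1]


littab = LiteralTable()
for lit in ["=5", "=1", "=5"]:      # "=5" is used twice but stored once
    littab.add(lit)
next_lc = littab.assign_pool(200)   # pool placed at address 200
print(littab.entries, next_lc)
```

Because both uses of `=5` resolve to the same entry, Pass 2 can substitute a single address for every reference to that literal.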
Intermediate code formats: Between Pass 1 and Pass 2, many assemblers produce an intermediate file or
representation. This intermediate code is often a line-by-line translation of the original source into a more
uniform format that is easier for Pass 2 to process. For example, after Pass 1, an intermediate line might
contain the original instruction with an appended address field for any symbols. If a symbol was undefined
in Pass 1, the intermediate code might carry a placeholder or an index into a forward-reference list. The
intermediate file typically includes the computed LC (location counter) values for each instruction and data
definition, so that Pass 2 knows the address at which it should generate code.
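To show how Pass 2 might consume such an intermediate form, here is a sketch that takes (LC, operation, operands) records plus a symbol table and emits simple object-code tuples. The numeric opcode and register codes in `OPTAB` and `REGS` are hypothetical values chosen for this example, as is the tuple layout.

```python
# Hypothetical opcode and register codes, chosen only for this sketch.
OPTAB = {"STOP": 0, "ADD": 1, "MOVER": 4, "MOVEM": 5}
REGS = {"AREG": 1}


def pass2(intermediate, symtab):
    """Turn (lc, op, operands) records into (address, opcode, register, operand)."""
    object_code = []
    for lc, op, operands in intermediate:
        if op == "DC":
            object_code.append((lc, 0, 0, int(operands)))  # constant stored as data
        elif op == "DS":
            continue                                       # storage reserved, nothing emitted
        elif operands:
            reg, symbol = [part.strip() for part in operands.split(",")]
            # The symbol's address comes straight from the Pass 1 symbol table.
            object_code.append((lc, OPTAB[op], REGS[reg], symtab[symbol]))
        else:
            object_code.append((lc, OPTAB[op], 0, 0))
    return object_code


intermediate = [
    (100, "MOVER", "AREG, A"), (101, "ADD", "AREG, B"), (102, "MOVEM", "AREG, C"),
    (103, "STOP", ""), (104, "DC", "5"), (105, "DC", "3"), (106, "DS", "1"),
]
symtab = {"LOOP": 100, "A": 104, "B": 105, "C": 106}
for record in pass2(intermediate, symtab):
    print(record)
```

Since the LC values travel with each record, Pass 2 never has to recompute addresses; it only performs table lookups.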

Table Processing
Searching Techniques
Efficient methods to access and retrieve data from tables:

• Linear Search: Sequential checking of each element. (study in detail)

• Binary Search: Efficient on sorted data, repeatedly divides search interval by half. (study in detail)
• Hashing: Maps keys directly to table indices for quick access.
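The three techniques can be compared on a small symbol table. The sketch below is illustrative only: the table contents reuse the earlier example, the hash function (sum of character codes modulo the bucket count) is a deliberately simple assumption, and collisions are handled by chaining.

```python
def linear_search(table, key):
    """Check each (name, address) entry in turn; works on unsorted tables."""
    for i, (name, _) in enumerate(table):
        if name == key:
            return i
    return -1


def binary_search(sorted_table, key):
    """Halve the search interval each step; the table must be sorted by name."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if name < key:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def bucket_of(key, size):
    """Toy hash function: sum of character codes modulo the table size."""
    return sum(map(ord, key)) % size


def hash_insert(buckets, key, value):
    buckets[bucket_of(key, len(buckets))].append((key, value))


def hash_lookup(buckets, key):
    """Only the one bucket selected by the hash is scanned."""
    for name, value in buckets[bucket_of(key, len(buckets))]:
        if name == key:
            return value
    return None


SYMBOLS = [("A", 104), ("B", 105), ("C", 106), ("LOOP", 100)]  # sorted by name
buckets = [[] for _ in range(7)]
for name, address in SYMBOLS:
    hash_insert(buckets, name, address)

print(linear_search(SYMBOLS, "C"), binary_search(SYMBOLS, "LOOP"), hash_lookup(buckets, "B"))
```

Linear search needs no preparation but inspects up to n entries, binary search needs a sorted table but inspects about log2(n), and hashing inspects only one bucket at the cost of maintaining the hash table.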

Sorting Techniques
Methods for arranging data into a particular order:
• Bubble Sort: Repeatedly steps through the list, compares adjacent elements, and swaps them if
necessary.
• Insertion Sort: Builds the sorted array incrementally, placing each new element into its correct
position.
• Quick Sort: Uses divide-and-conquer, selecting a pivot and partitioning the array around it.
(Also read the benefit of each)
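The three sorting methods listed above can be sketched as follows. These are textbook-style illustrations rather than tuned implementations: each function returns a new list, and the quick sort shown builds sublists instead of partitioning in place, which is a simplification made for readability.

```python
def bubble_sort(items):
    """Swap adjacent out-of-order pairs until a full sweep makes no swap."""
    data = list(items)
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:
            break               # already sorted, stop early
    return data


def insertion_sort(items):
    """Grow a sorted prefix, shifting larger elements right to make room."""
    data = list(items)
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = current
    return data


def quick_sort(items):
    """Pick a pivot, split into smaller/equal/larger, and recurse on the parts."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)


names = ["LOOP", "C", "A", "B"]
print(bubble_sort(names), insertion_sort(names), quick_sort(names))
```

Sorting a symbol table by name in this way is what makes the binary search technique applicable to it.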

Macro Language and Macro Processor


Assemblers often provide a macro language feature that allows programmers to define macros – essentially,
parametric abbreviations for sequences of assembly instructions. A macro instruction (or just macro) is
a user-defined operation that expands into one or more real assembly instructions during assembly time.
The software component that handles this expansion is called the macro processor (or macro preprocessor, when
it runs as a separate step before the assembler). Macros are a powerful facility because they enable code reuse and extensibility
in assembly language, which lacks functions or procedures in the high-level sense. In this section, we will
discuss macro instructions and their format, features of macro facilities (expansion, parameters, nesting),
how a macro processor is implemented (typically with two passes, analogous to an assembler), and real-world
applications and benefits of using macros.

Macro Instructions
A macro instruction can be thought of as a template or pattern for generating a sequence of assembly
instructions. The programmer defines a macro with a name and a body of code. Later, using the macro’s
name (a macro call) in the assembly code will cause the assembler’s macro processor to replace that single
macro call with the full sequence of instructions defined in the macro body (this is called macro expansion).
This all happens at assembly time (before actual machine code generation), so by the time the assembler
proper is translating to machine code, the macros have been expanded into ordinary assembly instructions.
For example, a simple macro that adds two numbers might look like:
ADD_TWO MACRO   &X, &Y
        MOV     R1, &X      ; load X into register R1
        ADD     R1, &Y      ; add Y to R1
        MEND
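What expansion of this macro produces can be shown with a few lines of Python. This is a minimal sketch of parameter substitution only; the helper name `expand` and the actual arguments NUM1 and NUM2 are hypothetical.

```python
import re

MACRO_PARAMS = ["&X", "&Y"]                    # formal parameters of the macro above
MACRO_BODY = ["MOV R1, &X", "ADD R1, &Y"]      # body lines between MACRO and MEND


def expand(params, body, args):
    """Replace each formal parameter in the body with its actual argument."""
    binding = dict(zip(params, args))
    pattern = re.compile(r"&\w+")
    return [
        pattern.sub(lambda m: binding.get(m.group(0), m.group(0)), line)
        for line in body
    ]


# A call with the (hypothetical) arguments NUM1 and NUM2:
for line in expand(MACRO_PARAMS, MACRO_BODY, ["NUM1", "NUM2"]):
    print(line)
```

The single call line is replaced by two ordinary instructions, one loading NUM1 into R1 and one adding NUM2 to it, before the assembler proper ever sees the code.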

Features of Macro Facility


• Code Reusability: Enables repeated use of code fragments without duplication.
• Parameterized Macros: Allows macro definitions to accept parameters, enhancing flexibility.
• Conditional Assembly: Includes or omits instructions in an expansion depending on conditions evaluated at assembly time.

(read in detail)

Implementation of a Macro Processor (Pass I and II Design)
The implementation of macro processing in assemblers can be done in different ways. One common design
is a two-pass macro processor which is often integrated with the assembler’s passes. Alternatively, some
assemblers handle macros in a single pass on the fly, but the two-pass approach is conceptually cleaner to
explain.
In a two-pass macro processor design:
Pass 1 (Macro Definition Pass): The assembler (or pre-assembler) scans the source code looking primarily
for macro definitions. When it encounters a MACRO directive, it knows a macro is being defined. It then
takes all lines until the matching MEND and stores them in a Macro Definition Table (MDT). It also creates
an entry in the Macro Name Table (MNT) for this macro name, which includes pointers or indices into
the MDT where that macro’s definition is stored. The macro’s parameters are noted as well (some macro
processors store a Parameter Name Table for default values or keyword parameters). Essentially, Pass 1
does not expand any macros; it only gathers and remembers macro definitions. Any normal assembly code
lines that are not part of macro definitions might be written to an intermediate file unchanged, but macro
definitions themselves are not passed to the next stage (the macro processor consumes them and emits
nothing, or a placeholder, in their place). By the end of Pass 1, the macro processor knows about all
macros (names and bodies) defined in the program. This allows macro calls to be resolved even if they occur “before”
the definition in the source file, as long as the definition exists somewhere (Pass 1 ensures everything is
collected).
Pass 2 (Macro Expansion Pass): In the second pass, the assembler actually processes the program for
translation, but now it is equipped with knowledge of macros. When it encounters a macro call (an opcode
that matches a name in the Macro Name Table), it will pause reading from the main input and instead
expand the macro: it looks up the macro’s body from the MDT, then for each line of the macro body, it
takes it and performs argument substitution (replace formal parameters with the actual arguments from the
call). These substituted lines are then inserted into the output (or into the assembly input stream) as if
they had come from the original source. The macro processor may use an Argument Table (ARGTAB) or
similar structure during expansion: when a macro call is recognized, it takes the arguments from the call and
stores them in ARGTAB indexed by the parameters. Then as it writes out the macro’s body from MDT,
whenever it sees a parameter symbol (like &X), it replaces it with the corresponding ARGTAB value (the
actual argument). If the macro body contains nested macro calls, the macro processor will recursively expand
those as well by looking up those macro names in MNT and processing similarly. The expansion process
often uses a Macro Expansion Counter (MEC) or similar to keep track of where in the macro definition it is,
and perhaps a stack to handle nested expansions. Once the macro expansion is done, the macro processor
resumes scanning the main input where it left off.
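The two passes described above can be sketched compactly. This is an illustration under several assumptions, not a reference design: definitions use the `NAME MACRO params` form, a macro call has the macro name as its first field (no label on the call line), parameters are positional only, and the names `macro_pass1` and `macro_pass2` are invented here.

```python
import re


def macro_pass1(lines):
    """Collect definitions into MNT and MDT; return them with the remaining source."""
    mnt, mdt, rest = {}, [], []
    it = iter(lines)
    for line in it:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "MACRO":
            params = [p.strip() for p in " ".join(fields[2:]).split(",") if p.strip()]
            mnt[fields[0]] = (len(mdt), params)   # MDT index of the first body line
            for body_line in it:                  # copy the body up to and including MEND
                mdt.append(body_line.strip())
                if body_line.strip() == "MEND":
                    break
        else:
            rest.append(line)                     # ordinary line, passed through
    return mnt, mdt, rest


def macro_pass2(mnt, mdt, lines):
    """Expand every macro call, recursing so that nested calls are expanded too."""
    out = []
    for line in lines:
        fields = line.split(None, 1)
        if fields and fields[0] in mnt:
            start, params = mnt[fields[0]]
            actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
            argtab = dict(zip(params, actuals))   # ARGTAB: formal -> actual
            body, mec = [], start                 # MEC walks through the MDT
            while mdt[mec] != "MEND":
                body.append(re.sub(r"&\w+", lambda m: argtab.get(m.group(0), m.group(0)), mdt[mec]))
                mec += 1
            out.extend(macro_pass2(mnt, mdt, body))
        else:
            out.append(line)
    return out


source = [
    "ADD_TWO MACRO &X, &Y",
    "MOV R1, &X",
    "ADD R1, &Y",
    "MEND",
    "START 100",
    "ADD_TWO A, B",
    "STOP",
    "END",
]
mnt, mdt, rest = macro_pass1(source)
print(macro_pass2(mnt, mdt, rest))
```

The recursion in `macro_pass2` plays the role of the expansion stack mentioned above: each nested call gets its own ARGTAB and MEC, and the outer expansion resumes when the inner one returns.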
Effectively, after Pass 2 (macro expansion), the assembler has a fully expanded source program with no
macros – just regular assembly instructions. At this point, the normal assembly translation (into machine
code) continues. In some designs, the macro expansion Pass 2 is interwoven with the actual code generation
pass of the assembler. In other designs, the macro processor is a distinct preprocessing step that outputs an
expanded source file which is then fed into a separate assembly phase.
One-pass macro processors: It’s worth noting that it’s possible to design a one-pass assembler that also
handles macros in one pass, but it requires that macros be defined before use (so definitions appear earlier
in the file than any call to them). If that rule is followed, then the assembler can expand macros on the fly:
whenever it sees a macro definition, store it; whenever it sees a macro call, expand it immediately. However,
if macros can be forward-referenced or especially if macros can be defined within other macros, one-pass
expansion becomes difficult. That’s why the two-pass approach is more general. For instance, if Macro A is
defined inside Macro B’s expansion, a one-pass assembler would not know about A until it had expanded B,
which is too late for any call to A that appears earlier in the source. The two-pass macro processor cleanly separates
definition and use phases, similar to how a two-pass assembler separates label definition and usage.
Data Structures for Macro Processor: As mentioned:
• MNT (Macro Name Table): stores macro names and pointers/indices to MDT entries, plus perhaps
parameter info (number of params, etc.).
• MDT (Macro Definition Table): stores the actual lines of macro bodies. Often, a special indicator (like
“MEND”) is stored as well to mark the end of each macro in this table.
• ALA (Argument List Array) or ARGTAB: used during expansion to map the macro definition’s parameters
to the invocation’s arguments.
• Optionally, KPDTAB (Keyword Parameter Default Table) and EVTAB (Expansion Time Variable Table),
if the macro language supports default parameters or special variables.
