Language Processor Notes

Uploaded by

finooshasv
Copyright
© © All Rights Reserved

Familiarization of Language Processing

Programs are usually written in high-level languages such as C, C++, Java, and Python,
because it is very difficult to write a program directly in machine code. A program written
in such a language is called source code. The computer cannot execute source code directly,
so it must first be translated into machine language. This translation is done by a language
processor: a special type of software program that translates program code into machine
code.
Types of language processors
There are mainly three kinds of language processors, which are discussed below:

1. Compiler: A compiler reads the complete source program, written in a high-level
language, in one go and translates it into machine language. The computer can then
execute the translated code.

Phases of compiler
Lexical Analysis
This phase scans the source code as a stream of characters and groups the characters into
meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens as:

<token-name, attribute-value>
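The scanning step can be sketched in Python as follows. This is only a minimal illustration: the token classes (NUMBER, ID, OP) and the function name tokenize are assumptions made for this example, not part of any particular compiler.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token-name, attribute-value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":          # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = y + 42"))
# [('ID', 'x'), ('OP', '='), ('ID', 'y'), ('OP', '+'), ('NUMBER', '42')]
```

Each pair in the result corresponds to one token in the form shown above: the token name first, then the lexeme as its attribute value.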
Syntax Analysis
The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, the arrangement
of tokens is checked against the grammar of the source language, i.e. the parser checks
whether the expression formed by the tokens is syntactically correct.
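As a rough sketch of parsing, the Python code below builds a syntax tree from a list of tokens using recursive descent. The grammar, the token names and the nested-tuple tree format are all assumptions chosen for this example. Real parsers are often generated from a grammar by a tool instead of being written by hand.

```python
def parse(tokens):
    """Parse a list of (name, value) tokens into a nested-tuple syntax tree.

    Assumed grammar (hypothetical, for illustration only):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUMBER | ID
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, value = advance()
        if kind not in ("NUMBER", "ID"):
            raise SyntaxError(f"expected operand, found {value!r}")
        return (kind, value)

    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            _, op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            _, op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":                     # leftover tokens: not a valid expression
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

tokens = [("ID", "a"), ("OP", "+"), ("NUMBER", "2"), ("OP", "*"), ("ID", "b")]
print(parse(tokens))
# ('+', ('ID', 'a'), ('*', ('NUMBER', '2'), ('ID', 'b')))
```

The tree places the multiplication below the addition, which reflects the higher precedence given to * in the assumed grammar. A token sequence that does not fit the grammar raises SyntaxError.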
Semantic Analysis
Semantic analysis checks whether the parse tree follows the semantic rules of the language,
for example that values are assigned between compatible data types. The semantic analyzer
also keeps track of identifiers, their types and expressions, and checks whether identifiers
are declared before use. It produces an annotated syntax tree as output.
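Two of these checks, declaration before use and type compatibility, can be sketched in Python as below. The statement format (tuples starting with "declare" or "assign") and the function name check are hypothetical and chosen only to keep the example short. A real semantic analyzer works on the syntax tree.

```python
def check(statements):
    """Return a list of semantic error messages.

    Each statement is a hypothetical tuple:
        ("declare", name, type_name)   # type_name is "int" or "str"
        ("assign", name, value)        # value is a Python int or str literal
    """
    symbols = {}                       # identifier -> declared type
    errors = []
    for number, stmt in enumerate(statements, start=1):
        if stmt[0] == "declare":
            _, name, type_name = stmt
            if name in symbols:
                errors.append(f"line {number}: '{name}' already declared")
            else:
                symbols[name] = type_name
        elif stmt[0] == "assign":
            _, name, value = stmt
            value_type = type(value).__name__
            if name not in symbols:
                errors.append(f"line {number}: '{name}' used before declaration")
            elif symbols[name] != value_type:
                errors.append(
                    f"line {number}: cannot assign {value_type} to {symbols[name]} '{name}'"
                )
    return errors

program = [
    ("declare", "x", "int"),
    ("assign", "x", 5),
    ("assign", "y", 1),        # y was never declared
    ("assign", "x", "hi"),     # type mismatch
]
for message in check(program):
    print(message)
# line 3: 'y' used before declaration
# line 4: cannot assign str to int 'x'
```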
Intermediate Code Generation
After semantic analysis, the compiler generates an intermediate code from the source code.
This code represents the program for some abstract machine and lies between the high-level
language and the machine language. It should be generated in such a way that it is easy to
translate into the target machine code.
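One widely used intermediate form is three-address code, in which every statement has at most one operator. The notes do not name a specific form, so three-address code is an assumption here. The sketch below flattens an expression tree into such statements. The tree format and the temporary names t1, t2, ... are likewise assumptions.

```python
import itertools

def to_three_address(tree):
    """Flatten an expression tree into three-address code.

    A leaf is a string (identifier or constant); an interior node is a
    tuple (operator, left, right). Temporaries t1, t2, ... are made-up names.
    """
    code = []
    counter = itertools.count(1)

    def visit(node):
        if isinstance(node, str):
            return node                        # leaves need no code
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        temp = f"t{next(counter)}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = visit(tree)
    return code, result

code, result = to_three_address(("+", "a", ("*", "b", "c")))
print(code)     # ['t1 = b * c', 't2 = a + t1']
print(result)   # t2
```

Each generated line is simple enough to map to one or two machine instructions, which is what makes this form easy to translate further.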
Code Optimization
The next phase optimizes the intermediate code. Optimization removes unnecessary lines of
code and rearranges the sequence of statements so that the program runs faster and does not
waste resources (CPU, memory).
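Removing unnecessary code can be illustrated with dead code elimination, one of many optimization techniques. The sketch below assumes straight-line code with no jumps, stored as (target, operand, operator, operand) tuples. Both the representation and the function name are made up for this example.

```python
def eliminate_dead_code(code, live_out):
    """Drop assignments whose results are never used afterwards.

    code     : list of (target, operand1, operator, operand2) tuples,
               where operands are names (str) or constants (int)
    live_out : set of names that are still needed after this code
    Assumes straight-line code without jumps.
    """
    live = set(live_out)
    kept = []
    for target, a, op, b in reversed(code):    # scan from the last statement
        if target in live:
            kept.append((target, a, op, b))
            live.discard(target)               # this statement defines target
            live.update(x for x in (a, b) if isinstance(x, str))
        # otherwise the result is never read, so the statement is dropped
    kept.reverse()
    return kept

code = [
    ("t1", "a", "+", "b"),
    ("t2", "a", "*", 2),       # t2 is never used
    ("t3", "t1", "-", "c"),
]
print(eliminate_dead_code(code, {"t3"}))
# [('t1', 'a', '+', 'b'), ('t3', 't1', '-', 'c')]
```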
Code Generation
In this phase, the code generator takes the optimized intermediate code and maps it to the
target machine language. It translates the intermediate code into a sequence of (generally)
relocatable machine instructions that performs the same task as the intermediate code.
2. Assembler

An assembler converts programs written in assembly language into machine code. Assembly
language is also referred to as assembler language by some users. The source program,
consisting of assembly language instructions, is the input to the assembler. The assembler
translates this source code into code that the computer can understand, called object code
or machine code.
The translation of a source program into object code requires the following basic
functions:

1. Convert mnemonic operation codes to their machine language equivalents.
2. Convert symbolic operands to their equivalent machine addresses.
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal machine
representations.
5. Write the object program and assembly listing.

A simple object program contains three types of records: Header record, Text record and
End record. The Header record contains the program name, starting address and length. A Text
record contains the translated instructions and data of the program, together with an
indication of the addresses where these are to be loaded. The End record marks the end of the
object program and specifies the address where execution is to begin.

The format of each record is as given below.


Header record:

Col. 1      H
Col. 2-7    Program name
Col. 8-13   Starting address of object program (hexadecimal)
Col. 14-19  Length of object program in bytes (hexadecimal)

Text record:

Col. 1      T
Col. 2-7    Starting address for object code in this record (hexadecimal)
Col. 8-9    Length of object code in this record in bytes (hexadecimal)
Col. 10-69  Object code, represented in hexadecimal (2 columns per byte of object code)

End record:

Col. 1      E
Col. 2-7    Address of first executable instruction in object program (hexadecimal)
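The three record layouts can be checked with a few lines of Python. The function names and the sample values (program name COPY, start address 1000) are assumptions for illustration only. The column widths follow the formats listed above.

```python
def header_record(name, start, length):
    """Col 1: H, col 2-7: name, col 8-13: start address, col 14-19: length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_code):
    """Col 1: T, col 2-7: start address, col 8-9: length in bytes, col 10-69: code.

    object_code is a hex string with two characters per byte (at most 30 bytes).
    """
    if len(object_code) % 2 or len(object_code) > 60:
        raise ValueError("object code must be 1 to 30 whole bytes in hex")
    return f"T{start:06X}{len(object_code) // 2:02X}{object_code}"

def end_record(first_instruction):
    """Col 1: E, col 2-7: address of the first executable instruction."""
    return f"E{first_instruction:06X}"

print(header_record("COPY", 0x1000, 0x107A))        # HCOPY  00100000107A
print(text_record(0x1000, "141033482039001036"))    # T00100009141033482039001036
print(end_record(0x1000))                           # E001000
```

The program name is padded with blanks to six columns, and all addresses and lengths are written as fixed-width hexadecimal so that every field starts at a known column.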

The assembler can be designed either as a single-pass assembler or as a two-pass assembler.
In a two-pass assembler, the two passes do the following:
• Pass 1 (define symbols)

– Assign addresses to all statements in the program
– Save the addresses assigned to all labels for use in Pass 2
– Perform some processing of assembler directives, including those for address assignment,
such as BYTE and RESW

• Pass 2 (assemble instructions and generate object program)

– Assemble instructions (generate opcode and look up addresses)
– Generate data values defined by BYTE, WORD
– Perform processing of assembler directives not done during Pass 1
– Write the object program and the assembly listing
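The Pass 2 work of assembling one statement can be sketched as below. Several things are assumed for the sake of a small example: a 3-byte instruction made of a 1-byte opcode followed by a 2-byte address, the opcode values in the table, the BYTE operand syntax C'...' and X'...', and a label table already filled in by Pass 1. A real assembler also handles addressing modes and writes the listing.

```python
# Illustrative tables; the opcode values and addresses are assumptions.
OPCODES = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}
LABELS = {"ALPHA": 0x1009, "BETA": 0x100C, "GAMMA": 0x100F}   # as left by Pass 1

def assemble_line(mnemonic, operand):
    """Return the object code (hex string) for one source statement."""
    if mnemonic in OPCODES:
        if operand not in LABELS:
            raise KeyError(f"undefined symbol {operand}")
        # opcode in the first byte, operand address in the next two bytes
        return f"{OPCODES[mnemonic]:02X}{LABELS[operand]:04X}"
    if mnemonic == "WORD":
        return f"{int(operand):06X}"           # one 3-byte integer constant
    if mnemonic == "BYTE":
        kind, body = operand[0], operand[2:-1] # C'EOF' -> 'EOF', X'F1' -> 'F1'
        return body.encode("ascii").hex().upper() if kind == "C" else body.upper()
    if mnemonic in ("START", "END", "RESW", "RESB"):
        return ""                              # these generate no object code
    raise ValueError(f"invalid operation code {mnemonic}")

print(assemble_line("LDA", "ALPHA"))    # 001009
print(assemble_line("ADD", "BETA"))     # 18100C
print(assemble_line("WORD", "5"))       # 000005
print(assemble_line("BYTE", "C'EOF'"))  # 454F46
```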

The simple assembler uses two major internal data structures: the Operation Code Table
(OPTAB) and the Symbol Table (SYMTAB).

OPTAB:

·It is used to look up mnemonic operation codes and translate them to their machine
language equivalents. In more complex assemblers the table also contains information about
instruction format and length.
·In Pass 1, OPTAB is used to look up and validate the operation codes in the source
program. In Pass 2, it is used to translate the operation codes to machine language.
·OPTAB is usually organized as a hash table, with the mnemonic operation code as the key.
·In most cases OPTAB is a static table, that is, entries are not normally added to or
deleted from it.

SYMTAB

This table includes the name and value (address) for each label in the source program,
together with flags to indicate error conditions (e.g., a symbol defined in two different
places).
·During Pass 1: Labels are entered into the symbol table, along with their assigned
addresses, as they are encountered. The addresses of all symbols should be resolved by the
end of Pass 1.
·During Pass 2: Symbols used as operands are looked up in the symbol table to obtain the
addresses to be inserted in the assembled instructions.

Apart from SYMTAB and OPTAB, the location counter (LOCCTR) is another important variable;
it helps in the assignment of addresses. LOCCTR is initialized to the beginning address
specified in the START statement of the program. After each statement is processed, the
length of the assembled instruction or generated data is added to LOCCTR so that it points
to the next statement. Whenever a label is encountered in the source program, the current
value of LOCCTR gives the address to be associated with that label.
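The way LOCCTR drives address assignment in Pass 1 can be sketched as follows. The statement lengths are assumptions for this example (every instruction and every WORD takes 3 bytes, RESW reserves 3 bytes per word, RESB one byte per unit), and the small set of mnemonics stands in for a full OPTAB.

```python
INSTRUCTIONS = {"LDA", "ADD", "STA"}    # stand-in for OPTAB; assumed 3 bytes each

def statement_length(mnemonic, operand):
    """Number of bytes the statement occupies (assumed sizes, see above)."""
    if mnemonic in INSTRUCTIONS or mnemonic == "WORD":
        return 3
    if mnemonic == "RESW":
        return 3 * int(operand)
    if mnemonic == "RESB":
        return int(operand)
    if mnemonic == "BYTE":
        body = operand[2:-1]            # text between the quotes
        return len(body) if operand[0] == "C" else len(body) // 2
    raise ValueError(f"invalid operation code {mnemonic}")

def pass_one(lines):
    """lines: list of (label, mnemonic, operand); label may be None.

    Returns (symtab, duplicates, program_length).
    """
    symtab, duplicates = {}, set()
    start = locctr = 0
    for label, mnemonic, operand in lines:
        if mnemonic == "START":
            start = locctr = int(operand, 16)   # operand is a hex address
            continue
        if mnemonic == "END":
            break
        if label:
            if label in symtab:
                duplicates.add(label)           # error flag: defined twice
            else:
                symtab[label] = locctr          # label gets the current LOCCTR
        locctr += statement_length(mnemonic, operand)
    return symtab, duplicates, locctr - start

source = [
    ("SUM", "START", "1000"),
    (None, "LDA", "ALPHA"),
    (None, "ADD", "BETA"),
    (None, "STA", "GAMMA"),
    ("ALPHA", "WORD", "5"),
    ("BETA", "WORD", "7"),
    ("GAMMA", "RESW", "1"),
    (None, "END", None),
]
symtab, duplicates, length = pass_one(source)
print({name: hex(addr) for name, addr in symtab.items()})
# {'ALPHA': '0x1009', 'BETA': '0x100c', 'GAMMA': '0x100f'}
print(length)   # 18
```

The three instructions occupy addresses 1000 to 1008, so the first label, ALPHA, receives 1009. The program length is the final LOCCTR value minus the starting address.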

3. Macroprocessor
A macro represents a group of commonly used statements in the source programming
language. The macro processor replaces each macro instruction with the corresponding group
of source language statements. This is known as the expansion of macros. Macro
processing involves definition, invocation, and expansion.
There are three main data structures involved in a macro processor. The macro definitions
themselves are stored in a definition table (DEFTAB), which contains the macro
prototype and the statements that make up the macro body (with a few modifications).
Comment lines from the macro definition are not entered into DEFTAB because they will
not be part of the macro expansion. References to the macro instruction parameters are
converted to a positional notation for efficiency in substituting arguments. The macro
names are entered into NAMTAB, which serves as an index to DEFTAB. For each macro
instruction defined, NAMTAB contains pointers to the beginning and end of the
definition in DEFTAB. The third data structure is an argument table (ARGTAB),
which is used during the expansion of macro invocations. When a macro invocation
statement is recognized, the arguments are stored in ARGTAB according to their
position in the argument list. As the macro is expanded, arguments from ARGTAB are
substituted for the corresponding parameters in the macro body.
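The interplay of DEFTAB, NAMTAB and ARGTAB can be sketched with a small one-pass macro processor in Python. The source syntax is an assumption for this example: the macro name comes before the keyword MACRO, parameters start with &, MEND closes a definition, comment lines start with a period, and ?1, ?2, ... is the positional notation. The sketch also assumes that no parameter name is a prefix of another, that a macro has fewer than ten parameters, and that invocations carry no label.

```python
def expand_macros(lines):
    """Return the source lines with all macro invocations expanded."""
    deftab = []     # DEFTAB: prototype and body lines of every macro
    namtab = {}     # NAMTAB: macro name -> (first, last) index into deftab
    output = []
    source = iter(lines)
    for line in source:
        fields = line.split()
        if not fields:
            continue
        if len(fields) >= 2 and fields[1] == "MACRO":       # definition
            params = fields[2].split(",") if len(fields) > 2 else []
            first = len(deftab)
            deftab.append(line.strip())                      # macro prototype
            for body in source:
                if body.split() == ["MEND"]:
                    break
                if body.lstrip().startswith("."):            # comments are not stored
                    continue
                for index, param in enumerate(params, start=1):
                    body = body.replace(param, f"?{index}")  # positional notation
                deftab.append(body.strip())
            namtab[fields[0]] = (first, len(deftab) - 1)
        elif fields[0] in namtab:                            # invocation
            argtab = fields[1].split(",") if len(fields) > 1 else []   # ARGTAB
            first, last = namtab[fields[0]]
            for body in deftab[first + 1 : last + 1]:        # skip the prototype
                for index, arg in enumerate(argtab, start=1):
                    body = body.replace(f"?{index}", arg)
                output.append(body)
        else:
            output.append(line.strip())                      # ordinary statement
    return output

program = [
    "SWAP MACRO &A,&B",
    ". exchange two words",
    "LDA &A",
    "LDX &B",
    "STA &B",
    "STX &A",
    "MEND",
    "SWAP P,Q",
    "RSUB",
]
print(expand_macros(program))
# ['LDA P', 'LDX Q', 'STA Q', 'STX P', 'RSUB']
```

After the definition is processed, DEFTAB holds the prototype followed by LDA ?1, LDX ?2, STA ?2 and STX ?1, and NAMTAB maps SWAP to the index range 0 to 4. The invocation SWAP P,Q fills ARGTAB with P and Q, which replace ?1 and ?2 during expansion.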
