SS VVFGC New
Background
This chapter is an introduction to the design and implementation of system
software. System software consists of a variety of programs that support the
operation of a computer.
Software is a collection of data and instructions for controlling, integrating and
managing the hardware components of a computer and for performing specific tasks.
There are two types of software:
System software
Application software
1. Instruction Interpreter
2. Location Counter
3. Instruction Register
4. Working Registers
5. General Register
MEMORY
INSTRUCTION
An instruction is a single operation of a processor. Instructions may be of the following
types:
Arithmetic instructions
Logical instructions
Control or transfer instructions
Interrupt instructions
Instructions may have different formats depending on the type of their operands. The different
operand types are:
register operands
storage operands
immediate operands
The different instruction formats are:
I. RR (Register to Register) format
II. RX (Register and Indexed) format
III. RS (Register and Storage) format
IV. SI (Storage and Immediate) format
V. SS (Storage to Storage) format
i. Assembler: In the early days, programmers wrote
programs using 0's and 1's. Programmers found it difficult to write programs
in machine language. To overcome this difficulty, assembly language was
developed. It is a low-level programming language that allows a user to write
programs using letters and symbols (mnemonics), which are more easily
remembered. An assembler is a system program that converts programs written
in assembly language into machine language, which can be executed by a
computer.
ii. Loaders: A loader is the system program responsible for loading
programs and libraries into memory and preparing them for execution. The
assembler itself could place the object program directly in memory and transfer
control to it, so that the machine language program is executed. This approach
has two disadvantages:
i. Wastage of memory: the assembler itself occupies space in memory during
execution.
ii. Wasted translation time: the program must be retranslated with each
execution.
To avoid these problems a separate system program, the loader, is introduced.
If the program is very large, it is subdivided into smaller routines called
subroutines. The loader must perform the following four functions:
i. Allocation: Allocate space in memory for the programs.
ii. Linking: Resolve symbolic references between object decks.
iii. Relocation: Adjust all address-dependent locations.
iv. Loading: Physically place the machine instructions and data into memory.
Based on how these functions are carried out, loaders are divided into the
following types:
a) Compile and go
b) Absolute loader
c) Relocating loader
d) Direct linking loader
e) Dynamic loading and linking
iv. Compiler: As users began to concentrate on problems in areas such as
science, business and statistics, high-level languages were developed to
express certain problems more easily. COBOL, FORTRAN, PASCAL, ALGOL, C,
etc. are high-level languages, which are processed by compilers and interpreters. A
compiler is a software program that converts a high-level language program into machine
language, which can be executed by a computer.
Assembler
Compilers
Interpreter
Loaders
Linkers
Macros
Operating system
Assembler
Loaders: A loader is the system program responsible for loading programs
and libraries into memory and preparing them for execution.
Linkers: A linker is the system program that takes the object code generated
by the assembler or compiler and combines it to generate the executable
module.
Machine level language is the lowest and most elementary level of programming
language. The computer can understand only strings of binary digits (bits), 0 and
1. A programming language written in this form is called machine level language. Since a
computer is capable of recognizing electric signals, it understands machine
language directly. The symbol zero (0) stands for the absence of an electric pulse and the
symbol one (1) stands for the presence of an electric pulse.
Compiler | Interpreter
Scans the entire program and translates it as a whole into machine code. | Translates the program one statement at a time.
It takes a large amount of time to analyse the source code, but the overall execution time is comparatively faster. | It takes less time to analyse the source code, but the overall execution time is slower.
Generates intermediate object code which further requires linking, hence requires more memory. | No intermediate object code is generated, hence it is memory efficient.
The compilation is done before execution. | Compilation and execution take place simultaneously.
Displays all errors after compilation, all at the same time. | Displays the errors of each line one by one.
C, C++, C#, Scala and TypeScript use a compiler. | Shell scripts, Java, PHP, Perl, Python and Ruby use an interpreter.
Functions of assembler
Takes an assembly language program as input and produces its machine instructions
Converts each symbolic instruction into the corresponding machine instruction
Decides the proper instruction format
Converts data constants to internal machine representations
Writes the object program and the assembly listing
Types of Assembler
There are two types of assemblers, based on how many passes through the source are needed
(how many times the assembler reads the source) to produce the object file.
One-pass assemblers go through the source code once. Here the source program is translated
instruction by instruction. When the assembler meets a reference to a label, it leaves address space for the label,
and when it finds the declaration of the label, it fills in the address by back patching.
Multi-pass (two-pass) assemblers create a table with all symbols and their values in the first pass,
then use the table in later passes to generate code.
Forward referencing: Using an identifier before its declaration is called a forward reference.
That is, a variable or label is referenced before it is declared. Forward references
can be handled by a one-pass or a two-pass scheme.
In the one-pass scheme the source program is translated instruction by instruction.
The assembler leaves address space for a label when it is referenced, and when it finds the
declaration of the label, it fills in the address by back patching.
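Back patching can be illustrated with a minimal sketch. The toy instruction set below (JMP, ADD, HLT, one word per instruction) and the names `assemble_one_pass`, `symbols` and `pending` are assumptions made for illustration; they do not come from any particular assembler.

```python
# Hypothetical toy machine: every instruction occupies exactly one word.
def assemble_one_pass(lines):
    """Translate (label, opcode, operand) triples in a single pass."""
    symbols = {}    # label -> address, filled in as declarations are seen
    pending = {}    # label -> addresses whose operand field awaits patching
    code = []       # one [opcode, operand_address] pair per instruction

    for address, (label, opcode, operand) in enumerate(lines):
        if label is not None:
            symbols[label] = address
            # Back patching: fill every slot that referenced this label earlier.
            for slot in pending.pop(label, []):
                code[slot][1] = address
        if operand is None or operand in symbols:
            code.append([opcode, symbols.get(operand)])
        else:
            # Forward reference: leave the address field empty for now.
            pending.setdefault(operand, []).append(address)
            code.append([opcode, None])

    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return code


program = [
    (None, "JMP", "DONE"),    # forward reference to DONE
    ("LOOP", "ADD", None),
    (None, "JMP", "LOOP"),    # backward reference, resolved immediately
    ("DONE", "HLT", None),
]
object_code = assemble_one_pass(program)
```

The first JMP is emitted with an empty address field and is patched to address 3 only when the declaration of DONE is reached.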
The two-pass scheme consists of two passes.
During the first pass the symbol table, op-code table and label table are maintained.
The op-code table stores the size and address of each instruction.
The label table stores each label and its address. When a label is encountered, its name is
stored in the label table; when the label declaration is found, its location is also stored in
the label table.
During the second pass, translation from source language to machine language takes place. Instruction
addresses and label addresses from the tables are used in place of their names.
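The two passes can be sketched as follows. The opcode table `OPCODES`, its binary codes and instruction lengths are hypothetical values chosen for illustration, not those of a real machine.

```python
# Hypothetical opcode table: mnemonic -> (binary opcode, length in words).
OPCODES = {"LOAD": (0x01, 2), "ADD": (0x02, 2), "JMP": (0x03, 2), "HLT": (0x0F, 1)}


def pass1(lines):
    """Assign an address to every label by tracking the location counter."""
    labels, lc = {}, 0
    for label, mnemonic, _operand in lines:
        if label is not None:
            if label in labels:
                raise ValueError(f"duplicate label: {label}")
            labels[label] = lc
        lc += OPCODES[mnemonic][1]
    return labels


def pass2(lines, labels):
    """Generate code, replacing each label name by the address found in pass 1."""
    code = []
    for _label, mnemonic, operand in lines:
        opcode, length = OPCODES[mnemonic]
        code.append(opcode)
        if length == 2:
            code.append(labels[operand])
    return code


program = [
    (None, "JMP", "END"),     # forward reference: END is declared later
    ("TOP", "ADD", "TOP"),
    ("END", "HLT", None),
]
labels = pass1(program)
object_code = pass2(program, labels)
```

Because every label already has an address when pass 2 starts, the forward reference to END needs no patching.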
2.3 General design procedure: The designer should follow the steps below:
i. Specify the problem statement.
ii. Specify the data structures (databases).
iii. Define the format of the data structures.
iv. Specify the algorithm.
v. Look for modularity.
The assembler must (i) generate instructions and (ii) process pseudo-ops. This sequence of
operations is divided into two passes:
Pass 1 purpose: Define the symbols and literals
a. Determine the length of machine instructions (MOTGET1)
b. Keep track of the Location Counter (LC)
c. Remember the values of symbols until pass 2 (STSTO)
d. Process some pseudo-ops (POTGET1)
e. Remember the literals (LITSTO)
The above steps can be overviewed in the below diagram
Pass 2: databases
i. Copy of the source program input to pass-1
Usha Kamala B T,VVFGC, Tumakuru
Chapter-2 Assembler System Software
ii. LC: used to store the location of each instruction
iii. MOT: used to store each mnemonic with its length, binary opcode and
instruction format
iv. POT: used to store each pseudo-op and the index of its corresponding action
v. ST: prepared by pass 1; it contains each label and its value
vi. BT: used to store which registers are currently specified as base registers by USING
pseudo-ops, and the contents of these registers
vii. INST: a work space used to hold each instruction as its various parts are assembled, e.g. binary
opcode, register fields, length fields
viii. Print line: a work space used to produce a printed listing
ix. Punch card: a work space used for converting the assembled instructions into the
format needed by the loader
ii. Pseudo Operation Table (POT): this table contains each pseudo-op and an associated pointer to
the routine that processes it. The size of the POT is 8 bytes per entry. The contents of this table
are not filled in or altered during the assembly process. Pass 1 and pass 2 of the assembler use two
separate POTs. The diagram below shows the format of the POT.
iii. Symbol Table (ST) and Literal Table (LT): these tables contain the name of each entry, its value,
its length and a relative location indicator. The size of the ST is 14 bytes per
entry. The length field indicates the length (in bytes) of the instruction. The relative location
indicator tells the assembler whether the value of the symbol is absolute or relative to the base
of the program. The diagram below shows the format of the Symbol Table and Literal Table.
iv. Base Table (BT): the base table contains the relative address of the symbol. The size of the BT is
4 bytes per entry.
PASS 2 of the assembler: generate code. The purpose of pass 2 is to generate the machine code
and structure the generated code into the appropriate format for the loader.
Algorithm
1. Initialize the location counter to zero.
2. Read a statement from the copy of the source file made by pass 1.
3. Process the opcode field:
a. If it is a machine opcode, search the MOT to find the length, binary opcode and
format of the instruction. Then process the instruction based on
its format (RR, RX, RS, SS, etc.).
b. If it is a pseudo-op, act according to its type:
i. A USING or DROP pseudo-op may require additional
processing in pass 2.
ii. For a DC pseudo-op, convert the constant, output it and update the LC.
iii. For a DS pseudo-op, update the LC value.
iv. For a START or EQU pseudo-op, just print the card in the listing.
v. An END pseudo-op indicates the end of the source program and terminates the
assembly.
Macro call
An occurrence of the macro name in the source program is called a macro call. In other words, once the
macro has been defined, the use of the macro name in place of the sequence of instructions is
called a macro call or macro instruction.
For example we can assign the name INCR to the repeated sequence, as follows
:
:
INCR
:
:
INCR
:
:
Macro Expansion
When there is a macro call, the macro processor substitutes the macro definition in place of the
macro call. This is called macro expansion. In macro expansion, MACRO, MEND and the name of
the macro do not appear in the expanded source code. All other lines appear.
In the above program the sequences of instructions are similar but not identical. The first
sequence performs an operation using DATA1 as operand, the second sequence performs the
operation using DATA2 as operand, so the same operation is performed with different
parameters. Such parameters are known as macro instruction arguments or dummy
arguments. A dummy argument is specified on the macro name line with an ‘&’ as its first character.
The preceding program could be written as:
It is possible to supply more than one argument in a macro call. These arguments are separated
by commas.
MACRO
INCR &arg1, &arg2, &arg3
A 1, &arg1
A 2, &arg2
A 3, &arg3
MEND
.
.
INCR DATA1, DATA2, DATA3
.
.
Expanded source code
A 1, DATA1
A 2, DATA2
A 3, DATA3
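The expansion of a macro call with arguments can be sketched in a few lines. The function `expand` below is a hypothetical, single-level expander written for illustration: it assumes one statement per line, no nested definitions and no conditional pseudo-ops.

```python
def expand(source):
    """Expand macro calls in a list of source lines (single-level macros only)."""
    macros = {}      # macro name -> (dummy argument names, body lines)
    output = []
    lines = iter(source)
    for line in lines:
        words = line.split(None, 1)
        if words and words[0] == "MACRO":
            # The line after MACRO is the macro name line with its dummy arguments.
            name, _, params = next(lines).strip().partition(" ")
            params = [p.strip() for p in params.split(",") if p.strip()]
            body = []
            for body_line in lines:
                if body_line.strip() == "MEND":
                    break
                body.append(body_line)
            macros[name] = (params, body)
        elif words and words[0] in macros:
            params, body = macros[words[0]]
            args = [a.strip() for a in words[1].split(",")] if len(words) > 1 else []
            # Substitute longer names first so &arg1 never clobbers &arg10.
            pairs = sorted(zip(params, args), key=lambda pair: -len(pair[0]))
            for body_line in body:
                for param, arg in pairs:
                    body_line = body_line.replace(param, arg)
                output.append(body_line)
        else:
            output.append(line)
    return output


source = [
    "MACRO",
    "INCR &arg1, &arg2, &arg3",
    "A 1, &arg1",
    "A 2, &arg2",
    "A 3, &arg3",
    "MEND",
    "INCR DATA1, DATA2, DATA3",
]
expanded = expand(source)
```

As in the listing, MACRO, MEND and the macro name line do not appear in the output; only the body lines with the arguments substituted are emitted.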
(ii) AGO: It is an unconditional branching pseudo-op, also called a goto statement.
The general format is
AGO <sequence label>
i. Recognize macro definitions: Macro definitions are recognized by the MACRO and
MEND pseudo-ops. When macro definitions are nested, the macro processor must recognize the
nesting and correctly match the last or outer MEND with the first MACRO.
ii. Save the definitions: The macro processor stores all the macro instruction definitions in
memory, since it will need them for expanding macro calls.
iii. Recognize calls: It must recognize macro calls (i.e., macro names) that appear as operation
mnemonics.
iv. Expand calls and substitute arguments: The macro processor substitutes the dummy (macro
definition) arguments with the corresponding arguments from the macro call. The resulting assembly
language text is then substituted for the macro call.
Pass 2 databases:
i. The copy of the input macro source deck
ii. The output expanded source deck, to be used as input to the assembler
iii. The macro definition table (MDT), created by pass 1
iv. The macro name table (MNT), created by pass 1
v. The macro definition table pointer (MDTP), used to indicate the next line of text to be used
during macro expansion
vi. The argument list array (ALA), used to substitute macro call arguments for the index
markers in the stored macro definition
Argument List Array (ALA): The argument list array (ALA) maintains the details of the
parameters. The ALA is used in both pass 1 and pass 2, but its function is reversed between the passes.
In pass 1, when the macro definitions are stored, the arguments in the macro definitions are
replaced by index markers (#).
&LOOP1 INCR &arg1, &arg2, &arg3
#0 A 1, #1
A 2, #2
A 3, #3
MEND
In pass 2, the arguments in the macro call are substituted for the index markers stored in the macro
definition.
For example, consider the macro call
LOOP1 INCR DATA1, DATA2, DATA3
The macro call expander would prepare the following ALA
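The two opposite uses of the ALA can be sketched as a pair of functions. The names `to_index_markers` and `expand_call` are hypothetical, and the sketch assumes that index 0 is reserved for the label argument, as in the example above.

```python
import re


def to_index_markers(dummy_args, body):
    """Pass 1: replace each dummy argument in the body by its index marker (#n)."""
    index = {name: i for i, name in enumerate(dummy_args)}

    def marker(match):
        name = match.group()
        return f"#{index[name]}" if name in index else name

    return [re.sub(r"&\w+", marker, line) for line in body]


def expand_call(call_args, stored_body):
    """Pass 2: replace each index marker by the matching argument of the call."""
    return [re.sub(r"#(\d+)", lambda m: call_args[int(m.group(1))], line)
            for line in stored_body]


# ALA as built in pass 1 from the macro name line.
dummy_args = ["&LOOP1", "&arg1", "&arg2", "&arg3"]
body = ["&LOOP1 A 1, &arg1", "A 2, &arg2", "A 3, &arg3"]
stored = to_index_markers(dummy_args, body)

# ALA as built in pass 2 from the call LOOP1 INCR DATA1, DATA2, DATA3.
call_args = ["LOOP1", "DATA1", "DATA2", "DATA3"]
expanded = expand_call(call_args, stored)
```

Storing index markers instead of names means pass 2 never has to search for dummy argument names; it only indexes into the ALA prepared for the call.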
Macro name table (MNT): The MNT is used to store the names of the defined macros. Each MNT
entry consists of a macro name of 8 bytes and an MDT index of 4 bytes.
Therefore the size of the MNT is 12 bytes per entry.
flowchart of Pass1
Flowchart pass-2
Algorithm pass-2
Step 1: Read the next line from the copy of the source program made by pass 1.
Step 2a: Search the macro name table (MNT) for a match with the operation code, i.e. check whether
a macro call has been encountered.
i. If a macro name is found, set the macro definition table pointer (MDTP) to the corresponding
macro definition stored in the macro definition table (MDT) by assigning the MDT index
field of the MNT entry to MDTP.
ii. Prepare the argument list array (ALA).
iii. Increment the macro definition table pointer (MDTP).
iv. Get the next line from the macro definition table (MDT).
The figure below shows the different data structures described and their
relationship
[Figure: macro processor structure showing the procedures DEFINE, EXPAND, GETLINE and PROCESSLINE and the EXPANDING flag (TRUE/FALSE)]
Disadvantages
The program becomes too large to fit into the core of some machines
Program becomes complex with macros
NOTE:
A macro processor for assembly language can be implemented in various ways:
1. Independent two pass processor.
2. Independent one pass processor.
3. Processor incorporated into pass 1 of a standard two pass assembler.
Loaders
Introduction
In a computer operating system, a loader is a component that locates
a given program (which can be an application or, in some cases, part of
the operating system itself) in offline storage (such as a hard disk), loads it
into main storage (in a personal computer, this is called random access
memory) and gives the program control of the computer (allows it to execute
its instructions).
[Figure: source program, object module, binary programs and results]
Functions of Loader
The loader is responsible for activities such as allocation, linking,
relocation and loading.
1) Memory Allocation: It allocates space for the program in
memory by calculating the size of the program. This activity is
called allocation.
2) Linking: It resolves the symbolic references (code/data) between the
object modules by assigning addresses to all the user subroutines and library
subroutines. This activity is called linking.
3) Relocation: There are some address-dependent locations in the
program. Such address constants must be adjusted according to the
allocated space. This activity is called relocation.
4) Loading: Finally it places all the machine instructions and data of the
corresponding programs and subroutines into memory. The
program is then ready for execution. This activity is called
loading.
Memory
Fig: Loading scheme
LOADING SCHEMES
Based on the various functionalities of loader, there are various types
of loaders:
1) Compile and Go loader or Assemble and Go loader.
[Figure: source program -> compile-and-go translator -> program loaded in memory]
Advantages
This scheme is simple to implement because the assembler is placed in
one part of memory and the loader simply loads the assembled machine
instructions into memory.
Disadvantages
Some portion of memory is occupied by the assembler,
which is simply a waste of memory. As this scheme combines
assembler and loader activities, the combined program
occupies a large block of memory.
No .obj file is produced; the source code is directly
converted to executable form. Hence, even when the source has not
been modified, it must be assembled and executed
each time, which is time consuming.
It cannot handle multiple source programs or multiple programs
written in different languages, because an assembler
translates only one source language into one target language.
It is very difficult for a programmer to write an orderly modular
program and to maintain it, and
the "compile and go" loader cannot handle such programs.
The execution time is longer in this scheme because the program is
assembled every time before it is executed.
2) General Loader Scheme
[Figure: source program 1 and source program 2 -> translators -> object programs -> loader -> program ready for execution]
Text cards
This type of card is used to store instructions and data.
The capacity of this card is 80 bytes.
It must convey the machine instructions that the assembler has created
along with their assigned core locations.
Let's discuss the following example
Transfer cards
These cards must convey the entry point of the program, which is where
the loader is to transfer control when all instructions are loaded.
Advantages
1. It is simple to implement.
2. This scheme allows multiple programs or source programs
written in different languages.
If there are multiple programs written in different languages, the
respective language translator converts each one to machine language and a
common object file can be prepared with all the addresses resolved.
3. The task of the loader becomes simpler, as it simply obeys the instructions
regarding where to place the object code in main memory.
4. The process of execution is efficient.
Disadvantages
1. In this scheme the programmer must adjust all internal segment addresses, so
the programmer must know the memory management and the addresses of the
programs.
2. If any modification is done in one segment, then the starting addresses
also change.
3. If there are multiple segments, the programmer must
remember the addresses of all subroutines.
[Flowchart: Initialize -> Read card -> Card type]
Statement 1: Start.
Statement 2: Read the header record [first record or first line].
Statement 3: Obtain the program length.
Statement 4: Check the card type [text card or transfer card].
If it is a text card,
then store the data and instructions.
Else
transfer control to the entry point.
Statement 5: If the code is in character form, convert it into internal
representation.
Statement 6: Read the next object program.
Statement 7: End.
Or
Input: Object codes and starting addresses of the program segments.
Output: Executable code for the corresponding source program. This executable
code is to be placed in main memory.
Method:
Begin
For each program segment
do
Begin
Read the first line from the object module to obtain information about the
memory location. The starting address, say S, in the corresponding
object module is the memory location where the executable code is to
be placed.
Hence,
Memory location = S
Line counter = 1 (as it is the first line)
While (!end of file)
For the current object code
do
Begin
1. Read next line
2. Write line into location S
3. S = S + 1
4. Line counter = Line counter + 1
End
End
End
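The absolute loading method above can be sketched as follows. The record layout is an assumption made for illustration: a header record ("H", start, length), text records ("T", address, words) and a transfer record ("E", entry point); memory is modelled as a plain list of words.

```python
def absolute_load(object_module, memory):
    """Copy an object module into memory at the addresses named in its records.

    Returns the entry point to which control should be transferred.
    """
    records = iter(object_module)
    kind, start, length = next(records)       # header: starting address S and length
    if kind != "H":
        raise ValueError("object module must begin with a header record")
    if start + length > len(memory):
        raise MemoryError("program does not fit in memory")
    entry = start
    for record in records:
        if record[0] == "T":                  # text record: address plus machine words
            _, address, words = record
            memory[address:address + len(words)] = words
        elif record[0] == "E":                # transfer record: entry point
            entry = record[1]
            break
    return entry


memory = [0] * 16
module = [
    ("H", 4, 5),
    ("T", 4, [10, 11, 12]),
    ("T", 7, [13, 14]),
    ("E", 4),
]
entry_point = absolute_load(module, memory)
```

No relocation or linking takes place: every word lands exactly at the address the translator assigned, which is why the scheme is simple but inflexible.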
SUB-ROUTINE LINKAGES
A main program may transfer control to a subprogram, and that
subprogram may in turn transfer control to another program.
The assembler cannot resolve such a symbolic reference on its own,
hence it will report an error.
START 500
ENTRY TOTAL
EXTRN MAX, ALPHA
READ N
LOOP
MOVER AREG, ALPHA
BC ANY, MAX
.
.
.
BC LT, LOOP
STOP
TOTAL DS 1
DS 1
END
Program Unit Q
START 200
ENTRY ALPHA
.
ALPHA DS 25
END
[Figure: 32-bit instruction format with fields OP (8 bits), R1 (4 bits), X2 (4 bits) and A2 (16 bits)]
The list of symbols which are not defined in the current segment but can
be used in the current segment are stored in a data structure called
USE table. The list of symbols which are defined in the current segment
and can be referred by the other segments are stored in a data structure
called DEFINITION table.
[Figure: the output of the translator is given to the linker; the output of the linker is given to the loader, which loads it in memory]
2. TXT card
It contains the actual information or text that has already been
translated.
Card  Rel. addr  Label    Statement
1     0          PG1      START
2                         ENTRY PG1ENT1, PG1ENT2
3                         EXTRN PG2ENT1, PG2
4     20         PG1ENT1  _____
5     30         PG1ENT2  _____
6     40                  DC A(PG1ENT1)
7     44                  DC A(PG1ENT2+15)
8     48                  DC A(PG1ENT2-PG1ENT1-3)
9     52                  DC A(PG2)
10    56                  DC A(PG2ENT1+PG2-PG1ENT1+4)
11                        END
12    0          PG2      START
13                        ENTRY PG2ENT1
14                        EXTRN PG1ENT1, PG1ENT2
15    16         PG2ENT1  _____
16    24                  DC A(PG1ENT1)
17    28                  DC A(PG1ENT2+15)
18    32                  DC A(PG1ENT2-PG1ENT1-3)
19                        END
ESD Cards
The ESD cards contain the information necessary to build the
external symbol dictionary or symbol table. In the above source code the
symbols are PG1, PG1ENT1, PG1ENT2, PG2, PG2ENT1.
6 = A (PG1ENT1) = 20
7 = A (PG1ENT2+15) = 30 + 15 = 45
8 = A (PG1ENT2- PG1ENT1-3) = 30 – 20 – 3 = 7
9 = A (PG2) = 0
10 = A (PG2ENT1+PG2- PG1ENT1+4) = 0 + 0 - 20 + 4 = -16
Source card  ESD ID  Length  Flag  Relative address
6            02      4       +     40
7            02      4       +     44
9            03      4       +     52
10           02      4       +     56
10           03      4       +     56
10           02      4       -     56
Fig: Format of RLD card
16=A (PG1ENT1) = 0
17=A (PG1ENT2+15) =0+15=15
18=A (PG1ENT2-PG1ENT1-3) =0-0-3=-3
Pass 1 databases:
1. Input object decks.
2. The initial program load address [IPLA]: The IPLA is supplied by the
programmer or the operating system and specifies the address at which to load
the first segment.
3. Program load address counter [PLA]: It is used to keep track of each
segment's assigned location.
4. Global external symbol table [GEST]: It is used to store each external
symbol and its corresponding assigned core address.
5. A copy of the input, to be used later by pass 2.
6. A printed listing that specifies each external symbol and its assigned
value.
Pass 2 databases:
1. A copy of the object program, which is the input to pass 2.
2. The initial program load address [IPLA].
3. The program load address counter [PLA].
4. The global external symbol table [GEST].
[Figure: input object deck -> pass 1 of loader -> copy of object deck -> pass 2 of loader -> program loaded in memory; both passes use the program load address (PLA) and the global external symbol table (GEST)]
PASS 1: Algorithm
1. The program load address [PLA] is set to the initial program load
address [IPLA].
2. Read an object card.
3. Write a copy of the card for pass 2.
4. The card can be any one of the following types:
Text card or RLD card: no processing is required during pass 1;
read the next card.
ESD card: processed according to the type of the external symbol.
5. SD type: the length field LENGTH from the card is temporarily saved
in the variable SLENGTH.
The value VALUE assigned to this symbol is set to the current value of the
PLA.
The symbol and its assigned value are then stored in the GEST.
If the symbol already exists in the GEST, this is an error.
The symbol and its value are printed as part of the load map.
6. LD type: the value to be assigned is set to the current PLA plus the relative
address [ADDR] indicated on the ESD card.
7. ER type: the symbol does not require any processing during pass 1.
8. When an END card is encountered, the program load address is
incremented by the length of the segment saved in SLENGTH.
9. When the EOF card is read, pass 1 is complete and control transfers to pass 2.
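Pass 1 can be sketched as a loop over the ESD entries of each segment. The tuple layout (symbol, type, relative address, length), the segment lengths 60 and 36 and the IPLA value 100 are assumptions made for illustration; TXT and RLD cards are omitted because they need no processing in this pass.

```python
def loader_pass1(segments, ipla):
    """Build the GEST from the ESD entries of each segment.

    `segments` is a list of segments; each segment is a list of
    (symbol, type, relative_address, length) ESD entries.
    """
    gest = {}
    pla = ipla
    for esd_entries in segments:
        slength = 0
        for symbol, kind, addr, length in esd_entries:
            if kind == "SD":
                slength = length          # remember the segment length
                value = pla
            elif kind == "LD":
                value = pla + addr
            else:                         # ER: no processing in pass 1
                continue
            if symbol in gest:
                raise ValueError(f"duplicate external symbol: {symbol}")
            gest[symbol] = value
        pla += slength                    # END card: advance to the next segment
    return gest


pg1_esd = [("PG1", "SD", 0, 60), ("PG1ENT1", "LD", 20, 0), ("PG1ENT2", "LD", 30, 0),
           ("PG2", "ER", 0, 0), ("PG2ENT1", "ER", 0, 0)]
pg2_esd = [("PG2", "SD", 0, 36), ("PG2ENT1", "LD", 16, 0),
           ("PG1ENT1", "ER", 0, 0), ("PG1ENT2", "ER", 0, 0)]
gest = loader_pass1([pg1_esd, pg2_esd], ipla=100)
```

With these assumed values PG1 is placed at 100 and PG2 directly after it at 160, so PG2ENT1 (relative address 16) receives the absolute address 176.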
PASS 2 : Algorithm
If an address is specified on the END card, that address is used as the
execution start address; otherwise, execution begins from the first
segment.
At the beginning of pass 2 the program load address is initialized as in pass 1
and the execution start address [EXADDR] is set to IPLA.
In pass 2 the four types of card are read one by one and processed as follows:
1. ESD card
SD type => the length of the segment is temporarily saved in the variable
SLENGTH. The LESA (local external symbol array) entry LESA [ID] is set to the current value of the PLA.
LD type => does not require any processing during pass 2.
ER type => the GEST is searched for a match with the ER symbol. If it is
not found, this is an error. If it is found in the GEST, its value is
extracted and the corresponding LESA entry is set.
2. Text card: The text is copied from the card to the appropriate
relocated memory location [PLA+ADDR].
3. RLD card: The value to be used for relocation and linking is extracted
from the LESA as specified by the ID field. Depending upon the flag,
the value is either added to or subtracted from the address constant.
4. END card: If an execution start address is specified on the END card, it
is saved in the variable EXADDR. The PLA is incremented by the length of the
segment saved in SLENGTH.
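The relocation and linking done in pass 2 can be sketched for the address constants of PG1. Everything below is an illustrative assumption: the dictionary layout of a segment, the ESD IDs (1 for PG1, 2 for PG2, 3 for PG2ENT1), the segment length 60, the IPLA value 100 and the GEST contents that pass 1 would have produced for that IPLA.

```python
def loader_pass2(segments, gest, ipla):
    """Load, relocate and link segments; returns memory as {address: value}."""
    memory = {}
    pla = ipla
    for segment in segments:
        lesa = {}                             # ESD ID -> absolute address
        slength = 0
        for symbol, kind, esd_id, length in segment["esd"]:
            if kind == "SD":
                slength = length
                lesa[esd_id] = pla
            elif kind == "ER":
                if symbol not in gest:
                    raise KeyError(f"undefined external symbol: {symbol}")
                lesa[esd_id] = gest[symbol]
            # LD entries need no processing in pass 2.
        for addr, value in segment["txt"]:
            memory[pla + addr] = value        # relocated location PLA + ADDR
        for esd_id, flag, addr in segment["rld"]:
            delta = lesa[esd_id]
            memory[pla + addr] += delta if flag == "+" else -delta
        pla += slength                        # END card
    return memory


pg1 = {
    "esd": [("PG1", "SD", 1, 60), ("PG1ENT1", "LD", None, 0), ("PG1ENT2", "LD", None, 0),
            ("PG2", "ER", 2, 0), ("PG2ENT1", "ER", 3, 0)],
    # Address constants as assembled: 20, 45, 7, 0 and -16.
    "txt": [(40, 20), (44, 45), (48, 7), (52, 0), (56, -16)],
    "rld": [(1, "+", 40), (1, "+", 44), (2, "+", 52),
            (3, "+", 56), (2, "+", 56), (1, "-", 56)],
}
gest = {"PG1": 100, "PG1ENT1": 120, "PG1ENT2": 130, "PG2": 160, "PG2ENT1": 176}
memory = loader_pass2([pg1], gest, ipla=100)
```

The constant at relative address 48 has no RLD entry because PG1ENT2-PG1ENT1-3 is a difference of two addresses in the same segment and is therefore absolute. The constant at 56 ends up as 176 + 160 - 120 + 4 = 220.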
PASS 2: FLOWCHART
DYNAMIC LOADING
For all the loader schemes so far we have assumed that all of the subroutines
needed are loaded into memory at the same time.
If the total amount of memory required by all these subroutines exceeds
the amount available, as can happen for large programs, it is a problem to
load all the subroutines into memory at the same time. There are several
hardware techniques to solve this problem, such as paging and segmentation.
Usually the subroutines of a program are needed at different times, so a dynamic
loading scheme is used.
“The mechanism of loading a required routine or part of the program
into main memory only when it is called by the program is
called Dynamic Loading”, and this enhances the performance of the computer.
Example
A(20K)
B(20k)
C(30K)
D(10K)
E(20K)
Fig 1: Subroutine call between the procedures
The above figure illustrates a program consisting of five subprograms [A to E]
that requires 100K bytes of memory.
The arrows indicate that
• subprogram A calls only B, D and E
• subprogram B calls C and E
• subprogram D calls only E and
• subprograms C and E do not call any other subprograms or
routines.
[ Note that procedures B and D are never in use at the same time. ]
A(20K)
B(20K) D(10K)
C(30K) E(20K)
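The memory saved by this structure can be estimated with a short calculation. The sketch assumes that only the subprograms on the currently active call chain need to be resident; the names `SIZES`, `CALLS` and `peak_memory` are hypothetical.

```python
SIZES = {"A": 20, "B": 20, "C": 30, "D": 10, "E": 20}    # sizes in K bytes
CALLS = {"A": ["B", "D", "E"], "B": ["C", "E"], "C": [], "D": ["E"], "E": []}


def peak_memory(routine):
    """Largest total size along any call chain that starts at `routine`."""
    deepest = max((peak_memory(callee) for callee in CALLS[routine]), default=0)
    return SIZES[routine] + deepest


total = sum(SIZES.values())   # all subprograms resident at once
peak = peak_memory("A")       # only the active call chain resident
```

Here `total` is 100K, while `peak` is 70K, reached on the chain A, B, C. Under the stated assumption the program could run in 70K bytes instead of 100K.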
DYNAMIC LINKING
The main disadvantage of all of the previous loading schemes is that if a
subroutine is referenced but never executed, the loader still
incurs the overhead of linking the subroutine. In this mechanism the
loading and linking of external references are postponed until execution
time. The loader loads only the main program. If the main program
executes a transfer instruction to an external address or references an external
variable, the loader is called. Only then is the segment containing the
external reference loaded.
Advantages
1. No overhead is incurred unless the procedure to be called or
referenced is actually used.
2. The system can be dynamically reconfigured.
3. Saves memory.
Disadvantages:
1. What is a binder?
A program that performs allocation, relocation and linking is called a binder.
2. What are overlays?
The interdependency of the segments can be specified by a tree-like
structure called a static overlay structure.
3. What is dynamic loading?
Dynamic loading is the process in which a shared library is attached
to the address space of a process during execution.
4. What is a direct linking loader?
A direct linking loader is a general relocating loader that allows the
programmer to use multiple procedure segments and multiple data segments.
5. What is a relocating loader?
A relocating loader loads the program anywhere in memory, altering
the various addresses as required to ensure correct referencing.
COMPILER
TYPES OF COMPILER
The various types of compiler are
Cross compilers
One-pass or multi-pass compilers
Source-to-source compilers
Stage compilers
Just-in-time compilers
1. Cross Compiler
A cross compiler is a compiler that runs on one platform and creates
executable code for another platform. A cross compiler
separates the build environment from the target environment and is useful in a
number of ways, such as compiling for embedded systems and
compiling an operating system for the first time.
Uses of cross compilers
Embedded computers.
Compiling for multiple machines.
Use of virtual machines.
3. Source-to-source Compiler:
A source-to-source compiler is a compiler that takes a high-level language as
its input and produces output that is also written in a high-level language.
4. Stage Compiler:
A stage compiler compiles to the assembly language of an abstract machine. For
example, the Warren Abstract Machine (WAM) is used as the target of a stage compiler.
Terminal table: A permanent table that lists the keywords and special symbols of
the language.
Identifier table: It contains all variables in the program and temporary storage
(e.g. M1, M2, M3 ... M7) and the information needed to reference or allocate storage for
the variables. This table is created by lexical analysis.
Relocatable object code: The final output of the assembly phase, ready to be
used as input to the loader.
PHASES OF COMPILER
A compiler is broken into several logical phases, which helps it translate
source code efficiently and improves the performance of the compiler. The
common logical phases used in a compiler for translating source code
into target code are described below. To understand the phases, consider the following
examples:
Lexical phase
The lexical phase performs the following three tasks:
1. Recognize the basic elements, called tokens, present in the source code.
2. Build a literal table and an identifier table.
3. Build a uniform symbol table.
Database:
Lexical phase involves the manipulation of 5 databases
1. Source program
2. Terminal table
3. Literal table
4. Identifier table
5. Uniform symbol table
1.Source program: The original form of the program created by the user.
Example
Index  Symbol     Break character
1      :          Yes
2      ;          Yes
3      (          Yes
4      )          Yes
5      ,          Yes
6      *          Yes
7      Declare    No
8      Procedure  No
9      +          Yes
10     -          Yes
11     *          Yes
12     Rate       No
13     Start      No
14     Finish     No
Fig: Terminal Table
Literal Table:
It describes all literal constants used in the source program. It consists of 6 fields. Other
information and addresses are stored in later phases.
Example
Information
Identifier Table
It describes all identifiers used in the source program. It consists of three fields.
Uniform Symbol Table
It consists of 2 fields:
Table  Index
IDN    1      WCM
TRM    1      :
TRM    8      Procedure
TRM    3      (
IDN    2      Rate
TRM    5      ,
IDN    3      Start
TRM    5      ,
IDN    4      Finish
TRM    4      )
TRM    2      ;
Algorithm
Step 1: The first task of the lexical analysis algorithm is to parse the input
character string into tokens.
Step 2: The second task is to make the appropriate entries in the tables.
Implementation
The input string is separated into tokens by break characters. Break characters
are denoted by the contents of a special field in the terminal table.
Segregating Tokens
i. If the symbol is in the TERMINAL table, then create a uniform symbol table entry of type TRM
ii. Else if the symbol is in the IDENTIFIER table, then create a uniform symbol table entry of type
IDN
End if
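The two steps can be sketched for the statement used in the uniform symbol table example. The terminal table below reuses the first ten entries of the terminal table figure; the function names are hypothetical, and literals are left out to keep the sketch short.

```python
# Terminal table: position in the list plus one is the table index.
TERMINALS = [":", ";", "(", ")", ",", "*", "DECLARE", "PROCEDURE", "+", "-"]
BREAK_CHARACTERS = set(":;(),*+-")


def tokenize(text):
    """Step 1: split the input string into tokens at break characters and blanks."""
    tokens, current = [], ""
    for ch in text:
        if ch in BREAK_CHARACTERS or ch.isspace():
            if current:
                tokens.append(current)
                current = ""
            if not ch.isspace():
                tokens.append(ch)         # a break character is itself a token
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def lexical_phase(text):
    """Step 2: build the identifier table and the uniform symbol table."""
    identifiers, uniform = [], []
    for token in tokenize(text):
        if token.upper() in TERMINALS:
            uniform.append(("TRM", TERMINALS.index(token.upper()) + 1))
        else:
            if token not in identifiers:
                identifiers.append(token)
            uniform.append(("IDN", identifiers.index(token) + 1))
    return identifiers, uniform


identifiers, uniform = lexical_phase("WCM: PROCEDURE(RATE, START, FINISH);")
```

For this input the uniform symbol table comes out in the same order as the table shown earlier: IDN 1, TRM 1, TRM 8, TRM 3, IDN 2, TRM 5, IDN 3, TRM 5, IDN 4, TRM 4, TRM 2.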
Syntax Phase:
The functions of the syntax phase are:
DATABASES
1 Uniform symbol table: The uniform symbols are the source of input to the
stack, which is used by the syntax and interpretation phases.
3 Reduction table: The syntax rules of the source language are contained in
the reduction table. The syntax analysis phase is an interpreter driven by the
reductions.
The general form of the reduction or rules is:
1. Label:- optional
2. old top of stack:- to be compared to top of stack
3. Action routines:- to be called if old top of stack matches top of stack
4. new top of stack:- changes to be made to top of stack after action routines
are executed.
5. next reduction:- interpret the next reduction.
ALGORITHM
Step 1: Reductions are tested consecutively for a match between the old top of
stack field and the actual top of stack until a match is found.
Step 2: When a match is found, the action routines specified in the action
field are executed in order from left to right.
Step 3: When control returns to the syntax analyser, it modifies the top of
stack to agree with the new top of stack field.
Step 4: Step 1 is repeated starting with the reduction specified in the next
reduction field.
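The four steps can be illustrated with a small reduction-driven interpreter. This is a minimal sketch under stated assumptions: the markers ANY, NEXT and RESULT, the tuple layout of a reduction, and the tiny reduction table for sums such as a + b + c ; are all hypothetical and are not the notation used in these notes.

```python
ANY, NEXT, RESULT = "<any>", "<next>", "<result>"   # hypothetical markers

def matches(pattern, stack):
    """True if pattern equals the top len(pattern) stack entries."""
    if len(pattern) > len(stack):
        return False
    top = stack[len(stack) - len(pattern):]
    return all(p == ANY or p == s for p, s in zip(pattern, top))

def emit_add(matched, matrix):
    """Action routine: append a matrix entry for 'left op right'."""
    left, op, right = matched
    matrix.append((op, left, right))

# (label, old top of stack, action routines, new top of stack, next reduction)
REDUCTIONS = [
    ("RED", [ANY, "+", ANY], [emit_add], [RESULT], "RED"),
    (None,  [ANY, ";"],      [],         [],       None),   # end of statement
    (None,  [],              [],         [NEXT],   "RED"),  # push next token
]

def syntax_phase(tokens, reductions=REDUCTIONS, start="RED"):
    labels = {r[0]: i for i, r in enumerate(reductions) if r[0]}
    stack, matrix, remaining = [], [], list(tokens)
    i = labels[start]
    while True:
        # Step 1: test reductions consecutively until one matches.
        while not matches(reductions[i][1], stack):
            i += 1
        _, old, actions, new, nxt = reductions[i]
        matched = stack[len(stack) - len(old):]
        # Step 2: run the action routines from left to right.
        for action in actions:
            action(matched, matrix)
        # Step 3: replace the matched old top with the new top.
        del stack[len(stack) - len(old):]
        for sym in new:
            if sym == NEXT:
                stack.append(remaining.pop(0))
            elif sym == RESULT:
                stack.append(f"M{len(matrix)}")
            else:
                stack.append(sym)
        # Step 4: continue with the reduction named in the next field.
        if nxt is None:
            return matrix
        i = labels[nxt]
```

For the tokens a + b + c ; this sketch produces the matrix entries (+, a, b) and (+, M1, c). If the input ends before a ; is seen, remaining.pop(0) raises an IndexError, which a fuller version would report as a syntax error.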
Interpretation Phase
It is a collection of routines which are called when a construct is recognized.
Databases
1. Uniform symbol table: The table created by the lexical phase.
The uniform symbols are the source of input to the stack, which is used by the
syntax and interpretation phases.
Name  Base  Scale  Precision  Storage  Array  Structure  Literal  Block  Other  Address
Algorithm:
Optimization Phase
Optimization performed by a compiler is of 2 types. They are:
Data bases:
Literal table: It describes all literal constants used in the source program.
It consists of 6 fields. Other information and addresses are stored in later
phases.
Algorithm-
1. Place the matrix in a form so that common subexpressions can be recognized.
2. Recognize two subexpressions as being equivalent.
3. Eliminate one of them.
4. Alter the rest of the matrix to reflect the elimination of this entry.
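The four steps above might be sketched as follows. The function name, the tuple layout (operator, operand, operand) and the convention that entry k is referred to as Mk are assumptions made for illustration. All operands are strings, and an assignment is assumed to invalidate earlier expressions that used the assigned variable.

```python
COMMUTATIVE = {"+", "*"}

def eliminate_common_subexpressions(matrix):
    """matrix: list of (op, a, b); entry k (1-based) is referred to as 'Mk'.
    Returns {entry_number: (op, a, b)} for the entries that survive."""
    seen = {}       # canonical form -> name of the entry that computes it
    replaced = {}   # eliminated entry name -> surviving entry name
    result = {}
    for k, (op, a, b) in enumerate(matrix, start=1):
        # Step 4: refer to surviving entries instead of eliminated ones.
        a, b = replaced.get(a, a), replaced.get(b, b)
        if op == "=":
            # The assignment changes a, so forget expressions that used it.
            seen = {key: name for key, name in seen.items()
                    if a not in key[1:]}
            result[k] = (op, a, b)
            continue
        # Step 1: canonical form (operand order is ignored for + and *).
        key = (op, *sorted((a, b))) if op in COMMUTATIVE else (op, a, b)
        if key in seen:
            # Steps 2 and 3: an equivalent entry exists, so drop this one.
            replaced[f"M{k}"] = seen[key]
        else:
            seen[key] = f"M{k}"
            result[k] = (op, a, b)
    return result
```

Applied to the matrix for A = B*(C-D) and E = (C-D)*5, the sketch drops the second C - D entry and rewrites the multiplication by 5 to use M1.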
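Example 1:
The following tables show the matrix with common subexpressions and the matrix
after elimination.
Matrix with a common subexpression (entries 1 and 2 are equivalent):
1   *   A    B
2   *   A    B
3   +   M1   M2
4   =   SUM  M3
Matrix before elimination (M1 and M4 are equivalent):
M1  -   C    D
M2  *   B    M1
M3  =   A    M2
M4  -   C    D
M5  *   M4   5
M6  =   E    M5
Matrix after elimination (M5 now refers to M1):
M1  -   C    D
M2  *   B    M1
M3  =   A    M2
M4  -   C    D    (eliminated)
M5  *   M1   5
M6  =   E    M5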
Example: A = 5*4/2*B
Before optimization
M1  *   5    4
M2  /   M1   2
M3  *   M2   B
M4  =   A    M3
After optimization
M1  (eliminated)
M2  (eliminated)
M3  *   10   B
M4  =   A    M3
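The compile-time computation in this example can be sketched as a single pass over the matrix. This is an illustrative sketch only: the function name and tuple layout are assumptions, literals are represented as Python ints, and integer division is assumed for the / operator.

```python
import operator

# Assumed operator set; "/" uses integer division in this sketch.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def fold_constants(matrix):
    """matrix: list of (op, a, b); operands are ints (literals) or names.
    Returns {entry_number: (op, a, b)} for the entries that remain."""
    values = {}   # entry name -> constant computed at compile time
    result = {}
    for k, (op, a, b) in enumerate(matrix, start=1):
        # Substitute constants already computed for earlier entries.
        a, b = values.get(a, a), values.get(b, b)
        if op in OPS and isinstance(a, int) and isinstance(b, int):
            values[f"M{k}"] = OPS[op](a, b)   # entry disappears
        else:
            result[k] = (op, a, b)
    return result
```

For the matrix above, entries M1 and M2 are computed at compile time and only M3 (* 10 B) and M4 (= A M3) remain.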
If a or b AND c
If the computation within a loop depends on a variable that does not change
within the loop, the computation may be moved outside the loop.
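A small before-and-after illustration of moving an invariant computation out of a loop is given below. The function and variable names are hypothetical and were chosen only for this sketch.

```python
def scale_all(values, rate, hours):
    out = []
    for v in values:
        # rate * hours does not change inside the loop,
        # yet it is recomputed on every iteration.
        out.append(v * (rate * hours))
    return out

def scale_all_hoisted(values, rate, hours):
    factor = rate * hours   # invariant computation moved outside the loop
    out = []
    for v in values:
        out.append(v * factor)
    return out
```

Both functions return the same result; the second performs the multiplication rate * hours only once.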
In order to save memory and improve execution speed, machine-dependent
optimization is also needed.
Example 1:
Consider A=B+C
The following depicts the code for the above statement
Matrix     Original Code     Better Code
+ B C      1) L  1, B        1) L  1, B
           2) A  1, C        2) A  1, C
           3) ST 1, M1       3) ST 1, A
= A M1     4) L  1, M1
           5) ST 1, A
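The improvement from the original code to the better code can be pictured as removing a store to a temporary that is immediately followed by a load of the same temporary into the same register. The sketch below is one possible way to express that; the function name, the (opcode, register, operand) tuples and the temporaries parameter are assumptions made for illustration.

```python
def remove_redundant_store_load(code, temporaries):
    """code: list of (opcode, register, operand) tuples.
    temporaries: set of operand names that are compiler temporaries."""
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code):
            (op1, r1, x1), (op2, r2, x2) = code[i], code[i + 1]
            # Count every instruction that mentions the stored operand.
            uses = sum(1 for _, _, x in code if x == x1)
            if ((op1, op2) == ("ST", "L") and r1 == r2 and x1 == x2
                    and x1 in temporaries and uses == 2):
                i += 2   # drop both: the value is already in the register
                continue
        out.append(code[i])
        i += 1
    return out
```

With the original code for A=B+C and M1 as the only temporary, the sketch returns L 1,B followed by A 1,C and ST 1,A, which is the better code shown in the table.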
Example 2:
Optimisation (Machine dependent)
After elimination:
1   *   A    B
3   +   M1   M1
4   =   SUM  M3
Assembly Phase
If the majority of the work is done by the code generation phase, then the
assembly phase does the following:
It resolves label references in the object program.
It formats the object deck.
It formats the appropriate information for the loader.
1. The assembly phase scans the object code, resolving all label references
and producing the TXT card.
2. Then it scans the identifier table to create ESD cards.
3. The RLD cards are created using the object code, the ESD cards and the
identifier table.
TXT card contains the actual assembled program.
ESD card contains information about all symbols that are defined in the
program but referenced elsewhere, and symbols that are referenced in the
current program but defined elsewhere.
RLD card contains information about the location of each constant to be
changed due to relocation.
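The label-resolving scan of the assembly phase can be sketched very roughly as follows. This is a simplified illustration, not the actual card format: the function name, the (label, opcode, operand) tuples and the fixed 4-byte instruction length are all assumptions.

```python
def resolve_labels(object_code, instruction_length=4):
    """object_code: list of (label_or_None, opcode, operand) tuples,
    one per instruction. Returns (opcode, operand) pairs in which every
    operand that names a label is replaced by that label's address."""
    # First scan: record the address at which each label is defined.
    addresses = {}
    for n, (label, _, _) in enumerate(object_code):
        if label is not None:
            addresses[label] = n * instruction_length
    # Second scan: replace label references by addresses.
    return [(op, addresses.get(operand, operand))
            for _, op, operand in object_code]
```

Operands that are not labels, such as the data names B and C, are left unchanged in this sketch.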
PASSES OF A COMPILER
Pass 1:
It corresponds to the lexical analysis of a compiler. It scans the source program
and creates the identifiers, literals and uniform symbol tables.
Pass 2:
It corresponds to the syntax and interpretation phases. Pass 2 scans the
uniform symbol table and produces the matrix.
Pass 3 through Pass N-3 means Pass 4:
They correspond to the optimization phase.
BALR                                      USING
It is a machine-op                        It is a pseudo-op
Sets the register with the next address   Only provides information to the assembler