

CSC437, Compiler Construction II


Assoc. Prof. Oguike, Osondu Everestus
Department of Computer Science and Mathematics,
Godfrey Okoye University of Nigeria,
Enugu

Course Contents
1. Grammar and language
2. Recognizer
3. Top down and bottom up parsing
4. Production language
5. Run time storage organization
6. The use of display in run time storage allocation
7. LR grammar and analyzer
8. Construction of LR table
9. Organization of symbol tables
10. Allocation of storage to runtime variables
11. Code generation
12. Optimization
13. Translator writing system


1 Introduction to Compiler Construction


1.0 Introduction
A compiler is a piece of software that converts a source program written in a high-level
programming language into target code in another, lower-level language. This course focuses on the
theoretical concepts in the construction (engineering) of a compiler. By contrast, an interpreter with
substitution converts a line of the program (an instruction) into target code just before executing that
line, while a pure interpreter executes each instruction directly, without converting it into target code
first. Compilers and interpreters belong to the family of software called translators.

1.1 Tasks in the Compilation Process


The process of converting a source code into a target code is called compilation. This process may
require the transformation of the source code into an intermediate target code, before transforming
the intermediate code into target code. The compilation process requires the following tasks:

 Analysis:
This task determines the structure of the source code and its meaning. The analysis task
deals with the properties of the programming language used to write the source code. It
converts the source code into an abstract representation, based on the syntax of the
programming language. This abstract representation is usually implemented as a tree. The
analysis task is further broken into two major compiler tasks, which are:

o Structural Analysis

This task determines the static structure of the source code. It is further divided
into the following tasks:

 Lexical Analysis

This task identifies the basic symbols or words or tokens of the source
code. The process of performing this task is called lexical analysis, and the
part of the compiler that does this task is called lexical analyzer or scanner.

 Syntactic Analysis

This task ensures that the source code conforms to the syntax of the source
programming language.

o Semantic Analysis

This task determines the meaning of the source code.

 Synthesis
This task creates a target code equivalent of the source code. This task begins from the
developed abstract representation (tree), by providing additional information that relates to
the mapping of the source code to target code. Synthesis consists of these two major tasks:


o Code Generation
This task transforms the abstract representation into equivalent target code.

o Assembly
This task resolves target addressing and converts the target code into an
appropriate output format accepted by the link editor (linker).

However, at any stage during the compilation process, errors can be detected and reported. This
means that error handling is one of the tasks that a compiler performs at every stage of the
compilation process.
Finally, code optimization is another task that a compiler performs; it attempts to improve the
target code according to some measure of cost, e.g. code size or execution speed. The diagram
that follows shows the various tasks of a compiler.

Compilation
    Analysis
        Structural Analysis
            Lexical Analysis
            Syntactic Analysis
        Semantic Analysis
    Synthesis
        Code Generation
        Assembly

Figure 1.1 Tasks in the Compilation Process
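These tasks fit together as a pipeline in which the output of one task becomes the input of the next. The toy Python sketch below is only an illustration of that flow; every function in it is a deliberately trivial, made-up stub (scan, parse, analyse_semantics, generate_code and assemble are not part of any real compiler).

# A toy sketch of the compilation pipeline of Figure 1.1; all stubs are illustrative only.

def scan(source):                    # lexical analysis: characters -> tokens
    return source.split()

def parse(tokens):                   # syntactic analysis: tokens -> abstract representation (tree)
    return ("program", tokens)

def analyse_semantics(tree):         # semantic analysis: check/annotate the tree
    return tree

def generate_code(tree):             # code generation: tree -> target instructions
    return ["PUSH " + token for token in tree[1]]

def assemble(code):                  # assembly: produce the final output format
    return "\n".join(code)

def compile_source(source):
    """Run the analysis tasks, then the synthesis tasks, as in Figure 1.1."""
    return assemble(generate_code(analyse_semantics(parse(scan(source)))))

# print(compile_source("a b c"))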


2. Syntactic Analysis
2.0 Introduction
This is one of the tasks in the compilation process; it ensures that the source code conforms to
the syntax of the source programming language.

2.1 Grammar and Language


A compiler is written for a particular programming language; therefore, we need to consider some
of the properties of the programming language for which the compiler is written. The basic
structures of any programming language are its syntax and its semantics. While the syntax
comprises the rules that must be followed in order to form a legal word of the programming
language, the semantics addresses the meaning of the words of the programming language. In the
past, the syntax and semantics of a programming language were described by lengthy English
explanations. However, one of the advances in programming languages has been the development
of formal systems for describing the syntax of a programming language. Such a formal system is
called the grammar of the language. One of the formal systems for describing the syntax of a
programming language is the context-free grammar (CFG). It can be written using the following
notations: Backus-Naur Form (BNF), Extended Backus-Naur Form (EBNF) and syntax diagrams
(railroad diagrams). The context-free grammar and each of its notations for describing the syntax
of a programming language will be considered in detail.

2.1.1 Lexical Structure of Programming Language


The lexical structure of a programming language is the structure of the words or tokens of the
programming language. In the scanning/lexical analysis task, the scanner or lexical analyser
identifies, for each sequence of characters in the input program that is separated by white space,
the type of token/word to which it belongs. Such a sequence of characters can belong to any of
the following classes of words/tokens:

 Reserved words: These are sometimes called the keywords of the programming language.
In the Java programming language, reserved words (keywords) include the following:
public, static, void, class, if, while etc.

 Constants or literals: Depending on the data types of the programming language, a
constant can be a numeric constant, e.g. 34, 657 etc., a string constant, e.g. "hello",
"welcome" etc., or a Boolean constant, e.g. true, false.

 Special symbols: These are the special symbols that the programming language permits.
Examples of special symbols in the Java programming language are: {, }, <, =, >, +, - etc.

 Identifiers: These are the names that the programmer chooses for user-defined entities.
The following can serve as identifiers in Java: myjavaprogram, my_program etc. A
reserved word cannot be used as an identifier. In some programming languages,
identifiers have a fixed maximum length, while in others identifiers can have any length.

The principle of the longest substring (maximal munch) is used to determine the end of a token
or word: starting from the current character, the scanner takes the longest sequence of characters
that forms a legal token. Token delimiters such as white space always end the current token.
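To make the classification above concrete, the following Python sketch of a scanner splits an input line into tokens and classifies each one. The token classes and the small keyword sample are simplifying assumptions made for illustration; they are not the complete lexical grammar of Java. Multi-character symbols such as <= are listed before single-character ones so that the longest possible match is taken, in line with the principle of the longest substring.

import re

# Illustrative token classes; the keyword list is a small sample, not a complete one.
KEYWORDS = {"public", "static", "void", "class", "if", "while"}

TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),              # numeric constants, e.g. 34, 657
    ("STRING", r'"[^"]*"'),                    # string constants, e.g. "hello"
    ("NAME",   r"[A-Za-z_][A-Za-z0-9_]*"),     # identifiers or reserved words
    ("SYMBOL", r"<=|>=|==|[{}<>=+\-]"),        # special symbols, longest alternatives first
    ("SKIP",   r"\s+"),                        # white space delimits tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    tokens = []
    for match in MASTER.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                           # white space only separates tokens
        if kind == "NAME":                     # decide between reserved word and identifier
            kind = "RESERVED" if value in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, value))
    return tokens

# scan('if x1 <= 34 { y = "hello" }') returns
# [('RESERVED', 'if'), ('IDENTIFIER', 'x1'), ('SYMBOL', '<='), ('NUMBER', '34'),
#  ('SYMBOL', '{'), ('IDENTIFIER', 'y'), ('SYMBOL', '='), ('STRING', '"hello"'), ('SYMBOL', '}')]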


2.1.2 Context Free Grammar


A context-free grammar is one of the formal systems for describing the syntax of a programming
language; it consists of a series of grammar rules. Each grammar rule consists of a left-hand side,
which is a single structure name, then the metasymbol ':=', followed by a right-hand side, which is
a sequence of items that can be symbols or other structure names. Each structure name, such as
sentence, is written as <sentence> and is called a non-terminal, because it can be broken down
into other structures. A token or word that cannot be broken down into other structures is called a
terminal. Consider the following example of a grammar that describes part of the syntax of the
English language:
<sentence> := <nounphrase> <verbphrase>.
<nounphrase> := <article> <noun>
<article> := a | the
<noun> := girl | dog
<verbphrase> := <verb> <nounphrase>
<verb> := sees | pets
There are six non-terminals, six grammar rules and six terminals. Each grammar rule can also be
called a production. The above grammar is a context-free grammar because the left-hand side of
every production is a single non-terminal, which means that a non-terminal can be replaced by the
right-hand side of any of its productions wherever the non-terminal occurs. There is no context on
which the replacement depends.
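To illustrate how such a grammar can be represented inside a program, the Python sketch below stores the rules in a dictionary and repeatedly replaces a non-terminal by the right-hand side of one of its productions until only terminals remain. Every replacement is made without looking at the surrounding symbols, which is exactly the context-free property. The dictionary representation is an assumption made for this illustration, not a standard notation.

import random

# The English grammar above: each non-terminal maps to its list of alternatives;
# each alternative is a sequence of non-terminals and terminals.
GRAMMAR = {
    "<sentence>":   [["<nounphrase>", "<verbphrase>", "."]],
    "<nounphrase>": [["<article>", "<noun>"]],
    "<article>":    [["a"], ["the"]],
    "<noun>":       [["girl"], ["dog"]],
    "<verbphrase>": [["<verb>", "<nounphrase>"]],
    "<verb>":       [["sees"], ["pets"]],
}

def derive(symbol):
    """Expand a symbol into a sequence of terminals (a derivation)."""
    if symbol not in GRAMMAR:                       # a terminal cannot be broken down further
        return [symbol]
    alternative = random.choice(GRAMMAR[symbol])    # any production for this non-terminal may be used
    words = []
    for item in alternative:
        words.extend(derive(item))
    return words

# " ".join(derive("<sentence>")) might produce: the girl sees a dog .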

2.1.3 Backus Naur Form (BNF)


John Backus and Peter Naur developed a set of notations, called meta-symbols, that are used to
write context-free grammars. The notations are:
 :=, which means "consists of" or "is defined as"
 < >, angle brackets, which enclose the name of a token or structure (a non-terminal)
 |, a vertical bar, which means "or" and separates alternatives
When a context free grammar is written using these meta-symbols, we say that the context free
grammar is in Backus-Naur Form.
The following example of context free grammar is in Backus Naur Form:

<exp> := <exp> + <exp> | <exp> . <exp> | ( <exp> ) | <number>
<number> := <number> <digit> | <digit>
<digit> := 0|1|2|3|4|5|6|7|8|9

The first production in the above context-free grammar in BNF means that an expression <exp>
consists of an <exp> followed by a + sign followed by an <exp>, OR an <exp> followed by a .
sign followed by an <exp>, OR an <exp> enclosed in ( ) brackets, OR a <number>.
Furthermore, in the second production, a <number> consists of a <number> followed by a
<digit>, OR a <number> consists of a <digit>.
In the third production, a <digit> is a 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9.

2.1.4 Extended Backus Naur Form (EBNF)


Some meta-symbols were introduced into the original sets of meta-symbols in BNF, to give rise
to an extension of Backus Naur Form, called Extended Backus Naur Form. The new meta-
symbols that were introduced to BNF are listed below:
 { }, this means ‘zero or more of anything inside the bracket’. This meta-symbol is used
for repetition.


 [ ], this means that anything inside the square bracket is optional.


Example, consider the following context free grammar in BNF.
<exp> := <exp> + <term> | <term>
<term> := <term> . <factor> | <factor>
<factor> := ( <exp> ) | <number>
<number> := <number> <digit> | <digit>
<digit> := 0|1|2|3|4|5|6|7|8|9
This context free grammar can be written in EBNF, using the new meta-symbols where
appropriate, as follows:
<exp> := <term> { + <term> }
<term> := <factor> {. <factor> }
<factor> := ( <exp> ) | <number>
<number> := <digit> { <digit> }
<digit> := 0|1|2|3|4|5|6|7|8|9
The first production means that an <exp> consists of a <term> followed by zero or more
repetitions of "+ <term>", while the second production means that a <term> consists of a <factor>
followed by zero or more repetitions of ". <factor>". The fourth production means that a
<number> consists of a <digit> followed by zero or more further <digit>s.
Furthermore, consider the following context free grammar for <if-statement> in BNF.
<if-statement> := IF <condition> THEN <statement> |
IF <condition> THEN <statement> ELSE <statement>
The new meta-symbol can be used to present the context free grammar for the <if-statement> in
EBNF, as follows:
<if-statement> := IF <condition> THEN <statement> [ELSE <statement>]
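One practical benefit of EBNF is that each production maps directly onto parsing code: every { ... } becomes a while loop and every [ ... ] becomes an if test. The Python sketch below is an illustrative recognizer built from the <exp>, <term>, <factor>, <number> grammar above; it is a top-down (recursive descent) recognizer of the kind discussed in section 2.4, written here only to show the correspondence, and it assumes single-character tokens.

# A sketch of a recognizer derived from the EBNF grammar above.
# Each production becomes a method; each { ... } becomes a while loop.

class Recognizer:
    def __init__(self, text):
        self.tokens = [ch for ch in text if ch != " "]   # single-character tokens: digits, + . ( )
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {self.pos}")
        self.pos += 1

    def exp(self):                         # <exp> := <term> { + <term> }
        self.term()
        while self.peek() == "+":
            self.match("+")
            self.term()

    def term(self):                        # <term> := <factor> { . <factor> }
        self.factor()
        while self.peek() == ".":
            self.match(".")
            self.factor()

    def factor(self):                      # <factor> := ( <exp> ) | <number>
        if self.peek() == "(":
            self.match("(")
            self.exp()
            self.match(")")
        else:
            self.number()

    def number(self):                      # <number> := <digit> { <digit> }
        if not (self.peek() or "").isdigit():
            raise SyntaxError("digit expected")
        while (self.peek() or "").isdigit():
            self.pos += 1

def accepts(text):
    recognizer = Recognizer(text)
    try:
        recognizer.exp()
        return recognizer.pos == len(recognizer.tokens)   # the whole input must be consumed
    except SyntaxError:
        return False

# accepts("3 + 4.5") -> True;  accepts("20 + (15.3 + 5)") -> True;  accepts("3 + + 5") -> False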

2.1.5 Syntax Diagram


The syntax diagram is a useful graphical representation of a grammar rule. It shows the sequence
of terminals and non-terminals encountered on the right-hand side of the rule, and it helps you see
how to form a valid word/token of the programming language. Different textbooks adopt different
conventions for syntax diagrams, but the principle remains the same: follow the arrows, and
whatever you pass through as you follow them is used, in that order, in the formation of the
word/token.
Consider the following context free grammars in BNF.
<nounphrase> := <article> <noun>
<article> := a | the
The syntax diagrams for the above context-free grammar in BNF are given below:

[Syntax diagram for <nounphrase>: follow the arrow through <article>, then <noun>.]
[Syntax diagram for <article>: a branch offering the terminals a or the.]

When we combine the two, we obtain the following syntax diagram:

[Combined syntax diagram for <nounphrase>: follow the arrow through the branch a | the, then through noun.]
Consider each of the following production rules in EBNF with the corresponding syntax diagram.

<exp> := <term> { + <term> }

[Syntax diagram for <exp>: a <term>, followed by a loop that repeats + <term>.]

<term> := <factor> { . <factor> }

[Syntax diagram for <term>: a <factor>, followed by a loop that repeats . <factor>.]

<factor> := ( <exp> ) | <number>

[Syntax diagram for <factor>: either ( <exp> ) or <number>.]

<number> := <digit> { <digit> }

[Syntax diagram for <number>: a <digit>, followed by a loop that repeats <digit>.]

<digit> := 0|1|2|3|4|5|6|7|8|9

[Syntax diagram for <digit>: a branch offering the terminals 0 to 9.]

The railroad diagram for the production rule <finalyear> in EBNF is shown below:

<finalyear> := <fourthyear> [ <external> ]

[Syntax diagram for <finalyear>: follow the arrow through <fourthyear>, with an optional branch through <external>.]

Furthermore, the context-free grammar for the <if-statement> in EBNF and its syntax diagram are
shown below.
<if-statement> := IF <condition> THEN <statement> [ELSE <statement>]

[Syntax diagram for <if-statement>: follow the arrow through IF <condition> THEN <statement>, with an optional branch through ELSE <statement>.]

In conclusion, syntax diagrams are most easily drawn from context-free grammars that are in EBNF.

2.2 Parse Trees
The context-free grammar is used to generate the parse tree. The interior nodes of the parse tree
are the non-terminals that are specified and defined in the context-free grammar, while the leaves
of the parse tree are the terminals. Reading the leaves from left to right gives the input string of
tokens or words that is being parsed. The following examples illustrate how a parse tree is
generated from the context-free grammar. Consider the context-free grammar for the token/word
<sentence>, which is given as:

<sentence> := <nounphrase> <verbphrase>.


<nounphrase> := <article> <noun>
<article> := a | the
<noun> := girl | dog
<verbphrase> := <verb> <nounphrase>
<verb> := sees | pets

Consider this example: suppose we want to generate the parse tree that will be used to parse the
sentence "The girl sees a dog." The parse tree is given as:


<sentence>
    <nounphrase>
        <article>      The
        <noun>         girl
    <verbphrase>
        <verb>         sees
        <nounphrase>
            <article>  a
            <noun>     dog
    .

Figure 2.2.1: Parse tree for "The girl sees a dog." (indentation shows the children of each node;
each terminal is written to the right of the non-terminal that derives it)

Observe that the leaves of the parse tree are the terminals, while the interior nodes of the parse tree
are the non-terminals. When the leaves of the parse tree are recombined from left to right, you get
back the sentence that you are parsing.
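The parse tree itself can be represented in a program as a nested structure whose interior entries are non-terminals and whose leaves are terminals. The short Python sketch below encodes Figure 2.2.1 as nested tuples (an illustrative representation, not anything prescribed by the grammar) and shows that collecting the leaves from left to right gives back the sentence being parsed.

# Figure 2.2.1 as a nested tuple: (non-terminal, child, child, ...); leaves are plain strings.
parse_tree = (
    "<sentence>",
    ("<nounphrase>", ("<article>", "The"), ("<noun>", "girl")),
    ("<verbphrase>", ("<verb>", "sees"),
                     ("<nounphrase>", ("<article>", "a"), ("<noun>", "dog"))),
    ".",
)

def leaves(node):
    """Collect the terminals (leaves) of a parse tree from left to right."""
    if isinstance(node, str):
        return [node]
    collected = []
    for child in node[1:]:                 # node[0] is the non-terminal labelling this node
        collected.extend(leaves(child))
    return collected

# " ".join(leaves(parse_tree)) -> 'The girl sees a dog .'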
Consider another example: Suppose the context free grammar in BNF for the token/word,
<number> is given below as:
<number> := <number> <digit> | <digit>
<digit> := 0|1|2|3|4|5|6|7|8|9
The parse tree for the number 2345 is given as:

<number>
    <number>
        <number>
            <number>
                <digit>  2
            <digit>  3
        <digit>  4
    <digit>  5

Figure 2.2.3: Parse tree for the number 2345

When we remove all the non-terminals in the parse tree, we obtain the syntax tree:

[Figure 2.2.4: Syntax tree for 2345, consisting only of the digits 2, 3, 4 and 5.]

When the leaves of the tree are recombined from left to right, we obtain the number 2345.


Consider another example: Suppose the context free grammar for the token/word, <exp> is given
below as:
<exp> := <exp> + <exp> | <exp> . <exp>
| (<exp> ) | <number>
<number> := <number> <digit> | <digit>
<digit> := 0|1|2|3|4|5|6|7|8|9
We want to use the context free grammar to generate the parse tree for this <exp>, 3 + 4.5
<exp>
    <exp>
        <number>
            <digit>  3
    +
    <exp>
        <exp>
            <number>
                <digit>  4
        .
        <exp>
            <number>
                <digit>  5

Figure 2.2.5: Parse tree for 3 + 4.5
When we remove all the non-terminals, we obtain the syntax tree, as shown below:
+
    3
    .
        4
        5

Figure 2.2.6: Syntax tree for 3 + 4.5 (the operators become the interior nodes)
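The step from the parse tree to the syntax tree can also be expressed as a small program. The Python sketch below reuses the nested-tuple representation introduced earlier (still only an illustrative assumption): it takes the parse tree of Figure 2.2.5 and removes the non-terminals, so that the operators become the interior nodes as in Figure 2.2.6.

def to_syntax_tree(node):
    """Turn a parse-tree node into a syntax-tree node by removing the non-terminals."""
    if isinstance(node, str):
        return node                                  # a terminal stays as it is
    children = [to_syntax_tree(child) for child in node[1:]]
    if len(children) == 1:                           # chains like <exp> -> <number> -> <digit> -> 3
        return children[0]
    if len(children) == 2:                           # <number> := <number> <digit> : join the digits
        return children[0] + children[1]
    if children[0] == "(" and children[-1] == ")":   # an <exp> in brackets
        return children[1]
    left, operator, right = children                 # <exp> := <exp> + <exp> | <exp> . <exp>
    return (operator, left, right)

parse_tree = (
    "<exp>",
    ("<exp>", ("<number>", ("<digit>", "3"))),
    "+",
    ("<exp>",
        ("<exp>", ("<number>", ("<digit>", "4"))),
        ".",
        ("<exp>", ("<number>", ("<digit>", "5")))),
)

# to_syntax_tree(parse_tree) -> ('+', '3', ('.', '4', '5')), which is Figure 2.2.6.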

Furthermore, the parse tree for the <exp> 3.4 + 5 is given as:

<exp>
    <exp>
        <exp>
            <number>
                <digit>  3
        .
        <exp>
            <number>
                <digit>  4
    +
    <exp>
        <number>
            <digit>  5

When we remove all the non-terminals, we obtain the syntax tree, as shown below:


+
    .
        3
        4
    5

Exercise 2.2
Use the above context free grammar in BNF to draw the parse tree and the syntax tree for the
following expressions:
a. 345.65 + 64.3

[Parse tree and syntax tree for 345.65 + 64.3]

b. 20 + (15.3 + 5)

2.3 Left Recursive and Right Recursive Grammar


Consider this context free grammar in BNF.
<exp> := <exp> - <exp> | <exp> . <exp>
| (<exp> ) | <number>
<number> := <number> <digit> | <digit>
<digit> := 0|1|2|3|4|5|6|7|8|9
Using the above context-free grammar in BNF to parse the <exp> 4 – 3 – 2, the parse tree can be
either of the two parse trees shown below.


<exp>
    <exp>
        <number>
            <digit>  4
    -
    <exp>
        <exp>
            <number>
                <digit>  3
        -
        <exp>
            <number>
                <digit>  2

Figure 2.3.1: 4 – 3 – 2 parsed as 4 – (3 – 2)

<exp>
    <exp>
        <exp>
            <number>
                <digit>  4
        -
        <exp>
            <number>
                <digit>  3
    -
    <exp>
        <number>
            <digit>  2

Figure 2.3.2: 4 – 3 – 2 parsed as (4 – 3) – 2

This means that, using the first parse tree, the <exp> 4 – 3 – 2 is parsed as 4 – (3 – 2); this is
called right associative. Using the second parse tree, the <exp> 4 – 3 – 2 is parsed as (4 – 3) – 2;
this is called left associative. Because the same <exp>, 4 – 3 – 2, has two different parse trees
(and the two parses lead to different results), the grammar is ambiguous. In order to remove the
ambiguity, we need to redefine the grammar rule:
<exp> := <exp> - <exp> | <exp> . <exp>
| (<exp> ) | <number>
We redefine it so that the rule is recursive on only one side of the - symbol, either the left-hand
side or the right-hand side. This leads to a left recursive grammar or a right recursive grammar,
respectively.
The right recursive grammar to the above grammar rule is given below as:
<exp> := <number> - <exp> | <exp> . <exp>
| (<exp> ) | <number>
This production rule will parse the <exp>, 4 – 3 – 2 as right associative.
The left recursive grammar to the original production rule for <exp> is given below as:
<exp> := <exp> - <number> | <exp> . <exp>
| (<exp> ) | <number>
This production rule will parse the <exp>, 4 – 3 – 2 as left associative. Therefore, right recursive
or left recursive grammar has removed the ambiguity in a double recursive grammar.
However, the revised context free grammar for the <exp> can be defined as left recursive
grammar, as shown below:
<exp> := <exp> - <term> | <term>
<term> := <term> . <factor> | <factor>
<factor> := ( <exp> ) | <number>
<number> := <number> <digit> | <digit>
<digit> := 0|1|2|3|4|5|6|7|8|9
It can also be defined as right recursive grammar, as shown below:
<exp> := <term> - <exp> | <term>
<term> := <term> . <factor> | <factor>
<factor> := ( <exp> ) | <number>
<number> := <number> <digit> | <digit>
<digit> := 0|1|2|3|4|5|6|7|8|9
Either of the two removes the ambiguity that existed in the doubly (left and right) recursive
grammar. The choice depends on what the language designer wants.
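The difference between the two revised grammars can be made concrete by evaluating the same expression both ways. The Python sketch below works on a list of numbers that are understood to be separated by the - operator (so the scanning step is skipped); parse_right follows the right recursive rule <exp> := <number> - <exp> | <number> directly, while parse_left follows the left recursive rule, written as the equivalent loop. This is an illustration only, not part of the course material.

def parse_right(numbers):
    """Right recursive: <exp> := <number> - <exp> | <number>  (right associative)."""
    head, *rest = numbers
    return head if not rest else head - parse_right(rest)

def parse_left(numbers):
    """Left recursive: <exp> := <exp> - <number> | <number>, written as a loop (left associative)."""
    result = numbers[0]
    for number in numbers[1:]:
        result = result - number
    return result

# For the <exp> 4 - 3 - 2:
# parse_right([4, 3, 2]) -> 4 - (3 - 2) = 3
# parse_left([4, 3, 2])  -> (4 - 3) - 2 = -1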

Exercise 2.3
Use the revised left recursive and right recursive grammar in BNF, which is shown above to
generate parse tree and abstract syntax tree for each of the following <exp>:


1. 2–5–8–3
2. 43.7 – (34 – 23.76)

2.4 Parsing Techniques


A grammar written in BNF, EBNF or as a syntax diagram describes the syntax of the strings of
tokens or words of a programming language. It does not describe the actions that a parser must
take to parse a string of tokens correctly. In order to parse a string of tokens of a programming
language, the parser generates parse trees. The simplest form of a parser is called a recognizer,
which is a program that accepts or rejects a string of tokens based on whether or not it is legal in
the programming language. The following are the two general methods that a parser uses in
parsing:

 Bottom-Up Parsing

The bottom-up method of parsing matches the input string against the right-hand sides of the
grammar rules. When a match occurs, the right-hand side is replaced by, or reduced to, the
non-terminal on its left. It is called bottom-up because the parse tree is constructed from the
leaves up to the root. It is also called the shift-reduce method.

 Top-Down Parsing

In the top-down parsing method, the non-terminals are expanded to match the incoming tokens,
so the parse tree is constructed from the root down to the leaves.
The bottom-up method of parsing is more powerful than the top-down method; therefore it is
usually the preferred method of parsing.
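To give a feel for the shift-reduce idea, here is a toy Python recognizer for the left recursive grammar <exp> := <exp> - <number> | <number>, with <number> written as the single token 'n' and <exp> as 'E'. It is a hand-built illustration for this one grammar only, not a general bottom-up parser; the construction of LR tables for arbitrary grammars is treated later in the course.

def shift_reduce_recognize(tokens):
    """Toy shift-reduce recognizer for  E := E - n | n."""
    stack, position = [], 0
    while True:
        if stack == ["n"]:                          # reduce by <exp> := <number>
            stack = ["E"]
        elif stack[-3:] == ["E", "-", "n"]:         # reduce by <exp> := <exp> - <number>
            stack[-3:] = ["E"]
        elif position < len(tokens):                # no reduction possible: shift the next token
            stack.append(tokens[position])
            position += 1
        else:
            break                                   # nothing to reduce and no input left
        print("stack:", stack)                      # the tree is built from the leaves upwards
    return stack == ["E"]                           # accepted iff everything reduced to the start symbol

# shift_reduce_recognize(["n", "-", "n", "-", "n"]) -> True   (e.g. 4 - 3 - 2)
# shift_reduce_recognize(["n", "n"])                -> False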

3. Semantic Analysis
3.0 Introduction
This is one of the tasks of the compilation process that determines the meaning of the source code.
One of the fundamental mechanisms in any programming language is the use of names, or
identifiers assigned by the programmer to denote language entities or constructs, like variables,
procedures (methods), constants etc. Therefore, a fundamental step in semantic analysis is to
describe the convention that determines the meaning of such names used in a program. Another
aspect of the semantics of a programming language is the concept of location and value. Values
are storable quantities, such as integers and reals, while locations are memory addresses where
values are stored and from which they are retrieved.

3.1 Meaning of Names in a Program

The meaning of a name in a program (a variable, procedure (method) or constant name) is
determined by the properties or attributes associated with that name. The process of associating
an attribute with a name is called binding. Attributes of names can be classified according to the
binding time, i.e. the time during the translation/execution process at which the attribute is
computed and bound to the name. Two types of binding time are available, which are:
 Static binding
Static binding occurs before execution of the program.
 Dynamic binding
Dynamic binding occurs during the execution of the program.
Therefore, any attribute that is bound to a name before execution commences is called a static
attribute, while any attribute that is bound during program execution is called a dynamic attribute.
The binding time can depend on the programming language and on the translator. Static binding
is classified into the following:


 Language definition time


 Language implementation time
 Translator time
This is when static binding occurs at compile time
 Link time
Static binding occurs at link time
 Load time
Static binding occurs at load time.

3.2 Symbol Table


The binding of attributes to names in a program must be maintained and managed by a translator
so that appropriate meanings are given to names during translation and execution. A translator
maintains and manages bindings using a data structure called the symbol table. Since we are not
interested in the details of the data structure but only in its properties, we can consider it as a
function that expresses the binding of attributes to names. This function is a fundamental part of
language semantics, and it is called the symbol table. Mathematically, the symbol table is a
function from names to attributes, which we could write as:

SymbolTable : Names → Attributes

The way a symbol table is maintained by an interpreter differs from the way it is maintained by a
compiler. This is because a compiler does static binding while an interpreter does dynamic
binding. Therefore, the symbol table for a compiler can be pictured as follows:

SymbolTable : Names → Static Attributes
A symbol table can be implemented with any number of data structures, such as a list, a tree or a
hash table. However, to maintain the scope rules for variable and procedure/method names, a
stack data structure is widely used to implement the symbol table. In a block-structured, typed
language such as Pascal or Java, a variable or procedure name can be associated with two or more
declarations in different scopes; the symbol table entries for such names therefore change during
the translation or execution of the program. When an entry changes, the current attribute of the
name is the active one, on top of its stack. The following Pascal program will be used to illustrate
this further:

program example;
var x : integer;
    y : boolean;

procedure p;
var x : boolean;

  procedure q;
  var y : integer;

  begin (* q *)
    ...
  end; (* q *)

begin (* p *)
  ...
end; (* p *)

begin (* main *)
  ...
end. (* main *)

The names in the above program are example, x, y, p and q, but x and y are associated with two
different declarations with different scopes. After the processing of the declaration of p, the
symbol table can be represented as:
Name    Binding stack (top of stack first)
x       boolean, local to p  |  integer, global
y       boolean, global
p       procedure, global

However, after the processing of the declaration of q, the stacks that are used to maintain the
symbol table change as follows:

Name    Binding stack (top of stack first)
x       boolean, local to p  |  integer, global
y       integer, local to q  |  boolean, global
p       procedure, global
q       procedure, local to p
After the processing of the body of q, during the processing of the body of p, the symbol table,
which is implemented as a set of stacks, looks as shown below:

Name    Binding stack (top of stack first)
x       boolean, local to p  |  integer, global
y       boolean, global
p       procedure, global
q       procedure, local to p

Finally, after the processing of the procedure p, the symbol table is shown below:

Name    Binding stack
x       integer, global
y       boolean, global
p       procedure, global
Exercise
Consider the following Pascal program:

program example;
var x : integer;
    y : boolean;

procedure p;
var x : boolean;

  procedure q;
  var y : integer;

    procedure r;
    var y : boolean;

    begin (* r *)
      ...
    end; (* r *)

  begin (* q *)
    ...
  end; (* q *)

begin (* p *)
  ...
end; (* p *)

begin (* main *)
  ...
end. (* main *)

1. Show the symbol table after the declaration of the procedure r.
2. Show the symbol table after the processing of the declaration of r.
3. Show the symbol table after the processing of the declaration of q.
4. Show the symbol table after the processing of the declaration of p.

3.3 Environment: Allocation of Storage Location to Names


A storage location is one of the attributes of a name, in particular of a variable name. During the
execution of a compiled program, the code that the compiler generates maintains the location
attribute in a data structure. The binding of names to storage locations is called the environment.
Like the symbol table, it can be considered as a function that maps a name to its storage
location(s), and it can be expressed mathematically as:

Environment : Names → Locations

In a block-structured language with scope and visibility rules, a variable name can have more
than one storage location during the execution of the program, but only one location can be
active at any time. The compiler of such a programming language can use a stack data structure,
like the symbol table, to maintain the various storage locations of a variable name; the location
that is active at any time is the one at the top of the stack. At the beginning of execution, a
program without any procedures has a simple stack-based environment with one activation
record holding the locations of all the global variables of the program, as shown below in
Figure 3.3a:

program eg;
var x, y : integer;
begin
  ...
end.

x                                        <-- environment pointer (global variables)
y
----------------------------------------
Free memory

Figure 3.3a: Stack data structure maintaining the environment of the program.
A pointer called an environment pointer is used to maintain the current environment in the stack
data structure.
Now consider a program that contains several procedures that are not nested, as shown below:

program eg1;

procedure p;
begin
  ...
end;

procedure q;
begin
  p;
end;

begin
  q;
end.

In the above program, each call of a procedure is called an activation. Whenever there is an
activation, an activation record is created and pushed onto the stack-based environment. Each
activation record contains the storage locations of all the variables of that procedure. Whenever
an activation record is created and pushed onto the stack-based environment, the environment
pointer is moved to point to the new activation record. When the procedure finishes execution,
the current activation record is popped off the stack-based environment and the environment
pointer points again to the activation record of the caller. To make this happen, the current
activation record must maintain a pointer to the activation record that called it; this pointer is
called the control link. Therefore, each activation record other than the first record pushed onto
the stack-based environment contains the storage locations of all the local and global variables of
that procedure, together with the control link.
At the beginning of execution of program eg1, the stack-based environment looks like the one in
Figure 3.3a. However, after the call to procedure q, an activation record is pushed onto the
stack-based environment, as shown in Figure 3.3b.

Global variables of eg1
----------------------------------------
Activation record of q                   <-- environment pointer
    global and local variables of q
    control link
----------------------------------------
Free memory

Figure 3.3b: Stack-based environment after the call to procedure q.

Furthermore, when p is called from q, an activation record for p is pushed onto the stack-based
environment, as shown in Figure 3.3c below.

Global variables of eg1
----------------------------------------
Activation record of q
    global and local variables of q
    control link
----------------------------------------
Activation record of p                   <-- environment pointer
    global and local variables of p
    control link
----------------------------------------
Free memory

Figure 3.3c: Stack-based environment when p is called inside q.


We have said that the activation record of a procedure stores the storage locations of all the local
and global variables of that procedure. In order to locate the global (non-local) variables of a
procedure, each activation record in the stack-based environment also keeps a pointer called the
access link, which points to the activation record of the enclosing procedure or program in which
those variables are declared.
The following program illustrates this further:

program eg2;

var x : integer;

procedure p(y : integer);
var i : integer;
    b : boolean;
begin
  i := x;
end;

procedure q(a : integer);
var x : integer;
begin
  p(1);
end;

begin
  q(2);
end.

The compiler will maintain the following stack-based environment at the point of execution
where the main program has called q and q has called p:

Global environment of eg2
    x
----------------------------------------
Activation record of q
    a
    x
    control link
    access link
----------------------------------------
Activation record of p                   <-- environment pointer
    y
    i
    b
    control link
    access link
----------------------------------------
Free memory
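The following Python sketch imitates how such a stack-based environment for program eg2 could be maintained: each activation record holds the procedure's variables together with a control link (to the caller's record) and an access link (to the record of the enclosing scope, where its non-local variables live). The record layout, the helper names and the sample values are assumptions made for illustration; they are not generated compiler code.

# Each activation record holds its variables plus a control link and an access link.
def make_record(name, variables, control_link, access_link):
    return {"name": name, "vars": dict(variables),
            "control": control_link, "access": access_link}

environment = []                                   # the stack-based environment
global_record = make_record("eg2 globals", {"x": 0}, None, None)
environment.append(global_record)

# The main program calls q(2): push a record for q; q is declared at the outer level,
# so both its control link and its access link point to the global record.
q_record = make_record("q", {"a": 2, "x": 0}, global_record, global_record)
environment.append(q_record)

# q calls p(1): the control link is q's record (the caller), but the access link is
# the global record, because p is also declared at the outer level.
p_record = make_record("p", {"y": 1, "i": 0, "b": False}, q_record, global_record)
environment.append(p_record)

def lookup(record, name):
    """Find a variable: look locally, then follow access links to enclosing scopes."""
    while record is not None:
        if name in record["vars"]:
            return record["vars"][name]
        record = record["access"]
    raise NameError(name)

# Inside p, the statement  i := x  uses the global x, not q's local x:
print(lookup(p_record, "x"))                       # -> 0, found in the global record

# When p finishes, its record is popped and the environment pointer follows
# the control link back to q's activation record.
environment.pop()
print(p_record["control"]["name"])                 # -> q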

