
Compiler Design Unit 4

Symbol tables are data structures used by compilers to store information about identifiers in source code such as variable names, function names, classes, etc. A symbol table stores each identifier's name, type, and attributes in an entry. Symbol tables allow compilers to check scopes, perform type checking, and resolve identifiers. They can be implemented as linear lists, binary trees, or hash tables with operations like insert and lookup. Symbol tables are organized hierarchically to represent scopes with nested blocks having sub-tables.

Uploaded by Shubham Dixit
Copyright © All Rights Reserved

Symbol Table

•Symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as:
  •variable names
  •function names
  •objects
  •classes
  •interfaces, etc.

•Symbol table is used by both the analysis and the synthesis parts of a compiler.
Symbol Table
•A symbol table may serve the following purposes
depending upon the language in hand:
•To store the names of all entities in a structured form
at one place.
•To verify if a variable has been declared.
•To implement type checking, by verifying that
assignments and expressions in the source code are
semantically correct.
•To determine the scope of a name (scope
resolution).
Symbol Table
•A symbol table is simply a table, which can be either a
linear list or a hash table. It maintains an entry for each
name in the following format:
<symbol name, type, attribute>
For example, if a symbol table has to store
information about the following variable declaration:
static int age;
then it should store an entry such as:
<age, int, static>
Symbol Table:
Implementation
If a compiler is to handle a small amount of data,
then the symbol table can be implemented as an
unordered list, which is easy to code but suitable
only for small tables. A symbol table can be
implemented in one of the following ways:
•Linear (sorted or unsorted) list
•Binary Search Tree
•Hash table
Of these, symbol tables are most often implemented
as hash tables, where the source-code symbol itself
is used as the key for the hash function and the
return value is the information about the symbol.
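The hash-table organization described above can be sketched in C as follows. This is a minimal sketch under stated assumptions: separate chaining, a djb2-style string hash, 211 buckets, and entries that store only a name and a type. The names `st_insert`, `st_lookup`, `NBUCKETS` and `struct symbol` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211  /* bucket count: an arbitrary prime chosen for this sketch */

/* One entry: the symbol name is the key; the remaining fields are its information. */
struct symbol {
    char *name;
    char *type;
    struct symbol *next;  /* chaining: other names that hash to the same bucket */
};

static struct symbol *buckets[NBUCKETS];

/* Copy a string onto the heap (strdup is not part of C99/C11). */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL) memcpy(d, s, n);
    return d;
}

/* djb2-style string hash, reduced to a bucket index. */
static unsigned long hash(const char *s) {
    unsigned long h = 5381;
    for (; *s != '\0'; s++) h = h * 33 + (unsigned char)*s;
    return h % NBUCKETS;
}

/* lookup: the entry for name, or NULL if the name is not in the table. */
struct symbol *st_lookup(const char *name) {
    for (struct symbol *e = buckets[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* insert: add name with its type at the head of its bucket's chain.
   Returns NULL if the name is already present or memory runs out. */
struct symbol *st_insert(const char *name, const char *type) {
    if (st_lookup(name) != NULL) return NULL;
    struct symbol *e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    e->name = copy_string(name);
    e->type = copy_string(type);
    if (e->name == NULL || e->type == NULL) {
        free(e->name);
        free(e->type);
        free(e);
        return NULL;
    }
    unsigned long b = hash(name);
    e->next = buckets[b];
    buckets[b] = e;
    return e;
}
```

For example, `st_insert("age", "int")` followed by `st_lookup("age")` returns the stored entry, while `st_lookup` on an undeclared name returns `NULL`. Chaining keeps both operations close to constant time on average, as long as the number of names stays comparable to the bucket count.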
Symbol Table:
Operation
A symbol table, either linear or hash, should provide the following
operations.
insert()
•This operation is used more frequently by the analysis phase.
•This operation is used to add information to the symbol table about
unique names occurring in the source code.
•An attribute for a symbol in the source code is the information associated
with that symbol.
•This information contains the value, state, scope, and type of the
symbol.
•The insert() function takes the symbol and its attributes as arguments
and stores the information in the symbol table.
For example:
int a;
insert(a, int);
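Here is a minimal sketch of insert() on an unordered linear list, the simplest implementation mentioned earlier. It assumes a fixed capacity (`MAX_SYMBOLS`, hypothetical) and string arguments, so the declaration `int a;` would correspond to a call such as `insert("a", "int")`.

```c
#include <string.h>

#define MAX_SYMBOLS 100  /* capacity assumed for this sketch */

struct entry {
    const char *name;
    const char *type;
};

static struct entry table[MAX_SYMBOLS];
static int count = 0;

/* insert(name, type) on an unordered linear list: scan the existing entries
   so that only unique names are stored, then append the new entry at the end.
   Returns 1 on success, 0 if the name is already present or the table is full.
   The entry keeps the caller's string pointers rather than copies. */
int insert(const char *name, const char *type) {
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0) return 0;
    if (count == MAX_SYMBOLS) return 0;
    table[count].name = name;
    table[count].type = type;
    count++;
    return 1;
}
```

Each insert scans the whole list, so inserting n names costs O(n²) comparisons in total. This is why the unordered list only suits small tables.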
Symbol Table:
Operation
lookup()
lookup() operation is used to search a name in the symbol
table to determine:
•if the symbol exists in the table.
•if it is declared before it is used.
•if the name is used in the scope.
•if the symbol is initialized.
•if the symbol is declared multiple times.
The format of lookup() function varies according to the
programming language. The basic format should match the
following:
lookup(symbol)
•This method returns 0 (zero) if the symbol does not exist in the
symbol table.
•If the symbol exists in the symbol table, it returns its attributes stored
in the table.
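The lookup() convention described here (0 when the symbol is absent, its attributes otherwise) maps naturally onto a C function that returns either a null pointer or a pointer to the entry. The sketch below assumes the linear list is kept sorted by name, so it can use the standard library's `bsearch`. The table contents and the names `lookup` and `cmp_name` are illustrative.

```c
#include <stdlib.h>
#include <string.h>

struct entry {
    const char *name;
    const char *type;
    const char *attribute;
};

/* A linear list kept sorted by name, so lookup can use binary search. */
static const struct entry table[] = {
    { "a",   "int",   "local"  },
    { "age", "int",   "static" },
    { "b",   "float", "local"  },
};

/* bsearch passes the key first and the array element second. */
static int cmp_name(const void *key, const void *elem) {
    return strcmp((const char *)key, ((const struct entry *)elem)->name);
}

/* lookup(symbol): NULL (0) if the symbol is absent, otherwise a pointer
   to the attributes stored for it. */
const struct entry *lookup(const char *symbol) {
    return bsearch(symbol, table, sizeof table / sizeof table[0],
                   sizeof table[0], cmp_name);
}
```

Keeping the list sorted makes each lookup O(log n). The trade-off is that each insertion must shift entries to keep the order.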
Symbol Table:
Scope Management
A compiler maintains two types of symbol tables:
•a global symbol table, which can be accessed by all the procedures;
•scope symbol tables, which are created for each scope in the program.
Symbol Table:
Scope Management
To determine the scope of a name, symbol tables are arranged in
hierarchical structure as shown in the example below:

void pro_one()
{
    int one_1;
    int one_2;
    {
        int one_3;
        int one_4;
    }
    int one_5;
    {
        int one_6;
        int one_7;
    }
}

void pro_two()
{
    int two_1;
    int two_2;
    {
        int two_3;
        int two_4;
    }
    int two_5;
    {
        int two_6;
        int two_7;
    }
}
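One way to model this hierarchy, under stated assumptions: each scope has its own table with a pointer to the table of the enclosing scope. A name is visible in a scope if it appears in that table or in any enclosing one. The tables below encode the global scope and `pro_one()`. The struct and function names are hypothetical, and declaration order within a block is ignored, so `one_5` simply belongs to `pro_one`'s table.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 8  /* per-table capacity assumed for this sketch */

/* One symbol table per scope. parent points to the table of the enclosing
   scope; the global table has no parent. */
struct scope {
    const char *names[MAX_NAMES];
    int count;
    const struct scope *parent;
};

/* The hierarchy for the global scope and pro_one() from the example. */
static const struct scope global_table  = { { "pro_one", "pro_two" }, 2, NULL };
static const struct scope pro_one_table = { { "one_1", "one_2", "one_5" }, 3, &global_table };
static const struct scope inner_block_1 = { { "one_3", "one_4" }, 2, &pro_one_table };
static const struct scope inner_block_2 = { { "one_6", "one_7" }, 2, &pro_one_table };

/* A name is visible in scope s if it is in s's table or in any enclosing table. */
int visible(const struct scope *s, const char *name) {
    for (; s != NULL; s = s->parent)
        for (int i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0) return 1;
    return 0;
}
```

Sibling blocks do not see each other's names: `one_3` is not visible from the block declaring `one_6`, but `one_1` and `pro_two` are.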
Symbol Table:
Representing Scope Information
In the source program, every name possesses a region of validity, called
the scope of that name.
The rules in a block-structured language are as follows:
1. If a name is declared within block B, then it is valid only within B.
2. If block B1 is nested within block B2, then a name that is valid for B2 is also valid for B1, unless the name's identifier is re-declared in B1.
3. Tables are organized into a stack, and each table contains the list of names and their associated attributes.
4. Whenever a new block is entered, a new table is pushed onto the stack. The new table holds the names that are declared as local to this block.
5. When a declaration is compiled, the table on top of the stack is searched for the name.
6. If the name is not found in the table, then the new name is inserted.
7. When a reference to a name is translated, each table is searched, starting from the table on top of the stack.
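Rules 3 to 7 can be sketched as a stack of per-block tables. This sketch assumes fixed limits (`MAX_DEPTH`, `MAX_NAMES`) and stores only names. `enter_block`, `exit_block`, `declare` and `resolve` are hypothetical names chosen for illustration.

```c
#include <string.h>

#define MAX_DEPTH 16  /* maximum nesting depth: an assumption of this sketch */
#define MAX_NAMES 16  /* names per block: an assumption of this sketch */

static const char *tables[MAX_DEPTH][MAX_NAMES];  /* tables[d]: names local to the block at depth d */
static int counts[MAX_DEPTH];
static int top = -1;  /* depth of the innermost (current) block; -1 = no block open */

/* Rule 4: entering a block pushes a new, empty table. Returns 0 on overflow. */
int enter_block(void) {
    if (top + 1 == MAX_DEPTH) return 0;
    counts[++top] = 0;
    return 1;
}

/* Leaving a block pops its table, so its names are no longer visible. */
void exit_block(void) {
    if (top >= 0) top--;
}

/* Rules 5 and 6: a declaration searches only the current block's table and
   inserts the name if it is not already there. Returns 0 for a redeclaration
   in the same block, when no block is open, or when the table is full. */
int declare(const char *name) {
    if (top < 0 || counts[top] == MAX_NAMES) return 0;
    for (int i = 0; i < counts[top]; i++)
        if (strcmp(tables[top][i], name) == 0) return 0;
    tables[top][counts[top]++] = name;
    return 1;
}

/* Rule 7: a reference searches the tables from the top of the stack down.
   Returns the depth of the innermost declaration, or -1 if undeclared. */
int resolve(const char *name) {
    for (int d = top; d >= 0; d--)
        for (int i = 0; i < counts[d]; i++)
            if (strcmp(tables[d][i], name) == 0) return d;
    return -1;
}
```

Consider the sequence `enter_block(); declare("x"); enter_block(); declare("x");`. Afterward, `resolve("x")` finds the inner declaration at depth 1. After `exit_block()`, the same call finds the outer one at depth 0, as rule 2 requires.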
Symbol Table:
Storage organization
•When the target program executes, it runs in its own logical address space, in which each program value has a location.
•The logical address space is shared among the compiler, operating system and target machine for management and organization. The operating system maps logical addresses to physical addresses, which are usually spread throughout the memory.
[Figure: Subdivision of Run-time Memory]
Symbol Table:
Activation Record
•An activation record is used to manage the information needed by a single execution of a procedure.
•An activation record is pushed onto the stack when a procedure is called, and it is popped when control returns to the caller function.
•When a procedure is called (activation begins), its name is pushed onto the stack, and when it returns (activation ends), it is popped.
•The control stack is a run-time stack used to keep track of the live procedure activations, i.e., to find the procedures whose execution has not yet been completed.
Symbol Table:
Activation Record
An activation record contains the following fields:
•Return value: used by the called procedure to return a value to the calling procedure.
•Actual parameters: used by the calling procedure to supply parameters to the called procedure.
•Control link: points to the activation record of the caller.
•Access link: used to refer to non-local data held in other activation records.
•Saved machine status: holds information about the status of the machine before the procedure is called.
•Local data: holds the data that is local to the execution of the procedure.
•Temporaries: store the values that arise in the evaluation of an expression.
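The fields of an activation record can be pictured as a C struct. This is an illustrative layout only: the field types and array sizes are arbitrary assumptions, and real compilers lay out activation records according to the target machine's calling conventions. The hypothetical helper `live_activations` follows control links back to the oldest record. That chain is exactly the set of live activations on the control stack.

```c
#include <stddef.h>

/* Illustrative layout only; types and sizes are assumptions. */
struct activation_record {
    long return_value;                       /* filled in by the callee for the caller */
    long actual_params[4];                   /* supplied by the caller */
    struct activation_record *control_link;  /* the caller's activation record */
    struct activation_record *access_link;   /* record holding non-local data */
    void *saved_machine_status[2];           /* e.g. return address and a saved register */
    long local_data[4];                      /* the procedure's local variables */
    long temporaries[4];                     /* intermediate values of expressions */
};

/* Count the live activations by following control links from the current
   record back to the first one (whose control link is NULL). */
int live_activations(const struct activation_record *current) {
    int n = 0;
    for (; current != NULL; current = current->control_link) n++;
    return n;
}
```

Suppose `main` calls `f` and `f` calls `g`. Then `g`'s control link points to `f`'s record and `f`'s points to `main`'s, so three activations are live while `g` runs.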
Symbol Table:
Storage Allocation
The different ways to allocate memory are:
1.Static storage allocation
2.Stack storage allocation
3.Heap storage allocation
Symbol Table:
Storage Allocation
Static storage allocation

•In static allocation, names are bound to storage locations at compile time.
•If memory is created at compile time, then it is created in the static area, and only once.
•Static allocation does not support dynamic data structures: memory is created only at compile time and deallocated only after program completion.
•The drawback of static storage allocation is that the size and position of data objects must be known at compile time.
•Another drawback is that recursive procedures are restricted, because every activation of a procedure shares the same storage for its names.
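In C, a `static` local variable behaves like statically allocated storage: one location for the whole run. The sketch below (function names are hypothetical) shows why this restricts recursion. Every activation writes the same location, so an outer activation loses its own value. The automatic (stack-allocated) version keeps a fresh copy per activation.

```c
/* Static allocation: 'saved' has exactly one storage location for the whole
   run, shared by every activation of the procedure. */
int last_n_static(int n) {
    static int saved;
    saved = n;
    if (n > 0) last_n_static(n - 1);  /* the recursive call overwrites 'saved' */
    return saved;                     /* no longer this activation's n */
}

/* Automatic (stack) allocation: each activation gets a fresh 'saved'. */
int last_n_auto(int n) {
    int saved = n;
    if (n > 0) last_n_auto(n - 1);
    return saved;
}
```

`last_n_static(3)` returns 0, because the innermost call overwrote `saved`. `last_n_auto(3)` returns 3.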
Symbol Table:
Storage Allocation
Stack Storage Allocation

•In stack storage allocation, storage is organized as a stack.
•An activation record is pushed onto the stack when activation begins and popped when the activation ends.
•The activation record contains the locals, so they are bound to fresh storage in each activation. The values of the locals are deleted when the activation ends.
•It works on the basis of last-in-first-out (LIFO), and this allocation supports recursion.
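The LIFO discipline can be simulated explicitly. In this sketch, a hypothetical `struct frame` holding only the local `n` stands in for a full activation record. Factorial is computed by pushing one frame per pending call and popping the frames in reverse order, which mirrors how recursive activations are pushed and popped on the run-time stack.

```c
#define MAX_FRAMES 32  /* stack capacity assumed for this sketch */

/* A simplified activation record holding only the local n. */
struct frame {
    int n;
};

static struct frame stack[MAX_FRAMES];
static int top = -1;  /* index of the most recently pushed (live) frame */

static void push(int n) { stack[++top].n = n; }         /* activation begins */
static struct frame pop(void) { return stack[top--]; }  /* activation ends */

/* factorial(n), simulating recursive calls: one frame is pushed per pending
   activation, then frames are popped in LIFO order as each activation
   returns. Returns -1 for n < 0 or n > 20 (21! overflows 64 bits). */
long long factorial(int n) {
    if (n < 0 || n > 20) return -1;
    for (int k = n; k >= 0; k--) push(k);  /* calls: factorial(n) ... factorial(0) */
    long long result = 1;
    while (top >= 0) {
        struct frame f = pop();  /* the innermost activation finishes first */
        result = (f.n == 0) ? 1 : f.n * result;
    }
    return result;
}
```

Each frame keeps its own `n`, so no activation overwrites another's local. This is exactly what the static version lacked.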
Symbol Table:
Storage Allocation
Heap Storage Allocation

•Heap allocation is the most flexible allocation scheme.
•Allocation and deallocation of memory can be done at any time and at any place, depending upon the user's requirement.
•Heap allocation is used to allocate memory to variables dynamically and to reclaim it when the variables are no longer used.
•Heap storage allocation supports the recursion process.
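Here is a minimal C sketch of heap allocation. Memory obtained with `malloc` outlives the activation that requested it. Unlike a local array, it can therefore be returned to the caller, who releases it with `free`. The function name `make_squares` is hypothetical.

```c
#include <stdlib.h>

/* Allocate n ints on the heap and fill them with squares. The block stays
   valid after this activation ends; the caller must free() it.
   Returns NULL if the allocation fails. */
int *make_squares(size_t n) {
    int *a = malloc(n * sizeof *a);
    if (a == NULL) return NULL;
    for (size_t i = 0; i < n; i++) a[i] = (int)(i * i);
    return a;
}
```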


Error Handling
•An important role of the compiler is to report any errors in the source program that it detects during the entire translation process.
•Each phase of the compiler can encounter errors; after an error is detected, it must be dealt with so that the compilation process can proceed.
•The syntax and semantic analysis phases handle a large number of the errors in the compilation process.
•The error handler handles all types of errors, such as lexical errors, syntax errors, semantic errors and logical errors.
Error Handling
Classification of Errors:
•Compile-time errors
  •Lexical errors
  •Syntax errors
  •Semantic errors
•Run-time errors
Error Handling
Lexical phase errors
•These errors are detected during the lexical analysis phase.
•Example: exceeding the maximum length of an identifier or numeric constant, or a character sequence that forms no valid token.
Ex:
int main(void)
{
    int x = 10, y = 5;
    int *b;
    b = &x;
    x = 2fx;    /* lexical error */
    return 0;
}
2fx is neither a number nor an identifier, so this code produces a lexical error.
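The token check that rejects `2fx` can be sketched as a classifier for single lexemes. The rules are deliberately simplified assumptions (decimal integers only, no suffixes). The 31-character limit `MAX_LEXEME_LEN` is hypothetical, and it also illustrates the length-limit error mentioned above.

```c
#include <ctype.h>
#include <string.h>

#define MAX_LEXEME_LEN 31  /* hypothetical length limit for identifiers and constants */

enum token_kind { TOK_IDENTIFIER, TOK_NUMBER, TOK_LEXICAL_ERROR };

/* Classify one lexeme under simplified rules:
   identifier = letter or '_' followed by letters, digits or '_';
   number     = one or more decimal digits (no suffixes or fractions).
   Anything else, or anything longer than MAX_LEXEME_LEN, is a lexical error. */
enum token_kind classify(const char *s) {
    if (strlen(s) > MAX_LEXEME_LEN) return TOK_LEXICAL_ERROR;
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        for (const char *p = s + 1; *p != '\0'; p++)
            if (!isalnum((unsigned char)*p) && *p != '_') return TOK_LEXICAL_ERROR;
        return TOK_IDENTIFIER;
    }
    if (isdigit((unsigned char)s[0])) {
        for (const char *p = s + 1; *p != '\0'; p++)
            if (!isdigit((unsigned char)*p)) return TOK_LEXICAL_ERROR;
        return TOK_NUMBER;
    }
    return TOK_LEXICAL_ERROR;
}
```

`2fx` starts like a number but continues with letters, so it matches neither rule.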
Error Handling
Syntactic phase errors
•These errors are detected during the syntax analysis phase.
•Typical examples are a missing semicolon or unbalanced parentheses.
•Example: in ((a+b* (c-d)), a ) is missing after b; the intended expression is ((a+b)* (c-d)).

Typical Syntax errors:
•Errors in structure
•Missing operators
•Misspelled keywords
•Unbalanced parentheses
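Detecting unbalanced parentheses only needs a nesting counter. Here is a minimal sketch, assuming the expression is given as a string. The function name and its return convention are hypothetical.

```c
#include <stddef.h>

/* Returns -1 if the parentheses in expr are balanced. Otherwise returns the
   index of the first ')' that has no matching '(', or, if some '(' is never
   closed, the length of expr (the error is detected at the end of input). */
long check_parens(const char *expr) {
    long depth = 0;
    size_t i = 0;
    for (; expr[i] != '\0'; i++) {
        if (expr[i] == '(') {
            depth++;
        } else if (expr[i] == ')') {
            if (depth == 0) return (long)i;  /* ')' with no matching '(' */
            depth--;
        }
    }
    return depth == 0 ? -1 : (long)i;       /* unclosed '(' found at end of input */
}
```

For ((a+b* (c-d)) the counter is still 1 at the end of the string, so the error is reported at the end of input. The parser cannot tell that the missing ) belonged after b; only the corrected form ((a+b)*(c-d)) passes.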
Error Handling
Semantic errors
•These errors are detected during the semantic analysis phase.
•Data type mismatch errors are handled by the semantic analyzer.
•Assignment of a value of an incompatible data type.
Example: assigning a string value to an integer variable.

Typical Semantic Errors:
•Incompatible types of operands
•Undeclared variables
•Mismatch between actual arguments and formal arguments
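Here is a simplified sketch of two of the semantic checks listed above: undeclared variables and incompatible assignments. It uses a small declaration table. The type set and the compatibility rule (numeric types assignable to each other, strings only to strings) are assumptions for illustration, as are all names.

```c
#include <stddef.h>
#include <string.h>

enum type { T_INT, T_FLOAT, T_STRING };
enum sem_result { SEM_OK, SEM_UNDECLARED, SEM_TYPE_MISMATCH };

struct decl {
    const char *name;
    enum type type;
};

/* Assumed rule: int and float may be assigned to each other;
   a string is compatible only with a string. */
static int compatible(enum type lhs, enum type rhs) {
    if (lhs == T_STRING || rhs == T_STRING) return lhs == rhs;
    return 1;
}

/* Check the assignment  lhs = <expression of type rhs_type>  against the
   declarations in table (n entries). */
enum sem_result check_assign(const struct decl *table, size_t n,
                             const char *lhs, enum type rhs_type) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, lhs) == 0)
            return compatible(table[i].type, rhs_type) ? SEM_OK : SEM_TYPE_MISMATCH;
    return SEM_UNDECLARED;  /* name not found in the declaration table */
}
```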
Error Handling
Error Recovery:
There are mainly four strategies for Error recovery:
1. Panic Mode
2. Phrase level recovery
3. Error production
4. Global correction
Error Handling
Error Recovery:
1. Panic Mode:
• In this method, the parser discards input symbols one at a time.
• This process continues until one of a designated set of synchronizing tokens is found.
• Synchronizing tokens are delimiters such as a semicolon, comma, or parenthesis. These tokens indicate the end of an input statement.
• Thus, panic-mode recovery may skip a considerable amount of input without checking it for additional errors.
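Panic-mode recovery over a token stream can be sketched as follows. The synchronizing set (`;`, `,`, `)`) and the function name `panic_recover` are assumptions for this sketch. After recovering, a real parser would also restore its own state so that parsing can resume at the synchronizing token.

```c
#include <stddef.h>
#include <string.h>

/* Synchronizing set assumed for this sketch. */
static int is_sync(const char *tok) {
    return strcmp(tok, ";") == 0 || strcmp(tok, ",") == 0 || strcmp(tok, ")") == 0;
}

/* Starting at the erroneous token at index pos, discard tokens one at a
   time until a synchronizing token is found. Returns its index (n if the
   input runs out); *skipped receives the number of discarded tokens. */
size_t panic_recover(const char *tokens[], size_t n, size_t pos, size_t *skipped) {
    size_t i = pos;
    while (i < n && !is_sync(tokens[i])) i++;
    *skipped = i - pos;
    return i;
}
```

For the tokens of `x = 2 + * 3 ; y = 1 ;` with the error detected at `*`, the parser discards `*` and `3` and resumes at the first `;`. Any error inside the discarded tokens would go unreported.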
Error Handling
Error Recovery:
2. Phrase level recovery:
• In this method, on discovering an error, the parser performs local correction on the remaining input.
• It can replace a prefix of the remaining input by some string that allows the parser to continue.
• The local correction can be replacing a comma by a semicolon, deleting an extraneous semicolon, or inserting a missing semicolon. The type of local correction is decided by the compiler designer.
• While doing a replacement, care should be taken not to go into an infinite loop.
• This method is used in many error-repairing compilers.
Error Handling
Error Recovery:
3. Error production
• If we have good knowledge of common errors that
might be encountered, then we can augment the
grammar for the corresponding language with error
productions that generate the erroneous constructs.
• If an error production is used during parsing, we can generate an appropriate error message to indicate the erroneous construct that has been recognized in the input.
• This method is extremely difficult to maintain, because if we change the grammar, the corresponding error productions must also be changed.
Error Handling
Error Recovery:
3. Error production
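Ex: Suppose the input string is “abcd”.

Original grammar:          Augmented grammar:
S → X                      S' → SY
X → aX / bX / a            S → X
Y → cd                     X → aX / bX / a / b
                           Y → cd

The original grammar cannot derive “abcd”: S derives only strings of a's and b's ending in a, and Y is unreachable. The augmented grammar can: S' ⇒ SY ⇒ XY ⇒ aXY ⇒ abY ⇒ abcd.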
Error Handling
Error Recovery:
4. Global correction
•The parser examines the whole program and tries to find the closest match for it that is error-free.
•The closest match is the program that requires the fewest insertions, deletions, and changes of tokens to obtain from the erroneous input.
•Due to its high time and space complexity, this method is not used in practice.
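The "closest match" criterion is the edit distance between token sequences: the minimum number of token insertions, deletions and changes. The sketch below computes it by dynamic programming and picks the closest of a given list of error-free candidates. The candidate list is an assumption for illustration. Real global correction searches over all valid programs, which is what makes it so expensive. The function names and `MAX_TOKENS` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 64  /* longest sequence this sketch handles */

/* Levenshtein distance between token sequences a (length n) and b (length m). */
size_t edit_distance(const char *a[], size_t n, const char *b[], size_t m) {
    static size_t d[MAX_TOKENS + 1][MAX_TOKENS + 1];  /* static: not reentrant */
    if (n > MAX_TOKENS || m > MAX_TOKENS) return (size_t)-1;
    for (size_t i = 0; i <= n; i++) d[i][0] = i;  /* delete the first i tokens of a */
    for (size_t j = 0; j <= m; j++) d[0][j] = j;  /* insert the first j tokens of b */
    for (size_t i = 1; i <= n; i++)
        for (size_t j = 1; j <= m; j++) {
            size_t change = d[i - 1][j - 1] + (strcmp(a[i - 1], b[j - 1]) != 0);
            size_t del = d[i - 1][j] + 1;
            size_t ins = d[i][j - 1] + 1;
            size_t best = change < del ? change : del;
            d[i][j] = best < ins ? best : ins;
        }
    return d[n][m];
}

/* Index of the candidate closest to the erroneous input (first one on ties). */
size_t closest_candidate(const char *input[], size_t n,
                         const char **candidates[], const size_t lens[], size_t k) {
    size_t best = 0, best_cost = (size_t)-1;
    for (size_t i = 0; i < k; i++) {
        size_t cost = edit_distance(input, n, candidates[i], lens[i]);
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

For the erroneous input `x = a + ;`, the candidate `x = a ;` is one deletion away. The candidate `y = a + b ;` needs a change and an insertion, so the first candidate is chosen.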
