Compiler Construction Notes

compiler construction notes pdf

Uploaded by

Tehreem Shahid
Copyright
© © All Rights Reserved

Compiler Construction

Lecture 03
Symbol Table
Content Taken From
• Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman

• Basics of Compiler Design, Anniversary edition by Torben Ægidius Mogensen

Web URL:
http://syllabus.cs.manchester.ac.uk/ugt/2019/COMP36512/2017/
Agenda
01 Symbol Table

02 Who Creates a Symbol Table

03 Scope of Symbol Table

04 Organizing the Symbol Table


Symbol Table
Symbol tables are data structures that are used by compilers to hold information about source-program constructs.

The information is collected incrementally by the analysis phases of a compiler and used by the synthesis phases to generate the target code.

Entries in the symbol table contain information about an identifier such as its character string (or lexeme), its type, its position in storage, and any other relevant information.
Symbol Table
A symbol table is a table that binds names to information. We need a number of operations on symbol tables to accomplish this:

• We need an empty symbol table, in which no name is defined.

• We need to be able to bind a name to a piece of information. In case the name is already defined in the symbol table, the new binding takes precedence over the old.

• We need to be able to look up a name in a symbol table to find the information the name is bound to. If the name is not defined in the symbol table, we need to be told that.

• We need to be able to enter a new scope.

• We need to be able to exit a scope, reestablishing the symbol table to what it was before the scope was entered.
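
The five operations above can be sketched as follows. This is a minimal illustration, not the implementation from either textbook: it assumes an imperative table kept as a stack of bindings plus a stack of scope markers, and the names SymbolTable, bind, lookup, enter_scope and exit_scope are hypothetical. It also assumes that None is never stored as information, so that None can signal "not defined".

```python
class SymbolTable:
    """Imperative symbol table: a stack of bindings plus scope markers."""

    def __init__(self):
        # Empty table: no name is defined.
        self._bindings = []   # list of (name, info) pairs, newest last
        self._markers = []    # size of _bindings recorded at each scope entry

    def bind(self, name, info):
        # A newer binding takes precedence over an older one of the same name.
        self._bindings.append((name, info))

    def lookup(self, name):
        # Search newest-first; return None when the name is not defined.
        for bound_name, info in reversed(self._bindings):
            if bound_name == name:
                return info
        return None

    def enter_scope(self):
        # Remember how many bindings existed before the scope was entered.
        self._markers.append(len(self._bindings))

    def exit_scope(self):
        # Drop every binding made since the matching enter_scope.
        del self._bindings[self._markers.pop():]


table = SymbolTable()
table.bind("x", "int")
table.enter_scope()
table.bind("x", "bool")        # takes precedence over the outer x
inner = table.lookup("x")
table.exit_scope()
outer = table.lookup("x")      # the table is back to its earlier state
missing = table.lookup("y")    # y was never bound
```

Lookup here is a linear scan, so this sketch only shows the behaviour of the operations, not an efficient organisation.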
Who creates Symbol Table Entries?
Symbol-table entries are created and used during the analysis phase by the lexical analyzer, the parser, and the semantic analyzer.

In some cases, a lexical analyzer can create a symbol-table entry as soon as it sees the characters that make up a lexeme.
More often, the lexical analyzer can only return to the parser a token, say id, along with a pointer to the lexeme.
Only the parser, however, can decide whether to use a previously created symbol-table entry or create a new one for the identifier.
Symbol Table Per Scope
The term "scope of identifier x" really refers to the scope of a particular declaration of x.
The term scope by itself refers to a portion of a program that is the scope of one or
more declarations.
Scopes are important, because the same identifier can be declared for different
purposes in different parts of a program.
If blocks can be nested, several declarations of the same identifier can appear
within a single block. The following syntax results in nested blocks when stmts
can generate a block:

block → '{' decls stmts '}'


Scope Example
What information is stored in the symbol table?
What items to enter in the symbol table?
Variable names; defined constants; procedure and function names; literal constants and strings; source text labels; compiler-generated temporaries.

What kind of information might the compiler need about each item:
textual name, data type, declaring procedure, storage information. Depending on the type of the object, the compiler may want to know list of fields (for structures), number of parameters and types (for functions), etc…
In practice, many different tables may exist.
Symbol table information is accessed frequently:
hence, efficiency of access is critical!
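
As an illustration of the kinds of information listed above, one symbol-table entry could be modelled as a record like the one below. This is only a sketch: the class name SymbolEntry and all of its field names are assumptions made for the example, and a real compiler would choose its own layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SymbolEntry:
    name: str                                 # textual name (lexeme)
    kind: str                                 # e.g. "variable", "function", "constant"
    data_type: str                            # e.g. "int"
    declared_in: Optional[str] = None         # declaring procedure, None for globals
    storage_offset: Optional[int] = None      # storage information, once allocated
    fields: list = field(default_factory=list)        # only used for structures
    param_types: list = field(default_factory=list)   # only used for functions


# Two hypothetical entries: a local variable and a two-parameter function.
counter = SymbolEntry("counter", "variable", "int",
                      declared_in="main", storage_offset=8)
gcd = SymbolEntry("gcd", "function", "int", param_types=["int", "int"])
```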
Organising the symbol table
Linear List:
Simple approach, has no fixed size; but inefficient: a lookup may need to traverse the entire list, which takes O(n) time.
Binary tree:
An unbalanced tree would behave much like a linear list (this could arise if symbols are entered in sorted order).
A balanced tree (one in which the path length to every leaf is roughly equal) would take O(log₂ n) probes per lookup (worst case). Techniques exist for dynamically rebalancing trees.
Hash table:
Uses a hash function, h, to map names into integers; the integer is taken as a table index at which to store the information. Potentially O(1), but needs an inexpensive function with good mapping properties, and a policy to handle cases when several names map to the same index.
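
The remark about unbalanced binary trees can be checked with a small experiment. The sketch below is an assumption-laden illustration: it uses a plain, non-rebalancing binary search tree with nodes stored as [name, left, right] lists, and the helper names insert, height and build are hypothetical. It compares the tree height when 200 names arrive in sorted order with the height when the same names arrive shuffled.

```python
import random


def insert(node, name):
    """Insert name into an unbalanced binary search tree; return the root."""
    if node is None:
        return [name, None, None]
    current = node
    while True:
        branch = 1 if name < current[0] else 2   # 1 = left child, 2 = right child
        if current[branch] is None:
            current[branch] = [name, None, None]
            return node
        current = current[branch]


def height(node):
    """Number of nodes on the longest root-to-leaf path (iterative)."""
    best, stack = 0, [(node, 1)]
    while stack:
        current, depth = stack.pop()
        if current is None:
            continue
        best = max(best, depth)
        stack.append((current[1], depth + 1))
        stack.append((current[2], depth + 1))
    return best


def build(names):
    root = None
    for name in names:
        root = insert(root, name)
    return root


names = [f"v{i:03d}" for i in range(200)]     # already in sorted order
sorted_height = height(build(names))          # every node has only a right child

shuffled = names[:]
random.Random(0).shuffle(shuffled)
shuffled_height = height(build(shuffled))     # typically far shallower
```

With sorted input the tree is a chain of 200 nodes, so a lookup costs the same as in a linear list.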
Bucket hashing (open hashing)
A hash table consisting of a fixed array of m pointers to table entries.
Table entries are organised as separate linked lists called buckets.
Use the hash function to obtain an integer from 0 to m-1.
As long as h distributes names fairly uniformly (and the number of names is within a small constant factor of the number of buckets), bucket hashing behaves reasonably well.
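
A minimal sketch of bucket hashing, under stated assumptions: the hash function is a toy one (sum of character codes modulo m) chosen only so that collisions are easy to produce, and the names Node, BucketTable, insert and lookup are hypothetical.

```python
class Node:
    """One table entry in a bucket's singly linked list."""

    def __init__(self, name, info, next_node):
        self.name = name
        self.info = info
        self.next = next_node


class BucketTable:
    def __init__(self, m=8):
        self.m = m
        self.buckets = [None] * m      # fixed array of m list heads

    def _index(self, name):
        # Toy hash (an assumption): sum of character codes, reduced to 0..m-1.
        return sum(ord(ch) for ch in name) % self.m

    def insert(self, name, info):
        i = self._index(name)
        # New entries go at the front of the bucket's list.
        self.buckets[i] = Node(name, info, self.buckets[i])

    def lookup(self, name):
        node = self.buckets[self._index(name)]
        while node is not None:
            if node.name == name:
                return node.info
            node = node.next
        return None                    # name is not in the table


table = BucketTable(m=8)
table.insert("ab", "int")
table.insert("ba", "float")            # same character sum as "ab"
same_bucket = table._index("ab") == table._index("ba")
```

Both names land in the same bucket, yet each is still found, because a lookup walks the bucket's list and compares names.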
Linear Rehashing (open addressing)
Use a single large table to hold records. When a collision is encountered, use a simple technique (e.g., add a constant) to compute subsequent indices into the table until an empty slot is found or the table is full. If the constant is relatively prime to the table size, this will eventually check every slot in the table.

Disadvantages: too many collisions may degrade performance. Expanding the table may not be straightforward.
Example: if h(a) = h(b), rehash (say, add 3).
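
The rehashing rule can be sketched as below, assuming a table of 8 slots and a rehash constant of 3 (relatively prime to 8). The toy hash function and the names OpenAddressTable, insert and lookup are assumptions for illustration only; deletion and table expansion are deliberately left out.

```python
from math import gcd


class OpenAddressTable:
    def __init__(self, size=8, step=3):
        # The step must be relatively prime to the size to reach every slot.
        if gcd(size, step) != 1:
            raise ValueError("step must be relatively prime to size")
        self.size = size
        self.step = step
        self.slots = [None] * size     # each slot holds (name, info) or None

    def _home(self, name):
        # Toy hash (an assumption): sum of character codes modulo the size.
        return sum(ord(ch) for ch in name) % self.size

    def _probe(self, name):
        """Yield the home slot, then each rehashed slot, size slots in total."""
        i = self._home(name)
        for _ in range(self.size):
            yield i
            i = (i + self.step) % self.size

    def insert(self, name, info):
        for i in self._probe(name):
            if self.slots[i] is None or self.slots[i][0] == name:
                self.slots[i] = (name, info)
                return i
        raise RuntimeError("symbol table is full")

    def lookup(self, name):
        for i in self._probe(name):
            if self.slots[i] is None:
                return None            # empty slot: the name was never inserted
            if self.slots[i][0] == name:
                return self.slots[i][1]
        return None


table = OpenAddressTable(size=8, step=3)
slot_a = table.insert("ab", "int")
slot_b = table.insert("ba", "float")   # h("ba") == h("ab"), so rehash by adding 3
visited = sorted(table._probe("ab"))   # the probe sequence covers every slot
```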
Linear Rehashing (open addressing)
Choosing a good hash function is of paramount importance:
take the hash key in four-byte chunks, XOR the chunks together and take this number modulo 2048 (this is the symbol table size). What is the problem with this?
See the universal hashing function (Cormen, Leiserson, Rivest), Knuth's multiplicative function… This is one of those cases where we should pay attention to theory!
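
One way to explore the question above is to code the XOR scheme and look for collisions. The sketch below is an assumption-based illustration, not a definitive answer: it assumes ASCII names, zero-padding to a multiple of four bytes, and big-endian chunks, and the function name xor_chunk_hash is hypothetical.

```python
def xor_chunk_hash(name, table_size=2048):
    """XOR the key's four-byte chunks together, then reduce modulo table_size."""
    data = name.encode("ascii")
    data += b"\0" * (-len(data) % 4)          # zero-pad to a multiple of four bytes
    acc = 0
    for i in range(0, len(data), 4):
        # Assumption: each chunk is read as a big-endian 32-bit integer.
        acc ^= int.from_bytes(data[i:i + 4], "big")
    return acc % table_size


# 2048 == 2**11, so only the low 11 bits of the XOR survive. Under the
# big-endian assumption, the first two bytes of every chunk never
# influence the index.
h1 = xor_chunk_hash("abcd")
h2 = xor_chunk_hash("zzcd")

# XOR is commutative, so reordering whole chunks cannot change the result.
h3 = xor_chunk_hash("abcdefgh")
h4 = xor_chunk_hash("efghabcd")
```

Under these assumptions, h1 equals h2 and h3 equals h4, so names that differ only in the ignored bytes, or only in chunk order, always collide.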
Linear Rehashing (open addressing)
Lexical scoping:
Many languages introduce independent name scopes:
C, for example, may have global, static (file), local and block scopes.
Pascal: nested procedure declarations
C++, Java: class inheritance, nested classes
C++, Java, Modula: packages, namespaces, modules, etc…
Namespaces allow two different entities to have the same name within the same scope. E.g., in Java, a class and a method can have the same name (Java has six namespaces: packages, types, fields, methods, local variables, labels).

The problems:
At point x, which declaration of variable y is current?
As the parser goes in and out of scopes, how does it track y? Allocate and initialise a symbol table for each level!
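
A table per level can be sketched as a chain of tables, each linked to the table of its enclosing scope. The class below is a minimal illustration under assumptions: the names Env, put, get and prev are choices made for this example, and the program fragment in the comment is hypothetical.

```python
class Env:
    """One symbol table per scope, linked to the table of the enclosing scope."""

    def __init__(self, prev=None):
        self.table = {}
        self.prev = prev               # enclosing scope, None at the outermost level

    def put(self, name, info):
        self.table[name] = info

    def get(self, name):
        # Search the innermost scope first, then work outwards.
        env = self
        while env is not None:
            if name in env.table:
                return env.table[name]
            env = env.prev
        return None                    # not declared in any enclosing scope


# Tracks the hypothetical fragment  { int y; { bool y; ...point x... } ... }
top = Env()
top.put("y", "int")

inner = Env(prev=top)                  # entering the nested block
inner.put("y", "bool")
at_x = inner.get("y")                  # the inner declaration is current at point x

after_block = top.get("y")             # leaving the block: resume with the outer table
undeclared = inner.get("z")
```

Entering a scope allocates a new table; leaving it simply means going back to the previous table, which is left unchanged.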
Exercises
Explain why we use a symbol table.

Give an example of bucket hashing.


Thank you
