SE Compiler Chapter 5-Type Checking and Symbol Table

This document discusses type checking and symbol tables in compiler design. It defines what a type system and type checking are, and explains that different programming languages have different type systems to balance safety, expressiveness and other priorities. Type checking verifies that operations respect the language's type rules, and can catch errors statically at compile-time or dynamically at run-time. Strongly typed languages aim to prevent all type errors through static or dynamic checking.

Uploaded by

mikiberhanu41

Principles of Compiler Design – CENG 2042

Chapter Five – Type Checking and Symbol Table


Software Engineering Department, School of Computing
Ethiopian Institute of Technology – Mekelle (EiT-M), Mekelle University

5.1. Type Checking

Type systems are the biggest point of variation across programming languages. Even languages that look
similar are often greatly different when it comes to their type systems.

Definition: A type system is a set of types and type constructors (integers, arrays, classes, etc.) along with the
rules that govern whether or not a program is legal with respect to types (i.e., type checking).

For example, C++ and Java have similar syntax and control structures. They even have a similar set of
types (classes, arrays, etc.). However, they differ greatly with respect to the rules that determine whether or
not a program is legal with respect to types. As an extreme example, one can do this in C++ but not in Java:

int x = (int) "Hello";

In other words, Java’s type rules do not allow the above statement, but C++’s type rules do allow it on
platforms where an int is wide enough to hold a pointer value.

Why do different languages use different type systems? The reason for this is that there is no one perfect type
system. Each type system has its strengths and weaknesses. Thus, different languages use different type
systems because they have different priorities. A language designed for writing operating systems is not
appropriate for programming the web; thus they will use different type systems. When designing a type
system for a language, the language designer needs to balance the tradeoffs between execution efficiency,
expressiveness, safety, simplicity, etc.

The need for a type system


Some older languages and all machine assembly languages do not have a notion of type. A program simply
operates on memory, which consists of bytes or words. An assembly language may distinguish between
“integers” and “floats”, but that is primarily so that the machine knows whether it should use the floating
point unit or the integer unit to perform a calculation. In this type-less world, there is no type checking. There
is nothing to prevent a program from reading a random word in memory and treating it as an address
or a string or an integer or a float or an instruction.

The ability to grab a random word from memory and interpret it as if it were a specific kind of value (e.g.,
integer) is not useful most of the time even though it may be useful sometimes (e.g., when writing low-level
code such as device drivers). Thus even assembly language programmers use conventions and practices to
organize their program’s memory. For example, a particular block of memory may (by convention) be
treated as an array of integers and another block a sequence of instructions. In other words, even in a type-
less world, programmers try to impose some type structure upon their data. However, in a type-less world,
there is no automatic mechanism for enforcing the types. In other words, a programmer could mistakenly
write and run code that makes no sense with respect to types (e.g., adding 1 to an instruction). This mistake
may manifest itself as a crash when the program is run or may silently corrupt values and thus the program
produces an incorrect answer. To avoid these kinds of problems, most languages today have a rich type
system. Such a type system has types and rules that try to prevent such problems. Some researchers have also
proposed typed assembly languages recently; these would impose some kinds of type checking on assembly
language programs!

The rules in a given type system allow some programs and disallow others. For example, unlike
assembly language, one cannot write a program in Java that grabs a random word from memory and treats it
as if it were an integer. Thus, every type system represents a tradeoff between expressiveness and safety. A
type system whose rules reject all programs is completely safe but inexpressive: the compiler will
immediately reject all programs and thus also all the “unsafe” programs. A type system whose rules accept
all programs (e.g., assembly language) is very expressive but unsafe (someone may write a program that
writes random words all over memory).

Ins: Fkrezgy Yohannes, Compiler Design

What exactly are types and type checking?


We have been talking about types and type checking but have not defined them formally. We will now use a
notion of types from theory that is very helpful in understanding how types work.

Definition: A type is a set of values.

A boolean type is the set that contains True and False. An integer type is the set that contains all integers
from minint to maxint (where minint and maxint may be defined by the type system itself, as in Java, or by
the underlying hardware, as in C). A float type contains all floating point values. An integer array of length
2 is the set that contains all length-2 sequences of integers. A string type is the set that contains all strings.
For some types the set may be enumerable (i.e., we can write down all the members of the set, such as True
and False). For other types the set may not be enumerable, or may at least be very hard to enumerate (e.g.,
String).

When we declare a variable to be of a particular type, we are effectively restricting the set of values that the
variable can hold. For example, if we declare a variable to be of type Boolean, (i) we can only put True or
False into the variable; and (ii) when we read the contents of the variable it must be either True or False.

There are three categories of types in most programming languages:

Base types: int, float, double, char, bool, etc. These are the primitive types provided directly by
the underlying hardware. There may be a facility for user-defined variants on the base types (such as
C enums).

Compound types: arrays, pointers, records, structs, unions, classes, and so on. These types are
constructed as aggregations of the base types and simple compound types.

Complex types: lists, stacks, queues, trees, heaps, tables, etc. You may recognize these as abstract data
types. A language may or may not have support for these sorts of higher-level abstractions.

Function declarations or prototypes serve the same purpose for functions that variable declarations serve for
variables. Function and method identifiers also have a type, and the compiler can use it to ensure that a program is
calling a function/method correctly. The compiler uses the prototype to check the number and types of
arguments in function calls. The location and qualifiers establish the visibility of the function (Is the function
global? Local to the module? Nested in another procedure? Attached to a class?). Type declarations (e.g., C
typedef, C++ classes) behave similarly with respect to declaration and use of the new type name.

Now that we have defined what a type is, we can define what it means to do type checking.
Definition: Type checking checks and enforces the rules of the type system to prevent type errors
from happening. (OR Type checking is the process of verifying that each operation executed in a
program respects the type system of the language.)
Definition: A type error happens when an expression produces a value outside the set of values it is
supposed to have.

For example, consider the following assignment:


VAR s: String;
s := 1234;

Since s has type String, it can only hold values of type String. Thus, the expression on the right-hand-side of
the assignment must be of type String. However, the expression evaluates to the value 1234 which does not
belong to the set of values in String. Thus the above is a type error.

Consider another example:


VAR b: Byte;
b := 12345;

Since b has type Byte, it can only hold values from -128 to 127. Since the expression on the right-hand-side
evaluates to a value that does not belong to the set of values in Byte, this is also a type error.
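
The two examples above can be mimicked with a small sketch that treats a type literally as a set of values and checks membership on assignment. The names `TYPES`, `assign`, and `TypeErrorFound` are hypothetical, and the Byte range of -128 to 127 is the one assumed in the text.

```python
# A type is modelled as a membership test over values, i.e. as a set of values.
# All names here are illustrative and not part of any real compiler.

def is_string(value):
    return isinstance(value, str)


def is_byte(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return (isinstance(value, int)
            and not isinstance(value, bool)
            and -128 <= value <= 127)


TYPES = {"String": is_string, "Byte": is_byte}


class TypeErrorFound(Exception):
    """Raised when a value lies outside the set denoted by the declared type."""


def assign(declared_type, value):
    """Return value if it belongs to declared_type, otherwise raise."""
    if not TYPES[declared_type](value):
        raise TypeErrorFound(f"{value!r} is not a member of {declared_type}")
    return value
```

With this sketch, `assign("String", 1234)` and `assign("Byte", 12345)` both raise `TypeErrorFound`, mirroring the two type errors discussed above.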

Definition: Strong type checking prevents all type errors from happening. The checking may happen
at compile time or at run time or partly at compile time and partly at run time.
Definition: A strongly-typed language is one that uses strong type checking.
Definition: Weak type checking does not prevent type errors from happening.
Definition: A weakly-typed language is one that uses weak type checking.

Static and Dynamic Checking of Types

Static type checking is done at compile-time. The information the type checker needs is obtained via
declarations and stored in a master symbol table. After this information is collected, the types involved in
each operation are checked. It is very difficult for a language that only does static type checking to meet the
full definition of strongly typed. Even motherly old Pascal, which would appear to be so because of its use of
declarations and strict type rules, cannot find every type error at compile time. This is because many type
errors can sneak through the type checker. For example, if a and b are of type int and we assign very large
values to them, a * b may not be in the acceptable range of ints, or an attempt to compute the ratio
between two integers may raise a division by zero. These kinds of type errors usually cannot be detected at
compile time. C makes a somewhat paltry attempt at strong type checking: things such as the lack of array
bounds checking and no enforcement of variable initialization or function return create loopholes. The typecast
operation is particularly dangerous. By taking the address of a location, casting to something inappropriate,
dereferencing and assigning, you can wreak havoc on the type rules. The typecast basically suspends type
checking, which, in general, is a pretty risky thing to do.

Dynamic type checking is implemented by including type information for each data location at runtime. For
example, a variable of type double would contain both the actual double value and some kind of tag
indicating "double type". The execution of any operation begins by first checking these type tags. The
operation is performed only if everything checks out. Otherwise, a type error occurs and usually halts
execution. For example, when an add operation is invoked, it first examines the type tags of the two operands
to ensure they are compatible. LISP is an example of a language that relies on dynamic type checking.
Because LISP does not require the programmer to state the types of variables at compile time, the compiler
cannot perform any analysis to determine if the type system is being violated. But the runtime type system
takes over during execution and ensures that type integrity is maintained. Dynamic type checking clearly
comes with a runtime performance penalty, but it is usually much more difficult to subvert and can report errors
that are not possible to detect at compile-time.
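
The tag-checking idea described above can be sketched as follows. This is a minimal illustration, not how any particular LISP system works; the names `Tagged`, `RuntimeTypeError`, and `add`, and the rule that both operands must carry the same numeric tag, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    tag: str       # runtime type tag, e.g. "int" or "double"
    value: object  # the actual payload


class RuntimeTypeError(Exception):
    """Signals a type error detected during execution."""


def add(left, right):
    """Check the operands' tags first; add only if everything checks out."""
    numeric = {"int", "double"}
    if left.tag not in numeric or right.tag not in numeric:
        raise RuntimeTypeError(f"cannot add {left.tag} and {right.tag}")
    if left.tag != right.tag:
        # Assumption: no implicit conversion between int and double.
        raise RuntimeTypeError(f"operand tags differ: {left.tag} vs {right.tag}")
    return Tagged(left.tag, left.value + right.value)
```

Every call to `add` pays for the tag comparison before doing any arithmetic, which is the runtime penalty mentioned above.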

Designing a Type Checker


When designing a type checker for a compiler, here’s the process:
1. identify the types that are available in the language
2. identify the language constructs that have types associated with them
3. identify the semantic rules for the language



Specification of simple type checker
1. Type checking of Expression
In the following rules, the attribute type for E gives the type expression assigned to the expression
generated by E.

a. E → literal {E.type = char}


E → num {E.type = integer}

Here, constants represented by the tokens literal and num have type of char and integer.

b. E → id {E.type = lookup (id.entry)}

Lookup(e) is used to fetch the type saved in the symbol table entry pointed to by e.

c. E → E1 mod E2 {E.type = if E1.type = integer and E2.type = integer Then integer else
type_error}

The expression formed by applying the mod operator to two sub-expressions of type integer has
type integer; otherwise, its type is type_error

d. E → E1[E2] { E.type = if E2.type = integer and E1.type = array(s,t) then t else
type_error}

In an array reference E1[E2], the index expression E2 must have type integer. The result is the
element type t obtained from the type array(s,t) of E1.

e. E → E1 ↑ { E.type = if E1.type = pointer (t) then t else type_error }

The postfix operator ↑ yields the object pointed to by its operand. The type of E1 ↑ is the type t of
the object pointed to by the pointer E1.
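
Rules (a) to (e) can be sketched as a recursive function over a small expression tree. The tuple encodings of expressions and type expressions, the dictionary standing in for the symbol table, and the choice to give an undeclared identifier the type `type_error` are all assumptions made for this illustration.

```python
TYPE_ERROR = "type_error"

# Type expressions: "char", "integer", ("array", s, t), ("pointer", t).
# Expressions: ("literal",), ("num",), ("id", name), ("mod", e1, e2),
#              ("index", e1, e2) for E1[E2], ("deref", e1) for E1 ↑.


def check_expr(expr, symtab):
    """Return the type expression of expr, following rules (a) to (e)."""
    kind = expr[0]
    if kind == "literal":                       # (a) E -> literal
        return "char"
    if kind == "num":                           # (a) E -> num
        return "integer"
    if kind == "id":                            # (b) E -> id
        # Assumption: an undeclared name is reported as a type error.
        return symtab.get(expr[1], TYPE_ERROR)
    if kind == "mod":                           # (c) E -> E1 mod E2
        t1 = check_expr(expr[1], symtab)
        t2 = check_expr(expr[2], symtab)
        return "integer" if t1 == "integer" and t2 == "integer" else TYPE_ERROR
    if kind == "index":                         # (d) E -> E1[E2]
        t1 = check_expr(expr[1], symtab)
        t2 = check_expr(expr[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                        # element type t of array(s, t)
        return TYPE_ERROR
    if kind == "deref":                         # (e) E -> E1 ↑
        t1 = check_expr(expr[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]                        # pointed-to type t
        return TYPE_ERROR
    raise ValueError(f"unknown expression kind: {kind}")
```

For instance, with `symtab = {"a": ("array", 10, "char"), "i": "integer"}`, checking `("index", ("id", "a"), ("id", "i"))` follows rule (d) and yields the element type of `a`.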

2. Type checking of Statements


Statements do not have values; hence the basic type void can be assigned to them. If an error is
detected within a statement, then type_error is assigned.

Translation Scheme (SDT) for checking the type of statements:

a. Assignment Statement

S → id = E {S.type = if id.type =E.type then void else type_error}

b. Conditional Statements

S → if E then S1 { S.type = if E.type = boolean then S1.type else type_error }

c. While Statements

S → while E do S1 { S.type = if E.type = boolean then S1.type else type_error }
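
The three statement rules can be sketched in the same style. To keep the example short, each statement carries the already-computed types of its identifier and expressions; this encoding and the name `check_stmt` are hypothetical. The extra guard against two matching `type_error` operands is an assumption that goes slightly beyond the rule as written.

```python
VOID = "void"
TYPE_ERROR = "type_error"

# Statements: ("assign", id_type, expr_type)
#             ("if", cond_type, body) and ("while", cond_type, body)


def check_stmt(stmt):
    """Return void for a well-typed statement and type_error otherwise."""
    kind = stmt[0]
    if kind == "assign":                  # S -> id = E
        _, id_type, expr_type = stmt
        # Assumption: two operands that are both type_error do not "match".
        if id_type == expr_type and id_type != TYPE_ERROR:
            return VOID
        return TYPE_ERROR
    if kind in ("if", "while"):           # S -> if E then S1 | while E do S1
        _, cond_type, body = stmt
        return check_stmt(body) if cond_type == "boolean" else TYPE_ERROR
    raise ValueError(f"unknown statement kind: {kind}")
```

Because the conditional and loop rules return `S1.type`, an error inside a nested body propagates outward to the enclosing statement.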



3. Type checking of functions
The rule for checking the type of a function application is:

a. E → E1(E2) { E.type = if E2.type = s and E1.type = s → t then t else type_error}

That is, applying a function of type s → t to an argument of type s yields a result of type t.
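
A minimal sketch of function application checking, assuming the function expression E1 has a function type that maps an argument type s to a result type t. The tuple encoding `("fn", s, t)` and the names `fn` and `check_apply` are hypothetical.

```python
TYPE_ERROR = "type_error"


def fn(domain, result):
    """Build the type expression for a function from domain to result."""
    return ("fn", domain, result)


def check_apply(fn_type, arg_type):
    """Type of E1(E2): the result type t if E1 maps s to t and E2 has type s."""
    is_function = (isinstance(fn_type, tuple)
                   and len(fn_type) == 3
                   and fn_type[0] == "fn")
    if is_function and fn_type[1] == arg_type:
        return fn_type[2]
    return TYPE_ERROR
```

Applying `fn("integer", "char")` to an `"integer"` argument yields `"char"`, while applying it to a `"char"` argument, or applying a non-function type, yields `type_error`.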

Example: Consider the following SDT and draw the annotated parse tree for the input (2 + 3) == 8

E → E1 + E2 {if ((E1.type = E2.type) and (E1.type = int)) then E.type = int else error;}
E → E1 == E2 {if ((E1.type = E2.type) and (E1.type = int or E1.type = bool)) then E.type = bool else error;}
E → (E1) {E.type = E1.type;}
E → num {E.type = int;}
E → true {E.type = bool;}
E → false {E.type = bool;}

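Solution (annotated parse tree, shown as an indented outline; each child is indented under its parent):

E (E.type = bool)
    E (E.type = int)
        (
        E (E.type = int)
            E (E.type = int)
                num = 2
            +
            E (E.type = int)
                num = 3
        )
    ==
    E (E.type = int)
        num = 8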
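
The bottom-up evaluation of the type attribute in this example can be reproduced with a short sketch. The tuple encoding of the parse tree and the function name `type_of` are assumptions made for illustration; the rules themselves are the six productions listed above.

```python
def type_of(node):
    """Compute the synthesized attribute E.type for a parse tree node."""
    kind = node[0]
    if kind == "num":                       # E -> num
        return "int"
    if kind in ("true", "false"):           # E -> true | false
        return "bool"
    if kind == "paren":                     # E -> (E1)
        return type_of(node[1])
    if kind == "+":                         # E -> E1 + E2
        t1, t2 = type_of(node[1]), type_of(node[2])
        return "int" if t1 == t2 and t1 == "int" else "error"
    if kind == "==":                        # E -> E1 == E2
        t1, t2 = type_of(node[1]), type_of(node[2])
        return "bool" if t1 == t2 and t1 in ("int", "bool") else "error"
    raise ValueError(f"unknown node kind: {kind}")


# Parse tree for the input (2 + 3) == 8
tree = ("==", ("paren", ("+", ("num", 2), ("num", 3))), ("num", 8))
```

Calling `type_of(tree)` walks the tree in the same order as the annotated parse tree: the two `num` leaves give `int`, the sum gives `int`, the parentheses pass it through, and the comparison gives `bool`.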

5.2. Symbol Table

A symbol table is an important data structure created and maintained by compilers in order to store information
about the occurrence of various entities such as variable names, function names, objects, classes, interfaces,
etc. The symbol table is used by both the analysis and the synthesis parts of a compiler.

A symbol table may serve the following purposes, depending on the language at hand:
· To store the names of all entities in a structured form at one place.
· To verify if a variable has been declared.
· To implement type checking, by verifying assignments and expressions in the source code are
semantically correct.
· To determine the scope of a name (scope resolution).

We will store the following information about identifiers.


· The name (as a string).
· The data type.
· The block level.
· Its scope (global, local, or parameter).
· Its offset from the base pointer (for local variables and parameters only).
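
The five pieces of information listed above map naturally onto a record type. The following is one possible layout, not a prescribed one; the class name `SymbolEntry`, the field names, and the validation rules are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolEntry:
    name: str                     # the identifier, as a string
    data_type: str                # e.g. "int"
    block_level: int              # nesting depth of the declaring block
    scope: str                    # "global", "local", or "parameter"
    offset: Optional[int] = None  # offset from the base pointer

    def __post_init__(self):
        if self.scope not in ("global", "local", "parameter"):
            raise ValueError(f"unknown scope: {self.scope}")
        # Only local variables and parameters carry a base-pointer offset.
        if self.scope == "global" and self.offset is not None:
            raise ValueError("a global has no base-pointer offset")
        if self.scope != "global" and self.offset is None:
            raise ValueError("locals and parameters need an offset")
```

For example, `SymbolEntry("x", "int", 1, "local", -4)` describes a local integer at offset -4, while a global is created without an offset.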



Operations of Symbol Table
insert()
This operation is used most frequently by the analysis phase, i.e., the first half of the compiler, where tokens are
identified and names are stored in the table. It adds information to the symbol table
about unique names occurring in the source code. The format or structure in which the names are stored
depends on the compiler at hand.

An attribute of a symbol in the source code is the information associated with that symbol. This information
contains the value, state, scope, and type of the symbol. The insert() function takes the symbol and its
attributes as arguments and stores the information in the symbol table.

For example:
int a;

should be processed by the compiler as:

insert(a, int);

lookup()
The lookup() operation is used to search for a name in the symbol table to determine:
· if the symbol exists in the table.
· if it is declared before it is used.
· if the name is used in the scope.
· if the symbol is initialized.
· if the symbol is declared multiple times.

The format of lookup() function varies according to the programming language. The basic format should
match the following:

lookup(symbol)

This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists in the
symbol table, it returns its attributes stored in the table.

delete(s):
deletes s from the table (or, typically, hides it).
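
The three operations can be sketched together as a small class. This is an illustration under stated assumptions: the class name `SymbolTable` is hypothetical, `lookup` returns Python's `None` where the text says 0, `delete` hides the entry rather than erasing it, and redeclaring a visible name is treated as an error.

```python
class SymbolTable:
    def __init__(self):
        self._entries = {}    # name -> dictionary of attributes
        self._hidden = set()  # names that delete() has hidden

    def insert(self, name, data_type, **attributes):
        """Store a name with its type and any further attributes."""
        if name in self._entries and name not in self._hidden:
            raise KeyError(f"{name} is already declared")
        self._entries[name] = {"type": data_type, **attributes}
        self._hidden.discard(name)

    def lookup(self, name):
        """Return the stored attributes, or None if the name is not visible."""
        if name in self._hidden:
            return None
        return self._entries.get(name)

    def delete(self, name):
        """Hide the name so that later lookups no longer find it."""
        if name in self._entries:
            self._hidden.add(name)
```

Processing the declaration `int a;` then corresponds to `table.insert("a", "int")`, after which `table.lookup("a")` returns the stored attributes.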

5.2.1. Symbol Table Implementation

· Each entry in a symbol table can be implemented as a record that consists of several fields.
· The entries in symbol table records are not uniform and depend on the program element identified
by the name.
· Some information about the name may be kept outside of the symbol table record and/or some fields
of the record may be left vacant for the reason of uniformity. A pointer to this information may be
stored in the record.
· The name may be stored in the symbol table record itself, or it can be stored in a separate array of
characters and a pointer to it in the symbol table.

Symbol table organization



• There are various approaches to symbol table organization.
a. Linear List
b. Binary Search Tree
c. Hash Table

a. Linear List

• It is the simplest approach in symbol table organization.


• The new names are added to the table in the order they arrive.
• Whenever a new name is to be added to the table:
§ The table is first searched linearly or sequentially to check whether or not the name is already
present in the table.
§ If the name is not present, then the record for the new name is created and added to the list at the
position specified by the available pointer, which marks the next free slot.
• To retrieve the information about a name, the table is searched sequentially, starting from the first record
in the table.
• The average number of comparisons, p, is p = (n+1)/2 for a successful search
and p = n for an unsuccessful search, where n is the number of entries in the table.

Advantage: It takes less space, and additions to the table are simple.
Disadvantage: It has a higher accessing time.
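
The linear list organization and its comparison counts can be illustrated with a short sketch. The class name `LinearSymbolTable` and the `comparisons` counter are hypothetical additions for the demonstration; attributes are assumed to be stored as dictionaries.

```python
class LinearSymbolTable:
    def __init__(self):
        self._records = []     # (name, attributes) pairs in arrival order
        self.comparisons = 0   # name comparisons performed so far

    def lookup(self, name):
        """Search sequentially from the first record."""
        for record_name, attributes in self._records:
            self.comparisons += 1
            if record_name == name:
                return attributes
        return None

    def insert(self, name, attributes):
        """Append the name unless the linear search finds it already."""
        if self.lookup(name) is not None:
            return False
        self._records.append((name, attributes))
        return True
```

With n = 5 names in the table, looking up each name once costs 1 + 2 + 3 + 4 + 5 = 15 comparisons, an average of (n+1)/2 = 3, while a lookup of an absent name costs all n = 5 comparisons.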

b. Binary Search Tree

• It is more efficient to search than a linear list.
• Each record has two links, left and right, which point to other records in the search tree.
• A new name is added at a proper location in the tree such that it can be accessed alphabetically.
• For any node name1 in the tree, all names accessible by following the left link precede name1 alphabetically.
• Similarly, for any node name1 in the tree, all names accessible by following the right link succeed name1
alphabetically.
• If names arrive in random order, the expected time to enter n names and make m inquiries is proportional
to (m+n) log2 n.
• Stated in symbols, the property of this tree is:
§ Every name j accessible from name i through LEFT(i) is less than name i, i.e., Name j < Name i.
§ Every name k accessible from name i through RIGHT(i) is greater than name i, i.e., Name k > Name i.
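
A binary search tree organization along these lines might look as follows. This is an unbalanced tree written for illustration; the class and method names are hypothetical, and Python's string ordering stands in for alphabetical order.

```python
class Node:
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.left = None    # names that precede self.name
        self.right = None   # names that follow self.name


class BSTSymbolTable:
    def __init__(self):
        self.root = None

    def insert(self, name, attributes):
        """Add name at its ordered position; return False if already present."""
        if self.root is None:
            self.root = Node(name, attributes)
            return True
        node = self.root
        while True:
            if name == node.name:
                return False
            branch = "left" if name < node.name else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, Node(name, attributes))
                return True
            node = child

    def lookup(self, name):
        """Follow left or right links until the name is found or a link is empty."""
        node = self.root
        while node is not None:
            if name == node.name:
                return node.attributes
            node = node.left if name < node.name else node.right
        return None

    def names_in_order(self):
        """In-order traversal, which visits the names alphabetically."""
        result, stack, node = [], [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            result.append(node.name)
            node = node.right
        return result
```

An in-order traversal returns the names in sorted order, which is a direct consequence of the left and right link property stated above.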

c. Hash Table

· A hash table is an array with index range 0 to TableSize – 1.
· It is the most commonly used data structure for implementing symbol tables.
· Insertion and lookup can be made very fast: O(1) on average.
· A hash function maps an identifier name to a table index.
· A hash function, h(name), should depend solely on name.
· h(name) should be computed quickly.
· h should be uniform and randomizing in distributing names.
· All table indices should be mapped with equal probability.
· Similar names should not cluster at the same table index.
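
A hash table organization can be sketched as below. The text does not say how collisions are resolved, so separate chaining is an assumption here, as are the class name `HashSymbolTable`, the default table size of 211, and the multiplier 31 in the hash function.

```python
class HashSymbolTable:
    def __init__(self, table_size=211):
        self.table_size = table_size
        # One chain (list of records) per table index 0 .. table_size - 1.
        self.buckets = [[] for _ in range(table_size)]

    def h(self, name):
        """Map a name to a table index; depends solely on the name's characters."""
        value = 0
        for ch in name:
            value = (value * 31 + ord(ch)) % self.table_size
        return value

    def insert(self, name, attributes):
        """Add the name to its chain; return False if it is already present."""
        bucket = self.buckets[self.h(name)]
        for existing_name, _ in bucket:
            if existing_name == name:
                return False
        bucket.append((name, attributes))
        return True

    def lookup(self, name):
        """Search only the chain selected by h(name)."""
        for existing_name, attributes in self.buckets[self.h(name)]:
            if existing_name == name:
                return attributes
        return None
```

Because every character and its position contribute to the index, similar names such as `temp1` and `temp2` are not forced into the same chain, and a lookup inspects only one chain instead of the whole table.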
