
Principles of Compiler Design – CENG 2042

Chapter Five – Type Checking and Symbol Table

Software Engineering Department, School of Computing
Ethiopian Institute of Technology – Mekelle (EiT-M), Mekelle University

5.1. Type Checking

Type systems are the biggest point of variation across programming languages. Even languages that look
similar are often greatly different when it comes to their type systems.

Definition: A type system is a set of types and type constructors (integers, arrays, classes, etc.) along with the
rules that govern whether or not a program is legal with respect to types (i.e., type checking).

For example, C++ and Java have similar syntax and control structures. They even have a similar set of
types (classes, arrays, etc.). However, they differ greatly with respect to the rules that determine whether or
not a program is legal with respect to types. As an extreme example, one can do this in C++ but not in Java:

int x = (int) "Hello";

In other words, Java’s type rules do not allow the above statement, but C++’s type rules do allow it (provided
int is wide enough to hold a pointer; on platforms where it is not, the compiler rejects the narrowing cast).

Why do different languages use different type systems? The reason for this is that there is no one perfect type
system. Each type system has its strengths and weaknesses. Thus, different languages use different type
systems because they have different priorities. A language designed for writing operating systems is not
appropriate for programming the web; thus they will use different type systems. When designing a type
system for a language, the language designer needs to balance the tradeoffs between execution efficiency,
expressiveness, safety, simplicity, etc.

The need for a type system

Some older languages and all machine assembly languages do not have a notion of type. A program simply
operates on memory, which consists of bytes or words. An assembly language may distinguish between
“integers” and “floats”, but that is primarily so that the machine knows whether it should use the floating
point unit or the integer unit to perform a calculation. In this type-less world, there is no type checking. There
is nothing to prevent a program from reading a random word in memory and treating it as an address
or a string or an integer or a float or an instruction.

The ability to grab a random word from memory and interpret it as if it were a specific kind of value (e.g.,
integer) is not useful most of the time even though it may be useful sometimes (e.g., when writing low-level
code such as device drivers). Thus even assembly language programmers use conventions and practices to
organize their program’s memory. For example, a particular block of memory may (by convention) be
treated as an array of integers and another block a sequence of instructions. In other words, even in a type-
less world, programmers try to impose some type structure upon their data. However, in a type-less world,
there is no automatic mechanism for enforcing the types. In other words, a programmer could mistakenly
write and run code that makes no sense with respect to types (e.g., adding 1 to an instruction). This mistake
may manifest itself as a crash when the program is run or may silently corrupt values and thus the program
produces an incorrect answer. To avoid these kinds of problems, most languages today have a rich type
system. Such a type system has types and rules that try to prevent such problems. Some researchers have also
proposed typed assembly languages recently; these would impose some kinds of type checking on assembly
language programs!

The rules in a given type system allow some programs and disallow others. For example, unlike
assembly language, one cannot write a program in Java that grabs a random word from memory and treats it
as if it were an integer. Thus, every type system represents a tradeoff between expressiveness and safety. A
type system whose rules reject all programs is completely safe but inexpressive: the compiler will
immediately reject all programs and thus also all the “unsafe” programs. A type system whose rules accept
all programs (e.g., assembly language) is very expressive but unsafe (someone may write a program that
writes random words all over memory).

Ins: Fkrezgy Yohannes Compiler Design

What exactly are types and type checking?

We have been talking about types and type checking but have not defined them formally. We will now use a
notion of types from theory that is very helpful in understanding how types work.

Definition: A type is a set of values.

A boolean type is the set that contains True and False. An integer type is the set that contains all integers
from minint to maxint (where minint and maxint may be defined by the type system itself, as in Java, or by
the underlying hardware, as in C). A float type contains all floating point values. An integer array of length
2 is the set that contains all length-2 sequences of integers. A string type is the set that contains all strings.
For some types the set may be enumerable (i.e., we can write down all the members of the set, such as True
and False). For other types the set may not be enumerable, or may at least be very hard to enumerate (e.g.,
String).

When we declare a variable to be of a particular type, we are effectively restricting the set of values that the
variable can hold. For example, if we declare a variable to be of type Boolean, (i) we can only put True or
False into the variable; and (ii) when we read the contents of the variable it must be either True or False.

There are three categories of types in most programming languages:

Base types: int, float, double, char, bool, etc. These are the primitive types provided directly by
the underlying hardware. There may be a facility for user-defined variants on the base types (such as
C enums).

Compound types: arrays, pointers, records, structs, unions, classes, and so on. These types are
constructed as aggregations of the base types and simple compound types.

Complex types: lists, stacks, queues, trees, heaps, tables, etc. You may recognize these as abstract data
types. A language may or may not have support for these sorts of higher-level abstractions.

Function declarations or prototypes serve the same purpose for functions that variable declarations serve for
variables. Function and method identifiers also have a type, and the compiler can use it to ensure that a program is
calling a function/method correctly. The compiler uses the prototype to check the number and types of
arguments in function calls. The location and qualifiers establish the visibility of the function (Is the function
global? Local to the module? Nested in another procedure? Attached to a class?). Type declarations (e.g., C
typedef, C++ classes) behave similarly with respect to declaration and use of the new type name.

Now that we have defined what a type is, we can define what it means to do type checking.
Definition: Type checking checks and enforces the rules of the type system to prevent type errors
from happening. (OR Type checking is the process of verifying that each operation executed in a
program respects the type system of the language.)
Definition: A type error happens when an expression produces a value outside the set of values it is
supposed to have.

For example, consider the following assignment:

VAR s:String;

s := 1234;

Since s has type String, it can only hold values of type String. Thus, the expression on the right-hand-side of
the assignment must be of type String. However, the expression evaluates to the value 1234 which does not
belong to the set of values in String. Thus the above is a type error.

Consider another example:

VAR b: Byte;
b := 12345;

Since b has type Byte, it can only hold values from -128 to 127 (assuming Byte is a signed 8-bit type). Since
the expression on the right-hand-side evaluates to a value that does not belong to the set of values in Byte,
this is also a type error.
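
The two examples above can be sketched in Python by treating each type as a set of values and a type error as a value that falls outside that set. This is only an illustration: the helper names (in_string, in_byte, is_type_error) are hypothetical, and Byte is assumed to be a signed 8-bit integer.

```python
def in_string(value):
    # The String type: the set of all strings.
    return isinstance(value, str)


def in_byte(value):
    # Assumption: Byte is a signed 8-bit integer, i.e. the set {-128, ..., 127}.
    # bool is excluded because Python treats it as a subclass of int.
    return isinstance(value, int) and not isinstance(value, bool) and -128 <= value <= 127


def is_type_error(value, belongs_to_type):
    # A type error: the value lies outside the set of values of the declared type.
    return not belongs_to_type(value)


print(is_type_error(1234, in_string))   # s := 1234   -> True
print(is_type_error(12345, in_byte))    # b := 12345  -> True
print(is_type_error(100, in_byte))      # b := 100    -> False
```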

Definition: Strong type checking prevents all type errors from happening. The checking may happen
at compile time or at run time or partly at compile time and partly at run time.
Definition: A strongly-typed language is one that uses strong type checking.
Definition: Weak type checking does not prevent type errors from happening.
Definition: A weakly-typed language is one that uses weak type checking.

Static and Dynamic Checking of Types

Static type checking is done at compile-time. The information the type checker needs is obtained via
declarations and stored in a master symbol table. After this information is collected, the types involved in
each operation are checked. It is very difficult for a language that only does static type checking to meet the
full definition of strongly typed. Even motherly old Pascal, which would appear to be so because of its use of
declarations and strict type rules, cannot find every type error at compile time. This is because many type
errors can sneak through the type checker. For example, if a and b are of type int and we assign very large
values to them, a * b may not be in the acceptable range of ints, or an attempt to compute the ratio
between two integers may raise a division by zero. These kinds of type errors usually cannot be detected at
compile time. C makes a somewhat paltry attempt at strong type checking: things such as the lack of array
bounds checking and no enforcement of variable initialization or function return create loopholes. The typecast
operation is particularly dangerous. By taking the address of a location, casting to something inappropriate,
dereferencing and assigning, you can wreak havoc on the type rules. The typecast basically suspends type
checking, which, in general, is a pretty risky thing to do.

Dynamic type checking is implemented by including type information for each data location at runtime. For
example, a variable of type double would contain both the actual double value and some kind of tag
indicating "double type". The execution of any operation begins by first checking these type tags. The
operation is performed only if everything checks out. Otherwise, a type error occurs and usually halts
execution. For example, when an add operation is invoked, it first examines the type tags of the two operands
to ensure they are compatible. LISP is an example of a language that relies on dynamic type checking.
Because LISP does not require the programmer to state the types of variables at compile time, the compiler
cannot perform any analysis to determine if the type system is being violated. But the runtime type system
takes over during execution and ensures that type integrity is maintained. Dynamic type checking clearly
comes with a runtime performance penalty, but it is usually much more difficult to subvert and can report errors
that are not possible to detect at compile-time.
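
A minimal sketch of the tagging idea, in Python: every runtime value carries a type tag, and the add operation inspects the tags before doing any work. The names Tagged, RuntimeTypeError and add are hypothetical, and the rule that both operands must carry the same numeric tag is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    tag: str       # runtime type tag, e.g. "int" or "double"
    value: object  # the actual data


class RuntimeTypeError(Exception):
    """Raised when the type tags of the operands are incompatible."""


def add(left, right):
    # Check the type tags first; perform the operation only if they agree.
    if left.tag != right.tag or left.tag not in ("int", "double"):
        raise RuntimeTypeError(f"cannot add {left.tag} and {right.tag}")
    return Tagged(left.tag, left.value + right.value)


print(add(Tagged("double", 1.5), Tagged("double", 2.0)))
try:
    add(Tagged("int", 1), Tagged("string", "x"))
except RuntimeTypeError as err:
    print("type error:", err)
```

The tag comparison on every call is where the runtime cost mentioned above comes from.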

Designing a Type Checker

When designing a type checker for a compiler, here’s the process:
1. identify the types that are available in the language
2. identify the language constructs that have types associated with them
3. identify the semantic rules for the language

Specification of simple type checker
1. Type checking of Expression
In the following rules, the attribute type for E gives the type expression assigned to the expression
generated by E.

a. E → literal {E.type = char}

E → num {E.type = integer}

Here, constants represented by the tokens literal and num have types char and integer, respectively.

b. E → id {E.type = lookup (id.entry)}

lookup(e) is used to fetch the type saved in the symbol table entry pointed to by e.

c. E → E1 mod E2 {E.type = if E1.type = integer and E2.type = integer Then integer else
type_error}

The expression formed by applying the mod operator to two sub-expressions of type integer has
type integer; otherwise, its type is type_error

d. E → E1[E2] { E.type = if E2.type = integer and E1.type = array(s,t) then t else
type_error }

In an array reference E1[E2], the index expression E2 must have type integer. The result is the
element type t obtained from the type array(s,t) of E1.

e. E → E1 ↑ { E.type = if E1.type = pointer (t) then t else type_error }

The postfix operator ↑ yields the object pointed to by its operand. The type of E1 ↑ is the type t of
the object pointed to by the pointer E1.
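
Rules a to e can be sketched as one recursive Python function over a small expression tree. The tuple encodings of expressions and type expressions, the name check_expr, and the choice to give an undeclared identifier the type type_error are all assumptions made for this illustration.

```python
TYPE_ERROR = "type_error"


def check_expr(node, symtab):
    """Return the type expression of node; symtab maps identifier names to types."""
    kind = node[0]
    if kind == "literal":                      # E -> literal
        return "char"
    if kind == "num":                          # E -> num
        return "integer"
    if kind == "id":                           # E -> id (lookup in the symbol table)
        return symtab.get(node[1], TYPE_ERROR)
    if kind == "mod":                          # E -> E1 mod E2
        t1 = check_expr(node[1], symtab)
        t2 = check_expr(node[2], symtab)
        return "integer" if t1 == "integer" and t2 == "integer" else TYPE_ERROR
    if kind == "index":                        # E -> E1[E2]
        t1 = check_expr(node[1], symtab)
        t2 = check_expr(node[2], symtab)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[2]                       # element type t of array(s, t)
        return TYPE_ERROR
    if kind == "deref":                        # E -> E1 ^
        t1 = check_expr(node[1], symtab)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]                       # t of pointer(t)
        return TYPE_ERROR
    return TYPE_ERROR


symtab = {"a": ("array", 10, "char"), "p": ("pointer", "integer"), "i": "integer"}
print(check_expr(("index", ("id", "a"), ("mod", ("id", "i"), ("num",))), symtab))  # char
print(check_expr(("deref", ("id", "p")), symtab))                                  # integer
print(check_expr(("mod", ("literal",), ("num",)), symtab))                         # type_error
```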

2. Type checking of Statements

Statements do not have values; hence the basic type void can be assigned to them. If an error is
detected within a statement, then type_error is assigned.

Translation Scheme (SDT) for checking the type of statements:

a. Assignment Statement

S → id = E { S.type = if id.type = E.type then void else type_error }

b. Conditional Statements

S → if E then S1 { S.type = if E.type = boolean then S1.type else type_error }

c. While Statements

S → while E do S1 { S.type = if E.type = boolean then S1.type else type_error }
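
The three statement rules can be sketched in Python as follows. To keep the example self-contained, each statement is assumed to carry the already-computed type of its expression; the tuple encoding and the name check_stmt are hypothetical.

```python
VOID = "void"
TYPE_ERROR = "type_error"


def check_stmt(stmt, symtab):
    """Return void for a well-typed statement, type_error otherwise."""
    kind = stmt[0]
    if kind == "assign":                 # S -> id = E, encoded as ("assign", name, E.type)
        _, name, expr_type = stmt
        return VOID if symtab.get(name) == expr_type else TYPE_ERROR
    if kind in ("if", "while"):          # S -> if E then S1 / S -> while E do S1
        _, cond_type, body = stmt
        return check_stmt(body, symtab) if cond_type == "boolean" else TYPE_ERROR
    return TYPE_ERROR


symtab = {"x": "integer"}
print(check_stmt(("while", "boolean", ("assign", "x", "integer")), symtab))  # void
print(check_stmt(("if", "integer", ("assign", "x", "integer")), symtab))     # type_error
print(check_stmt(("assign", "x", "char"), symtab))                           # type_error
```

Because the type of an if or while statement is the type of its body, an error inside the body propagates outward.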

3. Type checking of functions
The rule for checking the type of a function application is:

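a. E → E1(E2) { E.type = if E2.type = s and E1.type = s → t then t else type_error }

That is, E1 must be a function from type s to type t, the argument E2 must have type s, and the result of the application has type t.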
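
As a small illustration in Python, a function that maps a value of type s to a value of type t can be encoded as the tuple ("func", s, t); applying it to an argument of type s then yields t. The encoding and the name check_call are assumptions made for this sketch.

```python
TYPE_ERROR = "type_error"


def check_call(func_type, arg_type):
    # func_type is expected to be ("func", s, t); the call is legal
    # only when the argument type equals the parameter type s.
    if isinstance(func_type, tuple) and func_type[0] == "func" and func_type[1] == arg_type:
        return func_type[2]
    return TYPE_ERROR


print(check_call(("func", "integer", "boolean"), "integer"))  # boolean
print(check_call(("func", "integer", "boolean"), "char"))     # type_error
print(check_call("integer", "integer"))                       # type_error
```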

Example: Consider the following SDT and draw the annotated parse tree for the input (2 + 3) == 8

E → E1 + E2 {if ((E1.type = E2.type) and (E1.type = int)) then E.type = int else error;}
E → E1 == E2 {if ((E1.type = E2.type) and (E1.type = int or E1.type = bool)) then E.type = bool else error;}
E → (E1) {E.type = E1.type;}
E → num {E.type = int;}
E → true {E.type = bool;}
E → false {E.type = bool;}

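Solution (annotated parse tree; indentation shows parent/child structure and each E node carries its type attribute):

E (E.type = bool)
    E (E.type = int)
        (
        E (E.type = int)
            E (E.type = int)
                num = 2
            +
            E (E.type = int)
                num = 3
        )
    ==
    E (E.type = int)
        num = 8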
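
The bottom-up evaluation of the type attribute for (2 + 3) == 8 can be sketched in Python. The parse tree is written by hand as nested tuples (no parser is included), and the node encoding and the name type_of are hypothetical.

```python
def type_of(node):
    """Compute E.type bottom-up, following the semantic rules of the SDT above."""
    kind = node[0]
    if kind == "num":                       # E -> num
        return "int"
    if kind in ("true", "false"):           # E -> true | false
        return "bool"
    if kind == "paren":                     # E -> (E1)
        return type_of(node[1])
    if kind in ("+", "=="):
        left, right = type_of(node[1]), type_of(node[2])
        if kind == "+":                     # E -> E1 + E2
            return "int" if left == right == "int" else "error"
        # E -> E1 == E2: both operands int, or both bool
        return "bool" if left == right and left in ("int", "bool") else "error"
    return "error"


tree = ("==", ("paren", ("+", ("num", 2), ("num", 3))), ("num", 8))
print(type_of(tree))                          # bool
print(type_of(("+", ("num", 1), ("true",))))  # error
```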

5.2. Symbol Table

A symbol table is an important data structure created and maintained by compilers in order to store information
about the occurrence of various entities such as variable names, function names, objects, classes, interfaces,
etc. The symbol table is used by both the analysis and the synthesis parts of a compiler.

A symbol table may serve the following purposes, depending upon the language at hand:
· To store the names of all entities in a structured form at one place.
· To verify if a variable has been declared.
· To implement type checking, by verifying that assignments and expressions in the source code are
semantically correct.
· To determine the scope of a name (scope resolution).

We will store the following information about identifiers.

· The name (as a string).
· The data type.
· The block level.
· Its scope (global, local, or parameter).
· Its offset from the base pointer (for local variables and parameters only).

Operations of Symbol Table
insert()
This operation is used most often by the analysis phase, i.e., the first half of the compiler, where tokens are
identified and names are stored in the table. It adds information about unique names occurring in the source
code to the symbol table. The format or structure in which the names are stored
depends upon the compiler at hand.

An attribute of a symbol in the source code is information associated with that symbol, such as its value,
state, scope, and type. The insert() function takes the symbol and its
attributes as arguments and stores the information in the symbol table.

For example:
int a;

should be processed by the compiler as:

insert(a, int);

lookup()
The lookup() operation is used to search for a name in the symbol table to determine:
· if the symbol exists in the table.
· if it is declared before it is used.
· if the name is used in the scope.
· if the symbol is initialized.
· if the symbol is declared multiple times.

The format of lookup() function varies according to the programming language. The basic format should
match the following:

lookup(symbol)

This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists in the
symbol table, it returns its attributes stored in the table.

delete(s):
deletes s from the table (or, typically, hides it).
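
The three operations can be sketched with a Python dictionary. This is an illustration only: the class name SymbolTable, the keyword-style attributes, and the use of a "hidden" set to implement delete are assumptions, and lookup returns None (rather than 0) for a missing symbol, which is the more usual Python convention.

```python
class SymbolTable:
    def __init__(self):
        self._entries = {}     # name -> dictionary of attributes
        self._hidden = set()   # names that were deleted (hidden), not erased

    def insert(self, name, type_, **attrs):
        # Reject a second declaration of a name that is still visible.
        if name in self._entries and name not in self._hidden:
            raise KeyError(f"{name} is already declared")
        self._entries[name] = {"type": type_, **attrs}
        self._hidden.discard(name)

    def lookup(self, name):
        # Return the stored attributes, or None if the name is absent or hidden.
        if name in self._hidden:
            return None
        return self._entries.get(name)

    def delete(self, name):
        # Hide the entry instead of removing it.
        self._hidden.add(name)


table = SymbolTable()
table.insert("a", "int", scope="global")   # corresponds to: int a;
print(table.lookup("a"))                   # {'type': 'int', 'scope': 'global'}
print(table.lookup("b"))                   # None
table.delete("a")
print(table.lookup("a"))                   # None
```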

5.2.1. Symbol Table Implementation

· Each entry in a symbol table can be implemented as a record that consists of several fields.
· The entries in symbol table records are not uniform and depend on the program element identified
by the name.
· Some information about the name may be kept outside of the symbol table record and/or some fields
of the record may be left vacant for the reason of uniformity. A pointer to this information may be
stored in the record.
· The name may be stored in the symbol table record itself, or it can be stored in a separate array of
characters and a pointer to it in the symbol table.

Symbol table organization

• There are various approaches to symbol table organization.
a. Linear List
b. Binary Search Tree
c. Hash Table

a. Linear List

• It is the simplest approach in symbol table organization.

• The new names are added to the table in the order they arrive.
• Whenever a new name is to be added to the table:
§ The table is first searched linearly or sequentially to check whether or not the name is already
present in the table.
§ If the name is not present, then the record for new name is created and added to the list at a
position specified by the available pointer. (look on the picture)
• To retrieve the information about the name, the table is searched sequentially, starting from the first record
in the table.
• The average number of comparisons, p, is p = 0.5 * (n + 1) for a successful search
and p = n for an unsuccessful search, where n is the number of entries in the table.

Advantage: It takes less space, and additions to the table are simple.
Disadvantage: It has a higher accessing time.
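
A minimal Python sketch of the linear-list organization, which also counts comparisons so the averages above can be checked on a small table. The class and method names are hypothetical.

```python
class LinearSymbolTable:
    def __init__(self):
        self.records = []   # (name, info) pairs in arrival order

    def find(self, name):
        """Sequential search; returns (index, comparisons), index is -1 if absent."""
        comparisons = 0
        for index, (stored, _info) in enumerate(self.records):
            comparisons += 1
            if stored == name:
                return index, comparisons
        return -1, comparisons

    def insert(self, name, info):
        # Search first; append a new record only if the name is not present.
        index, _ = self.find(name)
        if index == -1:
            self.records.append((name, info))
        return index == -1


table = LinearSymbolTable()
for name in ["a", "b", "c", "d", "e"]:
    table.insert(name, "int")

successful = [table.find(name)[1] for name in ["a", "b", "c", "d", "e"]]
print(sum(successful) / len(successful))   # 3.0, i.e. 0.5 * (5 + 1)
print(table.find("zz")[1])                 # 5, i.e. n for an unsuccessful search
```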

b. Binary Search Tree

• It is more efficient than the linear list.

• Each record has two links, left and right, which point to other records in the search tree.
• A new name is added at a proper location in the tree such that names can be accessed alphabetically.
• For any node name1 in the tree, all names accessible by following the left link precede name1 alphabetically.
• Similarly, for any node name1 in the tree, all names accessible by following the right link succeed name1
alphabetically.
• The expected time to enter n names and make m inquiries is proportional to (m+n) log2 n, provided the tree
stays reasonably balanced; a tree built from names that arrive in sorted order degenerates into a linear list.
• Stated as an ordering property of the tree:
§ Every name j accessible through the LEFT link of name i is less than name i, i.e., name j < name i.
§ Every name k accessible through the RIGHT link of name i is greater than name i, i.e., name k > name i.
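
The ordering property can be sketched in Python with an unbalanced binary search tree keyed on the name. The names Node, bst_insert, bst_lookup and in_order are hypothetical, and no rebalancing is attempted.

```python
class Node:
    def __init__(self, name, info):
        self.name = name
        self.info = info
        self.left = None    # names that precede self.name
        self.right = None   # names that follow self.name


def bst_insert(root, name, info):
    """Insert name below root and return the (possibly new) root."""
    if root is None:
        return Node(name, info)
    if name < root.name:
        root.left = bst_insert(root.left, name, info)
    elif name > root.name:
        root.right = bst_insert(root.right, name, info)
    return root             # a duplicate name leaves the tree unchanged


def bst_lookup(root, name):
    # Follow left or right links until the name is found or a link is empty.
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root.info if root is not None else None


def in_order(root):
    # An in-order walk visits the names alphabetically.
    if root is None:
        return []
    return in_order(root.left) + [root.name] + in_order(root.right)


root = None
for name in ["m", "c", "x", "a"]:
    root = bst_insert(root, name, "int")
print(in_order(root))          # ['a', 'c', 'm', 'x']
print(bst_lookup(root, "x"))   # int
print(bst_lookup(root, "q"))   # None
```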

c. Hash Table

· A hash table is an array with index range 0 to TableSize – 1.

· It is the most commonly used data structure for implementing symbol tables.
· Insertion and lookup can be made very fast: O(1) on average.
· A hash function, h(name), maps an identifier name to a table index.
· h(name) should depend solely on name.
· h(name) should be computed quickly.
· h should be uniform and randomizing in distributing names.
· All table indices should be mapped with equal probability.
· Similar names should not cluster at the same table index.
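
A minimal Python sketch of a hash-table symbol table. The notes above do not say how collisions are handled, so separate chaining (one list per table index) is an assumption here, as are the table size of 211, the multiplier 31 in the hash function, and all class and method names.

```python
class HashSymbolTable:
    def __init__(self, table_size=211):
        self.table_size = table_size
        # One bucket (chain) per index in the range 0 .. table_size - 1.
        self.buckets = [[] for _ in range(table_size)]

    def h(self, name):
        # Polynomial hash: depends solely on the characters of name and is
        # cheap to compute; similar names such as a1 and a2 get different indices.
        value = 0
        for ch in name:
            value = (value * 31 + ord(ch)) % self.table_size
        return value

    def insert(self, name, info):
        bucket = self.buckets[self.h(name)]
        for i, (stored, _old) in enumerate(bucket):
            if stored == name:
                bucket[i] = (name, info)   # update an existing entry
                return
        bucket.append((name, info))

    def lookup(self, name):
        # Only the chain at index h(name) is searched.
        for stored, info in self.buckets[self.h(name)]:
            if stored == name:
                return info
        return None


table = HashSymbolTable()
table.insert("count", "int")
table.insert("ratio", "float")
print(table.lookup("count"))            # int
print(table.lookup("missing"))          # None
print(table.h("a1"), table.h("a2"))     # 102 103
```

With a reasonably uniform hash function and a table that is not overloaded, each chain stays short, which is what makes insertion and lookup O(1) on average.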
