Lecture 5

Type checking is a major component of semantic analysis in programming languages. Different languages have different type systems - some like C have weak type systems while others like Ada have very strong type systems. Before type checking, name resolution determines the type of each identifier by adding definitions to a symbol table. This table is then referenced during semantic analysis to check types and evaluate code correctness. Semantic analysis also checks other aspects like array bounds and control flow.

Uploaded by

mohamed samy

Chapter Four

Semantic Analysis

Type checking is a major component of semantic analysis.


Different programming languages have different approaches to
type checking. Some languages (like C) have a rather weak type
system, so it is possible to make serious errors if you are not
careful. Other languages (like Ada) have very strong type
systems, but this makes it more difficult to write a program that
will compile at all!
Before we can perform type checking, we must determine the
type of each identifier used in an expression.
A variable x in an expression could refer to a local variable, a
function parameter, a global variable, or something else entirely.
We solve this problem by performing name resolution, in
which each definition of a variable is entered into a symbol
table. This table is referenced throughout the semantic analysis
stage whenever we need to evaluate the correctness of some
code.
Once name resolution is completed, we have all the information
necessary to check types. In this stage, we compute the type of
complex expressions by combining the basic types of each value
according to standard conversion rules. Semantic analysis also
includes other forms of checking the correctness of a program,
such as examining the limits of arrays, avoiding bad pointer
traversals, and examining control flow. Depending on the design
of the language, some of these problems can be detected at
compile time, while others may need to wait until runtime.

4.1 Overview of Type Systems


Most programming languages assign to every value (whether a
literal, constant, or variable) a type, which describes the
interpretation of the data in that variable.
The type system of a language serves several purposes:
• Correctness. A compiler uses type information provided by
the programmer to raise warnings or errors if a program
attempts to do something improper.
• Performance. A compiler can use type information to find the
most efficient implementation of a piece of code.
• Expressiveness. A program can be made more compact and
expressive if the language allows the programmer to leave out
facts that can be inferred from the type system.
A programming language (and its type system) is commonly
classified on the following axes:
• safe or unsafe
• static or dynamic
• explicit or implicit
In an unsafe programming language, it is possible to write
valid programs that have wildly undefined behavior that violates
the basic structure of the program. For example, the following
code in C is syntactically legal and will compile, but is unsafe
because it writes data outside the bounds of the array a[].
/* This is C code */
int i;
int a[10];
for(i=0;i<100;i++) a[i] = i;
In a safe programming language, it is not possible to write a
program that violates the basic structures of the language.
A safe programming language enforces the boundaries of arrays,
the use of pointers, and the assignment of types to prevent
undefined behavior.
Most interpreted languages, like Perl, Python, and Java, are safe
languages.
For example, in C#, the boundaries of arrays are checked at
runtime, so that running off the end of an array has the
predictable effect of throwing an IndexOutOfRangeException:
/* This is C-sharp code */
int[] a = new int[10];
for(int i=0;i<100;i++) a[i] = i;
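
The runtime check that a safe language performs can be imitated by hand in C. The following is only a sketch of the idea: the helper names checked_store and fill_checked are hypothetical and are not part of any standard library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store value at a[index] only if index is in bounds.
   Returns true on success and false if the write was refused. */
static bool checked_store(int *a, size_t len, size_t index, int value)
{
    assert(a != NULL);
    if (index >= len) {
        return false;   /* a safe language would raise an exception here */
    }
    a[index] = value;
    return true;
}

/* Runs the loop from the text through the checked accessor and
   returns the number of writes that were rejected. */
static int fill_checked(int *a, size_t len, int attempts)
{
    int rejected = 0;
    for (int i = 0; i < attempts; i++) {
        if (!checked_store(a, len, (size_t)i, i)) {
            rejected++;
        }
    }
    return rejected;
}
```

Calling fill_checked on a 10-element array with 100 attempts fills the array and refuses the remaining writes instead of corrupting memory.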

In a statically typed language, all type checking is performed
at compile time, long before the program runs.
Static typing is often used to distinguish between integer and
floating point operations. While operations like addition and
multiplication are usually represented by the same symbols in
the source language, they are implemented with fundamentally
different machine code. For example, in the C language on x86
machines, (a+b) would be translated to an ADDL instruction for
integers, but an FADD instruction for floating point values.
To know which instruction to apply, we must first determine the
type of a and b and deduce the intended meaning of +.
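
As a rough illustration, the choice of instruction can be written as a function of the operand types. The enum names and select_add below are assumptions made for this sketch. A real C compiler would also insert a conversion when the operand types differ, which the sketch simply reports as "no instruction".

```c
#include <assert.h>

typedef enum { TYPE_INTEGER, TYPE_FLOAT } type_kind_t;
typedef enum { INSTR_NONE, INSTR_ADDL, INSTR_FADD } instr_t;

/* Pick the machine instruction for (a+b) from the static types of a and b.
   Mixed operand types are not handled here and yield INSTR_NONE. */
static instr_t select_add(type_kind_t left, type_kind_t right)
{
    if (left != right) {
        return INSTR_NONE;
    }
    switch (left) {
    case TYPE_INTEGER: return INSTR_ADDL;   /* integer addition */
    case TYPE_FLOAT:   return INSTR_FADD;   /* floating point addition */
    }
    assert(0 && "unknown type kind");
    return INSTR_NONE;
}
```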
In a dynamically typed language, type information is available
at runtime and stored in memory alongside the data that it
describes.
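
One common way to picture this is a tagged value, in which a type tag travels with the data and is checked when an operation runs. The sketch below is written in C only for illustration. The names struct value, make_integer, make_boolean, and value_add are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TAG_INTEGER, TAG_BOOLEAN } tag_t;

struct value {
    tag_t tag;                  /* type information kept next to the data */
    union {
        long integer;
        bool boolean;
    } data;
};

static struct value make_integer(long n)
{
    struct value v = { .tag = TAG_INTEGER };
    v.data.integer = n;
    return v;
}

static struct value make_boolean(bool b)
{
    struct value v = { .tag = TAG_BOOLEAN };
    v.data.boolean = b;
    return v;
}

/* Addition checked at runtime: succeeds only if both operands are integers. */
static bool value_add(struct value a, struct value b, struct value *out)
{
    assert(out != NULL);
    if (a.tag != TAG_INTEGER || b.tag != TAG_INTEGER) {
        return false;           /* a dynamic language would raise a type error */
    }
    *out = make_integer(a.data.integer + b.data.integer);
    return true;
}
```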
In an explicitly typed language, the programmer is responsible
for indicating the types of variables and other items in the code
explicitly.
Explicit typing can also be used to prevent assignment between
variables that have the same underlying representation, but
different meanings.
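
For instance, C treats two distinct struct types as incompatible even when they wrap the same representation. The sketch below uses hypothetical types struct meters and struct feet to show the idea.

```c
#include <assert.h>

/* Two types with the same underlying representation but different meanings. */
struct meters { double value; };
struct feet   { double value; };

static_assert(sizeof(struct meters) == sizeof(struct feet),
              "both types wrap a single double");

/* The only way from feet to meters is an explicit conversion. */
static struct meters feet_to_meters(struct feet f)
{
    struct meters m = { f.value * 0.3048 };
    return m;
}

/* Writing "struct meters m = f;" with f of type struct feet is rejected
   by the compiler, because the two struct types are incompatible. */
```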

In an implicitly typed language, the compiler will infer the
type of variables and expressions (to the degree possible)
without explicit input from the programmer.

4.2 Designing a Type System

To describe the type system of a language, we must explain its
atomic types, its compound types, and the rules for assigning
and converting between types.
The atomic types of a language are the simple types used to
describe individual variables: integers, floating point numbers,
Boolean values, and so forth. For each atomic type, it is
necessary to clearly define the range that is supported.
The compound types of a language combine existing types into
more complex aggregations.
Suppose that an integer i is assigned to a floating point f. A
similar situation arises when an integer is passed to a function
expecting a floating point as an argument. There are several
possibilities for what a language may do in this case:
• Disallow the assignment. A very strict language (like B-Minor)
could simply emit an error and prevent the program from
compiling!
• Perform a bitwise copy. If the two variables have the same
underlying storage size, the unlike assignment could be
accomplished by just copying the bits in one variable to the
location of the other.
• Convert to an equivalent value. For certain types, the
compiler may have built-in conversions that change the value to
the desired type implicitly.
• Interpret the value in a different way. In some cases, it may
be desirable to convert the value into some other value that is
not equivalent but still useful for the programmer.
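
The difference between the second and third options can be seen in C. This is a sketch that assumes float and int32_t have the same size and that floats use the IEEE 754 format. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert to an equivalent value: the integer 1 becomes the float 1.0. */
static float convert_value(int32_t i)
{
    return (float)i;
}

/* Bitwise copy: the bits of the integer are reinterpreted as a float. */
static float copy_bits(int32_t i)
{
    float f;
    assert(sizeof f == sizeof i);   /* assumption: 32-bit float */
    memcpy(&f, &i, sizeof f);
    return f;
}
```

Under these assumptions, copying the bits of the integer 1 produces a tiny nonzero float rather than 1.0, which is why a bitwise copy is rarely what the programmer intended.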

4.3 The B-Minor Type System

The B-Minor type system is safe, static, and explicit.
B-Minor has the following atomic types:
• integer - A 64-bit signed integer.
• boolean - Limited to symbols true or false.
• char - Limited to ASCII values.
• string - ASCII values, null terminated.
• void - Only used for a function that returns no value.
And the following compound types:
• array [size] type
• function type ( a: type, b: type, ... )
And here are the type rules that must be enforced:
• A value may only be assigned to a variable of the same type.
• A function parameter may only accept a value of the same
type.
• The type of a return statement must match the function return
type.
• All binary operators must have the same type on the left and
right hand sides.
• The equality operators != and == may be applied to any type
except void, array, or function and always return boolean.
• The comparison operators < <= >= > may only be applied to
integer values and always return boolean.
• The boolean operators ! && || may only be applied to boolean
values and always return boolean.
• The arithmetic operators + - * / % ^ ++ -- may only be applied
to integer values and always return integer.
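
The operator rules above can be sketched as a single function that maps an operator and its operand types to a result type. The enum names and binary_result_type are assumptions made for this illustration. Only one operator from each group is shown, and types are compared by kind only, so element types of arrays are ignored.

```c
#include <assert.h>

typedef enum {
    TYPE_VOID, TYPE_BOOLEAN, TYPE_CHAR, TYPE_INTEGER, TYPE_STRING,
    TYPE_ARRAY, TYPE_FUNCTION,
    TYPE_ERROR                  /* marks an expression that breaks a rule */
} type_kind_t;

typedef enum { OP_ADD, OP_LT, OP_EQ, OP_AND } op_t;

/* Returns the type of (left op right), or TYPE_ERROR if a rule is violated. */
static type_kind_t binary_result_type(op_t op, type_kind_t left, type_kind_t right)
{
    if (left != right) {
        return TYPE_ERROR;      /* both sides must have the same type */
    }
    switch (op) {
    case OP_ADD:                /* arithmetic: integer operands, integer result */
        return left == TYPE_INTEGER ? TYPE_INTEGER : TYPE_ERROR;
    case OP_LT:                 /* comparison: integer operands, boolean result */
        return left == TYPE_INTEGER ? TYPE_BOOLEAN : TYPE_ERROR;
    case OP_AND:                /* boolean operator: boolean operands and result */
        return left == TYPE_BOOLEAN ? TYPE_BOOLEAN : TYPE_ERROR;
    case OP_EQ:                 /* equality: any type except void, array, function */
        if (left == TYPE_VOID || left == TYPE_ARRAY || left == TYPE_FUNCTION) {
            return TYPE_ERROR;
        }
        return TYPE_BOOLEAN;
    }
    assert(0 && "unknown operator");
    return TYPE_ERROR;
}
```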

4.4 The Symbol Table

The symbol table records all of the information that we need to
know about every declared variable (and other named items, like
functions) in the program. Each entry in the table is a struct
symbol which is shown in Figure 4.1.
struct symbol {
    symbol_t kind;
    struct type *type;
    char *name;
    int which;
};

The kind field indicates whether the symbol is a local variable, a
global variable, or a function parameter. The type field points to
a type structure indicating the type of the variable. The name
field gives the name (obviously), and the which field gives the
ordinal position of local variables and parameters.
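
A constructor for this structure might look like the sketch below. The enumerator names (SYMBOL_LOCAL, SYMBOL_PARAM, SYMBOL_GLOBAL) and the function symbol_create are assumptions, since the text does not define symbol_t or any constructor. The which field is left at zero, to be filled in when the symbol is bound.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed definition of symbol_t: one enumerator per kind named in the text. */
typedef enum { SYMBOL_LOCAL, SYMBOL_PARAM, SYMBOL_GLOBAL } symbol_t;

struct type;                    /* defined elsewhere; only a pointer is stored */

struct symbol {
    symbol_t kind;
    struct type *type;
    char *name;
    int which;
};

/* Hypothetical constructor: allocates a symbol, or returns NULL on failure.
   The name pointer is stored as-is, so the caller must keep it alive. */
static struct symbol *symbol_create(symbol_t kind, struct type *type, char *name)
{
    assert(name != NULL);
    struct symbol *s = calloc(1, sizeof *s);
    if (s == NULL) {
        return NULL;
    }
    s->kind = kind;
    s->type = type;
    s->name = name;
    return s;
}
```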
To begin semantic analysis, we must create a suitable symbol
structure for each variable declaration and enter it into the
symbol table.
Conceptually, the symbol table is just a map between the name
of each variable and the symbol structure that describes it.
However, it’s not quite that simple, because most programming
languages allow the same variable name to be used multiple
times, as long as each definition is in a distinct scope. For
example, the following B-Minor program defines the symbol x
three times, each with a different type and storage class. When
run, the program should print 10 hello false.
x: integer = 10;
f: function void ( x: string ) =
{
    print x, "\n";
    {
        x: boolean = false;
        print x, "\n";
    }
}
main: function void () =
{
    print x, "\n";
    f("hello");
}

To accommodate these multiple definitions, we will structure
our symbol table as a stack of hash tables. Each hash table maps
the names in a given scope to their corresponding symbols. This
allows a symbol (like x) to exist in multiple scopes without
conflict. As we proceed through the program, we will push a
new table every time a scope is entered, and pop a table every
time a scope is left.

Figure 4.2: A Nested Symbol Table

void scope_enter();
void scope_exit();
int scope_level();
void scope_bind( const char *name, struct symbol *sym );
struct symbol *scope_lookup( const char *name );
struct symbol *scope_lookup_current( const char *name );
Figure 4.3: Symbol Table API

To manipulate the symbol table, we define six operations given
in Figure 4.3. They have the following meanings:
• scope_enter() causes a new hash table to be pushed on the top
of the stack, representing a new scope.
• scope_exit() causes the topmost hash table to be removed.
• scope_level() returns the number of hash tables in the current
stack. (This is helpful to know whether we are at the global
scope or not.)
• scope_bind(name,sym) adds an entry to the topmost hash table
of the stack, mapping name to the symbol structure sym.
• scope_lookup(name) searches the stack of hash tables from top
to bottom, looking for the first entry that matches name exactly.
If no match is found, it returns null.
• scope_lookup_current(name) works like scope_lookup except
that it only searches the topmost table. This is used to determine
whether a symbol has already been defined in the current scope.
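
The six operations can be sketched compactly as follows. To keep the example short, each scope holds a linked list of bindings rather than a hash table. This is a simplification for illustration and is not the structure the text describes. The helper types struct binding and struct scope and the function find_in are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct symbol;                  /* defined elsewhere; only pointers are stored */

struct binding { const char *name; struct symbol *sym; struct binding *next; };
struct scope   { struct binding *bindings; struct scope *parent; };

static struct scope *top = NULL;    /* innermost scope, or NULL if none */
static int depth = 0;               /* number of scopes on the stack */

void scope_enter(void)
{
    struct scope *s = calloc(1, sizeof *s);
    if (s == NULL) abort();
    s->parent = top;
    top = s;
    depth++;
}

void scope_exit(void)
{
    assert(top != NULL);
    struct scope *s = top;
    top = s->parent;
    while (s->bindings != NULL) {
        struct binding *b = s->bindings;
        s->bindings = b->next;
        free(b);
    }
    free(s);
    depth--;
}

int scope_level(void) { return depth; }

/* The name is stored by pointer, so the caller must keep it alive. */
void scope_bind(const char *name, struct symbol *sym)
{
    assert(top != NULL);
    struct binding *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->name = name;
    b->sym = sym;
    b->next = top->bindings;
    top->bindings = b;
}

static struct symbol *find_in(const struct scope *s, const char *name)
{
    for (const struct binding *b = s->bindings; b != NULL; b = b->next) {
        if (strcmp(b->name, name) == 0) return b->sym;
    }
    return NULL;
}

struct symbol *scope_lookup_current(const char *name)
{
    return top != NULL ? find_in(top, name) : NULL;
}

struct symbol *scope_lookup(const char *name)
{
    for (const struct scope *s = top; s != NULL; s = s->parent) {
        struct symbol *sym = find_in(s, name);
        if (sym != NULL) return sym;
    }
    return NULL;
}
```

Swapping the linked list for a hash table changes only the body of find_in and the insertion in scope_bind. The stack discipline stays the same.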

4.5 Name Resolution


With the symbol table in place, we are now ready to match each
use of a variable name to its matching definition. This process is
known as name resolution.
Wherever a variable is declared, it must be entered into the
symbol table and the symbol structure linked into the abstract
syntax tree (AST). Wherever a variable is used, it must be
looked up in the symbol table, and the symbol structure linked
into the AST. Of course, if a symbol is declared twice in the
same scope, or used without declaration, then an appropriate
error message must be emitted.
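
The two rules (bind on declaration, look up on use) and the two error cases can be sketched as below. This is a simplified stand-in: scopes are small fixed-size arrays of names, and a successful lookup returns the index of the scope that holds the definition instead of linking a symbol structure into the AST. All names in the sketch (enter, leave, resolve_decl, resolve_use) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SCOPES 16
#define MAX_NAMES  32

static const char *names[MAX_SCOPES][MAX_NAMES];
static int counts[MAX_SCOPES];
static int level = 0;           /* number of open scopes */
static int errors = 0;          /* number of resolution errors reported */

static void enter(void) { assert(level < MAX_SCOPES); counts[level++] = 0; }
static void leave(void) { assert(level > 0); level--; }

static int defined_in(int scope, const char *name)
{
    for (int i = 0; i < counts[scope]; i++) {
        if (strcmp(names[scope][i], name) == 0) return 1;
    }
    return 0;
}

/* Declaration: reject a duplicate in the current scope, otherwise bind it. */
static void resolve_decl(const char *name)
{
    assert(level > 0 && counts[level - 1] < MAX_NAMES);
    if (defined_in(level - 1, name)) {
        fprintf(stderr, "resolve error: %s is already defined in this scope\n", name);
        errors++;
        return;
    }
    names[level - 1][counts[level - 1]++] = name;
}

/* Use: search from the innermost scope outward. Returns the index of the
   scope holding the definition (0 is the global scope), or -1 if undefined. */
static int resolve_use(const char *name)
{
    for (int s = level - 1; s >= 0; s--) {
        if (defined_in(s, name)) return s;
    }
    fprintf(stderr, "resolve error: %s is not defined\n", name);
    errors++;
    return -1;
}
```

Walking the earlier B-Minor example through this sketch, the use of x inside the inner block of f resolves to the innermost scope, while the use of x in main resolves to the global scope.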
