Lecture 5
Semantic Analysis
traversals, and examining control flow. Depending on the design
of the language, some of these problems can be detected at
compile time, while others may need to wait until runtime.
The following C code is syntactically legal and will compile, but it is
unsafe because it writes data outside the bounds of the array a[].
/* This is C code */
int i;
int a[10];
for(i=0;i<100;i++) a[i] = i;
In a safe programming language, it is not possible to write a
program that violates the basic structures of the language.
A safe programming language enforces the boundaries of arrays,
the use of pointers, and the assignment of types to prevent
undefined behavior.
Most interpreted languages, like Perl, Python, and Java, are safe
languages.
For example, in C#, the boundaries of arrays are checked at
runtime, so that running off the end of an array has the
predictable effect of throwing
an IndexOutOfRangeException:
/* This is C-sharp code */
int[] a = new int[10];
for(int i=0;i<100;i++) a[i] = i;
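The runtime check that a safe language performs on each array access can be pictured in C roughly as follows. This is only a hand-written sketch of the idea, not how any particular runtime implements it. The names ARRAY_SIZE, checked_store, and fill_checked are hypothetical and do not come from the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE 10   /* hypothetical size, matching the 10-element arrays above */

/* Store value at a[index] only if index is within bounds.
   Returns true on success, false if the store was refused. */
bool checked_store(int *a, size_t length, long index, int value)
{
    if (index < 0 || (size_t)index >= length) {
        return false;   /* a safe language would raise an error here */
    }
    a[index] = value;
    return true;
}

/* Run the same loop as the unsafe example and count the refused stores. */
int fill_checked(int *a, size_t length, int limit)
{
    int refused = 0;
    for (int i = 0; i < limit; i++) {
        if (!checked_store(a, length, i, i)) {
            refused++;
        }
    }
    return refused;
}
```

With a 10-element array and a limit of 100, the first 10 stores succeed and the remaining 90 are refused instead of silently overwriting unrelated memory.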
Boolean values, and so forth. For each atomic type, it is
necessary to clearly define the range that is supported.
The compound types of a language combine together existing
types into more complex aggregations.
Suppose that an integer i is assigned to a floating point f. A
similar situation arises when an integer is passed to a function
expecting a floating point as an argument. There are several
possibilities for what a language may do in this case:
• Disallow the assignment. A very strict language (like B-Minor)
could simply emit an error and prevent the program from
compiling.
• Perform a bitwise copy. If the two variables have the same
underlying storage size, the unlike assignment could be
accomplished by just copying the bits in one variable to the
location of the other.
• Convert to an equivalent value. For certain types, the
compiler may have built-in conversions that change the value to
the desired type implicitly.
• Interpret the value in a different way. In some cases, it may
be desirable to convert the value into some other value that is
not equivalent but still useful for the programmer.
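The difference between the second and third options is easy to see in C. The sketch below assumes a 32-bit integer and a 32-bit IEEE 754 float, which holds on most current platforms but is not guaranteed by the C standard. The function names convert_value and copy_bits are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and int32_t have the same storage size. */
_Static_assert(sizeof(float) == sizeof(int32_t), "sketch assumes 32-bit float");

/* Convert to an equivalent value: the integer 1 becomes the float 1.0. */
float convert_value(int32_t i)
{
    return (float)i;
}

/* Bitwise copy: the same 32 bits are reused, but read as a float.
   The result is usually unrelated to the original integer value. */
float copy_bits(int32_t i)
{
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}
```

Under these assumptions, convert_value(1) yields 1.0, while copy_bits(1) yields a tiny denormal number, because the bit pattern of the integer 1 means something entirely different when read as a float.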
4.3 The B-Minor Type System
The B-Minor type system is safe, static, and explicit.
B-Minor has the following atomic types:
• integer - A 64-bit signed integer.
• boolean - Limited to symbols true or false.
• char - Limited to ASCII values.
• string - ASCII values, null terminated.
• void - Only used for a function that returns no value.
And the following compound types:
• array [size] type
• function type ( a: type, b: type, ... )
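One possible way to represent these atomic and compound types inside a compiler is a small recursive structure, sketched below. The layout, the field names, and the function type_equals are assumptions made for illustration, not the definitions used in the course. In particular, treating the array size as part of the type is a design choice of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    TYPE_VOID, TYPE_BOOLEAN, TYPE_CHARACTER, TYPE_INTEGER,
    TYPE_STRING, TYPE_ARRAY, TYPE_FUNCTION
} type_kind_t;

struct param;

struct type {
    type_kind_t kind;
    int size;               /* element count, arrays only (assumed field) */
    struct type *subtype;   /* element type (array) or return type (function) */
    struct param *params;   /* parameter list, functions only */
};

struct param {
    const char *name;
    struct type *type;
    struct param *next;
};

/* Structural equality: two types match if their kinds match and,
   for compound types, their components match recursively. */
bool type_equals(const struct type *a, const struct type *b)
{
    if (a->kind != b->kind) return false;
    switch (a->kind) {
    case TYPE_ARRAY:
        return a->size == b->size && type_equals(a->subtype, b->subtype);
    case TYPE_FUNCTION: {
        if (!type_equals(a->subtype, b->subtype)) return false;
        const struct param *p = a->params, *q = b->params;
        while (p && q) {
            /* parameter names are ignored; only their types matter */
            if (!type_equals(p->type, q->type)) return false;
            p = p->next;
            q = q->next;
        }
        return p == NULL && q == NULL;
    }
    default:
        return true;   /* atomic types are equal when their kinds are equal */
    }
}
```

A comparison like this is what a "same type" rule needs once compound types are involved, since two array types can share a kind and still differ in their element type.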
And here are the type rules that must be enforced:
• A value may only be assigned to a variable of the same type.
• A function parameter may only accept a value of the same
type.
• The type of a return statement must match the function return
type.
• All binary operators must have the same type on the left and
right hand sides.
• The equality operators != and == may be applied to any type
except void, array, or function and always return boolean.
• The comparison operators < <= >= > may only be applied to
integer values and always return boolean.
• The boolean operators ! && || may only be applied to boolean
values and always return boolean.
• The arithmetic operators + - * / % ^ ++ -- may only be applied
to integer values and always return integer.
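The operator rules above translate fairly directly into a checking function. The following is a minimal sketch that works only on the kind of each operand, so it does not compare compound types structurally. The enum names, the TYPE_ERROR marker, and binary_result_type are hypothetical names chosen for this example.

```c
typedef enum {
    TYPE_VOID, TYPE_BOOLEAN, TYPE_CHARACTER, TYPE_INTEGER,
    TYPE_STRING, TYPE_ARRAY, TYPE_FUNCTION,
    TYPE_ERROR   /* hypothetical marker for an ill-typed expression */
} type_kind_t;

typedef enum {
    OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_MOD, OP_EXP,   /* + - * / % ^ */
    OP_LT, OP_LE, OP_GE, OP_GT,                       /* < <= >= >   */
    OP_EQ, OP_NE,                                     /* == !=       */
    OP_AND, OP_OR                                     /* && ||       */
} binary_op_t;

/* Return the result type of "left op right", or TYPE_ERROR if the
   expression violates one of the type rules. */
type_kind_t binary_result_type(binary_op_t op, type_kind_t left, type_kind_t right)
{
    /* All binary operators need the same type on both sides. */
    if (left != right) return TYPE_ERROR;

    switch (op) {
    case OP_ADD: case OP_SUB: case OP_MUL:
    case OP_DIV: case OP_MOD: case OP_EXP:
        return left == TYPE_INTEGER ? TYPE_INTEGER : TYPE_ERROR;
    case OP_LT: case OP_LE: case OP_GE: case OP_GT:
        return left == TYPE_INTEGER ? TYPE_BOOLEAN : TYPE_ERROR;
    case OP_EQ: case OP_NE:
        if (left == TYPE_VOID || left == TYPE_ARRAY ||
            left == TYPE_FUNCTION || left == TYPE_ERROR) {
            return TYPE_ERROR;
        }
        return TYPE_BOOLEAN;
    case OP_AND: case OP_OR:
        return left == TYPE_BOOLEAN ? TYPE_BOOLEAN : TYPE_ERROR;
    }
    return TYPE_ERROR;
}
```

For instance, integer + integer yields integer, integer < integer yields boolean, and string + string is rejected because arithmetic is limited to integers.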
The symbol table records all of the information that we need to
know about every declared variable (and other named items, like
functions) in the program. Each entry in the table is a struct
symbol, which is shown in Figure 4.1.
struct symbol {
    symbol_t kind;
    struct type *type;
    char *name;
    int which;
};
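To show how such an entry might be built, here is a small sketch of a constructor and destructor for struct symbol. The values of symbol_t and the functions symbol_create and symbol_delete are assumptions made for this example, since this section does not define them. The field which is simply left at zero because its meaning is not described here.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed storage classes; the section does not list the actual values. */
typedef enum { SYMBOL_LOCAL, SYMBOL_PARAM, SYMBOL_GLOBAL } symbol_t;

struct type;   /* opaque in this sketch: only a pointer is stored */

struct symbol {
    symbol_t kind;
    struct type *type;
    char *name;
    int which;
};

/* Allocate a symbol and keep a private copy of its name.
   Returns NULL if memory cannot be allocated. */
struct symbol *symbol_create(symbol_t kind, struct type *type, const char *name)
{
    struct symbol *s = calloc(1, sizeof *s);   /* zeroes every field */
    if (!s) return NULL;

    s->name = malloc(strlen(name) + 1);
    if (!s->name) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);

    s->kind = kind;
    s->type = type;
    return s;
}

/* Release a symbol and its name; the type is owned elsewhere. */
void symbol_delete(struct symbol *s)
{
    if (s) {
        free(s->name);
        free(s);
    }
}
```

Copying the name means the symbol stays valid even after the buffer that held the identifier during parsing has been reused.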
However, it’s not quite that simple, because most programming
languages allow the same variable name to be used multiple
times, as long as each definition is in a distinct scope. For
example, the following B-Minor program defines the symbol x
three times, each with a different type and storage class. When
run, the program should print 10 hello false.
x: integer = 10;

f: function void ( x: string ) =
{
    print x, "\n";
    {
        x: boolean = false;
        print x, "\n";
    }
}

main: function void () =
{
    print x, "\n";
    f("hello");
}
Figure 4.2: A Nested Symbol Table
void scope_enter();
void scope_exit();
int scope_level();
void scope_bind( const char *name, struct symbol *sym );
struct symbol *scope_lookup( const char *name );
struct symbol *scope_lookup_current( const char *name );
Figure 4.3: Symbol Table API
• scope_enter() causes a new hash table to be pushed on top of
the stack, representing a new scope.
• scope_exit() causes the topmost hash table to be removed.
• scope_level() returns the number of hash tables in the current
stack. (This is helpful to know whether we are at the global
scope or not.)
• scope_bind(name,sym) adds an entry to the topmost hash table
of the stack, mapping name to the symbol structure sym.
• scope_lookup(name) searches the stack of hash tables from top
to bottom, looking for the first entry that matches name exactly.
If no match is found, it returns null.
• scope_lookup_current(name) works like scope_lookup except
that it only searches the topmost table. This is used to determine
whether a symbol has already been defined in the current scope.
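The behavior described above can be sketched in a few dozen lines of C. To keep the example short, each scope below holds a linked list of bindings instead of a hash table, so lookups within a scope are linear. That substitution, the struct names binding and scope, and the helper lookup_in are assumptions of this sketch. The public function names follow the API in Figure 4.3.

```c
#include <stdlib.h>
#include <string.h>

struct symbol;   /* opaque here: the table only stores pointers */

struct binding {
    const char *name;        /* not copied; the caller keeps it alive */
    struct symbol *sym;
    struct binding *next;
};

struct scope {
    struct binding *bindings;
    struct scope *next;      /* the enclosing scope */
};

static struct scope *top = NULL;
static int depth = 0;

void scope_enter(void)
{
    struct scope *s = calloc(1, sizeof *s);
    if (!s) abort();
    s->next = top;
    top = s;
    depth++;
}

void scope_exit(void)
{
    if (!top) return;
    struct scope *s = top;
    top = s->next;
    depth--;
    while (s->bindings) {            /* free every binding in this scope */
        struct binding *b = s->bindings;
        s->bindings = b->next;
        free(b);
    }
    free(s);
}

int scope_level(void)
{
    return depth;
}

void scope_bind(const char *name, struct symbol *sym)
{
    if (!top) return;                /* sketch: ignore binds outside any scope */
    struct binding *b = malloc(sizeof *b);
    if (!b) abort();
    b->name = name;
    b->sym = sym;
    b->next = top->bindings;
    top->bindings = b;
}

/* Search a single scope for name. */
static struct symbol *lookup_in(const struct scope *s, const char *name)
{
    for (const struct binding *b = s->bindings; b; b = b->next) {
        if (strcmp(b->name, name) == 0) return b->sym;
    }
    return NULL;
}

struct symbol *scope_lookup(const char *name)
{
    for (const struct scope *s = top; s; s = s->next) {   /* top to bottom */
        struct symbol *sym = lookup_in(s, name);
        if (sym) return sym;
    }
    return NULL;
}

struct symbol *scope_lookup_current(const char *name)
{
    return top ? lookup_in(top, name) : NULL;
}
```

A name resolver would typically call scope_enter() before processing a function body or a nested block and scope_exit() afterwards. Binding x in three nested scopes then makes scope_lookup("x") return the innermost definition first, and each outer definition becomes visible again as the scopes are exited.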