Compiler Notes Unit 3
MODULE-1: TYPE CHECKING
TYPE CHECKING
A compiler must check that the source program follows both syntactic and semantic conventions
of the source language.
This checking, called static checking, detects and reports programming errors.
• A type checker verifies that the type of a construct matches that expected by its context.
For example : arithmetic operator mod in Pascal requires integer operands, so a type
checker verifies that the operands of mod have type integer.
• Type information gathered by a type checker may be needed when code is generated.
TYPE SYSTEMS
The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to language
constructs.
For example: "if both operands of the arithmetic operators +, - and * are of type integer, then
the result is of type integer."
Type Expressions
• The type of a language construct will be denoted by a "type expression."
• A type expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions.
• The sets of basic types and constructors depend on the language to be checked.
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error, will signal an error during type checking; void, denoting
"the absence of a value," allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression.
Constructors include:
Arrays: If T is a type expression, then array(I, T) is a type expression denoting the type
of an array with elements of type T and index set I.
Pointers: If T is a type expression, then pointer(T) is a type expression denoting the type
"pointer to an object of type T".
For example, var p : ↑row declares variable p to have type pointer(row).
4. Type expressions may contain variables whose values are type expressions .
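The constructors above can be modelled directly as data. The following is a minimal sketch, not part of the original notes; the class names Basic, Array and Pointer and the helper show are hypothetical, and the index set 1..10 chosen for row is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Basic:
    name: str                 # e.g. "boolean", "char", "integer", "real", "void"


@dataclass(frozen=True)
class Array:
    index_set: range          # the index set I, e.g. range(1, 11) for 1..10
    elem: "TypeExpr"          # the element type T


@dataclass(frozen=True)
class Pointer:
    target: "TypeExpr"        # the type T of the object pointed to


TypeExpr = Union[Basic, Array, Pointer]

INTEGER = Basic("integer")


def show(t):
    """Render a type expression in the notation used in the notes."""
    if isinstance(t, Basic):
        return t.name
    if isinstance(t, Array):
        lo, hi = t.index_set.start, t.index_set.stop - 1
        return f"array({lo}..{hi}, {show(t.elem)})"
    return f"pointer({show(t.target)})"


# Assumed declaration of row: array [1..10] of integer; then var p : ^row
row = Array(range(1, 11), INTEGER)
p_type = Pointer(row)
print(show(p_type))
```

Because the classes are frozen dataclasses, two type expressions built the same way compare equal, which is the structural comparison a simple checker needs.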
Type systems
► A type system is a collection of rules for assigning type expressions to the various parts of
a program.
► Different type systems may be used by different compilers or processors of the same
language.
► Checking done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic.
► Any check can be done dynamically, if the target code carries the type of an element
along with the value of that element.
Sound type system
A sound type system eliminates the need for dynamic checking for type errors because it
allows us to determine statically that these errors cannot occur when the target program runs.
That is, if a sound type system assigns a type other than type_error to a program part, then type
errors cannot occur when the target code for the program part is run.
Strongly typed language
► A language is strongly typed if its compiler can guarantee that the programs it accepts
will execute without type errors.
Error Recovery
► Since type checking has the potential for catching errors in a program, it is desirable for the
type checker to recover from errors, so it can check the rest of the input.
► Error handling has to be designed into the type system right from the start; the type
checking rules must be prepared to cope with errors.
Here, we specify a type checker for a simple language in which the type of each identifier
must be declared before the identifier is used. The type checker is a translation scheme that
synthesizes the type of each expression from the types of its subexpressions. The type checker
can handle arrays, pointers, statements and functions.
A Simple Language
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑T
E → literal | num | id | E mod E | E [ E ] | E↑
Translation scheme:
P → D ; E
D → D ; D
D → id : T { addtype(id.entry, T.type) }
T → char { T.type := char }
T → integer { T.type := integer }
T → ↑T1 { T.type := pointer(T1.type) }
T → array [ num ] of T1 { T.type := array(1..num.val, T1.type) }
In the following rules, the attribute type for E gives the type expression assigned to the
expression generated by E.
Here, constants represented by the tokens literal and num have type char and integer.
The postfix operator ↑ yields the object pointed to by its operand. The type of E↑ is the type
of the object pointed to by the pointer E.
Statements do not have values; hence the basic type void can be assigned to them. If an error is
detected within a statement, then type_error is assigned.
1. Assignment statement:
S → id := E { S.type := if id.type = E.type then void
else type_error }
2. Conditional statement:
S → if E then S1 { S.type := if E.type = boolean then S1.type
else type_error }
3. While statement:
S → while E do S1 { S.type := if E.type = boolean then S1.type
else type_error }
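The translation scheme can be read as a small recursive function over a syntax tree. The sketch below is illustrative only: the tuple encoding of trees and types and the names declare, type_of and check_stmt are hypothetical. The rules used for mod, indexing and dereferencing are the usual ones and are an assumption here, since these notes do not spell them out. A boolean identifier is entered directly into the table because the simple language has no boolean-valued expressions.

```python
TYPE_ERROR = "type_error"


def declare(symtab, name, typ):
    """D -> id : T   { addtype(id.entry, T.type) }"""
    symtab[name] = typ


def type_of(e, symtab):
    """Synthesize the type of expression e from the types of its subexpressions."""
    kind = e[0]
    if kind == "literal":
        return "char"
    if kind == "num":
        return "integer"
    if kind == "id":
        return symtab.get(e[1], TYPE_ERROR)      # undeclared names are errors
    if kind == "mod":                            # E1 mod E2 needs integer operands
        left, right = type_of(e[1], symtab), type_of(e[2], symtab)
        return "integer" if left == right == "integer" else TYPE_ERROR
    if kind == "index":                          # E1 [ E2 ]
        arr, idx = type_of(e[1], symtab), type_of(e[2], symtab)
        if isinstance(arr, tuple) and arr[0] == "array" and idx == "integer":
            return arr[2]                        # element type
        return TYPE_ERROR
    if kind == "deref":                          # E1 ^
        ptr = type_of(e[1], symtab)
        if isinstance(ptr, tuple) and ptr[0] == "pointer":
            return ptr[1]
        return TYPE_ERROR
    return TYPE_ERROR


def check_stmt(s, symtab):
    """Return void for a well-typed statement, type_error otherwise."""
    kind = s[0]
    if kind == "assign":                         # id := E
        lhs = symtab.get(s[1], TYPE_ERROR)
        rhs = type_of(s[2], symtab)
        # Extra guard (an assumption beyond the rule): two errors do not match.
        return "void" if lhs != TYPE_ERROR and lhs == rhs else TYPE_ERROR
    if kind in ("if", "while"):                  # if E then S1 / while E do S1
        if type_of(s[1], symtab) == "boolean":
            return check_stmt(s[2], symtab)
        return TYPE_ERROR
    return TYPE_ERROR


symtab = {}
declare(symtab, "a", ("array", 10, "integer"))
declare(symtab, "p", ("pointer", "integer"))
declare(symtab, "c", "char")
declare(symtab, "i", "integer")
declare(symtab, "ok", "boolean")                 # hypothetical boolean variable

good = ("mod", ("index", ("id", "a"), ("num", 3)), ("deref", ("id", "p")))
bad = ("mod", ("id", "c"), ("num", 2))
```

Here type_of(good, symtab) yields integer, while bad is rejected because c has type char.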
MODULE-2: RUN-TIME ENVIRONMENTS
Procedures:
A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name, and the statement is the procedure body.
procedure readarray;
var i : integer;
begin
for i := 1 to 9 do read(a[i])
end;
When a procedure name appears within an executable statement, the procedure is said to be
called at that point.
Activation trees:
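An activation tree is used to depict the way control enters and leaves activations. In an
activation tree, each node represents an activation of a procedure, and the root represents the
activation of the main program.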
Control stack:
• A control stack is used to keep track of live procedure activations. The idea is to push the
node for an activation onto the control stack as the activation begins and to pop the node
when the activation ends.
• The contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of control stack, the stack contains the nodes along the path
from n to the root.
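The relation between the control stack and the activation tree can be checked with a short simulation. This is a sketch only; the function run, the tuple encoding of the tree, and the activation names s, r and q(...) are illustrative assumptions.

```python
def run(tree, stack=None, log=None):
    """Walk an activation tree (name, [children]) depth first.

    Each activation is pushed when it begins and popped when it ends;
    a snapshot of the stack is logged at the start of every activation.
    """
    stack = [] if stack is None else stack
    log = [] if log is None else log
    name, children = tree
    stack.append(name)             # activation begins: push its node
    log.append(list(stack))        # the stack holds the path from the root to this node
    for child in children:
        run(child, stack, log)
    stack.pop()                    # activation ends: pop its node
    return log


# Hypothetical activation tree: s calls r, then q(1,9), which calls two more q's.
tree = ("s", [("r", []),
              ("q(1,9)", [("q(1,3)", []), ("q(5,9)", [])])])
log = run(tree)
for snapshot in log:
    print(snapshot)
```

Every snapshot lists exactly the nodes on the path between the root and the activation that has just begun, and r is no longer on the stack by the time q(1,9) starts.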
The Scope of a Declaration:
A declaration is a syntactic construct that associates information with a name.
Declarations may be explicit, such as:
var i : integer;
or they may be implicit. For example, any variable name starting with I is assumed to denote an
integer.
The portion of the program to which a declaration applies is called the scope of that declaration.
Binding of names:
Even if each name is declared once in a program, the same name may denote different
data objects at run time. "Data object" corresponds to a storage location that holds values.
The term environment refers to a function that maps a name to a storage location .
The term state refers to a function that maps a storage location to the value held there.
name --(environment)--> storage location --(state)--> value
When an environment associates storage location s with a name x, we say that x is bound
to s. This association is referred to as a binding of x.
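The two mappings can be pictured as two dictionaries. The sketch below is only an illustration; the addresses 100 and 104 and the helper names assign and lookup are made up.

```python
environment = {"x": 100}     # name -> storage location (address is hypothetical)
state = {100: 0}             # storage location -> value held there


def assign(name, value):
    """An assignment changes the state, not the environment."""
    state[environment[name]] = value


def lookup(name):
    """Follow name -> location -> value."""
    return state[environment[name]]


assign("x", 42)
value_before = lookup("x")           # 42

# A new binding of x (for example in another activation) changes the environment.
environment["x"] = 104
state[104] = 7
value_after = lookup("x")            # 7, while location 100 still holds 42
```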
STORAGE ORGANISATION
• The executing target program runs in its own logical address space in which each
program value has a location.
• The management and organization of this logical address space is shared between the
compiler, operating system and target machine. The operating system maps the logical
address into physical addresses, which are usually spread throughout memory.
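[Figure: subdivision of run-time memory, in order: Code, Static Data, Stack, free memory, Heap.
The stack and the heap grow toward each other through the free memory.]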
• Run-time storage comes in blocks, where a byte is the smallest unit of addressable
memory. Four bytes form a machine word. Multibyte objects are stored in consecutive
bytes and given the address of the first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints of
the target machine.
• A character array of length 10 needs only enough bytes to hold 10 characters, but a compiler
may allocate 12 bytes to get alignment, leaving 2 bytes unused.
• This unused space due to alignment considerations is referred to as padding.
• The size of some program objects may be known at compile time, and these may be placed in an
area called static.
• The dynamic areas used to maximize the utilization of space at run time are stack and
heap.
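The padding arithmetic for the character array can be worked out as follows. This is a small sketch assuming 4-byte alignment, matching the four-byte machine word mentioned above; the function name padded_size is hypothetical.

```python
def padded_size(nbytes, alignment=4):
    """Round nbytes up to the next multiple of alignment."""
    return (nbytes + alignment - 1) // alignment * alignment


size = padded_size(10)       # bytes actually allocated for a 10-character array
padding = size - 10          # bytes left unused
print(size, padding)
```

For 10 characters this gives 12 allocated bytes and 2 bytes of padding; an object whose size is already a multiple of 4 needs no padding.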
Activation records :
• Procedure calls and returns are usually managed by a run-time stack called the control
stack.
• Each live activation has an activation record on the control stack, with the root of the
activation tree at the bottom; the latest activation has its record at the top of the stack.
• The contents of the activation record vary with the language being implemented. The
fields that may appear in an activation record are described below.
• Temporary values, such as those arising from the evaluation of expressions.
• Local data belonging to the procedure whose activation record this is.
• A saved machine status, with information about the state of the machine just before the
call to the procedure.
• An access link may be needed to locate data needed by the called procedure but found
elsewhere.
• A control link pointing to the activation record of the caller.
• Space for the return value of the called function, if any. Again, not all called procedures
return a value, and if one does, we may prefer to place that value in a register for
efficiency.
• The actual parameters used by the calling procedure. These are often not placed in the activation
record but rather in registers, when possible, for greater efficiency.
STATIC ALLOCATION
• In static allocation, names are bound to storage as the program is compiled, so there is no
need for a run-time support package.
• Since the bindings do not change at run time, every time a procedure is activated, its
names are bound to the same storage locations.
• Therefore values of local names are retained across activations of a procedure. That is,
when control returns to a procedure the values of the locals are the same as they were
when control left the last time.
• From the type of a name, the compiler decides the amount of storage for the name and
decides where the activation records go. At compile time, we can fill in the addresses at
which the target code can find the data it operates on.
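The retention of local values under static allocation can be imitated with one fixed slot per name. This is only a sketch of the idea; STATIC_AREA, bind and counter are hypothetical names, and a dictionary stands in for addresses fixed at compile time.

```python
STATIC_AREA = {}     # (procedure, local name) -> value; one fixed slot per name


def bind(proc, name, initial):
    """Bind a local name to its single storage slot, once, before execution."""
    STATIC_AREA[(proc, name)] = initial


def counter():
    """Every activation uses the same slot, so the value survives the return."""
    STATIC_AREA[("counter", "n")] += 1
    return STATIC_AREA[("counter", "n")]


bind("counter", "n", 0)
first = counter()
second = counter()
print(first, second)
```

The second activation sees the value left behind by the first, which is exactly the behaviour described in the bullets above and the reason recursion does not fit this scheme.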
STACK ALLOCATION
• All compilers for languages that use procedures, functions or methods as units of user-defined
actions manage at least part of their run-time memory as a stack.
• Each time a procedure is called , space for its local variables is pushed onto a stack, and
when the procedure terminates, that space is popped off the stack.
Calling sequences:
• Procedure calls are implemented by what are known as calling sequences, which consist
of code that allocates an activation record on the stack and enters information into its
fields.
• A return sequence is similar code to restore the state of the machine so the calling
procedure can continue its execution after the call.
• The code in calling sequence is often divided between the calling procedure ( caller) and
the procedure it calls (callee).
• When designing calling sequences and the layout of activation records, the following
principles are helpful :
► Values communicated between caller and callee are generally placed at the
beginning of the callee's activation record, so they are as close as possible to the
caller's activation record.
► Fixed-length items are generally placed in the middle. Such items typically include
the control link, the access link, and the machine status fields.
► Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is a dynamically sized array, where the
value of one of the callee's parameters determines the length of the array.
► We must locate the top-of-stack pointer judiciously. A common approach is to have
it point to the end of the fixed-length fields in the activation record. Fixed-length data
can then be accessed by fixed offsets, known to the intermediate-code generator,
relative to the top-of-stack pointer.
[Figure: division of tasks between caller and callee. The caller's activation record ends with
its temporaries and local data. In the callee's activation record, the parameters and returned
values and the control link, links and saved status are the caller's responsibility; top-sp
points just past these fields. The temporaries and local data that follow are the callee's
responsibility.]
• The calling sequence and its division between caller and callee are as follows.
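The steps themselves are not listed at this point in the notes. The sketch below shows one common division of work, stated as an assumption rather than as the scheme intended here: the caller stores the parameters, the return address and the old top-sp, and the callee sets up its locals. The class RuntimeStack and its method names are hypothetical, top_sp stands for top-sp, and the saved machine status is omitted for brevity.

```python
class RuntimeStack:
    def __init__(self):
        self.words = []      # the run-time stack, one list element per word
        self.top_sp = 0      # index just past the fixed-length fields of the top record

    def call(self, args, return_addr, n_locals):
        self.words.extend(args)             # caller: store the actual parameters
        self.words.append(return_addr)      # caller: store the return address
        self.words.append(self.top_sp)      # caller: save old top_sp (acts as control link)
        self.top_sp = len(self.words)       # caller: move top_sp past the fixed fields
        self.words.extend([0] * n_locals)   # callee: initialise its local data

    def ret(self, n_args):
        del self.words[self.top_sp:]        # callee: discard locals and temporaries
        self.top_sp = self.words.pop()      # restore the caller's top_sp
        return_addr = self.words.pop()      # fetch the return address
        del self.words[len(self.words) - n_args:]   # caller: drop the parameters
        return return_addr


rt = RuntimeStack()
rt.call([1, 2], return_addr=500, n_locals=3)   # outer call with two parameters
rt.call([9], return_addr=600, n_locals=1)      # nested call with one parameter
```

Returning from the nested call with rt.ret(1) yields the address 600 and leaves top_sp where the outer activation had it, so the outer procedure can keep addressing its fixed-length fields by the same offsets.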
[Figure: Access to dynamically allocated arrays. The activation record for p holds pointers to
arrays A, B and C; the arrays of p are stored after that record, followed by the activation
record for procedure q called by p and then the arrays of q. top marks the end of the arrays of q.]
• Procedure p has three local arrays, whose sizes cannot be determined at compile time.
The storage for these arrays is not part of the activation record for p.
• Access to the data is through two pointers, top and top-sp. Here, top marks the actual
top of stack; it points to the position at which the next activation record will begin.
• The second, top-sp, is used to find local, fixed-length fields of the top activation record.
• The code to reposition top and top-sp can be generated at compile time, in terms of sizes
that will become known at run time.
HEAP ALLOCATION
Stack allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
[Figure: activation records in the heap for s, r and q(1,9). The record for r is marked as a
retained activation record, and the records are connected by control links.]
• The record for an activation of procedure r is retained when the activation ends.
• Therefore, the record for the new activation q(1, 9) cannot follow that for s physically.
• If the retained activation record for r is deallocated, there will be free space in the heap
between the activation records for s and q.
7 SYMBOL TABLE MANAGEMENT
A symbol table is a data structure used by a compiler to keep track of scope/
binding information about names. This information is used in the source
program to identify the various program elements, like variables, constants,
procedures, and the labels of statements. The symbol table is searched every
time a name is encountered in the source text. When a new name or new
information about an existing name is discovered, the content of the symbol
table changes. Therefore, a symbol table must have an efficient mechanism
for accessing the information held in the table as well as for adding new entries
to the symbol table.
For efficiency, our choice of the implementation data structure for the
symbol table and the organization of its contents should stress a minimal
cost when adding new entries or accessing the information on existing entries.
Also, if the symbol table can grow dynamically as necessary, then it is more
useful for a compiler.
[Figure: the symbol table record for a (type int) holds a pointer to a separately stored block
containing the bounds LB1, UB1, LB2 and UB2.]
FIGURE 7.1 A pointer steers the symbol table to remotely stored information
for the array a.
Information is entered into the symbol table in various ways. In some cases,
the symbol table record is created by the lexical analyzer as soon as the name
is encountered in the input, and the attributes of the name are entered when
the declarations are processed. But very often, the same name is used to
denote different objects, perhaps even in the same block. For example, in C
programming, the same name can be used as a variable name and as a member
name of a structure, both in the same block. In such cases, the lexical analyzer
only returns the name to the parser, rather than a pointer to the symbol table
record. That is, a symbol table record is not created by the lexical analyzer;
the string itself is returned to the parser, and the symbol table record is created
when the name's syntactic role is discovered.
Comprehensive Compiler Design, p. 242
7.6 VARIOUS APPROACHES TO SYMBOL TABLE ORGANIZATION
There are several methods of organizing the symbol table. These methods are
discussed below.
7.6.1 Linear List
A linear list of records is the easiest way to implement a symbol table. The
new names are added to the table in the order that they arrive. Whenever a
new name is to be added to the table, the table is first searched linearly or
sequentially to check whether or not the name is already present in the table.
If the name is not present, then the record for the new name is created and added
to the list at a position specified by the available pointer, as shown in
Figure 7.3.
[Figure: a linear list of records name1/info1, name2/info2, and so on, with the available
pointer marking the next free position.]
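A linear-list symbol table takes only a few lines. The following is a minimal sketch, not the book's implementation; the class name LinearSymbolTable and the record layout are assumptions, and the end of the Python list plays the role of the available pointer.

```python
class LinearSymbolTable:
    def __init__(self):
        self.records = []                  # records kept in order of arrival

    def lookup(self, name):
        """Sequential search; cost grows with the number of entries."""
        for rec in self.records:
            if rec["name"] == name:
                return rec
        return None

    def insert(self, name, info):
        """Add a record only if the name is not already present."""
        rec = self.lookup(name)
        if rec is None:
            rec = {"name": name, "info": info}
            self.records.append(rec)       # new record goes at the 'available' position
        return rec


table = LinearSymbolTable()
table.insert("a", "int")
table.insert("b", "char")
again = table.insert("a", "real")          # 'a' already exists, so nothing is added
```

Each insertion pays for a full search first, which is why this organization is simple but slow for large programs.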
[Figure: a search tree node with the fields left, name, info and right.]
[Figure: a hash table with slots numbered up to k-1, each slot pointing to a chain of
name/info records.]
FIGURE 7.5 Hash table method of symbol table organization.
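The hash table organization of Figure 7.5 can be sketched with k buckets, each holding a chain of records. This is an illustration under stated assumptions: the class name HashSymbolTable, the default k = 8 and the toy hash function (sum of character codes modulo k) are not taken from the source.

```python
class HashSymbolTable:
    def __init__(self, k=8):
        self.buckets = [[] for _ in range(k)]     # slots 0 .. k-1, each a chain

    def _slot(self, name):
        """Toy hash function: sum of character codes modulo the table size."""
        return sum(ord(c) for c in name) % len(self.buckets)

    def lookup(self, name):
        for entry_name, info in self.buckets[self._slot(name)]:
            if entry_name == name:
                return info
        return None

    def insert(self, name, info):
        """Append to the chain of the name's slot unless it is already there."""
        if self.lookup(name) is None:
            self.buckets[self._slot(name)].append((name, info))


table = HashSymbolTable()
table.insert("ab", "int")
table.insert("ba", "char")     # same character sum as "ab", so the two names collide
table.insert("x", "real")
```

Only the chain in one slot is searched on each lookup, so the expected cost stays small as long as the names spread evenly over the k slots.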
9 ERROR HANDLING
One of the important tasks that a compiler must perform is the detection of
and recovery from errors. Recovery from errors is important, because the
compiler will be scanning and compiling the entire program, perhaps in the
presence of errors; so as many errors as possible need to be detected.
Every phase of a compilation expects the input to be in a particular format,
and whenever that input is not in the required format, an error is returned.
When detecting an error, a compiler scans some of the tokens that are ahead
of the error's point of occurrence. The fewer the number of tokens that must
be scanned ahead of the point of error occurrence, the better the compiler's
error-detection capability. For example, consider the following statement:
ifa = b then x := y + z;
The error in the above statement will be detected in the syntactic analysis
phase, but not before the syntax analyzer sees the token "then"; but the first
token, itself, is in error.
After detecting an error, the first thing that a compiler is supposed to do is
to report the error by producing a suitable diagnostic. A good error diagnostic
should possess the following properties.
1. The message should be produced in terms of the original source program
rather than in terms of some internal representation of the source program.
For example, the message should be produced along with the line
numbers of the source program.
2. The error message should be easy to understand by the user.
3. The error message should be specific and should localize the problem.
For example, an error message should read, "x is not declared in function
fun," and not just, "missing declaration."
4. The message should not be redundant; that is, the same message should
not be produced again and again.
Therefore, a compiler should report errors by generating messages with
the above properties. The errors captured by the compiler can be classified as
either syntactic errors or semantic errors. Syntactic errors are those errors
that are detected in the lexical or syntactic analysis phase by the compiler.
Semantic errors are those errors detected by the compiler.
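The properties of a good diagnostic can be enforced by a small reporting helper. The sketch below is illustrative; the class Diagnostics and its report method are hypothetical, and treating two messages with the same text as redundant is an assumption about what "the same message" means.

```python
class Diagnostics:
    def __init__(self):
        self.seen = set()        # message texts already reported
        self.messages = []       # formatted diagnostics, in order of discovery

    def report(self, line, text):
        """Record a diagnostic with its source line; skip repeated messages."""
        if text in self.seen:
            return False                          # property 4: not redundant
        self.seen.add(text)
        self.messages.append(f"line {line}: {text}")   # property 1: source terms
        return True


diag = Diagnostics()
diag.report(3, "x is not declared in function fun")    # property 3: specific
diag.report(7, "x is not declared in function fun")    # suppressed repeat
print(diag.messages)
```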
9.2 RECOVERY FROM LEXICAL PHASE ERRORS
The lexical analyzer detects an error when it discovers that an input's prefix
does not fit the specification of any token class. After detecting an error, the
lexical analyzer can invoke an error recovery routine. This can entail a variety
of remedial actions.
The simplest possible error recovery is to skip the erroneous characters
until the lexical analyzer finds another token. But this is likely to cause the
parser to read a deletion error, which can cause severe difficulties in the
syntax-analysis and remaining phases. One way the parser can help the lexical analyzer
improve its ability to recover from errors is to make its list of legitimate
tokens (in the current context) available to the error recovery routine. The
error-recovery routine can then decide whether a remaining input's prefix
matches one of these tokens closely enough to be treated as that token.
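Both recovery strategies can be sketched briefly. In the code below, scan skips characters that cannot start any token, and repair uses the standard library function difflib.get_close_matches to decide whether a lexeme is close enough to one of the tokens the parser expects. The token pattern, the function names and the cutoff of 0.6 are assumptions made for illustration.

```python
import difflib
import re

# Assumed token classes: identifiers, integers and a few one-character operators.
TOKEN = re.compile(r"\s+|[A-Za-z_]\w*|\d+|[-+*/=;()]")


def scan(text):
    """Return (tokens, skipped), skipping characters that fit no token class."""
    tokens, skipped, pos = [], [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m:
            if not m.group().isspace():
                tokens.append(m.group())
            pos = m.end()
        else:
            skipped.append(text[pos])     # erroneous character: skip it
            pos += 1
    return tokens, skipped


def repair(lexeme, expected):
    """Treat lexeme as an expected token if it matches one closely enough."""
    close = difflib.get_close_matches(lexeme, expected, n=1, cutoff=0.6)
    return close[0] if close else lexeme


tokens, skipped = scan("x = 3 @ 4 # ;")
fixed = repair("thn", ["then", "else", "do"])
```

Skipping characters keeps the scanner going but silently deletes input, which is the deletion problem described above; the repair step depends on the parser supplying the list of tokens that are legitimate in the current context.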
        id      +       *       $        E
I0      S2                               1
I1              S3      S4      Accept
I2              R3      R3      R3
I3      S2                               5
I4      S2                               6
I5              S3/R1   S4/R1   R1
I6              S3/R2   S4/R2   R2
        id      +       *       $        E
I0      S2                               1
I1              S3      S4      Accept
I2              R3      R3      R3
I3      S2                               5
I4      S2                               6
I5              R1      S4      R1
I6              R2      R2      R2
The parsing table with error routines is shown in Table 9.3.
TABLE 9.3 Parsing Table with Error Routines
        id      +       *       $        E
I0      S2      e1      e1      e1       1
I1      e2      S3      S4      Accept
I2      R3      R3      R3      R3
I3      S2      e1      e1      e1       5
I4      S2      e1      e1      e1       6
I5      R1      R1      S4      R1
I6      R2      R2      R2      R2
where routine e1 is called from states I0, I3, and I4, which pushes an imaginary
id onto the stack and covers it with state I2. The routine e2 is called from state
I1, which pushes + onto the stack and covers it with state I3.
For example, if we trace the behavior of the parser described above for the
input id + * id $:
Stack Contents              Unspent Input   Moves
$I0                         id+*id$         shift and enter into state 2
$I0 id I2                   +*id$           reduce by production number 3
$I0 E I1                    +*id$           shift and enter into state 3
$I0 E I1 + I3               *id$            call error routine e1
$I0 E I1 + I3 id I2         *id$            reduce by production number 3
                                            (id I2 pushed by e1)
$I0 E I1 + I3 E I5          *id$            shift and enter into state 4
$I0 E I1 + I3 E I5 * I4     id$             shift and enter into state 2
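The trace can be reproduced with a table-driven parser. The sketch below encodes the parsing table with error routines; the grammar is not shown in this part of the notes, so the productions 1: E → E + E, 2: E → E * E and 3: E → id are inferred from the reductions in the table and should be read as an assumption. Only states are kept on the stack, and the names ACTION, GOTO, PRODS and parse are hypothetical.

```python
# Right-hand-side lengths of the assumed productions:
# 1: E -> E + E, 2: E -> E * E, 3: E -> id
PRODS = {1: 3, 2: 3, 3: 1}

ACTION = {
    0: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    1: {"id": "e2", "+": "s3", "*": "s4", "$": "acc"},
    2: {"id": "r3", "+": "r3", "*": "r3", "$": "r3"},
    3: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    4: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    5: {"id": "r1", "+": "r1", "*": "s4", "$": "r1"},
    6: {"id": "r2", "+": "r2", "*": "r2", "$": "r2"},
}
GOTO = {0: 1, 3: 5, 4: 6}        # goto on E from states 0, 3 and 4


def parse(tokens):
    """Run the parser and return the list of moves it makes."""
    stack, pos, moves = [0], 0, []
    while True:
        act = ACTION[stack[-1]][tokens[pos]]
        moves.append(act)
        if act == "acc":
            return moves
        if act[0] == "s":                     # shift: push state, consume token
            stack.append(int(act[1]))
            pos += 1
        elif act[0] == "r":                   # reduce: pop the rhs, then goto on E
            del stack[-PRODS[int(act[1])]:]
            stack.append(GOTO[stack[-1]])
        elif act == "e1":                     # pretend an id was seen: cover with state 2
            stack.append(2)
        else:                                 # e2: pretend a + was seen: cover with state 3
            stack.append(3)


moves = parse(["id", "+", "*", "id", "$"])
print(moves)
```

The first seven moves agree with the rows of the trace above (shift 2, reduce 3, shift 3, e1, reduce 3, shift 4, shift 2); the remaining moves reduce the rest of the input until the parser accepts.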