Module 2 & 3


6th SEM CSE

 A translator inputs a source program and converts it into an object (target) program.
 The source program is written in a source language.
 The object program belongs to an object language.
 A translator could be an assembler, a compiler, or an interpreter.

Assembler:

source program (in assembly language) → Assembler → object program (in machine language)

- Compiler: translates a source program written in a High-Level Language (HLL) such as Pascal or C++ into the computer's machine language (Low-Level Language (LLL)).
  * The time of conversion from source program into object program is called compile time.
  * The object program is executed at run time.

- Interpreter: processes an internal form of the source program and data at the same time (at run time); no object program is generated.

 A compiler can be treated as a single box that maps a source program into a semantically equivalent target program.
 The analysis part (front end) breaks up the source program into constituent pieces and imposes a grammatical structure on them.
 The synthesis part (back end) constructs the desired target program from the intermediate representation and the information in the symbol table.
 The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
 For each lexeme, the lexical analyzer produces a token of the form
      (token-name, attribute-value)
   that it passes on to the subsequent phase, syntax analysis.
 Token-name: an abstract symbol used during syntax analysis.
 Attribute-value: points to an entry in the symbol table for this token.
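A minimal sketch of how such a token pair might be represented in C (the type and field names are illustrative, not part of any particular compiler):

/* Sketch of the (token-name, attribute-value) pair; names are illustrative. */
enum TokenName { ID, NUMBER, ASSIGN, PLUS, TIMES };

struct Token {
    enum TokenName name;   /* abstract symbol used during syntax analysis */
    int attribute;         /* e.g., index of the symbol-table entry       */
};

/* Example: the lexeme "position" might be returned as { ID, 1 },
   where 1 is the symbol-table entry for position. */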

1. "position" is a lexeme mapped into the token <id, 1>, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
2. = is a lexeme that is mapped into the token <=>. Since this token needs no attribute-value, we have omitted the second component. For notational convenience, the lexeme itself is used as the name of the abstract symbol.
3. "initial" is a lexeme that is mapped into the token <id, 2>, where 2 points to the symbol-table entry for initial.
4. + is a lexeme that is mapped into the token <+>.
5. "rate" is a lexeme mapped into the token <id, 3>, where 3 points to the symbol-table entry for rate.
6. * is a lexeme that is mapped into the token <*>.
7. 60 is a lexeme that is mapped into the token <60>.

Blanks separating the lexemes would be discarded by the lexical analyzer.


 The parser uses the first components of the tokens produced by the lexical
analyzer to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream.
 A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation.
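A hedged sketch of such a tree node in C (the struct and field names are assumptions for illustration):

/* An interior node holds an operator and its children hold the operands;
   a leaf node holds an index into the symbol table instead. */
struct SyntaxNode {
    char op;                     /* '+', '*', '=' for interior nodes; 0 for leaves */
    int symtab_index;            /* meaningful only for leaf nodes                 */
    struct SyntaxNode *left;
    struct SyntaxNode *right;
};

For position = initial + rate * 60, the root would be the assignment, with the identifier position as its left child and a '+' node as its right child.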

 The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
 It gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
 An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands. For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.
 The language specification may permit some type conversions called coercions. For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert, or coerce, the integer into a floating-point number.
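A small sketch of such a check for a binary arithmetic operator, assuming a toy two-type system (the names below are illustrative, not a fixed interface):

enum Type { TYPE_INT, TYPE_FLOAT, TYPE_ERROR };

/* Result type of a binary arithmetic operator, with int-to-float coercion. */
enum Type check_arith(enum Type left, enum Type right) {
    if (left == TYPE_INT && right == TYPE_INT) return TYPE_INT;
    if (left == TYPE_FLOAT && right == TYPE_FLOAT) return TYPE_FLOAT;
    if ((left == TYPE_INT && right == TYPE_FLOAT) ||
        (left == TYPE_FLOAT && right == TYPE_INT))
        return TYPE_FLOAT;       /* coerce the integer operand (inttofloat) */
    return TYPE_ERROR;           /* mismatched operands: report an error    */
}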

After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation (a program for an abstract machine). This intermediate representation should have two important properties:
◦ It should be easy to produce, and
◦ It should be easy to translate into the target machine.
One such intermediate form is called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction. Each operand can act like a register.
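For the running example position = initial + rate * 60, the three-address code is usually shown as follows (id1, id2 and id3 stand for the symbol-table entries of position, initial and rate):

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3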

 The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.
 Usually "better" means:
◦ faster, shorter code, or target code that consumes less power.
 The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the int-to-float operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once, so it too can be eliminated (see the optimized code below).
 There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.
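After these two optimizations, the intermediate code of the running example shrinks to:

t1 = id3 * 60.0
id1 = id2 + t1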

 If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
 Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task.
 A crucial aspect of code generation is the judicious assignment of registers to hold variables.
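An assembly-like sketch of possible target code for the running example (the mnemonics and register names follow the usual textbook illustration rather than a real instruction set):

LDF  R2, id3          // load rate into register R2
MULF R2, R2, #60.0    // R2 = rate * 60.0
LDF  R1, id2          // load initial into register R1
ADDF R1, R1, R2       // R1 = initial + rate * 60.0
STF  id1, R1          // store the result into position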

The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.
 The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
 These attributes may provide information about the storage allocated for a name, its type, its scope, and, in the case of procedure names, such things as the number and types of its arguments, the method of passing each argument (for example, by value or by reference), and the type returned.
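As a hedged sketch, one record of such a table could look like this in C (the exact fields depend on the source language, so treat these as illustrative):

struct SymbolRecord {
    char *name;        /* the identifier's lexeme                          */
    int type;          /* e.g., integer, float, array, procedure           */
    int scope;         /* block or procedure in which the name is declared */
    int offset;        /* storage allocated for the name                   */
    int num_args;      /* for procedure names: number of arguments         */
};

A hash table keyed on the name is a common choice, since it lets the compiler find and update a record in near-constant time.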
 Deals with the logical organization of a compiler.
 For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation can be grouped together into one pass.
 Code optimization might be an optional pass.
 Then there could be a back-end pass consisting of code generation for a particular target machine.


 Parser generators that automatically produce syntax analyzers from a grammatical description of a programming language.
 Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of a language.
 Syntax-directed translation engines that produce collections of routines for walking a parse tree and generating intermediate code.
 Code generators that translate each operation of the intermediate language into the machine language for a target machine.
 Data-flow analysis engines that facilitate the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
 Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of a compiler.
The Move to Higher-level Languages
 First-generation - machine languages.
 Second-generation - assembly languages.
 Third-generation - higher-level languages like Fortran,
Cobol, Lisp, C, C++, C#, and Java.
 Fourth-generation languages are languages designed for
specific applications like SQL for database queries and
Postscript for text formatting.
 The term fifth-generation language has been applied to logic- and constraint-based languages like Prolog and OPS5.
 An imperative language specifies how a computation is to be done.
 Languages such as C, C++, C#, and Java are imperative languages.
 In imperative languages there is a notion of program state and statements that change the state.
 Functional languages such as ML and Haskell and constraint logic languages such as Prolog are often considered to be declarative languages (they specify what computation is to be done).
 Scripting languages are interpreted languages with high-level operators designed for "gluing together" computations.
 Awk, JavaScript, Perl, PHP, Python, Ruby, and Tcl are popular examples of scripting languages.

 Develop new algorithms and translations to support new programming-language features.
 New translation algorithms must take advantage of new hardware capabilities.
 An input program may contain millions of lines of code.
 A compiler must translate correctly the potentially infinite set of programs that could be written in the source language.
 The problem of generating the optimal target code from a source program is undecidable.
Modeling in Compiler Design and Implementation
 The study of compilers is about how we design the right mathematical models and choose the right algorithms.
 Some of the fundamental models are finite-state machines, regular expressions, and context-free languages.
 The term "optimization" in compiler design refers to the attempts that a compiler makes to produce code that is more efficient than the obvious code.
 Compiler optimizations must meet the following design objectives:
◦ Optimization must be correct, that is, it must preserve the meaning of the compiled program.
◦ Optimization must improve the performance of programs.
◦ The compilation time must be kept reasonable, and
◦ the engineering effort required must be manageable.
 Optimizations speed up execution time and can also conserve power.
 Compilation time should be short to support a rapid development and debugging cycle.
 A compiler is a complex system; we must keep the system simple to assure that the engineering and maintenance costs of the compiler are manageable.
Implementation of High-Level Programming Languages
 Higher-level programming languages are easier to program in, but are less efficient; that is, the target programs generated run more slowly.
 Programmers using a low-level language have more control over a computation and can produce more efficient code.
 The register keyword in the C programming language is an early example of the interaction between compiler technology and language evolution.
 Different programming languages support different levels of abstraction.
 Data-flow optimizations have been developed to analyze the flow of data through the program and remove redundancies across these constructs.
 The key ideas behind object orientation are
◦ data abstraction and
◦ inheritance of properties,
 both of which have been found to make programs more modular and easier to maintain.
 Compiler optimizations have been developed to reduce the overhead, e.g., unnecessary range checks and unreachable objects.
 Effective algorithms have been developed to minimize the overhead of garbage collection.


 All high-performance systems take advantage of the same two basic techniques: parallelism and memory hierarchies.
 Instruction-level parallelism can also appear explicitly in the instruction set.
 VLIW (Very Long Instruction Word) machines have instructions that can issue multiple operations in parallel. The Intel IA-64 is a well-known example of such an architecture.
 Programmers can write multithreaded code for multiprocessors, or parallel code can be generated automatically by a compiler.
 The compiler hides from the programmer the details of finding parallelism in a program, distributing the computation across the machine, and minimizing synchronization and communication among the processors.
 Compilers also focus on optimizing processor execution by making the memory hierarchy more effective.
 In the early days of computer architecture design, compilers were developed after the machines were built.
 That has changed.
 Since programming is now done mostly in high-level languages, the performance of a computer system is also determined by the compilers that can exploit its features.
 Compilers influenced the design of the computer architecture called RISC (Reduced Instruction Set Computer).
 Prior to RISC, the CISC (Complex Instruction Set Computer) architecture, which makes assembly programming easier, was used.


 Compiler optimizations often can reduce these instructions to a small number of simpler operations by eliminating the redundancies across complex instructions.
 RISC examples: PowerPC, SPARC.
 CISC example: x86.
 Specialized architectures include:
◦ data-flow machines,
◦ VLIW (Very Long Instruction Word) machines,
◦ SIMD (Single Instruction, Multiple Data) arrays of processors,
◦ multiprocessors with shared memory, and
◦ multiprocessors with distributed memory.
 Compiler technology is needed not only to support programming for these architectures, but also to evaluate proposed architectural designs.
 Binary translation - compiler technology can be used to translate the binary code for one machine to that of another.
 Binary translation can also be used to provide backward compatibility.
 Hardware synthesis - hardware designs are mostly described in high-level hardware description languages like Verilog and VHDL.
 Hardware synthesis tools translate RTL descriptions automatically into gates, which are then mapped to transistors and eventually to a physical layout.
 Database query interpreters - database queries consist of predicates containing relational and boolean operators. They can be interpreted or compiled into commands to search a database for records satisfying that predicate.
 Compiled simulation - simulations can be very expensive. Instead of writing a simulator that interprets the design, it is faster to compile the design to produce machine code that simulates that particular design natively.
 Compiled simulation can run orders of magnitude faster than an interpreter-based approach.

 Type checking - type checking is an effective and well-established technique to catch inconsistencies in programs. It can be used to catch errors.
 Bounds checking - it is easier to make mistakes when programming in a lower-level language than in a higher-level one.
 Memory-management tools - garbage collection is another excellent example. Purify is a widely used tool that dynamically catches memory-management errors as they occur.
 As the first phase of a compiler, the main task of the lexical analyzer is to read the input characters of the source program and group them into lexemes.
 It produces as output a sequence of tokens, one for each lexeme in the source program.
 The stream of tokens is sent to the parser for syntax analysis.
 Stripping out comments and whitespace
 Correlating error messages with the source program
 Keeping track of line numbers
 Expansion of macros
 Sometimes, lexical analyzers are divided into a cascade of two processes:
◦ Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one.
◦ Lexical analysis proper is the more complex portion, where the scanner produces the sequence of tokens as output.
The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks.
 Simplicity of design is the most important consideration.
 Compiler efficiency is improved.
- A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing.
- In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.
 Compiler portability is enhanced.
 A token is a pair consisting of a token name and an optional attribute value.
 The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier.
 A pattern is a description of the form that the lexemes of a token may take.
 In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword.
 A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.
 The token names and associated attribute values for the Fortran statement E = M * C ** 2 are:
 <id, pointer to symbol-table entry for E>
 <assign_op>
 <id, pointer to symbol-table entry for M>
 <mult_op>
 <id, pointer to symbol-table entry for C>
 <exp_op>
 <number, integer value 2>
 fi ( a == f(x) ) . . .
 A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier.
 Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler handle the error.
 Suppose the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.
 The simplest recovery strategy is "panic mode" recovery: delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left (e.g., iffff).
 Other possible error-recovery actions are:
1. Delete one character from the remaining input (e.g., iff).
2. Insert a missing character into the remaining input (e.g., pritf).
3. Replace a character by another character (e.g., ef).
4. Transpose two adjacent characters.
 The two-buffer scheme that handles large lookaheads safely is called buffer pairs.
 Specialized buffering techniques have been developed to reduce the amount of overhead required to process a single input character.
- Each buffer is of the same size N, usually the size of a disk block, e.g., 4096 bytes.
- Using one system read command, N characters can be read into a buffer.
Two pointers to the input are maintained:
1. Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found.
 Thus, for each character read, we make two tests: one for the end of the buffer, and one to determine what character is read.
 Each buffer can hold a sentinel character at the end.
 The sentinel is a special character that cannot be part of the source program; a natural choice is the character eof.
switch ( *forward++ ) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}
 Regular expressions are an important notation for specifying lexeme patterns.
 While they cannot express all possible patterns, they are very effective in specifying those types of patterns that we actually need for tokens.
 Strings and Languages
 An alphabet is any finite set of symbols such as letters, digits, and punctuation.
 Examples:
◦ The set {0, 1} is the binary alphabet.
◦ ASCII is used in many software systems.
◦ Unicode.
◦ A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
◦ In language theory, the terms "sentence" and "word" are often used as synonyms for "string."
◦ |s| represents the length of a string s. Ex: banana is a string of length 6.
◦ The empty string, denoted ε, is the string of length zero.
◦ If x and y are strings, then the concatenation of x and y, denoted xy, is also a string. For example, if x = usp and y = cd, then xy = uspcd.
◦ The empty string is the identity under concatenation; that is, for any string s, εs = sε = s.
 A language is any countable set of strings over some fixed alphabet.
 Let L = {A, ..., Z}; then {"A", "B", "C", ..., "BE", ..., "ABZ", ...} is considered a language defined over L.
 Abstract languages like ∅, the empty set, or {ε}, the set containing only the empty string, are languages under this definition.
Example:
Let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}.
L and D are, respectively, the alphabets of uppercase and lowercase letters and of digits.
Other languages can be constructed from L and D, using the operators illustrated above:

1. L ∪ D is the set of letters and digits - strictly speaking, the language with 62 (52 + 10) strings of length one, each of which is either one letter or one digit.
2. LD is the set of 520 (52 × 10) strings of length two, each consisting of one letter followed by one digit. Ex: A1, a1, B0, etc.
3. L4 is the set of all 4-letter strings (ex: aaba, bcef).
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.

 Regular expressions are an important notation for describing all the languages that can be built from the operators applied to the symbols of some alphabet.
 For example, we were able to describe identifiers by giving names to sets of letters and digits and using the language operators union, concatenation, and closure.
 In this notation, if letter_ stands for any letter or the underscore and digit stands for any digit, then we could describe the language of C identifiers by:
      letter_ ( letter_ | digit )*

Regular expressions are built recursively out of smaller regular expressions, using the rules described below.
Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's subexpressions.

 Here are the rules that define the regular expressions over an alphabet Σ and the languages that those expressions denote.
Basis: there are two rules that form the basis:
 ε is a regular expression, and L(ε) is {ε}, that is, the language containing only the empty string.
 If a is a symbol in Σ, then a is a regular expression denoting L(a) = {a}, the language with only one string, consisting of the single symbol a.

 Induction:
- Larger regular expressions are built from smaller ones. Let r and s be regular expressions denoting languages L(r) and L(s), respectively.
1. (r) | (s) is a regular expression denoting the language L(r) ∪ L(s).
2. (r)(s) is a regular expression denoting the language L(r)L(s).
3. (r)* is a regular expression denoting (L(r))*.
4. (r) is a regular expression denoting L(r).

This last rule says that we can add additional pairs of parentheses around expressions without changing the language they denote.
 For example, we may replace the regular expression (a) | ((b)*(c)) by a | b*c.

Example: Let Σ = {a, b}.
 The regular expression a|b denotes the language {a, b}.
 (a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Σ.
 a* denotes the language consisting of all strings of zero or more a's, that is, {ε, a, aa, aaa, ...}.
 (a|b)* denotes the set of all strings consisting of zero or more instances of a or b, that is, all strings of a's and b's: {ε, a, b, aa, ab, ba, bb, aaa, ...}.
 a|a*b denotes the language {a, b, ab, aab, aaab, ...}, that is, the string a and all strings consisting of zero or more a's and ending in b.
 A language that can be defined by a regular expression is called a regular set.
 If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s.
 For instance: (a|b) = (b|a).
 If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:
      d1 → r1
      d2 → r2
      ...
      dn → rn
 Each di is a new symbol, not in Σ and not the same as any other of the d's.
 Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, ..., di-1}.
 Example strings: 5280, 0.01234, 6.336E4, or 1.89E-4.
1) One or more instances:
   the unary operator + denotes the positive closure of a regular expression and its language.
2) Zero or one instance:
   the unary operator ?
   ex: r? = r | ε
3) Character classes:
   the regular expression a1|a2|...|an can be replaced by the shorthand [a1a2...an].
   Moreover, when a1, a2, ..., an form a logical sequence, we can replace them by [a1-an].
Regular definition for C identifiers:
      letter_ → [A-Za-z_]
      digit   → [0-9]
      id      → letter_ ( letter_ | digit )*
Regular definition for unsigned numbers:
      digit   → [0-9]
      digits  → digit+
      number  → digits ( . digits)? (E[+-]? digits)?
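A hedged sketch of checking these two definitions with POSIX extended regular expressions in C on a POSIX system; the patterns are a direct transcription of the definitions above:

#include <regex.h>
#include <stdio.h>

static const char *ID_PATTERN  = "^[A-Za-z_][A-Za-z0-9_]*$";
static const char *NUM_PATTERN = "^[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?$";

/* Returns 1 if s matches the extended regular expression pattern. */
static int matches(const char *pattern, const char *s) {
    regex_t re;
    int ok;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return 0;
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

int main(void) {
    printf("%d\n", matches(ID_PATTERN, "rate"));      /* 1 */
    printf("%d\n", matches(NUM_PATTERN, "6.336E4"));  /* 1 */
    printf("%d\n", matches(NUM_PATTERN, "1.89E-4"));  /* 1 */
    return 0;
}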
• Given the grammar of branching statements:
 The terminals of the grammar, which are if, then, else, relop, id, and number, are the names of tokens as used by the lexical analyzer.
 The lexical analyzer also has the job of stripping out whitespace, by recognizing the "token" ws defined by:
      ws → ( blank | tab | newline )+
 The token ws is not returned to the parser; instead, the lexical analyzer restarts from the character that follows the whitespace.
 The goal of the lexical analyzer is to decide, for each lexeme or family of lexemes, which token name is returned to the parser and what attribute value is returned.
 This is summarized in a table of token names and attribute values.
 To recognize tokens there are two steps:
1. Design of transition diagrams
2. Implementation of transition diagrams
 A transition diagram is similar to a flowchart for (a part of) the lexer.
 We draw one for each possible token. It shows the decisions that must be made based on the input seen.
 Transition diagrams have a collection of nodes or circles, called states.
 Each state represents a condition that could occur during the process of scanning the input looking for a lexeme that matches one of several patterns.
 Edges are directed from one state of the transition diagram to another; edges are labeled by input symbols.
Ex: RELOP = < | <= | = | <> | > | >=
Transition diagram (states 0-8; * indicates input retraction):
- State 0 (start): on '<' go to state 1; on '=' go to state 5; on '>' go to state 6.
- State 1: on '=' go to state 2, return (relop, LE); on '>' go to state 3, return (relop, NE); on any other character go to state 4*, return (relop, LT).
- State 5: return (relop, EQ).
- State 6: on '=' go to state 7, return (relop, GE); on any other character go to state 8*, return (relop, GT).
 Ex 2: ID = letter ( letter | digit )*
Transition diagram (states 9-11; * indicates input retraction):
- State 9 (start): on a letter, go to state 10.
- State 10: on a letter or digit, stay in state 10; on any other character, go to state 11*.
- State 11: return ( getToken(), installID() ).
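A sketch of this diagram in code, in the same C-like style as the relop implementation shown later; nextchar(), retract(), fail(), installID() and getToken() are assumed helper routines:

TOKEN getId() {
    int c = nextchar();                /* state 9: the first character must be a letter */
    if (!isalpha(c)) fail();
    c = nextchar();                    /* state 10: absorb letters and digits           */
    while (isalpha(c) || isdigit(c))
        c = nextchar();
    retract(1);                        /* state 11 (*): the last character read is not
                                          part of the lexeme                            */
    installID();                       /* enter the lexeme in the symbol table          */
    return getToken();                 /* id, or a keyword already in the table         */
}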

 Two questions remain.
1. How do we distinguish between identifiers and keywords such as if, then, and else, which also match the pattern in the transition diagram?
2. What is (getToken(), installID())?
1) Install the reserved words in the symbol table initially. installID() checks whether the lexeme is already in the symbol table. If it is not present, the lexeme is installed (placed in the symbol table) as an id token. In either case a pointer to the entry is returned. getToken() examines the symbol-table entry for the lexeme found and returns the token name.
2) Create a separate transition diagram for each keyword.
The transition diagram for unsigned numbers has multiple accepting states: one accepting an integer (e.g. 12), one accepting a float (e.g. 12.31), and one accepting a float with an exponent (e.g. 12.31E4).
 There are several ways that a collection of transition diagrams can be used to build a lexical analyzer.
 Each state is represented by a piece of code.
 An implementation of the relop transition diagram is shown below.
Token getRelop() {
    TOKEN retToken = new(RELOP);
    while (1) {                              /* repeat character processing      */
        switch (state) {
        case 0:
            c = nextchar();
            if (c is white space) state = 0; /* stay in the start state          */
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else fail();                     /* lexeme is not a relop            */
            break;
        case 1: ...
        case 8:
            retract(1);                      /* last character read is not part of the lexeme */
            retToken.attribute = GT;
            return (retToken);
        }
    }
}
1. The transition diagrams for each token can be tried sequentially: the function fail() resets the pointer forward and starts the next transition diagram each time it is called.
2. Run the various transition diagrams "in parallel," feeding the next input character to all of them and allowing each one to make whatever transitions it requires.
3. The preferred approach is to combine all the transition diagrams into one.
 Actual parameter
 Formal parameter
1. Call-by-value
2. Call-by-reference
3. Call-by-name:
   Actual parameters are substituted literally for the formal parameters (as if a macro were used for the actual parameter) in the code of the callee.
Two formal parameters can refer to the same location; such variables are called aliases of one another.
Ex: a is an array of procedure p.
P calls another procedure q(x, y) with the call q(a, a).
Now x and y have become aliases of each other, so x[10] = 2 also makes y[10] = 2 (see the sketch below).
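C has no call-by-name, but the same aliasing effect can be seen with call-by-reference through pointers; a minimal runnable sketch:

#include <stdio.h>

/* x and y become aliases when q is called with the same array twice. */
void q(int *x, int *y) {
    x[10] = 2;
    printf("%d\n", y[10]);   /* prints 2: y refers to the same storage as x */
}

int main(void) {
    int a[20] = {0};
    q(a, a);                 /* the call q(a, a) from the example above */
    return 0;
}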
 Programming languages have precise rules that give the syntactic structure of programs.
 Ex: in C, a program is made up of functions.
 The syntax can be specified by a context-free grammar (CFG).
 A grammar benefits both language designers and compiler writers:
 It gives a precise syntactic specification of a programming language.
 From it we can construct a parser that determines the syntactic structure.
 It is useful for translating source programs into correct object code and for detecting errors.
 A grammar allows a language to evolve or be developed iteratively.
 Verifies that the tokens can be generated by the grammar
 Reports syntax errors
 Recovers from commonly occurring errors
 Constructs a parse tree and passes it to the rest of the compiler
We categorize parsers into three groups:
1. Universal parsers: can parse any grammar, but are too inefficient to use in production compilers.
2. Top-down parsers: the parse tree is created top to bottom, starting from the root.
3. Bottom-up parsers: the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
 Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars:
- LL for top-down parsing
- LR for bottom-up parsing
 Common programming errors can occur at many different levels:
1. Lexical errors: include misspellings of identifiers, keywords, or operators.
2. Syntactic errors: include misplaced semicolons or extra or missing braces.
3. Semantic errors: include type mismatches between operators and operands.
4. Logical errors: incorrect reasoning, e.g., use of = instead of ==.
 Report the presence of errors clearly
and accurately
 Recover from each error quickly
enough to detect subsequent errors.
 Add minimal overhead to the
processing of correct programs
 Panic-Mode Recovery
 Phrase-Level Recovery
 Error Productions
 Global Correction
 On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
 Synchronizing tokens are usually delimiters, e.g., a semicolon or }, whose role in the source program is clear and unambiguous.
 It often skips a considerable amount of input without checking it for additional errors.
Advantages:
 Simplicity
 It is guaranteed not to go into an infinite loop.
 A parser may perform local correction on the remaining input, i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue.
Ex: replace a comma by a semicolon, insert a missing semicolon.
 The choice of local correction is left to the compiler designer.
 It is used in several error-repairing compilers, as it can correct any input string.
 We can augment the grammar for the language at hand with productions that generate the erroneous constructs.
 Then we can use the grammar augmented by these error productions to construct a parser.
If an error production is used by the parser, we can generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.
 We use algorithms that perform a minimal sequence of changes to obtain a globally least-cost correction.
 Given an incorrect input string x and a grammar G, these algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.
 It is too costly to implement in terms of time and space, so these techniques are only of theoretical interest.
 It is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines.
 These routines may change, insert, or delete symbols on the input and issue appropriate error messages.
