Lecture I

Uploaded by

Lakad Chowdhury
Parsing

Part I
Language and Grammars

• Every (programming) language has precise rules


– In English:
• Subject Verb Object
– In C
• programs are made of functions
» Functions are made of statements etc.
Parsing

• A.K.A. Syntax Analysis


– Recognize sentences in a language.
– Discover the structure of a document/program.
– Construct (implicitly or explicitly) a tree (called a
parse tree) to represent the structure.
– This tree is used later to guide translation.
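As a rough illustration of "discovering the structure" and building a parse tree, here is a minimal recursive-descent sketch. The toy grammar (Expr -> Term { '+' Term }, Term -> NUM | '(' Expr ')') and all function names are assumptions made for this example; they are not taken from the lecture.

```python
import re

def tokenize(text):
    """Split input into number, '+', '(' and ')' tokens.
    Assumption: any other character is silently skipped."""
    return re.findall(r"\d+|[+()]", text)

def parse_expr(tokens, pos):
    """Expr -> Term { '+' Term }. Returns (tree, next position)."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        tree = ("Expr", tree, "+", right)
    return tree, pos

def parse_term(tokens, pos):
    """Term -> NUM | '(' Expr ')'. Returns (tree, next position)."""
    if pos < len(tokens) and tokens[pos].isdigit():
        return ("Term", tokens[pos]), pos + 1
    if pos < len(tokens) and tokens[pos] == "(":
        tree, pos = parse_expr(tokens, pos + 1)
        if pos < len(tokens) and tokens[pos] == ")":
            return ("Term", "(", tree, ")"), pos + 1
        raise SyntaxError(f"expected ')' at token {pos}")
    raise SyntaxError(f"unexpected token at position {pos}")

def parse(text):
    """Recognize a sentence and return its parse tree as nested tuples."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse("1+(2+3)"))
```

The nested tuples play the role of the parse tree; a later phase could walk them to guide translation.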
Role of the parser

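[Figure: Source program → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → parse tree → Rest of front end → intermediate representation. The parser asks the scanner for the next token ("get next token"). The Symbol Table is shown alongside these phases.]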

• Verifies that the string of tokens can be generated by the grammar


• Error?
• Report it with a descriptive, helpful message
• Recover and continue parsing!
• Build a parse tree !!
Rest of Front End

• Collecting token information


• Type checking
• Intermediate code generation
Errors in Programs

• Lexical
if x<1 thenn y = 5:
“Typos”
• Syntactic
if ((x<1) & (y>5))) ...
{ ... { ... ... }
• Semantic
if (x+5) then ...
Type Errors
Undefined IDs, etc.
• Logical Errors
if (i<9) then ...
Should be <= not <
Bugs
Compiler cannot detect Logical Errors
Error Detection
• Much responsibility on Parser
– Many errors are syntactic in nature
– Precision and efficiency of modern parsing methods
– Detect the error as soon as possible

• Challenges for error handler in Parser


– Report error clearly and accurately
– Recover from error and continue..
– Should be efficient in processing

• Good news is
– Simple mechanisms can catch the most common errors

• Errors don’t occur that frequently!!


• 60% of programs are syntactically and semantically correct
• 80% of erroneous statements have only 1 error; 13% have 2
• Most errors are trivial: 90% are single-token errors
• 60% punctuation errors, 20% operator, 15% keyword, 5% other
Adequate Error Reporting is Not a Trivial Task

• Difficult to generate clear and accurate error messages.


Example
function foo () {
...
if (...) {
...
} else {
...
Missing } here
...
}
<eof> Not detected until here
Example
int myVarr;
...
x = myVar; Misspelled ID here
...
Not detected until here
Error Recovery

• After the first error is recovered


– Compiler must go on!
• Restore to some state and process the rest of the input

• Error-Correcting Compilers
– Issue an error message
– Fix the problem
– Produce an executable

Example
Error on line 23: “myVarr” undefined.
“myVar” was used.

May not be a good idea!!


– Guessing the programmer's intention is not easy!
Error Recovery May Trigger More Errors!

• Inadequate recovery may introduce more errors


– Those are not the programmer's errors

• Example:
int myVar flag ;
...
Declaration of flag is discarded
x := flag;
...
... Variable flag is undefined
while (flag==0)
...
Variable flag is undefined

Too many error messages may obscure the problem


– May bury the real message
– Remedy:
• allow 1 message per token or per statement
• Quit after a maximum (e.g. 100) number of errors
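The remedy above can be sketched as a small error reporter. The class name `ErrorReporter`, the `statement_id` key, and the exception `TooManyErrors` are hypothetical names chosen for this sketch, not part of any real compiler.

```python
class TooManyErrors(Exception):
    """Raised when the error limit is reached (hypothetical name)."""

class ErrorReporter:
    def __init__(self, max_errors=100):
        self.max_errors = max_errors
        self.messages = []
        self._seen_statements = set()

    def report(self, statement_id, message):
        """Record at most one message per statement.
        Returns True if the message was recorded, False if suppressed."""
        if statement_id in self._seen_statements:
            return False  # this statement already produced a message
        self._seen_statements.add(statement_id)
        self.messages.append(f"statement {statement_id}: {message}")
        if len(self.messages) >= self.max_errors:
            raise TooManyErrors(f"giving up after {self.max_errors} errors")
        return True

reporter = ErrorReporter(max_errors=100)
reporter.report(7, "variable flag is undefined")
reporter.report(7, "type mismatch")  # suppressed: same statement
print(reporter.messages)
```

Keying on the statement rather than the token is one of the two options the slide mentions; keying on a token position would work the same way.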
Error Recovery Approaches: Panic Mode

• Discard tokens until we see a “synchronizing” token.

Example

Skip to next occurrence of


} end ;
Resume by parsing the next statement

• The key...
– Good set of synchronizing tokens
– Knowing what to do then
• Advantage
– Simple to implement
– Does not go into an infinite loop
– Commonly used
• Disadvantage
– May skip over large sections of source that contain further errors
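A minimal sketch of panic mode, assuming a toy statement form `ID = NUM ;` and the synchronizing tokens listed above. The statement grammar and the function names are assumptions made for this illustration.

```python
SYNC_TOKENS = {";", "}", "end"}

def skip_to_sync(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens from pos up to and including the next synchronizing token."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return min(pos + 1, len(tokens))

def parse_statements(tokens):
    """Parse a sequence of 'ID = NUM ;' statements with panic-mode recovery."""
    parsed, errors, pos = [], [], 0
    while pos < len(tokens):
        stmt = tokens[pos:pos + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            parsed.append((stmt[0], int(stmt[2])))
            pos += 4
        else:
            errors.append(f"syntax error near token {pos}")
            # Panic: always advances by at least one token, so no infinite loop.
            pos = skip_to_sync(tokens, pos)
    return parsed, errors

parsed, errors = parse_statements("x = 1 ; y = = 2 ; z = 3 ;".split())
print(parsed)
print(errors)
```

Because `skip_to_sync` consumes at least one token, the loop always makes progress. The cost is that everything up to the next `;` is thrown away unexamined.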
Error Recovery Approaches: Phrase-Level Recovery

• Compiler corrects the program by deleting or inserting tokens
...so it can proceed to parse from where it was.

Example

while (x==4) y:= a + b

Insert do to fix the statement

• The key...
Don’t get into an infinite loop
...constantly inserting tokens and never scanning the actual source
• Generally used for error-repairing compilers
– Difficulty: the point of error detection might be much later than the
point of error occurrence
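The `while` example can be sketched as a local repair that inserts the missing `do`. This is only an illustration: the function name `repair_while` is hypothetical, and it assumes the condition contains no nested parentheses.

```python
def repair_while(tokens):
    """Check 'while ( cond ) do stmt' and insert a missing 'do'.
    Returns (possibly repaired token list, list of error messages)."""
    tokens = list(tokens)
    errors = []
    if not tokens or tokens[0] != "while":
        raise SyntaxError("expected 'while'")
    if ")" not in tokens:
        raise SyntaxError("expected ')' after the condition")
    close = tokens.index(")")  # assumption: no nested parentheses in cond
    if tokens[close + 1:close + 2] != ["do"]:
        errors.append(f"missing 'do' after token {close}; inserted")
        # A single insertion at a fixed point: the repair cannot repeat forever.
        tokens.insert(close + 1, "do")
    return tokens, errors

fixed, errors = repair_while("while ( x == 4 ) y := a + b".split())
print(" ".join(fixed))
print(errors)
```

The repair inserts at most one token and then lets normal parsing resume, which is one simple way to respect the "no infinite loop" rule.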
Error Recovery Approaches: Error Productions

• Augment the CFG with “Error Productions”


• Now the CFG accepts anything!
• If “error productions” are used...
Their actions:
{ print (“Error...”) }

• Used with...
– LR (Bottom-up) parsing
– Parser Generators
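To make the idea concrete, here is a small sketch in which the grammar is augmented with one error production. It is a plain table lookup, not an LR parser, and the rule (writing `=` where `:=` is required) is a hypothetical example of a common mistake.

```python
def token_class(tok):
    """Map a token to its class: operators stay as-is, others become NUM or ID."""
    if tok in (":=", "="):
        return tok
    if tok.isdigit():
        return "NUM"
    if tok.isidentifier():
        return "ID"
    return tok

# (left-hand side, right-hand side, error message or None)
PRODUCTIONS = [
    ("Assign", ["ID", ":=", "NUM"], None),
    # Error production: accepts the mistake so parsing can continue,
    # and its action reports the problem.
    ("Assign", ["ID", "=", "NUM"], "Error: use ':=' for assignment, not '='"),
]

def parse_assign(tokens):
    """Return (parse tree, error message or None) for one assignment."""
    classes = [token_class(t) for t in tokens]
    for lhs, rhs, error in PRODUCTIONS:
        if classes == rhs:
            return (lhs, *tokens), error
    raise SyntaxError("no production matches")

print(parse_assign(["x", ":=", "5"]))
print(parse_assign(["x", "=", "5"]))
```

In a parser generator the same effect would come from attaching a print action to the extra rule, as the slide indicates.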
Error Recovery Approaches: Global Correction

• Theoretical Approach
• Find the minimum change to the source to yield a
valid program
– Insert tokens, delete tokens, swap adjacent tokens
• Global Correction Algorithm
Input: grammatically incorrect input string x; grammar G
Output: grammatically correct string y
Algorithm: converts x → y using the minimum number of
changes (insertion, deletion, etc.)
• Impractical: such algorithms are too time-consuming
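A brute-force sketch of the idea, assuming a toy language (balanced parentheses) in place of a full grammar G and using only insertions and deletions. Breadth-first search guarantees the first valid string found needs the fewest edits, and the rapidly growing search space hints at why the approach is impractical.

```python
from collections import deque

def is_balanced(tokens):
    """Validity check for the toy language of balanced parentheses."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        else:
            return False
        if depth < 0:
            return False
    return depth == 0

def minimal_correction(x, alphabet=("(", ")"), is_valid=is_balanced):
    """Return (y, edits): a valid string reachable from x with the fewest
    single-token insertions or deletions."""
    start = tuple(x)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        s, edits = queue.popleft()
        if is_valid(s):
            return list(s), edits
        neighbours = [s[:i] + s[i + 1:] for i in range(len(s))]  # deletions
        for i in range(len(s) + 1):                               # insertions
            for a in alphabet:
                neighbours.append(s[:i] + (a,) + s[i:])
        for n in neighbours:
            if n not in seen:
                seen.add(n)
                queue.append((n, edits + 1))

y, edits = minimal_correction(list("(()"))
print("".join(y), edits)
```

The search always terminates here because the empty string is valid and reachable by deletions. With a real grammar and realistic program sizes, the number of candidate strings makes this kind of search far too slow.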
Context Free Grammars (CFG)

• A context free grammar is a formal model that consists of:


• Terminals
Keywords
Token Classes
Punctuation
• Non-terminals
Any symbol appearing on the left-hand side of any rule
• Start Symbol
Usually the non-terminal on the left-hand side of the first rule
• Rules (or “Productions”)
BNF: Backus-Naur Form / Backus-Normal Form
Stmt ::= if Expr then Stmt else Stmt
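The four components can be read directly off a list of rules. In the sketch below only the first rule comes from the slide; the other three rules and the helper names are assumptions added so the example has something to compute.

```python
# Each rule is (left-hand side, right-hand side symbols).
RULES = [
    ("Stmt", ["if", "Expr", "then", "Stmt", "else", "Stmt"]),  # from the slide
    ("Stmt", ["ID", ":=", "Expr"]),                            # hypothetical
    ("Expr", ["ID"]),                                          # hypothetical
    ("Expr", ["NUM"]),                                         # hypothetical
]

def nonterminals(rules):
    """Any symbol appearing on the left-hand side of a rule."""
    return {lhs for lhs, _ in rules}

def terminals(rules):
    """Right-hand-side symbols that are not non-terminals."""
    nts = nonterminals(rules)
    return {sym for _, rhs in rules for sym in rhs if sym not in nts}

def start_symbol(rules):
    """By convention, the left-hand side of the first rule."""
    return rules[0][0]

print(sorted(nonterminals(RULES)))
print(sorted(terminals(RULES)))
print(start_symbol(RULES))
```

Here `if`, `then`, `else` are keywords, `ID` and `NUM` are token classes, and `:=` is punctuation, matching the three kinds of terminals listed above.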
Rule Alternative Notations
Notational Conventions
Derivations
Derivation
Leftmost Derivation
Rightmost Derivation
Parse Tree
Ambiguous Grammar

• More than one Parse Tree for some sentence.


– The grammar for a programming language may be
ambiguous
– Need to modify it for parsing.
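Ambiguity can be checked on a small case by counting parse trees. The sketch below assumes the classic ambiguous grammar E -> E + E | E * E | id, which is not given in the extracted slides, and counts the distinct trees for a token string.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of ways tokens[i:j] can be derived from E."""
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))  # prints 2
```

A count above 1 for some sentence means the grammar is ambiguous: here one tree groups `id + id` first and the other groups `id * id` first.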

• Also: Grammar may be left recursive.


• Need to modify it for parsing.
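One standard textbook modification, relevant to top-down parsers, removes immediate left recursion: A -> A α | β becomes A -> β A' and A' -> α A' | ε. The slides do not say which transformation is intended, so treat this as a sketch; the function name and the list-of-lists grammar encoding are assumptions.

```python
def eliminate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.
    Each alternative is a list of symbols; [] stands for epsilon."""
    recursive = [alt[1:] for alt in alternatives
                 if alt and alt[0] == nonterminal]
    other = [alt for alt in alternatives
             if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}  # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [beta + [new] for beta in other],
        new: [alpha + [new] for alpha in recursive] + [[]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Only immediate left recursion is handled; indirect left recursion (A -> B ..., B -> A ...) would need an additional substitution step that is not shown.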
