
WCU

College of Engineering and Technology


Dep’t : Computer Science

Course: Compiler Design


Chapter 3: Syntax Analysis

Instructor: Agegnehu Ashenafi (MSc.)


3.1. Introduction to Syntactic Analysis
 Every programming language has rules that prescribe the syntactic structure of well-formed programs.
 For example, a C or Pascal program is made up of blocks, a block of statements, a statement of expressions, an expression of tokens, and so on.
 The syntax of all these high-level constructs, such as blocks, statements, and expressions, can be described by a context-free grammar (CFG).
 The low-level constructs called tokens can be described using regular expressions.
3.1.1 Advantage of Grammar for Syntactic Specification
3.1.1. Types of Parsing or Types of Syntactic Analysis

Fig 3.1: Position of the parser in the compiler model


3.1. Introduction …
The parser performs the following functions:
1. Verify whether the string of tokens can be generated by the grammar
2. Construct a parse tree for the string
3. Report the existence of syntax errors
4. Perform error recovery
The parser cannot detect the following errors:
a) Variable re-declaration (whether a variable has already been declared)
b) Use of a variable before it is initialized
c) Data type mismatch for an operation
All these issues are handled by the semantic analysis phase.
PARSING:
Parsing is the activity of checking whether a string of symbols is in the language of some grammar, where this string is usually the stream of tokens produced by the lexical analyzer. If the string is in the language, we want a parse tree; if it is not, we hope for some kind of error message explaining why not.
There are two main kinds of parsers in use, named for the way they build the parse trees:
• Top-down: A top-down parser attempts to construct a tree from the root, applying productions forward to expand non-terminals into strings of symbols.
• Bottom-up: A bottom-up parser builds the tree starting with the leaves, using productions in reverse to identify strings of symbols that can be grouped together.
In both cases the input to the parser is scanned from left to right, one symbol at a time.
The most efficient top-down and bottom-up parsing methods work only on subclasses of grammars, such as:
1. LL (Left-to-right scan, Leftmost derivation) grammars (these parsers are often implemented by hand)
2. LR (Left-to-right scan, Rightmost derivation in reverse) grammars (since LR is the larger class, LR parsers are usually constructed by automated tools)
A parse tree is the graphical representation of the structure of a sentence according to its grammar.
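As a small illustration of the top-down style, the sketch below hand-codes a recursive-descent parser. The grammar E -> T { + T }, T -> id and all function names are assumptions chosen for illustration; they are not taken from the text.

```python
# Recursive-descent (top-down) parser sketch for the assumed grammar:
#   E -> T { + T }     T -> id
def parse_term(tokens, pos):
    """Match a single T; return the position just past it."""
    if pos < len(tokens) and tokens[pos] == "id":
        return pos + 1
    raise SyntaxError(f"expected 'id' at position {pos}")

def parse_expr(tokens):
    """Match an E starting from the root and require that it consumes all input."""
    pos = parse_term(tokens, 0)
    while pos < len(tokens) and tokens[pos] == "+":
        pos = parse_term(tokens, pos + 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return True
```

Each nonterminal becomes one procedure, and applying a production forward corresponds to calling the procedures for the symbols on its right side.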
Example:
Let the productions of grammar P be:

• The parse tree may be viewed as a representation of a derivation that filters out the choices regarding the order of replacement. Consider the parse tree for a * b + c.
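One way to make such a tree concrete: assuming the productions E -> E + E | E * E | id and the usual precedence (* grouped first, so the sentence reads as (a * b) + c), the parse tree can be encoded as nested (nonterminal, children) pairs. Both the grammar and the encoding are illustrative assumptions.

```python
# Parse tree for a * b + c under the assumed grammar E -> E + E | E * E | id,
# with * grouped first: (a * b) + c
tree = ("E", [("E", [("E", ["a"]), "*", ("E", ["b"])]),
              "+",
              ("E", ["c"])])

def frontier(node):
    """Read the leaves left to right: this recovers the parsed sentence."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    return [leaf for child in children for leaf in frontier(child)]
```

Reading the leaves of the tree left to right yields the original sentence, which is exactly the sense in which the tree represents a derivation while hiding the order of replacements.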
3.2. Syntax Error Handling
If a compiler had to process only correct programs, its design and implementation
would be greatly simplified.
But programmers frequently write incorrect programs, and a good compiler should
assist the programmer in identifying and locating errors.
It is striking that although errors are so commonplace, few languages have been
designed with error handling in mind.
Our civilization would be radically different if spoken languages had the same requirement for syntactic accuracy as computer languages. Most programming language specifications do not describe how a compiler should respond to errors; the response is left to the compiler designer.
Planning the error handling right from the start can both simplify the structure of a
compiler and improve its response to errors.
We know that programs can contain errors at many different levels. For example, errors
can be:-
Lexical, such as misspelling an identifier, keyword, or operator
Syntactic, such as an arithmetic expression with unbalanced parentheses
Semantic, such as an operator applied to an incompatible operand
Logical, such as an infinitely recursive call
3.2. Syntax Error Handling …cont
• Often much of the error detection and recovery in a compiler is centered
on the syntax analysis phase.
• One reason for this is that many errors are syntactic in nature or are
exposed when the stream of tokens coming from the lexical analyzer
disobeys the grammatical rules defining the programming language.
• Another is the precision of modern parsing methods; they can detect the
presence of syntactic errors in programs very efficiently.
• Accurately detecting the presence of semantic and logical errors at compile
time is a much more difficult task.
• In this section, we present a few basic techniques for recovering from
syntax errors; their implementation is discussed in conjunction with the
parsing methods in this chapter.
• The error handler in a parser has goals that are simple to state:
It should report the presence of errors clearly and accurately.
It should recover from each error quickly enough to be able to detect subsequent errors.
It should not significantly slow down the processing of correct programs.
3.2. Syntax Error Handling …cont
Several parsing methods, such as the LL and LR methods, detect an error as soon as possible: they detect that an error has occurred as soon as they see a prefix of the input that is not a prefix of any string in the language.

Error-Recovery Strategies
There are many different general strategies that a parser can employ to recover from a syntactic error. Although no one strategy has proven itself universally acceptable, a few methods have broad applicability. Here we introduce the following strategies:
a) Panic mode:- This is the simplest method to implement and can be used by most parsing methods.
 On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
 The synchronizing tokens are usually delimiters, such as the semicolon or end, whose role in the source program is clear.
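The skipping step above can be sketched in a few lines. The synchronizing set below is an assumed example; a real parser would choose it per nonterminal or per statement boundary.

```python
# Panic-mode recovery sketch: discard tokens until a synchronizing token,
# then resume just past it. SYNC_TOKENS is an assumed illustrative set.
SYNC_TOKENS = {";", "end"}

def panic_mode_recover(tokens, error_pos):
    """Skip input symbols one at a time from `error_pos` until a
    synchronizing token is found; return the position to resume parsing."""
    pos = error_pos
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return min(pos + 1, len(tokens))
```

The strategy may skip a large amount of input, but it is simple and guaranteed not to loop: the position only moves forward.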
b) Phrase level:- On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue.
 Typical local corrections are to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.
 The choice of the local correction is left to the compiler designer. Of course, we must be careful to choose replacements that do not lead to infinite loops.
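A minimal sketch of such local correction, using the two corrections named above (replace a comma, insert a missing semicolon). The token encoding and the trigger conditions are assumptions for illustration only.

```python
# Phrase-level recovery sketch: apply one local correction at the error
# position and return the repaired token list (the original is not modified).
def phrase_level_fix(tokens, error_pos):
    fixed = list(tokens)
    if fixed[error_pos] == ",":
        fixed[error_pos] = ";"        # replace a comma by a semicolon
    else:
        fixed.insert(error_pos, ";")  # insert a missing semicolon
    return fixed
```

Because each correction either rewrites or lengthens the input at the error point before parsing resumes, the designer must check that repeated corrections cannot cycle forever, as the text warns.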
3.2. Syntax Error Handling …cont
c) Error productions:- If we have a good idea of the common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the erroneous constructs. We then use the grammar augmented by these error productions to construct a parser. If an error production is used by the parser, we can generate an appropriate error diagnostic to indicate the erroneous construct that has been recognized in the input.

d) Global correction:- Ideally, we would like a compiler to make as few changes as possible in processing an incorrect input string.
There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction. Given an incorrect input string x and a grammar G, these algorithms will find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.
Unfortunately, these methods are in general too costly to implement in terms of time and space, so these techniques are currently only of theoretical interest.
3.3. Context Free Grammar …cont
Many programming language constructs have an inherently recursive structure that can be defined by context-free grammars. For example, we might have a conditional statement defined by a rule such as:
If S1 and S2 are statements and E is an expression, then
"if E then S1 else S2" is a statement.
• This form of conditional statement cannot be specified using the notation of regular expressions. On the other hand, using the syntactic variable stmt to denote the class of statements and expr the class of expressions, we can readily express the above using the grammar production
stmt → if expr then stmt else stmt
3.3. Context Free Grammar …cont
A context-free grammar (grammar for short) consists of terminals, nonterminals, a start symbol, and productions.
1. Terminals are the basic symbols from which strings are formed. The word "token" is a synonym for "terminal" when we are talking about grammars for programming languages. The keywords if, then, and else are terminals.
2. Nonterminals are syntactic variables that denote sets of strings; stmt and expr above are nonterminals. The nonterminals define sets of strings that help define the language generated by the grammar. They also impose a hierarchical structure on the language that is useful for both syntax analysis and translation.
3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.
4. The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production consists of a nonterminal, followed by an arrow (sometimes the symbol = is used in place of the arrow), followed by a string of nonterminals and terminals.
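The four components enumerated above can be written down as plain data. The encoding below (sets for symbol classes, a dict mapping each nonterminal to its production bodies) is one illustrative choice, not a standard representation; the sample grammar is the conditional-statement fragment from earlier.

```python
# A CFG as its four components: terminals, nonterminals, start symbol, productions.
grammar = {
    "terminals": {"if", "then", "else", "id"},
    "nonterminals": {"stmt", "expr"},
    "start": "stmt",                  # the distinguished start symbol
    "productions": {                  # nonterminal -> list of right sides
        "stmt": [["if", "expr", "then", "stmt", "else", "stmt"], ["id"]],
        "expr": [["id"]],
    },
}

def well_formed(g):
    """Check that the start symbol is a nonterminal and that every
    production body uses only declared terminals and nonterminals."""
    symbols = g["terminals"] | g["nonterminals"]
    return g["start"] in g["nonterminals"] and all(
        lhs in g["nonterminals"] and all(s in symbols for s in rhs)
        for lhs, bodies in g["productions"].items()
        for rhs in bodies
    )
```

Such a representation is exactly what parser generators consume before building their tables.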
3.3. Context Free Grammar …cont
Example 3.2: the grammar with the following productions defines simple arithmetic expressions:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → + | - | * | /
In this grammar, the terminal symbols are id + - * / ( ). The nonterminal symbols are expr and op, and expr is the start symbol.
3.3. Context Free Grammar …cont
 Notational Conventions
Derivation
• There are several ways to view the process by which a grammar defines a language. One is the derivational view, which gives a precise description of the top-down construction of a parse tree. The central idea is that a production is treated as a rewriting rule in which the nonterminal on the left is replaced by the string on the right side of the production.
• For example, consider the following grammar for arithmetic expressions, with the nonterminal E representing an expression:
E → E + E | E * E | ( E ) | - E | id
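A single derivation step is mechanical: replace the leftmost nonterminal with the right side of a chosen production. The sketch below (list-of-tokens encoding and function name are assumptions) carries out the leftmost derivation E => E + E => id + E => id + E * E => id + id * E => id + id * id.

```python
# One leftmost derivation step: rewrite the first occurrence of the
# nonterminal in the sentential form with the chosen production body.
def derive_leftmost(form, rhs, nonterminal="E"):
    i = form.index(nonterminal)
    return form[:i] + rhs + form[i + 1:]

# E => E + E => id + E => id + E * E => id + id * E => id + id * id
form = ["E"]
for rhs in [["E", "+", "E"], ["id"], ["E", "*", "E"], ["id"], ["id"]]:
    form = derive_leftmost(form, rhs)
```

Each intermediate list is a sentential form; the final list contains only terminals and is therefore a sentence of the grammar.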
Bottom-up Parsing
• We introduce a general style of bottom-up syntax analysis known as shift-reduce parsing. A much more general method of shift-reduce parsing, called LR parsing, is used in a number of automatic parser generators.
• Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top). We can think of this process as one of "reducing" a string w to the start symbol of the grammar.
• At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production; if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

• 1. Handle Pruning
A rightmost derivation in reverse can be obtained by "handle pruning." That is, we start with a string of terminals w that we wish to parse. If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost derivation.
Example: consider the arithmetic grammar above and reduce id1 + id2 * id3 to the start symbol E. The reader should observe that the sequence of right-sentential forms obtained is just the reverse of the sequence in the rightmost derivation.
2. Stack Implementation of Shift-Reduce Parsing
• There are two problems that must be solved if we are to parse by handle pruning. The first is to locate the substring to be reduced in a right-sentential form, and the second is to determine which production to choose in case there is more than one production with that substring on the right side.
• Before we get to these questions, let us first consider the type of data structures to use in a shift-reduce parser. A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string w to be parsed. We use $ to mark the bottom of the stack and also the right end of the input.
• Initially, the stack is empty and the string w is on the input:
STACK: $        INPUT: w $
• The parser shifts input symbols onto the stack and reduces handles until either it detects an error or the stack contains only the start symbol S and the input is empty:
STACK: $ S      INPUT: $
After entering this configuration, the parser halts and announces successful completion of parsing.
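The stack-and-buffer procedure can be sketched as follows. The grammar E -> E + E | E * E | id and the precedence tie-break (delay reducing E + E when the lookahead is *) are assumptions chosen so that the trace reproduces the id1 + id2 * id3 reduction sequence discussed under handle pruning; a real LR parser would make these decisions from its parsing table.

```python
# Shift-reduce parser sketch: a stack of grammar symbols plus an input buffer.
PRODUCTIONS = [("E", ["E", "+", "E"]),
               ("E", ["E", "*", "E"]),
               ("E", ["id"])]

def shift_reduce_parse(tokens):
    stack, buf, trace = ["$"], list(tokens) + ["$"], []
    while True:
        # Reduce while the top of the stack matches some right side (a handle).
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in PRODUCTIONS:
                if stack[-len(rhs):] == rhs:
                    # Assumed precedence tie-break: delay E + E if '*' is next.
                    if rhs == ["E", "+", "E"] and buf[0] == "*":
                        continue
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append("reduce " + lhs + " -> " + " ".join(rhs))
                    reduced = True
                    break
        if stack == ["$", "E"] and buf[0] == "$":
            return trace                       # accepting configuration: $E | $
        if buf[0] == "$":
            raise SyntaxError("no handle found and no input left")
        trace.append("shift " + buf[0])
        stack.append(buf.pop(0))
```

Running it on id + id * id yields the reductions E -> id (three times), E -> E * E, and finally E -> E + E, which is exactly the rightmost derivation traced out in reverse.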
Thank You!
