Parser Lec1

Syntax Analysis

Contents

• Syntax Analysis
• Introduction
• The Role of the Parser
• Representative Grammars
• Context-Free Grammars
• Formal Definition of a CFG
• Notational Conventions
• Derivations

Syntax Analysis
• Grammars offer significant benefits for both language designers and compiler writers.

• A grammar gives a precise syntactic specification of a programming language.

• From certain classes of grammars, we can automatically construct an efficient parser that determines the syntactic structure of a source program.

• The structure imparted to a language by a properly designed grammar is useful for translating source programs into correct object code and for detecting errors.
Role of the Parser
• In the compiler model, the parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language. A minimal sketch of this hand-off is shown below.
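As a minimal sketch of the lexer-to-parser interface just described (not from the slides; the Token type and the token names ID, PLUS, TIMES are illustrative assumptions), the parser can be viewed as consuming a stream of token names produced on demand by the lexical analyzer:

```python
# Minimal sketch of the lexer-to-parser interface described above.
# The Token type and the token names (ID, PLUS, TIMES) are illustrative
# assumptions, not part of the lecture.
from collections import namedtuple

Token = namedtuple("Token", ["name", "lexeme"])

def tokenize(source):
    """A toy lexical analyzer: yields tokens for id, + and *."""
    for lexeme in source.split():
        if lexeme == "+":
            yield Token("PLUS", lexeme)
        elif lexeme == "*":
            yield Token("TIMES", lexeme)
        elif lexeme.isidentifier():
            yield Token("ID", lexeme)
        else:
            raise SyntaxError(f"unexpected lexeme {lexeme!r}")

# The parser works on the resulting string of token names:
print([t.name for t in tokenize("x + y * z")])
# ['ID', 'PLUS', 'ID', 'TIMES', 'ID']
```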

Role of the Parser..
• There are three general types of parsers for grammars: universal, top-down, and bottom-up.

• Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These general methods are, however, too inefficient to use in production compilers.

• The methods commonly used in compilers are either top-down or bottom-up.

• Top-down methods build parse trees from the top (root) to the bottom (leaves), while bottom-up methods start from the leaves and work their way up to the root.
Representative Grammars
• Expressions with + and *

E → E + T | T
T → T * F | F
F → ( E ) | id

• This takes care of precedence but, as we saw before, it gives us trouble for top-down parsing because it is left-recursive.

• So we use the following non-left-recursive grammar, which generates the same language (a recursive-descent sketch of it follows the grammar).
Representative Grammars
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
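To make the connection to top-down parsing concrete, here is a minimal recursive-descent sketch for this grammar (not part of the slides); it assumes the input has already been turned into a list of token names such as "id", "+", "*", "(" and ")".

```python
# Minimal recursive-descent sketch for the non-left-recursive grammar above.
# Token handling is an illustrative assumption: input is a list of token
# names such as ["id", "+", "id", "*", "id"].
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.lookahead() != expected:
            raise SyntaxError(f"expected {expected}, got {self.lookahead()}")
        self.pos += 1

    def E(self):            # E  -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | epsilon
        if self.lookahead() == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # epsilon: otherwise do nothing

    def T(self):            # T  -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | epsilon
        if self.lookahead() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F  -> ( E ) | id
        if self.lookahead() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

# Accepts id + id * id without error:
Parser(["id", "+", "id", "*", "id"]).E()
```

Had we coded the left-recursive production E → E + T directly, E() would call itself before consuming any input and recurse forever; eliminating left recursion is what makes this top-down style work.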
• The following ambiguous grammar will also be used for illustration, but in general we try to avoid ambiguity.

E → E + E | E * E | ( E ) | id

• This grammar does not enforce precedence, and it does not specify left versus right associativity. For example, id + id + id and id * id + id each have two parse trees.
CFG
• Grammars are used to systematically describe the syntax of programming-language constructs like expressions and statements.

stmt → if ( expr ) stmt else stmt

• A syntactic variable stmt is used to denote statements, and the variable expr to denote expressions.

• Other productions then define precisely what an expr is and what else a stmt can be.

• A language generated by a (context-free) grammar is called a context-free language.
CFG Definition
• A context-free grammar (or simply a grammar) consists of terminals, non-terminals, a start symbol, and productions.

• Terminals:
• The basic symbols produced by the lexer.
• They are sometimes called token names, i.e., the first component of the token as produced by the lexer.

• Non-terminals:
• Syntactic variables that denote sets of strings.
• The sets of strings denoted by non-terminals help define the language generated by the grammar.
CFG Definition..
• Start Symbol:
• A non-terminal that forms the root of the parse tree.
• Conventionally, the productions for the start symbol are listed first.

• Productions:
• The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings.

• Each production consists of:

1. A non-terminal called the head or left side of the production; the production defines some of the strings denoted by the head.
CFG Definition..
2. The symbol →. Sometimes ::= has been used in place of the arrow.

3. A body or right side consisting of zero or more terminals and non-terminals.

The components of the body describe one way in which strings of the non-terminal at the head can be constructed.
CFG Definition..
• Example grammar:

expression → expression + term | expression - term | term
term → term * factor | term / factor | factor
factor → ( expression ) | id

• Terminals: id + - * / ( )
• Non-terminals: expression, term, factor
• Start symbol: expression
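As one possible sketch (the encoding below is an assumption, not from the slides), the four components of this grammar can be written down directly as data, with each production body stored as a tuple of grammar symbols:

```python
# One possible encoding of the example grammar as its four components.
# Each production body is a tuple of grammar symbols.
grammar = {
    "terminals":    {"id", "+", "-", "*", "/", "(", ")"},
    "nonterminals": {"expression", "term", "factor"},
    "start":        "expression",
    "productions": {
        "expression": [("expression", "+", "term"),
                       ("expression", "-", "term"),
                       ("term",)],
        "term":       [("term", "*", "factor"),
                       ("term", "/", "factor"),
                       ("factor",)],
        "factor":     [("(", "expression", ")"),
                       ("id",)],
    },
}
```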
Notational Conventions
• Notational conventions for grammars:

• These symbols are terminals:

(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, ..., 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.

Notational Conventions..

• These symbols are non-terminals:

(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters
may be used to represent non-terminals for the constructs.
For example, non-terminals for expressions, terms, and factors
are often represented by E, T, and F, respectively.

Notational Conventions…
• Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either non-terminals or terminals.

• Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty) strings of terminals.

• Lowercase Greek letters, such as α, β, γ, represent (possibly empty) strings of grammar symbols.

• A set of productions A → α1, A → α2, ..., A → αk with a common head A (call them A-productions) may be written as

A → α1 | α2 | ... | αk
Notational Conventions…

• The grammar we defined earlier, written using these conventions:

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations
• Assume we have a production A → α.
• We would then say that A derives α and write A ⇒ α.

• We generalize this: if, in addition, β and γ are strings, we say that βAγ derives βαγ and write

βAγ ⇒ βαγ

• We generalize further: if α derives β and β derives γ, we say α derives γ and write

α ⇒* γ

• Here ⇒* means derives in zero or more steps.
Derivations
• Formal definition of derives in zero or more steps (⇒*):
1. α ⇒* α, for any string α.
2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.

• If S is the start symbol and S ⇒* α, we say α is a sentential form of the grammar.

• A sentential form may contain both non-terminals and terminals.

• If it contains only terminals, it is a sentence of the grammar, and the language generated by a grammar G, written L(G), is the set of its sentences.

• Two grammars that generate the same language are called equivalent.
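The difference between a sentential form and a sentence can be illustrated with a tiny check (an illustrative sketch, not from the slides; it assumes a set of non-terminal names like the encoding used earlier):

```python
# Sketch: a sentential form is a sentence exactly when it contains no non-terminals.
NONTERMINALS = {"E"}

def is_sentence(sentential_form):
    """True if the sentential form consists of terminals only."""
    return all(symbol not in NONTERMINALS for symbol in sentential_form)

print(is_sentence(["id", "+", "E"]))   # False: a sentential form with a non-terminal
print(is_sentence(["id", "+", "id"]))  # True: a sentence of the grammar
```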
Derivations
• Ex: E → E + E | E * E | ( E ) | id

• We see that id + id is a sentence. Indeed, it can be derived in two ways from the start symbol E:

E ⇒ E + E ⇒ id + E ⇒ id + id
E ⇒ E + E ⇒ E + id ⇒ id + id

• In the first derivation, we replaced the leftmost non-terminal by the body of a production having that non-terminal as its head. This is called a leftmost derivation.
• Similarly, the second derivation, in which the rightmost non-terminal is replaced at each step, is called a rightmost derivation or a canonical derivation.
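As a small illustrative sketch (not from the slides), the leftmost-derivation idea can be mechanized by repeatedly replacing the leftmost non-terminal with a chosen production body; the particular sequence of bodies below is hand-picked just to reproduce the derivation of id + id, since choosing them automatically is exactly the parser's job.

```python
# Sketch: apply a leftmost derivation for E -> E + E | E * E | ( E ) | id.
# The sequence of production bodies to use is supplied by hand.
NONTERMINALS = {"E"}

def leftmost_step(sentential_form, body):
    """Replace the leftmost non-terminal in the sentential form with `body`."""
    for i, symbol in enumerate(sentential_form):
        if symbol in NONTERMINALS:
            return sentential_form[:i] + body + sentential_form[i + 1:]
    raise ValueError("no non-terminal left: this is already a sentence")

form = ["E"]
for body in (["E", "+", "E"], ["id"], ["id"]):
    form = leftmost_step(form, body)
    print(" ".join(form))
# Prints:
#   E + E
#   id + E
#   id + id
```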

Thank You
