Chapter 3 discusses syntax analysis in programming languages, focusing on the parsing problem, parsing notations, and context-free grammars. It explains the goals of parsers, the structure of grammars, and the types of parsers, such as top-down and bottom-up. It also covers recursive-descent parsing, left recursion issues, and the complexity of parsing algorithms.

Chapter 3

Syntax Analysis

The Parsing Problem
• Goals of the parser, given an input program:
• Find all syntax errors;
• for each, produce an appropriate diagnostic message and
recover quickly
• Produce the parse tree, or at least a trace of the parse
tree, for the program

Parsing Notations
• Lowercase letters at the beginning of the alphabet
(a, b, …) for terminal symbols.
• Uppercase letters at the beginning of the alphabet
(A, B, …) for non terminal symbols.
• Uppercase letters at the end of the alphabet (W, X,
Y, Z) for terminals or nonterminals.
• Lowercase letters at the end of the alphabet (w, x,
y, z) for strings of terminals.
• Lowercase Greek letters (α, β, γ, δ) for mixed
strings (terminals and/or nonterminals)

Syntax of Programming Language
• Described by a context-free grammar (Backus-Naur Form -
BNF).
• Similar to the languages specified by regular expressions, but
more general.
• A grammar gives a precise syntactic specification of a language.
• From some classes of grammars, tools exist that can
automatically construct an efficient parser.
• These tools can also detect syntactic ambiguities and other problems
automatically.
• A compiler based on a grammatical description of a language
is more easily maintained and updated.
• Syntax analysis decides whether a program satisfies the syntactic structure
• Error detection
• Error recovery
• Simplification: grammar rules are stated over tokens, not characters
Context-free Grammars
• A context-free grammar for a language specifies the
syntactic structure of programs in that language.
• Components of a grammar:
• a finite set of tokens (obtained from the scanner);
• a set of variables representing “related” sets of strings, e.g.,
declarations, statements, expressions.
• a set of rules that show the structure of these strings.
• an indication of the “top-level” set of strings we care about.

Context-free Grammars: Definition
• Formally, a context-free grammar G is a 4-tuple G = (V, T, P,
S), where:
• V is a finite set of variables (or nonterminals). These describe sets
of “related” strings.
• T is a finite set of terminals (i.e., tokens).
• P is a finite set of productions, each of the form
A → α
where A ∈ V is a variable, and α ∈ (V ∪ T)* is a
sequence of terminals and nonterminals.
• S ∈ V is the start symbol.
Context-free Grammars: An Example
A grammar for palindromic bit-strings:
G = (V, T, P, S), where:
• V = { S, B }
• T = {0, 1}
• P = { S → B,
S → ε,
S → 0 S 0,
S → 1 S 1,
B → 0,
B → 1 }
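Membership in this grammar's language can be decided by mirroring the productions with a recursive function: S → 0S0 and S → 1S1 strip matching outer symbols, while S → ε and S → B (with B → 0 | 1) accept what remains. A minimal sketch in C; the function names are ours, not part of the slides:

```c
#include <string.h>

/* Returns 1 if s[i..j] can be derived from S in the palindrome
   grammar  S -> B | ε | 0 S 0 | 1 S 1,  B -> 0 | 1.            */
static int derives_S(const char *s, int i, int j) {
    if (i > j)  return 1;              /* S -> ε              */
    if (i == j) return 1;              /* S -> B, B -> 0 | 1  */
    if (s[i] == s[j])                  /* S -> 0 S 0 | 1 S 1  */
        return derives_S(s, i + 1, j - 1);
    return 0;
}

int is_palindrome_string(const char *s) {
    return derives_S(s, 0, (int)strlen(s) - 1);
}
```

For example, the function accepts 10101 but rejects 10.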
Context-free Grammars: Terminology

• Derivation: Suppose that
• α and β are strings of grammar symbols, and
• A → γ is a production.
Then, αAβ ⇒ αγβ ("αAβ derives αγβ").

• ⇒ : "derives in one step"
• ⇒* : "derives in 0 or more steps"
α ⇒* α (0 steps)
α ⇒* γ if α ⇒ β and β ⇒* γ (≥ 1 steps)
Derivations: Example
• Grammar for palindromes: G = (V, T, P, S),
• V = {S},
• T = {0, 1},
• P = { S → 0 S 0 | 1 S 1 | 0 | 1 | ε }.
• A derivation of the string 10101:
S
⇒ 1S1 (using S → 1S1)
⇒ 10S01 (using S → 0S0)
⇒ 10101 (using S → 1)
Leftmost and Rightmost Derivations
• A leftmost derivation is one where, at each step, the leftmost
nonterminal is replaced.
(analogous for rightmost derivation)
• Example: a grammar for arithmetic expressions:
E → E + E | E * E | id
• Leftmost derivation:
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
• Rightmost derivation:
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id
Summary on Syntax
Grammar rules:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Symbols: terminals (tokens) + * ( ) id num; non-terminal: E

Derivation of 1 + 2 * 3:
E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3

Parse tree: E at the root with children E, +, E; the right-hand E expands to E * E, deriving 2 and 3.
Summary on Syntax - Ambiguity
Grammar rules:
E → id
E → num
E → E + E
E → E * E
E → ( E )

The sentence 1 + 2 * 3 has two different derivations with two different parse trees, so the grammar is ambiguous.

Leftmost derivation:
E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
(parse tree groups the product: 1 + (2 * 3))

Rightmost derivation:
E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
(parse tree groups the sum: (1 + 2) * 3)
Summary on Syntax - Grammar rewriting
Ambiguous grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Non-ambiguous grammar:
E → E + T
E → T
T → T * F
T → F
F → id
F → ( E )

Derivation of 1 + 2 * 3 with the non-ambiguous grammar:
E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3

Parse tree: * now nests below +, so only the grouping 1 + (2 * 3) is derivable.
Categories of Parsers
• Top down - produce the parse tree, beginning at the root
• Order is that of a leftmost derivation - i.e. branches from a particular node are followed in left-to-
right order
• Traces or builds the parse tree in preorder - i.e. each node is visited before its branches are
followed
• For every non-terminal and token predict the next production
• Top Down Parsers are classified into two:
• Recursive Descent Parsers
• LL Parsers

• Bottom up - produce the parse tree, beginning at the leaves towards the root
• Order is that of the reverse of a rightmost derivation - That is, the sentential forms of the
derivation are produced in order of last to first.
• For every potential right hand side and token decide when a production is found

• Useful parsers look only one token ahead in the input

Top-Down Parsers
• Determining the next sentential form is a matter of choosing the correct grammar rule
that has A as its LHS
• the leftmost derivation, using only the first token produced by A
• E.g. a sentential form, xA with the following A-rules
A → bB
A → cBb
A→a
the parser must choose the correct A-rule to get the next sentential form, which could be xbB, xcBb, or xa
• This is the parsing decision problem for top-down parsers
• The most common top-down parsing algorithms are called LL algorithms
• first L specifies a left-to-right scan of the input
• second L specifies that a leftmost derivation is generated.
• Two implementations of the algorithms are possible
• Using a recursive-descent parser: a hand-coded syntax analyzer based directly on the BNF description of the
syntax of the language.
• Using a parsing table to implement the BNF rules.
Complexity of Parsing
• Parsers (algorithms) that work for any unambiguous
grammar are complex and inefficient
• Complexity of such algorithms is O(n3), where n is the length
of the input
• Thus, need to search for faster algorithms, though less general i.e.
generality is traded for efficiency.
• Compilers use parsers that only work for a subset of all
unambiguous grammars, but do it in linear time ( O(n),
where n is the length of the input )
Recursive-Descent Parsing
• Recursive descent parsing is a method where each non-terminal in the grammar is
associated with a procedure or function in the parsing code.
• There is a subprogram for each nonterminal in the grammar, which can parse sentences
that can be generated by that nonterminal
• These procedures recursively call each other to match the input string against the
production rules of the grammar.
• Recursive descent parsers are relatively straightforward to implement and understand,
but
• They can be inefficient for grammars with left recursion or ambiguity.
• EBNF is ideally suited for being the basis for a recursive-descent parser
• because EBNF minimizes the number of nonterminals
• A recursive-descent parser is an LL parser
Recursive-Descent Parsing (cont.)
• A grammar for simple expressions:

<expr> → <term> {(+ | -) <term>}

<term> → <factor> {(* | /) <factor>}
<factor> → id | int_constant | ( <expr> )
Recursive-Descent Parsing (cont.)
• Assume we have a lexical analyzer named lex, which puts
the next token code in nextToken
• The coding process when there is only one RHS:
• For each terminal symbol in the RHS, compare it with the next
input token;
• if they match, continue, else there is an error
• For each nonterminal symbol in the RHS, call its associated
parsing subprogram

Recursive-Descent Parsing (cont.)
/* Function expr
   Parses strings in the language generated by the rule:
   <expr> → <term> {(+ | -) <term>}
*/
void expr() {
    /* Parse the first term */
    term();
    /* As long as the next token is + or -, call
       lex to get the next token and parse the next term */
    while (nextToken == ADD_OP || nextToken == SUB_OP) {
        lex();
        term();
    }
}

• This particular routine does not detect errors
• Convention: Every parsing routine leaves the next token in nextToken
Recursive-Descent Parsing (cont.)
/* term
   Parses strings in the language generated by the rule:
   <term> → <factor> {(* | /) <factor>}
*/
void term() {
    printf("Enter <term>\n");
    /* Parse the first factor */
    factor();
    /* As long as the next token is * or /, call lex to get the
       next token and parse the next factor */
    while (nextToken == MULT_OP || nextToken == DIV_OP) {
        lex();
        factor();
    }
    printf("Exit <term>\n");
} /* End of function term */
Recursive-Descent Parsing (cont.)
• A nonterminal that has more than one RHS requires an initial
process to determine which RHS it is to parse
• The correct RHS is chosen on the basis of the next token of input (the
lookahead)
• The next token is compared with the first token that can be generated by
each RHS until a match is found
• If no match is found, it is a syntax error

Recursive-Descent Parsing (cont.)
/* Function factor
   Parses strings in the language generated by the rule:
   <factor> → id | int_constant | ( <expr> )
*/
void factor() {
    /* Determine which RHS */
    if (nextToken == ID_CODE || nextToken == INT_CODE)
        /* For the RHS id or int_constant, just call lex */
        lex();
    /* If the RHS is ( <expr> ), call lex to pass over the left parenthesis,
       call expr, and check for the right parenthesis */
    else if (nextToken == LP_CODE) {
        lex();
        expr();
        if (nextToken == RP_CODE)
            lex();
        else
            error();
    } /* End of else if (nextToken == ... */
    else
        error(); /* Neither RHS matches */
}
Recursive-Descent Parsing (cont.)
- Trace of the lexical and syntax analyzers on (sum + 47) / total

Next token is: 25  Next lexeme is (
Enter <expr>
Enter <term>
Enter <factor>
Next token is: 11  Next lexeme is sum
Enter <expr>
Enter <term>
Enter <factor>
Next token is: 21  Next lexeme is +
Exit <factor>
Exit <term>
Next token is: 10  Next lexeme is 47
Enter <term>
Enter <factor>
Next token is: 26  Next lexeme is )
Exit <factor>
Exit <term>
Exit <expr>
Next token is: 24  Next lexeme is /
Exit <factor>
Next token is: 11  Next lexeme is total
Enter <factor>
Next token is: -1  Next lexeme is EOF
Exit <factor>
Exit <term>
Exit <expr>
Recursive-Descent Parsing - Left Recursion
Problem
• A problem in the LL grammar class
• Left recursion: E → E + T
• Symbol on the left is also the first symbol on the right
• Predictive parsing fails when two rules can start with the same token:
E → E + T
E → T
• If a grammar has left recursion, either direct or indirect, it cannot be the basis for a top-down
parser
• A grammar can be modified to remove left recursion
• For each nonterminal, A,
1. Group the A-rules as A → Aα1 | … | Aαm | β1 | β2 | … | βn
where none of the β's begins with A
2. Replace the original A-rules with
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
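Applying step 2 to the expression grammar E → E + T | T (and similarly to T) yields E → T E', E' → + T E' | ε, which a top-down parser handles directly. The sketch below is our own illustration, using single-character tokens and digits as factors:

```c
#include <assert.h>
#include <ctype.h>

/* Grammar after removing left recursion:
     E  -> T E'       E' -> '+' T E' | ε
     T  -> F T'       T' -> '*' F T' | ε
     F  -> digit
   The primed nonterminals E' and T' are exactly the A' rules
   introduced by the transformation.                            */
static const char *p;                /* cursor into the input */

static int factor(void) {
    assert(isdigit((unsigned char)*p));
    return *p++ - '0';
}
static int termPrime(int left) {     /* T' -> '*' F T' | ε */
    if (*p == '*') { p++; return termPrime(left * factor()); }
    return left;
}
static int term(void) { return termPrime(factor()); }

static int exprPrime(int left) {     /* E' -> '+' T E' | ε */
    if (*p == '+') { p++; return exprPrime(left + term()); }
    return left;
}
int eval(const char *s) { p = s; return exprPrime(term()); }
```

Because the recursion is now on the right, each function consumes at least one token before recursing, so the parser cannot loop forever the way a directly left-recursive rule would.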
Left Factoring
• Non-terminal with two rules starting with the same prefix

Grammar:
S → if E then S else S
S → if E then S

Left-factored grammar:
S → if E then S X
X → else S | ε
Recursive-Descent Parsing – lack of pairwise
disjointness
• Another problem that disallows top-down parsing is
• whether the parser can always choose the correct RHS on the basis of next
token input using only the first token generated by the leftmost nonterminal
in the current sentential form i.e. one token lookahead.
• This is referred to as lack of pairwise disjointness
• To solve this, a pairwise disjointness test needs to be performed on the
FIRST sets.
FIRST(α) = {a | α ⇒* aβ }
(If α ⇒* ε, ε is in FIRST(α))

in which ⇒* means 0 or more derivation steps.
Recursive-Descent Parsing (cont.)
• Pairwise Disjointness Test:
• For each nonterminal, A, in the grammar that has more than one RHS, for each pair
of rules, A → αi and A → αj, it must be true that
FIRST(αi) ∩ FIRST(αj) = ∅
• Examples:
A → aB | bAb | Bb
B → cB | d — passes the test

A → aB | BAb
B → aB | b — fails the test
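The test itself is mechanical once the FIRST sets are known: represent each RHS's FIRST set as a bitmask over terminals and check that every pair intersects to the empty set. The encoding below is ours, with the two example grammars' FIRST sets hard-coded:

```c
/* One bit per terminal: a=1, b=2, c=4, d=8. */
enum { Ta = 1, Tb = 2, Tc = 4, Td = 8 };

/* Returns 1 if every pair of FIRST sets is disjoint. */
int pairwise_disjoint(const unsigned first[], int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (first[i] & first[j])
                return 0;                  /* shared first token */
    return 1;
}

/* A -> aB | bAb | Bb with B -> cB | d:
   FIRST(aB)={a}, FIRST(bAb)={b}, FIRST(Bb)=FIRST(B)={c,d} */
const unsigned g1[] = { Ta, Tb, Tc | Td };
/* A -> aB | BAb with B -> aB | b:
   FIRST(aB)={a}, FIRST(BAb)=FIRST(B)={a,b} */
const unsigned g2[] = { Ta, Ta | Tb };
```

g1 passes because {a}, {b}, and {c, d} are pairwise disjoint; g2 fails because FIRST(aB) = {a} and FIRST(BAb) = {a, b} share a.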
Recursive-Descent Parsing (cont.)
• Left factoring can resolve the problem

Replace
<variable> → identifier | identifier [<expression>]
with
<variable> → identifier <new>
<new> → ε | [<expression>]
or
<variable> → identifier [[<expression>]]
(the outer brackets are metasymbols of EBNF)
Top-down parsing - Example
• Builds the parse tree in preorder
• LL(1) example

Grammar:
S → if E then S else S
S → begin S L
S → print E
L → end
L → ; S L
E → num

Input: if 5 then print 8 else …

Token : rule applied (sentential form after each step)
if    : S → if E then S else S   →  if E then S else S
5     : E → num                  →  if 5 then S else S
print : S → print E              →  if 5 then print E else S
LL Parsers
• LL parsers are a type of top-down parser used in computer science to analyze and process the structure of
strings according to a formal grammar.
• The term "LL" stands for "Left-to-right, Leftmost derivation," indicating the strategy used by these parsers to
process input.
• Here are some key characteristics of LL parsers:
• Left-to-right scanning: LL parsers scan the input string from left to right, processing symbols in the order
they appear.
• Leftmost derivation: LL parsers aim to derive the leftmost derivation of the input string. This means that
they always expand the leftmost non-terminal in the current sentential form.
• Predictive parsing: LL parsers use predictive parsing to determine which production rule to apply at each
step based on a finite lookahead. This lookahead involves examining a fixed number of input symbols to
predict the next production rule to apply.
• LL(k) grammars: LL parsers are often characterized by the maximum number of tokens they look ahead in
the input string. For example, LL(1) parsers look ahead one token to decide which production rule to apply,
while LL(k) parsers look ahead k tokens.
• Table-driven parsing: LL parsers are typically implemented using parsing tables, which store information
about which production rule to apply for each combination of non-terminal and lookahead symbol.

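A table-driven LL(1) parser can be sketched in a few lines for a toy grammar. The grammar, symbol encoding, and function below are our own illustration (E' is written 'e', id is 'i', and '$' marks end of input); each branch of the if-chain plays the role of one entry of the parsing table:

```c
/* Stack-driven LL(1) recognizer for the toy grammar
     E -> T e      e -> '+' T e | ε      T -> 'i'
   where 'e' stands for E', 'i' for id, and '$' ends the input.
   Each branch below corresponds to one parsing-table entry
   M[nonterminal, lookahead].                                    */
int ll1_parse(const char *input) {
    char stack[64];
    int top = 0;
    const char *p = input;
    stack[top++] = '$';                /* bottom marker */
    stack[top++] = 'E';                /* start symbol  */
    while (top > 0) {
        char X = stack[--top];
        if (X == 'i' || X == '+' || X == '$') {
            if (X != *p) return 0;     /* terminal must match input */
            p++;
        } else if (X == 'E' && *p == 'i') {         /* E -> T e   */
            stack[top++] = 'e'; stack[top++] = 'T';
        } else if (X == 'e' && *p == '+') {         /* e -> + T e */
            stack[top++] = 'e'; stack[top++] = 'T'; stack[top++] = '+';
        } else if (X == 'e' && *p == '$') {         /* e -> ε (just pop) */
        } else if (X == 'T' && *p == 'i') {         /* T -> i     */
            stack[top++] = 'i';
        } else {
            return 0;                  /* empty table entry: error */
        }
    }
    return *p == '\0';
}
```

Expanding a nonterminal pushes its RHS in reverse so the leftmost symbol is processed first, which is what makes the parse trace out a leftmost derivation.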
Bottom Up Parsing
• Unlike top-down parsing, which starts with the root of the parse tree and works down to the leaves, bottom-
up parsing begins with the input string and builds the parse tree from the leaves up to the root.
• Here are the key characteristics of bottom-up parsing:
1. Shift-Reduce Parsing: Bottom-up parsing is often implemented using a strategy called shift-reduce parsing.
In shift-reduce parsing, the parser shifts input symbols onto a stack until it can reduce a portion of the stack
to a non-terminal symbol according to the grammar rules.
2. Reduction: Reduction involves replacing a sequence of symbols on the top of the stack with a non-terminal
symbol according to a production rule in the grammar. The parser continues reducing portions of the stack
until it reaches the start symbol of the grammar.
3. Handle: During reduction, the portion of the stack that matches the right-hand side of a production rule is
called a handle. The parser identifies handles and replaces them with the corresponding non-terminal
symbol.
4. Bottom-up Parse Tree: The result of bottom-up parsing is a parse tree rooted at the start symbol of the
grammar, with the input string as its leaves. Each internal node in the parse tree represents a non-terminal
symbol, and its children represent the symbols derived from that non-terminal.
5. Shift-Reduce Conflict and Reduce-Reduce Conflict: Bottom-up parsing may encounter shift-reduce conflicts
or reduce-reduce conflicts when deciding whether to shift a symbol onto the stack or reduce a portion of
the stack. Conflicts can arise due to ambiguity or lack of sufficient lookahead in the grammar.

Bottom-Up Parsers
• Given a right sentential form, γ, determine what substring
of γ is the right-hand side of the rule in the grammar that
must be reduced to produce the previous sentential form in
the rightmost derivation
• E.g.
S → aAc
A → aA | b

S ⇒ aAc ⇒ aaAc ⇒ aabc

• The correct RHS is called the handle.
• The most common bottom-up parsing algorithms are in the LR family
• L specifies a left-to-right scan of the input and the
• R specifies that a rightmost derivation is generated
Bottom-up Parsing
• The parsing problem is finding the correct RHS in a right-sentential form
(handle) to reduce to get the previous right-sentential form in the
derivation
• No problem with left recursion
• Example grammar:

E→E+T|T
T→T*F|F
F → ( E ) | id (1)

E.g. derived sentence: id + id * id

Bottom-up Parsing (cont.)
• Intuition about handles:
• Def: β is the handle of the right sentential form
γ = αβw if and only if S ⇒*rm αAw ⇒rm αβw

• Def: β is a phrase of the right sentential form
γ if and only if S ⇒* γ = α1Aα2 ⇒+ α1βα2

• Def: β is a simple phrase of the right sentential form γ if and only if S ⇒* γ = α1Aα2 ⇒
α1βα2

• The handle of a right sentential form is its leftmost simple phrase

• Given a parse tree, it is now easy to find the handle
• Parsing can be thought of as handle pruning
Bottom up Parsing - Shift-Reduce parsing
• Uses Pushdown Automata (PDA)
• Parser stack: symbols (terminal and non-terminals) + automaton states
• Parsing actions: sequence of shift and reduce operations
• Action determined by top of stack and k input tokens
• Shift: move next token to top of stack
• Reduce: replacing the handle on the top of the parse stack with its
corresponding LHS
• For example: given the rule X → A B C,
reduce pops C, B, A and then pushes X
• Convention: $ stands for end of file
• The LR family of shift-reduce parsers is the most common bottom-up
parsing approach

Shift-reduce Parsing: Example
Grammar: S → aABe
A → Abc | b
B→d

Input: abbcde
⇒ aAbcde (using A → b)
⇒ aAde (using A → Abc)
⇒ aABe (using B → d)
⇒ S (using S → aABe)
Shift-Reduce Parsing: cont’d
• Need to choose reductions carefully:
abbcde ⇒ aAbcde ⇒ aAbcBe ⇒ …
doesn't work.
• A handle of a string s is a substring β s.t.:
• β matches the RHS of a rule A → β; and
• replacing β by A (the LHS of the rule) represents a step in the
reverse of a rightmost derivation of s.
• For shift-reduce parsing, reduce only handles.
Shift-reduce Parsing: Implementation
• Data Structures:
• a stack, its bottom marked by ‘$’. Initially empty.
• the input string, its right end marked by ‘$’. Initially w.
• Actions:
repeat
1. Shift some (≥ 0) symbols from the input string onto the stack, until a
handle β appears on top of the stack.
2. Reduce β to the LHS of the appropriate production.
until ready to accept.
• Acceptance: when the input is empty and the stack contains only the
start symbol.
Example
Stack (→) Input Action
$ abbcde$ shift
$a bbcde$ shift
$ab bcde$ reduce: A → b Grammar :
$aA bcde$ shift S → aABe
$aAb cde$ shift A → Abc | b
$aAbc de$ reduce: A → Abc B→d
$aA de$ shift
$aAd e$ reduce: B → d
$aAB e$ shift
$aABe $ reduce: S → aABe
$S $ accept

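The trace above can be reproduced by a small hand-written shift-reduce recognizer. Real LR parsers decide when to reduce from automaton states; the fixed-priority suffix checks below (including reducing b to A only directly after a) are a simplification of ours that happens to work for this one grammar:

```c
#include <string.h>

/* Shift-reduce recognizer for:
     S -> aABe      A -> Abc | b      B -> d          */
static int ends_with(const char *st, const char *sfx) {
    size_t n = strlen(st), m = strlen(sfx);
    return n >= m && strcmp(st + n - m, sfx) == 0;
}

int sr_parse(const char *input) {
    char stack[64] = "";
    size_t top = 0;
    const char *p = input;
    for (;;) {
        /* Try to reduce a handle on top of the stack. */
        if (ends_with(stack, "aABe")) { top -= 4; stack[top++] = 'S'; }      /* S -> aABe */
        else if (ends_with(stack, "Abc")) { top -= 3; stack[top++] = 'A'; }  /* A -> Abc  */
        else if (ends_with(stack, "d"))   { top -= 1; stack[top++] = 'B'; }  /* B -> d    */
        else if (ends_with(stack, "ab"))  { top -= 1; stack[top++] = 'A'; }  /* A -> b    */
        else if (*p)                      { stack[top++] = *p++; }           /* shift     */
        else break;                       /* no handle and no input left */
        stack[top] = '\0';
    }
    return strcmp(stack, "S") == 0;      /* accept iff stack is exactly S */
}
```

On abbcde this performs exactly the shift/reduce sequence in the table above.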
Conflicts
• Can’t decide whether to shift or to reduce
• both seem OK (“shift-reduce conflict”).
Example: S → if E then S | if E then S else S | …

• Can’t decide which production to reduce with


• several may fit (“reduce-reduce conflict”).
Example: Stmt → id ( args ) | Expr
Expr → id ( args )

Advantages of LR parsers
• They will work for nearly all grammars that describe
programming languages.
• They work on a larger class of grammars than other bottom-
up algorithms, but are as efficient as any other bottom-up
parser.
• They can detect syntax errors as soon as it is possible.
• The LR class of grammars is a superset of the class parsable
by LL parsers.

Constructing LR Parsers
• LR parsers must be constructed with a tool
• Knuth’s insight: A bottom-up parser could use the entire
history of the parse, up to the current point, to make parsing
decisions
• There were only a finite and relatively small number of different
parse situations that could have occurred, so the history could be
stored in a parser state, on the parse stack

Constructing LR Parsers (cont.)
• An LR configuration stores the state of an LR parser:

(S0X1S1X2S2…XmSm, aiai+1…an$)

where the first component is the parse stack (alternating states Si and grammar symbols Xi) and the second is the remaining input.
Constructing LR Parsers (cont.)
• LR parsers are table driven, where the table has
two components
• The ACTION table specifies the action of the parser,
given the parser state and the next token
• Rows are state names; columns are terminals
• The GOTO table specifies which state to put on top
of the parse stack after a reduction action is done
• Rows are state names; columns are nonterminals

Structure of An LR Parser

(figure omitted: input buffer, parse stack, parser driver, and the ACTION/GOTO tables)
Parser Actions
• Initial configuration: (S0, a1…an$)
• Parser actions:
• If ACTION[Sm, ai] = Shift S, the next configuration is:
(S0X1S1X2S2…XmSmaiS, ai+1…an$)
• If ACTION[Sm, ai] = Reduce A → β and S = GOTO[Sm-r, A], where r =
the length of β, the next configuration is
(S0X1S1X2S2…Xm-rSm-rAS, aiai+1…an$)
• If ACTION[Sm, ai] = Accept, the parse is complete and no errors
were found.
• If ACTION[Sm, ai] = Error, the parser calls an error-handling routine.
LR Parsing Table

• A parser table can be generated from a given grammar with a tool, e.g., yacc
(table figure omitted)
Bottom-up Parsing (cont.)
• Grammar (1) rewritten and numbered for easy
referencing in a parsing table.

1. E→E+T
2. E→T
3. T→T*F
4. T→F
5. F→(E)
6. F → id

Bottom-up Parsing (cont.)
Stack Input Action

0 id + id * id $ Shift 5

0id5 + id * id $ Reduce 6 (use GOTO[0, F])

0F3 + id * id $ Reduce 4 (use GOTO[0, T])

0T2 + id * id $ Reduce 2 (use GOTO[0, E])

0E1 + id * id $ Shift 6

0E1+6 id * id $ Shift 5

0E1+6id5 * id $ Reduce 6 (use GOTO[6, F])

0E1+6F3 * id $ Reduce 4 (use GOTO[6, T])

0E1+6T9 * id $ Shift 7

0E1+6T9*7 id $ Shift 5

0E1+6T9*7id5 $ Reduce 6 (use GOTO[7, F])

0E1+6T9*7F10 $ Reduce 3 (use GOTO[6, T])

0E1+6T9 $ Reduce 1 (use GOTO[0, E])

0E1 $ Accept

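The whole trace can be replayed by a table-driven driver. The sketch below hard-codes ACTION/GOTO tables for the numbered grammar; the state numbering matches the one assumed by the trace, but treat the encoding itself as illustrative:

```c
/* Table-driven recognizer for the numbered grammar:
     1: E->E+T  2: E->T  3: T->T*F  4: T->F  5: F->(E)  6: F->id
   Terminal indices: id=0 '+'=1 '*'=2 '('=3 ')'=4 '$'=5.
   ACTION encoding: 0 = error, 100+s = shift state s,
   200+p = reduce by production p, 999 = accept.                 */
enum { ERR = 0, ACC = 999 };
#define S(x) (100 + (x))
#define R(x) (200 + (x))

static const int ACTION[12][6] = {
    /*  id     +      *      (      )      $   */
    { S(5),  ERR,   ERR,   S(4),  ERR,   ERR  },   /* 0  */
    { ERR,   S(6),  ERR,   ERR,   ERR,   ACC  },   /* 1  */
    { ERR,   R(2),  S(7),  ERR,   R(2),  R(2) },   /* 2  */
    { ERR,   R(4),  R(4),  ERR,   R(4),  R(4) },   /* 3  */
    { S(5),  ERR,   ERR,   S(4),  ERR,   ERR  },   /* 4  */
    { ERR,   R(6),  R(6),  ERR,   R(6),  R(6) },   /* 5  */
    { S(5),  ERR,   ERR,   S(4),  ERR,   ERR  },   /* 6  */
    { S(5),  ERR,   ERR,   S(4),  ERR,   ERR  },   /* 7  */
    { ERR,   S(6),  ERR,   ERR,   S(11), ERR  },   /* 8  */
    { ERR,   R(1),  S(7),  ERR,   R(1),  R(1) },   /* 9  */
    { ERR,   R(3),  R(3),  ERR,   R(3),  R(3) },   /* 10 */
    { ERR,   R(5),  R(5),  ERR,   R(5),  R(5) },   /* 11 */
};
static const int GOTO_[12][3] = {   /* columns: E=0 T=1 F=2 */
    {1, 2, 3}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
    {8, 2, 3}, {0, 0, 0}, {0, 9, 3}, {0, 0, 10},
    {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
};
static const int RHS_LEN[7] = { 0, 3, 1, 3, 1, 3, 1 };
static const int LHS[7]     = { 0, 0, 0, 1, 1, 2, 2 };  /* E=0 T=1 F=2 */

/* tok: terminal indices, ending with 5 ('$'). Returns 1 on accept. */
int lr_parse(const int *tok) {
    int states[128], top = 0, i = 0;
    states[0] = 0;
    for (;;) {
        int a = ACTION[states[top]][tok[i]];
        if (a == ACC) return 1;
        if (a >= 200) {                       /* reduce A -> beta */
            int prod = a - 200;
            top -= RHS_LEN[prod];             /* pop |beta| states */
            states[top + 1] = GOTO_[states[top]][LHS[prod]];
            top++;
        } else if (a >= 100) {                /* shift */
            states[++top] = a - 100;
            i++;
        } else {
            return 0;                         /* error entry */
        }
    }
}
```

For input id + id * id $ (encoded {0, 1, 0, 2, 0, 5}) the driver performs the shift/reduce sequence shown in the table above.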
YACC Syntax Analysis Tool

Introduction
• What is YACC ?
• Tool which will produce a parser for a given grammar.
• YACC (Yet Another Compiler Compiler)
• A program designed to compile an LALR(1) grammar and to
produce the source code of the syntactic analyzer of the
language produced by this grammar.
Common Tools
• ANTLR tool
• Generates LL(k) parsers
• Yacc (Yet Another Compiler Compiler)
• Generates LALR parsers
• Bison
• Improved version of Yacc

YACC File Format
%{
C declarations
%}
yacc declarations
%%
Grammar rules
%%
Additional C code

• Comments enclosed in /* ... */ may appear in any of


the sections.

YACC
• Input specification for YACC (similar to flex)
• Three parts: Definitions, Rules, User code
• Use “%%” as a delimiter for each part

• First part: Definitions (C and YACC declarations)


• Definition of tokens for the second part and for use by flex
• Definition of variables for use by the parser code

• Second part: Rules


• Grammar for the parser

• Third part: User code


• The code in this part is copied into the parser generated by YACC
Definitions Section
%{
#include <stdio.h>
#include <stdlib.h>
%}

%token ID NUM    /* ID and NUM are terminals */
%start expr      /* parsing starts at the nonterminal expr */
YACC Declaration Summary
`%start'
Specify the grammar's start symbol
`%union'
Declare the collection of data types that semantic values may have
`%token'
Declare a terminal symbol (token type name) with no precedence or associativity specified
`%type'
Declare the type of semantic values for a nonterminal symbol
`%right'
Declare a terminal symbol (token type name) that is
right-associative
`%left'
Declare a terminal symbol (token type name) that is left-associative
`%nonassoc'
Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be
associative is a syntax error, e.g., x op y op z is a syntax error)
Rules Section
• This section defines the grammar
• Example
expr : expr '+' term | term;
term : term '*' factor | factor;
factor : '(' expr ')' | ID | NUM;

Rules Section
• Normally written like this
• Example:
expr : expr '+' term
| term
;
term : term '*' factor
| factor
;
factor : '(' expr ')'
| ID
| NUM
;
The Position of Rules
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;

In an action, $1, $2, $3, … refer to the semantic values of the corresponding right-hand-side symbols (in expr : expr '+' term, $1 is the value of expr, $2 of '+', and $3 of term), and $$ is the value of the left-hand side. When no action is given, the default is $$ = $1;
YACC File Example
%{
#include <stdio.h>
%}

%token NAME NUMBER


%%

statement: NAME '=' expression


| expression { printf("= %d\n", $1); }
;

expression: expression '+' NUMBER { $$ = $1 + $3; }


| expression '-' NUMBER { $$ = $1 - $3; }
| NUMBER { $$ = $1; }
;
%%
int yyerror(char *s)
{
fprintf(stderr, "%s\n", s);
return 0;
}

int main(void)
{
yyparse();
return 0;
}
Example 1
%{ #include <ctype.h> %}
%token DIGIT          /* also results in the definition of #define DIGIT xxx */
%%
line   : expr '\n'           { printf("= %d\n", $1); }
       ;
expr   : expr '+' term       { $$ = $1 + $3; }
       | term                { $$ = $1; }
       ;
term   : term '*' factor     { $$ = $1 * $3; }
       | factor              { $$ = $1; }
       ;
factor : '(' expr ')'        { $$ = $2; }
       | DIGIT               { $$ = $1; }   /* $1: attribute of the token (stored in yylval) */
       ;
%%
/* Example of a very crude lexical analyzer invoked by the parser */
int yylex()
{   int c = getchar();
    if (isdigit(c))
    {   yylval = c - '0';
        return DIGIT;
    }
    return c;
}
Example 2
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double   /* double type for attributes and yylval */
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines : lines expr '\n'  { printf("= %g\n", $2); }
      | lines '\n'
      | /* empty */
      ;
expr  : expr '+' expr    { $$ = $1 + $3; }
      | expr '-' expr    { $$ = $1 - $3; }
      | expr '*' expr    { $$ = $1 * $3; }
      | expr '/' expr    { $$ = $1 / $3; }
      | '(' expr ')'     { $$ = $2; }
      | '-' expr %prec UMINUS { $$ = -$2; }
      | NUMBER
      ;
%%
Example 2 (cont’d)
%%
/* Crude lexical analyzer for floating-point doubles and arithmetic operators */
int yylex()
{   int c;
    while ((c = getchar()) == ' ')
        ;
    if ((c == '.') || isdigit(c))
    {   ungetc(c, stdin);
        scanf("%lf", &yylval);
        return NUMBER;
    }
    return c;
}
/* Run the parser */
int main()
{   if (yyparse() != 0)
        fprintf(stderr, "Abnormal exit\n");
    return 0;
}
/* Invoked by the parser to report parse errors */
int yyerror(char *s)
{   fprintf(stderr, "Error: %s\n", s);
}
How YACC Works
gram.y (file containing the desired grammar in yacc format)
→ yacc → y.tab.c (C source program created by yacc)
→ cc or gcc → a.out (executable program that will parse the grammar given in gram.y)
How YACC Works
(1) Parser generation time:
YACC source (*.y) → yacc → y.tab.c, y.tab.h, y.output

(2) Compile time:
y.tab.c → C compiler/linker → a.out

(3) Run time:
token stream → a.out → abstract syntax tree
Benefits of YACC
• Faster development
• Compared to manual implementation
• Easier to change the specification and generate new
parser
• Than to modify 1000s of lines of code to add, change, delete
an existing feature
• Less error-prone, as code is generated
• Cost: Learning curve
• Invest once, amortized over 40+ years career

Lex with Yacc
Lex source (lexical rules) → Lex → lex.yy.c
Yacc source (grammar rules) → Yacc → y.tab.c

At run time, yyparse() calls yylex() on the input; yylex() returns the next
token, and yyparse() produces the parsed output.
YACC works with Lex

When the parser needs a token, it calls yylex(); the scanner matches a
pattern such as [0-9]+ and returns the token (e.g., NUM) to the parser,
which then matches rules such as NUM '+' NUM.
Simple example
• Implement a calculator which can recognize adding or subtracting of
numbers

[linux33]% ./y_calc
1+101
= 102
[linux33] % ./y_calc
1000-300+200+100
= 1000
[linux33] %

Example – the Lex part
%{
#include <math.h>
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+   { yylval = atoi(yytext);
           return NUMBER; }
[\t ]+   ;          /* Do nothing for white space */
\n       return 0;  /* End of the logic */
.        return yytext[0];
%%
Example – the Yacc part
%token NAME NUMBER
%%
statement: NAME '=' expression
         | expression
           { printf("= %d\n", $1); }
         ;
expression: expression '+' NUMBER
           { $$ = $1 + $3; }
          | expression '-' NUMBER
           { $$ = $1 - $3; }
          | NUMBER
           { $$ = $1; }
          ;

/* Link with the Yacc library (-ly) when compiling. */
LEX and YACC – Another Example
scanner.l:
%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
%%
int    { return INT; }
char   { return CHAR; }
float  { return FLOAT; }
{id}   { return ID; }

parser.y:
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token CHAR FLOAT ID INT
%%

Running yacc -d xxx.y produces y.tab.h:
# define CHAR 258
# define FLOAT 259
# define ID 260
# define INT 261
Lex vs. Yacc
• Lex
• Lex generates C code for a lexical analyzer, or scanner
• Lex uses patterns that match strings in the input and converts the
strings to tokens

• Yacc
• Yacc generates C code for a syntax analyzer, or parser.
• Yacc uses grammar rules that allow it to analyze tokens from Lex
and create a syntax tree.

