CH03
Syntax Analysis
❒ Syntax: the way in which tokens are put together to form
expressions, statements, or blocks of statements.
❍ The rules governing the formation of statements in a programming
language.
❒ Syntax analysis: the task concerned with fitting a sequence of
tokens into a specified syntax.
❒ Parsing: to break a sentence down into its component parts with an
explanation of the form, function, and syntactical relationship of
each part.
❒ The syntax of a programming language is usually given by the
grammar rules of a context-free grammar (CFG).
Role of a Parser
❒ The syntax analyzer (parser) checks whether a given
source program satisfies the rules implied by a CFG.
❍ If it does, the parser creates the parse tree of that
program.
❍ Otherwise, the parser reports error messages.
❒ A CFG:
❍ gives a precise syntactic specification of a
programming language.
❍ can be converted directly into a parser by
some tools (e.g. yacc).
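Role of a Parser…
[Figure: the lexical analyzer reads the source program character by character (get next char / next char) and hands tokens to the syntax analyzer on request (get next token / next token). The syntax analyzer produces the parse tree. Both phases use the symbol table, which contains a record for each identifier, and each reports its own errors (lexical errors and syntax errors).]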
Parser…
❒ Parsers can be categorized into two groups:
❒ Top-down parser
❍ The parse tree is created top to bottom, starting from
the root and working down to the leaves.
❒ Bottom-up parser
❍ The parse tree is created bottom to top, starting from
the leaves and working up to the root.
❒ Both top-down and bottom-up parsers scan the input
from left to right (one symbol at a time).
❒ Efficient top-down and bottom-up parsers can be
implemented only for subclasses of context-free
grammars:
❍ LL for top-down parsing
❍ LR for bottom-up parsing
Context-free grammar (CFG)
❒ A context-free grammar is a specification for the
syntactic structure of a programming language.
❒ A context-free grammar is a 4-tuple
G = (T, N, P, S) where
❍ T is a finite set of terminals (a set of tokens)
❍ N is a finite set of non-terminals (syntactic variables)
❍ P is a finite set of productions of the
form A → α, where A is a non-terminal
and α is a string of terminals and
non-terminals (possibly the empty string)
❍ S ∈ N is a designated start symbol (one of the
non-terminal symbols)
Example: grammar for simple arithmetic
expressions
Notational Conventions Used
❒ Terminals:
❍ Lowercase letters early in the alphabet, such as a, b, c.
❍ Operator symbols such as +, *, and so on.
❍ Punctuation symbols such as parentheses, comma, and so on.
❍ The digits 0, 1, ..., 9.
❍ Boldface strings such as id or if, each of which represents a single
terminal symbol.
❒ Non-terminals:
❍ Uppercase letters early in the alphabet, such as A, B, C.
❍ The letter S is usually the start symbol.
❍ Lowercase, italic names such as expr or stmt.
❍ Uppercase letters may be used to represent non-terminals for
the constructs.
• expr, term, and factor are represented by E, T, F
Notational Conventions Used…
❒ Grammar symbols:
❍ Uppercase letters late in the alphabet, such as X, Y, Z,
represent grammar symbols, that is, either non-terminals or terminals.
❒ Strings of terminals:
❍ Lowercase letters late in the alphabet, mainly u, v, x, y ∈ T*
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivation
❒ A derivation is a sequence of replacements of structure names by
choices on the right-hand sides of grammar rules.
Example: E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
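As a worked illustration (chosen here, not taken from the slides), one leftmost derivation of the sentence -(id + id) under the example grammar replaces the leftmost non-terminal at every step:

```latex
\begin{gather*}
E \Rightarrow -E \Rightarrow -(E) \Rightarrow -(E + E) \\
\Rightarrow -(\mathbf{id} + E) \Rightarrow -(\mathbf{id} + \mathbf{id})
\end{gather*}
```

The productions used, in order, are E → - E, E → ( E ), E → E + E, E → id, and E → id.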
Parse tree
❒ A parse tree is a graphical representation of a
derivation.
❒ It filters out the order in which productions are applied
to replace non-terminals.
[Figure: a sequence of parse trees growing from the root E: first E alone, then E with children - and E, then that E expanded to ( E ), then the inner E expanded to E + E, and finally both operands replaced by id.]
This is a top-down derivation because we start building the
parse tree at the top.
Exercise
a) Using the grammar below, draw a parse tree for the
following string:
( ( id . id ) id ( id ) ( ( ) ) )
S → E
E → id
| ( E . E )
| ( L )
| ( )
L → L E
| E
b) Give a rightmost derivation for the string given in (a).
Ambiguity
❒ A grammar that produces more than one parse tree for a
sentence is called an ambiguous grammar.
❒ An ambiguous grammar produces
• more than one leftmost derivation or
• more than one rightmost derivation for the same
sentence.
Ambiguity: Example
❒ Example: the arithmetic expression grammar
E → E + E | E * E | ( E ) | id
❒ permits two distinct leftmost derivations for the
sentence id + id * id:
(a)                      (b)
E => E + E               E => E * E
  => id + E                => E + E * E
  => id + E * E            => id + E * E
  => id + id * E           => id + id * E
  => id + id * id          => id + id * id
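The ambiguity can also be checked mechanically. The following is a minimal sketch, assuming a simplified version of the grammar without parentheses (E → E + E | E * E | id) and single-character tokens where 'i' stands for id; the function names are illustrative only. It counts parse trees by trying every operator as the root of the tree.

```c
#include <assert.h>
#include <string.h>

/* Count the parse trees of a token string under the ambiguous grammar
 *   E -> E + E | E * E | id
 * (parentheses are left out of this sketch).
 * Each token is one character: 'i' stands for id.
 * count_trees(s, lo, hi) counts the trees that derive s[lo..hi], inclusive. */
static long count_trees(const char *s, int lo, int hi)
{
    if (lo == hi)
        return s[lo] == 'i' ? 1 : 0;            /* E -> id */
    long total = 0;
    for (int k = lo + 1; k < hi; k++) {         /* try every operator as the root */
        if (s[k] == '+' || s[k] == '*')
            total += count_trees(s, lo, k - 1) * count_trees(s, k + 1, hi);
    }
    return total;
}

/* Number of distinct parse trees for the whole token string. */
long parse_tree_count(const char *tokens)
{
    assert(tokens != NULL);
    int n = (int)strlen(tokens);
    return n == 0 ? 0 : count_trees(tokens, 0, n - 1);
}
```

Calling parse_tree_count("i+i*i") returns 2, one tree for each of the derivations (a) and (b); a grammar is unambiguous only if this count never exceeds 1 for any sentence.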
Ambiguity: example
To add precedence
❒ Create a non-terminal for each level of precedence
❒ Isolate the corresponding part of the grammar
❒ Force the parser to recognize high-precedence subexpressions first
For algebraic expressions
❒ Multiplication and division first (level one)
❒ Subtraction and addition next (level two)
E → E + E | E ∗ E | ( E ) | id
[Figure: parse trees in which E is expanded repeatedly by E → E + T]
Elimination of Left recursion
❒ A grammar is left recursive if it has a non-terminal A
such that there is a derivation
A ⇒+ Aα for some string α.
❒ Top-down parsing methods cannot handle
left-recursive grammars,
❒ so a transformation that eliminates left recursion is
needed.
To eliminate left recursion for a single production,
A → Aα | β can be replaced by the non-left-recursive
productions
A → β A’
A’ → α A’ | ε
Elimination of Left recursion…
This left-recursive grammar:
E → E + T | T
T → T ∗ F | F
F → ( E ) | id
becomes the following non-left-recursive grammar:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → ∗FT’ | ε
F → ( E ) | id
Exercise: Parse id + id * id with the non-left-recursive grammar above, using a leftmost derivation.
Top-Down and Bot t om-Up
Parsers
Top-down parsers:
• St art s const ruct ing t he parse t ree at t he t op (root ) of t he
t ree and move down t owards t he leaves.
• Easy t o implement by hand, but work wit h rest rict ed
grammars.
example: Recursive Decent Parser
22
Top-down (LL) parsing
Recursive Descent Parsing (RDP)
❒ This method of top-down parsing can be considered as
an attempt to find the leftmost derivation for an input
string. It may involve backtracking.
❒ To construct the parse tree using RDP:
❍ we create a one-node tree consisting of S.
❍ two pointers, one for the tree and one for the input, will
be used to indicate where the parsing process is.
❍ initially, they will be on S and the first input symbol,
respectively.
❍ then we use the first S-production to expand the tree.
The tree pointer will be positioned on the leftmost
symbol of the newly created sub-tree.
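The idea can be sketched in C for the non-left-recursive expression grammar (E → TE’, E’ → +TE’ | ε, T → FT’, T’ → ∗FT’ | ε, F → ( E ) | id). This is a hedged illustration rather than the algorithm from the slides: it only recognizes the input instead of building a tree, it assumes single-character tokens with 'i' standing for id, and it never backtracks, which is possible here because one symbol of lookahead is enough to choose a production. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One function per non-terminal. Each returns 1 on success and advances
 * *p (the input pointer) past the tokens it matched; 0 means syntax error. */
static int parse_E(const char **p);

static int parse_F(const char **p)
{
    if (**p == 'i') {                   /* F -> id */
        (*p)++;
        return 1;
    }
    if (**p == '(') {                   /* F -> ( E ) */
        (*p)++;
        if (!parse_E(p) || **p != ')')
            return 0;
        (*p)++;
        return 1;
    }
    return 0;
}

static int parse_Tprime(const char **p)
{
    if (**p == '*') {                   /* T' -> * F T' */
        (*p)++;
        return parse_F(p) && parse_Tprime(p);
    }
    return 1;                           /* T' -> epsilon */
}

static int parse_T(const char **p)
{
    return parse_F(p) && parse_Tprime(p);       /* T -> F T' */
}

static int parse_Eprime(const char **p)
{
    if (**p == '+') {                   /* E' -> + T E' */
        (*p)++;
        return parse_T(p) && parse_Eprime(p);
    }
    return 1;                           /* E' -> epsilon */
}

static int parse_E(const char **p)
{
    return parse_T(p) && parse_Eprime(p);       /* E -> T E' */
}

/* Accept only if E matches and the whole input has been consumed. */
int accepts(const char *input)
{
    assert(input != NULL);
    const char *p = input;
    return parse_E(&p) && *p == '\0';
}
```

For example, accepts("i+i*i") returns 1 and accepts("i+*i") returns 0. The sequence of function calls made for an accepted input traces out a leftmost derivation of that input.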
Recursive Descent Parsing (RDP)…
Homework:
Convert the grammar into a non-left-recursive grammar and draw the parse tree using RDP.
Exercise
Using the grammar below, draw a parse tree for the
following string using the RDP algorithm:
( ( id . id ) id ( id ) ( ( ) ) )
S → E
E → id
| ( E . E )
| ( L )
| ( )
L → L E
| E
Bottom-Up (LR) Parser
A bottom-up parser, or a shift-reduce parser, begins
at the leaves and works up to the top of the tree.
Consider the grammar:
S → aABe
A → Abc | b
B → d
Bottom-Up Parser: Simulation
Productions: S → aABe, A → Abc, A → b, B → d
INPUT (current string)    Reduction applied    OUTPUT (parse tree so far)
a b b c d e $             (none yet)           empty
a A b c d e $             A → b                A over b
a A d e $                 A → Abc              A over A b c
a A B e $                 B → d                B over d
S $                       S → aABe             S over a A B e (complete tree)
Stack implementation of shift/reduce
parsing
❒ In LR parsing the two major problems are:
❍ locating the substring that is to be reduced
❍ locating the production to use
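A stack makes both problems concrete. The sketch below is an assumption-laden illustration for the grammar S → aABe, A → Abc | b, B → d only: symbols are single characters, the end of the C string plays the role of $, and the choice of what to reduce is hard-coded (b is reduced to A only directly after a, so that a later b stays on the stack until Abc is complete). A real LR parser takes these decisions from a generated table; the function names here are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Does the stack (of length top) end with the string rhs? */
static int ends_with(const char *stack, int top, const char *rhs)
{
    int n = (int)strlen(rhs);
    return top >= n && strncmp(stack + top - n, rhs, (size_t)n) == 0;
}

/* Try one reduction on top of the stack; return 1 if one was made. */
static int reduce(char *stack, int *top, int trace)
{
    const char *rhs;
    char lhs;
    if      (ends_with(stack, *top, "aABe")) { rhs = "aABe"; lhs = 'S'; }
    else if (ends_with(stack, *top, "Abc"))  { rhs = "Abc";  lhs = 'A'; }
    else if (ends_with(stack, *top, "ab"))   { rhs = "b";    lhs = 'A'; } /* b right after a */
    else if (ends_with(stack, *top, "d"))    { rhs = "d";    lhs = 'B'; }
    else return 0;
    *top -= (int)strlen(rhs);           /* pop the right-hand side */
    stack[(*top)++] = lhs;              /* push the left-hand side */
    stack[*top] = '\0';
    if (trace) printf("reduce %c -> %-4s stack: %s\n", lhs, rhs, stack);
    return 1;
}

/* Returns 1 if input is a sentence of the grammar, 0 otherwise. */
int shift_reduce_accepts(const char *input, int trace)
{
    char stack[256];
    int top = 0;
    assert(input != NULL && strlen(input) < sizeof stack);
    stack[0] = '\0';
    for (const char *p = input; *p != '\0'; p++) {
        if (strchr("abcde", *p) == NULL)
            return 0;                           /* not a terminal of this grammar */
        stack[top++] = *p;                      /* shift */
        stack[top] = '\0';
        if (trace) printf("shift  %c         stack: %s\n", *p, stack);
        while (reduce(stack, &top, trace))      /* reduce while something reducible is on top */
            ;
    }
    return strcmp(stack, "S") == 0;             /* accept: only the start symbol remains */
}
```

Calling shift_reduce_accepts("abbcde", 1) prints the shifts and the four reductions A → b, A → Abc, B → d, S → aABe, and returns 1.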
Syntax error handling
If a compiler had to process only correct programs, its
design and implementation would be simplified greatly.
However, a compiler is expected to assist the
programmer in locating and tracking down errors that
inevitably creep into programs, despite the
programmer's best efforts.
Syntax error handling…
❒ The error handler should be written with the
following goals in mind:
• Errors should be reported clearly and accurately
• It should report the place of the error
• It should also report the type of the error
The Parser Generator: Yacc
❒ Yacc stands for "yet another compiler-compiler".
❒ Yacc: a tool for automatically generating a parser
from a grammar written in a yacc specification (.y
file)
❒ The yacc parser calls the lexical analyzer to collect
tokens from the input stream.
❒ Tokens are organized using grammar rules.
❒ When a rule is recognized, its action is executed.
Note
lex tokenizes the input and yacc parses the
tokens, taking the right actions, in context.
Scanner, Parser, Lex and Yacc
Yacc…
❒ There are four steps involved in creating a compiler with Yacc:
1. Specify the grammar:
– Write the grammar in a .y file (also specify here the actions that
are to be taken, in C).
– Write a lexical analyzer to process input and pass tokens to the
parser. This can be done using Lex.
– Write a function that starts parsing by calling yyparse().
– Write error handling routines (like yyerror()).
2. Generate a parser from Yacc by running Yacc over the
grammar file.
3. Compile the code produced by Yacc as well as any other
relevant source files.
4. Link the object files to appropriate libraries for the
executable parser.
Writing a Grammar in Yacc
❒ Productions in Yacc are of the form:
Synthesized Attributes
❒ Semantic actions may refer to values of the synthesized
attributes of terminals and non-terminals in a
production:
X : Y1 Y2 Y3 … Yn { action }
❍ $$ refers to the value of the attribute of X
❍ $i refers to the value of the attribute of Yi
❒ For example
factor : '(' expr ')' { $$ = $2; }
[Figure: parse tree for factor with children (, expr, and ); expr.val = x is copied up to factor.val = x by $$ = $2]
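One way to picture $$ and $i is as positions on a value stack that the parser keeps alongside its parse stack. The sketch below is a simplified model with hypothetical names (value_stack, push_value, reduce_factor_paren), not the code that yacc generates: reducing by factor : '(' expr ')' removes the three right-hand-side values and pushes the value computed by the action.

```c
#include <assert.h>

#define STACK_MAX 64

static int value_stack[STACK_MAX];   /* one attribute value per grammar symbol */
static int value_top = 0;            /* number of values currently on the stack */

/* Called when a symbol is shifted: its attribute value goes on the stack. */
static void push_value(int v)
{
    assert(value_top < STACK_MAX);
    value_stack[value_top++] = v;
}

/* Reduce by  factor : '(' expr ')'  { $$ = $2; }
 * The values of the three right-hand-side symbols are the top three entries. */
static void reduce_factor_paren(void)
{
    assert(value_top >= 3);
    int *rhs = &value_stack[value_top - 3];  /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[1];                     /* the action: $$ = $2 */
    value_top -= 3;                          /* pop the values of '(' expr ')' */
    push_value(result);                      /* push $$ as the value of factor */
}
```

After pushing the values for '(' (unused), expr (say 42), and ')' (unused), a call to reduce_factor_paren() leaves a single value, 42, on the stack as factor's attribute.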
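Lex Yacc interaction…
[Figure: Yacc translates calc.y into y.tab.c, which contains yyparse(); Lex translates calc.l into lex.yy.c, which contains yylex(). The two C files are compiled together into a program that reads the input and produces the output.]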
Lex Yacc interaction…
❒ If lex is to return tokens that yacc will process, they
have to agree on what tokens there are. This is
done as follows:
❍ The yacc file will have token definitions
%token INTEGER
in the definitions section.
❍ When the yacc file is translated with yacc -d, a header file
y.tab.h is created that has definitions like
#define INTEGER 258
❍ This file can then be included in both the lex and yacc
programs.
❍ The lex file can then return INTEGER, and the yacc
program can match on this token.
Example: Simple calculator: yacc file
%{
#include <stdio.h>
int yylex(void);            /* lexical analyzer, invoked by the parser */
void yyerror(char *);
#define YYSTYPE int         /* int type for attributes and yylval */
%}
%token INTEGER
%%
/* Grammar rules; each { ... } is an action.
   $$ is the value of the LHS (expr); $1, $3 are the values of symbols on the RHS.
   The value of a token on the RHS is the one the scanner stored in yylval. */
program:
        program expr '\n'   { printf("%d\n", $2); }
        |
        ;
expr:
        INTEGER             { $$ = $1; }
        | expr '+' expr     { $$ = $1 + $3; }
        | expr '-' expr     { $$ = $1 - $3; }
        ;
%%
void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}
int main(void) {
    yyparse();              /* the parser calls the lexical analyzer */
    return 0;
}
Example: Simple calculator: lex file
%{
/* The lex program matches numbers and operators and returns them. */
#include <stdio.h>
#include <stdlib.h>         /* atoi */
#include "y.tab.h"          /* generated by yacc, contains #define INTEGER 258 */
extern int yylval;          /* defined in y.tab.c */
void yyerror(char *);
%}
%%
[0-9]+      { yylval = atoi(yytext);    /* place the integer value on the stack */
              return INTEGER;
            }
[-+*/\n]    return *yytext;             /* operators are returned as themselves */
[ \t]       ;                           /* skip white space */
.           yyerror("invalid character");
%%
int yywrap(void) {
    return 1;
}
Lex and Yacc: compile and run
[compiler@localhost yacc]$ vi calc.l
[compiler@localhost yacc]$ vi calc.y
[compiler@localhost yacc]$ yacc -d calc.y
yacc: 4 shift/reduce conflicts.
[compiler@localhost yacc]$ lex calc.l
[compiler@localhost yacc]$ ls
a.out calc.l calc.y lex.yy.c typescript y.tab.c y.tab.h
[compiler@localhost yacc]$ gcc y.tab.c lex.yy.c
[compiler@localhost yacc]$ ls
a.out calc.l calc.y lex.yy.c typescript y.tab.c y.tab.h
[compiler@localhost yacc]$ ./a.out
2+3
5
23+8+
Invalid character
syntax error
Example: Simple calculator: yacc file – option 2
%{
#include <stdlib.h>
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token INTEGER
%%
program :
        program expr '\n'       { printf("%d\n", $2); }
        |
        ;
expr    : expr '+' mulexpr      { $$ = $1 + $3; }
        | expr '-' mulexpr      { $$ = $1 - $3; }
        | mulexpr               { $$ = $1; }
        ;
mulexpr : mulexpr '*' term      { $$ = $1 * $3; }
        | mulexpr '/' term      { $$ = $1 / $3; }
        | term                  { $$ = $1; }
        ;
term    :
        '(' expr ')'            { $$ = $2; }
        | INTEGER               { $$ = $1; }
        ;
%%
Calculator 2: Example – yacc file
%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
int sym[26];                /* sym holds the value of the associated variable */
%}
%token INTEGER VARIABLE
%left '+' '-'               /* associativity and precedence rules */
%left '*' '/'
%%
program :
        program statement '\n'
        |
        ;
statement :
        expression                      { printf("%d\n", $1); }
        | VARIABLE '=' expression       { sym[$1] = $3; }
        ;
expression :
        INTEGER                         { $$ = $1; }
        | VARIABLE                      { $$ = sym[$1]; }
        | expression '+' expression     { $$ = $1 + $3; }
        | expression '-' expression     { $$ = $1 - $3; }
        | expression '*' expression     { $$ = $1 * $3; }
        | expression '/' expression     { $$ = $1 / $3; }
        | '(' expression ')'            { $$ = $2; }
        ;
%%
Sample session:
user: 3 * (4 + 5)
calc: 27
user: x = 3 * (4 + 5)
user: y = 5
user: x
calc: 27
user: y
calc: 5
user: x + 2*y
calc: 37
Calculator 2: Example – lex file
%{
/* The lexical analyzer returns variables and integers. */
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
void yyerror(char *);
extern int yylval;
%}
%%
[a-z]       { yylval = *yytext - 'a';   /* for variables, yylval is an index into the symbol table sym */
              return VARIABLE;
            }
[0-9]+      { yylval = atoi(yytext);
              return INTEGER;
            }
[-+*/()=\n] return *yytext;
[ \t]       ;                           /* skip white space */
.           yyerror("Invalid character");
%%
int yywrap(void)
{
    return 1;
}
Conclusions
❒ Yacc and Lex are very helpful for
building the compiler front-end
❒ A lot of time is saved compared to
hand-implementation of the parser and scanner
❒ They both work as a mixture of "rules" and
"C code"
❒ C code is generated and is merged with the
rest of the compiler code
Calculator program
❒ Expand the calculator program so that the new
calculator program is capable of processing:
user: 3 * (4 + 5)
user: x = 3 * (4 + 5)
user: y = 5
user: x + 2*y
2^3/6
sin(1) + cos(PI)
tan
log
factorial