Yacc
Introduction
• Grammar
– Context-free grammar (CFG)
– Recursive rules
• Shift/Reduce Parsing
– See Figure 3-2.
– LALR(1)
– What Yacc Cannot Parse
• It cannot deal with ambiguous grammars, or with
grammars that need more than one token of lookahead
• If you give it a grammar that it cannot handle, it will tell
you, so there is no problem of overcomplex parsers
silently failing.
The Structure of a Yacc Grammar
(Definition section)
%%
(Rules section)
%%
(User subroutines section)
The Definition Section
• The definition section includes declarations
of the tokens used in the grammar, the types
of values used on the parser stack, and other
odds and ends.
– You don’t have to assign token numbers yourself;
yacc numbers the tokens for you.
• It can also include a literal block: C code
enclosed in %{ %}, which yacc copies into the generated C file
The Rules Section
• Since ASCII keyboards don’t have a → key, we
use a colon between the left- and right-hand sides
of a rule, and we put a semicolon at the end of
each rule
• The symbol on the left-hand side of the first rule
in the grammar is normally the start symbol,
though you can use a %start declaration in the
definition section to override that.
Symbol Values and Actions
• Every symbol in a yacc parser has a value
– The semantic record
– A number, a literal text string, ….
– Nonterminal symbols can have any values you want,
created by code in the parser
– In real parsers, the values of different symbols use
different data types
• int, double, char *, ….
• If a parser uses multiple value types, you must list them all
(in a %union declaration) so that yacc can create a C union typedef
called YYSTYPE to contain them
• By default, yacc makes all values of type int
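As a rough illustration of the union typedef mentioned above, the generated type might look something like the following C. The member names dval and sval and the two helper functions are hypothetical, chosen only for this sketch; in a real parser the members come from your own declarations.

```c
/* Sketch of a value type with two members; names are assumptions. */
typedef union {
    double dval;        /* value of a numeric token or expression */
    const char *sval;   /* value of a string or name token */
} YYSTYPE;

/* Hypothetical helper: build the value of a numeric symbol. */
YYSTYPE number_value(double d)
{
    YYSTYPE v;
    v.dval = d;
    return v;
}

/* Hypothetical helper: build the value of a string symbol. */
YYSTYPE string_value(const char *s)
{
    YYSTYPE v;
    v.sval = s;
    return v;
}
```

Each symbol uses exactly one member of the union, so the grammar has to say which member belongs to which symbol.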
Symbol Values and Actions
• $$:
– The value of the LHS symbol
– The action (semantic routine) should assign a value to it.
• $i:
– The value of the i-th symbol in the RHS of the
production
• Terminal symbol: The value is supplied by the lexer.
• Nonterminal symbol: The value was assigned to $$ earlier
by the action of the rule that produced it.
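The relationship between $$ and $i can be sketched in plain C as operations on a value stack. This is only a simplified model under stated assumptions, not the code yacc actually generates: the names value_stack, push_value, and reduce_add are hypothetical, and the rule being reduced is assumed to be expression: expression '+' NUMBER with the action $$ = $1 + $3.

```c
#include <assert.h>

#define STACK_MAX 64

static int value_stack[STACK_MAX];  /* one value per symbol on the parse stack */
static int top = 0;                 /* number of values currently stacked */

/* A shift pushes the value of the symbol just read. */
void push_value(int v)
{
    assert(top < STACK_MAX);
    value_stack[top++] = v;
}

/* Reduce by the assumed rule
 *     expression: expression '+' NUMBER   { $$ = $1 + $3; }
 * The three topmost values are $1, $2, $3; they are replaced by $$. */
int reduce_add(void)
{
    assert(top >= 3);
    int *rhs = &value_stack[top - 3];  /* rhs[0] is $1, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];      /* the action: $$ = $1 + $3 */
    top -= 3;                          /* pop the right-hand side */
    push_value(result);                /* push the value of the LHS */
    return result;
}
```

After pushing 2, '+', and 5, a call to reduce_add() leaves the single value 7 on the stack, which then serves as $1 for a later reduction.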
The Lexer
• The parser is the higher-level routine; it
calls the lexer, yylex(), whenever it needs a token
• Yacc defines the token names in the parser
as C preprocessor names in y.tab.h (when run with -d)
– See ch3-01.l
– Whenever the lexer returns a token to the
parser, if the token has an associated value, the
lexer must store the value in yylval before
returning
• In the first example, we explicitly declare yylval.
• In more complex parsers, yacc defines yylval as a
union and puts the definition in y.tab.h
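The contract between parser and lexer described above can be sketched with a hand-written yylex() in C. This is a minimal sketch, not the lex-generated scanner from ch3-01.l: it reads from a string instead of standard input, the helper set_lex_input is hypothetical, and the token code 257 for NUMBER is an assumption standing in for the definition yacc would place in y.tab.h.

```c
#include <ctype.h>
#include <stdlib.h>

/* Assumed token code; yacc would normally define NUMBER in y.tab.h. */
#define NUMBER 257

int yylval;                          /* value of the most recent token */
static const char *lex_input = "";   /* hypothetical: text still to scan */

/* Hypothetical helper: choose the text that yylex() will scan. */
void set_lex_input(const char *s)
{
    lex_input = s;
}

int yylex(void)
{
    /* Skip blanks between tokens. */
    while (*lex_input == ' ' || *lex_input == '\t')
        lex_input++;

    if (*lex_input == '\0')
        return 0;                    /* 0 signals end of input to the parser */

    if (isdigit((unsigned char)*lex_input)) {
        char *end;
        /* Store the token's value in yylval before returning the token. */
        yylval = (int)strtol(lex_input, &end, 10);
        lex_input = end;
        return NUMBER;
    }

    /* Any other character is returned as a single-character token. */
    return (unsigned char)*lex_input++;
}
```

Scanning the text "12 + 3" this way yields NUMBER (with yylval set to 12), then '+', then NUMBER (with yylval set to 3), and finally 0.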
Compiling and Running a Simple
Parser
• See P. 59.
• Note that yacc must run (with -d) before the
lexer is compiled, because the lex source includes
the y.tab.h file that yacc generates.
Arithmetic Expressions and
Ambiguity
• You may give Yacc an ambiguous grammar to test it
– There are 16 shift/reduce conflicts in the program of P.60
• There are two ways to specify precedence and associativity
in a grammar: implicitly and explicitly
– To specify them implicitly,
• Rewrite the grammar using separate non-terminal symbols for each
precedence level
– See P.62
– To specify them explicitly,
• Add precedence declarations to the definition section
(lowest precedence first)
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS
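The implicit approach, one nonterminal per precedence level, can be illustrated with a small recursive-descent evaluator in C, where each function plays the role of one nonterminal. Recursive descent is swapped in here purely for illustration; it is not the LALR(1) technique yacc uses, and the layered grammar shown in the comments is an assumed example rather than a quotation of the rules on P.62. All function names are hypothetical.

```c
#include <stdlib.h>

static const char *cursor;           /* hypothetical: next unread character */

static double parse_expression(void);

static void skip_blanks(void)
{
    while (*cursor == ' ')
        cursor++;
}

/* primary: NUMBER | '-' primary | '(' expression ')'   (binds tightest) */
static double parse_primary(void)
{
    skip_blanks();
    if (*cursor == '-') {            /* unary minus */
        cursor++;
        return -parse_primary();
    }
    if (*cursor == '(') {
        cursor++;
        double v = parse_expression();
        skip_blanks();
        if (*cursor == ')')
            cursor++;
        return v;
    }
    char *end;
    double v = strtod(cursor, &end); /* NUMBER */
    cursor = end;
    return v;
}

/* term: term '*' primary | term '/' primary | primary */
static double parse_term(void)
{
    double v = parse_primary();
    for (;;) {                       /* the loop gives left associativity */
        skip_blanks();
        if (*cursor == '*')      { cursor++; v *= parse_primary(); }
        else if (*cursor == '/') { cursor++; v /= parse_primary(); }
        else return v;
    }
}

/* expression: expression '+' term | expression '-' term | term */
static double parse_expression(void)
{
    double v = parse_term();
    for (;;) {
        skip_blanks();
        if (*cursor == '+')      { cursor++; v += parse_term(); }
        else if (*cursor == '-') { cursor++; v -= parse_term(); }
        else return v;
    }
}

/* Evaluate an arithmetic expression held in a string. */
double evaluate(const char *text)
{
    cursor = text;
    return parse_expression();
}
```

Because '*' and '/' are handled one level below '+' and '-', evaluate("2+3*4") groups the multiplication first, and the loops make evaluate("10-4-3") group from the left. The explicit %left declarations express the same two facts without extra nonterminals.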
Exercise
• Use the expression rules shown on P.62 of
“lex and yacc” to write a yacc program.
– Hint: ch3-01.y and ch3-01.l
• Please list your source code and execution
results.
When Not to Use Precedence
Rules
• You can use precedence rules to fix any
shift/reduce conflict that occurs in the
grammar, but doing so can hide a real problem
in the grammar
• We recommend that you use precedence in
only two situations:
– In expression grammars
– To resolve the “dangling else” conflict in
grammars for if-then-else language constructs
Variables and Typed Tokens
• See Example 3-2, P.64
• Symbol Values and %union