Lecture003 LEXandYACC
Kun-Yuan Hsieh
Programming Language Lab., NTHU
{definitions}
%% (required)
{transition rules}
%% (optional)
{user subroutines}
• The absolute minimum Lex program is thus
%%
PLLab, NTHU, Cs2403 Programming Languages
Lex vs. Yacc
• Lex
– Lex generates C code for a lexical analyzer, or scanner.
– Lex uses patterns that match strings in the input and converts the strings to tokens.
• Yacc
– Yacc generates C code for a syntax analyzer, or parser.
– Yacc uses grammar rules that allow it to analyze tokens from Lex and create a syntax tree.
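[Figure: Lex generates lex.yy.c, which contains yylex(); Yacc generates y.tab.c, which contains yyparse(). yyparse() calls yylex() on the input, yylex() returns a token, and yyparse() produces the parsed input.]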
Regular Expressions
• E.g.
ab?c => ac or abc
[a-z]+ => all strings of lower case letters
[a-zA-Z][a-zA-Z0-9]* => all alphanumeric strings with a leading alphabetic character
%%
"=" printf("operator: ASSIGNMENT");
Sample input: a=b+c;d=b*c;
…
%%
pink {npink++; REJECT;}
ink {nink++; REJECT;}
pin {npin++; REJECT;}
.|
\n ;
%%
…
• E.g.
[a-z]+ printf("%s", yytext);
[a-z]+ ECHO;
[a-zA-Z]+ {words++; chars += yyleng;}
%%
{letter}+ {printf("a word\n"); counter++;}
%%
int main(void) {
yylex();
printf("There are total %d words\n", counter);
return 0;
}
Usage
• To run Lex on a source file, type
lex scanner.l
• It produces a file named lex.yy.c, which is a C program for the lexical analyzer.
• To compile lex.yy.c, type
gcc lex.yy.c -ll
• To run the lexical analyzer program, type
./a.out < inputfile
EXAMPLE
%{
/* Counters: noms = names, mots = words, lignes = lines. */
int noms, mots, lignes;
%}
mot [a-z]+
maj [A-Z]
%%
{maj}{mot} {noms++; printf("Bonjour %s.\n", yytext);}
{mot} {mots++;}
\n {lignes++; printf(" Encore !\n");}
. ;
%%
int main(void)
{
noms = mots = lignes = 0;
yylex();
printf("nb de noms :%d, mots: %d, lignes: %d.\n", noms, mots, lignes);
return 0;
}
Versions of Lex
• AT&T -- lex
http://www.combo.org/lex_yacc_page/lex.html
• GNU -- flex
http://www.gnu.org/manual/flex-2.5.4/flex.html
• a Win32 version of flex :
http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html
or Cygwin :
http://sources.redhat.com/cygwin/
• What is YACC ?
– Tool which will produce a parser for a given grammar.
– YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar.
[Figure: the generated C files are compiled with a C compiler (cc or gcc).]
int main(void)
{
yyparse();
return 0;
}
How does it work?
[Figure: for the input program 12 + 26, yyparse() (YACC) calls yylex() (LEX). The Lex pattern [0-9]+ matches 12, so the next token is NUM.]
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token ID NUM /* ID and NUM are terminals */
%start expr /* parsing starts from expr */
parser.y:
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token CHAR FLOAT ID INT
%%
YACC
• Rules may be recursive
• Rules may be ambiguous*
• Uses bottom up Shift/Reduce parsing
– Get a token
– Push onto stack
– Can it be reduced? (How do we know?)
• If yes: Reduce using a rule
• If no: Get another token
• Yacc cannot look ahead more than one token
Example rule:
Phrase -> cart_animal AND CART
| work_animal AND PLOW
…
• Yacc (AT&T)
– yacc -d xxx.y
• Bison (GNU)
– bison -d -y xxx.y
(-y produces y.tab.c, the same as yacc; otherwise bison produces xxx.tab.c)
%right '='
%left '<' '>' NE LE GE
%left '+' '-'
%left '*' '/' /* highest precedence */
• shift/reduce conflict
– occurs when a grammar is written in such a way that a decision between shifting and reducing cannot be made.
– ex: the IF-ELSE ambiguity.
• To resolve this conflict, yacc will choose to shift.
`%union'
Declare the collection of data types that semantic values may have
`%token'
Declare a terminal symbol (token type name) with no precedence or associativity specified
`%type'
Declare the type of semantic values for a nonterminal symbol
`%left'
Declare a terminal symbol (token type name) that is left-associative
`%nonassoc'
Declare a terminal symbol (token type name) that is nonassociative
(using it in a way that would be associative is a syntax error, ex: x op. y op. z is syntax error)