Chapter 3 Syntax Analysis
The Parsing Problem
• Goals of the parser, given an input program:
• Find all syntax errors;
• for each, produce an appropriate diagnostic message and
recover quickly
• Produce the parse tree, or at least a trace of the parse
tree, for the program
Parsing Notations
• Lowercase letters at the beginning of the alphabet
(a, b, …) for terminal symbols.
• Uppercase letters at the beginning of the alphabet
(A, B, …) for nonterminal symbols.
• Uppercase letters at the end of the alphabet (W, X,
Y, Z) for terminals or nonterminals.
• Lowercase letters at the end of the alphabet (w, x,
y, z) for strings of terminals.
• Lowercase Greek letters (α, β, γ, δ) for mixed
strings (terminals and/or nonterminals)
Syntax of Programming Language
• Described by a context-free grammar (Backus-Naur Form -
BNF).
• Similar to the languages specified by regular expressions, but
more general.
• A grammar gives a precise syntactic specification of a language.
• From some classes of grammars, tools exist that can
automatically construct an efficient parser.
• These tools can also detect syntactic ambiguities and other problems
automatically.
• A compiler based on a grammatical description of a language
is more easily maintained and updated.
• Syntax analysis decides whether a program satisfies the syntactic structure of the language
• Error detection
• Error recovery
• Simplification: rules are stated over tokens
Context-free Grammars
• A context-free grammar for a language specifies the
syntactic structure of programs in that language.
• Components of a grammar:
• a finite set of tokens (obtained from the scanner);
• a set of variables representing “related” sets of strings, e.g.,
declarations, statements, expressions.
• a set of rules that show the structure of these strings.
• an indication of the “top-level” set of strings we care about.
Context-free Grammars: Definition
• Formally, a context-free grammar G is a 4-tuple G = (V, T, P,
S), where:
• V is a finite set of variables (or nonterminals). These describe sets
of “related” strings.
• T is a finite set of terminals (i.e., tokens).
• P is a finite set of productions, each of the form
A → α
where A ∈ V is a variable, and α ∈ (V ∪ T)* is a
sequence of terminals and nonterminals.
• S ∈ V is the start symbol.
Context-free Grammars: An Example
A grammar for palindromic bit-strings:
G = (V, T, P, S), where:
• V = { S, B }
• T = {0, 1}
• P = { S → B,
S → ε,
S → 0 S 0,
S → 1 S 1,
B → 0,
B → 1 }
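As a quick illustration (added here; it follows the derivation style used on a later slide), the palindrome 0110 is derived in this grammar by:

S ⇒ 0 S 0 (using S → 0 S 0)
⇒ 0 1 S 1 0 (using S → 1 S 1)
⇒ 0 1 1 0 (using S → ε)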
Context-free Grammars: Terminology
Derivations: Example
• Grammar for palindromes: G = (V, T, P, S),
• V = {S},
• T = {0, 1},
• P = { S → 0S0 | 1S1 | 0 | 1 | ε }.
• A derivation of the string 10101:
S ⇒ 1S1 (using S → 1S1)
⇒ 10S01 (using S → 0S0)
⇒ 10101 (using S → 1)
Leftmost and Rightmost Derivations
• A leftmost derivation is one where, at each step, the leftmost
nonterminal is replaced.
(analogous for rightmost derivation)
• Example: a grammar for arithmetic expressions:
E → E + E | E * E | id
• Leftmost derivation:
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
• Rightmost derivation:
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id
Summary on Syntax
Grammar rules:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Symbols:
terminals (tokens): + * ( ) id num
non-terminals: E
Summary on Syntax - Ambiguity
Grammar rules:
E → id
E → num
E → E + E
E → E * E
E → ( E )

This grammar is ambiguous: a string such as id + id * id has two distinct parse trees, one grouping it as id + (id * id) and one as (id + id) * id.
Summary on Syntax - Grammar rewriting
Ambiguous grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )

Non-ambiguous grammar:
E → E + T
E → T
T → T * F
T → F
F → id
F → ( E )
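To see how the rewritten grammar enforces precedence, here is a leftmost derivation of id + id * id (a worked example added for illustration); the only possible parse attaches * below +:

E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ id + T
⇒ id + T * F ⇒ id + F * F ⇒ id + id * F ⇒ id + id * id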
Categories of Parsers
• Top down - produce the parse tree, beginning at the root
• Order is that of a leftmost derivation - i.e. branches from a particular node are followed in left-to-
right order
• Traces or builds the parse tree in preorder - i.e. each node is visited before its branches are
followed
• For every nonterminal and token, predict the next production
• Top-down parsers are classified into two kinds:
• Recursive Descent Parsers
• LL Parsers
• Bottom up - produce the parse tree, beginning at the leaves towards the root
• Order is that of the reverse of a rightmost derivation - That is, the sentential forms of the
derivation are produced in order of last to first.
• For every potential right hand side and token decide when a production is found
Top-Down Parsers
• Determining the next sentential form is a matter of choosing the correct grammar rule
that has A as its LHS
• the leftmost derivation, using only the first token produced by A
• E.g. a sentential form, xAα, with the following A-rules
A → bB
A → cBb
A → a
the parser must choose the correct A-rule to get the next sentential form, which could be xbBα, xcBbα, or xaα
• This is the parsing decision problem for top-down parsers
• The most common top-down parsing algorithms are called LL algorithms
• first L specifies a left-to-right scan of the input
• second L specifies that a leftmost derivation is generated
• Two implementations of the algorithms are possible
• Using a recursive-descent parser: a coded version of a syntax analyzer based directly on the BNF description of the syntax of the language
• Using a parsing table to implement the BNF rules
Complexity of Parsing
• Parsers (algorithms) that work for any unambiguous
grammar are complex and inefficient
• Complexity of such algorithms is O(n³), where n is the length
of the input
• Thus, need to search for faster algorithms, though less general i.e.
generality is traded for efficiency.
• Compilers use parsers that only work for a subset of all
unambiguous grammars, but do it in linear time ( O(n),
where n is the length of the input )
Recursive-Descent Parsing
• Recursive descent parsing is a method where each non-terminal in the grammar is
associated with a procedure or function in the parsing code.
• There is a subprogram for each nonterminal in the grammar that can parse sentences
generated by that nonterminal
• These procedures recursively call each other to match the input string against the
production rules of the grammar.
• Recursive descent parsers are relatively straightforward to implement and understand,
but
• They can be inefficient for grammars with left recursion or ambiguity.
• EBNF is ideally suited for being the basis for a recursive-descent parser
• because EBNF minimizes the number of nonterminals
• A recursive-descent parser is an LL parser
Recursive-Descent Parsing (cont.)
• A grammar for simple expressions (the same rules used in the routines on the following slides):
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
<factor> → id | ( <expr> )
Recursive-Descent Parsing (cont.)
• Assume we have a lexical analyzer named lex, which puts
the next token code in nextToken
• The coding process when there is only one RHS:
• For each terminal symbol in the RHS, compare it with the next
input token;
• if they match, continue, else there is an error
• For each nonterminal symbol in the RHS, call its associated
parsing subprogram
Recursive-Descent Parsing (cont.)
/* Function expr
   Parses strings in the language generated by the rule:
   <expr> → <term> {(+ | -) <term>}
*/
void expr() {
  term();
  /* As long as the next token is + or -, call
     lex to get the next token and parse the next term */
  while (nextToken == ADD_OP ||
         nextToken == SUB_OP) {
    lex();
    term();
  }
}

• This particular routine does not detect errors
• Convention: Every parsing routine leaves the next token in nextToken
Recursive-Descent Parsing (cont.)
/* Function term
   Parses strings in the language generated by the rule:
   <term> -> <factor> {(* | /) <factor>}
*/
void term() {
  printf("Enter <term>\n");
  /* Parse the first factor */
  factor();
  /* As long as the next token is * or /, call lex to get the
     next token and parse the next factor */
  while (nextToken == MULT_OP || nextToken == DIV_OP) {
    lex();
    factor();
  }
  printf("Exit <term>\n");
} /* End of function term */
Recursive-Descent Parsing (cont.)
• A nonterminal that has more than one RHS requires an initial
process to determine which RHS it is to parse
• The correct RHS is chosen on the basis of the next token of input (the
lookahead)
• The next token is compared with the first token that can be generated by
each RHS until a match is found
• If no match is found, it is a syntax error
Recursive-Descent Parsing (cont.)
/* Function factor
   Parses strings in the language generated by the rule:
   <factor> -> id | (<expr>)
*/
void factor() {
  /* Determine which RHS */
  if (nextToken == ID_CODE)
    /* For the RHS id, just call lex */
    lex();
  /* If the RHS is (<expr>) – call lex to pass over the left parenthesis,
     call expr, and check for the right parenthesis */
  else if (nextToken == LP_CODE) {
    lex();
    expr();
    if (nextToken == RP_CODE)
      lex();
    else
      error();
  } /* End of else if (nextToken == ... */
  else
    error(); /* Neither RHS matches */
}

Trace of a parse of (sum + 47) / total:

Next token is: 25 Next lexeme is (
Enter <expr>
Enter <term>
Enter <factor>
Next token is: 11 Next lexeme is sum
Enter <expr>
Enter <term>
Enter <factor>
Next token is: 21 Next lexeme is +
Exit <factor>
Exit <term>
Next token is: 10 Next lexeme is 47
Enter <term>
Enter <factor>
Next token is: 26 Next lexeme is )
Exit <factor>
Exit <term>
Exit <expr>
Next token is: 24 Next lexeme is /
Exit <factor>
Next token is: 11 Next lexeme is total
Enter <factor>
Next token is: -1 Next lexeme is EOF
Exit <factor>
Exit <term>
Exit <expr>
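The routines above assume a lex() supplied elsewhere. The whole recognizer can be collected into one self-contained sketch (not the slides' exact code: the token-code values, the toy one-character lexer, and the parse() wrapper are invented here for illustration):

```c
#include <ctype.h>

/* Token codes: the names follow the slides (ADD_OP, LP_CODE, ...);
   the numeric values are an assumption of this sketch */
enum { ID_CODE, ADD_OP, SUB_OP, MULT_OP, DIV_OP,
       LP_CODE, RP_CODE, EOF_CODE, ERR_CODE };

static const char *src;   /* input being parsed */
static int pos;           /* current position in src */
static int nextToken;     /* convention: always holds the next token */
static int errors;        /* count of syntax errors seen */

/* Toy lexer: single-letter identifiers, one token per character */
static void lex(void) {
    while (src[pos] == ' ') pos++;
    char c = src[pos];
    if (c == '\0') { nextToken = EOF_CODE; return; }
    pos++;
    if (isalpha((unsigned char)c)) nextToken = ID_CODE;
    else switch (c) {
        case '+': nextToken = ADD_OP;  break;
        case '-': nextToken = SUB_OP;  break;
        case '*': nextToken = MULT_OP; break;
        case '/': nextToken = DIV_OP;  break;
        case '(': nextToken = LP_CODE; break;
        case ')': nextToken = RP_CODE; break;
        default:  nextToken = ERR_CODE; break;
    }
}

static void error(void) { errors++; }
static void expr(void);

/* <factor> -> id | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID_CODE)
        lex();
    else if (nextToken == LP_CODE) {
        lex();
        expr();
        if (nextToken == RP_CODE) lex(); else error();
    } else
        error();
}

/* <term> -> <factor> {(* | /) <factor>} */
static void term(void) {
    factor();
    while (nextToken == MULT_OP || nextToken == DIV_OP) { lex(); factor(); }
}

/* <expr> -> <term> {(+ | -) <term>} */
static void expr(void) {
    term();
    while (nextToken == ADD_OP || nextToken == SUB_OP) { lex(); term(); }
}

/* Returns 1 if s is a well-formed expression, 0 otherwise */
int parse(const char *s) {
    src = s; pos = 0; errors = 0;
    lex();                        /* prime nextToken */
    expr();
    if (nextToken != EOF_CODE) errors++;
    return errors == 0;
}
```

For example, parse("(a+b)*c") succeeds while parse("a+*b") fails, since factor() finds neither an id nor a left parenthesis after the *.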
Recursive-Descent Parsing - Left Recursion
Problem
• A problem in LL Grammar Class
• Left recursion: E → E + T
• Symbol on the left is also the first symbol on the right
• Predictive parsing fails when two rules can start with the same token
E → E + T
E → T
• If a grammar has left recursion, either direct or indirect, it cannot be the basis for a top-down
parser
• A grammar can be modified to remove left recursion
• For each nonterminal, A,
1. Group the A-rules as A → Aα1 | … | Aαm | β1 | β2 | … | βn
where none of the β's begins with A
2. Replace the original A-rules with
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
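As a worked instance of this transformation (applied to the left-recursive expression rules above, with m = 1, α1 = + T, and β1 = T):

E → E + T | T      becomes      E → T E'
                                E' → + T E' | ε

The result generates the same language but is no longer left recursive, so it can drive a top-down parser.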
More left recursion
• Non-terminal with two rules starting with same prefix
Recursive-Descent Parsing – lack of pairwise
disjointness
• Another problem that disallows top-down parsing is
• whether the parser can always choose the correct RHS on the basis of next
token input using only the first token generated by the leftmost nonterminal
in the current sentential form i.e. one token lookahead.
• This is referred to as lack of pairwise disjointness
• To solve this, pairwise disjointness test needs to be performed on
FIRST set.
FIRST(α) = {a | α ⇒* aβ}
(If α ⇒* ε, ε is in FIRST(α))
A → aB | BAb
B → aB | b — fails the test, since FIRST(aB) = {a} and FIRST(BAb) = {a, b} are not disjoint
Recursive-Descent Parsing (cont.)
• Left factoring can resolve the problem
Replace
<variable> → identifier | identifier [<expression>]
with
<variable> → identifier <new>
<new> → ε | [<expression>]
or
<variable> → identifier [[<expression>]]
(the outer brackets are metasymbols of EBNF)
Top-down parsing - Example
• Builds parse tree in preorder
• LL(1) example
LL Parsers
• LL parsers are a type of top-down parser used in computer science to analyze and process the structure of
strings according to a formal grammar.
• The term "LL" stands for "Left-to-right, Leftmost derivation," indicating the strategy used by these parsers to
process input.
• Here are some key characteristics of LL parsers:
• Left-to-right scanning: LL parsers scan the input string from left to right, processing symbols in the order
they appear.
• Leftmost derivation: LL parsers aim to derive the leftmost derivation of the input string. This means that
they always expand the leftmost non-terminal in the current sentential form.
• Predictive parsing: LL parsers use predictive parsing to determine which production rule to apply at each
step based on a finite lookahead. This lookahead involves examining a fixed number of input symbols to
predict the next production rule to apply.
• LL(k) grammars: LL parsers are often characterized by the maximum number of tokens they look ahead in
the input string. For example, LL(1) parsers look ahead one token to decide which production rule to apply,
while LL(k) parsers look ahead k tokens.
• Table-driven parsing: LL parsers are typically implemented using parsing tables, which store information
about which production rule to apply for each combination of non-terminal and lookahead symbol.
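As a minimal concrete illustration of table-driven predictive parsing (a sketch invented for these notes, not from the slides), consider the balanced-parentheses grammar S → ( S ) S | ε, which is LL(1). The single lookahead token selects the production; the predicted RHS is pushed onto an explicit stack in reverse order:

```c
/* LL(1) recognizer for S -> ( S ) S | epsilon.
   Parse table (one lookahead token):
     M[S, '('] = ( S ) S      M[S, ')'] = M[S, '$'] = epsilon */
int ll1_parse(const char *w) {
    char stack[256];
    int top = 0, i = 0;
    stack[top++] = '$';            /* end marker at the bottom */
    stack[top++] = 'S';            /* start symbol on top */
    while (top > 0) {
        char X = stack[--top];               /* symbol on top of stack */
        char a = w[i] ? w[i] : '$';          /* lookahead token */
        if (X == 'S') {
            if (a == '(') {                  /* apply S -> ( S ) S: push RHS reversed */
                if (top + 4 > (int)sizeof stack) return 0;
                stack[top++] = 'S';
                stack[top++] = ')';
                stack[top++] = 'S';
                stack[top++] = '(';
            }
            /* on ')' or '$': apply S -> epsilon, i.e. push nothing */
        } else {                             /* terminal (or '$'): must match input */
            if (X != a) return 0;
            if (X != '$') i++;
        }
    }
    return w[i] == '\0';           /* accept only if all input consumed */
}
```

For instance, ll1_parse("(())()") accepts, while ll1_parse(")(") is rejected at the first mismatch. A real LL(1) parser generator builds the table M from FIRST and FOLLOW sets instead of hard-coding it as here.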
Bottom Up Parsing
• Unlike top-down parsing, which starts with the root of the parse tree and works down to the leaves, bottom-
up parsing begins with the input string and builds the parse tree from the leaves up to the root.
• Here are the key characteristics of bottom-up parsing:
1. Shift-Reduce Parsing: Bottom-up parsing is often implemented using a strategy called shift-reduce parsing.
In shift-reduce parsing, the parser shifts input symbols onto a stack until it can reduce a portion of the stack
to a non-terminal symbol according to the grammar rules.
2. Reduction: Reduction involves replacing a sequence of symbols on the top of the stack with a non-terminal
symbol according to a production rule in the grammar. The parser continues reducing portions of the stack
until it reaches the start symbol of the grammar.
3. Handle: During reduction, the portion of the stack that matches the right-hand side of a production rule is
called a handle. The parser identifies handles and replaces them with the corresponding non-terminal
symbol.
4. Bottom-up Parse Tree: The result of bottom-up parsing is a parse tree rooted at the start symbol of the
grammar, with the input string as its leaves. Each internal node in the parse tree represents a non-terminal
symbol, and its children represent the symbols derived from that non-terminal.
5. Shift-Reduce Conflict and Reduce-Reduce Conflict: Bottom-up parsing may encounter shift-reduce conflicts
or reduce-reduce conflicts when deciding whether to shift a symbol onto the stack or reduce a portion of
the stack. Conflicts can arise due to ambiguity or lack of sufficient lookahead in the grammar.
Bottom-Up Parsers
• Given a right sentential form, α, determine what substring
of α is the right-hand side of the rule in the grammar that
must be reduced to produce the previous sentential form in
the rightmost derivation
• Eg.
S → aAc
A → aA | b
E → E + T | T
T → T * F | F
F → ( E ) | id (1)
Bottom-up Parsing (cont.)
• Intuition about handles (continued):
• Def: β is the handle of the right sentential form
γ = αβw if and only if S ⇒*rm αAw ⇒rm αβw
• Def: β is a simple phrase of the right sentential form γ if and only if S ⇒* γ = α1Aα2 ⇒
α1βα2
Bottom up Parsing - Shift-Reduce parsing
• Uses Pushdown Automata (PDA)
• Parser stack: symbols (terminal and non-terminals) + automaton states
• Parsing actions: sequence of shift and reduce operations
• Action determined by top of stack and k input tokens
• Shift: move next token to top of stack
• Reduce: replacing the handle on the top of the parse stack with its
corresponding LHS
• For example: for the rule X → A B C,
pop C, B, A then push X
• Convention: $ stands for end of file
• The LR family of shift-reduce parsers is the most common bottom-up
parsing approach
Shift-reduce Parsing: Example
Grammar: S → aABe
A → Abc | b
B→d
Shift-Reduce Parsing: cont’d
• Need to choose reductions carefully:
abbcde → aAbcde → aAbcBe → …
doesn't work.
• A handle of a string s is a substring α s.t.:
• α matches the RHS of a rule A → α; and
• replacing α by A (the LHS of the rule) represents a step in the
reverse of a rightmost derivation of s.
• For shift-reduce parsing, reduce only handles.
Shift-reduce Parsing: Implementation
• Data Structures:
• a stack, its bottom marked by ‘$’. Initially empty.
• the input string, its right end marked by ‘$’. Initially w.
• Actions:
repeat
1. Shift some (≥ 0) symbols from the input string onto the stack, until a
handle β appears on top of the stack.
2. Reduce β to the LHS of the appropriate production.
until ready to accept.
• Acceptance: when input is empty and stack contains only the
start symbol.
Example
Grammar:
S → aABe
A → Abc | b
B → d

Stack (→)  Input     Action
$          abbcde$   shift
$a         bbcde$    shift
$ab        bcde$     reduce: A → b
$aA        bcde$     shift
$aAb       cde$      shift
$aAbc      de$       reduce: A → Abc
$aA        de$       shift
$aAd       e$        reduce: B → d
$aAB       e$        shift
$aABe      $         reduce: S → aABe
$S         $         accept
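The trace above can be reproduced by a tiny hand-written shift-reduce recognizer. This is an illustrative sketch only: real bottom-up parsers drive these decisions from LR tables, whereas the handle-detection rules below are hard-coded for this one grammar (e.g. b is reduced to A only directly after a, so that a later Abc handle is not destroyed):

```c
#include <string.h>

/* Grammar: S -> aABe, A -> Abc | b, B -> d */

static char stk[64];   /* parse stack of grammar symbols */
static int  top;       /* number of symbols on the stack */

/* Try to reduce a handle on top of the stack; return 1 if a reduction fired */
static int reduce(void) {
    if (top >= 3 && strncmp(&stk[top-3], "Abc", 3) == 0) {
        top -= 3; stk[top++] = 'A';          /* A -> Abc */
        return 1;
    }
    if (top >= 2 && stk[top-1] == 'b' && stk[top-2] == 'a') {
        stk[top-1] = 'A';                    /* A -> b, only just after 'a' */
        return 1;
    }
    if (top >= 1 && stk[top-1] == 'd') {
        stk[top-1] = 'B';                    /* B -> d */
        return 1;
    }
    if (top == 4 && strncmp(stk, "aABe", 4) == 0) {
        top = 1; stk[0] = 'S';               /* S -> aABe */
        return 1;
    }
    return 0;
}

/* Returns 1 if input is accepted, 0 otherwise */
int sr_parse(const char *input) {
    top = 0;
    for (int i = 0; input[i] != '\0' && top < 64; i++) {
        stk[top++] = input[i];               /* shift the next token */
        while (reduce())                     /* reduce while a handle is on top */
            ;
    }
    return top == 1 && stk[0] == 'S';        /* accept: stack holds only S */
}
```

Running sr_parse("abbcde") goes through exactly the stack contents shown in the table above and accepts; strings such as "abd" leave the stack as aAB and are rejected.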
Conflicts
• Can’t decide whether to shift or to reduce
• both seem OK (“shift-reduce conflict”).
Example: S → if E then S | if E then S else S | …
Advantages of LR parsers
• They will work for nearly all grammars that describe
programming languages.
• They work on a larger class of grammars than other bottom-
up algorithms, but are as efficient as any other bottom-up
parser.
• They can detect syntax errors as early as possible.
• The LR class of grammars is a superset of the class parsable
by LL parsers.
Constructing LR Parsers
• LR parsers must be constructed with a tool
• Knuth’s insight: A bottom-up parser could use the entire
history of the parse, up to the current point, to make parsing
decisions
• There were only a finite and relatively small number of different
parse situations that could have occurred, so the history could be
stored in a parser state, on the parse stack
Constructing LR Parsers (cont.)
• An LR configuration stores the state of an LR parser:
(S0X1S1X2S2…XmSm, aiai+1…an$)
where the Si are parser states and the Xi are grammar symbols; the first component is the parse stack and the second is the remaining input.
Constructing LR Parsers (cont.)
• LR parsers are table driven, where the table has
two components
• The ACTION table specifies the action of the parser,
given the parser state and the next token
• Rows are state names; columns are terminals
• The GOTO table specifies which state to put on top
of the parse stack after a reduction action is done
• Rows are state names; columns are nonterminals
Structure of An LR Parser
Parser Actions
• Initial configuration: (S0, a1…an$)
• Parser actions:
• If ACTION[Sm, ai] = Shift S, the next configuration is:
(S0X1S1X2S2…XmSmaiS, ai+1…an$)
• If ACTION[Sm, ai] = Reduce A → β and S = GOTO[Sm-r, A], where r =
the length of β, the next configuration is
(S0X1S1X2S2…Xm-rSm-rAS, aiai+1…an$)
• If ACTION[Sm, ai] = Accept, the parse is complete and no errors
were found.
• If ACTION[Sm, ai] = Error, the parser calls an error-handling routine.
LR Parsing Table
Bottom-up Parsing (cont.)
• Grammar (1) rewritten and numbered for easy
referencing in a parsing table.
1. E→E+T
2. E→T
3. T→T*F
4. T→F
5. F→(E)
6. F → id
Bottom-up Parsing (cont.)
Stack          Input           Action
0              id + id * id $  Shift 5
0id5           + id * id $     Reduce 6 (use GOTO[0, F])
0F3            + id * id $     Reduce 4 (use GOTO[0, T])
0T2            + id * id $     Reduce 2 (use GOTO[0, E])
0E1            + id * id $     Shift 6
0E1+6          id * id $       Shift 5
0E1+6id5       * id $          Reduce 6 (use GOTO[6, F])
0E1+6F3        * id $          Reduce 4 (use GOTO[6, T])
0E1+6T9        * id $          Shift 7
0E1+6T9*7      id $            Shift 5
0E1+6T9*7id5   $               Reduce 6 (use GOTO[7, F])
0E1+6T9*7F10   $               Reduce 3 (use GOTO[6, T])
0E1+6T9        $               Reduce 1 (use GOTO[0, E])
0E1            $               Accept
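To make the table-driven algorithm concrete, here is a sketch of an LR parser for grammar (1), using the classic ACTION/GOTO tables for this grammar; the state numbering matches the trace above, while the encoding constants and function names are our own:

```c
/* Grammar (1):  1: E->E+T  2: E->T  3: T->T*F  4: T->F  5: F->(E)  6: F->id */

enum { ID, PLUS, TIMES, LP, RP, END };   /* terminal column indices */
enum { E, T, F };                        /* nonterminal column indices */

/* ACTION encoding (our convention): 0 = error,
   100+k = shift and go to state k, 200+k = reduce by rule k, 999 = accept */
#define SH(k) (100 + (k))
#define RD(k) (200 + (k))
#define ACC   999

static const int ACTION[12][6] = {
/* 0*/ {SH(5),0,    0,    SH(4),0,     0    },
/* 1*/ {0,    SH(6),0,    0,    0,     ACC  },
/* 2*/ {0,    RD(2),SH(7),0,    RD(2), RD(2)},
/* 3*/ {0,    RD(4),RD(4),0,    RD(4), RD(4)},
/* 4*/ {SH(5),0,    0,    SH(4),0,     0    },
/* 5*/ {0,    RD(6),RD(6),0,    RD(6), RD(6)},
/* 6*/ {SH(5),0,    0,    SH(4),0,     0    },
/* 7*/ {SH(5),0,    0,    SH(4),0,     0    },
/* 8*/ {0,    SH(6),0,    0,    SH(11),0    },
/* 9*/ {0,    RD(1),SH(7),0,    RD(1), RD(1)},
/*10*/ {0,    RD(3),RD(3),0,    RD(3), RD(3)},
/*11*/ {0,    RD(5),RD(5),0,    RD(5), RD(5)},
};
static const int GOTO[12][3] = {
/* 0*/ {1,2,3}, {0,0,0}, {0,0,0}, {0,0,0},
/* 4*/ {8,2,3}, {0,0,0},
/* 6*/ {0,9,3},
/* 7*/ {0,0,10},
/* 8*/ {0,0,0}, {0,0,0}, {0,0,0}, {0,0,0},
};
static const int LHS[]    = {0, E, E, T, T, F, F};  /* rule -> its LHS */
static const int RHSLEN[] = {0, 3, 1, 3, 1, 3, 1};  /* rule -> |RHS| */

/* Parse a token sequence terminated by END; return 1 if accepted */
int lr_parse(const int *toks) {
    int states[100], top = 0, i = 0;
    states[0] = 0;                          /* initial configuration (S0, a1...an$) */
    for (;;) {
        int act = ACTION[states[top]][toks[i]];
        if (act == ACC) return 1;           /* accept */
        if (act == 0)   return 0;           /* error */
        if (act >= 200) {                   /* reduce by rule r = act - 200 */
            int r = act - 200;
            top -= RHSLEN[r];               /* pop |RHS| states */
            states[top + 1] = GOTO[states[top]][LHS[r]];
            top++;
        } else {                            /* shift: push state act - 100 */
            states[++top] = act - 100;
            i++;
        }
    }
}
```

Feeding it {ID, PLUS, ID, TIMES, ID, END} replays exactly the shift/reduce sequence of the trace above; an ill-formed input such as {ID, PLUS, TIMES, ID, END} hits an empty ACTION entry and is rejected.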
YACC Syntax Analysis Tool
Introduction
• What is YACC ?
• Tool which will produce a parser for a given grammar.
• YACC (Yet Another Compiler Compiler)
• A program designed to compile an LALR(1) grammar and to
produce the source code of the syntactic analyzer of the
language generated by this grammar.
Common Tools
• ANTLR tool
• Generates LL(k) parsers
• Yacc (Yet Another Compiler Compiler)
• Generates LALR parsers
• Bison
• Improved version of Yacc
YACC File Format
%{
C declarations
%}
yacc declarations
%%
Grammar rules
%%
Additional C code
YACC
• Input specification for YACC (similar to flex)
• Three parts: Definitions, Rules, User code
• Use “%%” as a delimiter for each part
YACC Declaration Summary
`%start'
Specify the grammar's start symbol
`%union'
Declare the collection of data types that semantic values may have
`%token'
Declare a terminal symbol (token type name) with no precedence or associativity specified
`%type'
Declare the type of semantic values for a nonterminal symbol
`%right'
Declare a terminal symbol (token type name) that is
right-associative
`%left'
Declare a terminal symbol (token type name) that is left-associative
`%nonassoc'
Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be
associative is a syntax error, e.g., x op y op z is a syntax error)
Rules Section
• This section defines the grammar
• Example
expr : expr '+' term | term;
term : term '*' factor | factor;
factor : '(' expr ')' | ID | NUM;
Rules Section
• Normally written like this
• Example:
expr : expr '+' term
| term
;
term : term '*' factor
| factor
;
factor : '(' expr ')'
| ID
| NUM
;
The Position of Rules
expr : expr '+' term { $$ = $1 + $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : '(' expr ')' { $$ = $2; }
| ID
| NUM
;
In the actions, $1, $2, $3, … refer to the semantic values of the first, second, third, … symbols of the rule's RHS, and $$ is the semantic value of the LHS. For example, in expr : expr '+' term, $1 is the value of the expr on the right, $2 the value of '+', and $3 the value of term. When no action is written, the default is $$ = $1;
YACC File Example
%{
#include <stdio.h>
%}
%%
/* grammar rules go here */
%%
int main(void)
{
  yyparse();
  return 0;
}
Example 1
%{
#include <ctype.h>
%}
%token DIGIT    /* also results in the definition: #define DIGIT xxx */
%%
line   : expr '\n'        { printf("= %d\n", $1); }
       ;
expr   : expr '+' term    { $$ = $1 + $3; }
       | term             { $$ = $1; }
       ;
term   : term '*' factor  { $$ = $1 * $3; }
       | factor           { $$ = $1; }
       ;
factor : '(' expr ')'     { $$ = $2; }
       | DIGIT            { $$ = $1; }  /* attribute of the token (stored in yylval) */
       ;
%%
/* Example of a very crude lexical analyzer invoked by the parser */
int yylex()
{ int c = getchar();
  if (isdigit(c))
  { yylval = c - '0';
    return DIGIT;
  }
  return c;
}
Example 2
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double   /* double type for attributes and yylval */
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
lines : lines expr '\n'  { printf("= %g\n", $2); }
      | lines '\n'
      | /* empty */
      ;
expr  : expr '+' expr    { $$ = $1 + $3; }
      | expr '-' expr    { $$ = $1 - $3; }
      | expr '*' expr    { $$ = $1 * $3; }
      | expr '/' expr    { $$ = $1 / $3; }
      | '(' expr ')'     { $$ = $2; }
      | '-' expr %prec UMINUS { $$ = -$2; }
      | NUMBER
      ;
%%
Example 2 (cont’d)
/* Crude lexical analyzer for fp doubles and arithmetic operators */
int yylex()
{ int c;
  while ((c = getchar()) == ' ')
    ;
  if ((c == '.') || isdigit(c))
  { ungetc(c, stdin);
    scanf("%lf", &yylval);
    return NUMBER;
  }
  return c;
}
/* Run the parser */
int main()
{ if (yyparse() != 0)
    fprintf(stderr, "Abnormal exit\n");
  return 0;
}
/* Invoked by the parser to report parse errors */
int yyerror(char *s)
{ fprintf(stderr, "Error: %s\n", s);
}
How YACC Works
gram.y  (file containing the desired grammar in yacc format)
   |
   |  yacc
   v
y.tab.c  (generated parser source)
   |
   |  cc or gcc  (C compiler)
   v
executable parser
Benefits of YACC
• Faster development
• Compared to manual implementation
• Easier to change the specification and generate new
parser
• Than to modify 1000s of lines of code to add, change, delete
an existing feature
• Less error-prone, as code is generated
• Cost: Learning curve
• Invest once, amortized over a 40+ year career
Lex with Yacc
Lex source            Yacc source
(Lexical Rules)       (Grammar Rules)
     |                     |
    Lex                   Yacc
     |                     |
  lex.yy.c              y.tab.c

Input --> yylex() <-- call / return token --> yyparse() --> Parsed Input
YACC works with Lex
• yyparse() calls yylex() whenever it needs the next token: the Lex rules (e.g., the pattern [0-9]+) match the input and return the corresponding token code to the parser.
Simple example
• Implement a calculator which can recognize adding or subtracting of
numbers
[linux33]% ./y_calc
1+101
= 102
[linux33] % ./y_calc
1000-300+200+100
= 1000
[linux33] %
Example – the Lex part
%{
#include <math.h>
#include "y.tab.h"
extern int yylval;
%}
%%
   /* pattern    action */
[0-9]+  { yylval = atoi(yytext);
          return NUMBER; }
[\t ]+  ;              /* Do nothing for white space */
\n      return 0;      /* End of the logic */
.       return yytext[0];
%%
Example – the Yacc part
%token NAME NUMBER
%%
statement  : NAME '=' expression
           | expression            { printf("= %d\n", $1); }
           ;
expression : expression '+' NUMBER { $$ = $1 + $3; }
           | expression '-' NUMBER { $$ = $1 - $3; }
           | NUMBER                { $$ = $1; }
           ;
%%
(link with the Yacc library: -ly)
LEX and YACC – Another Example
scanner.l:
%{
#include <stdio.h>
#include "y.tab.h"
%}
id [_a-zA-Z][_a-zA-Z0-9]*
%%
int   { return INT; }
char  { return CHAR; }
float { return FLOAT; }
{id}  { return ID; }

parser.y:
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token CHAR, FLOAT, ID, INT
%%

Running yacc -d xxx.y also produces y.tab.h, containing:
# define CHAR 258
# define FLOAT 259
# define ID 260
# define INT 261
Lex vs. Yacc
• Lex
• Lex generates C code for a lexical analyzer, or scanner
• Lex uses patterns that match strings in the input and converts the
strings to tokens
• Yacc
• Yacc generates C code for a syntax analyzer, or parser.
• Yacc uses grammar rules that allow it to analyze tokens from Lex
and create a syntax tree.