Chap 04
Programming Languages Lexical and Syntax Analysis
CMSC 4023 Chapter 4
Consider the following example of an assignment statement together with C++ declarations
#define ID 1
#define INTLIT 2
#define ASSIGN 3
#define MINUS 4
#define SLASH 5
#define SEMICOLON 6
Figure 4.3 C++ constant definitions that support tokens
• Lexical analyzers (scanners) extract lexemes from a given input string and produce the corresponding tokens.
• Lexical analyzers skip comments and blanks.
• There are three approaches to building a lexical analyzer:
1. Write a formal description of the token patterns of the language using a
descriptive language related to regular expressions. These descriptions are
used as input to a software tool that automatically generates a lexical analyzer.
The oldest and most accessible of these, named lex, is commonly included as
part of UNIX systems.
2. Design a state transition diagram that describes the token patterns of the
language and write a program that implements the diagram.
3. Design a state transition diagram that describes the token patterns of the
language and hand-construct a table-driven implementation of the state
diagram.
• A state diagram is a directed graph. The nodes of a state diagram are labeled with state
names. The edges are labeled with the input characters that cause the transitions among
the states.
• State diagrams of this kind represent a class of mathematical machines called finite automata (finite state machines).
• Regular expressions, which describe the class of languages called regular languages, can be
translated to finite automata.
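The second approach above (a state diagram implemented directly as a program) can be sketched as follows. This is a minimal illustration, not the scanner used in the course: the token codes mirror Figure 4.3, but they are written as an enum rather than #define constants, and the names Token, scan, and UNKNOWN are assumptions introduced here. The sketch skips blanks only; comment skipping is left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token codes mirroring Figure 4.3. UNKNOWN is an assumption added for this sketch.
enum TokenCode { UNKNOWN = 0, ID = 1, INTLIT = 2, ASSIGN = 3,
                 MINUS = 4, SLASH = 5, SEMICOLON = 6 };

// A token code paired with the lexeme it was recognized from (hypothetical type).
struct Token {
    int code;
    std::string text;
};

// Hypothetical helper: scans the whole input and returns its tokens in order.
std::vector<Token> scan(const std::string& in) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (std::isspace(c)) {              // skip blanks
            ++i;
        } else if (std::isalpha(c)) {       // state: inside an identifier
            std::size_t j = i;
            while (j < in.size() && std::isalnum(static_cast<unsigned char>(in[j]))) ++j;
            out.push_back({ID, in.substr(i, j - i)});
            i = j;
        } else if (std::isdigit(c)) {       // state: inside an integer literal
            std::size_t j = i;
            while (j < in.size() && std::isdigit(static_cast<unsigned char>(in[j]))) ++j;
            out.push_back({INTLIT, in.substr(i, j - i)});
            i = j;
        } else {                            // single-character tokens
            int code = UNKNOWN;
            switch (c) {
                case '=': code = ASSIGN;    break;
                case '-': code = MINUS;     break;
                case '/': code = SLASH;     break;
                case ';': code = SEMICOLON; break;
            }
            out.push_back({code, std::string(1, in[i])});
            ++i;
        }
    }
    return out;
}
```

Calling scan on a statement such as "sum = total - 47 / n;" yields one Token per lexeme, each carrying its code and its text. Each branch of the if chain plays the role of one state of the diagram.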
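[State diagram: states 0, 1, 2, 3; start state 0; accepting state 3. Edges, as given by the transition table below:
0 -a-> 1, 0 -b-> 0, 1 -a-> 1, 1 -b-> 2, 2 -a-> 1, 2 -b-> 3, 3 -a-> 1, 3 -b-> 0]
Figure 1. DFA accepting (a|b)*abb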
State   a   b
0       1   0
1       1   2
2       1   3
3       1   0
1. S = {0,1,2,3}
2. Table 1 shows the transition function move for the DFA of Figure 1.
3. Σ = {a, b}
4. s0 = 0
5. F = {3}
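The five components listed above translate almost directly into a table-driven recognizer. The sketch below is one possible encoding, under assumptions: the names kMove, kStart, kAccept, and accepts are introduced here, and rejecting characters outside Σ is a design choice the notes do not specify.

```cpp
#include <string>

// Transition function move from Table 1: rows are states 0..3,
// column 0 is input a, column 1 is input b.
constexpr int kMove[4][2] = {
    {1, 0},   // state 0
    {1, 2},   // state 1
    {1, 3},   // state 2
    {1, 0},   // state 3
};
constexpr int kStart = 0;    // s0
constexpr int kAccept = 3;   // F = {3}

// Returns true when the DFA is in the accepting state after consuming
// the whole input. Characters outside {a, b} cause rejection (assumed).
bool accepts(const std::string& input) {
    int state = kStart;
    for (char c : input) {
        if (c != 'a' && c != 'b') return false;
        state = kMove[state][c == 'a' ? 0 : 1];
    }
    return state == kAccept;
}
```

For example, accepts("babb") follows the path 0, 0, 1, 2, 3 and returns true, while accepts("ab") stops in state 2 and returns false.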
• In the left sentential form 𝑥𝐴𝛼, a top-down parser must select one of the rules having 𝐴 on
the left-hand side. Given the following 𝐴-rules,
𝐴 → 𝑏𝐵
𝐴 → 𝑐𝐵𝑏
𝐴 → 𝑎
A top-down parser must use one of the foregoing rules to transform the left sentential
form 𝑥𝐴𝛼 to
𝑥𝑏𝐵𝛼
𝑥𝑐𝐵𝑏𝛼
𝑥𝑎𝛼
• Recursive descent is the most common method of implementing a top-down
parser.
• A recursive-descent parser is coded directly from the BNF grammar and has one
function or procedure for each nonterminal symbol.
• Recursive-descent parsers employ LL algorithms. The first L is for a Left to right scan
of the input. The second L is for a Leftmost derivation.
4.3.3. Bottom-Up Parsers
• A bottom-up parser constructs a parse tree by beginning at the leaves and progressing
toward the root.
• Given a right sentential form 𝛼, the parser must determine which substring of 𝛼 is the
RHS (right-hand side) of the rule in the grammar that must be reduced to its LHS
(left-hand side) to produce the previous sentential form in the rightmost derivation.
This substring is called the handle.
• Consider the following grammar and derivation
LHS RHS
S → aAc
A → aA
A → b
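As a worked illustration, the grammar above generates the sentence aabc (the sentence is chosen here for the example). Its rightmost derivation is:

```latex
S \;\Rightarrow_{rm}\; aAc \;\Rightarrow_{rm}\; aaAc \;\Rightarrow_{rm}\; aabc
```

A bottom-up parser discovers these steps in reverse. In aabc it reduces b to A by rule A → b, giving aaAc. In aaAc it reduces aA to A by rule A → aA, giving aAc. In aAc it reduces the whole string to S by rule S → aAc. Note that in aAc the substring aA also matches the RHS of A → aA, but reducing it would give Ac, which is not a sentential form of this grammar. Matching a RHS is therefore not enough to identify the substring that must be reduced.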
//-------------------------------------------------------------
//factor -> ID | INTLIT | ( expr )
//-------------------------------------------------------------
void factor(bool get)
{   ParsePrint("Enter factor");
    switch (Token()) {
    case ID:
    case INTLIT:
        Lex(); LexPrint(*o);                 // consume the identifier or literal
        break;
    case LPAREN:
        Lex(); LexPrint(*o);                 // consume '('
        expr(false);
        Expected[0]=RPAREN;
        if (Token()!=RPAREN) throw ParseException(Expected,1,Token());
        Lex(); LexPrint(*o);                 // consume ')'
        break;
    default:                                 // no factor can start with this token
        Expected[0]=ID; Expected[1]=INTLIT; Expected[2]=LPAREN;
        ParsePrint("Exit factor");
        throw ParseException(Expected,3,Token());
    }
    ParsePrint("Exit factor");
}
Function factor
//-------------------------------------------------------------
// term -> factor { (*|/) factor }
//-------------------------------------------------------------
void term(bool get)
{   ParsePrint("Enter term");
    factor(get);
    while (Token()==MUL_OP||Token()==DIV_OP) {
        Lex(); LexPrint(*o);                 // consume '*' or '/'
        factor(false);
    }
    ParsePrint("Exit term");
}
Function term
//-------------------------------------------------------------
// expr -> term { (+|-) term }
//-------------------------------------------------------------
void expr(bool get)
{   ParsePrint("Enter expr");
    term(get);
    while (Token()==ADD_OP||Token()==DIF_OP) {
        Lex(); LexPrint(*o);                 // consume '+' or '-'
        term(false);
    }
    ParsePrint("Exit expr");
}
Function expr
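The three functions above depend on Token(), Lex(), LexPrint(), and ParseException, which are defined elsewhere. To show the same one-function-per-nonterminal structure in a form that compiles on its own, here is a minimal recognizer for the same grammar. It is a sketch under assumptions: the class name Recognizer and its members are introduced here, scanning is done character by character instead of through a separate lexer, and errors are reported by returning false rather than through ParseException.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Recognizer (hypothetical name) for:
//   expr   -> term   { (+|-) term }
//   term   -> factor { (*|/) factor }
//   factor -> ID | INTLIT | ( expr )
class Recognizer {
public:
    explicit Recognizer(std::string text) : text_(std::move(text)) {}

    // Returns true when the entire input is one valid expression.
    bool parse() {
        try {
            pos_ = 0;
            expr();
            return peek() == '\0';          // nothing may follow the expression
        } catch (const std::runtime_error&) {
            return false;
        }
    }

private:
    static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // Skips blanks and returns the next character; '\0' marks end of input.
    char peek() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }

    void expr() {
        term();
        while (peek() == '+' || peek() == '-') { ++pos_; term(); }
    }

    void term() {
        factor();
        while (peek() == '*' || peek() == '/') { ++pos_; factor(); }
    }

    void factor() {
        char c = peek();
        if (isAlpha(c)) {                    // ID
            while (pos_ < text_.size() && (isAlpha(text_[pos_]) || isDigit(text_[pos_]))) ++pos_;
        } else if (isDigit(c)) {             // INTLIT
            while (pos_ < text_.size() && isDigit(text_[pos_])) ++pos_;
        } else if (c == '(') {
            ++pos_;                          // consume '('
            expr();
            if (peek() != ')') throw std::runtime_error("expected )");
            ++pos_;                          // consume ')'
        } else {
            throw std::runtime_error("expected ID, INTLIT, or (");
        }
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

With this sketch, Recognizer("(sum+47)/total").parse() returns true, and Recognizer("(sum+47").parse() returns false because the closing parenthesis is missing.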
Id LHS RHS
1 expression → term
2 expression → expression + term
3 expression → expression - term
4 term → factor
5 term → term * factor
6 term → term / factor
7 factor → ( expression )
8 factor → id
9 factor → intlit
expression
    term
        term
            factor
                (
                expression
                    expression
                        term
                            factor
                                id(sum)
                    +
                    term
                        factor
                            intlit(47)
                )
        /
        factor
            id(total)
Parse tree of (sum+47)/total
Id LHS RHS
1 E → TE’
2 E’ → +TE’
3 E’ → -TE’
4 E’ → 𝝐
5 T → FT’
6 T’ → *FT’
7 T’ → /FT’
8 T’ → 𝝐
9 F → (E)
10 F → id
11 F → intlit
E
    T
        F
            (
            E
                T
                    F
                        id(sum)
                    T’
                        𝝐
                E’
                    +
                    T
                        F
                            intlit(47)
                        T’
                            𝝐
                    E’
                        𝝐
            )
        T’
            /
            F
                id(total)
            T’
                𝝐
    E’
        𝝐
Parse tree of (sum+47)/total using the grammar above
• Define FIRST(α), where α is any string of grammar symbols, to be the set of terminals
that begin strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
To compute 𝐹𝐼𝑅𝑆𝑇(𝑋)
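The computation can be sketched as a fixed-point iteration. The rules are not reproduced in these notes, so the sketch assumes the usual ones: FIRST of a terminal is the terminal itself; for a production X → Y1 Y2 ... Yk, add the non-ε members of FIRST(Y1), then of FIRST(Y2) if Y1 can derive ε, and so on; add ε if every Yi can derive ε (including the case k = 0). The names Production, computeFirst, expressionGrammar, and the marker string "eps" for ε are assumptions introduced for the example.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Symbol = std::string;

// One grammar rule; an empty rhs stands for an epsilon production.
struct Production {
    Symbol lhs;
    std::vector<Symbol> rhs;
};

const Symbol kEpsilon = "eps";   // hypothetical marker for the empty string

// Computes FIRST for every grammar symbol by iterating until nothing changes.
std::map<Symbol, std::set<Symbol>> computeFirst(const std::vector<Production>& g) {
    std::set<Symbol> nonterminals;
    for (const auto& p : g) nonterminals.insert(p.lhs);

    std::map<Symbol, std::set<Symbol>> first;
    for (const auto& p : g)                       // FIRST(terminal) = {terminal}
        for (const auto& s : p.rhs)
            if (!nonterminals.count(s)) first[s].insert(s);

    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& p : g) {
            std::set<Symbol>& target = first[p.lhs];
            const std::size_t before = target.size();
            bool allNullable = true;
            for (const auto& s : p.rhs) {
                const std::set<Symbol> fs = first[s];   // copy: s may equal p.lhs
                for (const auto& t : fs)
                    if (t != kEpsilon) target.insert(t);
                if (!fs.count(kEpsilon)) { allNullable = false; break; }
            }
            if (allNullable) target.insert(kEpsilon);
            if (target.size() != before) changed = true;
        }
    }
    return first;
}

// The eleven productions of the grammar with E' and T' shown earlier.
std::vector<Production> expressionGrammar() {
    return {
        {"E",  {"T", "E'"}},
        {"E'", {"+", "T", "E'"}},
        {"E'", {"-", "T", "E'"}},
        {"E'", {}},
        {"T",  {"F", "T'"}},
        {"T'", {"*", "F", "T'"}},
        {"T'", {"/", "F", "T'"}},
        {"T'", {}},
        {"F",  {"(", "E", ")"}},
        {"F",  {"id"}},
        {"F",  {"intlit"}},
    };
}
```

For this grammar the sketch gives FIRST(E) = FIRST(T) = FIRST(F) = { (, id, intlit }, FIRST(E') = { +, -, ε }, and FIRST(T') = { *, /, ε }.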
1. The ACTION function takes as arguments a state i and a terminal a (or $, the input
endmarker). The value of ACTION[i,a] can have one of four forms:
1.1. Shift j where j is a state. The action taken by the parser effectively shifts input a
to the stack, but uses state j to represent a.
1.2. Reduce 𝐴 → 𝛽. The action of the parser effectively reduces 𝛽 on the top of the
stack to head 𝐴.
1.3. Accept. The parser accepts the input and finishes parsing.
1.4. Error. The parser discovers an error in its input and takes some corrective
action.
2. We extend the GOTO function, defined on sets of items, to states: if GOTO[I_i, A] = I_j,
then GOTO also maps a state i and a nonterminal A to state j.
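[Figure: the input buffer a1 ... ai ... an $ is read by the LR parsing program, which keeps a stack
of states (sm on top, then sm-1, down to $ at the bottom), consults the ACTION and GOTO tables,
and produces the output.]
Figure 1. LR Parser Model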
left side right side
1 E → E+T
2 E → T
3 T → T*F
4 T → F
5 F → (E)
6 F → id
Table 1. Productions of the expression grammar
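The shift, reduce, accept, and error actions described above can be sketched as a small driver loop over the six productions. The ACTION and GOTO entries are not given in these notes; the tables below are an SLR(1) table constructed for these productions, and the state numbering, the integer encoding of actions, and the names lrParse, kAction, kGoto, and kRules are all assumptions of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Terminal columns of ACTION and nonterminal columns of GOTO.
enum Terminal { T_ID, T_PLUS, T_STAR, T_LPAREN, T_RPAREN, T_END, kNumTerminals };
enum Nonterminal { N_E, N_T, N_F, kNumNonterminals };

// Encoding (assumed): n > 0 shift to state n, n < 0 reduce by production -n,
// ACC accept, 0 error.
constexpr int ACC = 99;
constexpr int kAction[12][kNumTerminals] = {
    //  id    +    *    (    )    $
    {    5,   0,   0,   4,   0,   0 },   // state 0
    {    0,   6,   0,   0,   0, ACC },   // state 1
    {    0,  -2,   7,   0,  -2,  -2 },   // state 2
    {    0,  -4,  -4,   0,  -4,  -4 },   // state 3
    {    5,   0,   0,   4,   0,   0 },   // state 4
    {    0,  -6,  -6,   0,  -6,  -6 },   // state 5
    {    5,   0,   0,   4,   0,   0 },   // state 6
    {    5,   0,   0,   4,   0,   0 },   // state 7
    {    0,   6,   0,   0,  11,   0 },   // state 8
    {    0,  -1,   7,   0,  -1,  -1 },   // state 9
    {    0,  -3,  -3,   0,  -3,  -3 },   // state 10
    {    0,  -5,  -5,   0,  -5,  -5 },   // state 11
};
constexpr int kGoto[12][kNumNonterminals] = {
    //  E  T  F
    { 1, 2, 3 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 },
    { 8, 2, 3 }, { 0, 0, 0 }, { 0, 9, 3 }, { 0, 0, 10 },
    { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 },
};

// Head and right-side length of each production in Table 1 (index 0 unused).
struct Rule { Nonterminal lhs; int rhsLength; };
constexpr Rule kRules[7] = {
    { N_E, 0 }, { N_E, 3 }, { N_E, 1 }, { N_T, 3 }, { N_T, 1 }, { N_F, 3 }, { N_F, 1 },
};

// Parses the input (given without the endmarker) and records the production
// numbers in the order they are reduced. Returns false on a syntax error.
bool lrParse(const std::vector<Terminal>& input, std::vector<int>& reductions) {
    std::vector<int> stack{0};                 // stack of states, start state 0
    std::size_t i = 0;
    reductions.clear();
    while (true) {
        const Terminal a = i < input.size() ? input[i] : T_END;
        const int act = kAction[stack.back()][a];
        if (act == ACC) return true;           // accept
        if (act == 0) return false;            // error
        if (act > 0) {                         // shift: push state, advance input
            stack.push_back(act);
            ++i;
        } else {                               // reduce by production -act
            const Rule& r = kRules[-act];
            stack.resize(stack.size() - r.rhsLength);
            stack.push_back(kGoto[stack.back()][r.lhs]);
            reductions.push_back(-act);
        }
    }
}
```

For the token sequence id * id + id, the recorded reductions are 6, 4, 6, 3, 2, 6, 4, 1, which is the rightmost derivation of the input read in reverse.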
(s_0 s_1 ... s_m, a_i a_{i+1} ... a_n $)
X_1 X_2 ... X_m a_i a_{i+1} ... a_n