Lex
•Lex is a program that generates lexical analyzers. It is used with the YACC parser
generator.
•It reads a specification of token patterns and produces C source code that
implements the lexical analyzer.
The function of Lex is as follows:
•First, the lexical analyzer is written as a program lex.l in the Lex language. The Lex compiler then processes
lex.l and produces a C program lex.yy.c.
•Finally, the C compiler compiles lex.yy.c and produces an object program a.out.
•a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.
Lex File Format
A Lex program is separated into three sections by %% delimiters. The format of a Lex source file is as
follows:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
Definitions include declarations of constants, variables and regular definitions.
Rules are statements of the form p1 {action1} p2 {action2} ... pn {actionn},
where pi is a regular expression and actioni describes the action the lexical
analyzer should take when pattern pi matches a lexeme.
User subroutines are auxiliary procedures needed by the actions. They can be compiled
separately and loaded with the lexical analyzer.
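The pattern/action idea behind the rules section can be sketched in Python. This is not Lex itself and not the code Lex generates; the rule table RULES and the function tokenize are hypothetical names chosen for illustration. The sketch follows the usual Lex convention of preferring the longest match and, on a tie, the rule listed first.

```python
import re

# Hypothetical rule table in the spirit of a Lex rules section:
# each entry pairs a regular expression p_i with an action_i.
# An action of None means "match and discard" (used here for whitespace).
RULES = [
    (r"[ \t\n]+", None),
    (r"[0-9]+", lambda lexeme: ("NUMBER", lexeme)),
    (r"[A-Za-z_][A-Za-z0-9_]*", lambda lexeme: ("ID", lexeme)),
    (r"[-+*/=]", lambda lexeme: ("OP", lexeme)),
]

def tokenize(text):
    """Turn text into a list of tokens using longest match, earliest rule on ties."""
    compiled = [(re.compile(pattern), action) for pattern, action in RULES]
    tokens = []
    pos = 0
    while pos < len(text):
        best_len, best_action = 0, None
        for regex, action in compiled:
            m = regex.match(text, pos)
            # Strict ">" keeps the earlier rule when two matches have equal length.
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        if best_len == 0:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        lexeme = text[pos:pos + best_len]
        if best_action is not None:
            tokens.append(best_action(lexeme))
        pos += best_len
    return tokens

print(tokenize("x = 42 + y1"))
```

Each lambda plays the role of an {action}: it runs when its pattern matches a lexeme and decides which token to emit.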
Context Free Grammar
A context-free grammar (CFG) is a formal grammar that is used to generate all possible strings in a given formal language.
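A context-free grammar G can be defined by a four-tuple:
G = (V, T, P, S)
Where,
•V is a finite set of non-terminal symbols.
•T is a finite set of terminal symbols.
•P is a set of production rules.
•S is the start symbol.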
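As a minimal sketch, the tuple G = (V, T, P, S) can be written down as plain Python data. The grammar used here is the expression grammar from the derivation examples; the helper is_well_formed is a hypothetical name and only checks the basic shape of a context-free grammar.

```python
# G = (V, T, P, S) for the grammar S = S+S, S = S-S, S = a | b | c.
V = {"S"}                               # non-terminals
T = {"a", "b", "c", "+", "-"}           # terminals
P = {"S": [["S", "+", "S"],             # production rules: head -> list of bodies
           ["S", "-", "S"],
           ["a"], ["b"], ["c"]]}
S = "S"                                 # start symbol

def is_well_formed(V, T, P, S):
    """Check that V and T are disjoint, S is in V, every production head is a
    single non-terminal, and every body uses only known symbols."""
    if V & T or S not in V:
        return False
    for head, bodies in P.items():
        if head not in V:
            return False
        for body in bodies:
            if any(sym not in (V | T) for sym in body):
                return False
    return True

print(is_well_formed(V, T, P, S))
```

Having a single non-terminal as the head of every production is what makes the grammar context free.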
Derivation
A derivation is a sequence of production-rule applications that produces the input string from the start symbol. During parsing
we have to make two decisions. These are as follows:
•We have to decide the non-terminal which is to be replaced.
•We have to decide the production rule by which the non-terminal will be replaced.
There are two ways to choose which non-terminal to replace: left-most derivation and right-most derivation.
Left-most Derivation
In the left-most derivation, the sentential form is scanned from left to right and, at each step, the left-most
non-terminal is replaced using a production rule.
Example:
Production rules:
S = S + S
S = S - S
S = a | b | c
Input:
a-b+c
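The left-most derivation of a-b+c under these rules can be traced with a short Python sketch. The function name leftmost_derivation and the choice of which production to apply at each step are assumptions made for illustration; a real parser has to make that choice itself.

```python
NONTERMINALS = {"S"}

def leftmost_derivation(start, productions):
    """Apply each (head, body) production to the left-most non-terminal
    and return every sentential form along the way."""
    form = [start]
    steps = ["".join(form)]
    for head, body in productions:
        # Index of the left-most non-terminal in the current sentential form.
        i = min(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != head:
            raise ValueError(f"left-most non-terminal is {form[i]}, not {head}")
        form[i:i + 1] = list(body)
        steps.append("".join(form))
    return steps

steps = leftmost_derivation("S", [
    ("S", "S+S"),
    ("S", "S-S"),
    ("S", "a"),
    ("S", "b"),
    ("S", "c"),
])
print(" => ".join(steps))
# S => S+S => S-S+S => a-S+S => a-b+S => a-b+c
```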
Right-most Derivation
In the right-most derivation, the sentential form is scanned from right to left and, at each step, the right-most
non-terminal is replaced using a production rule.
Example:
S = S + S
S = S - S
S = a | b | c
Input:
a-b+c
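A matching sketch for the right-most derivation of a-b+c is given here. As before, rightmost_derivation is a hypothetical helper and the sequence of productions is supplied by hand.

```python
NONTERMINALS = {"S"}

def rightmost_derivation(start, productions):
    """Apply each (head, body) production to the right-most non-terminal
    and return every sentential form along the way."""
    form = [start]
    steps = ["".join(form)]
    for head, body in productions:
        # Index of the right-most non-terminal in the current sentential form.
        i = max(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != head:
            raise ValueError(f"right-most non-terminal is {form[i]}, not {head}")
        form[i:i + 1] = list(body)
        steps.append("".join(form))
    return steps

steps = rightmost_derivation("S", [
    ("S", "S-S"),
    ("S", "S+S"),
    ("S", "c"),
    ("S", "b"),
    ("S", "a"),
])
print(" => ".join(steps))
# S => S-S => S-S+S => S-S+c => S-b+c => a-b+c
```

Both derivation orders end in the same string; they differ only in the order in which the non-terminals are expanded.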
Parse Tree
•A parse tree is a graphical representation of a derivation. Each node is labeled with a symbol, which can be a terminal or a non-terminal.
•In parsing, the string is derived from the start symbol, so the root of the parse tree is the start symbol.
•A parse tree reflects the precedence of operators. The deepest sub-tree is evaluated first, so the operator in a parent node has
lower precedence than the operator in its sub-tree.
The parse tree follows these points:
•All leaf nodes have to be terminals.
•All interior nodes have to be non-terminals.
•In-order traversal gives the original input string.
Example:
Production rules:
T = T + T | T * T
T = a | b | c
Input:
a*b+c
Parse Tree (figure: step-by-step construction of the tree for a*b+c)
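One possible parse tree for a*b+c can be sketched as nested Python tuples. The representation (label, children) and the helpers frontier and depth_of are assumptions made for illustration. In this tree the * sub-tree sits below the + node, which matches the precedence point above.

```python
# Interior nodes are (non-terminal, children); leaves are terminal strings.
tree = ("T", [
    ("T", [("T", ["a"]), "*", ("T", ["b"])]),
    "+",
    ("T", ["c"]),
])

def frontier(node):
    """Read the leaves from left to right."""
    if isinstance(node, str):          # leaf: a terminal
        return node
    _, children = node
    return "".join(frontier(child) for child in children)

def depth_of(node, terminal, depth=0):
    """Depth of the first leaf equal to `terminal`, or None if absent."""
    if isinstance(node, str):
        return depth if node == terminal else None
    _, children = node
    for child in children:
        found = depth_of(child, terminal, depth + 1)
        if found is not None:
            return found
    return None

print(frontier(tree))                               # a*b+c
print(depth_of(tree, "*"), depth_of(tree, "+"))     # 2 1
```

Reading the leaves left to right recovers the input string, and the deeper operator (*) is the one that is applied first.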
Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost derivation,
more than one rightmost derivation, or more than one parse tree for some input
string. If the grammar is not ambiguous then it is called unambiguous.
Example:
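S = aSb | SS
S = ε
For the string aabb, the above grammar generates at least two parse trees. One applies S = aSb twice and then S = ε.
Another starts with S = SS, derives aabb from the first S as before, and derives ε from the second S.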
An ambiguous grammar is not suitable for compiler construction. No general method can automatically detect
and remove ambiguity, but a particular ambiguous grammar can often be made unambiguous by rewriting it.
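The ambiguity of the grammar S = aSb | SS, S = ε can be checked on aabb with a small Python sketch. The tuple representation of trees and the helpers is_valid and frontier are hypothetical. The sketch only verifies two hand-built trees; it does not search for all parse trees.

```python
# Production bodies for S; the empty tuple stands for ε.
PRODUCTIONS = {"S": {("a", "S", "b"), ("S", "S"), ()}}

def frontier(node):
    """Read the leaves from left to right (an ε node contributes nothing)."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(frontier(child) for child in children)

def is_valid(node):
    """Check that every interior node expands by one of the productions."""
    if isinstance(node, str):
        return True
    label, children = node
    body = tuple(c if isinstance(c, str) else c[0] for c in children)
    return body in PRODUCTIONS.get(label, set()) and all(is_valid(c) for c in children)

# Tree 1: S = aSb, then S = aSb, then S = ε.
tree1 = ("S", ["a", ("S", ["a", ("S", []), "b"]), "b"])
# Tree 2: S = SS, where the first S derives aabb and the second derives ε.
tree2 = ("S", [tree1, ("S", [])])

print(frontier(tree1), frontier(tree2), tree1 != tree2)
# aabb aabb True
```

Two different valid trees with the same frontier are enough to show that the grammar is ambiguous.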