Unit 1 Lexical Analyzer
Sem VI
Applications of Compiler Technology & Tools
Other Applications
• In addition to the development of a compiler, the techniques used in
compiler design are applicable to many other problems in computer
science.
– Techniques used in a lexical analyzer can be used in text editors, information
retrieval systems, and pattern-recognition programs.
– A symbolic equation solver, which takes an equation as input, must parse the
given input equation.
– Most of the techniques used in compiler design can be used in Natural Language
Processing (NLP) systems.
– Both perform similar tasks, but the lexical analyzer deals with the simple,
non-recursive constructs of the language.
– The syntax analyzer deals with the recursive constructs of the language.
– The lexical analyzer recognizes the smallest meaningful units (tokens) in a source
program.
– The syntax analyzer works on those tokens to recognize meaningful structures in
the programming language.
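As a rough illustration of these "smallest meaningful units", a hand-written scanner might classify the characters of a = b + c; as below. The token names and the helper function are assumptions for illustration, not part of the course material:

```c
#include <ctype.h>

/* illustrative token kinds */
typedef enum { TOK_ID, TOK_ASSIGN, TOK_PLUS, TOK_SEMI, TOK_EOF } TokKind;

/* return the next token kind and advance *p past it */
TokKind next_token(const char **p) {
    while (**p == ' ') (*p)++;            /* skip whitespace */
    if (**p == '\0') return TOK_EOF;
    if (isalpha((unsigned char)**p)) {    /* identifier: one or more letters */
        while (isalpha((unsigned char)**p)) (*p)++;
        return TOK_ID;
    }
    switch (*(*p)++) {
        case '=': return TOK_ASSIGN;
        case '+': return TOK_PLUS;
        case ';': return TOK_SEMI;
        default:  return TOK_EOF;         /* unknown character */
    }
}
```

For the input a = b + c; this yields the token stream ID, ASSIGN, ID, PLUS, ID, SEMI; the syntax analyzer would then work on that stream, not on individual characters.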
Prof. Reshma Pise
Parsing Techniques
• Depending on how the parse tree is created, there are different parsing
techniques.
• These parsing techniques are categorized into two groups:
– Top-Down Parsing,
– Bottom-Up Parsing
• Top-Down Parsing:
– Construction of the parse tree starts at the root, and proceeds towards the leaves.
– Efficient top-down parsers can be easily constructed by hand.
– Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
• Bottom-Up Parsing:
– Construction of the parse tree starts at the leaves, and proceeds towards the root.
– Normally efficient bottom-up parsers are created with the help of some software
tools.
– Bottom-up parsing is also known as shift-reduce parsing.
– Operator-Precedence Parsing – simple and easy to implement, but restrictive
– LR Parsing – a much more general form of shift-reduce parsing: LR, SLR, LALR
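As a sketch of how a recursive predictive (top-down) parser is constructed by hand, the fragment below implements one C function per nonterminal for a toy expression grammar. The grammar and all names are illustrative, not from the course material; the parser evaluates as it parses only to make the result observable:

```c
/* Toy grammar (illustrative):
       E -> T { '+' T }
       T -> F { '*' F }
       F -> digit
   Each nonterminal becomes one C function; operator precedence
   falls out of the grammar's shape ('*' binds inside T). */
#include <ctype.h>

static const char *p;                  /* current input position */

static int factor(void) {              /* F -> digit */
    return isdigit((unsigned char)*p) ? *p++ - '0' : 0;
}

static int term(void) {                /* T -> F { '*' F } */
    int v = factor();
    while (*p == '*') { p++; v *= factor(); }
    return v;
}

static int expr(void) {                /* E -> T { '+' T } */
    int v = term();
    while (*p == '+') { p++; v += term(); }
    return v;
}

int parse(const char *s) { p = s; return expr(); }
```

For example, parse("2+3*4") multiplies inside term() before expr() adds, giving 14 without any explicit precedence table.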
3. Semantic Analyzer
• A semantic analyzer checks the source program for semantic errors and
collects type information for code generation.
• Type checking is an important part of the semantic analyzer.
• Scope checking is another of its tasks.
• The context-free grammars used in syntax analysis are augmented with
attributes (semantic rules)
– the result is a syntax-directed translation,
– Attribute grammars
• Ex: newval := oldval + 12
– The type of the identifier newval must match the type of the expression
(oldval + 12).
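The newval/oldval check above could be sketched as below. The enum and helper names are assumptions for illustration, not an actual compiler API:

```c
/* Minimal type-checking sketch (illustrative names) */
typedef enum { T_INT, T_FLOAT } Type;

/* result type of a binary arithmetic expression:
   promote to float when either operand is float */
Type binop_type(Type l, Type r) {
    return (l == T_FLOAT || r == T_FLOAT) ? T_FLOAT : T_INT;
}

/* an assignment is well-typed when the target's declared type
   matches the type of the right-hand-side expression */
int assign_ok(Type target, Type rhs) {
    return target == rhs;
}
```

If oldval and the literal 12 are both int, the expression (oldval + 12) has type int, so newval must also be declared int for the assignment to type-check.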
4. Intermediate Code Generation
• A compiler may produce explicit intermediate code representing
the source program.
• This intermediate code is generally machine- (architecture-)
independent, but its level is closer to that of machine code.
• Ex:
newval := oldval * fact + 1
temp1 := id2 * id3
• Code-optimization transformations:
id1 := temp1 * 1 => id1 = temp1 (multiplication by 1 eliminated)
a = a * 2 => a = a + a (strength reduction)
• Loop-invariant code motion: a = b + c does not change inside the loop,
so it can be computed once before the loop:
for (i = 1; i < 100; i++)
{ a = b + c; /* loop invariant */
z++;
x = y + z;
}
• Code generation produces target machine code, for example:
MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
or, in another instruction style:
MOV R1, id3
MOV R2, 60
MUL R1, R2
ADD R1, id2
Prof. Reshma Pise
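The two algebraic transformations shown here (x * 1 eliminated, a * 2 turned into a + a) can be sketched on a tiny three-address representation. The struct and field names are illustrative assumptions, not the course's data structures:

```c
/* Sketch: algebraic simplification of a single three-address quad */
#include <string.h>

typedef struct {
    char op;                /* '*', '+', or '=' for a plain copy      */
    const char *a1, *a2;    /* operand names; constants as strings    */
    const char *res;        /* result name                            */
} Quad;

/* x := y * 1  =>  x := y       (useless multiplication eliminated)
   a := a * 2  =>  a := a + a   (strength reduction)                 */
void simplify(Quad *q) {
    if (q->op == '*' && strcmp(q->a2, "1") == 0) {
        q->op = '=';                      /* becomes a copy */
        q->a2 = 0;
    } else if (q->op == '*' && strcmp(q->a2, "2") == 0
               && strcmp(q->a1, q->res) == 0) {
        q->op = '+';                      /* a = a + a */
        q->a2 = q->a1;
    }
}
```

A real optimizer applies such rewrites over whole sequences of quads, but the per-instruction pattern match is the same idea.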
The Structure of a Compiler

Tokens → Parser [Syntax Analyzer] → Parse tree → Semantic Process
[Semantic Analyzer] → Intermediate Code → Code Optimizer → Optimized
Intermediate Code → Code Generator → Target machine code
main ()
{
  int i, sum;
  float f;
  sum = 0;
  for (i = 1; i <= 10; i++)
    sum = sum + i;
  printf("%d\n", sum);
}
• S -> D SL
• D -> Type idlist ;
• SL -> A | IF | While
• Type -> int | float ….
• …..
• OR
• S -> D SL
• D -> Type idlist ; D
• D -> Type idlist ;
• SL -> S | A | IF | While
• Type -> int | float
Approaches to implementation
. Use assembly language - most efficient, but most difficult to implement
. Use a high-level language like C - efficient, but difficult to implement
. Use tools like lex, flex - easy to implement, but not as efficient as the first
two cases
Lexical Analyzer in Perspective
• LEXICAL ANALYZER
– Scan Input
– Remove WS, NL, …
– Identify Tokens
– Create Symbol Table
– Insert Tokens into ST
– Generate Errors
– Send Tokens to Parser
• PARSER
– Perform Syntax Analysis
– Actions Dictated by Token Order
– Update Symbol Table Entries
– Create Abstract Rep. of Source
– Generate Errors
– And More…. (We’ll see later)
NFA simulation:

S ← ε-closure({s0})
c ← nextchar;
while c ≠ eof do
    S ← ε-closure(move(S, c));
    c ← nextchar;
end;
if S ∩ F ≠ ∅ then return “yes”
else return “no”
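The NFA-simulation pseudocode can be turned into a small C sketch. Here a state set is a bitmask, and the transitions encode the combined NFA for the three patterns P1 = a, P2 = abb, P3 = a*b+ used below (state numbering follows the slides; the bitmask encoding itself is an illustrative assumption):

```c
/* States 0..8 of the combined NFA, one bit per state. */
typedef unsigned short Set;

/* epsilon-closure: the only ε-moves go from start state 0 to 1, 3 and 7 */
static Set eclosure(Set s) {
    if (s & (1u << 0)) s |= (1u << 1) | (1u << 3) | (1u << 7);
    return s;
}

static Set move(Set s, char c) {
    Set t = 0;
    if (c == 'a') {
        if (s & (1u << 1)) t |= 1u << 2;   /* P1: 1 --a--> 2 */
        if (s & (1u << 3)) t |= 1u << 4;   /* P2: 3 --a--> 4 */
        if (s & (1u << 7)) t |= 1u << 7;   /* P3: 7 --a--> 7 */
    } else if (c == 'b') {
        if (s & (1u << 4)) t |= 1u << 5;   /* P2: 4 --b--> 5 */
        if (s & (1u << 5)) t |= 1u << 6;   /* P2: 5 --b--> 6 */
        if (s & (1u << 7)) t |= 1u << 8;   /* P3: 7 --b--> 8 */
        if (s & (1u << 8)) t |= 1u << 8;   /* P3: 8 --b--> 8 */
    }
    return t;
}

/* the pseudocode above: nonzero when some pattern accepts the whole input */
int simulate(const char *in) {
    Set S = eclosure(1u << 0);
    for (; *in; in++) S = eclosure(move(S, *in));
    return (S & ((1u << 2) | (1u << 6) | (1u << 8))) != 0;  /* F = {2,6,8} */
}
```

For instance, "aab" reaches state 8 (accepted by a*b+), while "aaba" dies because no state has an a-transition after state 8.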
Pattern Matching Based on NFA (1)

P1 : a    {action}
P2 : abb  {action}     (3 patterns)
P3 : a*b+ {action}

NFA’s :
P1 : 1 --a--> 2                            (state 2 accepts P1)
P2 : 3 --a--> 4 --b--> 5 --b--> 6          (state 6 accepts P2)
P3 : 7 --a--> 7, 7 --b--> 8, 8 --b--> 8    (state 8 accepts P3)

Combined NFA : a new start state 0 with ε-transitions to states 1, 3 and 7.
Examples :
Input a a b a :
{0,1,3,7} --a--> {2,4,7} --a--> {7} --b--> {8} --a--> death
pattern matched :     -      P1       -      P3      -

Input a b b :
{0,1,3,7} --a--> {2,4,7} --b--> {5,8} --b--> {6,8}
pattern matched :     -      P1      P3      P2,P3 (break tie in favor of P2)
DFA for Lexical Analyzers

Alternatively, construct a DFA and keep track of the correspondence
between the patterns and the new accepting states:

                  Input Symbol
STATE         a          b         Pattern
{0,1,3,7}     {2,4,7}    {8}       none
{2,4,7}       {7}        {5,8}     P1
{8}           -          {8}       P3
{7}           {7}        {8}       none
{5,8}         -          {6,8}     P3
{6,8}         -          {8}       P2  (break tie in favor of P2)
Example

Input: aaba
{0,1,3,7} --a--> {2,4,7} --a--> {7} --b--> {8}
Input: aba
{0,1,3,7} --a--> {2,4,7} --b--> {5,8} : accepts pattern P3
(transitions follow the DFA table above)
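The DFA table maps directly onto a table-driven scanner. In the sketch below rows 0..5 stand for the state sets {0,1,3,7}, {2,4,7}, {8}, {7}, {5,8}, {6,8}; the input is assumed to contain only a’s and b’s, and the encoding is an illustrative assumption, not the course’s code:

```c
/* next[state][symbol], columns: 'a', 'b'; -1 = dead state */
static const int next[6][2] = {
    { 1,  2 },   /* 0 = {0,1,3,7} */
    { 3,  4 },   /* 1 = {2,4,7}   */
    {-1,  2 },   /* 2 = {8}       */
    { 3,  2 },   /* 3 = {7}       */
    {-1,  5 },   /* 4 = {5,8}     */
    {-1,  2 },   /* 5 = {6,8}     */
};
/* pattern number accepted in each state, 0 = none
   (state 5 reports P2: tie broken in favor of P2) */
static const char accepting[6] = { 0, '1', '3', 0, '3', '2' };

/* run the DFA over the whole input; returns '1'/'2'/'3' or 0 */
char run(const char *in) {
    int s = 0;
    for (; *in && s >= 0; in++)
        s = next[s][*in == 'b'];     /* column 0 for 'a', 1 for 'b' */
    return s >= 0 ? accepting[s] : 0;
}
```

For example, "abb" ends in row 5 ({6,8}) and reports P2, matching the tie-break in the table.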
Example- TD Based Lexical Analyzers

--other--> (8)*   RTN(LT)

We’ve accepted “<” and have read one more character, which must be
unread (the * marks a state that retracts the input pointer).
DRAW a TD for {<=, <>, =, >=, >}.
Example : All RELOPs

start state 0:
0 --"<"--> 1
    1 --"="-->    2    return(relop, LE)
    1 --">"-->    3    return(relop, NE)
    1 --other-->  4*   return(relop, LT)
0 --"="--> 5           return(relop, EQ)
0 --">"--> 6
    6 --"="-->    7    return(relop, GE)
    6 --other-->  8*   return(relop, GT)

Identifiers:
9 --letter--> 10,  10 --letter or digit--> 10,  10 --other--> 11*

Numbers:
25 --digit--> 26,  26 --digit--> 26,  26 --other--> 27*
.......
case 8: retract();
        retToken.attribute = GT;
        return(RELOP);
    }
  }
}
Implementing Transition Diagrams

.............
case 9:  c = nextchar();
         if (isletter(c)) state = 10;
         else state = fail();
         break;
case 10: c = nextchar();
         if (isletter(c)) state = 10;
         else if (isdigit(c)) state = 10;
         else state = 11;
         break;
case 11: retract(1);
         lexical_value = install_id();
         return ( gettoken(lexical_value) );
.............
9 --letter--> 10 (loop on letter or digit), 10 --other--> 11*
(state 11 reads the token name from the ST)
Implementing Transition Diagrams, III

25 --digit--> 26 (loop on digit), 26 --other--> 27*

.............
case 25: c = nextchar();               /* advances forward */
         if (isdigit(c)) state = 26;
         else state = fail();
         break;
case 26: c = nextchar();
         if (isdigit(c)) state = 26;
         else state = 27;
         break;
case 27: retract(1);                   /* retracts forward; looks at the
                                          region lexeme_beginning ... forward */
         lexical_value = install_num();
         return ( NUM );
.............
Case numbers correspond to transition diagram states.
When Failures Occur:

int fail()
{
    forward = lexeme_beginning;
    switch (start) {               /* switch to the next transition diagram */
        case 0:  start = 9;  break;
        case 9:  start = 12; break;
        case 12: start = 20; break;
        case 20: start = 25; break;
        case 25: recover();  break;
        default: ;                 /* lex error */
    }
    return start;
}
DO10I=1.25   // assignment (in FORTRAN, blanks are insignificant, so this
             // is an assignment to DO10I, not the start of a DO loop)

Input buffer:
E = M * C * * 2 eof
lexeme_beginning marks the start of the current lexeme; forward scans
ahead to find a pattern match.
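The two-pointer scheme can be sketched with the * versus ** decision from the E = M * C * * 2 example: forward advances greedily past the longest match, and stops (is effectively retracted) at the token boundary. The pointer names mirror the slides; the function itself is an illustrative assumption:

```c
/* lexeme_beginning marks the start of the current lexeme;
   forward scans ahead looking for the longest match. */
static const char *lexeme_beginning, *forward;

/* recognize '*' vs '**' with one character of lookahead;
   returns the lexeme start and sets *len to its length (0 if no match) */
const char *next_op(const char *in, int *len) {
    lexeme_beginning = forward = in;
    if (*forward == '*') {
        forward++;
        if (*forward == '*') forward++;   /* longest match: '**' */
        *len = (int)(forward - lexeme_beginning);
        return lexeme_beginning;
    }
    *len = 0;                             /* not an operator here */
    return in;
}
```

On "**2" the scanner commits to the two-character lexeme "**"; on "*C" forward stops after one character, so only "*" is consumed.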