
Lec 2

The document outlines the role of a lexical analyzer in the compilation process, which involves reading input characters and producing a sequence of tokens for the parser. It details the specification and recognition of tokens, including the use of regular expressions and finite automata. Additionally, it discusses the importance of separating lexical analysis from parsing for improved compiler efficiency and design simplicity.

Uploaded by

ezatrashad2003
Copyright © All Rights Reserved

COMPILERS

Lexical Analysis
OUTLINE
▪ Role of lexical analyzer
▪ Specification of tokens
▪ Recognition of tokens
▪ Lexical analyzer generator
THE ROLE OF LEXICAL ANALYZER

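[Figure: Source program → Lexical Analyzer → token → Parser → to semantic analysis. The parser requests each token by calling getNextToken; both the lexical analyzer and the parser consult the Symbol table.]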
WHY SEPARATE LEXICAL ANALYSIS AND PARSING
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
▪ Lexical analyzer: reads input characters and produces a sequence of tokens as output (nexttoken()).
LEXICAL ANALYZER
▪ Tries to understand each element in a program.
▪ Token: a group of characters having a collective meaning.
const pi = 3.14159;

Token 1: (const, -)
Token 2: (identifier, ‘pi’)
Token 3: (=, -)
Token 4: (realnumber, 3.14159)
Token 5: (;, -)
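The five tokens above can be reproduced by a small hand-written scanner. The C sketch below is a minimal illustration, not the lecture's implementation: it assumes only the token kinds this statement needs, and the names TokenKind, Token and next_token are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds: only those needed for the example statement. */
typedef enum { TK_CONST, TK_ID, TK_ASSIGN, TK_REAL, TK_SEMI, TK_EOF, TK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[32];   /* lexeme text, filled in for const and identifiers */
    double value;      /* attribute of a real number */
} Token;

/* Scans one token starting at *p and advances *p past its lexeme. */
Token next_token(const char **p) {
    Token t = { TK_ERROR, "", 0.0 };
    while (isspace((unsigned char)**p)) (*p)++;          /* skip white space */
    const char *start = *p;
    if (**p == '\0') {
        t.kind = TK_EOF;
    } else if (isalpha((unsigned char)**p)) {            /* letter(letter|digit)* */
        while (isalnum((unsigned char)**p)) (*p)++;
        size_t len = (size_t)(*p - start);
        if (len >= sizeof t.lexeme) len = sizeof t.lexeme - 1;  /* truncate long names */
        memcpy(t.lexeme, start, len);
        t.lexeme[len] = '\0';
        t.kind = strcmp(t.lexeme, "const") == 0 ? TK_CONST : TK_ID;  /* keyword check */
    } else if (isdigit((unsigned char)**p)) {            /* numeric constant */
        char *end;
        t.value = strtod(start, &end);                   /* strtod reports where it stopped */
        *p = end;
        t.kind = TK_REAL;
    } else if (**p == '=') {
        (*p)++;
        t.kind = TK_ASSIGN;
    } else if (**p == ';') {
        (*p)++;
        t.kind = TK_SEMI;
    } else {
        (*p)++;                                          /* no pattern matches */
    }
    return t;
}
```

Calling next_token repeatedly on "const pi = 3.14159;" returns TK_CONST, TK_ID with lexeme "pi", TK_ASSIGN, TK_REAL with value 3.14159, TK_SEMI and then TK_EOF. Each call consumes exactly one lexeme, which is the on-demand style in which the parser asks for the next token.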
EXAMPLE
Token        Informal description                       Sample lexemes
if           Characters i, f                            if
else         Characters e, l, s, e                      else
comparison   < or > or <= or >= or == or !=             <=, !=
id           Letter followed by letters and digits      pi, score, D2
number       Any numeric constant                       3.14159, 0, 6.02e23
literal      Anything but " surrounded by "s            "core dumped"

In printf("total = %d\n", score); both printf and score are lexemes matching the token id, and "total = %d\n" is a lexeme matching literal.
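To make the pattern column concrete, the sketch below tests a complete lexeme against each row of the table. The function name classify is hypothetical, and the number pattern (digits, an optional fraction, an optional exponent) is an assumption chosen to cover the sample lexemes 3.14159, 0 and 6.02e23.

```c
#include <ctype.h>
#include <string.h>

/* Returns the token name for a complete lexeme, following the table's
   patterns, or NULL when no pattern matches (a lexical error). */
const char *classify(const char *s) {
    static const char *comparisons[] = { "<", ">", "<=", ">=", "==", "!=" };
    size_t n = strlen(s);
    if (n == 0) return NULL;
    if (strcmp(s, "if") == 0) return "if";          /* keywords are checked first */
    if (strcmp(s, "else") == 0) return "else";
    for (size_t i = 0; i < sizeof comparisons / sizeof comparisons[0]; i++)
        if (strcmp(s, comparisons[i]) == 0) return "comparison";
    if (isalpha((unsigned char)s[0])) {             /* letter followed by letters/digits */
        for (size_t i = 1; i < n; i++)
            if (!isalnum((unsigned char)s[i])) return NULL;
        return "id";
    }
    if (s[0] == '"') {                              /* anything but " surrounded by "s */
        if (n < 2 || s[n - 1] != '"') return NULL;
        for (size_t i = 1; i < n - 1; i++)
            if (s[i] == '"') return NULL;
        return "literal";
    }
    if (isdigit((unsigned char)s[0])) {             /* digits [. digits] [e [+|-] digits] */
        size_t i = 0;
        while (isdigit((unsigned char)s[i])) i++;
        if (s[i] == '.') {
            i++;
            if (!isdigit((unsigned char)s[i])) return NULL;
            while (isdigit((unsigned char)s[i])) i++;
        }
        if (s[i] == 'e' || s[i] == 'E') {
            i++;
            if (s[i] == '+' || s[i] == '-') i++;
            if (!isdigit((unsigned char)s[i])) return NULL;
            while (isdigit((unsigned char)s[i])) i++;
        }
        return s[i] == '\0' ? "number" : NULL;      /* trailing junk: no match */
    }
    return NULL;
}
```

For example, classify("6.02e23") returns "number" and classify("D2") returns "id", while classify("2r") returns NULL because no pattern matches the whole lexeme.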


ATTRIBUTES FOR TOKENS
▪ E = M * C ** 2
▪ <id, pointer to symbol table entry for E>
▪ <assign-op>
▪ <id, pointer to symbol table entry for M>
▪ <mult-op>
▪ <id, pointer to symbol table entry for C>
▪ <exp-op>
▪ <number, integer value 2>
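One way to hold these <token, attribute> pairs in C is a tagged struct whose attribute is a union: identifiers carry a pointer into the symbol table and numbers carry their value. This is a sketch under assumptions: the enum names, the fixed-size table and lookup_or_insert are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token names for E = M * C ** 2. */
typedef enum { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER } TokenName;

/* Minimal symbol-table entry; a real compiler would also keep type, scope, etc. */
typedef struct { char name[16]; } SymbolEntry;

typedef struct {
    TokenName name;
    union {
        SymbolEntry *entry;   /* attribute of id: pointer to its symbol-table entry */
        int value;            /* attribute of number: its integer value */
    } attr;                   /* left unused for operator tokens */
} Token;

static SymbolEntry symtab[16];
static int symcount = 0;

/* Returns the entry for name, inserting it on first use (NULL if the table is full). */
SymbolEntry *lookup_or_insert(const char *name) {
    for (int i = 0; i < symcount; i++)
        if (strcmp(symtab[i].name, name) == 0) return &symtab[i];
    if (symcount == 16) return NULL;
    snprintf(symtab[symcount].name, sizeof symtab[symcount].name, "%s", name);
    return &symtab[symcount++];
}
```

For example, { ID, { .entry = lookup_or_insert("E") } } builds the first pair above and { NUMBER, { .value = 2 } } the last. Looking up E again returns the same entry, so every occurrence of a name shares one symbol-table record.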
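LEXICAL ERRORS
▪ Some errors are beyond the power of a lexical analyzer to recognize:
▪ fi (a == f(x)) …
▪ Here fi may be a misspelling of the keyword if or an undeclared function name; since fi is a valid identifier, the lexical analyzer returns it as id and leaves the error to later phases.
▪ However, it may be able to recognize errors like:
▪ d = 2r
▪ Such errors are recognized when no pattern for tokens matches a character sequence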
LEXICAL ANALYSIS
SECTION 1 THE ROLE OF THE LEXICAL ANALYZER

First phase of a compiler


1. Main task
▪ To read the input characters
▪ To produce a sequence of tokens used by the parser for syntax analysis
▪ To act as an assistant to the parser
2. Interaction of the lexical analyzer with the parser
[Figure: Source program → Lexical analyzer → token → Parser. The parser sends a "get next token" request to the lexical analyzer; both consult the Symbol table.]

3. Processes in lexical analyzers
▪ Scanning
▪ Pre-processing
▪ Strip out comments and white space
▪ Correlating error messages from the compiler with the source program
▪ A line number can be associated with an error message
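The pre-processing and line-number bookkeeping can be combined in one routine that runs before each token is scanned. The sketch assumes C-style /* ... */ comments, which the lecture does not specify; the function name skip_blanks_and_comments is hypothetical.

```c
#include <ctype.h>

/* Advances *p past white space and C-style comments, counting newlines in
   *line so that later error messages can report a source line number.
   Returns 0 on success, or -1 if a comment is never closed. */
int skip_blanks_and_comments(const char **p, int *line) {
    for (;;) {
        if (**p == '\n') {
            (*line)++;
            (*p)++;
        } else if (isspace((unsigned char)**p)) {
            (*p)++;
        } else if ((*p)[0] == '/' && (*p)[1] == '*') {
            *p += 2;                                   /* skip the opening slash-star */
            while (!((*p)[0] == '*' && (*p)[1] == '/')) {
                if (**p == '\0') return -1;            /* unterminated comment */
                if (**p == '\n') (*line)++;            /* newlines inside comments count too */
                (*p)++;
            }
            *p += 2;                                   /* skip the closing star-slash */
        } else {
            return 0;                                  /* start of the next lexeme */
        }
    }
}
```

After the call, *p points at the first character of the next lexeme and *line holds the line on which it starts, ready to be attached to any error message.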
4. Terms of the lexical analyzer
▪ Token
▪ Types of words in the source program
▪ Keywords, operators, identifiers, constants, literal strings, punctuation symbols (such as commas, semicolons)
▪ Lexeme
▪ Actual words in the source program
▪ Pattern
▪ A rule describing the set of lexemes that can represent a particular token in the source program
▪ e.g. token relation, lexemes {<, <=, >, >=, ==, <>}
5. Attributes for tokens
▪ A pointer to the symbol-table entry in which the information about the token is kept
e.g. E = M * C ** 2
<id, pointer to symbol-table entry for E>
<assign_op,>
<id, pointer to symbol-table entry for M>
<mult_op,>
<id, pointer to symbol-table entry for C>
<exp_op,>
<num, integer value 2>
6. Lexical errors
Possible error-recovery actions:
▪ Deleting an extraneous character
▪ Inserting a missing character
▪ Replacing an incorrect character by a correct character
▪ Transposing two adjacent characters (such as fi => if)
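As an illustration of the last recovery action, the sketch below tries every transposition of two adjacent characters and accepts the first one that yields a keyword. The keyword list and the name repair_by_transposition are hypothetical; a real analyzer would also need a policy for when to attempt repair at all, since fi is also a valid identifier.

```c
#include <string.h>

/* Hypothetical keyword list used to judge repair candidates. */
static const char *keywords[] = { "if", "else", "while", "return" };

static int is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0) return 1;
    return 0;
}

/* Tries each transposition of two adjacent characters in lexeme.
   On success writes the repaired text into out and returns 1;
   returns 0 if no single transposition produces a keyword. */
int repair_by_transposition(const char *lexeme, char *out, size_t outsize) {
    size_t n = strlen(lexeme);
    if (n < 2 || n >= outsize) return 0;
    for (size_t i = 0; i + 1 < n; i++) {
        memcpy(out, lexeme, n + 1);        /* fresh copy, including the '\0' */
        char tmp = out[i];                 /* swap characters i and i+1 */
        out[i] = out[i + 1];
        out[i + 1] = tmp;
        if (is_keyword(out)) return 1;
    }
    return 0;
}
```

With a 16-byte buffer, repair_by_transposition("fi", buf, 16) writes "if" and repair_by_transposition("esle", buf, 16) writes "else", while "xyz" cannot be repaired this way.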
SECTION 2 SPECIFICATION OF TOKENS
1. Regular definition of tokens
▪ Defined by regular expressions
e.g. id → letter(letter|digit)*
letter → A|B|…|Z|a|b|…|z
digit → 0|1|2|…|9
Notes: Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.
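The regular definition of id can be handed almost directly to a regular-expression library. The sketch assumes a POSIX system, since regcomp and regexec come from <regex.h>, which is not part of ISO C; the function name is_identifier is hypothetical, and the bracket expressions spell out letter and digit for ASCII.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches id -> letter(letter|digit)*, 0 if it does not,
   and -1 if the pattern fails to compile. */
int is_identifier(const char *s) {
    regex_t re;
    /* ^ and $ anchor the match so the whole string must be an identifier */
    if (regcomp(&re, "^[A-Za-z][A-Za-z0-9]*$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Compiling the pattern on every call keeps the sketch short; a lexical analyzer would compile it once and reuse it.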
2. Regular expressions and regular languages
▪ Regular expression
▪ A notation for defining the patterns of words in a high-level language.
▪ Regular language
▪ Each regular expression r denotes a language L(r), the set of strings matched by r
Notes: The pattern for each kind of word in a program can be expressed as a regular expression.
SECTION 3 RECOGNITION OF TOKENS
1. Task of token recognition in a lexical analyzer
▪ Isolate the lexeme for the next token in the input buffer
▪ Produce as output a pair consisting of the appropriate token and attribute-value, such as <id, pointer to table entry>, using the translation table given below

Regular expression   Token   Attribute-value
if                   if      -
id                   id      Pointer to table entry
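A common way to realize this table is to install the reserved word if in the symbol table before scanning starts, so that one lookup decides between the two rows. The sketch below assumes that approach; Entry, Pair and make_pair are hypothetical names.

```c
#include <string.h>

typedef enum { TOK_IF, TOK_ID } TokenName;

typedef struct {
    char lexeme[16];
    TokenName token;        /* TOK_IF for the reserved word, TOK_ID otherwise */
} Entry;

static Entry table[64] = { { "if", TOK_IF } };   /* reserved word installed first */
static int entries = 1;

typedef struct {
    TokenName token;
    Entry *attr;            /* pointer to table entry for id; NULL for if */
} Pair;

/* Produces the <token, attribute-value> pair for a lexeme matched by if or id.
   attr stays NULL for a new id if the table is full or the lexeme too long. */
Pair make_pair(const char *lexeme) {
    Pair p = { TOK_ID, NULL };
    for (int i = 0; i < entries; i++) {
        if (strcmp(table[i].lexeme, lexeme) == 0) {
            p.token = table[i].token;
            p.attr = table[i].token == TOK_ID ? &table[i] : NULL;
            return p;
        }
    }
    if (entries < 64 && strlen(lexeme) < sizeof table[0].lexeme) {  /* install new id */
        strcpy(table[entries].lexeme, lexeme);
        table[entries].token = TOK_ID;
        p.attr = &table[entries++];
    }
    return p;
}
```

make_pair("if") yields <if, -> with no attribute, while make_pair("score") yields <id, pointer to table entry>, and a second lookup of score returns the same pointer.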
2. Methods for recognizing tokens
▪ Use transition diagrams
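3. Transition diagram (stylized flowchart)
▪ Depicts the actions that take place when the lexical analyzer is called by the parser to get the next token
[Figure: part of the transition diagram for relop. From the start state 0, > leads to state 6; from state 6, = leads to accepting state 7, return(relop, GE), and any other character leads to accepting state 8*, return(relop, GT). The * marks a state where the extra character that was read must be retracted.]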
4. Implementing a transition diagram
▪ Each state gets a segment of code
▪ If there are edges leaving a state, its code reads a character and selects an edge to follow, if possible
▪ Use nextchar() to read the next character from the input buffer
while (1) {
    switch (state) {
    case 0:
        c = nextchar();
        if (c == ' ' || c == '\t' || c == '\n') {
            state = 0;              /* skip white space */
            lexeme_beginning++;     /* the lexeme has not started yet */
        }
        else if (c == '<') state = 1;
        else if (c == '=') state = 5;
        else if (c == '>') state = 6;
        else state = fail();        /* no edge matches: try another diagram */
        break;
    case 9:
        c = nextchar();
        if (isletter(c)) state = 10;
        else state = fail();
        break;
    /* … */
    }
}
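The fragment above elides most states. A complete recognizer for relational operators in the same switch(state) style might look like the sketch below. Assumptions: states 1 to 8 follow the usual relop transition diagram (consistent with the GE/GT fragment shown earlier, where starred state 8 retracts one character), input comes from a string, nextchar and retract are simplified hypothetical stand-ins, and returning NOT_RELOP replaces the call to fail().

```c
#include <stddef.h>

typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

static const char *input;   /* hypothetical input buffer */
static size_t forward;      /* index of the next character to read */

static char nextchar(void) { return input[forward++]; }
static void retract(void) { forward--; }   /* give back one character */

/* Recognizes one relational operator at the start of s;
   *len receives the length of the lexeme. */
Relop get_relop(const char *s, size_t *len) {
    int state = 0;
    char c;
    input = s;
    forward = 0;
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else { *len = 0; return NOT_RELOP; }   /* where fail() would be called */
            break;
        case 1:
            c = nextchar();
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;
            break;
        case 2: *len = forward; return LE;
        case 3: *len = forward; return NE;
        case 4: retract(); *len = forward; return LT;   /* starred state */
        case 5: *len = forward; return EQ;
        case 6:
            c = nextchar();
            if (c == '=') state = 7;
            else state = 8;
            break;
        case 7: *len = forward; return GE;
        case 8: retract(); *len = forward; return GT;   /* starred state */
        }
    }
}
```

For ">x" the recognizer reads two characters, reaches state 8, retracts the x and returns GT with length 1; for ">=" it stops in state 7 and returns GE with length 2.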
5. A generalized transition diagram: the finite automaton
▪ Deterministic or non-deterministic FA
e.g. The FA for identifiers, which represents the rule identifier = letter(letter|digit)*, is:
[Figure: start state 1 goes to accepting state 2 on letter; state 2 loops back to itself on letter and on digit.]
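The identifier automaton can be simulated directly: keep the current state, follow the edge labeled by each input character, and reject as soon as no edge applies. A minimal sketch; the function name accepts_identifier is hypothetical.

```c
#include <ctype.h>

/* Simulates the identifier FA: state 1 is the start state, state 2 is
   accepting, and a missing edge means the input is rejected. */
int accepts_identifier(const char *s) {
    int state = 1;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (state == 1 && isalpha(c))
            state = 2;                                   /* 1 --letter--> 2 */
        else if (state == 2 && (isalpha(c) || isdigit(c)))
            state = 2;                                   /* loop on letter or digit */
        else
            return 0;                                    /* no edge: reject */
    }
    return state == 2;                                   /* accept only in state 2 */
}
```

The empty string is rejected because the automaton never leaves start state 1, which is not accepting.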
SECTION 4 FINITE AUTOMATA
1. Usage of FA
▪ Precisely recognize regular sets
▪ A regular set is the set of strings denoted by a regular expression

2. Kinds of FA
▪ Deterministic FA
▪ Non-deterministic FA
3. Deterministic FA (DFA)
A DFA is a quintuple, M = (S, Σ, move, s0, F)
▪ S: a set of states
▪ Σ: the input symbol alphabet
▪ move: a transition function, mapping from S × Σ to S, move(s,a)=s’
▪ s0: the start state, s0 ∈ S
▪ F: a set of states distinguished as accepting states, F ⊆ S
Note: 1) In a DFA, no state has an ε-transition;
2) In a DFA, for each state s and input symbol a, there is at most one edge labeled a leaving s;
3) To describe an FA, we use a transition graph or a transition table;
4) A DFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state whose edge labels spell out x.
e.g. DFA M = ({0,1,2,3}, {a,b}, move, 0, {3})
move: move(0,a)=1  move(0,b)=2  move(1,a)=3  move(1,b)=2
      move(2,a)=1  move(2,b)=3  move(3,a)=3  move(3,b)=3

Transition table
state   a   b
0       1   2
1       3   2
2       1   3
3       3   3

[Figure: transition graph of M]
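Because move is given as a table, simulating M takes one array lookup per input symbol. The sketch below encodes the transition table of M; tracing it shows that M accepts exactly the strings over {a,b} that contain aa or bb. The names move_table and dfa_accepts are hypothetical.

```c
/* Transition table of M: rows are states 0..3, columns are inputs a and b. */
static const int move_table[4][2] = {
    /* a  b */
    {  1, 2 },   /* state 0 */
    {  3, 2 },   /* state 1 */
    {  1, 3 },   /* state 2 */
    {  3, 3 },   /* state 3 */
};

/* Returns 1 if M accepts x (ends in accepting state 3), 0 otherwise.
   Symbols outside the alphabet {a, b} are rejected. */
int dfa_accepts(const char *x) {
    int s = 0;                            /* start state s0 = 0 */
    for (; *x != '\0'; x++) {
        if (*x != 'a' && *x != 'b') return 0;
        s = move_table[s][*x - 'a'];      /* s = move(s, symbol) */
    }
    return s == 3;                        /* F = {3} */
}
```

For example, "abba" follows 0 → 1 → 2 → 3 → 3 and is accepted, while "abab" ends in state 2 and is rejected.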
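e.g. Construct a DFA M that accepts the strings that begin with a or b, or that begin with c and contain at most one a.
[Figure: transition graph. State 0 goes to state 1 on a or b and to state 2 on c; state 1 loops on a, b and c; state 2 loops on b and c and goes to state 3 on a; state 3 loops on b and c.]
So, the DFA is
M = ({0,1,2,3}, {a,b,c}, move, 0, {1,2,3})
move: move(0,a)=1  move(0,b)=1  move(0,c)=2
      move(1,a)=1  move(1,b)=1  move(1,c)=1
      move(2,a)=3  move(2,b)=2  move(2,c)=2
      move(3,b)=3  move(3,c)=3
move(3,a) is undefined, so a second a after a leading c is rejected.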
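The same table-driven simulation works when move is only partially defined: an entry of -1 marks a missing edge, and reaching it rejects the string. The sketch encodes the move function of this DFA; move_abc and accepts_abc are hypothetical names.

```c
/* move for the example DFA; -1 marks an undefined move.
   Columns are the inputs a, b, c. */
static const int move_abc[4][3] = {
    /*  a   b   c */
    {   1,  1,  2 },   /* state 0: start */
    {   1,  1,  1 },   /* state 1: began with a or b */
    {   3,  2,  2 },   /* state 2: began with c, no a yet */
    {  -1,  3,  3 },   /* state 3: began with c, one a seen */
};

/* Returns 1 if the DFA accepts x, 0 otherwise. */
int accepts_abc(const char *x) {
    int s = 0;
    for (; *x != '\0'; x++) {
        if (*x < 'a' || *x > 'c') return 0;   /* not in the alphabet {a, b, c} */
        s = move_abc[s][*x - 'a'];
        if (s < 0) return 0;                  /* move(3, a) is undefined */
    }
    return s != 0;                            /* F = {1, 2, 3} */
}
```

The DFA accepts "cab" (one a after the leading c) but rejects "caa", and it rejects the empty string because state 0 is not accepting.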
