Lec 2

The document outlines the role of a lexical analyzer in the compilation process, which involves reading input characters and producing a sequence of tokens for the parser. It details the specification and recognition of tokens, including the use of regular expressions and finite automata. Additionally, it discusses the importance of separating lexical analysis from parsing for improved compiler efficiency and design simplicity.

Uploaded by

ezatrashad2003


COMPILERS

Lexical Analysis
OUTLINE
▪ Role of lexical analyzer
▪ Specification of tokens
▪ Recognition of tokens
▪ Lexical analyzer generator
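THE ROLE OF THE LEXICAL ANALYZER

[Figure: source program → Lexical Analyzer → token → Parser → to semantic analysis. The parser requests each token by calling getNextToken; both the lexical analyzer and the parser consult the symbol table.]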
WHY SEPARATE LEXICAL ANALYSIS AND PARSING?
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
▪ Lexical analyzer: reads the input characters and produces a sequence of tokens as output, one token per call to nexttoken().
LEXICAL ANALYZER

▪ Tries to understand each element in a program.
▪ Token: a group of characters having a collective meaning.
const pi = 3.14159;

Token 1: (const, -)
Token 2: (identifier, ‘pi’)
Token 3: (=, -)
Token 4: (realnumber, 3.14159)
Token 5: (;, -)
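The token sequence above can be reproduced by a very small scanner. This is a minimal sketch under stated assumptions: it only knows the keyword const, identifiers, real numbers (parsed with the C library's strtod, which stands in for a real number pattern), = and ;. The names next_token, enum kind and struct token are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds covering only the example statement. */
enum kind { TOK_CONST, TOK_IDENT, TOK_ASSIGN, TOK_REAL, TOK_SEMI, TOK_EOF, TOK_ERROR };

struct token {
    enum kind kind;
    char name[32];      /* attribute of TOK_IDENT: the identifier text */
    double value;       /* attribute of TOK_REAL: the numeric value */
};

/* Scan one token starting at *p and advance *p past it. */
struct token next_token(const char **p)
{
    struct token t = { TOK_ERROR, "", 0.0 };
    const char *s = *p;

    while (isspace((unsigned char)*s))          /* skip white space */
        s++;

    if (*s == '\0') {
        t.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*s)) {    /* keyword or identifier */
        size_t n = 0;
        while (isalnum((unsigned char)s[n]))
            n++;
        if (n == 5 && strncmp(s, "const", 5) == 0) {
            t.kind = TOK_CONST;
        } else {
            size_t len = n < sizeof t.name - 1 ? n : sizeof t.name - 1;
            t.kind = TOK_IDENT;
            memcpy(t.name, s, len);
            t.name[len] = '\0';
        }
        s += n;
    } else if (isdigit((unsigned char)*s)) {    /* real number (via strtod) */
        char *end;
        t.kind = TOK_REAL;
        t.value = strtod(s, &end);
        s = end;
    } else if (*s == '=') {
        t.kind = TOK_ASSIGN;
        s++;
    } else if (*s == ';') {
        t.kind = TOK_SEMI;
        s++;
    } else {
        s++;                                    /* unknown character: TOK_ERROR */
    }
    *p = s;
    return t;
}
```

Each call returns one token together with its attribute, which is the (token, attribute) pair shown in the list; operators and punctuation carry no attribute.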
EXAMPLE
Token        Informal description                      Sample lexemes
if           Characters i, f                           if
else         Characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           Letter followed by letters and digits     pi, score, D2
number       Any numeric constant                      3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s          "core dumped"

printf("total = %d\n", score);
In this statement, printf and score are lexemes matching the pattern for id, and "total = %d\n" is a literal.
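The informal descriptions in the table above can be turned into a classifier for complete lexemes. A sketch only: the enum names and classify are hypothetical, and the numeric-constant pattern is approximated by "starts with a digit and strtod consumes the whole lexeme", which is an assumption rather than a precise definition.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Token classes from the table; the enum names are illustrative. */
enum token_class { T_IF, T_ELSE, T_COMPARISON, T_ID, T_NUMBER, T_LITERAL, T_NONE };

/* Classify a complete lexeme against the informal patterns of the table. */
enum token_class classify(const char *lx)
{
    static const char *const cmp[] = { "<", ">", "<=", ">=", "==", "!=" };
    size_t n = strlen(lx);

    if (strcmp(lx, "if") == 0)   return T_IF;     /* keywords are checked first */
    if (strcmp(lx, "else") == 0) return T_ELSE;

    for (size_t i = 0; i < sizeof cmp / sizeof cmp[0]; i++)
        if (strcmp(lx, cmp[i]) == 0)
            return T_COMPARISON;

    if (isalpha((unsigned char)lx[0])) {          /* letter (letter | digit)* */
        for (size_t i = 1; i < n; i++)
            if (!isalnum((unsigned char)lx[i]))
                return T_NONE;
        return T_ID;
    }

    if (isdigit((unsigned char)lx[0])) {          /* numeric constant (approximation) */
        char *end;
        (void)strtod(lx, &end);
        return *end == '\0' ? T_NUMBER : T_NONE;
    }

    if (n >= 2 && lx[0] == '"' && lx[n - 1] == '"')
        /* literal: no '"' may appear between the delimiting quotes */
        return memchr(lx + 1, '"', n - 2) == NULL ? T_LITERAL : T_NONE;

    return T_NONE;
}
```

Checking if and else before the id pattern is what keeps keywords from being classified as identifiers.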

ATTRIBUTES FOR TOKENS
▪ E = M * C ** 2
▪ <id, pointer to symbol table entry for E>
▪ <assign-op>
▪ <id, pointer to symbol table entry for M>
▪ <mult-op>
▪ <id, pointer to symbol table entry for C>
▪ <exp-op>
▪ <number, integer value 2>
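The token pairs above can be represented with a tagged union: an id carries a pointer to its symbol-table entry, a number carries its integer value, and the operators carry nothing. A minimal sketch, assuming a fixed-size array as the symbol table; enum tok, lookup_or_insert and make_id are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token names for the pairs listed above. */
enum tok { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER };

struct symbol {                 /* one symbol-table entry (name only, for brevity) */
    char name[16];
};

struct token {
    enum tok kind;
    union {
        struct symbol *entry;   /* attribute of ID: pointer to its table entry */
        int value;              /* attribute of NUMBER: the integer value */
    } attr;                     /* operator tokens leave attr unused */
};

#define MAX_SYMBOLS 64
static struct symbol table[MAX_SYMBOLS];
static size_t nsyms = 0;

/* Return the entry for name, inserting it the first time it is seen.
   Returns NULL if the table is full or the name is too long. */
struct symbol *lookup_or_insert(const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    if (nsyms == MAX_SYMBOLS || strlen(name) >= sizeof table[0].name)
        return NULL;
    strcpy(table[nsyms].name, name);
    return &table[nsyms++];
}

/* Build the <id, pointer to symbol-table entry> pair for an identifier lexeme. */
struct token make_id(const char *lexeme)
{
    struct token t;
    t.kind = ID;
    t.attr.entry = lookup_or_insert(lexeme);
    return t;
}
```

Because every occurrence of the same name maps to the same entry, later phases can attach type and storage information to that entry once.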
LEXICAL ERRORS
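▪ Some errors are beyond the power of the lexical analyzer to recognize:
▪ fi (a == f(x)) …
Here fi is a valid lexeme for an identifier, so the lexical analyzer cannot tell that it is a misspelling of the keyword if.
▪ However, it may be able to recognize errors like:
▪ d = 2r
▪ Such errors are recognized when no pattern for tokens matches a character sequence.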
LEXICAL ANALYSIS
SECTION 1 THE ROLE OF THE LEXICAL ANALYZER

First phase of a compiler

1、Main task
▪ To read the input characters
▪ To produce a sequence of tokens used by the parser for syntax analysis
▪ As an assistant to the parser
2、Interaction of the lexical analyzer with the parser

[Figure: source program → lexical analyzer → token → parser. The parser sends "get next token" requests to the lexical analyzer, and both use the symbol table.]

3、Processes in lexical analyzers

▪ Scanning
▪ Pre-processing
▪ Strip out comments and white space
▪ Correlating error messages from the compiler with the source program
▪ A line number can be associated with an error message
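Stripping white space and comments while keeping a line counter is what lets the compiler attach a line number to an error message. A minimal sketch, assuming C-style block comments and '\n' line endings; skip_blank is a hypothetical name.

```c
/* Skip blanks, tabs, newlines and C-style block comments starting at *p.
   Every newline seen (including inside comments) increments *line, so a
   later error message can cite the current source line. */
void skip_blank(const char **p, int *line)
{
    const char *s = *p;
    for (;;) {
        if (*s == '\n') {
            (*line)++;
            s++;
        } else if (*s == ' ' || *s == '\t' || *s == '\r') {
            s++;
        } else if (s[0] == '/' && s[1] == '*') {
            s += 2;                          /* enter the comment */
            while (*s != '\0' && !(s[0] == '*' && s[1] == '/')) {
                if (*s == '\n')
                    (*line)++;
                s++;
            }
            if (*s != '\0')
                s += 2;                      /* step over the terminator */
        } else {
            break;                           /* first significant character */
        }
    }
    *p = s;
}
```

An unterminated comment simply runs to the end of the input here; a real scanner would report it as a lexical error using the saved line number.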
4、Terms of the lexical analyzer
▪ Token
▪ A class of words in the source program
▪ Keywords, operators, identifiers, constants, literal strings, punctuation symbols (such as commas and semicolons)
▪ Lexeme
▪ An actual word (character sequence) in the source program that matches the pattern for a token
▪ Pattern
▪ A rule describing the set of lexemes that can represent a particular token in the source program
▪ e.g. the pattern for the token relation matches the lexemes {<, <=, >, >=, ==, <>}
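The token/lexeme/pattern distinction can be made concrete: relation is the token, the pattern is the finite set of allowed strings, and "<=" is one lexeme that matches it. A small sketch; matches_relation is a hypothetical name, and the set follows the slide, which writes not-equal as <>.

```c
#include <stdbool.h>
#include <string.h>

/* Pattern for the token relation, taken from the set on the slide. */
static const char *const relation_lexemes[] = { "<", "<=", ">", ">=", "==", "<>" };

/* Return true iff the lexeme belongs to the set described by the pattern. */
bool matches_relation(const char *lexeme)
{
    for (size_t i = 0; i < sizeof relation_lexemes / sizeof relation_lexemes[0]; i++)
        if (strcmp(lexeme, relation_lexemes[i]) == 0)
            return true;
    return false;
}
```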
5、Attributes for Tokens
▪ A pointer to the symbol-table entry in which the information about the token is kept
e.g. E = M * C ** 2
<id, pointer to symbol-table entry for E>
<assign_op, >
<id, pointer to symbol-table entry for M>
<mult_op, >
<id, pointer to symbol-table entry for C>
<exp_op, >
<num, integer value 2>
6、Lexical Errors
Possible error-recovery actions:
▪ Deleting an extraneous character
▪ Inserting a missing character
▪ Replacing an incorrect character by a correct character
▪ Transposing two adjacent characters (such as fi → if)
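The transposition repair in the list above can be sketched as "try every adjacent swap and see whether a keyword results". This is only an illustration: the keyword list and the function names are hypothetical, and, as the fi (a == f(x)) example shows, such a repair is a guess, since fi is also a legal identifier.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative keyword list; a real compiler would use its own. */
static const char *const keywords[] = { "if", "else", "while", "return" };
#define NKEYWORDS (sizeof keywords / sizeof keywords[0])

static const char *find_keyword(const char *s)
{
    for (size_t k = 0; k < NKEYWORDS; k++)
        if (strcmp(s, keywords[k]) == 0)
            return keywords[k];
    return NULL;
}

/* Try the "transpose two adjacent characters" repair: return the keyword
   produced by a single swap (e.g. "fi" -> "if"), or NULL if none exists. */
const char *repair_by_transposition(const char *lexeme)
{
    char buf[32];
    size_t n = strlen(lexeme);
    if (n < 2 || n >= sizeof buf)
        return NULL;
    for (size_t i = 0; i + 1 < n; i++) {
        memcpy(buf, lexeme, n + 1);          /* fresh copy, including the NUL */
        char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
        const char *kw = find_keyword(buf);
        if (kw != NULL)
            return kw;
    }
    return NULL;
}
```

The other three actions (delete, insert, replace one character) can be tried the same way, each producing candidate strings to look up.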
SECTION 2 SPECIFICATION OF TOKENS
1、Regular Definition of Tokens
▪ Tokens are defined by regular expressions
e.g. id → letter (letter | digit)*
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | 2 | … | 9
Notes: Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.
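The regular definition for id can be checked directly with the POSIX regular-expression API (regcomp/regexec from <regex.h>). A sketch under assumptions: a POSIX system, the C locale (so that [A-Za-z] means exactly the 52 ASCII letters), and a hypothetical function name is_identifier. Compiling the pattern on every call is wasteful but keeps the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* id -> letter (letter | digit)*, written as a POSIX extended regular
   expression anchored at both ends so the whole lexeme must match. */
bool is_identifier(const char *lexeme)
{
    regex_t re;
    if (regcomp(&re, "^[A-Za-z][A-Za-z0-9]*$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;                        /* pattern failed to compile */
    int rc = regexec(&re, lexeme, 0, NULL, 0);
    regfree(&re);
    return rc == 0;                          /* 0 means the string matched */
}
```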
2、Regular Expressions & Regular Languages
▪ Regular expression
▪ A notation for specifying the patterns of the words of a high-level language
▪ Regular language
▪ Each regular expression r denotes a language L(r), the set of strings matched by r
Notes: The structure of every word (token) in a program can be expressed by a regular expression
SECTION 3 RECOGNITION OF TOKENS
1、Task of recognizing tokens in a lexical analyzer
▪ Isolate the lexeme for the next token in the input buffer
▪ Produce as output a pair consisting of the appropriate token and attribute-value, such as <id, pointer to table entry>, using a translation table such as the one below

Regular expression   Token   Attribute-value
if                   if      -
id                   id      Pointer to table entry
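One common way to implement the translation table above is to preload the reserved word if into the symbol table, so a single lookup decides between the two rows. A sketch under assumptions: the table is a fixed-size array, and translate, struct entry and struct pair are hypothetical names. For if the attribute is NULL, standing for "-".

```c
#include <stddef.h>
#include <string.h>

enum tok { TOK_IF, TOK_ID };

struct entry {
    char lexeme[16];
    enum tok token;        /* TOK_IF for the reserved word, TOK_ID otherwise */
};

struct pair {              /* <token, attribute-value> */
    enum tok token;
    struct entry *attr;    /* pointer to table entry for id; NULL ("-") for if */
};

static struct entry table[64] = { { "if", TOK_IF } };   /* reserved word preloaded */
static size_t used = 1;

/* Map a lexeme matched by the "if" or the id expression to its pair.
   If the table is full, an id is returned with a NULL attribute. */
struct pair translate(const char *lexeme)
{
    struct pair p = { TOK_ID, NULL };
    for (size_t i = 0; i < used; i++) {
        if (strcmp(table[i].lexeme, lexeme) == 0) {
            p.token = table[i].token;
            p.attr = table[i].token == TOK_ID ? &table[i] : NULL;
            return p;
        }
    }
    if (used < sizeof table / sizeof table[0] &&
        strlen(lexeme) < sizeof table[0].lexeme) {
        strcpy(table[used].lexeme, lexeme);               /* install a new id */
        table[used].token = TOK_ID;
        p.attr = &table[used++];
    }
    return p;
}
```

With this design the transition diagrams only need to recognize the id pattern; whether a lexeme is a keyword is decided afterwards by the lookup.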
2、Method for recognizing tokens
▪ Use transition diagrams
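3、Transition Diagram (stylized flowchart)
▪ Depicts the actions that take place when the lexical analyzer is called by the parser to get the next token

[Figure: transition diagram for >= and >. From the start state 0, the edge labeled > leads to state 6. From state 6, the edge labeled = leads to accepting state 7, which executes return(relop, GE); the edge labeled other leads to accepting state 8*, which executes return(relop, GT). The * marks an accepting state at which the lexical analyzer must retract the input by one character, because the "other" character it read is not part of the lexeme.]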
4、Implementing a Transition Diagram
▪ Each state gets a segment of code
▪ If there are edges leaving a state, then its code reads a character
and selects an edge to follow, if possible
▪ Use nextchar() to read next character from the input buffer
while (1) {
    switch (state) {
    case 0:
        c = nextchar();
        if (c == blank || c == tab || c == newline) {
            state = 0;              /* skip white space */
            lexeme_beginning++;
        }
        else if (c == '<') state = 1;
        else if (c == '=') state = 5;
        else if (c == '>') state = 6;
        else state = fail();        /* no edge: try the next diagram */
        break;
    case 9:
        c = nextchar();
        if (isletter(c)) state = 10;
        else state = fail();
        break;
    …
    }   /* end switch */
}       /* end while */
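The loop above can be completed into a runnable sketch. Assumptions, since the slides show only part of the diagrams: the input is a NUL-terminated string indexed by forward instead of a buffer read through nextchar(); the relop states follow the standard textbook diagram (state 1: <=, <>, <; state 5: =; state 6: >=, >) and the identifier states are 9, 10, 11*; fail() is modeled as resetting forward and jumping to state 9. The names scan, emit, struct result and the enum constants are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; the relop attributes follow the diagram's names. */
enum token { RELOP, ID, ERROR, END };
enum attr  { LT, LE, EQ, NE, GT, GE, NONE };

struct result { enum token tok; enum attr op; size_t begin, len; };

/* Package the lexeme buf[begin, forward) and advance the caller's position. */
static struct result emit(enum token tok, enum attr op,
                          size_t begin, size_t forward, size_t *pos)
{
    struct result r = { tok, op, begin, forward - begin };
    *pos = forward;
    return r;
}

/* Relop diagram (states 0 to 8) followed by the identifier diagram (9 to 11).
   A starred state retracts one character by returning forward - 1. */
struct result scan(const char *buf, size_t *pos)
{
    size_t lexeme_beginning = *pos, forward = *pos;
    int state = 0;
    char c;

    for (;;) {
        switch (state) {
        case 0:
            c = buf[forward++];
            if (c == ' ' || c == '\t' || c == '\n') lexeme_beginning++;   /* skip blank */
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else if (c == '\0') return emit(END, NONE, forward - 1, forward - 1, pos);
            else { forward = lexeme_beginning; state = 9; }               /* fail() */
            break;
        case 1:
            c = buf[forward++];
            if (c == '=') return emit(RELOP, LE, lexeme_beginning, forward, pos);  /* 2 */
            if (c == '>') return emit(RELOP, NE, lexeme_beginning, forward, pos);  /* 3 */
            return emit(RELOP, LT, lexeme_beginning, forward - 1, pos);           /* 4* */
        case 5:
            return emit(RELOP, EQ, lexeme_beginning, forward, pos);
        case 6:
            c = buf[forward++];
            if (c == '=') return emit(RELOP, GE, lexeme_beginning, forward, pos);  /* 7 */
            return emit(RELOP, GT, lexeme_beginning, forward - 1, pos);           /* 8* */
        case 9:
            c = buf[forward++];
            if (isalpha((unsigned char)c)) state = 10;
            else return emit(ERROR, NONE, lexeme_beginning, forward, pos);   /* no diagram matched */
            break;
        case 10:
            c = buf[forward++];
            if (!isalnum((unsigned char)c))
                return emit(ID, NONE, lexeme_beginning, forward - 1, pos);     /* 11* */
            break;
        }
    }
}
```

For the input x1 <= y<>z> successive calls return ID, RELOP(LE), ID, RELOP(NE), ID, RELOP(GT) and then END. An unmatched character is returned as a one-character ERROR so that scanning can continue.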
5、A generalized transition diagram: the Finite Automaton
▪ A finite automaton (FA) may be deterministic or non-deterministic
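e.g. The FA for identifiers is:

▪ which represents the rule: identifier = letter (letter | digit)*

[Figure: state 1 is the start state; an edge labeled letter leads to accepting state 2, which has self-loops labeled letter and digit.]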
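The identifier FA can be simulated with a transition table indexed by state and character class. A sketch under assumptions: an explicit DEAD state stands for "no edge", which the figure leaves implicit, and the character classes and names (id_move, fa_identifier) are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* States of the identifier FA: 1 = start, 2 = accepting, 0 = DEAD (no edge). */
enum { DEAD = 0, START = 1, ACCEPT = 2 };
enum { LETTER, DIGIT, OTHER };          /* character classes labelling the edges */

static const int id_move[3][3] = {
    /*             LETTER  DIGIT   OTHER */
    /* DEAD   */ { DEAD,   DEAD,   DEAD },
    /* START  */ { ACCEPT, DEAD,   DEAD },
    /* ACCEPT */ { ACCEPT, ACCEPT, DEAD },
};

static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;
}

/* Accept s iff it matches letter (letter | digit)*. */
bool fa_identifier(const char *s)
{
    int state = START;
    for (; *s != '\0' && state != DEAD; s++)
        state = id_move[state][char_class(*s)];
    return state == ACCEPT;
}
```

Unlike the hand-coded switch, the table-driven form separates the automaton (data) from the simulator (code), which is the form a lexical analyzer generator typically emits.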
SECTION 4 FINITE AUTOMATA
1、Usage of FA
▪ Precisely recognize regular sets
▪ A regular set is the set of strings denoted by a regular expression

2、Kinds of FA
▪ Deterministic FA (DFA)
▪ Non-deterministic FA (NFA)
3、Deterministic FA (DFA)
A DFA is a quintuple M = (S, Σ, move, s0, F)
▪ S: a set of states
▪ Σ: the input symbol alphabet
▪ move: a transition function mapping S × Σ to S, move(s, a) = s′
▪ s0: the start state, s0 ∈ S
▪ F: a set of states distinguished as accepting states, F ⊆ S
Notes: 1) In a DFA, no state has an ε-transition.
2) In a DFA, for each state s and input symbol a, there is at most one edge labeled a leaving s.
3) To describe an FA, we use a transition graph or a transition table.
4) A DFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state whose edge labels spell out x.
e.g. DFA M = ({0,1,2,3}, {a,b}, move, 0, {3})
move: move(0,a)=1  move(0,b)=2  move(1,a)=3  move(1,b)=2
      move(2,a)=1  move(2,b)=3  move(3,a)=3  move(3,b)=3

Transition table:
state   a   b
0       1   2
1       3   2
2       1   3
3       3   3

[Figure: transition graph of M, with one edge per table entry; 0 is the start state and 3 is the only accepting state.]
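The transition table of M can be simulated directly; following the edges shows that M accepts exactly the strings over {a, b} that contain aa or bb (state 1 means "last symbol a", state 2 "last symbol b", state 3 "a double has been seen"). A sketch; move_M and dfa_accepts are hypothetical names, and symbols outside {a, b} are rejected by assumption.

```c
#include <stdbool.h>

/* Transition table of M = ({0,1,2,3}, {a,b}, move, 0, {3}). */
static const int move_M[4][2] = {
    /*        a  b */
    /* 0 */ { 1, 2 },
    /* 1 */ { 3, 2 },
    /* 2 */ { 1, 3 },
    /* 3 */ { 3, 3 },
};

/* Return true iff M accepts x. */
bool dfa_accepts(const char *x)
{
    int s = 0;                          /* start state s0 = 0 */
    for (; *x != '\0'; x++) {
        if (*x != 'a' && *x != 'b')
            return false;               /* symbol not in the alphabet */
        s = move_M[s][*x - 'a'];        /* 'a' -> column 0, 'b' -> column 1 */
    }
    return s == 3;                      /* F = {3} */
}
```

For example, abba follows 0 → 1 → 2 → 3 → 3 and is accepted, while abab ends in state 2 and is rejected.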
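e.g. Construct a DFA M that accepts the strings which begin with a or b, or which begin with c and contain at most one a.

[Figure: transition graph of M: 0 goes to 1 on a or b; 1 loops on a, b and c; 0 goes to 2 on c; 2 loops on b and c; 2 goes to 3 on a; 3 loops on b and c. States 1, 2 and 3 are accepting.]

So, the DFA is
M = ({0,1,2,3}, {a,b,c}, move, 0, {1,2,3})
move: move(0,a)=1  move(0,b)=1  move(0,c)=2
      move(1,a)=1  move(1,b)=1  move(1,c)=1
      move(2,a)=3  move(2,b)=2  move(2,c)=2
      move(3,b)=3  move(3,c)=3
(move(3,a) is undefined, so a second a after a leading c causes the string to be rejected.)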
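Because move(3,a) is undefined, this DFA is partial; a simulator has to treat a missing edge as immediate rejection. A sketch in which NONE marks a missing edge; move_M2 and dfa2_accepts are hypothetical names.

```c
#include <stdbool.h>

#define NONE (-1)   /* missing edge, e.g. move(3, a) */

/* move of M = ({0,1,2,3}, {a,b,c}, move, 0, {1,2,3}); columns a, b, c. */
static const int move_M2[4][3] = {
    /* 0 */ { 1,    1, 2 },
    /* 1 */ { 1,    1, 1 },
    /* 2 */ { 3,    2, 2 },
    /* 3 */ { NONE, 3, 3 },
};

bool dfa2_accepts(const char *x)
{
    int s = 0;                          /* start state 0 */
    for (; *x != '\0'; x++) {
        if (*x < 'a' || *x > 'c')
            return false;               /* not in the alphabet {a, b, c} */
        s = move_M2[s][*x - 'a'];
        if (s == NONE)
            return false;               /* no edge: reject immediately */
    }
    return s != 0;                      /* F = {1, 2, 3}: every state except 0 */
}
```

For example, cbac follows 0 → 2 → 2 → 3 → 3 and is accepted, while caa reaches state 3 and then finds no edge for the second a.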
