Chapter 2

Uploaded by

Senay Mekonnen

Department of Computer Science

Compiler Design (COSC 4103)

2. Lexical Analysis

Lexical Analyzer
[Figure: source program → lexical analyzer → parser. The lexical analyzer sends a token to the parser; the parser requests "get next token"; both consult the symbol table.]
LEXICAL ANALYZER
• Scan Input
• Remove WS, NL, …
• Identify Tokens
• Create Symbol Table
• Insert Tokens into ST
• Generate Errors
• Send Tokens to Parser
PARSER
• Perform Syntax Analysis
• Actions Dictated by Token Order
• Update Symbol Table Entries
• Create Abstract Rep. of Source
• Generate Errors
TOKEN, PATTERN and LEXEME
What are Major Terms for Lexical Analysis?
TOKEN
A classification for a common set of strings
Examples Include Identifier, Integer, Float, Assign,
etc.
PATTERN
The rules which characterize the set of strings for a token
• EX: integers [0-9]+
Recall File and OS Wildcards ([A-Z]*.*)
LEXEME
Actual sequence of characters that matches pattern and is classified by a token
Identifiers: x, count, name, etc…
Integers: 345, 20, -12, etc.
TOKEN, PATTERN and LEXEME (continued)

Token      Sample Lexemes         Informal Description of Pattern
const      const                  const
if         if                     characters i, f
else       else                   characters e, l, s, e
relation   <, <=, =, <>, >, >=    < or <= or = or <> or >= or >
id         pi, count, D2          letter followed by letters and digits
num        3.1416, 0, 6.02E23     any numeric constant
literal    "core dumped"          any characters between " and " except "

The Token column classifies the lexeme; the last column gives the pattern.
Actual values are critical. Info is:
1. Stored in symbol table
2. Returned to parser
Examples of Non-Tokens
• comment: /* do not change */
• preprocessor directive: #include <stdio.h>
• preprocessor directive: #define NUM 5
• blanks, tabs, newlines
Handling Lexical Errors
• Error handling is very localized with respect to the input source
• For example: whil ( x := 0 ) do generates no lexical errors, because whil is a valid identifier lexeme; the misspelled keyword can only be detected later, by the parser
• In what situations do errors occur?
  • A prefix of the remaining input doesn't match any defined token
• Possible error recovery actions:
  • Deleting or inserting input characters
  • Replacing or transposing characters
  • Or, skipping over to the next separator to "ignore" the problem
Input Buffering
• To find the end of a token, the LA may need to go one or more characters beyond the next lexeme
  • E.g., to find ID or >, =, ==
Buffer Pairs
• Concerned with efficiency issues
• Used with a lookahead on the input
[Figure: Using a pair of input buffers. The buffers hold "E = M * C * * 2 eof"; the lexemeBegin pointer marks the start of the current lexeme and the forward pointer scans ahead.]
Basic Scanning technique
Use 1 character of look-ahead
Obtain char with getc()
Do a case analysis
Based on lookahead char
Based on current lexeme
Outcome
If char can extend lexeme, all is well, go on.
If char cannot extend lexeme:
Figure out what the complete lexeme is and
return its token
Put the lookahead back into the symbol stream
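
The one-character lookahead loop described above can be sketched in C as follows. This is only an illustration under stated assumptions: the Cursor type and the helpers next_char, put_back and scan_word are hypothetical names, and an in-memory string stands in for the getc() input stream so the sketch is easy to test.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical cursor over an in-memory source string (stands in for getc). */
typedef struct {
    const char *text;
    size_t pos;
} Cursor;

/* Returns the next character, or -1 at end of input (position unchanged). */
int next_char(Cursor *c) {
    unsigned char ch = (unsigned char)c->text[c->pos];
    if (ch == '\0') return -1;
    c->pos++;
    return ch;
}

/* Puts the most recently read character back into the input. */
void put_back(Cursor *c) {
    assert(c->pos > 0);
    c->pos--;
}

/* Scans one identifier or integer lexeme starting at the cursor.
   Returns the lexeme length, or 0 if the next char starts neither. */
size_t scan_word(Cursor *c) {
    size_t start = c->pos;
    int ch = next_char(c);
    if (ch == -1) return 0;
    if (isalpha(ch)) {
        /* lookahead extends the lexeme while it is a letter or digit */
        while ((ch = next_char(c)) != -1 && isalnum(ch)) { }
    } else if (isdigit(ch)) {
        while ((ch = next_char(c)) != -1 && isdigit(ch)) { }
    } else {
        put_back(c);          /* not the start of a word: consume nothing */
        return 0;
    }
    if (ch != -1) put_back(c); /* lookahead could not extend the lexeme */
    return c->pos - start;
}
```

Calling scan_word on a cursor over "count+1" consumes the five characters of count and leaves the cursor on +, because the lookahead character is put back once it fails to extend the lexeme.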

Lexical Analyzer: Implementation Approaches

General approaches to implementing a Lexical Analyzer (LA):

1. Use a tool such as Lex
2. Write the LA in a high-level programming language
3. Write the LA in assembly language (difficult but efficient)
Formalizing Token Definition
DEFINITIONS:
ALPHABET: Finite set of symbols, e.g. {0,1}, or {a,b,c}, or {n,m, … , z}
STRING: Finite sequence of symbols from an alphabet, e.g. 0011 or abbca or AABBC …
A.K.A. word / sentence
If S is a string, then |S| is the length of S, i.e. the number of symbols in the string S.
ε: Empty String, with |ε| = 0
Language Concepts
A language, L, is simply any set of strings over a fixed alphabet.

Alphabet                          Languages
{0,1}                             {0,10,100,1000,100000…}
                                  {0,1,00,11,000,111,…}
{a,b,c}                           {abc,aabbcc,aaabbbccc,…}
{A, … ,Z}                         {TEE,FORE,BALL,…}
                                  {FOR,WHILE,GOTO,…}
{A,…,Z,a,…,z,0,…9,+,-,…,<,>,…}    { All legal C/C++ progs}
                                  { All grammatically … }

Special Languages:
∅ - EMPTY LANGUAGE
{ε} - contains ε string only
Language & Regular Expressions
• A Regular Expression is a set of rules / techniques for constructing sequences of symbols (strings) from an alphabet.
• Let Σ be an alphabet and r a regular expression. Then L(r) is the language that is characterized by the rules of r.
Towards Token Definition
Regular Definitions: Associate names with Regular Expressions
For Example: C/C++ IDs (simplified: real C/C++ identifiers also allow the underscore)
letter → A | B | C | … | Z | a | b | … | z
digit → 0 | 1 | 2 | … | 9
id → letter ( letter | digit )*
Shorthand Notation:
"+" : one or more, so r* = r+ | ε and r+ = r r*
"?" : zero or one
[range] : set range of characters (replaces "|"), e.g. [A-Z] = A | B | C | … | Z
Using Shorthand: C/C++ IDs
id → [A-Za-z][A-Za-z0-9]*
We'll Use Both Techniques
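
As a rough illustration of the regular definition id → letter ( letter | digit )*, the following C sketch checks whether a whole string is an identifier. The function names is_letter, is_digit and is_identifier are hypothetical, and the check deliberately mirrors the simplified slide definition (no underscore).

```c
#include <assert.h>
#include <stddef.h>

/* letter -> A | ... | Z | a | ... | z */
int is_letter(char ch) {
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
}

/* digit -> 0 | 1 | ... | 9 */
int is_digit(char ch) {
    return ch >= '0' && ch <= '9';
}

/* id -> letter ( letter | digit )*
   Returns 1 if the whole string s matches, 0 otherwise. */
int is_identifier(const char *s) {
    assert(s != NULL);
    if (!is_letter(s[0])) return 0;          /* must start with a letter */
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!is_letter(s[i]) && !is_digit(s[i])) return 0;
    }
    return 1;
}
```

With this sketch, pi, count and D2 are accepted, while 2D and the empty string are rejected.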
Token Recognition
How can we use concepts developed so far to assist in recognizing tokens of a source language?
Assume Following Tokens: if, then, else, relop, id, num
What language construct are they used for?
Given Tokens, What are Patterns?
if → if
then → then
else → else
relop → < | <= | > | >= | = | <>
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
What does this represent?
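
One way to read the num pattern is as three parts: a mandatory integer part, an optional fraction, and an optional exponent. The C sketch below follows that reading. The names skip_digits and is_num are hypothetical, and the function matches a whole string, whereas a real lexer would stop at the longest matching prefix.

```c
#include <assert.h>
#include <stddef.h>

/* Skips a run of digits starting at s[i]; returns the index after the run
   (equal to i when there is no digit at s[i]). */
size_t skip_digits(const char *s, size_t i) {
    while (s[i] >= '0' && s[i] <= '9') i++;
    return i;
}

/* num -> digit+ (. digit+)? ( E (+ | -)? digit+ )?
   Returns 1 if the whole string s matches, 0 otherwise. */
int is_num(const char *s) {
    assert(s != NULL);
    size_t i = skip_digits(s, 0);
    if (i == 0) return 0;                 /* digit+ is mandatory */
    if (s[i] == '.') {                    /* optional fraction: . digit+ */
        size_t j = skip_digits(s, i + 1);
        if (j == i + 1) return 0;         /* '.' must be followed by a digit */
        i = j;
    }
    if (s[i] == 'E') {                    /* optional exponent */
        size_t k = i + 1;
        if (s[k] == '+' || s[k] == '-') k++;
        size_t j = skip_digits(s, k);
        if (j == k) return 0;             /* exponent needs at least one digit */
        i = j;
    }
    return s[i] == '\0';                  /* nothing may follow the number */
}
```

Under this sketch 3.1416, 0 and 6.02E23 all match, while 3. and .5 do not, since both the integer part and the fraction digits are required by digit+.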
Constructing Transition Diagrams for Tokens
• Transition Diagrams (TD) are used to represent
the tokens – these are automatons!
• As characters are read, the relevant TDs are
used to attempt to match lexeme to a pattern
• Each TD has:
• States : Represented by Circles
• Actions : Represented by Arrows between states
• Start State : Beginning of a pattern (Arrowhead)
• Final State(s) : End of pattern (Concentric Circles)

• Each TD is Deterministic: no need to choose between 2 different actions!
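Example: All RELOPs and Id
relop:
state 0 (start): on < go to 1, on = go to 5, on > go to 6
state 1: on = go to 2, on > go to 3, on other go to 4
state 2 (final): return(relop, LE)
state 3 (final): return(relop, NE)
state 4 (final) *: return(relop, LT)
state 5 (final): return(relop, EQ)
state 6: on = go to 7, on other go to 8
state 7 (final): return(relop, GE)
state 8 (final) *: return(relop, GT)

id:
state 0 (start): on letter go to 1
state 1: on letter or digit stay in 1, on other go to 2
state 2 (final) *: return(id, lexeme)

(* marks a state where the lookahead character must be put back into the input)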
Important Final Notes on Transition Diagrams & Lexical Analyzers
• How does this work?
• What does the whitespace branch do?

state = 0;
token nexttoken()
{   while (1) {
        switch (state) {
        case 0: c = nextchar();
            /* c is lookahead character */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;
                /* advance beginning of lexeme */
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        …
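
To make the switch-on-state idea concrete, here is a self-contained C sketch of the relop transition diagram (states 0 to 8). It is an illustration rather than the deck's own implementation: the Relop enum, the function name match_relop and the use of a string with an explicit length result are assumptions made so the example can stand alone.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RELOP_NONE, LT, LE, EQ, NE, GT, GE } Relop;

/* Recognizes a relop at the start of s by walking states 0..8 of the diagram.
   *len receives the lexeme length. States 4 and 8 are reached on "other",
   so the lookahead is not consumed there (the retract step). */
Relop match_relop(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    int state = 0;
    size_t i = 0;                       /* index of the lookahead character */
    for (;;) {
        char c = s[i];
        switch (state) {
        case 0:
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else { *len = 0; return RELOP_NONE; }   /* not a relop */
            i++;
            break;
        case 1:
            if (c == '=') { state = 2; i++; }
            else if (c == '>') { state = 3; i++; }
            else state = 4;             /* other: do not consume */
            break;
        case 2: *len = i; return LE;
        case 3: *len = i; return NE;
        case 4: *len = i; return LT;
        case 5: *len = i; return EQ;
        case 6:
            if (c == '=') { state = 7; i++; }
            else state = 8;             /* other: do not consume */
            break;
        case 7: *len = i; return GE;
        case 8: *len = i; return GT;
        }
    }
}
```

For the input "<x" the sketch returns LT with length 1, leaving x for the next token, which is the effect of the starred states in the diagram.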
Tokens / Patterns / Regular Expressions
Lexical Analysis - searches for matches of lexeme to pattern
Lexical Analyzer returns: <actual lexeme, symbolic identifier of token>
For Example:

Token           Symbolic ID
if              1
then            2
else            3
>, >=, <, …     4
:=              5
id              6
int             7
real            8

Set of all regular expressions plus symbolic ids plus analyzer define required functionality.
REs --- NFA --- DFA (program for …)

Automata & Language Theory
• Terminology
  • FSA
    A recognizer that takes an input string and determines whether it's a valid string of the language.
  • Non-Deterministic FSA (NFA)
    Has several alternative actions for the same input symbol
  • Deterministic FSA (DFA)
    Has at most 1 action for any given input symbol
• Bottom Line
  • expressive power(NFA) == expressive power(DFA)
  • Conversion can be automated
Finite Automata & Language Theory
Finite Automata: A recognizer that takes an input string & determines whether it's a valid sentence of the language
Non-Deterministic: Has more than one alternative action for the same input symbol. Can't be simulated by simply following one transition at a time!
Deterministic: Has at most one action for a given input symbol.
Both types are used to recognize regular expressions.
Representing NFAs
Transition Diagrams: number states (circles), arcs, final states, …
Transition Tables: more suitable to representation within a computer

Fig 3:
S = { 0, 1, 2, 3 }    s0 = 0    F = { 3 }    Σ = { a, b }
[Diagram: start → 0; state 0 loops to itself on a and on b; 0 → 1 on a; 1 → 2 on b; 2 → 3 on b]

Transition table (rows are states, columns are input symbols):
state    a          b
0        { 0, 1 }   { 0 }
1        --         { 2 }
2        --         { 3 }

ε (null) moves possible: i → j on ε
Switch state but do not use any input symbol
NFA - Regular Expressions & Compilation
Problems with NFAs for Regular Expressions:
1. Valid input might not be accepted along every path
2. NFA may behave differently on the same input
Example: for Fig 3, aabb is accepted along the path 0 → 0 → 1 → 2 → 3
BUT… it is not accepted along the equally valid path 0 → 0 → 0 → 0 → 0
(An NFA accepts a string if at least one path ends in a final state.)
Relationship of NFAs to Compilation:
1. Regular expression "recognized" by NFA
2. Regular expression is "pattern" for a "token"
3. Tokens are building blocks for lexical analysis
4. Lexical analyzer can be described by a collection of NFAs. Each NFA is for a language token.
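
The path problem above goes away if the simulator tracks every state the NFA could be in at once. The following C sketch does this for the Fig 3 automaton, using the transition table given earlier. The bitmask encoding and the names nfa_move and nfa_accepts are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sets of NFA states are bitmasks: bit i set means "state i is in the set".
   Transitions follow the Fig 3 table. */
unsigned nfa_move(unsigned states, char c) {
    unsigned next = 0;
    if (c == 'a') {
        if (states & (1u << 0)) next |= (1u << 0) | (1u << 1); /* 0 on a: {0,1} */
    } else if (c == 'b') {
        if (states & (1u << 0)) next |= 1u << 0;               /* 0 on b: {0} */
        if (states & (1u << 1)) next |= 1u << 2;               /* 1 on b: {2} */
        if (states & (1u << 2)) next |= 1u << 3;               /* 2 on b: {3} */
    }
    return next;   /* any other symbol leads to the empty set */
}

/* Returns 1 if at least one path over the input ends in final state 3. */
int nfa_accepts(const char *input) {
    assert(input != NULL);
    unsigned current = 1u << 0;                /* start in { s0 } = { 0 } */
    for (size_t i = 0; input[i] != '\0'; i++) {
        current = nfa_move(current, input[i]);
    }
    return (current & (1u << 3)) != 0;
}
```

On aabb the set of possible states evolves as {0}, {0,1}, {0,1}, {0,2}, {0,3}, so the string is accepted even though the path that stays in state 0 is not an accepting one.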
Deterministic Finite Automata (DFA)
• A DFA is an NFA with a few restrictions
  • No epsilon transitions
  • For every state s, there is only one transition (s,x) from s for any symbol x in Σ
• Corollaries
  • Easy to implement a DFA with an algorithm!
  • Deterministic behavior
NFA to DFA Conversion
• Look at the states reachable without consuming any input, and aggregate them into macro states
• A state is final IFF one of the NFA …
Deterministic Finite Automata
A DFA is an NFA with the following restrictions:
• ε moves are not allowed
• For every state s ∈ S, there is one and only one path from s for every input symbol a ∈ Σ.
Since transition tables don't have any alternative options, DFAs are easily simulated via an algorithm:
    s ← s0
    c ← nextchar;
    while c ≠ eof do
        s ← move(s,c);
        c ← nextchar;
    end;
    if s is in F then return "yes"
    else return "no"
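
The simulation algorithm above translates almost line for line into a table-driven C function. The sketch below uses a small DFA for the id pattern as its example. The character classes, the explicit DEAD state (added so that move is defined for every input) and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { LETTER, DIGIT, OTHER, NUM_CLASSES };          /* input symbol classes */
enum { START = 0, IN_ID = 1, DEAD = 2, NUM_STATES }; /* F = { IN_ID } */

/* move(s, c) as a transition table: one entry per (state, class) pair. */
const int MOVE[NUM_STATES][NUM_CLASSES] = {
    /* START */ { IN_ID, DEAD,  DEAD },
    /* IN_ID */ { IN_ID, IN_ID, DEAD },
    /* DEAD  */ { DEAD,  DEAD,  DEAD },
};

/* Maps a character to its input class. */
int char_class(char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return LETTER;
    if (c >= '0' && c <= '9') return DIGIT;
    return OTHER;
}

/* s <- s0; while c != eof do s <- move(s, c); answer "yes" iff s is in F. */
int dfa_accepts(const char *input) {
    assert(input != NULL);
    int s = START;
    for (size_t i = 0; input[i] != '\0'; i++) {   /* '\0' plays the role of eof */
        s = MOVE[s][char_class(input[i])];
    }
    return s == IN_ID;
}
```

Because each table entry holds exactly one next state, the loop never has to choose between alternatives, which is the property that makes DFAs easy to simulate.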

Conversion: NFA → DFA Algorithm
• Algorithm constructs a transition table for the DFA from the NFA
• Each state in the DFA corresponds to a SET of states of the NFA
• Why does this occur?
  • ε moves
  • non-determinism
Both require us to characterize multiple situations that occur for accepting the same string.
(Recall: Same input can have multiple paths …)
Converting NFA to DFA
[Figure: NFA with states 0 to 8. Labeled arcs: 2 → 3 on a, 3 → 4 on b, 6 → 7 on c; the remaining arcs are ε moves.]
From State 0, where can we move without consuming any input?
This forms a new state: 0,1,2,6,8. What transitions are defined for this new state?
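
The set 0,1,2,6,8 is the ε-closure of state 0. The C sketch below computes such closures by repeatedly adding ε-successors until nothing changes. Note the assumption: the ε arcs are not legible in the figure, so the edge list used here (0 to 1 and 8, 1 to 2 and 6, 4 to 5, 7 to 5, 5 to 1 and 8) is inferred from the usual construction for (ab|c)* and from the closure stated above. The names EPS and eps_closure are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NFA_STATES 9

/* ASSUMED epsilon edges (inferred, not read from the figure).
   EPS[s] is a bitmask of the states reachable from s by one epsilon move. */
const unsigned EPS[NFA_STATES] = {
    [0] = (1u << 1) | (1u << 8),
    [1] = (1u << 2) | (1u << 6),
    [4] = (1u << 5),
    [5] = (1u << 1) | (1u << 8),
    [7] = (1u << 5),
};

/* Returns the epsilon-closure of a set of states (bitmask) by iterating
   until a fixed point is reached. */
unsigned eps_closure(unsigned states) {
    assert((states >> NFA_STATES) == 0);     /* only states 0..8 are valid */
    unsigned closure = states;
    for (;;) {
        unsigned grown = closure;
        for (int s = 0; s < NFA_STATES; s++) {
            if (closure & (1u << s)) grown |= EPS[s];
        }
        if (grown == closure) return closure; /* nothing new was added */
        closure = grown;
    }
}
```

Under the assumed edges, the closure of {0} is {0,1,2,6,8}, matching the new DFA state named above; each DFA state in the subset construction is built from closures like this one.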
