Chapter 2 - Lexical Analyser

The document discusses the role and construction of a lexical analyzer. A lexical analyzer reads source code character by character and generates tokens. It uses regular expressions to specify patterns that define tokens. Tools like Lex can automatically generate a lexical analyzer from a specification file defining the patterns and translation rules. The generated analyzer scans the input and returns tokens to the parser based on the defined patterns. It helps tokenize programs into basic elements like identifiers, keywords, and operators.

Uploaded by

bekalu alemayehu

Compiler Design

Instructor: Mohammed O.
Email: [email protected]
Samara University
Chapter Two
This Chapter Covers:
Role of lexical analyser
Token Specification and Recognition
NFA to DFA
Lexical Analyzer
A lexical analyzer reads the source program character by
character to produce tokens.
Normally a lexical analyzer does not return the whole list of
tokens in one shot; it returns one token each time the parser
asks for one.

Token
A token represents a set of strings described by a pattern.
For example, the token identifier represents the set of strings
that start with a letter and continue with letters and digits.
Lexeme: a sequence of characters in the source program that is
matched by the pattern for a token.
Tokens: identifier, number, addop, delimiter, …
Since a token can represent more than one lexeme,
additional information should be held for that specific
lexeme. This additional information is called the
attribute of the token.
For simplicity, a token may have a single attribute which
holds the required information for that token.
For identifiers, this attribute is a pointer to the symbol table,
and the symbol table holds the actual attributes for that
token.
Token (Cont.)
Some attributes:
<id,attr> where attr is pointer to the symbol table
<assgop,_> no attribute is needed (if there is only one
assignment operator)
<num,val> where val is the actual value of the number.

The token type together with its attribute uniquely identifies a lexeme.

Regular expressions are widely used to specify patterns.
Scanner
A scanner groups input characters into tokens.
For example, if the input is:
x = x*(b+1); then the scanner generates the following
sequence of tokens
id(x), =, id(x), *, (, id(b), +, num(1), ), ;
where id(x) indicates the identifier with name x (a
programme variable in this case) and num(1) indicates the
integer 1.
Each time the parser needs a token, it sends a request to
the scanner.
Then, the scanner reads as many characters from the input
stream as are necessary to construct a single token.
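The request-driven behaviour described above can be sketched with a Python generator. This is only an illustration, not the scanner used in the course: the token names and patterns in TOKEN_SPEC are assumptions chosen to reproduce the token sequence for x = x*(b+1); shown earlier.

```python
import re

# Hypothetical token patterns for the example input (not from the slides).
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("op", r"[=*+();]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Yield one token per request, reading only as far as needed."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # The scanner reports an error instead of producing a token.
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "op":
            yield m.group()               # operators need no attribute
        else:
            yield f"{kind}({m.group()})"  # token type plus its attribute


tokens = scan("x = x*(b+1);")
first = next(tokens)   # the "parser" asks for a single token
rest = list(tokens)    # ... and later for all remaining tokens
```

Because scan is a generator, it is suspended after each yield and resumes only when the caller asks for the next token, which mirrors the parser/scanner interaction on this slide.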
Scanner (Cont.)
The scanner may report an error during scanning.
Otherwise, when a single token is formed, the scanner is
suspended (it temporarily stops running) and returns the token
to the parser.

The parser calls the scanner repeatedly until all the tokens
have been read from the input stream or until an error is
detected (such as a syntax error).
Some tokens require some extra information.
For example, an identifier is a token (so it is represented by
some number) but it is also associated with a string that
holds the identifier name.
Scanner (Cont.)
For example, the token id(x) is associated with the string, "x".
Similarly, the token num(1) is associated with the number, 1.
Tokens are specified by patterns, called regular expressions.
For example, the regular expression [a-z][a-zA-Z0-9]*
recognises all identifiers of at least one character whose first
character is a lower-case letter and whose remaining characters
are letters or digits.
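As a quick check of this pattern, the sketch below tests a few candidate strings with Python's re module. The sample strings and the helper name is_identifier are made up for the illustration.

```python
import re

# The identifier pattern from the slide.
IDENT = re.compile(r"[a-z][a-zA-Z0-9]*")


def is_identifier(s):
    """True if the whole string s matches the identifier pattern."""
    return IDENT.fullmatch(s) is not None


# Hypothetical sample lexemes.
results = {s: is_identifier(s) for s in ["x", "newVal2", "Count", "9lives", ""]}
```

Here "x" and "newVal2" are accepted, while "Count" (upper-case first letter), "9lives" (starts with a digit) and the empty string are rejected.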

A typical scanner:
recognises the keywords of the language (these are the
reserved words that have a special meaning in the language,
such as the word class in Java);
recognises and processes special directives (such as the
#include "file" directive in C);
Scanner (Cont.)
recognises special characters, such as parentheses ( and ),
or groups of special characters, such as := (equal by
definition) and ==;
recognises identifiers, integers, reals, decimals, strings, etc;
ignores whitespaces and comments;

Efficient scanners can be built using regular expressions
and finite automata.
There are automated tools called scanner generators, such
as flex (Fast Lexical Analyzer Generator) for C and JLex
for Java, which construct a fast scanner automatically
according to specifications (regular expressions).
Role of Lexical Analyser
The lexical analyzer performs the following tasks:
• Reads input characters from the source program.
• Removes white space and comments from the source program.
• Correlates (associates) error messages with the source
program, for example by line number.
• Helps to enter identified tokens, such as identifiers, into the symbol table.
Example: source code for which a symbol table is built:
// Define a global function
int add(int a, int b) {
    int sum = 0;
    sum = a + b;
    return sum;
}
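A minimal sketch of how identifiers from this snippet could be collected into a symbol table. The layout used here (name mapped to the line of first appearance) is an assumption for illustration; a real symbol table also records information such as type and scope.

```python
import re

SOURCE = """\
// Define a global function
int add(int a, int b) {
    int sum = 0;
    sum = a + b;
    return sum;
}
"""

KEYWORDS = {"int", "return"}  # only the keywords this snippet uses


def build_symbol_table(source):
    """Map each identifier to the line on which it first appears."""
    table = {}
    for line_no, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop line comments
        for name in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
            if name not in KEYWORDS and name not in table:
                table[name] = line_no
    return table


symbols = build_symbol_table(SOURCE)
```

Keywords and the comment are skipped, so the table ends up with one entry each for add, a, b and sum.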
Lexical Analysis
In lexical analysis, we read the source programme
character by character and group the characters into
tokens.
A token is the smallest unit recognisable by the compiler.

Generally, four classes of tokens are usually recognised:
1. Keywords
2. Identifiers
3. Constants
4. Delimiters
Construction of Lexical Analyser
There are two general ways to construct a lexical analyser:
Hand implementation
Automatic generation of the lexical analyser

Hand Implementation
There are two approaches to hand implementation:
Input buffer approach
Transition diagram approach
Input Buffering
The lexical analyser scans the characters of the source
programme one at a time to discover tokens.
Cont.
Often, many characters beyond the next token may have to be
examined before the next token itself can be determined.

For this and other reasons, it is desirable for the lexical
analyser to read its input from an input buffer.
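The need for lookahead can be sketched as follows. InputBuffer and scan_relop are hypothetical helpers written for this illustration; they show that after reading < the scanner must look at one more character before it knows whether the token is <, <= or <>.

```python
class InputBuffer:
    """Hypothetical helper: holds the source text and a forward pointer."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Look at the next character without consuming it ('' at end)."""
        return self.text[self.pos:self.pos + 1]

    def advance(self):
        """Consume and return the next character."""
        ch = self.peek()
        self.pos += 1
        return ch


def scan_relop(buf):
    """Assumes the next character is '<'; returns '<', '<=' or '<>'."""
    lexeme = buf.advance()          # consumes '<'
    if buf.peek() in ("=", ">"):    # the lookahead decides the token
        lexeme += buf.advance()
    return lexeme


a = scan_relop(InputBuffer("<=1"))
b = scan_relop(InputBuffer("<1"))
```

In the second call the character 1 is examined but not consumed, so it remains in the buffer for the next token.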
Operations on Languages
Concatenation: every string of the first language followed by
every string of the second.
$L_1 L_2 = \{\, s_1 s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \,\}$
Union: the strings that belong to at least one of the two
languages.
$L_1 \cup L_2 = \{\, s \mid s \in L_1 \text{ or } s \in L_2 \,\}$
Exponentiation: the language concatenated with itself a given
number of times.
$L^0 = \{\varepsilon\}, \quad L^1 = L, \quad L^2 = LL$
Kleene closure: zero or more concatenations of L, so the empty
string ε is always included.
$L^* = \bigcup_{i \ge 0} L^i$
Positive closure: one or more concatenations of L, so ε is
excluded unless it is already in L.
$L^+ = \bigcup_{i \ge 1} L^i$
Example
$L_1 = \{a,b,c,d\}$, $L_2 = \{1,2\}$

$L_1 L_2 = \{a1,a2,b1,b2,c1,c2,d1,d2\}$

$L_1 \cup L_2 = \{a,b,c,d,1,2\}$

$L_1^3$ = all strings of length three over {a,b,c,d}

$L_1^*$ (zero or more) = all strings over the letters a, b, c, d, including the
empty string ε.
For example, a* generates {ε, a, aa, aaa, …}
$L_1^+$ (one or more) = the same strings without the empty string ε.
For example, a+ generates {a, aa, aaa, …}
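The operations in this example can be checked with Python sets of strings. This is a sketch: the function names are made up for the illustration, and because the closures are infinite they are only computed up to a chosen power.

```python
def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {s1 + s2 for s1 in l1 for s2 in l2}


def power(lang, n):
    """Exponentiation: lang concatenated with itself n times."""
    result = {""}  # L^0 = {ε}; the empty string stands for ε
    for _ in range(n):
        result = concat(result, lang)
    return result


def closure_up_to(lang, max_power, positive=False):
    """Finite part of L* (or of L+ when positive=True) up to max_power."""
    start = 1 if positive else 0
    out = set()
    for i in range(start, max_power + 1):
        out |= power(lang, i)
    return out


L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

cat = concat(L1, L2)                               # L1L2
union = L1 | L2                                    # L1 ∪ L2
cube = power(L1, 3)                                # L1^3
star = closure_up_to({"a"}, 3)                     # part of a*
plus = closure_up_to({"a"}, 3, positive=True)      # part of a+
```

The set cube has 4 × 4 × 4 = 64 members, and the only difference between star and plus is the empty string.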
Regular Expressions
Regular Expressions (REs)
We use regular expressions to describe the tokens of a
programming language.

Regular expressions are a very convenient form of representing
(possibly infinite) sets of strings, called regular sets.

For example, the RE (a|b)*aa represents the infinite set
{"aa", "aaa", "baa", "abaa", ...}, which is the set of all
strings over the characters a and b that end in aa.
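The first few members of this regular set can be enumerated with a short Python sketch. The helper name members and the length bound are assumptions of the illustration.

```python
import re
from itertools import product

PATTERN = re.compile(r"(a|b)*aa")


def members(max_len):
    """All strings over {a, b} up to max_len that match the RE."""
    found = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if PATTERN.fullmatch(s):
                found.append(s)
    return found


shortest = members(4)
```

Every string found ends in aa, and the shortest one is aa itself because (a|b)* may match the empty string.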
RE Order of Precedence
We can freely put parentheses around REs to denote the
order of evaluation.

We can drop redundant parentheses by assuming:

the Kleene star operator * has the highest precedence and
is left associative
concatenation has the next highest precedence and is left
associative
the union operator | has the lowest precedence and is left
associative
For example, a|b*c means a|((b*)c).
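Python's re module follows the same precedence rules, so the effect of dropping parentheses can be checked by comparing the languages of two patterns on short strings. This is a sketch; the helper name language, the alphabet and the length bound are assumptions.

```python
import re
from itertools import product


def language(pattern, alphabet="abc", max_len=3):
    """Strings over alphabet, up to max_len, that match the pattern."""
    rx = re.compile(pattern)
    return {
        "".join(p)
        for n in range(max_len + 1)
        for p in product(alphabet, repeat=n)
        if rx.fullmatch("".join(p))
    }


implicit = language(r"a|b*c")      # parentheses dropped
explicit = language(r"a|((b*)c)")  # fully parenthesised
other = language(r"(a|b)*c")       # a different grouping
```

The first two sets are equal, while the third grouping describes a different language (for example, it contains ac but not a).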
Tokens/Patterns/Lexemes/Attributes
A token is a sequence of characters which represents a unit
of information in the source program.

Lexeme: a sequence of characters in the source program
that is matched by the pattern for a token.

A pattern is a rule describing the set of lexemes that can
represent a particular token in source programs.
In other words: a set of strings in the input for which the
same token is produced as output.
A pattern is given as a regular expression or a regular definition.
Cont.
An attribute of a token is usually a pointer to the symbol
table entry that gives additional information about the
token, such as its type, value, line number, etc.
Attributes are used to distinguish the different lexemes of a
token.
Example:
Lexeme            Token                 Pattern
int               keyword               int
if                keyword               if
<, <=, =, >, >=   relational operator   < or <= or = or > or >=
newval            identifier            a letter followed by letters and digits
Regular Expression
Writing a regular expression for some languages can be
difficult, because the expression can become quite complex.
In those cases, we may use regular definitions.

A regular definition is a sequence of definitions of the form
d1 → r1, d2 → r2, ..., dn → rn, where each di is a new name and
each ri is a regular expression over the alphabet and the
previously defined names. For example:

digit → [0-9] ---- any of the numerals from 0 to 9.

letter → [A-Za-z] ---- any upper- or lower-case letter.
id → letter ( letter | digit )* ---- a letter followed by any
number of letters or digits.
relop → < | > | <= | >= | = | <>
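Regular definitions translate naturally into named pattern strings that are composed into larger patterns. The sketch below does this with Python's re module; the variable names mirror the definitions above, and listing the two-character operators first in relop is a choice made here so that a prefix match prefers <= over <.

```python
import re

# Each name is defined once and reused in later definitions.
digit = r"[0-9]"
letter = r"[A-Za-z]"
ident = rf"{letter}({letter}|{digit})*"
relop = r"<=|>=|<>|<|>|="  # longer operators first (assumption of this sketch)

ID = re.compile(ident)
RELOP = re.compile(relop)

id_ok = ID.fullmatch("newval") is not None
id_bad = ID.fullmatch("2x") is not None
op = RELOP.match("<=5").group()
```

With this ordering, matching at the start of "<=5" yields the lexeme <= rather than stopping after <.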
Lexical Analyser generator Lex
There are tools that can generate lexical analyzers.
Lex is a special-purpose programming language for creating
programmes to process streams of input characters.
An input file, which we call lex.l, is written in the Lex
language and describes the lexical analyzer to be generated.

The Lex compiler transforms lex.l to a C program, in a file
that is always named lex.yy.c. The latter file is compiled by
the C compiler into a file called a.out, as always.
The C-compiler output is a working lexical analyzer that can
take a stream of input characters and produce a stream of
tokens.
Cont.

Creating a lexical analyzer with Lex

Lex Specifications
Lex source is separated into three sections by %% delimiters:
declarations :- declarations of variables, constants and
regular definitions.
%%
translation rules :- pattern/action pairs; each pattern is a
regular expression, and its action is the C code to run when the
pattern matches the input.
%%
auxiliary functions (optional) :- additional C functions used by
the actions.
Steps in lex implementation
1. Read input language specification
2. Construct NFA with epsilon-moves (Can also do DFA
directly)
3. Convert NFA to DFA
4. Optimise the DFA
5. Generate transition tables & scanner code
Finite Automata
A finite automaton can be deterministic (DFA) or
non-deterministic (NFA).
Both deterministic and non-deterministic finite automata
recognize regular sets.
Which one?
deterministic – faster recognizer, but it may take more
space
non-deterministic – slower, but it may take less space
Deterministic automata are widely used in lexical
analyzers.
First, we define regular expressions for tokens; then we
convert them into a DFA to get a lexical analyzer for our
tokens.
Finite Automata (Cont.)
Algorithm1: Regular Expression → NFA → DFA (two
steps: first to NFA, then to DFA)
Algorithm2: Regular Expression → DFA (directly
convert a regular expression into a DFA)
Converting a RE to an NFA
Every regular expression (RE) can be converted into an
equivalent NFA.

Every NFA can be converted into an equivalent DFA.

The task of a scanner generator, such as JLex, is to
generate the transition tables or to synthesise the scanner
programme given a scanner specification (in the form of a
set of REs).

This is accomplished in two steps: first it converts the REs
into an NFA and then it converts the NFA into a DFA
(Algorithm1).
Converting a RE to an NFA (Cont.)
An NFA is similar to a DFA but it also permits
multiple transitions over the same character and
transitions over ɛ.

In the case of multiple transitions from a state over the
same character, when we are at this state and we read this
character, we have more than one choice; the NFA
succeeds if at least one of these choices succeeds.
The ɛ-transition does not consume any input characters, so
you may jump to another state for free.
Clearly DFAs are a subset of NFAs.
Non-Deterministic Finite Automaton
A non-deterministic finite automaton (NFA) is a
mathematical model that consists of:
• S - a set of states
• Σ (sigma) - a set of input symbols (alphabet)
• move - a transition function that maps state-symbol pairs to sets of states
• s0 - a start (initial) state
• F - a set of accepting states (final states)

ε-transitions are allowed in NFAs. In other words, we can
move from one state to another one without consuming any
symbol.
Non-Deterministic Finite Automaton
An NFA accepts a string x if and only if there is a path from
the start state to one of the accepting states such that the edge
labels along this path spell out x.
NFA (Example)
0 is the start state s0
{2} is the set of final states F
Σ = {a,b}
S = {0,1,2}
Transition function:
State   a       b
0       {0,1}   {0}
1       ∅       {2}
2       ∅       ∅

The language recognized by this NFA is (a|b)*ab
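The acceptance condition can be checked by tracking the set of states the NFA could be in after each input symbol. The sketch below encodes the transition table of this example as a Python dictionary; the function name nfa_accepts is an assumption of the illustration.

```python
# Transition table of the example NFA; missing entries mean "no move".
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START, FINAL = 0, {2}


def nfa_accepts(word):
    """True if some path from the start state spelling word ends in FINAL."""
    current = {START}
    for ch in word:
        # Follow every possible transition from every current state.
        current = {t for s in current for t in NFA.get((s, ch), set())}
    return bool(current & FINAL)


accepted = [w for w in ["ab", "aab", "bbab", "a", "aba", ""] if nfa_accepts(w)]
```

Only the strings that end in ab are accepted, in line with the language (a|b)*ab.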

Transition Tables
We can also represent an NFA by a transition table, whose
rows correspond to states, and whose columns correspond
to the input symbols and ɛ.
The entry for a given state and input is the value of the
transition function applied to those arguments.
If the transition function has no information about that
state-input pair, we put ∅ (the empty set) in the table for the pair.

Transition table for the NFA of RE (a|b)*abb

Deterministic Finite Automaton (DFA)
A Deterministic Finite Automaton (DFA) is a special form
of an NFA:
• no state has an ε-transition
• for each state s and input symbol a there is exactly one
transition out of s labelled a.
A DFA represents a finite state machine that recognises the
language of an RE.

The language recognized by this DFA is also (a|b)* ab
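A DFA is simulated with a single current state, since each state has exactly one move per symbol. The transition table below is one DFA for (a|b)*ab written for this sketch; the state numbering is an assumption and need not match the diagram on the slide.

```python
# One DFA for (a|b)*ab (state numbering assumed for this sketch):
# state 1 means "just read a", state 2 means "just read ab".
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
DFA_START, DFA_FINAL = 0, {2}


def dfa_accepts(word):
    """Follow exactly one transition per input symbol."""
    state = DFA_START
    for ch in word:
        state = DFA[state][ch]
    return state in DFA_FINAL


accepted = [w for w in ["ab", "aab", "bbab", "a", "aba", ""] if dfa_accepts(w)]
```

Unlike the NFA simulation, no set of states is needed: the work per input character is a single table lookup, which is why DFAs are the faster recognizers.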

Converting an NFA into a DFA (Example)

ε-closure({0}) = {0,1,2,4,7} = S0
⇓ mark S0
ε-closure(move(S0,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
ε-closure(move(S0,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
transfunc[S0,a] ← S1    transfunc[S0,b] ← S2
⇓ mark S1
ε-closure(move(S1,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
ε-closure(move(S1,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
transfunc[S1,a] ← S1    transfunc[S1,b] ← S2
⇓ mark S2
ε-closure(move(S2,a)) = ε-closure({3,8}) = {1,2,3,4,6,7,8} = S1
ε-closure(move(S2,b)) = ε-closure({5}) = {1,2,4,5,6,7} = S2
transfunc[S2,a] ← S1    transfunc[S2,b] ← S2
Converting a NFA into a DFA (Cont.)

S0 is the start state of the DFA since 0 is a member of
S0 = {0,1,2,4,7}.
S1 is an accepting state of the DFA since 8 is a member
of S1 = {1,2,3,4,6,7,8}.
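The subset construction traced in this example can be sketched in Python. The NFA diagram is not reproduced in these notes, so the transition table below is a reconstruction for the RE (a|b)*a chosen to agree with the ε-closures listed in the example; treat it as an assumption.

```python
EPS = ""  # label used in this sketch for ε-moves

# Reconstructed NFA for (a|b)*a (assumption: consistent with the example).
NFA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6},
    (6, EPS): {1, 7}, (7, "a"): {8},
}
NFA_START, NFA_FINAL = 0, {8}


def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` on one `symbol` transition."""
    return {t for s in states for t in NFA.get((s, symbol), set())}


def subset_construction(alphabet="ab"):
    start = eps_closure({NFA_START})
    dstates, unmarked, trans = [start], [start], {}
    while unmarked:
        S = unmarked.pop()  # mark S
        for a in alphabet:
            T = eps_closure(move(S, a))
            if not T:
                continue  # no transition on this symbol
            if T not in dstates:
                dstates.append(T)
                unmarked.append(T)
            trans[(dstates.index(S), a)] = dstates.index(T)
    accepting = {i for i, S in enumerate(dstates) if S & NFA_FINAL}
    return dstates, trans, accepting


dstates, trans, accepting = subset_construction()
```

The three sets in dstates correspond to S0, S1 and S2 above, and index 1 (S1) is the only accepting DFA state because it contains NFA state 8.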
Converting RE Directly to DFAs
We may convert a regular expression into a DFA (without
creating an NFA first).
First we augment the given regular expression by
concatenating it with a special symbol #.
r → (r)# is the augmented regular expression (to augment
means to make something greater by adding to it).

Then, we create a syntax tree for this augmented regular
expression.
In this syntax tree, all alphabet symbols (plus # and the
empty string) in the augmented regular expression will be
on the leaves, and all inner nodes will be the operators in
that augmented regular expression.
Regular Expression → DFA (cont.)
Then each alphabet symbol (plus #) will be numbered
(position numbers).
(a|b)*a → (a|b)*a# augmented regular expression

Syntax tree of (a|b)*a#

            •
           / \
          •   #
         / \  4
        *   a
        |   3
        |
       / \
      a   b
      1   2

• each symbol is numbered (positions)
• each symbol is at a leaf
• inner nodes are operators
Minimizing Number of States of a DFA
Partition the set of states into two groups:
G1 : set of accepting states
G2 : set of non-accepting states
For each new group G
partition G into subgroups such that states s1 and s2 are in
the same subgroup if and only if, for all input symbols a,
states s1 and s2 have transitions to states in the same group.
Repeat until no group can be split any further.
The start state of the minimized DFA is the group containing
the start state of the original DFA.
The accepting states of the minimized DFA are the groups
containing the accepting states of the original DFA.
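The partitioning procedure can be sketched as repeated refinement of a list of groups. The DFA used below is hypothetical: a three-state machine with states 1, 2, 3 and accepting state 2, invented for the illustration.

```python
def minimize(states, alphabet, delta, accepting):
    """Return the final partition of `states` into groups."""
    partition = [g for g in (set(accepting), set(states) - set(accepting)) if g]
    while True:
        def group_of(s):
            # Index of the group that currently contains state s.
            return next(i for i, g in enumerate(partition) if s in g)

        refined = []
        for group in partition:
            buckets = {}
            for s in sorted(group):
                # States stay together only if, on every symbol,
                # they move into the same group.
                key = tuple(group_of(delta[(s, a)]) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            refined.extend(buckets.values())
        if len(refined) == len(partition):
            return refined  # no group was split: done
        partition = refined


# Hypothetical DFA: states 1, 2, 3 with accepting state 2.
delta = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 2, (3, "b"): 3,
}
groups = minimize({1, 2, 3}, "ab", delta, {2})
```

For this DFA the result is the two groups {2} and {1,3}: states 1 and 3 behave identically on both symbols, so they collapse into a single state of the minimized DFA.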
Minimizing DFA - Example

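G1 = {2}
G2 = {1,3}

G2 cannot be partitioned because states 1 and 3 move to the
same groups on every input symbol:

move(1,a)=2 move(1,b)=3
move(3,a)=2 move(3,b)=3

So, in the minimized DFA (with minimum states), states 1 and 3
are merged into a single state.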

Minimizing DFA – Another Example

a b
1->2 1->3
2->2 2->3
3->4 3->3

So, the minimized DFA

Quiz 5%
Write a Lex program that implements a simple calculator.
L1 = {0,1} L2 = {0,1}
L1L2 = { ? }

RE 0* = ?
RE (0|1)* = ?
RE (0|1)*11 = ?
