Chapter 2 - Lexical Analyser

This document discusses lexical analysis in compiler design. It covers the role of the lexical analyzer in removing whitespace and comments and identifying tokens. Regular expressions are used to specify patterns for tokens like identifiers, numbers, and keywords. A lexical analyzer can be built by hand or using a tool like Lex, which generates C code from a specification file. Finite automata, specifically deterministic finite automata (DFAs), are commonly used to recognize tokens based on their regular expression patterns. The document provides examples of regular expressions and operations on languages.

Uploaded by

Yitbarek Murche

Compiler Design

Instructor: Mohammed O.
Email: [email protected]
Samara University
Chapter Two
This Chapter Covers:
Role of lexical analyser
Token Specification and Recognition
NFA to DFA
Lexical Analyzer
The lexical analyzer reads the source program character by character to produce tokens.
Normally a lexical analyzer does not return a list of all tokens at once; it returns one token each time the parser asks for one.

Tokens/Patterns/Lexemes/Attributes
A token is a sequence of characters that represents a unit of information in the source program.
Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.
A pattern is a rule describing the set of lexemes that can represent a particular token in source programs.
A pattern is usually written as a regular definition.
An attribute of a token is usually a pointer to the symbol
table entry that gives additional information about the
token, such as its type, value, line number, etc.

Example:

Token     Lexeme              Pattern
num       3.1416, 0.6, 6.22   any numeric constant
literal   "hello world"       any characters between " and " except "
When more than one lexeme can match a pattern, the lexical analyzer must provide additional information (an attribute) about the particular lexeme that matched.
Example: X = B*1

Tokens and associated attributes:

<id, attr> where attr is a pointer to the symbol-table entry for X
<assignOp> no attribute is needed (if there is only one assignment operator)
<id, attr> where attr is a pointer to the symbol-table entry for B
<multiOp> no attribute is needed
<num, val> where val is the actual value of the number
Scanner
A scanner groups input characters into tokens.
For example, if the input is:
x = x*(b+1); then the scanner generates the following
sequence of tokens
id(x), =, id(x), *, (, id(b), +, num(1), ), ;

Each time the parser needs a token, it sends a request to the scanner.
Then the scanner reads as many characters from the input stream as are necessary to construct a single token.
The scanner may report an error during scanning.
Otherwise, when a single token is formed, the scanner is suspended and returns the token to the parser.
The parser repeatedly calls the scanner until all tokens have been read from the input stream or an error is detected (such as a syntax error).
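The request-driven interaction described above can be sketched with a Python generator, which naturally suspends after each token and resumes when the caller asks for the next one. This is only an illustration: the names TOKEN_SPEC, MASTER and scan are made up for this sketch, and the token classes cover just the example statement.

```python
import re

# Token patterns, tried in order. The names are illustrative only.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("op", r"[=*+();]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Yield one token at a time; the caller (the parser) drives the loop."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                  # whitespace produces no token
        if kind == "op":
            yield lexeme              # operators and punctuation stand for themselves
        else:
            yield f"{kind}({lexeme})"


tokens = scan("x = x*(b+1);")
first = next(tokens)   # the parser asks for a single token
rest = list(tokens)    # later requests drain the remaining tokens
print(first, rest)
```

The scanner does no work until next() is called, which mirrors the parser asking for a token and the scanner being suspended in between.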

Some tokens require extra information.
For example, an identifier is a token (so it is represented by some number), but it is also associated with a string that holds the identifier name.
For example, the token id(x) is associated with the string "x".
Similarly, the token num(1) is associated with the number 1.
Tokens are specified by patterns, called regular expressions.
For example, the regular expression [a-z][a-zA-Z0-9]* recognises all identifiers of at least one character whose first character is a lower-case letter and whose remaining characters are letters or digits.
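As a quick check of the identifier pattern above, the following sketch tries it on a few sample strings with Python's re module. The helper name is_identifier and the sample strings are assumptions made for illustration.

```python
import re

# The identifier pattern quoted in the text.
IDENT = re.compile(r"[a-z][a-zA-Z0-9]*")


def is_identifier(s):
    """True if the whole string s matches the identifier pattern."""
    return IDENT.fullmatch(s) is not None


for s in ["x", "count1", "sumOfSquares", "X", "1abc", "a_b", ""]:
    print(repr(s), is_identifier(s))
```

fullmatch is used rather than match so that a valid prefix followed by other characters (as in a_b) is rejected.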

A typical scanner:
• recognises the keywords of the language (these are the reserved words that have a special meaning in the language, such as the word class in Java);
• recognises and processes special directives (such as the #include "file" directive in C);
• recognises special characters, such as parentheses ( and ), or groups of special characters, such as := and ==;
• recognises identifiers, integers, reals, decimals, strings, etc.;
• ignores whitespace and comments.

Efficient scanners can be built using regular expressions and finite automata.
There are automated tools called scanner generators, such as flex (Fast Lexical Analyzer Generator) for C and JLex for Java, which construct a fast scanner automatically from a specification (a set of regular expressions).
Role of Lexical Analyser
The lexical analyzer performs the following tasks:
• Removes white space and comments from the source program.
• Correlates error messages with the source program (for example, by tracking line numbers).
• Reads input characters from the source program.
• Enters identified tokens, such as identifiers, into the symbol table.
Example: source code for which a symbol table is built:
    //Define a global function
    int add(int a, int b) {
        int sum = 0;
        sum = a + b;
        return sum;
    }
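As a rough sketch of how a scanner could collect symbol-table entries for the snippet above, the code below records each identifier with the line where it first appears. The table layout (name mapped to first line number) and the names SOURCE, KEYWORDS and build_symbol_table are assumptions for illustration; a real symbol table would also hold type and scope information.

```python
import re

SOURCE = """//Define a global function
int add(int a, int b) {
int sum =0;
sum =a+b;
return sum; }
"""

KEYWORDS = {"int", "return"}   # only the keywords this snippet uses


def build_symbol_table(source):
    """Map each identifier to the line number where it first appears."""
    table = {}
    for line_no, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]          # drop a trailing // comment
        for name in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
            if name not in KEYWORDS and name not in table:
                table[name] = line_no
    return table


print(build_symbol_table(SOURCE))
```

The comment on the first line contributes no entries, which reflects the scanner's task of discarding comments before tokens are formed.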
Lexical Analysis
In lexical analysis, we read the source programme character by character and group the characters into tokens.
A token is the smallest unit recognisable by the compiler.
Generally, four classes of tokens are recognised:
1. Keywords
2. Identifiers
3. Constants
4. Delimiters
Construction of Lexical Analyser
There are two general ways to construct a lexical analyser:
• Hand implementation
• Automatic generation of the lexical analyser

Hand Implementation
There are two approaches to hand implementation:
• Input buffer approach
• Transition diagram approach

Input Buffering
The lexical analyser scans the characters of the source programme one at a time to discover tokens.
Often, many characters beyond the next token may have to be examined before the next token itself can be determined.
For this and other reasons, it is desirable for the lexical analyser to read its input from an input buffer.
The input buffer is a memory area that holds incoming characters before they are processed.
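To illustrate why looking ahead in a buffer helps, here is a minimal sketch of a buffer with a one-character peek, used to decide between operators such as < and <=. The class InputBuffer and the function read_relop are hypothetical names for this example; production scanners typically use more elaborate buffering schemes.

```python
class InputBuffer:
    """Holds the input text and a cursor; peek() looks ahead without consuming."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Return the next character, or "" at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def advance(self):
        """Consume and return the next character ("" at end of input)."""
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch


def read_relop(buf):
    """Return the longest relational operator at the cursor, or None."""
    first = buf.peek()
    if not first or first not in "<>=":
        return None
    buf.advance()
    nxt = buf.peek()                  # one character of lookahead
    if first + nxt in ("<=", ">=", "<>"):
        buf.advance()
        return first + nxt
    return first                      # the lookahead character is left unread


buf = InputBuffer("<=b")
print(read_relop(buf), buf.pos)
```

When the lookahead character does not extend the operator, it stays in the buffer for the next token, so nothing is lost.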
Operations on Languages
Concatenation: the set of strings formed by joining a string of L1 with a string of L2.
$L_1 L_2 = \{\, s_1 s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \,\}$
Union: the set of strings that belong to L1 or to L2.
$L_1 \cup L_2 = \{\, s \mid s \in L_1 \text{ or } s \in L_2 \,\}$
Exponentiation: repeated concatenation of a language with itself.
$L^0 = \{\varepsilon\}, \quad L^1 = L, \quad L^2 = LL$
Kleene closure: the set of all strings formed by concatenating zero or more strings of L, including the empty string ε.
$L^* = \bigcup_{i=0}^{\infty} L^i$
Positive closure: the set of all strings formed by concatenating one or more strings of L; it excludes ε unless ε is in L.
$L^+ = \bigcup_{i=1}^{\infty} L^i$
Example
$L_1 = \{a,b,c,d\}, \quad L_2 = \{1,2\}$
$L_1 L_2 = \{a1,a2,b1,b2,c1,c2,d1,d2\}$
$L_1 \cup L_2 = \{a,b,c,d,1,2\}$
$L_1^3$ = all strings of length three over {a,b,c,d}
$L_1^*$ (zero or more) = all strings over the letters a, b, c, d, including the empty string ε
    e.g. a* generates {ε, a, aa, aaa, …}
$L_1^+$ (one or more) = the same strings, excluding the empty string ε
    e.g. a+ generates {a, aa, aaa, …}
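The operations in this example can be checked with Python sets. The sketch below represents a language as a set of strings and ε as the empty string ""; since the closures are infinite, closure_up_to only builds a finite slice up to a chosen power. The function names are made up for this illustration.

```python
def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {s1 + s2 for s1 in l1 for s2 in l2}


def power(lang, n):
    """Exponentiation: lang concatenated with itself n times."""
    result = {""}                 # L^0 = {ε}; "" plays the role of ε
    for _ in range(n):
        result = concat(result, lang)
    return result


def closure_up_to(lang, max_power, positive=False):
    """Union of L^i for i from 0 (or 1) to max_power: a finite slice of L* or L+."""
    start = 1 if positive else 0
    out = set()
    for i in range(start, max_power + 1):
        out |= power(lang, i)
    return out


L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

print(sorted(concat(L1, L2)))                                   # L1L2
print(sorted(L1 | L2))                                          # L1 ∪ L2
print(len(power(L1, 3)))                                        # size of L1^3
print(sorted(closure_up_to({"a"}, 3), key=len))                 # slice of a*
print(sorted(closure_up_to({"a"}, 3, positive=True), key=len))  # slice of a+
```

L1^3 contains 4 × 4 × 4 = 64 strings, which is why only its size is printed.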
Regular Expressions (REs)
We use regular expressions to describe the tokens of a programming language.
Regular expressions are a very convenient form of representing (possibly infinite) sets of strings, called regular sets.
For example, the RE (a|b)*aa represents the infinite set {"aa", "aaa", "baa", "abaa", ...}, which is the set of all strings over the characters a and b that end in aa.
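To make the regular set of (a|b)*aa concrete, this sketch enumerates every string over {a, b} up to a chosen length and keeps those the pattern matches. It relies on Python's re module, whose syntax agrees with the notation here for this pattern; the function name members is an assumption.

```python
import re
from itertools import product

PATTERN = re.compile(r"(a|b)*aa")


def members(max_len):
    """All strings over {a, b} of length <= max_len that the RE matches."""
    found = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if PATTERN.fullmatch(s):
                found.append(s)
    return found


print(members(4))
```

Every string in the result ends in aa, and raising max_len keeps producing new members, which is what makes the set infinite.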
RE Order of Precedence
We can freely put parentheses around REs to denote the order of evaluation.
We can drop redundant parentheses by assuming:
• The Kleene star operator * has the highest precedence and is left associative.
• Concatenation has the next highest precedence and is left associative.
• The union operator | has the lowest precedence and is left associative.
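As a small illustration of these precedence rules, the sketch below compares a|bc* with its fully parenthesised form a|(b(c*)) and with (a|b)c*, the grouping that would result if union bound more tightly than concatenation. It uses Python's re module, which follows the same precedence order for these three operators; the helper matches and the sample strings are assumptions.

```python
import re


def matches(pattern, s):
    """True if the whole string s is matched by pattern."""
    return re.fullmatch(pattern, s) is not None


implicit = r"a|bc*"        # relies on precedence: star, then concatenation, then union
explicit = r"a|(b(c*))"    # the same expression with every grouping written out
other = r"(a|b)c*"         # a different language: union applied first

for s in ["a", "b", "bcc", "ac", "acc"]:
    print(s, matches(implicit, s), matches(explicit, s), matches(other, s))
```

The first two patterns agree on every sample, while "ac" is accepted only by the third, which shows that dropping the parentheses from (a|b)c* would change the language.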
Regular Definitions
Writing a regular expression for some languages can be difficult, because the expression can become quite complex. In those cases, we may use regular definitions.
A regular definition is a sequence of definitions of the form name → regular expression, where each regular expression may use the names defined before it. For example:

digit → [0-9] ---- any of the numerals from 0 to 9.
letter → [A-Za-z] ---- any upper- or lower-case letter.
id → letter ( letter | digit )* ---- a letter followed by zero or more letters or digits.
relop → < | > | <= | >= | = | <>
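The way named definitions build on one another can be mimicked by composing pattern strings. The sketch below is one possible translation into Python's re syntax; the variable names mirror the definitions above (ident stands for id, which is a Python built-in name), and classify is a hypothetical helper.

```python
import re

digit = r"[0-9]"
letter = r"[A-Za-z]"
ident = rf"{letter}({letter}|{digit})*"   # id -> letter ( letter | digit )*
# Longer operators are listed first so that prefix matching with re.match
# would also prefer "<=" over "<"; with fullmatch the order does not matter.
relop = r"<=|>=|<>|<|>|="


def classify(lexeme):
    """Return the name of the definition that matches the whole lexeme, if any."""
    if re.fullmatch(ident, lexeme):
        return "id"
    if re.fullmatch(relop, lexeme):
        return "relop"
    return None


for lexeme in ["x1", "Total", "<=", "<>", "1x", "=="]:
    print(lexeme, classify(lexeme))
```

Because ident is assembled from letter and digit, changing either of those definitions automatically changes what counts as an identifier.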
Lexical Analyser generator Lex
There are tools that can generate lexical analyzers.
Lex is a special-purpose programming language for creating programmes that process streams of input characters.
An input file, which we call lex.l, is written in the Lex language and describes the lexical analyzer to be generated.
The Lex compiler transforms lex.l into a C program, in a file that is always named lex.yy.c.
The latter file is compiled by the C compiler into a file called a.out, as always.
The C-compiler output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens.

[Figure: Creating a lexical analyzer with Lex]

Lex Specifications
Lex source is separated into three sections by %% delimiters:

declarations
%%
translation rules
%%
auxiliary functions (optional)

• Declarations: declarations of variables, constants and regular definitions.
• Translation rules: pattern-action pairs; each pattern is a regular expression, and its action says what to do when the pattern matches the input.
• Auxiliary functions: additional functions used by the actions.
Steps in lex implementation
1. Read input language specification
2. Construct NFA with epsilon-moves (Can also do DFA
directly)
3. Convert NFA to DFA
4. Optimise the DFA
5. Generate scanning (transition) tables & code
Finite Automata
A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
Both deterministic and non-deterministic finite automata recognize regular sets.
Which one?
• deterministic: faster recognizer, but it may take more space
• non-deterministic: slower, but it may take less space
Deterministic automata are widely used in lexical analyzers.
First we define regular expressions for the tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.
Algorithm 1: Regular Expression → NFA → DFA (two steps: first to NFA, then to DFA)
Algorithm 2: Regular Expression → DFA (directly convert a regular expression into a DFA)
Converting a RE to an NFA
Every regular expression (RE) can be converted into an equivalent NFA.
Every NFA can be converted into an equivalent DFA.
The task of a scanner generator, such as JLex, is to generate the transition tables or to synthesise the scanner programme given a scanner specification (in the form of a set of REs).
This is accomplished in two steps: first it converts the REs into an NFA, and then it converts the NFA into a DFA (Algorithm 1).
An NFA is similar to a DFA, but it also permits multiple transitions over the same character and transitions over ε.
In the case of multiple transitions from a state over the same character, when we are at this state and read this character, we have more than one choice; the NFA succeeds if at least one of these choices succeeds.
An ε-transition does not consume any input characters, so you may jump to another state for free.
Clearly, DFAs are a subset of NFAs.

Non-Deterministic Finite Automaton
A non-deterministic finite automaton (NFA) is a mathematical model that consists of:
• S: a set of states
• Σ (sigma): a set of input symbols (the alphabet)
• move: a transition function
• s0: a start (initial) state
• F: a set of accepting (final) states
ε-transitions are allowed in NFAs. In other words, we can move from one state to another without consuming any symbol.
An NFA accepts a string x if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out x.
NFA (Example)
0 is the start state s0
{2} is the set of final states F
Σ = {a,b}
S = {0,1,2}
Transition function:

State   a       b
0       {0,1}   {0}
1       ∅       {2}
2       ∅       ∅

The language recognized by this NFA is (a|b)*ab
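The acceptance condition can be tried out by simulating the example NFA directly: keep the set of states the automaton could be in, and update that set on each input character. The sketch below encodes the transition function given above as a dictionary; the names MOVE, START, FINAL and accepts are chosen for this illustration, and ε-moves are left out because this NFA has none.

```python
# Transition function of the example NFA; missing entries mean "no move".
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START = 0
FINAL = {2}


def accepts(s):
    """True if some path labelled s leads from START to a state in FINAL."""
    current = {START}
    for ch in s:
        # Follow every possible transition from every state we might be in.
        current = set().union(*(MOVE.get((state, ch), set()) for state in current))
    return bool(current & FINAL)


for s in ["ab", "aab", "bab", "abab", "abb", "a", ""]:
    print(repr(s), accepts(s))
```

Tracking a set of states is how the "at least one choice succeeds" rule is handled without guessing: all choices are followed at once.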

Transition Tables
We can also represent an NFA by a transition table, whose rows correspond to states and whose columns correspond to the input symbols and ε.
The entry for a given state and input is the value of the transition function applied to those arguments.
If the transition function has no information about that state-input pair, we put ∅ in the table for the pair.

[Figure: Transition table for the NFA of RE (a|b)*abb]

Deterministic Finite Automaton (DFA)
A deterministic finite automaton (DFA) is a special form of an NFA:
• no state has an ε-transition
• for each state s and input symbol a, there is exactly one transition out of s labelled a.
A DFA represents a finite state machine that recognises the language of a RE.

The language recognized by this DFA is also (a|b)*ab

Converting a NFA into a DFA (Example)

ε-closure({0}) = {0,1,2,4,7} = S0
mark S0
    ε-closure(move(S0,a)) = ε-closure({3,8}) = S1
    ε-closure(move(S0,b)) = ε-closure({5}) = S2
    transfunc[S0,a] ← S1    transfunc[S0,b] ← S2
mark S1
    ε-closure(move(S1,a)) = ε-closure({3,8}) = S1
    ε-closure(move(S1,b)) = ε-closure({5}) = S2
    transfunc[S1,a] ← S1    transfunc[S1,b] ← S2
mark S2
    ε-closure(move(S2,a)) = ε-closure({3,8}) = S1
    ε-closure(move(S2,b)) = ε-closure({5}) = S2
    transfunc[S2,a] ← S1    transfunc[S2,b] ← S2
S0 is the start state of the DFA since 0 is a member of S0 = {0,1,2,4,7}
S1 is an accepting state of the DFA since 8 is a member of S1 = ε-closure({3,8})
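The marking procedure above is the subset construction, and it can be sketched in a few lines of Python. The NFA transitions in MOVE are an assumption: they are the usual Thompson-style NFA for (a|b)*a with states 0 to 8, chosen because they reproduce ε-closure({0}) = {0,1,2,4,7}, move(S0,a) = {3,8} and move(S0,b) = {5} from the example. The names EPS, eps_closure and subset_construction are likewise made up for this sketch.

```python
from collections import deque

EPS = ""   # label used here for ε-moves (an illustrative convention)


def eps_closure(states, move):
    """All NFA states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in move.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(move, start, finals, alphabet):
    """Build DFA states (sets of NFA states) and the DFA transition function."""
    start_set = eps_closure({start}, move)
    dfa, seen, unmarked = {}, {start_set}, deque([start_set])
    while unmarked:
        S = unmarked.popleft()                    # "mark" S
        for a in alphabet:
            target = set()
            for s in S:
                target |= move.get((s, a), set())
            T = eps_closure(target, move)
            dfa[(S, a)] = T                       # transfunc[S, a] <- T
            if T not in seen:
                seen.add(T)
                unmarked.append(T)
    accepting = {S for S in seen if S & finals}
    return start_set, dfa, accepting


# Assumed NFA for (a|b)*a; state 8 is the accepting state.
MOVE = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7}, (7, "a"): {8},
}
S0, DFA, ACCEPTING = subset_construction(MOVE, 0, {8}, "ab")
print(sorted(S0), sorted(DFA[(S0, "a")]), sorted(DFA[(S0, "b")]))
```

Under these assumed transitions the construction yields three DFA states, with the state reached on a being the only accepting one, in line with the example.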
Converting RE Directly to DFAs
We may convert a regular expression into a DFA directly (without creating an NFA first).
First we augment the given regular expression by concatenating it with a special symbol #.
r → (r)# (augmented regular expression)
Then we create a syntax tree for this augmented regular expression.
In this syntax tree, all alphabet symbols (plus # and the empty string) in the augmented regular expression are on the leaves, and all inner nodes are the operators in that augmented regular expression.
Then each alphabet symbol (plus #) is numbered (position numbers).
(a|b)*a → (a|b)*a# (augmented regular expression)

Syntax tree of (a|b)*a#:

• (concatenation)
├── • (concatenation)
│   ├── * (Kleene star)
│   │   └── | (union)
│   │       ├── a  [position 1]
│   │       └── b  [position 2]
│   └── a  [position 3]
└── #  [position 4]

• each symbol is numbered (positions)
• each symbol is at a leaf
• inner nodes are operators
Minimizing Number of States of a DFA
Partition the set of states into two groups:
• G1: the set of accepting states
• G2: the set of non-accepting states
For each new group G:
• partition G into subgroups such that states s1 and s2 are in the same subgroup if and only if, for all input symbols a, states s1 and s2 have transitions on a to states in the same group.
Repeat until no group can be partitioned further.
The start state of the minimized DFA is the group containing the start state of the original DFA.
The accepting states of the minimized DFA are the groups containing the accepting states of the original DFA.
Minimizing DFA - Example

G1 = {2}
G2 = {1,3}

G2 cannot be partitioned because

move(1,a)=2 move(1,b)=3
move(3,a)=2 move(3,b)=3

So the minimized DFA has two states, {1,3} and {2}.
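The partitioning procedure can be sketched as repeated refinement: states in a group stay together only if, for every input symbol, they move into the same group. The code below applies this to the three-state example. The moves of state 2 are not listed in the example, so the values used here (2 on a, 3 on b) are an assumption, and the function names are made up for this sketch.

```python
def group_index(partition, state):
    """Index of the group in `partition` that contains `state`."""
    for i, group in enumerate(partition):
        if state in group:
            return i
    raise ValueError(f"state {state!r} is in no group")


def minimize(states, alphabet, move, accepting):
    """Refine {accepting, non-accepting} until no group splits; return the groups."""
    partition = [g for g in (frozenset(accepting), frozenset(states - accepting)) if g]
    while True:
        refined = []
        for group in partition:
            buckets = {}
            for s in group:
                # Two states stay together only if they agree on every symbol.
                signature = tuple(group_index(partition, move[(s, a)]) for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            refined.extend(frozenset(b) for b in buckets.values())
        if len(refined) == len(partition):   # nothing split, so we are done
            return set(refined)
        partition = refined


STATES = {1, 2, 3}
MOVE = {
    (1, "a"): 2, (1, "b"): 3,
    (3, "a"): 2, (3, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,   # assumed: the example does not list state 2's moves
}
groups = minimize(STATES, "ab", MOVE, accepting={2})
print(groups)
```

Refinement can only split groups, never merge them, so comparing the number of groups before and after a pass is enough to detect that the partition is stable.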

Minimizing DFA – Another Example

State   on a   on b
1       2      3
2       2      3
3       4      3

So, the minimized DFA
