
Lexical Analysis - Part 1


Outline of the Lecture

What is lexical analysis?
Why should LA be separated from syntax analysis?
Tokens, patterns, and lexemes
Difficulties in lexical analysis
Recognition of tokens - finite automata and transition diagrams
Specification of tokens - regular expressions and regular definitions
LEX - A Lexical Analyzer Generator



Compiler Overview



What is Lexical Analysis?

The input is a high-level language program, such as a 'C' program, in the form of a sequence of characters
The output is a sequence of tokens that is sent to the parser for syntax analysis (illustrated below)
Strips off blanks, tabs, newlines, and comments from the source program
Keeps track of line numbers and associates error messages from various parts of the compiler with line numbers
Performs some preprocessor functions such as #define and #include in 'C'
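As a concrete illustration (not part of the original slide), the declaration used later as the running example, float abs_zero_Kelvin = -273;, would reach the parser roughly as the token sequence

    float  identifier(abs_zero_Kelvin)  equal  minus  intnum(273)  semicolon

with the blanks discarded and the lexemes or values attached as attributes.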



Separation of Lexical Analysis from Syntax Analysis

Simplification of design - software engineering reason
I/O issues are limited to the LA alone
More compact and faster parser
Comments, blanks, etc., need not be handled by the parser
A parser is more complicated than a lexical analyzer, and shrinking the grammar makes the parser faster
No rules for numbers, names, comments, etc., are needed in the parser
LA based on finite automata is more efficient to implement than the pushdown automata used for parsing (due to the stack)



Tokens, Patterns, and Lexemes
Running example: float abs_zero_Kelvin = -273; (the token, pattern, and lexeme for each piece are lined up in the small table below)
Token (also called word)
A string of characters which logically belong together
float, identifier, equal, minus, intnum, semicolon
Tokens are treated as terminal symbols of the grammar specifying the source language
Pattern
The set of strings for which the same token is produced
The pattern is said to match each string in the set
float, l(l+d+_)*, =, -, d+, ;
Lexeme
The sequence of characters matched by a pattern to form the corresponding token
"float", "abs_zero_Kelvin", "=", "-", "273", ";"
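Lining up the three notions for the running example (the layout below is mine; the contents are exactly the token, pattern, and lexeme lists above):

    Token        Pattern        Lexeme
    float        float          "float"
    identifier   l(l+d+_)*      "abs_zero_Kelvin"
    equal        =              "="
    minus        -              "-"
    intnum       d+             "273"
    semicolon    ;              ";"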
Tokens in Programming Languages

Keywords, operators, identifiers (names), constants, literal strings, punctuation symbols such as parentheses, brackets, commas, semicolons, and colons, etc.
A unique integer representing the token is passed by the LA to the parser
Attributes for tokens (apart from the integer representing the token)
identifier: the lexeme of the token, or a pointer into the symbol table where the lexeme is stored by the LA
intnum: the value of the integer (similarly for floatnum, etc.)
string: the string itself
The exact set of attributes depends on the compiler designer (one possible layout is sketched in C below)
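A minimal sketch in C of how an LA might hand such (token code, attribute) pairs to the parser; the type and field names here are illustrative assumptions, not part of the lecture:

    #include <stdio.h>

    /* Token codes: unique integers handed to the parser (names are illustrative). */
    enum token_code { TOK_FLOAT, TOK_ID, TOK_EQUAL, TOK_MINUS, TOK_INTNUM, TOK_SEMICOLON };

    /* A token is its code plus an optional attribute. */
    struct token {
        enum token_code code;
        union {
            int  symtab_index;   /* identifier: index into the symbol table entry for the lexeme */
            long int_value;      /* intnum: the value of the integer                             */
            const char *string;  /* string literal: the string itself                            */
        } attr;
    };

    int main(void) {
        /* Two of the tokens from the running example: float abs_zero_Kelvin = -273; */
        struct token t1 = { .code = TOK_ID,     .attr.symtab_index = 0 };
        struct token t2 = { .code = TOK_INTNUM, .attr.int_value = 273 };
        printf("codes: %d %d, value: %ld\n", t1.code, t2.code, t2.attr.int_value);
        return 0;
    }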
Difficulties in Lexical Analysis
Certain languages do not have any reserved words
e.g., while, do, if, else, etc., are reserved in 'C', but not in PL/1
In FORTRAN, some keywords are context-dependent
In the statement DO 10 I = 10.86, DO10I is an identifier, and DO is not a keyword
But in the statement DO 10 I = 10, 86, DO is a keyword
Such features require substantial lookahead for resolution
Blanks are not significant in FORTRAN and can appear in the midst of identifiers, but not so in 'C'
LA cannot catch any significant errors except for simple errors such as illegal symbols, etc.
In such cases, the LA skips characters in the input
Specification and Recognition of Tokens
Regular definitions, a mechanism based on regular expressions, are very popular for the specification of tokens
This has been implemented in the lexical analyzer generator tool, LEX
We study regular expressions first, and then token specification using LEX
Transition diagrams, a variant of finite state automata, are used to implement regular definitions and to recognize tokens
Transition diagrams are usually used to model the LA before translating them to programs by hand (a small hand-coded sketch follows below)
LEX automatically generates optimized FSA from regular definitions
We study FSA and their generation from regular expressions in order to understand transition diagrams and LEX
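A minimal sketch, in C, of what hand-translating a transition diagram into code might look like; it recognizes only the identifier pattern l(l+d+_)* and the intnum pattern d+ from the earlier slide, and all names are illustrative assumptions:

    #include <ctype.h>
    #include <stdio.h>

    /* Hand-coded transition diagram: each state is a position in the code,
       each edge is a test on the next input character. */
    enum { T_ID, T_INTNUM, T_ERROR, T_EOF };

    static int gettoken(const char **pp) {
        const char *p = *pp;
        while (*p == ' ' || *p == '\t' || *p == '\n') p++;        /* strip blanks */
        if (*p == '\0') { *pp = p; return T_EOF; }
        if (isalpha((unsigned char)*p)) {                          /* state: seen a letter */
            while (isalnum((unsigned char)*p) || *p == '_') p++;   /* loop on l, d, _      */
            *pp = p; return T_ID;
        }
        if (isdigit((unsigned char)*p)) {                          /* state: seen a digit  */
            while (isdigit((unsigned char)*p)) p++;                /* loop on d            */
            *pp = p; return T_INTNUM;
        }
        *pp = p + 1; return T_ERROR;                               /* skip illegal symbol  */
    }

    int main(void) {
        const char *src = "abs_zero_Kelvin 273";
        int t;
        while ((t = gettoken(&src)) != T_EOF)
            printf("token %d\n", t);
        return 0;
    }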
Languages
Symbol: An abstract entity, not defined
Examples: letters and digits
String: A finite sequence of juxtaposed symbols
abcb, caba are strings over the symbols a, b, and c
|w| is the length of the string w, and is the number of symbols in it
ϵ is the empty string and is of length 0
Alphabet: A finite set of symbols
Language: A set of strings of symbols from some alphabet
Φ and {ϵ} are languages
The set of palindromes over {0,1} is an infinite language
The set of strings {01, 10, 111} over {0,1} is a finite language
If Σ is an alphabet, Σ* is the set of all strings over Σ
Language Representations
Each subset of Σ* is a language
The set of languages over Σ* is uncountably infinite
Each language must have a finite representation
A finite representation can be encoded by a finite string
Thus, each string of Σ* can be thought of as representing some language over the alphabet Σ
Σ* is countably infinite
Hence, there are more languages than language representations
Regular expressions (type-3 or regular languages), context-free grammars (type-2 or context-free languages), context-sensitive grammars (type-1 or context-sensitive languages), and type-0 grammars are finite representations of the respective languages
Examples of Languages

Let Σ = { a, b, c }
L1 = { a^m b^n | m, n ≥ 0 } is regular
L2 = { a^n b^n | n ≥ 0 } is context-free but not regular
L3 = { a^n b^n c^n | n ≥ 0 } is context-sensitive but neither regular nor context-free
Showing a language that is type-0 but none of CSL, CFL, or RL is very intricate and is omitted



Automata
Automata are machines that accept languages
Finite State Automata accept RLs (corresponding to REs)
Pushdown Automata accept CFLs (corresponding to CFGs)
Linear Bounded Automata accept CSLs (corresponding to CSGs)
Turing Machines accept type-0 languages (corresponding to type-0 grammars)
Applications of Automata
Switching circuit design
Lexical analyzer in a compiler
String processing (grep, awk), etc.
State charts used in object-oriented design
Modelling control applications, e.g., elevator operation
Parsers of all types
Compilers
Finite State Automaton
An FSA is an acceptor or recognizer of regular languages
An FSA is a 5-tuple (Q, Σ, δ, q0, F), where
Q is a finite set of states
Σ is the input alphabet
δ is the transition function, δ : Q × Σ → Q
That is, δ(q, a) is a state for each state q and input symbol a
q0 is the start state
F is the set of final or accepting states
In one move from some state q, an FSA reads an input symbol, changes the state based on δ, and gets ready to read the next input symbol
An FSA accepts its input string if, starting from q0, it consumes the entire input string and reaches a final state
If the last state reached is not a final state, the input string is rejected
FSA Example - 1



FSA Example - 1 (Contd.)
Q = { q0, q1, q2, q3 }
Σ = { a, b, c }
q0 is the start state and F = { q0, q2 }
The transition function δ is defined by the table below

    state \ symbol     a     b     c
          q0          q1    q3    q3
          q1          q1    q1    q2
          q2          q1    q1    q2
          q3          q3    q3    q3

The accepted language is the set of all strings beginning with an 'a' and ending with a 'c' (ϵ is also accepted); a table-driven implementation is sketched below
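A small table-driven C sketch of this automaton (the code and names are my own, not from the lecture); it stores δ as a two-dimensional array indexed by state and symbol and tests a few sample strings:

    #include <stdio.h>

    enum { Q0, Q1, Q2, Q3, NSTATES };

    /* delta[state][symbol], symbols indexed as a=0, b=1, c=2 (table from the slide above) */
    static const int delta[NSTATES][3] = {
        /* q0 */ { Q1, Q3, Q3 },
        /* q1 */ { Q1, Q1, Q2 },
        /* q2 */ { Q1, Q1, Q2 },
        /* q3 */ { Q3, Q3, Q3 },
    };

    /* Final states: q0 and q2 */
    static int is_final(int q) { return q == Q0 || q == Q2; }

    static int accepts(const char *w) {
        int q = Q0;                                 /* start state */
        for (; *w; w++) {
            if (*w < 'a' || *w > 'c') return 0;     /* symbol not in the alphabet */
            q = delta[q][*w - 'a'];
        }
        return is_final(q);
    }

    int main(void) {
        const char *tests[] = { "", "abc", "ac", "abcbc", "ab", "cba" };
        for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
            printf("%-6s -> %s\n", tests[i][0] ? tests[i] : "(eps)",
                   accepts(tests[i]) ? "accepted" : "rejected");
        return 0;
    }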
FSA Example - 2

Q = { q0, q1, q2, q3 }, q0 is the start state
F = { q0 }, δ is as in the figure (the standard construction is sketched in C below)
The language accepted is the set of all strings of 0's and 1's in which the no. of 0's and the no. of 1's are both even
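The figure's δ is not reproduced above, so the following C sketch assumes the standard construction in which each state tracks the parities of the counts of 0's and 1's (the state encoding is my own):

    #include <stdio.h>

    /* State encodes (parity of 0s, parity of 1s): bit 0 set = odd number of 0s, bit 1 set = odd number of 1s.
       State 0 = (even, even) is both the start state and the only final state (q0). */
    static int accepts(const char *w) {
        int q = 0;                              /* start in q0: even 0s, even 1s */
        for (; *w; w++) {
            if (*w == '0') q ^= 1;              /* flip parity of 0s */
            else if (*w == '1') q ^= 2;         /* flip parity of 1s */
            else return 0;                      /* not over the alphabet {0,1} */
        }
        return q == 0;                          /* accept iff both counts are even */
    }

    int main(void) {
        const char *tests[] = { "", "0011", "0101", "011", "110", "1001" };
        for (int i = 0; i < 6; i++)
            printf("%-6s -> %s\n", tests[i][0] ? tests[i] : "(eps)",
                   accepts(tests[i]) ? "accepted" : "rejected");
        return 0;
    }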
Regular Languages

The language accepted by an FSA is the set of all strings accepted by it, i.e., { x | δ(q0, x) ∈ F }
This is a regular language or a regular set
Later we will define regular expressions and regular grammars, which are generators of regular languages
It can be shown that for every regular expression an FSA can be constructed, and vice-versa



Nondeterministic FSA
NFAs are FSA which allow 0, 1, or more transitions from a state on a given input symbol
An NFA is a 5-tuple as before, but the transition function δ is different
δ(q, a) = the set of all states p such that there is a transition labelled a from q to p
δ : Q × Σ → 2^Q
A string is accepted by an NFA if there exists a sequence of transitions corresponding to the string that leads from the start state to some final state (a direct simulation is sketched below)
Every NFA can be converted to an equivalent deterministic FA (DFA) that accepts the same language as the NFA
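A minimal sketch in C of how an NFA can be simulated directly by tracking the set of currently reachable states as a bit mask; the example NFA used here (accepting strings over {0,1} that end in "01") is my own illustration, not the example on the next slide:

    #include <stdio.h>

    /* Example NFA (illustrative): 3 states, accepts strings over {0,1} ending in "01".
       delta[q][a] is the *set* of successor states, stored as a bit mask. */
    #define NSTATES 3
    static const unsigned delta[NSTATES][2] = {
        /* q0 */ { 0x1 | 0x2, 0x1 },   /* on 0: {q0,q1}; on 1: {q0} */
        /* q1 */ { 0,         0x4 },   /* on 0: {};      on 1: {q2} */
        /* q2 */ { 0,         0   },   /* no outgoing transitions   */
    };
    static const unsigned START = 0x1;   /* {q0} */
    static const unsigned FINAL = 0x4;   /* {q2} */

    static int accepts(const char *w) {
        unsigned cur = START;                        /* set of states reachable so far */
        for (; *w; w++) {
            if (*w != '0' && *w != '1') return 0;    /* not over the alphabet {0,1} */
            unsigned next = 0;
            for (int q = 0; q < NSTATES; q++)
                if (cur & (1u << q)) next |= delta[q][*w - '0'];
            cur = next;
        }
        return (cur & FINAL) != 0;                   /* accepted iff some final state is reachable */
    }

    int main(void) {
        const char *tests[] = { "01", "1101", "010", "111" };
        for (int i = 0; i < 4; i++)
            printf("%s -> %s\n", tests[i], accepts(tests[i]) ? "accepted" : "rejected");
        return 0;
    }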
Nondeterministic FSA Example - 1

