
Language Theory and Automata

Ch3: Lexical Analysis


by Dr. Ameni Mejri

Academic Year 2023/2024


Outline

● Generalities (lexical analyzer, interface to the lexical analyzer, role of the lexical analyzer)
● Lexical units, lexemes and models
● Regular definitions
● Transition diagrams
● Error handling
Lexical Analysis Phase

● First compilation phase.


● Recognition of lexical units from source code (character streams).
● Main lexical units:
● Single special characters: +, =, etc.
● Double special characters: <=, ++, etc.
● Keywords: if, while, etc.
● Literal constants: 123, -5, etc.
● Identifiers: i, wind_speed, etc.
Lexical Analysis Phase

● The main task of the lexical analyzer is to read the input characters and, as a result, produce
a sequence of lexical units that the syntactic analyzer will use.
● The lexical analyzer and the parser (syntax analyzer) form a producer/consumer pair.
● The channel between the lexical analyzer and the syntactic analyzer is a buffer with a
capacity of a certain number of tokens. The parser sometimes needs to consult the next
tokens without consuming them.
● On receiving a "next lexical unit?" command from the parser, the lexical analyzer reads
the input characters until it can identify the next lexical unit.
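The producer/consumer channel described above can be sketched as a small ring buffer that lets the parser peek at upcoming tokens without consuming them. This is a minimal illustration only; TokenBuf, put_token, peek_token and next_token are invented names, not part of the course material:

```c
#include <assert.h>

/* Sketch of the lexer/parser token channel: a bounded buffer that the
   parser can consult ("peek") without consuming. TokenBuf and these
   function names are illustrative, not from the course material. */
#define CAP 8

typedef struct { int items[CAP]; int head; int count; } TokenBuf;

/* Producer side: the lexical analyzer deposits a token. */
static void put_token(TokenBuf *b, int tok) {
    b->items[(b->head + b->count) % CAP] = tok;
    b->count++;
}

/* Consult the k-th upcoming token without consuming it. */
static int peek_token(const TokenBuf *b, int k) {
    return b->items[(b->head + k) % CAP];
}

/* Consumer side: the parser's "next lexical unit?" request. */
static int next_token(TokenBuf *b) {
    int t = b->items[b->head];
    b->head = (b->head + 1) % CAP;
    b->count--;
    return t;
}
```

A real implementation would block or refill when the buffer runs empty; the sketch omits capacity checks for brevity.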

[Figure: the lexical analyzer reads characters from the input (and may return a character to it); on a "next lexical unit?" request from the syntactic analyzer, it provides a token and its attributes.]
Role of Lexical Analyzer

● Read characters from the input text: reading is done character by character until a lexical
unit is formed.
● Remove blanks, comments, etc.: although spacing and comments can play a role in
separating tokens, the lexical analyzer eliminates them.
● Form lexical units (tokens).
● Pass <lexical unit,lexical value> pairs to the Syntax Analyzer:

Example:
● when the lexical analyzer comes across a sequence of numbers as input, it sends the num
token to the parser. The value of the number is sent as an attribute of the token.
● For example, the input 25+11 is transformed into the sequence of token/attribute pairs:
● <num,25> <+,> <num,11>

● Link error messages from the compiler to the source code.


● Pre-processor processing, if any.
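The 25+11 example above can be sketched in C. This is a minimal, illustrative implementation (tokenize is an invented helper): it scans the input and writes out the <token,attribute> pairs the lexical analyzer would pass to the parser:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Minimal sketch of the 25+11 example: scan the input and write the
   <token,attribute> pairs the lexer would hand to the parser.
   tokenize() is an illustrative helper, not part of the course code. */
static void tokenize(const char *p, char *out) {
    int pos = 0;
    while (*p) {
        if (isdigit((unsigned char)*p)) {
            int value = 0;
            while (isdigit((unsigned char)*p))          /* accumulate the number */
                value = value * 10 + (*p++ - '0');
            pos += sprintf(out + pos, "<num,%d> ", value);  /* token + attribute */
        } else {
            pos += sprintf(out + pos, "<%c,> ", *p++);      /* operator: no attribute */
        }
    }
    if (pos > 0) out[pos - 1] = '\0';   /* trim the trailing space */
    else out[0] = '\0';
}
```

Calling tokenize("25+11", buf) leaves "<num,25> <+,> <num,11>" in buf, matching the pair sequence above.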

Role of Lexical Analyzer

Typical implementation diagram for the lexical analyzer


- interaction with the syntactic analyzer

[Figure: the lexical analyzer reads the source code and supplies the current and next tokens to the syntactic analyzer; both components consult the table of symbols.]
Lexical Units, lexemes and models

● In terms of a lexical unit recognized in the source text, we need to distinguish four
important concepts:

● the lexical unit,
● the lexeme,
● possibly, an attribute,
● the model.
Lexical Units, lexemes and models

● A lexical unit, also known as a lexical token, is a pair consisting of a name and an
optional value.
● The name of the lexical unit is a class of lexemes.
● For most programming languages, the following constructs are treated as lexical
units:
● Keywords.
● Operators.
● Identifiers.
● Constants.
● Punctuation symbols ( '(', ')', ',', ':', etc.).

Lexical Units, lexemes and models

● A lexeme is a sequence of characters in the source program that matches the lexical
unit model.

● Example:
● const max_length = 256;
● In the previous declaration, the string max_length is a lexeme of the lexical unit Identifier.
Lexical Units, lexemes and models

● A model is a rule describing the strings that correspond to a lexical unit.

● For reserved words (key-words) such as const, if, while, etc., the lexeme and model
generally coincide. The model for the lexical unit const is the string const.

● For a rel_oper lexical unit representing relational operators, the model is the set of relational operators: <, <=, ==, >=, >, !=.

● To precisely describe the models (patterns) of more complex lexical units such as identifiers
and numbers, we use regular expressions.

● In pattern-driven programming, patterns are expressed by regular expressions.

● Languages and tools are available for efficient recognition of regular expressions by finite
automata.
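As a minimal illustration of recognizing a regular expression with a finite automaton in code, the pattern (0|1)+ can be hand-coded with two states. The function name is an invented one, and the sketch is for illustration only:

```c
#include <assert.h>

/* Sketch: recognizing the regular expression (0|1)+ with a hand-coded
   finite automaton. Two states: 0 (start) and 1 (accepting). The
   function name is illustrative. */
static int matches_binary(const char *s) {
    int state = 0;                  /* start state */
    for (; *s; s++) {
        if (*s == '0' || *s == '1')
            state = 1;              /* enter / stay in the accepting state */
        else
            return 0;               /* symbol outside the alphabet: reject */
    }
    return state == 1;              /* accept only if at least one digit was read */
}
```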

Lexical Units, lexemes and models

● An attribute is defined as a pointer to the symbol-table entry in which information about the lexical unit is stored.

● The attribute, if it exists, depends on the lexical unit in question and completes it. Of the constructs listed earlier, only numbers and identifiers carry an attribute:
● For a number, it is its value (123, -5).
● For an identifier, it is a reference to a table containing all identifiers encountered.

● For diagnostic purposes, the line number where a lexical unit first appears can be stored. Both the lexical unit and the line number can be stored in the associated symbol-table entry.
Lexical Units, lexemes and models

Lexical unit   Lexemes             Informal description of the model
const          const               const
if             if                  if
rel_oper       < <= == != >= >     < <= == != >= >
identifier     e pi length         Letter followed by letters, digits or the '_' character
number         3.141 256 0.196     Numerical constants
literal        "stack overflow"    Strings enclosed in quotation marks
Specification of lexical units

● Words (Strings).
● Languages.
● Regular expressions.
● Regular definitions.

Specification of lexical units

Words (Strings).
● An alphabet is a finite set of symbols.
● Examples: {0; 1}, {A; C; G; T}, the set of all letters, the set of all numbers, the ASCII code, etc.
● Blank characters (i.e. spaces, tabs and end-of-line marks) are generally not part of
alphabets.
● A string (or word) in the alphabet is a finite sequence of symbols extracted from it.
● Examples, relating respectively to the preceding alphabets:
● 00011011,
● ACCAGTTGAAGTGGACCTTT,
● Bonjour,
● 2001.
● The empty string, written ε, has no characters.
● The length of a string s, |s|, is the number of occurrences of symbols in s.
● The string ε is of length 0.
● A language over an alphabet is a set of strings built on it.
● Trivial examples: ∅, the empty language, and {ε}, the language reduced to the single empty string. More interesting examples (relative to the preceding alphabets): the set of numbers in binary notation, the set of DNA strings, the set of words in the French language, etc.
Recognizing lexical units

Goal
● Build a lexical analyzer that isolates the lexemes associated with the next lexical unit and produces a pair consisting of the appropriate lexical unit and an attribute value, using the table below.
● Blanks, defined by the RE blank, are eliminated by the lexical analyzer.
● When the lexical analyzer encounters blanks, it continues searching for a significant lexical unit, which it returns to the syntactic analyzer.
Recognizing lexical units

Regular expression   Lexical unit   Attribute value
blank                (none)         (none)
if                   if
then                 then
else                 else
id                   id             Pointer to an entry in the symbol table
float                float          Pointer to an entry in the symbol table
<                    relop          LT
<=                   relop          LE
=                    relop          EQ
<>                   relop          DIFF
>=                   relop          GE
>                    relop          GT
Transition diagrams

● Transition diagrams describe the actions that are performed when the
parser calls a lexical analyzer to provide the next lexical unit.

● An initial state of the diagram.

● Entering a state reads the next character.

● If the label of an arc coming out of the current state matches the input
character, we move on to the state pointed to by this arc. Otherwise, an
error is signalled.

Transition diagrams

Examples

Transition diagram for relational operators (a '*' marks an accepting state reached on "other", where the last character read is retracted):

begin: state 0
0 --'<'--> 1
1 --'='--> 2 : return (relop, LE)
1 --'>'--> 3 : return (relop, DIFF)
1 --other--> 4* : return (relop, LT)
0 --'='--> 5 : return (relop, EQ)
0 --'>'--> 6
6 --'='--> 7 : return (relop, GE)
6 --other--> 8* : return (relop, GT)

Transition diagram for identifiers and keywords:

begin: state 9
9 --letter--> 10
10 --letter | digit--> 10
10 --other--> 11* : return (token_id (), insert_id ())
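The identifier diagram above translates almost line-for-line into code. In this sketch (scan_identifier is an invented name), the retraction on the "other" transition is modeled by simply not counting the lookahead character in the lexeme length:

```c
#include <assert.h>
#include <ctype.h>

/* Sketch transcribing the identifier diagram (states 9-11): state 9
   requires a letter, state 10 loops on letters and digits, and the
   "other" transition enters the accepting state with a retraction,
   modeled here by not counting the lookahead character in *len. */
static int scan_identifier(const char *p, int *len) {
    if (!isalpha((unsigned char)p[0]))   /* state 9: must start with a letter */
        return 0;
    int i = 1;
    while (isalnum((unsigned char)p[i])) /* state 10: loop on letter | digit */
        i++;
    *len = i;                            /* state 11: lexeme length, lookahead retracted */
    return 1;
}
```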
Transition diagrams

● Separate keywords from identifiers.


● Place the keywords (if, then, else) in the symbol table before starting the
analysis.
● Note in the symbol table the lexical unit to be returned when one of these
strings is recognized.
● The insert_id procedure accesses the buffer where the lexical unit was
found. It works as follows:
● The symbol table is examined. If the lexeme is found with the keyword
indication, 0 is returned.
● If the lexeme is found as a variable, a pointer to a symbol table entry is
returned.
● If the lexeme is not found in the symbol table, it is stored there and a pointer to
this new entry is returned.
● The token_id procedure searches for the lexeme in the symbol table; if it's
a keyword, the corresponding lexical unit is returned, otherwise an id is
returned.
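The keyword trick above — pre-loading the reserved words into the symbol table — can be sketched as follows. This is a simplified variant of the slides' insert_id/token_id (it returns the table index for every lexeme instead of the 0-for-keyword convention); all names are illustrative:

```c
#include <assert.h>
#include <string.h>

/* Sketch: the symbol table is pre-loaded with the keywords before the
   analysis starts, so keywords and identifiers are separated by a
   simple lookup. Simplified, illustrative variant of insert_id/token_id. */
enum { TOK_IF = 1, TOK_THEN, TOK_ELSE, TOK_ID };

#define MAXSYM 64
static struct { const char *lexeme; int token; } symtab[MAXSYM] = {
    { "if", TOK_IF }, { "then", TOK_THEN }, { "else", TOK_ELSE },
};
static int nsym = 3;

/* Look the lexeme up, inserting it as an identifier if absent. */
static int insert_id(const char *lexeme) {
    for (int i = 0; i < nsym; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0)
            return i;
    symtab[nsym].lexeme = lexeme;   /* sketch: no overflow check, and the
                                       string must outlive the table */
    symtab[nsym].token = TOK_ID;
    return nsym++;
}

/* Keyword token if the lexeme is reserved, otherwise an ordinary id. */
static int token_id(const char *lexeme) {
    return symtab[insert_id(lexeme)].token;
}
```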

Transition diagrams

[Figure: keyword processing of if and else by automata.]
Construction of a Transition diagram

The complete diagram merges all the token patterns into a single automaton (again, '*' marks an accepting state with a retraction of the last character read):

● Relational operators (states 0-8): as in the previous diagram.
● Identifiers (states 9-10): 9 --letter--> 10; 10 loops on letter | digit; on "other", accept* with return (token_id (), insert_id ()).
● Numbers with fraction and exponent (states 11-18): digit+ '.' digit+ (E (+|-)? digit+)?, accepting on "other" with retraction.
● Decimal numbers (states 19-22): digit+ '.' digit+, accepting on "other" with retraction.
● Integers (states 23-24): digit+, accepting on "other" with retraction.
● Blanks (states 25-26): blank+, accepting on "other" with retraction.
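The number branch of the diagram (integer part, optional fraction, optional exponent) can be transcribed directly. In this sketch (scan_number is an invented name), *len receives the length of the accepted lexeme, with the lookahead character retracted:

```c
#include <assert.h>
#include <ctype.h>

/* Sketch of the number branch: digit+ ('.' digit+)? (E (+|-)? digit+)?,
   mirroring the number states of the combined diagram. Illustrative. */
static int scan_number(const char *p, int *len) {
    int i = 0;
    if (!isdigit((unsigned char)p[i])) return 0;
    while (isdigit((unsigned char)p[i])) i++;        /* integer part: digit+ */
    if (p[i] == '.' && isdigit((unsigned char)p[i + 1])) {
        i++;
        while (isdigit((unsigned char)p[i])) i++;    /* fraction: digit+ */
    }
    if (p[i] == 'E' || p[i] == 'e') {
        int j = i + 1;
        if (p[j] == '+' || p[j] == '-') j++;         /* optional sign */
        if (isdigit((unsigned char)p[j])) {
            j++;
            while (isdigit((unsigned char)p[j])) j++;
            i = j;                                   /* exponent accepted */
        }                                            /* else: retract the 'E' */
    }
    *len = i;
    return 1;
}
```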
Transition diagrams

● There are a few principles to observe:

● an accepting state does not consume characters;


● a non-accepting state (usually) consumes characters;
● each arc can only consume one character at a time, possibly selected from
several possibilities;
● automata must consume as many characters as possible with each token recognized (greedy lexical analysis, or maximal-munch tokenization).

● Greedy lexical analysis reads the maximum number of characters possible before accepting a token.
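Maximal munch can be seen concretely on the relational operators: try the longest lexeme first ("<=", "<>") and only fall back to the one-character token. A sketch with illustrative names (the R_ prefix avoids clashing with anything else):

```c
#include <assert.h>

/* Sketch of maximal munch on relational operators: the two-character
   lexemes are tried first; *len reports how many characters were
   consumed. Illustrative names. */
typedef enum { R_NONE, R_LT, R_LE, R_EQ, R_DIFF, R_GE, R_GT } Relop;

static Relop scan_relop(const char *p, int *len) {
    if (p[0] == '<') {
        if (p[1] == '=') { *len = 2; return R_LE; }   /* longest match wins */
        if (p[1] == '>') { *len = 2; return R_DIFF; }
        *len = 1; return R_LT;                        /* retract: '<' alone */
    }
    if (p[0] == '=') { *len = 1; return R_EQ; }
    if (p[0] == '>') {
        if (p[1] == '=') { *len = 2; return R_GE; }
        *len = 1; return R_GT;
    }
    *len = 0;
    return R_NONE;
}
```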
Transition diagrams

token lexical () {
  while (TRUE) {
    switch (state) {
      case 0: c = nextchar ();
        if (c == SPACE || c == TAB || c == EOL) ;   /* skip blanks */
        else if (c == '<') state = 1;
        else if (c == '=') state = 5;
        else if (c == '>') state = 6;
        else state = failure ();
        break;
      /* ... states 1 to 8 ... */
      case 9: c = nextchar ();
        if (isalnum (c)) state = 10;
        else state = failure ();
        break;
      case 10: back (1);
        insert_id ();
        return (token_id ());
      /* ... states 12 to 22 ... */
      case 23: c = nextchar ();
        if (!isdigit (c)) state = 24;
        break;
      case 24: back (1);
        insert_nb ();
        return NB;
    }
  }
}

back (n) moves back n characters in the buffer; failure () is the error-recovery routine.
Error Handling
● Some errors are lexical in nature. For example, encountering the ASCII
character number 14 (shift out), which should never appear in the source
programs of a given language.

● Many errors cannot be handled by the lexical analyzer. For example:
fi ( a == f(x) )….
● This error seems lexical to us because of the spelling mistake, but 'fi' is a perfectly valid identifier.

● Examples of lexical errors:


● Identifier too long
● Character constant too long
● Invalid character
● End of file in a comment: fatal error
● End of file in a string: fatal error
● Error in a numerical constant
● Line too long...
Error Handling
There are several ways to handle a lexical error:
● Stop the whole compilation with an error message.
● Drop characters from the input until a well-formed token is available.
● Perform one or more editing operations:
● Delete a character (ensures termination).
● Insert a [suitable] character.
● Replace one character with another.
● Transpose two consecutive characters.
● Find a minimum editing distance to obtain a lexically valid program from
the source program.
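The "drop characters" strategy above can be sketched as a panic-mode recovery loop: skip input until a character that can start a well-formed token. can_start_token and panic_recover are illustrative names, and the start set used here is a toy one:

```c
#include <assert.h>
#include <ctype.h>

/* Sketch of panic-mode recovery: skip characters until one that can
   begin a well-formed token. The start set below is a toy example. */
static int can_start_token(int c) {
    return isalnum(c) || c == '<' || c == '=' || c == '>'
        || c == '(' || c == ')';
}

static const char *panic_recover(const char *p, int *dropped) {
    *dropped = 0;
    while (*p && !can_start_token((unsigned char)*p)) {
        p++;                /* deleting a character guarantees termination */
        (*dropped)++;
    }
    return p;
}
```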

TP 1: Flex – What is a lexical analyzer?

● reads a source text (sequence of characters): input
● produces a sequence of lexical units: output

⇨ Lexical unit recognition is based on the notion of regular expressions.
TP 1: Flex – How to build a lexical analyzer?

● define lexical units


● model each lexical unit by a RE
● represent each RE by an automaton
● build the global diagram
● implement the resulting diagram by hand

● Implementing a diagram with a large number of states by hand is not an easy task.
● If you need to add or modify a lexical unit, you have to go through the whole program to make the necessary changes.

⇨ Several tools exist to simplify these tasks: for example, Flex.
TP1 : Flex – Flex tool

● GNU version of Lex.
● A lexical analyzer generator.
● Accepts as input lexical units in the form of REs and produces a program written in C.
● Once the C program has been compiled, it can recognize these lexical units.
● The resulting executable reads the input text character by character until it identifies the longest prefix in the source text that matches one of the REs.
TP1 : Flex – Flex tool

● A .l specification file consists of 4 parts:

%{
Declarations in c
%}

Declaring regular definitions

%%
Translation rules
%%

Main block and auxiliary functions in C

TP1 : Flex – Flex tool

● Regular definitions:
● A regular definition associates a name with a regular expression; the name, rather than the regular expression, can then be used in the rules section.

● Translation rules:
exp1 { action1 }
exp2 { action2 }
...
expn { actionn }

● Each expi is a regular expression that models a lexical unit.
● Each actioni is a sequence of instructions in C.
TP1 : Flex – Flex tool

● Flex regular expressions:


● c : the character c.
● . : any single character, except newline.
● [ ] : one of the listed characters, e.g. [abcdABCD0123].
● - : a character range inside [ ], e.g. [a-dA-D0-3].
● * : repetition (zero or more), e.g. [ab]*.
● + : repetition (at least one), e.g. [ab]+ is [ab][ab]*.
● | : alternative, e.g. 000|110|101|011.
● ( ) : expression grouping, e.g. (0|2|4|8)*.
TP1 : Flex – Flex tool

● Variables:
● yyin: read file (default: stdin)

● yyout: write file (default: stdout)

● char yytext []: character array containing the accepted lexeme.

● int yyleng: length of the accepted lexeme.

● Functions:
● int yylex ( ) : function that starts the lexical analysis.
● int yywrap ( ) : function called at the end of the input text. It returns 0 if the analysis is to continue (with new input) and 1 otherwise.
TP1 : Flex – Setting up and preparing resources

● install the Flex and Codeblocks tools


● add to the path environment variable the "bin" location in both the
CodeBlocks and GNUWin32 folders
C:\Program Files (x86)\CodeBlocks\MinGW\bin
C:\Program Files (x86)\GnuWin32\bin (flex)
● create a folder on the desktop and start the tutorial
● From the command prompt, issue the command: flex FileName.l
● If successful, the file lex.yy.c is generated in the same directory.
● Compile the file lex.yy.c to generate the executable: gcc lex.yy.c
● Test the a.exe file obtained to check that it works correctly.

First step with Flex – First Example

1. Start by installing the Flex and CodeBlocks tools (development tools using the C language).
2. Write the following Flex program to tell whether an input string is a binary number or not. Open a new text file and type in the code below; the file must be saved with the .l extension (e.g. binary.l).

%%
(0|1)+ printf ("it is a binary number");
.* printf ("it is not a binary number");
%%

int yywrap(){ return 1; }

main() { yylex(); }
First step with Flex – First Example

1. Place the resulting file in 'C:\Program Files\GnuWin32\bin'.


2. From the command prompt, issue the command: C:\Program
Files\GnuWin32\bin> flex binary.l
3. If successful, the file lex.yy.c is generated in the same directory.
4. Compile the lex.yy.c file from the command prompt, issue the
command: C:\Program Files\GnuWin32\bin> gcc lex.yy.c to generate
the executable.
5. Test the a.exe file obtained to check that it works correctly.
First step with Flex – Other Examples

1. Modify the previous exercise to display only recognized binary numbers.
2. Write and compile the following specification file:

pairpair (aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*

%%
{pairpair} printf ("[%s]: even number of a and b \n", yytext);
a*b* printf ("[%s]: first a's, then b's \n", yytext);
.
%%
int yywrap(){ return 1; }
main() { yylex(); }
First step with Flex – Other Examples

1. Test the inputs babbaaab, abbb, aabb, baabbbb, bbaabbba, baabbbbab, aaabbbba.
2. Same question, swapping the two lines:

pairpair (aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*

%%
a*b* printf ("[%s]: first a's, then b's \n", yytext);
{pairpair} printf ("[%s]: even number of a and b \n", yytext);
.
%%
int yywrap(){ return 1; }
main() { yylex(); }
First step with Flex – Other Examples

1. Is there a difference? What difference? Why or why not?


2. Consider the lexical unit id defined as follows: an identifier is a
sequence of letters and numbers. The first character must be a letter.
Using Flex, write a lexical analyzer that can recognize the lexical unit id
from an input string.
3. Modify the previous exercise so that the lexical analyzer recognizes the
two lexical units id and nb, knowing that nb is a lexical unit that
designates natural integers.

First step with Flex – Example: if (vitesse >= 110)

For the input if (vitesse >= 110):
● if : lexical unit KW, model m1 (the string if).
● vitesse : lexical unit ID, model m2 : (a|..|z)(a|..|z|0|..|9)* = [a-z][a-z0-9]*.
● 110 : lexical unit NB, model m3 : [0-9]+.
