Lecture 2.76

The lecture covers the fundamentals of lexical analysis in compiler construction, including the role of lexical analyzers, token specification, and the use of finite automata. It explains concepts such as lexemes, tokens, symbol tables, and the process of tokenization through examples and code snippets. Additionally, it introduces regular expressions for defining tokens and the implementation of finite automata for language acceptance.

Uploaded by

javeria


CS4031

Compiler Construction
Lecture 2
Mahzaib Younas
Lecturer, Department of Computer Science
FAST NUCES CFD
Outline
• The role of the lexical analyzer
• Input buffering
• Specification of tokens
• Recognition of tokens
• The lexical-analyzer generator Lex
• Finite automata
• Design of a lexical-analyzer generator
• Optimization of DFA-based pattern matchers
Lexical Analysis
• The main task of the lexical analyzer is to read the input characters of
the source program, group them into lexemes, and produce as output a
sequence of tokens, one for each lexeme in the source program.

• It receives the source code after the language preprocessor has modified
it. The lexical analyzer breaks this text into a series of tokens and
discards whitespace and comments along the way.
Lexeme
• A lexeme is a sequence of characters in the source code that matches one
of the predefined patterns and thereby forms a valid token.
• Example:
• int c = 5;
Lexeme    Token
int       keyword
c         identifier
=         assignment operator
5         constant
;         symbol
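The table above can be reproduced with a few lines of code. The following is a minimal sketch, not the lecture's implementation: the function name classify is hypothetical, and it recognizes only the keyword int to stay short.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits a statement into (lexeme, token name) pairs.
// Only the keyword "int" is recognized, to keep the sketch short.
std::vector<std::pair<std::string, std::string>> classify(const std::string& src) {
    std::vector<std::pair<std::string, std::string>> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) {                      // whitespace separates lexemes
            ++i;
        } else if (std::isalpha(ch)) {               // keyword or identifier
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            std::string lexeme = src.substr(start, i - start);
            out.emplace_back(lexeme, lexeme == "int" ? "keyword" : "identifier");
        } else if (std::isdigit(ch)) {               // integer constant
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.emplace_back(src.substr(start, i - start), "constant");
        } else if (ch == '=') {                      // assignment operator
            out.emplace_back("=", "assignment operator");
            ++i;
        } else {                                     // any other single character
            out.emplace_back(std::string(1, src[i]), "symbol");
            ++i;
        }
    }
    return out;
}
```

Calling classify("int c = 5;") yields five pairs, in the same order as the rows of the table.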
Pattern
• A pattern is a description of the form that the lexemes of a token
may take. In the case of a keyword as a token, the pattern is just the
sequence of characters that form the keyword.
• For identifiers and some other tokens, the pattern is a more complex
structure that is matched by many strings.
Tokens
• A token is a pair consisting of a token name and an optional attribute value.
• The lexer partitions the input string into substrings (lexemes) and
classifies each one according to the token rules:

• Identifiers  x, y11, maxsize
• Keywords     if, else, while, for
• Integers     2, 1000
• Floats       2.0, 1000.0
• Symbols      + ) ( > <
• Strings      "enter x", "error"
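One possible way to represent "a token name plus an optional attribute value" in code is sketched below. It assumes C++17 for std::optional. The enum TokenName and the helper makeToken are illustrative names, not part of the lecture.

```cpp
#include <optional>
#include <string>

// Token names used in this sketch (assumed; the slides do not fix a set).
enum class TokenName { Identifier, Keyword, Integer, Float, Symbol, String };

struct Token {
    TokenName name;                         // the token class
    std::optional<std::string> attribute;   // optional attribute value, e.g. the lexeme
};

// Builds a token that carries its lexeme as the attribute.
Token makeToken(TokenName name, const std::string& lexeme) {
    return Token{name, lexeme};
}

// Builds a token with no attribute, for cases where the name alone is enough.
Token makeToken(TokenName name) {
    return Token{name, std::nullopt};
}
```

For example, makeToken(TokenName::Identifier, "maxsize") carries the lexeme as its attribute, while makeToken(TokenName::Keyword) carries none.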
Interaction between the Lexical Analyzer and the Parser
Symbol Table
• A symbol table is one of the most important data structures within a
compiler, where all the identifiers used in a program are stored along
with their type, scope, and memory locations.
Example of a symbol table
• int semester;
• char x[ ] = "compiler construction";

Name      Type  Size  Dimension  Line of Declaration  Line of Usage  Address
semester  int   2     0          -                    -              -
x         char  20    1          -                    -              -
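A symbol table is commonly implemented as a map from identifier names to their attributes. The sketch below is one such layout under stated assumptions: SymbolInfo, SymbolTable, declare and lookup are hypothetical names, and only three of the columns above are modelled.

```cpp
#include <string>
#include <unordered_map>

// One row of the table; the fields mirror three of the columns above.
struct SymbolInfo {
    std::string type;   // e.g. "int", "char"
    int size;           // storage size, as listed in the table
    int dimension;      // 0 for a scalar, 1 for a one-dimensional array
};

class SymbolTable {
    std::unordered_map<std::string, SymbolInfo> entries;
public:
    // Inserts a new identifier; returns false if the name is already declared.
    bool declare(const std::string& name, const SymbolInfo& info) {
        return entries.emplace(name, info).second;
    }
    // Returns a pointer to the entry, or nullptr if the name is unknown.
    const SymbolInfo* lookup(const std::string& name) const {
        auto it = entries.find(name);
        return it == entries.end() ? nullptr : &it->second;
    }
};
```

A hash map gives average constant-time insertion and lookup, which matters because the table is consulted every time an identifier appears.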
To see how these concepts are used in practice, in the C statement

printf("Total = %d\n", score);

• both printf and score are lexemes matching the pattern for token id,
and "Total = %d\n" is a lexeme matching literal.
Ad-hoc Lexer
• Ad hoc means the lexer is built by hand, using ordinary constructs of an
already known programming language.
• Hand-written code generates the tokens.
• Partition the input string by reading left to right.
• Recognize one token at a time.
• An ad-hoc lexer requires look-ahead.
• Look-ahead is used to check where one token ends and the next token begins.
Example: a simple class that can turn an input stream into tokens.
class Scanner
{
    Inputstream s;   // Inputstream stands for an input-stream class with a read() method;
                     // in standard C++ the closest equivalent is std::istream with get()
    char next;       // look-ahead character
public:
    Scanner(Inputstream _s)   // constructor of the class
    {
        s = _s;               // s stores the input stream
        next = s.read();      // read the first character into the look-ahead
    }
    // nextToken(), readId(), ... are members of this class
};
How to perform tokenization in the program
• Here we define a method of the class.

Token nextToken() {
    if( idChar(next) )
        return readId();      // next can start an identifier, so apply the identifier rule
    if( number(next) )
        return readNumber();  // next is a digit, so read a number
    if( next == '"' )
        return readString();  // an opening double quote starts a string literal
    ...
    ...
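How to make the token for an identifier
Token readId() {
    string id = "";
    while (idChar(next)) {     // the look-ahead still belongs to the identifier
        id = id + next;        // append it to the lexeme
        next = s.read();       // advance the look-ahead
    }
    return Token(TID, id);     // next now holds the first character after the identifier
}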
Ad-Hoc Lexer using C++

• Identifiers  x, y11, maxsize
• Keywords     if, else, while, for
• Integers     2, 1000
• Floats       2.0, 1000.0
• Symbols      + ) ( > <
• Strings      "enter x", "error"
First, create the function that checks for keywords
#include <cstring>   // strcmp

// Returns 1 if buffer holds one of the 32 C keywords, 0 otherwise.
int isKeyword(char buffer[]) {
    char keywords[32][10] = {
        "auto", "break", "case", "char", "const", "continue", "default", "do",
        "double", "else", "enum", "extern", "float", "for", "goto", "if",
        "int", "long", "register", "return", "short", "signed", "sizeof", "static",
        "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while" };
    int i, flag = 0;
    for (i = 0; i < 32; ++i) {
        if (strcmp(keywords[i], buffer) == 0) {   // exact match with a keyword
            flag = 1;
            break;
        }
    }
    return flag;
}
To check for an operator
// declaration of the operators
char operators[] = "+-*/%=";
for (i = 0; i < 6; ++i) {
    if (ch == operators[i])
        cout << ch << " is operator\n";
}
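Check identifiers
if (isalnum(ch)) {
    buffer[j++] = ch;            // collect the letters and digits of the current word
}
else if ((ch == ' ' || ch == '\n') && (j != 0)) {
    buffer[j] = '\0';            // terminate the collected word
    j = 0;                       // reset for the next word
    if (isKeyword(buffer) == 1)
        cout << buffer << " is keyword\n";
    else
        cout << buffer << " is identifier\n";
}
}   // end of the enclosing read loop (its opening is not shown on this slide)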
How to describe the tokens?
Regular languages are the most popular formalism for specifying tokens.
1. Simple and useful theory
2. Easy to understand
3. Efficient implementations
Languages
• Let Σ be a set of characters. Σ is called the alphabet.
• A language over Σ is a set of strings of characters drawn from Σ.
Notation
• Languages are sets of strings (finite sequences of characters).
• We need some notation for specifying which sets we want.
• For lexical analysis we care about regular languages.
• Regular languages can be described using regular expressions.
Regular languages
• Each regular expression is a notation for a regular language (a set of
words).
• If A is a regular expression, we write L(A) to refer to the language
denoted by A.
• A regular expression (RE) is defined inductively. The base cases are:
a    an ordinary character from Σ
ε    the empty string
Basics of Regular Expressions
Expression  Meaning
R|S         Either R or S
RS          R followed by S
R*          Zero or more repetitions of R, concatenated
R?          ε|R (zero or one R)
R+          RR* (one or more R)
(R)         R (grouping)
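The operators in the table can be tried out with the standard library. The sketch below relies on std::regex, whose default ECMAScript syntax supports |, concatenation, *, ?, + and grouping with the same meanings. The function name inLanguage is hypothetical. std::regex_match succeeds only when the whole string matches, so it tests membership in L(R).

```cpp
#include <regex>
#include <string>

// True if the whole string w is in L(pattern).
// std::regex_match requires the entire string to match, not just a prefix.
bool inLanguage(const std::string& pattern, const std::string& w) {
    return std::regex_match(w, std::regex(pattern));
}
```

For instance, inLanguage("a*", "") is true because R* allows zero repetitions, while inLanguage("a+", "") is false because R+ requires at least one.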
Example
Integers: a non-empty string of digits
• digit = 0|1|2|3|4|5|6|7|8|9
• integer = digit digit*
Identifiers: a string of letters or digits starting with a letter
• A C identifier draws its characters from:
• a–z
• A–Z
• 0–9
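The two definitions can also be checked directly by hand-written code. The sketch below follows the descriptions literally: an integer is one or more digits, and an identifier is a letter followed by letters or digits. The function names are hypothetical, and only letters and digits are considered.

```cpp
#include <cctype>
#include <string>

// integer = digit digit*  (a non-empty string of digits)
bool isInteger(const std::string& w) {
    if (w.empty()) return false;
    for (unsigned char ch : w)
        if (!std::isdigit(ch)) return false;
    return true;
}

// identifier = letter (letter | digit)*
bool isIdentifier(const std::string& w) {
    if (w.empty() || !std::isalpha(static_cast<unsigned char>(w[0]))) return false;
    for (unsigned char ch : w)
        if (!std::isalnum(ch)) return false;
    return true;
}
```

With these, y11 is an identifier but 11y is not, and 1000 is an integer but 2.0 is not.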
How to use RE?
• We need a mechanism to determine whether an input string w belongs to
L(R), the language denoted by regular expression R. Such a mechanism is
called an acceptor.
• Input: a string w and a language L.
• Output: yes if w ∈ L, no if w ∉ L.
Requirement
• Specification: regular expressions
• Implementation: finite automata
Finite Automata
A finite automaton consists of
• An input alphabet (Σ)
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states
• A finite automaton accepts a string if we can follow transitions
labelled with the characters of the string from the start state to some
accepting state.
• FA example: an FA that accepts any number of 1's followed by a
single 0.
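The five components and the acceptance rule can be written down almost verbatim. The sketch below is one possible encoding, with hypothetical names FiniteAutomaton and onesThenZero and states numbered 0 and 1. It simulates a deterministic automaton and rejects as soon as no transition exists.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

struct FiniteAutomaton {
    std::set<char> alphabet;                           // input alphabet
    std::set<int> states;                              // set of states
    int start = 0;                                     // start state
    std::map<std::pair<int, char>, int> transitions;   // (state, symbol) -> next state
    std::set<int> accepting;                           // accepting states

    // Follows one transition per input character; rejects if none exists.
    bool accepts(const std::string& w) const {
        int state = start;
        for (char ch : w) {
            auto it = transitions.find({state, ch});
            if (it == transitions.end()) return false;
            state = it->second;
        }
        return accepting.count(state) > 0;
    }
};

// Any number of 1's followed by a single 0:
// state 0 loops on '1' and moves to the accepting state 1 on '0'.
FiniteAutomaton onesThenZero() {
    FiniteAutomaton fa;
    fa.alphabet = {'0', '1'};
    fa.states = {0, 1};
    fa.start = 0;
    fa.transitions[{0, '1'}] = 0;
    fa.transitions[{0, '0'}] = 1;
    fa.accepting = {1};
    return fa;
}
```

The automaton built by onesThenZero() accepts 0 and 1110, and rejects the empty string, 111 and 1101, since state 1 has no outgoing transitions.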
