
Wachemo University

Institute of Technology
Department of Computer Science
Course Title: Compiler Design (CoSc4103)
Chapter Two: Lexical Analysis and Lex

By: Tseganesh M.(MSc.)

Subscribe on Yadah Academy YouTube channel


Compiler Design (CoSc4103)
Chapter Two
Lexical Analysis and Lex

Outline
2.1. The role of the lexical analyzer
2.2. Token: Specification and Recognition of Tokens
2.3. Lexical Error Recovery
2.4. Finite Automata: NFA to DFA Conversion
2.5. A Typical Lexical Analyzer Generator

By: Tseganesh M. (MSc.)
2.1. The role of the Lexical Analyzer
 Lexical analysis is the first phase of a compiler.
A lexical analyzer is also called a "Scanner".
 The input to a lexical analyzer is the pure high-level code from the preprocessor.

 Main functions of the lexical analyzer
 1st task: read the given source code from left to right, character by character, and produce a
sequence of tokens that are used for syntax analysis.
 i.e., the output of lexical analysis is a stream of tokens, which is the input to the parser
 2nd task: remove any comments and whitespace from the source code, in the form of blank,
tab, and newline characters.
 Another task: generate error messages if an invalid token is found in the source program.

 It identifies valid lexemes from the program and returns tokens to the syntax analyzer,
one after the other, corresponding to the getNextToken command from the syntax
analyzer
[Figure: the Lexical Analyzer reads characters from the source program (and can put a character
back), returns a <token, token value> pair to the Parser on each getNextToken request, and both
components consult the Symbol Table; the parser's output goes on to semantic analysis.]
Lexical Analyzer cont’d……
 The lexical analyzer works closely with the syntax analyzer.
 But, there are several reasons for separating lexical analysis from parsing:
 Simplicity of design

 Improving compiler efficiency

 Enhancing compiler portability (e.g., Linux to Windows)

 When you work on lexical analysis, there are three important terms to know:
 Token, Pattern, and Lexeme


 Lexeme: is a sequence of characters (alphanumeric) in the source program that matches the
pattern of a token.
 Pattern: is the set of rules for a lexeme that the scanner follows to identify it as a valid token.
 A pattern explains what can be a token, and

 These patterns can be defined by means of regular expressions

 Tokens: are a set of strings defining an atomic element with a defined meaning
 It is a pre-defined sequence of characters that cannot be broken down further

 A token can have a token name and an optional token/attribute value



Lexical Analyzer cont’d……
 Some examples of tokens, lexemes, and patterns:

Token        Lexeme   Pattern
Keyword      while    w-h-i-l-e
Relop        <        <, >, >=, <=, !=, ==
Integer      7        [0-9]+ (a sequence of digits with at least one digit)
String       "Hi"     characters enclosed by " "
Punctuation  ,        , ; . ! etc.
Identifier   number   [A-Za-z][A-Za-z0-9]* (a sequence of letters and digits beginning with a letter)

 But, here are some questions raised by the tasks of the lexical analyzer:
 How does the lexical analyzer read the input string and break it into lexemes?

 How can it understand the patterns and check if the lexemes are valid?

 What does the Lexical Analyzer send to the next phase?



2.2. Token: Specification and Recognition of Tokens
 In a programming language, keywords, constants, identifiers, strings, numbers, whitespace,
operators, and punctuation symbols are considered tokens.
 For example, in C or C++ language, the variable declaration line

 int value = 100;

 contains the tokens:

 int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).

 Attributes of Token
 In a program, sometimes more than one lexeme matches the pattern of a single token,

 So, the lexical analyzer must provide additional information about the particular lexeme.

 This is because the remaining phases need additional information about the lexeme to perform
different operations.
 Lexical analyzer collects information about tokens into their associated attributes and sends a
sequence of tokens with their information to the next phase.
 i.e., the tokens are sent as a pair of <Token name, Attribute value> to the Syntax
analyzer



Tokens cont’d……
 Example: the tokens and associated attribute values for the following FORTRAN statement
 E = M * C ** 2 are written below as a sequence of pairs:

 <id, pointer to symbol table entry for E>
 <assign-op>
 <id, pointer to symbol table entry for M>
 <mult-op>
 <id, pointer to symbol table entry for C>
 <exp-op>
 <number, integer value 2>

Token   Attribute
ID      index to symbol table entry for E
=
ID      index to symbol table entry for M
*
ID      index to symbol table entry for C
**
NUM     2

 A lexeme is like an instance of a token, and the attribute column is used to show which lexeme
of the token is used.

 For every lexeme, the 1st and 2nd columns of the above table are sent to the Syntax Analyzer.
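
 As a concrete illustration (our own sketch, not from the slides), such <token name, attribute
value> pairs can be represented in C roughly as follows; the type and field names here are
hypothetical:

#include <stdio.h>

/* Hypothetical token representation: a token name plus an optional
   attribute value (a symbol-table index for identifiers, the literal
   value for numbers, unused for operators). */
enum TokenName { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUM };

struct Token {
    enum TokenName name;
    int attribute;
};

int main(void) {
    /* E = M * C ** 2 as a stream of <name, attribute> pairs; the
       symbol-table indices 0, 1, 2 are made up for the example. */
    struct Token stream[] = {
        { ID, 0 }, { ASSIGN_OP, 0 }, { ID, 1 }, { MULT_OP, 0 },
        { ID, 2 }, { EXP_OP, 0 }, { NUM, 2 },
    };
    printf("%zu tokens\n", sizeof stream / sizeof stream[0]);
    return 0;
}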



Tokens cont’d……
 Specifications of Tokens
 To answer the question of how the lexical analyzer can check the validity of lexemes against
tokens, it is critical to know the following specifications of tokens:
1) Alphabet
2) Strings
3) Special symbols
4) Language
5) Regular expression
6) etc……
 Let us understand how the language theory undertakes these terms:
1. Alphabets
 Any finite set of symbols

 {0,1} is a set of binary alphabets,

 {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is a set of Hexadecimal alphabets,

 {a-z, A-Z} is a set of English language alphabets.

2. Strings
 Any finite sequence of alphabets (characters) is called a string.

 A string over some alphabet is a finite sequence of symbols drawn from that alphabet.
Tokens cont’d……
 In language theory, the terms sentence and word are often used as synonyms for the term
"string."
 The length of a string S is the total number of occurrences of symbols in it, denoted by |S|

 e.g., the length of the string compiler is 8, denoted |compiler| = 8

 A string having no symbols, i.e. a string of zero length, is known as the empty string and is
denoted by ε (epsilon).
3. Special symbols
 A typical high-level language contains the following special symbols:

Arithmetic symbols    Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation           Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment            =
Special assignment    +=, /=, *=, -=
Comparison            ==, !=, <, <=, >, >=
Preprocessor          #
Location specifier    &
Logical               &, &&, |, ||, !
Shift operators       >>, >>>, <<, <<<



Tokens cont’d……
4. Language
 A language is considered a finite set of strings over some finite, fixed alphabet.

 Computer languages are considered finite sets, and mathematical set operations can be
performed on them.
 Finite languages can be described by means of regular expressions.

5. Regular Expressions
 Regular expressions are an important notation to specify lexeme patterns for a token.

 Each pattern matches a set of strings, so regular expressions serve as names for a set of
strings.
 Regular expressions are used to represent the language for the lexical analyzer
 The lexical analyzer needs to scan and identify only the finite set of valid strings/tokens/lexemes
that belong to the language in hand.
 It searches for the pattern defined by the language rules.

 A grammar defined by regular expressions is known as regular grammar


 The language defined by regular grammar is known as regular language.
Tokens cont’d……
Programming language tokens can be described by regular languages.
 There are a number of algebraic laws that are obeyed by regular expressions; they are stated in
terms of operations on languages
 Operations on languages

 There are several important operations that can be applied to languages.

 Union of two languages L and M is written as:

 L U M = {s | s is in L or s is in M}
 Concatenation of two languages L and M is written as:

 LM = {st | s is in L and t is in M}
 Kleene closure of a language L is written as:

 L* = zero or more occurrences of language L


 Example: the following shows these operations applied to two small languages:

 Let L = {0,1} and S = {a,b,c}

 Union : L U S={0,1,a,b,c}

 Concatenation : L.S={0a,1a,0b,1b,0c,1c}

 Kleene closure : L*={ ε,0,1,00….}

 Positive closure : L+={0,1,00….}
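
 As a rough illustration (our own sketch, not from the slides), the union and concatenation of
small finite languages can be spelled out in C like this:

#include <stdio.h>

static const char *L[] = { "0", "1" };
static const char *S[] = { "a", "b", "c" };

int main(void) {
    /* Union: every string that is in L or in S (here the two sets are
       disjoint, so printing both is enough). */
    printf("L U S = { ");
    for (int i = 0; i < 2; i++) printf("%s ", L[i]);
    for (int i = 0; i < 3; i++) printf("%s ", S[i]);
    printf("}\n");

    /* Concatenation: every string st with s in L and t in S. */
    printf("L.S = { ");
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++)
            printf("%s%s ", L[i], S[j]);
    printf("}\n");
    return 0;
}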


Tokens cont’d……
 In lexical analysis, by using regular expressions it is possible to represent:
i. valid tokens of a language,
ii. occurrences of symbols, and
iii. language tokens;

i. Representing valid tokens of a language in regular expression


 If x is a regular expression, then:

 x* means zero or more occurrences of x.

 i.e., it can generate { ε, x, xx, xxx, xxxx, … }

 x+ means one or more occurrences of x.

 i.e., it can generate { x, xx, xxx, xxxx, … }; equivalently, x.x*

 x? means at most one occurrence of x.

 i.e., it can generate either {x} or {ε}.

 [a-z] is the set of all lower-case letters of the English alphabet.

 [A-Z] is the set of all upper-case letters of the English alphabet.

 [0-9] is the set of all decimal digits.



Tokens cont’d……
ii. Representation of occurrence of symbols using regular expressions
 letter = [a-z] or [A-Z]

 digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]

 sign = [ + | - ]

iii. Representation of language tokens using regular expressions

 Decimal = (sign)? (digit)+

 Identifier = (letter)(letter | digit)*
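
 To make this concrete, here is a small sketch of our own (not from the slides) that checks
lexemes against the identifier pattern using the POSIX regex library; the pattern string mirrors
(letter)(letter | digit)*:

#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    /* ^ and $ anchor the match so the whole lexeme must fit the pattern. */
    if (regcomp(&re, "^[A-Za-z][A-Za-z0-9]*$", REG_EXTENDED | REG_NOSUB) != 0)
        return 1;

    const char *lexemes[] = { "number", "x1", "1xab" };
    for (int i = 0; i < 3; i++)
        printf("%-6s -> %s\n", lexemes[i],
               regexec(&re, lexemes[i], 0, NULL, 0) == 0
                   ? "valid identifier" : "not an identifier");

    regfree(&re);
    return 0;
}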

 However, the only problem left for the lexical analyzer is how to verify the validity of a
regular expression used in specifying the patterns of keywords of a language.
 A well-accepted solution to this problem is to use finite automata for verification.

 To recognize and verify the tokens, the lexical analyzer builds Finite Automata for every pattern.

 Transition diagrams can be built and converted into programs as an intermediate step.

 Each state in the transition diagram represents a piece of code.

 Every identified lexeme walks through the Automata.

 The programs built from automata can consist of switch statements to keep track of the state of the
lexeme; the lexeme is verified to be a valid token if it reaches the final state (see the sketch below).
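
 A minimal sketch of this idea (our own, not from the slides): the transition diagram for the
identifier pattern letter(letter | digit)* becomes a switch over the current state:

#include <ctype.h>
#include <stdio.h>

/* Each case of the switch is one state of the transition diagram;
   returning 0 models "no transition possible: reject". */
int is_identifier(const char *lexeme) {
    int state = 0;                          /* start state */
    for (const char *p = lexeme; *p != '\0'; p++) {
        switch (state) {
        case 0:                             /* expecting the first character */
            if (isalpha((unsigned char)*p)) state = 1;
            else return 0;
            break;
        case 1:                             /* inside the identifier */
            if (isalnum((unsigned char)*p)) state = 1;
            else return 0;
            break;
        }
    }
    return state == 1;                      /* did we end in the final state? */
}

int main(void) {
    printf("%d %d\n", is_identifier("number"), is_identifier("1xab")); /* 1 0 */
    return 0;
}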
2.3. Lexical Error Recovery
 Lexical errors:
 are a type of error that can be detected during the lexical analysis phase

 a lexical error is a sequence of characters that does not match the pattern of any token, and so
cannot be scanned into any valid token
 lexical errors are thrown by the lexer when it is unable to continue, i.e., when there is no way to
recognize a lexeme as a valid token.
 Lexical errors are not very common, but they should be managed by the scanner
 Some common lexical errors in the lexical phase are:
 Spelling errors in identifiers, operators, keywords, etc.

 Appearance of some illegal character

 Exceeding the length limit of identifiers or numeric constants

 Removal of a character that should be present

 Replacement of a character with an incorrect character

 Transposition of two characters

Lexical Error cont’d……
 Example: consider this C code:

void main() {
    int x = 10, y = 20;
    char *a;
    a = &x;
    x = 1xab;
}

 In this code, 1xab is neither a number nor an identifier, so this code will produce a lexical error.
 Lexical error recovery: there are some recovery mechanisms to remove lexical errors
 Some possible error-recovery actions, using "cout" as the example (see the sketch after this list), are:
i. deleting an unnecessary character, e.g. coutt → cout
ii. inserting a missing character, e.g. cot → cout
iii. replacing an incorrect character with the correct character, e.g. couf → cout
iv. transposing two adjacent characters, e.g. ocut → cout
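
 As a hedged illustration of these four repairs (our own sketch, not from the slides; the helper
name one_edit_away is hypothetical), a scanner could test whether an invalid lexeme is a single
edit away from a known keyword:

#include <stdio.h>
#include <string.h>

/* Is `lexeme` exactly one delete, insert, replace, or adjacent
   transposition away from `keyword`? */
static int one_edit_away(const char *lexeme, const char *keyword) {
    size_t n = strlen(lexeme), m = strlen(keyword);
    if (n == m) {
        size_t diff = 0, first = 0;
        for (size_t i = 0; i < n; i++)
            if (lexeme[i] != keyword[i]) { if (diff++ == 0) first = i; }
        if (diff == 1) return 1;            /* one replacement */
        return diff == 2 && first + 1 < n   /* one adjacent transposition */
            && lexeme[first] == keyword[first + 1]
            && lexeme[first + 1] == keyword[first];
    }
    if (n == m + 1 || m == n + 1) {         /* one deletion or insertion */
        const char *longer  = n > m ? lexeme : keyword;
        const char *shorter = n > m ? keyword : lexeme;
        size_t i = 0, j = 0, skipped = 0;
        while (shorter[j] != '\0') {
            if (longer[i] != shorter[j]) { if (skipped++) return 0; i++; }
            else { i++; j++; }
        }
        return 1;
    }
    return 0;
}

int main(void) {
    const char *tries[] = { "coutt", "cot", "couf", "ocut", "xyz" };
    for (int i = 0; i < 5; i++)
        printf("%-5s -> %s\n", tries[i],
               one_edit_away(tries[i], "cout") ? "repairable to cout"
                                               : "no single-edit repair");
    return 0;
}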
 However, a few errors are beyond the power of the lexical analyzer to recognize, because a lexical
analyzer has a very localized view of the source program. So, some other phase of the compiler
handles such errors.
 For instance, suppose the string fi is encountered in a C/C++ program for the first time in the
context of:

fi (a == b) …

 Here, a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an
undeclared function identifier.
2.4. Finite Automata: NFA to DFA Conversion
 A finite automaton is a state machine that takes a string of symbols as input and changes its state
accordingly.
 A finite automaton is a recognizer for regular expressions.
 When a regular expression string is fed into a finite automaton, it changes its state for each literal.
 If the input string is successfully processed and the automaton reaches its final state, the string is
accepted,
 i.e., the string that was fed in is a valid token of the language in hand

 Regular expressions = the specification

 Finite automata = the implementation

 A finite automaton consists of

 An input alphabet Σ

 A set of states S

 A start state n

 A set of accepting states F ⊆ S

 A set of transitions of the form state →(input) state



Automata: NFA to DFA cont’d……
 Transition: s1 →(a) s2
 This can be read as: in state s1, on input "a", go to state s2
 If at the end of input:
 if in an accepting state => accept, otherwise => reject

 If no transition is possible => reject

 Finite automata state graphs can be built up using:

 A state

 The start state

 An accepting state

 A transition (an edge labeled with an input symbol such as a)

 Simple example: a finite automaton that accepts only "1"

 [Figure: a start state with a single transition on 1 to an accepting state.]

Automata: NFA to DFA cont’d……
 A finite automaton accepts a string if we can follow transitions labeled with the characters in the
string from the start to some accepting state
 Another example: a finite automaton accepting any number of 1's followed by a single 0
 Alphabet: {0,1}
 [Figure: the start state loops to itself on 1 and moves to the accepting state on 0.]
 Check that "1110" is accepted by this finite automaton


 Exercise: given the alphabet {0,1}, what language is recognized by this automaton?
 [Figure: an automaton with several states and transitions labeled 0 and 1.]
 Epsilon Moves
 Another kind of transition: ε-moves

 [Figure: an edge labeled ε from state A to state B.]
 Here the machine can move from state A to state B without reading any input.
Automata: NFA to DFA cont’d……
 Types of Finite Automata
i. Nondeterministic Finite Automata (NFA)
ii. Deterministic Finite Automata (DFA)
i. Nondeterministic Finite Automata (NFA)
 Can have multiple transitions for one input in a given state

 Can have ε-moves

 An NFA accepts if it can get into a final state

ii. Deterministic Finite Automata (DFA):

 A DFA is a special case of an NFA in which:

 It has at most one transition per input from any state

 It has no ε-moves, i.e., no transitions on input ε

 A DFA is formally defined by the 5-tuple notation M = (Q, ∑, δ, qo, F), where

 Q is a finite "set of states", which is non-empty.
 ∑ is the "input alphabet", indicating the input set.
 qo is the "initial state", and qo is in Q.
 F is the set of "final states", with F a subset of Q.
 δ is the "transition function" or mapping function; using this function the next state
can be determined.
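
 To make the 5-tuple concrete, here is a small table-driven sketch of our own (not from the
slides) that stores M = (Q, ∑, δ, qo, F) as plain C data and runs the earlier example automaton
(any number of 1's followed by a single 0):

#include <stdio.h>

/* Q = {0, 1, 2}, where 2 is a dead/reject state; Sigma = {'0','1'};
   q0 = 0; F = {1}; delta is a table indexed by [state][symbol]. */
enum { DEAD = 2, NSTATES = 3 };

static const int delta[NSTATES][2] = {   /* columns: input '0', input '1' */
    { 1,    0    },                      /* q0: loop on 1, accept on 0   */
    { DEAD, DEAD },                      /* q1: no further input allowed */
    { DEAD, DEAD },                      /* dead state                   */
};
static const int is_final[NSTATES] = { 0, 1, 0 };   /* F = {1} */

int run_dfa(const char *input) {
    int state = 0;                       /* start in q0 */
    for (; *input != '\0'; input++) {
        if (*input != '0' && *input != '1') return 0;   /* not in Sigma */
        state = delta[state][*input - '0'];
    }
    return is_final[state];
}

int main(void) {
    printf("1110 -> %s\n", run_dfa("1110") ? "accept" : "reject");
    printf("1101 -> %s\n", run_dfa("1101") ? "accept" : "reject");
    return 0;
}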
Automata: NFA to DFA cont’d……
Reading assignment
 Execution of finite automata
 Details of NFA vs. DFA
 How a regular expression is converted into a minimized DFA
 Regular expressions to finite automata
 NFA to DFA conversion
 Implementation of DFA

 You can refer to further material for detailed elaboration.


2.5. Lexical Analyzer Generator
 Creating a lexical analyzer with Lex:
 First, a lexical analyzer is prepared by creating a program lex.l in the Lex language.
 Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.
 Finally, lex.yy.c is run through the C compiler to produce an object program a.out.
 a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.



Lexical Analyzer cont’d……
■ Lex Specification: a Lex program consists of three parts:

{ definitions }
%%
{ rules }
%%
{ user subroutines }

■ For example, the first two parts of a Lex program that counts vowels and consonants look like:

%{
int vowels = 0, cons = 0;
%}
%%
[aeiouAEIOU] {vowels++;}
[a-zA-Z] {cons++;}
%%

where,
■ Definitions include declarations of variables, constants, and regular definitions
■ Rules are statements of the form p1{action1}p2{action2}… pn{actionn}
■ where pi is regular expression and
■ action describes what action the lexical analyzer should take when pattern pi matches a
lexeme.
■ Actions are written in C code.
■ User subroutines are auxiliary procedures needed by the actions.
■ These can be compiled separately and loaded with the lexical analyzer.



Lexical Analyzer cont’d……
■ Consider the following Lex program that counts vowels and consonants:
%{
int vowels = 0;
int cons = 0;
%}
%%
[aeiouAEIOU] {vowels++;}
[a-zA-Z] {cons++;}
%%
int yywrap() {
    return 1;
}
int main() {
    printf("Enter any string to count vowels and consonants; at the end press ^d\n");
    yylex();
    printf("no. of vowels: %d\n", vowels);
    printf("no. of consonants: %d\n", cons);
    return 0;
}

 Steps to execute this Lex program:
 First write the source code in a Lex editor ("EditPlusPortable" or any other editor), then:
 Tools -> 'Lex File Compiler'
 Tools -> 'Lex Build'
 Tools -> 'Open CMD'
 Then in the command prompt type 'name_of_file.exe', e.g. 'lex2.exe', and press Enter
 Then enter your whole input and press Enter
 Finally press Ctrl + Z and press Enter; then you see the output
Lexical Analyzer cont’d……
■ The output of the above program: [Figure: a sample run showing the counts of vowels and consonants.]



Next class
Chapter 3: Syntax Analysis

Outline
3.1. Role of a parser
3.2. Parsing
3.3. Types of parsing
3.4. Parser Generator: Yacc

Subscribe Yadah Academy on YouTube


Click https://youtube.com/@yadahacademy-
educationalco8575
