Chapter 2

The document discusses lexical analysis in compilers, focusing on the goal of partitioning input strings into tokens while removing comments and whitespace. It explains the concept of tokens as the smallest units of syntax in programming languages and outlines the design and implementation of a lexical analyzer. Additionally, it covers regular languages, finite automata, and the differences between deterministic and nondeterministic finite automata, along with examples and notations for regular expressions.

Uploaded by

Siyamregn Yeshidagna

Compilers

Lexical Analysis
Lexical Analysis
• What is the goal?
if (i ==0)
z=0;
else
z=1;

• The input is just a string of characters:

• if (i ==0)\n\tz=0;\nelse\n\tz=1;
• Goal: partition the input string into substrings and remove comments and
whitespace
• where each substring is a lexeme, the text of one token
Token
• Tokens are like words: the smallest units above letters.
• A token class is a minimal syntactic category.
• English: noun, verb, adjective, …
• Programming language: identifier, integer, keyword, whitespace, …
• Tokens correspond to sets of strings
• Identifier: strings of letters or digits, starting with a letter
• Integer: a non-empty string of digits
• Keyword: "else" or "if" or …
• Whitespace: a non-empty sequence of blanks, newlines and tabs.
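The idea that each token class is a set of strings can be sketched as a membership test per class. This is only an illustration: the helper names (`is_identifier`, `token_class`, and so on) are hypothetical, the alphabet is assumed to be ASCII, and only the two keywords named above are included.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
BLANKS = {" ", "\n", "\t"}
KEYWORDS = {"if", "else"}  # assumption: only the keywords named on the slide


def is_identifier(s):
    """Letters or digits, starting with a letter."""
    return len(s) > 0 and s[0] in LETTERS and all(c in LETTERS | DIGITS for c in s)


def is_integer(s):
    """A non-empty string of digits."""
    return len(s) > 0 and all(c in DIGITS for c in s)


def is_whitespace(s):
    """A non-empty sequence of blanks, newlines and tabs."""
    return len(s) > 0 and all(c in BLANKS for c in s)


def token_class(s):
    """Name the token class whose set contains s, or None if there is none."""
    if s in KEYWORDS:  # checked first: "if" also looks like an identifier
        return "Keyword"
    if is_identifier(s):
        return "Identifier"
    if is_integer(s):
        return "Integer"
    if is_whitespace(s):
        return "Whitespace"
    return None
```

Checking keywords before identifiers is a design choice of this sketch; without it, "else" would fall into the identifier set as well.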
Contd…
• Tokens classify program substrings according to their role
• The output of lexical analysis is a stream of tokens.
• The parser relies on token distinctions.
• An identifier is treated differently from a keyword
Designing a lexical analyser
• Define a finite set of tokens
• Tokens describe all items of interest
• Choice of tokens depends on language, design of parser …
• Recall
• \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
• Useful tokens for this expression:
• Integer, Keyword, Relation, Identifier, Whitespace, (, ), =, ;
• N.B., (, ), =, ; are tokens, not characters, here
• The next step is to describe which substrings belong to each token.
Implementation
• An implementation is responsible for two things:
• Recognize the substrings corresponding to tokens accurately
• Return the value or lexeme (substring) of each token.
• It also discards tokens that do not contribute to parsing:
• Whitespace and comments.

if (i ==0) //if clause
z=0;
else /*else clause is located here*/
z=1;

• With the comments discarded: if (i == 0)\n\tz=0;\nelse\n\tz=1;
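A minimal sketch of such an implementation, assuming Python's `re` module and a hypothetical token set (`TOKEN_SPEC`) chosen for this example only. It recognizes each substring, returns the pair (token, lexeme), and drops whitespace and comments before the parser would see them.

```python
import re

# Hypothetical token set for the running example; order matters because the
# first alternative that matches at the current position wins.
TOKEN_SPEC = [
    ("Comment",    r"//[^\n]*|/\*.*?\*/"),
    ("Whitespace", r"[ \t\n]+"),
    ("Keyword",    r"(?:if|else)\b"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Integer",    r"[0-9]+"),
    ("Relation",   r"=="),
    ("Assign",     r"="),
    ("LParen",     r"\("),
    ("RParen",     r"\)"),
    ("Semicolon",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)
SKIPPED = {"Comment", "Whitespace"}


def tokenize(text):
    """Return a list of (token, lexeme) pairs, without comments or whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in SKIPPED:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Listing `==` before `=` makes the scanner prefer the longer operator, and the `\b` after the keywords keeps a name such as `ifx` an identifier.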
Some examples
• C++
• Most tokens are easy to recognize.
• Template syntax: Foo<Bar>
• Stream syntax: cin >> var;
• With nested templates there is a conflict: Foo<Bar<Bazz>> (is >> one stream operator or two closing brackets?)
• Is if two variables, i and f?
• Is == two equal signs (= =) or a single operator?
Solution
• Left-to-right scan
• lookahead sometimes required.
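As an illustration of one-character lookahead, the sketch below decides between `=` and `==` while scanning left to right. The function name `scan_equals` and the token names are assumptions made for this example.

```python
def scan_equals(text, pos):
    """Scan '=' or '==' starting at text[pos]; return (token, next_position)."""
    if text[pos] != "=":
        raise ValueError("expected '=' at the given position")
    # Peek at the next character without consuming it.
    lookahead = text[pos + 1] if pos + 1 < len(text) else ""
    if lookahead == "=":
        return ("Relation", pos + 2)  # consume both characters: ==
    return ("Assign", pos + 1)        # consume only the single =
```

For example, `scan_equals("i==0", 1)` consumes two characters, while `scan_equals("z=0", 1)` consumes one.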
Regular languages
• Are one of several formalisms for specifying tokens.
• Regular languages are a simple and useful theory
• Easy to understand
• Efficient implementation
• Definition: Let Σ be a set of characters. A language over Σ is a set of
strings of characters drawn from Σ.
Examples of languages

English
• Alphabet = characters
• Language = sentences

Programming language
• Alphabet = ASCII
• Language = programs
Notations
• Languages are sets of strings.

• Need some notation for specifying which sets we want

• The standard notation for regular languages is regular expressions.

Regular expressions
• Single character: 'c' = {"c"}
• Epsilon: ε = {""}
• Union: A + B = { s | s ∈ A or s ∈ B }
• Concatenation: AB = { ab | a ∈ A and b ∈ B }
• Iteration: A* = ⋃ (i ≥ 0) A^i, where A^i = AA…A (i times) and A^0 = {""}
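Because languages are sets of strings, these operations can be tried out directly on small finite sets. The sketch below uses hypothetical helper names; since A* is infinite whenever A contains a non-empty string, `star_up_to` only builds the finite part A^0 ∪ … ∪ A^n.

```python
def union(a, b):
    """A + B: strings that are in A or in B."""
    return a | b


def concat(a, b):
    """AB: every string of A followed by every string of B."""
    return {s + t for s in a for t in b}


def power(a, i):
    """A^i: A concatenated with itself i times; A^0 is the set holding ""."""
    result = {""}
    for _ in range(i):
        result = concat(result, a)
    return result


def star_up_to(a, n):
    """Finite approximation of A*: the union of A^0 through A^n."""
    result = set()
    for i in range(n + 1):
        result |= power(a, i)
    return result
```

With A = {"0", "1"}, `star_up_to(A, 2)` contains the empty string and every bit string of length one or two.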
Regular expressions
• Definition: The regular expressions over Σ are the smallest set of
expressions including
• ε
• 'c' where c ∈ Σ
• A+B where A, B are rexp over Σ
• AB where A, B are rexp over Σ
• A* where A is a rexp over Σ
• A? zero or one instance of A (abbreviates A + ε)
• A+ one or more instances of A (postfix +; abbreviates AA*)
Examples
• Keywords: "else" or "if" or …
• 'else' + 'if' + …
• 'else' abbreviates 'e' 'l' 's' 'e'
• Integer: a non-empty string of digits
• digit = '0' + '1' + '2' + '3' + '4' + '5' + '6' + '7' + '8' + '9'
• integer = digit digit*
• Abbreviation: A+ = AA*
• Identifier: strings of letters or digits, starting with a letter
• letter = 'A' + … + 'Z' + 'a' + … + 'z'
• identifier = letter (letter + digit)*
• Whitespace: a non-empty sequence of blanks, newlines, and tabs
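These definitions carry over almost symbol for symbol to the syntax of Python's `re` module, where union is written `|` and a range is written `[a-z]`. The sketch below composes the patterns the same way the slide does; the constant names and the `matches` helper are assumptions for illustration.

```python
import re

DIGIT = "[0-9]"                               # '0' + '1' + ... + '9'
LETTER = "[A-Za-z]"                           # 'A' + ... + 'Z' + 'a' + ... + 'z'
INTEGER = f"{DIGIT}{DIGIT}*"                  # digit digit*
IDENTIFIER = f"{LETTER}(?:{LETTER}|{DIGIT})*" # letter (letter + digit)*
KEYWORD = "else|if"                           # 'else' + 'if'
WHITESPACE = r"[ \n\t]+"                      # one or more blanks, newlines, tabs


def matches(pattern, s):
    """True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None
```

`re.fullmatch` is used so that the test is set membership of the whole string, not a search for a matching substring.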
Examples
• Phone Number
• +251-911-00 00 00
• Σ = digits ∪ { '-', '+', ' ' }
• Email Address
• [email protected]

• There are regular expressions everywhere.

• Everything discussed so far is syntax, not semantics (meaning).
Last lecture
• What is Lexical Analysis?
• What are Tokens?
• Why did we need to have regular languages?
• Write a regular expression for your ID Numbers.
Finite Automata
• A simple idealized machine used to recognize patterns within input taken
from some character set.
• Can be represented as a transition table.
• The job of an FA is to accept or reject an input depending on whether the
pattern defined by the FA occurs in the input.
• Consists of
• An input alphabet Σ
• A set of states S
• A start state
• A set of accepting states F ⊆ S
• A set of transitions
Finite Automata
• Transition: S1 --a--> S2
• In state S1, on input "a", go to state S2
• If at end of input and in an accepting state => accept
• Otherwise => reject
Finite state graphs
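• A state: drawn as a circle

• The start state: a circle with an incoming arrow that has no source state

• An accepting state: a double circle

• A transition: an arrow between two states, labelled with its input symbol (e.g. a)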
Example
• A finite automaton accepting any number of 1's followed by a
single 0
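The automaton for this example can be written down as a transition table and run with the accept/reject rule given earlier. This is a sketch: the state names "start" and "done" are made up for the example, and a missing table entry is treated as reject.

```python
# Transition table: (state, input symbol) -> next state.
TRANSITIONS = {
    ("start", "1"): "start",  # any number of 1's keeps the machine in "start"
    ("start", "0"): "done",   # the single 0 moves it to the accepting state
}
START = "start"
ACCEPTING = {"done"}


def run_dfa(transitions, start, accepting, text):
    """Accept iff the input is exhausted while in an accepting state."""
    state = start
    for ch in text:
        if (state, ch) not in transitions:
            return False  # no transition for this input: reject
        state = transitions[(state, ch)]
    return state in accepting
```

For instance, "110" is accepted, while "1" (no final 0) and "100" (input after the 0) are rejected.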
Epsilon moves
• Another kind of transition: ε-moves

• The machine can move from state A to state B without reading any input.

NFA and DFA
• There are two types
• Deterministic finite automata (DFA)
• Every state has exactly one outgoing edge for each input symbol.
• i.e. one transition per input per state, and no ε-moves
• Behaviour is completely determined by the input
• Nondeterministic finite automata (NFA)
• Have no restrictions on the labels of their edges.
• An input symbol can label several edges out of the same state, and ε-moves are possible
• The machine can choose whether to make an ε-move and which of multiple transitions
on a single input to take.
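One common way to run an NFA without guessing is to track every state it could be in at once. The sketch below assumes an NFA stored as a dictionary from (state, symbol) to a set of states, with the empty string standing for ε; the small example machine (`NFA`, states "S", "A", "B") is hypothetical and accepts bit strings ending in 1.

```python
EPS = ""  # label used for ε-moves in this sketch

# Hypothetical NFA: an ε-move from S to A, a loop on A, and a second choice on 1.
NFA = {
    ("S", EPS): {"A"},
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},  # two edges with the same label: nondeterminism
}


def eps_closure(nfa, states):
    """All states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(nfa, start, accepting, text):
    """Accept iff some sequence of choices ends in an accepting state."""
    current = eps_closure(nfa, {start})
    for ch in text:
        moved = set()
        for s in current:
            moved |= nfa.get((s, ch), set())
        current = eps_closure(nfa, moved)
    return bool(current & accepting)
```

Calling `nfa_accepts(NFA, "S", {"B"}, "01")` follows both edges on the final 1 and accepts because one of them reaches B.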
NFA and DFA (Cont.)

[Figure: an example NFA and an example DFA]
Reg to FA
• Some additional notation for regular expressions
• Union: A+B = A|B
• Option (zero or one): A + ε = A?
• Range: 'a' + 'b' + … + 'z' = [a-z]
• Excluded range: complement of [a-z] = [^a-z]
• Two ways of implementing:
• Regular expression => NFA => DFA => table-driven implementation
• Building the DFA directly, by intuition
First method
• For each kind of rexp, define an equivalent NFA
• For ε

• For input a

• For AB

• For A+B

• For A*
Example
• Construct the NFA for the following regular expression
• (1+0)*1
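The diagrams for the five cases are not reproduced here, so the sketch below assumes the usual Thompson-style construction: each case yields a fragment with one start and one accepting state, glued together with ε-moves. The class and function names are hypothetical, and the small simulator is included only so the result for (1+0)*1 can be checked.

```python
import itertools

EPS = ""                  # label used for ε-moves in this sketch
_ids = itertools.count()  # source of fresh state numbers


class Fragment:
    """An NFA piece: one start state, one accepting state, a list of edges."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(label):
    """NFA for a single input symbol, or for ε when label is EPS."""
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, label, f)])


def concat(a, b):
    """NFA for AB: ε-move from A's accepting state to B's start state."""
    return Fragment(a.start, b.accept, a.edges + b.edges + [(a.accept, EPS, b.start)])


def union(a, b):
    """NFA for A+B: a new start branches to both, both join a new accept."""
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, b.start), (a.accept, EPS, f), (b.accept, EPS, f)]
    return Fragment(s, f, a.edges + b.edges + extra)


def star(a):
    """NFA for A*: skip A entirely, or run it and optionally loop back."""
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, f), (a.accept, EPS, a.start), (a.accept, EPS, f)]
    return Fragment(s, f, a.edges + extra)


def closure(frag, states):
    """States reachable from `states` through ε-moves only."""
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in frag.edges:
            if src == s and label == EPS and dst not in result:
                result.add(dst)
                stack.append(dst)
    return result


def accepts(frag, text):
    """Simulate the fragment as an NFA on the whole input."""
    current = closure(frag, {frag.start})
    for ch in text:
        moved = {dst for src, label, dst in frag.edges if src in current and label == ch}
        current = closure(frag, moved)
    return frag.accept in current


# (1+0)*1 built bottom-up from its sub-expressions.
nfa = concat(star(union(symbol("1"), symbol("0"))), symbol("1"))
```

The resulting machine accepts exactly the bit strings that end in 1, for example "1101" but not "10".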
NFA to DFA
• Each state of the DFA is a non-empty set of NFA states
• Start state
• The set of NFA states reachable through ε-moves from the NFA start state
• Add a transition S --a--> S' to the DFA iff
• S' is the set of NFA states reachable from any state in S after seeing the input
a, considering ε-moves as well
• Note that an NFA may be in many states at any time.
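The rules above can be sketched as a worklist algorithm over sets of NFA states. The NFA representation (a dictionary from (state, symbol) to a set of states, with the empty string for ε) and the example machine are assumptions for illustration. A DFA state is marked accepting when it contains an accepting NFA state, which is the usual convention.

```python
EPS = ""  # label used for ε-moves in this sketch


def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def subset_construction(nfa, start, accepting, alphabet):
    """Return (dfa transitions, dfa start state, dfa accepting states)."""
    start_set = frozenset(eps_closure(nfa, {start}))
    dfa, seen, worklist = {}, {start_set}, [start_set]
    while worklist:
        current = worklist.pop()
        for a in alphabet:
            moved = set()
            for s in current:
                moved |= nfa.get((s, a), set())
            target = frozenset(eps_closure(nfa, moved))
            if not target:
                continue  # DFA states are non-empty sets, so add no edge
            dfa[(current, a)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa, start_set, dfa_accepting


# Hypothetical NFA for bit strings ending in 1 (start S, accepting B).
NFA = {("S", EPS): {"A"}, ("A", "0"): {"A"}, ("A", "1"): {"A", "B"}}
DFA, DFA_START, DFA_ACCEPTING = subset_construction(NFA, "S", {"B"}, "01")
```

For this NFA the construction produces three DFA states: {S, A}, {A} and {A, B}, of which only {A, B} is accepting.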
Example
Second method
• What does the following rexp represent?
• digit = 0|1|…|9
• digits = digit+
• digits (. digits)? (E (+|-)? digits)?
• Construct the DFA empirically (by inspection)
Solution

[Figure: transition diagram with states 1 to 8; edges labelled digit, ., E, +|- and other; one state marked *]
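A DFA built by inspection for the number pattern above (digits, an optional fraction, an optional exponent) can be stored as a two-dimensional table indexed by state and character class. The sketch below is one possible hand-built machine, not the diagram from the slide: it uses its own state numbering (0 to 6), tests whole strings only, and has no "other" edge.

```python
def char_class(ch):
    """Map a character to the column of the transition table it selects."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    if ch == "E":
        return "exp"
    if ch in "+-":
        return "sign"
    return "other"


# Rows are states, columns are character classes; a missing entry means reject.
TABLE = {
    0: {"digit": 1},                       # start: need a first digit
    1: {"digit": 1, "dot": 2, "exp": 4},   # inside the integer part
    2: {"digit": 3},                       # just read '.', need a digit
    3: {"digit": 3, "exp": 4},             # inside the fraction
    4: {"sign": 5, "digit": 6},            # just read 'E'
    5: {"digit": 6},                       # just read the exponent sign
    6: {"digit": 6},                       # inside the exponent
}
ACCEPTING = {1, 3, 6}


def is_number(text):
    """Run the table-driven DFA over the whole string."""
    state = 0
    for ch in text:
        state = TABLE[state].get(char_class(ch))
        if state is None:
            return False
    return state in ACCEPTING
```

Strings such as "42", "3.14" and "6E+23" are accepted; "1.", ".5" and "1E" are rejected because they stop in a non-accepting state or have no valid first transition.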
Reading assignment
• Error recovery
• Buffered I/O for token detection and Buffered I/O with Sentinels
• 2D table implementation of a DFA
