
Chapter 2 – Lexical Analysis

Compiler
• A compiler translates a program from one language to another

Source code → Front End → Back End → Target code

• Front End: Analysis


• Takes input source code
• Returns Abstract Syntax Tree and symbol table
• Back End: Synthesis
• Takes AST and symbol table
• Returns machine-executable binary code, or virtual machine code
Front End

Lexical Analysis → Syntax Analysis → Semantic Analysis

• Lexical Analysis: breaks input into individual words – “tokens”


• Syntax Analysis: parses the phrase structure of program
• Semantic Analysis: calculates meaning of program
The Role of the Lexical Analyzer

-> read the input characters of the source program
-> group them into lexemes
-> produce as output a sequence of tokens, one for each lexeme in the source
program
Lexing & Parsing
• From strings to data structures

Strings/Files --Lexing--> Tokens --Parsing--> Abstract Syntax Trees
Interactions between the lexical analyzer
and the parser
Tokens, Patterns and Lexemes
• A pattern is a description of the form that the lexemes of a token may take
(the set of rules that defines a TOKEN).

• A lexeme is a sequence of characters in the source program that matches
the pattern for a token and is identified by the lexical analyzer as an instance of that
token.

• A token is a pair consisting of a token name and an optional attribute
value.
• Common token names are
• identifiers: names the programmer chooses
• keywords: names already in the programming language
• separators (also known as punctuators): punctuation characters and paired-delimiters
• operators: symbols that operate on arguments and produce results
• literals: numeric, logical, textual, reference literals
• ………..
Tokens, Patterns and Lexemes
• Consider this expression in the programming language C:
sum=3+2;
• Tokenized and represented by the following table:
Lexeme Token Name
sum Identifier
= Operator
3 Literal
+ Operator
2 Literal
; Separator
Tokens, Patterns and Lexemes
Lexeme Token Name
if (y <= t) y = y - 3; if Keyword
( Open parenthesis
y Identifier
<= Comparison operator
t Identifier
) Close parenthesis
y Identifier
= Assignment operator
y Identifier
- Arithmetic operator
3 Integer
; Semicolon
Attributes for Tokens
• When more than one lexeme can match a pattern, the lexical analyzer
must provide the subsequent compiler phases additional information
about the particular lexeme that matched.

• For example, the pattern for token number matches both 0 and 1, but it is
extremely important for the code generator to know which lexeme was
found in the source program.
Tokens, Patterns and Lexemes
cout << 3+2+3;
Lexeme The following tokens are returned by
scanner to parser in specified order
cout <identifier, ‘cout’>
<< <operator, ‘<<‘>
3 <literal, ‘3’>
+ <operator, ‘+’>
2 <literal, ‘2’>
+ <operator, ‘+’>
3 <literal, ‘3’>
; <punctuator, ‘;’>
Tokens
if (num1 == num2)
result = 1;
else
result = 0;

\tif (num1 == num2)\n\t\tresult = 1;\n\telse\n\t\tresult = 0;


Tokens
• Token class
• In English: noun, verb, adjective, …..

• In a programming language: identifier, keyword, (, ), number, …


Tokens
• Token classes correspond to sets of strings.

• Identifier:
- Identifiers are strings of letters, digits, and underscores, starting with a letter or an
underscore
num1, result, name20, _result, …..
• Integer:
- A non-empty string of digits
10, 89, 001, 00, …….
• Keyword:
- A fixed set of reserved words
if, else, for, while, ….
• Whitespace:
- A non-empty sequence of blanks, newlines, and tabs
Lexical Analysis

Strings/Files --Lexing--> Tokens <name, attribute> --Parsing--> Abstract Syntax Trees
Lexical Analysis

result=50 --Lexing--> <id, ‘result’> <op, ‘=’> <int, ‘50’> --Parsing--> Abstract Syntax Trees
Lexical Analysis
\tif (num1 == num2)\n\t\tresult = 1;\n\telse\n\t\tresult = 0;

=> Go through and identify the tokens of the substrings.

Whitespace: A non-empty sequence of blanks, newlines, and tabs


Keywords: A fixed set of reserved words
Identifiers: Identifiers are strings of letters, digits, and underscores, starting with a letter or an
underscore
Numbers
Operators
OpenParenthesis
CloseParenthesis
Semicolon
Lexical Analysis: Regular expression
• Lexical structure = token classes

• Token classes correspond to sets of strings.


- Use regular expressions to specify which set of strings belongs to each token class
Lexical Analysis: Regular expressions
• Single character
‘a’ = {“a”}
• Epsilon
ε = {“”}
• Union
A + B = {a | a∈A} ∪ {b | b ∈B}
• Concatenation
AB = {ab | a∈A ∧ b ∈B}
• Iteration
A* = ∪ (i≥0) A^i , where A^i = A……A (i times) and A^0 = ε
Lexical Analysis: Regular expressions
• The regular expressions over Σ are the smallest set of expressions including

R = ε
| ‘c’ where c ∈ Σ
| A+B where A, B are regular expressions over Σ
| AB where A, B are regular expressions over Σ
| A* where A is a regular expression over Σ
Lexical Analysis: Regular expressions
Σ = {0, 1}

1* = ∪ (i≥0) 1^i = ε + 1 + 11 + 111 + 1111 + ……..

(1+0)1 = {ab | a ∈ 1+0 ∧ b ∈ 1} = 11 + 01

0* + 1* = {0^i | i≥0} ∪ {1^i | i≥0}
= ε + 0 + 00 + 000 + 0000 + ……….
+ ε + 1 + 11 + 111 + 1111 + ……..

(0+1)* = ∪ (i≥0) (0+1)^i
= ε + (0+1) + (0+1)(0+1) + …… + (0+1)……(0+1)
= all strings of 0’s and 1’s
= Σ*
Lexical Analysis
Meaning function L maps syntax to semantics

L(e) = M

L : regular expression -> set of strings

‘a’ = {“a”} => L(‘a’) = {“a”}


ε = {“”} => L(ε) = {“”}
A+B=A∪B => L(A + B) = L(A) ∪ L(B)
AB = {ab | a∈A ∧ b ∈B} => L(AB) = {ab | a∈L(A) ∧ b ∈L(B)}
A* = ∪ (i≥0) A^i => L(A*) = ∪ (i≥0) L(A^i)
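The definitions above can be prototyped directly: represent each language as a Python set of strings, and truncate the infinite union in A* at a fixed depth. A minimal sketch; the helper names (epsilon, char, union, concat, star) are our own, not from any library:

```python
# A sketch of the meaning function L: each combinator returns the set of
# strings denoted by a regular expression.

def epsilon():
    return {""}

def char(c):
    return {c}

def union(A, B):
    # L(A + B) = L(A) U L(B)
    return A | B

def concat(A, B):
    # L(AB) = {ab | a in L(A), b in L(B)}
    return {a + b for a in A for b in B}

def star(A, depth=3):
    # L(A*) is infinite; approximate it with A^0 + A^1 + ... + A^depth.
    result, level = {""}, {""}
    for _ in range(depth):
        level = concat(level, A)
        result |= level
    return result
```

For example, concat(union(char('1'), char('0')), char('1')) yields {'11', '01'}, matching the (1+0)1 example above.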
Regular Expression
• keyword: A fixed set of reserved words (“if” or “else” or “for” or …..)
Regular expression for if: ‘i’’f’
Regular expression for else: ‘e’’l’’s’’e’
Regular expression for for: ‘f’’o’’r’

Regular expression for keyword:


‘i’’f’ + ‘e’’l’’s’’e’ + ‘f’’o’’r’ + ……….
=> ‘if’ + ‘else’ + ‘for’ + ……….
Regular Expression
• Integer: a non-empty string of digits

- regular expression for the set of strings corresponding to all the single
digits

digit = ‘0’ + ‘1’ + ‘2’ + ‘3’ + ‘4’ + ‘5’ + ‘6’ + ‘7’ + ‘8’ + ‘9’

integer = digit digit* = digit+


Identifier: strings of letters, digits, and underscores, starting with a letter or
an underscore.

digit = ‘0’ + ‘1’ + ‘2’ + ‘3’ + ‘4’ + ‘5’ + ‘6’ + ‘7’ + ‘8’ + ‘9’
= [0-9]
letter_ = [a-zA-Z_]
identifier = letter_(letter_ + digit)*
Whitespace: a non-empty sequence of blanks, newlines, and tabs

whitespace = (‘ ‘ + ‘\n’ + ‘\t’)+


[email protected]

=> Make regular expression for this email address:

letter+’@’letter+’.’letter+’.’letter+
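As a sanity check, the token classes defined above translate almost directly into Python's `re` syntax: the union `+` becomes `|`, and the letter/digit sets become character classes. A small sketch:

```python
import re

# Anchored patterns corresponding to the classes defined above.
integer    = re.compile(r"[0-9]+\Z")
identifier = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*\Z")
whitespace = re.compile(r"[ \n\t]+\Z")
```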
Regular Expression
• At least one: AA* ≡ A+

• Union: A|B ≡ A+B

• Option: A+ε ≡ A?

• Range: ‘a’ + ‘b’ + … + ‘z’ ≡ [a-z]

• Excluded range: complement of [a-z] ≡ [^a-z]


Number in Pascal: A floating point number can have some digits, an

optional fraction and an optional exponent (3.15E+10, 8E-3, 15.6, …)


digit = ‘0’+’1’+’2’+’3’+’4’+’5’+’6’+’7’+’8’+’9’
digits = digit+
opt_fraction = (‘.’digits) + ε = (‘.’digits)?
opt_exponent = (‘E’(‘+’ + ’-’ + ε)digits) + ε = (‘E’(‘+’ + ‘-’)?digits)?
num = digits opt_fraction opt_exponent
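The opt_fraction/opt_exponent definition above maps, for example, to a Python regex built piece by piece; a sketch:

```python
import re

# Mirrors the definitions above: digits, opt_fraction, opt_exponent, num.
digits       = r"[0-9]+"
opt_fraction = rf"(\.{digits})?"
opt_exponent = rf"(E[+-]?{digits})?"
num          = re.compile(rf"{digits}{opt_fraction}{opt_exponent}\Z")
```

This accepts 3.15E+10, 8E-3, and 15.6; note that at least one digit before the optional fraction is required by this definition.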
Regular Expression
• Regular expressions describe many useful languages

• Regular languages are a language specification


• We still need an implementation
Regular Expressions => Lexical Spec
1. Write a regular expression for the lexemes of each token class
• number = digit+
• keyword = ‘if’ + ‘else’ + …
• identifier = letter_(letter_ + digit)*
• openPar = ‘(‘
• closePar = ‘)’
• ………..

2. Construct R, matching all lexemes for all tokens


R = keyword + identifier + number + …..
= R1 + R2 + ….
• (This step is done automatically by tools like flex)
3. Let input be x1…xn
For 1 ≤ i ≤ n check x1…xi ∈ L(R) ?

4. If success, then we know that


x1…xi ∈ L(Rj) for some j

R = R1 + R2 + R3 + …..

5. Remove x1 ….xn from input and go to (3)


How much input is used?

If x1…xi ∈ L(R)
and x1…xj ∈ L(R)
with i ≠ j

Rule: Pick the longest possible string in L(R)

– Pick j if j > i
– The “maximal munch”
Which token is used?
x1…xi ∈ L(Rj)
x1…xi ∈ L(Rk) => which token is used?

Keywords = ‘if’ + ‘else’ + ….


Identifiers = letter(letter + digit)*

if L(Keywords)
if L(Identifiers)
=> Choose the rule listed FIRST.
• What if no rule matches?
x1…xi ∉ L(R)

Error = all strings not in the language of our lexical specification

Make a regular expression for error strings and PUT IT LAST IN PRIORITY
(lowest priority)
• Regular expressions are a concise notation for string patterns

• Use in lexical analysis requires small extensions


• To resolve ambiguities
• Matches as long as possible
• Highest priority match
• To handle errors
• Make a regular expression for error strings and PUT IT LAST IN PRIORITY.
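The disambiguation rules above (longest match, first-listed rule on ties, an error class last) can be sketched as a naive lexer. The token classes and the tiny rule set below are illustrative, not from any real language specification:

```python
import re

# Token classes in priority order; ERROR is last so it fires only when
# nothing else matches.
RULES = [
    ("KEYWORD",    r"if|else|for|while"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("NUMBER",     r"[0-9]+"),
    ("OPERATOR",   r"==|<=|>=|=|\+|-"),
    ("SEPARATOR",  r"[();{}]"),
    ("WHITESPACE", r"[ \n\t]+"),
    ("ERROR",      r"."),
]

def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.match(pattern, text[i:])
            # Longest match wins; ties go to the rule listed first.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        tokens.append((best_name, text[i:i + best_len]))
        i += best_len
    return tokens
```

Note how "if" matches both KEYWORD and IDENTIFIER with the same length, so the first-listed rule (KEYWORD) wins, while "ifx" is a longer IDENTIFIER match and therefore lexes as an identifier.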
Make a regular expression for:
• A keyword is a reserved word whose meaning is already defined by the
programming language. We cannot use a keyword for any other purpose
in a program. Every programming language has some set of
keywords.
Examples: int, do, while, void, return, …………
Make a regular expression for:
• Identifiers
Identifiers are the names given to different programming elements. Whether it is the
name given to a variable, a function, or any other programming element,
all follow some basic naming conventions listed below:

1. Keywords must not be used as an identifier.
2. An identifier must begin with a letter a-z A-Z or an underscore _ symbol.
3. An identifier can contain letters a-z A-Z, digits 0-9 and the underscore _ symbol.
4. An identifier must not contain any special character (e.g. !@$*.'[] etc.) except
underscore _.
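The four rules above can be checked mechanically; a minimal sketch, where the keyword set shown is only an illustrative subset:

```python
import re

# Illustrative subset of keywords; a real language has its own fixed set.
KEYWORDS = {"int", "do", "while", "void", "return", "if", "else"}
IDENT_RE = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*\Z")

def is_identifier(s):
    # Rules 2-4: starts with a letter or underscore, then only letters,
    # digits, and underscores. Rule 1: must not be a keyword.
    return bool(IDENT_RE.match(s)) and s not in KEYWORDS
```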
Make a regular expression for:
• Operator
Operators are the symbols used for arithmetical or logical operations.
Different programming languages provide different sets of operators; some
common operators are:
• Arithmetic operator (+, -, *, /, %)
• Assignment operator (=)
• Relational operator (>, <, >=, <=, ==, !=)
• Logical operator (&&, ||, !)
• Bitwise operator (&, |, ^, ~, <<, >>)
• Increment/Decrement operator (++, --)
• Conditional/Ternary operator (? :)
Make a regular expression for:
• Literals
Literals are constant values that are used for performing various operations and
calculations. There are basically three types of literals:
1. Integer literal
An integer literal represents integer or numeric values.
Example: 1, 100, -12312 etc.
2. Floating point literal
A floating point literal represents fractional values.
Example: 2.123, 1.02, -2.33, 13e54, -23.3 etc.
3. Character literal
A character literal represents character values. Single characters are enclosed in single
quotes (' ') while sequences of characters are enclosed in double quotes (" ").
Example: 'a', 'n', "Hello", "Hello123" etc.
Finite Automata
• Regular expressions = specification
• Finite automata = implementation

• A finite automaton consists of

• An input alphabet Σ
• A finite set of states S
• A start state q0
• A set of accepting states F ⊆ S
• A set of transitions δ: state --input--> state
Finite Automata
• Transition
s1 --a--> s2
• Is read:
In state s1, on input a, go to state s2

• If at end of input and in an accepting state => accept

• Otherwise => reject
• Terminates in a state s ∉ F, or
• Gets stuck (no transition on the current input)
Finite Automata
• A state

• The start state

• An accepting state

• A transition (an arrow between states labeled with an input symbol, e.g. a)
Finite Automata
• A finite automaton that accepts only “a”

q0 --a--> q1

• What happens if the input strings are:


• “a”
• “b”
• “ab”

• The language of a finite automaton is the set of accepted strings.


Finite Automata
• A finite automaton accepting any number of 0’s followed by a single 1.

q0 --0--> q0 (self-loop), q0 --1--> q1 (accepting)

Input “001”: q0 --0--> q0 --0--> q0 --1--> q1, end of input in accepting state q1 => Accept
Input “011”: q0 --0--> q0 --1--> q1, stuck on the remaining ‘1’ => Reject
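The two traces above can be reproduced by simulating the automaton directly; a minimal sketch:

```python
def accepts_zeros_then_one(s):
    # DFA from the slide: q0 loops on '0' and moves to q1 on '1';
    # q1 is the only accepting state and has no outgoing transitions.
    delta = {("q0", "0"): "q0", ("q0", "1"): "q1"}
    state = "q0"
    for ch in s:
        if (state, ch) not in delta:
            return False              # stuck => reject
        state = delta[(state, ch)]
    return state == "q1"              # accept only if we end in q1
```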
Regular Expressions to non-deterministic
finite automata (NFA)

Lexical Specification → Regular Expressions → Non-deterministic Finite Automata (NFA) → Deterministic Finite Automata (DFA) → Table-driven Implementation of DFA
Regular Expressions to NFA
• For each kind of regular expression, define an equivalent NFA that accepts
exactly the same language as the regular expression.

NFA for regular expression M: a machine with one start state and one accepting state, accepting exactly L(M)

• For ε: start --ε--> accept

• For input a: start --a--> accept
Regular Expressions to NFA
• Concatenation
• For RS: run the machine for R, then take an ε-move from R’s accepting state into the machine for S

• Union
• For R + S: a new start state with ε-moves into the machines for R and S, and ε-moves from their accepting states into a new accepting state

• Iteration
• For R*: a new start state with an ε-move into the machine for R and an ε-move directly to a new accepting state; R’s accepting state has ε-moves back to R’s start and on to the new accepting state
Regular Expressions to NFA
• Consider the regular expression (0+1)(01)*
• For 0: start --0--> accept

• For 1: start --1--> accept

• For 0 + 1: a new start state with ε-moves into the machines for 0 and 1, and ε-moves from each into a new accepting state
Regular Expressions to NFA
• Consider the regular expression (0+1)(01)*

• For 01 0 ε 1

ε
• For (01)*
ε 0 ε 1 ε

ε
Regular Expressions to NFA
• Consider the regular expression (0+1)(01)*

The resulting NFA (states A–L):
A --ε--> B, A --ε--> D, B --0--> C, D --1--> E, C --ε--> F, E --ε--> F,
F --ε--> G, G --ε--> H, G --ε--> L, H --0--> I, I --ε--> J, J --1--> K,
K --ε--> H, K --ε--> L; start state A, accepting state L
Regular Expressions to non-deterministic
finite automata (NFA)

Lexical Specification → Regular Expressions → Non-deterministic Finite Automata (NFA) → Deterministic Finite Automata (DFA) → Table-driven Implementation of DFA
NFA to DFA
• Simulate the NFA
• Each state of the DFA
= a non-empty subset of states of the NFA
• Start state of the DFA
= the set of NFA states reachable through ε-moves from the NFA start state
• Add a transition S --a--> S’ to the DFA if
– S’ is the set of NFA states reachable from any
state in S after seeing the input a, considering ε-moves as well
• Final states of the DFA
= the sets that include the final state of the NFA
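The three rules above are the classic subset construction; a minimal sketch, with the NFA given as transition tables (the dictionary encoding is our own choice):

```python
def epsilon_closure(states, eps):
    # All NFA states reachable from `states` via epsilon-moves alone.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, []):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(start, accept, delta, eps, alphabet):
    # DFA start state: epsilon-closure of the NFA start state.
    dfa_start = epsilon_closure({start}, eps)
    dfa_delta, seen, worklist = {}, {dfa_start}, [dfa_start]
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            # NFA states reachable from any state in S on input a ...
            moved = {t for s in S for t in delta.get((s, a), [])}
            if not moved:
                continue
            # ... considering epsilon-moves as well.
            T = epsilon_closure(moved, eps)
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    # DFA accepting states: subsets containing the NFA final state.
    dfa_accept = {S for S in seen if accept in S}
    return dfa_start, dfa_accept, dfa_delta

# Tiny illustrative NFA for a*b: state 0 loops on 'a' and moves to 1 on 'b'.
d_start, d_accept, d_delta = nfa_to_dfa(0, 1, {(0, "a"): [0], (0, "b"): [1]}, {}, "ab")
```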
NFA to DFA
• NFA for (0+1)(01)* (states A–L):
A --ε--> B, A --ε--> D, B --0--> C, D --1--> E, C --ε--> F, E --ε--> F,
F --ε--> G, G --ε--> H, G --ε--> L, H --0--> I, I --ε--> J, J --1--> K,
K --ε--> H, K --ε--> L; start state A, accepting state L

• Resulting DFA (start state ABD; accepting states CFGHL, EFGHL, KLH):
ABD --0--> CFGHL, ABD --1--> EFGHL
CFGHL --0--> IJ, EFGHL --0--> IJ
IJ --1--> KLH, KLH --0--> IJ
Regular Expressions to non-deterministic
finite automata (NFA)

Lexical Specification → Regular Expressions → Non-deterministic Finite Automata (NFA) → Deterministic Finite Automata (DFA) → Table-driven Implementation of DFA
Implementation of DFA
• A DFA can be implemented by a 2D table T
– One dimension is “states”
– Other dimension is “input symbol”

– For every transition Si --a--> Sk define T[i, a] = k

(rows indexed by states, columns by input symbols; entry T[i, a] gives the next state)
Implementation of DFA
• DFA for (0+1)(01)*:
S0 --0--> S1, S0 --1--> S2, S1 --0--> S3, S2 --0--> S3, S3 --1--> S4, S4 --0--> S3

        0    1
S0      S1   S2
S1      S3   -
S2      S3   -
S3      -    S4
S4      S3   -
Implementation of DFA
i = 0;
state = 0;
while (input[i]) {
    /* column index: map characters '0'/'1' to 0/1 */
    state = T[state][input[i++] - '0'];
}
/* accept if the final state is accepting */

(transition table T as on the previous slide)
Implementation of DFA
• DFA for (0+1)(01)* and its transition table:
S0 --0--> S1, S0 --1--> S2, S1 --0--> S3, S2 --0--> S3, S3 --1--> S4, S4 --0--> S3

        0    1
S0      S1   S2
S1      S3   -
S2      S3   -
S3      -    S4
S4      S3   -
Implementation of NFA
• Transition table for the NFA of (0+1)(01)* (start state A, accepting state L):

State   0     1     ε
A       -     -     {B, D}
B       {C}   -     -
C       -     -     {F}
D       -     {E}   -
E       -     -     {F}
F       -     -     {G}
G       -     -     {H, L}
H       {I}   -     -
I       -     -     {J}
J       -     {K}   -
K       -     -     {L, H}
L       -     -     -
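An NFA with a table like the one above can be run directly: keep the set of currently possible states and take the ε-closure after every input symbol. A sketch using these transitions (the dictionary encoding is our own):

```python
def run_nfa(s, start, accepting, delta, eps):
    # Simulate the NFA: track the set of possible states, taking the
    # epsilon-closure after the start and after every input symbol.
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for t in eps.get(q, []):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = closure({start})
    for ch in s:
        current = closure({t for q in current for t in delta.get((q, ch), [])})
    return bool(current & accepting)

# Transition tables for the NFA of (0+1)(01)* from the table above.
EPS = {"A": ["B", "D"], "C": ["F"], "E": ["F"], "F": ["G"],
       "G": ["H", "L"], "I": ["J"], "K": ["L", "H"]}
DELTA = {("B", "0"): ["C"], ("D", "1"): ["E"],
         ("H", "0"): ["I"], ("J", "1"): ["K"]}
```

For example, "001" is accepted (a "0" from (0+1) followed by one "01" repetition), while "011" is not.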
Summary
• Conversion of NFA to DFA is the key step
• DFAs are faster to run but less compact: the tables can be very large
• NFAs are slower to simulate but more concise
• In practice, tools provide tradeoffs between speed and space
• Tools generally offer a series of options, via configuration files or
command-line flags, that let you choose whether to be closer
to a full DFA or to a pure NFA
Assignment 1 (Lexical Analyzer)
