Chapter 2
Lexical analysis
By Melese Alemante
2025
Outline
Introduction
Interaction of the Lexical Analyzer with the
Parser
Token, pattern, lexeme
Specification of patterns using regular expressions
Regular expressions
Regular expressions for tokens
NFA and DFA
Conversion from RE to NFA to DFA…
Lex Scanner Generator
Creating a Lexical Analyzer with Lex
Regular Expressions in Lex
Lex specifications and examples
Introduction
❑ The role of the lexical analyzer is:
• to read a sequence of characters from the source program,
• to group them into lexemes, and
• to produce as output a token for each lexeme in the source program.
❑ The scanner can also perform the following secondary tasks:
• stripping out blanks, tabs, and new lines
• stripping out comments
• keeping track of line numbers (for error reporting)
Interaction of the Lexical Analyzer
with the Parser
[Figure: the lexical analyzer reads the source program and supplies tokens to the parser on demand; both consult the symbol table, which contains a record for each identifier.]
Token, pattern, lexeme…
Example: The following table shows some tokens and their lexemes in Pascal (a high-level, case-insensitive programming language):

Token   Some lexemes                 Pattern
begin   begin, Begin, BEGIN, beGin…  begin in small or capital letters
if      if, IF, iF, If               if in small or capital letters
ident   Distance, F1, x, Dist1, …    a letter followed by zero or more letters and/or digits
Attributes of tokens…
When a pattern matches more than one lexeme, the lexical analyzer passes an attribute value along with the token, e.g. <id, pointer to the symbol-table entry> or <num, integer value>.
Errors
Very few errors are detected by the lexical analyzer.
For example, if the programmer mistypes begin as ebgin, the lexical analyzer cannot detect the error, since it will consider ebgin a valid identifier.
Nonetheless, if a certain sequence of characters follows none of the specified patterns, the lexical analyzer can detect the error.
Errors…
When an error occurs, the lexical analyzer recovers by:
• skipping (deleting) successive characters from the remaining input until it can find a well-formed token (panic-mode recovery)
• deleting one character from the remaining input
• inserting a missing character into the remaining input
• replacing an incorrect character with a correct one
• transposing two adjacent characters
Errors…
Example:
int num = 42#56; // Invalid character (#)
• The lexer may skip # to recover.
int x = 9$;
• The lexer deletes $, assuming it was a typo.
int y 10; // Missing '='
• The lexer inserts = to correct it: int y = 10;
int val# = 5; // Incorrect: '#' is not allowed in variable names
• The lexer may replace # with _, assuming it was a typo.
pritnf("Hello"); // Incorrect: "pritnf" instead of "printf"
• The lexer detects that "pritnf" is similar to "printf" and corrects it by transposing the two adjacent characters.
Specification of patterns using
regular expressions
Regular expressions
Regular expressions for tokens
Regular expression: Definitions
A regular expression over an alphabet Σ is defined inductively:
• ε is a regular expression denoting the language {ε}.
• Each symbol a in Σ is a regular expression denoting the language {a}.
• If r and s are regular expressions denoting the languages L(r) and L(s), then r|s denotes L(r) ∪ L(s), rs denotes L(r)L(s), and r* denotes (L(r))*.
Regular expression: Language Operations
Union of L and M:
L ∪ M = {s | s ∈ L or s ∈ M}
Concatenation of L and M:
LM = {xy | x ∈ L and y ∈ M}
Exponentiation of L:
L^0 = {ε}; L^i = L^(i-1)L
Kleene closure of L:
L* = ∪ i=0,…,∞ L^i
Positive closure of L:
L+ = ∪ i=1,…,∞ L^i
The following shorthands are often used:
r+ = rr*
r* = r+ | ε
r? = r | ε   (optional operator)
Examples
L1 = {a,b,c,d}, L2 = {1,2}
L1 ∪ L2 = {a,b,c,d,1,2}
L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
L1* = all strings over the letters a,b,c,d, including the empty string.
L1+ = the set of all strings of one or more of the letters a,b,c,d; the empty string is not included.
Regular expressions…
Examples (more):
1. a | b = {a,b}
2. (a|b)a = {aa,ba}
3. (ab) | ε = {ab, ε}
4. ((a|b)a)* = {ε, aa, ba, aaaa, baba, …}
5. Even binary numbers: (0|1)*0
6. Consider an alphabet of just three characters, Σ = {a,b,c}, and the set of all strings over this alphabet that contain exactly one b:
(a|c)*b(a|c)*, e.g. {b, abc, abaca, baaaac, ccbaca, cccccb}
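Example 6 can also be checked mechanically: a minimal hand-written recognizer in C for (a|c)*b(a|c)* (the function name matches_one_b is our own, not from any tool):

```c
/* Recognizer for (a|c)*b(a|c)* over the alphabet {a,b,c}:
   accepts exactly the strings that contain exactly one 'b'.
   A hand-written sketch, not generated by Lex. */
int matches_one_b(const char *s) {
    int bs = 0;                        /* number of 'b's seen so far */
    for (; *s; s++) {
        if (*s == 'b') bs++;
        else if (*s != 'a' && *s != 'c')
            return 0;                  /* symbol outside the alphabet */
    }
    return bs == 1;
}
```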
Exercises
Describe the languages denoted by the following
regular expressions:
1 a(a|b)*a
2 ((ε|a)b*)*
3 (a|b)*a(a|b)(a|b)
4 a*ba*ba*ba*
5 (aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*
Regular expressions for tokens
Regular expressions for tokens…
Special symbols: including arithmetic operators, assignment and equality, such as =, :=, +, -, *
Identifiers: which are defined to be a sequence of letters and digits beginning with a letter.
We can express this in terms of regular definitions as follows:
letter = A|B|…|Z|a|b|…|z
digit = 0|1|…|9
or
letter = [a-zA-Z]
digit = [0-9]
identifier = letter(letter|digit)*
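This identifier definition translates directly into code; a minimal sketch in C (is_identifier is our own helper name, and note that underscores are not letters under this definition):

```c
#include <ctype.h>

/* Recognizer for identifier = letter(letter|digit)*. */
int is_identifier(const char *s) {
    if (!isalpha((unsigned char)*s))
        return 0;                         /* must start with a letter */
    for (s++; *s; s++)
        if (!isalnum((unsigned char)*s))
            return 0;                     /* then letters/digits only */
    return 1;
}
```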
Regular expressions for tokens…
Numbers: Numbers can be:
• a sequence of digits (natural numbers), or
• decimal numbers, or
• numbers with an exponent (indicated by an e or E).
Example: 2.71E-2 represents the number 0.0271.
We can write regular definitions for these numbers as follows:
nat = [0-9]+
signedNat = (+|-)? nat
number = signedNat(“.” nat)?(E signedNat)?
Literals or constants: which can include:
• numeric constants such as 42, and
• string literals such as “hello, world”.
Example: Divide the following Java program into
appropriate tokens.
public class Dog {
private String name;
private String color;
Transition diagram that recognizes the lexemes
matching the token relop and id.
Design of a Lexical Analyzer/Scanner
Finite Automata
❑ Lex – turns its input program into a lexical analyzer.
❑ At the heart of the transition is the formalism known as finite automata.
❑ Finite automata are graphs, like transition diagrams, with a few differences:
1. Finite automata are recognizers; they simply say "yes" or "no" about each possible input string.
2. Finite automata come in two flavors:
a) Nondeterministic finite automata (NFA) have no restrictions on the labels of their edges; ε, the empty string, is a possible label.
b) Deterministic finite automata (DFA) have, for each state and for each symbol of the input alphabet, exactly one edge with that symbol leaving that state.
The Whole Scanner Generator Process
Overview
❑ Direct construction of a Nondeterministic Finite Automaton (NFA) to recognize a given regular expression.
• Easy to build in an algorithmic way
• Requires ε-transitions to combine regular subexpressions
❑ Construct a Deterministic Finite Automaton (DFA) to simulate the NFA (optional).
• Use a set-of-states construction
❑ Minimize the number of states in the DFA.
❑ Generate the scanner code.
Design of a Lexical Analyzer …
Token ➔ Pattern
Pattern ➔ Regular Expression
Regular Expression ➔ NFA
NFA ➔ DFA
DFA’s or NFA’s for all tokens ➔ Lexical Analyzer
Non-Deterministic Finite Automata
(NFA)
Definition
An NFA M is a five-tuple (Σ, S, T, s0, F):
• a set of input symbols Σ, the input alphabet,
• a finite set of states S,
• a transition function T: S × (Σ ∪ {ε}) -> 2^S, giving the set of possible next states,
• a start state s0 from S, and
• a set of accepting/final states F ⊆ S.
The language accepted by M, written L(M), is defined as:
the set of strings of characters c1c2…cn, with each ci from Σ ∪ {ε}, such that there exist states s1 in T(s0,c1), s2 in T(s1,c2), …, sn in T(sn-1,cn), with sn an element of F.
NFA…
It is a finite automaton which has a choice of edges:
• The same symbol can label edges from one state to several different states.
• An edge may be labeled by ε, the empty string, so we can have transitions without consuming any input character.
Transition Graph
The transition graph for an NFA recognizing the language of regular expression (a|b)*abb
(all strings of a's and b's ending in the particular string abb):
[Figure: state 0 has self-loops on a and b, and edges 0 -a-> 1 -b-> 2 -b-> 3; state 3 is accepting.]
S = {0,1,2,3}
Σ = {a,b}
s0 = 0
F = {3}
Transition Table
The mapping T of an NFA can be represented in a transition table:

State   Input a   Input b   Input ε
0       {0,1}     {0}       ∅
1       ∅         {2}       ∅
2       ∅         {3}       ∅

Two possible runs on input aabb:
0 -a-> 0 -a-> 1 -b-> 2 -b-> 3   YES (ends in the accepting state 3)
0 -a-> 0 -a-> 0 -b-> 0 -b-> 0   NO
Another NFA
[Figure: from the start state, an a-edge leads to an accepting state with a self-loop on a, and a b-edge leads to an accepting state with a self-loop on b.]
aa*|bb*
Deterministic Finite Automata (DFA)
A DFA is a special case of an NFA in which:
• there are no moves on input ε, and
• for each state s and input symbol a, there is exactly one edge out of s labeled a.
DFA example
A DFA that accepts (a|b)*abb:
[Figure: states 0-3 with transitions 0 -a-> 1, 0 -b-> 0, 1 -a-> 1, 1 -b-> 2, 2 -a-> 1, 2 -b-> 3, 3 -a-> 1, 3 -b-> 0; state 3 is accepting.]
Simulating a DFA: Algorithm
How to apply a DFA to a string.
INPUT:
• An input string x terminated by an end-of-file character eof.
• A DFA D with start state s0, accepting states F, and transition function move.
OUTPUT: Answer "yes" if D accepts x; "no" otherwise.
METHOD:
• Apply the algorithm on the next slide to the input string x.
• The function move(s, c) gives the state to which there is an edge from state s on input c.
• The function nextChar() returns the next character of the input string x.
Simulating a DFA
s = s0;
c = nextChar();
while ( c != eof ) {
    s = move(s, c);
    c = nextChar();
}
if ( s is in F ) return "yes";
else return "no";

(The DFA used here is the one accepting (a|b)*abb.)
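The algorithm above, instantiated for the DFA accepting (a|b)*abb, can be sketched in C with an explicit transition table (accepts_abb and the 0-3 state numbering are our own choices):

```c
/* Simulate the DFA for (a|b)*abb: states 0-3, accepting state 3.
   State s records how much of "abb" the most recent input matches. */
int accepts_abb(const char *x) {
    /* move[s][c]: c = 0 for 'a', c = 1 for 'b' */
    static const int move[4][2] = {
        {1, 0},   /* state 0: a -> 1, b -> 0 */
        {1, 2},   /* state 1: a -> 1, b -> 2 */
        {1, 3},   /* state 2: a -> 1, b -> 3 */
        {1, 0},   /* state 3: a -> 1, b -> 0 */
    };
    int s = 0;                                 /* start state s0 */
    for (; *x; x++) {
        if (*x != 'a' && *x != 'b') return 0;  /* not in the alphabet */
        s = move[s][*x == 'b'];
    }
    return s == 3;                             /* "yes" iff s is in F */
}
```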
Design of a Lexical Analyzer Generator
Two algorithms:
1. Translate a regular expression into an NFA (Thompson’s construction).
2. Translate the NFA into a DFA (subset construction).
Rules:
1. For ε, and for each single symbol a, construct:
[Figure: a start state connected to an accepting state by a single edge labeled ε or a.]
From regular expression to an NFA…
2- For a composition of regular expression:
Case 1: Alternation: for the regular expression s|r, assume that NFAs equivalent to r and s have been constructed.
[Figure: a new start state with ε-edges to the start states of the NFAs for s and r, and ε-edges from their accepting states to a new accepting state.]
From regular expression to an NFA…
Case 2: Concatenation: for the regular expression sr, connect the accepting state of the NFA for s to the start state of the NFA for r with an ε-edge.
Case 3: Repetition: for r*, add a new start state and a new accepting state, with ε-edges that allow the NFA for r to be skipped entirely or traversed repeatedly.
From RE to NFA: Exercises
From an NFA to a DFA
(subset construction algorithm)
Rules:
• The start state of D is ε-closure(s0), where s0 is the start state of N; it is initially unmarked.
• For each unmarked state S of D and each input symbol a, compute ε-closure(move(S, a)); add it as a new (unmarked) state of D if it is not already present, then mark S.
• A state of D is accepting if it contains at least one accepting state of N.
NFA to a DFA…
ε-closure
ε-closure(S’) is the set of states with the following characteristics:
1. S’ ∈ ε-closure(S’) itself.
2. If t ∈ ε-closure(S’) and there is an edge labeled ε from t to v, then v ∈ ε-closure(S’).
3. Repeat step 2 until no more states can be added to ε-closure(S’).
E.g., for the NFA of (a|b)*abb:
ε-closure(0) = {0, 1, 2, 4, 7}
ε-closure(1) = {1, 2, 4}
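The ε-closure computation (steps 1-3 above) can be sketched with bitsets; the ε-edge table below is for the Thompson NFA of (a|b)*abb with the same 0-10 state numbering as the example, and eps_closure is our own name:

```c
#define NSTATES 11

/* eps[s] = bitmask of states reachable from s by a single ε-edge,
   for the Thompson NFA of (a|b)*abb (states 0-10). */
static const unsigned eps[NSTATES] = {
    [0] = (1u << 1) | (1u << 7),
    [1] = (1u << 2) | (1u << 4),
    [3] = 1u << 6,
    [5] = 1u << 6,
    [6] = (1u << 1) | (1u << 7),
};

/* Repeatedly add ε-successors until nothing new appears (step 3). */
unsigned eps_closure(unsigned set) {
    unsigned result = set, frontier = set;
    while (frontier) {
        unsigned next = 0;
        for (int s = 0; s < NSTATES; s++)
            if (frontier & (1u << s))
                next |= eps[s];
        frontier = next & ~result;   /* keep only genuinely new states */
        result |= next;
    }
    return result;
}
```

With this table, eps_closure of {0} yields {0,1,2,4,7}, matching ε-closure(0) above.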
NFA for identifier: letter(letter|digit)*
[Figure: start state 0 -letter-> 1 -ε-> 2; from 2, ε-edges to 3 and 5; 3 -letter-> 4 and 5 -digit-> 6; ε-edges from 4 and 6 to 7; from 7, an ε-edge back to 2 and an ε-edge to the accepting state 8.]
NFA to a DFA…
Example: Convert the NFA for letter(letter|digit)* into the corresponding DFA.
A = {0}
B = {1, 2, 3, 5, 8}
C = {4, 7, 2, 3, 5, 8}
D = {6, 7, 8, 2, 3, 5}
[Figure: DFA with start state A; A -letter-> B; B -letter-> C and B -digit-> D; C -letter-> C and C -digit-> D; D -letter-> C and D -digit-> D; B, C, and D are accepting.]
Exercise: convert the NFA of (a|b)*abb into a DFA.
Other Algorithms
The Lexical- Analyzer Generator: Lex
The first phase in a compiler reads the input source and converts strings in the source to tokens.
Lex: generates a scanner (lexical analyzer or lexer) given a specification of the tokens using REs.
• The input notation for the Lex tool is referred to as the Lex language, and
• the tool itself is the Lex compiler.
• The Lex compiler transforms the input patterns into a transition diagram and generates code, in a file called lex.yy.c, that simulates this transition diagram.
General Compiler Infra-structure
[Figure: program source (a stream of characters) -> Scanner (tokenizer) -> tokens -> Parser -> parse tree -> Semantic Routines -> annotated/decorated tree -> Analysis/Transformations/optimizations -> IR (Intermediate Representation) -> Code Generator -> assembly code; the Symbol and literal Tables are consulted throughout.]
Scanner, Parser, Lex and Yacc
Generating a Lexical Analyzer using Lex
Lex is a scanner generator: it takes a lexical specification as input, and produces a lexical analyzer written in C.

lex.l (Lex source program) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out (the lexical analyzer)
Lex specification
➢ Program structure:
...declaration section...
%%
...rules section...
%%
...user defined functions...
• Declaration section – variables, constants; C declarations go between %{ and %}.
• Rules section – pairs of regular expression <--> action, one per line:
P1 { action1 }
P2 { action2 }
• The actions are C program fragments.
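A complete minimal specification in this layout might look as follows (a sketch: it counts lines and words; the variable names and the word substitution are our own):

```lex
%{
#include <stdio.h>
int lines = 0, words = 0;     /* C declarations, copied verbatim */
%}
word    [a-zA-Z]+
%%
\n        { lines++; }
{word}    { words++; }
.         { /* ignore any other character */ }
%%
int main(void) {
    yylex();
    printf("%d lines, %d words\n", lines, words);
    return 0;
}
int yywrap(void) { return 1; }
```

Running Lex on this file produces lex.yy.c, which is then compiled with a C compiler as shown above.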
Skeleton of a lex specification (.l file)
x.l (the *.c file is generated after running Lex)
%{
< C global variables, prototypes, comments >
%}
(this part is copied as-is to the top of the generated C file)
[DEFINITION SECTION]
(substitutions simplify pattern matching)
Simulating a combined NFA (built by Thompson’s construction) on input aaba:
ε-closure({0}) = {0,1,3,7}
move({0,1,3,7}, a) = {2,4,7}
ε-closure({2,4,7}) = {2,4,7}
move({2,4,7}, a) = {7}
ε-closure({7}) = {7}
move({7}, b) = {8}
ε-closure({8}) = {8}
move({8}, a) = ∅
Combining and simulation of NFAs of a Set of
Regular Expressions: Example 2
Patterns:
a      { action1 }
abb    { action2 }
a*b+   { action3 }
[Figure: NFAs for the three patterns - a: 1 -a-> 2; abb: 3 -a-> 4 -b-> 5 -b-> 6; a*b+: 7 (self-loop on a) -b-> 8 (self-loop on b) - combined by a new start state 0 with ε-edges to 1, 3, and 7.]
When two or more accepting states are reached, the action of the pattern listed first is executed.
Simulating the combined NFA on input abb:
{0,1,3,7} -a-> {2,4,7}   (2 accepting: action1)
{2,4,7} -b-> {5,8}       (8 accepting: action3)
{5,8} -b-> {6,8}         (6 and 8 accepting: action2, since abb is listed before a*b+)
DFA's for Lexical Analyzers
NFA → DFA: transition table for the DFA.

State   a      b      Token found
0137    247    8      None
247     7      58     a
8       -      8      a*b+
7       7      8      None
58      -      68     a*b+
68      -      8      abb
Pattern matching examples
Meta-characters
Characters with special meaning in Lex patterns include: . * + ? | ( ) [ ] { } ^ $ / \ " < >
To match one of them literally, quote it (".") or escape it (\.).
Lex Regular Expression: Examples
• an integer: 12345
[1-9][0-9]*
• a word: cat
[a-zA-Z]+
• a (possibly) signed integer: 12345 or -12345
[-+]?[1-9][0-9]*
• a floating point number: 1.2345
[0-9]*"."[0-9]+
Regular Expression: Examples…
• a delimiter for an English sentence:
"." | "?" | !   OR
[".""?"!]
• C++ comment: // call foo() here!!
"//".*
• white space:
[ \t]+
• English sentence: Look at this!
([ \t]+|[a-zA-Z]+)+("."|"?"|!)
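Outside of Lex, the same patterns can be tried out with the POSIX regex API (regcomp/regexec with REG_EXTENDED); full_match is our own helper, and the anchoring imitates a scanner rule matching a whole lexeme:

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole string s matches the extended RE pattern,
   0 if it does not, and -1 if the pattern fails to compile. */
int full_match(const char *pattern, const char *s) {
    char anchored[256];
    regex_t re;
    snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Note that POSIX EREs differ from Lex patterns in small ways (e.g. no {name} substitutions), so this is only an approximation for experimenting.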
Two Rules
1. Longest match: the scanner chooses the longest prefix of the remaining input that matches some pattern.
2. Rule priority: if the longest prefix matches two or more patterns, the pattern listed first in the Lex specification wins.
Suppose the input is ifx and the lex specification includes:
if → return IF_TOKEN
[a-zA-Z]+ → return IDENTIFIER_TOKEN
Without the longest match rule, lex might match just i or if and stop, but
with this rule, it scans ahead:
• It sees if (matches IF_TOKEN, length 2).
• It continues to ifx and checks if a longer match is possible (e.g., ifx as
an identifier, length 3).
If no longer match is found for ifx as a single token, it commits to if as
IF_TOKEN and leaves x for the next tokenization.
However, if ifx is defined as an identifier, lex would match the entire ifx
(length 3) as IDENTIFIER_TOKEN if it’s the longest possible match.
Suppose the input is if and the lex specification includes:
if        return IF_TOKEN;          // Defined first
[a-z]+    return IDENTIFIER_TOKEN;  // Defined second
Both patterns match the string if (length 2), so the longest-match rule does not decide between them; rule priority does, and if is returned as IF_TOKEN because that rule is listed first.
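The interaction of the two rules can be illustrated with a toy scanner for just these two patterns, "if" listed first and [a-zA-Z]+ second (lex_first is our own name, not part of Lex):

```c
#include <string.h>
#include <ctype.h>

/* Classify the first token of s: the longest match wins; on a tie,
   the pattern listed first ("if") wins. *len gets the lexeme length. */
const char *lex_first(const char *s, int *len) {
    int id_len = 0;
    while (isalpha((unsigned char)s[id_len]))
        id_len++;                                    /* [a-zA-Z]+ */
    int kw_len = (strncmp(s, "if", 2) == 0) ? 2 : 0; /* "if" */
    if (id_len == 0) { *len = 0; return "ERROR"; }   /* no pattern matches */
    if (id_len > kw_len) {                           /* longest match rule */
        *len = id_len;
        return "IDENTIFIER_TOKEN";
    }
    *len = kw_len;                                   /* tie: rule priority */
    return "IF_TOKEN";
}
```

On input ifx the identifier match (length 3) beats the keyword match (length 2); on input if both have length 2 and the keyword rule, listed first, wins.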
Lex functions
• yylex() – the generated scanner; returns the next token from the input.
• yywrap() – called at end of input; returning 1 stops scanning.
• yymore() – appends the next match to the current yytext.
• yyless(n) – pushes back all but the first n characters of the current match.
Lex predefined variables
• yytext – the text (lexeme) of the current match, as a C string.
• yyleng – the length of yytext.
• yyin / yyout – the input and output streams (default stdin and stdout).
Thank you!