CSI 411 - Compiler - Lecture 3


Lecture 3 - Lexical Analyzer (Continued)

Adnan Ferdous Ashrafi

Stamford University Bangladesh

Adnan Ferdous Ashrafi Lecture 3 - Lexical Analyzer (Continued) 1 / 16


1 Extensions of Regular Expressions
Variants of Regular Expressions
Example of a Regular Expression
Exercise
Some common Regular Expressions

2 Transition Diagrams
States
Edges
Rules for construction
Examples of a Transition Diagram
Exercise

3 Conclusion



Extensions of Regular Expressions

Regular Expressions variants


Since Kleene introduced regular expressions with the basic operators for union,
concatenation, and Kleene closure in the 1950s, many extensions have been
added to regular expressions to enhance their ability to specify string patterns.



Variants of Regular Expressions
1 One or more instances: The unary, postfix operator + represents the
positive closure of a regular expression and its language. That is, if r
is a regular expression, then (r)+ denotes the language (L(r))+. The +
operator has the same precedence and associativity as the operator *. Two
useful algebraic laws, r* = r+ | ε and r+ = rr* = r*r, relate the Kleene
closure and positive closure.
2 Zero or one instance: The unary postfix operator ? means "zero or one
occurrence." That is, r? is equivalent to r | ε, or put another way,
L(r?) = L(r) ∪ {ε}. The ? operator has the same precedence and
associativity as * and +.
3 Character classes: A regular expression a1 | a2 | ... | an, where the ai's
are each symbols of the alphabet, can be replaced by the shorthand
[a1 a2 ... an]. More importantly, when a1, a2, ..., an form a logical
sequence, e.g., consecutive uppercase letters, lowercase letters, or
digits, we can replace them by a1-an, that is, just the first and last
separated by a hyphen. Thus, [abc] is shorthand for a|b|c, and [a-z] is
shorthand for a|b|...|z.
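
The algebraic laws and shorthands above can be spot-checked with Python's re module. This is only a sketch: it takes r to be the single symbol a, writes ε as an empty alternative, and compares the patterns on a handful of sample strings, so it illustrates the laws without proving them. The helper name matches is an assumption made for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` is in the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None

samples = ["", "a", "aa", "aaa", "b", "ab"]

# r* = r+ | ε   (the empty alternative after | plays the role of ε)
law_star = all(matches(r"a*", s) == matches(r"(?:a+|)", s) for s in samples)

# r+ = rr* = r*r
law_plus = all(
    matches(r"a+", s) == matches(r"aa*", s) == matches(r"a*a", s)
    for s in samples
)

# r? = r | ε
law_optional = all(matches(r"a?", s) == matches(r"(?:a|)", s) for s in samples)

# [abc] = a|b|c
law_class = all(
    matches(r"[abc]", s) == matches(r"a|b|c", s) for s in ["a", "b", "c", "d", ""]
)

print(law_star, law_plus, law_optional, law_class)  # True True True True
```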
Example of a Regular Expression

Example 1
Find the regular expression for a "word"

Solution: [A-Za-z]+

Example 2
Find the regular expression for an "integer number"

Solution: [0-9]+

Example 3
Find the regular expression for a "word that ends with the letter 'a'"

Solution: [A-Za-z]*a
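
A quick way to try patterns like these is Python's re.fullmatch, which succeeds only when the entire string matches. The sketch below checks the patterns for a word and an integer number, and for the third example compares two candidate patterns, [A-Za-z]+a and [A-Za-z]*a, which differ only on the one-letter word "a". The constant and function names are assumptions made for this illustration.

```python
import re

WORD = r"[A-Za-z]+"
INTEGER = r"[0-9]+"
ENDS_WITH_A_PLUS = r"[A-Za-z]+a"  # requires at least one letter before the final a
ENDS_WITH_A_STAR = r"[A-Za-z]*a"  # also accepts the single letter "a"

def accepts(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(accepts(WORD, "Compiler"))           # True
print(accepts(WORD, "x86"))                # False: digits are not letters
print(accepts(INTEGER, "411"))             # True
print(accepts(INTEGER, "-411"))            # False: the pattern has no sign
print(accepts(ENDS_WITH_A_PLUS, "Dhaka"))  # True
print(accepts(ENDS_WITH_A_PLUS, "a"))      # False
print(accepts(ENDS_WITH_A_STAR, "a"))      # True
```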



Do it yourself

Exercise
Find the regular expressions for the following languages:
1 All strings of lowercase letters that contain the five vowels in order.
2 A hexadecimal number
3 A double/float number



Some Common Regular Expressions

digit      −→ [0-9]
digits     −→ [0-9]+
number     −→ [0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?
letter     −→ [A-Za-z]
id         −→ letter (letter | digit)*
if         −→ if
else       −→ else
relop      −→ < | > | <= | >= | = | <>
whitespace −→ [ \t\n]+
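
As a rough illustration, these definitions can be combined into one master pattern and used as a tiny tokenizer in Python. The names TOKEN_SPEC, MASTER and tokenize are assumptions for this sketch, not part of the lecture. Note that Python's re tries alternatives in order rather than picking the longest match, so the keywords are listed before id (with \b so that a word such as "iffy" is still an id) and <= is listed before <.

```python
import re

DIGITS = r"[0-9]+"
LETTER = r"[A-Za-z]"

# Order matters: Python picks the first alternative that matches.
TOKEN_SPEC = [
    ("NUMBER", rf"{DIGITS}(?:\.{DIGITS})?(?:E[+-]?{DIGITS})?"),
    ("IF", r"if\b"),
    ("ELSE", r"else\b"),
    ("ID", rf"{LETTER}(?:{LETTER}|[0-9])*"),
    ("RELOP", r"<=|>=|<>|<|>|="),
    ("WS", r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text: str):
    """Split `text` into (token name, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("if x1 <= 3.14E+2 else y"))
# [('IF', 'if'), ('ID', 'x1'), ('RELOP', '<='), ('NUMBER', '3.14E+2'), ('ELSE', 'else'), ('ID', 'y')]
```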



Transition Diagrams

Definition
As an intermediate step in the construction of a lexical analyzer, we first
convert patterns into stylized flowcharts, called "transition diagrams."



States

Definition
Transition diagrams have a collection of nodes or circles, called states. Each
state represents a condition that could occur during the process of scanning the
input looking for a lexeme that matches one of several patterns. We may think
of a state as summarizing all we need to know about what characters we have
seen between the lexemeBegin pointer and the forward pointer.



Edges

Definition
Edges are directed from one state of the transition diagram to another. Each
edge is labeled by a symbol or set of symbols. If we are in some state s, and
the next input symbol is a, we look for an edge out of state s labeled by a
(and perhaps by other symbols, as well). If we find such an edge, we advance
the forward pointer and enter the state of the transition diagram to which that
edge leads.

We shall assume that all our transition diagrams are deterministic, meaning
that there is never more than one edge out of a given state with a given symbol
among its labels.
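
One possible way to represent such edges in code is a table that maps each state to a list of (label set, target state) pairs. The sketch below is an assumption-laden illustration: the state names start and in_id and the functions is_deterministic, step and run are invented here. It checks the determinism condition and follows edges while advancing a forward index.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# state -> list of (set of symbols labelling the edge, target state)
EDGES = {
    "start": [(LETTERS, "in_id")],
    "in_id": [(LETTERS | DIGITS, "in_id")],
}

def is_deterministic(edges) -> bool:
    """True when no symbol labels two different edges out of the same state."""
    for out_edges in edges.values():
        seen = set()
        for label, _target in out_edges:
            if seen & label:
                return False
            seen |= label
    return True

def step(edges, state, symbol):
    """Return the state reached from `state` on `symbol`, or None if no edge fits."""
    for label, target in edges.get(state, []):
        if symbol in label:
            return target
    return None

def run(edges, start, text):
    """Follow edges from `start`, advancing forward once per symbol consumed."""
    state, forward = start, 0
    while forward < len(text):
        nxt = step(edges, state, text[forward])
        if nxt is None:
            break
        state, forward = nxt, forward + 1
    return state, forward

print(is_deterministic(EDGES))      # True
print(run(EDGES, "start", "x1+y"))  # ('in_id', 2)
```

A table in which two edges out of one state share a symbol, such as {"q": [({"a"}, "q1"), ({"a", "b"}, "q2")]}, fails the is_deterministic check.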



Rules for construction of Transition Diagrams
1 Certain states are said to be accepting, or final. These states indicate that
a lexeme has been found, although the actual lexeme may not consist of
all positions between the lexemeBegin and forward pointers. We always
indicate an accepting state by a double circle, and if there is an action to
be taken - typically returning a token and an attribute value to the parser -
we shall attach that action to the accepting state.
2 In addition, if it is necessary to retract the forward pointer one position
(i.e., the lexeme does not include the symbol that got us to the accepting
state), then we shall additionally place a * near that accepting state. In
our example, it is never necessary to retract forward by more than one
position, but if it were, we could attach any number of *’s to the accepting
state.
3 One state is designated the start state, or initial state; it is indicated by
an edge, labeled "start," entering from nowhere. The transition diagram
always begins in the start state before any input symbols have been read.
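
The three rules can be sketched in code for the relational operators. The state numbers, the OTHER marker, the attribute names (LT, LE, and so on) and the function get_relop are assumptions for this illustration and may not match the numbering used in the lecture's figure. Accepting states carry the action (the attribute to return) together with the number of positions to retract, which corresponds to the number of *'s.

```python
OTHER = object()  # hypothetical marker for an edge taken on any other symbol

START = 0
TRANSITIONS = {
    0: {"<": 1, "=": 5, ">": 6},
    1: {"=": 2, ">": 3, OTHER: 4},
    6: {"=": 7, OTHER: 8},
}
# accepting state -> (attribute value, positions to retract, i.e. number of *'s)
ACCEPTING = {
    2: ("LE", 0), 3: ("NE", 0), 4: ("LT", 1),
    5: ("EQ", 0), 7: ("GE", 0), 8: ("GT", 1),
}

def get_relop(text, lexeme_begin=0):
    """Return ((token, attribute), lexeme, new forward index), or None if no relop starts here."""
    state, forward = START, lexeme_begin
    while state not in ACCEPTING:
        # None stands for end of input; it can only follow an OTHER edge.
        symbol = text[forward] if forward < len(text) else None
        edges = TRANSITIONS[state]
        if symbol in edges:
            state = edges[symbol]
        elif OTHER in edges:
            state = edges[OTHER]
        else:
            return None
        forward += 1
    attribute, retract = ACCEPTING[state]
    forward -= retract  # the symbol that led here is not part of the lexeme
    return ("relop", attribute), text[lexeme_begin:forward], forward

print(get_relop("<="))      # (('relop', 'LE'), '<=', 2)
print(get_relop("<5"))      # (('relop', 'LT'), '<', 1)
print(get_relop("a>b", 1))  # (('relop', 'GT'), '>', 2)
print(get_relop("x"))       # None
```

In the second call the scanner reads 5 before it can decide that the lexeme is just <, so it lands in a starred state and moves forward back by one position.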



Transition Diagram - Example

Figure 1: Transition diagram for relop (Relational Operator)



Transition Diagram - One more Example

Figure 2: A transition diagram for id's and keywords



Transition Diagram - Yet one more Example

Figure 3: A transition diagram for unsigned numbers



Do it yourself

Exercise
Find the transition diagrams for the following regular expressions:
1 a(a|b)*a
2 (a|b)*a(a|b)(a|b)
3 a*ba*ba*ba*



Thank you.
Any Questions?

