CSI 411 - Compiler - Lecture 3 PDF
2 Transition Diagrams
States
Edges
Rules for construction
Examples of a Transition Diagram
Exercise
3 Conclusion
* and +.
3 Character classes: A regular expression a1|a2|...|an, where each ai is a
symbol of the alphabet, can be replaced by the shorthand [a1a2...an].
More importantly, when a1, a2, ..., an form a logical sequence, e.g.,
consecutive uppercase letters, lowercase letters, or digits, we can replace
them inside the brackets by a1-an, that is, just the first and last separated
by a hyphen. Thus, [abc] is shorthand for a|b|c, and [a-z] is shorthand for
a|b|...|z.
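One way to check this equivalence is to try both forms in a regex engine. The sketch below uses Python's `re` module as a stand-in for the lecture's notation; the variable names and sample characters are made up for illustration.

```python
import re
import string

# [abc] and a|b|c should accept exactly the same single characters.
class_form = re.compile(r"[abc]")
union_form = re.compile(r"a|b|c")

for ch in "abcdxyz":
    assert bool(class_form.fullmatch(ch)) == bool(union_form.fullmatch(ch))

# [a-z] is shorthand for a|b|...|z: build the long form and compare the two
# on every ASCII letter and digit.
range_form = re.compile(r"[a-z]")
long_form = re.compile("|".join(string.ascii_lowercase))

same = all(
    bool(range_form.fullmatch(ch)) == bool(long_form.fullmatch(ch))
    for ch in string.ascii_letters + string.digits
)
print("range and union forms agree:", same)
```

The shorthand adds no expressive power; it only shortens what could already be written with `|`.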
Adnan Ferdous Ashrafi Lecture 3 - Lexical Analyzer (Continued)
Example of a Regular Expression
Example 1
Find the regular expression for a "word"
Solution: [A-Za-z]+
Example 2
Find the regular expression for an "integer number"
Solution: [0-9]+
Example 3
Find the regular expression for a "word that ends with the letter 'a'"
Solution: [A-Za-z]*a
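The first two solutions can be tried out directly. This is a minimal sketch using Python's `re` module; the helper `matches` and the sample strings are hypothetical. It uses `fullmatch` because a pattern describes the whole lexeme, not just a prefix of it.

```python
import re

WORD = re.compile(r"[A-Za-z]+")     # Example 1
INTEGER = re.compile(r"[0-9]+")     # Example 2

def matches(pattern, text):
    """True when the whole string, not just a prefix, matches the pattern."""
    return pattern.fullmatch(text) is not None

samples = ["compiler", "Lexer", "42", "007", "x1", ""]
for s in samples:
    print(repr(s), "word:", matches(WORD, s), "integer:", matches(INTEGER, s))
```

Because both patterns use `+`, the empty string matches neither, and a mixed string such as `x1` is neither a word nor an integer.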
Exercise
Find the regular expressions for the following languages:
1 All strings of lowercase letters that contain the five vowels in order.
2 A hexadecimal number
3 A double/float number
digit −→ [0-9]
digits −→ [0-9]+
numbers −→ [0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?
letter −→ [A-Za-z]
id −→ letter (letter | digit)∗
if −→ if
else −→ else
relop −→ < | > | <= | >= | = | <>
whitespace −→ [\t\n ]+
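To see how such definitions turn into a working scanner, here is a minimal sketch in Python. It assumes that the fraction part of a number is a literal dot followed by one or more digits, and that an identifier is a letter followed by any run of letters and digits. The token names, `TOKEN_SPEC`, and `tokenize` are hypothetical. Note that Python's `|` picks the first alternative that matches rather than the longest, so the order of the entries matters.

```python
import re

# Regular definitions transcribed into Python regex syntax (assumed mapping).
DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"
LETTER = r"[A-Za-z]"

# Keywords come before ID, and two-character operators come before their
# one-character prefixes, because the first matching alternative wins.
TOKEN_SPEC = [
    ("WS", r"[ \t\n]+"),
    ("NUMBER", rf"{DIGITS}(?:\.{DIGITS})?(?:E[+-]?{DIGITS})?"),
    ("IF", r"if\b"),
    ("ELSE", r"else\b"),
    ("ID", rf"{LETTER}(?:{LETTER}|{DIGIT})*"),
    ("RELOP", r"<=|>=|<>|<|>|="),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (token name, lexeme) pairs, skipping whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for token in tokenize("if x1 <= 3.14E+2 else y"):
    print(token)
```

The `\b` after each keyword keeps an identifier such as `iffy` from being split into the keyword `if` followed by `fy`.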
Definition
As an intermediate step in the construction of a lexical analyzer, we first
convert patterns into stylized flowcharts, called "transition diagrams."
Definition
Transition diagrams have a collection of nodes or circles, called states. Each
state represents a condition that could occur during the process of scanning the
input looking for a lexeme that matches one of several patterns. We may think
of a state as summarizing all we need to know about what characters we have
seen between the lexemeBegin pointer and the forward pointer.
Definition
Edges are directed from one state of the transition diagram to another. Each
edge is labeled by a symbol or set of symbols. If we are in some state s, and
the next input symbol is a, we look for an edge out of state s labeled by a
(and perhaps by other symbols, as well). If we find such an edge, we advance
the forward pointer and enter the state of the transition diagram to which that
edge leads.
We shall assume that all our transition diagrams are deterministic, meaning
that there is never more than one edge out of a given state with a given symbol
among its labels.
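The edge-following rule can be sketched as a small table-driven loop. The Python below assumes a hypothetical two-state diagram for identifiers (a letter followed by letters or digits). The names `EDGES`, `ACCEPTING`, `is_deterministic`, and `scan` are illustrative, not part of the lecture.

```python
import string

LETTERS = frozenset(string.ascii_letters)
DIGITS = frozenset(string.digits)

# Each state maps to its outgoing edges; an edge is (label, target state),
# where the label is a set of symbols.
# Assumed diagram: 0 --letter--> 1, 1 --letter or digit--> 1.
EDGES = {
    0: [(LETTERS, 1)],
    1: [(LETTERS | DIGITS, 1)],
}
ACCEPTING = {1}

def is_deterministic(edges):
    """No symbol may label two different edges leaving the same state."""
    for outgoing in edges.values():
        seen = set()
        for label, _target in outgoing:
            if seen & label:
                return False
            seen |= label
    return True

def scan(text, lexeme_begin=0):
    """Follow edges from state 0; return the longest accepted lexeme or None."""
    state = 0
    forward = lexeme_begin
    last_accept = None
    while forward < len(text):
        symbol = text[forward]
        target = next((t for label, t in EDGES[state] if symbol in label), None)
        if target is None:
            break                      # no edge out of this state for symbol
        state = target
        forward += 1                   # advance the forward pointer
        if state in ACCEPTING:
            last_accept = forward
    if last_accept is None:
        return None
    return text[lexeme_begin:last_accept]

print(is_deterministic(EDGES))
print(scan("count1 = 0"))
```

Because the diagram is deterministic, `scan` never has to choose between edges: at most one label contains the current symbol.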
Exercise
Find the transition diagrams for the following regular expressions:
1 a(a|b)∗a
2 (a|b)∗a(a|b)(a|b)
3 a∗ba∗ba∗ba∗
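A hand-drawn diagram can be checked by brute force against the regular expression it is meant to implement. The Python sketch below is one possible harness, not a prescribed method. The functions `accepts` and `first_disagreement` are hypothetical, and the sample diagram is for the unrelated expression ab∗, so the exercise itself is left open.

```python
import re
from itertools import product

def accepts(dfa, accepting, text, start=0):
    """Run a deterministic diagram given as {(state, symbol): next_state}."""
    state = start
    for symbol in text:
        state = dfa.get((state, symbol))
        if state is None:              # no edge for this symbol: reject
            return False
    return state in accepting

def first_disagreement(dfa, accepting, regex, alphabet="ab", max_len=6):
    """Return the first string on which diagram and regex differ, else None."""
    pattern = re.compile(regex)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            text = "".join(chars)
            by_regex = pattern.fullmatch(text) is not None
            if accepts(dfa, accepting, text) != by_regex:
                return text
    return None

# Sample diagram for ab*: 0 --a--> 1, 1 --b--> 1, accepting state 1.
SAMPLE = {(0, "a"): 1, (1, "b"): 1}
print(first_disagreement(SAMPLE, {1}, r"ab*"))
```

A result of `None` only means no counterexample was found up to `max_len`. It is evidence that the diagram is right, not a proof.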