Lecture 06
Construction
How to Describe Tokens?
2
Languages
• Let Σ be a set of characters.
Σ is called the alphabet.
• A language over Σ is a set of
strings of characters drawn
from Σ.
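As a small illustration of these definitions, the sketch below lists every string up to a fixed length over a tiny alphabet and then picks out one language as a subset. The alphabet {a, b}, the helper name strings_up_to, and the sample language are assumptions chosen only for this example.

```python
from itertools import product

SIGMA = {"a", "b"}  # assumed example alphabet

def strings_up_to(alphabet, max_len):
    """Return every string over `alphabet` with length 0..max_len."""
    result = set()
    for n in range(max_len + 1):
        for chars in product(sorted(alphabet), repeat=n):
            result.add("".join(chars))
    return result

universe = strings_up_to(SIGMA, 2)

# One possible language over SIGMA: the strings that start with 'a'.
language = {w for w in universe if w.startswith("a")}
print(sorted(language))
```

Any subset of the strings over Σ counts as a language; the one above is just an easy one to write down.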
3
Example of Languages
Alphabet = ASCII
Language = C++ programs,
Java, C#
4
Notation
• Languages are sets of strings
(finite sequences of characters)
• Need some notation for
specifying which sets we want
5
Notation
• For lexical analysis we care
about regular languages.
• Regular languages can be
described using regular
expressions.
6
Regular Languages
• Each regular expression is a
notation for a regular language
(a set of words).
• If A is a regular expression, we
write L(A) to refer to the language
denoted by A.
7
Regular Expression
• A regular expression (RE) is
defined inductively:
a = an ordinary character
from Σ
ε = the empty string
8
Regular Expression
R|S = either R or S
RS = R followed by S
(concatenation)
R* = concatenation of R
zero or more times
(R* = ε|R|RR|RRR...)
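The three operators can be tried out with Python's re module, whose syntax for alternation, concatenation and repetition agrees with the notation on this slide. This is only a sketch: the helper name in_language is an assumption, and re.fullmatch is used so that the whole string has to match.

```python
import re

def in_language(regex, w):
    """Return True if the whole string w belongs to L(regex)."""
    return re.fullmatch(regex, w) is not None

# R|S: either R or S
print(in_language(r"a|b", "a"), in_language(r"a|b", "b"))

# RS: R followed by S
print(in_language(r"ab", "ab"), in_language(r"ab", "a"))

# R*: zero or more copies of R, so the empty string is included
print([in_language(r"(ab)*", w) for w in ["", "ab", "abab", "aba"]])
```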
9
RE Extensions
R? = ε|R (zero or one R)
R+ = RR* (one or more R)
(R) = R (grouping)
10
RE Extensions
[abc] = a|b|c (any of listed)
[a-z] = a|b|...|z (range)
[^ab] = c|d|... (anything but
‘a’ or ‘b’)
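Python's re module happens to support the same extensions, so each one can be checked against its definition. In this sketch the helper name matches and the sample strings are assumptions made for illustration.

```python
import re

def matches(regex, w):
    """Return True if the whole string w is matched by regex."""
    return re.fullmatch(regex, w) is not None

# R? behaves like (empty string | R)
optional = [matches(r"a?", w) for w in ["", "a", "aa"]]

# R+ behaves like RR*: at least one copy of R
one_or_more = [matches(r"a+", w) for w in ["", "a", "aaa"]]

# [abc], [a-z] and [^ab] each match exactly one character
char_classes = [
    matches(r"[abc]", "b"),   # one of the listed characters
    matches(r"[a-z]", "q"),   # inside the range
    matches(r"[^ab]", "c"),   # anything but 'a' or 'b'
    matches(r"[^ab]", "a"),   # excluded character
]

print(optional, one_or_more, char_classes)
```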
11
Regular Expression
RE        Strings in L(R)
a         “a”
ab        “ab”
a|b       “a”, “b”
(ab)*     “”, “ab”, “abab”, ...
(a|ε)b    “ab”, “b”
12
Example: integers
• integer: a non-empty string
of digits
• digit = ‘0’|‘1’|‘2’|‘3’|‘4’|
‘5’|‘6’|‘7’|‘8’|‘9’
• integer = digit digit*
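The definition integer = digit digit* can be followed literally in code: one digit, then zero or more further digits. The sketch below deliberately avoids a regex library; the function names is_digit and is_integer are assumptions.

```python
DIGITS = set("0123456789")

def is_digit(c):
    """digit = '0'|'1'|...|'9'"""
    return c in DIGITS

def is_integer(w):
    """integer = digit digit*: one digit followed by zero or more digits."""
    if len(w) == 0 or not is_digit(w[0]):
        return False  # the leading digit is mandatory
    return all(is_digit(c) for c in w[1:])

for w in ["0", "42", "007", "", "4a", "-1"]:
    print(repr(w), is_integer(w))
```

Under this definition leading zeros are allowed and a sign is not part of the token.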
13
Example: identifiers
• identifier:
string of letters or digits starting
with a letter
• C identifier:
[a-zA-Z_][a-zA-Z0-9_]*
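The C identifier pattern can be used unchanged with Python's re module. A minimal sketch, assuming the helper name is_c_identifier and a handful of sample strings; note that it checks only the shape of the string and does not exclude C keywords.

```python
import re

C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_c_identifier(w):
    """Return True if the whole string w has the shape of a C identifier."""
    return C_IDENTIFIER.fullmatch(w) is not None

for w in ["x", "_tmp1", "count_2", "2x", "", "a-b"]:
    print(repr(w), is_c_identifier(w))
```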
14
Recap
Tokens:
strings of characters
representing lexical units of
programs such as identifiers,
numbers, operators.
15
Recap
Regular Expressions:
concise description of tokens.
A regular expression
describes a set of strings.
16
Recap
Language L(R):
set of strings represented by
a regular expression R. L(R) is
the language denoted by
regular expression R.
17
How to Use REs
• We need a mechanism to
determine if an input string w
belongs to L(R), the language
denoted by regular expression
R.
18
Acceptor
• Such a mechanism is called an
acceptor.
input string w → acceptor (for language L) → yes, if w ∈ L
                                             no, if w ∉ L
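An acceptor can be pictured as a function that takes a string w and answers yes or no. The sketch below builds one from a regular expression using Python's re module; the name make_acceptor and the integer pattern are assumptions for illustration, and a real lexer would normally use a finite automaton instead.

```python
import re

def make_acceptor(regex):
    """Build an acceptor for the language L(regex)."""
    pattern = re.compile(regex)

    def acceptor(w):
        # "yes" if w is in L(regex), "no" otherwise
        return "yes" if pattern.fullmatch(w) else "no"

    return acceptor

accept_integer = make_acceptor(r"[0-9][0-9]*")
print(accept_integer("2024"))
print(accept_integer("20x4"))
```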
19
Finite Automata (FA)
• Specification:
Regular Expressions
• Implementation:
Finite Automata
20
Finite Automata
A finite automaton consists of
• An input alphabet (Σ)
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states
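The five components map naturally onto a small record type. The sketch below is one possible representation, not a standard one: the class name FiniteAutomaton, the field names, and the choice to store transitions as a dictionary from (state, character) to a single next state (which assumes a deterministic automaton) are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class FiniteAutomaton:
    alphabet: frozenset    # input alphabet
    states: frozenset      # set of states
    start: str             # start (initial) state
    transitions: dict      # (state, character) -> next state
    accepting: frozenset   # accepting (final) states

    def is_well_formed(self):
        """Check that all components refer only to known states and characters."""
        if self.start not in self.states:
            return False
        if not self.accepting <= self.states:
            return False
        return all(
            s in self.states and c in self.alphabet and t in self.states
            for (s, c), t in self.transitions.items()
        )

# Hypothetical two-state automaton over {0, 1}
fa = FiniteAutomaton(
    alphabet=frozenset({"0", "1"}),
    states=frozenset({"A", "B"}),
    start="A",
    transitions={("A", "1"): "B"},
    accepting=frozenset({"B"}),
)
print(fa.is_well_formed())
```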
21
Finite Automaton
State Graphs
A state (conventionally drawn as a circle)
An accepting state (conventionally drawn
as a double circle)
22
Finite Automaton
State Graphs
A transition: an arrow between two states,
labelled with an input character (here a)
23
Finite Automata
• A finite automaton accepts a
string if we can follow
transitions labelled with
characters in the string from
start state to some accepting
state.
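The acceptance rule can be sketched as a loop that tracks every state reachable after each character, which covers the "some accepting state" wording even when several transitions share a label. The function name accepts, the dictionary layout, and the state names A and B are assumptions; transitions on the empty string are not handled.

```python
def accepts(transitions, start, accepting, w):
    """Return True if some path labelled w leads from start to an accepting state.

    transitions maps (state, character) -> set of possible next states.
    """
    current = {start}
    for c in w:
        # Follow every transition labelled c out of every current state.
        current = {t for s in current for t in transitions.get((s, c), set())}
        if not current:
            return False  # no transition can be followed: reject
    return bool(current & accepting)

# Hypothetical automaton that accepts only the string "1"
transitions = {("A", "1"): {"B"}}
for w in ["1", "", "11", "0"]:
    print(repr(w), accepts(transitions, "A", {"B"}, w))
```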
24
FA Example
A FA that accepts only “1”
25
FA Example
• A FA that accepts any number of 1’s followed by a single 0
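[State graph: the start state has a self-loop labelled 1 and a transition labelled 0 to the accepting state]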
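The automaton described on this slide can be written as a transition table and run directly. In this sketch the state names A (start) and B (accepting) and the function name accepts_ones_then_zero are assumptions; the result is cross-checked against the regular expression 1*0.

```python
import re

# (state, character) -> next state
TABLE = {
    ("A", "1"): "A",  # stay in the start state while reading 1's
    ("A", "0"): "B",  # a single 0 moves to the accepting state
}

def accepts_ones_then_zero(w):
    """Run the table-driven automaton on w."""
    state = "A"
    for c in w:
        if (state, c) not in TABLE:
            return False  # stuck: no transition for this character
        state = TABLE[(state, c)]
    return state == "B"

for w in ["0", "10", "1110", "", "1", "100", "01"]:
    same = accepts_ones_then_zero(w) == (re.fullmatch(r"1*0", w) is not None)
    print(repr(w), accepts_ones_then_zero(w), same)
```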
26
FA Example
• A FA that accepts ab*a
• Alphabet: {a,b}
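[State graph: a transition labelled a leads from the start state to a middle state, which has a self-loop labelled b and a transition labelled a to the accepting state]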
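An automaton for ab*a can also be coded by hand, with one branch per state instead of a table. This is a sketch under assumptions: the state numbers 0 (start), 1 (after the first a) and 2 (accepting) and the function name accepts_ab_star_a are made up for illustration.

```python
def accepts_ab_star_a(w):
    """Hand-coded automaton for ab*a over the alphabet {a, b}."""
    state = 0
    for c in w:
        if state == 0 and c == "a":
            state = 1          # first a
        elif state == 1 and c == "b":
            state = 1          # loop on b
        elif state == 1 and c == "a":
            state = 2          # closing a
        else:
            return False       # no transition from this state on c
    return state == 2

for w in ["aa", "aba", "abbba", "", "a", "ab", "aab", "abab"]:
    print(repr(w), accepts_ab_star_a(w))
```

The hand-coded form is easy to read for three states; for larger automata a transition table usually scales better.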
27