Lecture Week 03

Compiler Construction
CS 322
Mr. Atif Ali
Lecture 6
How to Describe Tokens?
• Regular languages are the most popular way to specify tokens because:
  - They are based on a simple and useful theory.
  - They are easy to understand.
  - Efficient implementations exist for generating lexical analyzers based on such languages.

Languages
• Let Σ be a set of characters. Σ is called the alphabet.
• A language over Σ is a set of strings of characters drawn from Σ.

Example of Languages
Alphabet = English characters
Language = English sentences
Alphabet = ASCII
Language = C++, Java, or C# programs

Notation
• Languages are sets of strings (finite sequences of characters).
• We need some notation for specifying which sets we want.
• For lexical analysis we care about regular languages.
• Regular languages can be described using regular expressions.

Regular Languages
• Each regular expression is a notation for a regular language (a set of words).
• If A is a regular expression, we write L(A) to refer to the language denoted by A.
• A regular expression (RE) is defined inductively:
  a      an ordinary character from Σ
  ε      the empty string
  R|S    either R or S
  RS     R followed by S (concatenation)
  R*     concatenation of R zero or more times
         (R* = ε | R | RR | RRR ...)

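The inductive definition above can be sketched directly in code. The following is a minimal illustration, not part of the lecture: the class names (Char, Eps, Alt, Cat, Star) and the function language are hypothetical, and the function only enumerates the strings of L(R) up to a length bound, since L(R) may be infinite.

```python
from dataclasses import dataclass

# Hypothetical AST node types, one per case of the inductive definition.
@dataclass(frozen=True)
class Char:          # a: an ordinary character from the alphabet
    c: str

@dataclass(frozen=True)
class Eps:           # ε: the empty string
    pass

@dataclass(frozen=True)
class Alt:           # R|S
    r: object
    s: object

@dataclass(frozen=True)
class Cat:           # RS
    r: object
    s: object

@dataclass(frozen=True)
class Star:          # R*
    r: object

def language(expr, max_len):
    """Return the strings of L(expr) whose length is at most max_len."""
    if isinstance(expr, Char):
        return {expr.c} if max_len >= 1 else set()
    if isinstance(expr, Eps):
        return {""}
    if isinstance(expr, Alt):
        return language(expr.r, max_len) | language(expr.s, max_len)
    if isinstance(expr, Cat):
        return {x + y
                for x in language(expr.r, max_len)
                for y in language(expr.s, max_len)
                if len(x + y) <= max_len}
    if isinstance(expr, Star):
        base = language(expr.r, max_len)
        result = {""}                 # zero repetitions
        frontier = {""}
        while frontier:               # append one more copy of R while new strings fit
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {expr!r}")

ab_star = Star(Cat(Char("a"), Char("b")))     # (ab)*
print(sorted(language(ab_star, 4)))           # ['', 'ab', 'abab']
```

The bound max_len is only there to keep the enumeration finite; it is an artifact of the sketch, not of the definition.
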
RE Extensions
Regular expression extensions are used as convenient notation for complex REs:

  R?     = ε | R         (zero or one R)
  R+     = RR*           (one or more R)
  (R)    = R             (grouping)
  [abc]  = a|b|c         (any of the listed characters)
  [a-z]  = a|b|...|z     (range)
  [^ab]  = c|d|...       (anything but ‘a’ or ‘b’)

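One way to convince yourself that the extensions add no expressive power is to compare each extended form with its core-operator expansion on many sample strings. The sketch below does this with Python's re module, which is a stand-in for the slide notation and not the tool used in the lecture. The helper same_language and the small alphabet {a, b, c, d} are assumptions made for illustration, and a bounded comparison like this is a sanity check rather than a proof.

```python
import re
from itertools import product

# Each pair: (extended form, equivalent form using only |, concatenation, *).
# Patterns are written in Python `re` syntax; (?:|a) means "ε or a".
PAIRS = [
    (r"a?",    r"(?:|a)"),    # R?    = ε | R
    (r"a+",    r"aa*"),       # R+    = RR*
    (r"[abc]", r"a|b|c"),     # [abc] = a|b|c
    (r"[a-c]", r"a|b|c"),     # [a-c] = a|b|c (range)
    (r"[^ab]", r"c|d"),       # [^ab] = c|d over the assumed alphabet {a,b,c,d}
]

def same_language(p, q, alphabet="abcd", max_len=3):
    """Compare two patterns on every string over `alphabet` up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True

for extended, core in PAIRS:
    print(extended, "vs", core, "agree:", same_language(extended, core))
```
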
Regular Expression
  RE        Strings in L(R)
  a         “a”
  ab        “ab”
  a|b       “a” “b”
  (ab)*     “” “ab” “abab” ...
  (a|ε)b    “ab” “b”

Here are examples of common tokens found in programming languages.
• integer: a non-empty string of digits
  - digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
  - integer = digit digit*

Example: identifiers
• identifier: a string of letters or digits, starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*

How to Use REs
• We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R.

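As a quick illustration of membership testing, the integer and C identifier expressions above can be transcribed into Python's re syntax and checked with fullmatch. This is only a sketch: the names INTEGER, IDENTIFIER and belongs are hypothetical, and a real lexical analyzer would be generated from the REs rather than call a regex library per string.

```python
import re

# Token patterns transcribed from the slides into Python `re` syntax.
INTEGER = re.compile(r"[0-9][0-9]*")                  # integer = digit digit*
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")    # C identifier

def belongs(pattern, w):
    """Return True if the whole string w is in the language of pattern."""
    return pattern.fullmatch(w) is not None

for w in ["42", "", "x1", "1x", "_tmp"]:
    print(repr(w), "integer:", belongs(INTEGER, w),
          "identifier:", belongs(IDENTIFIER, w))
```

Note the use of fullmatch rather than search or match: the question is whether the entire string w is in L(R), not whether some part of it is.
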
Acceptor
• Such a mechanism is called an acceptor.
[Figure: an acceptor takes an input string w and a language L, and answers yes if w ∈ L, no if w ∉ L]

Finite Automata (FA)
• Specification: Regular Expressions
• Implementation: Finite Automata

A finite automaton consists of
• An input alphabet (Σ)
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states

Finite Automaton: State Graphs
[Figure: legend of state-graph symbols for a state, the start state, an accepting state, and a transition labelled a]

Finite Automata
• A finite automaton accepts a string if we can follow transitions labelled with the characters in the string from the start state to some accepting state.

FA Example
An FA that accepts only “1”
[Figure: state graph with a single transition labelled 1]

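FA Example
• An FA that accepts any number of 1’s followed by a single 0
[Figure: state graph with transitions labelled 1 and 0]

• An FA that accepts ab*a
• Alphabet: {a,b}
[Figure: state graph with two transitions labelled a and one labelled b]
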
Table Encoding of FA
• Transition table
[Figure: state graph 0 -a→ 1 -a→ 2, with a loop labelled b on state 1]

         a     b
    0    1     err
    1    2     1
    2    err   err

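The transition table above translates almost directly into a table-driven recognizer. The sketch below is one possible encoding, not the lecture's code: the names TABLE, START, ACCEPTING and accepts are hypothetical, a missing entry plays the role of "err", and the accepting set {2} is inferred from the fact that the automaton recognizes ab*a.

```python
# Rows are states, columns are input characters; a missing entry means "err".
TABLE = {
    0: {"a": 1},
    1: {"a": 2, "b": 1},
    2: {},
}
START = 0
ACCEPTING = {2}     # inferred: state 2 is reached after the final 'a' of ab*a

def accepts(w):
    """Run the table-driven automaton on w and report acceptance."""
    state = START
    for ch in w:
        state = TABLE[state].get(ch)
        if state is None:           # hit an "err" entry: reject immediately
            return False
    return state in ACCEPTING

for w in ["aa", "aba", "abbba", "a", "ab", "aab"]:
    print(w, accepts(w))
```

The loop does a constant amount of work per input character, which is the main reason table-driven DFAs are attractive for lexical analysis.
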
RE → Finite Automata
• Can we build a finite automaton for every regular expression?
• Yes: build the FA inductively, based on the definition of regular expressions.
NFA
Nondeterministic Finite Automaton (NFA)
• Can have multiple transitions for one input in a given state
• Can have ε-moves

Epsilon Moves
• ε-moves: the machine can move from state A to state B without consuming input.
[Figure: A -ε→ B]

NFA
The operation of the automaton is not completely defined by the input.
[Figure: NFA with states A, B, C and transitions labelled 1, 0, 1]
On input “11”, the automaton could be in either state.

Execution of FA
An NFA can choose
• Whether to make ε-moves.
• Which of multiple transitions to take for a single input.

Acceptance of NFA
• An NFA can get into multiple states.
• Rule: an NFA accepts if it can get into a final state.
[Figure: NFA with states A, B, C and transitions labelled 0 and 1]

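The acceptance rule can be sketched by tracking the set of all states the NFA could be in after each input character. The automaton below is a hypothetical one chosen for illustration (it is not the machine drawn on the slide, whose transitions are not recoverable here); it reuses the state names A, B, C and includes both a nondeterministic choice and an ε-move.

```python
EPS = None   # label used for ε-moves in this sketch

# Hypothetical NFA: (state, input) -> set of possible next states.
DELTA = {
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},   # two possible transitions on the same input
    ("B", "0"): {"C"},
    ("B", EPS): {"C"},        # ε-move: B can reach C without consuming input
}
START, FINAL = "A", {"C"}

def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(w):
    """Accept if at least one possible run ends in a final state."""
    current = eps_closure({START})
    for ch in w:
        moved = set()
        for s in current:
            moved |= DELTA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & FINAL)

for w in ["1", "10", "0", "100", "11"]:
    print(w, accepts(w))
```

Because the simulation follows every choice at once, it never has to guess which transition to take.
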
DFA and NFA
Deterministic Finite Automata (DFA)
• One transition per input per state.
• No ε-moves.

Execution of FA
A DFA
• Can take only one path through the state graph.
• Is completely determined by its input.

NFA vs DFA
• NFAs and DFAs recognize the same set of languages (the regular languages).
• DFAs are easier to implement: they are table driven.
• For a given language, the NFA can be simpler than the DFA.
• The DFA can be exponentially larger than the NFA.
• NFAs are the key to automating the RE → DFA construction.

RE → NFA Construction
Thompson’s construction (CACM 1968)
• Build an NFA for each RE term.
• Combine the NFAs with ε-moves.
Subset construction (NFA → DFA)
• Build the simulation.
• Minimize the number of states in the DFA (Hopcroft’s algorithm).
Key idea:
• An NFA pattern for each symbol and each operator.
• Join them with ε-moves in precedence order.

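"Build the simulation" can be read as follows: each DFA state stands for the set of NFA states the NFA could be in. The sketch below is a minimal version of the subset construction under that reading; the function names and the dictionary encoding are assumptions, minimization is not included, and the input is the small ε-joined NFA for ab (s0 -a→ s1 -ε→ s3 -b→ s4).

```python
from collections import deque

EPS = None   # label used for ε-moves in this sketch

def eps_closure(delta, states):
    """All NFA states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def subset_construction(delta, start, finals, alphabet):
    """Return (table, dfa_start, dfa_finals); DFA states are frozensets of NFA states."""
    d_start = frozenset(eps_closure(delta, {start}))
    table, d_finals = {}, set()
    queue = deque([d_start])
    while queue:
        S = queue.popleft()
        if S in table:                  # already expanded
            continue
        table[S] = {}
        if S & finals:                  # contains an NFA final state
            d_finals.add(S)
        for ch in alphabet:
            moved = set()
            for s in S:
                moved |= delta.get((s, ch), set())
            if moved:                   # no entry means "err"
                T = frozenset(eps_closure(delta, moved))
                table[S][ch] = T
                queue.append(T)
    return table, d_start, d_finals

def dfa_accepts(table, start, finals, w):
    state = start
    for ch in w:
        state = table[state].get(ch)
        if state is None:
            return False
    return state in finals

# NFA for ab: s0 -a-> s1 -ε-> s3 -b-> s4
NFA = {("s0", "a"): {"s1"}, ("s1", EPS): {"s3"}, ("s3", "b"): {"s4"}}
table, d_start, d_finals = subset_construction(NFA, "s0", {"s4"}, "ab")
print(len(table), dfa_accepts(table, d_start, d_finals, "ab"))
```

Only subsets reachable from the start state are generated, which is why the result is often far smaller than the worst-case exponential bound.
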
RE → NFA Construction
NFA for a:   s0 -a→ s1
NFA for b:   s3 -b→ s4
NFA for ab:  s0 -a→ s1 -ε→ s3 -b→ s4

RE → NFA Construction
NFA for a | b:
  s0 -ε→ s1 -a→ s2 -ε→ s5
  s0 -ε→ s3 -b→ s4 -ε→ s5

RE → NFA Construction
NFA for a*:
  s0 -ε→ s1 -a→ s2 -ε→ s4
  s2 -ε→ s1   (repeat)
  s0 -ε→ s4   (skip)

Example RE → NFA
NFA for a ( b|c )*:
  s0 -a→ s1 -ε→ s2 -ε→ s3
  s3 -ε→ s4 -b→ s5 -ε→ s8
  s3 -ε→ s6 -c→ s7 -ε→ s8
  s8 -ε→ s9
  s8 -ε→ s3   (repeat)
  s2 -ε→ s9   (skip)

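The construction for a ( b|c )* can be sketched as a handful of fragment-combining functions, one per symbol and operator. This is an illustrative sketch of Thompson-style construction, not the lecture's code: the class Thompson, its method names, and the integer state numbering are assumptions, so the state numbers will not match the s0..s9 labels in the figure, although the number of states does.

```python
from itertools import count

EPS = None   # label used for ε-moves in this sketch

class Thompson:
    """Builds NFA fragments; each fragment is a (start, final) pair of states."""
    def __init__(self):
        self.ids = count()
        self.delta = {}                       # (state, label) -> set of states

    def new(self):
        return next(self.ids)

    def edge(self, src, label, dst):
        self.delta.setdefault((src, label), set()).add(dst)

    def char(self, c):                        # NFA for a single symbol
        s, t = self.new(), self.new()
        self.edge(s, c, t)
        return s, t

    def cat(self, f, g):                      # RS: join f's final to g's start
        self.edge(f[1], EPS, g[0])
        return f[0], g[1]

    def alt(self, f, g):                      # R|S: new start and final states
        s, t = self.new(), self.new()
        for frag in (f, g):
            self.edge(s, EPS, frag[0])
            self.edge(frag[1], EPS, t)
        return s, t

    def star(self, f):                        # R*: enter, leave, repeat, skip
        s, t = self.new(), self.new()
        self.edge(s, EPS, f[0])
        self.edge(f[1], EPS, t)
        self.edge(f[1], EPS, f[0])
        self.edge(s, EPS, t)
        return s, t

def closure(delta, states):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(delta, start, final, w):
    current = closure(delta, {start})
    for ch in w:
        moved = set()
        for s in current:
            moved |= delta.get((s, ch), set())
        current = closure(delta, moved)
    return final in current

nfa = Thompson()
a, b, c = nfa.char("a"), nfa.char("b"), nfa.char("c")
start, final = nfa.cat(a, nfa.star(nfa.alt(b, c)))      # a(b|c)*
states = {s for (s, _) in nfa.delta} | {t for ts in nfa.delta.values() for t in ts}
print(len(states), accepts(nfa.delta, start, final, "abcb"))
```

Each operator adds at most two states and a few ε-moves, so the NFA grows linearly with the size of the regular expression.
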
Thank You!