
Lecture 06

The document discusses how regular expressions can be used to describe tokens in a programming language by defining regular languages. It explains regular expressions and languages, gives examples of describing integers and identifiers with regular expressions, and shows how finite automata can be used to implement regular expression matching.

Uploaded by Hammad Rajput
Copyright © All Rights Reserved

Compiler Construction
Lecture 6
How to Describe Tokens?

• Regular languages are the most popular way to specify tokens
  • Simple and useful theory
  • Easy to understand
  • Efficient implementations

Languages
• Let Σ be a set of characters. Σ is called the alphabet.
• A language over Σ is a set of strings of characters drawn from Σ.

Example of Languages

Alphabet = English characters
Language = English sentences

Alphabet = ASCII
Language = C++, Java, or C# programs

Notation
• Languages are sets of strings (finite sequences of characters)
• We need some notation for specifying which sets we want

Notation
• For lexical analysis we care about regular languages.
• Regular languages can be described using regular expressions.

Regular Languages
• Each regular expression is a notation for a regular language (a set of words).
• If A is a regular expression, we write L(A) to refer to the language denoted by A.

Regular Expression
• A regular expression (RE) is defined inductively, starting from:
a = an ordinary character from Σ
ε = the empty string

Regular Expression
R|S = either R or S
RS = R followed by S (concatenation)
R* = concatenation of R zero or more times
     (R* = ε|R|RR|RRR|...)

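The inductive definition can be made concrete by treating a language as a set of strings and each operator as a set operation. This is a minimal sketch, not part of the lecture: since L(R*) is infinite whenever L(R) contains a non-empty string, the hypothetical `max_len` parameter cuts the star off at a maximum string length.

```python
# Languages as Python sets of strings. L(R*) is infinite in general,
# so star() keeps only strings up to a hypothetical cut-off max_len.

def union(r: set[str], s: set[str]) -> set[str]:
    """L(R|S): every string in L(R) or in L(S)."""
    return r | s

def concat(r: set[str], s: set[str]) -> set[str]:
    """L(RS): a string from L(R) followed by a string from L(S)."""
    return {x + y for x in r for y in s}

def star(r: set[str], max_len: int) -> set[str]:
    """L(R*) = {ε} ∪ L(R) ∪ L(RR) ∪ ..., truncated to length <= max_len."""
    result = {""}      # zero copies of R: the empty string ε
    frontier = {""}
    while frontier:
        # append one more copy of R to each string found in the previous round,
        # keeping only strings that are short enough and not seen before
        frontier = {x + y for x in frontier for y in r if len(x + y) <= max_len} - result
        result |= frontier
    return result

a, b = {"a"}, {"b"}
print(sorted(union(a, b)))                              # ['a', 'b']
print(sorted(concat(a, b)))                             # ['ab']
print(sorted(star(concat(a, b), max_len=6), key=len))   # ['', 'ab', 'abab', 'ababab']
```

The loop always terminates because only finitely many strings over a finite alphabet have length at most `max_len`, and each round must add at least one new string to continue.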
RE Extensions
R? = ε|R (zero or one R)
R+ = RR* (one or more R)
(R) = R (grouping)

RE Extensions
[abc] = a|b|c (any one of the listed characters)
[a-z] = a|b|...|z (a range of characters)
[^ab] = c|d|... (any character except 'a' and 'b')

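These extensions are shorthand and appear in most practical regex libraries. As an illustration only, Python's standard `re` module uses the same notation; `re.fullmatch` succeeds only when the whole string matches, which corresponds to asking whether w ∈ L(R). The helper name `in_language` is hypothetical.

```python
import re

def in_language(pattern: str, w: str) -> bool:
    """True if the entire string w is in L(pattern)."""
    return re.fullmatch(pattern, w) is not None

# R? = ε|R
assert in_language(r"a?", "") and in_language(r"a?", "a")
assert not in_language(r"a?", "aa")

# R+ = RR*
assert in_language(r"a+", "aaa")
assert not in_language(r"a+", "")

# (R) = R: grouping lets an operator apply to a whole sub-expression
assert in_language(r"(ab)+", "abab")
assert not in_language(r"ab+", "abab")   # without grouping, + applies only to b

# character classes and ranges
assert in_language(r"[abc]", "b")
assert in_language(r"[a-z]", "q")
assert not in_language(r"[^ab]", "a")
assert in_language(r"[^ab]", "c")
print("all extension checks passed")
```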
Regular Expression
RE        Strings in L(R)
a         "a"
ab        "ab"
a|b       "a", "b"
(ab)*     "", "ab", "abab", ...
(a|ε)b    "ab", "b"

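The "Strings in L(R)" column can be reproduced mechanically by testing every short string over the alphabet {a, b}. This sketch assumes Python's `re` module and a hypothetical length cut-off of 4; since `re` has no ε symbol, (a|ε)b is written with an empty alternative as `(a|)b`, which denotes the same language.

```python
import re
from itertools import product

def strings_in(pattern: str, alphabet: str = "ab", max_len: int = 4) -> list[str]:
    """Every string over `alphabet` of length <= max_len that is in L(pattern)."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if re.fullmatch(pattern, w):
                found.append(w)
    return found

# (a|ε)b is written (a|)b here: an empty alternative plays the role of ε.
for rexp in ["a", "ab", "a|b", "(ab)*", "(a|)b"]:
    print(f"{rexp:8} {strings_in(rexp)}")
```

With the cut-off at length 4, the `(ab)*` row lists `''`, `'ab'` and `'abab'`, matching the table up to the point where its "..." begins.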
Example: integers
• integer: a non-empty string of digits
• digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
• integer = digit digit*

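The definition integer = digit digit* translates directly into a hand-written check: one mandatory digit, then zero or more further digits. A minimal sketch with a hypothetical function name; it uses an explicit digit set because Python's `str.isdigit` also accepts non-ASCII digits, which the slide's `digit` does not.

```python
DIGITS = "0123456789"   # digit = '0'|'1'|...|'9'

def is_integer(w: str) -> bool:
    """Check w ∈ L(digit digit*) without a regex library."""
    if not w or w[0] not in DIGITS:          # the leading `digit` is mandatory
        return False
    return all(c in DIGITS for c in w[1:])   # `digit*`: zero or more further digits

assert is_integer("0") and is_integer("2024")
assert not is_integer("")       # the empty string is not an integer
assert not is_integer("12a")
```

Note that the RE, and therefore this check, accepts leading zeros such as "007".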
Example: identifiers
• identifier: a string of letters or digits starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*

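The C identifier pattern can be used as-is with Python's `re` module. A small sketch; the names `C_IDENTIFIER` and `is_c_identifier` are hypothetical.

```python
import re

# [a-zA-Z_][a-zA-Z0-9_]* : a letter or underscore, then letters, digits, or underscores
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_c_identifier(w: str) -> bool:
    return C_IDENTIFIER.fullmatch(w) is not None

for w in ["count", "_tmp1", "x2y", "2x", "my-var", ""]:
    print(f"{w!r:10} {is_c_identifier(w)}")
```

The pattern only checks the lexical shape: keywords such as `int` also match it, so a lexer must tell keywords and identifiers apart in a separate step.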
Recap
Tokens: strings of characters representing lexical units of programs, such as identifiers, numbers, and operators.

Recap
Regular Expressions: concise description of tokens. A regular expression describes a set of strings.

Recap
Language L(R): the set of strings represented by a regular expression R. L(R) is called the language denoted by R.

How to Use REs
• We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R.

Acceptor
• Such a mechanism is called an acceptor.
[Figure: an input string w is fed to an acceptor for language L, which answers "yes" if w ∈ L and "no" if w ∉ L]

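An acceptor can be modelled as a function from strings to a yes/no answer. A minimal sketch, assuming a hypothetical `make_acceptor` helper that builds the acceptor from a regular expression with Python's `re` module:

```python
import re
from typing import Callable

def make_acceptor(pattern: str) -> Callable[[str], str]:
    """Build an acceptor for L(pattern): "yes" if w ∈ L, "no" if w ∉ L."""
    compiled = re.compile(pattern)

    def accept(w: str) -> str:
        # fullmatch, not match: re.match only anchors at the start,
        # so it would wrongly accept "4x2" for the integer pattern below
        return "yes" if compiled.fullmatch(w) else "no"

    return accept

integer_acceptor = make_acceptor(r"[0-9][0-9]*")
print(integer_acceptor("42"))    # yes
print(integer_acceptor("4x2"))   # no
```

Here the library hides how membership is decided; the following slides introduce finite automata as the usual way to implement such an acceptor.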
Finite Automata (FA)
• Specification: regular expressions
• Implementation: finite automata

Finite Automata
A finite automaton consists of:
• An input alphabet Σ
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states

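The five components map naturally onto a small data structure. This sketch assumes a deterministic automaton, with at most one next state per (state, character) pair; the class name and the string state names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiniteAutomaton:
    """The five components of a finite automaton (deterministic variant)."""
    alphabet: frozenset[str]                   # input alphabet Σ
    states: frozenset[str]                     # set of states
    start: str                                 # start (initial) state
    transitions: dict[tuple[str, str], str]    # (state, character) -> next state
    accepting: frozenset[str]                  # accepting (final) states

    def __post_init__(self) -> None:
        # consistency checks between the components
        if self.start not in self.states:
            raise ValueError("start state must be one of the states")
        if not self.accepting <= self.states:
            raise ValueError("accepting states must be a subset of the states")
        for (src, ch), dst in self.transitions.items():
            if src not in self.states or dst not in self.states:
                raise ValueError(f"transition ({src}, {ch}) -> {dst} uses an unknown state")
            if ch not in self.alphabet:
                raise ValueError(f"transition label {ch!r} is not in the alphabet")

# Hypothetical two-state example over {0, 1}
fa = FiniteAutomaton(
    alphabet=frozenset({"0", "1"}),
    states=frozenset({"q0", "q1"}),
    start="q0",
    transitions={("q0", "1"): "q1"},
    accepting=frozenset({"q1"}),
)
print(fa.start, sorted(fa.accepting))
```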
Finite Automaton
State Graphs
[Figure: graph symbols for a state, the start state, and an accepting state]

Finite Automaton
State Graphs
[Figure: a transition between two states, labelled with the input character a]

Finite Automata
• A finite automaton accepts a string if, starting from the start state, we can follow transitions labelled with the characters of the string, in order, and end in some accepting state once the whole string has been read.

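"If we can follow transitions" can be simulated by tracking every state reachable after each character. A sketch under stated assumptions: transitions are a hypothetical dict from (state, character) to a set of next states, so a state may have several choices for one character; ε-transitions are not modelled.

```python
def accepts(transitions: dict[tuple[str, str], set[str]],
            start: str, accepting: set[str], w: str) -> bool:
    current = {start}
    for ch in w:
        # follow every transition labelled ch out of every current state
        current = {nxt for s in current for nxt in transitions.get((s, ch), set())}
        if not current:          # no path can continue: reject early
            return False
    # accept if some path ends in an accepting state after reading all of w
    return bool(current & accepting)

# Hypothetical automaton over {a, b} accepting strings that end in "b"
t = {("p", "a"): {"p"}, ("p", "b"): {"p", "q"}}
print(accepts(t, "p", {"q"}, "aab"))   # True
print(accepts(t, "p", {"q"}, "aba"))   # False
```

When every (state, character) pair has at most one successor, the set `current` never holds more than one state and this reduces to simply walking the graph.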
FA Example
• An FA that accepts only "1"

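Since the state graph for this slide did not survive, here is one possible automaton written as a transition table, assuming the alphabet {0, 1}: a start state with a single transition on "1" to an accepting state. The state names are hypothetical, and any missing transition is treated as rejection.

```python
# Hypothetical state names; a (state, character) pair missing from the
# table means the automaton is stuck, i.e. the string is rejected.
TRANSITIONS = {("start", "1"): "accept"}
ACCEPTING = {"accept"}

def accepts_only_one(w: str) -> bool:
    state = "start"
    for ch in w:
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:          # no transition for ch: reject
            return False
        state = nxt
    return state in ACCEPTING    # must end in an accepting state

for w in ["1", "", "0", "11"]:
    print(repr(w), accepts_only_one(w))
```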
FA Example
• An FA that accepts any number of 1's followed by a single 0
[Figure: state graph with transitions labelled 1 and 0]

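One automaton consistent with the surviving labels 1 and 0: a start state that loops on 1 and moves to an accepting state on 0. The state names are hypothetical. The sketch also cross-checks the automaton against the equivalent regular expression 1*0 on all short strings, connecting the specification (RE) with the implementation (FA).

```python
import re
from itertools import product

# Hypothetical states: "A" (start; loops on 1) and "B" (accepting; reached by the single 0).
TRANSITIONS = {("A", "1"): "A", ("A", "0"): "B"}
ACCEPTING = {"B"}

def accepts_ones_then_zero(w: str) -> bool:
    state = "A"
    for ch in w:
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:          # e.g. any character after the 0: reject
            return False
        state = nxt
    return state in ACCEPTING

# Cross-check against the regular expression 1*0 on every string over {0,1} of length <= 6.
for n in range(7):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert accepts_ones_then_zero(w) == (re.fullmatch(r"1*0", w) is not None), w
print("FA agrees with 1*0 on all strings over {0,1} of length <= 6")
```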
FA Example
• An FA that accepts ab*a
• Alphabet: {a,b}
[Figure: state graph with transitions labelled a, b, and a]

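A transition table consistent with the surviving labels: "a" leads from the start state to a middle state, "b" loops on the middle state, and a second "a" leads to the accepting state. The state names are hypothetical, and the original diagram may differ in detail. As before, the automaton is cross-checked against the regular expression ab*a.

```python
import re
from itertools import product

# Hypothetical states: "S" (start), "M" (after the first a; loops on b), "F" (accepting).
TRANSITIONS = {("S", "a"): "M", ("M", "b"): "M", ("M", "a"): "F"}
ACCEPTING = {"F"}

def accepts_ab_star_a(w: str) -> bool:
    state = "S"
    for ch in w:
        nxt = TRANSITIONS.get((state, ch))
        if nxt is None:          # no transition, e.g. anything after the final a
            return False
        state = nxt
    return state in ACCEPTING

# Cross-check against ab*a on every string over {a,b} of length <= 6.
for n in range(7):
    for chars in product("ab", repeat=n):
        w = "".join(chars)
        assert accepts_ab_star_a(w) == (re.fullmatch(r"ab*a", w) is not None), w
print("FA agrees with ab*a on all strings over {a,b} of length <= 6")
```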
