TPL Lect 15 - 16

Uploaded by

shafiqfiverr345

Regular Expressions

The lexical analyzer needs to scan and identify only a finite set of
valid strings (tokens, or lexemes) that belong to the language at hand.
It searches for the patterns defined by the language rules.

Regular expressions can describe languages, finite or infinite, by
defining a pattern for finite strings of symbols. The grammar
defined by regular expressions is known as a regular grammar. The
language defined by a regular grammar is known as a regular language.

Regular expressions are an important notation for specifying patterns.
Each pattern matches a set of strings, so regular expressions serve
as names for sets of strings. Programming language tokens can be
described by regular languages. The specification of regular
expressions is an example of a recursive definition. Regular
languages are easy to understand and have efficient implementations.

There are a number of algebraic laws that are obeyed by regular
expressions, which can be used to manipulate regular expressions
into equivalent forms.

Operations

The various operations on languages are:

• Union of two languages L and M is written as
  L U M = {s | s is in L or s is in M}
  Let L1 = {00, 10} and L2 = {ε, 00}; then L1 U L2 = {ε, 00, 10}

• Concatenation of two languages L and M is written as
  LM = {st | s is in L and t is in M}
  Let L1 = {0, 1} and L2 = {00, 11}; then L1.L2 = {000, 011, 100, 111}

• The Kleene closure of a language L is written as
  L* = zero or more concatenations of strings from L, so ε is always in L*.
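
The three operations can be tried out on small finite languages. The following is a minimal sketch, assuming languages are modelled as Python sets of strings with ε written as the empty string. The function names union, concat and kleene are illustrative only, and since L* is infinite in general, kleene returns only a finite slice limited by a hypothetical max_repeats parameter.

```python
from itertools import product

def union(L, M):
    """L U M: strings that are in L or in M."""
    return L | M

def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s, t in product(L, M)}

def kleene(L, max_repeats):
    """Finite slice of L*: concatenations of 0..max_repeats strings of L."""
    result = {""}   # zero occurrences gives the empty string (epsilon)
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, L)   # one more string from L appended
        result |= layer
    return result

L1 = {"00", "10"}
L2 = {"", "00"}
print(sorted(union(L1, L2)))                      # ['', '00', '10']
print(sorted(concat({"0", "1"}, {"00", "11"})))   # ['000', '011', '100', '111']
print(sorted(kleene({"a"}, 3), key=len))          # ['', 'a', 'aa', 'aaa']
```

The first two printed results reproduce the union and concatenation examples given above.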

Notations

If r and s are regular expressions denoting the languages L(r) and
L(s), then

• Union: (r)|(s) is a regular expression denoting L(r) U L(s)
• Concatenation: (r)(s) is a regular expression denoting L(r)L(s)
• Kleene closure: (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)

Precedence and Associativity

• *, concatenation (.), and | (pipe sign) are left associative
• * has the highest precedence
• Concatenation (.) has the second highest precedence
• | (pipe sign) has the lowest precedence of all
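
The effect of these precedence rules can be checked with a small experiment. This is only a sketch, assuming Python's re module, whose operators *, concatenation and | follow the same precedence order. The pattern ab*|c and the sample strings are chosen for illustration.

```python
import re

# Under the precedence rules, ab*|c is read as (a(b*))|c.
implicit = re.compile(r"ab*|c")
explicit = re.compile(r"(?:a(?:b*))|c")

# Other groupings describe different languages.
star_over_concat = re.compile(r"(?:ab)*|c")   # * applied to the whole of ab
union_inside = re.compile(r"a(?:b*|c)")       # | applied before concatenation

samples = ["a", "abbb", "c", "abab", "ac", ""]

def accepted(regex):
    """Return the sample strings that the regex matches in full."""
    return [s for s in samples if regex.fullmatch(s)]

print(accepted(implicit))          # ['a', 'abbb', 'c']
print(accepted(explicit))          # ['a', 'abbb', 'c']
print(accepted(star_over_concat))  # ['c', 'abab', '']
print(accepted(union_inside))      # ['a', 'abbb', 'ac']
```

The unparenthesized pattern agrees with the fully parenthesized form (a(b*))|c and differs from the other two groupings.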

Representing valid tokens of a language in regular expression

If x is a regular expression, then:

• x* means zero or more occurrences of x.
  i.e., it can generate {ε, x, xx, xxx, xxxx, …}

• x+ means one or more occurrences of x.
  i.e., it can generate {x, xx, xxx, xxxx, …}

[a-z] is the set of all lower-case letters of the English alphabet.
[A-Z] is the set of all upper-case letters of the English alphabet.
[0-9] is the set of all decimal digits.

Representing occurrence of symbols using regular expressions

letter = [a-z] | [A-Z]

digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]

sign = + | - or [+-]

Representing language tokens using regular expressions

Decimal = (sign)?(digit)+

Identifier = (letter)(letter | digit)*

Here (sign)? means zero or one occurrence of sign.
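
The two token definitions translate almost directly into an executable form. The sketch below assumes Python's re module and ASCII letters and digits. The names LETTER, DIGIT, SIGN and classify are illustrative, not part of the notes.

```python
import re

# Building blocks, mirroring letter, digit and sign.
LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
SIGN = r"[+-]"

# Decimal = (sign)?(digit)+
DECIMAL = re.compile(rf"{SIGN}?{DIGIT}+")
# Identifier = (letter)(letter | digit)*
IDENTIFIER = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT})*")

def classify(lexeme):
    """Name the token class whose pattern matches the whole lexeme."""
    if DECIMAL.fullmatch(lexeme):
        return "Decimal"
    if IDENTIFIER.fullmatch(lexeme):
        return "Identifier"
    return "invalid"

for lexeme in ["-42", "7", "count", "x1", "1x", "+"]:
    print(lexeme, "->", classify(lexeme))
```

fullmatch is used so that the entire lexeme has to fit the pattern. With a prefix match, 1x would be accepted as the Decimal 1.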

The remaining problem for the lexical analyzer is how to check that
an input string matches the regular expression that specifies a token
pattern, such as a keyword of the language. A well-accepted solution
is to use finite automata for this check.
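
To make the finite-automaton idea concrete, here is a minimal sketch of a hand-built deterministic automaton for the pattern (letter)(letter | digit)*. The state names start and in_id, the transition table and the helper char_class are assumptions made for illustration. They are not derived by any algorithm from the notes.

```python
def char_class(ch):
    """Map a character to the input symbol class used by the automaton."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# Transition table: (state, symbol class) -> next state.
# Missing entries mean the automaton rejects.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
}
ACCEPTING = {"in_id"}

def accepts(text):
    """Run the automaton over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts("x1"))   # True
print(accepts("1x"))   # False
print(accepts(""))     # False
```

The automaton reads each character exactly once, which is why this approach suits a scanner.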

Some RE Examples

In this table, + between two expressions denotes union (the same as |).

Regular Expression      Regular Set

(0 + 10*)               L = {0, 1, 10, 100, 1000, 10000, …}

(0*10*)                 L = {1, 01, 10, 010, 0010, …}

(0 + ε)(1 + ε)          L = {ε, 0, 1, 01}

(a+b)*                  Set of strings of a's and b's of any length, including
                        the null string. So L = {ε, a, b, aa, ab, bb, ba, aaa, …}

(a+b)*abb               Set of strings of a's and b's ending with the string abb.
                        So L = {abb, aabb, babb, aaabb, ababb, …}

(11)*                   Set consisting of an even number of 1's, including the
                        empty string. So L = {ε, 11, 1111, 111111, …}

(aa)*(bb)*b             Set of strings consisting of an even number of a's
                        followed by an odd number of b's.
                        So L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …}

(aa + ab + ba + bb)*    Strings of a's and b's of even length, obtained by
                        concatenating any combination of the strings aa, ab, ba
                        and bb, including the null string.
                        So L = {ε, aa, ab, ba, bb, aaab, aaba, …}
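
The regular sets listed above can be spot-checked by enumeration. The sketch below assumes Python's re module, where union is written | instead of + and ε is written as an empty alternative. The helper name language and its parameters are hypothetical. It lists, in order of length, every string up to max_len over the given alphabet that the pattern matches in full.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over alphabet, up to max_len symbols, matched in full by pattern."""
    regex = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if regex.fullmatch(word):
                words.append(word)
    return words

print(language(r"(0|)(1|)", "01", 2))        # ['', '0', '1', '01']
print(language(r"(11)*", "1", 6))            # ['', '11', '1111', '111111']
print(language(r"(aa)*(bb)*b", "ab", 3))     # ['b', 'aab', 'bbb']
print(language(r"(aa|ab|ba|bb)*", "ab", 2))  # ['', 'aa', 'ab', 'ba', 'bb']
```

The empty string appears as '' in the output and corresponds to ε.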

Regular Grammar:
A grammar is regular if it has rules of the form A -> a, A -> aB, or
A -> ε, where A and B are nonterminals, a is a terminal, and ε is a
special symbol denoting the null (empty) string.

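
As an illustration of these rule forms, the sketch below stores a regular grammar as a Python dictionary and tests whether a string can be derived from it. The encoding is an assumption made for this example: a rule A -> aB is the pair ("a", "B"), A -> a is ("a", None), and A -> ε is ("", None). The sample grammars and the function name derives are hypothetical.

```python
# Grammar for strings of a's and b's ending in abb:
#   S -> aS | bS | aA,  A -> bB,  B -> b
ENDS_WITH_ABB = {
    "S": [("a", "S"), ("b", "S"), ("a", "A")],
    "A": [("b", "B")],
    "B": [("b", None)],
}

# Grammar for a*:  S -> aS | ε
A_STAR = {"S": [("a", "S"), ("", None)]}

def derives(grammar, start, text):
    """Return True if text can be derived from start using the grammar's rules."""
    # current holds every nonterminal that may remain after the symbols read
    # so far; None means the derivation has already finished.
    current = {start}
    for ch in text:
        following = set()
        for nonterminal in current:
            if nonterminal is None:
                continue   # a finished derivation cannot produce more symbols
            for terminal, target in grammar[nonterminal]:
                if terminal == ch:
                    following.add(target)
        current = following
    # Accept if the derivation finished, or a remaining nonterminal has an ε rule.
    return any(nt is None or ("", None) in grammar[nt] for nt in current)

print(derives(ENDS_WITH_ABB, "S", "babb"))  # True
print(derives(ENDS_WITH_ABB, "S", "ab"))    # False
print(derives(A_STAR, "S", ""))             # True
```

Tracking a set of possible nonterminals is needed because S has two rules that begin with a.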
Example 1:
Write the regular expression for the language containing all combinations of a's over the
alphabet ∑ = {a}.

Solution:

"All combinations of a's" means that a may appear zero times, once, twice, and so on. If a
appears zero times, the result is the null string. So we expect the set {ε, a, aa, aaa, ...},
and the regular expression for it is:

R = a*

That is, the Kleene closure of a.

Example 2:
Write the regular expression for the language containing all combinations of a's except the
null string, over the alphabet ∑ = {a}.

Solution:
The regular expression has to be built for the language

L = {a, aa, aaa, ...}

This set does not contain the null string, so we can write the regular expression as:

R = a+

Example 3:
Write the regular expression for the language containing all strings with any
number of a's and b's.

Solution:

The regular expression will be:

r.e. = (a + b)*

This gives the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, ...}, that is, any combination of
a and b, including the null string.
