Regular Expressions

The document provides an overview of regular expressions, which are used to define patterns for finite strings in programming languages. It explains operations on languages such as union, concatenation, and Kleene closure, along with their notations and precedence. Additionally, it includes examples of regular expressions and their corresponding sets of valid strings.

WAYAMBA UNIVERSITY OF SRI LANKA

FACULTY OF APPLIED SCIENCES


Architecture and Compiler Design: Theory Notes: 2

Regular Expressions

The lexical analyzer needs to scan and identify only the valid strings (tokens, or lexemes) that
belong to the language at hand. It searches for the patterns defined by the language rules.
Regular expressions can describe such languages, which may be infinite, by defining a pattern for
finite strings of symbols. The grammar defined by regular expressions is known as a regular grammar.
The language defined by a regular grammar is known as a regular language.
A regular expression is an important notation for specifying patterns. Each pattern matches a set
of strings, so regular expressions serve as names for sets of strings. Programming language tokens
can be described by regular languages. The specification of regular expressions is an example of
a recursive definition. Regular languages are easy to understand and have efficient
implementations.
There are a number of algebraic laws that are obeyed by regular expressions, which can be used
to manipulate regular expressions into equivalent forms.
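
A few such laws can be spot-checked mechanically. The sketch below is only a bounded check, not a proof: it compares two patterns on every string up to a chosen length. It uses Python's re syntax, and the helper name same_language and the list LAWS are hypothetical names introduced for this illustration.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=5):
    """Bounded check: do p and q accept the same strings up to max_len?"""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if bool(rp.fullmatch(word)) != bool(rq.fullmatch(word)):
                return False
    return True

# Pairs of patterns that the algebraic laws say are equivalent.
LAWS = [
    ("a|b", "b|a"),        # union is commutative: r|s = s|r
    ("a(b|c)", "ab|ac"),   # concatenation distributes over union
    ("a+", "aa*"),         # one or more r equals r followed by r*
]

for left, right in LAWS:
    print(left, "==", right, ":", same_language(left, right, "abc", 4))
```

A pair that is not equivalent, such as ab and ba, is reported as different by the same helper.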

Operations
The various operations on languages are:
• Union of two languages L and M is written as
L ∪ M = {s | s is in L or s is in M}
• Concatenation of two languages L and M is written as
LM = {st | s is in L and t is in M}
• The Kleene closure of a language L is written as
L* = the set of strings formed by concatenating zero or more strings from L.
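
The three operations can be tried out on small finite languages. This is a minimal sketch that models a language as a Python set of strings; the function names are hypothetical, and because L* is infinite whenever L contains a non-empty string, kleene only builds a finite approximation limited by max_reps.

```python
def union(L, M):
    """L ∪ M: strings that are in L or in M."""
    return L | M

def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s in L for t in M}

def kleene(L, max_reps):
    """Finite approximation of L*: concatenations of at most max_reps strings of L."""
    result = {""}    # zero occurrences gives the empty string
    current = {""}
    for _ in range(max_reps):
        current = concat(current, L)
        result |= current
    return result

L = {"a", "b"}
M = {"0", "1"}
print(sorted(union(L, M)))
print(sorted(concat(L, M)))
print(sorted(kleene({"ab"}, 2)))
```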

Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
• Union : (r)|(s) is a regular expression denoting L(r) ∪ L(s)
• Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
• Kleene closure : (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r); parentheses do not change the language

Precedence and Associativity


• *, concatenation (.), and | (pipe sign) are left associative.
• * has the highest precedence.
• Concatenation (.) has the second highest precedence.
• | (pipe sign) has the lowest precedence of all.
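
As an illustration of these rules, the expression a|bc* groups as a|(b(c*)), not as (a|b)c*. The sketch below checks this with Python's re module, whose operators follow the same precedence; the variable names are chosen for this example only.

```python
import re

plain = re.compile(r"a|bc*")                      # no parentheses
explicit = re.compile(r"a|(?:b(?:c*))")           # grouping implied by precedence
grouped_differently = re.compile(r"(?:a|b)c*")    # a different language

samples = ["", "a", "b", "bc", "bccc", "ac", "abc"]
for s in samples:
    # plain and explicit agree on every sample string
    assert bool(plain.fullmatch(s)) == bool(explicit.fullmatch(s))

# "ac" separates the two groupings
print(plain.fullmatch("ac") is not None)
print(grouped_differently.fullmatch("ac") is not None)
```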

Representing valid tokens of a language in regular expression


If x is a regular expression, then:
• x* means zero or more occurrences of x,
i.e., it can generate { ε, x, xx, xxx, xxxx, … }
• x+ means one or more occurrences of x,
i.e., it can generate { x, xx, xxx, xxxx, … }; it is equivalent to x.x*
• x? means at most one occurrence of x,
i.e., it can generate { ε, x }.
[a-z] is the set of all lower-case letters of the English alphabet.
[A-Z] is the set of all upper-case letters of the English alphabet.
[0-9] is the set of all decimal digits.
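
These shorthands exist with the same meaning in Python's re module, so they can be tried directly. A minimal sketch, assuming whole-string matching with re.fullmatch; the helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

candidates = ["", "x", "xx", "xxx"]
star = [s for s in candidates if matches(r"x*", s)]   # zero or more
plus = [s for s in candidates if matches(r"x+", s)]   # one or more
opt = [s for s in candidates if matches(r"x?", s)]    # at most one

print(star)
print(plus)
print(opt)
print(matches(r"[a-z]", "q"), matches(r"[A-Z]", "q"), matches(r"[0-9]", "7"))
```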

Representing occurrence of symbols using regular expressions


letter = [a-z] | [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 or [0-9]
sign = + | - or [+-]

Representing language tokens using regular expressions


Decimal = (sign)?(digit)+
Identifier = (letter)(letter | digit)*
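
The two token definitions translate almost directly into Python re patterns. This is a sketch under the assumption that a lexeme must match a pattern in full; the constant names and the classify helper are hypothetical.

```python
import re

LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
SIGN = r"[+-]"

# Decimal = (sign)?(digit)+
DECIMAL = re.compile(rf"{SIGN}?{DIGIT}+")
# Identifier = (letter)(letter | digit)*
IDENTIFIER = re.compile(rf"{LETTER}(?:{LETTER}|{DIGIT})*")

def classify(lexeme):
    """Return the token name for lexeme, or None if no pattern matches."""
    if DECIMAL.fullmatch(lexeme):
        return "Decimal"
    if IDENTIFIER.fullmatch(lexeme):
        return "Identifier"
    return None

for lexeme in ["-42", "7", "x1", "count", "1x", "+"]:
    print(repr(lexeme), classify(lexeme))
```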
The only problem left for the lexical analyzer is how to check whether an input string matches a
regular expression used in specifying the token patterns of a language. A well-accepted solution
is to use finite automata for recognition.
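
As a small illustration of the idea, the Identifier pattern (letter)(letter | digit)* can be recognized by a two-state deterministic automaton. The sketch below is hand-written rather than generated from the expression; the state names and helper functions are hypothetical.

```python
def char_class(ch):
    """Map a character to the input class used by the automaton."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# Transition table: (state, input class) -> next state.
# Missing entries mean the automaton rejects.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
}
ACCEPTING = {"in_id"}

def accepts(text):
    """Run the automaton over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING

for text in ["x1y2", "count", "1x", "", "a_b"]:
    print(repr(text), accepts(text))
```

Each input character is examined once, which is why automaton-based recognition runs in time proportional to the length of the input.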
A Regular Expression can be recursively defined as follows. In this definition, + denotes union
(written | above).
• ε is a Regular Expression denoting the language containing only the empty string. (L(ε) = {ε})
• φ is a Regular Expression denoting the empty language. (L(φ) = { })
• x is a Regular Expression for any symbol x of the alphabet, where L(x) = {x}
• If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression
denoting the language L(Y), then
o X + Y is a Regular Expression corresponding to the language L(X) ∪
L(Y), where L(X+Y) = L(X) ∪ L(Y).
o X . Y is a Regular Expression corresponding to the language L(X) .
L(Y), where L(X.Y) = L(X) . L(Y).
o X* is a Regular Expression corresponding to the language L(X*), where L(X*) =
(L(X))*.
• Any expression obtained by applying the rules above a finite number of times is a Regular
Expression.
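
The recursive definition maps naturally onto a recursive data type, with one constructor per rule. The following is a sketch, not a practical matcher: all class names and the language function are hypothetical, and since L(R) may be infinite, the function only lists the strings up to a given length.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:            # φ, the empty language
    pass

@dataclass(frozen=True)
class Epsilon:          # ε, the empty string
    pass

@dataclass(frozen=True)
class Symbol:           # a single alphabet symbol x
    char: str

@dataclass(frozen=True)
class Union:            # X + Y
    left: object
    right: object

@dataclass(frozen=True)
class Concat:           # X . Y
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # X*
    inner: object

def language(r, max_len):
    """Return all strings of L(r) whose length is at most max_len."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Epsilon):
        return {""}
    if isinstance(r, Symbol):
        return {r.char} if len(r.char) <= max_len else set()
    if isinstance(r, Union):
        return language(r.left, max_len) | language(r.right, max_len)
    if isinstance(r, Concat):
        left = language(r.left, max_len)
        right = language(r.right, max_len)
        return {s + t for s in left for t in right if len(s + t) <= max_len}
    if isinstance(r, Star):
        base = language(r.inner, max_len)
        result = {""}
        frontier = {""}
        while frontier:  # stops once no new string within the length bound appears
            frontier = {s + t for s in frontier for t in base
                        if len(s + t) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

# (0 + 10*) built from the rules above
expr = Union(Symbol("0"), Concat(Symbol("1"), Star(Symbol("0"))))
print(sorted(language(expr, 3), key=lambda s: (len(s), s)))
```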

Some RE Examples

Regular Expression        Regular Set

(0 + 10*)                 L = {0, 1, 10, 100, 1000, 10000, …}

(0*10*)                   L = {1, 01, 10, 010, 0010, …}

(0 + ε)(1 + ε)            L = {ε, 0, 1, 01}

(a+b)*                    Set of strings of a's and b's of any length, including the null
                          string. So L = {ε, a, b, aa, ab, bb, ba, aaa, …}

(a+b)*abb                 Set of strings of a's and b's ending with the string abb.
                          So L = {abb, aabb, babb, aaabb, ababb, …}

(11)*                     Set of strings consisting of an even number of 1's, including the
                          empty string. So L = {ε, 11, 1111, 111111, …}

(aa)*(bb)*b               Set of strings consisting of an even number of a's followed by an
                          odd number of b's. So L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …}

(aa + ab + ba + bb)*      Strings of a's and b's of even length, obtained by concatenating
                          any combination of the strings aa, ab, ba and bb, including null.
                          So L = {ε, aa, ab, ba, bb, aaab, aaba, …}
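
The listed sets can be reproduced by brute force. The sketch below rewrites each expression in Python's re syntax, which is an assumption of this example: union + becomes |, and ε becomes an empty alternative. The helper name language is hypothetical; it lists every matching string up to a chosen length, shortest first.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """List the strings over alphabet, up to max_len long, that match pattern."""
    regex = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if regex.fullmatch(word):
                words.append(word)
    return words

print(language(r"0|10*", "01", 3))           # (0 + 10*)
print(language(r"(0|)(1|)", "01", 2))        # (0 + ε)(1 + ε)
print(language(r"(11)*", "1", 6))            # (11)*
print(language(r"(aa)*(bb)*b", "ab", 3))     # (aa)*(bb)*b
print(language(r"(aa|ab|ba|bb)*", "ab", 2))  # (aa + ab + ba + bb)*
```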
