
The document provides an overview of regular expressions and regular languages. It defines basic concepts like alphabets, words, operations on words, languages, and operations on languages. It then introduces regular expressions as a finite representation of languages and defines the relationship between regular expressions and the languages they represent. The document proves various properties of regular expressions and languages and provides examples to illustrate the concepts.


Lecture 2: Languages and Regular Expressions

Basic concepts

• Alphabet: a finite set of symbols, written Σ.
• Word (or string): a finite sequence of symbols from an alphabet.

      Alphabet             Words
      {a, b, . . . , z}    man, abc, . . .
      {0, 1}               000, 010101, . . .
      {#, $, a, b, c}      #cb$, $$$, . . .

• |w|: the length of a word w, i.e. the number of symbols in w.
• e: the empty word, containing no symbols, i.e. the word of length zero.
• To avoid confusion, e should not be a symbol of any alphabet.

Operations on words

• Concatenation merges two given words to form a new word,
  e.g. concatenating abc and 123 gives abc123.
  – Properties:
      ew = we = w
      (uv)w = u(vw)
• Reversal reverses the order of all the symbols in a given word w:
      w = a1 ··· an ⇒ w^R = an ··· a1
  – Inductive definition:
      (1) e^R = e
      (2) (au)^R = u^R a, where a ∈ Σ, u ∈ Σ∗
• Power concatenates n copies of w to form a new word:
      w^n = ww ··· w   (n copies of w)
  – Inductive definition:
      (1) w^0 = e
      (2) w^(i+1) = w^i w, for any i ≥ 0.
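The inductive definitions above translate almost line by line into recursive functions. The following is a minimal sketch, assuming words are modelled as Python strings and the empty word e as the empty string; the function names are illustrative and not part of the lecture.

```python
def concat(u: str, v: str) -> str:
    """Concatenation of two words: uv."""
    return u + v


def reverse(w: str) -> str:
    """Reversal w^R, following the inductive definition."""
    if w == "":                # rule (1): e^R = e
        return ""
    a, u = w[0], w[1:]         # split w as au, with a a single symbol
    return reverse(u) + a      # rule (2): (au)^R = u^R a


def power(w: str, n: int) -> str:
    """Power w^n, following the inductive definition (n >= 0)."""
    if n == 0:                 # rule (1): w^0 = e
        return ""
    return power(w, n - 1) + w  # rule (2): w^(i+1) = w^i w
```

For instance, `reverse("abc")` unfolds as `reverse("bc") + "a"`, then `reverse("c") + "b" + "a"`, mirroring two applications of rule (2).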

Theorem
      (uw)^R = w^R u^R, where u, w ∈ Σ∗
Proof: By induction on |u|.
• Basis step: |u| = 0, i.e. u = e.
      (ew)^R = w^R = w^R e = w^R e^R
• Induction hypothesis: Assume (uw)^R = w^R u^R for all u with |u| ≤ n.
• Induction step: Consider the case |u| = n+1.
  Let u = av for some a ∈ Σ and v ∈ Σ∗ such that |v| = n.
      (uw)^R = ((av)w)^R
             = (a(vw))^R       Associative law
             = (vw)^R a        Rule 2 of the inductive definition
             = w^R v^R a       Induction hypothesis
             = w^R (av)^R      Rule 2 of the inductive definition
             = w^R u^R.
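A proof by induction covers all words, but the identity can also be sanity-checked by brute force on short words. The sketch below is only such a check, not a substitute for the proof; it assumes the alphabet {a, b} and a length bound of 4, both chosen arbitrarily, and the helper names are hypothetical.

```python
from itertools import product


def words(alphabet: str, max_len: int):
    """Yield every word over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def reversal_law_holds(alphabet: str = "ab", max_len: int = 4) -> bool:
    """Check (uw)^R == w^R u^R for all u, w up to the length bound."""
    return all(
        (u + w)[::-1] == w[::-1] + u[::-1]
        for u in words(alphabet, max_len)
        for w in words(alphabet, max_len)
    )
```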

Languages

• A language is a set of words defined over an alphabet Σ.

Examples:
1. The set of all English words: a language over {a, b, . . . , z}.
2. {01, 0101, 010101, . . . }: a language over {0, 1}.
3. {e}: a language over any alphabet.

• ∅: the empty language, i.e. the language that contains no words.
• Note: ∅ ≠ {e}. The language {e} contains one word (the empty word), whereas ∅ contains none.
• Σ∗: the set of all words over the alphabet Σ. It is called the universal language. Any language L over Σ is a subset of Σ∗.
• Connection with decision problems: a decision problem corresponds to the language that consists of all its yes-inputs.

Operations on languages

• Concatenation:
      L1 L2 = {xy | x ∈ L1, y ∈ L2}
• Reversal:
      L^R = {w^R | w ∈ L}
• Power:
      L^n = {w1 w2 ··· wn | w1, w2, . . . , wn ∈ L}
  Inductive definition:
      1. L^0 = {e}.
      2. L^(n+1) = L^n L, for n ≥ 0.
• Kleene star:
      L∗ = {w ∈ Σ∗ | w = w1 w2 ··· wk for some k ≥ 0 and some w1, . . . , wk ∈ L}
  It is also called the reflexive transitive closure of L under concatenation.
• Plus:
      L+ = L L∗
  It is also called the transitive closure of L under concatenation.
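For finite languages these operations can be tried out directly. The sketch below assumes a language is modelled as a Python set of strings, with e as the empty string. Because L∗ is usually infinite, the hypothetical helper `star_up_to` only returns the words of L∗ up to a given length; all function names are illustrative.

```python
from typing import Set


def lang_concat(l1: Set[str], l2: Set[str]) -> Set[str]:
    """L1 L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}


def lang_reverse(l: Set[str]) -> Set[str]:
    """L^R = {w^R | w in L}."""
    return {w[::-1] for w in l}


def lang_power(l: Set[str], n: int) -> Set[str]:
    """L^n via the inductive definition."""
    result = {""}                        # L^0 = {e}
    for _ in range(n):
        result = lang_concat(result, l)  # L^(n+1) = L^n L
    return result


def star_up_to(l: Set[str], max_len: int) -> Set[str]:
    """Words of L* whose length is at most max_len."""
    result = {""}      # k = 0 contributes the empty word
    frontier = {""}    # words found in the previous round
    while frontier:
        # Extend each newly found word by one more factor from L,
        # keeping only short enough words that are actually new.
        frontier = {
            w for w in lang_concat(frontier, l) if len(w) <= max_len
        } - result
        result |= frontier
    return result
```

With `L1 = {"a", "ab"}` and `L2 = {"", "ba"}`, `lang_concat(L1, L2)` evaluates to `{"a", "ab", "aba", "abba"}`.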

Operations on languages

Examples:
Let Σ = {a, b}, L1 = {a, ab}, and L2 = {e, ba}.

Then
      L1^R = {a, ba}.
      L1 L2 = {a, ab, aba, abba}.
      L1^2 = L1 L1 = {aa, aab, aba, abab}.
      L2^2 = L2 L2 = {e, ba, baba}.
      Σ∗ = {e, a, b, aa, ab, ba, bb, aaa, . . .}.

Questions:
      {e}^1000 = ?
      L∅ = ?
      ∅∗ = ?
      e ∉ L+ ?
      L^n ⊆ L^(n+1) ?
      L^n ⊆ L∗ ?

Exercises:
1. Prove that (w^R)^R = w for any string w.
2. Prove that {e}∗ = {e}.
3. Prove that for any language L, (L∗)∗ = L∗.

Regular expressions
Regular expressions are a finite representation of languages.

Inductive definition of regular expressions for languages over an alphabet Σ. A regular expression is a string over the alphabet Σ1 = Σ ∪ {(, ), ∅, ∪, ∗}.
1. ∅ and each σ ∈ Σ are regular expressions.
2. If α and β are regular expressions, then
      (αβ), (α ∪ β), α∗
   are regular expressions.
3. Nothing else is a regular expression.

Examples:
Let Σ = {a, b, c, d}.
• Regular expressions:

a, ((a∪b)∗d), (c∗(a∪(bc∗)))∗, ∅∗

• Not regular expressions:

c∪∗, (∗)
Language represented by regular expressions
Let α denote a regular expression.
Let L(α) denote the language represented by a regular
expression α.

The function L is defined as follows:


1. L(∅) = ∅, L(a) = {a} for each a∈Σ,
2. L(αβ) = L(α)L(β),
3. L(α∪β) = L(α)∪L(β),
4. L(α∗) = L(α)∗.

Example: What is L[((a∪b)∗a)]?

      L[((a∪b)∗a)] = L((a∪b)∗) L(a)
                   = (L(a∪b))∗ L(a)
                   = (L(a) ∪ L(b))∗ L(a)
                   = ({a} ∪ {b})∗ {a}
                   = {w ∈ {a, b}∗ | w ends with a}
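The four defining clauses of L can be turned into a small evaluator. The sketch below makes several assumptions that are not in the lecture: a regular expression is encoded as a nested tuple (`("empty",)`, `("sym", a)`, `("cat", α, β)`, `("union", α, β)`, `("star", α)`), and since L(α) may be infinite, the hypothetical function `lang` returns only the words of L(α) of length at most `max_len`.

```python
def lang(alpha, max_len):
    """Return the words of L(alpha) whose length is at most max_len."""
    kind = alpha[0]
    if kind == "empty":                      # L(∅) = ∅
        return set()
    if kind == "sym":                        # L(a) = {a}
        return {alpha[1]} if max_len >= 1 else set()
    if kind == "union":                      # L(α∪β) = L(α) ∪ L(β)
        return lang(alpha[1], max_len) | lang(alpha[2], max_len)
    if kind == "cat":                        # L(αβ) = L(α)L(β)
        left = lang(alpha[1], max_len)
        right = lang(alpha[2], max_len)
        return {x + y for x in left for y in right
                if len(x + y) <= max_len}
    if kind == "star":                       # L(α∗) = L(α)∗
        base = lang(alpha[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Append one more factor to each newly found word.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError("unknown regular expression node: %r" % (kind,))


# ((a∪b)∗a) in the tuple encoding assumed above
ENDS_WITH_A = ("cat",
               ("star", ("union", ("sym", "a"), ("sym", "b"))),
               ("sym", "a"))
```

Evaluating `lang(ENDS_WITH_A, 2)` gives `{"a", "aa", "ba"}`, which are exactly the words over {a, b} of length at most 2 that end with a.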

Example: What is L[(c∗(a∪(bc∗))∗)]?

One example of a string in this language is cccaabcccbaaabbcca.

Examples:
1. Write a regular expression for each of the following
languages defined over Σ = {0, 1}:
(a) L = {w | w contains at least two zeros}
(b) L = {w | w is of even length}
(c) L = {w | w has an even number of 1's}
2. Simplify ∅∗ ∪ a∗ ∪ b∗ ∪ (a ∪ b)∗.
3. Simplify (a ∪ b)∗a(a ∪ b)∗.
4. Prove that
   L[c∗(a∪(bc∗))∗] = {w ∈ {a, b, c}∗ | w does not contain the substring ac}
Proof:
Suppose w ∈ L[c∗(a∪(bc∗))∗].
⇒ Each occurrence of a in w is at the end of the string, or is followed by another occurrence of a, or is followed by an occurrence of b.
⇒ w does not contain the substring ac.

Conversely, suppose w is a string over {a, b, c} that does not contain ac.
⇒ Write w = uv, where u is the longest prefix of w consisting only of c's (possibly u = e). Then v has no substring ac and does not begin with c.
⇒ v is a sequence of a's, b's and c's in which every block of c's appears immediately after a b, never after an a and never at the beginning of the string. Thus v ∈ L((a ∪ bc∗)∗).
⇒ w ∈ L(c∗(a ∪ bc∗)∗).
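The claim just proved can also be checked empirically for short strings. The sketch below uses Python's standard `re` module, where union is written `|` instead of ∪, and compares the regular expression against the "no substring ac" condition for every string over {a, b, c} up to a chosen length. It is a finite sanity check under that length bound, not a proof, and `agrees_up_to` is a hypothetical helper name.

```python
import re
from itertools import product

# Python syntax for the regular expression c∗(a ∪ bc∗)∗
PATTERN = re.compile(r"c*(a|bc*)*")


def agrees_up_to(max_len: int) -> bool:
    """Compare the regex with the 'no ac' condition on all short words."""
    for n in range(max_len + 1):
        for symbols in product("abc", repeat=n):
            w = "".join(symbols)
            in_language = PATTERN.fullmatch(w) is not None
            if in_language != ("ac" not in w):
                return False
    return True
```

Calling `agrees_up_to(6)` examines all 1093 strings of length 0 to 6.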
Notational simplifications:
1. A regular expression α also denotes the language L(α) that it represents. E.g., we may write ab ∈ a∗b∗.
2. Omit extra parentheses. E.g.,
      (a ∪ b) ∪ c = a ∪ (b ∪ c) = a ∪ b ∪ c
      (ab)c = a(bc) = abc
      a ∪ (bc) = a ∪ bc ≠ (a ∪ b)c

Regular languages

Regular language: a language that can be specified by a regular expression.

Closure Properties:
If A and B are regular languages, then the following languages are also regular:
      AB, A∪B, A∗, A^R.
Proof:
Since A and B are regular languages, by definition they can be represented by regular expressions. Let A = L(α) and B = L(β), where α and β are regular expressions. Then we have
• AB = L(α)L(β) = L(αβ)
That is, αβ is a regular expression representing the language AB. Thus AB is a regular language.

The proofs that A∪B and A∗ are regular are similar.

Try to prove for yourself that A^R is regular, given that A is regular.

Language generators vs language recognizers
A language generator (e.g. a regular expression) represents a language by generating the words in the language. For example:
      (c∗(a∪(bc∗))∗) ⇒ {w ∈ {a, b, c}∗ : w does not contain the substring ac}.

A language recognizer (e.g., an algorithm) represents a
language by recognizing its words.

Algorithm: recognizer(w)
• Input: w, a string over {a, b, c}.
• Output: YES or NO.

1. If w = e, return YES.
2. flagA = FALSE.
3. Scan w from left to right. For each symbol:
   • If the current symbol is "a", flagA = TRUE;
   • Else if the current symbol is "c",
     (a) If flagA = TRUE, return NO.
     (b) flagA = FALSE.
   • Else flagA = FALSE.
4. Return YES.

{w ∈ {a, b, c}∗ : recognizer(w) = YES} = {w ∈ {a, b, c}∗ : w does not contain the substring ac}.
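The recognizer can be written out as a short function. The following is one possible rendering of the steps above, assuming YES and NO are represented by the booleans True and False; the explicit test for the empty word in step 1 is omitted because the loop simply does not execute on an empty string and the function then returns True.

```python
def recognizer(w: str) -> bool:
    """Return True (YES) iff w does not contain the substring "ac"."""
    flag_a = False                 # True iff the previous symbol was "a"
    for symbol in w:               # scan w from left to right
        if symbol == "a":
            flag_a = True
        elif symbol == "c":
            if flag_a:             # an "a" immediately followed by "c"
                return False       # NO
            flag_a = False
        else:
            flag_a = False
    return True                    # YES
```

The single flag is all the memory the scan needs: at each position it records only whether the preceding symbol was a, so the running time is linear in |w|.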
