Introduction and RE
Words
Words are strings belonging to some language.
Example If Σ = {x}, then a language L can be defined as L = {x^n : n = 1, 2, 3, …} or L = {x, xx, xxx, …}. Here x, xx, … are the words of L.
All words are strings, but not all strings are words.
Valid/Invalid alphabets
A letter of an alphabet may itself consist of a group of symbols, for example Σ1 = {B, aB, bab, d}.
Now consider the alphabet Σ2 = {B, Ba, bab, d} and the string BababB.
A scanner can try to tokenize this string in two different ways: (Ba), (bab), (B) or (B), (abab), (B). In the second attempt the group abab is not a letter of Σ2, so that attempt fails. When the string is scanned by the compiler (Lexical Analyzer), the first symbol B is identified as a letter belonging to Σ2, but the lexical analyzer is then unable to identify the second letter. So, while defining an alphabet, it should be kept in mind that no such ambiguity is created.
When defining an alphabet whose letters consist of more than one symbol, no letter should start with another letter of the same alphabet, i.e. no letter should be a prefix of another. A letter may, however, end with a letter of the same alphabet.
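The prefix rule above can be checked mechanically. The following is a minimal sketch, assuming each letter is represented as a Python string; the function name prefix_conflicts is made up for this illustration.

```python
def prefix_conflicts(alphabet):
    """Return pairs (p, q) where letter p is a proper prefix of letter q."""
    letters = sorted(alphabet)
    return [(p, q) for p in letters for q in letters
            if p != q and q.startswith(p)]

sigma1 = {"B", "aB", "bab", "d"}
sigma2 = {"B", "Ba", "bab", "d"}

print(prefix_conflicts(sigma1))  # [] : no letter is a prefix of another
print(prefix_conflicts(sigma2))  # [('B', 'Ba')] : B is a prefix of Ba
```

An empty result means the alphabet satisfies the rule; any reported pair is a source of the ambiguity described above.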
Length of Strings
The length of a string s, denoted by |s|, is the number of letters in the string.
Example Σ = {a, b}, s = ababa, |s| = 5
Example Σ = {B, aB, bab, d}
s = BaBbabBd, Tokenizing = (B), (aB), (bab), (B), (d), |s| = 5
Reverse of a String
The reverse of a string s, denoted by Rev(s) or s^r, is obtained by writing the letters of s in reverse order.
Example If s = abc is a string defined over Σ = {a, b, c}, then Rev(s) or s^r = cba
Example Σ = {B, aB, bab, d}
s = BaBbabBd
Rev(s) = dBbabaBB
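Both the length and the reverse are defined in terms of letters, not individual symbols, so the string has to be tokenized first. The sketch below assumes a valid (prefix-free) alphabet with non-empty letters; the helper name tokenize is chosen only for this illustration.

```python
def tokenize(s, alphabet):
    """Split s into letters of a prefix-free alphabet, scanning left to right."""
    tokens = []
    i = 0
    while i < len(s):
        # In a prefix-free alphabet at most one letter can match at position i.
        match = next((letter for letter in alphabet if s.startswith(letter, i)), None)
        if match is None:
            raise ValueError(f"no letter of the alphabet matches at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens

sigma = {"B", "aB", "bab", "d"}
tokens = tokenize("BaBbabBd", sigma)
length = len(tokens)                    # |s| counts letters, not symbols
reverse = "".join(reversed(tokens))     # reverse the order of the letters

print(tokens)   # ['B', 'aB', 'bab', 'B', 'd']
print(length)   # 5
print(reverse)  # dBbabaBB
```

Note that reversing symbol by symbol would give dBbabBaB, which is not the reverse of s over this alphabet.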
Defining Languages
Languages can be defined in different ways, such as by a descriptive definition, by a recursive definition, using Regular Expressions (RE), or using a Finite Automaton (FA).
Kleene Star Closure
Given an alphabet Σ, the Kleene Star Closure of Σ, denoted by Σ*, is the collection of all strings defined over Σ, including Λ. Note that the Kleene Star Closure can be defined over any set of strings.
Examples If Σ = {x} then Σ* = {Λ, x, xx, xxx, xxxx, …}
If Σ = {0, 1} then Σ* = {Λ, 0, 1, 00, 01, 10, 11, …}
If Σ = {aaB, c} then Σ* = {Λ, aaB, c, aaBaaB, aaBc, caaB, cc, …}
Note Languages generated by the Kleene Star Closure of a set of strings are infinite languages, provided the set contains at least one non-null string. (An infinite language is one that contains infinitely many words, each of finite length.)
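Since Σ* is infinite, a program can only list a finite part of it. The sketch below enumerates the words obtained by concatenating at most max_factors letters; the function name kleene_star and the cut-off parameter are assumptions made for this illustration, and Λ appears as the empty Python string ''.

```python
from itertools import product

def kleene_star(strings, max_factors):
    """Words formed by concatenating at most max_factors members of strings."""
    words = []
    for n in range(max_factors + 1):          # n = 0 contributes the null string
        for combo in product(strings, repeat=n):
            word = "".join(combo)
            if word not in words:             # keep each word once
                words.append(word)
    return words

print(kleene_star(["0", "1"], 2))
# ['', '0', '1', '00', '01', '10', '11']
print(kleene_star(["aaB", "c"], 2))
# ['', 'aaB', 'c', 'aaBaaB', 'aaBc', 'caaB', 'cc']
```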
PLUS Operation (+)
The Plus Operation is the same as the Kleene Star Closure except that it does not generate Λ (the null string) automatically.
Example If Σ = {0, 1} then Σ+ = {0, 1, 00, 01, 10, 11, …}
If Σ = {aab, c} then Σ+ = {aab, c, aabaab, aabc, caab, cc, …}
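The word "automatically" matters: Λ is in the result of the Plus Operation only if Λ was already in the set. A minimal sketch, using the made-up name plus_closure and an assumed cut-off on the number of concatenated members, illustrates this.

```python
from itertools import product

def plus_closure(strings, max_factors):
    """Words formed by concatenating between 1 and max_factors members of strings."""
    words = []
    for n in range(1, max_factors + 1):       # starts at 1, so no free null string
        for combo in product(strings, repeat=n):
            word = "".join(combo)
            if word not in words:
                words.append(word)
    return words

print(plus_closure(["0", "1"], 2))  # ['0', '1', '00', '01', '10', '11']
print(plus_closure(["", "a"], 2))   # ['', 'a', 'aa'] : Λ appears only because it was in the set
```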
Remark The Kleene Star can also be applied to a single string: a* denotes all possible strings defined over {a}, so a* generates Λ, a, aa, aaa, … Similarly, a+ denotes all possible non-empty strings defined over {a}, so a+ generates a, aa, aaa, aaaa, …
Regular Expression
As discussed earlier, a* generates Λ, a, aa, aaa, … and a+ generates a, aa, aaa, aaaa, …, so the languages L1 = {Λ, a, aa, aaa, …} and L2 = {a, aa, aaa, aaaa, …} can simply be expressed by a* and a+, respectively. a* and a+ are called the regular expressions (RE) for L1 and L2, respectively.
Recursive definition of Regular Expression(RE)
Step 1: Every letter of Σ, including Λ, is a regular expression.
Step 2: If r1 and r2 are regular expressions, then (r1), r1 r2, r1 + r2 and r1* are also regular expressions.
Step 3: Nothing else is a regular expression.
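The three steps can be mirrored by a small recursive data type. This is only an illustrative sketch: the class names Letter, Concat, Union and Star and the function to_pattern are invented here, and translating to Python's re syntax (where the union r1 + r2 is written r1|r2) is one of several ways to make the expressions executable.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Letter:            # Step 1: a letter of Σ (use "" for Λ)
    symbol: str

@dataclass(frozen=True)
class Concat:            # Step 2: r1 r2
    left: object
    right: object

@dataclass(frozen=True)
class Union:             # Step 2: r1 + r2
    left: object
    right: object

@dataclass(frozen=True)
class Star:              # Step 2: r1*
    inner: object

def to_pattern(r):
    """Translate an expression tree into Python regex syntax, recursively."""
    if isinstance(r, Letter):
        return re.escape(r.symbol)
    if isinstance(r, Concat):
        return f"(?:{to_pattern(r.left)})(?:{to_pattern(r.right)})"
    if isinstance(r, Union):
        return f"(?:{to_pattern(r.left)}|{to_pattern(r.right)})"
    if isinstance(r, Star):
        return f"(?:{to_pattern(r.inner)})*"
    raise TypeError("not a regular expression")   # Step 3: nothing else

# a(a + b)* built from the constructors above
r = Concat(Letter("a"), Star(Union(Letter("a"), Letter("b"))))
pattern = to_pattern(r)

print(re.fullmatch(pattern, "abba") is not None)  # True
print(re.fullmatch(pattern, "ba") is not None)    # False
```

The parenthesised form (r1) needs no constructor of its own, because the tree structure already records the grouping.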
Write an RE that accepts all words that start with “a”, where Σ = {a, b}.
a(a + b)*
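The answer can be sanity-checked by brute force. The sketch below assumes Python's re module, where the union written + in the notation above becomes |, and compares the RE against the plain description "starts with a" for every string over {a, b} of length at most 4. The length bound and the helper name accepts are arbitrary choices for this illustration.

```python
import re
from itertools import product

pattern = re.compile(r"a(a|b)*")   # a(a + b)* in Python regex syntax

def accepts(word):
    """True if the whole word matches the regular expression."""
    return pattern.fullmatch(word) is not None

# All strings over {a, b} of length 0 to 4, shortest first.
words = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]

# The RE and the descriptive definition agree on every string tested.
assert all(accepts(w) == w.startswith("a") for w in words)

print([w for w in words if accepts(w)][:5])  # ['a', 'aa', 'ab', 'aaa', 'aab']
```

Such a check does not prove the RE correct for all lengths, but it catches most mistakes quickly.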