Introduction
Alphabets, Strings and Languages
Languages:
A general definition of language must cover a variety of distinct categories: natural languages,
programming languages, mathematical languages, etc. The notion of natural languages like
English, Hindi, etc. is familiar to us. Informally, language can be defined as a system suitable for
expression of certain ideas, facts, or concepts, which includes a set of symbols and rules to
manipulate these. The languages we consider in this discussion are an abstraction of natural
languages. That is, our focus here is on formal languages that need precise and formal
definitions. Programming languages belong to this category. We start with some basic concepts
and definitions required in this regard.
Symbols:
Symbols are indivisible objects or entities that cannot be defined. That is, symbols are the atoms of
the world of languages. A symbol is any single object such as a, 0, 1, #, begin, or do.
Usually, only characters from a typical keyboard are used as symbols.
Alphabets:
An alphabet is a finite, nonempty set of symbols. The alphabet of a language is normally denoted
by $\Sigma$. When more than one alphabet is under discussion, subscripts may be used
(e.g. $\Sigma_1$, $\Sigma_2$, etc.), or sometimes another symbol such as $\Gamma$ may be introduced.
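Strings:
A string (or word) over an alphabet $\Sigma$ is a finite sequence of symbols from $\Sigma$. The string
that contains no symbols at all is called the empty string and is denoted by $\epsilon$.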
Example: 0110, 11, 001 are three strings over the binary alphabet { 0, 1 } .
mywbut.com
A string over an alphabet need not contain every symbol of that alphabet. For example, the
string cc over the alphabet { a, b, c } does not contain the symbols a and b. Hence, a string
over an alphabet is also a string over any superset of that alphabet.
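The superset remark can be checked mechanically. The sketch below is only an illustration: it assumes single-character symbols and models strings as Python str values, and the helper name is_string_over is hypothetical, not part of these notes.

```python
def is_string_over(w, alphabet):
    """Return True if every symbol of w belongs to the given alphabet."""
    return all(symbol in alphabet for symbol in w)


small = {"a", "b", "c"}
large = small | {"d"}  # a superset of the first alphabet

print(is_string_over("cc", small))  # cc uses only c, yet it is a string over {a, b, c}
print(is_string_over("cc", large))  # so it is also a string over the superset
print(is_string_over("cd", small))  # d is not a symbol of {a, b, c}
```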
Length of a string:
The number of symbols in a string w is called its length, denoted by |w|.
Convention: We will use lowercase letters towards the beginning of the English alphabet to
denote symbols of an alphabet and lowercase letters towards the end to denote strings over an
alphabet. That is, $a, b, c \in \Sigma$ are symbols and $u, v, w, x, y, z$ are strings.
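Concatenation of strings:
The concatenation of two strings x and y, written xy, is the string formed by writing the symbols
of x followed by the symbols of y.
Example: Concatenation of the strings 0110 and 11 is 011011 and concatenation of the strings
good and boy is goodboy.
Note that for any string w, $w\epsilon = \epsilon w = w$. It is also obvious that if |x| = n and |y| = m, then
|xy| = n + m.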
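These identities are easy to try out. The following is a minimal sketch, assuming strings are modelled as Python str values and the empty string as ""; the names EPSILON and concat are chosen here for illustration.

```python
EPSILON = ""  # the empty string, modelled as Python's empty str (an assumption of this sketch)


def concat(x, y):
    """Concatenation: the symbols of x followed by the symbols of y."""
    return x + y


print(concat("0110", "11"))   # 011011
print(concat("good", "boy"))  # goodboy

w = "0110"
# The empty string is an identity for concatenation.
print(concat(w, EPSILON) == w and concat(EPSILON, w) == w)

# Lengths add up under concatenation.
x, y = "0110", "11"
print(len(concat(x, y)) == len(x) + len(y))
```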
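A string x is a prefix of w if w = xy for some string y, a suffix of w if w = yx for some string y,
and a substring of w if w = yxz for some strings y and z.
Example: Consider the string 011 over the binary alphabet. All the prefixes, suffixes and
substrings of this string are listed below.
Prefixes: $\epsilon$, 0, 01, 011.
Suffixes: $\epsilon$, 1, 11, 011.
Substrings: $\epsilon$, 0, 1, 01, 11, 011.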
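The prefixes, suffixes and substrings of a short string can be enumerated directly. This sketch assumes the standard definitions (x is a prefix of w if w = xy, a suffix if w = yx, a substring if w = yxz), so the empty string and w itself are included; the function names are illustrative.

```python
def prefixes(w):
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w):
    """All suffixes of w, from the empty string up to w itself."""
    return [w[i:] for i in range(len(w), -1, -1)]


def substrings(w):
    """All distinct substrings of w, ordered by length and then alphabetically."""
    found = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    return sorted(found, key=lambda s: (len(s), s))


print(prefixes("011"))    # ['', '0', '01', '011']
print(suffixes("011"))    # ['', '1', '11', '011']
print(substrings("011"))  # ['', '0', '1', '01', '11', '011']
```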
Powers of Strings: For any string x and integer $n \ge 0$, we use $x^n$ to denote the string formed
by sequentially concatenating n copies of x. We can also give an inductive definition of $x^n$ as
follows:
$x^0 = \epsilon$
$x^n = x x^{n-1}$ for $n \ge 1$
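An inductive definition of this kind translates directly into a recursive function. The sketch below assumes the usual base case (the zeroth power is the empty string) and the step "one copy of x followed by the (n-1)-th power"; the name power is illustrative.

```python
def power(x, n):
    """Return the string made of n copies of x, following the inductive definition."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        return ""                 # base case: the zeroth power is the empty string
    return x + power(x, n - 1)    # inductive step: x followed by the (n-1)-th power


print(power("ab", 3))  # ababab
print(power("ab", 0) == "")
```

In Python the same result is given by the built-in repetition x * n; the recursive form is shown only because it mirrors the definition.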
The set $\Sigma^*$ contains all the strings that can be generated by iteratively concatenating symbols
from $\Sigma$ any number of times.
Example: If $\Sigma$ = { a, b }, then $\Sigma^*$ = { $\epsilon$, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, … }.
The set of all nonempty strings over an alphabet $\Sigma$ is denoted by $\Sigma^+$. That is, $\Sigma^+ = \Sigma^* - \{\epsilon\}$.
Note that $\Sigma^*$ is infinite. It contains no infinite strings, but it contains strings of arbitrary length.
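Because the set of all strings over an alphabet is infinite, a program can only enumerate it lazily. The following sketch, with the hypothetical generator name sigma_star, lists strings in order of increasing length; every string it yields is finite, but there is no bound on the lengths that eventually appear.

```python
from itertools import count, islice, product


def sigma_star(alphabet):
    """Yield every string over the alphabet, shortest first (never terminates)."""
    symbols = sorted(alphabet)
    for length in count(0):
        for symbols_chosen in product(symbols, repeat=length):
            yield "".join(symbols_chosen)


# Take only the first twelve strings over {a, b}.
first = list(islice(sigma_star({"a", "b"}), 12))
print(first)
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab', 'aba', 'abb', 'baa']
```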
Reversal:
For any string $w = a_1 a_2 \cdots a_n$, the reversal of the string is $w^R = a_n a_{n-1} \cdots a_1$.
Languages:
A language over an alphabet is a set of strings over that alphabet. Therefore, a language L is any subset
of $\Sigma^*$. That is, any $L \subseteq \Sigma^*$ is a language.
Example :
Convention: Capital letters A, B, C, L, etc. with or without subscripts are normally used to
denote languages.
Set operations on languages: Since languages are sets of strings, we can apply set
operations to languages. Here are some simple examples (though there is nothing new in them).
Example: { 0, 11, 01, 011 } $\cup$ { 1, 01, 110 } = { 0, 11, 01, 011, 1, 110 }
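For finite languages these operations are exactly Python's built-in set operations. A small sketch on the two sets from the example, with the variable names A and B chosen here for illustration:

```python
A = {"0", "11", "01", "011"}
B = {"1", "01", "110"}

print(sorted(A | B))  # union: strings in A or in B
print(sorted(A & B))  # intersection: strings in both
print(sorted(A - B))  # relative complement: strings in A but not in B
print(sorted(A ^ B))  # symmetric difference: strings in exactly one of the two
```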
Complement: Usually, $\Sigma^*$ is the universe that a complement is taken with respect to. Thus for a
language L, the complement is $\bar{L} = \{ x \in \Sigma^* \mid x \notin L \}$.
Example: Let L = { x | |x| is even }. Then its complement is the language { $x \in \Sigma^*$ | |x| is odd }.
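A complement taken with respect to all strings over the alphabet is infinite in general, so a program can only inspect a finite slice of it. The sketch below restricts attention to strings of length at most max_len, which is an assumption made purely for illustration; the language is given by a membership predicate, and all names are hypothetical.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """All strings over the alphabet whose length is at most max_len."""
    symbols = sorted(alphabet)
    return {
        "".join(chosen)
        for n in range(max_len + 1)
        for chosen in product(symbols, repeat=n)
    }


def bounded_complement(in_language, alphabet, max_len):
    """Strings of length <= max_len that are NOT in the language."""
    return {x for x in strings_up_to(alphabet, max_len) if not in_language(x)}


def has_even_length(x):
    return len(x) % 2 == 0


comp = bounded_complement(has_even_length, {"0", "1"}, 3)
print(sorted(comp, key=lambda s: (len(s), s)))
# Every string listed has odd length (1 or 3).
```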
Similarly we can define other usual set operations on languages like relative complement,
symmetric difference, etc.
Reversal of a language:
The reversal of a language L, denoted as $L^R$, is defined as: $L^R = \{ w^R \mid w \in L \}$.
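Reversing a finite language amounts to reversing each of its strings. A minimal sketch, assuming strings are Python str values and using illustrative function names and an example language chosen here:

```python
def reverse_string(w):
    """Reversal of a string: the same symbols in the opposite order."""
    return w[::-1]


def reverse_language(L):
    """Reversal of a language: the set of reversals of its strings."""
    return {reverse_string(w) for w in L}


L = {"0", "01", "011"}
print(sorted(reverse_language(L)))             # ['0', '10', '110']
print(reverse_language(reverse_language(L)) == L)  # reversing twice gives L back
```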
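Concatenation of languages:
The concatenation of languages $L_1$ and $L_2$ is defined as $L_1 L_2 = \{ xy \mid x \in L_1, y \in L_2 \}$.
Note that,
1. $L_1 L_2 \ne L_2 L_1$ in general.
2. $L\emptyset = \emptyset L = \emptyset$
3. $L\{\epsilon\} = \{\epsilon\}L = L$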
Iterated concatenation of languages: Since we can concatenate two languages, we can also repeat
this to concatenate any number of languages. Or we can concatenate a language with itself any
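number of times. The operation $L^n$ denotes the concatenation of L with itself n times. This is
defined formally as follows:
$L^0 = \{\epsilon\}$
$L^1 = L$
$L^2 = LL$
$L^n = L L^{n-1}$
and so on.
$L^* = \bigcup_{n \in \mathbb{N}} L^n$ (with $0 \in \mathbb{N}$)
Thus $L^*$ is the set of all strings derivable by any number of concatenations of strings in L. It is
also useful to define
$L^+ = \bigcup_{n \ge 1} L^n = L L^*$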
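Iterated concatenation can be sketched for finite languages as follows. The code assumes the usual conventions that concatenating two languages pairs every string of the first with every string of the second, and that the zeroth power of a language contains only the empty string. The star of a language is infinite in general, so instead of building it the sketch tests membership of a single string by dynamic programming. The function names and the test strings are chosen here for illustration only.

```python
def concat_languages(L1, L2):
    """All strings xy with x taken from L1 and y taken from L2."""
    return {x + y for x in L1 for y in L2}


def language_power(L, n):
    """L concatenated with itself n times; the zeroth power is {''}."""
    result = {""}
    for _ in range(n):
        result = concat_languages(result, L)
    return result


def in_star(w, L):
    """Return True if w is a concatenation of zero or more strings from L."""
    # reachable[i] is True when the prefix w[:i] can be split into strings of L.
    reachable = [False] * (len(w) + 1)
    reachable[0] = True  # the empty prefix uses zero strings of L
    for i in range(1, len(w) + 1):
        reachable[i] = any(
            reachable[i - len(p)]
            for p in L
            if p and len(p) <= i and w[i - len(p):i] == p
        )
    return reachable[len(w)]


L = {"01", "11", "011"}
print(sorted(language_power(L, 2)))
for w in ["", "0111", "011011", "010", "10"]:
    print(repr(w), in_star(w, L))
```

Here 0111 splits as 01 followed by 11 and 011011 as 011 followed by 011, while 010 and 10 admit no such split; the empty string belongs to the star of any language.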
Problems
3. Consider the language L = { 01, 11, 011 }. Which of the following strings are in L*?
5. Let L1={ aa }*, L2={ a, b } { a, b }{ a, b } , and L3= L2*. Describe the strings that are in the
languages L2, L3, and L1 L3
7. Let $\Sigma$ be an alphabet. Prove that the relation { (x, y) | x is a prefix of y } is a partial ordering
of $\Sigma^*$.
a) $L^+ = L^* - \{\epsilon\}$ iff $\epsilon \notin L$.
b) If L1 L2 then L1 L2 L1L2