ICT - Module 1 Lecture 2
Module 1
By
Dr Akriti Nigam
Computer Science & Engineering Department
BIT, Mesra
INFORMATION THEORY AND SOURCE ENCODING
• In everyday life we are, in general, not much concerned with the accurate transmission of
information. This is because of the redundancy built into our language: in conversations,
lectures, and radio or telephone communications.
• Many words, or even whole sentences, may be missed without distorting the meaning of the message.
• However, when we want to transmit intelligence, that is, more information in a shorter time, we
wish to eliminate unnecessary redundancy.
• Our language then becomes less redundant, and errors in transmission become more serious.
• Notice that with numerical data, misreading even a single digit can have a marked effect on
the intent of the message.
• Thus the primary objective of coding for transmission of intelligence is twofold: to increase
efficiency and to reduce transmission errors.
• In addition, we would like our technique to ensure security and reliability.
Definition of Codes
• ‘Encoding’ or ‘enciphering’ is a procedure for associating
words constructed from a finite alphabet of one language with
given words of another language in a one-to-one manner.
• Let the source be characterized by the set of symbols
S = {s1, s2, ..., sq}.
• We shall call S the “source alphabet”. Consider
another set, X, comprising r symbols: X = {x1, x2, ..., xr}.
• We shall call X the “code alphabet”.
• We define “coding” as the mapping of all possible
sequences of symbols of S into sequences of symbols of X.
• In other words, “coding means representing each and
every symbol of S by a sequence of symbols of X such that
there is a one-to-one relationship”.
• For example, the sequences {x1; x1x3x4; x3x5x7x9;
x1x1x2x2x2} form code words. Their word lengths are,
respectively, 1, 3, 4, and 5.
Basic properties of codes
• We require the codes to satisfy certain properties:
1. Block codes: A block code is one in which a particular message of the source
is always encoded into the same “fixed sequence” of code symbols.
Although, in general, block means ‘a group having identical property’, we
shall use the word here to mean a ‘fixed sequence’ only. Accordingly, the code
can be a “fixed length code” or a “variable length code”.
Example 1: Source alphabet is S = {s1, s2, s3, s4}, code alphabet is X = {0,
1}, and the code words are C = {0, 11, 10, 11}.
2. Non-singular codes: A block code is said to be non-singular if all the
words of the code set are “distinct”. The code given in Example 1 does not
satisfy this property, as the code words for s2 and s4 are not different. We cannot
distinguish the code words.
• If the code words are not distinguishable on simple inspection, we say the code
set is “singular”.
Example 2: S = {s1, s2, s3, s4}, X = {0, 1}; code words C = {0, 11, 10, 01}
However, although the code given in Example 2 is non-singular, it would
pose problems in decoding upon transmission. If the transmitted
sequence is 010, it might be interpreted as s1 s3 or as s4 s1. Thus there is
an ambiguity about the code.
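The two checks above can be carried out mechanically. What follows is a minimal Python sketch, not part of the original lecture: the helper names `is_non_singular` and `all_parses` are hypothetical, and the dictionaries simply restate Examples 1 and 2. It lists every way a received string can be split into code words, using the string 010 as an illustrative input.

```python
def is_non_singular(code):
    """A block code is non-singular when all of its code words are distinct."""
    words = list(code.values())
    return len(set(words)) == len(words)


def all_parses(sequence, code):
    """Return every split of `sequence` into code words, as lists of source symbols.

    Assumes every code word is a non-empty string.
    """
    if sequence == "":
        return [[]]  # exactly one way to parse nothing
    parses = []
    for symbol, word in code.items():
        if sequence.startswith(word):
            for rest in all_parses(sequence[len(word):], code):
                parses.append([symbol] + rest)
    return parses


# Code tables restated from Examples 1 and 2.
example_1 = {"s1": "0", "s2": "11", "s3": "10", "s4": "11"}
example_2 = {"s1": "0", "s2": "11", "s3": "10", "s4": "01"}

print(is_non_singular(example_1))    # False: s2 and s4 share the word 11
print(is_non_singular(example_2))    # True
print(all_parses("010", example_2))  # [['s1', 's3'], ['s4', 's1']]
```

More than one parse for the same string means the code, although non-singular, cannot be decoded unambiguously.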
3. Uniquely decodable codes: A non-singular code is uniquely
decodable if every word immersed in a sequence of words can be
uniquely identified.
• The nth extension of a code that maps each message into the code words
C is defined as the code which maps each sequence of n messages into the
corresponding sequence of n code words.
• This is also a block code, as illustrated in the following example.
Example 3: Second extension of the code set given in Example 2.
S^2 = {s1s1, s1s2, s1s3, s1s4; s2s1, s2s2, s2s3, s2s4; s3s1, s3s2, s3s3, s3s4;
s4s1, s4s2, s4s3, s4s4}
• Notice that, in the above example, the code words for the source sequences s1s3
and s4s1 are not distinct (both are 010), and hence the extended code is “singular”.
• Since such singularity introduces ambiguity at the decoding stage,
we require, in general, for unique decodability of our codes that
“the nth extension of the code be non-singular for every finite n”.
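The extension and its singularity can be computed directly. The sketch below is illustrative only: `extension` and `collisions` are hypothetical helper names, and the code table restates Example 2. It builds the nth extension by concatenating code words and then reports any source sequences that share a code word.

```python
from itertools import product


def extension(code, n):
    """Map every length-n sequence of source symbols to its concatenated code words."""
    return {
        "".join(symbols): "".join(code[s] for s in symbols)
        for symbols in product(code, repeat=n)
    }


def collisions(code):
    """Group source messages that are mapped to the same code word."""
    by_word = {}
    for message, word in code.items():
        by_word.setdefault(word, []).append(message)
    return {word: msgs for word, msgs in by_word.items() if len(msgs) > 1}


example_2 = {"s1": "0", "s2": "11", "s3": "10", "s4": "01"}
second = extension(example_2, 2)

print(len(second))         # 16 source sequences in the second extension
print(collisions(second))  # {'010': ['s1s3', 's4s1']}
```

An empty result from `collisions` for a given n shows only that the nth extension is non-singular; unique decodability requires this for every finite n.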
4. Instantaneous codes: A uniquely decodable code is said to be
“instantaneous” if the end of any code word is recognizable without the need to
inspect succeeding code symbols.
That is, there is no time lag in the process of decoding. To understand the
concept, consider the following codes:
• Code A is undoubtedly the simplest possible uniquely decodable code. It is
non-singular and all the code words have the same length. Decoding can be done as soon
as we receive two code symbols, without any need to receive succeeding code symbols.
• Code B is also uniquely decodable, with the special feature that the 0’s indicate the
termination of a code word. It is called the “comma code”. When scanning a sequence
of code symbols, we may use the comma to determine the end of one code word and the
beginning of the next. Accordingly, the code words can be decoded as and when
they are received and there is, once again, no time lag in the decoding process.
• In contrast, although Code C is a non-singular and uniquely decodable code, it cannot be
decoded word by word as it is received. For example, if we receive ‘01’, we cannot
decode it as ‘s2’ until we receive the next code symbol. If the next code symbol is
‘0’, the previous word indeed corresponds to s2, while if it is a ‘1’ the word may be
s3, which can be concluded if and only if we receive a ‘0’ in the fourth place. Thus, there
is a definite ‘time lag’ before a word can be decoded. There is no such ‘time waste’ if
we use either Code A or Code B.
• Further, what we are envisaging is the property by which a sequence of code words is
uniquely and instantaneously decodable even if there is no spacing between successive
words. Common English words do not possess this property. For example, the words
“FOUND”, “AT” and “ION”, when transmitted without spacing, yield at the receiver an
altogether new word, “FOUNDATION”! A sufficient condition for such a property is that
“no encoded word can be obtained from another by the addition of more letters”.
This property is called the “prefix property”.
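The prefix property and word-by-word decoding can be illustrated with a short Python sketch. The code tables below are assumptions chosen to match the descriptions of Codes A, B, and C above (fixed length, comma code ending in 0, and a code whose words are prefixes of one another); the lecture's own table may differ. The function names are hypothetical.

```python
def has_prefix_property(words):
    """True when no code word is a prefix of a different code word."""
    words = list(words)
    return not any(a != b and b.startswith(a) for a in words for b in words)


def decode_instantaneously(sequence, code):
    """Emit a source symbol as soon as the buffered symbols form a code word.

    This is only correct for codes that have the prefix property.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, buffer = [], ""
    for code_symbol in sequence:
        buffer += code_symbol
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("sequence ends in the middle of a code word")
    return decoded


# Assumed code tables, consistent with the descriptions in the text.
code_a = {"s1": "00", "s2": "01", "s3": "10", "s4": "11"}
code_b = {"s1": "0", "s2": "10", "s3": "110", "s4": "1110"}
code_c = {"s1": "0", "s2": "01", "s3": "011", "s4": "0111"}

print(has_prefix_property(code_a.values()))  # True
print(has_prefix_property(code_b.values()))  # True
print(has_prefix_property(code_c.values()))  # False: 0 is a prefix of 01

print(decode_instantaneously("0101100", code_b))  # ['s1', 's2', 's3', 's1']
```

Applying `decode_instantaneously` to the assumed Code C would emit s1 at every 0 it meets, which is exactly the failure the time-lag argument describes.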
• Prefix property: “A necessary and sufficient condition
for a code to be ‘instantaneous’ is that no complete code word
be a prefix of some other code word”.
• Optimal codes: An instantaneous code is said to be
optimal if it has “minimum average word length” for a
source with a given probability assignment for the source
symbols. In such codes, source symbols with higher
probabilities of occurrence are made to correspond to
shorter code words. Suppose that a source symbol si has a
probability of occurrence Pi and has a code word of
length li assigned to it, while a source symbol sj with
probability Pj has a code word of length lj.
• If Pi > Pj, then let li < lj. For the two code words
considered, it then follows that the average length L1 is
given by L1 = Pi li + Pj lj.
• A code that satisfies all five properties is called an
“irreducible code”.
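The effect of giving shorter code words to more probable symbols can be seen in a small numerical sketch. The probabilities and word lengths below are assumed purely for illustration and do not come from the lecture; `average_length` is a hypothetical helper that evaluates the sum of Pi li over all source symbols.

```python
def average_length(probabilities, lengths):
    """Average word length: the sum of P_i * l_i over all source symbols."""
    return sum(p * l for p, l in zip(probabilities, lengths))


# Assumed source probabilities (illustrative only), in decreasing order.
probabilities = [0.5, 0.25, 0.125, 0.125]

# Same set of word lengths, assigned in two different ways.
short_words_for_likely_symbols = [1, 2, 3, 3]
short_words_for_unlikely_symbols = [3, 3, 2, 1]

print(average_length(probabilities, short_words_for_likely_symbols))    # 1.75
print(average_length(probabilities, short_words_for_unlikely_symbols))  # 2.625
```

With the same set of lengths, pairing the higher probabilities with the shorter words gives the smaller average, in line with the rule Pi > Pj implies li < lj.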
Kraft’s Inequality
• There is an instantaneous binary code with
codewords having lengths l1, ..., ln if and
only if $\sum_{i=1}^{n} 2^{-l_i} \le 1$.
Extending the Tree to Maximum Depth
• We can extend the tree by filling in the subtree underneath every
actual codeword, down to the depth of the longest codeword.
• Each codeword then corresponds to either a leaf or a subtree.
• Previous tree extended, with each codeword’s leaf or subtree circled.
• Short codewords occupy more of the tree. For a binary code, the
fraction of leaves taken by a codeword of length l is 1/2^l.
[Figure: complete binary code tree of depth 3, rooted at NULL, with nodes 0 and 1; 00, 01, 10
and 11; and leaves 000 through 111.]
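Since a codeword of length l takes the fraction 1/2^l of the leaves, and the codewords of an instantaneous code cannot share leaves, these fractions cannot add up to more than 1. The sketch below checks that sum with exact fractions; `kraft_sum` and `satisfies_kraft` are hypothetical names, and the parameter `r` (code alphabet size, default 2) is an added assumption that generalizes the binary case.

```python
from fractions import Fraction


def kraft_sum(lengths, r=2):
    """Sum of r**(-l) over all codeword lengths, computed exactly."""
    return sum(Fraction(1, r ** l) for l in lengths)


def satisfies_kraft(lengths, r=2):
    """True when an instantaneous code with these lengths can exist."""
    return kraft_sum(lengths, r) <= 1


print(kraft_sum([1, 2, 3, 3]))        # 1: the tree is completely used
print(kraft_sum([2, 2, 2, 2]))        # 1
print(kraft_sum([1, 1, 2]))           # 5/4: more leaves than the tree has
print(satisfies_kraft([1, 2, 3, 3]))  # True
print(satisfies_kraft([1, 1, 2]))     # False
```

The check depends only on the word lengths, not on the particular codewords chosen.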
Constructing Instantaneous Codes
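One common way to construct an instantaneous binary code from a list of word lengths is sketched below. It is not necessarily the construction used in the lecture: it assigns codewords in order of increasing length, taking the next unused node of the code tree each time. The function name `construct_instantaneous_code` is hypothetical.

```python
def construct_instantaneous_code(lengths):
    """Return binary codewords with the given lengths, shortest first.

    Raises ValueError when the lengths violate Kraft's inequality.
    Assumes every length is a positive integer.
    """
    depth = max(lengths)
    # Kraft's inequality in integer form: leaves used <= leaves available.
    if sum(2 ** (depth - l) for l in lengths) > 2 ** depth:
        raise ValueError("lengths violate Kraft's inequality")

    words = []
    value, previous = 0, 0
    for length in sorted(lengths):
        value <<= length - previous  # descend to the depth of this codeword
        words.append(format(value, "0{}b".format(length)))
        value += 1                   # move to the next unused node at this depth
        previous = length
    return words


print(construct_instantaneous_code([3, 1, 3, 2]))  # ['0', '10', '110', '111']
print(construct_instantaneous_code([2, 2, 2, 2]))  # ['00', '01', '10', '11']
```

Because each codeword is taken from the part of the tree not lying under any earlier codeword, no codeword is a prefix of another.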