Information Theory and Coding

The document discusses the source coding theorem and various types of source codes. It explains that a discrete memoryless source outputs symbols from a finite set according to a probability distribution. The entropy of the source represents the average number of bits per symbol needed to efficiently encode the output. Variable length codes are more efficient than fixed length codes as they can assign shorter codewords to more frequent symbols. For a code to be instantaneously decodable, no codeword can be a prefix of another. Optimal codes achieve compression close to the source entropy.


 Source coding theorem

◦ It concerns the efficient representation of symbols
generated by the source
◦ The main motivation is compression of data

◦ A discrete memoryless source (DMS) outputs a symbol
every T seconds
◦ Each symbol is selected from a finite set of symbols
◦ Symbol $x_i$ occurs with probability $p_i$, $i = 1, \dots, L$

 The entropy of this DMS in bits per source
symbol is
$$H(X) = -\sum_{i=1}^{L} p_i \log_2 p_i \le \log_2 L$$
 The equality holds when the symbols are equally
likely.
 Entropy is the average number of bits per
symbol.
 The source rate is H(X)/T bits per second
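As a quick check, the entropy formula can be evaluated directly. The following Python sketch (with a purely illustrative probability assignment, not one from the slides) computes H(X) and confirms that equally likely symbols give the maximum log2(L):

```python
import math

def entropy(probs):
    """H(X) = -sum(p * log2(p)) in bits per source symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative 4-symbol DMS (assumed probabilities).
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))              # 1.75 bits/symbol

# Equally likely symbols maximise the entropy: H(X) = log2(L).
L = len(p)
print(entropy([1 / L] * L))    # 2.0
print(math.log2(L))            # 2.0
```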
 Suppose we need to represent the 26 letters of
the English alphabet using bits
 We know that $2^4 = 16 < 26 \le 32 = 2^5$
 So each of the letters has to be represented
by at least 5 bits

 The number of binary digits (bits) R required
for unique coding:
 When L is a power of 2
$$R = \log_2 L$$
 When L is not a power of 2
$$R = \lfloor \log_2 L \rfloor + 1$$

 Here we can conclude that for L = 26,
$R = \lfloor \log_2 26 \rfloor + 1 = 5$
 A fixed length code treats each letter in the
alphabet as equally important (equally probable)
 So each one requires 5 bits for representation
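A minimal Python sketch of this rule (the helper name fixed_length_bits is mine, not from the slides):

```python
import math

def fixed_length_bits(L):
    """Bits R needed for a fixed length binary code over L symbols."""
    if L & (L - 1) == 0:                  # L is a power of 2
        return int(math.log2(L))         # R = log2(L)
    return math.floor(math.log2(L)) + 1  # R = floor(log2(L)) + 1

print(fixed_length_bits(26))  # 5 -> each English letter needs 5 bits
print(fixed_length_bits(32))  # 5 -> exactly log2(32)
```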

 We know that some letters are rarely used,
e.g. x, q, z
 Some letters are used more frequently, e.g. s, t, a,
e

 However, assigning the same number of bits to all
the letters is not an efficient way of coding
◦ This is known as a Fixed Length Code (FLC)
 For example, ASCII codes
 A better way of coding is:
◦ A more frequent symbol is represented by fewer
bits
◦ A less frequent symbol is represented by more
bits

This is known as a Variable Length Code (VLC)


 Fixed Length codes
 Variable Length codes
 Distinct codes
 Uniquely decodable codes
 Prefix free codes
 Instantaneous codes
 Optimal codes
 Entropy coding
 If the codeword length is the same for every
symbol, the code is a fixed length code
 A fixed length code assigns a fixed number of bits
to the source symbols irrespective of their
statistics of appearance
◦ ASCII codes
 A to Z
 a to z
 0 to 9
 Punctuation marks
 Commas etc. have a 7-bit codeword
 If there are L source alphabet symbols:
 If L is a power of 2, the codeword length is given by
$$R = \log_2 L$$
 If L is not a power of 2, the codeword length is given by
$$R = \lfloor \log_2 L \rfloor + 1$$

 In a variable length code the codeword length is not fixed
◦ More frequent symbols are coded with fewer bits
◦ Less frequent symbols are coded with more bits

◦ It requires fewer bits than a fixed length code
to encode the same information
A code is called distinct if each codeword is
distinguishable from the others

Xj Codeword
X1 00
X2 01
X3 10
X4 11
 The coded source symbols are transmitted as a
stream of bits
 The codes must satisfy certain properties so
that the receiver can identify the
symbols from the stream of bits

 A distinct code is said to be uniquely
decodable if the original source sequence can
be reconstructed perfectly from the received
encoded binary sequence.
Symbol Code 1 Code 2
A 00 0
B 01 1
C 10 00
D 11 01

Code 1 is a fixed length code
Code 2 is a variable length code
The message 'A BAD CAB' can be encoded
using the above 2 codes
In Code 1 format it appears as
00 010011 100001
In Code 2 format it appears as
0 1001 0001

 Here code 1 requires 14 bits to encode the message

 Here code 2 requires only 9 bits

 Although code 2 uses fewer bits, it is not
a valid code, as there is a decoding problem with
this code
 The sequence 0 1001 0001 can be grouped in
different ways, for example
 [0] [1][0][0][1] [0][0][0][1], which means
 A B A A B A A A B
 or A B C B C D
 or D C B C D

 This ambiguity arises because the destination does not
know where one codeword ends and a new codeword
starts.
 In this case only code 1 is uniquely decodable
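The decoding ambiguity can be demonstrated by brute force. This small recursive Python sketch (my own illustration, not from the slides) enumerates every way the received bit stream can be split into codewords of code 2:

```python
def parses(bits, code):
    """Return every way `bits` splits into codewords of `code`."""
    if not bits:
        return [[]]
    result = []
    for symbol, word in code.items():
        if bits.startswith(word):
            for rest in parses(bits[len(word):], code):
                result.append([symbol] + rest)
    return result

code2 = {"A": "0", "B": "1", "C": "00", "D": "01"}
for decoding in parses("010010001", code2):
    print("".join(decoding))
# Prints ABADCAB (the intended message) alongside ABAABAAAB,
# ABCBCD, DCBCD and others, so code 2 is not uniquely decodable.
```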
 A code in which no codeword forms the prefix of
any other codeword is called a prefix free code
 An example prefix free code is
Symbol Codeword
A 0
B 10
C 110
D 1110

 In code 2 above, if a zero (0) is received, the receiver
cannot decide whether it is the entire codeword for 'A'
or a partial codeword for 'C' or 'D'
 Hence no codeword should be a prefix of any
other codeword. This is called a Prefix Free Code
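Checking the prefix free property is straightforward; a minimal Python sketch, using the two codes shown above:

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "1110"]))  # True: the prefix code above
print(is_prefix_free(["0", "1", "00", "01"]))      # False: 0 prefixes 00 and 01
```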
 A uniquely decodable code is said to be an
instantaneous code if the end of any codeword is
recognizable without checking subsequent
code symbols.
 Instantaneous codes are exactly the prefix free codes.
 A code is called an optimal code if it is
instantaneous and has minimum average
length for a given source with a particular
probability assignment for the source
symbols.
 When a variable length code is designed such
that its average codeword length approaches
the entropy of the DMS (discrete memoryless
source), it is known as entropy coding

◦ Shannon-Fano and Huffman coding are examples.
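A compact Huffman sketch in Python (with an assumed dyadic probability assignment, chosen for illustration) shows entropy coding in action; because these probabilities are exact powers of 1/2, the average codeword length equals the entropy:

```python
import heapq
import math

def huffman(probs):
    """Build a binary Huffman code; returns {symbol: codeword}."""
    # Heap entries: (probability, tie-breaker, partial code table).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # assumed DMS
code = huffman(probs)
avg = sum(p * len(code[s]) for s, p in probs.items())
H = -sum(p * math.log2(p) for p in probs.values())
print(code)    # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(avg, H)  # 1.75 1.75 -> average length equals H(X)
```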


Xj   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
X1   00       00       0        0        0        1
X2   01       01       1        10       01       01
X3   00       10       00       110      011      001
X4   11       11       11       111      0111     0001

Code 1 and Code 2 are fixed length codes
Codes 3, 4, 5 and 6 are variable length codes
All codes are distinct except code 1
Codes 2, 4 and 6 are prefix free (instantaneous) codes
Codes 2, 4, 5 and 6 are uniquely decodable codes

Code 5 is not a prefix free code, yet it is uniquely decodable, since the bit 0
indicates the beginning of each codeword
 Let X be a discrete memoryless source having
an alphabet {x1, x2, ..., xm}
 Let the length of the binary codeword
corresponding to xi be ni

 A necessary and sufficient condition for the
existence of an instantaneous binary code is
$$K = \sum_{i=1}^{m} 2^{-n_i} \le 1$$

 This expression is the Kraft inequality

 It indicates the existence of an instantaneously
decodable code with codeword lengths that
satisfy the inequality
Xj   Code 1   Code 2   Code 3   Code 4
X1   00       0        0        0
X2   01       10       11       100
X3   10       11       100      110
X4   11       110      110      111

 For code 1: $K = 2^{-2} + 2^{-2} + 2^{-2} + 2^{-2} = 1$
 Hence this code satisfies the Kraft inequality

 For code 2: $K = 2^{-1} + 2^{-2} + 2^{-2} + 2^{-3} = 9/8 > 1$
 Hence this code does not satisfy the Kraft inequality

 For code 3: $K = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1$
 Hence this code satisfies the Kraft inequality

 For code 4: $K = 2^{-1} + 2^{-3} + 2^{-3} + 2^{-3} = 7/8 \le 1$
 Hence this code satisfies the Kraft inequality
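The four checks can be reproduced with a one-line Kraft sum; a small Python sketch using the codeword table above:

```python
def kraft_sum(codewords):
    """K = sum of 2^(-n_i) over the codeword lengths n_i."""
    return sum(2.0 ** -len(w) for w in codewords)

codes = {
    "code 1": ["00", "01", "10", "11"],
    "code 2": ["0", "10", "11", "110"],
    "code 3": ["0", "11", "100", "110"],
    "code 4": ["0", "100", "110", "111"],
}
for name, words in codes.items():
    K = kraft_sum(words)
    verdict = "satisfies" if K <= 1 else "does not satisfy"
    print(f"{name}: K = {K}  ->  {verdict} the Kraft inequality")
# code 1: K = 1.0, code 2: K = 1.125, code 3: K = 1.0, code 4: K = 0.875
```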
