Information Theory
Presented By:
Er. Amit Mahajan
What is Information Theory?
The entropy H of a discrete random variable X with possible values {x_1, …, x_n} is

H(X) = E[I(X)]

where E is the expected value function and I(X) is the information content or self-information of X. I(X) is itself a random variable. If p denotes the probability mass function of X, then the entropy can explicitly be written as

H(X) = \sum_{i=1}^{n} p(x_i)\, I(x_i) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i)
Entropy for discrete ensembles
For a random variable X with n outcomes {x_1, …, x_n}, the Shannon entropy, a measure of uncertainty denoted by H(X), is defined as

H(X) = -\sum_{i=1}^{n} p(x_i) \log_b p(x_i)

where p(x_i) is the probability of outcome x_i.
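As a quick illustration (the distributions below are arbitrary examples, not from the slides), the entropy of a discrete ensemble can be computed directly from its probability mass function:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum p_i * log2(p_i), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example: a biased four-outcome source (illustrative values).
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits -- the uniform distribution maximizes entropy
```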
Proof
Let s_i denote the word length of each possible x_i (i = 1, …, n), and let S = \sum_{i=1}^{n} p_i s_i be the expected word length of a uniquely decodable binary code. Define q_i = 2^{-s_i} / C, where C is chosen so that \sum_{i=1}^{n} q_i = 1. Then

H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i
\le -\sum_{i=1}^{n} p_i \log_2 q_i
= \sum_{i=1}^{n} p_i s_i + \log_2 C
\le \sum_{i=1}^{n} p_i s_i = S

where the second line follows from Gibbs' inequality and the last line follows from Kraft's inequality: C = \sum_{i=1}^{n} 2^{-s_i} \le 1, so \log_2 C \le 0. Hence H(X) \le S.

For the upper bound, choose s_i = \lceil \log_2(1/p_i) \rceil, so that

\log_2(1/p_i) \le s_i < \log_2(1/p_i) + 1

and so

2^{-s_i} \le p_i

and

\sum_{i=1}^{n} 2^{-s_i} \le \sum_{i=1}^{n} p_i = 1

and so by Kraft's inequality there exists a prefix-free code having those word lengths. Thus the minimal S satisfies

H(X) \le S < \sum_{i=1}^{n} p_i \left( \log_2(1/p_i) + 1 \right) = H(X) + 1
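The construction used in the proof can be checked numerically: choosing word lengths s_i = ⌈log_2(1/p_i)⌉ satisfies Kraft's inequality, and the resulting expected length lies between H(X) and H(X) + 1. A minimal sketch, reusing the seven-symbol distribution from the Shannon-Fano example that follows (variable names are illustrative):

```python
import math

p = [0.4, 0.2, 0.12, 0.08, 0.08, 0.08, 0.04]    # source probabilities (the M = 2 example below)
s = [math.ceil(math.log2(1 / pi)) for pi in p]  # word lengths s_i = ceil(log2(1/p_i))

H = -sum(pi * math.log2(pi) for pi in p)        # entropy H(X)
S = sum(pi * si for pi, si in zip(p, s))        # expected code-word length
kraft = sum(2 ** -si for si in s)               # Kraft sum, must be <= 1

print(f"Kraft sum = {kraft:.3f} (<= 1)")                    # 0.656, so a prefix-free code exists
print(f"H = {H:.2f} <= S = {S:.2f} < H + 1 = {H + 1:.2f}")  # 2.42 <= 3.04 < 3.42
```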
Shannon-Fano Algorithm
List the source symbols in order of decreasing probability.
Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set.
Continue this process, each time partitioning the sets into parts of as nearly equal probability as possible, until further partitioning is not possible (a worked example and a code sketch follow below).
For M = 2 (a binary code alphabet):

Message   Probability   Encoded Message   Length
x1        0.4           00                2
x2        0.2           01                2
x3        0.12          100               3
x4        0.08          101               3
x5        0.08          110               3
x6        0.08          1110              4
x7        0.04          1111              4
Average code-word length = 2.52 bits per symbol; Efficiency = H(X) / (average code-word length) = 2.42 / 2.52 ≈ 96%
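A minimal recursive sketch of the Shannon-Fano procedure above (function and variable names are illustrative, not from the slides). It works on integer weights, probabilities scaled by 100, so the split comparisons are exact, and on a tie it keeps the larger upper set, which reproduces the table:

```python
import math

def shannon_fano(symbols):
    """symbols: list of (symbol, weight) pairs in order of decreasing weight.
    Integer weights (e.g. probabilities x 100) keep the split comparisons exact.
    Returns a dict mapping each symbol to its binary code string."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(w for _, w in symbols)
    # Find the split that makes the two parts as close to equiprobable as possible.
    best_k, best_diff, running = 1, None, 0
    for k in range(1, len(symbols)):
        running += symbols[k - 1][1]
        diff = abs(2 * running - total)          # |weight(upper) - weight(lower)|
        if best_diff is None or diff <= best_diff:
            best_k, best_diff = k, diff          # on a tie, keep the larger upper set
    codes = {}
    for sym, code in shannon_fano(symbols[:best_k]).items():
        codes[sym] = "0" + code                  # upper set gets 0
    for sym, code in shannon_fano(symbols[best_k:]).items():
        codes[sym] = "1" + code                  # lower set gets 1
    return codes

source = [("x1", 40), ("x2", 20), ("x3", 12), ("x4", 8), ("x5", 8), ("x6", 8), ("x7", 4)]
codes = shannon_fano(source)
print(codes)  # {'x1': '00', 'x2': '01', 'x3': '100', 'x4': '101', 'x5': '110', 'x6': '1110', 'x7': '1111'}

probs = {sym: w / 100 for sym, w in source}
avg_len = sum(probs[sym] * len(codes[sym]) for sym in probs)   # 2.52 bits per symbol
H = -sum(p * math.log2(p) for p in probs.values())             # about 2.42 bits
print(f"efficiency = {H / avg_len:.1%}")                       # about 96%
```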
Huffman Coding Algorithm
Encoding algorithm
Order the symbols by decreasing probabilities
Starting from the bottom, assign 0 to the least probable symbol and 1 to the next least probable symbol.
Combine the two least probable symbols into one composite symbol.
Reorder the list with the composite symbol included.
Repeat Step 2 until only two symbols remain in the list.
Huffman tree
Nodes: symbols or composite symbols.
Branches: from each node, 0 defines one branch while 1 defines the other.
Leaves: the original source symbols.
Decoding algorithm
Start at the root and follow the branches based on the bits received.
When a leaf is reached, a symbol is decoded.
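A compact sketch of this procedure in code (the heap-based merging and the helper names are implementation choices, not from the slides): it repeatedly combines the two least probable nodes, then reads each code word off the resulting tree.

```python
import heapq

def huffman_codes(probabilities):
    """probabilities: dict mapping symbol -> probability.
    Returns a dict mapping each symbol to its binary code string."""
    # Heap entries are (probability, tie_breaker, subtree); a subtree is either an
    # original symbol or a (branch_0, branch_1) pair representing a composite symbol.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, low = heapq.heappop(heap)    # least probable node -> branch bit 0
        p1, _, high = heapq.heappop(heap)   # next least probable node -> branch bit 1
        heapq.heappush(heap, (p0 + p1, counter, (low, high)))
        counter += 1

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # composite symbol: descend both branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: an original source symbol
            codes[node] = prefix or "0"     # a one-symbol source still gets one bit
    walk(heap[0][2], "")
    return codes

# The five-symbol source used in the example that follows:
print(huffman_codes({"A": 0.35, "B": 0.17, "C": 0.17, "D": 0.16, "E": 0.15}))
# {'A': '0', 'E': '100', 'D': '101', 'B': '110', 'C': '111'}
```

Because B and C are tied at 0.17, this sketch happens to swap their code words relative to the worked example below; the code lengths, and hence the average length, are identical.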
Huffman Coding Example
Source symbols and probabilities: A = 0.35, B = 0.17, C = 0.17, D = 0.16, E = 0.15.

Combining the two least probable symbols at each step (the less probable of the pair takes branch bit 0, the other bit 1):

Step 1: A 0.35, B 0.17, C 0.17, D 0.16, E 0.15  →  combine E (0) and D (1) into DE (0.31)
Step 2: A 0.35, DE 0.31, B 0.17, C 0.17         →  combine C (0) and B (1) into BC (0.34)
Step 3: A 0.35, BC 0.34, DE 0.31                →  combine DE (0) and BC (1) into BCDE (0.65)
Final list: BCDE 0.65 (1), A 0.35 (0)

Huffman tree: the root branches into BCDE (1) and A (0); BCDE branches into BC (1) and DE (0); BC branches into B (1) and C (0); DE branches into D (1) and E (0).

Huffman codes:

Symbol   Prob.   Code
A        0.35    0
B        0.17    111
C        0.17    110
D        0.16    101
E        0.15    100
Average code-word length = 0.35 x 1 + 0.65 x 3 = 2.30 bits per symbol
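As a quick check (the code-word lengths are taken from the example above), the 2.30-bit average lies between the source entropy and the entropy plus one bit, as the noiseless coding theorem requires:

```python
import math

probs = {"A": 0.35, "B": 0.17, "C": 0.17, "D": 0.16, "E": 0.15}
lengths = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 3}      # code-word lengths from the example

avg_len = sum(probs[s] * lengths[s] for s in probs)     # 2.30 bits per symbol
H = -sum(p * math.log2(p) for p in probs.values())      # about 2.23 bits
print(f"H = {H:.2f} <= average length = {avg_len:.2f} < H + 1 = {H + 1:.2f}")
```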
References
http://www.quantiki.org/wiki/index.php/Shannon%27s_noiseless_coding_theorem
Cover, Thomas M. (2006). "Chapter 5: Data Compression".
Elements of Information Theory. John Wiley & Sons.
http://moser.cm.nctu.edu.tw/nctu/it/index_0809.html
http://www.maths.abdn.ac.uk/~igc/tch/mx4002/notes/node59.html
http://en.wikipedia.org/wiki/Mutual_information