
MODULE - I

Entropy, Sources and Source Coding
INFORMATION
 Information is any message that brings a specification and involves a certain degree of uncertainty.
 It involves two correspondents: one generating it (the information source S) and another receiving it (the destination D, or the user U). It can be transmitted over a distance or stored (memorized) for later reading.
 The physical medium (including the associated equipment) used for remote transmission of the information from S to D is called the transmission channel C.
 In storage systems the channel is replaced by the storage medium, e.g. DVDs, printouts, etc.
INFORMATION
 An information source produces an output that is random in nature.
 If the source output had no randomness (i.e., the output were known exactly), why should we transmit it at all?
DISCRETE SOURCE
 A discrete information source generates 'm' distinct symbols (messages).
 A discrete source is called memoryless (a discrete memoryless source, DMS) if the emission of a symbol does not depend on the previously transmitted symbols.
AMOUNT OF INFORMATION
 "The sun rises in the east."
Known to all; a certain event. No new information.
 "I will get at least one mail today."
Very likely to happen, but it carries some amount of uncertainty.
 "It will snow in Munnar next January."
No idea; highly uncertain.
 "It will snow in Ernakulam tomorrow."
Will not happen; it is certain not to occur, so again no uncertainty.
SO INFERENCE
 A high probability event carries less information than a low probability event.
 A less probable event has a higher degree of uncertainty and therefore contains more information (and vice versa).
 So we quantify information based on probability.
INFORMATION
 Consider a discrete random variable X with possible outcomes xi, i = 1, 2, …, n. The self-information of the event X = xi is
I(xi) = log(1/P(xi)) = -log P(xi)
 Units: bits (log base 2), nats (base e), Hartleys (base 10).
WHY LOG OF PROBABILITY?
 It is a monotonically decreasing function of probability.
 If a particular event has a 100% probability of occurring, then its self-information is log(1) = 0: its occurrence is "perfectly non-surprising" and yields no information.
 If a particular event has a 0% probability of occurring, then its self-information is -log(0) = ∞: its occurrence is "infinitely surprising."
TO CALCULATE LOG2 FROM ANOTHER LOG BASE
 log2(x) = log10(x) / log10(2) = ln(x) / ln(2)
ENTROPY - AVERAGE INFORMATION
 H(X) = -Σi P(xi) log P(xi)
 where X represents the alphabet of possible output letters from a source, and
 H(X) represents the average information per source letter.
 Unit: bits, nats, or Hartleys, based on the log base.
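
A minimal Python sketch of this definition (function and variable names are illustrative, not from the slides):

```python
import math

def entropy(probs, base=2):
    """Average information H(X) = -sum_i p_i * log(p_i); zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
```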
PROBLEM 1
 Let us define a random variable X as follows:
X = red with probability 1/2
    green with probability 1/4
    blue with probability 1/8
    yellow with probability 1/8
 Find the entropy of X.
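
A worked check (the slide does not state the answer):
H(X) = ½·log2(2) + ¼·log2(4) + ⅛·log2(8) + ⅛·log2(8) = 0.5 + 0.5 + 0.375 + 0.375 = 1.75 bits.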
PROBLEM 2
 Suppose we have a horse race with eight horses taking part. Assume that the probabilities of winning for the eight horses are (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64). Calculate the entropy of the horse race.
 Ans = 2 bits
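
Worked check of the stated answer:
H = ½(1) + ¼(2) + ⅛(3) + 1/16(4) + 4 × 1/64 × 6 = 0.5 + 0.5 + 0.375 + 0.25 + 0.375 = 2 bits.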
PROPERTIES OF ENTROPY
 H(X) ≥ 0.
 The entropy is a continuous function with respect to each variable pi, because it is a sum of continuous functions.
 Additivity: self-information is additive, and so is H(X), which represents the mean value of the self-information.
 Entropy is maximum when the symbols have equal probabilities; this maximum value is also called the decision quantity D(X): Hmax(X) = D(X) = log2 m.
 Symmetry: H(X) is unchanged if the events xi are reordered.
 The maximum value of the entropy is obtained for equiprobable messages.
 Change of base: Hb(X) = logb(a) · Ha(X).
MINIMUM AVERAGE NUMBER OF BITS: MESSAGES WITH EQUAL PROBABILITY
 Consider a discrete binary source that emits 0 or 1 in a sequence of statistically independent symbols.
 The output is "0" with probability p, or "1" with probability 1 - p.
 The entropy of this binary source is
H(X) = -Σi pi log pi
     = -p log2(p) - (1 - p) log2(1 - p)
     = H(p)   (the binary entropy function)
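
A short Python sketch of the binary entropy function H(p) (names are illustrative):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 bit (the maximum)
print(binary_entropy(0.1))  # about 0.47 bits
```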
MESSAGES WITH EQUAL PROBABILITY FOR A BINARY SOURCE
 If p = 0.5 in the above case, then 1 - p = 0.5 and
 H(X) = 1 bit.
 Entropy is a concave function of p.
 When p = 0 or 1 there is no uncertainty; the output is not really a random variable.
RATE OF INFORMATION
 The information rate (or rate of information) R is expressed in average bits of information per second.
 Information rate R = r H(X)
 H(X) is the average information per symbol.
 r is the rate at which symbols are generated (symbols per second).
PROBLEM 3
 A discrete source emits one of 5 symbols once every 1 ms with the probabilities 1/2, 1/4, 1/8, 1/16, 1/16. Determine the source entropy and the information rate.
 Ans: H(X) = 1.875 bits/symbol, R = 1875 bits/s
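
Worked check: H(X) = ½(1) + ¼(2) + ⅛(3) + 1/16(4) + 1/16(4) = 1.875 bits/symbol; with one symbol every 1 ms, r = 1000 symbols/s, so R = r·H(X) = 1875 bits/s.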
PROBLEM 4
 An analog signal band-limited to 10 kHz is quantized into 8 levels of a PCM system with probabilities 1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20, 1/20 respectively. Find the entropy and the rate of information.
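
A Python sketch of the computation, assuming the usual Nyquist sampling rate r = 2 × 10 kHz = 20,000 samples/s (this sampling-rate assumption is not stated on the slide):

```python
import math

probs = [1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20, 1/20]
H = -sum(p * math.log2(p) for p in probs)   # entropy, bits per quantized sample
r = 2 * 10_000                              # assumed Nyquist rate, samples per second
print(H)      # about 2.74 bits/sample
print(r * H)  # information rate, about 54.8 kbit/s
```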
REMEMBER
 More uncertainty (more information per symbol) means higher entropy.
SOURCE ENTROPY
 The minimum average number of binary digits
needed to specify a source output (message) uniquely
is called “SOURCE ENTROPY”
JOINT ENTROPY
 Joint entropy measures how much entropy is contained in a joint system of two random variables.
 If the random variables are X and Y, the joint entropy is written H(X,Y).
 It is measured in bits, nats, or Hartleys depending on the base of the logarithm.
JOINT ENTROPY
 The joint entropy of a pair of discrete random variables (X, Y) with a joint distribution p(x, y) is given by
H(X,Y) = -Σx Σy p(x,y) log2 p(x,y)
PROPERTIES OF JOINT ENTROPY
 Non-negative: H(X,Y) ≥ 0.
 Greater than or equal to each individual entropy:
H(X,Y) ≥ max{H(X), H(Y)}
 Less than or equal to the sum of the individual entropies:
H(X,Y) ≤ H(X) + H(Y)
REMEMBER CONDITIONAL PROBABILITY
 Conditional probability is the probability of one event occurring given some relationship to one or more other events.
Example:
 Event A is that it is raining outside, and it has a 0.3 (30%) chance of raining today.
 Event B is that you will need to go outside, and that has a probability of 0.5 (50%).
 A conditional probability looks at these two events in relation to one another, e.g. the probability that you will need to go outside given that it is raining.
CONDITIONAL ENTROPY
 Conditional entropy is the amount of information in one random variable given that we already know the other.
 The conditional entropy quantifies the amount of information needed to describe the outcome of a random variable Y given that the other random variable X is known.
 Represented as H(Y|X).
 It is how much entropy a random variable Y has remaining if we have already learned the value of a second random variable X.
CONDITIONAL ENTROPY
 The average conditional self-information is called the conditional entropy.
 If X and Y are discrete random variables and p(x, y) and p(x|y) are the values of their joint and conditional probability distributions, then
H(X|Y) = -Σx Σy p(x,y) log2 p(x|y)
 H(X|Y) is the information (or uncertainty) in X after Y is observed.
 It is the entropy of X conditional on Y.
OBSERVATIONS
 (i) I(X; Y) ≥ 0, so H(X) ≥ H(X|Y).
 (ii) The case I(X; Y) = 0, i.e. H(X) = H(X|Y), is possible if and only if X and Y are statistically independent.
 (iii) Since H(X) ≥ H(X|Y), the observation of Y does not increase the entropy (uncertainty).
FOR A CHANNEL
 The amount of uncertainty remaining about the channel input after observing the channel output is the conditional entropy H(X|Y), also called the equivocation.
RELATIONSHIP BETWEEN CONDITIONAL ENTROPY AND JOINT ENTROPY: CHAIN RULE
 H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)


MUTUAL INFORMATION
 The mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables.
 Unit: bits, nats, or Hartleys (based on the log base).
 It measures how much one random variable tells us about another.
 It can be thought of as the reduction in uncertainty about one random variable given knowledge of another.
 High mutual information: a large reduction in uncertainty.
 Low mutual information: a small reduction.
 Zero mutual information between two random variables means the variables are independent.
MUTUAL INFORMATION AND ENTROPY: RELATIONSHIP BETWEEN H AND I(X;Y)
 I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)
 I(X;X) = H(X)
PROPERTIES OF MUTUAL INFORMATION
 Mutual information of a channel is symmetric: I(X;Y) = I(Y;X).
 Mutual information is non-negative: I(X;Y) ≥ 0.
 Mutual information can be expressed in terms of the entropy of the channel output: I(X;Y) = H(Y) - H(Y|X).
 Mutual information is additive for independent variables:
I(X,W;Y,Z) = I(X;Y) + I(W;Z)
 If X and Y are independent, I(X;Y) = 0.
DISCRETE MEMORYLESS SOURCE
 A source from which data is emitted at successive intervals,
 with each value independent of the previous values.
 The source is discrete because it is considered not over a continuous time interval but at discrete time intervals.
 The source is memoryless because it is fresh at each instant of time, without considering the previous values.
DISCRETE MEMORYLESS CHANNEL
 A statistical model.
 X is the input, Y is the output.
 The output is a version of the input corrupted by noise.
 X and Y are random variables.
DISCRETE MEMORYLESS CHANNEL
 (Figure: inputs x1 … xm connected to outputs y1 … yn by transition probabilities p(yj|xi).)
CHANNEL MATRIX
 A channel is completely specified by the complete set of transition probabilities.
 From the figure on the previous slide, the channel is often specified by the matrix of transition probabilities P(Y|X), whose (i, j) entry is p(yj | xi).
CHANNEL MATRIX
 The matrix [P(Y|X)] is called the channel matrix. Since each input to the channel results in some output, each row of the channel matrix must sum to unity:
Σj p(yj | xi) = 1 for every i.
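
As an illustration (not from the slides): for a binary symmetric channel with crossover probability p, the channel matrix is
P(Y|X) = [ 1-p   p ]
         [  p   1-p ]
and each row sums to (1-p) + p = 1.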
PROBLEM
Find the joint entropy, the marginal entropies H(X), H(Y), the conditional entropies H(X|Y), H(Y|X), and the mutual information I(X;Y).
PROBLEM
 Given the joint probabilities of two random variables X and Y as

  Y\X    X=X1   X=X2   X=X3
  Y=Y1   0.2    0.1    0.1
  Y=Y2   0.3    0.2    0.1

 Find the joint entropy, conditional entropies, marginal entropies and mutual information.
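
A Python sketch showing how these quantities can be computed from the table above (variable names are illustrative):

```python
import math

# Joint probabilities p(y, x) from the table: rows Y1, Y2; columns X1, X2, X3.
p_yx = [[0.2, 0.1, 0.1],
        [0.3, 0.2, 0.1]]

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_y = [sum(row) for row in p_yx]                          # marginal of Y: [0.4, 0.6]
p_x = [sum(row[j] for row in p_yx) for j in range(3)]     # marginal of X: [0.5, 0.3, 0.2]

H_XY = H([p for row in p_yx for p in row])                # joint entropy H(X,Y)
H_X, H_Y = H(p_x), H(p_y)
H_X_given_Y = H_XY - H_Y                                  # chain rule: H(X|Y) = H(X,Y) - H(Y)
H_Y_given_X = H_XY - H_X
I_XY = H_X + H_Y - H_XY                                   # mutual information I(X;Y)

print(H_XY, H_X, H_Y, H_X_given_Y, H_Y_given_X, I_XY)
```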
PROBLEM (HOME WORK)
SOURCE CODING
SOURCE CODING
 The concept behind data compression.
 A mapping from (a sequence of) symbols from an information source to a sequence of alphabet symbols (usually bits).
 The source symbols can be exactly recovered from the binary bits (lossless source coding) or recovered within some distortion (lossy source coding).
 The aim of source coding is to represent information as accurately as possible using as few bits as possible.
 To do this, redundancy from the source needs to be removed.
SOURCE CODING
 Variable-length codes are used to reduce the number of code symbols in a message to the minimum needed to represent the information in the message.
 An example of a source code is someone using HTML code to create a screen.
 In telegraphy we use Morse code, in which the letters are denoted by marks and spaces. The letter E, which is used most often, is denoted by ".", whereas the letter Q, which is rarely used, is denoted by "--.-".
 Source coding is used to remove redundancy.
SOURCE CODING
 In a file: source coding gives data reduction.
 In transmission: source coding reduces the number of transmitted bits.
 Source coding increases the entropy per transmitted symbol (by removing redundancy).
 It gives an effective representation of the data.
A DEMO
LOSSLESS COMPRESSION OF DATA
 Assumptions made:
 1. The channel is perfectly noiseless, i.e. the receiver sees exactly the encoder's output.
 2. We always require the output of the decoder to exactly match the original sequence X.
 3. X is generated according to a fixed probabilistic model p(X).
 So encoding is a mapping of each source symbol into a finite sequence of code symbols called a codeword.
LENGTH OF A CODE
 The average codeword length L of the source encoder is
L = Σk pk lk
 where pk is the probability of the k-th symbol and lk is the length of its codeword.
 Efficiency of a code:
η = H(s) / L
 Usually expressed in % (multiplied by 100).
 Redundancy: R = 1 - η
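
A small Python sketch of these formulas (names and the example code are illustrative):

```python
import math

def code_stats(probs, lengths):
    """Average codeword length L, efficiency eta = H(s)/L, and redundancy R = 1 - eta."""
    H_s = -sum(p * math.log2(p) for p in probs if p > 0)   # source entropy H(s)
    L = sum(p * l for p, l in zip(probs, lengths))          # average codeword length
    eta = H_s / L
    return L, eta, 1 - eta

# Example: code {0, 10, 110, 111} for symbols with the probabilities below.
print(code_stats([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))  # (1.75, 1.0, 0.0)
```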
PROBLEMS
1. A scanner converts a black and white document, line by line, into binary data for transmission. The scanner produces source data comprising symbols representing runs of up to six similar image pixel elements, with the probabilities shown below:

  No. of consecutive pixels:   1     2     3     4     5     6
  Probability of occurrence:   0.2   0.4   0.15  0.1   0.06  0.09
PROBLEM CONTINUED
 Determine the average length of a run (in pixels) and the corresponding effective information rate for this source when the scanner is traversing 1000 pixel/s.

Answers
 H(s) = 2.29 bits/symbol
 Average run length = 2.69 pixels/symbol
 At a 1000 pixel/s scan rate we generate 1000/2.69 ≈ 372 symbols/s. Thus the source information rate is 2.29 × 372 ≈ 852 bit/s.
SOURCE CODE
 Can be of fixed length or variable length.
 Two fundamental requirements of a source code:
1. The codewords produced by the encoder should be in binary form.
2. The source code should be uniquely decodable, so that the original source sequence can be reconstructed perfectly from the encoded binary sequence.
 Source coding may also reduce fluctuations in the information rate from the source and avoid symbol 'surges', which could overload the channel when the message sequence contains many high probability symbols.
VARIABLE LENGTH SOURCE CODING
 When the source symbols are not equally probable, a more efficient encoding method is to use variable-length codewords.
 E.g. Morse code.
 Probabilities may be used to assign codes, i.e. a more frequently occurring symbol uses fewer bits.
 This is entropy coding.
PREFIX CODE
 To be a prefix code, the entire set of possible encoded values ("codewords") must not contain any value that starts with any other value in the set.
 A prefix (prefix-free) code is an instantaneous code.
 E.g. 1, 22, 333 is a prefix code.
 But 1, 12, 33 is not a prefix code, because one of the codewords (12) starts with another codeword (1).
 A prefix code is a code in which no codeword is a prefix of another codeword; no codeword can be derived from another by appending more bits to a shorter codeword.
INSTANTANEOUS CODE
 Instantaneous/prefix-free codes: a code is called instantaneous if each symbol can be decoded as soon as the corresponding codeword is completed.
 It is not necessary to see bits of later symbols in order to decode the current symbol.
 The code 0, 01, 011, 111 for the symbols s1, s2, s3, s4, respectively, is not instantaneous (although it is uniquely decodable).
PREFIX CODE - BINARY TREE REPRESENTATION
 A prefix code is most easily represented by a binary tree in which the external nodes are labeled with the single characters that are combined to form the message.
 The encoding for a character is determined by following the path down from the root of the tree to the external node that holds that character:
 a '0' bit identifies a left branch in the path, and a '1' bit identifies a right branch.
PREFIX CODE - BINARY TREE REPRESENTATION EXAMPLE

  Character   Encoding
  a           0
  b           111
  c           1011
  d           1010
  r           110
  !           100

 Each character is encoded with a different number of bits.
 (In the code-tree figure, a square box marks an external node and a black circle marks an internal node.)
PREFIX CODE - DECODING
 Start at the root of the tree.
 Repeat until you reach an external (leaf) node:
o Read one message bit.
o Take the left branch in the tree if the bit is 0; take the right branch if it is 1.
 Print the character in that external node.
 The whole process is repeated, starting over at the root, until all of the bits in the compressed message are exhausted.
EXAMPLE
Decode the following codeword sequence using the given code tree:

0111110010110101001111100100

Splitting at codeword boundaries:

|0|111|110|0|1011|0|1010|0|111|110|0|100
 a  b   r  a   c  a   d  a  b   r  a  !
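
A minimal Python sketch of this decoding; it matches the buffered bits against the codeword table, which is equivalent to walking the code tree because the code is prefix-free (names are illustrative):

```python
code = {'a': '0', 'b': '111', 'c': '1011', 'd': '1010', 'r': '110', '!': '100'}
decode_table = {bits: ch for ch, bits in code.items()}

def decode(bitstring):
    """Collect bits until the buffer equals a codeword, emit its symbol, and start again."""
    out, buf = [], ''
    for bit in bitstring:
        buf += bit
        if buf in decode_table:          # a complete codeword has been read
            out.append(decode_table[buf])
            buf = ''
    return ''.join(out)

print(decode('0111110010110101001111100100'))  # abracadabra!
```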
PREFIX CODE - PROPERTY
 To reduce the number of bits in a message, the encoding scheme always uses short encodings for frequently used characters and long encodings for infrequent ones.
 A second fundamental property of prefix codes is that messages can be formed by simply stringing together the code bits from left to right.
 Fixed-length codes are prefix codes.
UNIQUELY DECODABLE CODE
 A code is uniquely decodable if every encoded bit sequence corresponds to at most one source sequence, i.e. the mapping from source sequences to code sequences is one-to-one.
 A code is a prefix code (or prefix-free code) if it has the prefix property, which requires that no codeword is a proper prefix of any other codeword; every prefix code is uniquely decodable.
TRY YOURSELF
 The four symbols A, B, C, D are encoded using the following sets of codewords. In each case state whether the code is (i) uniquely decodable (ii) instantaneous.
a) {1, 01, 000, 001}
b) {0, 10, 000, 100}
c) {0, 01, 011, 0111}
d) {01, 01, 110, 100}
e) {01, 10, 0010, 0111}
ANSWERS
 Note: every instantaneous (prefix-free) code is uniquely decodable.
 a) {1, 01, 000, 001} is an instantaneous code (and therefore uniquely decodable).
 b) {0, 10, 000, 100}: no; 0 is a prefix of 000, and the sequence 000 can be read as AAA or as C, so it is neither instantaneous nor uniquely decodable.
 c) {0, 01, 011, 0111} is uniquely decodable but not an instantaneous code.
 d) {01, 01, 110, 100}: no; two symbols share the codeword 01, so the code is not even non-singular.
 e) {01, 10, 0010, 0111} is not an instantaneous code, since 01 is a prefix of 0111 (it is, however, uniquely decodable).
VENN DIAGRAM REPRESENTATION
 (Figure: two overlapping circles representing H(X) and H(Y); the overlap is I(X;Y), the parts outside the overlap are H(X|Y) and H(Y|X), and the whole union is H(X,Y).)
HUFFMAN CODING
 Invented by David A. Huffman as part of a class assignment.
 Idea: assign variable-length codes to input characters, with the lengths of the assigned codes based on the frequencies of the corresponding characters.
 It is a prefix code.
 It is a greedy algorithm: an approach for solving a problem by selecting the best option available at the moment.
HUFFMAN CODING
 A Huffman code is also a compact code and satisfies the following properties:
 1. It has the shortest mean length among binary instantaneous codes.
 2. Its tree is optimal.
 3. Its code tree is compact.
STEPS IN HUFFMAN CODING
 (i) Arrange the source symbols in decreasing order of their probabilities.
 (ii) Take the lowermost two symbols and tie them together, as shown in the figure.
STEPS IN HUFFMAN CODING
 Add the probabilities of the two symbols and write the sum on the combined node.
 (iii) Treat this sum of probabilities as a new probability associated with a new symbol. Again pick the two smallest probabilities and tie them together to form a new probability. Each time we combine two symbols we reduce the total number of symbols by one. Whenever we tie together two probabilities, label one branch '0' and the other branch '1'.
STEPS IN HUFFMAN CODING
 (iv) Continue the procedure until only one probability is left (and it should be 1). This completes the construction of the Huffman tree.
 (v) To find the codeword for a symbol, trace the route back from the root node to that symbol. While tracing back the route, read out the labels on the branches. This is the codeword for the symbol.
EXAMPLE
 Given the probabilities for the message symbols a1 to a6 as 0.1, 0.4, 0.06, 0.1, 0.04, 0.3, find the binary Huffman codes.
 Step 1. Arrange in decreasing order of probabilities.
STEP 1
 a2 (0.4), a6 (0.3), a1 (0.1), a4 (0.1), a3 (0.06), a5 (0.04)
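
A Python sketch of the Huffman procedure using a heap rather than the hand-drawn tree; tie-breaking between equal probabilities may give codewords different from the slides, but with the same average length (about 2.2 bits/symbol for this example):

```python
import heapq

def huffman(probs):
    """Build a binary prefix code by repeatedly merging the two least probable nodes."""
    heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)                      # the two smallest probabilities
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + b for s, b in c1.items()}         # label one branch '0' ...
        merged.update({s: '1' + b for s, b in c2.items()})   # ... and the other '1'
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {'a1': 0.1, 'a2': 0.4, 'a3': 0.06, 'a4': 0.1, 'a5': 0.04, 'a6': 0.3}
codes = huffman(probs)
print(codes)
print(sum(probs[s] * len(codes[s]) for s in probs))   # average codeword length
```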
NON-SINGULAR CODES
 A code is non-singular if each source symbol is mapped to a different non-empty bit string,
 i.e. the mapping from source symbols to bit strings is injective (one-to-one).
SHANNON’S SOURCE CODING THEOREM
 Shannon’s first theorem, or main theorem.
 Statement: Given a discrete memoryless source with entropy H(s), the average codeword length L of any distortionless source encoding scheme is bounded as
L ≥ H(s)
 It is the mathematical tool for assessing data compression.
SOURCE CODING THEOREM
 Also called the noiseless coding theorem; it establishes the condition for error-free coding to be possible.
SHANNON’S SOURCE CODING THEOREM
KRAFT INEQUALITY
 Limits the lengths of codewords in prefix codes.
 Published by Kraft in 1949.
 If one takes the negative exponential of the length of each valid codeword, the resulting set of values must look like a probability mass function:
 it must have total measure less than or equal to one.
 The inequality can tell us whether the lengths of a prefix code can be shortened, but it cannot make any change to the lengths.
KRAFT INEQUALITY
 Relates how to construct a symbol code having the desired properties with the minimum expected length of the code.
 We shall assume that the encoding is not necessarily binary but uses a general D-ary code, i.e. it uses the D symbols {0, 1, ..., D-1} instead of the 2 symbols {0, 1}.
KRAFT (McMILLAN) INEQUALITY
SOME PROPERTIES
 If Kraft's inequality holds with strict inequality,
the code has some redundancy.
 If Kraft's inequality holds with equality, the code
in question is a complete code.
 If Kraft's inequality does not hold, the code is
not uniquely decodable.
 For every uniquely decodable code, there exists a
prefix code with the same length distribution
STATEMENT
 Kraft's inequality states that, given a list of positive integers (n1, n2, …, nr), there exists a prefix code with a set of codewords (σ1, σ2, …, σr) where the length of each codeword is |σi| = ni, if and only if

Σi D^(-ni) ≤ 1

 D = 2 for binary codes.
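
A small Python sketch for checking the inequality on a set of codeword lengths (illustrative):

```python
def kraft_sum(lengths, D=2):
    """Sum of D**(-n_i); a D-ary prefix code with these lengths exists iff the sum is <= 1."""
    return sum(D ** (-n) for n in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> a complete prefix code exists (e.g. 0, 10, 110, 111)
print(kraft_sum([1, 1, 2]))     # 1.25 -> no uniquely decodable code has these lengths
```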


PROOF
Refer notes.
PROOF (ALTERNATE METHOD)
 The Kraft inequality can tell us that a given code is not a prefix code, but it cannot by itself be used to decide whether a given code is a prefix code.
VARIABLE LENGTH CODING
 Consider an example which can be coded in three
forms
VARIABLE LENGTH OF CODE
BOUNDS ON CODE LENGTH
 For an optimal code, H(X) ≤ L < H(X) + 1.
BOUNDS ON CODE LENGTH
 There is an overhead of at most 1 bit, due to the fact that log(1/pi) is not always an integer.
 We can reduce the overhead per symbol by spreading it out over many symbols.
 Let us consider a system in which we send a sequence of n symbols from X.
 The symbols are assumed to be drawn i.i.d. according to p(x).
BOUNDS ON CODE LENGTH
 By using large block lengths we can achieve an expected codelength per symbol arbitrarily close to the entropy.
 We can use the same argument for a sequence of symbols from a stochastic process that is not necessarily i.i.d.
 In this case, we still have the bound
H(X1, X2, …, Xn)/n ≤ Ln < H(X1, X2, …, Xn)/n + 1/n
 where Ln is the expected codeword length per source symbol when blocks of n symbols are encoded together.
SHANNON’S SOURCE CODING THEOREM
 Shannon’s first theorem (or Shannon’s noiseless coding theorem)
 establishes the statistical limits to possible data compression for data whose source is an independent identically-distributed random variable, and the operational meaning of the Shannon entropy.
BASED ON BOUNDS: SHANNON’S SOURCE CODING THEOREM
 As the number N of i.i.d. source symbols with entropy H(X) grows, the source output can be compressed into slightly more than N·H(X) bits with negligible risk of information loss; conversely, compressing it into fewer than N·H(X) bits makes information loss virtually certain.
MODULE I
 Module 1 – Entropy, Sources and Source Coding
Entropy, Properties of Entropy, Joint and Conditional Entropy, Mutual Information, Properties of Mutual Information.
Discrete memoryless sources, Source code, Average length of source code, Bounds on average length, Uniquely decodable and prefix-free source codes. Kraft Inequality (with proof), Huffman code. Shannon’s source coding theorem (both achievability and converse) and operational meaning of entropy.
