Basic Concepts of Encoding
Encoding
Error Correction vs. Compression
The Simplest Parity-Control Error Detection Encoding
Shannon-Fano and Huffman Encoding
Redundancy vs. Efficiency
An encoding efficiency of 100% means that the average word length is equal to the entropy of the original message ensemble:

$$\text{Efficiency} = \frac{H(X)}{L \log D} \times 100\%, \qquad \text{if } D = 2 \text{ then } \log D = 1 \text{ and } \text{Efficiency} = \frac{H(X)}{L} \times 100\%.$$

If the entropy of the original message ensemble is less than the length of the word over the original alphabet, the original encoding is redundant, and the original information may be compressed by efficient encoding.
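As a small illustration, the sketch below computes the efficiency of a code from the message probabilities, the average word length L, and the size D of the encoding alphabet; the function names are illustrative, not part of the original material.

```python
import math

def entropy(probs):
    """Entropy H(X) of a message ensemble, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def efficiency(probs, avg_length, D=2):
    """Efficiency = H(X) / (L * log2(D)); equals 1.0 (100%) only when L * log2(D) = H(X)."""
    return entropy(probs) / (avg_length * math.log2(D))

# Example: 4 equiprobable messages encoded with 2-bit words -> 100% efficiency
print(efficiency([0.25] * 4, avg_length=2))  # 1.0
```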
Redundancy vs. Efficiency
On the other hand, as we have seen, to be able to detect and to correct errors, a code must be redundant, that is, its efficiency must be lower than 100%: the average word length must be larger than the entropy of the original message ensemble:

$$\text{Efficiency} = \frac{H(X)}{L \log D} \times 100\% < 100\%, \qquad \text{for } D = 2:\; L > H(X).$$
Redundancy and Error Correction
As we have seen, the capacity of the channel
is the maximum of transinformation (with
respect to all possible sets of probabilities that
could be assigned to the source alphabet) that
could be transmitted through this channel:
$$C = \max I(X;Y) = \max\big[H(X) - H(X|Y)\big] = \max\big[H(Y) - H(Y|X)\big]$$
Redundancy and Error Correction
For a digital communication channel over the binary alphabet, with probability of error (inversion of a bit) p and probability of correct transmission 1 − p:

$$P = \{1-p,\; p\}, \qquad \max H(X) = 1,$$
$$H(X|Y) = H(P) = -\big(p \log p + (1-p)\log(1-p)\big),$$
$$C = \max\big[H(X) - H(X|Y)\big] = 1 + p \log p + (1-p)\log(1-p).$$
Redundancy and Error Correction
The capacity C determines the limit for error-correction encoding: if we need to transmit a message of length m (bits) while ensuring the error-correction ability, we will need to transmit at least

$$n > m / C$$

bits.
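A minimal numeric sketch of these two formulas (the binary symmetric channel capacity and the bound n > m/C); the function names are illustrative only.

```python
import math

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with bit-error probability p:
    C = 1 + p*log2(p) + (1-p)*log2(1-p)."""
    if p in (0.0, 1.0):
        return 1.0  # noiseless (or deterministically inverted) channel
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

def min_transmitted_bits(m, p):
    """Smallest integer n satisfying n > m / C for a message of m bits."""
    C = bsc_capacity(p)
    return math.floor(m / C) + 1

print(bsc_capacity(0.1))               # ~0.531 bits per channel use
print(min_transmitted_bits(100, 0.1))  # ~189 bits must be transmitted
```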
Redundancy and Error Correction
Theorem. Let us have a digital communication channel with probability of error p. Any error-correction encoding that ensures that the probability of error in the transmitted word does not exceed ε leads to a k(m, p, ε)-fold extension of the original information, and

$$\lim_{\varepsilon \to 0} k(m, p, \varepsilon) \geq 1/C.$$
Redundancy and Error Correction
Efficient encoding is reached when

$$\lim_{\varepsilon \to 0} k(m, p, \varepsilon) = 1/C.$$

An absolutely reliable encoding procedure does not exist, because the required extension factor

$$k(m, p, \varepsilon = 0)$$

is unbounded.
Parity Control:
the Simplest Error Detection Encoding
Let us use a uniform binary code with encoding vectors (words) of length n − 1. Let us add a parity bit to each encoding vector: if the number of 1s in the vector is even, this bit equals 0; if it is odd, it equals 1.

For example:

00101 (two 1s) → 001010
01101 (three 1s) → 011011
Parity Control:
the Simplest Error Detection Encoding
Hence, we obtain a uniform code with encoding words of length n, where the last bit is a parity bit and the number of 1s in any encoding vector is always even.

If after transmission the number of 1s in an encoding vector is odd, an error has occurred.
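A minimal sketch of this single-word parity scheme (the function names are illustrative):

```python
def add_parity_bit(word):
    """Append a parity bit so that the total number of 1s is even."""
    parity = sum(word) % 2           # 0 if the count of 1s is even, 1 if odd
    return word + [parity]

def check_parity(word):
    """Return True if no error is detected (even number of 1s)."""
    return sum(word) % 2 == 0

encoded = add_parity_bit([0, 0, 1, 0, 1])   # -> [0, 0, 1, 0, 1, 0]
print(encoded, check_parity(encoded))        # no error detected: True

received = encoded.copy()
received[2] ^= 1                             # a single bit inversion
print(received, check_parity(received))      # False: error detected
```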
Parity Control:
the Simplest Error Detection Encoding
To be able to detect more than one error, we should transmit a block of k vectors (words) of n bits each. These vectors are collected in a matrix (n columns and k rows).

For each column a parity bit must be calculated. We put it in an additional (k+1)-st row, in the corresponding column. Then the matrix is transmitted row by row.

If after transmission the parity property does not hold for even a single column, the whole matrix must be transmitted again. This allows detecting group errors of length up to n.
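A sketch of this block (column-wise) parity check: a k x n block receives an extra (k+1)-st row of column parity bits. The names are illustrative.

```python
def add_parity_row(block):
    """Append a row of column parity bits to a k x n block of bits."""
    n = len(block[0])
    parity_row = [sum(row[j] for row in block) % 2 for j in range(n)]
    return block + [parity_row]

def columns_ok(block_with_parity):
    """True if every column of the (k+1) x n block has an even number of 1s."""
    n = len(block_with_parity[0])
    return all(sum(row[j] for row in block_with_parity) % 2 == 0 for j in range(n))

block = [[0, 0, 1, 0, 1, 0],
         [0, 1, 1, 0, 1, 1],
         [1, 0, 0, 1, 1, 1]]
sent = add_parity_row(block)
print(columns_ok(sent))        # True

sent[1][3] ^= 1                # corrupt one bit during "transmission"
print(columns_ok(sent))        # False: retransmit the whole block
```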
COMPRESSION OF INFORMATION
Compression: Background
If the entropy of the original message ensemble is less than the length of the word over the original alphabet, the original encoding is redundant, and the original information may be compressed by efficient encoding:

$$\text{Efficiency} = \frac{H(X)}{L \log D} \times 100\%, \qquad \text{if } D = 2 \text{ then } \log D = 1 \text{ and } \text{Efficiency} = \frac{H(X)}{L} \times 100\%.$$
Compression: Background
The main idea behind compression is to create a code for which the average length of the encoding vector (word) comes as close as possible to the entropy of the original ensemble of messages.

This means that, in general, the codes used for compression are not uniform.
Shannon-Fano Encoding
Sources without memory are sources of information where the probability of the next transmitted symbol (message) does not depend on the previously transmitted symbols (messages).

Separable codes are those codes for which unique decipherability holds.
Shannon-Fano encoding constructs reasonably
efficient separable binary codes for sources
without memory.
Shannon-Fano Encoding
Shannon-Fano encoding was the first established and widely used encoding method. The method and the corresponding code were invented simultaneously and independently of each other by C. Shannon and R. Fano in 1948.
Shannon-Fano Encoding
Let us have the ensemble of the original messages to be transmitted with their corresponding probabilities:

$$X = \{x_1, x_2, \ldots, x_n\}, \qquad P = \{p_1, p_2, \ldots, p_n\}.$$

Our task is to associate with each message $x_k$ a sequence $C_k$ of binary digits of unspecified length $n_k$ such that:
Shannon-Fano Encoding
No sequence of employed binary digits $C_k$ can be obtained from another one by adding more binary digits to the shorter sequence (prefix property).

The transmission of the encoded message is reasonably efficient, that is, 1 and 0 appear independently and with almost equal probabilities. This ensures the transmission of almost 1 bit of information per digit of the encoded messages.
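The prefix property can be checked mechanically; a minimal sketch (the function name is illustrative):

```python
def has_prefix_property(codewords):
    """True if no codeword is a prefix of another (a sufficient condition for unique decipherability)."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(has_prefix_property(["00", "01", "100", "101", "1100", "1101", "1110", "1111"]))  # True
print(has_prefix_property(["0", "01", "11"]))  # False: "0" is a prefix of "01"
```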
Shannon-Fano Encoding
Another important general consideration,
which was taken into account by C. Shannon
and R. Fano, is that (as we have already
considered) a more frequent message has to
be encoded by a shorter encoding vector
(word) and a less frequent message has to be
encoded by a longer encoding vector (word).
Shannon-Fano Encoding: Algorithm
1. The letters (messages) of (over) the input alphabet must be arranged in order from most probable to least probable.
2. The initial set of messages must be divided into two subsets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned: symbols in the first set receive "0" and symbols in the second set receive "1".
3. The same process is repeated on those subsets, to determine successive digits of their codes, as long as any sets with more than one member remain.
4. When a subset has been reduced to one symbol, the symbol's code is complete.

A sketch of this partitioning is given below.
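The following is a compact sketch of the recursive partitioning, under the simplifying assumption that the split point is chosen greedily where the two halves' total probabilities are most nearly equal (names are illustrative):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability); returns {symbol: code string}."""
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):        # find the most balanced split point
            running += group[i - 1][1]
            diff = abs(total - 2 * running)   # |P(second part) - P(first part)|
            if diff < best_diff:
                best_diff, best_i = diff, i
        first, second = group[:best_i], group[best_i:]
        for s, _ in first:
            codes[s] += "0"
        for s, _ in second:
            codes[s] += "1"
        split(first)
        split(second)

    split(symbols)
    return codes

probs = [("x1", 0.25), ("x2", 0.25), ("x3", 0.125), ("x4", 0.125),
         ("x5", 0.0625), ("x6", 0.0625), ("x7", 0.0625), ("x8", 0.0625)]
print(shannon_fano(probs))
# {'x1': '00', 'x2': '01', 'x3': '100', 'x4': '101', 'x5': '1100', ...}
```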
Shannon-Fano Encoding: Example
Message:         x1    x2    x3     x4     x5      x6      x7      x8
Probability:     0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625

Successive partitions and assigned digits:

{x1, x2, x3, x4, x5, x6, x7, x8} → {x1, x2} (0) and {x3, x4, x5, x6, x7, x8} (1)
{x1, x2} → x1 (00) and x2 (01); {x3, ..., x8} → {x3, x4} (10) and {x5, x6, x7, x8} (11)
{x3, x4} → x3 (100) and x4 (101); {x5, ..., x8} → {x5, x6} (110) and {x7, x8} (111)
{x5, x6} → x5 (1100) and x6 (1101); {x7, x8} → x7 (1110) and x8 (1111)
Shannon-Fano Encoding: Example
Message:         x1    x2    x3     x4     x5      x6      x7      x8
Probability:     0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625
Encoding vector: 00    01    100    101    1100    1101    1110    1111

Entropy:
$$H = 2\left(\frac{1}{4}\log 4\right) + 2\left(\frac{1}{8}\log 8\right) + 4\left(\frac{1}{16}\log 16\right) = 2.75$$

Average length of the encoding vector:
$$L = \sum_{k} P\{x_k\}\, n_k = 2\left(\frac{1}{4}\cdot 2\right) + 2\left(\frac{1}{8}\cdot 3\right) + 4\left(\frac{1}{16}\cdot 4\right) = 2.75$$

Since every length here satisfies $n_k = I(x_k) = -\log P\{x_k\}$,
$$L = \sum_{k=1}^{N} P\{x_k\}\, n_k = -\sum_{k=1}^{N} P\{x_k\}\, \log P\{x_k\} = H(X),$$
so the Shannon-Fano code gives 100% efficiency.
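These two sums are easy to verify numerically (a minimal check; variable names are illustrative):

```python
import math

probs = [0.25, 0.25, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625]
lengths = [2, 2, 3, 3, 4, 4, 4, 4]             # lengths of 00, 01, 100, ..., 1111

H = -sum(p * math.log2(p) for p in probs)      # 2.75 bits
L = sum(p * n for p, n in zip(probs, lengths)) # 2.75 bits per message
print(H, L, f"{100 * H / L:.0f}%")             # 2.75 2.75 100%
```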
Shannon-Fano Encoding:
Properties
It should be taken into account that the
Shannon-Fano code is not unique because it
depends on the partitioning of the input set of
messages, which, in turn, is not unique.
If successive partitioning into subsets of equal probability is not possible at all, the Shannon-Fano code may not be an optimum code, that is, a code that leads to the lowest possible average length of the encoding vector for a given D.
Huffman Encoding
This encoding algorithm was proposed by David A. Huffman in 1952, and it is still a basic algorithm of lossless compression.

Huffman encoding constructs separable codes (the unique-decipherability property holds) with minimum redundancy for a set of discrete messages (letters); that is, this encoding results in an optimum code.
Huffman Encoding: Background
For an optimum encoding, a longer encoding vector (word) should correspond to a message (letter) with lower probability:

$$P\{x_1\} \geq P\{x_2\} \geq \ldots \geq P\{x_N\} \;\Rightarrow\; L(x_1) \leq L(x_2) \leq \ldots \leq L(x_N).$$

For an optimum encoding it is necessary that

$$L(x_{N-1}) = L(x_N),$$

otherwise the average length of the encoding vector will be unnecessarily increased.

It is important to mention that no more than D (where D is the number of letters in the encoding alphabet) encoding vectors can have equal length (for binary encoding, D = 2).
Huffman Encoding: Background
For an optimum encoding with D = 2 it is necessary that the last two encoding vectors are identical except for their last digits.

For an optimum encoding it is necessary that each sequence of length $L(x_N) - 1$ digits either must be used as an encoding vector or must have one of its prefixes used as an encoding vector.
Huffman Encoding: Algorithm
1. The letters (messages) of (over) the input alphabet must be arranged in order from most probable to least probable.
2. The two least probable messages (the last two messages) are merged into a composite message with a probability equal to the sum of their probabilities. This new message must be inserted into the sequence of the original messages instead of its parents, in accordance with its probability.
3. The previous step must be repeated until the two last remaining messages compose a single message, which is the only member of the message sequence.

The process can be visualized by constructing a binary tree: the Huffman tree.
Huffman Encoding: Algorithm
The Huffman tree should be constructed as follows: 1) the root of the tree is the message from the last step, with probability 1; 2) its children are the two messages that composed it; 3) step 2 must be repeated until all leaves of the tree are obtained. These leaves are the original messages.

Sibling nodes at the same level are given the numbers 0 (left) and 1 (right).

The encoding vector for each message is obtained by following the path from the root's child to the leaf corresponding to this message and reading the numbers of the nodes (root's child → intermediates → leaf) that compose the encoding vector. A sketch of the whole procedure follows.
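A compact sketch of the merging procedure, using a priority queue instead of explicit re-insertion into the sorted sequence (an equivalent formulation; names are illustrative). For the five equiprobable messages of the example on the next slide, the exact codewords may differ from those shown there, but the set of code lengths (three 2-bit words and two 3-bit words) is the same.

```python
import heapq
import itertools

def huffman(symbols):
    """symbols: list of (symbol, probability); returns {symbol: code string}."""
    counter = itertools.count()                 # tie-breaker so heap entries stay comparable
    heap = [(p, next(counter), {s: ""}) for s, p in symbols]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)     # the two least probable "messages"
        p2, _, codes2 = heapq.heappop(heap)
        for s in codes1:                        # one child gets 0, the other gets 1
            codes1[s] = "0" + codes1[s]
        for s in codes2:
            codes2[s] = "1" + codes2[s]
        heapq.heappush(heap, (p1 + p2, next(counter), {**codes1, **codes2}))
    return heap[0][2]

# Five equiprobable messages, as in the example on the next slide
print(huffman([("x1", 0.2), ("x2", 0.2), ("x3", 0.2), ("x4", 0.2), ("x5", 0.2)]))
```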
Huffman Encoding: Example
Let us construct the Huffman code for the following set of messages: x1, x2, x3, x4, x5 with the probabilities p(x1) = ... = p(x5) = 0.2.

1) x1 (p=0.2), x2 (p=0.2), x3 (p=0.2), x4 (p=0.2), x5 (p=0.2)
2) x4, x5 → x45 (p=0.4) => x45, x1, x2, x3
3) x2, x3 → x23 (p=0.4) => x45, x23, x1
4) x1, x23 → x123 (p=0.6) => x123, x45
5) x123, x45 → x12345 (p=1)
Huffman Encoding: Example
Huffman tree: the root x12345 has children x123 (0) and x45 (1); x123 has children x1 (0) and x23 (1); x23 has children x2 (0) and x3 (1); x45 has children x4 (0) and x5 (1).

Encoding vectors: x1(00); x2(010); x3(011); x4(10); x5(11)
Huffman Encoding: Example
Entropy:
$$H(X) = -5 \cdot 0.2 \log 0.2 = 5 \cdot \frac{1}{5}\log 5 = \log 5 \approx 2.32$$

Average length of the encoding vector:
$$L = 3\left(\frac{1}{5}\cdot 2\right) + 2\left(\frac{1}{5}\cdot 3\right) = \frac{12}{5} = 2.4$$

The Huffman code gives (2.32/2.4)·100% ≈ 97% efficiency.
Homework
Construct the Shannon-Fano code for the last example.

Examples 4-7 and 4-8 from Reza's book: try both Huffman and Shannon-Fano encoding.