Basic Concepts of Encoding
Encoding
Error Correction vs. Compression
The Simplest Parity-Control Error Detection Encoding
Shannon-Fano and Huffman Encoding
Redundancy vs. Efficiency
An encoding efficiency of 100% means that the average word length is equal to the entropy of the original message ensemble:

$$\text{Efficiency} = \frac{H(X)}{L \log D} \times 100\%, \qquad \text{if } D = 2 \text{ then } \log D = 1 \text{ and } \text{Efficiency} = \frac{H(X)}{L} \times 100\%.$$

If the entropy of the original message ensemble is less than the length of the word over the original alphabet, the original encoding is redundant, and the original information may be compressed by efficient encoding.
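As a small illustration, the sketch below computes the efficiency of a code from the message probabilities, the average word length L, and the size D of the encoding alphabet; the function names are illustrative, not part of the original material.

```python
import math

def entropy(probs):
    """Entropy H(X) of a message ensemble, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def efficiency(probs, avg_length, D=2):
    """Efficiency = H(X) / (L * log2(D)); equals 1.0 (100%) only when L * log2(D) = H(X)."""
    return entropy(probs) / (avg_length * math.log2(D))

# Example: 4 equiprobable messages encoded with 2-bit words -> 100% efficiency
print(efficiency([0.25] * 4, avg_length=2))  # 1.0
```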
Redundancy vs. Efficiency
On the other hand, as we have seen, to be able to detect and to correct errors, a code must be redundant, that is, its efficiency must be lower than 100%: the average word length must be larger than the entropy of the original message ensemble:

$$\text{Efficiency} = \frac{H(X)}{L \log D} \times 100\% < 100\%, \qquad \text{for } D = 2:\; L > H(X).$$
Redundancy and Error Correction
As we have seen, the capacity of the channel
is the maximum of transinformation (with
respect to all possible sets of probabilities that
could be assigned to the source alphabet) that
could be transmitted through this channel:
$$C = \max I(X;Y) = \max\big[H(X) - H(X|Y)\big] = \max\big[H(Y) - H(Y|X)\big]$$
Redundancy and Error Correction
For a digital communication channel over the binary alphabet, with probability of error (inversion of a bit) p and probability of correct transmission 1 − p:

$$P = \{1-p,\; p\}, \qquad \max H(X) = 1,$$
$$H(X|Y) = H(P) = -\big(p \log p + (1-p)\log(1-p)\big),$$
$$C = \max\big[H(X) - H(X|Y)\big] = 1 + p \log p + (1-p)\log(1-p).$$
Redundancy and Error Correction
The capacity C determines the limit for error-correction encoding: if we need to transmit a message of length m (bits) while ensuring the error-correction ability, we will need to transmit at least

$$n > m / C$$

bits.
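A minimal numeric sketch of these two formulas (the binary symmetric channel capacity and the bound n > m/C); the function names are illustrative only.

```python
import math

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with bit-error probability p:
    C = 1 + p*log2(p) + (1-p)*log2(1-p)."""
    if p in (0.0, 1.0):
        return 1.0  # noiseless (or deterministically inverted) channel
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

def min_transmitted_bits(m, p):
    """Smallest integer n satisfying n > m / C for a message of m bits."""
    C = bsc_capacity(p)
    return math.floor(m / C) + 1

print(bsc_capacity(0.1))               # ~0.531 bits per channel use
print(min_transmitted_bits(100, 0.1))  # ~189 bits must be transmitted
```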
Redundancy and Error Correction
Theorem. Let us have a digital communication channel with probability of error p. Any error-correction encoding that ensures that the probability of error in the transmitted word does not exceed ε leads to a k(m, p, ε)-fold extension of the original information, and

$$\lim_{\varepsilon \to 0} k(m, p, \varepsilon) \geq 1/C.$$
Redundancy and Error Correction
Efficient encoding is reached when

$$\lim_{\varepsilon \to 0} k(m, p, \varepsilon) = 1/C.$$

An absolutely reliable encoding procedure does not exist, because the required extension factor

$$k(m, p, \varepsilon = 0)$$

is unbounded.
Parity Control:
the Simplest Error Detection Encoding
Let us use a uniform binary code with encoding vectors (words) of length n − 1. Let us add a parity bit to each encoding vector: if the number of 1s in the vector is even, this bit equals 0; if it is odd, it equals 1.

For example:

00101 (two 1s) → 001010
01101 (three 1s) → 011011
Parity Control:
the Simplest Error Detection Encoding
Hence, we obtain a uniform code with encoding words of length n, where the last bit is a parity bit and the number of 1s in any encoding vector is always even.

If after transmission the number of 1s in an encoding vector is odd, an error has occurred.
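A minimal sketch of this single-word parity scheme (the function names are illustrative):

```python
def add_parity_bit(word):
    """Append a parity bit so that the total number of 1s is even."""
    parity = sum(word) % 2           # 0 if the count of 1s is even, 1 if odd
    return word + [parity]

def check_parity(word):
    """Return True if no error is detected (even number of 1s)."""
    return sum(word) % 2 == 0

encoded = add_parity_bit([0, 0, 1, 0, 1])   # -> [0, 0, 1, 0, 1, 0]
print(encoded, check_parity(encoded))        # no error detected: True

received = encoded.copy()
received[2] ^= 1                             # a single bit inversion
print(received, check_parity(received))      # False: error detected
```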
Parity Control:
the Simplest Error Detection Encoding
To be able to detect more than one error, we should transmit a block of k vectors (words) of n bits each. These vectors are collected in a matrix (n columns and k rows).

For each column a parity bit must be calculated. We put it in an additional (k+1)-st row, in the corresponding column. Then the matrix is transmitted row by row.

If after transmission the parity property does not hold for even a single column, the whole matrix must be transmitted again. This allows detecting group errors of length up to n.
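A sketch of this block (column-wise) parity check: a k x n block receives an extra (k+1)-st row of column parity bits. The names are illustrative.

```python
def add_parity_row(block):
    """Append a row of column parity bits to a k x n block of bits."""
    n = len(block[0])
    parity_row = [sum(row[j] for row in block) % 2 for j in range(n)]
    return block + [parity_row]

def columns_ok(block_with_parity):
    """True if every column of the (k+1) x n block has an even number of 1s."""
    n = len(block_with_parity[0])
    return all(sum(row[j] for row in block_with_parity) % 2 == 0 for j in range(n))

block = [[0, 0, 1, 0, 1, 0],
         [0, 1, 1, 0, 1, 1],
         [1, 0, 0, 1, 1, 1]]
sent = add_parity_row(block)
print(columns_ok(sent))        # True

sent[1][3] ^= 1                # corrupt one bit during "transmission"
print(columns_ok(sent))        # False: retransmit the whole block
```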
COMPRESSION OF INFORMATION
Compression: Background
If the entropy of the original message ensemble is less than the length of the word over the original alphabet, the original encoding is redundant, and the original information may be compressed by efficient encoding:

$$\text{Efficiency} = \frac{H(X)}{L \log D} \times 100\%, \qquad \text{if } D = 2 \text{ then } \log D = 1 \text{ and } \text{Efficiency} = \frac{H(X)}{L} \times 100\%.$$
Compression: Background
The main idea behind compression is to create a code for which the average length of the encoding vector (word) comes as close as possible to the entropy of the original ensemble of messages.

This means that, in general, the codes used for compression are not uniform.
Shannon-Fano Encoding
Sources without memory are sources of information where the probability of the next transmitted symbol (message) does not depend on the previously transmitted symbols (messages).

Separable codes are those codes for which unique decipherability holds.
Shannon-Fano encoding constructs reasonably
efficient separable binary codes for sources
without memory.
Shannon-Fano Encoding
Shannon-Fano encoding was the first established and widely used encoding method. The method and the corresponding code were invented simultaneously and independently of each other by C. Shannon and R. Fano in 1948.
Shannon-Fano Encoding
Let us have the ensemble of the original messages to be transmitted with their corresponding probabilities:

$$X = \{x_1, x_2, \ldots, x_n\}, \qquad P = \{p_1, p_2, \ldots, p_n\}.$$

Our task is to associate with each message $x_k$ a sequence $C_k$ of binary digits of unspecified length $n_k$ such that:
Shannon-Fano Encoding
No sequence of employed binary digits $C_k$ can be obtained from another one by adding more binary digits to the shorter sequence (prefix property).

The transmission of the encoded message is reasonably efficient, that is, 1 and 0 appear independently and with almost equal probabilities. This ensures the transmission of almost 1 bit of information per digit of the encoded messages.
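The prefix property can be checked mechanically; a minimal sketch (the function name is illustrative):

```python
def has_prefix_property(codewords):
    """True if no codeword is a prefix of another (a sufficient condition for unique decipherability)."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(has_prefix_property(["00", "01", "100", "101", "1100", "1101", "1110", "1111"]))  # True
print(has_prefix_property(["0", "01", "11"]))  # False: "0" is a prefix of "01"
```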
Shannon-Fano Encoding
Another important general consideration,
which was taken into account by C. Shannon
and R. Fano, is that (as we have already
considered) a more frequent message has to
be encoded by a shorter encoding vector
(word) and a less frequent message has to be
encoded by a longer encoding vector (word).
Shannon-Fano Encoding: Algorithm
1. The letters (messages) of (over) the input alphabet must be arranged in order from most probable to least probable.
2. The initial set of messages must be divided into two subsets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned: symbols in the first set receive "0" and symbols in the second set receive "1".
3. The same process is repeated on those subsets, to determine successive digits of their codes, as long as any sets with more than one member remain.
4. When a subset has been reduced to one symbol, the symbol's code is complete.

A sketch of this partitioning is given below.
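The following is a compact sketch of the recursive partitioning, under the simplifying assumption that the split point is chosen greedily where the two halves' total probabilities are most nearly equal (names are illustrative):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability); returns {symbol: code string}."""
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):        # find the most balanced split point
            running += group[i - 1][1]
            diff = abs(total - 2 * running)   # |P(second part) - P(first part)|
            if diff < best_diff:
                best_diff, best_i = diff, i
        first, second = group[:best_i], group[best_i:]
        for s, _ in first:
            codes[s] += "0"
        for s, _ in second:
            codes[s] += "1"
        split(first)
        split(second)

    split(symbols)
    return codes

probs = [("x1", 0.25), ("x2", 0.25), ("x3", 0.125), ("x4", 0.125),
         ("x5", 0.0625), ("x6", 0.0625), ("x7", 0.0625), ("x8", 0.0625)]
print(shannon_fano(probs))
# {'x1': '00', 'x2': '01', 'x3': '100', 'x4': '101', 'x5': '1100', ...}
```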
Shannon-Fano Encoding: Example
Message:         x1    x2    x3     x4     x5      x6      x7      x8
Probability:     0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625

Successive partitions and assigned digits:

{x1, x2, x3, x4, x5, x6, x7, x8} → {x1, x2} (0) and {x3, x4, x5, x6, x7, x8} (1)
{x1, x2} → x1 (00) and x2 (01); {x3, ..., x8} → {x3, x4} (10) and {x5, x6, x7, x8} (11)
{x3, x4} → x3 (100) and x4 (101); {x5, ..., x8} → {x5, x6} (110) and {x7, x8} (111)
{x5, x6} → x5 (1100) and x6 (1101); {x7, x8} → x7 (1110) and x8 (1111)
Shannon-Fano Encoding: Example
Message:         x1    x2    x3     x4     x5      x6      x7      x8
Probability:     0.25  0.25  0.125  0.125  0.0625  0.0625  0.0625  0.0625
Encoding vector: 00    01    100    101    1100    1101    1110    1111

Entropy:
$$H = 2\left(\frac{1}{4}\log 4\right) + 2\left(\frac{1}{8}\log 8\right) + 4\left(\frac{1}{16}\log 16\right) = 2.75$$

Average length of the encoding vector:
$$L = \sum_{k} P\{x_k\}\, n_k = 2\left(\frac{1}{4}\cdot 2\right) + 2\left(\frac{1}{8}\cdot 3\right) + 4\left(\frac{1}{16}\cdot 4\right) = 2.75$$

Since every length here satisfies $n_k = I(x_k) = -\log P\{x_k\}$,
$$L = \sum_{k=1}^{N} P\{x_k\}\, n_k = -\sum_{k=1}^{N} P\{x_k\}\, \log P\{x_k\} = H(X),$$
so the Shannon-Fano code gives 100% efficiency.
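These two sums are easy to verify numerically (a minimal check; variable names are illustrative):

```python
import math

probs = [0.25, 0.25, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625]
lengths = [2, 2, 3, 3, 4, 4, 4, 4]             # lengths of 00, 01, 100, ..., 1111

H = -sum(p * math.log2(p) for p in probs)      # 2.75 bits
L = sum(p * n for p, n in zip(probs, lengths)) # 2.75 bits per message
print(H, L, f"{100 * H / L:.0f}%")             # 2.75 2.75 100%
```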
Shannon-Fano Encoding:
Properties
It should be taken into account that the
Shannon-Fano code is not unique because it
depends on the partitioning of the input set of
messages, which, in turn, is not unique.
If successive partitioning into subsets of equal probability is not possible at all, the Shannon-Fano code may not be an optimum code, that is, a code that leads to the lowest possible average length of the encoding vector for a given D.
Huffman Encoding
This encoding algorithm was proposed by David A. Huffman in 1952, and it is still a basic algorithm of lossless compression.

Huffman encoding constructs separable codes (the unique-decipherability property holds) with minimum redundancy for a set of discrete messages (letters); that is, this encoding results in an optimum code.
Huffman Encoding: Background
For an optimum encoding, a longer encoding vector (word) should correspond to a message (letter) with lower probability:

$$P\{x_1\} \geq P\{x_2\} \geq \ldots \geq P\{x_N\} \;\Rightarrow\; L(x_1) \leq L(x_2) \leq \ldots \leq L(x_N).$$

For an optimum encoding it is necessary that

$$L(x_{N-1}) = L(x_N),$$

otherwise the average length of the encoding vector will be unnecessarily increased.

It is important to mention that no more than D (where D is the number of letters in the encoding alphabet) encoding vectors can have equal length (for binary encoding, D = 2).
Huffman Encoding: Background
For an optimum encoding with D = 2 it is necessary that the last two encoding vectors are identical except for their last digits.

For an optimum encoding it is necessary that each sequence of length $L(x_N) - 1$ digits either must be used as an encoding vector or must have one of its prefixes used as an encoding vector.
Huffman Encoding: Algorithm
1. The letters (messages) of (over) the input alphabet must be arranged in order from most probable to least probable.
2. The two least probable messages (the last two messages) are merged into a composite message with a probability equal to the sum of their probabilities. This new message must be inserted into the sequence of the original messages instead of its parents, in accordance with its probability.
3. The previous step must be repeated until the two last remaining messages compose a single message, which is the only member of the message sequence.

The process can be visualized by constructing a binary tree: the Huffman tree.
Huffman Encoding: Algorithm
The Huffman tree should be constructed as follows: 1) the root of the tree is the message from the last step, with probability 1; 2) its children are the two messages that composed it; 3) step 2 must be repeated until all leaves of the tree are obtained. These leaves are the original messages.

Sibling nodes at the same level are given the numbers 0 (left) and 1 (right).

The encoding vector for each message is obtained by following the path from the root's child to the leaf corresponding to this message and reading the numbers of the nodes (root's child → intermediates → leaf) that compose the encoding vector. A sketch of the whole procedure follows.
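A compact sketch of the merging procedure, using a priority queue instead of explicit re-insertion into the sorted sequence (an equivalent formulation; names are illustrative). For the five equiprobable messages of the example on the next slide, the exact codewords may differ from those shown there, but the set of code lengths (three 2-bit words and two 3-bit words) is the same.

```python
import heapq
import itertools

def huffman(symbols):
    """symbols: list of (symbol, probability); returns {symbol: code string}."""
    counter = itertools.count()                 # tie-breaker so heap entries stay comparable
    heap = [(p, next(counter), {s: ""}) for s, p in symbols]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)     # the two least probable "messages"
        p2, _, codes2 = heapq.heappop(heap)
        for s in codes1:                        # one child gets 0, the other gets 1
            codes1[s] = "0" + codes1[s]
        for s in codes2:
            codes2[s] = "1" + codes2[s]
        heapq.heappush(heap, (p1 + p2, next(counter), {**codes1, **codes2}))
    return heap[0][2]

# Five equiprobable messages, as in the example on the next slide
print(huffman([("x1", 0.2), ("x2", 0.2), ("x3", 0.2), ("x4", 0.2), ("x5", 0.2)]))
```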
Huffman Encoding: Example
Let us construct the Huffman code for the following set of messages: x1, x2, x3, x4, x5 with the probabilities p(x1) = ... = p(x5) = 0.2.

1) x1 (p=0.2), x2 (p=0.2), x3 (p=0.2), x4 (p=0.2), x5 (p=0.2)
2) x4, x5 → x45 (p=0.4) => x45, x1, x2, x3
3) x2, x3 → x23 (p=0.4) => x45, x23, x1
4) x1, x23 → x123 (p=0.6) => x123, x45
5) x123, x45 → x12345 (p=1)
Huffman Encoding: Example
Huffman tree: the root x12345 has children x123 (0) and x45 (1); x123 has children x1 (0) and x23 (1); x23 has children x2 (0) and x3 (1); x45 has children x4 (0) and x5 (1).

Encoding vectors: x1(00); x2(010); x3(011); x4(10); x5(11)
Huffman Encoding: Example
Entropy:
$$H(X) = -5 \cdot 0.2 \log 0.2 = 5 \cdot \frac{1}{5}\log 5 = \log 5 \approx 2.32$$

Average length of the encoding vector:
$$L = 3\left(\frac{1}{5}\cdot 2\right) + 2\left(\frac{1}{5}\cdot 3\right) = \frac{12}{5} = 2.4$$

The Huffman code gives (2.32/2.4)·100% ≈ 97% efficiency.
Homework
Construct the Shannon-Fano code for the last example.

Examples 4-7 and 4-8 from Reza's book: try both Huffman and Shannon-Fano encoding.