Source Coding

1. Source coding is the conversion of the output of a discrete memoryless source into a binary sequence with the goal of minimizing the average bit rate.
2. Source coding theory states that the average code length L must be greater than or equal to the entropy H(X) of the source; L can be made arbitrarily close to H(X) with a suitable code.
3. Entropy coding techniques like Shannon-Fano and Huffman coding aim to design variable-length codes with average code lengths close to the source entropy.


SOURCE CODING

1. Definition of Source Coding Terms


(a) Code Length
(b) Code efficiency
(c) Code redundancy
2. Source Coding Theory
3. Classification of Codes
4. Entropy Encoding
(a) Shannon-Fano
(b) Huffman

ECE 416 – Digital Communication


Friday, 23 March 2018
SYLLABUS

DIGITAL COMMUNICATION

[Block diagram] A digital communication system: Signal Source & Transducer (e.g. microphone, TV camera, flow sensor; produces the baseband signal) → Source Encoder (sampling, companding, encrypting) → Channel Encoder (error control) → Modulator → Channel (free space, co-axial cable, water, fibre) → Demodulator → Channel Decoder → Message/Signal Recovery (baseband signal output).
SOURCE CODING DEFINITION & OBJECTIVE

1. Source Coding is the conversion of the output of a Discrete Memoryless Source (DMS) into a binary sequence.
2. The objective of Source Coding is to minimize the average bit rate required to represent the signal, by reducing the redundancy of the information source.
CLASSIFICATION OF INFORMATION SOURCES

Information sources fall into two categories:

a) Memory sources, where the current symbol depends on previous symbols;

b) Memoryless sources, where the current symbol is independent of previous symbols.
CODE LENGTH DEFINITION

• Assume X is a DMS with finite entropy H(X), alphabet {x1, x2, ..., xm} and corresponding probabilities P(xi) for i = 1, 2, ..., m.
• If the binary code word assigned to symbol xi by the source encoder has length ni, then the codeword length ni is the number of binary digits in that code word.
OTHER DEFINITIONS

1. The average codeword length is given by:

   L = Σ P(xi) · ni   (sum over i = 1, ..., m)

2. Code efficiency is defined as:

   η = Lmin / L

   where Lmin is the minimum possible value of the average code length L.

3. The code is said to be efficient when the code efficiency η tends to 1.

4. Code redundancy is defined as:

   γ = 1 − η
SOURCE CODING THEOREM

• The Source Coding Theorem states that for a DMS X with entropy H(X), the average code length L per symbol is bounded by

   L ≥ H(X)

• Further, L can be made arbitrarily close to H(X) by choosing a suitable code.
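These definitions translate directly into a few lines of code. Below is a minimal Python sketch (not from the slides) that computes H(X), the average code length L, the efficiency η = H(X)/L (taking Lmin ≈ H(X), per the theorem) and the redundancy γ = 1 − η; the probabilities and lengths in the demo call are the ones used in the Shannon-Fano worked example later in these notes.

from math import log2

def source_stats(probabilities, code_lengths):
    """Return (H, L, efficiency, redundancy) for a DMS and a given code."""
    H = -sum(p * log2(p) for p in probabilities if p > 0)        # entropy, bits/symbol
    L = sum(p * n for p, n in zip(probabilities, code_lengths))  # average codeword length
    eta = H / L                                                  # efficiency (Lmin taken as H(X))
    return H, L, eta, 1 - eta                                    # last value is the redundancy

# Demo with the probabilities/lengths of the later Shannon-Fano worked example:
print(source_stats([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))
# -> (1.75, 1.75, 1.0, 0.0): a 100% efficient code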
CLASSIFICATION OF CODES

1. Fixed-Length Code: a code in which every codeword has the same length.

2. Variable-Length Code: a code in which the codeword length varies from symbol to symbol.

3. Distinct Code: a code in which each codeword is distinguishable from every other codeword.

4. Prefix-Free Code: a code in which no codeword can be formed by adding code symbols to another codeword.

5. Uniquely Decodable Code: a code in which the original source sequence can be reconstructed perfectly from the encoded binary sequence.

6. Instantaneous Code:
   a) A code in which the end of any codeword is recognizable without examining subsequent code symbols.
   b) Instantaneous codes have the property that no codeword is a prefix of another codeword.

7. Optimal Code: a code that is instantaneous and has the minimum average length Lmin.
WORKED EXAMPLE - 1

1. A Discrete Memoryless Source X has alphabet {x1, x2} and associated probabilities P(x1) = 0.9, P(x2) = 0.1, where the symbols are encoded as:

Find the efficiency and redundancy of the code.
SOLUTION

Entropy:

   H(X) = −0.9·log2(0.9) − 0.1·log2(0.1) ≈ 0.469 bits/symbol

Code efficiency:

   η = H(X) / L

Code redundancy:

   γ = 1 − η
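Since the code table itself did not survive extraction, the sketch below assumes the common single-bit assignment x1 → 0, x2 → 1 (an assumption, not taken from the slide) purely to illustrate the calculation.

from math import log2

p = [0.9, 0.1]
n = [1, 1]                                    # assumed codeword lengths (x1 -> 0, x2 -> 1)
H = -sum(pi * log2(pi) for pi in p)           # ~0.469 bits/symbol
L = sum(pi * ni for pi, ni in zip(p, n))      # 1.0 bit/symbol
print(H / L, 1 - H / L)                       # efficiency ~0.469, redundancy ~0.531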
EXAMPLE 2

2. A Discrete Memoryless Source X has alphabet {x1, x2, x3, x4} and a source code as shown below.

   xi    P(xi)   Code
   x1    0.81    0
   x2    0.09    10
   x3    0.09    110
   x4    0.01    111

Determine the efficiency and redundancy of the code.
SOLUTION

• The average codeword length L is:

   L = 0.81(1) + 0.09(2) + 0.09(3) + 0.01(3) = 1.29 bits/symbol

• The entropy H(X) is given by:

   H(X) = −0.81·log2(0.81) − 0.09·log2(0.09) − 0.09·log2(0.09) − 0.01·log2(0.01) ≈ 0.938 bits/symbol

• Code efficiency is therefore:

   η = H(X) / L = 0.938 / 1.29 ≈ 0.727

• Code redundancy is therefore:

   γ = 1 − η = 1 − 0.727 = 0.273 ≈ 27%
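The same numbers fall out of the source_stats() sketch given earlier (a cross-check, not part of the original solution):

print(source_stats([0.81, 0.09, 0.09, 0.01], [1, 2, 3, 3]))
# -> approximately (0.938, 1.29, 0.727, 0.273)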
ENTROPY CODING

1. Entropy coding refers to the design of a variable-length code whose average codeword length approaches the entropy of the source. There are two main types of entropy coding:

(a) Shannon-Fano Coding
(b) Huffman Coding
SHANNON-FANO CODING

• Named after Claude Shannon and Robert Fano, Shannon-Fano coding is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).
• Shannon-Fano coding is suboptimal in the sense that, unlike Huffman coding, it does not always achieve the lowest possible expected codeword length.
• The technique was first proposed in Shannon's "A Mathematical Theory of Communication", his 1948 article introducing the field of information theory.
SHANNON-FANO CODING

The Shannon-Fano code is generated by the following procedure (a code sketch follows below):

1. List the source symbols in order of decreasing probability.
2. Partition the set into two subsets with as nearly equal total probabilities as possible, and assign 0 to the upper subset and 1 to the lower subset.
3. Repeat the process, each time partitioning the subsets with as nearly equal probabilities as possible, until further partitioning is not possible.
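A minimal recursive Python implementation of this procedure, offered as an illustrative sketch rather than anything taken from the slides:

def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs. Returns {symbol: codeword}."""
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Choose the split point that makes the two sub-groups' probabilities most nearly equal.
        best_cut, best_diff = 1, float("inf")
        for i in range(1, len(group)):
            upper_sum = sum(p for _, p in group[:i])
            diff = abs(total - 2 * upper_sum)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        upper, lower = group[:best_cut], group[best_cut:]
        for s, _ in upper:
            codes[s] += "0"      # 0 to the upper set
        for s, _ in lower:
            codes[s] += "1"      # 1 to the lower set
        split(upper)
        split(lower)

    split(sorted(symbols, key=lambda sp: sp[1], reverse=True))
    return codes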
SHANNON-FANO CODING

1. Original symbol list and probabilities

   x(i)   P(x(i))
   x1     0.05
   x2     0.30
   x3     0.08
   x4     0.25
   x5     0.20
   x6     0.12

2. Sort in order of decreasing probability

   x(i)   P(x(i))
   x2     0.30
   x4     0.25
   x5     0.20
   x6     0.12
   x3     0.08
   x1     0.05

3. Partition into two sets of approximately equal probability (about 0.5 each); assign 0 to the upper set and 1 to the lower set

   x(i)   P(x(i))   Step 1
   x2     0.30      0
   x4     0.25      0
   x5     0.20      1
   x6     0.12      1
   x3     0.08      1
   x1     0.05      1

4. Partition each set again about its mid-point

   x(i)   P(x(i))   Step 1   Step 2
   x2     0.30      0        0
   x4     0.25      0        1
   x5     0.20      1        0
   x6     0.12      1        1
   x3     0.08      1        1
   x1     0.05      1        1      (remaining)

5. Partition the remaining set again

   x(i)   P(x(i))   Step 1   Step 2   Step 3
   x2     0.30      0        0
   x4     0.25      0        1
   x5     0.20      1        0
   x6     0.12      1        1        0
   x3     0.08      1        1        1
   x1     0.05      1        1        1      (remaining)

6. Partition the remaining set once more and read off the codewords

   x(i)   P(x(i))   Step 1   Step 2   Step 3   Step 4   Code
   x2     0.30      0        0                          00
   x4     0.25      0        1                          01
   x5     0.20      1        0                          10
   x6     0.12      1        1        0                 110
   x3     0.08      1        1        1        0        1110
   x1     0.05      1        1        1        1        1111
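Running the shannon_fano() sketch from above on the same six symbols reproduces this table (again, just a cross-check):

print(shannon_fano([("x1", 0.05), ("x2", 0.30), ("x3", 0.08),
                    ("x4", 0.25), ("x5", 0.20), ("x6", 0.12)]))
# -> {'x2': '00', 'x4': '01', 'x5': '10', 'x6': '110', 'x3': '1110', 'x1': '1111'}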
SHANNON-FANO CODING EXAMPLE 1

A DMS has four symbols x1, x2, x3 and x4 with probabilities P(x1) = 1/2, P(x2) = 1/4, P(x3) = P(x4) = 1/8.

Construct the Shannon-Fano code and determine the code efficiency.
SHANNON-FANO CODE-EXAMPLE 1 - SOLUTION

1. Shannon-Fano code

   x(i)   P(x(i))   Step 1   Step 2   Step 3   Code
   x1     0.500     0                          0
   x2     0.250     1        0                 10
   x3     0.125     1        1        0        110
   x4     0.125     1        1        1        111

2. Information content (and hence codeword length) of each symbol:

   I(x1) = log2(2) = 1 = n1
   I(x2) = log2(4) = 2 = n2
   I(x3) = log2(8) = 3 = n3
   I(x4) = log2(8) = 3 = n4

3. Entropy and average code length:

   H(X) = Σ P(xi)·I(xi) = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75 bits/symbol

   L = Σ P(xi)·ni = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75 bits/symbol

4. Efficiency:

   η = H(X) / L = 1, or 100%
HUFFMAN CODE

1. Huffman coding is a lossless data compression algorithm using variable-length codes.
2. The lengths of the assigned codes are based on the frequencies of the corresponding characters:
   a) the most frequent character gets the shortest code;
   b) the least frequent character gets the longest code.
3. The variable-length codes assigned to input characters are prefix codes, i.e. the codes are assigned in such a manner that the code assigned to one character is not a prefix of the code assigned to any other character.
HUFFMAN CODE

1. The Huffman procedure results in a code that is optimal, and therefore a code with the highest efficiency.
2. The Huffman procedure is based on the following observations regarding optimum prefix codes:
   a) symbols that occur more frequently (have a higher probability of occurrence) have shorter codewords;
   b) the two symbols that occur least frequently have codewords of the same length;
   c) the codewords corresponding to the two lowest-probability symbols differ only in the last bit.
STEPS IN HUFFMAN CODING

There are two major steps in Huffman coding:

1. Build a Huffman tree from the input characters.
2. Traverse the Huffman tree and assign codes to the characters.

Codes obtained for the example characters (the tree diagram itself is not reproduced):

   Character   Code
   a           0
   b           111
   c           1011
   d           100
   r           110
   !           1010
ASSIGN CODES

1. Start encoding from the last reduction in the table.
2. Assign 0 to the first digit of the codewords for all symbols associated with the first probability, and assign 1 to the second probability.
3. Assign 0 and 1 to the next digit of the two probabilities that were combined in the previous reduction step, while retaining all assignments made in the previous stage.
4. Repeat the process until the first column is reached.
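The slides build the code by tabular source reduction; the Python sketch below (an illustration, not the slides' own method) builds the same kind of optimal prefix code with a binary heap, merging the two least-frequent subtrees at each step.

import heapq
from itertools import count

def huffman(freqs):
    """freqs: {symbol: frequency or probability}. Returns {symbol: codeword}."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tie = count()                              # tie-breaker so heap entries always compare
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)        # the two least-frequent subtrees...
        f2, _, c2 = heapq.heappop(heap)
        for s in c1:
            c1[s] = "0" + c1[s]                # ...get 0 / 1 prepended to their codewords
        for s in c2:
            c2[s] = "1" + c2[s]
        heapq.heappush(heap, (f1 + f2, next(tie), {**c1, **c2}))
    return heap[0][2]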
USING THE HUFFMAN CODE IN PRACTICE
• Assume that you have a character file that you would like to compress. By parsing through the file, a computer establishes that there are 100,000 characters with the frequencies of occurrence shown below.

   Character   Frequency
   A           45,000
   B           13,000
   C           12,000
   D           16,000
   E           9,000
   F           5,000
   Total       100,000

• Determine a code that encodes the file using as few bits as possible.

SOLUTION 1: USING A HAMMER

A fixed-length code would require 3 bits per character, since we need v bits with 2^v ≥ 6, i.e. v = 3.
Using this code we would store 3 × 100,000 = 300 kbits.
If, on the other hand, we used a byte to store each character, we would require a file of size 8 × 100,000 = 800 kbits.
HUFFMAN CODE FOR THE EXAMPLE

1. Average code length: L = 2.24 bits per character.
2. Bits required = L × 100,000 = 224,000 bits.
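As a cross-check (illustrative only; the slide's tree diagram is not reproduced here), feeding these frequencies to the huffman() sketch above and totalling the resulting codeword lengths reproduces the 2.24 figure:

freq = {"A": 45_000, "B": 13_000, "C": 12_000, "D": 16_000, "E": 9_000, "F": 5_000}
codes = huffman(freq)                                  # heap-based sketch from the previous section
bits = sum(freq[s] * len(codes[s]) for s in freq)      # total encoded size
print(bits, bits / sum(freq.values()))                 # -> 224000 bits, 2.24 bits per character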
HOMEWORK

• Determine the Huffman code for the following characters and their corresponding probabilities (the source-reduction steps are worked on the next slide).

   Character   Probability
   A           0.05
   B           0.15
   C           0.2
   D           0.05
   E           0.15
   F           0.3
   G           0.1
FIRST, CREATE THE TREE

Sort the symbols by decreasing probability and repeatedly combine the two smallest probabilities (successive source reduction):

   Character   Probability   Reduction 1   Reduction 2   Reduction 3   Reduction 4   Reduction 5
   F           0.3           0.3           0.3           0.3           0.4           0.6
   C           0.2           0.2           0.2           0.3           0.3
   B           0.15          0.15          0.2           0.2           0.3
   E           0.15          0.15          0.15          0.2
   G           0.1           0.1           0.15
   A           0.05          0.1
   D           0.05
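To cross-check the homework answer (for example against the online calculator mentioned on the next slide), the earlier huffman() sketch can be applied to these probabilities; individual codewords may differ between valid answers because of ties, but the average length will not.

from math import log2

probs = {"A": 0.05, "B": 0.15, "C": 0.2, "D": 0.05, "E": 0.15, "F": 0.3, "G": 0.1}
codes = huffman(probs)                                   # heap-based sketch from earlier
L = sum(p * len(codes[s]) for s, p in probs.items())     # average codeword length
H = -sum(p * log2(p) for p in probs.values())            # source entropy
print(codes, L, H, H / L)                                # code, L, H(X), efficiency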
USE ONLINE CALCULATOR TO CROSS-CHECK
