Source Coding
ITC
Latha Srinivasan
Assistant Professor
Department of ECE
ASE, Bengaluru
Source coding
• Source coding, also known as data compression, refers to the process of encoding information
using fewer bits than the original representation. It is a fundamental concept in information theory
and communication systems, focusing on the efficient representation of data without losing the
original information.
• This task is performed by the source encoder.
• In the context of source coding, source alphabet and code alphabet are fundamental concepts that
describe the symbols used in the communication and encoding processes.
• The source alphabet refers to the set of symbols or characters that the source (e.g., a data
generator, text, or signal) produces.
S = {S1, S2, …, Sq}
• The code alphabet refers to the set of symbols used in the encoded representation of the source
alphabet.
X = {X1, X2, …, Xr}
In binary coding, the code alphabet contains only two symbols:
X = {0, 1}
Properties of the Code
Example:
Symbol   Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
S1       00       00       0        0        0        1
S2       01       01       1        10       01       01
• Variable-Length Code:
Codewords have different lengths.
Example: All codes except Code 1 and Code 2.
• Prefix-Free Code:
No codeword is a prefix of another codeword.
Example: Codes 2, 4, and 6 are prefix-free codes.
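As an illustration, here is a minimal Python sketch (not from the slides) that tests the prefix-free property; the two codeword sets below are the S1/S2 entries of Codes 4 and 5 from the example table above.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

# S1/S2 entries from the example table above
print(is_prefix_free(["0", "10"]))   # Code 4 entries: True (prefix-free)
print(is_prefix_free(["0", "01"]))   # Code 5 entries: False, "0" is a prefix of "01"
```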
Classification of Codes Cont…
• Uniquely Decodable Code:
Every codeword embedded in an encoded sequence can be uniquely identified.
Code 3 is not uniquely decodable because the sequence "1001" can correspond to
multiple source sequences (e.g., s2s1s1s2 and s2s3s2); a brute-force check of this
ambiguity is sketched after this list.
A sufficient condition for unique decodability is that the code is prefix-free.
• Instantaneous Codes:
A code is instantaneous if no codeword is a prefix of another codeword. This allows
decoding to proceed as soon as a codeword is recognized, without waiting for
subsequent symbols. The necessary and sufficient condition for a code to be
instantaneous is that no codeword is a prefix of another.
• Optimal Codes:
An optimal code is instantaneous and has a minimum average codeword length, while
adhering to the probability distribution of the source symbols. Symbols with higher
probabilities of occurrence are assigned shorter codewords and vice versa.
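The non-unique decodability of Code 3 noted above can be verified by brute force. The sketch below (not from the slides) assumes the Code 3 assignment implied by that example, s1 → 0, s2 → 1, s3 → 00, and enumerates every way the string "1001" splits into codewords.

```python
def all_parses(encoded, code):
    """Enumerate every decomposition of 'encoded' into codewords of 'code'."""
    results = []

    def walk(rest, parsed):
        if not rest:
            results.append(parsed)
            return
        for sym, cw in code.items():
            if rest.startswith(cw):
                walk(rest[len(cw):], parsed + [sym])

    walk(encoded, [])
    return results

# Code 3 assignment implied by the ambiguity example above (assumed)
code3 = {"s1": "0", "s2": "1", "s3": "00"}
print(all_parses("1001", code3))
# [['s2', 's1', 's1', 's2'], ['s2', 's3', 's2']]  -> two valid parses, so not uniquely decodable
```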
Kraft-McMillan Inequality
The necessary and sufficient condition for the existence of an instantaneous code with a given set of
codeword lengths is
$$\sum_{i=1}^{q} r^{-l_i} \le 1 \qquad \text{(r-ary code alphabet)}$$
$$\sum_{i=1}^{q} 2^{-l_i} \le 1 \qquad \text{(binary code alphabet)}$$
The unit of codeword length is the binit, which is short for binary digit.
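A quick numerical check of the inequality for a binary code alphabet (r = 2); the length sets below are illustrative assumptions, not from the slides.

```python
def kraft_sum(lengths, r=2):
    """Left-hand side of the Kraft-McMillan inequality for the given codeword lengths."""
    return sum(r ** (-l) for l in lengths)

# Lengths 1, 2, 3, 3 satisfy the inequality, so an instantaneous binary code exists
print(kraft_sum([1, 2, 3, 3]))   # 1.0  <= 1
# Lengths 1, 1, 2 violate it, so no instantaneous binary code has these lengths
print(kraft_sum([1, 1, 2]))      # 1.25 > 1
```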
Code Efficiency
• Code efficiency measures how closely a given coding scheme approaches the theoretical
minimum average code length dictated by the source’s entropy.
It is defined as
$$\eta_c = \frac{H(S)}{L} \quad \text{for a binary code}, \qquad \eta_c = \frac{H_r(S)}{L} \quad \text{for an r-ary code}$$
where H(S) is the entropy of the source (average information content per symbol, in bits)
and L is the average code length per symbol for the given coding scheme, given by
$$L = \sum_{i=1}^{q} p_i l_i \ \text{binits/symbol}$$
• Maximum Efficiency is achieved when 𝐿=𝐻(S), i.e., when the average code length
equals the source's entropy. Efficiency is 100% in this case.
• In practical scenarios, L is slightly greater than 𝐻(S) due to constraints such as integer
codeword lengths and prefix-free coding.
Code Redundancy
Code redundancy is defined as $R_c = 1 - \eta_c$.
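The sketch below puts these definitions together for an assumed source with probabilities {0.4, 0.3, 0.2, 0.1} and an assumed prefix-free binary code of lengths {2, 2, 3, 4}; it illustrates the practical case where L exceeds H(S).

```python
from math import log2

def entropy(probs):
    """H(S): average information content in bits/symbol."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

def avg_length(probs, lengths):
    """L = sum of p_i * l_i, in binits/symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

probs   = [0.4, 0.3, 0.2, 0.1]   # assumed source statistics
lengths = [2, 2, 3, 4]           # assumed prefix-free binary code (Kraft sum = 0.6875 <= 1)

H = entropy(probs)               # ~1.846 bits/symbol
L = avg_length(probs, lengths)   # 2.4 binits/symbol
efficiency = H / L               # ~0.769, i.e. about 76.9 %
redundancy = 1 - efficiency      # ~0.231
print(H, L, efficiency, redundancy)
```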
Shannon's Noiseless Coding Theorem
Shannon developed a simplified method to determine the length of each codeword in an optimal
coding scheme. This method is based on the probabilities of the symbols in the source alphabet.
The length $l_i$ of the codeword assigned to a symbol $S_i$ with probability $p_i$ is chosen as
$$l_i = \lceil -\log_2 p_i \rceil$$
High Probability Symbols: For symbols that are more likely (i.e., have higher probabilities),
the codeword length will be shorter, as the negative logarithm of a larger number is closer to zero.
Low Probability Symbols: For symbols that are less likely (i.e., have smaller probabilities),
the codeword length will be longer.
The ceiling function ensures that the codeword lengths are integers
(since you cannot have a fractional number of bits).
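A small sketch of this length assignment, using assumed symbol probabilities; it reproduces the lengths {2, 2, 3, 4} used in the efficiency sketch above.

```python
from math import ceil, log2

def shannon_lengths(probs):
    """Codeword lengths l_i = ceil(-log2(p_i)) for each symbol probability."""
    return [ceil(-log2(p)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]     # assumed source statistics
print(shannon_lengths(probs))    # [2, 2, 3, 4]

# These lengths automatically satisfy the Kraft inequality, so a prefix-free code exists
print(sum(2 ** -l for l in shannon_lengths(probs)))   # 0.6875 <= 1
```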
Shannon's Noiseless Coding Theorem Cont….
• It is possible to encode a source in such a way that the average number of bits used to
represent each symbol is arbitrarily close to the entropy of the source.
• In other words, the minimum average code length required to represent a source symbol cannot be
smaller than the entropy of the source.
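As a worked illustration (an assumed dyadic source, not from the slides): for probabilities $p = \{1/2, 1/4, 1/8, 1/8\}$, the rule $l_i = \lceil -\log_2 p_i \rceil$ gives lengths $\{1, 2, 3, 3\}$, so
$$L = \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(3) = 1.75 \ \text{binits/symbol} = H(S),$$
and the code is 100 % efficient; for non-dyadic probabilities the ceiling makes $L$ exceed $H(S)$, but always $H(S) \le L < H(S) + 1$.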