
19ECE312

ITC
Source Coding
Latha Srinivasan
Assistant Professor
Department of ECE
ASE, Bengaluru
Source coding
• Source coding, also known as data compression, refers to the process of encoding information
using fewer bits than the original representation. It is a fundamental concept in information theory
and communication systems, focusing on the efficient representation of data without losing the
original information.
• This task is performed in the source encoder.
• In the context of source coding, source alphabet and code alphabet are fundamental concepts that
describe the symbols used in the communication and encoding processes.
• The source alphabet refers to the set of symbols or characters that the source (e.g., a data
generator, text, or signal) produces.
S = {S1, S2, …, Sq}
• The code alphabet refers to the set of symbols used in the encoded representation of the source
alphabet.
X = {X1, X2, …, Xr}
In binary coding, the code alphabet contains only two symbols:
X = {0, 1}
Properties of the code
Example

Si    Code 1    Code 2    Code 3    Code 4    Code 5    Code 6
S1    00        00        0         0         0         1
S2    01        01        1         10        01        01
S3    00        10        00        110       011       001
S4    11        11        11        111       0111      0001


Classification of Codes
• Fixed-Length Code (or Block Codes)
Each codeword has a fixed length.
Example: Code 1 and Code 2 (length 2).

• Variable-Length Code:
Codewords have different lengths.
Example: All codes except Code 1 and Code 2.

• Distinct Codes (Non-Singular Codes):
Each codeword is unique and distinguishable from others.
All codes except Code 1 are distinct (non-singular).

• Prefix-Free Code:
No codeword is a prefix of another codeword.
Example: Codes 2, 4, and 6 are prefix-free codes.
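
Below is a minimal Python sketch (not part of the original slides) that tests the prefix-free property for the six example codes; the code lists simply transcribe the table above.

```python
# Check the prefix-free property: no codeword may be a prefix of a different codeword.

def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another (different) codeword."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

# Codewords for S1..S4, transcribed from the example table.
codes = {
    "Code 1": ["00", "01", "00", "11"],
    "Code 2": ["00", "01", "10", "11"],
    "Code 3": ["0", "1", "00", "11"],
    "Code 4": ["0", "10", "110", "111"],
    "Code 5": ["0", "01", "011", "0111"],
    "Code 6": ["1", "01", "001", "0001"],
}

for name, words in codes.items():
    print(name, "prefix-free:", is_prefix_free(words))
# Expected output: only Code 2, Code 4 and Code 6 are prefix-free.
```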
Classification of Codes Cont…
• Uniquely Decodable Code:
Every codeword embedded in an encoded sequence can be uniquely identified.
Code 3 is not uniquely decodable because the sequence "1001" can correspond to
multiple source sequences (e.g., S2S1S1S2 and S2S3S2); a brute-force parsing sketch
after this list illustrates the ambiguity.
A sufficient condition for unique decodability is that the code is prefix-free.
• Instantaneous Codes:
A code is instantaneous if no codeword is a prefix of another codeword; this prefix-free
condition is both necessary and sufficient. It allows decoding to proceed as soon as a
codeword is recognized, without waiting for subsequent symbols.
• Optimal Codes:
An optimal code is instantaneous and has a minimum average codeword length, while
adhering to the probability distribution of the source symbols. Symbols with higher
probabilities of occurrence are assigned shorter codewords and vice versa.
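
As a complement to the classification above, here is a small Python sketch (not from the slides) that enumerates every way of parsing the sequence "1001" with Code 3, producing the two decodings mentioned under unique decodability.

```python
# Brute-force parsing of "1001" with Code 3 (S1=0, S2=1, S3=00, S4=11),
# illustrating why Code 3 is not uniquely decodable.

code3 = {"S1": "0", "S2": "1", "S3": "00", "S4": "11"}

def all_parsings(sequence, code):
    """Return every way of splitting `sequence` into codewords of `code`."""
    if sequence == "":
        return [[]]
    results = []
    for symbol, word in code.items():
        if sequence.startswith(word):
            for rest in all_parsings(sequence[len(word):], code):
                results.append([symbol] + rest)
    return results

print(all_parsings("1001", code3))
# Prints two valid parsings, [S2, S1, S1, S2] and [S2, S3, S2],
# so the receiver cannot tell which source sequence was transmitted.
```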
Kraft-McMillan Inequality
The necessary and sufficient condition for the existence of an instantaneous code with a given set of
codeword lengths is
Σ_{i=1}^{q} r^(−l_i) ≤ 1

Where r = number of symbols in the code alphabet X
l_i = word length corresponding to the i-th source symbol
q = number of source symbols
For binary codes (r = 2):
Σ_{i=1}^{q} 2^(−l_i) ≤ 1

The unit of codeword length is the binit, short for binary digit.
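
A short Python sketch (added here for illustration, not part of the slides) that evaluates the Kraft sum for the binary example codes; the lengths are read off the earlier table.

```python
# Evaluate the Kraft sum  sum_i r**(-l_i)  for binary codes (r = 2).
# An instantaneous code with the given codeword lengths exists iff the sum is <= 1.

def kraft_sum(lengths, r=2):
    return sum(r ** (-l) for l in lengths)

examples = {
    "Code 2": [2, 2, 2, 2],   # 00, 01, 10, 11
    "Code 4": [1, 2, 3, 3],   # 0, 10, 110, 111
    "Code 6": [1, 2, 3, 4],   # 1, 01, 001, 0001
}

for name, lengths in examples.items():
    s = kraft_sum(lengths)
    print(f"{name}: Kraft sum = {s:.4f} -> {'satisfied' if s <= 1 else 'violated'}")
# Code 2 and Code 4 give exactly 1.0, Code 6 gives 0.9375, so all three satisfy the inequality.
```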
Code Efficiency
• Code efficiency measures how closely a given coding scheme approaches the theoretical
minimum average code length dictated by the source’s entropy.
It is defined as
η_c = H(S) / L        for a binary code
η_c = H_r(S) / L      for an r-ary code, where H_r(S) is the source entropy measured in r-ary units

Where H(S): Entropy of the source (average information content per symbol, in bits)
L: Average code length per symbol for the given coding scheme and is given by
L = Σ_{i=1}^{q} p_i l_i   binits/symbol

Where p_i is the probability of the i-th symbol and
l_i is the codeword length of the i-th symbol.
Code Efficiency

• Maximum efficiency is achieved when L = H(S), i.e., when the average code length
equals the source's entropy. Efficiency is 100% in this case.
• In practical scenarios, L is slightly greater than H(S) due to constraints such as integer
codeword lengths and prefix-free coding.

Code Redundancy:
Code redundancy is R_c = 1 − η_c
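
To make these definitions concrete, here is a small Python sketch (not from the slides); the source probabilities are assumed purely for illustration, and the codeword lengths are those of Code 4.

```python
import math

# Entropy H(S), average length L, efficiency and redundancy for a binary code.
p = [0.5, 0.25, 0.125, 0.125]   # assumed probabilities of S1..S4 (illustrative only)
l = [1, 2, 3, 3]                # codeword lengths of Code 4 (0, 10, 110, 111)

H = -sum(pi * math.log2(pi) for pi in p)   # entropy in bits/symbol
L = sum(pi * li for pi, li in zip(p, l))   # average length in binits/symbol
efficiency = H / L                         # eta_c = H(S) / L
redundancy = 1 - efficiency                # R_c = 1 - eta_c

print(f"H(S) = {H:.3f} bits/symbol, L = {L:.3f} binits/symbol")
print(f"efficiency = {efficiency:.1%}, redundancy = {redundancy:.1%}")
# With these (dyadic) probabilities L equals H(S), so the efficiency is 100%.
```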
Shannon's Noiseless Coding Theorem
Shannon developed a simplified method to determine the length of each codeword in an optimal
coding scheme. This method is based on the probabilities of the symbols in the source alphabet.
The length l_i of the codeword assigned to a symbol S_i with probability p_i can be approximated as
l_i ≈ ⌈−log2(p_i)⌉
High Probability Symbols: For symbols that are more likely (i.e., have higher probabilities),
the codeword length will be shorter, as the negative logarithm of a larger number is closer to zero.
Low Probability Symbols: For symbols that are less likely (i.e., have smaller probabilities),
the codeword length will be longer.
The ceiling function ensures that the codeword lengths are integers
(since you cannot have a fractional number of bits).
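
The rule above is easy to try out; the following Python sketch (using an assumed probability distribution, added for illustration) computes the Shannon codeword lengths and compares the resulting average length with the entropy.

```python
import math

# Shannon's rule: l_i = ceil(-log2(p_i)) for each source symbol S_i.
p = {"S1": 0.4, "S2": 0.3, "S3": 0.2, "S4": 0.1}   # assumed probabilities (illustrative)

lengths = {s: math.ceil(-math.log2(pi)) for s, pi in p.items()}
L = sum(pi * lengths[s] for s, pi in p.items())
H = -sum(pi * math.log2(pi) for pi in p.values())

print("codeword lengths:", lengths)   # {'S1': 2, 'S2': 2, 'S3': 3, 'S4': 4}
print(f"L = {L:.2f} binits/symbol, H(S) = {H:.2f} bits/symbol")
# The average length (2.40) exceeds the entropy (about 1.85), consistent with L >= H(S).
```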
Shannon's Noiseless Coding Theorem Cont….

• It is possible to encode a source in such a way that the average number of bits used to
represent each symbol is arbitrarily close to the entropy of the source.
• Conversely, the minimum average code length required to represent a source symbol cannot be
smaller than the entropy of the source.

Shannon's Noiseless Coding Theorem states that:

• The average code length L for a given source S must satisfy the inequality
L ≥ H(S)
