Information Theory Module 5
Dr. Markkandan S
Human Speech Production Mechanism
• Audio compression refers to the process of reducing the amount of data required
to represent audio signals while maintaining acceptable quality.
• Two main types of compression:
• Lossy compression: Reduces data by removing some audio information, which may
result in a loss of quality (e.g., MP3).
• Lossless compression: Reduces data without any loss of quality (e.g., FLAC).
• Compression reduces file size, transmission time, and storage requirements.
• Applications: Digital media, streaming, telecommunication, and storage.
• Linear predictive coding (LPC) is widely used for speech signal compression. The basic idea is to model the
vocal tract as a linear filter and represent speech as the output of this filter.
• Equation for LPC model:
$$y_n = \sum_{i=1}^{p} a_i \, y_{n-i} + G e_n \qquad (1)$$
where:
• $y_n$ is the current sample,
• $a_i$ are the LPC coefficients,
• $e_n$ is the excitation signal,
• $G$ is the gain factor.
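The recursion in equation (1) can be sketched directly in code. This is a minimal illustration, assuming zero initial conditions; the function name `lpc_synthesize` and the first-order example values are hypothetical, not taken from the slides.

```python
def lpc_synthesize(a, excitation, gain=1.0):
    """Evaluate y[n] = sum_{i=1..p} a[i-1] * y[n-i] + gain * e[n].

    Samples before n = 0 are assumed to be zero (an assumption of this sketch).
    """
    p = len(a)
    y = []
    for n, e_n in enumerate(excitation):
        acc = gain * e_n
        for i in range(1, p + 1):
            if n - i >= 0:
                acc += a[i - 1] * y[n - i]
        y.append(acc)
    return y


# Hypothetical first-order model (a_1 = 0.5) driven by a unit impulse.
print(lpc_synthesize([0.5], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

The output is the impulse response of the all-pole filter, which decays geometrically for this choice of a_1.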
• Applications: Speech coding, synthesis, and recognition.
• The speech signal is modeled by a filter that represents the vocal tract.
• The excitation signal $e_n$ drives this filter.
• LPC analyzes the speech into frames and estimates filter coefficients for each
frame.
• Voicing and pitch information helps generate the excitation signal for LPC.
• Process:
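• Levinson-Durbin algorithm:
1. Initialize: $E_0 = R(0)$
2. For $i = 1$ to $p$:
$$k_i = \frac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E_{i-1}} \qquad (7)$$
$$a_i^{(i)} = k_i \qquad (8)$$
$$a_j^{(i)} = a_j^{(i-1)} - k_i \, a_{i-j}^{(i-1)}, \quad 1 \le j < i \qquad (9)$$
$$E_i = (1 - k_i^2) E_{i-1} \qquad (10)$$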
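The recursion in equations (7) to (10) can be sketched as follows. This is a minimal pure-Python illustration; the function name `levinson_durbin` and the autocorrelation values in the example are illustrative assumptions.

```python
def levinson_durbin(R, p):
    """Return (a, k, E): predictor coefficients a_1..a_p, reflection
    coefficients k_1..k_p, and the final prediction error E_p.

    R is a list of autocorrelation values R(0)..R(p).
    """
    a = []        # a[j-1] holds a_j from the previous order (i-1)
    k = []
    E = R[0]      # step 1: E_0 = R(0)
    for i in range(1, p + 1):
        # Equation (7): reflection coefficient for order i
        num = R[i] - sum(a[j - 1] * R[i - j] for j in range(1, i))
        k_i = num / E
        # Equation (9): update the lower-order coefficients
        new_a = [a[j - 1] - k_i * a[i - j - 1] for j in range(1, i)]
        # Equation (8): the newest coefficient equals k_i
        new_a.append(k_i)
        a = new_a
        k.append(k_i)
        # Equation (10): update the prediction error
        E = (1.0 - k_i * k_i) * E
    return a, k, E


# Illustrative autocorrelation values R(0), R(1), R(2).
a, k, E = levinson_durbin([1.0, 0.5, 0.2], 2)
print([round(x, 4) for x in a])  # [0.5333, -0.0667]
print([round(x, 4) for x in k])  # [0.5, -0.0667]
print(round(E, 4))               # 0.7467
```

Each pass reuses the coefficients of the previous order, so the whole solve costs on the order of p² operations instead of a general matrix inversion.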
• Parameters to transmit:
• Reflection coefficients (instead of direct LPC coefficients)
• Pitch period (for voiced frames)
• Voiced/unvoiced decision
• Gain
• Quantization:
• Reflection coefficients: Non-uniform quantization
$$g_i = \frac{1 + k_i}{1 - k_i} \qquad (11)$$
• Pitch: Logarithmic quantization
• Gain: Logarithmic quantization
• Bit allocation example (LPC-10, 2.4 kbps):
Dr. Markkandan S Module-5 Audio and Video Coding
LPC: Transmission of Parameters
• Process:
1. Decode received parameters
2. Generate excitation signal
3. Synthesize speech using all-pole filter
• Excitation generation:
• Voiced: Impulse train with decoded pitch period
• Unvoiced: White noise generator
• Post-processing:
• De-emphasis filter (inverse of pre-emphasis)
• Adaptive postfiltering to enhance formants
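The decoder steps above (excitation generation, all-pole synthesis, de-emphasis) can be sketched as follows. This is an illustration under stated assumptions, not a standard-conformant LPC-10 decoder: the function names are hypothetical, the de-emphasis coefficient `mu = 0.9375` and the 180-sample frame length are assumed values, and adaptive postfiltering is omitted.

```python
import random


def make_excitation(voiced, n_samples, pitch_period=None, seed=0):
    """Voiced: unit impulse every pitch_period samples. Unvoiced: white Gaussian noise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(a, gain, excitation):
    """y[n] = sum_i a[i-1] * y[n-i] + gain * e[n], with zero initial state."""
    y = []
    for n, e_n in enumerate(excitation):
        acc = gain * e_n
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * y[n - i]
        y.append(acc)
    return y


def de_emphasis(x, mu=0.9375):
    """Inverse of a pre-emphasis filter 1 - mu z^-1: y[n] = x[n] + mu * y[n-1]."""
    y, prev = [], 0.0
    for x_n in x:
        prev = x_n + mu * prev
        y.append(prev)
    return y


def decode_frame(a, gain, voiced, pitch_period, n_samples=180):
    """Decode one frame from already-dequantized parameters (assumed inputs)."""
    e = make_excitation(voiced, n_samples, pitch_period)
    return de_emphasis(all_pole_filter(a, gain, e))


# Hypothetical voiced frame: first-order filter, pitch period of 40 samples.
frame = decode_frame([0.5], 1.0, True, 40)
print(len(frame))  # 180
```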
• CELP as a solution:
• Process:
1. LPC analysis to obtain filter coefficients
2. Search codebooks for best excitation
3. Minimize perceptually weighted error
4. Transmit codebook indices and gains
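Steps 2 to 4 can be illustrated with a simplified analysis-by-synthesis search. This sketch makes several assumptions: it searches a single fixed codebook, uses plain squared error instead of a perceptually weighted error, and starts the synthesis filter from a zero state. All names and values are hypothetical.

```python
def synthesize(a, excitation):
    """Zero-state all-pole synthesis: y[n] = sum_i a[i-1] * y[n-i] + e[n]."""
    y = []
    for n, e_n in enumerate(excitation):
        acc = e_n
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * y[n - i]
        y.append(acc)
    return y


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the codeword whose synthesized output,
    scaled by its optimal gain, is closest to the target in squared error."""
    best = None
    for index, codeword in enumerate(codebook):
        y = synthesize(a, codeword)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero codeword cannot match anything
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical 3-entry codebook and first-order LPC filter.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
a = [0.5]
target = synthesize(a, [0.0, 2.0, 0.0, 0.0])
index, gain, error = search_codebook(target, codebook, a)
print(index, round(gain, 3))  # 1 2.0
```

Only the winning index and gain would need to be transmitted; the decoder holds the same codebook and rebuilds the excitation from them.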
Introduction to Code Excited Linear Prediction (CELP)
• Advantages:
• Challenges:
Figure 6: Block diagram of a CELP encoder and decoder
CELP: Codebook Structure (1/2)
1. Adaptive codebook
2. Fixed (stochastic) codebook
• Adaptive codebook:
• Fixed codebook:
• Excitation signal:
$$e(n) = \beta v(n) + \gamma c(n) \qquad (13)$$
where $v(n)$ is from the adaptive codebook, $c(n)$ is from the fixed codebook, and $\beta$ and $\gamma$ are
the respective gains.
• Random codebook:
• Entries are Gaussian random sequences
• Typically quantized to +1, -1, or 0
• Large storage requirement
• Effect:
• Implementation:
• Benefits:
• Computational complexity:
• Codebook search most intensive
• Fast search algorithms used in practice
CELP: Encoder Operation
• General approach:
1. Time-frequency analysis
2. Psychoacoustic modeling
3. Bit allocation based on perceptual importance
4. Quantization and coding
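Step 3 can be illustrated with a greedy allocation loop. This is a simplified sketch, not the procedure of any particular standard: it assumes the psychoacoustic model has already produced a signal-to-mask ratio (SMR) per band in dB, and that each extra bit improves quantization SNR by about 6.02 dB. The function name and the example values are hypothetical.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15):
    """Give bits one at a time to the band with the highest noise-to-mask ratio."""
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # NMR per band: SMR minus the SNR gained from the bits allocated so far.
        nmr = [
            smr - 6.02 * b if b < max_bits else float("-inf")
            for smr, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == float("-inf"):
            break  # every band is already at its bit limit
        bits[worst] += 1
    return bits


# Hypothetical SMR values for four bands and a pool of six bits.
print(allocate_bits([20.0, 5.0, -3.0, 12.0], 6))  # [3, 1, 0, 2]
```

The band with a negative SMR is already masked and receives no bits in this example, while the band with the largest SMR receives the most.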
• Auditory masking:
• Louder sounds mask quieter sounds
• Masking threshold varies with frequency
• Critical bands:
• Non-uniform frequency resolution of ear
• Approximated by the Bark scale
• Temporal masking:
• Pre-masking: 20 ms before masker
• Post-masking: up to 200 ms after masker
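The non-uniform frequency resolution behind critical bands can be illustrated with a Hz-to-Bark conversion. The sketch below uses the commonly cited Zwicker and Terhardt approximation, which is an assumption here since the slides do not state which formula they use; the function name is hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


# The same 100 Hz step covers more of the Bark scale at low frequencies
# than at high frequencies, reflecting the ear's non-uniform resolution.
low_step = hz_to_bark(200.0) - hz_to_bark(100.0)
high_step = hz_to_bark(10100.0) - hz_to_bark(10000.0)
print(low_step > high_step)  # True
```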
Figure 10: Frequency masking
• Just Noticeable Distortion (JND):
• Minimum perceivable change in sound
• Varies with frequency and intensity
Image source: https://www.researchgate.net/figure/Illustration-of-the-masking-effect_fig2_220009783
• Key features:
• Perceptual coding principles
• Filterbank for time-frequency mapping
• Psychoacoustic model
• Dynamic bit allocation coding
Figure 11: General structure of MPEG audio encoder
Solution
Calculating for each coefficient:
$$g_1 = \frac{1 + 0.5}{1 - 0.5} = \frac{1.5}{0.5} = 3$$
$$g_2 = \frac{1 + (-0.3)}{1 - (-0.3)} = \frac{0.7}{1.3} \approx 0.538$$
$$g_3 = \frac{1 + 0.2}{1 - 0.2} = \frac{1.2}{0.8} = 1.5$$
$$g_4 = \frac{1 + 0.1}{1 - 0.1} = \frac{1.1}{0.9} \approx 1.222$$
Therefore, the g-parameters are approximately: 3, 0.538, 1.5, and 1.222.
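The same calculation can be checked numerically. This is a minimal sketch using the reflection coefficients that appear in the solution (0.5, −0.3, 0.2, 0.1); the function name `g_parameter` is hypothetical.

```python
def g_parameter(k):
    """Map a reflection coefficient k (with |k| < 1) to g = (1 + k) / (1 - k)."""
    if not -1.0 < k < 1.0:
        raise ValueError("reflection coefficient must satisfy |k| < 1")
    return (1.0 + k) / (1.0 - k)


reflection = [0.5, -0.3, 0.2, 0.1]
print([round(g_parameter(k), 3) for k in reflection])  # [3.0, 0.538, 1.5, 1.222]
```

The mapping stretches values of k near ±1, where the filter response is most sensitive to quantization error.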
CELP: Numerical Problem 2
Problem
In a CELP coder, the excitation signal is given by $e(n) = \beta v(n) + \gamma c(n)$. If $\beta = 0.8$,
$\gamma = 0.5$, $v(n) = \{1, -1, 0.5, -0.5\}$, and $c(n) = \{0.2, 0.3, -0.1, 0.1\}$ for a subframe of 4
samples, calculate the excitation signal $e(n)$.
Solution
Calculating sample by sample:
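The sample-by-sample calculation applies e(n) = βv(n) + γc(n) to each of the four samples in the problem statement. A short sketch of that arithmetic follows; the function name `celp_excitation` is hypothetical.

```python
def celp_excitation(beta, v, gamma, c):
    """Combine adaptive and fixed codebook vectors: e(n) = beta*v(n) + gamma*c(n)."""
    return [beta * v_n + gamma * c_n for v_n, c_n in zip(v, c)]


v = [1.0, -1.0, 0.5, -0.5]   # adaptive codebook contribution
c = [0.2, 0.3, -0.1, 0.1]    # fixed codebook contribution
e = celp_excitation(0.8, v, 0.5, c)
print([round(x, 2) for x in e])  # [0.9, -0.65, 0.35, -0.35]
```

For example, the first sample is 0.8(1) + 0.5(0.2) = 0.9 and the second is 0.8(−1) + 0.5(0.3) = −0.65.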
$$k_i = \frac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E_{i-1}}$$
$$a_j^{(i)} = a_j^{(i-1)} - k_i \, a_{i-j}^{(i-1)}, \quad 1 \le j < i$$
$$E_i = (1 - k_i^2) E_{i-1}$$
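Solution
First, calculate $k_1$:
$$k_1 = \frac{R(1)}{R(0)} = \frac{0.5}{1} = 0.5$$
so $a_1^{(1)} = k_1 = 0.5$. Then $E_1$:
$$E_1 = (1 - k_1^2) E_0 = (1 - 0.5^2)(1) = 0.75$$
Now, calculate $k_2$:
$$k_2 = \frac{R(2) - a_1^{(1)} R(1)}{E_1} = \frac{0.2 - 0.5(0.5)}{0.75} = \frac{0.2 - 0.25}{0.75} = -\frac{1}{15} \approx -0.0667$$
Finally, calculate $a_1^{(2)}$:
$$a_1^{(2)} = a_1^{(1)} - k_2 \, a_1^{(1)} = 0.5 - (-0.0667)(0.5) \approx 0.5333$$
Therefore, $k_2 \approx -0.0667$ and $a_1^{(2)} \approx 0.5333$.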
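As a cross-check, the second-order predictor can also be obtained by solving the 2×2 normal equations directly, using R(0) = 1, R(1) = 0.5, R(2) = 0.2 from the solution. The sketch below uses exact fractions and Cramer's rule; this is a different route from the recursion and is shown only as a verification aid.

```python
from fractions import Fraction

R0, R1, R2 = Fraction(1), Fraction(1, 2), Fraction(1, 5)

# Normal equations for order 2:
#   R0*a1 + R1*a2 = R1
#   R1*a1 + R0*a2 = R2
det = R0 * R0 - R1 * R1
a1 = (R1 * R0 - R1 * R2) / det
a2 = (R0 * R2 - R1 * R1) / det

print(a1, a2)                # 8/15 -1/15
print(float(a1), float(a2))
```

The last coefficient a2 of the order-2 solution equals the reflection coefficient k_2, as equation (8) of the recursion requires.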
• 30 frames/second, 16 bits/pixel
• 168 Mbits/second
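The raw bit rate is the product of pixels per frame, frame rate, and bits per pixel. The frame size is not given on the slide; the sketch below assumes 720×486 pixels, which is an assumption chosen because it reproduces a figure close to 168 Mbits/second.

```python
def raw_bitrate(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel


# Assumed frame size: 720 x 486 pixels (not stated in the source).
print(raw_bitrate(720, 486, 30, 16) / 1e6)  # 167.9616
```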
• Advantages:
• I-frames (Intra-coded)
• P-frames (Predictive-coded)
• Benefits:
• Efficient compression
• Flexible access to video
3. Quantization
4. Entropy coding
1. Entropy decoding
2. Inverse quantization
3. Inverse transform
4. Motion compensation
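Decoder steps 2 to 4 can be sketched for a single block. This illustration makes several assumptions: entropy decoding has already produced the quantizer levels, a single uniform quantizer step is used, the transform is an orthonormal DCT on a 4×4 block, and the motion vector keeps the predicted block inside the reference frame. All names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (row u, column x)."""
    rows = []
    for u in range(n):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def motion_compensate(reference, top, left, mv, n):
    """Fetch the n x n prediction block displaced by mv = (dy, dx)."""
    dy, dx = mv
    return [row[left + dx:left + dx + n]
            for row in reference[top + dy:top + dy + n]]


def decode_block(levels, qstep, prediction):
    c = dct_matrix(len(levels))
    coeffs = [[lv * qstep for lv in row] for row in levels]   # inverse quantization
    residual = matmul(matmul(transpose(c), coeffs), c)        # inverse 2-D DCT
    return [[p + r for p, r in zip(p_row, r_row)]             # add the prediction
            for p_row, r_row in zip(prediction, residual)]


# Hypothetical flat reference frame and a block with only a DC level.
reference = [[100.0] * 8 for _ in range(8)]
prediction = motion_compensate(reference, 2, 2, (1, -1), 4)
levels = [[2, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
block = decode_block(levels, 8, prediction)
print(round(block[0][0], 6))  # 104.0 (prediction 100 plus DC residual 4)
```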
• Methods:
• Applications:
• Digital television
• Interactive graphics applications
• Streaming media
MPEG-4: Object-Based Coding
• Temporal Scalability:
• Enhance frame rate
• Base layer + enhancement layer(s)
• Spatial Scalability:
• Enhance spatial resolution
• Use upsampling of base layer
• SNR Scalability:
• Enhance quality (Signal-to-Noise Ratio)
• Refine quantization in enhancement layer
• Object Scalability:
• Selectively transmit or decode objects
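SNR scalability can be illustrated with two quantization layers: a coarse base layer and an enhancement layer that quantizes the base-layer error with a finer step. This is a minimal sketch on a list of sample values, assuming uniform quantizers; the function names and step sizes are hypothetical.

```python
def quantize(x, step):
    return [round(v / step) for v in x]


def dequantize(q, step):
    return [i * step for i in q]


def encode_snr_layers(x, base_step, enh_step):
    """Base layer: coarse quantization. Enhancement layer: finer quantization of the residual."""
    base = quantize(x, base_step)
    residual = [v - r for v, r in zip(x, dequantize(base, base_step))]
    enh = quantize(residual, enh_step)
    return base, enh


def decode_snr_layers(base, enh, base_step, enh_step, use_enhancement=True):
    rec = dequantize(base, base_step)
    if use_enhancement:
        rec = [r + e for r, e in zip(rec, dequantize(enh, enh_step))]
    return rec


x = [10.0, 23.0, -7.0, 5.0]
base, enh = encode_snr_layers(x, base_step=8, enh_step=2)
coarse = decode_snr_layers(base, enh, 8, 2, use_enhancement=False)
fine = decode_snr_layers(base, enh, 8, 2)
print(max(abs(a - b) for a, b in zip(x, coarse)))  # base-layer error, at most 4
print(max(abs(a - b) for a, b in zip(x, fine)))    # with enhancement, at most 1
```

A decoder that receives only the base layer still produces a usable reconstruction; decoding the enhancement layer tightens the error bound from half the base step to half the enhancement step.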
MPEG-4: Shape Coding
• Methods:
• Bitmap-based
• Contour-based