
Module-5

Audio and Video Coding

Dr. Markkandan S

School of Electronics Engineering (SENSE)


Vellore Institute of Technology
Chennai

Human Speech Production Mechanism

Figure 1: Human Speech Production Mechanism


Introduction to Audio Compression

• Audio compression refers to the process of reducing the amount of data required
to represent audio signals while maintaining acceptable quality.
• Two main types of compression:
• Lossy compression: Reduces data by removing some audio information, which may
result in a loss of quality (e.g., MP3).
• Lossless compression: Reduces data without any loss of quality (e.g., FLAC).
• Compression reduces file size, transmission time, and storage requirements.
• Applications: Digital media, streaming, telecommunication, and storage.



Introduction to Audio Coding

• Audio coding: compression of audio signals for efficient storage/transmission


• Objectives:
• Reduce bit rate while maintaining perceptual quality
• Enable efficient storage and transmission of audio
• Key approaches:
• Waveform coding: Directly encode audio samples
• Parametric coding: Encode model parameters (e.g., LPC)
• Perceptual coding: Exploit human auditory system limitations
• Trade-offs:
• Compression ratio vs. audio quality
• Computational complexity vs. performance
• Delay vs. coding efficiency
• Applications: Digital audio broadcasting, VoIP, music streaming, etc.
Types of Audio Coding Techniques

• Common audio coding techniques:


• Pulse Code Modulation (PCM)
• Differential Pulse Code Modulation (DPCM)
• Adaptive Differential PCM (ADPCM)
• Linear Predictive Coding (LPC)
• Code-Excited Linear Prediction (CELP)
• Perceptual Audio Coding (e.g., MPEG Audio)
• PCM: Direct sampling and quantization of the audio signal.
• DPCM and ADPCM: Reduce redundancy in the signal by encoding differences
between samples.
• LPC and CELP: Use models of human speech production to achieve efficient
compression.



Linear Predictive Coding(LPC)
Overview of Linear Predictive Coding (LPC)

• LPC: Efficient parametric coding technique for speech


• Core idea: Model speech production process
• Key components:
• Excitation source model
• Vocal tract filter model
• Process:
1. Analyze speech to extract model parameters
2. Transmit parameters (not raw audio)
3. Synthesize speech at receiver using parameters
• Advantages:
• Very low bit rate (2.4 kbps - 4.8 kbps)
• Good intelligibility for speech
• Limitations:
• "Robotic" sound quality
• Not suitable for non-speech audio
Linear Predictive Coding (LPC): Overview

• LPC is widely used for speech signal compression. The basic idea is to model the
vocal tract as a linear filter and represent speech as the output of this filter.
• Equation for LPC model:
$y_n = \sum_{i=1}^{p} a_i y_{n-i} + G e_n$ (1)

where:
• $y_n$ is the current sample,
• $a_i$ are the LPC coefficients,
• $e_n$ is the excitation signal,
• $G$ is the gain factor.
• Applications: Speech coding, synthesis, and recognition.



LPC: Speech Production Model

• Models speech as output of a time-varying linear system


• Two main components:
• Excitation source: Models airflow from lungs
• Vocal tract filter: Models acoustic properties of vocal tract
• Excitation types:
• Voiced: Quasi-periodic pulses (e.g., vowels)
• Unvoiced: White noise (e.g., fricatives)
• Vocal tract modeled as an all-pole filter:
$H(z) = \dfrac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$ (2)
where $G$ is the gain, $p$ is the filter order, and $a_k$ are the filter coefficients
• Time-domain representation:
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n)$ (3)
LPC: Encoder and Decoder

Figure 2: LPC encoder and Decoder


LPC: Speech Signal Modeling

• The speech signal is modeled by a filter that represents the vocal tract.
• The excitation signal $e_n$ drives this filter.
• LPC divides the speech into frames and estimates the filter coefficients for each
frame.

Figure 3: Speech synthesis model



LPC: Vocal Tract Filter

• All-pole filter approximates vocal tract resonances (formants)


• Transfer function:
$H(z) = \dfrac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$ (4)

• Typical filter order: 10-12 for 8 kHz sampled speech


• Estimation of filter coefficients:
• Minimize mean squared prediction error
• Autocorrelation method:
$\mathbf{R}\mathbf{a} = \mathbf{r}$ (5)
where $\mathbf{R}$ is the autocorrelation matrix, $\mathbf{a}$ is the coefficient vector, and $\mathbf{r}$ is the
autocorrelation vector
• Solved efficiently using Levinson-Durbin recursion
• Stability ensured by converting to reflection coefficients
• Quantization: Non-uniform quantization of reflection coefficients
LPC: Vocal Tract Filter

Figure 4: Model for speech synthesis with vocal tract filter



LPC: Excitation Source

• Two types of excitation:


1. Voiced excitation:
• Quasi-periodic pulse train
• Parameters: Pitch period, voiced/unvoiced decision
• Pitch detection algorithms:
• Time-domain: Autocorrelation, AMDF
• Frequency-domain: Harmonic peak detection
2. Unvoiced excitation:
• White noise generator
• No additional parameters needed
• Voiced/Unvoiced decision:
• Based on features like:
• Short-term energy
• Zero-crossing rate
• First reflection coefficient
• Excitation gain:
LPC: Excitation Source



LPC: Voicing and Pitch Detection

• Voicing decision: Determines if the segment is voiced or unvoiced.

• Voiced speech has a periodic structure, unvoiced speech resembles noise.

• Pitch period estimation: A critical step for voiced speech.

• The pitch period is extracted using autocorrelation or the average magnitude difference
function (AMDF):
$\text{AMDF}(P) = \dfrac{1}{N} \sum_{i=1}^{N} |y_i - y_{i-P}|$ (6)

• Voicing and pitch information helps generate the excitation signal for LPC.
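A minimal NumPy sketch of AMDF-based pitch estimation per equation (6); the lag search range (covering roughly 50-400 Hz pitch at 8 kHz sampling) and the synthetic test frame are illustrative assumptions.

```python
import numpy as np

def amdf_pitch(frame, fs=8000, min_lag=20, max_lag=160):
    """Estimate the pitch period as the lag P minimizing
    AMDF(P) = (1/N) * sum |y[i] - y[i-P]|  (equation 6)."""
    scores = [np.mean(np.abs(frame[P:] - frame[:-P]))
              for P in range(min_lag, max_lag + 1)]
    best_lag = min_lag + int(np.argmin(scores))
    return best_lag, fs / best_lag       # pitch period (samples), pitch frequency (Hz)

# Example: a synthetic voiced-like frame with a 100 Hz fundamental at 8 kHz
t = np.arange(400) / 8000
frame = np.sign(np.sin(2 * np.pi * 100 * t))
print(amdf_pitch(frame))                 # expected lag near 80 samples (~100 Hz)
```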



LPC: Voicing and Pitch Detection - AMDF

AMDF for ’e’ and ’s’


Pitch Detection
LPC: Parameter Estimation

• Goal: Estimate LPC model parameters from speech signal

• Process:

1. Pre-emphasis: High-pass filter to flatten spectral slope


2. Framing: Divide speech into 20-30 ms frames
3. Windowing: Apply window function (e.g., Hamming) to each frame
4. Autocorrelation: Compute autocorrelation coefficients
5. Levinson-Durbin recursion: Solve for LPC coefficients
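The first four steps can be sketched in a few lines of NumPy; the frame length, pre-emphasis coefficient 0.97, and order 10 below are illustrative assumptions rather than values fixed by the slides.

```python
import numpy as np

def lpc_autocorrelation(frame, order=10, alpha=0.97):
    """Pre-emphasize, window, and autocorrelate one speech frame (steps 1-4)."""
    # Step 1: pre-emphasis high-pass filter y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(frame[0], frame[1:] - alpha * frame[:-1])
    # Step 3: Hamming window to reduce edge discontinuities
    windowed = emphasized * np.hamming(len(emphasized))
    # Step 4: autocorrelation values R(0)..R(order)
    full = np.correlate(windowed, windowed, mode="full")
    mid = len(windowed) - 1
    return full[mid:mid + order + 1]

# Example: one 20 ms frame (160 samples at 8 kHz) of a synthetic 200 Hz tone
frame = np.sin(2 * np.pi * 200 * np.arange(160) / 8000)
R = lpc_autocorrelation(frame)
print(R[:3])  # R(0), R(1), R(2) feed the Levinson-Durbin recursion (step 5)
```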



LPC: Parameter Estimation

• Levinson-Durbin algorithm:

1. Initialize: $E_0 = R(0)$
2. For $i = 1$ to $p$:
$k_i = \dfrac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E_{i-1}}$ (7)
$a_i^{(i)} = k_i$ (8)
$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j < i$ (9)
$E_i = (1 - k_i^2)\, E_{i-1}$ (10)

• Output: LPC coefficients $a_i$ and reflection coefficients $k_i$
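A direct NumPy transcription of equations (7)-(10); the function name and the toy autocorrelation values are illustrative, and the printed result can be checked against Numerical Problem 3 later in this module.

```python
import numpy as np

def levinson_durbin(R, order):
    """Solve for LPC coefficients a_1..a_p and reflection coefficients k_1..k_p
    from autocorrelation values R[0..p], following equations (7)-(10)."""
    a = np.zeros(order + 1)              # a[j] holds a_j; a[0] is unused
    k = np.zeros(order + 1)
    E = R[0]                             # E_0 = R(0)
    for i in range(1, order + 1):
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k[i] = acc / E                   # equation (7)
        a_new = a.copy()
        a_new[i] = k[i]                  # equation (8)
        for j in range(1, i):
            a_new[j] = a[j] - k[i] * a[i - j]   # equation (9)
        a = a_new
        E = (1.0 - k[i] ** 2) * E        # equation (10)
    return a[1:], k[1:]

# Toy check with R(0)=1, R(1)=0.5, R(2)=0.2
a, k = levinson_durbin(np.array([1.0, 0.5, 0.2]), order=2)
print(np.round(k, 4), np.round(a, 4))   # k ≈ [0.5, -0.0667], a ≈ [0.5333, -0.0667]
```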



LPC: Transmission of Parameters

• Parameters to transmit:
• Reflection coefficients (instead of direct LPC coefficients)
• Pitch period (for voiced frames)
• Voiced/unvoiced decision
• Gain

• Quantization:
• Reflection coefficients: Non-uniform quantization via
$g_i = \dfrac{1 + k_i}{1 - k_i}$ (11)
• Pitch: Logarithmic quantization
• Gain: Logarithmic quantization

• Bit allocation example (LPC-10, 2.4 kbps):
LPC: Transmission of Parameters

• Reflection coefficients: 41 bits


• Pitch and V/UV: 7 bits
• Gain: 5 bits
• Synchronization: 1 bit

• Frame duration: 22.5 ms (180 samples at 8 kHz)

• Resulting bit rate: 54 bits / 22.5 ms ≈ 2400 bps



LPC: Speech Synthesis at Receiver

• Process:
1. Decode received parameters
2. Generate excitation signal
3. Synthesize speech using all-pole filter

• Excitation generation:
• Voiced: Impulse train with decoded pitch period
• Unvoiced: White noise generator

• All-pole filter implementation:
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n)$ (12)

• Overlap-add successive frames to reduce discontinuities
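A sketch of the synthesis filter in equation (12) using SciPy's IIR filter; the excitation, gain, and single predictor coefficient are toy assumptions for illustration.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_synthesize(a, excitation, gain=1.0):
    """All-pole synthesis s(n) = sum_k a_k s(n-k) + G u(n),
    i.e. filtering the excitation through H(z) = G / (1 - sum_k a_k z^-k)."""
    denominator = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return lfilter([gain], denominator, excitation)

# Voiced example: impulse train with an 80-sample pitch period (100 Hz at 8 kHz)
excitation = np.zeros(400)
excitation[::80] = 1.0
speech = lpc_synthesize([0.9], excitation, gain=0.5)   # toy first-order predictor
print(speech[:5])
```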


LPC: Speech Synthesis at Receiver

• Post-processing:
• De-emphasis filter (inverse of pre-emphasis)
• Adaptive postfiltering to enhance formants



Limitations of LPC and Need for CELP

• Limitations of basic LPC:

• "Buzzy" or "robotic" sound quality


• Poor representation of unvoiced and transient sounds
• Binary voiced/unvoiced decision too simplistic
• Limited pitch resolution
• Sensitive to transmission errors

• Reasons for limitations:

• Oversimplified excitation model


• All-pole filter may not capture all spectral details
• Frame-based analysis loses some temporal resolution
Limitations of LPC and Need for CELP

• Need for improvement:

• Better excitation model


• Finer pitch and spectral representation
• Improved perceptual quality at low bit rates

• CELP as a solution:

• Addresses LPC limitations


• Uses codebook of excitation vectors
• Incorporates perceptual weighting
• Achieves better quality at similar bit rates



Code Excited Linear Prediction (CELP)
Introduction to Code Excited Linear Prediction (CELP)

• Key idea: Use codebook of excitation vectors


• Components:
• LPC filter (as in traditional LPC)
• Adaptive codebook (for pitch structure)
• Fixed (stochastic) codebook (for residual excitation)
• Perceptual weighting filter

• Process:
1. LPC analysis to obtain filter coefficients
2. Search codebooks for best excitation
3. Minimize perceptually weighted error
4. Transmit codebook indices and gains
Introduction to Code Excited Linear Prediction (CELP)

• Advantages:

• Improved speech quality compared to LPC


• Efficient at low bit rates (4.8-16 kbps)
• Handles both voiced and unvoiced speech well

• Challenges:

• Computationally intensive codebook search


• Requires larger codebooks for higher quality



Introduction to Code Excited Linear Prediction (CELP)

Figure 6: Block diagram of a CELP encoder and decoder
CELP: Codebook Structure(1/2)

• Two main codebooks in CELP:

1. Adaptive codebook
2. Fixed (stochastic) codebook

• Adaptive codebook:

• Models pitch periodicity and long-term correlations


• Constructed from past excitation signals
• Updated each subframe
• Typically 128-256 entries



CELP: Codebook Structure(2/2)

• Fixed codebook:

• Models residual excitation after pitch prediction


• Designed to cover range of possible excitations
• Typically 512-1024 entries
• Various structures: Binary, ternary, sparse algebraic

• Excitation signal:
$e(n) = \beta v(n) + \gamma c(n)$ (13)
where $v(n)$ is from the adaptive codebook, $c(n)$ is from the fixed codebook, and $\beta$ and $\gamma$ are the
respective gains



CELP: Stochastic Codebook

• Purpose: Model residual excitation not captured by adaptive codebook


• Types of stochastic codebooks:
• Random codebook: Gaussian random sequences
• Algebraic codebook: Structured sparse vectors

• Random codebook:
• Entries are Gaussian random sequences
• Typically quantized to +1, -1, or 0
• Large storage requirement

• Algebraic codebook (ACELP):


• Sparse vectors with few non-zero pulses
• Pulse positions and signs determined by algebraic structure
CELP: Adaptive Codebook

• Purpose: Model pitch periodicity and long-term correlations


• Structure:
• Contains past excitation signals
• Updated each subframe
• Typically 128-256 entries

• Adaptive codebook vector:
$v(n) = e(n - T + i), \quad i = 0, 1, \ldots, N-1$ (16)
where $T$ is the pitch lag and $N$ is the subframe size
• Fractional pitch:
• Allows finer pitch resolution (e.g., 1/3 or 1/4 sample)
• Requires interpolation of past excitation
CELP: Perceptual Weighting

• Purpose: Shape quantization noise to be less perceptible

• Concept: Exploit masking properties of human auditory system

• Perceptual weighting filter:
$W(z) = \dfrac{A(z/\gamma_1)}{A(z/\gamma_2)}$ (18)
where $A(z)$ is the LPC analysis filter; typically $\gamma_1 = 0.9$, $\gamma_2 = 0.5$

• Effect:

• Attenuates error in formant regions


• Amplifies error in spectral valleys



CELP: Perceptual Weighting

• Implementation:

• Apply W (z) to error signal in codebook search


• Minimize weighted error:
$E_w = \sum_{n=0}^{N-1} [x_w(n) - \hat{x}_w(n)]^2$ (19)

• Benefits:

• Improved subjective quality


• Better allocation of bits to perceptually important regions
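A sketch of how $W(z)$ in equation (18) can be built by bandwidth-expanding the LPC polynomial: replacing $z$ by $z/\gamma$ scales the $k$-th coefficient by $\gamma^k$. The LPC coefficients and error signal below are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(a, gamma1=0.9, gamma2=0.5):
    """Return numerator and denominator of W(z) = A(z/gamma1) / A(z/gamma2),
    where A(z) = 1 - sum_k a_k z^-k (equation 18)."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # [1, -a_1, ..., -a_p]
    powers = np.arange(len(A))
    return A * gamma1 ** powers, A * gamma2 ** powers

# Weight an error signal before the codebook search (toy LPC coefficients)
num, den = perceptual_weighting([1.2, -0.6, 0.1])
error = np.random.randn(160)
weighted_error = lfilter(num, den, error)
print(num, den)
```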



CELP: Encoder Operation

• CELP encoding steps:
1. LPC analysis: Compute coefficients, convert to LSP
2. Subframe processing (typically 4 per frame)
3. Adaptive codebook search: Find best pitch lag and gain
4. Fixed codebook search: Find best index and gain
5. Parameter quantization: LSPs, pitch, indices, gains
6. Update memories for next frame
• Bit allocation example (FS1016, 4.8 kbps):
• LSP: 34 bits/frame
• Pitch: 8 bits/subframe
• Fixed codebook: 9 bits/subframe
• Gains: 5 bits/subframe
• Computational complexity:
• Codebook search most intensive
• Fast search algorithms used in practice
CELP: Encoder Operation

Figure 7: Block diagram of CELP encoder

Image source: https://www.researchgate.net/figure/Block-diagram-of-CELP-encoder_fig1_264423574



CELP: Decoder Operation

• Steps in CELP decoding:
1. Parameter decoding
2. LSP to LPC conversion
3. For each subframe:
• Adaptive codebook contribution
• Fixed codebook contribution
• Excitation reconstruction
• LPC synthesis filtering
4. Post-processing
• Excitation reconstruction:
$e(n) = \beta v(n) + \gamma c(n)$ (20)
• LPC synthesis:
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + e(n)$ (21)


CELP: Decoder Operation

Figure 8: Block diagram of CELP decoder

Image source: https://www.researchgate.net/figure/Block-diagram-of-CELP-decoder_fig2_264423574


CELP: Examples (FS1016, G.728)

Federal Standard 1016 (4.8 kbps):
• Frame size: 30 ms
• Subframes: 4 × 7.5 ms
• LPC order: 10
• Adaptive codebook: 128 entries
• Fixed codebook: 512 entries
• Bit allocation:
• LSP: 34 bits/frame
• Pitch: 8 bits/subframe
• Fixed codebook: 9 bits/subframe
• Gains: 5 bits/subframe

ITU-T G.728 (16 kbps):
• Frame size: 0.625 ms (5 samples)
• LPC order: 50
• Backward adaptive prediction
• No explicit pitch prediction
• Shape-gain vector quantization
• Bit allocation:
• Shape codebook: 7 bits/frame
• Gain codebook: 3 bits/frame
• Low delay: 0.625 ms


Perceptual Coding & MPEG
Introduction to Perceptual Coding

• Goal: Exploit limitations of human auditory system


• Key principles:
• Auditory masking
• Critical band analysis
• Temporal masking

• General approach:
1. Time-frequency analysis
2. Psychoacoustic modeling
3. Bit allocation based on perceptual importance
4. Quantization and coding

• Advantages:


Introduction to Perceptual Coding

Figure 9: General structure of a perceptual audio coder



Image source: https://www.researchgate.net/figure/Basic-structure-of-perceptual-audio-coder_fig1_224345681
Psychoacoustic Principles in Audio Coding

• Auditory masking:
• Louder sounds mask quieter sounds
• Masking threshold varies with frequency
• Critical bands:
• Non-uniform frequency resolution of ear
• Approximated by bark scale
• Temporal masking:
• Pre-masking: 20 ms before masker
• Post-masking: up to 200 ms after masker
• Just Noticeable Distortion (JND):
• Minimum perceivable change in sound
• Varies with frequency and intensity

Figure 10: Frequency masking
Image source: https://www.researchgate.net/figure/Illustration-of-the-masking-effect_fig2_220009783



MPEG Audio Coding: Overview

• MPEG: Moving Picture Experts Group

• Audio coding standards:

• MPEG-1 Audio (1992)


• MPEG-2 Audio (1994)
• MPEG-4 Audio (1999)

• MPEG-1 Audio Layers:

• Layer I: Simplest, lowest compression


• Layer II: Improved compression
• Layer III (MP3): Highest compression



MPEG Audio Coding: Overview

• Key features:
• Perceptual coding principles
• Filterbank for time-frequency mapping
• Psychoacoustic model
• Dynamic bit allocation and coding

Figure 11: General structure of MPEG audio encoder
Image source: https://www.researchgate.net/figure/Block-diagram-of-a-generic-MPEG-1-Layer-I-II-encoder_fig1_224345681
LPC: Numerical Problem 1
Problem
Given an LPC model of order 4 with the following reflection coefficients: $k_1 = 0.5$, $k_2 = -0.3$,
$k_3 = 0.2$, $k_4 = 0.1$. Calculate the g-parameters for transmission using the equation $g_i = \dfrac{1 + k_i}{1 - k_i}$.

Solution
Calculating for each coefficient:
$g_1 = \dfrac{1 + 0.5}{1 - 0.5} = \dfrac{1.5}{0.5} = 3$
$g_2 = \dfrac{1 + (-0.3)}{1 - (-0.3)} = \dfrac{0.7}{1.3} \approx 0.538$
$g_3 = \dfrac{1 + 0.2}{1 - 0.2} = \dfrac{1.2}{0.8} = 1.5$
$g_4 = \dfrac{1 + 0.1}{1 - 0.1} = \dfrac{1.1}{0.9} \approx 1.222$
Therefore, the g-parameters are approximately 3, 0.538, 1.5, and 1.222.
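A quick check of the computation above (a short sketch; rounding is only for display).

```python
# Verify g_i = (1 + k_i) / (1 - k_i) for the reflection coefficients above
k = [0.5, -0.3, 0.2, 0.1]
print([round((1 + ki) / (1 - ki), 3) for ki in k])   # [3.0, 0.538, 1.5, 1.222]
```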
CELP: Numerical Problem 2

Problem
In a CELP coder, the excitation signal is given by $e(n) = \beta v(n) + \gamma c(n)$. If $\beta = 0.8$,
$\gamma = 0.5$, $v(n) = \{1, -1, 0.5, -0.5\}$, and $c(n) = \{0.2, 0.3, -0.1, 0.1\}$ for a subframe of 4
samples, calculate the excitation signal $e(n)$.

Solution
Calculating sample by sample:
$e(0) = 0.8(1) + 0.5(0.2) = 0.8 + 0.1 = 0.9$
$e(1) = 0.8(-1) + 0.5(0.3) = -0.8 + 0.15 = -0.65$
$e(2) = 0.8(0.5) + 0.5(-0.1) = 0.4 - 0.05 = 0.35$
$e(3) = 0.8(-0.5) + 0.5(0.1) = -0.4 + 0.05 = -0.35$
Therefore, $e(n) = \{0.9, -0.65, 0.35, -0.35\}$.
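The same subframe computed with NumPy, as a quick check of the arithmetic above.

```python
import numpy as np

beta, gamma = 0.8, 0.5
v = np.array([1.0, -1.0, 0.5, -0.5])   # adaptive codebook contribution
c = np.array([0.2, 0.3, -0.1, 0.1])    # fixed codebook contribution
print(beta * v + gamma * c)            # [ 0.9  -0.65  0.35 -0.35]
```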


LPC: Numerical Problem 3
Problem
Using the Levinson-Durbin algorithm, calculate $k_2$ and $a_1^{(2)}$ given: $R(0) = 1$, $R(1) = 0.5$, $R(2) = 0.2$. Recall the relevant equations:
$k_i = \dfrac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E_{i-1}}$
$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j < i$
$E_i = (1 - k_i^2)\, E_{i-1}$

Solution
First, calculate $k_1$: $k_1 = \dfrac{R(1)}{R(0)} = \dfrac{0.5}{1} = 0.5$, so $a_1^{(1)} = k_1 = 0.5$. Then $E_1 = (1 - k_1^2) E_0 = (1 - 0.5^2) \cdot 1 = 0.75$.
Now, calculate $k_2$:
$k_2 = \dfrac{R(2) - a_1^{(1)} R(1)}{E_1} = \dfrac{0.2 - 0.5(0.5)}{0.75} = \dfrac{0.2 - 0.25}{0.75} = -\dfrac{1}{15} \approx -0.0667$
Finally, calculate $a_1^{(2)}$:
$a_1^{(2)} = a_1^{(1)} - k_2\, a_1^{(1)} = 0.5 - (-0.0667)(0.5) = 0.5 + 0.0333 \approx 0.5333$
Therefore, $k_2 \approx -0.0667$ and $a_1^{(2)} \approx 0.5333$.



Video Coding
Introduction to Video Compression

• Video: time sequence of images


• Huge data rates: e.g., CCIR 601 format

• 30 frames/second, 16 bits/pixel
• 168 Mbits/second

• Goal: Reduce data rate while maintaining quality

• Key approach: Exploit temporal correlation between frames


• Challenges:

• Motion video perception differs from still images


• Artifacts may be more/less noticeable in motion



Video Compression: Basic Concept

• Use previous frame to predict current frame
• Encode and transmit prediction error (residual)
• Receiver reconstructs frame using:
• Prediction from previous frame
• Received prediction error
• Key technique: Motion-compensated prediction

Figure: Two frames of a video sequence


Video Compression: Basic Concept



Video Coding: Motion Estimation and Compensation

• Problem: Objects move between frames

• Solution: Block-based motion compensation

• Divide frame into blocks (e.g., 16x16 pixels)


• Search previous frame for best matching block
• Transmit motion vectors instead of pixel differences

• Advantages:

• Exploits temporal redundancy


• Significantly reduces data to be transmitted



Motion Estimation: Block Matching

• Search area typically ±15 pixels


• Matching criterion: Sum of Absolute Differences (SAD):
$\text{SAD} = \sum_{i=0}^{15} \sum_{j=0}^{15} |C_{ij} - R_{ij}|$
where $C_{ij}$ and $R_{ij}$ are pixels in the current and reference blocks
• Trade-off: Block size vs. prediction accuracy

Figure: Effect of block size on motion compensation
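A minimal full-search block-matching sketch using the SAD criterion above; the 16×16 block and ±15-pixel search window follow the slide, while the frame sizes, array names, and test data are illustrative assumptions.

```python
import numpy as np

def best_motion_vector(current, reference, by, bx, block=16, search=15):
    """Full search for the motion vector (dy, dx) minimizing
    SAD = sum_ij |C_ij - R_ij| for the block at (by, bx) in the current frame."""
    cur = current[by:by + block, bx:bx + block].astype(np.int32)
    best = (np.inf, (0, 0))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > reference.shape[0] or x + block > reference.shape[1]:
                continue                                   # candidate block falls outside the frame
            sad = np.abs(cur - reference[y:y + block, x:x + block].astype(np.int32)).sum()
            if sad < best[0]:
                best = (sad, (dy, dx))
    return best[1], best[0]

# Toy test: a reference frame shifted by (2, 3) pixels should be found exactly
reference = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
current = np.roll(reference, shift=(2, 3), axis=(0, 1))
print(best_motion_vector(current, reference, by=24, bx=24))   # ((-2, -3), 0)
```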



Types of Frames in Video Coding

• I-frames (Intra-coded)

• Encoded without reference to other frames


• Provide random access points

• P-frames (Predictive-coded)

• Use motion-compensated prediction from previous I or P frame

• B-frames (Bidirectionally predictive-coded)

• Use prediction from both past and future frames


• Highest compression, but introduce delay



Group of Pictures (GOP) Structure

• GOP: Sequence of I, P, and B frames

• Typical pattern: IBBPBBPBBPBB

• I-frame: Start of each GOP

• P-frames: Predicted from previous I or P

• B-frames: Bidirectional prediction

• Benefits:

• Efficient compression
• Flexible access to video



Group of Pictures (GOP) Structure

Figure 12: A possible arrangement for a group of pictures



Video Encoding Process

1. Motion estimation and compensation

2. Transform coding (usually DCT)

3. Quantization

4. Entropy coding
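A sketch of steps 2-3 for one 8×8 residual block, using an orthonormal 2-D DCT and uniform quantization; the quantization step of 16 is an illustrative assumption (practical coders use perceptually weighted quantization matrices).

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(residual, qstep=16):
    """Steps 2-3: 2-D DCT of an 8x8 residual block, then uniform quantization."""
    coeffs = dctn(residual, norm="ortho")
    return np.round(coeffs / qstep).astype(int)     # integer levels go to the entropy coder

def decode_block(levels, qstep=16):
    """Decoder side: inverse quantization followed by inverse DCT."""
    return idctn(levels * qstep, norm="ortho")

residual = np.random.randint(-32, 32, (8, 8)).astype(float)
levels = encode_block(residual)
print(np.abs(residual - decode_block(levels)).max())   # reconstruction error due to quantization
```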



Video Encoding Process

Figure 13: Block diagram of a video encoder


Video Decoding Process

1. Entropy decoding

2. Inverse quantization

3. Inverse transform

4. Motion compensation

• Decoder performs inverse operations of encoder

• Reconstructs video frames from received data



Rate Control in Video Coding

• Purpose: Maintain constant output bit rate

• Methods:

• Adjust quantization step size


• Drop frames if necessary

• Buffer fullness guides rate control decisions

• Balances quality and bit rate



Video Coding Standard: MPEG-4

• Developed by Moving Picture Experts Group (MPEG)

• Object-oriented approach to multimedia coding


• Key features:
• Object-based coding
• Sprite coding for backgrounds
• Scalability (temporal, spatial, and object)

• Applications:
• Digital television
• Interactive graphics applications
• Streaming media
MPEG-4: Object-Based Coding

• Video scene as collection of objects

• Each object coded independently

• Objects can be:

• Visual (e.g., background, talking head)


• Aural (e.g., speech, music)

• Scene description using BIFS (Binary Format for Scenes)

• Allows flexible manipulation of objects



MPEG-4: Object-Based Coding



MPEG-4: Sprite Coding

• Sprite: Large panoramic background image

• Transmitted once, reused in multiple frames

• Moving foreground objects placed on sprite

• Efficient for scenes with static backgrounds

• Equation for sprite warping:
$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$
where $(x, y)$ is the original coordinate and $(x'/w', y'/w')$ is the warped coordinate
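A small sketch of applying the warping equation above to one pixel coordinate; the matrix entries are arbitrary illustrative values.

```python
import numpy as np

def warp_point(M, x, y):
    """Apply the 3x3 sprite warping matrix M (with M[2,2] = 1) to (x, y)
    and return the de-homogenized coordinate (x'/w', y'/w')."""
    xp, yp, wp = M @ np.array([x, y, 1.0])
    return xp / wp, yp / wp

# Illustrative warp: mild rotation/scaling, translation, and a small perspective term
M = np.array([[0.98, -0.05, 10.0],
              [0.05,  0.98,  5.0],
              [1e-4,  0.0,   1.0]])
print(warp_point(M, 100.0, 50.0))
```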


MPEG-4: Scalability

• Temporal Scalability:
• Enhance frame rate
• Base layer + enhancement layer(s)

• Spatial Scalability:
• Enhance spatial resolution
• Use upsampling of base layer

• SNR Scalability:
• Enhance quality (Signal-to-Noise Ratio)
• Refine quantization in enhancement layer

• Object Scalability:
• Selectively transmit or decode objects
MPEG-4: Shape Coding

• Important for object-based coding

• Methods:

• Bitmap-based
• Contour-based

• Context-based Arithmetic Encoding (CAE) for binary alpha planes

• Equation for context number:
$C_k = \sum_{i=0}^{9} c_i \cdot 2^i$
where $c_i$ are binary values of neighboring pixels
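A sketch of forming the context number from ten neighboring binary alpha-plane pixels; the neighbor ordering here is illustrative (the standard fixes a specific template).

```python
def context_number(neighbors):
    """C_k = sum_{i=0}^{9} c_i * 2^i, selecting the probability model
    used to arithmetic-code the current alpha-plane pixel."""
    assert len(neighbors) == 10 and all(c in (0, 1) for c in neighbors)
    return sum(c << i for i, c in enumerate(neighbors))

# Example: ten neighboring pixel values taken from a binary alpha plane (illustrative)
print(context_number([1, 0, 1, 1, 0, 0, 1, 0, 0, 1]))   # an index in [0, 1023]
```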



MPEG-4: Facial Animation

• Facial Definition Parameters (FDPs):


• Define shape and texture of face

• Facial Animation Parameters (FAPs):


• Control facial expressions

• 68 FAPs defined, e.g.:


• Jaw rotation
• Eye movement
• Lip deformation

• Equation for FAP interpolation:
$\text{FAP}(t) = \text{FAP}(t_1) + \dfrac{t - t_1}{t_2 - t_1}\,[\text{FAP}(t_2) - \text{FAP}(t_1)]$
