
Module-5

Audio and Video Coding

Dr. Markkandan S

School of Electronics Engineering (SENSE)


Vellore Institute of Technology
Chennai

Human Speech Production Mechanism

Figure 1: Human Speech Production Mechanism


Introduction to Audio Compression

• Audio compression refers to the process of reducing the amount of data required
to represent audio signals while maintaining acceptable quality.
• Two main types of compression:
• Lossy compression: Reduces data by removing some audio information, which may
result in a loss of quality (e.g., MP3).
• Lossless compression: Reduces data without any loss of quality (e.g., FLAC).
• Compression reduces file size, transmission time, and storage requirements.
• Applications: Digital media, streaming, telecommunication, and storage.



Introduction to Audio Coding

• Audio coding: compression of audio signals for efficient storage/transmission


• Objectives:
• Reduce bit rate while maintaining perceptual quality
• Enable efficient storage and transmission of audio
• Key approaches:
• Waveform coding: Directly encode audio samples
• Parametric coding: Encode model parameters (e.g., LPC)
• Perceptual coding: Exploit human auditory system limitations
• Trade-offs:
• Compression ratio vs. audio quality
• Computational complexity vs. performance
• Delay vs. coding efficiency
• Applications: Digital audio broadcasting, VoIP, music streaming, etc.
Types of Audio Coding Techniques

• Common audio coding techniques:


• Pulse Code Modulation (PCM)
• Differential Pulse Code Modulation (DPCM)
• Adaptive Differential PCM (ADPCM)
• Linear Predictive Coding (LPC)
• Code-Excited Linear Prediction (CELP)
• Perceptual Audio Coding (e.g., MPEG Audio)
• PCM: Direct sampling and quantization of the audio signal.
• DPCM and ADPCM: Reduce redundancy in the signal by encoding differences
between samples.
• LPC and CELP: Use models of human speech production to achieve efficient
compression.



Linear Predictive Coding(LPC)
Overview of Linear Predictive Coding (LPC)

• LPC: Efficient parametric coding technique for speech


• Core idea: Model speech production process
• Key components:
• Excitation source model
• Vocal tract filter model
• Process:
1. Analyze speech to extract model parameters
2. Transmit parameters (not raw audio)
3. Synthesize speech at receiver using parameters
• Advantages:
• Very low bit rate (2.4 kbps - 4.8 kbps)
• Good intelligibility for speech
• Limitations:
• "Robotic" sound quality
• Not suitable for non-speech audio
Linear Predictive Coding (LPC): Overview

• LPC is widely used for speech signal compression. The basic idea is to model the
vocal tract as a linear filter and represent speech as the output of this filter.
• Equation for LPC model:
$y_n = \sum_{i=1}^{p} a_i y_{n-i} + G e_n$ (1)

where:
• $y_n$ is the current sample,
• $a_i$ are the LPC coefficients,
• $e_n$ is the excitation signal,
• $G$ is the gain factor.
• Applications: Speech coding, synthesis, and recognition.



LPC: Speech Production Model

• Models speech as output of a time-varying linear system


• Two main components:
• Excitation source: Models airflow from lungs
• Vocal tract filter: Models acoustic properties of vocal tract
• Excitation types:
• Voiced: Quasi-periodic pulses (e.g., vowels)
• Unvoiced: White noise (e.g., fricatives)
• Vocal tract modeled as an all-pole filter:
$H(z) = \dfrac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$ (2)
where $G$ is the gain, $p$ is the filter order, and $a_k$ are the filter coefficients
• Time-domain representation:
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n)$ (3)
LPC: Encoder and Decoder

Figure 2: LPC encoder and Decoder


LPC: Speech Signal Modeling

• The speech signal is modeled by a filter that represents the vocal tract.
• The excitation signal $e_n$ drives this filter.
• LPC divides the speech into frames and estimates the filter coefficients for each
frame.

Figure 3: Speech synthesis model



LPC: Vocal Tract Filter

• All-pole filter approximates vocal tract resonances (formants)


• Transfer function:
$H(z) = \dfrac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$ (4)

• Typical filter order: 10-12 for 8 kHz sampled speech


• Estimation of filter coefficients:
• Minimize mean squared prediction error
• Autocorrelation method:
$\mathbf{R}\mathbf{a} = \mathbf{r}$ (5)
where $\mathbf{R}$ is the autocorrelation matrix, $\mathbf{a}$ is the coefficient vector, and $\mathbf{r}$ is the
autocorrelation vector
• Solved efficiently using Levinson-Durbin recursion
• Stability ensured by converting to reflection coefficients
• Quantization: Non-uniform quantization of reflection coefficients
LPC: Vocal Tract Filter

Figure 4: Model for speech synthesis with vocal tract filter



LPC: Excitation Source

• Two types of excitation:


1. Voiced excitation:
• Quasi-periodic pulse train
• Parameters: Pitch period, voiced/unvoiced decision
• Pitch detection algorithms:
• Time-domain: Autocorrelation, AMDF
• Frequency-domain: Harmonic peak detection
2. Unvoiced excitation:
• White noise generator
• No additional parameters needed
• Voiced/Unvoiced decision:
• Based on features like:
• Short-term energy
• Zero-crossing rate
• First reflection coefficient
• Excitation gain:
LPC: Excitation Source



LPC: Voicing and Pitch Detection

• Voicing decision: Determines if the segment is voiced or unvoiced.

• Voiced speech has a periodic structure, unvoiced speech resembles noise.

• Pitch period estimation: A critical step for voiced speech.

• The pitch period is extracted using autocorrelation or the average magnitude difference
function (AMDF):
$\text{AMDF}(P) = \dfrac{1}{N} \sum_{i=1}^{N} |y_i - y_{i-P}|$ (6)

• Voicing and pitch information helps generate the excitation signal for LPC.
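A minimal NumPy sketch of AMDF-based pitch estimation per equation (6); the lag search range (covering roughly 50-400 Hz pitch at 8 kHz sampling) and the synthetic test frame are illustrative assumptions.

```python
import numpy as np

def amdf_pitch(frame, fs=8000, min_lag=20, max_lag=160):
    """Estimate the pitch period as the lag P minimizing
    AMDF(P) = (1/N) * sum |y[i] - y[i-P]|  (equation 6)."""
    scores = [np.mean(np.abs(frame[P:] - frame[:-P]))
              for P in range(min_lag, max_lag + 1)]
    best_lag = min_lag + int(np.argmin(scores))
    return best_lag, fs / best_lag       # pitch period (samples), pitch frequency (Hz)

# Example: a synthetic voiced-like frame with a 100 Hz fundamental at 8 kHz
t = np.arange(400) / 8000
frame = np.sign(np.sin(2 * np.pi * 100 * t))
print(amdf_pitch(frame))                 # expected lag near 80 samples (~100 Hz)
```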



LPC: Voicing and Pitch Detection - AMDF

AMDF for ’e’ and ’s’


Pitch Detection
LPC: Parameter Estimation

• Goal: Estimate LPC model parameters from speech signal

• Process:

1. Pre-emphasis: High-pass filter to flatten spectral slope


2. Framing: Divide speech into 20-30 ms frames
3. Windowing: Apply window function (e.g., Hamming) to each frame
4. Autocorrelation: Compute autocorrelation coefficients
5. Levinson-Durbin recursion: Solve for LPC coefficients
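The first four steps can be sketched in a few lines of NumPy; the frame length, pre-emphasis coefficient 0.97, and order 10 below are illustrative assumptions rather than values fixed by the slides.

```python
import numpy as np

def lpc_autocorrelation(frame, order=10, alpha=0.97):
    """Pre-emphasize, window, and autocorrelate one speech frame (steps 1-4)."""
    # Step 1: pre-emphasis high-pass filter y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(frame[0], frame[1:] - alpha * frame[:-1])
    # Step 3: Hamming window to reduce edge discontinuities
    windowed = emphasized * np.hamming(len(emphasized))
    # Step 4: autocorrelation values R(0)..R(order)
    full = np.correlate(windowed, windowed, mode="full")
    mid = len(windowed) - 1
    return full[mid:mid + order + 1]

# Example: one 20 ms frame (160 samples at 8 kHz) of a synthetic 200 Hz tone
frame = np.sin(2 * np.pi * 200 * np.arange(160) / 8000)
R = lpc_autocorrelation(frame)
print(R[:3])  # R(0), R(1), R(2) feed the Levinson-Durbin recursion (step 5)
```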



LPC: Parameter Estimation

• Levinson-Durbin algorithm:

1. Initialize: $E_0 = R(0)$
2. For $i = 1$ to $p$:
$k_i = \dfrac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E_{i-1}}$ (7)
$a_i^{(i)} = k_i$ (8)
$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j < i$ (9)
$E_i = (1 - k_i^2)\, E_{i-1}$ (10)

• Output: LPC coefficients $a_i$ and reflection coefficients $k_i$
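A direct NumPy transcription of equations (7)-(10); the function name and the toy autocorrelation values are illustrative, and the printed result can be checked against Numerical Problem 3 later in this module.

```python
import numpy as np

def levinson_durbin(R, order):
    """Solve for LPC coefficients a_1..a_p and reflection coefficients k_1..k_p
    from autocorrelation values R[0..p], following equations (7)-(10)."""
    a = np.zeros(order + 1)              # a[j] holds a_j; a[0] is unused
    k = np.zeros(order + 1)
    E = R[0]                             # E_0 = R(0)
    for i in range(1, order + 1):
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k[i] = acc / E                   # equation (7)
        a_new = a.copy()
        a_new[i] = k[i]                  # equation (8)
        for j in range(1, i):
            a_new[j] = a[j] - k[i] * a[i - j]   # equation (9)
        a = a_new
        E = (1.0 - k[i] ** 2) * E        # equation (10)
    return a[1:], k[1:]

# Toy check with R(0)=1, R(1)=0.5, R(2)=0.2
a, k = levinson_durbin(np.array([1.0, 0.5, 0.2]), order=2)
print(np.round(k, 4), np.round(a, 4))   # k ≈ [0.5, -0.0667], a ≈ [0.5333, -0.0667]
```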



LPC: Transmission of Parameters

• Parameters to transmit:
• Reflection coefficients (instead of direct LPC coefficients)
• Pitch period (for voiced frames)
• Voiced/unvoiced decision
• Gain

• Quantization:
• Reflection coefficients: Non-uniform quantization via
$g_i = \dfrac{1 + k_i}{1 - k_i}$ (11)
• Pitch: Logarithmic quantization
• Gain: Logarithmic quantization

• Bit allocation example (LPC-10, 2.4 kbps):
LPC: Transmission of Parameters

• Reflection coefficients: 41 bits


• Pitch and V/UV: 7 bits
• Gain: 5 bits
• Synchronization: 1 bit

• Frame duration: 22.5 ms (180 samples at 8 kHz)

• Resulting bit rate: 54 bits / 22.5 ms ≈ 2400 bps



LPC: Speech Synthesis at Receiver

• Process:
1. Decode received parameters
2. Generate excitation signal
3. Synthesize speech using all-pole filter

• Excitation generation:
• Voiced: Impulse train with decoded pitch period
• Unvoiced: White noise generator

• All-pole filter implementation:
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n)$ (12)

• Overlap-add successive frames to reduce discontinuities
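A sketch of the synthesis filter in equation (12) using SciPy's IIR filter; the excitation, gain, and single predictor coefficient are toy assumptions for illustration.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_synthesize(a, excitation, gain=1.0):
    """All-pole synthesis s(n) = sum_k a_k s(n-k) + G u(n),
    i.e. filtering the excitation through H(z) = G / (1 - sum_k a_k z^-k)."""
    denominator = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return lfilter([gain], denominator, excitation)

# Voiced example: impulse train with an 80-sample pitch period (100 Hz at 8 kHz)
excitation = np.zeros(400)
excitation[::80] = 1.0
speech = lpc_synthesize([0.9], excitation, gain=0.5)   # toy first-order predictor
print(speech[:5])
```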


LPC: Speech Synthesis at Receiver

• Post-processing:
• De-emphasis filter (inverse of pre-emphasis)
• Adaptive postfiltering to enhance formants



Limitations of LPC and Need for CELP

• Limitations of basic LPC:

• "Buzzy" or "robotic" sound quality


• Poor representation of unvoiced and transient sounds
• Binary voiced/unvoiced decision too simplistic
• Limited pitch resolution
• Sensitive to transmission errors

• Reasons for limitations:

• Oversimplified excitation model


• All-pole filter may not capture all spectral details
• Frame-based analysis loses some temporal resolution
Limitations of LPC and Need for CELP

• Need for improvement:

• Better excitation model


• Finer pitch and spectral representation
• Improved perceptual quality at low bit rates

• CELP as a solution:

• Addresses LPC limitations


• Uses codebook of excitation vectors
• Incorporates perceptual weighting
• Achieves better quality at similar bit rates



Code Excited Linear Prediction (CELP)
Introduction to Code Excited Linear Prediction (CELP)

• Key idea: Use codebook of excitation vectors


• Components:
• LPC filter (as in traditional LPC)
• Adaptive codebook (for pitch structure)
• Fixed (stochastic) codebook (for residual excitation)
• Perceptual weighting filter

• Process:
1. LPC analysis to obtain filter coefficients
2. Search codebooks for best excitation
3. Minimize perceptually weighted error
4. Transmit codebook indices and gains
Introduction to Code Excited Linear Prediction (CELP)

• Advantages:

• Improved speech quality compared to LPC


• Efficient at low bit rates (4.8-16 kbps)
• Handles both voiced and unvoiced speech well

• Challenges:

• Computationally intensive codebook search


• Requires larger codebooks for higher quality



Introduction to Code Excited Linear Prediction (CELP)

Figure 6: Block diagram of a CELP encoder and decoder
CELP: Codebook Structure(1/2)

• Two main codebooks in CELP:

1. Adaptive codebook
2. Fixed (stochastic) codebook

• Adaptive codebook:

• Models pitch periodicity and long-term correlations


• Constructed from past excitation signals
• Updated each subframe
• Typically 128-256 entries



CELP: Codebook Structure(2/2)

• Fixed codebook:

• Models residual excitation after pitch prediction


• Designed to cover range of possible excitations
• Typically 512-1024 entries
• Various structures: Binary, ternary, sparse algebraic

• Excitation signal:
$e(n) = \beta v(n) + \gamma c(n)$ (13)
where $v(n)$ is from the adaptive codebook, $c(n)$ is from the fixed codebook, and $\beta$ and $\gamma$ are the
respective gains



CELP: Stochastic Codebook

• Purpose: Model residual excitation not captured by adaptive codebook


• Types of stochastic codebooks:
• Random codebook: Gaussian random sequences
• Algebraic codebook: Structured sparse vectors

• Random codebook:
• Entries are Gaussian random sequences
• Typically quantized to +1, -1, or 0
• Large storage requirement

• Algebraic codebook (ACELP):


• Sparse vectors with few non-zero pulses
• Pulse positions and signs determined by algebraic structure
CELP: Adaptive Codebook

• Purpose: Model pitch periodicity and long-term correlations


• Structure:
• Contains past excitation signals
• Updated each subframe
• Typically 128-256 entries

• Adaptive codebook vector:
$v(n) = e(n - T + i), \quad i = 0, 1, \ldots, N-1$ (16)
where $T$ is the pitch lag and $N$ is the subframe size
• Fractional pitch:
• Allows finer pitch resolution (e.g., 1/3 or 1/4 sample)
• Requires interpolation of past excitation
CELP: Perceptual Weighting

• Purpose: Shape quantization noise to be less perceptible

• Concept: Exploit masking properties of human auditory system

• Perceptual weighting filter:
$W(z) = \dfrac{A(z/\gamma_1)}{A(z/\gamma_2)}$ (18)
where $A(z)$ is the LPC analysis filter; typically $\gamma_1 = 0.9$, $\gamma_2 = 0.5$

• Effect:

• Attenuates error in formant regions


• Amplifies error in spectral valleys



CELP: Perceptual Weighting

• Implementation:

• Apply W (z) to error signal in codebook search


• Minimize weighted error:
$E_w = \sum_{n=0}^{N-1} [x_w(n) - \hat{x}_w(n)]^2$ (19)

• Benefits:

• Improved subjective quality


• Better allocation of bits to perceptually important regions
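A sketch of how $W(z)$ in equation (18) can be built by bandwidth-expanding the LPC polynomial: replacing $z$ by $z/\gamma$ scales the $k$-th coefficient by $\gamma^k$. The LPC coefficients and error signal below are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def perceptual_weighting(a, gamma1=0.9, gamma2=0.5):
    """Return numerator and denominator of W(z) = A(z/gamma1) / A(z/gamma2),
    where A(z) = 1 - sum_k a_k z^-k (equation 18)."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # [1, -a_1, ..., -a_p]
    powers = np.arange(len(A))
    return A * gamma1 ** powers, A * gamma2 ** powers

# Weight an error signal before the codebook search (toy LPC coefficients)
num, den = perceptual_weighting([1.2, -0.6, 0.1])
error = np.random.randn(160)
weighted_error = lfilter(num, den, error)
print(num, den)
```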



CELP: Encoder Operation

• CELP encoding steps:
1. LPC analysis: Compute coefficients, convert to LSP
2. Subframe processing (typically 4 per frame)
3. Adaptive codebook search: Find best pitch lag and gain
4. Fixed codebook search: Find best index and gain
5. Parameter quantization: LSPs, pitch, indices, gains
6. Update memories for next frame
• Bit allocation example (FS1016, 4.8 kbps):
• LSP: 34 bits/frame
• Pitch: 8 bits/subframe
• Fixed codebook: 9 bits/subframe
• Gains: 5 bits/subframe
• Computational complexity:
• Codebook search most intensive
• Fast search algorithms used in practice
CELP: Encoder Operation

Figure 7: Block diagram of CELP encoder

Image source: https://www.researchgate.net/figure/Block-diagram-of-CELP-encoder_fig1_264423574



CELP: Decoder Operation

• Steps in CELP decoding:
1. Parameter decoding
2. LSP to LPC conversion
3. For each subframe:
• Adaptive codebook contribution
• Fixed codebook contribution
• Excitation reconstruction
• LPC synthesis filtering
4. Post-processing
• Excitation reconstruction:
$e(n) = \beta v(n) + \gamma c(n)$ (20)
• LPC synthesis:
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + e(n)$ (21)


CELP: Decoder Operation

Figure 8: Block diagram of CELP decoder

Image source: https://www.researchgate.net/figure/Block-diagram-of-CELP-decoder_fig2_264423574


CELP: Examples (FS1016, G.728)

Federal Standard 1016 (4.8 kbps):
• Frame size: 30 ms
• Subframes: 4 × 7.5 ms
• LPC order: 10
• Adaptive codebook: 128 entries
• Fixed codebook: 512 entries
• Bit allocation:
• LSP: 34 bits/frame
• Pitch: 8 bits/subframe
• Fixed codebook: 9 bits/subframe
• Gains: 5 bits/subframe

ITU-T G.728 (16 kbps):
• Frame size: 0.625 ms (5 samples)
• LPC order: 50
• Backward adaptive prediction
• No explicit pitch prediction
• Shape-gain vector quantization
• Bit allocation:
• Shape codebook: 7 bits/frame
• Gain codebook: 3 bits/frame
• Low delay: 0.625 ms


Perceptual Coding & MPEG
Introduction to Perceptual Coding

• Goal: Exploit limitations of human auditory system


• Key principles:
• Auditory masking
• Critical band analysis
• Temporal masking

• General approach:
1. Time-frequency analysis
2. Psychoacoustic modeling
3. Bit allocation based on perceptual importance
4. Quantization and coding

• Advantages:


Introduction to Perceptual Coding

Figure 9: General structure of a perceptual audio coder



Image source: https://www.researchgate.net/figure/Basic-structure-of-perceptual-audio-coder_fig1_224345681
Psychoacoustic Principles in Audio Coding

• Auditory masking:
• Louder sounds mask quieter sounds
• Masking threshold varies with frequency
• Critical bands:
• Non-uniform frequency resolution of ear
• Approximated by bark scale
• Temporal masking:
• Pre-masking: 20 ms before masker
• Post-masking: up to 200 ms after masker
• Just Noticeable Distortion (JND):
• Minimum perceivable change in sound
• Varies with frequency and intensity

Figure 10: Frequency masking
Image source: https://www.researchgate.net/figure/Illustration-of-the-masking-effect_fig2_220009783



MPEG Audio Coding: Overview

• MPEG: Moving Picture Experts Group

• Audio coding standards:

• MPEG-1 Audio (1992)


• MPEG-2 Audio (1994)
• MPEG-4 Audio (1999)

• MPEG-1 Audio Layers:

• Layer I: Simplest, lowest compression


• Layer II: Improved compression
• Layer III (MP3): Highest compression



MPEG Audio Coding: Overview

• Key features:
• Perceptual coding principles
• Filterbank for time-frequency mapping
• Psychoacoustic model
• Dynamic bit allocation and coding

Figure 11: General structure of MPEG audio encoder
Image source: https://www.researchgate.net/figure/Block-diagram-of-a-generic-MPEG-1-Layer-I-II-encoder_fig1_224345681
LPC: Numerical Problem 1
Problem
Given an LPC model of order 4 with the following reflection coefficients: $k_1 = 0.5$, $k_2 = -0.3$,
$k_3 = 0.2$, $k_4 = 0.1$. Calculate the g-parameters for transmission using the equation $g_i = \dfrac{1 + k_i}{1 - k_i}$.

Solution
Calculating for each coefficient:
$g_1 = \dfrac{1 + 0.5}{1 - 0.5} = \dfrac{1.5}{0.5} = 3$
$g_2 = \dfrac{1 + (-0.3)}{1 - (-0.3)} = \dfrac{0.7}{1.3} \approx 0.538$
$g_3 = \dfrac{1 + 0.2}{1 - 0.2} = \dfrac{1.2}{0.8} = 1.5$
$g_4 = \dfrac{1 + 0.1}{1 - 0.1} = \dfrac{1.1}{0.9} \approx 1.222$
Therefore, the g-parameters are approximately 3, 0.538, 1.5, and 1.222.
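A quick check of the computation above (a short sketch; rounding is only for display).

```python
# Verify g_i = (1 + k_i) / (1 - k_i) for the reflection coefficients above
k = [0.5, -0.3, 0.2, 0.1]
print([round((1 + ki) / (1 - ki), 3) for ki in k])   # [3.0, 0.538, 1.5, 1.222]
```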
CELP: Numerical Problem 2

Problem
In a CELP coder, the excitation signal is given by $e(n) = \beta v(n) + \gamma c(n)$. If $\beta = 0.8$,
$\gamma = 0.5$, $v(n) = \{1, -1, 0.5, -0.5\}$, and $c(n) = \{0.2, 0.3, -0.1, 0.1\}$ for a subframe of 4
samples, calculate the excitation signal $e(n)$.

Solution
Calculating sample by sample:
$e(0) = 0.8(1) + 0.5(0.2) = 0.8 + 0.1 = 0.9$
$e(1) = 0.8(-1) + 0.5(0.3) = -0.8 + 0.15 = -0.65$
$e(2) = 0.8(0.5) + 0.5(-0.1) = 0.4 - 0.05 = 0.35$
$e(3) = 0.8(-0.5) + 0.5(0.1) = -0.4 + 0.05 = -0.35$
Therefore, $e(n) = \{0.9, -0.65, 0.35, -0.35\}$.
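The same subframe computed with NumPy, as a quick check of the arithmetic above.

```python
import numpy as np

beta, gamma = 0.8, 0.5
v = np.array([1.0, -1.0, 0.5, -0.5])   # adaptive codebook contribution
c = np.array([0.2, 0.3, -0.1, 0.1])    # fixed codebook contribution
print(beta * v + gamma * c)            # [ 0.9  -0.65  0.35 -0.35]
```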


LPC: Numerical Problem 3
Problem
Using the Levinson-Durbin algorithm, calculate $k_2$ and $a_1^{(2)}$ given: $R(0) = 1$, $R(1) = 0.5$, $R(2) = 0.2$. Recall the relevant equations:
$k_i = \dfrac{R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)}{E_{i-1}}$
$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j < i$
$E_i = (1 - k_i^2)\, E_{i-1}$

Solution
First, calculate $k_1$: $k_1 = \dfrac{R(1)}{R(0)} = \dfrac{0.5}{1} = 0.5$, so $a_1^{(1)} = k_1 = 0.5$. Then $E_1 = (1 - k_1^2) E_0 = (1 - 0.5^2) \cdot 1 = 0.75$.
Now, calculate $k_2$:
$k_2 = \dfrac{R(2) - a_1^{(1)} R(1)}{E_1} = \dfrac{0.2 - 0.5(0.5)}{0.75} = \dfrac{0.2 - 0.25}{0.75} = -\dfrac{1}{15} \approx -0.0667$
Finally, calculate $a_1^{(2)}$:
$a_1^{(2)} = a_1^{(1)} - k_2\, a_1^{(1)} = 0.5 - (-0.0667)(0.5) = 0.5 + 0.0333 \approx 0.5333$
Therefore, $k_2 \approx -0.0667$ and $a_1^{(2)} \approx 0.5333$.



Video Coding
Introduction to Video Compression

• Video: time sequence of images


• Huge data rates: e.g., CCIR 601 format

• 30 frames/second, 16 bits/pixel
• 168 Mbits/second

• Goal: Reduce data rate while maintaining quality

• Key approach: Exploit temporal correlation between frames


• Challenges:

• Motion video perception differs from still images


• Artifacts may be more/less noticeable in motion



Video Compression: Basic Concept

• Use previous frame to predict current frame
• Encode and transmit prediction error (residual)
• Receiver reconstructs frame using:
• Prediction from previous frame
• Received prediction error
• Key technique: Motion-compensated prediction

Figure: Two frames of a video sequence


Video Compression: Basic Concept



Video Coding: Motion Estimation and Compensation

• Problem: Objects move between frames

• Solution: Block-based motion compensation

• Divide frame into blocks (e.g., 16x16 pixels)


• Search previous frame for best matching block
• Transmit motion vectors instead of pixel differences

• Advantages:

• Exploits temporal redundancy


• Significantly reduces data to be transmitted



Motion Estimation: Block Matching

• Search area typically ±15 pixels


• Matching criterion: Sum of Absolute Differences (SAD):
$\text{SAD} = \sum_{i=0}^{15} \sum_{j=0}^{15} |C_{ij} - R_{ij}|$
where $C_{ij}$ and $R_{ij}$ are pixels in the current and reference blocks
• Trade-off: Block size vs. prediction accuracy

Figure: Effect of block size on motion compensation
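A minimal full-search block-matching sketch using the SAD criterion above; the 16×16 block and ±15-pixel search window follow the slide, while the frame sizes, array names, and test data are illustrative assumptions.

```python
import numpy as np

def best_motion_vector(current, reference, by, bx, block=16, search=15):
    """Full search for the motion vector (dy, dx) minimizing
    SAD = sum_ij |C_ij - R_ij| for the block at (by, bx) in the current frame."""
    cur = current[by:by + block, bx:bx + block].astype(np.int32)
    best = (np.inf, (0, 0))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > reference.shape[0] or x + block > reference.shape[1]:
                continue                                   # candidate block falls outside the frame
            sad = np.abs(cur - reference[y:y + block, x:x + block].astype(np.int32)).sum()
            if sad < best[0]:
                best = (sad, (dy, dx))
    return best[1], best[0]

# Toy test: a reference frame shifted by (2, 3) pixels should be found exactly
reference = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
current = np.roll(reference, shift=(2, 3), axis=(0, 1))
print(best_motion_vector(current, reference, by=24, bx=24))   # ((-2, -3), 0)
```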



Types of Frames in Video Coding

• I-frames (Intra-coded)

• Encoded without reference to other frames


• Provide random access points

• P-frames (Predictive-coded)

• Use motion-compensated prediction from previous I or P frame

• B-frames (Bidirectionally predictive-coded)

• Use prediction from both past and future frames


• Highest compression, but introduce delay



Group of Pictures (GOP) Structure

• GOP: Sequence of I, P, and B frames

• Typical pattern: IBBPBBPBBPBB

• I-frame: Start of each GOP

• P-frames: Predicted from previous I or P

• B-frames: Bidirectional prediction

• Benefits:

• Efficient compression
• Flexible access to video



Group of Pictures (GOP) Structure

Figure 12: A possible arrangement for a group of pictures



Video Encoding Process

1. Motion estimation and compensation

2. Transform coding (usually DCT)

3. Quantization

4. Entropy coding
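A sketch of steps 2-3 for one 8×8 residual block, using an orthonormal 2-D DCT and uniform quantization; the quantization step of 16 is an illustrative assumption (practical coders use perceptually weighted quantization matrices).

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(residual, qstep=16):
    """Steps 2-3: 2-D DCT of an 8x8 residual block, then uniform quantization."""
    coeffs = dctn(residual, norm="ortho")
    return np.round(coeffs / qstep).astype(int)     # integer levels go to the entropy coder

def decode_block(levels, qstep=16):
    """Decoder side: inverse quantization followed by inverse DCT."""
    return idctn(levels * qstep, norm="ortho")

residual = np.random.randint(-32, 32, (8, 8)).astype(float)
levels = encode_block(residual)
print(np.abs(residual - decode_block(levels)).max())   # reconstruction error due to quantization
```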



Video Encoding Process

Figure 13: Block diagram of a video encoder


Video Decoding Process

1. Entropy decoding

2. Inverse quantization

3. Inverse transform

4. Motion compensation

• Decoder performs inverse operations of encoder

• Reconstructs video frames from received data



Rate Control in Video Coding

• Purpose: Maintain constant output bit rate

• Methods:

• Adjust quantization step size


• Drop frames if necessary

• Buffer fullness guides rate control decisions

• Balances quality and bit rate



Video Coding Standard: MPEG-4

• Developed by Moving Picture Experts Group (MPEG)

• Object-oriented approach to multimedia coding


• Key features:
• Object-based coding
• Sprite coding for backgrounds
• Scalability (temporal, spatial, and object)

• Applications:
• Digital television
• Interactive graphics applications
• Streaming media
MPEG-4: Object-Based Coding

• Video scene as collection of objects

• Each object coded independently

• Objects can be:

• Visual (e.g., background, talking head)


• Aural (e.g., speech, music)

• Scene description using BIFS (Binary Format for Scenes)

• Allows flexible manipulation of objects



MPEG-4: Object-Based Coding



MPEG-4: Sprite Coding

• Sprite: Large panoramic background image

• Transmitted once, reused in multiple frames

• Moving foreground objects placed on sprite

• Efficient for scenes with static backgrounds

• Equation for sprite warping:
$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$
where $(x, y)$ is the original coordinate and $(x'/w', y'/w')$ is the warped coordinate
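A small sketch of applying the warping equation above to one pixel coordinate; the matrix entries are arbitrary illustrative values.

```python
import numpy as np

def warp_point(M, x, y):
    """Apply the 3x3 sprite warping matrix M (with M[2,2] = 1) to (x, y)
    and return the de-homogenized coordinate (x'/w', y'/w')."""
    xp, yp, wp = M @ np.array([x, y, 1.0])
    return xp / wp, yp / wp

# Illustrative warp: mild rotation/scaling, translation, and a small perspective term
M = np.array([[0.98, -0.05, 10.0],
              [0.05,  0.98,  5.0],
              [1e-4,  0.0,   1.0]])
print(warp_point(M, 100.0, 50.0))
```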


MPEG-4: Scalability

• Temporal Scalability:
• Enhance frame rate
• Base layer + enhancement layer(s)

• Spatial Scalability:
• Enhance spatial resolution
• Use upsampling of base layer

• SNR Scalability:
• Enhance quality (Signal-to-Noise Ratio)
• Refine quantization in enhancement layer

• Object Scalability:
• Selectively transmit or decode objects
MPEG-4: Shape Coding

• Important for object-based coding

• Methods:

• Bitmap-based
• Contour-based

• Context-based Arithmetic Encoding (CAE) for binary alpha planes

• Equation for context number:
$C_k = \sum_{i=0}^{9} c_i \cdot 2^i$
where $c_i$ are binary values of neighboring pixels
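A sketch of forming the context number from ten neighboring binary alpha-plane pixels; the neighbor ordering here is illustrative (the standard fixes a specific template).

```python
def context_number(neighbors):
    """C_k = sum_{i=0}^{9} c_i * 2^i, selecting the probability model
    used to arithmetic-code the current alpha-plane pixel."""
    assert len(neighbors) == 10 and all(c in (0, 1) for c in neighbors)
    return sum(c << i for i, c in enumerate(neighbors))

# Example: ten neighboring pixel values taken from a binary alpha plane (illustrative)
print(context_number([1, 0, 1, 1, 0, 0, 1, 0, 0, 1]))   # an index in [0, 1023]
```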



MPEG-4: Facial Animation

• Facial Definition Parameters (FDPs):


• Define shape and texture of face

• Facial Animation Parameters (FAPs):


• Control facial expressions

• 68 FAPs defined, e.g.:


• Jaw rotation
• Eye movement
• Lip deformation

• Equation for FAP interpolation:
$\text{FAP}(t) = \text{FAP}(t_1) + \dfrac{t - t_1}{t_2 - t_1}\,[\text{FAP}(t_2) - \text{FAP}(t_1)]$
