CTN521 - 3 PCM Coding
Overview of Today
• PCM
– Linear Sampling Techniques
– μ-law
• DPCM
• Generic Coding Techniques
• ADPCM
• MPEG-1 Psychoacoustic Coding
• Vocoding: Speech-Specific Techniques
Audio Signals
• Analog audio is a voltage that varies as a continuous function
of time.
• Unlike video, which is 3D (two spatial dimensions plus time), audio is a 1D signal.
– It can be captured without discretizing any spatial dimensions.
• Digitizing audio means sampling the signal in time and quantizing
each sample's level to a finite set of values.
• Digital audio parameters:
– bits per sample
– sampling rate
– number of channels.
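The three parameters above fix the raw (uncompressed) PCM bit rate as their product. A minimal sketch of that arithmetic follows; the function name `pcm_bitrate` and the two example configurations (CD audio, 8-bit telephony) are illustrative assumptions, not taken from the slides.

```python
def pcm_bitrate(bits_per_sample, sampling_rate_hz, channels):
    """Raw PCM bit rate in bits per second."""
    return bits_per_sample * sampling_rate_hz * channels

# Assumed example configurations:
cd_audio = pcm_bitrate(16, 44_100, 2)   # 16-bit stereo at 44.1 kHz
telephony = pcm_bitrate(8, 8_000, 1)    # 8-bit mono at 8 kHz

print(cd_audio, telephony)
```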
Sampling
[Figure: sampling of an input signal]
μ-law and A-law
• Non-linear quantization, called “companding”
• 8 bits companded provide a dynamic range
equivalent to 12 bits.
• μ-law and A-law are the companding standards
defined in ITU-T G.711
• They differ in the exact shape of the piecewise
linear companding function.
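To illustrate the idea of companding, the sketch below uses the continuous μ-law curve with μ = 255 on samples normalized to [-1, 1]. This smooth curve is the one that G.711 approximates with piecewise-linear segments; it is shown here as an assumption-laden illustration, not as the G.711 bit-exact procedure. The function names are hypothetical.

```python
import math

MU = 255  # value used for mu-law telephony (assumed here)

def compress(x):
    """Map a normalized sample in [-1, 1] onto the mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Small amplitudes get a disproportionately large share of the output range,
# so a uniform 8-bit quantizer applied after compress() resolves quiet
# passages more finely than loud ones.
for x in (0.001, 0.01, 0.1, 1.0):
    print(x, round(compress(x), 4))
```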
μ-law companding
Input range   Step size   Segment   Quantization   Code value
...
29-31         2           000       1111           15
31-35         4           001       0000           16
...
91-95         4           001       1111           31
95-103        8           010       0000           32
...
215-223       8           010       1111           47
223-239       16          011       0000           48
...
463-479       16          011       1111           63
μ-law Decoding
[Figure: sender/receiver pipeline. Sender: high-resolution PCM encoding (12, 14, 16 bits), table lookup, 8-bit μ-law encoding. Receiver: inverse table lookup, 14-bit decoding.]
μ-law code   Step size   Decoded value
...
0001111      2           30
0010000      4           33
...
0011111      4           93
0100000      8           99
...
0101111      8           219
0110000      16          231
...
0111111      16          471
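The encoding and decoding table lookups can also be computed arithmetically. The sketch below handles only the 7-bit magnitude part of the code (3 segment bits plus 4 quantization bits); the sign bit and the bit inversion applied on the wire are omitted. The bias of 33 and the clip value follow the usual G.711 μ-law convention and are assumptions relative to the slides, as are the function names. Input ranges are treated as half-open, so 31 belongs to the range 31-35.

```python
BIAS = 33             # shifts segment boundaries onto powers of two (assumed)
MAX_MAGNITUDE = 8158  # largest magnitude before clipping (assumed)

def mulaw_encode_magnitude(x):
    """Return the 7-bit code (segment << 4 | quantization) for 0 <= x."""
    biased = min(x, MAX_MAGNITUDE) + BIAS
    segment = biased.bit_length() - 6          # 0 for biased values 33..63
    quant = (biased >> (segment + 1)) & 0x0F   # 4 bits below the leading 1
    return (segment << 4) | quant

def mulaw_decode_magnitude(code):
    """Return the midpoint of the input range that maps to `code`."""
    segment, quant = code >> 4, code & 0x0F
    return (((quant << 1) + BIAS) << segment) - BIAS

for x in (30, 31, 94, 95, 222, 223, 470):
    code = mulaw_encode_magnitude(x)
    print(x, format(code, "07b"), mulaw_decode_magnitude(code))
```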
Difference Encoding
[Figure: sampled waveform plotted against quantization levels labelled 0100, 0011, 0010, 0001, 0000, 1001, 1010, 1011, 1100]
• Differential-PCM (DPCM)
– Exploit temporal redundancy in samples
– Difference between 2 x-bit samples can be represented
with significantly fewer than x-bits
– Transmit the difference (rather than the sample)
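The DPCM idea above can be sketched in a few lines. This version is lossless and uses the previous sample as the only prediction, with an assumed initial prediction of 0; the sample values are made up for illustration.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous one."""
    previous = 0  # assumed initial prediction
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples

samples = [100, 102, 105, 104, 101]   # hypothetical slowly varying signal
differences = dpcm_encode(samples)    # [100, 2, 3, -1, -3]
assert dpcm_decode(differences) == samples
```

After the first value, the differences stay within a few units even though the samples themselves are around 100, which is why they fit in fewer bits.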
Slope Overload Problem
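[Figure: waveform plotted against quantization levels 0100 to 1100, with the region labelled “Slope Overload” where the signal changes faster than the coded differences can follow]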
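Slope overload can be reproduced with a DPCM coder whose differences are limited to a fixed maximum. In this sketch the limit `max_step` and the input values are hypothetical; the encoder tracks the decoder's reconstruction so that errors do not accumulate beyond the lag itself.

```python
def dpcm_limited(samples, max_step):
    """Return the decoder's reconstruction when each transmitted
    difference is clamped to the range [-max_step, +max_step]."""
    reconstruction = 0
    output = []
    for sample in samples:
        difference = sample - reconstruction
        difference = max(-max_step, min(max_step, difference))
        reconstruction += difference
        output.append(reconstruction)
    return output

signal = [0, 1, 2, 10, 20, 20, 20]       # sudden rise after the third sample
print(dpcm_limited(signal, max_step=4))  # [0, 1, 2, 6, 10, 14, 18]
```

The reconstruction climbs by at most 4 per sample, so it lags behind the jump to 20 for several samples.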
Quantizer Step-Size
Difference range                                   Quantizer output   Multiple of step size
difference < 1/4 step_size                         000                0.0
1/4 step_size < difference < 1/2 step_size         001                0.25
1/2 step_size < difference < 3/4 step_size         010                0.50
3/4 step_size < difference < step_size             011                0.75
step_size < difference < 5/4 step_size             100                1.0
5/4 step_size < difference < 3/2 step_size         101                1.25
3/2 step_size < difference < 7/4 step_size         110                1.5
7/4 step_size < difference                         111                1.75
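The quantizer table amounts to measuring the difference in quarter steps and capping the result at 7. A minimal sketch, handling only the magnitude (a separate sign bit is assumed) and using hypothetical function names:

```python
def quantize_difference(difference, step_size):
    """Return the 3-bit quantizer output for |difference|."""
    quarter_steps = (4 * abs(difference)) // step_size
    return min(quarter_steps, 7)

def dequantize(code, step_size):
    """Return the reconstructed magnitude: code * 0.25 * step_size."""
    return code * step_size / 4

step_size = 16  # example value
for difference in (3, 5, 17, 100):
    code = quantize_difference(difference, step_size)
    print(difference, format(code, "03b"), dequantize(code, step_size))
```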
IMA Step-size Table
(Each pair of columns lists a table index and its step size.)
Index Size Index Size Index Size Index Size Index Size
0 7 18 41 36 230 54 1282 72 7132
1 8 19 45 37 253 55 1411 73 7845
2 9 20 50 38 279 56 1552 74 8630
3 10 21 55 39 307 57 1707 75 9493
4 11 22 60 40 337 58 1878 76 10442
5 12 23 66 41 371 59 2066 77 11487
6 13 24 73 42 408 60 2272 78 12635
7 14 25 80 43 449 61 2499 79 13899
8 16 26 88 44 494 62 2749 80 15289
9 17 27 97 45 544 63 3024 81 16818
10 19 28 107 46 598 64 3327 82 18500
11 21 29 118 47 658 65 3660 83 20350
12 23 30 130 48 724 66 4026 84 22385
13 25 31 143 49 796 67 4428 85 24623
14 28 32 157 50 876 68 4871 86 27086
15 31 33 173 51 963 69 5358 87 29794
16 34 34 190 52 1060 70 5894 88 32767
17 37 35 209 53 1166 71 6484
Adaptive Step-size Selection
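[Figure: ADPCM encoder block diagram. The previous PCM sample (n–1 register) is subtracted from the 16-bit PCM sample; the quantizer expresses the difference in step-size units as a 4-bit ADPCM code. A dequantizer and step-size adjuster rebuild the sample for the register. The quantizer output drives an index adjustment table lookup; the adjustment is added to the previous index register, range-limited to 0 to 88, and used for the step-size table lookup that yields the new step size.]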
Adaptive Step-size Selection
[Figure: step-size adaptation detail. The difference quantizer output selects an entry in the index adjustment table; the adjustment is added to the previous index register, range-limited to 0 to 88, and used for the step-size table lookup that yields the new step size.]
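The adaptation loop in the diagrams can be sketched as follows. Assumptions to note: the index adjustment values (-1, -1, -1, -1, 2, 4, 6, 8) come from the IMA ADPCM recommendation rather than from these slides; the quantizer follows the quarter-step table shown earlier (the IMA reference code additionally adds step/8 when reconstructing); and the initial prediction and index are both taken as 0. Function names are hypothetical.

```python
STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
    7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818, 18500,
    20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by the 3-bit magnitude

def _update(predicted, index, code):
    """Shared by encoder and decoder: apply one 4-bit code."""
    delta = (code & 7) * STEP_SIZES[index] // 4   # multiples of step/4
    if code & 8:                                  # bit 3 is the sign
        delta = -delta
    predicted = max(-32768, min(32767, predicted + delta))
    index = max(0, min(88, index + INDEX_ADJUST[code & 7]))  # range limit
    return predicted, index

def adpcm_encode(samples):
    """Return (4-bit codes, encoder-side reconstruction)."""
    predicted, index = 0, 0
    codes, reconstruction = [], []
    for sample in samples:
        difference = sample - predicted
        magnitude = min(4 * abs(difference) // STEP_SIZES[index], 7)
        code = (8 if difference < 0 else 0) | magnitude
        predicted, index = _update(predicted, index, code)
        codes.append(code)
        reconstruction.append(predicted)
    return codes, reconstruction

def adpcm_decode(codes):
    predicted, index = 0, 0
    output = []
    for code in codes:
        predicted, index = _update(predicted, index, code)
        output.append(predicted)
    return output

codes, reconstruction = adpcm_encode([1000] * 100)
assert adpcm_decode(codes) == reconstruction  # decoder mirrors the encoder
```

Large codes push the index up quickly (by up to 8 per sample), which is how the step size grows to follow a sudden jump; small codes let it decay by one index per sample.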
[Figure: frequency masking. Sound level (dB, 0 to 80) versus frequency (0.02 to 20 kHz), showing a masking tone, a masked tone beneath it, and the inaudible region.]
[Figure: encoder block diagram with a psychoacoustic model, frame packing, ancillary data, and the encoded bitstream.]
Subband Filter
• Transforms signal from time domain to
frequency domain.
– 32 PCM samples yield 32 subband samples.
• Each subband corresponds to a frequency band; the bands are evenly
spaced from 0 to the Nyquist frequency.
– Filter actually works on a window of 512
samples that is shifted over 32 samples at a time.
• Subband coefficients are analyzed with
psychoacoustic model, quantized, and coded.
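Because the 32 subbands are evenly spaced up to the Nyquist frequency, their edges follow directly from the sampling rate. A small sketch of that arithmetic; the 44.1 kHz rate is an assumed example and the function name is hypothetical.

```python
def subband_edges(sampling_rate_hz, num_subbands=32):
    """Return (low, high) frequency edges in Hz for each subband."""
    nyquist = sampling_rate_hz / 2
    width = nyquist / num_subbands
    return [(i * width, (i + 1) * width) for i in range(num_subbands)]

edges = subband_edges(44_100)
print(edges[0])    # lowest band
print(edges[-1])   # highest band, ending at the Nyquist frequency
```

At 44.1 kHz each subband is about 689 Hz wide.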
Layer 1
• 384 samples per frame.
• Iterative bit allocation process:
– For each subband, determine the mask-to-noise ratio (MNR).
– Increase number of quantization bits for
subband with smallest MNR.
– Iterate until all bits used.
• Fixed allocation of bits among subbands for
a particular frame.
• Up to 448 kb/s
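The iterative allocation can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the standard's exact procedure: MNR is taken as SNR minus the signal-to-mask ratio (SMR) supplied by the psychoacoustic model, SNR is approximated as 6.02 dB per quantization bit, each subband carries 12 samples per frame (384 / 32), and side information such as scale factors is ignored. All names and input values are hypothetical.

```python
def allocate_bits(smr_db, total_bits, samples_per_subband=12,
                  max_bits=15, snr_per_bit_db=6.02):
    """Greedily give one more bit per sample to the subband whose
    mask-to-noise ratio (MNR = SNR - SMR) is currently smallest."""
    bits = [0] * len(smr_db)
    remaining = total_bits
    while remaining >= samples_per_subband:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates,
                    key=lambda i: snr_per_bit_db * bits[i] - smr_db[i])
        bits[worst] += 1
        remaining -= samples_per_subband  # one extra bit for each sample
    return bits

# Three subbands with made-up SMR values and a budget of 48 bits.
print(allocate_bits([20.0, 5.0, -10.0], total_bits=48))  # [3, 1, 0]
```

The subband with the highest SMR (the one whose quantization noise would be most audible) receives bits first; the subband whose signal is already below the masking threshold receives none.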
Layer 2
• 1152 samples per frame.
• Iterative bit allocation.
• Subband allocation is dynamic.
• Up to 384 kb/s
Layer 3
• 1152 samples per frame.
• Up to 320 kb/s
• Each subband further analyzed using MDCT
to create 576 frequency lines.
– 4 different windowing schemes depending on
whether samples contain “attack” of new
frequencies.
• Lots of bit allocation options for quantizing
frequency coefficients.
• Quantized coefficients Huffman coded.
Vocoding
• Concept: Develop a mathematical
model of the vocal cords & throat
– Derive/compute model parameters for
a short interval and transmit to the
decoder
– Use the parameters to synthesize
speech at the decoder
[Figure: speech spectrum, amplitude (0 to 60) versus frequency (kHz)]
• Vocoding principles:
– voice = formants + buzz pitch & intensity
– voice – estimated formants = “residue”
• Linear Predictive Coding (LPC)
– A sample is represented as a linear combination of p
previous samples
$$ y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \, x(n) $$
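The LPC relation can be run directly as a synthesis filter: given coefficients a_k, a gain G, and an excitation x(n), it produces the output y(n). The sketch below is a plain-Python illustration; the coefficient, gain, and excitation values are made up, and samples before n = 0 are assumed to be zero.

```python
def lpc_synthesize(coefficients, gain, excitation):
    """Compute y(n) = sum_{k=1..p} a_k * y(n-k) + G * x(n)."""
    order = len(coefficients)  # p
    y = []
    for n, x in enumerate(excitation):
        value = gain * x
        for k in range(1, order + 1):
            if n - k >= 0:  # earlier samples are taken as zero
                value += coefficients[k - 1] * y[n - k]
        y.append(value)
    return y

# A single impulse through a first-order filter decays geometrically.
print(lpc_synthesize([0.5], gain=1.0, excitation=[1, 0, 0, 0]))
# [1.0, 0.5, 0.25, 0.125]
```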
LPC
• Decoder artificially generates speech via formant synthesis
– A mathematical simulation of the vocal tract as a series of bandpass
filters
– Encoder codes and transmits the filter coefficients, pitch period, gain
factor, and nature of the excitation
• Standards:
– Regular Pulse Excited Linear Predictive Coder (RPE-LPC)
• Digital cellular standard GSM 06.10 (13 kbps)
– Code Excited Linear Predictive Coder (CELP)
• US Federal Standard 1016 (4.8 kbps)
– Linear Predictive Coder (LPC)
• US Federal Standard 1015 (2.4 kbps)
Networking Concerns
• Audio bandwidth is actually quite small.
• But human sensitivity to loss and noise is
quite high.
• Networking concerns:
– Loss concealment
– Jitter control
• Especially for telephony applications.