Basic Audio Compression Techniques
Aron Thomas
Some key concepts
1. Hearing Threshold
• It defines the minimum volume at which a sound is perceivable.
• Any sound below this threshold can be safely discarded.
• Ex. Soft background noise that is not audible can be removed without affecting the perceived audio quality.
2. Frequency Masking
• Occurs when a sound that would normally be audible is masked by another, louder sound at a nearby frequency.
• Audio compression techniques employ this property to remove sounds that
are masked, thereby reducing the file size.
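As a rough illustration of discarding inaudible content, the sketch below zeroes spectral bins that fall far below the loudest component. A single flat threshold is assumed here for simplicity; real codecs use a frequency-dependent hearing threshold that is raised further around strong maskers.

```python
import numpy as np

def drop_inaudible_bins(signal, threshold_db=-60.0):
    """Zero spectral bins whose level falls below a threshold.
    A flat threshold is an assumption; real codecs use a
    frequency-dependent threshold raised further by masking."""
    spectrum = np.fft.rfft(signal)
    mag = np.abs(spectrum)
    ref = mag.max() if mag.max() > 0 else 1.0
    level_db = 20 * np.log10(np.maximum(mag / ref, 1e-12))
    spectrum[level_db < threshold_db] = 0   # inaudible -> safe to delete
    return np.fft.irfft(spectrum, n=len(signal))

# A loud 440 Hz tone plus a very soft 5 kHz tone: the soft tone
# sits below the threshold and is removed entirely.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 1e-5 * np.sin(2 * np.pi * 5000 * t)
y = drop_inaudible_bins(x)
```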
3. Temporal Masking
• Occurs when a strong sound is preceded or followed in time by a weaker
sound at a nearby or same frequency.
• Affects sounds occurring closely in time.
• Used to eliminate inaudible sounds occurring around louder ones.
4. Critical Bands
• The range of audible frequencies can be divided into critical bands.
• Critical bands are determined according to the sound perception of the ear.
• There are 27 critical bands; audio compression techniques assess each critical band for inaudible sound data and remove it.
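Critical bands are narrow at low frequencies and wide at high ones. One common way to model this is Zwicker's Bark-scale approximation, sketched here (the exact band count and edges vary between models, so this is illustrative):

```python
import math

def bark(f_hz):
    """Zwicker's approximation of the Bark (critical-band) scale:
    one integer step corresponds roughly to one critical band."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_band(f_hz):
    """Index of the critical band containing frequency f_hz."""
    return int(bark(f_hz))

# Bands widen with frequency: 100 Hz and 300 Hz fall in different
# bands, while 5000 Hz and 5200 Hz share one.
```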
Audio Compression
• Audio compression reduces the storage size of audio files while preserving perceived sound quality.
• Compression techniques can be lossy or lossless.
• Two features of audio compression:
• It can be lossy.
• It requires fast decoding.
• Audio compression methods are therefore asymmetric: encoding may be slow, but decoding must be fast.
• This is also the reason why dictionary-based methods are not used for audio compression.
Conventional Methods
• RLE works well only where there are long runs of identical samples.
• 8-bit audio can produce such runs. 16-bit audio, however, has higher sample variability, so RLE is ineffective on it.
• Statistical methods do not respond well to 8-bit audio, but may respond well to 16-bit audio.
• Dictionary-based methods are not well suited for audio compression, as perceptually similar audio can differ by minute changes in its samples.
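The RLE point can be seen directly: coarse 8-bit quantization flattens tiny variations into runs, while 16-bit resolution preserves them. A minimal run-length encoder (the sample values below are invented for illustration):

```python
def rle_encode(samples):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs

# A quiet passage at 8-bit resolution: small variations collapse
# into long runs of identical samples.
eight_bit = [128, 128, 128, 128, 129, 129, 128, 128]
# The same passage at 16-bit resolution keeps the tiny variations,
# so almost every sample differs from its neighbour.
sixteen_bit = [32768, 32771, 32769, 32774, 32770, 32773, 32772, 32775]
```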
Lossy Audio Compression
• The data is first normalized to the range [−1, 1]; the compression formula is then applied, and the result is scaled to the range [−256, 256].
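The slide does not name the formula; μ-law companding (with μ = 255) is one common choice consistent with normalizing to [−1, 1] and scaling to [−256, 256], sketched here as an assumption:

```python
import math

MU = 255  # companding parameter; assumed, the slide does not name the formula

def compress(sample, peak=32768):
    """Normalize to [-1, 1], apply mu-law companding, scale to [-256, 256]."""
    x = sample / peak                                            # [-1, 1]
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(256 * y)                                        # [-256, 256]

def expand(code, peak=32768):
    """Approximately invert compress() (exact up to rounding)."""
    y = code / 256
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return x * peak
```

The logarithmic curve spends more of the coarse output range on quiet samples, where the ear is most sensitive to error.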
ADPCM Audio Compression
• Adjacent audio samples tend to be similar. Thus, one can code the differences between successive samples rather than their absolute values.
• Such a kind of compression method is referred to as Differential Pulse
Code Modulation.
• ADPCM stands for Adaptive Differential Pulse Code Modulation.
• It employs linear prediction. It uses previous samples to predict the
current sample, computes the difference between them, and
quantizes the difference.
• Decoding is done by multiplying each code by the quantization step and adding the resulting difference to the predicted value.
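A minimal sketch of this predict/quantize loop, assuming a fixed uniform quantizer for clarity (real ADPCM adapts the step size, and the worked example below uses a different quantizer, so its codes differ):

```python
STEP = 4  # quantization step; fixed here, adapted in real ADPCM

def encode(samples):
    """Code each sample as the quantized difference from the
    predicted (previous reconstructed) sample."""
    codes, predicted = [], samples[0]
    for x in samples:
        d = x - predicted
        c = round(d / STEP)      # quantize the difference
        codes.append(c)
        predicted += c * STEP    # track what the decoder will see
    return samples[0], codes

def decode(first, codes):
    """Multiply each code by the step, add it to the running prediction."""
    out, predicted = [], first
    for c in codes:
        predicted += c * STEP    # dequantize and accumulate
        out.append(predicted)
    return out
```

Predicting from the *reconstructed* sample (not the original) keeps encoder and decoder in lockstep, so quantization error does not accumulate.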
• Consider 16-bit input samples [1000, 1012, 1020, 1008]
Encoding
X[n]   Xp[n−1]   D[n]=X[n]−Xp[n−1]   Quantized C[n]   Dequantized Dq[n]   Xp[n]
1000   1000        0                 0000                0                1000
1012   1000       12                 0011               10                1010
1020   1010       10                 0010                8                1018
1008   1018      −10                 1101               −8                1010
Decoding
C[n]   Xp[n−1]   Dq[n]   X[n] (reconstructed)
0000   1000        0     1000
0011   1000       10     1010
0010   1010        8     1018
1101   1018       −8     1010
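The decoding column can be checked by accumulating the dequantized differences Dq[n] onto the running prediction:

```python
def adpcm_decode(start, dq_values):
    """Reconstruct each sample as Xp[n-1] + Dq[n]."""
    out, xp = [], start
    for dq in dq_values:
        xp += dq
        out.append(xp)
    return out

# Dq column from the decoding table, starting prediction 1000
result = adpcm_decode(1000, [0, 10, 8, -8])  # -> [1000, 1010, 1018, 1010]
```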
Lossless Audio Compression
• Lossless audio compression techniques work by removing redundancies in the audio signal.
• The resulting files are larger than those produced by lossy compression methods.
• Such files are used in music production, Blu-ray audio, and archival/long-term storage of audio.
• Some lossless audio codecs include FLAC, ALAC, and MLP.
MLP Audio
• Stands for Meridian Lossless Packing.
• Developed by Meridian Audio, it compresses high-fidelity digital audio by eliminating redundancy without loss of data.
• It supports up to 192 kHz sample rates and 63 channels, and is optimized for DVD-Audio.
• It can also handle variable sample rates.
• Steps include:
1. Lossless Processing
- Removes any unnecessary information in the audio signals
2. Matrixing
- Redundancy between similar audio signals across channels is removed using an affine transformation matrix.
3. IIR Filtering
- Predicts next samples based on the previous ones and stores the differences.
4. Entropy Encoding
- These differences are further compressed using entropy encoding through variable-length codes.
5. FIFO buffering
- This output is put into a FIFO buffer to smooth data output.
• Output from the buffer is divided into packets, and check bits and restart points are added.
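MLP's actual matrices are defined by the format specification; as a minimal illustration of lossless inter-channel decorrelation (step 2), one can store one channel plus the difference, which is exactly invertible:

```python
def matrix(left, right):
    """Replace (L, R) with (L, L - R): when channels are similar,
    the difference channel is near zero and entropy-codes cheaply."""
    return left, [l - r for l, r in zip(left, right)]

def dematrix(left, side):
    """Exactly invert the transform: R = L - (L - R)."""
    return left, [l - s for l, s in zip(left, side)]
```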
Shorten
• A simple, special-purpose, lossless compressor for waveform files.
• Works on any file whose samples rise and fall like a waveform.
• Performs best on low-amplitude, low-frequency waveforms.
• Compression process:
1. Partition the audio into blocks.
2. Predict each sample using the previous sample.
3. Encode the differences using a variable-size code.
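The steps above can be sketched as follows. Shorten's actual predictors and code parameters are configurable; a first-order predictor and a Rice code with an assumed parameter k are used here:

```python
def zigzag(d):
    """Map signed residuals to non-negative ints: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return (d << 1) if d >= 0 else ((-d << 1) - 1)

def rice_encode(residuals, k):
    """Rice code: unary quotient, then k binary remainder bits per symbol."""
    bits = []
    for d in residuals:
        u = zigzag(d)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.extend([1] * q + [0])                        # unary part
        bits.extend((r >> i) & 1 for i in range(k - 1, -1, -1))
    return bits

def compress_block(block, k=2):
    """Predict each sample from the previous one, Rice-code the residuals."""
    residuals = [block[0]] + [block[i] - block[i - 1] for i in range(1, len(block))]
    return rice_encode(residuals, k)

# A smooth, low-amplitude block: small residuals yield short codes.
bits = compress_block([0, 1, 2, 2, 1, 0, 1, 2])
```

Small residuals produce short unary prefixes, which is why prediction before entropy coding pays off on smooth waveforms.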