
Basic Audio Compression Techniques

Aron Thomas
Some key concepts
1. Hearing Threshold
• It defines the minimum volume at which a sound is perceivable.
• Any audio samples below this threshold can be safely deleted.
• Ex. Soft background noise that is not audible can be removed without
affecting the perceived audio quality.

2. Frequency Masking
• Occurs when a sound that would normally be heard is masked by another
sound at a nearby frequency.
• Audio compression techniques employ this property to remove sounds that
are masked, thereby reducing the file size.
Some key concepts
3. Temporal Masking
• Occurs when a strong sound is preceded or followed in time by a weaker
sound at a nearby or same frequency.
• Affects sounds occurring closely in time.
• Used to eliminate inaudible sounds occurring around louder ones.

4. Critical Bands
• The range of audible frequencies can be split into a number of critical bands.
• Critical bands are determined according to the sound perception of the ear.
• There are 27 critical bands; audio compression techniques assess each
critical band for unnecessary sound data and remove it.
Audio Compression
• Audio compression reduces the storage size of audio files while
maintaining sound quality.
• Compression techniques can be lossy or lossless.
• Two features of audio compression:
• It can be lossy.
• It requires fast decoding.
• Audio compression methods are therefore asymmetric: the encoder may be
slow and complex, but the decoder must be fast and simple.
• This asymmetry is also why dictionary-based methods are not used for
audio compression.
Conventional Methods
• RLE can work well only in cases where there are long runs of the
same sample.
• 8-bit audio can produce such runs. However, 16-bit audio has higher
variability, so RLE will be ineffective.
• Statistical methods will not respond well to 8-bit audio, but may
respond well to 16-bit audio.
• Dictionary-based methods are not well suited for audio compression,
as audio can have minute changes between samples.
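A minimal RLE sketch (the helper name is illustrative, not from the slides) shows why run length matters:

```python
from itertools import groupby

def rle(samples):
    """Run-length encode a sample list into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(samples)]

# Quiet 8-bit audio tends to produce long runs of identical values,
# so the pair list is short. 16-bit samples rarely repeat exactly,
# so the encoded list can end up as long as the input.
print(rle([7, 7, 7, 0, 0, 5]))  # [(7, 3), (0, 2), (5, 1)]
```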
Lossy Audio Compression

• Works by discarding information that is not perceptible to the human ear.
• Audio may already lose some quality during digitization; if compression
is done right, playback quality may not be noticeably affected.
• Two common approaches to lossy audio compression are silence
compression and companding.
• Some lossy audio codecs are MP3, AAC, Ogg Vorbis, WMA, etc.
Silence Compression
• This method treats smaller amplitude sound samples as silence.
• This results in a sequence of consecutive zeroes, and thus this
method can be considered a variation of RLE.
• Audio that contains long segments of low-amplitude sound responds
well to silence compression.
• A user-defined parameter specifies the maximum amplitude that should
be treated as silence.
• Two other parameters are the minimum run length and the silence
termination condition.
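The method can be sketched in Python. The default threshold, minimum run length, and the ('S', count) token format are illustrative assumptions, not values from the slides; the slides' third parameter, the silence termination condition, is folded into the amplitude test here:

```python
def silence_compress(samples, max_amp=2, min_run=4):
    """RLE variant: collapse runs of near-silent samples into ('S', count) tokens.

    max_amp: maximum amplitude treated as silence (user-defined parameter).
    min_run: shortest run worth collapsing (minimum run length parameter).
    """
    out, i = [], 0
    while i < len(samples):
        j = i
        # extend j across consecutive near-silent samples
        while j < len(samples) and abs(samples[j]) <= max_amp:
            j += 1
        if j - i >= min_run:
            out.append(('S', j - i))   # one token replaces the whole run
            i = j
        else:
            out.append(samples[i])     # loud sample (or too-short run) kept as-is
            i += 1
    return out

print(silence_compress([5, 0, 1, 0, 0, 7]))  # [5, ('S', 4), 7]
```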
Companding
• This method takes advantage of how the human ear works.
• The ear has greater precision at lower amplitudes and can tolerate
more error at higher amplitudes.
• Companding applies a non-linear formula to reduce the number
of bits per sample.
• Ex. 16-bit audio has 65,536 distinct sample values. Companding can
apply a formula that non-linearly maps these values to 15-bit
numbers, reducing the range to [0, 32,767]
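As an illustration of such a mapping, the sketch below uses a square-root curve. The curve and function name are assumptions for illustration, not the formula implied by the slides; any monotonic curve that spends more output codes on low amplitudes would serve:

```python
import math

def compand_16_to_15(x):
    """Map an unsigned 16-bit value [0, 65535] to [0, 32767] non-linearly.

    A square-root curve (illustrative choice) assigns proportionally more
    output codes to low amplitudes, where the ear needs more precision:
    the bottom 1% of the input range covers roughly 10% of the output range.
    """
    return round(math.sqrt(x / 65535) * 32767)

print(compand_16_to_15(0), compand_16_to_15(655), compand_16_to_15(65535))
```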
Companding
• More complex methods exist, including μ-law and A-law companding;
both have been designated as international standards.
• The μ-law encoder takes a 14-bit signed input sample and
outputs 8-bit code words, whereas the A-law encoder takes a 13-bit
signed input sample and outputs 8-bit code words.
• The data is first normalized to the range [-1, 1], the companding
formula is applied, and the result is then scaled to the range [-256, 256]
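The μ-law curve itself can be sketched with the standard continuous formula (μ = 255, as standardized in ITU-T G.711). Real codecs use a segmented piecewise-linear approximation of this curve, so this is a sketch of the mapping, not of the full encoder:

```python
import math

MU = 255  # mu-law constant from ITU-T G.711

def mu_law_compress(x):
    """Map a normalized sample x in [-1, 1] to [-1, 1] non-linearly,
    expanding low amplitudes so they survive coarse quantization."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse mapping, applied at the decoder."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# A quiet sample at 1% of full scale is boosted to ~23% of the output range.
print(mu_law_compress(0.01))
```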
ADPCM Audio Compression
• Adjacent audio samples tend to be similar, so one can code the
difference between successive samples rather than absolute values.
• Such a compression method is referred to as Differential Pulse
Code Modulation (DPCM).
• ADPCM stands for Adaptive Differential Pulse Code Modulation.
• It employs linear prediction: it uses previous samples to predict the
current sample, computes the difference between prediction and actual
value, and quantizes that difference.
• Decoding multiplies the quantized code by the quantization step to
recover the difference, which is then added to the predicted value.
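The encode/decode loop can be sketched in Python. This sketch uses a fixed quantization step of 4 and transmits the first sample verbatim, both simplifying assumptions; the slides' 4-bit quantizer is adaptive and produces slightly different dequantized values:

```python
def adpcm_encode(samples, step=4):
    """DPCM-style encoder with a fixed quantization step (simplification:
    real ADPCM adapts the step size as it runs)."""
    first = samples[0]            # first sample is sent verbatim
    xp, codes = first, []
    for x in samples[1:]:
        d = x - xp                # difference from the predicted (previous) value
        c = round(d / step)       # quantize the difference
        codes.append(c)
        xp += c * step            # predictor tracks the *reconstructed* value,
                                  # so quantization error does not accumulate
    return first, codes

def adpcm_decode(first, codes, step=4):
    xp, out = first, [first]
    for c in codes:
        xp += c * step            # dequantize and add to the prediction
        out.append(xp)
    return out

f, c = adpcm_encode([1000, 1012, 1020, 1008])
print(adpcm_decode(f, c))  # [1000, 1012, 1020, 1008]
```

Here every difference happens to be a multiple of the step, so reconstruction is exact; in general each sample is off by at most half a step.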
ADPCM Audio Compression
• Consider 16-bit input samples [1000, 1012, 1020, 1008]
Encoding

X[n]    Xp[n−1]   D[n]=X[n]−Xp[n−1]   C[n] (quantized)   Dq[n] (dequantized)   Xp[n]
1000    1000        0                 0000                 0                    1000
1012    1000       12                 0011                10                    1010
1020    1010       10                 0010                 8                    1018
1008    1018      −10                 1101                −8                    1010

Decoding

C[n]    Xp[n−1]   Dq[n]   X[n] (reconstructed)
0000    1000        0     1000
0011    1000       10     1010
0010    1010        8     1018
1101    1018       −8     1010
Lossless Audio Compression
• Lossless audio compression techniques work by removing
redundancies in the audio signal.
• Losslessly compressed files are larger than those produced by lossy
compression methods.
• Such files are used in music production, Blu-ray audio, and archival
and long-term storage of audio.
• Some lossless audio codecs include FLAC, ALAC, MLP.
MLP Audio
• Stands for Meridian Lossless Packing.
• Developed by Meridian Audio, it compresses high-fidelity digital
audio by eliminating redundancy, without any loss of data.
• It supports sample rates up to 192 kHz and up to 63 channels, and is
optimized for DVD-Audio.
• It can also handle variable sample rates.
MLP Audio
• Steps include:
1. Lossless Processing
- Removes any unnecessary information in the audio signals

2. Matrixing
- Redundancy between similar audio signals across channels is removed using an affine transformation matrix.

3. IIR Filtering
- Predicts next samples based on the previous ones and stores the differences.

4. Entropy Encoding
- These differences are further compressed using entropy encoding through variable-length codes.

5. FIFO buffering
- This output is put into a FIFO buffer to smooth data output.

• Output from the buffer is divided into packets, check bits and restart
points are added.
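As an illustration of the matrixing step, here is a simple, exactly invertible mid/side transform for a stereo pair. This sketches the idea of removing inter-channel redundancy with integer arithmetic; MLP's actual matrix coefficients are format-specific and not given in the slides:

```python
def ms_encode(left, right):
    """Decorrelate stereo channels into mid (average) and side (difference).

    When left and right are similar, the side channel is near zero and
    compresses well. Integer shifts keep the transform exactly invertible.
    """
    mid = [(l + r) >> 1 for l, r in zip(left, right)]   # floor of the average
    side = [l - r for l, r in zip(left, right)]         # inter-channel difference
    return mid, side

def ms_decode(mid, side):
    """Exact inverse: recover left from mid plus half the side (rounded up)."""
    left = [m + ((s + 1) >> 1) for m, s in zip(mid, side)]
    right = [l - s for l, s in zip(left, side)]
    return left, right

m, s = ms_encode([5, 2, -3], [2, 5, 2])
print(ms_decode(m, s))  # ([5, 2, -3], [2, 5, 2])
```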
Shorten
• A simple, special-purpose, lossless compressor for waveform files.
• Works on any file whose samples repeatedly rise and fall, like a waveform.
• Performs best on low-amplitude, low-frequency waveforms.
• Compression process:
1. Partition the audio into blocks.
2. Predict each sample using the previous sample.
3. Encode the differences using a variable-size code.

• If there are multiple channels, Shorten first separates them before
compression.
• Predictors of different orders are available: zeroth-order, first-order,
and so on.
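The low-order predictors can be sketched as follows. The order-0 through order-3 formulas are Shorten's standard polynomial predictors; the helper name is illustrative:

```python
def residuals(x, order):
    """Prediction residuals using Shorten's polynomial predictors.

    Order 0 predicts zero; order 1 predicts the previous sample; orders 2
    and 3 extrapolate a line / parabola through the preceding samples.
    The encoder keeps whichever order yields the smallest residuals.
    """
    predict = {
        0: lambda x, n: 0,
        1: lambda x, n: x[n-1],
        2: lambda x, n: 2*x[n-1] - x[n-2],
        3: lambda x, n: 3*x[n-1] - 3*x[n-2] + x[n-3],
    }[order]
    return [x[n] - predict(x, n) for n in range(order, len(x))]

ramp = [0, 2, 4, 6, 8]
print(residuals(ramp, 1))  # [2, 2, 2, 2]
print(residuals(ramp, 2))  # [0, 0, 0]  -- a straight line is predicted exactly
```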
FLAC
• Stands for Free Lossless Audio Codec.
• It is open source and optimized for real-time playback.
• It is supported on multiple platforms including Windows, Linux, macOS,
BeOS, OS/2, and Unix-based systems.
• Compression process:
1. Partition the audio into blocks.
2. Predict each sample using the previous sample.
3. Encode the differences using Rice codes.
4. Store metadata (sampling rate, channels, etc.).

• FLAC also provides robust error detection and metadata support
through MD5 signatures, CRC checksums, seek tables, tags, and cuesheets.
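Step 3, Rice coding of the residuals, can be sketched as follows. The bit convention here (a unary run of 1s terminated by a 0, then k remainder bits) and the zigzag sign mapping are illustrative choices and may differ in detail from the FLAC specification:

```python
def rice_encode(residual, k):
    """Rice code with parameter k: unary quotient + k-bit binary remainder.

    Small residuals (the common case after good prediction) get short codes.
    """
    # zigzag-map signed to unsigned: 0,-1,1,-2,2,... -> 0,1,2,3,4,...
    u = 2 * residual if residual >= 0 else -2 * residual - 1
    q, r = u >> k, u & ((1 << k) - 1)
    return '1' * q + '0' + format(r, f'0{k}b')

def rice_decode(bits, k):
    q = 0
    while bits[q] == '1':          # count the unary quotient
        q += 1
    u = (q << k) | int(bits[q + 1:q + 1 + k] or '0', 2)
    return u // 2 if u % 2 == 0 else -(u + 1) // 2   # undo the zigzag map

print(rice_encode(4, 2))   # '11000'
print(rice_encode(-1, 2))  # '001'
```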
