Lecture 10 mp3
MP3 is a digital audio format that is used to compress and store audio files
in a way that reduces their size while maintaining a high level of sound
quality. It was developed by the Moving Picture Experts Group (MPEG) and
has become a popular format for storing and playing music on computers
and other digital audio players.
The MP3 format uses a compression algorithm that reduces the size of audio
files by removing parts of the audio signal that are not perceived by the
human ear. This compression allows for more efficient storage and
transmission of audio files without significant loss of sound quality.
One of the main advantages of MP3 is its small file size, which allows for the
storage of many audio files on a single device or storage medium. However,
this compression also means that some of the original audio data is lost,
which can affect the overall sound quality.
Despite the emergence of newer audio formats, MP3 remains a popular and
widely used format for digital audio.
Perceptual coding
MP3 encoding offers massive savings in file size, especially at lower bit rates
such as 128 kbps. This mattered when our bandwidth speeds were extremely low on
dial-up modems and we might even have had bandwidth caps for the month. Instead of
waiting days to download a song, we could do it in a couple of hours (and in the
present, a couple of seconds!).
MP3s have kept their presence thanks to MP3 players like the iPod. These devices have
limited hard drive or flash storage, so with compression we can carry around a
lot more music.
Plus there’s no need for full resolution files when we’re doing yard work or at the
gym using tiny sports earphones. It’s also a huge space and bandwidth saver for
online streaming services.
The brain uses certain tricks like auditory masking to allocate resources and
attention to what is the most important sound happening at any given time. Using
this info, we know what we can get rid of, data-wise.
The first and easiest saving is to cut out a certain frequency range
if the music allows for it. Adults begin to lose their capacity for hearing above
16-18 kHz, whereas the upper limit of human hearing is around 20 kHz. At that level
there’s not a lot going on in terms of intelligibility. It’s just “sparkle, shine, sheen.”
In most cases, we don’t need to keep it at all, or at least we can encode it into the
MP3 file at a lower resolution.
So what the encoder algorithm does is ignore, or at least allocate much less data to,
quieter sounds that are masked by louder ones, since we won’t perceive them anyway.
A bit depth of 16 leaves plenty of headroom and provides a high signal-to-noise ratio.
It means that every sample is encoded with 16 binary digits (each a 0 or a 1).
By dropping from 24 bit to 16 bit we’ve already saved a third (about 33%) of the size
with no discernible quality difference.
Speaking of samples having 16 bits each… that’s another place massive savings
are made. Sample rates can get as high as 96,000 samples per second or more! 44.1 kHz
is your typical sample rate for MP3s and that’s still a ton of samples per second, but
it represents a drop of roughly 54% in the amount of data being stored versus a
96 kHz sample rate.
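The savings from reducing bit depth and sample rate are plain ratios, so they can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `relative_saving` is made up for this example, and it covers raw sample data only, before any MP3 compression is applied.

```python
def relative_saving(original: float, reduced: float) -> float:
    """Fraction of data saved when going from `original` to `reduced`."""
    return 1.0 - reduced / original

# Bits per sample: 24 -> 16
bit_depth_saving = relative_saving(24, 16)

# Samples per second: 96,000 -> 44,100
sample_rate_saving = relative_saving(96_000, 44_100)

# Both reductions together (bits per second for one channel)
combined_saving = relative_saving(24 * 96_000, 16 * 44_100)

print(f"24 bit -> 16 bit:   {bit_depth_saving:.1%}")
print(f"96 kHz -> 44.1 kHz: {sample_rate_saving:.1%}")
print(f"both together:      {combined_saving:.3f} of the data removed")
```

The two reductions multiply rather than add, which is why the combined saving is larger than either one alone but smaller than their sum.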
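The basic idea is that a lower sample rate captures fewer “snapshots” of each moment
of music. You can think of it like a movie or a video game at 60 frames-per-second
versus the typical 24 fps. 24 is more than good enough but 60 looks great during
fast action scenes. It works in a similar way for music and sample rates: the sample
rate sets the highest frequency that can be captured, which is half the sample rate,
so 44.1 kHz covers frequencies up to about 22 kHz.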
And finally we set a limit to the data throughput. This takes into account everything
mentioned above and then sets a ceiling on how much data you can send at once.
Most MP3 streaming and selling services use CBR, which is a constant bit rate,
usually of 128 kilobits per second (kbps).
Other common options are 192 kbps and 320 kbps, which is the highest available
on MP3 and, for most listeners, very close to uncompressed audio quality. Some
streaming services will only send 64 kbps and you can definitely tell. Quality takes
a serious drop below 128 kbps.
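To get a feel for what these bit rates mean in storage terms, the sketch below estimates the size of a constant-bit-rate file. The function name `mp3_size_megabytes` and the four-minute song length are assumptions chosen for illustration, and the estimate ignores tags and frame headers.

```python
def mp3_size_megabytes(bitrate_kbps: float, duration_seconds: float) -> float:
    """Approximate size of a constant-bit-rate stream, ignoring metadata."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000  # bits -> bytes -> megabytes (10^6 bytes)

SONG_SECONDS = 240  # assumed four-minute song

for kbps in (64, 128, 192, 320):
    size = mp3_size_megabytes(kbps, SONG_SECONDS)
    print(f"{kbps:>3} kbps: {size:.2f} MB")
```

Because the bit rate is constant, size grows linearly with both bit rate and duration, which is what makes CBR easy to plan for.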
Constant bit rates are preferable for these services and consumers because they help
them predict their bandwidth and storage needs. But advances have been made
for personal use, such as VBR, which is a variable bit rate.
VBR allows a lower bit rate during quiet parts of a song and a higher bit
rate during louder or more complex parts. This is preferable for those who
want the highest quality audio but still desire the data savings of MP3s.
The lossy compression allows great savings in file size, with the average MP3 file
being 90% smaller than an equivalent uncompressed audio file.
Like all lossy compressed files, savings in size are made by deleting data that the
computer believes is redundant and will not be missed by the user.
Perceptual music shaping refers to the process of removing inaudible sounds in order
to make a file size smaller.
Bitrate
In audio files, the bitrate is the number of bits that need to be processed every
second. This is measured in kilobits per second (kbps).
For uncompressed audio, the bitrate is calculated by multiplying the sample rate by
the bit depth and the number of audio channels. For an MP3, the bitrate is instead
chosen when the file is encoded.
The bigger the bitrate, the better the sound quality, but the larger the file size.
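As a worked example of the bitrate calculation for uncompressed audio, the sketch below uses CD-style settings (44.1 kHz, 16 bit, stereo) and compares the result with a 128 kbps MP3. The function name `pcm_bitrate_kbps` is hypothetical, and the CD-style parameters are an assumption chosen for illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed audio: sample rate x bit depth x channels."""
    return sample_rate_hz * bit_depth * channels / 1000

# CD-style audio: 44.1 kHz, 16 bits per sample, two channels
cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)

# Share of the data removed by encoding at 128 kbps instead
mp3_kbps = 128
reduction = 1 - mp3_kbps / cd_kbps

print(f"Uncompressed: {cd_kbps} kbps")
print(f"MP3 at {mp3_kbps} kbps is about {reduction:.0%} smaller")
```

Under these assumptions the result lines up with the figure of MP3 files being roughly 90% smaller than the uncompressed original.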
Sample Rate
Bit Depth
Audio with a higher bit depth has a wider dynamic range (the gap between the quietest
and loudest sounds that can be represented), giving a more accurate recording of the
audio performance. However, the higher the bit depth, the greater the file size.
Aggressive compression will allow you to squeeze more tunes onto your device, but
what’s the point if they sound terrible?
On the other hand, what’s the point of downloading large, slightly compressed files, if
you only intend to play them through cheap speakers and cannot tell the difference
anyway?
When streaming audio, we want it to download quickly and sound amazing, but
unfortunately we can’t have it all!
The Algorithm
As the name suggests (MP3 is short for MPEG Audio Layer 3), MPEG audio defines
three distinct layers for compression. Layer 1 forms the most basic algorithm
and the other two layers enhance Layer 1.
This section summarizes the main differences between the layers while the
following sections delve into details about the stages.
The Layer 2 algorithm enhances Layer 1 by coding data in larger groups and
imposing restrictions on bit allocations for values in higher subbands. Where
Layer 1 codes each subband in groups of twelve samples, the Layer 2 encoder
codes three such groups together (36 samples per subband). See the figure
below. Additionally, Layer 2 saves bits by representing the bit allocation,
scale factor values, and quantized samples with more compact code (to quantize
means to limit the possible values of a magnitude or quantity to a discrete
set of values). This allows for more bits to be dedicated to improving audio
quality.
Typical compressed size as a share of the original:
Layer 1: 25%
Layer 3: 10% to 8%
Bit Allocation
Through an iterative algorithm, the bit allocation uses information from the
psychoacoustic model to determine the number of code bits to be allocated
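to each subband. This process can be described using the following formula:
MNRdB = SNRdB - SMRdB
where:
MNRdB is the mask-to-noise ratio of the subband,
SNRdB is the signal-to-noise ratio, given with the MPEG audio standard in
a table, and
SMRdB is the signal-to-mask ratio, obtained from the psychoacoustic model.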
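The iterative allocation can be sketched as a greedy loop: take the mask-to-noise ratio of each subband as its signal-to-noise ratio minus its signal-to-mask ratio, then repeatedly give one more bit per sample to the subband where that value is lowest. This is a simplified illustration, not the procedure from the standard. The function name `allocate_bits`, the rule of thumb of about 6.02 dB of SNR per bit (used here in place of the standard's table), and the example signal-to-mask values are all assumptions.

```python
DB_PER_BIT = 6.02  # assumption: rough SNR gain per bit, replaces the standard's table

def allocate_bits(smr_db, bit_pool, samples_per_band=12, max_bits=15):
    """Greedily assign bits per sample to subbands.

    smr_db   -- signal-to-mask ratio per subband (from a psychoacoustic model)
    bit_pool -- total number of code bits available for the samples
    """
    bits = [0] * len(smr_db)

    def mnr(band):
        # Mask-to-noise ratio: estimated SNR minus signal-to-mask ratio
        return bits[band] * DB_PER_BIT - smr_db[band]

    # Each extra bit for a subband costs one bit for each of its samples
    while bit_pool >= samples_per_band:
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)  # subband with the most audible noise
        bits[worst] += 1
        bit_pool -= samples_per_band
    return bits

# Hypothetical signal-to-mask ratios for four subbands, in dB
example_smr = [20.0, 5.0, -3.0, 12.0]
allocation = allocate_bits(example_smr, bit_pool=72)
print(allocation)
```

Subbands whose signal sits well above the masking threshold receive the most bits, while a subband that is already masked (negative signal-to-mask ratio) may receive none. A real encoder also has to budget bits for scale factors and the allocation information itself, which this sketch leaves out.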
In MP3 encoders, two nested iteration loops, called the rate loop and the
distortion loop, handle quantization and coding. The quantized values are
coded using Huffman coding, which is lossless.
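To show why Huffman coding is lossless, here is a minimal sketch that builds a prefix code from a list of quantized values, encodes them, and decodes them back to the exact same list. It is a generic textbook version: MP3 encoders select from predefined Huffman tables in the standard rather than building a code from the data, and the function names and sample values here are made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix code: first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical quantized values: mostly zeros, as is common after quantization
quantized = [0, 0, 0, 0, 1, 1, -1, 2, 0, 0, 1, 0]
code = huffman_code(quantized)
bitstream = encode(quantized, code)
restored = decode(bitstream, code)
print(len(bitstream), "bits;", "round trip OK" if restored == quantized else "mismatch")
```

Frequent values get short codes and rare values get long ones, so the bitstream shrinks without discarding anything. All of the loss in MP3 happens earlier, at the quantization step.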