Lecture 10 mp3

MP3 is a popular digital audio format that uses lossy compression to greatly reduce file sizes while maintaining good sound quality. It works by removing aspects of the audio that are inaudible to humans, such as sounds above the range of human hearing or quiet sounds masked by louder ones. This allows for more efficient storage and streaming of music files. While some audio quality is lost, standard MP3 bitrates provide near-CD quality sound at much smaller file sizes than uncompressed formats. MP3 remains widely supported despite newer formats due to its balance of quality and efficiency.

Uploaded by

Mido Alaa

MP3

MP3 is a digital audio format that is used to compress and store audio files
in a way that reduces their size while maintaining a high level of sound
quality. It was developed by the Moving Picture Experts Group (MPEG) and
has become a popular format for storing and playing music on computers
and other digital audio players.

The MP3 format uses a compression algorithm that reduces the size of audio
files by removing parts of the audio signal that are not perceived by the
human ear. This compression allows for more efficient storage and
transmission of audio files without significant loss of sound quality.

MP3 files can be played on a wide range of devices, including computers,
smartphones, and portable audio players. They can also be streamed over
the internet or downloaded from online music stores.

One of the main advantages of MP3 is its small file size, which allows for the
storage of many audio files on a single device or storage medium. However,
this compression also means that some of the original audio data is lost,
which can affect the overall sound quality.

Despite the emergence of newer audio formats, MP3 remains a popular and
widely used format for digital audio.

Perceptual coding

What MP3 Compression Is & Why It Exists


Nobody cared about this stuff when we were working in the analog field. We had
vinyl records, 8-tracks, and cassette tapes, and later compact discs (these are
digital but didn’t need compression). MP3s became a “thing” after the explosion of
the internet.

MP3 encoding represents massive savings: roughly 11:1 against CD audio at a 128
kbps bit rate.

An uncompressed wave file might be as big as 30 MB for a typical 3 minute song.
But after being run through the MP3 compression algorithms that might drop down
to 3 MB without any serious loss of quality.
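
The 30 MB and 3 MB figures can be sanity-checked with a little arithmetic. The sketch below assumes CD-style audio (44.1 kHz, 16 bit, stereo) and a 128 kbps constant bit rate. The function names are made up for this example and the small file headers are ignored.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size of uncompressed PCM audio, ignoring the small WAV header."""
    return seconds * sample_rate * bit_depth * channels // 8


def cbr_size_bytes(seconds, kbps=128):
    """Approximate size of a constant-bit-rate stream (1 kbps = 1000 bits/s)."""
    return seconds * kbps * 1000 // 8


song = 180  # a 3 minute song, in seconds
wav = pcm_size_bytes(song)
mp3 = cbr_size_bytes(song)

print(f"WAV: {wav / 1e6:.1f} MB")     # WAV: 31.8 MB
print(f"MP3: {mp3 / 1e6:.1f} MB")     # MP3: 2.9 MB
print(f"ratio: {wav / mp3:.1f}:1")    # ratio: 11.0:1
```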

This was preferable when our bandwidth speeds were extremely low on dial-up
modems and we might even have had bandwidth caps for the month. Instead of
waiting days to download a song, we could do it in a couple of hours (and in the
present, a couple of seconds!).

MP3s are maintaining their presence thanks to MP3 players like the iPod. They have
limited hard drive or flash drive space, so with compression we can carry around a
lot more music.

Plus there’s no need for full resolution files when we’re doing yard work or at the
gym using tiny sports earphones. It’s also a huge space and bandwidth saver for
online streaming services.

MP3 stands for MPEG-1 Audio Layer 3


MPEG (the Moving Picture Experts Group) defined a standard that did the same thing
for video as MP3s did for audio. The audio part of that standard describes three
coding layers, and MP3 is the third and most advanced of them. It all comes from
the same family of technology.

How Does MP3 Compression Save So Much Space?


Here’s where it gets crazy. The people who designed these compression algorithms
used our knowledge of psychoacoustics to manage the data bandwidth.
Psychoacoustics refers to how our brain interprets sounds.

The brain uses certain tricks like auditory masking to allocate resources and
attention to what is the most important sound happening at any given time. Using
this info, we know what we can get rid of, data-wise.

Adult Hearing Loss

The first and easiest savings are to go ahead and cut out a certain frequency range
if the music allows for it. Adults begin to lose their capacity for hearing above
16-18 kHz, whereas the top limit for human hearing is around 20 kHz. At that level
there’s not a lot going on in terms of intelligibility. It’s just “sparkle, shine, sheen.”

In most cases, we don’t need to have it at all or at least can encode it into the MP3
file at a lower resolution.

De-Emphasize the Quiet


This refers to something our ears and brains do called simultaneous masking.
Basically, if a loud sound is blaring out over the top of a lot of low-volume sounds,
you’re naturally going to focus on the loud sound. What this means is that we can
spend a lot less data on the quiet sounds. They don’t need as much detail encoded
in them during those times.

Temporal Masking

In the same fashion as above, if two sound events occur within milliseconds of each
other, we’re only going to be able to focus on the loudest one. It’s how we’ve been
evolutionarily primed to react. Our ears and minds can’t separate events that close
in time.

So what the encoder algorithm does is ignore or at least allocate much less data to
the quieter sound since we won’t perceive it anyways.

Minimum Audition Threshold


The minimum audition threshold refers to volume. As a voice or sound becomes
quieter and quieter, we’re able to make out less and less detail. The encoder knows
this and chooses to not save every single detail of quiet sounds since we can’t use
it anyways. And if a sound dips below a certain volume threshold where the human
ear can’t hear it, then it gets tossed out completely.
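
To get a feel for where that volume threshold sits, here is a rough sketch using Terhardt's approximation of the threshold in quiet, a formula that often appears in audio coding texts. It is an assumption for illustration and not the table the MP3 standard itself uses. The helper names are made up, and the frequency must be greater than zero.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate quietest audible level (dB SPL) of a pure tone at freq_hz."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz, level_db_spl):
    """True if a lone tone at this level sits above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 16000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

The curve dips around 3 to 4 kHz, where hearing is most sensitive, and climbs steeply at both ends, so a quiet tone that is audible at 1 kHz can be dropped entirely at 100 Hz or 16 kHz.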

Bit Rate, Bit Depth, & Sample Rate Management


And finally this is where the real work is done. Once you’ve processed all of the
savings mentioned above, you’re still going to be left with a hefty file. That’s
because all of the leftover data is still being stored at the highest resolution
possible. Here’s how the geniuses behind MP3 solved it.

First and foremost, MP3 is a lossy data compression technique by definition: the
encoder throws data away and the decoder can never get it back. A first reduction
often happens before encoding even starts, when we drop the bit depth of the audio
from 24 bit or above down to 16 bit. Lossy refers to this kind of irreversible drop
in resolution but doesn’t have to mean an audible loss in audio quality.

16 bit is a depth that has plenty of headroom to provide a high signal-to-noise ratio
(roughly 96 dB of dynamic range). It means that every sample has 16 bits to encode
with (each one a 0 or a 1 in binary). By dropping from 24 bit to 16 bit we’ve already
made a 33% saving in size with no discernible quality difference.

Speaking of each sample having 16 bits… that’s another place massive savings are
made. Sample rates can get as high as 96,000 samples per second or more! 44.1 kHz
is your typical sample rate for MP3s and that’s still a ton of samples per second,
but it represents a drop of roughly 54% in the amount of data being stored versus
96 kHz sample rates.
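
The savings from bit depth and sample rate are plain ratios, so they are easy to check. This is only arithmetic on the numbers quoted above, and `saving` is a made-up helper name.

```python
def saving(old, new):
    """Fraction of data saved when a quantity shrinks from old to new."""
    return 1 - new / old


depth = saving(24, 16)            # 24 bit down to 16 bit
rate = saving(96_000, 44_100)     # 96 kHz down to 44.1 kHz
both = 1 - (1 - depth) * (1 - rate)

print(f"bit depth:   {depth:.0%}")   # bit depth:   33%
print(f"sample rate: {rate:.0%}")    # sample rate: 54%
print(f"combined:    {both:.0%}")    # combined:    69%
```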

The basic idea is that a lower sample rate captures fewer “snapshots” of each moment
of music. You can think of it like a movie or a video game at 60 frames-per-second
versus the typical 24 fps. 24 is more than good enough but 60 looks great during
fast action scenes. It works the same for music and sample rates.

And finally we set a limit to the data throughput. This takes into account everything
mentioned above and then sets a ceiling on how much data you can send at once.
Most MP3 streaming and selling services use CBR, which is a constant bit rate,
usually of 128 kilobits per second (kbps).

Other common options are 192 kbps and 320 kbps, which is the highest available
on MP3 and, for most listeners, very hard to tell apart from uncompressed audio.
Some streaming services will only send 64 kbps and you can definitely tell. Quality
takes a serious drop below 128 kbps.

Constant bit rates are preferable for these services and consumers because they help
them predict their bandwidth and storage needs. But advances have been made
for personal use such as VBR, which is a variable bit rate.

What this does is allow a lower bit rate during quiet or simple parts of songs and a
higher bit rate at louder or more complex parts of a song. This is preferable for
those who prefer the highest quality audio but still desire the data savings of MP3s.
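
Here is a toy sketch of the VBR idea. It assumes each frame already has a "complexity" score between 0 and 1, which is a made-up stand-in for what a real encoder works out from its psychoacoustic model. The list holds the standard bit rates for this kind of MP3 stream, and `pick_bitrate` is a hypothetical helper, not how any particular encoder decides.

```python
# Bit rates (kbps) an MPEG-1 Layer 3 frame can use.
LAYER3_BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def pick_bitrate(complexity):
    """Map a complexity score in [0, 1] to one of the allowed bit rates."""
    complexity = min(max(complexity, 0.0), 1.0)
    index = round(complexity * (len(LAYER3_BITRATES_KBPS) - 1))
    return LAYER3_BITRATES_KBPS[index]


# Hypothetical scores: a quiet intro, a busy chorus, then a fade out.
frames = [0.1, 0.1, 0.2, 0.9, 1.0, 0.8, 0.3, 0.1]
vbr = [pick_bitrate(score) for score in frames]
average = sum(vbr) / len(vbr)

print(vbr)       # [40, 40, 56, 256, 320, 192, 64, 40]
print(average)   # 126.0
```

In this toy run the average lands near 128 kbps, but the busy frames get far more bits than a 128 kbps CBR stream would give them.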

MP3 Audio Compression


MP3s are audio files compressed using lossy compression.

The lossy compression allows great savings in file size, with the average MP3 file
being 90% smaller than an equivalent uncompressed audio file.

Like all lossy compressed files, savings in size are made by deleting data that the
computer believes is redundant and will not be missed by the user.

MP3 audio compression reduces a file size through:

• Perceptual music shaping

• Reducing the audio bitrate

Perceptual Music Shaping

Perceptual music shaping refers to the process of removing inaudible sounds in order
to make a file size smaller.

Inaudible sounds may include:

• Noises at frequencies that humans cannot hear

• Quiet sounds that cannot be heard over louder sounds

Bitrate

In audio files, the bitrate is the number of bits that need to be processed every
second. This is measured in kilobits per second.

For uncompressed audio, the bitrate is calculated by multiplying the sample rate by
the bit depth and number of audio channels.

The bigger the bitrate, the better the sound quality, but the larger the file size.
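
As a quick worked example of that multiplication, the sketch below computes the bitrate of uncompressed CD-style audio and compares it with a 128 kbps MP3. The function name is made up for this example.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


cd = pcm_bitrate_kbps(44_100, 16, 2)
print(cd)                                    # 1411.2
print(f"{128 / cd:.1%} of the CD bitrate")   # 9.1% of the CD bitrate
```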

Sample Rate

The sample rate (measured in Hz or kHz) is the number of samples (snapshots in
time) of sound that are recorded each second to represent an audio performance.

Taking more samples per second will result in a more accurate and better sounding
audio file. However, increasing the sample rate increases the file size.

Bit Depth

Bit depth is the number of bits of information recorded in each sample.

Audio with a higher bit depth can represent a wider dynamic range, with finer steps
in volume and a lower noise floor, giving a more accurate recording of the audio
performance. However, the higher the bit depth, the greater the file size.

Finding the sweet spot

Deciding how best to compress audio will depend on many factors.

Aggressive compression will allow you to squeeze more tunes onto your device, but
what’s the point if they sound terrible?

On the other hand, what’s the point of downloading large, slightly compressed files, if
you only intend to play them through cheap speakers and cannot tell the difference
anyway?

When streaming audio, we want it to download quickly and sound amazing, but
unfortunately we can’t have it all!

Somewhere in the middle though is an acceptable compromise and a compression
“sweet spot”, ideal for that particular purpose.

The Algorithm

The Algorithm for MP3 Compression


As seen in the following diagram, the process of MP3 compression can be
broken down into steps. First, the input audio stream passes through a filter
bank that divides the sound into subbands of frequency. Simultaneously, it
passes through a psychoacoustic model that utilizes the concept of auditory
masking to determine what can or cannot be heard in each subband. The bit
allocation block minimizes the audibility of noise. Finally, the bit stream
formatting block accumulates all the information and processes it into a
coded bitstream (Pan 2).

Layer Coding Options

As the name suggests, MP3 is the third of three distinct MPEG audio layers for
compression. Layer 1 forms the most basic algorithm and the other two layers
enhance Layer 1. This section summarizes the main differences between the layers
while the following sections delve into details about the stages.

The Layer 1 algorithm codes audio data by grouping together 12 samples
from each of the 32 subbands created in the filter bank stage for a total of
384 samples per frame, as seen in the figure below. Each group of 12 samples
gets a bit allocation and scale factor. The bit allocation tells the decoder the
number of bits used to represent each sample while the scale factor is a
multiplier that sizes the samples.
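
The idea of a shared scale factor plus a per-sample bit allocation can be sketched as below. This is a simplified illustration: the real standard picks scale factors from a fixed table and defines its own quantizer, so the rounding scheme and function names here are assumptions.

```python
SUBBANDS = 32
SAMPLES_PER_GROUP = 12


def encode_group(samples, bits):
    """Quantize one group of 12 subband samples that share a scale factor."""
    if len(samples) != SAMPLES_PER_GROUP or bits < 2:
        raise ValueError("expected 12 samples and at least 2 bits")
    scale = max(abs(s) for s in samples) or 1.0  # shared multiplier for the group
    half = (2 ** bits - 1) // 2                  # largest code magnitude
    codes = [round(s / scale * half) for s in samples]
    return scale, codes


def decode_group(scale, codes, bits):
    """Rebuild approximate samples from the codes and the scale factor."""
    half = (2 ** bits - 1) // 2
    return [c / half * scale for c in codes]


# Hypothetical subband samples for one group.
group = [0.02, -0.10, 0.31, 0.45, -0.38, 0.12, 0.05, -0.22, 0.40, -0.07, 0.18, -0.29]
scale, codes = encode_group(group, bits=4)
rebuilt = decode_group(scale, codes, bits=4)
worst_error = max(abs(a - b) for a, b in zip(group, rebuilt))

print(SUBBANDS * SAMPLES_PER_GROUP, "samples per Layer 1 frame")
print("scale factor:", scale)
print("codes:", codes)
print("worst error:", round(worst_error, 4))
```

Giving the group more bits shrinks the worst-case error, which is exactly the knob the bit allocation stage turns.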

The Layer 2 algorithm enhances Layer 1 by coding data in larger groups and
imposing restrictions on bit allocations for values in higher subbands. The
encoder in Layer 2 codes three groups of 12 samples per subband together
(1152 samples per frame), where Layer 1 codes a single group. See the figure
below. Additionally, Layer 2 saves bits by representing the bit allocation, scale
factor values, and quantized samples (to quantize means to limit the possible
values of a magnitude or quantity to a discrete set of values) with more compact
code. This allows for more bits to be dedicated to improving audio quality.

Layer 3 is an improvement on the other two layers because it utilizes a
transformation known as the Modified Discrete Cosine Transform (MDCT) to
split each of the 32 frequency subbands into finer frequency lines, giving the
encoder much better frequency resolution.

Using MPEG compression techniques, data can be reduced to the following
percentages of its original size while still maintaining near-CD sound quality:

25% by Layer 1

16% to 12% by Layer 2

10% to 8% by Layer 3

The Hybrid Filter Bank


The purpose of the filter bank is to divide the audio signal into 32 equal-width
frequency subbands. Empirical evidence has shown that the human ear has
a limited resolution that can be expressed in terms of critical bandwidths, which
range from less than 100 Hz at the lowest audible frequencies to more than 4 kHz
at the highest. Within a critical bandwidth the human ear blurs frequencies. The
equal-width subbands of the filter bank only roughly correspond to the critical
bandwidths, in a method diagrammed in the following figure.
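
To see how wide each of those 32 subbands is, the sketch below splits the range from 0 Hz up to half the sample rate into equal slices. It assumes a 44.1 kHz sample rate, and `subband_edges` is a made-up helper name.

```python
def subband_edges(sample_rate_hz, subbands=32):
    """Lower and upper edge (Hz) of each equal-width subband up to Nyquist."""
    width = sample_rate_hz / 2 / subbands
    return [(i * width, (i + 1) * width) for i in range(subbands)]


edges = subband_edges(44_100)
print(edges[0])    # (0.0, 689.0625)
print(edges[-1])   # (21360.9375, 22050.0)
```

Each subband comes out about 689 Hz wide. That is far wider than a critical bandwidth of under 100 Hz at the bottom of the spectrum, so the lowest subband covers several critical bands at once.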

While Layer 1 and Layer 2 use just a polyphase filterbank, an
additional MDCT is used in Layer 3 compression.
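
For the curious, here is a bare-bones sketch of an MDCT and its inverse, written the slow and direct way. It assumes a sine window and blocks of 36 samples that overlap by half, which echoes the long block size usually described for Layer 3. The real Layer 3 filter bank runs this on each polyphase subband and adds window switching and alias reduction, none of which is shown here.

```python
import math


def sine_window(length):
    """Sine window, which lets overlapping blocks add back up to the input."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block):
    """Turn 2N windowed time samples into N frequency coefficients."""
    n = len(block) // 2
    w = sine_window(2 * n)
    return [sum(w[i] * block[i]
                * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Turn N coefficients back into 2N windowed, time-aliased samples."""
    n = len(coeffs)
    w = sine_window(2 * n)
    return [w[i] * 2 / n
            * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                  for k in range(n))
            for i in range(2 * n)]


def analyze_and_rebuild(signal, n):
    """Run half-overlapping blocks through MDCT and IMDCT, then overlap-add."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        rebuilt = imdct(mdct(signal[start:start + 2 * n]))
        for i, value in enumerate(rebuilt):
            out[start + i] += value
    return out


N = 18
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(4 * N)]
rebuilt = analyze_and_rebuild(signal, N)
# Only samples covered by two blocks are fully rebuilt.
error = max(abs(rebuilt[i] - signal[i]) for i in range(N, 3 * N))
print(f"worst reconstruction error: {error:.2e}")
```

Each block of 36 samples yields only 18 coefficients, yet the overlapping blocks cancel each other's aliasing, so nothing is lost until the coefficients are quantized.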

The Psychoacoustic Model

Psychoacoustics examines the concept of auditory masking and its effect on
compression. Within each subband where blurring occurs, the presence of a strong
tonal signal can mask a region of weaker signals. This is evidenced in the
following figure.

If the noise resulting from approximation of sound data can be kept below
the masking threshold for each partition, then the compression result should
be indistinguishable from the original audio data.

Bit Allocation

Through an iterative algorithm, the bit allocation uses information from the
psychoacoustic model to determine the number of code bits to be allocated
to each subband. This process can be described using the following formula:

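$\mathrm{MNR}_{\mathrm{dB}} = \mathrm{SNR}_{\mathrm{dB}} - \mathrm{SMR}_{\mathrm{dB}}$

where:

$\mathrm{MNR}_{\mathrm{dB}}$ is the mask-to-noise ratio

$\mathrm{SNR}_{\mathrm{dB}}$ is the signal-to-noise ratio, given with the MPEG audio
standard in a table

$\mathrm{SMR}_{\mathrm{dB}}$ is the signal-to-mask ratio, derived from the
psychoacoustic model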


The allocator then searches for the subband with the lowest mask-to-noise
ratio and allocates more code bits to it. It looks up the new signal-to-noise ratio
for that subband, recomputes its mask-to-noise ratio, and repeats the process
until no more code bits can be allocated (Pan 5).
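
The iterative allocation can be sketched as a greedy loop: keep handing bits to whichever subband currently has the lowest mask-to-noise ratio. Several things here are assumptions for illustration. The standard supplies its own SNR table, so the "about 6 dB per bit" rule is a stand-in. The signal-to-mask values are invented. The cost of 12 bits per step assumes one extra bit for each of 12 samples in a group and ignores scale factor overhead.

```python
def snr_db(bits):
    """Stand-in for the standard's SNR table: roughly 6.02 dB per bit."""
    return 6.02 * bits


def allocate_bits(smr_db, bit_pool, cost_per_step=12, max_bits=15):
    """Repeatedly give one more bit per sample to the lowest-MNR subband."""
    bits = [0] * len(smr_db)

    def mnr(i):
        # MNR = SNR - SMR, all in dB
        return snr_db(bits[i]) - smr_db[i]

    while bit_pool >= cost_per_step:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)
        bits[worst] += 1
        bit_pool -= cost_per_step
    return bits


# Hypothetical signal-to-mask ratios (dB) for four subbands.
smr = [20.0, 5.0, -3.0, 12.0]
allocation = allocate_bits(smr, bit_pool=96)
print(allocation)   # [4, 1, 0, 3]
```

The third subband gets no bits at all: its signal already sits below the masking threshold, so spending data on it would buy nothing audible.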

In Layer 3 encoders, two nested iteration loops, an inner rate loop and an outer
distortion loop, serve to quantize and code the data. The quantized values are
coded using Huffman coding, which is lossless.
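
Huffman coding gives short codes to common values and longer codes to rare ones, which suits quantized audio where small values dominate. The sketch below builds a code from scratch for a made-up list of quantized values. An MP3 encoder picks from predefined code tables instead of building one per file, so treat this as an illustration of the principle only.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code where frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: code so far}).
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count1, _, codes1 = heapq.heappop(heap)
        count2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (count1 + count2, tie, merged))
        tie += 1
    return heap[0][2]


def decode(bitstring, code):
    """Walk the bit string and emit a symbol each time a full code is seen."""
    lookup = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out


# Hypothetical quantized values: mostly zeros, as is common after quantization.
quantized = [0] * 8 + [1] * 4 + [-1] * 3 + [2]
code = huffman_code(quantized)
encoded = "".join(code[value] for value in quantized)

print(len(encoded), "bits instead of", 2 * len(quantized))   # 28 bits instead of 32
```

Decoding the bit string returns exactly the original values, which is what lossless means here.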
