6-Audio Basics FileType MP3Compr
And Design
Overview of Sound as a Multimedia Element
What is SOUND?
• Sound comprises the spoken word, voices, music and even noise. It is a complex relationship involving:
– a vibrating object (sound source)
– a transmission medium (usually air)
– a perceiver (brain).
The Power of Sound
Use of Sound
Sounds are either content sounds or ambient
sounds.
Content sounds furnish information
Narration and dialogue are content sounds.
Music and other sounds can be considered as content
sounds if they are parts of the topic themselves.
Ambient sounds reinforce messages and set the
mood
Background sounds and special effects are ambient
sounds.
Special sound effects can reinforce or enliven a
message.
Guidelines for Using Sound
Use the same style of music (if multiple sound
files are needed) to maintain a sense of unity
Coordinate sound files with other media
elements
Sound quality should be kept consistent
Record at a rate and resolution that is
appropriate to the delivery mode
Guidelines for Using Sound
Use the same voice for narration and
voiceovers, but different voices for different
characters
Optimize files for background music
Use sound cues for specific events
During voice-overs, background music should be
turned off or adjusted to a low volume such that
the spoken words can be understood without
difficulty
Example of Waveforms
[Figure: waveforms of a piano, a pan flute and a snare drum]
Sound
The waves of noise, by contrast, are irregular. They do not have a repeating pattern.
Basic Principles of Sound
This analog wave pattern represents the volume and frequency of a sound.
Basic Principles of Sound
Amplitude: Distance between the valley and the
peak of a waveform; determines volume
Volume is measured in decibels (dB)
Decibel (dB) is a logarithmic unit used to describe a ratio.
One dB is close to Just Noticeable Difference (JND) for
sound level.
Frequency: Number of peaks that occur in one second, measured in hertz (Hz). The closer
together the peaks, the higher the frequency; determines pitch
Decibel Table
dB     Watts            Example
195    25–40 million    Saturn rocket
170    100,000          Jet engine with afterburner
160    10,000           Turbojet engine at 7,000 pounds thrust
150    1,000            ALSETEX splinterless stun grenade
140    100              2 JBL2226 speakers
130    10               75-piece orchestra, at fortissimo
120    1                Large chipping hammer
110    0.1              Riveting machine
100    0.01             Automobile on highway
90     0.001            Subway train; a shouting voice
80     0.0001           Inside a 1952 Corvette at 60 mph
70     0.00001          Voice conversation; freight train 100 feet away
60     0.000001         Large department store
50     0.0000001        Average residence or small business
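The table follows the logarithmic rule stated earlier: every 10 dB step multiplies the power by 10. A minimal sketch of that relation, assuming the table's 120 dB = 1 W row as the reference point (the function names are illustrative, not from the slides):

```python
import math

def watts_from_db(level_db, ref_db=120.0, ref_watts=1.0):
    """Acoustic power in watts for a level in dB, anchored at the 120 dB = 1 W row."""
    return ref_watts * 10 ** ((level_db - ref_db) / 10)

def db_from_watts(watts, ref_db=120.0, ref_watts=1.0):
    """Inverse mapping: level in dB for a given power in watts."""
    return ref_db + 10 * math.log10(watts / ref_watts)

# Reproduce a few rows of the table.
for level in (50, 90, 130, 170):
    print(level, "dB ->", watts_from_db(level), "W")
```

Because the scale is a ratio, a different reference row would give the same results as long as the dB and watt values of that row are used together.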
Basic Principles of Sound
Analog sound is a continuous stream of sound waves.
For sound to be included in multimedia applications,
analog sound must be converted to digital form.
Digitizing (or sound sampling): the process of
converting analog sound to numbers
Digital Audio: An analog sound that has been
converted to numbers
Basic Principles of Sound
Quantization of Sound: The process of converting a
continuous range of values into a finite range of
discrete values.
- This is a function of analog-to-digital converters,
which create a series of digital values to represent the
original analog signal.
- bit depth (number of bits available) determines the
accuracy and quality of the quantized value.
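The effect of bit depth can be sketched in a few lines. This is an illustration only, assuming samples normalised to the range [-1.0, 1.0] and a uniform quantizer; `quantize` and `dequantize` are hypothetical helper names, and real analog-to-digital converters differ in detail.

```python
def quantize(sample, bit_depth):
    """Map a sample in [-1.0, 1.0] to one of 2**bit_depth integer codes."""
    levels = 2 ** bit_depth
    clipped = max(-1.0, min(1.0, sample))
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest code.
    return round((clipped + 1.0) / 2.0 * (levels - 1))

def dequantize(code, bit_depth):
    """Map an integer code back to a value in [-1.0, 1.0]."""
    levels = 2 ** bit_depth
    return code / (levels - 1) * 2.0 - 1.0

# The same sample at three bit depths: the error shrinks as bit depth grows.
for bits in (3, 8, 16):
    restored = dequantize(quantize(0.3, bits), bits)
    print(bits, "bits:", 2 ** bits, "levels, error", abs(restored - 0.3))
```

With this scheme the worst-case error is half a step, so each extra bit roughly halves it.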
Characteristic of Sound Waves
• Sound is described in terms of two
characteristics:
– Frequency (or pitch)
– Amplitude (or loudness)
[Figure: waveforms illustrating frequency, and amplitude for a quiet and a loud sound]
Sound Quality
Audio resolution
Also known as sample size or bit resolution
Number of binary bits used to represent each sound
sample
As the audio resolution increases, the quality of the
digital audio also improves.
Audio resolution determines the accuracy with which
sound can be digitized.
Common values: 8 bits, 16 bits
CD quality: 16 bits
Downloaded vs. Streamed
Monophonic vs. Stereo Sound
Digital Audio File Size
File size of a digital audio recording (in bytes), assuming that there is no compression:
File size = sampling rate (Hz) × resolution (bits) / 8 × number of channels × duration (seconds)
Calculate the file size of a digital audio recording with a sampling rate of 44.1
kHz, a recording resolution of 16 bits, in stereo, for a duration of 3 minutes.
= 44,100 × 16 / 8 × 2 × 180 bytes
= 31,752,000 bytes
= 31,752,000 / 1024 = 31,007.8125 KB
= 31,007.8125 / 1024 ≈ 30.28 MB
Calculate the file size of a digital audio recording with a voice quality of 11 kHz,
recorded in 8-bit mono for a duration of 1,000 minutes.
Find the audio file size of high-quality music, recorded for 61 minutes at a
44 kHz rate and 16-bit resolution in stereo.
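The calculation can be checked with a short script. This is a sketch under the same assumptions as the worked example (uncompressed audio, 1 KB = 1024 bytes, 1 MB = 1024 KB); the function names are illustrative.

```python
def audio_file_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Uncompressed size: samples per second x bytes per sample x channels x seconds."""
    return sample_rate_hz * (bit_depth / 8) * channels * duration_s

def to_megabytes(size_bytes):
    """Convert bytes to MB using 1024-based units, as in the worked example."""
    return size_bytes / 1024 / 1024

# Worked example: 44.1 kHz, 16 bit, stereo, 3 minutes.
example = audio_file_size_bytes(44_100, 16, 2, 3 * 60)
print(example, "bytes =", round(to_megabytes(example), 2), "MB")

# The two exercises, with the rates taken exactly as stated (11 kHz and 44 kHz).
voice = audio_file_size_bytes(11_000, 8, 1, 1000 * 60)
music = audio_file_size_bytes(44_000, 16, 2, 61 * 60)
print(voice, "bytes and", music, "bytes")
```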
Digital Audio
• Digital audio data is the representation of sound, stored
in the form of sample points.
• The quality of a digital recording depends on the sampling
rate, that is, the number of sample points taken per
second (Hz).
[Figure: a waveform shown twice as sample value against time, once under "Sampling" and once under "3-bit quantization"]
Sampling rate: Number of samples per second (measured in Hz).
E.g., CD standard audio uses a sampling rate of 44,100 Hz (44,100 samples per second).
3-bit quantization gives 8 possible sample values.
E.g., CD standard audio uses 16-bit quantization, giving 65,536 values.
Why Quantize? To Digitize!
Nyquist Sampling Theorem
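Consider a sine wave. Sampling it once a cycle makes it appear as a constant signal.
• Nyquist rate = 2 f_m, where f_m is the maximum frequency in the signal. Sampling more
slowly than this loses information (aliasing).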
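The "once a cycle" case is easy to reproduce numerically. A minimal sketch using only the standard library; the 1 kHz tone, the phase offset and the helper names are arbitrary choices for illustration.

```python
import math

def nyquist_rate(max_freq_hz):
    """Nyquist rate: twice the maximum frequency in the signal."""
    return 2 * max_freq_hz

def sample_sine(freq_hz, sample_rate_hz, n_samples, phase=0.0):
    """Sample sin(2*pi*f*t + phase) at t = n / sample_rate for n = 0 .. n_samples - 1."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + phase)
            for n in range(n_samples)]

# A 1 kHz sine sampled once per cycle: every sample lands on the same point of the wave.
once_per_cycle = sample_sine(1000, 1000, 8, phase=math.pi / 4)
# The same sine sampled 8 times per cycle keeps its shape.
well_sampled = sample_sine(1000, 8000, 8, phase=math.pi / 4)

print([round(s, 3) for s in once_per_cycle])
print([round(s, 3) for s in well_sampled])
```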
What should be 38
This feature can be very useful, but watch out: most time-stretching algorithms will severely
degrade the audio quality of the file if the length is altered more than a few percent in either
direction.
CM3106 Chapter 4: Introduction to
Digital Audio
Sound Reception
Destination: receives sound
Electrical: a microphone produces an electric signal
Ears: respond to pressure to hear sound
(MPEG Audio exploits this fact)
Digitising Sound
Microphone:
Receives sound
Converts to analog signal.
Computers like discrete entities
Need to convert Analog-to-Digital using dedicated
hardware (e.g. a soundcard)
Sample Rate
How many samples to take?
11.025 kHz: Speech (Telephone 8 kHz)
22.05 kHz: Low Grade Audio (WWW Audio, AM Radio)
44.1 kHz: CD Quality
[Figure: "Sine Wave" and "Aliased Sine Wave", amplitude (−1 to 1) against sample index (0 to 100). Sound example: Aliased Piano]
Implications of Sample Rate and Bit Size (2)
Filtering of Signal
Synthesis Pipeline
• Linear Predictive Coding (LPC) fits the signal to a speech model and then
transmits the parameters of the model, as in APC.
• Speech Model:
• Synthesised speech
• More prediction coefficients than APC, lower sampling rate
• Still sounds like a computer talking
• Bandwidth as low as 2.4 kbits/sec.
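The "fit the signal to a model and send the parameters" idea can be illustrated with a bare-bones linear predictor. This sketch assumes the autocorrelation method with the Levinson-Durbin recursion, a common textbook choice that the slides do not specify, and it omits everything else a real LPC speech coder needs (framing, windowing, pitch and gain estimation). All names are illustrative.

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum of signal[i] * signal[i + lag] over the available samples."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc_coefficients(signal, order):
    """Predictor x[n] ~ a1*x[n-1] + ... + ap*x[n-p], solved by Levinson-Durbin."""
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]                     # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient for this order
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        error *= (1.0 - k * k)
    return a[1:], error

# Synthetic test signal generated by x[n] = 1.3*x[n-1] - 0.4*x[n-2], starting from an impulse.
x = [1.0, 1.3]
for _ in range(298):
    x.append(1.3 * x[-1] - 0.4 * x[-2])

coeffs, residual_energy = lpc_coefficients(x, 2)
print(coeffs)
```

Only the few coefficients (plus excitation parameters in a real coder) would be transmitted instead of the samples themselves, which is what makes very low bit rates possible.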
Psychoacoustics and perceptual coding
Semicircular canals
• Body’s balance mechanism.
• Thought that it plays no part in
hearing.
The cochlea:
• Pressure waves in the cochlea exert energy along a route that begins at the oval
window and ends abruptly at the membrane-covered round window.
• Pressure applied to the oval window is transmitted to all parts of the cochlea.
• Inner surface of the cochlea (the basilar membrane) is lined with over 20,000
hair-like nerve cells (stereocilia):
Hearing different frequencies
After the ear hears a loud sound: It takes a further short while before it
can hear a quieter sound.
Why is this so?
• Stereocilia vibrate with corresponding force of input sound stimuli.
• Temporal masking occurs because any loud tone will cause the hearing receptors
in the inner ear to become saturated and require time to recover.
• If the stimulus is strong then the stereocilia will be in a high state of excitation and become
fatigued.
• Hearing Damage: After extended listening to loud music or headphones this
sometimes manifests itself with ringing in the ears and even temporary deafness
(prolonged exposure permanently damages the stereocilia).
Example of temporal masking
• Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40
dB. Test tone can’t be heard (it’s masked).
Stop masking tone, then stop test tone after a short delay.
Adjust delay time to the shortest time that test tone can be heard
(e.g., 5 ms).
Repeat with different level of the test tone and plot:
Example of temporal masking
Try other frequencies for test tone (masking tone duration constant).
Total effect of masking:
Example of temporal masking
The longer the masking tone is played, the longer it takes for the test
tone to be heard. Solid curve: 200 ms masking tone, dashed curve: 100
ms masking tone.
Compression idea: how to exploit?
Analysis filters
• Also called critical-band filters
• Break signal up into equal width subbands
• Use filter banks (modified with a discrete cosine
transform (DCT) in Layer 3)
• Filters divide audio signal into frequency subbands that
approximate the 32 critical bands
• Each band is known as a sub-band sample.
• Example: a signal with a 16 kHz bandwidth, sampled at 32 kHz, gives each of the 32
subbands a bandwidth of 500 Hz.
• Time duration of each sampled segment of input signal is time to
accumulate 12 successive sets of 32 PCM (subband) samples, i.e.
32*12 = 384 samples.
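The numbers in the example above follow from simple arithmetic, sketched here. The constants come from the slide (32 subbands, 12 samples per subband); the function names are illustrative.

```python
NUM_SUBBANDS = 32          # equal-width subbands
SAMPLES_PER_SUBBAND = 12   # successive samples collected per subband in one segment

def subband_bandwidth_hz(sample_rate_hz):
    """Signal bandwidth (half the sampling rate) split evenly over the subbands."""
    return (sample_rate_hz / 2) / NUM_SUBBANDS

def samples_per_segment():
    """Number of PCM samples accumulated for one segment."""
    return NUM_SUBBANDS * SAMPLES_PER_SUBBAND

def segment_duration_ms(sample_rate_hz):
    """Time needed to accumulate one segment of input samples."""
    return samples_per_segment() / sample_rate_hz * 1000

print(subband_bandwidth_hz(32_000), "Hz per subband")
print(samples_per_segment(), "samples,", segment_duration_ms(32_000), "ms per segment")
```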
Basic MPEG-1 Compression Algorithm
analysis filters
• In addition to filtering the input, analysis banks determine
• Maximum amplitude of 12 subband samples in each
subband.
• Each known as the scaling factor of the subband.
• Passed to psychoacoustic model and quantiser blocks
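A minimal sketch of the scaling-factor step, assuming a segment is held as 32 lists of 12 subband samples. It simply takes the peak absolute amplitude per subband; how a real encoder represents that value in the bitstream is not covered here. The segment data below is made up for illustration.

```python
def scale_factors(segment):
    """Peak absolute amplitude of the 12 samples in each subband."""
    return [max(abs(sample) for sample in subband) for subband in segment]

# Hypothetical segment: 32 subbands x 12 samples of made-up values.
segment = [[(band + 1) * (n - 6) / 100 for n in range(12)] for band in range(32)]

factors = scale_factors(segment)
print(len(factors), "scaling factors; first:", factors[0], "last:", factors[-1])
```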
Basic MPEG-1 compression algorithm
Psychoacoustic modeller:
• Frequency Masking and may employ temporal masking.
• Performed concurrently with filtering and analysis operations.
• Uses Fourier Transform (FFT) to perform analysis.
• Determine amount of masking for each band caused by nearby
bands.
• Input: set hearing thresholds and subband masking
properties (model dependent) and scaling factors (above).
Basic MPEG-1 compression algorithm
Example of quantisation:
• Assume that after analysis, the levels of first 16 of the 32 bands are:
----------------------------------------------------------------------
Band        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Level (dB)  0  8 12 10  6  2 10 60 35 20 15  2  3  5  3  1
----------------------------------------------------------------------
Each layer:
• Increasing levels of sophistication
• Greater compression ratios.
• Greater computation expense (but mainly at the coder side)
Layer 1
• Best suited for bit rates above 128 kbits/sec per channel.
• Example: Philips Digital Compact Cassette uses Layer 1 compression at
192 kbits/sec
• Divides data into frames,
• Each of them contains 384 samples,
• 12 samples from each of the 32 filtered subbands as shown above.
• Psychoacoustic model only uses frequency masking.
• Optional Cyclic Redundancy Code (CRC) error checking.
Layer 2
• Aim: ensure that all of the quantisation noise is below the masking
thresholds
• Compute the mask-to-noise ratio (MNR) for all subbands:
$MNR_{dB} = SNR_{dB} - SMR_{dB}$
where
$MNR_{dB}$ is the mask-to-noise ratio,
$SNR_{dB}$ is the signal-to-noise ratio (SNR), and
$SMR_{dB}$ is the signal-to-mask ratio from the psychoacoustic
model.
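The MNR formula can be applied per subband to see where the quantisation noise would rise above the masking threshold. A sketch with made-up SNR and SMR values; the helper names are illustrative and this is not the standard's bit-allocation loop.

```python
def mask_to_noise_ratio_db(snr_db, smr_db):
    """MNR = SNR - SMR, all in dB."""
    return snr_db - smr_db

def audible_noise_bands(snr_by_band, smr_by_band):
    """Indices of subbands with negative MNR, i.e. noise above the masking threshold."""
    return [band for band, (snr, smr) in enumerate(zip(snr_by_band, smr_by_band))
            if mask_to_noise_ratio_db(snr, smr) < 0]

# Hypothetical values for four subbands (dB).
snr = [20.0, 14.0, 8.0, 26.0]
smr = [12.0, 18.0, 8.0, 30.0]

print([mask_to_noise_ratio_db(s, m) for s, m in zip(snr, smr)])
print("bands needing more bits:", audible_noise_bands(snr, smr))
```

Since SNR is signal minus noise and SMR is signal minus mask, MNR is effectively mask minus noise, so a negative value means the noise is no longer hidden.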
After this the process repeats. The process stops if any of these three
conditions is true:
• None of the scale factor bands have more than the allowed
distortion.
• The next iteration would cause the amplification for any of the
bands to exceed the maximum allowed value.
• The next iteration would require all the scale factor bands to be
amplified.
Encoding:
• Code some upper-frequency subband outputs as a single summed signal
instead of sending independent left and right channel codes
• Codes for each of the 32 subband outputs.
Decoding:
• Reconstruct left and right channels
• Based only on a single summed signal
• Independent left and right channel scale factors.
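The summed-signal idea can be sketched for a single subband as follows. This is a simplified illustration, not the MPEG bitstream format: it assumes the summed signal is normalised to a peak of 1 and that each channel's scale factor is its peak absolute amplitude. The function names and sample values are hypothetical.

```python
def encode_summed(left, right):
    """Return the normalised summed signal plus independent left/right scale factors."""
    summed = [l + r for l, r in zip(left, right)]
    peak = max(abs(s) for s in summed) or 1.0   # avoid dividing by zero on silence
    shape = [s / peak for s in summed]
    left_scale = max(abs(l) for l in left)
    right_scale = max(abs(r) for r in right)
    return shape, left_scale, right_scale

def decode_summed(shape, left_scale, right_scale):
    """Rebuild both channels from the single summed signal and the two scale factors."""
    return [s * left_scale for s in shape], [s * right_scale for s in shape]

# Made-up subband samples where the right channel is a quieter copy of the left.
left = [0.2, -0.8, 0.4]
right = [0.1, -0.4, 0.2]

shape, ls, rs = encode_summed(left, right)
left_out, right_out = decode_summed(shape, ls, rs)
print(left_out, right_out)
```

Reconstruction is exact here only because the two channels have the same shape; in general the decoder recovers the level of each channel but not the differences in waveform between them.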
Dolby audio compression
Application areas:
• FM radio, satellite transmission and broadcast TV audio
(DOLBY AC-1)
• Common compression format in PC sound cards
(DOLBY AC-2)
• High Definition TV standard advanced television (ATV)
(DOLBY AC-3). MPEG a competitor in this area.
Differences with MPEG
DOLBY AC-1
DOLBY AC-2
DOLBY AC-3