Sound & Audio


Basics of Acoustics:
Sound

Sound is a form of energy, similar to heat and light. It is generated by vibrating objects and can flow through a material medium from one place to another. During generation, part of the kinetic energy of the vibrating body is converted to sound energy. Acoustic energy flowing outwards from its point of generation can be compared to a wave spreading over the surface of water. When an object vibrates or oscillates rapidly, part of its kinetic energy is imparted to the layer of the medium in contact with it, e.g. the air surrounding a bell. The particles of that layer, on receiving the energy, start vibrating themselves and in turn impart a portion of their energy to the next layer of air particles, which also starts vibrating. This process continues, propagating the acoustic energy throughout the medium. When the energy reaches our ears it sets the ear-drums into a similar vibration, which our brain recognizes as sound.

Acoustics
Acoustics is the branch of science dealing with the study of sound and is concerned with the generation, transmission and reception of sound waves. The application of acoustics in technology is called acoustical engineering. The main sub-disciplines of acoustics are: aero-acoustics, bio-acoustics, biomedical acoustics, psycho-acoustics, physical acoustics, speech communication, ultrasonics and musical acoustics.

Psycho-acoustics: psycho-acoustics is concerned with the hearing, perception and localization of sound by human beings.

Psycho-Acoustics
In harnessing sound for musical instruments as well as for multimedia applications, the effects of sound on human hearing and the various factors involved need to be analysed. Psycho-acoustics is the branch of acoustics which deals with these effects.

Nature of Sound Waves

As sound energy flows through a material medium, it sets the layers of the medium into oscillatory motion. This creates alternate regions of compression and expansion, which are pictorially represented as a wave, the upper part (i.e., the crest or positive peak) denoting a compression and the lower part (i.e., the trough or negative peak) denoting a rarefaction. Since a sound wave actually represents a disturbance of the medium particles from their original positions (i.e., before the wave started), it cannot exist in a vacuum. Sound waves have two characteristic properties. Firstly, they are longitudinal waves, which means that the direction of propagation of the sound is the same as the direction along which the medium particles oscillate. Secondly, sound waves are mechanical waves. This means that they can be compressed and expanded like springs. When they are compressed the peaks come closer together, while on expansion the peaks move further apart. On compression the frequency of the sound increases and it appears higher pitched, while on expansion the frequency decreases, making the sound appear duller and flatter.

Spatial and Temporal waves


Waves can be of two types: spatial and temporal waves.

Spatial Waves represent the vibrating states of all particles in the path of a wave at an instant of time. The horizontal axis represents the distance of the particles from the source. The distance of separation between two points that are in the same phase is called the Wavelength. The particles at points O and D have the same state of motion at that instant and are said to be in the Same Phase; the length of the wave between O and D is the Wavelength.

Temporal Waves represent the state of a single particle in the path of a wave over a period of time. The horizontal axis represents the time over which the wave flows. The time elapsed between two instants at which the particle is in the same phase is called the Time Period. The state of the particle is the same at instants O and D, and the particle is said to have undergone one Complete Cycle or Oscillation; the time interval between instants O and D is the Time Period of the wave.
Fundamental Characteristics

A sound wave has three fundamental characteristics :


The Amplitude of a wave is the maximum displacement of a particle in the path of the wave from its mean position, and corresponds to the peak height of the wave. The physical manifestation of amplitude is the intensity or energy of the wave; for sound waves this corresponds to the loudness of the sound. Loudness is measured in a unit called the decibel, denoted by dB.

The second characteristic is Frequency. This measures the number of vibrations of a particle in the path of the wave in one second. The higher the frequency of the wave, the larger the number of oscillations per second. The physical manifestation of the frequency of a sound wave is the pitch of the sound: as the frequency increases, the pitch becomes higher and the sound becomes shriller. Frequency is measured in a unit called Hertz, denoted by Hz. A sound of 1 Hz is produced by an object vibrating at the rate of one vibration per second. The total range of human hearing lies between 20 Hz at the lower end and 20,000 Hz (or 20 kHz) at the higher end.
The Time Period of a wave is the time taken to complete one oscillation. The time period is inversely proportional to the frequency of the sound wave.
The third characteristic is the Waveform. This is the actual shape of the wave when represented pictorially. Its physical manifestation is the quality or timbre of the sound. Timbre helps us to distinguish between sounds coming from different instruments, such as a guitar and a violin.
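The relations between these characteristics can be made concrete with a short Python sketch (the function name and parameter values below are purely illustrative): it generates the samples of a pure tone whose amplitude sets the loudness, whose frequency sets the pitch, and whose sinusoidal shape is the waveform.

    import math

    def sine_tone(amplitude, frequency_hz, duration_s, sample_rate=44100):
        """Generate samples of a pure tone: amplitude controls loudness,
        frequency controls pitch, and the sine shape is the waveform."""
        period_s = 1.0 / frequency_hz      # the time period is the inverse of the frequency
        n_samples = int(duration_s * sample_rate)
        samples = [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
                   for n in range(n_samples)]
        return samples, period_s

    # A 440 Hz tone at amplitude 0.5, lasting 10 ms
    samples, period = sine_tone(0.5, 440.0, 0.01)
    print(f"period = {period * 1000:.3f} ms, first samples: {samples[:3]}")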

Dynamic range
The term dynamic range means the ratio of the maximum amplitude of undistorted sound in an audio device, such as a microphone or loudspeaker, to the amplitude of the quietest sound possible, which is often determined by the inherent noise characteristics of the device. The term is also used to indicate the ratio of the maximum level of power, current or voltage to the minimum detectable value. In music, dynamic range means the difference between the quietest and loudest volume of an instrument. For digital audio, the dynamic range is synonymous with the signal-to-noise ratio (SNR) and is expressed in dB. It can be shown that increasing the bit-depth of digital audio by 1 bit increases its dynamic range by approximately 6 dB.
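The 6 dB-per-bit rule can be illustrated with a minimal Python sketch (the function name is illustrative):

    import math

    def dynamic_range_db(bit_depth):
        """Approximate dynamic range of linear digital audio: the ratio of the
        largest representable amplitude (2**bit_depth steps) to one step,
        expressed in decibels."""
        return 20 * math.log10(2 ** bit_depth)   # roughly 6.02 dB per bit

    for bits in (8, 16, 24):
        print(bits, "bits ->", round(dynamic_range_db(bits), 1), "dB")
    # 8 bits -> 48.2 dB, 16 bits -> 96.3 dB, 24 bits -> 144.5 dB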

Musical Sound and Noise

Sounds pleasant to hear are called Musical and those unpleasant to our ears are called Noise.
Though quite subjective, musical sounds normally originate from periodic or regular
vibrations while noise generally originates from irregular or non-periodic vibrations. Musical
sounds most commonly originate from vibrating strings, like in guitars and violins, vibrating
plates, like in drums and tabla, and vibrating air columns, like in pipes and horns. In all these
cases periodic vibration is responsible for the musical sensation.

Types of Noise:
White noise: white noise is a signal that has the same energy or power at every frequency, i.e., constant power density. Since a signal cannot physically have power at all frequencies (which would imply infinite energy content), a signal can only be white noise over a defined frequency range.
Other 'colours' of noise are: pink noise, red noise, green noise, blue noise, brown noise and black noise.
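White noise can be approximated numerically by drawing statistically independent random samples, which on average spread their power evenly across the representable frequency range. A minimal sketch (the function name is illustrative):

    import random

    def white_noise(n_samples, amplitude=1.0, seed=None):
        """Approximate white noise: independent random samples have, on average,
        equal power at every frequency up to half the sampling rate."""
        rng = random.Random(seed)
        return [rng.uniform(-amplitude, amplitude) for _ in range(n_samples)]

    noise = white_noise(1000, amplitude=0.3, seed=42)
    print(min(noise), max(noise))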

Tone and Note

A Tone is a sound having a single frequency. A tone can be represented pictorially by a wavy curve called a sinusoidal wave. A tone is produced when a tuning fork is struck with a padded hammer. The kind of vibration associated with the generation of a tone is called Simple Harmonic Motion; it is the motion executed by a spring-loaded weight, or by the projection of a point moving around the circumference of a circle at constant speed.

In daily life we do not hear single-frequency tones. The sounds we normally hear are a composite mixture of various tones of varying amplitudes and frequencies. Such a composite sound is called a Note. The tone parameters determine the resultant note, i.e., the waveform of a note can be derived from the resultant or sum of all its tonal components. The lowest frequency of a note is called the fundamental frequency; all other frequencies are called overtones. The frequencies of some overtones may be integral multiples of the fundamental frequency; these are called harmonics. Thus the tone whose frequency is double that of the fundamental is the second harmonic (or first overtone), the tone whose frequency is three times the fundamental is the third harmonic (second overtone), and so on. It has been observed that the presence of more harmonic content adds to the richness of the sound, which is referred to as a harmonious sound. Two sinusoidal tones added together can produce a composite sound whose waveform varies depending on the phase difference between the component tones.
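How a note is built up as the sum of a fundamental and its harmonics, with the phase of each component affecting the final waveform, can be sketched as follows (illustrative helper and values):

    import math

    def note(fundamental_hz, components, duration_s, sample_rate=44100):
        """Build a note by summing sinusoidal tones.
        `components` is a list of (multiple, amplitude, phase) tuples;
        multiple 1 is the fundamental, 2 is twice the fundamental, and so on."""
        n = int(duration_s * sample_rate)
        samples = []
        for i in range(n):
            t = i / sample_rate
            samples.append(sum(a * math.sin(2 * math.pi * fundamental_hz * m * t + ph)
                               for m, a, ph in components))
        return samples

    # A 220 Hz fundamental plus two harmonics of decreasing amplitude,
    # one of them shifted in phase
    waveform = note(220.0, [(1, 1.0, 0.0), (2, 0.5, 0.0), (3, 0.25, math.pi / 4)], 0.01)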

Decibels

The unit for measuring the loudness of sound as perceived by the human ear is the Decibel. It involves comparing the intensity of a sound with the faintest sound audible to the human ear and expressing the ratio as a logarithmic value. The full range of human hearing spans about 120 decibels. Logarithms are designed for talking about numbers of greatly different magnitude, such as 56 versus 7.2 billion; the most difficult problem is getting the number of zeros right. We can use scientific notation like 5.6 x 10^1 and 7.2 x 10^9, but these are awkward to deal with. For convenience we find the ratio between the two numbers and convert it to a logarithm. This gives us a number like 8.1. To avoid the decimal we multiply the number by 10. So if we measured one value as 56 HP (horsepower, a measure of power) and another as 7.2 billion HP, we would say that one is 81 dB greater than the other.

Power in dB = 10 log10 (power A / power B)


When speaking in the context of sound waves we can use this relation to compare the energy content of the waves. Since the power or intensity of sound energy is proportional to the square of the amplitude of the sound wave, we also have

Power in dB = 20 log10 (amplitude A / amplitude B)

The usefulness of this becomes apparent when we think about how our ear perceives loudness. The softest audible sound has a power of about 10^-12 watt/sq. meter and the loudest sound we can hear (also known as the threshold of pain) is about 1 watt/sq. meter, giving a total range of 120 dB. Thus when we speak of a 60 dB sound we actually mean:

60 dB = 10 * log10 (energy content of the measured sound / energy content of the softest/faintest audible sound)

Thus, energy content of the measured sound = 10^6 * (energy content of the softest audible sound).

Secondly, our judgement of relative levels of loudness is somewhat logarithmic. If a sound has 10 times more power than another, we hear it as roughly twice as loud. (The logarithm of 10^2 is equal to 2.)
Most studies of psycho-acoustics deal with the sensitivity and accuracy of human hearing. The human ear can respond to a very wide range of amplitudes. People's ability to judge pitch is quite variable; most subjects studied could match pitches to within 3%. Recognition of timbre is not very well studied, but once we have learned to identify a particular timbre, recognition is possible even if loudness and pitch are varied. We are also able to perceive the direction of a sound source with some accuracy: left, right and height information is determined by the difference between the sound reaching each ear, and we can tell whether a sound source is moving away from us or towards us.

Threshold of Audibility : the faintest sound that can be heard by a normal human ear. The energy content of this sound is about 10^-12 watt/sq. meter.
Threshold of Pain : the loudest sound that can be tolerated by a normal human ear. The energy content of this sound is about 1 watt/sq. meter.
Power difference in decibels = 10 * log10 (power of the measured sound in watt/sq. meter / power of the softest/faintest audible sound in watt/sq. meter)
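These formulas translate directly into code. A minimal sketch (illustrative names), using the threshold of audibility as the reference level:

    import math

    THRESHOLD_OF_AUDIBILITY = 1e-12   # watt/sq. meter, the faintest audible sound

    def level_db(intensity_w_per_m2):
        """Sound level in dB relative to the faintest audible sound."""
        return 10 * math.log10(intensity_w_per_m2 / THRESHOLD_OF_AUDIBILITY)

    print(level_db(1.0))     # threshold of pain: 120.0 dB
    print(level_db(1e-6))    # a 60 dB sound has 10**6 times the reference energy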

Masking

One of the most important findings from the study of psycho-acoustics is a phenomenon called Masking, which has had a profound influence on the digital processing of sound. Masking occurs due to the limitations of the human ear in perceiving multiple sources of sound simultaneously. When a large number of sound waves of similar frequencies are present in the air at the same time, the higher-volume or higher-intensity sounds apparently predominate over the lower-intensity sounds and 'mask' the latter out, making them inaudible. Thus even though the masked-out sound actually exists, we are unable to perceive it as a separate source of sound. The higher-intensity sound is called the Masker and the lower-intensity sound is called the Masked. The phenomenon is effective over a limited range of frequencies, beyond which masking may not be perceptible; this range of frequencies is called the Critical Band. Though masking results from a limitation of the human ear, modern sound engineers have turned it to advantage in designing digital sound compressors. Software for compressing sound files uses the masking phenomenon to throw away irrelevant (inaudible) information from sound files in order to reduce their size and storage requirements.

Temporal masking
Another related phenomenon, called temporal masking, occurs when tones are sounded close together in time but not simultaneously. A louder tone occurring just before a softer tone can render the latter inaudible. Temporal masking increases as the time difference is reduced. Temporal masking suggests that the brain probably integrates sound over a period of time and processes the information in bursts.

Localization

Elementary Sound Systems


An elementary sound system consists of three main components: a microphone, an amplifier and a loudspeaker.
A microphone is a device for converting sound energy into electrical energy.
An amplifier is a device which boosts the electrical signals leaving the microphone in order to drive the loudspeakers.
A loudspeaker is a device which converts electrical energy back into sound energy.

Microphones
A microphone records sound by converting acoustic energy into electrical energy. Sound exists as patterns of air pressure; the microphone changes this information into patterns of electrical current.

Several characteristics are used to classify microphones. One classification is based on how the microphone responds to the physical properties of a sound wave (such as pressure or pressure gradient). Another is based on the directional properties of the microphone. A third is based on the mechanism by which the microphone creates an electrical signal.

Based on their constructional features, microphones may be of two types: moving-coil and condenser.

Moving-Coil Microphones

In a moving-coil or dynamic microphone, sound waves cause movement of a thin metallic diaphragm and an attached coil of wire. A magnet produces a magnetic field which surrounds the coil. As sound impinges on the diaphragm attached to the coil, it causes movement of the coil within the magnetic field. A current is therefore produced, proportional to the intensity of the sound hitting the diaphragm. Examples: Shure Beta 57A dynamic microphone, Shure SM58 dynamic microphone.

Condenser Microphones
Often called a capacitor microphone, here the diaphragm is actually one plate of a capacitor. The sound incident on the diaphragm moves the plate, thereby changing the capacitance and generating a voltage. In a condenser microphone the diaphragm is mounted close to, but not touching, a rigid back plate. A battery is connected to both pieces of metal, which produces an electrical potential or charge between them. The amount of charge is determined by the voltage of the battery, the area of the diaphragm and back plate, and the distance between the two. This distance changes as the diaphragm moves in response to sound. When the distance changes, current flows in the wire as the battery maintains the correct charge. The amount of current is proportional to the displacement of the diaphragm. Examples: Marshall Instrument MXL-600 condenser microphone, Shure SM87A condenser microphone.
A common variant of this design uses a material, usually a kind of plastic, that carries a permanent charge. This is called an electret microphone.

Based on their directional properties, microphones may be classified into three types: omni-directional, bi-directional and uni-directional.

Omni-directional Microphones [Pressure Microphones]

This type consists of a pressure-sensitive element contained in an enclosure open to the air on one side. Sound waves create a pressure at the opening regardless of their direction of origin, and the pressure causes the diaphragm to vibrate. This vibration is translated into an electrical signal by either the moving-coil or the condenser mechanism. The polar plot of a microphone graphs the output of the microphone for equal sound levels arriving at various angles around the microphone. The polar plot of a pressure microphone is a circle, so desired sound and noise are picked up equally from all directions; it is therefore also called an omni-directional microphone. These are used to record sound coming from multiple sources, e.g., environmental sounds in a wildlife video clip.

Bi-directional Microphones [Gradient Microphones]

Here the diaphragm is open to the air on both sides, so that the net force on it is proportional to the pressure difference. A sound impinging upon the front of the microphone creates a pressure at the front opening. A short time later the sound travels to the back of the microphone and enters through the rear opening (180°), striking the diaphragm from the opposite side. However, since the sound has had to travel a longer distance to reach the rear opening, it has dissipated more energy and strikes the diaphragm with less force. The diaphragm therefore vibrates under the differential force. Sounds from the sides (90° and 270°) create identical pressures on both sides of the diaphragm and produce no resultant displacement. The polar response resembles a figure 8: maximum response for sound arriving at the openings and minimum response for sound incident from the sides. Example: Microtech Gefell UMT800 microphone.
A bi-directional microphone is sensitive to sounds coming from two directions, the front and the rear. It is used to record two sources of sound simultaneously, e.g. a conversation between two persons on opposite sides of a table.

Uni-directional Microphones [Cardioid Microphones]

A uni-directional microphone is designed to record sound from a single source, e.g. a single individual speaking. Its construction is similar to that of the bi-directional microphone, with a single exception: on the rear side of the microphone a resistive material such as foam or cloth is placed near the diaphragm. This tends to absorb some of the energy of sound entering through the rear opening. Sound produced at the front of the microphone strikes the diaphragm directly from the front, while a part of the energy travels to the back, is reduced by the resistive material and strikes the diaphragm with a smaller force from the opposite direction. The diaphragm vibrates under the differential force and the microphone responds to the sound. When sound is produced at the back of the microphone, the direct energy wave is reduced by the resistive material before striking the diaphragm from the back, while a part of the original sound energy, travelling the longer distance to the front, is also reduced before striking the diaphragm from the front in the opposite direction. The resistive material is designed so that these two reductions are almost equal, with the net effect that two equal and opposing forces striking the diaphragm produce no vibration. The microphone therefore does not respond to sound coming from the rear, and sounds from the sides are partially cancelled. Example: Roland DR-20 cardioid microphone.
Polar Plot
The polar plot is a graph plotting the output level of the microphone against the angle at which the incident sound is produced. By definition the omni-directional microphone produces equal outputs for all angles of incidence, hence its polar plot is a circle. For a bi-directional microphone, the output is maximum for sounds coming from the front (0°) and rear (180°); the output gradually decreases as the incident sound shifts from the front (and rear) to the sides (90° and 270°), so the polar plot resembles the figure '8'. For a uni-directional microphone, the output is maximum at the front and minimum at the rear, decreasing gradually from the front to the rear and resulting in a reduced but non-zero value at the sides. The polar plot is heart shaped, due to which these microphones are also called 'cardioid' microphones.
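The three polar patterns can be described by simple idealized formulas: a constant for the omni-directional pattern, |cos θ| for the figure-8, and (1 + cos θ)/2 for the cardioid. A minimal sketch (idealized, illustrative function name):

    import math

    def polar_response(pattern, angle_deg):
        """Idealized relative output (0..1) for sound arriving at angle_deg,
        where 0 degrees is the front of the microphone."""
        theta = math.radians(angle_deg)
        if pattern == "omni":
            return 1.0                          # circle: equal pickup from all directions
        if pattern == "bidirectional":
            return abs(math.cos(theta))         # figure 8: nulls at 90 and 270 degrees
        if pattern == "cardioid":
            return (1 + math.cos(theta)) / 2    # heart shape: null at the rear (180 degrees)
        raise ValueError(pattern)

    for angle in (0, 90, 180, 270):
        print(angle, [round(polar_response(p, angle), 2)
                      for p in ("omni", "bidirectional", "cardioid")])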

Microphone Specifications
The most important factor in choosing a particular type of microphone is how it picks up sound for the required application. In this respect the following issues should be considered:

1. Sensitivity
2. Overload characteristics
3. Distortion
4. Frequency response
5. Noise
6. Condenser vs. dynamic construction (comparison of microphone types)

Amplifiers and Loudspeakers

An amplifier is a device in which a varying input signal controls a flow of energy to produce an output signal that varies in the same way but has a larger amplitude. The input signal may be a current, a voltage, a mechanical motion, or any other signal, and the output signal is usually of the same nature. The ratio of the output voltage to the input voltage is called the voltage gain. The most common amplifiers are electronic and use transistors as their principal components; in most cases the transistors are incorporated into integrated circuit chips. Amplifier circuits are designated Class A, B, AB and C for analogue designs and Class D and E for digital designs. Example: Kenwood KA-5090R stereo integrated amplifier.

A loudspeaker converts electrical energy back into acoustic energy. A cone made of paper or fibre, known as the diaphragm, is attached to a coil of wire kept near a permanent magnet. When current from the source system is passed through the coil, a magnetic field is generated around the coil. This field interacts with the field of the permanent magnet, generating forces which oscillate the diaphragm. The diaphragm oscillates at the same frequency as the original electrical signal and therefore reproduces the same sounds that were used to encode the signal in the first place. All these components are enclosed in a container which additionally includes a suspension system that provides lateral stability to the vibrating components.
An important criterion is an even response for all frequencies. However, the requirements for good high-frequency and good low-frequency response conflict with each other. Thus there are separate units, called the woofer, midrange and tweeter, for reproducing sounds of different frequency ranges:
Woofer: 20 Hz to 400 Hz; Midrange: 400 Hz to 4 kHz; Tweeter: 4 kHz to 20 kHz
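The band split can be expressed as a simple lookup (a sketch of the frequency ranges listed above, not of an actual crossover filter):

    def driver_for(frequency_hz):
        """Pick the loudspeaker driver responsible for a given frequency,
        using the bands listed above."""
        if 20 <= frequency_hz < 400:
            return "woofer"
        if 400 <= frequency_hz < 4000:
            return "midrange"
        if 4000 <= frequency_hz <= 20000:
            return "tweeter"
        return "outside the audible range"

    print(driver_for(100), driver_for(1000), driver_for(10000))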

Audio Mixer
In a studio, stereo sound is produced artificially by placing an individual microphone for each instrument. Each of the signals generated is called a Track. A device called an audio mixer is used to record these individual tracks and edit them separately. The audio mixer provides a number of controls for adjusting the volume, tempo (speed of playback), muting, etc. of each individual track. Using these controls each separate track of sound, e.g., the guitar track, piano track, voice track, etc., can be edited to adjust the overall volume and tempo of the audio, as well as to provide special effects like chorus, echo, reverb (multiple echoes) and panning. Finally all the tracks are combined into two channels (for stereo sound) or multiple channels (for surround sound).

Graphics Equalizer

Digitisation of sound
Analog Representations

An analog quantity is a physical value that varies continuously over space and/or time. It can
be described by mathematical functions of the type s=f(t), s=f(x,y,z) or s=f(x,y,z,t). Physical
phenomena that stimulate human senses like light and sound can be thought of as continuous
waves of energy in space. Continuity implies that there is no gap in the energy stream at any
point. These phenomena can be measured by instruments which transform the captured
physical variable into another space/time dependent quantity called a signal. If the signal is
also continuous we say that it is analogous to the measured variable. The instruments are
called sensors and the signals usually take the form of electrical signals. For example, a
microphone converts the environmental sound energy into electrical signals and a solar cell
converts the radiant energy (light and heat) from the sun into electrical signals.

Analog signals have two essential properties:


• The signal delivered by the capturing instrument may take any possible value within
the limits of the instrument. Thus the value can be expressed by any real number in
the available range. Analog signals are thus said to be amplitude continuous.
• The value of the analog signal can be determined for any possible value of time or
space variable. Analog signals are therefore also said to be time or space continuous.

Digital Representations

In contrast to analog signals, digital signals are not continuous over space or time. They are
discrete in nature which means that they exist or have values only at certain points in space
or instants in time, but not at other points or instants. To use a personal computer to create
multimedia presentations, all media components have to be converted to the digital form
because that is the form the computer recognizes and can work with.

Analog to Digital Conversions

The transformation from analog to digital form requires three successive steps : Sampling,
Quantization and Code-word generation.

Sampling

Sampling involves examining the values of the continuous analog signal at certain points in time, thereby isolating a discrete set of values from the continuous suite of values. Sampling is usually done at periodic time or space intervals. For time-dependent quantities like sound, sampling is done at specific intervals of time and is said to create time-discretization of the signal. For time-independent quantities like a static image, sampling is done at regular space intervals (i.e. along the length and breadth of the image) and is said to create space-discretization of the signal.
The figure illustrates the sampling process. At every clock pulse the instantaneous value of the analog waveform is read, yielding a series of sampled values. The sampling clock frequency is referred to as the sampling rate. For a static image the sampling rate is measured in the spatial domain, i.e. along the length and width of the image area, and actually denotes the pixel resolution, while for a time-varying medium like sound it denotes how many times per second the analog wave is sampled, and is measured in Hertz. Since the input analog signal is continuous, its value changes over space or time. The A/D conversion process takes a finite time to complete, hence the input analog signal must be held constant during the conversion to avoid conversion errors. This is done by a sample-and-hold circuit.
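Time-discretization can be sketched in a few lines of Python, modelling the analog signal as a function of time and reading it at every tick of the sampling clock (names and values illustrative):

    import math

    def sample(analog_signal, sample_rate_hz, duration_s):
        """Read the instantaneous value of analog_signal (a function of time in
        seconds) at every tick of the sampling clock."""
        n = int(duration_s * sample_rate_hz)
        return [analog_signal(i / sample_rate_hz) for i in range(n)]

    # A 1 kHz analog tone sampled at 8 kHz for 1 ms gives 8 samples
    samples = sample(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 0.001)
    print([round(s, 3) for s in samples])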

Quantization

This process consists of converting the sampled signal into a signal which can take only a limited number of values. Quantization is also called amplitude-discretization. To illustrate the process, consider an analog electrical signal whose value varies continuously between 0 mV and +255 mV. Sampling the signal creates a set of discrete values, each of which can still have any value within the specified range, say a thousand different values. To quantize the signal we need to fix the total number of permissible values. Suppose we decide to allow only 256 levels that adequately represent the total range of sampled values, i.e. from the minimum to the maximum. This enables us to create a binary representation of each of the quantized values. We can now assign a fixed number of bits to represent the 256 levels. Since n binary digits can represent 2^n numbers, a total of 8 bits is sufficient to represent the 256 levels. This number of bits is referred to as the bit-depth of the quantized signal. (Incidentally, we could have allowed all thousand values, but that would have required a larger number of bits and correspondingly more computing resources; more on that later.)

Code-word Generation
This process consists of associating a group of binary digits, called a code word, with every quantized value. In the above example, the 256 permissible values are allocated codes from 00000000 for the minimum value to 11111111 for the maximum value. Each binary value actually represents the amplitude of the original analog signal at a particular point or instant; between two such points the amplitude information is lost. This is how a continuous signal is converted into a discrete signal.
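Quantization and code-word generation for the 0-255 mV example above can be sketched as follows (illustrative function, assuming the nearest level is chosen):

    def quantize_and_encode(sampled_mv, bit_depth=8, v_min=0.0, v_max=255.0):
        """Map each sampled value onto one of 2**bit_depth levels and emit the
        level as a fixed-width binary code word."""
        levels = 2 ** bit_depth
        step = (v_max - v_min) / (levels - 1)
        code_words = []
        for v in sampled_mv:
            level = round((v - v_min) / step)         # nearest quantization level
            level = max(0, min(levels - 1, level))    # clamp to the permitted range
            code_words.append(format(level, f"0{bit_depth}b"))
        return code_words

    print(quantize_and_encode([0.0, 100.4, 255.0]))
    # ['00000000', '01100100', '11111111']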

The whole process of sampling followed by quantization and code-word generation is called digitization. The result is a sequence of values coded in binary format. Physically, an analog signal is digitized by passing it through an electronic chip called an Analog-to-Digital Converter (ADC).

Digital to Analog Conversion

The digital form of representation is useful inside a computer for storage and manipulation.
Since humans only react to physical sensory stimuli, playback of the stored media requires a
conversion back to the analog form. Our eyes and ears can only sense the physical light and
sound energies, which are analog in nature, not the digital quantities stored inside a
computer. Hence the discrete set of binary values need to be converted back to the analog
form during playback. For example a digital audio file needs to be converted to the analog
form and played back using a speaker for it to be perceived by the human ear. A reverse
process to that explained above is followed for this conversion. Physically this is done by
passing the digital signal through another electronic chip called a Digital-to-Analog
Converter (DAC).

Relation between Sampling Rate and Bit Depth

As we increase the sampling rate we get more information about the analog wave, so the resulting digital wave is a more accurate representation of it. However, increasing the sampling rate also means we have more data to store and thus require more space; in terms of resources this implies more disk space and RAM, and hence greater cost. Capturing the analog wave more accurately also calls for a greater bit depth, so that the finer variations in amplitude can be represented. If we use a lower bit depth than is required, we will not be able to represent all the sample values accurately, and the advantage of using a higher sampling rate will be lost. On the other hand, if we use a lower sampling rate we get less information about the analog wave, so the digital sound will not be an accurate representation of it. If we then use a high bit depth, we will have provision for representing a large number of amplitude levels; because of the larger number of bits the sound file will be quite large, but because of the low number of samples the quality will still be degraded compared to the original analog wave.
Quantization Error

No matter what the choice of bit depth, digitization can never perfectly encode a continuous analog signal. An analog waveform has an infinite number of possible amplitude values, but a quantizer has a finite number of intervals. All the analog values falling within an interval can only be represented by the single number assigned to that interval, so the quantized value is only an approximation of the actual value. For example, suppose the binary number 101000 corresponds to the analog value 1.4 V, 101001 corresponds to 1.5 V, and the analog value at sample time is 1.45 V. Because no code exists halfway between 101000 and 101001, the quantizer must round up to 101001 or down to 101000. Either way there will be an error with a magnitude of one-half of an interval.
Quantization error (e) is the difference between the actual analog value at sample time and the quantized value, as shown below. Consider an analog waveform which is sampled at a, b and c, the corresponding sample values being A, B and C. Considering the portion between A and B, the actual value of the signal at some point x after a is xX, but the value of the digital output is held at xm; thus there is an error equal to the length mX. Similarly, at point y the actual value of the analog signal is yY but the digital output is still fixed at yn, so the error increases to nY. This continues for all points between a and b, until just before b, where for an actual value of almost bB we still get a sampled value fixed at bp. The error is maximum at this point and equals pB, which is almost equal to the height of one step. This maximum error is the quantization error, denoted by e, and is equal to one step size of the digital output.

Because of quantization error there is always a distortion of the wave when it is represented digitally. This distortion is physically manifested as noise. Noise is any unwanted signal that creeps in along with the required signal. To eliminate noise fully during digitization we would have to sample at an infinite rate, which is practically impossible. Hence we must find other ways to reduce the effects of noise. Other than quantization error, noise may also percolate in from the environment, as well as from the electrical equipment used for digitization.

In characterizing digital hardware performance we can determine the ratio of the maximum expressible signal amplitude to the maximum noise amplitude. This gives the S/N (signal-to-noise) ratio of the system. It can be shown that the S/N ratio expressed in decibels varies as roughly 6 times the bit-depth, so increasing the bit-depth during sampling reduces the relative level of quantization noise. To remove environmental noise we need to use good-quality microphones and sound-proofed recording studios. Noise picked up by electrical wires may be reduced by proper shielding and earthing of the cables. After digitization, noise can also be removed using sound-editing software, which employs noise filters to identify and selectively remove noise from the digital audio file.

Each sample value needs to be held constant by a hold circuit until the next sample value is obtained. Thus the maximum difference between the held sample value and the actual value of the analog wave is equal to the height of one step. If A is the peak-to-peak height of the wave and n is the bit depth, then the number of steps is 2^n. The height of each step is A / 2^n, which is equal to the quantization error e. Thus we have the relation: e = A / 2^n.


Signal to Noise Ratio
Expressed in decibels, the SNR is seen to be directly proportional to the bit-depth:

SNR (in dB) = 20 log10 (2^n) ≈ 6 n, where n is the bit depth.

This implies that if the bit-depth is increased by 1 during digitization, the signal-to-noise ratio increases by about 6 dB.

Importance of Digital Representation

The key advantage of the digital representation lies in the universality of representation.
Since any medium, be it text or image or sound is coded in a unique form which ultimately
results in a sequence of bits, all kinds of information can be handled in the same way.

The following advantages are also evident:

Storage : The same digital storage device, like memory chips, hard disks, floppies and CD-
ROMs, can be used for all media.

Transmission : Any communication network capable of supporting digital transmission has the potential to transmit any kind of multimedia information. Digital signals are less sensitive to noise than analog signals, their attenuation is lower, error detection and correction can be implemented, and the information can be encrypted to maintain confidentiality.

Processing : Powerful software programs can be used to analyze, modify, alter and
manipulate multimedia data in a variety of ways. This is probably where the potential is the
highest. The quality of the information may also be improved by removal of noises and
errors. This capability enables us to digitally restore old photographs or noisy audio
recordings.

Drawbacks of Digital Representation

The major drawback lies in coding distortion. The process of first sampling and then quantizing and coding the sampled values introduces distortion. Also, since a continuous signal is broken into a discrete form, a part of the signal is actually lost and cannot be recovered. As a result, the signal generated after digital-to-analog conversion and presented to the end user has little chance of being completely identical to the original signal.

Another consequence is the large digital storage capacity required for storing image, sound and video. Each minute of CD-quality stereo sound requires about 10 MB of data, and each minute of full-screen digital video fills up over 1 GB of storage space. Fortunately, compression algorithms have been developed to alleviate the problem to a certain extent.

Early Sound Storage

A/D and D/A Converter

Analog Vs Digital Format

Analog format
Digital format

Sampling
Sampling rate

Sampling Resolution
PCM - Pulse Code Modulation
The process of converting an analog signal into a digital signal is called Pulse Code Modulation, or PCM, and involves sampling. Electronic circuits store each sampled value as an electrical signal and hold it constant until the next sampled value arrives.
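A PCM stream is exactly what an uncompressed WAV file stores. As a minimal sketch (using Python's standard wave and struct modules; the helper name is illustrative), a list of samples can be quantized to 16-bit PCM and written to disk:

    import math, struct, wave

    def write_pcm_wav(path, samples, sample_rate=44100):
        """Store floating-point samples in the range -1.0..1.0 as 16-bit PCM."""
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)        # mono
            wav.setsampwidth(2)        # 2 bytes per sample = 16-bit resolution
            wav.setframerate(sample_rate)
            frames = b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
                              for s in samples)
            wav.writeframes(frames)

    # One second of a 440 Hz tone, pulse-code modulated and saved to disk
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
    write_pcm_wav("tone.wav", tone)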
What is PCM

Effects of Sampling Parameters

As the sampling rate is increased we obtain more data about the input signal, and the output signal becomes a closer approximation of the input signal. To accommodate the larger range of values, the resolution must also be increased by increasing the number of bits.

Nyquist’s Sampling Theorem

Nyquist’s theorem states that, to capture a signal faithfully, the sampling rate must be at least twice the highest frequency present in the signal.

Aliasing

If sampling is done at a rate lower than that prescribed by the theorem, the higher frequencies in the signal are misrepresented as spurious lower frequencies in the digitized output; this distortion is called aliasing.

Oversampling
When sampling is done at a much higher rate than that prescribed by the theorem, it is called oversampling. Although oversampling can generate a high-quality digital signal, it can unnecessarily increase the file size.
Practical Sampling Frequencies
To handle the full 20 kHz range of human hearing, practical sampling systems use frequencies of 44-48 kHz. However, depending on the audio content, sampling may also be done at a lower rate; e.g., to reproduce human speech, sampling at around 11 kHz is sufficient.

Case Study
Here we look at some case studies of the digital output waves obtained using various combinations of sampling rate and sampling resolution. The actual values should be chosen keeping in mind the compromise between cost and quality.
Low rate, low resolution

Low rate, High resolution


High rate, low resolution

High rate, High resolution

Bit Rate & File Size

File size calculation : (sampling rate in Hz) x (sampling resolution in bits) x (number of channels) x (duration of the clip in seconds) bits, which is divided by (8 x 1024) for conversion to KB (kilobytes).
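The calculation can be checked with a small sketch (illustrative helper):

    def audio_file_size_kb(sample_rate_hz, resolution_bits, channels, duration_s):
        """Uncompressed size = rate x resolution x channels x duration (in bits),
        divided by 8 x 1024 to convert to kilobytes."""
        bits = sample_rate_hz * resolution_bits * channels * duration_s
        return bits / (8 * 1024)

    # One minute of CD-quality stereo: 44,100 Hz, 16 bits, 2 channels
    kb = audio_file_size_kb(44100, 16, 2, 60)
    print(round(kb), "KB  ~", round(kb / 1024, 2), "MB")   # about 10 MB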

Benefits of digital representation of Sound

1. Usable in multimedia applications
2. Easier data manipulation
3. Possibility of compressing data
4. Copies without generation loss
5. Greater durability of data
6. Possibility of synthetic sound
7. Possibility of upgradation

Electronic Music & Synthesizer

SYNTHESIZERS

Synthesizers are electronic instruments which allow us to generate digital samples of the sounds of various instruments synthetically, i.e. without the actual instrument being present. The core of a synthesizer is a special-purpose chip (IC) capable of generating the appropriate signals for producing sound. The sound may be a recording of actual sounds or a simulation of actual sounds through mathematical techniques. The sound produced can be modified by additional hardware components to change its loudness, pitch, etc.

Synthesizer Basics

Synthesizers can be broadly classified into two categories. FM synthesizers generate sound by combining elementary sinusoidal tones to build up a note having the desired waveform; earlier-generation synthesizers were generally of the FM type, and their sounds lacked the depth of real-world sounds. Wavetable synthesizers, created later, produce sound by retrieving high-quality digital recordings of actual instruments from memory and playing them on demand; modern synthesizers are generally of the wavetable type. The sounds associated with a synthesizer are called patches, and the collection of all patches is called the Patch Map. Each sound in a patch map must have a unique ID number to identify it during playback. The audio channel of a synthesizer is divided into 16 logical channels, each of which is capable of playing a separate instrument.
FM Synthesis

Wave Table Synthesis

Characteristics of a Synthesizer
• Polyphony : A synthesizer has polyphony if it can play more than one note at a time. Polyphony is generally measured or specified as a number of notes or voices.
• Multitimbral : A synthesizer is said to be multitimbral if it is capable of producing two or more different instrument sounds simultaneously.

Each physical channel of the synthesizer is divided into 16 logical channels. Omni mode signifies that all 16 channels are capable of receiving data simultaneously. Polyphony means several notes can sound simultaneously on each logical channel.
MUSICAL INSTRUMENT DIGITAL INTERFACE (MIDI)

What is MIDI
The Musical Instrument Digital Interface (MIDI) is a protocol, or set of rules, for connecting digital synthesizers to each other or to a computer. In much the same way that two computers communicate via modems, two synthesizers communicate via MIDI. The information exchanged between two MIDI devices is musical in nature. In its most basic mode, MIDI information tells a synthesizer when to start and stop playing a specific note. MIDI information can also be more hardware specific: it can tell a synthesizer to change sounds, master volume, modulation devices, and even how to receive information. In more advanced uses, MIDI information can indicate the starting and ending points of a song. More recent applications include using MIDI as the interface between computers and synthesizers. The MIDI standard defines a protocol in which the keys, instead of producing sound directly, produce data in the form of instructions which can be stored and edited on a personal computer before being played back as sound.

MIDI Specification
The MIDI specification/protocol has three parts: the hardware standard, which defines rules for connecting a musical instrument to a computer; the message standard, which defines the format for exchanging data between the instruments and the computer; and the file format in which this data can be stored in a computer and played back.

Hardware
The MIDI hardware has three basic components: the keyboard, which is played as an instrument and translates key presses into MIDI data; the sequencer, which allows MIDI data to be captured, edited and replayed; and the sound module, which translates the MIDI data into sound.

Messages / Protocol

Channel Messages
Channel messages are instructions for a specific channel and contain the data for the actual key notes. The status byte contains the channel number and the function, and is followed by one or two data bytes with additional parameters such as the note number and velocity.
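For example, a Note On channel message consists of a status byte (0x90 plus the channel number) followed by two data bytes, the note number and the velocity. A minimal sketch (illustrative helper name):

    def note_on(channel, note, velocity):
        """Build a MIDI Note On channel message: status byte 0x90 | channel
        (channel 0-15), then two data bytes (note 0-127, velocity 0-127)."""
        if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
            raise ValueError("channel, note or velocity out of MIDI range")
        return bytes([0x90 | channel, note, velocity])

    # Middle C (note 60) on channel 0, struck with velocity 100
    print(note_on(0, 60, 100).hex())   # '903c64'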
System message

File Format

The MIDI specification makes provision for saving synthesizer audio in a separate file format called MIDI files. MIDI files are totally different from normal digital audio files (like WAV files) in that they do not contain audio data at all, but rather instructions on how to play the sound. These instructions act on the synthesizer chips to produce the actual sound. Because of this, MIDI files are extremely compact compared to WAV files. They have the further advantage that the music in a MIDI file can easily be changed by modifying the instructions using appropriate software.
General MIDI (GM) Specification
SOUND CARD ARCHITECTURE
The sound card is an expansion board in a multimedia PC which interfaces with the CPU via slots on the motherboard. Externally it is connected to speakers for the playback of sound. Besides playback, the sound card is also responsible for digitizing, recording and compressing sound files.

Basic Components

The basic internal components of the sound card include :

SIMM Banks : Local memory of the sound card for storing audio data during digitization
and playback of sound files.
DSP : The digital signal processor which is the main processor of the sound card and
coordinates the activities of all other components. It also compresses the data so that it takes
up less space.
DAC/ADC : The digital-to-analog and analog-to-digital converters for digitizing analog
sound and reconverting digital sound files to analog form for playback.
WaveTable/FM Synthesizers : For generating sound on instructions from MIDI messages.
The wavetable chip has a set of pre-recorded digital sounds while the FM chip generates the
sound by combining elementary tones.
CD Interface : Internal connection between the CD drive of the PC and the sound card.
16-bit ISA connector : Interface for exchanging audio data between the CPU and sound
card.
Amplifier : For amplification of the analog signals from the DAC before being sent to the
speakers for playback.

The external ports of the sound card include :

Line Out : Output port for connecting to external recording devices like a cassette player or
an external amplifier.
MIC : Input port for feeding audio data to the sound card through a microphone connected to
it.
Line In : Input port for feeding audio data from external CD/cassette players for recording or
playback.
Speaker Out : Output port for attaching speakers for playback of sound files.
MIDI : Input port for interfacing with an external synthesizer.

Source : www.pctechguide.com

Processing Audio Files

WAV files

From the microphone or audio CD player a sound card receives a sound as an analog signal.
The signals go to an ADC chip which converts the analog signal to digital data. The ADC
sends the binary data to the DSP, which typically compresses the data so that it takes up less
space. The DSP then sends the data to the PC’s main processor which in turn sends the data
to the hard drive to be stored. To play a recorded sound the CPU fetches the file containing
the compressed data and sends the data to the DSP. The DSP decompresses the data and
sends it to the DAC chip which converts the data to a time varying electrical signal. The
analog signal is amplified and fed to the speakers for playback.

MIDI files
The MIDI instruments connected to the sound card via the external MIDI port, or the MIDI files on the hard disk retrieved by the CPU, instruct the DSP which sounds to play and how to play them, using the standard MIDI instruction set. The DSP then either fetches the actual sound from a wavetable synthesizer chip or instructs an FM synthesizer chip to generate the sound by combining elementary sinusoidal tones. The digital sound is then sent to the DAC to be converted to analog form and routed to the speakers for playback.

File Formats

Wave (Microsoft) File (.WAV) : The format for sampled sounds defined by Microsoft for use with Windows. It is an expandable format which supports multiple data formats and compression schemes.
Macintosh AIFF (.AIF/.SND) : The format used on the Apple Macintosh to save sound data files. An AIFF file is convenient when transferring files between a PC and a Mac over a network.
RealMedia (.RM/.RA) : Compressed formats designed for real-time audio and video streaming over the Internet.
MIDI (.MID) : Files containing instructions on how to generate music; the actual music is generated by digital synthesizer chips.
Sun Java Audio (.AU) : The only audio format supported by Java applets on the Internet.
MPEG-1 Layer 3 (.MP3) : Highly compressed audio files providing almost CD-quality sound.
