Sound & Audio
Basics of Acoustics:
Sound
Sound is a form of energy, similar to heat and light. It is generated by vibrating objects
and can flow through a material medium from one place to another. During generation, the
kinetic energy of the vibrating body is converted to sound energy. Acoustic energy flowing
outwards from its point of generation can be compared to a wave spreading over the surface
of water. When an object vibrates or oscillates rapidly, part of its kinetic energy
is imparted to the layer of the medium in contact with the object, e.g. the air surrounding a
bell. The particles of this layer, on receiving the energy, start vibrating and in
turn impart a portion of their energy to the next layer of air particles, which also starts
vibrating. This process continues, thereby propagating the acoustic energy throughout the
medium. When it reaches our ears it sets the eardrums into a similar kind of vibration, and our
brain recognizes this as sound.
Acoustics
Acoustics is the branch of science dealing with the study of sound; it is concerned with the
generation, transmission and reception of sound waves. The application of acoustics in
technology is called acoustical engineering. The main sub-disciplines of acoustics are:
aero-acoustics, bio-acoustics, biomedical acoustics, psycho-acoustics, physical acoustics,
speech communication, ultrasonics and musical acoustics.
Psycho-acoustics is concerned with the hearing, perception and localization of sound by
human beings.
Psycho-Acoustics
In harnessing sound for various musical instruments as well as for multimedia applications,
the effects of sound on human hearing and the various factors involved need to be analyzed.
Psycho-acoustics is the branch of acoustics which deals with these effects.
As sound energy flows through a material medium, it sets the layers of the medium into
oscillatory motion. This creates alternate regions of compression and expansion, which are
pictorially represented as a wave: the upper part (the crest or positive peak) denotes a
compression and the lower part (the trough or negative peak) denotes a rarefaction.
Since a sound wave actually represents a disturbance of the medium particles from their original
positions (i.e., before the wave started), it cannot exist in a vacuum. Sound waves have two
characteristic properties. Firstly, they are longitudinal waves, which means that
the direction of propagation of sound is the same as the direction along which the medium
particles oscillate. Secondly, sound waves are mechanical waves; like springs, they can be
compressed and expanded. When they are compressed, the peaks come closer together, while on
expansion the peaks move further apart. On compression the frequency of the sound increases
and it appears more high-pitched, while on expansion the frequency decreases, making it
appear duller and flatter.
Spatial Waves represent the vibrating states of all particles in the path of a wave at an instant
of time. The horizontal axis represents the distance of the particles from the source. The
distance of separation between points in the same phase is called the Wavelength. The particles
at points O and D have the same state of motion at that instant and are said to be in the Same
Phase. The length of the wave between O and D is the wavelength.
Temporal Waves represent the state of a single particle in the path of a wave over a period of
time. The horizontal axis represents the time over which the wave flows. The time
elapsed between instants at which the particle is in the same phase is called the Time Period.
The state of the particle is the same at instants O and D, and the particle is said to have
undergone one Complete Cycle or Oscillation. The time interval between instants O and D is the
Time Period of the wave.
Fundamental Characteristics
The first characteristic is the Amplitude, the maximum displacement of the medium particles
from their rest position. Its physical manifestation is the loudness of the sound: the larger
the amplitude, the louder the sound appears.
The second characteristic is Frequency. This measures the number of vibrations of a particle
in the path of a wave in one second. The higher the frequency of the wave, the larger the number
of oscillations per second. The physical manifestation of the frequency of a sound wave is the
pitch of the sound: as the frequency increases, the pitch becomes higher and the sound
shriller. Frequency is measured in a unit called Hertz, denoted by Hz. A sound of 1 Hz is
produced by an object vibrating at the rate of 1 vibration per second. The total range of
human hearing lies between 20 Hz at the lower end and 20,000 Hz (or 20 kHz) at the higher end.
The Time Period of a wave is the time taken to complete one oscillation; it is inversely
proportional to the frequency of the sound wave (T = 1/f). For example, a 440 Hz tone has a
time period of 1/440 ≈ 2.3 ms.
The third characteristic is the Waveform. This is the actual shape of the wave when
represented pictorially. The physical manifestation is the quality or timbre of sound. This
helps us to distinguish between sounds coming from different instruments like guitar and
violin.
Dynamic range
The term dynamic range means the ratio of the maximum amplitude of undistorted
sound in a piece of audio equipment, such as a microphone or loudspeaker, to the amplitude of
the quietest sound possible, which is often determined by the inherent noise characteristics of
the device. The term is often used to indicate the ratio of the maximum level of power, current
or voltage to the minimum detectable value. In music, dynamic range means the
difference between the quietest and loudest volume of an instrument. For digital audio, the
dynamic range is synonymous with the signal-to-noise ratio (SNR) and is expressed in dB. It
can be shown that increasing the bit depth of digital audio by 1 bit increases its
dynamic range by approximately 6 dB.
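As a sketch of this relationship (not from the source), the snippet below computes the ideal dynamic range of a quantizer from its bit depth. The commonly quoted figure for a full-scale sine wave is 6.02n + 1.76 dB, so "about 6 dB per bit" is the usual rule of thumb.

# Illustrative sketch: relating bit depth to dynamic range.
# Each extra bit doubles the number of amplitude levels, adding
# about 20*log10(2) ≈ 6.02 dB of dynamic range.
import math

def dynamic_range_db(bit_depth: int) -> float:
    """Approximate dynamic range of an ideal quantizer with the given bit depth."""
    levels = 2 ** bit_depth
    return 20 * math.log10(levels)

for bits in (8, 16, 24):
    print(f"{bits}-bit audio: ~{dynamic_range_db(bits):.1f} dB")
# 8-bit ≈ 48.2 dB, 16-bit ≈ 96.3 dB, 24-bit ≈ 144.5 dB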
Sounds pleasant to hear are called Musical and those unpleasant to our ears are called Noise.
Though quite subjective, musical sounds normally originate from periodic or regular
vibrations while noise generally originates from irregular or non-periodic vibrations. Musical
sounds most commonly originate from vibrating strings, like in guitars and violins, vibrating
plates, like in drums and tabla, and vibrating air columns, like in pipes and horns. In all these
cases periodic vibration is responsible for the musical sensation.
Types of Noise:
White noise: white noise is a signal that has the same energy or power at every frequency,
i.e., a constant power density. Since a signal physically cannot have power at all
frequencies (which would mean it has infinite energy content), a signal can only be white noise
over a defined frequency range.
Other colours of noise are:
pink noise, red noise, green noise, blue noise, brown noise, black noise.
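As an illustrative sketch (the sampling rate and generator seed are arbitrary assumptions, not from the source), the snippet below generates one second of Gaussian white noise and checks that its power is roughly the same in each of a few coarse frequency bands, i.e. a flat power density.

# Sketch: generate white noise and verify its spectrum is roughly flat.
import numpy as np

rng = np.random.default_rng(0)
fs = 44100                        # assumed sampling rate in Hz
noise = rng.standard_normal(fs)   # one second of Gaussian white noise

spectrum = np.abs(np.fft.rfft(noise)) ** 2      # power per frequency bin
# Average the power over a few coarse frequency bands; for white noise the
# band averages come out roughly equal.
bands = np.array_split(spectrum[1:], 4)
print([round(float(b.mean()), 1) for b in bands])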
Decibels
The decibel (dB) is a logarithmic unit used to express the ratio of the power of a sound to a
reference power. Its usefulness becomes apparent when we think of how our ear perceives
loudness. The softest audible sound has a power of about 10^-12 watt/sq. meter and the loudest
sound we can tolerate (also known as the threshold of pain) is about 1 watt/sq. meter, giving a
total range of 120 dB. Thus when we speak of a 60 dB sound we actually mean:
60 dB = 10 * log10 (energy content of the measured sound / energy content of the
softest/faintest audible sound)
Thus, energy content of the measured sound = 10^6 * (energy content of the softest audible
sound)
Threshold of Audibility: the faintest sound that can be heard by a normal human ear.
The energy content of this sound wave is 10^-12 watt/sq. meter.
Threshold of Pain: the loudest sound that can be tolerated by a normal human ear.
The energy content of this sound is 1 watt/sq. meter.
Power difference in decibels = 10 * log10 (power of the measured sound in watt/sq. meter /
power of the softest/faintest audible sound in watt/sq. meter)
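A worked example of this formula (the example intensities are assumptions for illustration):

# Sound level in dB relative to the threshold of audibility,
# I0 = 10^-12 watt/sq. meter.
import math

I0 = 1e-12                       # threshold of audibility, watt/sq. meter

def level_db(intensity: float) -> float:
    return 10 * math.log10(intensity / I0)

print(level_db(1e-12))   # 0 dB   - faintest audible sound
print(level_db(1e-6))    # 60 dB  - a sound 10^6 times the faintest audible sound
print(level_db(1.0))     # 120 dB - threshold of pain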
Masking
One of the most important findings from the study of psycho-acoustics is a phenomenon
called Masking, which has had a profound influence on the digital processing of
sound. Masking occurs due to the limitations of the human ear in perceiving multiple sources of
sound simultaneously. When a number of sound waves of similar frequencies are
present in the air at the same time, the higher-volume or higher-intensity
sound predominates over the lower-intensity sound and 'masks' the latter out,
making it inaudible. Thus even though the masked sound actually exists, we are unable
to perceive it as a separate source of sound. The higher-intensity sound is called the Masker
and the lower-intensity sound the Masked. The phenomenon is effective over a limited
range of frequencies, beyond which masking may not be perceptible. This range of
frequencies is called the Critical Band. Though masking occurs as a result of a limitation of the
human ear, modern sound engineers have turned it to advantage in designing
digital sound compressors. Software for compressing sound files
uses the masking phenomenon to throw away irrelevant information from sound files in
order to reduce their size and storage requirements.
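The following is a greatly simplified, hypothetical sketch of that idea, not an actual codec: the function name, the 30 dB drop and the 300 Hz "critical band" width are illustrative assumptions. It finds the loudest spectral component (the masker) and discards much weaker components near it, which a real perceptual coder would allot few or no bits to.

# Greatly simplified sketch (assumed parameters, not an actual codec):
# drop spectral components that fall below a crude masking threshold
# set by the loudest tone.
import numpy as np

def crude_masking_filter(samples, fs, mask_drop_db=30.0, critical_band_hz=300.0):
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / fs)
    power_db = 20 * np.log10(np.abs(spectrum) + 1e-12)

    masker_idx = int(np.argmax(power_db))           # loudest component = masker
    threshold = power_db[masker_idx] - mask_drop_db

    # Zero out weaker components inside the masker's "critical band".
    in_band = np.abs(freqs - freqs[masker_idx]) < critical_band_hz
    spectrum[in_band & (power_db < threshold)] = 0.0
    return np.fft.irfft(spectrum, n=len(samples))

fs = 8000
t = np.arange(fs) / fs
mix = np.sin(2 * np.pi * 1000 * t) + 0.01 * np.sin(2 * np.pi * 1100 * t)  # masker + soft tone
cleaned = crude_masking_filter(mix, fs)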
Temporal masking
Another related phenomenon, called temporal masking, occurs when tones are sounded close
together in time but not simultaneously. A louder tone occurring just before a softer tone can
render the latter inaudible. Temporal masking increases as the time difference is reduced. It
suggests that the brain probably integrates sound over a period of time and processes the
information in bursts.
Localization
Microphones
A microphone records sound by converting acoustic energy to electrical energy.
Sound exists as patterns of air pressure; the microphone changes this information
into patterns of electrical current.
Based on their constructional features, microphones may be of two types: moving-coil
(dynamic) type and condenser type.
Condenser Microphone
Often called a capacitor microphone, here the diaphragm is actually one plate
of a capacitor. Sound incident on the diaphragm moves the plate, thereby changing the
capacitance and generating a voltage. In a condenser microphone the diaphragm is mounted
close to, but not touching, a rigid back plate. A battery is connected to both pieces of metal,
which produces an electrical potential or charge between them. The amount of charge is
determined by the voltage of the battery, the area of the diaphragm and back plate, and the
distance between the two. This distance changes as the diaphragm moves in response to
sound. When the distance changes, current flows in the wire as the battery maintains the
correct charge. The amount of current is proportional to the displacement of the diaphragm.
Examples: Marshall Instrument MXL-600 condenser microphone, Shure SM87A condenser
microphone.
A common variant of this design uses a material, usually a kind of plastic, with a permanent
charge on it. This is called an electret microphone.
Based on their directional properties, microphones may be classified into three types:
omni-directional, bi-directional and uni-directional.
Omni-directional [Pressure Microphones]
A pressure microphone consists of a pressure-sensitive element contained in an enclosure open
to the air on one side. Sound waves create a pressure at the opening regardless of their
direction of origin, and this pressure causes the diaphragm to vibrate. The vibration is
translated into an electrical signal through either of the mechanisms described above (moving
coil or condenser). The polar plot of a microphone graphs the output of the microphone for
equal sound levels arriving at various angles around it. The polar plot of a pressure
microphone is a circle, so desired sound and noise are picked up equally from all directions.
It is therefore also called an omni-directional microphone.
These are used to record sound coming from multiple sources, e.g., environmental sounds in
a wildlife video clip.
Bi-directional [Gradient Microphones]
Microphone Specification
The most important factor in choosing a particular type of microphone is how it picks up
sound for the required application. In this respect the following issues should be
considered:
1. Sensitivity
2. Overload characteristics
3. Distortion
4. Frequency Response
5. Noise
6. Condenser Vs Dynamic
Comparison of Microphones
Amplifiers
An amplifier is a device in which a varying input signal controls a flow of energy to produce an
output signal that varies in the same way but has a larger amplitude. The input signal may be
a current, a voltage, a mechanical motion, or any other signal, and the output signal is usually
of the same nature. The ratio of the output voltage to the input voltage is called the voltage
gain. The most common types of amplifiers are electronic and use transistors as
their principal components. In most cases, the transistors are incorporated into integrated
circuit chips. Amplifier circuits are designated classes A, B, AB and C for analogue designs and
classes D and E for digital designs. Ex: Kenwood KA-5090R Stereo Integrated Amplifier.
Loudspeakers
A loudspeaker converts electrical energy back to acoustic energy. A cone made of paper or
fibre, known as the diaphragm, is attached to a coil of wire kept near a permanent magnet.
When current from the source system is passed through the coil, a magnetic field is generated
around the coil. This field interacts with the magnetic field of the permanent magnet,
generating forces which oscillate the diaphragm. The diaphragm oscillates at the same
frequency as the original electrical signal and therefore reproduces the same sounds that were
used to encode the signal in the first place. All these components are enclosed in a container
which additionally includes a suspension system providing lateral stability to the vibrating
components.
An important criterion is an even response across all frequencies. However, the requirements
for good high-frequency and good low-frequency response conflict with each other.
Thus there are separate units called the woofer, midrange and tweeter for reproducing sound of
different frequency ranges:
Woofer: 20 Hz to 400 Hz
Midrange: 400 Hz to 4 kHz
Tweeter: 4 kHz to 20 kHz
Audio Mixer
In a studio, stereo sound is produced artificially by placing an individual microphone for
each instrument. Each of the signals generated is called a Track. A device called an
audio mixer is used to record these individual tracks and edit them separately. Each track
is assigned a number, and the audio mixer provides a set of controls for adjusting the volume,
tempo (speed of playback), mute, etc. for each individual track. Using these controls each
separate track of sound, e.g., the guitar track, piano track, voice track, etc., can be edited to
adjust the overall volume and tempo of the audio, as well as to provide special effects
like chorus, echo, reverb (multiple echoes) and panning. Finally all the tracks are combined
into two channels (for stereo sound) or multiple channels (for surround sound).
Graphics Equalizer
Digitisation of sound
Analog Representations
An analog quantity is a physical value that varies continuously over space and/or time. It can
be described by mathematical functions of the type s=f(t), s=f(x,y,z) or s=f(x,y,z,t). Physical
phenomena that stimulate human senses like light and sound can be thought of as continuous
waves of energy in space. Continuity implies that there is no gap in the energy stream at any
point. These phenomena can be measured by instruments which transform the captured
physical variable into another space/time dependent quantity called a signal. If the signal is
also continuous we say that it is analogous to the measured variable. The instruments are
called sensors and the signals usually take the form of electrical signals. For example, a
microphone converts the environmental sound energy into electrical signals and a solar cell
converts the radiant energy (light and heat) from the sun into electrical signals.
Digital Representations
In contrast to analog signals, digital signals are not continuous over space or time. They are
discrete in nature which means that they exist or have values only at certain points in space
or instants in time, but not at other points or instants. To use a personal computer to create
multimedia presentations, all media components have to be converted to the digital form
because that is the form the computer recognizes and can work with.
The transformation from analog to digital form requires three successive steps : Sampling,
Quantization and Code-word generation.
Sampling
Sampling involves examining the values of the continuous analog signal at certain points in
time, thereby isolating a discrete set of values from the continuous suite of values.
Sampling is usually done at periodic time or space intervals. For time-dependent quantities
like sound, sampling is done at specific intervals of time and is said to create a time-
discretization of the signal. For time-independent quantities like a static image, sampling is
done at regular space intervals (i.e. along the length and breadth of the image) and is said to
create a space-discretization of the signal.
The figure illustrates the sampling process. At every clock pulse the instantaneous value of
the analog waveform is read, yielding a series of sampled values. The sampling clock
frequency is referred to as the sampling rate. For a static image, the sampling rate is
measured in the spatial domain, i.e. along the length and width of the image area, and actually
denotes the pixel resolution, while for a time-varying medium like sound it denotes
how many times per second the analog wave is sampled and is measured in Hertz. Since the
input analog signal is continuous, its value changes over space or time. The A/D conversion
process takes a finite time to complete, hence the input analog signal must be held constant
during the conversion process to avoid conversion errors. This is done by a sample-and-
hold circuit.
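A minimal sketch of the idea for a time-dependent signal (the 1 kHz tone and the 8000 Hz sampling rate are assumptions for illustration, not from the source):

# Sketch: sample a "continuous" 1 kHz tone at 8000 samples per second.
import math

def analog_signal(t):
    """Stand-in for the continuous analog waveform: a 1 kHz sine tone."""
    return math.sin(2 * math.pi * 1000 * t)

sampling_rate = 8000                        # samples per second (Hz)
samples = [analog_signal(n / sampling_rate) for n in range(8)]
print([round(s, 3) for s in samples])       # the first 8 sampled values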
Quantization
This process consists of converting the sampled signal into a signal which can take only a
limited number of values. Quantization is also called amplitude-discretization. To illustrate
the process, consider an analog electrical signal whose value varies continuously
between 0 mV and +255 mV. Sampling the signal creates a set of discrete values, which
can still lie anywhere within the specified range, say a thousand different values. To quantize
the signal we need to fix the total number of values permissible. Suppose we decide that
256 permissible values adequately represent the total range of sampled values, from the
minimum to the maximum. This enables us to create a binary representation of each of the
considered values: we can assign a fixed number of bits to represent the 256 values. Since n
binary digits can represent 2^n numbers, a total of 8 bits is sufficient to represent the 256
values. This number of bits is referred to as the bit depth of the quantized signal.
(Incidentally, we could have permitted a thousand values, but that would have required a larger
number of bits and correspondingly more computing resources; more on that later.)
Code-word Generation
This process consists of associating a group of binary digits, called a code-word, with every
quantized value. In the above example, the 256 permissible values will be allocated code-words
from 00000000 for the minimum value to 11111111 for the maximum value. Each binary
value actually represents the amplitude of the original analog signal at a particular point or
instant; between two such points the amplitude values are lost. This explains how a
continuous signal is converted into a discrete signal.
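The two steps can be sketched together as follows (a simplified illustration of the 0-255 mV, 8-bit example above, not an actual ADC):

# Sketch: quantize sampled values in the 0-255 mV range to 8 bits and
# emit the corresponding binary code-words.
def quantize_and_encode(sample_mv, bit_depth=8, v_min=0.0, v_max=255.0):
    levels = 2 ** bit_depth                        # 256 permissible values
    step = (v_max - v_min) / (levels - 1)
    level = round((sample_mv - v_min) / step)      # nearest quantization level
    level = max(0, min(levels - 1, level))
    return format(level, f"0{bit_depth}b")         # code-word, e.g. '00000000'

for mv in (0.0, 1.4, 127.3, 255.0):
    print(mv, "->", quantize_and_encode(mv))
# 0.0 -> 00000000, 255.0 -> 11111111, intermediate values map to nearby levels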
The whole process of sampling followed by quantization and code-word generation
is called digitization. The result is a sequence of values coded in binary format. Physically,
an analog signal is digitized by passing it through an electronic chip called an Analog-to-
Digital Converter (ADC).
The digital form of representation is useful inside a computer for storage and manipulation.
Since humans only react to physical sensory stimuli, playback of the stored media requires a
conversion back to the analog form. Our eyes and ears can only sense the physical light and
sound energies, which are analog in nature, not the digital quantities stored inside a
computer. Hence the discrete set of binary values need to be converted back to the analog
form during playback. For example a digital audio file needs to be converted to the analog
form and played back using a speaker for it to be perceived by the human ear. A reverse
process to that explained above is followed for this conversion. Physically this is done by
passing the digital signal through another electronic chip called a Digital-to-Analog
Converter (DAC).
As we increase the sampling rate we get more information about the analog wave, so the
resulting digital wave is a more accurate representation of it. However, increasing the
sampling rate also means we have more data to store and thus require more space. In terms of
resources this implies more disk space and RAM, and hence greater cost. To take full advantage
of the extra samples, each sample value must also be represented precisely, which requires a
greater bit depth; if we use a lower bit depth than is required we will not be able to represent
all the sample values accurately, and the advantage of using a higher sampling rate will be
lost. On the other hand, if we use a lower sampling rate we obtain less information about the
analog wave, so the digital sound will not be an accurate representation of it. If we then use a
high bit depth, each sample can take a large number of amplitude levels; because of the larger
number of bits the size of the sound file will be quite large, but because of the low number of
samples the quality will still be degraded compared to the original analog wave.
Quantization Error
No matter what the choice of bit depth, digitization can never perfectly encode a continuous
analog signal. An analog waveform has an infinite number of amplitude values, but a
quantizer has a finite number of intervals. All the analog values between two intervals can
only be represented by the single number assigned to that interval. Thus the quantized value
is only an approximation of the actual value. For example, suppose the binary number 101000
corresponds to the analog value 1.4 V and 101001 corresponds to 1.5 V, and the analog
value at sample time is 1.45 V. Because a value halfway between 101000 and 101001 is not
available, the quantizer must round up to 101001 or down to 101000. Either way there will be an
error with a magnitude of one-half of an interval.
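A small numeric check of the rounding example above (the 0.1 V interval follows from the 1.4 V and 1.5 V levels):

# The analog value 1.45 V lies exactly between the two nearest
# quantization levels, so either choice leaves an error of half an interval.
analog = 1.45
lower, upper = 1.4, 1.5                # nearest available quantization levels
print(round(analog - lower, 3))        # 0.05 -> half of the 0.1 V interval
print(round(upper - analog, 3))        # 0.05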
Quantization error (e) is the difference between the actual analog value at sample time and
the quantized value, as shown below. Consider an analog waveform which is sampled
at a, b and c, with corresponding sample values A, B and C. Considering the portion
between A and B, the actual value of the signal at some point x after a is xX, but the value of
the digital output is fixed at xm; thus there is an error equal to the length mX. Similarly at
point y, the actual value of the analog signal is yY but the digital output is fixed at yn, so the
error increases to nY. This continues for all points between a and b until, just before b, for an
actual value of almost bB we still get a sampled value fixed at bp. The error is maximum at this
point and equals pB, which is almost equal to the height of one step. This maximum error is the
quantization error, denoted by e, and is equal to one step size of the digital output.
In characterizing digital hardware performance we can determine the ratio of the maximum
expressible signal amplitude to the maximum noise amplitude. This is the S/N
(signal-to-noise) ratio of the system. It can be shown that the S/N ratio expressed in decibels
varies as roughly 6 times the bit depth. Thus increasing the bit depth during sampling reduces
the relative level of quantization noise. To remove environmental noise we need to use good-
quality microphones and soundproof recording studios. Noise generated in electrical wires may
be reduced by proper shielding and earthing of the cables. After digitization, noise can also be
removed using sound-editing software, which employs noise filters to identify and selectively
remove noise from the digital audio file.
Each sample value needs to be held constant by a hold circuit until the next sample value is
obtained. Thus the maximum difference between the sample value and the actual value of the
analog wave is equal to the height of one step. If A is the peak-to-peak height of the wave
and n is the bit depth, then the number of steps is 2^n and the height of each step is A / 2^n,
which is equal to the maximum quantization error e. The signal-to-noise ratio in dB is
therefore about 20 * log10(A / e) = 20 * log10(2^n) ≈ 6n.
This implies that if the bit depth is increased by 1 during digitization, the signal-to-noise
ratio increases by about 6 dB.
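As a rough empirical check of the 6 dB-per-bit rule, the sketch below (assumed test-signal parameters, not from the source) quantizes a full-scale sine wave at several bit depths and measures the resulting signal-to-noise ratio, which comes out close to the textbook value of 6.02n + 1.76 dB.

# Sketch: measure the SNR of a quantized full-scale sine wave.
import numpy as np

def measured_snr_db(bit_depth, fs=48000, freq=997.0):
    t = np.arange(fs) / fs
    signal = np.sin(2 * np.pi * freq * t)
    scale = 2 ** bit_depth / 2 - 1               # e.g. 127 for 8 bits
    quantized = np.round(signal * scale) / scale # round to the nearest level
    noise = quantized - signal
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

for bits in (8, 12, 16):
    print(bits, "bits:", round(measured_snr_db(bits), 1), "dB")
# Roughly 6.02 * bits + 1.76 dB in each case.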
The key advantage of the digital representation lies in the universality of representation.
Since any medium, be it text or image or sound is coded in a unique form which ultimately
results in a sequence of bits, all kinds of information can be handled in the same way.
Storage : The same digital storage device, like memory chips, hard disks, floppies and CD-
ROMs, can be used for all media.
Processing : Powerful software programs can be used to analyze, modify, alter and
manipulate multimedia data in a variety of ways. This is probably where the potential is the
highest. The quality of the information may also be improved by removal of noises and
errors. This capability enables us to digitally restore old photographs or noisy audio
recordings.
The major drawback lies in the coding distortion. The process of first sampling and then
quantizing and coding the sampled values introduces distortions. Also since a continuous
signal is broken into a discrete form, a part of the signal is actually lost and cannot be
recovered. As a result the signal generated after digital to analog conversion and presented to
the end user has little chance of being completely identical to the original signal.
Another consequence is the large digital storage capacity required for storing
image, sound and video. Each minute of CD-quality stereo sound requires about 10 MB of data,
and each minute of full-screen digital video fills up over 1 GB of storage space. Fortunately,
compression algorithms have been developed to alleviate the problem to a certain extent.
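A quick worked check of the figure quoted for CD-quality stereo sound (44,100 samples per second, 16 bits per sample and two channels are the standard CD parameters):

# Uncompressed storage needed for one minute of CD-quality stereo sound.
sampling_rate = 44_100      # samples per second
bit_depth = 16              # bits per sample
channels = 2                # stereo

bytes_per_minute = sampling_rate * (bit_depth // 8) * channels * 60
print(bytes_per_minute / (1024 * 1024))   # ≈ 10.1 MB per minute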
Analog format
Digital format
Sampling
Sampling rate
Sampling Resolution
PCM - Pulse Code Modulation
The process of converting an analog signal into a digital signal is called Pulse Code
Modulation or PCM and involves sampling. Electronic circuits store each sampled value as an
electrical signal and hold it constant until the next sampled value is obtained.
What is PCM
Over-sampling
When sampling is done at a much higher rate than that prescribed by the Nyquist sampling
theorem (i.e. more than twice the highest frequency present in the signal), it is called
over-sampling. Although over-sampling can generate a high-quality digital signal, it can
unnecessarily increase the file size.
Practical Sampling Frequencies
To handle the full 20 Hz - 20 kHz range of human hearing, practical sampling systems use
frequencies of 44-48 kHz. However, depending on the audio content, sampling may also be done
at a lower rate; e.g., to reproduce human speech, sampling at about 11 kHz is sufficient.
Case Study
Here we take a look at some case studies of the digital output waves obtained using various
sampling rates and sampling resolutions. The actual values should be chosen keeping in mind
the compromise between cost and quality.
Low rate, low resolution
7. Possibility of upgradation
SYNTHESIZERS
Synthesizers are electronic instruments which allow us to generate digital samples of sounds
of various instruments synthetically i.e. without the actual instrument being present.
The core of a synthesizer is a special-purpose chip or IC which is capable of generating the
appropriate signals for producing sound. The sound may be a recording of actual sounds or a
simulation of actual sound through mathematical techniques.
The sound produced can be modified by additional hardware components to change its
loudness, pitch, etc.
Synthesizer Basics
Synthesizers can be broadly classified into two categories. FM Synthesizers generate sound
by combining elementary sinusoidal tones to build up a note having the desired waveform.
Earlier-generation synthesizers were generally of the FM type, and their sounds lacked the
depth of real-world sounds. Wavetable Synthesizers, created later on, produce sound by
retrieving high-quality digital recordings of actual instruments from memory and playing
them on demand. Modern synthesizers are generally of the wavetable type. The sounds
associated with a synthesizer are called patches, and the collection of all patches is called the
Patch Map. Each sound in a patch map must have a unique ID number to identify it during
playback. The audio channel of a synthesizer is divided into 16 logical channels, each of
which is capable of playing a separate instrument.
FM Synthesis
Characteristics of a Synthesizer
• Polyphony : A synthesizer is said to be polyphonic if it has the ability to play more than one
note at a time. Polyphony is generally measured or specified as a number of notes or
voices.
• Multitimbral : A synthesizer is said to be multitimbral if it is capable of producing
two or more different instrument sounds simultaneously.
Each physical channel of the synthesizer is divided into 16 logical channels. Omni mode
signifies that all 16 channels are capable of receiving data simultaneously. Polyphony means
several notes can play simultaneously in each logical channel.
MUSICAL INSTRUMENT DIGITAL INTERFACE (MIDI)
What is MIDI
The Musical Instrument Digital Interface (MIDI) is a protocol, or set of rules, for connecting
digital synthesizers to each other or to a computer. In much the same way that two
computers communicate via modems, two synthesizers communicate via MIDI. The
information exchanged between two MIDI devices is musical in nature. MIDI information
tells a synthesizer, in its most basic mode, when to start and stop playing a specific note, if
any. MIDI information can also be more hardware-specific: it can tell a synthesizer to
change sounds, master volume, modulation devices, and even how to receive information. In
more advanced uses, MIDI information can indicate the starting and ending points of a
song. More recent applications include using MIDI as the interface between computers and
synthesizers. The MIDI standard defines a protocol in which the keys, instead of producing
sound directly, produce data in the form of instructions which can be stored and edited on a
personal computer before being played back as sound.
MIDI Specification
The MIDI specification/protocol has three portions: the hardware standard, which defines the
rules for connecting a musical instrument to a computer; the message standard, which
defines the format for exchanging data between the instruments and the computer; and the
file format in which this data can be stored on a computer and played back.
Hardware
The MIDI hardware has three basic components: the keyboard, which is played as an instrument
and translates key presses into MIDI data; the Sequencer, which allows MIDI data to be
captured, edited and replayed; and the Sound Module, which translates the MIDI data into sound.
Messages / Protocol
Channel Messages
Channel messages are instructions for a specific channel and contain the data for the actual key
notes. The status byte contains the channel number and the function, and is followed by one
or two data bytes carrying additional parameters such as the note number and velocity.
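As an illustration of this structure, the sketch below (not from the source) builds a "Note On" channel message, whose status byte carries the function code 0x90 in the high nibble and the channel number in the low nibble, followed by the note-number and velocity data bytes:

# Sketch: construct a MIDI "Note On" channel message.
def note_on(channel: int, note: int, velocity: int) -> bytes:
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    status = 0x90 | channel        # high nibble = function, low nibble = channel
    return bytes([status, note, velocity])

print(note_on(0, 60, 100).hex())   # '903c64' -> middle C, channel 1, velocity 100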
System message
File Format
The MIDI specification makes provision for saving synthesizer audio in a separate file format
called MIDI files. MIDI files are totally different from normal digital audio files (like WAV
files) in that they do not contain audio data at all, but rather instructions on how to
play the sound. These instructions drive the synthesizer chips, which produce the actual sound.
Because of this, MIDI files are extremely compact compared to WAV files. They also
have the advantage that the music in a MIDI file can easily be changed by modifying the
instructions using appropriate software.
General MIDI (GM) Specification
SOUND CARD ARCHITECTURE
The sound card is an expansion board in a multimedia PC which interfaces with the CPU
via slots on the motherboard. Externally it is connected to speakers for the playback of sound.
Besides playback, the sound card is also responsible for digitizing, recording and
compressing sound files.
Basic Components
SIMM Banks : Local memory of the sound card for storing audio data during digitization
and playback of sound files.
DSP : The digital signal processor which is the main processor of the sound card and
coordinates the activities of all other components. It also compresses the data so that it takes
up less space.
DAC/ADC : The digital-to-analog and analog-to-digital converters for digitizing analog
sound and reconverting digital sound files to analog form for playback.
WaveTable/FM Synthesizers : For generating sound on instructions from MIDI messages.
The wavetable chip has a set of pre-recorded digital sounds while the FM chip generates the
sound by combining elementary tones.
CD Interface : Internal connection between the CD drive of the PC and the sound card.
16-bit ISA connector : Interface for exchanging audio data between the CPU and sound
card.
Amplifier : For amplification of the analog signals from the DAC before being sent to the
speakers for playback.
Line Out : Output port for connecting to external recording devices like a cassette player or
an external amplifier.
MIC : Input port for feeding audio data to the sound card through a microphone connected to
it.
Line In : Input port for feeding audio data from external CD/cassette players for recording or
playback.
Speaker Out : Output port for attaching speakers for playback of sound files.
MIDI : Input port for interfacing with an external synthesizer.
Source : www.pctechguide.com
WAV files
From the microphone or audio CD player a sound card receives a sound as an analog signal.
The signals go to an ADC chip which converts the analog signal to digital data. The ADC
sends the binary data to the DSP, which typically compresses the data so that it takes up less
space. The DSP then sends the data to the PC’s main processor which in turn sends the data
to the hard drive to be stored. To play a recorded sound the CPU fetches the file containing
the compressed data and sends the data to the DSP. The DSP decompresses the data and
sends it to the DAC chip which converts the data to a time varying electrical signal. The
analog signal is amplified and fed to the speakers for playback.
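As a small sketch of working with such files (the file name is hypothetical), Python's standard-library wave module can read back the sampling parameters stored in a WAV file's header:

# Sketch: inspect a WAV file's sampling parameters.
import wave

with wave.open("example.wav", "rb") as wav:        # hypothetical file
    print("channels:      ", wav.getnchannels())
    print("sample width:  ", wav.getsampwidth(), "bytes")
    print("sampling rate: ", wav.getframerate(), "Hz")
    print("frames:        ", wav.getnframes())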
MIDI files
The MIDI instruments connected to the sound card via the external MIDI port, or the MIDI
files on the hard disk retrieved by the CPU, instruct the DSP which sounds to play and how
to play them, using the standard MIDI instruction set. The DSP then either fetches the actual
sound from a wavetable synthesizer chip or instructs an FM synthesizer chip to generate the
sound by combining elementary sinusoidal tones. The digital sound is then sent to the DAC
to be converted to analog form and routed to the speakers for playback.
File Formats
Wave (Microsoft) File (.WAV) : This is the format for sampled sounds defined by
Microsoft for use with Windows. It is an expandable format which supports multiple data
formats and compression schemes.
Macintosh AIFF (.AIF/ .SND) : This format is used on the Apple Macintosh to save sound
data files. An .AIFF file is best when transferring files between the PC and the Mac using a
network.
RealMedia (.RM/.RA) : These are compressed formats designed for real-time audio and
video streaming over the Internet.
MIDI (.MID) : Text files containing instructions on how to generate music. The actual music
is generated from digital synthesizer chips.
Sun JAVA Audio (.AU) : The only audio format supported by JAVA applets on the Internet.
MPEG-1 Layer 3 (.MP3) : Highly compressed audio files providing almost CD-quality
sound.