Music Technology: Digitalization of Sound



Sound exists in the real world as vibrations. These vibrations are transmitted from the
source to the listener's ear through a physical medium: air, fluid, or solids.

In the world of music technology, however, sound exists as a string of numbers that
represents the sound wave in digital form.

For the purposes of recording, storing, manipulating, and playing back music, we use
computers, and hence we must study how computers work with audio signals to use
them to our advantage.

The computer stores sound as a sequence of numbers. These numbers capture, instant
by instant, the amplitude of the sound they represent. In practice, the digital
conversion of an analog sound can be thought of as a process of taking "snapshots" of
the amplitude at regular intervals: each "photo" is a number. The reverse process,
converting digital sound into an analog waveform, re-transforms this sequence of
numbers into an audible sound.

The digital recording of sound is called sampling, and it generates a series of numbers,
each of which is a sample: a measurement of the amplitude of the analog signal at a
given instant. For digitization to work correctly, it must be performed at regular
intervals; the number of amplitude measurements made per second is defined as the
sampling rate.
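The snapshot-taking process above can be sketched in a few lines of Python. The function name and parameters here are illustrative, not from any audio library; the "analog" source is an ideal sine wave evaluated at each sampling instant:

```python
import math

def sample_sine(freq_hz, sample_rate, duration_s):
    """Take amplitude 'snapshots' of a sine wave at regular intervals."""
    n_samples = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]

# One second of a 440 Hz tone at the CD sampling rate of 44,100 Hz:
samples = sample_sine(440, 44100, 1.0)
print(len(samples))   # 44100 — one number per sampling interval
```

Each entry in `samples` is one "photo" of the waveform's amplitude; the list as a whole is the digital recording.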

This is similar to what happens with film and video: 24 frames, each a fixed image,
must be projected each second for the movie to appear as a continuously flowing
image rather than a series of stills.

Likewise, on an audio CD there are 44,100 samples in each second of music. The higher
the sampling rate, the more accurate the representation of the stored sound. On
playback, this series of numbers is turned back into a wave, which ultimately produces
a physical wave by moving the molecules in the air around us.

There is a distinction between the "sampling frequency" and the "frequency of the
waveform". The sampling frequency is the number of samples per second, whereas the
frequency of the waveform is the number of cycles per second of that particular
waveform.
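A quick calculation makes the distinction concrete. For the CD sampling rate and the standard A4 tuning pitch (both values from the discussion above), dividing one by the other gives how many amplitude snapshots land inside each cycle of the waveform:

```python
sample_rate = 44100   # sampling frequency: samples per second
tone_freq = 440       # waveform frequency: cycles per second (the A4 pitch)

samples_per_cycle = sample_rate / tone_freq
print(round(samples_per_cycle, 1))   # 100.2 snapshots per waveform cycle
```

The two numbers measure different things: 44,100 describes how often we look, 440 describes how often the wave repeats.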

File Storage and File Formats

On disk, audio files are encoded in many different formats. These formats may be
either compressed or uncompressed.

Uncompressed audio file formats include WAV and AIFF.

Lossless compressed formats include FLAC, Lossless WMA, and Apple Lossless. These
formats reduce the file size without sacrificing quality.

Lossy formats, such as MP3, AAC, Ogg Vorbis, and standard WMA, use lossy techniques
to achieve a smaller file size at the expense of sound quality.

WAV (Wave) and AIFF (Audio Interchange File Format) files, along with most of the
other formats, provide a header with space for an ID, a sampling rate, a number
of channels, the number of bits per sample and the length of the audio data,
along with any other metadata needed for decoding and cataloging.
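Python's standard-library `wave` module exposes exactly the header fields listed above. As a sketch, this writes a tiny mono 16-bit WAV to an in-memory buffer and then reads the header back, the way any decoder would:

```python
import io
import math
import struct
import wave

# Write a short mono, 16-bit, 44.1 kHz WAV file to an in-memory buffer...
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)       # mono: a single sequence of samples
    w.setsampwidth(2)       # 2 bytes = 16 bits per sample
    w.setframerate(44100)   # CD sampling rate
    tone = (int(32767 * math.sin(2 * math.pi * 440 * t / 44100))
            for t in range(100))
    w.writeframes(b"".join(struct.pack("<h", s) for s in tone))

# ...then read the header fields back.
buf.seek(0)
with wave.open(buf, "rb") as w:
    channels = w.getnchannels()    # number of channels
    bits = w.getsampwidth() * 8    # bits per sample
    rate = w.getframerate()        # sampling rate
    frames = w.getnframes()        # length of the audio data, in frames

print(channels, bits, rate, frames)   # 1 16 44100 100
```

The same four header fields (channels, bit depth, sampling rate, data length) appear, under different byte layouts, in AIFF and most other audio containers.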

Any sound file can be converted from one standard format to another. But once
audio is converted from a lossless format to a lossy one, say from WAV to MP3,
converting the MP3 back to WAV will not recover the discarded information.

An audio file can be monaural (mono), stereo, or multichannel. A monaural audio
file contains a single sequence of numbers that encodes its digital waveform. A
stereo file, on the other hand, contains two digital sequences that are decoded
in parallel, one for the left channel and one for the right. A multichannel
file contains a variable number of sequences (typically between 4 and 8) that can
be sent to the same number of speakers.
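In practice, the two stereo sequences are usually stored interleaved, alternating left and right samples, and the decoder splits them back apart. A minimal sketch of that split (the function name and sample values are illustrative):

```python
def deinterleave(stereo):
    """Split an interleaved L, R, L, R, ... sequence into two channels."""
    return stereo[0::2], stereo[1::2]

interleaved = [10, -10, 20, -20, 30, -30]   # L, R, L, R, L, R
left, right = deinterleave(interleaved)
print(left)    # [10, 20, 30]
print(right)   # [-10, -20, -30]
```

A mono file would simply be the single sequence with no interleaving step.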

When a high-quality WAV file is compressed into a low-quality MP3 file, audio
information is inevitably lost. Most people cannot tell the difference between
WAV and MP3 at first listen, often because their speakers are not good enough to
reveal it, but once you have heard the same piece of music in its full quality,
you will surely hear the difference, and it is quite breathtaking! Welcome to the
world of lossless audio.

Bit depth is another property of a file, and it is based on the way the computer
stores the sampled numbers. Bit depth refers to the number of bits you have to
capture audio. The easiest way to envision this is as a series of levels into
which the audio energy can be sliced at any given moment in time. With 16-bit
audio, there are 65,536 possible levels. With every additional bit of resolution,
the number of levels doubles. By the time we get to 24 bits, we have 16,777,216
levels. Remember, we are talking about a slice of audio frozen in a single moment
of time.

In short, an audio signal is measured many thousands of times a second to generate a
string of binary numbers (called words). The longer each word is (the more bits it
has), the greater the accuracy of each measurement. Short words give poor
resolution of the signal voltage (high distortion) and long words give good
resolution (low distortion). Bit depth and resolution are other terms for word
length. Hence, a 24-bit WAV file has better resolution than a 16-bit WAV file of
the same sample.
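The level counts quoted above follow directly from the word length: each extra bit doubles the number of slices, so an n-bit word distinguishes 2^n levels.

```python
def quantization_levels(bit_depth):
    """Number of discrete amplitude levels an n-bit word can represent."""
    return 2 ** bit_depth

print(quantization_levels(16))   # 65536
print(quantization_levels(24))   # 16777216
print(quantization_levels(17) // quantization_levels(16))   # 2 — one extra bit doubles the levels
```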
Signal Characteristics of Audio Devices

Frequency Response

All electronic devices concerned with sound respond differently to different
frequencies of sound. Some might amplify the low notes (bass) and others the highs
(treble). We can graph how a device responds to different frequencies by plotting
its output level versus frequency. This graph is called a frequency response graph.
The level in the graph is measured in dB and the frequency in Hz. Generally, 1 dB
is the smallest change in loudness that we can hear.

If the level is the same at all frequencies, the graph forms a horizontal line and
is called a "flat frequency response": all frequencies are reproduced at an equal
level. In other words, the device passes all the frequencies through without
changing their relative levels.

Many audio devices do not have a flat frequency response across the entire audio
spectrum from 20 Hz to 20 kHz. They have a limited range of frequencies that can be
reproduced at an equal level (within a tolerance of ±3 dB).

Usually, the more extended or wide the frequency range is, the more natural and real
the recording sounds. A wide, flat response gives accurate reproduction. A frequency
response of 200 to 8,000 Hz is narrow (poor fidelity); 80 to 12,000 Hz is wider
(better fidelity); and 20 to 20,000 Hz is the widest (best fidelity).

Noise

Noise is another characteristic of audio signals. Every audio component produces a
little noise, a rushing sound like wind in trees. Noise in a recording is undesirable
unless it's part of the music.

Distortion

If you turn up the signal level too high, the signal distorts and you hear a gritty,
grainy sound or clicks. This type of distortion is called "clipping", because the
peaks of the signal are clipped off and flattened. Digital recorders also produce
quantisation distortion at very low levels.
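Clipping is easy to see in code. In a 16-bit system the sample values cannot exceed the range −32768 to 32767, so any amplified peak beyond that range gets flattened to the limit (the function and sample values here are illustrative):

```python
def amplify_with_clipping(samples, gain, lo=-32768, hi=32767):
    """Apply gain; peaks beyond the 16-bit range are clipped (flattened)."""
    return [max(lo, min(hi, int(s * gain))) for s in samples]

peaks = [20000, 30000, -25000]
print(amplify_with_clipping(peaks, 2.0))   # [32767, 32767, -32768]
```

All three doubled peaks land outside the representable range, so they come out flattened at the limits: the squared-off waveform is what produces the gritty, grainy sound.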

Optimum Signal Level

We need the signal level high enough to cover up the noise, but low enough to avoid
distortion. Every audio component works best at a certain optimum signal level.

Consider the range of signal levels in an audio device: at the bottom is the noise
floor of the device, i.e. the level of noise the device produces with no signal. At
the top is the distortion level, the point at which the signal distorts and sounds
grungy. The idea is to maintain the signal around an average level below the maximum.
Signal-to-Noise Ratio (SNR)

The level difference in decibels between the signal level and the noise floor is
called the signal-to-noise ratio, or S/N. The higher the S/N, the cleaner the sound.
An S/N of 60 dB is fair, 70 dB is good, and 80 dB or greater is excellent.

For instance, imagine a person yelling a message over the sound of a train. The
yelled message is the signal; the train is the noise. The louder the message and the
quieter the train, the greater the S/N.
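Since the ratio is expressed in decibels, it can be computed from the two amplitudes with the standard 20·log₁₀ formula for level differences (the amplitude values below are made up for illustration):

```python
import math

def snr_db(signal_amplitude, noise_amplitude):
    """Signal-to-noise ratio in decibels, from the two amplitude levels."""
    return 20 * math.log10(signal_amplitude / noise_amplitude)

print(round(snr_db(1.0, 0.001)))    # 60 — "fair" on the scale above
print(round(snr_db(1.0, 0.0001)))   # 80 — "excellent"
```

Each factor of 10 between signal and noise adds 20 dB, which is why quieting the train (the noise) helps just as much as yelling louder.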

Headroom

The level difference in decibels between the normal signal level and the distortion
level is called "headroom". The greater the headroom, the higher the signal level the
device can pass without running into distortion. If an audio device has a lot of
headroom, it can pass high-level peaks without clipping them.
