Audio Synthesis
Analogue Synthesis
Most analogue synthesis employs a method known as “subtractive synthesis.” Here the desired sound is achieved by filtering out the undesired parts of the sound from a broad initial sound source. An analogue synthesiser can be broken down into three main types of components:
• sound sources
• sound modifiers
• controllers
Frequency Modulation
In the previous example you controlled the frequency of the oscillators with the joystick, or, to put it another way, the joystick was modulating the frequency of the oscillators. Another way to control an oscillator’s frequency is to use another oscillator as a control source. This type of configuration is commonly referred to as frequency modulation or FM (the same principle is used in FM radio, where the carrier wave is modulated by the signal wave). If the modulating frequency is very high, the effect will be complex, somewhat like the Ring Modulator. However, at very low frequencies (such as 5 Hz and lower) the effect will sound like a kind of vibrato. Some synthesisers have oscillators specifically designed to create very low (sub-audible) frequencies in order to create low-frequency modulation. These oscillators are sometimes called Low Frequency Oscillators or LFOs. Low-frequency modulation is often referred to as pitch modulation because of the vibrato-like effect. In Matrixsynth the three oscillators can each function as an LFO. The LFO mode is selected using the sub-menu under the level dial in the oscillator modules. When an oscillator is set to LFO mode, the range of frequencies that can be produced falls mainly in the sub-audible range.
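Here is a minimal Python sketch of what an LFO does (our own illustration; the names and parameter values are invented, not Matrixsynth’s): a 5 Hz oscillator modulates the frequency of an audio-rate oscillator, producing vibrato.

    # Low-frequency (pitch) modulation: a 5 Hz LFO modulates the
    # frequency of an audio-rate oscillator, producing vibrato.
    # All names and values here are illustrative.
    import math

    SR = 44100   # sample rate in Hz

    def vibrato(carrier_hz=440.0, lfo_hz=5.0, depth_hz=10.0, seconds=1.0):
        samples = []
        phase = 0.0
        for n in range(int(SR * seconds)):
            t = n / SR
            # instantaneous frequency swings +/- depth_hz around carrier_hz
            freq = carrier_hz + depth_hz * math.sin(2 * math.pi * lfo_hz * t)
            phase += 2 * math.pi * freq / SR   # integrate frequency into phase
            samples.append(math.sin(phase))
        return samples

Raising lfo_hz into the audible range turns this same patch into the complex FM effect described above.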
Pulsewidth Modulation
When you alter the waveform in an oscillator using the shape dial, you are varying the symmetrical proportions of the waveform’s shape. For example, a symmetrical triangle can be altered to produce a sawtooth wave.
Changing the pulsewidth of a waveform results in a change in the balance of harmonic frequencies present in the sound. This is easy to hear when you vary the shape of a triangle wave. When the wave is symmetrical (a triangle wave) the sound is relatively simple, with few of the upper harmonic frequencies present. However, when the pulsewidth is altered such that the waveform is closer to a sawtooth wave, the sound will be much brighter. This is caused by the presence of upper harmonic frequencies.
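To make the idea concrete, here is a small Python sketch (our own, with invented names) of a triangle wave with variable symmetry, like the shape dial: at shape = 0.5 the rise and fall are equal (a symmetrical triangle), and as shape approaches 1.0 the waveform leans into a sawtooth and the sound brightens.

    def skewed_triangle(phase, shape=0.5):
        # One cycle of a triangle wave; phase is in [0, 1).
        # shape = 0.5 gives a symmetrical triangle; shape near 1.0
        # approaches a sawtooth (brighter, more upper harmonics).
        if phase < shape:
            return -1.0 + 2.0 * phase / shape                 # rising segment
        return 1.0 - 2.0 * (phase - shape) / (1.0 - shape)    # falling segment

    # e.g., one period sampled 100 times, most of the way to a sawtooth:
    wave = [skewed_triangle(n / 100.0, shape=0.9) for n in range(100)]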
Figure 4.2 This organ has a great many pipes, and together they function exactly like
an additive synthesis algorithm.
Each pipe essentially produces a sine wave (or something like it), and by selecting
different combinations of harmonically related pipes (as partials), we can create
different combinations of sounds, called (on the organ) stops. This is how organs get
all those different sounds: organists are experts on Fourier series and additive
synthesis (though they may not know that!).
The technique of mixing simple sounds together to get more complex sounds
dates back a very long time. In the Middle Ages, huge pipe organs had a great
many stops that could be "pulled out" to combine and recombine the sounds
from several pipes. In this way, different "patches" could be created for the
organ. More recently, the Telharmonium, a giant electrical synthesizer from the early 1900s, added together the sounds from dozens of electro-mechanical tone generators to form complex tones. This wasn’t very practical, but it has an important place in the history of electronic and computer music.
Applet 4.2: Mixed sounds
While instruments like the pipe organ were quite effective for some sounds,
they were limited by the need for a separate pipe or oscillator for each tone
that is being added. Since complex sounds can require anywhere from a couple
dozen to several thousand component tones, each needing its own pipe or
oscillator, the physical size and complexity of a device capable of producing
these sounds would quickly become prohibitive. Enter the computer!
Soundfile 4.1: Excerpt from Kenneth Gaburo’s composition "Lemon Drops"
This piece and another extraordinary Gaburo work, "For Harry," were made at the University of Illinois at Urbana-Champaign on an early electronic music instrument called the harmonic tone generator, which allowed the composer to set the frequencies and amplitudes of a number of sine wave oscillators to make their own timbres. It was extremely cumbersome to use, but it was essentially a giant Fourier synthesizer, and, theoretically, any periodic waveform was possible on it!
It’s a tribute to Gaburo’s genius and that of other early electronic music pioneers that they
were able to produce such interesting music on such primitive instruments. Kind of makes
it seem like we’re almost cheating, with all our fancy software!
If there is one thing computers are good at, it’s adding things together. By using digital oscillators instead of actual physical devices, a computer can add up any number of simple sounds to create extremely complex waveforms. Only the speed and power of the computer limit the number and complexity of the waveforms. Modern systems can easily generate and mix thousands of sine waves in real time. This makes additive synthesis a powerful and versatile performance and synthesis tool. Additive synthesis is not used so much anymore (there are a great many other, more efficient techniques for getting complex sounds), but it’s definitely a good thing to know about.
A Simple Additive Synthesis Sound
Let’s design a simple sound with additive synthesis. A nice example is the generation of a square wave.
You can probably imagine what a square wave would look like. We start with just one sine wave, called the fundamental. Then we start adding odd partials to the fundamental, the amplitudes of which are inversely proportional to their partial number. That means that the third partial is 1/3 as strong as the first, the fifth partial is 1/5 as strong, and so on. (Remember that the fundamental is the first partial; we could also call it the first harmonic.) Figure 4.3 shows what we get after adding seven harmonics. Looks pretty square, doesn’t it?
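Here is a sketch of that construction in Python (an illustration of the recipe above, not code from the book): sum the odd partials at amplitudes 1, 1/3, 1/5, and so on.

    import math

    SR = 44100   # sample rate in Hz

    def additive_square(freq=220.0, n_partials=7, seconds=0.5):
        samples = []
        for n in range(int(SR * seconds)):
            t = n / SR
            s = 0.0
            for k in range(1, 2 * n_partials, 2):    # k = 1, 3, 5, ...
                s += math.sin(2 * math.pi * k * freq * t) / k
            samples.append(s)
        return samples

With n_partials = 7 you get roughly the seven-harmonic approximation of Figure 4.3; add more partials and the corners get squarer.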
Now, we should admit that there’s an easier way to synthesize square waves: just flip from a high sample value to a low sample value every n samples. The lower the value of n, the higher the frequency of the square wave that’s being generated. Although this technique is clearer and easier to understand, it has its problems too; directly generating waveforms in this way can cause unwanted frequency aliasing.
Figure 4.4 The Synclavier was an early digital electronic music instrument that used a large oscillator
bank for additive synthesis. You can see this on the front panel of the instrument—many of the LEDs
indicate specific partials! On the Synclavier (as was the case with a number of other analog and digital
instruments), the user can tune the partials, make them louder, even put envelopes on each one.
Applet 4.3: Additive synthesis
This applet lets you add sine waves together at various amplitudes, to see how additive synthesis works.
Applet 4.4: Spectral envelopes
This applet lets you add spectral envelopes to a number of partials. This means that you can impose a different amplitude trajectory for each partial, independently making each louder and softer over time. This is really more like the way things work in the real world: partial amplitudes evolve over time—sometimes independently, sometimes in conjunction with other partials (in a phenomenon called common fate). This is called spectral evolution, and it’s what makes sounds live.
OK, now how about a more interesting example of additive synthesis? The quality of a synthesized sound can often be improved by varying its parameters (partial frequencies, amplitudes, and envelope) over time. In fact, time-variant parameters are essential for any kind of "lifelike" sound, since all naturally occurring sounds vary to some extent.
Xtra bit 4.4: Spectral formula of a waveform
Sine wave speech is an experimental technique that tries to simulate speech with just
a few sine waves, in a kind of primitive additive synthesis. The idea is to pick the sine
waves (frequencies and amplitudes) carefully. It’s an interesting notion, because sine
waves are pretty easy to generate, so if we can get close to "natural" speech with just
a few of them, it follows that we don’t require that much information when we listen
to speech.
Sine wave speech has long been a popular idea for experimentation by psychologists
and researchers. It teaches us a lot about speech—what’s important in it, both
perceptually and acoustically.
These files are used with the permission of Philip Rubin, Robert Remez,
and Haskins Laboratories.
As we’ve said, additive synthesis is an important tool, and we can do a lot with
it. It does, however, have its drawbacks. One serious problem is that while it’s
good for periodic sounds, it doesn’t do as well with noisy or chaotic ones.
And there’s a worse problem that we’d love to sweep under the old
psychoacoustical rug, too, but we can’t: it’s great that we know so much about
steady-state, periodic, Fourier-analyzable sounds, but from a cognitive and
perceptual point of view, we really couldn’t care less about them! The ear and
brain are much more interested in things like attacks, decays, and changes over
time in a sound (modulation). That’s bad news for all that additive synthesis
software, which doesn’t handle such things very well.
That’s not to say that if we play a triangle wave and a sawtooth wave, we couldn’t tell them apart; we certainly could. But that really doesn’t do us much good in most circumstances. If angry lions roared in square waves, and cute cuddly puppy dogs barked in triangle waves, maybe this would be useful, but we have evolved (or learned) to hear attacks, decays, and other transients as being more crucial. What we need to be able to synthesize are transients, spectral evolutions, and modulations. Additive synthesis is not really the best technique for those.
Shepard Tones
These tones slide gradually from the bottom of the frequency range to the top.
The amplitudes of the component frequencies follow a bell-shaped spectral
envelope (see Figure 4.6) with a maximum near the middle of the standard
musical range. In other words, they fade in and out as they get into the most
common frequency range. This creates an interesting illusion: a circular
Shepard tone scale can be created that varies only in tone chroma and
collapses the second dimension of tone height by combining all octaves. In
other words, what you hear is a continuous pitch change through one octave,
but not bigger than one octave (that’s a result of the special spectra and the
amplitude curve). It’s kind of like a barber pole: the pitches sound as if they
just go around for a while, and then they’re back to where they started (even
though, actually, they’re continuing to rise!).
Figure 4.10 Bell-shaped spectral envelope for making Shepard tones.
Soundfile 4.6: Shepard tone
Figure 4.7 Try clicking on Soundfile 4.6. After you listen to the soundfile once, click again to listen to the frequencies continue on their upward spiral. Used with permission from Susan R. Perry, M.A., Dept. of Psychology, University of Tennessee.
The Shepard tone contains a large number of octave-related harmonics across the frequency spectrum, all of which rise (or fall) together. The harmonics toward the low and high ends of the spectrum are attenuated gradually, while those in the middle have maximum amplification. This creates a spiraling or barber pole effect. (Information from Doepfer Musikelektronik GmbH.)
Soundfile 4.7: Shepard tone
Soundfile 4.7 is an example of the spiraling Shepard tone effect.
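Here is a compact Python sketch of the construction just described (our own illustration; the spectrum span and sweep time are arbitrary choices): octave-spaced sine components all rise together, each faded in and out by a bell-shaped (raised-cosine) spectral envelope, and each wraps back to the bottom when it reaches the top.

    import math

    SR = 44100
    F_LOW, N_OCTAVES = 27.5, 9     # spectrum spans 27.5 Hz to about 14 kHz

    def shepard(seconds=10.0):
        out = []
        phases = [0.0] * N_OCTAVES
        for n in range(int(SR * seconds)):
            sweep = (n / SR / seconds) % 1.0          # rises one octave per pass
            s = 0.0
            for i in range(N_OCTAVES):
                pos = (i + sweep) % N_OCTAVES         # position in log frequency
                freq = F_LOW * 2.0 ** pos
                # bell-shaped envelope: loudest mid-spectrum, silent at the edges
                amp = 0.5 - 0.5 * math.cos(2 * math.pi * pos / N_OCTAVES)
                phases[i] += 2 * math.pi * freq / SR
                s += amp * math.sin(phases[i])
            out.append(s / N_OCTAVES)
        return out

Because the envelope is zero at both edges of the spectrum, each component wraps around inaudibly, and the ensemble seems to rise forever.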
Soundfile 4.8: "For Ann (rising)," by James Tenney
James Tenney is an important computer music composer and pioneer who worked at Bell Laboratories with Roger Shepard in the early 1960s. This piece was composed in 1969. The composition is based on a set of continuously rising tones, similar to the effect created by Shepard tones. The compositional process is simple: each glissando, separated by some fixed time interval, fades in from its lowest note and fades out as it nears the top of its audible range. It is nearly impossible to follow, aurally, the path of any given glissando, so the effect is that the individual tones never reach their highest pitch.
Section 4.3: Filters
The most common way to think about filters is as functions that take in a
signal and give back some sort of transformed signal. Usually, what comes
out is "less" than what goes in. That’s why the use of filters is sometimes
referred to as subtractive synthesis.
Soundfile 4.9: Telephone simulations
Older telephones had around an 8 kHz low-pass filter imposed on their audio signal, mostly for noise reduction and to keep the equipment a bit cheaper.
White noise (every frequency below the Nyquist frequency at equal level) is filtered so we hear only frequencies above 5 kHz.
Soundfile 4.10: High-pass filtered noise
Soundfile 4.11: Low-pass filtered noise
Four Basic Types of Filters
Figure 4.8 Four common filter types (clockwise from upper left): low-pass, high-
pass, band-reject, band-pass.
Figure 4.8 illustrates four basic types of filters: low-pass, high-pass, band-
pass, and band-reject. Low-pass and high-pass filters should already be
familiar to you—they are exactly like the "tone" knobs on a car stereo or
boombox. A low-pass (also known as high-stop) filter stops, or attenuates,
high frequencies while letting through low ones, while a high-pass (low-stop)
filter does just the opposite.
Applet 4.5: Using filters
This applet is a good example of how filters, combined with something like noise, can produce some common and useful musical effects with very few operations.
Applet 4.6: Comb filters
Comb filters are a very specific type of digital process in which a short delay (where some number of samples are actually delayed in time) and a simple feedback algorithm (where outputs are sent back to be reprocessed and recombined) are used to create a rather extraordinary effect. Sounds can be "tuned" to specific harmonics (based on the length of the delay and the sample rate).
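A minimal sketch of such a comb filter in Python (an illustration, not the applet’s code): each output sample is the input plus a scaled copy of the output from delay_samples samples ago.

    def comb_filter(signal, delay_samples=100, feedback=0.9):
        # y(n) = x(n) + feedback * y(n - delay_samples)
        buf = [0.0] * delay_samples     # circular delay line of past outputs
        out = []
        for i, x in enumerate(signal):
            y = x + feedback * buf[i % delay_samples]   # recombine with feedback
            buf[i % delay_samples] = y                  # write back into the delay
            out.append(y)
        return out

At a 44,100 Hz sample rate, a 100-sample delay tunes the filter’s resonant peaks to multiples of 441 Hz.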
Low-pass and high-pass filters have a value associated with them called
the cutoff frequency, which is the frequency where they begin "doing their
thing." So far we have been talking about ideal, or perfect, filters, which cut
off instantly at their cutoff frequency. However, real filters are not perfect,
and they can’t just stop all frequencies at a certain point. Instead, frequencies
die out according to a sort of curve around the corner of their cutoff
frequency. Thus, the filters in Figure 4.8 don’t have right angles at the cutoff
frequencies—instead they show general, more or less realistic response curves
for low-pass and high-pass filters.
Cutoff Frequency
The cutoff frequency of a filter is defined as the point at which the signal is attenuated to 0.707 of its maximum value (which is 1.0). No, the number 0.707 was not just picked out of a hat! It turns out that the power of a signal is determined by squaring the amplitude: 0.707² ≈ 0.5. So when the amplitude of a signal is at 0.707 of its maximum value, it is at half-power. The cutoff frequency of a filter is sometimes called its half-power point.
Transition Band
The area between where a filter "turns the corner" and where it "hits the
bottom" is called the transition band.The steepness of the slope in the
transition band is important in defining the sound of a particular filter. If the
slope is very steep, the filter is said to be "sharp"; conversely, if the slope is
more gradual, the filter is "soft" or "gentle."
Things really get interesting when you start combining low-pass and high-pass filters to form band-pass and band-reject filters. Band-pass and band-reject filters also have transition bands and slopes, but they have two of them: one on each side. The area in the middle, where frequencies are either passed or stopped, is called the passband or the stopband. The frequency in the middle of the band is called the center frequency, and the width of the band is called the filter’s bandwidth.
You can plainly see that filters can get pretty complicated, even these simple
ones. By varying all these parameters (cutoff frequencies, slopes, bandwidths,
etc.), we can create an enormous variety of subtractive synthetic timbres.
Filters are often talked about as being one of two types: finite impulse
response (FIR) and infinite impulse response (IIR). This sounds complicated
(and can be!), so we’ll just try to give a simple explanation as to the general
idea of these kinds of filters.
Finite impulse response filters are those in which delays are used along with
some sort of averaging. Delays mean that the sound that comes out at a given
time uses some of the previous samples. They’ve been delayed before they get
used.
We’ve talked about these filters in earlier chapters. What comes out of an FIR is never more than what goes in (in terms of amplitude). Sounds reasonable, right? FIRs tend to be simpler, easier to use, and easier to design than IIRs, and they are very handy for a lot of simple situations. An averaging low-pass filter, in which some number of samples are averaged and output, is a good example of an FIR.
Well, IIRs are similar. Because the feedback path of these filters consists of some number of delays and averages, they are not always what are called unity gain transforms. They can actually output a higher signal than the one that is fed to them. But at the same time, they can be many times more complex and subtler than FIRs. Again, think of electric guitar feedback—IIRs are harder to control but are also very interesting.
Figure 4.9 FIR and IIR filters.
Filters are usually designed in the time domain, by delaying a signal and then
averaging (in a wide variety of ways) the delayed signal and the nondelayed one.
These are called finite impulse response (FIR) filters, because what comes out uses a
finite number of samples, and a sample only has a finite effect.
If we delay, average, and then feed the output of that process back into the signal, we
create what are called infinite impulse response (IIR) filters. The feedback process
actually allows the output to be much greater than the input. These filters can, as we
like to say, "blow up."
These diagrams show, in standard signal-processing notation, typical FIR and IIR filters. Note how in the IIR diagram the output of the filter’s delay is summed back into the input, causing the infinite response characteristic. That’s the main difference between the two filters.
Designing filters is a difficult but key activity in the field of digital signal processing, a rich area of study that is well beyond the range of this book. It is interesting to point out that, surprisingly, even though filters change the frequency content of a signal, a lot of the mathematical work done in filter design is done in the time domain, not in the frequency domain. By using things like sample averaging, delays, and feedback, one can create an extraordinarily rich variety of digital filters.
For example, the following is a simple equation for a low-pass filter. This equation just averages the last two samples of a signal (where x(n) is the current sample) to produce a new sample:

y(n) = (x(n) + x(n − 1)) / 2

This equation is said to have a one-sample delay. You can see easily that quickly changing (that is, high-frequency) time domain values will be "smoothed" (removed) by this equation.
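In Python, that equation is only a few lines (a direct rendering of the formula above):

    def averaging_lowpass(signal):
        # y(n) = (x(n) + x(n - 1)) / 2, a one-sample-delay FIR filter
        out = []
        prev = 0.0        # x(n - 1); assume silence before the signal starts
        for x in signal:
            out.append((x + prev) / 2.0)
            prev = x
        return out

    # A sample sequence alternating as fast as possible (the Nyquist
    # frequency) is smoothed away almost entirely:
    print(averaging_lowpass([1, -1, 1, -1]))    # -> [0.5, 0.0, 0.0, 0.0]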
In fact, although it may look simple, this kind of filter design can be quite
difficult (although extremely important). How do you know which
frequencies you’re removing? It’s not intuitive, unless you’re well schooled in
digital signal processing and filter theory, have some background in
mathematics, and know how to move from the time domain (what you have)
to the frequency domain (what you want) by averaging, delaying, and so on.
These peaks stay in the same frequency range, independent of the actual
(fundamental) pitch being produced by the voice or instrument. While there
are many other factors that go into synthesizing a realistic timbre, the use of
formants is one way to get reasonably accurate results.
Figure 4.10 A trumpet plays two different notes, a perfect fourth apart, but the
formants (fixed resonances) stay in the same places.
Resonant Structure
Xtra bit 4.5: Change of resonance
Generating really good and convincing synthetic speech and singing voices is more complex than simply moving around a set of formants—we haven’t mentioned anything about generating consonants, for example. And no speech synthesis system relies purely on formant synthesis. But, as these examples illustrate, even very basic formant manipulation can generate sounds that are undoubtedly "vocal" in nature.
Figure 4.11 A spectral picture of the voice, showing formants. Graphic courtesy of the alt.usage.english newsgroup.
Xtra bit 4.6: Formant manipulations
Soundfile 4.12: "Notjustmoreidlechatter," by Paul Lansky
Paul Lansky is a well-known composer and researcher of computer music who teaches at Princeton University. He has been a leading pioneer in software design, voice synthesis, and compositional techniques.
Soundfile 4.13: "idlechatterjunior," by Paul Lansky, from 1999
"Over ten years ago I wrote three 'chatter' pieces, and then decided to quit while I was ahead. The urge to strike again recently overtook me, however, and after my lawyer assured me that the statute of limitations had run out on this particular offense, I once again leapt into the fray. My hope is that the seasoning provided by my labors in the intervening years results in something new and different. If not, then look out for 'Idle Chatter III'... ."
Soundfile 4.15: Synthetic speech example, "Fred" voice from the Macintosh computer
Over the years, computer voice simulations have become better and better. They still sound a bit robotic, but advances in voice synthesis and acoustic technology make voices more and more realistic. Bell Telephone Laboratories has been one of the leading research facilities for this work, which is expected to become extremely important in the near future.
Soundfile 4.16: Carter Scholz’s 1-minute piece "Mannagram"
In this piece, based on a reading by Australian sound-poet Chris Mann, the composer tries to separate vowels and consonants, moving them each to a different speaker. This was inspired by an idea of Mann's, who always wanted to do a "headphone piece" in which he spoke and the consonants appeared in one ear, the vowels in another.
Soundfile 4.17: The trump
Introduction to Modulation
Modulated signals are those that are changed regularly in time, usually by
other signals. They can get pretty complicated. For example, modulated signals
can modulate other signals! To create a modulated signal, we begin with two
or more oscillators (or anything that produces a signal) and combine the output
signals of the oscillators in such a way as to modulate the amplitude,
frequency, and/or phase of one of the oscillators.
Applet 4.8: LFO modulation
Applet 4.9 shows what happens, in the case of frequency modulation, if the modulating signal is low frequency. In that case, we’ll hear something like vibrato (a regular change in frequency, or perceived pitch). We can also modulate amplitude in this way (tremolo), or even formant frequencies if we want. Low-frequency modulations (that is, modulators that themselves are low-frequency signals) can produce interesting sonic effects.
But for making really complex sounds, we are generally interested in high-
frequency modulation. We take two audio frequency signals and multiply them
together. More precisely, we start with a carrier oscillator and attach
a modulating oscillator to modify and distort the signal that the carrier
oscillator puts out. The output of the carrier oscillator can include its original
signal and the sidebands or added spectra that are generated by the modulation
process.
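As a sketch of that multiplication in Python (our own illustration): ring modulation multiplies carrier and modulator directly, while classic amplitude modulation keeps some of the carrier by offsetting the modulator. Either way, sidebands appear at the sum and difference frequencies (fc + fm and fc − fm).

    import math

    SR = 44100

    def modulate(fc=440.0, fm=110.0, seconds=1.0, ring=False):
        out = []
        for n in range(int(SR * seconds)):
            t = n / SR
            carrier = math.sin(2 * math.pi * fc * t)
            mod = math.sin(2 * math.pi * fm * t)
            if ring:
                out.append(carrier * mod)              # ring modulation
            else:
                out.append(carrier * (1 + mod) / 2)    # amplitude modulation
        return out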
Amplitude Modulation
Soundfile 4.18: Low-pass moving filter (modulated by sine)
A low-pass moving filter that uses a sine wave to control a sweep between 0 Hz and 500 Hz.
Soundfile 4.19: High-pass moving filter (modulated by sine)
A high-pass moving filter that uses a sine wave to control a sweep between 5,000 Hz and 15,000 Hz.
Soundfile 4.20: Low-pass moving filter (modulated by sawtooth)
A low-pass moving filter that uses a sawtooth wave to control a sweep between 0 Hz and 500 Hz.
Soundfile 4.21: High-pass moving filter (modulated by sawtooth)
A high-pass moving filter that uses a sawtooth wave to control a sweep between 5,000 Hz and 15,000 Hz.
Figure 4.14 James Tenney’s "Phases," one of the earliest and still most interesting
pieces of computer-assisted composition. The pictures above are his "notes" for the
piece, which constitute a kind of score.
y = f(x)

This is simple, right? In fact, it’s much simpler than any other function we’ve seen so far. That’s because waveshaping, in its most general form, is just any old function. But there’s a lot more to it than that. In order to change the shape of the waveform (and not just make it bigger or smaller), the function must be nonlinear, which means it has exponents greater than 1, or is transcendental (like sines, cosines, exponentials, logarithms, etc.). You can use almost any function you want as a waveshaper. But the most useful ones output zero when the input is zero (that’s because you usually don’t want any output when there is no input):

0 = f(0)

A simple example is the cubing function:
y = f(x) = x · x · x = x³

What would it look like to pass a simple sine wave that varied from −1.0 to 1.0 through this waveshaper? If our input x is sin(ωt), then:

y = x³ = sin³(ωt)

If we plot both functions (sin(ωt) and the output signal), we can see that the original input signal is very round, but the output signal has a narrower peak. This will give the output a richer sound.
This example gives some idea of the power of this technique. A simple
function (sine wave) gets immediately transformed, using simple math
and even simpler computation, into something new.
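Here is the whole technique in a few lines of Python (our illustration of the cubing waveshaper above):

    import math

    SR = 44100

    def cubed_sine(freq=220.0, seconds=0.5):
        # pass each sample of a sine wave through f(x) = x^3
        return [math.sin(2 * math.pi * freq * n / SR) ** 3
                for n in range(int(SR * seconds))]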
Another useful waveshaping function is:

y = f(x) = x / (1 + |x|)

When x is zero, y is zero. Plug in a few numbers for x, like 0.5, 7.0, 1,000.0, −7.0, and see what you get. As x gets larger (approaches positive infinity), y approaches +1.0 but never reaches it. As x approaches negative infinity, y approaches −1.0 but never reaches it. This kind of curve is sometimes called soft clipping because it does not have any hard edges. It can give a nice "tubelike" distortion sound to a guitar. So this function has some nice properties, but unfortunately it requires a divide, which takes a lot more CPU power than a multiply. On older or smaller computers, this can eat up a lot of CPU time (though it’s not much of a problem nowadays).
Applet 4.9: Changing the shape of a waveform
Chebyshev Polynomials

Chebyshev polynomials make especially useful transfer functions for waveshaping because of one handy property: driving the nth-order polynomial with a sine wave produces a sine wave at n times the input frequency (the nth harmonic). The first few are:

T₀(x) = 1
T₁(x) = x
T₂(x) = 2x² − 1
T₃(x) = 4x³ − 3x
T₄(x) = 8x⁴ − 8x² + 1
Table-Based Waveshapers
Doing all these calculations in real time at audio rates can be a lot of work, even for a computer. So we generally precalculate these polynomials and put the results in a table. Then when we are synthesizing sound, we just take the value of the input sine wave and use it to look up the answer in the table. If you did this during an exam it would be called cheating, but in the world of computer programming it is called optimization.
One big advantage of using a table is that regardless of how complex the
original equations were, it always takes the same amount of time to look
up the answer. You can even draw a function by hand without using an
equation and use that hand-drawn function as your transfer function.
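A small Python sketch of this table-lookup optimization (the names are our own): precompute the transfer function once, then each output sample costs only an index calculation and a lookup.

    def build_table(transfer, size=4096):
        # sample the transfer function over the input range [-1.0, 1.0]
        return [transfer(-1.0 + 2.0 * i / (size - 1)) for i in range(size)]

    def shape(x, table):
        # map x in [-1.0, 1.0] to a table index and look up the answer
        i = int((x + 1.0) / 2.0 * (len(table) - 1))
        return table[i]

    # e.g., the Chebyshev polynomial T3(x) = 4x^3 - 3x as the transfer
    # function; feeding shape() a sine wave then yields a sine at three
    # times the frequency (the third harmonic).
    t3 = build_table(lambda x: 4 * x ** 3 - 3 * x)

The table could just as well be filled from a hand-drawn curve instead of an equation.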
Applet 4.10: Waveshaping
This applet plays sine waves through polynomials and hand-drawn waves.
Now that you know a lot about waveshaping, Chebyshev polynomials, and transfer
functions, we’ll show you what happens when the information gets into the wrong
hands!
Soundfile 4.22a: Experimental waveshaping, "Toyoji Patch"
These soundfiles are two recordings done in the mid-1980s, at the Mills College Center for Contemporary Music, by one of our authors (Larry Polansky). They use a highly unusual live, interactive computer music waveshaping system.
Both of these sound excerpts feature the amazing contemporary flutist Anne
LaBerge. In the first version, LaBerge is playing but is not recorded. She is in
another room, and the output of the system is fed back into itself through a
microphone. By playing, she could drastically affect the sound (since her flute went
immediately into the transfer function). However, although she’s causing the
changes to occur, we don’t actually hear her flute. In the second version, LaBerge is
in front of the same microphone that’s used for feedback and recording.
In both versions, Polansky was controlling the mix and the feedback gain as well as
playing with the computer.
History of FM Synthesis
FM techniques have been around since the early 20th century, and by the 1930s FM theory for radio broadcasting was well documented and understood. It was not until the 1970s, though, that a certain type of FM was thoroughly researched as a musical synthesis tool. In the early 1970s, John Chowning, a composer and researcher at Stanford University, developed some important new techniques for music synthesis using FM.
Chowning’s research paid off. In the early 1980s, the Yamaha Corporation introduced their extremely popular DX line of FM synthesizers, based on Chowning’s work. The DX-7 keyboard synthesizer was the top of their line, and it quickly became the digital synthesizer for the 1980s, making its mark on both computer music and synthesizer-based pop and rock. It’s the most popular synthesizer in history.
Thanks to Joseph Rivers and The Audio Playground Synthesizer Museum for this
photo.
Simple FM
In its simplest form, FM involves two sine waves. One is called the modulating
wave, the other the carrier wave. The modulating wave changes the frequency
of the carrier wave. It can be easiest to visualize, understand, and hear when the
modulator is low frequency.
Figure 4.17 Frequency modulation, two operator case.
Vibrato
Soundfile 4.23: Vibrato sound
FM can create vibrato when the modulating frequency is less than 30 Hz.
Okay, so it’s still not that exciting—that’s just because everything is
moving slowly. We’ve created a very slow, weird vibrato! That’s because we
were doing low-frequency modulation. In Soundfile 4.23, the frequency (fc) of
the carrier wave is 500 Hz and the modulating frequency (fm) is 1 Hz. 1 Hz
means one complete cycle each second, so you should hear the frequency of
the carrier rise, fall, and return to its original pitch once each second.
Note that the frequency of the modulating wave is the rate of change in the carrier’s frequency. It also turns out that the amplitude of the modulator is the degree of change of the carrier’s frequency, and the waveform of the modulator is the shape of change of the carrier’s frequency.
In Figure 4.17, showing the unit generator diagram for frequency modulation (remember, we showed you one of these in Section 4.5), note that each of the sine wave oscillators has two inputs: one for frequency and one for amplitude. For our modulating oscillator we are using 1 Hz as the frequency, which becomes fm to the carrier (that is, the frequency of the carrier is changed 1 time per second). The modulator’s amplitude is 100, which will determine how much the frequency of the carrier gets changed (at a rate of 1 time per second).
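Putting those numbers into code, here is a minimal Python sketch of the two-oscillator patch in Figure 4.17 (an illustration, not the book’s implementation): the modulator (1 Hz, amplitude 100) drives the frequency input of the carrier (500 Hz), so the pitch sweeps between 400 Hz and 600 Hz once per second.

    import math

    SR = 44100

    def simple_fm(fc=500.0, fm=1.0, mod_amp=100.0, seconds=3.0):
        out = []
        phase = 0.0
        for n in range(int(SR * seconds)):
            t = n / SR
            # carrier frequency swings +/- mod_amp around fc, fm times a second
            freq = fc + mod_amp * math.sin(2 * math.pi * fm * t)
            phase += 2 * math.pi * freq / SR    # integrate frequency into phase
            out.append(math.sin(phase))
        return out

Raise fm above 30 Hz and the same few lines stop sounding like vibrato and start producing the sideband-rich spectra described next.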
Soundfile 4.24: Vibrato sound
If we raise the frequency of the modulating oscillator above 30 Hz, we can start
to hear more complex sounds. We can make an analogy to being able to see the
spokes of a bike wheel if it rotates slowly, but once the wheel starts to rotate
faster a visual blur starts to occur.
So it is with FM: when the modulating frequency starts to speed up, the sound becomes more complex. The tones you heard in Soundfile 4.24 sliding around are called sidebands and are extra frequencies located on either side of the carrier frequency. Sidebands are the secret to FM synthesis. The frequencies of the sidebands (called, as a group, the spectra) depend on the ratio of fc to fm. John Chowning, in a famous article, showed how to predict where those sidebands would be using a mathematical idea called Bessel functions. By controlling that ratio, along with the depth of modulation (called the FM index), and using Bessel functions to determine the spectra, you can create a wide variety of sounds, from noisy jet engines to a sweet-sounding Fender Rhodes.
Figure 4.18 FM sidebands.
Soundfiles 4.25 through 4.28 present some simple two-operator FM sounds with modulating frequencies above 30 Hz.
Soundfile 4.25: Bell-like sound
Carrier: 100 Hz; modulator frequency: 280 Hz; FM index: 6.0 -> 0.
Soundfile 4.26: Bass clarinet-type sound
Carrier: 250 Hz; modulator frequency: 175 Hz; FM index: 1.5 -> 0.
Soundfile 4.27: Trumpet-like sound
Carrier: 700 Hz; modulator frequency: 700 Hz; FM index: 5.0 -> 0.
Soundfile 4.28: FM sound
Carrier: 500 Hz; modulator frequency: 500 -> 5,000 Hz; FM index: 10.
One of the most common computer languages for synthesis and sound
processing is called Csound, developed by Barry Vercoe at MIT. Csound is
popular because it is powerful, easy to use, public domain, and runs on a wide
variety of platforms. It has become a kind of lingua franca for computer music.
Csound divides the world of sound into orchestras, consisting of instruments
that are essentially unit-generator designs for sounds, and scores (or note lists)
that tell how long, loud, and so on a sound should be played from your
orchestra.
In a Csound instrument, asig is simply a name given to the signal that a line of code computes, so we can use it later (as in out asig, which sends the signal to the output).
Yes, we know, you might be completely confused, but we thought you’d like to
see a common language that actually uses some of the concepts we’ve been
discussing!
Some music languages, like Csound, make extensive use of the unit generator model. Generally, unit generators are used to create instruments (the orchestra), and then a set of instructions (a score) is created that tells the instruments what to do.
Now that you understand the basics of FM synthesis, go back to the beginning
of this section and play with Applet 4.12. FM is kind of interesting
theoretically, but it’s far more fun and educational to just try it out.
Applet 4.12: Granular synthesis
Figure 4.20 A grain is created by taking a waveform, in this case a sine wave, and
multiplying it by an amplitude envelope.
How would a different amplitude envelope, say a square one, affect the shape of the
grain? What would it do to the sound of the grain?
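Following Figure 4.20’s recipe, a single grain is easy to compute; here is a Python sketch (our own illustration) using a smooth raised-cosine envelope. (A square envelope, by contrast, would switch the sine on and off abruptly, adding clicks, that is, broadband energy, at the grain boundaries.)

    import math

    SR = 44100

    def grain(freq=880.0, dur=0.05):
        # one grain = a sine wave multiplied by an amplitude envelope
        n_samples = int(SR * dur)
        out = []
        for n in range(n_samples):
            # raised-cosine (Hann) envelope: zero at both ends, 1.0 in the middle
            env = 0.5 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
            out.append(env * math.sin(2 * math.pi * freq * n / SR))
        return out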
Clouds of Sound
What sorts of sounds does this image imply? If you had three vocal performers, one
for each "cloud," how would you go about performing this piece? Try it!
Soundfile 4.29: "Implements of Actuation"
There are a great many commercial and public domain applications for granular synthesis, because it is relatively easy to implement and the sounds can be very interesting and attractive.
Karplus-Strong Algorithm
Let’s take a look at a really simple but very effective physical model of a
plucked string, called the Karplus-Strong algorithm (so named for its principal
inventors, Kevin Karplus and Alex Strong). One of the first musically useful
physical models (dating from the early 1980s), the Karplus-Strong algorithm
has proven quite effective at generating a variety of plucked-string sounds
(acoustic and electric guitars, banjos, and kotos) and even drumlike timbres.
Applet 4.13: Karplus-Strong plucked string algorithm
Fun with the Karplus-Strong plucked string algorithm.
Here’s a simplified view of what happens when we pluck a string: at first the
string is highly energized and it vibrates like mad, creating a
fairly complex (meaning rich in harmonics) sound wave whose fundamental
frequency is determined by the mass and tension of the string. Gradually,
thanks to friction between the air and the string, the string’s energy is depleted
and the wave becomes less complex, resulting in a "purer" tone with fewer
harmonics. After some amount of time all of the energy from the pluck is gone,
and the string stops vibrating.
If you have access to a stringed instrument, particularly one with some very
low notes, give one of the strings a good pluck and see if you can see and hear
what’s happening per the description above.
Now that we have a physical idea of what’s happening in a plucked string, how can we model it with a computer? The Karplus-Strong algorithm does it like this: first we start with a buffer full of random values—noise. (A buffer is just some computer memory, RAM, where we can store a bunch of numbers.) The numbers in this buffer represent the initial energy that is transferred to the string by the pluck. The algorithm then proceeds as follows.
To generate a waveform, we start reading through the buffer and using the
values in it as sample values. If we were to just keep reading through the buffer
over and over again, what we’d get would be a complex, pitched waveform. It
would be complex because we started out with noise, but pitched because we
would be repeating the same set of random numbers. (Remember that any time
we repeat a set of values, we end up with a pitched (periodic) sound. The pitch
we get is directly related to the size of the buffer (the number of numbers it
contains) we’re using, since each time through the buffer represents one
complete cycle (or period) of the signal.)
Now here’s the trick to the Karplus-Strong algorithm: each time we read a
value from the buffer, we average it with the last value we read. It is this
averaged value that we use as our output sample. We then take that averaged
sample and feed it back into the buffer. That way, over time, the buffer gets
more and more averaged (this is a simple filter, like the averaging filter
described in Section 3.1). Let’s look at the effect of these two actions
separately.
The "over time" part is where feeding the averaged samples back into the
buffer comes in. If we were to just keep averaging the values from the buffer
but never actually changing them (that is, sticking the average back into the
buffer), then we would still be stuck with a static waveform. We would keep
averaging the same set of random numbers, so we would keep getting the same
results.
Instead, each time we generate a new sample, we stick it back into the buffer. That way our waveform evolves as we move through it. The effect of this low-pass filtering accumulates over time, so that as the string "rings," more and more of the high frequencies are filtered out of it. The filtered waveform is then fed back into the buffer, where it is filtered again the next time through, and so on. After enough times through the process, the signal has been averaged so many times that it reaches equilibrium: the waveform is a flat line. The string has died out.
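Here is the whole algorithm as just described, in a short Python sketch (our illustration; real implementations add refinements):

    import random

    SR = 44100

    def pluck(freq=220.0, seconds=1.0):
        buf_len = int(SR / freq)       # the buffer size determines the pitch
        # fill the buffer with noise: the energy of the "pluck"
        buf = [random.uniform(-1.0, 1.0) for _ in range(buf_len)]
        out = []
        prev = 0.0
        for n in range(int(SR * seconds)):
            i = n % buf_len                   # read the buffer circularly
            sample = 0.5 * (buf[i] + prev)    # average with the last value read
            prev = buf[i]
            buf[i] = sample                   # feed the average back into the buffer
            out.append(sample)
        return out

Each pass through the buffer is one period of the tone, and each pass filters the waveform a little more, so the "string" gets purer and quieter until it dies away.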
Figure 4.22 Applying the Karplus-Strong algorithm to a random waveform. After 60
passes through the filter/feedback cycle, all that’s left of the wild random noise is a
gently curving wave.
The result is much like what we described in a plucked string: an initially complex,
periodic waveform that gradually becomes less complex over time and ultimately
fades away.
Figure 4.23 Schematic view of a computer software implementation of the basic
Karplus-Strong algorithm.
For each note, the switch is flipped and the computer memory buffer is filled with
random values (noise). To generate a sample, values are read from the buffer and
averaged. The newly calculated sample is both sent to the output stream and fed back
into the buffer. When the end of the buffer is reached, we simply wrap around and
continue reading at the beginning. This sort of setup is often called a circular
buffer. After many iterations of this process, the buffer’s contents will have been
transformed from noise into a simple waveform.
If you think of the random noise as a lot of energy and the averaging of the buffer as a
way of lessening that energy, this digital explanation is not all that dissimilar from
what happens in the real, physical case.
Physical models generally offer clear, "real world" controls that can be used to
play an instrument in different ways, and the Karplus-Strong algorithm is no
exception: we can relate the buffer size to pitch, the initial random numbers in
the buffer to the energy given to the string by plucking it, and the low-pass
buffer feedback technique to the effect of air friction on the vibrating string.
Many researchers and composers have worked on the plucked string sound as a kind of basic model for physical modeling. One researcher, engineer Charlie Sullivan (who we're proud to say is one of our Dartmouth colleagues!), built a "super" guitar in software.
Soundfile 4.30: Super guitar
Here’s the heavy metal version of "The Star-Spangled Banner."
Physical modeling has become one of the most powerful and important current
techniques in computer music sound synthesis. One of its most attractive
features is that it uses a very small number of easy-to-understand building
blocks—delays, filters, feedback loops, and commonsense notions of how
instruments work—to model sounds. By offering the user just a few intuitive
knobs (with names like "brightness," "breathiness," "pick hardness," and so
on), we can use existing sound-producing mechanisms to create new, often
fantastic, virtual instruments.
Soundfile 4.31: An example of Perry Cook’s SPASM
Figure 4.24 Part of the interface from Perry R. Cook’s SPASM singing voice
software. Users of SPASM can make Sheila, a computerized singer, sing. Perry Cook
has been one of the primary investigators of musically useful physical models. He’s
released lots of great physical modeling software and source code.