A Perceptual Assistant To Do Sound Equalization
Dale Reed
University of Illinois at Chicago, EECS Dept.
851 S. Morgan St. (M/C 154)
Chicago, IL 60607-7053 USA
(312) 413-9478
reed@uic.edu
Though early Artificial Intelligence (AI) researchers felt sensory-rich mundane tasks such as vision or locomotion would be easier to solve than "expert" tasks such as medical diagnosis, the opposite has proven to be true. We show how a computer system can be used as a tool to aid the user in both perceiving and performing a sensory-rich task, giving a non-expert an expert level of performance.

... performance. Second is Harold Cohen's AARON system [2], which automatically generates paintings through the use of an elaborate rule-based system with a flat-bed plotter. Our work differs in that the computer is used as a tool to aid the user in both perceiving and performing an expert task.

1.2 Sound Equalization
Sound equalization is used in public address systems, recording studios, movie theatres, and stereo systems. At a very basic level it is encountered on home stereo systems as the treble and bass tone controls. These act as amplifiers and filters, changing the amount of energy in different frequency bands. Equalization is used, first, to make a sound be perceived as more natural, since audio equipment and room acoustics change aspects of the original sound. Second, equalization is used to give a sound a new property, such as making drums sound more resonant or removing a harsh "nasal" quality from a singer's voice. Bartlett [1] gives a description of terms commonly used to describe timbral qualities.

Typically, when a sound engineer is setting up a sound system, the system as a whole is first equalized to compensate for the equipment and the listening environment. Next, individual channels are equalized for the microphones on particular instruments or other sound sources. Expert sound engineers are those who have developed through experience the ability to hear a sound and isolate exactly which one or several frequencies (out of 31 possible bands) need to be changed to give a desired effect. This is complicated by the context-dependent nature of equalization.

2. USING THE COMPUTER AS A TOOL
In order to use the computer as a perceptual tool, the user must be an integral part of the system. This exploits both the memory and processing power of the machine and the intuitive and synthesizing ability of the user. Both the machine and the user perceive and remember independently of each other, but productive synergy can arise when they are combined.

2.1 General Schema to Capture Expertise
Consider the schema used to capture expertise shown in Figure 1, applied in this work to the expertise developed by a sound engineer. Our goal is to externalize a sound engineer's internal expertise, capturing it in a form which can be reused by a non-expert.

[Figure 1: schema for capturing a sound engineer's expertise.]
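To make the captured form concrete, the sketch below shows one way such a remembered adjustment could be represented: a record holding the context (a per-band energy signature), the goal, and the changes the expert made. The field names and values are illustrative assumptions, not the system's actual data layout.

    # Illustrative sketch only: one remembered expert adjustment, holding the
    # context (a per-band energy signature), the goal, and the slider changes.
    # Field names and values are hypothetical, not taken from the actual system.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class CapturedAdjustment:
        signature: List[float]       # average energy per frequency band (the context)
        goal: str                    # e.g. "brighter", "darker", "smoother"
        slider_positions: List[int]  # final equalizer slider settings chosen by the expert

    # Example: an expert made a dull-sounding guitar "brighter" by raising the
    # upper bands; a record like this is what a non-expert could later reuse.
    example = CapturedAdjustment(
        signature=[0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05],
        goal="brighter",
        slider_positions=[16, 16, 16, 16, 17, 18, 20, 22, 24],
    )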
The context, the goal, and the resulting changes can all be remembered for later use, possibly by a non-expert.

2.2 Capturing Expertise Using a Computer in the Loop
Figure 2 illustrates how the expertise-gathering schematic from Figure 1 can be implemented in a computer system. The system must first be trained, accumulating the body of experience that constitutes the system's expertise. The second phase, performance, uses the accumulated knowledge through inductive inference, as shown by the thick light-gray lines. In order to train the system, an expert user perceives the stimulus and is given a goal. The user manipulates the stimulus using the computer to achieve an aesthetically pleasing difference with respect to the goal. The context of the original stimulus (the auditory identifying signature), along with the new computer changes for the selected goal, is then stored in the database.

[Figure 2: the expertise-gathering schema of Figure 1 implemented with a computer in the loop.]

In the performance phase, the computer is used as a tool to help perceive the input (Context Analysis), induce the proper action to be taken (Inferencing), and cause the resulting perceptual change (Modifications Control). The system also has the ability to change dynamically according to user preferences by remembering the users' feedback in cases where the suggested change was inadequate.

Now let us take a look in more detail at the Inferencing module as implemented here using nearest neighbor pattern recognition.

3. NEAREST NEIGHBOR INDUCTIVE INFERENCE
Symbolic Artificial Intelligence processing is concerned with "figuring out the rules," or coming up with the underlying primitives and their relation to each other. Sometimes it is not possible to figure out the rules, or in fact not necessary, particularly in cases where you are capturing a skill rather than knowledge (e.g. riding a bicycle). In this research we are using knowledge without completely understanding its primitives and their relationships. Rather than "figuring out" or reasoning, we use pattern matching to inductively solve new problems in ways analogous to previously seen similar situations - an "expertise oracle," as it were. Stated another way, "Intelligence is as intelligence does." This is done algorithmically using Nearest Neighbor inductive inference.
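The sketch below illustrates this style of inference under simple assumptions: Euclidean distance between signatures and a distance-weighted average of the neighbors' outcomes. The function names and toy data are hypothetical and stand in for whatever the real system stores.

    # Minimal nearest-neighbor inductive inference sketch (illustrative only).
    # Given a new signature, find the k closest stored examples and blend their
    # outcomes, here by a distance-weighted average.
    import math
    from typing import List, Tuple

    def distance(a: List[float], b: List[float]) -> float:
        """Euclidean distance between two signatures of the same length."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def nearest_neighbor_predict(
        query: List[float],
        examples: List[Tuple[List[float], float]],  # (signature, outcome) pairs
        k: int = 2,
    ) -> float:
        """Blend the outcomes of the k nearest stored examples."""
        ranked = sorted(examples, key=lambda ex: distance(query, ex[0]))[:k]
        # Distance-weighted average; a tiny epsilon avoids division by zero
        # when the query exactly matches a stored example.
        weights = [1.0 / (distance(query, sig) + 1e-9) for sig, _ in ranked]
        total = sum(weights)
        return sum(w * outcome for w, (_, outcome) in zip(weights, ranked)) / total

    # Toy usage: two stored examples with outcomes 4 and 6, equidistant from the
    # query, give a blended prediction of roughly 5.
    stored = [([1.0, 0.0], 4.0), ([0.0, 1.0], 6.0)]
    print(nearest_neighbor_predict([0.5, 0.5], stored, k=2))  # ~5.0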
[Figure 3: a nearest neighbor example. New exemplar - Marital status: Divorced; Income: $30K; Age: 45; Credit worthiness: ?]
Nearest neighbor inference is illustrated in Figure 3, where we are trying to predict the credit-worthiness of a loan applicant based on marital status, income, and age. The outcome of credit-worthiness is represented by the numbers inside the boxes, and the location of the boxes reflects the other fields' values. We have chosen credit-worthiness to be scaled between 1 and 10 for this example, where 10 is most credit-worthy. Placing the new example into the hypersphere of only three dimensions in this case, we find that it is closest to two other points whose outcomes are respectively 4 and 6. The new example's outcome is then some function of those values, either a weighted average ("5" in this example) or the value that occurs most frequently when there are multiple "close" values.

In our implementation applied to sound equalization, each field (or dimension) is actually a measure of energy in one of the frequency bands, averaged over the length of the sound. Experts' use of the system serves to train it, populating the NN search space. When a non-expert uses the system, new sounds are compared to existing ones, and the new changes made are similar to those done in the past for similar sounds the system has already heard.

... equalization. In other words, you can't just always "do the same thing" to give a desired perceptual effect. It depends on what the underlying sound is.

4.1 Adjusting Timbres
We looked specifically at the timbres of brightness, darkness, and smoothness, as illustrated in Figure 4. Brightness can be thought of as high-frequency emphasis, with weaker low frequencies. Darkness can be thought of as the opposite of brightness, with lower-frequency emphasis and a decrease in high-frequency energy. A sound is smooth if it is easy on the ears, not harsh, with a flat frequency response, especially in the mid-range, and an absence of peaks and dips in the response.

[Figure 4: frequency response of a sound before and after equalization ("After Eq").]
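Purely as an illustration of how such timbral descriptions relate to a band-energy signature, the sketch below computes crude numeric proxies for brightness and smoothness. The formulas are assumptions for exposition only; they are not how the system itself characterizes timbre, which instead learns context-dependent adjustments from examples.

    # Illustrative only: crude numeric proxies for the timbres described above,
    # computed from a per-band energy signature. The actual system does not use
    # fixed formulas like these.
    from typing import List

    def brightness(signature: List[float]) -> float:
        """Ratio of energy in the upper half of the bands to the lower half."""
        mid = len(signature) // 2
        low = sum(signature[:mid]) or 1e-9
        return sum(signature[mid:]) / low

    def smoothness(signature: List[float]) -> float:
        """Higher when the response is flat: inverse of band-to-band variation."""
        diffs = [abs(a - b) for a, b in zip(signature, signature[1:])]
        return 1.0 / (1.0 + sum(diffs) / len(diffs))

    cymbal = [0.1, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1.0, 1.0]
    bass   = [1.0, 0.9, 0.7, 0.4, 0.2, 0.1, 0.05, 0.02, 0.01]
    print(brightness(cymbal), brightness(bass))  # the cymbal signature is the "brighter" one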
As mentioned previously, what makes this equalization task difficult is that the equalization changes are context dependent. What makes one sound brighter may not work for another. Making a cymbal brighter would involve increasing the energy in the highest frequencies available (the sliders furthest to the right on a graphic equalizer), but doing the same thing to an electric bass sound may not make any difference at all, because there is no energy present at those high frequencies to begin with. When adjusting sliders to make an equalization change, one must take into account the characteristics of the underlying sound. It isn't possible to just always do the same thing to every sound for a desired effect. Equalizations are not only context-dependent, but non-linear as well. Moving certain sliders could make a sound increasingly smooth, but after a point continuing to move the same sliders in the same direction could give an unpleasant quality to the sound.

The user interface consisted of a 10-band on-screen equalizer with real-time measurement of energy per frequency band. The sounds used were taken from unprocessed studio master tracks of typical folk/rock music (e.g. vocals, guitars, basses, drums, etc.). 41 stereo sound segments, approximately 15 seconds long each, were used for the training session. The testing session sounds were a distinct set of 10 more sounds. In order to be able to do pattern matching, a "signature" consisting of a measurement of energy in each of the nine frequency bands over all 15 seconds was taken for each sound, with a filter to exclude quiet spots in the sound segment from the averaging. For example, we did not want the measurement of average energy in a drum sound to include the silences between beats. The signature of energy in the nine bands was used to place each sound in a nine-dimensional space (nine-dimensional array) for searching using nearest neighbor.
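A minimal sketch of this signature computation is given below, assuming the sound has already been analyzed into short frames with one energy value per band; the frame representation and the silence threshold are illustrative assumptions, not the system's actual parameters.

    # Illustrative sketch of the "signature" described above: average energy per
    # frequency band over the clip, skipping frames that are essentially silent
    # so that, e.g., the gaps between drum beats do not dilute the average.
    from typing import List

    def signature(frames: List[List[float]], silence_threshold: float = 0.01) -> List[float]:
        """frames: one energy value per band for each short analysis frame."""
        n_bands = len(frames[0])
        loud = [f for f in frames if sum(f) > silence_threshold]  # drop quiet spots
        if not loud:
            return [0.0] * n_bands
        return [sum(f[b] for f in loud) / len(loud) for b in range(n_bands)]

    # Toy usage: two loud frames and one silent frame; the silent frame is ignored.
    frames = [
        [0.4, 0.2, 0.1],
        [0.0, 0.0, 0.0],   # silence between beats
        [0.6, 0.2, 0.3],
    ]
    print(signature(frames))  # approximately [0.5, 0.2, 0.2]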
[Figure 6: Interface for training the system. Slider changes corresponding to the presented goal are recorded by the system.]

For each user, for each of the sound-goal combinations, the computer then created and stored an example consisting of:
1. Soundfile name
2. Goal (one of 3 from the goals window)
3. Final slider positions (scaled from 0 to 31 for each slider)
4. Energy-per-band "signature" for that sound.
These examples were accumulated in the database, embodying the system's knowledge.

In the testing session, each sound and goal was presented with three candidate equalizations: a nearest neighbor (NN) average, a linear average, and "no change." The "no change" equalization was used as a control. Each of these equalizations was represented by one of the Eq Selection Buttons. The linear average was the mean slider change across all 11 users for all 41 sounds for the current goal. This average embodied the approach of "always do the same thing" for a desired goal, such as always increasing the rightmost sliders to make a sound more bright. At the other extreme, the NN average was the mean slider change of the 2 nearest neighbors from that subject's training session only. The nearest neighbors were computed by comparing the example's signature (energy in each of the 9 bands) with the signatures of the stored data. This was essentially placing the example point in a 9-dimensional space and finding the two closest points. The correspondence between the above three types of equalizations and the eq selection button positions was randomized on each presentation.

To start the sound playing, subjects selected one of the Eq selection buttons, "Eq A", "Eq B", or "Eq C," which triggered the sound to start playing. This selected equalization was then compared to the original sound by clicking on the "Flat Eq" button. Each of the equalizations was then given a rating, using the slider below it, as to how good a job it did at giving an aesthetically pleasing increase in the highlighted goal. Once the user was satisfied with his or her eq ratings, selecting the "Next" button advanced to the next example and corresponding goal. The eq selection button for "no change" was actually an identical setting to the "Flat Eq" button, a fact that was not always recognized by the subjects.
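The difference between the two computed suggestions can be sketched as follows, under the simplifying assumption that each stored example holds a signature and a set of slider positions; this is an illustration of the contrast, not the system's actual bookkeeping.

    # Illustrative contrast between the two computed equalizations described above:
    # the linear average uses every training example for the goal, while the NN
    # average uses only the nearest neighbors of the new sound's signature.
    import math
    from typing import List, Tuple

    Example = Tuple[List[float], List[float]]  # (signature, slider positions)

    def mean_sliders(examples: List[Example]) -> List[float]:
        n = len(examples[0][1])
        return [sum(ex[1][i] for ex in examples) / len(examples) for i in range(n)]

    def linear_average(examples: List[Example]) -> List[float]:
        # "Always do the same thing": average slider settings over all examples.
        return mean_sliders(examples)

    def nn_average(query: List[float], examples: List[Example], k: int = 2) -> List[float]:
        # Context dependent: average only the k examples whose signatures are closest.
        def dist(a: List[float], b: List[float]) -> float:
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
        return mean_sliders(nearest)

    # Toy usage: for a bright-ish query the NN average follows the bright examples.
    train = [([1.0, 0.1], [10.0, 20.0]), ([0.9, 0.2], [12.0, 22.0]), ([0.1, 1.0], [20.0, 10.0])]
    print(linear_average(train))            # [14.0, 17.33...]
    print(nn_average([0.95, 0.15], train))  # [11.0, 21.0]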
Although the Eq Ratings sliders are labeled on the screen from 1 to 5, they actually mapped to values from 1 to 15. The mean evaluation of the "no change" equalization (the control) was 2, and the linear (non-context dependent) equalization mean rating was 6. The nearest neighbor (context dependent) equalization mean rating was 10.08, which is 68% better than the linear equalization.

Brightness was the easiest timbre for subjects to identify, followed by darkness, with smoothness being the most difficult. Rank ordering of which of the three equalizations was preferred showed that NN equalizations were preferred 81% of the time.

5. CONCLUSIONS
The trained system developed here has been implemented as an expert equalizer (Figure 8), where a sound is selected, and then, simply by moving a slider under the desired goal, a context-dependent, appropriate amount of equalization is done.

One way to look at the system is that it implements a many-to-one mapping, putting many complicated controls into a single control that appropriately affects the outcome. The paradigm presented here could be used to exploit the computer as a tool in extending users' perception in the modalities of sight or smell, or in other applications in hearing. Using nearest neighbor for a perceptive task could be used by airlines in interpreting video or x-ray data in explosives detection in luggage [7], or by the Navy in interpreting audio signals for submarine detection.

As illustrated in this work's application to sound equalization, a computer can be used as a perceptive tool to give a user an expert level of skill.

6. ACKNOWLEDGMENTS
Thanks to Orion Poplawski, Timothy Mills, and Dave Angulo for assistance in programming. Thanks to Doug Jones for testing speakers and to Peter Langston for providing sound files. Thanks to the following for the many hours of work training and testing the system: Tom Miller, Rob Motsinger, Jeff York, Moses Ling, Helen Hudgens, Shaun Morrison, David Schuman, Mike and Lisa Danforth, Dick Cutler, Jeff Cline, Pablo Perez, Stan Sheft, Norman Kruger, John Bobenko, and John Lanphere.

7. REFERENCES
[1] Bartlett, Bruce, and Bartlett, Jenny. 1995. Engineer's Guide to Studio Jargon. EQ (February): 36-41.
[2] Cohen, Harold. 1995. The Further Exploits of AARON, Painter. Stanford Humanities Review 4:2.
[3] Frey, Peter W., and Slate, David J. 1991. Letter Recognition Using Holland-Style Adaptive Classifiers. Machine Learning 6:2 (March). The Netherlands: Kluwer Publishers.
[4] Holland, John. 1986. Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-Based Systems. In R.S. Michalski, J.G. Carbonell, & T.M. Mitchell, eds., Machine Learning II. Los Altos, CA: Morgan Kaufman.
[5] http://www...
[6] McCorduck, Pamela. 1991. AARON's Code: Meta-Art, Artificial Intelligence, and the Work of Harold Cohen. New York: W.H. Freeman.
[7] Murphy, Erin E. 1989. A Rising War on Terrorists. Spectrum 26:11: 33-36.
[8] Quinlan, J. R. 1986. Induction of Decision Trees. Machine Learning 1: 81-106.
[9] Rudnicky, Alexander I., Hauptmann, Alexander G., and Lee, Kai-Fu. 1994. Survey of Current Speech Technology. Communications of the ACM 37:3 (March): 52-57.
[10] Spector, Lee. 1995. International Joint Conference on Artificial Intelligence, Montreal, Canada, August 20-25: Workshop on AI & Music. In press.
[11] Stanfill, C., and Waltz, D. 1986. Toward Memory-Based Reasoning. Communications of the ACM 29: 1213-1228.