
A Perceptual Assistant to do Sound Equalization

Dale Reed
University of Illinois at Chicago, EECS Dept.
851 S. Morgan St. (M/C 154)
Chicago, IL 60607-7053 USA
(312) 413-9478
reed@uic.edu

ABSTRACT

This paper describes an intelligent interface to assist in the expert perceptual task of sound equalization. This is commonly done by a sound engineer in a recording studio, live concert setting, or in setting up audio systems. The system uses inductive learning to acquire expert skill using nearest neighbor pattern recognition. This skill is then used in a sound equalization expert system, which learns to proficiently adjust the timbres (tonal qualities) of brightness, darkness, and smoothness in a context-dependent fashion. The computer is used as a tool to sense, process, and act in helping the user perform a perceptual task. Adjusting timbres of sound is complicated by the fact that there are non-linear relationships between equalization adjustments and perceived sound quality changes. The developed system shows that the nearest-neighbor context-dependent equalization is rated 68% higher than the set linear average equalization and that it is preferred 81% of the time.

Keywords

Intelligent interfaces, expert systems, learning, perceptual tools, audio equalization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IUI 2000 New Orleans LA USA
Copyright ACM 2000 1-58113-134-8/00/1...$5.00

1. INTRODUCTION

Inductive learning can be used to perform an expert skill using nearest neighbor (NN) pattern recognition. This is demonstrated through a sound equalization expert system that learns to proficiently adjust the timbres (tonal qualities) of brightness, darkness, and smoothness in a context-dependent fashion, creating an intelligent computer interface. This is innovative in that it applies the established nearest-neighbor technique to the new application area of performing a skillful perceptual task. This combination has been made possible through advances in computer memory and processor technology, making previously intractable problems now feasible. This work also demonstrates a human-computer interaction (HCI) paradigm where the computer is used as a tool to sense, process, and act in helping the user perform a perceptual task.

The expert system developed here for doing sound equalization is an example of capturing the valuable commodity of human expertise using a computer. Computer learning is needed to help overcome the knowledge acquisition bottleneck for these systems.

Human expertise can be separated into expert knowledge and expert skill. Expert knowledge consists of that which you know, such as knowing when medical symptoms indicate a heart attack or who composed a particular piece of music. Expert skill consists of what you are able to do, such as being able to perform heart bypass surgery or to play a piece of music. Skills as such do not constitute what we think, but rather what we are. An aging professional athlete may still know what to do, but his or her body may no longer be able to execute the action. In our expert system the skill consists of changing tonal qualities (timbres) of sounds through equalization.

In the sections to follow we first discuss the nature of the perceptual task of sound equalization and related work. We then discuss using the computer as a tool to capture expertise (section 2). In section 3 we look at the underlying computational approach used, that of Nearest Neighbor Inductive Inference. Then we discuss setting up the experiment using equalization to change timbres of sound (section 4), with conclusions presented in section 5.

1.1 A Perceptual Task

We define a perceptual task as a task where sensory input is processed to appropriately perform some action, e.g. riding a bike or vocal harmonization. We differentiate between sensing and perceiving in that perceiving takes the additional step of incorporating the sensory input into some sort of usable representation. Perceiving is not just observing, but additionally apprehending. We also differentiate between an "ordinary" perceptual task and an expert, or skillful, perceptual task. Many people can drive a car, but few have the skill to drive in a race. Many people can tell which of two equalizations for a piece of music they prefer, but few have the skill to isolate which frequency bands cause the differences.

Though early Artificial Intelligence (AI) researchers felt sensory-rich mundane tasks such as vision or locomotion would be easier to solve than "expert" tasks such as medical diagnosis, the opposite has proven to be true. We show how a computer system can be used as a tool to aid the user in both perceiving and performing a sensory-rich task, giving a non-expert an expert level of performance.

1.2 Sound Equalization

Sound equalization is used in public address systems, recording studios, movie theatres, and stereo systems. At a very basic level it is encountered on home stereo systems as the treble and bass tone controls. These act as amplifiers and filters, changing the amount of energy in different frequency bands. Equalization is used to make a sound be perceived as more natural sounding, since audio equipment and room acoustics change aspects of the original sound. Secondly, equalization is used to give a sound a new property, such as making drums sound more resonant or removing a harsh "nasal" quality of a singer's voice. Bartlett [1] has a description of terms commonly used to describe timbral qualities.

Typically when a sound engineer is setting up a sound system, the system as a whole is first equalized to compensate for the equipment and the listening environment. Next, individual channels are equalized for the microphones on particular instruments or other sound sources. Expert sound engineers are those who have developed through experience the ability to hear a sound and isolate exactly which one or several frequencies (out of 31 possible bands) need to be changed to give a desired effect. This is complicated by the context-dependent nature of equalization.

1.3 Related Work

Our Human-Computer Interaction (HCI) paradigm of using the computer as a perceptive tool is related to computer representations of sensory data used to create virtual environments. For instance, visual, aural, and tactile feedback are used by biochemists in the pharmaceutical drug design process through a simulation representing the atomic interaction between molecules [5]. Users get tactile feedback as they manipulate the image of a molecule they are building. In this case the computer is used as a sensory tool in the virtual environment; however, it isn't an intelligent tool in that it doesn't learn. Other examples of virtual environments are three-dimensional computer games, micro-surgery, and remote robotic control. Processing of sensory data is also used for autonomous vehicle navigation [12] and speech recognition [9]. The virtual environments described above are used to present sensory data, though the interface is not used interactively to enhance a user's skill level.

There are two notable examples where a computer does learn to perform skillfully. The first is Lee Spector's GenBebop program [10], where a genetic algorithm is used to create improvisation based on a short underlying musical segment. The result is very interesting, though arguably not expert performance. Second is Harold Cohen's AARON system [2][6], which automatically generates paintings through the use of an elaborate rule-based system with a flat-bed plotter. Our work differs in that the computer is used as a tool to aid the user in both perceiving and performing an expert task.

2. USING THE COMPUTER AS A TOOL

In order to use the computer as a perceptual tool, the user must be an integral part of the system. This exploits both the memory and processing power of the machine as well as the intuitive and synthesizing ability of the user. Both the machine and the user perceive and remember independently of each other, but productive synergy can arise when they are combined.

2.1 General Schema to Capture Expertise

Consider the schema used to capture expertise shown in Figure 1, applied in this work to the expertise developed by a sound engineer. Our goal is to externalize a sound engineer's internal expertise, capturing it in a form which can be reused by a non-expert.

Figure 1: Schema used to capture expertise. (Blocks in the figure include Sound Engineer and Changes to be Made.)

The engineer first recognizes the features or context of the present sound, then remembers similar sounds and equalization changes made in the past with respect to the desired outcome. This information is used to infer similar equalization changes to be made in the present case. Our intent is that this process be externalized to the point that a user can think only about the goals and need not have the expertise to match features or infer equalization changes. The same schematic would apply to perceptual tasks other than our example of sound engineering. By introducing a computer into the loop, the stimulus, context, goal, and resulting changes can all be remembered for later use, possibly by a non-expert.
2.2 Capturing Expertise Using a Computer in the Loop

Figure 2 illustrates how the expertise-gathering schematic from Figure 1 can be implemented in a computer system. The system must first be trained, accumulating the body of experience that constitutes the system's expertise. The second phase, performance, uses the accumulated knowledge through inductive inference, as shown by the thick light-gray lines. In order to train the system, an expert user perceives the stimulus and is given a goal. The user manipulates the stimulus using the computer to achieve an aesthetically pleasing difference with respect to the goal. The context of the original stimulus (the auditory identifying signature), along with the new computer changes for the selected goal, are then stored in the database.

Figure 2: Capturing expertise with a computer in the loop.

In our application to equalization the Stimulus is a sound. The Context Analysis yields a representative "signature" made up of a measurement of the average energy per frequency in each sound. The Goals are changes in the timbres of brightness, darkness, and smoothness. Each example's Changes are the equalizer settings used to implement the goal, and the Modifications Control is an audio equalizer.

For the performance phase, we add the inferencing module. As before, a stimulus (e.g. a sound being played) enters the system, but this time the user selects a goal. The system does pattern matching on the stimulus' signature, finding the n most similar previously recorded examples (signature-goal pairs) in the database, using nearest-neighbor pattern matching. The system then makes the same (or very similar) changes to the present stimulus as were made to the previously captured stimuli (nearest neighbors) for the same goal. The user can provide corrective feedback, with these changes added to the database as a new example. Note how the computer is used as a tool to help perceive the input (Context Analysis), induce the proper action to be taken (Inferencing), and also cause the resulting perceptual change (Modifications Control). The system also has the ability to change dynamically according to user preferences by remembering the users' feedback in cases where the suggested change was inadequate.

Now let us take a look in more detail at the Inferencing module as implemented here using nearest neighbor pattern recognition.

3. NEAREST NEIGHBOR INDUCTIVE INFERENCE

Symbolic Artificial Intelligence processing is involved with "figuring out the rules," or coming up with the underlying primitives and their relation to each other. Sometimes it is not possible to figure out the rules, or in fact not necessary, particularly in cases where you are capturing a skill rather than knowledge (e.g. riding a bicycle). In this research we are using knowledge without completely understanding its primitives and their relationships. Rather than "figuring out" or reasoning, we use pattern matching to inductively solve new problems in analogous ways to previously seen similar situations - an "expertise oracle," as it were. Stated another way, "Intelligence is as intelligence does." This is done algorithmically using Nearest Neighbor inductive inference.

3.1 Related Inductive Methods

Genetic algorithms [4] are one of the most popular inductive inference methods, though they suffer from lengthy training time. Modified classifier systems [3] reduce the training time, but have great sensitivity to the values used for reward and punishment. Decision trees [8] provide efficient lookup, but suffer from a need for very large data sets and a lengthy set-up time compared to the Nearest Neighbor (NN) [3] approach. Although knowledge of the relationships between examples is more opaque when using NN, it has the advantages of being very straightforward, sensitive to local populations, and adaptable to dynamic changes in the data.

3.2 Nearest Neighbor Description

Nearest neighbor is an example-based pattern recognition approach where all the data points are stored in an n-dimensional space (hypersphere). A new example is mapped into that space and its predicted outcome is computed from the outcomes of its neighbors, that is, the points close to it. These points share similar characteristics.

Consider applying NN to credit-risk analysis, where information from a credit card application is evaluated in order to assign a credit rating to an applicant. Applicants whose credit rating falls below some threshold will not be given a credit card due to the risk involved.

Figure 3: Placing a new example by its nearest neighbors. Outcome of credit-worthiness is determined by outcomes of its neighbors. (New exemplar shown in the figure: Marital: Divorced, Income: $30K, Age: 45, Credit Worthiness: ?)

This is illustrated in Figure 3, where we are trying to predict the credit-worthiness of a loan applicant based on marital status, income, and age. The outcome of credit-worthiness is represented by the numbers inside the boxes, and the locations of the boxes reflect the other fields' values. We have chosen credit-worthiness to be scaled between 1 and 10 for this example, where 10 is most credit-worthy. Placing the new example into the hypersphere, of only 3 dimensions in this case, we find that it is closest to two other points whose outcomes are respectively 4 and 6. The new example's outcome is then some function of those values, either some sort of weighted average ("5" in this example) or the value that occurs most frequently in cases where there are multiple "close" values.

In our implementation applied to sound equalization, each field (or dimension) is actually a measure of energy in one of the frequency bands, averaged over the length of the sound. Experts' use of the system serves to train it, populating the NN search space. When a non-expert uses the system, new sounds are compared to existing ones, and the new changes made are similar to those done in the past for similar sounds the system has already heard.
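The lookup just described is simple enough to sketch directly. The following Python fragment is a minimal illustration of nearest-neighbor prediction in the spirit of Figure 3, not the system's actual code; the feature values, the choice of k = 2, and the inverse-distance weighting are assumptions made here for the example.

```python
import math

# Stored examples as (feature vector, known outcome).  For the credit example the
# features could be encoded marital status, income (in $K), and age; for sound
# equalization they would be the energy-per-band signature values.
examples = [
    ([0.0, 30.0, 45.0], 4),   # hypothetical neighbor whose outcome is 4
    ([0.0, 32.0, 47.0], 6),   # hypothetical neighbor whose outcome is 6
    ([1.0, 90.0, 30.0], 9),   # a distant point that should not influence the result
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(query, k=2):
    """Predict an outcome as the distance-weighted average of the k nearest outcomes."""
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    weights = [1.0 / (euclidean(query, feats) + 1e-9) for feats, _ in nearest]
    return sum(w * out for w, (_, out) in zip(weights, nearest)) / sum(weights)

print(predict([0.0, 31.0, 46.0]))   # equidistant from the outcomes 4 and 6 -> 5.0
```

With the new point equidistant from its two neighbors, the weighted average is exactly 5, matching the outcome described for Figure 3.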

4. CHANGING TIMBRE USING EQUALIZATION

The goal of the implementation was to create a trained system usable as a tool by a non-expert to do expert sound equalization (eq), changing the tonal quality, or timbre, of a sound using equalization through an implementation of a NN inductive inference system. We discovered that context needs to be taken into account in affecting timbre through equalization. In other words, you can't just always "do the same thing" to give a desired perceptual effect. It depends on what the underlying sound is.

4.1 Adjusting Timbres

We looked specifically at the timbres of brightness, darkness, and smoothness, as illustrated in Figure 4. Brightness can be thought of as high-frequency emphasis, with weaker low frequencies. Darkness can be thought of as the opposite of brightness, with lower frequency emphasis and a decrease in high frequency energy. A sound is smooth if it is easy on the ears, not harsh, with a flat frequency response, especially in the mid-range, with an absence of peaks and dips in the response.

Figure 4: Equalization changes for three timbres. (Panels: More Brightness, More Darkness, More Smoothness.)

As mentioned previously, what makes this equalization task difficult is that the equalization changes are context dependent. What makes one sound brighter may not work for another. Making a cymbal brighter would involve increasing the energy in the highest frequencies available (the sliders furthest to the right on a graphic equalizer), but doing the same thing to an electric bass sound may not make any difference at all. This is because there is no energy present at those high frequencies to begin with. When adjusting sliders to make an equalization change, one must take into account the characteristics of the underlying sound. It isn't possible to just always do the same thing to every sound for a desired effect. Equalizations are not only context-dependent, but they are non-linear as well. Moving certain sliders could make a sound increasingly smooth, but after a point continuing to move the same sliders in the same direction could give an unpleasant quality to the sound.

Figure 5: Energy per frequency band for three sounds.

For example, consider the goal of an increase in brightness applied to the three sounds (bass, acoustic guitar, and rainstick) whose energy graphs are shown in Figure 5. To make the bass sound brighter we would want to increase the energy in the bands 500, 1K, and 2K. To make the rainstick sound brighter, however, we would have to increase the energy in bands 4K and 8K, which is different.
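A small sketch can make the bass/rainstick contrast concrete. It illustrates the context dependence itself rather than the system's learned behavior: it boosts only the highest bands in which a sound's signature actually has energy, so the bands chosen differ between the two instruments. The band labels, example energy values, threshold, and 3 dB boost are all assumptions for illustration.

```python
# Nine equalizer bands below the 11 kHz limit (the 16 kHz band was disabled).
BANDS = ["31", "63", "125", "250", "500", "1K", "2K", "4K", "8K"]

# Made-up average-energy signatures: the bass lives in the low bands,
# the rainstick almost entirely in the high ones.
bass      = {"31": 0.9, "63": 0.8, "125": 0.7, "250": 0.5, "500": 0.4,
             "1K": 0.3, "2K": 0.15, "4K": 0.02, "8K": 0.01}
rainstick = {"31": 0.0, "63": 0.0, "125": 0.02, "250": 0.05, "500": 0.1,
             "1K": 0.2, "2K": 0.4, "4K": 0.7, "8K": 0.8}

def brighten(signature, threshold=0.1, boost_db=3.0):
    """Boost the top three bands that actually contain energy.

    A fixed "raise the rightmost sliders" rule fails for the bass, because
    there is almost nothing above 2K to boost in the first place.
    """
    usable = [b for b in BANDS if signature[b] >= threshold]
    return {b: boost_db for b in usable[-3:]}

print(brighten(bass))        # {'500': 3.0, '1K': 3.0, '2K': 3.0}
print(brighten(rainstick))   # {'2K': 3.0, '4K': 3.0, '8K': 3.0}
```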
4.2 Experiment Setup

The 17 subjects used in training and testing the system were sound reinforcement professionals as well as some music students. Subjects were first given a hearing test to determine their ability to hear the difference between different equalizations. Next the system was trained as users performed equalizations using the system. Then users evaluated the system's level of acquired expertise. Each of these 3 phases (Hearing Test, Training, Testing) is further described below.

The physical listening chamber was lined with sound baffling panels set up to eliminate any early reflections. The graphical user interface consisted of a 10-band on-screen equalizer with real-time measurement of energy per frequency band. (The sampling rate of 22.05 kHz limited the highest sampled frequency to 11 kHz, so the tenth band at 16 kHz, the topmost, was disabled.) The sounds used were taken from unprocessed studio master tracks of typical folk/rock music (e.g. vocals, guitars, basses, drums, etc.). 41 stereo sound segments approximately 15 seconds long each were used for the training session. The testing session sounds were a distinct set of 10 more sounds. In order to be able to do pattern matching, a "signature" consisting of a measurement of energy per each of the nine frequency bands over all 15 seconds was taken for each sound, with a filter to exclude quiet spots in the sound segment from the averaging. For example, we did not want the measurement of average energy in a drum sound to include the silences between beats. The signature of energy in the nine bands was used to place each sound in a nine-dimensional space (nine-dimensional array) for searching using nearest neighbor.
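A signature of this kind is straightforward to compute; the sketch below shows one plausible way, assuming NumPy, a fixed frame length, an RMS silence gate, and FFT-based band energies (none of which are specified in the paper).

```python
import numpy as np

def band_signature(samples, sample_rate, band_edges, silence_rms=0.01, frame_len=2048):
    """Average energy per frequency band, skipping near-silent frames.

    band_edges is a list of (low_hz, high_hz) pairs, one per equalizer band.
    Quiet frames, such as the gaps between drum beats, are excluded so they
    do not drag the average down.
    """
    per_frame = []
    for start in range(0, len(samples) - frame_len, frame_len):
        frame = samples[start:start + frame_len]
        if np.sqrt(np.mean(frame ** 2)) < silence_rms:
            continue                                  # skip quiet spots
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
        per_frame.append([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                          for lo, hi in band_edges])
    if not per_frame:
        return np.zeros(len(band_edges))
    return np.asarray(per_frame).mean(axis=0)
```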

4.3 Hearing Test

Before training the system we ran a brief hearing test. Users were presented with two sounds, where one of them sometimes had an equalization change applied to it. The user then indicated whether or not the two sounded the same or different. 30 such judgements were gathered, giving an indication as to how well the user could discern equalization changes. Results ranged from 60% to 90% correct judgements for the 17 subjects.

4.4 Training Phase

The interface shown in Figure 6 was used by each of the 17 subjects in equalizing the 41 sound segments. All examples with "Brightness" as a goal were done first, then those for "Darkness" and finally "Smoothness." The system first highlighted the desired goal, where the goal was to give an aesthetically pleasing increase in brightness, darkness, or smoothness. The user then made equalization changes using the on-screen eq sliders as the sound was playing, trying to achieve the highlighted goal. The "Flat Eq" and "Changed Eq" buttons allowed subjects to compare the changed sounds to the original sounds. These controls were used in real time, while the sound was being played. The sound could be replayed as many times as needed. Once subjects were satisfied that the goal had been met, selecting the "Next" button took them to the next training example.

Figure 6: Interface for training the system. Slider changes corresponding to the presented goal are recorded by the system.

For each user, for each of the sound-goal combinations, the computer then created and stored an example consisting of:

1. Soundfile name
2. Goal (one of 3 from the goals window)
3. Final slider positions (scaled from 0 to 31 for each slider)
4. Energy-per-band "signature" for that sound.

These examples were accumulated in the database, embodying the system's knowledge.
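Those four fields map naturally onto a small record; a minimal sketch follows (the type and field names are assumptions, not the system's actual data structures).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingExample:
    sound_file: str                 # 1. soundfile name
    goal: str                       # 2. "brightness", "darkness", or "smoothness"
    slider_positions: List[int]     # 3. final position of each slider, scaled 0-31
    signature: List[float]          # 4. average energy per band for that sound

# The accumulated examples embody the system's knowledge.
database: List[TrainingExample] = []
```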

4.5 Testing Phase

The 17 subjects from the training phase were evaluated to select the best 11 subjects to continue with the testing phase. These 11 were determined by analyzing the extent to which they moved the sliders. Users who had to move the sliders to an extreme in order to effect a perceptible change in the sound were eliminated. As expected, it turned out that the better the subject's hearing as measured by the hearing test, the less the subject tended to move the sliders. The accumulated data for the 11 selected subjects became the dataset of 451 examples per each of the three goals, for a total of 1353 examples, embodying the "knowledge" of the system.

The interface screen for the testing phase is shown in Figure 7. Users were asked to give a rating to each of three different equalizations presented by the computer. These three choices were:

1. A linear average
2. The NN average
3. No change

The "No change" equalization was used as a control. Each of these equalizations was represented by one of the Eq Selection Buttons. The linear average was the mean slider change across all 11 users for all 41 sounds for the current goal. This average embodied the approach of "always do the same thing" for a desired goal, such as always increasing the rightmost sliders to make a sound more bright.

At the other extreme, the NN average was the mean slider change of the 2 nearest neighbors from that subject's training session only. The nearest neighbors were computed by comparing the example's signature (energy per each of the 9 bands) with the signatures of the stored data. This was essentially placing the example point in a 9-dimensional space and finding the two closest points. The correspondence between the above three types of equalizations and the eq selection button positions was randomized on each presentation.

To start the sound playing, subjects selected one of the Eq Selection Buttons, "Eq A", "Eq B", or "Eq C," which triggered the sound to start being played. This selected equalization was then compared to the original sound by clicking on the "Flat Eq" button. Each of the equalizations was then given a rating using the slider below it as to how good a job it did at giving an aesthetically pleasing increase in the highlighted goal. Once the user was satisfied with his or her eq ratings, selecting the "Next" button advanced to the next example and corresponding goal. The eq selection button for "no change" was actually an identical setting to the "Flat Eq" button, a fact that was not always recognized by the subjects.

Figure 7: Interface for testing phase. Three possible equalizations are rated as to how well they do in giving an aesthetically pleasing increase in the highlighted goal.
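From those descriptions, the two non-trivial candidate equalizations could be computed along the following lines; this is a sketch under assumptions (it reuses the TrainingExample records sketched earlier, and the stored slider settings stand in for the slider changes), not the authors' implementation.

```python
import numpy as np

def linear_average_eq(all_examples, goal):
    """Mean slider setting over every training example for this goal:
    the context-free, always-do-the-same-thing equalization."""
    settings = [ex.slider_positions for ex in all_examples if ex.goal == goal]
    return np.mean(np.asarray(settings, dtype=float), axis=0)

def nn_average_eq(subject_examples, goal, signature, k=2):
    """Mean slider setting of the k examples from this subject's own training
    session whose signatures lie closest to the test sound's signature:
    the context-dependent equalization."""
    candidates = [ex for ex in subject_examples if ex.goal == goal]
    candidates.sort(key=lambda ex: np.linalg.norm(np.asarray(ex.signature) - signature))
    nearest = candidates[:k]
    return np.mean(np.asarray([ex.slider_positions for ex in nearest], dtype=float), axis=0)
```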
job at equalizations than does a linear average. Although the

Although the Eq Ratings sliders are labeled on the screen from 1 to 5, they actually mapped to values from 1 to 15. The mean evaluation of the "no change" equalization (the control) was 2, and the linear (non-context dependent) equalization mean rating was 6. The nearest neighbor (context dependent) equalization mean rating was 10.08, which is 68% better than the linear equalization.

Brightness was the easiest timbre for subjects to identify, followed by darkness, with smoothness being the most difficult. Rank ordering of which of the three equalizations was preferred showed that NN equalizations were preferred 81% of the time.
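The 68% figure follows directly from the two mean ratings:

```python
no_change, linear, nearest_neighbor = 2.0, 6.0, 10.08   # mean ratings on the 1-15 scale
print(f"{(nearest_neighbor - linear) / linear:.0%}")     # 68%: relative gain of NN over linear
```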

5. CONCLUSIONS

The trained system developed here has been implemented as an expert equalizer (Figure 8), where a sound is selected, and then simply by moving a slider under the desired goal, a context-dependent appropriate amount of equalization is done.
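One hedged sketch of what such a single goal slider might do, reusing the nearest-neighbor average sketched in the testing-phase discussion (the 0-1 slider range, the flat position of 16, and the linear scaling are assumptions, not details given in the paper):

```python
def expert_equalizer(signature, goal, amount, subject_examples):
    """Map one goal slider (0.0 to 1.0) onto a full set of equalizer settings.

    The context-dependent settings suggested by the nearest neighbors are
    scaled by how far the user pushes the goal slider, collapsing many band
    controls into a single, goal-oriented control.
    """
    suggested = nn_average_eq(subject_examples, goal, signature, k=2)
    flat = 16.0                      # assumed mid-point of the 0-31 slider range
    return flat + amount * (suggested - flat)
```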
One way to look at the system is that it implements a many-to-one mapping, putting many complicated controls into a single control that appropriately affects the outcome. The paradigm presented here could be used to exploit the computer as a tool in extending users' perception in the modalities of sight or smell, or in other applications in hearing. Using nearest neighbor for a perceptive task could be used by airlines in interpreting video or x-ray data in explosives detection in luggage [7], or by the Navy in interpreting audio signals for submarine detection.

As illustrated in this work's application to sound equalization, a computer can be used as a perceptive tool to give a user an expert level of skill.

Figure 8: Expert equalizer interface. Slider changes in the "Eq Effects" window for a particular goal automatically give a context-dependent equalization.

6. ACKNOWLEDGMENTS

Thanks to Orion Poplawski, Timothy Mills, and Dave Angulo for assistance in programming. Thanks to Doug Jones for testing speakers and to Peter Langston for providing sound files. Thanks to the following for the many hours of work training and testing the system: Tom Miller, Rob Motsinger, Jeff York, Moses Ling, Helen Hudgens, Shaun Morrison, David Schuman, Mike and Lisa Danforth, Dick Cutler, Jeff Cline, Pablo Perez, Stan Sheft, Norman Kruger, John Bobenko, and John Lanphere.

7. REFERENCES

[1] Bartlett, Bruce, and Bartlett, Jenny. 1995. Engineer's Guide to Studio Jargon. EQ (February): 36-41.
[2] Cohen, Harold. The further exploits of AARON, painter. Stanford Humanities Review 4:2.
[3] Frey, Peter W., and Slate, David J. 1991. Letter Recognition Using Holland-Style Adaptive Classifiers. Machine Learning 6:2 (March). The Netherlands: Kluwer Publishers.
[4] Holland, John. 1986. Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-Based Systems. In R.S. Michalski, J.G. Carbonell, & T.M. Mitchell, eds., Machine Learning II. Los Altos, CA: Morgan Kaufmann.
[5] http://www... Docke...
[6] McCorduck, Pamela. 1991. AARON's Code: Meta-Art, Artificial Intelligence, and the Work of Harold Cohen. New York: W.H. Freeman.
[7] Murphy, Erin E. 1989. A Rising War on Terrorists. Spectrum 26:11: 33-36.
[8] Quinlan, J. R. 1986. Induction of Decision Trees. Machine Learning 1: 81-106.
[9] Rudnicky, Alexander I., Hauptmann, Alexander G., and Lee, Kai-Fu. Survey of Current Speech Technology. Communications of the ACM 37:3 (March): 52-57.
[10] Spector, Lee. 1995. International Joint Conference on Artificial Intelligence, Montreal, Canada, August 20-25, Workshop on AI & Music. In press.
[11] Stanfill, C., and Waltz, D. 1986. Toward memory-based reasoning. Communications of the ACM 29: 1213-1228.
[12] Thorpe, C., Hebert, M., Kanade, T., and Shafer, S. 1987. Vision and navigation for the Carnegie-Mellon NAVLAB. In Annual Review of Computer Science, Vol. 2. Annual Reviews Inc., Palo Alto, Calif.
