
Listening to “Naima”: An Automated Structural Analysis of Music from Recorded Audio

Roger B. Dannenberg
School of Computer Science, Carnegie Mellon University
email: [email protected]

1 Abstract

A model of music listening has been automated. A program takes digital audio as input, for example from a compact disc, and outputs an explanation of the music in terms of repeated sections and the implied structure. For example, when the program constructs an analysis of John Coltrane’s “Naima,” it generates a description that relates to the AABA form and notices that the initial AA is omitted the second time. The algorithms are presented and results with two other input songs are also described. This work suggests that music listening is based on the detection of relationships and that relatively simple analyses can successfully recover interesting musical structure.

2 Introduction

When we listen to a piece of music, we pay attention to repetition, and we use repetition or the lack of it to understand the structure of the music. This in turn helps us to anticipate what will come next, remember what we have heard, relate the music to other music, and explain or develop simple models of what we are hearing. Any structural relationship that we perceive, not just repetition, functions in the same way. Although not used in the present study, pitch transposition is an especially salient relationship.

In my opinion, this is the essence of listening to music. We hear relationships of all kinds among different sounds. We develop theories or models that predict what relationships are important and recurrent. Sometimes we are right, and sometimes that is interesting. Sometimes we are wrong, and that can be even more interesting. These ideas are not at all new (Simon & Sumner, 1968), but it is good to repeat them here in a way that emphasizes how simple the whole music listening and music understanding process might be, at least at some level.

Starting with this completely simple-minded view of what music and listening are about, my question is, can this conception of music understanding be modeled and automated? In particular, I am interested in recovering structure and information from actual audio – not symbolic notation, not synthesized examples, but recorded audio as found on a CD.

This project was motivated by music information retrieval problems. Music information retrieval based on databases of audio requires a significant amount of metadata about the content. Some earlier work on stylistic classification indicated that simple, low-level acoustic features are useful, but not sufficient to determine music style, tonality, rhythm, structure, etc. It seems worthwhile to reexamine music analysis as a listening process and see what can be automated. A good description of music can be used to identify the chorus (useful for music browsing), to locate modulations, to suggest boundaries where solos might begin and end, and for many other retrieval tasks. In addition to music information retrieval, music listening is a key component in the construction of interactive music systems and compositions. (Rowe, 1993) The techniques described here show promise for all of these tasks.

While thinking about the problems of automated listening, given the well-known problems of polyphonic transcription, I happened to hear a recording of a jazz ballad played in the distance. After recognizing the tune, it occurred to me that the signal-to-noise ratio of this setting was so bad that I could hardly hear anything but the saxophone, yet the structure of the music was strikingly clear. I wondered, “Could a computer derive the same structure from this same signal?” and “If so, could this serve as a model for music understanding?”

3 Related Work

Many other researchers have considered the importance of patterns and repetition in music. David Cope’s work explores pattern processing to analyze music, generally with the goal of finding commonalities among different compositions. (Cope, 1996) This work is based on symbolic music representations and is aimed at the composition rather than the listening process. Eugene Narmour has published a large body of work on cognitive models for music listening. In one recent publication, Narmour (2000) explores structural relationships and analogies that give rise to listeners’ expectations. Narmour quotes Schenker as saying that “repetition … is the basis of music as an art.” The more elaborate rules developed by Narmour are complex examples of structural relationships described here.
Published as: Roger B. Dannenberg (2002). “Listening to ‘Naima’: An Automated Structural Analysis of
Music from Recorded Audio.” In Proceedings of the International Computer Music Conference. San
Francisco: International Computer Music Association.
Simon and Sumner (Simon & Sumner, 1968) developed a model of music listening and music memory in which music is coded as simply as possible using operators such as repeat and transpose. Compact encodings convey structural relationships within a composition, so my work is consistent with theirs, and is certainly inspired by it. Other researchers have noticed that data compression relies upon the discovery and encoding of structure, and so data compression techniques have been applied to music as a form of analysis. An application to music generation is seen in work by Lartillot, Dubnov, Assayag, and Bejerano (2001).

Mont-Reynaud and Goldstein (1985) investigated the discovery of rhythmic patterns to locate possible transcription errors. Colin Meek created a program to search for common musical sequences, and his program has been used to identify musical themes. (Meek & Birmingham, 2001) Conklin and Anagnostopoulou (2001) describe a technique for finding recurrent patterns in music, using an expectation estimation to determine which recurring patterns are significant. This analysis relies on exact matches. Another approach to pattern extraction is found in Rolland and Ganascia (2000). Stammen and Pennycook used melodic similarity measures to identify melodic fragments in jazz improvisations. (Stammen & Pennycook, 1993)

The nature of music listening and music analysis has been a topic of study for many years. A full review is beyond the scope of this paper, but this list may highlight the variety of efforts in this area.

4 Overview

The recording initially examined in this work is “Naima,” composed by John Coltrane (1960) and recorded by his quartet. As an aside, a danger of this work is that after repeated exposure, the researcher is bound to have any recording firmly “stuck” in his or her head, so the choice of material should be made carefully! “Naima” is basically an AABA form, where the A section is only 4 measures, and B is 8 measures. There are interesting and clear rhythmic motives, transpositional relationships, and harmonic structures as well, making this an ideal test case for analysis.

The analysis takes place in several stages. First, the melody is extracted. This is complicated by the fact that the piece is performed by a jazz quartet, but the task is simplified by the clear, sustained, close-miked, and generally high-amplitude saxophone lines. Second, the pitch estimates are transcribed into discrete pitches and durations, using pitch confidence level and amplitude cues to aid in segmentation. Third, the transcribed melodic sequence is analyzed for embedded similarities using a matrix representation to be described. A simple, recursive melodic similarity algorithm was developed to be tolerant of transcription errors. Fourth, the similarity matrix is reduced by removing redundant information, leaving the most significant similarities. Fifth, a clustering algorithm is used to find groups of similar melodic material. For example, we would hope to find a cluster representing the three A’s in the AABA structure. Sixth, while interesting, the clusters reflect many more relationships than a human would typically describe, so a final pass works left-to-right (in time order) to find an “explanation” for the piece as a whole.

The following sections describe this analysis in detail. The results of each stage are described, leading to a final analysis.

5 Melody Extraction

After making a digital copy of the entire recording from CD, it was observed that the left channel contains a much stronger saxophone signal. This channel was down-sampled to 22.05 kHz and saved as a mono sound file for analysis. The manual steps here could easily be automated by looking for the channel with the strongest signal or by analyzing both channels separately and taking the one with the best results.

Pitch is estimated using autocorrelation to find candidate pitches and a peak-picking algorithm to decide the best estimate: evaluate windows of 256 samples every 0.02s and perform an autocorrelation on each window. Searching from left to right (highest frequency to lowest), first look for a significant dip in the autocorrelation to avoid false peaks that occur very close to zero lag. Then search for the first peak that is within 90% of the highest peak. Sometimes there is a candidate at double this frequency that looks almost as good, so additional rules give preference to strong peaks at higher frequencies. Details are available as code from the author; however, the enhanced autocorrelation method (Tolonen & Karjalainen, 2000), unknown to us when this work was started, would probably give better results. Furthermore, there are much more sophisticated approaches for dealing with pitch extraction of melody from polyphonic sources. (Goto, 2001)

Figure 1 illustrates the waveform and an associated pitch contour.

Figure 1. “Naima” left channel from CD recording: amplitude (top) and pitch contour (bottom). The lines that descend to the bottom represent locations where no pitch was detected, reported and plotted as zero values. The middle of the piece is a piano solo where very little pitch information was recovered. An ascending scale is clearly visible at the end of the piece.
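To make the peak-picking strategy concrete, the following Python sketch estimates one pitch from a short analysis window. It is not the paper’s code: the function name, the 0.5 dip threshold, and the simple local-maximum test are assumptions made here for illustration; only the overall strategy (skip the lag-0 region, then take the first peak within 90% of the strongest) follows the description above.

```python
import numpy as np

def estimate_pitch(window, sample_rate=22050, peak_ratio=0.9, dip_ratio=0.5):
    """Estimate a fundamental frequency (Hz) for one short analysis window.

    Returns 0.0 when no plausible pitch is found. A simplified sketch of
    autocorrelation peak picking, not the implementation used in the paper.
    """
    window = np.asarray(window, dtype=float)
    window = window - np.mean(window)
    ac = np.correlate(window, window, mode="full")[len(window) - 1:]
    if ac[0] <= 0:
        return 0.0
    # Skip lags before the first significant dip to avoid the peak at lag 0.
    lag = 1
    while lag < len(ac) - 1 and ac[lag] > dip_ratio * ac[0]:
        lag += 1
    if lag >= len(ac) - 1:
        return 0.0
    # Search from short lags (high frequencies) toward long lags (low
    # frequencies) for the first peak within 90% of the strongest peak.
    best = np.max(ac[lag:])
    for k in range(lag + 1, len(ac) - 1):
        if ac[k] >= peak_ratio * best and ac[k] >= ac[k - 1] and ac[k] >= ac[k + 1]:
            return sample_rate / k
    return 0.0

# In the setting described above, 256-sample windows would be passed in,
# one every 0.02s (441 samples apart at 22.05 kHz).
```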
6 Transcription and Segmentation

The next step is to create a list of discrete notes. RMS amplitude information is derived from the original signal by removing frequencies below 200 Hz and computing the RMS over non-overlapping, square windows of duration 0.01s. The transcription works by looking for consecutive, consistent pitch estimates. We step one pitch estimate at a time, but look at overlapping groups of 15 estimates to help deal with noise and error. At each step, a group of 15 pitches is retrieved (corresponding to a time interval of 0.3s). Pitch estimates where the RMS value is below a threshold are considered unreliable, so they are forced to an erroneous value of zero. The pitches are then sorted. If 2/3 of the data falls in a range of 25 cents, then the pitch is deemed to be stable, marking the beginning of a note. Consecutive samples are processed similarly to find the extent of the note: if 5 of 15 estimates differ by less than 25 cents, and the median of these is within 70 cents of the start of the note, then we extend the note with the median of the new group of estimates. When the end of the note is reached, we report the pitch as the median of all the pitch estimates up to the first 1s of duration. This helps to ignore pitch deviations sometimes encountered near the beginnings of notes.

To avoid problems with absolute tuning differences, all pitches are kept as floating-point numbers giving fractional semitones. To transcribe the data, a histogram is constructed from the fractional parts of all note pitches, quantized to 10 cents. The peak in this histogram indicates the difference between the tuning reference used in the recording and the A440 reference used in our analysis. This will also compensate for any systematic error or rounding in our admittedly low-resolution pitch estimation procedure.

Figure 2 illustrates the note transcription as a plot of pitch vs. time. The transcription does not include any metrical information.

Figure 2. Transcription of “Naima.” The saxophone at roughly the first and last thirds of the piece is transcribed fairly well, with only some short notes missing. The piano solo in the middle third is almost completely missed.
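A minimal sketch of this segmentation logic is given below, again in hypothetical Python. The function and variable names are invented, the 2/3 and 5-of-15 stability tests are approximated, and the RMS threshold is an assumption; pitches are taken to be in fractional semitones, with one estimate every 0.02s.

```python
import numpy as np

HOP = 0.02    # seconds between pitch estimates
GROUP = 15    # estimates examined at each step, about 0.3s

def segment_notes(pitches, rms, rms_threshold=0.01):
    """Group per-window pitch estimates (fractional semitones) into notes.

    Returns (onset_time, duration, pitch) tuples. The stability tests and
    thresholds are simplified approximations of the rules described above.
    """
    p = np.asarray(pitches, dtype=float).copy()
    p[np.asarray(rms) < rms_threshold] = 0.0        # unreliable estimates -> zero
    notes, i = [], 0
    while i + GROUP <= len(p):
        core = np.sort(p[i:i + GROUP])[2:12]        # roughly 2/3 of the group
        if core[0] > 0 and core[-1] - core[0] <= 0.25:   # within 25 cents: onset
            start_pitch = float(np.median(core))
            j = i + 1
            # Extend while 5 of 15 estimates agree within 25 cents and their
            # median stays within 70 cents of the note's starting pitch.
            while j + GROUP <= len(p):
                g = p[j:j + GROUP]
                close = g[np.abs(g - np.median(g)) <= 0.25]
                if len(close) >= 5 and abs(np.median(close) - start_pitch) <= 0.70:
                    j += 1
                else:
                    break
            n_est = j + GROUP - i                   # approximate extent in estimates
            # Report the median of voiced estimates in the first second of the note.
            first = p[i:i + min(n_est, int(1.0 / HOP))]
            notes.append((i * HOP, n_est * HOP, float(np.median(first[first > 0]))))
            i = i + n_est                           # continue scanning after this note
        else:
            i += 1
    return notes
```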
7 Finding Similarities

The next step begins the interesting task of looking for structure in the data obtained so far. A melodic similarity matrix Mi,j is defined as the duration of similar melodic sequences starting at all pairs of notes indexed by i and j. We will assume a sequence of n notes is described by pitch pi and duration di, 0 ≤ i < n. If pitches do not match, then M is zero: pi ≠ pj → Mi,j = 0, so much of M is zero. Non-zero entries indicate similarity. For example, the second 4 measures repeat the first 4, starting at the 7th note, so M0,6 = 14.62 (seconds), the duration of the matching 4-measure repetition.

7.1 Melodic Similarity

A simple algorithm is used for determining the duration of matching melodic sequences, inspired by Mongeau and Sankoff (Mongeau & Sankoff, 1990). The two sequences to be compared are processed iteratively: if some initial part of sequence 1 matches some initial part of sequence 2, the initial parts are discarded and the remainders are compared in the next iteration. Matches occur when:
• the pitches of two notes match and either their durations or inter-onset intervals (IOI) agree within 20% or 0.1s;
• a match occurs as just described after skipping one note of either melody, or one short note (< 0.6s) in each melody;
• two notes of either or both melodies have matching pitches and, when merged together, lead to matching durations (within 20%).

This algorithm is applied to every pair of non-equal note positions. Since the matrix is symmetric, we store the length of the matching sequence starting at i as Mi,j and the length of the matching sequence starting at j as Mj,i.
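The sketch below illustrates one way the matcher and the construction of M could look in Python. The Note record, the rounding of pitches to the nearest semitone, and the use of the IOI as the accumulated match duration are assumptions made here for illustration, and the third (note-merging) rule is omitted; only the basic match and the skip-one-note rule are shown.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: float   # fractional semitones
    dur: float     # duration in seconds
    ioi: float     # inter-onset interval to the next note, seconds

def agree(a, b, tol=0.2, floor=0.1):
    """True if two durations (or IOIs) agree within 20% or 0.1s."""
    return abs(a - b) <= max(tol * max(a, b), floor)

def notes_match(x, y):
    return round(x.pitch) == round(y.pitch) and (agree(x.dur, y.dur) or agree(x.ioi, y.ioi))

def match_length(seq1, seq2):
    """Duration (seconds) of the matching initial portions of two note sequences."""
    i = j = 0
    length = 0.0
    while i < len(seq1) and j < len(seq2):
        if notes_match(seq1[i], seq2[j]):
            length += seq1[i].ioi
            i, j = i + 1, j + 1
        elif i + 1 < len(seq1) and notes_match(seq1[i + 1], seq2[j]):
            i += 1   # skip one note of melody 1
        elif j + 1 < len(seq2) and notes_match(seq1[i], seq2[j + 1]):
            j += 1   # skip one note of melody 2
        else:
            break
    return length

def similarity_matrix(notes):
    """M[i][j] = duration of the match starting at note i against note j."""
    n = len(notes)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and round(notes[i].pitch) == round(notes[j].pitch):
                M[i][j] = match_length(notes[i:], notes[j:])
    return M
```

On a transcription like the one in Figure 2, a sketch of this kind would be expected to produce entries analogous to M0,6 for the repeated opening phrase, although the exact values depend on the transcription.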
7.2 Simplification

If location i,j represents similar sequences, then i+1,j+1 will probably represent similar, but shorter, sequences: the same sequences starting at i,j, excepting the first notes. Since this is uninteresting, or at least less significant than i,j, we want to remove these entries from M. The algorithm is simple: determine the submatrix Mi:u,j:v that corresponds to the matching sequences at i and j, i.e., the sequence at i runs to note u, and the sequence at j runs to note v. Set every entry in the submatrix to zero except for Mi,j. This simplification is performed on half the matrix, and the other half is zeroed symmetrically about the diagonal.

In addition, we are not interested in matching sequences that contain only one note, so these are also zeroed.
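A sketch of this zeroing step is shown below (hypothetical Python; extent(i, j) is an assumed helper returning the indices u and v of the last notes of the matching sequences that start at notes i and j).

```python
def simplify(M, extent):
    """Remove redundant entries from the similarity matrix M in place.

    Works on the upper triangle, then zeroes the lower triangle symmetrically,
    following the rule described above. `extent` is a hypothetical helper.
    """
    n = len(M)
    for i in range(n):
        for j in range(i + 1, n):
            if M[i][j] > 0:
                u, v = extent(i, j)
                if (u, v) == (i, j):       # single-note matches are not interesting
                    M[i][j] = 0.0
                    continue
                # Zero the whole submatrix M[i..u][j..v] except its first entry.
                for a in range(i, u + 1):
                    for b in range(j, v + 1):
                        if (a, b) != (i, j):
                            M[a][b] = 0.0
    # Zero the other half symmetrically about the diagonal.
    for i in range(n):
        for j in range(i + 1, n):
            if M[i][j] == 0.0:
                M[j][i] = 0.0
    return M
```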
8 Clustering

After simplifying the matrix, we have all pairs of similar melodic sequences. What if a sequence is repeated more than once? If we scan any row or column, all non-zero entries represent the beginnings of similar sequences. Why? Because each entry denotes a similarity to the sequence starting at the given row or column. We can use this fact to construct clusters of similar sequences. A cluster will be a group of melodic sequences that are all similar to one another.

The algorithm scans each row of M. At the first non-zero element, we note the duration, construct an empty cluster, and insert the corresponding pair of sequences into the cluster. Continuing to scan the row, as each non-zero element is found, if its duration roughly matches the first one (within 40%), we insert the sequence (corresponding to the current column) into the cluster. If the duration does not match, the element is ignored. The cluster is complete when the end of the row is reached. To keep track of which sequences have been inserted into clusters, we zero all combinations; for example, if the cluster has sequences starting at i, j, k, then we zero locations i,j, j,i, i,k, k,i, j,k, and k,j. Scanning continues on the next row until the entire matrix has been scanned.
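The row-scanning procedure might be sketched as follows (again hypothetical Python, reusing the assumed extent helper; each cluster is returned as a list of (start, end) note-index pairs).

```python
def find_clusters(M, extent, tolerance=0.4):
    """Group similar melodic sequences into clusters by scanning rows of M."""
    n = len(M)
    clusters = []
    for i in range(n):
        cluster, ref_dur = None, None
        for j in range(n):
            if M[i][j] <= 0:
                continue
            if cluster is None:
                # First non-zero entry: note its duration and start a cluster
                # containing both sequences of the pair.
                ref_dur = M[i][j]
                cluster = [(i, extent(i, j)[0]), (j, extent(i, j)[1])]
            elif abs(M[i][j] - ref_dur) <= tolerance * ref_dur:
                # Duration roughly matches the first one (within 40%):
                # add the sequence starting at column j.
                cluster.append((j, extent(i, j)[1]))
            # Entries whose durations do not match are ignored.
        if cluster:
            # Zero all pairs among cluster members so they are not reused.
            starts = [s for s, _ in cluster]
            for a in starts:
                for b in starts:
                    if a != b:
                        M[a][b] = 0.0
            clusters.append(cluster)
    return clusters
```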
Figure 3 illustrates the clusters that were found. A horizontal line denotes a cluster, and the (always non-overlapping) sequences contained in the cluster are indicated by thick bars at the appropriate times. The vertical position of a cluster line has no meaning; it is chosen to avoid overlap with other clusters. For example, the bottom line has 4 thick bars. These correspond to the “A” sections in the opening AABA form. The fourth element of the cluster corresponds to the “A” section when the saxophone enters with BA after the piano solo. Already, the clusters express a rich fabric of relationships, many of which would be described or at least noticed by a human listener. Included within these relationships is the information that the structure is AABA, that the melody returns after the solo with BA rather than AABA, and that the last 2 measures are repeated three times near the end. However, the information is not very clear, and there is a lot of detail that is confusing. In the next section, I show how this can be simplified considerably.

Figure 3. Each horizontal line represents one cluster. The elements of the cluster are indicated by heavy lines, showing the locations of similar melodic sequences. The melodic transcription is shown at the bottom.
9 A Simplified Representation

The goal of this final step is to produce an “explanation” of the entire piece in terms of structural relationships. This is a non-hierarchical explanation, and it presents only one possible explanation of the material, thereby achieving a great simplification over the clusters, which provide for essentially every possible explanation. Rather than an explanation, you can think of this as a parsing algorithm. The output will be a string of symbols, e.g. AABA, representing musical structure, but unlike a typical parsing, the grammar is unknown, and the symbols are generated by the algorithm rather than being defined in a grammar.

The procedure uses an incremental “greedy” algorithm: proceeding from left to right, explain each unexplained note. An “explanation” is a relationship, i.e. “this note is part of a phrase that is similar to this other phrase.” If the explanation also explains other notes, they are marked as such and not reexamined (this is the greedy aspect).

More specifically, we start with the first note, and all notes are initially marked as “unexplained.” Search for a cluster that contains the first note in one of its sequences. Create a new symbol, e.g. “A,” as the label for all notes in the cluster and mark them. Once a note receives a label, the label is not revised. Now, find the next unmarked note and repeat this process until all notes are marked or “explained.” Any notes that are not covered by any cluster are ignored.
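A sketch of this greedy labeling pass follows (hypothetical Python; the clusters are assumed to be in the format produced by the find_clusters sketch above).

```python
import string

def label_notes(n_notes, clusters):
    """Assign structural labels to notes with a greedy left-to-right pass.

    Returns a list of labels ("A", "B", ...), with None for notes that no
    cluster explains. Once assigned, a label is never revised.
    """
    labels = [None] * n_notes
    symbols = iter(string.ascii_uppercase)
    for i in range(n_notes):
        if labels[i] is not None:
            continue
        # Find a cluster containing note i in one of its sequences.
        for cluster in clusters:
            if any(start <= i <= end for start, end in cluster):
                symbol = next(symbols)
                # Label every note of every sequence in the cluster, except
                # notes that already have a label (the greedy step).
                for start, end in cluster:
                    for k in range(start, end + 1):
                        if labels[k] is None:
                            labels[k] = symbol
                break
        # Notes covered by no cluster remain unlabeled and are ignored.
    return labels
```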

Figure 4 illustrates the results of this step. Rather than using letters, different shadings are used to show the labels graphically.

Figure 4. Simplified structural representation of “Naima,” shown below the transcription. Similar sections are shaded similarly. The letter labels were added by hand for illustration purposes. Some of the sections in the middle reflect spurious similarities between parts of the piano solo.
We have mentioned the AABA structure many times in this paper. Looking at Figure 4, the initial AA is indicated by the leftmost two rectangles. The B section is actually composed of the next 3 rectangles, showing substructure of the bridge (which in fact does have the same b1b1b2 structure shown here). Next comes the final A section, indicated by a shading that matches the first and second rectangles. The rectangles representing BA are repeated when the saxophone returns after the solo. Thus, the program derives almost exactly the same high-level description a jazz musician would use to describe the structure, without any prior knowledge or grammar of an acceptable description! It would be trivial to use a tree-matching or parsing scheme to map the actual data onto a standard form (including AABA) and then produce a hierarchical description. (“B is structured as b1b1b2.”)

Further analysis could be applied to the durations of these patterns or motives. It is clear by inspection that the ratios of durations of the AAb1b1b2A form are 2:2:1:1:2:2. There is no way to tell that the unit here is 2 measures, but this would at least give candidates for beat durations that might help a beat-tracking algorithm. Also, the fact that these add up to 10 rather than 8 or 16 is interesting, an observation that a program could easily make if it knew a few conventions about song form.

10 Evaluation With New Input

This analysis method works well for “Naima,” which is to be expected. After all, the system was built specifically for this piece, and was modified to overcome problems as they were encountered. What about other input? I tested the analysis system with two other songs: “Freddie the Freeloader,” a jazz standard by Miles Davis, and “We Three Kings,” a popular Christmas carol by John H. Hopkins. These were played on trumpet and violin, respectively. Because the interesting part of the work is in the analysis rather than the polyphonic signal processing, these two performances are monophonic. To be fair, these are the first test cases after “Naima” (there was no search for good examples), and the software was not altered or tuned at all to prepare the system for new input.

“Freddie the Freeloader” is a standard 12-bar blues with a simple repeating figure. It was performed by the author, an experienced jazz trumpet player, with a moderate amount of expression including expressive pitch deviations and articulation. The transcription and analysis are shown in Figure 5. At first, this result was disappointing. It only seems to show the riff in the first two measures repeating in measures 3-4 and measures 7-8. Upon closer inspection, more structure is revealed. The 12-bar form was played twice, with a change in the last 2 measures the second time. This created a cluster representing 12 bars repeated twice (the small variation was ignored). When the simplification algorithm looked for an explanation of measure 5, it found this overarching cluster. Thus the explanation of measure 5 is that it is part of a 12-measure sequence that repeats. This ends the explanation because all 24 measures are covered by it. Since measures 1 through 4 of the 12-measure sequence were already explained in terms of a different cluster, it was surprising to see that the program chose the 12-measure sequence to explain measure 5. In “Naima,” the clusters do not overlap. Nevertheless, the result makes sense and has a hierarchical interpretation: the piece consists of 12 measures repeated twice. Within each 12 measures, there is additional structure: the first 2 measures are repeated at measures 3-4 and 7-8. (Although transposition relationships are not studied here, it is interesting to note that measures 5-6 are a simple transposition of measures 1-2.)

Figure 5. Analysis of “Freddie the Freeloader,” a repeated 12-bar blues form. Audio is at top, transcription is in the middle, and the structural “explanation” is at the bottom. The structure shows a repeated riff (3 times) and the repetition of the entire 12-bar blues form.

The “Freddie the Freeloader” example is successful in that it reveals all the structure that can be obtained by considering repetition, including hierarchical relationships that the software was not intended to find. This example illustrates the importance of hierarchy, and future work should explicitly allow for hierarchical structure discovery. This example also illustrates some of the danger of “greedy” algorithms. In this case, the simplification algorithm destroyed some potentially interesting structure, namely the recurrence of the first two measures at measures 13-14, 15-16, and 19-20. Fortunately, this is redundant information in this case. More work is needed, though, to rank relationships according to their importance.

“We Three Kings” is a 32-measure form. An amateur student performed it on solo violin. If we were to consider only 4-measure groups, the form would be AABCDDED. The analysis, shown in Figure 6, comes close to revealing this structure. The AA repetition is found, as is the first DD repetition. Interestingly, the program found a similarity between B and E. Any listener would probably agree these sections are similar, sharing some pitches and having similar arch shapes. The program also found a similarity between part of C and part of the final D, thus it did not label the final 4 measures correctly.

It should be emphasized again that the input is audio. No parameters were adjusted in the pitch analysis software, so there are transcription errors. No beat detection is performed, so the program does not have knowledge of beats or bar lines.
Nevertheless, the overall analysis is quite good, identifying the more important motives A and D, and organizing them within the 32-measure form.

Figure 6. Analysis of “We Three Kings.” Audio is at top, transcription is in the middle, and the structural “explanation” is at the bottom. The structure shows a repeated passage (vertical bars) at the beginning and a different repetition (black) in the middle. The contrasting arch-shaped phrases are not literal repetitions, but were found to be similar (diagonal \\\).

Overall, the performance of this analysis software is quite good. The works chosen for analysis are not difficult cases, but on the other hand, the program was not modified or adapted to cope with new problems that arose. To make this a meaningful test, “Freddie” and “We Three Kings” are the first test cases after “Naima.” Thus, the algorithm could be expected to give similar performance on comparable pieces.

11 Future Work

Further work is required to consider other relationships. For example, in “Naima,” there is a rhythmic motive that occurs frequently, making connections between the A and B parts, and there is a descending pattern in the second half of the B part where essentially the same figure is repeated at different transpositions. It should not be too difficult to detect these relationships, if the notes are detected. (In the present example, some of the shorter notes of the figures are not always transcribed.) The difficult problem seems to be deciding which relationships are important and which take priority. Conklin and Anagnostopoulou (2001) looked at a statistical measure for the repetition of a pattern by chance as a way to decide if a relationship is significant or not, and perhaps similar techniques could be applied here.

This work could benefit from better transcription tools. As mentioned earlier, there is work that already demonstrates impressive performance on much more difficult transcription tasks. Another possibility is to apply polyphonic transcription and look for harmonic relationships within a polyphonic performance. We are pursuing this idea now, using a transcription system created by Matija Marolt (2001). We plan to perform an analysis very much like the one described here but using harmonies rather than pitches. This will require a similarity measure for harmony and ways to combine outputs from the transcriber into harmonic regions. It will be interesting to see what this approach does with the piano solo in “Naima.” (Our simple pitch analysis detected very little of the piano solo, so the music analysis is mostly vacant during the solo section, but Marolt’s system captures and transcribes much of the polyphonic piano solo.)

It is important to try this work on a wider range of pieces and to develop techniques that work robustly with all kinds of music. It may be unreasonable to expect a machine to “understand” music as well as humans, but we want the system to be as general as possible. This work might be extended to handle a broader range of pieces.

It is not at all clear that the algorithms presented here are the best for the task. In fact, this work was originally intended as a proof-of-concept demonstration, and it was surprising to see how well it works. An improved version should use a more reliable measure for melodic similarity (Mazzoni & Dannenberg, 2001) and should be less eager to throw out entries in the similarity matrix. Perhaps a hierarchical or lattice-based representation of similarities would be better. Finally, there is much more that can be done in terms of harmonic analysis, melodic tension and resolution, and rhythmic structure.

12 Conclusions

Listening to music is a rich human experience that no computer model can fully replicate. However, some of the principal activities and by-products of music listening may be subject to modeling with simple mechanisms. Specifically, music listening involves the recognition of patterns and relationships. The most important relationship is repetition. This work demonstrates how a model of musical listening can be constructed upon the idea that musical repetition gives rise to structural relationships. Listening is largely a matter of finding and organizing these relationships in order to construct an “explanation” of the music in terms of how each part relates to some other part. This model is realized by a fully automated music analysis system that accepts audio as input and produces a structural description as its output. Motives are identified, and the structural description tells how a small set of motives can be ordered and repeated to form the music as a whole. This information reflects common notions of musical description, including abstract form (e.g. AABA), identification of common themes or motives, and the temporal organization of phrases into 4-, 8-, 12-, and 16-measure groups.

The analysis system has been demonstrated on three examples that include jazz and popular melodies, and in all cases the analysis is quite close to a standard interpretation. Given the difficulties of acoustic analysis, it is quite remarkable how well the system produces explanations of structure within these examples. Currently, the system is somewhat limited by the quality of transcription. With improvements to transcription, future enhancements should allow the identification of transposition as a relationship, thus providing an even more detailed and complete analysis.
13 Acknowledgements

The author wishes to thank his collaborators at the University of Michigan, especially Bill Birmingham. This work is supported in part by NSF Award #0085945.

References

Coltrane, J. (1960). Naima. Giant Steps: Atlantic Records.
Conklin, D., & Anagnostopoulou, C. (2001). "Representation and Discovery of Multiple Viewpoint Patterns." Proceedings of the 2001 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 479-485.
Cope, D. (1996). Experiments in Musical Intelligence (Vol. 12). Madison, Wisconsin: A-R Editions, Inc.
Goto, M. (2001, May). "A Predominant-F0 Estimation Method for CD Recordings: MAP Estimation using EM Algorithm for Adaptive Tone Models." 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, pp. V-3365-3368.
Lartillot, O., Dubnov, S., Assayag, G., & Bejerano, G. (2001). "Automatic Modeling of Musical Style." Proceedings of the 2001 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 447-454.
Marolt, M. (2001). "SONIC: Transcription of Polyphonic Piano Music With Neural Networks." Workshop on Current Research Directions in Computer Music. Barcelona, Spain: Audiovisual Institute, Pompeu Fabra University, pp. 217-224.
Mazzoni, D., & Dannenberg, R. B. (2001). "Melody Matching Directly From Audio." 2nd Annual International Symposium on Music Information Retrieval. Bloomington: Indiana University, pp. 17-18.
Meek, C., & Birmingham, W. P. (2001). "Thematic Extractor." 2nd Annual International Symposium on Music Information Retrieval. Bloomington: Indiana University, pp. 119-128.
Mongeau, M., & Sankoff, D. (1990). Comparison of Musical Sequences. In W. Hewlett & E. Selfridge-Field (Eds.), Melodic Similarity: Concepts, Procedures, and Applications (Vol. 11). Cambridge: MIT Press.
Mont-Reynaud, B., & Goldstein, M. (1985). "On Finding Rhythmic Patterns in Musical Lines." Proceedings of the International Computer Music Conference 1985. San Francisco: International Computer Music Association, pp. 391-397.
Narmour, E. (2000). "Music Expectation by Cognitive Rule-Mapping." Music Perception, 17(3), 329-398.
Rolland, P.-Y., & Ganascia, J.-G. (2000). Musical pattern extraction and similarity assessment. In E. Miranda (Ed.), Readings in Music and Artificial Intelligence (pp. 115-144). Harwood Academic Publishers.
Rowe, R. (1993). Interactive Music Systems: Machine Listening and Composing. MIT Press.
Simon, H. A., & Sumner, R. K. (1968). Pattern in Music. In B. Kleinmuntz (Ed.), Formal Representation of Human Judgment. New York: Wiley. Reprinted in S. Schwanauer and D. Levitt, eds., Machine Models of Music, MIT Press, pp. 83-112.
Stammen, D., & Pennycook, B. (1993). "Real-Time Recognition of Melodic Fragments Using the Dynamic Timewarp Algorithm." Proceedings of the 1993 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 232-235.
Tolonen, T., & Karjalainen, M. (2000). "A computationally efficient multi-pitch analysis model." IEEE Transactions on Speech and Audio Processing, 8(6).
