Identification of Musical Notes in Sheet Music Images Using Colors


CMSC 190 SPECIAL PROBLEM, INSTITUTE OF COMPUTER SCIENCE

Identification of Musical Notes in Sheet Music Images Using Colors
Maria Angela D. Barua and Prof. Margarita Carmen S. Paterno

Abstract— Optical Music Recognition (OMR) reads notes from musical score sheets by analyzing the interplay of various musical symbols, paying special attention to the placement of musical notes on the stave lines or stave spaces to derive their meaning. We present an alternative method for identifying notes. Each of the five stave lines and four stave spaces was colored differently, and the musical note heads were given a uniform color. The results of their overlaid colors were used to identify the letter conversion of each musical note on the G-clef staves.

Index Terms— Optical Music Recognition, Artificial Intelligence, Image Processing, sheet music

I. INTRODUCTION

A. Background of the Problem

Western Music Notation (WMN) has been widely used and accepted by musicians worldwide as a standard format for representing musical compositions. Using this format, musical notes and symbols are placed systematically on a series of five lines and four spaces called staves. Each note, along with its accompanying elements, is then visually identified by humans, enabling them to understand the meaning or semantics of the score sheet as a whole.

Several studies arose from this field of music in which the interpretation of a musical score is automated by a machine instead of a human user. A branch of Artificial Intelligence (AI) known as Optical Music Recognition (OMR) has been developed to address this task. OMR systems take a digital image of a musical score sheet as input and create an output file suitable for computer manipulation. Output formats include wave files (.wav), .pdf files, or even text files that contain the musical score sheet's converted form, which non-music practitioners who have difficulty reading raw musical scores can understand. Applications and systems in this field have been indispensable for both musicians and non-musicians alike, from the interpretation and conversion of a musical score into a form the general public can understand, up to the detailed planning of an orchestral piece, and many more.

To convert a musical score sheet into such formats, OMR systems must identify the stave lines, locate the musical objects on the score, identify the symbols that the objects represent, and understand what the position of the symbols relative to the stave lines and to each other means [11].

Because the input is an image, OMR systems usually make intensive use of various image processing techniques as their engine. But unlike its predecessors in AI such as Optical Character Recognition (OCR), in which text characters against a uniform or non-uniform background are recognized, classifying notes or objects in OMR is made extra difficult by the fact that the musical notes and symbols are superimposed on the stave lines; that is, both the objects of interest (the notes and symbols) and the background (the stave lines) are of the same color. Special techniques must be employed to distinguish which is foreground and which is background.

B. Statement of the Problem

Image processing of musical score sheets has been a challenge for researchers due to the superimposition of notes and symbols on the staves [10]. The musical notes and symbols are usually regarded as more important than the stave lines, for they carry the bulk of the sheet's meaning, and various methods of separating them from each other have been tried in the past. But removing the stave lines usually damages the notes, because some pixels belonging to a note are removed along with the stave lines, thus impairing its recognition.

The semantics or meaning of a musical score sheet is given by the sheet itself as a whole, and notes and symbols are easily recognized by the musician due to the interaction of musical notes with the spaces and stave lines. OMR systems do the same by keeping track of the notes' positions relative to the stave lines or stave spaces. Vertical lines corresponding to note stems, which are possible candidates for a note's x-coordinates, are stored in a list, and components such as the note heads that precede a note stem are stored in another list [1][3][12]. With this, I propose to recognize the meaning of notes and symbols by assigning colors to each of the five lines and four spaces, varying their colors in intensity for the different octaves, such that musical notes, themselves given a color other than black, produce colors matched to a specific note's meaning when they coincide with the lines and spaces.

C. Significance of the Study

The study, which poses a new method for classifying musical notes, would try to successfully identify the meaning of musical notes by mixing their color on the colored stave
Presented to the Faculty of the Institute of Computer Science, University
of the Philippines Los Baños in partial fulfillment of the requirements for the
Degree of Bachelor of Science in Computer Science

© 2006 ICS University of the Philippines Los Baños

line or space it falls into. It aims to further improve musical object recognition by the interactive use and assignment of colors for the stave lines, stave spaces and musical objects.

D. Scope and Limitations

The system implemented in the study focused on the recognition of machine-printed and scanned musical score sheets, taking only the notes assigned on the Treble-Staff, or the right hand, and not the interactive dual of both the Treble-Staff and the Bass-Staff. In the real world, musical notes on the Treble-Staff have accompanying notes on the Bass-Staff which must be played all at once to produce a harmonic blending of tones. The method of synchronizing both staves in an OMR is a different topic, focusing on the semantics of the musical score sheet as a whole. My proposed study, which will only deal with the recognition of musical notes, will not cover this area.

The system has been tested on several full-page machine-printed musical score sheets of varying sizes and resolutions. The recognition of musical notes has been implemented on the first four spaces and first five lines only, plus the first three upper and lower ledger lines.

E. Objective of the Study

The general objective of our study is to implement and test a method that may further enhance the recognition of musical notes for OMR systems using colors. It will make use of the result of blending unique colors assigned to each stave line, stave space and musical note to classify the meaning of a musical note.

The specific objectives are:
1) to assign different colors for the staff lines,
2) to assign different colors for the staff spaces,
3) to assign a uniform non-black color to the musical notes,
4) to identify the meaning of the musical notes based on their color blended with the stave line or stave space; and
5) to make a letter-based output (C,D,E,F,G,A,B) of the musical note's converted form.

II. REVIEW OF RELATED LITERATURE

Converting musical score sheets into formats suitable for computer manipulation, and into general formats that non-music-inclined individuals can understand, has driven researchers in the field of Computer Science to explore the possibilities and intricacies of doing so.

The Standard Music Notation (SMN) is the widely used music-composition writing format that started in the 17th century. It involves the positioning of note characters on one or more staves that consist of five lines and four spaces, each representing a particular musical tone that occupies different positions on the Treble Staff and the Bass Staff. Lines and spaces represent sequential tones, and the note characters differ according to their duration. Most people consider it more difficult to mentally translate the position of each note character to the position that will generate the indicated tone on a musical instrument, especially in fast musical passages, where the mind has little time to process the position of the note character and translate it to the corresponding tone on the instrument. Thus, for most people the only difficult aspect of learning to read SMN is being able to determine in an instant what tone is being represented by a note on the staff [10].

Fig. 1. Parts of a Staff.

One of the initial challenges in any OMR system is the treatment of the staves. For musicians, stave lines are required to facilitate reading the notes. For the machine, however, they become an obstacle, making the segmentation of the symbols very difficult. The task of separating background from foreground figures is an unsolved problem in many machine pattern recognition systems in general. There are two approaches to this problem in OMR systems. One way is to try to remove the stave lines without removing the parts of the music symbols that are superimposed. The other method is to leave the stave lines untouched and devise a method to segment the symbols [8].

Most OMR systems first locate the position of the stave lines prior to removing them. Many solutions and approaches have already been proposed, and among them is the use of horizontal and vertical projections. Five obvious peaks corresponding to the location of stave lines are detected in horizontal projection. Vertical projection, on the other hand, produces peaks corresponding to the width of the musical notes and symbols. Fujinaga, I., in his Adaptive Optical Music Recognition (AOMR) in 1988, used these methods along with connected components analysis and run-length coding to successfully isolate parts of a musical score sheet into the primitive parts: the note stems, note heads, bar lines, slurs, and symbols and, finally, the staff itself.

Groups of compound musical objects and singles would be isolated after removing the stave lines. The next phase would be to recognize the primitive objects that make up each object. The primitives recognized here would later be used to analyze the note's semantics, for there are many ways these may be combined to form a note. For example, a note head with a stem, two bars and a dot would make up a 16th note with one-half of its duration added, depending on the time signature. A commonly used method for the recognition of primitive objects is to use a set of templates that would be matched against objects that have been isolated [7][1][6].

Fig. 2. Primitive objects that make up a musical object.

Fig. 3. Combinations of primitive objects.

The meaning of musical notes is determined by their position or placement on the stave line or stave space [1][3][12]. Bainbridge's CANTOR System [1], Fujinaga's Adaptive Optical Music Recognition (AOMR) and McPherson's proposal for Coordinating Knowledge Within an OMR [11] first remove the stave lines to separate the musical notes and objects before performing musical note identification and semantics.

The concepts and functions of an OMR system have been implemented in various music notation packages and applications during the past few years, with the intent of helping musicians and non-music-inclined individuals understand or create musical score sheets and compositions of their own. These applications were made to create and read certain music file formats like MIDI, pdf, GUIDO, and MusicXML. Of these, MusicXML, which was developed by Recordare LLC in January 2004, is a promising XML-based digital sheet music interchange and distribution format for common Western Music Notation. Musical information obtained through MusicXML is designed for an Internet-friendly method of publishing music scores, and it is usable by most notation programs and databases.

One of the notable and highly distinguished "industry-standard" OMR applications is the Finale Notation Program written by Phil Farrand in 1988 for MakeMusic. It has undergone several releases and its current version is the Finale 2009. Finale is capable of creating musical compositions with a highly interactive GUI, and it also includes a function for optically recognizing printed music from a scan. It can also import and export MIDI files, and play back music using a large range of audio samples [wikipedia].

Audiveris Music Scanner [13] is the first Java-based open-source Optical Music Recognition tool, whose current version 3.0 was released on 28-Jan-2008. Using the JAI (Java Advanced Imaging), it provides high-level logical music information compliant with the MusicXML definition about an image of printed sheet music only, with support for recognizing handwritten scores still under development. Audiveris is able to detect and correct the potential skew angle of the sheet; remove staff lines and detect systems, staves and measures; extract individual music items based on their appearance and their structure; translate music based on the extracted music items driven by music grammar; manually correct remaining errors; export logical music information using the MusicXML format definition; and provide direct audio playback using a MusicXML/MIDI interface.

Some OMR systems have applied the use of colors in processing musical score sheets. Lin, K. and Bell, T. [2] have proposed to color the stave lines in a much lighter hue than the musical objects. Kestenbaum, D., and Hyman, V.M. proposed a colored music notation system using seven colors that are easily distinguishable from one another. George, S.E. and Sheridan, S. [9] proposed to improve musical note identification by defacing the musical score sheet: additional stave lines are added while maintaining the original, so that the notes would look similar regardless of their position on the staff.

III. THEORETICAL FRAMEWORK

In general, Optical Music Recognition needs to keep track of the locations or coordinates of the various musical objects that make up a musical score sheet, not only to facilitate the recognition of each object, but also to determine its semantics or meaning. Several image processing techniques have been proven to work well for these cases. They include the use of horizontal and vertical projections [1][2][6], run-length coding analysis [7], connected components analysis and template matching.

A. Horizontal Projection/Vertical Projection

These are image processing techniques in which pixels of a certain color of interest are counted, at the pixel level, row by row along the image's width for horizontal projection, and column by column along the image's height for vertical projection. High pixel counts indicate the presence of many colored pixels on that particular row or column. Black horizontal stave lines would produce five noticeable peaks higher than the rest, and vertical note stems would have peaks higher than most of the elements projected vertically.

Fig. 4. Horizontal projections (blue) and vertical projections (red) of a staff segment.

B. Binary Image

A colored image is turned into black and white pixels only, depending on an intensity threshold.

C. Run Length Coding

Run-length coding counts the frequencies of certain-colored pixels in a sequence. Identical numbers are represented by the number and the length of the run. For example, the sequence {3 3 3 3 5 5 9 9 9 9 9 9 9 9 9 9 9 9 6 6 6 6 6} can be coded as {(3, 4) (5, 2) (9, 12) (6, 5)}, while binary images, whose values can only be 1 or 0, need only the length of each run. As another example, the sequence (1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1) could be coded as (7, 4, 13, 8, 2), assuming 1 starts a sequence (if a sequence starts with a 0, a length of zero would be used). Vertical run-length coding is a compact representation of the binary image matrix column by column [8].
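The two encodings in the run-length examples above can be sketched in C++ as follows. This is a minimal illustration, not the study's imagelab-based code; the function names `encodeRuns` and `encodeBinaryRuns` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Encodes an arbitrary sequence as (value, run length) pairs.
std::vector<std::pair<int, std::size_t>> encodeRuns(const std::vector<int>& values) {
    std::vector<std::pair<int, std::size_t>> runs;
    for (int v : values) {
        if (!runs.empty() && runs.back().first == v) {
            ++runs.back().second;      // same value: extend the current run
        } else {
            runs.emplace_back(v, 1);   // new value: start a run of length 1
        }
    }
    return runs;
}

// Encodes a binary sequence as run lengths only. Following the convention in
// the text, the first length counts 1s, so a sequence that starts with 0
// begins with a run of length zero.
std::vector<std::size_t> encodeBinaryRuns(const std::vector<int>& bits) {
    std::vector<std::size_t> lengths;
    int current = 1;        // value of the run being counted
    std::size_t run = 0;    // length of that run so far
    for (int b : bits) {
        assert(b == 0 || b == 1);
        if (b == current) {
            ++run;
        } else {
            lengths.push_back(run);
            current = b;
            run = 1;
        }
    }
    if (!bits.empty()) {
        lengths.push_back(run);  // close the final run
    }
    return lengths;
}
```

Applying `encodeBinaryRuns` to each column of a binary image, top to bottom, would give the vertical run-length representation mentioned above.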

D. Connected Components Analysis

Connected components analysis is a method used to check if a group of pixels forms an object. All the pixels in a connected set are said to be adjacent or touching. A formal definition of connectedness is: "Between any two pixels in a connected set, there exists a connected path wholly within the set." Thus, in a connected set, one can trace a connected path between any two pixels without ever leaving the set. Connected components analysis is useful for detecting bars, slurs and other small musical symbols in a musical score sheet [8].

Fig. 5. Stave lines' color assignments and corresponding note letter conversions.
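The definition of connectedness in Section III-D can be sketched as a flood-fill labeling in C++. The text does not say whether diagonal neighbours count as touching; the sketch below assumes they do (8-connectivity), and the name `labelComponents` and the `Grid` type are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 0 = background, non-zero = ink

// Gives every foreground pixel the index (1, 2, ...) of the connected set it
// belongs to and leaves background pixels at 0. Returns the number of sets.
int labelComponents(const Grid& image, Grid& labels) {
    const std::size_t rows = image.size();
    const std::size_t cols = rows == 0 ? 0 : image[0].size();
    for (const auto& row : image) {
        assert(row.size() == cols);  // the image must be rectangular
    }
    labels.assign(rows, std::vector<int>(cols, 0));
    int count = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (image[r][c] == 0 || labels[r][c] != 0) continue;
            ++count;  // unlabeled ink pixel: a new connected set starts here
            labels[r][c] = count;
            std::queue<std::pair<std::size_t, std::size_t>> frontier;
            frontier.push(std::make_pair(r, c));
            while (!frontier.empty()) {
                const std::pair<std::size_t, std::size_t> p = frontier.front();
                frontier.pop();
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        // Unsigned wrap-around turns an index of -1 into a
                        // huge value, which the bounds check below rejects.
                        const std::size_t y = p.first + static_cast<std::size_t>(dy);
                        const std::size_t x = p.second + static_cast<std::size_t>(dx);
                        if (y >= rows || x >= cols) continue;
                        if (image[y][x] == 0 || labels[y][x] != 0) continue;
                        labels[y][x] = count;
                        frontier.push(std::make_pair(y, x));
                    }
                }
            }
        }
    }
    return count;
}
```

Two pixels end up with the same label exactly when a path of touching ink pixels joins them, which is the property the quoted definition describes.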

E. Template Matching

Template matching aims to give a correct guess of an object's group or classification by comparing it with a set of templates that are more or less different renditions and samples of the said object obtained from other sources. If an object is found to be quite like the objects in the template set, then that object must also be a member of that group or class. The object's invariant features, or characteristics that remain constant in spite of the image's rotation, skew, scale and other severe warping factors, must be wisely selected before comparing with those same features in the set. The number of black pixels and the ratio of black pixels with respect to area are the most commonly used features for the template matching of musical symbols. K-Nearest Neighbors, one of the most effective and popular classifiers and template matching methods, in which an object automatically becomes a member of the class whose members are nearest to it by voting, has been implemented in Fujinaga's AOMR and Bainbridge's CANTOR system [1][2][6].

F. Artificial Intelligence (AI)

A branch of Computer Science whose goal is to understand intelligence by building computer programs that exhibit intelligent behaviors. It is concerned with the concepts and methods of symbolic inference, or reasoning by a computer, and how the knowledge used to make those inferences will be represented inside the machine [5].

IV. MATERIALS AND METHODS

The method has been implemented using C++ and the imagelab libraries. The executables produced by C++ have been integrated into a simple user interface (made with Visual Basic 6.0) to facilitate the workflow. Input images must have a 24-bit depth in order for imagelab to read them correctly.

A. Assignment of Colors

Colors would be assigned to each of the five stave lines and four spaces. A set of dark colors has been chosen to represent them. Pure cyan blue (6,105,178) will be assigned for the note heads and other musical symbols.

Fig. 6. Stave spaces' color assignments and corresponding note letter conversions.

B. Image acquisition and normalization

The image obtained at this stage may or may not be purely black and white, and some may even be uneven in coloring. All input images must first be binarized, or converted to their black-and-white versions, a large sea of 1's and 0's. Then, the horizontal projection of the binarized sheet music was derived. A group of five prominent peaks from the horizontal projection would correspond to the staff lines. Their y-coordinates would be recorded for future use.

Fig. 7. Horizontal projection of a straight sheet music.
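The binarization and projection steps of Section IV-B can be sketched in C++ as follows. This is a sketch under stated assumptions rather than the study's implementation: the threshold of 128 on the mean of the three channels, the rule that a row is a staff-line candidate when at least half of its pixels are black, and all type and function names are hypothetical and do not come from imagelab.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };
using RgbImage = std::vector<std::vector<Rgb>>;     // rows of 24-bit pixels
using BinaryImage = std::vector<std::vector<int>>;  // 1 = black, 0 = white

// Turns a colored image into 1s and 0s by thresholding the mean intensity.
BinaryImage binarize(const RgbImage& image, int threshold = 128) {
    BinaryImage out;
    out.reserve(image.size());
    for (const auto& row : image) {
        std::vector<int> bits;
        bits.reserve(row.size());
        for (const Rgb& p : row) {
            const int intensity = (p.r + p.g + p.b) / 3;
            bits.push_back(intensity < threshold ? 1 : 0);
        }
        out.push_back(bits);
    }
    return out;
}

// Horizontal projection: the number of black pixels in each row.
std::vector<std::size_t> horizontalProjection(const BinaryImage& image) {
    std::vector<std::size_t> counts;
    counts.reserve(image.size());
    for (const auto& row : image) {
        std::size_t black = 0;
        for (int bit : row) {
            if (bit != 0) ++black;
        }
        counts.push_back(black);
    }
    return counts;
}

// Vertical projection: the number of black pixels in each column.
std::vector<std::size_t> verticalProjection(const BinaryImage& image) {
    const std::size_t width = image.empty() ? 0 : image[0].size();
    std::vector<std::size_t> counts(width, 0);
    for (const auto& row : image) {
        assert(row.size() == width);  // the image must be rectangular
        for (std::size_t x = 0; x < width; ++x) {
            if (row[x] != 0) ++counts[x];
        }
    }
    return counts;
}

// Returns the y-coordinate of each peak in a horizontal projection. A peak is
// a run of adjacent rows whose black count reaches `fraction` of the image
// width; the centre row of the run is reported, so thick lines count once.
std::vector<std::size_t> findStaffLineRows(const std::vector<std::size_t>& projection,
                                           std::size_t width, double fraction = 0.5) {
    assert(fraction > 0.0 && fraction <= 1.0);
    const double cutoff = fraction * static_cast<double>(width);
    std::vector<std::size_t> rows;
    std::size_t y = 0;
    while (y < projection.size()) {
        if (static_cast<double>(projection[y]) < cutoff) {
            ++y;
            continue;
        }
        const std::size_t start = y;
        while (y < projection.size() && static_cast<double>(projection[y]) >= cutoff) {
            ++y;
        }
        rows.push_back((start + y - 1) / 2);
    }
    return rows;
}
```

On a straight page, `findStaffLineRows(horizontalProjection(binarize(page)), width)` should return five y-coordinates per staff, which is the property the skew check in the next subsection relies on.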

Fig. 8. Detected staves.

C. Image Skew Correction and Detection

The input image, if obtained through scanning, could be subjected to varying degrees of skew, which must first be corrected prior to note identification.

Fig. 9. Horizontal projection of a skewed sheet music.

Three methods have been considered to correct the skew, but only the first has been implemented. First, the image will be rotated until a satisfactory horizontal projection similar to Figure 7 is obtained; that is, the number of prominent peaks corresponding to staff lines must be divisible by 5.

The second method, adapted from the de-skewing method of [8], is outlined as follows:
1) Take the narrow strip (currently set at 32 pixels wide) at the center of the page and take a y-projection. Make this the reference y-projection.
2) Take a y-projection of an adjacent vertical strip to the right of the center strip. Shift this strip up and down to find the offset that results in the best match to the reference y-projection. The best match is defined as the largest correlation coefficient, which is calculated by multiplying the two y-projections.
3) Given the best-correlated offset, add the two projections together and make this the new reference y-projection. The offset is stored in an array to be used later.
4) If not at the end (right side) of the staff, go back to Step 2.
5) If the right side of the page is reached, go back to Step 2, but this time move from the center to the left side of the page.
6) Once the offsets for the strips of the entire page are calculated, these offsets are used to shear the entire image.

The third method, the use of gradient magnitude in marking and finding the angles of significant edges, has been borrowed from imagelab's d edge.cpp. The angles of all black pixels in relation to the whole image are computed, and the most dominant angle, which will correspond to the skewed lines, will be the image's overall skew angle. The input image will then be rotated by this angle to correct the skew.

D. Whole Notes Detection

Overall, note head detection would be relatively simple if there were no whole note heads on the score. Whole notes are a special kind of note that look like zeros with holes in between. Whole notes must first be covered and filled up as smoothly as possible for them to be detected as solid note heads. The original input images have been dilated and filled staff-line-height times using an 8x8 structuring element. The resulting image was a blobbed-up, thickened version of the regular thin image, covering most holes from the whole notes and defacing the regular note heads to some degree. To restore the original image without also restoring the newly filled whole notes, the resulting image was eroded, shrinking it to its actual size at the edges but leaving the filled spaces untouched. It was then XORed with the original image. The resulting image is a good enough revision of the score such that whole note heads are adequately filled and some regular notes are thickened.

Fig. 10. Original staff segment and its dilated/eroded/x-or version. Notice that whole note heads and other hollow parts are filled.

E. Staff lines processing

The whole page of sheet music was cut into staves, adding an extra staff height above and below each detected staff. Each piece of staff would undergo the process for finding note heads. The y-coordinates of the five prominent peaks corresponding to the horizontal staff lines would be recorded, and colors will be assigned to each horizontal line. An effective method for the removal of stave lines, which uses vertical black runs, has been demonstrated by Fujinaga [6]. First, the vertical black runs of the entire sheet were taken. The most frequent black runs would make up the staff lines, because they cover most of the page. All vertical runs smaller than or equal to the most common black runs (the staff lines) would be removed, leaving most of the musical objects without the staff lines. The staff line height and staff space height could also be deduced from the initial vertical black runs.

Fig. 11. Horizontal and vertical projections of a staff segment.

Prior to this, note heads from score sheets of various sizes have been sampled, and it was found that a more or less steady ratio of 9 pixels of note head height to 12 pixels of note head width holds, scaling up or down depending on the size of the staff space height. The staff space height is also a good estimate of the note head's height. The note head's height, width and area could be determined from this point: the note's height, noteHeight, would be assigned as the staff space's height, and the note head's width would be noteWidth = (noteHeight * 12) / 9.0.

F. Finding Note Segments

Vertical projection would next be performed. This would produce peaks corresponding to various musical objects, with the highest peaks on the long, vertical note stems and bar lines. Since note stems indicate the presence of note heads, the note stems' x-coordinates would be recorded for future reference. Note segments are groups of vertical projections that are closely packed together, corresponding to the possible x-coordinates of the note heads or any other musical symbol. Note segments from the dilated version would reveal the missing note segments of the whole notes from the original input image.

Fig. 12. Note segments are associated with vertical projections.

G. Extraction of Note Heads

Since we are only interested in knowing the notes' meaning and not their other semantics, musical symbols other than note heads are irrelevant and would be considered noise. This includes the sharps, flats, beams, slurs, bar lines, and other musical symbols, with the vertical note stems included. These must be removed such that only the note heads (as much as possible) remain. To remove bar lines and other thick or thin horizontal symbols, symbols whose horizontal black runs are longer than 2.0*noteWidth should be removed. A way to remove long vertical lines is by removing all black pixels whose note stack width from Figure 10 is higher than 60 percent of the staff space height. Another method would be to remove all horizontal black runs whose widths are greater than the staff space width. After removing the note stems and other symbols, only the note heads and other small symbols will remain. Color them pure cyan blue (6,10,178).

Fig. 13. Staff lines removed from a staff segment.

Fig. 14. Barlines and other symbols removed from a staff segment.

Fig. 15. Miscellaneous symbols smaller than and greater than note stacks' width removed.

Fig. 16. Extracted note heads intersected with vertical lines.

H. Identification of note heads

For each detected note stack, all notes detected on staff lines have been recorded as a cyan-blue, 1-pixel-wide stick was run down through the stack vertically. The stick's color was blended, or overlaid, with the color of the line it passes through, and the matched letter conversion was recorded. The notes on staff spaces were processed next in the same manner. However, when dealing with notes on staff spaces, it became necessary to check if the blocks/components making up two or more directly adjacent (touching) notes are higher than the staff space width, because staff lines in between 2 consecutive notes on staff spaces are not removed and would be classified as a note belonging to a staff line. If the components are higher than 1 staff space, meaning they could be 2 or more consecutive notes, all staff lines on them are removed (the spaces occupied by the removed staff lines are colored white). If they have only one unit less of staff line height of removed staff lines in between, then these components would be separated, and the staff lines between them would be removed from the encountered lines.
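The blending step of Section IV-H can be sketched in C++ as follows, taking the caption of Fig. 17 (mixing pixel colors by averaging R, G and B) as the blend rule. The nine palette colors below are hypothetical placeholders, since the actual values from Figs. 5 and 6 are not reproduced in the text; the note color (6,105,178) is the value given in Section IV-A; and matching the blended color by exact equality is an assumption. The letters follow the standard G-clef layout (lines E G B D F, spaces F A C E, bottom to top).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

bool operator==(const Rgb& a, const Rgb& b) {
    return a.r == b.r && a.g == b.g && a.b == b.b;
}

// Mixes two colors by averaging each channel (integer division).
Rgb blend(const Rgb& a, const Rgb& b) {
    return Rgb{static_cast<std::uint8_t>((a.r + b.r) / 2),
               static_cast<std::uint8_t>((a.g + b.g) / 2),
               static_cast<std::uint8_t>((a.b + b.b) / 2)};
}

struct StaffPosition { Rgb color; char letter; };

// Hypothetical dark palette, ordered from the bottom line to the top line of
// a G-clef staff, alternating lines and spaces.
const std::array<StaffPosition, 9> kTrebleStaff = {{
    {{128, 0, 0}, 'E'},    // line 1
    {{0, 128, 0}, 'F'},    // space 1
    {{0, 0, 128}, 'G'},    // line 2
    {{128, 128, 0}, 'A'},  // space 2
    {{128, 0, 128}, 'B'},  // line 3
    {{0, 128, 128}, 'C'},  // space 3
    {{64, 64, 64}, 'D'},   // line 4
    {{96, 32, 0}, 'E'},    // space 4
    {{32, 0, 96}, 'F'},    // line 5
}};

// Note head color as given in Section IV-A.
const Rgb kNoteColor = {6, 105, 178};

// Returns the letter of the staff position whose color, blended with the note
// color, equals the observed pixel color, or '?' when nothing matches.
char identifyNote(const Rgb& observed) {
    for (const StaffPosition& position : kTrebleStaff) {
        if (blend(position.color, kNoteColor) == observed) {
            return position.letter;
        }
    }
    return '?';
}
```

For a lookup like this to be unambiguous, the palette has to be chosen so that all nine blended colors differ, which holds for the placeholder values above.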

Fig. 17. Mixing pixel colors by getting the average R,G,B of each.

Fig. 18. Encountered notes on staff lines.

Fig. 19. Encountered notes on staff spaces.

V. RESULTS AND DISCUSSIONS

Executable files have been created from C++ and the imagelab libraries to demonstrate the proposed methods of identifying musical notes using colors. These were then integrated into a simple interface created using Visual Basic 6.0 to aid sheet music input, selection, conversion and display.

The interface provides an easy way of feeding input images to the C++ executables. A user is able to browse for a particular sheet music image and select which staff or staves will be read and converted to letter notation. Converted notes per staff are shown on the right side panel, while errors, warnings and miscellaneous notices are displayed on the bottom panel. A user is also able to save the letter conversions into a text file. The interface is also used for clearing all traces of computations and image outputs by the C++ executables upon start and exit.

Fig. 20. Summary of identified notes for Fig. 21

Using the methods described, the proposed system was able to correctly identify more or less 96 percent of all notes (on the G-clef) of the given sheet music sample. However, there are still imperfections in the methods, which resulted in missed, incorrectly identified and falsely identified notes.

The major factor in these errors is the failure to completely delete miscellaneous musical symbols such as the bar lines, extended note stems and accidentals. Undeleted symbols would add to a detected note's width, making it wider and causing it to be associated with other musical symbols that have large widths. After the initial deletion of all musical symbols, the remaining musical symbols that have the same width as the general staff's note width could be falsely classified as notes (see Fig. 20 and Fig. 21).

Because the staff space height and the staff's general note width are the basis of the possible locations of notes or stacks of notes, tied notes whose widths are wider than regular single note heads are currently not read (see Fig. 3).

Another factor is the partially filled-up whole note heads, which could not be counted as regular black note heads. The dilation and erosion of whole notes via a staff-line-height structuring element sometimes still leaves spaces or small holes, thus not covering the entire area.

Depending on the input images' clarity, resolution and level of detail, dilation and erosion could invade extra space on the staff space which the original note was not supposed to occupy. This could result in a note belonging to a staff line being identified as a note on a staff space, and vice versa.

The skew correction is found to work best on clear copies of skewed musical score sheets. The smaller the skew, the shorter the time it will take for the skew to be corrected, for its method is to rotate the image by increments of positive or negative 0.001 degrees until a satisfactory number of horizontal projection peaks is achieved.

VI. CONCLUSIONS AND RECOMMENDATIONS

Identification of musical notes using colors, which makes use of the color output matched to the letter conversion of notes as a vertical note stick passes through detected vertical segments of stacks of notes, is an efficient and straightforward method of classifying the meaning of musical notes. However, its results could be clouded by an unsuccessful extraction of note heads and deletion of unnecessary musical symbols. The results of the image processing methods required to preprocess the input musical score sheet rely on the image's quality: its size and resolution, orientation and manner of encoding. Sheet music may have different printing syntax and a varying standard of drawing notes and other musical symbols per author or publisher, and thus a varying standard of note head sizes in relation to the other musical symbols. Overall, these are the key factors affecting the successful identification of staff lines and musical symbols. Binarization of the image could turn black pixels white if not given the proper threshold. Dilation could make non-connected note heads connected, while it could also fill up the empty spaces of whole notes. Horizontal and vertical black run-length coding, while most useful in isolating staff lines from the rest of the musical symbols, could also delete note heads as it deletes bar lines and other miscellaneous images, if not given the proper estimate of the note heads' height, width and area. For the best results, it is advised to obtain an input image that is neither too large nor too small.

Though there are still slight errors in note identification due to the said deficiencies, the system described in this study is found to work best on non-skewed, machine-printed musical score sheets whose widths are less than or equal to 2000 pixels.

The system could be further enhanced by a more accurate method of note head extraction, along with the deletion of unnecessary musical symbols. A note head extractor that could work on general and special cases of sheet music layouts would further improve the system. It is also recommended that an automatic detection of G-clef staves from F-clef staves be made. A better method of binarizing gradient-filled scanned

musical score sheets is also yet to be implemented, and a straightforward method of determining the skew angle and correcting the skew by the detected angle is also recommended. Furthermore, the use of colors in identifying notes could also be tested on both the G-clef and the F-clef staves.

Fig. 21. Melodies of Life (Final Fantasy 9 OST)

Fig. 22. Optical Music Recognition Using RGB

Fig. 23. Unclean scanned musical score sheet (top) and binarized image (bottom)
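The dependence of run-length coding on good size estimates, noted above, can be illustrated with a minimal sketch. It assumes a binarized image stored as a list of rows with 1 for black and 0 for white, and it uses a common OMR heuristic (the most frequent black and white vertical run lengths) rather than the exact procedure of this study. The function names and the synthetic test staff are hypothetical and introduced only for illustration.

```python
from collections import Counter


def vertical_runs(image):
    """Yield (value, length) for every vertical run in a binary image.

    `image` is a list of rows; each pixel is 1 (black) or 0 (white).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for x in range(width):
        run_value = image[0][x]
        run_length = 1
        for y in range(1, height):
            if image[y][x] == run_value:
                run_length += 1
            else:
                yield run_value, run_length
                run_value, run_length = image[y][x], 1
        yield run_value, run_length  # last run of the column


def estimate_staff_metrics(image):
    """Return (staff_line_height, staff_space_height).

    Assumption: staff lines and staff spaces dominate the page, so the
    most common black run is the line height and the most common white
    run is the space height.
    """
    black, white = Counter(), Counter()
    for value, length in vertical_runs(image):
        (black if value == 1 else white)[length] += 1
    if not black or not white:
        raise ValueError("image must contain both black and white runs")
    return black.most_common(1)[0][0], white.most_common(1)[0][0]


def synthetic_staff(width=10, line=2, space=5, margin=3):
    """Build a hypothetical five-line staff for demonstration only."""
    rows = [[0] * width for _ in range(margin)]
    for i in range(5):
        rows += [[1] * width for _ in range(line)]
        if i < 4:
            rows += [[0] * width for _ in range(space)]
    rows += [[0] * width for _ in range(margin)]
    return rows


print(estimate_staff_metrics(synthetic_staff()))  # (2, 5)
```

Estimates of this kind could then bound the expected note head height and width, so that run-length deletion of bar lines does not also remove note heads.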

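One possible way to determine the skew angle, as recommended above, is a projection-profile search. The sketch below is an assumption and not the method used in this study: for each candidate angle it shears the black pixels vertically, counts the black pixels per row, and keeps the angle whose row profile is most sharply peaked. The shear approximates a rotation only for small angles, and all function names, the search range, and the step size are hypothetical.

```python
import math


def row_profile(image, angle_deg):
    """Count black pixels per row after shearing by `angle_deg`.

    `image` is a list of rows with 1 for black and 0 for white. A line
    that runs along y = y0 + x * tan(angle) is mapped back onto row y0.
    """
    height, width = len(image), len(image[0])
    slope = math.tan(math.radians(angle_deg))
    profile = [0] * height
    for y in range(height):
        row = image[y]
        for x in range(width):
            if row[x]:
                shifted = y - round(x * slope)
                if 0 <= shifted < height:
                    profile[shifted] += 1
    return profile


def estimate_skew(image, max_angle=5.0, step=0.25):
    """Return the candidate angle (degrees) with the sharpest profile.

    Sharpness is scored as the sum of squared row counts, which is
    largest when the staff lines collapse onto single rows.
    """
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        score = sum(count * count for count in row_profile(image, angle))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def skewed_staff(angle_deg, width=200, height=80):
    """Draw five one-pixel staff lines skewed by `angle_deg` (demo only)."""
    slope = math.tan(math.radians(angle_deg))
    image = [[0] * width for _ in range(height)]
    for y0 in (20, 30, 40, 50, 60):
        for x in range(width):
            image[y0 + round(x * slope)][x] = 1
    return image


print(estimate_skew(skewed_staff(2.0)))  # 2.0
```

Correcting the skew would then amount to shearing or rotating the image by the negative of the detected angle. A finer step or a coarse-to-fine search could be substituted where more precision is needed.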
VII. ACKNOWLEDGMENT
Many thanks to Him for inspiring me with ideas when my mind goes blank; to Ma'am Marge for being His vessel of guidance and for providing me with wonderful ideas and suggestions for the topic; to Sir Vlad for the priceless knowledge shared in the CMSC170 and CMSC191 sessions; and to my dear loving parents, to whom all of this is dedicated.

