
2020 24th International Conference on System Theory, Control and Computing (ICSTCC)

Music panel: An application for creating and editing music using OpenCV and JFugue

Oana Săman, Robert Muscaliuc, Loredana Stanciu
Automation and Applied Informatics Department, Politehnica University Timisoara
Timisoara, Romania
[email protected] [email protected] [email protected]

Abstract—Music is the product of human creativity and the mastery of musical instruments. There is a large number of musical instruments, whether electric, acoustic, wind or percussion. Some areas of research explore the creation of music through human-computer interaction. This paper proposes a way to generate sounds with the help of a wall-installed panel with notes drawn on it. The notes are recognized with a webcam and computer vision algorithms, and then played back to the user in real time. The application was created with the aim of giving people who want to express their creativity through music the chance to do so without the need for musical instruments, in a simple and novel way.

Keywords—music panel, music note generation, human-computer interaction, computer vision, OpenCV, JFugue

I. INTRODUCTION

The evolution of technology also brought progress in the way music is created and represented. Not long ago, music was created only with acoustic instruments. In the twentieth century, the creation of acoustic instruments slowed down and, with advancements in the field of electronics, new electrical instruments appeared, such as the electric guitar or the synthesizer. From the second half of the century up until now, computers have been used to generate and synthesize sound. However, the way a musician interacts with the computer in order to create music has not changed very much. Music creation often comes down to using software controlled with the keyboard, mouse, or electronic instruments called MIDI controllers. Currently, in addition to these controllers, mobile phones and tablets are used for their ability to generate sounds of any duration, pitch, intensity or timbre, and for the user's ability to interact with these devices through touch.

One idea researched lately for music creation through human-computer interaction is the creation of musical instruments that can be used to interact with the computer naturally, i.e. without keyboard, mouse or touch, by using a touchless interface or gestures [1]. For example, Soundbeam is an interactive MIDI hardware and software system developed by the Soundbeam Project/EMS in which ultrasonic sensors are used to detect motion and control the system for sound generation [2]. The system was used mainly in the field of education for people with special needs, due to the minimal physical movement required to operate it. Soundbeam uses a combination of a tangible interface (a foot controller) and a non-tangible interface (sonar) to generate MIDI event messages. There are two ways of interaction: the proximity and distance from the system, corresponding to the increase and decrease in the pitch of the musical notes, and playing a musical note when the ultrasonic sensor detects an object in front of it.

There are also applications that combine the idea of simulating playing an instrument with learning music in an easy way, with the help of games such as Guitar Hero or Rock Band [3]. The interaction between the game and the user is achieved through peripherals that mimic musical instruments, which furthers the immersion into the game [4]. The game checks whether the note or the chord displayed on the computer screen is the same as the one received from the user through the peripherals. If the result of the check is positive, the player will hear a clear and linear melodic line. One drawback of these applications is the lack of the possibility to create one's own musical compositions, but they still proved to be a very good resource in music education [3].

The availability of inexpensive hardware and powerful free software tools is fueling an explosion in the use of computer vision for art interfaces, enabling a new expressiveness in both performance and audience-interactive art [5]. For instance, in [6] a vibraphone (a percussion instrument that is played by striking metal bars with mallets) is used in combination with a Kinect camera. The camera was used to track the mallet positions and create new possibilities of audio control of the vibraphone based on those positions. In the study, one mallet was used for creating sound distortion and the other was used to raise or lower the pitch. The aim is to create a so-called hyperinstrument, a musical instrument that uses the power of computers to improve its output, in order to increase musical expression. In another paper [10], a method for generating piano music using dance movements was proposed. The movements are recorded with a webcam and the coordinates of the joints produce musical chords.

The idea of using a natural user interface with the computer in order to create music has been explored in this paper. An application was created that lets the user compose songs by using a wall-installed panel with notes drawn on it, indicating the notes with a blue-colored indicator and recognizing the notes shown with the help of computer vision algorithms. The note that is recognized on the panel is then played by the application in real time. The application is easy to install and use, does not involve high costs, and simulates playing a musical instrument as close to reality as possible.

The project presented in this paper allows the creation of music (in a simple variant) through human-computer interaction, without the need for a MIDI controller or mouse/keyboard. It also offers the ability to imitate ten musical instruments and to create, play and edit music. The application's main features are:

• Low cost, with minimal physical resources: a wall panel and a web camera;
• Easy and intuitive playing thanks to the organization of notes on the panel;
• Recognition of the musical notes with computer vision algorithms from the OpenCV library;
• Playing the musical notes in real time using the MIDI capabilities offered by JFugue, a Java library;
• Ability to save a song;
• Ability to edit saved songs by changing the rhythm, adding and changing an instrument, changing the duration of a note or group of notes, and changing the octave of a note or group of notes;
• Ability to record and play a song over an already existing one.

II. METHODS

The purpose of the application is to simulate a musical instrument using a physical panel with musical notes represented on it, recognizing those notes with the help of a web camera and computer vision algorithms offered by the OpenCV library. The notes are then played using a Java music programming API called JFugue.

A. Detection of musical notes

The panel contains 66 musical notes, organized in a table, and each cell of the table represents a note. This is an adaptation of the musical keyboard layout called Wicki-Hayden [7]. The Wicki-Hayden format represents the notes in hexagonal cells and groups the notes of the major scale in one place, thus minimizing hand movement between notes. For the application described in this paper, the notes are enclosed by black rectangles, each note having a specific rectangle on the panel. The notes are organized so that three and a half octaves are available to play, and there are 66 rectangles for these notes.

Using computer vision algorithms from the OpenCV library, every note on the panel is determined by recognizing all the rectangles. The user then shows a note on the panel with the help of a blue indicator. The rectangle which contains the blue indicator is found, and thus the note that was indicated is recognized. This note can now be played in real time. A use-case diagram of the application is shown in Fig. 1.

Fig. 1. Use-case diagram

For detecting the black rectangles, the image is first converted to grayscale in order to discard color information and operate only with luminous intensity. The conversion to grayscale is made with the cvtColor() function. Then, the contrast of the image is improved through histogram equalization (the equalizeHist() function). Thresholding then applies a mask over the pixels in the image that are considered black. The threshold used for detecting the black pixels can be modified with a trackbar, depending on the lighting conditions; the approximate value used for the threshold was 30.

The contours that are considered black are found with the findContours() function. The flag RETR_CCOMP classifies these contours as inner and outer contours. The hierarchy of the contours is built as an array of 4-dimensional elements. Each element in the hierarchy represents a contour, and its four values are indexes of other contours, following these rules: the next contour on the same hierarchical level, the previous contour on the same hierarchical level, the first child, and the parent. For finding the rectangles that contain musical notes, all contours that have a parent (an outer contour) and an area larger than 1000 (determined experimentally; this parameter filters out smaller spurious contours) are taken into consideration. The found rectangles are then counted. If there are 66 rectangles, they are saved into an array and a note is assigned to each of them. If there are not 66 rectangles, the application waits until this condition is met. After the detection of the rectangles, the user can show different notes on the panel with a blue indicator.
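The rectangle detection step can be sketched with the OpenCV Java bindings roughly as follows. This is only an illustration, not the authors' code; the threshold value, area limit and expected rectangle count come from the description above, while the inverted thresholding and the use of bounding rectangles are assumptions.

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.Rect;
    import org.opencv.imgproc.Imgproc;

    public class RectangleDetector {
        static final int BLACK_THRESHOLD = 30;   // adjustable with a trackbar, as in the paper
        static final double MIN_AREA = 1000.0;   // experimentally determined filter
        static final int EXPECTED_RECTANGLES = 66;

        /** Returns the 66 note rectangles, or an empty list if the condition is not yet met. */
        static List<Rect> detectNoteRectangles(Mat frame) {
            Mat gray = new Mat();
            Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY);   // discard color information
            Imgproc.equalizeHist(gray, gray);                         // improve contrast

            Mat black = new Mat();
            // Mask the pixels considered black (inverted here so dark pixels become white blobs)
            Imgproc.threshold(gray, black, BLACK_THRESHOLD, 255, Imgproc.THRESH_BINARY_INV);

            List<MatOfPoint> contours = new ArrayList<>();
            Mat hierarchy = new Mat();   // one [next, previous, firstChild, parent] entry per contour
            Imgproc.findContours(black, contours, hierarchy, Imgproc.RETR_CCOMP,
                    Imgproc.CHAIN_APPROX_SIMPLE);

            List<Rect> rectangles = new ArrayList<>();
            for (int i = 0; i < contours.size(); i++) {
                double parent = hierarchy.get(0, i)[3];               // index 3 holds the parent contour
                if (parent >= 0 && Imgproc.contourArea(contours.get(i)) > MIN_AREA) {
                    rectangles.add(Imgproc.boundingRect(contours.get(i)));
                }
            }
            // The application waits until exactly 66 rectangles are visible before mapping notes.
            return rectangles.size() == EXPECTED_RECTANGLES ? rectangles : new ArrayList<>();
        }
    }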
To detect the blue indicator, the image is converted to the HSV color space, which was found to be well suited for detecting blue and to be robust to illumination changes. Thresholding is applied on the hue, saturation and value components in order to detect the blue pixels in the image. Blue is represented in OpenCV's HSV space by a hue of 120, so the hue range was chosen to be between 100 and 140. This range works well in all light conditions; however, for better accuracy, the hue range can be modified with a trackbar while the application is running. The saturation range was experimentally chosen to be between 150 and 255 in order to exclude low-saturation colors, and the full range of the value component, 0-255, was used in order to detect various shades of blue. The resulting image is then improved with the morphological operations of opening and closing. The opening operation is composed of the basic function erode(), followed by dilate(). Erosion removes white noise in the image, but also shrinks the contours of objects and amplifies gaps; dilation reverts the contours back to their original size. The closing operation is composed of dilate(), followed by erode().

Dilation fills up the gaps inside objects, and erosion keeps the contours at their original size. Together, the opening and closing operations remove noise from the background and fill up the contour of the blue indicator. The detection of the blue indicator is presented in Fig. 2: the upper left image illustrates the HSV color space, the upper right image shows the result after thresholding, and the bottom image shows the result after morphological opening and closing.

Fig. 2. Detection of the blue indicator
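A sketch of this color-segmentation step, again using the OpenCV Java bindings and the HSV ranges quoted above; the kernel shape and size are assumptions, since the paper does not state them.

    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.Scalar;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;

    public class BlueIndicatorMask {
        /** Returns a binary mask in which the blue indicator appears white. */
        static Mat segmentBlue(Mat frame) {
            Mat hsv = new Mat();
            Imgproc.cvtColor(frame, hsv, Imgproc.COLOR_BGR2HSV);

            // Hue 100-140 around OpenCV blue (120), saturation 150-255, full value range 0-255
            Mat mask = new Mat();
            Core.inRange(hsv, new Scalar(100, 150, 0), new Scalar(140, 255, 255), mask);

            // Opening (erode then dilate) removes white noise; closing (dilate then erode)
            // fills gaps inside the indicator's contour. A 5x5 elliptical kernel is assumed here.
            Mat kernel = Imgproc.getStructuringElement(Imgproc.MORPH_ELLIPSE, new Size(5, 5));
            Imgproc.morphologyEx(mask, mask, Imgproc.MORPH_OPEN, kernel);
            Imgproc.morphologyEx(mask, mask, Imgproc.MORPH_CLOSE, kernel);
            return mask;
        }
    }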
After the blue region is detected, its contour is found with the findContours() function and its centroid is determined using the moments() function. The centroid is then used to determine exactly in which of the rectangles the blue indicator was positioned. Once the rectangle is found, the note associated with it is looked up. If two consecutive frames contain the centroid of the blue indicator in the same rectangle, the note is played. The reason for requiring two frames instead of one is to avoid accidentally playing a note when transitioning between two notes on the panel.
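The centroid follows from the image moments, cx = m10/m00 and cy = m01/m00. A rough sketch of the rectangle lookup and the two-frame confirmation described above is given below; it is not the authors' code, and the single-trigger latch is an assumption consistent with the "play once per indication" behaviour described in the Results section.

    import java.util.List;
    import org.opencv.core.MatOfPoint;
    import org.opencv.core.Point;
    import org.opencv.core.Rect;
    import org.opencv.imgproc.Imgproc;
    import org.opencv.imgproc.Moments;

    public class NoteSelector {
        private int lastRectangle = -1;      // rectangle seen in the previous frame
        private boolean alreadyPlayed = false;

        /** Returns the index of the note to trigger, or -1 if nothing should be played. */
        int update(MatOfPoint blueContour, List<Rect> rectangles) {
            Moments m = Imgproc.moments(blueContour);
            if (m.get_m00() == 0) {          // no blue indicator visible
                lastRectangle = -1;
                alreadyPlayed = false;
                return -1;
            }
            Point centroid = new Point(m.get_m10() / m.get_m00(), m.get_m01() / m.get_m00());

            int current = -1;
            for (int i = 0; i < rectangles.size(); i++) {
                if (rectangles.get(i).contains(centroid)) { current = i; break; }
            }
            if (current != lastRectangle) {  // moved to a new rectangle (or left the panel)
                lastRectangle = current;
                alreadyPlayed = false;
                return -1;                   // wait for a second consecutive frame
            }
            if (current >= 0 && !alreadyPlayed) {
                alreadyPlayed = true;        // sound the note once per indication
                return current;
            }
            return -1;
        }
    }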
When the note indicated on the panel has been determined, the user hears it in real time. In order to play different notes, the user moves the blue indicator to another rectangle. The same note can be played multiple times if the user hides the blue indicator (i.e. turns it over to its other side) and then shows it to the camera again.

B. Playing the notes and editing the music

The component that handles note playback and the editing of saved songs is an application written in the Java programming language. The role of this component is to interpret notes according to a user-chosen configuration and to provide editing operations that can be applied to a song. The application provides configuration features for playing melodies in real time, such as choosing the rhythm or the musical instrument used. The user can choose between ten instruments and three kinds of rhythms. The application also allows playing over another song that is already saved.

The process of creating a song consists of indicating a note on the physical panel, which can then be heard and viewed by the user. The algorithm is designed to play and display the notes in real time. The functionality of the JFugue library was used to play the notes, and threads were used to display the notes on the screen.

JFugue is an open-source library that can be used with the Java programming language to compose music using a series of instructions. These instructions are interpreted by JFugue and turned into sound. JFugue works with Staccato strings, a standard format for representing musical notes on the computer. An example of such a string is: V0 T[MODERATO] I[PIANO] C D E F G A B. In this example, the C major scale is played using a piano sound and a moderate rhythm. The following properties can be added to each musical note: duration, octave, and symbols of alteration (sharps and flats). The classes used to interpret Staccato strings are called Player and RealTimePlayer. An object of either of these classes calls the play(java.lang.String string) method, which receives a Staccato string as parameter.

The duration of a note is symbolized by different letters, as follows: w – whole note, h – half note, q – quarter note, i – eighth note and s – sixteenth note. The default value is q. Combinations of durations such as “wh” may also be used. The octave is represented by numbers from 0 to 10, with a default value of 5. The symbols for alteration of the notes are b for flat and # for sharp.
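For illustration, a minimal JFugue 5-style fragment that plays such Staccato strings, including durations, octaves and alterations as described above. This is a sketch only; in JFugue 5 the class is org.jfugue.player.Player, and the exact API of the version used by the authors may differ.

    import org.jfugue.player.Player;

    public class StaccatoDemo {
        public static void main(String[] args) {
            Player player = new Player();
            // C major scale on a piano at a moderate tempo (the example from the text)
            player.play("V0 T[MODERATO] I[PIANO] C D E F G A B");
            // Durations: w=whole, h=half, q=quarter (default), i=eighth, s=sixteenth;
            // octave digits 0-10 (default 5); b=flat, #=sharp.
            player.play("I[PIANO] C5q E5q G5h | Bb4i A4i G4q | F#5wh");
        }
    }

For live playing, the real-time player class mentioned above (RealtimePlayer in JFugue 5) can sound such fragments as they are recognized, rather than waiting for a complete pattern.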
Java threads were used in two situations:

1) When displaying musical notes in real time. A thread was created to handle the update of the text box in which the notes are displayed, in order to avoid blocking the graphical user interface;

2) When playing a song in parallel with an already saved song. In this situation, a thread was created that plays the background song. Both threads update the GUI with the notes played by the user or the notes of the background song. To avoid possible conflicts when updating the same shared resource, the two threads were synchronized.
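A condensed sketch of the second situation, assuming a Swing GUI. The paper does not give its threading code, so names such as noteTextArea and backgroundSong are hypothetical; the synchronized block mirrors the synchronization described above.

    import javax.swing.JTextArea;
    import javax.swing.SwingUtilities;
    import org.jfugue.player.Player;

    public class ParallelPlayback {
        private final JTextArea noteTextArea = new JTextArea();  // hypothetical shared text box
        private final Object displayLock = new Object();         // guards the shared GUI update

        /** Plays a saved background song on its own thread while live notes keep arriving. */
        void playOverBackground(String backgroundSong) {
            Thread background = new Thread(() -> {
                new Player().play(backgroundSong);                // blocking call, kept off the GUI thread
            });
            background.start();
        }

        /** Called by either thread (live notes or background song) to show a note. */
        void displayNote(String note) {
            synchronized (displayLock) {                          // both threads update the same text box
                SwingUtilities.invokeLater(() -> noteTextArea.append(note + " "));
            }
        }
    }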
The editing wizard for songs (Fig. 3) consists of three buttons that correspond to the following operations: open, play and save. A right-click on the text area of the wizard opens a menu that provides all the editing options. These options were mentioned in the Introduction as features of the application. Editing can be performed on a note, a group of notes, or even the entire song. Each editing operation has a listener attached to it that performs the appropriate change according to the user's selection. These changes are then translated into Staccato string notation.

Fig. 3. Editing a song
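As an example of translating an edit into Staccato notation, the sketch below shifts the octave of a selection of notes. It only illustrates the idea and is not the authors' implementation; the regular expression and method name are assumptions, and it should be applied to plain note fragments rather than full Staccato strings containing tokens such as T[...].

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class OctaveEditor {
        // A note token: letter, optional accidental, optional octave digit(s), optional duration letters
        private static final Pattern NOTE = Pattern.compile("([A-G][b#]?)(10|\\d)?([whqis]*)");

        /** Shifts every note in a Staccato fragment by the given number of octaves (default octave is 5). */
        static String shiftOctave(String staccato, int delta) {
            Matcher m = NOTE.matcher(staccato);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                int octave = (m.group(2) == null) ? 5 : Integer.parseInt(m.group(2));
                int shifted = Math.max(0, Math.min(10, octave + delta));   // clamp to JFugue's 0-10 range
                m.appendReplacement(out, m.group(1) + shifted + m.group(3));
            }
            m.appendTail(out);
            return out.toString();
        }

        public static void main(String[] args) {
            // "C5q D5q E5h" -> "C6q D6q E6h"
            System.out.println(shiftOctave("C5q D5q E5h", 1));
        }
    }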
The Player class is used for playing existing songs or songs after editing, and is useful when all the notes of a song are known in advance. The RealTimePlayer class is used for playing the notes in real time, smoothly and connected, and is useful while the songs are being created.

The default configuration can be set using the configuration wizard or the CONFIG.txt file. In the configuration wizard, the user can set a value for the instrument and for the rhythm. The possible values for the instrument are offered in a drop-down list: piano, xylophone, harmonica, acoustic guitar, acoustic bass, viola, string ensemble, alto sax, flute and trumpet. A second drop-down list holds the possible values for the rhythm: lento, moderato and allegro. If the configuration wizard is not used, the application uses the values from the CONFIG.txt file.
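One way such a configuration could be turned into the Staccato prefix used when playing is sketched below. This is hypothetical: the format of CONFIG.txt is not described in the paper, so the key=value layout assumed here is purely for illustration; the prefix format T[...] I[...] follows the example string quoted earlier.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    public class PlaybackConfig {
        /** Builds e.g. "T[MODERATO] I[PIANO] " from assumed lines such as instrument=PIANO, rhythm=MODERATO. */
        static String staccatoPrefix(String configPath) throws IOException {
            Properties props = new Properties();
            props.load(Files.newBufferedReader(Paths.get(configPath)));
            String instrument = props.getProperty("instrument", "PIANO");   // one of the ten instruments
            String rhythm = props.getProperty("rhythm", "MODERATO");        // lento, moderato or allegro
            return "T[" + rhythm.toUpperCase() + "] I[" + instrument.toUpperCase() + "] ";
        }
    }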
III. RESULTS

A. Testing

Because the image recognition part is sensitive to illumination, the system was tested in three conditions: daylight (at noon), low light (between sunset and dusk) and artificial light (LED lights at night). For each of these lighting conditions, four tests were performed to determine the detection accuracy of the rectangles and then of the blue indicator. Tests 2) and 3) were performed holding the indicator over the notes for a) a short period of time (~0.3 seconds) and b) a longer period of time (~1.5 seconds). The distance of the web camera from the panel was 1.5 m. The illuminance (in lux) was measured using the ambient light sensor of a smartphone positioned near the web camera. The illuminance values at the time of the tests were: daylight – 400 lux; low light – 190 lux, 30 lux, 15 lux and 10 lux; artificial light – 30 lux.

1) The instantaneous detection of all 66 rectangles was tested. Results are presented in Table I.

2) All 66 rectangles were indicated on the panel with the blue indicator to check whether each note is detected correctly. Results are presented in Table II.

3) The same note was shown 66 times to see whether a note is detected correctly each time. Results are presented in Table III.

4) The same note was indicated for 60 seconds, uninterrupted. The purpose of this test was to determine whether the detection is reliable for longer notes and whether the same note plays multiple times when it was intended to be played once. Results are presented in Table IV.

All of the tests were repeated four times. The first test is passed if the rectangles are detected instantaneously. For the second and third tests, the number of times each note was detected correctly (out of 66 times) was counted, and the average value was calculated and expressed as a percentage. For an indicated note to pass the test, it must be heard once for the duration it is indicated, and the correct note must be heard. The fourth test is passed if the note is detected only once in a period of 60 seconds.

TABLE I. DETECTION OF THE RECTANGLES

Lighting condition      | Test passed
Daylight 400 lux        | Yes
Low light 190 lux       | Yes
Low light 30 lux        | Yes
Low light 15 lux        | Yes
Low light 10 lux        | No
Artificial light 30 lux | Yes

TABLE II. PLAYING THE SAME NOTE 66 TIMES

Lighting condition      | Short note time | Moderate note time
Daylight 400 lux        | 100%            | 99.6%
Low light 190 lux       | 99.2%           | 100%
Low light 30 lux        | 93.5%           | 100%
Low light 15 lux        | 98.4%           | 95.4%
Low light 10 lux        | N.A. (a)        | N.A. (a)
Artificial light 30 lux | 96.9%           | 99.6%

TABLE III. PLAYING ALL 66 NOTES

Lighting condition      | Short note time | Moderate note time
Daylight 400 lux        | 98.4%           | 99.2%
Low light 190 lux       | 98.8%           | 98.1%
Low light 30 lux        | 96.2%           | 98.4%
Low light 15 lux        | 90.9%           | 95.4%
Low light 10 lux        | N.A. (a)        | N.A. (a)
Artificial light 30 lux | 95.8%           | 94.6%

(a) A note played at moderate speed in low light was detected more than once for each note played, so the test could not be carried out. Also, the rectangles were not detected, so the application cannot be used in this lighting condition.

TABLE IV. PLAYING A NOTE FOR 60 SECONDS

Lighting condition | Test passed
Daylight           | Yes
Low light          | Yes (b)
Artificial light   | Yes

(b) The test is passed for illuminance values > 15 lux. At 10 lux, the note was detected on average 21.75 times in 60 seconds. Also, at 10 lux, the rectangles were not detected, so the application cannot be used in this lighting condition.

The detection of the notes in daylight, low light and artificial light was fairly accurate. In very low light (10 lux), the rectangles could not be detected, so the musical notes could not be mapped correctly to the note arrays in the program. In conclusion, notes of any duration can be played in daylight, low light (> 15 lux) and artificial light conditions.

B. Encountered issues

A first issue was how to organize the notes so that they can be easily played by the user. The method found was the Wicki-Hayden system, a musical layout used by concertina players that places the major scale under the fingers, thus requiring little hand movement. The Wicki-Hayden layout is composed of hexagonal cells which represent musical notes. In our application, this design was adapted to rectangular cells in order to facilitate their detection with computer vision.

Another issue was generated by the feature of playing a song over another recorded song. A first idea was to use multiple MIDI files generated with JFugue and play them together, but this proved difficult and not very efficient. Eventually, the JFugue feature called voices was used, which lets the user play songs on multiple channels at the same time.

Each channel uses a different Staccato string, which may be saved in a .txt file and subsequently loaded into the application. Thus, multiple Staccato strings can be played at the same time, either from different .txt files or from a .txt file together with a melody played in real time on the panel of notes.
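A minimal illustration of the voices feature, assuming the JFugue 5 API; the two melodies are made-up examples, not taken from the paper.

    import org.jfugue.player.Player;

    public class VoicesDemo {
        public static void main(String[] args) {
            // V0 and V1 select different voices (channels), so the two Staccato strings play together.
            String melody = "V0 I[PIANO] C5q E5q G5q C6h";
            String background = "V1 I[ACOUSTIC_BASS] C3h G2h";
            new Player().play(melody + " " + background);
        }
    }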
IV. CONCLUSION

The system described in this paper is composed of an application for recognizing musical notes with computer vision algorithms and an application for playing back the notes and for saving and editing the songs. The system's main purpose is to simulate a musical instrument controlled by the user indicating notes on a wall-installed panel. This is combined with a feature for editing the created songs. An algorithm was implemented that detects the notes shown on the panel and plays them to the user in real time. Similar works include Soundbeam [2], which controls a music-generating application through movement detection with ultrasound sensors, and a 3D music control interface that uses image sensors to track mallet positions and add new possibilities of control for pitched percussion instruments [6].

Creating music brings benefits such as improving musical knowledge, improving memory and relieving stress [8]. There are also proven effects of tempo and mode on the spatial ability, arousal and mood of listeners [9]. These are additional benefits for users of such an application. Simple or more complex songs can be created by playing a single melody in real time or by combining it with a background melody. Also, multiple musical instruments can be simulated and controlled with the same interface, giving the user the ability to try multiple instruments without owning them physically. Future ideas for developing the application include showing the played notes on sheet music in real time, the possibility to play chords, and adding sound effects to the finished song.

REFERENCES

[1] R. Fiebrink, D. Trueman, C. Britt, M. Nagai, K. Kaczmarek, M. Early, M.R. Daniel, A. Hege, and P. Cook, “Toward Understanding Human-Computer Interaction in Composing the Instrument”, International Computer Music Association, 2010.
[2] T. Swingler, “The Invisible Keyboard in the Air: An Overview of the Educational, Therapeutic and Creative Applications of the EMS Soundbeam™”, in 2nd European Conference for Disability, Virtual Reality & Associated Technology, 1998.
[3] E. Tobias, “Let’s Play! Learning Music through Video Games and Virtual Worlds”, The Oxford Handbook of Music Education, vol. 2, 2018, doi:10.1093/oxfordhb/9780199928019.013.0035.
[4] J. Tanenbaum and J. Bizzocchi, “Rock Band: a case study in the design of embodied interface experience”, in Proceedings of the 2009 ACM SIGGRAPH Symposium on Video Games (Sandbox ’09), Association for Computing Machinery, New York, NY, USA, 2009, pp. 127–134.
[5] A.W. Senior and A. Jaimes, “Chapter 2 - Computer Vision Interfaces for Interactive Art”, Human-Centric Interfaces for Ambient Intelligence, 2010, pp. 33-48.
[6] G. Odowichuk, S. Trail, P. Driessen, W. Nie and W. Page, “Sensor fusion: Towards a fully expressive 3D music control interface”, in Proceedings of the 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Victoria, BC, 2011, pp. 836-841.
[7] Wicki-Hayden Note Layout, The Music Notation Project, http://musicnotation.org/wiki/instruments/wicki-hayden-note-layout/
[8] S. Bottiroli, A. Rosi, R. Russo, T. Vecchi, and E. Cavallini, “The cognitive effects of listening to background music on older adults: processing speed improves with upbeat music, while memory seems to benefit from both upbeat and downbeat music”, Frontiers in Aging Neuroscience, vol. 6, 284, 2014.
[9] G. Husain, W.F. Thompson, and E.G. Schellenberg, “Effects of Musical Tempo and Mode on Arousal, Mood, and Spatial Abilities”, Music Perception, vol. 20, no. 2, pp. 151–171, 2002.
[10] F. Albu, M. Nicolau, F. Pirvan, and D. Hagiescu, “A Sonification Method using Human Body Movements”, in Proceedings of the 10th International Conference on Creative Content Technologies, 2018.
