Music Recommendation System
A Project Report
Submitted by
BACHELOR OF TECHNOLOGY
in
Certified that this project “MUSIC RECOMMENDATION SYSTEM” is the bonafide work
of “ANJALI KASOUDHAN, TEJASWI LENKA, ROSALINE DAS, PRANGYA P. PATI,
BHAGYASHREE SAHOO, SAMIKSHYA BAL”, who carried out the project work under
my supervision. This is to further certify, to the best of my knowledge, that this project has
not been carried out earlier in this institute or the university.
SIGNATURE
Dr. JYOTI PRAKASH GIRI
School of Management
Certified that the above-mentioned project has been duly carried out as per the
norms of the college and statutes of the university.
SIGNATURE
Department seal
DECLARATION
Registration No.:
Place:
Date:
CONTENTS
Sl.no Topics
1 Abstract
2 Introduction
6 Methodology
7 Conclusion
8 Reference
ABSTRACT
The evolution of the Internet continuously generates changes in social habits related
to communication and lifestyle. Growing bandwidth led to the birth and later
spread of complex file-sharing systems. These systems, known as peer-to-peer
software, let users share files stored locally on their personal computers with
other users connected to the same system. Music sharing started thanks to software
like Napster (www.napster.com) and later Audiogalaxy (www.audiogalaxy.com). These
peer-to-peer systems revolutionized the music industry and, with it, people's habits
of music collection and playback. It became easier to search for music, easier to
store it, and much cheaper to get it. This new situation led to massive music
storage for sharing purposes and changed the way music was played, moving
from listening to complete albums straight through to creating complex playlists
composed of many artists and musical genres.
The continued increase in connection speed, together with trends in web development
technologies, has given rise to large web systems that are visited daily. Among these
advanced systems are services that allow users to listen to music online without
downloading it to their personal computers. This solves a big
problem created by peer-to-peer software: the conflict between music copyright and
the rights of music purchasers. The time had come to define the limits of a user's
freedom to share something he had allegedly bought legally. Big
music distribution companies started legal battles against the owners of the most important
peer-to-peer software, and the success of these legal processes depended on the copyright
laws of each country hosting the peer-to-peer systems. Although some peer-to-peer systems
survive to this day, these web music services emerged as new formulas for music sharing.
These music listening services own large music catalogs in order to serve a wide
public. The same services manage copyright issues for each country,
adapting the musical catalog according to the copy and reproduction rights of the
music label associated with each album. Most of these music services are paid; some
provide free access to the musical catalog, but no reproduction rights. There is a wide
variety of these systems, and new, increasingly improved alternatives are constantly emerging.
Some are simple players providing playlist functionality (prostopleer.com),
others accompany the player with a recommendation system of similar artists
(www.spotify.com), and there are also complex collaborative systems in which hundreds
of people leave comments on songs (www.pandora.com, www.lastfm.com) and have
the chance to interact with each other, as in the newly emerging social networks.
Music recommender systems can be seen as a surrogate for real-world radio stations
and music magazines. The main purpose of these real-world organizations is to promote
certain artists, sometimes because the radio directors or magazine editors find
the quality of their musical works noteworthy, sometimes just out of economic
interest. Some people listen to these stations and read magazines in order to decide
what music to acquire, either by traditional means or through some
share-alike network. Music recommenders have the chance to make accessible to
users not only the market-defined “good music”, but also new emerging groups,
rare minority music and independent labels' productions.
Approaches to recommendation :
Recommendation is an important field, strongly related to web business, which
has been intensely researched in recent years, ever since electronic commerce web
sites started their activity. Among the reviewed approaches, many solutions were
found for data analysis, data collecting, or data-object representation. Based on
these, models such as content-based, collaborative or context-based filtering offer different
solutions for selecting the key information needed to make recommendations. In parallel with the
development of recommender models, the mathematics underlying
the purely heuristic bases of recommender systems has also evolved.
The most common mathematical models [1] used in current recommender
systems have been reviewed, helping the author of this project to build up a solid
idea of what recommendation is and how it can be achieved. These
mathematical approaches to recommenders are:
- Logic recommender systems [17] try to find an exact match between
the recommendation options and the user profile. The data
representation is built using the attributes that define objects.
Attribute types are used as they are, with no further abstraction.
- Vector space-based systems [28], due to their numeric data modeling,
estimate which objects best suit the user profile statements. Data
object surrogates represent attributes in vector form, with each cell
corresponding to a concrete attribute.
- Probabilistic systems [17] estimate a concrete object's importance
using a probability function. This function estimates the probability
that the object meets the preferences stated in the user
profile.
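As an illustration of the vector space-based model, the following minimal sketch stores the user profile and each object as sparse feature vectors and ranks objects by cosine similarity. Cosine similarity is one common choice of measure, assumed here rather than taken from [28]; the feature names, weights and function names are hypothetical.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)


def rank(profile: dict[str, float], items: dict[str, dict[str, float]]) -> list[str]:
    """Item ids ordered from best to worst match with the profile."""
    return sorted(items, key=lambda i: cosine(profile, items[i]), reverse=True)


# Hypothetical genre weights, for illustration only.
profile = {"rock": 0.9, "indie": 0.6, "jazz": 0.1}
items = {
    "album_a": {"rock": 1.0, "indie": 0.5},
    "album_b": {"jazz": 1.0, "blues": 0.7},
    "album_c": {"indie": 1.0, "folk": 0.4},
}
print(rank(profile, items))  # ['album_a', 'album_c', 'album_b']
```

Unlike the logic model, every object receives a graded score, so even a poor match such as album_b still gets a position in the ranking.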
The application of these methods depends on the features of the recommendation
problem itself. An important step of the design phase is adapting the
problem abstraction to a suitable mathematical model.
This project has been designed to follow a logic-based recommender
model. On the one hand, this avoids the numerical
representation of data that the other mathematical approaches require; on the other
hand, it allows the data representation closest to the domain of this problem.
Project definition :
Thesis structure
This project is composed of two parts of different nature. First, an
introductory review of recommendation has been performed, explaining why
recommender systems are so important for electronic commerce and what kinds of
systems perform recommendation, with a focus on music
recommendation. Section 2 of this document presents a general
overview of recommender systems, including a possible taxonomy [6], a technical
overview [17] and a classification [1].
Section 3 includes a complete overview of music recommenders, including
previously proposed approaches and reviews of commercial and non-commercial
systems that are currently online. Following this introduction to
music recommenders, the problem this project addresses is defined (Section 4),
followed by the methodology selected to solve it (Section 5). The development has
been divided into three iterations, which faithfully represent the real project
evolution. Sections 6.2, 6.3 and 6.4 explain in depth the actions and decisions taken
in each one.
After presenting this project, the conclusions section (7) includes some
observations inferred from the whole project execution. The contributions
of this project are just my two cents to the huge body of knowledge about
recommendation out there.
1. The recommender systems :
Introduction :
The roots of recommender systems lie in work from
diverse fields: cognitive science [19], information retrieval [20] and economics
[21]. Recommender systems emerged as an independent research area in the
mid-1990s, and their important role in enhancing data accessibility attracted the
attention of both the academic and industrial worlds.
Recommender systems are a useful way to extend search algorithms, since they
help users discover items they might not have found by themselves. A
recommendation basically presents the user with some items that would
match his preferences. There are different approaches [1] to collecting information
about users: monitoring their interaction, asking them to perform
some actions, or asking them to fill in forms with personal information. The user's
interaction with the system provides two types of information:
Implicit information: collected from the user interaction itself, for example by
recording the items the user has interacted with, item-related information such as
viewing times or number of plays, and user-related information such as group
membership.
Explicit information: users provide this information every time they give their
opinion about items, for example by rating or liking an item. In general, it is all the
information the user produces consciously.
The recommender system collects both kinds of information to generate the user
profile. This profile stores information not only about the user's likes but also
about the user: current location, current personal needs, sex,
age, professional position, and so on. The way the profile is used varies a lot among
recommendation systems, and the information stored in it is
also a determining factor in the design of the recommender algorithm.
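A user profile holding both kinds of information could be organised as below. This is a minimal sketch: the class name UserProfile, its fields, the 1 to 5 rating scale and the item ids are hypothetical and not taken from any of the systems discussed.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str
    # Implicit information: how many times each item was played.
    play_counts: Counter = field(default_factory=Counter)
    # Explicit information: ratings given consciously (1-5 scale assumed).
    ratings: dict[str, int] = field(default_factory=dict)
    # Information about the user itself (location, age, ...).
    attributes: dict[str, str] = field(default_factory=dict)

    def record_play(self, item_id: str) -> None:
        """Implicit: the system observed the user playing an item."""
        self.play_counts[item_id] += 1

    def rate(self, item_id: str, score: int) -> None:
        """Explicit: the user rated an item."""
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings[item_id] = score


p = UserProfile("u1", attributes={"country": "IN"})
p.record_play("track_9")
p.record_play("track_9")
p.rate("track_3", 4)
print(p.play_counts["track_9"], p.ratings)  # 2 {'track_3': 4}
```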
1.1. Taxonomy for recommender systems
A possible taxonomy of recommender systems has been proposed in [1]. Its categories
describe different models for abstracting the user profile, how the profile is
generated, how it is later maintained, and how it evolves as the system runs.
User profile representation: building an accurate profile is an important task, since the
recommendation success depends on how the system represents the user's interests. Some
models applied in current recommender systems are listed next:
- History-based
Some systems keep a list of purchases, the navigation history or the content of e-mail boxes
as a user profile. Additionally, it is also common to keep the relevance feedback of the user
associated with each item in the history. The Amazon1 web site is a clear example.
- Vector-space
In the vector space model, items are represented by a vector of features, usually words or
concepts, which are represented numerically as frequencies, relevance percentages or
probabilities.
- Demographic
Demographic filtering systems create a user profile through stereotypes. Therefore, the user
profile representation is a list of demographic features which represent the kind of user.
- Classifier-based models
Systems that use a classifier as a user profile learning technique continuously monitor
input data in order to classify the information. Examples include
neural networks, decision trees and Bayesian networks.
- Weighted n-grams
Items are represented as a net of words with weights scoring each link between nodes.
For example, in [22] the system is based on the assumption that words tend to occur one
after another a significantly high number of times; it extracts fixed-length consecutive series of
n characters and organizes them with weighted links representing the co-occurrence of
different words. The structure therefore achieves a context representation of the words.
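The weighted n-gram idea can be sketched in a simplified form: the code below extracts overlapping fixed-length character n-grams from a text and weights each link between consecutive n-grams by how often that pair occurs. This is a hypothetical illustration of the principle, with invented function names, and not the exact algorithm of [22].

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping, fixed-length series of n consecutive characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_graph(text: str, n: int = 3) -> Counter:
    """Weighted links: how many times n-gram b directly follows n-gram a."""
    grams = char_ngrams(text.lower(), n)
    return Counter(zip(grams, grams[1:]))


graph = ngram_graph("abcabc", n=2)
print(graph.most_common(1))  # [(('ab', 'bc'), 2)]
```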
1 www.amazon.com
2 www.jamendo.com
Initial profile generation:
- Empty: the profile is built as the users interact with the system.
- Manual: the users are asked to register their interests beforehand.
- Stereotyping: collecting user-related information such as city, country, lifestyle, age
or sex.
- Training set: providing the users with some items among which they should
select one.
Profile learning technique: the way the profile changes over time.
- Not needed: some systems do not need a profile learning technique, either because
they load the user-related information from a database or because it is dynamically
generated.
- Clustering: the process of grouping information objects according to common
features inherent to their information context. User profiles are often
clustered into groups according to some rule, in order to assess which users share
common interests. Recommenders like Last.fm3 or iRate4 perform this
technique [12].
- Classifiers: classifiers are general computational models for assigning a
category to an input. Building a recommender system with a classifier means
using information about the item and the user profile as input, and having the
output category represent how strongly to recommend an item to the user.
Classifiers may be implemented using many different machine learning
strategies, including neural networks, decision trees, association rules and
Bayesian networks [1].
- Information retrieval techniques: when the information source has no clear
structure, pre-processing steps are needed to extract relevant information that
allows the importance of each information container to be estimated. This process
comprises two main steps: feature selection and information indexing.
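The clustering step can be illustrated with a minimal leader (single-pass threshold) clustering of users, where each profile is the set of artists a user listens to and similarity is the Jaccard index. Both the algorithm and the threshold value are assumptions chosen for brevity; they are not the methods used by Last.fm or iRate.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of artists two users have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def leader_clustering(profiles: dict[str, set[str]], threshold: float = 0.3) -> list[list[str]]:
    """Single pass: each user joins the first cluster whose leader (first
    member) is similar enough; otherwise the user starts a new cluster."""
    clusters: list[list[str]] = []
    for user, artists in profiles.items():
        for cluster in clusters:
            if jaccard(artists, profiles[cluster[0]]) >= threshold:
                cluster.append(user)
                break
        else:
            clusters.append([user])
    return clusters


profiles = {
    "ana": {"Radiohead", "Muse", "Coldplay"},
    "raj": {"Muse", "Coldplay", "Keane"},
    "li": {"Miles Davis", "Coltrane"},
    "sam": {"Coltrane", "Monk", "Miles Davis"},
}
print(leader_clustering(profiles))  # [['ana', 'raj'], ['li', 'sam']]
```

The result depends on the order in which users are visited, which is the usual trade-off of single-pass clustering against more expensive iterative methods.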
Relevance feedback: the two most common [1] ways to obtain relevance feedback are to use
information given explicitly or to infer information implicitly from the user's
interaction. Moreover, some systems propose hybrid implicit-explicit approaches.
- No feedback: some systems do not update the user profile automatically and
therefore do not need relevance feedback, for example all the systems
that update the user profile manually.
- Explicit feedback: in several systems, users are required to explicitly evaluate
items. These evaluations indicate how relevant or interesting an item is to the
user, or how relevant or interesting the user thinks an item is to other users.
Some systems invite users to submit information such as track playlists. iRate uses
this approach to provide its recommender with finer information about users' preferences.
3 www.last.fm
4 irate.sourceforge.net
- Implicit feedback: the system automatically and passively infers
the user's preferences by monitoring the user's actions. Most implicit
methods obtain relevance feedback by analyzing the links followed by the user,
by storing a history of purchases or by parsing the navigation history.
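Implicit feedback can be turned into preference scores by weighting each observed action. In the minimal sketch below, the action names and their weights are hypothetical; a real system would have to tune them to what its interface can actually monitor.

```python
from collections import defaultdict

# Hypothetical weights: how much each observed action says about preference.
ACTION_WEIGHTS = {"play_full": 1.0, "play_partial": 0.3, "skip": -0.5, "purchase": 2.0}


def implicit_scores(events: list[tuple[str, str]]) -> dict[str, float]:
    """Turn a log of (item_id, action) events into per-item preference scores."""
    scores: dict[str, float] = defaultdict(float)
    for item, action in events:
        scores[item] += ACTION_WEIGHTS.get(action, 0.0)  # unknown actions are ignored
    return dict(scores)


log = [("t1", "play_full"), ("t1", "play_full"), ("t2", "skip"),
       ("t3", "play_partial"), ("t3", "purchase")]
print(implicit_scores(log))
```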
It should be noted that the performance of the frameworks presented next may
suffer severely from limitations in the scope of available features, for instance
when item surrogates lack attributes representing some of the actual
features people perceive in those objects. This is not a problem of the
frameworks themselves, but of the choices made by the knowledge-representation
designers.
The logic model is based on the idea of exact match: the system rejects or accepts
objects depending on whether they satisfy the constraint statements present in
the user profile. As described in [17], objects that match the constraints in the
profile share the same characteristics and are therefore considered to have the same
utility value.
Domain representation
The object surrogates are represented as a group of attributes associated with some
object identifier or some description text. Each of these attributes has a well-defined
type, which fully conveys the semantics of the attribute value. User profiles
are collections of statements that define which attribute
values are considered useful. These statements effectively restrict the range
of values an object's attribute may take for the object to be considered useful.
Comparison process
The main comparison operation checks whether the set of
attributes associated with a given object satisfies the constraints encoded
in the user profile. The object repository is searched for objects that
match those constraints perfectly. The number of attributes to
consider is fixed. This process turns each list of attributes attached to
indexed objects into a list of boolean assessments that represent
whether each constraint present in the profile is satisfied or not.
Usually, to avoid an overly strict specification, the list of constraints is
parsed in disjunctive normal form: as soon as one of the disjoined constraints is satisfied,
the object is accepted.
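The exact-match comparison can be sketched as follows, assuming each disjunct of the profile is itself a conjunction of simple attribute conditions. The attribute names, the catalog and the helper names are hypothetical; the sketch only shows how attributes become boolean assessments and how the disjunctive normal form accepts an object.

```python
from typing import Any, Callable

Item = dict[str, Any]
Constraint = Callable[[Item], bool]

# Hypothetical profile in disjunctive normal form: a list of clauses joined by OR,
# each clause a list of constraints joined by AND.
profile: list[list[Constraint]] = [
    [lambda i: i.get("genre") == "rock", lambda i: i.get("year", 0) >= 1990],
    [lambda i: i.get("genre") == "jazz"],
]


def assess(item: Item, clause: list[Constraint]) -> list[bool]:
    """Turn the item's attributes into one boolean assessment per constraint."""
    return [constraint(item) for constraint in clause]


def accepts(item: Item, dnf: list[list[Constraint]]) -> bool:
    """Exact match: accept the item as soon as one clause is fully satisfied."""
    return any(all(assess(item, clause)) for clause in dnf)


catalog = [
    {"id": "a1", "genre": "rock", "year": 1997},
    {"id": "a2", "genre": "rock", "year": 1975},
    {"id": "a3", "genre": "jazz", "year": 1959},
]
# Every accepted item is treated as equally useful (same utility value).
print([item["id"] for item in catalog if accepts(item, profile)])  # ['a1', 'a3']
```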
5 mercury.bio.uaf.edu/courses/wlf625/readings/MLEstimation.pdf
Drawbacks and limitations:
In practice, numeric attributes can be difficult to handle, since they imply
integrating the estimated probability density function, as stated in [24]. Numeric
attributes can be clustered, or rather encoded as a set of discrete symbols.
The resulting synthetic categorical attributes are not the exact equivalent of
their numeric values, because any mapping function from the infinite set of
numbers to a finite, not very large set of integers implies a loss of
information. The assignment of points to target symbols is optimal, and the
associated distortion smallest, when it:
- maximizes the homogeneity of assignments
- maximizes the minimum separation between cluster assignees
This is considered an NP-hard problem 6, and no exact polynomial-time solutions
are known. Therefore many approximate numerical methods for this problem have
been analyzed, as thoroughly reviewed in [18].
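A simple, non-optimal way to encode a numeric attribute as discrete symbols is equal-width binning, sketched below for a hypothetical tempo attribute. It makes the information loss visible: tempos of 60 and 95 BPM end up with the same symbol. Assignments that are optimal in the sense above would instead require the approximate clustering methods reviewed in [18].

```python
def equal_width_bins(values: list[float], k: int) -> list[int]:
    """Map each numeric value to one of k symbols (0..k-1) of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid division by zero when all values are equal
    return [min(int((v - lo) / width), k - 1) for v in values]


tempos = [60.0, 72.0, 95.0, 120.0, 128.0, 174.0]  # hypothetical tempos in BPM
symbols = equal_width_bins(tempos, k=3)
print(symbols)  # [0, 0, 0, 1, 1, 2]
```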
Heuristic-based techniques focus on the purely algorithmic part of the
implementation. Their big advantage is that they do not rely
on a complex system architecture. Therefore these solutions can easily be
plugged into any kind of recommender system designed following an
algorithm-independent approach [13].
Model-based solutions go a step further by creating a complete recommender
system pattern. Each model defines its item surrogates and the profile
generation and maintenance. The algorithms then used for matching
can be selected analytically, based on the desired system behavior.
A different overview distinguishes the approaches according to the decisions taken when
designing item surrogates, which are mainly guided by the approach selected to estimate
the utility of a given item A for a particular user U. There are two main branches:
on the one hand, approaches based on the social properties of networks, such as
collaborative filtering [3]; on the other hand, approaches that rely on the user's interaction and
preferences, like content-based filtering [4]. The proposal described in [1] studies
the possibility of combining both techniques, referred to as hybrid recommendation
systems, obtaining finer recommendations from better suited user profiles. I
found the solution explained in [6], summarized next, more complete.
- Content-based systems: item surrogates will be composed of attributes
that characterize their information content.
- Collaborative systems: item surrogates are reduced to their minimum
expression, and their utility estimation is more a matter of statistical or
probabilistic prediction.
- Context-based systems: item surrogates are composed of contextual
information.
- Hybrid systems: use a combination of the above methods.
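One simple way to combine content-based and collaborative estimates is a weighted hybrid, sketched below. The scoring functions, the weight alpha and the data are hypothetical; context-based information is left out for brevity, and this is not the specific scheme of [1] or [6].

```python
def content_score(item_tags: set[str], liked_tags: set[str]) -> float:
    """Content-based: share of the item's tags that match the user's liked tags."""
    return len(item_tags & liked_tags) / len(item_tags) if item_tags else 0.0


def collaborative_score(item: str, neighbours: list[set[str]]) -> float:
    """Collaborative: share of similar users whose liked items include this one."""
    return sum(item in liked for liked in neighbours) / len(neighbours) if neighbours else 0.0


def hybrid_score(item: str, item_tags: set[str], liked_tags: set[str],
                 neighbours: list[set[str]], alpha: float = 0.5) -> float:
    """Weighted hybrid: alpha=1.0 is purely content-based, alpha=0.0 purely collaborative."""
    return (alpha * content_score(item_tags, liked_tags)
            + (1 - alpha) * collaborative_score(item, neighbours))


liked_tags = {"rock", "guitar"}
neighbours = [{"song_x", "song_y"}, {"song_x"}, {"song_z"}, {"song_y"}]
print(hybrid_score("song_x", {"rock", "piano"}, liked_tags, neighbours))  # 0.5
```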
Conclusion :
This review shows the variety of decisions to make when
planning a recommender system, offering a complete summary that
eases decision making during the analysis phase. Some
decisions drawn from this analysis are that the user profile
employed in this recommender could be based on history-based
generation, thanks to the monitoring capabilities of the web interface
itself. The user profile will be refreshed when new information is
retrieved from interaction, so the user profile is continuously
evolving. The only relevance feedback taken into account for this
purpose is purely implicit, and it is best retrieved by the
custom-designed web interface. The most suitable solution found for this
purpose relies on logic recommenders as the mathematical model and
data domain representation, while its features as a recommendation
model can be estimated more precisely after the next section, where
music recommenders are introduced.
There exists a variety of web systems ready to help users discover new
music. Some are commercial applications; others are open-source projects that are
easy to inspect and therefore provide useful information about their model
design. Commercial systems are completely closed to users: no reviews are
detailed in their documentation and their internal logic is not explained at all. Some
well-informed users can infer the model of these services, but it is
impossible to get detailed information about the recommendation
algorithm or the user profile concept implemented in these systems.
Most commercial systems implement a complex
recommender structure; examples are Last.fm, Grooveshark and
Spotify. All of them incorporate a music recommendation algorithm
as an important part of their operation. This algorithm is itself an
information-filtering system which, plugged into musical
systems, tends to narrow the music collection presented according to the
user's preferences. Some of the most important (in terms of
popularity) music services are Last.fm7, Pandora8, Spotify9 and
Magnatune10, but many other Internet sites also implement recommender algorithms,
such as the Apple music store11 (iTunes is the most popular
according to [5]) or the Amazon e-commerce website12.
There are pure music recommenders, such as Emergent-music13
in the commercial world and iRate14 in the non-commercial world.
Although these systems are not large complex communities like
those mentioned above, they successfully fulfill recommendation
tasks. It should be considered that collaborative approaches
strongly depend on the number of regular users the system is
managing. I understand a regular user as someone who interacts
frequently with the system; a latent user, by contrast, is not useful
for collaborative purposes. Some of these systems are reviewed next, while
those selected for use in this project, after deeper study,
are listed in section 6.2, First Iteration.
7 Last.fm website: www.last.fm.com
8 Pandora system website: www.pandora.com
9 Spotify desktop system: www.spotify.com
10 Magnatune radio website: magnatune.com
11 Apple iTunes Store: www.apple.com/itunes/what-is/store.html
12 Amazon website: www.amazon.com
3.1.1. LAST.FM
The Last.fm website15 is one of the most outstanding music recommenders out
there. It clearly illustrates the concept of a collaborative filtering
recommender system. Users access recommendations by connecting to a
web-based music streaming service. The tracks played on that stream are
the recommended items. As when listening to the “random” broadcast,
users can tell the system whether they find the item being broadcast
interesting, or simply ban the author of the item being broadcast.
There are two kinds of recommendation streams: one for subscribers and
another for non-subscribers. The precision of the recommendation algorithm varies
depending on whether the user is a subscriber. For non-subscriber
users, the items broadcast are selected according to a group of user
profiles that are found to be similar. Subscribers can access a music stream
whose contents are governed only by their own user profile. It is therefore expected
that the items on that personalized stream match the user's preferences
more closely.
Audioscrobbler.com is an open source project16 that acts as a data
harvester for the Last.fm web service. It relies on a quite complex and
expensive infrastructure, which seems to be mostly paid for
through a donation system, where users are expected to donate the amount
they feel the system deserves. Users who donate money become
subscribers with access to enhanced services.
In order to build up the user profile, the Last.fm system has implemented
three different approaches:
- Users explicitly add items (artists) to their profiles
through the Last.fm web interface.
- Users install the AudioScrobbler.com plug-in, available for a wide
range of media players, which records the tracks that are
played. Once a certain number of playback events have
been recorded, a report is sent automatically to the Last.fm
servers. This information is integrated with
previous statements in the user profile.
- Users can connect to Last.fm Radio, consuming a stream of
music that features a significant proportion of
lesser-known artists. Through a set of web controls on
the website, users can tell the system whether they approve or ban
the artist whose work is being played at the moment. This
feedback is also integrated into the user profile.
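The batching behaviour of the plug-in (record playback events, then send a report once enough have accumulated) can be sketched as below. The class, the batch size and the send callback are hypothetical stand-ins; the real plug-in's submission protocol is not described here and is not reproduced.

```python
from typing import Callable


class PlaybackRecorder:
    """Buffers playback events and submits them in batches.

    Hypothetical: batch_size and the `send` callback stand in for the real
    plug-in's submission logic.
    """

    def __init__(self, send: Callable[[list[tuple[str, str]]], None], batch_size: int = 5):
        self.send = send
        self.batch_size = batch_size
        self.pending: list[tuple[str, str]] = []

    def played(self, artist: str, track: str) -> None:
        """Record one playback event; submit a report when the batch is full."""
        self.pending.append((artist, track))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send all pending events and start a fresh buffer."""
        if self.pending:
            self.send(self.pending)
            self.pending = []


reports = []
rec = PlaybackRecorder(send=reports.append, batch_size=2)
for title in ["Creep", "Karma Police", "No Surprises"]:
    rec.played("Radiohead", title)
print(len(reports), len(rec.pending))  # 1 1
```

Batching trades freshness for fewer network requests; the third event stays in the buffer until the next report.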
15 www.last.fm
16 www.audioscrobbler.net
3.1.2. FLYFI (Emergent music)
Emergent Music17 initially presents the user with a list of the top
downloaded tracks and top listened tracks of the current week. This
certainly suggests some collaborative filtering inside its recommender
engine. The user can create playlists of songs, which are saved
automatically in the user profile. Users can either download
published tracks or listen to them through its streaming service. However,
this option is not always available: artists decide whether or
not to make their works publicly available.
The playlist creation feature acts as a user activity sniffer, creating
relations between songs. These relations and the songs included help the
recommender to build the user profile. Recommended items are ordered by
their expected affinity with the user's taste. Besides that, users can also perform
simple searches on the recommended items, specifying several keywords.
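The way playlists create relations between songs can be sketched with simple co-occurrence counting: songs that appear together in many playlists get a stronger relation, and candidates are ordered by their summed relation to the user's songs as a stand-in for "expected affinity". This is an assumption about the general technique, with hypothetical function names and data, not Emergent Music's actual algorithm.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(playlists: list[list[str]]) -> Counter:
    """Count how often each pair of songs appears in the same playlist."""
    pairs: Counter = Counter()
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(seed: list[str], pairs: Counter, top: int = 3) -> list[str]:
    """Order songs outside the seed by their summed co-occurrence with it."""
    affinity: Counter = Counter()
    for (a, b), n in pairs.items():
        if a in seed and b not in seed:
            affinity[b] += n
        elif b in seed and a not in seed:
            affinity[a] += n
    return [song for song, _ in affinity.most_common(top)]


playlists = [["s1", "s2", "s3"], ["s1", "s3", "s4"], ["s2", "s5"], ["s1", "s4"]]
print(recommend(["s1"], cooccurrence(playlists)))  # ['s3', 's4', 's2']
```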
Feedback on recommendations is given by explicitly rating the presented
items. The interface offered for this task is quite simple. The problem
is that the number of recommended items may be greater than
one hundred, which makes it hard for the user to give each item
individual feedback.
Emergent Music is a music recommender based exclusively on
collaborative filtering algorithms and techniques. It also provides a
desktop application called Goombah, which has a more complete interface than
the web system and interacts with their database to provide the user with
recommendations and music associated with the song being played. Another feature is
a partnership-like playlist scrobbler18 in association with iTunes. This
software must be installed locally and acts as a boosting element for the
recommendation engine. It creates relations among the tracks listed in
the current iTunes playlist, and between the user profile and these tracks,
which are loaded from the user's music collection.
Flyfi web site: www.emergentmusic.com
From Last.fm scrobbler engine: www.audioscrobbler.net
3.2.1. JAMENDO
Jamendo is an online community19 of free, legal and unlimited music
published under Creative Commons licenses. All the music published on
the Jamendo site can be used free of charge for personal use. The Creative Commons
license allows the owner of the music to retain some rights, while giving
users the ability to download and listen to the music freely. Commercial
rights are applied to each musical piece separately and are handled through
the Jamendo site. This allows the site to offer free music to its users while
letting publishers earn some income from the commercial rights to their
works.
The site also features a group of selections, radio stations, and playlists
created by users, as well as social networking aspects such as
user profiles, user friends, community forums and inter-user messaging.
Songs can be streamed or downloaded, depending on the copyright laws
in force in the country where the download is sent.
The music stored in the Jamendo database is cataloged by artist, album and
tags. Content-based filtering is present in the recommendation
engine, as well as a collaborative solution that models the interaction
between users. There are user clustering classifiers that can be composed
using some of the many interaction possibilities a user has with other
system elements. Features such as groups of friends or internal
messaging among them are quite useful for a collaborative approach, due to
their semantic contribution in linking users' musical preferences with the
common features defining their profiles.
The system provides a web service which developers can use, for
example, to add free music to their web sites. API
documentation20 is provided, which clearly explains how to interact with
this web service.
Jamendo is an incipient music recommender with constant growth and
increasing popularity. It currently has more than 10,000 albums
available for streaming or download. It helps new groups
promote their work by offering flexible licensing features. It is a really
interesting web site, but its music catalog is still immature.
iRATE developers emphasize that the system does not intend to
become a smart P2P network. This is further enforced by making sure that
the only music made available is licensed under one of the Creative
Commons licenses. Users are required to explicitly rate the songs
the system presents to them. This is achieved by getting users to install a
Java-based application, which downloads the music published on the
system servers. As soon as the client application downloads a song, it is
played back to the user, who can rate it using the
client application's user interface. Songs rated with the minimum score are
immediately stopped by the system, which starts playing the next
song, if any.
An extra feature of this system is music-related news, all
about the musical interests inferred from the user's FOAF profile. It
makes use of a very interesting service called Pubsub 23, whose purpose is to
keep a large information index up to date. Pubsub collects news from
over 13 million weblogs and around 50,000 newsgroups, and alerts the
user if new content matching his search terms has been published.
Once the news items are collected, the FOAF system uses the TF/IDF algorithm
to score the news documents and present them to the user ordered by
relevance, as explained in [12].
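A minimal sketch of TF/IDF scoring over a few hypothetical news snippets is shown below. It uses raw term frequency normalised by document length and the logarithmic inverse document frequency log(N / df); the variant described in [12] may differ, and the function name is invented for illustration.

```python
import math
from collections import Counter


def tfidf_rank(query: list[str], docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by the summed TF-IDF weight of the (lowercase) query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for words in tokenized.values() for term in set(words))
    scores: dict[str, float] = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        length = len(words) or 1  # guard against empty documents
        scores[doc_id] = sum(
            (tf[t] / length) * math.log(n_docs / df[t]) for t in query if df[t] > 0
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


news = {
    "n1": "New Radiohead album announced",
    "n2": "Jazz festival lineup announced",
    "n3": "Radiohead tour dates and Radiohead interview",
}
print([doc_id for doc_id, _ in tfidf_rank(["radiohead"], news)])  # ['n3', 'n1', 'n2']
```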
22 FOAFing the music Project: foafing-the-music.iua.upf.edu
23 Pubsub Website: www.pubsub.com
Problem definition
Aim
This project aims to create a free web-based music recommendation
system able to estimate the user's musical preferences and make
recommendations of several musical elements according to these
preferences.
Information sources
The system intends to use various online music services, through
which it obtains listings of groups and artists to present to the user. The music
collections retrieved from these services act as a browsing environment that
lets the user navigate through music. Each music service provides several
features, some common to all, others particular to that service.
User interaction
The user interacts with the system through a web interface accessible from
any platform with a web browser. This interface provides great
opportunities for interaction, enabling continuous navigation through
thousands of albums and artists. It is designed to allow easy and
intuitive interaction. It should be mentioned that the longer the interaction
lasts, the more complete the information stored about the user's likes, and therefore
the better the recommendations generated. This recommender system is basically
a software element that studies the user's browsing patterns and then
decides what to present next.
System features
In order to provide the system with a complete musical collection,
several music web services have been reviewed. Unfortunately, the lack of
a relational music database somewhat limits the freedom to manage
music data in my own way. Therefore, the music catalogs are loaded
dynamically from these music services when the user interaction requires
them. Once this data is loaded, the system extracts the significant
information about the musical items (item surrogates) to evaluate what
kind of music the user is interacting with. By monitoring this interaction,
the system is able to build a user profile, which is not understood as a
constant definition of the user's preferences but is instead conceptualized as an
adaptive, changing pattern. In this way, the system stores a history
of user interaction as a long-term user-system relationship but still reacts
more sensitively to recent events, preventing the system from
overvaluing the most frequented items; it also stores a mid-term interaction
memory.
Improvements:
It has been observed that most music recommenders rely on collaborative filtering techniques, sometimes to support and sometimes to boost the functionality of the recommender system. The nature of this filtering slightly diverges from the pure concept of recommendation, which is strictly based on the current user's preferences.
It has been shown in [25] that collaborative filtering provides good recommendations even without previous explicit knowledge of the user's likes. This project, by contrast, aims to base recommendation explicitly on implicit information retrieved from user actions.
The problems discussed in section 6.4 have probably been solved by current commercial systems thanks to the experience gained during their time online. This is assumed because these systems provide good recommendations to users as well as economic benefits to their founders (otherwise they would not still be online). This project offers a content-based and context-based recommender able to provide new musical content without being influenced by any collaborative-like procedure. The problems related to the content-based model described in section 3.2 have been addressed with custom-designed algorithms (sections 6.3 and 6.4), as discussed in section 6.4.
2. Methodology:
The methodology selected for developing this work is Rapid Application Development, as explained in [10]. Its characteristics fit very well the needs identified after planning the project execution. These characteristics are:
- Iterative
- Based on goals and use cases
- Using GUI tools, CMS, etc.
- Periodic system testing
- Change tracking
Steps of each iteration:
1. Determine objectives, alternatives, and triggers for the iteration.
2. Evaluate alternatives; identify and solve problems.
3. Develop prototypes and verify the results of the previous design.
4. Specify objectives for the next iteration.
3. Development:
Summary
The project development, from the first analysis of the tools needed for implementation to the final testing, has been carried out using the rapid application development method. The most outstanding feature of this methodology lies in its iterative nature. The overall process has been divided into several stages of development. Each stage is determined by the targets set previously, and its final conclusions set out the objectives of the next iteration. Each stage includes activities related to analysis, design, implementation of prototypes and their subsequent testing. The following summarizes the stages that emerged during the project planning and the activities belonging to each one.
3.1.1. First Iteration
1. Objectives:
Last.fm
Last.fm provides developers with a powerful web service with a huge and
comprehensive music catalog. This service provides many details about
groups or artists and their albums, as well as images in various sizes for
artist and album art. It also includes a tagging system in which users can
add personal tags that can be used to classify music privately as a way to
create personal collections. Many tags are applied to each artist, album or song depending on their popularity. Tags help to extract, among other things, the musical genre of an album or song; a serious drawback is that tags are not present for every artist, album or track. Its complete web service and musical catalog make Last.fm one of the most suitable services for the purpose of this project, but it also has some limitations: it does not allow listening to or downloading songs. The API provides no way to preview or download a track; listening is only available through the web site, and only with a paid account.
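Because the genre has to be inferred from free-form tags that may be missing, the extraction can be sketched as below. This assumes the tag names and how many times each was applied have already been fetched for an item; the genre vocabulary and the `GenreFromTags` helper are hypothetical and are not part of the Last.fm API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper: infers a genre from tags already retrieved for an artist, album or track. */
final class GenreFromTags {
    // Illustrative genre vocabulary; tags outside it (e.g. "seen live") are ignored.
    static final Set<String> KNOWN_GENRES = Set.of("rock", "jazz", "pop", "electronic", "metal", "folk");

    /** Most frequently applied tag that is a known genre, or empty when the item has no genre tag. */
    static Optional<String> inferGenre(Map<String, Integer> tagCounts) {
        return tagCounts.entrySet().stream()
                .filter(e -> KNOWN_GENRES.contains(e.getKey().toLowerCase()))
                .max(Map.Entry.comparingByValue())
                .map(e -> e.getKey().toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(inferGenre(Map.of("seen live", 90, "Rock", 70, "pop", 20))); // Optional[rock]
        System.out.println(inferGenre(Map.of())); // Optional.empty
    }
}
```

Returning an `Optional` makes the "tag not present" case explicit, so the caller can fall back to another service that exposes a genre field.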
Play.me
Tests using the Play.me web service showed that not all items announced on its API website are actually supported by the web service. A software layer (wrapper) was implemented to access the web service API in order to evaluate the usefulness of the service for this purpose. The information in the API documentation and the returned-data definitions is not completely accurate. When the information is represented graphically in the browser, the images of artists and albums are crucial: with them, the platform that provides information to the recommender algorithm has a rich visual interface, which makes it more entertaining than a purely text-based web page. Images of musical items are provided as links to the content, but they often fail to load because of broken links or, unfortunately, internal server errors. The Play.me web service includes a very interesting field in its music items, the genre, which is very useful to catalog content and thus to recommend music based on musical genre. Another interesting feature is the possibility of listening to a short 30-second sample of each song.
The most serious drawback found for Play.me is its musical catalog, which, although big, is not comparable with those of Last.fm or MusicBrainz.
Development-framework analysis
32 www.magnolia-cms.com
33 atleap.dev.java.net
34 www.pligg.com
Communication with Last.fm web service
35 Last.fm api documentation: www.lastfm.es/api/intro
36 creativecommons.org/licenses/BSD
37 Last.fm java bindings: code.google.com/p/lastfm-java
38 Servlet technology: www.oracle.com/technetwork/java/index-jsp-135475.html
39 www.ajax.org
40 www.json.org
41 Musicbrainz xml web service: musicbrainz.org/doc/XMLWebService
42 Musicbrainz java code: sourceforge.net/projects/javamusicbrainz
43 Play.me API: lab.playme.com/api_overview
The communication element uses these software components to connect the system with the web services. Upon receiving a request from the web interface, the communication element prepares the requests for each music service. The response from each music service is analyzed to extract the necessary information, and a full data object is then generated with which the web system can perform its activity. The following diagram represents the structure of access to music services and communication with the web component.
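The fan-out and merge performed by the communication element can be sketched as follows. The `MusicServiceClient` interface, the `ArtistInfo` data object and the merge rule (the first service that supplies a field wins, tags are unioned) are assumptions for illustration; they are not the wrappers actually written for Last.fm, MusicBrainz or Play.me.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical data object returned by one service; fields a service cannot fill are null. */
final class ArtistInfo {
    final String name;
    final String imageUrl;
    final String genre;
    final Set<String> tags;

    ArtistInfo(String name, String imageUrl, String genre, Set<String> tags) {
        this.name = name;
        this.imageUrl = imageUrl;
        this.genre = genre;
        this.tags = tags;
    }
}

/** Hypothetical contract that each service wrapper would implement. */
interface MusicServiceClient {
    Optional<ArtistInfo> lookupArtist(String name);
}

/** Fans a request out to every service and merges the answers into one full data object. */
final class CommunicationElement {
    private final List<MusicServiceClient> services;

    CommunicationElement(List<MusicServiceClient> services) {
        this.services = services;
    }

    ArtistInfo fetch(String artist) {
        String image = null;
        String genre = null;
        Set<String> tags = new LinkedHashSet<>();
        for (MusicServiceClient service : services) {
            Optional<ArtistInfo> answer;
            try {
                answer = service.lookupArtist(artist);
            } catch (RuntimeException e) {
                continue; // an abnormal response from one service must not break the page
            }
            if (!answer.isPresent()) continue;
            ArtistInfo info = answer.get();
            if (image == null) image = info.imageUrl; // first service that provides a field wins
            if (genre == null) genre = info.genre;
            if (info.tags != null) tags.addAll(info.tags); // tags are unioned across services
        }
        return new ArtistInfo(artist, image, genre, tags);
    }

    public static void main(String[] args) {
        MusicServiceClient tagsOnly = name -> Optional.of(new ArtistInfo(name, null, null, Set.of("rock")));
        MusicServiceClient withGenre = name -> Optional.of(new ArtistInfo(name, "cover.jpg", "rock", Set.of("indie")));
        ArtistInfo merged = new CommunicationElement(List.of(tagsOnly, withGenre)).fetch("Radiohead");
        System.out.println(merged.imageUrl + " " + merged.genre + " " + merged.tags); // cover.jpg rock [rock, indie]
    }
}
```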
Objectives
45 www.lkozma.net/knn2.pdf
This algorithm, when interpolating class labels, has two parameters:
- k, the number of neighbors used to infer the label;
- the kernel function used to approximate the numeric relationship.
The learning set is the set of rated items, represented by their item surrogates restricted to real-valued features. In order to infer the utility of an unrated item, the vector D of distances between that item and every item in the learning set is computed, using the Euclidean distance L2 (second-order metric). This vector D is normalized to lie in the [0, 1) interval according to the maximum distance. Once the distances are normalized, the utility of the item is linearly interpolated from the utilities of all the elements of the learning set. Note that since utility assignments are either 0 or 1, only the objects that were assigned non-zero utility need to be considered. The full recommendation algorithm pseudocode is shown next. In this version, parameter k equals the total number of rated items in the learning set.
The items eligible for recommendation are ordered according to their predicted utility, and a subset of them is presented to the user. The method requires a training algorithm, which should be designed according to the system's behavior.
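The interpolation can be sketched in Java as below. The exact normalization and kernel are assumptions of this sketch: distances are divided by the maximum distance plus a tiny epsilon so that they fall in [0, 1), each rated item contributes `1 - d` times its 0/1 utility (a linear kernel), and the sum is divided by n, the number of rated items, matching the choice of k equal to all rated items. `KnnUtility` and its methods are hypothetical names.

```java
import java.util.List;

/** Hypothetical sketch of kernel-weighted utility interpolation over rated item surrogates. */
final class KnnUtility {
    /** Euclidean (L2) distance between two real-valued feature vectors of equal length. */
    static double l2(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Predicts the utility of an unrated item from all n rated items (k = n).
     * Distances are divided by (max + epsilon) so they lie in [0, 1); the linear kernel 1 - d
     * weights each rated item's 0/1 utility, and the result is averaged over n.
     */
    static double predict(double[] item, List<double[]> rated, int[] utility) {
        int n = rated.size();
        if (n == 0) return 0.0;
        double[] dist = new double[n];
        double max = 0.0;
        for (int i = 0; i < n; i++) {
            dist[i] = l2(item, rated.get(i));
            max = Math.max(max, dist[i]);
        }
        final double eps = 1e-9; // keeps the normalized maximum strictly below 1
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            if (utility[i] == 0) continue; // zero-utility items add nothing to the sum
            sum += 1.0 - dist[i] / (max + eps);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        List<double[]> rated = List.of(new double[] {0, 0}, new double[] {3, 4});
        int[] utility = {1, 0};
        System.out.println(predict(new double[] {0, 0}, rated, utility)); // 0.5
    }
}
```

Skipping zero-utility items is the shortcut the text describes: with 0/1 utilities they cannot change the sum.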
K-nearest neighbors algorithm
Drawbacks
46 http://www.eng.tau.ac.il/~bengal/BN.pdf
47 Joint probability distribution: www.stat.tamu.edu/~henrik/211S05/notes/chp5.pdf
48 en.wikipedia.org/wiki/Principle_of_indifference
Network parameters estimation
Here each variable stands for a numeric feature belonging to the item surrogate. Since conditional probability distributions are being considered, in practice two sets of parameters have to be estimated, one for each possible outcome of the utility. The simplest approach to obtain these estimates is to choose the parameters that maximize the likelihood of the data, which in this case are the pairs of feature values and utility assignments. Next, good estimates of these parameters must be selected. Many previous works propose the sample mean and sample variance as good estimates of the distribution parameters, assuming hypothetical continuous training data.
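A minimal sketch of this estimation for one numeric surrogate feature follows, assuming Gaussian conditional distributions (the family is implied by the use of the sample mean and variance rather than stated). It computes the maximum-likelihood estimates, whose variance divides by the number of samples rather than by n - 1 as the unbiased sample variance does. The `ConditionalGaussian` class is hypothetical.

```java
/** Hypothetical sketch: ML estimates of a Gaussian P(x | u) for one numeric feature and binary utility u. */
final class ConditionalGaussian {
    final double[] mean = new double[2];     // index 0: utility 0, index 1: utility 1
    final double[] variance = new double[2];

    ConditionalGaussian(double[] values, int[] utilities) {
        for (int u = 0; u <= 1; u++) {
            int count = 0;
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                if (utilities[i] == u) {
                    sum += values[i];
                    count++;
                }
            }
            if (count == 0) throw new IllegalArgumentException("no samples with utility " + u);
            mean[u] = sum / count; // ML estimate of the mean = sample mean
            double squares = 0.0;
            for (int i = 0; i < values.length; i++) {
                if (utilities[i] == u) {
                    double d = values[i] - mean[u];
                    squares += d * d;
                }
            }
            // ML estimate divides by count (the unbiased sample variance would use count - 1);
            // a small floor avoids a zero variance when all samples are equal.
            variance[u] = Math.max(squares / count, 1e-9);
        }
    }

    /** Gaussian density of x given utility outcome u. */
    double density(double x, int u) {
        double d = x - mean[u];
        return Math.exp(-d * d / (2 * variance[u])) / Math.sqrt(2 * Math.PI * variance[u]);
    }

    public static void main(String[] args) {
        ConditionalGaussian g = new ConditionalGaussian(new double[] {1, 3, 10, 14}, new int[] {0, 0, 1, 1});
        System.out.println(g.mean[0] + " " + g.variance[0] + " " + g.mean[1] + " " + g.variance[1]); // 2.0 1.0 12.0 4.0
    }
}
```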
The real problem does have continuous training data: the selections of music items, which arrive continuously at the data collector from the web interface as a consequence of user interaction. The problem is that this unbounded number of inputs must be fixed while performing the calculation, so parameter estimation is still mandatory. Another important drawback for applying this method to the problem is the poor numeric abstraction of the system's item surrogates, which have independent textual meaning and cannot be estimated numerically.
3. Algorithms based on music data
These weights will be recorded in a preference table for the user. After calculating the weight of each music group, the recommender system ranks all the music groups; the music group with the greater weight takes a higher recommendation priority. To avoid recommending a large number of music objects to users, the recommender limits the number of music objects to be retrieved. According to the , different numbers of music objects from the music groups will be recommended.
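Ranking the groups and limiting the number of retrieved objects can be sketched as below. How the number of objects per group is chosen is not specified in the text, so the split proportional to each group's weight, with rounding leftovers handed to the top-ranked groups, is an assumption; `GroupQuota` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: ranks music groups by weight and splits a fixed recommendation budget. */
final class GroupQuota {
    /**
     * Returns group -> number of music objects to recommend, in decreasing-weight order.
     * Assumption: quotas are proportional to weight; slots lost to rounding go to the top-ranked groups.
     */
    static Map<String, Integer> allocate(Map<String, Double> weights, int budget) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(weights.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed()); // higher weight = higher priority
        double total = 0.0;
        for (Map.Entry<String, Double> e : ranked) total += e.getValue();
        Map<String, Integer> quota = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Double> e : ranked) {
            int q = total > 0 ? (int) Math.floor(budget * e.getValue() / total) : 0;
            quota.put(e.getKey(), q);
            assigned += q;
        }
        for (int i = 0; assigned < budget && !ranked.isEmpty(); i = (i + 1) % ranked.size()) {
            quota.merge(ranked.get(i).getKey(), 1, Integer::sum);
            assigned++;
        }
        return quota;
    }

    public static void main(String[] args) {
        System.out.println(allocate(Map.of("rock", 5.0, "jazz", 3.0, "pop", 2.0), 4)); // {rock=3, jazz=1, pop=0}
    }
}
```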
The STA method is based on usage statistics [16]. Its authors define a long-term hot music group as the music group containing the largest number of music objects in the access histories of all users. They also define a short-term hot music group as the music group containing the most music objects in the latest five transactions of the access histories of all users.
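Both STA statistics reduce to counting music objects per group over a window of the access histories. In this hypothetical sketch, a history is a chronological list of transactions, each a list of music object ids, and `groupOf` maps an object to its group; reading "the latest five transactions" as the latest five of each user's history is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the STA long-term and short-term "hot" music groups. */
final class HotGroups {
    /**
     * Counts music objects per group over the latest `window` transactions of every user's history
     * and returns the group with the most objects (null if there are none).
     */
    static String hottest(List<List<List<String>>> histories, Function<String, String> groupOf, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<List<String>> history : histories) {
            int from = Math.max(0, history.size() - window);
            for (List<String> transaction : history.subList(from, history.size())) {
                for (String object : transaction) {
                    counts.merge(groupOf.apply(object), 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    /** Whole access histories of all users. */
    static String longTermHot(List<List<List<String>>> histories, Function<String, String> groupOf) {
        return hottest(histories, groupOf, Integer.MAX_VALUE);
    }

    /** Latest five transactions of each user's access history. */
    static String shortTermHot(List<List<List<String>>> histories, Function<String, String> groupOf) {
        return hottest(histories, groupOf, 5);
    }

    public static void main(String[] args) {
        List<List<String>> user = List.of(
                List.of("jazz-1", "jazz-2", "jazz-3"), List.of("jazz-4", "jazz-5", "jazz-6"),
                List.of("rock-1"), List.of("rock-2"), List.of("rock-3"), List.of("rock-4"), List.of("rock-5"));
        Function<String, String> groupOf = id -> id.split("-")[0]; // hypothetical id scheme "group-n"
        System.out.println(longTermHot(List.of(user), groupOf) + " " + shortTermHot(List.of(user), groupOf)); // jazz rock
    }
}
```

The example shows why both statistics are useful: the long-term count still favors an old taste, while the short-term window already reflects the recent shift.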
4. First Algorithm proposal
The system needs to collect information about user interactions for the recommender. For this, the user interface is designed so that it captures data related to the clicked items.
To build a context that defines user preferences, every user interaction is stored in a profile-related structure, independent for each user, containing all the selections the user made during the session. The recommendation is made when this structure holds enough information to provide a possible recommendation. When the stored information actually provides an adequate pattern can be decided arbitrarily, through a threshold, depending on the desired behavior. With a higher threshold, the recommender engine has more data to generate recommendations, which are therefore more satisfactory in terms of musical coverage; as a counterpoint, the user profile evolves more slowly over time. The opposite behavior is obtained by selecting a lower threshold. A recommendation is executed each time this threshold is reached, and the input data-set used to perform the recommendation grows iteratively.
Let us represent the recommendation input-set for the th time the threshold is reached, and the set of stored item surrogates.
For each iteration, the threshold is refreshed as follows:
The algorithm uses statistical techniques to determine the most selected artist over the data-set. If some artist is dominantly selected, the recommendation is made about this artist: an album compilation is built by selecting some albums from this artist and some from its similar artists. This artist is stored in the user profile as an artist that could interest the user. If no dominant artist is found in the collected data, the recommendation is based on musical genre. The statistics applied to the collected genre data retrieve the hit rate of the most “hit” genre and are used as follows:
- If the most hit genre is over 66% of hits, this genre is assumed to be interesting for the user, so the recommendation is based on artists related to that musical genre.
- If it is over 33% of hits, the inferred rule is that this genre is not a decisive element able to represent the interaction mood; thus, a set of musical genres related to the given genre is retrieved, and artists are then retrieved for each musical genre in that set.
- If it is under 33% of hits, it is inferred that the user has not yet taken a constant listening attitude, perhaps because he has not spent enough time interacting with something he finds interesting. The recommendation then tries to provide the user with new musical genres he has not yet checked.
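The decision rules above can be sketched as follows. The 66% and 33% cut-offs come from the text; the share that makes an artist "dominant" is not stated, so the strict-majority threshold used here is an assumption, as are the `FirstAlgorithm` and `Strategy` names. The sketch only chooses the strategy; fetching albums, similar artists or related genres from the web services is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the first algorithm's decision rules over the collected surrogates. */
final class FirstAlgorithm {
    enum Strategy { DOMINANT_ARTIST, GENRE_ARTISTS, RELATED_GENRES, NEW_GENRES }

    /** A clicked item reduced to its surrogate: artist name and (possibly missing) genre. */
    static final class Surrogate {
        final String artist;
        final String genre;

        Surrogate(String artist, String genre) {
            this.artist = artist;
            this.genre = genre;
        }
    }

    // Assumption: the threshold for a "dominant" artist is not given; a strict majority is used here.
    static final double DOMINANT_ARTIST_SHARE = 0.5;

    static Strategy decide(List<Surrogate> dataSet) {
        if (dataSet.isEmpty()) return Strategy.NEW_GENRES;
        if (topShare(dataSet, true) > DOMINANT_ARTIST_SHARE) return Strategy.DOMINANT_ARTIST;
        double genreRate = topShare(dataSet, false); // hit rate of the most "hit" genre
        if (genreRate > 0.66) return Strategy.GENRE_ARTISTS;  // artists of that genre
        if (genreRate > 0.33) return Strategy.RELATED_GENRES; // artists of related genres
        return Strategy.NEW_GENRES;                           // genres not yet checked
    }

    /** Fraction of all collected hits that go to the most frequent artist (or genre). */
    private static double topShare(List<Surrogate> dataSet, boolean byArtist) {
        Map<String, Integer> hits = new HashMap<>();
        for (Surrogate s : dataSet) {
            String key = byArtist ? s.artist : s.genre;
            if (key != null) hits.merge(key, 1, Integer::sum);
        }
        int max = hits.values().stream().max(Integer::compare).orElse(0);
        return (double) max / dataSet.size();
    }

    public static void main(String[] args) {
        List<Surrogate> hits = List.of(new Surrogate("A", "rock"), new Surrogate("B", "rock"),
                new Surrogate("C", "rock"), new Surrogate("D", "jazz"));
        System.out.println(decide(hits)); // GENRE_ARTISTS
    }
}
```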
The selected artist and the selected genre, if present, are stored in the user profile. This information is not used to perform future recommendations but to load a user-related home page the next time the user starts a session, as a memory of the previous experience that does not affect the current decisions of the recommender. This way, the user is offered an alternative way to start interacting with the system through these memories, instead of searching for a concrete keyword as on the very first visit. This feature increases the general efficacy of the application by offering the most promising results from the last session as a starting point for the current one.
Similar artists' evaluation
49 Project website: www.audioscrobbler.net
50 www.youtube.com
Fig. 7: Schematic figure of the web system's structure
Redefinition of user interaction
Conclusions
Objectives
The main objective of this iteration is to enhance the whole system, which works properly but can still be greatly improved before being accepted as the final system. To achieve better recommendations, an alternative to the proposed algorithm is to be found. Furthermore, the persistence system can be extended to store part of the information the system needs to create recommendations; this extension will be studied during this iteration, assessing its advantages and disadvantages.
A better user interface is to be designed; once the problems of the previous design have been identified, improving it should not be difficult.
Controlling all the bugs and exceptions that reach the web interface is another objective of this iteration. Some events fired by abnormal web-service responses are not easy to handle, but with improved exception control no error pop-ups or blank result pages should be displayed.
Algorithm analysis
Most of the reviewed systems rely on collaborative filtering to solve problems like the unrated item problem described in [7]. The previous algorithm, which could be described as content-based with some influence from context-based recommenders, overcomes this problem by including genres or artists the user has not previously interacted with (or rated, in other approaches). This is done when no relevant information is found in the current interaction data-set: music genres that are not present in this data-set are selected, and the recommendation is built on these genres.
Focusing the problem
The problem still to be solved lies in the fact that the retrieved genres could be completely disliked by the user. This will be called the problem of blind recommendation. It is a direct consequence of the recommendation method itself, which completely ignores the user's preferences stored in the profile and looks for new genres using the web services. Including, in this case, some details about the user's preferences, inferred with different techniques or by extending the present approach, may be a significant improvement.
Solving alternatives
Many different alternatives for implementing the recommender procedure can be found in [14], often based on combinations of collaborative-like and content-based-like approaches. For this recommendation issue, it seems interesting to score each user selection, for example by giving higher ranks to items clicked more recently while lowering the ranks of those most distant in time. With this ranking system, items that were clicked often in the past can be reloaded when no relevant information is found in the monitored interaction. Thus, the blind recommendation can be based on some item that may not seem interesting to the user now but that, as can at least be inferred, was interesting at some moment, avoiding recommendations completely uncorrelated with the user's preferences.
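The recency-based ranking can be sketched as below. The exponential decay with a configurable half-life is an assumption (the text only states that recent clicks rank higher and distant ones lower), and `RecencyScorer` is a hypothetical name. The highest-scored past item serves as the seed for a blind recommendation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: recent clicks weigh more than distant ones when scoring past selections. */
final class RecencyScorer {
    /** One click on an item at a given time (milliseconds). */
    static final class Click {
        final String item;
        final long timeMillis;

        Click(String item, long timeMillis) {
            this.item = item;
            this.timeMillis = timeMillis;
        }
    }

    private final double halfLifeMillis;

    /** halfLifeMillis: age at which a click counts half as much as a brand-new one (assumed parameter). */
    RecencyScorer(double halfLifeMillis) {
        this.halfLifeMillis = halfLifeMillis;
    }

    /** Each click adds 0.5^(age / halfLife) to its item: 1 when new, 0.5 after one half-life, and so on. */
    Map<String, Double> scores(List<Click> clicks, long nowMillis) {
        Map<String, Double> score = new HashMap<>();
        for (Click c : clicks) {
            double age = Math.max(0L, nowMillis - c.timeMillis);
            score.merge(c.item, Math.pow(0.5, age / halfLifeMillis), Double::sum);
        }
        return score;
    }

    /** Highest-scored past item: a seed for blind recommendations when the current interaction says little. */
    String fallbackSeed(List<Click> clicks, long nowMillis) {
        return scores(clicks, nowMillis).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        RecencyScorer scorer = new RecencyScorer(10);
        List<Click> clicks = List.of(new Click("old", 0), new Click("old", 0), new Click("old", 0), new Click("new", 100));
        System.out.println(scorer.fallbackSeed(clicks, 100)); // new
    }
}
```

A single recent click outweighs three clicks made ten half-lives ago, which is the behavior the ranking is meant to capture.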
Analysis of user management solutions
There are several ways to handle user management in a web site: a variety of pluggable modules, Java libraries, etc. The system needs an element that checks the user's identity, preventing any user without valid credentials from accessing application data. This element must be incorporated into the system without major modifications to either the element or the system.
Several modular solutions have been studied, assessing installation complexity, ease of configuration, the type of server they use and incompatibility problems with the database system version used. Among them, the most outstanding are Josso, JFacets and OSUser, all of them “easy” to install and configure. Josso (footnote 57) was chosen because of its apparent simplicity and its support for various security protocols and, as its main feature, because the functions it provides match the needs of this project and are written in Java, the language of the J2EE platform used by the web system.
57 www.josso.org
Design of new algorithm
For the algorithm design, the previous problems are first determined precisely; then solutions are provided for each of them in order to significantly reduce the previous shortcomings in recommendation precision.
A first problem identified is that the implemented algorithm sometimes completely ignores the user's preferences in an attempt to provide undiscovered music genres. This behavior is not necessarily counter-productive, because many genres are provided and it is very improbable that every one of them is completely disliked by the user. However, it still breaks with a finer recommendation policy and may cross a conceptual border that should be strictly respected.
A limitation of the previously implemented model is the impossibility of performing searches, for example by genre, among the previously loaded results. Last.fm can be asked for an artist collection given a genre, but the results are ranked, and the method does not retrieve all the available elements, only the top ones tagged with that genre tag. Other information about user interaction can be stored in the database system: each time a user interacts with some music element classified under a tag, this interaction is stored in the database. The more times the same interaction occurs, the more important that element becomes for that user. The genre of the element surrogate can also be stored, in order to classify information for further use by the recommender system. Upon each user interaction, the interacted element is associated with its tag and with this user. This association can be stored as the system's own genre classification for the interacted artists.
It would then be easier for the recommender to collect information about the user's likes, and even to start a collaborative-like environment in which it is known which user has interacted with which artist and when. Further information, such as the playback environment, can be stored to specify more clearly the context in which the interaction occurs. By including this new feature, some memory of the evolution of the user's preferences becomes possible. Using an artist-genre-user-time relationship table, it is feasible to keep a long-term memory of user interaction, while the previous monitoring structure, which handles mid-term memory, is still used.
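An in-memory stand-in for the artist-genre-user-time relationship table can be sketched in Java as follows. In the real system these rows would live in the database; the `InteractionTable` class and its query methods are hypothetical and only show how importance counts, the system's own genre classification and search by genre could be derived from the stored interactions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory stand-in for the artist-genre-user-time relationship table. */
final class InteractionTable {
    /** One row: which user interacted with which artist, under which genre tag, and when. */
    static final class Row {
        final String user;
        final String artist;
        final String genre;
        final long time;

        Row(String user, String artist, String genre, long time) {
            this.user = user;
            this.artist = artist;
            this.genre = genre;
            this.time = time;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    /** Stores one interaction (a database insert in the real system). */
    void record(String user, String artist, String genre, long time) {
        rows.add(new Row(user, artist, genre, time));
    }

    /** Number of times a user interacted with an artist: the more interactions, the more important. */
    int importance(String user, String artist) {
        int count = 0;
        for (Row r : rows) {
            if (r.user.equals(user) && r.artist.equals(artist)) count++;
        }
        return count;
    }

    /** The system's own genre classification: every tag under which the artist was interacted with. */
    Set<String> genresOf(String artist) {
        Set<String> genres = new HashSet<>();
        for (Row r : rows) {
            if (r.artist.equals(artist) && r.genre != null) genres.add(r.genre);
        }
        return genres;
    }

    /** Search by genre among stored interactions, independent of Last.fm's top-ranked results. */
    Set<String> artistsTagged(String genre) {
        Set<String> artists = new HashSet<>();
        for (Row r : rows) {
            if (genre.equals(r.genre)) artists.add(r.artist);
        }
        return artists;
    }

    public static void main(String[] args) {
        InteractionTable table = new InteractionTable();
        table.record("u1", "Muse", "rock", 1L);
        table.record("u1", "Muse", "rock", 2L);
        table.record("u2", "Muse", "alternative", 3L);
        System.out.println(table.importance("u1", "Muse")); // 2
    }
}
```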
Assumptions
A hit is a click on a musical surrogate (footnote 58).
The long-term memory (LTM) is stored permanently. It stores the artist relations built from the interactions of all the system's users.
The mid-term memory (MTM) is stored temporarily until the HTTP session expires.
Each click refers to a musical surrogate, independently of the item (artist, album or track) that generated it. The surrogate includes the artist name and the musical genre. For reasons beyond this system's capabilities, the year or era of the musical item could not be taken into account for recommendation purposes, because it is absent from most of the musical elements checked.
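The two memories defined in these assumptions can be sketched as below. The LTM is modeled here as shared artist-to-genre hit counts and the MTM as the list of hits of the current session; both the data layout and the `Memories` class are assumptions. In a servlet container, `sessionExpired()` could be called from the `sessionDestroyed` method of an `HttpSessionListener`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the long-term (LTM) and mid-term (MTM) memories. */
final class Memories {
    /** Surrogate of a click: artist name and genre, whichever item (artist, album, track) was clicked. */
    static final class Surrogate {
        final String artist;
        final String genre;

        Surrogate(String artist, String genre) {
            this.artist = artist;
            this.genre = genre;
        }
    }

    /** LTM: permanent artist -> genre -> hit count, shared by all users (a database table in practice). */
    static final Map<String, Map<String, Integer>> LTM = new HashMap<>();

    /** MTM: hits of the current HTTP session only. */
    private final List<Surrogate> mtm = new ArrayList<>();

    /** Registers one hit in both memories. */
    void hit(Surrogate s) {
        mtm.add(s);
        synchronized (LTM) { // sessions of different users update the shared LTM concurrently
            LTM.computeIfAbsent(s.artist, a -> new HashMap<>()).merge(s.genre, 1, Integer::sum);
        }
    }

    /** Read-only view of the current session's hits. */
    List<Surrogate> midTerm() {
        return Collections.unmodifiableList(mtm);
    }

    /** To be called when the HTTP session expires: the MTM is discarded, the LTM is kept. */
    void sessionExpired() {
        mtm.clear();
    }

    public static void main(String[] args) {
        Memories session = new Memories();
        session.hit(new Surrogate("Portishead", "trip-hop"));
        session.hit(new Surrogate("Portishead", "trip-hop"));
        session.sessionExpired();
        System.out.println(session.midTerm().size() + " " + LTM.get("Portishead").get("trip-hop")); // 0 2
    }
}
```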
60 Javascript definition: en.wikipedia.org/wiki/JavaScript
61 jquery.com
62 www.mysql.com
63 www.phpmyadmin.net
Conclusions
Project objectives
The system development has been completed successfully. Its usefulness for discovering new music has been tested, meeting the goals stated in the project's objectives.
It has been shown that a huge information system is not needed to provide the user with new music that matches his likes, by taking advantage of available web services that provide complete music catalogs for non-commercial purposes.
However, it was observed that developing a recommendation system with commercial features actually requires an extensive relational database to store the previously cataloged music. Creating a good relational music database, relating artists, albums, musical genres and eras, could greatly expand the capabilities of a recommender algorithm to help users discover new music.
Purpose remarks
An important point is that, while all commercial recommenders claim to offer personalized recommendations, the true nature of their predictions might be biased by the taste of the majority of users instead of by the actual user's preferences.
This was an important consideration when defining the project's work-lines: to focus mainly on authentic user preferences and, furthermore, to take special care of the earliest implicit hints extracted from user actions. The real motivation that pushed me to consider this approach should be mentioned:
- On one hand, my personal view, before and after this thesis work, is that music recommenders should meet the user's preferences, whatever approach is implemented to do so. Relying on collaborative filtering techniques is risky because it slightly diverges from pure user interests by abstracting the user preference pattern to match some similar users. This abstraction is the actual loss of information that causes the deviation from the initial purpose.
- On the other hand, collaborative filtering is a way of abstracting some kind of hierarchical musical order in which users implicitly decide what the best music is while their actions are monitored. This order is strongly determined by the music industry and its merchandising policies: the promotion of some economically remarkable groups or artists leads people to remember them subliminally, thus seeding in customers' minds a spot of interest related to these groups or artists.
Technical remarks
Popularizing the use of acoustic fingerprints is one of the most promising options for the future of music recommendation. Combining collaborative recommendation techniques with data-mining techniques applied to the data collected from user monitoring is an interesting path to infer the user's preferences optimally. Subsequently, adding the ability to recommend music that is similar (or identical, or different) according to acoustic features means that future recommenders could provide us with the music we really want.
Utopian facts
The ultimate recommender could provide the user with a different rock song (to give a concrete example) featuring a slightly faster tempo than the previous song but with more folk-like instrumentation. Such a query might be common for future recommenders.
4. Contributions
Reviews
A complete review of the state of the art of recommender systems, focusing especially on music recommender systems, has been performed, aimed at settling the project's analysis and design strategies and at correctly defining the work outlines.
In order to provide the user with a free, complete musical catalog, a review of music web services has been performed. This work assessed which services provide the best-suited solutions, and a deeper analysis of them was carried out by implementing prototype wrappers to enable communication with this web system. As a result of this deeper analysis, the web system successfully provides music data from the selected web services for use within the implemented environment.
Code
Code reusability was a key decision factor in the framework selection, so the project was implemented in the Java programming language. This time-saving choice benefited the project when some BSD-licensed software components were found to be written in the selected language. Some custom adaptations and code modifications were needed to plug these external modules into the project successfully.
Unfortunately, most of the code needed to build this work had not been implemented previously. A new wrapper has been implemented supporting the methods required to retrieve musical data from the Play.me web service.
Another wrapper allows communication with the YouTube video service. Its main feature is track-name parsing, aimed at selecting the best-suited video from a feed composed of thousands of possible matches.
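Selecting the best video for a track can be sketched as follows. The sketch does not call any YouTube API: it assumes the feed titles have already been fetched, and it uses Jaccard similarity between the word tokens of the artist and track name and those of each title, a substitute technique rather than the project's actual parser. `VideoMatcher` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: chooses the feed title that best matches an artist and track name. */
final class VideoMatcher {
    /** Lowercases the text and splits it into tokens made of letters and digits. */
    static Set<String> tokens(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
        if (cleaned.isEmpty()) return new HashSet<>();
        return new HashSet<>(Arrays.asList(cleaned.split(" ")));
    }

    /** Jaccard similarity: size of the intersection divided by size of the union. */
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Index of the best-matching title, or -1 if no title shares a token with the query. */
    static int bestMatch(String artist, String track, List<String> titles) {
        Set<String> query = tokens(artist + " " + track);
        int best = -1;
        double bestScore = 0.0;
        for (int i = 0; i < titles.size(); i++) {
            double score = similarity(query, tokens(titles.get(i)));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> feed = List.of("Creep cover by someone", "Radiohead - Creep (Official Video)", "Radiohead - Karma Police");
        System.out.println(bestMatch("Radiohead", "Creep", feed)); // 1
    }
}
```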
Recommender algorithms
Once a data domain was defined, a first custom-designed algorithm was implemented from scratch. This algorithm (section 6.3, Second Iteration) was designed to cover well-defined recommendation constraints using custom-defined item surrogates. Its helpfulness was tested empirically with random users, and these tests also revealed some limitations and deviations from the main recommendation purpose. A further study, based on the current data-object surrogates and system characteristics, gave rise to an evolved version of the recommendation algorithm. The second algorithm proposal solves problems that were observed theoretically after successful empirical tests of the first algorithm (section 4, First Algorithm proposal).
The second algorithm was designed using algebraic grouping theory to solve some use cases in which the recommendation was ignoring a few possible positive responses. By reviewing previous works, the recommender problems related to the selected approaches (see collaborative and content-based filtering in section 3.2, Approaches to music recommendation) were identified. This second version has been designed to solve these problems in an effort to develop a fine recommender. Grouping theory suits this recommendation approach well: the collection-oriented structure of Last.fm responses can be abstracted as a group whose membership constraints are based on the musical features of the objects.
[1] Montaner, M.; Lopez, B.; de la Rosa, J. L. "A Taxonomy of Recommender Agents on the Internet". Artificial Intelligence Review 19 (4): 285–330, June 2003. http://eia.udg.es/~blopez/pub.html (accessed 23-10-2010).
[2] Search Inside the Music, a project of Sun Labs, Burlington, MA. http://labs.oracle.com/projects/dashboard.php?id=153 (accessed 23-10-2010).
[3] http://www.moyak.com/papers/collaborative-filtering.html (accessed 23-10-2010).
[4] van Meteren, R. (NetlinQ Group, Amsterdam); van Someren, M. (University of Amsterdam, The Netherlands). "Using Content-Based Filtering for Recommendation".
[5] http://www.pocket-lint.com/news/29588/itunes-most-popular-music-service (accessed 23-10-2010).
[6] Adomavicius, G.; Tuzhilin, A. (IEEE members). "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions".
[7] Cai, J.; Francis, J.; Gheysens, S. "Creating a Hybrid Music Recommendation System from Content and Social-Based Algorithms". Rutgers University, GSET '09.
[8] http://en.wikipedia.org/wiki/Acoustic_fingerprint (accessed 24-10-2010).
[9] http://marsyas.info (accessed 24-10-2010).
[10] http://en.wikipedia.org/wiki/Timbre (accessed 26-10-2010).
[11] Belkin, N. J.; Croft, W. B. "Information filtering and information retrieval: Two sides of the same coin?". Communications of the ACM 35 (12): 29–39, December 1992.
[12] "Foafing the music: music recommendation system based on RSS feeds and user preferences".
[13] Rashid, A. M.; Karypis, G. "Influence in Ratings-Based Recommender Systems: An Algorithm-Independent Approach".
[14] Rich, E. "User modeling via stereotypes". Cognitive Science, 329–354, 1979.
[15] Powell, M. Approximation Theory and Methods. Cambridge University Press, 1981.
[16] Murthi, B. P. S.; Sarkar, S. "The role of the management sciences in research on personalization". Management Science 49 (10): 1344–1362, 2003.
[17] Sorensen, H.; Mc Elligott, M. "PSUN: A Profiling System for Usenet News (Extended Abstract)". Computer Science Department, University College, Cork, Ireland.
[18] Cataltepe, Z.; Altinel, B. "Music Recommendation by Modeling User's Preferred Perspectives of Content, Singer/Genre and Popularity". Istanbul Technical University Computer, Istanbul, Turkey (Review section).
[19] "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments". Department of Computer & Information Science.
[20] Chen, A. Y.-A.; McLeod, D. "Collaborative Filtering for Information Recommendation Systems". University of Southern California, Los Angeles, California, USA.
[21] Vidal Ruiz, E. "An algorithm for finding nearest neighbors in constant average time". Universidad de Valencia, Spain, 1995.
[22] Najarian, K.; Darvish, A. "Maximum Likelihood Estimation". Wiley Encyclopedia of Biomedical Engineering, 2006.
[23] Pazzani, M. J. (Rutgers University, ASBIII, New Brunswick, NJ); Billsus, D. (FX Palo Alto Laboratory, Inc., Palo Alto, CA). "Content-based recommendation systems".
6. Referenced links
➢ Suggestions / Recommendations:
(By the Course Faculty)