Recommender System
A recommender system, or recommendation system (sometimes replacing 'system' with a
synonym such as platform or engine), is a subclass of information filtering system that seeks to
predict the "rating" or "preference" a user would give to an item.
Recommender systems are used in a variety of areas, with commonly recognised examples
taking the form of playlist generators for video and music services, product recommenders for
online stores, or content recommenders for social media platforms and open web content
recommenders. These systems can operate using a single input, like music, or multiple inputs
within and across platforms like news, books, and search queries. There are also popular
recommender systems for specific topics like restaurants and online applications. Recommender
systems have also been developed for exploring research articles, experts, collaborators, and
financial services.
In all of these problems, the common thread is that they aim to increase customer satisfaction
and in turn drive business in the form of increased commissions, greater sales, etc. Whatever the
use case may be, the data typically records, for each interaction: the user, the item, the rating or
interaction between them, and any other features like the details of the product or demographics of the
customer.
In this section we introduce a model for recommendation systems, based on a utility matrix of
preferences.
In a recommendation-system application there are two classes of entities, which we shall refer to
as users and items. Users have preferences for certain items, and these preferences must be
teased out of the data. The data itself is represented as a utility matrix, giving for each user-item
pair a value that represents what is known about the degree of preference of that user for that
item. Values come from an ordered set, e.g., integers 1-5 representing the number of stars that
the user gave as a rating for that item. We assume that the matrix is sparse, meaning that most
entries are "unknown". An unknown rating implies that we have no explicit information about the
user's preference for the item.
Example: In Fig. 9.1 we see an example utility matrix, representing users' ratings of movies on a
1-5 scale, with 5 the highest rating. Blanks represent the situation where the user has not rated
the movie. The movie names are HP1, HP2, and HP3 for Harry Potter I, II, and III; TW for
Twilight; and SW1, SW2, and SW3 for Star Wars episodes 1, 2, and 3. The users are represented
by capital letters A through D.
Notice that most user-movie pairs have blanks, meaning the user has not rated the movie. In
practice, the matrix would be even sparser, with the typical user rating only a tiny fraction of all
available movies.
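As a concrete sketch, such a utility matrix can be represented as a NumPy array with NaN standing in for blanks; the ratings below are illustrative values on the 1-5 scale, not the exact entries of Fig. 9.1:

```python
import numpy as np

users = ["A", "B", "C", "D"]
movies = ["HP1", "HP2", "HP3", "TW", "SW1", "SW2", "SW3"]
blank = np.nan  # an unrated entry: unknown, not zero

# Illustrative ratings on a 1-5 scale; most entries are blank (sparse).
utility = np.array([
    [4,     blank, blank, 5,     1,     blank, blank],  # user A
    [5,     5,     4,     blank, blank, blank, blank],  # user B
    [blank, blank, blank, 2,     4,     5,     blank],  # user C
    [blank, blank, 3,     blank, blank, blank, 3],      # user D
])

known = ~np.isnan(utility)
print(known.sum(), "known ratings out of", utility.size)
```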
The goal of a recommendation system is to predict the blanks in the utility matrix. For example,
would user A like SW2? There is little evidence from the tiny matrix in Fig. 9.1. We might design
our recommendation system to take into account properties of movies, such as their producer,
director, stars, or even the similarity of their names. If so, we might then note the similarity
between SW1 and SW2, and then conclude that since A did not like SW1, they were unlikely to
enjoy SW2 either. Alternatively, with much more data, we might observe that the people who
rated both SW1 and SW2 tended to give them similar ratings. Thus, we could conclude that A
would also give SW2 a low rating, similar to A's rating of SW1.
Methods
There are two basic architectures for a recommendation system:
1. Content-Based systems focus on the properties of items. Similarity of items is determined by
measuring the similarity of their properties.
2. Collaborative-Filtering systems focus on the relationship between users and items. Similarity
of items is determined by the similarity of the ratings of those items by the users who have rated
both items.
Consider an example of recommending news articles to users. Let's say we have 100 articles
and a vocabulary of size N. We first compute the tf-idf score of each word for every
article. Then we construct 2 vectors:
1. Item vector: This is a vector of length N. It contains 1 for words that have a high tf-idf
score in that article, otherwise 0.
2. User vector: Again a 1xN vector. For every word, we store the probability of the word occurring
(i.e., having a high tf-idf score) in articles that the user has consumed. Note here that the user
vector is based on the attributes of the item (tf-idf scores of words in this case).
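A minimal sketch of constructing these two profile vectors, assuming the tf-idf matrix has already been computed (random stand-in scores and a hypothetical "high tf-idf" threshold of 0.8 below):

```python
import numpy as np

rng = np.random.default_rng(0)
n_articles, vocab_size = 100, 50

# Stand-in for a precomputed tf-idf matrix (articles x vocabulary).
tfidf = rng.random((n_articles, vocab_size))

# Item vectors: 1 where the word's tf-idf score is "high", else 0.
item_vectors = (tfidf > 0.8).astype(float)

# User vector: for each word, the fraction of the user's consumed articles
# in which that word had a high tf-idf score.
consumed = [3, 17, 42]  # hypothetical indices of articles the user has read
user_vector = item_vectors[consumed].mean(axis=0)
```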
Once we have these profiles, we compute similarities between the users and the items. The
items recommended are the ones that 1) the user has the highest similarity with, or 2)
have the highest similarity with the other items the user has read. There are multiple ways of doing
this. Let's look at 2 common methods:
1. Cosine Similarity:
To compute similarity between the user and item, we simply take the cosine similarity between
the user vector and the item vector. This gives us user-item similarity.
To recommend items that are most similar to the items the user has bought, we compute cosine
similarity between the articles the user has read and other articles. The ones that are most
similar are recommended. Thus this is item-item similarity.
Cosine similarity is best suited when you have high dimensional features, especially in
information retrieval and text mining.
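A minimal cosine-similarity sketch; the user and item vectors below are hypothetical stand-ins for the tf-idf-based profiles described above:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0 if either is all-zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

user = np.array([0.9, 0.1, 0.0, 0.4])   # hypothetical user profile
items = np.array([[1, 0, 0, 1],          # article 0's binary item vector
                  [0, 1, 1, 0]])         # article 1's binary item vector

# User-item similarity: recommend the article most similar to the user.
scores = [cosine_similarity(user, item) for item in items]
best = int(np.argmax(scores))
```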
2. Jaccard similarity:
Also known as intersection over union, it is computed for two sets A and B as
J(A, B) = |A ∩ B| / |A ∪ B|.
This is used for item-item similarity. We compare item vectors with each other and return the
items that are most similar.
Jaccard similarity is useful only when the vectors contain binary values. If they have rankings or
ratings that can take on multiple values, Jaccard similarity is not applicable.
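A small sketch of Jaccard similarity over binary item vectors (the vectors below are hypothetical):

```python
def jaccard(a, b):
    """Intersection over union of two binary vectors given as 0/1 sequences."""
    intersection = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return intersection / union if union else 0.0

item_x = [1, 1, 0, 1, 0]
item_y = [1, 0, 0, 1, 1]
sim = jaccard(item_x, item_y)  # 2 shared words / 4 total words = 0.5
```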
In addition to the similarity methods, for content-based recommendation, we can treat
recommendation as a simple machine learning problem. Here, regular machine learning
algorithms like random forest, XGBoost, etc., come in handy.
This method is useful when we have a whole lot of 'external' features, like weather conditions,
market factors, etc., which are not a property of the user or the product and can be highly
variable. For example, the previous day's opening and closing prices play an important role in
determining the profitability of investing in a particular stock. This comes under the class of
supervised problems where the label is whether the user liked/clicked on a product or not (0/1),
or the rating the user gave that product, or the number of units the user bought.
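As a sketch of this supervised framing, the snippet below fits a minimal hand-rolled logistic-regression classifier to synthetic user/product/external features with a 0/1 clicked label; in practice an off-the-shelf model like random forest or XGBoost would fill the same role. All data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each row: [user feature, product feature, external feature, e.g. the
# prior day's price change]; label = 1 if the user clicked/bought, else 0.
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.8])          # hidden "true" preference weights
y = (X @ true_w + rng.normal(scale=0.3, size=200) > 0).astype(float)

# Gradient descent on the mean logistic loss.
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))  # clip avoids overflow
    w -= 0.1 * X.T @ (p - y) / len(y)

accuracy = (((X @ w) > 0) == (y == 1)).mean()
```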
Collaborative Filtering
Collaborative filtering is based on the assumption that people who agreed in the past will agree in
the future, and that they will like similar kinds of items as they liked in the past. The system
generates recommendations using only information about rating profiles for different users or
items. By locating peer users/items with a rating history similar to the current user or item, it
generates recommendations using this neighborhood.
The underlying assumption of the collaborative filtering approach is that if A and B buy similar
products, A is more likely to buy a product that B has bought than a product which a random
person has bought. Unlike content-based filtering, there are no features corresponding to users or items
here. All we have is the utility matrix. This is what it looks like:
A, B, C, D are the users, and the columns represent movies. The values represent ratings (1-5)
a user has given a movie. In other cases, these values could be 0/1 depending on whether the
user watched the movie or not.
When building a model from a user's behavior, a distinction is often made between explicit
forms of data collection (e.g., asking a user to rate an item) and implicit forms (e.g., observing
the items a user views or purchases). However it is collected, collaborative filtering faces
several well-known challenges:
Cold start: For a new user or item, there isn't enough data to make accurate
recommendations.
Scalability: In many of the environments in which these systems make
recommendations, there are millions of users and products. Thus, a large amount of
computation power is often necessary to calculate recommendations.
Sparsity: The number of items sold on major e-commerce sites is extremely large.
The most active users will only have rated a small subset of the overall database.
Thus, even the most popular items have very few ratings.
One of the most famous examples of collaborative filtering is item-to-item collaborative filtering
(people who buy x also buy y), an algorithm popularized by Amazon.com's recommender
system.
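A small sketch of the item-to-item idea: treat each item's column of ratings as a vector (unrated = 0) and compare columns with cosine similarity, so items rated similarly by the same users come out similar. The ratings below are hypothetical:

```python
import numpy as np

# Rows: users, columns: items; 0 marks an unrated entry (hypothetical data).
R = np.array([[4, 5, 0, 0],
              [5, 5, 1, 0],
              [0, 0, 4, 5],
              [1, 0, 5, 4]], dtype=float)

def item_similarity(R, x, y):
    """Cosine similarity between the rating columns of items x and y."""
    cx, cy = R[:, x], R[:, y]
    denom = np.linalg.norm(cx) * np.linalg.norm(cy)
    return float(cx @ cy / denom) if denom else 0.0

sim_01 = item_similarity(R, 0, 1)  # items liked by the same users: high
sim_02 = item_similarity(R, 0, 2)  # items liked by disjoint users: low
```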
There are 2 broad categories that collaborative filtering can be split into: memory-based
(neighborhood) methods and model-based methods, such as matrix factorization.
In the memory-based approach, let mean_i be the mean rating that user i has given all the
movies he/she has rated. Using this, we estimate user i's rating of movie k as follows:
rating(i, k) = mean_i + [ Σ_a sim(a, i) × (r(a, k) − mean_a) ] / Σ_a |sim(a, i)|
where the sums run over users a who have rated movie k. Similarity between users a and i can
be computed using any method like cosine similarity, Jaccard similarity, Pearson's correlation
coefficient, etc.
These results are very easy to create and interpret, but once the data becomes too sparse,
performance becomes poor.
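A sketch of this neighborhood (user-based) prediction, using a Pearson-style similarity computed over the movies two users have both rated; the small utility matrix is hypothetical:

```python
import numpy as np

# Utility matrix (users x movies), NaN = unknown rating; hypothetical data.
R = np.array([[4,      np.nan, 1,      np.nan],
              [5,      4,      np.nan, 2],
              [4,      5,      1,      1]])

def predict(R, i, k):
    """Mean rating of user i plus similarity-weighted deviations of the
    other users' ratings of movie k from their own means."""
    means = np.array([np.nanmean(row) for row in R])
    num = den = 0.0
    for a in range(len(R)):
        if a == i or np.isnan(R[a, k]):
            continue
        both = ~np.isnan(R[i]) & ~np.isnan(R[a])  # movies both have rated
        if both.sum() < 2:
            continue
        u, v = R[i, both] - means[i], R[a, both] - means[a]
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        if denom == 0:
            continue
        sim = u @ v / denom
        num += sim * (R[a, k] - means[a])
        den += abs(sim)
    return means[i] + num / den if den else means[i]

pred = predict(R, 0, 1)  # estimate user 0's rating of movie 1
```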
In the model-based approach, our utility matrix decomposes into U and V, where U represents
the users and V represents the movies in a low-dimensional space. This can be achieved by using matrix
decomposition techniques like SVD or PCA, or by learning the 2 embedding matrices using
neural networks with the help of some optimizer like Adam, SGD, etc.
For a user i and every movie j, we just need to compute the predicted rating y(i, j) — the dot
product of user i's vector and movie j's vector — and recommend the movies
with the highest predicted rating. This approach is most useful when we have a ton of data and it
has high sparsity. Matrix factorization helps by reducing the dimensionality, hence making
computation faster. One disadvantage of this method is that we tend to lose interpretability, as we
do not know what exactly the elements of the user/item vectors mean.
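A minimal matrix-factorization sketch: learn U and V by gradient descent on the squared error over only the observed entries of a synthetic low-rank ratings matrix (all values below are synthetic, and the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_movies, k = 20, 30, 4

# Synthetic low-rank "true" ratings; only ~30% of entries are observed.
true_R = rng.normal(size=(n_users, k)) @ rng.normal(size=(n_movies, k)).T
mask = rng.random((n_users, n_movies)) < 0.3

# U: user embeddings, V: movie embeddings, both in a k-dimensional space.
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_movies, k))

def observed_rmse(U, V):
    err = (U @ V.T - true_R) * mask
    return float(np.sqrt((err ** 2).sum() / mask.sum()))

before = observed_rmse(U, V)
lr = 0.01
for _ in range(2000):
    err = (U @ V.T - true_R) * mask      # error on observed entries only
    U, V = U - lr * err @ V, V - lr * err.T @ U

after = observed_rmse(U, V)
pred = U @ V.T   # predicted rating for every (user, movie) pair
```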