AI Recommendation System
INSTITUTE OF ENGINEERING
CENTRAL CAMPUS, PULCHOWK
A Report On
Recommendation System
As a part of the Artificial Intelligence course in the third year, second part, of our Bachelor’s degree in
Computer Engineering, we have successfully completed our project “Recommender System” for an
online shopping site. We sincerely appreciate the support and direction of all those people who have
been instrumental in making this project a success.
We would like to express our special thanks and appreciation to our lecturer Dr. Basanta Joshi for
granting us the opportunity to do this project and for offering many practical techniques and ideas
on the best ways of software development. We really appreciate his sincere guidance and
encouragement from the beginning to the end of the project.
We would also like to thank all of our friends who have directly and indirectly helped us in doing this
project. Last but not least, we express our deep appreciation to our family members, who have
been a constant source of inspiration for us.
To push the right information to the right person at the right time, classical work on recommendation
systems focuses on optimizing the user's interest. Recommendation systems are best known for their use
on e-commerce websites, where they use input about a customer’s interests to generate a list of
recommended items. Many applications use only the items that customers purchase and explicitly rate
to represent their interests, but they can also use other attributes, including items viewed, demographic
data, subject interests, and favorite artists. Recommendation systems have proven to be a valuable way
for online users to cope with information overload and have become one of the most powerful
and popular tools in e-commerce.
Here, we used recommendation algorithms to personalize the online store for each customer. We used
the items that the customer purchased and their explicit ratings to represent their interests, and we
recommended items to each customer on the basis of those interests and the related interests of past customers.
Contents
1. Introduction
2. Problem Statement
3. Objectives
4. Literature Review
5. Methodology
5.1. Item-based Collaborative filtering for rated resources
i) The Slope One Scheme
5.2. Item-based collaborative filtering of purchase statistics
7. Conclusion
Introduction
Recommendation systems are software tools and techniques for optimizing the user's interest and
pushing the right information to the right person at the right time. They have changed the way people find
products, information, and even other people. They study patterns of behavior to predict what someone
will prefer from among a collection of things they have never experienced.
Recommendation systems have become extremely common in recent years and are applied in a
variety of applications that expose the user to a huge collection of items. The most popular ones are
probably movies, music, news, books, research articles, search queries, social tags, and products in
general. Such systems typically provide the user with a list of recommended items they might prefer,
or predict how much they might prefer each item. These systems help users decide on appropriate
items and ease the task of finding preferred items in the collection.
Problem Statement
The explosive growth of e-commerce and the online environment has made the problem of information search
and selection increasingly serious: users are overloaded with options to consider, and they may not have
the time or knowledge to evaluate those options personally.
E-commerce recommendation algorithms often operate in a challenging environment. For example:
- A large retailer might have huge amounts of data: tens of millions of customers and millions
of distinct catalog items.
- Many applications require the result set to be returned in real time, in no more than half a
second, while still producing high-quality recommendations.
- New customers typically have extremely limited information, based on only a few purchases
or product ratings.
- Older customers can have a glut of information, based on thousands of purchases and ratings.
- Customer data is volatile: each interaction provides valuable customer data, and the algorithm
must respond immediately to new information.
Objectives
Literature Review
Collaborative filtering
In general, collaborative filtering is the process of filtering for information or patterns using techniques
involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of
collaborative filtering typically involve very large data sets. Collaborative filtering methods have been
applied to many different kinds of data including: sensing and monitoring data, such as in mineral
exploration, environmental sensing over large areas or multiple sensors; financial data, such as
financial service institutions that integrate many financial sources; or in electronic commerce and web
applications where the focus is on user data, etc. Collaborative filtering methods are based on
collecting and analyzing a large amount of information on users’ behaviors, activities or preferences
and predicting what users will like based on their similarity to other users. The underlying assumption
of the collaborative filtering approach is that if a person A has the same opinion as a person B on an
issue, A is more likely to have B's opinion on a different issue x than to have the opinion on x of a
person chosen randomly.
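The user-based idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the ratings, user names and item names are hypothetical, cosine similarity over co-rated items is just one possible similarity measure (Pearson correlation on mean-centred ratings is a common alternative), and the prediction is a similarity-weighted average of other users' ratings.

```python
import math

# Hypothetical explicit ratings: user -> {item: rating}
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 3, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
}

def cosine(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(a[i] ** 2 for i in common))
    nb = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb)

def predict(user, item):
    """Similarity-weighted average of the other users' ratings of `item`."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None

print(round(predict("alice", "i4"), 2))  # 3.19
```

Bob's tastes are closer to Alice's than Carol's are, so his rating of i4 (4) pulls the prediction above Carol's (2).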
The motivation for collaborative filtering comes from the idea that people often get the best
recommendations from someone with similar tastes to themselves. Collaborative filtering explores
techniques for matching people with similar interests and making recommendations on this basis.
Examples of explicit data collection include the following:
- Asking a user to rate an item on a sliding scale.
- Asking a user to search.
- Asking a user to rank a collection of items from favorite to least favorite.
- Presenting two items to a user and asking him/her to choose the better one of them.
- Asking a user to create a list of items that he/she likes.
Examples of implicit data collection include the following:
To abstract the features of the items in the system, an item representation algorithm is applied. A widely
used algorithm is the tf-idf representation (also called vector space representation).
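As a rough sketch of the tf-idf representation, each item description can be turned into a vector of term weights, tf × log(N / df), where tf is the term's share of the description, N the number of items and df the number of descriptions containing the term. The product descriptions below are hypothetical, whitespace tokenization is an assumption, and this is only one common tf-idf variant.

```python
import math
from collections import Counter

# Hypothetical item descriptions (one "document" per item)
items = {
    "p1": "red silk blouse",
    "p2": "red leather pumps",
    "p3": "black leather handbag",
}

def tfidf(docs):
    """Return {doc_id: {term: weight}} using (count / doc length) * log(N / df)."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n = len(docs)
    # df: number of documents in which each term appears at least once
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    vectors = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        vectors[d] = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

vecs = tfidf(items)
print(vecs["p1"])
```

Terms shared by many items (such as "red" or "leather" here) get lower weights, so distinctive words such as "silk" dominate an item's vector; a term present in every description would get a weight of zero.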
To create a user profile, the system mostly focuses on two types of information:
Basically, these methods use an item profile (i.e. a set of discrete attributes and features) characterizing
the item within the system. The system creates a content-based profile of each user based on a weighted
vector of item features. The weights denote the importance of each feature to the user and can be
computed from individually rated content vectors using a variety of techniques. Simple approaches
use the average values of the rated item vectors, while more sophisticated methods use machine learning
techniques such as Bayesian classifiers, cluster analysis, decision trees and artificial neural networks in
order to estimate the probability that the user is going to like the item.
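The simple averaging approach can be illustrated as follows: the user profile is the rating-weighted average of the feature vectors of the items the user rated, and each unrated item is scored by its cosine similarity to that profile. The feature names, items and ratings are hypothetical, and rating-weighted averaging is one assumed variant of "average values of the rated item vectors".

```python
import math

# Hypothetical binary feature vectors: [footwear, bag, red, leather]
features = {
    "p1": [0, 0, 1, 0],
    "p2": [1, 0, 1, 1],
    "p3": [0, 1, 0, 1],
    "p4": [1, 0, 0, 1],
}
user_ratings = {"p1": 5, "p2": 4}  # hypothetical explicit ratings

def profile(ratings):
    """Rating-weighted average of the rated items' feature vectors."""
    total = sum(ratings.values())
    dims = len(next(iter(features.values())))
    return [sum(r * features[i][k] for i, r in ratings.items()) / total
            for k in range(dims)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

prof = profile(user_ratings)
# Score only the items the user has not rated yet
scores = {i: cosine(prof, v) for i, v in features.items() if i not in user_ratings}
print(scores)
```

Because both rated items are red and one is leather footwear, the leather shoe p4 scores higher than the handbag p3.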
Methodology
In order to provide recommendations to users, item-based collaborative filtering is used. Item-based
collaborative filtering is a model-based algorithm for making recommendations. In the algorithm, the
similarities between different items in the dataset are calculated using one of a number of similarity
measures, and these similarity values are then used to provide recommendations to the user. The process
of calculating similarities is explained below.
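One possible similarity measure is sketched below: plain cosine similarity between two items' rating vectors, restricted to the users who rated both. This is an illustrative assumption (this step does not name the measure the project uses), and the ratings and item names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical explicit ratings: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 2},
    "u2": {"a": 3, "b": 4},
    "u3": {"b": 2, "c": 5},
}

def item_similarities(ratings):
    """Cosine similarity for every item pair, computed over users who rated both."""
    items = sorted({i for r in ratings.values() for i in r})
    sims = {}
    for i, j in combinations(items, 2):
        both = [r for r in ratings.values() if i in r and j in r]
        if not both:
            continue  # no co-raters: similarity undefined, pair skipped
        dot = sum(r[i] * r[j] for r in both)
        ni = math.sqrt(sum(r[i] ** 2 for r in both))
        nj = math.sqrt(sum(r[j] ** 2 for r in both))
        sims[(i, j)] = sims[(j, i)] = dot / (ni * nj)
    return sims

sims = item_similarities(ratings)
print(sims[("a", "b")], sims[("a", "c")])
```

Items a and c get a similarity of 1.0 because only one user rated both; this sensitivity to small overlaps is one reason to weight by the number of common ratings, as the weighted Slope One scheme later in this section does for its predictions.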
It uses the items most similar to a user's already-rated items to generate a list of recommendations.
Usually this calculation is a weighted sum or a regression; the Slope One scheme uses the simplest
regression, with a single free parameter: f(x) = x + b. The free parameter b is simply the average
difference between the two items' ratings. This form of recommendation is analogous to "people who
rate item X highly, like you, also tend to rate item Y highly, and you haven't rated item Y yet, so you
should try it".
Fig. 1
Consider two users A and B, two items I and J, and Fig. 1. User A gave item I a rating of 1, whereas
user B gave it a rating of 2, while user A gave item J a rating of 1.5. We observe that item J is rated
higher than item I by 1.5 − 1 = 0.5 points, so we could predict that user B will give item J a rating of
2 + 0.5 = 2.5. We call user B the predictee user and item J the predictee item. In a training set, many
such differentials exist for each unknown rating, and we take their average.
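The Fig. 1 calculation can be checked directly. The sketch below averages the differentials J − I over all users who rated both items (here only user A) and adds the result to user B's rating of item I; the dictionary layout and the helper name `average_differential` are assumptions made for illustration.

```python
# Ratings from the Fig. 1 example: user -> {item: rating}
ratings = {
    "A": {"I": 1.0, "J": 1.5},
    "B": {"I": 2.0},
}

def average_differential(ratings, target, source):
    """Mean of rating[target] - rating[source] over users who rated both items."""
    diffs = [r[target] - r[source] for r in ratings.values()
             if target in r and source in r]
    return sum(diffs) / len(diffs)

# Predict user B's rating of item J from B's rating of item I.
prediction = ratings["B"]["I"] + average_differential(ratings, "J", "I")
print(prediction)  # 2.5
```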
The Slope One schemes take into account both information from other users who rated the same item
and information from the other items rated by the same user (as the per-user average scheme does). However, the
schemes also rely on data points that fall neither in the user array nor in the item array (e.g. user A’s
rating of item I in Fig. 1) but are nevertheless important information for rating prediction. Much of
the strength of the approach also comes from the data it leaves out: only ratings by users who have
rated some item in common with the predictee user, and only ratings of items that the predictee user
has also rated, enter into the prediction of ratings under Slope One schemes.
Formally, given two evaluation arrays $v_i$ and $w_i$ with $i = 1, \dots, n$, we search for the best predictor of the
form $f(x) = x + b$ to predict $w$ from $v$ by minimizing $\sum_i (v_i + b - w_i)^2$. Differentiating with respect to $b$ and
setting the derivative to zero, we get
$$b = \frac{\sum_i (w_i - v_i)}{n}.$$
In other words, the constant $b$ must be chosen to be the average difference between the two arrays.
This result motivates the following scheme. Given a training set $\chi$, and any two items $j$ and $i$ with
ratings $u_j$ and $u_i$ respectively in some user evaluation $u$ (annotated as $u \in S_{j,i}(\chi)$), we consider the
average deviation of item $i$ with respect to item $j$ as:
$$\mathrm{dev}_{j,i} = \sum_{u \in S_{j,i}(\chi)} \frac{u_j - u_i}{\mathrm{card}(S_{j,i}(\chi))}$$
Any user evaluation $u$ not containing both $u_j$ and $u_i$ is not included in the summation. The matrix
defined by $\mathrm{dev}_{j,i}$ is antisymmetric ($\mathrm{dev}_{i,j} = -\mathrm{dev}_{j,i}$); it can be computed once and updated quickly
when new data is entered. Given that $\mathrm{dev}_{j,i} + u_i$ is a prediction for $u_j$ given $u_i$, a reasonable predictor
is the average of all such predictions:
$$P(u)_j = \frac{1}{\mathrm{card}(R_j)} \sum_{i \in R_j} (\mathrm{dev}_{j,i} + u_i),$$
where $R_j = \{\, i \mid i \in S(u),\ i \neq j,\ \mathrm{card}(S_{j,i}(\chi)) > 0 \,\}$ is the set of all relevant items.
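A minimal Python sketch of the unweighted scheme, assuming ratings are held in a dictionary of user → {item: rating}. `deviations` builds dev[j][i] from every evaluation that contains both items, and `slope_one` averages dev[j][i] + u_i over the relevant items R_j. The function names and the rating data are hypothetical.

```python
from collections import defaultdict

def deviations(ratings):
    """dev[j][i]: average of (u_j - u_i) over evaluations containing both j and i."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for u in ratings.values():
        for j in u:
            for i in u:
                if i != j:
                    sums[j][i] += u[j] - u[i]
                    counts[j][i] += 1
    return {j: {i: sums[j][i] / counts[j][i] for i in sums[j]} for j in sums}

def slope_one(user_ratings, j, dev):
    """P(u)_j: average of dev[j][i] + u_i over the relevant items R_j."""
    # R_j: items the user rated, other than j, that share at least one co-rater with j
    relevant = [i for i in user_ratings if i != j and i in dev.get(j, {})]
    if not relevant:
        return None
    return sum(dev[j][i] + user_ratings[i] for i in relevant) / len(relevant)

# Hypothetical rating database: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 2},
    "u2": {"a": 3, "b": 4},
    "u3": {"b": 2, "c": 5},
}
dev = deviations(ratings)
print(slope_one(ratings["u3"], "a", dev))  # 5.25
```

Here dev[a][b] = ((5 − 3) + (3 − 4)) / 2 = 0.5 and dev[a][c] = 3, so u3's two predictions are 2 + 0.5 = 2.5 and 5 + 3 = 8, and their plain average is 5.25.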
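One of the drawbacks of Slope One is that the number of ratings observed is not taken into
consideration. Intuitively, to predict user A’s rating of item L given user A’s ratings of items J and K,
if 2000 users rated the pair of items J and L whereas only 20 users rated the pair of items K and L,
then user A’s rating of item J is likely to be a far better predictor for item L than user A’s rating of
item K is. Thus, we define the weighted Slope One prediction as the following weighted average:
$$P^{wS1}(u)_j = \frac{\sum_{i \in S(u) \setminus \{j\}} (\mathrm{dev}_{j,i} + u_i)\, c_{j,i}}{\sum_{i \in S(u) \setminus \{j\}} c_{j,i}},$$
where $c_{j,i}$ is the number of users who rated both items $j$ and $i$ (terms with $c_{j,i} = 0$ contribute nothing).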
Consider the following example using the weighted Slope One prediction (sample rating database):
If a user rated several items, the predictions are simply combined using a weighted average where a
good choice for the weight is the number of users having rated both items. In the above example, we
would predict the following rating for Lucy on item A:
Hence, given n items, to implement Slope One, all that is needed is to compute and store the average
differences and the number of common ratings for each of the n² pairs of items.
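The storage requirement just described (one average difference and one count per ordered item pair) leads to the following sketch of weighted Slope One. It is a minimal illustration under assumed names (`train`, `weighted_slope_one`); the three-user rating database is hypothetical and is not the report's sample database.

```python
from collections import defaultdict

def train(ratings):
    """Store, for every ordered item pair (j, i), the average difference dev[(j, i)]
    and the number of users count[(j, i)] who rated both items."""
    total = defaultdict(float)
    count = defaultdict(int)
    for u in ratings.values():
        for j in u:
            for i in u:
                if i != j:
                    total[(j, i)] += u[j] - u[i]
                    count[(j, i)] += 1
    dev = {pair: total[pair] / count[pair] for pair in count}
    return dev, dict(count)

def weighted_slope_one(user_ratings, j, dev, count):
    """Each prediction dev[(j, i)] + u_i is weighted by the co-rating count c_{j,i}."""
    num = den = 0.0
    for i, r in user_ratings.items():
        if i != j and (j, i) in count:
            num += (dev[(j, i)] + r) * count[(j, i)]
            den += count[(j, i)]
    return num / den if den else None

ratings = {  # hypothetical rating database: user -> {item: rating}
    "u1": {"a": 5, "b": 3, "c": 2},
    "u2": {"a": 3, "b": 4},
    "u3": {"b": 2, "c": 5},
}
dev, count = train(ratings)
print(round(weighted_slope_one(ratings["u3"], "a", dev, count), 2))  # 4.33
```

Here u3's rating of b yields the prediction 2 + 0.5 = 2.5 with weight 2 (two users rated both a and b), and her rating of c yields 5 + 3 = 8 with weight 1, so the result is (2 × 2.5 + 1 × 8) / 3 = 13/3. This is closer to the better-supported 2.5 than the unweighted average (2.5 + 8) / 2 = 5.25 would be.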
We are not always given ratings: when users provide only binary data (such as whether an item was
purchased or not), Slope One and other rating-based algorithms do not apply.
Here, we provide recommendations on the basis of similarity in taste. The simplest form of this algorithm
is to assign a preference value based on some action. This is called an implicit preference, since there is
no explicit rating or thumbs-up. This form of recommendation is analogous to "people who bought item X,
like you, also bought item Y, and you haven't bought item Y yet, so you should try it".
A basic “co-occurrence” style recommender is used. The co-occurrence of preferences between users and
products is stored in matrix form. The basic algorithm is:
$$[B'B]\,H_p = R_p,$$
where
- $[B'B]$ is the similarity matrix between products;
- $R_p$ is a matrix whose columns are the recommendations for all users, so column 0 contains the
recommendations for user 0, and the values are the strengths of recommendation. We sort a
column by strength and take the top k recommendations, ranked;
- $r_p = [B'B]\,h_p$ is the recommendation list for a single user based on purchase history, expressed as a vector;
- $h_p$ is a user history vector;
- $B$ is the entire matrix of $h_p$ row vectors;
- $B'$ is $B$ transposed, and $H_p = B'$; these matrices contain the user purchase histories in their columns.
$H_p$ may also be a subset of the columns of $B'$.
Consider an example with three users, U1, U2 and U3, and three products, P1, P2 and P3.
User/Item  P1  P2  P3
U1          1   0   1
U2          0   1   1
U3          0   1   0
The above matrix is [B], which gives the user purchase history: 1 indicates that the user has
purchased that product and 0 indicates that the product has not been purchased.
Item  P1  P2  P3
P1     1   0   1
P2     0   2   1
P3     1   1   2
The above matrix is [B'B], which gives the similarity between products (the number of users who
purchased both products).
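Multiplying [B'B] by U1's purchase-history vector $h_p = (1, 0, 1)$ gives U1's recommendation strengths:
$$\begin{bmatrix}1 & 0 & 1\\ 0 & 2 & 1\\ 1 & 1 & 2\end{bmatrix}\begin{bmatrix}1\\ 0\\ 1\end{bmatrix} = \begin{bmatrix}2\\ 1\\ 3\end{bmatrix}$$
Setting aside P1 and P3, which U1 has already purchased, leaves the vector (0, 1, 0) over (P1, P2, P3),
so P2 is recommended to U1.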
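The toy example can be reproduced with NumPy. `B` is the purchase matrix above; `B.T @ B` gives [B'B], multiplying it by a user's history vector gives that user's recommendation strengths, and `similarity @ B.T` computes R_p for all users at once, one column per user. Masking out products the user already owns is an assumption on our part, consistent with the output example, where only unpurchased products are recommended.

```python
import numpy as np

# Purchase matrix [B] from the example: rows U1..U3, columns P1..P3
B = np.array([
    [1, 0, 1],  # U1
    [0, 1, 1],  # U2
    [0, 1, 0],  # U3
])

similarity = B.T @ B        # [B'B]: co-occurrence counts between products
h_u1 = B[0]                 # purchase-history vector of U1
r_u1 = similarity @ h_u1    # recommendation strengths for U1
R = similarity @ B.T        # R_p for all users: column k holds user k's strengths

# Do not recommend products U1 has already bought
candidates = np.where(h_u1 == 0, r_u1, 0)
print(similarity)
print(r_u1)        # [2 1 3]
print(candidates)  # [0 1 0]
```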
Output:
We have,
Fig 1: tbl_user
User_id 3 is Najma.
Fig 2: tbl_order
Fig 3: tbl_orderdetails
- She purchased the products with product_id 33 and 60; these are the product_ids in tbl_orderdetails
whose order_id equals the serial of her order in tbl_order.
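A user's purchase-history vector can be read off the order tables with a join. The sketch below uses an in-memory SQLite database with a hypothetical, simplified schema: the column names `serial`, `user_id`, `order_id` and `product_id` follow the wording above, but the actual table layouts, and all rows except Najma's products 33 and 60, are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_order (serial INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE tbl_orderdetails (order_id INTEGER, product_id INTEGER);
INSERT INTO tbl_order VALUES (1, 1), (2, 3);
INSERT INTO tbl_orderdetails VALUES (1, 36), (2, 33), (2, 60);
""")

def purchased_products(conn, user_id):
    """Product ids in tbl_orderdetails whose order_id matches one of the user's orders."""
    rows = conn.execute(
        """SELECT DISTINCT d.product_id
           FROM tbl_order AS o
           JOIN tbl_orderdetails AS d ON d.order_id = o.serial
           WHERE o.user_id = ?""",
        (user_id,),
    )
    return sorted(r[0] for r in rows)

print(purchased_products(conn, 3))  # [33, 60]
```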
So, User_id 3 is recommended the products with product_id 26, 36 and 31.
User_id \ Product_id   36   33   26   31   60
1 (Bhawna)              1    1    1    1    1
2 (Chitra)              1    1    1    0    1
3 (Najma)               0    1    0    0    1
So we recommend the products with product_id 36, 26 and 31 to user 3 (Najma); these are the products
shown in Fig. 4 above. The product names are Chiffon Blouse, Dark Red Pumps and Handbag. The real data in the
database is presented below.
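The recommendation for user 3 can be recomputed from the purchase table above with plain Python: build the co-occurrence matrix [B'B], multiply it by Najma's history vector, drop the products she already bought, and sort by strength. The function name `recommend` and the data layout are assumptions; the purchase data is copied from the table.

```python
# Purchase matrix from the table above (1 = purchased)
products = [36, 33, 26, 31, 60]
purchases = {
    1: [1, 1, 1, 1, 1],  # Bhawna
    2: [1, 1, 1, 0, 1],  # Chitra
    3: [0, 1, 0, 0, 1],  # Najma
}

def recommend(purchases, user, k=3):
    """Rank the user's unpurchased products by [B'B] h, returning (product_id, strength)."""
    h = purchases[user]
    n = len(h)
    # cooc[p][q] = number of users who bought both product p and product q
    cooc = [[sum(row[p] * row[q] for row in purchases.values()) for q in range(n)]
            for p in range(n)]
    scores = [sum(cooc[p][q] * h[q] for q in range(n)) for p in range(n)]
    candidates = [p for p in range(n) if h[p] == 0]
    # sorted() is stable, so tied products keep their column order
    ranked = sorted(candidates, key=lambda p: scores[p], reverse=True)
    return [(products[p], scores[p]) for p in ranked[:k]]

print(recommend(purchases, 3))  # [(36, 4), (26, 4), (31, 2)]
```

Products 36 and 26 tie with a strength of 4, which is consistent with the two orderings given above; product 31 follows with 2.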
Conclusion
We have created a recommender system that gives personalized recommendations to users on the
basis of the products they have rated and the products they have purchased. We used two
methods: one based on implicit preferences (purchases) and the other based on explicit preferences
(ratings), and recommendations are produced by combining both methods. A recommendation can be
provided only when at least one of these two kinds of data is available for the user. Thus, the objective
of providing recommendations to users was partially achieved.