
VIVEKANANDHA

COLLEGE OF TECHNOLOGY FOR WOMEN


Elayampalayam, Tiruchengode – 637205.

DEPARTMENT OF INFORMATION TECHNOLOGY

SUBJECT NAME: BIG DATA ANALYTICS


SUBJECT CODE: CS8091
YEAR/SEM: III / VI

YEAR/SEM: IV / VIII
Unit - 3

Regulation: 2017

Staff in charge HOD/IT Principal


Part - A
UNIT 3 ASSOCIATION AND RECOMMENDATION SYSTEM

Advanced Analytical Theory and Methods: Association Rules - Overview - Apriori Algorithm -
Evaluation of Candidate Rules - Applications of Association Rules - Finding Association&
finding similarity - Recommendation System: Collaborative Recommendation- Content Based
Recommendation - Knowledge Based Recommendation- Hybrid Recommendation Approaches.

1. What is Association Rule Mining?

 The main purpose of Association Rule Mining is to discover frequent itemsets from a
large dataset and, from them, a set of if-then rules called association rules.
 An association rule has the form I → j, where I is a set of items (products) and j is a
particular item.

2. List any two algorithms for finding frequent item set.

 Apriori Algorithm
 FP-Growth Algorithm
 SON algorithm
 PCY algorithm

3. What is meant by curse of dimensionality?

 Points in high-dimensional Euclidean spaces, as well as points in non-Euclidean
spaces, often behave unintuitively.
 Two unexpected properties of these spaces are that random points are almost
always at about the same distance from one another, and random vectors are almost
always nearly orthogonal.

4. Write an algorithm of Park-Chen-Yu.

FOR (each basket) :
    FOR (each item in the basket) :
        add 1 to the item's count;
    FOR (each pair of items in the basket) :
        hash the pair to a bucket;
        add 1 to the count for that bucket;
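A minimal Python sketch of this first pass, assuming baskets are given as lists of items and using a fixed number of hash buckets (all names and values are illustrative):

from collections import defaultdict
from itertools import combinations

def pcy_first_pass(baskets, num_buckets=100003):
    """First PCY pass: count individual items and hash each pair into a bucket."""
    item_counts = defaultdict(int)        # count of each individual item
    bucket_counts = [0] * num_buckets     # count for each hash bucket
    for basket in baskets:
        for item in basket:
            item_counts[item] += 1        # add 1 to the item's count
        for pair in combinations(sorted(set(basket)), 2):
            bucket = hash(pair) % num_buckets   # hash the pair to a bucket
            bucket_counts[bucket] += 1          # add 1 to the count for that bucket
    return item_counts, bucket_counts

# Example: pcy_first_pass([["bread", "milk"], ["bread", "butter", "milk"]])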

5. Define Toivonen’s Algorithm

 Toivonen’s algorithm makes only one full pass over the database.
 The algorithm thus produces exact association rules in one full pass over the database.
 The algorithm will give neither false negatives nor positives, but there is a small yet
non-zero probability that it will fail to produce any answer at all.
 Toivonen’s algorithm begins by selecting a small sample of the input dataset and
finding from it the candidate frequent item sets.

6. List out some applications of clustering.

 Collaborative filtering
 Customer segmentation
 Data summarization
 Dynamic trend detection
 Multimedia data analysis
 Biological data analysis
 Social network analysis

7. What is apriori algorithm?

 Apriori is an algorithm for frequent itemset mining and association rule learning over
transactional databases.

 It proceeds by identifying the frequent individual items in the database and extending
them to larger and larger item sets as long as those item sets appear sufficiently often in
the database.
8. Define support and confidence.

 Support is the fraction of all transactions in which the items of an itemset are purchased together in a single transaction.

 Confidence of a rule X => Y is the fraction of the transactions containing X that also contain Y, i.e. how often the consequent is purchased when the antecedent is purchased.
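Using the standard definitions (with N the total number of transactions):

Support(X => Y) = frequency(X ∪ Y) / N

Confidence(X => Y) = frequency(X ∪ Y) / frequency(X)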

9. What are the steps followed in the apriori algorithm of data mining ?

There are two steps followed in the Apriori algorithm.

1. Join Step: This step generates (K+1)-itemsets from K-itemsets by joining the set of frequent
K-itemsets with itself.

2. Prune Step: This step scans the count of each item in the database. If the candidate item
does not meet minimum support, then it is regarded as infrequent and thus it is removed.
This step is performed to reduce the size of the candidate itemsets.

10. What are the applications of apriori algorithm

Some fields where Apriori is used:

1. In Education Field: Extracting association rules in data mining of admitted students


through characteristics and specialties.

2. In the Medical field: For example Analysis of the patient's database.

3. In Forestry: Analysis of probability and intensity of forest fire with the forest fire data.

4. Apriori is used by many companies like Amazon in the Recommender System and by
Google for the auto-complete feature.
11. What are the applications of Association Rules?

Applications of association rule mining are:

 Stock analysis
 Web log mining,
 Medical diagnosis,
 Customer market analysis
 Bioinformatics

12. How to find larger frequent item sets?


 A-Priori and many other algorithms allow us to find frequent itemsets larger than
pairs, if we make one pass over the baskets for each size itemset, up to some limit.

 To find the frequent itemsets of size k, monotonicity lets us restrict our attention
to only those itemsets such that all their subsets of size k − 1 have already been
found frequent.

13. Give few techniques to improve the efficiency of Apriori algorithm.

 Hash based technique


 Transaction Reduction
 Partitioning
 Sampling
 Dynamic itemset counting

14. What is Recommender System?

 Recommender systems are an important class of machine learning algorithms that


offer "relevant" suggestions to users.

 A Recommender System refers to a system that is capable of predicting the future


preference of a set of items for a user, and recommend the top items.

 Recommender systems are algorithms aimed at suggesting relevant items to users
(items being movies to watch, text to read, products to buy, or anything else
depending on the industry).

15. Why do we need recommender systems?

 Companies using recommender systems focus on increasing sales as a result of


very personalized offers and an enhanced customer experience.

 Recommendations typically speed up searches and make it easier for users to


access content they’re interested in, and surprise them with offers they would
have never searched for.

 Companies are able to gain and retain customers by sending out emails with links
to new offers that meet the recipients’ interests, or suggestions of films and TV
shows that suit their profiles.

 The user starts to feel known and understood and is more likely to buy additional
products or consume more content. By knowing what a user wants, the company
gains competitive advantage and the threat of losing a customer to a competitor
decreases.

 Providing that added value to users by including recommendations in systems and


products is appealing. Furthermore, it allows companies to position ahead of their
competitors and eventually increase their earnings.

1. Briefly explain apriori algorithm to find a frequent item set in a data base.

Association Rule Mining is defined as:

“Let I = { … } be a set of 'n' binary attributes called items. Let D = { … } be a set of transactions
called the database. Each transaction in D has a unique transaction ID and contains a subset of
the items in I. A rule is defined as an implication of the form X → Y, where X, Y ⊆ I and
X ∩ Y = ∅. The sets of items X and Y are called the antecedent and consequent of the rule
respectively.”

 Learning of association rules is used to find relationships between attributes in large
databases. An association rule, A => B, will be of the form “for a set of transactions, some
value of itemset A determines the values of itemset B under the condition in which minimum
support and confidence are met”.

Support and Confidence can be represented by the following example:


Bread => Butter [support = 2%, confidence = 60%]

 The above statement is an example of an association rule.

 This means that 2% of the transactions contain bread and butter together, and 60% of the
customers who bought bread also bought butter.

Support and Confidence for Itemset A and B are represented by formulas:
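Using the standard definitions:

Support(A => B) = (Number of transactions containing both A and B) / (Total number of transactions)

Confidence(A => B) = (Number of transactions containing both A and B) / (Number of transactions containing A)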

Association rule mining consists of 2 steps:


1. Find all the frequent itemsets.
2. Generate association rules from the above frequent itemsets.

Why Frequent Itemset Mining?

 Frequent itemset or pattern mining is broadly used because of its wide applications in
mining association rules, correlations, graph pattern constraints based on frequent patterns,
sequential patterns, and many other data mining tasks.

Apriori Algorithm – Frequent Pattern Algorithms

 The Apriori algorithm was the first algorithm proposed for frequent itemset mining. It was
later improved by R. Agrawal and R. Srikant and came to be known as Apriori.

 This algorithm uses two steps “join” and “prune” to reduce the search space.

 It is an iterative approach to discover the most frequent itemsets.

Apriori says:

 If P(I) < minimum support threshold, then I is not frequent.

 If P(I+A) < minimum support threshold, then I+A is not frequent, where A also
belongs to the itemset.
 If an itemset has a support value less than the minimum support, then all of its supersets
will also fall below the minimum support and thus can be ignored. This property is called
the antimonotone property.

The steps followed in the Apriori Algorithm of data mining are:

1. Join Step: This step generates (K+1)-itemsets from K-itemsets by joining the set of frequent
K-itemsets with itself.
2. Prune Step: This step scans the count of each item in the database. If the candidate item
does not meet minimum support, then it is regarded as infrequent and thus it is removed.
This step is performed to reduce the size of the candidate item sets.

Steps In Apriori

 Apriori algorithm is a sequence of steps to be followed to find the most frequent item set
in the given database.

 This data mining technique follows the join and the prune steps iteratively until the most
frequent item set is achieved.

A minimum support threshold is given in the problem or it is assumed by the user.

1) In the first iteration of the algorithm, each item is taken as a 1-itemsets candidate. The
algorithm will count the occurrences of each item.

2) Let there be some minimum support, min_sup (e.g. 2). The set of 1-itemsets whose
occurrence satisfies min_sup is determined. Only those candidates whose count is greater
than or equal to min_sup are taken ahead to the next iteration; the others are pruned.

3) Next, frequent 2-itemsets with min_sup are discovered. For this, in the join step, the 2-
itemsets are generated by forming pairs of the frequent 1-itemsets.

4) The 2-itemset candidates are pruned using min-sup threshold value. Now the table will have 2
–itemsets with min-sup only.

5) The next iteration will form 3-itemsets using the join and prune steps. This iteration follows
the antimonotone property: the 2-itemset subsets of each candidate 3-itemset must satisfy
min_sup. If all the 2-itemset subsets are frequent, then the candidate is kept; otherwise it is
pruned.

6) Next step will follow making 4-itemset by joining 3-itemset with itself and pruning if its
subset does not meet the min_sup criteria. The algorithm is stopped when the most frequent
itemset is achieved.

Example of Apriori: Support threshold=50%, Confidence= 60%

TABLE-1
Transaction List of items

T1 I1,I2,I3

T2 I2,I3,I4

T3 I4,I5

T4 I1,I2,I4

T5 I1,I2,I3,I5

T6 I1,I2,I3,I4

Solution:
Support threshold=50% => 0.5*6= 3 => min_sup=3

1. Count of Each Item


TABLE-2
Item Count

I1 4

I2 5

I3 4

I4 4

I5 2

2. Prune Step: TABLE -2 shows that I5 item does not meet min_sup=3, thus it is deleted, only
I1, I2, I3, I4 meet min_sup count.

TABLE-3
Item Count

I1 4

I2 5

I3 4

I4 4

3. Join Step: Form 2-itemset. From TABLE-1 find out the occurrences of 2-itemset.

TABLE-4
Item Count

I1,I2 4

I1,I3 3

I1,I4 2

I2,I3 4

I2,I4 3

I3,I4 2

4. Prune Step: TABLE-4 shows that itemsets {I1, I4} and {I3, I4} do not meet min_sup, thus they
are deleted.

TABLE-5
Item Count

I1,I2 4

I1,I3 3

I2,I3 4

I2,I4 3

5. Join and Prune Step: Form 3-itemset. From the TABLE- 1 find out occurrences of 3-itemset.
From TABLE-5, find out the 2-itemset subsets which support min_sup.

We can see that for itemset {I1, I2, I3}, the subsets {I1, I2}, {I1, I3} and {I2, I3} all occur in
TABLE-5, thus {I1, I2, I3} is frequent.

We can see that for itemset {I1, I2, I4}, the subsets are {I1, I2}, {I1, I4} and {I2, I4}; {I1, I4} is
not frequent, as it does not occur in TABLE-5, thus {I1, I2, I4} is not frequent, hence it is deleted.

TABLE-6
Item

I1,I2,I3

I1,I2,I4

I1,I3,I4

I2,I3,I4

Only {I1, I2, I3} is frequent.

6. Generate Association Rules: From the frequent itemset discovered above, the association
rules could be:
{I1, I2} => {I3}

Confidence = support {I1, I2, I3} / support {I1, I2} = (3/ 4)* 100 = 75%

{I1, I3} => {I2}

Confidence = support {I1, I2, I3} / support {I1, I3} = (3/ 3)* 100 = 100%

{I2, I3} => {I1}

Confidence = support {I1, I2, I3} / support {I2, I3} = (3/ 4)* 100 = 75%

{I1} => {I2, I3}

Confidence = support {I1, I2, I3} / support {I1} = (3/ 4)* 100 = 75%

{I2} => {I1, I3}

Confidence = support {I1, I2, I3} / support {I2} = (3/ 5)* 100 = 60%

{I3} => {I1, I2}

Confidence = support {I1, I2, I3} / support {I3} = (3/ 4)* 100 = 75%

This shows that all the above association rules are strong if minimum confidence threshold is
60%.

The Apriori Algorithm: Pseudo Code

C: Candidate item set of size k

L: Frequent itemset of size k
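A compact Python sketch of this procedure, assuming transactions are given as sets of item labels (the function name and structure are illustrative):

from itertools import combinations

def apriori(transactions, min_sup):
    """Return all frequent itemsets (frozensets) with support count >= min_sup."""
    transactions = [set(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}     # C1: candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate with one scan of the transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        # Prune step: keep only candidates that meet minimum support (Lk)
        Lk = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(Lk)
        # Join step: build (k+1)-item candidates from the frequent k-itemsets
        k += 1
        current = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Antimonotone pruning: every (k-1)-subset of a candidate must be frequent
        current = {c for c in current
                   if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
    return frequent

# With the transactions of TABLE-1 and min_sup = 3, the result contains {I1, I2, I3}:
# apriori([{"I1","I2","I3"}, {"I2","I3","I4"}, {"I4","I5"},
#          {"I1","I2","I4"}, {"I1","I2","I3","I5"}, {"I1","I2","I3","I4"}], 3)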

Advantages
1. Easy to understand algorithm
2. Join and Prune steps are easy to implement on large item sets in large databases

Disadvantages
1. It requires high computation if the item sets are very large and the minimum support is
kept very low.

2. The entire database needs to be scanned.

Methods To Improve Apriori Efficiency

Many methods are available for improving the efficiency of the algorithm.

1. Hash-Based Technique: This method uses a hash-based structure called a hash table for
generating the k-item sets and its corresponding count. It uses a hash function for
generating the table.

2. Transaction Reduction: This method reduces the number of transactions scanned in later
iterations. The transactions which do not contain frequent items are marked or removed.

3. Partitioning: This method requires only two database scans to mine the frequent item
sets. It says that for any item set to be potentially frequent in the database, it should be
frequent in at least one of the partitions of the database.

4. Sampling: This method picks a random sample S from Database D and then searches for
frequent item set in S. It may be possible to lose a global frequent item set. This can be
reduced by lowering the min_sup.

5. Dynamic Itemset Counting: This technique can add new candidate item sets at any
marked start point of the database during the scanning of the database.

Applications of Apriori Algorithm

Some fields where Apriori is used:


1. In Education Field: Extracting association rules in data mining of admitted students
through characteristics and specialties.

2. In the Medical field: For example Analysis of the patient's database.

3. In Forestry: Analysis of probability and intensity of forest fire with the forest fire data.

4. Apriori is used by many companies like Amazon in the Recommender System and by
Google for the auto-complete feature.

2. Explain the basic concept of Recommender Systems.
What Are Recommender Systems?
 Recommender systems are one of the most common and easily understandable
applications of big data.
 Recommender systems are an important class of machine learning algorithms that
offer "relevant" suggestions to users.

 A Recommender System refers to a system that is capable of predicting the future


preference of a set of items for a user, and recommend the top items.

 Recommender systems are algorithms aimed at suggesting relevant items to users
(items being movies to watch, text to read, products to buy, or anything else
depending on the industry).

 The best-known application is probably Amazon’s recommendation engine,
which provides users with a personalized web page when they visit
Amazon.com.
 However, e-commerce companies are not the only ones that use
recommendation engines to persuade customers to buy additional products.
 There are use cases in entertainment, gaming, education, advertising, home
decor, and some other industries.
 The systems have different applications, from recommending music and events
to furniture and dating profiles.
 Many well-known industry leaders save billions of dollars and engage
several times more users by harnessing the power of recommender systems.

Types of Data Used by Recommender Systems

 Since big data fuels recommendations, the input needed for model training plays
a key role.
 Depending on your business goals, a system can work based on such types of
data as content, historical data, or user data involving views, clicks, and likes.
 The data used for training a model to make recommendations can be split into
several categories.

1. User behavior data (historical data)

 Log on-site activity: clicks, searches, page, and item views


 Off-site activities: tracking clicks in emails, in mobile applications, and in their
push notifications

2. Particular item details

 Title
 Category
 Price
 Description
 Style
3. Contextual information

 Device used
 Current location
 Referral URL

 For you to get a full picture of your customer, it is not enough to be aware of
what he or she is viewing on your website and your competitors’ ones.
 You should take into account the frequency of visits, user location, and types of
devices.

 All the data sources are equally important for the smooth and consistent
operation of different types of algorithms.

There are three major types of recommender systems:

 Content-based filtering
 Collaborative filtering
 Hybrid recommender systems

These methods can rely on user behavior data, including activities, preferences, and likes, or can
take into account the description of the items that users prefer, or both.
i) Content-based filtering
 This method works based on the properties of the items that each user likes,
discovering what else the user may like.
 It takes into account multiple keywords. Also, a user profile is designed to
provide comprehensive information on the items that a user prefers.
 The system then recommends some similar items that users may also want to
purchase.

ii) Collaborative filtering

 Recommendation engines can rely on likes and desires of other users to compute
a similarity index between users and recommend items to them accordingly.
 This type of filtering relies on user opinion instead of machine analysis to
accurately recommend complex items, such as movies or music tracks.

 The collaborative filtering algorithm has some specifics. The system can search
for look-alike users, which will be user-user collaborative filtering.
 So, recommendations will depend on a user profile. But such an approach
requires a lot of computational resources and will be hard to implement for
large-scale databases.
 Another option is item-item collaborative filtering. The system will find similar
items and recommend these items to a user on a case-by-case basis.

 It is a resource-saving approach, and Amazon utilizes it to engage customers and


improve sales volumes.

iii) Hybrid recommender systems


 It is also possible to combine both types to build a more powerful
recommendation engine.
 This method is used to generate collaborative and content-based predictions and
pull them all together to increase performance.
 We have already mentioned Netflix, and this provider of media services uses a
hybrid system to win customer loyalty.

 Users get movie recommendations based on their habits and the characteristics
of content they prefer.

Why Should You Integrate Recommender Systems?


Recommender systems have proved themselves efficient in dealing with the following
challenges:

 Increase the number of items sold


 Sell more diverse items
 Increase user satisfaction
 Better understand what the user wants

3. Briefly explain the concept of collaborative filtering methods in recommender system.
 Collaborative methods for recommender systems are methods that are based solely on the
past interactions recorded between users and items in order to produce new
recommendations.
 These interactions are stored in the so-called “user-item interactions matrix”.

Illustration of the user-item interactions matrix.


 Then, the main idea that rules collaborative methods is that these past user-item
interactions are sufficient to detect similar users and/or similar items and make
predictions based on these estimated proximities.
 The class of collaborative filtering algorithms is divided into two sub-categories that are
generally called memory based and model based approaches.
 Memory based approaches work directly with the values of recorded interactions, assuming
no model, and are essentially based on nearest neighbours search (for example, find the
closest users to a user of interest and suggest the most popular items among these
neighbours).
 Model based approaches assume an underlying “generative” model that explains the user-
item interactions and try to discover it in order to make new predictions.
 The main advantage of collaborative approaches is that they require no information about
users or items and, so, they can be used in many situations.

 Moreover, the more users interact with items, the more accurate new recommendations
become: for a fixed set of users and items, new interactions recorded over time bring
new information and make the system more and more effective.

Illustration of the collaborative filtering methods paradigm.


 However, as it only considers past interactions to make recommendations, collaborative
filtering suffers from the “cold start problem”.
 It is impossible to recommend anything to new users or to recommend a new item to any
user, and many users or items have too few interactions to be handled efficiently.
 This drawback can be addressed in different ways: recommending random items to new
users or new items to random users (random strategy).
 Recommending popular items to new users or new items to most active users (maximum
expectation strategy).
 Recommending a set of various items to new users or a new item to a set of various users
(exploratory strategy) or, finally, using a non collaborative method for the early life of the
user or the item.
Types of collaborative filtering approaches in recommender systems.
Collaborative filtering is classified into two types. They are:
i. Memory based collaborative filtering (user-user and item-item)
ii. Model based collaborative filtering (e.g. matrix factorisation)

i) Memory based collaborative filtering (user-user and item-item).
 The collaborative filtering algorithm has some specifics.
 The system can search for look-alike users, which will be user-user collaborative
filtering.
 So, recommendations will depend on a user profile. But such an approach requires a lot
of computational resources and will be hard to implement for large-scale databases.
 Another option is item-item collaborative filtering.
 The system will find similar items and recommend these items to a user on a case-by-
case basis.
 It is a resource-saving approach, and Amazon utilizes it to engage customers and improve
sales volumes.
 The main characteristic of the user-user and item-item approaches is that they use only
information from the user-item interaction matrix and they assume no model to produce
new recommendations.

User-user
 In order to make a new recommendation to a user, user-user method roughly tries to
identify users with the most similar “interactions profile” (nearest neighbours) in order to
suggest items that are the most popular among these neighbours (and that are “new” to
our user).
 This method is said to be “user-centred” as it represents users based on their interactions
with items and evaluates distances between users.
 Assume that we want to make a recommendation for a given user.
 First, every user can be represented by its vector of interactions with the different items
(“its line” in the interaction matrix).
 Then, we can compute some kind of “similarity” between our user of interest and every
other user.
 That similarity measure is such that two users with similar interactions on the same items
should be considered as being close.
 Once the similarities to every user have been computed, we can keep the k-nearest-
neighbours to our user and then suggest the most popular items among them (only
looking at the items that our reference user has not interacted with yet).
 Notice that, when computing similarity between users, the number of “common
interactions” (how many items have already been considered by both users) should be
considered carefully.

 Indeed, most of the time, we want to avoid a situation where someone who has only one
interaction in common with our reference user gets a 100% match and is considered
“closer” than someone who has 100 common interactions and agrees on “only” 98% of
them.
 So, we consider that two users are similar if they have interacted with a lot of common
items in the same way (similar rating, similar time hovering…).

Illustration of the user-user method.
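A minimal sketch of the user-user approach in Python, assuming a small ratings matrix where 0 means "no interaction" and using cosine similarity between user rows (all names and values are illustrative):

import numpy as np

def user_user_recommend(ratings, user, k=2, n_items=3):
    """Recommend items for `user` based on its k most similar users."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    sims = (ratings @ ratings.T) / (norms * norms.T + 1e-9)   # cosine similarity between users
    neighbours = [u for u in np.argsort(-sims[user]) if u != user][:k]
    scores = ratings[neighbours].mean(axis=0)   # popularity of each item among the neighbours
    scores[ratings[user] > 0] = -np.inf         # ignore items the user already interacted with
    return np.argsort(-scores)[:n_items]

# ratings = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 0, 5, 4]], dtype=float)
# user_user_recommend(ratings, user=0)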

Item-item
 To make a new recommendation to a user, the idea of item-item method is to find items
similar to the ones the user already “positively” interacted with.
 Two items are considered to be similar if most of the users that have interacted with both
of them did it in a similar way.
 This method is said to be “item-centred” as it represents items based on the interactions
users had with them and evaluates distances between those items.
 Assume that we want to make a recommendation for a given user.
 First, we consider the item this user liked the most and represent it (as all the other items)
by its vector of interactions with every user (“its column” in the interaction matrix).
 Then, we can compute similarities between the “best item” and all the other items.
 Once the similarities have been computed, we can then keep the k-nearest-neighbours to
the selected “best item” that are new to our user of interest and recommend these items.

Illustration of the item-item method.
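The item-item variant can be sketched the same way, this time comparing columns of the interaction matrix (again purely illustrative):

import numpy as np

def item_item_recommend(ratings, user, n_items=3):
    """Recommend items similar to the item that `user` rated highest (0 = no interaction)."""
    cols = ratings.T                                        # one row per item
    norms = np.linalg.norm(cols, axis=1, keepdims=True)
    item_sims = (cols @ cols.T) / (norms * norms.T + 1e-9)  # cosine similarity between items
    best_item = int(np.argmax(ratings[user]))               # item the user liked the most
    scores = item_sims[best_item].copy()
    scores[ratings[user] > 0] = -np.inf                     # skip items already seen
    return np.argsort(-scores)[:n_items]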

Comparing user-user and item-item


 The user-user method is based on the search of similar users in terms of interactions with
items.
 As, in general, every user has only interacted with a few items, the method is pretty
sensitive to any recorded interaction (high variance).
 On the other hand, as the final recommendation is only based on interactions recorded for
users similar to our user of interest, we obtain more personalized results (low bias).
 Conversely, the item-item method is based on the search of similar items in terms of user-
item interactions.
 As, in general, a lot of users have interacted with an item, the neighbourhood search is far
less sensitive to single interactions (lower variance).
 As a counterpart, interactions coming from every kind of users (even users very different
from our reference user) are then considered in the recommendation, making the method
less personalised (more biased).
 Thus, this approach is less personalized than the user-user approach but more robust.

The difference between item-item and user-user methods.

Disadvantages of memory based or item based collaborative method.


 One of the biggest flaws of memory based collaborative filtering is that it does not scale
easily.
 Generating a new recommendation can be extremely time consuming for big systems.
 Indeed, for systems with millions of users and millions of items, the nearest neighbours
search step can become intractable if not carefully designed (KNN algorithm has a
complexity of O(ndk) with n the number of users, d the number of items and k the
number of considered neighbours).
 In order to make computations more tractable for huge systems, we can both take
advantage of the sparsity of the interaction matrix when designing our algorithm or use
approximate nearest neighbours methods (ANN).

ii) Model based collaborative filtering:

Model based collaborative approaches


 Model based collaborative approaches only rely on user-item interactions information
and assume a latent model supposed to explain these interactions.
 For example, matrix factorisation algorithms consist in decomposing the huge and
sparse user-item interaction matrix into a product of two smaller and dense matrices: a
user-factor matrix (containing user representations) that multiplies a factor-item matrix
(containing item representations).

Matrix factorisation
 The main assumption behind matrix factorisation is that there exists a pretty low
dimensional latent space of features in which we can represent both users and items and
such that the interaction between a user and an item can be obtained by computing the dot
product of corresponding dense vectors in that space.
 For example, consider that we have a user-movie rating matrix.
 In order to model the interactions between users and movies, we can assume that:

 there exists some features describing (and telling apart) pretty well movies.

 these features can also be used to describe user preferences (high values for features
the user likes, low values otherwise)
 However we don’t want to give explicitly these features to our model (as it could be done
for content based approaches that we will describe later).
 Instead, we prefer to let the system discover these useful features by itself and make its
own representations of both users and items.
 As they are learned and not given, the extracted features taken individually have a
mathematical meaning but no intuitive interpretation (and, so, are difficult, if not
impossible, for a human to understand).
 However, it is not unusual to end up with structures emerging from that type of
algorithm that are extremely close to an intuitive decomposition a human could think of.
 Indeed, the consequence of such a factorisation is that users who are close in terms of
preferences, as well as items that are close in terms of characteristics, end up having close
representations in the latent space.

Illustration of the matrix factorization method.

Mathematics of matrix factorisation


 In this subsection, we will give a simple mathematical overview of matrix factorisation.
 More specifically, we describe a classical iterative approach based on gradient descent that
makes it possible to obtain factorisations for very large matrices without loading all the
data into the computer’s memory at the same time.
 Let’s consider an interaction matrix M (n×m) of ratings where only some items have been
rated by each user (most of the interactions are set to None to express the lack of rating).
We want to factorise that matrix such that

M ≈ X . Y^T

 where X is the “user matrix” (n×l) whose rows represent the n users and where Y is the
“item matrix” (m×l) whose rows represent the m items.

 Here l is the dimension of the latent space in which users and item will be represented.
 So, we search for matrices X and Y whose dot product best approximates the existing
interactions.
 Denoting E the ensemble of pairs (i, j) such that M_ij is set (not None), we want to find X
and Y that minimise the “rating reconstruction error”:

(X, Y) = argmin_{X,Y} Σ_{(i,j) in E} ( X_i . (Y_j)^T − M_ij )²

 Adding a regularisation factor and dividing by 2, we get:

(X, Y) = argmin_{X,Y} (1/2) Σ_{(i,j) in E} ( X_i . (Y_j)^T − M_ij )² + (λ/2) ( Σ_i ||X_i||² + Σ_j ||Y_j||² )
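A small stochastic gradient descent sketch of this factorisation in Python (the learning rate, regularisation strength and latent dimension below are arbitrary illustrative values):

import numpy as np

def matrix_factorisation(M, l=2, steps=2000, lr=0.01, lam=0.1):
    """Factorise M ~ X @ Y.T using only the observed (non-NaN) entries."""
    n, m = M.shape
    rng = np.random.default_rng(0)
    X = rng.normal(scale=0.1, size=(n, l))   # user representations
    Y = rng.normal(scale=0.1, size=(m, l))   # item representations
    observed = [(i, j) for i in range(n) for j in range(m) if not np.isnan(M[i, j])]
    for _ in range(steps):
        for i, j in observed:
            err = X[i] @ Y[j] - M[i, j]            # reconstruction error for this rating
            grad_x = err * Y[j] + lam * X[i]       # gradient of the regularised loss w.r.t. X_i
            grad_y = err * X[i] + lam * Y[j]       # gradient w.r.t. Y_j
            X[i] -= lr * grad_x
            Y[j] -= lr * grad_y
    return X, Y

# M = np.array([[5, 3, np.nan, 1], [4, np.nan, np.nan, 1],
#               [1, 1, np.nan, 5], [np.nan, np.nan, 5, 4]], dtype=float)
# X, Y = matrix_factorisation(M); predictions = X @ Y.T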

Extensions of matrix factorisation


 We can finally notice that the concept of this basic factorisation can be extended to more
complex models with, for example, more general neural-network-like “decompositions”
(we cannot strictly speak about “factorisation” anymore).
 The first direct adaptation we can think of concerns boolean interactions.
 If we want to reconstruct boolean interactions, a simple dot product is not well adapted.
 If, however, we add a logistic function on top of that dot product, we get a model that
takes its values in [0, 1] and, so, better fits the problem.
 In such a case, the model to optimise is f(X_i . (Y_j)^T), with f(.) our logistic function.

 Deeper neural network models are often used to achieve near state of the art
performances in complex recommender systems.

 Matrix factorization can be generalized with the use of a model on top of users and items
embeddings.

4. Briefly elaborate the content based recommender systems in business industries.
 Content based approaches use additional information about users and/or items.
 If we consider the example of a movies recommender system, this additional information
can be, for example, the age, the sex, the job or any other personal information for users
as well as the category, the main actors, the duration or other characteristics for the
movies (items).
 Then, the idea of content based methods is to try to build a model, based on the available
“features”, that explain the observed user-item interactions.
 Still considering users and movies, we will try, for example, to model the fact that young
women tend to rate better some movies, that young men tend to rate better some other
movies and so on.
 If we manage to get such model, then, making new predictions for a user is pretty easy:
we just need to look at the profile (age, sex, …) of this user and, based on this
information, to determine relevant movies to suggest.

Overview of the content based methods paradigm.

1. Content based methods suffer far less from the cold start problem than collaborative
approaches: new users or items can be described by their characteristics (content) and so
relevant suggestions can be done for these new entities.

2. Only new users or items with previously unseen features will logically suffer from this
drawback, but once the system is old enough, this has little to no chance of happening.

 In content based methods, the recommendation problem is cast into either a
classification problem (predict if a user “likes” an item or not) or into a regression
problem (predict the rating given by a user to an item).
 In both cases, we are going to set a model that will be based on the user and/or item
features at our disposal (the “content” of our “content-based” method).
 If our classification (or regression) is based on users features, we say the approach is
item-centred: modelling, optimisations and computations can be done “by item”.
 In this case, we build and learn one model per item, based on user features, trying to
answer the question “what is the probability for each user to like this item?” (or “what is
the rating given by each user to this item?”, for regression).
 The model associated to each item is naturally trained on data related to this item and it
leads, in general, to pretty robust models as a lot of users have interacted with the item.
 However, the interactions considered to learn the model come from every user, and even
if these users have similar characteristics (features), their preferences can be different.
 This means that even if this method is more robust, it can be considered as being less
personalised (more biased) than the user-centred method described hereafter.
 If we are working with item features, the method is then user-centred: modelling,
optimisation and computations can be done “by user”.
 We then train one model per user, based on item features, that tries to answer the question
“what is the probability for this user to like each item?” (or “what is the rating given by this
user to each item?”, for regression).
 We can then attach a model to each user that is trained on its own data: the model obtained
is thus more personalised than its item-centred counterpart, as it only takes into account
interactions from the considered user.
 However, most of the time a user has interacted with relatively few items and, so, the
model we obtain is far less robust than an item-centred one.

Illustration of the difference between item-centred and user-centred content based methods.

 From a practical point of view, we should underline that, most of the time, it is much
more difficult to ask for information from a new user (users do not want to answer too
many questions) than to ask for a lot of information about a new item (the people adding
items have an interest in filling in this information in order to make their items
recommended to the right users).
 We can also notice that, depending on the complexity of the relation to express, the
model we build can be more or less complex, ranging from basic models (logistic/linear
regression for classification/regression) to deep neural networks.
 Finally, let’s mention that content based methods can also be neither user nor item
centred: information about both the user and the item can be used by our models, for example
by stacking the two feature vectors and making them go through a neural network
architecture.

Item-centred Bayesian classifier


 Let’s first consider the case of an item-centred classification: for each item we want to
train a Bayesian classifier that takes user features as inputs and outputs either “like” or
“dislike”.

 So, to achieve the classification task, we want to compute the ratio between the
probability that a user with given features x likes the considered item and the probability
that they dislike it.
 This ratio of conditional probabilities, which defines our classification rule (with a simple
threshold), can be expressed following the Bayes formula:

P(like | x) / P(dislike | x) = ( P(x | like) P(like) ) / ( P(x | dislike) P(dislike) )

where P(like) and P(dislike) are priors computed from the data, whereas P(x | like) and
P(x | dislike) are likelihoods assumed to follow Gaussian distributions with parameters to be
determined also from the data.

 Various hypotheses can be made about the covariance matrices of these two likelihood
distributions (no assumption, equality of matrices, equality of matrices and feature
independence), leading to various well known models (quadratic discriminant analysis,
linear discriminant analysis, naive Bayes classifier).
 We can underline once more that, here, likelihood parameters have to be estimated only
based on data (interactions) related to the considered item.

Illustration of the item-centred content based Bayesian classifier.
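As a sketch, a per-item Gaussian naive Bayes classifier on user features might look like the following (scikit-learn is assumed to be available; the feature values are invented for illustration):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# User features (e.g. age, an activity score) -- purely illustrative data
user_features = np.array([[22, 0.9], [35, 0.2], [27, 0.7], [51, 0.1], [19, 0.8]])
# Past interactions with one given item: 1 = "like", 0 = "dislike"
likes_for_item = np.array([1, 0, 1, 0, 1])

# One classifier per item, trained only on interactions related to that item
clf = GaussianNB().fit(user_features, likes_for_item)

# Probability that a new user (age 25, activity 0.6) likes this item
new_user = np.array([[25, 0.6]])
print(clf.predict_proba(new_user)[0, 1])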

User-centred linear regression
 Let’s now consider the case of a user-centred regression: for each user we want to train a
simple linear regression that takes item features as inputs and outputs the rating for this
item.
 We still denote M the user-item interaction matrix; we stack into a matrix X the row vectors
representing the user coefficients to be learned, and we stack into a matrix Y the row vectors
representing the item features that are given.
 Then, for a given user i, we learn the coefficients in X_i by solving the following
optimisation problem, where one should keep in mind that i is fixed and, so, the
summation is only over the (user, item) pairs that concern user i:

X_i = argmin_{X_i} (1/2) Σ_{j : (i,j) in E} ( X_i . (Y_j)^T − M_ij )² + (λ/2) ||X_i||²

 We can observe that if we solve this problem for all the users at the same time, the
optimisation problem is exactly the same as the one we solve in “alternated matrix
factorisation” when we keep items fixed.
 This observation underlines the link we mentioned in the first section: model based
collaborative filtering approaches (such as matrix factorisation) and content based
methods both assume a latent model for user-item interactions but model based
collaborative approaches have to learn latent representations for both users and items
whereas content-based methods build a model upon human defined features for users
and/or items.

Illustration of the user-centred content based regression.
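A per-user regression of this kind can be sketched with ridge regression on item features (the feature names and numbers are illustrative, not from the original):

import numpy as np
from sklearn.linear_model import Ridge

# Item features (e.g. duration, action score, comedy score) -- illustrative values
item_features = np.array([[1.9, 0.8, 0.1], [2.1, 0.9, 0.0],
                          [1.5, 0.1, 0.9], [1.7, 0.2, 0.8]])
# Ratings given by one user to the items they have interacted with
ratings_by_user = np.array([4.5, 5.0, 2.0, 2.5])

# One regression model per user: learns coefficients X_i such that X_i . Y_j ~ M_ij
model = Ridge(alpha=1.0).fit(item_features, ratings_by_user)

# Predicted rating of a new item by this user
new_item = np.array([[1.8, 0.7, 0.2]])
print(model.predict(new_item)[0])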

5. How to evaluate a recommender system?
 As for any machine learning algorithm, we need to be able to evaluate the performance
of our recommender systems in order to decide which algorithm fits our situation best.
 Evaluation methods for recommender systems can mainly be divided in two sets:
evaluation based on well defined metrics and evaluation mainly based on human
judgment and satisfaction estimation.

i) Metrics based evaluation


 If our recommender system is based on a model that outputs numeric values such as
ratings predictions or matching probabilities, we can assess the quality of these outputs in
a very classical manner using an error measurement metric such as, for example, mean
square error (MSE).
 In this case, the model is trained only on a part of the available interactions and is tested
on the remaining ones.
 Still, if our recommender system is based on a model that predicts numeric values, we can
also binarize these values with a classical thresholding approach (values above the
threshold are positive and values below are negative) and evaluate the model in a more
“classification way”.
 Indeed, as the dataset of user-item past interactions is also binary (or can be binarized by
thresholding), we can then evaluate the accuracy (as well as the precision and the recall)
of the binarized outputs of the model on a test dataset of interactions not used for
training.
 Finally, if we now consider a recommender system not based on numeric values and that
only returns a list of recommendations (such as user-user or item-item that are based on a
knn approach), we can still define a precision like metric by estimating the proportion of
recommended items that really suit our user.
 To estimate this precision, we cannot take into account recommended items that our user
has not interacted with and we should only consider items from the test dataset for which
we have a user feedback.
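For instance, the MSE on held-out ratings and a precision-style metric on binarized outputs might be computed as follows (a minimal sketch with made-up arrays):

import numpy as np

true_ratings = np.array([4.0, 2.0, 5.0, 3.0, 1.0])   # held-out test interactions
predicted = np.array([3.5, 2.5, 4.5, 3.5, 2.0])      # model outputs for the same (user, item) pairs

mse = np.mean((true_ratings - predicted) ** 2)        # error-measurement view

threshold = 3.0                                       # binarize: >= threshold means "relevant"
relevant_true = true_ratings >= threshold
relevant_pred = predicted >= threshold
precision = (relevant_true & relevant_pred).sum() / max(relevant_pred.sum(), 1)

print(mse, precision)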

ii) Human based evaluation


 When designing a recommender system, we can be interested not only in obtaining a model
that produces recommendations we are very sure about, but also in some other
good properties such as diversity and explainability of recommendations.

 As mentioned in the collaborative section, we absolutely want to avoid having a user
being stuck in what we called earlier an information confinement area.
 The notion of “serendipity” is often used to express the tendency a model has or not to
create such a confinement area (diversity of recommendations).
 Serendipity, which can be estimated by computing the distance between recommended
items, should not be too low as it would create confinement areas, but should also not be
too high as it would mean that we do not take our user’s interests enough into account
when making recommendations (exploration vs exploitation).
 Thus, in order to bring diversity into the suggested choices, we want to recommend items
that both suit our user very well and that are not too similar to each other.
 For example, instead of recommending a user “Star Wars” 1, 2 and 3, it seems better to
recommend “Star Wars 1”, “Star Trek Into Darkness” and “Indiana Jones and the Raiders of
the Lost Ark”: the two latter may be seen by our system as having less chance of interesting
our user, but recommending 3 items that look too similar is not a good option.
 Explainability is another key point of the success of recommendation algorithms. Indeed,
it has been shown that if users do not understand why they have been recommended a
specific item, they tend to lose confidence in the recommender system.
 So, if we design a model that is clearly explainable, we can add, when making
recommendations, a little sentence stating why an item has been recommended (“people
who liked this item also liked this one”, “you liked this item, you may be interested by this
one”, …).
 Finally, on top of the fact that diversity and explainability can be intrinsically difficult to
evaluate, we can notice that it is also pretty difficult to assess the quality of a
recommendation that does not belong to the testing dataset:
 How to know if a new recommendation is relevant before actually recommending it to our
user? For all these reasons, it can sometimes be tempting to test the model in “real
conditions”. As the goal of the recommender system is to generate an action (watch a
movie, buy a product, read an article etc…), we can indeed evaluate its ability to generate
the expected action.
 For example, the system can be put in production, following an A/B testing approach, or
can be tested only on a sample of users.

 Such processes require, however, having a certain level of confidence in the model.

