Module 2
MODEL-BASED COLLABORATIVE FILTERING
NEIGHBORHOOD-BASED vs MODEL-BASED
• The neighborhood-based methods can be viewed as generalizations of k-nearest
neighbor classifiers, which are commonly used in machine learning.
• Model-based methods offer several advantages over them:
1. Space-efficiency: Typically, the size of the learned model is much smaller than the original ratings matrix. Thus,
the space requirements are often quite low.
2. Training speed and prediction speed: One problem with neighborhood-based methods is that the pre-processing
stage is quadratic in either the number of users or the number of items. Model-based systems are usually much
faster in the preprocessing phase of constructing the trained model.
3. Avoiding overfitting: Overfitting is a serious problem in many machine learning algorithms, in which the
prediction is overly influenced by random artifacts in the data. This problem is also encountered in classification
and regression models. The summarization approach of model-based methods can often help in avoiding
overfitting.
Decision and Regression Trees
• Decision and regression trees are frequently used in data
classification.
• Decision trees are designed for those cases in which the
dependent variable is categorical, whereas regression trees are
designed for those cases in which the dependent variable is
numerical.
How to Use a Decision Tree?
Gender  Age
F       YOUNG
F       ADULT
M       ADULT
F       ADULT
M       YOUNG
M       YOUNG
Gini Index
Step 1: Gini Impurity Index
SPLITTING BY GENDER
STEP 2: SPLIT ROOT
SPLITTING BY AGE
STEP 3: Which is the best?
STEP 4: FINAL DECISION TREE
WHICH APP WILL YOU RECOMMEND?
• For Male?
• For Female?
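The steps above can be sketched in code. This is a minimal sketch: it computes the Gini impurity of each candidate split (gender vs. age) on the table from the slide. The app labels are hypothetical placeholders, since the slide's target column is not shown here; the lowest weighted impurity identifies the root split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, labels, attribute):
    """Average child impurity after splitting on one attribute,
    weighted by the size of each child node."""
    n = len(rows)
    total = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attribute] == value]
        total += len(subset) / n * gini(subset)
    return total

# Gender/Age rows from the slide; the app labels are assumed for illustration.
rows = [{"gender": "F", "age": "YOUNG"}, {"gender": "F", "age": "ADULT"},
        {"gender": "M", "age": "ADULT"}, {"gender": "F", "age": "ADULT"},
        {"gender": "M", "age": "YOUNG"}, {"gender": "M", "age": "YOUNG"}]
labels = ["AppA", "AppB", "AppC", "AppB", "AppA", "AppA"]  # hypothetical

for attr in ("gender", "age"):
    print(attr, round(weighted_gini(rows, labels, attr), 3))
# Under these assumed labels, splitting by age gives the lower impurity,
# so age would be chosen as the root split.
```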
Rule-Based Collaborative Filtering Recommendation
RULE BASED COLLABORATIVE FILTERING
Association rule mining
        Item1  Item2  Item3  Item4  Item5
Alice     1      0      0      0      ?
User1     1      0      1      0      1
User2     1      0      1      0      1
User3     0      0      0      1      1
User4     0      1      1      0      0

Mine rules such as Item1 → Item5 with support 2/4 and confidence 2/2
(computed without Alice).
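As a minimal sketch of this rule mining, the support and confidence of Item1 → Item5 can be computed directly from the binary matrix above (Alice excluded, as on the slide):

```python
def support(matrix, items):
    """Fraction of users whose rows contain every item in `items`."""
    hits = [r for r in matrix.values() if all(r[i] for i in items)]
    return len(hits) / len(matrix)

def confidence(matrix, antecedent, consequent):
    """support(antecedent + consequent) / support(antecedent)."""
    return support(matrix, antecedent + consequent) / support(matrix, antecedent)

# Binary matrix from the slide; Alice is left out of the rule mining.
users = {
    "User1": {1: 1, 2: 0, 3: 1, 4: 0, 5: 1},
    "User2": {1: 1, 2: 0, 3: 1, 4: 0, 5: 1},
    "User3": {1: 0, 2: 0, 3: 0, 4: 1, 5: 1},
    "User4": {1: 0, 2: 1, 3: 1, 4: 0, 5: 0},
}

print(support(users, [1, 5]))         # 0.5  (rule support 2/4)
print(confidence(users, [1], [5]))    # 1.0  (confidence 2/2)
# Alice owns Item1, so the rule Item1 -> Item5 recommends Item5 to her.
```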
Recommendation based on Association Rule Mining

        Item1  Item2  Item3  Item4  Item5
Alice     1      3      3      2      ?
User1     2      4      2      2      4
User2     1      3      3      5      1
User3     4      5      2      3      3
User4     1      1      5      2      1

X = (Item1 = 1, Item2 = 3, Item3 = … )
More to consider:
• Zeros (smoothing required)
• like/dislike simplification possible
Naive Bayes Collaborative Filtering
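A minimal sketch of this idea, using the numerical ratings matrix above: the unobserved rating is treated as a class label, and P(Item5 = v | Alice's other ratings) is estimated under the naive independence assumption, with Laplace smoothing to handle the zero counts noted earlier. The smoothing constant `alpha = 1` is a conventional choice, not one given on the slides.

```python
# Ratings matrix from the slide; rows are users, columns Item1..Item5.
train = [
    [2, 4, 2, 2, 4],   # User1
    [1, 3, 3, 5, 1],   # User2
    [4, 5, 2, 3, 3],   # User3
    [1, 1, 5, 2, 1],   # User4
]
alice = [1, 3, 3, 2]         # Alice's known ratings for Item1..Item4
RATING_VALUES = range(1, 6)  # possible rating values 1..5

def posterior(target_value, alpha=1):
    """Unnormalized P(Item5 = target_value | Alice's ratings),
    with Laplace smoothing (alpha) to avoid zero probabilities."""
    rows = [r for r in train if r[4] == target_value]
    prior = (len(rows) + alpha) / (len(train) + alpha * len(RATING_VALUES))
    likelihood = 1.0
    for j, x in enumerate(alice):
        matches = sum(1 for r in rows if r[j] == x)
        likelihood *= (matches + alpha) / (len(rows) + alpha * len(RATING_VALUES))
    return prior * likelihood

# Predict the rating value with the highest posterior.
best = max(RATING_VALUES, key=posterior)
print(best)
```

With the binarized like/dislike simplification mentioned above, the same computation runs over two rating values instead of five.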
Latent Factor Models
• Consider the simple case in which all entries in the ratings matrix R are
observed. The key idea is that any m × n matrix R of rank k ≤ min{m, n} can
always be expressed in the following product form of rank-k factors, where U
is an m × k matrix and V is an n × k matrix:
R = UV^T (exact factorization), or
R ≈ UV^T (approximate, when the chosen rank is smaller than the rank of R)
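A minimal sketch of this factorization, using a small hypothetical ratings matrix of rank 2: the truncated SVD yields rank-k factors U and V, and because k here equals the rank of R, the product UV^T reproduces R exactly.

```python
import numpy as np

# Hypothetical ratings matrix of rank 2: rows 2 and 4 are
# multiples of rows 1 and 3 respectively.
R = np.array([[5.0, 3.0, 4.0],
              [10.0, 6.0, 8.0],
              [1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])

k = 2
u, s, vt = np.linalg.svd(R, full_matrices=False)
U = u[:, :k] * s[:k]   # m x k factor (singular values folded into U)
V = vt[:k, :].T        # n x k factor

print(np.allclose(R, U @ V.T))  # True: rank(R) = k, so R = UV^T exactly
```

If k were chosen smaller than the rank of R, the same construction would give the best rank-k approximation R ≈ UV^T instead of an exact factorization.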