Module 2
MODEL-BASED COLLABORATIVE FILTERING
NEIGHBORHOOD-BASED vs MODEL-BASED
• The neighborhood-based methods can be viewed as generalizations of k-nearest
neighbor classifiers, which are commonly used in machine learning.
• Model-based methods offer several advantages over them:
1. Space-efficiency: Typically, the size of the learned model is much smaller than the original ratings matrix. Thus,
the space requirements are often quite low.
2. Training speed and prediction speed: One problem with neighborhood-based methods is that the pre-processing
stage is quadratic in either the number of users or the number of items. Model-based systems are usually much
faster in the preprocessing phase of constructing the trained model.
3. Avoiding overfitting: Overfitting is a serious problem in many machine learning algorithms, in which the
prediction is overly influenced by random artifacts in the data. This problem is also encountered in classification
and regression models. The summarization approach of model-based methods can often help in avoiding
overfitting.
Decision and Regression Trees
• Decision and regression trees are frequently used in data
classification.
• Decision trees are designed for those cases in which the
dependent variable is categorical, whereas regression trees are
designed for those cases in which the dependent variable is
numerical.
How to Use a Decision Tree?
Gender  Age
F       YOUNG
F       ADULT
M       ADULT
F       ADULT
M       YOUNG
M       YOUNG
Gini Index
Step 1: Gini Impurity Index
SPLITTING BY GENDER
STEP 2: SPLIT ROOT
SPLITTING BY AGE
STEP 3: Which is the best?
STEP 4: FINAL DECISION TREE
WHICH APP WILL YOU RECOMMEND?
• For Male?
• For Female?
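The steps above can be sketched in code. This is a minimal sketch: it computes the Gini impurity of each candidate split (gender vs. age) on the table from the slide. The app labels are hypothetical placeholders, since the slide's target column is not shown here; the lowest weighted impurity identifies the root split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, labels, attribute):
    """Average child impurity after splitting on one attribute,
    weighted by the size of each child node."""
    n = len(rows)
    total = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attribute] == value]
        total += len(subset) / n * gini(subset)
    return total

# Gender/Age rows from the slide; the app labels are assumed for illustration.
rows = [{"gender": "F", "age": "YOUNG"}, {"gender": "F", "age": "ADULT"},
        {"gender": "M", "age": "ADULT"}, {"gender": "F", "age": "ADULT"},
        {"gender": "M", "age": "YOUNG"}, {"gender": "M", "age": "YOUNG"}]
labels = ["AppA", "AppB", "AppC", "AppB", "AppA", "AppA"]  # hypothetical

for attr in ("gender", "age"):
    print(attr, round(weighted_gini(rows, labels, attr), 3))
# Under these assumed labels, splitting by age gives the lower impurity,
# so age would be chosen as the root split.
```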
Rule-Based Collaborative Filtering Recommendation
RULE BASED COLLABORATIVE FILTERING
Association rule mining
        Item1  Item2  Item3  Item4  Item5
Alice     1      0      0      0      ?
User1     1      0      1      0      1
User2     1      0      1      0      1
User3     0      0      0      1      1
User4     0      1      1      0      0

Mine rules such as Item1 → Item5 with support 2/4 and confidence 2/2
(computed without Alice).
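As a minimal sketch of this rule mining, the support and confidence of Item1 → Item5 can be computed directly from the binary matrix above (Alice excluded, as on the slide):

```python
def support(matrix, items):
    """Fraction of users whose rows contain every item in `items`."""
    hits = [r for r in matrix.values() if all(r[i] for i in items)]
    return len(hits) / len(matrix)

def confidence(matrix, antecedent, consequent):
    """support(antecedent + consequent) / support(antecedent)."""
    return support(matrix, antecedent + consequent) / support(matrix, antecedent)

# Binary matrix from the slide; Alice is left out of the rule mining.
users = {
    "User1": {1: 1, 2: 0, 3: 1, 4: 0, 5: 1},
    "User2": {1: 1, 2: 0, 3: 1, 4: 0, 5: 1},
    "User3": {1: 0, 2: 0, 3: 0, 4: 1, 5: 1},
    "User4": {1: 0, 2: 1, 3: 1, 4: 0, 5: 0},
}

print(support(users, [1, 5]))         # 0.5  (rule support 2/4)
print(confidence(users, [1], [5]))    # 1.0  (confidence 2/2)
# Alice owns Item1, so the rule Item1 -> Item5 recommends Item5 to her.
```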
Recommendation based on Association Rule Mining

        Item1  Item2  Item3  Item4  Item5
Alice     1      3      3      2      ?
User1     2      4      2      2      4
User2     1      3      3      5      1
User3     4      5      2      3      3
User4     1      1      5      2      1

X = (Item1 = 1, Item2 = 3, Item3 = … )
More to consider:
• Zeros (smoothing required)
• like/dislike simplification possible
Naive Bayes Collaborative Filtering
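A minimal sketch of this idea, using the numerical ratings matrix above: the unobserved rating is treated as a class label, and P(Item5 = v | Alice's other ratings) is estimated under the naive independence assumption, with Laplace smoothing to handle the zero counts noted earlier. The smoothing constant `alpha = 1` is a conventional choice, not one given on the slides.

```python
# Ratings matrix from the slide; rows are users, columns Item1..Item5.
train = [
    [2, 4, 2, 2, 4],   # User1
    [1, 3, 3, 5, 1],   # User2
    [4, 5, 2, 3, 3],   # User3
    [1, 1, 5, 2, 1],   # User4
]
alice = [1, 3, 3, 2]         # Alice's known ratings for Item1..Item4
RATING_VALUES = range(1, 6)  # possible rating values 1..5

def posterior(target_value, alpha=1):
    """Unnormalized P(Item5 = target_value | Alice's ratings),
    with Laplace smoothing (alpha) to avoid zero probabilities."""
    rows = [r for r in train if r[4] == target_value]
    prior = (len(rows) + alpha) / (len(train) + alpha * len(RATING_VALUES))
    likelihood = 1.0
    for j, x in enumerate(alice):
        matches = sum(1 for r in rows if r[j] == x)
        likelihood *= (matches + alpha) / (len(rows) + alpha * len(RATING_VALUES))
    return prior * likelihood

# Predict the rating value with the highest posterior.
best = max(RATING_VALUES, key=posterior)
print(best)
```

With the binarized like/dislike simplification mentioned above, the same computation runs over two rating values instead of five.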
Latent Factor Models
• Consider the simple case in which all entries in the ratings matrix R are
observed. The key idea is that any m × n matrix R of rank k ≤ min{m, n} can
always be expressed in the following product form of rank-k factors, where U
is an m × k matrix and V is an n × k matrix:
R = UV^T (exact factorization), or
R ≈ UV^T (approximate, when the chosen rank is smaller than the rank of R)
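A minimal sketch of this factorization, using a small hypothetical ratings matrix of rank 2: the truncated SVD yields rank-k factors U and V, and because k here equals the rank of R, the product UV^T reproduces R exactly.

```python
import numpy as np

# Hypothetical ratings matrix of rank 2: rows 2 and 4 are
# multiples of rows 1 and 3 respectively.
R = np.array([[5.0, 3.0, 4.0],
              [10.0, 6.0, 8.0],
              [1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])

k = 2
u, s, vt = np.linalg.svd(R, full_matrices=False)
U = u[:, :k] * s[:k]   # m x k factor (singular values folded into U)
V = vt[:k, :].T        # n x k factor

print(np.allclose(R, U @ V.T))  # True: rank(R) = k, so R = UV^T exactly
```

If k were chosen smaller than the rank of R, the same construction would give the best rank-k approximation R ≈ UV^T instead of an exact factorization.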