Recommender Systems: Collaborative Filtering & Content-Based Recommending
Recommender Systems
Systems for recommending items (e.g. books,
movies, CDs, web pages, newsgroup messages)
to users based on examples of their preferences.
Many on-line stores provide recommendations
(e.g. Amazon, CDNow).
Recommenders have been shown to substantially
increase sales at on-line stores.
There are two basic approaches to recommending:
Collaborative Filtering (a.k.a. social filtering)
Content-based
Book Recommender
[Diagram: example books (Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010) with user ratings feed a Machine Learning component that builds a User Profile.]
Personalization
Recommenders are instances of personalization
software.
Personalization concerns adapting to the individual
needs, interests, and preferences of each user.
Includes:
Recommending
Filtering
Predicting (e.g. form or calendar appt. completion)
From a business perspective, it is viewed as part of
Customer Relationship Management (CRM).
Machine Learning and Personalization
Machine Learning can allow learning a user
model or profile of a particular user based
on:
Sample interaction
Rated examples
This model or profile can then be used to:
Recommend items
Filter information
Predict behavior
Collaborative Filtering
Maintain a database of many users' ratings of a
variety of items.
For a given user, find other similar users whose
ratings strongly correlate with the current user.
Recommend items rated highly by these similar
users, but not rated by the current user.
Almost all existing commercial recommenders use
this approach (e.g. Amazon).
Collaborative Filtering
[Diagram: a User Database of rating vectors over items A…Z; the Active User's vector is matched against the database by correlation, and items rated highly by the best-matching users but unrated by the active user (here, item C) are extracted as recommendations.]
Collaborative Filtering Method
Weight all users with respect to similarity
with the active user.
Select a subset of the users (neighbors) to
use as predictors.
Normalize ratings and compute a prediction
from a weighted combination of the
selected neighbors' ratings.
Present items with highest predicted ratings
as recommendations.
Similarity Weighting
Typically use the Pearson correlation coefficient between the
ratings of the active user, a, and another user, u:

$$c_{a,u} = \frac{\mathrm{covar}(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$$

r_a and r_u are the ratings vectors for the m items rated by
both a and u.
r_{i,j} is user i's rating for item j.
Covariance and Standard Deviation
Covariance:

$$\mathrm{covar}(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m} \qquad \bar{r}_x = \frac{\sum_{i=1}^{m} r_{x,i}}{m}$$

Standard Deviation:

$$\sigma_{r_x} = \sqrt{\frac{\sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}{m}}$$
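As a concrete illustration, here is a minimal Python sketch of the Pearson correlation above, computed over the items two users have both rated (the function name and the dict-based representation are illustrative, not from the original system):

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation c_{a,u} over co-rated items.

    ratings_a, ratings_b: dicts mapping item -> rating.
    Returns 0.0 when there are no co-rated items or a vector is flat.
    """
    common = [i for i in ratings_a if i in ratings_b]
    m = len(common)
    if m == 0:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / m
    mean_b = sum(ratings_b[i] for i in common) / m
    covar = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b)
                for i in common) / m
    sd_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common) / m)
    sd_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common) / m)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    return covar / (sd_a * sd_b)
```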
Significance Weighting
Important not to trust correlations based on
very few co-rated items.
Include significance weights, s_{a,u}, based on the
number of co-rated items, m:

$$w_{a,u} = s_{a,u}\, c_{a,u}$$

$$s_{a,u} = \begin{cases} 1 & \text{if } m > 50 \\ m/50 & \text{if } m \le 50 \end{cases}$$
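A small sketch of this devaluation, reusing the hypothetical pearson() helper above (the 50-item cutoff is the value from the slide):

```python
def significance_weight(m, cutoff=50):
    """s_{a,u}: 1 given enough co-rated items, else scaled down linearly."""
    return 1.0 if m > cutoff else m / cutoff

def similarity(ratings_a, ratings_b):
    """w_{a,u} = s_{a,u} * c_{a,u}."""
    m = len(set(ratings_a) & set(ratings_b))  # number of co-rated items
    return significance_weight(m) * pearson(ratings_a, ratings_b)
```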
Neighbor Selection
For a given active user, a, select correlated
users to serve as source of predictions.
Standard approach is to use the n most similar
users, u, based on similarity weights, w_{a,u}.
Alternate approach is to include all users
whose similarity weight is above a given
threshold.
Rating Prediction
Predict a rating, p_{a,i}, for each item i for the active user, a,
using the n selected neighbor users, u ∈ {1, 2, …, n}.
To account for users' different rating levels, base
predictions on differences from each user's average rating.
Weight each neighbor's contribution by their similarity to
the active user:

$$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{n} w_{a,u}}$$
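Putting the pieces together, a sketch of the prediction step that uses the hypothetical similarity() helper above and selects the n most similar users who rated the item, per the neighbor-selection slide:

```python
def predict_rating(active, users, item, n=20):
    """Predict p_{a,i} from the n most similar neighbors who rated `item`,
    offsetting each neighbor's contribution by that neighbor's mean rating.

    active: dict item -> rating for the active user.
    users: list of rating dicts for the other users.
    """
    mean_active = sum(active.values()) / len(active)
    scored = [(similarity(active, u), u) for u in users if item in u]
    # Keep positively correlated users only (a common practical choice).
    scored = [(w, u) for w, u in scored if w > 0]
    neighbors = sorted(scored, key=lambda wu: wu[0], reverse=True)[:n]
    if not neighbors:
        return mean_active  # fall back to the user's own average rating
    num = sum(w * (u[item] - sum(u.values()) / len(u)) for w, u in neighbors)
    den = sum(w for w, _ in neighbors)
    return mean_active + num / den
```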
Problems with Collaborative Filtering
Cold Start: There needs to be enough other users
already in the system to find a match.
Sparsity: If there are many items to be
recommended, even if there are many users, the
user/ratings matrix is sparse, and it is hard to find
users that have rated the same items.
First Rater: Cannot recommend an item that has
not been previously rated.
New items
Esoteric items
Popularity Bias: Cannot recommend items to
someone with unique tastes.
Tends to recommend popular items.
Content-Based Recommending
Recommendations are based on information about the
content of items rather than on other users'
opinions.
Uses a machine learning algorithm to induce a
profile of the user's preferences from examples
based on a featural description of content.
Some previous applications:
Newsweeder (Lang, 1995)
Syskill and Webert (Pazzani et al., 1996)
Advantages of Content-Based Approach
No need for data on other users.
No cold-start or sparsity problems.
Able to recommend to users with unique tastes.
Able to recommend new and unpopular items
No first-rater problem.
Can provide explanations of recommended
items by listing content-features that caused an
item to be recommended.
Disadvantages of Content-Based Method
Requires content that can be encoded as
meaningful features.
Users' tastes must be represented as a
learnable function of these content features.
Unable to exploit quality judgments of other
users.
Unless these are somehow included in the
content features.
LIBRA
Learning Intelligent Book Recommending Agent
Content-based recommender for books using
information about titles extracted from Amazon.
Uses information extraction from the web to
organize text into fields:
Author
Title
Editorial Reviews
Customer Comments
Subject terms
Related authors
Related titles
LIBRA System
[Diagram: Amazon pages pass through Information Extraction into the LIBRA Database; Rated Examples feed a machine-learning Learner that builds a User Profile; a Predictor then produces a ranked list of Recommendations (1, 2, 3, …).]
Sample Amazon Page
[Screenshot: Amazon page for The Age of Spiritual Machines]
Sample Extracted Information
Title: <The Age of Spiritual Machines: When Computers Exceed Human Intelligence>
Author: <Ray Kurzweil>
Price: <11.96>
Publication Date: <January 2000>
ISBN: <0140282025>
Related Titles: <Title: <Robot: Mere Machine or Transcendent Mind>
Author: <Hans Moravec> >
Reviews: <Author: <Amazon.com Reviews> Text: <How much do we humans> >
Comments: <Stars: <4> Author: <Stephen A. Haines> Text:<Kurzweil has > >
Related Authors: <Hans P. Moravec> <K. Eric Drexler>
Subjects: <Science/Mathematics> <Computers> <Artificial Intelligence>
Libra Content Information
Libra uses this extracted information to
form bags of words for the following
slots:
Author
Title
Description (reviews and comments)
Subjects
Related Titles
Related Authors
Libra Overview
User rates selected titles on a 1 to 10 scale.
Libra uses a naïve Bayesian text-categorization
algorithm to learn a profile from these rated
examples.
Rating 6–10: Positive
Rating 1–5: Negative
The learned profile is used to rank all other books as
recommendations based on the computed posterior
probability that they are positive.
User can also provide explicit positive/negative
keywords, which are used as priors to bias the role
of these features in categorization.
Bayesian Categorization in LIBRA
Model is generalized to generate a vector of bags
of words (one bag for each slot).
Instances of the same word in different slots are treated
as separate features:
"Crichton" in author vs. "Crichton" in description
Training examples are treated as weighted positive
or negative examples when estimating conditional
probability parameters:
An example with rating 1 ≤ r ≤ 10 is given:
positive probability: (r − 1)/9
negative probability: (10 − r)/9
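A rough sketch of this weighted parameter estimation as a multinomial naive Bayes with Laplace smoothing (smoothing appears on the next slide); slots are collapsed to a single bag of words here for brevity, and all names are illustrative:

```python
from collections import Counter

def train_profile(examples, vocab):
    """Estimate P(w | positive) and P(w | negative) from rated examples.

    examples: list of (rating, words) pairs with rating in 1..10; each
    example counts fractionally toward both classes with weights
    (r - 1)/9 (positive) and (10 - r)/9 (negative).
    """
    pos_counts, neg_counts = Counter(), Counter()
    pos_total = neg_total = 0.0
    for rating, words in examples:
        wp = (rating - 1) / 9.0   # weight as a positive example
        wn = (10 - rating) / 9.0  # weight as a negative example
        for w in words:
            pos_counts[w] += wp
            neg_counts[w] += wn
            pos_total += wp
            neg_total += wn
    v = len(vocab)
    # Laplace smoothing to handle words unseen in a class
    p_pos = {w: (pos_counts[w] + 1) / (pos_total + v) for w in vocab}
    p_neg = {w: (neg_counts[w] + 1) / (neg_total + v) for w in vocab}
    return p_pos, p_neg
```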
Implementation
Stopwords removed from all bags.
A book's title and author are added to its own
related-title and related-author slots.
All probabilities are smoothed using Laplace
estimation to account for small sample size.
Lisp implementation is quite efficient:
Training: 20 exs in 0.4 secs, 840 exs in 11.5 secs
Test: 200 books per second
Explanations of Profiles and Recommendations
Feature strength of word w_k appearing in slot s_j:

$$\mathrm{strength}(w_k, s_j) = \log \frac{P(w_k \mid \text{positive}, s_j)}{P(w_k \mid \text{negative}, s_j)}$$
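Given smoothed conditionals like those from the sketch above, the strength is a simple log-odds ratio, and sorting by it yields the explanatory feature list:

```python
import math

def strength(word, p_pos, p_neg):
    """Log-odds of a word under the positive vs. negative model."""
    return math.log(p_pos[word] / p_neg[word])

def top_features(p_pos, p_neg, k=10):
    """The k words that most strongly signal a positive recommendation."""
    return sorted(p_pos, key=lambda w: strength(w, p_pos, p_neg),
                  reverse=True)[:k]
```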
Libra Demo
http://www.cs.utexas.edu/users/libra
Experimental Data
Amazon searches were used to find books
in various genres.
Titles that have at least one review or
comment were kept.
Data sets:
Literature fiction: 3,061 titles
Mystery: 7,285 titles
Science: 3,813 titles
Science Fiction: 3,813 titles
Rated Data
4 users rated random examples within a
genre by reviewing the Amazon pages about
the title:
LIT1 936 titles
LIT2 935 titles
MYST 500 titles
SCI 500 titles
SF 500 titles
Experimental Method
10-fold cross-validation to generate learning curves.
Measured several metrics on independent test data:
Precision at top 3: % of the top 3 that are positive
Rating of top 3: Average rating assigned to top 3
Rank Correlation: Spearman's r_s between the system's and the
user's complete rankings.
Test ablation of related author and related title slots
(LIBRA-NR).
Test influence of information generated by Amazon's
collaborative approach.
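For instance, precision at top 3 reduces to a few lines (a hypothetical sketch; `positives` is the set of test items the user rated positive):

```python
def precision_at_k(ranked_items, positives, k=3):
    """Fraction of the top-k recommended items that are rated positive."""
    return sum(1 for item in ranked_items[:k] if item in positives) / k
```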
Experimental Result Summary
Precision at top 3 is fairly consistently in the
90% range after only 20 examples.
Rating of top 3 is fairly consistently above 8 after
only 20 examples.
All results are always significantly better than
random chance after only 5 examples.
Rank correlation is generally above 0.3 (moderate)
after only 10 examples.
Rank correlation is generally above 0.6 (high)
after 40 examples.
Precision at Top 3 for Science
[Learning-curve figure]
Rating of Top 3 for Science
[Learning-curve figure]
Rank Correlation for Science
[Learning-curve figure]
User Studies
Subjects asked to use Libra and get
recommendations.
Encouraged several rounds of feedback.
Rated all books in final list of
recommendations.
Selected two books for purchase.
Returned reviews after reading selections.
Completed questionnaire about the system.
Combining Content and Collaboration
Content-based and collaborative methods have
complementary strengths and weaknesses.
Combine methods to obtain the best of both.
Various hybrid approaches:
Apply both methods and combine recommendations.
Use collaborative data as content.
Use content-based predictor as another collaborator.
Use content-based predictor to complete
collaborative data.
Movie Domain
EachMovie Dataset [Compaq Research Labs]
Contains user ratings for movies on a 0–5 scale.
72,916 users (avg. 39 ratings each).
1,628 movies.
Sparse user-ratings matrix (2.6% full).
Crawled Internet Movie Database (IMDb)
Extracted content for titles in EachMovie.
Basic movie information:
Title, Director, Cast, Genre, etc.
Popular opinions:
User comments, Newspaper and Newsgroup reviews, etc.
Content-Boosted Collaborative Filtering
[Diagram: a Web Crawler extracts IMDb content into a Movie Content Database; a Content-based Predictor uses it to fill the sparse EachMovie User Ratings Matrix into a Full User Ratings Matrix; Collaborative Filtering then combines this with the Active User's Ratings to produce Recommendations.]
Content-Boosted CF - I
[Diagram: a sparse user-ratings vector is split into user-rated items (used as training examples for the Content-Based Predictor) and unrated items (given predicted ratings), yielding a dense pseudo user-ratings vector.]
Content-Boosted CF - II
Compute pseudo user ratings matrix
Full matrix approximates actual full user ratings matrix
Perform CF
Using Pearson corr. between pseudo user-rating vectors
[Diagram: the Content-Based Predictor maps the sparse User Ratings Matrix to a dense Pseudo User Ratings Matrix.]
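A sketch of the pseudo-matrix construction, assuming a hypothetical content_predict(user_ratings, item) callable that stands in for the per-user content-based predictor:

```python
def pseudo_ratings(ratings_matrix, items, content_predict):
    """Fill each user's unrated cells with content-based predictions.

    ratings_matrix: list of sparse dicts (item -> rating), one per user.
    content_predict: callable (user_ratings, item) -> predicted rating.
    """
    dense = []
    for user in ratings_matrix:
        row = dict(user)  # keep actual ratings where they exist
        for item in items:
            if item not in row:
                row[item] = content_predict(user, item)
        dense.append(row)
    return dense  # CF then computes Pearson over these dense vectors
```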
Experimental Method
Used subset of EachMovie (7,893 users; 299,997
ratings)
Test set: 10% of the users selected at random.
Test users that rated at least 40 movies.
Train on the remaining users.
Hold-out set: 25% items for each test user.
Predict rating of each item in the hold-out set.
Compared CBCF to other prediction approaches:
Pure CF
Pure Content-based
Naïve hybrid (averages CF and content-based
predictions)
Metrics
Mean Absolute Error (MAE)
Compares numerical predictions with user ratings
ROC sensitivity [Herlocker 99]
How well predictions help users select high-quality
items
Ratings > 4 considered good; < 4 considered bad
Paired t-test for statistical significance
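MAE itself is a one-liner (illustrative sketch; ROC sensitivity additionally needs ranked predictions and the good/bad threshold above):

```python
def mean_absolute_error(pairs):
    """Average |predicted - actual| over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)
```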
Results - I
[Bar chart: MAE for CF, Content, Naïve hybrid, and CBCF]
CBCF is significantly better than CF (4% improvement, p < 0.001)
Results - II
[Bar chart: ROC-4 sensitivity for CF, Content, Naïve hybrid, and CBCF]
CBCF outperforms the rest (5% improvement over CF)
Active Learning
(Sample Selection, Learning with Queries)
Used to reduce the number of training
examples required.
System requests ratings for specific items
from which it would learn the most.
Several existing methods:
Uncertainty sampling
Committee-based sampling
Semi-Supervised Learning
(Weakly Supervised, Bootstrapping)
Use wealth of unlabeled examples to aid
learning from a small amount of labeled data.
Several recent methods developed:
Semi-supervised EM (Expectation Maximization)
Co-training
Transductive SVMs
Conclusions
Recommending and personalization are
important approaches to combating
information overload.
Machine Learning is an important part of
systems for these tasks.
Collaborative filtering has problems.
Content-based methods address these
problems (but have problems of their own).
Integrating both is best.