
Module 5, Part B

Examples for Item-based and Model-based Approaches
Application Domains of Recommender Systems

 Which movie should I watch?
 Which digital camera should I buy?
 Which news article will I find interesting?
 Toward which degree should I study?
 Which is the best investment for my retirement money?

Paradigms of Recommender Systems (Recall from Part A)

Personalized recommendations

 Demographic Recommendation
  - Offer Backstreet Boys albums only to girls under 16
  - Offer cameras with an American electricity plug to people from the US

 Contextual Recommendation (location / time of day / time of year)
  - Send a coupon to a mobile user who passes by a shop (Foursquare)
  - Show holiday-related advertisements based on the user's location
Paradigms of Recommender Systems (Recall from Part A)

Collaborative: "Tell me what's popular among my peers"

User–Item Rating Matrix

          Item1   Item2
  Alice     5       ?
  User1     2       1
  User2     4       3
Paradigms of Recommender Systems (Recall from Part A)

Content-based: "Show me more of what I've liked"

Paradigms of Recommender Systems (Recall from Part A)

Hybrid: Combinations of various inputs and/or composition of different mechanisms
When does a Recommender do a good Job?

1. User's Perspective
 Recommend items that I like and did not know about
 Serendipity: the accident of finding something good while not specifically searching for it
 In practice this often means recommending items from the long tail

2. Merchant's Perspective
 Increase the sale of high-revenue items
 Thus real-world recommender systems are not as neutral as the following slides suggest
4. Collaborative Filtering (CF)

 The most prominent approach to generate recommendations


 used by large e-commerce sites
 applicable in many domains (books, movies, DVDs, ...)

 Approach
 use the "wisdom of the crowd" to recommend items

 Basic Assumptions
1. Users give ratings to catalog items (implicitly or explicitly)
2. Customers who had similar tastes in the past,
will have similar tastes in the future

 Input: Matrix of given user–item ratings


 Output types
1. (Numerical) prediction indicating to what degree the current user will
like or dislike a certain item
2. Top-K list of recommended items
User-Based Nearest-Neighbor Collaborative Filtering

 Given an "active user" (Alice) and an item i not yet rated by Alice:
  1. find a set of users (peers/nearest neighbors) who liked the same items as Alice in the past and who have rated item i
  2. use their ratings of item i to predict whether Alice will like item i
  3. do this for all items Alice has not seen and recommend the best-rated ones

 Example: User–Item Rating Matrix (note: >99% of real-world values are NULL)

          Item1   Item2   Item3   Item4   Item5
  Alice     5       3       4       4       ?
  User1     3       1       2       3       3
  User2     4       3       4       3       5
  User3     3       3       1       5       4
  User4     1       5       5       2       1
User-Based Nearest-Neighbor Collaborative Filtering

 Some questions we need to answer


1. How do we measure user similarity?
2. How many neighbors should we consider?
3. How do we generate a prediction from the neighbors' ratings?

          Item1   Item2   Item3   Item4   Item5
  Alice     5       3       4       4       ?
  User1     3       1       2       3       3
  User2     4       3       4       3       5
  User3     3       3       1       5       4
  User4     1       5       5       2       1
4.1 Measuring User Similarity

 A popular similarity measure in user-based CF is the


Pearson Correlation Coefficient
   a, b : users
   r_{a,p} : rating of user a for item p
   P : set of items rated by both a and b

   sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2} \, \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}

 Takes different usage of rating scale into account


by comparing individual ratings to the user’s average rating
 Value range [-1,1]
1 : positive correlation
0 : no (linear) correlation
-1 : negative correlation
Example: Pearson Correlation

 A popular similarity measure in user-based CF is the


Pearson Correlation Coefficient

          Item1   Item2   Item3   Item4   Item5
  Alice     5       3       4       4       ?
  User1     3       1       2       3       3     sim = 0.85
  User2     4       3       4       3       5     sim = 0.70
  User3     3       3       1       5       4     sim = 0.00
  User4     1       5       5       2       1     sim = -0.79
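
As an illustration (not part of the original slides), the similarity values above can be reproduced with a few lines of Python. The exact numbers depend on whether each user's mean is taken over the co-rated items only or over all of the user's ratings; the sketch below uses the co-rated items and matches the slide values up to rounding.

```python
# Minimal sketch: Pearson correlation between Alice and each other user,
# using the example rating matrix above (means over co-rated items).
from math import sqrt

ratings = {
    "Alice": {1: 5, 2: 3, 3: 4, 4: 4},
    "User1": {1: 3, 2: 1, 3: 2, 4: 3, 5: 3},
    "User2": {1: 4, 2: 3, 3: 4, 4: 3, 5: 5},
    "User3": {1: 3, 2: 3, 3: 1, 4: 5, 5: 4},
    "User4": {1: 1, 2: 5, 3: 5, 4: 2, 5: 1},
}

def pearson(a, b):
    """Pearson correlation over the items rated by both users a and b."""
    common = set(ratings[a]) & set(ratings[b])
    mean_a = sum(ratings[a][p] for p in common) / len(common)
    mean_b = sum(ratings[b][p] for p in common) / len(common)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den = sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)) * \
          sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    return num / den if den else 0.0

for user in ("User1", "User2", "User3", "User4"):
    print(user, round(pearson("Alice", user), 2))
# approx. output: User1 0.85, User2 0.71, User3 0.0, User4 -0.79
```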
Pearson Correlation

 Takes differences in rating behavior into account.


 Some people always give higher ratings than others.

[Chart: ratings of Alice, User1, and User4 for Item1–Item4, illustrating that the users use the rating scale differently]

 Empirical studies show that Pearson Correlation often works


better than alternative measures such as cosine similarity
Making Predictions

1. A simple prediction function:

   pred(a, p) = \frac{\sum_{b \in N} sim(a, b) \cdot r_{b,p}}{\sum_{b \in N} sim(a, b)}

 Uses the similarity with a as a weight to combine the neighbors' ratings

2. A prediction function that takes rating behavior into account:

   pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} sim(a, b)}

 Calculates whether the neighbors' ratings for the unseen item p are higher or lower than their average
 Uses the similarity with a as a weight to combine the rating differences
 Adds/subtracts the neighbors' bias to/from the active user's average and uses this as the prediction
In the given example, Alice is most similar to User1 and User2, so their ratings for Item5 are used to predict Alice's rating for Item5 (a worked computation is sketched in the code below).

Given these calculation schemes, we can now compute rating predictions for Alice for all items she has not yet seen and include the ones with the highest prediction values in the recommendation list. In the example, it will most probably be a good choice to include Item5 in such a list.
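
A minimal sketch of the mean-centered prediction (not from the slides), using User1 and User2 as the neighborhood and each neighbor's average over all of their own ratings; with the similarity values from above, the predicted rating for Item5 comes out at roughly 4.87.

```python
# Minimal sketch: mean-centered prediction of Alice's rating for Item5,
# with N = {User1, User2} and the (approximate) Pearson similarities above.
ratings = {
    "Alice": {1: 5, 2: 3, 3: 4, 4: 4},
    "User1": {1: 3, 2: 1, 3: 2, 4: 3, 5: 3},
    "User2": {1: 4, 2: 3, 3: 4, 4: 3, 5: 5},
}
sims = {"User1": 0.85, "User2": 0.70}
target = 5

def mean(user):
    return sum(ratings[user].values()) / len(ratings[user])

num = sum(sims[b] * (ratings[b][target] - mean(b)) for b in sims)
den = sum(sims.values())
prediction = mean("Alice") + num / den
print(round(prediction, 2))  # roughly 4.87
```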
Improving the Metrics / Prediction Function

 Neighborhood Selection
 Use fixed number of neighbors or similarity threshold

 Case Amplification
  - Intuition: give more weight to "very similar" neighbors, i.e., where the similarity value is close to 1
  - Implementation: use sim(a, b)^2 as the weight

 Rating Variance
  - Agreement on commonly liked items is not as informative as agreement on controversial items
  - Possible solution: give more weight to items that have a higher variance
Memory-based and Model-based Approaches

 User-based CF is said to be "memory-based"


 the rating matrix is directly used to find neighbors / make predictions
 does not scale for most real-world scenarios as large e-commerce
sites have tens of millions of customers and 10,000s of items

 Model-based approaches
 employ offline model-learning
 at run-time, the learned model is used to make predictions
 models are updated / re-trained periodically
 A large variety of techniques is used, for example:
1. Item-based Collaborative Filtering
2. Association Rules
3. Probabilistic Methods
4. Matrix Factorization Techniques
Item-based Collaborative Filtering

 Basic idea:
 Use the similarity between items (and not users) to make predictions

 Approach:
1. Look for items that have been rated similarly to Item5
2. Take Alice's ratings for these items to predict the rating for Item5

          Item1   Item2   Item3   Item4   Item5
  Alice     5       3       4       4       ?
  User1     3       1       2       3       3
  User2     4       3       4       3       5
  User3     3       3       1       5       4
  User4     1       5       5       2       1
Calculating Item-to-Item Similarity

 Cosine Similarity
  - Often produces better results than Pearson correlation for calculating the item-to-item similarity

   sim(a, b) = \frac{a \cdot b}{|a| \cdot |b|}

 Adjusted Cosine Similarity


 adjusts ratings by taking the average rating behavior of a user into
account
 U: set of users who have rated both items a and b

   sim(a, b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2} \, \sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}
Computing the cosine similarity between Item5 and Item1 over the users who rated both (User1–User4) yields a value of roughly 0.99; the computation is sketched in the code below.
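
A minimal sketch of this computation (not from the slides); the ratings of User1–User4 are taken from the example matrix, and Alice is left out because her Item5 rating is unknown.

```python
# Minimal sketch: plain cosine similarity between the rating vectors of
# Item1 and Item5 over the users who rated both (User1..User4).
from math import sqrt

item1 = [3, 4, 3, 1]  # ratings of User1..User4 for Item1
item5 = [3, 5, 4, 1]  # ratings of User1..User4 for Item5

dot = sum(a * b for a, b in zip(item1, item5))
norm = sqrt(sum(a * a for a in item1)) * sqrt(sum(b * b for b in item5))
print(round(dot / norm, 2))  # roughly 0.99
```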
Making Predictions

 A common prediction function for item-based CF:

   pred(u, p) = \frac{\sum_{i \in ratedItems(u)} sim(i, p) \cdot r_{u,i}}{\sum_{i \in ratedItems(u)} sim(i, p)}

   ratedItems(u) : set of items rated by user u (Alice in the example)
   r_{u,i} : rating of user u for item i
   sim(i, p) : similarity of item i with the target item p
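
A minimal sketch of this prediction function applied to the example (not from the slides); it uses plain cosine similarities as item-to-item weights, so the resulting value is an illustration rather than the slide's own number.

```python
# Minimal sketch: item-based prediction of Alice's rating for Item5, weighting
# her ratings for Items 1-4 by plain cosine similarity to Item5.
from math import sqrt

item_vectors = {  # ratings of User1..User4 for each item
    1: [3, 4, 3, 1],
    2: [1, 3, 3, 5],
    3: [2, 4, 1, 5],
    4: [3, 3, 5, 2],
    5: [3, 5, 4, 1],
}
alice = {1: 5, 2: 3, 3: 4, 4: 4}  # Alice's known ratings
target = 5

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

sims = {i: cosine(item_vectors[i], item_vectors[target]) for i in alice}
pred = sum(sims[i] * alice[i] for i in alice) / sum(sims.values())
print(round(pred, 2))  # roughly 4.08 with these similarity values
```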
Pre-Processing for Item-Based Filtering

 Item-based filtering does not solve the scalability problem by itself, but as there are usually fewer items than users, we can pre-calculate the item similarities and store them in memory.

 Neighborhood size is typically also limited to a specific size
  - An analysis of the MovieLens dataset indicates that a neighborhood size of 20 to 50 items is reasonable (Herlocker et al. 2002)
  - Not all neighbors are taken into account for the prediction, as Alice most likely only rated a small subset of the neighbors

 Memory requirements
  - Up to N^2 pair-wise similarities need to be stored in theory (N = number of items)
  - In practice, the memory requirements are significantly lower as
    - many items have no co-ratings (heavy metal and samba CDs)
    - the neighborhood is often limited to items above a minimum similarity threshold
MODEL BASED APPROACHES
Recap: Association Rule Mining
 Commonly used for shopping basket analysis
 aims at detection of rules such as "If a customer purchases beer
then he also buys diapers in 70% of the cases"

 Association rule mining algorithms


 detect rules of the form X → Y (e.g., beer → diapers)
from a set of sales transactions D = {t1, t2, … tn}
 Two step rule mining process
1. determine frequent item sets
2. derive rules from the frequent item sets
 Measures of rule quality
  - used, e.g., as thresholds to cut off unimportant rules
  - Support count: \sigma(X) = |\{t_i \mid X \subseteq t_i, t_i \in D\}|
  - Support = \frac{\sigma(X \cup Y)}{|D|}
  - Confidence = \frac{\sigma(X \cup Y)}{\sigma(X)}
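
A minimal sketch of these measures (not from the slides); the transaction database is invented for illustration.

```python
# Minimal sketch: support and confidence of a rule X -> Y over a toy
# transaction database (the transactions are made up for illustration).
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"milk", "diapers"},
]

def sigma(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"beer"}, {"diapers"}
support = sigma(X | Y) / len(transactions)   # sigma(X u Y) / |D|
confidence = sigma(X | Y) / sigma(X)         # sigma(X u Y) / sigma(X)
print(support, confidence)                   # 0.5 and roughly 0.67 here
```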
Un-Personalized Recommendation

 Recommend items that co-occur with the current item (e.g., the book being viewed) in frequent item sets, independent of the individual user.
Personalized Recommendation using Association Rules

 Simplest approach
  - transform 5-point ratings into binary ratings (1 = above user average)

          Item1   Item2   Item3   Item4   Item5
  Alice     1       0       0       0       ?
  User1     1       0       1       0       1
  User2     1       0       1       0       1
  User3     0       0       0       1       1
  User4     0       1       1       0       0

 Mine rules such as
  - Item1 → Item5 with support = 2/4 and confidence = 2/2 (computed without Alice)

 Make recommendations for Alice (basic method)


1. determine "relevant" rules based on Alice's transactions/ratings
(the above rule will be relevant as Alice bought/rated Item1)
2. determine items not already bought/rated by Alice
3. sort the items based on the rules' confidence values
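
A minimal sketch (not from the slides) that applies the support/confidence definitions to the binarized matrix above and checks the Item1 → Item5 rule without Alice.

```python
# Minimal sketch: support and confidence of Item1 -> Item5 on the binarized
# ratings of User1..User4 (Alice is excluded, as on the slide).
binary = {
    "User1": {"Item1", "Item3", "Item5"},
    "User2": {"Item1", "Item3", "Item5"},
    "User3": {"Item4", "Item5"},
    "User4": {"Item2", "Item3"},
}

def sigma(itemset):
    return sum(1 for items in binary.values() if itemset <= items)

X, Y = {"Item1"}, {"Item5"}
print(sigma(X | Y) / len(binary))  # support    = 2/4 = 0.5
print(sigma(X | Y) / sigma(X))     # confidence = 2/2 = 1.0
```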
Probabilistic Methods
 Basic idea:
 given the user/item rating matrix
 determine the probability that Alice will give item i a specific rating

 Calculation of rating probabilities based on Bayes Theorem


 Given Alice's previous ratings, how probable is it that she rates Item5
with the rating value 1?
 Corresponds to conditional probability P(Item5=1 | X), where
X = Alice's previous ratings = (Item1 =1, Item2=3, Item3= … )
 Can be estimated using Bayes' theorem and independence assumption
   Bayes' theorem:  P(Y \mid X) = \frac{P(X \mid Y) \times P(Y)}{P(X)}

   With the independence assumption:  P(Y \mid X) = \frac{\prod_{i=1}^{d} P(X_i \mid Y) \times P(Y)}{P(X)}

   where Y = (Item5 = 1), P(Y) is the prior probability of Y without seeing any evidence, and P(X) is the probability of seeing the evidence X. (See: IE500 Data Mining, Chapter 3)
• As P(X) is a constant value, we can omit it in our calculations. P(Y) can be estimated for each rating value based on the ratings database: P(Item5=1) = 2/4 (as two of the four ratings for Item5 have the value 1), P(Item5=2) = 0, and so forth. What remains is the calculation of the class-conditional probabilities P(Xi | Y):
Estimation of the Probabilities

          Item1   Item2   Item3   Item4   Item5
  Alice     1       3       3       2       ?
  User1     2       4       2       2       4
  User2     1       3       3       5       1
  User3     4       5       2       3       3
  User4     1       1       5       2       1

  X = Alice's ratings = (Item1 = 1, Item2 = 3, Item3 = 3, Item4 = 2)

   P(X \mid Item5{=}1) = P(Item1{=}1 \mid Item5{=}1) \times P(Item2{=}3 \mid Item5{=}1) \times P(Item3{=}3 \mid Item5{=}1) \times P(Item4{=}2 \mid Item5{=}1) = \frac{2}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = 0.125

 Based on these calculations, given that P(Item5=1) = 2/4 and omitting the constant factor P(X) in the Bayes classifier, the (unnormalized) posterior probability of a rating value 1 for Item5 is P(Item5 = 1 | X) = 2/4 × 0.125 = 0.0625. In the example ratings database, this value is higher than the corresponding values for all other rating values, which means that the probabilistic rating prediction for Alice will be 1 for Item5.
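
A minimal sketch (not from the slides) that reproduces the calculation above: the class-conditional probabilities, the prior P(Item5 = 1), and the unnormalized posterior.

```python
# Minimal sketch: naive Bayes estimate of P(Item5 = 1 | Alice's ratings),
# reproducing the numbers from the example above.
rows = [            # Item1..Item5 for User1..User4
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
]
alice = [1, 3, 3, 2]   # Alice's ratings for Item1..Item4
target_value = 1       # candidate rating value for Item5

with_target = [r for r in rows if r[4] == target_value]
prior = len(with_target) / len(rows)          # P(Item5 = 1) = 2/4

likelihood = 1.0
for idx, alice_rating in enumerate(alice):    # product of P(Item_i = x | Item5 = 1)
    matches = sum(1 for r in with_target if r[idx] == alice_rating)
    likelihood *= matches / len(with_target)

print(likelihood)           # 0.125
print(prior * likelihood)   # 0.0625 (up to the omitted constant 1/P(X))
```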
More on Ratings: Explicit Ratings
 Explicit ratings are probably the most precise ratings.
 Commonly used response scales:
 1 to 5 Likert scales
 Like (sometimes also Dislike)

 Main problems
 Users often not willing to rate items
- number of ratings likely to be too small
→ poor recommendation quality
 How to stimulate users to rate more items?
- Example: Amazon Betterizer

 Alternative
 Use implicit ratings
(in addition to explicit ones)
More on Ratings: Implicit Ratings
 Events potentially interpretable as positive ratings
 items bought
 clicks, page views
 time spent on some page
 demo downloads …

 Advantage
 implicit ratings can be collected constantly by the web site
or application in which the recommender system is embedded
 collection of ratings does not require additional effort from the user

 Problem
 one cannot be sure whether the user behavior is correctly interpreted
 for example, a user might not like all the books he or she has bought;
the user also might have bought a book for someone else
Collaborative Filtering Discussion
 Pros:
 works well in some domains (e.g., books, movies), but likely not in others (e.g., life insurance)
 requires no explicit item descriptions or demographic user profiles

 Cons:
 requires user community to give enough ratings
(many real-world systems thus employ implicit ratings)
 no exploitation of other sources of recommendation knowledge
(demographic data, item descriptions)
 Cold Start Problem
- How to recommend new items?
- What to recommend to new users?
 Approaches for dealing with the Cold Start Problem
- Ask/force users to rate a set of items
- Use another method or combination of methods (e.g., content-based,
demographic or simply non-personalized) until enough ratings are
collected
4.2 Content-based Recommendation

 While collaborative filtering methods do not use any information about the items themselves, it might be reasonable to exploit such information
  - e.g., recommend fantasy novels to people who liked fantasy novels in the past

 What do we need:
 information about the available items (content)
 some sort of user profile describing what the user likes (user preferences)

 The tasks:
1. learn user preferences from what she has bought/seen before
2. recommend items that are "similar" to the user preferences

"show me more
of the same
what I've liked"
Content and User Profile Representation

 Content Representation (attributes of catalog items)
  - The Night of the Gun: Genre: Memoir; Author: David Carr; Type: Paperback; Price: 29.90; Keywords: press and journalism, drug addiction, personal memoirs, New York
  - The Lace Reader: Genre: Fiction, Mystery; Author: Brunonia Barry; Type: Hardcover; Price: 49.90; Keywords: American contemporary fiction, detective, historical
  - Into the Fire: Genre: Romance, Suspense; Author: Suzanne Brockmann; Type: Hardcover; Price: 45.90; Keywords: American fiction, murder, neo-nazism

 User Profile
  - Titles: …; Genres: Fiction, Mystery; Authors: Brunonia Barry, Ken Follett; Types: Paperback; Avg. Price: 25.65; Keywords: detective, murder, New York

 Use attribute-specific similarity measures and weights to compare items against the user profile (a small sketch follows below)
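
A minimal sketch of such an attribute-wise comparison (not from the slides); the overlap measure and the weights are invented for illustration.

```python
# Minimal sketch: attribute-specific similarities (Jaccard overlap) combined
# with invented weights to score one catalog item against the user profile.
def overlap(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

book = {
    "genres":   {"Fiction", "Mystery"},
    "authors":  {"Brunonia Barry"},
    "keywords": {"American contemporary fiction", "detective", "historical"},
}
profile = {
    "genres":   {"Fiction", "Mystery"},
    "authors":  {"Brunonia Barry", "Ken Follett"},
    "keywords": {"detective", "murder", "New York"},
}
weights = {"genres": 0.5, "authors": 0.3, "keywords": 0.2}  # invented weights

score = sum(w * overlap(book[attr], profile[attr]) for attr, w in weights.items())
print(round(score, 2))  # roughly 0.69 for "The Lace Reader" vs. this profile
```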


Recommending Text Documents

 Content-based recommendation techniques are often applied to recommend text documents, like news articles or blog posts
 Documents and user profiles are represented as term vectors:

  Document Corpus (term counts per document)

               Doc 1   Doc 2   Doc 3
   Antony        157      73       0
   Brutus          4     157       0
   Caesar        232     227       0
   Calpurnia       0      10     123
   Cleopatra      17       0      52
   mercy           1       0      43

  User Profile (term counts in documents the user liked)

               Liked   Liked   Liked
               Doc 1   Doc 2   Doc 3
   Antony         0       1       0
   Brutus         2       2       0
   Caesar         4       3       0
   Calpurnia    233      99      34
   Cleopatra     57      12       0
   mercy         22      23      90
Similarity of Text Documents

 Challenges
  - term vectors are very sparse
  - not every word has the same importance
  - long documents have a higher chance to overlap with the user profile

 Methods for handling these challenges
  - Similarity metric: cosine similarity
  - Preprocessing: remove stop words
  - Vector creation: Term Frequency - Inverse Document Frequency (TF-IDF)
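
A minimal sketch of TF-IDF weighting plus cosine similarity (not from the slides); the term counts are a reduced toy version of the corpus above, and the profile counts are invented.

```python
# Minimal sketch: TF-IDF vectors and cosine similarity between each document
# and a user-profile vector (toy counts, reduced from the table above).
from math import log, sqrt

docs = {
    "doc1": {"antony": 157, "brutus": 4,   "caesar": 232},
    "doc2": {"antony": 73,  "brutus": 157, "caesar": 227},
    "doc3": {"calpurnia": 123, "cleopatra": 52, "mercy": 43},
}
profile = {"caesar": 4, "brutus": 2, "mercy": 1}   # invented liked-term counts

vocab = sorted({t for d in docs.values() for t in d} | set(profile))

def idf(term):
    df = sum(1 for d in docs.values() if term in d)
    return log(len(docs) / df) if df else 0.0

def tfidf(counts):
    return [counts.get(t, 0) * idf(t) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

p = tfidf(profile)
for name, counts in docs.items():
    print(name, round(cosine(tfidf(counts), p), 2))  # recommend the highest
```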
Recommending Documents

 Given a set of documents already rated by the user


 either explicitly via user interface
 or implicitly by monitoring user behavior

1. Find the n nearest neighbors of a not-yet-seen item i in the document corpus D
 - measure the similarity of item i with its neighbors using cosine similarity

2. Use Alice's ratings for the neighbors to predict a rating for item i
 - Find the 5 most similar items to i
 - 4 of these items were liked by Alice → item i will probably also be liked by Alice

 Variations:
 Varying neighborhood size k
 upper similarity threshold to prevent system from recommending too similar
texts (variations of texts the user has already seen)

 Good to model short-term interests / follow-up stories


 Often used in combination with method to model long-term preferences
 E.g. ‘Semantic enrichment’ by assigning interests to each page/product.
SAMPLE SOLVED EXAMPLE OF MATRIX FACTORIZATION

Assume that the factored matrices have only 2 features, F1 and F2.

User Matrix: according to Ryan, if it is a Marvel movie he will give it 3 points, and if he is in the movie he will give it 2 more points (typical Ryan!).

Item Matrix: contains binary values, where the value is 1 if the feature condition mentioned above is satisfied and 0 otherwise.

By performing the dot product of the user matrix and the item matrix, Infinity War gets a 3 and Deadpool gets a 5.
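
A minimal sketch of that dot product (not from the slides' figure), with F1 = "is a Marvel movie" and F2 = "Ryan stars in it".

```python
# Minimal sketch: the Ryan example as a dot product of latent-feature vectors.
ryan = [3, 2]                    # +3 if it is a Marvel movie, +2 if Ryan is in it
items = {
    "Infinity War": [1, 0],      # Marvel movie, Ryan not in it
    "Deadpool":     [1, 1],      # Marvel movie, Ryan in it
}

for title, factors in items.items():
    score = sum(u * v for u, v in zip(ryan, factors))
    print(title, score)          # Infinity War 3, Deadpool 5
```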
Content-based Filtering Discussion

 Pros:
 In contrast to collaborative approaches, content-based techniques do not require a user community in order to work
 No problems with recommending new items

 Cons:
 Requires learning a suitable model of the user's preferences based on explicit or implicit feedback
  - deriving implicit feedback from user behavior can be problematic
  - a ramp-up phase is required (users need to view/rate some items)
  - Web 2.0: using other sources to learn the user preferences might be an option (e.g., share your Facebook profile with the e-shop)
 Overspecialization
- Algorithms tend to propose "more of the same"
- Recommendations might be boring as items are too similar
4.3 Hybrid Recommender Systems

Hybrid: Combinations of various inputs and/or composition of different mechanisms in order to overcome the problems of single methods.

Demographic: “Offer American plugs to people from the US“


Collaborative: "Tell me what's popular among my peers"
Content-based: "Show me more of the same what I've liked"
Parallelized Hybridization Design

 Output of several existing recommenders is combined


 Least invasive design
 Requires some weighting or voting scheme
 weights can be learned using existing ratings as supervision
 dynamic weighting: Adjust weights or switch between different recommenders
as more information about users and items becomes available
- e.g., if too few ratings are available, use content-based recommendation, otherwise use collaborative filtering
Parallelized Hybridization Design: Weighted

 Compute the weighted sum:  rec_{weighted}(u, i) = \sum_{k=1}^{n} \beta_k \cdot rec_k(u, i)

  Recommender 1 (score, rank)      Recommender 2 (score, rank)
   Item1   0.5   1                  Item1   0.8   2
   Item2   0.0                      Item2   0.9   1
   Item3   0.3   2                  Item3   0.4   3
   Item4   0.1   3                  Item4   0.0
   Item5   0.0                      Item5   0.0

  Recommender weighted (0.5 : 0.5)
   Item1   0.65   1
   Item2   0.45   2
   Item3   0.35   3
   Item4   0.05   4
   Item5   0.00
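
A minimal sketch (not from the slides) that reproduces the 0.5 : 0.5 combination above.

```python
# Minimal sketch: weighted combination of two recommenders' scores
# with beta_1 = beta_2 = 0.5.
rec1 = {"Item1": 0.5, "Item2": 0.0, "Item3": 0.3, "Item4": 0.1, "Item5": 0.0}
rec2 = {"Item1": 0.8, "Item2": 0.9, "Item3": 0.4, "Item4": 0.0, "Item5": 0.0}
betas = (0.5, 0.5)

combined = {item: betas[0] * rec1[item] + betas[1] * rec2[item] for item in rec1}
for item, score in sorted(combined.items(), key=lambda x: -x[1]):
    print(item, round(score, 2))
# Item1 0.65, Item2 0.45, Item3 0.35, Item4 0.05, Item5 0.0
```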
Adjustment of Weights

 Use existing ratings to learn individual weights for each user


 Compare prediction of recommenders with actual ratings by user
 For each user adapt weights to minimize Mean Absolute Error (MAE)

Absolute errors and MAE

  Weight1  Weight2  Item   rec1   rec2   error   MAE
   0.1      0.9     Item1  0.5    0.8    0.23    0.61
                    Item4  0.1    0.0    0.99
   0.3      0.7     Item1  0.5    0.8    0.29    0.63
                    Item4  0.1    0.0    0.97
   0.5      0.5     Item1  0.5    0.8    0.35    0.65
                    Item4  0.1    0.0    0.95
   0.7      0.3     Item1  0.5    0.8    0.41    0.67
                    Item4  0.1    0.0    0.93
   0.9      0.1     Item1  0.5    0.8    0.47    0.69
                    Item4  0.1    0.0    0.91

  MAE = \frac{\sum_{r_i \in R} \left| \sum_{k=1}^{n} \beta_k \cdot rec_k(u, i) - r_i \right|}{|R|}

  The MAE improves as rec2 is weighted more strongly.
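
A minimal sketch (not from the slides) that reproduces the MAE column; the true ratings of 1.0 for Item1 and Item4 are not stated on the slide and are inferred from the listed error values.

```python
# Minimal sketch: MAE of the weighted hybrid for different weight pairs.
# The actual ratings of 1.0 are inferred from the error values in the table.
rec1 = {"Item1": 0.5, "Item4": 0.1}
rec2 = {"Item1": 0.8, "Item4": 0.0}
actual = {"Item1": 1.0, "Item4": 1.0}

for w1 in (0.1, 0.3, 0.5, 0.7, 0.9):
    w2 = 1.0 - w1
    errors = [abs(w1 * rec1[i] + w2 * rec2[i] - actual[i]) for i in actual]
    print(w1, round(sum(errors) / len(errors), 2))
# MAE: 0.61, 0.63, 0.65, 0.67, 0.69 -> weighting rec2 more strongly is better here
```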
Monolithic Hybridization Design

 Features/knowledge sources of different paradigms are combined in a


single recommendation component. E.g.:
 Ratings and user demographics
 Ratings and content features: user likes many movies that are comedies

 Example: Content-boosted Collaborative Filtering


 additional ratings are created based on content features
 e.g., Alice likes Items 1 and 3 (unary ratings)
  - Item7 is similar to Items 1 and 3 to a degree of 0.75
  - thus Alice is assumed to like Item7 with a rating of 0.75
 rating matrix becomes less sparse
 see [Prem Melville, et al. 2002]
4.4 Evaluating Recommender Systems

Question: Does a recommender system perform well with respect to specific criteria such as accuracy, serendipity, online conversion, response time, or ramp-up effort?

 So we need to determine the criteria that matter to us


 Popular Measures for Accuracy
 If items are rated on a Likert scale (1 to 5)
- MAE (Mean Absolute Error), RMSE (Root Mean Squared Error)
 If items are classified as good or bad
- Precision / Recall / F1-Score
 If items are presented as ranked Top-K list
- Lift Index, Normalized Discounted Cumulative Gain

 Methodologies for measuring Accuracy


 Split-Validation, Cross-Validation
Evaluation Methodology

 Setting to ensure internal validity:
  - One randomly selected share of the known ratings (training set) is used as input to train the algorithm and build the model
  - The remaining share of withheld ratings (test set) is used as ground truth to evaluate the prediction quality
  - To ensure the reliability of the measurements, the random split, model building, and evaluation steps are repeated several times

  Example (Alice's known ratings):
   Training set: Item1 = 5, Item2 = 1, Item3 = 3, Item4 = 1
   Test set:     Item5 = 4, Item6 = 2

 Split-Validation
  - e.g., 2/3 of the ratings for training, 1/3 for validation

 N-Fold Cross-Validation
  - N disjoint fractions of the known ratings with equal size (1/N) are determined; setting N to 5 or 10 is popular
  - N repetitions of the model building and evaluation steps, where each fraction is used exactly once as the test set while the other fractions are used for training
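
A minimal sketch of an N-fold split (not from the slides); the fold evaluation uses a trivial global-mean predictor as a stand-in for a real recommender.

```python
# Minimal sketch: N-fold cross-validation over known (user, item, rating)
# triples, with a global-mean predictor standing in for a real recommender.
import random

def evaluate_fold(train, test):
    """Placeholder model: predict the global mean rating, return the MAE."""
    mean_rating = sum(r for _, _, r in train) / len(train)
    return sum(abs(mean_rating - r) for _, _, r in test) / len(test)

def cross_validate(known_ratings, n_folds=5, seed=42):
    shuffled = list(known_ratings)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    maes = []
    for k in range(n_folds):
        test = folds[k]
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        maes.append(evaluate_fold(train, test))
    return sum(maes) / n_folds

# toy usage with Alice's six ratings from the example above
data = [("Alice", item, r) for item, r in enumerate([5, 1, 3, 1, 4, 2], start=1)]
print(round(cross_validate(data), 2))
```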
Evaluation of Likert-Scaled Predictions

 Mean Absolute Error (MAE) computes the average deviation between predicted ratings and actual ratings:

   MAE = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|

 Root Mean Square Error (RMSE) is similar to MAE, but places more emphasis on larger deviations:

   RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - r_i)^2}
 Critique
 Not meaningful as inclusion into Top-K list is more important
to the user than overall accuracy of predictions.
 Rather evaluate inclusion into Top-K list as classification
problem (see next slide).
Evaluation of Good/Bad Classifications

 Confusion Matrix

                          Reality
                          Actually Good         Actually Bad
  Prediction   Good       True Positive (tp)    False Positive (fp)
               Bad        False Negative (fn)   True Negative (tn)

 Precision: measure of exactness
  - determines the fraction of relevant items retrieved out of all items retrieved
  - e.g., the fraction of recommended movies that are actually good

 Recall: measure of completeness
  - determines the fraction of relevant items retrieved out of all relevant items
  - e.g., the fraction of all good movies that are recommended

 F1-Measure
  - combines Precision and Recall into a single value for comparison purposes
  - may be used to gain a more balanced view of performance
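
A minimal sketch (not from the slides) computing the three measures from invented confusion-matrix counts.

```python
# Minimal sketch: precision, recall and F1 from confusion-matrix counts
# (the counts are invented for illustration).
tp, fp, fn, tn = 30, 10, 20, 40

precision = tp / (tp + fp)                           # 0.75
recall = tp / (tp + fn)                              # 0.6
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 2))               # 0.75 0.6 0.67
```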
Evaluation of ranked Top-K List

 For a specific user:

  Actually good items:               Item 237, Item 899
  Recommended (predicted as good):   Item 345, Item 237, Item 187
  Hit:                               Item 237

 Rank position also matters!


 Rank metrics extend recall and precision to take the
positions of correct items in a ranked list into account
 Relevant items are more useful when they appear earlier
in the recommendation list
 Particularly important in recommender systems as lower ranked
items may be overlooked by users
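
As an illustration (not from the slides), one common rank-aware metric is the Normalized Discounted Cumulative Gain mentioned earlier; the sketch below applies the standard binary-relevance formulation to the small example above.

```python
# Minimal sketch: NDCG for the example above, where only Item 237 and Item 899
# are actually good and the ranked recommendation list has three entries.
from math import log2

def dcg(gains):
    return sum(g / log2(pos + 1) for pos, g in enumerate(gains, start=1))

recommended = ["Item 345", "Item 237", "Item 187"]
relevant = {"Item 237", "Item 899"}

gains = [1 if item in relevant else 0 for item in recommended]   # [0, 1, 0]
ideal = sorted([1] * len(relevant) + [0] * len(recommended), reverse=True)[:len(recommended)]
ndcg = dcg(gains) / dcg(ideal)
print(round(ndcg, 2))  # roughly 0.39; a hit at rank 1 would score higher
```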
Public Rating Datasets

 MovieLens
  - movie ratings collected via the MovieLens website
  - 1M dataset: 6,000 users, 3,900 movies, 1 million ratings
  - 10M dataset: 71,000 users, 10,600 movies, 10 million ratings

 Netflix
  - provided by the commercial movie rental website for the Netflix competition ($1,000,000 for a 10% better RMSE)
  - 480,000 users rated 18,000 movies, 100 million ratings

 Yahoo Music
  - 600,000 songs, 1 million users, 300 million ratings
  - provided for the KDD Cup 2011

 Web 2.0 Platforms offer plenty of additional rating data


 e.g. LastFM, delicious
