CS345A Data Mining: Recommendation Systems

This document discusses recommendation systems and some of the key techniques used to build them. It describes how recommendation engines work to filter information and provide personalized recommendations to users. It covers content-based recommendation approaches that match user profiles to item profiles, collaborative filtering techniques that identify similar users, and hybrid methods. It also discusses challenges like data sparsity, evaluating recommendation quality, and efficiently finding similar items and users.

Uploaded by

Devang Thakkar

CS345A

Data Mining
Recommendation Systems
Anand Rajaraman

Recommendations

[Diagram: Items reached via Search and via Recommendations]

Items: products, web sites, blogs, news items, ...

From scarcity to abundance
- Shelf space is a scarce commodity for traditional retailers
  - Also: TV networks, movie theaters, ...
- The web enables near-zero-cost dissemination of information about products
  - From scarcity to abundance
- More choice necessitates better filters
  - Recommendation engines
  - How Into Thin Air made Touching the Void a bestseller

The Long Tail

Source: Chris Anderson (2004)

Recommendation Types
- Editorial
- Simple aggregates
  - Top 10, Most Popular, Recent Uploads
- Tailored to individual users
  - Amazon, Netflix, ...

Formal Model
- C = set of Customers
- S = set of Items
- Utility function u: C × S → R
  - R = set of ratings
  - R is a totally ordered set
  - e.g., 0-5 stars, real number in [0,1]
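
One way to picture the formal model is as a partial function stored only for the (customer, item) pairs that have a known rating. The sketch below is illustrative: the names `utility`, `u`, and `sparsity`, and all rating values, are made up for the example and are not from the slides.

```python
from typing import Dict, Optional, Set, Tuple

# Sparse utility function u: C x S -> R, stored only for known pairs.
# The ratings are hypothetical values in [0, 1].
utility: Dict[Tuple[str, str], float] = {
    ("alice", "item_a"): 1.0,
    ("alice", "item_c"): 0.2,
    ("bob", "item_b"): 0.5,
}


def u(customer: str, item: str) -> Optional[float]:
    """Return the known rating, or None if the pair is unrated."""
    return utility.get((customer, item))


def sparsity(known: Dict[Tuple[str, str], float],
             customers: Set[str], items: Set[str]) -> float:
    """Fraction of (customer, item) pairs that have no known rating."""
    total = len(customers) * len(items)
    return 1.0 - len(known) / total
```

Unknown entries are kept distinct from a rating of 0, which matters later when the goal is to extrapolate the missing values.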

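Utility Matrix

[Table: users Alice, Bob, Carol, David by movies Avatar, LOTR, Matrix, Pirates. Only some cells hold ratings; the values that survive extraction are 0.2, 0.5, 0.2, 0.3, 1, 0.4, but their cell positions are not recoverable.]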

Key Problems
- Gathering known ratings for matrix
- Extrapolating unknown ratings from known ratings
  - Mainly interested in high unknown ratings
- Evaluating extrapolation methods

Gathering Ratings
- Explicit
  - Ask people to rate items
  - Doesn't work well in practice: people can't be bothered
- Implicit
  - Learn ratings from user actions
  - e.g., purchase implies high rating
  - What about low ratings?

Extrapolating Utilities
- Key problem: matrix U is sparse
  - Most people have not rated most items
- Three approaches
  - Content-based
  - Collaborative
  - Hybrid

Content-based recommendations
- Main idea: recommend items to customer C similar to previous items rated highly by C
- Movie recommendations
  - Recommend movies with same actor(s), director, genre, ...
- Websites, blogs, news
  - Recommend other sites with similar content

Plan of action

[Diagram with labels: likes, build, match, recommend; item profiles; user profile (Red, Circles, Triangles)]

Item Profiles
- For each item, create an item profile
- Profile is a set of features
  - Movies: author, title, actor, director, ...
  - Text: set of important words in document
- How to pick important words?
  - Usual heuristic is TF.IDF (Term Frequency times Inverse Doc Frequency)

TF.IDF
- $f_{ij}$ = frequency of term $t_i$ in document $d_j$
- $TF_{ij} = f_{ij} / \max_k f_{kj}$
- $n_i$ = number of docs that mention term $i$
- $N$ = total number of docs
- $IDF_i = \log(N / n_i)$
- TF.IDF score $w_{ij} = TF_{ij} \times IDF_i$
- Doc profile = set of words with highest TF.IDF scores, together with their scores
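
The TF.IDF doc profile can be sketched as below. This is a minimal illustration, not the course's reference code: it assumes TF is the term count normalized by the largest count in the same document, IDF is log base 2 of N / n_i, and documents arrive already tokenized. The function name `tf_idf_profiles` and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def tf_idf_profiles(docs, top_k=3):
    """docs: dict name -> list of tokens. Returns name -> [(term, score), ...]."""
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}

    # n_i: number of documents that mention term i
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    profiles = {}
    for name, term_counts in counts.items():
        if not term_counts:
            profiles[name] = []
            continue
        max_f = max(term_counts.values())
        scores = {
            term: (f / max_f) * math.log2(n_docs / doc_freq[term])
            for term, f in term_counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        profiles[name] = ranked[:top_k]
    return profiles
```

A term that appears in every document gets IDF 0 and so drops to the bottom of every profile, which is the intended filtering effect.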

User profiles and prediction
- User profile possibilities:
  - Weighted average of rated item profiles
  - Variation: weight by difference from average rating for item
- Prediction heuristic
  - Given user profile c and item profile s, estimate u(c,s) = cos(c,s) = c.s/(|c||s|)
  - Need efficient method to find items with high utility: later
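
A minimal sketch of the profile and prediction steps, assuming profiles are sparse dicts mapping a feature name to a weight and that the user profile is the rating-weighted average of the item profiles the user rated. The helper names `weighted_profile` and `cosine` are illustrative.

```python
import math


def weighted_profile(rated_items):
    """rated_items: list of (item_profile, rating). Returns the weighted average profile."""
    total = sum(rating for _, rating in rated_items)
    if total == 0:
        return {}
    profile = {}
    for item_profile, rating in rated_items:
        for feature, weight in item_profile.items():
            profile[feature] = profile.get(feature, 0.0) + rating * weight
    return {feature: w / total for feature, w in profile.items()}


def cosine(c, s):
    """Estimate u(c, s) as cos(c, s) = c.s / (|c||s|); 0.0 if either vector is empty."""
    dot = sum(w * s.get(feature, 0.0) for feature, w in c.items())
    norm_c = math.sqrt(sum(w * w for w in c.values()))
    norm_s = math.sqrt(sum(w * w for w in s.values()))
    if norm_c == 0 or norm_s == 0:
        return 0.0
    return dot / (norm_c * norm_s)
```

Scoring every item this way is a linear scan over the catalogue, which is why the slide defers the efficient search to a later topic.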

Limitations of content-based approach
- Finding the appropriate features
  - e.g., images, movies, music
- Overspecialization
  - Never recommends items outside user's content profile
  - People might have multiple interests
- Recommendations for new users
  - How to build a profile?

Collaborative Filtering
- Consider user c
- Find set D of other users whose ratings are similar to c's ratings
- Estimate user's ratings based on ratings of users in D

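Similar users
- Let $r_x$ be the vector of user x's ratings
- Cosine similarity measure
  - $sim(x,y) = \cos(r_x, r_y)$
- Pearson correlation coefficient
  - $S_{xy}$ = items rated by both users x and y
  - $sim(x,y) = \frac{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)(r_{ys} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)^2}\,\sqrt{\sum_{s \in S_{xy}} (r_{ys} - \bar{r}_y)^2}}$, where $\bar{r}_x$ is user x's average rating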
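
Both similarity measures can be sketched over sparse rating vectors stored as dicts from item to rating. Two choices here are assumptions rather than something the slides state: cosine treats an unrated item as 0, and Pearson takes each user's mean over the co-rated items in S_xy only. The function names are illustrative.

```python
import math


def cosine_sim(rx, ry):
    """Cosine of two sparse rating vectors; unrated items count as 0."""
    dot = sum(r * ry[item] for item, r in rx.items() if item in ry)
    norm_x = math.sqrt(sum(r * r for r in rx.values()))
    norm_y = math.sqrt(sum(r * r for r in ry.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def pearson_sim(rx, ry):
    """Pearson correlation over S_xy, the items rated by both users."""
    common = sorted(rx.keys() & ry.keys())
    if len(common) < 2:
        return 0.0  # not enough overlap to measure correlation
    mean_x = sum(rx[i] for i in common) / len(common)
    mean_y = sum(ry[i] for i in common) / len(common)
    num = sum((rx[i] - mean_x) * (ry[i] - mean_y) for i in common)
    dev_x = math.sqrt(sum((rx[i] - mean_x) ** 2 for i in common))
    dev_y = math.sqrt(sum((ry[i] - mean_y) ** 2 for i in common))
    return num / (dev_x * dev_y) if dev_x and dev_y else 0.0
```

Pearson subtracts each user's mean, so a generous rater and a harsh rater with the same ordering of items still come out as similar.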

Rating predictions
- Let D be the set of k users most similar to c who have rated item s
- Possibilities for prediction function (item s):
  - $r_{cs} = \frac{1}{k} \sum_{d \in D} r_{ds}$
  - $r_{cs} = \left(\sum_{d \in D} sim(c,d)\, r_{ds}\right) / \left(\sum_{d \in D} sim(c,d)\right)$
  - Other options?
- Many tricks possible
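
The two prediction functions above can be sketched as follows. The sketch assumes ratings are stored as a dict of user to {item: rating} and uses cosine similarity as a stand-in for sim; all function names are illustrative. When fewer than k neighbours have rated s, it averages over the neighbours that exist.

```python
import math


def cosine_sim(rx, ry):
    """Cosine of two sparse rating vectors; unrated items count as 0."""
    dot = sum(r * ry[item] for item, r in rx.items() if item in ry)
    norm_x = math.sqrt(sum(r * r for r in rx.values()))
    norm_y = math.sqrt(sum(r * r for r in ry.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def neighbours(c, s, ratings, k):
    """D: the k users most similar to c among those who rated item s."""
    candidates = [d for d in ratings if d != c and s in ratings[d]]
    candidates.sort(key=lambda d: cosine_sim(ratings[c], ratings[d]), reverse=True)
    return candidates[:k]


def predict_mean(c, s, ratings, k):
    """Unweighted average of the neighbours' ratings for s."""
    d_set = neighbours(c, s, ratings, k)
    if not d_set:
        return None
    return sum(ratings[d][s] for d in d_set) / len(d_set)


def predict_weighted(c, s, ratings, k):
    """Similarity-weighted average of the neighbours' ratings for s."""
    d_set = neighbours(c, s, ratings, k)
    weights = [cosine_sim(ratings[c], ratings[d]) for d in d_set]
    denom = sum(weights)
    if denom == 0:
        return None
    return sum(w * ratings[d][s] for w, d in zip(weights, d_set)) / denom
```

With a similarity that can be negative, such as Pearson, the weighted form needs extra care because the denominator can approach zero.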

Complexity
- Expensive step is finding k most similar customers
  - O(|U|)
- Too expensive to do at runtime
  - Could pre-compute
- Naive precomputation takes time O(N|U|)
  - Stay tuned for how to do it faster!
- Can use clustering, partitioning as alternatives, but quality degrades

Item-Item Collaborative Filtering
- So far: User-user collaborative filtering
- Another view
  - For item s, find other similar items
  - Estimate rating for item based on ratings for similar items
  - Can use same similarity metrics and prediction functions as in user-user model
- In practice, it has been observed that item-item often works better than user-user
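
The item-item view can be sketched by transposing the ratings so that each item has a vector of user ratings, then reusing the same similarity and weighted-average prediction. The data layout (dict of user to {item: rating}), the cosine choice, and the function names are assumptions made for this example.

```python
import math


def transpose(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return by_item


def cosine_sim(a, b):
    """Cosine of two sparse vectors; missing entries count as 0."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_item_item(c, s, ratings, k=2):
    """Estimate c's rating of s from the k items most similar to s that c has rated."""
    by_item = transpose(ratings)
    if s not in by_item:
        return None  # new item problem: nobody has rated s yet
    rated = [i for i in ratings.get(c, {}) if i != s]
    scored = sorted(
        ((cosine_sim(by_item[s], by_item[i]), i) for i in rated), reverse=True
    )[:k]
    denom = sum(w for w, _ in scored)
    if denom <= 0:
        return None
    return sum(w * ratings[c][i] for w, i in scored) / denom
```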

Pros and cons of collaborative filtering
- Works for any kind of item
  - No feature selection needed
- New user problem
- New item problem
- Sparsity of rating matrix
  - Cluster-based smoothing?
  - Add more data!

Hybrid Methods
- Implement two or more different recommenders and combine predictions
  - Perhaps using a linear model
- Add content-based methods to collaborative filtering
  - Item profiles for new item problem
  - Demographics to deal with new user problem
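
A linear combination of recommenders might look like the sketch below. The weights are hypothetical and would normally be fitted on held-out ratings. The function name `blend` and the convention that a recommender returns None when it cannot predict (for example, collaborative filtering on a brand-new item) are assumptions for the example.

```python
from typing import Optional, Sequence


def blend(predictions: Sequence[Optional[float]],
          weights: Sequence[float]) -> Optional[float]:
    """Weighted average of component predictions, skipping unavailable ones."""
    available = [(p, w) for p, w in zip(predictions, weights) if p is not None]
    total = sum(w for _, w in available)
    if not available or total == 0:
        return None
    # Renormalize so the remaining weights still sum to 1
    return sum(p * w for p, w in available) / total
```

Skipping missing components is one simple way to let the content-based prediction carry a new item until enough ratings arrive.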

Evaluating Predictions
- Compare predictions with known ratings
  - Root-mean-square error (RMSE)
- Another approach: 0/1 model
  - Coverage
    - Number of items/users for which system can make predictions
  - Precision
    - Accuracy of predictions
  - Receiver operating characteristic (ROC)
    - Tradeoff curve between false positives and false negatives
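
RMSE, coverage, and precision can be sketched as below, assuming predictions and held-out ratings are dicts keyed by (user, item). Two choices are assumptions: coverage is reported as a fraction rather than a count, and the 0/1 model uses a rating threshold whose value is hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error over pairs present in both dicts."""
    keys = predicted.keys() & actual.keys()
    if not keys:
        return None
    return math.sqrt(sum((predicted[k] - actual[k]) ** 2 for k in keys) / len(keys))


def coverage(predicted, actual):
    """Fraction of held-out pairs for which a prediction exists."""
    return len(predicted.keys() & actual.keys()) / len(actual)


def precision(predicted, actual, threshold):
    """0/1 model: share of predicted-high pairs that are actually high."""
    keys = predicted.keys() & actual.keys()
    flagged = [k for k in keys if predicted[k] >= threshold]
    if not flagged:
        return None
    hits = sum(1 for k in flagged if actual[k] >= threshold)
    return hits / len(flagged)
```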

Problems with Measures
- Narrow focus on accuracy sometimes misses the point
  - Prediction Diversity
  - Prediction Context
  - Order of predictions
- In practice, we care only to predict high ratings
  - RMSE might penalize a method that does well for high ratings and badly for others
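
A small made-up example shows how RMSE can rank methods the wrong way round when only high ratings matter. Method A below is exact on the high ratings and poor on the low ones; method B is off by one everywhere. All numbers and names are hypothetical.

```python
import math


def rmse(pred, actual):
    """Root-mean-square error over two aligned lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def rmse_on_high(pred, actual, threshold):
    """RMSE restricted to items whose true rating is at least threshold."""
    pairs = [(p, a) for p, a in zip(pred, actual) if a >= threshold]
    return rmse([p for p, _ in pairs], [a for _, a in pairs])


actual = [5, 5, 1, 1]
method_a = [5, 5, 3, 3]  # exact on high ratings, off by 2 on low ones
method_b = [4, 4, 2, 2]  # off by 1 everywhere
```

Over all four items A scores sqrt(2) and B scores 1.0, so RMSE prefers B. Restricted to the high ratings, A scores 0.0 and B scores 1.0, so A is the better choice for recommending.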

Finding similar vectors
- Common problem that comes up in many settings
- Given a large number N of vectors in some high-dimensional space (M dimensions), find pairs of vectors that have high similarity
  - e.g., user profiles, item profiles
- Perfect set-up for next topic!
  - Near-neighbor search in high dimensions
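
For reference, the brute-force baseline compares all N(N-1)/2 pairs at O(M) cost each, which is what near-neighbor search aims to avoid. The sketch below assumes dense vectors of equal length and cosine similarity; the function names and the threshold are illustrative.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_pairs(vectors, threshold):
    """Return (name_x, name_y, similarity) for every pair at or above threshold."""
    result = []
    for (name_x, vx), (name_y, vy) in combinations(sorted(vectors.items()), 2):
        sim = cosine(vx, vy)
        if sim >= threshold:
            result.append((name_x, name_y, sim))
    return result
```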
