CS345A Data Mining: Recommendation Systems
Anand Rajaraman
Recommendations
[Figure: Search and Recommendations as two routes to Items]
Recommendation Types
Editorial
Simple aggregates
Top 10, Most Popular, Recent Uploads
Formal Model
C = set of Customers
S = set of Items
Utility function $u: C \times S \to R$
R = set of ratings
R is a totally ordered set
e.g., 0-5 stars, real number in [0,1]
Utility Matrix
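Rows (customers): Alice, Bob, Carol, David
Columns (items): Avatar, LOTR, Matrix, Pirates
Known ratings shown: 0.2, 0.5, 0.2, 0.3, 1, 0.4 (cell positions not recoverable from the slide text)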
Key Problems
Gathering known ratings for matrix
Extrapolate unknown ratings from
known ratings
Mainly interested in high unknown ratings
Gathering Ratings
Explicit
Ask people to rate items
Doesn't work well in practice: people can't
be bothered
Implicit
Learn ratings from user actions
e.g., purchase implies high rating
What about low ratings?
Extrapolating Utilities
Key problem: matrix U is sparse
most people have not rated most items
Three approaches
Content-based
Collaborative
Hybrid
Content-based recommendations
Main idea: recommend to customer c items
similar to those c has previously
rated highly
Movie recommendations
recommend movies with the same actor(s),
director, genre, ...
Plan of action
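[Figure: the user likes some items; item profiles are built for them; a user profile (e.g., Red, Circles, Triangles) is built from those item profiles; the user profile is matched against item profiles to recommend new items]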
Item Profiles
For each item, create an item profile
Profile is a set of features
movies: author, title, actor, director, ...
text: set of important words in document
TF.IDF
$f_{ij}$ = frequency of term $t_i$ in document $d_j$
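A minimal sketch of how TF.IDF weights for a text item profile could be computed from the term frequencies above. The slide defines only $f_{ij}$, so the normalization used here ($TF_{ij} = f_{ij} / \max_k f_{kj}$ and $IDF_i = \log_2(N / n_i)$, with $N$ documents and $n_i$ documents containing term $t_i$) is an assumption, as are the function name `tf_idf` and the toy documents.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed definitions (not stated in the slides):
      TF_ij = f_ij / max_k f_kj
      IDF_i = log2(N / n_i), n_i = number of documents containing term i
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]   # f_ij for each document j
    doc_freq = Counter()                      # n_i for each term i
    for c in counts:
        doc_freq.update(c.keys())
    scores = []
    for c in counts:
        max_f = max(c.values()) if c else 1
        scores.append({
            term: (f / max_f) * math.log2(n_docs / doc_freq[term])
            for term, f in c.items()
        })
    return scores

# Hypothetical documents, already tokenized into terms.
docs = [
    ["ring", "hobbit", "ring", "wizard"],
    ["pirate", "ship", "wizard"],
]
profiles = tf_idf(docs)
```

Under these assumed definitions, a term that occurs in every document gets weight 0, so the highest-scoring terms of a document are the ones that distinguish it from the rest.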
Prediction heuristic
Given user profile c and item profile s,
estimate $u(c,s) = \cos(c,s) = \frac{c \cdot s}{|c|\,|s|}$
Need an efficient method to find items with
high utility (covered later)
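The cosine heuristic can be sketched as follows, assuming both profiles are stored as sparse `{feature: weight}` dictionaries. The feature names and weights are hypothetical, and returning 0 for an empty profile is a convention chosen for this sketch.

```python
import math

def cosine(c, s):
    """Cosine of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * s[f] for f, w in c.items() if f in s)
    norm_c = math.sqrt(sum(w * w for w in c.values()))
    norm_s = math.sqrt(sum(w * w for w in s.values()))
    if norm_c == 0 or norm_s == 0:
        return 0.0  # convention for empty profiles (an assumption)
    return dot / (norm_c * norm_s)

# Hypothetical user and item profiles.
user_profile = {"fantasy": 1.0, "wizard": 1.0}
item_profile = {"fantasy": 1.0, "pirate": 1.0}
estimate = cosine(user_profile, item_profile)  # estimated u(c, s)
```

Scoring every item this way takes time linear in the number of items per user, which is why the slide defers the question of an efficient search.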
Limitations of content-based
approach
Finding the appropriate features
e.g., images, movies, music
Overspecialization
Never recommends items outside the user's
content profile
People might have multiple interests
Collaborative Filtering
Consider user c
Find set D of other users whose ratings
are similar to c's ratings
Estimate the user's ratings based on the ratings
of users in D
Similar users
Let $r_x$ be the vector of user x's ratings
Cosine similarity measure
$\mathrm{sim}(x,y) = \cos(r_x, r_y)$
Rating predictions
Let D be the set of k users most similar to c
who have rated item s
Possibilities for prediction function (item s):
$r_{cs} = \frac{1}{k} \sum_{d \in D} r_{ds}$
$r_{cs} = \frac{\sum_{d \in D} \mathrm{sim}(c,d)\, r_{ds}}{\sum_{d \in D} \mathrm{sim}(c,d)}$
Other options?
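Both prediction functions can be sketched in a few lines. This assumes ratings are stored as a nested dictionary `{user: {item: rating}}` and that unrated items count as 0 in the cosine; the user and item names, the ratings, and the function names are hypothetical.

```python
import math

def sim(rx, ry):
    """Cosine similarity of two rating dicts {item: rating}.
    Unrated items are treated as 0 (an assumption of this sketch)."""
    dot = sum(r * ry[i] for i, r in rx.items() if i in ry)
    nx = math.sqrt(sum(r * r for r in rx.values()))
    ny = math.sqrt(sum(r * r for r in ry.values()))
    return dot / (nx * ny) if nx and ny else 0.0

def predict(ratings, c, s, k, weighted=True):
    """Estimate r_cs from the k users most similar to c who have rated s."""
    candidates = [d for d in ratings if d != c and s in ratings[d]]
    D = sorted(candidates, key=lambda d: sim(ratings[c], ratings[d]),
               reverse=True)[:k]
    if not D:
        return None  # nobody else has rated s
    if not weighted:
        # Plain average; divides by len(D) in case fewer than k users qualify.
        return sum(ratings[d][s] for d in D) / len(D)
    total = sum(sim(ratings[c], ratings[d]) for d in D)
    if total == 0:
        return None  # no similar user carries any weight
    return sum(sim(ratings[c], ratings[d]) * ratings[d][s] for d in D) / total

# Hypothetical ratings on a 0-1 scale.
ratings = {
    "u1": {"a": 1.0, "b": 0.5},
    "u2": {"a": 1.0, "b": 0.5, "c": 0.8},
    "u3": {"a": 0.2, "c": 0.4},
    "u4": {"b": 1.0, "c": 0.2},
}
plain = predict(ratings, "u1", "c", k=2, weighted=False)
weighted = predict(ratings, "u1", "c", k=2, weighted=True)
```

In this toy data the similarity-weighted estimate lands closer to the rating given by the most similar user than the plain average does, which is the motivation for the second formula.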
Complexity
The expensive step is finding the k most similar
customers:
O(|U|), where |U| is the size of the utility matrix
Hybrid Methods
Implement two or more different
recommenders and combine predictions
Perhaps using a linear model
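A linear combination of recommenders might look like the sketch below. The weights, the bias term, and the two input predictions are hypothetical; in practice the weights would presumably be fit against known ratings.

```python
def hybrid_predict(predictions, weights, bias=0.0):
    """Linear model over several recommenders' predictions for one (user, item) pair."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per recommender")
    return bias + sum(w * p for w, p in zip(weights, predictions))

# Hypothetical outputs of two recommenders for the same user and item.
content_based = 0.6
collaborative = 0.8
combined = hybrid_predict([content_based, collaborative], weights=[0.25, 0.75])
```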
Evaluating Predictions
Compare predictions with known ratings
Root-mean-square error (RMSE)
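RMSE over a held-out set of known ratings can be computed as in this sketch, assuming predictions and known ratings are given as parallel lists. The sample values are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and known ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

# Hypothetical predictions and the corresponding known ratings.
error = rmse([0.5, 1.0, 0.2], [0.3, 1.0, 0.4])
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.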