07 Recsys1

The document discusses recommendation systems and collaborative filtering. Collaborative filtering estimates a user's ratings from the ratings of similar users: it finds a set of users whose ratings are similar to the target user's, then estimates the target user's ratings for items from how those similar users rated them, harnessing the quality judgments of other users to make recommendations.

CS246: Mining Massive Datasets

Jure Leskovec, Stanford University


http://cs246.stanford.edu
¡ Customer X
§ Buys Metallica CD
§ Buys Megadeth CD
¡ Customer Y
§ Does search on Metallica
§ Recommender system suggests Megadeth from data collected about Customer X
2/13/17 Jure Leskovec, Stanford C246: Mining Massive Datasets 2
Examples:
§ Search, Recommendations
§ Items: products, web sites, blogs, news items, …


¡ Shelf space is a scarce commodity for traditional retailers
§ Also: TV networks, movie theaters, …

¡ The Web enables near-zero-cost dissemination of information about products
§ From scarcity to abundance

¡ More choice necessitates better filters
§ Recommendation engines
§ How Into Thin Air made Touching the Void a bestseller:
  http://www.wired.com/wired/archive/12.10/tail.html
Source: Chris Anderson (2004)



Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!
¡ Editorial and hand curated
§ List of favorites
§ Lists of “essential” items

¡ Simple aggregates
§ Top 10, Most Popular, Recent Uploads

¡ Tailored to individual users (today's class)
§ Amazon, Netflix, …


¡ X = set of Customers
¡ S = set of Items

¡ Utility function u: X × S → R
§ R = set of ratings
§ R is a totally ordered set
§ e.g., 0-5 stars, real number in [0,1]
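As a quick illustration of the model above, a sparse utility matrix can be stored as a nested dictionary; the users, items, and ratings here are hypothetical, and missing entries mean "unknown", not zero:

```python
# Sparse utility matrix u: X x S -> R stored as {user: {item: rating}}.
# A missing entry means "no rating yet", NOT a rating of zero.
utility = {
    "x1": {"s1": 5, "s3": 1},
    "x2": {"s1": 4, "s2": 2},
}

def u(x, s):
    """Return the known rating u(x, s), or None if unknown."""
    return utility.get(x, {}).get(s)
```

The dict-of-dicts layout is just one convenient choice for sparse data; a coordinate list or a sparse matrix library would serve equally well.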


Avatar LOTR Matrix Pirates

Alice
1 0.2
Bob
0.5 0.3
Carol
0.2 1
David
0.4



¡ (1) Gathering “known” ratings for matrix
§ How to collect the data in the utility matrix

¡ (2) Extrapolate unknown ratings from the known ones
§ Mainly interested in high unknown ratings
§ We are not interested in knowing what you don’t like but what you like

¡ (3) Evaluating extrapolation methods
§ How to measure success/performance of recommendation methods
¡ Explicit
§ Ask people to rate items
§ Doesn’t work well in practice – people
can’t be bothered
§ Crowdsourcing: Pay people to label items

¡ Implicit
§ Learn ratings from user actions
§ E.g., purchase implies high rating
§ What about low ratings?



¡ Key problem: Utility matrix U is sparse
§ Most people have not rated most items
§ Cold start:
§ New items have no ratings
§ New users have no history

¡ Three approaches to recommender systems:
§ 1) Content-based
§ 2) Collaborative (today!)
§ 3) Latent factor based
¡ Main idea: Recommend items to customer x
similar to previous items rated highly by x

Example:
¡ Movie recommendations
§ Recommend movies with same actor(s),
director, genre, …
¡ Websites, blogs, news
§ Recommend other sites with “similar” content



Plan of action: from the items the user likes, build item profiles; aggregate them into a user profile; match the user profile against item profiles to recommend new items.
¡ For each item, create an item profile

¡ Profile is a set (vector) of features
§ Movies: author, title, actor, director, …
§ Text: set of “important” words in document

¡ How to pick important features?
§ Usual heuristic from text mining is TF-IDF
  (Term Frequency × Inverse Document Frequency)
§ Term … Feature
§ Document … Item
fij = frequency of term (feature) i in doc (item) j

TFij = fij / maxk fkj
Note: we normalize TF to discount for “longer” documents

ni = number of docs that mention term i
N = total number of docs

IDFi = log2(N / ni)

TF-IDF score: wij = TFij × IDFi

Doc profile = set of words with highest TF-IDF scores, together with their scores
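A minimal sketch of the TF-IDF heuristic above, with TF normalized by the most frequent term in each document and IDF taken as log2(N / ni); this is illustrative, not a reference implementation:

```python
import math
from collections import Counter

def tfidf_profiles(docs):
    """Compute TF-IDF weights w_ij = TF_ij * IDF_i for each doc.

    TF_ij = f_ij / max_k f_kj  (normalized to discount longer docs)
    IDF_i = log2(N / n_i)      (n_i = number of docs mentioning term i)
    """
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # n_i per term
    profiles = []
    for doc in docs:
        tf = Counter(doc)
        max_f = max(tf.values())
        profiles.append({t: (f / max_f) * math.log2(N / df[t])
                         for t, f in tf.items()})
    return profiles
```

A doc profile would then keep only the highest-weighted words from each dictionary. Note that a term appearing in every document gets IDF = log2(1) = 0, so it never enters a profile.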


¡ User profile possibilities:
§ Weighted average of rated item profiles
§ Variation: weight by difference from average rating for item
§ …

¡ Prediction heuristic:
§ Given user profile x and item profile i, estimate
  u(x, i) = cos(x, i) = (x · i) / (||x|| · ||i||)

Note: cosine similarity u(x, i) defined this way ranges over [-1, 1]; to map it to a [0, 1] scale one can take the angle arccos(u(x, i)) and rescale.
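The two pieces above (build a user profile, then score items by cosine) can be sketched as follows; the feature-dict representation and names are illustrative assumptions:

```python
import math

def cos_sim(x, i):
    """Prediction heuristic u(x, i) = (x . i) / (|x| |i|).
    Profiles are feature -> weight dicts (an illustrative representation)."""
    dot = sum(x[f] * i.get(f, 0.0) for f in x)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ni = math.sqrt(sum(v * v for v in i.values()))
    return dot / (nx * ni) if nx and ni else 0.0

def user_profile(item_profiles, ratings):
    """User profile = rating-weighted average of rated item profiles
    (the first option listed above)."""
    feats = {f for p in item_profiles for f in p}
    total = sum(ratings)
    return {f: sum(r * p.get(f, 0.0) for p, r in zip(item_profiles, ratings))
               / total
            for f in feats}
```

With this sketch, a user who rated an "action" item 4 and a "comedy" item 2 ends up with a profile tilted toward action, so action items score higher under cos_sim.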
¡ +: No need for data on other users
§ No cold-start or sparsity problems
¡ +: Able to recommend to users with
unique tastes
¡ +: Able to recommend new & unpopular items
§ No first-rater problem
¡ +: Able to provide explanations
§ Can provide explanations of recommended items by
listing content-features that caused an item to be
recommended
¡ –: Finding the appropriate features is hard
§ E.g., images, movies, music
¡ –: Recommendations for new users
§ How to build a user profile?
¡ –: Overspecialization
§ Never recommends items outside user’s
content profile
§ People might have multiple interests
§ Unable to exploit quality judgments of other users



Harnessing quality judgments of other users

¡ Consider user x

¡ Find set N of other users whose ratings are “similar” to x’s ratings

¡ Estimate x’s ratings based on ratings of users in N


rx = [1, _, _, 1, 3]
ry = [1, _, 2, 2, _]

¡ Let rx be the vector of user x’s ratings

¡ Jaccard similarity measure: treat rx, ry as sets of rated items:
  rx = {1, 4, 5}, ry = {1, 3, 4}
§ sim(x, y) = |rx ∩ ry| / |rx ∪ ry|
§ Problem: ignores the value of the rating

¡ Cosine similarity measure: treat rx, ry as points (missing ratings become 0):
  rx = [1, 0, 0, 1, 3], ry = [1, 0, 2, 2, 0]
§ sim(x, y) = cos(rx, ry) = (rx · ry) / (||rx|| · ||ry||)
§ Problem: treats missing ratings as “negative”

¡ Pearson correlation coefficient:
§ Sxy = items rated by both users x and y
§ sim(x, y) = Σs∈Sxy (rxs − r̄x)(rys − r̄y) / ( √(Σs∈Sxy (rxs − r̄x)²) · √(Σs∈Sxy (rys − r̄y)²) )
  where r̄x, r̄y are the average ratings of users x and y
Cosine similarity: sim(x, y) = Σi rxi · ryi / ( √(Σi r²xi) · √(Σi r²yi) )

¡ Intuitively we want: sim(A, B) > sim(A, C)

¡ Jaccard similarity: 1/5 < 2/4
¡ Cosine similarity: 0.386 > 0.322
§ Considers missing ratings as “negative”
§ Solution: subtract the (row) mean
§ Centered cosine, sim(A, B) vs. sim(A, C): 0.092 > -0.559

Notice that cosine similarity is correlation when the data is centered at 0.
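The three measures can be compared in a few lines of code. The extracted text does not show the A/B/C ratings matrix itself, so the vectors below are an assumed example chosen to be consistent with the numbers quoted above (0 encodes a missing rating):

```python
import math

def jaccard(x, y):
    """Jaccard similarity of the *sets* of rated items (0 = missing)."""
    a = {i for i, r in enumerate(x) if r}
    b = {i for i, r in enumerate(y) if r}
    return len(a & b) / len(a | b)

def cosine(x, y):
    """Cosine similarity treating the vectors as points."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

def center(x):
    """Subtract the user's mean rating from the rated entries only."""
    rated = [r for r in x if r]
    mu = sum(rated) / len(rated)
    return [r - mu if r else 0.0 for r in x]

# Hypothetical ratings for users A, B, C over 7 items (0 = missing).
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]
```

With these vectors, Jaccard ranks C closer to A than B is (2/4 vs. 1/5), plain cosine narrowly prefers B, and centered cosine separates them decisively (positive for B, strongly negative for C), which matches the intuition that A and C actually disagree on the items they share.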
From similarity metric to recommendations:
¡ Let rx be the vector of user x’s ratings
¡ Let N be the set of k users most similar to x who have rated item i
¡ Prediction for item i of user x:
§ rxi = (1/k) · Σy∈N ryi
§ rxi = Σy∈N sxy · ryi / Σy∈N sxy    (shorthand: sxy = sim(x, y))
§ Other options?
¡ Many other tricks possible…
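The similarity-weighted prediction above can be sketched as follows; the data layout and names (`ratings` as user-to-item dicts, precomputed `sims`) are illustrative assumptions:

```python
def predict_user_user(ratings, sims, x, i, k=2):
    """rxi = sum(sxy * ryi) / sum(sxy) over the k users most similar
    to x who have rated item i (the second formula above).

    ratings: {user: {item: rating}}; sims: {(x, y): similarity}.
    """
    neighbors = sorted(
        (y for y in ratings if y != x and i in ratings[y]),
        key=lambda y: sims[(x, y)], reverse=True)[:k]
    num = sum(sims[(x, y)] * ratings[y][i] for y in neighbors)
    den = sum(sims[(x, y)] for y in neighbors)
    return num / den if den else None
```

In practice the similarities would come from Pearson correlation or centered cosine as defined earlier, and the neighbor search is the expensive step addressed later in the deck.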
¡ So far: User-user collaborative filtering
¡ Another view: Item-item
§ For item i, find other similar items
§ Estimate rating for item i based
on ratings for similar items
§ Can use same similarity metrics and
prediction functions as in user-user model

rxi = Σj∈N(i;x) sij · rxj / Σj∈N(i;x) sij

sij … similarity of items i and j
rxj … rating of user x on item j
N(i;x) … items similar to i and rated by user x
users
1 2 3 4 5 6 7 8 9 10 11 12

1 1 3 5 5 4

2 5 4 4 2 1 3
movies

3 2 4 1 2 3 4 3 5

4 2 4 5 4 2

5 4 3 4 2 2 5

6 1 3 3 2 4

Blank cells are unknown ratings; filled cells are ratings between 1 and 5.


users
1 2 3 4 5 6 7 8 9 10 11 12

1 1 3 ? 5 5 4

2 5 4 4 2 1 3
movies

3 2 4 1 2 3 4 3 5

4 2 4 5 4 2

5 4 3 4 2 2 5

6 1 3 3 2 4

Goal: estimate the rating of movie 1 by user 5 (the “?” entry).


users
1 2 3 4 5 6 7 8 9 10 11 12
sim(1,m)
1 1 3 ? 5 5 4 1.00
2 5 4 4 2 1 3 -0.18
movies

3 2 4 1 2 3 4 3 5 0.41
4 2 4 5 4 2 -0.10
5 4 3 4 2 2 5 -0.31
6 1 3 3 2 4 0.59
Neighbor selection: identify movies similar to movie 1 that were rated by user 5.
Here we use Pearson correlation as similarity:
1) Subtract the mean rating mi from each movie i
   m1 = (1+3+5+5+4)/5 = 3.6
   row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between rows
users
1 2 3 4 5 6 7 8 9 10 11 12
sim(1,m)
1 1 3 ? 5 5 4 1.00
2 5 4 4 2 1 3 -0.18
movies

3 2 4 1 2 3 4 3 5 0.41
4 2 4 5 4 2 -0.10
5 4 3 4 2 2 5 -0.31
6 1 3 3 2 4 0.59

Compute similarity weights: s1,3 = 0.41, s1,6 = 0.59
users
1 2 3 4 5 6 7 8 9 10 11 12

1 1 3 2.6 5 5 4

2 5 4 4 2 1 3
movies

3 2 4 1 2 3 4 3 5

4 2 4 5 4 2

5 4 3 4 2 2 5

6 1 3 3 2 4

Predict by taking a weighted average:

rxi = Σj∈N(i;x) sij · rxj / Σj∈N(i;x) sij

r1,5 = (0.41·2 + 0.59·3) / (0.41 + 0.59) = 2.6
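The worked example above is easy to reproduce in code; movies 3 and 6 are the neighbors, with similarities 0.41 and 0.59 and user 5's ratings 2 and 3:

```python
def weighted_average(sims_and_ratings):
    """rxi = sum(sij * rxj) / sum(sij) over the neighborhood N(i; x)."""
    num = sum(s * r for s, r in sims_and_ratings)
    den = sum(s for s, _ in sims_and_ratings)
    return num / den

# r1,5 from the slide: neighbors (s1,3 = 0.41, rating 2) and
# (s1,6 = 0.59, rating 3).
r_15 = weighted_average([(0.41, 2), (0.59, 3)])  # -> 2.59, i.e. ~2.6
```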
Before:
rxi = Σj∈N(i;x) sij · rxj / Σj∈N(i;x) sij

¡ Define similarity sij of items i and j
¡ Select k nearest neighbors N(i; x)
§ Items most similar to i that were rated by user x
¡ Estimate rating rxi as the weighted average:

rxi = bxi + Σj∈N(i;x) sij · (rxj − bxj) / Σj∈N(i;x) sij

where bxi = μ + bx + bi is the baseline estimate for user x and item i:
¡ μ = overall mean movie rating
¡ bx = rating deviation of user x = (avg. rating of user x) − μ
¡ bi = rating deviation of movie i
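A sketch of the baseline-adjusted prediction above; the dict-based parameters (`b_user`, `b_item`) and the `(sij, j, rxj)` neighbor tuples are illustrative assumptions:

```python
def baseline(mu, b_user, b_item, x, i):
    """bxi = mu + bx + bi, the global-effects baseline above."""
    return mu + b_user.get(x, 0.0) + b_item.get(i, 0.0)

def predict_with_baseline(mu, b_user, b_item, x, i, neighbors):
    """rxi = bxi + sum(sij * (rxj - bxj)) / sum(sij).

    neighbors: list of (sij, j, rxj) tuples for items j in N(i; x).
    Falls back to the baseline when there are no usable neighbors.
    """
    bxi = baseline(mu, b_user, b_item, x, i)
    den = sum(s for s, _, _ in neighbors)
    if den == 0:
        return bxi
    num = sum(s * (rxj - baseline(mu, b_user, b_item, x, j))
              for s, j, rxj in neighbors)
    return bxi + num / den
```

Working on deviations from the baseline rather than raw ratings removes user and item biases (e.g. a generous rater, a universally loved movie) before the neighborhood average is taken.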
Avatar LOTR Matrix Pirates

Alice
1 0.8
Bob
0.5 0.3
Carol 0.9 1 0.8
David
1 0.4
¡ In practice, it has been observed that item-item
often works better than user-user
¡ Why? Items are simpler, users have multiple tastes
¡ + Works for any kind of item
§ No feature selection needed
¡ - Cold Start:
§ Need enough users in the system to find a match
¡ - Sparsity:
§ The user/ratings matrix is sparse
§ Hard to find users that have rated the same items
¡ - First rater:
§ Cannot recommend an item that has not been
previously rated
§ New items, Esoteric items
¡ - Popularity bias:
§ Cannot recommend items to someone with unique taste
§ Tends to recommend popular items
¡ Implement two or more different
recommenders and combine predictions
§ Perhaps using a linear model

¡ Add content-based methods to


collaborative filtering
§ Item profiles for new item problem
§ Demographics to deal with new user problem



- Evaluation
- Error metrics
- Complexity / Speed



movies

1 3 4
3 5 5
4 5 5
3
users
3
2 2 2
5
2 1 1
3 3
1



movies

1 3 4
3 5 5
4 5 5
3
users
3
2 ? ?
Test Data Set
?
2 1 ?
3 ?
1



¡ Compare predictions with known ratings
§ Root-mean-square error (RMSE):
  RMSE = √( (1/N) Σxi (rxi − r*xi)² )  where rxi is the predicted and r*xi the true rating of x on i, and N is the number of test ratings
§ Precision at top 10:
§ % of relevant items among the top 10
§ Rank correlation:
§ Spearman’s correlation between the system’s and the user’s complete rankings

¡ Another approach: 0/1 model
§ Coverage:
§ Number of items/users for which the system can make predictions
§ Precision:
§ Accuracy of predictions
§ Receiver operating characteristic (ROC AUC):
§ Tradeoff curve between false positives and false negatives
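The RMSE criterion is a one-liner worth spelling out, since it is also the evaluation metric of the Netflix Prize discussed below:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over a test set:
    sqrt of the mean of (rxi - r*xi)^2."""
    errs = [(p, a) for p, a in zip(predicted, actual)]
    return math.sqrt(sum((p - a) ** 2 for p, a in errs) / len(errs))

rmse([3.0, 4.0], [3.0, 2.0])  # sqrt((0 + 4)/2) = sqrt(2) ~ 1.414
```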


¡ Narrow focus on accuracy sometimes
misses the point
§ Prediction Diversity
§ Prediction Context
§ Order of predictions
¡ In practice, we only care about predicting high ratings:
§ RMSE might penalize a method that does well for high ratings and badly for others


¡ Expensive step is finding k most similar
customers: O(|X|)
¡ Too expensive to do at runtime
§ Could pre-compute
¡ Naïve pre-computation takes time O(k ·|X|)
§ X … set of customers
¡ We already know how to do this!
§ Near-neighbor search in high dimensions (LSH)
§ Clustering
§ Dimensionality reduction
¡ Leverage all the data
§ Don’t try to reduce data size in an
effort to make fancy algorithms work
§ Simple methods on large data do best

¡ Add more data


§ e.g., add IMDB data on genres

¡ More data beats better algorithms


http://anand.typepad.com/datawocky/2008/03/more-data-usual.html



¡ Training data
§ 100 million ratings, 480,000 users, 17,770 movies
§ 6 years of data: 2000-2005
¡ Test data
§ Last few ratings of each user (2.8 million)
§ Evaluation criterion: root mean squared error
(RMSE)
§ Netflix Cinematch RMSE: 0.9514
¡ Competition
§ 2700+ teams
§ $1 million prize for 10% improvement on Cinematch
¡ Next topic: Recommendations via
Latent Factor models
[Figure: “Overview of Coffee Varieties”. Coffee products plotted by Exoticness / Price against Complexity of Flavor, with regions labeled Exotic, Flavored, and Popular Roasts and Blends. The bubbles represent products sized by sales volume; products close to each other are recommended to each other.]
[Bellkor Team]

[Figure: movies placed along two latent dimensions: geared towards females vs. geared towards males, and serious vs. escapist. Examples include The Color Purple, Amadeus, Braveheart, Sense and Sensibility, Lethal Weapon, Ocean’s 11, Dave, The Lion King, Dumb and Dumber, The Princess Diaries, Independence Day, and Gus.]
Koren, Bell, Volinsky, IEEE Computer, 2009