07 Recsys1

The document discusses recommendation systems and collaborative filtering. Collaborative filtering estimates a user's ratings from the ratings of similar users: it finds a set of users whose ratings are similar to the target user's, then estimates the target user's ratings for items from how those similar users rated them, harnessing the quality judgments of other users to make recommendations.

CS246: Mining Massive Datasets

Jure Leskovec, Stanford University


http://cs246.stanford.edu
¡ Customer X
§ Buys Metallica CD
§ Buys Megadeth CD
¡ Customer Y
§ Does search on Metallica
§ Recommender system suggests Megadeth from data collected about Customer X
2/13/17 Jure Leskovec, Stanford C246: Mining Massive Datasets 2
Examples:
§ Search, Recommendations
§ Items: products, web sites, blogs, news items, …


¡ Shelf space is a scarce commodity for traditional retailers
§ Also: TV networks, movie theaters, …

¡ The Web enables near-zero-cost dissemination of information about products
§ From scarcity to abundance

¡ More choice necessitates better filters
§ Recommendation engines
§ How Into Thin Air made Touching the Void a bestseller:
  http://www.wired.com/wired/archive/12.10/tail.html
Source: Chris Anderson (2004)



Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!
¡ Editorial and hand curated
§ List of favorites
§ Lists of “essential” items

¡ Simple aggregates
§ Top 10, Most Popular, Recent Uploads

¡ Tailored to individual users (today's class)
§ Amazon, Netflix, …


¡ X = set of Customers
¡ S = set of Items

¡ Utility function u: X × S → R
§ R = set of ratings
§ R is a totally ordered set
§ e.g., 0-5 stars, real number in [0,1]
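As a quick illustration of the model above, a sparse utility matrix can be stored as a nested dictionary; the users, items, and ratings here are hypothetical, and missing entries mean "unknown", not zero:

```python
# Sparse utility matrix u: X x S -> R stored as {user: {item: rating}}.
# A missing entry means "no rating yet", NOT a rating of zero.
utility = {
    "x1": {"s1": 5, "s3": 1},
    "x2": {"s1": 4, "s2": 2},
}

def u(x, s):
    """Return the known rating u(x, s), or None if unknown."""
    return utility.get(x, {}).get(s)
```

The dict-of-dicts layout is just one convenient choice for sparse data; a coordinate list or a sparse matrix library would serve equally well.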


Avatar LOTR Matrix Pirates

Alice
1 0.2
Bob
0.5 0.3
Carol
0.2 1
David
0.4



¡ (1) Gathering “known” ratings for matrix
§ How to collect the data in the utility matrix

¡ (2) Extrapolate unknown ratings from the known ones
§ Mainly interested in high unknown ratings
§ We are not interested in knowing what you don’t like but what you like

¡ (3) Evaluating extrapolation methods
§ How to measure success/performance of recommendation methods
¡ Explicit
§ Ask people to rate items
§ Doesn’t work well in practice – people
can’t be bothered
§ Crowdsourcing: Pay people to label items

¡ Implicit
§ Learn ratings from user actions
§ E.g., purchase implies high rating
§ What about low ratings?



¡ Key problem: Utility matrix U is sparse
§ Most people have not rated most items
§ Cold start:
§ New items have no ratings
§ New users have no history

¡ Three approaches to recommender systems:
§ 1) Content-based
§ 2) Collaborative (today!)
§ 3) Latent factor based
¡ Main idea: Recommend items to customer x
similar to previous items rated highly by x

Example:
¡ Movie recommendations
§ Recommend movies with same actor(s),
director, genre, …
¡ Websites, blogs, news
§ Recommend other sites with “similar” content



Plan of action: from the items the user likes, build item profiles; aggregate them into a user profile; match the user profile against item profiles to recommend new items.
¡ For each item, create an item profile

¡ Profile is a set (vector) of features
§ Movies: author, title, actor, director, …
§ Text: set of “important” words in document

¡ How to pick important features?
§ Usual heuristic from text mining is TF-IDF
  (Term Frequency × Inverse Document Frequency)
§ Term … Feature
§ Document … Item
fij = frequency of term (feature) i in doc (item) j

TFij = fij / maxk fkj
Note: we normalize TF to discount for “longer” documents

ni = number of docs that mention term i
N = total number of docs

IDFi = log2(N / ni)

TF-IDF score: wij = TFij × IDFi

Doc profile = set of words with highest TF-IDF scores, together with their scores
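A minimal sketch of the TF-IDF heuristic above, with TF normalized by the most frequent term in each document and IDF taken as log2(N / ni); this is illustrative, not a reference implementation:

```python
import math
from collections import Counter

def tfidf_profiles(docs):
    """Compute TF-IDF weights w_ij = TF_ij * IDF_i for each doc.

    TF_ij = f_ij / max_k f_kj  (normalized to discount longer docs)
    IDF_i = log2(N / n_i)      (n_i = number of docs mentioning term i)
    """
    N = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # n_i per term
    profiles = []
    for doc in docs:
        tf = Counter(doc)
        max_f = max(tf.values())
        profiles.append({t: (f / max_f) * math.log2(N / df[t])
                         for t, f in tf.items()})
    return profiles
```

A doc profile would then keep only the highest-weighted words from each dictionary. Note that a term appearing in every document gets IDF = log2(1) = 0, so it never enters a profile.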


¡ User profile possibilities:
§ Weighted average of rated item profiles
§ Variation: weight by difference from average rating for item
§ …

¡ Prediction heuristic:
§ Given user profile x and item profile i, estimate
  u(x, i) = cos(x, i) = (x · i) / (||x|| · ||i||)

Note: cosine similarity u(x, i) defined this way ranges over [-1, 1]; to map it to a [0, 1] scale one can take the angle arccos(u(x, i)) and rescale.
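The two pieces above (build a user profile, then score items by cosine) can be sketched as follows; the feature-dict representation and names are illustrative assumptions:

```python
import math

def cos_sim(x, i):
    """Prediction heuristic u(x, i) = (x . i) / (|x| |i|).
    Profiles are feature -> weight dicts (an illustrative representation)."""
    dot = sum(x[f] * i.get(f, 0.0) for f in x)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ni = math.sqrt(sum(v * v for v in i.values()))
    return dot / (nx * ni) if nx and ni else 0.0

def user_profile(item_profiles, ratings):
    """User profile = rating-weighted average of rated item profiles
    (the first option listed above)."""
    feats = {f for p in item_profiles for f in p}
    total = sum(ratings)
    return {f: sum(r * p.get(f, 0.0) for p, r in zip(item_profiles, ratings))
               / total
            for f in feats}
```

With this sketch, a user who rated an "action" item 4 and a "comedy" item 2 ends up with a profile tilted toward action, so action items score higher under cos_sim.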
¡ +: No need for data on other users
§ No cold-start or sparsity problems
¡ +: Able to recommend to users with
unique tastes
¡ +: Able to recommend new & unpopular items
§ No first-rater problem
¡ +: Able to provide explanations
§ Can provide explanations of recommended items by
listing content-features that caused an item to be
recommended
¡ –: Finding the appropriate features is hard
§ E.g., images, movies, music
¡ –: Recommendations for new users
§ How to build a user profile?
¡ –: Overspecialization
§ Never recommends items outside user’s
content profile
§ People might have multiple interests
§ Unable to exploit quality judgments of other users



Harnessing quality judgments of other users

¡ Consider user x

¡ Find set N of other users whose ratings are “similar” to x’s ratings

¡ Estimate x’s ratings based on ratings of users in N


rx = [1, _, _, 1, 3]
ry = [1, _, 2, 2, _]

¡ Let rx be the vector of user x’s ratings

¡ Jaccard similarity measure: treat rx, ry as sets of rated items:
  rx = {1, 4, 5}, ry = {1, 3, 4}
§ sim(x, y) = |rx ∩ ry| / |rx ∪ ry|
§ Problem: ignores the value of the rating

¡ Cosine similarity measure: treat rx, ry as points (missing ratings become 0):
  rx = [1, 0, 0, 1, 3], ry = [1, 0, 2, 2, 0]
§ sim(x, y) = cos(rx, ry) = (rx · ry) / (||rx|| · ||ry||)
§ Problem: treats missing ratings as “negative”

¡ Pearson correlation coefficient:
§ Sxy = items rated by both users x and y
§ sim(x, y) = Σs∈Sxy (rxs − r̄x)(rys − r̄y) / ( √(Σs∈Sxy (rxs − r̄x)²) · √(Σs∈Sxy (rys − r̄y)²) )
  where r̄x, r̄y are the average ratings of users x and y
Cosine similarity: sim(x, y) = Σi rxi · ryi / ( √(Σi r²xi) · √(Σi r²yi) )

¡ Intuitively we want: sim(A, B) > sim(A, C)

¡ Jaccard similarity: 1/5 < 2/4
¡ Cosine similarity: 0.386 > 0.322
§ Considers missing ratings as “negative”
§ Solution: subtract the (row) mean
§ Centered cosine, sim(A, B) vs. sim(A, C): 0.092 > -0.559

Notice that cosine similarity is correlation when the data is centered at 0.
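The three measures can be compared in a few lines of code. The extracted text does not show the A/B/C ratings matrix itself, so the vectors below are an assumed example chosen to be consistent with the numbers quoted above (0 encodes a missing rating):

```python
import math

def jaccard(x, y):
    """Jaccard similarity of the *sets* of rated items (0 = missing)."""
    a = {i for i, r in enumerate(x) if r}
    b = {i for i, r in enumerate(y) if r}
    return len(a & b) / len(a | b)

def cosine(x, y):
    """Cosine similarity treating the vectors as points."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

def center(x):
    """Subtract the user's mean rating from the rated entries only."""
    rated = [r for r in x if r]
    mu = sum(rated) / len(rated)
    return [r - mu if r else 0.0 for r in x]

# Hypothetical ratings for users A, B, C over 7 items (0 = missing).
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]
```

With these vectors, Jaccard ranks C closer to A than B is (2/4 vs. 1/5), plain cosine narrowly prefers B, and centered cosine separates them decisively (positive for B, strongly negative for C), which matches the intuition that A and C actually disagree on the items they share.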
From similarity metric to recommendations:
¡ Let rx be the vector of user x’s ratings
¡ Let N be the set of k users most similar to x who have rated item i
¡ Prediction for item i of user x:
§ rxi = (1/k) · Σy∈N ryi
§ rxi = Σy∈N sxy · ryi / Σy∈N sxy    (shorthand: sxy = sim(x, y))
§ Other options?
¡ Many other tricks possible…
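The similarity-weighted prediction above can be sketched as follows; the data layout and names (`ratings` as user-to-item dicts, precomputed `sims`) are illustrative assumptions:

```python
def predict_user_user(ratings, sims, x, i, k=2):
    """rxi = sum(sxy * ryi) / sum(sxy) over the k users most similar
    to x who have rated item i (the second formula above).

    ratings: {user: {item: rating}}; sims: {(x, y): similarity}.
    """
    neighbors = sorted(
        (y for y in ratings if y != x and i in ratings[y]),
        key=lambda y: sims[(x, y)], reverse=True)[:k]
    num = sum(sims[(x, y)] * ratings[y][i] for y in neighbors)
    den = sum(sims[(x, y)] for y in neighbors)
    return num / den if den else None
```

In practice the similarities would come from Pearson correlation or centered cosine as defined earlier, and the neighbor search is the expensive step addressed later in the deck.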
¡ So far: User-user collaborative filtering
¡ Another view: Item-item
§ For item i, find other similar items
§ Estimate rating for item i based
on ratings for similar items
§ Can use same similarity metrics and
prediction functions as in user-user model

rxi = Σj∈N(i;x) sij · rxj / Σj∈N(i;x) sij

sij … similarity of items i and j
rxj … rating of user x on item j
N(i;x) … items similar to i and rated by user x
users
1 2 3 4 5 6 7 8 9 10 11 12

1 1 3 5 5 4

2 5 4 4 2 1 3
movies

3 2 4 1 2 3 4 3 5

4 2 4 5 4 2

5 4 3 4 2 2 5

6 1 3 3 2 4

Blank cells are unknown ratings; filled cells are ratings between 1 and 5.


users
1 2 3 4 5 6 7 8 9 10 11 12

1 1 3 ? 5 5 4

2 5 4 4 2 1 3
movies

3 2 4 1 2 3 4 3 5

4 2 4 5 4 2

5 4 3 4 2 2 5

6 1 3 3 2 4

Goal: estimate the rating of movie 1 by user 5 (the “?” entry).


users
1 2 3 4 5 6 7 8 9 10 11 12
sim(1,m)
1 1 3 ? 5 5 4 1.00
2 5 4 4 2 1 3 -0.18
movies

3 2 4 1 2 3 4 3 5 0.41
4 2 4 5 4 2 -0.10
5 4 3 4 2 2 5 -0.31
6 1 3 3 2 4 0.59
Neighbor selection: identify movies similar to movie 1 that were rated by user 5.
Here we use Pearson correlation as similarity:
1) Subtract the mean rating mi from each movie i
   m1 = (1+3+5+5+4)/5 = 3.6
   row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between rows
users
1 2 3 4 5 6 7 8 9 10 11 12
sim(1,m)
1 1 3 ? 5 5 4 1.00
2 5 4 4 2 1 3 -0.18
movies

3 2 4 1 2 3 4 3 5 0.41
4 2 4 5 4 2 -0.10
5 4 3 4 2 2 5 -0.31
6 1 3 3 2 4 0.59

Compute similarity weights: s1,3 = 0.41, s1,6 = 0.59
users
1 2 3 4 5 6 7 8 9 10 11 12

1 1 3 2.6 5 5 4

2 5 4 4 2 1 3
movies

3 2 4 1 2 3 4 3 5

4 2 4 5 4 2

5 4 3 4 2 2 5

6 1 3 3 2 4

Predict by taking a weighted average:

rxi = Σj∈N(i;x) sij · rxj / Σj∈N(i;x) sij

r1,5 = (0.41·2 + 0.59·3) / (0.41 + 0.59) = 2.6
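The worked example above is easy to reproduce in code; movies 3 and 6 are the neighbors, with similarities 0.41 and 0.59 and user 5's ratings 2 and 3:

```python
def weighted_average(sims_and_ratings):
    """rxi = sum(sij * rxj) / sum(sij) over the neighborhood N(i; x)."""
    num = sum(s * r for s, r in sims_and_ratings)
    den = sum(s for s, _ in sims_and_ratings)
    return num / den

# r1,5 from the slide: neighbors (s1,3 = 0.41, rating 2) and
# (s1,6 = 0.59, rating 3).
r_15 = weighted_average([(0.41, 2), (0.59, 3)])  # -> 2.59, i.e. ~2.6
```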
Before:
rxi = Σj∈N(i;x) sij · rxj / Σj∈N(i;x) sij

¡ Define similarity sij of items i and j
¡ Select k nearest neighbors N(i; x)
§ Items most similar to i that were rated by user x
¡ Estimate rating rxi as the weighted average:

rxi = bxi + Σj∈N(i;x) sij · (rxj − bxj) / Σj∈N(i;x) sij

where bxi = μ + bx + bi is the baseline estimate for user x and item i:
¡ μ = overall mean movie rating
¡ bx = rating deviation of user x = (avg. rating of user x) − μ
¡ bi = rating deviation of movie i
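A sketch of the baseline-adjusted prediction above; the dict-based parameters (`b_user`, `b_item`) and the `(sij, j, rxj)` neighbor tuples are illustrative assumptions:

```python
def baseline(mu, b_user, b_item, x, i):
    """bxi = mu + bx + bi, the global-effects baseline above."""
    return mu + b_user.get(x, 0.0) + b_item.get(i, 0.0)

def predict_with_baseline(mu, b_user, b_item, x, i, neighbors):
    """rxi = bxi + sum(sij * (rxj - bxj)) / sum(sij).

    neighbors: list of (sij, j, rxj) tuples for items j in N(i; x).
    Falls back to the baseline when there are no usable neighbors.
    """
    bxi = baseline(mu, b_user, b_item, x, i)
    den = sum(s for s, _, _ in neighbors)
    if den == 0:
        return bxi
    num = sum(s * (rxj - baseline(mu, b_user, b_item, x, j))
              for s, j, rxj in neighbors)
    return bxi + num / den
```

Working on deviations from the baseline rather than raw ratings removes user and item biases (e.g. a generous rater, a universally loved movie) before the neighborhood average is taken.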
Avatar LOTR Matrix Pirates

Alice
1 0.8
Bob
0.5 0.3
Carol 0.9 1 0.8
David
1 0.4
¡ In practice, it has been observed that item-item
often works better than user-user
¡ Why? Items are simpler, users have multiple tastes
¡ + Works for any kind of item
§ No feature selection needed
¡ - Cold Start:
§ Need enough users in the system to find a match
¡ - Sparsity:
§ The user/ratings matrix is sparse
§ Hard to find users that have rated the same items
¡ - First rater:
§ Cannot recommend an item that has not been
previously rated
§ New items, Esoteric items
¡ - Popularity bias:
§ Cannot recommend items to someone with unique taste
§ Tends to recommend popular items
¡ Implement two or more different
recommenders and combine predictions
§ Perhaps using a linear model

¡ Add content-based methods to


collaborative filtering
§ Item profiles for new item problem
§ Demographics to deal with new user problem



- Evaluation
- Error metrics
- Complexity / Speed



movies

1 3 4
3 5 5
4 5 5
3
users
3
2 2 2
5
2 1 1
3 3
1



movies

1 3 4
3 5 5
4 5 5
3
users
3
2 ? ?
Test Data Set
?
2 1 ?
3 ?
1



¡ Compare predictions with known ratings
§ Root-mean-square error (RMSE):
  RMSE = √( (1/N) Σxi (rxi − r*xi)² )  where rxi is the predicted and r*xi the true rating of x on i, and N is the number of test ratings
§ Precision at top 10:
§ % of relevant items among the top 10
§ Rank correlation:
§ Spearman’s correlation between the system’s and the user’s complete rankings

¡ Another approach: 0/1 model
§ Coverage:
§ Number of items/users for which the system can make predictions
§ Precision:
§ Accuracy of predictions
§ Receiver operating characteristic (ROC AUC):
§ Tradeoff curve between false positives and false negatives
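The RMSE criterion is a one-liner worth spelling out, since it is also the evaluation metric of the Netflix Prize discussed below:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over a test set:
    sqrt of the mean of (rxi - r*xi)^2."""
    errs = [(p, a) for p, a in zip(predicted, actual)]
    return math.sqrt(sum((p - a) ** 2 for p, a in errs) / len(errs))

rmse([3.0, 4.0], [3.0, 2.0])  # sqrt((0 + 4)/2) = sqrt(2) ~ 1.414
```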


¡ Narrow focus on accuracy sometimes
misses the point
§ Prediction Diversity
§ Prediction Context
§ Order of predictions
¡ In practice, we only care about predicting high ratings:
§ RMSE might penalize a method that does well for high ratings and badly for others


¡ Expensive step is finding k most similar
customers: O(|X|)
¡ Too expensive to do at runtime
§ Could pre-compute
¡ Naïve pre-computation takes time O(k ·|X|)
§ X … set of customers
¡ We already know how to do this!
§ Near-neighbor search in high dimensions (LSH)
§ Clustering
§ Dimensionality reduction
¡ Leverage all the data
§ Don’t try to reduce data size in an
effort to make fancy algorithms work
§ Simple methods on large data do best

¡ Add more data


§ e.g., add IMDB data on genres

¡ More data beats better algorithms


http://anand.typepad.com/datawocky/2008/03/more-data-usual.html



¡ Training data
§ 100 million ratings, 480,000 users, 17,770 movies
§ 6 years of data: 2000-2005
¡ Test data
§ Last few ratings of each user (2.8 million)
§ Evaluation criterion: root mean squared error
(RMSE)
§ Netflix Cinematch RMSE: 0.9514
¡ Competition
§ 2700+ teams
§ $1 million prize for 10% improvement on Cinematch
¡ Next topic: Recommendations via
Latent Factor models
[Figure: “Overview of Coffee Varieties”. Coffee products plotted by Exoticness / Price against Complexity of Flavor, with regions labeled Exotic, Flavored, and Popular Roasts and Blends. The bubbles represent products sized by sales volume; products close to each other are recommended to each other.]
[Bellkor Team]

[Figure: movies placed along two latent dimensions: geared towards females vs. geared towards males, and serious vs. escapist. Examples include The Color Purple, Amadeus, Braveheart, Sense and Sensibility, Lethal Weapon, Ocean’s 11, Dave, The Lion King, Dumb and Dumber, The Princess Diaries, Independence Day, and Gus.]
Koren, Bell, Volinsky, IEEE Computer, 2009