
16: Recommender Systems


Recommender systems - introduction


Two motivations for talking about recommender systems
Important application of ML systems
Many technology companies find recommender systems to be absolutely key
Think about websites (Amazon, eBay, iTunes Genius)
They try to recommend new content to you based on your past purchases
Substantial part of Amazon's revenue generation
Improvement in recommender system performance can bring in more income
Kind of a funny problem
In academic machine learning, recommender systems receive relatively little attention
But in industry it's an absolutely crucial tool
Talk about the big ideas in machine learning
Not so much a technique, but an idea
As we've seen, features are really important
There's a big idea in machine learning that for some problems you can learn what a good set of features is
So rather than hand-selecting the features, we learn them
Recommender systems do this - they try to identify the crucial and relevant features

Example - predict movie ratings

You're a company who sells movies


You let users rate movies using a 1-5 star rating
To make the example nicer, allow 0-5 (makes math easier)
You have five movies
And you have four users
Admittedly, business isn't going well, but you're optimistic about the future as a result of your truly outstanding (if limited)
inventory

To introduce some notation


nu - Number of users
nm - Number of movies
r(i, j) - 1 if user j has rated movie i (i.e. bitmap)
y(i,j) - rating given by user j to movie i (defined only if r(i,j) = 1)
So for this example
nu = 4
nm = 5
Summary of scoring
Alice and Bob gave good ratings to rom coms, but low scores to action films
Carol and Dave gave good ratings to action films but low ratings to rom coms
We have the data given above
The problem is as follows
Given r(i,j) and y(i,j) - go through and try and predict missing values (?s)
Come up with a learning algorithm that can fill in these missing values
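To make the notation concrete, here is a minimal Python/NumPy sketch of a dataset with this shape. The specific ratings and the three movie titles other than "Love at Last" and "Cute Puppies of Love" are illustrative assumptions (the original table is not reproduced here), but they follow the pattern described above.

```python
import numpy as np

# Rows = movies (nm = 5), columns = users (nu = 4): Alice, Bob, Carol, Dave.
# np.nan marks a "?" (no rating). The exact numbers are illustrative, chosen to
# match the pattern above (Alice/Bob like rom coms, Carol/Dave like action).
Y = np.array([
    [5,      5,      0,      0     ],  # Love at Last
    [5,      np.nan, np.nan, 0     ],  # Romance Forever        (assumed title)
    [np.nan, 4,      0,      np.nan],  # Cute Puppies of Love
    [0,      0,      5,      4     ],  # Nonstop Car Chases     (assumed title)
    [0,      0,      5,      np.nan],  # Swords vs. Karate      (assumed title)
])

R = (~np.isnan(Y)).astype(int)   # r(i, j) = 1 if user j has rated movie i
nm, nu = Y.shape                 # nm = 5 movies, nu = 4 users
```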

Content based recommendation


Using our example above, how do we predict?
For each movie we have features which measure the degree to which each film is a
Romance (x1 )
Action (x2 )

If we have features like these, each film can be represented by a feature vector
Add an extra feature which is x0 = 1 for each film
So for each film we have a [3 x 1] vector, which for film number 1 ("Love at Last") would be

i.e. for our dataset we have


{x1 , x2 , x3 , x4 , x5 }
Where each of these is a [3x1] vector with an x0 = 1 and then a romance and an action score
To be consistent with our notation, n is going to be the number of features NOT counting the x0 term, so n = 2
We could treat each rating for each user as a separate linear regression problem
For each user j we could learn a parameter vector
Then predict that user j will rate movie i with
(θj)T xi = stars
inner product of parameter vector and features
So, lets take user 1 (Alice) and see what she makes of the modern classic Cute Puppies of Love (CPOL)
We have some parameter vector (θ1 ) associated with Alice
We'll explain later how we derive these values, but for now just take it that θ1 = [0, 5, 0]

CPOL has a feature vector (x3 ) associated with it: x3 = [1, 0.99, 0]

Our prediction will be equal to


(θ1 )T x3 = (0 * 1) + (5 * 0.99) + (0 * 0)
= 4.95
Which may seem like a reasonable value
All we're doing here is applying a linear regression method for each user
So we predict a future rating based on the user's interest in romance and action, learned from the films they have rated previously
We should also add one final piece of notation
mj - Number of movies rated by user j
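A minimal sketch of this prediction in Python, using the θ1 and x3 values assumed above:

```python
import numpy as np

theta_1 = np.array([0.0, 5.0, 0.0])   # Alice's parameter vector (bias, romance, action) - assumed values
x_3     = np.array([1.0, 0.99, 0.0])  # "Cute Puppies of Love" features (x0 = 1, romance, action)

prediction = theta_1 @ x_3            # (theta_1)^T x_3
print(prediction)                     # 4.95
```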

How do we learn (θj)

Create parameters which, when applied, give predictions as close as possible to the values seen in the data
min over θj of: (1/2mj) * Σi:r(i,j)=1 ((θj)T xi - y(i,j))^2
Sum over all values of i where r(i,j) = 1 (i.e. all the films that the user has rated)
This is just like linear regression with least-squared error
We can also add a regularization term to make our equation look as follows
min over θj of: (1/2mj) * Σi:r(i,j)=1 ((θj)T xi - y(i,j))^2 + (λ/2mj) * Σk=1..n (θkj)^2
The regularization term goes from k=1 through to n, so (θj) ends up being an n+1 dimensional vector
Don't regularize over the bias term (θ0)
If you do this you get a reasonable value
We're rushing through this a bit, but it's just a linear regression problem

To make this a little bit clearer you can get rid of the mj term (it's just a constant so shouldn't make any difference to minimization)
So to learn (θj) we minimize
(1/2) * Σi:r(i,j)=1 ((θj)T xi - y(i,j))^2 + (λ/2) * Σk=1..n (θkj)^2
But for our recommender system we want to learn parameters for all users, so we add an extra summation over j, which means we determine the minimum (θj) value for every user
J(θ1, ..., θnu) = (1/2) * Σj=1..nu Σi:r(i,j)=1 ((θj)T xi - y(i,j))^2 + (λ/2) * Σj=1..nu Σk=1..n (θkj)^2
When you minimize this as a function of each (θj) parameter vector you get the parameters for each user
So this is our optimization objective -> J(θ1, ..., θnu)
In order to do the minimization we have the following gradient descent updates
Slightly different to our previous gradient descent implementations
θkj := θkj - α * Σi:r(i,j)=1 ((θj)T xi - y(i,j)) xki                 (for k = 0)
θkj := θkj - α * (Σi:r(i,j)=1 ((θj)T xi - y(i,j)) xki + λθkj)        (for k != 0)
The term multiplied by α is the partial derivative of the cost function with respect to θkj

Difference from linear regression


No 1/m terms (got rid of the 1/m term)
Otherwise very similar
This is called the content-based approach because we assume we have features describing the content of each item, which help us identify what makes it appealing to a user
However, often such features are not available - next we discuss a non-content based approach!
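Putting the content-based objective together, here is a rough sketch of the cost and gradient (assuming NumPy matrices: Y and R as in the earlier sketch, X of shape [nm x (n+1)] with the x0 = 1 column, Theta of shape [nu x (n+1)]; the function names are made up for illustration):

```python
import numpy as np

def content_cost(Theta, X, Y, R, lam):
    """Summed squared error over all (i, j) with r(i, j) = 1, for every user,
    plus regularization on every theta_k except the bias term theta_0."""
    errors = (X @ Theta.T - np.nan_to_num(Y)) * R      # only rated entries count
    cost = 0.5 * np.sum(errors ** 2)
    cost += (lam / 2.0) * np.sum(Theta[:, 1:] ** 2)    # don't regularize theta_0
    return cost

def content_gradient(Theta, X, Y, R, lam):
    """Partial derivatives of the cost with respect to each theta_k(j)."""
    errors = (X @ Theta.T - np.nan_to_num(Y)) * R
    grad = errors.T @ X                  # one row of gradients per user
    grad[:, 1:] += lam * Theta[:, 1:]    # regularize all but the bias column
    return grad

# Gradient descent then repeatedly applies: Theta -= alpha * content_gradient(...)
```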

Collaborative filtering - overview


The collaborative filtering algorithm has a very interesting property - does feature learning
i.e. it can learn for itself what features it needs
Recall our original data set above for our five films and four raters
Here we assume someone had calculated the "romance" and "action" amounts of the films
This can be very hard to do in reality
Often want more features than just two
So - let's change the problem and pretend we have a data set where we don't know any of the features associated with the
films
Now let's make a different assumption
We've polled each user and found out how much each user likes
Romantic films
Action films
Which gives us a parameter vector θj for each user

Alice and Bob like romance but hate action


Carol and Dave like action but hate romance
If we can get these parameters from the users we can infer the missing values from our table
Lets look at "Love at Last"
Alice and Bob loved it
Carol and Dave hated it
We know from their parameter vectors that Alice and Bob love romantic films, while Carol and Dave hate them
Based on the fact that Alice and Bob liked "Love at Last" and Carol and Dave hated it, we may be able to (correctly)
conclude that "Love at Last" is a romantic film
This is a bit of a simplification in terms of the maths, but what we're really asking is
"What feature vector should x1 be so that
(θ1 )T x1 is about 5
(θ2 )T x1 is about 5
(θ3 )T x1 is about 0
(θ4 )T x1 is about 0
From this we can guess that x1 may be roughly [1, 1.0, 0.0] - i.e. a highly romantic, non-action film

Using that same approach we should then be able to determine the remaining feature vectors for the other films
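As a rough sketch of this "solve for x1" step, it can be posed as a small (unregularized) least-squares problem; the θ values below are assumptions matching the polling example (a weight of 5 on the liked genre, 0 elsewhere):

```python
import numpy as np

# Assumed user parameter vectors (bias, romance, action).
Theta = np.array([
    [0.0, 5.0, 0.0],   # Alice - loves romance
    [0.0, 5.0, 0.0],   # Bob   - loves romance
    [0.0, 0.0, 5.0],   # Carol - loves action
    [0.0, 0.0, 5.0],   # Dave  - loves action
])

y = np.array([5.0, 5.0, 0.0, 0.0])   # everyone's rating of "Love at Last"

# Find x1 minimizing sum_j ((theta_j)^T x1 - y_j)^2.
x1, *_ = np.linalg.lstsq(Theta, y, rcond=None)
print(x1)   # ~[0, 1, 0]: a pure romance film (the x0 component isn't pinned down by this data)
```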

Formalizing the collaborative filtering problem

We can more formally describe the approach as follows


Given (θ1, ..., θnu) (i.e. given the parameter vectors for each user's preferences)
We must minimize an optimization objective which tries to identify the best feature vector (xi) associated with a film
min over xi of: (1/2) * Σj:r(i,j)=1 ((θj)T xi - y(i,j))^2 + (λ/2) * Σk=1..n (xki)^2
So we're summing over all the indices j for which we have data for movie i
We're minimizing this squared error
Like before, the above equation gives us a way to learn the features for one film
We want to learn the features for all the films - so we add an additional summation over i (from 1 to nm) outside

How does this work with the previous recommendation system

Content based recommendation systems


We saw that if we have a set of features for each movie, we can learn a user's preferences (θ)
Now
If you have your users' preferences, you can in turn determine a film's features (x)
This is a bit of a chicken & egg problem
What you can do is
Randomly guess values for θ
Then use collaborative filtering to generate x
Then use content based recommendation to improve θ
Use that to improve x
And so on
This actually works
Causes your algorithm to converge on a reasonable set of parameters
This is collaborative filtering
We call it collaborative filtering because in this example the users are collaborating together to help the algorithm learn better
features and help the system and the other users

Collaborative filtering Algorithm


Here we combine the ideas from before to build a collaborative filtering algorithm
Our starting point is as follows
If we're given the film's features we can use them to work out the users' preferences

If we're given the users' preferences we can use them to work out the film's features

One thing you could do is


Randomly initialize the parameters
Go back and forth between learning θ and x
But there's a more efficient algorithm which can solve for θ and x simultaneously
Define a new optimization objective which is a function of both x and θ
J(x1, ..., xnm, θ1, ..., θnu) = (1/2) * Σ(i,j):r(i,j)=1 ((θj)T xi - y(i,j))^2 + (λ/2) * Σi=1..nm Σk=1..n (xki)^2 + (λ/2) * Σj=1..nu Σk=1..n (θkj)^2

Understanding this optimization objective


The squared error term is the same as the squared error term in the two individual objectives above
So it's summing over every movie rated by every user
Note the ":" means, "for which"
Sum over all pairs (i,j) for which r(i,j) is equal to 1
The regularization terms
Are simply added to the end from the original two optimization functions
This newly defined function has the property that
If you held x constant and only solved for θ, then you'd be solving the "Given x, learn θ" objective above
Similarly, if you held θ constant you could solve x
In order to come up with just one optimization function we treat this function as a function of both film features x and user
parameters θ
The only difference between this and the back-and-forth approach is that we minimize with respect to both x
and θ simultaneously
When we're learning the features this way
Previously had a convention that we have an x0 = 1 term
When we're using this kind of approach we have no x0 ,
So now our vectors (both x and θ) are n-dimensional (not n+1)
We do this because we are now learning all the features so if the system needs a feature always = 1 then the algorithm
can learn one
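A minimal sketch of this combined objective (note X and Theta now have n columns each, with no x0 = 1 column; variable names are assumptions):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint objective J(x(1), ..., x(nm), theta(1), ..., theta(nu)).

    X     : [nm x n] movie features (no x0 column)
    Theta : [nu x n] user parameters
    Y     : [nm x nu] ratings, with unrated entries set to 0 (masked by R anyway)
    R     : [nm x nu] indicator matrix, r(i, j) = 1 where a rating exists
    """
    errors = (X @ Theta.T - Y) * R                              # only pairs with r(i, j) = 1
    squared_error = 0.5 * np.sum(errors ** 2)
    reg = (lam / 2.0) * (np.sum(X ** 2) + np.sum(Theta ** 2))   # regularize every parameter
    return squared_error + reg
```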

Algorithm Structure

1) Initialize θ1 , ..., θnu and x1 , ..., xnm to small random values


A bit like neural networks - initialize all parameters to small random numbers
2) Minimize the cost function J(x1 , ..., xnm, θ1 , ..., θnu) using gradient descent
We find that the update rules look like this
xki := xki - α * (Σj:r(i,j)=1 ((θj)T xi - y(i,j)) θkj + λxki)
θkj := θkj - α * (Σi:r(i,j)=1 ((θj)T xi - y(i,j)) xki + λθkj)
Where the term multiplied by α in the first rule is the partial derivative of the cost function with respect to xki, and in the second rule it is the partial derivative with respect to θkj
So here we regularize EVERY parameter (there is no longer an x0 = 1 term), so there is no special case update rule
3) Having minimized the values, given a user (user j) with parameters θ and a movie (movie i) with learned features x, we
predict a star rating of (θj)T xi
This is the collaborative filtering algorithm, which should give pretty good predictions for how users like new movies
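Putting the three steps together, here is a rough sketch building on cofi_cost above (the learning rate, iteration count, and initialization scale are arbitrary illustrative choices, not values from the lecture):

```python
import numpy as np

def cofi_gradients(X, Theta, Y, R, lam):
    """Partial derivatives of the joint cost w.r.t. every x_k(i) and theta_k(j)."""
    errors = (X @ Theta.T - Y) * R
    X_grad = errors @ Theta + lam * X          # d J / d x_k(i)
    Theta_grad = errors.T @ X + lam * Theta    # d J / d theta_k(j)
    return X_grad, Theta_grad

def collaborative_filtering(Y, R, n=2, lam=1.0, alpha=0.005, iters=2000):
    nm, nu = Y.shape
    rng = np.random.default_rng(0)
    # 1) Initialize X and Theta to small random values.
    X = rng.normal(scale=0.1, size=(nm, n))
    Theta = rng.normal(scale=0.1, size=(nu, n))
    # 2) Minimize the joint cost with gradient descent.
    for _ in range(iters):
        X_grad, Theta_grad = cofi_gradients(X, Theta, Y, R, lam)
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta

# 3) Predict user j's rating of movie i as (theta_j)^T x_i:
# X, Theta = collaborative_filtering(np.nan_to_num(Y), R)
# prediction = Theta[j] @ X[i]
```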
Vectorization: Low rank matrix factorization
Having looked at the collaborative filtering algorithm, how can we improve it?
Given one product, can we determine other relevant products?
We start by working out another way of writing out our predictions
So take all ratings by all users in our example above and group into a matrix Y
5 movies
4 users
Get a [5 x 4] matrix
Given [Y], there's another way of writing out all the predicted ratings
With this matrix of predicted ratings
We determine the (i,j) entry for EVERY movie-user pair
We can define another matrix X
Just like the matrix we had for linear regression
Take all the features for each movie and stack them in rows
Think of each movie as one example
Also define a matrix Θ
Take each per user parameter vector and stack in rows
Given our new matrices X and Θ
We can compute the whole matrix of predicted ratings in a vectorized way as X * ΘT
We can give this algorithm another name - low rank matrix factorization
This comes from the fact that, in linear algebra, the matrix X * ΘT produced this way is a low rank matrix
Don't worry about what a low rank matrix is
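In code this vectorized prediction is a single matrix product (reusing X and Theta from the sketch above):

```python
import numpy as np

X, Theta = collaborative_filtering(np.nan_to_num(Y), R)   # from the earlier sketch
predictions = X @ Theta.T   # [nm x nu]; entry (i, j) = (theta_j)^T x_i = predicted rating of movie i by user j
```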

Recommending new movies to a user

Finally, having run the collaborative filtering algorithm, we can use the learned features to find related films
When you learn a set of features you don't know in advance what those features will mean - but learning them lets the algorithm identify the properties which define a film
Say we learn the following features
x1 - romance
x2 - action
x3 - comedy
x4 - ...
So we have n features all together
After you've learned the features it's often very hard to come in and attach a human-understandable meaning to them
Usually the algorithm learns features which are very meaningful for capturing what users like, even if they are hard to interpret
Say you have movie i
Find a movie j which is similar to i, which you can then recommend
Our features allow a good way to measure movie similarity
If we have two movies with feature vectors xi and xj
We want to minimize || xi - xj||
i.e. the distance between those two movies
Provides a good indicator of how similar two films are in the sense of user perception
NB - Maybe ONLY in terms of user perception
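A minimal sketch of this similarity lookup, using the learned feature matrix X from above (the function name is made up for illustration):

```python
import numpy as np

def most_similar(X, i, k=5):
    """Indices of the k movies whose learned features are closest to movie i."""
    distances = np.linalg.norm(X - X[i], axis=1)   # ||x_i - x_j|| for every movie j
    distances[i] = np.inf                          # exclude movie i itself
    return np.argsort(distances)[:k]
```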

Implementation detail: Mean Normalization


Here we have one final implementation detail which makes the algorithm work a bit better
To show why we might need mean normalization, let's consider an example where there's a user (Eve) who hasn't rated any movies

Lets see what the algorithm does for this user


Say n = 2
We now have to learn θ5 (which is an n-dimensional vector)
Looking in the first term of the optimization objective
There are no films for which r(i,j) = 1
So this term plays no role in determining θ5
So we're just minimizing the final regularization term

Of course, if the goal is to minimize this term then we get
θ5 = [0, 0]
Why - if there's no data to pull the values away from 0, this gives the minimum value
So this means we predict a rating of zero for ANY movie
Presumably Eve doesn't hate all movies...
So if we're doing this we can't recommend any movies to her either
Mean normalization should let us fix this problem

How does mean normalization work?

Group all our ratings into matrix Y as before


We now have a column of ?s which corresponds to Eve's ratings

Now we compute the average rating each movie obtained and store the averages in an nm-dimensional column vector μ

If we look at all the movie ratings in [Y] we can subtract off the mean rating
Means we normalize each film to have an average rating of 0
Now, we take the new set of ratings and use it with the collaborative filtering algorithm
Learn θj and xi from the mean normalized ratings
For our prediction of user j on movie i, predict
(θj)T xi + μi
Where θj and xi are learned from the mean normalized ratings
We have to add μi back because we subtracted it from the ratings
So for user 5 (Eve) the same argument as before applies, so θ5 = [0, 0]

So on any movie i we're going to predict


(θ5 )T xi + μi
Where (θ5 )T xi is still equal to 0
But we then add the mean (μi), which means Eve is assigned the average rating for each movie
This makes sense
If Eve hasn't rated any films, predict the average rating of the films based on everyone
This is the best we can do

As an aside - we spoke here about mean normalization for users with no ratings
If you have some movies with no ratings you can also play with versions of the algorithm where you normalize the
columns
BUT this is probably less relevant - probably shouldn't recommend an unrated movie
To summarize, this shows how you do mean normalization preprocessing to allow your system to deal with users who have
not yet made any ratings
This means that for a user we know little about, we recommend the products with the best average ratings
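A sketch of the mean normalization preprocessing and the adjusted prediction, again building on the earlier sketches (names are assumptions):

```python
import numpy as np

def normalize_ratings(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only."""
    mu = np.sum(Y * R, axis=1) / np.maximum(np.sum(R, axis=1), 1)   # nm-dimensional vector of movie means
    Y_norm = (Y - mu[:, None]) * R                                  # unrated entries stay at 0
    return Y_norm, mu

# Learn on the normalized ratings, then add the mean back when predicting:
# Y_norm, mu = normalize_ratings(np.nan_to_num(Y), R)
# X, Theta = collaborative_filtering(Y_norm, R)
# prediction = Theta[j] @ X[i] + mu[i]   # a user with no ratings is predicted each movie's mean
```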
