
Autoencoders, Extensions, and Applications

Piyush Rai

IIT Kanpur



Outline

Introduction to Autoencoders
Autoencoder Variants and Extensions
Some Applications of Autoencoders

Autoencoders for Recommender Systems

Autoencoder

Similar to a standard feedforward neural network, but with a key difference:

Unsupervised: there is no “label” at the output layer; the output layer simply tries to “recreate” the input

Defined by two (possibly nonlinear) mapping functions: an encoding function f and a decoding function g
h = f(x) denotes an encoding (possibly nonlinear) of the input x
x̂ = g(h) = g(f(x)) denotes the reconstruction (or the “decoding”) of the input x
For an autoencoder, f and g are learned with the goal of minimizing the difference between x̂ and x


Autoencoder for Feature Learning

The learned code h = f(x) can be used as a new feature representation of the input x

Therefore, autoencoders can also be used for “feature learning”

Note: The number of hidden units (the size of the encoding) can also be larger than the input dimensionality

A Simple Autoencoder

Let’s assume a D × 1 input x ∈ R^D, and a single hidden layer with a K × 1 code h ∈ R^K

We can then define a simple linear autoencoder as

h = f(x) = Wx + b
x̂ = g(h) = W∗h + c

where f is defined by W ∈ R^{K×D} and b ∈ R^{K×1}, and g is defined by W∗ ∈ R^{D×K} and c ∈ R^{D×1}

Note: If we learn f, g to minimize the squared error ||x̂ − x||^2, then the linear autoencoder with
W∗ = W⊤ is optimal, and is equivalent to Principal Component Analysis (PCA)

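A minimal sketch of this linear autoencoder in PyTorch; the framework, the dimensions, and all variable names are my own illustrative choices, not from the slides:

```python
import torch
import torch.nn as nn

D, K = 784, 32                    # input and code dimensions (arbitrary illustrative choices)

encoder = nn.Linear(D, K)         # h = Wx + b,   with W of shape K × D and b of length K
decoder = nn.Linear(K, D)         # x̂ = W*h + c,  with W* of shape D × K and c of length D

x = torch.randn(1, D)             # a dummy input vector
h = encoder(x)                    # encoding
x_hat = decoder(h)                # reconstruction

loss = ((x_hat - x) ** 2).sum()   # squared reconstruction error ||x̂ − x||^2

# With “tied weights” (W* = Wᵀ), the decoder can simply reuse the encoder’s weight matrix:
x_hat_tied = torch.nn.functional.linear(h, encoder.weight.t())
```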

Autoencoder: Zooming in..

W: the K × D matrix of weights on the edges between the input and hidden layers

W_kd is the weight of the edge connecting input layer node d to hidden layer node k

W∗: the D × K matrix of weights on the edges between the hidden and output layers

W∗_dk is the weight of the edge connecting hidden layer node k to output layer node d

If W∗ = W⊤, the autoencoder architecture is said to have “tied weights”

Nonlinear Autoencoders

The hidden nodes can also be nonlinear transforms of the inputs, e.g.,
Can define h as a linear transform of x followed by a nonlinearity (e.g., sigmoid, ReLU)

h = sigmoid(Wx + b)

where the nonlinearity sigmoid(z) = 1 / (1 + exp(−z)) squashes the real-valued z to lie between 0 and 1

Most commonly used autoencoders use such nonlinear transforms

Note: If the inputs x ∈ {0, 1}^D are binary, it may be more appropriate to also define x̂ as

x̂ = sigmoid(W∗h + c)

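A possible PyTorch sketch of such a nonlinear autoencoder; the module name, dimensions, and design are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class NonlinearAutoencoder(nn.Module):
    """One-hidden-layer autoencoder with sigmoid nonlinearities (illustrative sketch)."""
    def __init__(self, D: int, K: int):
        super().__init__()
        self.enc = nn.Linear(D, K)   # W, b
        self.dec = nn.Linear(K, D)   # W*, c

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))       # h = sigmoid(Wx + b)
        x_hat = torch.sigmoid(self.dec(h))   # x̂ = sigmoid(W*h + c), suitable for binary/[0, 1] inputs
        return x_hat, h
```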

What’s Learned by an Autoencoder?

Figure (not shown): the K × D matrix W learned on digits data. Each tiny block visualizes a row of W

Thus W captures the possible “patterns” in the training data (akin to the K basis vectors in PCA)
For any input x, the encoding h tells us how much each of these K features is present in x

Training the Autoencoder

To train the autoencoder, we need to define a loss function ℓ(x̂, x)

The loss function (a function of the parameters W, b, W∗, c) can be defined in various ways
In general, it is defined in terms of the difference between x̂ and x (the reconstruction error)
For a single input x = [x_1, . . . , x_D] and its reconstruction x̂ = [x̂_1, . . . , x̂_D]:

ℓ(x̂, x) = Σ_{d=1}^D (x̂_d − x_d)^2   (squared loss; used if the inputs are real-valued)

ℓ(x̂, x) = − Σ_{d=1}^D [x_d log(x̂_d) + (1 − x_d) log(1 − x̂_d)]   (cross-entropy loss; used if the inputs are binary)

We find (W, b, W∗, c) by minimizing the reconstruction error (summed over all training data)
This can be done using backpropagation
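
A hypothetical training loop along these lines, reusing the NonlinearAutoencoder sketch from above; data_loader and all hyperparameters are assumptions, not from the slides:

```python
import torch
import torch.nn.functional as F

model = NonlinearAutoencoder(D=784, K=64)          # class sketched earlier; sizes are arbitrary
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    for x in data_loader:                          # data_loader: assumed iterable of input batches
        x_hat, h = model(x)
        # squared loss for real-valued inputs; F.binary_cross_entropy would be used for binary inputs
        loss = F.mse_loss(x_hat, x, reduction='sum')
        optimizer.zero_grad()
        loss.backward()                            # backpropagation through decoder and encoder
        optimizer.step()
```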

Undercomplete, Overcomplete, and Need for Regularization

In both cases (an undercomplete code with K < D, or an overcomplete code with K ≥ D), it is important to control the capacity of the encoder and decoder

Undercomplete: imagine K = 1 and very powerful f and g. We can achieve a very small
reconstruction error, but the learned code will not capture any interesting properties of the data
Overcomplete: imagine K ≥ D and trivial (identity) functions f and g. We can achieve even zero
reconstruction error, but the learned code will not capture any interesting properties of the data

It is therefore important to regularize the functions as well as the learned code, and not just focus
on minimizing the reconstruction error

Regularized Autoencoders

Several ways to regularize the model, e.g.,

Make the learned code sparse (Sparse Autoencoders)
Make the model robust against noisy/incomplete inputs (Denoising Autoencoders)
Make the model robust against small changes in the input (Contractive Autoencoders)

Sparse Autoencoders

Make the learned code sparse. This is done by adding a sparsity penalty on h

Loss function: ℓ(x̂, x) + Ω(h)

where Ω(h) = Σ_{k=1}^K |h_k| is the ℓ1 norm of h

The sparse autoencoder is learned by minimizing the above regularized loss function

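As a sketch, the ℓ1 penalty can be added directly to the reconstruction loss inside the training loop shown earlier; lambda_sparse is an assumed hyperparameter:

```python
lambda_sparse = 1e-3                                # assumed sparsity strength
x_hat, h = model(x)
recon = F.mse_loss(x_hat, x, reduction='sum')       # ℓ(x̂, x)
loss = recon + lambda_sparse * h.abs().sum()        # + Ω(h), the ℓ1 penalty on the code
```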

Denoising Autoencoders

First add some noise (e.g., Gaussian noise) to the original input x
Let x̃ denote the corrupted version of x
The encoder f operates on x̃, i.e., h = f(x̃)

However, we still want the reconstruction x̂ = g(f(x̃)) to be close to the original uncorrupted input x

Since the corruption is stochastic, we minimize the expected loss: E_{x̃∼p(x̃|x)}[ℓ(x̂, x)] + Ω(h)

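One possible denoising training step, assuming additive Gaussian corruption and reusing model, x, and F from the earlier sketches:

```python
sigma = 0.3                                         # assumed noise level
x_tilde = x + sigma * torch.randn_like(x)           # corrupted input x̃
x_hat, h = model(x_tilde)                           # encode/decode the corrupted input
loss = F.mse_loss(x_hat, x, reduction='sum')        # but compare against the clean input x
```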


Deep/Stacked Autoencoders

Most autoencoders can be extended to have more than one hidden layer
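
For instance, a deep (stacked) autoencoder might be sketched as follows; the layer sizes are arbitrary illustrative choices, not from the slides:

```python
import torch.nn as nn

# Several hidden layers on both the encoder and decoder sides
deep_encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64),  nn.ReLU(),
    nn.Linear(64, 16),
)
deep_decoder = nn.Sequential(
    nn.Linear(16, 64),   nn.ReLU(),
    nn.Linear(64, 256),  nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),
)
```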

Stochastic Autoencoders

Can also define the encoder and decoder using probability distributions

p_encoder(h|x)
p_decoder(x|h)

The choice of distributions depends on the type of data being modeled and on the type of encodings

This gives a probabilistic approach for designing autoencoders

The negative log-likelihood − log p_decoder(x|h) is equivalent to the reconstruction error
Can also use a prior distribution p(h) on the encodings (equivalent to a regularizer)
Such ideas have been used to design generative models based on autoencoders
The Variational Autoencoder (VAE) is a popular example of such a model
Generative models like the VAE can be used to “generate” new data from a random code h

Variational Autoencoders (VAE)

Learns a distribution (e.g., a Gaussian) over the encoding¹

Unlike a standard AE, a VAE learns to generate plausible data from random encodings

¹ http://www.birving.com/presentations/autoencoders/
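
A compact, illustrative VAE sketch in PyTorch; the architecture, sizes, and names are my assumptions, not from the slides:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal VAE sketch: Gaussian encoder q(h|x), sigmoid decoder for [0, 1] inputs."""
    def __init__(self, D=784, K=16):
        super().__init__()
        self.enc = nn.Linear(D, 128)
        self.mu = nn.Linear(128, K)
        self.logvar = nn.Linear(128, K)
        self.dec = nn.Sequential(nn.Linear(K, 128), nn.ReLU(), nn.Linear(128, D), nn.Sigmoid())

    def forward(self, x):
        e = F.relu(self.enc(x))
        mu, logvar = self.mu(e), self.logvar(e)
        h = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)        # reparameterization trick
        x_hat = self.dec(h)
        rec = F.binary_cross_entropy(x_hat, x, reduction='sum')        # reconstruction term
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q(h|x) || N(0, I)) prior term
        return x_hat, rec + kl
```

New data can then be generated by sampling a random code h ~ N(0, I) and passing it through the decoder.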
Some Applications of Autoencoders

(Unsupervised) Feature learning and Dimensionality reduction


Denoising and inpainting
Pre-training of deep neural networks

Recommender systems applications



Feature learning and Dimensionality Reduction

Example: a deep AE for low-dimensional feature learning on 784-dimensional MNIST images²

² Figure credit: Hinton and Salakhutdinov

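As a usage sketch, once such an autoencoder is trained, the encoder alone yields the low-dimensional features; deep_encoder refers to the stacked encoder sketched earlier, and mnist_images is an assumed tensor of flattened images:

```python
with torch.no_grad():                     # no gradients needed at feature-extraction time
    codes = deep_encoder(mnist_images)    # e.g., 784-dimensional images -> 16-dimensional codes
```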

Feature learning and Dimensionality Reduction

Example: Low-dim feature learning for 2000-dimensional bag-of-words documents



Denoising and Inpainting



Applications in Recommender Systems


Recommender Systems

Assume we are given a partially known N × M ratings matrix R of N users on M items (movies)

Denote by r^(u) the (partially known) M × 1 ratings vector of user u

Denote by r^(i) the (partially known) N × 1 ratings vector of item i

How can we use this data to build a recommender system?


Recommender Systems via Matrix Completion

An idea: If the predicted value of a user’s rating for a movie is high, then we should ideally
recommend this movie to the user

Thus if we can “reconstruct” the missing entries in R, we can use this approach to recommend
movies to users. An autoencoder can help us do this!



An Autoencoder based Approach

Using the rating vectors {r^(u)}_{u=1}^N of all the users, we can learn an autoencoder

Note: During backprop, only update the weights in W that are connected to the observed ratings³
Once learned, the model can predict (reconstruct) the missing ratings

³ AutoRec: Autoencoders Meet Collaborative Filtering (Sedhain et al., WWW 2015)

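A rough sketch of such a masked reconstruction loss (not the exact AutoRec implementation); it reuses the autoencoder sketch from earlier and assumes unobserved ratings are stored as zeros:

```python
# r: one user's M-dimensional rating vector, with unobserved entries stored as 0 (an assumption)
r_hat, h = model(r)                            # reconstruct the full rating vector
mask = (r > 0).float()                         # 1 for observed ratings, 0 otherwise
loss = (((r_hat - r) * mask) ** 2).sum()       # error (and hence gradients) only on observed entries
```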

Another Autoencoder based Approach

Another approach is to combine (denoising) autoencoders with a matrix factorization model⁴

Idea: The rating of user u on item i can be defined using the inner-product based similarity of
their features learned via an autoencoder: R_ui = f(h^(u)⊤ h^(i)), where f is some compatibility function

Denoting {h^(u)}_{u=1}^N = U and {h^(i)}_{i=1}^M = V, we can write R = UV⊤

⁴ Deep Collaborative Filtering via Marginalized Denoising Auto-encoder (Li et al., CIKM 2015)
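
For illustration, with an identity compatibility function f this prediction is just an inner product of the learned codes; U, V, u, and i are assumed to be already available:

```python
R_hat = U @ V.T          # full predicted rating matrix, R ≈ UVᵀ (U: N×K user codes, V: M×K item codes)
r_ui = U[u] @ V[i]       # predicted rating of user u for item i
```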
Other Approaches on Autoencoders for Recommender Systems

Several recent papers on similar autoencoder based ideas


Collaborative Denoising Auto-Encoders for Top-N Recommender Systems (Wu et al, WSDM 2016)
Collaborative Deep Learning for Recommender Systems (Wang et al, KDD 2015)

Also possible to incorporate side information about the users and/or items (Wang et al, KDD 2015)



Autoencoders: Summary

Simple and powerful for (nonlinear) feature learning

Learned features are able to capture salient properties of data


Several extensions (sparse, denoising, stochastic, etc.)
Can also be stacked to create “deep” autoencoders
Recent focus on autoencoders that are based on generative models of data
Example: Variational Autoencoders
