Lecture 11 - Unsupervised Learning (I)
• Word2vec focuses only on information obtained from a local context window,
while global statistical information is not used well.
• GloVe solves this problem using a global co-occurrence matrix.
• Each element X_ij of the matrix counts how often word w_i co-occurs with
word w_j within a particular context window.
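Below is a minimal sketch of building such a co-occurrence matrix in Python. The toy corpus, window_size, and variable names are illustrative assumptions, not the GloVe reference implementation (GloVe additionally weights counts by inverse distance within the window, omitted here):

from collections import defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the log"]
window_size = 2  # symmetric context window

# Build the vocabulary and assign each word an index.
vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}

# X[(i, j)] counts how often word j appears within window_size words of word i.
X = defaultdict(float)
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        lo = max(0, i - window_size)
        hi = min(len(words), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                X[(idx[w], idx[words[j]])] += 1.0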
• Unsupervised learning algorithms can learn useful properties about the
structure of a dataset. For example, they can learn the probability
distribution that generated the dataset (density estimation).
• Can be used for dimensionality reduction (see the sketch after this list).
• Can act as a pre-processing step before applying supervised learning techniques
(e.g. denoising).
• Can perform other tasks such as clustering.
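As a concrete illustration of the last three points, here is a short sketch of dimensionality reduction as a pre-processing step followed by clustering. The use of scikit-learn and the random toy data are assumptions for illustration:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # toy dataset: 200 samples, 50 features

# Dimensionality reduction: project onto 10 principal components.
X_reduced = PCA(n_components=10).fit_transform(X)

# Clustering in the reduced space.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_reduced)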
Autoencoders (Chapter 14)
• An autoencoder is a neural network that is
trained to attempt to copy its input to its
output.
• Internally, it has a hidden layer h that
describes a code used to represent the input.
• The network may be viewed as consisting of two parts: an encoder function
h = f(x) and a decoder that produces a reconstruction r = g(h).
[Figure: autoencoder structure, x → h → x̂]
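A minimal sketch of this encoder/decoder structure, assuming PyTorch (the lecture does not prescribe a framework; the layer sizes and 30-dimensional code are illustrative, chosen to match the figure caption at the end of this section):

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=30):
        super().__init__()
        # Encoder h = f(x): maps the input to a code h.
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        # Decoder r = g(h): reconstructs the input from the code.
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes a reconstruction loss L(x, g(f(x))), e.g. squared error.
model = Autoencoder()
x = torch.rand(16, 784)          # a toy batch of 16 inputs
loss = nn.MSELoss()(model(x), x)
loss.backward()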
Regularized Autoencoders
• Regularized autoencoders provide the ability to choose the capacity of the
encoder and decoder based on the complexity of the distribution to be modeled.
• Rather than limiting the model capacity by keeping the encoder and decoder
shallow and the code size small, regularized autoencoders use a loss function that
encourages the model to have other properties besides the ability to copy its
input to its output.
• These other properties include sparsity of the representation (sparse
autoencoders), robustness to noise or to missing inputs (denoising
autoencoders), and smallness of the derivative of the representation
(contractive autoencoders).
• A regularized autoencoder can be nonlinear and overcomplete but still learn
something useful about the data distribution even if the model capacity is great
enough to learn a trivial identity function.
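As one concrete instance of such a regularized loss, here is a minimal sketch of a sparse autoencoder objective: reconstruction error plus an L1 penalty on the code h. It reuses the Autoencoder class sketched earlier; lam is a hypothetical penalty weight, and the L1 form of the penalty is one common choice rather than the only one:

import torch

def sparse_ae_loss(model, x, lam=1e-3):
    h = model.encoder(x)                       # code h = f(x)
    x_hat = model.decoder(h)                   # reconstruction g(h)
    reconstruction = torch.mean((x_hat - x) ** 2)
    sparsity = lam * torch.mean(torch.abs(h))  # Omega(h): encourages sparse h
    return reconstruction + sparsity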
Figure (top to bottom):
1) random samples from the test data set;
2) reconstructions by the 30-dimensional autoencoder;
3) reconstructions by 30-dimensional PCA.
The average squared errors are 126 and 135, respectively.