
Unsupervised Learning (I)

Dr. Mohamed Elshenawy


[email protected]
Zewail University of Science and Technology
Outline
• Word Embeddings – a Recap
• Unsupervised Learning Algorithms
• Autoencoders (Chapter 14)
• Overcomplete and Undercomplete Autoencoders
• Regularized Autoencoders
• Sparse Autoencoders
• Denoising Autoencoders (DAE)
• Contractive Autoencoders (CAE)
• Deep Autoencoders
Word Embeddings – a Recap
Timeline of the Key Models
• Key challenges
• Out-of-vocabulary (OOV) words: the datasets used for training do not include every word.
• Context-dependent word representations: understanding the context in which a word appears is necessary for downstream tasks.
• Word embeddings for different languages: a specific word embedding model has to be designed for each language.

A survey of word embeddings based on deep learning, Shirui Wang · Wenan Zhou · Chao Jiang
Classical Models – The language model

• A method of modeling (and generating) text that estimates the likelihood of a sequence of n words:

p(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} p(w_i \mid w_{i-n+1}, \ldots, w_{i-1})

• The occurrence of the i-th word depends only on the previous n - 1 words.
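A minimal counting-based bigram (n = 2) sketch of this factorization; the toy corpus, start token, and lack of smoothing are illustrative assumptions, not from the slides:

```python
from collections import Counter

def train_bigram(corpus):
    """Estimate p(w_i | w_{i-1}) by maximum-likelihood counting."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens[:-1])                    # context counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))    # (previous word, word) counts
    return lambda prev, w: bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def sequence_prob(prob, sentence):
    """p(w_1, ..., w_n) as the product of the conditional bigram probabilities."""
    tokens = ["<s>"] + sentence.split()
    p = 1.0
    for prev, w in zip(tokens[:-1], tokens[1:]):
        p *= prob(prev, w)
    return p

corpus = ["the cat sat", "the dog sat", "the cat ran"]
prob = train_bigram(corpus)
print(sequence_prob(prob, "the cat sat"))   # p(the|<s>) * p(cat|the) * p(sat|cat)
```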

A survey of word embeddings based on deep learning, Shirui Wang · Wenan Zhou · Chao Jiang
Neural network language model (NNLM)
• Proposed by Bengio et al.
• Similar to the traditional language model, NNLM uses the previous n - 1 words to predict the n-th word as its overall structure.
• The neural network learns distributed word representations (embeddings) jointly with the language-model objective.

A survey of word embeddings based on deep learning, Shirui Wang · Wenan Zhou · Chao Jiang
Word2Vec

A survey of word embeddings based on deep learning, Shirui Wang · Wenan Zhou · Chao Jiang
Glove

• Word2vec focuses only on information from a local context window, while global statistical information is not used well.
• GloVe addresses this problem using a global co-occurrence matrix.
• Each element X_ij in the matrix counts how often word w_i and word w_j co-occur within a particular context window.
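A minimal sketch of building such a co-occurrence matrix; the toy corpus and window size are illustrative assumptions (GloVe itself additionally weights counts by the inverse distance between the two words, which is omitted here):

```python
import numpy as np

def cooccurrence_matrix(corpus, window=2):
    """Count how often each pair of words co-occurs within a symmetric context window."""
    vocab = sorted({w for sent in corpus for w in sent.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        tokens = sent.split()
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    X[index[w], index[tokens[j]]] += 1   # X_ij: w_i and w_j seen together
    return X, vocab

X, vocab = cooccurrence_matrix(["the cat sat on the mat", "the dog sat on the rug"])
print(vocab)
print(X)
```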

A survey of word embeddings based on deep learning, Shirui Wang · Wenan Zhou · Chao Jiang
Fasttext
• Models such as Word2vec are simple, efficient, and can learn semantic representations of words on large datasets, but they cannot produce embeddings for out-of-vocabulary (OOV) words.
• Most existing word representation approaches assign a distinct vector to each word, treating words as atomic tokens.
• This is a limitation, especially for languages with rich sub-word-level information. fastText uses sub-word n-gram information, which captures the order relationships between characters and the internal semantics of words, to solve this problem.
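A minimal sketch of the sub-word idea: a word is represented by its character n-grams with boundary markers, so an OOV word still shares n-grams with in-vocabulary words (the 3-to-6 n-gram range follows the fastText paper; the hashing trick used in the real implementation is omitted):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, with boundary markers, plus the word itself."""
    marked = f"<{word}>"
    grams = {marked[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(marked) - n + 1)}
    grams.add(marked)          # the full word is kept as one extra "n-gram"
    return grams

# "where" and an unseen word like "whereabouts" share n-grams such as "<whe" and "here",
# so an OOV embedding can be composed as the sum of its n-gram vectors.
print(sorted(char_ngrams("where")))
```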

A survey of word embeddings based on deep learning, Shirui Wang · Wenan Zhou · Chao Jiang
ELMo (Embeddings from Language Models)
• Another challenge facing word embeddings is producing context-specific representations.
• ELMo is a deep contextualized word representation method designed to solve this problem.
• It uses a bidirectional LSTM model trained on a large corpus.
OpenAI-GPT (Generative Pre-Training)
• Unlike ELMo, GPT uses Transformer for feature extraction
BERT (Bidirectional Encoder Representations from
Transformers)
• For many downstream tasks such as
machine reading comprehension, it
is important to be able to extract
context information from both
directions at the same time.
• Uses the bi-Transformer technique
which can effectively exploit the
deep semantic information of a
sentence.
Unsupervised Learning
Algorithms
Unsupervised Learning Algorithms

• Unsupervised learning algorithms learn useful properties about the structure of a dataset. For example, they can learn the probability distribution that generated the dataset (density estimation).
• Can be used for dimensionality reduction.
• Can act as a pre-processing step before applying supervised learning techniques
(e.g. denoising).
• Can perform other tasks such as clustering.
Autoencoders (Chapter 14)
• An autoencoder is a neural network that is
trained to attempt to copy its input to its
output.
• Internally, it has a hidden layer h that
describes a code used to represent the input.
• The network may be viewed as consisting of
two parts: an encoder function h = f (x) and a
decoder that produces a reconstruction r =
g(h).
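A minimal PyTorch sketch of this encoder/decoder structure; the layer sizes and the 784-dimensional input are illustrative assumptions, not from the slides:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_code=32):
        super().__init__()
        # Encoder h = f(x): maps the input to the code layer h.
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 128), nn.ReLU(),
                                     nn.Linear(128, n_code))
        # Decoder r = g(h): produces the reconstruction from the code.
        self.decoder = nn.Sequential(nn.Linear(n_code, 128), nn.ReLU(),
                                     nn.Linear(128, n_inputs))

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h)

model = Autoencoder()
x = torch.randn(16, 784)                  # a toy batch standing in for real data
loss = nn.MSELoss()(model(x), x)          # reconstruction error between output and input
loss.backward()
```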

Goodfellow, Bengio, Courville 2016


Autoencoders
• Autoencoders are restricted in ways that allow them to copy only approximately,
and to copy only input that resembles the training data.
• Typically, we would like to prioritize learning some useful aspects of the data (e.g., if your input is noisy, you would like your autoencoder to learn how to recover the original data).
• If an autoencoder succeeds in simply learning to set g(f (x)) = x everywhere, then
it is not especially useful.
• Traditionally, autoencoders were used for dimensionality reduction or feature
learning. Recently, theoretical connections between autoencoders and latent
variable models have brought autoencoders to the forefront of generative
modeling.

Goodfellow, Bengio, Courville 2016


Autoencoders and PCA
• The simplest kind of autoencoder has one hidden layer, linear activations, and squared error loss:

L(x, \hat{x}) = \|x - \hat{x}\|^2, \qquad \hat{x} = WVx (a linear function)

• If K >= N, then we can choose W and V such that WV is the identity function.
• If K < N, the encoder V maps x to a K-dimensional space, so it is doing dimensionality reduction.
• The autoencoder should learn to choose the subspace which minimizes the squared distance from the data to the projections.
• Thus, it is equivalent to PCA, which maximizes the variance of the projections.

[Figure: a linear autoencoder with N input units x, a K-unit code layer (encoder weights V), and N output units for the reconstruction (decoder weights W).]
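A small NumPy sketch of this equivalence on toy data (the data, N = 5, and K = 2 are illustrative assumptions): PCA's rank-K reconstruction attains the minimum squared error that an optimal linear autoencoder with a K-unit code can reach.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # toy data with correlated features
Xc = X - X.mean(axis=0)                                    # center the data
K = 2

# PCA via SVD: project onto the top-K principal directions and reconstruct.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:K]                        # K x N matrix of principal directions (encoder analogue)
X_hat = (Xc @ P.T) @ P            # reconstruction from the K-dimensional subspace

# An optimal linear autoencoder (squared error, K-unit code) ends up with WV equal to this
# same rank-K projection, so its reconstruction error matches the PCA error printed below.
print("PCA reconstruction error:", np.mean((Xc - X_hat) ** 2))
```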

Goodfellow, Bengio, Courville 2016


Difference between autoencoders and PCA
• In PCA, transformations are linear.
• When the decoder is linear and the loss function (L) is the mean squared error, an
undercomplete autoencoder learns to span the same subspace as PCA.
• Autoencoders with nonlinear encoder functions f and nonlinear decoder
functions g can learn a more powerful nonlinear generalization of PCA.
• If the encoder and decoder are allowed too much capacity, the autoencoder can
learn to perform the copying task without extracting useful information about
the distribution of the data.
• If the capacity of the autoencoder is allowed to become too great, an
autoencoder can fail to learn anything useful about the dataset.
• Thus, in undercomplete autoencoders, f or g typically has low capacity.

Goodfellow, Bengio, Courville 2016


Overcomplete and
Undercomplete Autoencoders
Undercomplete Autoencoders
• One way to obtain useful features from the
autoencoder is to constrain h to have smaller
dimension than x.
• An autoencoder whose code dimension is less
than the input dimension is called
undercomplete.
• Learning an undercomplete representation
forces the autoencoder to capture the most
salient (important) features of the training
data.
• The learning process is described simply as
minimizing a loss function
L(x, g(f(x)))
• where L is a loss function penalizing g(f(x)) for being dissimilar from x, g is the decoder, and f is the encoder function.

[Figure: x → h → x̂]

Goodfellow, Bengio, Courville 2016


Overcomplete Autoencoders
• An autoencoder whose code dimension is
greater than the input dimension is called
overcomplete.
• In the case of an overcomplete autoencoder, even a
linear encoder and linear decoder may learn
to copy the input to the output without
learning anything useful about the data
distribution.
• Must be regularized

[Figure: x → h → x̂]
Regularized Autoencoders
Regularized Autoencoders
• Regularized autoencoders provide the ability to choose the code dimension and the capacity of the encoder and decoder based on the complexity of the distribution to be modeled.
• Rather than limiting the model capacity by keeping the encoder and decoder
shallow and the code size small, regularized autoencoders use a loss function that
encourages the model to have other properties besides the ability to copy its
input to its output.
• These other properties include sparsity of the representation (sparse
autoencoders), robustness to noise or to missing inputs (denoising
autoencoders), and smallness of the derivative of the representation (Contractive
autoencoders).
• A regularized autoencoder can be nonlinear and overcomplete but still learn
something useful about the data distribution even if the model capacity is great
enough to learn a trivial identity function.

Goodfellow, Bengio, Courville 2016


Sparse Autoencoders
• Typically used to learn features as a pre-processing for another task such as
classification.
• A sparse autoencoder is simply an autoencoder whose training criterion involves a
sparsity penalty Ω(ℎ) on the code layer h, in addition to the reconstruction error:
L(x, g(f(x))) + \Omega(h), \qquad \Omega(h) = \lambda \sum_i |h_i|
• We can think of the penalty Ω(h) simply as a regularizer term added to a feedforward
network whose primary task is to copy the input to the output (unsupervised learning
objective) and possibly also perform some supervised task (with a supervised learning
objective) that depends on these sparse features.
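A minimal sketch of adding such a sparsity penalty to the reconstruction loss; the layer sizes, λ, and the toy batch are illustrative assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Linear(784, 64)                  # f(x): produces the code layer h
decoder = nn.Linear(64, 784)                  # g(h): produces the reconstruction
lam = 1e-3                                    # sparsity weight λ (illustrative)

x = torch.randn(16, 784)                      # toy batch
h = encoder(x)
x_hat = decoder(h)

reconstruction = nn.MSELoss()(x_hat, x)       # L(x, g(f(x)))
sparsity = lam * h.abs().sum(dim=1).mean()    # Ω(h) = λ Σ_i |h_i|, averaged over the batch
(reconstruction + sparsity).backward()
```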
https://web.stanford.edu/class/cs294a/sparseAutoencoder_2011new.pdf

Goodfellow, Bengio, Courville 2016


Denoising Autoencoders (DAE)
• Instead of feeding the original input x, we feed a corrupted (noisy) version of it, x̃.
• A denoising autoencoder (DAE) minimizes
L(x, g(f(x̃)))
• The reconstruction x̂ is computed from the corrupted input x̃.
• The loss function compares the reconstruction x̂ with the noiseless input x.

[Figure: x → x̃ → h → x̂]
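A minimal sketch of one DAE training step with Gaussian corruption; the noise level, layer sizes, and toy batch are illustrative assumptions (any corruption process C(x̃ | x) can be substituted):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU())    # f
decoder = nn.Linear(64, 784)                               # g

x = torch.randn(16, 784)                      # clean input (toy batch)
x_tilde = x + 0.3 * torch.randn_like(x)       # corrupted input x̃ drawn from C(x̃ | x)

x_hat = decoder(encoder(x_tilde))             # reconstruction computed from the corrupted input
loss = nn.MSELoss()(x_hat, x)                 # compared against the *noiseless* x
loss.backward()
```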

Goodfellow, Bengio, Courville 2016


Denoising Autoencoders (Cont.)

• Red crosses are the training examples x.
• The black line is the low-dimensional manifold learned by the autoencoder.
• The corruption process C(x̃ | x) is illustrated by a gray circle of equiprobable corruptions.
Goodfellow, Bengio, Courville 2016
Contractive Autoencoders (CAE)
• A contractive autoencoder forces the model to learn a function that does not change much when x changes slightly:

L(x, g(f(x))) + \Omega(h, x), \qquad \Omega(h, x) = \lambda \sum_i \|\nabla_x h_i\|^2
• Features that are sensitive to small changes in the inputs are penalized.
• The name contractive arises from the way that the CAE warps space.
• Specifically, because the CAE is trained to resist perturbations of its input, it is
encouraged to map a neighborhood of input points to a smaller neighborhood of
output points.
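A minimal sketch of the contractive penalty for a single example, using autograd to get the encoder's Jacobian; the layer sizes and λ are illustrative assumptions (the original CAE computes this penalty analytically for a one-layer sigmoid encoder, which is cheaper):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(20, 8), nn.Tanh())   # f: maps x to the code h
decoder = nn.Linear(8, 20)                              # g
lam = 0.1                                               # penalty weight λ (illustrative)

x = torch.randn(20)                                     # a single example, for clarity

# Ω(h, x) = λ Σ_i ||∇_x h_i||²: squared Frobenius norm of the encoder's Jacobian at x.
J = torch.autograd.functional.jacobian(encoder, x, create_graph=True)   # shape (8, 20)
penalty = lam * (J ** 2).sum()

loss = nn.MSELoss()(decoder(encoder(x)), x) + penalty
loss.backward()
```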

Goodfellow, Bengio, Courville 2016


Deep Autoencoders
• “Reducing the Dimensionality of
Data with Neural Networks” by
G. E. Hinton* and R. R.
Salakhutdinov
Deep Autoencoders

Top to bottom:
1) Random samples from the test data set;
2) reconstructions by the 30-dimensional autoencoder;
3) reconstructions by 30-dimensional PCA.
The average squared errors are 126 and 135.
