Neural Networks For Machine Learning: Lecture 16a Learning A Joint Model of Images and Captions

The document discusses modeling the joint density of images and captions using neural networks. It proposes (1) training separate multilayer models for images and word-count vectors and (2) adding a top layer connecting the two models for further joint training, which allows each modality to improve the other's earlier layers. It also discusses using a deep Boltzmann machine instead of a deep belief net because of its symmetric connections, and justifies the pre-training method. Finally, it covers using Bayesian optimization to set neural network hyper-parameters automatically by modeling previous results, predicting new outcomes, and iteratively testing the most promising configurations.

Neural Networks for Machine Learning

Lecture 16a
Learning a joint model of images and captions

Geoffrey Hinton
Nitish Srivastava,
Kevin Swersky
Tijmen Tieleman
Abdel-rahman Mohamed
Modeling the joint density of images and captions
(Srivastava and Salakhutdinov, NIPS 2012)
•  Goal: To build a joint density model of captions and standard computer vision feature vectors extracted from real photographs.
–  This needs a lot more computation than building a joint density model of labels and digit images!
•  The recipe:
1. Train a multilayer model of images.
2. Train a separate multilayer model of word-count vectors.
3. Then add a new top layer that is connected to the top layers of both individual models (see the sketch below).
–  Use further joint training of the whole system to allow each modality to improve the earlier layers of the other modality.
Modeling the joint density of images and captions
(Srivastava and Salakhutdinov, NIPS 2012)

•  Instead of using a deep belief net, use a deep Boltzmann machine that has symmetric connections between all pairs of layers.
–  Further joint training of the whole DBM allows each modality to improve the earlier layers of the other modality.
–  That's why they used a DBM.
–  They could also have used a DBN and done generative fine-tuning with contrastive wake-sleep.
•  But how did they pre-train the hidden layers of a deep Boltzmann machine?
–  Standard pre-training leads to a composite model that is a DBN, not a DBM.
Combining three RBMs to make a DBM
•  The top and bottom RBMs must be pre-trained with the weights in one direction twice as big as in the other direction.
–  This can be justified!
•  The middle layers do geometric model averaging.

[Figure: three separately pre-trained RBMs — the bottom one (v–h1) with weights 2W1 and W1 in the two directions, the middle one (h1–h2) with 2W2 in both directions, and the top one (h2–h3) with W3 and 2W3 — combined into a single DBM with weights W1, W2, W3.]
Neural Networks for Machine Learning

Lecture 16b
Hierarchical coordinate frames

Geoffrey Hinton
with
Nitish Srivastava
Kevin Swersky
Why convolutional neural networks are doomed

•  Pooling loses the precise spatial relationships between higher-level parts such as a nose and a mouth.
–  The precise spatial relationships are needed for identity recognition.
–  Overlapping the pools helps a bit.
•  Convolutional nets that just use translations cannot extrapolate their understanding of geometric relationships to radically new viewpoints.
–  People are very good at extrapolating. After seeing a new shape once they can recognize it from a different viewpoint.
The hierarchical coordinate frame approach
•  Use a group of neurons to represent the conjunction of the shape of a feature and its pose relative to the retina.
–  The pose relative to the retina is the relationship between the coordinate frame of the retina and the intrinsic coordinate frame of the feature.
•  Recognize larger features by using the consistency of the poses of their parts.

[Figure: a face whose nose and mouth make consistent predictions for the pose of the face, versus a jumbled arrangement whose nose and mouth make inconsistent predictions for the pose of the face.]
Two layers in a hierarchy of parts
•  A higher level visual entity is present if several lower level visual entities can agree on their predictions for its pose (inverse computer graphics!); see the sketch below.

[Figure: a "face" unit with activation p_j and pose matrix T_j receives predictions from a "mouth" unit (p_i, T_i, where T_i is the pose of the mouth, i.e. its relationship to the camera) through the part–whole matrix T_ij, and from a "nose" unit (p_h, T_h) through T_hj; the face is present when T_i T_ij ≈ T_h T_hj.]
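A toy numeric check of the agreement test T_i T_ij ≈ T_h T_hj, using 2-D homogeneous transforms. The pose() helper and all numeric values are illustrative assumptions, not from the lecture.

import numpy as np

def pose(angle, tx, ty):
    """Homogeneous 2-D pose: rotate by `angle` radians, then translate by (tx, ty)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

T_face = pose(0.3, 5.0, 2.0)                 # underlying pose of the face w.r.t. the camera
T_ij = np.linalg.inv(pose(0.0, 0.0, -1.0))   # mouth-to-face relationship (lives in the weights)
T_hj = np.linalg.inv(pose(0.0, 0.0, 0.5))    # nose-to-face relationship (lives in the weights)

# Observed part poses (the activities), both generated from the same face pose.
T_i = T_face @ np.linalg.inv(T_ij)           # pose of the mouth
T_h = T_face @ np.linalg.inv(T_hj)           # pose of the nose

# Each part predicts the pose of the face; the face is "present" if they agree.
prediction_from_mouth = T_i @ T_ij
prediction_from_nose = T_h @ T_hj
print(np.allclose(prediction_from_mouth, prediction_from_nose))   # True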
A crucial property of the pose vectors

•  They allow spatial transformations to be modeled by linear operations.
–  This makes it easy to learn a hierarchy of visual entities.
–  It makes it easy to generalize across viewpoints (see the sketch below).
•  The invariant geometric properties of a shape are in the weights, not in the activities.
–  The activities are equivariant: as the pose of the object varies, the activities all vary.
–  The percept of an object changes as the viewpoint changes.
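A minimal sketch of the equivariance claim: a viewpoint change acts as a linear operation on the pose activities, while the part–whole matrix (the "weights") stays fixed. The pose() helper and all numbers are made up for illustration.

import numpy as np

def pose(angle, tx, ty):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

T_part_in_whole = pose(0.1, 0.0, -1.0)                 # invariant: stored in the weights
T_part = pose(0.4, 2.0, 3.0)                           # activity: pose of the part w.r.t. the camera
T_whole = T_part @ np.linalg.inv(T_part_in_whole)      # predicted pose of the whole

V = pose(0.7, -1.0, 4.0)                               # a viewpoint change (camera motion)
T_part_new = V @ T_part                                # the activities all change linearly...
T_whole_new = T_part_new @ np.linalg.inv(T_part_in_whole)
print(np.allclose(T_whole_new, V @ T_whole))           # ...and the prediction moves with them: True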
Evidence that our visual systems impose coordinate frames in
order to represent shapes (after Irvin Rock)

The square and the diamond are very different percepts that make different properties obvious.

[Figure: the outline of a country drawn at an unfamiliar orientation, captioned "What country is this? Hint: Sarah Palin".]
Neural Networks for Machine Learning

Lecture 16c
Bayesian optimization of neural network
hyperparameters

Geoffrey Hinton
Nitish Srivastava,
Kevin Swersky
Tijmen Tieleman
Abdel-rahman Mohamed
Let machine learning figure out the hyper-parameters!
(Snoek, Larochelle & Adams, NIPS 2012)
•  One of the commonest reasons for not using neural networks is that it requires a lot of skill to set the hyper-parameters:
–  Number of layers
–  Number of units per layer
–  Type of unit
–  Weight penalty
–  Learning rate
–  Momentum, etc. etc.
•  Naive grid search: Make a list of alternative values for each hyper-parameter and then try all possible combinations.
–  Can we do better than this?
•  Sampling random combinations: This is much better if some hyper-parameters have no effect.
–  It's a big waste to exactly repeat the settings of the other hyper-parameters (see the sketch below).
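A rough sketch contrasting the two strategies; the hyper-parameter names, ranges, and the train_and_score placeholder are assumptions for illustration, not the setup used in the paper.

import itertools
import random

def train_and_score(config):
    # Placeholder for the expensive part: train a network with `config`
    # and return its validation accuracy.
    return random.random()

grid = {
    "num_layers": [2, 3, 4],
    "units_per_layer": [256, 512, 1024],
    "learning_rate": [0.1, 0.01, 0.001],
    "momentum": [0.5, 0.9],
}

# Naive grid search: try every combination (3 * 3 * 3 * 2 = 54 runs here).
grid_results = [(dict(zip(grid, combo)), train_and_score(dict(zip(grid, combo))))
                for combo in itertools.product(*grid.values())]

# Random combinations: sample each hyper-parameter independently. If one of them
# turns out to have no effect, we still get many distinct settings of the others
# instead of exact repeats.
def sample_config():
    return {
        "num_layers": random.choice([2, 3, 4]),
        "units_per_layer": random.choice([256, 512, 1024]),
        "learning_rate": 10 ** random.uniform(-3, -1),   # log-uniform
        "momentum": random.uniform(0.5, 0.99),
    }

random_results = [(cfg, train_and_score(cfg)) for cfg in (sample_config() for _ in range(20))]
print(max(random_results, key=lambda pair: pair[1])[0])   # best random setting found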
Machine learning to the rescue

•  Instead of using random combinations of values for the hyper-parameters, why not look at the results so far?
–  Predict regions of the hyper-parameter space that might give better results.
–  We need to predict how well a new combination will do and also model the uncertainty of that prediction.
•  We assume that the amount of computation involved in evaluating one setting of the hyper-parameters is huge.
–  Much more than the work involved in building a model that predicts the result from knowing previous results with different settings of the hyper-parameters.
Gaussian Process models
•  These models assume that similar inputs give similar outputs.
–  This is a very weak but very sensible prior for the effects of hyper-parameters.
•  For each input dimension, they learn the appropriate scale for measuring similarity.
–  Is 200 similar to 300?
–  Look to see if they give similar results in the data so far.
•  GP models do more than just predicting a single value.
–  They predict a Gaussian distribution of values.
•  For test cases that are close to several consistent training cases, the predictions are fairly sharp.
•  For test cases far from any training cases, the predictions have high variance (see the sketch below).
A sensible way to decide what to try
•  Keep track of the best setting so far.
•  After each experiment this might stay the same, or it might improve if the latest result is the best.
•  Pick a setting of the hyper-parameters such that the expected improvement in our best setting is big (see the sketch below).
–  Don't worry about the downside (hedge funds!)

[Figure: predicted distributions for three candidate settings A, B and C plotted against the current best value, with one candidate labelled the worst bet and another the best bet.]
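A hedged sketch of one common way to score candidates: the expected-improvement acquisition function for a quantity we want to minimize (e.g. validation error). The numbers for the three candidates are invented and only loosely mirror the A, B, C of the figure.

import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, best_so_far):
    """Expected improvement over `best_so_far` for a minimization problem,
    given a Gaussian prediction with this mean and std at each candidate."""
    std = np.maximum(std, 1e-12)                  # guard against zero variance
    z = (best_so_far - mean) / std
    return (best_so_far - mean) * norm.cdf(z) + std * norm.pdf(z)

# Predicted validation errors for three candidate settings (made-up values).
means = np.array([0.20, 0.17, 0.25])
stds = np.array([0.01, 0.05, 0.15])
best = 0.18                                       # current best validation error

ei = expected_improvement(means, stds, best)
print(ei)
print(int(np.argmax(ei)))   # the candidate tried next: a big upside can outweigh a worse mean

Note that the score only rewards the chance of beating the current best; how badly a candidate might turn out does not enter it at all, which is the "don't worry about the downside" point.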
How well does Bayesian optimization work?

•  If you have the resources to run a lot of experiments, Bayesian optimization is much better than a person at finding good combinations of hyper-parameters.
–  This is not the kind of task we are good at.
–  We cannot keep in mind the results of 50 different experiments and see what they predict.
•  It’s much less prone to doing a good job for the method we like
and a bad job for the method we are comparing with.
–  People cannot help doing this. They try much harder for their
own method because they know it ought to work better!
Neural Networks for Machine Learning

Lecture 16d
The fog of progress

Geoffrey Hinton
with
Nitish Srivastava
Kevin Swersky
Why we cannot predict the long-term future

•  Consider driving at night. The number of photons you receive from the tail-lights of the car in front falls off as 1/d².
•  Now suppose there is fog.
–  For small distances it's still 1/d².
–  But for big distances it's exp(−d), because fog absorbs a certain fraction of the photons per unit distance.
•  So the car in front becomes completely invisible at a distance at which our short-range 1/d² model predicts it will be very visible.
–  This kills people.
The effect of exponential progress

•  Over the short term, things change slowly and it's easy to predict progress.
–  We can all make quite good guesses about what will be in the iPhone 6.
•  But in the longer run our perception of the future hits a wall, just like fog.
•  So the long-term future of machine learning and neural nets is a total mystery.
–  But over the next five years, it's highly probable that big, deep neural networks will do amazing things.
