Coursera Machine Learning Course Week 6 - Slides


Deep Learning:

Searching for Images


Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
©2015 Emily Fox & Carlos Guestrin Machine Learning Specialization
Visual product recommender



I want to buy new shoes, but…

Too many
options online…



Text search doesn’t help…

“Dress shoes”



Visual product search demo



Features are key to
machine learning



Goal: revisit classifiers, but using more
complex, non-linear features

Input: x (sentence from review) → Classifier (MODEL) → Output: y (predicted class)



Image classification

Input: x (image pixels) → Output: y (predicted object)
Neural networks
↓
Learning *very* non-linear features



Linear classifiers
Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd

Score(x) > 0 on one side of the boundary; Score(x) < 0 on the other.
Decision boundary: w0 + w1 x1 + w2 x2 + … + wd xd = 0



Graph representation of classifier:
useful for defining neural networks
Inputs: constant 1 and x1, x2, …, xd, with edge weights w0, w1, …, wd
Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd
Output y: if Score(x) > 0, output 1; if Score(x) < 0, output 0
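The graph representation above amounts to a weighted sum followed by a threshold; a minimal Python sketch (the weights and inputs below are illustrative, not from the slides):

```python
# Linear classifier as in the graph: compute Score(x), then threshold it.
def score(x, w0, w):
    # w0 is the intercept weight on the constant input 1; w pairs with x1..xd
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def classify(x, w0, w):
    # output 1 if Score(x) > 0, else 0
    return 1 if score(x, w0, w) > 0 else 0

print(classify([2.0, -1.0], w0=-0.5, w=[1.0, 0.5]))  # score = 1.0 > 0, so prints 1
```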



What can a linear classifier represent?
x1 OR x2 and x1 AND x2: each is a one-layer graph with inputs 1, x1, x2 and output y.
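Both gates fit the single-layer form; the weights below are one standard choice (assumed here, the slide does not give them):

```python
def threshold(s):
    # output 1 if the score is positive, else 0
    return 1 if s > 0 else 0

def gate_or(x1, x2):
    # Score = -0.5 + x1 + x2: positive if at least one input is 1
    return threshold(-0.5 + x1 + x2)

def gate_and(x1, x2):
    # Score = -1.5 + x1 + x2: positive only if both inputs are 1
    return threshold(-1.5 + x1 + x2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, gate_or(a, b), gate_and(a, b))
```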



What can’t a simple linear classifier
represent?

XOR: the counterexample to everything

Need non-linear features



Solving the XOR problem:
Adding a layer
XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
With each unit thresholded to output 0 or 1:
z1 = 1[−0.5 + x1 − x2 > 0]   (x1 AND NOT x2)
z2 = 1[−0.5 − x1 + x2 > 0]   (NOT x1 AND x2)
y  = 1[−0.5 + z1 + z2 > 0]   (z1 OR z2)
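The two-layer network can be checked directly; this sketch uses the −0.5 thresholds and ±1 weights read off the slide's diagram:

```python
def threshold(s):
    # units are thresholded to 0 or 1, as on the slide
    return 1 if s > 0 else 0

def xor(x1, x2):
    z1 = threshold(-0.5 + x1 - x2)    # x1 AND NOT x2
    z2 = threshold(-0.5 - x1 + x2)    # NOT x1 AND x2
    return threshold(-0.5 + z1 + z2)  # z1 OR z2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # prints the XOR truth table: 0, 1, 1, 0
```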
A neural network
•  Layers and layers and layers of
linear models and non-linear transformations
(inputs x1, x2 → hidden units z1, z2 → output y)

•  Around for about 50 years
-  Fell into disfavor in the 90s
•  In last few years, big resurgence
-  Impressive accuracy on several benchmark problems
-  Powered by huge datasets, GPUs, & modeling/learning algorithm improvements
Application of deep learning
to computer vision



Image features
•  Features = local detectors
-  Combined to make prediction
-  (in reality, features are more low-level)

(local detectors: nose, eye, eye, mouth → Face!)
Typical local detectors look for
locally “interesting points” in image
•  Image features: collections of
locally interesting points
- Combined to build classifiers

Face!



Many hand-created features exist
for finding interest points…

• Spin Images
[Johnson & Herbert ‘99]
• Textons
[Malik et al. ‘99]
• RIFT
[Lazebnik ’04]
• GLOH
[Mikolajczyk & Schmid ‘05]
• HoG
[Dalal & Triggs ‘05]

SIFT [Lowe ‘99] • …



Standard image
classification approach

Input → Extract features (hand-created) → Use simple classifier (e.g., logistic regression, SVMs) → Face?



Many hand-created features exist for finding interest points (Spin Images, Textons, RIFT, GLOH, HoG, SIFT, …)

… but very painful to design


Deep learning:
implicitly learns features

[Figure: Layer 1 → Layer 2 → Layer 3 → Prediction, with example detectors learned and example interest points detected at each layer]
[Zeiler & Fergus ‘13]


Deep learning performance



Sample results using deep neural networks

•  German traffic sign recognition benchmark
-  99.5% accuracy (IDSIA team)
•  House number recognition
-  97.8% accuracy per character [Goodfellow et al. ’13]



ImageNet 2012 competition:
1.2M training images, 1000 categories
Top 3 teams
[Bar chart: error (best of 5 guesses), axis 0 to 0.3. SuperVision shows a huge gain over ISI and OXFORD_VGG, which exploited hand-coded features like SIFT.]

ImageNet 2012 competition:
1.2M training images, 1000 categories
Winning entry: SuperVision
8 layers, 60M parameters [Krizhevsky et al. ’12]

Achieving these amazing results required:


•  New learning algorithms
•  GPU implementation
Deep learning in computer vision



Scene parsing with deep learning

[Farabet et al. ‘13]



Retrieving similar images
Input Image Nearest neighbors
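Retrieval reduces to nearest-neighbor search in deep-feature space; a minimal sketch with made-up 2-dimensional "deep features" (a real system would extract them with a trained network):

```python
import math

def distance(u, v):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbors(query, database, k=2):
    # database: {image_name: feature_vector}; return the k closest image names
    ranked = sorted(database, key=lambda name: distance(query, database[name]))
    return ranked[:k]

# Toy deep features (hypothetical values, for illustration only)
db = {"boot": [0.9, 0.1], "sneaker": [0.8, 0.2], "sandal": [0.1, 0.9]}
print(nearest_neighbors([0.88, 0.12], db, k=2))  # → ['boot', 'sneaker']
```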



Challenges of deep learning



Deep learning score card
Pros
•  Enables learning of features rather than hand tuning
•  Impressive performance gains
-  Computer vision
-  Speech recognition
-  Some text analysis
•  Potential for more impact



Deep learning workflow
Lots of labeled data → split into training set and validation set
Training set → learn deep neural net
Validation set → validate
Adjust parameters, network architecture, … and repeat


Many tricks needed to work well…
Different types of layers, connections,…
needed for high accuracy

[Krizhevsky et al. ’12]



Deep learning score card
Pros
•  Enables learning of features rather than hand tuning
•  Impressive performance gains
-  Computer vision
-  Speech recognition
-  Some text analysis
•  Potential for more impact
Cons
•  Requires a lot of data for high accuracy
•  Computationally really expensive
•  Extremely hard to tune
-  Choice of architecture
-  Parameter types
-  Hyperparameters
-  Learning algorithm
-  …
Computational cost + so many choices = incredibly hard to tune
Deep features:

Deep learning
+
Transfer learning



Standard image
classification approach

Input → Extract features (hand-created) → Use simple classifier (e.g., logistic regression, SVMs) → Face?

Can we learn features from data, even when we don’t have data or time?
Transfer learning: Use data from
one task to help learn on another
Old idea, explored for deep learning by Donahue et al. ’14 & others

Lots of data: learn neural net → great accuracy on cat vs. dog

Some data: neural net as feature extractor + simple classifier → great accuracy on 101 categories
What’s learned in a neural net

Neural net trained for Task 1: cat vs. dog

Earlier layers are more generic: can be used as a feature extractor for other tasks.
Later layers are very specific to Task 1: should be ignored for other tasks.
Transfer learning in more detail…
For Task 2, predicting 101 categories,
learn only end part of neural net

Neural net trained for Task 1: cat vs. dog

Keep weights fixed! Class?

Use simple classifier
(e.g., logistic regression, SVMs, nearest neighbor, …)
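The recipe above can be sketched end to end; `extract_features` below is a hypothetical stand-in for the frozen early layers of a net trained on Task 1, and the simple classifier on top is a nearest-centroid rule:

```python
# Transfer learning sketch: frozen "neural net" features + a simple classifier.
# extract_features is a made-up stand-in; a real system would use the
# intermediate-layer activations of a pretrained network.

def extract_features(image):
    # Stand-in: map an image (list of pixel values) to 2 fixed features
    return [sum(image) / len(image), max(image) - min(image)]

def train_nearest_centroid(examples):
    # Learn only the end part: one centroid per class over the fixed features
    centroids = {}
    for label, images in examples.items():
        feats = [extract_features(img) for img in images]
        centroids[label] = [sum(col) / len(col) for col in zip(*feats)]
    return centroids

def predict(centroids, image):
    # Classify by the closest class centroid in feature space
    f = extract_features(image)
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centroids[c])))

data = {"bright": [[0.9, 0.8, 1.0]], "dark": [[0.1, 0.0, 0.2]]}
model = train_nearest_centroid(data)
print(predict(model, [0.85, 0.9, 0.95]))  # → bright
```

The key point matches the slide: the "neural net" weights never change; only the tiny classifier on top is trained on the new task's data.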

Earlier layers are more generic: can be used as a feature extractor for other tasks.
Later layers are very specific to Task 1: should be ignored for other tasks.
Careful where you cut:
latter layers may be too task specific
Use these (earlier layers)! Later layers may be too specific for the new task.

[Figure: Layer 1 → Layer 2 → Layer 3 → Prediction, with example detectors learned and example interest points detected at each layer]
[Zeiler & Fergus ‘13]


Transfer learning with deep features workflow
Some labeled data → extract features with neural net trained on a different task
→ split into training set and validation set
Training set → learn simple classifier
Validation set → validate
How general are deep features?



Summary of deep learning



What you can do now…
•  Describe multi-layer neural network models
•  Interpret the role of features as local detectors
in computer vision
•  Relate neural networks to hand-crafted image
features
•  Describe some settings where deep learning
achieves significant performance boosts
•  State the pros & cons of deep learning models
•  Apply the notion of transfer learning
•  Use neural network models trained in one
domain as features for building a model in
another domain
•  Build an image retrieval tool using deep features

