CS230: Lecture 2 Practical Approaches To Deep Learning Projects
Input → Model → Output
Model = Architecture + Parameters
image2vector
[Figure: the input image is flattened into a column vector of pixel values, e.g. (255, 231, …, 94, 142)ᵀ; each entry is divided by 255, giving the normalized features x_1^(i), x_2^(i), …, x_{n-1}^(i), x_n^(i).]
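A minimal sketch of that flattening-and-scaling step (the 64×64×3 input size is an illustrative assumption):

```python
import numpy as np

def image2vector(image):
    """Flatten an (H, W, 3) image into a column vector and scale pixels to [0, 1]."""
    return image.reshape(-1, 1) / 255.0

# Example: a random 64x64 RGB image becomes a (12288, 1) feature vector x^(i).
x_i = image2vector(np.random.randint(0, 256, size=(64, 64, 3)))
print(x_i.shape)  # (12288, 1)
```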
image2vector
[Figure: the normalized pixel vector x^(i) feeds three separate logistic units, each computing σ(w^T x^(i) + b). Example outputs: 0.12 < 0.5 (Dog?), 0.73 > 0.5, 0.04 < 0.5 (Giraffe?).]
image2vector
[Figure: the same normalized pixel vector x^(i) feeds a full layer of neurons, each computing σ(w^T x^(i) + b).]
image2vector
[Figure: the normalized pixel vector feeds a hidden layer of units a_1^[1], a_2^[1], a_3^[1], followed by an output layer a^[2]; the output is 0.73 > 0.5, so the prediction is "Cat".]
[Figure: a deeper network: inputs x_1^(i), …, x_4^(i) → hidden layer a_1^[1], …, a_4^[1] → hidden layer a_1^[2], …, a_3^[2] → output layer a_1^[3] = ŷ.]
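A minimal NumPy sketch of the forward pass of the deeper network above (layer sizes 4 → 4 → 3 → 1 are read off the figure; using sigmoid everywhere is a simplifying assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Forward pass through two hidden layers and one output unit."""
    a1 = sigmoid(params["W1"] @ x + params["b1"])   # hidden layer a^[1]
    a2 = sigmoid(params["W2"] @ a1 + params["b2"])  # hidden layer a^[2]
    a3 = sigmoid(params["W3"] @ a2 + params["b3"])  # output a^[3] = y_hat
    return a3

rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((4, 4)) * 0.01, "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((3, 4)) * 0.01, "b2": np.zeros((3, 1)),
    "W3": rng.standard_normal((1, 3)) * 0.01, "b3": np.zeros((1, 1)),
}
y_hat = forward(rng.standard_normal((4, 1)), params)  # value in (0, 1)
```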
• The labelling strategy matters for successfully training your models. For example, if
you’re training a 3-class (dog, cat, giraffe) classifier under the constraint of one
animal per picture, you might use one-hot vectors to label your data.
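A minimal sketch of that one-hot labelling, assuming the class order dog/cat/giraffe (the ordering is an illustrative choice, not fixed by the slides):

```python
import numpy as np

# Illustrative class ordering for the 3-class (dog, cat, giraffe) example.
CLASSES = ["dog", "cat", "giraffe"]

def one_hot(label):
    """Return the one-hot label vector for a single-animal picture."""
    y = np.zeros(len(CLASSES))
    y[CLASSES.index(label)] = 1.0
    return y

print(one_hot("cat"))  # [0. 1. 0.]
```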
Goal: Given an image, classify it as taken “during the day” (0) or “during the night” (1)
• Use a known proxy project to evaluate how much data you need.
• Be scrappy. For example, if you’d like to find a good image resolution to use
for your data, but don’t have time for a large-scale experiment, approximate
human-level performance by testing your friends as classifiers.
Goal: A school wants to use Face Verification for validating student IDs in facilities
(dining halls, gym, pool …)
[Figure: example student photo (“Bertrand”). What input resolution should we use? For example, (412, 412, 3).]
Face Verification
Goal: A school wants to use Face Verification for validating student IDs in facilities
(dining halls, gym, pool …)
4. What architecture?
Simple solution: compute the distance, pixel by pixel, between the database image and
the input image; if it is less than a threshold, then y = 1.
Issues:
- Background lighting differences
- A person can wear make-up, grow a beard…
- ID photo can be outdated
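A minimal sketch of that naive baseline (the mean-absolute-difference distance and the threshold value are illustrative assumptions):

```python
import numpy as np

def naive_verify(db_image, input_image, threshold=50.0):
    """Pixel-by-pixel baseline: y = 1 if the two images are close enough.

    Assumes both images are arrays of the same shape; the threshold of 50
    (on a 0-255 pixel scale) is purely illustrative.
    """
    distance = np.mean(np.abs(db_image.astype(float) - input_image.astype(float)))
    return 1 if distance < threshold else 0
```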
Goal: A school wants to use Face Verification for validating student IDs in facilities
(dining halls, gym, pool …)
4. What architecture?
Our solution: encode information about a picture in a vector
[Figure: the database image and the input image each pass through the same Deep Network, producing 128-d encodings, e.g. (0.931, 0.433, 0.331, …, 0.942, 0.158, 0.039)ᵀ and (0.922, 0.343, 0.312, …, 0.892, 0.142, 0.024)ᵀ. The distance between the two encodings is 0.4; since 0.4 < threshold, y = 1.]
We gather all student face encodings in a database. Given a new picture, we compute the
distance between its encoding and the encoding of the card holder.
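A minimal sketch of the verification step on top of the 128-d encodings (the threshold value is an illustrative assumption):

```python
import numpy as np

def verify(encoding_db, encoding_input, threshold=0.7):
    """Compare the stored encoding of the card holder with the encoding of the
    new picture; y = 1 if they are close enough. Both encodings are assumed to
    come from the same deep network; the threshold of 0.7 is illustrative.
    """
    distance = np.linalg.norm(encoding_db - encoding_input)  # L2 distance
    return distance, int(distance < threshold)
```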
Face Recognition
Goal: A school wants to use Face Verification for validating student IDs in facilities
(dining halls, gym, pool …)
4. Loss? Training?
We need more data so that our model understands how to encode faces: use public face datasets.
What we really want: encodings of the same person to be close together, and encodings of
different people to be far apart.
So let’s generate triplets (Anchor, Positive, Negative):
[Figure: Input → Model (= Architecture + Parameters) → Output. Each image of the triplet is mapped by the model to an encoding vector, e.g. (0.13, 0.42, …)ᵀ, (0.01, 0.54, …)ᵀ, (0.95, 0.45, …)ᵀ.]
Loss:
L = || Enc(A) − Enc(P) ||_2^2 − || Enc(A) − Enc(N) ||_2^2 + α
Gradients of this loss are used to update the encoder’s parameters.
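A minimal NumPy sketch of this loss over one triplet (the margin value is illustrative; the common max(·, 0) clipping, which the slide’s formula omits, is noted in a comment):

```python
import numpy as np

def triplet_loss(enc_a, enc_p, enc_n, alpha=0.2):
    """L = ||Enc(A) - Enc(P)||_2^2 - ||Enc(A) - Enc(N)||_2^2 + alpha."""
    pos_dist = np.sum((enc_a - enc_p) ** 2)  # anchor-positive squared distance
    neg_dist = np.sum((enc_a - enc_n) ** 2)  # anchor-negative squared distance
    loss = pos_dist - neg_dist + alpha
    # In practice the loss is usually clipped at zero so that triplets that
    # already satisfy the margin contribute nothing.
    return max(loss, 0.0)
```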
Face Recognition
Goal: A school wants to use Face Identification to recognize students in facilities
(dining halls, gym, pool …)
K-Nearest Neighbors
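A minimal sketch of identification by nearest neighbors over the stored encodings (the database layout, k and the threshold are illustrative assumptions):

```python
import numpy as np
from collections import Counter

def identify(encoding, database, k=3, threshold=0.7):
    """Return the most likely student name for a query encoding, or None.

    `database` is a list of (name, 128-d encoding) pairs.
    """
    # Sort database entries by L2 distance to the query and keep the k closest.
    neighbors = sorted(
        (np.linalg.norm(enc - encoding), name) for name, enc in database
    )[:k]
    # Keep only neighbors close enough to count as matches, then majority-vote.
    names = [name for dist, name in neighbors if dist < threshold]
    return Counter(names).most_common(1)[0][0] if names else None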
Goal: You want to use Face Clustering to group pictures of the same people on your
smartphone
K-Means Algorithm
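A minimal NumPy sketch of K-Means over face encodings (in practice a library implementation such as scikit-learn’s KMeans would do; k, the iteration count and the initialization are illustrative):

```python
import numpy as np

def k_means(encodings, k, n_iters=100, seed=0):
    """Cluster an (N, 128) array of face encodings into k groups."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen encodings.
    centroids = encodings[rng.choice(len(encodings), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each encoding to its closest centroid (L2 distance).
        dists = np.linalg.norm(encodings[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned encodings.
        centroids = np.array([
            encodings[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids
```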
• You learnt the difference between face verification, face identification and
face clustering.
[Figure: a content image and a style image are combined into a generated image.]
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
Art generation (Neural Style Transfer)
4. Architecture? We use a pre-trained model because it extracts important information from images.
[Figure: the content image passes through a Deep Network pretrained for classification on ImageNet, giving a content representation Content_C = (0.43, 0.39, …, 0.53)ᵀ. The style image passes through the same pretrained network; a Gram matrix of its activations gives the style representation Style_S = (0.13, 0.32, …, 0.92)ᵀ.]
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
Art generation (Neural Style Transfer)
Image generation process
[Figure: the generated image passes through the same pretrained network, giving its content representation Content_G = (0.29, 0.31, …, 0.44)ᵀ and, via a Gram matrix, its style representation Style_G = (0.12, 0.10, …, 0.92)ᵀ; these are compared with Content_C and Style_S to compute the loss.]
• In the neural style transfer algorithm proposed by Gatys et al., you optimize
image pixels rather than model parameters. The model parameters are
pretrained and non-trainable.
• The loss proposed by Gatys et al. aims to minimize the distance between the
content representations of the generated and content images, and between the
style representations of the generated and style images.
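A minimal sketch of that loss over precomputed activations (the helper names, the single style layer and the weights alpha/beta are illustrative assumptions, not Gatys et al.’s exact formulation):

```python
import numpy as np

def gram_matrix(acts):
    """Gram matrix of a (channels, height*width) activation map: channel correlations."""
    return acts @ acts.T

def style_transfer_loss(content_c, content_g, style_acts_s, style_acts_g,
                        alpha=1.0, beta=1e3):
    """Weighted sum of a content term and a (single-layer) style term.

    content_c / content_g: activations of the content and generated images;
    style_acts_s / style_acts_g: activations of the style and generated images.
    """
    content_loss = np.sum((content_g - content_c) ** 2)
    style_loss = np.sum((gram_matrix(style_acts_g) - gram_matrix(style_acts_s)) ** 2)
    # The gradients of this loss are taken with respect to the generated
    # image's pixels; the pretrained network's parameters stay frozen.
    return alpha * content_loss + beta * style_loss
```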
3. Output? y = 0 or y = 1
Let’s have an experiment!
y=1
y=0
y=1
y = 000000000000000000000000000000000000000010000000000
y = 000000000000000000000000000000000000000000000000000
y = 000000000001000000000000000000000000000000000000000
Trigger word detection
[Figure: trigger word detection architecture: the input features are fed timestep by timestep into stacked GRU layers with batch normalization (BN); a sigmoid (σ) unit at each timestep produces the output label sequence, e.g. 000000..000001..10000..000.]
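A minimal Keras sketch in the spirit of that figure (layer sizes, the number of timesteps and the number of spectrogram features are illustrative assumptions):

```python
from tensorflow.keras import layers, models

def trigger_word_model(n_timesteps, n_features):
    """Stacked GRUs with batch norm and one sigmoid output per timestep."""
    inputs = layers.Input(shape=(n_timesteps, n_features))
    x = layers.GRU(128, return_sequences=True)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.GRU(128, return_sequences=True)(x)
    x = layers.BatchNormalization()(x)
    # One sigmoid unit per timestep -> a 0/1 label for every audio frame.
    outputs = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
    return models.Model(inputs, outputs)

model = trigger_word_model(n_timesteps=5511, n_features=101)
model.compile(optimizer="adam", loss="binary_crossentropy")
```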
• Your data collection strategy is critical to the success of your project. (If
applicable) Don’t hesitate to get out of the building.
• You can gain insight into your labelling strategy by running a human experiment.
• Ask experts for advice to save time and to be guided in a good direction.
Duties for next week
C1M3
• Quiz: Shallow Neural Networks
• Programming Assignment: Planar data classification with one-hidden layer
C1M4
• Quiz: Deep Neural Networks
• Programming Assignment: Building a deep neural network - Step by Step
• Programming Assignment: Deep Neural Network Application
Others:
• TA project mentorship (mandatory this week)
• Friday TA section (10/04)
• Fill in the AWS form to get GPU credits for your projects