
CS230: Lecture 2
Practical Approaches to Deep Learning Projects

Kian Katanforoosh, Andrew Ng, Younes Bensouda Mourri


Recap



Learning Process

[Diagram: Input → Model → Output; the Output feeds a Loss, whose Gradients update the model's parameters.]

Model = Architecture + Parameters

Things that can change:
- Activation function
- Optimizer
- Hyperparameters
- …
Logistic Regression as a Neural Network

[Diagram: "image2vector" flattens a cat image into pixel values x^(i) = (255, 231, …, 94, 142)ᵀ, each divided by 255; a single unit computes σ(wᵀx^(i) + b) = 0.73; since 0.73 > 0.5, the output is "it's a cat".]


Multi-class

[Diagram: the same flattened pixel vector x^(i) feeds three independent logistic units:
σ(wᵀx^(i) + b) = 0.12 < 0.5 → Dog? No
σ(wᵀx^(i) + b) = 0.73 > 0.5 → Cat? Yes
σ(wᵀx^(i) + b) = 0.04 < 0.5 → Giraffe? No]


Neural Network (Multi-class)

[Diagram: the flattened pixel vector x^(i) feeds a layer of three neurons, each computing σ(wᵀx^(i) + b) with its own weights w and bias b, one per class.]


Neural Network (1 hidden layer)

[Diagram: the flattened pixel vector x^(i) feeds a hidden layer of three units a_1^[1], a_2^[1], a_3^[1], followed by an output layer unit a_1^[2] = 0.73; since 0.73 > 0.5, the output is "Cat".]


Deeper network: Encoding

[Diagram: inputs x_1^(i), …, x_4^(i) feed a hidden layer a_1^[1], …, a_4^[1], then a smaller hidden layer a_1^[2], …, a_3^[2], then an output layer a_1^[3]; each layer compresses the previous representation.]

Technique called "encoding"
Summary of learnings: Introduction

• A model is defined by its architecture and its parameters.

• The labelling strategy matters for successfully training your models. For example, if you're training a 3-class (dog, cat, giraffe) classifier under the constraint of one animal per picture, you might use one-hot vectors to label your data.

• We introduced a set of notations to differentiate indices for neurons, layers and examples.

• In deep learning, feature learning replaces feature engineering.


Let’s build intuition on concrete applications



Today's outline

We will learn tips and tricks to:
- Analyze a problem from a deep learning approach
- Choose an architecture
- Choose a loss and a training strategy

I. Day'n'Night classification
II. Face verification and recognition
III. Neural style transfer (Art generation)
IV. Trigger-word detection


Day'n'Night classification

Goal: Given an image, classify it as taken "during the day" (0) or "during the night" (1)

1. Data? 10,000 images. Split? Bias?

2. Input? Resolution? (64, 64, 3)

3. Output? y = 0 or y = 1. Last activation? sigmoid

4. Architecture? A shallow network should do the job pretty well.

5. Loss? L = −[y log(ŷ) + (1 − y) log(1 − ŷ)] (an easy warm-up)
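To make steps 3 and 5 concrete, here is a minimal NumPy sketch: a single logistic unit with a sigmoid output scored against the binary cross-entropy loss above. The input shape matches the (64, 64, 3) resolution; the random data, zero-initialized weights, and variable names are illustrative, not from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y, y_hat, eps=1e-12):
    # L = -[y log(y_hat) + (1 - y) log(1 - y_hat)]
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# One training example: a 64x64 RGB image flattened and scaled to [0, 1].
x = np.random.rand(64 * 64 * 3)
w = np.zeros_like(x)  # parameters of a single logistic unit
b = 0.0

y_hat = sigmoid(w @ x + b)   # predicted probability of "night"
print(bce_loss(1.0, y_hat))  # loss if the true label is y = 1
```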
Summary of learnings: Day'n'Night classification

• Use a known proxy project to evaluate how much data you need.

• Be scrappy. For example, if you'd like to find a good resolution of images to use for your data, but don't have time for a large-scale experiment, approximate human-level performance by testing your friends as classifiers.


Face Verification

Goal: A school wants to use face verification for validating student IDs in facilities (dining halls, gym, pool, …)

1. Data? A picture of every student, labelled with their name.

2. Input? The camera picture of the card holder (e.g. Bertrand). Resolution? (412, 412, 3)

3. Output? y = 1 (it's you) or y = 0 (it's not you)
Face Verification

Goal: A school wants to use face verification for validating student IDs in facilities (dining halls, gym, pool, …)

4. What architecture?

Simple solution: compute the distance pixel by pixel between the database image and the input image; if it is less than a threshold, then y = 1.

Issues:
- Background lighting differences
- A person can wear make-up, grow a beard…
- The ID photo can be outdated


Face Verification

Goal: A school wants to use face verification for validating student IDs in facilities (dining halls, gym, pool, …)

4. What architecture?

Our solution: encode information about a picture in a vector.

[Diagram: a deep network maps the database image and the input image to 128-d encodings, e.g. (0.931, 0.433, 0.331, …, 0.039)ᵀ and (0.922, 0.343, 0.312, …, 0.024)ᵀ; their distance is 0.4; since 0.4 < threshold, y = 1.]

We gather all students' face encodings in a database. Given a new picture, we compute its distance to the encoding of the card holder.
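A minimal sketch of this verification step, assuming a trained `encoder` that maps a face image to a 128-d vector; the encoder, the database layout, and the threshold value below are hypothetical illustrations, not the lecture's implementation.

```python
import numpy as np

def verify(encoder, input_image, database, student_id, threshold=0.7):
    """Return True (y = 1) if input_image matches the stored encoding."""
    enc_input = encoder(input_image)    # 128-d encoding of the new picture
    enc_stored = database[student_id]   # precomputed encoding of the card holder
    distance = np.linalg.norm(enc_input - enc_stored)
    return distance < threshold

# Illustrative usage with a dummy encoder and a one-student database:
dummy_encoder = lambda img: np.full(128, 0.1)
db = {"bertrand": dummy_encoder(None)}
print(verify(dummy_encoder, None, db, "bertrand"))  # True: distance is 0
```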
Face Recognition

Goal: A school wants to use face verification for validating student IDs in facilities (dining hall, gym, pool, …)

4. Loss? Training?

We need more data so that our model understands how to encode: use public face datasets.

What we really want: similar encodings for pictures of the same person, different encodings for pictures of different people.

So let's generate triplets (anchor, positive, negative):
- minimize the encoding distance between anchor and positive
- maximize the encoding distance between anchor and negative




Recap: Learning Process

[Diagram: the triplet (anchor, positive, negative) is the Input; the Model (Architecture + Parameters) produces the three encodings Enc(A), Enc(P), Enc(N) as Output; they feed the Loss, whose Gradients update the Parameters.]

L = ‖Enc(A) − Enc(P)‖₂² − ‖Enc(A) − Enc(N)‖₂²
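As a minimal NumPy sketch, the triplet loss can be written directly from the formula above. The margin `alpha` and the clipping at zero are common refinements (as in FaceNet) that the slide's formula omits; they are noted as such in the code.

```python
import numpy as np

def triplet_loss(enc_a, enc_p, enc_n, alpha=0.2):
    # L = ||Enc(A) - Enc(P)||_2^2 - ||Enc(A) - Enc(N)||_2^2 + alpha, clipped at 0.
    # The slide's version is the special case alpha = 0 without the clipping.
    d_pos = np.sum((enc_a - enc_p) ** 2)
    d_neg = np.sum((enc_a - enc_n) ** 2)
    return max(d_pos - d_neg + alpha, 0.0)

# Illustrative: anchor close to the positive, far from the negative -> loss 0.
a, p, n = np.zeros(128), np.full(128, 0.01), np.full(128, 0.5)
print(triplet_loss(a, p, n))
```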
Face Recognition

Goal: A school wants to use face identification to recognize students in facilities (dining hall, gym, pool, …)
Approach: K-Nearest Neighbors

Goal: You want to use face clustering to group pictures of the same people on your smartphone
Approach: the K-Means algorithm

Maybe we need to detect the faces first?
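As a rough sketch of the identification step above: nearest-neighbour search (the K = 1 case) over the encoding database. The encoder and database contents are hypothetical; a real system would also reject matches whose distance exceeds a threshold.

```python
import numpy as np

def identify(encoder, image, database):
    """Return the name whose stored encoding is closest to the image's encoding."""
    enc = encoder(image)
    names = list(database.keys())
    distances = [np.linalg.norm(enc - database[name]) for name in names]
    return names[int(np.argmin(distances))]
```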
Summary of learnings: Face Recognition

• In face verification, we have used an encoder network to learn a lower-dimensional representation (called an "encoding") for a set of data by training the network to focus on non-noisy signals.

• Triplet loss is a loss function where an (anchor) input is compared to a positive input and a negative input. The distance from the anchor to the positive input is minimized, whereas the distance from the anchor to the negative input is maximized.

• You learnt the difference between face verification, face identification and face clustering.


Art generation (Neural Style Transfer)

Goal: Given a picture, make it look beautiful

1. Data? Let's say we have any data

2. Input? A content image and a style image

3. Output? A generated image

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
Art generation (Neural Style Transfer)

4. Architecture? We use a pre-trained model because it extracts important information from images.

[Diagram: a deep network pretrained on ImageNet classification encodes the content image as a vector Content_C = (0.43, 0.39, …, 0.53)ᵀ; the same network processes the style image, and a Gram matrix of its activations yields Style_S = (0.13, 0.32, …, 0.92)ᵀ.]

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
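The Gram matrix that summarizes style is just the channel-by-channel correlation of a layer's activations. A minimal sketch, assuming activations of shape (channels, height, width); the shapes and random input are illustrative.

```python
import numpy as np

def gram_matrix(features):
    # features: one layer's activations, shape (channels, height, width).
    c, h, w = features.shape
    f = features.reshape(c, h * w)  # one row per channel
    return f @ f.T                  # (c, c) matrix of channel correlations

print(gram_matrix(np.random.rand(8, 4, 4)).shape)  # (8, 8)
```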
Art generation (Neural Style Transfer)

Image generation process

[Diagram: the generated image is passed through the pretrained network to obtain Content_G = (0.29, 0.31, …, 0.44)ᵀ and, via the Gram matrix, Style_G = (0.12, 0.10, …, 0.92)ᵀ; a loss compares these to Content_C = (0.43, 0.39, …, 0.53)ᵀ and Style_S = (0.13, 0.32, …, 0.92)ᵀ, and the pixels are updated using the gradients ∂L/∂x. After 2000 iterations, the generated image combines the content and the style.]

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
Art generation (Neural Style Transfer)

Which loss should we minimize?

Art generation (Neural Style Transfer)

[Same image generation diagram as above, now annotated with the loss:]

L = ‖Content_C − Content_G‖₂² + ‖Style_S − Style_G‖₂²

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
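A minimal sketch of the loss above in NumPy. In practice the Content and Style vectors would come from the pretrained network's activations, and the gradient ∂L/∂x would be obtained by backpropagating through that (frozen) network; the function below only shows the loss itself.

```python
import numpy as np

def nst_loss(content_c, content_g, style_s, style_g):
    # L = ||Content_C - Content_G||_2^2 + ||Style_S - Style_G||_2^2
    content_term = np.sum((content_c - content_g) ** 2)
    style_term = np.sum((style_s - style_g) ** 2)
    return content_term + style_term
```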
[Figure: content image]
Summary of learnings: Neural Style Transfer

• In the neural style transfer algorithm proposed by Gatys et al., you optimize image pixels rather than model parameters. Model parameters are pretrained and non-trainable.

• You leverage the "knowledge" of a pretrained model to extract the content of a content image and the style of a style image.

• The loss proposed by Gatys et al. aims to minimize the distances between the content of the generated and content images, and the style of the generated and style images.


Trigger word detection

Goal: Given a 10-second audio clip, detect the word "activate".

1. Data? A bunch of 10s audio clips. Distribution?

2. Input? x = a 10-second audio clip. Resolution? (sample rate)

3. Output? y = 0 or y = 1
Let’s have an experiment!

y=1
y=0
y=1

y = 000000000000000000000000000000000000000010000000000

y = 000000000000000000000000000000000000000000000000000

y = 000000000001000000000000000000000000000000000000000
Trigger word detection

Goal: Given a 10-second audio clip, detect the word "activate".

1. Data? A bunch of 10s audio clips. Distribution?

2. Input? x = a 10-second audio clip. Resolution? (sample rate)

3. Output? y = 0 or y = 1
           y = 00..0000100000..000 (sequential)
           y = 00..00001..1000..000 (sequential)
   Last activation? sigmoid

4. Architecture? Sounds like it should be an RNN

5. Loss? L = −(y log(ŷ) + (1 − y) log(1 − ŷ)) (applied sequentially)
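A minimal sketch of the sequential version of this loss: the binary cross-entropy is applied at every timestep of the label sequence and averaged. The sequence length, label window, and predicted probabilities below are illustrative.

```python
import numpy as np

def sequential_bce(y, y_hat, eps=1e-12):
    # Per-timestep L = -(y log(y_hat) + (1 - y) log(1 - y_hat)), then averaged.
    y_hat = np.clip(y_hat, eps, 1 - eps)
    per_step = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return per_step.mean()

# Labels: 1s for a short window right after the trigger word, 0 elsewhere.
y = np.zeros(50)
y[30:34] = 1.0
y_hat = np.full(50, 0.1)
y_hat[30:34] = 0.8
print(sequential_bce(y, y_hat))
```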
Trigger word detection

What is critical to the success of this project?

1. Strategic data collection / labelling process:
   positive word, negative words, background noise; automated labelling + error analysis.

2. Architecture search & hyperparameter tuning:
   [Diagram: two candidate architectures. (a) Fourier transform → LSTM at every timestep → σ at every timestep → 000000..000001..10000..000. (b) Fourier transform → CONV + BN → stacked GRU + BN layers → σ at every timestep → 000000..000001..10000..000.]
   Never give up.
Summary of learnings: Trigger word detection

• Your data collection strategy is critical to the success of your project. (If applicable) Don't hesitate to get out of the building.

• You can gain insights on your labelling strategy by using a human experiment.

• Refer to expert advice to save time and be guided towards a good direction.


Featured in the magazine "The Most Beautiful Loss Functions of 2015"

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi: You Only Look Once: Unified, Real-Time Object Detection
Duties for next week

For Tuesday 10/08, 8am:

C1M3
• Quiz: Shallow Neural Networks
• Programming Assignment: Planar data classification with one hidden layer

C1M4
• Quiz: Deep Neural Networks
• Programming Assignment: Building a deep neural network - Step by Step
• Programming Assignment: Deep Neural Network Application

Others:
• TA project mentorship (mandatory this week)
• Friday TA section (10/04)
• Fill in the AWS form to get GPU credits for your projects
