Face Recognition Using Facenet
FaceNet
What is Face Recognition?
1. Feature Extraction
2. Training
3. Evaluation
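To make the pipeline concrete, here is a minimal verification sketch in Python. The embed() function is a hypothetical stand-in for a trained FaceNet model, and the threshold value is purely illustrative; the idea, as in FaceNet, is that two faces belong to the same person when the squared L2 distance between their embeddings falls below a tuned threshold.

```python
import numpy as np

def embed(face_image):
    """Hypothetical stand-in for a trained FaceNet model: maps an
    aligned face image to a 128-dimensional L2-normalized vector."""
    raise NotImplementedError("replace with a real trained model")

def same_person(face_a, face_b, threshold=1.1):
    # Two faces match if their embeddings are close enough; the
    # threshold here is illustrative and would be tuned on a
    # validation set.
    d = np.sum((embed(face_a) - embed(face_b)) ** 2)
    return d <= threshold
```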
Feature Extraction
Traditional Approach to Image Classification
Input Image → Hand-Extracted Features → Classifier → Object Label
Issues
CNNs
[Diagram: a CNN maps the input image directly to an output label.]
Neural Network??
• It is biologically inspired
• A mental model for interpreting the math
http://cs231n.stanford.edu/index.html
1-Layer Neural Network
[Diagram: a single neuron. Inputs 1, x1, x2, ..., xm are multiplied by weights w0, w1, ..., wm, summed (Σ), and passed through an activation function to produce the output.]

Σ_{i=0}^{m} w_i x_i = w_0x_0 + w_1x_1 + w_2x_2 + ... + w_mx_m, with x_0 = 1 as the constant bias input.
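As a sketch, the same computation in numpy (the choice of tanh as the activation is arbitrary):

```python
import numpy as np

def neuron(x, w, activation=np.tanh):
    """One neuron: a weighted sum of the inputs plus a bias, passed
    through an activation function. w[0] is the bias weight w0,
    attached to the constant input x0 = 1."""
    z = w[0] + np.dot(w[1:], x)
    return activation(z)

y = neuron(x=np.array([0.5, -1.2, 3.0]),
           w=np.array([0.1, 0.4, -0.3, 0.2]))
```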
Training: Updating Weights
[Diagram: the same single neuron, now with the error computed as Error = Output − Target and fed back to update the weights w0, ..., w4.]
Backpropagation
• The error propagates backward through the network, and the weights are updated via (normally stochastic) gradient descent.
• (wave hands)
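A minimal sketch of one such update for a single linear neuron; the learning rate and the squared-error loss are assumptions for illustration, not values from the slides:

```python
import numpy as np

def sgd_step(x, w, target, lr=0.01):
    """One stochastic-gradient-descent step for a linear neuron.
    With squared error E = 0.5 * (output - target)^2, the gradient
    dE/dw_i is (output - target) * x_i, so each weight moves a small
    step against its contribution to the error."""
    x = np.concatenate(([1.0], x))   # prepend the constant bias input
    error = np.dot(w, x) - target    # Error = Output - Target
    return w - lr * error * x
```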
Deep (Multi-Layer) Neural Network
CNN Layer Architecture
Input
Convolution
Nonlinearity
Pooling (optional)
Dropout (optional)
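For illustration, one such block in PyTorch; the channel counts, kernel size, and dropout rate are placeholders, not FaceNet's actual values:

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16,
              kernel_size=3, padding=1),   # convolution
    nn.ReLU(),                             # nonlinearity
    nn.MaxPool2d(kernel_size=2),           # pooling (optional)
    nn.Dropout(p=0.25),                    # dropout (optional)
)
```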
Goals
• Need to detect the same feature anywhere in an image
• Reuse the same weights over and over
• What we really want is one neuron that detects a feature that we slide over the image
Convolution
• Like sliding a matrix over the input and performing dot products
• It’s all just matrix multiplication
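A naive sketch of that sliding dot product (real frameworks lower this loop to one big matrix multiplication, e.g. via im2col):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output entry is the
    dot product of the kernel with the patch underneath it."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out
```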
Pooling
http://cs231n.github.io/neural-networks-2/
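A sketch of 2x2 max pooling, the most common variant:

```python
import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size window, halving each spatial dimension for size=2."""
    h = fmap.shape[0] - fmap.shape[0] % size   # drop ragged edges
    w = fmap.shape[1] - fmap.shape[1] % size
    tiles = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return tiles.max(axis=(1, 3))
```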
CNN Models
• Category 1
  • Zeiler & Fergus based model
  • With 1x1 convolutions added
  • NN1
• Category 2
  • GoogLeNet (Inception) based model
  • Compared with NN1: about 20x fewer parameters and up to 5x less computation
  • NN2-NN4, with input sizes 220x220, 160x160, and 96x96
  • NNS1-2: small models for mobile use
Inception Model
It is a 27-layer CNN.
Inception layer
Inception-based architecture
Training
Recap
Suppose we have a 720p image. It has 1280 x 720 = 921,600 pixels.
Doing computations with that many raw features is difficult.
So we reduce the image to a 128-dimensional vector, and from then on every computation works with just these 128 features.
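The arithmetic, spelled out:

```python
pixels = 1280 * 720             # 921,600 raw pixel features in a 720p image
embedding_dim = 128             # features after FaceNet's encoding
print(pixels // embedding_dim)  # 7200: a 7200x reduction
```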
Parameters or Weights
[Figure: a 6x6 input image, a 3x3 filter with 9 weights (w1-w9), and a 5x5 filter with 25 weights (w1-w25).]
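The counts in the figure generalize: a convolutional filter has kernel_height x kernel_width weights per input channel, per output channel. A small sketch:

```python
def conv_params(k, in_ch=1, out_ch=1, bias=False):
    """Number of weights in a k x k convolutional layer."""
    n = k * k * in_ch * out_ch
    return n + out_ch if bias else n

print(conv_params(3), conv_params(5))  # 9 and 25, as in the figure
```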
Activation Function
Suppose we have the points (0,0), (1,1), (2,2), (3,3). If I ask you for the value of y at x = 6, you extrapolate the line y = x and answer 6: weighted sums alone can only represent linear functions like this one. When we want non-linear functions, we pass each neuron's sum through something called an activation function.
Activation functions used
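The slides do not list which functions are plotted, but common choices look like this in numpy:

```python
import numpy as np

def relu(z):     # max(0, z): the default in modern CNNs
    return np.maximum(0.0, z)

def sigmoid(z):  # squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):     # squashes z into (-1, 1)
    return np.tanh(z)
```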
Forward Propagation
Encoding or Embedding
L2 Norm
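FaceNet constrains the embedding to the unit hypersphere, i.e. ||f(x)||_2 = 1. A minimal sketch of that normalization (the epsilon guard against division by zero is an implementation detail added here):

```python
import numpy as np

def l2_normalize(v, eps=1e-10):
    """Scale a vector to unit L2 norm so that ||f(x)||_2 = 1; all
    embeddings then live on a 128-d hypersphere and their distances
    are directly comparable."""
    return v / (np.linalg.norm(v) + eps)

e = l2_normalize(np.random.randn(128))
assert abs(np.linalg.norm(e) - 1.0) < 1e-6
```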
Algorithm
Here A is an anchor image, P a positive (another image of the same person), N a negative (an image of a different person), f the embedding, and c the margin:

d(A,P) = ||f(A)-f(P)||^2    d(A,N) = ||f(A)-f(N)||^2

We want ||f(A)-f(P)||^2 + c - ||f(A)-f(N)||^2 <= 0, i.e. d(A,P) + c - d(A,N) <= 0.
Triplet Loss Function
We define our loss function as follows. The constraint we want is d(A,P) + c - d(A,N) <= 0; whenever it is violated, the triplet pays the excess:

Loss(A,P,N) = max(d(A,P) + c - d(A,N), 0)
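A direct sketch of that loss for a single triplet in numpy (the default margin mirrors the 0.2 used in training below):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, c=0.2):
    """Triplet loss on embeddings of an anchor A, a positive P
    (same identity), and a negative N (different identity):
    max(d(A,P) + c - d(A,N), 0) with squared L2 distances."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(d_ap + c - d_an, 0.0)
```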
The model was trained on a CPU cluster for 1,000 to 2,000 hours.
The margin c (called alpha in the paper) was initially set to 0.2.
The decrease in the loss slows sharply after 500 hours of training, but further training still improves performance.
Results
[Result tables: image quality (NN1, hold-out DB); embedding dimensionality (NN1, hold-out DB); training data size (NN2)]
Previous works
Previously they used a bottleneck layer or
used multiple models and pca,svm
classification to implement this .