Face Recognition Using Facenet

Face recognition systems identify individuals from images by extracting features from faces and comparing them to a database of known faces. This document describes FaceNet, a face recognition system that uses a deep learning model to learn features. It extracts features using a convolutional neural network, trains the network using triplets of face images, and evaluates matches using a triplet loss function and distance measurements in the embedded feature space. The FaceNet model achieves state-of-the-art accuracy on large benchmarks such as the YouTube Faces dataset.


Face Recognition using FaceNet

What is Face Recognition?

• A facial recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source.
• Have a database of n persons.
• Get an input image.
• Output the ID if the image matches any of the n persons; otherwise output "not recognized".
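The identify-or-reject step above can be sketched as a nearest-neighbor lookup over embeddings. This is an illustrative assumption, not the deck's code: `recognize`, the toy 2-D "embeddings", and the distance threshold are all hypothetical stand-ins for FaceNet's 128-dimensional vectors.

```python
import numpy as np

def recognize(query_emb, database, threshold=1.0):
    """Return the ID of the closest enrolled person, or None if no
    stored embedding is within the distance threshold ("not recognized")."""
    best_id, best_dist = None, float("inf")
    for person_id, emb in database.items():
        dist = np.sum((query_emb - emb) ** 2)  # squared L2 distance
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist <= threshold else None

# Toy database of n persons (2-D vectors instead of 128-D, for illustration)
db = {"alice": np.array([0.0, 1.0]), "bob": np.array([1.0, 0.0])}
print(recognize(np.array([0.1, 0.9]), db))   # close to alice's embedding
print(recognize(np.array([5.0, 5.0]), db))   # far from everyone: None
```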
METHODOLOGY

1. Feature Extraction
2. Training
3. Evaluation
Feature Extraction

Traditional Approach to Image Classification

Input Image → Hand-Extracted Features → Classifier → Object Label
Issues

• Who makes the features?
  – Need an expert for each problem domain
• Which features?
  – Are they the same for every problem type?
• How robust are these features to real images?
  – Translation, rotation, contrast changes, etc.

Are these pictures of the same thing?
Features Are Hierarchical

• A squirrel is a combination of fur, arms, legs, & a tail in specific proportions.
• A tail is made of texture, color, and spatial relationships.
• A texture is made of oriented edges, gradients, and colors.
Image Features

• A feature is something in the image, or derived from it, that's relevant to the task
• Edges
• Lines at different angles, curves, etc.
• Colors, or patterns of colors
• SIFT, SURF, HOG, GIST, ORB, etc.
Ideally We'd Learn Features

Input Image → CNNs → Output Label
Neural Network??

• It is biologically inspired
• A mental model for interpreting the math

http://cs231n.stanford.edu/index.html
1-Layered Neural Network

Inputs x1..xm (plus a constant input 1 with bias weight w0) are multiplied by weights w1..wm, summed (Σ), and passed through an activation function to produce the output:

Σ_{i=0}^{m} w_i x_i = w_0 x_0 + w_1 x_1 + w_2 x_2 + ... + w_m x_m,  with x_0 = 1
Training: Updating Weights

Same network as before (inputs x1..x4, weights w0..w4, sum Σ, activation function, output), but now we compute

Error = Output - Target

and use it to update the weights.
Backpropagation

• The error propagates backward, and it all works via (normally stochastic) gradient descent.
• (wave hands)
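To unpack the hand-waving slightly: for a single linear neuron with squared error, one stochastic-gradient-descent step looks like the sketch below. The learning rate, identity activation, and `sgd_step` name are assumptions for illustration.

```python
import numpy as np

def sgd_step(x, w, target, lr=0.1):
    """One SGD step for a single linear neuron with
    squared error E = 0.5 * (output - target)^2."""
    output = w[0] + np.dot(w[1:], x)           # forward pass (identity activation)
    error = output - target                    # Error = Output - Target
    grad = error * np.concatenate(([1.0], x))  # dE/dw_i = error * x_i (x0 = 1)
    return w - lr * grad                       # move against the gradient

w = np.array([0.0, 0.0])
for _ in range(100):
    w = sgd_step(np.array([1.0]), w, target=2.0)
print(w)  # w0 + w1 converges toward the target 2
```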
Deep (Multi-Layer) Neural Network
CNN Layer Architecture

Input → Convolution → Nonlinearity → Pooling (optional) → Dropout (optional)
Goals
• Need to detect the same feature
anywhere in an image
• Reuse the same weights over and over
• What we really want is one neuron that
detects a feature that we slide over the
image
Convolution

• Like sliding a matrix over the input and performing dot products
• It's all just matrix multiplication
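The slide-the-matrix-and-dot-product idea can be made concrete with an explicit loop. A minimal sketch (the "valid" padding choice and the edge-detector kernel are assumptions; like most CNN libraries, this is technically cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image
    and take a dot product at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)  # a simple intensity ramp
edge = np.array([[1.0, -1.0]])                  # horizontal difference filter
print(conv2d(img, edge))                        # constant -1 on a ramp
```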
Pooling

• A pooling layer takes the maximum of features over small blocks of a previous layer. The output tells us if a feature was present in a region of the previous layer, but not precisely where.
• Max, sum, and L2 pooling
• A type of downsampling
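Max pooling over non-overlapping blocks can be sketched as below. The 2x2 block size and the toy feature map are illustrative assumptions:

```python
import numpy as np

def max_pool(x, size=2):
    """Max pooling: the maximum over non-overlapping size x size blocks.
    Reports whether a feature fired in a region, not exactly where."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # crop to a multiple of the block size
    blocks = x.reshape(x.shape[0] // size, size,
                       x.shape[1] // size, size)
    return blocks.max(axis=(1, 3))       # max within each block

fmap = np.array([[1., 2., 0., 0.],
                 [3., 4., 0., 1.],
                 [0., 0., 5., 0.],
                 [0., 2., 0., 6.]])
print(max_pool(fmap))  # [[4. 1.] [2. 6.]]
```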
Dropout

• Randomly disable some neurons on the forward pass
• Prevents overfitting

http://cs231n.github.io/neural-networks-2/
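The random-disabling idea can be sketched as follows. This uses the common "inverted dropout" variant (an assumption; the slides don't specify the scaling), where surviving activations are scaled so the expected value is unchanged:

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    """Inverted dropout: zero each activation with probability p on the
    forward pass, scaling survivors by 1/(1-p). Training-time only."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p  # True = neuron kept
    return activations * mask / (1.0 - p)

a = np.ones(10)
print(dropout(a, p=0.5))  # each entry is either 0.0 (dropped) or 2.0 (kept)
```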
CNN Models

• Category 1
  • Zeiler & Fergus based model
  • With 1x1 convolutions added
  • NN1
• Category 2
  • GoogLeNet (Inception) based models
  • Compared to NN1: about 20 times fewer parameters and up to 5 times less computation
  • NN2-4
  • Input sizes: 220x220, 160x160, 96x96
  • NNS1-4: small models for mobile
Inception Model

 It is a 27-layer CNN (22 weight layers; 27 counting pooling).

Inception layer

Inception-based architecture
Training

Recap

 Suppose we have a 720p image.
 It has 1280 x 720 = 921,600 pixels.
 Doing computations with that many features is difficult.
 So we reduce it to a 128-dimensional embedding vector.
 Now we only have to do computations with these 128 features.
Parameters or Weights

[Figure: a 6x6 input matrix; a 3x3 filter slides over it using only 9 shared weights (w1..w9), whereas a 5x5 filter would use 25 weights (w1..w25).]
Activation Function

Suppose we have the points (0,0), (1,1), (2,2), (3,3). If asked for the value of y at x = 6, a straight line through them answers y = 6. Often we want non-linear functions instead, so we apply something called an activation function.

Activation functions used
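The commonly used activation functions (the slides do not say which FaceNet uses here, so these three are generic examples) can be written directly:

```python
import numpy as np

# Common non-linear activation functions
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))  # squashes to (0, 1)
def relu(x):    return np.maximum(0.0, x)        # max(0, x)
def tanh(x):    return np.tanh(x)                # squashes to (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))       # [0. 0. 2.]
print(sigmoid(0.0))  # 0.5
```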
Forward Propagation
Encoding or Embedding
L2 Norm
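FaceNet constrains its embeddings to the unit hypersphere, i.e. ||f(x)||_2 = 1. A minimal sketch of that normalization step (the raw 128-dimensional feature vector here is random, purely for illustration):

```python
import numpy as np

def embed(raw_features):
    """L2-normalize a feature vector onto the unit hypersphere,
    so that ||f(x)||_2 = 1, as FaceNet does with its embeddings."""
    return raw_features / np.linalg.norm(raw_features)

# Pretend these are the CNN's raw 128-D outputs for one face
f = embed(np.random.default_rng(0).normal(size=128))
print(np.linalg.norm(f))  # 1.0
```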
Algorithm

d(A,P) = ||f(A) - f(P)||^2        d(A,N) = ||f(A) - f(N)||^2

We require d(A,P) + c <= d(A,N), i.e.

||f(A) - f(P)||^2 + c <= ||f(A) - f(N)||^2

||f(A) - f(P)||^2 + c - ||f(A) - f(N)||^2 <= 0

d(A,P) + c - d(A,N) <= 0
Triplet Loss Function

We define our loss function from the constraint d(A,P) + c - d(A,N) <= 0:

Loss(A,P,N) = max(d(A,P) + c - d(A,N), 0)

c is the margin (threshold) factor.
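The loss above translates almost line-for-line into code. A minimal sketch for a single triplet (the margin value and the toy 2-D embeddings are assumptions; the max(·, 0) clipping keeps already-satisfied triplets from contributing):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, c=0.2):
    """Triplet loss for one (anchor, positive, negative) triplet:
    penalize when d(A,P) + c is not below d(A,N)."""
    d_ap = np.sum((f_a - f_p) ** 2)   # d(A,P) = ||f(A) - f(P)||^2
    d_an = np.sum((f_a - f_n) ** 2)   # d(A,N) = ||f(A) - f(N)||^2
    return max(d_ap + c - d_an, 0.0)  # zero once the constraint holds

a = np.array([0.0, 0.0])   # anchor
p = np.array([0.1, 0.0])   # same person, close by
n = np.array([1.0, 0.0])   # different person, far away
print(triplet_loss(a, p, n))  # constraint satisfied -> 0.0
```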


Backpropagation

E(layer n-1) = actvfunc'(layer n-1) * ( E(layer n) ⊛ rot180(W(layer n)) )

(The error at a layer is the next layer's error convolved with the 180°-rotated weights, scaled by the activation derivative.)
Plot of Loss with Weights

Gradient Descent

Adagrad

 It adapts the learning rate to the update frequency of each parameter: frequently updated parameters take smaller steps.
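The per-parameter adaptation can be sketched as a single update rule: each parameter divides its step by the square root of its accumulated squared gradients. The learning rate, epsilon, and toy gradients below are illustrative assumptions:

```python
import numpy as np

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    """One Adagrad update. The accumulated squared-gradient cache grows
    with every update, so each parameter's effective step size shrinks
    the more (and the harder) that parameter is updated."""
    cache = cache + grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w = np.array([1.0, 1.0])
cache = np.zeros(2)
# One parameter gets 100x larger gradients, yet after normalization
# both move by nearly the same amount each step.
for _ in range(3):
    w, cache = adagrad_step(w, np.array([0.1, 10.0]), cache)
print(w)
```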
Face Recognition

Evaluation

Datasets

 Trained the model on several datasets:
 1. Personal photos dataset: around 12k images
 2. YouTube Faces dataset: achieved face verification with 95.12% accuracy

 The model was trained on a CPU cluster for 1000 to 5000 hours.
 Initially alpha (the margin) was set to 0.2.
 Only after about 500 hours did the loss begin to decrease.
 Further training will improve performance.
Results

[Figures:]
• Image Quality (NN1, Hold-out DB)
• Embedding Dimensionality (NN1, Hold-out DB)
• Training Data Size (NN2 Transform)
Previous Works

Previous approaches used a bottleneck layer, or combined multiple models with PCA and SVM classification. This increases the computation time and the complexity of the model.

Our model instead learns an embedding and optimizes it directly, reducing the complexity and computation that existed before.
Previous Works

Bottleneck layer and classifier


Applications
 Face Verification
 Face Recognition
 Face Clustering
Thank You
