One Shot Learning

For example: can you develop a computer vision system that can look at two images it has
never seen before and say whether they represent the same object?
One of the key challenges in many computer vision tasks is that you don’t have
many labeled images to train your neural network. For instance, a classic facial
recognition algorithm must be trained on many images of the same person before it
can recognize her.

Imagine what this would mean for a facial recognition system used at an
international airport. You would need several images of every person who might
pass through that airport, which could amount to billions of images.

Instead of treating the task as a classification problem, one-shot learning turns
it into a difference-evaluation problem.

When a deep learning model is adjusted for one-shot learning, it takes two images
(e.g., the passport photo and the image of the person looking at the camera) and
returns a value that measures how different the two images are. If the images
contain the same object (or the same face), the neural network returns a value that
is smaller than a specific threshold; if they’re not the same object, the value
will be higher than the threshold.
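
To make this concrete, here is a minimal Python sketch (assuming PyTorch, hypothetical encoding vectors, and a made-up threshold of 0.5) of that final decision step: given the two feature encodings produced by the network, compare their distance to the threshold.

import torch

def same_object(encoding_a: torch.Tensor, encoding_b: torch.Tensor,
                threshold: float = 0.5) -> bool:
    # Euclidean distance between the two feature encodings.
    distance = (encoding_a - encoding_b).pow(2).sum().sqrt().item()
    # Below the threshold: likely the same object; above: different objects.
    return distance < threshold

# Toy example with hypothetical 128-dimensional encodings:
enc_passport = torch.randn(128)
enc_camera = enc_passport + 0.01 * torch.randn(128)  # nearly identical encoding
print(same_object(enc_passport, enc_camera))          # True for this toy pair

In practice the threshold is not fixed in advance; it is tuned on a validation set of matched and mismatched pairs.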

The key to one-shot learning is an architecture called the “Siamese neural
network.”
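
A Siamese network runs both images through the same encoder, so the two branches share their weights. The sketch below shows the idea in PyTorch; the layer sizes and the 100×100 input resolution are illustrative assumptions, not a reference implementation.

import torch
import torch.nn as nn

class SiameseNetwork(nn.Module):
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        # A small convolutional encoder that maps a 3x100x100 image
        # to a fixed-length feature encoding.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.LazyLinear(embedding_dim),  # infers the flattened size on first use
        )

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor):
        # The same encoder (the same weights) encodes both inputs.
        enc_a = self.encoder(img_a)
        enc_b = self.encoder(img_b)
        # Return the per-pair Euclidean distance between the encodings.
        return (enc_a - enc_b).pow(2).sum(dim=1).sqrt()

model = SiameseNetwork()
a = torch.randn(4, 3, 100, 100)  # batch of 4 "passport" images
b = torch.randn(4, 3, 100, 100)  # batch of 4 "camera" images
print(model(a, b).shape)          # torch.Size([4])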

To train the Siamese network, we use a loss function called “triplet loss.”
Basically, triplet loss trains the neural network by giving it three images: an
anchor image, a positive image (which contains the same object as the anchor), and
a negative image (which contains a different object). The neural network must
adjust its parameters so that the feature encodings of the anchor and positive
image are very close, while the encoding of the negative image is very different.
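
A minimal version of this loss can be written directly from the description above. The sketch below assumes PyTorch and an arbitrary margin of 0.2; PyTorch also ships a ready-made nn.TripletMarginLoss that serves the same purpose.

import torch

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    # Squared distances between anchor-positive and anchor-negative encodings.
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    # Penalize triplets where the positive is not closer than the negative
    # by at least the margin.
    return torch.clamp(d_ap - d_an + margin, min=0).mean()

# Example with random 128-dimensional encodings for a batch of 8 triplets:
a, p, n = (torch.randn(8, 128) for _ in range(3))
print(triplet_loss(a, p, n))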

For instance, in the case of the facial recognition example, a trained Siamese
neural network should be able to compare two images in terms of facial features
such as the distances between the eyes, nose, and mouth.

Training the Siamese network still requires a fairly large set of anchor-positive-
negative (APN) trios. But creating the training data is much easier than building
classic datasets in which every image must be labeled. Say you have a dataset of
20 face images from two people, which means you have 10 images per person. You can
generate 1,800 APN trios from this dataset: the 10 pictures of each person give
10 × 9 ordered anchor-positive pairs, and each pair can be combined with any of the
other person's 10 images as the negative, for a total of 10 × 9 × 10 × 2 = 1,800
APN trios.
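
The counting argument can be checked with a few lines of Python; the file names below are placeholders for the two people's images.

from itertools import permutations

person_a = [f"a_{i}.jpg" for i in range(10)]
person_b = [f"b_{i}.jpg" for i in range(10)]

trios = []
for same, other in [(person_a, person_b), (person_b, person_a)]:
    # 10 x 9 ordered anchor-positive pairs from one person ...
    for anchor, positive in permutations(same, 2):
        # ... each combined with any of the other person's 10 images.
        for negative in other:
            trios.append((anchor, positive, negative))

print(len(trios))  # 10 * 9 * 10 * 2 = 1800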

With 30 images (three people), you can create 5,400 trios, and with 100 images
(ten people), you can create 81,000 APNs. Ideally, your dataset should contain a
diverse set of face images so the network generalizes better across different
features. Another good idea is to take a previously trained convolutional neural
network and fine-tune it for one-shot learning.
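
One possible way to do that, sketched below on the assumption that a recent torchvision is available, is to load a pretrained ResNet-18, swap its ImageNet classification head for an embedding layer, and fine-tune only the last block and the new head with a triplet loss like the one above. The embedding size of 128 and the choice of which layers to freeze are arbitrary.

import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# Replace the ImageNet classification head with an embedding layer.
backbone.fc = nn.Linear(backbone.fc.in_features, 128)

# Freeze the early layers; fine-tune only the last residual block and the new head.
for name, param in backbone.named_parameters():
    param.requires_grad = name.startswith(("layer4", "fc"))

images = torch.randn(4, 3, 224, 224)
print(backbone(images).shape)  # torch.Size([4, 128])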

Limitations:
Each Siamese neural network is only useful for the one task it has been trained on.
A neural network tuned for one-shot facial recognition can’t be used for another
task, such as telling whether two pictures contain the same dog or the same car.

The neural networks are also sensitive to other variations. For instance, the
accuracy can degrade considerably if the person in one of the images is wearing a
hat, scarf, or glasses, and the person in the other image is not.
