Research Paper - Formatted

This document discusses using a Siamese neural network model for few shot learning on chest x-ray images. It aims to efficiently and accurately detect diseases from limited training examples. The model is trained on hard example pairs to increase robustness. Various CNN architectures are compared in terms of performance and computation time for this task.

Uploaded by

Bob Singh

Few Shot Predictions for Chest X-ray Images using Siamese Style Network

Dushyant Singh¹, Shivam Kumar², Yogesh Walecha³, Astitva⁴, Tausif Diwan⁵
¹,²,³,⁴,⁵ Department of Computer Science & Engineering, Indian Institute of Information Technology, Nagpur, India
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract
Machine learning and deep learning methods for medical image classification exist, but they work well only when a
large amount of labelled data is available, and such datasets are generally not available for medical images. Trained
personnel can learn to classify a new disease from a few relevant images, but a deep learning model trained on so few
examples tends to overfit. Few-shot (K-shot) learning addresses this by learning to classify a new disease class from
only a few labelled examples. For few-shot predictions we used the COVID-19 Radiography dataset, which contains three
classes: Covid-19, Normal and Pneumonia. We used a Siamese Network, in which we form pairs of images to train on and
label each pair as similar or not similar. Most of the current literature on Siamese networks in the medical domain
selects these pairs randomly. We propose a simple algorithm to find hard pairs, i.e. pairs from the same class whose
feature vectors have a large Euclidean distance, and pairs from different classes whose feature vectors have a small
Euclidean distance. With N images there are on the order of N^2 possible pairs for binary cross-entropy loss and on
the order of N^3 triplets for triplet loss, so random selection may miss these hard pairs and we need to select them
explicitly. Training on hard pairs makes the model more robust. Using a simple Siamese Network as our base model, we
found a 2-3% increase in accuracy with hard-pairs sampling. We also compared CNN architectures (VGG-16, ResNet,
DenseNet and MobileNet) and found that VGG-16 and ResNet performed best but had higher training time than MobileNet
and DenseNet. Overall, VGG-16 had the best accuracy and MobileNet the least training time.
Keywords: Few shot learning, covid-19, deep learning, Siamese Network, Transfer Learning, Hard-pairs mining

1. INTRODUCTION
This research was undertaken to develop a model for predicting diseases from chest X-ray images that can quickly adapt
to new classes with just a few training examples. The goal was to find an efficient and accurate method for detecting
chest diseases, in particular those with few training images. In few-shot learning, a learner is trained on several
related tasks during the meta-training phase so that it can generalise well to unseen (but related) tasks with just a
few examples during the meta-testing phase. N-way K-shot classification means classifying a new task that has N
classes with K labelled examples each.
There are four main reasons why few-shot learning is a good fit for the medical field:

● Sources of medical images are limited and are not readily available in the public domain.

● Manual labelling of data is time consuming, not always practical, and needs medical experts.

● Some diseases are rare and simply do not have enough data available.

● Few-shot models are more robust: when we want to predict a new disease, they can quickly adapt with just a few
training examples.

We use a Siamese Network combined with transfer learning. This suits few-shot learning because during training we are
not learning the class labels themselves; instead we teach the model to decide whether two images are similar or not.
So, if we have a rare disease X with only a few training examples, and other similar disease classes with a large
amount of data, we can use those classes as base classes. We then give the model a query example of the rare disease
X, which it may not have seen before. It can still compare the query with the classes in our support set and select
the class with the smallest Euclidean distance to the query image.
Siamese Networks have become popular in recent years, especially for tasks related to image similarity or text similarity. The
network architecture consists of two identical neural networks that take in two inputs, producing embeddings that are
compared using a distance metric. However, one issue with Siamese Networks is the identification of "hard pairs" or pairs
that are difficult to distinguish. The identification of hard pairs is crucial for improving the performance of Siamese Networks.
Hard pairs represent cases where the network is struggling to learn and may be a source of error. By identifying these pairs,
we can retrain the network to improve its performance. The main challenge in identifying hard pairs is determining which
pairs are truly hard. Simply selecting pairs with large distances may not be effective, as some pairs may have large distances
due to noise in the data rather than true differences in similarity.
Several approaches have been proposed for identifying hard pairs in Siamese Networks. These include distance-based
sampling and margin-based sampling. However, the effectiveness of these techniques has not been thoroughly evaluated. In
this research report, we investigate various CNN architectures for Siamese Networks and compare their performance and
computation time. We propose a method for identifying hard pairs based on Euclidean distance differences and evaluate its
effectiveness.

We used the COVID-19 Radiography dataset, which contains three classes: Covid-19, Normal and Pneumonia. We used NIH as
an auxiliary dataset, which consists of 8 classes, to get more variety in X-ray classes. With hard-pairs sampling we
see a very small increase in 3-way accuracy, as the base model accuracy was already quite high, but the gain over the
base model grows as N increases: around 2% for 10-way and around 3% for 20-way. We also compared some popular CNN
architectures in terms of accuracy and computational efficiency. MobileNet and DenseNet are more efficient in terms of
time taken per epoch, making them suitable for real-time or resource-constrained applications. VGG-16 and ResNet
achieved better accuracy, comparable to each other, but were slower than MobileNet and DenseNet.

Major contributions:

● This is the first work that uses the Siamese Network with hard pairs sampling in the X-rays domain.

● Our research provides insights into the effectiveness of various CNN models for few shot learning via Siamese Networks.

● We propose an effective method for identifying hard pairs. Our findings can help improve the performance of Siamese
Networks on similarity tasks.

2. RELATED WORK

There are four main categories of few-shot learning methods: transfer learning based, meta-learning based, data
augmentation based, and multimodal based. Transfer learning-based methods transfer the knowledge learned from training
on a source domain and then fine-tune it for the required task. Meta-learning-based methods employ prior knowledge to
guide the learning of new tasks. Data augmentation is used when the amount of data is small: the data is augmented by
rotation, cropping, etc. to generate new samples. Multimodal-based methods use auxiliary information such as text,
audio or video to make up for the lack of data. The advantages and disadvantages of choosing one method over another
are summarised in Table.1.

Method            | Characteristics                                                     | Advantage                     | Disadvantage
Transfer Learning | Transfer of the useful prior knowledge                              | Alleviation of overfitting    | Negative transfer
Meta-Learning     | Usage of prior knowledge to guide the learning of new tasks         | Excellent performance         | Complex model
Data Augmentation | Usage of auxiliary information to expand sample data                | Prevention of overfitting     | Poor generalisation ability
Multimodal        | Usage of the information of auxiliary modalities to classify images | Better feature representation | Hard to train and calculate

Table.1 Comparing few-shot learning methods


2.1 Metric-Based Methods
Metric-based methods work by comparing the similarity and dissimilarity between feature vectors. The idea is that if
two images are similar, the Euclidean or cosine distance between their feature vectors should be small, and if two
images are different, that distance should be large.
Siamese Neural Networks were first introduced for few-shot classification tasks in 2015.
The authors of [7] proposed Matching Networks (MN), which learn an embedding function and use cosine similarity to
compare embeddings.
The authors of [8] averaged the feature vectors of the K examples of a class to get a mean vector that represents the
entire class, and then calculated a similarity score against it. They reported a 1.5% to 4% improvement for 5-way
5-shot classification.
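
As a minimal sketch of the metric-based idea, the snippet below averages the K feature vectors of each class into a mean vector and compares a query against those means with both Euclidean distance and cosine similarity. The 2-D feature vectors, class names and function names are made up for illustration; in practice the vectors would come from a trained feature extractor.

```python
import math

def mean_vector(vectors):
    """Average the K feature vectors of one class into a single representative vector."""
    k = len(vectors)
    return [sum(column) / k for column in zip(*vectors)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-D feature vectors: two classes, K = 2 examples each.
support = {
    "class_a": [[1.0, 0.0], [0.8, 0.2]],
    "class_b": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
query = [0.9, 0.3]

# Small distance or high cosine similarity both indicate "similar".
nearest = min(prototypes, key=lambda c: euclidean(query, prototypes[c]))
most_similar = max(prototypes, key=lambda c: cosine_similarity(query, prototypes[c]))
print(nearest, most_similar)
```

Both criteria pick the same class for this toy query; on real embeddings they can disagree, since cosine similarity ignores vector length.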

2.2 Work in Medical Domain


Shang et al. [12] combined transfer learning, multi-task learning, and semi-supervised learning methods into a unified
framework to promote a better performance of medical image classification.
Cai et al. [13] proposed an end-to-end learning model combined with an attention mechanism to solve the problem of medical
image classification and extract features from space and channels, so as to enhance the representation ability of the model.
Chen et al. [14] proposed a few-shot learning method for the automatic screening of COVID-19 images. An encoder is trained
by comparative learning, which can capture the feature representation on the lung dataset and classify it by prototype network.
The authors of [15] classified the Covid, Normal and Viral Pneumonia classes of the COVID-19 Radiography dataset. They
used Siamese networks and transfer learning.
The authors of [16] conducted their experiments on the Lung Image Database Consortium Image Collection, which contains
over 32,000 lung CT images of different lung diseases. They used Prototypical Networks for few-shot classification, in
which each point is assigned to the one of K clusters with the nearest mean. The goal of the pre-trained encoder is to
ensure that similar images are close and dissimilar images are far apart in the latent space.
The authors of [17] used a dataset of 1,080 high-resolution fundus RGB images made available by Samsung Medical Center
in Korea. They used Matching Networks, which use an attention mechanism that leverages cosine similarity and have
outperformed convolutional Siamese networks on the Omniglot task.
The authors of [18] introduced the idea of pretraining on unlabelled data. Large amounts of labelled data are difficult
and expensive to obtain for pretraining, so it makes sense to also pretrain on large unlabelled datasets and learn some
basic “building blocks” or high-level features. This is only helpful if the unlabelled data is drawn from the same
distribution.

3. Proposed methodology, Dataset and Results


3.1 Dataset

COVID-19 Radiography dataset consists of chest X-ray images for COVID-19 positive cases along with Normal and Viral
Pneumonia images, created by a team of researchers from Qatar University, Doha, Qatar and the University of Dhaka,
Bangladesh along with their collaborators from Pakistan and Malaysia in collaboration with medical doctors.

In the current release (as of June 6, 2020), there are 219 COVID-19 positive images, 1341 normal images and 1345 viral
pneumonia images. The dataset maintainers continue to update this database as new X-ray images of COVID-19 pneumonia
patients become available.

Dataset Name              | COVID-19 Radiography Dataset
Description               | A collection of chest X-ray images labeled as COVID-19 positive, pneumonia positive, or normal
Total images              | 3,616
COVID-19 Positive Images  | 1,201
Pneumonia Positive Images | 1,200
Normal Images             | 1,215
Sources                   | Multiple Sources
Labels                    | Assigned by a team of medical experts
Availability              | Publicly available on Kaggle and GitHub
Use                       | To train and evaluate machine learning models for classification of chest X-ray images into COVID-19 positive, Pneumonia positive, or normal categories

Table.2 Overview of Covid-19 Radiography Dataset

3.2 Pre-processing
For pre-processing we resized all images to 100 x 100 x 3 and applied histogram equalization to normalize the contrast,
which makes the darker areas of the image clearer. We use no images of the Covid-19 class during training, as it is our
novel class, and we use 30 images from each of the remaining classes in the radiography dataset. To make up for the
small amount of data we used NIH as an auxiliary dataset, whose classes serve as base classes. Although it has a
significant number of images per class, we took only around 30-50 from each class. Figure.1, Figure.2 and Figure.3
show sample chest X-ray images.
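
To illustrate the contrast normalization step, here is a minimal sketch of standard global histogram equalization on a tiny grayscale image. It assumes 8-bit intensities stored as nested lists; the function name and the toy image are illustrative, and the paper does not state which library or variant was actually used.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization for a grayscale image (list of rows of ints)."""
    flat = [pixel for row in image for pixel in row]
    hist = [0] * levels
    for pixel in flat:
        hist[pixel] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]

    # Lookup table that spreads the occupied intensities over the full range.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[pixel] for pixel in row] for row in image]

# A dark, low-contrast 2 x 2 patch (made-up values).
dark = [[10, 11], [12, 13]]
equalized = equalize_histogram(dark)
print(equalized)
```

The four nearly identical dark intensities are stretched across the whole 0-255 range, which is the effect described above for darker image regions.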

Figure.1 Examples of Normal Class

Figure.2 Examples of Covid-19 class

Figure.3 Examples of Pneumonia class


3.3 Method

We use a Siamese Network combined with transfer learning. We chose a metric-based approach over the other
meta-learning alternatives, such as optimization-based and model-based methods, because it works very well with
transfer learning: if two images are similar, their feature vectors should also be similar, so a good feature
extractor improves the performance of the Siamese network, and a good feature extractor can be obtained through
transfer learning or pretraining on a similar task.
So, by combining transfer learning with the Siamese Network we can increase performance and save a lot of time by
using ImageNet or NIH weights as initializers.
The COVID-19 Radiography dataset consists of three classes: Covid-19, Normal and Pneumonia.
An important choice was the combination of base and novel classes. The main goal is to transfer the knowledge of the
base classes to the novel classes: the feature extractor is trained on the base classes and then we make few-shot
predictions on the novel classes.

3.4 Proposed methodology


The Siamese Network takes pairs of images as input. Each image is passed through a CNN feature extractor, which gives
us the feature vector of the image. We then combine these two feature vectors by taking the Euclidean distance between
them and make the prediction. This is now just a binary classification problem, where 1 means similar and 0 means not
similar.
We then pass this difference vector through a series of fully connected layers to produce a single output that
indicates the similarity between the two input images. We removed the last few layers of the CNN, as we are not
interested in classifying the classes themselves.
We experimented with popular CNN architectures. VGG16 and ResNet are popular choices for image similarity tasks, while
Inception may be better suited for applications that require capturing features at different scales. MobileNet and
DenseNet are lightweight architectures that may be better suited for mobile and embedded applications or situations
where computational resources are limited. The choice of architecture is based on the trade-off between computational
complexity and performance.
We found the performance of VGG-16 to be the best, although it is somewhat computationally expensive. Instead of using
random weights as initializers we use ‘imagenet’ weights. One can argue that ImageNet weights won’t be much help in
X-ray classification, which is somewhat true, but they are still better than random weights because they help capture
generic image features, and we can then fine-tune the model for our task.
We froze all the layers except the last few. This is because the first few layers capture low-level features such as
curves and edges that are useful for any classification problem. We keep these features and train the last few layers
to capture task-specific features.
Another choice is the loss function. We had two main options: binary cross-entropy, since this is a binary
classification problem, and the triplet loss, which has shown great results for Siamese networks. For simplicity we
decided to go with the binary cross-entropy loss.
We chose a learning rate of 1e-4. The reason for choosing a very low learning rate is that we are only training the
last few layers, and with a higher learning rate we risk overfitting the data.
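
The pair-scoring step can be sketched as follows: the distance between two feature vectors is mapped to a probability of "similar" and scored with binary cross-entropy. This is only an illustration under stated assumptions: the feature vectors are hand-written stand-ins for CNN outputs, and `weight` and `bias` are hypothetical stand-ins for the trained fully connected layers, not values from the paper.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def similarity_probability(feat_a, feat_b, weight=-2.0, bias=2.0):
    """Map the distance between two feature vectors to P(similar).

    weight and bias are hypothetical; a negative weight makes the
    probability fall as the distance grows.
    """
    return sigmoid(weight * euclidean(feat_a, feat_b) + bias)

def binary_cross_entropy(p, label, eps=1e-7):
    """label is 1 for a similar pair and 0 for a dissimilar pair."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# Made-up feature vectors: one close pair and one distant pair.
near = similarity_probability([0.1, 0.2], [0.1, 0.3])  # distance 0.1
far = similarity_probability([0.1, 0.2], [2.1, 0.2])   # distance 2.0

print(round(near, 3), round(far, 3))
print(binary_cross_entropy(near, 1) < binary_cross_entropy(near, 0))
```

A close pair labelled 1 and a distant pair labelled 0 both give a small loss, while the opposite labels are penalised, which is the behaviour the binary classification framing relies on.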

Figure.4 Architecture of Siamese model

3.6 Hard Pairs


Hard pairs are pairs from the same class whose feature vectors have a large Euclidean distance, and pairs from
different classes whose feature vectors have a small Euclidean distance.
There has been previous work on finding hard pairs or triplets for Siamese Networks, such as [19], but we did not find
any such work for Siamese Networks in the medical domain, where existing studies mostly generate pairs randomly from
the dataset. Another motivation for our method was that the papers we studied on finding hard pairs relied on preparing
these pairs beforehand. So, we came up with a much simpler and less time-consuming approach to find them.
In some cases random pairs are sufficient, especially when the classes differ greatly from each other, for example
elephant and cat. These two animals are completely different, so their feature vectors will also differ greatly and
there is no need to explicitly find hard pairs to train on. With X-rays, however, there is a good chance that X-rays of
two different diseases look similar, or that there are large variations within the same class.
Say we have 2 classes with 100 images each. There are then 10^4 cross-class image pairs, and if we select pairs
randomly there is a good chance we will not capture hard pairs like the samples in Figure.4 and Figure.5.
Training on hard pairs makes our model more robust and forces it to learn the truly distinguishing features.
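
As a quick check of the pair counts in the example above, the following sketch counts unordered image pairs for 2 classes of 100 images each. The function name is illustrative.

```python
import math

def pair_counts(images_per_class):
    """Count same-class, cross-class and total unordered image pairs."""
    same = sum(math.comb(n, 2) for n in images_per_class)
    total = math.comb(sum(images_per_class), 2)
    cross = total - same
    return same, cross, total

same, cross, total = pair_counts([100, 100])
print(same, cross, total)
```

The 10^4 figure corresponds to the cross-class pairs; together with the same-class pairs the pool is roughly twice as large, so a handful of hard pairs is easy to miss when sampling at random.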

Figure.4 Example of different image pair

Figure.5 Example of same image pair

Figure.6 shows the architecture of our model. We do not choose hard pairs in every epoch. For most epochs we still pick
pairs randomly, but after every 10th epoch we use the model trained so far to generate hard pairs equal to half our
batch size, and the other half of the batch is selected randomly. To select hard pairs from the same class, we randomly
choose a class label, iterate over all pairs from that class, and use the model trained so far to find the pairs with
the largest distance between their feature vectors. We do not always select the pair with the largest distance, as it
might just be an exception or noise: by always selecting the maximum or minimum distance pairs we might overfit our
model to outliers. Instead we define a threshold distance and randomly select from the pairs that cross that threshold.
To find hard pairs from different classes, it is not practical to generate all possible pairs from the dataset, because
that would require a lot of time and memory. Instead, we select 2 random class labels, form all possible pairs across
those 2 classes, and use the model trained up to the current epoch to find the pairs whose feature vectors are closer
than our threshold value.
Figure.6 Overview of Model
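
The threshold-based selection of hard pairs can be sketched as below. This is an illustration, not the authors' code: the feature vectors are hand-written stand-ins for embeddings produced by the model trained so far, and the function names, thresholds and `k` are assumptions.

```python
import itertools
import math
import random

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hard_same_class_pairs(feats, threshold, k, rng):
    """Same-class index pairs farther apart than threshold, sampled at random."""
    candidates = [
        (i, j)
        for i, j in itertools.combinations(range(len(feats)), 2)
        if euclidean(feats[i], feats[j]) > threshold
    ]
    return rng.sample(candidates, min(k, len(candidates)))

def hard_cross_class_pairs(feats_a, feats_b, threshold, k, rng):
    """Cross-class index pairs closer than threshold, sampled at random."""
    candidates = [
        (i, j)
        for i, j in itertools.product(range(len(feats_a)), range(len(feats_b)))
        if euclidean(feats_a[i], feats_b[j]) < threshold
    ]
    return rng.sample(candidates, min(k, len(candidates)))

rng = random.Random(0)
# Made-up embeddings: class_a has one outlying member close to class_b.
class_a = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]]
class_b = [[3.2, 0.0], [9.0, 9.0]]

hard_pos = hard_same_class_pairs(class_a, threshold=2.0, k=10, rng=rng)
hard_neg = hard_cross_class_pairs(class_a, class_b, threshold=1.0, k=10, rng=rng)
print(sorted(hard_pos), hard_neg)
```

Sampling at random among all pairs that cross the threshold, rather than always taking the single most extreme pair, reflects the concern about overfitting to outliers.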

We do not use the classical N-way K-shot setup, in which there are N classes with K examples each. As we have only 3
query classes, a 20-way few-shot task here means that if the query example is a COVID-19 image we generate 20 pairs
such that only 1 pair is (Covid-19, Covid-19), and in the remaining 19 pairs the query is paired with an image of
another class such as Normal or Pneumonia.
We do this to make the task harder for our model, as there are now 20 pairs with only 1 correct answer to choose from.
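
One such N-way trial can be sketched as follows: the query is paired with one image of its own class and N-1 images of other classes, and the trial counts as correct when the same-class pair has the smallest distance. The feature vectors, the `pool` structure and the function name are hypothetical; ties are resolved in favour of the same-class pair in this simplified version.

```python
import math
import random

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def n_way_trial(query_feat, query_label, pool, n_way, rng):
    """Return True if the single same-class pair is the closest of n_way pairs.

    pool maps a class label to a list of feature vectors.
    """
    positive = rng.choice(pool[query_label])
    other_labels = [label for label in pool if label != query_label]
    negatives = [rng.choice(pool[rng.choice(other_labels)]) for _ in range(n_way - 1)]
    distances = [euclidean(query_feat, f) for f in [positive] + negatives]
    return distances.index(min(distances)) == 0

rng = random.Random(0)
# Made-up, well-separated embeddings for the three classes.
pool = {
    "Covid-19": [[0.0, 0.0], [0.1, 0.1]],
    "Normal": [[5.0, 5.0], [5.1, 4.9]],
    "Pneumonia": [[-5.0, 5.0], [-4.9, 5.2]],
}
trials = [n_way_trial([0.05, 0.0], "Covid-19", pool, n_way=20, rng=rng) for _ in range(50)]
accuracy = sum(trials) / len(trials)
print(accuracy)
```

With only three classes the 19 negatives necessarily repeat classes, so a larger N mainly increases the number of chances for a single confusing negative to beat the positive pair.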

3.5 Proposed Algorithm

The Siamese network is a neural network architecture that is often used for few-shot learning tasks, where there are only a
few labelled examples available for each class. The goal of few-shot learning is to train a system that can accurately classify
new examples, even if there are only a few labelled examples available for each class. The Siamese network consists of two
identical sub-networks that share weights. Each sub-network takes as input an example from the dataset and produces a fixed-
length embedding vector. During training, the network is trained to produce similar embeddings for examples that belong to
the same class and dissimilar embeddings for examples that belong to different classes.
To classify a new example with few labelled examples available, the Siamese network is used to compute the distance
between the query example and each example in the support set. The distance metric used is typically the Euclidean distance
between the embedding vectors produced by the Siamese network. The class label of the query example is then predicted
based on the class that has the smallest mean distance to the query example.

Below is the algorithm we propose for few-shot learning via a Siamese Network. The feature extractor uses ‘imagenet’
weights as its starting point.

1. For each example in the dataset, let x_i be the input image or data, and let y_i be the label.

2. The Siamese network consists of two identical sub-networks that share the weights θ, so both can be
represented by the same function f_θ(x).

3. Given a pair of examples (x_i, x_j) and their corresponding labels (y_i, y_j), the output of the Siamese
network is a pair of embedding vectors h_i = f_θ(x_i) and h_j = f_θ(x_j), where h_i and h_j are d-dimensional
vectors.

4. The distance between two embedding vectors h_i and h_j can be computed using a distance metric such as
Euclidean distance:
d(h_i, h_j) = ||h_i - h_j||
where ||.|| is the L2-norm.
5. The loss function for training the Siamese network is typically a contrastive loss, which encourages the
embeddings for the same class to be closer together than embeddings for different classes. Let y_ij = 1 if the pair
is similar (y_i = y_j) and y_ij = 0 otherwise. The contrastive loss for a pair of examples (x_i, x_j) can be defined as:
L(h_i, h_j, y_ij) = y_ij * d(h_i, h_j)^2 + (1 - y_ij) * max(0, m - d(h_i, h_j))^2
where m is a margin parameter that controls the distance threshold between same-class and different-class pairs. If
y_ij = 1, the loss pulls h_i and h_j together by penalising their squared distance. If y_ij = 0, the loss
encourages the distance between h_i and h_j to be greater than or equal to m.

6. To classify a new example q, we first compute its embedding vector h_q = f_θ(q). We then form pairs
between q and each example x in the support set to create n pairs, where n is the number of examples in the support
set. Each pair consists of the query example q and an example x from the support set.

7. For each class c in the support set, we compute the mean of the distances between the query example q and
the examples x in the support set that belong to class c:
μ_c = mean(d(h_q, h_x) for all examples x in the support set that belong to class c)
where h_x = f_θ(x) is the embedding vector for example x.

8. We assign the query example q to the class c with the smallest mean distance μ_c:
y = argmin_c μ_c

9. The accuracy of the few-shot learning system on a test set can be computed as:
accuracy = (number of correct predictions) / (total number of predictions)
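
The distance, loss and classification steps of the list above can be sketched in a few lines. The sketch uses the convention that label 1 marks a similar pair and 0 a dissimilar pair (as in the binary formulation of Section 3.4); the embeddings are made-up stand-ins for f_θ outputs, and the function names are illustrative.

```python
import math

def euclidean(u, v):
    """d(h_i, h_j) = ||h_i - h_j|| (L2 norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(d, same, margin=1.0):
    """same = 1 for a same-class pair, 0 for a different-class pair."""
    return same * d ** 2 + (1 - same) * max(0.0, margin - d) ** 2

def classify(query_emb, support):
    """Assign the query to the class with the smallest mean distance.

    support maps a class label to a list of embedding vectors.
    """
    mean_dist = {
        label: sum(euclidean(query_emb, h) for h in embs) / len(embs)
        for label, embs in support.items()
    }
    return min(mean_dist, key=mean_dist.get), mean_dist

# Hypothetical 2-D embeddings for a two-class support set.
support = {
    "Normal": [[0.0, 0.0], [0.0, 2.0]],
    "Pneumonia": [[4.0, 0.0], [4.0, 2.0]],
}
predicted, mean_dist = classify([0.0, 1.0], support)
print(predicted, mean_dist["Normal"])

# A different-class pair already farther apart than the margin costs nothing.
print(contrastive_loss(2.0, same=0), contrastive_loss(0.5, same=1))
```

Averaging over all support examples of a class, rather than taking the single nearest example, makes the prediction less sensitive to one unusual support image.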

Once training is complete, the Siamese Network can be used to predict the similarity between new pairs of inputs by
passing them through the sub-networks and computing the distance between their feature vectors.
The hard-pair sampling algorithm generates a set of "hard" pairs that are particularly challenging for the Siamese
network to distinguish, which can be useful for further fine-tuning the network or evaluating its performance. Below is
the algorithm we propose to sample the hard pairs.
1. Set batch_size, num_classes, and threshold_distance to desired values.
2. Initialise the Siamese network, using ‘imagenet’ weights for the feature extractor.
3. Repeat for a fixed number of epochs:
   a. If epoch is a multiple of 10:
      i. For each class c in num_classes:
         1. Create a list of all pairs (a, b) where a and b are examples from class c.
         2. Sort the list in decreasing order of the absolute difference between the Siamese network's output for a and b.
         3. Select the top half of the sorted list as the "hard" pairs and the bottom half as the "random" pairs.
         4. Shuffle the hard and random pairs and combine them to form the mini-batch for this epoch.
   b. Else:
      i. Randomly sample pairs from the entire dataset to form the mini-batch for this epoch.
   c. Train the Siamese network on the mini-batch using a contrastive loss function.
4. After training, use the Siamese network to predict the similarity between all pairs in the dataset.
5. For each class c in num_classes:
   a. Create a list of all pairs (a, b) where a and b are examples from class c.
   b. Sort the list in decreasing order of the absolute difference between the Siamese network's output for a and b.
   c. For each pair (a, b) in the sorted list:
      i. If the absolute difference between the Siamese network's output for a and b is greater than
         threshold_distance, add (a, b) to the list of hard pairs.
      ii. Else, add (a, b) to the list of random pairs.
6. Return the list of hard pairs.
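
The batch composition implied by the schedule above (hard pairs on every 10th epoch, random pairs otherwise) can be sketched as follows. This is a simplified illustration: `hard_pairs` is assumed to have been mined already with the current model, epochs are counted from 1, and the function name and `period` parameter are not from the paper.

```python
import random

def build_batch(epoch, all_pairs, hard_pairs, batch_size, rng, period=10):
    """Half hard / half random on every period-th epoch, otherwise all random."""
    if epoch > 0 and epoch % period == 0 and hard_pairs:
        chosen_hard = rng.sample(hard_pairs, min(batch_size // 2, len(hard_pairs)))
        # Fill the rest of the batch from pairs not already chosen.
        remaining = [p for p in all_pairs if p not in chosen_hard]
        batch = chosen_hard + rng.sample(remaining, batch_size - len(chosen_hard))
    else:
        batch = rng.sample(all_pairs, batch_size)
    rng.shuffle(batch)
    return batch

rng = random.Random(0)
# Index pairs over 10 hypothetical images; four of them flagged as hard.
all_pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]
hard_pairs = [(0, 9), (1, 8), (2, 7), (3, 6)]

random_batch = build_batch(3, all_pairs, hard_pairs, batch_size=8, rng=rng)
mixed_batch = build_batch(10, all_pairs, hard_pairs, batch_size=8, rng=rng)
print(len(random_batch), sum(p in hard_pairs for p in mixed_batch))
```

Keeping half of the batch random on hard-pair epochs means the model still sees typical pairs, which is one way to limit the risk of fitting only to unusual examples.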
4. Results and Discussion
Figure 7 compares train loss with validation loss and train accuracy with validation accuracy. The X-axis shows the
number of epochs and the Y-axis shows loss or accuracy. As the gap between the train and validation curves is small,
we can infer that the model is not overfitting and is generalising well.

Figure.7 Loss and Accuracy plot between train and validation set

Table 3 compares the base model with our model. Both use the VGG-16 architecture. We see a very small increase for
3-way, as the base model accuracy was already quite high, but the gain over the base model grows as N increases:
around 2% for 10-way and around 3% for 20-way.

N-Value | Siamese + Transfer (Base) | Our Model
3-way   | 93.33%                    | 93.899%
10-way  | 86.667%                   | 88.53%
20-way  | 78.775%                   | 81.615%
Table.3 Accuracy of base model vs Our model (VGG-16 variant)

We also tried our model with some popular CNN architectures and compared the results down below in Table.4.

Model              | Time taken per epoch (NVIDIA T4 Tensor Core GPUs) | Accuracy
VGG-16 (3-way)     | 11 minutes                                        | 93.89%
VGG-16 (10-way)    | 11 minutes                                        | 88.53%
VGG-16 (20-way)    | 11 minutes                                        | 81.61%
ResNet (3-way)     | 9 minutes                                         | 93.63%
ResNet (10-way)    | 9 minutes                                         | 88.41%
ResNet (20-way)    | 9 minutes                                         | 81.32%
MobileNet (3-way)  | 3 minutes                                         | 91.89%
MobileNet (10-way) | 3 minutes                                         | 85.13%
MobileNet (20-way) | 3 minutes                                         | 79.86%
DenseNet (3-way)   | 5 minutes                                         | 92.73%
DenseNet (10-way)  | 5 minutes                                         | 87.69%
DenseNet (20-way)  | 5 minutes                                         | 80.45%

Table.4 Comparison of Different CNN architectures


This table provides a comparison of different CNN architectures, namely VGG-16, ResNet, MobileNet, and DenseNet, in
terms of time taken per epoch on NVIDIA T4 Tensor Core GPUs and their accuracy on three different classification tasks
with 3-way, 10-way, and 20-way classifications.

The results show that VGG-16, ResNet, MobileNet, and DenseNet achieved competitive accuracy levels for all three
classification tasks. The scores of VGG-16 and ResNet were slightly better than those of MobileNet and DenseNet.
However, MobileNet and DenseNet are faster in terms of time taken per epoch. Specifically, MobileNet takes only 3
minutes per epoch, while VGG-16 takes 11 minutes and ResNet takes 9 minutes. DenseNet takes 5 minutes per epoch, which
is faster than VGG-16 and ResNet but slower than MobileNet.

Overall, MobileNet and DenseNet are more efficient in terms of time taken per epoch, making them suitable for real-time or
resource-constrained applications. On the other hand, VGG-16 and ResNet achieved comparable accuracy levels better than
MobileNet and DenseNet but are relatively slower.

5. Conclusion and Future Works


Some of the challenges and future directions are mentioned below:

● 2D images cannot truly reflect the 3D structure of the human body.
● Adjusting to domain shift: current few-shot models perform poorly when there is a shift in domain.
● There is a need for multimodal technologies in the medical field, such as pairing X-rays with a list of symptoms or
a doctor's diagnosis report, which would help increase the accuracy and reliability of the model.
● Certain classes of diseases are simply hard to predict from few examples, no matter how sophisticated the model is.

In this report we discussed various strategies used for few-shot learning and compared their advantages and
disadvantages. We then reviewed recent work in the medical domain. We then discussed our hypothesis that utilizing
hard pairs makes the model more robust, and we obtained around 2-3% improvement in accuracy over our base model. We
also compared various CNN architectures in terms of effectiveness and training time and found that VGG-16 had the best
accuracy and MobileNet the least training time.

REFERENCES

[1]. Csurka, G.; Dance, C.R.; Fan, L.; Willamowski, J.; Bray, C. Visual Categorization with Bags of Keypoints. In
Proceedings of the Conference and Workshop on European Conference on Computer Vision, Prague, Czech Republic, 11–14
May 2020.
[2]. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
[3]. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
[4]. Ahonen, T.; Hadid, A.; Pietikainen, M. Face description with local binary patterns: Application to face recognition. IEEE
Trans. Pattern Anal. Mach. Intell. 2006, 28, 2037–2041. [CrossRef] [PubMed]
[5]. Yang, C. Plant leaf recognition by integrating shape and texture features. Pattern Recognit. 2021, 112, 107809.
[CrossRef]
[6]. Al-Saffar, A.A.M.; Tao, H.; Talab, M.A. Review of deep convolutional neural networks in image classification. In
Proceedings of the 2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications
(ICRAMET), Jakarta, Indonesia, 23–24 October 2017; pp. 26–31. [CrossRef]
[7]. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In
Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009;
pp. 248–255.
[8] Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and transferring mid-level image representations using convolutional
neural networks In ICLR 2020
[9] Andrychowicz, M.; Denil, M.; GomezN. Learning to learn by gradient descent by gradient descent In ICLR 2019
[10] Dhillon Chaudhari Ravichandran & Soatto A baseline for few - shot image classification In ICLR 2020
[11] A baseline for few - shot image classification In ICLR 2020
[12] (Akihiro Nakamura and Tatsuya Harada. Revisiting fine-tuning for few-shot learning.)
[13] Mishra, N., Rohaninejad, M., Chen, X., & Abbeel, P. (2017).
[14] Papp, D., & Szűcs, G. (2017). Balanced active learning method for image classification. Acta Cybernetica, 23(2), 645-
658.
[15] Papp, D., & Szűcs, G. (2018). Double probability model for open set problem at image classification. Informatica, 29(2),
353-369.
[16] Ramachandra, B., Jones, M.J., & Vatsavai, R. (2020). Learning a distance function with a Siamese network to localize
anomalies in videos. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2587-2596.
[17] Seeland, M., & Mäder, P. (2021). Multi-view classification with convolutional neural networks. Plos ONE, 16(1),
e0245230. doi: 10.1371/journal.pone.0245230
[18] Shyam, P., Gupta, S., & Dukkipati, A. (2017). Attentive recurrent comparators. In Proceedings of the 34th International
Conference on Machine Learning - Volume 70 pp. 3173-3181. doi: 10.5555/3305890.3306009
[19]. Melekhov, J. Kannala and E. Rahtu, "Siamese network features for image matching," 2016 23rd International
Conference on Pattern Recognition (ICPR), 2016, pp. 378-383, doi: 10.1109/ICPR.2016.7899663.
[20] Wu, X., Sun, Y., Liu, L., & Liu, Z. (2017). Hard negative sample mining in siamese networks for object tracking. In
Proceedings of the IEEE International Conference on Computer Vision (ICCV) (pp. 5634-5642).
[21] Wang, W., Wu, Q., Zhang, X., & Li, W. (2020). Hard negative mining for siamese networks with adversarial attacks.
IEEE Transactions on Neural Networks and Learning Systems, 31(4), 1074-1084.
[22] Hassanpour, S., & Baydoun, M. (2020). Siamese network-based medical image retrieval system. Computer Methods and
Programs in Biomedicine, 191, 105422.
[23] Wang, L., Li, Y., & Huang, Y. (2018). Siamese neural network-based classification for medical images. In Proceedings
of the International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 648-653).
