Research Paper
Siamese Network
1Dushyant Singh, 2Shivam Kumar, 3Yogesh Walecha, 4Astitva, 5Tausif Diwan
1,2,3,4,5Department of Computer Science & Engineering, Indian Institute of Information Technology, Nagpur, India
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract
Classification methods for medical images based on Machine Learning and Deep Learning exist, but they only work well when a large amount of labeled data is available, and such datasets are generally not available for medical images. Trained personnel can learn to classify a new disease after seeing only a few relevant images, but a deep learning model trained on so few examples ends up overfitting. That is where few-shot (K-shot) learning is useful: it can learn to classify a new disease class from just a few labeled examples.
For few-shot prediction we use the COVID-19 Radiography dataset, which contains 3 classes: COVID-19, Normal, and Pneumonia. We use a Siamese Network, in which we form pairs of images to train on and label them as similar or not similar. Most of the current literature on Siamese networks in the medical domain selects these pairs randomly; we propose a simple algorithm to find hard pairs, i.e., pairs that are from the same class but have a large Euclidean distance between their feature vectors, and pairs that belong to different classes but have a small Euclidean distance. If there are N images, then binary cross-entropy loss admits N^2 possible pairs, and triplet loss admits N^3 triplets, so random selection alone might miss these hard pairs; we need to select them explicitly.
Training on these hard pairs makes the model more robust. Using a simple Siamese Network as our base model, we found a 2-3% increase in accuracy with hard-pair sampling over the base model. We also compared different CNN architectures, namely VGG-16, ResNet, DenseNet, and MobileNet, and found that VGG-16 and ResNet performed best but had higher training times than MobileNet and DenseNet. Overall, VGG-16 had the best accuracy and MobileNet the lowest training time.
Keywords: Few-shot learning, COVID-19, deep learning, Siamese Network, Transfer Learning, Hard-pair mining
1. INTRODUCTION
This research was undertaken to develop a model for predicting diseases from chest X-rays that can quickly adapt to new classes with just a few training examples. The goal was to find an efficient and accurate method for detecting chest diseases, in particular those with a small number of training images.
Few-Shot Learning trains a learner on several related tasks during a meta-training phase so that it can generalize well to unseen (but related) tasks from just a few examples during the meta-testing phase. N-way K-shot means classifying a new task with N classes and K examples per class.
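As an illustration, episodes for meta-training can be sampled as in the following minimal sketch, which assumes the dataset is a mapping from class label to a list of images:

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query=1):
    """Sample one N-way K-shot task: a support set with K labeled examples
    for each of N classes, plus held-out query examples to classify."""
    classes = random.sample(list(dataset), n_way)        # pick N classes
    support, query = {}, {}
    for c in classes:
        examples = random.sample(dataset[c], k_shot + n_query)
        support[c] = examples[:k_shot]                   # K labeled shots
        query[c] = examples[k_shot:]                     # to be classified
    return support, query
```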
There are four main reasons why few-shot learning is an ideal choice for the medical field:
• Sources of medical images are limited, and the images are not readily available in the public domain.
• Manual labelling of data is time-consuming, not always practical, and requires medical experts.
• Some diseases are rare and simply do not have enough data.
• Few-shot models are more robust when predicting a new disease: they can quickly adapt with just a few training examples.
We use a Siamese Network combined with Transfer Learning. This suits few-shot learning because during the training phase we are not interested in learning class labels; instead, we teach the model to decide whether two images are similar. So, if we have a rare disease X for which we only have a few training examples, and there are other similar disease classes for which we do have a large amount of data, we can use those classes as base classes. Given a query example of the rare disease X, which the model may never have seen before, it can still compare the query with the classes in our support set and select the class with the smallest Euclidean distance to the query image.
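For concreteness, a minimal sketch of this nearest-distance prediction step follows; the embed function and the support-set layout here are illustrative assumptions, not our exact pipeline:

```python
import numpy as np

def predict_class(embed, query_img, support_set):
    """Assign the query to the support class with the closest embedding.

    embed:       function mapping an image to a 1-D feature vector
    support_set: dict mapping class label -> list of labeled example images
    """
    q = embed(query_img)
    best_label, best_dist = None, float("inf")
    for label, examples in support_set.items():
        for img in examples:
            dist = np.linalg.norm(q - embed(img))  # Euclidean distance
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label
```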
We begin by providing background on the problem and prior work in this area. We then present our work and the results of our experiments, where we compare against a base Siamese network model. Finally, the paper concludes with a summary of the implications of our findings and recommendations for future research.
Siamese Networks have become popular in recent years, especially for tasks related to image similarity or text similarity. The
network architecture consists of two identical neural networks that take in two inputs, producing embeddings that are compared
using a distance metric. However, one issue with Siamese Networks is the identification of "hard pairs," or pairs that are difficult
to distinguish.
The identification of hard pairs is crucial for improving the performance of Siamese Networks. Hard pairs represent cases where the network is struggling to learn and may be a source of error; by identifying these pairs, we can retrain the network to improve its performance. The main challenge is determining which pairs are truly hard: simply selecting pairs with large distances may not be effective, as some pairs may have large distances due to noise in the data rather than true differences in similarity.
Several approaches have been proposed for identifying hard pairs in Siamese Networks. These include distance-based sampling
and margin-based sampling. However, the effectiveness of these techniques has not been thoroughly evaluated.
In this research report, we investigate various CNN architectures for Siamese Networks and compare their performance and computation time. We propose a method for identifying hard pairs based on Euclidean distances and evaluate its effectiveness.
We use the COVID-19 Radiography dataset, which contains 3 classes: COVID-19, Normal, and Pneumonia. We use the NIH chest X-ray dataset, which consists of 8 classes, as an auxiliary dataset to add more variety of X-ray classes.
We see only a very small increase in the 3-way setting, as the base model accuracy was already quite high, but the improvement over the base model grows with N: around a 2% increase in accuracy for 10-way and 3% for 20-way. We also compared some popular CNN architectures in terms of accuracy and computational efficiency.
MobileNet and DenseNet are more efficient in terms of time taken per epoch, making them suitable for real-time or resource-constrained applications. On the other hand, VGG-16 and ResNet achieved better accuracy, comparable to each other, but were relatively slower than MobileNet and DenseNet.
Major contributions:
• This is the first work, to our knowledge, that uses a Siamese Network with hard-pair sampling in the X-ray domain.
• Our research provides insights into the effectiveness of various CNN models for few-shot learning via Siamese Networks.
• We propose an effective method for identifying hard pairs. Our findings can help improve the performance of Siamese Networks on similarity tasks.
2. RELATED WORK
There are four main categories of few-shot learning: transfer-learning-based, meta-learning-based, data-augmentation-based, and multimodal-based methods. Transfer-learning-based methods transfer the knowledge learned on a source domain and fine-tune it to the required target task. Meta-learning-based methods employ past prior knowledge to guide the learning of new tasks. Data augmentation is used when the amount of data is small: we augment the data by rotation, cropping, etc., to generate new examples. Multimodal-based methods use auxiliary information such as text, audio, or video to make up for scarce data.
3. PROPOSED WORK
3.1 Dataset and Pre-processing
For pre-processing, we resized all images to 100 x 100 x 3 and applied histogram equalization, which normalizes the contrast of the image and makes darker areas clearer. We use no images of the COVID-19 class during training, as it is our novel class, and we use 30 images each from the remaining classes of the Radiography dataset. To make up for the small amount of data, we use the NIH dataset as an auxiliary source of base classes. Although it has a significant number of images per class, we only took around 30-50 from each class.
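A short sketch of this pre-processing step using OpenCV; the interpolation and scaling choices here are assumptions on our part:

```python
import cv2

def preprocess(path):
    """Resize to 100 x 100 and equalize contrast, as described above.

    Histogram equalization is applied on the grayscale image; the result is
    replicated to 3 channels to match the 100 x 100 x 3 input shape.
    """
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (100, 100))
    img = cv2.equalizeHist(img)                   # normalize contrast
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)   # replicate to 3 channels
    return img / 255.0                            # scale to [0, 1]
```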
Fig. 5 Pneumonia class
3.2 Method
We use a Siamese Network combined with Transfer Learning. We chose a metric-based approach over the other meta-learning alternatives, such as optimization-based and model-based methods, because it works very well with transfer learning: if two images are similar, their feature vectors should also be similar, so a good feature extractor directly improves Siamese network performance, and we can obtain a good feature extractor through transfer learning or pretraining on a similar task.
So, by combining transfer learning with the Siamese Network, we can increase performance and save a lot of time, using ImageNet or NIH weights as the initializer.
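As an illustration, a shared subnetwork could be initialized this way with torchvision (assuming a recent torchvision; the 128-dimensional embedding size is an illustrative choice, not our exact configuration):

```python
import torch.nn as nn
import torchvision.models as models

def build_subnetwork(embedding_dim=128):
    """Shared Siamese subnetwork: a VGG-16 backbone initialized with
    ImageNet weights, with the classifier head replaced by an embedding layer."""
    net = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    # VGG-16's adaptive pooling yields a 512 x 7 x 7 map (= 25088 features)
    net.classifier = nn.Linear(25088, embedding_dim)
    return net
```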
The COVID-19 Radiography dataset consists of 3 classes: COVID-19, Normal, and Pneumonia.
An important choice we had to make was the combination of base and novel classes. The main goal is to transfer knowledge from the base classes to the novel classes: the feature extractor is trained on the base classes, and few-shot predictions are then made on the novel classes.
For each training pair (xi, xj) with pair label t (0 if the images are from the same class, 1 otherwise), training proceeds as follows:
a. Pass xi and xj through the subnetworks to obtain the output feature vectors yi = f(xi) and yj = f(xj), where f is the shared subnetwork function.
b. Calculate the distance between the feature vectors as d = ||yi − yj||, where ||.|| denotes the L2-norm.
c. Calculate the loss using the contrastive loss function L = (1 − t) · d^2 + t · max(0, m − d)^2, where m is a margin hyperparameter and max(0, m − d) is the hinge term.
d. Compute the gradient of the loss with respect to the feature vectors using backpropagation. Since d = ||yi − yj||, the distance gradients are ∂d/∂yi = (yi − yj)/d and ∂d/∂yj = −(yi − yj)/d, so by the chain rule:
∇yi L = (∂L/∂d) · (yi − yj)/d, and
∇yj L = −(∂L/∂d) · (yi − yj)/d.
e. Compute the gradient ∇W of the loss with respect to the shared weights W of the subnetworks by applying the chain rule through the subnetwork function f, accumulating the contributions from both branches, and update the weights by gradient descent:
W = W − η · ∇W,
where η is the learning rate.
Repeat steps a-e for a fixed number of epochs or until the loss stops improving on a validation set.
Once training is complete, the Siamese Network can be used to predict the similarity between new pairs of inputs by passing
them through the subnetworks and computing the distance between their feature vectors using the similarity function.
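The following PyTorch sketch condenses steps a-e; the optimizer and the shared subnetwork f are placeholders, and autograd carries out the gradient computations of steps d and e:

```python
import torch

def contrastive_loss(d, t, m=1.0):
    """L = (1 - t) * d^2 + t * max(0, m - d)^2, with t = 0 for similar pairs."""
    return ((1 - t) * d.pow(2) + t * torch.clamp(m - d, min=0).pow(2)).mean()

def train_step(f, optimizer, xi, xj, t):
    """One update on a batch of pairs (xi, xj) with pair labels t."""
    yi, yj = f(xi), f(xj)                  # step a: shared subnetwork
    d = torch.norm(yi - yj, dim=1)         # step b: Euclidean distance
    loss = contrastive_loss(d, t)          # step c: contrastive loss
    optimizer.zero_grad()
    loss.backward()                        # step d: backpropagation
    optimizer.step()                       # step e: W <- W - eta * grad
    return loss.item()
```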
3.4 Hard Pairs
Hard pairs are the pairs which are from the same class and have large Euclidian distance between their feature vectors and the
pairs belonging to different classes but have similar Euclidean distance.
There has been previous work on finding hard pairs or triplets for Siamese Networks, such as Ref. [19], but we did not find any such work for Siamese Networks in the medical domain, where the literature mostly relies on generating pairs randomly from the dataset. Another motivation for our method was that the papers we studied on finding hard pairs relied on preparing these pairs beforehand. We therefore came up with a much simpler and less time-consuming approach to finding hard pairs.
In some cases we can rely on selecting random pairs alone, especially when the classes differ greatly from each other; for example, with two classes like elephant and cat, the animals are completely different, so their feature vectors will also differ greatly and there is no need to explicitly mine hard pairs to train on. But with X-rays there is a good chance that X-rays of two different diseases look similar, or that there are large variations within the same class. If we have 2 classes with 100 images each, there are 10^4 cross-class image pairs, and by just randomly selecting pairs there is a good chance we will not capture the hard ones.
Training on hard pairs makes our model more robust and forces it to learn the true distinguishing features.
3.5 Methodology
We do not have to choose hard pairs in every epoch. For most epochs we still pick pairs randomly, but after every 10th epoch we use the model trained so far to find hard pairs equal to half our batch size; the other half is randomly selected. To select hard pairs from the same class, we randomly choose a class label, iterate over all pairs from that class, and use the current model to find which of them have the largest distance between their feature vectors. Note that we do not always select the pair with the largest distance, as it might just be an exception or noise; always selecting the maximum or minimum distance pairs might overfit the model to outliers. Instead, we define a threshold and randomly select among the pairs that cross it.
For finding hard pairs from different classes, it is not practical to generate all possible pairs from the dataset, as that would require too much time and memory. Instead, we select 2 random class labels, form all possible pairs across those two classes, and use the model trained up to the current epoch to find the pairs whose feature vectors are separated by less than our threshold distance.
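A simplified sketch of this mining procedure; the threshold values and the images_by_class layout are illustrative assumptions:

```python
import random
import numpy as np

def mine_hard_pairs(embed, images_by_class, n_pairs, pos_thresh, neg_thresh):
    """Return hard positive and hard negative index pairs.

    Hard positives: same-class pairs with embedding distance above pos_thresh.
    Hard negatives: cross-class pairs with embedding distance below neg_thresh.
    Candidates past the threshold are sampled randomly to avoid overfitting
    to outliers.
    """
    # Hard positives from one randomly chosen class.
    cls = random.choice(list(images_by_class))
    embs = [embed(x) for x in images_by_class[cls]]
    pos = [(i, j) for i in range(len(embs)) for j in range(i + 1, len(embs))
           if np.linalg.norm(embs[i] - embs[j]) > pos_thresh]

    # Hard negatives from two randomly chosen distinct classes.
    c1, c2 = random.sample(list(images_by_class), 2)
    e1 = [embed(x) for x in images_by_class[c1]]
    e2 = [embed(x) for x in images_by_class[c2]]
    neg = [(i, j) for i in range(len(e1)) for j in range(len(e2))
           if np.linalg.norm(e1[i] - e2[j]) < neg_thresh]

    # Sample randomly among pairs that crossed the threshold.
    return (random.sample(pos, min(n_pairs, len(pos))),
            random.sample(neg, min(n_pairs, len(neg))))
```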
Fig.9 Overview of Model
We are not using the classical N-way K-shot setup with N classes and K examples each. As we have only 3 query classes, a 20-way few-shot trial means that if the query example is a COVID-19 image, we generate 20 pairs such that only 1 pair is (COVID-19, COVID-19), and in the remaining 19 pairs the query is paired with some other class, such as Normal or Pneumonia.
We do this to make the task harder: there are now 20 pairs with only 1 possible correct answer for the model to choose from.
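One way to score a single trial of this modified N-way task, assuming dist is the trained Siamese distance function, is sketched below:

```python
import numpy as np

def n_way_trial(dist, query, positive, negatives):
    """One N-way trial: the model is correct if the pair containing the
    true-class example has the smallest distance among all N pairs.

    dist:      trained Siamese distance function on image pairs
    positive:  an image from the same class as the query
    negatives: N - 1 images drawn from the other classes
    """
    candidates = [positive] + list(negatives)
    distances = [dist(query, c) for c in candidates]
    return int(np.argmin(distances) == 0)  # index 0 holds the true pair
```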
3.6 Results
Fig. 10 Loss and accuracy plots for the training and validation sets
Table: N value | Siamese + Transfer (Base) | Our Model
The table above compares the base model with our model; both use the VGG-16 architecture.
The increase is very small in the 3-way setting, as the base model accuracy was already quite high, but the improvement over the base model grows with N: around a 2% increase in accuracy for 10-way and a 3% improvement for 20-way.
We also tried our model with some popular CNN architectures and compared the results below.
This table provides a comparison of different CNN architectures, namely VGG-16, ResNet, MobileNet, and DenseNet, in terms of time taken per epoch on NVIDIA T4 Tensor Core GPUs and their accuracy on three different classification tasks: 3-way, 10-way, and 20-way.
The results show that VGG-16, ResNet, MobileNet, and DenseNet all achieved competitive accuracy on all three classification tasks. The scores of VGG-16 and ResNet were slightly better than those of MobileNet and DenseNet. However, MobileNet and DenseNet are relatively faster in terms of time taken per epoch: MobileNet takes only 3 minutes per epoch, while VGG-16 and ResNet each take 11 minutes. DenseNet takes 5 minutes per epoch, faster than VGG-16 and ResNet but slower than MobileNet.
Overall, MobileNet and DenseNet are more efficient in terms of time taken per epoch, making them suitable for real-time or resource-constrained applications, while VGG-16 and ResNet achieved better accuracy than MobileNet and DenseNet but are relatively slower.
4. LIMITATIONS AND FUTURE WORK
• 2D images cannot truly reflect the 3D structural information of the human body.
• Few-shot models currently perform poorly under domain shift, so adapting to domain shift remains a challenge.
• Multimodal technologies are needed in the medical field, such as pairing X-rays with a list of symptoms or a diagnosis report from a doctor; this would help increase the accuracy and reliability of the model.
• Certain classes of diseases are simply hard to predict from a few examples, no matter how sophisticated the model is.
5. REFERENCES
[1]. Csurka, G.; Dance, C.R.; Fan, L.; Willamowski, J.; Bray, C. Visual Categorization with Bags of Keypoints. In Proceedings of the Workshop on Statistical Learning in Computer Vision, ECCV, Prague, Czech Republic, 11–14 May 2004.
[2]. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
[3]. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
[4]. Ahonen, T.; Hadid, A.; Pietikainen, M. Face description with local binary patterns: Application to face recognition. IEEE
Trans.Pattern Anal. Mach. Intell. 2006, 28, 2037–2041. [CrossRef] [PubMed]
[5]. Yang, C. Plant leaf recognition by integrating shape and texture features. Pattern Recognit. 2021, 112, 107809. [CrossRef]
[6]. Al-Saffar, A.A.M.; Tao, H.; Talab, M.A. Review of deep convolution neural network in image classification. In
Proceedings of the 2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunications
(ICRAMET), Jakarta, Indonesia, 23–24 October 2017; pp. 26–31. [CrossRef]
[7]. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In
Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009;
pp. 248–255.
[8]. Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[9]. Andrychowicz, M.; Denil, M.; Gómez, S.; et al. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems (NIPS), 2016.
[10]. Dhillon, G.S.; Chaudhari, P.; Ravichandran, A.; Soatto, S. A baseline for few-shot image classification. In ICLR, 2020.
[11]. A baseline for few-shot image classification. In ICLR, 2020.
[12]. Nakamura, A.; Harada, T. Revisiting fine-tuning for few-shot learning. 2019.
[13]. Mishra, N.; Rohaninejad, M.; Chen, X.; Abbeel, P. A simple neural attentive meta-learner. 2017.
[14]. Papp, D.; Szűcs, G. Balanced active learning method for image classification. Acta Cybernetica 2017, 23(2), 645–658.
[15]. Papp, D.; Szűcs, G. Double probability model for open set problem at image classification. Informatica 2018, 29(2), 353–369.
[16]. Ramachandra, B.; Jones, M.J.; Vatsavai, R. Learning a distance function with a Siamese network to localize anomalies in videos. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020; pp. 2587–2596.
[17]. Seeland, M.; Mäder, P. Multi-view classification with convolutional neural networks. PLoS ONE 2021, 16(1), e0245230. doi: 10.1371/journal.pone.0245230.
[18]. Shyam, P.; Gupta, S.; Dukkipati, A. Attentive recurrent comparators. In Proceedings of the 34th International Conference on Machine Learning (ICML), 2017; pp. 3173–3181. doi: 10.5555/3305890.3306009.
[19]. Melekhov, I.; Kannala, J.; Rahtu, E. Siamese network features for image matching. In 2016 23rd International Conference on Pattern Recognition (ICPR), 2016; pp. 378–383. doi: 10.1109/ICPR.2016.7899663.
[20]. Wu, X.; Sun, Y.; Liu, L.; Liu, Z. Hard negative sample mining in Siamese networks for object tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017; pp. 5634–5642.
[21]. Wang, W.; Wu, Q.; Zhang, X.; Li, W. Hard negative mining for Siamese networks with adversarial attacks. IEEE Transactions on Neural Networks and Learning Systems 2020, 31(4), 1074–1084.
[22]. Hassanpour, S.; Baydoun, M. Siamese network-based medical image retrieval system. Computer Methods and Programs in Biomedicine 2020, 191, 105422.
[23]. Wang, L.; Li, Y.; Huang, Y. Siamese neural network-based classification for medical images. In Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC), 2018; pp. 648–653.