Article
A Transfer Learning Evaluation of Deep Neural Networks for
Image Classification
Nermeen Abou Baker * , Nico Zengeler and Uwe Handmann
Computer Science Institute, Ruhr West University of Applied Sciences, 46236 Bottrop, Germany;
[email protected] (N.Z.); [email protected] (U.H.)
* Correspondence: [email protected]
Abstract: Transfer learning is a machine learning technique that uses previously acquired knowledge
from a source domain to enhance learning in a target domain by reusing learned weights. This
technique is ubiquitous because of its great advantages in achieving high performance while saving
training time, memory, and effort in network design. In this paper, we investigate how to select the
best pre-trained model that meets the target domain requirements for image classification tasks. In
our study, we refined the output layers and general network parameters to apply the knowledge of
eleven image processing models, pre-trained on ImageNet, to five different target domain datasets.
We measured the accuracy, accuracy density, training time, and model size to evaluate the pre-trained
models in training sessions with both one episode and ten episodes.
1. Introduction
Deep learning is a subfield of machine learning that allows computers to automatically
interpret representations of data by learning from examples. Transfer learning is a deep
learning technique that uses previous knowledge to learn new tasks and is becoming
increasingly popular in many applications with the support of Graphics Processing Unit
(GPU) acceleration. Transfer learning has many benefits that have attracted researchers
in different domains, to name but a few: medical applications [1], remote sensing [2],
optical satellite images [3], supporting automated recycling [4], natural language processing [5],
mobile applications [6], etc. However, there are some caveats in choosing the best
pre-trained model for such applications, as most focus on accuracy and leave out other
important parameters. Therefore, it is important to also consider other metrics such as
training time or memory requirements before proceeding to a concrete implementation.
Transfer learning is performed with pre-trained models, typically large Convolutional
Neural Networks (CNNs) that are pre-trained on large standard benchmark datasets
and then reused for the new target task. The reuse of such pre-trained models can be
easily implemented by, for example, replacing certain layers with other task-specific layers
and then training the model for the target task. Moreover, many frameworks such as
PyTorch, MATLAB, Caffe, TensorFlow, Onnx, etc., provide several pre-trained models that
can help researchers implement this promising technique. The state-of-the-art has many
architectures, each with its own characteristics, that are suitable for CNN applications.
However, the performance of the resulting transfer learning network depends on the pre-trained
model used. Before going into the reuse of these models, it seems that there is a
great deal of freedom in choosing the model.
According to [7], the size and similarity of the target dataset and the source task can
be used as rules of thumb to choose the pre-trained model. ImageNet is a leading dataset
due to its popularity and data diversity. However, fine-tuning pre-trained models that are
trained on ImageNet is not per se able to achieve good results on spectrograms, for example.
Besides, following the previous strategy might not be enough with the current challenging
constraints that require high accuracy, a short training time, and limited hardware resources
for specific applications. A pre-trained model analysis was previously presented in [8], whose authors
collected reported values from the literature and compared the models’ performance on
ImageNet to evaluate several scores, such as the top-five accuracy normalized to model
complexity and power consumption.
Another worthwhile attempt was presented by [9], who benchmarked pre-trained
models on ImageNet using multiple indices such as accuracy, computational complexity,
memory usage, and inference time to help practitioners better fit the resource constraints.
Choosing the best pre-trained model is a complex dilemma that needs to be well
understood, and researchers could feel confused about picking the most suitable option.
We performed extensive experiments to classify five datasets on eleven pre-trained models.
We provide in-depth insight and offer a feasible guideline for transfer learning that uses
a pre-trained model by introducing an overview of the tested models and datasets and
evaluating their performance using different metrics. Since most pre-trained models are
used to classify ImageNet, we conducted our research on different datasets, including
standard and non-standard tasks.
The paper is organized as follows: It starts by introducing the research gap in the
Introduction in Section 1. Section 2 summarizes the related learning methods. Section 3
gives an overview of the main characteristics of the tested models and datasets. Section 4
focuses on the implementation of the models. Results are presented and discussed in
Section 5. Finally, the conclusion of the work is given in Section 6.
Table 1. The tested models with their main characteristics, where * refers to features specially designed
for the model.
Figure 1. Infographic of the tested pre-trained models. Each model is introduced with its architecture
symbol, the number of layers in brackets, and its design specification (see the color map).
3.3. Datasets
A combination of standard datasets was tested, namely CIFAR10 with 60 K images [40], the
Modified National Institute of Standards and Technology (MNIST) dataset with 70 K images [41],
and Hymenoptera [42], together with the non-standard smartphone and augmented smartphone
datasets [43], as follows:
3.3.1. Hymenoptera
This is a small RGB dataset, taken from a PyTorch tutorial on transfer learning, that is used to
classify ants and bees. It consists of 245 training images and 153 testing images.
The smartphone dataset was collected to show that transfer learning can reach high accuracy
with a small dataset and to support automated e-waste recycling through device classification.
We collected the images from search engines, focusing on the backside of the devices, where
unique features such as the logo and the camera lenses are distinguishing, because most front
sides of modern smartphones look similar, as showcased in Figure 2.
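As a concrete illustration of the data setup, the following minimal sketch shows one way to load such image-folder datasets with torchvision. It is not the authors' published pipeline; the directory paths, the 224 × 224 input size, and the ImageNet normalization constants are assumptions that follow common practice for ImageNet pre-trained models.

```python
# Minimal sketch (not the authors' published pipeline): loading an image-folder
# dataset such as Hymenoptera or the smartphone images with torchvision.
# Paths are hypothetical; the 224x224 resize and the ImageNet statistics follow
# common practice for models pre-trained on ImageNet.
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                    # pre-trained backbones expect 224x224 inputs
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet channel means
                         std=[0.229, 0.224, 0.225]),  # ImageNet channel standard deviations
])

# Hypothetical layout: one sub-folder per class, e.g. train/ants, train/bees.
train_set = datasets.ImageFolder("data/hymenoptera/train", transform=preprocess)
val_set = datasets.ImageFolder("data/hymenoptera/val", transform=preprocess)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=10, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=10, shuffle=False)
```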
4. Implementation
This study performed two scenarios under the same conditions, using an Nvidia GTX
1080 Ti GPU to train and evaluate eleven PyTorch vision models in a sequential fashion,
namely AlexNet, VGG-16, Inception-V1 (GoogLeNet), ResNet-18, SqueezeNet, DenseNet,
ResNext, MobileNet, Wide ResNet, ShuffleNet-V2, and MnasNet. We re-trained each model
on five tasks, namely MNIST, CIFAR10, Hymenoptera, smartphones, and augmented
smartphones, each in a grid search over learning rates η ∈ {10⁻², 10⁻³, 10⁻⁴} with the
ADAM optimizer and a batch size of 10. In our plots, we show only the result with
the highest accuracy over all learning rates. To mitigate overfitting, we performed early
stopping in the sense that we saved the model weights only if the validation accuracy
increased; that is, if the validation accuracy decreased, we kept the best model found so far.
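The sketch below illustrates this training protocol under our assumptions: a grid search over the three learning rates with the ADAM optimizer, and early stopping in the sense of keeping only the weights with the best validation accuracy. The helper names (build_model, train_loader, val_loader, evaluate) are hypothetical and not taken from the paper.

```python
# Sketch of the training protocol described above, under our assumptions:
# a grid search over learning rates with Adam, and model weights kept only
# when the validation accuracy improves.
import copy
import torch
import torch.nn as nn

def evaluate(model, loader, device):
    """Return the classification accuracy of `model` on `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total

def fit(build_model, train_loader, val_loader, episodes=10, device="cuda"):
    """Grid search over learning rates; return the best accuracy and weights."""
    criterion = nn.CrossEntropyLoss()
    best_acc, best_state = 0.0, None
    for lr in (1e-2, 1e-3, 1e-4):                         # learning-rate grid
        model = build_model().to(device)
        optimizer = torch.optim.Adam(
            (p for p in model.parameters() if p.requires_grad), lr=lr)
        for _ in range(episodes):                         # one or ten episodes (passes over the data)
            model.train()
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                criterion(model(images), labels).backward()
                optimizer.step()
            acc = evaluate(model, val_loader, device)
            if acc > best_acc:                            # keep weights only if validation accuracy improves
                best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    return best_acc, best_state
```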
We chose to perform two experiments in our paper where a pre-trained model was
used to:
• Fine-tune the classifier layer only: This method keeps the feature extraction layers from
the pre-trained model fixed, so-called frozen. We then re-initialized the task-specific
classifier parts, as given by the reference PyTorch vision model implementations [42],
with random values. If a PyTorch model did not have an explicit classifier part,
for example the ResNet-18 architecture, we fine-tuned only the last fully connected
layer. We froze all other weights during training. This technique saved training time
and, to some degree, overcame the problem of a small target dataset because it
only updated a few weights;
• Fine-tune all layers: For this method, we used the PyTorch vision models with the original
weights pre-trained on ImageNet and fine-tuned the entire parameter vector. In theory,
this technique achieves higher accuracy and better generalization, but it requires a longer
training time, since the pre-trained weights serve only as an initialization from which
backpropagation continues, instead of the random initialization used when training from scratch.
PyTorch vision models typically have a classifier part and a feature extraction part.
Fine-tuning the output layers means fine-tuning the classifier part, which results in a large
variation in the trainable model size across architectures, while all other weights remain
frozen during training. We assessed the model performance with four metrics: the accuracy,
the accuracy density, the model size, and the training time on a GPU.
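A minimal sketch of the two strategies for a torchvision model is given below. The function prepare_model is ours, and for brevity it replaces only the final linear layer of the classifier part, whereas the paper re-initializes the whole task-specific classifier part where one exists.

```python
# Sketch of the two fine-tuning strategies; not the authors' exact code.
# For brevity, only the final linear layer of the classifier part is replaced.
import torch.nn as nn
from torchvision import models

def prepare_model(backbone, num_classes, classifier_only=True):
    model = backbone(pretrained=True)             # weights pre-trained on ImageNet

    if classifier_only:
        # Freeze every pre-trained weight: the network acts as a fixed feature extractor.
        for param in model.parameters():
            param.requires_grad = False

    # Replace the task-specific head with a randomly initialized (and trainable) layer.
    if hasattr(model, "fc"):                      # e.g. ResNet-18: last fully connected layer
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    else:                                         # e.g. AlexNet/VGG: explicit classifier part
        in_features = model.classifier[-1].in_features
        model.classifier[-1] = nn.Linear(in_features, num_classes)

    return model

# Classifier-only fine-tuning: only the new head receives gradient updates.
frozen_model = prepare_model(models.resnet18, num_classes=2, classifier_only=True)
# Full fine-tuning: all weights start from the ImageNet solution and are updated.
full_model = prepare_model(models.alexnet, num_classes=2, classifier_only=False)
```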
5. Results
We present our results for two experiments, learning from one episode and learning
from ten episodes. In each experiment, we tested the fine-tuning of both the classifier part
only and the entire network. In the one-episode configuration, each sample was presented
only once during training, while in the ten-episode configuration, each sample was
presented ten times accordingly.
Figure 3. Average accuracy densities with full tuning and tuning the classifier layer only for one episode.
Figure 4. Average accuracy densities with full tuning and tuning the classifier layer only for ten episodes.
Tuning hyperparameters means finding the best set of parameter values for a learning
algorithm. In CNNs, the initial layers are designed to extract global features, whereas the
later ones are more task-specific. Therefore, when tuning the classification layer only, the
final classification layer is replaced while the other layers are frozen (their weights are
fixed). This means utilizing the knowledge embedded in the overall architecture as a
feature extractor and using it as a starting point for retraining. Consequently, it achieved
high performance with a smaller number of trainable parameters and a shorter training
time, as shown in Figure 6. This scenario is usually applied when labels for the target task are
scarce [44]. On the other hand, full tuning means retraining the whole network (all weights
are updated after each epoch), which requires a longer training time and more parameters, as shown
in Figure 5. This scenario is typically applied when target task labels are plentiful. Each
dataset was tested with tuning the classifier layer only and with tuning all layers; the results are
shown in detail for ten episodes in Appendix C and for one episode in Appendix D.
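The trainable-parameter gap behind this comparison can be reproduced with a short sketch such as the one below. The accuracy-density formula shown (accuracy divided by the number of parameters, in millions) is our assumption of how such a metric is typically computed, not a definition taken from the paper; the accuracy value is hypothetical.

```python
# Sketch: trainable-parameter counts for the classifier-only setting,
# and an assumed accuracy-density computation (accuracy per million parameters).
import torch.nn as nn
from torchvision import models

def count_parameters(model, trainable_only=False):
    return sum(p.numel() for p in model.parameters()
               if not trainable_only or p.requires_grad)

model = models.resnet18(pretrained=True)
for param in model.parameters():                  # freeze the backbone (classifier-only setting)
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)     # new trainable head for a two-class task

print("total parameters:    ", count_parameters(model))
print("trainable parameters:", count_parameters(model, trainable_only=True))

accuracy = 0.85                                        # hypothetical validation accuracy
density = accuracy / (count_parameters(model) / 1e6)   # accuracy per million parameters
print("accuracy density:", density)
```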
Figure 5. Model sizes and accuracy vs. training time for all tasks and models after fine-tuning
full layers.
Figure 6. Model sizes and accuracy vs. training time for all tasks and models after fine-tuning the
classifier layer only.
6. Conclusions
The performance of Deep Neural Networks (DNNs) has been enhanced over time in many aspects. Nonetheless,
there are critical parameters that determine which pre-trained model best matches the
application requirements. In this paper, we presented a comprehensive evaluation of eleven
popular pre-trained models on five datasets as a guiding tool for choosing the appropriate
model before deployment. We conducted two different sets of experiments: one-episode
learning and ten-episode learning, with each experiment involving tuning the classifier
layer only and full tuning. The previous findings might provide some clues for
choosing the right model for full fine-tuning. For applications that
require high accuracy, GoogLeNet, DenseNet, ShuffleNet-V2, ResNet-18, and ResNext are
the best candidates, while SqueezeNet offers the best accuracy density, AlexNet the
shortest training time, and SqueezeNet, ShuffleNet, MobileNet, MnasNet, and GoogLeNet
are almost equal regarding the smallest model size, which is relevant for embedded systems
applications, for example. On the other hand, we can also provide some suggestions for
fine-tuning only the classification layers: DenseNet achieved the highest accuracy, ResNet-18 the
best accuracy density, and SqueezeNet the shortest training time. In addition, all models
had small model sizes except AlexNet and VGG-16. Although we provided guidelines and
some hints, our argumentation does not give a final verdict, but it supports decisions for
choosing the right pre-trained model based on the task requirements.
Thus, for specific application constraints, selecting the right pre-trained model can be
challenging due to the tradeoffs among training time, model size, and accuracy as decision
factors.
For future work, we plan to test more evaluation metrics with the provided parameters
to facilitate decision-making in choosing the optimum model to fine-tune. Furthermore,
we aim to systematically investigate the usability of all available a priori and a posteriori
metadata for estimating useful transfer learning hyperparameters.
Author Contributions: Conceptualization, N.A.B. and N.Z.; methodology, N.A.B.; software, N.Z.;
writing—original draft preparation, N.A.B.; writing—review and editing, N.A.B. and N.Z.; supervi-
sion, U.H. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been partially funded by the Ministry of Economy, Innovation, Digitization,
and Energy of the State of North Rhine-Westphalia within the project Prosperkolleg.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: No new data were created or analyzed in this study. Data sharing is
not applicable to this article.
Conflicts of Interest: The authors declare no conflict of interest.
Figure A5. Accuracy densities for one-episode learning on original smartphone data.
Appendix C. Accuracy vs. Training Time and Number of Parameters (Model Size) for
Each Task with Ten-Episode Learning
• Accuracy vs. training time and model size for CIFAR-10 for ten-episode training,
Figure A11;
• Accuracy vs. training time and model size for MNIST for ten-episode training, Figure A12;
• Accuracy vs. training time and model size for Hymenoptera for ten-episode training,
Figure A13;
• Accuracy vs. training time and model size for original smartphone data for ten-episode
training, Figure A14;
• Accuracy vs. training time and model size for augmented smartphone data for ten-
episode training, Figure A15;
• Model sizes and accuracy vs. training time for all tasks and models after fine-tuning
the classifier layer only, where A refers to Augmented smartphones, C to CIFAR10, H
to Hymenoptera, M to MNIST, and O to the Original smartphone dataset, Figure A16;
• Model sizes and accuracy vs. training time for all tasks and models after full fine-
tuning, where A refers to Augmented smartphones, C to CIFAR10, H to Hymenoptera,
M to MNIST, and O to the Original smartphone dataset, Figure A17.
Figure A11. Accuracy vs. training time and model size for CIFAR-10 for ten-episode training.
Figure A12. Accuracy vs. training time and model size for MNIST for ten-episode training.
Figure A13. Accuracy vs. training time and model size for Hymenoptera for ten-episode training.
Figure A14. Accuracy vs. training time and model size for original smartphone data for ten-
episode training.
Figure A15. Accuracy vs. training time and model size for augmented smartphone data for ten-
episode training.
Figure A16. Model sizes and accuracy vs. training time for all tasks and models after fine-tuning
classifier layer only, where A refers to Augmented smartphones, C to CIFAR10, H to Hymenoptera,
M to MNIST, and O to the Original smartphone dataset.
Figure A17. Model sizes and accuracy vs. training time for all tasks and models after full fine-tuning,
where A refers to Augmented smartphones, C to CIFAR10, H to Hymenoptera, M to MNIST, and O
to the Original smartphone dataset.
Appendix D. Accuracy vs. Training Time and Model Size for Each Task with
One-Episode Learning
• Accuracy vs. training time and model size for CIFAR-10 for one-episode training,
Figure A18;
• Accuracy vs. training time and model size for MNIST for one-episode training, Figure A19;
• Accuracy vs. training time and model size for Hymenoptera for one-episode training,
Figure A20;
• Accuracy vs. training time and model size for original smartphone data for one-
episode training, Figure A21;
• Accuracy vs. training time and model size for augmented smartphone data for one-
episode training, Figure A22.
Figure A18. Accuracy vs. training time and model size for CIFAR-10 for one-episode training.
Figure A19. Accuracy vs. training time and model size for MNIST for one-episode training.
Figure A20. Accuracy vs. training time and model size for Hymenoptera for one-episode training.
Figure A21. Accuracy vs. training time and model size for original smartphone data for one-
episode training.
Figure A22. Accuracy vs. training time and model size for augmented smartphone data for one-
episode training.
References
1. Lundervold, A.S.; Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Z. Fur Med. Phys. 2019,
29, 102–127. [CrossRef] [PubMed]
2. Pires de Lima, R.; Marfurt, K. Convolutional Neural Network for Remote-Sensing Scene Classification: Transfer Learning
Analysis. Remote Sens. 2020, 12, 86. [CrossRef]
3. Zou, M.; Zhong, Y. Transfer Learning for Classification of Optical Satellite Image. Sens. Imaging 2018, 19, 6. [CrossRef]
4. Abou Baker, N.; Szabo-Müller, P.; Handmann, U. Feature-fusion transfer learning method as a basis to support automated
smartphone recycling in a circular smart city. In Proceedings of the EAI S-CUBE 2020—11th EAI International Conference on
Sensor Systems and Software, Aalborg, Denmark, 10–11 December 2020.
5. Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; de Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-
Efficient Transfer Learning for NLP. arXiv 2019, arXiv:1902.00751.
6. Choe, D.; Choi, E.; Kim, D.K. The Real-Time Mobile Application for Classifying of Endangered Parrot Species Using the CNN
Models Based on Transfer Learning. Mob. Inf. Syst. 2020, 2020, 1–13. [CrossRef]
7. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Transfer learning for time series classification. In Proceedings
of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018. [CrossRef]
8. Canziani, A.; Paszke, A.; Culurciello, E. An Analysis of Deep Neural Network Models for Practical Applications. arXiv 2017,
arXiv:1605.07678.
9. Bianco, S.; Cadene, R.; Celona, L.; Napoletano, P. Benchmark Analysis of Representative Deep Neural Network Architectures.
IEEE Access 2018, 6, 64270–64277. [CrossRef]
10. Socher, R.; Ganjoo, M.; Sridhar, H.; Bastani, O.; Manning, C.D.; Ng, A.Y. Zero-Shot Learning Through Cross-Modal Transfer.
arXiv 2013, arXiv:1301.3666.
11. Xian, Y.; Schiele, B.; Akata, Z. Zero-Shot Learning—The Good, the Bad and the Ugly. arXiv 2020, arXiv:1703.04394.
12. Lampert, C.H.; Nickisch, H.; Harmeling, S. Attribute-Based Classification for Zero-Shot Visual Object Categorization. IEEE Trans.
Pattern Anal. Mach. Intell. 2014, 36, 453–465. [CrossRef]
13. Zhang, Z.; Saligrama, V. Zero-Shot Learning via Semantic Similarity Embedding. arXiv 2015, arXiv:1509.04767.
14. Akata, Z.; Perronnin, F.; Harchaoui, Z.; Schmid, C. Label-Embedding for Image Classification. IEEE Trans. Pattern Anal. Mach.
Intell. 2016, 38, 1425–1438. [CrossRef] [PubMed]
15. Bart, E.; Ullman, S. Cross-generalization: Learning novel classes from a single example by feature replacement. In Proceedings of
the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26
June 2005; Volume 1, pp. 672–679. [CrossRef]
16. Fink, M. Object Classification from a Single Example Utilizing Class Relevance Metrics. In Advances in Neural Information
Processing Systems; Saul, L., Weiss, Y., Bottou, L., Eds.; MIT Press: Cambridge, MA, USA, 2005; Volume 17.
17. Tommasi, T.; Caputo, B. The More You Know, the Less You Learn: From Knowledge Transfer to One-shot Learning of Object
Categories. In Proceedings of the BMVC, London, UK, 7–10 September 2009. Available online: http://www.bmva.org/bmvc/2009/Papers/Paper353/Paper353.html (accessed on 30 November 2021).
18. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv.
2020, 53, 63. [CrossRef]
19. Azadi, S.; Fisher, M.; Kim, V.; Wang, Z.; Shechtman, E.; Darrell, T. Multi-Content GAN for Few-Shot Font Style Transfer. arXiv
2017, arXiv:1712.00516.
20. Liu, B.; Wang, X.; Dixit, M.; Kwitt, R.; Vasconcelos, N. Feature Space Transfer for Data Augmentation. arXiv 2019, arXiv:1801.04356.
21. Luo, Z.; Zou, Y.; Hoffman, J.; Fei-Fei, L. Label Efficient Learning of Transferable Representations across Domains and Tasks. arXiv
2017, arXiv:1712.00123.
22. Tan, W.C.; Chen, I.M.; Pantazis, D.; Pan, S.J. Transfer Learning with PipNet: For Automated Visual Analysis of Piping Design. In
Proceedings of the 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), Munich, Germany,
20–24 August 2018; pp. 1296–1301. [CrossRef]
23. Montúfar, G.; Pascanu, R.; Cho, K.; Bengio, Y. On the Number of Linear Regions of Deep Neural Networks. arXiv 2014,
arXiv:1402.1869.
24. Kawaguchi, K.; Huang, J.; Kaelbling, L.P. Effect of Depth and Width on Local Minima in Deep Learning. Neural Comput. 2019,
31, 1462–1498. [CrossRef]
25. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif.
Intell. Rev. 2020, 53, 5455–5516. [CrossRef]
26. Hochreiter, S. The Vanishing Gradient Problem during Learning Recurrent Neural Nets and Problem Solutions. Int. J. Uncertain.
Fuzziness Knowl.-Based Syst. 1998, 6, 107–116. [CrossRef]
27. Srivastava, R.K.; Greff, K.; Schmidhuber, J. Highway Networks. arXiv 2015, arXiv:1505.00387.
28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [CrossRef]
29. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf.
Process. Syst. 2012, 25, 1097–1105. [CrossRef]
30. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
31. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with
convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA,
USA, 7–12 June 2015; pp. 1–9. [CrossRef]
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
33. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer
parameters and <0.5MB model size. arXiv 2016, arXiv:1602.07360.
34. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings
of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017;
pp. 5987–5995. [CrossRef]
35. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
[CrossRef]
36. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient
Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
37. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. arXiv 2017, arXiv:1605.07146.
38. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv
2017, arXiv:1707.01083.
39. Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture
Search for Mobile. arXiv 2019, arXiv:1807.11626.
40. Zaheer, R.; Shaziya, H. A Study of the Optimization Algorithms in Deep Learning. In Proceedings of the 2019 Third International
Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 10–11 January 2019; pp. 536–539. [CrossRef]
41. Kaziha, O.; Bonny, T. A Comparison of Quantized Convolutional and LSTM Recurrent Neural Network Models Using MNIST. In
Proceedings of the 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), Ras Al
Khaimah, United Arab Emirates, 19–21 November 2019; pp. 1–5. [CrossRef]
42. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch:
An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035.
43. Baker, N.A.; Szabo-Müller, P.; Handmann, U. Transfer learning-based method for automated e-waste recycling in smart cities.
EAI Endorsed Trans. Smart Cities 2021, 5. [CrossRef]
44. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural
Networks. Remote Sens. 2021, 13, 4712. [CrossRef]