

Article
A Transfer Learning Evaluation of Deep Neural Networks for
Image Classification
Nermeen Abou Baker * , Nico Zengeler and Uwe Handmann

Computer Science Institute, Ruhr West University of Applied Sciences, 46236 Bottrop, Germany;
[email protected] (N.Z.); [email protected] (U.H.)
* Correspondence: [email protected]

Abstract: Transfer learning is a machine learning technique that uses previously acquired knowledge
from a source domain to enhance learning in a target domain by reusing learned weights. This
technique is ubiquitous because of its great advantages in achieving high performance while saving
training time, memory, and effort in network design. In this paper, we investigate how to select the
best pre-trained model that meets the target domain requirements for image classification tasks. In
our study, we refined the output layers and general network parameters to apply the knowledge of
eleven image processing models, pre-trained on ImageNet, to five different target domain datasets.
We measured the accuracy, accuracy density, training time, and model size to evaluate the pre-trained
models in both one-episode and ten-episode training sessions.

Keywords: transfer learning; image classification; deep neural network

Citation: Abou Baker, N.; Zengeler, N.; Handmann, U. A Transfer Learning Evaluation of Deep Neural Networks for Image Classification. Mach. Learn. Knowl. Extr. 2022, 4, 22–41. https://doi.org/10.3390/make4010002

Academic Editor: Andreas Holzinger

Received: 3 December 2021; Accepted: 10 January 2022; Published: 14 January 2022

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Deep learning is a subfield of machine learning that allows computers to automatically interpret representations of data by learning from examples. Transfer learning is a deep learning technique that uses previous knowledge to learn new tasks and is becoming increasingly popular in many applications with the support of Graphics Processing Unit (GPU) acceleration. Transfer learning has many benefits that have attracted researchers in different domains, to name but a few: medical applications [1], remote sensing [2], optical satellite images [3], supporting automated recycling [4], natural language processing [5], mobile applications [6], etc. However, there are some caveats in choosing the best pre-trained model for such applications, as most works focus on accuracy and leave out other important parameters. Therefore, it is important to also consider metrics such as training time or memory requirements before proceeding to a concrete implementation.

Transfer learning is performed with pre-trained models, typically large Convolutional Neural Networks (CNNs) that are pre-trained on large standard benchmark datasets and then reused for the new target task. The reuse of such pre-trained models can be easily implemented by, for example, replacing certain layers with other task-specific layers and then training the model for the target task. Moreover, many frameworks such as PyTorch, MATLAB, Caffe, TensorFlow, ONNX, etc., provide several pre-trained models that can help researchers implement this promising technique. The state of the art offers many architectures, each with its own characteristics, that are suitable for CNN applications. However, the performance of the resulting transfer learning network depends on the pre-trained model used, and before reusing these models, there appears to be a great deal of freedom in choosing among them.

According to [7], the size of the target dataset and its similarity to the source task can be used as rules of thumb to choose the pre-trained model. ImageNet is a leading source dataset due to its popularity and data diversity. However, fine-tuning pre-trained models that are trained on ImageNet is not per se able to achieve good results on spectrograms, for example.



Besides, following the previous strategy might not be enough under current challenging constraints that require high accuracy, a short training time, and limited hardware resources for specific applications. A pre-trained model analysis was previously presented in [8], where the authors collected reported values from the literature and compared the models' performance on ImageNet to evaluate several scores, such as the top-five accuracy normalized to model complexity and power consumption.
Another worthwhile attempt was presented by [9], who benchmarked pre-trained
models on ImageNet using multiple indices such as accuracy, computational complexity,
memory usage, and inference time to help practitioners better fit the resource constraints.
Choosing the best pre-trained model is a complex dilemma that needs to be well understood, and researchers can easily be unsure which option is most suitable.
We performed extensive experiments to classify five datasets on eleven pre-trained models.
We provide in-depth insight and offer a feasible guideline for transfer learning that uses
a pre-trained model by introducing an overview of the tested models and datasets and
evaluating their performance using different metrics. Since most pre-trained models are
used to classify ImageNet, we conducted our research on different datasets, including
standard and non-standard tasks.
The paper is organized as follows: It starts by introducing the research gap in the
Introduction in Section 1. Section 2 summarizes the related learning methods. Section 3
gives an overview of the main characteristics of the tested models and datasets. Section 4
focuses on the implementation of the models. Results are presented and discussed in
Section 5. Finally, the conclusion of the work is given in Section 6.

2. Summary of Related Learning Methods


Machine learning is data-hungry; therefore, it has tremendous success in data-intensive
applications, but it is limited when the dataset is small. This section summarizes different
types of related machine learning methods for solving image classification tasks, including
zero-shot learning, one-shot learning, few-shot learning, and transfer learning. One com-
mon advantage of these methods is that they reduce the burden of collecting large-scale supervised data and mitigate the issue of data scarcity.

2.1. Zero-Shot Learning


With zero-shot learning, it is possible to train a model that can handle labels it never observed during training by using previously seen labels and some auxiliary information. It assumes that the model can classify instances of unseen visual classes. This method looks
promising when new unlabeled examples are introduced frequently [10]. In the zero-shot
learning method, the test set and training class set are disjoint [11]. Several solutions to
this problem have been proposed, such as learning intermediate attribute classifiers [12],
learning a mixture of seen class proportions [13], or compatibility learning frameworks [14],
for example.

2.2. One-Shot Learning


One of the limitations of deep learning is that it demands a huge amount of training
data examples to learn the weights. One-shot learning, in contrast, seeks to predict the required output based on one or a few learning examples. This is usually achieved by either sharing feature representations [15] or model parameters [16]. Methods such as
this are useful for classification tasks when it is hard to classify data for every possible class
or when new classes are added [10]. One-shot learning has been proven to be an efficient
method as the number of known labels grows because in this case, it is most likely that the
model has already learned a label that is very similar to the one to be learned [17].

2.3. Few-Shot Learning


This method refers to feeding a model with a small number of training data samples.
It is useful for applications in which information is lacking or can be accessed only with difficulty due to concerns about privacy, safety, or ethical issues [18].

2.4. Transfer Learning


In line with the previously mentioned methods, and according to [19–21], transfer
learning methods often use few-shot learning, where prior knowledge is transferred from the source task to a few-shot target task [19]. There are two ways to implement transfer
learning: fine-tuning only the classifier layers, which keeps the entire model’s weight
constant, excluding the last layer, and fine-tuning all layers, which allows the weights to
change throughout the entire network. Section 4 describes these two ways technically [22].

3. Models and Datasets


In this section, we present the CNN-design-based architectures as a critical factor in
constructing the pre-trained models, the tested models, and the datasets.

3.1. CNN-Design-Based Architectures


The CNN is the fundamental component in developing a pre-trained model, and to
understand the architecture, some criteria define the design architecture of the models,
as follows:
• Depth: The NN depth is represented by the number of successive layers. Theoretically,
deep NNs are more efficient than shallow architectures, and increasing the depth of
the network by adding hidden layers has a significant effect on supervised learning,
particularly for classification tasks [23]. However, cascading layers in a Deep Neural
Network (DNN) is not straightforward, and this may cause an exponential increase in
the computational cost;
• Width: The width of a CNN is as significant as the depth. Stacking layers may learn various feature representations, but depth alone does not guarantee that useful features are learned. Therefore, a DNN should also be wide enough; with larger layer widths, the loss at local minima can be smaller [24];
• Spatial kernel size: A CNN has many parameters and hyperparameters, including
weights, biases, the number of layers, the activation function, the learning rate, and the
kernel size, which define the level of granularity. Choosing the kernel size affects the
correlation of neighboring pixels. Smaller filters extract local and fine-grained features,
whereas larger filters extract coarse-grained features [25];
• Skip connection: Although a deeper NN yields better performance, it may face chal-
lenges in performance degradation, vanishing gradients, or higher test and training
errors [26]. To tackle these problems, the shortcut layer connection was first proposed in [27]: skipping some intermediate layers allows a special flow of information across the layers, implemented for example with zero-padding, projection, dropout, skip connections, etc.;
• Channels: CNNs have powerful performance in learning features automatically,
and this can be dynamically performed by tuning the kernel weights. However,
some feature maps have little or no role in object discrimination [28] and could cause
overfitting as well. Those feature maps (or the channels) can be optimally selected in
designing the CNN to avoid overfitting.

3.2. Neural Network Architectures


This study tested eleven popular pre-trained models. Figure 1 gives a comprehensive
infographic representation over time. Table 1 depicts all the tested models with their main
characteristics based on their design, which is discussed in Section 3.1.

Table 1. The tested models with their main characteristics, where * refers to features specially designed
for the model.

Model Year Depth Main Design Characteristics Reference


AlexNet 2012 8 Spatial [29]
VGG-16 2014 16 Spatial and depth [30]
GoogLeNet 2014 22 Depth and width [31]
ResNet-18 2015 18 Skip connection [32]
SqueezeNet 2016 18 Channels [33]
ResNext 2016 101 Skip connection [34]
DenseNet 2017 201 Skip connection [35]
MobileNet 2017 54 Depthwise separable conv * [36]
WideResNet 2017 16 Width [37]
ShuffleNet-V2 2017 50 Channel shuffle * [38]
MnasNet 2019 No linear sequence Neural architecture search * [39]

Figure 1. Infographic of the tested pre-trained models. Each model is introduced with its architecture symbol, the number of layers between brackets, and design specification (see the color map).
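All eleven tested models are available in the PyTorch vision package. As a minimal sketch, they could be instantiated with their ImageNet weights as follows; the exact torchvision variants (e.g., squeezenet1_0, mobilenet_v2, resnext50_32x4d, wide_resnet50_2) are our assumptions based on Table 1 and the plot labels in Section 5, not a statement of the authors' exact choices.

```python
# Sketch: loading the eleven tested architectures from torchvision with ImageNet
# weights (pretrained=True). The specific variants chosen here are assumptions
# inferred from Table 1 and the plot labels, not confirmed by the paper.
from torchvision import models

pretrained_models = {
    "alexnet": models.alexnet(pretrained=True),
    "vgg16": models.vgg16(pretrained=True),
    "googlenet": models.googlenet(pretrained=True),
    "resnet18": models.resnet18(pretrained=True),
    "squeezenet": models.squeezenet1_0(pretrained=True),
    "resnext50_32x4d": models.resnext50_32x4d(pretrained=True),
    "densenet": models.densenet201(pretrained=True),
    "mobilenet": models.mobilenet_v2(pretrained=True),
    "wide_resnet50_2": models.wide_resnet50_2(pretrained=True),
    "shufflenet": models.shufflenet_v2_x1_0(pretrained=True),
    "mnasnet": models.mnasnet1_0(pretrained=True),
}
```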

3.3. Datasets
A combination of standard and non-standard datasets was tested. The standard datasets were CIFAR10 with 60 K images [40], the Modified National Institute of Standards and Technology (MNIST) dataset with 70 K images [41], and Hymenoptera [42]; the non-standard datasets were the smartphone and augmented smartphone datasets [43], as follows:

3.3.1. Hymenoptera
This is a small RGB dataset that is used to classify ants and bees from a PyTorch
tutorial on transfer learning. It consists of 245 training images and 153 testing images.

3.3.2. Smartphone Dataset


This is a relatively small dataset of different smartphone models, representing six
brands, namely: Acer, HTC, Huawei, Apple, LG, and Samsung. It contains 654 RGB images
with twelve classes, which are: Acer Z6, HTC 12S, HTC R70, Huawei Mate 10, Huawei
P20, iPhone 5, iPhone 7 Plus, iPhone 11 Pro Max, LG G2, LG Nexus 5, Samsung Galaxy S20
Ultra, and Samsung S10E. We created this dataset as a case study in a previous work [43],

to show that transfer learning can reach high accuracy with a small dataset to support
automated e-waste recycling through device classification. We collected the images from search engines, focusing on the backside, where unique features such as the logo and camera lenses are distinguishing, because most front sides of modern smartphones look similar, as showcased in Figure 2.

Figure 2. Example of a subset of the smartphone dataset.

3.3.3. Augmented Smartphone Dataset


Data augmentation is usually used to increase the volume of a dataset effortlessly. We applied a rotation operation in combination with added noise. For the rotation operation, we rotated by r ∈ {45°, 135°, 225°, 315°}; for the noise operation, we added noise at levels p ∈ {10%, 25%, 50%} by adding pixel values drawn from a discrete uniform distribution over {0, ..., 255 · p}. Combining the four rotations with the three noise levels resulted in a total of twelve augmentation operations for each image. Therefore, each original image yielded twelve augmented images, for a total of 8502 images in the augmented dataset, including the 654 original images.
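As an illustration, the twelve augmentation operations (four rotations combined with three noise levels) could be reproduced with the sketch below; the exact noise model, i.e., adding uniform values in {0, ..., 255 · p} to every pixel, is our reading of the description above, not the authors' original code.

```python
# Sketch of the augmentation described above: 4 rotations x 3 noise levels = 12
# operations per image. The noise model (uniform values in {0, ..., 255 * p} added
# to every pixel) is an assumption based on the text, not the authors' code.
import numpy as np
from PIL import Image

ROTATIONS = (45, 135, 225, 315)      # rotation angles in degrees
NOISE_LEVELS = (0.10, 0.25, 0.50)    # noise percentages p

def augment(image: Image.Image):
    """Yield the twelve augmented versions of one RGB image."""
    for angle in ROTATIONS:
        rotated = image.rotate(angle, expand=True)
        for p in NOISE_LEVELS:
            arr = np.asarray(rotated, dtype=np.int32)
            noise = np.random.randint(0, int(255 * p) + 1, size=arr.shape)
            yield Image.fromarray(np.clip(arr + noise, 0, 255).astype(np.uint8))
```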

4. Implementation
This study performed two scenarios under the same conditions, using an Nvidia GTX 1080 Ti GPU to train and evaluate eleven PyTorch vision models in a sequential fashion, namely AlexNet, VGG-16, Inception-V1 (GoogLeNet), ResNet-18, SqueezeNet, DenseNet, ResNext, MobileNet, Wide ResNet, ShuffleNet-V2, and MnasNet. We re-trained each model on five tasks, namely MNIST, CIFAR10, Hymenoptera, smartphones, and augmented smartphones, each in a grid search over learning rates η ∈ {10^-2, 10^-3, 10^-4} with the ADAM optimizer and a batch size equal to 10. In our plots, we show each model only at the learning rate that achieved its highest accuracy. To overcome overfitting, we performed early stopping: we saved the model weights only if the validation accuracy increased. That is, if the validation accuracy decreased, we still used the best model found so far.
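A condensed sketch of this training setup is shown below: a grid search over the three learning rates, the ADAM optimizer with batch size 10, and early stopping implemented by keeping the weights of the best validation epoch. The data loaders, the evaluation helper, and all variable names are ours; only the hyperparameter values come from the description above.

```python
# Sketch of the described setup: ADAM, batch size 10, learning-rate grid search,
# and checkpointing only when the validation accuracy improves. Data loaders are
# assumed to be regular PyTorch DataLoaders with batch_size=10.
import copy
import torch

def evaluate(model, loader, device):
    """Plain top-1 accuracy (in %) on a validation loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

def train(model, train_loader, val_loader, epochs, lr, device="cuda"):
    model = model.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    # only parameters with requires_grad=True are updated (relevant when layers are frozen)
    optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)
    best_acc, best_weights = 0.0, copy.deepcopy(model.state_dict())
    for _ in range(epochs):                      # 1 or 10 episodes
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
        val_acc = evaluate(model, val_loader, device)
        if val_acc > best_acc:                   # keep only improving checkpoints
            best_acc, best_weights = val_acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_weights)          # fall back to the best model found so far
    return model, best_acc

# Grid search over the learning rates used in the paper:
# for lr in (1e-2, 1e-3, 1e-4):
#     train(model, train_loader, val_loader, epochs=10, lr=lr)
```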
We chose to perform two experiments in our paper where a pre-trained model was
used to:
• Fine-tune the classifier layer only: This method keeps the feature extraction layers from
the pre-trained model fixed, so-called frozen. We then re-initialized the task-specific
classifier parts, as given by reference in the PyTorch vision model implementations [42],
with random values. If the PyTorch model did not have an explicit classifier part,

for example the ResNet18 architecture, we fine-tuned only the last fully connected
layer. We froze all other weights during training. This technique saved training time
and, to some degree, overcame the problem of a small-sized target dataset because it
only updated a few weights;
• Fine-tune all layers: For this method, we used the PyTorch vision models with their original ImageNet pre-trained weights and fine-tuned the entire parameter vector. In theory, this technique achieves higher accuracy and better generalization, but it requires a longer training time, since the pre-trained weights only serve as an initialization and backpropagation then continues through the whole network instead of starting from random initialization as in training from scratch. A minimal sketch of both modes follows this list.
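The sketch below illustrates the two modes for AlexNet, which has an explicit classifier part, and for ResNet-18, where only the last fully connected layer is replaced. It is our illustration of the procedure against the torchvision API, under the assumptions stated in the comments, not the authors' code.

```python
# Sketch of the two fine-tuning modes (our illustration of the procedure above).
import torch.nn as nn
from torchvision import models

def prepare_alexnet(num_classes: int, fine_tune_all: bool) -> nn.Module:
    """AlexNet has an explicit `classifier` part that is re-initialized."""
    model = models.alexnet(pretrained=True)
    if not fine_tune_all:
        for param in model.features.parameters():    # freeze the feature extractor
            param.requires_grad = False
    for layer in model.classifier:                    # re-initialize the classifier part
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
    return model

def prepare_resnet18(num_classes: int, fine_tune_all: bool) -> nn.Module:
    """ResNet-18 has no explicit classifier part, so only the last fully
    connected layer is replaced; the new layer is trainable by default."""
    model = models.resnet18(pretrained=True)
    if not fine_tune_all:
        for param in model.parameters():              # freeze all original weights
            param.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```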
PyTorch vision models typically have a classifier part and a feature extraction part.
Fine-tuning the output layers means fine-tuning the classifier part, which results in a large
variation in the model size. We froze all other weights during training. We assessed the
model performance with four metrics: the accuracy, the accuracy density, the model size,
and training time on a GPU.

4.1. Accuracy Density


This represents the accuracy divided by the number of parameters:

density = accuracy / #parameters (1)

A higher value corresponds to a higher model efficiency in terms of parameter usage.
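As a concrete example, the accuracy density can be computed directly from a model's parameter count; the accuracy value in the snippet below is a placeholder, not a reported result.

```python
# Sketch: accuracy density as in Equation (1), accuracy (%) per parameter.
from torchvision import models

def accuracy_density(model, accuracy_percent: float) -> float:
    n_params = sum(p.numel() for p in model.parameters())
    return accuracy_percent / n_params

resnet18 = models.resnet18(pretrained=True)
print(accuracy_density(resnet18, accuracy_percent=85.0))  # placeholder accuracy value
```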

4.2. Accuracy and Model Sizes vs. Training Time


Along with measuring accuracy across tasks, we also measured the training time in seconds and the number of learnable parameters, which determines the model size in MB. The more complex the model is,
the more parameters need to be optimized. When determining the memory utilization of a
GPU for each model, the number of parameters is critical. This is the amount of memory
that will be allocated to the network and the amount of memory needed to process a batch.

5. Results
We present our results for two experiments, learning from one episode and learning
from ten episodes. In each experiment, we tested fine-tuning of both the classifier part only and the entire network. In the one-episode configuration, each sample was presented only once during training, while in the ten-episode configuration, each sample was presented ten times accordingly.

5.1. One-Episode Learning


5.1.1. Fine-Tuning the Full Layers
As shown in Figure 3, we calculated the average accuracy densities of all tested
datasets, and we found that SqueezeNet with full tuning showed the highest accuracy density among all models, clearly surpassing AlexNet, which came in tenth place. This result affirmed the original design hypothesis of SqueezeNet, namely that it preserves AlexNet's accuracy with 50-times fewer parameters and a model size of less than 0.5 MB [33].

5.1.2. Fine-Tuning the Classifier Layers Only


The results, as seen in Figure 3, showed a slightly different ordering of the models in terms of accuracy density, but a large difference in the values, with ResNet18 being the most suitable candidate. Each dataset was tested in both experiments and is shown in detail in Appendix A.

Figure 3. Average accuracy densities with full tuning and tuning the classifier layer only for one episode.

5.2. Ten-Episode Learning


We tested 10 independent trials and calculated the average results to avoid any bias,
as follows.

5.2.1. Fine-Tuning the Full Layers


Figure 4 shows that ten-episode learning did not affect the order of the models in terms of the accuracy densities compared with the one-episode experiments, and the values were slightly higher. Figure 5 shows the average model sizes and accuracy vs. the
training time for all tasks and models after fine-tuning the full layers of all datasets.

5.2.2. Fine-Tuning the Classifier Layers Only


We found that ResNet18 again proved to be the model that used its parameters most efficiently, as shown in Figure 4. Figure 6 shows the average model sizes and accuracy vs. the training time for all tasks and models after fine-tuning the classifier layer only on all datasets. MnasNet had the poorest performance, making it the least-favorable model in terms of the error metrics, yet it showed a low model complexity and a short training time. The most complex model in all experiments was VGG-16, and the accuracy density figures confirmed this fact. As a result, it might be less suitable for embedded and mobile devices. Each dataset was tested in both experiments and is shown in detail in Appendix B.
Figure 4. Average accuracy densities with full tuning and tuning the classifier layer only for ten episodes.

Tuning hyperparameters means finding the best set of parameter values for a learning
algorithm. In CNNs, the initial layers are designed to extract global features, whereas the
later ones are more task-specific. Therefore, when tuning the classification layer, only the
final layer for classification is replaced, while the other layers are frozen (the weights of
the other layers are fixed). This means utilizing the knowledge of the overall architecture
as a feature extractor and using it as a starting point for retraining. Consequently, this approach achieved high performance with a smaller number of trainable parameters and a shorter training time, as shown in Figure 6. Usually, this scenario is used when the target task labels are scarce [44]. On the other hand, full tuning means retraining the whole network (the weights are updated after each epoch) with a longer training time and more parameters, as shown in Figure 5. This scenario is typically applied when target task labels are plentiful. Each dataset was tested both for tuning the classifier layer only and for tuning all layers and is shown in detail for ten episodes in Appendix C and for one episode in Appendix D.
Figure 5. Model sizes and accuracy vs. training time for all tasks and models after fine-tuning full layers.

Figure 6. Model sizes and accuracy vs. training time for all tasks and models after fine-tuning the classifier layer only.

6. Conclusions
DNNs’ performance has been enhanced over time in many aspects. Nonetheless,
there are critical parameters that define which pre-trained model perfectly matches the
application requirements. In this paper, we presented a comprehensive evaluation of eleven
popular pre-trained models on five datasets as a guiding tool for choosing the appropriate
model before deployment. We conducted two different sets of experiments: one-episode
learning and ten-episode learning, with each experiment involving tuning the classifier
layer only and full tuning. The previous findings, however, might provide some clues for
choosing the right model. When fine-tuning all layers, GoogLeNet, DenseNet, ShuffleNet-V2, ResNet-18, and ResNext are the best candidates for applications that require high accuracy, SqueezeNet offers the best accuracy density, AlexNet the shortest training time, and SqueezeNet, ShuffleNet, MobileNet, MnasNet, and GoogLeNet are almost equal regarding the smallest model size, which is relevant for embedded systems applications, for example. On the other hand, we can also provide some suggestions when fine-tuning only the classification layers: DenseNet achieved the highest accuracy, ResNet18 the best accuracy density, and SqueezeNet the shortest training time. In addition, all models had small model sizes except AlexNet and VGG-16. Although we provided guidelines and
some hints, our argumentation does not give a final verdict, but it supports decisions for
choosing the right pre-trained model based on the task requirements.
Thus, for specific application constraints, selecting the right pre-trained model can be
challenging due to the tradeoffs among training time, model size, and accuracy as decision factors.
For future work, we plan to test more evaluation metrics with the provided parameters
to facilitate decision-making in choosing the optimum model to fine-tune. Furthermore,
we aim to systematically investigate the usability of all available a priori and a posteriori
metadata for estimating useful transfer learning hyperparameters.

Author Contributions: Conceptualization, N.A.B. and N.Z.; methodology, N.A.B.; software, N.Z.;
writing—original draft preparation, N.A.B.; writing—review and editing, N.A.B. and N.Z.; supervi-
sion, U.H. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been partially funded by the Ministry of Economy, Innovation, Digitization,
and Energy of the State of North Rhine-Westphalia within the project Prosperkolleg.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: No new data were created or analyzed in this study. Data sharing is
not applicable to this article.
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Accuracy Densities for Each Task with One-Episode Learning


• Accuracy densities for one-episode learning on CIFAR-10, Figure A1;
• Accuracy densities for one-episode learning on Hymenoptera, Figure A2;
• Accuracy densities for one-episode learning on MNIST, Figure A3;
• Accuracy densities for one-episode learning on augmented smartphone data, Figure A4;
• Accuracy densities for one-episode learning on original smartphone data, Figure A5.

Figure A1. Accuracy densities for one-episode learning on CIFAR-10.

Figure A2. Accuracy densities for one-episode learning on Hymenoptera.

Figure A3. Accuracy densities for one-episode learning on MNIST.

Figure A4. Accuracy densities for one-episode learning on augmented smartphone data.

Figure A5. Accuracy densities for one-episode learning on original smartphone data.

Appendix B. Accuracy Densities for Each Task with Ten-Episode Learning


• Accuracy densities for ten-episode learning on CIFAR-10, Figure A6;
• Accuracy densities for ten-episode learning on Hymenoptera, Figure A7;
• Accuracy densities for ten-episode learning on MNIST, Figure A8;
• Accuracy densities for ten-episode learning on augmented smartphone data, Figure A9;
• Accuracy densities for ten-episode learning on original smartphone data, Figure A10.

Figure A6. Accuracy densities for ten-episode learning on CIFAR-10.

Figure A7. Accuracy densities for ten-episode learning on Hymenoptera.

Figure A8. Accuracy densities for ten-episode learning on MNIST.

Figure A9. Accuracy densities for ten-episode learning on augmented smartphone data.

Figure A10. Accuracy densities for ten-episode learning on original smartphone data.

Appendix C. Accuracy vs. Training Time and Number of Parameters (Model Size) for
Each Task with Ten-Episode Learning
• Accuracy vs. training time and model size for CIFAR-10 for ten-episode training,
Figure A11;

• Accuracy vs. training time and model size for MNIST for ten-episode training, Figure A12;
• Accuracy vs. training time and model size for Hymenoptera for ten-episode training,
Figure A13;
• Accuracy vs. training time and model size for original smartphone data for ten-episode
training, Figure A14;
• Accuracy vs. training time and model size for augmented smartphone data for ten-
episode training, Figure A15;
• Model sizes and accuracy vs. training time for all tasks and models after fine-tuning
the classifier layer only, where A refers to Augmented smartphones, C to CIFAR10, H
to Hymenoptera, M to MNIST, and O to the Original smartphone dataset, Figure A16;
• Model sizes and accuracy vs. training time for all tasks and models after full fine-
tuning, where A refers to Augmented smartphones, C to CIFAR10, H to Hymenoptera,
M to MNIST, and O to the Original smartphone dataset, Figure A17.

Figure A11. Accuracy vs. training time and model size for CIFAR-10 for ten-episode training.
Figure A12. Accuracy vs. training time and model size for MNIST for ten-episode training.
Figure A13. Accuracy vs. training time and model size for Hymenoptera for ten-episode training.

Figure A14. Accuracy vs. training time and model size for original smartphone data for ten-episode training.

Figure A15. Accuracy vs. training time and model size for augmented smartphone data for ten-episode training.

Figure A16. Model sizes and accuracy vs. training time for all tasks and models after fine-tuning the classifier layer only, where A refers to Augmented smartphones, C to CIFAR10, H to Hymenoptera, M to MNIST, and O to the Original smartphone dataset.

Figure A17. Model sizes and accuracy vs. training time for all tasks and models after full fine-tuning, where A refers to Augmented smartphones, C to CIFAR10, H to Hymenoptera, M to MNIST, and O to the Original smartphone dataset.

Appendix D. Accuracy vs. Training Time and Model Size for Each Task with
One-Episode Learning
• Accuracy vs. training time and model size for CIFAR-10 for one-episode training,
Figure A18;
• Accuracy vs. training time and model size for MNIST for one-episode training, Figure A19;
• Accuracy vs. training time and model size for Hymenoptera for one-episode training,
Figure A20;
• Accuracy vs. training time and model size for original smartphone data for one-
episode training, Figure A21;
• Accuracy vs. training time and model size for augmented smartphone data for one-
episode training, Figure A22.
Figure A18. Accuracy vs. training time and model size for CIFAR-10 for one-episode training.

Figure A19. Accuracy vs. training time and model size for MNIST for one-episode training.
Figure A20. Accuracy vs. training time and model size for Hymenoptera for one-episode training.
Figure A21. Accuracy vs. training time and model size for original smartphone data for one-episode training.
Figure A22. Accuracy vs. training time and model size for augmented smartphone data for one-episode training.

References
1. Lundervold, A.S.; Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Z. Fur Med. Phys. 2019,
29, 102–127. [CrossRef] [PubMed]
2. Pires de Lima, R.; Marfurt, K. Convolutional Neural Network for Remote-Sensing Scene Classification: Transfer Learning
Analysis. Remote Sens. 2020, 12, 86. [CrossRef]
3. Zou, M.; Zhong, Y. Transfer Learning for Classification of Optical Satellite Image. Sens. Imaging 2018, 19, 6. [CrossRef]
4. Abou Baker, N.; Szabo-Müller, P.; Handmann, U. Feature-fusion transfer learning method as a basis to support automated
smartphone recycling in a circular smart city. In Proceedings of the EAI S-CUBE 2020—11th EAI International Conference on
Sensor Systems and Software, Aalborg, Denmark, 10–11 December 2020.
5. Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; de Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-
Efficient Transfer Learning for NLP. arXiv 2019, arXiv:1902.00751.
6. Choe, D.; Choi, E.; Kim, D.K. The Real-Time Mobile Application for Classifying of Endangered Parrot Species Using the CNN
Models Based on Transfer Learning. Mob. Inf. Syst. 2020, 2020, 1–13. [CrossRef]
7. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Transfer learning for time series classification. In Proceedings
of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018. [CrossRef]
8. Canziani, A.; Paszke, A.; Culurciello, E. An Analysis of Deep Neural Network Models for Practical Applications. arXiv 2017,
arXiv:1605.07678.
9. Bianco, S.; Cadene, R.; Celona, L.; Napoletano, P. Benchmark Analysis of Representative Deep Neural Network Architectures.
IEEE Access 2018, 6, 64270–64277. [CrossRef]
10. Socher, R.; Ganjoo, M.; Sridhar, H.; Bastani, O.; Manning, C.D.; Ng, A.Y. Zero-Shot Learning Through Cross-Modal Transfer.
arXiv 2013, arXiv:1301.3666.
11. Xian, Y.; Schiele, B.; Akata, Z. Zero-Shot Learning—The Good, the Bad and the Ugly. arXiv 2020, arXiv:1703.04394.
12. Lampert, C.H.; Nickisch, H.; Harmeling, S. Attribute-Based Classification for Zero-Shot Visual Object Categorization. IEEE Trans.
Pattern Anal. Mach. Intell. 2014, 36, 453–465. [CrossRef]
13. Zhang, Z.; Saligrama, V. Zero-Shot Learning via Semantic Similarity Embedding. arXiv 2015, arXiv:1509.04767.
14. Akata, Z.; Perronnin, F.; Harchaoui, Z.; Schmid, C. Label-Embedding for Image Classification. IEEE Trans. Pattern Anal. Mach.
Intell. 2016, 38, 1425–1438. [CrossRef] [PubMed]
15. Bart, E.; Ullman, S. Cross-generalization: Learning novel classes from a single example by feature replacement. In Proceedings of
the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26
June 2005; Volume 1, pp. 672–679. [CrossRef]
16. Fink, M. Object Classification from a Single Example Utilizing Class Relevance Metrics. In Advances in Neural Information
Processing Systems; Saul, L., Weiss, Y., Bottou, L., Eds.; MIT Press: Cambridge, MA, USA, 2005; Volume 17.
17. Tommasi, T.; Caputo, B. The More You Know, the Less You Learn: From Knowledge Transfer to One-shot Learning of Object
Categories. In Proceedings of the BMVC, London, UK, 7–10 September 2009. Available online: http://www.bmva.org/bmvc/2009/Papers/Paper353/Paper353.html (accessed on 30 November 2021).
18. Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv.
2020, 53, 63. [CrossRef]
19. Azadi, S.; Fisher, M.; Kim, V.; Wang, Z.; Shechtman, E.; Darrell, T. Multi-Content GAN for Few-Shot Font Style Transfer. arXiv
2017, arXiv:1712.00516.
20. Liu, B.; Wang, X.; Dixit, M.; Kwitt, R.; Vasconcelos, N. Feature Space Transfer for Data Augmentation. arXiv 2019, arXiv:1801.04356.
21. Luo, Z.; Zou, Y.; Hoffman, J.; Fei-Fei, L. Label Efficient Learning of Transferable Representations across Domains and Tasks. arXiv
2017, arXiv:1712.00123.
22. Tan, W.C.; Chen, I.M.; Pantazis, D.; Pan, S.J. Transfer Learning with PipNet: For Automated Visual Analysis of Piping Design. In
Proceedings of the 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), Munich, Germany,
20–24 August 2018; pp. 1296–1301. [CrossRef]
23. Montúfar, G.; Pascanu, R.; Cho, K.; Bengio, Y. On the Number of Linear Regions of Deep Neural Networks. arXiv 2014,
arXiv:1402.1869.
24. Kawaguchi, K.; Huang, J.; Kaelbling, L.P. Effect of Depth and Width on Local Minima in Deep Learning. Neural Comput. 2019,
31, 1462–1498. [CrossRef]
25. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif.
Intell. Rev. 2020, 53, 5455–5516. [CrossRef]
26. Hochreiter, S. The Vanishing Gradient Problem during Learning Recurrent Neural Nets and Problem Solutions. Int. J. Uncertain.
Fuzziness Knowl.-Based Syst. 1998, 6, 107–116. [CrossRef]
27. Srivastava, R.K.; Greff, K.; Schmidhuber, J. Highway Networks. arXiv 2015, arXiv:1505.00387.
28. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [CrossRef]
29. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf.
Process. Syst. 2012, 25, 1097–1105. [CrossRef]
30. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.

31. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with
convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA,
USA, 7–12 June 2015; pp. 1–9. [CrossRef]
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
33. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer
parameters and <0.5MB model size. arXiv 2016, arXiv:1602.07360.
34. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings
of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017;
pp. 5987–5995. [CrossRef]
35. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
[CrossRef]
36. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient
Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
37. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. arXiv 2017, arXiv:1605.07146.
38. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv
2017, arXiv:1707.01083.
39. Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture
Search for Mobile. arXiv 2019, arXiv:1807.11626
40. Zaheer, R.; Shaziya, H. A Study of the Optimization Algorithms in Deep Learning. In Proceedings of the 2019 Third International
Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 10–11 January 2019; pp. 536–539. [CrossRef]
41. Kaziha, O.; Bonny, T. A Comparison of Quantized Convolutional and LSTM Recurrent Neural Network Models Using MNIST. In
Proceedings of the 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), Ras Al
Khaimah, United Arab Emirates, 19–21 November 2019; pp. 1–5. [CrossRef]
42. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch:
An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035.
43. Baker, N.A.; Szabo-Müller, P.; Handmann, U. Transfer learning-based method for automated e-waste recycling in smart cities.
EAI Endorsed Trans. Smart Cities 2021, 5. [CrossRef]
44. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of Image Classification Algorithms Based on Convolutional Neural
Networks. Remote Sens. 2021, 13, 4712. [CrossRef]
