
2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing,

Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress

Transfer Learning based Data-Efficient Machine Learning Enabled Classification

Shuteng Niu, Jian Wang, Yongxin Liu, Houbing Song
Embry-Riddle Aeronautical University, Daytona Beach, FL 32114 USA
Email: [email protected]

DOI: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00108

Abstract—Recently, waste sorting has become more and more important in our daily life. It plays an essential role in the big picture of waste recycling, reducing environmental pollution significantly. Deep learning (DL) methods have been dominating the field of image classification and have been successfully applied to waste sorting tasks, achieving state-of-the-art performance. However, most traditional DL methods require a massive amount of annotated data for the training phase. Unfortunately, there is only one small dataset for waste sorting, TrashNet, created at Stanford. In addition, manually collecting and labeling a massive dataset can be too costly. To address this issue, we implement transfer learning (TL) techniques to construct a robust model from a fairly small set of training data by transferring knowledge from existing deep networks, such as AlexNet, ResNet, and DenseNet. As an innovation, we propose a novel domain loss function, Dual Dynamic Domain Distance (4D), to produce a more accurate domain distance measurement. This paper makes three contributions. First, our model achieves the best performance reported on the TrashNet data. Second, this is the first time TL has been applied to waste sorting. Finally, the proposed 4D domain loss improves the performance of TL on this task. We implement two transfer learning methods, DDC and DeepCoral, on the TrashNet dataset, and the DeepCoral-ResNet50 model yields the best performance of 96% test accuracy. More importantly, this work can be easily generalized to other image classification tasks.

Keywords—Deep learning, transfer learning, image classification, waste sorting, data-efficient machine learning

I. INTRODUCTION

We are entering a new era of smart cities, which offers great promise for improved wellbeing and prosperity but poses significant challenges [1]–[3]. Machine learning and data analytics have emerged as essential tools to address the challenges that smart cities are facing [4]–[7].

Rapidly increasing pollution from overpopulation and industrialization is causing serious damage to the natural environment of the Earth. As a consequence, water pollution, air pollution, and deforestation are causing a number of negative effects on our health and the economy, such as increasing cancer rates, new diseases, extinction of species, and soil contamination. For example, toxic materials can be transferred into human bodies and wildlife from air, water, and food. Moreover, soil contamination can seriously hurt all fields related to agriculture. As shown in the study [8], the expense of pollution control has been increasing exponentially in the past few decades, and many potential solutions have been proposed. To the best of our knowledge, recycling is widely acknowledged as one of the proven ways to reduce environmental pollution effectively. In general, the benefits of recycling include reducing the waste sent to landfills, reducing greenhouse gas emissions, and saving the resources used for making raw materials. Furthermore, accurately sorting the waste from our daily life is the first and a very important step in the big picture of recycling. Therefore, finding an effective and efficient way to sort waste is key to the success of the recycling process.

In this paper, our focus is on building a DL model for solid waste sorting, which lands in the field of image classification. Traditional image processing methods use hand-designed features to complete tasks like classification, detection, and segmentation. However, designing features by hand is a very time-consuming and costly process, and it does not always deliver promising performance on complicated tasks. In the recent decade, DL has dominated this field, dramatically freeing our hands from designing features while improving performance. One of the most famous DL models, the convolutional neural network (CNN), has shown its great power in a number of different fields, such as object classification, object detection, and speech recognition. Generally, a deep neural network (DNN) enables the machine to learn how to accomplish the task; in other words, a DNN can be considered a black box with a massive number of parameters, and the goal is to get the best performance by iteratively adjusting the parameter values based on a set of rules. However, most DL methods require a huge set of well-labeled training data to achieve promising performance. In many real-world problems, we do not have a sufficient amount of labeled data for training, or we cannot even
find unlabeled training data. Researchers started focusing on transfer learning to address this issue, which allows us to leverage the knowledge stored in other well-trained models. Moreover, we do not have many datasets for waste sorting that can provide enough training data for deep networks. Therefore, we propose a transfer learning model for this task.

According to [9], there are three common transfer learning settings: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning. In general, a transfer learning task involves multiple domains: one target domain and one or more source domains. In inductive transfer learning, supervised training data is always available in the target domain. In transductive transfer learning, well-labeled data is available only in the source domain. In unsupervised transfer learning, by contrast, no labeled data is available in either the source domain or the target domain. The setting of our proposed model fits into inductive transfer learning. In addition, we only have a small dataset [10] containing 2530 images in total, which might not be enough for building a robust waste sorting model. We therefore use domain adaptation techniques to leverage the knowledge stored in deeply trained models, such as AlexNet [11] and ResNet [12], trained on the ImageNet dataset. By doing so, we were able to push the testing accuracy to 96% with such a small dataset.

The rest of the paper is organized as follows. Section II presents related work. The dataset is introduced in Section III. We present our proposed methodologies in Section IV. Experimental results are discussed in Section V. Section VI concludes this paper.

II. RELATED WORK

Previously, many image classification projects have been created, but not many are related to waste sorting. In this section, we introduce a number of projects related to waste sorting. For a better understanding, we categorize them into three sub-fields: traditional methods, conventional DL methods, and transfer learning methods.

A. Traditional Methods

A traditional model, the support vector machine (SVM), is considered one of the best early image classification methods. Compared to DL models, it is simpler to build and easier to train. [10] built an SVM model for waste sorting based on a hand-designed feature detector, SIFT. The SIFT descriptor is one of the most powerful feature detectors, and it is invariant to scale, noise, and illumination [13], which makes it very helpful for waste sorting. The best SVM kernel was found after testing a number of different kernels; it is defined as

$K(x, x') = \exp\!\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right)$   (1)

The best performance achieved by the SVM was 63% testing accuracy.
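Equation (1) is the standard RBF (Gaussian) kernel. For readers who want a concrete reference point, the sketch below shows how such a classifier can be assembled with scikit-learn; it is a minimal illustration under our own assumptions, not the exact pipeline of [10], and the hypothetical `extract_features` stands in for the SIFT-based descriptor stage.

```python
# Minimal sketch of an RBF-kernel SVM classifier in the spirit of [10].
# `extract_features` is a hypothetical stand-in for the SIFT descriptor
# pipeline described in the original work.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def extract_features(images):
    # Placeholder: flatten raw pixels; [10] used SIFT descriptors instead.
    return np.array([img.ravel() for img in images])

def train_rbf_svm(images, labels, sigma=1.0):
    X = extract_features(images)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=0)
    # gamma = 1 / (2 sigma^2) reproduces K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2))
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)
```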
B. Conventional DL Methods

As mentioned earlier, one of the greatest advantages of DL methods is that deep networks can learn features automatically, instead of requiring features designed by hand. However, DL models require matching the size of the data to the size of the network; a significant mismatch usually causes over-fitting or under-fitting. [10] built a CNN that can be considered a simplified version of AlexNet [11]. As claimed by the authors, this model only achieved 22% testing accuracy, which is worse than a pure guess. Moreover,
[14] selected three successful DL architectures, namely,
MobileNet [15], DenseNets [16], and Inception [17], and trained them from scratch. As a result, those models achieved testing accuracies of 84%, 84%, and 89%, respectively, so DL models achieve better performance than traditional models.

However, there are two main drawbacks of conventional DL methods. Firstly, the selected models are reasonably deep and complicated; training them from scratch is very time-consuming and can over-fit with such a small dataset. Secondly, a number of other datasets contain the same kinds of objects as TrashNet, and we could benefit from those samples if the distribution mismatches were reduced; however, conventional DL methods cannot take advantage of samples from other domains.
[Figure 1: sample images of the six TrashNet classes — cardboard, glass, metal, paper, plastic, and trash.]
Figure 1. Source Data & Target Data.
C. Transfer Learning Model

To address the drawbacks of conventional methods, numerous transfer learning methods have been proposed. Commonly, the distribution mismatch between the source domain and the target domain is the main issue that prevents us from using samples collected from different domains for training.

As one of the solutions, fine-tuning is acknowledged to be an effective way to deal with the distribution mismatch. [14] implemented fine-tuning on the selected DL architectures to lift their performance to a new level: as shown in Table I, the authors pushed the best testing accuracy to 95% by combining fine-tuning and data augmentation. Fine-tuning not only produces better testing accuracy but also dramatically reduces the training time. Moreover, we would also like to expand the dataset by leveraging samples collected from other domains. In this paper, we implement the transfer learning methods DDC [18] and DeepCoral [19] to push the performance to an even higher level. Generally, TL methods reduce the distribution mismatch by adding an additional constraint term to the loss function. For example, DDC deploys Maximum Mean Discrepancy (MMD) [20] and DeepCoral uses the CORAL loss to measure the distance between the two domains so that the mismatch can be reduced. For our models, we modified the loss functions given in the original DDC and DeepCoral papers. Finally, the best testing accuracy, 96%, was achieved by the DeepCoral-based model.
Table I
PERFORMANCE OVERVIEW

Methods                      Testing Accuracy
Traditional Methods          63%
Conventional DL Methods      22%
Transfer Learning Methods    95%
Ours                         96%
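For orientation, the fine-tuning recipe discussed above usually amounts to loading an ImageNet-pretrained backbone, swapping in a six-way classifier head, and optionally freezing the early layers. The sketch below is a generic illustration assuming torchvision; it is not the exact configuration used in [14].

```python
# Generic fine-tuning sketch (not the exact setup of [14]): reuse
# ImageNet weights and retrain only the six-way classifier head.
import torch.nn as nn
from torchvision import models

def build_finetune_model(num_classes=6, freeze_backbone=True):
    model = models.resnet50(pretrained=True)  # ImageNet-pretrained backbone
    if freeze_backbone:
        for param in model.parameters():
            param.requires_grad = False       # keep pretrained features fixed
    # Replace the 1000-way ImageNet head with a 6-way waste-class head.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```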
III. DATASET

Firstly, there are not many open-source datasets for waste sorting. One of them, TrashNet [10], was collected by students at Stanford and contains six classes: paper, glass, metal, cardboard, plastic, and trash. There are 2527 images with a white background, all resized to 512 by 384. A few samples of each class of TrashNet are shown in Figure 1.

Importantly, this is a fairly small dataset that might not be able to train a high-accuracy model, and [10], [14] all used data augmentation techniques to expand it. However, the objects in TrashNet are all very common things and can easily be found in other datasets, albeit with different distributions. In this paper, we wish to benefit from datasets in other domains, using transfer learning techniques to deal with the distribution mismatch. We found another dataset [21] that was collected from a different distribution but contains very similar objects to TrashNet, so we use it as the source data. The distributions of the source data and the target data are shown in Figure 2; as we can tell from the figure, the number of samples per class is imbalanced. Therefore, we first balanced out the number of samples in each class by applying basic image data augmentation, such as flips, rotations, and kernel filters.

[Figure 2: per-class sample counts (0–900) for the source and target datasets across cardboard, glass, metal, paper, plastic, and trash.]
Figure 2. Source Data & Target Data.
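A sketch of the balancing-plus-augmentation step described above, assuming torchvision-style transforms; the paper names flips, rotations, and kernel filters but does not report exact parameters, so the values and the `balance_classes` helper here are illustrative.

```python
# Illustrative augmentation pipeline (parameter values are assumptions;
# the paper lists flips, rotations, and kernel filters without specifics).
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomRotation(degrees=15),
    T.GaussianBlur(kernel_size=3),   # a simple "kernel filter"
    T.ToTensor(),
])

def balance_classes(images_by_class):
    # Oversample minority classes until each class matches the largest
    # one, generating the extra samples with `augment`.
    target = max(len(v) for v in images_by_class.values())
    balanced = {}
    for cls, imgs in images_by_class.items():
        extra = [augment(imgs[i % len(imgs)]) for i in range(target - len(imgs))]
        balanced[cls] = [augment(img) for img in imgs] + extra
    return balanced
```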

IV. METHODOLOGY

As mentioned earlier, conventional DL algorithms have two significant shortcomings: insufficient training data and domain shift. These two drawbacks significantly limit the potential of DL applied to waste sorting. To address this problem, we propose to adopt transfer learning to develop a robust waste classification model with a limited amount of training data.

As a sub-field of data-efficient learning algorithms, transfer learning is currently one of the most popular topics. The concept of transfer learning is to solve the target task by leveraging the knowledge learned from source tasks in different domains, instead of learning from scratch and requiring massive data. Generally, traditional machine learning algorithms assume that training and testing data are in the same feature space and share an identical distribution. However, this assumption does not always hold in many real-world problems [22]–[25]. One example is Office31 [26] classification, where we have a precise model trained on tons of data collected by webcam, but we now want to build another model using a small amount of data collected from Amazon. In this case, we wish to generalize the knowledge learned from the source domain to a target task with a completely different distribution. For this kind of problem, transfer learning can deal with the limited-data issue and significantly reduce the training time.

As introduced by [9], there are three categories of transfer learning: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning. In this research, waste sorting is similar to a multi-task learning problem, which lands in the setting of inductive transfer learning. For inductive transfer learning, the source domain and the target domain usually both have labeled data. However, the target domain's training data is not always
enough, so we need to transfer the knowledge learned from the source domain. This paper implements a novel loss function with dynamic weighting and builds four different models: DDC-AlexNet, DDC-ResNet, DeepCoral-AlexNet, and DeepCoral-ResNet.

A. DDC-AlexNet

Previously, AlexNet [11] won the ILSVRC 2012 competition, achieving a top-5 test error rate of 15.3% on the ImageNet dataset. The idea of the adaptation layer was first proposed by [27], which introduced a modified feedforward neural network, the Domain Adaptive Neural Network (DaNN), with one adaptation layer. Importantly, its loss function is constructed from two parts: the general classification loss and an MMD regularizer, where the MMD loss is used to evaluate the distribution mismatch between the source and target domains. However, DaNN is a very shallow and simple model, so its performance is limited. To achieve better performance, we wish to extend the potential of DaNN to deeper networks. As illustrated in Figure 3, Deep Domain Confusion (DDC) [18], an AlexNet-based [11] convolutional neural network (CNN) with one adaptation layer, was proposed to learn a semantically meaningful and domain-invariant representation. Additionally, the evaluation metric can also be used to determine the position and the dimensionality of the adaptation layer.

[Figure 3: two parallel AlexNet streams (conv1–conv5, fc6, fc7, fc_adapt, fc8), one over labeled source images and one over unlabeled target images, tied together by a classification loss and a domain loss on the adaptation layer.]
Figure 3. Deep Domain Confusion.

DDC deploys a loss function that contains two terms, the classification loss $\mathcal{L}_C$ and the MMD constraint $\mathrm{MMD}^2$. As shown in (2), $X_S$ and $X_T$ represent the datasets from the source domain and the target domain, and $\lambda$ determines how strongly we would like to confuse the domains:

$\mathcal{L} = \mathcal{L}_C(X_L, y) + \lambda\, \mathrm{MMD}^2(X_S, X_T)$   (2)

$\mathrm{MMD}^2(X_S, X_T) = \left\| \frac{1}{n_S} \sum_{i=1}^{n_S} \phi(x_S^i) - \frac{1}{n_T} \sum_{j=1}^{n_T} \phi(x_T^j) \right\|_{\mathcal{H}}^2$   (3)
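A minimal PyTorch sketch of the DDC objective in (2)–(3). For brevity we take φ to be the identity on the adaptation-layer activations (a linear-kernel MMD); DDC itself evaluates MMD through a kernel map, so this is a simplification, not the authors' implementation.

```python
# Simplified DDC loss, eqs. (2)-(3): classification loss plus a squared
# MMD penalty between source and target adaptation-layer activations.
# Linear-kernel MMD (phi = identity) is used here for brevity.
import torch.nn.functional as F

def mmd_squared(feat_src, feat_tgt):
    # ||mean(phi(x_S)) - mean(phi(x_T))||^2 with phi = identity
    delta = feat_src.mean(dim=0) - feat_tgt.mean(dim=0)
    return (delta * delta).sum()

def ddc_loss(logits_src, labels_src, feat_src, feat_tgt, lam):
    cls_loss = F.cross_entropy(logits_src, labels_src)        # L_C(X_L, y)
    return cls_loss + lam * mmd_squared(feat_src, feat_tgt)   # eq. (2)
```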
In the original DDC paper, $\lambda$ is a fixed coefficient. However, setting a reasonable value for it is not a simple process: a greater value can lead the model to focus too much on reducing the distribution mismatch, while a smaller value might yield poor classification accuracy on the target domain because the distribution mismatch is not reduced enough. Therefore, we propose to make $\lambda$ a dynamic factor. As described in (4), it is a hyperbolic-tangent function that scales from 0 to 1. Theoretically, we wish to focus on extracting domain-invariant features in the early stage and shift the focus to enhancing the target classification accuracy at the later stage:

$\lambda = \tanh(0.02x)$   (4)
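Under our reading that x indexes the training epoch (the paper plots accuracy per epoch but does not state the unit of x explicitly), the schedule in (4) is a one-liner:

```python
import math

def domain_weight(epoch):
    # Eq. (4): lambda grows smoothly from 0 toward 1 as training
    # proceeds, assuming x is the epoch index.
    return math.tanh(0.02 * epoch)
```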
B. DDC-ResNet

DDC is a transfer learning architecture that can easily be generalized to other pre-trained DL models. In this paper, we also examined a ResNet-based DDC model, where the adaptation layer with the dynamic loss function is added after the last average-pooling layer.
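A sketch of that placement, assuming torchvision's ResNet-50; the 256-unit adaptation width is our assumption, as the paper does not report the layer's dimensionality.

```python
# Sketch: bottleneck adaptation layer inserted after ResNet's average
# pool. The 256-unit width is an assumption, not a reported value.
import torch.nn as nn
from torchvision import models

class DDCResNet(nn.Module):
    def __init__(self, num_classes=6, adapt_dim=256):
        super().__init__()
        backbone = models.resnet50(pretrained=True)
        # Everything up to and including the average-pooling layer.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.adapt = nn.Linear(backbone.fc.in_features, adapt_dim)  # fc_adapt
        self.classifier = nn.Linear(adapt_dim, num_classes)

    def forward(self, x):
        feat = self.features(x).flatten(1)
        adapt = self.adapt(feat)   # activations used for the domain loss
        return self.classifier(adapt), adapt
```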
C. DeepCoral-AlexNet

Furthermore, [19] introduced another transfer learning framework, DeepCoral, which shares a similar idea with DDC. As shown in Figure 4, it places one adaptation layer after the last fully connected layer, with a new loss function, the CORAL loss.

[Figure 4: two AlexNet streams (conv1–conv5, fc6, fc7, fc8) with shared weights, a classification loss on the source stream and a CORAL loss between the fc8 features.]
Figure 4. Deep coral with AlexNet backend.

$\ell_{\mathrm{CORAL}}$ is defined as the distance between the second-order covariances of the source and target features, as described in (5):

$\ell_{\mathrm{CORAL}} = \frac{1}{4d^2} \left\| C_S - C_T \right\|_F^2$   (5)

where $C_S$ and $C_T$ are the feature covariance matrices, $\|\cdot\|_F^2$ is the squared matrix Frobenius norm, and $d$ is the feature dimensionality.
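Equation (5) can be computed directly from a batch of features. A minimal PyTorch sketch following the Deep CORAL formulation [19], using the usual 1/(n−1) covariance estimate:

```python
# CORAL loss, eq. (5): squared Frobenius distance between source and
# target feature covariance matrices, scaled by 1 / (4 d^2).

def coral_loss(feat_src, feat_tgt):
    d = feat_src.size(1)

    def covariance(f):
        f = f - f.mean(dim=0, keepdim=True)   # center the features
        return (f.t() @ f) / (f.size(0) - 1)  # d x d covariance

    diff = covariance(feat_src) - covariance(feat_tgt)
    return (diff * diff).sum() / (4.0 * d * d)
```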

Moreover, inspired by multi-kernel MMD [28], we propose a novel distribution distance measurement, Dual Dynamic Domain Distance (4D). As demonstrated in (6), the 4D domain loss combines two different evaluation metrics, since we are not sure whether a single metric is good enough for an accurate domain distance measurement:

$4D = \frac{1}{2}\left(\ell_{\mathrm{MMD}} + \ell_{\mathrm{CORAL}}\right)$   (6)

Finally, we dynamically combine the classification loss $\ell_{\mathrm{Class}}$ and the domain loss $4D$ into the final loss function:

$\ell = \ell_{\mathrm{Class}} + \lambda\, 4D$   (7)
D. DeepCoral-ResNet

As with DDC, DeepCoral can also be generalized to other pre-trained networks. As shown in Figure 5, we extended it to ResNet by adding the adaptation layer after the last average-pooling layer.

[Figure 5: two ResNet streams (ResBlock1–ResBlock4, AvgPool, FC) with shared weights, classification losses on each stream and a CORAL loss between the FC features.]
Figure 5. Deep coral with Resnet backend.
V. EXPERIMENTAL RESULTS

A. Experimental Setup

As mentioned in Section III, we have 2754 labeled images in the source domain and 2530 labeled images in the target domain. Images in the two domains have the same set of labels but different distributions. In the experiment, we split the target dataset into Target train and Target test at a ratio of 80/20, and the total number of epochs is set to 200. Additionally, to extend the dataset even further, we applied simple data augmentation techniques to both the source data and the target data; specifically, horizontal flipping, small rotations, and the addition of Gaussian noise were performed.
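The setup above maps onto a straightforward data-loading sketch, assuming `torch.utils.data`; the dataset objects are placeholders rather than a published loader.

```python
# Sketch of the experimental setup: 80/20 split of the target set,
# 200 training epochs. Dataset objects are hypothetical placeholders.
from torch.utils.data import DataLoader, random_split

def make_loaders(source_dataset, target_dataset, batch_size=32):
    n_train = int(0.8 * len(target_dataset))
    target_train, target_test = random_split(
        target_dataset, [n_train, len(target_dataset) - n_train])
    return (DataLoader(source_dataset, batch_size, shuffle=True),
            DataLoader(target_train, batch_size, shuffle=True),
            DataLoader(target_test, batch_size))

EPOCHS = 200  # as in the paper's setup
```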

B. Results

According to Table II, compared to other existing models built on TrashNet, our transfer learning models achieve better performance in general, and DeepCoral ResNet with the novel 4D loss achieves the best testing accuracy, 96%, at 75 epochs. The only previous model that comes close to DeepCoral ResNet is the fine-tuned DenseNet model. What is more, we can see from Table II that the transfer learning models are all better than the traditional models and the conventional DL models.

Table II
TRANSFER LEARNING PERFORMANCE

Models (TL)          Epoch   Testing Accuracy
DeepCoral ResNet     75      96%
DeepCoral AlexNet    80      93%
DDC ResNet           85      95%
DDC AlexNet          75      93%
DenseNet Fine-tune   120     95%

Models (Not TL)      Epoch   Testing Accuracy
SVM                  100     63%
Inception-V1         100     89%

Furthermore, among all the models we built, DeepCoral ResNet gives the best performance, 96% testing accuracy. Additionally, as plotted in Figure 6, ResNet-based models are generally more accurate than AlexNet-based models. As shown in the figure, all four models converge around 60–80 epochs, which is considerably faster than the fine-tuned models proposed in [14]. However, TrashNet is still relatively small for DL architectures like ResNet and AlexNet: the performance of the AlexNet-based models starts dropping after 130 epochs, and we believe that the models start over-fitting from there. In contrast, the ResNet-based models remain stable through all 200 epochs.

[Figure 6: test accuracy vs. epoch (0–200) for CoralRes, CoralAlex, DDCRes, and DDCAlex.]
Figure 6. Accuracy Comparison.
To show that the 4D loss function can improve performance, we compared DeepCoral ResNet with the regular loss function against the same model with the dynamic loss function. As we can tell from Figure 7, the dynamic loss function not only converges faster but also gives a smoother curve. More importantly, the concept of the 4D loss can be generalized to combine more distribution measurements dynamically.

[Figure 7: test accuracy vs. epoch (0–200) for the dynamic and regular loss functions.]
Figure 7. Dynamic loss vs Regular loss.
VI. CONCLUDING REMARKS

First of all, recycling is an essential process for our Earth: pollution has already caused a number of species extinctions, and the number is still increasing.

Secondly, DL is one of the most powerful approaches to many computer vision tasks. However, most DL methods rely heavily on big data and computational power to deliver state-of-the-art performance; in other words, big data is not only the power of DL but also its limitation. To address this issue, transfer learning has attracted more and more attention in the past few years, and many TL algorithms have proven successful. As Andrew Ng suggested at NIPS 2016, TL will become a main direction of DL in the future.

Finally, in this waste sorting experiment, we first showed that our TL models achieve better performance than all existing models built on TrashNet. Then, the novel domain loss function 4D proposed by us showed the potential to benefit TL models significantly through a more accurate domain loss measurement. In the future, a few ideas could potentially push the results to an even higher level. First, GAN-based data augmentation might perform better than traditional data augmentation techniques. Second, other metrics that calculate the distance between two different domains could also enhance the performance. Lastly, the models built in this experiment used labeled target data for training; other TL methods do not require labeled target data for training, which might be more helpful for real-world problems that lack adequate labeled data.

ACKNOWLEDGMENT

This research was partially supported through Embry-Riddle Aeronautical University's Faculty Innovative Research in Science and Technology (FIRST) Program.

REFERENCES

[1] H. Song, R. Srinivasan, T. Sookoor, and S. Jeschke, Smart Cities: Foundations, Principles and Applications. Hoboken, NJ: Wiley, 2017.
[2] G. T. Reddy, M. P. K. Reddy, K. Lakshmanna, D. S. Rajput, R. Kaluri, and G. Srivastava, "Hybrid genetic algorithm and a fuzzy logic classifier for heart disease diagnosis," Evolutionary Intelligence, vol. 13, no. 2, pp. 185–196, 2020.
[3] H. Song, D. Rawat, S. Jeschke, and C. Brecher, Cyber-Physical Systems: Foundations, Principles and Applications. Boston, MA: Academic Press, 2016.
[4] G. Dartmann, H. Song, and A. Schmeink, Big Data Analytics for Cyber-Physical Systems: Machine Learning for the Internet of Things. Elsevier, 2019.
[5] G. T. Reddy, M. P. K. Reddy, K. Lakshmanna, R. Kaluri, D. S. Rajput, G. Srivastava, and T. Baker, "Analysis of dimensionality reduction techniques on big data," IEEE Access, vol. 8, pp. 54776–54788, 2020.
[6] Y. Sun, H. Song, A. J. Jara, and R. Bie, "Internet of things and big data analytics for smart and connected communities," IEEE Access, vol. 4, pp. 766–773, 2016.
[7] Z. Lv, H. Song, P. Basanta-Val, A. Steed, and M. Jo, "Next-generation big data analytics: State of the art, challenges, and future research topics," IEEE Transactions on Industrial Informatics, vol. 13, no. 4, pp. 1891–1899, 2017.
[8] A. Ghorani-Azam, B. Riahi-Zanjani, and M. Balali-Mood, "Effects of air pollution on human health and practical measures for prevention in Iran," Journal of Research in Medical Sciences, vol. 21, 2016.
[9] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2009.
[10] M. Yang and G. Thung, "Classification of trash for recyclability status," CS229 Project Report, vol. 2016, 2016.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015. [Online]. Available: http://arxiv.org/abs/1512.03385
[13] X. Yue, Y. Liu, J. Wang, H. Song, and H. Cao, "Software defined radio and wireless acoustic networking for amateur drone surveillance," IEEE Communications Magazine, vol. 56, no. 4, pp. 90–97, April 2018.
[14] R. A. Aral, S. R. Keskin, M. Kaya, and M. Haciomeroglu, "Classification of trashnet dataset based on deep learning models," in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 2058–2062.
[15] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," CoRR, vol. abs/1704.04861, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861
[16] S. Ruder, "An overview of gradient descent optimization algorithms," CoRR, vol. abs/1609.04747, 2016. [Online]. Available: http://arxiv.org/abs/1609.04747
[17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," CoRR, vol. abs/1409.4842, 2014. [Online]. Available: http://arxiv.org/abs/1409.4842
[18] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, "Deep domain confusion: Maximizing for domain invariance," arXiv preprint arXiv:1412.3474, 2014.
[19] B. Sun and K. Saenko, "Deep CORAL: Correlation alignment for deep domain adaptation," in European Conference on Computer Vision. Springer, 2016, pp. 443–450.
[20] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test," J. Mach. Learn. Res., vol. 13, no. 1, pp. 723–773, Mar. 2012. [Online]. Available: http://dl.acm.org/citation.cfm?id=2503308.2188410
[21] K. Team, "Classify waste category from images," 2018. [Online]. Available: https://www.kaggle.com/c/waste-classification/overview
[22] M. Fang, Y. Guo, X. Zhang, and X. Li, "Multi-source transfer learning based on label shared subspace," Pattern Recognition Letters, vol. 51, pp. 101–106, 2015.
[23] M. J. Afridi, A. Ross, and E. M. Shapiro, "On automated source selection for transfer learning in convolutional neural networks," Pattern Recognition, vol. 73, pp. 65–75, 2018.
[24] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, "Domain-adversarial training of neural networks," The Journal of Machine Learning Research, vol. 17, no. 1, pp. 2096–2030, 2016.
[25] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
[26] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in European Conference on Computer Vision. Springer, 2010, pp. 213–226.
[27] M. Ghifary, W. B. Kleijn, and M. Zhang, "Domain adaptive neural networks for object recognition," in Pacific Rim International Conference on Artificial Intelligence. Springer, 2014, pp. 898–904.
[28] A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. Fukumizu, and B. K. Sriperumbudur, "Optimal kernel choice for large-scale two-sample tests," in Advances in Neural Information Processing Systems, 2012, pp. 1205–1213.
