Manifold Regularized Convolutional Neural Network
Fuzhen Zhuang1,2, Lang Huang1(B), Jia He1,2, Jixin Ma3, and Qing He1,2

1 Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
{zhuangfz,hej,heq}@ics.ict.ac.cn, [email protected]
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 University of Greenwich, London, UK
1 Introduction
Recently, deep learning has shown great success in learning robust representations and outperforms conventional state-of-the-art methods in computer vision applications. Convolutional neural networks (CNNs) have won the ImageNet challenge, a contest based on a large-scale data set with over 1 million images, every year since 2012 [15,23]. The key to this success is that the substantially increased depth enlarges the capacity of CNNs and thus enables them to fit the data sets well.
Lang Huang—This work was finished while Lang Huang was an intern (under the supervision of Fuzhen Zhuang) at the Institute of Computing Technology, Chinese Academy of Sciences.
The works [17,24] reinforced this result by showing significant improvements of very deep neural network architectures over shallow models.
On the other hand, as the power of CNNs keeps growing, the complexity of the models increases, which in turn requires more data to avoid overfitting during training. The problem is that most data sets are not large enough for training, so the performance degrades. To take advantage of both the larger capacity of deep models and the smaller data requirement of shallow ones, a training technique called fine-tuning has been proposed. Fine-tuning adapts pretrained models, which are trained on large-scale data sets such as ImageNet, to new tasks while only slightly modifying the parameters of the pretrained models. Several studies have reported that fine-tuning obtains outstanding performance and reduces training time from 2 or 3 weeks to a few days [18,19].
Although fine-tuning can learn effective representations in various fields, its performance drops significantly when it is directly applied to transfer learning with insufficient target domain data. Transfer learning aims to improve learning performance in a target domain with little or no label information by leveraging knowledge from an auxiliary source domain. Yosinski et al. [20] pointed out that deep features transition from general to specific in the higher layers of the network. Hence, fine-tuning with source domain data alone will make the learned representation too specific to the source domain.
To address this problem, we propose a manifold regularized convolutional neural network (MRCNN) framework for transfer learning, which uses a manifold learning approach to regularize the fine-tuning process. Manifold learning approaches are widely adopted in semi-supervised or unsupervised learning and assume that data points within the same local structure are likely to have the same label [2]. Therefore, the unlabeled data in the target domain can be utilized to preserve such structure in the higher layers or the output layer by imposing manifold-based constraints. By coupling manifold regularization with fine-tuning, we expect the representations learned in the higher layers to remain general or become more specific to the target domain, so that the knowledge from the auxiliary source domain is successfully transferred.
The contributions of this paper are summarized as follows:
1. We propose an unsupervised learning framework that combines the fine-tuning technique and manifold regularization within a deep convolutional neural network for transfer learning.
2. We conduct extensive experiments on several data sets, and statistical evidence shows the effectiveness of our framework.
3. Furthermore, we investigate the impact of fine-tuning and manifold regularization on knowledge transfer.
2 Preliminary Knowledge
In this section, we first review the convolutional neural network architecture, the fine-tuning technique and manifold regularization, which serve as the preliminary knowledge of this paper.
2.1 Convolutional Neural Network
Deep learning approaches have been widely adopted in the last decade [3]. In particular, the convolutional neural network (CNN) was proposed to learn robust representations and achieves satisfying results in computer vision [15,17].
A typical CNN is a feed-forward neural network stacked with multiple convolutional (conv) layers, pooling layers (max-pool or average-pool), fully connected (fc) layers and a classifier on top of them. Both conv and fc layers learn a non-linear mapping $h^l$ in the $l$th layer, with a slight difference. The mappings of the conv and fc layers can be formalized as
$h^l = \sigma(w^l * h^{l-1} + b^l)$,   (1)
$h^l = \sigma(w^l h^{l-1} + b^l)$,   (2)
respectively, where $w^l$ is the weight matrix (the kernel in conv layers) and $b^l$ is the bias of the $l$th hidden layer, '*' denotes the convolution operation and $\sigma(\cdot)$ is a non-linear activation function, e.g., the Rectified Linear Unit (ReLU) $\sigma(x) = \max(0, x)$ [8] for hidden layers or the softmax function $\sigma(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$ for the output layer. Pooling layers perform a downsampling operation along the spatial dimensions, which is useful for reducing computational cost and providing robustness of the learned representation [26].
Given a data set $X$ with labels $y$, the objective to minimize in a CNN is
$L = \frac{1}{n} \sum_{i=1}^{n} C(h^l_i, y_i)$,   (3)
where $n$ is the size of the data set $X$, $l$ is the depth of the model, $h^l$ is the learned representation of the $l$th hidden layer formulated by Eq. (1) or (2), and $C(\cdot, \cdot)$ is the cross entropy loss function.
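For concreteness, the following minimal NumPy sketch illustrates the fc mapping of Eq. (2), the ReLU and softmax activations, and the cross entropy objective of Eq. (3); the layer sizes and variable names are illustrative and not those of the paper's actual implementation.

```python
import numpy as np

def relu(x):
    # ReLU activation for hidden layers: sigma(x) = max(0, x)
    return np.maximum(0.0, x)

def softmax(x):
    # Softmax for the output layer, computed row-wise with a stability shift
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fc_layer(h_prev, w, b, activation=relu):
    # Fully connected mapping of Eq. (2): h^l = sigma(w^l h^{l-1} + b^l)
    return activation(h_prev @ w + b)

def cross_entropy(probs, labels):
    # Objective of Eq. (3): average cross entropy over the n training examples
    n = labels.shape[0]
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

# Toy example: 4 samples, 8-dimensional features, binary output.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 2)), np.zeros(2)
h1 = fc_layer(x, w1, b1)                          # hidden fc layer with ReLU
probs = fc_layer(h1, w2, b2, activation=softmax)  # output layer with softmax
loss = cross_entropy(probs, np.array([0, 1, 0, 1]))
```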
2.2 Fine-Tuning
According to [20], the representations in the earlier layers of deep CNNs trained on large-scale data sets are general across different tasks. Hence, it is beneficial for both performance and training time to use those weights either as an initialization or as a feature extractor. Fine-tuning is a training procedure that adapts pretrained models to new tasks.
A standard fine-tuning procedure usually re-initializes the top fc layers to match the dimensions, since the sizes of the data sets differ between tasks. For the conv layers, there are two major strategies: (1) fix some earlier layers and only fine-tune the higher layers when data is very limited [18]; (2) back-propagate through all layers when enough data is available [19].
Note that it is common to use a smaller learning rate to avoid overfitting. In this paper, we mainly follow the first fine-tuning strategy; the details are presented in later sections.
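As a rough illustration of strategy (1), the sketch below assigns per-block learning rates: zero for the frozen blocks, a reduced rate for the gently tuned block, and the full rate for the re-initialized fc layers. The block and layer names and the 0.1 scaling factor are assumptions for illustration, not values reported in the paper.

```python
# Hypothetical layer grouping for a VGG19-style network; names are illustrative.
conv_blocks = ["conv1", "conv2", "conv3", "conv4", "conv5"]
fc_layers = ["fc6", "fc7", "fc8"]

def learning_rate_for(layer_name, base_lr=1e-3):
    """Per-layer learning rates implementing fine-tuning strategy (1):
    frozen early conv blocks, gently tuned conv5, freshly trained fc layers."""
    if layer_name in ("conv1", "conv2", "conv3", "conv4"):
        return 0.0              # fixed: used purely as a feature extractor
    if layer_name == "conv5":
        return 0.1 * base_lr    # fine-tuned with a smaller learning rate
    return base_lr              # re-initialized fc layers, full learning rate

schedule = {name: learning_rate_for(name) for name in conv_blocks + fc_layers}
```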
2.3 Manifold Regularization
The similarity between two instances is measured by the cosine similarity
$\cos(x_i, x_j) = \frac{x_i^{\top} x_j}{\sqrt{x_i^{\top} x_i} \cdot \sqrt{x_j^{\top} x_j}}$,   (5)
where $x_i$ and $x_j$ are column vectors and $\top$ denotes matrix transpose. Let $D = \mathrm{diag}(\sum_j M_{[i,j]})$ and $L = D - M$ be the Laplacian matrix of the similarity matrix $M$; then the manifold regularization term $\Gamma(X)$ can be written as
$\Gamma(X) = \sum_{i,j} M_{[i,j]} \, \|f(x_i) - f(x_j)\|^2 = \mathrm{trace}(F^{\top} L F)$,   (6)
where $f(\cdot)$ is a mapping function, $F_{[i,\cdot]}$ denotes the $i$th row of $F$ and $F_{[i,\cdot]} = f(x_i)$.
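A small NumPy sketch of these quantities is given below: the cosine similarity of Eq. (5), a k-nearest-neighbor similarity matrix M (the kNN construction is an assumption, since the formal definition of M is not reproduced here), and the manifold term of Eq. (6) via the graph Laplacian.

```python
import numpy as np

def cosine_similarity(X):
    # Eq. (5): cos(x_i, x_j) = x_i^T x_j / (||x_i|| * ||x_j||)
    norms = np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    Xn = X / norms
    return Xn @ Xn.T

def knn_graph(X, k):
    # Symmetric k-nearest-neighbor similarity matrix M (assumed construction)
    S = cosine_similarity(X)
    n = S.shape[0]
    M = np.zeros_like(S)
    for i in range(n):
        idx = np.argsort(-S[i])[1:k + 1]   # most similar points, skipping the point itself
        M[i, idx] = S[i, idx]
    return np.maximum(M, M.T)              # symmetrize the graph

def manifold_term(F, M):
    # Eq. (6): Gamma(X) = sum_ij M_ij ||f(x_i) - f(x_j)||^2 = trace(F^T L F)
    D = np.diag(M.sum(axis=1))
    L = D - M                              # graph Laplacian
    return np.trace(F.T @ L @ F)
```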
3 MRCNN Model
In this paper, we focus on unsupervised transfer learning, i.e., only labeled data are available in the source domain and only unlabeled data in the target domain. Thus, we denote the source domain as $\mathcal{D}_s = \{x_i^{(s)}, y_i^{(s)}\}_{i=1}^{n_s}$ with $n_s$ labeled instances, and the target domain as $\mathcal{D}_t = \{x_i^{(t)}\}_{i=1}^{n_t}$ with $n_t$ unlabeled instances.
Now we are ready to present the details of our framework. MRCNN is based on the VGG19 network architecture [17], which is composed of 16 conv layers, 5 max-pooling layers, 3 fc layers and 1 softmax layer. Following the notation in [17], the 16 conv layers are divided into 5 blocks by the max-pooling layers, i.e., the conv 1 block consists of the conv layers before the first max-pooling layer, the conv 2 block consists of the conv layers between the first and second max-pooling layers, and so on.
We adopt the weights of the VGG19 model pretrained on ImageNet for all conv layers and randomly initialize the fc layers; these layers are shared by both the source and target domains. We then further extend the VGG net by integrating manifold regularization, which is imposed on the target domain to enforce similar instances to have the same labels, and a cross entropy loss on the source domain data to incorporate label information. By preserving the manifold structure in this manner, we hope the representations learned in the higher layers generalize well. Figure 1 illustrates the MRCNN framework. Note that conv 1–conv 5 each denotes
a convolutional block, not a single layer. Due to the limited data and the fact that the learned representations transition from general to specific along the network, we adopt the following three strategies: (1) we randomly initialize the 3 fc layers, because fc layers require strict dimensional matching while the sizes of the input data sets differ; (2) the conv 5 block, containing 4 conv layers, is carefully fine-tuned, since the representation in this block is still transferable and needs only slight tuning; in other words, we apply a smaller learning rate to this block; (3) the conv 1–conv 4 blocks, consisting of 12 conv layers, are fixed and used as a feature extractor, since the representations in these blocks are general.
Let $w$ and $b$ denote the collections of weights and biases of the MRCNN and let $L$ denote the Laplacian matrix of the target domain; the overall objective is written as
$\min_{w,b} \; \frac{1}{n_s} \sum_{i=1}^{n_s} C\big(h^l_i, y_i^{(s)}\big) + \alpha \, \Gamma\big(k, L, w, b, x^{(t)}\big) + \beta \, \Omega(w, b)$.   (7)
The first term of Eq. (7) is the cross entropy loss between the output logits and the labels of the source domain, as presented in Eq. (3). The second term $\Gamma(k, L, w, b, x^{(t)})$ is the manifold regularization described in Eq. (6), imposed on the target domain, where $k$ is the number of nearest neighbors. The last term of Eq. (7) is the $L_2$ norm of the weight and bias matrices, which controls the complexity of the network structure and is defined as
$\Omega(w, b) = \sum_{i=1}^{l} \big(\|w_i\|^2 + \|b_i\|^2\big)$,   (8)
where $l$ is the depth of the neural network, i.e., 19 in this paper. $\alpha$ and $\beta$ are hyperparameters that balance the importance of manifold regularization and model complexity in the overall framework.
The objective in Eq. (7) can be minimized by gradient descent. Since we implement MRCNN with the deep learning library TensorFlow [25], which automatically computes the gradients and derives the solution, the update rules for the parameters are omitted here.
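For clarity, the sketch below shows how the three terms of Eq. (7) are combined into a single scalar objective. In the actual implementation this composition is built as a TensorFlow graph and differentiated automatically; here plain NumPy is used and the cross entropy and manifold terms are assumed to be precomputed scalars.

```python
import numpy as np

def mrcnn_objective(ce_source, gamma_target, weights, biases, alpha, beta):
    """Assemble the overall objective of Eq. (7):
    source-domain cross entropy + alpha * manifold term + beta * Omega(w, b)."""
    # Eq. (8): L2 penalty over all weight and bias matrices of the network
    omega = sum(np.sum(w ** 2) for w in weights) + sum(np.sum(b ** 2) for b in biases)
    return ce_source + alpha * gamma_target + beta * omega
```

Gradient descent on this composite objective then updates the fc layers and the conv 5 block at their respective learning rates, while conv 1–conv 4 remain fixed as a feature extractor.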
4 Experiments
To evaluate the effectiveness of our proposed framework MRCNN, we conduct
experiments on two image data sets and compare our model with several state-
of-the-art baseline methods.
CIFAR-100. The CIFAR-100 data set1 has 100 classes grouped into 20 superclasses [5], and each class contains 600 images. Among these 20 superclasses, we choose two of them, 'fruit and vegetables' and 'household electrical devices', and take 'fruit and vegetables' as the positive class and 'household electrical devices' as the negative one. Each of these two superclasses contains 5 classes. To construct transfer learning classification problems, we randomly choose one class from 'fruit and vegetables' and one from 'household electrical devices' as the source domain, and then choose another class of 'fruit and vegetables' and another of 'household electrical devices' from the remaining classes to construct the target domain. In this way, we obtain 400 ($P_5^2 \cdot P_5^2$) classification problems.
Corel. The Corel data set2 consists of two different top categories, 'flower' and 'traffic' [9]. Each top category further includes four subcategories. We take 'flower' as the positive class and 'traffic' as the negative one. By following the same construction procedure, we obtain 144 ($P_4^2 \cdot P_4^2$) transfer learning classification problems.
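The task construction described above amounts to enumerating ordered pairs of classes within each superclass, as in the short sketch below; the class names follow the CIFAR-100 label list for the two chosen superclasses, and the Corel case is analogous with four subcategories per top category.

```python
from itertools import permutations

# Classes of the two CIFAR-100 superclasses used here (fine labels).
positive_classes = ["apple", "mushroom", "orange", "pear", "sweet_pepper"]
negative_classes = ["clock", "keyboard", "lamp", "telephone", "television"]

tasks = []
for pos_src, pos_tgt in permutations(positive_classes, 2):
    for neg_src, neg_tgt in permutations(negative_classes, 2):
        # Source domain: one positive and one negative class;
        # target domain: a different positive and a different negative class.
        tasks.append({"source": (pos_src, neg_src), "target": (pos_tgt, neg_tgt)})

assert len(tasks) == 20 * 20   # P(5,2) * P(5,2) = 400 tasks
```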
Note that we do not perform any data augmentation on these data sets; we only subtract the mean for the CNN-based methods and normalize the features to [0, 1] for the other compared competitors.
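A minimal sketch of this preprocessing is shown below; whether the mean is taken per channel or per pixel is an assumption, since the paper only states that the mean is subtracted.

```python
import numpy as np

def preprocess_for_cnn(images):
    # CNN-based methods: subtract the (assumed per-channel) mean, no augmentation.
    images = images.astype(np.float32)          # shape (N, H, W, C)
    return images - images.mean(axis=(0, 1, 2), keepdims=True)

def preprocess_for_baselines(features):
    # Other competitors: min-max normalize each feature to [0, 1].
    fmin = features.min(axis=0, keepdims=True)
    fmax = features.max(axis=0, keepdims=True)
    return (features - fmin) / (fmax - fmin + 1e-12)
```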
The compared baseline methods are listed below.
– Logistic Regression (LR), one of the most widely applied supervised learning algorithms, without any transfer learning technique.
– Transductive Support Vector Machine (TSVM) [1], a transductive learning algorithm that incorporates unlabeled target domain data. However, TSVM assumes that the labeled source domain data and the unlabeled target domain data follow the same distribution.
– Transfer Component Analysis (TCA) [12], which aims at learning a low-dimensional representation for transfer learning. We use a Support Vector Machine (SVM) as its base classifier in this paper.
1 https://www.cs.toronto.edu/~kriz/cifar.html.
2 http://archive.ics.uci.edu/ml/datasets/Corel+Image+Features.
– Transfer Learning with Deep Autoencoders (TLDA) [22], which uses deep autoencoders to find a proper embedding space for both the source and target domains, while their distributions are explicitly enforced to be similar.
– Standard VGG net [17], which is fine-tuned on the source domain but without manifold regularization. We denote it as VGG.
4.4 Results
In total, we construct 400 classification tasks for CIFAR-100 and 144 classification tasks for Corel. To make a comprehensive comparison, we further divide the classification tasks into two groups for each data set according to the accuracy of LR. Specifically, we first run the LR model on all classification tasks and then split them into two groups: the first group contains the tasks on which LR achieves an accuracy lower than 70%, while the other contains those with an LR accuracy higher than 70%. Finally, we report the average results of the two groups; all results are shown in Table 1. Left and Right respectively denote the average performance on the tasks with accuracies lower and higher than 70%, and Total denotes the average result over all tasks. Note that a lower LR accuracy indicates a more difficult transfer, and vice versa.
From these results, we have the following observations:
– TSVM is better than LR, which indicates the importance of considering unlabeled data. However, TSVM cannot achieve satisfying results, since it assumes that the labeled and unlabeled data follow the same distribution. TCA performs even worse than LR on the Corel data set, which may reflect the difficulty of transfer on the constructed classification tasks. TLDA delivers the best results among the above methods in most cases, which reveals the power of deep models.
3 http://vikas.sindhwani.org/svmlin.html.
4 The code is available at https://github.com/LayneH/MRCNN.
– The CNN-based methods, i.e., VGG and MRCNN, significantly outperform all other conventional methods by a large margin. We attribute this to the fine-tuning procedure, which fixes the earlier 4 conv blocks and only back-propagates through the last layers to preserve their generality. This also indicates the necessity of adopting deep learning models for classification.
– Among the deep learning methods, MRCNN achieves considerable improvement over the standard VGG net with fine-tuning. This validates that manifold regularization successfully guides the training process to obtain better representations.
– Overall, the incorporation of manifold regularization leads to the success of MRCNN. In other words, MRCNN performs the best on all groups.
The confusion of the nearest neighbors on the target domain is measured by the fraction of nearest neighbors whose label differs from that of their anchor instance, where $1\{\cdot\}$ is the indicator function, $\#nn_i$ is the number of nearest neighbors of $x_i^{(t)}$, $nn_{i,j}$ is the $j$th nearest neighbor of $x_i^{(t)}$ and $\mathrm{label}(x)$ is the label of $x$. Intuitively, the higher this value is, the more confusing knowledge is introduced by imposing manifold regularization. We sort the classification problems according to the values of this measure on the target domains and show how it influences the classification accuracy in Fig. 2(b). Moreover, we group the tasks into 3 groups according to these values: the first group consists of the tasks with values below 0.1, the second of those with values between 0.1 and 0.15, and the rest form the third group. The average accuracy of each group is presented in Table 2.
From Fig. 2(b) and Table 2, we find that the classification accuracy of all methods substantially drops as this value grows. The reason may be that the positive and negative instances in these classification tasks are similar and hence not easy to separate. The above results also reveal that the confusion of the nearest neighbors is the key factor influencing the performance of MRCNN. Hence, to further generalize MRCNN to transfer learning tasks, one crucial problem is how to obtain correct nearest neighbors.
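Since the formal definition of this confusion measure is not reproduced above, the following sketch implements it as described verbally: the fraction of target-domain nearest neighbors whose label differs from that of their anchor instance; the exact normalization is an assumption.

```python
import numpy as np

def nn_confusion(labels, neighbor_idx):
    """Sketch of the nearest-neighbor confusion measure described above.
    labels: (n,) true labels of the target-domain instances
    neighbor_idx: (n, k) indices of each instance's k nearest neighbors
    Returns the fraction of neighbors whose label differs from their anchor's.
    """
    n, k = neighbor_idx.shape
    # 1{label(nn_{i,j}) != label(x_i)} for every instance i and neighbor j
    mismatches = labels[neighbor_idx] != labels[:, None]
    return mismatches.sum() / (n * k)
```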
5 Related Work
Transfer learning improves learning in a new domain by transferring knowledge from an auxiliary source domain [7,11]. It has drawn much attention in the past decades for its potential to ease the pain of manual labeling. Feature-based approaches are among the most widely proposed; they aim to learn a good feature representation for both the source domain and the target domain by reducing the difference between the domains or by integrating regularization [10,12,14,16]. Among feature-based transfer learning methods, several have been proposed to reduce the domain discrepancy explicitly. For example, transfer component analysis (TCA) [12] aims to minimize the difference between the domain distributions in a reproducing kernel Hilbert space, and [10] tries to find a subspace where training and testing samples are approximately i.i.d. by integrating a Bregman divergence-based regularization between the domain distributions. One crucial problem is that most of these methods only adopt shallow representation models to reduce the domain discrepancy, which limits their ability to generalize to various tasks.
Deep learning methods have shown their potential to learn effective and robust representations in recent years. To enjoy such benefits, several frameworks have been introduced. Stacked Denoising Autoencoders (SDAEs) [13] improve the effectiveness of the representations learned by Denoising Autoencoders (DAEs) [4] by extending the depth of DAEs, i.e., stacking multiple DAEs within one framework. [22] further couples SDAEs with feature-based transfer learning approaches.
6 Conclusion
References
1. Joachims, T.: Transductive inference for text classification using support vector machines. In: ICML, pp. 200–209 (1999)
2. Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: a geometric frame-
work for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7,
2399–2434 (2006)
3. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with
neural networks. Science 313(5786), 504–507 (2006)
4. Vincent, P., Larochelle, H., Bengio, Y., et al.: Extracting and composing robust
features with denoising autoencoders. In: Proceedings of the 25th International
Conference on Machine Learning, pp. 1096–1103. ACM (2008)
5. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images
(2009)
6. Wu, J., Xiong, H., Chen, J.: Adapting the right measures for k-means clustering.
In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 877–886. ACM (2009)
7. Torrey, L., Shavlik, J.: Transfer learning. Handb. Res. Mach. Learn. Appl. Trends:
Algorithms Methods Tech. 1, 242 (2009)
8. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann
machines. In: Proceedings of the 27th International Conference on Machine Learn-
ing (ICML 2010), pp. 807–814 (2010)
9. Zhuang, F., Luo, P., Xiong, H., et al.: Cross-domain learning from multiple sources:
a consensus regularization perspective. IEEE Trans. Knowl. Data Eng. 22(12),
1664–1678 (2010)
10. Si, S., Tao, D., Geng, B.: Bregman divergence-based regularization for transfer
subspace learning. IEEE Trans. Knowl. Data Eng. 22(7), 929–942 (2010)
11. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng.
22(10), 1345–1359 (2010)
12. Pan, S.J., Tsang, I.W., Kwok, J.T., et al.: Domain adaptation via transfer compo-
nent analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2011)
13. Vincent, P., Larochelle, H., Lajoie, I., et al.: Stacked denoising autoencoders: learn-
ing useful representations in a deep network with a local denoising criterion. J.
Mach. Learn. Res. 11, 3371–3408 (2010)
14. Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment
classification: a deep learning approach. In: Proceedings of the 28th International
Conference on Machine Learning (ICML 2011), pp. 513–520 (2011)
15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems, pp. 1097–1105 (2012)
16. Chen, M., Xu, Z., Weinberger, K., et al.: Marginalized denoising autoencoders for
domain adaptation. arXiv preprint arXiv:1206.4683 (2012)
17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014)
18. Hoffman, J., Guadarrama, S., Tzeng, E.S., et al.: LSDA: large scale detection
through adaptation. In: Advances in Neural Information Processing Systems, pp.
3536–3544 (2014)
19. Sharif Razavian, A., Azizpour, H., Sullivan, J., et al.: CNN features off-the-shelf:
an astounding baseline for recognition. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, pp. 806–813 (2014)
20. Yosinski, J., Clune, J., Bengio, Y., et al.: How transferable are features in deep
neural networks?. In: Advances in Neural Information Processing Systems, pp.
3320–3328 (2014)
21. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint
arXiv:1412.6980 (2014)
22. Zhuang, F., Cheng, X., Luo, P., et al.: Supervised representation learning: transfer
learning with deep autoencoders. In: IJCAI, pp. 4119–4125 (2015)
23. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition
challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
24. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 770–778 (2016)
25. Abadi, M., Agarwal, A., Barham, P., et al.: TensorFlow: large-scale machine learning
on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
26. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge
(2016)