Transfer Learning for Visual Categorization: A Survey
Abstract— Regular machine learning and data mining techniques study the training data for future inferences under the major assumption that the future data are within the same feature space or have the same distribution as the training data. However, due to the limited availability of human-labeled training data, training data that stay in the same feature space or have the same distribution as the future data cannot be guaranteed to be sufficient enough to avoid the over-fitting problem. In real-world applications, apart from data in the target domain, related data in a different domain can also be included to expand the availability of our prior knowledge about the target future data. Transfer learning addresses such cross-domain learning problems by extracting useful information from data in a related domain and transferring it for use in target tasks. In recent years, with transfer learning being applied to visual categorization, some typical problems, e.g., view divergence in action recognition tasks and concept drifting in image classification tasks, can be efficiently solved. In this paper, we survey state-of-the-art transfer learning algorithms in visual categorization applications, such as object recognition, image classification, and human action recognition.

Index Terms— Action recognition, image classification, machine learning, object recognition, survey, transfer learning, visual categorization.

Manuscript received December 29, 2012; revised October 17, 2013, January 30, 2014, and May 26, 2014; accepted June 3, 2014. Date of publication July 1, 2014; date of current version April 15, 2015. This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2012CB316400, in part by the University of Sheffield, Sheffield, U.K., and in part by the National Natural Science Foundation of China under Grants 61125106 and 61072093.
L. Shao is with the College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China, and also with the Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield S1 3JD, U.K. (e-mail: [email protected]).
F. Zhu is with the Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield S1 3JD, U.K. (e-mail: [email protected]).
X. Li is with the Center for OPTical IMagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2014.2330900

I. INTRODUCTION

IN THE past few years, the computer vision community has witnessed a significant number of applications in video search and retrieval, surveillance, robotics, and so on. Regular machine learning approaches [1]–[7] have achieved promising results under the major assumption that the training and testing data stay in the same feature space or share the same distribution. However, in real-world applications, due to the high price of human manual labeling and environmental restrictions, sufficient training data belonging to the same feature space or the same distribution as the testing data may not always be available. Typical examples are [8]–[11], where only one action template is provided for each action class for training, and [12], where training samples are captured from a different viewpoint. In such situations, regular machine learning techniques are very likely to fail. This reminds us of the capability of the human vision system. Given the gigantic geometric and intraclass variabilities of objects, humans are able to learn tens of thousands of visual categories in their lives, which leads to the hypothesis that humans achieve such a capability through accumulated information and knowledge [13]. It is estimated that there are about 10–30 thousand object classes in the world [14] and that children can learn 4–5 object classes per day [13]. Given the limited number of objects that a child can see within a day, learning new object classes from large amounts of corresponding object data is not possible. Thus, it is believed that the existing knowledge gained from previously known objects assists the new learning process through its connections with the new object categories. For example, if we did not know what a watermelon is, we would only need one training sample of watermelons, together with our previous knowledge of melons (circular shapes, the green color, and so on), to remember the new object category watermelon. Transfer learning mimics the human vision system by making use of sufficient amounts of prior knowledge in other related domains when executing new tasks in the given domain. In transfer learning, both the training data and the testing data can belong to two types of domains: 1) the target domain and 2) the source domain. The target domain contains the testing instances, which constitute the task of the categorization system, and the source domain contains training instances, which are under a different distribution from the target domain data. In most cases, there is only one target domain for a transfer learning task, while either a single or multiple source domains can exist. For example, in [15], action recognition is conducted across data sets from different domains, where the KTH data set [16], which has a clean background and limited viewpoint and scale changes, is set as the source data set, and the Microsoft research action data set1 and the TRECVID surveillance data [17], which are captured from realistic scenarios, are used as the target data set. In [18], the source and target data sets are chosen from different TV program channels for the task of video concept detection.

1 http://research.microsoft.com/∼zliu/ActionRecoRsrc

Transfer learning can be considered as a special learning paradigm where part or all of the training data used are
2162-237X © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Universidad Tecnologica de Panama. Downloaded on January 29,2025 at 22:46:08 UTC from IEEE Xplore. Restrictions apply.
1020 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 5, MAY 2015
Fig. 1. Basic frameworks of traditional machine learning approaches and knowledge transfer approaches. Regular machine learning approaches can only handle the situation in which the testing samples and training samples are under the same distribution. Transfer learning approaches, on the other hand, have to deal with the data distribution mismatch problem through specific knowledge transfer methods, e.g., mining the shared patterns from data across different domains.
under a different distribution from the testing data. To understand the significance of knowledge transfer in terms of visual learning problems, the literature (see [19]–[21]) has concluded three general issues regarding the transfer process: 1) when to transfer; 2) what to transfer; and 3) how to transfer. First, when to transfer covers the issues of whether transfer learning is necessary for a specific learning task and whether the source domain data are related to the target domain data. In the scenarios of [22]–[24], where training samples are sufficient and impressive performance can be achieved while staying constrained to the target domains, including another domain as the source domain becomes superfluous. Moreover, a variety of divergence levels exist across different pairs of source domain and target domain data; brute-forcing the knowledge from the source domain into the target domain irrespective of their divergence would cause certain performance degeneration or, in even worse cases, break the original data consistency in the target domain. Second, the answer to what to transfer can be summarized in three aspects: 1) inductive transfer learning, where all the source domain instances and their corresponding labels are used for knowledge transfer; 2) instance transfer learning, where only the source domain instances are used; and 3) parameter transfer learning, where, in addition to the source domain instances and labels, some parameters of prelearned models from the source domain are utilized to help improve the performance in the target domain. Finally, how to transfer includes all the specific transfer learning techniques, and it is also the most important part that has been studied in the transfer learning literature. Many transfer learning techniques have been proposed, e.g., in [25]–[27], where knowledge transfer is based on the nonnegative matrix trifactorization framework, and in [28], where the transfer learning phase is via dimensionality reduction.

We illustrate the basic frameworks of traditional machine learning approaches and knowledge transfer approaches in Fig. 1. For traditional machine learning approaches, the ideal choice of the training set to predict a testing instance car should contain cars. However, in the case of knowledge transfer, the training set can instead contain some relevant categories rather than cars, e.g., wheels, which are similar to the wheels of cars; bicycles, which share the knowledge of wheels with the car wheels; or even some seemingly irrelevant objects, e.g., laptops and birds, which appear to have no connections with cars, but actually share certain edges or geometrical layouts with local parts of a car image.

As the age of big data has come, transfer learning can provide more benefits by solving the target problem with more relevant data. Thus, it is believed that more applications of transfer learning will emerge in future research. This survey aims to give a comprehensive overview of transfer learning techniques for visual categorization tasks, so that readers can use the analysis and discussions in this survey to understand how transfer learning can be applied to visual categorization tasks or to solve their own problems with a suitable transfer learning method. Visual categorization tasks possess some unique characteristics due to certain visual properties that can potentially be used in the training process, e.g., the appearance or shape of an object part, the local symmetries of an object, and the structural layout. All these unique properties can be employed when designing transfer learning algorithms, which makes our work different from that of [19] and [29], where the former focuses on classification, regression, and clustering problems related to data mining tasks and the latter focuses on reinforcement learning, which addresses problems with only limited environmental feedback rather than correctly labeled examples.

The remaining part of this survey is structured as follows. An overview is given in Section II. In Sections III and IV, two transfer learning categories, which execute knowledge transfer through feature representations and classifiers, respectively, are discussed in detail, answering the problems of what to transfer and how to transfer. In Section V, the model selection methods from multiple source domains, i.e., when to transfer, are discussed. Evaluation, analysis, and discussions of the stated transfer learning methods are given in Section VI. Finally, the conclusions are drawn in Section VII.
SHAO et al.: TRANSFER LEARNING FOR VISUAL CATEGORIZATION 1021
II. OVERVIEW

A. Developing Interests in Transfer Learning

Dating from the raising of its notion in the last century, transfer learning (also known as cross-domain learning, domain transfer, and domain adaptation) has a long history of being studied as a particular machine learning technique. In recent years, with the information explosion on the Internet (e.g., audio, images, and videos) and the growing demands on target tasks in terms of accuracies, data scales, and computational efficiencies, transfer learning approaches have begun to attract increasing interest from all research areas in pattern recognition and machine learning. When regular machine learning techniques reach their limits, transfer learning opens the flow of a new stream that could fundamentally change the way we learn things and the way we treat classification or regression tasks. Along with this flow, some workshops and tutorials have been held (such as the NIPS 1995 postconference workshop2 in the machine learning and data mining areas), and another transfer learning survey is given in [29] for reinforcement learning. In this survey, we focus on the applications of transfer learning techniques to visual categorization, including action recognition, object recognition, and image classification.

2 https://nips.cc/Conferences/2005/Workshops/

B. Notations and Issues

Some general notations are defined as follows for later usage: let $D^T = D_l^T \cup D_u^T$ denote the target domain data, where the partially labeled parts are denoted by $D_l^T$ and the unlabeled parts are denoted by $D_u^T$. In addition to the target domain data, a set of auxiliary data is seen as the source domain data, which is semilabeled or fully labeled and has the representation $D^s = \{(x_i, y_i)\}_{i=1}^{a}$ in the single source case, and $D_1^s, D_2^s, \ldots, D_M^s$ with $D_k^s = \{(x_i^k, y_i^k)\}_{i=1}^{N_a^k}$ in the multiple source case. Here, $x_i \in \mathbb{R}^d$ is the $i$th feature vector, where $d$ denotes the data dimension, and $y_i$ denotes the class label of the $i$th sample.

According to prior proposals, common issues regarding knowledge transfer are twofold. First, the auxiliary samples are typically treated without accounting for their mutual dependency during adaptation, which may cause the adapted data to be arbitrarily distributed, and the structural information beyond single data samples of the auxiliary data may become undermined. Second, during adaptation, noises and particularly possible outliers from the auxiliary domains are blindly forced onto the target domain [30].

When transferring knowledge from the auxiliary domains to the target domain, it is crucial to know the distribution similarities between the target domain data and each source domain's data. So far, the most common criterion to measure the distribution similarity of two domains is a nonparametric distance metric named the maximum mean discrepancy (MMD). The MMD was proposed in [31], and it compares data distributions in the reproducing kernel Hilbert space

$$\mathrm{Dist}_k(D^s, D^T) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi\left(x_i^s\right) - \frac{1}{n_T} \sum_{i=1}^{n_T} \phi\left(x_i^T\right) \right\|^2 \quad (1)$$

where $\phi(\cdot)$ is the feature space mapping function.

In the literature, transfer learning techniques are categorized according to a variety of taxonomies. In [19], considering the tasks allocated to the target domain and auxiliary domains and the availability of sample labels within the target domain and auxiliary domains, transfer learning techniques are first grouped as inductive transfer learning, transductive transfer learning, and unsupervised transfer learning, upon which they are further categorized as instance transfer, feature representation transfer, parameter transfer, and relational knowledge transfer within each initial partition. Fig. 2 shows five ways of differentiating existing knowledge transfer approaches for visual categorization. In this survey, inheriting the concepts from the computer vision community, we simply categorize transfer learning techniques into feature representation level knowledge transfer and classifier level knowledge transfer.
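To make (1) concrete: with a kernel $k(\cdot,\cdot)$ inducing the mapping $\phi$, the squared MMD expands via the kernel trick into mean(K_ss) − 2·mean(K_sT) + mean(K_TT), so it can be estimated without ever forming $\phi$ explicitly. The following minimal sketch is our own illustration (an RBF kernel and a biased estimator that keeps the diagonal terms), not code from any of the surveyed works:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    # Squared MMD of (1) via the kernel trick:
    # ||mean phi(x^s) - mean phi(x^T)||^2
    #   = mean(Kss) - 2 * mean(Kst) + mean(Ktt).
    Kss = rbf_kernel(Xs, Xs, gamma)
    Kst = rbf_kernel(Xs, Xt, gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma)
    return Kss.mean() - 2.0 * Kst.mean() + Ktt.mean()

rng = np.random.default_rng(0)
# Same distribution -> near-zero MMD; mean-shifted distribution -> larger MMD.
same = mmd2(rng.normal(0, 1, (200, 3)), rng.normal(0, 1, (200, 3)))
shifted = mmd2(rng.normal(0, 1, (200, 3)), rng.normal(2, 1, (200, 3)))
```

A small MMD thus signals that a candidate source domain is distributed similarly to the target domain, which is exactly how the criterion is used by the methods discussed later.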
III. FEATURE REPRESENTATION TRANSFER

Feature representation level knowledge transfer is a popular transfer learning category that maps the target domain to the source domains by exploiting a set of meticulously manufactured features. Through this type of feature representation level knowledge transfer, the data divergence between the target domain and the source domains can be significantly reduced so that the performance of the task in the target domain is improved. Most existing transductive features are designed for specific domains and would not perform optimally across different data types. Thus, we review the feature level knowledge transfer techniques according to two data types: 1) cross-domain knowledge transfer and 2) cross-view knowledge transfer.

A. Cross-Domain Knowledge Transfer

In the cross-domain setting, the gap between the source domain data and the target domain data varies from images to videos and from objects to edges. According to the degree of data divergence, different approaches have been proposed. In [15], knowledge transfer is made between the KTH data set [16], the TRECVID data set [17], and the Microsoft research action data set II (MSRII), where the KTH data set is seen as the target domain and both the TRECVID data set and the MSRII data set are used as the source domains. The KTH data set is limited to clean backgrounds and a single actor, and each video sequence exhibits one individual action from the beginning to the end. On the other hand, the TRECVID data set and the MSRII data set are captured from realistic scenarios, with cluttered backgrounds and multiple actors in each video sequence. To take advantage of the labeled training data from both the target domain and the source domain, Daumé [32] proposed the feature replication (FR) method, which uses augmented features for training. Inspired by [33], which applies the Gaussian mixture model (GMM) to model the visual similarities between images or videos, the work in [15] models the spatial temporal interest points (STIPs) with the GMM and introduces a prior distribution over the GMM parameters to generate probabilistic representations of the original STIPs. Such representations can accomplish the adaptation from the source domains to the target domain. The basic setting of [34] assumes that there are labeled training data in the source domain, but no labeled training data in the target domain. Furthermore, the activities in the source domain and the target domain do not overlap, so that traditional supervised learning methods cannot be applied in this scenario. Utilizing the Web pages returned by search engines to mine similarities across the domains, the labeled data in the source domain are then interpreted by the label space of the target domain. In some extreme cases, the source domain data may not even be relevant to the target domain data.

Sparseness has gained tremendous attention in various scientific fields, and computer vision is a dominant part of this trend. Sparse models find their applications in a wide range of computer vision techniques, e.g., dictionary learning (DL) [35]–[37] and transfer learning. Raina et al. [38] apply sparse coding to unlabeled data to break the tremendous amount of data in the source domain into basic patterns (e.g., edges in the task of image classification) so that knowledge can be transferred through the bottom level to form a higher level representation of the training samples in the target domain, in which case the source domain data do not necessarily need to be relevant to the target domain data. Since in the regular transfer learning formalism the source domain data have to be relevant to the target domain data, such a knowledge transfer method is named self-taught learning rather than transfer learning. Zhu and Shao [39] present a discriminative cross-domain DL (DCDDL) framework that utilizes relevant data from other visual domains as auxiliary knowledge for enhancing the learning system in the target domain. The objective function is designed to encourage similar visual patterns across different domains to possess identical representations after being encoded by a learned dictionary pair. In part-of-speech (POS) tagging tasks, shared patterns from auxiliary categorization tasks are extracted as pivot features, which represent the frequent words emerging in the speech and are themselves indicative of their corresponding categories [40]. While the pivot features are sensitive to the POS tagging tasks, pivot visual words do not exist in typical local histogram-based low-level visual features, which indicates that no single feature dimension of the histogram bins is discriminative enough to represent the difference between the visual categories [41].

On the other hand, some works also aim to identify a new lower-dimensional feature space in which the auxiliary domain and the target domain manifest some shared characteristics [42]–[44], instead of transferring the entire knowledge across the target domain and auxiliary domains, making the assumption that the smoothness property (i.e., that data points close to each other are more likely to share the same label) is satisfied in low-dimensional subspaces [41].

B. Cross-View Knowledge Transfer

Cross-view knowledge transfer can be seen as a special case of cross-domain knowledge transfer, where the divergences across domains are caused by viewpoint changes. The task is to recognize action classes in the target view using training samples from one or more different views. Generating view-invariant features to address cross-view visual pattern recognition problems attracts significant attention in the computer vision field, especially for cross-view action recognition. The bottom of Fig. 3 shows the cross-view knowledge transfer scenario on the multiview IXMAS [45] data set. The typical setting is to use samples captured in one view (the source view) as training data to predict the labels of samples captured from a different view (the target view). The core methodology of approaches that tackle visual categorization problems with changes in the observer's viewpoint is to discover the shared knowledge that is irrespective of such viewpoint changes. One common approach to attack the cross-view feature representation diversity problem is to infer the 3-D scene structure for cross-view feature adaptation, where the derived features can be adapted from one view to another utilizing geometric reasoning [46]–[49]. Another family of approaches is to explore visual pattern properties, e.g., affine [50], projective [51], and epipolar geometry [52]–[54], to compute such cross-view
TABLE I
MAIN CHARACTERISTICS OF THE LISTED FEATURE REPRESENTATION LEVEL KNOWLEDGE TRANSFER APPROACHES. THE AVAILABILITY OF TARGET DOMAIN LABELS, THE ADAPTATION TYPE, AND THE APPLICATIONS OF ALL STATED FEATURE REPRESENTATION LEVEL KNOWLEDGE TRANSFER METHODS ARE LISTED
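The feature replication (FR) method of Daumé [32], mentioned in Section III-A, admits a particularly compact implementation: each source sample x is augmented to (x, x, 0) and each target sample to (x, 0, x), so that a standard linear classifier trained on the augmented space can learn one shared weight block and one domain-specific block per domain. A minimal sketch (our own illustration; the function name is ours):

```python
import numpy as np

def augment(X, domain):
    # Daume's feature replication: shared block + domain-specific blocks.
    # Source sample x -> (x, x, 0); target sample x -> (x, 0, x).
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    return np.hstack([X, zeros, X])

Xs = np.array([[1.0, 2.0]])
Xt = np.array([[3.0, 4.0]])
print(augment(Xs, "source"))  # [[1. 2. 1. 2. 0. 0.]]
print(augment(Xt, "target"))  # [[3. 4. 0. 0. 3. 4.]]
```

Any off-the-shelf classifier can then be run on the augmented vectors; the first block captures what the domains share, while the remaining blocks absorb domain-specific deviations.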
transfer that the wheel part of a motorbike template can be increased in radius and reduced in thickness when fitting to a bicycle wheel template. The DA-SVM can also be seen as a generalized form of the rigid A-SVM, obtained by replacing $w_s$ in (2) with $\tau(w_s)$

$$L_{\mathrm{DA}} = \min_{f, w_t, b} \|w_t - \tau(w_s)\|^2 + C \sum_{i}^{N} l(x_i, y_i; w_t, b) + \lambda \left( \sum_{i \neq j}^{M,M} f_{i,j}^2 d_{i,j} + \sum_{i}^{M} (1 - f_{ii})^2 \bar{d} \right) \quad (5)$$

where $d_{i,j}$ is the spatial distance between the $i$th and $j$th cells, $\bar{d}$ is the penalization for the additional flow from the $i$th source cell to the $i$th target cell, and $\tau(w_s)_i = \sum_{j}^{M} f_{ij} w_{s_j}$ is the flow transformation, where the parameter $f_{ij}$ denotes the amount of transfer from the $j$th cell in the source template to the $i$th cell in the transformed template. The cells are extracted from local image regions, on which local descriptors (e.g., HOG [64] and SIFT [65]) are computed. Thus, different from other classifier-based knowledge transfer techniques, DA-SVM has the constraint that it has to be constructed using low-level visual features that measure the geometrical information of local image parts.

Tommasi et al. [66] proposed a discriminative transfer learning method based on the least squares support vector machine (LS-SVM) that learns a new category through adaptation. By replacing the regularization term in the classical LS-SVM, the new learning objective function for knowledge transfer is formulated as

$$L_{\mathrm{KTLS}} = \min_{w_t, b} \frac{1}{2} \|w_t - \theta w_s\|^2 + \frac{C}{2} \sum_{i=1}^{l} \left[ y_i - w_t \cdot \phi(x_i) - b \right]^2 \quad (6)$$

where $\theta$ is a scaling factor in the range of $(0, 1)$ that controls the degree of transfer between the learned model $w_s$ and the target model $w_t$. When extended to multimodel knowledge transfer (multi-KT), the scaling factor $\theta$ is substituted with the vector $\{\theta_1, \theta_2, \ldots, \theta_k\}$, where each $\theta_j$ is the weight of a corresponding prior model. Thus, (6) can be rewritten as

$$L_{\mathrm{Multi\text{-}KT}} = \min_{w_t, b} \frac{1}{2} \left\| w_t - \sum_{j=1}^{k} \theta_j w_{s_j} \right\|^2 + \frac{C}{2} \sum_{i=1}^{l} \zeta_i \left( y_i - w_t \cdot \phi(x_i) - b \right)^2. \quad (7)$$

The $\zeta_i$ in (7) is used for resampling the data so that the training samples are balanced. Taking advantage of the LS-SVM property that the leave-one-out (LOO) error, which measures the proper amount of knowledge to be transferred, can be written in a closed form [67], the best values of $\theta_j$ are those that minimize the LOO error.

Typically, the kernel functions need to be specified in advance of learning, and the associated kernel parameters (e.g., the mean and variance in the Gaussian kernel) are determined during optimization. On top of the various kernel learning methods [68]–[71], the domain transfer SVM (DT-SVM) [72] unified the cross-domain learning framework by searching for the SVM decision function $f(x) = w^{\top}\phi(x) + b$ as well as the kernel function simultaneously, instead of using the two-step approaches [28], [73]. In general, DT-SVM achieves cross-domain classification by reaching two objective criteria: 1) DT-SVM minimizes the data distribution mismatch between the target domain and source domains using the MMD criterion mentioned in Section II and 2) DT-SVM pursues better classification performance by minimizing the structural risk of the SVM. By meeting both criteria, an effective kernel function
can be learned for better separation performance in linear to target categories. Both the zero-shot learning problem and
space over different domains, and thus samples from the the one-shot learning problem are addressed, where in the first
source domains are infused to the target domain to improve problem, the attribute model learned from the source domain
the classification performance of the SVM classifier. categories is used to generate synthesized target training
examples through the generative process, and in the second
B. TrAdaboost problem, the learned attribute model is used to reduce the
uncertainty of parameters of the Dirichelt priors.
Adaptive boosting (AdaBoost) [74] is a popular boosting
algorithm, which has been used in conjunction with a wide
range of other machine learning algorithms to enhance their D. Fuzzy System-Based Models
performance. At every iteration, AdaBoost increases the accu- Transfer learning also finds its application in fuzzy systems.
racy of the selection of the next weak classifier by carefully Deng et al. [78] and [79] proposed two knowledge-leverage-
adjusting the weights on the training instances. Thus, more based fuzzy system models, respectively. The former is based
importance is given to misclassified instances since they are on the Takagi–Sugeno–Kang fuzzy system, and the latter is
believed to be the most informative for the next selection. based on the reduced set density estimator-based Mamdani–
The transfer learning AdaBoost (TrAdaBoost) is introduced Larsen-Type fuzzy system. In both works, the training set is
in [21] to extend AdaBoost for transfer learning by weighting decomposed to training data of the current scene and model
less on the different-distribution data, which are considered parameters of reference scenes. The same knowledge leverage
as dissimilar to the same-distribution data in each boosting strategy is adopted by both works, where model parameters
iteration. The goal of TrAdaBoost is to reduce the weighted obtained from the reference scenes are fed to the current scene
training error on the different-distribution data, and meanwhile for parameter approximation. The knowledge leverage strategy
preserving the properties of AdaBoost. Since the quality is performed through a unified objective function, which
of different-distribution data is not certain, the performance emphasizes on both learning from the data of the current scene
of TrAdaBoost cannot be always guaranteed to outperform and transferring model parameters from reference scenes.
AdaBoost. 1) Discussion: The stated SVM-based knowledge transfer
methods can act as a plug in to the SVM training process.
C. Generative Models A common trait shared amid these methods according to their
The learning to learn concept via rich generative objective functions is that they all include a regularization
models has emerged as one promising research area in both term that measures the similarity between the learned model
computer vision and machine learning. Recently, researchers have begun developing new approaches to deal with transfer learning problems using generative models. One workshop in conjunction with NIPS 2010 was held specifically for the discussion of transfer learning via rich generative models. In general, generative knowledge transfer methods can lead to higher-impact transfer, conveying more information than discriminative approaches, and they can be more adaptive to a single specific task.

Fei-Fei et al. [75] proposed a Bayesian unsupervised one-shot learning object categorization framework that learns a new object category from a single example (or just a few). Since Bayesian methods allow us to incorporate prior information about objects into a prior probability density function when observations become available, general information coming from previously learnt unrelated categories is represented with a suitable prior probability density function on the parameters of the probabilistic models. Thus, priors can be formed from unrelated object categories. For example, when learning the category motorbikes, priors can be obtained by averaging the learnt model parameters from three other categories, spotted cats, faces, and airplanes, so that the hyperparameters of the priors are estimated from the parameters of the existing category models. Yu and Aloimonos [76] applied the generative author-topic [77] model to learn the probabilistic distribution of image-feature-based object attributes. Since object attributes can represent common properties across different categories, they are used to transfer knowledge from source categories

and the target model. In A-SVM, PMT-SVM, and DA-SVM, a tradeoff parameter between margin maximization and knowledge transfer defines the amount of transfer regularization. The DA-SVM is specialized in dealing with the transfer of visually deformable templates, while A-SVM and PMT-SVM are more readily generalized. The advantage of PMT-SVM over A-SVM is that it can increase the amount of transfer without penalizing margin maximization, while A-SVM encourages $\mathbf{w}$ to be larger when this parameter increases. A large $\mathbf{w}$ indicates small margins to the hyperplane, and thus the generalization error of the classifier fails to attain an optimal bound. In general, PMT-SVM is expected to outperform A-SVM.

Compared with SVM-based approaches, the boosting-based method, TrAdaBoost, is simpler in terms of implementation, and it does not require the parameters of the prelearned models. Like other boosting-based techniques, TrAdaBoost has a fairly strong generalization ability. However, TrAdaBoost relies heavily on the relevance of the source domain data to the target domain data, and thus is vulnerable to negative transfer. In addition, TrAdaBoost can easily overfit in the presence of noise in either domain. The generative models are more adaptive to a specific task, but computationally more complex.

V. MODEL SELECTION IN KNOWLEDGE TRANSFER

In real-world applications, knowledge transfer techniques have to consider more complicated scenarios than adapting the samples or prelearned models from a single source domain
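The prior-averaging idea attributed to [75] can be sketched in a few lines. This is an illustrative simplification, not the paper's actual probabilistic formulation: category models are reduced to mean parameter vectors, the prior mean is the average of the unrelated categories' learnt parameters, and a single example shifts that prior as a conjugate-Gaussian mean update with a pseudo-count (`n_virtual`, an assumed value).

```python
def one_shot_posterior_mean(prior_categories, example, n_virtual=3):
    """Sketch of one-shot category learning with an averaged prior:
    hyperparameters of the prior come from averaging the learnt
    parameters of unrelated categories; a single example then pulls
    the estimate toward the new category."""
    dim = len(example)
    # Prior mean: average of the existing category models' parameters.
    prior_mean = [sum(c[d] for c in prior_categories) / len(prior_categories)
                  for d in range(dim)]
    # Posterior mean: the prior acts like n_virtual previously seen samples.
    return [(n_virtual * m + x) / (n_virtual + 1)
            for m, x in zip(prior_mean, example)]
```

With, say, two unrelated categories whose parameters average to `[1.0, 1.0]`, a single example `[5.0, 1.0]` yields a posterior mean between prior and observation, which is the qualitative behavior the one-shot framework relies on.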
Authorized licensed use limited to: Universidad Tecnologica de Panama. Downloaded on January 29,2025 at 22:46:08 UTC from IEEE Xplore. Restrictions apply.
1026 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 5, MAY 2015
to obtain the target learner. In the first case, more than one source domain is available, yet we have no idea which source domain contains more useful information that potentially improves the target learner, or whether the knowledge in a specific domain is against the smoothness property in the target domain. On the other hand, in visual categorization tasks, the shared information across the two domains can be hidden in different visual forms, e.g., appearance, local symmetry, and layout, which can be captured by different feature descriptors. A fusion strategy is required to mine the most helpful knowledge from multiple features. The third case is that some knowledge transfer techniques are constructed from prelearned models, e.g., a learned bicycle classifier or a learned bird classifier, and these models can contribute at different scales to the target model. Prior to knowledge transfer, the bad prelearned models need to be filtered out so that the good models can achieve more effective transfer. All the above three cases generalize the common many-to-one adaptation situations in knowledge transfer, and they can all be deemed as the model selection problem. Fig. 4 shows a typical example of multisource binary classification. A straightforward approach to reduce such prediction ambiguity is to measure the model similarity between each auxiliary domain and the target domain, and apply the closest model for prediction in the target domain, i.e., if auxiliary domain 1 is more similar to the target domain, the decision boundary in Fig. 4(c) will inherit the decision boundary in Fig. 4(a). However, data in auxiliary domain 2, which also contain useful information for the prediction of target domain data, are abandoned.

In general, extending the existing single-source knowledge transfer techniques to the multiple-source scenario evokes two challenges: 1) how to leverage the distribution differences among multiple source domains to promote the prediction performance on the target domain task and 2) how to extend the single-source knowledge transfer techniques to a distributed algorithm, while only sharing some statistical data of all source domains instead of revealing the full contents. Since most existing multiple-source knowledge transfer methods are extended from their corresponding single-source algorithms, we structure this section in a similar manner as Sections III and IV.

Fig. 4. Knowledge transfer from multiple auxiliary domains. (a) Auxiliary domain 1. (b) Auxiliary domain 2. (c) Target domain. The two subfigures on the left denote the two different auxiliary domain data and their corresponding decision boundaries, where auxiliary domain 1 is partitioned by a horizontal line and auxiliary domain 2 is partitioned by a vertical line. By brutally combining the decision boundaries from the two auxiliary domains, ambiguous predictions will be caused in the top-left region and the bottom-right region of the target domain.

A. SVM-Based

In the one-to-one adaptation scenario of A-SVM [18], the new target classifier $f^T(x)$ is adapted from the existing source classifier $f^s(x)$ using the form

$$f^T(x) = f^s(x) + \Delta f(x) \qquad (8)$$

where the perturbation function $\Delta f(x)$ is learned using the labeled data $\mathcal{D}_l^T$ from the target domain. Intuitively, when encountering multiple source domains $\mathcal{D}_1^s, \mathcal{D}_2^s, \ldots, \mathcal{D}_M^s$, which are assumed to possess similar distributions to the primary domain $\mathcal{D}^t$, the adapted classifier can be constructed using the ensemble of all the source domain classifiers $f_1^s(x), f_2^s(x), \ldots, f_M^s(x)$

$$f^T(x) = \sum_{k=1}^{M} \gamma_k f_k^s(x) + \Delta f(x) \qquad (9)$$

where $\gamma_k \in (0, 1)$ is the predefined weight of each source classifier $f_k^s(x)$, and the weights sum to one: $\sum_{k=1}^{M} \gamma_k = 1$. The MMD criterion can be applied to obtain the value of $\gamma_k$. The perturbation function can be formulated as $\Delta f(x) = \sum_{i=1}^{n_l} \alpha_i^T y_i^T k(x_i^T, x)$, where $\alpha_i^T$ is the coefficient of the $i$th labeled pattern in the target domain and $k(\cdot, \cdot)$ is a kernel function induced from the nonlinear feature mapping $\phi(\cdot)$. When applying the same kernel function to the source classifiers, (9) can be expanded as

$$f^T(x) = \sum_{s} \gamma_s \sum_{i=1}^{n_l} \alpha_i^s y_i^s k\left(x_i^s, x\right) + \sum_{i=1}^{n_l} \alpha_i^T y_i^T k\left(x_i^T, x\right) \qquad (10)$$

which is the sum of a set of weighted kernel evaluations between the test pattern $x$ and all labeled patterns $x_i^T$ and $x_i^s$, respectively, from the target domain and all the source domains. Obviously, the learning process is inefficient when applied to large-scale data sets, which is the first disadvantage of A-SVM in the many-to-one adaptation setting. The second disadvantage of A-SVM is its failure to use the unlabeled target domain data $\mathcal{D}_u^T$.

Duan et al. [72] proposed the domain adaptation machine (DAM) to overcome the two disadvantages of A-SVM. To utilize the unlabeled target domain data $\mathcal{D}_u^T$, a data-dependent regularizer is defined for the target classifier $f^T$

$$\Omega_D(\mathbf{f}_u^T) = \frac{1}{2} \sum_{s=1}^{S} \gamma_s \left\| \mathbf{f}_u^T - \mathbf{f}_u^s \right\|^2 \qquad (11)$$

where $\mathbf{f}_u^T = [f_{n_l+1}^T, \ldots, f_{n_T}^T]$ and $\mathbf{f}_u^s = [f_{n_l+1}^s, \ldots, f_{n_T}^s]$ are defined as the decision values from the target classifier and the $s$th source classifier, respectively. Based on the smoothness assumption for domain adaptation, DAM minimizes the structural risk function of LS-SVM as well as the data-dependent regularizer simultaneously. DAM is formulated as

$$\min_{f^T} \; \Omega(f^T) + \frac{1}{2} \sum_{i=1}^{n_l} \left( f_i^T - y_i^T \right)^2 + \Omega_D(\mathbf{f}_u^T) \qquad (12)$$

where $\Omega(f^T)$ is a regularizer to control the complexity of the target classifier $f^T$.
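Equations (9) and (10) can be made concrete with a small sketch. The source classifiers, their weights $\gamma_k$, and the target support set $(\alpha_i^T, y_i^T, x_i^T)$ below are toy stand-ins, and the RBF kernel width is an arbitrary choice:

```python
import math

def rbf(a, b, gamma=0.5):
    # One choice of kernel k(a, b) = exp(-gamma * ||a - b||^2).
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def adapted_classifier(x, source_classifiers, gammas, target_support):
    """Eq. (9): f^T(x) = sum_k gamma_k f_k^s(x) + Delta f(x), where the
    perturbation Delta f(x) = sum_i alpha_i^T y_i^T k(x_i^T, x) is learned
    from the labeled target data (here supplied as (alpha, y, x) triples)."""
    ensemble = sum(g * f(x) for g, f in zip(gammas, source_classifiers))
    perturbation = sum(a * y * rbf(xi, x) for a, y, xi in target_support)
    return ensemble + perturbation
```

With the $\gamma_k$ fixed in advance (e.g., by the MMD criterion), only the target coefficients $\alpha_i^T$ remain to be learned, which is the one-to-one A-SVM training problem of (8) repeated over the ensemble.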
SHAO et al.: TRANSFER LEARNING FOR VISUAL CATEGORIZATION 1027
Since the target classifier in DAM is learned in a sparse representation, the computation inefficiency problem of A-SVM is overcome.

By arguing that it is more beneficial to transfer from a few relevant source domains rather than using all the source domains as in A-SVM and DAM, Duan et al. [80] further designed a new data-dependent regularizer in the domain selection machine (DSM) for source domain selection

$$\Omega(\mathbf{f}) = \frac{1}{2} \sum_{s=1}^{S} d_s \left\| \mathbf{f}_u^T - \mathbf{f}_u^s \right\|^2. \qquad (13)$$

Similar to $\gamma_s$ in (11), which is a predefined weight measuring the relevance between the $s$th source domain and the target domain, $d_s \in \{0, 1\}$ in (13) is a domain selection indicator for the $s$th source domain. When the objective function is optimized, the value of $d_s$ is 1 if the $s$th source domain is relevant to the target domain, and 0 otherwise. Another advantage of DSM over most existing transfer learning methods is its ability to work when the source domains and the target domain are represented by different types of features, e.g., using static 2-D SIFT features to represent the source domain data and 3-D spatio-temporal (ST) features to represent the target domain data. The learning function of DSM can be formulated as

$$f(x) = f_{2D}(x) + f_{3D}(x) = \sum_{s=1}^{S} d_s \beta_s f^s(x) + \mathbf{w}^\top \varphi(x) + b \qquad (14)$$

where $f_{2D}(x) = \sum_{s=1}^{S} d_s \beta_s f^s(x)$ is a weighted combination of source classifiers based on SIFT features, $\beta_s$ is a real-valued weight for the $s$th source domain, $f_{3D}(x) = \mathbf{w}^\top \varphi(x) + b$ is the adaptation error function of space-time features, $\varphi(\cdot)$ is a feature mapping function that maps $x$ into $\varphi(x)$, $\mathbf{w}$ is a weight vector, and $b$ is a bias term.

B. Boosting-Based

As discussed in Section IV-B, TrAdaBoost relies on only one source domain, which makes it intrinsically vulnerable to negative samples in the source domain. To avoid such a problem, Yao and Doretto [81] proposed two boosting approaches, multisource-TrAdaBoost and task-TrAdaBoost, for knowledge transfer with multiple source domains.

Multisource-TrAdaBoost is an extension of TrAdaBoost to multiple source domains. Instead of searching for a weak classifier by leveraging a single source domain, a mechanism is introduced to apply all the weak classifiers in the source domain that appears to be the most relevant to the target domain at the current iteration. Specifically, the training data of each source domain are combined with the training data in the target domain to generate a candidate weak classifier at each iteration, while all the source domains are considered independent from each other. Thus, the multisource-TrAdaBoost approach significantly reduces the effects of negative transfer caused by the imposition of knowledge transfer from a single source domain, which is potentially not relevant to the target domain.

On the other hand, task-TrAdaBoost is a parameter-transfer approach that tries to identify which parameters from the various source domains can be used. Task-TrAdaBoost consists of two separate phases. In phase-I, traditional AdaBoost is employed to extract suitable weak classifiers from each source domain, respectively, under the assumption that some parameters are shared between the source domain and the target domain. Thus, the source domain is described explicitly rather than implicitly with only the labeled source domain data. Phase-II runs the AdaBoost loop again over the target training data using the collection of all the candidate weak classifiers obtained from phase-I. At each iteration, the weak classifier with the lowest classification error on the target training data is picked out to ensure that the knowledge being transferred is more relevant to the target task. In addition, the update of the weights on the target training data drives the search for the most helpful candidate classifiers in the next round for boosting the target classifier.

C. Multikernel Learning

There are many types of hidden knowledge that can be transferred across different visual domains, for example, the appearance or shape of an object part (e.g., the shape of a wheel), local symmetries between parts (e.g., the symmetry between front- and back-legs for quadrupeds), and the partially shared layout (e.g., the layout of the torso and limbs of a human). When employing knowledge transfer between the visual domains, though the shared knowledge exists among the target data and the source data, the exact type of knowledge that needs to be transferred is uncertain. Alternatively, since these different types of knowledge can be represented by different features or different prior models, all types of knowledge can be considered by fusing these features or prior models when constructing the target model. Instead of using predefined weights for all the features or prior models, multikernel learning provides a more appropriate solution by learning the linear combination coefficients of the prelearned classifiers to assure the minimization of domain mismatches.

Motivated by A-SVM, Duan et al. [82] proposed an adaptive multiple kernel learning (A-MKL) method to cope with the considerable variation in feature distributions between videos from two domains. As described above, in A-SVM, the target classifier is adapted from an existing classifier trained with the source domain data. When A-SVM employs multiple source classifiers, those classifiers are fused with fixed weights. Different from A-SVM, A-MKL learns the optimal combination coefficients corresponding to each prelearned classifier to minimize the mismatch between the data distributions of the two domains under the MMD criterion.

The multimodel knowledge transfer (multi-KT) [66] method modifies the $l_2$-norm regularizer in the LS-SVM objective function and constrains the new hyperplane $\mathbf{w}$ to be close to the hyperplanes of $F$ prior models. The regularization term is given as $\|\mathbf{w} - \sum_{j=1}^{F} \beta_j \boldsymbol{\mu}_j\|^2$, where $\boldsymbol{\mu}_j$ is the hyperplane of the $j$th model and $\beta_j$ determines the amount of transfer from each model, subject to the constraint that $\|\boldsymbol{\beta}\|_2 \le 1$.
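A minimal sketch of one phase-II boosting round follows, assuming the phase-I candidate pool is given as callables returning ±1; the error clamp and the plain AdaBoost reweighting rule are standard choices for illustration, not details taken from [81]:

```python
import math

def task_tradaboost_round(candidates, data, weights):
    """One phase-II round (sketch): pick the candidate weak classifier
    with the lowest weighted error on the target training data, then
    reweight the target samples AdaBoost-style so the next round favors
    candidates that fix the remaining mistakes."""
    def weighted_error(h):
        # Sum of weights of the target samples that h misclassifies.
        return sum(w for (x, y), w in zip(data, weights) if h(x) != y)

    total = sum(weights)
    best = min(candidates, key=lambda h: weighted_error(h) / total)
    err = max(weighted_error(best) / total, 1e-12)  # clamp to avoid log(inf)
    alpha = 0.5 * math.log((1 - err) / err)
    # Increase weights on samples the chosen classifier gets wrong.
    new_w = [w * math.exp(-alpha * y * best(x)) for (x, y), w in zip(data, weights)]
    s = sum(new_w)
    return best, alpha, [w / s for w in new_w]
```

Iterating this round over the pooled phase-I candidates and summing `alpha`-weighted votes yields the boosted target classifier described above.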
For a sample $x$, the decision function is given by

$$s(x) = \mathbf{w} \cdot \phi(x) + \sum_{j=1}^{F} \beta_j \boldsymbol{\mu}_j \cdot \phi(x). \qquad (15)$$

While the solution to multi-KT is obtained through two separate optimization problems, Jie et al. [83] proposed a multiple kernel transfer learning (MKTL) method that learns the best hyperplanes and the corresponding weights assigned to each prior model in a unified optimization process. MKTL utilizes the prior knowledge as experts evaluating the new query instances and addresses such a knowledge transfer problem with a multikernel learning solver. In addition to the training sample $x_i$, the prediction scores $s_p(x_i, z)$, $z = 1, \ldots, F$ ($F$ is the total number of classes), predicted by the prior models are considered when learning the new model. The intuition behind such an idea is that if prior knowledge of a bicycle gives a high prediction score to images of a motorbike, this information may also be useful for the new model of motorbikes, since certain visual parts (e.g., the wheels) are shared between the two categories. Priors are built over multiple features instead of only one, and meanwhile, different learning methods are considered.

D. Cross-View Multiple Source Adaptation

For the cross-view action recognition problem, some shared visual patterns (either spatial or ST) can exist in actions captured from more than one viewpoint; thus, transferring knowledge from multiple source views to the target view is more beneficial than transferring from a single view.

Liu et al. [12] apply the locally weighted ensemble (LWE) approach introduced in [45] to fuse the multiple classification models. Specifically, for a set of prelearned models $f_1, f_2, \ldots, f_k$, the general Bayesian model averaging approach computes the posterior distribution of $y$ as $P(y|x) = \sum_{i=1}^{k} P(y|x, D, f_i) P(f_i|D)$, where $P(y|x, D, f_i) = P(y|x, f_i)$ is the prediction made by each model and $P(f_i|D)$ is the posterior of model $f_i$ after observing the training set $D$. Considering the data distribution mismatch across the target domain and the source domains, the model prior $P(f_i|T)$ is incorporated, where $T$ is the test set. By replacing $P(f_i|D)$ with $P(f_i|T)$, the difference between the target and the source domains is considered during learning

$$P(y|x) = \sum_{i=1}^{k} w_{f_i,x} P(y|x, f_i) \qquad (16)$$

where $w_{f_i,x} = P(f_i|x)$ is the true model weight that is locally adjusted for $x$, representing the model's effectiveness on the target data.

Li and Zickler [58] achieve multiview fusion by aggregating the response values from the $w$ MKL-SVM [69] classifiers on their corresponding cross-view features $\hat{x}$, beyond which a binary decision is made. Similar to the idea in MKTL [83], MKL-SVM solves a standard SVM optimization problem, where the kernel is defined as a linear combination of multiple kernels.

1) Discussion: The multiple source A-SVM is an intuitive extension of A-SVM in that it assembles all the source domain classifiers by allocating a weight $\gamma_k$ to each source classifier. DAM and DSM are proposed to overcome the disadvantages of multiple source A-SVM, namely its inefficiency and its failure to use unlabeled target domain data, where DSM goes beyond DAM by filtering out the less relevant source domain data.

By introducing multiple source domains rather than one, both multisource-TrAdaBoost and task-TrAdaBoost compensate for the first imperfection of TrAdaBoost. The convergence properties of multisource-TrAdaBoost can be inherited directly from TrAdaBoost [21], whereas for task-TrAdaBoost they can be inherited directly from AdaBoost [74]. It has been proved in [81] that since the convergence rate of task-TrAdaBoost has a reduced upper bound compared with multisource-TrAdaBoost, it requires fewer iterations to converge.

Compared with A-SVM, the unlabeled data in the target domain are used in the MMD criterion of A-MKL, and the weights in the target classifier are learned automatically together with the optimal kernel combination. Invoking the theorem in [84], for binary-class classification, multi-KT is equivalent to multiple source A-SVM based on the Mahalanobis distance measure [85]. Since the relationship between A-SVM and PMT-SVM is demonstrated in (2)-(4), the connection between multi-KT and PMT-SVM can be naturally discovered.

VI. EVALUATION, ANALYSIS, AND DISCUSSION

In general, there are three types of benefits that transfer learning can provide for performance improvements [66], [86]: 1) higher start, i.e., improved performance at the initial points; 2) higher slope, i.e., more rapid growth of performance; and 3) higher asymptote, i.e., improved final performance. In the following, several simple experiments are conducted with some selected representative knowledge transfer techniques discussed above to compare these methods and to see whether they can meet the stated criteria.

A. Feature-Level Knowledge Transfer Methods

A comparison between different feature-representation cross-view transfer learning methods is given in Tables II and III, where experiments are conducted on every possible pairwise view combination of the IXMAS data set (i.e., twenty combinations in total); columns demonstrate the results of target views, while rows demonstrate the results of auxiliary training views. According to previous cross-view action recognition works, there are two different experimental settings, which are the correspondence mode and the partially labeled mode. In the correspondence mode, the leave-one-action-class-out scheme is applied, where one action class is considered as the orphan action in the target view, while all action videos of the selected class are excluded when establishing the correspondences. Approximately 30% of the nonorphan samples are randomly selected to serve as the correspondences, and
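Equation (16) amounts to a per-sample weighted vote. In the sketch below, the locally adjusted weights $w_{f_i,x} = P(f_i|x)$ are supplied as callables and simply renormalized at $x$; how those weights are actually estimated (e.g., from local structure around $x$, as in the LWE approach of [45]) is outside the sketch:

```python
def lwe_posterior(x, models, local_weights):
    """Eq. (16): P(y|x) = sum_i w_{f_i,x} P(y|x, f_i).
    models[i](x)        -> P(y = 1 | x, f_i), the i-th model's prediction
    local_weights[i](x) -> unnormalized weight of model f_i at point x"""
    w = [lw(x) for lw in local_weights]
    z = sum(w)
    # Normalize the local weights at x, then take the weighted vote.
    return sum(wi * m(x) for wi, m in zip(w, models)) / z
```

Unlike global Bayesian model averaging, the weights here vary with $x$, so a source-view model only dominates in regions of the target domain where it is locally reliable.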
TABLE III: Comparison between cross-view knowledge transfer methods in the partially labeled mode. Results are reported on every possible pairwise view combination of the IXMAS data set, where columns correspond to the target views and rows correspond to source views.

none of these correspondences are labeled. On the other hand, a small set of samples are labeled in the partially labeled mode. We list the performance comparison of the above-mentioned methods in the correspondence mode in Table II and in the partially labeled mode in Table III, respectively.

B. Classifier-Level Knowledge Transfer Methods

We conduct experiments on both image classification and action recognition tasks, where the PASCAL VOC 2007 data set [89] is used for image classification, and the UCF YouTube and HMDB51 data sets [90] are used for action recognition. The PASCAL VOC 2007 data set contains 20 object classes, including bird, bicycle, motorbike, and so on, among which we choose samples from the bicycle class and the motorbike class as positive samples of the target domain and the source domain, respectively, and samples from the remaining classes as negative testing samples in the target domain. Histogram of oriented gradients (HOG) features are extracted from each image by dividing each image into eight cells. The task is to learn a bicycle classifier to make a binary decision over whether the test sample belongs to the bicycle category or a different category. The target classifier is learned by transferring information from a motorbike classifier with the guidance of a few bicycle samples. We compare the methods of nontransfer SVM, A-SVM, PMT-SVM, DA-SVM, and MKTL in Table IV with different numbers of training examples that vary from 1 to 25 with an interval of 3. Among these methods, DA-SVM achieves the best performance in terms of higher start and higher slope, while PMT-SVM achieves the best final performance.

The UCF YouTube action data set is a realistic data set that contains camera shaking, cluttered backgrounds, variations in actors' scale, variations in illumination, and viewpoint changes. There are 11 actions contained in the UCF YouTube data set, including biking, diving, golf swinging, and so on.
TABLE IV: Performance comparison on the image classification task between SVM, A-SVM, PMT-SVM, DA-SVM, and MKTL. Models are learned with different numbers of training examples of the bicycle class, with the motorbike class as the source domain. The first row indicates the number of training samples used in the source domain.

TABLE V: Performance comparison on the action recognition task between SVM, A-SVM, PMT-SVM, and MKTL. Models are learned with different numbers of training examples of the biking class, with the diving class as the source domain, on the UCF YouTube data set. The first row indicates the number of training samples used in the source domain.
TABLE VIII: Means and standard deviations of mAPs over six events for methods in three cases: 1) classifiers learned based on SIFT features; 2) classifiers learned based on ST features; and 3) classifiers learned based on both SIFT and ST features.
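The mAP figures reported in Table VIII are means of per-event average precision. As a reminder of what is being averaged, here is a minimal AP computation (precision averaged at the rank of each positive sample); ties in scores are broken arbitrarily in this sketch:

```python
def average_precision(scores, labels):
    """AP for one event: rank test samples by classifier score, then
    average the precision measured at each positive sample's rank."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)
```

Mean AP over the six event classes, repeated across random splits, gives the means and standard deviations tabulated above.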
which indicates that the shared commons are more difficult to capture in ST features than in static SIFT features. The effectiveness of fusing average classifiers and multiple base kernels is demonstrated by A-MKL, which provides the best performance in all cases.

The LWE fusing approach [12] and the MKL-SVM approach [58] are compared with the SVMSUT, AUGSVM, and MIXSVM methods in both the correspondence mode and the partially labeled mode in Table IX for cross-view multisource knowledge transfer. The overall results in the correspondence mode significantly outperform the results in the partially labeled mode. In the correspondence mode, LWE and MKL-SVM achieve equivalent performance, while in the partially labeled mode, MKL-SVM consistently leads to the best performance.

VII. CONCLUSION

In this survey, we have reviewed transfer learning techniques for visual categorization tasks. There are three types of knowledge that are useful for knowledge transfer: 1) source domain features; 2) source domain features and the corresponding labels; and 3) parameters of the prelearned source domain models, which correspond to instance-based transfer learning, inductive transfer learning, and parameter-based transfer learning, respectively. Through the performance comparisons between knowledge transfer techniques and nonknowledge transfer techniques, we can conclude that brute-force use of the source domain data for learning can degrade the performance of the original learning system, which demonstrates the significance of knowledge transfer. To transfer the source domain knowledge to the target domain, methods are designed at either the feature representation level or the classifier level. In general, feature representation level knowledge transfer aims to unify the mismatched data in different visual domains into the same feature space, while classifier level knowledge transfer aims to learn a target classifier based on the parameters of prelearned source domain models, while considering the data smoothness in the target domain. Thus, the feature representation level knowledge transfer techniques belong to either instance-based transfer or inductive transfer, while most classifier level knowledge transfer techniques belong to parameter-based transfer. To avoid transferring negative knowledge and to deal with the many-to-one adaptation problem, many strategies are proposed to learn a set of weights for the source domains to achieve multiple source domain knowledge fusion.

Transfer learning is a tool for improving the performance of the target domain model only in the case that the target domain labeled data are not sufficient; otherwise, knowledge transfer is meaningless. So far, most research on transfer learning focuses only on small-scale data, which cannot well reflect the potential advantage of transfer learning over regular machine learning techniques. The future challenges of transfer learning lie in two aspects: 1) how to mine the information that would be helpful for the target domain from highly noisy source domain data and 2) how to extend the existing transfer learning methods to deal with large-scale source domain data.

REFERENCES

[1] L. Shao, L. Liu, and X. Li, "Feature learning for image classification via multiobjective genetic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 7, pp. 1359-1371, Jul. 2014.
[2] L. Liu, L. Shao, and P. Rockett, "Boosted key-frame selection and correlated pyramidal motion-feature representation for human action recognition," Pattern Recognit., vol. 46, no. 7, pp. 1810-1818, 2013.
[3] L. Shao, D. Wu, and X. Li, "Learning deep and wide: A spectral method for learning deep networks," IEEE Trans. Neural Netw. Learn. Syst., doi: 10.1109/TNNLS.2014.2308519.
[4] L. Zhang, X. Zhen, and L. Shao, "Learning object-to-class kernels for scene classification," IEEE Trans. Image Process., vol. 23, no. 8, pp. 3241-3253, Aug. 2014.
[5] L. Shao, X. Zhen, D. Tao, and X. Li, "Spatio-temporal Laplacian pyramid coding for action recognition," IEEE Trans. Cybern., vol. 44, no. 6, pp. 817-827, Jun. 2014.
[6] L. Shao, S. Jones, and X. Li, "Efficient search and localization of human actions in video databases," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 3, pp. 504-512, Mar. 2014.
[7] F. Zhu, L. Shao, and M. Lin, "Multi-view action recognition using local similarity random forests and sensor fusion," Pattern Recognit. Lett., vol. 34, no. 1, pp. 20-24, 2013.
[8] X. Cao, Z. Wang, P. Yan, and X. Li, "Transfer learning for pedestrian detection," Neurocomputing, vol. 100, no. 1, pp. 51-57, 2013.
[9] X. Gao, X. Wang, X. Li, and D. Tao, "Transfer latent variable model based on divergence analysis," Pattern Recognit., vol. 44, nos. 10-11, pp. 2358-2366, 2011.
[10] C. Orrite, M. Rodríguez, and M. Montañés, "One-sequence learning of human actions," in Proc. 2nd Int. Workshop Human Behavior Understand., Amsterdam, The Netherlands, Nov. 2011, pp. 40-51.
[11] D. Wu, F. Zhu, and L. Shao, "One shot learning gesture recognition from RGBD images," in Proc. 25th IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Providence, RI, USA, Jun. 2012, pp. 7-12.
[12] J. Liu, M. Shah, B. Kuipers, and S. Savarese, "Cross-view action recognition via view knowledge transfer," in Proc. 24th IEEE Conf. Comput. Vis. Pattern Recognit., Colorado Springs, CO, USA, Jun. 2011, pp. 3209-3216.
[13] L. Fei-Fei, "Knowledge transfer in learning to recognize visual objects classes," in Proc. 5th Int. Conf. Develop. Learn., Bloomington, IN, USA, Jun. 2006.
[14] I. Biederman, "Recognition-by-components: A theory of human image understanding," Psychol. Rev., vol. 94, no. 2, pp. 115-147, 1987.
[15] L. Cao, Z. Liu, and T. S. Huang, "Cross-dataset action detection," in Proc. 23rd IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 1998-2005.
[16] C. Schuldt, I. Laptev, and B. Caputo, "Recognizing human actions: A local SVM approach," in Proc. 17th Int. Conf. Pattern Recognit., Cambridge, U.K., Aug. 2004, pp. 32-36.
[17] A. F. Smeaton, P. Over, and W. Kraaij, "Evaluation campaigns and TRECVid," in Proc. 8th ACM Int. Workshop Multimedia Inform. Retrieval, Santa Barbara, CA, USA, Oct. 2006, pp. 321-330.
[18] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proc. 15th ACM Int. Conf. Multimedia, Augsburg, Germany, Sep. 2007.
[19] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345-1359, Oct. 2010.
[20] G.-J. Qi, C. Aggarwal, Y. Rui, Q. Tian, S. Chang, and T. Huang, "Towards cross-category knowledge propagation for learning visual concepts," in Proc. 24th IEEE Conf. Comput. Vis. Pattern Recognit., Colorado Springs, CO, USA, Jun. 2011, pp. 897-904.
[21] W. Dai, Q. Yang, G.-R. Xue, and Y. Yu, "Boosting for transfer learning," in Proc. 24th Int. Conf. Mach. Learn., Corvallis, OR, USA, Jun. 2007, pp. 193-200.
[22] Y. Wang and G. Mori, "Max-margin hidden conditional random fields for human action recognition," in Proc. 22nd IEEE Conf. Comput. Vis. Pattern Recognit., Miami, FL, USA, Jun. 2009, pp. 872-879.
[23] A. Yao, J. Gall, and L. Van Gool, "A Hough transform-based voting framework for action recognition," in Proc. 23rd IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 2061-2068.
[24] T. Xia, D. Tao, T. Mei, and Y. Zhang, "Multiview spectral embedding," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 6, pp. 1438-1446, Dec. 2010.
[25] H. Wang, F. Nie, H. Huang, and C. Ding, "Dyadic transfer learning for cross-domain image classification," in Proc. 13th IEEE Int. Conf. Comput. Vis., Barcelona, Spain, Nov. 2011, pp. 551–556.
[26] T. Li and C. Ding, "The relationships among various nonnegative matrix factorization methods for clustering," in Proc. 6th IEEE Int. Conf. Data Mining, Hong Kong, Dec. 2006, pp. 362–371.
[27] C. Ding, T. Li, W. Peng, and H. Park, "Orthogonal nonnegative matrix t-factorizations for clustering," in Proc. 12th ACM Int. Conf. Knowl. Discovery Data Mining, Philadelphia, PA, USA, Aug. 2006, pp. 126–135.
[28] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proc. 23rd Nat. Conf. Artif. Intell. (AAAI), Chicago, IL, USA, Jul. 2008, pp. 677–682.
[29] M. E. Taylor and P. Stone, "Transfer learning for reinforcement learning domains: A survey," J. Mach. Learn. Res., vol. 10, no. 1, pp. 1633–1685, 2009.
[30] I. Jhuo, D. Liu, D. Lee, and S. Chang, "Robust visual domain adaptation with low-rank reconstruction," in Proc. 25th IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2012, pp. 2168–2175.
[31] K. Borgwardt, A. Gretton, M. Rasch, H. Kriegel, B. Schölkopf, and A. Smola, "Integrating structured biological data by kernel maximum mean discrepancy," Bioinformatics, vol. 22, no. 14, pp. e49–e57, 2006.
[32] H. Daumé, "Frustratingly easy domain adaptation," in Proc. 45th Meeting Assoc. Comput. Linguist., Prague, Czech Republic, Jun. 2007.
[33] X. Zhou, X. Zhuang, S. Yan, S.-F. Chang, M. Hasegawa-Johnson, and T. S. Huang, "SIFT-bag kernel for video event analysis," in Proc. 16th ACM Int. Conf. Multimedia, Vancouver, BC, Canada, Oct. 2008, pp. 229–238.
[34] V. W. Zheng, D. H. Hu, and Q. Yang, "Cross-domain activity recognition," in Proc. 11th Int. Conf. Ubiquitous Comput., Orlando, FL, USA, Jun. 2009, pp. 61–70.
[35] R. Yan, L. Shao, and Y. Liu, "Nonlocal hierarchical dictionary learning using wavelets for image denoising," IEEE Trans. Image Process., vol. 22, no. 12, pp. 4689–4698, Dec. 2013.
[36] L. Shao, R. Yan, X. Li, and Y. Liu, "From heuristic optimization to dictionary learning: A review and comprehensive comparison of image denoising algorithms," IEEE Trans. Cybern., vol. 44, no. 7, pp. 1001–1013, Jul. 2014.
[37] J. Tang, L. Shao, and X. Li, "Efficient dictionary learning for visual categorization," Comput. Vis. Image Understand., vol. 124, no. 1, pp. 91–98, 2014.
[38] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, "Self-taught learning: Transfer learning from unlabeled data," in Proc. 24th Int. Conf. Mach. Learn., Corvallis, OR, USA, Jun. 2007, pp. 759–766.
[39] F. Zhu and L. Shao, "Weakly-supervised cross-domain dictionary learning for visual recognition," Int. J. Comput. Vis., vol. 109, nos. 1–2, pp. 42–59, 2014.
[40] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proc. Conf. Empirical Methods Natural Lang. Process., Sydney, Australia, Jul. 2006, pp. 120–128.
[41] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proc. 25th IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2012, pp. 2066–2073.
[42] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proc. 22nd IEEE Conf. Comput. Vis. Pattern Recognit., Miami, FL, USA, Jun. 2009, pp. 1794–1801.
[43] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics Intell. Lab. Syst., vol. 2, no. 1, pp. 37–52, 1987.
[44] P. O. Hoyer, "Non-negative sparse coding," in Proc. 12th IEEE Workshop Neural Netw. Signal Process., Martigny, Switzerland, Sep. 2002, pp. 557–565.
[45] D. Weinland, R. Ronfard, and E. Boyer, "Free viewpoint action recognition using motion history volumes," Comput. Vis. Image Understand., vol. 104, nos. 2–3, pp. 249–257, 2006.
[46] T. J. Darrell, I. A. Essa, and A. P. Pentland, "Task-specific gesture analysis in real-time using interpolated views," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 12, pp. 1236–1242, Dec. 1996.
[47] D. M. Gavrila and L. S. Davis, "3-D model-based tracking of humans in action: A multi-view approach," in Proc. 9th IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 1996, pp. 73–80.
[48] F. Lv and R. Nevatia, "Single view human action recognition using key pose matching and Viterbi path searching," in Proc. 20th IEEE Conf. Comput. Vis. Pattern Recognit., Minneapolis, MN, USA, Jun. 2007, pp. 1–8.
[49] D. Weinland, E. Boyer, and R. Ronfard, "Action recognition from arbitrary views using 3D exemplars," in Proc. 11th Int. Conf. Comput. Vis., Rio de Janeiro, Brazil, Oct. 2007, pp. 1–8.
[50] C. Rao, A. Yilmaz, and M. Shah, "View-invariant representation and recognition of actions," Int. J. Comput. Vis., vol. 50, no. 2, pp. 203–226, 2002.
[51] V. Parameswaran and R. Chellappa, "View invariance for human action recognition," Int. J. Comput. Vis., vol. 66, no. 1, pp. 83–101, 2006.
[52] T. Syeda-Mahmood, A. Vasilescu, and S. Sethi, "Recognizing action events from multiple viewpoints," in Proc. Workshop Detect. Recognit. Events Video, Vancouver, BC, Canada, May 2001, pp. 64–72.
[53] A. Yilmaz and M. Shah, "Actions sketch: A novel action representation," in Proc. 18th IEEE Conf. Comput. Vis. Pattern Recognit., San Diego, CA, USA, Jun. 2005, pp. 984–989.
[54] A. Gritai, Y. Sheikh, and M. Shah, "On the use of anthropometry in the invariant analysis of human actions," in Proc. 17th Int. Conf. Pattern Recognit., Cambridge, U.K., Aug. 2004, pp. 923–926.
[55] I. Junejo, E. Dexter, I. Laptev, and P. Pérez, "Cross-view action recognition from temporal self-similarities," in Proc. 10th Eur. Conf. Comput. Vis., Marseille, France, Oct. 2008, pp. 293–306.
[56] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proc. 10th IEEE Int. Conf. Comput. Vis., Beijing, China, Oct. 2005, pp. 1395–1402.
[57] S. Seitz and C. Dyer, "View-invariant analysis of cyclic motion," Int. J. Comput. Vis., vol. 25, no. 3, pp. 231–251, 1997.
[58] R. Li and T. Zickler, "Discriminative virtual views for cross-view action recognition," in Proc. 25th IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2012, pp. 2855–2862.
[59] J. Zheng, Z. Jiang, P. Phillips, and R. Chellappa, "Cross-view action recognition via a transferable dictionary pair," in Proc. 23rd British Mach. Vis. Conf., Surrey, U.K., Sep. 2012.
[60] F. Zhu and L. Shao, "Correspondence-free dictionary learning for cross-view action recognition," in Proc. 22nd Int. Conf. Pattern Recognit., Stockholm, Sweden, Aug. 2014.
[61] A. Quattoni, M. Collins, and T. Darrell, "Transfer learning for image classification with sparse prototype representations," in Proc. 21st IEEE Conf. Comput. Vis. Pattern Recognit., Anchorage, AK, USA, Jun. 2008, pp. 1–8.
[62] A. Farhadi and M. Tabrizi, "Learning to recognize activities from the wrong view point," in Proc. 10th Eur. Conf. Comput. Vis., Marseille, France, Oct. 2008, pp. 154–166.
[63] Y. Aytar and A. Zisserman, "Tabula rasa: Model transfer for object category detection," in Proc. 13th IEEE Int. Conf. Comput. Vis., Barcelona, Spain, Nov. 2011, pp. 2252–2259.
[64] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. 18th IEEE Conf. Comput. Vis. Pattern Recognit., San Diego, CA, USA, Jun. 2005, pp. 886–893.
[65] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[66] T. Tommasi, F. Orabona, and B. Caputo, "Safety in numbers: Learning categories from few examples with multi model knowledge transfer," in Proc. 23rd IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 3081–3088.
[67] G. C. Cawley, "Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs," in Proc. IEEE Int. Joint Conf. Neural Netw., Vancouver, BC, Canada, Jul. 2006, pp. 1661–1668.
[68] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan, "Learning the kernel matrix with semidefinite programming," J. Mach. Learn. Res., vol. 5, pp. 27–72, Dec. 2004.
[69] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, "SimpleMKL," J. Mach. Learn. Res., vol. 9, no. 11, pp. 2491–2521, 2008.
[70] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large scale multiple kernel learning," J. Mach. Learn. Res., vol. 7, no. 6, pp. 1531–1565, 2006.
[71] L. Duan, I. Tsang, and D. Xu, "Domain transfer multiple kernel learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 3, pp. 465–479, Mar. 2012.
[72] L. Duan, I. Tsang, D. Xu, and S. Maybank, "Domain transfer SVM for video concept detection," in Proc. 22nd IEEE Conf. Comput. Vis. Pattern Recognit., Miami, FL, USA, Jun. 2009, pp. 1375–1381.
[73] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proc. 20th Conf. Neural Inform. Process. Syst., Dec. 2006, pp. 601–608.
[74] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997.
[75] L. Fei-Fei, R. Fergus, and P. Perona, "One-shot learning of object categories," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 594–611, Apr. 2006.
1034 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 26, NO. 5, MAY 2015
[76] X. Yu and Y. Aloimonos, "Attribute-based transfer learning for object categorization with zero/one training example," in Proc. 11th Eur. Conf. Comput. Vis. (ECCV), Hersonissos, Greece, Sep. 2010, pp. 127–140.
[77] M. Rosen-Zvi, C. Chemudugunta, T. Griffiths, P. Smyth, and M. Steyvers, "Learning author-topic models from text corpora," ACM Trans. Inform. Syst., vol. 28, no. 1, pp. 1–38, 2010.
[78] Z. Deng, Y. Jiang, K.-S. Choi, F.-L. Chung, and S. Wang, "Knowledge-leverage-based TSK fuzzy system modeling," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 8, pp. 1200–1212, Aug. 2013.
[79] Z. Deng, Y. Jiang, F.-L. Chung, H. Ishibuchi, and S. Wang, "Knowledge-leverage-based fuzzy system and its modeling," IEEE Trans. Fuzzy Syst., vol. 21, no. 4, pp. 597–609, Aug. 2013.
[80] L. Duan, D. Xu, and S.-F. Chang, "Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach," in Proc. 25th IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, Jun. 2012, pp. 1338–1345.
[81] Y. Yao and G. Doretto, "Boosting for transfer learning with multiple sources," in Proc. 23rd IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 1855–1862.
[82] L. Duan, D. Xu, I. Tsang, and J. Luo, "Visual event recognition in videos by learning from web data," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1667–1680, Sep. 2012.
[83] L. Jie, T. Tommasi, and B. Caputo, "Multiclass transfer learning from unconstrained priors," in Proc. 13th IEEE Int. Conf. Comput. Vis., Barcelona, Spain, Nov. 2011, pp. 1863–1870.
[84] J. Ye and T. Xiong, "SVM versus least squares SVM," in Proc. 7th Int. Conf. Artif. Intell. Stat., Scottsdale, AZ, USA, Apr. 2007, pp. 644–651.
[85] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. New York, NY, USA: Springer-Verlag, 2001.
[86] E. Olivas, M. Guerrero, M. B. M. Sober, and S. Lopez, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques, vol. 2. Hershey, PA, USA: Information Science IGI Publishing, 2009.
[87] A. Farhadi, M. Tabrizi, I. Endres, and D. Forsyth, "A latent model of discriminative aspect," in Proc. 12th IEEE Int. Conf. Comput. Vis., Kyoto, Japan, Sep. 2009, pp. 948–955.
[88] A. Bergamo and L. Torresani, "Exploiting weakly-labeled web images to improve object classification: A domain adaptation approach," in Proc. 24th Conf. Neural Inform. Process. Syst., Vancouver, BC, Canada, Dec. 2010.
[89] (2007). The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results [Online]. Available: http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
[90] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "HMDB: A large video database for human motion recognition," in Proc. 13th IEEE Int. Conf. Comput. Vis., Barcelona, Spain, Nov. 2011, pp. 2556–2563.
[91] H. Wang, A. Klaser, C. Schmid, and C. L. Liu, "Action recognition by dense trajectories," in Proc. 24th IEEE Conf. Comput. Vis. Pattern Recognit., Colorado Springs, CO, USA, Jun. 2011, pp. 3169–3176.
[92] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies," in Proc. 21st IEEE Conf. Comput. Vis. Pattern Recognit., Anchorage, AK, USA, Jun. 2008, pp. 1–8.
[93] N. Dalal, B. Triggs, and C. Schmid, "Human detection using oriented histograms of flow and appearance," in Proc. 9th Eur. Conf. Comput. Vis., Graz, Austria, May 2006, pp. 428–441.
[94] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," in Proc. 23rd IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, USA, Jun. 2010, pp. 3360–3367.
[95] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[96] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, "NUS-WIDE: A real-world web image database from National University of Singapore," in Proc. ACM Int. Conf. Image Video Retrieval, Santorini, Greece, Jul. 2009, pp. 48–56.
[97] Y.-G. Jiang, G. Ye, S.-F. Chang, D. Ellis, and A. C. Loui, "Consumer video understanding: A benchmark database and an evaluation of human and machine performance," in Proc. 1st ACM Int. Conf. Multimedia Retrieval, Trento, Italy, Apr. 2011, pp. 29–37.

Ling Shao (M'09–SM'10) received the B.Eng. degree from the University of Science and Technology of China, Hefei, China, and the M.Sc. and Ph.D. degrees from the University of Oxford, Oxford, U.K.

He is a Senior Lecturer (Associate Professor) with the Department of Electronic and Electrical Engineering, University of Sheffield, Sheffield, U.K., and a Guest Professor with the College of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing, China. He has authored and co-authored over 120 papers in well-known journals/conferences such as the International Journal of Computer Vision, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE TRANSACTIONS ON CYBERNETICS, Pattern Recognition, Computer Vision and Image Understanding, the IEEE Conference on Computer Vision and Pattern Recognition, the International Joint Conference on Artificial Intelligence, ACM Multimedia, and the British Machine Vision Conference, and holds more than 10 European/U.S. patents. His current research interests include computer vision, image/video processing, pattern recognition, and machine learning.

Dr. Shao is an Associate Editor of the IEEE TRANSACTIONS ON CYBERNETICS, Information Sciences, and several other journals. He is a fellow of the British Computer Society.

Fan Zhu (S'12) received the B.S. degree from the Wuhan Institute of Technology, Wuhan, China, in 2010, and the M.Sc. (Hons.) degree from the University of Sheffield, Sheffield, U.K., in 2012, where he is currently pursuing the Ph.D. degree with the Department of Electronic and Electrical Engineering.

His current research interests include submodular optimization for computer vision, sparse coding, dictionary learning, and transfer learning.

Xuelong Li (M'02–SM'07–F'12) is currently a Full Professor with the Center for OPTical IMagery Analysis and Learning, State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China.