
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 28, NO. 12, DECEMBER 2020

Fuzzy Multiple-Source Transfer Learning


Jie Lu, Fellow, IEEE, Hua Zuo, Member, IEEE, and Guangquan Zhang

Abstract—Transfer learning is gaining increasing attention due to its ability to leverage previously acquired knowledge to assist in completing a prediction task in a related domain. Fuzzy transfer learning, which is based on fuzzy systems and particularly fuzzy rule-based models, was developed because of its capacity to deal with uncertainty. However, one issue in fuzzy transfer learning, and indeed in general transfer learning, has not been resolved: how to combine and then use knowledge when multiple source domains are available. This study presents new methods for merging fuzzy rules from multiple domains for regression tasks. Two different settings are explored separately: homogeneous and heterogeneous space. In homogeneous situations, knowledge from the source domains is merged in the form of fuzzy rules. In heterogeneous situations, knowledge is merged in the form of both data and fuzzy rules. Experiments on both synthetic and real-world datasets provide insights into the scope of applications suitable for the proposed methods and validate their effectiveness through comparisons with other state-of-the-art transfer learning methods. An analysis of parameter sensitivity is also included.

Index Terms—Domain adaptation, fuzzy systems, machine learning, regression, transfer learning.

I. INTRODUCTION

MACHINE learning [1] has deeply affected the great achievements gained in many areas of data science, including computer vision [2], biology [3], medical imaging [4], and business management [5]. However, fundamentally, many well-known machine-learning algorithms, such as neural networks, support vector machines (SVMs), and Bayesian networks, are supervised processes, which means the performance and generalizability of the resulting models tend to rely on massive amounts of labeled data. Unfortunately, in some fields, especially in new and emerging areas of business, gathering enough labeled data to train a model properly is difficult, even impossible. Without enough labeled data, the accuracy and generalizability of a model suffer. Thus, transfer learning [6] has emerged as a potential solution.

Transfer learning, in general, addresses the problem of how to leverage previously acquired knowledge to improve the efficiency and accuracy of learning in one domain that in some way relates to the original domain. The first, and still definitive, survey on transfer learning [6] provides important definitions for the field. In that review, transfer learning studies are categorized into multitask learning [7], domain adaptation [8], and cross-domain learning [9]. As this area has attracted many researchers, more methods are being developed to handle transfer learning problems, and survey papers are beginning to focus on more precise areas, e.g., visualization [10], reinforcement learning [11], activity recognition [12], computational intelligence [13], and collaborative recommendation [14]. The current applications of transfer learning techniques are extensive—from image processing [15] to text categorization [16] to natural language processing [17] to fault diagnosis [18] and beyond.

Yet, while these existing methods have had some success in handling domain adaptation issues, most ignore the inherent phenomenon of uncertainty—a crucial factor during the knowledge transfer process [19]. There is a clear codependency between the level of certainty in learning a task and the amount of information that is available. Problems with too little information have a high degree of uncertainty. If there are too few labeled data in the target domain, only a finite amount of information can be extracted, which leads to a high degree of uncertainty. However, the emergence of fuzzy systems has shown promising results in overcoming this problem [20].

The integration of fuzzy logic with transfer learning has drawn considerable attention in the literature. For example, researchers have applied fuzzy sets to represent linguistic variables when feature values cannot be precisely described numerically, while fuzzy distances assist the retrieval of similar cases [21]. Transferring implicit and explicit knowledge from similar domains is hidden and uncertain by nature, so using fuzzy logic and fuzzy rule theory to handle the associated vagueness and uncertainty is apt and can improve transfer accuracy. Hence, many scholars have turned to fuzzy systems as a solution to transfer learning problems, with promising results. Deng et al. [22], [23] proposed a series of transfer learning methods using a Mamdani–Larsen-type fuzzy system and a Takagi–Sugeno–Kang (TSK) fuzzy model coupled with novel fuzzy logic algorithms that include definitions for two new objective functions. Furthermore, they applied their methods to scenarios with insufficient data, such as recognizing electroencephalogram signals in environments with a data shortage. Behbood et al. [24], [25] proposed a fuzzy-based transfer learning approach to long-term bank failure prediction with source and target domains that have different data distributions. Liu et al. [26] focused on unsupervised heterogeneous domain adaptation problems, presenting a novel transfer learning model that incorporates n-dimensional fuzzy geometry and fuzzy equivalence relations.

Manuscript received January 27, 2019; revised June 17, 2019 and September 3, 2019; accepted October 31, 2019. Date of publication November 11, 2019; date of current version December 1, 2020. This work was supported by the Australian Research Council under Discovery Grant DP 170101632. (Corresponding author: Jie Lu.)
The authors are with the Decision Systems and e-Service Intelligence Laboratory, Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2019.2952792
A metric based on n-dimensional fuzzy geometry is defined to measure the similarity of features between domains. Shared fuzzy equivalence relations then force the same number of clustering categories given the same value of α, which means knowledge can be transferred from the source domain to the target domain in heterogeneous space through the clustering categories.

Despite these advancements in fuzzy system-based transfer learning methods, one main issue has still not been solved: how to merge and transfer knowledge when multiple source domains are available. This case is quite common in the real world. For example, a company needs to determine the price for a new type of computer entering the Australian market but has very little data on consumer responses to the product. However, data for two other types of computers sold in Australia are available. So, how might these two datasets (source domains) be used to support the pricing decision at hand (target domain)?

There have already been some studies on multiple-source domain adaptation problems. Yao and Doretto [27] proposed two new algorithms, MultiSource-TrAdaBoost and TaskTrAdaBoost, which extend the boosting framework for transferring knowledge from multiple sources. These algorithms reduce negative transfer by increasing the number of sources. Tan et al. [28] presented a novel algorithm to leverage knowledge from different views and sources collaboratively, letting different views from different sources complement each other through a cotraining-style framework that reduces the differences in distribution across domains. Beyond transferring the source data, Zhuang et al. [29] discovered a more powerful feature representation of the data when transferring knowledge from multiple source domains to the target domain. Here, autoencoders are used to construct a feature mapping from an original instance to a hidden representation, and multiple classifiers from the source domain data are jointly trained to learn the hidden representation and the classifiers simultaneously. However, these approaches were developed for classification tasks, and, thus far, their ways of combining knowledge from multiple sources cannot be translated into fuzzy systems, which are superior at handling uncertainty in domain adaptation problems.

Some of our own previous research has focused on developing the domain adaptation ability of fuzzy rule-based models for regression tasks [30], [31]. We proposed a set of algorithms for two different scenarios, where the datasets for the source domain and target domain were homogeneous [32] and heterogeneous [33]. In this article, we build on these previous works to explore the ability of fuzzy systems to deal with transfer learning problems when multiple source domains are available.

The specific contribution of this article is to advance the domain adaptation ability of fuzzy rule-based systems in multiple-source environments for regression tasks. Current transfer learning methods cannot deal with regression tasks with multiple sources. In principle, single-source transfer is sometimes better than multidomain transfer and sometimes the reverse holds; the deciding factor is the "similarity" between the source domains and the target domain. This article, in fact, aims to identify which source domain(s) are more suitable than others for transferring knowledge to a given target domain. We propose two algorithms to handle knowledge transfer from multiple source domains to one target domain for regression tasks, and the use of fuzzy systems gives the model the capacity to handle uncertainty in an information-insufficient environment and improves prediction accuracy.

The remainder of this article is structured as follows. Section II presents the preliminaries of this article, including some important definitions in transfer learning and the main prediction model applied, i.e., the Takagi–Sugeno (T–S) fuzzy model. Section III presents a four-step algorithm for the domain adaptation process using multiple source domains in homogeneous space. Section IV presents an algorithm for multiple-source knowledge transfer in heterogeneous situations, which includes two approaches, implemented simultaneously—one with four steps and one with five steps. Sections V and VI present the validation tests of the two proposed algorithms using both synthetic and real-world datasets. Section VII concludes this article and outlines future work.

II. PRELIMINARIES

This section begins with some basic definitions of transfer learning, followed by an introduction to the T–S fuzzy model, which is the basic prediction model used in our multiple-source domain adaptation method.

A. Definitions

Definition 1 (Domain) [6]: A domain is denoted as D = {F, P(X)}, where F is a feature space, and P(X), X = {x_1, ..., x_n}, is the probability distribution of the instances.

Definition 2 (Task) [6]: A task is denoted as T = {Y, f(·)}, where Y ∈ R is the output, and f(·) is an objective predictive function.

Definition 3 (Transfer Learning) [6]: Given a source domain D_s, a learning task T_s, a target domain D_t, and a learning task T_t, transfer learning aims to improve the learning of the target predictive function f_t(·) in D_t using the knowledge in D_s and T_s, where D_s ≠ D_t or T_s ≠ T_t.

In short, transfer learning aims to use previously acquired knowledge (from a source domain) to assist prediction tasks in a new, but related, domain (the target domain).

B. T–S Fuzzy Model

A fuzzy system, in this case a T–S model, comprises a set of IF–THEN fuzzy rules of the following form:

If x is A_i(x, v_i), then y is L_i(x, a_i), i = 1, ..., c (1)

where v_i are the prototypes, and a_i are the coefficients of the linear functions.

The T–S fuzzy model is built from a set of instances {(x_1, y_1), ..., (x_N, y_N)} using a sequence of two procedures [34]: first forming the conditions A_1, ..., A_c through fuzzy clustering, and then optimizing the parameters of the linear functions L_i(x, a_i).
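To make this two-procedure construction concrete, the following minimal Python sketch builds a T–S model in the spirit described above. It is an illustration under stated assumptions, not the authors' exact implementation: k-means stands in for the fuzzy clustering that yields the prototypes, Gaussian membership functions with a fixed width sigma serve as the rule conditions, and each rule's linear consequent is fitted by weighted least squares.

import numpy as np
from sklearn.cluster import KMeans

def fit_ts_model(X, y, c, sigma=1.0):
    """Fit c fuzzy rules: prototypes v_i and linear coefficients a_i."""
    V = KMeans(n_clusters=c, n_init=10).fit(X).cluster_centers_
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append bias term
    A = np.exp(-np.sum((X[:, None, :] - V) ** 2, axis=2) / (2 * sigma ** 2))
    W = A / A.sum(axis=1, keepdims=True)               # normalized firing strengths
    coeffs = []
    for i in range(c):                                 # weighted least squares per rule
        sw = np.sqrt(W[:, i])
        a_i, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
        coeffs.append(a_i)
    return V, np.array(coeffs), sigma

def ts_predict(X, V, coeffs, sigma):
    """Normalized weighted average of the rule consequents."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = np.exp(-np.sum((X[:, None, :] - V) ** 2, axis=2) / (2 * sigma ** 2))
    W = A / A.sum(axis=1, keepdims=True)
    return np.sum(W * (Xb @ coeffs.T), axis=1)

With rule bases fitted this way per source domain, the merging and selection steps of Section III can operate directly on the (prototype, coefficient) pairs.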
Fig. 1. T–S fuzzy model in a neural network structure.

The T–S fuzzy model can also be rewritten in the form of a neural network with the structure illustrated in Fig. 1. The first layer represents the input data; each neuron in the second layer represents a cluster, which also represents the condition of a fuzzy rule; and the third layer contains the corresponding consequences of the fuzzy rules, i.e., the output.

The output of the T–S fuzzy model is calculated by

y = \sum_{i=1}^{c} \frac{A_i(x, v_i)}{\sum_{j=1}^{c} A_j(x, v_j)} L_i(x, a_i). (2)

III. FUZZY TRANSFER LEARNING USING MULTIPLE SOURCES IN HOMOGENEOUS SPACE

This section presents a method for transferring knowledge from multiple source domains to the target domain in homogeneous space. The multiple-domain adaptation problem with a fuzzy rule-based model is outlined first with formulas, together with the specific challenge of implementing knowledge transfer in such cases. The procedures of the proposed method are then described in detail.

A. Problem Statement

Consider h source domains with large amounts of labeled data and a target domain with very little labeled data. The datasets in the multiple source domains are denoted as S^1, ..., S^h:

S^1 = {(x_1^{s1}, y_1^{s1}), ..., (x_{N_{s1}}^{s1}, y_{N_{s1}}^{s1})}
...
S^h = {(x_1^{sh}, y_1^{sh}), ..., (x_{N_{sh}}^{sh}, y_{N_{sh}}^{sh})}. (3)

Here, (x_k^{sj}, y_k^{sj}) is the kth input–output data pair in the jth source domain, where x_k^{sj} ∈ R^n (k = 1, ..., N_{sj}, j = 1, ..., h) is an n-dimensional input variable, the label y_k^{sj} ∈ R is a continuous variable, and N_{sj} indicates the number of data pairs.

The dataset in the target domain T consists of two subsets, one with labels and one without:

T = {T^L, T^U} = {(x_1^t, y_1^t), ..., (x_{N_{t1}}^t, y_{N_{t1}}^t), x_{N_{t1}+1}^t, ..., x_{N_t}^t} (4)

where x_k^t ∈ R^n (k = 1, ..., N_t) is the n-dimensional input variable, and y_k^t ∈ R is a label that is only accessible for the first N_{t1} data. T^L contains the instances with labels, and T^U contains the instances without labels. The numbers of instances in T^L and T^U are N_{t1} and N_t − N_{t1}, respectively, and satisfy N_{t1} ≪ N_t, N_{t1} ≪ N_{s1}, ..., N_{t1} ≪ N_{sh}.

In each source domain, a well-performing prediction model can be built, since there is a sufficient amount of labeled data, and hence a corresponding set of fuzzy rules can be obtained. In homogeneous space, the dimensionality of the input space in the h source domains and the target domain is the same, and the input variables have exactly the same meanings. The datasets S^1, ..., S^h and T are distinguished from each other by discrepancies in their distributions. Therefore, the rules in the source domains cannot be used directly to solve prediction problems in the target domain.

Many general transfer learning methods can easily solve single-domain transfer problems, but there are two main issues when using fuzzy rule-based models to solve problems that involve multiple source domains. First, brute-force methods that combine all the rules in the source domains lead to redundancy. In addition, accumulating all the rules increases the number of parameters to optimize, which in turn increases the computational complexity. A fuzzy rule-based multiple-source transfer learning method can overcome these problems. The details are proposed in the following section.

B. Multiple-Source Domain Adaptation in Homogeneous Space

The method of transferring knowledge from multiple source domains to a target domain can be summarized in four steps.

Step 1: Combine all the rules in the source domains.

Given h source domains S^1, ..., S^h, h sets of fuzzy rules can be obtained, denoted as R_{s1}, ..., R_{sh}:

R_{s1} = {r(v_1^{s1}, a_1^{s1}), r(v_2^{s1}, a_2^{s1}), ..., r(v_{c1}^{s1}, a_{c1}^{s1})}
...
R_{sh} = {r(v_1^{sh}, a_1^{sh}), r(v_2^{sh}, a_2^{sh}), ..., r(v_{ch}^{sh}, a_{ch}^{sh})} (5)

where r(v_i^{sj}, a_i^{sj}) (i = 1, ..., cj) represents a rule in the source domain S^j, j = 1, ..., h; v_i^{sj} is the prototype, i.e., the center of a data cluster; and a_i^{sj} are the coefficients of the corresponding linear function. The rule r(v_i^{sj}, a_i^{sj}) is represented as

If x_k^{sj} is A_i(x_k^{sj}, v_i^{sj}), then y_k^{sj} is L_i(x_k^{sj}, a_i^{sj}), i = 1, ..., cj. (6)

In homogeneous cases, we assume that the source data are not accessible after the models have been constructed and that only the rules are available. This preserves the privacy of the source data.

The rules in S^1, ..., S^h are combined and denoted as R_s:

R_s = {r(v_1^{s1}, a_1^{s1}), ..., r(v_{c1}^{s1}, a_{c1}^{s1}), ..., r(v_1^{sh}, a_1^{sh}), ..., r(v_{ch}^{sh}, a_{ch}^{sh})} (7)

which can be rewritten as

R_s = {r(v_1^s, a_1^s), r(v_2^s, a_2^s), ..., r(v_{cs}^s, a_{cs}^s)} (8)

where cs = c1 + ··· + ch.
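As a minimal sketch of Step 1, and assuming each source rule base is stored as the (prototype, coefficient) arrays produced by the earlier T–S snippet, combining the h rule bases as in (7) and (8) amounts to a concatenation:

import numpy as np

def combine_rule_bases(rule_bases):
    """rule_bases: list of (V_j, coeffs_j) pairs, one per source domain.
    Returns the pooled prototypes and coefficients R_s with cs rules."""
    V_all = np.vstack([V for V, _ in rule_bases])
    A_all = np.vstack([a for _, a in rule_bases])
    return V_all, A_all   # cs = c1 + ... + ch rules in total

# e.g., V_s, A_s = combine_rule_bases([(V_s1, a_s1), (V_s2, a_s2)])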
Fig. 2. Example of the results for IGMM.

Due to the different distributions of the source and target data, the rules in R_s will have poor prediction accuracy for the target data.

Step 2: Determine the number of fuzzy rules in the target domain.

To use the rules in the source domains effectively, it is important, if not crucial, to determine the number of clusters, i.e., the number of fuzzy rules, in the target domain, as this informs how many (and which) rules to select for the target domain in the next step. Here, an infinite Gaussian mixture model (IGMM) is used to explore the data structure of the target domain and determine the number of fuzzy rules.

IGMM simulates the distribution of the target data using {x_1^t, ..., x_{N_t}^t} in an unsupervised learning manner. Fig. 2 illustrates, in histogram form, the probability of finding various data structures in a dataset. The x-coordinate represents the number of Gaussian distributions, i.e., the number of clusters, and the y-coordinate represents the number of times the dataset has been divided into the corresponding number of clusters. As the figure shows, in 2000 iterations of IGMM, the dataset was divided into three clusters more than 1000 times, into four clusters about 500 times, into one cluster about 250 times, and into two or five clusters fewer than 100 times. Therefore, we can conclude, with high probability, that the dataset is composed of three Gaussian distributions (clusters).

Applying IGMM, the number of rules in the target domain is determined from {x_1^t, ..., x_{N_t}^t} and denoted as ct. This step also lays the basis for the next step, where the number of clusters needs to be provided in advance.

The unlabeled target data are used in this step; the labeled target data are used in Step 4 to optimize the mappings that modify the existing rules.

Step 3: Select the appropriate rules for the target domain.

How the source rules in R_s are merged is the key step in implementing domain adaptation with multiple sources. We have adopted the method of selecting the most appropriate rules from R_s based on the centers of the clusters in the target data.

First, fuzzy C-means is applied to {x_1^t, ..., x_{N_t}^t} to find the centers of the clusters in the target data, denoted as

v_1^t, v_2^t, ..., v_{ct}^t. (9)

Then, based on the obtained centers, the ct rules that satisfy

R_t = {r(v_i^s, a_i^s) ∈ R_s | dist(v_i^s, v_k^t) ≤ dist(v_j^s, v_k^t), ∀j ∈ {1, ..., cs}, k = 1, ..., ct} (10)

are selected from R_s.

That is, the distances between each element in {v_1^t, v_2^t, ..., v_{ct}^t} and all the centers in the source domains {v_1^s, v_2^s, ..., v_{cs}^s} are measured, and the rule corresponding to the smallest distance is selected. The ct rules in R_s for the target domain are thus selected after calculating the distances for all the elements in {v_1^t, v_2^t, ..., v_{ct}^t}. For simplicity, assume the first ct rules in R_s are the ones selected, denoted as

R_t = {r(v_1^s, a_1^s), ..., r(v_{ct}^s, a_{ct}^s)}. (11)

Step 4: Modify the selected rules to fit the target data.

The selected rules in R_t cannot be used directly to solve the regression tasks in the target domain because of the different data distributions. Thus, techniques we presented in a previous paper [32] are used to modify the fuzzy rules by changing the input/output spaces through mappings. Fig. 3 shows the modifications to the T–S model.

Fig. 3. Modified T–S model with changed inputs and outputs.

Comparing the structure of the T–S model in Fig. 1 with that in Fig. 3, two modifications have been made: the input space and the output space are transformed by the mappings Φ and Ψ, respectively, indicated by the dotted lines.

The idea of changing the input space is supported by the notion that each input variable is assumed to be determined by some hidden features. As such, the different distributions of the input variables in the two domains must be due to either different hidden features or different weights of those features. Therefore, changing the input variables effectively adjusts the number and weights of these hidden features, so the input distribution becomes more compatible with the target data.

Unlike classification tasks, where the results largely depend on the distribution and structure of the data, regression prediction tasks rely on more complicated factors. For instance, in a T–S fuzzy regression model, the data distributions only determine the conditions of the fuzzy rules, i.e., whether or not each instance adheres to a particular fuzzy rule. The conclusions, i.e., the linear functions, are governed by other factors that have a more critical impact on the final output. This is also the main reason that unsupervised domain adaptation is not feasible for regression tasks where only unlabeled data are available. Thus, changing the output space, i.e., the consequences of the fuzzy rules, is both essential and effective for regression tasks.
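Steps 2 and 3 above can be sketched as follows. The paper samples an IGMM; here, scikit-learn's Dirichlet-process Gaussian mixture is used as a practical approximation for estimating ct, with the 0.05 weight threshold being an arbitrary illustrative choice, and k-means again stands in for fuzzy C-means when locating the target cluster centers. The selection itself follows (10): each target center keeps the source rule whose prototype is nearest.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import BayesianGaussianMixture

def estimate_rule_count(Xt, max_components=10):
    """Approximate the IGMM step: count components that retain real weight."""
    dpgmm = BayesianGaussianMixture(
        n_components=max_components,
        weight_concentration_prior_type="dirichlet_process").fit(Xt)
    return int(np.sum(dpgmm.weights_ > 0.05))   # threshold is an assumption

def select_rules(Xt, V_s, A_s, ct):
    """For each target cluster center, keep the nearest source rule."""
    V_t = KMeans(n_clusters=ct, n_init=10).fit(Xt).cluster_centers_
    d = np.linalg.norm(V_t[:, None, :] - V_s[None, :, :], axis=2)  # (ct, cs)
    idx = d.argmin(axis=1)
    return V_s[idx], A_s[idx]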
Fig. 4. Nonlinear mappings structure.

The method of constructing the mappings for the input and output spaces is the same: nonlinear functions are used to build the mappings. Each nonlinear function is constructed through a network composed of P nodes in the hidden layer and a single node in the output layer. The transformation of the jth input variable of data x_k^t is shown in Fig. 4 as an example of the nonlinear mapping for each input variable.

The activation functions of the nodes in the hidden layer are sigmoid functions, each governed by two parameters. Therefore, as shown in Fig. 4, the transformed jth input variable of data x_k^t is

Φ_j(x_{kj}^t) = \sum_{p=1}^{P} w_{jp} z_{kjp}^t (12)

where w_{jp} indicates the weight of the pth node's contribution to the output, and z_{kjp}^t = 1 / (1 + e^{−α_{jp}(x_{kj}^t − β_{jp})}), j = 1, ..., n, p = 1, ..., P, α_{jp} > 0.

There are three ways to change the T–S model: changing the input space, changing the output space, and changing both the input and output spaces. The application of these methods is discussed in our previous paper [32], but, in summary, we have found that no single method always produces the best performance. Rather, specific datasets are generated to simulate different cases of fuzzy rule-based domain adaptation, and the corresponding approach is applied to modify the T–S model. For example, one may generate two datasets that have the same input distributions but different linear functions, a case that calls for changing the output space. Overall, changing the input space is superior in cases where the source and target data have different input distributions and linear functions, owing to the optimization process used to modify the spaces. But sometimes optimizing the mapping parameters for the output space can cover the gap between the input data. Therefore, selecting the best method for modifying the T–S model is problem-oriented and depends heavily on the datasets.

Following our previous findings, we suggest trying all three methods and choosing the one with the best performance. The small number of parameters in Φ and Ψ makes optimizing and constructing the transformation mappings highly efficient, even when the three approaches are implemented simultaneously.

After the transformation mappings are applied, the rules in R_t take on a new representation:

if x_k^t is Φ(A_i(x_k^t, v_i^s)), then y_k^t is Ψ(L_i(x_k^t, a_i^s)), i = 1, ..., ct. (13)

The overall algorithm for domain adaptation using multiple sources in homogeneous space is provided in Algorithm 1.

Algorithm 1: Homogeneous Domain Adaptation Using Multiple Sources.
Input: R_{s1}, ..., R_{sh}, and T
Output: Y^U for T^U
1. Combine all the rules in R_{s1}, ..., R_{sh} to get R_s.
2. Determine the number of rules ct in T using IGMM.
3. Select the rules from R_s to get R_t for the target domain.
   3.1 Find the centers of the clusters in T: {v_1^t, ..., v_{ct}^t}
   3.2 For each v_k^t, calculate dist(v_j^s, v_k^t), find the smallest distance, and take the corresponding rule
   3.3 Obtain the selected rules R_t
4. Modify the selected fuzzy rules to fit the target data.
   4.1 Change the input space
   4.2 Change the output space
   4.3 Change the input and output spaces
   4.4 Compare the above three models and choose the one with the best performance
5. Use the modified and optimized rules to predict the labels Y^U for T^U.
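The following sketch implements the input mapping of (12) together with a simple random-search optimizer over its parameters, scored by prediction error on the labeled target data T^L. The random search is an illustrative stand-in for the authors' optimization procedure, and model_predict is a hypothetical callable that evaluates the selected rules on mapped inputs.

import numpy as np

def phi(x_j, w, alpha, beta):
    """Transform one input variable; w, alpha, beta each have P entries."""
    z = 1.0 / (1.0 + np.exp(-alpha[None, :] * (x_j[:, None] - beta[None, :])))
    return z @ w                                       # equation (12)

def fit_mappings(model_predict, XL, yL, n_vars, P=3, n_trials=500, seed=0):
    """Random search over all 3*P*n_vars mapping parameters on T^L."""
    rng = np.random.default_rng(seed)
    best, best_rmse = None, np.inf
    for _ in range(n_trials):
        params = [(rng.normal(size=P),
                   np.abs(rng.normal(size=P)) + 1e-3,  # alpha_jp > 0
                   rng.normal(size=P)) for _ in range(n_vars)]
        XL_m = np.column_stack([phi(XL[:, j], *params[j])
                                for j in range(n_vars)])
        rmse = np.sqrt(np.mean((model_predict(XL_m) - yL) ** 2))
        if rmse < best_rmse:
            best, best_rmse = params, rmse
    return best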
IV. MULTIPLE-SOURCE DOMAIN ADAPTATION IN HETEROGENEOUS SPACE

This section discusses domain adaptation problems involving multiple sources in heterogeneous space. The symbolic representation of the heterogeneous domain adaptation problem is provided first to make the following discussion clearer. Then, the proposed method for implementing multiple-domain knowledge transfer is presented with detailed procedures.

A. Problem Statement

Suppose the datasets in the h source domains and the target domain are S^1, ..., S^h and T:

S^1 = {(x_1^{s1}, y_1^{s1}), ..., (x_{N_{s1}}^{s1}, y_{N_{s1}}^{s1})}
...
S^h = {(x_1^{sh}, y_1^{sh}), ..., (x_{N_{sh}}^{sh}, y_{N_{sh}}^{sh})}
T = {T^L, T^U} = {(x_1^t, y_1^t), ..., (x_{N_{t1}}^t, y_{N_{t1}}^t), x_{N_{t1}+1}^t, ..., x_{N_t}^t}. (14)

Unlike homogeneous cases, the number of features in S^1, ..., S^h differs from that in T, i.e., the dimensions of the input data in {x_1^{s1}, ..., x_{N_{s1}}^{s1}}, ..., {x_1^{sh}, ..., x_{N_{sh}}^{sh}} are not identical to those in {x_1^t, ..., x_{N_t}^t}. In this article, particularly, we have concentrated on cases where the h source domains share the same feature space but the feature distributions are different.

Since the dimensions of the input data in the source domains differ from those in the target domain, it is impossible to apply the models built for the source domains to solve regression tasks in the target domain. Furthermore, merging and transferring knowledge from multiple domains is an even more challenging problem. This section presents the multiple-source transfer learning method for heterogeneous space. The specific procedures are described in detail in the following section.

B. Transfer Learning With Multiple-Source Domains in Heterogeneous Space

Since knowledge transfer in heterogeneous space is much more complicated and challenging, knowledge in the source domains needs to be transferred in more than one way to guarantee optimal results. We have incorporated two different forms of knowledge transfer in our method: data transfer and rule transfer.

The processes for using knowledge from the source domains in the form of data and in the form of rules are shown in Figs. 5 and 6, respectively. For simplicity, h is set to two in the figures as an example.

Fig. 5. Transfer learning based on combined data.
Fig. 6. Transfer learning based on combined rules.

The approach for transferring knowledge in the form of data comprises four steps.

Step A.1: Combine all the data in the h source domains indistinguishably, denoted as S:

S = {(x_1^s, y_1^s), ..., (x_{N_{s1}+···+N_{sh}}^s, y_{N_{s1}+···+N_{sh}}^s)}. (15)

Step A.2: Extract the latent feature space L_s of the combined source and target domains using canonical correlation analysis (CCA). CCA connects two sets of variables by finding the linear combinations of variables that maximally correlate. Typically, CCA serves two purposes: data reduction, explaining the covariation between two sets of variables using a small number of linear combinations; and data interpretation, finding features (canonical variates) that are important for explaining the covariation between sets of variables. We therefore apply CCA here to extract a latent feature space from the source and target data in an unsupervised manner. Two mappings are then learned that map the source and target data into a new latent feature space in which the data distributions of all domains are quite similar. The data take on new representations as follows:

S̄ = {(x̄_1^s, y_1^s), ..., (x̄_{N_{s1}+···+N_{sh}}^s, y_{N_{s1}+···+N_{sh}}^s)}
T̄ = {T̄^L, T̄^U} = {(x̄_1^t, y_1^t), ..., (x̄_{N_{t1}}^t, y_{N_{t1}}^t), x̄_{N_{t1}+1}^t, ..., x̄_{N_t}^t}. (16)

Step A.3: Construct the source model using the source data in the new representation.

Based on the combined source dataset S̄, a T–S fuzzy model is built and a set of fuzzy rules is obtained:

if x̄_k^s is A_i(x̄_k^s, v̄_i^s), then y_k^s is L_i(x̄_k^s, ā_i^s), i = 1, ..., ct (17)

where ct is the number of clusters in the target data (determined by applying IGMM).

Step A.4: Transfer the fuzzy rules from the combined source domain to the target domain.

The fuzzy rules of the source domain obtained in Step A.3 are modified by changing the input or output space, and the transformation mapping parameters are optimized using T̄^L. The rules are thereby transferred to fit the target data:

if x_k^t is Φ(A_i(x_k^t, v̄_i^s)), then y_k^t is Ψ(L_i(x_k^t, ā_i^s)), i = 1, ..., ct. (18)

The approach for knowledge transfer in the form of rules contains five steps.

Step B.1: Extract the latent feature spaces from the source domains separately.

Based on each source dataset {x_1^{sj}, ..., x_{N_{sj}}^{sj}} (j = 1, ..., h) and the target data {x_1^t, ..., x_{N_t}^t}, apply CCA and extract a latent feature space L_{sj}.
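A sketch of this CCA step using scikit-learn is shown below. Note that sklearn's CCA expects two views with paired samples of equal count, and the paper does not spell out how source and target instances are paired; random subsampling to a common size is purely an illustrative assumption here.

import numpy as np
from sklearn.cross_decomposition import CCA

def latent_space(Xs, Xt, dim, seed=0):
    """Learn mappings of both domains into a shared dim-dimensional space."""
    rng = np.random.default_rng(seed)
    m = min(len(Xs), len(Xt))
    cca = CCA(n_components=dim)
    cca.fit(Xs[rng.choice(len(Xs), m, replace=False)],
            Xt[rng.choice(len(Xt), m, replace=False)])
    Xs_bar, Xt_bar = cca.transform(Xs, Xt)   # new representations as in (16)
    return Xs_bar, Xt_bar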
The data in S^j and T are then converted to

S̃^j = {(x̃_1^{sj}, y_1^{sj}), ..., (x̃_{N_{sj}}^{sj}, y_{N_{sj}}^{sj})}
T̃^j = {T̃^{jL}, T̃^{jU}} = {(x̃_1^{tj}, y_1^{tj}), ..., (x̃_{N_{t1}}^{tj}, y_{N_{t1}}^{tj}), x̃_{N_{t1}+1}^{tj}, ..., x̃_{N_t}^{tj}}. (19)

Note that, with this technique, the dimensions of the h latent feature spaces must be the same.

Step B.2: Build the T–S fuzzy models for the h source domains separately in the new latent feature spaces, and construct the h corresponding sets of fuzzy rules. Suppose the obtained fuzzy rules are

R_{S̃^1} = {r̃_1^{s1}, r̃_2^{s1}, ..., r̃_{c1}^{s1}}
...
R_{S̃^h} = {r̃_1^{sh}, r̃_2^{sh}, ..., r̃_{ch}^{sh}}. (20)

The rules in R_{S̃^j} (j = 1, ..., h) are represented as

if x̃_k^{sj} is A_i(x̃_k^{sj}, ṽ_i^{sj}), then y_k^{sj} is L_i(x̃_k^{sj}, ã_i^{sj}), i = 1, ..., cj (21)

where cj is the number of clusters in the jth source domain.

Step B.3: Combine the fuzzy rules in the source domains.

Since the dimensions of the h latent feature spaces are identical, the fuzzy rules for the source domains can easily be combined, denoted as R_S = {r_1^s, r_2^s, ..., r_{c1+···+ch}^s}. Each rule in R_S is subsequently represented as

if x̃_k^s is A_i(x̃_k^s, ṽ_i^s), then y_k^s is L_i(x̃_k^s, ã_i^s), i = 1, ..., c1 + ··· + ch. (22)

Step B.4: Modify the fuzzy rules in R_S to suit the target data using the same process as in Step A.4.

Step B.5: Select ct rules from R_S using formula (10) and modify them.

In homogeneous situations, not all the rules in the combined set are used in the knowledge transfer process; rather, only some rules are selected and then modified for the target domain. In heterogeneous situations, transferring features from the original space into the new latent feature space inevitably results in some information loss, giving rise to uncertainty in the domain adaptation process. Therefore, to guarantee the best quality of transfer, both a strategy that uses all the combined rules and a strategy that relies only on selected rules should be tested, and the strategy with the better results should be chosen.

In the two approaches, Steps A.1–A.4 and Steps B.1–B.5, a latent feature space is used to transform all the data into a space of unified dimension, so as to convert the heterogeneous transfer learning problem into a homogeneous one. Therefore, the rule adaptation method in Algorithm 1 can be applied to implement the subsequent rule transfer, and this procedure does not have to be repeated in Steps A.4 and B.5.

The algorithm for domain adaptation using multiple sources in heterogeneous space is provided in Algorithm 2.

Algorithm 2: Heterogeneous Domain Adaptation Using Multiple Sources.
Input: S^1, ..., S^h and T
Output: Y^U for T^U
1. Apply data-based multiple-source transfer
   1.1 Combine S^1, ..., S^h into S
   1.2 Extract L_s using S and T, and convert S and T to S̄ and T̄
   1.3 Train fuzzy rules using S̄
   1.4 Modify the rules using T̄^L
   1.5 Get labels for T^L
2. Implement rule-based multiple-source transfer
   2.1 Extract L_{sj} using S^j and T, j = 1, ..., h
   2.2 Convert S^j to S̃^j, and train the rules R_{S̃^j}
   2.3 Combine R_{S̃^1}, ..., R_{S̃^h} to get R_S
   2.4 Modify the rules in R_S by constructing mappings
   2.5 Select rules from R_S and modify them
   2.6 Get labels for T^L
3. Compare the results of 1.5 and 2.6, and select the better model
4. Predict the labels for T^U
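At the top level, Algorithm 2 can be sketched as follows. The helpers data_based_transfer and rule_based_transfer are hypothetical stand-ins for Steps 1.1–1.4 and 2.1–2.5, respectively; each is assumed to return a callable model, and the better of the two on T^L is kept, as in Steps 3 and 4.

import numpy as np

def rmse_on(model, X, y):
    return np.sqrt(np.mean((model(X) - y) ** 2))

def heterogeneous_transfer(sources, T_L, T_U):
    """sources: list of (X_sj, y_sj); T_L = (X_L, y_L); T_U = unlabeled X."""
    m1 = data_based_transfer(sources, T_L)   # hypothetical: steps 1.1-1.4
    m2 = rule_based_transfer(sources, T_L)   # hypothetical: steps 2.1-2.5
    X_L, y_L = T_L
    best = min((m1, m2), key=lambda m: rmse_on(m, X_L, y_L))  # step 3
    return best(T_U)                         # step 4: labels for T^U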
V. EXPERIMENTS IN HOMOGENEOUS SPACE

We executed a set of experiments to validate and analyze the presented method for domain adaptation problems in which multiple source domains are available. Section V-A explains how synthetic datasets were designed and used to simulate multiple-source scenarios, presents the experiments verifying the effectiveness of the new method, and discusses the application scope for multiple sources. The experiments in Section V-B involve real-world datasets and compare the performance of our method with several state-of-the-art methods on multiple-source domain adaptation problems. The sensitivity of the parameters is also analyzed with practical cases.

A. Experiments on Synthetic Datasets

Several synthetic datasets were generated to simulate different multiple-source transfer learning scenarios. Although real cases may be quite different from the scenarios we created using the synthetic datasets, the results and patterns obtained do provide some guidance for knowledge transfer in practical cases.

There are two intuitive baselines for transfer learning problems with multiple sources. The first baseline is a model that contains all the fuzzy rules from all the source domains. The second is a single-source-domain model. We evaluated the performance of our method against three types of models: no-transfer models, single-source transfer models, and multiple-source transfer models. A no-transfer model means the source model is used directly to solve the target task; a single-source transfer model indicates that only one domain has been used as the source; and a multiple-source transfer model leverages knowledge from multiple source domains to support the regression tasks in the target domain.

In our experiments, all models were tested on the unlabeled target datasets T^U to verify their ability to solve regression tasks in the target domain.

In this set of experiments, we generated four datasets with two, three, four, and five clusters. More details about how the datasets were generated can be found in our previous paper [35]. For each experiment, we chose two datasets to serve as the source domains and one to serve as the target domain. Table I lists the datasets used for each experiment, with each dataset denoted by the number of clusters it contains. For example, in Experiments 1–3, the dataset with two clusters is selected as the target domain, and two of the remaining three datasets are selected as the source domains, which results in three configurations. Similar operations were conducted for Experiments 4–6 and 7–9. The dataset with five clusters was not chosen as the target domain because the aim is to construct an environment with fewer rules in the target domain than in the combined source domains; a sufficient number of source rules benefits the selection process and also helps guarantee the model performance in the target domain. The nine experimental configurations are shown in Table I, and the nine groups of results are shown in Table II.

TABLE I: DATASETS USED IN EACH OF THE NINE EXPERIMENTS.
TABLE II: RMSE OF THREE TYPES OF TRANSFER LEARNING (NO, SINGLE, AND MULTIPLE).

We tested these nine dataset configurations with three types of models: no transfer, single-source transfer, and multiple-source transfer. The "no-transfer" entry actually contains two models, one prediction model for each of the two source domains. Similarly, "single-source transfer" involves two models. The "multiple-source transfer" entry also contains two models: one is the T–S model with all the fuzzy rules from both source domains; the other is the model built using our proposed method. Root mean square error (RMSE) is used to measure the regression performance. All models were constructed using five-fold cross validation; the results are therefore reported as "mean ± variance," with the best performance highlighted in bold.

The results show that the no-transfer method returned high mean values, which reflects the gap between the source domains and the target domain. Comparing the two forms of multiple-source transfer, selecting a set of appropriate rules worked better than the brute-force approach of combining all the rules, a clear indication that combining every fuzzy rule leads to redundancy and inferior results. For the most part, the multiple-source method also worked better than single-source transfer; Experiments 8 and 9 were the exceptions. Upon further analysis, we attribute the success of the single-source method in these two cases to a poor selection of rules, which highlights that multiple-source transfer has some limitations.

Hence, in the following part, we explore the scope of applications to provide some guidance for practical uses of multiple-source transfer learning.

Three experiments are implemented in this part. The input datasets in each experiment are shown in Figs. 7–9. Each figure represents a different input data scenario, where the points in blue indicate input data from Source Domain 1, yellow indicates Source Domain 2, and red indicates the target input data. The linear functions used to generate these datasets are the same in the source and target domains.

Fig. 7. Scenario A: Different distributions in three domains.
Fig. 8. Scenario B: Similar distributions in three domains.
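A sketch of the evaluation protocol described above, five-fold cross validation with RMSE reported as mean and variance, is given below; build_and_transfer is a hypothetical stand-in for whichever model variant is being tested.

import numpy as np
from sklearn.model_selection import KFold

def evaluate(build_and_transfer, X_t, y_t):
    """Return (mean, variance) of RMSE over five folds of the target data."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                     random_state=0).split(X_t):
        model = build_and_transfer(X_t[train_idx], y_t[train_idx])
        pred = model(X_t[test_idx])
        scores.append(np.sqrt(np.mean((pred - y_t[test_idx]) ** 2)))
    return np.mean(scores), np.var(scores)   # the tables' "mean +/- variance"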
TABLE III: TRANSFER PERFORMANCE OF THE THREE SCENARIOS IN FIGS. 7–9.

In Scenario A, the distributions of all three domains are quite dissimilar, and both Source Domain 1 and Source Domain 2 are very different from the target data, as shown in Fig. 7. The results in Table III show that our method of combining rules from multiple source domains has the best performance in this case.

Scenarios B and C are two special cases that are more applicable to a single-source transfer method. However, they require a strict condition—the data structures in all the domains must be identified.

In Scenario B, the distributions of all three domains are quite similar, but the discrepancies between each source domain and the target domain differ, as shown in Fig. 8. The results show that the single-source transfer methods are superior to the multisource transfer methods, and that the single-source transfer method based on Source Domain 2 performed best. This is because the data in both source domains have distributions similar to the target domain, and Source Domain 2 is more similar to the target domain than Source Domain 1.

In Scenario C, only one source domain, Source Domain 1, has a data structure similar to the target data, while Source Domain 2 has a dissimilar data structure, as shown in Fig. 9. Thus, the single-source transfer method with Source Domain 1 is superior to the other methods.

Fig. 9. Scenario C: Similar distributions in Source 1 and Target, but different in Source 2.

The results of implementing transfer learning with the different models are shown in Table III, with the best performance indicated in bold.

Analyzing the results of these three scenarios, we can draw two conclusions. First, if all the source domains have a structure dissimilar to the target domain, or the relationships between the domains' data structures are implicit, then selecting an appropriate subset of rules from the multiple source domains is the optimal choice. Second, if a source domain exists with a data structure that is similar to the target data, the single-source transfer method will likely provide the best performance. Furthermore, the closer the source domain is to the target domain, the better the transfer result. Since, in real-world applications, it is difficult to identify the structure, especially in high-dimensional datasets, our method can play a significant role in practical multiple-source cases.

In all the abovementioned experiments, for simplicity, the source domains were designed to have the same number of data. Note that unbalanced data across the multiple source domains will not affect the performance of the method, since the performance of a source model can be guaranteed as long as the source data cover all the clusters.

B. Experiments on Real-World Datasets

In this section, we used real-world datasets to validate the effectiveness of the proposed multiple-source transfer method. Since studies on domain adaptation for regression problems are scarce, there are no public datasets for these scenarios. We therefore turned to five datasets from the UCI machine learning repository and modified them to simulate a range of multiple-source transfer learning scenarios. Since how the datasets were modified is crucial, a detailed description follows using two datasets as examples.

The "condition-based maintenance (CBM) of naval propulsion plants" dataset contains 14 features, such as ship speed and gas turbine shaft torque, which are used to predict gas turbine decay state coefficients. We split the data according to ship speed: speeds greater than ten knots formed the source domains (7500 instances), and the remaining 3500 instances were used as the target domain. The source instances were further divided into 4000 for Source 1 and 3500 for Source 2. All instances in the source domains were labeled, with only ten labeled instances in the target domain.

The "combined cycle power plant" (CCPP) dataset contains four attributes: temperature, ambient pressure, relative humidity, and exhaust vacuum, which are used to predict the net hourly electrical energy output. A total of 6800 instances with a temperature of not greater than 25 °C formed Source Domains 1 and 2, evenly split into groups of 3400. The remaining 2500 instances formed the target domain. Again, all source instances were labeled, and ten target instances were labeled.

The other three datasets are the "Istanbul stock exchange," "air quality," and "airfoil self-noise" datasets. For more details, please refer to the UCI machine learning repository.

We performed two groups of experiments to both validate our method and analyze the impact of the number of clusters.
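As an illustration of how a split like the CBM one above can be scripted, the sketch below partitions a local copy of the data by ship speed; the file name and column name are assumptions, not part of the UCI distribution.

import numpy as np
import pandas as pd

df = pd.read_csv("cbm.csv")                           # hypothetical local copy
src = df[df["ship_speed"] > 10].reset_index(drop=True)   # 7500 source rows
tgt = df[df["ship_speed"] <= 10].reset_index(drop=True)  # 3500 target rows
source1, source2 = src.iloc[:4000], src.iloc[4000:7500]
labeled_idx = np.random.default_rng(0).choice(len(tgt), size=10, replace=False)
T_L = tgt.iloc[labeled_idx]                           # ten labeled target instances
T_U = tgt.drop(index=tgt.index[labeled_idx])          # unlabeled remainder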
In the first set of experiments, we compared our method with several state-of-the-art transfer learning methods, i.e., TSK [36], TCA [37], SA [38], and GFK [39]. All these methods are able to solve both classification and regression tasks but have not been presented as solutions for multiple-source situations. To be fair, we ran these methods on the combined data from all the source domains for knowledge transfer.

Although there are some methods and heuristic algorithms for determining the number of clusters, it is often difficult to identify the number of clusters to use when constructing a T–S model from real-world data—especially high-dimensional data. Hence, in the second set of experiments, we treated the number of clusters as a hyperparameter and discuss its impact on the transfer learning results.

The results of the two groups of experiments are shown in Tables IV and V. They show superior performance by our proposed method on all five datasets. Table V shows no obvious impact on prediction accuracy from varying the number of clusters; however, in most cases, the best performance appeared with fewer clusters.

TABLE IV: COMPARISON OF OUR METHOD WITH TSK, TCA, SA, AND GFK.
TABLE V: PERFORMANCE WITH VARYING NUMBERS OF CLUSTERS (RULES) IN THE TARGET DOMAIN.

Besides the five datasets, we also applied our transfer learning method to a new, larger, and more complex problem, predicting PM 2.5 concentration in different cities, to further validate it in the multiple-source scenario. The dataset contains PM 2.5 data and related meteorological data for five big cities in China [Beijing (BJ), Shanghai (SH), Guangzhou (GZ), Chengdu (CD), and Shenyang (SY)] from 2013 to 2015. Besides the PM 2.5 values, 13 main attributes describe the data: year, month, day, hour, season, dew point, temperature, humidity, pressure, combined wind direction, cumulated wind speed, hourly precipitation, and cumulated precipitation. These 13 attributes are used as the inputs to predict the PM 2.5 concentration.

To simulate the multisource transfer learning scenario, four groups of experiments were designed to implement and validate our method in handling multiple sources. The transfer performance of these four groups of experiments is shown in Tables VI to IX. The first group executes knowledge transfer among the five cities in the year 2013, and Table VI displays the transfer performance (RMSE). The third column in Table VI indicates the city selected as the target domain, and the second column shows the two cities chosen from the remaining cities as the source domains. Two types of transfer learning methods, single-source transfer and multiple-source transfer, are implemented. Here, "single-source transfer" involves two models: one transferred with the fuzzy rules from Source Domain 1 and the other transferred with the fuzzy rules from Source Domain 2. "Multiple-source transfer" contains three models: 1) combining all the data across the source domains; 2) combining the rules from the source domains; and 3) selecting rules from the source domains using our proposed method. Similarly, the experiments in Tables VII and VIII implement transfer learning among the five cities in the years 2014 and 2015, respectively. The last group of experiments, in Table IX, uses the data from the years 2013 and 2014 to predict the PM 2.5 concentration in 2015.

TABLE VI: TRANSFER PERFORMANCE (RMSE) IN FIVE CITIES IN THE YEAR 2013.
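The experimental grid for one year can be sketched as follows, where run_transfer is a hypothetical stand-in for the whole proposed pipeline: every city serves as the target once, with each pair of the remaining cities as sources.

from itertools import combinations

cities = ["BJ", "SH", "GZ", "CD", "SY"]
for target in cities:
    for s1, s2 in combinations([c for c in cities if c != target], 2):
        rmse = run_transfer(sources=(s1, s2), target=target, year=2013)
        print(f"{s1}+{s2} -> {target}: RMSE {rmse:.3f}")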
TABLE VII: TRANSFER PERFORMANCE (RMSE) IN FIVE CITIES IN THE YEAR 2014.
TABLE VIII: TRANSFER PERFORMANCE (RMSE) IN FIVE CITIES IN THE YEAR 2015.
TABLE IX: TRANSFER OF DATA FROM THE YEARS 2013 AND 2014 TO 2015.

The RMSE values shown in Tables VI to IX indicate that our proposed method is superior to the single-source transfer learning methods and to the other two multiple-source transfer learning methods. This further validates the effectiveness of our method.

VI. EXPERIMENTS IN HETEROGENEOUS SPACE

Our experiments in heterogeneous settings also involved synthetic and real-world datasets.

A. Synthetic Datasets

We designed two groups of experiments: one with three-dimensional (3-D) data in the source domains and 2-D data in the target domain; the other with 4-D data as the source and 3-D data as the target. Ten experiments were executed in each group. The settings for one of the ten experiments in each group are provided in Tables X and XI as examples to illustrate the data structures in the three domains.

TABLE X: DATASET WITH 3-D SOURCE DATA AND 2-D TARGET DATA.
TABLE XI: DATASET WITH 4-D SOURCE DATA AND 3-D TARGET DATA.

The parameters in Tables X and XI show that, although the source and target domains have different dimensions, there are always some shared features with similar values, which represent the relevance between the domains. However, there is always a domain whose linear coefficients are quite different from those of the other two.

Table XII reports the experiments in which the data in the source domains are 3-D and the data in the target domain are 2-D; in Table XIII, the source data are 4-D and the target data are 3-D. Tables XII and XIII show the RMSE of the ten experiments in each group using the single-source transfer method plus three variations of the multiple-source transfer method: combining all the data across the source domains, combining the rules from the source domains, and selecting only some rules from the source domains. The best results are indicated in bold.
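For readers who wish to reproduce the spirit of these synthetic settings, the sketch below generates one configuration with 3-D sources and a 2-D target from Gaussian clusters with linear labels; all means and coefficients are arbitrary illustrative values, not the settings in Tables X and XI.

import numpy as np

rng = np.random.default_rng(0)

def make_domain(centers, coeffs, n_per_cluster=200, noise=0.1):
    """Gaussian clusters around the given centers; labels are linear in x."""
    X = np.vstack([rng.normal(c, 1.0, size=(n_per_cluster, len(c)))
                   for c in centers])
    Xb = np.hstack([X, np.ones((len(X), 1))])       # bias term
    y = Xb @ coeffs + rng.normal(0, noise, size=len(X))
    return X, y

Xs1, ys1 = make_domain([[0, 0, 0], [4, 4, 4]], np.array([1.0, -2.0, 0.5, 3.0]))
Xs2, ys2 = make_domain([[1, 1, 1], [5, 5, 5]], np.array([1.2, -1.8, 0.6, 2.5]))
Xt,  yt  = make_domain([[0, 0], [4, 4]],       np.array([1.0, -2.0, 3.0]))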
TABLE XII: HETEROGENEOUS TRANSFER WITH 3-D SOURCE DATA AND 2-D TARGET DATA.
TABLE XIII: HETEROGENEOUS TRANSFER WITH 4-D SOURCE DATA AND 3-D TARGET DATA.

Table XII shows that the best results in eight of the experiments were obtained by combining all the data from the source domains, while combining all the rules from both source domains worked best in the remaining experiments. Table XIII, in contrast, shows that selecting only some of the rules was advantageous in most of the experiments, while two experiments benefited from combining all the data in the source domains.

These results support our conclusions from the homogeneous synthetic datasets, i.e., our method of using knowledge from multiple source domains performs well in multiple-source situations. However, the settings of these two groups of experiments cannot cover all cases, and there are some cases in which a multiple-source method is not suitable. The following experiments illustrate such inappropriate situations.

Following the same procedures, we designed ten further experiments and provide the data structure of one of them in Table XIV as an example.

TABLE XIV: DATASET SHOWING A LIMITATION OF MULTIPLE-SOURCE TRANSFER.

The feature values in all three domains are similar, but the most important point is that the coefficients of the linear functions in the two source domains are exactly the same, and they are also almost equal to those of the target domain.

The results of these ten experiments are shown in Table XV. As the results show, the multiple-source method was inferior to the single-source method in each of these situations, which highlights that proper domain selection is key to producing good results when drawing on multiple source domains to transfer knowledge. Selecting an appropriate domain is a crucial problem and will therefore be studied in future work.

B. Real-World Datasets

For this set of experiments, we used the "airfoil self-noise" dataset from UCI, divided according to frequency. Data with a frequency greater than 800 Hz formed the source domains (900 instances split equally between Sources 1 and 2); the remaining 450 instances were used as the target domain. Five attributes—frequency, angle of attack, chord length, free-stream velocity, and suction side displacement thickness—were used to predict the scaled sound pressure level. We removed suction side displacement thickness from the target domain to replicate a heterogeneous setting. All the instances in the source domains were labeled; ten instances in the target domain were labeled. The results are shown in Table XVI with the best results in bold. As shown, combining all the data in the source domains gave the best performance in half of the experiments, and selecting some of the rules gave the best performance in the remaining half.
TABLE XV
RESULTS OF EXPERIMENTS FOR LIMITATION OF MULTIPLE-SOURCE TRANSFER

TABLE XVI
RESULTS FOR THE HETEROGENEOUS REAL-WORLD DATASETS

half. Taken overall, we therefore conclude that leveraging multi- domain selection is a key factor in the methods’ efficacy, which
ple domains as sources performs better than using a single source we intend to examine as an effective way to avoid negative
in heterogeneous situations. transfer in future studies.

VII. CONCLUSION AND FURTHER STUDY

This article explores transfer learning problems in which multiple-source domains are available. In homogeneous space, our method is based on combining the fuzzy rules in the source domains, selecting some of those rules, and modifying them to handle regression tasks in a target domain based on labeled target data. We further generalized the idea of using multiple-source domains to suit heterogeneous space. Unlike homogeneous cases, where only fuzzy rules are available for transfer, in heterogeneous cases the source data are also available; these data are used to extract a shared latent feature space for transfer along with the rules. Both methods rely on the same basic procedures.

We conducted experiments on synthetic datasets to simulate complex cases of knowledge transfer, and the results validate that our methods perform better than no-transfer and single-source transfer approaches. Further experiments using real-world datasets support these findings and show that our method of drawing on multiple-source domains provides superior performance compared with several state-of-the-art transfer learning methods.

The methods presented in this article aim to deal with transfer learning in situations with multiple-source domains, especially source domains that share the same feature space. In future studies, we plan to study more complicated cases, for example, cases in which the dimensions of the multiple-source domains are also different. In addition, we will explore wider applications for these transfer learning techniques, such as activity recognition and robotics. Finally, these experiments have revealed that source domain selection is a key factor in the methods' efficacy, which we intend to examine as an effective way to avoid negative transfer in future studies.
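For readers who wish to prototype the heterogeneous pipeline summarized above, one simple way to build a shared latent feature space is to pool the features common to all domains and extract their principal directions, so that source rules and target data can be expressed in the same coordinates. The sketch below is a PCA-style stand-in for illustration only; the latent-space construction developed earlier in this article differs in detail, and the function name `shared_latent_space` is ours.

```python
import numpy as np

def shared_latent_space(X_sources, X_target, shared_cols, dim=3):
    """Project all domains onto principal directions of their shared
    features. Illustrative stand-in, not this article's construction."""
    # Keep only the columns each source shares with the target domain.
    pooled = np.vstack([X[:, shared_cols] for X in X_sources] + [X_target])
    mu = pooled.mean(axis=0)
    # Principal directions of the pooled, centered data via SVD.
    _, _, Vt = np.linalg.svd(pooled - mu, full_matrices=False)
    P = Vt[:dim].T                                  # shape (d_shared, dim)

    def project(X):
        return (X - mu) @ P

    Z_sources = [project(X[:, shared_cols]) for X in X_sources]
    return Z_sources, project(X_target), project
```

Fuzzy rules would then be learned (or re-expressed) in the latent coordinates, after which a merge-and-select procedure such as the one sketched earlier applies unchanged.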


Jie Lu (F'18) received the Ph.D. degree in information systems from the Curtin University of Technology, Perth, WA, Australia, in 2000.

She is currently a Distinguished Professor, the Director of the Centre for Artificial Intelligence, and the Associate Dean (Research Excellence) with the Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia. She has authored or coauthored six research books and more than 450 papers in refereed journals and conference proceedings. Her main research interests are in the areas of fuzzy transfer learning, concept drift, decision support systems, and recommender systems.

Dr. Lu is an IFSA Fellow and an Australian Laureate Fellow. She has won more than 20 ARC Laureate, ARC Discovery Project, government, and industry projects. She serves as the Editor-in-Chief for Knowledge-Based Systems (Elsevier) and the International Journal of Computational Intelligence Systems. She has delivered more than 25 keynote speeches at international conferences and chaired 15 international conferences. She was the recipient of various awards, such as the UTS Medal for Research and Teaching Integration (2010), the UTS Medal for Research Excellence (2019), the Computer Journal Wilkes Award (2018), the IEEE Transactions on Fuzzy Systems Outstanding Paper Award (2019), and the Australian Most Innovative Engineer Award (2019).

Hua Zuo received the Ph.D. degree in computer science from the University of Technology Sydney, Sydney, NSW, Australia, in 2018.

She is currently a Lecturer with the Faculty of Engineering and Information Technology, University of Technology Sydney. Her research interests include transfer learning and fuzzy systems.

Dr. Zuo is currently a member of the Decision Systems and e-Service Intelligence (DeSI) Research Laboratory, Centre for Artificial Intelligence, University of Technology Sydney.

Guangquan Zhang received the Ph.D. degree in applied mathematics from the Curtin University of Technology, Perth, WA, Australia, in 2001.

He is currently an Associate Professor and the Director of the Decision Systems and e-Service Intelligence (DeSI) Research Laboratory, Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia. He has authored four monographs, five textbooks, and 300 papers, including 154 refereed international journal papers. His research interests include fuzzy sets and systems, fuzzy optimization, fuzzy transfer learning, and fuzzy modeling in machine learning and data analytics.

Dr. Zhang has won eight Australian Research Council (ARC) Discovery Project grants and many other research grants. He was awarded an ARC QEII Fellowship. He was a Guest Editor of eight special issues of IEEE Transactions and other international journals and has cochaired several international conferences and workshops in the area of fuzzy decision-making and knowledge engineering.
