Fuzzy Multiple-Source Transfer Learning
Abstract—Transfer learning is gaining increasing attention due to its ability to leverage previously acquired knowledge to assist in completing a prediction task in a related domain. Fuzzy transfer learning, which is based on fuzzy systems and particularly fuzzy rule-based models, was developed due to its capacity to deal with uncertainty. However, one issue with fuzzy transfer learning, even in the area of general transfer learning, has not been resolved: how to combine and then use knowledge when multiple-source domains are available. This study presents new methods for merging fuzzy rules from multiple domains for regression tasks. Two different settings are separately explored: homogeneous and heterogeneous space. In homogeneous situations, knowledge from the source domains is merged in the form of fuzzy rules. In heterogeneous situations, knowledge is merged in the form of both data and fuzzy rules. Experiments on both synthetic and real-world datasets provide insights into the scope of applications suitable for the proposed methods and validate their effectiveness through comparisons with other state-of-the-art transfer learning methods. An analysis of parameter sensitivity is also included.

Index Terms—Domain adaptation, fuzzy systems, machine learning, regression, transfer learning.

I. INTRODUCTION

MACHINE learning [1] has deeply affected the great achievements gained in many areas of data science, including computer vision [2], biology [3], medical imaging [4], and business management [5]. However, fundamentally, many well-known machine-learning algorithms, such as neural networks, support vector machines (SVMs), and Bayesian networks, are supervised processes, which means the performance and generalizability of the resulting models tend to rely on massive amounts of labeled data. Unfortunately, in some fields, especially in new and emerging areas of business, gathering enough labeled data to train a model properly is difficult, even impossible. Without enough labeled data, the accuracy and generalizability of a model suffer. Thus, transfer learning [6] has emerged as a potential solution.

Transfer learning, in general, addresses the problem of how to leverage previously acquired knowledge to improve the efficiency and accuracy of learning in one domain that in some way relates to the original domain. The first, state-of-the-art survey on transfer learning [6] provides important definitions in transfer learning. As part of this review, transfer learning studies are categorized into multitask learning [7], domain adaptation [8], and cross-domain learning [9]. However, as this area has attracted many researchers, more methods are being developed to handle transfer learning problems, and survey papers are beginning to focus on precise areas, e.g., visualization [10], reinforcement learning [11], activity recognition [12], computational intelligence [13], and collaborative recommendation [14]. The current applications for transfer learning techniques are extensive—from image processing [15] to text categorization [16] to natural language processing [17] to fault diagnosis [18] and beyond.

Yet, while these existing methods have had some success in handling domain adaptation issues, most ignore the inherent phenomenon of uncertainty—a crucial factor during the knowledge transfer process [19]. There is a clear codependency between the level of certainty in learning a task and the amount of information that is available. Problems with too little information have a high degree of uncertainty. If there are too few data with labels in the target domain, only a finite amount of information can be extracted, and this leads to a high degree of uncertainty. However, the emergence of fuzzy systems has shown promising results in overcoming this problem [20].

The integration of fuzzy logic with transfer learning has drawn considerable attention in the literature. For example, researchers have applied fuzzy sets to represent linguistic variables when feature values cannot be precisely described numerically, while fuzzy distances assist the retrieval of similar cases [21]. Transferring implicit and explicit knowledge from similar domains is hidden and uncertain by nature, so using fuzzy logic and fuzzy rule theory to handle the associated vagueness and uncertainty is apt and can improve transfer accuracy. Hence, many scholars have turned to fuzzy systems as a solution to transfer learning problems with promising results. Deng et al. [22], [23] proposed a series of transfer learning methods using a Mamdani–Larsen-type fuzzy system and a Takagi–Sugeno–Kang (TSK) fuzzy model coupled with novel fuzzy logic algorithms that include definitions for two new objective functions. Furthermore, they applied their methods to scenarios with insufficient data, such as recognizing electroencephalogram signals in environments with a data shortage. Behbood et al. [24], [25] proposed a fuzzy-based transfer learning approach to long-term bank failure prediction models with source and target domains that have different data distributions. Liu et al. [26] focused on unsupervised heterogeneous domain adaptation problems, presenting a novel transfer learning model that incorporates n-dimensional fuzzy

Manuscript received January 27, 2019; revised June 17, 2019 and September 3, 2019; accepted October 31, 2019. Date of publication November 11, 2019; date of current version December 1, 2020. This work was supported by the Australian Research Council under Discovery Grant DP 170101632. (Corresponding author: Jie Lu.)
The authors are with the Decision Systems and e-Service Intelligence Laboratory, Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2019.2952792
1063-6706 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: ULAKBIM UASL - Galatasaray Universitesi. Downloaded on April 03,2022 at 12:29:06 UTC from IEEE Xplore. Restrictions apply.
LU et al.: FUZZY MULTIPLE-SOURCE TRANSFER LEARNING 3419
geometry and fuzzy equivalence relations. A metric based on n-dimensional fuzzy geometry is defined to measure the similarity of features between domains. Shared fuzzy equivalence relations then force the same number of clustering categories given the same value of α, which means knowledge can be transferred from the source domain to the target domain in heterogeneous space through the clustering categories.

Despite these advancements in fuzzy system-based transfer learning methods, there is still one main issue that has not been solved: how to merge and transfer knowledge when multiple-source domains are available. This case is quite common in the real world. For example, a company needs to determine the price for a new type of computer entering the Australian market but has very little data on consumer responses to the product. However, data for two other types of computers sold in Australia are available. So, how might these two datasets (source domains) be used to support the pricing decision at hand (target domain)?

There have already been some studies on multiple-source domain adaptation problems. Yao and Doretto [27] proposed two new algorithms, MultiSource-TrAdaBoost and TaskTrAdaBoost, which extend the boosting framework for transferring knowledge from multiple sources. These algorithms reduce negative transfer by increasing the number of sources. Tan et al. [28] presented a novel algorithm to leverage knowledge from different views and sources collaboratively by letting different views from different sources complement each other through a cotraining-style framework to reduce the differences in distribution across different domains. Beyond transferring the source data, Zhuang et al. [29] discovered a more powerful feature representation of the data when transferring knowledge from multiple-source domains to the target domain. Here, autoencoders are used to construct a feature mapping from an original instance to a hidden representation, and multiple classifiers from the source domain data are jointly trained to learn the hidden representation and classifiers simultaneously. However, these approaches were developed for classification tasks and, thus far, combining knowledge from multiple sources cannot be translated into fuzzy systems, which are superior at handling uncertainty in domain adaptation problems.

Some of our own previous research has focused on developing the domain adaptation ability of fuzzy rule-based models with regression tasks [30], [31]. We proposed a set of algorithms for two different scenarios, where the datasets for the source domain and target domain were homogeneous [32] and heterogeneous [33]. In this article, we explore the ability of fuzzy systems to deal with transfer learning problems when multiple-source domains are available based on these previous works.

The specific contribution of this article is to advance the domain adaptation ability of fuzzy rule-based systems in multiple-source environments for regression tasks. Current transfer learning methods cannot deal with regression tasks with multiple sources. In principle, sometimes a single-source transfer is better than a multidomain one and sometimes multidomain transfer is better than single-domain transfer, which is determined by the "similarity" between the source domains and the target domain. This article, in fact, aims to identify which source domain(s) are more suitable than others to transfer knowledge to a given target domain. We propose two algorithms to handle knowledge transfer from multiple-source domains to one target domain for regression tasks, and the use of fuzzy systems confers on the model the capacity to handle uncertainty in an information-insufficient environment and improves the prediction accuracy.

The remainder of this article is structured as follows. Section II presents the preliminaries of this article, including some important definitions in transfer learning, and the main prediction model applied, i.e., the Takagi–Sugeno (T–S) fuzzy model. Section III presents a four-step algorithm for the domain adaptation process using multiple-source domains in homogeneous space. Section IV presents an algorithm for multiple-source domain knowledge transfer in heterogeneous situations, which includes two approaches, implemented simultaneously—one with four steps and one with five steps. Sections V and VI present the validation tests of the two proposed algorithms using both synthetic and real-world datasets. Section VII concludes this article and outlines future work.

II. PRELIMINARIES

This section begins with some basic definitions of transfer learning, followed by an introduction to the T–S fuzzy model, which is the basic prediction model used in our multiple-source domain adaptation method.

A. Definitions

Definition 1 (Domain) [6]: A domain is denoted as D = {F, P(X)}, where F is a feature space, and P(X) is the probability distribution of the instances X = {x_1, . . . , x_n}.

Definition 2 (Task) [6]: A task is denoted as T = {Y, f(·)}, where Y ∈ R is the output, and f(·) is an objective predictive function.

Definition 3 (Transfer Learning) [6]: Given a source domain D_s, a learning task T_s, a target domain D_t, and a learning task T_t, transfer learning aims to improve learning of the target predictive function f_t(·) in D_t using the knowledge in D_s and T_s, where D_s ≠ D_t or T_s ≠ T_t.

In short, transfer learning aims to use previously acquired knowledge (from a source domain) to assist prediction tasks in a new, but related domain (the target domain).

B. T–S Fuzzy Model

A fuzzy system, in this case a T–S model, comprises a set of IF–THEN fuzzy rules in the following form:

If x is A_i(x, v_i), then y is L_i(x, a_i), i = 1, . . . , c (1)

where v_i are the prototypes, and a_i are the coefficients of the linear functions.

This T–S fuzzy model is built from a set of instances {(x_1, y_1), . . . , (x_N, y_N)} using a sequence of two procedures [34]: forming the conditions A_1, . . . , A_c through fuzzy clustering, and optimizing the parameters of the linear functions L_i(x, a_i).
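The two-procedure construction of a T–S model can be sketched in code. The following is a minimal illustration under our own simplifying assumptions, not the paper's implementation: plain k-means stands in for fuzzy clustering of the antecedents, Gaussian memberships play the role of A_i(x, v_i), and weighted least squares fits the linear consequents L_i(x, a_i). All function and parameter names here are ours.

```python
import numpy as np

def fit_ts_model(X, y, c=3, seed=0):
    """Fit a simple T-S fuzzy model: c Gaussian antecedents around
    cluster prototypes v_i, plus local linear consequents L_i(x, a_i)."""
    rng = np.random.default_rng(seed)
    # 1) Find prototypes v_i with plain k-means (a stand-in for fuzzy c-means).
    V = X[rng.choice(len(X), c, replace=False)].copy()
    for _ in range(50):
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for i in range(c):
            if np.any(labels == i):
                V[i] = X[labels == i].mean(axis=0)
    # 2) Gaussian memberships A_i(x, v_i), normalized to sum to 1 per point.
    def memberships(Xq):
        d2 = ((Xq[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2)
        return w / w.sum(axis=1, keepdims=True)
    # 3) Weighted least squares per rule for the linear coefficients a_i.
    W = memberships(X)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # affine term
    A = np.array([np.linalg.lstsq(Xb * W[:, [i]], y * W[:, i], rcond=None)[0]
                  for i in range(c)])
    # Output is the membership-weighted mix of the local linear models.
    def predict(Xq):
        Wq = memberships(Xq)
        Xqb = np.hstack([Xq, np.ones((len(Xq), 1))])
        return (Wq * (Xqb @ A.T)).sum(axis=1)
    return V, A, predict

# Usage: on a globally linear target, the mixed local models recover it.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
V, A, predict = fit_ts_model(X, y, c=2)
assert np.max(np.abs(predict(X) - y)) < 1e-3
```

Each fitted rule is exactly the pair (v_i, a_i) that the later sections merge and transfer.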
3420 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 28, NO. 12, DECEMBER 2020
III. FUZZY TRANSFER LEARNING USING MULTIPLE SOURCES IN HOMOGENEOUS SPACE

This section presents a method for transferring knowledge from multiple-source domains to the target domain in homogeneous space. The multiple-domain adaptation problem with a fuzzy rule-based model is outlined first, with formulas, and the specific challenge of implementing knowledge transfer in such cases is described. Then, the procedures of the proposed method are described in detail.

A. Problem Statement

Consider there are h source domains with large amounts of labeled data and a target domain with very little labeled data. The datasets in the multiple-source domains are denoted as S^1, . . . , S^h

S^1 = {(x^{s1}_1, y^{s1}_1), . . . , (x^{s1}_{Ns1}, y^{s1}_{Ns1})}
. . .
S^h = {(x^{sh}_1, y^{sh}_1), . . . , (x^{sh}_{Nsh}, y^{sh}_{Nsh})}. (3)

(x^{sj}_k, y^{sj}_k) is the kth input–output data pair in the jth source domain, where x^{sj}_k ∈ R^n (k = 1, . . . , Nsj, j = 1, . . . , h) is an n-dimensional input variable, the label y^{sj}_k ∈ R is a continuous variable, and Nsj indicates the number of data pairs.

The dataset in the target domain T consists of two subsets: one with labels and one without

T = {T^L, T^U} = {(x^t_1, y^t_1), . . . , (x^t_{Nt1}, y^t_{Nt1}), x^t_{Nt1+1}, . . . , x^t_{Nt}} (4)

where x^t_k ∈ R^n (k = 1, . . . , Nt) is the n-dimensional input variable, and y^t_k ∈ R is a label that is only accessible for the first Nt1 data. T^L contains the instances with labels, and T^U contains the unlabeled instances.

B. Multiple-Source Domain Adaptation in Homogeneous Space

The method of transferring knowledge from multiple-source domains to a target domain can be summarized in four steps.

Step 1: Combine all the rules in the source domains.

Given h source domains S^1, . . . , S^h, h sets of fuzzy rules would be obtained, denoted as R^{s1}, . . . , R^{sh}

R^{s1} = {r(v^{s1}_1, a^{s1}_1), r(v^{s1}_2, a^{s1}_2), . . . , r(v^{s1}_{c1}, a^{s1}_{c1})}
. . .
R^{sh} = {r(v^{sh}_1, a^{sh}_1), r(v^{sh}_2, a^{sh}_2), . . . , r(v^{sh}_{ch}, a^{sh}_{ch})} (5)

where r(v^{sj}_i, a^{sj}_i) (i = 1, . . . , cj) represents a rule in the source domain S^j, j = 1, . . . , h, v^{sj}_i is the prototype, i.e., the center of a data cluster, and a^{sj}_i are the coefficients of the corresponding linear function. The rule r(v^{sj}_i, a^{sj}_i) is represented as

If x^{sj}_k is A_i(x^{sj}_k, v^{sj}_i), then y^{sj}_k is L_i(x^{sj}_k, a^{sj}_i), i = 1, . . . , cj. (6)

In homogeneous cases, there is an assumption that the source data are not accessible after constructing the model, and only the rules are available. This could preserve the privacy of the source data.

The rules in S^1, . . . , S^h are combined and denoted as R^s

R^s = {r(v^{s1}_1, a^{s1}_1), . . . , r(v^{s1}_{c1}, a^{s1}_{c1}), . . . , r(v^{sh}_1, a^{sh}_1), . . . , r(v^{sh}_{ch}, a^{sh}_{ch})} (7)

which can be rewritten as

R^s = {r(v^s_1, a^s_1), r(v^s_2, a^s_2), . . . , r(v^s_{cs}, a^s_{cs})} (8)

where cs = c1 + · · · + ch.
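Step 1 amounts to a simple union of the h rule bases, sharing only rules and never raw data. A hedged sketch follows; the Rule container and the example rule counts are hypothetical, not from the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Rule:
    v: np.ndarray  # prototype (cluster centre) of the antecedent
    a: np.ndarray  # coefficients of the linear consequent

def combine_rule_bases(rule_bases):
    """Concatenate the rule bases R_s1 ... R_sh of the h source domains
    into a single pool R_s with cs = c1 + ... + ch rules. Only rules are
    shared, never raw source data, which preserves source privacy."""
    return [rule for base in rule_bases for rule in base]

# Hypothetical rule bases from two source domains (c1 = 2, c2 = 3).
R_s1 = [Rule(np.array([0.1, 0.2]), np.array([1.0, 0.5, 0.0])) for _ in range(2)]
R_s2 = [Rule(np.array([0.8, 0.7]), np.array([0.2, 1.1, 0.3])) for _ in range(3)]
R_s = combine_rule_bases([R_s1, R_s2])
assert len(R_s) == 5  # cs = c1 + c2
```

The later steps of the algorithm then select an appropriate subset of this pool rather than keeping every rule.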
where w_jp indicates the weight of the pth node's contribution to the output, and z^t_{kj} = 1/(1 + e^(−α_jp(x^t_{kj} − β_jp))), j = 1, . . . , n, p =

IV. MULTIPLE-SOURCE DOMAIN ADAPTATION IN HETEROGENEOUS SPACE
1+e kj
feature space L^{sj}, and the data in S^j and T are converted to

S̃^j = {(x̃^{sj}_1, y^{sj}_1), . . . , (x̃^{sj}_{Nsj}, y^{sj}_{Nsj})}
T̃^j = {T̃^{jL}, T̃^{jU}} = {(x̃^{tj}_1, y^{tj}_1), . . . , (x̃^{tj}_{Ntj}, y^{tj}_{Ntj}), x̃^{tj}_{Ntj+1}, . . . , x̃^{tj}_{Nj}}. (19)

TABLE I
DATASETS USED IN EACH OF THE NINE EXPERIMENTS
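Equation (19) converts both the source and target data into a common latent feature space. As a much-simplified sketch of that idea (keeping only the columns of features the domains share; the index lists are hypothetical and stand in for the learned mapping, which the paper constructs differently):

```python
import numpy as np

def to_shared_space(Xs, Xt, shared_s, shared_t):
    """Map heterogeneous source and target data into a common latent
    space by keeping only the columns that correspond to shared features.
    shared_s / shared_t are the column indices of those features in each
    domain; this is a deliberate simplification of the latent mapping."""
    return Xs[:, shared_s], Xt[:, shared_t]

# 3-D source, 2-D target; columns 0 and 2 of the source are assumed to
# line up with columns 0 and 1 of the target (hypothetical correspondence).
Xs = np.arange(12.0).reshape(4, 3)
Xt = np.arange(6.0).reshape(3, 2)
Xs_l, Xt_l = to_shared_space(Xs, Xt, [0, 2], [0, 1])
assert Xs_l.shape == (4, 2) and Xt_l.shape == (3, 2)
```

After such a conversion, both domains live in the same space and the homogeneous machinery of Section III can be reused.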
TABLE II
RMSE OF THREE TYPES OF TRANSFER LEARNING (NO, SINGLE, AND MULTIPLE)
contains all the fuzzy rules from all source domains. The second
is a single source domain model. We evaluated the performance
of our method with three models: no-transfer models, single-
source transfer models, and multiple-source transfer models.
A no-transfer model means the source model is used directly
to solve the target task. A single-source transfer model indi-
cates that only one domain has been used as the source, and a
multiple-source transfer model obviously means that knowledge
is leveraged from multiple-source domains to support regression
tasks in the target domain.
Fig. 7. Scenario A: Different distributions in three domains.
In our experiments, all models were tested on unlabeled target
datasets T U to verify the models’ ability to solve regression tasks
in the target domain.
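The evaluation used throughout this section, RMSE under fivefold cross-validation reported as mean and variance, can be sketched as follows. The baseline model in the usage line is a placeholder of our own, not one of the compared methods.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two label vectors."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def cv_summary(X, y, fit_predict, k=5, seed=0):
    """k-fold cross-validation; reports RMSE as (mean, variance),
    matching the 'mean +/- variance' entries in the result tables."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(rmse(y[test], fit_predict(X[train], y[train], X[test])))
    return float(np.mean(scores)), float(np.var(scores))

# Usage with a trivial baseline: predict the training mean everywhere.
X = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
mean_rmse, var_rmse = cv_summary(
    X, y, lambda Xtr, ytr, Xte: np.full(len(Xte), ytr.mean()))
```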
In this set of experiments, we generated four datasets with two, three, four, and five clusters. More details about how the datasets were generated can be found in our previous paper [35]. For each experiment, we chose two datasets to serve as the source domains and one to serve as the target domain. Table I lists the datasets used for each experiment, with each dataset denoted by the number of clusters it contains. For example, in Experiments 1–3, the dataset with two clusters is selected as the target domain, and two of the remaining three datasets are selected as the source domains, which results in three configurations. Similar operations were conducted for Experiments 4–6 and 7–9. The reason the dataset with five clusters was not chosen as the target domain is that the aim is to construct an environment with fewer rules in the target domain than in the combined source domains. A sufficient number of rules is beneficial in the selection process and also guarantees the model performance in the target domain. As such, nine different experimental configurations are shown in Table I, and nine groups of results are shown in Table II.

Fig. 8. Scenario B: Similar distributions in three domains.

We tested these nine dataset configurations in Table II with three types of models—no transfer, single transfer, and multiple transfer. The "no-transfer model" actually contains two models, one prediction model for each of the two source domains. Similarly, the "single-source transfer" involves two models. The "multiple-source transfer" models also contain two models: one is the T–S model with all the fuzzy rules from both source domains; the other is the model built using our proposed method. Root mean square error (RMSE) is used to measure the regression performance. All models were constructed using fivefold cross-validation; therefore, the results are shown in the form of "mean ± variance." The results with the best performance are highlighted in bold.

The results show that the no-transfer method returned high mean values, which represents the gap between the source domains and the target domain. Comparing the two forms of multiple-source transfer, selecting a set of appropriate rules worked better than the brute-force method of combining all the rules, which serves as a clear indication that combining every fuzzy rule leads to redundancy and inferior results. For the most part, the multiple-source method also worked better than single-source transfer. Experiments 8 and 9 were the exceptions. Upon further analysis, we attribute the success of the single-source method in these two cases to the poor selection of rules, but this does highlight that multiple-source transfer has some limitations.

Hence, in the following part, we explore the scope of applications to provide some guidance for practical uses of multiple-source transfer learning.

Three experiments are implemented in this part. The input datasets in each experiment are shown in Figs. 7–9. Each figure represents a different input data scenario, where the points in blue indicate input data from Source Domain 1, yellow indicates Source Domain 2, and red indicates the target input data. The
TABLE III
TRANSFER PERFORMANCES OF THREE SCENARIOS IN FIGS. 7–9

linear functions used to generate these datasets are the same in the source and target domains.

In Scenario A, the distributions of all three domains are quite dissimilar, and both Source Domain 1 and Source Domain 2 are very different from the target data, as shown in Fig. 7. The results in Table III show that our method of combining rules from multiple-source domains has the best performance in this case.

Scenarios B and C are two special cases that are more applicable to a single-source transfer method. However, they require a strict condition—the data structures in all the domains must be identified.

In Scenario B, the distributions of all three domains are quite similar, but the discrepancies between the two source domains and the target domain are different, as shown in Fig. 8. The results show that the single-source transfer methods are superior to the multisource transfer methods, and that the single-source transfer method based on Source Domain 2 performed the best. This is because the data in both source domains have similar distributions to the target domain, and Source Domain 2 is more similar to the target domain than Source Domain 1.

In Scenario C, only one source domain, Source Domain 1, has a similar data structure to the target data, while Source Domain 2 has a dissimilar data structure, as shown in Fig. 9. Thus, the single-transfer method with Source Domain 1 would be superior to the other methods.

The results of implementing transfer learning with different models are shown in Table III. The results with the best performance are indicated in bold.

Analyzing the results from the abovementioned three scenarios, we can draw two conclusions. First, if all the source domains have a dissimilar structure to the target domain or the relationships of data structures between domains are implicit, then selecting an appropriate subset of rules from multiple-source domains would be the optimal choice. Second, if a source domain exists with a data structure that is similar to the target data, the single-transfer method will likely provide the best performance. Furthermore, the closer the source domain to the target domain, the better the transfer result. Since, in real-world applications, it is difficult to identify the structure, especially in high-dimensional datasets, our method could play a significant role in practical multiple-source cases.

In all the abovementioned experiments, for simplicity, the source domains are designed to have the same number of data. Please note that unbalanced data in multiple-source domains will not affect the performance of the method, since the performance of the source model can be guaranteed as long as the source data covers all the clusters.

Fig. 9. Scenario C: Similar distributions in Source 1 and Target, but different in Source 2.

B. Experiments on Real-World Datasets

In this section, we used real-world datasets to validate the effectiveness of the proposed multiple-source transfer method. Since studies on domain adaptation with regression problems are scarce, there are no public datasets for these scenarios. We, therefore, turned to five datasets from the UCI machine learning repository and modified them to simulate a range of multiple-source transfer learning scenarios. Since how the datasets were modified is crucial, a detailed description follows using two datasets as examples.

The "condition-based maintenance (CBM) of naval propulsion plants" dataset contains 14 features, such as ship speed, gas turbine shaft torque, and so on. These features were used to predict gas turbine decay state coefficients. We split the data according to ship speed; speeds greater than ten knots formed the source domains (7500 instances), and the remaining 3500 instances were used as the target domain. The source instances were further divided into 4000 for Source 1 and 3500 for Source 2. All instances in the source domains were labeled, with only ten labeled instances in the target domain.

The "combined cycle power plant" (CCPP) dataset contains four attributes: temperature, ambient pressure, relative humidity, and exhaust vacuum, which were used to predict the net hourly electrical energy output. A total of 6800 instances with a temperature of not greater than 25° formed Source Domains 1 and 2, evenly split into groups of 3400. The remaining 2500 instances formed the target domain. Again, all source instances were labeled, and ten target instances were labeled.

The other three datasets are the "Istanbul stock exchange dataset," "air quality dataset," and "airfoil self-noise dataset." For more details, please refer to the UCI machine learning repository.

We performed two groups of experiments to both validate our method and analyze the impact of the number of clusters. In the first set of experiments, we compared our method with
TABLE IV
COMPARISON OF OUR METHOD WITH TSK, TCA, SA, AND GFK
TABLE V
PERFORMANCE WITH VARYING NUMBERS OF CLUSTERS (RULES) IN TARGET DOMAIN
TABLE VI
TRANSFER PERFORMANCE (RMSE) IN FIVE CITIES IN THE YEAR 2013
some state-of-the-art methods in transfer learning, i.e., TSK [36], TCA [37], SA [38], and GFK [39]. All these methods are able to solve both classification and regression tasks but have not been presented as solutions for multiple-source situations. To be fair, we used these methods with combined data from all source domains for knowledge transfer.

Although there are some methods and heuristic algorithms for determining the number of clusters, it is often difficult to identify the number of clusters to use when constructing a T–S model with real-world data—especially those with high-dimensional data. Hence, in the second set of experiments, we treated the number of clusters as a hyperparameter and discuss its impact on the transfer learning results.

The results of the two groups of experiments are shown in Tables IV and V. The results show superior performance by our proposed method on all five datasets. Table V shows there is no obvious impact on prediction accuracy with a different number of clusters. However, in most cases, the best performance appeared with fewer clusters.

Besides the five datasets, we also applied a new, large, and more complex problem of predicting PM 2.5 concentration in different cities to further validate our transfer learning method in a multiple-source scenario. The dataset contains PM 2.5 data and related meteorological data for five big cities [Beijing (BJ), Shanghai (SH), Guangzhou (GZ), Chengdu (CD), and Shenyang (SY)] in China from 2013 to 2015. Besides the values of PM 2.5, there are 13 main attributes that describe the data: year, month, day, hour, season, dew point, temperature, humidity, pressure, combined wind direction, cumulated wind speed, hourly precipitation, and cumulated precipitation. The 13 attributes are used as the inputs to predict PM 2.5 concentration.

To simulate the multisource transfer learning scenario, four groups of experiments were designed to implement and validate our method in handling multiple sources. The transfer performance of these four groups of experiments is shown in Tables VI to IX. The first group of experiments executes the knowledge transfer among five cities in the year 2013, and Table VI displays the transfer performance (RMSE). The third column in Table VI indicates the city that is selected as the target domain, and the second column shows the two cities that are chosen from the remaining cities as the source domains. Two types of transfer learning methods are implemented: single-source transfer and multiple-source transfer. Here, the "single-source transfer" involves two models: one is transferred with fuzzy rules from Source Domain 1, and the other is transferred with fuzzy rules from Source Domain 2. The "multiple-source transfer" contains three models: 1) combining all the data across the source domains; 2) combining the rules from the source domains; and 3) selecting rules from the source domains using our proposed method. Similarly, the experiments in Tables VII and VIII implement transfer learning in five cities in the years 2014 and 2015, respectively. The last group of experiments in Table IX uses the data from years 2013 and 2014 to predict the PM 2.5 concentration in the year 2015.

The RMSE shown in Tables VI to IX indicates that our proposed method is superior to the single-transfer learning methods
TABLE VII
TRANSFER PERFORMANCE (RMSE) IN FIVE CITIES IN THE YEAR 2014
TABLE VIII
TRANSFER PERFORMANCE (RMSE) IN FIVE CITIES IN THE YEAR 2015
TABLE IX
TRANSFER DATA FROM YEARS 2013 AND 2014–2015
TABLE X
DATASET WITH 3-D SOURCE DATA AND 2-D TARGET DATA

TABLE XI
DATASET WITH 4-D SOURCE DATA AND 3-D TARGET DATA
and the other two multiple-source transfer learning methods. This further validates the effectiveness of our method.

VI. EXPERIMENTS IN HETEROGENEOUS SPACE

Our experiments in heterogeneous settings also involved synthetic and real-world datasets.

A. Synthetic Datasets

We designed two groups of experiments: one with three-dimensional (3-D) data in the source domains and 2-D data in the target domain; the other with 4-D data as the source and 3-D data as the target. Ten experiments were executed in each group. The settings for one of the ten experiments are provided in Tables X and XI as an example to illustrate the data structures in the three domains.

The parameters in Tables X and XI show that, although the source and target domains have different dimensions, there are always some shared features with similar values, which represent the relevance between the domains. However, there is always a domain whose linear coefficients are quite different from those of the other two.

Table XII presents the experiments where data in the source domains are 3-D and data in the target domain are 2-D; in Table XIII, the source data are 4-D and the target data are 3-D. Tables XII and XIII show the RMSE of the ten experiments for each group using the single-source transfer method plus three variations of the multiple-source transfer method: combining all the data across the source domains, combining the rules from the source domains, and selecting only some rules from the source domains. The best results are indicated in bold.
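The synthetic setup described above can be sketched as follows. The specific coefficients, noise level, and sample sizes are illustrative assumptions, not the exact parameters behind Tables X and XI; the sketch only reproduces the structure: domains of different dimensionality governed by linear functions that agree on some shared features.

```python
import random

random.seed(0)

def make_domain(coeffs, n=200, noise=0.1):
    """Generate one domain whose output is a noisy linear function of
    its features; len(coeffs) sets the dimensionality, so source and
    target domains of different dimensions can be produced."""
    data = []
    for _ in range(n):
        x = [random.uniform(0.0, 1.0) for _ in coeffs]
        y = sum(c * xi for c, xi in zip(coeffs, x)) + random.gauss(0.0, noise)
        data.append((x, y))
    return data

# Two 3-D source domains and one 2-D target domain. The first two
# coefficients are similar across all domains (the shared, relevant
# features), while one source has a quite different extra coefficient.
source1 = make_domain([1.0, 2.0, 0.5])
source2 = make_domain([1.1, 1.9, -3.0])
target = make_domain([1.0, 2.0], n=20)  # target data are typically scarce
```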
LU et al.: FUZZY MULTIPLE-SOURCE TRANSFER LEARNING 3429
TABLE XII
HETEROGENEOUS TRANSFER WITH 3-D SOURCE DATA AND 2-D TARGET DATA
TABLE XIII
HETEROGENEOUS TRANSFER WITH 4-D SOURCE DATA AND 3-D TARGET DATA
TABLE XIV
DATASET FOR SHOWING LIMITATION OF MULTIPLE-SOURCE TRANSFER

The feature values for all three domains are similar, but most importantly, the coefficients of the linear functions in the two source domains are exactly the same, and they are also almost equal to those of the target domain.
The results of these ten experiments are shown in Table XV. As the results show, the multiple-source method was inferior to the single-source method in each of these situations, which highlights that proper domain selection is key to producing good results when drawing on multiple source domains to transfer knowledge. Selecting an appropriate domain is a crucial problem and will therefore be studied in future work.
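One simple guard against the failure mode observed in Table XV is to score each candidate source model on a small labeled target validation set and keep only the best-matching ones. The sketch below is a crude stand-in for principled domain selection, not the paper's method; the model objects and validation data are hypothetical.

```python
def select_sources(source_models, target_val, top_k=1):
    """Rank candidate source models by RMSE on labeled target
    validation pairs and keep the top_k best, discarding sources
    likely to cause negative transfer."""
    def rmse(model):
        errs = [(model(x) - y) ** 2 for x, y in target_val]
        return (sum(errs) / len(errs)) ** 0.5
    return sorted(source_models, key=rmse)[:top_k]

# Toy example with two "models" predicting from a 1-D input:
good = lambda x: 2.0 * x   # close to the target concept
bad = lambda x: -5.0 * x   # a poor match that would hurt transfer
val = [(1.0, 2.1), (2.0, 3.9)]  # (x, y) target validation pairs
best = select_sources([good, bad], val)
assert best[0] is good
```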
TABLE XV
RESULTS OF EXPERIMENTS FOR LIMITATION OF MULTIPLE-SOURCE TRANSFER
TABLE XVI
RESULTS FOR THE HETEROGENEOUS REAL-WORLD DATASETS
half. Taken overall, we therefore conclude that leveraging multiple domains as sources performs better than using a single source in heterogeneous situations.

domain selection is a key factor in the methods' efficacy, which we intend to examine as an effective way to avoid negative transfer in future studies.
[14] W. Pan, "A survey of transfer learning for collaborative recommendation with auxiliary data," Neurocomputing, vol. 177, pp. 447–453, 2016.
[15] L. Wen, X. Li, and L. Gao, "A transfer convolutional neural network for fault diagnosis based on ResNet-50," in Neural Computing and Applications. Berlin, Germany: Springer, 2019, pp. 1–14.
[16] Z. Lu, Y. Zhu, S. J. Pan, E. W. Xiang, Y. Wang, and Q. Yang, "Source free transfer learning for text classification," in Proc. 28th AAAI Conf. Artif. Intell., 2014, pp. 122–128.
[17] R. Collobert and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 160–167.
[18] L. Wen, L. Gao, and X. Li, "A new deep transfer learning based on sparse auto-encoder for fault diagnosis," IEEE Trans. Syst., Man, Cybern., Syst., vol. 49, no. 1, pp. 136–144, Jan. 2019.
[19] R. R. Yager and L. A. Zadeh, An Introduction to Fuzzy Logic Applications in Intelligent Systems. Berlin, Germany: Springer, 2012.
[20] J. Shell and S. Coupland, "Fuzzy transfer learning: Methodology and application," Inf. Sci., vol. 293, pp. 59–79, 2015.
[21] V. Behbood, J. Lu, and G. Zhang, "Fuzzy refinement domain adaptation for long term prediction in banking ecosystem," IEEE Trans. Ind. Informat., vol. 10, no. 2, pp. 1637–1646, May 2014.
[22] Z. Deng, Y. Jiang, F.-L. Chung, H. Ishibuchi, and S. Wang, "Knowledge-leverage-based fuzzy system and its modeling," IEEE Trans. Fuzzy Syst., vol. 21, no. 4, pp. 597–609, Aug. 2013.
[23] Z. Deng, Y. Jiang, H. Ishibuchi, K.-S. Choi, and S. Wang, "Enhanced knowledge-leverage-based TSK fuzzy system modeling for inductive transfer learning," ACM Trans. Intell. Syst. Technol., vol. 8, no. 1, 2016, Art. no. 11.
[24] V. Behbood, J. Lu, and G. Zhang, "Fuzzy bridged refinement domain adaptation: Long-term bank failure prediction," Int. J. Comput. Intell. Appl., vol. 12, no. 1, 2013, Art. no. 1350003.
[25] V. Behbood, J. Lu, G. Zhang, and W. Pedrycz, "Multistep fuzzy bridged refinement domain adaptation algorithm and its application to bank failure prediction," IEEE Trans. Fuzzy Syst., vol. 23, no. 6, pp. 1917–1935, Dec. 2015.
[26] F. Liu, J. Lu, and G. Zhang, "Unsupervised heterogeneous domain adaptation via shared fuzzy equivalence relations," IEEE Trans. Fuzzy Syst., vol. 26, no. 6, pp. 3555–3568, Dec. 2018.
[27] Y. Yao and G. Doretto, "Boosting for transfer learning with multiple sources," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2010, pp. 1855–1862.
[28] B. Tan, E. Zhong, E. W. Xiang, and Q. Yang, "Multi-transfer: Transfer learning with multiple views and multiple sources," in Proc. SIAM Int. Conf. Data Mining, 2013, pp. 243–251.
[29] F. Zhuang, X. Cheng, S. J. Pan, W. Yu, Q. He, and Z. Shi, "Transfer learning with multiple sources via consensus regularized autoencoders," in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 2014, pp. 417–431.
[30] H. Zuo, J. Lu, G. Zhang, and F. Liu, "Fuzzy transfer learning using an infinite Gaussian mixture model and active learning," IEEE Trans. Fuzzy Syst., vol. 27, no. 2, pp. 291–303, Feb. 2019.
[31] H. Zuo, G. Zhang, and J. Lu, "Semi-supervised transfer learning in Takagi-Sugeno fuzzy models," in Proc. 13th Int. FLINS Conf. Data Sci. Knowl. Eng. Sens. Decis. Support, 2018, vol. 11, pp. 316–322.
[32] H. Zuo, G. Zhang, W. Pedrycz, V. Behbood, and J. Lu, "Granular fuzzy regression domain adaptation in Takagi–Sugeno fuzzy models," IEEE Trans. Fuzzy Syst., vol. 26, no. 2, pp. 847–858, Apr. 2018.
[33] H. Zuo, J. Lu, G. Zhang, and W. Pedrycz, "Fuzzy rule-based domain adaptation in homogeneous and heterogeneous spaces," IEEE Trans. Fuzzy Syst., vol. 27, no. 2, pp. 348–361, Feb. 2019.
[34] M. L. Hadjili and V. Wertz, "Takagi-Sugeno fuzzy modeling incorporating input variables selection," IEEE Trans. Fuzzy Syst., vol. 10, no. 6, pp. 728–742, Dec. 2002.
[35] H. Zuo, G. Zhang, W. Pedrycz, V. Behbood, and J. Lu, "Fuzzy regression transfer learning in Takagi–Sugeno fuzzy models," IEEE Trans. Fuzzy Syst., vol. 25, no. 6, pp. 1795–1807, Dec. 2017.
[36] C. Yang, Z. Deng, K.-S. Choi, and S. Wang, "Takagi–Sugeno–Kang transfer learning fuzzy logic system for the adaptive recognition of epileptic electroencephalogram signals," IEEE Trans. Fuzzy Syst., vol. 24, no. 5, pp. 1079–1094, Oct. 2016.
[37] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.
[38] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 2960–2967.
[39] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2012, pp. 2066–2073.

Jie Lu (F'18) received the Ph.D. degree in information systems from the Curtin University of Technology, Perth, WA, Australia, in 2000.
She is currently a Distinguished Professor, the Director of the Centre for Artificial Intelligence, and the Associate Dean (Research Excellence) with the Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia. She has authored or coauthored six research books and more than 450 papers in refereed journals and conference proceedings. Her main research interests are in the areas of fuzzy transfer learning, concept drift, decision support systems, and recommender systems.
Dr. Lu is an IFSA Fellow and Australian Laureate Fellow. She has won more than 20 ARC Laureate, ARC Discovery Projects, government and industry projects. She serves as the Editor-in-Chief for Knowledge-Based Systems (Elsevier) and International Journal of Computational Intelligence Systems. She has delivered more than 25 keynote speeches at international conferences and chaired 15 international conferences. She was the recipient of various awards such as the UTS Medal for Research and Teaching Integration (2010), the UTS Medal for Research Excellence (2019), the Computer Journal Wilkes Award (2018), the IEEE Transactions on Fuzzy Systems Outstanding Paper Award (2019), and the Australian Most Innovative Engineer Award (2019).

Hua Zuo received the Ph.D. degree in computer science from the University of Technology Sydney, Sydney, NSW, Australia, in 2018.
She is currently a Lecturer with the Faculty of Engineering and Information Technology, University of Technology Sydney. Her research interests include transfer learning and fuzzy systems.
Dr. Zuo is currently a member of the Decision Systems and e-Service Intelligence (DeSI) Research Laboratory, Centre for Artificial Intelligence, University of Technology Sydney.

Guangquan Zhang received the Ph.D. degree in applied mathematics from the Curtin University of Technology, Perth, WA, Australia, in 2001.
He is currently an Associate Professor and the Director of the Decision Systems and e-Service Intelligence (DeSI) Research Laboratory, Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia. He has authored four monographs, five textbooks, and 300 papers including 154 refereed international journal papers. His research interests include fuzzy sets and systems, fuzzy optimization, fuzzy transfer learning, and fuzzy modeling in machine learning and data analytics.
Dr. Zhang has won eight Australian Research Council (ARC) Discovery Projects grants and many other research grants. He was awarded an ARC QEII fellowship. He was a Guest Editor of eight special issues of IEEE Transactions and other international journals and has cochaired several international conferences and workshops in the area of fuzzy decision-making and knowledge engineering.