Transfer Learning
INTRODUCTION
Human learners appear to have inherent ways to transfer knowledge between
tasks. That is, we recognize and apply relevant knowledge from previous learning
experiences when we encounter new tasks. The more related a new task is to our
previous experience, the more easily we can master it.
Common machine learning algorithms, in contrast, traditionally address iso-
lated tasks. Transfer learning attempts to change this by developing methods
to transfer knowledge learned in one or more source tasks and use it to improve
learning in a related target task (see Figure 1). Techniques that enable knowl-
edge transfer represent progress towards making machine learning as efficient as
human learning.
This chapter provides an introduction to the goals, formulations, and chal-
lenges of transfer learning. It surveys current research in this area, giving an
overview of the state of the art and outlining the open problems.
Transfer methods tend to be highly dependent on the machine learning al-
gorithms being used to learn the tasks, and can often simply be considered
extensions of those algorithms. Some work in transfer learning is in the context
of inductive learning, and involves extending well-known classification and infer-
ence algorithms such as neural networks, Bayesian networks, and Markov Logic
Networks. Another major area is in the context of reinforcement learning, and
involves extending algorithms such as Q-learning and policy search. This chapter
surveys these areas separately.
Appears in the Handbook of Research on Machine Learning Applications, published by IGI Global, edited by E. Soria, J. Martin, R. Magdalena, M. Martinez and A. Serrano, 2009.
Fig. 1. Transfer learning is machine learning with an additional source of information apart from the standard training data: knowledge from one or more related tasks.
[Figure: learning curves with transfer and without transfer during training, annotated "higher start".]
Fig. 2. Three ways in which transfer might improve learning.
Fig. 3. As we define transfer learning, the information flows in one direction only, from the source task to the target task. In multi-task learning, information can flow freely among all tasks.
assumption that predicates contribute significantly to example coverage by them-
selves rather than in pairs or more.
Transfer in inductive learning works by allowing source-task knowledge to
affect the target task’s inductive bias. It is usually concerned with improving
the speed with which a model is learned, or with improving its generalization
capability. The next subsection discusses inductive transfer, and the following
ones elaborate on three specific settings for inductive transfer.
There is some related work that is not discussed here because it specifically
addresses multi-task learning. For example, Niculescu-Mizil and Caruana [29]
learn Bayesian networks simultaneously for multiple related tasks by biasing
learning toward similar structures for each task. While this is clearly related to
transfer learning, it is not directly applicable to the scenario in which a target
task is encountered after one or more source tasks have already been learned.
Inductive Transfer
Fig. 4. Inductive learning can be viewed as a directed search through a specified hypothesis space [28]. Inductive transfer uses source-task knowledge to adjust the inductive bias, which could involve changing the hypothesis space or the search steps.
Thrun and Mitchell [55] look at solving Boolean classification tasks in a
lifelong-learning framework, where an agent encounters a collection of related
problems over its lifetime. They learn each new task with a neural network, but
they enhance the standard gradient-descent algorithm with slope information
acquired from previous tasks. This speeds up the search for network parameters
in a target task and biases it towards the parameters for previous tasks.
Mihalkova and Mooney [27] perform transfer between Markov Logic Net-
works. Given a learned MLN for a source task, they learn an MLN for a related
target task by starting with the source-task one and diagnosing each formula,
adjusting ones that are too general or too specific in the target domain. The
hypothesis space for the target task is therefore defined in relation to the source-
task MLN by the operators that generalize or specialize formulas.
Hlynsson [17] phrases transfer learning in classification as a minimum descrip-
tion length problem given source-task hypotheses and target-task data. That is,
the chosen hypothesis for a new task can use hypotheses for old tasks but stip-
ulate exceptions for some data points in the new task. This method aims for a
tradeoff between accuracy and compactness in the new hypothesis.
Ben-David and Schuller [3] propose a transformation framework to determine
how related two Boolean classification tasks are. They define two tasks as related
with respect to a class of transformations if they are equivalent under that class;
that is, if a series of transformations can make one task look exactly like the
other. They provide conditions under which learning related tasks concurrently
requires fewer examples than single-task learning.
Bayesian Transfer
One area of inductive transfer applies specifically to Bayesian learning meth-
ods. Bayesian learning involves modeling probability distributions and taking
advantage of conditional independence among variables to simplify the model.
An additional aspect that Bayesian models often have is a prior distribution,
which describes the assumptions one can make about a domain before seeing
any training data. Given the data, a Bayesian model makes predictions by com-
bining it with the prior distribution to produce a posterior distribution. A strong
prior can significantly affect these results (see Figure 5). This serves as a natural
way for Bayesian learning methods to incorporate prior knowledge – in the case
of transfer learning, source-task knowledge.
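As a concrete illustration of the prior's influence (a toy example, not drawn from the cited work), consider estimating the probability of a binary event with a conjugate Beta prior: the posterior mean blends the prior with the observed counts, and a strong prior can dominate a small sample.

    # Toy illustration of Bayesian updating: Bernoulli data with a Beta prior.
    # The posterior mean blends prior beliefs with observed counts.

    def posterior_mean(alpha, beta, successes, failures):
        """Posterior mean of a Bernoulli parameter under a Beta(alpha, beta) prior."""
        return (alpha + successes) / (alpha + beta + successes + failures)

    successes, failures = 3, 7  # a small sample: 3 successes in 10 trials

    weak = posterior_mean(1, 1, successes, failures)      # ~0.33, dominated by the data
    strong = posterior_mean(80, 20, successes, failures)  # ~0.75, dominated by the prior

    print(f"weak prior:   {weak:.2f}")
    print(f"strong prior: {strong:.2f}")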
Marx et al. [24] use a Bayesian transfer method for tasks solved by a logistic
regression classifier. The usual prior for this classifier is a Gaussian distribution
with a mean and variance set through cross-validation. To perform transfer, they
instead estimate the mean and variance by averaging over several source tasks.
Raina et al. [33] use a similar approach for multi-class classification by learning
a multivariate Gaussian prior from several source tasks.
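The sketch below illustrates the general idea behind this kind of transfer, with hypothetical variable names and a plain gradient-descent fit: estimate a Gaussian prior over classifier weights from several source-task models, then fit the target-task logistic regression as a MAP estimate that is pulled toward the source-derived prior mean rather than toward zero. It is a minimal illustration, not the actual algorithm of Marx et al. [24] or Raina et al. [33].

    import numpy as np

    def fit_logreg_map(X, y, prior_mean, prior_var, lr=0.1, epochs=1000):
        """Fit logistic regression weights by MAP estimation under a Gaussian prior.

        The objective is the negative log-likelihood plus a penalty proportional to
        (w - prior_mean)^2 / prior_var, so the weights are pulled toward the
        source-derived prior mean instead of toward zero.
        """
        w = prior_mean.copy()
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-X @ w))                     # predicted probabilities
            grad = X.T @ (p - y) + (w - prior_mean) / prior_var  # data term + prior term
            w -= (lr / len(y)) * grad
        return w

    # Prior estimated from the weights of several source-task classifiers
    # (hypothetical numbers; in practice these come from models trained on source tasks).
    source_weights = np.array([[1.2, -0.5, 0.3],
                               [0.9, -0.7, 0.4],
                               [1.1, -0.6, 0.2]])
    prior_mean = source_weights.mean(axis=0)
    prior_var = source_weights.var(axis=0) + 1e-3  # avoid a zero-variance prior

    # Small synthetic target-task dataset, for illustration only.
    rng = np.random.default_rng(0)
    X_target = rng.normal(size=(20, 3))
    y_target = (X_target @ np.array([1.0, -0.6, 0.3]) > 0).astype(float)

    print(np.round(fit_logreg_map(X_target, y_target, prior_mean, prior_var), 2))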
Dai et al. [7] apply a Bayesian transfer method to a Naive Bayes classifier.
They set the initial probability parameters based on a single source task, and
revise them using target-task data. They also provide some theoretical bounds
on the prediction error and convergence rate of their algorithm.
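A rough sketch of this idea follows, with made-up counts and a simple mixing weight rather than the revision procedure Dai et al. actually use: the class-conditional word statistics of a multinomial Naive Bayes classifier are initialized from source-task counts and then revised as target-task counts accumulate.

    import numpy as np

    def class_conditional_probs(source_counts, target_counts,
                                source_weight=0.5, smoothing=1.0):
        """Word probabilities per class for a multinomial Naive Bayes classifier.

        Source-task counts provide the starting point; target-task counts revise them.
        Lowering source_weight lets the target data dominate as it accumulates.
        (A simple count-mixing sketch, not the algorithm of Dai et al. [7].)
        """
        counts = source_weight * source_counts + target_counts + smoothing
        return counts / counts.sum(axis=1, keepdims=True)

    # Hypothetical word-count matrices with shape (num_classes, vocabulary_size).
    source_counts = np.array([[40.0, 5.0, 5.0],
                              [5.0, 30.0, 15.0]])
    target_counts = np.array([[2.0, 1.0, 0.0],
                              [0.0, 1.0, 3.0]])

    print(class_conditional_probs(source_counts, target_counts))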
Fig. 5. Bayesian learning uses a prior distribution to smooth the estimates from training data. Bayesian transfer may provide a more informative prior from source-task knowledge.
Hierarchical Transfer
Fig. 6. An example of a concept hierarchy that could be used for hierarchical transfer, in which solutions from simple tasks are used to help learn a solution to a more complex task. Here the simple tasks involve recognizing lines and curves in images, and the more complex tasks involve recognizing surfaces, circles, and finally pipe shapes.
Stracuzzi [42] looks at the problem of choosing relevant source-task Boolean
concepts from a knowledge base to use while learning more complex concepts.
He learns rules to express concepts from a stream of examples, allowing existing
concepts to be used if they help to classify the examples, and adds and removes
dependencies between concepts in the knowledge base.
Taylor et al. [49] propose a transfer hierarchy that orders tasks by difficulty,
so that an agent can learn them in sequence via inductive transfer. By putting
tasks in order of increasing difficulty, they aim to make transfer more effective.
This approach may be more applicable to the multi-task learning scenario, since
by our definition of transfer learning the agent may not be able to choose the
order in which it learns tasks, but it could be applied to help choose from an
existing set of source tasks.
TRANSFER IN REINFORCEMENT LEARNING
A reinforcement learning (RL) agent operates in a sequential-control environ-
ment called a Markov decision process (MDP) [45]. It senses the state of the en-
vironment and performs actions that change the state and also trigger rewards.
Its objective is to learn a policy for acting in order to maximize its cumulative
reward. This involves solving a temporal credit-assignment problem, since an
entire sequence of actions may be responsible for a single immediate reward.
A typical RL agent behaves according to the diagram in Figure 7. At time step t, it observes the current state s_t and consults its current policy π to choose an action, π(s_t) = a_t. After taking the action, it receives a reward r_t and observes the new state s_{t+1}, and it uses that information to update its policy before repeating the cycle. Often RL consists of a sequence of episodes, which end whenever the agent reaches one of a set of ending states.
During learning, the agent must balance between exploiting the current policy
(acting in areas that it knows to have high rewards) and exploring new areas to
find potentially higher rewards. A common solution is the ε-greedy method, in which the agent takes random exploratory actions a small fraction of the time (ε ≪ 1), but usually takes the action recommended by the current policy.
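A minimal sketch of ε-greedy action selection, assuming a tabular Q-function stored as a dictionary keyed by (state, action) pairs (the helper name and interface are hypothetical):

    import random

    def epsilon_greedy(q_table, state, actions, epsilon=0.1):
        """Take a random action with probability epsilon; otherwise act greedily."""
        if random.random() < epsilon:
            return random.choice(actions)  # explore
        return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploit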
There are several categories of RL algorithms. Some types of methods are only
applicable when the agent knows its environment model (the reward function
and the state transition function). In this case dynamic programming can solve
directly for the optimal policy without requiring any interaction with the envi-
ronment. In most RL problems, however, the model is unknown. Model-learning
approaches use interaction with the environment to build an approximation of
the true model. Model-free approaches learn to act without ever explicitly mod-
eling the environment.
Temporal-difference methods [44] operate by maintaining and iteratively up-
dating value functions to predict the rewards earned by actions. They begin
with an inaccurate function and update it based on interaction with the en-
vironment, propagating reward information back along action sequences. One
popular method is Q-learning [62], which involves learning a function Q(s, a)
that estimates the cumulative reward starting in state s and taking action a
and following the current policy thereafter. Given the optimal Q-function, the
optimal policy is to take the action corresponding to argmax_a Q(s_t, a). When there are small finite numbers of states and actions, the Q-function can be represented explicitly as a table. In domains that have large or infinite state spaces, a function approximator such as a neural network or support-vector machine can be used to represent the Q-function.

Fig. 7. A reinforcement learning agent interacts with its environment: it receives information about its state (s), chooses an action to take (a), receives a reward (r), and then repeats.
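The sketch below shows the tabular Q-learning update just described, reusing the epsilon_greedy helper sketched earlier; the environment interface (reset and step returning the next state, the reward, and an end-of-episode flag) is an assumption made for illustration.

    def q_learning_episode(env, q_table, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Run one episode of tabular Q-learning, updating q_table (a dict) in place."""
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(q_table, state, actions, epsilon)
            next_state, reward, done = env.step(action)
            # Temporal-difference target: immediate reward plus the discounted value
            # of the best action available in the next state (zero at episode end).
            best_next = 0.0 if done else max(q_table.get((next_state, a), 0.0)
                                             for a in actions)
            old = q_table.get((state, action), 0.0)
            q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
        return q_table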
Policy-search methods, instead of maintaining a function upon which a policy
is based, maintain and update a policy directly. They begin with an inaccurate
policy and update it based on interaction with the environment. Heuristic search
and optimization through gradient descent are among the approaches that can
be used in policy search.
Transfer in RL is concerned with speeding up the learning process, since RL
agents can spend many episodes doing random exploration before acquiring a
reasonable Q-function. We divide RL transfer into five categories that represent
progressively larger changes to existing RL algorithms. The subsections below
describe those categories and present examples from published research.
Starting-Point Methods
Since all RL methods begin with an initial solution and then update it through
experience, one straightforward type of transfer in RL is to set the initial solution
in a target task based on knowledge from a source task (see Figure 8). Compared
to the random or zero setting that RL algorithms usually use at first, these
starting-point methods can begin the RL process at a point much closer to a
good target-task solution. There are variations on how to use the source-task
knowledge to set the initial solution, but in general the RL algorithm in the
target task is unchanged.
Taylor et al. [53] use a starting-point method for transfer in temporal-difference
RL. To perform transfer, they copy the final value function of the source task
and use it as the initial one for the target task. As many transfer approaches
do, this requires a mapping of features and actions between the tasks, and they
provide a mapping based on their domain knowledge.
Fig. 8. Starting-point methods for RL transfer set the initial solution based on the source task in the hope of starting at a higher performance level than the typical initial solution would. In this example, a Q-function table is initialized to a source-task table, and the target-task performance begins at a level that is only reached after some training when beginning with a typical all-zero table.
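Under the simplest possible assumptions, a starting-point method amounts to copying source-task Q-values into the target-task table through hand-provided state and action mappings, as in the hypothetical sketch below; entries with no mapped counterpart simply start at zero, as they would without transfer.

    def initialize_target_q(source_q, state_map, action_map):
        """Build an initial target-task Q-table from a learned source-task Q-table.

        state_map and action_map translate target-task states and actions to their
        source-task counterparts (provided by hand here, as in much transfer work).
        """
        target_q = {}
        for t_state, s_state in state_map.items():
            for t_action, s_action in action_map.items():
                target_q[(t_state, t_action)] = source_q.get((s_state, s_action), 0.0)
        return target_q

    # Hypothetical example: a tiny source Q-table and mappings between the tasks.
    source_q = {("s1", "left"): 2.0, ("s1", "right"): 5.0, ("s2", "left"): 9.0}
    state_map = {"t1": "s1", "t2": "s2"}            # target state  -> source state
    action_map = {"west": "left", "east": "right"}  # target action -> source action

    print(initialize_target_q(source_q, state_map, action_map))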
Tanaka and Yamamura [47] use a similar approach in temporal-difference
learning without function approximation, where value functions are simply rep-
resented by tables. This greater simplicity allows them to combine knowledge
from several source tasks: they initialize the value table of the target task to the
average of tables from several prior tasks. Furthermore, they use the standard
deviations from prior tasks to determine priorities between temporal-difference
backups.
Approaching temporal-difference RL as a batch problem instead of an in-
cremental one allows for different kinds of starting-point transfer methods. In
batch RL, the agent interacts with the environment for more than one step or
episode at a time before updating its solution. Lazaric et al. [21] perform trans-
fer in this setting by finding source-task samples that are similar to the target
task and adding them to the normal target-task samples in each batch, thus
increasing the available data early on. The early solutions are almost entirely
based on source-task knowledge, but the impact decreases in later batches as
more target-task data becomes available.
Moving away from temporal-difference RL, starting-point methods can take
even more forms. In a model-learning Bayesian RL algorithm, Wilson et al. [63]
perform transfer by treating the distribution of previous MDPs as a prior for the
current MDP. In a policy-search genetic algorithm, Taylor et al. [54] transfer a
population of policies from a source task to serve as the initial population for a
target task.
Imitation Methods
Another class of RL transfer methods involves applying the source-task policy
to choose some actions while learning the target task (see Figure 9). While they
make no direct changes to the target-task solution the way that starting-point
methods do, these imitation methods affect the developing solution by producing
different function or policy updates. Compared to the random exploration that
RL algorithms typically do, decisions based on a source-task policy can lead the
agent more quickly to promising areas of the environment.
One method is to follow a source-task policy only during exploration steps
of the target task, when the agent would otherwise be taking a random action.
Madden and Howley [23] use this approach in tabular Q-learning. They represent
a source-task policy as a set of rules in propositional logic and choose actions
based on those rules during exploration steps.
Fernandez and Veloso [15] instead give the agent a three-way choice between
exploiting the current target-task policy, exploiting a past policy, and exploring
randomly. They introduce a second parameter, in addition to the ε of ε-greedy
exploration, to determine the probability of making each choice.
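A sketch of such a three-way choice appears below; the parameter name psi and the exact way the probabilities are combined are assumptions for illustration, not the precise policy-reuse algorithm of Fernandez and Veloso [15].

    import random

    def choose_action(q_table, past_policy, state, actions, psi=0.3, epsilon=0.1):
        """Three-way choice: imitate a past policy, explore randomly, or exploit."""
        roll = random.random()
        if roll < psi:
            return past_policy(state)      # follow the transferred source-task policy
        if roll < psi + (1.0 - psi) * epsilon:
            return random.choice(actions)  # explore
        return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploit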
Another imitation method called demonstration involves following a source-
task policy for a fixed number of episodes at the beginning of the target task
and then reverting to normal RL. In the early steps of the target task, the cur-
rent policy can be so ill-formed that exploiting it is no different than exploring
randomly. This approach aims to avoid that initial uncertainty and to generate
enough data to create a reasonable target-task policy by the time the demonstration period ends. Torrey et al. [58] and Torrey et al. [56] perform transfer via demonstration, representing the source-task policy as a relational finite-state machine and a Markov Logic Network respectively.

Fig. 9. Imitation methods for RL transfer follow the source-task policy during some steps of the target task. The imitation steps may all occur at the beginning of the target task, as in (a), or they may be interspersed with steps that follow the developing target-task policy, as in (b).
Hierarchical Methods
A third class of RL transfer includes hierarchical methods. These view the source
as a subtask of the target, and use the solution to the source as a building block
for learning the target. Methods in this class have strong connections to the area
of hierarchical RL, in which a complex task is learned in pieces through division
into a hierarchy of subtasks (see Figure 10).
An early approach of this type is to compose several source-task solutions
to form a target-task solution, as is done by Singh [40]. He addresses a scenario
in which complex tasks are temporal concatenations of simple ones, so that a
target task can be solved by a composition of several smaller solutions.
Mehta et al. [25] have a transfer method that works directly within the hi-
erarchical RL framework. They learn a task hierarchy by observing successful
behavior in a source task, and then use it to apply the MaxQ hierarchical RL
algorithm [10] in the target task. Transfer thus relieves the designer of the burden of specifying a task hierarchy by hand.
Other approaches operate within the framework of options, which is a term
for temporally-extended actions in RL [31]. An option typically consists of a
starting condition, an ending condition, and an internal policy for choosing lower-
level actions. An RL agent treats each option as an additional action along with
the original lower-level ones (see Figure 10).
In some scenarios it may be useful to have the entire source-task policy as an
option in the target task, as Croonenborghs et al. [6] do. They learn a relational
decision tree to represent the source-task policy and allow the target-task learner
to execute it as an option. Another possibility is to learn smaller options, either
during or after the process of learning the source task, and offer them to the
target. Asadi and Huber [1] do this by finding frequently-visited states in the
source task to serve as ending conditions for options.
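An option can be captured with a small data structure like the hypothetical sketch below, which also shows how an entire transferred source-task policy could be wrapped as a single option in the spirit of Croonenborghs et al. [6]; the field names and helper are illustrative assumptions.

    from dataclasses import dataclass
    from typing import Any, Callable

    @dataclass
    class Option:
        """A temporally-extended action: start condition, internal policy, end condition."""
        can_start: Callable[[Any], bool]   # states in which the option may be invoked
        policy: Callable[[Any], Any]       # internal policy over lower-level actions
        should_end: Callable[[Any], bool]  # termination condition over states

    def source_policy_as_option(source_policy, goal_reached):
        """Wrap a transferred source-task policy as one option in the target task."""
        return Option(can_start=lambda state: True,
                      policy=source_policy,
                      should_end=goal_reached)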
Fig. 10. (a) An example of a task hierarchy that could be used to train agents to play soccer via hierarchical RL. Lower-level abilities like kicking a ball and running are needed for higher-level abilities like passing and shooting, which could then be combined to learn to play soccer. (b) The mid-level abilities represented as options alongside the low-level actions.
Alteration Methods
The next class of RL transfer methods involves altering the state space, action
space, or reward function of the target task based on source-task knowledge.
These alteration methods have some overlap with option-based transfer, which
also changes the action space in the target task, but they include a wide range
of other approaches as well.
One way to alter the target-task state space is to simplify it through state
abstraction. Walsh et al. [60] do this by aggregating over comparable source-task
states. They then use the aggregate states to learn the target task, which reduces
the complexity significantly.
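A sketch of state aggregation as a wrapper around the target-task environment appears below; it assumes an abstraction function mapping raw states to aggregate states has already been derived from the source task (Walsh et al. learn such abstractions, while the wrapper and its reset/step interface here are illustrative assumptions).

    class AbstractedEnv:
        """Present only aggregated states to the learner."""

        def __init__(self, env, abstract):
            self.env = env            # underlying target-task environment
            self.abstract = abstract  # maps raw states to aggregate states

        def reset(self):
            return self.abstract(self.env.reset())

        def step(self, action):
            next_state, reward, done = self.env.step(action)
            return self.abstract(next_state), reward, done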
There are also approaches that expand the target-task state space instead of
reducing it. Taylor and Stone [51] do this by adding a new state variable in the
target task. They learn a decision list that represents the source-task policy and
use its output as the new state variable.
While option-based transfer methods add to the target-task action space,
there is also some work in decreasing the action space. Sherstov and Stone [38]
do this by evaluating in the source task which of a large set of actions are most
useful. They then consider only a smaller action set in the target task, which
decreases the complexity of the value function significantly and also decreases
the amount of exploration needed.
Reward shaping is a design technique in RL that aims to speed up learning
by providing immediate rewards that are more indicative of cumulative rewards.
Usually it requires human effort, as many aspects of RL task design do. Konidaris
and Barto [19] do reward shaping automatically through transfer. They learn to
predict rewards in the source task and use this information to create a shaped
reward function in the target task.
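One well-known way to construct such a shaped reward automatically is potential-based shaping, where a potential function over states contributes γΦ(s') − Φ(s) to each immediate reward and a value estimate learned in the source task serves as Φ. The sketch below uses that formulation, which is in the spirit of, though not identical to, the method of Konidaris and Barto [19].

    def shaped_reward(reward, state, next_state, source_value, gamma=0.99, done=False):
        """Add a potential-based shaping term derived from a source-task value estimate.

        source_value(s) plays the role of the potential function; shaping of this
        form is known to preserve the optimal policy of the target task.
        """
        next_potential = 0.0 if done else source_value(next_state)
        return reward + gamma * next_potential - source_value(state)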
AVOIDING NEGATIVE TRANSFER
Given a target task, the effectiveness of any transfer method depends on the
source task and how it is related to the target. If the relationship is strong and
the transfer method can take advantage of it, the performance in the target task
can significantly improve through transfer. However, if the source task is not
sufficiently related or if the relationship is not well leveraged by the transfer
method, the performance with many approaches may not only fail to improve
– it may actually decrease. This section examines work on preventing transfer
from negatively affecting performance.
Ideally, a transfer method would produce positive transfer between appro-
priately related tasks while avoiding negative transfer when the tasks are not a
good match. In practice, these goals are difficult to achieve simultaneously. Ap-
proaches that have safeguards to avoid negative transfer often produce a smaller
effect from positive transfer due to their caution. Conversely, approaches that transfer aggressively and produce large positive-transfer effects often have no protection against negative transfer (see Figure 11).

Fig. 11. A representation of how the degree of relatedness between the source and target tasks translates to target-task performance when conducting transfer from the source task. With aggressive approaches, there can be higher benefits at high degrees of relatedness, but there can also be negative transfer at low levels. Safer approaches may limit negative transfer at the lower end, but may also have fewer benefits at the higher end.
For example, consider the imitation methods for RL transfer. On one end
of the range an agent imitates a source-task policy only during infrequent ex-
ploration steps, and on the other end it demonstrates the source-task policy
for a fixed number of initial episodes. The exploration method is very cautious
and therefore unlikely to produce negative transfer, but it is also unlikely to
produce large initial performance increases. The demonstration method is very
aggressive; if the source-task policy is a poor one for the target task, following it
blindly will produce negative transfer. However, when the source-task solution
is a decent one for the target task, it can produce some of the largest initial
performance improvements of any method.
Option-based transfer offers a middle ground: if a transferred option turns out to be useful in the target task, its estimated value will grow and the agent will choose it more often, while a poor option can simply be ignored. Option-based transfer can therefore provide a good balance between achieving positive transfer and avoiding negative transfer.
A specific approach that incorporates the ability to reject bad information is
the KBKR advice-taking algorithm for transfer in reinforcement learning [57, 59].
Recall that KBKR approximates the Q-function with a support-vector machine
and includes advice from the source task as a soft constraint. Since the Q-function
trades off between matching the agent’s experience and matching the advice, the
agent can learn to disregard advice that disagrees with its experience.
Rosenstein et al. [35] present an approach for detecting negative transfer in
naive Bayes classification tasks. They learn a hyperprior for both the source and
target tasks, and the variance of this hyperprior is proportional to the dissimi-
larity between the tasks. It may be possible to use a method like this to decide
whether to transfer at all, by setting an acceptable threshold of similarity.
Fig. 12. (a) One way to avoid negative transfer is to choose a good source task from which to transfer. In this example, Task 2 is selected as being the most related. (b) Another way to avoid negative transfer is to model the way source tasks are related to the target task and combine knowledge from them with those relationships in mind.
Eaton and DesJardins [12], for example, combine source-task knowledge through a multiresolution ensemble of classifiers: they find that high-resolution models are less transferable between tasks, and they select a resolution below which to share models with a target task.
AUTOMATICALLY MAPPING TASKS
An inherent aspect of transfer learning is recognizing the correspondences be-
tween tasks. Knowledge from one task can only be applied to another if it is
expressed in a way that the target-task agent understands. In some cases, the
representations of the tasks are assumed to be identical, or at least one is a
subset of the other. Otherwise, a mapping is needed to translate between task
representations (see Figure 13).
Many transfer approaches do not address the mapping problem directly and
require that a human provide this information. However, there are some transfer
approaches that do address the mapping problem. This section discusses some
of this work.
Fig. 13. A mapping generally translates source-task properties into target-task properties. The numbers of properties may not be equal in the two tasks, and the mapping may not be one-to-one. Properties may include entries in a feature vector, objects in a relational world, RL actions, etc.
Pan et al. [30] take a mathematical approach to finding a common repre-
sentation for two separate classification tasks. They use kernel methods to find
a low-dimensional feature space where the distributions of source and target
data are similar, and transfer a source-task model for this smaller space. This
approach stretches our strict definition of transfer learning, which assumes the
target task is unknown when the source task is learned, but in some scenarios it
may be practical to adjust the source-task solution to a different feature space
after gaining some knowledge about the target task.
Mapping by Analogy
If the task representations must differ, and the scenario calls for choosing one
mapping rather than trying multiple candidates, then there are some methods
that construct a mapping by analogy. These methods examine the characteristics
of the source and target tasks and find elements that correspond. For example,
in reinforcement learning, actions that correspond produce similar rewards and
state changes, and objects that correspond are affected similarly by actions.
Analogical structure mapping [14] is a generic procedure based on cognitive
theories of analogy that finds corresponding elements. It assigns scores to local
matches and searches for a global match that maximizes the scores; permissi-
ble matches and scoring functions are domain-dependent. Several transfer ap-
proaches use this framework to solve the mapping problem. Klenk and Forbus [18]
apply it to solve physics problems that are written in a predicate-calculus lan-
guage by retrieving and forming analogies from worked solutions written in the
same language. Liu and Stone [22] apply it in reinforcement learning to find
matching features and actions between tasks.
There are also some approaches that rely more on statistical analysis than
on logical reasoning to find matching elements. Taylor and Stone [52] learn map-
pings for RL tasks by running a small number of target-task episodes and then
training classifiers to characterize actions and objects. If a classifier trained for
one action predicts the results of another action well, then those actions are
mapped; likewise, if a classifier trained for one object predicts the behavior of
another object well, those objects are mapped. Wang and Mahadevan [61] trans-
late datasets to low-dimensional feature spaces using dimensionality reduction,
and then perform a statistical shaping technique called Procrustes analysis to
align the feature spaces.
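As a toy illustration of the statistical flavor of these approaches (made-up statistics and a simple nearest-neighbor match, not the published algorithms), one could summarize each action by the mean reward and mean state change it produces in a few episodes and map each target-task action to the most similar source-task action:

    import math

    def map_actions(source_stats, target_stats):
        """Map each target-task action to the source-task action with the most
        similar (mean reward, mean state change) statistics.

        Both arguments are dicts from action name to a (mean_reward, mean_change) pair.
        """
        mapping = {}
        for t_action, (t_reward, t_change) in target_stats.items():
            mapping[t_action] = min(
                source_stats,
                key=lambda s: math.hypot(source_stats[s][0] - t_reward,
                                         source_stats[s][1] - t_change))
        return mapping

    # Hypothetical statistics gathered from a few episodes in each task.
    source_stats = {"left": (0.1, -1.0), "right": (0.3, 1.0)}
    target_stats = {"west": (0.12, -0.9), "east": (0.25, 1.1)}

    print(map_actions(source_stats, target_stats))  # {'west': 'left', 'east': 'right'}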
ACKNOWLEDGEMENTS
This chapter was written while the authors were partially supported by DARPA
grants HR0011-07-C-0060 and FA8650-06-C-7606.
References
1. M. Asadi and M. Huber. Effective control knowledge transfer through learning
skill and representation hierarchies. In International Joint Conference on Artificial
Intelligence, 2007.
2. J. Baxter. A model of inductive bias learning. Journal of Artificial Intelligence
Research, 12:149–198, 2000.
3. S. Ben-David and R. Schuller. Exploiting task relatedness for multiple task learn-
ing. In Conference on Learning Theory, 2003.
4. C. Carroll and K. Seppi. Task similarity measures for transfer in reinforcement
learning task libraries. In IEEE International Joint Conference on Neural Net-
works, 2005.
5. R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.
6. T. Croonenborghs, K. Driessens, and M. Bruynooghe. Learning relational skills for
inductive transfer in relational reinforcement learning. In International Conference
on Inductive Logic Programming, 2007.
7. W. Dai, G. Xue, Q. Yang, and Y. Yu. Transferring Naive Bayes classifiers for text
classification. In AAAI Conference on Artificial Intelligence, 2007.
8. W. Dai, Q. Yang, G. Xue, and Y. Yu. Boosting for transfer learning. In Interna-
tional Conference on Machine Learning, 2007.
9. J. Davis and P. Domingos. Deep transfer via second-order Markov logic. In AAAI
Workshop on Transfer Learning for Complex Tasks, 2008.
10. T. Dietterich. Hierarchical reinforcement learning with the MAXQ value function
decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.
11. K. Driessens, J. Ramon, and T. Croonenborghs. Transfer learning for reinforce-
ment learning through goal and policy parametrization. In ICML Workshop on
Structural Knowledge Transfer for Machine Learning, 2006.
12. E. Eaton and M. DesJardins. Knowledge transfer with a multiresolution ensemble
of classifiers. In ICML Workshop on Structural Knowledge Transfer for Machine
Learning, 2006.
13. E. Eaton, M. DesJardins, and T. Lane. Modeling transfer relationships between
learning tasks for improved inductive transfer. In European Conference on Machine
Learning, 2008.
14. B. Falkenhainer, K. Forbus, and D. Gentner. The structure-mapping engine: Al-
gorithm and examples. Artificial Intelligence, 41:1–63, 1989.
15. F. Fernandez and M. Veloso. Probabilistic policy reuse in a reinforcement learning
agent. In Conference on Autonomous Agents and Multi-Agent Systems, 2006.
16. Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learn-
ing and an application to boosting. Journal of Computer and System Sciences,
55(1):119–139, 1997.
17. H. Hlynsson. Transfer learning using the minimum description length principle
with a decision tree application. Master’s thesis, University of Amsterdam, 2007.
18. M. Klenk and K. Forbus. Measuring the level of transfer learning by an AP physics
problem-solver. In AAAI Conference on Artificial Intelligence, 2007.
19. G. Konidaris and A. Barto. Autonomous shaping: Knowledge transfer in reinforce-
ment learning. In International Conference on Machine Learning, 2006.
20. G. Kuhlmann and P. Stone. Graph-based domain mapping for transfer learning in
general games. In European Conference on Machine Learning, 2007.
21. A. Lazaric, M. Restelli, and A. Bonarini. Transfer of samples in batch reinforcement
learning. In International Conference on Machine Learning, 2008.
22. Y. Liu and P. Stone. Value-function-based transfer for reinforcement learning using
structure mapping. In AAAI Conference on Artificial Intelligence, 2006.
23. M. Madden and T. Howley. Transfer of experience between reinforcement learning
environments with progressive difficulty. Artificial Intelligence Review, 21:375–398,
2004.
24. Z. Marx, M. Rosenstein, L. Kaelbling, and T. Dietterich. Transfer learning with
an ensemble of background tasks. In NIPS Workshop on Transfer Learning, 2005.
25. N. Mehta, S. Ray, P. Tadepalli, and T. Dietterich. Automatic discovery and transfer
of MAXQ hierarchies. In International Conference on Machine Learning, 2008.
26. L. Mihalkova, T. Huynh, and R. Mooney. Mapping and revising Markov Logic
Networks for transfer learning. In AAAI Conference on Artificial Intelligence,
2007.
27. L. Mihalkova and R. Mooney. Transfer learning with Markov Logic Networks. In
ICML Workshop on Structural Knowledge Transfer for Machine Learning, 2006.
28. T. Mitchell. Machine Learning. McGraw-Hill, 1997.
29. A. Niculescu-Mizil and R. Caruana. Inductive transfer for Bayesian network struc-
ture learning. In Conference on AI and Statistics, 2007.
30. S. Pan, J. Kwok, and Q. Yang. Transfer learning via dimensionality reduction. In
AAAI Conference on Artificial Intelligence, 2008.
31. T. Perkins and D. Precup. Using options for knowledge transfer in reinforce-
ment learning. Technical Report UM-CS-1999-034, University of Massachusetts,
Amherst, 1999.
32. B. Price and C. Boutilier. Implicit imitation in multiagent reinforcement learning.
In International Conference on Machine Learning, 1999.
33. R. Raina, A. Ng, and D. Koller. Constructing informative priors using transfer
learning. In International Conference on Machine Learning, 2006.
34. M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1-
2):107–136, 2006.
35. M. Rosenstein, Z. Marx, L. Kaelbling, and T. Dietterich. To transfer or not to
transfer. In NIPS Workshop on Inductive Transfer, 2005.
36. U. Ruckert and S. Kramer. Kernel-based inductive transfer. In European Confer-
ence on Machine Learning, 2008.
37. M. Sharma, M. Holmes, J. Santamaria, A. Irani, C. Isbell, and A. Ram. Transfer
learning in real-time strategy games using hybrid CBR/RL. In International Joint
Conference on Artificial Intelligence, 2007.
38. A. Sherstov and P. Stone. Action-space knowledge transfer in MDPs: Formalism,
suboptimality bounds, and algorithms. In Conference on Learning Theory, 2005.
39. X. Shi, W. Fan, and J. Ren. Actively transfer domain knowledge. In European
Conference on Machine Learning, 2008.
40. S. Singh. Transfer of learning by composing solutions of elemental sequential tasks.
Machine Learning, 8(3-4):323–339, 1992.
41. V. Soni and S. Singh. Using homomorphisms to transfer options across continuous
reinforcement learning domains. In AAAI Conference on Artificial Intelligence,
2006.
42. D. Stracuzzi. Memory organization and knowledge transfer. In ICML Workshop
on Structural Knowledge Transfer for Machine Learning, 2006.
43. C. Sutton and A. McCallum. Composition of conditional random fields for transfer
learning. In Conference on Empirical Methods in Natural Language Processing,
2005.
44. R. Sutton. Learning to predict by the methods of temporal differences. Machine
Learning, 3:9–44, 1988.
45. R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press,
1998.
46. E. Talvitie and S. Singh. An experts algorithm for transfer learning. In Interna-
tional Joint Conference on Artificial Intelligence, 2007.
47. F. Tanaka and M. Yamamura. Multitask reinforcement learning on the distri-
bution of MDPs. Transactions of the Institute of Electrical Engineers of Japan,
123(5):1004–1011, 2003.
48. M. Taylor, N. Jong, and P. Stone. Transferring instances for model-based rein-
forcement learning. In European Conference on Machine Learning, 2008.
49. M. Taylor, G. Kuhlmann, and P. Stone. Accelerating search with transferred
heuristics. In ICAPS Workshop on AI Planning and Learning, 2007.
50. M. Taylor, G. Kuhlmann, and P. Stone. Autonomous transfer for reinforcement
learning. In Conference on Autonomous Agents and Multi-Agent Systems, 2008.
51. M. Taylor and P. Stone. Cross-domain transfer for reinforcement learning. In
International Conference on Machine Learning, 2007.
52. M. Taylor and P. Stone. Transfer via inter-task mappings in policy search reinforce-
ment learning. In Conference on Autonomous Agents and Multi-Agent Systems,
2007.
53. M. Taylor, P. Stone, and Y. Liu. Value functions for RL-based behavior transfer:
A comparative study. In AAAI Conference on Artificial Intelligence, 2005.
54. M. Taylor, S. Whiteson, and P. Stone. Transfer learning for policy search methods.
In ICML Workshop on Structural Knowledge Transfer for Machine Learning, 2006.
55. S. Thrun and T. Mitchell. Learning one more thing. In International Joint Con-
ference on Artificial Intelligence, 1995.
56. L. Torrey, J. Shavlik, S. Natarajan, P. Kuppili, and T. Walker. Transfer in rein-
forcement learning via Markov Logic Networks. In AAAI Workshop on Transfer
Learning for Complex Tasks, 2008.
57. L. Torrey, J. Shavlik, T. Walker, and R. Maclin. Relational skill transfer via
advice taking. In ICML Workshop on Structural Knowledge Transfer for Machine
Learning, 2006.
58. L. Torrey, J. Shavlik, T. Walker, and R. Maclin. Relational macros for transfer in
reinforcement learning. In International Conference on Inductive Logic Program-
ming, 2007.
59. L. Torrey, T. Walker, J. Shavlik, and R. Maclin. Using advice to transfer knowledge
acquired in one reinforcement learning task to another. In European Conference
on Machine Learning, 2005.
60. T. Walsh, L. Li, and M. Littman. Transferring state abstractions between MDPs.
In ICML Workshop on Structural Knowledge Transfer for Machine Learning, 2006.
61. C. Wang and S. Mahadevan. Manifold alignment using Procrustes analysis. In
International Conference on Machine Learning, 2008.
62. C. Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge,
1989.
63. A. Wilson, A. Fern, S. Ray, and P. Tadepalli. Multi-task reinforcement learning: A
hierarchical Bayesian approach. In International Conference on Machine Learning,
2007.