Masters Thesis
Master of Philosophy
August 2012
Declaration
I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of the University or other institute of higher learning, except where due acknowledgement has been made in the text.
31 August 2012
Abstract
Planetary rovers are required to safely navigate across unstructured and hazardous terrain and to react autonomously to impending danger, because remote control has proved challenging and dangerous for planetary rovers due to the large communication delays. Safety is especially a concern on unstructured terrain, which poses several types of hazards to a robot, such as getting stuck, and where safety is very sensitive to deviations from an intended path. To maintain the safety
of the platform during navigation, planetary rovers must consider control uncertainty
during motion planning. Thus, not only does the system need to make predictions
of action outcomes, it also needs to estimate the accuracy of these predictions. The
aim of this research is to provide planetary rovers with the ability to plan motions to
goal regions that optimise the safety of the platform by including information about
the accuracy of its controls. Modelling such control uncertainty is difficult due to the complex interaction between the platform and its environment. In this thesis, we propose an approach to learn the outcome of control actions from experience, represented statistically, and to use this model to compute a motion policy for navigation to a goal region. Motion planning strategies are considered that account for uncertainty in the heading and yaw of the platform, across various motion primitives. The approach is implemented on a holonomic rover with six wheels and a rocker-bogie frame and tested on a Mars analogue terrain that includes flat ground, small rocks, and non-
traversable rocks. Planning is computed over a terrain map built using an on-board
RGB-D camera. We report the results of 200 simulated and 95 experimental trials
that validate the approach. These results demonstrate that explicitly incorporating
knowledge of control uncertainty into the motion planning process increases platform
safety by decreasing the likelihood of the rover getting stuck and reducing the cost
accumulated over the executed path. Accounting for heading uncertainty resulted in
Acknowledgements

I wish to sincerely thank both my supervisors, Robert Fitch and Thierry Peynot, for
their constant guidance and support. This work would not have been possible without
their continued assistance, and I have learned so much under their supervision.
A sincere thanks also to my work colleagues Ali, Esa, Ken and Angela, who helped build, maintain and debug the rover test platform. Being part of this team, and interfacing this research with other aspects of the rover, gave me a great sense of satisfaction, was a great source of learning, and was a lot of fun.
I would also like to acknowledge the Pathways to Space program of the Powerhouse Museum in Sydney for the use of their `Marscape', which allowed for testing in a realistic Martian environment and maintained public interest in the project as an exhibition. Thanks also to Salah Sukkarieh for heading the rover project.
Lastly I would like to thank my family for their consistent love and support.
I have approximate answers and possible beliefs and different degrees of certainty
about different things, but I am not absolutely sure about anything
- Richard Feynman
Contents

Declaration
Abstract
Acknowledgements
Contents
List of Figures
Nomenclature
1 Introduction
1.1 Objectives
1.3 Contributions
2 Related Work
2.2 Uncertainty
2.4 Conclusion
3 Background
3.1.1 Overview
3.1.3 Sweeping
3.2.1 Model
3.2.2 Inference
4.4 System
5 Implementation
5.4 Planning
5.5.1 Training
5.5.2 Prediction
5.5.3 Outputs
5.6 Discussion
6 Experimental Results
7.1 Conclusions
Bibliography
A
List of Tables

6.5 Second learning results for rough terrain traversal, used for rock field B. This also compares control errors encountered during learning and testing.
6.6 Planning with traversal over rock field A and rock field B
Nomenclature
a action
A action set or action space
s state
S state space
∆s change in state
δs error in change of state
δshead error in change of state by heading
δsdist error in change of state by distance
δsyaw error in change of state by yaw
p(s'|s, a) control uncertainty model
P(s'|s, a) transition probability function
R(s', s, a) reward function
α chassis configuration angles
θ pitch
φ roll
ψ yaw
Θ hyperparameters of the GP
λ terrain profile features / characteristics
DP dynamic programming
GP Gaussian process
MDP Markov decision process
Chapter 1
Introduction
1.1 Objectives
Autonomous planetary rovers have the potential for persistent operation, reacting in real time to dangerous situations without the need for frequent manual intervention [37]. Operating rovers via remote control, by contrast, is limited by high communication delays and by control uncertainty. Robots such as planetary rovers are designed for mobility in unstructured terrain, where motion planning is difficult due to the complexity of this type of environment, and where the robot is exposed to the risk of serious mechanical damage. We are interested in this problem in the context of planetary rovers [46]. An example of a planetary rover is shown in Fig. 1.1. This thesis addresses navigation of autonomous planetary rovers across such terrain.
Figure 1.1 Planetary rover Mawson used for experimental validation, shown in the
Mars yard at the Powerhouse Museum in Sydney, Australia. All experiments were
performed in this environment.
The goal of classical geometric motion planning is to plan a sequence of motions, from
start to goal, that minimise time or distance travelled while avoiding obstacles [35].
The conceptual distinction between free space and obstacles for planetary rovers on unstructured terrain, however, is not clear. One definition of obstacle in this context is terrain that would cause mechanical damage to the robot. Such obstacles are not directly observable, though a continuous metric of risk to the platform's safety can instead be derived using a cost function over the terrain.
The result of a control input is never certain due to imperfect actuation and imperfect
knowledge about robot-terrain interaction. Thus a robot will not follow an intended
path exactly, which can be disastrous in an environment cluttered with high-risk areas where, due to control uncertainty, the robot would have a high likelihood of deviating
into a high risk region. As deterministic planning algorithms do not account for
deviations from the computed path, they expose a robot to considerable risk.
Modelling control uncertainty is difficult for planetary rovers due to complex terramechanics [20]. For previously unob-
served terrain, prior models of terrain properties may not be available. It is therefore
important to model control uncertainty with a method that can be feasibly executed
online during operation of the robot, and to validate such a model experimentally.
• Learn the relation between control uncertainty and the local environment from previous experience of traversing terrain.
• Validate that the methods proposed increase the safety and the reliability of a planetary rover navigating unstructured terrain.
The aim of this motion planner is to compute control inputs that guide the robot safely to a specified goal region.
A motion policy is used to consider the expected long-term consequences of each po-
tential deviation. A policy is required, rather than a path, since potential deviations
of the robot cannot be anticipated due to the uncertainty of control input outcomes.
The motion planner predicts the outcomes of control inputs from specified positions using a learned stochastic dynamics model. This model is trained on various terrain, and then used for predictions on similar terrain.
Our approach builds a statistical model of control uncertainty directly from observed data, using Gaussian process (GP) regression to capture uncertainty in the resulting heading of the platform and in the distance travelled. We use this GP model to build
a stochastic transition function for use in motion planning. The planning goal is to
compute a policy that allows the robot to reach a given goal location while maintaining
the safety of the platform. We assume that a map of the environment is available, and we define a cost function over this terrain map, which is constructed a priori using on-board sensors.
We compute the policy using dynamic programming (DP), where the resolution of the state-space discretisation is chosen so that control uncertainty can be represented in the planning phase to compute safe paths over unstructured terrain. A number
of research groups have proposed frameworks for autonomous ground vehicles with
similar aims. However, few of these account for control uncertainty in the planning phase, or for realistic predictions of action outcomes. Existing motion planning research under
uncertainty has not yet been extended to cases of unstructured terrain outdoors. This
gap between realistic perception and good motion planning under explicit uncertainty
has not been tackled before on unstructured terrain, although much attention has
been given to simulated environments [2, 7, 10, 19, 30, 31, 42].
In this thesis, we present the details of our approach and its implementation for the
planetary rover shown in Fig. 1.1, which was introduced originally in [39]. The environment consists of flat terrain, traversable rocks, and non-traversable rocks. GP models
are learned for rock traversal, mapping environment features to a distribution of re-
sulting rover configurations (in state space) for two types of control actions. The cost
map is constructed from data collected by an RGB-D camera on-board the rover. We
report the results of 200 simulated trials and 95 experimental trials that evaluate the
rover's ability to traverse flat terrain and small rocks while avoiding non-traversable rocks, comparing planning with and without control uncertainty. Our results show empirically that planning
with control uncertainty improves the rover's ability to navigate while avoiding non-
traversable areas. These results demonstrate the value of planning under uncertainty for navigation over unstructured terrain.
1.3 Contributions
The main contributions of this thesis are an approach to motion planning that considers learned control uncertainty, and the experimental validation of this approach on real unstructured terrain, supporting its suitability for improving platform safety.
Chapter 2 is a review of related work. First, motion planning algorithms are reviewed. Sources of uncertainty in a robotic system are then considered, and the implications of these sources of uncertainty on motion planning are discussed. Finally, methods to incorporate control uncertainty into motion
planning are discussed, as well as how models of uncertainty have been learned from
experience.
Chapter 3 introduces key algorithms used in the development of our technical ap-
proach. The DP algorithm is first discussed, outlining how optimal control policies are computed. Then, regression using GPs is discussed.
Chapter 4 outlines the approach chosen in this thesis for motion planning under stochastic control. This approach is general and applicable to any ground robot.

Chapter 5 describes the robotic platform used for experimental validation. Details of the platform, how it is modelled, and the specification of the state space are given, along with the source of data chosen for learning.

Chapter 6 presents results in simulation and in real experiments on flat terrain and on rough terrain, which are given and discussed. Results show smoother behaviour and an increase in executed safety when planning with control uncertainty.
Chapter 2

Related Work
A typical autonomous terrain traversal system is shown in Fig. 2.1 [4]. This system
observes the environment using sensors, which pass data to the rover's software com-
ponents (yellow) to decide the next action command or path for a controller to follow.
In turn, the controller sends control commands to the motors and actuators on the
rover platform. Sensor data is used to determine the rover's localisation and also to
build a terrain model. The terrain model is a representation of the rover's environment, and is used for building a cost map. A cost map is used to quantify the difficulty of traversal at locations throughout the known environment. This cost depends on vehicle knowl-
edge, that is, knowledge of constraints on how the rover can move and operate in an environment. The motion planner uses the vehicle knowledge, a specification of the goal region to traverse to, and input from the localisation and cost map to decide the
next appropriate action command to send to the controller. The aim of the motion
planner is to send a series of action commands, directing the rover to its goal location, while maintaining the safety of the platform.
Figure 2.1 A typical planetary rover system. Hardware components shown in red,
software components shown in yellow.
2.1 Motion Planning

Planning motions of a robot through geometric space that avoid collisions with obstacles is a fundamental problem in robotics. A configuration (vector) is a specification of the values of each of a robot's degrees of freedom, which determines the geometric space the robot occupies. Configuration space is the set of all configurations; a configuration belongs to C_obstacle if it would result in a contact with an obstacle, and to C_free otherwise. Using this abstraction to plan in configuration space rather than geometric space allows generalised
motion planning algorithms to be used, which are independent of how the robot is
constructed. Planning for point robots in known C-space has been studied extensively [33, 35]. The general motion planning problem of finding a shortest collision-free path is, however, known to be computationally hard.
Sampling-based planners

Sampling-based planners compute feasible paths efficiently. Feasible paths are collision free but not necessarily shortest. Sampling-based algorithms are not complete: they cannot detect if a solution does not exist in finite time. However, if a solution does exist, the planner will eventually find it. Probabilistic roadmaps (PRMs) operate in two steps: construct a roadmap and then query the roadmap. A roadmap is made up of
• vertices: states which exist in free space (robot not in contact with an obstacle),
• edges: the result of an action which would cause the robot to transition contin-
uously through state space from one state to another without collisions.
The roadmap is constructed by sampling the configuration space many times (randomly, or according to a pattern), discarding samples which are not in C_free, then using a local planner to attempt to connect each sample to nearby configurations. The roadmap is queried using any graph search algorithm, and may terminate once a collision-free path exists between start and goal configurations (or possibly keep searching for lower cost paths). Because the roadmap can be queried for various start and goal configurations, it is well suited to multi-query problems.
Rapidly-exploring random trees (RRT) [34] is another widely used sampling-based planner [7, 42], suited to high-dimensional spaces. RRTs compute open-loop paths by incrementally growing a tree: a random configuration is sampled, and a local planner attempts to compute a path from a nearby configuration on the existing tree towards the random sample. Due to constraints of the robot, the path may not link the two configurations exactly, but nevertheless the tree grows. This has the property of efficient `space filling': finding paths (albeit non-optimal paths) such that a path can be found between the root configuration and most other configurations (or at least close by).
A variant, the RRT* algorithm [24], has been proven to probabilistically converge towards an optimal path solution with increased sampling. An RRT admits a single start and goal configuration per query.
Grid search is a family of algorithms that plan over a discretisation of both configuration space and action space. Reachability trees, A* (or similarly D*) and dynamic programming are examples. These algorithms are resolution complete, which means that the solution is computed with respect to the given discretisation.
Reachability trees are arguably the simplest grid-search algorithm. Motion plans are computed in the manner of Breadth First Search (BFS), which finds shortest paths with (and only with) unit edge weights. The root node is the robot's start configuration, and node expansion
involves assessing the result of each motion primitive available to the robot. Each
motion primitive results in a new configuration child-node. The process stops when a goal configuration is reached. The path returned is the path requiring the fewest possible motion primitives.
A* and Dijkstra's algorithm find shortest paths with non-negative, non-unit edge weights and are thus more general than BFS. A* also uses a heuristic function to guide the search, making it much more efficient in large search domains; Dijkstra's algorithm is the special case with no heuristic. As such, A* is more general and efficient than BFS and Dijkstra's algorithm, and a common choice for motion planning [17, 31]. D* is a similar but dynamic variant of A* that allows efficient replanning when the map changes. In addition, these algorithms compute only a single path and not a complete motion policy, which is more relevant to a robotic system that will inevitably deviate from an intended path.
Markov decision processes (MDPs) are commonly used to formulate problems in motion planning with uncertainty [35, 36], treating uncertainty explicitly and providing motion policies rather than single paths. Control uncertainty is represented as a stochastic transition function P(s'|s, a), describing a probabilistic distribution over resultant states. An optimal policy can be computed using P(s'|s, a) with dynamic programming [2].
However, these techniques are most often evaluated in simulation only, and there is a lack of validation on physical platforms. An alternative is the continuous calculus of dynamic programming, which solves the search explicitly with ordinary differential equations (ODEs). Such a technique can be far more efficient than
iterative computation but, unfortunately, a general technique for solving the associated ODEs has not yet been found. Solutions have been found only for some simple systems whose dynamics take a simple form with respect to the control input.
2.2 Uncertainty
Uncertainty can describe a lack of knowledge or inability to detect precisely the state
of an entity or the outcome of a process. From an agent's point of view, there are
multiple possibilities of what the true state of an entity is, or what the outcome of a
process will be. Quantification of uncertainty, given sufficient data, can be represented with probability distributions. Epistemological (epistemic) uncertainty describes incomplete knowledge of the system and its environment (including the robot itself): in principle everything could be known, but it is not in practice. Aleatoric uncertainty describes inherent variability, for instance the fact that no two control inputs are ever truly identical; repeated trials of a system that records successive control inputs as identical will actually result in different outcomes.
This thesis is concerned with both Aleatoric and Epistemological uncertainty. The
following subsections consider the different forms of uncertainty that exist in the rover system (Fig. 2.1), and the related work that mitigates the associated risk and cost to a rover.
Localisation uncertainty implies that a robot does not know exactly where it is in
relation to the world around it. If a robot cannot be exactly sure of where it is, the risk of unintentionally colliding with an obstacle increases. In a global frame of reference, this uncertainty and associated risk of collision will increase with time if the localisation system is purely proprioceptive. The severity (or variance) of localisation uncertainty also depends on the environment: environments with many distinct features allow for lower localisation uncertainty [12].
One approach is to plan in belief space: a belief is a probability distribution over possible states. The partially-
observable Markov decision process (POMDP) [22] is a common solution for planning
in belief states according to the MDP model of the environment [21, 30]. Optimal
actions in the POMDP framework are those which maximise the average expected
utility of executing the action from each possible state the robot is in, weighted by the
level of belief (probability) that the robot is in that state. POMDPs are complete and
provide optimal motion policies, although they do not scale as well as other methods, including PRM-adapted approaches for searching belief space [44]. Other methods plan over predicted localisation performance: along a candidate path towards some future configuration, the localisation algorithm can be modelled according to the known features it would see, and thus a prediction of localisation error can be computed. Such a method was proposed in [10] by including the predicted localisation error in the path evaluation.
When an object is sensed, e.g. using a laser range finder, the distance and/or direction measured will vary slightly from the true value due to imperfect sensing. A similar uncertainty applies to the terrain models built from such measurements; examples include point clouds where individual points have associated position uncertainty. A robot planning motions over such terrain will be uncertain about its interaction with that terrain. The robot will therefore be uncertain about the incurred risk and cost at each location. In previous work on foothold selection, by incorporating the uncertainty of terrain into a cost function, a greater reliability of selecting safe footholds was
achieved compared with an Interpolated Elevation Grid terrain model that `fills in the gaps' with no uncertainty. Methods to deal with environmental mapping uncertainty also include cases where an obstacle is known to exist, but the location of that obstacle is only approximately
known [11, 16, 40]. These motion planners reason about the probability of each
configuration being an obstacle. This was implemented using PRMs, querying least-cost paths where the cost of each configuration is a function of the belief that the configuration is in C_obstacle.
Control uncertainty means that actions selected by a motion planner will not be executed in an exactly predictable way. Thus, from a state the rover is in before executing an action, the resultant state is not known with certainty a priori. In regard to motion planning, this means a robot can never be guaranteed to follow an intended path exactly. Control uncertainty can be modelled as a probability distribution over a resultant state s' from an original state s and action a selected, P(s'|s, a), which
can be incorporated into motion planning explicitly. DP can use this model of control
uncertainty to compute control commands which are optimised with respect to this
model to avoid high risk regions [2]. Control uncertainty is the type of uncertainty
this thesis addresses, and a discussion of how control uncertainty is incorporated into motion planning follows.

2.3 Incorporating Control Uncertainty in Motion Planning
A common approach is to expand obstacles by a safety margin so that even if the robot slips or skids unexpectedly, it is unlikely to collide with an obstacle. Such an approach first involves the expansion of obstacles using Minkowski addition [50] to account for the shape of the robot [2], and a further obstacle expansion by the `safety factor' amount in each direction. Such an approach can work well
when obstacles are sparse and clearly defined as C_obstacle. Additionally, safety factors pose artificial limits on what would otherwise be C_free, such that in cluttered environments they may remove the possibility that a solution can be found at all. If unstructured terrain is modelled in this fashion, the resulting plans are often neither optimal nor robust with respect to a measure of difficulty, nor with respect to the uncertainty of control.
The rest of this section presents previous research on motion planning with control uncertainty. One approach is to avoid regions the platform estimates would incur higher uncer-
tainty in control. This avoids the problems of dealing with uncertainty by avoiding
regions of higher uncertainty. This has been done by expressing uncertainty as a cost,
where higher uncertainty incurs higher cost, and then planning a path that minimises
this cost assuming deterministic control [17]. However, by `avoiding the problem',
such an approach can fail in terrain where most states incur considerable control uncertainty. Furthermore, high control uncertainty does not necessarily imply high risk of collision, nor vice versa. A fundamental requirement of motion planning is to avoid obstacles, and such a cost function does not directly capture this requirement.
Another approach evaluates candidate paths in light of the control uncertainty along the path itself [27]. Linear-Quadratic-Gaussian (LQG) controllers have been used to evaluate the distributions of states a robot may reach when attempting to return to a path from potential deviations [7, 10, 15, 19, 21, 42]. However, this treats motion planning and control uncertainty as decoupled: candidate paths are selected, and afterwards evaluated in light of a stochastic control model. Several iterations may result in some rejected paths until a suitable path is found. Whilst this methodology can provide paths that satisfy a minimum threshold of safety, the path selected is not optimised
with respect to the control uncertainty. Treating the problems as coupled avoids this: once the robot has deviated from its path, the best option may no longer be to return to that path, and what is needed is a control policy which does not assume (nor force) a robot to commit to a pre-computed path once deviations inevitably occur in the execution of its motion strategy.
A further limitation of some existing models of uncertain controls is that the model assumes the degree and nature of uncertainty in controls is homogeneous throughout the environment. While this could be suitable in homogeneous terrain, it does not hold for unstructured terrain. Additionally, these methods were validated in simulation only, and not on unstructured
terrain. Learning mobility, and consequently control uncertainty, directly from experience is an alternative. Predictions of the mobility of a robotic platform have been achieved using terramechanics models and near-to-far learning. These methods have been used to estimate uncertainty in the outcome of
motion primitives and to estimate slip. Slip is the difference between wheel angular velocity and linear velocity of the wheel centre divided by wheel radius, normalised by the maximum of the two terms [5]. Any slip value other than zero is often undesirable.
Terramechanics approaches build physical models by considering features such as soil cohesion and density [18]. These methods have
been developed and tested for rovers on real terrain, not just simulation. An advan-
tage of these methods is feature identication: motion is predicted based upon the
values of a small and relevant set of features selected from a physical understanding of
the rover-environment interaction (e.g. soil cohesion, soil density, wheel traction etc.).
This can reduce the number of parameters that require calibration during a learning phase. A terramechanics-based method has been proposed that generates a Gaussian distribution over predicted future states
on homogeneous terrain [20], and to predict slip [19]. These methods are well suited to homogeneous terrain; however, it is difficult to precisely model non-homogeneous terrain that includes different grades of sand and rocks of different sizes and shapes. In addition, it is extremely difficult to model how
the robot interacts with such terrain. Another approach is a learning framework called `near-to-far' learning [9, 29]. These approaches use observations
when `far away' to classify terrain types according to the associated proprioceptive
mechanical properties encountered when on (`near') that same terrain. While this
approach does not explicitly learn a model of control uncertainty, this learning frame-
work could be suitable for that purpose, and could be used to learn continuously even
when deployed on missions. Inference using regression, however, would have the advantage over the classification used in this technique of a less abrupt change in predicted behaviour between terrain classes.
Research done by [3, 25] learned to estimate slip based on visual and geometric
information of unstructured terrain not yet traversed. Work in [25] uses terrain slope
and control inputs to estimate wheel slip using GPs. This method does not utilise
the uncertainty in slip explicitly, rather it creates a mobility map of velocity limits
as a function of estimated slip and uses deterministic A* to solve for least cost path.
However, this framework for slip estimation could be extended to provide a model of
uncertainty in resultant state, and thus control uncertainty. The use of non-parametric models such as GPs for learning mobility on unstructured terrain has the advantage over parametric models that fewer assumptions are made about the robot-terrain interaction.
Integrating a realistic model of control uncertainty into the planning process requires such a learned model. Methods that ignore control uncertainty, decouple control and planning, or use parametric models of control uncertainty suffer from a lower expectation of robot safety because they make strong assumptions about the environment. Despite these drawbacks, the aforementioned methods have some
advantages, which are summarised in Sec. 2.4 and used to inspire our approach presented in Chapter 4.
2.4 Conclusion
The literature shows that incorporating a model of uncertainty into motion planning increases the safety of the executed motion, provided the model is realistic and is explicitly incorporated into motion planning. A realistic uncertainty model necessitates training. This is especially true for unstructured terrain, where the degree of control uncertainty can vary widely. For unstructured terrain, we argue that learning control uncertainty directly from experience, without a complex terramechanics model or a terrain classifier, is well suited because it makes fewer assumptions about the complex interaction between the platform and the terrain.
Chapter 3

Background
This chapter introduces two key algorithmic ideas used in this thesis: dynamic programming and Gaussian process regression.

3.1 Dynamic Programming
3.1.1 Overview
An MDP is defined by a finite set of states S, a set of actions A(s) executable at each state s ∈ S, a transition probability function P(s'|s, a) which gives the probability that executing action a from state s will result in the agent being in state s', and a reward function R(s', s, a) which gives the `reward' an agent receives by executing action a from state s and arriving in state s' [47].
The objective of the agent is to transition to a goal state in S. Default values are given to all states initially, and various algorithms exist that compute the optimal value function V*(s).
The optimal value function in the MDP framework satisfies the Bellman optimality equation:

V*(s) = max_a { Σ_{s'} P(s'|s, a) [ R(s', s, a) + γ V*(s') ] },   (3.1.1)
where γ ∈ [0, 1] is a discount factor that weights future rewards relative to immediate rewards. Given the Bellman optimality equation (Eq. 3.1.1), the optimal policy is

π*(s) = arg max_a { Σ_{s'} P(s'|s, a) [ R(s', s, a) + γ V*(s') ] }.   (3.1.2)
A more extensive description of the various algorithms in DP, and methods to increase
their efficiency, can be found in Sutton and Barto's book (1998) [47]. This thesis uses
the value iteration method, detailed below, to construct an optimal motion planner.
3.1.2 Value Iteration

Value iteration begins by initialising the value of all states in S to −∞, except for goal states, which retain fixed values of zero. Goal states may comprise a single state or multiple states, perhaps all states that lie within a goal region. Values of each state are updated iteratively, and
once all state values have stabilised (condition on line 9), the iteration terminates,
and the optimum policy is returned. The optimum policy is a greedy search of the
highest-value state (line 10). At each iteration (while loop: lines 2-9) the value of each state s is updated according to the values of the resultant states s' it can reach by execution of each action a ∈ A (line 6). This update (line 6) assesses the maximum value of state s according to each action a ∈ A
that can be executed at state s. For each action, the `value of state s if action a is executed' is the reward of the action plus the value of the resultant state. The maximum value is selected, and the value function is updated for state s (line 6).
An example is shown in Fig. 3.1, with the corresponding policy improvement in Fig. 3.1(b). This shows four backups of value iteration on a 4×4 grid. Available actions from any state are to transition either left, right, up or down, and the agent remains in its particular state if an action would cause it to transition out of bounds. Goal states
are coloured grey, they begin with value zero, and are the only states exempt from
value updates. Each iteration updates the value function of every state once, producing a value function the agent can use to navigate towards a goal using Eq. 3.1.2. Each action incurs
a reward of `−1' (R(s', s, a) = −1, ∀ s ∈ S, ∀ a ∈ A in this example) to execute an action. Beginning at the first iteration (k = 0), all non-goal state values are set to
negative infinity. After the first backup (k = 1), the value of each state updates to the maximum value of `reward + value of neighbour state' it finds. The value of states
adjacent to the goal states update to −1 + 0 = −1, whilst states further away remain at negative infinity, as their neighbours were still valued at negative infinity. Since values were updated during this backup, the algorithm has not necessarily yet converged, and thus it iterates again. At the k = 2 iteration, states adjacent to `−1' valued states update to value `−2', and so on. Finally the value function converges after the k = 3 backup, and the optimal, greedy policy shown in Fig. 3.1(b) at iteration k = 3 is returned. By following this optimal policy, an agent anywhere in the environment can find its way to one of the goal states with an optimal number of action executions.
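To make the backup of Eq. 3.1.1 concrete, the following is a minimal Python sketch (not from the thesis; the goal cells and grid layout are assumptions for illustration) of value iteration on a 4×4 grid world of this kind, followed by greedy policy extraction as in Eq. 3.1.2.

import numpy as np

N = 4                                          # 4x4 grid world
GOALS = {(0, 0), (N - 1, N - 1)}               # hypothetical goal cells (grey states)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(s, a):
    """Deterministic transition: stay in place if the move would leave the grid."""
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else s

# Value iteration (Eq. 3.1.1 with P(s'|s,a) = 1 for the single reachable s').
V = {(r, c): (0.0 if (r, c) in GOALS else -np.inf) for r in range(N) for c in range(N)}
gamma, reward = 1.0, -1.0
converged = False
while not converged:
    converged = True
    for s in V:
        if s in GOALS:
            continue
        best = max(reward + gamma * V[step(s, a)] for a in ACTIONS)
        if best != V[s]:
            V[s], converged = best, False

# Greedy policy extraction (Eq. 3.1.2).
policy = {s: max(ACTIONS, key=lambda a: reward + gamma * V[step(s, a)])
          for s in V if s not in GOALS}
print(V)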
An alternative, policy iteration, initialises a policy with random actions to perform from each state, and iteratively:
1. computes policy evaluation (a value function Vπ for the current policy π ). This
step is itself an iterative method similar to algorithm 3.1 except for line 6, where
the action is chosen according to the current policy π rather than the action maximising the value; and
2. improves the policy greedily with respect to Vπ.
Policy iteration is, however, generally not as efficient as value iteration due to the nesting of the policy evaluation iteration within policy iteration [47]. Thus policy iteration is not used in this thesis.
3.1.3 Sweeping
The efficiency of DP can be improved by focusing updates on states whose values are frequently changing. For instance, if the value of
one state s' is updated, the value of each state s that can transition to it (predecessor
states) may also need to be updated due to Eq. 3.1.1. Conversely, if no resultant
states s' reachable from state s updated their value in the previous backup,
there is no need for s to update. Thus, value propagation can still be achieved
without updating every state's value during each backup, rather only the predecessor
states whose resultant states' values did change on the previous iteration. This more efficient scheme is outlined in Algorithm 3.2.
Prioritised sweeping uses a priority queue in Algorithm 3.2. As the change in value in some updates will be greater than others, and a large change in value is more `relevant' to value propagation than a small update, the associated predecessor states can be placed in the queue with a priority proportional to the magnitude of that change.
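A minimal sketch of this idea is given below (the data structures and the exact queueing rule are illustrative assumptions, not a reproduction of Algorithm 3.2): predecessors of any state whose value changed by more than a threshold are pushed with a priority equal to the magnitude of the pending change.

import heapq

def prioritised_sweeping(states, actions, P, R, predecessors, V, gamma=1.0, theta=1e-3):
    """Update values by always expanding the state with the largest pending change.

    states: iterable of non-goal states (hashable and comparable, e.g. (x, y) tuples);
    P[s][a]: list of (s_next, prob); R(s_next, s, a): reward;
    predecessors[s']: set of (s, a) pairs that can transition into s';
    V: dict of current (finite) value estimates, with goal states fixed.
    """
    def backup(s):
        return max(sum(p * (R(s2, s, a) + gamma * V[s2]) for s2, p in P[s][a])
                   for a in actions(s))

    queue = []  # max-priority queue via negated keys
    for s in states:
        delta = abs(backup(s) - V[s])
        if delta > theta:
            heapq.heappush(queue, (-delta, s))

    while queue:
        _, s = heapq.heappop(queue)
        V[s] = backup(s)
        for s_pred, a in predecessors[s]:          # only predecessors need revisiting
            delta = abs(backup(s_pred) - V[s_pred])
            if delta > theta:
                heapq.heappush(queue, (-delta, s_pred))
    return V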
3.2 Gaussian Processes

This thesis makes use of Gaussian process regression for learning-based mobility prediction. GPs are described in detail in [45]. As mentioned previously in Sec. 2.3, GPs have the advantage over parametric models that they make fewer assumptions about the complex environment-robot interaction. In addition, GPs are used in this thesis as they are adept at providing predictions with uncertainty from sparsely populated, spatially correlated data. Thus, control uncertainty (the DP stochastic transition function) can be mod-
elled directly with PDFs the GP predicts, relying on limited training data which may
only form a small subset of the many possible terrain formations the rover could
encounter.
3.2.1 Model
GPs are a non-parametric Bayesian technique for regression and classification. GPs are a standard framework for learning a model of correlated data, using training input data X and training output data Y to model the correlations between output values given different inputs. As such, a new test input vector x* (or `covariates') can
be used to predict a corresponding output. Parametric regression assumes a particular functional relation between inputs and outputs, with a focus on training free parameters within such a function, optimising, for example, the root mean square error on test data. However, it is often the case that there exist multiple functions which describe such a relation well, or perhaps there is not enough data to justify why one candidate function should be used and
not another. A GP instead uses a covariance function, which describes how one observation is associated (or correlated) with another observation. Typical covariance functions give high correlation between the outputs of two test data that are similar, and small correlation between two test data that are very different. The covariance function has free parameters, the hyperparameters, represented by Θ.
The observed outputs differ from the underlying function by additive noise w of mean zero and variance σ_n^2: y = f(x) + w, where w ~ N(0, σ_n^2). Here σ_n^2 represents the noise in the observation of the output. The covariance between two inputs is given by cov(x_i, x_j) = k(x_i, x_j) + σ_n^2, where the covariance is composed of an underlying kernel function k and an additive noise term.
A common choice of covariance function used to describe the spatial correlation between two input vectors is the squared exponential:

k(x, x') = σ_f^2 exp( −(1/2) (x − x')^T Λ^{-2} (x − x') ) + σ_n^2 I,   (3.2.1)

where σ_f^2 is the signal variance and Λ is a diagonal matrix of length scales that describe the smoothness of the data, or how `fast' the covariance decays as the distance between inputs increases. These parameters comprise the hyperparameters Θ of the squared exponential covariance function.
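As a concrete illustration (a minimal numpy sketch, not the implementation of [45]; the toy data are invented), Eq. 3.2.1 and the standard GP predictive mean and variance can be computed as follows.

import numpy as np

def sq_exp_kernel(A, B, sigma_f, lengthscales):
    """Squared exponential covariance (Eq. 3.2.1; noise term handled separately)."""
    D = (A[:, None, :] - B[None, :, :]) / lengthscales   # scaled pairwise differences
    return sigma_f**2 * np.exp(-0.5 * np.sum(D**2, axis=-1))

def gp_predict(X, Y, x_star, sigma_f, lengthscales, sigma_n):
    """Standard GP regression: predictive mean and variance at a test input x_star."""
    K = sq_exp_kernel(X, X, sigma_f, lengthscales) + sigma_n**2 * np.eye(len(X))
    k_star = sq_exp_kernel(X, x_star[None, :], sigma_f, lengthscales)   # K(X, x*)
    alpha = np.linalg.solve(K, Y)                                       # K^-1 Y
    mu = k_star.T @ alpha                                               # predictive mean
    var = sigma_f**2 - k_star.T @ np.linalg.solve(K, k_star)            # predictive variance
    return float(mu), float(var)

# Toy usage: 1-D inputs, noisy observations of a smooth function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)
print(gp_predict(X, Y, np.array([0.5]), sigma_f=1.0, lengthscales=np.array([1.0]), sigma_n=0.1))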
3.2.2 Inference
Given training data (X, Y) and a test input x*, the GP prediction is Gaussian,

p(f* | X, Y, x*) = N(µ*, Σ*),   (3.2.2)

with mean and variance

µ* = K(X, x*)^T K_Y^{-1} Y,   (3.2.3)
Σ* = k(x*, x*) − K(X, x*)^T K_Y^{-1} K(X, x*),   (3.2.4)

where K_Y = K(X, X) + σ_n^2 I, Y is the n × 1 vector of all training output (scalar) values, x* is the test input vector, and K(X, x*) is the vector which stores the covariance of each training input with the test input.
Training the hyperparameters involves assessing the likelihood of the observations given the inputs and hyperparameters. This is called the marginal likelihood, though in practice the log of the marginal likelihood is used:
log p(Y|X, Θ) = −(1/2) Y^T K_Y^{-1} Y − (1/2) log|K_Y| − (n/2) log 2π,   (3.2.5)

where K_Y = K_f + σ_n^2 I.
A natural choice of hyperparameters is the set which best describes why the outputs Y were observed. This is done by computing the likelihood of observing the training outputs and maximising the log p(Y|X, Θ) term in Eq. 3.2.5 by gradient descent. Unfortunately, this optimisation problem is often not convex¹; however, local optima are often adequate. The partial derivatives of Eq. 3.2.5 with respect to the hyperparameters are
(∂/∂Θ_i) log p(Y|X, Θ) = (1/2) tr( (β β^T − K^{-1}) ∂K/∂Θ_i ),   (3.2.6)

where β = K^{-1} Y. Various methods for adapting the step size and stopping
criteria during descent exist. This thesis uses the implementation provided by [45].
¹ If a function is not convex, then local optima computed by gradient descent of such a function are not guaranteed to be the global optima of the function.
DP provides a principled way to plan motion given these sources of uncertainty. As GPs are adept at providing predictions with uncertainty given sparsely populated, spatially correlated data, they are useful tools for learning and modelling different sources of uncertainty given adequate training data. The following chapter shows how DP and GPs together can be combined to allow a rover to learn from experience and plan motion under control uncertainty.
Chapter 4

Motion Planning and Learned Control Uncertainty

With the aim of reaching a given goal region safely while optimising the total cost
of traversal over the executed trajectory, our approach is to take into account the
stochasticity of the control of the robot at the planning stage. This requires modelling
the control uncertainty, which we achieve by experience, using machine learning. This
section rst presents our model of control uncertainty (Sec. 4.1), followed by the
presentation of the learning technique used to train our model of control uncertainty
(Sec. 4.2). The motion planning algorithm, which uses the trained uncertainty model
to compute a motion policy rather than a single path, is then discussed (Sec. 4.3).
Finally, an overview of the complete system is given (Sec. 4.4). Chapter 5 later presents the implementation details that are specific to our test platform.
Control uncertainty is modelled as a PDF over the relative transition between states, p(∆s|s, a). P(s'|s, a) can be expressed using p(∆s|s, a), where ∆s ≡ s' − s:

P(s'|s, a) = ∫ p(∆s|s, a) f(s + ∆s, s') d∆s,   (4.1.1)

where f(s1, s2) = 1 if the discretisation of s1 corresponds to s2, and f(s1, s2) = 0 otherwise.
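As an illustration of Eq. 4.1.1 (a minimal sketch under the simplifying assumption that p(∆s|s, a) is a one-dimensional Gaussian over travelled distance, and that states are uniform cells of width `cell'; the numbers are examples only), the continuous density can be integrated over each discrete cell to give the transition probabilities used by the planner.

import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def discretise_transition(mu, sigma, cell=0.025, n_cells=5):
    """Integrate a Gaussian p(ds) over uniform state cells centred on multiples of `cell`.

    Returns {cell_index: probability}; indices are offsets (in cells) from the start state.
    """
    centre = round(mu / cell)                    # cell containing the mean displacement
    probs = {}
    for k in range(centre - n_cells, centre + n_cells + 1):
        lo, hi = (k - 0.5) * cell, (k + 0.5) * cell
        probs[k] = normal_cdf((hi - mu) / sigma) - normal_cdf((lo - mu) / sigma)
    total = sum(probs.values())
    return {k: p / total for k, p in probs.items()}   # renormalise the truncated tails

# e.g. a 0.3 m crab action with 3 cm standard deviation in travelled distance
print(discretise_transition(mu=0.30, sigma=0.03))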
p(∆s|s, a) comprises our model of stochastic actions; it is tied to observed environmental features that vary across the terrain and is learned through experience.

4.2 Learning-based Mobility Prediction

The stochastic transition function P(s'|s, a) in Eq. 3.1.2 is not known a priori and requires empirical calibration. This section describes how relative state transitions
The estimation of relative change in state is achieved using Gaussian process regres-
sion. As described in Sec. 3.2, GPs are effective in cases where the input data is sparsely populated and spatially correlated. This is the case for our data. The advantage of this approach is twofold. Firstly, multiple models are not required, one for each terrain classification. Secondly, predictions come with an associated variance, which directly supports the learning of control uncertainty. This is especially useful when a robot considers previously unseen terrain.
In unstructured terrain, ∆s may strongly depend on factors such as the terrain profile along the executed path and also the action executed. A complete terrain profile, as estimated prior to execution, contains too much information, such that it would overfit the learning algorithm if input directly. For example, the concerned subset of the elevation map
comprises all elevation cells traversed by each wheel. In our implementation (see
Chapter 5), we have a 2.5 cm state discretisation, 30 cm crab actions, and 6 wheels.
This means there are approximately 72 elevation cells the rover traverses during an
action primitive. In this case, the input dimensionality (72) exceeds the training
set size for any particular action (Table 6.1), and it is therefore unsuitable for learning directly.
Instead, a small number of features of the terrain profile are extracted for learning. A trivial choice of features may be those obtained by uniformly sub-sampling the profile. However, since this choice of functions is in no way related to the particular difficulties of the terrain, there is no guarantee that it would retain the relevant information to effectively predict motion. Terrain profiles are therefore characterised by a vector of functions λ(s, a), which identify the expected terrain profile with reference to the elevation map. The aim
is that the information content of λ(s, a) should be neither too low nor too high for effective learning, and that no pair of functions (λi, λj) be too strongly correlated (to avoid unnecessary redundancy in the learning). Good choices of candidate λ functions to test are those which can distinguish between situations where the robot would be more prone to slipping or have trouble climbing over a rock, and, say, hard flat terrain where the rover might have very little stochasticity in control. Therefore, candidate features were evaluated empirically, as described in Sec. 5.5.1.
We use the GP formulation from [45]. The input vector x is a function of the terrain
features, shifted such that the input has zero mean, i.e., x = λ − λ̄train , where λ̄train
is the mean of each terrain feature in the training data. We define ∆s_{i,a} as the i-th single-valued component of the change of state ∆s resulting from executing action a. We define a GP for each ∆s_{i,a}, i ∈ {1, ..., N}, a ∈ A, where N = Dim(s). The output value y_{i,a} of one of these GPs is defined as y_{i,a} = ∆s_{i,a} − ∆s̄_{i,a}, where ∆s̄_{i,a} is the mean value of ∆s_i across all executions of action a in the training data. The model y_{i,a} = f_{i,a} + w_{i,a} thus has zero mean, and the variance σ_n^2 equals the variance of the additive noise w_{i,a}.
The covariance function used to describe the spatial correlation between two input
vectors is the squared exponential (Eq. 3.2.1). Using Eq. 3.2.2, the predictive distribution for a test input x* is Gaussian,

p(∆s_{i,a} | X, Y, x*) = N(µ* + ∆s̄_{i,a}, Σ*),   (4.2.3)

where the predictive mean µ* and variance Σ* are given by Eq. 3.2.3 and Eq. 3.2.4.
Thus, p(∆s_{i,a}) is computed for untraversed terrain profiles using Eq. 4.2.3, where the test input vector x* has not been observed directly but rather estimated. Estimation employs a kinematics model of the robot over the series of states that the action is expected to traverse (detailed in Chapter 5).
Finally, our planner considers the uncertainty in each component ∆s_i separately, by using the full distribution learnt for ∆s_i together with the expectation of the other components.
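For illustration, the sketch below (using invented field names; it is not the thesis code) assembles the zero-mean training inputs and outputs described above for one action and one component of ∆s; the resulting (X, Y) pairs would then be passed to the GP implementation of [45].

import numpy as np

def build_gp_dataset(runs, action, component):
    """Assemble zero-mean GP training data for one action and one component of delta-s.

    `runs` is a list of dicts with keys 'action', 'features' (lambda vector) and
    'delta_s' (observed change of state); the key names here are illustrative only.
    """
    selected = [r for r in runs if r['action'] == action]
    lam = np.array([r['features'] for r in selected])          # terrain profile features
    ds = np.array([r['delta_s'][component] for r in selected])
    lam_mean = lam.mean(axis=0)        # lambda_bar_train, subtracted from the inputs
    ds_mean = ds.mean()                # delta_s_bar_{i,a}, subtracted from the outputs
    X = lam - lam_mean                 # zero-mean inputs x = lambda - lambda_bar
    Y = ds - ds_mean                   # zero-mean outputs y_{i,a}
    return X, Y, lam_mean, ds_mean     # the means are re-added to GP predictions later

# Example with two hypothetical logged runs of the same crab action:
runs = [
    {'action': 'crab_0.3_0', 'features': [0.05, 0.02, 0.03, 0.01], 'delta_s': {'head': 0.04, 'dist': 0.29}},
    {'action': 'crab_0.3_0', 'features': [0.01, 0.00, 0.01, 0.00], 'delta_s': {'head': 0.01, 'dist': 0.31}},
]
print(build_gp_dataset(runs, 'crab_0.3_0', 'head'))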
We compute a motion policy for the robot using dynamic programming. A policy
can be viewed as a representation of least-cost paths from every state to the goal, computed by determining which control action to take at every state in the state space. The planner computes a policy for reaching a goal region which is optimal with respect to the discretisation of the state space, the discrete set of (stochastic) primitive motions the robot has available, and the reward function [35]. Grid search algorithms such as A* can also achieve this; however, as DP reasons over possible deviations from the path, the expected total cost of traversal is additionally optimised with respect to
our uncertainty model (Sec. 4.1). Finally, DP computes a motion policy rather than a single path, which is relevant in this context given that the robot will inevitably deviate from any planned path. DP is tractable in low-dimensional state spaces; in our problem the state s can be defined using two lateral dimensions x and y and one dimension for orientation ψ, i.e., s = {x, y, ψ}.
The motion policy is computed using DP with sweeping (see Sec. 3.1.3) where the
state s defines the robot state (discretised into uniform cells), each action a in A is a short motion primitive, and the discount factor is γ = 1 to ensure that expected time to goal does not influence the safety of the platform. The reward function R(s', s, a) is computed from a cost map which represents the difficulty of traversal at each state in the terrain. Predictions of control uncertainty P(s'|s, a) are computed using the learned GP model (Sec. 4.2).
This choice of MDP, outlined in Sec. 3.1, is not a risk-sensitive MDP; thus our motion planner is neither risk averse nor risk prone per se, but rather optimises the average value of the resultant state plus traversal cost, weighted by the probability of arrival, when selecting subsequent action commands. As such, the motion planner will still avoid regions where control uncertainty is likely to lead to high cost, though it is not averse to high levels of control uncertainty when far from danger.
4.4 System
Fig. 4.1 shows an outline of the system. Once an elevation map of the robot's local environment has been computed, a kinematics model is used to estimate both the terrain profile characteristics and the cost of traversal (R(s', s, a)) at each state-action pair.
Figure 4.1 System outline. Colours indicate perception (red), offline learning (yellow), estimation (blue) and planning (green).
Data logged from the Inertial Measurement Unit (IMU) and the localisation system are used as input data to train the GP offline.
The GP can then be used to estimate the stochastic transition function P(s'|s, a) on terrain similar to that encountered during training. Using both P(s'|s, a) and R(s', s, a), the motion planner computes the value function (Eq. 3.1.1) over the observed state space, which is followed greedily as per the policy given in Eq. 3.1.2.
Chapter 5
Implementation
This chapter describes the implementation of the proposed approach on our experimental platform. All experiments described in Chapter 6 were conducted in the Mars Yard environment shown earlier in Fig. 1.1, at the Powerhouse Museum in Sydney, Australia. This environment is 117 m² in area and was designed to reproduce typical Mars terrain.
It contains rocks of various sizes, small craters, and various grades of sand, dirt and
gravel.
Mawson is a six-wheeled rover with individual steering servo motors on each wheel (Fig. 5.1). Its sensor suite includes:
• an RGB-D camera (Microsoft Kinect) mounted on a mast, tilted down 14°, used for terrain modelling and localisation,
Figure 5.1 The Mawson rover (a) and its chassis configuration (b), showing the attitude angles (φ, θ, ψ) and the chassis configuration angles α1-α4.
• an IMU (including gyroscopes) used to measure the roll (φ) and pitch (θ) of the robot only, and
• sensors measuring both bogie angles and the rocker differential (αi in Fig. 5.1(b)).
The IMU does not require initialisation because only relative changes in pitch and
roll during an action primitive are used for learning, described in Sec. 5.5.1. Absolute
values of pitch and roll are not used directly. Similarly, the system is also robust to slow drift in the IMU.
For localisation and terrain modelling we use the RGB-D SLAM algorithm [13], im-
plemented in the Robot Operating System (ROS) [14], which uses data from the
An elevation map is generated from the point clouds supplied by the RGB-D camera, with a cell resolution of 0.025 m × 0.025 m. This decision was based on a lower bound of the resolution,
according to the point cloud density the sensor provided, approximately 1 − 2cm
between points at a range of 6m from the sensor. A soft upper bound is the contact
area the wheel forms with the terrain, approximately 0.05m × 0.05m, to provide the
kinematic model with the most accurate data possible to predict motions.
The localisation accuracy was evaluated in the Mars Yard environment. Various motion primitives (detailed in Sec. 5.4.2) were
trialled including forward, backward and sideways translations, and rotations. Typi-
cal distance and yaw errors of RGB-D SLAM over 10m traversals are on average 9.7
cm and 0.069 radians [13]. Distance errors in the alternative InterSense localisation
system used in one test (described in Sec. 6.4) was found empirically to be approx-
imately 3cm. Neither localisation system used odometry information nor a system
model, and thus any slip of the wheels did not contribute to localisation inaccuracy.
5.3 Kinematics Model

A kinematics model is used to estimate how the platform would interact with its environment at a given state. The estimation of terrain profile characteristics λ(s, a) using this algorithm is presented in Sec. 5.3.2. Finally, the difficulty of traversal of a state (cost function) in the elevation map is presented in Sec. 5.3.3.
Figure 5.2 Rover kinematic structure: the rover is modelled with four pin-joint nodes (n1-n4), six wheels (w1-w6, front to rear) and link lengths L1-L6.
Although it does not take into account the dynamics of the platform, this simplified model is a reasonable approximation at the low speeds at which the rover operates.
The kinematic model of the rover is based on the pin-joint model in Fig. 5.2, which
includes four pin-joint nodes that can freely rotate and six contact points (wheels) that
rest on the surface of the elevation map. Wheel shape is not included in the model;
the point of contact between a wheel and the elevation map is approximated as the
base of the wheel. As such, local deformations of both the wheel and the terrain caused by the weight of the rover are ignored. In addition, variations in wheel height, compared with the measurements taken of wheel heights on flat terrain, are required to be small. Both of these have been the case in our training and testing. Another consequence of wheel size when traversing jagged edges is the ability to `smooth out' vertical deviations of the wheel, which has thus been ignored. However, these errors experienced during testing were also small, and the localised smoothing effects do not affect how the terrain characteristics are computed (Eq. 5.5.2 - 5.5.5). This is thus a simplistic model, but realistic enough for our purposes.
The model iteratively adjusts the attitude angles {φ, θ} and pin-joint rotations {α1, α2, α3, α4}. During each iteration, wheel
positions are calculated according to these values, and each wheel's height above the
elevation map is used as an error term in the following iteration to adjust the orientations and joint angles. The iteration terminates when each wheel's height above the elevation map is less than 1 cm.
Each wheel's height hi is measured relative to the elevation of the map at the same x-y location. The iterative process follows:
• The rover is `lowered' onto the elevation map by the minimum hi value, so that
at least one wheel will be in contact with the elevation map, with the other wheels at or above it.
• The pitch θ of the rover is then corrected (Fig. 5.3(a)) such that the average hi value of the front wheels is equal to that of the rear wheels:
θ → θ − asin( [ (h1 + h4)/2 − (h3 + h6)/2 ] / (L4 + L2/2) )
• The roll φ of the rover is then corrected (Fig. 5.3(b)) such that the average hi value of the right-side wheels is equal to that of the left-side wheels:
φ → φ − asin( [ (h1 + h2 + h3)/3 − (h4 + h5 + h6)/3 ] / L6 )
• The rocker's right pin-joint angle is corrected such that the h1 value of the front-right wheel is equal to the average of the right-bogie wheels:
α1 → α1 + asin( [ h1 − (h2 + h3)/2 ] / L4 )
The remaining pin-joint angles are corrected analogously. This loop exits when all hi values are less than 1 cm. An exception occurs if an asin() function returns NaN, which means not all six wheels would be in contact with the elevation map at this (x, y, yaw) location. In this case the state is marked as an obstacle and the motion planner will not plan through it. Pseudocode of the kinematics model captures this iterative correction procedure.
Figure 5.3 (a) Pitch correction: viewed from the right side of the rover, an iteration of the kinematics model algorithm corrects the rover's pitch. (b) Roll correction: viewed from the rear of the rover, an iteration of the kinematics model algorithm corrects the rover's roll.
Given an elevation map and a method to predict attitude angles and chassis configuration angles, a feature map is computed. This maps state-action pairs to characteristics of the expected terrain profile encountered (Eq. 5.3.1). The terrain profile is estimated using the kinematics model. Thus, for any state-action pair, a corresponding value of λ(s, a) is available.
Figure 5.4 Rover simulated on a terrain model using the kinematic model.
λ(s, a) is estimated for each (s, a) pair by first assessing the expected terrain profile the rover would traverse. This is computed as a linear interpolation between the start state s and the expected resultant state E(s'|s, a). The kinematics model is used to estimate rover attitude angles and pin-joint rotations at states along this line, giving a sequence of expected configurations along the motion primitive. This sequence, representing the terrain profile, is used as input to λ, discussed further
in Sec. 5.5.2.
The cost function chosen to generate the cost map from the elevation map penalises large absolute values of the roll, pitch, and configuration angles of the chassis at a given state, combining penalty terms for each of these angles.
The (α2 − α1) term is used as only the differential of the rocker can be measured. Since the configuration of the robot at a given position on the elevation map depends on its orientation, a 2D (x, y) cost map needs to be generated for each discretised orientation. The result is a 3-dimensional (x, y, ψ) cost map. The cost function was derived empirically by noting the difficulty of the platform in climbing different sizes of rocks.
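The exact cost expression is not reproduced in this excerpt; the following is only a hypothetical weighted-sum form consistent with the description above (the weights are placeholders, not the empirically derived values).

def traversal_cost(roll, pitch, alpha, w_roll=1.0, w_pitch=1.0, w_alpha=1.0):
    """Hypothetical cost penalising large attitude and chassis configuration angles.

    `alpha` is the list [a1, a2, a3, a4] of chassis angles; only the rocker
    differential (a2 - a1) is observable, as noted in the text.
    """
    a1, a2, a3, a4 = alpha
    return (w_roll * abs(roll)
            + w_pitch * abs(pitch)
            + w_alpha * (abs(a2 - a1) + abs(a3) + abs(a4)))

# e.g. a state where one bogie climbs a small rock
print(traversal_cost(roll=0.05, pitch=0.10, alpha=[0.0, 0.08, 0.12, 0.0]))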
5.4 Planning
This section describes how the motion planning algorithm of Sec. 4.3 is implemented.
This includes how the state space, action space and reward function have been defined,
As mentioned in Sec. 4.3, the rover's state s is defined by its position and orientation:

s ≜ {x, y, ψ} ∈ R^3,   (5.4.1)

where x and y are orthogonal lateral dimensions in metres, and ψ represents yaw in radians. This definition specifies all other orientations and internal angles φ, θ, α at
each state using the kinematics model. State resolution was required to be smaller
than the uncertainty bounds of resultant positions of actions in order for uncertainty
to be considered by the DP¹. A discretisation of 0.025 m × 0.025 m × π/32 rad is sufficient to give situations in which actions could result in one of multiple distinct states. The discretisation was
determined empirically using the standard deviation of errors of heading and yaw
from actions listed in Table 6.1. This equates to approximately two to ve possible
The point position {x, y} refers to approximately the rover's centre of rotation. The
centre of rotation lies 13mm from the rover's line of symmetry, to the right when
rotating clockwise, and to the left when rotating anti-clockwise, due to the front wheels' motors being more powerful and a forward centre of gravity. Thus, the rover `drags' the platform slightly when rotating. The robot's point position is defined with respect to this centre of rotation.
¹ If the state discretisation dominates the range of resultant positions, such that within two standard deviations all possible resultant positions lie within the same state cell, then the planning is effectively deterministic. In this case information about the uncertainty of a motion primitive is lost.
We define two action types for the rover: crabbing and rotation. Crabbing corresponds to a translation with a fixed distance and heading, with no change in ψ, and constant linear velocity (0.11 m/s); rotation changes only ψ. In total, the action set A is composed of 2 rotation and 8 crabbing motion primitives:

A ≜ { rotate(π/4), rotate(−π/4) } ∪ { crab(0.3 m, ψ_h) : ψ_h ∈ {0, ±π/4, ±π/2, ±3π/4, π} }.   (5.4.2)
Restricting the actions to this set was the result of a trade-off between rover dexterity and algorithm complexity. A discrete set is used, rather than a continuous one, as the motion planning task in the MDP environment is not convex and cannot be solved by gradient descent. The set was chosen empirically with sufficient actions to achieve a suitable level of path diversity. However, the number of actions was limited to 10 to bound the training set size required (independent training is required for each action) and also to limit the number of actions assessed at each state during planning. A single action is executed and completed before the next is selected from the policy.
The reward R(s', s, a) of an action is defined as the negative of the average cost of states that lie on a linear interpolation between a start state s and a resultant state s':
R(s', s, a) = −ξ − (1/K) Σ_{i=0}^{K} Cost( s_x + (i/K)(s'_x − s_x), s_y + (i/K)(s'_y − s_y), s_ψ + (i/K)(s'_ψ − s_ψ) ),   (5.4.3)
where ξ = 0.003 is a small penalty used to deter excessive motions on flat terrain.
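A minimal sketch of this reward computation is given below; the cost_map lookup here is a placeholder lambda, not the cost function of the previous section, and the average is taken over the K + 1 interpolated samples.

def reward(s, s_next, cost_map, K=12, xi=0.003):
    """Negative mean cost of states linearly interpolated between s and s_next (Eq. 5.4.3)."""
    sx, sy, spsi = s
    dx, dy, dpsi = (s_next[0] - sx), (s_next[1] - sy), (s_next[2] - spsi)
    costs = [cost_map(sx + (i / K) * dx, sy + (i / K) * dy, spsi + (i / K) * dpsi)
             for i in range(K + 1)]
    return -xi - sum(costs) / len(costs)

# Toy usage with a placeholder cost map that is expensive near x = 0.15 m.
cost_map = lambda x, y, psi: 1.0 if abs(x - 0.15) < 0.05 else 0.1
print(reward((0.0, 0.0, 0.0), (0.3, 0.0, 0.0), cost_map))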
5.5 Learning and GP

We use the GP implementation from [45]. The approach was first tested on flat terrain, which consisted of flat sand
and large rock obstacles to avoid. This was intended as a preliminary trial of the
approach before extending training and testing to `rough terrain'. Rough terrain tests used rocks of different sizes, a subset of which were small enough to not be considered obstacles.
5.5.1 Training
Training data were obtained through experimental runs of the rover executing each
action a from A multiple times, while logging discretised sets of {α, φ, θ, time, s, a}
continuously. The training data collected comprise the rover's attitude angles, pro-
vided by the on-board IMU, and the deviation from the expected motion when exe-
cuting a given action, measured using the localisation system. Due to the left-right
symmetry of the platform, training was only required on 6 of the 10 actions from
A (Fig. 6.1 shows which set of training data can be combined between symmetric
actions).
Due to slow localisation updates, ∆s could only be measured at the end of each action
primitive. However, multiple values of {φ, θ, α} were recorded during the execution
of an action primitive, resulting in a vector of these values {φ, θ, α}, representing the
terrain profile.
Flat terrain: the flat terrain was treated as uniform; therefore, motion errors were learnt with respect to action only. As both the input and output were uni-dimensional, basic means and variances of the motion errors were computed for each action.
Rough terrain: terrain profiles were encoded more compactly by extracting features of the evolution of {φ, θ, α}, to limit training data dimensionality and thus avoid overfitting. Combinations of terrain profile features were tested with GP regression using cross-validation. The features (vector of functions λ) which produced the lowest root mean square error between the GP mean estimates and the test data were
\lambda \triangleq \{\lambda_1, \lambda_2, \lambda_3, \lambda_4\}, \qquad (5.5.1)
where λ1 and λ2 are the maximum increase and decrease of φ during an action primitive,
and λ3 and λ4 are the maximum increase and decrease of θ during an action primitive.
α did not improve predictive results of the GP and was therefore not included.
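The precise definitions in Eq. 5.5.2-5.5.5 are not reproduced in this excerpt. The sketch below shows one plausible reading, assuming the increase and decrease of each angle are measured relative to its value at the start of the primitive:

import numpy as np

def terrain_profile_features(roll, pitch):
    """Feature vector lambda for one action primitive: the maximum increase
    and maximum decrease of roll (phi) and pitch (theta) over the samples
    recorded while the primitive was executed."""
    roll, pitch = np.asarray(roll), np.asarray(pitch)
    return np.array([
        roll.max() - roll[0],     # lambda_1: max increase of phi
        roll[0] - roll.min(),     # lambda_2: max decrease of phi
        pitch.max() - pitch[0],   # lambda_3: max increase of theta
        pitch[0] - pitch.min(),   # lambda_4: max decrease of theta
    ])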
5.5.2 Prediction
When planning over previously untraversed states, φ and θ are estimated using the
kinematics model from a state-action pair: φ̂(s, a) and θ̂(s, a). The λ functions are
then applied to these estimates, which are input to the GP. Thus, the expression
P(∆s | λ(s, a), a) used during planning is evaluated on estimated, rather than measured,
terrain profiles. φ̂(s, a) and θ̂(s, a) are computed using the expected sequence of states s traversed
by (s, a), taken as the linear interpolation of states between the start state s and the
expected resultant state E(s'|s, a). For each interval state si in s, the kinematics
model provides an estimate of the rover's roll and pitch.
5.5.3 Outputs
The components ∆si of ∆s (see Sec. 4.2) were defined according to radial coordinates,
as per the control space of crabbing and rotations, as opposed to the Cartesian
representation of the state space {x, y, ψ} itself. The ∆si components we consider are
the heading error ∆shead, the distance error ∆sdist and the yaw error ∆syaw.
5.6 Discussion
Uncertainty of the resultant state has several causes, including imperfect actuation, un-
requirements to climb different rocks are factors which are not modelled explicitly,
but correlate with the terrain profile features learned, and are thus implicitly captured
in the training data. Training and testing were restricted to non-sloping terrains to
limit the complexity of the scenarios. Whilst this does not encompass all scenarios
In the rest of the thesis, we define control errors (δs), similarly to slip [3], as the
difference between the observed change in state (∆s) and the ideal change in state (∆s̃),
that is, δs = ∆s − ∆s̃.
Experimental Results
We evaluated our approach both in simulation and through experiments with a plan-
etary rover robot. Our experimental methodology consisted of two phases. First, we
learned statistics of control error through empirical trials, described in Sec. 6.1. Then,
we performed navigation experiments using these learned data to build the motion
policies. Sec. 6.2 presents simulation results, and Secs. 6.3 and 6.4 describe experiments using the robot.
Training was conducted for two cases: flat-terrain traversal and rough-terrain traversal.
For the flat-terrain case, control errors were learned by executing multiple runs for
each action in A (one at a time).
Statistics of these error terms in the training data (before GP estimation), for
δshead and δsyaw, are shown in Table 6.1. In particular, this table indicates the
mean and standard deviation (std.) of the error for each action.
per symmetric action), recording IMU values and localisation data. The heading
errors (δshead) and yaw errors (δsyaw) learned for each of the six symmetric actions
are shown in Fig. 6.1. Mean and variance were computed directly for flat-terrain
learning, rather than using GPs, because the input data (the action executed) was one-
dimensional. Although the terrain was mostly flat, the variance in motion primitive
error is significant, which validates the need to take uncertainty into account in the
motion planning process; each error distribution was approximated as Gaussian.
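A minimal sketch of this flat-terrain learning, assuming a log of (action, error) pairs (the names are illustrative):

import numpy as np
from collections import defaultdict

def per_action_error_stats(log):
    """Mean and standard deviation of the recorded control error for each
    action; no GP is needed because the only input is the action itself."""
    errors = defaultdict(list)
    for action, err in log:
        errors[action].append(err)
    return {a: (np.mean(e), np.std(e, ddof=1)) for a, e in errors.items()}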
Figure 6.1 Mobility prediction by action on flat terrain. Histograms present heading
errors recorded during crabbing and yaw errors recorded during rotations. (Panels
include Crab(0.3 m, ±3π/4), Crab(0.3 m, π) and Rotate(±π/4).)
During rough-terrain traversal, various rocks were traversed by each wheel of the
rover. Note that in some cases some rocks shifted under the weight of the rover,
slightly sinking into the sand or rolling over. These types of situations, which are ex-
tremely difficult to predict by modelling, were therefore captured in our learning data.
As expected, Table 6.1 and Fig. 6.2 show that the control errors were more significant in
the rough-terrain case.
Figure 6.2 Mobility prediction by action on rough terrain. Histograms present heading
errors recorded during crabbing and yaw errors recorded during rotations. (Panels
include Crab(0.3 m, ±3π/4), Crab(0.3 m, π) and Rotate(±π/4).)
Table 6.2 shows the GP hyperparameters obtained. A visual example of the terrain
profile features is shown in Fig. 6.3 (one per symmetric action). The time to compute the terrain features of all 296 motion
primitives and to train all 18 GPs (Eq. 4.2.3) was 40 seconds. This was done offline on
6.2 Simulation of Flat Terrain Traversal
We simulated the robot traversing flat terrain. A simulation was conducted to test
the planner more times than was feasible on the real terrain. Control uncertainty
was simulated using the learned data described in Sec. 6.1. The robot's environment
Figure 6.3 Example of terrain profile features. A set of features is computed
for an action primitive during training. Green: evolution of rover pitch during an
action primitive. Blue: evolution of rover roll. Black: lines indicating the maximum
increase and decrease of both roll and pitch during execution of the action primitive,
according to Eq. 5.5.2-5.5.5.
Figure 6.4 (a) shows a few samples of simulated trajectories navigated around a
cluster of rocks (shown in (b)). The cost on the map is shown as levels of grey, with
white indicating the highest cost, for a single orientation value: the rover facing left.
The brown rectangle on the left indicates the common goal region. Red trajectories
were computed without considering uncertainty, while green trajectories considered
uncertainty.
was simulated using a point cloud acquired by the robot's RGB-D camera. This
environment is a roughly flat area with a cluster of rocks, shown in Fig. 6.4(b). Trials
consisted of placing the robot randomly around the cluster of rocks and directing it to a
unique goal region opposite the rock cluster. We used a cost function where any rock
Planning without uncertainty (where only the expectation E(P(∆s|λ(s, a), a)) learned
is used instead of the full distribution) and planning considering uncertainty were each
tested 100 times. When planning considering uncertainty, the flat-terrain learning data
were used, where ∆shead was considered during crab actions and ∆syaw was considered
during rotate actions. Resultant trajectories were assessed in light of the known ∆shead
and ∆syaw distributions to determine the probability of collision for each trajectory.
Results obtained for both methods can be compared using the statistics on all executed
trajectories in Table 6.3. These statistics represent: the average total cost accumulated
from start position to goal (cost_total), the probability of hitting an obstacle summed
over the entire trajectory (P_collision), and the minimum
cost_total is the sum of rewards (Eq. 5.4.3) between the sequence of states, each recorded
at the completion of an action, from start state s0 to final state sG, where G represents
the number of actions executed to reach the goal:

\mathrm{cost_{total}} = \sum_{i=0}^{G-1} R(s_{i+1}, s_i, a_i), \qquad (6.2.1)
where ai is the action executed from state si. P_collision is computed using the learned
transition distributions:

P_{\mathrm{collision}} = 1 - \prod_{i=0}^{G-1} \sum_{s'} P(s', s_i, a_i)\,\mathrm{isreal}\!\left(R(s', s_i, a_i)\right), \qquad (6.2.2)
where the isreal() function returns false if any state sampled by the reward function
lies on an obstacle (i.e. its cost is undefined).
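As a rough sketch of how these two trajectory metrics might be evaluated (Eq. 6.2.1 and 6.2.2), assuming access to the reward function, a transition_fn(s, a) that yields (next_state, probability) pairs from the learned model, and using a finiteness check in place of isreal():

import math

def trajectory_cost(states, actions, reward_fn):
    """Cost accumulated along an executed trajectory (Eq. 6.2.1), written as
    the sum of rewards between consecutive recorded states; each reward is
    the negative of a cost (Eq. 5.4.3)."""
    return sum(reward_fn(states[i + 1], states[i], actions[i])
               for i in range(len(actions)))

def collision_probability(states, actions, transition_fn, reward_fn):
    """Probability of hitting an obstacle at least once (Eq. 6.2.2): one
    minus the product, over executed actions, of the probability mass of
    resultant states whose sampled reward is a real (finite) number."""
    p_safe_total = 1.0
    for s, a in zip(states, actions):
        p_safe = sum(p for s_prime, p in transition_fn(s, a)
                     if math.isfinite(reward_fn(s_prime, s, a)))
        p_safe_total *= p_safe
    return 1.0 - p_safe_total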
Fig. 6.4(a) shows a few examples of trajectories executed. Holes present in the cost
map are considered obstacles: they are either states of impossible configuration,
where the kinematics model estimates the rover would not have all six wheels on the
terrain at that position, or occlusions due to shadow areas behind rocks that the
camera could not observe. By convention, these obstacles are not considered valid
states by the motion planner, and it does not consider actions that may result in an
invalid state.
Results in Table 6.3 highlight two of the major consequences of planning without
uncertainty: a platform will be more likely to collide with an obstacle and will, on
average, accumulate more cost in traversing to the goal region. In fact, when planning
when the robot deviates at least once from the lowest-cost path it computes, which
is frequently the case with imperfect control. Thus planning without uncertainty
cannot provide any guarantee of the total cost accumulated to reach a goal region, which
visiting up to a certain cost of traversing there. The safety issue of rock collisions
occurs when the planner follows paths very close to an obstacle without considering
the possibility of deviating from the planned path.
6.3 Experiments of Flat Terrain Traversal
We conducted a series of experiments using the physical robot traversing flat terrain.
The robot navigated from a starting position to a goal region whilst avoiding large
rocks (Fig. 6.5 shows the rover in the goal position). The terrain was flat sand
and gravel; however, due to the imperfect control on the loose terrain, the control
uncertainties were significant, as shown earlier in the training data in Sec. 6.1. We
used the same cost function as for the simulation (Eq. 5.3.2).
Trajectories were planned and executed on the robot 10 times for planning with uncer-
tainty (including heading during crabbing actions and yaw during rotation actions).
For comparison, 10 trajectories were also planned and executed without accounting
for uncertainty. In both cases, two different starting points were used, while the goal
region remained the same.
Fig. 6.6 shows a few examples of the actual trajectories executed by the rover. Occlu-
sions shown in the cost map are due to the shadows of rocks when they were observed
by the robot.
The results of planning with uncertainty of heading during crabbing actions and yaw
Figure 6.5 Flat traversal experiment, planning around a collection of rocks too large
to climb over safely.
Figure 6.6 Example of trajectories taken to avoid several rocks on otherwise flat
terrain. Red: without uncertainty considered. Green: with uncertainty in heading.
The starting position is indicated by the brown circle on the left and the goal region
is shown as the brown rectangle at the bottom right. Planning with uncertainty
generates paths that are less sensitive to control error.
during rotation actions are compared statistically against results of planning without
uncertainty in Table 6.4. This table shows statistics of the accumulated cost along
the executed paths. Results indicate that the robot would, on average, plan wider
berths around rocks with uncertainty considered and execute safer paths in practice.
A point of interest is the number of collisions with a rock (considered in this test as
untraversable): 1 collision still occurred when using planning with uncertainty, against
assessment of each trajectory (Eq. 6.2.2) showed that in this case, Mawson was quite
unlucky to collide with the rock, as it had computed only a 16% chance of collision.
This assessment also revealed that the rover was quite lucky not to collide with
more than 2 rocks when uncertainty was not considered, with an average collision
6.4 Experiments on Unstructured Terrain: Traversing Rocks
We also conducted a series of experiments where the robot traversed rough terrain.
Trained GP models were used to predict control uncertainty as described in Sec. 6.1.
As in the training phase, the experiments were conducted in unstructured and rough
terrain because of the presence of rocks that the robot sometimes has to travel across.
The learned rough-terrain models were limited to traversal of one rock at a time, and
thus two rock fields (Fig. 6.7) were set up accordingly (labelled `rock field A' and
`rock field B'). However, the layout of rocks was dense enough to cause the robot to
In addition to ∆syaw during rotation actions, two types of uncertainty were considered
(separately) with crab actions: ∆shead and ∆sdist. Fig. 6.8 and Fig. 6.9 show example
motion policies.
the Mars Yard. This was the InterSense IS-1200 inertial-optical tracking system [51].
This system uses a camera on the rover to compute rover position and orientation
with respect to fiducial markers placed in the Mars Yard. Accuracy of the system
was superior to the previous system mentioned in Sec. 5.2.1, with the additional
benefit that, as a global positioning system, its localisation error did not increase
with the distance travelled by the rover. This new localisation system neces-
increase torque applied to the rover's wheels for greater ability to climb larger rocks
in rock field B. Results of re-learning the control error in rough terrain are shown
in Table 6.5, marginalised by action. This learning was used for rock field B only.
This table additionally compares the control errors found during the learning phase with
those recorded during the planning/test phase traversing rock field B.1 The observed
Resultant trajectories in Fig. 6.10 show that whilst each planning method attempted
to navigate between rocks, the method of planning without uncertainty (red) would
tend to zig-zag more severely in attempts to attain the least possible cost path, finely
tended to hold the course more. This is because the zig-zag action sequences can
actually result in more rock traversals in total, due to a greater distance travelled in
1 Due to a bug in logging, values for two particular actions during planning were not recorded,
marked by an `n/a'.
Figure 6.7 Rock traversal experiment setup: Mawson is shown at its starting position.
It must traverse over a rock field to reach a goal region. (a) Rock field A; the goal
region is to the right (out of the picture).
Figure 6.8 Example policy obtained for rough terrain experiment with ∆shead con-
sidered, projected onto the x−y plane. The goal is the empty square in the upper
right corner of the gure. Arrows indicate the preferred crab actions at each state.
Dots indicate rotations.
Figure 6.9 Motion policies over rough terrain (rock field B). Shown for zero yaw
(rover facing the positive x direction), down-sampled 16-fold. The goal is the orange
square in the bottom right corner.
Figure 6.10 Examples of trajectories during the rock traversal experiments, compar-
ing planning when considering different uncertainty sources. Shown are one or two
example trajectories from each of the following motion planning methods trialled:
Red: without uncertainty considered. Green: with uncertainty in heading. Cyan:
with uncertainty in distance. Cost map shown in grayscale, from dark to light as
cost increases. The starting position is indicated by the brown disk on the left and
the goal region is shown as the brown rectangle on the right. The planner generally
attempts to navigate between most of the rocks, which are high cost.
Table 6.5 Second learning results for rough terrain traversal, used for rock field B.
This also compares control errors encountered during learning and testing.
Results shown in Table 6.6 indicate that the policies chosen when planning with
uncertainty were favourable in this case, with less accumulated cost. In these tests,
considering heading uncertainty was most significant to overall cost. This was fol-
lowed by considering uncertainty in distance travelled, while planning without uncertainty
performed the poorest. In this table, `stuck states' refers to situations where the rover
dug one of its wheels in next to a rock during the test and could not initially mount
the rock as a result. This occurred 20%-30% of the time when no uncertainty
was considered. When considering uncertainty in distance travelled, stuck states still
occurred. However, planning with heading uncertainty again made more impact, with
the fewest stuck states. During repeated motion planning trials on rock field
B, the accuracy of the computed elevation map deteriorated as the rover altered the
terrain slightly each time, including the exact locations of each rock. This resulted in a
slight deterioration of each motion planning method. For fairness of comparison, the
This trend in the data supports the claim that our approach of considering control
uncertainty can significantly improve the safety of the platform during the execution of
planned trajectories.
Table 6.6 Planning with traversal over rock field A and rock field B
dently. Considering them in combination should then have an even stronger impact
on platform safety.
Conclusion and Future Work
This chapter summarises the conclusions of this thesis in Sec. 7.1. Future work is discussed
in Sec. 7.2.
7.1 Conclusions
Since the motion of any real mobile robot is stochastic to some degree, considering
control uncertainty at the planning stage enables us to significantly enhance the safety
of the platform (e.g. mitigating the chances of collision) and lower the cost accumulated
along the executed path.
A model of uncertainty was built using learned data and Gaussian processes to predict
motions over flat and unstructured terrain in a Mars-analogue environment. The
trained model was then exploited to plan policies using dynamic programming, and to
execute paths following the planned policies. Experimental validation was achieved
both in simulation and in real experiments, in a variety of situations, taking into
account uncertainties in heading and distance travelled. Our experiments compared
planning with and without uncertainty, and demonstrated
the improvement in safety and accumulated cost when accounting for uncertainty.
Important areas of future work include the consideration of more complex terrain
with larger slopes and denser collections of small rocks. Rough terrain traversals in
this thesis were limited to traversing one rock at a time; however, complex terrain
can necessitate learning for multiple wheels traversing different rocks simultaneously.
It is also important to study the ability of our approach to learn and predict the
deviations of control actions due to loose rocks that shift during traversal.
the training data. Benefits include the ability of the rover to adapt to new types of
the exact robot-terrain interaction found in a Mars environment will not be known a
priori.
time updates of DP transition functions as they are learned. This allows learning
computed online to update a motion policy in real time, rather than taking effect on
Bibliography
[4] S. Balakirsky and A. Lacaze. World modeling and behavior generation for
autonomous ground vehicle. In IEEE International Conference on Robotics and
Automation, volume 2, pages 1201-1206, 2000.
[5] R. Balakrishna and A. Ghosal. Modeling of slip for wheeled mobile robots.
IEEE Transactions on Robotics and Automation, pages 126-132, Feb. 1995.
[7] J. van den Berg, P. Abbeel, and K. Goldberg. LQG-MP: Optimized path planning for
robots with motion uncertainty and imperfect state information. In Robotics:
Science and Systems, 2010.
[10] A. Bry and N. Roy. Rapidly-exploring random belief trees for motion planning
under uncertainty. In IEEE International Conference on Robotics and
Automation, 2011.
[15] M. Greytak and F. Hover. Analytic error variance predictions for planar
vehicles. In IEEE International Conference on Robotics and Automation, pages
471-476, May 2009.
[33] J.-C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Norwell,
MA, USA, 1991.
[34] S. M. LaValle. Rapidly-exploring random trees: A new tool for path planning.
Technical report, Oct. 1998.
[37] P.C. Leger, A. Trebi-Ollennu, J.R. Wright, S.A. Maxwell, R.G. Bonitz, J.J.
Biesiadecki, F.R. Hartman, B.K. Cooper, E.T. Baumgartner, and M.W.
Maimone. Mars exploration rover surface operations: Driving Spirit at Gusev
crater. In IEEE International Conference on Systems, Man and Cybernetics,
volume 2, pages 1815-1822, Oct. 2005.
[44] S. Prentice and N. Roy. The belief roadmap: Efficient planning in belief space
by factoring the covariance. The International Journal of Robotics Research,
2009.
A.1 Pseudocode for Motion Planning Algorithm
This section lists pseudocode for the motion planning algorithm implemented on the
test platform, to plan a motion policy over an elevation map to a goal region.
// 1. build states
// 2. link states
startState = localise()
queue.push(startState)
while queue ≠ ∅ do
state = queue.pop()
for ∀ action ∈ Actions do
st = stochasticTransition(state, action, elevationMap)
if st can go out of bounds then
continue // do not consider
end if
state.forwardTransitions.push(st) // parent records child
queue.push(st.resultantStates) // enqueue the possible resultant states (not yet expanded)
end for
end while
// 3. DP - build policy
// 4. execute policy
while currentState ∉ GoalStates do
for ∀ st ∈ currentState.f orwardT ransitions do
if possible that st can go out of bounds then
continue // do not consider
end if
value = bellmanValue(currentState, st.action)
if value is highest seen so far for currentState then
optimalAction = st.action
end if
end for
controls.execute(optimalAction)
currentState = localise()
end while
// Function Definitions
bellmanValue(state, action):
expectedReward = 0
for ∀ (s', p) ∈ stochasticTransition(state, action, elevationMap).resultantStates do
expectedReward += p · (R(s', state, action) + γ · V(s'))
end for
return expectedReward
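The dynamic-programming step (// 3. DP - build policy) is not expanded in the listing above. The following Python sketch shows a standard value-iteration backup over the pre-linked state graph; the attribute names (is_goal, forward_transitions) are placeholders rather than the implementation on the test platform:

def build_policy(states, gamma=0.95, tol=1e-4, max_iters=1000):
    """Illustrative value iteration. Each state is assumed to expose:
      - is_goal: True inside the goal region
      - forward_transitions: list of (action, outcomes) pairs, where
        outcomes is a list of (next_state, probability, reward) tuples."""
    V = {s: 0.0 for s in states}
    policy = {}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            if s.is_goal or not s.forward_transitions:
                continue
            best_value, best_action = float("-inf"), None
            for action, outcomes in s.forward_transitions:
                q = sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
                if q > best_value:
                    best_value, best_action = q, action
            delta = max(delta, abs(best_value - V[s]))
            V[s], policy[s] = best_value, best_action
        if delta < tol:
            break
    return policy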
A.2 Pseudocode for Kinematics Model Algorithm
This section lists pseudocode for the kinematics model algorithm used to model the
Rocker-bogie test platform on an elevation map. Some variable definitions are found
in Fig. 5.2.
[h1, h2, h3, h4, h5, h6] = getWheelElevations(wheelPositions, elevationMap)
while max(|wheelElevations|) > a small positive tolerance do
// drop simulated robot onto the terrain
pose.alpha3 += asin((h2 - h3)/L2)
pose.alpha3 += asin((h5 - h6)/L2) // corresponding update for the second bogie (wheels 5 and 6)
if Any value in pose is N aN then
mark state as obstacle
break
end if
end while
return pose
// Function Definitions
reconfigureNodes(pose):
// add root nodes that never change their position with respect to pose.position
queue.add(nodes.roots)
while queue ≠ ∅ do
node = queue.pop()
end if
// find location of child nodes
end for
end while
wheelElevations = getElevationOfEachWheel(nodes.wheelPositions, elevationMap)
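To make the geometry explicit, here is a small hedged sketch of the wheel-elevation lookup and the bogie-angle update used in the loop above. The grid indexing and the treatment of impossible configurations are assumptions, and L2 follows Fig. 5.2, which is not reproduced here:

import math

def wheel_elevations(wheel_xy, elevation_map, cell_size):
    """Terrain height under each wheel contact point, using a simple
    nearest-cell lookup on a grid elevation map (interpolation omitted)."""
    return [elevation_map[int(round(y / cell_size))][int(round(x / cell_size))]
            for x, y in wheel_xy]

def bogie_angle(h_front, h_rear, bogie_length):
    """Angle a bogie must rotate for both of its wheels to touch the terrain,
    from the elevation difference between its two wheels. Returns NaN when
    the configuration is geometrically impossible, which the planner treats
    as an obstacle state."""
    ratio = (h_front - h_rear) / bogie_length
    if abs(ratio) > 1.0:
        return float("nan")
    return math.asin(ratio)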