Behavior Trees in Robotics and AI
An Introduction
arXiv:1709.00084v4 [cs.RO] 3 Jun 2020
Contents

3 Design principles
3.1 Improving Readability using Explicit Success Conditions
3.2 Improving Reactivity using Implicit Sequences
3.3 Handling Different Cases using a Decision Tree Structure
3.4 Improving Safety using Sequences
3.5 Creating Deliberative BTs using Backchaining
3.6 Creating Un-Reactive BTs using Memory Nodes
3.7 Choosing the Proper Granularity of a BT
3.8 Putting it all together
Jonathan Ross
Head of Jibo SDK

“There are a lot of different ways to create AI’s, and I feel like I’ve tried pretty much all of them at one point or another, but ever since I started using behavior trees, I wouldn’t want to do it any other way. I wish I could go back in time with this information and do some things differently.” 2

Mike Weldon
Disney, Pixar
“[...]. Sure you could build the very same behaviors with a finite state machine (FSM). But anyone who has worked with this kind of technology in industry knows how fragile such logic gets as it grows. A finely tuned hierarchical FSM before a game ships is often a temperamental work of art not to be messed with!” 3

Alex J. Champandard
Editor in Chief & Founder AiGameDev.com,
Senior AI Programmer Rockstar Games
Daniel Broder
Unreal Engine developer

“The main advantage [of Behavior Trees] is that individual behaviors can easily be reused in the context of another higher-level behavior, without needing to specify how they relate to subsequent behaviors” [2].

Andrew Bagnell et al.
Carnegie Mellon University
1 https://developers.jibo.com/blog/the-jibo-sdk-reaching-out-beyond-the-screen
2 http://www.gamasutra.com/blogs/ChrisSimpson/20140717/221339/Behavior_trees_for_AI_How_they_work.php
3 http://aigamedev.com/open/article/fsm-age-is-over/
4 https://forums.unrealengine.com/showthread.php?6016-Behavior-Trees-What-and-Why
Chapter 1
What are Behavior Trees?
A Behavior Tree (BT) is a way to structure the switching between different tasks1
in an autonomous agent, such as a robot or a virtual entity in a computer game. An
example of a BT performing a pick and place task can be seen in Fig. 1.1a. As will be explained, BTs are a very efficient way of creating complex systems that are both modular and reactive. These properties are crucial in many applications, which has led to the spread of BTs from computer game programming to many branches of AI and Robotics.
In this book, we will first give an introduction to BTs, in the present chapter. Then, in Chapter 2 we describe how BTs relate to, and in many cases generalize, earlier switching structures, or control architectures as they are often called. These ideas are then used as a foundation for a set of efficient and easy to use design principles described in Chapter 3. Then, in Chapter 4 we describe a set of important extensions to BTs. Properties such as safety, robustness, and efficiency are important for an autonomous system, and in Chapter 5 we describe a set of tools for formally analyzing these using a state space formulation of BTs. With the new analysis tools, we can formalize the descriptions of how BTs generalize earlier approaches in Chapter 6. Then, we see how BTs can be automatically generated using planning, in Chapter 7, and learning, in Chapter 8. Finally, we describe an extended set of tools to capture the behavior of Stochastic BTs, where the outcomes of actions are described by probabilities, in Chapter 9. These tools enable the computation of both success probabilities and time to completion.
In this chapter, we will first tell a brief history of BTs in Section 1.1, and explain the core benefits of BTs in Section 1.2; then in Section 1.3 we will describe how a BT works. Then, we will create a simple BT for the computer game Pac-Man in Section 1.4 and a more sophisticated BT for a mobile manipulator in Section 1.5. We finally describe the usage of BTs in a number of applications in Section 1.6.

1 assuming that an activity can somehow be broken down into reusable sub-activities called tasks, sometimes also denoted actions or control modes
(a) A high level BT carrying out a task consisting of first finding, then picking and finally placing a ball.
(b) The Action Pick Ball from the BT in Fig. 1.1a is expanded into a sub-BT. The Ball is approached until it is considered close, and then the Action grasp is executed until the ball is securely grasped.
Fig. 1.1: Illustrations of a BT carrying out a pick and place task with different degrees of detail. The execution of a BT will be described in Section 1.3.
BTs have also been used to enable non-experts to do robot programming of pick and place operations, due to their “modular, adaptable representation of a robotic task” [27], and have allowed “end-users to visually create programs with the same amount of complexity and power as traditionally-written programs” [56]. Furthermore, BTs have been proposed as a key component in brain surgery robotics due to their “flexibility, reusability, and simple syntax” [30].

In the next section we will describe how BTs work in detail, so these figures are just meant to give a first glimpse of BTs, rather than the whole picture.
A behavior is often composed of a sequence of sub-behaviors that are task independent, meaning that while creating one sub-behavior the designer does not need to know which sub-behavior will be performed next. Sub-behaviors can be designed recursively, adding more details as in Figure 1.1b. BTs are executed in a particular way, which will be described in the following section, that allows the behavior to be carried out reactively. For example, the BT in Figure 1.1 executes the sub-behavior Place Ball, but also verifies that the ball is still at a known location and securely grasped. If, due to an external event, the ball slips out of the grasp, then the robot will abort the sub-behavior Place Ball and will re-execute the sub-behavior Pick Ball or Find Ball according to the current situation.
The Fallback node2 executes Algorithm 2, which corresponds to routing the ticks to its children from the left until it finds a child that returns either Success or Running; then it returns Success or Running accordingly to its own parent. It returns Failure if and only if all its children return Failure. Note that when a child returns Running or Success, the Fallback node does not route the ticks to the next child (if any). The symbol of the Fallback node is a box containing the label “?”, shown in Figure 1.3.

2 Fallback nodes are sometimes also called selector or priority selector nodes.
The Parallel node executes Algorithm 3, which corresponds to routing the ticks to all its children; it returns Success if M children return Success, it returns Failure if N − M + 1 children return Failure, and it returns Running otherwise, where N is the number of children and M ≤ N is a user defined threshold. The symbol of the Parallel node is a box containing the label “⇒”, shown in Figure 1.4.
When it receives ticks, an Action node executes a command. It returns Success if
the action is correctly completed or Failure if the action has failed. While the action
is ongoing it returns Running. An Action node is shown in Figure 1.5a.
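The tick routing described above can be made concrete in a few lines of code. The following minimal Python sketch implements the semantics of Algorithms 1-3; the class names and string-valued statuses are illustrative choices for this book's discussion, not the API of any particular BT library:

# A minimal sketch of the classical tick semantics (Algorithms 1-3).
SUCCESS, FAILURE, RUNNING = "Success", "Failure", "Running"

class Sequence:
    """Ticks children from the left; returns on the first child that
    does not return Success (Algorithm 1)."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != SUCCESS:      # Running or Failure stops the scan
                return status
        return SUCCESS                 # all children succeeded

class Fallback:
    """Ticks children from the left; returns on the first child that
    does not return Failure (Algorithm 2)."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != FAILURE:      # Running or Success stops the scan
                return status
        return FAILURE                 # all children failed

class Parallel:
    """Ticks all children; succeeds if at least M succeed, fails if
    N - M + 1 fail, and returns Running otherwise (Algorithm 3)."""
    def __init__(self, children, m):
        self.children, self.m = children, m
    def tick(self):
        statuses = [child.tick() for child in self.children]
        if statuses.count(SUCCESS) >= self.m:
            return SUCCESS
        if statuses.count(FAILURE) >= len(self.children) - self.m + 1:
            return FAILURE
        return RUNNING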
(a) Action node. The label describes the action performed. (b) Condition node. The label describes the condition verified. (c) Decorator node. The label describes the user defined policy.
Fig. 1.5: Graphical representation of Action (a), Condition (b), and Decorator (c) nodes.
The Decorator node selectively ticks the child according to some predefined rule. For example, an invert decorator inverts the Success/Failure status of the child; a max-N-tries decorator only lets its child fail N times, then always returns Failure without ticking the child; a max-T-sec decorator lets the child run for T seconds, then, if the child is still Running, the Decorator returns Failure without ticking the child. The symbol of the Decorator is a rhombus, as in Figure 1.5c.
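Continuing the same sketch, the invert and max-N-tries Decorators described above could look as follows; the failure counter is an assumed implementation detail of this illustration:

class Inverter:
    """Decorator that swaps the Success/Failure status of its child."""
    def __init__(self, child):
        self.child = child
    def tick(self):
        status = self.child.tick()
        if status == SUCCESS:
            return FAILURE
        if status == FAILURE:
            return SUCCESS
        return RUNNING                 # Running passes through unchanged

class MaxNTries:
    """Decorator that lets its child fail at most n times, then always
    returns Failure without ticking the child."""
    def __init__(self, child, n):
        self.child, self.n, self.failures = child, n, 0
    def tick(self):
        if self.failures >= self.n:
            return FAILURE             # budget exhausted: child not ticked
        status = self.child.tick()
        if status == FAILURE:
            self.failures += 1
        return status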
Fig. 1.7: Visualization of the ticks’ traversal in the different situations, as explained in Section 1.3.1. The tree combines Fallbacks and Sequences over the actions Find Ball, Approach Ball, Grasp Ball, Approach Bin, Place Ball and Ask for Help, with the conditions Ball Found, Ball Close, Ball Grasped, Bin Close and Ball Placed. Subfigure (d) shows the ticks’ traversal while the robot is approaching the ball again (because it was removed from the hand).
Fig. 1.8: (a) Sequence composition with memory (denoted →*). (b) BT that emulates the execution of the Sequence composition with memory using nodes without memory.
Remark 1.1. Some BT implementations, such as the one described in [43], do not include the Running return status. Instead, they let each Action run until it returns Failure or Success. We denote these BTs as non-reactive, since they do not allow actions other than the currently active one to react to changes. This is a significant limitation of non-reactive BTs, which was also noted in [43]. A non-reactive BT can be seen as a BT with only memory nodes. As reactivity is one of the key strengths of BTs, non-reactive BTs are of limited use.
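To make the semantics of memory nodes concrete, here is a minimal sketch of a Sequence with memory (the →* node shown above), reusing the status constants of the earlier sketch; the reset policy is an assumption of this illustration:

class SequenceWithMemory:
    """Sequence with memory (→*): remembers which children already
    succeeded and does not re-tick them, so earlier conditions are not
    re-checked; this is exactly what makes the composition non-reactive."""
    def __init__(self, children):
        self.children, self.index = children, 0
    def tick(self):
        while self.index < len(self.children):
            status = self.children[self.index].tick()
            if status == RUNNING:
                return RUNNING
            if status == FAILURE:
                self.index = 0          # reset memory on failure
                return FAILURE
            self.index += 1             # skip this child from now on
        self.index = 0                  # reset memory on success
        return SUCCESS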
Fig. 1.9: The game Pac-Man, for which we will design a BT. There exist maps of different complexity.
The simplest behavior is to let Pac-Man ignore the ghosts and just focus on eating pills. This is done using a single greedy action, Eat Pills, as in Figure 1.10.

Fig. 1.10: BT for the simplest non-random behavior, Eat Pills, which maximizes the number of pills eaten in the next time step.

3 https://btirai.github.io/
4 The software was developed at UC Berkeley for educational purposes. More information available at: http://ai.berkeley.edu/project_overview.html
The simple behavior described above ignores the ghosts. To take them into account, we can extend the previous behavior by adding an Avoid Ghosts Action to be executed whenever the condition Ghost Close is true. This Action will greedily maximize the distance to all ghosts. The new Action and condition can be added to the BT as depicted in Fig. 1.11. The resulting BT will switch between Eat Pills and Avoid Ghost depending on whether Ghost Close returns Success or Failure.
Fig. 1.11: If a Ghost is Close, the BT will execute the Action Avoid Ghost, else it will run Eat Pills.
The next extension we make is to take the power pills into account. When Pac-Man eats a Power pill, the ghosts are edible, and we would like to chase them, instead of avoiding them. To do this we add the condition Ghost Scared and the Action Chase Ghost to the BT, as shown in Fig. 1.12. Chase Ghost greedily minimizes the distance to the closest edible ghost. Note that we only start chasing the ghost if it is close, otherwise we continue eating pills. Note also that all extensions are modular, without the need to rewire the previous BT.
Fig. 1.12: The BT of Fig. 1.11 extended with the condition Ghost Scared and the Action Chase Ghost.
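As an illustration, the BT of Fig. 1.12 can be assembled from the sketch classes of Section 1.3; ghost_close, ghost_scared, eat_pills, avoid_ghost, and chase_ghost are assumed stand-ins for the game-state predicates and greedy actions, not part of any actual Pac-Man code:

class Leaf:
    """A condition or action leaf wrapping a function that returns a status."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return self.fn()

def cond(name, predicate):
    # Conditions return Success/Failure only, never Running.
    return Leaf(name, lambda: SUCCESS if predicate() else FAILURE)

pacman_bt = Fallback([
    Sequence([
        cond("Ghost Close", ghost_close),
        Fallback([
            Sequence([cond("Ghost Scared", ghost_scared),
                      Leaf("Chase Ghost", chase_ghost)]),
            Leaf("Avoid Ghost", avoid_ghost),
        ]),
    ]),
    Leaf("Eat Pills", eat_pills),
])

# One tick per game step keeps the switching reactive:
# status = pacman_bt.tick()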
With this incremental design, we have created a basic AI for playing Pac-Man, but what if we want to make a world class Pac-Man AI? You could add additional nodes to the BT, such as moving towards the Power pills when being chased, and stopping the chase when the ghosts are blinking and soon will transform into normal ghosts. However, many of the finer details of Pac-Man lie in considerations of the maze geometry, choosing paths that avoid dead ends and possible capture by multiple ghosts. Such spatial analysis is probably best done inside the actions, e.g., making Avoid Ghosts take dead ends and ghost positions into account. The question of what functionality to address in the BT structure, and what to take care of inside the actions, is open, and must be decided on a case-by-case basis, as discussed in Section 3.7.
Fig. 1.13: The Mobile Manipulator for which we will design a BT.
The simplest possible BT is to check the goal condition Green Cube on Goal. If this condition is satisfied (i.e. the cube is on the goal) the task is done; if it is not satisfied, the robot needs to place the cube onto the goal area. To correctly execute the Action Place Cube, two conditions need to hold: the robot is holding the green cube, and the robot is close to the goal area. The behavior described so far can be encoded in the BT in Figure 1.14. This BT is able to place the green cube on the goal area if and only if the robot is close to the goal area with the green cube grasped.

Fig. 1.14: A BT that executes Place Cube when the conditions Holding Green Cube and Close to Goal hold, and returns Success when Green Cube on Goal holds.
Now, thanks to the modularity of BTs, we can separately design the BTs needed to satisfy the two lower conditions in Fig. 1.14, i.e., the BT needed to grasp the green cube and the BT needed to reach the goal area. To grasp the green cube, the robot needs to have the hand free and be close to the cube. If it is not close, it approaches as long as a collision free trajectory exists. This behavior is encoded in the BT in Figure 1.15a. To reach the goal area we let the robot simply Move To the Goal as long as a collision free trajectory exists. This behavior is encoded in the BT in Figure 1.15b.

Now we can extend the simple BT in Fig. 1.14 above by replacing the two lower conditions in Fig. 1.14 with the two BTs in Fig. 1.15. The result can be seen in Fig. 1.16. Using this design, the robot is able to place the green cube in the goal area as long as there exists a collision free trajectory to the green cube and to the goal area.

We can continue to incrementally build the BT in this way to handle more situations, for instance removing obstructing objects to ensure that a collision free trajectory exists, and dropping things in the hand to be able to pick the green cube up.
Fig. 1.15: Illustrations of BTs carrying out the subtasks of picking the green cube and reaching the goal area.
Fig. 1.16: Final BT resulting from the aggregation of the BTs in Figs. 1.14-1.15.
In the iQmatic project, BTs were chosen for their modularity, easing the development of early prototypes, and for their maintainability, making the editing task easier. Figure 1.17 shows two trucks used in the iQmatic project.
CoSTAR [56] is a project that aims at developing a software framework containing tools for industrial applications that involve human cooperation. The use cases include untrained operators composing task plans, and training robots to perform complex behaviors. BTs have found successful application in this project as they simplify the composition of subtasks. The order in which the subtasks are executed is independent from the subtask implementation; this enables easy composition of trees and the iterative composition of larger and larger trees. Figure 1.18 shows one of the robotic platforms of the project.
SARAFun8 is a project that aims at developing a robot-programming framework that enables a non-expert user to program an assembly task from scratch on a robot in less than a day. It takes advantage of state-of-the-art techniques in sensory and cognitive abilities, robot control, and planning.

BTs are used to execute the generic actions learned or planned. For the purposes of this project, the control architecture must be human readable, modular, and enable code reuse. BTs also proved advantageous during the development stage, when the code written by different partners had to be integrated. Figure 1.19 shows an ABB Yumi robot used in the SARAFun testbed.
Rethink Robotics released its software platform Intera in 2017, with BTs at the “heart of the design”. Intera claims to be a “first-of-its-kind software platform that connects everything from a single robot controller, extending the smart, flexible power of Rethink Robotics’ Sawyer to the entire work cell and simplifying automation with unparalleled ease of deployment.”11 It is designed with the goal of creating the world’s fastest-to-deploy robot and fundamentally changing the concepts of integration, making it drastically easier and more affordable.

Intera’s BT defines the Sequence of tasks the robot will perform. The tree can be created manually or trained by demonstration. Users can inspect any portion of the BT and make adjustments. The Intera interface (see Figure 1.20) also includes a simulated robot, so a user can run simulations while the program executes the BT. BTs are appreciated in this context because the train-by-demonstration framework builds a BT that is easily inspectable and modifiable.12
Fig. 1.21: The KTH entry in the Amazon Picking Challenge at ICRA 2015.
11 http://www.rethinkrobotics.com/news-item/rethink-robotics-releases-intera-5-new-approach-automation/
12 http://twimage.net/rodney-brooks-743452002
13 https://github.com/amazon-picking-challenge
14 http://time.com/5023212/best-inventions-of-2017/
15 https://developers.jibo.com/docs/behavior-trees.html
Fig. 1.22: The JIBO social robot has an SDK based on BTs.
Chapter 2
How Behavior Trees Generalize and Relate to Earlier Ideas
In this chapter, we describe how BTs relate to, and often generalize, a number of well known control architectures, including FSMs (Section 2.1), the Subsumption Architecture (Section 2.3), the Teleo-Reactive approach (Section 2.4) and Decision Trees (Section 2.5). We also present advantages and disadvantages of each approach. Finally, we list a set of advantages and disadvantages of BTs (Section 2.6). Some of the results of this chapter were previously published in the journal paper [13].
Fig. 2.1: Graphical representation of an FSM designed to carry out a simple grab-and-throw task. The initial state has a thicker border, and event names are given next to the corresponding transition arrows.
However, the drawbacks of FSMs give rise to problems when the system modelled grows in complexity and number of states, as described briefly in Section 1.2. In particular, we have the following drawbacks:
Fig. 2.2: Example of an HFSM controlling an NPC of a combat game. Patrol, Use Rifle, and Use Handgun are superstates.
Fig. 2.3: An HFSM description of the BT in Figure 2.4. The transition conditions are shown at the end of each arrow to indicate the direction of the transition. Note how the complexity of the transitions within each layer of the HFSM grows with the number of nodes. The condition labels are: C1 = Activity Sit, C2 = Not Know What to Do, C3 = Activity Stand Up, C4 = Activity Sleep, C5 = Activity Ball Game, C6 = Ball Close, C7 = Ball Grasped, C8 = New User Suggestion, C9 = Activity Sit, C10 = Bumper Pressed.
Fig. 2.4: A BT that combines some capabilities of a humanoid robot in an interactive and modular way. Note how atomic actions can easily be replaced by more complex sub-BTs.
Fig. 2.5: An FSM behaving like a BT, made up of a single normal state, three out transitions Success (S), Running (R) and Failure (F), and a Tick source.
We can now compose such FSM states using both Fallback and Sequence constructs. The FSM corresponding to the Fallback example in Figure 2.6 would then look like the one shown in Figure 2.7.
Fig. 2.6: A Fallback is used to create an Enter Building BT. The back door option is only tried if the front door option fails.
Similarly, the FSM corresponding to the Sequence example in Figure 2.8 would then look like the one shown in Figure 2.9, and a two level BT, such as the one in Figure 2.10, would look like Figure 2.11.

A few observations can be made from the above examples. First, it is perfectly possible to design FSMs with a structure taken from BTs. Second, considering that a BT with 2 levels corresponds to the FSM in Figure 2.11, a BT with 5 levels, such as the one in Figure 2.12, would correspond to a somewhat complex FSM. Third, and more importantly, the modularity of the BT construct is illustrated in Figures 2.5-2.11. Figure 2.11 might be complex, but that complexity is encapsulated
Fig. 2.7: An FSM corresponding to the Fallback BT in Figure 2.6. Note how the second state is only executed if the first fails.
Fig. 2.8: A Sequence is used to create an Enter Through Front Door BT. Passing the door is only tried if the opening action succeeds.
Fig. 2.9: An FSM corresponding to the Sequence BT in Figure 2.8. Note how the second state is only executed if the first succeeds.
in a box with a single in-transition and three out-transitions, just as the box in Figure 2.5.
Fig. 2.10: The two BTs in Figures 2.6 and 2.8 are combined into a larger BT. If e.g. the robot opens the front door, but does not manage to pass through it, it will try the back door.
Fig. 2.11: An FSM corresponding to the two level BT in Figure 2.10, built by nesting the encapsulated states of Figure 2.5.
Fig. 2.12: Combining the BTs above and some additional Actions, we get a flexible BT for entering a building and performing some task.
Fourth, as was mentioned in Section 1.2, what to do after a given sub-BT returns is always decided on the parent level of that BT. The sub-BT is ticked, and returns Success, Running or Failure, and the parent level decides whether to tick the next child, or return something to its own parent. Thus, the ticking and returning of a sub-BT is similar to a function call in a piece of source code, just as described in Section 1.2. A function call in Java, C++, or Python moves execution to another piece of the source code, but then returns the execution to the line right below the function call. What to do next is decided by the piece of code that made the function call, not the function itself. As discussed, this is quite different from standard FSMs, where the decision of what to do next is made by the state being transitioned to, in a way that resembles the Goto statement.
Fig. 2.13: Example of a straightforward translation of an FSM to a BT using a global State Variable.
Fig. 2.14: Example of a Subsumption Architecture composed of three controllers. The controller Stop if Overheated subsumes the controller Recharge if Needed, which subsumes the controller Do Other Tasks.
The Teleo-Reactive (TR) approach was designed for agents that had to achieve specific goals while being responsive to changes in the environment. A TR program is composed of a set of prioritized condition-action rules that directs the agent towards a goal state (hence the term teleo) while monitoring the environmental changes (hence the term reactive). In its simplest form, a TR program is described by a list of condition-action rules as the following:
c1 → a1
c2 → a2
···
cm → am
where the ci are conditions and the ai are actions. The condition-action rule list is scanned from the top until it finds a condition that holds, then the corresponding action is executed. In a TR program, actions are usually durative rather than discrete. A durative action is one that continues indefinitely in time: e.g., the action move forwards is durative, whereas the action take one step is discrete. In a TR program, a durative action is executed as long as its corresponding condition remains the one with the highest priority among the ones that hold. When the highest priority condition that holds changes, the action executed changes accordingly. Thus, the conditions must be evaluated continuously so that the action associated with the current highest priority condition that holds is always the one being executed. A running action terminates when its corresponding condition ceases to hold or when another condition with higher priority takes precedence. Figure 2.16 shows an example of a TR program for navigating in an obstacle-free environment.
Equal(pos,goal) → Idle
Heading Towards(goal) → Go Forwards
(else) → Rotate

Fig. 2.16: Example of a teleo-reactive program carrying out a navigation task. If the robot is in the goal position, the action performed is Idle (no actions executed). Otherwise, if it is heading towards the goal, the action performed is Go Forwards. Otherwise, the robot performs the action Rotate.
Fig. 2.17: A BT representing a general TR program: a Fallback over Sequences, each pairing a condition ci with its action ai.

The core idea of continuously checking conditions and applying the corresponding rules can be captured using a Fallback node and pairs of conditions and actions. Thus, a general TR program can be represented in the BT of Fig. 2.17. A more formal argument using a state space representation of BTs will be given in Section 6.4.
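Under this mapping, a TR rule list can be translated mechanically. A sketch reusing the node classes from Chapter 1, with rule conditions and actions already wrapped as BT leaves:

def tr_to_bt(rules):
    """Map a prioritized TR rule list [(c1, a1), ..., (cm, am)] to the BT
    of Fig. 2.17: a Fallback over condition-action Sequences, so the action
    of the highest priority condition that holds is the one executed."""
    return Fallback([Sequence([condition, action])
                     for condition, action in rules])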
Fig. 2.18: Example of a Decision Tree executing a generic robotic task, with predicates such as Have Task to Do? and Task is Urgent? leading to actions such as Recharge Now! The predicates are evaluated traversing the tree in a top-down fashion.
Fig. 2.19: The basic building blocks of Decision Trees are ‘If ... then ... else ...’ statements (left), and those can be created in BTs as illustrated above (right).
Fig. 2.20: A BT version of the Decision Tree in Figure 2.18, with conditions such as Have Task To Do, Task is Urgent and Battery Level > 10% guarding the actions Perform Task and Recharge Now.
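The mapping of Fig. 2.19 can be written directly. A sketch, again with the node classes from Chapter 1, assuming the predicate leaf never returns Running:

def if_then_else(predicate, then_branch, else_branch):
    """'If predicate then then_branch else else_branch' as a BT:
    Fallback(Sequence(predicate, then_branch), else_branch).
    Note: if the predicate succeeds but then_branch fails, the else
    branch is also tried, which a plain Decision Tree cannot express."""
    return Fallback([Sequence([predicate, then_branch]), else_branch])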
2.6.1 Advantages

As described in Section 1.2, many advantages stem from BTs being both modular and reactive. Below we list a set of advantages of BTs.
Modularity: Modularity measures the degree to which the components of a system can be separated into building blocks and recombined. In a modular system, each module can be designed, implemented, tested and reused one module at a time. The benefits of modularity thus increase the more complex a system is, by enabling a divide and conquer approach when designing, implementing and testing.
BTs are modular, since each subtree of a BT can be seen as a module in the above sense, with a standard interface given by the return statuses. Thus, BTs are modular on all scales, ranging from the topmost subtrees to all the leaves of the tree.
Hierarchical organization: If a control architecture contains several levels of decision making, it is hierarchical. The possibility of designing and analyzing structures on different hierarchical levels is important for both humans and computers, as it enables, e.g., iterative refinement and extensions of a plan, see Section 3.5.
BTs are hierarchical, since each level of a BT automatically defines a level in the hierarchy.
Reusable code: Having reusable code is very important in any large, complex, long-term project. The ability to reuse designs relies on the ability to build larger things from smaller parts, and on the independence of the input and output of those parts from their use in the project. To enable reuse of code, each module must interface the control architecture in a clear and well-defined fashion.
BTs enable reusable code since, given the proper implementation, any subtree can be reused in multiple places of a BT. Furthermore, when writing the code of a leaf node, the developer needs only to take care of returning the correct return status, which is universally predefined as either Running, Success, or Failure. Unlike FSMs and HFSMs, where the outgoing transitions require knowledge about the next state, in BTs leaf nodes are developed disregarding which node is going to be executed next. Hence, the BT logic is independent of the leaf node executions, and vice versa.
Reactivity: By reactive we mean the ability to quickly and efficiently react to changes. For unstructured environments, where outcomes of actions are not certain and the state of the world is constantly changed by external actors, plans that were created offline and then executed in an open loop fashion are often likely to fail.
BTs are reactive, since the continual generation of ticks and their tree traversal result in a closed loop execution. Actions are executed and aborted according to the ticks’ traversal, which depends on the leaf nodes’ return statuses. Leaf nodes are tightly connected with the environment (e.g. condition nodes evaluate the overall system properties and Action nodes return Failure/Success if the action failed/succeeded). Thus, BTs are highly responsive to changes in the environment.
Human readable: A readable structure is desirable for reducing the cost of development and debugging, especially when the task is human designed. The structure should remain readable even for large systems. Human readability requires a coherent and compact structure.
BTs are human readable due to their tree structure and modularity.
Expressive: A control architecture must be sufficiently expressive to encode a large variety of behaviors.
BTs are at least as expressive as FSMs (see Section 2.1), the Subsumption Architecture (see Section 2.3), Teleo-Reactive programs (see Section 2.4), and Decision Trees (see Section 2.5).
Suitable for analysis: Safety critical robot applications often require an analysis
of qualitative and quantitative system properties. These properties include: safety,
in the sense of avoiding irreversible undesired behaviors; robustness, in the sense
of a large domain of operation; efficiency, in the sense of time to completion;
reliability, in the sense of success probability; and composability, in the sense of
analyzing whether properties are preserved over compositions of subtasks.
BTs have tools available to evaluate such system properties, see Chapters 5 and
9.
Suitable for automatic synthesis: In some problem instances, it is preferable that the action ordering of a task, or a policy, is automatically synthesized using task-planning or machine learning techniques. The control architecture can influence the efficiency of such synthesis techniques (e.g. an FSM with a large number of transitions can drastically deteriorate the speed of an algorithm that has to consider all the possible paths in the FSM).
BTs are suitable for automatic synthesis in terms of both planning (see Section 3.5 and, in more detail, Chapter 7) and learning (see Chapter 8).
To illustrate the advantages listed above, we consider the following simple example.

Example 2.1. A robot is tasked to find a ball, pick it up, and place it into a bin. If the robot fails to complete the task, it should go to a safe position and wait for a human operator. After picking up the ball (Figure 2.21a), the robot moves towards the bin (Figure 2.21b). While moving towards the bin, an external entity takes the ball from the robot’s gripper (Figure 2.21c) and immediately throws it in front of the robot, where it can be seen (Figure 2.21d). The robot aborts the execution of moving and it starts to approach the ball again.
Fig. 2.21: (a) The robot is picking up the ball. (b) The robot moves toward the bin (far away from the robot) with the ball in the hand. (c) An external entity (a human) takes the ball from the robot gripper. (d) The robot approaches the ball in the new location.
Fig. 2.22: FSM modeling the robot’s behavior in Example 2.1. The initial state has a thicker border.
In this example, the robot does not simply execute a pick-and-place task. It continually monitors the progress of the actions, stops whenever needed, skips planned actions, decides the actions to execute, and responds to exogenous events. In order to execute some actions, the robot might need to inject new actions into the plan (e.g. the robot might need to empty the bin before placing the ball). Hence the task requires a control architecture suitable for extensions. These extensions might be human made (e.g. the robot asks the operator to update the current action policy), requiring an architecture that is human readable, or automated (e.g. using model-based reasoning), requiring an architecture suitable for automatic synthesis. In either case, to be able to easily extend and modify the action policy, its representation must be modular. In addition, new actions may subsume existing ones whenever needed (e.g. empty the bin if it is full must be executed before place the ball). This requires a hierarchical representation of the policy. Moreover, there might be multiple different ways of carrying out a task (e.g. picking the ball using the left hand or the right hand). The robot must be able to decide which option is the best, requiring the architecture to be suitable for analysis. Finally, once the policy is designed, it is desirable that it can be reused in other contexts.
Most control architectures lack one or more of the properties described above. Take as an example an FSM modeling the behavior of the robot in Example 2.1, depicted in Figure 2.22. As can be seen, even for this simple example the FSM gets fairly complex, with many transitions.
2.6.2 Disadvantages
In this section we describe some disadvantages of BTs.
Chapter 3
Design principles

BTs are fairly easy to understand and use, but to make full use of their potential, it can be good to be aware of a set of design principles that can be used in different situations. In this chapter, we will describe these principles using a number of examples. First, in Section 3.1, we will describe the benefit of using explicit success conditions in sequences; then, in Section 3.2, we describe how the reactivity of a BT can be increased by creating implicit sequences, using Fallback nodes. In Section 3.3, we show how BTs can be designed in a way that is similar to Decision Trees. Then, in Section 3.4, we show how safety can be improved using sequences. Backchaining is an idea used in automated planning, and in Section 3.5 we show how it can be used to create deliberative, goal directed, BTs. Memory nodes and granularity of BTs are discussed in Sections 3.6 and 3.7. Finally, we show how easily all these principles can be combined at different levels of a BT in Section 3.8.
Fig. 3.1: A Sequence of the actions Unlock Door, Open Door, and Pass through Door.
Consider the sequence in Figure 3.1. One can assume that Unlock Door returns Success when it has unlocked the door, but what if it is called when the door is already unlocked? Depending on the implementation it might either return Success immediately, or actually try to unlock the door again, with the possibility of returning Failure if the key cannot be turned further. A similar uncertainty holds regarding the implementation of Open Door (what if the door is already open?) and Pass through Door. To address this problem, and remove uncertainties regarding the implementation, explicit Success conditions can be included in the BT.
Fig. 3.2: Sequence with explicit success conditions. Note how each action is paired with a condition through a Fallback node, making the success condition of the pair explicit.
In Figure 3.2, the BT from Figure 3.1 has been extended to include explicit success conditions. These conditions are added in a pair with the corresponding action using a Fallback node. Now, if the door is already unlocked and open, the first two conditions of Figure 3.2 will return Success, the third will return Failure, and the agent will proceed to execute the action Pass through Door.
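This pairing pattern is mechanical, so it can be captured in a small helper. A sketch with the node classes from Chapter 1; the door predicates and actions (door_unlocked, unlock_door, and so on) are assumed stand-ins:

def with_success_condition(condition, action):
    # Fallback(condition, action): skip the action if its goal already holds.
    return Fallback([condition, action])

enter_through_door = Sequence([
    with_success_condition(cond("Door Unlocked", door_unlocked),
                           Leaf("Unlock Door", unlock_door)),
    with_success_condition(cond("Door Open", door_open),
                           Leaf("Open Door", open_door)),
    with_success_condition(cond("Agent Has Passed", agent_has_passed),
                           Leaf("Pass through Door", pass_through_door)),
])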
Fig. 3.3: An Implicit Sequence is constructed using a Fallback node, reversing the order of the actions and pairing them with appropriate preconditions.
The key observation needed to improve reactivity is to realize that the goal is to get through the door, and that the other actions are just means to get to that goal. In the BT in Figure 3.3 we have reversed the order of the actions in order to check the goal state first. We then changed fallbacks to sequences and vice versa, and finally changed the conditions. Now, instead of checking outcomes, or success conditions as we did in Figure 3.2, in Figure 3.3 we check preconditions, the conditions needed to execute the corresponding actions. First the BT checks if the agent has passed the door; if so, it returns Success. If not, it proceeds to check if the door is open, and if so passes through it. If neither of the previous conditions is satisfied, it checks if the door is unlocked, and if so starts to open it. As a final check, if nothing else returns Success, it checks if it has the key to the door. If it does, it tries to unlock the door; if not, it returns Failure.

The use of implicit sequences is particularly important in cases where the agent needs to undo some of its own actions, such as closing a door after passing it. A systematic way of creating implicit sequences is to use backchaining, as described in Section 3.5.
Fig. 3.4: The Pac-Man BT of Figure 1.12, whose structure of nested condition checks resembles a Decision Tree.
Fig. 3.5: A BT that is guaranteed not to run out of batteries, as long as Main Task keeps the robot close enough to the recharging station so that 20% of battery will be enough to travel back.
However, if the battery level hovers around the threshold, the BT of Fig. 3.5 might end up chattering, i.e., quickly switching between the two tasks. The solution is to make sure that once recharging, the robot waits until the battery is back at 100%. This can be achieved by the BT in Fig. 3.6.
Fig. 3.6: By changing the condition in Fig. 3.5, the robot now keeps recharging until the battery level reaches 100%.
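One way to realize the hysteresis of Fig. 3.6 is to let the Recharge action own a flag that it clears only at 100%. A sketch with the node classes from Chapter 1; battery_level, recharge, and do_main_task are assumed helpers:

state = {"recharging": False}

def battery_ok():
    # The guard of Fig. 3.6: above 20% and not currently recharging.
    ok = battery_level() > 20 and not state["recharging"]
    return SUCCESS if ok else FAILURE

def recharge_battery():
    # Keep recharging (Running) until full, then clear the flag.
    state["recharging"] = battery_level() < 100
    recharge()
    return RUNNING if state["recharging"] else SUCCESS

safe_bt = Sequence([
    Fallback([Leaf("Battery > 20% and not Recharging", battery_ok),
              Leaf("Recharge Battery", recharge_battery)]),
    Leaf("Do Main Task", do_main_task),
])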
Fig. 3.7: A single goal condition, Is Inside House.
Now imagine we have a set of small BTs such as the ones in Figures 3.8 and 3.9, each in the format of the general Postcondition-Precondition-Action (PPA) BT in Figure 3.11.
Fig. 3.8: PPA for achieving the postcondition Is Inside House. If the postcondition is not satisfied already, the BT checks the precondition Door is Open; if it holds, the BT executes the action Go Inside.
If we have such a set, we can work our way backwards from the goal (backchaining) by replacing preconditions with PPAs having the corresponding postcondition. Thus, replacing the single condition in Figure 3.7 with the PPA of Figure 3.8, we get Figure 3.8 again, since we started with a single condition. More interestingly, if we replace the precondition Door is Open in Figure 3.8 with the PPA of Figure 3.9, we get the BT of Figure 3.10.
Thus we can iteratively build a deliberative BT by applying Algorithm 4. Looking at the BT in Figure 3.10, we note that it first checks if the agent Is Inside House; if so, it returns Success. If not, it checks if Door is Open, and if it is, it proceeds to Go Inside. If not, it checks if Door is Unlocked, and correspondingly executes Open Door. Else it checks if Door is Weak and if it Has Crowbar, and proceeds to Break Door Open if that is the case. Else it returns Failure. If an action is executed, it might either succeed, which will result in a new condition being satisfied and another action being executed until the task is finished, or it might fail. If Go Inside fails, the
Fig. 3.9: PPA for achieving the postcondition Door is Open. If the postcondition is not satisfied, the BT checks the first precondition Door is Unlocked; if it holds, the BT executes the action Open Door. If not, it checks the second set of preconditions, starting with Has Crowbar and then Door is Weak; if both are satisfied, it executes Break Door Open.
Fig. 3.10: The result of replacing Door is Open in Figure 3.8 with the PPA of Figure 3.9.
whole BT returns Failure, but if Open Door fails, the conditions Door is Weak and
Has Crowbar are checked.
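Algorithm 4 is not reproduced here, but the backchaining step it performs can be sketched as follows; the representation of a PPA as a (postcondition, preconditions, action) triple and the helper names are assumptions of this illustration, reusing the node classes sketched in Chapter 1:

def ppa(postcondition, preconditions, action):
    # Fallback(C, Sequence(preconditions..., action)), as in Fig. 3.11.
    return Fallback([postcondition, Sequence(preconditions + [action])])

def backchain(tree, ppas, expanded=frozenset()):
    """Replace condition leaves with the PPA that achieves them. A PPA's
    own postcondition is never re-expanded, to avoid infinite regress."""
    if isinstance(tree, Leaf):
        if tree.name in ppas and tree.name not in expanded:
            post, pres, action = ppas[tree.name]
            pres = [backchain(p, ppas, expanded | {tree.name}) for p in pres]
            return ppa(post, pres, action)
        return tree
    tree.children = [backchain(c, ppas, expanded) for c in tree.children]
    return tree

# Expanding Is Inside House with the PPAs of Figs. 3.8-3.9 yields the
# structure of Fig. 3.10:
# bt = backchain(cond("Is Inside House", is_inside_house), ppas)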
In general, we let the PPA have the form of Figure 3.11, with one postcondition C that can be achieved by any one of a set of actions Ai. Each of these actions is combined in a Sequence with its corresponding list of preconditions Cij, and these action-precondition Sequences are Fallback options for achieving the same objective. From an efficiency point of view, it makes sense to put actions that are most
Fig. 3.11: General format of a PPA BT. The postcondition C can be achieved by either one of the actions A1 or A2, which have preconditions C1i and C2i, respectively.
likely to succeed first (to avoid unnecessary failures) and check preconditions that
are most likely to fail first (to quickly move on to the next fallback option).
Fig. 3.12: Example of an Un-Reactive Sequence composition (a Sequence with memory, →*) of the behaviors pick, move, and place.
• It makes sense to encode the behavior in a single leaf when the potential subparts of the behavior are always used and executed in this particular combination.
• It makes sense to encode a behavior as a sub-BT, breaking it up into conditions, actions and flow control nodes, when the subparts are likely to be usable in other combinations in other parts of the BT, and when the reactivity of BTs can be used to re-execute parts of the behavior when needed.
Consider the BT in Figure 3.13 describing the behavior of a humanoid robot. The
actions sit and stand cannot be divided into meaningful sub-behaviors.
Fig. 3.13: A BT combining several capabilities of a humanoid robot; the structure is the same as in Figure 2.4.
Reactivity matters when, for example, the worker picks up an object that the robot is trying to reach, or the object slips out of the robot gripper while the robot is moving it. If we had instead chosen to aggregate the actions pick object, assemble object, and place object into a single action, we would lose reactiveness when, for example, the robot has to re-pick an assembled object that slipped out from the robot’s grippers. With a single action the robot would try to re-assemble an already assembled object.
The advice above should give the designer an idea of how to reach a balanced BT that is neither too fine grained nor too compact. A fine grained BT might be unreasonably complex, while a compact BT may risk being insufficiently reactive, executing too many operations in a feed-forward fashion and losing one of the main advantages of BTs.
Fig. 3.15: A BT describing the daily life of a burglar game character, designed using an Implicit Sequence ending in the action Drive Around.
In this section, we will show how the modularity of BTs makes it very straightforward to combine the design principles described in this chapter at different levels of a BT. Imagine we are designing the AI for a game character making a living as a burglar. Its daily life could be filled with stealing and spending money, as described in the BT of Figure 3.15. Note that we have used the Implicit Sequence design principle from Section 3.2. The intended progression is driving around until a promising house is found, entering the house and finding indications of money nearby, stealing the money, and then leaving the house to spend the money.
Fig. 3.16: If the escape (or fight) action is efficient enough, this Sequence construction will guarantee that the burglar is never caught.
While performing the actions described above, the burglar is also interested in not being captured by the police. Therefore we might design a BT handling when to escape, and when to fight the cops trying to catch it. This might be considered a safety issue, and we can use the design principle for improving safety using sequences, as described in Section 3.4 above. The result might look like the BT in Figure 3.16. If cops are nearby, the burglar will first try to escape, and if that fails, fight. If, at any time during the fight, the escape option becomes viable, the burglar will switch to escaping.
Fig. 3.17: Using backchaining, a BT of the desired complexity can be created to get a burglar into a house; this is the same BT as in Figure 3.10.
Now, the modularity of BTs enables us to combine all these BTs, created with different design principles, into a single, more complex BT, as shown in Figure 3.18. Note that the reactivity of all parts is maintained, and the switches between different sub-BTs happen just the way they should, for example from Drive Around, to Breaking a Door Open (when finding a house), to Fighting Cops (when the police arrive and escape is impossible), and then Stealing Money (when the police officers are defeated). We will come back to this example in the next chapter on BT extensions.
Fig. 3.18: A straightforward combination of the BTs in Figures 3.15, 3.16, and 3.17.
Chapter 4
Extensions of Behavior Trees
Fig. 4.1: The result of adding a utility Fallback to the BT controlling a burglar game character in Figure 3.16. Note how the utility node enables a reactive re-ordering of the actions Escape and Fight Cops.
P_Sequence^s = ∏_i P_i^s,    P_Fallback^s = 1 − ∏_i (1 − P_i^s),    (4.1)

since Sequences need all children to succeed, while Fallbacks need only one, with probability equal to the complement of all failing. This is theoretically appealing, but relies on the implicit assumption that each action is only tried once. In a reactive BT for a robot picking and placing items, you could imagine the robot first picking an item, then accidentally dropping it halfway, and then picking it up again. Note that the formulas above do not account for this kind of event.
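For example, two children with P_1^s = 0.9 and P_2^s = 0.8 give P_Sequence^s = 0.9 · 0.8 = 0.72, while P_Fallback^s = 1 − (1 − 0.9)(1 − 0.8) = 0.98.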
Now the question comes to how we compute or estimate P_i^s for the individual actions. A natural idea is to learn this from experience [28]. It is reasonable to assume that the success probability of an action, P_i^s, is a function of the world state, so it would make sense to try to learn the success probability as a function of state. Ideally we can classify situations such that one action is known to work in some situations, and another is known to work in others. The continuous maximization of success probabilities in a Fallback node would then make the BT choose the correct action depending on the situation at hand.

There might still be some randomness to the outcomes, and then the following estimate is reasonable:

P_i^s = (# successes) / (# trials).    (4.2)
However, this leads to an exploit/explore problem [28]. What if both available actions of a Fallback have high success probability? Initially we try one that works, yielding a good estimate for that action. Then the optimization might continue to favor (exploit) that action, never trying (explore) the other one that might be even better. For the estimates to converge for all actions, even the ones with lower success estimates need to be executed sometimes. One can also note that having multiple similar robots connected to a cloud service enables much faster learning of both forms of success estimates described above.
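A minimal sketch of such learning, combining the estimate of Eq. (4.2) with epsilon-greedy exploration; the epsilon value, the prior, and the names are illustrative choices:

import random

class SuccessEstimate:
    """Success probability estimate of Eq. (4.2) for one action."""
    def __init__(self):
        self.successes, self.trials = 0, 0
    def update(self, succeeded):
        self.trials += 1
        self.successes += int(succeeded)
    @property
    def p(self):
        # Assumed prior of 0.5 before any trials have been made.
        return self.successes / self.trials if self.trials else 0.5

def pick_child(estimates, epsilon=0.1):
    """Epsilon-greedy choice over a Fallback's children: mostly exploit
    the best estimate, but explore others so all estimates converge."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda i: estimates[i].p)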
It was mentioned above that it might also be relevant to include costs and execution times in the decision of what tree to execute. A formal treatment of both success probabilities and execution times can be found in Chapter 9. A combination of cost and success probabilities might result in a utility system, as described above, but finding the right combination of all three is still an open problem.
In computer games, the AI is influenced by both level designers, responsible for the player experience, and AI engineers, responsible for agents behaving rationally. Thus, the level designers need a way of making some behaviors more likely, without causing irrational side effects that ruin the game experience.
Fig. 4.2: The aggressive burglar style, resulting from disabling Escape in the BT controlling a burglar game character in Figure 3.16.
This problem was discussed in one of the first papers on BTs [31], with the proposed solution being styles, with each style corresponding to disabling a subset of the BT. For instance, the style aggressive burglar might simply have the action Escape disabled, making it disregard injuries and attack until defeated, see Figure 4.2. Similarly, the Fight action can be disabled in the pacifist burglar style, as shown in Figure 4.3. A more elaborate solution to the same problem can be found in the Hinted BTs described below.
Fig. 4.3: The pacifist burglar style, resulting from disabling Fight in the BT controlling a burglar game character in Figure 3.16.
Hinted BTs were first introduced in [53, 54]. The key idea is to have an external entity, either human or machine, giving suggestions, so-called hints, regarding actions to execute, to a BT. In robotics, the external entity might be an operator or user suggesting something, and in a computer game it might be the level designer wanting to influence the behavior of a character without having to edit the actual BT.

The hints can be both positive (+), in terms of suggested actions, and negative (-), actions to avoid; a somewhat complex example can be found in Figure 4.4. Multiple hints can be active simultaneously, each influencing the BT in one, or both, of two different ways. First, they can affect the ordering of Fallback nodes. Actions or trees with positive hints are moved to the left, and ones with negative hints are moved to the right. Second, the BT is extended with additional conditions, checking if a specific hint is given.
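A sketch of the first mechanism, assuming each subtree carries a name attribute that the hints refer to (an illustration, not the formulation of [53, 54]):

def apply_hints(fallback, positive=(), negative=()):
    """Reorder a Fallback's children: positively hinted subtrees move
    left (tried first), negatively hinted ones move right (tried last)."""
    def rank(child):
        name = getattr(child, "name", None)
        if name in positive:
            return 0
        if name in negative:
            return 2
        return 1
    fallback.children.sort(key=rank)   # stable sort keeps relative order
    return fallback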
In the BT of Figure 4.4, the following hints were given: Fight Cops+, Break Door Open+ and Spend Money-. Fight Cops+ makes the burglar consider the fight option first, and only escape when fighting fails. Break Door Open+ makes the burglar try to break the door open before seeing if it is open or not, and the new corresponding condition makes it ignore the requirements of having a weak door and a crowbar before attempting to break the door. Finally, Spend Money- makes the burglar prefer to drive around looking for promising houses rather than spending money.
Fig. 4.4: The result of providing the hints Fight Cops+, Break Door Open+ and Spend Money- to the BT in Figure 3.18. The dashed arrows indicate changes in the BT.
Chapter 5
Analysis of Efficiency, Safety, and Robustness
Autonomous agents will need to be efficient, robust, and reliable in order to be used on a large scale. In this chapter, we present a mathematical framework for analyzing these properties for a BT (Section 5.1). The analysis includes efficiency (Section 5.2), in terms of execution time bounds; robustness (Section 5.2), in terms of the capability to operate in large domains; and safety (Section 5.3), in terms of avoiding some particular parts of the state space. Some of the results of this chapter were previously published in the journal paper [13].
Definition 5.1 (Behavior Tree). A BT is a three-tuple

T_i = {f_i, r_i, ∆t},    (5.1)

where i ∈ ℕ is the index of the tree, f_i : ℝ^n → ℝ^n is the right hand side of an ordinary difference equation, ∆t is a time step, and r_i : ℝ^n → {R, S, F} is the return status that can be equal to either Running (R), Success (S), or Failure (F). Let the Running/Activation region (R_i), Success region (S_i) and Failure region (F_i) correspond to a partitioning of the state space, defined as follows:

R_i = {x : r_i(x) = R}    (5.2)
S_i = {x : r_i(x) = S}    (5.3)
F_i = {x : r_i(x) = F}.    (5.4)
Finally, let x_k = x(t_k) be the system state at time t_k; then the execution of a BT T_i is a standard ordinary difference equation

x_{k+1} = f_i(x_k),    (5.5)
t_{k+1} = t_k + ∆t.    (5.6)

The return status r_i will be used when combining BTs recursively, as explained below.
Assumption 5.1. From now on we will assume that all BTs evolve in the same continuous space ℝ^n, using the same time step ∆t.
Remark 5.1. It is often the case that different BTs, controlling different vehicle subsystems evolving in different state spaces, need to be combined into a single BT. Such cases can be accommodated in the assumption above by letting all systems evolve in a larger state space that is the Cartesian product of the smaller state spaces.
Definition 5.2 (Sequence compositions of BTs). Two or more BTs can be composed into a more complex BT using a Sequence operator,

T_0 = Sequence(T_1, T_2).

If x_k ∈ S_1    (5.7)
    r_0(x_k) = r_2(x_k)    (5.8)
    f_0(x_k) = f_2(x_k)    (5.9)
else
    r_0(x_k) = r_1(x_k)    (5.10)
    f_0(x_k) = f_1(x_k).    (5.11)
T_1 and T_2 are called children of T_0. Note that when executing the new BT, T_0 first keeps executing its first child T_1 as long as it returns Running or Failure. The second child is executed only when the first returns Success, and T_0 returns Success only when all children have succeeded, hence the name Sequence, just as in the classical definition of Sequences in Algorithm 1 of Section 1.3. For notational convenience, we write

Sequence(T_1, T_2, T_3) = Sequence(T_1, Sequence(T_2, T_3)).    (5.12)
Definition 5.3 (Fallback compositions of BTs). Two or more BTs can be composed into a more complex BT using a Fallback operator,

T_0 = Fallback(T_1, T_2).

If x_k ∈ F_1    (5.13)
    r_0(x_k) = r_2(x_k)    (5.14)
    f_0(x_k) = f_2(x_k)    (5.15)
else
    r_0(x_k) = r_1(x_k)    (5.16)
    f_0(x_k) = f_1(x_k).    (5.17)

Note that when executing the new BT, T_0 first keeps executing its first child T_1 as long as it returns Running or Success. The second child is executed only when the first returns Failure, and T_0 returns Failure only when all children have tried, but failed, hence the name Fallback, just as in the classical definition of Fallbacks in Algorithm 2 of Section 1.3. For notational convenience, we write

Fallback(T_1, T_2, T_3) = Fallback(T_1, Fallback(T_2, T_3)).    (5.18)
Definition 5.4 (Parallel compositions of BTs). Two or more BTs can be composed into a more complex BT using a Parallel operator,

T_0 = Parallel(T_1, T_2).

Let x = (x_1, x_2) be the partitioning of the state space described in Assumption 5.2; then f_0(x) = (f_1^1(x), f_2^2(x)) and r_0 is defined as follows:

If M = 1
    r_0(x) = S if r_1(x) = S ∨ r_2(x) = S    (5.19)
    r_0(x) = F if r_1(x) = F ∧ r_2(x) = F    (5.20)
    r_0(x) = R else    (5.21)

If M = 2
    r_0(x) = S if r_1(x) = S ∧ r_2(x) = S    (5.22)
    r_0(x) = F if r_1(x) = F ∨ r_2(x) = F    (5.23)
    r_0(x) = R else    (5.24)
As noted in the following lemma, exponential stability implies FTS, given the right choices of the sets S, F, R.

Lemma 5.1 (Exponential stability and FTS). A BT for which x_s is a globally exponentially stable equilibrium of the execution (5.5), and S ⊃ {x : ‖x − x_s‖ ≤ ε}, ε > 0, F = ∅, R = ℝ^n \ S, is FTS.
1 Both meanings of robustness are aligned with the IEEE standard glossary of software engineering
terminology: “The degree to which a system or component can function correctly in the presence
of invalid inputs or stressful environmental conditions.”
Proof. Global exponential stability implies that there exists a > 0 such that ||x(k) − xs|| ≤ e−ak for all k. Then, for each ε there is a time τ such that ||x(k) − xs|| ≤ e−aτ < ε for all k ≥ τ, which implies that there is a τ′ ≤ τ such that x(τ′) ∈ S, and the BT is FTS.
We are now ready to look at how these properties extend across compositions of
BTs.
Lemma 5.2 (Efficiency and robustness of Sequence compositions). If T1 and T2 are FTS, with completion time bounds τ1, τ2, regions of attraction R01, R02, and S1 = R02 ∪ S2, then T0 = Sequence(T1, T2) is FTS with completion time bound τ0 = τ1 + τ2, region of attraction R00 = R01 ∪ R02 and success region S0 = S1 ∩ S2.

Proof. First we consider the case when x(0) ∈ R01. Then, as T1 is FTS, the state will reach S1 in a time k1 < τ1, without leaving R01. Then T2 starts executing, and will keep the state inside S1, since S1 = R02 ∪ S2. T2 will then bring the state into S2, in a time k2 < τ2, and T0 will return Success. Thus we have the combined time k1 + k2 < τ1 + τ2.
If x(0) ∈ R02, T1 immediately returns Success, and T2 starts executing as above.
The lemma above is illustrated in Figure 5.2, and Example 5.1 below.
Fig. 5.1: A Sequence is used to create an Enter Through Front Door BT. Passing the door is only tried if the opening action succeeds. Sequences are denoted by a white box with an arrow.
Example 5.1. Consider the BT in Figure 5.1. Assume that Open Front Door is FTS and will finish in less than τ1 seconds, and that Pass through Door is FTS and will finish in less than τ2 seconds. Then, as long as S1 = R02 ∪ S2, Lemma 5.2 states that the combined BT in Figure 5.1 is also FTS, with an upper bound on the execution time of τ1 + τ2. Note that the condition S1 = R02 ∪ S2 implies that the action Pass through Door will not make the system leave S1, by e.g. accidentally colliding with the door and thereby closing it without having passed through it.
The result for Fallback compositions is related, but with a slightly different con-
dition on Si and R0j . Note that this is the theoretical underpinning of the design
principle Implicit Sequences described in Section 3.2.
Fig. 5.2: The sets R01, S1, R02, S2 of Example 5.1 and Lemma 5.2.
Lemma 5.3 (Efficiency and robustness of Fallback compositions). If T1 and T2 are FTS, with completion time bounds τ1, τ2, regions of attraction R01, R02, and S2 ⊂ R01, then T0 = Fallback(T1, T2) is FTS with completion time bound τ0 = τ1 + τ2, region of attraction R00 = R01 ∪ R02 and success region S0 = S1.

Proof. First we consider the case when x(0) ∈ R01. Then, as T1 is FTS, the state will reach S1 in a time k < τ1 ≤ τ0, without leaving R01. If x(0) ∈ R02 \ R01, T2 will execute, and the state will progress towards S2. But as S2 ⊂ R01, x(k1) ∈ R01 at some time k1 < τ2. Then, we have the case above, reaching x(k2) ∈ S1 in a total time of k2 < τ1 + k1 < τ1 + τ2.
The Lemma above is illustrated in Figure 5.3, and Example 5.2 below.
Fig. 5.3: The sets S1, F1, R1 (solid boundaries) and S2, F2, R2 (dashed boundaries) of Example 5.2 and Lemma 5.3.
Remark 5.2. As can be noted, the necessary conditions of Lemma 5.2, including S1 = R02 ∪ S2, might be harder to satisfy than the conditions of Lemma 5.3, including S2 ⊂ R01. Therefore, Lemma 5.3 is often preferable from a practical point of view, e.g. using implicit sequences as shown below.

Fig. 5.4: An Implicit Sequence created using a Fallback, as described in Example 5.2 and Lemma 5.3.
Example 5.2. This example will illustrate the design principle Implicit sequences of
Section 3.2. Consider the BT in Figure 5.4. During execution, if the door is closed,
then Pass through Door will fail and Open Front Door will start to execute. Now,
right before Open Front Door returns Success, the first action Pass through Door
(with higher priority) will realize that the state of the world has now changed enough
to enable a possible success and starts to execute, i.e. return Running instead of
Failure. The combined action of this BT will thus make the robot open the door (if
necessary) and then pass through it.
Thus, even though a Fallback composition is used, the result is sometimes a se-
quential execution of the children in reverse order (from right to left). Hence the
name Implicit sequence.
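A toy tick-by-tick run of this implicit sequence, reusing the fallback() sketch of Section 5.1; the one-tick door actions and the two-bit state are illustrative:

R, S, F = Status.RUNNING, Status.SUCCESS, Status.FAILURE

# State: (door_open, passed_through); both actions are one-tick toys.
def r_pass(x): return S if x[1] else (R if x[0] else F)
def f_pass(x): return (x[0], True) if x[0] else x

def r_open(x): return S if x[0] else R
def f_open(x): return (True, x[1])

f0, r0 = fallback((f_pass, r_pass), (f_open, r_open))

x = (False, False)
while r0(x) != S:
    x = f0(x)   # tick 1 opens the door, tick 2 passes through it

Even though Pass through Door has the higher priority, its Failure routes the first tick to Open Front Door, producing exactly the reverse-order execution described above.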
The example above illustrates how we can increase the robustness of a BT. If
we want to be able to handle more diverse situations, such as a closed door, we
do not have to make the door passing action more complex, instead we combine it
with another BT that can handle the situation and move the system into a part of the
statespace that the first BT can handle. The sets S0 , F0 , R0 and f0 of the combined BT
are shown in Figure 5.5, together with the vector field f0 (x) − x. As can be seen, the
combined BT can now move a larger set of initial conditions to the desired region
S0 = S1 .
Fig. 5.5: The sets S0, F0, R0 and the vector field (f0(x) − x) of Example 5.2 and Lemma 5.3.
Lemma 5.4 (Efficiency and robustness of Parallel compositions). If T1 and T2 are FTS, then T0 = Parallel(T1, T2) is FTS with:

If M = 1:
R00 = {R01 ∪ R02} \ {S1 ∪ S2} (5.25)
S0 = S1 ∪ S2 (5.26)
τ0 = min(τ1, τ2) (5.27)

If M = 2:
R00 = {R01 ∩ R02} \ {S1 ∩ S2} (5.28)
S0 = S1 ∩ S2 (5.29)
τ0 = max(τ1, τ2) (5.30)
5.3 Safety
In this section we will show how some aspects of safety carry across modular com-
positions of BTs. The results will enable us to design a BT to handle safety guaran-
tees and a BT to handle the task execution separately.
In order to formalize the discussion above, we say that safety can be measured
by the ability to avoid a particular part of the statespace, which we for simplicity
denote the Obstacle Region.
In order to make statements about the safety of composite BTs we also need the
following definition.
This implies that the system, under the control of another BT with maximal states-
pace steplength d, cannot leave S without entering I, and thus avoiding O, see
Lemma 5.5 below.
Example 5.3. To illustrate how safety can be improved using a Sequence composi-
tion, we consider the UAV control BT in Figure 5.6. The sets Si , Fi , Ri are shown
in Figure 5.7. As T1 is Guarantee altitude above 1000 ft, its failure region F1 is a
small part of the state space (corresponding to a crash) surrounded by the running
region R1 that is supposed to move the UAV away from the ground, guaranteeing a
minimum altitude of 1000 ft. The success region S1 is large, every state sufficiently
distant from F1 . The BT that performs the mission, T2 , has a smaller success region
S2 , surrounded by a very large running region R2 , containing a small failure region
F2 . The function f0 is governed by Equations (5.9) and (5.11) and is depicted in
form of the vector field ( f0 (x) − x) in Figure 5.8.
Fig. 5.6: The safety of the UAV control BT is guaranteed by the first action.
Fig. 5.7: The sets S1, F1, R1 (solid boundaries) and S2, F2, R2 (dashed boundaries) of Example 5.3 and Lemma 5.5.
Fig. 5.8: The sets S0, F0, R0 and the vector field (f0(x) − x) of Example 5.3 and Lemma 5.5.
Chattering is a common problem in switched dynamical systems, and BTs are no exception. As is suggested by the right part of Figure 5.8, chattering can be a problem when vector fields meet at a switching surface.
Although the efficiency of some compositions can be computed using Lemmas 5.2 and 5.3 above, chattering can significantly reduce the efficiency of others. Inspired by [16], the following result can give an indication of when chattering is to be expected.
Let Ri and Rj be the running regions of Ti and Tj respectively. We want to study the behavior of the system when a composition of Ti and Tj is applied. In some cases, the execution of one BT will lead the state into the running region of the other BT and vice versa. Then, the two BTs are alternately executed, and the state trajectory chatters on the boundary between Ri and Rj. We formalize this discussion in the following lemma.
Proof. When the condition holds, the vector field is pointing outwards on at least
one side of the switching boundary.
Note that this condition is not satisfied on the right hand side of Figure 5.8. This
concludes our analysis of BT compositions.
5.4 Examples
In this section, we give some example BTs and analyze their properties. Section 5.4.1 illustrates how to analyze the robustness and efficiency of a robot executing a generic task. Section 5.4.2 illustrates how to compute safety using the functional representation of Section 5.1. Section 9.4 illustrates how to compute the performance estimate of a given BT. Finally, Section 5.4.3 illustrates the properties above on a complex BT.
All BTs were implemented using the ROS BT library.2 A video showing the executions of the BTs used in Sections 5.4.1-5.4.2 is publicly available. 3
Example 5.4. Let x = (x1 , x2 ) ∈ R2 , where x1 ∈ [0, 0.5] is the horizontal position of
the robot head and x2 ∈ [0, 0.55] is vertical position (height above the floor) of the
robot head. The objective of the robot is to get to the destination at (0, 0.48).
Fig. 5.9: The combination T3 =Fallback(T4 , T5 , T6 ) increases robustness by increasing the region
of attraction.
First we describe the sets Si , Fi , Ri and the corresponding vector fields of the
functional representation. Then we apply Lemma 5.3 to see that the combination
does indeed improve robustness. For this example ∆t = 1s.
For Walk Home, T4 , we have that
S4 = {x : x1 ≤ 0} (5.32)
R4 = {x : x1 ≠ 0, x2 ≥ 0.48} (5.33)
F4 = {x : x1 ≠ 0, x2 < 0.48} (5.34)
f4(x) = (x1 − 0.1, x2) (5.35)

that is, it runs as long as the vertical position of the robot head, x2, is at least 0.48 m above the floor, and moves towards the origin with a speed of 0.1 m/s. If the robot is not standing up (x2 < 0.48 m) it returns Failure. A phase portrait of f4(x) − x is shown in Figure 5.10. Note that T4 is FTS with the completion time bound τ4 = 0.5/0.1 = 5 and region of attraction R04 = R4.
For Sit to Stand, T5 , we have that
Fig. 5.10: The Action Walk Home keeps the head around x2 = 0.5 and moves it towards the destination x1 = 0.
S5 = {x : 0.48 ≤ x2} (5.36)
R5 = {x : 0.3 ≤ x2 < 0.48} (5.37)
F5 = {x : x2 < 0.3} (5.38)
f5(x) = (x1, x2 + 0.05) (5.39)

that is, it runs as long as the vertical position of the robot head, x2, is between 0.3 m and 0.48 m above the floor. If 0.48 ≤ x2 the robot is standing up, and it returns Success. If x2 < 0.3 the robot is lying down, and it returns Failure. A phase portrait of f5(x) − x is shown in Figure 5.11. Note that T5 is FTS with the completion time bound τ5 = ceil(0.18/0.05) = ceil(3.6) = 4 and region of attraction R05 = R5.
For Lie down to Sit Up, T6 , we have that
S6 = {x : 0.3 ≤ x2} (5.40)
R6 = {x : 0 ≤ x2 < 0.3} (5.41)
F6 = ∅ (5.42)
f6(x) = (x1, x2 + 0.03) (5.43)

that is, it runs as long as the vertical position of the robot head, x2, is below 0.3 m above the floor. If 0.3 ≤ x2 the robot is sitting up (or standing up), and it returns Success. If x2 < 0.3 the robot is lying down, and it returns Running. A phase portrait of f6(x) − x is shown in Figure 5.12. Note that T6 is FTS with the completion time bound τ6 = 0.3/0.03 = 10 and region of attraction R06 = R6.
Fig. 5.11: The Action Sit to Stand moves the head upwards in the vertical direction towards standing.
Fig. 5.12: The Action Lie down to Sit Up moves the head upwards in the vertical direction towards sitting.
Informally, we can look at the phase portrait in Figure 5.13 to get a feeling for
what is going on. As can be seen the Fallbacks make sure that the robot gets on its
feet and walks back, independently of where it started in {x : 0 < x1 ≤ 0.5, 0 ≤ x2 ≤
0.55}.
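Under the stated dynamics, the whole example can be simulated in a few lines, reusing the fallback() sketch of Section 5.1 and the R, S, F shorthands introduced earlier; a small tolerance on x1 ≤ 0 compensates for floating-point rounding:

# Walk Home, Eqs. (5.32)-(5.35)
def r4(x): return S if x[0] <= 1e-9 else (R if x[1] >= 0.48 else F)
def f4(x): return (x[0] - 0.1, x[1])

# Sit to Stand, Eqs. (5.36)-(5.39)
def r5(x): return S if x[1] >= 0.48 else (R if x[1] >= 0.3 else F)
def f5(x): return (x[0], x[1] + 0.05)

# Lie down to Sit Up, Eqs. (5.40)-(5.43); the failure region F6 is empty
def r6(x): return S if x[1] >= 0.3 else R
def f6(x): return (x[0], x[1] + 0.03)

f3, r3 = fallback((f4, r4), fallback((f5, r5), (f6, r6)))

x, ticks = (0.5, 0.0), 0          # start lying down, away from the destination
while r3(x) != S:
    x, ticks = f3(x), ticks + 1   # Delta t = 1 s per tick
print(ticks)                      # 19 ticks: 10 to sit up, 4 to stand, 5 to walk

The printed 19 matches the bound τ6 + τ5 + τ4 = 10 + 4 + 5 obtained from the completion times above.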
Formally, we can use Lemma 5.3 to compute robustness in terms of the region of attraction R03, and efficiency in terms of bounds on completion time τ3. The results are described in the following lemma.

Lemma 5.7. The BT T3 = Fallback(T4, T5, T6) of Figure 5.9 is FTS, with region of attraction R03 = R04 ∪ R05 ∪ R06, completion time bound τ3 = τ4 + τ5 + τ6, and success region S4.
Fig. 5.13: The combination Fallback(T4, T5, T6) first gets up, and then walks home.
5.4.2 Safety
To illustrate Lemma 5.5 we choose the BT of Figure 5.14. The idea is that the first
subtree in the Sequence (named Guarantee Power Supply) is to guarantee that the
combination does not run out of power, under very general assumptions about what
is going on in the second BT.
First we describe the sets Si , Fi , Ri and the corresponding vector fields of the
functional representation. Then we apply Lemma 5.5 to see that the combination
does indeed guarantee against running out of batteries.
Example 5.5. Let T1 be Guarantee Power Supply and T2 be Do other tasks. Let
furthermore x = (x1 , x2 ) ∈ R2 , where x1 ∈ [0, 100] is the distance from the current
position to the recharging station and x2 ∈ [0, 100] is the battery level. For this ex-
ample ∆t = 10s.
For Guarantee Power Supply, T1 , we have that
Fig. 5.14: A BT where the first action guarantees that the combination does not run out of battery.
that is, when running, the robot moves to x1 < 0.1 and recharges. While moving,
the battery level decreases and while charging the battery level increases. If at the
recharge position, it returns Success only after reaching x2 ≥ 100. Outside of the
recharge area, it returns Success as long as the battery level is above 20%. A phase
portrait of f1 (x) − x is shown in Figure 5.15.
Fig. 5.15: Phase portrait of f1(x) − x, with x1 [m] on the horizontal axis and x2 [%] on the vertical axis.
S2 = ∅ (5.49)
R2 = R2 (the whole state space) (5.50)
F2 = ∅ (5.51)
f2(x) = (x1 + (50 − x1)/50, x2 − 0.1) (5.52)
that is, when running, the robot moves towards x1 = 50 and does some important task, while the battery level keeps on decreasing. A phase portrait of f2(x) − x is shown in Figure 5.16.
Fig. 5.16: Phase portrait of f2(x) − x, with x1 [m] on the horizontal axis and x2 [%] on the vertical axis.
Lemma 5.8. Let the obstacle region be O = {x : x2 = 0} and the initialization region be I = {x : x1 ∈ [0, 100], x2 ≥ 15}. Furthermore, let T1 be given by (5.44)-(5.48) and let T2 be an arbitrary BT satisfying maxx ||x − f2(x)|| < d = 5. Then T0 = Sequence(T1, T2) is safe with respect to I and O, i.e. if x(0) ∈ I, then x(t) ∉ O for all t > 0.
Fig. 5.17: Phase portrait of T0 =Sequence(T1 , T2 ). Note that T1 guarantees that the combination
does not run out of battery. The dashed line is a simulated execution, starting at (80, 50).
Proof. First we see that T1 is safe with respect to O and I. Then we notice that T1
is safeguarding with margin d = 10 for the reachable set X = {x : x1 ∈ [0, 100], x2 ∈
[0, 100]}. Finally we conclude that T0 is Safe, according to Lemma 5.5.
Note that if we did not constrain the robot to move in the reachable set X = {x : x1 ∈ [0, 100], x2 ∈ [0, 100]}, it would be able to move so far away from the recharging station that the battery would not be sufficient to bring it back again before reaching x2 = 0.
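The combined behavior can be simulated with the sequence() sketch of Section 5.1. Since Eqs. (5.44)-(5.48) are not reproduced above, the r1 and f1 below are assumptions consistent with the textual description (drive to the recharge position, 20% threshold, charge until full); the charging and driving rates are illustrative, while r2 and f2 follow Eqs. (5.49)-(5.52):

def r1(x):
    d, b = x                               # distance to charger, battery %
    if d < 0.1:
        return S if b >= 100 else R        # at the charger: charge until full
    return S if b > 20 else R              # away: fine while battery above 20%

def f1(x):
    d, b = x
    if d < 0.1:
        return (d, min(b + 1.0, 100.0))    # assumed charging rate
    return (max(d - 1.0, 0.0), b - 0.1)    # assumed approach speed and drain

def r2(x): return R                        # Do Other Task: S2 = F2 = empty
def f2(x):
    d, b = x
    return (d + (50.0 - d) / 50.0, b - 0.1)   # Eq. (5.52)

f0, r0 = sequence((f1, r1), (f2, r2))      # T0 = Sequence(T1, T2)

x = (80.0, 50.0)                           # as in the dashed run of Fig. 5.17
for _ in range(2000):                      # Delta t = 10 s per tick
    x = f0(x)
    assert x[1] > 0                        # the battery never reaches x2 = 0

Whenever the battery drops to the threshold, r1 switches from Success to Running, T1 preempts T2, and the state is driven back to the charger, which is exactly the safeguarding behavior of Lemma 5.8.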
Fig. 5.18: A BT that combines some capabilities of the humanoid robot in an interactive and
modular way. Note how atomic actions can easily be replaced by more complex sub-BTs.
Example 5.6. The BT in Figure 5.18 is designed for controlling a humanoid robot
in an interactive capability demo, and includes the BTs of Figures 5.14 and 5.9 as
subtrees, as discussed below.
The top left part of the tree includes some exception handling, in terms of battery
management, and backing up and complaining in case the toe bumpers are pressed.
The top right part of the tree is a Parallel node, listening for new user commands,
along with a request for such commands if none are given and an execution of the
corresponding activities if a command has been received.
The subtree Perform Activities is composed of a check of what activity to do, and the execution of the corresponding command. Since the activities are mutually exclusive, we let Current Activity hold only the latest command, so that no ambiguities between control commands can occur.
The subtree Play Ball Game runs the ball tracker, in parallel with moving closer
to the ball, grasping it, and throwing it.
As can be seen, the design is quite modular. An HDS implementation of the same functionality would need an extensive number of transition arrows between the different actions.
We will now apply the analysis tools of this chapter to the example, initially assuming that all atomic actions are FTS, as described in Definition 5.5.
Comparing Figures 5.14 and 5.18 we see that they are identical, if we let Do
Other Task correspond to the whole right part of the larger BT. Thus, according to
Lemma 5.8, the complete BT is safe, i.e. it will not run out of batteries, as long
as the reachable state space is bounded by 100 distance units from the recharging
station and the time steps are small enough so that maxx ||x − f2 (x)|| < d = 5, i.e.
the battery does not decrease more than 5% in a single time step.
The design of the right subtree in Play Ball Game is made to satisfy Lemma 5.2,
with the condition S1 = R02 ∪ S2 . Let T1 = Fallback(Ball Close?, Approach Ball),
T2 = Fallback(Ball Grasped?, Grasp Ball), T3 = Throw Ball. Note that the use of
condition-action pairs makes the success regions explicit. Thus S1 = R02 ∪ S2 , i.e.
Ball Close is designed to describe the Region of Attraction of Grasp Ball, and S2 =
R03 ∪ S3 , i.e. Ball Grasped is designed to describe the Region of Attraction of Throw
Ball. Finally, applying Lemma 5.2 twice we conclude that the right part of Play
Ball Game is FTS with completion time bound τ1 + τ2 + τ3 , region of attraction
R01 ∪ R02 ∪ R03 and success region S1 ∩ S2 ∩ S3 .
The Parallel composition at the top of Play Ball Game combines Ball Tracker
which always returns Running, with the subtree discussed above. The Parallel node
has M = 1, i.e. it only needs the Success of one child to return Success. Thus, it is
clear from Definition 5.4 that the whole BT Play Ball Game has the same properties
regarding FTS as the right subtree.
Finally, we note that Play Ball Game fails if the robot is not standing up. There-
fore, we improve the robustness of that subtree in a way similar to Example 5.4 in
Figure 5.9. Thus we create the composition Fallback(Play Ball Game, T5 , T6 ), with
T5 = Sit to Stand, T6 = Lie Down to Sit Up.
Assuming that the high dimensional dynamics of Play Ball Game is somehow
captured in the x1 dimension we can apply an argument similar to Lemma 5.7 to
conclude that the combined BT is indeed also FTS with completion time bound
τ1 + τ2 + τ3 + τ5 + τ6 , region of attraction R01 ∪ R02 ∪ R03 ∪ R05 ∪ R06 and success region
S1 ∩ S2 ∩ S3 .
The rest of the BT concerns user interaction and is thus not suitable for doing
performance analysis.
Note that the assumption on all atomic actions being FTS is fairly strong. For
example, the humanoid’s grasping capabilities are somewhat unreliable. A deter-
ministic analysis such as this one is still useful for making good design choices, but
in order to capture the stochastic properties of a BT, we need the tools of Chapter 9.
But first we will use the tools developed in this chapter to formally investigate
how BTs relate to other control architectures.
Chapter 6
Formal Analysis of How Behavior Trees
Generalize Earlier Ideas
In this chapter, we will formalize the arguments of Chapter 2, using the tools developed in Chapter 5. In particular, we prove that BTs generalize Decision Trees (6.1), the Subsumption Architecture (6.2), Sequential Behavior Compositions (6.3) and the Teleo-Reactive Approach (6.4). Some of the results of this chapter were previously published in the journal paper [13].
Fig. 6.1: The Decision Tree of a robot control system. The decisions are interior nodes, and the
actions are leaves.
Consider the Decision Tree of Figure 6.1: the robot has to decide whether to perform a given task or recharge its batteries. This decision is taken based upon the urgency of the task and the current battery level. The following lemma shows how to create an equivalent BT from a given Decision Tree.
Lemma 6.1 (DT-BT equivalence). Given a Decision Tree composed of statements of the form 'if Pi then DTi1 else DTi2', where DTi1, DTi2 are either atomic actions, or subtrees of identical structure, we can create an equivalent BT by setting Ti = Fallback(Sequence(Pi, Ti1), Ti2) for non-atomic actions, Ti = DTi for atomic actions, and requiring all actions to return Running all the time.
The original Decision Tree and the new BT are equivalent in the sense that the
same values for Pi will always lead to the same atomic action being executed. The
lemma is illustrated in Figure 6.2.
Informally, first we note that by requiring all actions to return Running, we basically disable the feedback functionality that is built into the BT. Instead, whatever action is ticked will be the one that executes, just as in the Decision Tree. Second, the result is a direct consequence of the fact that the predicates of the Decision Tree are essentially 'if ... then ... else ...' statements, which can be captured by BTs as shown in Figure 6.2.
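The construction of Lemma 6.1 can be sketched recursively, reusing the sequence()/fallback() compositions of Section 5.1; the nested-tuple encoding of the Decision Tree is illustrative:

def condition(pred):
    """A predicate as a BT: identity dynamics, Success/Failure status."""
    return (lambda x: x, lambda x: S if pred(x) else F)

def dt_to_bt(node):
    """node is ('action', f) or ('if', pred, then_node, else_node)."""
    if node[0] == 'action':
        return (node[1], lambda x: R)   # actions always return Running
    _, pred, then_node, else_node = node
    return fallback(sequence(condition(pred), dt_to_bt(then_node)),
                    dt_to_bt(else_node))

# e.g. the root of Figure 6.1 (subtrees elided, names hypothetical):
# dt_to_bt(('if', have_task_to_do, urgency_subtree, ('action', recharge_now)))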
Note that this observation opens possibilities of using the extensive literature on
learning Decision Trees from human operators, see e.g. [65], to create BTs. These
learned BTs can then be extended with safety or robustness features, as described in
Section 5.2.
We finish this section with an example of how BTs generalize Decision Trees. Consider the Decision Tree in Figure 6.1. Applying Lemma 6.1, we get the equivalent BT of Figure 6.3. However, the direct mapping does not always take full advantage of the features of BTs: a more compact, and still equivalent, BT can be found in Figure 6.4, where again we assume that all actions always return Running.
Fig. 6.2: The basic building blocks of Decision Trees are 'if ... then ... else ...' statements (left), and those can be created in BTs as illustrated above (right).
Fig. 6.3: A BT that is equivalent to the Decision Tree in Figure 6.1. A compact version of the same tree can be found in Figure 6.4.
Fig. 6.4: A compact BT version of the Decision Tree in Figure 6.1.
Fig. 6.5: The Subsumption Architecture. A higher level behavior can subsume (or suppress) a lower level one.
which is equivalent to (6.4) above. In other words, actions will be checked in order
of priority, until one that returns Running is found.
A BT version of the example in Figure 6.5 can be found in Figure 6.6. Table 6.1 illustrates how the two control structures are equivalent, listing all the 2³ possible return status combinations. Note that no action is executed if all actions return Failure.
Lemma 6.4 (TR-BT equivalence). Given a Teleo-Reactive program with an ordered list of condition-action rules (c1, a1), . . . , (cm, am), an equivalent BT is given by Fallback(Sequence(c1, a1), . . . , Sequence(cm, am)), where we convert the True/False of the conditions to Success/Failure, and let the actions only return Running.
Proof. It is straightforward to see that the BT above executes the exact same ai as
the original TR would have, depending on the values of the conditions ci , i.e. it finds
the first condition ci that returns Success, and executes the corresponding ai .
Fig. 6.7: The BT equivalent of a Teleo-Reactive program: a Fallback over the Sequences Sequence(ci, ai), i = 1, . . . , m.
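The same building blocks give a direct sketch of the Lemma 6.4 construction, folding the rule list into a Fallback of Sequences (highest-priority rule first); condition() and sequence()/fallback() are the sketches from earlier in the book:

def tr_to_bt(rules):
    """rules = [(c1, a1), ..., (cm, am)], highest priority first."""
    c, a = rules[-1]
    t = sequence(condition(c), (a, lambda x: R))
    for c, a in reversed(rules[:-1]):
        t = fallback(sequence(condition(c), (a, lambda x: R)), t)
    return t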
We will now illustrate the lemma with an example from Nilsson's original paper [51], the Teleo-Reactive program Goto(loc):

Equal(pos, loc) → Idle
Heading towards(loc) → Go forwards
True → Rotate

where pos is the current robot position and loc is the current destination.
Executing this Teleo-Reactive program, we get the following behavior. If the
robot is at the destination it does nothing. If it is heading the right way it moves
forward, and else it rotates on the spot. In a perfect world without obstacles, this
90 6 Formal Analysis of How Behavior Trees Generalize Earlier Ideas
will get the robot to the goal, just as predicted in Lemma 6.5. Applying Lemma 6.4,
the Teleo-Reactive program Goto is translated to a BT in Figure 6.8.
Fig. 6.8: The BT version of the Teleo-Reactive program Goto(loc), with children Sequence(Equal(pos, loc), Idle), Sequence(Heading Towards(loc), Go Forwards) and Sequence(True, Rotate).
The example continues in [51] with a higher level recursive Teleo-Reactive program, called Amble(loc), designed to add a basic obstacle avoidance behavior:

Equal(pos, loc) → Idle
Clear path(pos, loc) → Goto(loc)
True → Amble(new point(pos, loc))

where new point picks a new random point in the vicinity of pos and loc.
Again, if the robot is at the destination it does nothing. If the path to goal is clear
it executes the Teleo-Reactive program Goto. Else it picks a new point relative to its
current position and destination (loc) and recursively executes a new copy of Amble
with that destination. Applying Lemma 6.4, the Amble TR is translated to a BT in
Figure 6.9.
Fig. 6.9: The BT version of the Teleo-Reactive program Amble(loc), with children Sequence(Equal(pos, loc), Idle), Sequence(Clear Path(pos, loc), GoTo(loc)) and Sequence(True, Amble(new point(pos, loc))).
Lemma 6.5 (Nilsson 1994). If a Teleo-Reactive program is Universal, and there are
no sensing and execution errors, then the execution of the program will lead to the
satisfaction of c1 .
Proof. In [51] it is stated that it is easy to see that this is the case.
The idea of the proof is indeed straightforward, but as we will see when we compare it to the BT results in Section 6.4.1 below, the proof is incomplete.
In Lemma 5.3, Si , Ri , Fi correspond to Success, Running and Failure regions and
R0 denotes the region of attraction.
Lemma 5.3 shows under what conditions we can guarantee that the Success re-
gion S0 is reached in finite time. If we for illustrative purposes assume that the
regions of attraction are identical to the running regions Ri = R0i , the lemma states
that as long as the system starts in R00 = R01 ∪ R02 it will reach S0 = S1 in less than
τ0 = τ1 + τ2 time units. The condition analogous to the regression property is that
S2 ⊂ R01 , i.e. that the Success region of the second BT is a subset of the region of
attraction R01 of the first BT. The regions of attraction, R01 and R02, are very important, but there is no corresponding concept in Lemma 6.5. In fact, we can construct a counterexample showing that Lemma 6.5 does not hold.
The fix is however quite straightforward, and amounts to using the following
definition with a stronger assumption.
Definition 6.1 (Stronger Regression property). For each ci , i > 1 there is c j , j <
i such that the execution of action ai leads to the satisfaction of c j , without ever
violating ci .
Chapter 7
Behavior Trees and Automated Planning
In this chapter, we describe how automated planning can be used to create BTs,
exploiting ideas from [17, 75, 74, 10]. First, in Section 7.1, we present an exten-
sion of the Backchaining design principle, introduced in Section 3.5, including a
robotics example. Then we present an alternative approach using A Behavior Lan-
guage (ABL), in Section 7.2, including a game example.
In classical planning research, the world is often assumed to be static and known,
with all changes occurring as a result of the actions executed by one controlled
agent [25]. Therefore, most approaches return a static plan, i.e. a sequence of actions
that brings the system from the initial state to the goal state, with a corresponding
execution handled by a classical FSM.
However, many agents, both real and virtual, act in an uncertain world populated
by other agents, each with their own set of goals and objectives. Thus, the effect
of an action can be unexpected, diverging from the planned state trajectory, making
the next planned action infeasible. A common way of handling this problem is to
re-plan from scratch on a regular basis, which can be expensive both in terms of
time and computational load. To address these problems, the following two open
challenges were identified within the planning community [24]:
• “Hierarchically organized deliberation. This principle goes beyond existing hier-
archical planning techniques; its requirements and scope are significantly differ-
ent. The actor performs its deliberation online”
• “Continual planning and deliberation. The actor monitors, refines, extends, up-
dates, changes and repairs its plans throughout the acting process, using both
descriptive and operational models of actions.”
Similarly, the recent book [25] describes the need for an actor that “reacts to events
and extends, updates, and repairs its plan on the basis of its perception”.
Combining planning with BTs is one way of addressing these challenges. The
reactivity of BTs enables the agent to re-execute previous subplans without having
to replan at all, and the modularity enables extending the plan recursively, without
having to re-plan the whole task. Thus, using BTs as the control architecture in an
automated planning algorithm addresses the above challenges by enabling a reason-
ing process that is both hierarchical and modular in its deliberation, and can monitor,
update, and extend its plans while acting.
In practice, and as will be seen in the examples below, using BTs enables re-
activity, in the sense that if an object slips out of a robot’s gripper, the robot will
automatically stop and pick it up again without the need to replan or change the BT,
see Fig. 7.16. Using BTs also enables iterative plan refinement, in the sense that if
an object is moved to block the path, the original BT can be extended to include a
removal of the blocking obstacle. Then, if the obstacle is removed by an external
actor, the robot reactively skips the obstacle removal, and goes on to pick up the
main object without having to change the BT, see Fig. 7.23.
Fig. 7.1: Copy of Figure 3.11. The general format of a PPA BT. The Postcondition C can be achieved by either one of the actions A1 or A2, which have Preconditions C1i and C2i respectively.
Fig. 7.2: An illustration of how the BT of Example 7.1 is iteratively created by the PA-BT approach: (b) the BT after one iteration; (c) the BT after two iterations; (d) the BT after three iterations; (e) the BT after four iterations, the final version.
Example 7.1. The robot in Figure 7.3 is given the task to move the green cube into the rectangle marked GOAL (the red sphere is handled in Example 7.2 below; in this initial example it is ignored). The BT in Figure 7.2e is executed, and in each time step the root of the BT is ticked. The root is a Fallback node, which ticks its first child, the condition oc ∈ GoalRect (cube on goal).

Fig. 7.3: A simple example scenario where the goal is to place the green cube C onto the goal region G. The fact that the sphere S is suddenly moved (red arrow) by an external agent to block the path must be handled. In (a) the nominal plan is to MoveTo(c)→Pick(c)→MoveTo(g)→Drop() when the sphere suddenly moves to block the path. In (b), after refining the plan, the extended plan is to MoveTo(s)→Push(s)→MoveTo(c)→Pick(c)→MoveTo(g)→Drop() when the sphere is again suddenly moved by another agent, before being pushed. Thus our agent must smoothly revert to the original set of actions. PA-BT does this without re-planning. Note that when S appears, the condition τ ⊂ CollFree returns Failure and the BT in Figure 7.2e is expanded further, see Example 7.2, to push it out of the way.
If the cube is indeed in the
rectangle we are done, and the BT returns Success.
If not, the second child, a Sequence node, is ticked. The node ticks its first child,
which is a Fallback, which again ticks its first child, the condition h = c (object in
hand is cube). If the cube is indeed in the hand, the Condition node returns Success,
its parent, the Fallback node returns Success, and its parent, the Sequence node
ticks its second child, which is a different Fallback, ticking its first child which is
the condition or ∈ N pg (robot in the neighborhood of pg ). If the robot is in the
neighborhood of the goal, the condition and its parent node (the Fallback) returns
Success, followed by the sequence ticking its third child, the action Place(c, pg )
(place cube in a position pg on the goal), and we are done.
If or ∈ N pg does not hold, the action MoveTo(pg , τg ) (move to position pg on
the goal region using the trajectory τg ) is executed, given that the trajectory is free
τ ⊂ CollFree . Similarly, if the cube is not in the hand, the robot does a MoveTo(pc , τc )
(move to cube, using the trajectory τc ) followed by a Pick(c) after checking that the
hand is empty, the robot is not in the neighborhood of c and that the corresponding
trajectory is free.
We conclude the example by noting that the BT is ticked every timestep, e.g.
every 0.1 second. Thus, when actions return Running (i.e. they are not finished yet)
the return status of Running is progressed up the BT and the corresponding action is
allowed to control the robot. However, if e.g., the cube slips out of the gripper, the
condition h = c instantly returns Failure, and the robot starts checking if it is in the
neighborhood of the cube or if it has to move before picking it up again.
We are now ready to study PA-BT in detail. The approach is described in Algo-
rithms 5 (finding what condition to replace with a PPA) and 6 (creating the PPA and
adding it to the BT). First we will give an overview of the algorithms and see how
they are applied to the robot in Figure 7.3, to iteratively create the BTs of Figure 7.2.
We will then discuss the key steps in more detail.
Remark 7.1. Note that the conditions of an action template can contain a disjunction
of propositions. This can be encoded by a Fallback composition of the correspond-
ing Condition nodes.
PA-BT is based on the definition of the action templates, which contains the de-
scriptive model of an action. An action template is characterized by conditions con
(sometimes called preconditions) and effects eff (sometimes called postconditions)
that are both constraints on the world (e.g. door open, robot in position). An action
template is mapped online into an action primitive, which contains the operational
model of an action and is executable. Figure 7.4 shows an example of an action
template and its corresponding action refinement.
To plan in infinite state space, PA-BT relies on a so-called Reachability Graph
(RG) provided by the HBF algorithm, see [22] for details. The RG provides efficient
sampling for the actions in the BT, allowing us to map the descriptive model of an
action into its operational model.
Fig. 7.4: Action Template for Pick and its corresponding action primitive: (a) the template for picking a generic object i, with con: or ∈ Noi, h = ∅ and eff: h = i; (b) the primitive created from (a), where the object is given as i = cube. or is the robot's position, Noi is a set that defines a neighborhood of the object oi, h is the object currently in the robot's hand. The conditions are that the robot is in the neighborhood of the object, and that the robot hand is empty. The effect is that the object is in the robot hand.
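As a sketch, an action template can be represented as a named set of condition and effect strings, with instantiation as plain parameter substitution. The representation below is illustrative, not the data structure used by any particular PA-BT implementation:

from dataclasses import dataclass

@dataclass
class ActionTemplate:
    name: str
    con: list      # descriptive preconditions, as strings
    eff: list      # descriptive effects (postconditions)

def instantiate(t: ActionTemplate, **bindings) -> ActionTemplate:
    """Ground a template into an executable action primitive."""
    s = lambda text: text.format(**bindings)
    return ActionTemplate(s(t.name), [s(c) for c in t.con], [s(e) for e in t.eff])

pick = ActionTemplate('Pick({i})',
                      con=['or in N(o_{i})', 'h = empty'],
                      eff=['h = {i}'])
pick_cube = instantiate(pick, i='cube')    # as in Fig. 7.4b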
Get Condition To Expand and Expand Tree (Algorithm 5 Lines 9 and 10)
As stressed in [25], the cost of minor mistakes (e.g. a non-optimal action order) is often much lower than the cost of the extensive modeling, information gathering and thorough deliberation needed to achieve optimality.
Similar to any STRIPS-style planner, adding a new action to the plan can cause a conflict (i.e. the execution of this new action reverses the effects of a previous action). In PA-BT, this possibility is checked in Algorithm 5 Line 11 by comparing the conditions of the newly added action with the effects of the actions that the subtree executes before the new action. If such an effects/conditions pair is in conflict, the goal will not be reached. An example of this situation is described in Example 7.2 below.
Again, following the approach used in STRIPS-style planners, we resolve this conflict by finding the correct action order. Exploiting the structure of BTs, we can do so by moving the subtree composed of the new action and its condition leftward (a BT executes its children from left to right, so moving a subtree leftward implies executing the new action earlier). If the subtree is already the leftmost one, it must be executed before its parent (i.e. it is placed at the same depth as the parent, but to its left). This operation is done in Algorithm 5 Line 12. PA-BT incrementally increases the priority of this subtree in this way, until it finds a feasible tree; a sketch of the underlying expansion step is given below. In [10] it is proved that, under certain assumptions, a feasible tree always exists.
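The expansion step itself can be sketched as follows: a failed goal condition is replaced by the PPA of Fig. 7.1, i.e. a Fallback over the condition itself and one Sequence per action that has the condition among its effects. The nested-tuple tree encoding and helper names are illustrative:

def ppa(goal, achievers):
    """achievers: list of (action_name, [preconditions]) whose effects
    include `goal`. Returns the PPA of Fig. 7.1 as a nested tuple."""
    options = [('Sequence', [('Condition', c) for c in pre] + [('Action', a)])
               for a, pre in achievers]
    return ('Fallback', [('Condition', goal)] + options)

# Roughly the first expansion of Fig. 7.2:
tree = ppa('oc in GoalRect', [('Place(c, pg)', ['h = c', 'or in N(pg)'])])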
Let's look again at Example 7.1 above and see how the BT in Figure 7.2e was created using the PA-BT approach. In this example, the action templates are summarized below with conditions and effects:

MoveTo(p, τ): con: τ ⊂ CollFree; eff: or = p
Pick(i): con: or ∈ Noi, h = ∅; eff: h = i
Place(i, p): con: h = i, or ∈ Np; eff: oi = p, h = ∅
where τ is a trajectory, CollFree is the set of all collision free trajectories, or is the
robot pose, p is a pose in the state space, h is the object currently in the end effector,
i is the label of the i-th object in the scene, and Nx is the set of all the poses near the
pose x.
The descriptive model of the action MoveTo is parametrized over the destination p and the trajectory τ. It requires that the trajectory is collision free (τ ⊂ CollFree); as effect, the action MoveTo places the robot at p (i.e. or = p). The descriptive model of the action Pick is parametrized over the object i. It requires having the end effector free (h = ∅) and the robot in the neighborhood of the object (or ∈ Noi); as effect, the object is in the robot's hand (h = i).
Example 7.2. Here we show a more complex example highlighting two main prop-
erties of PA-BT: the livelock freedom and the continual deliberative plan and act
cycle. This example is an extension of Example 7.1 where, due to the dynamic en-
vironment, the robot needs to replan.
Consider the execution of the final BT in Figure 7.2e of Example 7.1, where
the robot is carrying the desired object to the goal location. Suddenly, as in Fig-
ure 7.3 (b), an object s obstructs the (only possible) path. Then the condition
τ ⊂ CollFree returns Failure and Algorithm 5 expands the tree accordingly (Line
10) as in Figure 7.5a.
The new subtree has as condition h = ∅ (no object in hand), but the effect of the left branch (i.e. the main part in Figure 7.2e) of the BT is h = c (cube in hand); that is, the ticks reach the new subtree only when h = c holds. Clearly the expanded tree has a conflict (Algorithm 5 Line 11), and the priority of the new subtree is increased (Line 12) until the expanded tree is in the form of Figure 7.5b. Now the BT is free from conflicts, as the first subtree has as effect h = ∅ and the second subtree has as condition h = ∅. Executing the tree, the robot approaches the obstructing object; now the condition h = ∅ returns Failure and the tree is expanded accordingly, letting the robot drop the object currently grasped, satisfying h = ∅. It then picks up the obstructing object and places it to the side of the path. Now the condition τ ⊂ CollFree finally returns Success. The robot can then again approach the desired object, move to the goal region, and place the object in it.
Fig. 7.5: Steps to increase the priority of the new subtree added in Example 7.2.
tree needs no extension, as there is no failed condition of an action) which gets re-refined in the next loop (Algorithm 5 Line 5). This happens, for example, if the robot planned to place the object in a particular position on the desk, but this position is no longer feasible (e.g. another object was placed in that position by an external agent).
Example 7.3. Consider an agent moving in different states modeled by the graph in
Figure 7.6 where the initial state is s0 and the goal state is sg . Every arc represents
an action that moves an agent from one state to another. The action that moves the
agent from a state si to a state s j is denoted by si → s j . The initial tree, depicted
in Figure 7.7a, is defined as a Condition node sg which returns Success if and only
if the robot is at the state sg in the graph. The current state is s0 (the initial state).
Hence the BT returns a status of Failure. Algorithm 5 invokes the BT expansion
routine. The state sg can be reached from the state s5 , through the action s5 → sg ,
or from the state s3 , through the action s3 → sg . The tree is expanded accordingly
as depicted in Figure 7.7b. Executing this tree again returns a status of Failure, since the current state is neither sg nor s3 nor s5. Now the tree is expanded in a BFS fashion, finding a subtree for condition s5 as in Figure 7.7c. The process continues for two more iterations. Note that at iteration 4 (see Figure 7.8b) Algorithm 5 did not expand the condition sg, as it was previously expanded (Algorithm 7 Line 3); this avoids infinite loops in the search. The same argument applies for conditions s4 and sg in iteration 5 (see Figure 7.8c). The BT at iteration 5 includes the action s0 → s1
whose precondition is satisfied (the current state is s0 ). The action is then executed.
Performing that action (and moving to s1 ), the condition s1 is satisfied. The BT
executes the action s1 → s3 and then s3 → sg , making the agent reach the goal state.
It is clear that the resulting execution is the same as a BFS on the graph would
have rendered. Note however that PA-BT is designed for more complex problems
than graph search.
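The expansion order of this example can be reproduced with a plain breadth-first search backwards from the goal, over the actions named in Figures 7.7-7.8; the code below is a sketch of that bookkeeping only, not of the full Algorithm 7:

from collections import deque

def expansion_order(edges, goal):
    """BFS from the goal over reversed edges; each condition is expanded once."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)
    seen, order, queue = {goal}, [], deque([goal])
    while queue:
        s = queue.popleft()
        order.append(s)
        for p in preds.get(s, []):
            if p not in seen:          # skip already-expanded conditions
                seen.add(p)
                queue.append(p)
    return order

# Actions named in Figs. 7.7-7.8, as (source, destination) edges:
edges = [('s5', 'sg'), ('s3', 'sg'), ('sg', 's5'), ('s4', 's5'), ('s2', 's5'),
         ('s4', 's3'), ('sg', 's3'), ('s1', 's3'), ('s5', 's4'), ('s1', 's4'),
         ('s3', 's4'), ('s3', 's1'), ('s4', 's1'), ('s2', 's1'), ('s0', 's1')]
print(expansion_order(edges, 'sg'))  # ['sg', 's5', 's3', 's4', 's2', 's1', 's0']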
Fig. 7.6: The state-transition graph of Example 7.3, with states s0, . . . , s5 and goal state sg.
Fig. 7.7: First three BT updates during execution of Example 7.3. The numbers represent the index
of the BFS of Algorithm 7. Note that the node labeled with ?∗ is a Fallback node with memory.
Example 7.4 (From [33]). Consider a multipurpose robot that is asked to clean the
object A and then put it in the storage room as shown in Figure 7.9 (in this first
example we ignore the other robots as they are not in [33]). The goal is specified
as a conjunction Clean(A) ∧ In(A, storage). Using PA-BT, the initial BT is defined
as a sequence composition of Clean(A) with In(A, storage) as in Figure 7.10a. At
execution, the Condition node Clean(A) returns Failure and the tree is expanded
accordingly, as in Figure 7.10b. Executing the expanded tree, the Condition node
In(A,Washer) returns Failure and the BT is expanded again, as in Figure 7.10c. This
iterative process of planning and acting continues until it creates a BT such that the
robot is able to reach object C and remove it. After cleaning object A, the approach
constructs the tree to satisfy the condition In(A, storage) as depicted in Figure 7.12.
This subtree requires picking object A and then placing it into the storage. However
after the BT is expanded to place A into the storage it contains a conflict: in order
Fig. 7.8: Next BT updates during execution of Example 7.3. The numbers represent the index of the BFS of Algorithm 7. Note that the node labeled with ?∗ is a Fallback node with memory.
to remove object D the robot needs to grasp it. But to let the ticks reach this tree, the condition Holding() = a needs to return Success. Clearly the robot cannot hold
Fig. 7.9: An example scenario from [33], with the addition of two externally controlled robots (semi-transparent) providing disturbances. The robot must wash the object "A" and then put it into the storage.
both A and D. The new subtree is moved in the BT to a position with a higher priority
(See Algorithm 5 Line 12) and the resulting BT is the one depicted in Figure 7.13.
Note that the final BT depicted in Figure 7.13 is similar to the planning and execu-
tion tree of [33] with the key difference that the BT enables the continual monitoring
of the plan progress, as described in [49]. For example, if A slips out of the robot
gripper, the robot will automatically stop and pick it up again without the need to re-
plan or change the BT. Moreover if D moves away from the storage while the robot
is approaching it to remove it, the robot aborts the plan to remove D and continues
with the plan to place A in the storage room. Hence we can claim that the PA-BT is
reactive. The execution is exactly the same as [33]: the robot removes the obstruct-
ing objects B and C then places A into the washer. When A is clean, the robot picks it
up, but then it has to unpick it since it has to move D away from the storage. This is
a small drawback of this type of planning algorithms. Again, as stressed in [25] the
cost of a non-optimal plan is often much lower than the cost of extensive modeling,
information gathering and thorough deliberation needed to achieve optimality.
Fig. 7.10: The initial BT of Example 7.4, Sequence(Clean(A), In(A, storage)), and its successive expansions (a)-(c).

Fig. 7.12: BT containing a conflict. The subtree created to achieve ClearX(swept a, storage) is in conflict with the subtree created to achieve holding() = a.

Fig. 7.13: The final, conflict-free BT of Example 7.4, with the conflicting subtree moved to a higher priority.
7.1.6 Reactiveness
In this section we show how BTs enable a reactive blended acting and planning,
providing concrete examples that highlight the importance of reactiveness in robotic
applications.
Reactiveness is a key property for online deliberation. By reactiveness we mean
the capability of dealing with drastic changes in the environment in short time. The
domains we consider are highly dynamic and unpredictable. To deal with such do-
mains, a robot must be reactive in both planning and acting.
If an external event happens that the robot needs to react to, one or more conditions will change. The next tick after the change will happen at most a time 1/ft after the change, where ft is the tick frequency. Then a subset of all Conditions, Sequences and Fallbacks will be evaluated to either reach an Action, or to invoke additional planning; this takes less than ∆t. The combined reaction time is thus bounded above by 1/ft + ∆t. For instance, with a tick frequency of 10 Hz and ∆t = 0.1 s, the reaction time is bounded by 0.2 s.
Remark 7.2. Note that ∆t is strictly dependent on the real-world implementation. Faster computers and better code allow a smaller ∆t.
We are now ready to show three colloquial examples, one highlighting the reac-
tive acting (preemption and re-execution of subplans if needed), and two highlight-
ing the reactive planning, expanding the current plan as well as exploiting serendip-
ity [36].
Example 7.5 (Reactive Acting). Consider the robot in Figure 7.9, running the BT
of Figure 7.13. The object A is not clean and it is not in the washer, however the
robot is holding it. This results in the Condition nodes Clean(a) and In(a,washer)
returning Failure and the Condition node holding()=a Success. According to the
BT’s logic, the ticks traverse the tree and reach the action Place(a,washer). Now,
due to vibrations during movements, the object slips out of the robot’s grippers. In
this new situation the Condition node holding()=a now returns Failure. The ticks
now traverse the tree and reach the action Pick(a,aCurrent) (i.e. pick A from the
current position, whose preconditions are satisfied). The robot then re-picks the ob-
ject, making the Condition node holding()=a return Success and letting the robot
resume the execution of Place(a,washer).
Example 7.6 (Reactive Planning). Consider the robot in Figure 7.9, running the BT
of Figure 7.13. The object A is clean, it is not in the storage and the robot is not hold-
ing it. This results in the Condition nodes holding()=a and In(a,storage) returning
Failure, and the Condition nodes holding() = ∅, Clean(a) and ClearX(swept a,washer) returning Success. According to the BT's logic, the ticks traverse the tree and reach the action Pick(a,washer), which lets the robot approach
the object and then grasp it. While the robot is approaching A, an external uncon-
trolled robot places an object in front of A, obstructing the passage. In this new
situation the Condition node ClearX(swept a,washer) now returns Failure. The ac-
tion Pick(a,washer) no longer receives ticks and it is then preempted. The BT is
expanded accordingly finding a subtree to make ClearX(swept a,washer) return
Success. This subtree will make the robot pick the obstructing object and remove
it. Then the robot can finally reach A and place it into the storage.
Example 7.7 (Serendipity Exploitation). Consider the robot in Figure 7.9, running
the BT of Figure 7.13. The object A is not clean, it is not in the washer and the
robot is not holding it. According to the BT logic, the ticks traverse the tree and
reach the action Pick(b,bStart). While the robot is reaching the object, an external
uncontrolled agent picks B and removes it. Now the condition Overlaps(b,swept a)
= False returns Success and the BT preempts the execution of Pick(b,bStart) and
skips the execution of Places(b,ps28541) going directly to execute Pick(a,aStart).
7.1.7 Safety
In this section we show how BTs allow a safe blended acting and planning, provid-
ing a concrete example that highlights the importance of safety in robotics applica-
tions.
Safety is a key property for robotics applications. By safety we mean the capa-
bility of avoiding undesired outcomes. The domains we usually consider have few
catastrophic outcomes of actions and, as highlighted in [33], the result of an action
can usually be undone by a finite sequence of actions. However there are some cases
in which the outcome of the plan can damage the robot or its surroundings. These
cases are assumed to be identified in advance by human operators, who then add the
corresponding sub-BT to guarantee the avoidance of them. Then, the rest of the BT
is expanded using the algorithm described above.
We are now ready to show a colloquial example.
Fig. 7.14: A safe BT for Example 7.8. The BT guaranteeing safety is combined with the mission objective constraint.
Example 7.8 (Safe Execution). Consider the multipurpose robot of Example 7.4. Now, due to overheating, the robot has to stop whenever the motors' temperatures reach a given threshold, allowing them to cool down. This situation is relatively easy to model, and a subtree to avoid it can be designed as shown in Figure 7.14. When running this BT, the robot will preempt any action whenever the temperature is above the given threshold, and stay inactive until the motors have cooled down to the temperature where Cool Down Motors returns Success. Note that the Not Cooling Down part of the condition is needed to provide hysteresis: the robot stops when TMAX is reached, and waits until Cool Down Motors returns Success at some given temperature below TMAX. To perform the actual mission, the BT in Figure 7.14 is executed and expanded as explained above.
Thus, we first identify and handle the safety critical events separately, and then
progress as above without jeopardizing the safety guarantees. Note that the tree in
Figure 7.14, as well as all possible expansions of it using the PA-BT algorithm is
safe (see Section 5.3).
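A minimal sketch of the hysteresis logic described above, reusing the S, R shorthands from Chapter 5; the thresholds T_MAX and T_OK, and the guard structure, are illustrative assumptions rather than the values used in the experiments:

T_MAX = 80.0        # assumed stop threshold (degrees)
T_OK = 60.0         # assumed resume threshold, below T_MAX

def make_guard():
    cooling = [False]                    # hysteresis memory
    def guard(temp):
        if temp >= T_MAX:
            cooling[0] = True            # too hot: preempt the mission
        elif temp <= T_OK:
            cooling[0] = False           # cooled down enough: resume
        return S if not cooling[0] else R
    return guard

guard = make_guard()
# Ticking Sequence(guard, mission): the mission subtree only receives
# ticks while guard(temp) returns Success.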
1 http://wiki.ros.org/behavior_tree
2 https://youtu.be/mhYuyB0uCLM
(b) The robot has to move the blue cube away from the path to the goal. But the robot is currently
grasping the green cube. Hence the subtree created to move the blue cube needs to have a higher
priority.
(a) While the robot is moving towards the goal region, the green cube slips out of the gripper. The robot reactively preempts the subtree to move to the goal and re-executes the subtree to grasp the green cube, without replanning.
(b) The robot places the object onto the desired location.
Experiment 7.2 (Safety) In this experiment the robot is asked to perform the same
task as in Experiment 7.1 with the main difference that now the robot’s battery can
run out of power. To avoid this undesired irreversible outcome, the initial BT is
manually created in a way that is similar to the one in Figure 7.14, managing the
battery charging instead. As might be expected, the execution is similar to the one
described in Experiment 7.1 with the difference that the robot reaches the charging
station whenever needed: The robot first reaches the green cube (see Figure 7.17a).
Then, while the robot is approaching the blue cube, the battery level becomes low.
Hence the subplan to reach the blue cube is aborted and the subplan to charge the
battery takes over (see Figure 7.17b). When the battery is charged the robot can
resume its plan (see Figure 7.18a) and complete the task (see Figure 7.18b).
(b) Due to the low battery level, the robot moves to the charging station.
(a) Once the battery is charged, the robot resumes its plan.
(b) The robot places the object onto the desired location.
Experiment 7.3 (Fault Tolerance) In this experiment the robot is asked to per-
form the same task as in Experiment 7.1 with the main difference that the robot
is equipped with an auxiliary arm and a fault can occur to either arm, causing the
arm to stop functioning properly. The robot starts the execution as in the previous
experiments (see Figure 7.19a). However while the robot is approaching the goal
area, the primary arm breaks, making the green cube fall on the ground (see Fig-
ure 7.19b). The robot now tries to re-grasp the object with the primary arm, but this
action fails since the grippers are no longer attached to the primary arm, hence the
robot tries to grasp the object with the auxiliary arm. However the auxiliary arm
is too far from the object, and thus the robot has to move in a different position
(see Figure 7.20a) such that the object can be grasped (see Figure 7.20b) and the
execution can continue.
(a) The robot moves the blue cube away from the path to the goal.
(b) A fault occurs on the primary arm (the grippers break) and the green cube falls to the floor.
(a) The robot rotates to have the object closer to the auxiliary arm.
(b) The robot grasps the object with the auxiliary arm.
Experiment 7.4 (Dynamic Environment) In this experiment the single armed ver-
sion of the robot co-exists with other uncontrolled external robots. The robot is asked
to place the green cube in the goal area on the opposite side of the room. The robot
starts picking up the green cube and moves towards an obstructing object (a blue
cube) to place it to the side (see Figure 7.21a). Being single armed, the robot has to
ungrasp the green cube (see Figure 7.21b) to grasp the blue one (see Figure 7.22a).
While the robot is placing the blue cube to the side, an external robot places a
new object between the controlled robot and the green cube (see Figure 7.22b). The
plan of the robot is then expanded to include the removal of this new object (see
Figure 7.22c). Then the robot can continue its plan by re-picking the green cube,
without replanning. Now the robot approaches the yellow cube to remove it (see Fig-
ure 7.23a), but before the robot is able to grasp the yellow cube, another external
robot picks up the yellow cube (see Figure 7.23b) and places it to the side. The
subplan for removing the yellow cube is skipped (without replanning) and the robot
continues its task until the goal is reached (see Figure 7.23c).
Fig. 7.21b: The blue cube obstructs the path to the goal region. The robot drops the green cube in order to pick the blue cube.
Fig. 7.22b: While the robot moves the blue cube away from the path to the goal, an external agent places a red cube between the robot and the green cube.
Fig. 7.22c: The robot moves the red cube away from the path to the goal.
Fig. 7.23a: The yellow cube obstructs the path to the goal region. The robot drops the green cube in order to pick the yellow cube.
Fig. 7.23b: While the robot approaches the yellow cube, an external agent moves the yellow cube away.
Fig. 7.23c: The robot picks the green cube and places it onto the goal region.
In these scenarios, an ABB YuMi has to assemble a cellphone whose parts are scattered across a table, see Figure 7.24. The robot is equipped with two arms with simple parallel grippers, which are not suitable for dexterous manipulation. Furthermore, some parts must be grasped in a particular way to enable the assembly operation.
Experiment 7.5 In this experiment, the robot needs to reorient some of the cellphone's parts to expose them for assembly. Due to the gripper design, the robot must reorient a part by performing multiple grasps, transferring the part to the other gripper (see Figure 7.24b), effectively changing its orientation (see Figures 7.25a-7.25b).
(a) The robot picks the cellphone's chassis. The chassis cannot be assembled with this orientation.
(a) The chassis is placed onto the table with a different orientation than before (the opening part is now facing down).
(b) The robot picks the chassis with the new orientation.
(a) The robot picks the next cellphone part to be assembled (the motherboard).
Fig. 7.27: The architecture of an ABL agent (behavior library, working memory, sensors, world).
3 In the first version of ABL, the tree structure that stores all the goals is called “Active Behavior
Tree”. This tree is related to, but different from the BTs we cover in this book (e.g. no Fallbacks
and no ticks). Later work used the classic BT formulation also for ABL.
An agent running an ABL planner is called an ABL Agent. Figure 7.27 depicts the architecture of an ABL agent. The behavior library is a repository of pre-defined behaviors, where each behavior consists of a set of actions to execute to accomplish a goal (e.g., move to a given location). There are two kinds of behaviors in ABL, sequential behaviors and parallel behaviors. The working memory is a container for any information the agent must access during execution (e.g., the unit's position on the map). The sensors report information about changes in the world by writing that information into the working memory (e.g., when another agent is within sight). The tree (henceforth denoted ABL tree to avoid confusion) is an execution structure that describes how the agent will act, and it is dynamically extended. The ABL tree is initially defined as a collection of all the agent's goals, and is then recursively extended using a set of instructions that describe how to expand the tree. Figure 7.28 shows the initial ABL tree for the ABL Pac-Man agent. Below we describe the semantics of ABL tree instructions.
initial-tree {
    subgoal handleGhosts();
    subgoal eatAllPills();
}
Fig. 7.28: Example of an initial ABL tree instruction of the ABL agent for Pac-Man.
A spawngoal instruction is the key component for expanding the BT. It defines
the subgoals that must be accomplished to achieve a behavior.
Remark 7.3. The main difference between the instructions subgoal and spawngoal is that the spawngoal instruction is evaluated in a lazy fashion, expanding the tree only when the spawned goal is needed for the first time, whereas the subgoal instruction is evaluated in a greedy fashion, requiring the details on how to carry out the subgoal at design time.

parallel behavior eatAllPills(){
    mental act recordData();
    act exploreRoom();
}
Fig. 7.32: Example of a precondition instruction of the ABL agent for Pac-Man.

conflict keepDistanceFromDeadlyGhost followOptimalPath;

Fig. 7.33: Example of a conflict instruction of the ABL agent for Pac-Man.
A conflict instruction specifies a priority order to be used when two or more actions are scheduled for execution at the same time, using the same (virtual) actuator.
5   case PlaceholderNode do
6       node ← GetBT(node.goal)
7       Execute(node)
8   otherwise do
9       Tick(node)
The execution of the algorithm is simple. It first creates a BT from the initial ABL tree t, collecting all the subgoals in a BT Parallel composition (Algorithm 8, Line 1); then the tree is extended by finding a BT for each subgoal (Algorithm 8, Line 3). Each subgoal is translated into a corresponding BT node (Sequence or Parallel, according to the behavior in the behavior library) whose children are the instructions of the subgoal. If a behavior has precondition instructions, they are translated into BT Condition nodes and added first as children (Algorithm 9, Line 15). If a behavior has act or mental act instructions, they are translated into BT Action nodes (Algorithm 9, Lines 9-12) and set as children. If a behavior has a spawngoal instruction (Algorithm 9, Line 13), it is added as a placeholder node, which, when ticked, extends itself in the same way as the subgoals (Algorithm 10, Line 6).
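To make the translation step concrete, the following is a minimal sketch of the behavior-to-BT mapping described above (roughly Algorithm 9). The node classes, the dictionary encoding of behaviors, and all helper names are our own illustration, not part of the original algorithms.

# A minimal sketch of the ABL-behavior-to-BT translation (roughly Algorithm 9).
# The node classes and the dictionary encoding of behaviors are illustrative only.

class Action:                  # BT Action leaf, wrapping an ABL act / mental act
    def __init__(self, name): self.name = name

class Condition:               # BT Condition leaf, wrapping an ABL precondition
    def __init__(self, name): self.name = name

class Placeholder:             # lazily expanded node, wrapping an ABL spawngoal
    def __init__(self, goal): self.goal = goal

class Composite:               # Sequence or Parallel composition
    def __init__(self, kind, children):
        self.kind, self.children = kind, children

def behavior_to_bt(name, library):
    """Translate the ABL behavior `name` from the behavior library into a BT node."""
    behavior = library[name]
    kind = "Sequence" if behavior["type"] == "sequential" else "Parallel"
    # Preconditions become Condition nodes, added first as children.
    children = [Condition(c) for c in behavior.get("preconditions", [])]
    for instr, arg in behavior["instructions"]:
        if instr in ("act", "mental act"):
            children.append(Action(arg))                   # Action leaf
        elif instr == "subgoal":                           # greedy: expand now
            children.append(behavior_to_bt(arg, library))
        elif instr == "spawngoal":                         # lazy: expand on first tick
            children.append(Placeholder(arg))
    return Composite(kind, children)

library = {"eatAllPills": {"type": "parallel", "preconditions": [],
                           "instructions": [("mental act", "recordData"),
                                            ("act", "exploreRoom")]}}
root = behavior_to_bt("eatAllPills", library)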
We are now ready to see how the algorithm is executed in a simple Pac-Man game.
Example 7.10 (Simple Execution in Pac-Man). While Pac-Man has to avoid being eaten by the ghosts, he has to compute the path to take in order to eat all pills. The ABL tree of the Pac-Man agent is shown in Figure 7.34. Running Algorithm 8, the initial tree is translated into the BT in Figure 7.35. The subgoal eatAllPills is expanded as the Sequence of the two BT Action nodes computeOptimalPath and followOptimalPath. The subgoal handleGhosts is extended as a Sequence composition of the Condition node ghostClose (which is a precondition for handleGhosts) and a Parallel composition of the placeholder nodes handleDeadlyGhosts and handleScaredGhosts. The BT is ready to be executed. Let us imagine that for a while Pac-Man is free to eat pills without being disturbed by the ghosts. During this time the condition ghostClose is always false and the spawn of neither handleDeadlyGhosts nor handleScaredGhosts is invoked. Imagine now that a ghost is close for the first time. This will trigger the expansion of handleDeadlyGhosts and handleScaredGhosts. The expanded tree is shown in Figure 7.36.
pacman agent{
    initial-tree {
        subgoal handleGhosts();
        subgoal eatAllPills();
    }

    sequential behavior handleScaredGhost(){
        precondition {
            (scaredGhostClose);
        }
        act moveToScaredGhost();
    }

    sequential behavior handleDeadlyGhost(){
        precondition {
            (deadlyGhostClose);
        }
        act keepDistanceFromDeadlyGhost();
    }

    conflict keepDistanceFromDeadlyGhost moveToScaredGhost followOptimalPath;
}
Fig. 7.35: The BT obtained from the initial ABL tree of the Pac-Man agent, with the placeholder nodes Handle Scared Ghosts and Handle Deadly Ghosts not yet expanded.

Fig. 7.36: The BT of Figure 7.35 after the expansion of the placeholder nodes Handle Deadly Ghosts and Handle Scared Ghosts.
Fig. 7.37: A screenshot of StarCraft showing two players engaged in combat [75].
We will now very briefly describe the results from a more complex scenario,
from [75]. One of the most well known strategy computer games that require multi-
scale reasoning is the real-time strategy game StarCraft. In StarCraft the players
manage groups of units to compete for the control of the map by gathering re-
sources to produce buildings and units, and by researching technologies that un-
lock more advanced abilities. Building agents that perform well in this domain is
challenging due to the large decision space [1]. StarCraft is also a very fast-paced
game, with top players performing above 300 actions per minute during peak inten-
sity episodes [40]. This shows that a competitive agent for StarCraft must reason
quickly at multiple scales in order to achieve efficient game play.
Example 7.11. The StarCraft ABL agent is composed of three high-level managers:
Strategy manager, responsible for the strategy selection and attack timing compe-
tencies; Production manager, responsible for the worker units, resource collection,
and expansion; and Tactics manager, responsible for the combat tasks and micro-
management unit behaviors. The initial ABL tree takes the form of Figure 7.38.
initial-tree {
    subgoal ManageTactic();
    subgoal ManageProduction();
    subgoal ManageStrategy();
}
Further discussions of the specific managers and behaviors are available in [75],
and a portion of the BT after some rounds of the game is shown in Figure 7.39.
Fig. 7.39: A portion of the BT of the StarCraft ABL agent after some rounds of the game.
Table 7.1: Win rate on different map/race combinations over 20 trials [74].
In [74] the ABL agent of Example 7.11 was evaluated against the built-in StarCraft AI. The agent was tested on three professional gaming maps: Andromeda, Destination, and Heartbreak Ridge; against all three races: Protoss, Terran, and Zerg; over 20 trials. The results are shown in Table 7.1. The ABL agent scored an overall win rate of over 60%. Additionally, the agent was capable of performing over 200 game actions per minute, highlighting its ability to combine low-level tactical tasks with high-level strategic reasoning.
Remark 8.1. The use of BTs as the knowledge representation framework in the GP avoids the problem of logic violation during cross-over stressed in [19]. A logic violation occurs when, after the cross-over, the resulting individuals do not have a consistent logic structure. One example of this is the crossover combination of two FSMs, which might lead to a logic violation in terms of some transitions not leading to an existing state.

Fig. 8.1: Two BTs before (top) and after (bottom) a crossover, in which randomly chosen subtrees are swapped.
Fig. 8.2: The initial BT is a combination of the BT guaranteeing safety, and the BT Learn that will be expanded during the learning.
Example 8.1. Consider the case of the Mario AI setup in Figure 8.5a, starting with the BT in Figure 8.2.
The objective of Mario is to reach the rightmost end of the level. The safety BT is optional, but motivated by the need to enable guarantees that the agent avoids some regions of the state space that are known by the user to be unsafe. Thus, this is the only part of GP-BT that requires user input.
The unsafe regions must have a conservative margin to enable the safety action to act before it is too late (see Section 5.3). Thus we cannot use the enemies as unsafe regions, as Mario needs to move very close to them to complete the level. Therefore, for illustrative purposes, we let the safety action guarantee that Mario never reaches the leftmost wall of the level.
Mario starts very close to the leftmost wall, so the first thing that happens is that the safety action moves Mario a bit to the right. Then the Learn action is executed.
This action first checks all inputs and creates a BT of conditions, $T_{cond_\tau}$, that returns Success if and only if all inputs correspond to the current "situation", as will be explained below.
Then the learning action executes all single actions available to Mario, e.g., go left, go right, jump, fire, etc., and checks the resulting reward.
All actions yielding an increase in the reward are collected in a Fallback composition BT, $T_{acts_i}$, sorted with respect to performance, with the best performing action first.
If no single action results in a reward increase, the GP is invoked to find a combination of actions (arbitrary BTs are explored, possibly including Parallel, Sequence and Fallback nodes) that produces an increase. Given some realistic assumptions, described in [14], such a combination exists and will eventually be found by the GP according to previous results in [63], and is stored in $T_{acts_i}$.
Then, the condition BT, $T_{cond_\tau}$, is composed with the corresponding action BT, $T_{acts_i}$, in a Sequence node, and the result is added to the previously learned BT, with a higher priority than the learning action.
Finally, the new BT is executed until once again the learning action is invoked.
14   ρ ← GetReward(SequenceNode(T_safe, T))
15   while ρ < 1
16   return T
Situation
$S^{(t)} = [C_T^{(t)}, C_F^{(t)}]$ is the situation vector, where $C_T^{(t)} = \{C_{T1}, \ldots, C_{TN}\}$ is the set of conditions that are true at time $t$ and $C_F^{(t)} = \{C_{F1}, \ldots, C_{FM}\}$ is the set of conditions that are false at time $t$.
Then, using the analogy between AND-OR trees and BTs [12], we create the BT
that returns success only when a given situation occurs.
Fig. 8.3: Graphical representation of $T_{cond_\tau}$, with $c_{Fi} \in C_{F\tau}$ and $c_{Tj} \in C_{T\tau}$. The Decorator is the negation Decorator (i.e., it inverts the Success/Failure status). This BT returns Success only when all $c_{Tj}$ are true and all $c_{Fi}$ are false.
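A sketch of how such a condition tree can be assembled from a situation vector is shown below; the node classes and the example condition names are our own stand-ins.

# Sketch: assemble the condition BT of Figure 8.3 from a situation S = [C_T, C_F].

class Cond:                    # BT Condition leaf
    def __init__(self, name): self.name = name

class Negate:                  # negation Decorator: inverts Success/Failure
    def __init__(self, child): self.child = child

class Sequence:                # Sequence composition
    def __init__(self, children): self.children = children

def make_t_cond(true_conds, false_conds):
    """Returns a tree that succeeds iff every condition in true_conds holds
    and every condition in false_conds does not (cf. Figure 8.3)."""
    children = [Cond(c) for c in true_conds]
    children += [Negate(Cond(c)) for c in false_conds]
    return Sequence(children)

t_cond = make_t_cond(["Obstacle in Front"], ["The Sun is Shining"])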
$T_{acts_i}$
Given a small $\varepsilon > 0$, if at least one action results in an increase in reward, $\Delta\rho > \varepsilon$, let $A_{P_1}, \ldots, A_{P_{\tilde{N}}}$ be the list of actions that result in such an improvement, sorted on the size of $\Delta\rho$. Then $T_{acts_i}$ is defined as the Fallback composition with memory of these actions, $T_{acts_i} = \mathrm{Fallback}^*(A_{P_1}, \ldots, A_{P_{\tilde{N}}})$.
New BT
If $T_{acts_i}$ is not contained in the BT, the new BT is given by composing the Sequence of $T_{cond_i}$ and $T_{acts_i}$ with the previous tree in a Fallback, so that the new rule has higher priority than the learning action.
Else, if $T_{acts_i}$ is contained in the BT, i.e., there exists an index $j \neq i$ such that $T_{acts_i} = T_{acts_j}$, this means there is already a situation identified where $T_{acts_i}$ is the appropriate response. Then, to reduce the number of nodes in the BT, we generalize the two special cases where this response is needed. This is done by combining $T_{cond_i}$ with $T_{cond_j}$, that is, we find the BT, $T_{cond_{ij}}$, that returns Success if and only if $T_{cond_i}$ or $T_{cond_j}$ returns Success. Formally, this can be written as
$$T_{cond_{ij}} = \mathrm{Fallback}(T_{cond_i}, T_{cond_j}). \qquad (8.7)$$
Figure 8.4 shows an example of this simplifying procedure. As can be seen this
simplification generalizes the policy by iteratively removing conditions that are not
relevant for the application of the specific action. It is thus central for keeping the
number of nodes low, as seen below in Fig. 8.8.
Fig. 8.4: Example of the simplifying procedure in (8.7). Two learned rules, (a) the action Avoid Obstacle is executed if there is an obstacle in front and the sun is not shining, and (b) the action Avoid Obstacle is executed if there is an obstacle in front and the sun is shining, are combined into the simplified merged tree (c), where Avoid Obstacle is executed if there is an obstacle in front. The important condition appears to be Obstacle in Front, and there is no reason to check the condition The Sun is Shining. These simplifications generalize the policies, and keep the BT sizes down.
Given these definitions, we can go through the steps listed in Algorithm 11. Note that the agent runs $T_i$ until a new situation is encountered which requires learning an expanded BT, or the goal is reached.
The BT T is first initialized to be a single action, Action Learn, which will be used to trigger the learning algorithm. Running Algorithm 11, we execute the Sequence composition of the safe subtree $T_{safe}$ (generated manually, or using a non-learning approach) with the current tree T (Algorithm 11, Line 3). The execution of $T_{safe}$ overrides the execution of T when needed to guarantee safety. If the action Action Learn is executed, it means that the current situation is not considered in either $T_{safe}$ or T, hence a new action, or action composition, must be learned. The framework first starts with the greedy approach (Algorithm 11, Line 6), where it tries each action and stores the ones that increase the reward function; if no such actions are available (i.e., the reward value is at a local maximum), then the framework starts learning a BT composition using the GP component (Algorithm 11, Line 8). Once the tree $T_{acts}$ is computed, by either the greedy or the GP component, the algorithm checks whether $T_{acts}$ is already present in the BT as a response to another situation, in which case the new tree T can be simplified using a generalization of the two situations (Algorithm 11, Line 11). Otherwise, the new tree T is composed as the Fallback composition of the old T with the newly learned tree (Algorithm 11, Line 13). The algorithm runs until the goal is reached, and it is guaranteed to lead the agent to the goal under reasonable assumptions [14].
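The loop just described can be summarized in the following sketch of the outer algorithm; the BT node classes are minimal stand-ins and the learning helpers are passed in as functions, so only the control flow mirrors Algorithm 11.

# Sketch of the GP-BT outer loop (roughly Algorithm 11); all helpers are abstract.

class Sequence:
    def __init__(self, children): self.children = children

class Fallback:
    def __init__(self, children): self.children = children

def gp_bt(T_safe, learn_action, get_reward, get_situation,
          learn_single_action, learn_actions_using_gp,
          find_existing_response, merge_conditions):
    T = Fallback([learn_action])                  # initial tree: only Action Learn
    while True:
        rho = get_reward(Sequence([T_safe, T]))   # T_safe overrides T (Line 3)
        if rho >= 1:                              # goal reached (Lines 14-16)
            return T
        T_cond = get_situation()                  # condition tree, Eq. (8.3)
        T_acts = learn_single_action()            # greedy step (Line 6)
        if T_acts is None:                        # reward at a local maximum:
            T_acts = learn_actions_using_gp()     # invoke the GP (Line 8)
        existing = find_existing_response(T, T_acts)
        if existing is not None:                  # generalize two situations (Line 11)
            merge_conditions(existing, T_cond)
        else:                                     # new rule above Learn (Line 13)
            T.children.insert(-1, Sequence([T_cond, T_acts]))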
This function returns the tree $T_{cond}$, which represents the current situation. $T_{cond}$ is computed according to Equation (8.3).
This function returns the tree $T_{acts}$, which represents the action to execute whenever the situation described by $T_{cond}$ holds. $T_{acts}$ is a Fallback composition with memory of all the actions that, if performed when $T_{cond}$ holds, increase the reward. The function LearnSingleAction runs the same episode $N_a$ (the number of actions) times, executing a different action whenever $T_{cond}$ holds. When trying a new action, if the resulting reward increases, this action is stored. All the actions that lead to an increased reward are collected in a Fallback composition, ordered by reward value. This Fallback composition, if non-empty, is then returned to Algorithm 11.
If the resulting $T_{acts}$ is present in T (Algorithm 11, Line 9), this means that there exists another situation $S_{exist}$, described by the BT $T_{cond_{exist}}$, where the response in $T_{acts}$ is appropriate. To reduce the number of nodes in the updated tree, we create a new tree that captures both situations S and $S_{exist}$. This procedure removes from $T_{cond_{exist}}$ a single condition c that is present in $C_F$ for one situation (S or $S_{exist}$) and in $C_T$ for the other situation.
Remark 8.2. Note that the GP component is invoked exclusively whenever the
greedy component fails to find a single action.
8.2.4.1 Mario AI
so-called seeds for the learning episode and the validating episode. The results shown below are cross-validated in this way.
GP Parameters Whenever the GP part is invoked, it starts with 4 random BTs composed of one random control flow node and 2 random leaf nodes. The number of individuals in a generation is set to 25.
Scenarios We ran the algorithm in five different scenarios of increasing difficulty. The first scenario had no enemies and no gaps, thus only requiring motion to the right and jumping at the proper places. The resulting BT can be seen in Figure 8.7a, where the action Jump is executed if an obstacle is in front of Mario and the action Go Right is executed otherwise. The second scenario has no obstacles but it has gaps. The resulting BT can be seen in Figure 8.7b, where the action Jump is executed if Mario is close to a gap. The third scenario has high obstacles, gaps and walking enemies. The resulting BT can be seen in Figure 8.7c; it is similar to a combination of the previous BTs, with the addition of the action Shoot, executed as soon as Mario sees an enemy (cell 14), and with the caveat that to Jump over higher obstacles Mario cannot be too close to them. Note that to be able to show the BTs in a limited space, we used the pruning procedure mentioned in Section 8.2.3. A video is available that shows the performance of the algorithm in all 5 scenarios.1
1 https://youtu.be/QO0VtUYkoNQ
Fig. 8.7: The BTs learned in the first (a), second (b), and third (c) scenarios, built from the actions Go Right, Jump, and Shoot, and conditions on obstacles in cells 8, 12, 16, and 17 and an enemy in cell 14.
the number of conditions is larger than the number of actions; hence the pure GP approach often constructs BTs that check a large number of conditions, while performing very few actions, or even none. Without a greedy component and the AND-OR-tree generalization of the conditions, a pure GP approach like the one in [66] has difficulties without any a priori information. Looking at the performance in Figure 8.8, we see that GP-BT is the only approach that reaches a reward of 1 (i.e., the task is completed) within the given execution time, not only for Scenario 5, but also for the less complex Scenario 1.
Remark 8.3. Note that we do not compare GP-BT with the entries of the Mario AI challenge, as we study problems with no a priori information or model, whereas the challenge provided full information about the task and environment. When the GP-BT learning procedure starts, the agent (Mario) does not even know that the enemies should be killed or avoided.
The other scenarios gave similar results. We chose to depict the simplest and the most complex ones.
Fig. 8.8: Reward value comparison (a and b) and node number comparison (c and d). The blue solid line refers to GP-BT, the red dash-dotted line to the pure GP-based algorithm, and the green dashed line to the FSM-based algorithm.
As mentioned above, we use the KUKA youBot to verify GP-BT in a real scenario. We consider three scenarios, one with a partially known environment and two with completely unknown environments.
Consider the youBot in Figure 8.5b: the conditions are given in terms of the 10 receptive fields and binary conditions regarding a number of different objects, e.g., larger or smaller obstacles. The corresponding actions are: go left/right/forward, push object, pick object up, etc. Again, the problem is to learn a switching policy mapping conditions to actions.
Setup The robot is equipped with a wide range HD camera and uses markers
to recognize the objects nearby. The recognized objects are mapped into the robot
simulation environment V-REP [18]. The learning procedure is first tested on the
simulation environment and then executed on the real robot.
Actions Move Forward, Move Left, Move Right, Fetch Object, Slide Object to
the Side, Push Object.
Conditions Wall on the Left, Wall on the Right, Glass in Front, Glass on the
Left, Glass on the Right, Cylinder in Front, Cylinder on the Left, Cylinder on the
Right, Ball in Front, Ball on the Left, Ball on the Right, Big Object in Front, Big
Object on the Left, Big Object on the Right.
Scenarios In the first scenario, the robot has to traverse a corridor, dealing with different objects that are encountered on the way. The destination and the positions of the walls are known a priori for simplicity. The other objects are recognized and mapped once they enter the field of view of the camera. The second scenario illustrates the reason why GP-BT performs the learning procedure for each different situation: the same type of cylinder is dealt with differently in two different situations.
In the third scenario, a single action is not sufficient to increase the reward. The
robot has to learn an action composition using GP to reach the goal.
A YouTube video is available that shows all three scenarios in detail.2
The Q-function is updated as
$$Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha_k \big( r + \gamma \max_{a} Q_k(s_{k+1}, a) - Q_k(s_k, a_k) \big),$$
where k is the iteration index (increasing every time the agent receives an update), r is the reward, $\gamma$ is the discount factor that trades off the influence of early versus late rewards, and $\alpha_k$ is the learning rate that trades off the influence of newly acquired information versus old information.
The algorithm converges to an optimal policy that maximizes the reward if all
admissible state-action pairs are updated infinitely often [69, 3]. At each state, the
optimal policy selects the action that maximizes the Q value.
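For reference, a tabular version of this update fits in a few lines; this is a generic sketch of standard Q-learning with epsilon-greedy exploration, not code from [57].

import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, action)], implicitly 0 at start

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def choose_action(s, actions, eps=0.1):
    """Epsilon-greedy action selection over the learned Q values."""
    if random.random() < eps:
        return random.choice(actions)                   # explore
    return max(actions, key=lambda a: Q[(s, a)])        # exploit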
Hierarchical Reinforcement Learning
The vast majority of RL algorithms are based upon the MDP model above. How-
ever, Hierarchical RL (HRL) algorithms have their basis in Semi MDPs (SMDPs) [3].
SMDPs enable a special treatment of the temporal aspects of the problem and,
thanks to their hierarchical structure, reduce the impact of the curse of dimension-
ality by dividing the main problem into several subproblems.
Finally, the option framework [70] is an SMDP-based HRL approach used to efficiently compute the Q-function. In this approach, options are a generalization of actions that can call other options upon execution, in a hierarchical fashion, until a primitive option (an action executable in the current state) is found.
The approach proposed in [57] starts from a manually designed BT where a subset of the nodes are replaced with so-called Learning Nodes, which can be both Action and flow control nodes.
In the Learning Action node, the node encapsulates a Q-learning algorithm, with states s, actions A, and reward r defined by the user. For example, imagine a robot with an Action node that needs to "pick an object". This Learning Action node uses Q-learning to learn how to grasp an object in different positions. Then, the state is defined as the pose of the object with respect to the robot's frame, the actions as the different grasp poses, and the reward function as the grasp quality.
In the Learning Control Flow Node, the node is a standard Fallback with flexible node ordering, quite similar to the ideas described in Chapter 4. Here, the designer chooses the representation for the state s, while the actions in A are the children of the Fallback node, which can be learning nodes or regular nodes. Hence, given a state s, the Learning Control Flow Node selects the order of its own children based on the reward. The reward function can be task-dependent (i.e., the user finds a measure related to the specific task) or return-value-dependent (i.e., it gives a positive reward for Success and a negative reward for Failure). For example, consider an NPC that has three main goals: find resources, hide from stronger players, and attack weaker ones. The Learning Control Flow node has 3 subtrees, one for each goal. Hence the state s can be a vector collecting information about the players' positions, weapon positions, health level, etc. The reward function can be a combination of health level, score, etc.
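A minimal sketch of such a node is given below: it keeps one value per (state, child) pair and ticks its children in decreasing order of value, using the return-value-dependent reward mentioned above. The class and its tick protocol are our own simplification of the node described in [57].

from collections import defaultdict

class LearningFallback:
    """Fallback with flexible, learned child ordering (a simplified sketch):
    children are ticked in decreasing order of Q(state, child), and Q is
    updated from the return status (+1 for Success, -1 for Failure)."""
    def __init__(self, children, alpha=0.1):
        self.children, self.alpha = children, alpha
        self.Q = defaultdict(float)               # Q[(state, child_index)]

    def tick(self, state):
        order = sorted(range(len(self.children)),
                       key=lambda i: self.Q[(state, i)], reverse=True)
        for i in order:
            status = self.children[i].tick(state)
            reward = {"Success": 1.0, "Failure": -1.0}.get(status, 0.0)
            self.Q[(state, i)] += self.alpha * (reward - self.Q[(state, i)])
            if status in ("Success", "Running"):  # Fallback semantics
                return status
        return "Failure"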
In [57], a formal definition of the learning node is given, with analogies to the Options approach of HRL for computing the Q-function, exploiting the connection between the recursive structure of BTs and the recursive nature of that approach.
Fig. 8.9: The BT model for the first agent. Only a single Action node (Use Extinguisher X) has learning capabilities.
Figure 8.9 depicts the BT modeling the behavior of the learning agent. In the learning Action node Use Extinguisher X, the state is defined as $s = \langle \text{fire type} \rangle$, where fire type = {1, 2, 3}, and the available actions are A = {Use Extinguisher A, Use Extinguisher B, Use Extinguisher C}. The reward is defined as 10 if the extinguisher can put out the fire and −10 otherwise. The results in [57] show that the accuracy converges to 100%.
Scenario 2
This scenario is a more complex version of the one above, where we consider the
time spent to execute an action.
The actions Save Victim and Use Extinguisher X now take time to complete, depending on the fire intensity. Any given fire has an intensity fire intensity ∈ {1, 2, 3}, chosen randomly for each room. The fire intensity specifies the number of time steps needed to extinguish the fire, which is extinguished when its intensity is reduced to 0.
The fire intensity is reduced by 1 each time the agent uses the correct extinguisher. The action Change Room is still executed instantly, and the use of the wrong extinguisher makes the agent lose the room.
Fig. 8.10: The BT model for the second agent, using two nested Learning Nodes.
Figure 8.10 shows the BT modeling the learning agent. It uses 2 learning nodes. The first, similar to the one used in Scenario 1, is a learning Action node. In that node, the state is defined as $s = \langle \text{fire type} \rangle$, where fire type = {1, 2, 3}, the action set as a = {A, B, C}, and the reward as 10/fire intensity if the extinguisher can extinguish the fire and −10 otherwise.
The second learning node is a learning control flow node. It learns the behavior that must be executed given the state $s = \langle \text{has victim?}, \text{has fire?} \rangle$. The node's children are the actions A = {save victim, use extinguisher X, change room}. This learning node receives a cumulative reward: −10 if the node tries to save and there is no victim, −1 while saving the victim, and +10 when the victim is saved; −10 if trying to extinguish a non-existing fire, −1 while extinguishing it, and +10 if the fire is extinguished; +10 when the agent leaves the room at the right moment and −10 otherwise.
The results in [57] show that the accuracy converges to 97-99%. The deviation
from 100% is due to the fact that the learning control flow node needs some steps of
trial-and-error to learn the most effective action order.
In robotics, the Intera 5 software for the Baxter and Sawyer robots provides learning from demonstration support.3
In computer games, one can imagine a game designer controlling the NPC dur-
ing a training session in the game, demonstrating the expected behavior for that
character in different situations. One such approach called Trained BTs (TBTs) was
proposed in [64].
TBT applies the following approach: it first records traces of actions executed
by the designer controlling the NPC to be programmed, then it processes the traces
to generate a BT that can control the same NPC in the game. The resulting BT can
then be further edited with a visual editor. The process starts with the creation of a
minimal BT using the provided BT editor. This initial BT contains a special node
called a Trainer Node (TN). Data for this node are collected in a game session where
the designer simulates the intended behavior of the NPC. Then, the trainer node is
replaced by a machine-generated sub-behavior that, when reached, selects the task
that best fits the actual state of the game and the traces stored during the training
session. The approach combines the advantages of programming by demonstration
with the ability to fine-tune the learned BT.
Unfortunately, with learning from demonstration approaches the learned BT easily becomes very large, as each trace is directly mapped into a new sub-BT. Recent approaches address this problem by finding common patterns in the BT and generalizing them [62]. First, a maximally specific BT is created from the given traces. Then the BT is iteratively reduced in size by finding and combining common patterns of actions. The process stops when the BT has no common patterns. Reducing the size of the BT also improves readability.
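As a toy stand-in for the pattern-finding step, the sketch below locates the most frequent length-n action pattern across demonstration traces and factors it out into a shared, named subtree; the actual procedure in [62] is more elaborate.

from collections import Counter

def most_common_pattern(traces, n=2):
    """Return the most frequent length-n action pattern across traces, or None."""
    counts = Counter()
    for trace in traces:
        for i in range(len(trace) - n + 1):
            counts[tuple(trace[i:i + n])] += 1
    if not counts:
        return None
    pattern, freq = counts.most_common(1)[0]
    return pattern if freq > 1 else None

def factor_out(trace, pattern, name):
    """Replace occurrences of the pattern with a reference to a shared subtree."""
    out, i, n = [], 0, len(pattern)
    while i < len(trace):
        if tuple(trace[i:i + n]) == pattern:
            out.append(name)
            i += n
        else:
            out.append(trace[i])
            i += 1
    return out

traces = [["goto", "grasp", "lift", "goto", "grasp", "lift"],
          ["scan", "goto", "grasp", "lift"]]
p = most_common_pattern(traces)
reduced = [factor_out(t, p, "PickSubtree") for t in traces]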
3 http://mfg.rethinkrobotics.com/intera/Building_a_Behavior_Tree_
Using_the_Robot_Screen
Chapter 9
Stochastic Behavior Trees
In this chapter, we study the reliability of reactive plan executions, in terms of execu-
tion times and success probabilities. To clarify what we mean by these concepts, we
consider the following minimalistic example: a robot is searching for an object, and
can choose between the two subtasks searching on the table, and opening/search-
ing the drawer. One possible plan is depicted in Figure 9.1. Here, the robot first
searches the table and then, if the object was not found on the table, opens the
drawer and searches it. In the figure, each task has an execution time and a success
probability. For example, searching the table has a success probability of 0.1 and an
execution time of 5s. Given a plan like this, it is fairly straightforward to compute
the reliability of the entire plan, in terms of execution time distribution and success
probability. In this chapter, we show how to compute such performance metrics for arbitrarily complex plans encoded using BTs. In particular, we will define Stochastic BTs in Section 9.1, transform them into Discrete Time Markov Chains (DTMCs) in Section 9.2, compute reliabilities in Section 9.3, and describe examples in Section 9.4.
Some of the results of this chapter were previously published in the paper [11].
Before motivating our study of BTs we will make a few more observations re-
garding the example above. The ordering of the children of Fallback nodes (search-
ing on the table and opening/searching the drawer) can in general be changed,
whereas the ordering of the children of Sequence nodes (opening the drawer and
searching the drawer) cannot. Note also that adding subtasks to a Sequence gen-
erally decreases overall success probabilities, whereas adding Fallbacks generally
increases overall success probabilities, as described in Section 4.2.
Fig. 9.1: A simple plan for a search task, modelled by a Markov chain: Search Table (T=5s, success probability 0.1) followed, on failure, by Open Drawer (T=10s, success probability 0.9) and Search Drawer (T=10s, success probability 0.9).
To address the questions above, we need to introduce some concepts from Markov theory.
Fig. 9.2: The BT equivalent of the Markov chain in Figure 9.1. The atomic actions are the leaves of the tree, while the interior nodes correspond to Sequence compositions (arrow) and Fallback compositions (question mark).
Fig. 9.3: An example DTMC with states $s_1, \ldots, s_4$ and transition probabilities $p_{12}$, $p_{21}$, $p_{14}$, $p_{34}$.

Definition 9.1. The stochastic sequence $\{X_n, n \geq 0\}$ is a DTMC provided that
$$P[X_{n+1} = s_{n+1} \mid X_n = s_n, X_{n-1} = s_{n-1}, \ldots, X_0 = s_0] = P[X_{n+1} = s_{n+1} \mid X_n = s_n] \qquad (9.1)$$
$\forall n \in \mathbb{N}$, and $\forall s \in S$.
The expression on the right hand side of (9.1) is the so-called one-step transition probability of the chain and denotes the probability that the process goes from state $s_n$ to state $s_{n+1}$. We use the following notation:
$$p_{ij} = P[X_{n+1} = s_j \mid X_n = s_i] \qquad (9.2)$$
to denote the probability to jump from a state $s_i$ to a state $s_j$. Since we only consider homogeneous DTMCs, the above probabilities do not change in time.
Definition 9.2. The one-step transition matrix P is a |S | × |S | matrix in which the
entries are the transition probabilities pi j .
Let $\pi(k) = [\pi_1(k), \ldots, \pi_{|S|}(k)]^\top$, where $\pi_i$ is the probability of being in state $i$; then the Markov process can be described as a discrete time system with the following time evolution:
$$\pi(k+1) = P^\top \pi(k), \qquad \pi(0) = \pi_0, \qquad (9.3)$$
where $\pi_0$ is assumed to be known a priori.
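For a quick numerical illustration of (9.3), the evolution can be computed directly; the transition matrix below is made up for the example.

import numpy as np

# Evolve pi(k+1) = P^T pi(k) for a made-up 3-state chain; state 2 is absorbing.
P = np.array([[0.2, 0.5, 0.3],
              [0.0, 1.0, 0.0],
              [0.1, 0.0, 0.9]])
pi = np.array([1.0, 0.0, 0.0])        # pi(0): start in state 1
for _ in range(100):
    pi = P.T @ pi
print(pi)                             # mass accumulates in the absorbing state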
Definition 9.3. The stochastic sequence $\{X(t), t \geq 0\}$ is a CTMC provided that
$$P[X(t_{n+1}) = s_{n+1} \mid X(t_n) = s_n, \ldots, X(t_0) = s_0] = P[X(t_{n+1}) = s_{n+1} \mid X(t_n) = s_n] \qquad (9.4)$$
$\forall n \in \mathbb{N}$, $\forall s \in S$, and for all sequences $\{t_0, t_1, \ldots, t_{n+1}\}$ such that $t_0 < t_1 < \ldots < t_n < t_{n+1}$. We use the following notation:
$$p_{ij}(\tau) = P[X(t + \tau) = s_j \mid X(t) = s_i] \qquad (9.5)$$
to denote the probability of being in a state $s_j$ after a time interval of length $\tau$, given that at the present time the process is in state $s_i$. Since we only consider homogeneous CTMCs, the above probabilities only depend on the time length $\tau$.
To study the continuous time behavior of a Markov process we define the so-
called infinitesimal generator matrix Q.
Definition 9.4. The infinitesimal generator of the transition probability matrix $P(t)$ is given by:
$$Q = [q_{ij}] \qquad (9.6)$$
where
$$q_{ij} = \begin{cases} \lim_{\Delta t \to 0} \dfrac{p_{ij}(\Delta t)}{\Delta t} & \text{if } i \neq j \\[4pt] -\sum_{k \neq i} q_{ik} & \text{otherwise.} \end{cases} \qquad (9.7)$$
Then, the continuous time behavior of the Markov process is described by the following ordinary differential equation, known as the Cauchy problem:
$$\dot{\pi}(t) = Q^\top \pi(t), \qquad \pi(0) = \pi_0. \qquad (9.8)$$
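Numerically, (9.8) is solved by the matrix exponential, $\pi(t) = e^{Q^\top t} \pi_0$; a minimal sketch follows, with a made-up two-state generator.

import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.5, 0.5],
              [ 0.0, 0.0]])           # rows sum to zero; state 2 is absorbing
pi0 = np.array([1.0, 0.0])
for t in (1.0, 5.0, 20.0):
    print(t, expm(Q.T * t) @ pi0)     # probability mass drains into state 2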
The average sojourn time of a state $s_i$, i.e., the mean time spent in the state before leaving it, is given by
$$SJ_i = -\frac{1}{q_{ii}}. \qquad (9.9)$$
Definition 9.6. Considering the CTMC {X(t),t ≥ 0}, the stochastic sequence {Yn , n =
0, 1, 2, . . .} is a DTMC and it is called Embedded MC (EMC) of the process
X(t) [68].
On the other hand, the infinitesimal generator matrix Q can be reconstructed from the EMC as follows:
$$q_{ij} = \begin{cases} \dfrac{1}{SJ_i} r_{ij} & \text{if } i \neq j \\[4pt] -\sum_{k \neq i} q_{ik} & \text{otherwise,} \end{cases} \qquad (9.12)$$
where $r_{ij}$ are the transition probabilities of the EMC.
9.1.2 Formulation
We are now ready to make some definitions and assumptions, needed to compute
the performance estimates.
1 The execution of the parent node starts when it receives a tick and finishes when it returns either Success or Failure to its parent.
where $\delta(\cdot)$ is the Dirac delta function. From the PDFs we can calculate the CDFs:
Remark 9.1. Note that it makes sense to sometimes have τs 6= τ f . Imagine a door
opening task which takes 10s to complete successfully but fails 30% of the time
after 5s when the critical grasp phase fails.
Example 9.1. For comparison, given a deterministic action with success time $\tau_s$, we let the rates of a stochastic action be $\mu = \tau_s^{-1}$. The resulting PDFs and CDFs are shown in Figure 9.4.
Fig. 9.4: Probability density (a) and cumulative distribution (b) functions for a deterministic action (dark straight lines) and a stochastic action (bright curves).
Definition 9.9. An action A in a BT is called hybrid if one of $p_s(t)$ and $p_f(t)$ is a random variable with exponential distribution, and the other one is deterministic.
In this case the CDF and the PDF of the probability to succeed are discontinuous. In fact, a hybrid action will return Failure if, after the success time $\tau_s$, it has not returned Success. Then, to have an analogy with stochastic actions, we derive the PDF and CDFs of the probabilities to fail and succeed:
$$\hat{p}_f(t) = p_f \, \delta(t - \tau_f) \qquad (9.26)$$
$$\bar{p}_f(t) = p_f \, H(t - \tau_f) \qquad (9.27)$$
$$\bar{p}_s(t) = \begin{cases} p_s (1 - e^{-\mu t}) & \text{if } t < \tau_f \\ 1 - \bar{p}_f(t) & \text{otherwise,} \end{cases} \qquad (9.28)$$
where $H(\cdot)$ is the Heaviside step function.
Remark 9.2. Note that the addition of deterministic execution times makes (9.8)
discontinuous on the right hand side, but it still has a unique solution in the
Carathéodory sense [16].
We will now give an example of how these concepts transfer over BT composi-
tions.
Fig. 9.5: A Parallel node with one deterministic child action and one stochastic child action.
Example 9.2. Consider the BT in Figure 9.5. The Parallel node is set to return Success as soon as one child returns Success, and the two children are of different kinds, one deterministic and the other stochastic. Note that the MTTS and MTTF of this BT have to account for the heterogeneity of its children. The deterministic child can succeed only at time $\tau_s$. The CDF of the Parallel node is given by the sum of the CDFs of its children. The PDF has a jump at time $\tau_s$, accounting for the fact that the Parallel node is more likely to return Success after that time. The PDF and the CDF of a Success return status are shown in Figure 9.6.
Definition 9.10. A BT $T_1$ and a BT $T_2$ are said to be equivalent if and only if $T_1$ can be created from $T_2$ by permutations of the children of Fallback and Parallel compositions.
Fig. 9.6: Probability density (a) and cumulative distribution (b) functions of Success of the Parallel node in Figure 9.5.
Assumption 9.1 For each action A in the BT, one of the following holds
• The action A is a stochastic action.
• The action A is a deterministic action.
• The action A is a hybrid action.
Assumption 9.2 For each condition C in the BT, the following holds
• It consistently returns the same value (Success or Failure) throughout the execu-
tion of its parent node.
• The probability to succeed at any given time ps (t) and the probability to fail at
any given time p f (t) are known a priori.
Given an SBT, we want to use the probabilistic descriptions of its actions and conditions, $p_s(t)$, $p_f(t)$, $\mu$ and $\nu$, to recursively compute analogous descriptions for every subtree, and finally for the whole tree.
To illustrate the investigated problems and SBTs we take a look at the following
example.
Example 9.3. Imagine a robot that is to search for a set of keys on a table and in
a drawer. The robot knows that the keys are often located in the drawer, so that
location is more likely than the table. However, searching the table takes less time,
since the drawer must be opened first. Two possible plans are conceivable: searching
the table first, and then the drawer, as in Figure 9.7a, or the other way around as in
Figure 9.7b. These two plans can be formulated as SBTs and analyzed through the
scope of Problems 1 and 2, using the results of Section 9.1 below. Depending on the
user requirements in terms of available time or desired reliability at a given time,
the proper SBT can be chosen.
Fig. 9.7: BT modeling of two plan options. In (a), the robot searches on the table first, and in the drawer only if the table search fails. In (b), the table is searched only if nothing is found in the drawer.
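As a point-value preview of the analysis to come, the two orderings in Figure 9.7 can be compared directly using the numbers from Figure 9.1; the sketch below ignores the full time distributions of Section 9.1 and only compares success probabilities and mean times.

# Point-value comparison of the two plans in Figure 9.7, using the values
# from Figure 9.1 (table: 0.1 success, 5 s; open drawer: 0.9 success, 10 s;
# search drawer: 0.9 success, 10 s).

def table_first():
    p = 0.1 + 0.9 * (0.9 * 0.9)             # success probability
    t = 5 + 0.9 * (10 + 0.9 * 10)           # mean time actually spent
    return p, t

def drawer_first():
    p = 0.9 * 0.9 + (1 - 0.9 * 0.9) * 0.1
    t = 10 + 0.9 * 10 + (1 - 0.9 * 0.9) * 5
    return p, t

print(table_first())    # (0.829, 22.1)
print(drawer_first())   # (0.829, 19.95)

Note that both orderings give the same asymptotic success probability, as expected for equivalent BTs (Definition 9.10), while the mean times differ.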
Remark 9.3. Note that Assumption 9.1 corresponds to the return statuses of the search actions in Example 9.3 behaving in a reasonable way, e.g., not switching between Success and Failure.
in a recursive fashion, beginning with the leaves of the BT, i.e. the actions and con-
ditions which have known probabilistic parameters according to Assumptions 9.1
and 9.2, and then progressing upwards in a scalable fashion.
To keep track of the execution of a given flow control node, the children outcomes are collected in a vector state called the marking of the node, and the transitions between markings are defined according to the execution policy of the node. In detail, let $m(k) = [m_1(k), m_2(k), \ldots, m_N(k)]$ be a marking of a given BT node with N children at time step k, with
$$m_i(k) = \begin{cases} -1 & \text{if child } i \text{ returns Failure at } k \\ 1 & \text{if child } i \text{ returns Success at } k \\ 0 & \text{otherwise.} \end{cases} \qquad (9.29)$$
Example 9.4. Consider the BT in Figure 9.7a. If the first child (Search Table) has failed, and the second (Search Drawer) is currently running, the marking would be $m(k) = [-1, 0]$.
We define an event related to a BT node when one of its children returns either Success or Failure. Define $e_i(k)$ to be the vector associated with the event of the i-th running child, all zeros except the i-th entry, which is equal to $e_i(k) \in \{-1, 1\}$:
$$e_i(k) = \begin{cases} -1 & \text{if child } i \text{ has failed at } k \\ 1 & \text{if child } i \text{ has succeeded at } k. \end{cases} \qquad (9.30)$$
We would like to describe the time evolution of the node marking due to an event associated with child i as follows:
$$m(k+1) = m(k) + e_i(k), \qquad (9.31)$$
with the event $e_i(k)$ restricted to the feasible set of events at $m(k)$, $e_i(k) \in F(m(k))$,
i.e. events having only one nonzero element, with value −1 or 1. We will now de-
scribe the set F (m(k)) for the three different node types.
Fig. 9.8: MRG of the Sequence node (rectangles) with N children and its DTMC representation (circles).
We can then map the node execution to a DTMC where the states are the markings in the MRG and the one-step transition matrix P is given by the probabilities of jumps between markings, with off-diagonal entries defined as
$$p_{ij} = \begin{cases} \tilde{p}_{s_h} & \text{if } m_j - m_i \in F(m_i) \text{ and } e_h^\top (m_j - m_i) > 0 \\ \tilde{p}_{f_h} & \text{if } m_j - m_i \in F(m_i) \text{ and } e_h^\top (m_j - m_i) < 0, \end{cases}$$
and diagonal entries
$$p_{ii} = 1 - \sum_{j \neq i} p_{ij}, \qquad (9.37)$$
with
$$\tilde{p}_{s_h} = \frac{p_{s_h} \mu_h \nu_h}{p_{f_h} \mu_h + p_{s_h} \nu_h} \cdot \left( \sum_{j : e_j \in F(m_i)} \frac{\mu_j \nu_j}{p_{f_j} \mu_j + p_{s_j} \nu_j} \right)^{-1} \qquad (9.38)$$
and
$$\tilde{p}_{f_h} = \frac{p_{f_h} \mu_h \nu_h}{p_{f_h} \mu_h + p_{s_h} \nu_h} \cdot \left( \sum_{j : e_j \in F(m_i)} \frac{\mu_j \nu_j}{p_{f_j} \mu_j + p_{s_j} \nu_j} \right)^{-1}. \qquad (9.39)$$
Fig. 9.9: MRG of the Fallback node (rectangles) with N children and its DTMC representation (circles).
Fig. 9.10: MRG of the Parallel node (rectangles) with 2 children and its DTMC representation (circles).
Remark 9.5. For Sequence and Fallback nodes the following holds: $\tilde{p}_{s_h} = p_{s_h}$ and $\tilde{p}_{f_h} = p_{f_h}$.
In Figures 9.8 and 9.9 the mappings from MRG to DTMC for a Sequence node and a Fallback node are shown. In Figure 9.10 the mapping for a Parallel node with two children and M = 2 is shown. We choose not to depict the mapping of a general Parallel node, due to its large number of states and possible transitions between them.
To obtain the continuous time probability vector $\pi(t)$ we need to compute the infinitesimal generator matrix Q associated with the BT node. To do so, we construct a CTMC for which the EMC is the DTMC of the BT node computed above. According to (9.7), the map from the EMC to the related CTMC is direct, given the average sojourn times $SJ_i$.
Lemma 9.1. The average sojourn time of a marking $m_i$ is given by
$$SJ_i = \left( \sum_{h : e_h \in F(m_i)} \left( \frac{p_{s_h}}{\mu_h} + \frac{p_{f_h}}{\nu_h} \right)^{-1} \right)^{-1}. \qquad (9.40)$$
Proof. In each marking, one of the following occurs: the running child h fails or succeeds. To take into account both the probabilities and the time rates that influence the average sojourn time, we describe the child execution using an additional CTMC, depicted in Figure 9.11.
According to (9.9), the average sojourn time of the running state of child h is:
$$\tau_i = \frac{p_{f_h} \mu_h + p_{s_h} \nu_h}{\nu_h \mu_h} = \frac{p_{s_h}}{\mu_h} + \frac{p_{f_h}}{\nu_h} \qquad (9.41)$$
and the rate of leaving that state is $\tau_i^{-1}$. To account for all the possible outcomes of the running children, e.g., in a Parallel node, we consider all the rates associated with the running children. The rate of such a node is the sum of all the rates $\tau_i^{-1}$ associated with the running children. Finally, the average sojourn time of a marking $m_i$ is given by the inverse of the combined rate:
$$\frac{1}{SJ_i} = \sum_{h : e_h \in F(m_i)} \frac{1}{\frac{p_{s_h}}{\mu_h} + \frac{p_{f_h}}{\nu_h}}. \qquad (9.42)$$
Fig. 9.11: CTMC describing the execution of a single child, with states Running, Failure and Success.
Remark 9.6. The EMC associated with the CTMC in Figure 9.11 is depicted in Fig-
ure 9.12. It describes the child’s execution as a DTMC.
Fig. 9.12: The EMC associated with the CTMC in Figure 9.11, describing the child's execution as a DTMC, with transition probabilities $p_{f_h}$ to Failure and $p_{s_h}$ to Success.
Lemma 9.2. Let A be a matrix whose ij-th entry is defined as $\exp(t_{ij})$, where $t_{ij}$ is the time needed to transit from a state j to a state i if j, i are neighbors in the MRG, and 0 otherwise. The MTTF and MTTS of the BT node can be computed as follows:
$$MTTF = \frac{\sum_{i=1}^{|S_F|} u^F_{i1} \log(h^F_{i1})}{\sum_{i=1}^{|S_F|} u^F_{i1}} \qquad (9.44)$$
where
$$H^F \triangleq A^F \sum_{i=0}^{\infty} A_T^i, \qquad (9.45)$$
and
$$MTTS = \frac{\sum_{i=1}^{|S_S|} u^S_{i1} \log(h^S_{i1})}{\sum_{i=1}^{|S_S|} u^S_{i1}} \qquad (9.46)$$
where
$$H^S \triangleq A^S \sum_{i=0}^{\infty} A_T^i. \qquad (9.47)$$
Proof. Failure and success states are absorbing, hence we focus our attention on the probability of leaving a transient state, described by the matrix U, defined below:
$$U = \sum_{k=0}^{\infty} T^k. \qquad (9.49)$$
Thus, considering i as the initial transient state, the entry $u_{ij}$ is the mean number of visits to j, starting from i, before being absorbed. We have to distinguish the case in which the absorbing state is a failure state from the case in which it is a success state:
$$U^F \triangleq R^F U \qquad (9.50)$$
$$U^S \triangleq R^S U. \qquad (9.51)$$
Equations (9.50) and (9.51) represent the mean numbers of visits before being absorbed in a failure or success state, respectively.
To derive the MTTF/MTTS we take into account the mean time needed to reach every single failure/success state with its probability, normalized over the probability of reaching any failure (success) state, starting from the initial state. Hence we sum the probabilities of reaching a state starting from the initial one, taking into account only the first column of the matrices, obtaining (9.44) and (9.46).
Remark 9.7. Since there are no self loops among the transient states of the DTMC above, the matrix T is nilpotent. Hence $u_{ij}$ is finite $\forall i, j$.
where $\pi(t)$ is the probability vector of the DTMC related to the node (i.e., the solution of (9.8)). The time to succeed (fail) for a node is given by a random variable with exponential distribution and rate given by the inverse of the MTTS (MTTF), since for such random variables the mean time is given by the inverse of the rate:
$$\mu = MTTS^{-1} \qquad (9.54)$$
$$\nu = MTTF^{-1} \qquad (9.55)$$
Remark 9.8. Proposition 9.1 holds also for deterministic and hybrid BTs, as (9.8) has a unique solution in the Carathéodory sense [16].
Consider the BT
$$T = \mathrm{Fallback}(A_1, A_2) \qquad (9.56)$$
depicted in Figure 9.13, and let $\tau_{F_i}$ ($\tau_{S_i}$) be the MTTF (MTTS) of action i and $p_{f_i}$ ($p_{s_i}$) its probability to fail (succeed). The success/failure probability over time of the tree T is a discontinuous function, depicted in Figure 9.14.
Fig. 9.13: A Fallback composition of two deterministic actions.
Hence the success and failure probabilities have discrete jumps over time. These piece-wise continuous functions can be described by the discrete time system (9.3), introducing the information of the times when the transitions take place, which is more tractable than directly solving (9.8). The calculation of $\pi(t)$ is then given by a zero-order hold of the discrete solution.
Fig. 9.14: Failure (red, lower) and success (green, upper) probability of the deterministic node of the example. The running probability is the complement of the other two (not shown).
Proposition 9.2. Let P be the one-step transition matrix given in Definition 9.2 and let $\tau_{F_i}$ ($\tau_{S_i}$) be the time to fail (succeed) of action i and $p_{f_i}$ ($p_{s_i}$) its probability to fail (succeed). Let $\tilde{\pi}(\tau) = [\tilde{\pi}_1(\tau), \ldots, \tilde{\pi}_{|S|}(\tau)]^\top$, where $\tilde{\pi}_i(\tau)$ is the probability of being in a marking $m_i$ at time $\tau$ of a MRG representing a deterministic node with N children, and let $\tilde{P}(\tau)$ be a matrix whose entries $\tilde{p}_{ij}(\tau)$ are defined as:
$$\tilde{p}_{ij}(\tau) = \begin{cases} p_{ij} \cdot \delta(\tau - \log(\tilde{a}_{j1})) & \text{if } i \neq j \\ 1 - \sum_{k \neq i} \tilde{p}_{ik} & \text{otherwise} \end{cases} \qquad (9.57)$$
$$\tilde{A} \triangleq \sum_{i=0}^{\infty} A^i \qquad (9.58)$$
where $\Delta\tau$ is the common factor of $\{\tau_{F_1}, \tau_{S_1}, \tau_{F_2}, \tau_{S_2}, \ldots, \tau_{F_N}, \tau_{S_N}\}$. Then, for deterministic nodes, given $\tilde{\pi}(\tau)$ the probability over time is given by the zero-order hold
$$\pi(t) = \tilde{\pi}(k\Delta\tau), \quad t \in [k\Delta\tau, (k+1)\Delta\tau). \qquad (9.59)$$
Proof. The proof is trivial, considering that (9.59) is a piece-wise constant function and $\Delta\tau$ is the common factor of all the step instants.
9.4 Examples
In this section, we present three examples. The first example is the BT in Fig-
ure 9.15a, which is fairly small and allows us to show the details of each step.
The second example is the deterministic time version of the same BT, illustrating
the differences between the two cases. The third example involves a more complex
BT, shown in Figure 9.17. This example will be used to verify the approach numer-
ically, by performing Monte Carlo simulations and comparing the numeric results
to the analytical ones, see Table 9.2 and Figure 9.20. It is also used to illustrate the
difference in performance metrics, between two equivalent BTs, see Figure 9.22.
We will now carry out the computation of probabilistic parameters for an exam-
ple SBT.
Fig. 9.15: BT (a) and related DTMC (b) modeling the plan of Example 9.4.
Example 9.7. Given the tree shown in Figure 9.15a, its probabilistic parameters are obtained by evaluating the Fallback node, since it is the child of the root node. The given parameters of the i-th action are:
• $p_{f_i}$, the probability of failure
• $p_{s_i}$, the probability of success
• $\nu_i$, the failure rate
• $\mu_i$, the success rate
The related DTMC, shown in Figure 9.15b, has $S = \{s_1, s_2, s_3, s_4, s_5, s_6, s_7\}$, $S_F = \{s_4\}$ and $S_S = \{s_5, s_6, s_7\}$.
According to (9.40), the average sojourn times are collected in the following vector:
$$SJ = \left[ \frac{p_{s_1}}{\mu_1} + \frac{p_{f_1}}{\nu_1},\;\; \frac{p_{s_2}}{\mu_2} + \frac{p_{f_2}}{\nu_2},\;\; \frac{p_{s_3}}{\mu_3} + \frac{p_{f_3}}{\nu_3} \right]. \qquad (9.64)$$
We can now derive closed form expressions for the MTTS and MTTF. Using the decomposition in (9.43), the matrices computed according to (9.51) and (9.50) are:
$$U^S = \begin{bmatrix} p_{s_1} & 0 & 0 \\ p_{f_1} p_{s_2} & p_{s_2} & 0 \\ p_{f_1} p_{f_2} p_{s_3} & p_{f_2} p_{s_3} & p_{s_3} \end{bmatrix} \qquad (9.67)$$
$$U^F = \begin{bmatrix} p_{f_1} p_{f_2} p_{f_3} & p_{f_2} p_{f_3} & p_{f_3} \end{bmatrix} \qquad (9.68)$$
$$H^S = \begin{bmatrix} e^{t_{s_1}} & 0 & 0 \\ e^{t_{f_1}} e^{t_{s_2}} & e^{t_{s_2}} & 0 \\ e^{t_{f_1}} e^{t_{f_2}} e^{t_{s_3}} & e^{t_{f_2}} e^{t_{s_3}} & e^{t_{s_3}} \end{bmatrix} \qquad (9.70)$$
$$H^F = \begin{bmatrix} e^{t_{f_1}} e^{t_{f_2}} e^{t_{f_3}} & e^{t_{f_2}} e^{t_{f_3}} & e^{t_{f_3}} \end{bmatrix} \qquad (9.71)$$
Using (9.44) and (9.46) we obtain the MTTS and MTTF. Finally, the probabilistic
parameters of the tree are expressed in a closed form according to (9.52)-(9.55):
Example 9.8. Consider the BT given in Example 9.4; we now compute the performance in the case where all the actions are deterministic.
The computation of the MTTF and MTTS follows from Example 9.4, whereas the computation of $\pi(t)$ can be made according to Proposition 9.2.
According to (9.58), the matrix $\tilde{A}$ takes the form below:
$$\tilde{A} = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ e^{t_{f_1}} & 0 & 0 & 0 & 0 & 0 & 0 \\ e^{t_{f_1}} e^{t_{f_2}} & e^{t_{f_2}} & 0 & 0 & 0 & 0 & 0 \\ e^{t_{f_1}} e^{t_{f_2}} e^{t_{f_3}} & e^{t_{f_2}} e^{t_{f_3}} & e^{t_{f_3}} & 0 & 0 & 0 & 0 \\ e^{t_{s_1}} & 0 & 0 & 0 & 0 & 0 & 0 \\ e^{t_{f_1}} e^{t_{s_2}} & e^{t_{s_2}} & 0 & 0 & 0 & 0 & 0 \\ e^{t_{f_1}} e^{t_{f_2}} e^{t_{s_3}} & e^{t_{f_2}} e^{t_{s_3}} & e^{t_{s_3}} & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (9.76)$$
thereby, according to (9.57), the modified one-step transition matrix takes the form shown in Figure 9.16.

Fig. 9.16: The modified one-step transition matrix of Example 9.8; its first diagonal entry is $1 - (p_{f_1}\delta(t - t_{f_1}) + p_{s_1}\delta(t - t_{s_1}))$.
Example 9.9. The task given to a two-armed robot is to find and collect objects, which can be found either on the floor, in the drawers, or in the closet. The time needed to search for a desired object on the floor is less than the time needed to search for it in the drawers, since the latter have to be reached and opened first. On the other hand, the object is more likely to be in the drawers than on the floor or in the closet. Moreover, the available policies for picking up objects are the one-hand and the two-hand grasps. The one-hand grasp most likely fails, but it takes less time to check whether it has failed or not. Given these options, the task can be achieved in different ways, each of them corresponding to a different performance measure. The plan chosen for this example is modeled by the SBT shown in Figure 9.17. The performance estimates given by the proposed approach for the whole BT, as well as for two subtrees, can be seen in Figures 9.18-9.19.
Fig. 9.17: BT modeling the search and grasp plan. The leaf nodes are labeled with text, and the control flow nodes are labeled with numbers, for easy reference.
We also use the example above to verify the correctness of the analytical estimates, and the results can be seen in Table 9.2. We compared the analytical solution derived using our approach with numerical results given by a massive Monte Carlo simulation.
Table 9.1: Table comparing numerical and experimental results of MTTF and MTTS. The labels
of the subscripts are given in Figure 9.17
Table 9.2: Table collecting given parameters, the labels of the control flow nodes are given in
Figure 9.17.
Fig. 9.18: Probability distribution over time for the Root node of the larger BT in Figure 9.17. Numerical results are marked with an 'x' and analytical results are drawn using solid lines. Note how the failure probability is initially lower, but then becomes higher than the success probability after t = 500.
the floor. Thus the optimal solution is a new BT, with the drawer search as the first option. Note that the asymptotic probabilities are always the same for equivalent BTs, see Definition 9.10, as the changes considered are only permutations of Fallbacks.
Fig. 9.19: Comparison of probability distributions over time related to Node 5 (a) and Node 3 (b). Numerical results are marked with an 'x' and analytical results are drawn using solid lines. The failure probabilities are lower in both plots.
Fig. 9.20: Comparison of Success/Failure/Running probabilities of the root node in the case of deterministic times (thick) and stochastic times (thin).
Fig. 9.21: Comparison of Success/Failure/Running probabilities of node 5 (a) and node 3 (b) in the case of deterministic times (thick) and stochastic times (thin).
Fig. 9.22: Success/Failure probabilities in the case of searching on the floor first (dashed) and searching in the drawer first (solid). Failure probabilities are lower in both cases.
Chapter 10
Concluding Remarks
In this book, we have tried to present a broad, unified picture of BTs. We have
covered the classical formulation of BTs, its extensions and its relation to other ap-
proaches. We have provided theoretical results on efficiency, safety and robustness,
using a new state space formalism, as well as estimates on execution time and suc-
cess probabilities using a stochastic framework. We have described a number of
practical design principles as well as connections between BTs and the important
areas of planning and learning.
We believe that modularity is the main reason behind the huge success of BTs in
the computer game AI community, and the growing popularity of BTs in robotics.
It is well known that modularity is a key enabler when designing complex, maintainable and reusable systems. Clear interfaces reduce dependencies between components and make development, testing, and reuse much simpler. BTs have such interfaces, as each level of the tree has the same interface as a single action, and the internal nodes of the tree make the implementation of an action independent of the context and order in which the action is to be used.
provide structures that are equally beneficial for both humans and machines. In fact,
they are vital to the ideas of all chapters, from state-space formalism and planning
to design principles and machine learning.
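To make this point concrete, the following is a minimal Python sketch of such a shared interface. The class and method names are our own illustration, not the API of any particular BT library: every node, whether a leaf action or an internal control flow node, exposes the same tick() interface, so any subtree can be dropped in wherever a single action is expected.

from abc import ABC, abstractmethod
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Node(ABC):
    """Leaves and composites share one interface: tick() -> Status."""
    @abstractmethod
    def tick(self) -> Status:
        ...

class Action(Node):
    """Leaf node wrapping an arbitrary callable that returns a Status."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self) -> Status:
        return self.fn()

class Sequence(Node):
    """Ticks children left to right; returns Failure/Running as soon as
    a child does, and Success only if all children succeed."""
    def __init__(self, *children: Node):
        self.children = children
    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Fallback(Node):
    """Ticks children left to right; returns Success/Running as soon as
    a child does, and Failure only if all children fail."""
    def __init__(self, *children: Node):
        self.children = children
    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

# Because composites are Nodes too, a whole subtree can be reused as if it
# were one action, e.g. Fallback(grasp_from_drawer, grasp_from_floor).

Note how neither Sequence nor Fallback needs to know anything about what its children do internally; this is exactly the independence of context and order discussed above.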
Thus, BTs represent a promising control architecture in both computer game
AI and robotics. However, the parallel development in the two fields has given rise to a
set of different formulations and variations on the theme. This book is an attempt to
provide a unified view of this breadth of ideas, algorithms, and applications. There is still
a lot of work to be done, and we hope the reader has found this book helpful, and
perhaps inspiring, when continuing on the journey towards building better virtual
agents and robots.
References
1. David Aha, Matthew Molineaux, and Marc Ponsen. Learning to win: Case-based plan se-
lection in a real-time strategy game. Case-based reasoning research and development, pages
5–20, 2005.
2. J. Andrew (Drew) Bagnell, Felipe Cavalcanti, Lei Cui, Thomas Galluzzo, Martial Hebert,
Moslem Kazemi, Matthew Klingensmith, Jacqueline Libby, Tian Yu Liu, Nancy Pollard,
Mikhail Pivtoraiko, Jean-Sebastien Valois, and Ranqi Zhu. An Integrated System for Au-
tonomous Robotics Manipulation. In IEEE/RSJ International Conference on Intelligent
Robots and Systems, pages 2955–2962, October 2012.
3. Andrew G Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement
learning. Discrete Event Dynamic Systems, 13(4):341–379, 2003.
4. Scott Benson and Nils J Nilsson. Reacting, planning, and learning in an autonomous agent.
In Machine intelligence 14, pages 29–64. Citeseer, 1995.
5. Iva Bojic, Tomislav Lipic, Mario Kusek, and Gordan Jezic. Extending the JADE Agent Be-
haviour Model with JBehaviourtrees Framework. In Agent and Multi-Agent Systems: Tech-
nologies and Applications, pages 159–168. Springer, 2011.
6. R. Brooks. A Robust Layered Control System for a Mobile Robot. Robotics and Automation,
IEEE Journal of, 2(1):14–23, 1986.
7. R.A. Brooks. Elephants don’t play chess. Robotics and autonomous systems, 6(1-2):3–15,
1990.
8. Robert R Burridge, Alfred A Rizzi, and Daniel E Koditschek. Sequential Composition of
Dynamically Dexterous Robot Behaviors. The International Journal of Robotics Research,
18(6):534–555, 1999.
9. A.J. Champandard. Understanding Behavior Trees. AiGameDev.com, 6, 2007.
10. Michele Colledanchise, Diogo Almeida, and Petter Ögren. Towards blended reactive planning
and acting using behavior trees. arXiv preprint arXiv:1611.00230, 2016.
11. Michele Colledanchise, Alejandro Marzinotto, and Petter Ögren. Performance Analysis of
Stochastic Behavior Trees. In Robotics and Automation (ICRA), 2014 IEEE International
Conference on, June 2014.
12. Michele Colledanchise and Petter Ögren. How behavior trees generalize the teleo-reactive
paradigm and and-or-trees. In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ Inter-
national Conference on, pages 424–429. IEEE, 2016.
13. Michele Colledanchise and Petter Ögren. How behavior trees modularize hybrid control sys-
tems and generalize sequential behavior compositions, the subsumption architecture, and de-
cision trees. IEEE Transactions on Robotics, 33(2):372–389, 2017.
14. Michele Colledanchise, Ramviyas Parasuraman, and Petter Ögren. Learning of behavior trees
for autonomous agents. arXiv preprint arXiv:1504.05811, 2015.
15. Edsger W. Dijkstra. Letters to the editor: go to statement considered harmful. Commun. ACM,
11:147–148, March 1968.
16. A.F. Filippov and F.M. Arscott. Differential Equations with Discontinuous Righthand Sides:
Control Systems. Mathematics and its Applications. Kluwer Academic Publishers, 1988.
17. Gonzalo Flórez-Puga, Marco Gomez-Martin, Belen Diaz-Agudo, and Pedro Gonzalez-Calero.
Dynamic expansion of behaviour trees. In Proceedings of Artificial Intelligence and Interac-
tive Digital Entertainment Conference. AAAI Press, pages 36–41, 2008.
18. Marc Freese, Surya Singh, Fumio Ozaki, and Nobuto Matsuhira. Virtual robot experimenta-
tion platform v-rep: A versatile 3d robot simulator. Simulation, modeling, and programming
for autonomous robots, pages 51–62, 2010.
19. Zhiwei Fu, Bruce L Golden, Shreevardhan Lele, S Raghavan, and Edward A Wasil. A ge-
netic algorithm-based approach for building accurate decision trees. INFORMS Journal on
Computing, 15(1):3–22, 2003.
20. Thomas Galluzzo, Moslem Kazemi, and Jean-Sebastien Valois. Bart - behavior architecture
for robotic tasks, https://code.google.com/p/bart/. Technical report, 2013.
21. Ramon Garcia-Martinez and Daniel Borrajo. An integrated approach of learning, planning,
and execution. Journal of Intelligent and Robotic Systems, 29(1):47–78, 2000.
22. Caelan Reed Garrett, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Backward-forward
search for manipulation planning. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ
International Conference on, pages 6366–6373. IEEE, 2015.
23. JK Gershenson, GJ Prasad, and Y Zhang. Product modularity: definitions and benefits. Jour-
nal of Engineering design, 14(3):295–313, 2003.
24. Malik Ghallab, Dana Nau, and Paolo Traverso. The actor’s view of automated planning and
acting: A position paper. Artif. Intell., 208:1–17, March 2014.
25. Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning and Acting. Cambridge
University Press, 2016.
26. Gerhard Gubisch, Gerald Steinbauer, Martin Weiglhofer, and Franz Wotawa. A teleo-reactive
architecture for fast, reactive and robust control of mobile robots. In New Frontiers in Applied
Artificial Intelligence, pages 541–550. Springer, 2008.
27. Kelleher R. Guerin, Colin Lea, Chris Paxton, and Gregory D. Hager. A framework for end-
user instruction of a robot assistant for manufacturing. In IEEE International Conference on
Robotics and Automation (ICRA), 2015.
28. Blake Hannaford, Danying Hu, Dianmu Zhang, and Yangming Li. Simulation results on
selector adaptation in behavior trees. arXiv preprint arXiv:1606.09219, 2016.
29. David Harel. Statecharts: A visual formalism for complex systems. Science of Computer
Programming, 8(3):231–274, 1987.
30. Danying Hu, Yuanzheng Gong, Blake Hannaford, and Eric J. Seibel. Semi-autonomous sim-
ulated brain tumor ablation with raven ii surgical robot using behavior tree. In IEEE Interna-
tional Conference on Robotics and Automation (ICRA), 2015.
31. Damian Isla. Handling Complexity in the Halo 2 AI. In Game Developers Conference, 2005.
32. Damian Isla. Halo 3-building a Better Battle. In Game Developers Conference, 2008.
33. Leslie Pack Kaelbling and Tomás Lozano-Pérez. Hierarchical task and motion planning in
the now. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages
1470–1477. IEEE, 2011.
34. Sergey Karakovskiy and Julian Togelius. The Mario AI benchmark and competitions. Compu-
tational Intelligence and AI in Games, IEEE Transactions on, 4(1):55–67, 2012.
35. Andreas Klöckner. Interfacing Behavior Trees with the World Using Description Logic. In
AIAA Conference on Guidance, Navigation and Control, Boston, 2013.
36. Martin Levihn, Leslie Pack Kaelbling, Tomas Lozano-Perez, and Mike Stilman. Foresight
and reconsideration in hierarchical planning and execution. In Intelligent Robots and Systems
(IROS), 2013 IEEE/RSJ International Conference on, pages 224–231. IEEE, 2013.
37. C.U. Lim, R. Baumgarten, and S. Colton. Evolving Behaviour Trees for the Commercial
Game DEFCON. Applications of Evolutionary Computation, pages 100–110, 2010.
38. Alejandro Marzinotto, Michele Colledanchise, Christian Smith, and Petter Ögren. Towards a
Unified Behavior Trees Framework for Robot Control. In Robotics and Automation (ICRA),
2014 IEEE International Conference on, June 2014.
39. M. Mateas and A. Stern. A Behavior Language for story-based believable agents. IEEE
Intelligent Systems, 17(4):39–47, Jul 2002.
40. Joshua McCoy and Michael Mateas. An Integrated Agent for Playing Real-Time Strategy
Games. In AAAI, volume 8, pages 1313–1318, 2008.
41. G. H. Mealy. A method for synthesizing sequential circuits. The Bell System Technical Jour-
nal, 34(5):1045–1079, Sept 1955.
42. Bill Merrill. Ch 10, building utility decisions into your existing behavior tree. Game AI Pro.
A collected wisdom of game AI professionals, 2014.
43. Ian Millington and John Funge. Artificial intelligence for games. CRC Press, 2009.
44. Tom M Mitchell. Machine learning. WCB/McGraw-Hill, Boston, MA, 1997.
45. Michael Montemerlo, Jan Becker, Suhrid Bhat, Hendrik Dahlkamp, Dmitri Dolgov, Scott Et-
tinger, Dirk Haehnel, Tim Hilden, Gabe Hoffmann, Burkhard Huhnke, et al. Junior: The
stanford entry in the urban challenge. Journal of field Robotics, 25(9):569–597, 2008.
46. Edward F Moore. Gedanken-experiments on sequential machines. Automata studies, 34:129–
153, 1956.
47. Seyed R Mousavi and Krysia Broda. Simplification of Teleo-Reactive Sequences. Imperial
College of Science, Technology and Medicine, Department of Computing, 2003.
48. Tadao Murata. Petri nets: Properties, analysis and applications. Proceedings of the IEEE,
77(4):541–580, 1989.
49. Dana S. Nau, Malik Ghallab, and Paolo Traverso. Blended planning and acting: Preliminary
approach, research challenges. In Proceedings of the Twenty-Ninth AAAI Conference on Arti-
ficial Intelligence, AAAI’15, pages 4047–4051. AAAI Press, 2015.
50. M. Nicolau, D. Perez-Liebana, M. O’Neill, and A. Brabazon. Evolutionary behavior tree
approaches for navigating platform games. IEEE Transactions on Computational Intelligence
and AI in Games, PP(99):1–1, 2016.
51. Nils J. Nilsson. Teleo-reactive programs for agent control. JAIR, 1:139–158, 1994.
52. J.R. Norris. Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics.
Cambridge University Press, 1998.
53. S. Ocio. A dynamic decision-making model for videogame AI systems, adapted to players. PhD
thesis, Department of Computer Science, University of Oviedo, Spain, 2010.
54. Sergio Ocio. Adapting ai behaviors to players in driver san francisco: Hinted-execution behav-
ior trees. In Eighth Artificial Intelligence and Interactive Digital Entertainment Conference,
2012.
55. Petter Ögren. Increasing Modularity of UAV Control Systems using Computer Game Behavior
Trees. In AIAA Guidance, Navigation and Control Conference, Minneapolis, MN, 2012.
56. Chris Paxton, Andrew Hundt, Felix Jonathan, Kelleher Guerin, and Gregory D Hager.
Costar: Instructing collaborative robots with behavior trees and vision. arXiv preprint
arXiv:1611.06145, 2016.
57. Renato de Pontes Pereira and Paulo Martins Engel. A framework for constrained and adaptive
behavior-based agents. arXiv preprint arXiv:1506.02312, 2015.
58. Diego Perez, Miguel Nicolau, Michael O’Neill, and Anthony Brabazon. Evolving behaviour
trees for the mario ai competition using grammatical evolution. In Proceedings of the 2011 In-
ternational Conference on Applications of Evolutionary Computation - Volume Part I, EvoAp-
plications’11, Berlin, Heidelberg, 2011. Springer-Verlag.
59. Matthew Powers, Dave Wooden, Magnus Egerstedt, Henrik Christensen, and Tucker Balch.
The Sting Racing Team’s Entry to the Urban Challenge. In Experience from the DARPA Urban
Challenge, pages 43–65. Springer, 2012.
60. Steve Rabin. Game AI Pro, chapter 6. The Behavior Tree Starter Kit. CRC Press, 2014.
61. Ingo Rechenberg. Evolution strategy. Computational Intelligence: Imitating Life, 1, 1994.
62. Glen Robertson and Ian Watson. Building behavior trees from observations in real-time strat-
egy games. In Innovations in Intelligent SysTems and Applications (INISTA), 2015 Interna-
tional Symposium on, pages 1–7. IEEE, 2015.
63. Günter Rudolph. Convergence analysis of canonical genetic algorithms. Neural Networks,
IEEE Transactions on, pages 96–101, 1994.
64. I. Sagredo-Olivenza, P. P. Gomez-Martin, M. A. Gomez-Martin, and P. A. Gonzalez-Calero.
Trained behavior trees: Programming by demonstration to support ai game designers. IEEE
Transactions on Games, PP(99):1–1, 2017.
65. Claude Sammut, Scott Hurst, Dana Kedzier, and Donald Michie. Imitation in Animals and
Artifacts, chapter Learning to Fly, page 171. MIT Press, 2002.
66. Kirk Y. W. Scheper, Sjoerd Tijmons, Coen C. de Visser, and Guido C. H. E. de Croon. Be-
haviour trees for evolutionary robotics. CoRR, abs/1411.7267, 2014.
67. Alexander Shoulson, Francisco M Garcia, Matthew Jones, Robert Mead, and Norman I Badler.
Parameterizing Behavior Trees. In Motion in Games. Springer, 2011.
68. William J Stewart. Probability, Markov chains, queues, and simulation: the mathematical
basis of performance modeling. Princeton University Press, 2009.
69. Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1.
MIT Press, Cambridge, MA, 1998.
70. Richard S Sutton, Doina Precup, and Satinder Singh. Between mdps and semi-mdps: A frame-
work for temporal abstraction in reinforcement learning. Artificial intelligence, 112(1-2):181–
211, 1999.
71. Chris Urmson, Joshua Anhalt, Drew Bagnell, Christopher Baker, Robert Bittner, MN Clark,
John Dolan, Dave Duggins, Tugrul Galatali, Chris Geyer, et al. Autonomous driving in urban
environments: Boss and the urban challenge. Journal of Field Robotics, 25(8):425–466, 2008.
72. Chris Urmson, J Andrew Bagnell, Christopher R Baker, Martial Hebert, Alonzo Kelly, Raj
Rajkumar, Paul E Rybski, Sebastian Scherer, Reid Simmons, Sanjiv Singh, et al. Tartan racing:
A multi-modal approach to the DARPA Urban Challenge. 2007.
73. Blanca Vargas and E Morales. Solving navigation tasks with learned teleo-reactive programs.
Proceedings of IEEE International Conference on Robots and Systems (IROS), 2008.
74. Ben G Weber, Peter Mawhorter, Michael Mateas, and Arnav Jhala. Reactive planning id-
ioms for multi-scale game ai. In Computational Intelligence and Games (CIG), 2010 IEEE
Symposium on, pages 115–122. IEEE, 2010.
75. Ben George Weber, Michael Mateas, and Arnav Jhala. Building Human-Level AI for Real-
Time Strategy Games. In AAAI Fall Symposium: Advances in Cognitive Systems, volume 11,
page 01, 2011.