
Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling (ICAPS 2017)

Framer: Planning Models from Natural Language Action Descriptions

Alan Lindsay,1 Jonathon Read,2 João F. Ferreira,1,3 Thomas Hayton,1 Julie Porteous,1 Peter Gregory1

1 Digital Futures Institute, School of Computing, Teesside University, UK.
2 Ocado Technology, Hatfield, UK.
3 HASLab/INESC TEC, Universidade do Minho, 4704-553 Braga, Portugal.
[email protected] | [email protected]

Abstract

In this paper, we describe an approach for learning planning domain models directly from natural language (NL) descriptions of activity sequences. The modelling problem has been identified as a bottleneck for the widespread exploitation of various technologies in Artificial Intelligence, including automated planners. There have been great advances in modelling assisting and model generation tools, including a wide range of domain model acquisition tools. However, for modelling tools, there is the underlying assumption that the user can formulate the problem using some formal language. And even in the case of the domain model acquisition tools, there is still a requirement to specify input plans in an easily machine readable format. Providing this type of input is impractical for many potential users. This motivates us to generate planning domain models directly from NL descriptions, as this would provide an important step in extending the widespread adoption of planning techniques. We start from NL descriptions of actions and use NL analysis to construct structured representations, from which we construct formal representations of the action sequences. The generated action sequences provide the necessary structured input for inducing a PDDL domain, using domain model acquisition technology. In order to capture a concise planning model, we use an estimate of functional similarity, so that sentences that describe similar behaviours are represented by the same planning operator. We validate our approach with a user study, where participants are tasked with describing the activities occurring in several videos. Then our system is used to learn planning domain models using the participants' NL input. We demonstrate that our approach is effective at learning models on these tasks.

Introduction

Modelling problems appropriately for use by a computer program has been identified as a key bottleneck in the exploitation of various AI technologies. In Automated Planning, this has inspired a growing body of work that aims to support the modelling process, including domain acquisition tools, which learn a formal domain model of a system from some form of input data. There is interest in applying domain model acquisition across a range of research and application areas, for example within the business process community (Hoffmann, Weber, and Kraft 2012) and in space applications (Frank et al. 2011). An extended version of the LOCM domain model acquisition system (Cresswell, McCluskey, and West 2009) has also been used to help in the development of a puzzle game (Ersen and Sariel 2015) based on spatio-temporal reasoning. Web Service Composition is another area in which domain model acquisition techniques have been used (Walsh and Littman 2008). These tools vary in the specifics of the input language, such as example action sequences (Cresswell, McCluskey, and West 2009; Cresswell and Gregory 2011), or action sequences and a partial domain model (McCluskey et al. 2009; Richardson 2008); the query system by which they acquire the input data, which is typically static training sets, although there are examples working with an interactive querying system (Walsh and Littman 2008; Mehta, Tadepalli, and Fern 2011); and the target model language, including STRIPS (Cresswell, McCluskey, and West 2009; Cresswell and Gregory 2011), probabilistic (Mourão, Petrick, and Steedman 2010), and numeric (Gregory and Lindsay 2016; Hayton et al. 2016). However, in each case the user is left the responsibility of defining a formal representation for the solution.

Defining these logical formalisms and applying them consistently requires time and experience in both the target domain and in the representation language, which many potential users will not have. It is therefore important to consider alternative input languages, such as Natural Language (Goldwasser and Roth 2011). Natural Language (NL) input is the most natural way for humans to interact and it is no surprise that there is much interest in using NL as input for computer systems. In day-to-day life, Siri and its competitors are controlled by simple spoken word input, but can activate complex procedures on our phones. In the RoboCup@Home competitions, robots are controlled by task descriptions that are automatically translated into a series of simple actions that can be performed on the robot. And NL lessons have been used to learn partial representations of the world dynamics for game-like environments (Goldwasser and Roth 2011). A key aspect of these systems is an underlying language, which the NL input is mapped onto. For example, in the case of RoboCup@Home, an input of 'go to the living room' might be mapped onto quite a different representation, using the action name 'move' and requiring a set of parameters that break the movement into smaller steps between connected

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: System overview: NL sentences are transformed into reduced representations (action templates) (a) that are clustered
based on similarity (b). Consistent formulations of the original sentences are then extracted (c) and a PDDL domain model is
induced using the domain model acquisition tool LOCM (d).
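At a glance, stages (a)-(c) of this pipeline can be pictured with a short, self-contained sketch. This is a toy stand-in, not the authors' implementation: all function names are illustrative, the real system derives templates from CoreNLP dependency parses rather than a naive preposition split, the real similarity measure combines weighted synonym scores from lexical resources, and the final induction step (d) is performed by LOCM.

```python
# Toy sketch of pipeline stages (a)-(c); names are illustrative only.

PREPS = {"from", "to", "in", "on", "onto", "into", "at", "with"}

def extract_template(sentence):
    # (a) Reduce a sentence to an action template: the first word stands
    # in for the root verb, and each preposition opens a new
    # slot-label -> slot-filler pair.
    words = sentence.lower().rstrip(".").split()
    slots = {"action": [words[0]], "object": []}
    label = "object"
    for w in words[1:]:
        if w in PREPS:
            label = w
            slots[label] = []
        else:
            slots[label].append(w)
    return {k: " ".join(v) for k, v in slots.items() if v}

def similarity(t0, t1, gamma=0.6):
    # (b) Toy stand-in for the paper's delta measure: gamma weighs
    # action-name similarity against the overlap of the non-action
    # slot labels (the roles played by the mentioned objects).
    san = 1.0 if t0["action"] == t1["action"] else 0.0
    r0, r1 = set(t0) - {"action"}, set(t1) - {"action"}
    ssl = len(r0 & r1) / max(len(r0 | r1), 1)
    return gamma * san + (1 - gamma) * ssl

def action_header(template, slot_order):
    # (c) Rewrite a template as an action header (name plus constants in
    # a consistent parameter order): the structured input that
    # LOCM-style acquisition tools (d) expect.
    args = [template.get(s, "?").replace(" ", "-") for s in slot_order]
    return "({} {})".format(template["action"], " ".join(args))
```

For instance, 'Move the truck from Aberdeen to Dundee' reduces to the template {'action': 'move', 'object': 'the truck', 'from': 'aberdeen', 'to': 'dundee'}, and rewriting it with the slot order [object, from, to] yields the action header (move the-truck aberdeen dundee).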

rooms. Therefore in these domains there is still a requirement for a domain engineer to develop the formal representation.

In this work we consider the problem of generating planning action representations automatically from a collection of NL sentences that are descriptions of actions. The benefit of this is that we can target domain model acquisition systems, but do not have to manually define a formal representation for the solution. The key challenge is to generate an appropriate formal representation, from which a general and concise model can be induced, and that best represents the input sentences. Our approach, illustrated in Figure 1, can be summarised as follows. The first step (a) generates a reduced representation of each sentence, an action template, which captures the main action as well as the objects that are mentioned and an indication of their roles in the sentence. The second step (b) uses a measure of functional similarity, based on the reduced representations, in order to cluster sentences into operator sets. A consistent action representation is then established by defining a mapping between the members of each cluster. The action sequences are then rewritten (c) using these action representations and used to induce a PDDL domain model (d) using the domain model acquisition tool, LOCM2 (Cresswell and Gregory 2011).

In our implemented system, user interaction is allowed during the development of the representation, although it is not required at any step. There are three key aspects where the user can influence the representation: the user can rephrase sentences where an action was not detected; they can modify the automatically selected sentence clusters (operator sets); and finally they can correct or fill in missing action roles (parameters). In the evaluation, we demonstrate that our approach can accurately identify behaviour groups in NL sentences describing actions, rewrite related sentences using a single representation, and be robust to certain inconsistencies. These sentences were used to induce a PDDL model in three domains: Towers of Hanoi, Logistics and Tyreworld.

Background

Here we present relevant background in planning, domain model acquisition and NLP approaches.

The description of a planning problem in PDDL (McDermott et al. 1998) is separated into two parts: the domain model, a definition of the problem domain that defines the world and its behaviours; and an explanation of the specific problem to be solved within that world. It is the domain model which is the target output of our approach. A domain model is a tuple, D = ⟨O, P⟩, defining the sets of operators, O, and predicates, P. An operator, O ∈ O, is represented by an operator header: a unique symbol (the operator name) and a list of typed variables (the parameters). The operator body consists of three sets of predicates: the preconditions, and the add and delete effects. An action, A, is a planning operator, O, that has been instantiated with problem constants (parameters, preconditions and effects), and an action header is a name and a list of constants (the instantiated parameters).

Our approach builds over work in domain model acquisition, which exploits assumptions in the structure of the input to learn planning models using a minimal input language (Cresswell and Gregory 2011). LOCM (Cresswell, McCluskey, and West 2009; Cresswell and Gregory 2011; Gregory and Cresswell 2015; Gregory and Lindsay 2016) is a family of domain model acquisition systems that operates from a set of example plans and generates a planning domain in the standard Planning Domain Definition Language (PDDL). The LOCM procedure uncovers the structure embedded in the action sequences in order to identify FSM descriptions for each object type. In order to do this, the system observes the transitions that an individual world object makes and then generalises these behaviours to types, based on the parameter position that the objects take in each action. The structures captured by the FSMs are then used to define a domain model, and from this and the input sequences, problem descriptions can be generated.

We use Stanford CoreNLP (Manning et al. 2014), a publicly-available and widely-used annotation pipeline for

natural language analysis. Of most relevance to the current work are the syntactic parsing annotations CoreNLP produces. Syntactic analysis in CoreNLP is a two-stage process. Firstly, phrase structure trees are generated using statistical analysis of datasets containing many examples of manually annotated sentence parses (Klein and Manning 2003). Secondly, these phrase structure trees are converted to dependency parse graphs using a series of manually-curated rules based on patterns observed in the phrase structure trees (de Marneffe, MacCartney, and Manning 2006).

Extracting action templates from NL input

The first step in our approach (see Figure 1(a)) is the generation of action templates: reduced representations of input sentences, which capture the main action, the objects that are mentioned, and an indication of their roles in the sentence. For this we utilise the dependency graphs output by CoreNLP, illustrated in Figure 2. The structure of this middle representation must be further simplified to move closer to a predicate logic representation. This is achieved through a recursive set of rules that crawl the dependency graph, transforming the relations based on their types. Most importantly, the root verb of the sentence forms the basis of the action name, while the verb's subject and objects form the arguments. Conjunctions introduce new clauses of the sentence, which form further predicates. Other relation types such as modifiers and compounds are used to transform the names of the predicates and arguments. The input of this process is a sentence and the output is a collection of slot label (e.g., in) and slot filler (e.g., red truck) pairs. Where an action is detected, one or more of the slot fillers will identify action name elements. The other slot labels indicate the associated slot filler's role in the sentence.

NL: Pick up Parcel1 from location B and put in the red truck

CoreNLP annotation:
[Pick/VB
  compound:prt>up/RP
  dobj>Parcel1/NN
  nmod:from>[B/NN case>from/IN compound>location/NN]
  cc>and/CC
  conj:and>[put/VB nmod:in>[truck/NN case>in/IN det>the/DT amod>red/JJ]]]

Action template:
action : pick and put {
  object : parcel1
  from : location b
  in : red truck }

Figure 2: Example NL sentence input, with its CoreNLP annotation and resulting action template after rewrite rules: from the CoreNLP annotation, the verb, subject and object of the sentence form the action name and arguments (see text).

There are many possible ways that users could formulate sentences to describe what happens during a single action. This structure therefore provides a reduced representation that identifies the main actors in an action and the roles that they play in the sentence. It should be noted that our approach relies on consistent object references (e.g., 'the red truck' in Figure 1) throughout the action sequence. In order to maintain the same level of granularity as the input descriptions, internal action predicates are merged to form a single action template. In Figure 2 the internal put predicate is merged into the pick action, creating a single action template for pick and put. It is important to notice that this structure is similar to the one generated for S2 in Figure 1 for quite a different sentence structure.

Partitioning sentences into operator sets

The next step is to identify an appropriate partitioning of the reduced sentences that best represents them, in order to derive a set of consistent and general operator descriptions (as shown in Figure 1(b)). Our approach is to define a distance measure between sentence pairs and use it to identify clusters. As each cluster represents a planning operator in the final domain model, we also consider, in the following discussion, how the clustering can be controlled in order to affect the generality of the generated domain model.

Functional distance between sentences

The ideal distance measure would estimate the difference between the underlying functional process described in the sentences. This would require a rich description of the underlying process, whereas we consider how this can be estimated using the sentences in isolation. However, we do not want to directly estimate the distance between the sentences. Instead we want to estimate the structural difference between the sentences, with specific focus on the key action of the sentence. In particular, we want to exploit the structure of the generated templates (previous section), which identifies the main action of the sentence and provides a context for the purpose (or role) of each object in the sentence.

For example, consider these sentences:

S5: The truck has moved from Aberdeen to Dundee
S6: A green box was put onto the truck in Dundee
S7: The truck was driven from Dundee to Stirling

Sentences S5 and S7 are very similar in structure and probably describe a similar underlying behaviour. In particular, we observe that the words moved and driven are in fact often used interchangeably and that representing them with a single operator would lead to a more concise and general model. In contrast, sentence S6 differs in the specific verb as well as the structure of the sentence, and therefore we would expect it to be represented in a planning model with a different planning operator. In general then, we would expect that those templates that are similar to each other might be represented by a single operator. However, it is not as simple as collecting similar verbs. For example, in Figure 1, S2 and S4 share the same verb but describe different behaviours, which can only be distinguished by the roles of the objects in the sentences.

We first describe our approach for estimating the similarity of individual symbols (i.e., the distance between them) and then build from this to a complete similarity measure.

Distance between terms. In this application, we are specifically interested in whether words can be used in place of each other and are therefore synonymous with each other. We

use a collection of online lexical resources1 in order to generate a set of weighted synonyms (a similar approach was used to find antonyms for action and predicate names in (Porteous et al. 2015; Lindsay et al. 2015)). Online lexical resources provide a source of typical synonyms without relying on the user to identify similar terms; however, the quality of output can be inconsistent and therefore it is prudent to combine several sources. Each source, Si, can be seen as a function providing a vote, and its score for each word pair can be normalised to a value between 0 and 1 (higher scores for higher correspondence). The sources can also be parameterised by the parts of speech (POS) of the generated synonyms, which is useful in the case of estimating action type similarity, as these tend to be verbs. We define the similarity function, SIM(w0, w1, POS), which estimates the similarity between words w0 and w1, with the set of accepted POS, as:

SIM(w0, w1, POS) = (1/n) Σi=1..n Si(w0, w1, POS)

In the case of multiple word terms, the above function is generalised using the Levenshtein distance (Levenshtein 1966) of the two word sequences (using a symbol for each word) and using the complement of the similarity scores (a distance measure) as a partial match cost. This provides a measure of correspondence between the sequences, while also respecting ordering.

1 Merriam-Webster http://www.dictionaryapi.com; Big Huge Thesaurus http://words.bighugelabs.com; Power Thesaurus http://www.powerthesaurus.org

Similarity measure. The similarity measure is based on the idea that if the actions involved in a pair of sentences are similar and the roles of the objects are similar, then we expect that the function that the sentences are describing is similar. The similarity measure for templates τ0 and τ1, denoted by δ(τ0, τ1), is computed from two values:

• SAN(τ0, τ1): the similarity of the action names (entries in slots with label action) of τ0 and τ1;
• SSL(τ0, τ1): the average similarity for each non-action slot label (each role) from τ0 to τ1, where for each role of τ0 the closest matching role of τ1 is selected.

These values rely on the similarity SIM(w0, w1, POS), defined above, which is used with the argument POS = verb for action names and with POS = ∗ (any part of speech) for roles.

The similarity measure can be computed as:

δ(τ0, τ1) = γ × SAN(τ0, τ1) + (1 − γ) × SSL(τ0, τ1)

The parameter γ controls the relative importance of the action-name and role similarities.

Cluster-based approach to operator set selection

Clustering identifies groups of elements which are similar (or close) to the elements in their own group, while being dissimilar to (or far away from) elements in other groups. It is a hard problem in general, especially in domains where the number of clusters cannot be guessed. However, it is a very well studied area and many off-the-shelf toolkits exist. In this specific problem we want to cluster the action templates into similar groups. We can use the distance matrix for the distance between template pairs for clustering, which saves defining a projection of a template into a space.

We adopt the Partitioning Around Medoids (PAM) implementation of the k-medoids method (Kaufmann and Rousseeuw 1987). This approach partitions the data into k clusters, each associated with a representative data point, considered the most central in the cluster. The specific benefits for this work are that the algorithm partitions the objects and operates from the dissimilarity matrix of the data points, allowing us to use an arbitrary distance score.

Model generality. Selecting an appropriate clustering is an interesting problem and one that will affect the generality of the final representation. There may be more than one correct partitioning of the sentences into correct behaviour groups. For example, consider the various encodings of stacking behaviours in PDDL: Depots, Towers of Hanoi, and multiple representations of Blocksworld. Our default approach is to calculate the average silhouette score for the clusters (Rousseeuw 1987), which evaluates the clusters by averaging the similarity within clusters and dissimilarity between clusters, with respect to the distance measure. This can only be evaluated accurately for at least 2 clusters, so we first test to determine whether more than 1 cluster is appropriate (Duda, Hart, and others 1973). The optimal average silhouette score indicates a good trade-off between the size of k and the amount of dissimilarity in each cluster.

Our system supports interaction at this stage, allowing the user to pick between different values of k, but also to change the clusters. It is interesting to notice that the user organises (their own) NL sentences into behaviour groups and therefore does not need to interpret any abstracted representation.

Generating a domain model

We have presented our approach for selecting the operator sets that determine the main language for the generated domain model. In this section we construct a planning model that represents the dynamics captured in the NL action descriptions. The first step is to define the action language of the planning model, and this is achieved by demonstrating a consistent formalisation within each group of templates that have been partitioned into a single operator set (as shown in Figure 1, step (c)). This supports the rewriting of the sentences as sequences of action headers, which is a sufficient input for domain model acquisition (Figure 1, step (d)). We conclude the section by considering how missing values (parameters) can be addressed.

Formalised representation of the sentences

We use the centre-most element (the medoid, and therefore a natural output from the clustering algorithm) as the basis for the operator description. For the associated template we define an operator header as follows: the name is the concatenation of the terms with action slot labels (joined with a symbol, e.g., '-'); and the parameter list is represented by the (non-action) slot labels (i.e., the slot fillers represent instantiations of those parameters). The translation of the medoid sentence into the language of the action model is a

space-separated concatenation of the name and the slot filler (multiple words joined with a symbol, e.g., '-'), for each of the slot labels in order.

To translate one of the other templates τ′ of the cluster, we establish a mapping between the slot labels of the medoid template, τM, and those of τ′. This is done by establishing a best match between the slot labels of the templates, maximising the overall similarity of labels using the (previously defined) function: SIM(slot-labelτM, slot-labelτ′, ∗). The sentence is then constructed in a similar way, except that for a specific slot label, the mapping is used to identify the corresponding slot label in τ′ and its entry is used instead of τM's. In the case that a mapping is not found for a tag in y% of the members of a cluster, then the tag is pruned. As in some cases a description will have more information than is necessary, this filtering process aims to identify the important parameters for the operator.

The output of this process is a set of sequences of action headers, each of which instantiates one of the operator headers implied by the behaviour groups. It has been shown that, within certain restrictions, sequences of action headers provide sufficient evidence of the dynamic structure of planning domains and can be used directly to induce a domain model (Cresswell and Gregory 2011). It is presenting the key objects in consistent orderings (achieved through our mapping approach) that allows LOCM to uncover the inherent structures. It is then the job of LOCM to identify the key relationships between the parameters of actions, which it then encodes as predicates in the induced domain model.

Missing parameter values

In practice, a user may not always mention all of the objects involved in an action, or may not be consistent with the objects mentioned. There is no guaranteed method of inferring the missing parameters, as the correct dynamics of the system (even if they can be expressed in STRIPS) are unknown. Thus the system supports user interaction at this stage, allowing the user to both fill in missing parameters and correct incorrect parameters. The representative sentence for each cluster is used as a template to rewrite each of the sentences of the cluster. The slot fillers of the representative sentence are replaced with the relevant fillers from the member sentence (see the discussion of plan rewriting in the evaluation). Missing values are indicated and can be filled in by the user. The main benefit of this approach is that the user can interact with the system using only NL.

The default behaviour in this case is to break the plan into two action sequences by removing the partially specified action. That is, for a plan π = ⟨a0, . . . , ai, . . . , an⟩ and partially specified action ai, we create two plan fragments: πF1 = ⟨a0, . . . , ai−1⟩ and πF2 = ⟨ai+1, . . . , an⟩. These fragments can then be used as input to the domain acquisition system instead of the complete plan descriptions.

(a) Blocksworld (training)  (b) Towers of Hanoi  (c) Logistics  (d) Tyreworld

Figure 3: Screenshots from the videos used in the evaluation.

Evaluation

In this section we present a case study examining the approach developed in this paper. This is split into two sections: the first examines the viability of acquiring suitable input sentences from users; the second examines whether operator sets can be synthesised that generalise user sentences, and whether planning models can be induced.

Acquiring action descriptions

In order to obtain the action descriptions, we asked naive users to explain animations that depicted action sequences from a collection of typical planning benchmark domains. The users were provided with recommendations for constructing the sentences and were given guidance when their sentences did not meet these recommendations. In each case we noted the type of deviation that was made, and we are therefore able to provide an indication of the areas where training is required, or opportunities for supporting similar approaches with inference. The specific descriptions made by each user still provide a wide variety of inputs.

We gathered sentences from 10 participants with a mixture of backgrounds and no experience of PDDL or related languages. In each session we collected 39 action descriptions between 3 domains. The session started by reading an introduction to the study and several recommendations for the sentences, including general properties of the sentences, e.g., that each was a stand-alone sentence without co-reference (not enforced). The key recommendations were: that the sentence should always include an explicit description of the starting situation of the main object before the action (e.g., pick up the blue block from the red block); and to use consistent referencing for objects; for example, if you refer to an object as 'the red block', to always use this name. The participants were then shown an example video and sentence in the Blocksworld domain (Figure 3a). The participants described actions for animated videos in the following domains:

• Towers of Hanoi: The benchmark domain, visualised with cards and not pegs (Figure 3b). A card can only be placed on top of a card with a lower rank. The player must move the cards so that all 4 cards are piled on the right-most stack. We used the first 8 moves of a solution.

• Logistics: A standard logistics domain with misplaced packages that must be relocated using trucks (Figure 3c). The goal indicates the final configuration of the packages. The videos presented one small (4 step) and one longer (15 step) scenario.

            Ref. Variation   More info.   Action detection   Others
Hanoi       3                5            1                  2
Logistics   0                7            3                  4
Tyres       0                4            4                  0

Table 1: The number of instances for each category of guidance required by participants during sentence collection (from a total of 390 action descriptions collected). The categories are: referential variation, more required information, no action detected in the sentence, and others (concentration and participant-instigated querying of verb use).

         p1    p2     p3     p4    p5     p6    p7     p8     p9     p10
Hanoi    0(1)  0(3)∗  0(2)∗  0(1)  0(1)∗  0(1)  0(2)∗  0(2)∗  0(4)∗  0(2)∗
Log.     0(3)  0(3)   1(4)∗  0(3)  0(3)   1(3)  0(3)   1(4)∗  0(3)   0(3)
Tyres    2(6)  2(6)∗  2(6)   0(5)  3(5)∗  1(5)  1(5)∗  1(5)∗  2(6)∗  1(5)∗

Table 2: Number of errors and clusters (in parentheses) for the sentences for each participant (p1–p10) in the three domains. Each cell records the number of wrongly allocated sentences and the cluster count in parentheses. The symbol ∗ indicates that at least one k (not selected by silhouettes) partitions the sentences into distinct behaviours.
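The cluster counts reported in Table 2 come from k-medoids clustering over the template dissimilarity matrix, with k chosen by the average silhouette score. A minimal sketch of that selection loop follows; it is illustrative only (an exhaustive medoid search stands in for the PAM build/swap heuristic), and it assumes `dist` is a precomputed symmetric distance matrix, such as the complement of the δ similarity:

```python
import itertools

def k_medoids(dist, k):
    # Exhaustive k-medoids over a (small) dissimilarity matrix: choose
    # the medoid set minimising total distance of each point to its
    # nearest medoid. (PAM proper uses a greedy build/swap search.)
    n = len(dist)
    best_cost, best_medoids = None, None
    for medoids in itertools.combinations(range(n), k):
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Label each point with its nearest medoid.
    return [min(best_medoids, key=lambda m: dist[i][m]) for i in range(n)]

def avg_silhouette(dist, labels):
    # Average silhouette: a(i) is the mean distance to the point's own
    # cluster, b(i) the smallest mean distance to another cluster;
    # singleton clusters score 0 by convention.
    scores = []
    for i, li in enumerate(labels):
        own = [dist[i][j] for j, lj in enumerate(labels) if lj == li and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(dist[i][j] for j, lj in enumerate(labels) if lj == lab)
                / sum(1 for lj in labels if lj == lab)
                for lab in set(labels) - {li})
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def pick_k(dist, k_max):
    # Select the k (>= 2) whose clustering maximises average silhouette.
    k = max(range(2, k_max + 1),
            key=lambda k: avg_silhouette(dist, k_medoids(dist, k)))
    return k, k_medoids(dist, k)
```

In the interactive setting described above, the silhouette-selected k is only a default: the user can override it, or move individual sentences between clusters, before the operator sets are fixed.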

Open the boot of the red car
Take out the jack from the boot of the red car
Put the jack underneath the front of the red car
Lift the front of the red car with the jack
Remove the front wheel of the red car
Replace the front wheel of the red car
Let down the front of the red car with the jack
Take the jack from underneath the front of the red car
Put the jack into the boot of the red car
Close the boot of the red car

Figure 4: An example of the input NL sentences (from participant p5) describing the actions in the Tyreworld scenario.

• Tyreworld: A subset of the benchmark domain. The scenario involves jacking a car up by opening the boot, removing the jack, raising the car and removing the wheel (Figure 3d), before reversing the process. An example of a participant's sentences is presented in Figure 4.

Table 1 presents a coding of the guidance that was required during participant construction of input sentences. These can be largely divided into two areas: parsing the inputs and guidance on specific content of the inputs. The main parsing issue involved a failure of the parser to extract actions from the sentences. We were able to test the user sentences as they were constructed and therefore discover an alternative quickly. In these cases (e.g., 'load' or 'lower' at the beginning of a sentence) we asked the participant to consider using an alternative verb.

The main limitation observed in the user sentences was missing out one of the key actors in the action. It is perhaps unsurprising that in the first sentences for Logistics and Towers of Hanoi the participants often did not mention the starting location when describing moving a card, truck or package, although it should be noted that the participants had been explicitly asked to do so using the Blocksworld example. Missing information was less of a problem in the Tyreworld domain. The most common omission was not mentioning the jack's role in lowering the car.

In Towers of Hanoi, some of the participants described the cards moving between columns, or an enumeration of the cards in the involved stacks. Use of alternative referencing encodings (referential variation) was not observed in other

that were surprisingly good. There were relatively few typos and only one occurrence of entering the wrong event. There was only one occasion where a participant changed the way they were recording a behaviour during a scenario in such a way that the following sentence did not include enough information (from a lack of concentration). Although we had to request more information on several occasions, the participants typically continued to provide this in subsequent descriptions (within that scenario).

In a final step, we normalised user references, e.g., 'point A' and 'A' were both mapped to 'location A'. This was in order to assist with reference disambiguation and parsing, e.g., 'A' is a word and is parsed differently from 'B' or 'C'.

In general, the main limitation of the sentences was specific missing information. Therefore, considering how this information can be recovered from alternative sources, including additional user input, is key future work in extending the applicability of this approach.

Inducing planning models

In this part we take each participant's input separately and learn a domain model using the process as presented.

Identifying behaviour groups. The first stage is clustering the sentences into individual behaviour groupings. Table 2 shows the number of errors in splitting the sentences into behaviours. In Logistics there are three main behaviours and the clustering approach typically divides the sentences accordingly. There were several cases (p6, p7, p8, p10) where the same verb was used for distinct behaviours, e.g., using 'moved' for driving, loading and unloading. However, the clustering was robust to this, although in some of these cases other causes impacted on the performance. This demonstrates the importance of using the roles as part of the similarity measure. In fact the only source of error in Logistics was inconsistency in the sentences. This happened both in verb use (p3) and in different role identifiers (p6, p8, p10).

Whereas in Logistics there seem to be clear distinct behaviours, there is more ambiguity in the other domains. In the Towers of Hanoi examples there are 4 behaviours that can be distinguished: from empty, to empty; from empty, to card; from card, to empty; and from card, to card. In some cases the participants made consistent distinctions and some
domains. Of course these alternative descriptions are valid of these behaviours were isolated. In Table 3 we present the
and more importantly might be appropriate for alternative number of behaviours that were correctly isolated using dif-
model acquisition target languages. ferent values of k in the clustering algorithm. In some cases,
There were certain aspects of the participants’ sentences no distinction in language was made and only a single be-

439
p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p1 p2 p3 p4 p5 p6 p7 p8 p9 p10
1 cluster           Hanoi  R 1R      R∗ R
2 clusters           Logistics        1∗  1
3 clusters       1    Tyres          
4 clusters   1       1
Table 4: Whether the correctly clustered sentences induced a
Table 3: Tower of Hanoi results for participants (p1-p10). model. : a PDDL model was induced; n: some fixes were
For the 4 possible behaviours (i.e. clusters k = 1 to max. required and then a model was induced; ∗: a partial model
4.):  indicates the participant correctly distinguished the was induced before fixes; R: a correct representation was
behaviour;  the number of clusters selected by the average constructed.
silhouette score; ‘1’ indicates a single change was required;
and  failure to distinguish behaviour (see text).
selected by silhouettes). A representative was selected for
each cluster and then a mapping was made onto each mem-
ber of the cluster to identify the best match for each of the
0.6
Avg. Silhouette

(unfiltered) representative’s roles. We set the parameter for


tag filtering at y = 20%.
0.4

As we have seen above (e.g., Table 3) in Towers of Hanoi,


the participants varied in their chosen description strategy.
0.2

When selecting a single cluster, the action headers for partici-


0.0

pants p1, p4, p5, p6, p7 and p8 are equivalent to the PDDL
2 4 6 8 10 12
benchmark model. Participants: p2, p3, p9, p10, made dis-
tinctions between the different behaviours. In these cases
k the short action sequence that they were asked to describe
was insufficient to provide enough examples for LOCM to
Figure 5: Plot of average silhouette scores for different values induce a model properly. However, for participants, p2, p3
of k in Tyreworld (participant p5). and p10, the final operator headers correctly described the
actions. For p9, the rewriting rules used during parsing re-
moved some of the important content and so the resulting
haviour was identified (e.g., p1, p4 and p6). The ‘to empty’ operator headers, while correct for the information, did not
and ‘empty to empty’ behaviours were the most commonly contain all the important objects. In each case except 9, if
distinguished. additional sentences are added (we added 15 sentences) using
The performance of the silhouette selection is not as effec- a consistent method of description and the sentences are sepa-
tive in Tyreworld. The number of samples from each partic- rated into four clusters then a PDDL similar to the 4-operator
ipant is small and this is particularly relevant in Tyreworld Blocksworld model is induced.
where there can be 10 different behaviours (depending on In Logistics the participants used predominantly consistent
participant encoding). It is important to note that there are sentences within each of the three behaviour groups. Out of
7 out of 10 cases where the distance measure distinguished four inconsistencies, there were two cases (p8 and p10) that
behaviours for some value of k. The silhouette plot presented prevented the correct slot fillers to be identified. For example,
in Figure 5 illustrates how close the silhouette scores were one participant mixed ‘onto’ and ‘into’ and these words are
for p5 to a correct partitioning at k=8. not identified as synonyms by the selected sources. However,
In Logistics and Hanoi, the γ (the bias between action p3 changed their explanation of taking a package out of a
name and roles in the distance measure) values: 0.33, 0.5 truck from: ‘The Parcel1 has been taken out of the red truck at
and 0.66, generated the same clusters. In Tyreworld there location C’ to ‘The Parcel2 has been removed from the blue
are small differences in the order the sentences break into truck at location E’. The sources matched both ‘taken’ and
separate clusters as k is increased. However, there is only one ‘removed’ as synonyms, as well as the roles ‘from’ and ‘out’.
change in silhouette score (p2) and by k = 8 (approximately This highlights how the use of synonyms help to generalise
the number of behaviours) all clusters are the same. over some of the variation in descriptions.
In general over the 3 domains there are only 4 instances In Tyreworld the main structure is in sequential applica-
where there is not a valid partitioning for some value of k. bility of operators. One participant captured a more factored
This provides support that the selected distance measure is model, representing the jack moving out from the car boot to
an effective approach for identifying behaviours. However, the ground, round the car and then to a position underneath
choosing amongst correct partitionings is still an interesting the car (with its return journey). This provided a traversal
problem, as it can influence the generality of the induced structure, whereas most of the other descriptions used distin-
model. However, pragmatically selecting one that has least guishing language between the behaviours, e.g., putting the
missing values could be considered. jack in and taking it out of the boot.
Formalising the behaviour representations In this part The learnt PDDL model Once consistent formulations of
we assume that the behaviours have been correctly identified the input NL sentences had been extracted, the resulting ac-
and split into different clusters (i.e., not necessarily the one tion sequences were formatted, as action headers, and input

440
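To make the action-header input format concrete, the following sketch shows how a parsed sentence (a verb, a behaviour-cluster index and its ordered role fillers) could be rendered into headers like those in the Figure 7 listing. This is an illustrative reconstruction, not the authors' code; the function name and the verb_clusterIndex naming convention are assumptions based on the examples shown (e.g., drive_1, move_2).

```python
def to_action_header(verb, cluster_id, role_fillers):
    """Format one parsed sentence as an action header, e.g. 'move_2 Parcel1 location_B red_truck'."""
    # Operator name: representative verb suffixed with its behaviour-cluster
    # index (an assumed convention, matching headers such as drive_1 / move_2).
    name = f"{verb}_{cluster_id}"
    # Normalise multi-word references ('red truck' -> 'red_truck'), echoing the
    # reference-normalisation step applied to the input sentences.
    args = [filler.replace(" ", "_") for filler in role_fillers]
    return " ".join([name] + args)

parsed = [
    ("drive", 1, ["red truck", "location A", "location B"]),
    ("move", 2, ["Parcel1", "location B", "red truck"]),
]
headers = [to_action_header(v, c, roles) for v, c, roles in parsed]
# headers[0] -> "drive_1 red_truck location_A location_B"
```

Sequences of such headers, one per clustered behaviour trace, are the form of input an induction system like LOCM consumes.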
LOCM generates a collection of FSMs that characterise the dynamics of the problem domain. For example, Figure 6 presents the structure identified for the jack object for a participant's sequences for Tyreworld. These FSMs can be used to induce a planning domain model (e.g., the Logistics domain for the sentences of p6 is presented in Figure 1) and, in combination with an action sequence, to generate a problem model. For example, a planning problem was generated using p6's descriptions of the Logistics scenario (a larger example of sentences S1-S4) and Figure 7 presents a plan generated to solve that problem.

Figure 6: Induced FSM for a jack in Tyreworld. The states have been labelled for presentation. [The FSM diagram is not reproduced.]

drive_1 red_truck location_A location_B
move_2 Parcel1 location_B red_truck
drive_1 red_truck location_B location_E
move_3 Parcel1 red_truck location_E
drive_1 red_truck location_E location_C
move_2 Parcel2 location_C red_truck
drive_1 red_truck location_C location_A
move_3 Parcel2 red_truck location_A
move_2 Parcel1 location_E blue_truck
move_3 Parcel1 blue_truck location_E
drive_1 blue_truck location_E location_C

Figure 7: Logistics domain: plan generated from a larger example including sentences S1-S4 (participant p6).

During processing, the mapping from the original sentence to the action template is retained and the parameter positions of the operator are identified in each sentence. This provides us with a template for rewriting a generated plan by fitting the arguments into the sentence. For example, in Figure 7, we present a plan generated using the learnt model. Using the template for the move_2 operator, plan step two can be rewritten as 'move the Parcel1 from location_B into the red_truck' (underscores retained for clarity). A development of this would be to modify the sentence using a language model, e.g., for selecting determiners.

This system presents a step towards a general interface for exploiting planning technologies using only NL.

Related work

The majority of related work has aimed at mapping NL input onto an existing formal representation.

In RoboCup@Home various approaches have been adopted to define the mapping onto the grounded domain representation. For example, Kollar et al. (2013) present a probabilistic approach to learning the referring expressions for robot primitives and physical locations in a region, and Mokhtari, Lopes, and Pinho (2016) present an approach to learning action schemata for high-level robot control.

In (Goldwasser and Roth 2011) the authors present an alternative approach to learning the dynamics of the world, where the NL input provides a direct lesson about part of the dynamics of the environment. For example, the lesson 'You can move any of the top cards to an empty free-cell' is a general rule that applies across several grounded situations. Each lesson is supported by a small training data set (e.g., 20 examples) to support learning from the lessons. In contrast to our approach, their system relies on a representation of the states and actions, which means their NLP approach can target an existing language.

More closely related to our work are attempts to learn planning models in the absence of a target representation. These include Sil and Yates (2011), who used text mining via a search engine to identify documents that contain words that represent target verbs or events, and then used inductive learning techniques to identify appropriate action pre- and post-conditions. Their system was able to learn action representations, although with certain restrictions such as the number of predicate arguments. Branavan et al. (2012) introduce a reinforcement learning approach which uses surface linguistic cues to learn pre-condition relation pairs from text for use during planning. The success of the learnt model relies on use of feedback automatically obtained from plan execution attempts. Yordanova (2016) presents an approach which works with input text solution plans, as a proxy for instructions, and aims to learn pre- and post-condition action representations. However, this approach uses hand-coded representations of the initial and goal state for input plans.

Conclusion and future work

We believe this is the first approach that generates PDDL models directly from NL without an existing target model. Our approach harnesses a selection of existing technologies, including Stanford CoreNLP, several online lexical resources, PAM, and LOCM. In our evaluation we demonstrated that the system can create formalisms from a variety of different NL representations. Although improving the robustness of the approach will be important future work, it should be noted that once a model is generated it can be combined with the large body of existing work that looks at mapping NL onto existing formalisms. In the current approach, separating the sentences into appropriate behaviours plays an important role in determining the quality of the generated PDDL model. Important future work will explore generating models with various granularities and identifying whether they can be supported by the information content of the input sentences.

Another avenue of future work is considering more intelligent ways of dealing with missing information. Our approach relies heavily on a sufficient number and length of fully specified sequences in the input. An interesting approach would be to use a predictive model to estimate the parameter selections, perhaps taking inspiration from the recommendation system approach presented in (Krivic et al. 2016) for predicting initial world object properties. Alternatively, we could target other domain acquisition systems, such as (Mourão, Petrick, and Steedman 2010), that handle noisy data.
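As a companion to the evaluation above, the sentence distance underlying the behaviour clustering (a γ-weighted combination of an action-name term and a role term) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the use of a normalised Levenshtein edit distance for the name term and a Jaccard-style overlap for the roles are assumptions made here (the paper cites Levenshtein (1966); the actual measure may differ).

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (Levenshtein 1966)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def sentence_distance(verb1, roles1, verb2, roles2, gamma=0.5):
    """Distance between two parsed sentences, with gamma biasing between the
    action-name term and the role term (the evaluation reports that gamma
    values 0.33, 0.5 and 0.66 produced the same clusters in Logistics and
    Hanoi)."""
    name_d = levenshtein(verb1, verb2) / max(len(verb1), len(verb2), 1)
    r1, r2 = set(roles1), set(roles2)
    roles_d = 1.0 - len(r1 & r2) / max(len(r1 | r2), 1)
    return gamma * name_d + (1.0 - gamma) * roles_d

# Identical verb and roles give distance 0; a shared-role pair with
# different verbs ('load' vs 'unload') falls strictly between 0 and 1.
```

A matrix of such pairwise distances is the kind of input that a medoid-style clustering with silhouette-based selection of k, as discussed in the evaluation, would operate over.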
Acknowledgements

This work is supported by EPSRC Grant EP/N017447/1.

References

Branavan, S. R. K.; Kushman, N.; Lei, T.; and Barzilay, R. 2012. Learning High-level Planning from Text. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, ACL '12, 126–135. Stroudsburg, PA, USA: Association for Computational Linguistics.

Cresswell, S., and Gregory, P. 2011. Generalised domain model acquisition from action traces. In Proc. of the 21st Int. Conf. on Automated Planning and Scheduling (ICAPS).

Cresswell, S. N.; McCluskey, T. L.; and West, M. M. 2009. Acquisition of Object-Centred Domain Models from Planning Examples. In Proc. of 19th Int. Conf. on Automated Planning and Scheduling (ICAPS).

de Marneffe, M.-C.; MacCartney, B.; and Manning, C. D. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of the Conference on Language Resources and Evaluation, 449–454.

Duda, R. O.; Hart, P. E.; et al. 1973. Pattern Classification and Scene Analysis, volume 3. Wiley New York.

Ersen, M., and Sariel, S. 2015. Learning behaviors of and interactions among objects through spatio-temporal reasoning. Computational Intelligence and AI in Games, IEEE Transactions on 7(1):75–87.

Frank, J. D.; Clement, B. J.; Chachere, J. M.; Smith, T. B.; and Swanson, K. J. 2011. The Challenge of Configuring Model-Based Space Mission Planners. In International Workshop on Planning and Scheduling for Space.

Goldwasser, D., and Roth, D. 2011. Learning from natural instructions. In Proc. of the 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI).

Gregory, P., and Cresswell, S. 2015. Domain Model Acquisition in the Presence of Static Relations in the LOP System. In Proc. of 25th Int. Conf. on Automated Planning and Scheduling (ICAPS), 97–105.

Gregory, P., and Lindsay, A. 2016. Domain Model Acquisition in Domains with Action Costs. In Proc. of the 26th Int. Conf. on Automated Planning and Scheduling (ICAPS).

Hayton, T.; Gregory, P.; Lindsay, A.; and Porteous, J. 2016. Best-fit action-cost domain model acquisition and its application to authorship in interactive narrative. In AAAI Conf. on AI and Interactive Digital Entertainment (AIIDE).

Hoffmann, J.; Weber, I.; and Kraft, F. M. 2012. SAP speaks PDDL: Exploiting a software-engineering model for planning in business process management. Journal of Artificial Intelligence Research 44:587–632.

Kaufmann, L., and Rousseeuw, P. J. 1987. Clustering by means of medoids. Journal of Machine Learning Research.

Klein, D., and Manning, C. D. 2003. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems, volume 15, 3–10.

Kollar, T.; Perera, V.; Nardi, D.; and Veloso, M. 2013. Learning Environmental Knowledge from Task-based Human-robot Dialog. In Proceedings of IEEE International Conference on Robotics and Automation (ICRA).

Krivic, S.; Cashmore, M.; Ridder, B.; and Piater, J. 2016. Initial State Prediction in Planning. In Proc. 31st Workshop of the UK Planning and Scheduling SIG (PlanSIG).

Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Cybernetics and Control Theory 10:707–710.

Lindsay, A.; Charles, F.; Read, J.; Porteous, J.; Cavazza, M.; and Georg, G. 2015. Generation of non-compliant behaviour in virtual medical narratives. In Proc. of the 15th International Conference on Intelligent Virtual Agents (IVA).

Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J. R.; Bethard, S.; and McClosky, D. 2014. The Stanford CoreNLP natural language processing toolkit. In The Annual Meeting of the Association for Computational Linguistics (System Demonstrations), 55–60.

McCluskey, T. L.; Cresswell, S. N.; Richardson, N. E.; and West, M. M. 2009. Automated acquisition of action knowledge. In International Conference on Agents and Artificial Intelligence (ICAART), 93–100.

McDermott, D.; Ghallab, M.; Howe, A.; Knoblock, C.; Ram, A.; Veloso, M.; Weld, D.; and Wilkins, D. 1998. PDDL: The Planning Domain Definition Language. Technical report, Yale University.

Mehta, N.; Tadepalli, P.; and Fern, A. 2011. Efficient Learning of Action Models for Planning. In ICAPS Planning and Learning Workshop (PAL).

Mokhtari, V.; Lopes, L. S.; and Pinho, A. J. 2016. Experience-Based Robot Task Learning and Planning with Goal Inference. In Proc. of the 26th International Conference on Automated Planning and Scheduling (ICAPS).

Mourão, K.; Petrick, R. P. A.; and Steedman, M. 2010. Learning action effects in partially observable domains. In Proc. 19th European Conference on AI (ECAI). IOS Press.

Porteous, J.; Lindsay, A.; Read, J.; Truran, M.; and Cavazza, M. 2015. Automated extension of narrative planning domains with antonymic operators. In Proc. of the Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS).

Richardson, N. E. 2008. An Operator Induction Tool Supporting Knowledge Engineering in Planning. Ph.D. Dissertation, School of Computing and Engineering, University of Huddersfield, UK.

Rousseeuw, P. J. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20:53–65.

Sil, A., and Yates, A. 2011. Extracting STRIPS representations of actions and events. In Recent Advances in Natural Language Processing (RANLP).

Walsh, T. J., and Littman, M. L. 2008. Efficient Learning of Action Schemas and Web-Service Descriptions. In Proc. of 23rd AAAI Conference on Artificial Intelligence.

Yordanova, K. 2016. From Textual Instructions to Sensor-based Recognition of User Behaviour. In Proc. of 21st Int. Conf. on Intelligent User Interfaces, IUI Companion. ACM.
