Final Part Load HAPS2
1. From the data store, load the exercise5.txt file in the GUI. Also load the
playercontext.jape file from the Example 5 folder. Before you run this JAPE grammar, look
at the text in exercise5.txt and write down what you would expect as the result of running
the program with the “Brill”, “Appelt” and “All” control styles. Then run the program,
check the results against your assumptions, and try to understand any differences.
2. The “appelt” control style is the most appropriate for named entity recognition as under
“appelt” only one rule can fire for the same pattern. Do you agree?
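To make the difference between the control styles concrete, here is a minimal sketch of a grammar in which two rules can match overlapping text. The phase name, the rule names, the Player annotation type and the priorities are illustrative assumptions and are not taken from playercontext.jape. Under appelt, only the longest match fires (PlayerLong when the word “player” follows the name); Priority is consulted only when two matches have the same length. Changing the Options line to control = brill or control = all lets both rules fire on the same text.

```jape
Phase: ControlStyleSketch
Input: Lookup Token
Options: control = appelt

// Hypothetical rule: matches the person name on its own
Rule: PlayerShort
Priority: 10
(
  {Lookup.majorType == Person}
):p
-->
:p.Player = {rule = "PlayerShort"}

// Hypothetical rule: matches the person name followed by the word "player"
Rule: PlayerLong
Priority: 20
(
  {Lookup.majorType == Person}
  {Token.string == "player"}
):p
-->
:p.Player = {rule = "PlayerLong"}
```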
We can use macros in a JAPE grammar to the same effect as in programming languages.
In Example 2 of this tutorial, the reusable pattern Lookup.majorType == Person can be
converted into a macro. Find the PersonMacro.jape file in the Example 5 folder and load
it into the GUI. Run the JAPE transducer on the Example 6.txt file and inspect the
results. You will get the same results as with nestedpattern.jape.
Phase: PersonMacro
Input: Lookup Token
// Note that we use both Lookup and Token inside our rules.
Options: control = appelt
Macro: PERSON
(
{Lookup.majorType == Person}
{Token.string == "player"}
)
:temp
(
PERSON
|
(
{Token.kind==word, Token.category==NNP, Token.orth==upperInitial}
{Token.kind==word, Token.category==NNP, Token.orth==upperInitial}
) )
:player -->
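For reference, a complete macro-based grammar could look like the sketch below. It is an illustration under assumptions rather than the contents of PersonMacro.jape: the phase name, the rule name PlayerByMacro and the output annotation type Player are hypothetical. The macro PERSON is defined once and then used by name inside the rule, exactly where the pattern it stands for would otherwise be written out.

```jape
Phase: PersonMacroSketch
Input: Lookup Token
Options: control = appelt

// Reusable pattern: any Lookup that comes from a Person gazetteer list
Macro: PERSON
(
  {Lookup.majorType == Person}
)

// Hypothetical rule: a person (via the macro) or two capitalised proper
// nouns, followed by the word "player"
Rule: PlayerByMacro
(
  (
    PERSON
    |
    (
      {Token.kind == word, Token.category == NNP, Token.orth == upperInitial}
      {Token.kind == word, Token.category == NNP, Token.orth == upperInitial}
    )
  ):player
  {Token.string == "player"}
)
-->
:player.Player = {rule = "PlayerByMacro"}
```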
The following shows how to use the negation operator in a JAPE grammar. Let’s take an
example that demonstrates why a negation operator is needed in entity extraction.
Suppose we are looking for titles in the text but are specifically not interested in the
title “Sir”. The correct rule should not detect the title “Sir” in the following story text
(Example 7.txt) but should detect “Mr”:
Park Ji Sung and Jonny Evans are expected to commit their long-term
futures to Manchester United in the coming weeks as the English, European
and world champions continue to plan for life after Sir Alex Ferguson.
Rule: negationop
(
:TitleNotStartingWithS
(
{Lookup.majorType == "Person"}
):person
-->
:TitleNotStartingWithS.Title = {rule= "negationop" }, :person.Person =
{rule= "negationop" }
The line:
will ignore the title “Sir” and ensure that the person is annotated only once, because
the rule is applied as a whole.
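As a sketch of how such a negated constraint might look, the rule below uses the JAPE != operator to match a title Lookup whose underlying token is not “Sir”. This is an illustration under assumptions, not the original grammar: the phase name, the rule name and the majorType value Title are hypothetical, and both Lookup and Token must be listed in the Input line of the phase for the combined constraint to work.

```jape
Phase: NegationSketch
Input: Lookup Token
Options: control = appelt

Rule: TitleButNotSir
(
  // a title Lookup starting at a Token whose string is not "Sir"
  {Lookup.majorType == "Title", Token.string != "Sir"}
):title
(
  {Lookup.majorType == "Person"}
):person
-->
:title.Title = {rule = "TitleButNotSir"},
:person.Person = {rule = "TitleButNotSir"}
```

Because the title and the person are matched by one rule, a text such as “Sir Alex Ferguson” produces no match at all from this rule, while a title such as “Mr” followed by a person name yields one Title and one Person annotation.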
The RHS of a JAPE rule can consist of any Java code. This is useful for removing
temporary annotations and for percolating and manipulating features from previous
annotations identified by the LHS. The example text story (Example 8.txt) we are using for
this example is:
We would ideally like to annotate the name of the team with the label “Team” and also
give that annotation the property “teamOfSport”, whose value is already available
through the Lookup.
The JAPE grammar that achieves this is usingJAVAinRHS.jape.
Phase:usingJAVAinRHS
Input: Lookup
Options: control = all
Rule: javainRHS1
(
{Lookup.majorType == Team}
)
:team
-->
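{
// Annotations bound to the "team" label on the LHS
gate.AnnotationSet team = (gate.AnnotationSet)bindings.get("team");
// The first (here, the only) Lookup annotation in that set
gate.Annotation teamAnn = (gate.Annotation)team.iterator().next();
gate.FeatureMap features = Factory.newFeatureMap();
// Copy the minorType value from the gazetteer into the teamOfSport feature
features.put("teamOfSport", teamAnn.getFeatures().get("minorType"));
features.put("rule","javainRHS1");
// Create the Team annotation over the same span as the match
outputAS.add(team.firstNode(), team.lastNode(), "Team",features);
}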
The rule matches a team’s name, e.g. “Manchester United”, and adds a teamOfSport
feature whose value is the minorType of the gazetteer list in which the name was found.
We first get the annotations bound to the team label (i.e. the Lookup annotation). We
then take the first annotation from this set and store it in a variable called “teamAnn”,
and create a new FeatureMap to hold the new features. Then we read the value of the
minorType feature from the teamAnn annotation (in this case “Football_Club”) and add
this value to a new feature called “teamOfSport”. We create another feature, “rule”, with
the value “javainRHS1”. Finally, we add all the features to a new annotation, “Team”,
which attaches to the same nodes as the original “team” binding.
Note that inputAS and outputAS represent the input and output annotation sets.
Normally these are the same (by default, when using ANNIE, both are the “Default”
annotation set). However, because the user is at liberty to change the input and output
annotation sets in the parameters of the JAPE transducer at runtime, it cannot be
guaranteed that they will be the same, and we must therefore specify which annotation
set we are referring to.
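The distinction matters most when the Java code deletes annotations rather than creating them. The sketch below removes a temporary annotation once it is no longer needed. The phase name, the rule name and the annotation type TempTeam are hypothetical; the removal is done on inputAS because that is the set in which the matched annotations were found.

```jape
Phase: CleanupSketch
Input: TempTeam
Options: control = all

// Hypothetical rule: delete every TempTeam annotation
Rule: RemoveTempTeam
(
  {TempTeam}
):temp
-->
{
  // Annotations bound to the "temp" label; they live in the input set
  gate.AnnotationSet temp = (gate.AnnotationSet)bindings.get("temp");
  // Remove them from the set they were matched in
  inputAS.removeAll(temp);
}
```

If the transducer is configured with different input and output sets, calling outputAS.removeAll(temp) instead would leave the matched annotations in place, since they are not members of the output set.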
So far, each JAPE grammar has done its job in isolation; however, it is easy to imagine
real-world scenarios in which you want these grammars to work together to achieve a
complex task. To achieve this, the list of phases can be specified (in the order in which
they are to be run) in a file, conventionally named main.jape. When loading the
grammar into GATE, it is only necessary to load this main file – the phases will then be
loaded automatically. It is, however, possible to omit this main file and just load the
phases individually, but this is much more time-consuming. The grammar phases do not
need to be located in the same directory as the main file, but if they are not, the relative
path should be specified for each phase.
One of the main reasons for using a sequence of phases is that a pattern can only be used
once in each phase, but it can be reused in a later phase. Combined with the fact that
priority can only operate within a single grammar, this can be exploited to help deal with
ambiguity issues. The solution currently adopted is to write a grammar phase for each
annotation type, or for each combination of similar annotation types, and to create
temporary annotations. These temporary annotations are accessed by later grammar phases,
and can be manipulated as necessary to resolve ambiguity or to merge consecutive
annotations. The temporary annotations can either be removed later, or left and simply
ignored. Generally, annotations about which we are more certain are created earlier on.
Annotations which are more dubious may be created temporarily, and then manipulated by
later phases as more information becomes available.
Note the difference between the syntax of main.jape and that of the other JAPE files,
each of which contains a single phase.
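A minimal main.jape might look like the sketch below. The MultiPhase name TutorialGrammars is a hypothetical choice, and the listing assumes that PersonMacro.jape and usingJAVAinRHS.jape sit in the same directory as main.jape. Each entry under Phases is the name of a grammar file without the .jape extension, and the phases run in the order listed.

```jape
MultiPhase: TutorialGrammars
Phases:
PersonMacro
usingJAVAinRHS
```

Unlike a single-phase file, main.jape starts with MultiPhase instead of Phase and contains no Input line, Options line or rules of its own.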
The following is a more complex example using Java on the RHS. To explain what we are
after, we will use the following text story (Example 10.txt).
The aim here is to generate the full name from this information (Lookup annotation) and
at the same time record the gender of each such person as a property. We are after
something like:
Rule: FirstName
// Fred
(
{Lookup.majorType == person_first}
):person -->
{
gate.AnnotationSet person = (gate.AnnotationSet)bindings.get("person");
gate.Annotation personAnn = (gate.Annotation)person.iterator().next();
gate.FeatureMap features = Factory.newFeatureMap();
if(anAnnot.getEndNode().getOffset().equals(personAnn.getEndNode().getOffset())){
ambig = !gender.equals(anAnnot.getFeatures().get("minorType")); }
}
if(!ambig) features.put("gender", gender);