OwlExporter
Guide for Users and Developers
René Witte
Ninus Khamis
Release 1.0-beta2
May 16, 2010
Semantic Software Lab
Concordia University
Montréal, Canada
http://www.semanticsoftware.info
Contents
1 Introduction to the OwlExporter
1.1 Overview
1.2 How to read this documentation
2 Installing the OwlExporter
2.1 Prerequisites
2.2 Download
2.3 Compilation
2.4 Importing the OwlExporter
3 Using the OwlExporter
3.0.1 Example Ontologies
3.0.2 ANNIE + OwlExporter + Individuals
3.0.3 ANNIE + OwlExporter + Relationships
3.0.4 ANNIE + OwlExporter + Coreference Chains
4 OwlExporter Implementation Notes
4.0.5 Multiple Document Exportation
4.0.6 Duplicate Instances
4.0.7 Annotations Sets versus Annotations
About this document
This document provides the instructions needed to install and run the OwlExporter
in a GATE (Cunningham et al., 2002) application in order to export the entities of a text
corpus as individuals, and their relationships as datatype and object properties, of a
Web Ontology Language (OWL) model (Baader et al., 2007).
Acknowledgments
The following developers contributed to the design and implementation of the OwlExporter:
René Witte, Ninus Khamis, Qiangqiang Li and Thomas Kappler.
License
The OwlExporter and its resources are published under the GNU Affero General Public License
v3 (AGPL3).¹
¹ AGPL3, http://www.fsf.org/licensing/licenses/agpl-3.0.html
Chapter 1
Introduction to the OwlExporter
1.1 Overview
The General Architecture for Text Engineering (GATE) is a tool used for the processing of
natural language documents.¹
The GATE framework architecture can be extended using plug-ins known as a Collection
of REusable Objects for Language Engineering (CREOLE). There are three types of CREOLE
resources:
Language Resources (LRs): Entities such as lexicons, corpora or ontologies.
Processing Resources (PRs): Entities that are primarily algorithmic, such as parsers,
generators, or ngram modelers.
Visual Resources (VRs): Visualization and editing components that participate in GUIs.
The OwlExporter is a Processing Resource that makes it possible to largely automate ontology
population from text using a GATE application. In addition to domain-specific concepts, the
OwlExporter is also capable of exporting NLP concepts, such as Sentence, NP and VP, and of
integrating reasoning support for entities that reappear within a given text and are linked
together using coreference chains.
Figure 1.1: The OwlExporter Processing Resource
Originally developed by the IPD research group at Karlsruhe University, the OwlExporter is
currently being maintained by the Semantic Software Lab at Concordia University.
¹ GATE, http://gate.ac.uk
1.2 How to read this documentation
To get the OwlExporter up and running within a GATE environment, we recommend that
you read the End Users documentation. To extend the OwlExporter you will need to read
the Developers section:
End Users: Please refer to the OwlExporter installation and usage Chapters (2 and 3)
Developers: Please refer to the OwlExporter developer’s notes (Chapter 4)
Chapter 2
Installing the OwlExporter
Note: At present, the installation has been tested under Linux and Windows.
The OwlExporter is implemented using the Protégé 3.4.2 OWL-API, and the GATE 5.2
library.
2.1 Prerequisites
To deploy the OwlExporter, a number of other (open source) components are needed. The
current distribution comes with pre-compiled libraries for the OwlExporter, but to compile
and run them you will still need to install the following:
Needed Software:
1. Apache ANT, http://ant.apache.org/
2. Sun JDK 6, http://java.sun.com/javase/downloads/index.jsp
3. GATE v5.0 or newer, http://gate.ac.uk/
4. Protégé-OWL API, http://protege.stanford.edu/plugins/owl/api/
2.2 Download
To download the latest version of the OwlExporter, this documentation and the GATE demo
pipeline, please refer to http://www.semanticsoftware.info/owlexporter.
2.3 Compilation
Note: The GATE and Protégé lib folders MUST be part of the operating system’s environment
variable for the OwlExporter to compile and run successfully.
The implementation of the OwlExporter is located in OwlExporterV2/. To compile the
OwlExporter, follow these steps:
1. cd to OwlExporterV2
2. ant jar.
After performing the steps mentioned above, a Java Archive called OwlExporterV2.jar is
built. This will contain the binaries needed to include the OwlExporter as a PR within GATE.
2.4 Importing the OwlExporter
In order to run the OwlExporter we first need to import the PR into the GATE environment.
Plugins are added to the GATE environment using the “Plugin Management Console”. The
console can be accessed either by selecting “File - Manage CREOLE plugins” from the menu
or by clicking the “Manage CREOLE plugins” toolbar item.
Figure 2.1: GATE Known CREOLE Window
When in the “Plugin Management Console” screen, click on the “Add a new CREOLE
Repository” button and using the “Open File” dialog box navigate to the directory containing
the OwlExporterV2.jar file.
After selecting the directory that contains the jar file, the “OwlExporterV2” plugin will be
added to the list of “Known CREOLE Directories”. To begin working with the OwlExporter we
will also need to select the “Load now” and/or the “Load Always” options. After completing
the previous steps click on the “OK” button in order for the OwlExporter to be successfully
added to the GATE environment.
Note: If the installation of the OwlExporter plug-in was unsuccessful, the stack trace dis-
playing the nature of the error can be viewed in the “Messages” tab of the GATE IDE.
Chapter 3
Using the OwlExporter
In Section 3.0.2 we explain what is needed to export the entities of a given annotation as
individuals of a specific concept in a domain and an NLP ontology, using an ANNIE
(Cunningham et al., 2002) pipeline, an ANNIE domain ontology, an NLP ontology, and the
OwlExporter. In Section 3.0.3 we export additional entities from annotations that are not
specific to the ANNIE pipeline. More importantly, we explain what is needed to export the
relationships created by a GATE application as datatype and object properties of a domain
and an NLP ontology.
Note: To download the latest version of the GATE demo pipeline, and the ontologies, please
refer to http://www.semanticsoftware.info/owlexporter.
3.0.1 Example Ontologies
For the demo described below we used two example ontologies:
ANNIE Ontology: For the domain ontology we used the demo.owl¹ taxonomic classification
that models commonly used concepts in GATE such as Person, Organization and
Location.
NLP Ontology: The NLP ontology used in the demo contains the concepts Document, Sentence
and NP. Additionally the ontology has the object properties containsSentence that has
Document as the domain and Sentence as the range, and contains that has Sentence as
the domain and NP as the range.
3.0.2 ANNIE + OwlExporter + Individuals
In this section we will discuss creating an ANNIE pipeline, including the OwlExporter PR in
the pipeline, and creating three simple Jape grammars that map the entities of a corpus to
the Concepts in the Domain and NLP ontologies.
Creating an ANNIE Pipeline
The ANNIE system is a set of PRs that rely on finite state algorithms and JAPE transducers
to perform IE tasks. For a detailed description of ANNIE, please refer to Chapter 8 of the
GATE User Guide (Cunningham et al., 2006). For our demo pipeline we will begin with an
ANNIE system that uses the default parameters. In order to create an ANNIE pipeline simply
select “File - Load ANNIE System - With Defaults” from the menu or click on the green ANNIE
symbol from the toolbar and select “With Defaults”. When creating an ANNIE system we are
presented with a pipeline containing the following PRs:
¹ demo.owl is bundled with GATE and can be accessed from the GATE_HOME\plugins\Ontology_Tools\resources
directory
1. Document Reset PR
2. ANNIE English Tokenizer
3. ANNIE Gazetteer
4. ANNIE Sentence Splitter
5. ANNIE POS Tagger
6. ANNIE NE Transducer
7. ANNIE OrthoMatcher
Creating the Mapping Grammar
In this section we will discuss the mapping Jape grammars that are in charge of creating the
temporary annotations and their related features required by the OwlExporter to export the
entities of a corpus as individuals of an ontology.
OwlExportClassDomain: The OwlExporter uses the OwlExportClassDomain annotation to
identify the domain annotations of a corpus that need to be exported as instances of
their related concepts in the domain Ontology.
OwlExportClassNLP: The OwlExporter uses the OwlExportClassNLP annotation to identify
the NLP annotations of a corpus that need to be exported as instances of their related
concepts in the NLP Ontology.
Both the OwlExportClassDomain and OwlExportClassNLP annotations must contain the
features shown in Table 3.1.
Feature            Description
className          This feature is of type String and must match the name of the Concept in the Ontology that the annotated text in the corpus is an instance of.
instanceName       This feature is of type String and is the normalized form of the annotated text that needs to be exported.
corefChain         This feature is of type Integer and is the ID of the FuzzyChain.
representationId   This feature is of type Integer and is the ID of the annotation that needs to be exported.
Table 3.1: Features needed by the OwlExportClassDomain and the OwlExportClassNLP annotations
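The feature requirements in Table 3.1 can be checked before an annotation is handed to the OwlExporter. The following is a minimal sketch in plain Java, assuming the features are available as a java.util.Map; the class ExportClassFeatureCheck and the method missingFeatures are hypothetical helpers for illustration and are not part of the OwlExporter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the OwlExporter: checks a feature map
// against the feature names listed in Table 3.1.
class ExportClassFeatureCheck {

    static final List<String> REQUIRED = Arrays.asList(
        "className", "instanceName", "corefChain", "representationId");

    // Returns the names from Table 3.1 that are absent from the map.
    // A key that is present with a null value (as corefChain is when no
    // coreference chain exists) counts as present.
    static List<String> missingFeatures(Map<?, ?> features) {
        List<String> missing = new ArrayList<String>();
        for (String name : REQUIRED) {
            if (!features.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<Object, Object> features = new HashMap<Object, Object>();
        features.put("className", "Person");
        features.put("instanceName", "John Smith");
        features.put("representationId", Integer.valueOf(42));
        // Prints the features that still have to be added.
        System.out.println(missingFeatures(features));
    }
}
```

A check of this kind is cheap to run at the end of a mapping grammar and makes a missing feature visible before the export step.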
The Jape grammar shown in Figure 3.1 is a multi-phased Jape transducer that includes the
mention_map_domain_entities, mention_map_nlp_entities and mention_owl_class_export mapping
grammars. To include the Jape grammar in the ANNIE pipeline simply right-click on
“Processing Resources - New - Jape Transducer” and use “Grammar URL” to navigate to the
Jape file.
MultiPhase: Mention
Phases:
mention_map_domain_entities
mention_map_nlp_entities
mention_owl_class_export
Figure 3.1: An Example of the Multi Phased Jape Transducer
The mention_map_domain_entities Jape grammar shown in Figure 3.2 accepts the Person,
Organization and Location annotations and creates the MentionDomain annotation that includes the
className feature needed by the OwlExporter to export entities of a corpus as individuals of
the domain ontology.
Phase: mention_map_domain_entities
Input: Person Organization Location
Options: control = all debug = true
Rule: mention_map_domain_entities
(
  {Person} |
  {Organization} |
  {Location}
)
:ann
-->
{
  AnnotationSet as = (gate.AnnotationSet)bindings.get("ann");
  Annotation ann = (gate.Annotation)as.iterator().next();
  // Skip Person annotations that carry an NMRule feature.
  if(ann.getType().compareToIgnoreCase("Person")==0)
    if(ann.getFeatures().get("NMRule")!=null)
      return;
  FeatureMap features = ann.getFeatures();
  if(ann.getType().compareToIgnoreCase("Person")==0 ||
     ann.getType().compareToIgnoreCase("Location")==0 ||
     ann.getType().compareToIgnoreCase("Organization")==0) {
    // The annotation type doubles as the name of the ontology concept.
    features.put("className", ann.getType());
    outputAS.add(as.firstNode(), as.lastNode(), "MentionDomain", features);
  }
}
Figure 3.2: An Example of Domain Entities Mapping Grammar
The mention_map_nlp_entities Jape grammar shown in Figure 3.3 accepts the Sentence annotation
and creates the MentionNLP annotation that includes the className feature needed by the
OwlExporter to export entities of a corpus as individuals of the NLP ontology.
Phase: mention_map_nlp_entities
Input: Sentence
Options: control = all debug = true
Rule: mention_map_nlp_entities
(
  {Sentence}
)
:ann
-->
{
  AnnotationSet as = (gate.AnnotationSet)bindings.get("ann");
  Annotation ann = (gate.Annotation)as.iterator().next();
  FeatureMap features = ann.getFeatures();
  // The annotation type doubles as the name of the ontology concept.
  features.put("className", ann.getType());
  outputAS.add(as.firstNode(), as.lastNode(), "MentionNLP", features);
}
Figure 3.3: An Example of NLP Entities Mapping Grammar
Note: The value of the “className” feature for both the “OwlExportClassDomain” and
“OwlExportClassNLP” annotations must match the exact name of the mapped-to concept in the
ontology.
The mention_owl_class_export Jape grammar shown in Figure 3.4 accepts the MentionDomain and
MentionNLP annotations created previously. It creates the OwlExportClassDomain annotation
for entities that get exported to the domain ontology, and the OwlExportClassNLP
annotation for entities that get exported to the NLP ontology. Both newly created
annotations include the kind, representationId, instanceName and corefChain features
needed by the OwlExporter.
Phase: mention_owl_class_export
Input: MentionDomain MentionNLP
Options: control = all
Rule: mention_owl_class_export
(
  {MentionDomain} |
  {MentionNLP}
)
:ann
-->
{
  AnnotationSet as = (gate.AnnotationSet)bindings.get("ann");
  Annotation ann = (gate.Annotation)as.iterator().next();
  // Use the annotated text as the instance name.
  String instanceName = "";
  try {
    instanceName = doc.getContent().getContent(
      ann.getStartNode().getOffset(), ann.getEndNode().getOffset()).toString();
  }
  catch(Exception e) {
    // Report the problem instead of silently exporting an empty name.
    System.out.println(e);
  }
  FeatureMap features = ann.getFeatures();
  features.put("kind", "Class");
  features.put("representationId", ann.getId());
  features.put("instanceName", instanceName);
  features.put("corefChain", null);
  if(ann.getType().compareToIgnoreCase("MentionDomain")==0)
    outputAS.add(as.firstNode(), as.lastNode(), "OwlExportClassDomain", ann.getFeatures());
  else
    outputAS.add(as.firstNode(), as.lastNode(), "OwlExportClassNLP", ann.getFeatures());
}
Figure 3.4: An Example of the OwlExportClassDomain and OwlExportClassNLP Grammar
Including the OwlExporter in the ANNIE Pipeline
If the steps explained in Chapter 2 were performed successfully, the OwlExporter can be
imported into the newly created pipeline by right-clicking on Processing Resources and
selecting OwlExporterV2.
To include the OwlExporter in the pipeline, double click on “ANNIE” under “Applications”
and position the OwlExporter PR in the pipeline by first moving it from “Loaded Processing
resources” to “Selected Processing resources”, and secondly positioning within the set of
existing PRs.
Note: The order in which the processing resources execute is important. The OwlExporter
should normally be placed at the end of the pipeline, although the exact position depends on
the design of the GATE application. The services provided by the Jape grammars (JapeMain)
and (NP) discussed herein are used by the OwlExporter, and therefore should ALWAYS execute
before the OwlExporter in the GATE pipeline.
In Figure 3.5 we show the order in which the Processing Resources in the demo pipeline
get executed. Notice that the “NP” and “JapeMain” grammars are executed after the default
ANNIE PRs, and before the OwlExporter.
In Table 3.2 we cover the run-time parameters used by the OwlExporter to process a text
corpus.
Figure 3.5: ANNIE Pipeline Including the OwlExporter
3.0.3 ANNIE + OwlExporter + Relationships
In this section we discuss exporting additional entities that are not specific to the ANNIE
pipeline. More importantly, we discuss what is needed to export relationships created by a
GATE application as datatype and object property relationships within a specific ontology.
Finally, we discuss what is needed to export relationships created by a GATE application as
object property relationships between individuals that belong to two different ontologies,
the domain ontology and the NLP ontology.
Creating the Document and NP Annotations
In order for the OwlExporter to be aware of the documents that were already processed,
a Document concept can be created in either the domain or NLP ontology that contains
document instances that store the information of the documents that have been processed.
In Figure 3.6 we show an example of a Jape grammar that creates a Document annotation
that includes the required features shown below:
title: Used to store the name of the document
source: Used to store the source url of the document
Note: The OwlExporter uses the “title” and “source” information to track which documents
have already been processed. If a document has already been processed a warning message
will be displayed.
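The tracking behaviour described in the note can be pictured as a lookup keyed on the two Document features. The following is only a sketch under that assumption; the class ProcessedDocuments and its register method are hypothetical, and the OwlExporter's actual bookkeeping is not documented here and may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of duplicate-document detection keyed on the
// "title" and "source" features; not the OwlExporter's own code.
class ProcessedDocuments {

    // Each entry is the pair (title, source) of a document seen so far.
    private final Set<List<String>> seen = new HashSet<List<String>>();

    // Returns true the first time a (title, source) pair is registered and
    // false if the same pair was registered before.
    boolean register(String title, String source) {
        return seen.add(Arrays.asList(title, source));
    }

    public static void main(String[] args) {
        ProcessedDocuments docs = new ProcessedDocuments();
        String[][] corpus = {
            {"report.txt", "/corpus/report.txt"},
            {"report.txt", "/corpus/report.txt"}
        };
        for (String[] d : corpus) {
            if (!docs.register(d[0], d[1])) {
                System.out.println("Warning: document already processed: " + d[0]);
            }
        }
    }
}
```

Keying on both features means that two documents with the same name but different source locations are treated as distinct.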
Run-time Parameter     Type       Required  Description
corefChainList         ArrayList  no        The set of coreference chains in the text corpus.
debugFlag              Boolean    yes       Run the OwlExporter with or without debugging output. When set to true the OwlExporter prints various messages during different processing stages.
exportDomainOntology   URL        yes       The location and filename of the output domain ontology.
exportNLP              Boolean    yes       Export an NLP ontology or not.
exportNLPOntology      URL        yes       The location and filename of the output NLP ontology.
importDomainOntology   URL        yes       The path of the file that contains the pre-existing domain ontology that the OwlExporter will use to populate and create the “exportDomainOntology” output ontology.
importNLPOntology      URL        yes       The path of the file that contains the pre-existing NLP ontology that the OwlExporter will use to populate and create the “exportNLPOntology” output ontology.
inputASName            String     no        The annotation set in which the OwlExporter will look for the OwlExportClass and OwlExportRelation annotations.
multiOwlExport         Boolean    yes       Output a single OWL file, or multiple OWL files (Generic, NP, and Coreferencer) that are all imported by a base output ontology.
Table 3.2: Summary of OwlExporter Run-time Parameters
Phase: mention_doc_info
Input: Token
Options: control = Once
Rule: mention_doc_info
({Token})
:ann
-->
{
  try
  {
    // One Document annotation spanning the whole text.
    FeatureMap features = Factory.newFeatureMap();
    features.put("title",doc.getName());
    features.put("source",doc.getSourceUrl().getFile());
    outputAS.add((long) 0,doc.getContent().size(),
      "Document",features);
  }
  catch(InvalidOffsetException ioEX) {
    System.out.println(ioEX);
  }
}
Figure 3.6: An Example of the Doc Info Grammar
We modified the NLP Ontology by adding a “Document” concept, so that the Document
annotations in the GATE application are exported as individuals of that concept.
Note: The “Document” instance can be exported in either the Domain or NLP ontology,
depending on the design of the ontology.
In the NLP Ontology we also created an “NP” concept that the NP annotations in the GATE
application will be exported to. To create the NP annotations, we included the Multilingual
Noun Phrase Extractor (MuNPEx)² in our ANNIE pipeline.
² The latest version of MuNPEx can be downloaded from http://www.semanticsoftware.info/munpex
To export the newly created “Document” and “NP” annotations to the NLP ontology we
modified the mention_map_nlp_entities Jape grammar discussed earlier to accept the newly
created annotations, as shown in Figure 3.7.
Phase: mention_map_nlp_entities
Input: Document Sentence NP
Options: control = all debug = true
Rule: mention_map_nlp_entities
(
  {Document} |
  {Sentence} |
  {NP}
)
:ann
-->
{
  AnnotationSet as = (gate.AnnotationSet)bindings.get("ann");
  Annotation ann = (gate.Annotation)as.iterator().next();
  FeatureMap features = ann.getFeatures();
  features.put("className", ann.getType());
  outputAS.add(as.firstNode(), as.lastNode(), "MentionNLP", features);
}
Figure 3.7: An Example of the Modified NLP Entities Mapping Grammar
Exporting Datatype Property Relationships
To export a datatype property, all that is required is a datatype property in the ontology
whose name matches the name of the annotation feature that needs to be exported.
Additionally, the domain of the datatype property must match the annotation that the feature
belongs to, and the range of the property must match the literal type of the information
being exported.
For example, continuing with the Document example Jape grammar created previously, to
export the title and source information as datatype properties we created the datatype
properties title and source in the ontology, with Document as the domain and xsd:string
as the range. The title and source datatype property relationships are then created from
the features set by the Jape grammar shown in Figure 3.6.
Note: No additional annotations are needed in order for the OwlExporter to create
datatype property relationships.
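The name-matching rule for datatype properties can be illustrated outside of GATE. The sketch below is plain Java under stated assumptions: the ontology's datatype properties are reduced to a map from property name to domain class name, range checking is left out, and the class DatatypePropertyMatch and its assertions method are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of matching annotation features to datatype
// properties by name and domain; not the OwlExporter's own code.
class DatatypePropertyMatch {

    // propertyDomains maps a datatype property name to its domain class.
    // A feature yields an assertion only if a property with the same name
    // exists and its domain equals the annotation type.
    static List<String> assertions(String individual, String annotationType,
            Map<String, Object> features, Map<String, String> propertyDomains) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Object> f : features.entrySet()) {
            String domain = propertyDomains.get(f.getKey());
            if (annotationType.equals(domain) && f.getValue() != null) {
                result.add(individual + " " + f.getKey() + " \"" + f.getValue() + "\"");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> propertyDomains = new LinkedHashMap<String, String>();
        propertyDomains.put("title", "Document");
        propertyDomains.put("source", "Document");

        Map<String, Object> features = new LinkedHashMap<String, Object>();
        features.put("title", "report.txt");
        features.put("source", "/corpus/report.txt");
        features.put("className", "Document"); // no matching property, ignored

        for (String a : assertions("doc1", "Document", features, propertyDomains)) {
            System.out.println(a);
        }
    }
}
```

Features without a matching property, such as className in the usage above, are simply ignored, which is why no extra annotations are needed for datatype properties.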
Exporting Domain and NLP Object Property Relationships
In this section we will discuss the mapping Jape grammars that are in charge of creating the
temporary annotations and their related features required by the OwlExporter to export the
entities of a corpus as relationships of their related instances in an ontology.
Unlike datatype property relationships, separate annotations are required in order for the
OwlExporter to create object property relationships in an ontology. The OwlExporter uses the
OwlExportRelationDomain and OwlExportRelationNLP annotations to identify the entities of a
corpus that need to be exported as object property relationships in a Domain or NLP ontology.
OwlExportRelationDomain: The OwlExporter uses the OwlExportRelationDomain annotation
to identify the domain annotations of a corpus that need to be exported as relationships
of their related individuals in the domain Ontology.
OwlExportRelationNLP: The OwlExporter uses the OwlExportRelationNLP annotation to
identify the NLP annotations of a corpus that need to be exported as relationships of
their related individuals in the NLP Ontology.
Both the OwlExportRelationDomain and OwlExportRelationNLP annotations must contain
the features shown in Table 3.3.
Feature        Description
propertyName   This feature is of type String and must match the name of the related object property in the ontology.
domainId       This feature is of type Integer and is the id of the annotation that is set as the domain of the related object property.
rangeId        This feature is of type Integer and is the id of the annotation that is set as the range of the related object property.
kind           This feature is of type String and is used by the OwlExporter to identify annotations that need to be exported as relationships. The value of this feature is always JAPE.
Table 3.3: Features needed by the OwlExportRelationDomain and the OwlExportRelationNLP annotations
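To make the role of the features in Table 3.3 concrete, the sketch below resolves one relation annotation into a subject, property and object. It is plain Java and rests on assumptions: the exported individuals are reduced to a map from annotation id to individual name, and the class RelationResolver is a hypothetical helper, not the OwlExporter's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of how the features in Table 3.3 identify the
// two individuals and the object property of a relationship.
class RelationResolver {

    // Returns "subject property object", or null if the annotation is not
    // marked with kind = JAPE or one of the ids cannot be resolved.
    static String resolve(Map<String, Object> relation,
            Map<Integer, String> individualsById) {
        if (!"JAPE".equals(relation.get("kind"))) {
            return null;
        }
        Object property = relation.get("propertyName");
        String subject = individualsById.get(relation.get("domainId"));
        String object = individualsById.get(relation.get("rangeId"));
        if (property == null || subject == null || object == null) {
            return null;
        }
        return subject + " " + property + " " + object;
    }

    public static void main(String[] args) {
        Map<Integer, String> individuals = new HashMap<Integer, String>();
        individuals.put(Integer.valueOf(1), "Document_1");
        individuals.put(Integer.valueOf(7), "Sentence_7");

        Map<String, Object> relation = new HashMap<String, Object>();
        relation.put("propertyName", "containsSentence");
        relation.put("domainId", Integer.valueOf(1));
        relation.put("rangeId", Integer.valueOf(7));
        relation.put("kind", "JAPE");

        System.out.println(resolve(relation, individuals));
    }
}
```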
Continuing with our previous JapeMain multi-phased Jape transducer, we added a
new Jape file called mention_map_nlp_relationships that includes two Jape rules,
Document_containsSentence_Sentence (Figure 3.9) and Sentence_contains_NP (Figure 3.10).
The first rule finds all of the Sentences that belong to a specific Document and creates
a relationship between the Document and each Sentence. The second rule finds all of the NPs
within a given Sentence and creates a relationship between the Sentence and each NP.
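Both rules rest on the same idea: select the annotations of one type whose offsets fall inside another annotation. The following plain-Java sketch shows that selection on its own; Span is a hypothetical simplification of a GATE annotation (id, type, start and end offsets), and the class ContainmentSketch is not part of GATE or the OwlExporter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for offset-based containment between annotations.
class ContainmentSketch {

    // Hypothetical, minimal model of an annotation.
    static final class Span {
        final int id;
        final String type;
        final long start;
        final long end;

        Span(int id, String type, long start, long end) {
            this.id = id;
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Returns the ids of all spans of the given type that lie entirely
    // within the outer span's offsets.
    static List<Integer> containedIds(Span outer, List<Span> all, String type) {
        List<Integer> ids = new ArrayList<Integer>();
        for (Span s : all) {
            if (s.type.equals(type) && s.start >= outer.start && s.end <= outer.end) {
                ids.add(Integer.valueOf(s.id));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Span doc = new Span(1, "Document", 0, 100);
        List<Span> all = Arrays.asList(
            new Span(2, "Sentence", 0, 40),
            new Span(3, "Sentence", 41, 100),
            new Span(4, "NP", 5, 12));
        // Each returned id would become the rangeId of a containsSentence
        // relation whose domainId is the Document's id.
        System.out.println(containedIds(doc, all, "Sentence"));
    }
}
```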
MultiPhase: Mention
Phases:
mention_doc_info
mention_map_domain_entities
mention_map_nlp_entities
mention_owl_class_export
mention_map_nlp_relationships
Figure 3.8: An Example of the Multi Phased Jape Transducer
The mention_owl_relation_export Jape grammar shown in Figure 3.11 accepts the
MentionRelationDomain and MentionRelationNLP annotations created previously. It creates the
OwlExportRelationDomain annotation for entities that get exported as relationships to the
domain ontology, and the OwlExportRelationNLP annotation for entities that get exported as
relationships to the NLP ontology. Both newly created annotations include the propertyName,
domainId, rangeId and kind features needed by the OwlExporter.
Note: Even though the “MentionRelationDomain” annotation is never used within our example,
the idea is the same for entities of a corpus that need to be exported as object property
relationships in the domain ontology.
Exporting Object Property Relationships across Domain and NLP Ontologies
In the previous section we explained what is required in order to export object property
relationships within either a Domain or NLP ontology. In this section we will discuss what
is required in order to export object property relationships between the Domain and NLP
ontology.
Similar to object property relationships that are created within a specific ontology, separate
annotations are required in order for the OwlExporter to create object property relationships
Rule: Document_containsSentence_Sentence
(
  {Document}
)
:ann
-->
{
  gate.AnnotationSet docAS = (gate.AnnotationSet)bindings.get("ann");
  gate.Annotation docAnn = (gate.Annotation)docAS.iterator().next();
  // All Sentence annotations that lie within the Document annotation.
  AnnotationSet sentAS = inputAS.getContained(
    docAS.firstNode().getOffset(), docAS.lastNode().getOffset()).get("Sentence");
  String propertyName="containsSentence";
  // One relation annotation per contained Sentence.
  for(Annotation a : sentAS) {
    gate.FeatureMap features = Factory.newFeatureMap();
    features.put("propertyName",propertyName);
    features.put("domainId",docAnn.getId());
    features.put("rangeId",a.getId());
    features.put("kind", "JAPE");
    outputAS.add(
      docAS.firstNode(), docAS.lastNode(),"MentionRelationNLP",features);
  }
}
Figure 3.9: An Example of a Doc/Sent Relationship Grammar
Rule: Sentence_contains_NP
(
  {Sentence}
)
:ann
-->
{
  gate.AnnotationSet sentAS = (gate.AnnotationSet)bindings.get("ann");
  gate.Annotation sentAnn = (gate.Annotation)sentAS.iterator().next();
  // All NP annotations that lie within the Sentence annotation.
  AnnotationSet npAS = inputAS.getContained(
    sentAS.firstNode().getOffset(), sentAS.lastNode().getOffset()).get("NP");
  String propertyName="contains";
  // One relation annotation per contained NP.
  for(Annotation a : npAS) {
    gate.FeatureMap features = Factory.newFeatureMap();
    features.put("propertyName",propertyName);
    features.put("domainId", sentAnn.getId());
    features.put("rangeId",a.getId());
    features.put("kind", "JAPE");
    outputAS.add(npAS.firstNode(), npAS.lastNode(),"MentionRelationNLP",features);
  }
}
Figure 3.10: An Example of a Sent/NP Relationship Grammar
between the domain and NLP ontologies. The OwlExporter uses the OwlExportRelationDomainNLP
annotation to identify the entities of a corpus that need to be exported as object
property relationships between the Domain and NLP ontologies.
For our example we will create an object property relationship appearsIn that links the
domain entities to the sentences they appear in. We began by modifying the Domain ontology
to import the NLP ontology. This enabled us to create the appearsIn relationship in the
domain ontology that has Person, Location, and Organization from the domain ontology as
the domain and Sentence from the NLP ontology as the range. We then added the
mention_owl_domain_nlp_relation grammar shown in Figure 3.12. The Jape rule looks similar to
the grammars discussed earlier, with the exception of the output annotation type, which in
this case is OwlExportRelationDomainNLP, the annotation the OwlExporter uses to export object
Phase: mention_owl_relation_export
Input: MentionRelationDomain MentionRelationNLP
Options: control = all
Rule: mention_owl_class_export
(
  {MentionRelationDomain} |
  {MentionRelationNLP}
)
:ann
-->
{
  AnnotationSet as = (gate.AnnotationSet)bindings.get("ann");
  Annotation ann = (gate.Annotation)as.iterator().next();
  FeatureMap features = ann.getFeatures();
  features.put("kind", "JAPE");
  // The output annotation type selects the target ontology.
  if(ann.getType().compareToIgnoreCase("MentionRelationDomain")==0)
    outputAS.add(as.firstNode(), as.lastNode(), "OwlExportRelationDomain", ann.getFeatures());
  else
    outputAS.add(as.firstNode(), as.lastNode(), "OwlExportRelationNLP", ann.getFeatures());
}
Figure 3.11: An Example of the OwlExportRelationNLP Grammar
property relationships that span the domain and NLP ontologies.
Phase: mention_owl_domain_nlp_relation
Input: Sentence
Options: control = all
Rule: DomainEntity_appearsIn_Sentence
(
  {Sentence}
)
:ann
-->
{
  gate.AnnotationSet sentAS = (gate.AnnotationSet)bindings.get("ann");
  gate.Annotation sentAnn = (gate.Annotation)sentAS.iterator().next();
  // All exported domain entities that lie within the Sentence annotation.
  AnnotationSet domAS = inputAS.getContained(
    sentAS.firstNode().getOffset(), sentAS.lastNode().getOffset()).get("OwlExportClassDomain");
  String propertyName="appearsIn";
  // One cross-ontology relation per domain entity in the sentence.
  for(Annotation a : domAS) {
    gate.FeatureMap features = Factory.newFeatureMap();
    features.put("propertyName",propertyName);
    features.put("domainId", a.getFeatures().get("representationId"));
    features.put("rangeId",sentAnn.getFeatures().get("representationId"));
    features.put("kind", "JAPE");
    outputAS.add(sentAS.firstNode(), sentAS.lastNode(),"OwlExportRelationDomainNLP",features);
  }
}
Figure 3.12: An Example of the OwlExportRelationDomainNLP Grammar
3.0.4 ANNIE + OwlExporter + Coreference Chains
The OwlExporter also supports modelling entities that reappear in different parts of a corpus
and that are linked together using coreference chains. For example, an NE coreferencer such
as the one in ANNIE (Cunningham et al., 2002) can identify the nominal and pronominal
coreferences between entities and create chains that link them together.
Figure 3.13: Coreference Chain Relationships exported into an OWL Ontology
When the coreferencer identifies entities of a corpus as belonging to the same referent or
representative, an annotation (for example, NP Chain) is created that contains the list of
entities that make up the coreference chain. The OwlExporter’s OwlExportClassDomain annotation
accepts a corefChain feature that contains the ID of the coreference chain as shown in 3.2.
By including the corefChain feature the OwlExporter:
Creates the corefSentenceWithId relationship: The corefSentenceWithId relationship associates
the referent in a chain with the sentences containing the occurrences of the coreferences.
Creates the corefStringWithId relationship: The corefStringWithId relationship ties the referent
to the multiple occurrences of its coreferences.
Creates the sameAs relationship: The OwlExporter establishes the links between the
individuals in the ontology using the symmetric owl:sameAs property (Baader et al.,
(2007). This allows the related individuals to be classified by an OWL reasoner as being
equivalent.
Figure 3.14: Entities in a Coreference Chain linked together using the owl:sameAs property
In Figure 3.13 we show the corefSentenceWithId and corefStringWithId relationships created
by the OwlExporter for a coreference chain. Figure 3.14 shows how the OwlExporter models
coreference chains using the owl:sameAs property.
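The owl:sameAs linking described above can be sketched as a grouping of individuals by their corefChain value. The code below is an illustration in plain Java under assumptions that go beyond this guide: it links every later member of a chain to the first member encountered, and the class CorefSameAs and its sameAsLinks method are hypothetical names. How the OwlExporter itself chooses which individuals to link is not specified here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of deriving owl:sameAs links from corefChain ids.
class CorefSameAs {

    // chainOf maps an individual name to its corefChain id; a null id means
    // the individual is not part of any chain. The first individual seen for
    // a chain is linked to every later member of the same chain.
    static List<String> sameAsLinks(Map<String, Integer> chainOf) {
        Map<Integer, String> firstOfChain = new LinkedHashMap<Integer, String>();
        List<String> links = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : chainOf.entrySet()) {
            Integer chain = e.getValue();
            if (chain == null) {
                continue;
            }
            String first = firstOfChain.get(chain);
            if (first == null) {
                firstOfChain.put(chain, e.getKey());
            } else {
                links.add(first + " owl:sameAs " + e.getKey());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps document order, so the first mention wins.
        Map<String, Integer> chainOf = new LinkedHashMap<String, Integer>();
        chainOf.put("Rene_Witte_12", Integer.valueOf(3));
        chainOf.put("Montreal_20", null);
        chainOf.put("Witte_31", Integer.valueOf(3));
        chainOf.put("he_47", Integer.valueOf(3));
        for (String link : sameAsLinks(chainOf)) {
            System.out.println(link);
        }
    }
}
```

Because owl:sameAs is symmetric and transitive, linking each member to a single representative is enough for a reasoner to treat all members of a chain as equivalent.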
Chapter 4
OwlExporter Implementation Notes
In this section we discuss the various requirements taken into consideration when
implementing the OwlExporter. Please note that this is not a formal representation of the
OwlExporter implementation.
4.0.5 Multiple Document Exportation
The OwlExporter is capable of handling a GATE Corpus that contains multiple GATE docu-
ments. The instances and relationships of each GATE document get appended to the output
ontology.
4.0.6 Duplicate Instances
The OwlExporter exports all OwlExportClass annotations from multiple documents, so it is
possible that multiple instances of the same type get exported. We handle this by
underscoring the id of the instance (id_x), and we also use owl:sameAs to make
all occurrences of x the same. Another reason that maintaining a single occurrence of each
instance is not favourable is the coreference chains created by the Coreferencer
and exported to the Ontology.
4.0.7 Annotations Sets versus Annotations
As explained earlier, the OwlExporter only looks for two annotation sets called OwlExportClass
and OwlExportRelation, to create Instances and Relationships expressed by Object Properties,
respectively. This is a much clearer design because it separates the OwlExporter
from the components generating the domain-specific annotations. The OwlExporter becomes
simpler because it no longer needs the list of ASs to export, but handles just one.
The UI gets simpler because the user doesn’t have to use the cumbersome GATE UI to add
the relevant ASs. Authors of components have total flexibility concerning what is exported to
the ontology, because only after all their analysis steps are done do they decide which
instance (annotation) of their annotation set should become the representation, i.e. the
exported instance.
Bibliography
H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: an Architecture for
Development of Robust HLT Applications. Proceedings of the 40th Anniversary Meeting
of the Association for Computational Linguistics (ACL), 2002.
URL http://gate.ac.uk/sale/acl02/acl-main.pdf.
Hamish Cunningham, Diana Maynard, Kalina Bontcheva, Valentin Tablan, Cristian Ursu,
Marin Dimitrov, Mike Dowman, Niraj Aswani, and Ian Roberts. Developing Language
Processing Components with GATE Version 5 (a User Guide). University of Sheffield, February
2006. URL http://gate.ac.uk/sale/tao/.
Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F.
Patel-Schneider. The Description Logic Handbook: Theory, Implementation, and Applications.
Cambridge University Press, 2007.