WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

Seid Muhie Yimam (1,3), Iryna Gurevych (2,3), Richard Eckart de Castilho (2), Chris Biemann (1)
(1) FG Language Technology, Dept. of Computer Science, Technische Universität Darmstadt
(2) Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Dept. of Computer Science, Technische Universität Darmstadt
(3) Ubiquitous Knowledge Processing Lab (UKP-DIPF)
German Institute for Educational Research and Educational Information
http://www.ukp.tu-darmstadt.de

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1–6, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics

Abstract
We present WebAnno, a general purpose web-based annotation tool for a wide range of linguistic annotations. WebAnno offers annotation project management, freely configurable tagsets and the management of users in different roles. WebAnno uses modern web technology for visualizing and editing annotations in a web browser. It supports arbitrarily large documents, pluggable import/export filters, the curation of annotations across various users, and an interface to farming out annotations to a crowdsourcing platform. Currently WebAnno allows part-of-speech, named entity, dependency parsing and co-reference chain annotations. The architecture design allows adding additional modes of visualization and editing, when new kinds of annotations are to be supported.

1 Introduction
The creation of training data precedes any statistical approach to natural language processing (NLP). Linguistic annotation is a process whereby linguistic information is added to a document, such as part-of-speech, lemmata, named entities, or dependency relations. In the past, platforms for linguistic annotations were mostly developed ad-hoc for the given annotation task at hand, used proprietary formats for data exchange, or required local installation effort. We present WebAnno, a browser-based tool that is immediately usable by any annotator with internet access. It supports annotation on a variety of linguistic levels (called annotation layers in the remainder), is interoperable with a variety of data formats, supports annotation project management such as user management, offers an adjudication interface, and provides quality management using inter-annotator agreement. Furthermore, an interface to crowdsourcing platforms enables scaling out simple annotation tasks to a large number of micro-workers. The added value of WebAnno, as compared to previous annotation tools, is on the one hand its web-based interface targeted at skilled as well as unskilled annotators, which unlocks a potentially very large workforce. On the other hand, it is the support for quality control, annotator management, and adjudication/curation, which lowers the entrance barrier for new annotation projects. We created WebAnno to fulfill the following requirements:

• Web-based: Distributed work, no installation effort, increased availability.
• Interface to crowdsourcing: unlocking a very large distributed workforce.
• Quality and user management: Integrated support for different user roles (administrator, annotator, and curator), inter-annotator agreement measurement, data curation, and progress monitoring.
• Flexibility: Support of multiple annotation layers, pluggable import and export formats, and extensibility to other front ends.
• Pre-annotated and un-annotated documents: supporting new annotations, as well as manual corrections of existing, possibly automatic annotations.
• Permissive open source: Usability of our tool in future projects without restrictions, under the Apache 2.0 license.

In the following section, we revisit related work on annotation tools, which only partially fulfill the aforementioned requirements. In Section 3, the architecture as well as usage aspects of our tool are outlined. The scope and functionality summary of WebAnno is presented in Section 4. Section 5 elaborates on several use cases of WebAnno, and Section 6 concludes and gives an outlook on further directions.

2 Related Work
GATE Teamware (Bontcheva et al., 2010) is probably the tool that most closely matches our requirements regarding quality management, annotator management, and support of a large set of annotation layers and formats. It is mostly web-based, but the annotation is carried out with locally downloaded software. An interface to crowdsourcing platforms is missing. The GATE Teamware system is heavily targeted towards template-based information extraction. It sets a focus on the integration of automatic annotation components rather than on the interface for manual annotation. Besides, the overall application is rather complex for average users, requires considerable training and does not offer an alternative simplified interface as it would be required for crowdsourcing.

General-purpose annotation tools like MMAX2 (Müller and Strube, 2006) or WordFreak (Morton and LaCivita, 2003) are not web-based and do not provide annotation project management. They are also not sufficiently flexible regarding different annotation layers. The same holds for specialized tools for single annotation layers, which we cannot list here for the sake of brevity.

With the brat rapid annotation tool (Stenetorp et al., 2012), for the first time a web-based open-source annotation tool was introduced, which supports collaborative annotation for multiple annotation layers simultaneously on a single copy of the document, and is based on a client-server architecture. However, the current version of brat has limitations such as: (i) slowness for documents of more than 100 sentences, (ii) limits regarding file formats, (iii) web-based configuration of tagsets/tags is not possible and (iv) configuring the display of multiple layers is not yet supported. While we use brat's excellent visualization front end in WebAnno, we decided to replace the server layer to support the user and quality management, and monitoring tools as well as to add the interface to crowdsourcing.

3 System Architecture of WebAnno
The overall architecture of WebAnno is depicted in Figure 1. The modularity of the architecture, which is mirrored in its open-source implementation¹, makes it possible to easily extend the tool or add alternative user interfaces for annotation layers that brat is less suited for, e.g. for constituent structure. In Section 3.1, we illustrate how different user roles are provided with different graphical user interfaces, and show the expressiveness of the annotation model. Section 3.2 elaborates on the functionality of the back end, and describes how data is imported and exported, as well as our implementation of the persistent data storage.

Figure 1: System architecture, organized in user, front end, back end and persistent data storage.

3.1 Front End
All functionality of WebAnno is accessible via a web browser. For annotation and visualization of annotated documents, we adapted the brat rapid annotation tool. Changes had to be made to make brat interoperate with Apache Wicket, on which WebAnno is built, and to better integrate into the WebAnno experience.

3.1.1 Project Definition
The definition and the monitoring of an annotation project are conducted by a project manager (cf. Figure 1) in a project definition form. It supports creating a project, loading un-annotated or pre-annotated documents in different formats², adding annotator and curator users, defining tagsets, and configuring the annotation layers. Only a project manager can administer a project. Figure 2 illustrates the project definition page with the tagset editor highlighted.

¹ Available for download at (this paper is based on v0.3.0): webanno.googlecode.com/
² Formats: plain text, CoNLL (Nivre et al., 2007), TCF (Heid et al., 2010), UIMA XMI (Ferrucci and Lally, 2004)
Figure 2: The tagset editor on the project definition page
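As an illustration only, the project definition of Section 3.1.1 can be pictured as a small data structure in which members carry roles, tagsets are attached to annotation layers, and only a project manager may change either. The sketch below is in Python rather than Java, and every name in it (`Role`, `Project`, `add_member`, `define_tagset`, the layer name `named_entity`, the example tags) is a hypothetical choice made for this example, not part of the WebAnno code base.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Role names follow the roles mentioned in the text; the enum itself is hypothetical.
    MANAGER = "manager"
    ANNOTATOR = "annotator"
    CURATOR = "curator"


@dataclass
class Project:
    name: str
    members: dict = field(default_factory=dict)  # user name -> set of Role
    tagsets: dict = field(default_factory=dict)  # layer name -> list of tags

    def _require_manager(self, user):
        # Only a project manager can administer a project.
        if Role.MANAGER not in self.members.get(user, set()):
            raise PermissionError(f"{user} is not a manager of {self.name}")

    def add_member(self, acting_user, user, role):
        self._require_manager(acting_user)
        self.members.setdefault(user, set()).add(role)

    def define_tagset(self, acting_user, layer, tags):
        self._require_manager(acting_user)
        self.tagsets[layer] = list(tags)


project = Project("demo", members={"alice": {Role.MANAGER}})
project.add_member("alice", "bob", Role.ANNOTATOR)
project.define_tagset("alice", "named_entity", ["PER", "LOC", "ORG"])
```

In this sketch, a call such as `project.define_tagset("bob", "pos", ["NN"])` raises `PermissionError`, because `bob` only holds the annotator role.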

3.1.2 Annotation
Annotation is carried out with an adapted version of the brat editor, which communicates with the server via Ajax (Wang et al., 2008) using the JSON (Lin et al., 2012) format. Annotators only see projects they are assigned to. The annotation page presents the annotator with different options to set up and customize the annotation environment:

• Paging: For heavily annotated documents or very large documents, the original brat visualization is very slow, both for displaying and annotating the document. We use a paging mechanism that limits the number of sentences displayed at a time to make the performance independent of the document size.
• Annotation layers: Annotators usually work on one or two annotation layers, such as part-of-speech and dependency or named entity annotation. Overloading the annotation page by displaying all annotation layers makes the annotation and visualization process slower. WebAnno provides an option to configure visible/editable annotation layers.
• Immediate persistence: Every annotation is sent to the back end immediately and persisted there. An explicit interaction by the user to save changes is not required.

3.1.3 Workflow
WebAnno implements a simple workflow to track the state of a project. Every annotator works on a separate version of the document, which is set to the state in progress the first time a document is opened by the annotator. The annotator can then mark it as complete at the end of annotation, at which point it is locked for further annotation and can be used for curation. Such a document cannot be changed anymore by an annotator, but can be used by a curator. A curator can mark a document as adjudicated.

3.1.4 Curation
The curation interface allows the curator to open a document and compare annotations made by the annotators that already marked the document as complete. The curator reconciles the annotations with disagreements. The curator can either decide on one of the presented alternatives, or freely re-annotate. Figure 3 illustrates how the curation interface detects sentences with annotation disagreement (left side of Figure 3), which can be used to navigate to the sentences for curation.

3.1.5 Monitoring
WebAnno provides a monitoring component to track the progress of a project. The project manager can check the progress and compute agreement with Kappa and Tau (Carletta, 1996) measures. The progress is visualized using a matrix of annotators and documents displaying which documents the annotators have marked as complete and which documents the curator adjudicated. Figure 4 shows the project progress, the progress of individual annotators and completion statistics.

Figure 3: Curation user interface (left: sentences with disagreement; right: merging editor)
Figure 4: Project monitoring
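The workflow of Section 3.1.3 and the progress matrix of Section 3.1.5 can be sketched together as a small state machine over per-annotator document copies. The Python code below is a hedged illustration, not the WebAnno implementation: the class and function names are hypothetical, the initial `NEW` state is an assumption (the text only names the states in progress and complete), and the curator's adjudicated flag is left out for brevity.

```python
from enum import Enum


class State(Enum):
    NEW = "new"  # assumed initial state before the first opening
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"


class AnnotationDocument:
    """One annotator's private copy of a source document."""

    def __init__(self, annotator, document):
        self.annotator = annotator
        self.document = document
        self.state = State.NEW
        self.annotations = []

    def open(self):
        # The first opening moves the copy to "in progress".
        if self.state is State.NEW:
            self.state = State.IN_PROGRESS

    def annotate(self, annotation):
        # A completed copy is locked for the annotator.
        if self.state is State.COMPLETE:
            raise PermissionError("document is locked after completion")
        self.open()
        self.annotations.append(annotation)

    def mark_complete(self):
        self.state = State.COMPLETE


def progress_matrix(copies):
    """Map annotator -> {document -> state label} for a monitoring view."""
    matrix = {}
    for copy in copies:
        matrix.setdefault(copy.annotator, {})[copy.document] = copy.state.value
    return matrix


first = AnnotationDocument("annotator1", "doc1")
second = AnnotationDocument("annotator2", "doc1")
first.annotate(("named_entity", 0, 2, "PER"))
first.mark_complete()
matrix = progress_matrix([first, second])
```

Keeping one copy per annotator means that locking a copy on completion never blocks other annotators, which matches the per-user documents described in the text.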

3.1.6 Crowdsourcing
Crowdsourcing is a way to quickly scale annotation projects. Distributing a task that would otherwise be performed by a controlled user group has become much easier. Hence, if quality can be ensured, it is an alternative to high quality annotation by using a large number of arbitrary redundant annotations (Wang et al., 2013). For WebAnno, we have designed an approach where a source document is split into small parts that get presented to micro-workers in the CrowdFlower platform³. The crowdsourcing component is a separate module that handles the communication via CrowdFlower's API, the definition of test items and job parameters, and the aggregation of results. The crowdsourced annotation appears as a virtual annotator in the tool.

Since it is not trivial to express complex annotation tasks in comparably simple templates suitable for crowdsourcing (Biemann, 2013), we proceed by working out crowdsourcing templates and strategies per annotation layer. We currently only support named entity annotation with predefined templates. However, the open and modular architecture allows adding more crowdsourced annotation layers.

3.2 Back End
WebAnno is a Java-based web application that may run on any modern servlet container. In memory and on the file system, annotations are stored as UIMA CAS objects (Ferrucci and Lally, 2004). All other data is persisted in an SQL database.

3.2.1 Data Conversion
WebAnno supports different data models that reflect the different communication of data between the front end, back end, and the persistent data storage. The brat data model is used to exchange data between the front end and the back end. The documents are stored in their original formats. For annotations, we use the type system from the DKPro Core collection of UIMA components (Eckart de Castilho and Gurevych, 2009)⁴. This is converted to the brat model for visualization. Importing documents and exporting annotations is implemented using UIMA reader and writer components from DKPro Core as plug-ins. Thus, support for new formats can easily be added. To provide quick reaction times in the user interface, WebAnno internally stores annotations in a binary format, using the SerializedCasReader and SerializedCasWriter components.

3.2.2 Persistent Data Storage
Project definitions including project name and descriptions, tagsets and tags, and user details are kept in a database, whereas the documents and annotations are stored in the file system. WebAnno supports limited versioning of annotations, to protect against the unforeseen loss of data. Figure 5 shows the database entity relation diagram.

³ www.crowdflower.com
⁴ code.google.com/p/dkpro-core-asl/
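The plug-in approach to import and export described in Section 3.2.1 can be pictured as a registry that maps a format name to a reader or a writer, so that supporting a new format only means registering one more function. The Python sketch below is an analogy under stated assumptions: it does not use UIMA or DKPro Core, all names (`READERS`, `WRITERS`, `reader`, `writer`, `convert`) are hypothetical, and the `tsv` format is a made-up two-column token/tag layout rather than the real CoNLL format.

```python
READERS = {}  # format name -> function(str) -> list of token dicts
WRITERS = {}  # format name -> function(list of token dicts) -> str


def reader(fmt):
    """Decorator that registers an import plug-in under a format name."""
    def register(func):
        READERS[fmt] = func
        return func
    return register


def writer(fmt):
    """Decorator that registers an export plug-in under a format name."""
    def register(func):
        WRITERS[fmt] = func
        return func
    return register


@reader("text")
def read_plain_text(data):
    # Un-annotated input: whitespace tokens without part-of-speech tags.
    return [{"token": tok, "pos": None} for tok in data.split()]


@reader("tsv")
def read_tsv(data):
    # Pre-annotated input: one "token<TAB>tag" pair per line.
    tokens = []
    for line in data.splitlines():
        if line.strip():
            token, pos = line.split("\t")
            tokens.append({"token": token, "pos": pos})
    return tokens


@writer("tsv")
def write_tsv(tokens):
    # Missing tags are written as an underscore placeholder.
    return "\n".join(f"{t['token']}\t{t['pos'] or '_'}" for t in tokens)


def convert(data, source_format, target_format):
    """Read with one plug-in, write with another."""
    return WRITERS[target_format](READERS[source_format](data))


exported = convert("We annotate", "text", "tsv")
```

Because the shared in-memory representation sits between readers and writers, each new format needs one reader and one writer instead of a converter for every pair of formats.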

Figure 5: WebAnno database scheme

4 Scope and Functionality Summary
WebAnno supports the production of linguistically annotated corpora for different natural language processing applications. WebAnno implements ease of usage and simplicity for untrained users, and provides:

• Annotation via a fast and easy-to-use web-based user interface.
• Project and user management.
• Progress and quality monitoring.
• Interactive curation by adjudicating disagreeing annotations from multiple users.
• Crowdsourcing of annotation tasks.
• Configurable annotation types and tag sets.

5 Use Cases
WebAnno currently allows configuring different span and arc annotations. It comes pre-configured with the following annotation layers from the DKPro Core type system:

Span annotations
• Part-of-Speech (POS) tags: an annotation task on tokens. Currently, POS can be added to a token, if not already present, and can be modified. POS annotation is a prerequisite of dependency annotation (Figure 6).
• Named entities: a multiple span annotation task. Spans can cover multiple adjacent tokens, nest and overlap (Figure 7), but cannot cross sentence boundaries.

Arc annotations
• Dependency relations: This is an arc annotation which connects two POS tag annotations with a directed relation (Figure 6).
• Co-reference chains: The co-reference chain is realized as a set of typed mention spans linked by typed co-reference relation arcs. The co-reference relation annotation can cross multiple sentences and is represented in co-reference chains (Figure 7).

Figure 6: Parts-of-speech & dependency relations
Figure 7: Co-reference & named entities

The brat front end supports tokens and sub-tokens as a span annotation. However, tokens are currently the minimal annotation units in WebAnno, due to a requirement of supporting the TCF file format (Heid et al., 2010). Part-of-speech annotation is limited to single tokens, while named entity and co-reference chain annotations may span multiple tokens. Dependency relations are implemented in such a way that the arc is drawn from the governor to the dependent (or the other way around, configurable), while co-reference chains are unidirectional and a chain is formed by referents that are transitively connected by arcs.

Based on common practice in manual annotation, every user works on their own copy of the same document so that no concurrent editing occurs. We also found that displaying all annotation layers at the same time is inconvenient for annotators. This is why WebAnno supports showing and hiding of individual annotation layers. The WebAnno curation component displays all annotation documents from all users for a given source document, enabling the curator to visualize all of the annotations with differences at a time. Unlike most annotation tools, which rely on configuration files, WebAnno allows all parameters to be configured freely and directly in the browser.

6 Conclusion and Outlook
WebAnno is a new web-based linguistic annotation tool. The brat annotation and GUI front end have been enhanced to support rapidly processing large annotation documents, configuring the annotation tags and tagsets in the browser, specifying visible annotation layers, and separating annotation documents per user, just to name the most important distinctions. Besides, WebAnno supports project definition and import/export of tags and tagsets. Flexible support for importing and exporting different data formats is handled through UIMA components from the DKPro Core project. The monitoring component of WebAnno helps the administrator to control the progress of annotators. The crowdsourcing component of WebAnno provides a unique functionality to distribute the annotation to a large workforce and automatically integrate the results back into the tool via the crowdsourcing server. The WebAnno annotation tool supports curation of different annotation documents, displaying annotation documents created by users in a given project with annotation disagreements. In future work, WebAnno will be enhanced to support several other front ends to handle even more annotation layers, and to provide more crowdsourcing templates. Another planned extension is a more seamless integration of language processing tools for pre-annotation.

Acknowledgments
We would like to thank Benjamin Milde and Andreas Straninger, who assisted in implementing WebAnno, as well as Marc Reznicek, Nils Reiter and the whole CLARIN-D F-AG 7 for testing and providing valuable feedback. The work presented in this paper was funded by a German BMBF grant to the CLARIN-D project, the Hessian LOEWE research excellence program as part of the research center “Digital Humanities” and by the Volkswagen Foundation as part of the Lichtenberg-Professorship Program under grant No. I/82806.

References
Chris Biemann. 2013. Creating a system for lexical substitutions from scratch using crowdsourcing. Lang. Resour. Eval., 47(1):97–122, March.

Kalina Bontcheva, Hamish Cunningham, Ian Roberts, and Valentin Tablan. 2010. Web-based collaborative corpus annotation: Requirements and a framework implementation. In New Challenges for NLP Frameworks workshop at LREC-2010, Malta.

Jean Carletta. 1996. Assessing agreement on classification tasks: the kappa statistic. In Computational Linguistics, Volume 22 Issue 2, pages 249–254.

Richard Eckart de Castilho and Iryna Gurevych. 2009. DKPro-UGD: A Flexible Data-Cleansing Approach to Processing User-Generated Discourse. In Online-proceedings of the First French-speaking meeting around the framework Apache UIMA, LINA CNRS UMR 6241 - University of Nantes, France.

David Ferrucci and Adam Lally. 2004. UIMA: An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment. In Journal of Natural Language Engineering 2004, pages 327–348.

Ulrich Heid, Helmut Schmid, Kerstin Eckart, and Erhard Hinrichs. 2010. A Corpus Representation Format for Linguistic Web Services: the D-SPIN Text Corpus Format and its Relationship with ISO Standards. In Proceedings of LREC 2010, Malta.

Boci Lin, Yan Chen, Xu Chen, and Yingying Yu. 2012. Comparison between JSON and XML in Applications Based on AJAX. In Computer Science & Service System (CSSS), 2012, Nanjing, China.

Thomas Morton and Jeremy LaCivita. 2003. WordFreak: an open tool for linguistic annotation. In Proceedings of NAACL-2003, NAACL-Demonstrations ’03, pages 17–18, Edmonton, Canada.

Christoph Müller and Michael Strube. 2006. Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, and J. Mukherjee, editors, Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods, pages 197–214. Peter Lang, Frankfurt a.M., Germany.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Prague, Czech Republic.

Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun’ichi Tsujii. 2012. brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations at EACL-2012, Avignon, France.

Qingling Wang, Qin Liu, Na Li, and Yan Liu. 2008. An Automatic Approach to Reengineering Common Website with AJAX. In 4th International Conference on Next Generation Web Services Practices, pages 185–190, Seoul, South Korea.

Aobo Wang, Cong Duy Vu Hoang, and Min-Yen Kan. 2013. Perspectives on Crowdsourcing Annotations for Natural Language Processing. In Language Resources And Evaluation, pages 9–31. Springer Netherlands.
