WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1–6, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics
of WebAnno is presented in Section 4. Section 5 elaborates on several use cases of WebAnno, and Section 6 concludes and gives an outlook on further directions.
2 Related Work
GATE Teamware (Bontcheva et al., 2010) is probably the tool that most closely matches our requirements regarding quality management, annotator management, and support of a large set of annotation layers and formats. It is mostly web-based, but the annotation is carried out with locally downloaded software. An interface to crowdsourcing platforms is missing. The GATE Teamware system is heavily targeted towards template-based information extraction. It sets a focus on the integration of automatic annotation components rather than on the interface for manual annotation. Besides, the overall application is rather complex for average users, requires considerable training and does not offer an alternative simplified interface as would be required for crowdsourcing.
General-purpose annotation tools like MMAX2 (Müller and Strube, 2006) or WordFreak (Morton and LaCivita, 2003) are not web-based and do not provide annotation project management. They are also not sufficiently flexible regarding different annotation layers. The same holds for specialized tools for single annotation layers, which we cannot list here for the sake of brevity.
With the brat rapid annotation tool (Stenetorp et al., 2012), for the first time a web-based open-source annotation tool was introduced, which supports collaborative annotation for multiple annotation layers simultaneously on a single copy of the document, and is based on a client-server architecture. However, the current version of brat has limitations such as: (i) slowness for documents of more than 100 sentences, (ii) limits regarding file formats, (iii) web-based configuration of tagsets/tags is not possible and (iv) configuring the display of multiple layers is not yet supported. While we use brat’s excellent visualization front end in WebAnno, we decided to replace the server layer to support the user and quality management, and monitoring tools as well as to add the interface to crowdsourcing.
Figure 1: System architecture, organized in user, front end, back end and persistent data storage.
which is mirrored in its open-source implementation¹, makes it possible to easily extend the tool or add alternative user interfaces for annotation layers that brat is less suited for, e.g. for constituent structure. In Section 3.1, we illustrate how different user roles are provided with different graphical user interfaces, and show the expressiveness of the annotation model. Section 3.2 elaborates on the functionality of the back end, and describes how data is imported and exported, as well as our implementation of the persistent data storage.
3.1 Front End
All functionality of WebAnno is accessible via a web browser. For annotation and visualization of annotated documents, we adapted the brat rapid annotation tool. Changes had to be made to make brat interoperate with Apache Wicket, on which WebAnno is built, and to integrate it better into the WebAnno experience.
3.1.1 Project Definition
The definition and the monitoring of an annotation project are conducted by a project manager (cf. Figure 1) in a project definition form. It supports creating a project, loading un-annotated or pre-annotated documents in different formats², adding annotator and curator users, defining tagsets, and configuring the annotation layers. Only a project manager can administer a project. Figure 2 illustrates the project definition page with the tagset editor highlighted.
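
The project definition steps described above (creating a project, loading documents, adding annotator and curator users, defining tagsets, configuring layers, and restricting administration to project managers) can be illustrated with a small data model. The following Python sketch is purely illustrative and is not WebAnno's actual implementation, which is built on Apache Wicket: the names `Role`, `Tagset`, `Project` and `create_project`, as well as the example users, file name and tags, are assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Role(Enum):
    """User roles named in the text (hypothetical encoding)."""
    PROJECT_MANAGER = "project manager"
    ANNOTATOR = "annotator"
    CURATOR = "curator"


@dataclass
class Tagset:
    name: str        # e.g. "POS"
    tags: List[str]  # e.g. ["NOUN", "VERB"]


@dataclass
class Project:
    name: str
    members: Dict[str, Role] = field(default_factory=dict)
    documents: List[str] = field(default_factory=list)
    tagsets: Dict[str, Tagset] = field(default_factory=dict)
    layers: List[str] = field(default_factory=list)

    def _require_manager(self, actor: str) -> None:
        # Only a project manager can administer a project.
        if self.members.get(actor) is not Role.PROJECT_MANAGER:
            raise PermissionError(f"{actor!r} may not administer {self.name!r}")

    def add_member(self, actor: str, user: str, role: Role) -> None:
        self._require_manager(actor)
        self.members[user] = role

    def add_document(self, actor: str, filename: str) -> None:
        self._require_manager(actor)
        self.documents.append(filename)

    def define_tagset(self, actor: str, tagset: Tagset) -> None:
        self._require_manager(actor)
        self.tagsets[tagset.name] = tagset

    def configure_layers(self, actor: str, layers: List[str]) -> None:
        self._require_manager(actor)
        self.layers = list(layers)


def create_project(name: str, manager: str) -> Project:
    """Create a project whose creator is its first project manager."""
    return Project(name=name, members={manager: Role.PROJECT_MANAGER})


# Example: a manager sets up a small part-of-speech project.
project = create_project("pos-demo", "alice")
project.add_member("alice", "bob", Role.ANNOTATOR)
project.add_member("alice", "carol", Role.CURATOR)
project.add_document("alice", "sample.txt")
project.define_tagset("alice", Tagset("POS", ["NOUN", "VERB", "ADJ"]))
project.configure_layers("alice", ["part-of-speech"])
```

In this sketch every administrative method checks the caller's role first, so an annotator such as `bob` calling `add_document` raises `PermissionError`. Keeping the check in one helper is a design choice made for brevity here, not a description of how WebAnno enforces permissions.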
Figure 2: The tagset editor on the project definition page
Figure 3: Curation user interface (left: sentences with disagreement; right: merging editor)
Figure 4: Project monitoring
Figure 6: Parts-of-speech & dependency relations
and hiding of individual annotation layers. The WebAnno curation component displays all annotation documents from all users for a given source document, enabling the curator to view all annotations and their differences at once. Unlike most annotation tools, which rely on configuration files, WebAnno allows all parameters to be configured freely and directly in the browser.
6 Conclusion and Outlook
WebAnno is a new web-based linguistic annotation tool. The brat annotation and GUI front end have been enhanced to support rapid processing of large annotation documents, configuring annotation tags and tagsets in the browser, specifying visible annotation layers, and separating annotation documents per user, to name only the most important distinctions. Besides, WebAnno supports project definition and import/export of tags and tagsets. Flexible support for importing and exporting different data formats is handled through UIMA components from the DKPro Core project. The monitoring component of WebAnno helps the administrator keep track of the progress of annotators. The crowdsourcing component of WebAnno provides a unique functionality to distribute the annotation to a large workforce and automatically integrate the results back into the tool via the crowdsourcing server. The WebAnno annotation tool supports curation of different annotation documents, displaying the annotation documents created by users in a given project together with their annotation disagreements. In future work, WebAnno will be enhanced to support several other front ends to handle even more annotation layers, and to provide more crowdsourcing templates. Another planned extension is a more seamless integration of language processing tools for pre-annotation.
Acknowledgments
We would like to thank Benjamin Milde and Andreas Straninger, who assisted in implementing WebAnno, as well as Marc Reznicek, Nils Reiter and the whole CLARIN-D F-AG 7 for testing and providing valuable feedback. The work presented in this paper was funded by a German BMBF grant to the CLARIN-D project, the Hessian LOEWE research excellence program as part of the research center “Digital Humanities” and by the Volkswagen Foundation as part of the Lichtenberg-Professorship Program under grant No. I/82806.
References
Chris Biemann. 2013. Creating a system for lexical substitutions from scratch using crowdsourcing. Lang. Resour. Eval., 47(1):97–122, March.
Kalina Bontcheva, Hamish Cunningham, Ian Roberts, and Valentin Tablan. 2010. Web-based collaborative corpus annotation: Requirements and a framework implementation. In New Challenges for NLP Frameworks workshop at LREC-2010, Malta.
Jean Carletta. 1996. Assessing agreement on classification tasks: the kappa statistic. In Computational Linguistics, Volume 22 Issue 2, pages 249–254.
Richard Eckart de Castilho and Iryna Gurevych. 2009. DKPro-UGD: A Flexible Data-Cleansing Approach to Processing User-Generated Discourse. In Online-proceedings of the First French-speaking meeting around the framework Apache UIMA, LINA CNRS UMR 6241 - University of Nantes, France.
David Ferrucci and Adam Lally. 2004. UIMA: An Architectural Approach to Unstructured Information Processing in the Corporate Research Environment. In Journal of Natural Language Engineering 2004, pages 327–348.
Ulrich Heid, Helmut Schmid, Kerstin Eckart, and Erhard Hinrichs. 2010. A Corpus Representation Format for Linguistic Web Services: the D-SPIN Text Corpus Format and its Relationship with ISO Standards. In Proceedings of LREC 2010, Malta.
Boci Lin, Yan Chen, Xu Chen, and Yingying Yu. 2012. Comparison between JSON and XML in Applications Based on AJAX. In Computer Science & Service System (CSSS), 2012, Nanjing, China.
Thomas Morton and Jeremy LaCivita. 2003. WordFreak: an open tool for linguistic annotation. In Proceedings of NAACL-2003, NAACL-Demonstrations ’03, pages 17–18, Edmonton, Canada.
Christoph Müller and Michael Strube. 2006. Multi-level annotation of linguistic data with MMAX2. In S. Braun, K. Kohn, and J. Mukherjee, editors, Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods, pages 197–214. Peter Lang, Frankfurt a.M., Germany.
Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Prague, Czech Republic.
Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun’ichi Tsujii. 2012. brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations at EACL-2012, Avignon, France.
Qingling Wang, Qin Liu, Na Li, and Yan Liu. 2008. An Automatic Approach to Reengineering Common Website with AJAX. In 4th International Conference on Next Generation Web Services Practices, pages 185–190, Seoul, South Korea.
Aobo Wang, Cong Duy Vu Hoang, and Min-Yen Kan. 2013. Perspectives on Crowdsourcing Annotations for Natural Language Processing. In Language Resources And Evaluation, pages 9–31. Springer Netherlands.