Computer Science > Information Retrieval
[Submitted on 17 May 2020]
Title: DEXA: Supporting Non-Expert Annotators with Dynamic Examples from Experts
Abstract: The success of crowdsourcing-based annotation of text corpora depends on ensuring that crowdworkers are sufficiently well trained to perform the annotation task accurately. To that end, a common approach to training annotators is to provide instructions and a few example cases that demonstrate how the task should be performed (referred to as the CONTROL approach). These globally defined "task-level examples", however, (i) often cover only the common cases encountered during an annotation task, and (ii) require crowdworkers to spend effort during annotation finding the example most relevant to the sample they are currently annotating. To overcome these limitations, we propose supporting workers not only with task-level examples but also with "task-instance-level" examples that are semantically similar to the data sample currently being annotated (referred to as Dynamic Examples for Annotation, DEXA). Such dynamic examples can be retrieved from collections previously labeled by experts, which are usually available as gold-standard datasets. We evaluate DEXA on the complex task of annotating participants, interventions, and outcomes (known as PIO) in sentences of medical studies. The dynamic examples are retrieved using BioSent2Vec, an unsupervised semantic sentence similarity method specific to the biomedical domain. Results show that (i) workers in the DEXA approach reach, on average, much higher agreement with experts (Cohen's kappa) than workers in the CONTROL approach (avg. 0.68 in DEXA vs. 0.40 in CONTROL); and (ii) aggregating as few as three DEXA annotations by majority voting already yields substantial agreement with experts of 0.78/0.75/0.69 for P/I/O (vs. 0.73/0.58/0.46 in CONTROL). Finally, (iii) we acquired explicit feedback from workers, which shows that in the majority of cases (avg. 72%) workers find the dynamic examples useful.
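The three mechanisms the abstract relies on (retrieving semantically similar expert-labeled examples, aggregating worker labels by majority vote, and measuring agreement with experts via Cohen's kappa) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sentence embeddings are assumed to be precomputed (the paper uses BioSent2Vec; here they are toy 2-D vectors), retrieval is plain cosine-similarity top-k search, and all function names, sentences, and labels are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_dynamic_examples(query_vec, expert_pool, k=3):
    """Return the k expert-labeled examples whose embeddings are closest to query_vec.

    expert_pool holds (embedding, sentence, label) tuples. The embeddings are
    ASSUMED to be precomputed by a sentence encoder such as BioSent2Vec; this
    sketch never calls an encoder itself.
    """
    ranked = sorted(expert_pool, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    return ranked[:k]


def majority_vote(labels):
    """Aggregate one item's worker labels; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def cohens_kappa(a, b):
    """Multi-class Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[lab] * count_b[lab] for lab in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; treat as perfect agreement
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): 2-D "embeddings" stand in for real sentence vectors.
pool = [
    ((1.0, 0.0), "expert sentence A", "P"),
    ((0.0, 1.0), "expert sentence B", "I"),
    ((0.7, 0.7), "expert sentence C", "O"),
]
top = retrieve_dynamic_examples((0.9, 0.1), pool, k=2)
print([label for _, _, label in top])  # ['P', 'O']

# Three hypothetical worker labels per sentence, plus the expert label.
worker_votes = [["P", "P", "I"], ["I", "I", "O"], ["O", "P", "O"], ["I", "O", "I"]]
expert = ["P", "I", "O", "O"]
aggregated = [majority_vote(v) for v in worker_votes]
kappa = cohens_kappa(aggregated, expert)
print(aggregated, round(kappa, 3))  # ['P', 'I', 'O', 'I'] 0.636
```

Retrieval here is an exhaustive scan, which is adequate for a gold-standard collection of modest size; a much larger pool would call for an approximate nearest-neighbor index. The sketch computes a single multi-class kappa, whereas the abstract reports separate agreements for P, I, and O; how those per-category values were computed is not specified in the abstract.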