Appen
Table of Contents
Introduction
Project Overview
Annotation & Transcription Correction Guidelines
A. Terminology
B. Requirements
C. General Guidelines
D. Rejection Reasons
E. Detailed Guidelines
User Interface
Introduction
These project guidelines contain comprehensive information about transcription and annotation
for project Anagram. Annotators are requested to read all the topics in detail before working on
this project.
Project Overview
In this project, annotators will be required to:
- Correct transcribed source text
  - An audio file will contain an excerpt of speech. A transcription will
    accompany this excerpt. Annotators will be required to review and correct
    the transcription so that it matches the excerpt of speech, with exceptions
    for technical issues detailed later in the guidelines.
- Add audio labels
  - Answer a Yes/No question regarding the quality and nature of the audio
    clip
  - Check a box if there are background noises or if speech is cut off in the
    audio clip
Annotation & Transcription Correction Guidelines
A. Terminology
a. Source: the audio file we are interested in annotating.
b. Labels: the characteristics of the source.
c. Transcribe: the act of assigning words to speech audio, producing text.
d. Post: the unit consisting of the source audio files and the source transcribed
text.
e. List of languages: the languages we are interested in annotating. This
list comprises English, Spanish, French, German, Italian, and Mandarin.
B. Requirements
a. Annotators must be, at a minimum, native speakers of their assigned
non-English languages, aware of the cultures associated with those languages,
and familiar with the nuances of social media language.
b. Annotators should be able to correct incorrect transcriptions and assign labels to
the audio files provided.
C. General Guidelines
a. Listen to the audio carefully: note the words that you hear and any background
noise if present.
i. If background noise is present, select the relevant labels.
b. Fix the source transcription if it deviates from the audio. See the detailed
guidelines for transcription considerations.
i. If no corrections are necessary, ensure that the exact provided text is
submitted. Do NOT leave a blank text box, a substitution character (such
as -), or a message stating that no correction is needed.
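The rule in C.b.i can be sketched as a simple pre-submission check. This is only an illustration: the function name `is_valid_submission` and both example sets are assumptions made for this sketch, not part of the project tooling, and the sets are not exhaustive.

```python
# Hypothetical pre-submission check for the "no corrections necessary" rule.
# The example sets below are illustrative assumptions, not an official list.
SUBSTITUTION_PLACEHOLDERS = {"-", "--", "_", "."}
NO_CORRECTION_MESSAGES = {
    "no correction needed",
    "no corrections needed",
    "no changes",
}


def is_valid_submission(submitted_text: str) -> bool:
    """Return False for a blank box, a placeholder character, or a
    'no correction needed' style message; True otherwise."""
    text = submitted_text.strip()
    if not text:
        return False  # blank text box
    if text in SUBSTITUTION_PLACEHOLDERS:
        return False  # substitution character such as "-"
    if text.lower().rstrip(".!") in NO_CORRECTION_MESSAGES:
        return False  # message instead of the transcription itself
    return True
```

When the provided transcription is already correct, the text to submit is that transcription itself, unchanged.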
D. Rejection Reasons
Reject a job for the following reasons:
1. Source has more than one intelligible speaker: More than 50% of the audio contains
two or more speakers speaking at the same time. This includes musical lyrics being
sung over someone speaking.
2. Source contains more than 10% speech in a different language: Audio is in a
language other than that of the queue you are transcribing, or the user’s accent is
such that you are unable to understand the utterance (this includes “child speak”).
3. Source has more than 75% unintelligible speech
4. Source has no speech to transcribe: Ensure the entire audio clip has no speech
before rejecting.
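The percentage thresholds above can be summarized as a small decision sketch. Everything named here (`ClipEstimate`, its fields, `rejection_reason`) is hypothetical; the fractions stand for the annotator's own judgment after listening, not for any automatic measurement. The order of the checks is also an assumption, and the accent case under reason 2 is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClipEstimate:
    """Annotator's estimates for one clip (hypothetical structure)."""
    overlapping_speech_fraction: float  # share of audio with 2+ speakers at once
    other_language_fraction: float      # share of audio in a different language
    unintelligible_fraction: float      # share of speech that cannot be understood
    has_speech: bool                    # False only if the ENTIRE clip has no speech


def rejection_reason(clip: ClipEstimate) -> Optional[str]:
    """Return the matching rejection reason, or None if the job should be transcribed."""
    if not clip.has_speech:
        return "Source has no speech to transcribe"
    if clip.overlapping_speech_fraction > 0.50:
        return "Source has more than one intelligible speaker"
    if clip.other_language_fraction > 0.10:
        return "Source contains more than 10% in a different language"
    if clip.unintelligible_fraction > 0.75:
        return "Source has more than 75% unintelligible speech"
    return None
```

Each comparison is strict because every threshold is worded as "more than": a clip estimated at exactly 10% other-language speech is not rejected under reason 2.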
E. Detailed Guidelines
1. Instructions
a. Listen to the audio carefully and note the words that you hear.
b. Edit transcript. If the speech and provided transcript do not match, please
edit the transcript so that they do match. For unintelligible words on jobs that
don’t fit the rejection criteria, please type [x] in their place. If part of the audio
is in a different language, but the job does not meet the rejection criteria, type [c]
in place of the speech in the different language. See guidelines for further details.
3. Transcription Considerations
● Do not ignore any meaningful information that was present in the source.
● If unsure of how to spell a word, check Merriam-Webster dictionary.
● Spell out all words and numbers exactly as they sound in the audio file.
a. Do NOT use any symbols (+ - : $ & @ § # etc.) to represent a
spoken word.
b. Time: transcribe times in the equivalent form in the
source language.
● URLs and email addresses
a. If the audio contains a URL or email address, do not
separate the elements as they are spoken. Instead, replace each
symbol named in the spoken language with the symbol itself. For
example, the speech “www dot facebook dot com” would be
transcribed as “www.facebook.com”, and the
speech “<email username> at meta dot com” would be transcribed
as “<email username>@meta.com”.
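The URL and email rule can be illustrated with a minimal sketch. The function name `join_spoken_address` and the mapping `SPOKEN_SYMBOLS` are assumptions for illustration only. The sketch covers English symbol words and must be applied only to a span the annotator has already identified as a URL or email address, because ordinary uses of words like "at" should stay spelled out.

```python
# Hypothetical helper: collapse a spoken URL or email address into written form.
# Only the English words "dot" and "at" are mapped; other languages would need
# their own spoken-symbol words.
SPOKEN_SYMBOLS = {"dot": ".", "at": "@"}


def join_spoken_address(spoken: str) -> str:
    """Replace spoken symbol words with symbols and join all elements
    without spaces, as required for URLs and email addresses."""
    tokens = spoken.split()
    return "".join(SPOKEN_SYMBOLS.get(token.lower(), token) for token in tokens)


print(join_spoken_address("www dot facebook dot com"))  # www.facebook.com
print(join_spoken_address("username at meta dot com"))  # username@meta.com
```

Here `username` is a stand-in for whatever email username is actually spoken in the audio.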