Appen

Uploaded by

radisamia1984

Annotation Guidelines for Enablement of Reels

Translation and Auto Dubbing

Table of Contents
Introduction
Project Overview
Annotation & Transcription Correction Guidelines
    A. Terminology
    B. Requirements
    C. General Guidelines
    D. Rejection Reasons
    E. Detailed Guidelines
User Interface

Introduction
These project guidelines contain comprehensive information about transcription and annotation
for project Anagram. Annotators are requested to read all the topics in detail before working on
this project.

Project Overview
In this project, annotators will be required to:
- Correct the transcribed source text
    - An audio file will contain an excerpt of speech. A transcription will
      accompany this excerpt. Annotators will be required to review and correct
      the transcription so that it matches the excerpt of speech, with exceptions
      for technical issues detailed later in the guidelines.
- Add audio labels
    - Answer a Yes/No question regarding the quality and nature of the audio
      clip
    - Check a box if there are background noises or if speech is cut off in the
      audio clip

Annotation & Transcription Correction Guidelines

A. Terminology
a. Source: the audio file we are interested in annotating.
b. Labels: the characteristics of the source.
c. Transcribe: the act of assigning words to speech audio, producing text.
d. Post: the unit consisting of source audio files and source transcribed
text.
e. List of languages: the languages we are interested in annotating. This
list comprises English, Spanish, French, German, Italian, and Mandarin.
B. Requirements
a. Annotators must be, at a minimum, native speakers of the assigned non-English
languages, aware of the cultures of those languages, and familiar with the
nuances of social media language.
b. Annotators should be able to correct incorrect transcriptions and assign labels to
the audio files provided.
C. General Guidelines
a. Listen to the audio carefully: note the words that you hear and any background
noise, if present.
i. If background noise is present, select the relevant labels.
b. Fix the source transcription if it deviates from the audio. See the detailed
guidelines for transcription considerations.
i. If no corrections are necessary, ensure that the exact provided text is
submitted. Do NOT leave a blank text box, a substitution character (such
as -), or a message stating that no correction is needed.

D. Rejection Reasons
Reject a job for the following reasons:

1. Source has more than one intelligible speaker: more than 50% of the audio contains
two or more speakers speaking at the same time. This includes musical lyrics being
sung over someone speaking.
2. Source contains more than 10% in a different language: the audio is in a language other
than that of the queue you are transcribing, or the user’s accent is such that you are
unable to understand the utterance (this includes “child speak”).
3. Source has more than 75% unintelligible speech.
4. Source has no speech to transcribe: ensure the entire audio clip has no speech
before rejecting.
E. Detailed Guidelines
1. Instructions
a. Listen to the audio carefully and note the words that you hear.

b. Edit transcript. If the speech and the provided transcript do not match,
edit the transcript so that they match. For unintelligible words on jobs that
do not fit the rejection criteria, type [x] in their place. If audio is in a
different language but does not meet the rejection criteria, type [c] in place of
the different-language speech. See the guidelines for further details.

c. Background noise. If you noticed any background noises while listening,
check the appropriate boxes. There is no need to re-listen for this step.

d. [Rejection Reasons] If there are multiple intelligible speakers,
more than 10% of the speech does not match the identified language, the
speech is unintelligible, or the audio contains no speech, REJECT the
job and select the appropriate rejection reason. You can skip step #2 if
rejecting the job.
Rejection Reasons:

i. [multiple_speakers] Source has more than one intelligible speaker.

ii. [foreign] Source contains more than 10% in a different language.

iii. [unintelligible] Source has more than 75% unintelligible speech.

iv. [no_speech] Source has no speech to translate.

v. [other] Other - Provide a reason for rejecting.
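
As a rough illustration only, the rejection thresholds above could be expressed as the following Python sketch. The `ClipEstimate` type, its field names, and the order in which the checks run are assumptions made for this example; the guidelines do not define a priority between rejection reasons, and the percentages are the annotator's own estimates rather than measured values. The [other] reason is left out because it requires a free-text justification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClipEstimate:
    """Annotator's rough estimates for one clip, as fractions (0.0 to 1.0) of the audio."""
    has_speech: bool
    overlapping_speakers: float  # share with two or more intelligible speakers at once
    other_language: float        # share in a language other than the queue's language
    unintelligible: float        # share that cannot be understood


def rejection_reason(clip: ClipEstimate) -> Optional[str]:
    """Return a rejection label, or None if the clip should be transcribed.

    The order of the checks is an assumption of this sketch.
    """
    if not clip.has_speech:
        return "no_speech"
    if clip.overlapping_speakers > 0.50:
        return "multiple_speakers"
    if clip.other_language > 0.10:
        return "foreign"
    if clip.unintelligible > 0.75:
        return "unintelligible"
    return None
```

Note that every threshold is strict ("more than"), so a clip with exactly 10% in a different language is not rejected under this reading; the different-language portion would be marked [c] instead.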

2. Label Assignment Considerations

a. Yes/No question on audio quality. The criteria are:
● Original audio is only in the expected source language, and it is NOT a mix of
multiple languages.
● Original audio has high-quality, clean speech, i.e. NO background music or
strong noise, and NO distorted or broken speech, etc.
● Original audio has NO extremely offensive profanity.
● Original audio does NOT have a strong accent and is easy to understand.
● Original audio has little gibberish or incoherent content, and has less than 25%
unintelligible content.
Only mark yes if the audio meets all the criteria above.
b. Background Noise Labels:
● Only mark labels that were present in the audio
c. Audio speech cut off:
● Only mark if the provided audio contains speech that has been cut off
1. Case 1: Audio has speech cut off AND it is not reflected in the
provided transcript. For example, an audio clip starts midway
through the speaker saying something and it is not captured in the
provided transcript. As another example, the audio clip ends midway
through the speaker saying something and it is not in the provided
transcript.
2. Case 2: Audio has speech cut off AND the cut-off word is reflected in the
provided transcript. For example, the audio clip starts midway through
the speaker saying “alphabetical” such that only “abetical” is heard
and the provided transcript has “alphabetical”.

3. Transcription Considerations

1. Speech and sounds

Transcribe all speech you hear in the audio:

● Include transcription of:

a. hesitations ("um", "er", "hmm", etc.)
b. informal words ("gonna", "wanna", "they’ll", "I’ve", etc.)
c. repeated words ("they they was gonna be there")
● Transcribe unintelligible speech by writing [x]. In general, if the audio
has a portion you cannot transcribe, mark it as [x]. This applies in the
following cases:
a. The audio is unintelligible for a portion of the clip (muttering,
too quiet, you don’t know what the speaker is saying, too much
background noise, etc.). If the majority or the entirety of the audio
is unintelligible or lacks speech, follow the rejection guidelines
(Section D, Rejection Reasons).
b. For multiple unintelligible sounds in sequence, write [x] only
once.
c. If audio is in a different language but does not meet the
rejection criteria (Section D, Rejection Reasons), mark the speech in
the different language with [c]. If there are multiple different-language
words in sequence, write [c] only once.
d. The audio temporarily cuts off and you cannot discern what is
said. For example, if the audio clip starts midway through what
the speaker is saying and you cannot understand or discern it,
transcribe it as [x]. If you can tell what is being said, provide
the transcription instead. This is encouraged but not required,
so there is no need to spend extra time trying to understand what
is being said. When in doubt, mark as [x].
e. Be sure to double-check that the square brackets are properly
closed: [x].
● Exclude transcription of:
a. laughing, crying, sighs, music, dogs barking, or other noises
(thuds, bangs, closing doors, footsteps, crackles in the audio,
etc.) heard in the clip.
b. Stutters: for example, if the speech audio for the word “call” is
stuttered (e.g. “ca-call”), transcribe the word only and do not
transcribe the stutter (“ca-call” → “call”).
c. Do not expand or replace information, and do not add any
explanatory/parenthetical information, definitions, etc.
d. Do not transcribe any echo heard in the audio.

● Do not ignore any meaningful information that was present in the source.
● If unsure of how to spell a word, check the Merriam-Webster dictionary.

Note: some words are spelled or spaced differently according to their
usage. Examples include, but are not limited to:

f. pickup (noun) and pick up (verb)
g. setup (noun) and set up (verb)
h. shutdown (noun) and shut down (verb)
i. standby (noun) and stand by (verb)
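
The placeholder and stutter rules above lend themselves to a simple self-check before submitting. The Python sketch below is a hypothetical helper, not part of the annotation tool: the function names are invented for this example, and the stutter check is only a heuristic that flags candidates for manual review (it looks for a fragment, a hyphen, and then a longer word starting with the same fragment).

```python
import re


def collapse_placeholders(text: str) -> str:
    """Merge runs of the same placeholder, e.g. "[x] [x]" becomes "[x]"."""
    return re.sub(r"(\[(?:x|c)\])(?:\s*\1)+", r"\1", text)


def has_malformed_placeholder(text: str) -> bool:
    """True if any square bracket remains after removing well-formed [x] and [c]."""
    leftover = re.sub(r"\[(?:x|c)\]", "", text)
    return "[" in leftover or "]" in leftover


def find_possible_stutters(text: str) -> list[str]:
    """Return tokens such as "ca-call" where a fragment repeats as the start of a word."""
    return [m.group(0) for m in re.finditer(r"\b(\w+)-\1\w+", text)]
```

For example, `collapse_placeholders("I [x] [x] went")` gives `"I [x] went"`, `has_malformed_placeholder("hello [x")` is true, and `find_possible_stutters("please ca-call me")` returns `["ca-call"]`. Flagged stutters still need a human decision, since some hyphenated words are legitimate.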

2. Numbers and Symbols

● Spell out all words and numbers exactly as they sound in the audio file
a. Do NOT use any symbols (+ - : $ & @ §# etc.) to represent a
spoken word.
b. Time: should be transcribed to the equivalent form in the
source language.
● URLs and email addresses
a. If the audio contains a URL or email address, do not
separate the elements as they are spoken; instead, provide their symbol
equivalent in the language spoken. For example, the speech
“www dot facebook dot com” would be transcribed as
“www.facebook.com”; as another example, the speech
“<email username> at meta dot com” would be transcribed
as “<email username>@meta.com”.
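
To make the URL and email rule concrete, here is a minimal Python sketch. It assumes English spoken forms ("dot", "at") and assumes the input is only the address itself, not the surrounding sentence; the function name and the username "someone" are hypothetical. Other source languages would need their own spoken equivalents.

```python
import re


def join_spoken_address(spoken: str) -> str:
    """Convert a spoken URL or email address into its written form.

    Assumes `spoken` holds only the address, spoken in English.
    """
    written = re.sub(r"\s+dot\s+", ".", spoken.strip())
    written = re.sub(r"\s+at\s+", "@", written)
    return written


print(join_spoken_address("www dot facebook dot com"))  # www.facebook.com
print(join_spoken_address("someone at meta dot com"))   # someone@meta.com
```

This is deliberately naive: applied to a whole sentence it would also rewrite ordinary uses of "at" and "dot", which is why the sketch expects the address span to be isolated first.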

3. Punctuation and Capitalization

● Punctuation and capitalization rules are not standard across languages;
apply best practices for punctuation and capitalization for your given
source language.
● If the source transcript provided for the audio has punctuation
that does not match the source audio, correct the punctuation to match
the source audio.
User Interface:
a. Instructions (in blue) alongside audio and reference completed transcript of the audio (to the right).
b. Transcription Correction Input Box (first section after blue instruction box):
c. Label Assignment Questions (Section that follows the transcription correction input box):
