Pre-Test Quick Guide
Turns are the transcription boxes you create whenever speech is heard; the speech is transcribed inside
those boxes. The start time and end time of each box/turn should be set precisely according to the
time the speech starts and ends.
500 ms
If a speaker stops talking for 500 ms (0.5 seconds) or more, please end the speech turn where they
pause and create a new turn when they resume speaking.
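The 500 ms rule can be sketched in code as follows. This is only an illustration: the function name, the (start, end) tuple layout, and the use of integer milliseconds are assumptions and not part of any transcription tool.

```python
PAUSE_MS = 500  # from the guide: a pause of 500 ms or more ends the turn


def split_into_turns(segments):
    """Group one speaker's (start_ms, end_ms) speech segments into turns.

    `segments` is assumed to be sorted by start time. A gap of PAUSE_MS or
    more between consecutive segments closes the current turn.
    """
    turns = []
    for start, end in segments:
        if turns and start - turns[-1][1] < PAUSE_MS:
            # Gap shorter than 500 ms: keep extending the current turn.
            turns[-1] = (turns[-1][0], max(turns[-1][1], end))
        else:
            # First segment, or a pause of 500 ms or more: start a new turn.
            turns.append((start, end))
    return turns
```

For example, segments at 0-1000 ms, 1200-2000 ms, and 2500-3000 ms would give two turns, because only the second gap reaches 500 ms.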
30 sec
A speech turn can never exceed 30 seconds, not even by 1 millisecond. This rule doesn’t apply
to annotations.
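A minimal sketch of the 30-second check, assuming times are stored as integer milliseconds. The function name and the `is_annotation` flag are hypothetical.

```python
MAX_TURN_MS = 30_000  # from the guide: a speech turn may not exceed 30 seconds


def exceeds_turn_limit(start_ms, end_ms, is_annotation=False):
    """Return True if a speech turn is longer than 30 seconds.

    Annotations are exempt from the limit, so they never fail the check.
    """
    if is_annotation:
        return False
    return end_ms - start_ms > MAX_TURN_MS
```

Integer milliseconds are used so that a turn of exactly 30 seconds passes and a turn that is 1 ms longer fails, without floating-point rounding.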
Labels/ Annotations
- Labels/annotations mark any non-speech sound event, whether it’s a background or foreground sound. E.g.,
sneezing, coughing, music, knocking, cheering, crying, screaming, etc.
- Labels have no time limit, but two annotations of the same kind cannot overlap.
- The 500ms rule applies to annotations too!
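The no-overlap rule for annotations of the same kind could be checked roughly like this. The (kind, start, end) tuple layout and the function name are assumptions, and the sketch treats two annotations that merely touch (one ends exactly where the next begins) as not overlapping.

```python
from collections import defaultdict


def find_same_kind_overlaps(annotations):
    """Return (kind, earlier_span, later_span) for overlapping same-kind labels.

    `annotations` is a list of (kind, start_ms, end_ms) tuples.
    Annotations of different kinds are allowed to overlap.
    """
    by_kind = defaultdict(list)
    for kind, start, end in annotations:
        by_kind[kind].append((start, end))

    conflicts = []
    for kind, spans in by_kind.items():
        spans.sort()
        latest = spans[0]  # span with the furthest end time seen so far
        for span in spans[1:]:
            if span[0] < latest[1]:
                conflicts.append((kind, latest, span))
            if span[1] > latest[1]:
                latest = span
    return conflicts
```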
Common label types:
Unintelligible
Spoken words that can’t be heard clearly
Foreign speech
Any different language/ locale.
PII
All names, addresses, phone numbers, email addresses, payment card numbers, and passwords should NOT
be transcribed. Instead, please label them as PII.
Exceptions: globally known people, and the phone numbers and addresses of companies.
Singing
All lyrics in the target locale must be transcribed and labeled as singing.
Noise
Any background or foreground noise of any sort.
Beginning & Ending of Turns & Labels
Beginnings and endings of speaker turns and annotations must be accurate at all times. Be careful not to
cut off any sound when setting the start and end times of labels and turns.
Speech overlapping
When more than one speaker talks at the same time, separate turns under separate speakers should be
created.
Overlapping speech Turns of the Same Speaker
The same speaker can't have two turns that intersect at any point.
Speakers Numbering
Numbered speakers should be used only when you do not know the name of the speaker. For example,
the first time you identify a speaker, they should be labeled as “speaker 1”. Do not capitalize the word
“speaker”.
Speaker Naming
Only use the speaker name option if the name of the speaker becomes known at some point in the
audio. E.g., speaker Mary, speaker Bob, speaker David. Use first and last names if available. E.g.,
‘speaker David Jones’.
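As a rough illustration of the numbering and naming rules, a hypothetical helper could build the speaker label. The function name and signature are made up for this sketch and do not come from any transcription tool.

```python
def speaker_label(number, name=None):
    """Build a speaker label with a lowercase "speaker" prefix.

    Use the name when it is known from the audio; otherwise fall back
    to the speaker's number.
    """
    if name and name.strip():
        return f"speaker {name.strip()}"
    return f"speaker {number}"
```

With this sketch, `speaker_label(1)` gives "speaker 1" and `speaker_label(2, "David Jones")` gives "speaker David Jones".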
Unidentifiable Speaker
This is used when you cannot determine which speaker is talking. It’s common to hear speech that
could come from any of several speakers and be unable to determine who is talking. This speech should
be placed under “unidentifiable speaker”.
Important: Each task should have only one unidentifiable speaker for all the speakers you cannot
determine.
Pre-recorded speaker
It can be a recording within the recording, a device talking, or background singing.
Example 1: Live speaker with background singing. The singing lyrics should
be assigned to a pre-recorded speaker.
Example 2: Elevator’s recorded floor announcement.
Important: Pre-recorded speakers shouldn’t be named.
Interjections
Refer to your official dictionary (equivalent to Merriam-Webster) and transcribe the common interjections
and slang confirmations you hear, such as: huh, woah, okay, yep, uh-huh, mhm, nah.
If a very small part of a word (at most one syllable) has been cut off, and you know what the word is
supposed to be, transcribe the entire word. If you are not sure what the word should be, do not transcribe
the word at all. Do not put punctuation after words that have been cut off.
Numbers:
Cardinals and ordinals from 0 to 9 should be written in letters (except for measures and currency).
Use digits for 10 and above. E.g.: I have six dogs and 12 cats.
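The cardinal-number rule can be sketched as below. The function name and the `is_measure_or_currency` flag are assumptions for illustration; ordinals and other locales are not covered.

```python
WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]


def format_cardinal(n, is_measure_or_currency=False):
    """Spell out 0 to 9 as words; use digits for 10 and above.

    Measures and currency are the stated exception, so they keep digits.
    """
    if 0 <= n <= 9 and not is_measure_or_currency:
        return WORDS[n]
    return str(n)


sentence = f"I have {format_cardinal(6)} dogs and {format_cardinal(12)} cats."
```

Here `sentence` reproduces the guide's own example.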
In math expressions or units & measures, transcribe fraction words using numerals and slashes. Be
careful not to use pre-combined (single-character) fractions like "¼".
Correct example: In 3/4 of a mi, turn right.
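One possible way to catch pre-combined fractions in a finished transcript is to expand them with Python's standard `unicodedata` module. This is a sketch under assumptions: the function name is hypothetical, and mixed numbers (a whole number immediately followed by a fraction character) would still need a space added by hand.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # the special slash used inside expanded fractions


def plain_fractions(text):
    """Replace single-character fractions (e.g. U+00BC) with numeral/slash form."""
    out = []
    for ch in text:
        # Only expand characters whose Unicode decomposition is tagged
        # as a fraction; everything else is left untouched.
        if unicodedata.decomposition(ch).startswith("<fraction>"):
            ch = unicodedata.normalize("NFKD", ch)
        out.append(ch)
    # The expansion uses U+2044, so swap it for the ordinary slash.
    return "".join(out).replace(FRACTION_SLASH, "/")
```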