
System Guidelines

This document provides guidance for transcribing audio clips in the Long Transcription (LT) system. It outlines key rules for segmenting speaker turns and annotating other audio events. Speaker segments must be labeled and must follow rules that include a 100 ms buffer at the start and end of each segment, splitting segments that exceed 30 seconds, and splitting segments at pauses of 500 ms or more. Annotation types such as noise, music and laughter are also defined, along with guidance on applying the rules, such as leaving a 1 ms gap between segments split by the 30 second rule. A linked video guide covers topics such as the 100 ms rule in more detail.

Uploaded by Ser Hee Paung
Copyright © All Rights Reserved

Introduction

1. What is Long Transcription (LT)?


It is a system that contains long conversational audio clips, either one-sided or with
multiple speakers, which must be transcribed according to the rules of this Guideline
together with the Instruction Manual.

You will be listening to dialogue that will likely contain multiple speakers. Your job is
to identify and mark when each speaker is speaking and to segment the corresponding
audio.

Some of the audio will contain background noise, background music, and ringtones; these
must be marked too.

2. What are the key points I need to remember to work on the project?
The key points are as follows (please refer to the FAQs and video guides below for details):
1. You will need to create speaker segments (boxes) and annotations following the
rules of the system.
2. Speakers are categorized into three types:
● Normal Speakers (Labelled as speaker 1, speaker 2, and so on)
● Unidentifiable Speakers (Labelled as unidentifiable speaker)
● Pre-Recorded Speakers (Labelled as pre-recorded speaker 1, and so on)
● The maximum number of speakers allowed in one task is 20 (including
unidentifiable and pre-recorded speakers). If more speakers occur in the
task, stop the task when the 21st speaker is introduced.

3. Annotation types are categorized as follows:


● Noise (Labelled for noises and for unintelligible background speech of
multiple people)
● Music (Labelled for music)
● DTMF (Dual-Tone Multi-Frequency; used when you hear a phone key tone,
for example the "beep" of a key pressed during a phone call)
● Applause (Labelled when a crowd applauds, claps, or cheers)
● Foreign Speech (Labelled for speech in a language other than the one
you are working on)
● Laughter (Labelled when laughter is heard)
● RingTone (Labelled when an incoming or outgoing ringtone is heard)
● Singing (Labelled when a speaker is singing. *Note that songs should also
be labelled under "pre-recorded speaker" and annotated under "singing".)
● Unintelligible (Labelled when the transcriber cannot make out a word)
● PII (Personally Identifiable Information; labelled when the transcriber finds
any personal information)

**Do not focus on the last two types above; apply Unintelligible and PII only when
you are confident that a part is unintelligible or contains personal information.
(Refer to the prefill text to gain confidence.)

4. 100 ms segment rule


This rule means that you must keep an extra 85-100 ms of audio at the start
and end of each segment, where possible, so that no words are cut off at the
segment boundaries.

5. 30 second split rule


This rule means that no speaker segment may exceed 30 seconds. If a
speaker speaks continuously for more than 30 seconds, a separate speaker
segment must be created for the speech after 30 seconds, and so on.

6. 500 ms split rule


This rule means that if there is a gap of 0.5 seconds (500 ms) or more between
two words, separate segments must be created for those two words.

7. 20 speaker rule
This rule means that if a task contains more than 20 speakers, it cannot be
continued. The task must be stopped as soon as the 21st speaker is
introduced.

**Note that the 30-second, 100 ms, and 500 ms rules do not apply to annotations.
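The 20 speaker rule lends itself to a simple automated check. The sketch below is hypothetical: it assumes segments are exported as (speaker label, start, end) records in milliseconds, and `Segment` and `first_segment_over_limit` are illustrative names, not part of the LT system. It walks the segments in time order and reports the segment that introduces the 21st distinct speaker, which is where the task should be stopped.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_SPEAKERS = 20  # includes unidentifiable and pre-recorded speakers


@dataclass(frozen=True)
class Segment:
    # Hypothetical record; label e.g. "speaker 1", "unidentifiable speaker"
    speaker: str
    start_ms: int
    end_ms: int


def first_segment_over_limit(segments: List[Segment]) -> Optional[Segment]:
    """Return the segment that introduces the 21st distinct speaker,
    scanning in start-time order, or None if the limit is never exceeded."""
    seen = set()
    for seg in sorted(segments, key=lambda s: s.start_ms):
        seen.add(seg.speaker)
        if len(seen) > MAX_SPEAKERS:
            return seg  # stop the task at this point
    return None
```

Every distinct label counts toward the limit, so "unidentifiable speaker" and each "pre-recorded speaker N" are counted alongside the normal speakers, while repeated segments of an existing speaker add nothing.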
FAQ for LT system
1. How much gap should be given at the beginning/end of the segment?
The beginning and end of each segment must contain a gap of about 100 ms. The gap may
lie anywhere between 85 and 100 ms, but must never exceed 100 ms. Apply this rule
precisely to every segment you create. Do not apply it to annotations.

In the first example, the speaker starts speaking at 00:00.090 and the segment begins
at 00:00.000, a gap of 90 ms, which lies within the 85-100 ms range. Apply the rule at the
end of the segment as well.
In the second example, the speaker stops speaking at 00:19.202, so the segment ends at
00:19.299, a gap of 97 ms, which is approximately 100 ms.
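The gap arithmetic in these examples can be checked mechanically. A minimal sketch, assuming timestamps in the `MM:SS.mmm` form used in this guide; `parse_ts`, `gap_ok` and `padded_bounds` are hypothetical helpers, not functions of the LT tool. `padded_bounds` clamps the padding at the edges of the clip, which is one reading of "where possible".

```python
from typing import Tuple

MIN_GAP_MS = 85   # lower bound of the allowed 85-100 ms window
MAX_GAP_MS = 100  # the gap must never exceed this


def parse_ts(ts: str) -> int:
    """Convert an 'MM:SS.mmm' timestamp such as '00:19.202' to milliseconds."""
    minutes, rest = ts.split(":")
    seconds, millis = rest.split(".")
    return int(minutes) * 60_000 + int(seconds) * 1_000 + int(millis)


def gap_ok(gap_ms: int) -> bool:
    """True if a boundary gap satisfies the 100 ms rule (85-100 ms inclusive)."""
    return MIN_GAP_MS <= gap_ms <= MAX_GAP_MS


def padded_bounds(speech_start_ms: int, speech_end_ms: int,
                  clip_len_ms: int) -> Tuple[int, int]:
    """Pad the speech by 100 ms on each side, clamped to the clip,
    so padding shrinks only where the audio itself starts or ends."""
    return (max(0, speech_start_ms - MAX_GAP_MS),
            min(clip_len_ms, speech_end_ms + MAX_GAP_MS))


# The two examples: start gap 90 ms, end gap 97 ms.
start_gap = parse_ts("00:00.090") - parse_ts("00:00.000")
end_gap = parse_ts("00:19.299") - parse_ts("00:19.202")
print(start_gap, end_gap, gap_ok(start_gap), gap_ok(end_gap))
```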

2. Can we overlap two segments?


Yes, segments of different speakers may overlap: for example, a speaker 1 segment may
overlap a speaker 2 segment. Two segments of the same speaker may not overlap: speaker 1
cannot overlap speaker 1.
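A hypothetical check for this rule, assuming segments are (speaker, start, end) records in milliseconds; `Segment` and `same_speaker_overlaps` are illustrative names. Following the 1 ms gap convention in FAQ 10, a segment that starts on the same millisecond at which an earlier segment of the same speaker ends is counted as an overlap, while different speakers are never compared.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    speaker: str
    start_ms: int
    end_ms: int


def same_speaker_overlaps(segments: List[Segment]) -> List[Tuple[Segment, Segment]]:
    """Flag every segment that starts before an earlier segment of the same
    speaker has ended; each clash is paired with the earlier segment that
    reaches furthest. Timestamps are inclusive at millisecond resolution."""
    by_speaker = defaultdict(list)
    for seg in segments:
        by_speaker[seg.speaker].append(seg)
    clashes = []
    for segs in by_speaker.values():
        segs.sort(key=lambda s: s.start_ms)
        furthest = segs[0]  # earlier segment with the latest end so far
        for nxt in segs[1:]:
            if nxt.start_ms <= furthest.end_ms:  # needs at least a 1 ms gap
                clashes.append((furthest, nxt))
            if nxt.end_ms > furthest.end_ms:
                furthest = nxt
    return clashes
```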

3. In which case should we split the segment?


Whenever a speaker pauses for 0.5 seconds (500 ms) or more between one utterance and
the next, the segment must be split into two segments.
For example, if the speaker stops speaking at 00:03.219 and begins again at 00:04.080,
the pause is 861 ms, clearly greater than 0.5 seconds. In such cases, two separate
segments must be created.
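The split rule can be expressed as grouping word timings by the pauses between them. A minimal sketch, assuming word-level (start, end) timings in milliseconds for a single speaker are available; `split_on_pauses` is a hypothetical helper, and treating a pause of exactly 500 ms as a split is an assumption based on the wording "0.5 sec/500MS" above. It returns speech spans only; the 100 ms padding is applied to each span afterwards.

```python
from typing import List, Tuple

SPLIT_GAP_MS = 500  # a pause of exactly 500 ms is assumed to force a split


def split_on_pauses(words: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Group one speaker's (start_ms, end_ms) word timings into segments.

    A new segment starts whenever the pause between the end of one word and
    the start of the next is at least SPLIT_GAP_MS. Returns the speech span
    (first word start, last word end) of each segment, before any padding.
    Assumes the words of a single speaker do not overlap one another.
    """
    if not words:
        return []
    ordered = sorted(words)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] - prev[1] >= SPLIT_GAP_MS:
            groups.append([cur])    # pause long enough: open a new segment
        else:
            groups[-1].append(cur)  # short pause: stay in the current segment
    return [(g[0][0], g[-1][1]) for g in groups]
```

For the example above, a word ending at 3,219 ms followed by one starting at 4,080 ms gives an 861 ms pause, so the two words land in separate segments.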

4. How should annotations be used?


Be precise with the timestamps of each annotation: an annotation must run exactly up to
the required mark, neither more nor less.
Make sure the annotation covers the intended sound in the audio. Unlike segments,
annotations must not include the 100 ms gap required by the segmentation rules.

5. Can we overlap annotations?


Yes, annotations of different types may overlap, but annotations of the same type may
not; for example, a music annotation cannot overlap another music annotation.

A single music annotation can be used for different music sounds playing at the same time.
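A sketch of how concurrent sounds of one type can be folded into a single annotation, assuming annotations are hypothetical (label, start, end) tuples in milliseconds; `merge_same_type` is an illustrative name, not an LT tool function. Overlapping annotations with the same label are merged into one that spans both, while annotations with different labels are left as they are, since those are allowed to overlap.

```python
from typing import List, Tuple

Annotation = Tuple[str, int, int]  # (label, start_ms, end_ms)


def merge_same_type(annotations: List[Annotation]) -> List[Annotation]:
    """Merge overlapping annotations that share a label into one annotation.
    Sorting by (label, start) groups each label and orders it in time;
    the result is therefore ordered by label, then start time."""
    merged: List[Annotation] = []
    for label, start, end in sorted(annotations):
        if merged and merged[-1][0] == label and start <= merged[-1][2]:
            _, first_start, last_end = merged[-1]
            merged[-1] = (label, first_start, max(last_end, end))  # extend
        else:
            merged.append((label, start, end))
    return merged
```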

6. Are sound effects in audio a noise?


Yes, sound effects in audio, such as those used in cartoons, should be annotated as
noise.

7. Should we include laughter in segments?


Laughter should not be included in segments; annotate it as laughter instead.

If the speaker is laughing and speaking at the same time, include that part both in the
segment and in a laughter annotation.

If the speaker laughs during the conversation, annotate it as laughter and check whether
the laughter lasts at least 500 ms; if it does, split the segment using the 500 ms gap rule.

8. Should we treat singing as an annotation or a segment?


Singing should be treated as a segment. Although it falls in the annotation category, it
is handled as a segment, and every segment rule applies to it.
9. What is the 30 second split rule?
If a segment runs longer than 30 seconds, split the turn at the 30-second mark and create
a new segment for the speech that follows. Repeat this at every 30-second mark. If a word
falls across the 30-second mark, leave it out of the first segment and include it in the
next one, so that the word is not split across two segments.

In the example, the speaker speaks from 12:58.004 to 13:28.580, which is longer than
30 seconds. Create a segment up to the 30-second mark, i.e. 13:28.004, and a new
segment starting at 13:28.005. In general, a new segment must be created at every
30-second mark whenever a single segment would otherwise run longer than 30 seconds.
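The split can be computed from the turn's start and end times. A minimal sketch, assuming times in milliseconds (12:58.004 is 778,004 ms and 13:28.580 is 808,580 ms) and optional word timings; `split_30s` is a hypothetical helper, not an LT tool function. It only inserts 1 ms gaps between pieces, so any 100 ms padding is assumed to already be included in the outer start and end, as FAQ 11 describes.

```python
from typing import List, Sequence, Tuple

MAX_SEGMENT_MS = 30_000  # no segment may exceed 30 seconds


def split_30s(start_ms: int, end_ms: int,
              words: Sequence[Tuple[int, int]] = ()) -> List[Tuple[int, int]]:
    """Split one continuous turn [start_ms, end_ms] into pieces of at most 30 s.

    Each new piece starts 1 ms after the previous one ends. If a word
    (w_start, w_end) is still being spoken at the 30-second mark, the cut
    moves to 1 ms before that word so the whole word goes into the next piece.
    """
    pieces = []
    cur = start_ms
    while end_ms - cur > MAX_SEGMENT_MS:
        cut = cur + MAX_SEGMENT_MS
        for w_start, w_end in words:
            # word starts inside this piece and crosses the 30-second mark
            if cur < w_start <= cut < w_end:
                cut = w_start - 1
                break
        pieces.append((cur, cut))
        cur = cut + 1  # 1 ms gap before the next piece
    pieces.append((cur, end_ms))
    return pieces
```

For the example, `split_30s(778004, 808580)` gives `[(778004, 808004), (808005, 808580)]`, i.e. 12:58.004 to 13:28.004 and 13:28.005 to 13:28.580. With no word timings, a 2-minute turn from 0 to 120,000 ms yields four pieces.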

10. How much gap should be given between two segments after using the 30 second split rule?
Leave a 1 ms gap between the two segments. For example, if one segment stops at
00:30.579, the next segment should start at 00:30.580.
**Take care that consecutive segments of the same speaker are separated by at least
1 ms, so that no segment overlaps the previous one. The same applies to PII: if a segment
ends at 00:03.579 and PII begins right after, the start time of the PII must be 00:03.580,
and so on.

11. Should we give a 100 ms gap when using the 30 second split rule?

If the speaker speaks continuously for exactly 2 minutes, create 4 segments, one for
every 30 seconds.
**At the beginning of the first segment, give the 100 ms gap as usual.

**At each split point between segments, do not give the 100 ms gap: padding one segment
by 100 ms would make it overlap the next. Leave just a 1 ms gap between the two
segments instead, because the speaker is speaking continuously and no words should be
missed.
**Where the speaker finally stops speaking, give the 100 ms gap at the end of the last
segment.

Video Guides for LT system


Video guides for the topics listed below are available at the following link:
https://www.youtube.com/playlist?list=PLzicDqm9vmqbzq5a0-5rU9Aoy711PhyiP

Topics covered:
1. 100ms rule
2. Creating new speaker
3. 30 second rule
4. 500ms rule
5. Labelling the speakers
6. 1ms rule
