TCS Bangla Guidelines

Uploaded by

Nirban saha

I. Glossary

Term                 Definition

Speech               Clear human voice
Discard              Get rid of
Intercept            Cut off / seize
Transcribe           To make a written copy of speech
Modal words          Ha Ha; Wow; Oh; Aha
Default cut          A piece of audio intercepted by default (system)
Current cut          The audio that you cut
Overlapping speech   Two or more people speak at the same time
Homophone            Two or more words having the same pronunciation but different meanings or spelling
Dialect              A language used by the people of a specific area/district
Accelerated          Increase in speed

II. Annotation Guidelines

1. Audio classes
There are 2 options for audio classes: 【speech】 and 【discard】. Their definitions are:

1. speech:
   1. Choose speech when you can select a part of the audio that is in the language you transcribe and that part is clear.
   2. You need to transcribe text from the audio only when you choose speech.
2. discard:
   1. The entire audio is not in the language you transcribe;
   2. The entire audio is unclear or non-audible speech;
   3. The entire audio is songs or non-human speech, which includes melodies, animal sounds and nature sounds;
   4. The entire audio contains only modal words.
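
The speech/discard rule above can be sketched as a small function. This is only an illustration: the `SegmentKind` labels and the `classify_audio` helper are hypothetical names, not part of the labeling tool, and the sketch assumes that an audio with a mix of discard reasons (for example, part noise and part another language) is also discarded because no clear part can be selected.

```python
from enum import Enum
from typing import Iterable


class SegmentKind(Enum):
    """Hypothetical labels for what an annotator hears in one part of the audio."""
    CLEAR_SPEECH = "clear speech in the language being transcribed"
    OTHER_LANGUAGE = "speech in another language"
    UNCLEAR = "unclear or non-audible speech"
    NON_SPEECH = "songs, melodies, animal sounds or nature sounds"
    MODAL_WORDS = "modal words only (Ha Ha, Wow, Oh, Aha)"


def classify_audio(segments: Iterable[SegmentKind]) -> str:
    """Return "speech" if any part is clear target-language speech, else "discard"."""
    if any(kind is SegmentKind.CLEAR_SPEECH for kind in segments):
        return "speech"
    # Assumption: with no clear part to select, the whole audio is discarded.
    return "discard"
```

For example, an audio that starts with music and then contains a clear sentence is classed as speech, while an audio containing only "Wow" and "Oh" is discarded.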
2. Cut speech

3. Text transcription

III. Added explanation


1. A double space between words is OK.
Note:
1. Please double check and make sure that the text aligns with the audio before moving on to the next section.
2. Transcribe what you hear, including ungrammaticalities.
3. Transcriptions must be 100% accurate to the cut speech part.
4. All symbols and numbers in the audio must be transcribed as the corresponding words in your language.
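
Two of these rules lend themselves to a quick automated check. The sketch below is not part of the labeling tool; `same_transcript` and `find_untranscribed` are hypothetical helpers. It assumes that extra spaces should not count as a difference between two transcriptions, and it flags digits (ASCII and Bengali) and a few example symbols that should have been written out as words. The symbol list is an assumption, since the guideline does not enumerate symbols.

```python
import re
from typing import List

# ASCII digits plus Bengali digits (U+09E6 to U+09EF).
DIGITS = re.compile(r"[0-9\u09E6-\u09EF]")
# Example symbols only; extend as needed (assumption, not from the guideline).
SYMBOLS = re.compile(r"[%&@#$+=]")


def normalize_spaces(text: str) -> str:
    """Collapse runs of whitespace so a double space equals a single space."""
    return " ".join(text.split())


def same_transcript(a: str, b: str) -> bool:
    """Compare two transcriptions while ignoring spacing differences."""
    return normalize_spaces(a) == normalize_spaces(b)


def find_untranscribed(text: str) -> List[str]:
    """Return digits and symbols that were left in the text instead of being spelt out."""
    return DIGITS.findall(text) + SYMBOLS.findall(text)
```

A transcription passes this check when `find_untranscribed` returns an empty list; it still has to be compared with the audio by ear.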

Evaluation: Intercept and Written Error Guide

Scenario 1: The beginning and the ending of the audio have unclear speech, and a word in the middle is misspelt in the text transcription.
Definition of unclear speech = words spoken too slowly to be pieced together, baby babbling, low volume, or fuzzy audio, as long as it is due to human speech.
Answer: This situation has an intercept error and a written error. Both are accepted.

Scenario 2: The beginning and the ending of the audio have noise, and a word in the middle is misspelt in the text transcription.
Definition of noise = background music, fireworks, rainfall; anything not related to human speech.
Answer: Written error. We ignore the noise at the beginning and the end.

Scenario 3: The beginning of the audio was not cut accurately, so a word is only half heard. The rest of the audio has no written error.
Explanation: The intercept error has caused the written error (since the audio was not cut properly, it caused a misunderstanding).
Answer: Intercept error only.

Scenario 4: The beginning and the ending of the audio are complete silence that was not cut off, and a word in the middle is misspelt or missing in the text transcription.
Explanation: This situation has 1 type of error.
Answer: Written error. (Silence can be ignored.)

Scenario 5: The beginning and the ending of the audio are complete silence that was not cut off, and the audio in the middle has no written error.
Answer: This case can be passed.
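
The five scenarios can be summarised as a small decision function. This is a minimal sketch, not an official rule engine: the function name, the `edge` labels and the parameters are hypothetical. Two outcomes are inferred rather than stated in the scenarios and are marked as assumptions in the comments: noise at the edges with no written error is treated as a pass, and unclear speech at the edges with no written error is treated as an intercept error.

```python
from typing import Set


def evaluate_scenario(edge: str, written_error: bool, word_cut_off: bool = False) -> Set[str]:
    """Return the conclusions that apply to one case.

    edge: what remains at the beginning/ending of the cut:
          "clean", "silence", "noise" or "unclear_speech".
    written_error: a spelling or missing-word error that was NOT caused by a bad cut.
    word_cut_off: the cut is inaccurate, so a word is only half heard (Scenario 3).
    """
    conclusions = set()
    if edge == "unclear_speech" or word_cut_off:
        # Scenario 1 (unclear human speech left in) and Scenario 3 (half-heard word).
        # Assumption: unclear speech alone, without a written error, is still an intercept error.
        conclusions.add("intercept error")
    if written_error:
        # Scenarios 1, 2 and 4; noise and silence at the edges are ignored.
        conclusions.add("written error")
    # Scenario 5. Assumption: noise at the edges with no written error also passes.
    return conclusions or {"pass"}
```

For Scenario 1 the function returns both errors, matching the note that both are accepted.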
[Labeling] Queue SOP - Evaluation (TH-CS)

1. Production index
   1. Productivity Requirement: 150-200 cases/h
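
As a rough guide, the requirement can be converted into a time budget per case. The helper below is only illustrative arithmetic; the function name is made up for this example.

```python
SECONDS_PER_HOUR = 3600


def seconds_per_case(cases_per_hour: int) -> float:
    """Average time available for one case at a given hourly rate."""
    return SECONDS_PER_HOUR / cases_per_hour


print(seconds_per_case(150))  # 24.0 seconds per case at the lower bound
print(seconds_per_case(200))  # 18.0 seconds per case at the upper bound
```

In other words, an evaluator has roughly 18 to 24 seconds on average to listen, check and submit each case.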

2. Components of moderation interface


1. Audio part
2. Labeling index
3. Evaluation
4. Default cut: duration of the original audio, given by the machine (grey area)
5. Current cut: the speech part made by the first verifier.
6. Audio classes:
   1. Default option made by the first verifier
   2. Speech - clear human speech
   3. Discard - audio does not meet ASR speech requirements and needs to be discarded.
7. Text box: the transcription of the clear human speech. The default text is the one transcribed by the first verifier.

8. Evaluation conclusion:
   1. Pass: [audio classes] option is correct, and the transcription is perfectly aligned with the speech part (current cut)
   2. Written error: the transcription contains a written error
   3. Intercept error: the speech cut is wrong
   4. Classification error: [audio classes] option is wrong
   5. Blank error: the transcript text area is blank
   6. Punctuation error: the transcription contains punctuation errors
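
The conclusions can be modelled as an enumeration with a helper that collects every conclusion that applies to a case. This is a sketch under stated assumptions: `Conclusion` and `conclude` are hypothetical names, the inputs are the evaluator's own judgments, and the order in which errors are listed is arbitrary because the SOP does not define a priority between them. The sketch also assumes that wording and punctuation cannot be judged when the text box is blank.

```python
from enum import Enum
from typing import List


class Conclusion(Enum):
    PASS = "Pass"
    WRITTEN_ERROR = "Written error"
    INTERCEPT_ERROR = "Intercept error"
    CLASSIFICATION_ERROR = "Classification error"
    BLANK_ERROR = "Blank error"
    PUNCTUATION_ERROR = "Punctuation error"


def conclude(class_correct: bool, cut_correct: bool, transcript: str,
             wording_correct: bool, punctuation_correct: bool) -> List[Conclusion]:
    """List every conclusion that applies to one speech case."""
    found = []
    if not class_correct:
        found.append(Conclusion.CLASSIFICATION_ERROR)
    if not cut_correct:
        found.append(Conclusion.INTERCEPT_ERROR)
    if not transcript.strip():
        # Assumption: a blank text box is reported instead of wording/punctuation errors.
        found.append(Conclusion.BLANK_ERROR)
    else:
        if not wording_correct:
            found.append(Conclusion.WRITTEN_ERROR)
        if not punctuation_correct:
            found.append(Conclusion.PUNCTUATION_ERROR)
    # Pass only when no error of any kind was found.
    return found or [Conclusion.PASS]
```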

3. Interface instruction

4. Flow Chart of moderation instruction


5. Operation steps and instructions


1. Step 1: Listen to the intercepted audio.
2. Step 2: Check the default written text.
3. Step 3: Select the corresponding evaluation conclusion.
4. Step 4: Submit, or Submit and Leave.
Dialects:

Words that have the same meaning as in the standard Bengali dialect and differ only in pronunciation. The standard Bengali dialect has no accent; other dialects have an accent.

These will not be discarded.

Words that have the same meaning as in the standard Bengali dialect but are completely different because of location and cultural differences. These words are pronounced and spelt completely differently from the standard Bengali dialect.

If the terms are completely different, the audio must be discarded.


IMPORTANT GUIDELINE UPDATES

Date: 14-July
Update: Dialects that use the same word with a slight pronunciation difference will not be discarded. If the complete term is different from the standard, then discard.
Status: Effective today (14-07-2021)

Date: 15-July
Update: All English words will be written in Bangla script, including abbreviations, brand names, proper nouns, etc. Only Bangla script will be used.
Status: Effective today (15-07-2021)

Date: 17-July
Update: Proper nouns will remain written in BN, as we have aligned, and will be treated as BN as well. This means we do not need to cut them at the start or end of a sentence.
Status: Effective today (17-07-2021)
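
The Bangla-script-only rule can be checked mechanically before submitting. The sketch below is an illustration, not a feature of the labeling tool; `latin_words` is a hypothetical helper that only looks for unaccented Latin letters and does not judge whether the Bangla spelling of an English word is correct.

```python
import re
from typing import List

# Runs of unaccented Latin letters, e.g. an English word or brand name left untransliterated.
LATIN_RUN = re.compile(r"[A-Za-z]+")


def latin_words(transcript: str) -> List[str]:
    """Return every run of Latin letters found in the transcription."""
    return LATIN_RUN.findall(transcript)


print(latin_words("আমি Facebook ব্যবহার করি"))  # ['Facebook']
print(latin_words("আমি ফেসবুক ব্যবহার করি"))    # []
```

An empty result means the text contains no Latin letters; any word returned should be rewritten in Bangla script.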
