TCS Bangla Guidelines
Glossary
Term: Definition
Homophone: Two or more words that have the same pronunciation but different meanings or spellings
1. Audio classes
There are 2 options for audio classes, 【speech】 and 【discard】, defined as follows:
1. speech:
1. You can select a speech part that is in the language you are transcribing, and that speech part is clear
2. Transcribe text from the audio only when you have chosen speech
2. discard:
1. The entire audio is not in the language you transcribe;
2. The entire audio is unclear or inaudible speech;
3. The entire audio is songs or non-human sound, which includes melodies, animal sounds and nature sounds;
4. The entire audio contains only modal words.
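The class choice above can be sketched as a small decision function. This is only an illustration of the listed criteria, not part of the labeling tool: the `AudioTraits` fields and the function name are hypothetical, and each flag is assumed to be judged by the labeler after listening to the whole audio.

```python
from dataclasses import dataclass


@dataclass
class AudioTraits:
    # Hypothetical flags mirroring the discard criteria in this section.
    in_target_language: bool  # some speech is in the language being transcribed
    has_clear_speech: bool    # at least one speech part is clear and audible
    only_non_speech: bool     # entire audio is songs, melodies, animal or nature sounds
    only_modal_words: bool    # entire audio contains only modal words


def choose_audio_class(traits: AudioTraits) -> str:
    """Return "speech" or "discard" following the four discard criteria."""
    if not traits.in_target_language:
        return "discard"
    if not traits.has_clear_speech:
        return "discard"
    if traits.only_non_speech or traits.only_modal_words:
        return "discard"
    # A clear speech part in the target language can be selected.
    return "speech"
```

Only a result of "speech" is followed by transcription; a "discard" result needs no text.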
3. Cut speech
3. Text transcribe
Scenario 1 : The beginning and the end of the audio contain unclear speech, and a word in the middle is misspelt in the text transcription.
Definition of unclear speech = words spoken too slowly to be pieced together, baby babbling, low volume, or fuzzy audio, as long as it is caused by human speech
Answer : This situation has both an intercept error and a written error. Both are accepted.
Scenario 2 : The beginning and the end of the audio contain noise, and a word in the middle is misspelt in the text transcription.
Definition of noise = background music, fireworks, rainfall (sound not related to human speech)
Answer : Written error. We ignore the noise at the beginning and the end.
Scenario 3 : The beginning of the audio was not cut accurately, so the first word is only half heard. The rest of the audio has no written error.
Explanation : The intercept error caused the written error. (Since the audio was not cut properly, the word was misunderstood.)
Answer : Intercept error only.
Scenario 4 : The beginning and the end of the audio are complete silence that was not cut off, and in the middle of the audio a word is misspelt or missing in the text transcription.
Explanation : This situation has 1 type of error.
Answer : Written error. (Silence can be ignored.)
Scenario 5 : The beginning and the end of the audio are complete silence that was not cut off, and the middle of the audio has no written error.
Answer : This case can be passed.
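The five scenarios can be summarised as a small lookup. The sketch below is illustrative only: the edge labels and the function name are hypothetical, and `independent_written_error` is assumed to mean a written error that was not caused by a bad cut. Combinations the scenarios do not cover (for example, noise at the edges with no written error) fall out of the same rule and should be read as an assumption, not as guidance.

```python
from typing import Set

# Hypothetical labels for what is heard at the beginning/end of the audio.
INTERCEPT_EDGES = {"unclear_speech", "half_cut_word"}  # count as intercept errors
IGNORED_EDGES = {"noise", "silence", "clean"}          # ignored during evaluation


def evaluate_case(edge: str, independent_written_error: bool) -> Set[str]:
    """Return the set of conclusions for one case, following Scenarios 1-5."""
    if edge not in INTERCEPT_EDGES | IGNORED_EDGES:
        raise ValueError(f"unknown edge label: {edge}")
    conclusions: Set[str] = set()
    if edge in INTERCEPT_EDGES:
        conclusions.add("intercept error")
    if independent_written_error:
        conclusions.add("written error")
    # No error found: the case passes.
    return conclusions or {"pass"}
```

For example, `evaluate_case("silence", False)` corresponds to Scenario 5 and yields a pass, while `evaluate_case("half_cut_word", False)` corresponds to Scenario 3, where only the intercept error is recorded.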
[Labeling] Queue SOP - Evaluation (TH-CS)
1. Production index
1. Productivity Requirement: 150-200 cases/h
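As a quick check of what this requirement means per case, the rate can be converted into an average time budget. The helper name below is hypothetical; the only input taken from this document is the 150-200 cases/h range.

```python
def seconds_per_case(cases_per_hour: int) -> float:
    """Average time available per case at a given hourly rate."""
    return 3600 / cases_per_hour


slowest = seconds_per_case(150)  # 24.0 seconds per case
fastest = seconds_per_case(200)  # 18.0 seconds per case
```

In other words, the requirement leaves roughly 18 to 24 seconds per case on average.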
8. Evaluation conclusion:
1. Pass: the [audio classes] option is correct, and the transcription is perfectly aligned with the speech part (current cut)
2. Written error: the transcription contains a written error (a misspelt or missing word)
3. Intercept error: the speech cut is wrong
4. Classification error: the [audio classes] option is wrong
5. Blank error: the transcript text area is blank
6. Punctuation error: the transcription contains punctuation errors
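The six conclusions can be modelled as an enumeration with a simple collector. This is a minimal sketch, not the queue's actual logic: the function and parameter names are hypothetical, and two choices are assumptions made here for illustration: that several errors can be reported together, and that a blank transcript is reported as a blank error instead of a written or punctuation error.

```python
from enum import Enum
from typing import List


class Conclusion(Enum):
    PASS = "pass"
    WRITTEN_ERROR = "written error"
    INTERCEPT_ERROR = "intercept error"
    CLASSIFICATION_ERROR = "classification error"
    BLANK_ERROR = "blank error"
    PUNCTUATION_ERROR = "punctuation error"


def conclude(class_correct: bool, cut_correct: bool, transcript: str,
             text_matches: bool, punctuation_ok: bool) -> List[Conclusion]:
    """Collect every conclusion that applies to one evaluated case."""
    found: List[Conclusion] = []
    if not class_correct:
        found.append(Conclusion.CLASSIFICATION_ERROR)
    if not cut_correct:
        found.append(Conclusion.INTERCEPT_ERROR)
    if transcript.strip() == "":
        # Assumption: a blank text area is reported on its own.
        found.append(Conclusion.BLANK_ERROR)
    else:
        if not text_matches:
            found.append(Conclusion.WRITTEN_ERROR)
        if not punctuation_ok:
            found.append(Conclusion.PUNCTUATION_ERROR)
    return found or [Conclusion.PASS]
```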
3. Interface instruction
Words with the same meaning as in the Bengali Dialect. The difference between the two groups is the way the words are pronounced.
The Bengali Dialect has no accent. Other dialects have an accent.
Words with the same meaning as in the Bengali Dialect but which are completely different because of location and cultural differences.
These words are pronounced and spelt completely differently from the Bengali Dialect.
14-July: Dialects with the same word but a slight pronunciation difference will not be discarded. If the complete term is different from the standard, then discard. Effective today (14-07-2021).
15-July: All English words will be written in Bangla script, including abbreviations, brand names, proper nouns, etc. Only Bangla script will be used. Effective today (15-07-2021).
17-July: Proper nouns will remain written in BN as we have aligned, and they will be treated as BN as well, meaning we do not need to cut them at the start or end of a sentence. Effective today (17-07-2021).
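The Bangla-script-only rule lends itself to a quick self-check before submitting a transcript. The sketch below is a hypothetical helper, not part of the labeling interface: it only flags runs of basic Latin letters (A-Z, a-z) and does not check anything else about the text.

```python
import re
from typing import List

# Runs of basic Latin letters, which should not appear in a Bangla transcript.
LATIN_RUN = re.compile(r"[A-Za-z]+")


def latin_runs(transcript: str) -> List[str]:
    """Return every run of Latin letters found in the transcript."""
    return LATIN_RUN.findall(transcript)
```

For example, `latin_runs("আমি Facebook ব্যবহার করি")` returns `["Facebook"]`, signalling that the brand name still needs to be written in Bangla script, while a transcript written entirely in Bangla script returns an empty list.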