Bengali Guidelines
1. Transcriptions MUST end with a period (।) or question mark (?). No other punctuation
marks or symbols (e.g., commas or exclamation marks) are allowed within a
transcription. For example:
Incorrect: যাইহোক, তিনি শেষপর্যন্ত চলে যাওয়ারই সিদ্ধান্ত নিলেন। => The comma (,) in the middle
is not allowed.
Correct: যাইহোক তিনি শেষপর্যন্ত চলে যাওয়ারই সিদ্ধান্ত নিলেন।
NB: Only items that are required for correct spelling are acceptable:
Apostrophes that are required for correct spelling are acceptable, e.g., when দুই is
spoken as দু' or ছয় as ছ' (in regular conversation, দুই হাজার becomes দু' হাজার and
ছয় হাজার becomes ছ' হাজার).
In Bengali, hyphens are used as connectors. Where a Bengali suffix needs to be
joined to an English initialism or acronym, a hyphen should be used.
Example 1: আইপিএল-এর সময়।
Example 2: আইপিএল-এ।
2. A transcription might include more than one sentence. In these instances, sentences
are separated by a space only. For example:
Incorrect: দারুণ! তিনি এটা সামলে নিতে পারবেন। আমরা কোন ধরনের তথ্য খুঁজছি?
Correct: দারুণ তিনি এটা সামলে নিতে পারবেন আমরা কোন ধরনের তথ্য খুঁজছি?
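Rules 1 and 2 can be checked mechanically. The following is a minimal sketch, not an official tool: the function name check_transcription and the exact set of disallowed marks are assumptions (the guidelines name only commas and exclamation marks as examples). Apostrophes and hyphens are left alone because the NB above allows them.

```python
DANDA = "\u0964"  # Bengali full stop (।)
TERMINATORS = (DANDA, "?")
# Assumed set of disallowed marks; the guidelines give commas and
# exclamation marks only as examples.
FORBIDDEN = set(",!;:\"")


def check_transcription(text):
    """Return a list of rule 1 / rule 2 problems found in a transcription."""
    problems = []
    stripped = text.strip()
    ends_ok = stripped.endswith(TERMINATORS)
    if not ends_ok:
        problems.append("must end with a period (\u0964) or question mark (?)")
    # Everything before the final terminator must be free of punctuation,
    # including sentence-final marks between sentences (rule 2).
    body = stripped[:-1] if ends_ok else stripped
    for ch in body:
        if ch in FORBIDDEN or ch in TERMINATORS:
            problems.append(f"disallowed mark inside transcription: {ch!r}")
    return problems
```

An empty list means the transcription passes both checks. This sketch covers the general rules only; the Skills Assessment section uses additional tags and formats.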
3. Write out as words any punctuation or symbols that are spoken. In the general
case, .com is transcribed as ডট কম (e.g., এডু ডট কম।), and % is written out in words
as spoken in the audio (e.g., পাঁচ শতাংশ।).
7. Write out all numbers as words, as they are spoken. For example: 2021 can be
spoken in Bengali as দু' হাজার একুশ or দুই হাজার একুশ or দুই শূন্য দুই এক; ১০,০০০ - দশ হাজার;
১৩ ফাল্গুন - তেরো ফাল্গুন, etc.
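Digits or percent signs left in a transcription usually mean that the rules on spoken symbols and numbers were missed. The sketch below flags them for review; the function name find_written_symbols is hypothetical, and the pattern covers only ASCII digits, Bengali digits (U+09E6 to U+09EF), and the percent sign.

```python
import re

# ASCII digits, Bengali digits (০-৯), and the percent sign.
_WRITTEN_SYMBOLS = re.compile(r"[0-9\u09E6-\u09EF]+|%")


def find_written_symbols(text):
    """Return digit runs and percent signs that should be spelled out as words."""
    return _WRITTEN_SYMBOLS.findall(text)
```

The function only reports candidates. How each one should be written still depends on what is actually spoken in the audio.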
9. Transcribe all repetitions as you hear them but DO NOT transcribe any false starts.
Example 1: গতকাল আমি ওকে ফোন করেছিলাম আমি ওকে ফোন করেছিলাম।
Example 2: না না না আমি তাকে কিছুই বলিনি।
10. Use the [foreign] tag to denote words/phrases that are spoken in a foreign language.
DO NOT transcribe foreign content, even if you know the language. For example:
জনাথনের age হয়েছিল একশো আশি বছর। should be transcribed জনাথনের [foreign] হয়েছিল একশো আশি বছর।
A foreign tag is only required once for each foreign utterance, for example: আচ্ছা ওটা হল
uno, dos, tres, আর cuatro। should be transcribed as:
11. Transcribe English words in Bengali when they are spoken with Bengali suffixes
like এ, এর, র, য় etc.
For example: আমি bag-এ করে বাজার নিয়ে এলাম। should be transcribed as আমি ব্যাগে করে বাজার নিয়ে
এলাম। => Here the English word 'bag' has been spoken with the Bengali suffix 'এ' and
it sounds like a Bengali word.
Similarly, if you encounter the word 'trip', please use the [foreign] tag, but if you hear
'trip+এ', transcribe it as ট্রিপে.
12. Transcribe any grammatical errors as you hear them and DO NOT correct them. E.g.,
সে বলবে আর আমি সেটা করেছি এটা হতে পারে না। This is grammatically incorrect, but it should not
be corrected while transcribing.
Skills Assessment section (Questions 11 - 18)
Apply both the Transcription guidelines (on the previous page) and the following
guidelines to this section. In case of a conflict between the Transcription guidelines
and the following guidelines, use the following guidelines. The following guidelines
(Skills Assessment) apply only to Questions 11 to 18.
2. Transcribe audios with more than one speaker using the following format:
Speaker 1: Transcription
Speaker 2: Transcription
Speaker 1: Transcription etc.
3. Assign a speaker role based on the individuals involved in the audio, considering their
respective roles and the relationship dynamics between them, e.g., a father and son,
or a doctor and patient.
4. Filler words are those words that speakers use to indicate hesitation or fill a pause while
thinking about what to say next. Any filler words must be transcribed consistently. Filler
words like উ , উ , , , অ া , অ া, আ, should be transcribed. All filler words must be
preceded by a hash (#) with no space, e.g., #উ , ও মনে হয় তখন বাসে করে গেল।
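The hash convention can be applied consistently with a small helper. This is only a sketch: the function name mark_fillers is hypothetical, and the set of filler words must be supplied by the caller, since it depends on the filler list in use.

```python
def mark_fillers(text, fillers):
    """Prefix every token found in `fillers` with '#', with no space in between."""
    marked = []
    for token in text.split():
        # Separate trailing punctuation so that a filler followed by a comma still matches.
        core = token.rstrip(",\u0964?!")
        tail = token[len(core):]
        if core in fillers:
            token = "#" + core + tail
        marked.append(token)
    return " ".join(marked)
```

Tokens that already carry a hash are left unchanged, so the helper can be run more than once on the same text.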
5. Any sensitive personal information (e.g., full names, social security numbers, dates
of birth, addresses) must not be included in the transcription. Please replace sensitive
data in the transcribed text with the following tags: [name], [socialsecurity], [dob],
[address]. DO NOT use those tags if the information relates to a public figure or
business, or if the context of the audio to be transcribed implies that the speaker's
personal information is not private. For example, a person participating in a podcast
would not have an expectation of privacy when their name is used.
NB: Highly sensitive information, such as social security numbers, should always be
tagged. Names of public figures may be transcribed: for example, Mukesh Ambani and
Salman Khan are famous celebrities, so it is okay to transcribe their names. The names
of private individuals should not be transcribed, e.g.,
হ্যাঁ, আমি [name] বলছি। আমার আধার কার্ড নাম্বারটা লিখে নিন, [socialsecurity]। জন্ম তারিখ হল [dob]।
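Once the sensitive spans in a draft have been identified by the transcriber, replacing them with the correct tags is mechanical. The sketch below assumes the caller supplies those spans; the function name redact and its interface are hypothetical, and deciding what counts as private remains a human judgment under the rule above.

```python
ALLOWED_TAGS = {"name", "socialsecurity", "dob", "address"}


def redact(text, sensitive):
    """Replace each sensitive value with its tag.

    `sensitive` maps the exact text to hide to one of the tag names
    listed in ALLOWED_TAGS.
    """
    for value, kind in sensitive.items():
        if kind not in ALLOWED_TAGS:
            raise ValueError(f"unknown tag: {kind}")
        text = text.replace(value, f"[{kind}]")
    return text
```

For example, redact("I am Asha Roy", {"Asha Roy": "name"}) returns "I am [name]", where "Asha Roy" is an invented name used only for illustration.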
6. Place any non-linguistic events in the transcription where they occur, using the tags
[noise] and [music], separated from the surrounding words by a space. E.g., আমরা খুবই মজা করছিলাম
[noise] এক মিনিট, কেউ দরজায় বেল বাজালো।
7. Use the [overlapping] tag to indicate moments where speakers' speech overlaps.
This applies to both interruptions and unintentional simultaneous talking. Split the
transcription at the word where the overlap begins.
For example, in the below scenario, the end of Speaker 1's speech is overlapping with
the start of Speaker 2's speech:
Speaker 1: নমস্কার! আপনি কেমন আছেন? আমি গতকাল আপনার পাঠানো (কবিতার বই হাতে পেয়েছি।)
Speaker 2: (আমি ভালো আছি। আপনি কেমন আছেন?)
Speaker 1: হ্যাঁ, আমিও ভালো আছি।
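A misspelled or invented tag is easy to miss by eye. The sketch below lists any bracketed word that is not one of the tags named in these guidelines; the function name unknown_tags is hypothetical. Timestamps such as [00:01] are ignored because the pattern matches letters only.

```python
import re

# Tags named anywhere in these guidelines.
KNOWN_TAGS = {
    "foreign", "name", "socialsecurity", "dob", "address",
    "noise", "music", "overlapping", "Silence",
}


def unknown_tags(text):
    """Return bracketed tag names that do not appear in KNOWN_TAGS."""
    return [tag for tag in re.findall(r"\[([A-Za-z]+)\]", text)
            if tag not in KNOWN_TAGS]
```

The comparison is case-sensitive, so a variant such as [Noise] is reported as well.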
8. A timestamp should appear at the start of each segment (this means that overlapping
segments also need timestamps). They should be placed before any other information
or transcription.
For example, in the audio scenario below, a timestamp would be assigned to each
speaker turn and overlapping segment.
[00:00] Speaker 1: নমস্কার! আপনার নামটা জানতে (পারি?)
[00:01] Speaker 2: (নমস্কার!) আপনি কে বলছেন?
[00:03] Speaker 1: আমি দিশা বলছি।
NB: When the audio contains 2 or more seconds of silence, because a speaker
pauses or there is a break in speaking, the silence should be split into a new segment
with its own timestamp. Note that silent segments include only a timestamp and
the speaker role [Silence].
E.g.,
[00:01] Speaker 1: [music] এই গানটা আমার একদম ভালো লাগে না।
[00:02] [Silence]
[00:05] Speaker 1: এত জোরে গান বাজানোর প্রয়োজন কী? একটু আস্তে বাজাতে পারে!
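The segment layout shown in the examples above (a timestamp, then either a speaker role and text, or [Silence]) can be checked with a short parser. This is a sketch under assumptions: the function name parse_segments is hypothetical, timestamps are taken to be exactly [MM:SS], and the check that timestamps never go backwards is an added sanity check rather than a stated rule.

```python
import re

# "[MM:SS] Role: text" or "[MM:SS] [Silence]"
SEGMENT = re.compile(r"^\[(\d{2}):(\d{2})\] (?:\[Silence\]|([^:\[\]]+): (.+))$")


def parse_segments(lines):
    """Parse segment lines into (start_seconds, role, text) tuples."""
    segments = []
    previous = -1
    for number, line in enumerate(lines, start=1):
        match = SEGMENT.match(line.strip())
        if match is None:
            raise ValueError(f"line {number}: not a valid segment")
        minutes, seconds, role, text = match.groups()
        start = int(minutes) * 60 + int(seconds)
        if start < previous:
            raise ValueError(f"line {number}: timestamp goes backwards")
        previous = start
        # Silent segments carry no role or text of their own.
        segments.append((start, role or "[Silence]", text or ""))
    return segments
```

The parser cannot tell whether a silence of 2 seconds or more was missed, because that requires the audio. It only confirms that the written segments are well formed.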