A Comprehensive Dialect Conversion Approach From Chittagonian To Standard Bangla
All content following this page was uploaded by Nahid Hossain on 09 April 2022.
Abstract: We present a comprehensive system that converts the Chittagonian dialect to standard Bangla. It is a text-to-text conversion system based on word-to-word mapping with a bilingual dictionary, rule-based morphological transformation of suffixes, and a supporting word suggestion module. The system tokenizes the regional input text and processes each token through word-to-word mapping; if that mapping fails, it applies morphological transformation using suffix transformation rules. We also introduce an auxiliary tool that generates suggested words for the dialectal input. The system achieved an accuracy of 94.75% in producing standard Bangla translations of Chittagonian words. It must be noted that there is no published work on Chittagonian dialect conversion from a computational point of view. We are the first to have built such a system for Chittagonian dialect to standard Bangla conversion.

Keywords: Bangla, Dialect, Chittagonian, Double Metaphone.

I. Introduction

According to many linguists and researchers, a dialect is simply a different form of a language, spoken with different accents and morphemes. A dialect may even have its own grammar and sentence rules. Some dialects are rich enough to be accepted as full-fledged languages. Chittagonian is one of the principal dialects of the Bangla language and is spoken widely across the south-eastern region as the only means of communication. It is one of the most difficult dialects for non-native speakers who use standard Bangla to understand, as it is rich in its own words and phrases. Activities such as making deals and finding accommodation can therefore prove challenging. To cope with this, people are using English and standard Bangla more; as a result, this rich dialect is losing its speakers day by day.

As mentioned earlier, no notable work has yet been done on the conversion of the Chittagonian dialect. In 2017, Amrita Das presented an in-depth study of Sylheti grammar, which helped us to work with Chittagonian grammar [1]. Mohammad Azizul Hoque's 2015 paper on the Chittagonian language describes Chittagonian grammar and word pronunciation, which also helped our research [2]. In 2015, Arvinder Singh et al. proposed a converter for Punjabi dialects that used a rule-based approach and a bilingual dictionary [3]. In 2014, K Marimuthu et al. provided a method to convert dialectal Tamil text to standard Tamil text using Finite State Transducers, which yielded an accuracy rate of 85% [4]. In 2012, G.H. Al-Gaphai et al. worked with 9386 words, and their rule-based approach yielded an accuracy of 77.32% [5]. In 2008, Hitahm Abo Bakr et al. proposed a hybrid approach for converting Egyptian colloquial Arabic to Modern Standard Arabic with an accuracy of 88% [6]. They used tokenization and POS (part-of-speech) tagging to improve the performance of their system. Md. Shahnur Azad Chowdhury worked on Bangla to English machine translation using POS tagging [7]. He used Tag Vectors and a set of grammar rules for the conversion process.

Our proposed system is the first that provides a comprehensive solution. We have created a bilingual dictionary as the dataset to map each Chittagonian word to its standard Bangla equivalent. If the word-to-word mapping fails to give a proper translation, the system moves on to suffix transformation. It splits each token into a root and a suffix and performs word-to-word mapping on the root word. We have used POS tagging to find the proper suffix that fits the standard Bangla root word. We have also provided a word suggestion module, since people might spell the same word differently. We obtain the suggestions by means of Double Metaphone encoding [8], LCS (Longest Common Subsequence) [9] [10], and K-NN (K-Nearest Neighbors) [11]. The Double Metaphone algorithm encodes the input into corresponding English letters, LCS compares the Double Metaphone encodings to determine similarity, and K-NN finds the closest matches to generate the suggestions.

Section II describes the proposed system and presents a step-by-step explanation of our work along with the algorithms. The experimental results and performance analysis are provided in Section III, and Section IV concludes the paper with the limitations of the system and future work.

II. Proposed Method

This section describes the whole process step by step in detail.

A. Dataset Collection and Corpus Study

The Chittagonian dialect has a vocabulary very different from that of standard Bangla. The key part of the converter is the dataset. Accuracy and time complexity depend heavily on the dataset alone. The Chittagonian dialect has hardly any resources in written format. Although it is rich in culture and literature, it lacks written texts, especially in a