Proceedings of International Ethical Hacking Conference 2018
Mohuya Chakraborty
Satyajit Chakrabarti
Valentina Emilia Balas · J. K. Mandal
Editors
eHaCON 2018, Kolkata, India
Bilingual Machine Translation: English to Bengali
248 S. Bal et al.
1 Introduction
2 Literature Survey
A good translation system should contain every source word together with its
translation. The main problem of this kind of system is its limited vocabulary.
Fuzzy if-then rules are one of the frequently used methodologies for machine
translation [1]. Translating from one language to another raises several challenges,
such as the lack of resources, tools, and pronunciation dictionaries, as well as
differences in language modeling, dialog modeling, and content summarization [2].
More research is required to increase the accuracy rate for low-resource languages
and in cases where the target-language vocabulary is limited [3]. Another approach
to translation is tense-based and takes English sentences as input. This kind of
system uses context-free grammars to analyze the syntactic structure of the input,
which helps to translate the sentence and verify the accuracy of the output [4].
Machine translation may also be achieved with a deep neural network (DNN). A
memory-augmented neural network has been introduced with this mechanism, in which
the memory structure does not contain any phrases [5]. Another method of machine
translation relies on audio analysis and feature extraction. This kind of process can
resolve ambiguity in sentence translation and thereby improve the output [6].
Another approach is used
for translation, where text from the New Testament was used as training data. If
proper resources are not available and the machine is not properly trained, the
accuracy rate decreases [7]. Example-based machine translation is another
methodology used in this setting. Its problem is a limited knowledge base, which
makes the system inefficient when a low-resource language is involved [8]. Machine
translation is also important for question-answering sessions. The main problem for
this type of system is word ambiguity, which can be reduced by using matrix
factorization. For dynamic question-answering sessions, a large vocabulary and
proper learning are required for accuracy [9].
Machine translation is also important for speech-to-text conversion. If the speech
is in a different language, it is important to have proper resources for translation.
This kind of system extracts the meaning of the input sentence, so a proper
decision-making algorithm and proper training are needed [10, 11]. Deep learning is
one of the important concepts in natural language processing. For language
translation, it is important to make the right decision. Training can be done on
past experience, and the system can then make the proper decision by using deep
learning [12]. Translation efficiency is sometimes reduced when phrase-based
translation is required for long sentences. The word order of the input language may
differ from that of the output language, so a rule-based system is required to
improve translation quality [13]. A complex sentence can be simplified with a
tree-based method, so the splitting and decision making must be done properly for
accurate translation [14]. For a sentence with a complicated structure, the parse
tree may not be created properly, so it is very important to generate the parse tree
correctly for the translation to be done efficiently [15]. During machine
translation, it is very important to detect sub-phrases as well as clauses. Any
error in clause detection may prevent the translation from being done properly [16,
17]. Parsing-based sentence simplification is one method by which keywords can be
extracted. This process follows a dependency-based parsing technique [18].
The study of related work shows that, due to the lack of resources, tools, and
vocabularies, it is not always possible to translate an English sentence into a
regional language by using the existing methodologies. If the translation is not
done properly, the meaning of the translated statement may not be appropriate. The
main reason for this problem is improper analysis of the sentences. Existing systems
generally apply some general rules that fail to perform the proper conversion in
some cases; e.g., if the first letter of a name is given in lowercase, the output of
an existing system changes drastically. This is one of the major drawbacks of
existing systems. Parts of Speech (POS) tagging does not work properly in these
cases. This shows that priority should be given to making the translation system
intelligent enough to analyze the sentences properly.
3 Methodology
The present work proposes a novel methodology for English to Bengali text
translation. An English text, sentence, or paragraph is used as input, and the
system generates its appropriate Bengali meaning. First, the English text is taken
as input to the system. The sentence is then broken into words, and a Parts of
Speech (POS) tagger retrieves the part of speech of each word. Next, the words are
clustered into three groups, i.e., Subject, Verb, and Object, along with some other
required parts (e.g., WH-words, exclamatory expressions, etc.). After that, the
parse tree is generated for the English text and converted into the parse tree of
the Bengali language by using different Bengali grammatical rules [19]. A separate
file is used as a database in which the English words and their respective Bengali
meanings are stored. After judging the syntactic structure of the sentence, the
appropriate Bengali words are selected and used. Finally, the output of the system
is generated in Bengali. The proposed system is shown with the help of a block
diagram in Fig. 1.
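The first stages of this pipeline (word splitting, POS tagging, and clustering into
Subject, Verb, and Object) can be illustrated with a minimal Python sketch. It is
not the authors' implementation: the lexicon TOY_POS is a hypothetical stand-in for
a real POS tagger, the tag names are chosen for illustration, and the grouping
heuristic in group_svo is an assumption that only suits simple assertive sentences.

```python
import re

# Hypothetical hand-made lexicon standing in for a real POS tagger.
TOY_POS = {
    "i": "PRON",
    "am": "AUX",
    "going": "VERB",
    "to": "ADP",
    "school": "NOUN",
}


def tokenize(sentence):
    """Break a sentence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", sentence.lower())


def pos_tag(tokens):
    """Look up each token in the toy lexicon; unknown words get the tag 'X'."""
    return [(tok, TOY_POS.get(tok, "X")) for tok in tokens]


def group_svo(tagged):
    """Cluster tagged tokens into subject, verb, and object groups.

    Assumed heuristic: tokens before the first AUX/VERB form the subject,
    the contiguous AUX/VERB run forms the verb group, and everything after
    that run is treated as the object.
    """
    groups = {"subject": [], "verb": [], "object": []}
    state = "subject"
    for tok, tag in tagged:
        if tag in ("AUX", "VERB") and state != "object":
            state = "verb"
        elif state == "verb":
            state = "object"
        groups[state].append(tok)
    return groups


if __name__ == "__main__":
    print(group_svo(pos_tag(tokenize("I am going to school."))))
```

A real system would replace TOY_POS with a trained tagger and would need extra
groups for WH-words and exclamatory expressions, which this sketch omits.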
In the present work, the types of sentences taken as input are shown in Fig. 2.
Here, assertive and interrogative sentences are taken as examples to demonstrate
how the system actually works.
Assertive Sentence: First recognize the pattern of the input sentence.
In English, the pattern is: Sub + Verb + Obj
e.g., “I am going to school.”
Now, as per Bengali grammar [19], reconstruct the sentence in the pattern:
Sub + Obj + Verb
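The reordering from Sub + Verb + Obj to Sub + Obj + Verb can be sketched in Python
as follows. This is a hedged illustration only: the function names and the
phrase-level lexicon TOY_LEXICON are hypothetical, and its Bengali entries are
supplied here for the example rather than taken from the database file described in
the paper.

```python
# Hypothetical phrase-level lexicon; entries are illustrative assumptions.
TOY_LEXICON = {
    "i": "আমি",
    "to school": "স্কুলে",
    "am going": "যাচ্ছি",
}


def reorder_svo_to_sov(subject, verb, obj):
    """Rearrange English Sub + Verb + Obj into Bengali Sub + Obj + Verb order."""
    return [subject, obj, verb]


def translate_assertive(subject, verb, obj):
    """Reorder the three groups, then look each one up in the toy lexicon.

    A group missing from the lexicon is passed through unchanged.
    """
    ordered = reorder_svo_to_sov(subject, verb, obj)
    return " ".join(TOY_LEXICON.get(part.lower(), part) for part in ordered)


if __name__ == "__main__":
    print(translate_assertive("I", "am going", "to school"))
```

For the example sentence, the groups "I", "am going", and "to school" are reordered
to "I", "to school", "am going" before the lookup. Looking up whole groups rather
than single words is a simplification made for this sketch.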