Proceedings of International Ethical Hacking Conference 2018

Bilingual Machine Translation: English to Bengali

Advances in Intelligent Systems and Computing 811

Mohuya Chakraborty 
Satyajit Chakrabarti 
Valentina Emilia Balas · J. K. Mandal
Editors

Proceedings of International Ethical Hacking Conference 2018
eHaCON 2018, Kolkata, India

Bilingual Machine Translation: English to Bengali

Sauvik Bal, Supriyo Mahanta, Lopa Mandal and Ranjan Parekh

Abstract The present work proposes a methodology for a machine translation system
that takes English sentences as input and produces appropriate Bengali sentences
as output using natural language processing (NLP) techniques. It first uses a parse tree
for syntactic analysis of the sentence structure and then applies semantic analysis to
extract the meaning of the words. An inverse function is then provided to fit the result
into the Bengali syntax. A dictionary stored as a separate file maps the
English words to their Bengali counterparts. The novelty of the present work lies in
combining a syntax-based and a meaning-based analysis to arrive at
the proper translation. The effectiveness of the algorithm is demonstrated with
examples of different English sentence conversions under several rules, and the results
are compared with those of the Google translator to show the improvements
achieved.

Keywords POS tagging · Machine translation · Parse tree · Rule-based system

S. Bal (B) · S. Mahanta (B)
University of Engineering & Management, Jaipur, India
e-mail: [email protected]
S. Mahanta
e-mail: [email protected]
L. Mandal
Institute of Engineering & Management, Kolkata, India
e-mail: [email protected]
R. Parekh
Jadavpur University, Kolkata, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2019
M. Chakraborty et al. (eds.), Proceedings of International Ethical Hacking
Conference 2018, Advances in Intelligent Systems and Computing 811,
https://doi.org/10.1007/978-981-13-1544-2_21

1 Introduction

Language translation is an important application today, as the world is
considered to be a global village. A person who moves from
one location to another without knowing the regional language of that location
would find it very difficult to communicate. Translation is relevant not only in a
global scenario where multiple languages come into consideration, but also in a local
setting where two or more neighboring countries share the same language with
similar or dissimilar dialects. For example, in India and Bangladesh, many people use
Bengali as their mother tongue, though with different dialects. All of this makes machine
translation an important area of research. The present work aims to translate
a language used worldwide, viz. English, into a regional language, viz. Bengali. The
main challenge of language translation is that a simple mapping between words
often does not produce the expected results. Restructuring the sentences and analyzing
their inherent meaning are also necessary for correct output. In existing systems,
many sentences are not translated into meaningful output
because of improper analysis of sentences, lack of resources, etc. The present
work proposes a novel approach in which English to Bengali conversion is
done based on grammatical rules. The proposed work is based on the version
of the language used by the Bengali people of West Bengal, India.

2 Literature Survey

A good translator system should contain all words and their corresponding
translated words. The main problem of this kind of system is the limited available
vocabulary. Fuzzy-If-Then-Rule is one of the frequently used methodologies for machine
translation [1]. Translating from one language to another poses
challenges such as lack of resources, different tools, pronunciation dictionaries,
different language modeling, dialog modeling, content summarization, etc. [2]. More
research is required to increase the accuracy of translation for
low-resource languages and in cases where the target-language
vocabulary is limited [3]. Another approach to language translation is based on tense,
with English sentences as input. This kind of system uses context-free
grammars to analyze the syntactic structure of the input, which helps to translate
the sentence and verify the accuracy of the output [4]. Machine translation may be
achieved by a deep learning-based neural network (DNN). A memory-augmented neural
network is introduced with this mechanism, where the memory structure does not
consist of any phrase [5]. Another method of machine translation relies on
audio analysis and feature extraction. This kind of process can resolve the ambiguity
problem in sentence translation to improve the output [6]. Another approach
to translation used values from the New Testament as training values. If
proper resources are not available and the machine is not properly trained, the
accuracy rate decreases [7]. Example-based machine translation is
another methodology. Its problem is a limited
knowledge base, which makes the system inefficient for translation involving a
low-resourced language [8]. Machine translation is also important for question-answering
sessions. The main problem for this type of system is word ambiguity, which can be
reduced by using matrix factorization. Dynamic question-answering
sessions require a large vocabulary and proper learning for accuracy [9].
Machine translation is also important for speech-to-text conversion. If the speech
is in a different language, it is important to have the proper resources for translation.
This kind of system extracts the meaning of the input sentence, so a proper
decision-making algorithm and proper training are needed [10, 11]. Deep learning is one of
the important concepts in natural language processing. In language translation, it
is important to make the right decision. The system can be trained on past experience
and can then take the proper decision by using deep
learning [12]. Translation efficiency is sometimes reduced when phrase-based
translation is applied to long sentences. The sequence of words in the input language
may differ from that of the output language, so a rule-based system is required to improve
the translation quality [13]. A complex sentence can be simplified with a tree-based
method, so the splitting and decision making must be done properly for
accurate translation [14]. For a sentence with a complicated
structure, the parse tree may not be created properly, so generating a correct
parse tree is very important for efficient translation [15]. During machine
translation, it is very important to detect sub-phrases and clauses. An
error in clause detection may prevent proper translation [16,
17]. Parsing-based sentence simplification is one method by which keywords
can be extracted. This process follows a dependency-based parsing technique [18].
The study of related works shows that, due to the lack of resources, tools, and
vocabularies, it is not always possible to translate an English sentence into a regional
language by using the existing methodologies. If the translation is not done properly,
the meaning of the translated statement may not be appropriate. The main reason for
this problem is improper analysis of the sentences. Existing systems generally
apply some general rules that fail to do the proper conversion in some cases,
e.g., if the first letter of a name is given in lowercase, the output of an existing
system changes drastically. This is one of the major drawbacks of existing systems.
Parts of Speech (POS) tagging does not work properly in these cases. This shows that
priority should be given to making the translation system intelligent enough to analyze
the sentences properly.

3 Methodology

The present work proposes a novel methodology for English to Bengali text
translation. An English text, sentence, or paragraph is taken as input, and the
system generates its appropriate Bengali meaning. First, the sentence is broken into
words, and the Parts of Speech (POS) Tagger retrieves the part of speech of each
word. Then, the words are clustered into three groups, i.e., Subject, Verb, and Object,
along with some other required parts (e.g., WH-words, exclamatory expressions, etc.). After
that, the parse tree is generated for the English text and converted into the parse tree of
the Bengali language by using different Bengali grammatical rules [19]. A separate
file is used as a database in which the English words and their respective Bengali meanings
are stored. After the syntactical structure of the sentence is judged, the appropriate
Bengali words are selected and used. Finally, the output of the system is generated
in the Bengali language. The proposed system is shown with the help of a block diagram
in Fig. 1.
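
The first stages described above (splitting into words, POS tagging, and clustering into Subject, Verb, and Object) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag table `TOY_TAGS`, the tag names, and the functions `tokenize`, `tag`, and `group` are hypothetical stand-ins, and a real system would use a trained POS tagger and a parse tree rather than a lookup table.

```python
import re

# Hypothetical toy tag table (an assumption for illustration only);
# the paper uses a POS tagger, whose tag set is not given here.
TOY_TAGS = {
    "i": "PRON", "am": "AUX", "going": "VERB", "to": "PREP", "school": "NOUN",
}

def tokenize(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", sentence)

def tag(tokens):
    """Pair each token with its tag; unknown words get the tag 'UNK'."""
    return [(tok, TOY_TAGS.get(tok.lower(), "UNK")) for tok in tokens]

def group(tagged):
    """Cluster tagged tokens into subject, verb, and object groups.

    Assumes a simple assertive sentence: tokens before the first
    AUX/VERB form the subject, the run of AUX/VERB tokens forms the
    verb group, and everything after it forms the object.
    """
    groups = {"subject": [], "verb": [], "object": []}
    state = "subject"
    for tok, pos in tagged:
        if state == "subject" and pos in ("AUX", "VERB"):
            state = "verb"
        elif state == "verb" and pos not in ("AUX", "VERB"):
            state = "object"
        groups[state].append(tok)
    return groups

print(group(tag(tokenize("I am going to school."))))
```

The lookup is case-insensitive on purpose, so that a word typed in lowercase still receives a tag; the dictionary file and the parse-tree conversion are outside the scope of this sketch.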
In the present work, the types of sentences taken as input are shown in Fig. 2.
Here, one assertive and one interrogative sentence are taken as examples to
demonstrate how the method works.
Assertive Sentence: First recognize the pattern of the input sentence.
In English, the pattern is: Sub + Verb + Obj
e.g., “I am going to school.”

I (sub) am going (verb) to school (obj)

Now, as per the Bengali grammar [19], reconstruct the sentence in the pattern: Sub + Obj + Verb

I (sub) school to (obj) am going (verb)

Finally, fetch the corresponding Bengali words.
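
The reordering step for the assertive example can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `to_bengali_order` and the preposition list are hypothetical, and the dictionary lookup that fetches the Bengali words is not shown.

```python
def to_bengali_order(subject, verb, obj, prepositions=("to", "in", "at", "from")):
    """Reorder Sub + Verb + Obj groups into Sub + Obj + Verb.

    A leading English preposition in the object group is moved to the
    end of that group, mirroring the "to school" -> "school to" step.
    The preposition list is an assumption made for this sketch.
    """
    obj = list(obj)
    if obj and obj[0].lower() in prepositions:
        obj = obj[1:] + obj[:1]  # rotate the preposition to the end
    return list(subject) + obj + list(verb)

words = to_bengali_order(["I"], ["am", "going"], ["to", "school"])
print(" ".join(words))  # I school to am going
```

Each reordered English word or group would then be replaced by its Bengali counterpart from the dictionary file.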


Interrogative Sentence: First recognize the pattern of the sentence.
e.g., “What is the capital of India?”
So, the pattern in English is: “wh” word + Obj + Sub

What (“wh” word) is the capital (obj) of India (sub)

Now the pattern is reconstructed as per the Bengali language.

So, the pattern in Bengali is: Sub + Obj + “wh” word
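
Both sentence patterns given in this section can be expressed as a small rule table that maps the English order of the groups to the Bengali order. The sketch below is a hedged illustration, not the system described in the paper: the names `RULES` and `reorder` and the labels `sub`, `verb`, `obj`, and `wh` are hypothetical, and the groups are assumed to have been identified already.

```python
# Each rule maps the English group order to the Bengali group order,
# following the two patterns stated in the text.
RULES = {
    "assertive": (("sub", "verb", "obj"), ("sub", "obj", "verb")),
    "interrogative": (("wh", "obj", "sub"), ("sub", "obj", "wh")),
}

def reorder(sentence_type, groups):
    """Rearrange groups from the English pattern into the Bengali pattern.

    `groups` must be listed in the English order for the given type.
    """
    source, target = RULES[sentence_type]
    if len(groups) != len(source):
        raise ValueError("expected %d groups, got %d" % (len(source), len(groups)))
    by_label = dict(zip(source, groups))
    return [by_label[label] for label in target]

print(reorder("interrogative", ["What", "is the capital", "of India"]))
# ['of India', 'is the capital', 'What']
```

Keeping the patterns in a table means that a further sentence type can be supported by adding one entry rather than writing a new function.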
