
2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N)

Machine Translation of Vedic Sanskrit using Deep Learning Algorithm
978-1-6654-7436-8/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICAC3N56670.2022.10074224

Mrinal Pandey, Rashmikiran Pandey, Alexey Nazarov
Department of Radio Engineering and Computer Technologies
Moscow Institute of Physics & Technology
Moscow, Russia
[email protected], [email protected]

Abstract—Machine translation using deep learning is a big leap in the arena of automatic translation. In this paper we propose a neural method for end-to-end translation. The model is designed with three encoder and three decoder layers of gated recurrent units (GRUs) followed by a dense layer. We further improve the translation by reversing the words of the source language using a bi-directional GRU. The translation from source language to target language achieves significantly better BLEU and WER scores than existing rule-based translation.

Keywords—NLP; Deep Learning; Encoder-Decoder; Ancient Language; Neural Machine Translation

I. INTRODUCTION

Knowledge extraction from ancient scripts is among the most challenging tasks in the field of Natural Language Processing. The importance of ancient languages is that they offer many techniques and open a new paradigm for understanding the technologies that were used in their era. Sanskrit is the most extensively studied ancient language in the world, and English is the most widely used modern language. After the introduction of deep learning, machine translation [1, 2] became a big leap in the field of cross-language translation. The biggest challenge in translating from Sanskrit to English is its divergence: the shlokas, or poems, in Sanskrit carry multiple meanings. Many ancient scriptures and scientific texts written in Sanskrit are yet to be translated; only a few, such as the Bhagavad Gita, the Mahabharata and the Ramayana, have been translated into English, manually, by scholars.

In the last decades researchers have nevertheless made many efforts in this direction to help with the task of translation. As [3] notes, "in every case there are no pre-specified constraints on the length sequences because the recurrent transformation is fixed and can be applied as many times as we like," which shows the effectiveness of the recurrent units of neural networks. The papers [4, 5] demonstrated the implementation of rule-based machine translation, and the authors applied example-based Sanskrit-to-English translation. Later in this development the author of [6] introduced an artificial neural network model for rule-based machine translation. Progress in the field of Sanskrit-to-English translation is further seen in [7], with the introduction of encoder-decoder based Neural Machine Translation.

In order to maintain an unbiased approach to language translation, we have constructed the model on fundamental decoding methods shared across the languages used. These procedures are based on mapping words that show similar meaning in context and minimizing the distance between them. Paper [16] demonstrates that this approach is quite promising for character-level recognition based on occurrences. At a deeper level we want to bridge the source language with a one-to-one destination-language correspondence, so that we can ensure each source-language term is mapped to its closest vocabulary term in the destination language.

Fig. 1 gives the basic flowchart of neural machine translation. In the first stage the data is gathered and pre-processed: it is cleaned using various techniques, for example filling in missing values, after which the source language is forwarded to Natural Language Understanding (NLU). In this stage the source language is mapped by adjusting the parameters so as to obtain a well-optimized structure for the model. The data is then transferred to Natural Language Generation (NLG), where meaningful statements are formed by combining words according to the distance vectors calculated in the previous step.

This stage is the most complex part of the learning process of any neural network; the complications arise from the mapping to numerical values and from minimizing the network's loss function. When the loss function is minimized we can conclude that the system is well optimized, and once the optimum is reached by running the model repeatedly during training, the test data is passed through the same trained network. The final section is dedicated to the destination language, where the optimized values passed through the model give the most concise results. The best thing about neural networks is that the optimization problem can easily be solved using modern algorithms and resources; this method is far faster than traditional human translation or rule-based machine translation.
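As a toy illustration of this train-then-test cycle (our own minimal Keras sketch on random token data, not the paper's model or corpus), the loss shrinks over epochs and held-out data is then passed through the trained network:

    # Toy illustration only: random integer "tokens" stand in for a real corpus.
    import numpy as np
    import tensorflow as tf

    x = np.random.randint(1, 100, size=(1000, 8))   # source token sequences
    y = np.random.randint(0, 100, size=(1000, 8))   # target token sequences

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(100, 32),
        tf.keras.layers.GRU(64, return_sequences=True),
        tf.keras.layers.Dense(100, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.fit(x, y, epochs=3, validation_split=0.1)  # training minimizes the loss
    model.evaluate(x[:100], y[:100])                 # "test" pass through the trained net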


Fig. 1. Natural Language Processing Flowchart.

In this paper we propose a model trained by feeding corpus-based translated data [8]. The framework is Keras as a front end on top of a TensorFlow back-end implementation. Solving the translation task with an encoder-decoder architecture built from gated recurrent units, as described in [9], overcomes the vanishing-gradient problem that is very common when the encoder-decoder architecture is built from long short-term memory units, as described in [10]. Upon comparison with the existing models [11, 12], our proposed model gives a 38.2% lower word error rate and a BLEU score 26 points more accurate than that [13] of traditional rule-based machine translation between English and Sanskrit.

II. RELATED WORK


In [2], English-to-French translation is performed on the WMT'14 dataset using a sequence-to-sequence, end-to-end deep learning approach. A multilayer long short-term memory (LSTM) network encodes the input language sequence into a fixed-size representation, and another LSTM decodes the output sequence. The BLEU score on the entire dataset was 34.8. The authors of [9] used a Recurrent Neural Network for translation of English to French, and additionally framed a phrase table collecting the standard phrases of the language for faster translation; they claim that the phrase-based RNN improves translation performance.

The machine translation model proposed in [1] likewise uses an encoder-decoder, where the encoder converts the input sentence into a fixed-size matrix which is then translated with the help of the decoder. The authors found that the fixed-length matrix is a bottleneck for translation performance, and hence incorporated a soft search procedure over the input sentence. The results show improved performance compared with existing phrase-based English-to-French translation. The authors of [14] presented a Neural Machine Translation system to overcome major challenges of learning-based translation, such as rare words. The model connects 8 encoder and 8 decoder layers, with parallelism for faster performance; the top layer of the encoder is connected to the bottom layer of the decoder, reducing training time. Additionally, low-precision arithmetic was employed in the computation to speed up performance, and for faster processing of rare words, the words are divided into sets of common sub-word units. The results produced by the model were promising, reducing errors by around 60%.

Moreover, the authors of [8] presented Itihasa, a large-scale corpus of around 93,000 Sanskrit shlokas with English translations, and used encoder-decoder based SMT and NMT systems to train and test models on it. The authors of [15] developed a procedure to translate the Sanskrit language into English using a reinforcement learning algorithm: a baseline translation model with encoder-decoder architecture is used first, and language models and agents are then incorporated to generate monolingual data. Two approaches are adopted in that work to translate the languages, the first a transfer-learning approach and the second a transformer translator. Additionally, the authors shared monolingual Sanskrit and English corpora with the community for further research; the datasets in the work are collected from different sources and modified as per the requirements.

III. DATASETS

We use a corpus-based Sanskrit-to-English translation setup in which the Bhagavad Gita and Itihasa are used as input training data [6]: Ramayana and Mahabharata shlokas and their translations, 93,000 pairs from Sanskrit to English [5].

Fig. 2. Overview of proposed methodology.
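As an illustration, a corpus of such shloka-translation pairs could be loaded and split as below; the file name and the tab-separated layout are our assumptions, not the format of the actual corpora, and the start/end markers anticipate Section IV:

    # Hypothetical corpus file: one "sloka<TAB>English translation" pair per line.
    import random

    pairs = []
    with open("sanskrit_english_pairs.tsv", encoding="utf-8") as f:
        for line in f:
            sanskrit, english = line.rstrip("\n").split("\t")
            # wrap the target side in the decoder's start/end markers (Section IV)
            pairs.append((sanskrit, "ssss " + english + " eeee"))

    random.seed(42)
    random.shuffle(pairs)
    split = int(0.9 * len(pairs))                  # 90/10 train-test split
    train_pairs, test_pairs = pairs[:split], pairs[split:]
    print(len(train_pairs), "training pairs;", len(test_pairs), "test pairs")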


IV. METHODOLOGY

Translation of text from the Sanskrit language (the widely spoken language of the Vedic era) into English is implemented using the deep learning model. For instance, we take as input "अहम् छात्रः अस्मि" and want our model to translate this text into the English language. The correct translation of this text is "I am a student." Hence we want our model to map from language A to language B. The overall structure of this deep neural network is what is called an encoder and a decoder. Both of these units are recurrent neural networks, connected by a thought vector as depicted in Fig. 3. The thought vector is the output of the encoder: an array of floating-point numbers between -1 and 1 which summarizes the content, meaning or intention of the input text.

Thereafter we use the thought vector as the initial state of the recurrent units at the decoder end of the neural network, and feed in the start marker "ssss", chosen because it does not exist in the dataset. Given this initial state, in which the thought vector summarizes the input text, and the start marker as input, we want the decoder to output "I". Feeding the same initial state together with the start marker and "I", we want the network to produce "am"; continuing in the same way, it produces "a" and then "student". Finally, when we input these four words, we receive the result together with the end marker "eeee", which likewise does not exist in the vocabulary of the datasets.
vocabulary for the datasets. tokenizer is applied outside of the neural network because
there is no need to do this every time we run the data through
the neural network so we just process the data once. The
neural network actually consists of the embedding layers and
then three layers of gated recurrent units. And we use this
instead of LSTM because we have to set the initial state for
the recurrent units in the decoder and the LSTM has actually
two internal states and this means that we would be forced to
take out the last internal state for the last recurrent unit here.
and the initial state of the decoder. When we use the gated
recurrent units it becomes easier thus we use either internal
state or we use the output. In any case we get a vector out
which has a size of internal state of the gated recurrent unit
and because it has only one recurrent state so we just need one
vector to initialize it.
Now what we find as output of the final layer of the recurrent
units is a sequence which is opposed to the encoder that
provides only one vector and not the sequence of vectors.
Because we need only one thought vector to summarize the
contents of the input text. However, now we want to generate
a sequence of words. If the internal state of the gated recurrent
units have for instance 256 elements and we have a
vocabulary and the destination language say, 10,000 words.
Then we somehow need to convert them into the vector of
Fig. 3. Flowchart of Language Translation Model these 256 elements to a number 1 and 10,000. One of the
methods of doing that is with a so called 1-hot encoded array
It is well aware that the recurrent units can not work in which we want to take output in a vector with 10,000
directly on the text data therefore we opt the two step elements and we take the index of highest elements and this is
process as described in Fig 2 to convert it into numbers the integer we want as an output. Therefore these dense layers
eventually to use on neural networks. The first step is map from a vector that is 256 elements long to a vector that is
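This greedy decoding loop can be sketched as follows (our illustration; `model` is the trained network sketched earlier, and `src_tok`/`dst_tok` are hypothetical names for the two fitted Keras tokenizers):

    import numpy as np

    def translate(sentence, model, src_tok, dst_tok, max_len=30):
        src = np.array(src_tok.texts_to_sequences([sentence]))   # words -> integers
        start, end = dst_tok.word_index["ssss"], dst_tok.word_index["eeee"]
        tokens = [start]
        for _ in range(max_len):
            # scores over the 10,000-word vocabulary for every output position
            probs = model.predict([src, np.array([tokens])], verbose=0)
            next_id = int(np.argmax(probs[0, -1]))   # index of the highest element
            if next_id == end:                       # stop at the end marker
                break
            tokens.append(next_id)
        # invert the tokenizer: integers back to destination-language words
        return " ".join(dst_tok.index_word[t] for t in tokens[1:])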


V. RESULTS

In order to evaluate the performance of our proposed model we calculated the following metrics.

A. BLEU Score

The BiLingual Evaluation Understudy (BLEU) is a method for scoring machine-translated textual data automatically and is given by the following formula:

\mathrm{BLEU} = \min\left(1, \frac{\text{output length}}{\text{reference length}}\right) \prod_{i=1}^{4} \mathrm{precision}_i \quad (1)

Fig. 4. Rule Based Machine Translation vs. proposed model translation.

Generally the score lies in the range 0 to 1, but it can be expressed as a percentage by multiplying by one hundred; a higher score means a better and more stable model. In our case the proposed model performs 26% better than traditional rule-based translation.

B. Word Error Rate

Another method for measuring the accuracy of machine translation of textual data is the word error rate, in which the machine translation is compared against a human reference translation. The lower the error, the better the model. The formula for the word error rate is

\mathrm{WER} = \frac{\text{substitutions} + \text{insertions} + \text{deletions}}{\text{reference length}} \quad (2)

In our case the proposed model gives 38.2% less error than traditional rule-based machine translation.
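Both metrics can be computed in a few lines; the sketch below is our own illustration and follows formulas (1) and (2) as printed (standard BLEU additionally takes the geometric mean, a 1/4 power, of the n-gram precisions):

    from collections import Counter

    def wer(reference, hypothesis):
        r, h = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between r[:i] and h[:j]
        dp = [[max(i, j) if i == 0 or j == 0 else 0
               for j in range(len(h) + 1)] for i in range(len(r) + 1)]
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                dp[i][j] = min(dp[i-1][j-1] + (r[i-1] != h[j-1]),  # substitution
                               dp[i-1][j] + 1,                     # deletion
                               dp[i][j-1] + 1)                     # insertion
        return dp[-1][-1] / len(r)                                 # formula (2)

    def bleu(reference, hypothesis):
        r, h = reference.split(), hypothesis.split()
        score = min(1.0, len(h) / len(r))             # brevity term of formula (1)
        for n in range(1, 5):                         # 1-gram .. 4-gram precision
            h_ng = Counter(tuple(h[i:i+n]) for i in range(len(h) - n + 1))
            r_ng = Counter(tuple(r[i:i+n]) for i in range(len(r) - n + 1))
            score *= sum((h_ng & r_ng).values()) / max(1, sum(h_ng.values()))
        return score

    print(wer("I am a student", "I am student"))     # 0.25 (one deletion, length 4)
    print(bleu("I am a student", "I am a student"))  # 1.0 (perfect match)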
Fig. 5. Rule Based Machine Translation vs. proposed model translation.

VI. CONCLUSION

In this paper we presented a basic implementation of the model commonly known as the encoder-decoder, using layers of gated Recurrent Neural Networks to implement machine translation of human languages. The model is demonstrated on a large dataset, a corpus-based Sanskrit-to-English translation system in which the Bhagavad Gita and Amarakosha are used as input training data. The accuracy of our model is arguably better than that of the existing models and can be improved by training on a greater variety of datasets from other ancient textual sources. However, we should also keep in mind that these machine translations are mapping functions that approximate sequences of integer tokens, because computers do not really understand human languages.

REFERENCES

[1] Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (2015).
[2] Sutskever, I., Vinyals, O., and Le, Q. V. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (2014), pp. 3104-3112.
[3] A. Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks," https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (last accessed 27.06.2022).
[4] Vimal Mishra and R. B. Mishra. 2008. Study of example based English to Sanskrit machine translation. Polibits, (37):43-54.
[5] V. K. Gupta, N. Tapaswi, and S. Jain. 2013. Knowledge representation of grammatical constructs of Sanskrit language using rule based Sanskrit language to English language machine translation. In 2013 International Conference on Advances in Technology and Engineering (ICATE), pages 1-5.
[6] Vimal Mishra and R. B. Mishra. 2010. Approach of English to Sanskrit machine translation based on case based reasoning, artificial neural networks and translation rules. International Journal of Knowledge Engineering and Soft Data Paradigms, 2(4):328-348.
[7] Nimrita Koul and Sunilkumar S. Manvi. 2019. A proposed model for neural machine translation of Sanskrit into English. International Journal of Information Technology, pages 1-7.
[8] Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, and Anders Søgaard. 2021. Itihasa: A large-scale corpus for Sanskrit to English translation. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 191-197, Online. Association for Computational Linguistics.
[9] Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
[10] Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).
[11] Zhou, J., Cao, Y., Wang, X., Li, P., and Xu, W. Deep recurrent models with fast-forward connections for neural machine translation. CoRR abs/1606.04199 (2016).
[12] Luong, M.-T., Sutskever, I., Le, Q. V., Vinyals, O., and Zaremba, W. Addressing the rare word problem in neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (2015).
[13] Buck, C., Heafield, K., and Van Ooyen, B. N-gram counts and language models from the Common Crawl. In LREC (2014), vol. 2, Citeseer, p. 4.
[14] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016).
[15] Ravneet Punia, Aditya Sharma, Sarthak Pruthi, and Minni Jain. "Improving Neural Machine Translation for Sanskrit-English." International Conference on Natural Language Processing, pages 234-238, 2020.
[16] Kak, Subhash C. "The Paninian approach to natural language processing." International Journal of Approximate Reasoning 1.1 (1987).
