Paper 93-BERT Model Based Natural Language To NoSQL Query Conversion
Article in International Journal of Advanced Computer Science and Applications · January 2023
DOI: 10.14569/IJACSA.2023.0140293
All content following this page was uploaded by Md Ashraf Uddin on 08 March 2023.
Kazi Mojammel Hossen1 , Mohammed Nasir Uddin2 , Minhazul Arefin3 , Md Ashraf Uddin4
Department of CSE, Jagannath University
Dhaka, Bangladesh1,2,3,4
Abstract—Databases are commonly used to store complex and distinct information. With the advancement of database systems, non-relational databases have been used to store vast amounts of data, as traditional databases are not sufficient for querying a wide range of massive data. However, storing data in a database is always challenging for non-expert users. We propose a conversion technique that enables non-expert users to access and filter data from a NoSQL database in a form as close to human language as possible. Researchers have already explored a variety of technologies in order to develop more precise conversion procedures. This paper proposes a generic NoSQL query conversion learning method to generate a Non-Structured Query Language query from natural language. The proposed system includes natural language processing-based text preprocessing and the Levenshtein distance algorithm to extract the collection and attributes even if there are spelling errors. The analysis of the results shows that our suggested approach is more efficient and accurate than other state-of-the-art methods in terms of Bilingual Evaluation Understudy (BLEU) scoring on the WikiSQL dataset. Additionally, the proposed method outperforms the existing approaches because it utilizes a Bidirectional Encoder Representations from Transformers (BERT) multi-text classifier. The classifier extracts database operations, which might increase the accuracy. The model achieves state-of-the-art performance on WikiSQL, obtaining 88.76% average accuracy.

Keywords—Natural language processing; NoSQL query; BERT model; Levenshtein distance algorithm; artificial neural network

I. INTRODUCTION

In today’s digital age, non-relational databases are utilized in almost every industry to store information. Non-Structured Query Language (NoSQL) databases [1], [2] are increasingly being used for large-scale data sets, search engines, and real-time web applications [3]. Nowadays, NoSQL databases work as an alternative to relational databases [4] and other conventional databases [5].

With the growth of technology, NoSQL databases store a large amount of data in document stores, key-value data stores, wide-column stores, and graph stores. As opposed to relational databases, MongoDB, CouchDB, Cassandra, etc. are designed on the architecture of distributed systems to store massive data [6]. Many organizations are gradually looking into approaches to understand and analyze this enormous unstructured data. The current approaches to data management, organization, and storage are being changed by “Big Data” [7]. In particular, “Big Data” involves open-source frameworks used to store vast amounts of structured, unstructured, and semi-structured data [8]. Consequently, ordinary users require knowledge of the query syntax and table schema to access and store a large amount of data. However, finding a reliable approach to generate the NoSQL query from natural language (English) is challenging. Using the proposed NoSQL approach, novice users can interact with the database system. The model facilitates communication between humans and computers without requiring users to recall the query syntax of non-relational databases. Natural Language Processing (NLP) [9], [10], [11] is a branch of linguistics, information engineering, computer science, and artificial intelligence that studies how computers and humans interact through natural language [12]. Traditional machine translation is applied in NLP to translate text from one language to another [13].

This research aims to develop a feasible tool for searching databases where natural language can be used without needing the complex database queries that are developed by experts. Generating NoSQL from natural language has a wide range of applications. Tools with AI knowledge [14] such as Google Assistant or Alexa use the NLIDB system for non-technical users. Filling out a lengthy online form can be tedious, and users might need to navigate through the screen, scroll, look up values in the scroll box, and so on, whereas with NLIDB the users only need to type a question in the form of a sentence. Consequently, such a tool has a wide range of usage and applications. The NoSQL approach has been researched both in academia and in industry [15]. In this paper, we implement a neural machine translation model which consists of four steps. First, we use the Natural Language Toolkit to perform text preprocessing. Secondly, the collection and attributes are extracted using the Levenshtein Distance (LD) [16], [17] algorithm. Thirdly, we use a Bidirectional Encoder Representations from Transformers (BERT) model-based multi-text classification [18] to extract the operations, including find, insert, update, and remove. The last step of the proposed approach is generating the query.

Many research works have used the WIKISQL dataset for the conversion of natural language to Structured Query Language. The BERT model generates the NoSQL operational command from the WIKISQL task. The contributions of this research paper can be summarized as follows:

• Designing several algorithms to come up with a standard machine translation model for converting natural language into NoSQL queries.

• Resolving the syntax errors of novice users with the Levenshtein Distance algorithm, which can extract the collection and attributes from the text even if users make spelling mistakes or use synonyms.

• Employing the latest contextual word representation model, the BERT transformer, to extract the operations with a higher accuracy rate.
www.ijacsa.thesai.org 810 | P a g e
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 14, No. 2, 2023
The remainder of the paper is organized as follows: Related works conducted with the same and different technologies by other researchers are presented in Section II. Section III describes the proposed methodology and workflow. Section IV shows the experimental evaluation and results of the proposed system. Conclusions and future extensions are detailed in Section V.

II. RELATED WORK

Research on natural language interfaces for databases started as far back as the twentieth century. Since then, interest in Natural Language Processing has grown tremendously. In the early 1970s, LUNAR [19], the first Natural Language Interface for a relational database (NLIDB), was introduced to researchers. LUNAR was a Question Answering (QA) system connected to the moon rock sample database. The information on rock samples brought back from the moon was used to build the LUNAR database. The field of NLP to NoSQL query conversion has seen very little research. This section discusses various works on natural language to query conversion.

In 2021, Minhazul et al. [20] suggested a machine learning-based NLP2SQL translation system. They used the Naive Bayes algorithm for command extraction and decision tree regression for condition extraction. Their proposed method lacks accuracy because it uses the bag-of-words technique in the derivation of the condition from SQL. An advanced deep learning solution can mitigate this problem. Alternatively, they could use a neural translation technique for this machine translation approach. The system could also use a statistical translation method.

Mallikarjun et al. [21] proposed an automated NLP-based text processing approach. Their approach can successfully convert an Excel datasheet into a DBMS. Their system has a user authentication system that keeps out unwanted users. The system has a limitation of 16,384 columns and 1,048,576 rows for an Excel worksheet. This amount of data may be massive for average purposes but is not enough for big data.

An intelligent processing system for a document-based NoSQL database was proposed by Benymol et al. [22] in 2021. They used state-of-the-art algorithms and technologies to convert text into NoSQL. They used different types of TF-IDF schemes for information retrieval, machine learning algorithms for modeling, and hyperparameter tuning for model selection. The system may be vulnerable when handling stream and batch data on a Big Data processing platform. The proposed model also has a problem with dynamic processing strategies. At this stage, the system fails to find any possible solution.

Fatma et al. [23] proposed an automatic UML/OCL model for the NoSQL database converter. Their system mainly focuses on the big data platform, because NoSQL databases are widely used there. After creating the NoSQL database, the system automatically checks the OCL constraints of the model. There are different types of NoSQL databases, and most of them have a problem with integrity constraint checking. For this reason, it is the most challenging task in the system.

In [24], M. T. Majeed et al. designed a fully automated framework that, using an AI technique, can recognize keywords, symbols, attributes, values, and relationships among various types of queries. Additionally, they proposed a graphical user interface where users could enter NL queries and have a NoSQL query created from those queries. For complex queries, the proposed framework did not offer a solution.

S. Mondal et al. [25] introduced a query-response model that can respond to a variety of queries, including assertive, interrogative, imperative, compound, and complex forms. This NoSQL system’s primary task is to retrieve knowledge data from the default MongoDB database. The paper did not give any solution for time-related queries such as “What is the age of x after 10 years”.

T. Pradeep et al. [26] presented a deep learning-based approach that converts English questions to MongoDB queries. They applied an encoder-decoder machine translation method for this conversion. The encoder turns the NLQ text input into a vector and sends it to the decoder. The decoder uses a deep neural network to predict NoSQL queries. Their system uses ten different deep learning models to handle ten types of MongoDB queries. A single unified model would be a better solution to this problem.

Sebastian Blank et al. [27] suggested an end-to-end Question Answering (QA) system. It allows a user to ask a question in natural language on the Elasticsearch database. They solve the homogeneous operation problem of the database by using policy-based reinforcement learning. For that, they used Facebook’s bAbI Movie Dialog dataset. They also designed KBQueryBot, an agent for translating a natural language query into the domain-specific query language based on a sequence-to-sequence model [28]. It gives every single answer with the help of an external knowledge base.

Some classic NLIDB systems can correct misspelled words automatically [29]. The module provides the interface between computer and user through the database query language. The authors discuss the overall system architecture of the NLIDB, some implementation details, and experimental results. The proposed work only focuses on automatic spelling and grammar correction.

Z. Farooqui et al. [30] recommended the conversion of English to SQL. Their system converts English questions or text queries into SQL queries, which are later executed on databases. Their suggested technique and method are generic and smooth. It can handle both small and large applications for generic NLIDB systems. There are four types of input NLQ text: Normal, Linear Disjoint, Linear Coincident, and Non-Linear Model. It focuses on simple SQL query clauses such as SELECT, FROM, WHERE, and JOIN. Their system can handle complex queries resulting from ambiguous NL queries.

Tanzim Mahmud et al. [31] proposed a system based on Context-Free Grammar (CFG). Any input token of appropriate terminals found in the input NLQ replaces the corresponding attribute in the relational table or the applicable operators of SQL. The interface can be configured easily and automatically by the user. It relies on the Metadata set and Semantic sets for tables and attributes. It can handle ambiguities in the input NLQ. For example, the system can solve the problem of clashing attribute names within a table. The limitation of the
Algorithm 3: Attribute Extraction
Input: W = List of Attributes from Database; C = List of Collection name from Database; S = Set of Similar Words
Output: A = Attributes Name; B = Table Name
t = CountWord(T)
for i ← 1 to t do
    for j ∈ S do
        LD_THRESHOLD = 1
        THRESHOLD = LD-Algorithm(S[j], W[j])
        if LD_THRESHOLD > THRESHOLD then
            PUSH(A[i], W[j])
            PUSH(B[i], C[i])
        end
    end
end

between the first m characters of a and the first n characters of b.

C. Operation Extraction

Operation extraction is a particular solution that uses the BERT model to extract operations from natural language queries. In this approach, we use the BERT model for classifying the specific operation. In machine learning, classification is the problem of identifying which of a set of categories an observation belongs to, on the basis of a training set of data containing observations (or instances) whose category membership is known [36]. A classification model tries to make some inferences from the observed data. To predict one or more outcomes, one or more data items are provided as inputs to the classification model.

BERT employs a technique known as the Masked Language Model (MLM), in which it masks words in the sentence at random and then attempts to predict them. It does not use the common left-to-right or right-to-left sequential language models. Instead, it is trained bidirectionally, which gives the model a deeper sense of language context. BERT is applied in two stages:

• Pre-training BERT to understand language.
• Fine-tuning BERT to learn a specific task.

BERT depends on a Transformer (a self-attention mechanism that learns contextual relationships between words in a text). A simple Transformer consists of an encoder that reads the text input and a decoder that generates a task prediction. Since the goal of BERT is to generate a language representation model, it only requires the encoder part. There are two main models of BERT:

• BERT base has 12 transformer blocks, a hidden size of 768, 12 attention heads, and 110M parameters.
• BERT large has 24 transformer blocks, a hidden size of 1024, 16 attention heads, and 340M parameters.

In this paper, we used the BERT base model, which has enough pre-trained data to help bridge the gap in data. The
model for operation extraction is shown in Fig. 2. Given the input text, the model tokenizes the text using the BERT tokenizer and then generates the input masks with the input IDs of the sentence. The tokenizer uses WordPiece [37], which splits a token like “going” into “go” and “ing.” This mainly serves to cover a broad spectrum of Out-Of-Vocabulary (OOV) words. After tokenization, the tokenized output goes as input to the classification model. We used a neural network for classification to get the highest accuracy. After classification, we get the operation as output. Here we consider four types of operations: FIND, INSERT, UPDATE, and REMOVE.

D. Build Syntax Tree & Generate Query

Query column, which represents the natural language query, and (2) Operations column. The description of the dataset is given in Table III.

TABLE III. DESCRIPTION OF DATASET

Dataset                               WIKISQL
Language                              NLQ
Total number of cases                 80,654
Length of the text (average)          61.09
Word count of the text (average)      11.66
Granularity of text description       line
Number of validation text             8,421
Number of test cases (total)          15,878
Number of train cases (total)         56,355
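Returning to the attribute extraction step (Algorithm 3), the Levenshtein-based matching it describes can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names `levenshtein` and `extract_attributes`, the sample tokens and attribute names, and the rule of accepting a match when the distance is at most `max_distance` are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute, each cost 1)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def extract_attributes(tokens, attributes, max_distance=1):
    """Map each query token to the closest known attribute name.

    A token is kept only when its closest attribute is within max_distance
    edits (an assumed acceptance rule, chosen for illustration).
    """
    matches = []
    for token in tokens:
        scored = [(levenshtein(token.lower(), name.lower()), name)
                  for name in attributes]
        distance, best = min(scored)
        if distance <= max_distance:
            matches.append(best)
    return matches


# Hypothetical schema and a query containing two misspellings.
schema_attributes = ["name", "age", "student"]
query_tokens = ["find", "nme", "of", "studnt"]
print(extract_attributes(query_tokens, schema_attributes))
```

The dynamic-programming table is reduced to two rows, so memory grows with the length of the shorter loop dimension only; the running time is proportional to the product of the two string lengths.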
variable contains the embedding vectors of all of the tokens in a sequence, and the second variable contains the embedding vector of the [CLS] token. We then pass this variable into a linear layer with a ReLU activation function. We have a vector of size 4 at the end of the linear layer, each element of which corresponds to a category of our labels (find, insert, update, and remove). We use Adam as the optimizer and train the model for 10 epochs. Because we are dealing with multi-class classification, we use categorical cross-entropy as our loss function. Fig. 8 depicts the operation. For example:

Fig. 8. Model building

The model improves the accuracy rate for classification over the previous model. For the classification task, the model achieves 81.45% average class detection relative to previous research. One of the reasons is that BERT uses a pre-trained model based on transfer learning, which can be tuned on data for a specific NoSQL language. Fig. 7 illustrates the accuracy rate of the four types of operations separately.

G. Model Accuracy

Accuracy evaluates how well our model’s predictions compare with the original values. With low accuracy and high loss, the model makes huge mistakes on most of the data. When both loss and accuracy are low, the model makes small errors on most of the data. However, if they are both high, it makes huge mistakes on some of the data. The ideal scenario for any model is high accuracy and low loss. Fig. 9 illustrates the accuracy of the proposed model.

H. Model Loss

Loss is the total of our model’s errors. It evaluates how well (or how badly) our model does. When there are a lot of mistakes, the loss is high and the model does not work properly. The better our model works, the lower the loss. However, the main conclusion we can draw from a single loss value is only whether it is high or low. If we plot the loss over time, we can evaluate whether and how quickly our model is learning. This is because the loss function is used by the model for learning. This takes the shape of approaches like gradient descent, which modify the parameters of the model using information from the loss outcome. Fig. 10 illustrates the loss of the proposed model.
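To make the four-way operation classifier and its categorical cross-entropy loss more concrete, the following sketch computes the loss for a single example from a size-4 output vector. It is a minimal, dependency-free illustration under stated assumptions: the logit values are invented, the softmax normalization is assumed to sit between the linear layer and the loss, and the helper names `softmax`, `categorical_cross_entropy`, and `predict` are hypothetical rather than taken from the paper.

```python
import math

# The four operation labels used in the paper.
LABELS = ["find", "insert", "update", "remove"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(logits, true_label):
    """Negative log-probability assigned to the correct label."""
    probs = softmax(logits)
    return -math.log(probs[LABELS.index(true_label)])


def predict(logits):
    """Return the label with the highest score."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best_index]


# Invented output of the final linear layer for one query.
example_logits = [0.1, 2.0, -1.0, 0.3]
print(predict(example_logits))
print(categorical_cross_entropy(example_logits, "insert"))
print(categorical_cross_entropy(example_logits, "remove"))
```

The loss is small when the highest score belongs to the true label and grows as probability mass moves to the wrong labels, which is the signal an optimizer such as Adam uses to update the parameters.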
Fig. 10. Model loss

I. Output

In the output, we get the collection and attribute name, such as all student and name. From the operation extraction, we get the find operation, and then we concatenate all the extraction outputs part by part to generate a NoSQL query. For example:

We have classified the wrong outputs into two categories: (a) sometimes, the query contains an incomplete logical expression in the condition part; (b) the query is incorrect. Analysis of the conversion results reveals the following:

• Observing all the NoSQL output, we can see that the suggested model can work with natural language queries of different lengths. After a successful NoSQL query output, the number of input and output tokens might differ. The accuracy of the proposed model does not depend on the length of the query.

• The BERT model successfully predicts the operation using a pre-trained model. It also tunes the NoSQL command for input text of varying size.

• The BERT model can process a large amount of data. The WIKISQL dataset covers different types of query statements, so there is no problem for the BERT model to work with the WIKISQL dataset.

• The BERT model understands the semantic relationship between natural language and NoSQL queries. As a result, the decoder output is logically correct for most queries.

• The model can generate “contextualized” word embeddings, but it is compute-intensive at inference time and needs to compute the vectors every time.

• In collection and attribute extraction, we use the Levenshtein Distance algorithm. The algorithm can extract attributes from natural language queries and also check for spelling errors. The run time complexity of this algorithm is lower than $O(n^2)$.

Test results, translated into the NoSQL syntax, are shown in Table V. The test data contains the natural

J. Evaluation Setup

In this paper, we evaluate the results on our dataset using three notions of query synthesis accuracy.

• Normal form accuracy is the accuracy for NoSQL queries that have no attributes. We compare the synthesized NoSQL query with the ground truth to verify whether they match each other.

• Logical form accuracy is the accuracy for NoSQL queries that have attributes or any logical expression in the query.

• Query match is the accuracy of the comparison with the original query for find, insert, update, and remove operations. We use a canonical representation of the synthesized NoSQL query and the ground truth to determine whether two NoSQL queries are identical.

We also compute the F1-score for operation extraction, which combines the precision and recall values. Finally, we present the comparison of our model with previous work on NoSQL conversion tasks. Our model is implemented in Python [38].

The F1-score measures the accuracy of the operation (find, insert, update, remove) by applying the precision and recall values of the test. This test looks at whether the system can process the sentences entered by the user so that the operation can be measured accurately with the F1-score method. Table VI shows the accuracy values. The equations for precision, recall, and F1-score are given below:

• Precision: the true positive relevance rate, defined as the ratio $\frac{tp}{tp + fp}$, where $fp$ is the number of false positives;

• Recall: the true positive rate, defined as the ratio $\frac{tp}{tp + fn}$, where $tp$ and $fn$ are the numbers of true positives and false negatives, respectively;

• F1-score: the harmonic mean of precision and recall, defined as the ratio $\frac{2 \cdot (precision \cdot recall)}{precision + recall}$.

Next, we compute the accuracy of normal and logical forms. Let $X$ be the total number of queries in our dataset and $X_{ex}$ the number of correctly executed queries. We evaluate every clause (find, insert, update, and remove) using the accuracy metric $Acc_{nf} = \frac{X_{ex}}{X}$ for normal form and $Acc_{lf} = \frac{X_{ex}}{X}$ for logical form. Table VII shows the accuracy of normal and logical queries. After that, the overall result is evaluated with BLEU (Bilingual Evaluation Understudy), which was developed to evaluate machine translation systems.

K. Result

The article presents an efficient approach to transform a natural language query into a NoSQL query.
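The operation-level metrics defined in the Evaluation Setup (precision, recall, and F1-score) and the query-level accuracy ratio can be sketched in a few lines of Python. This is an illustrative sketch only: the label lists are invented, the function names `precision_recall_f1` and `query_accuracy` are hypothetical, and exact string equality is assumed as the matching rule for queries.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one operation label (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def query_accuracy(predicted, reference):
    """Fraction of generated queries that equal their reference query."""
    matched = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matched / len(reference)


# Invented gold and predicted operations for five queries.
gold = ["find", "find", "insert", "update", "find"]
pred = ["find", "insert", "insert", "update", "find"]
for operation in ["find", "insert", "update", "remove"]:
    print(operation, precision_recall_f1(gold, pred, operation))
```

Computing the scores per label and then averaging them is one common way to summarize a four-class classifier; whether the reported values are macro- or micro-averaged is not stated in the text, so the sketch leaves the per-label values unaggregated.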
TABLE V. THE ACCURACY FOR CONVERTING NATURAL LANGUAGE INTO NON-STRUCTURED QUERY LANGUAGE

Bilingual Evaluation Understudy (BLEU) is a score for comparing a candidate translation of the NoSQL query to one or more reference translations. Kishore Papineni et al. [39] proposed this score in 2002 to assess the accuracy of automatic machine translation systems. We used the BLEU score to evaluate the output.

BLEU is not perfect, but it offers several benefits: it is quick and easy to calculate, language-independent, highly correlated with human judgment, and widely used.

$$P = \frac{m}{w_t} \tag{1}$$

where $m$ is the number of tokens from the candidate query that are found in the reference query, and $w_t$ is the total number of tokens in the candidate query. The accuracy is calculated using Equation (2).

$$\text{Accuracy} = P \times 100\% \tag{2}$$

Table VIII presents the BLEU-based accuracy for predicting the correct NoSQL query. Using the reshaped WikiSQL dataset, the proposed model is compared with other existing models. Fig. 11 illustrates the three models’ estimated efficiency and error rates. It demonstrates that the accuracy of converting the natural language query into the non-structured query language (88.76%) is better than, or at least competitive with, the earlier results.
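Equations (1) and (2) can be illustrated with a short Python sketch that scores one candidate query against one reference query. Several details are assumptions made for the example: tokens are obtained by splitting on whitespace, each reference token can be matched at most as many times as it occurs (the clipping used in BLEU's modified precision), the function name `token_precision` is hypothetical, and the two MongoDB-style query strings are invented.

```python
from collections import Counter


def token_precision(candidate: str, reference: str) -> float:
    """P = m / w_t: share of candidate tokens that also occur in the reference."""
    candidate_tokens = candidate.split()
    if not candidate_tokens:
        return 0.0
    reference_counts = Counter(reference.split())
    # m: matched tokens, clipped so a reference token is not counted
    # more often than it actually appears in the reference.
    matched = sum(min(count, reference_counts[token])
                  for token, count in Counter(candidate_tokens).items())
    return matched / len(candidate_tokens)


# Invented queries that differ in a single value token.
candidate_query = "db.student.find ( { age : 20 } )"
reference_query = "db.student.find ( { age : 21 } )"
p = token_precision(candidate_query, reference_query)
print(f"P = {p:.3f}, Accuracy = {p * 100:.1f}%")
```

Full BLEU additionally combines precisions over longer n-grams and applies a brevity penalty; the sketch covers only the unigram ratio that Equation (1) describes.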
to non-relational query conversion. Initially, the (English) text is preprocessed with NLTK; the LD algorithm is then used for collection and attribute extraction, the BERT model for operation extraction, and finally the query is generated. Our model can generate queries for Find, Insert, Update, and Remove clauses with an average accuracy of 88.76%. In the future, we intend to handle more complex NoSQL queries, such as logical function queries, using other incentive mechanisms for better performance.

ACKNOWLEDGMENT OF FUNDING

This work was supported by the UGC Jagannath University Research Branch, Dhaka, Bangladesh, under JnU/research/rp/2020-21/science/44.

REFERENCES

[1] J. Han, E. Haihong, G. Le, and J. Du, “Survey on nosql database,” in 2011 6th International Conference on Pervasive Computing and Applications. IEEE, 2011, pp. 363–366.
[2] A. Nayak, A. Poriya, and D. Poojary, “Type of nosql databases and its comparison with relational databases,” International Journal of Applied Information Systems, vol. 5, no. 4, pp. 16–19, 2013.
[3] R. S. Al Mahruqi, “Migrating web applications from sql to nosql databases,” Ph.D. dissertation, Queen’s University (Canada), 2020.
[4] S. Batra, C. Tyagi, “Comparative Analysis of Relational And Graph Databases”, IJSCE, vol. 2(2), pp. 509-512, 2012.
[5] R. Alexander, P. Rukshan, and S. Mahesan, “Natural language web interface for database (nlwidb),” arXiv preprint arXiv:1308.3830, 2013.
[6] Z. Wei-ping, L. Ming-xin and C. Huan, “Using MongoDB to implement textbook management system instead of MySQL”, IEEE-ICCSN, 2011, pp. 303-305.
[7] P. Chen, C. Xhang, “Data-intensive applications, challenges, techniques and technologies: A survey on Big Data”, Information Sciences, Elsevier, vol. 275, pp. 314–347, 2014.
[8] B. Jose, S. Abraham, Unstructured Data Mining for Customer Relationship Management: A Survey, International Journal of Management, Technology And Engineering 8, Issue 7. ISSN NO (2018) 2249–7455.
[9] O. Ferschke, J. Daxenberger, and I. Gurevych, “A survey of nlp methods and resources for analyzing the collaborative writing process in wikipedia,” in The People’s Web Meets NLP. Springer, 2013, pp. 121–160.
[10] Garrido-Muñoz, A. Montejo-Ráez, F. Martínez-Santiago, and L. A. Ureña-López, “A survey on bias in deep nlp,” Applied Sciences, vol. 11, no. 7, p. 3184, 2021.
[11] S. Srivastava, A. Shukla, and R. Tiwari, “Machine translation: from statistical to modern deep-learning practices,” arXiv preprint arXiv:1812.04238, 2018.
[12] Kłosowski, P. (2018). Deep learning for natural language processing and language modelling. In 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), September 2018, pp. 223-228. IEEE. 10.23919/SPA.2018.8563389
[13] U. K. Acharjee, M. Arefin, K. M. Hossen, M. N. Uddin, M. A. Uddin, and L. Islam, “Sequence-to-sequence learning-based conversion of pseudo-code to source code using neural translation approach,” IEEE Access, vol. 10, pp. 26730–26742, 2022.
[14] B. Jose and S. Abraham, “Intelligent processing of unstructured textual data in document based nosql databases,” Materials Today: Proceedings, 2021.
[15] N. Yaghmazadeh, X. Wang, and I. Dillig, “Automated migration of hierarchical data to relational tables using programming-by-example,” Proceedings of the VLDB Endowment, vol. 11, no. 5, pp. 580–593, 2018.
[16] S. Zhang, Y. Hu, and G. Bian, “Research on string similarity algorithm based on levenshtein distance,” in 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). IEEE, 2017, pp. 2247–2251.
[17] T. Ho, S.-R. Oh, and H. Kim, “A parallel approximate string matching under levenshtein distance on graphics processing units using warp-shuffle operations,” PloS one, vol. 12, no. 10, p. e0186251, 2017.
[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” CoRR, vol. abs/1810.04805, 2018.
[19] Woods, W. A. 1973. Progress in natural language understanding: An application to LUNAR geology. AFIPS Natl. Computer. Conf. Expo. Conference Proc. 42, 441-450.
[20] M. Arefin, K. M. Hossen, and M. N. Uddin, “Natural language query to sql conversion using machine learning approach,” in 2021 3rd International Conference on Sustainable Technologies for Industry 4.0 (STI). IEEE, 2021, pp. 1–6.
[21] B. Mallikarjun, K. Annapoorneshwari, M. Yadav, L. R. Rakesh, and S. Suhaas, “Intelligent automated text processing system-an nlp based approach,” in 2020 5th International Conference on Communication and Electronics Systems (ICCES). IEEE, 2020, pp. 1026–1030.
[22] B. Jose and S. Abraham, “Intelligent processing of unstructured textual data in document based nosql databases,” Materials Today: Proceedings, 2021.
[23] F. Abdelhedi, A. A. Brahim, and G. Zurfluh, “Ocl constraints checking on nosql systems through an mda-based approach,” International Journal of Data Warehousing and Mining (IJDWM), vol. 17, no. 1, pp. 1–14, 2021.
[24] M. T. Majeed, M. Ahmad, and M. Khalid, “Automated xquery generation for nosql,” in 2016 Sixth International Conference on Innovative Computing Technology (INTECH). IEEE, 2016, pp. 507–512.
[25] S. Mondal, P. Mukherjee, B. Chakraborty, and R. Bashar, “Natural language query to nosql generation using query-response model,” in 2019 International Conference on Machine Learning and Data Engineering (iCMLDE). IEEE, 2019, pp. 85–90.
[26] T, Pradeep and P C, Rafeeque and Murali, Reena, Natural Language To NoSQL Query Conversion using Deep Learning (August 13, 2019). In proceedings of the International Conference on Systems, Energy & Environment (ICSEE) 2019, GCE Kannur, Kerala, July 2019, Available at SSRN: https://ssrn.com/abstract=3436631 or http://dx.doi.org/10.2139/ssrn.3436631
[27] S. Blank, F. Wilhelm, H.-P. Zorn, and A. Rettinger, “Querying nosql with deep learning to answer natural language questions,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 9416–9421.
[28] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” Advances in Neural Information Processing Systems, vol. 27, 2014.
[29] M. D. Gadekar, B. M. Jadhav, A. S. Shaikh, and R. B. Kokare, “Natural language (english) to mongodb interface,” International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 4, no. 3, 2015.
[30] P. Anand and Z. Farooqui, “Rule based domain specific semantic analysis for natural language interface for database,” International Journal of Computer Applications, vol. 164, no. 11, 2017.
[31] T. Mahmud, K. A. Hasan, M. Ahmed, and T. H. C. Chak, “A rule based approach for nlp based query processing,” in 2015 2nd International Conference on Electrical Information and Communication Technologies (EICT), pp. 78–82, IEEE, 2015.
[32] X. Xu, C. Liu, and D. Song, “Sqlnet: Generating structured queries from natural language without reinforcement learning,” arXiv preprint arXiv:1711.04436, 2017.
[33] J. Bornholt, E. Torlak, D. Grossman, and L. Ceze, “Optimizing synthesis with metasketches,” in Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016, pp. 775–788.
[34] V. Zhong, C. Xiong, and R. Socher, “Seq2sql: Generating structured queries from natural language using reinforcement learning,” arXiv preprint arXiv:1709.00103, 2017.
[35] K. Guu, P. Pasupat, E. Z. Liu, and P. Liang, “From language to programs: Bridging reinforcement learning and maximum marginal likelihood,” arXiv preprint arXiv:1704.07926, 2017.
[36] G. B. Boullanger and M. Dumonal, “Search like a human: Neural machine translation for database search,” Technical report, Tech. Rep., 2019.
[37] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016.
[38] M. K. Chakravarthy and S. Gowri, “Interfacing advanced nosql database with python for internet of things and big data analytics,” Materials Today: Proceedings, 2021.
[39] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318, Association for Computational Linguistics, 2002.