Received 10 January 2024, accepted 17 February 2024, date of publication 20 February 2024, date of current version 29 February 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3368382

Advanced NLP Models for Technical University Information Chatbots: Development and Comparative Analysis

GIRIJA ATTIGERI (Member, IEEE), ANKIT AGRAWAL, AND SUCHETA V. KOLEKAR (Member, IEEE)
Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India
Corresponding author: Sucheta V. Kolekar ([email protected])

ABSTRACT Achieving quality education, one of the defined sustainable goals, requires providing information about the education system according to stakeholders' requirements. Obtaining information about a university or institute is a critical stage in the academic journey of prospective students, who seek details about the specific courses that make that university or institute unique. The process begins with exploring general information about universities through websites, rankings, and brochures from various sources. Information from different sources is often inconsistent, and these discrepancies influence students' decisions. By addressing inquiries promptly and providing valuable information, universities can guide individuals in making informed choices about their academic future. To address this, a chatbot application is the most effective tool to implement and deploy on the university's website. A chatbot is an artificially intelligent tool that can interact with humans and mimic a conversation. It can be implemented using advanced Natural Language Processing (NLP) models to provide pre-defined answers to students' queries. A chatbot is very helpful for query resolution during the institute's counseling process because it provides official, uniform information and can be accessed 24 × 7. The aim of this research work was therefore to implement a chatbot using various NLP models and compare them to identify the best one. In this work, five chatbot models were implemented using neural networks, TF-IDF vectorization, sequential modeling, and pattern matching. The results showed that the neural network-based models had better accuracy than the TF-IDF and pattern matching models, and that sequential modeling is the most accurate model because it prevents over-fitting. Furthermore, adding any kind of optimizer to a chatbot can improve its results, and pattern matching and semantic analysis should be part of a chatbot for real-time scenarios.

INDEX TERMS Conversational AI, natural language processing, artificial intelligence, chatbots, neural
networks, sequential modeling, pattern matching, semantic analysis.

The associate editor coordinating the review of this manuscript and approving it for publication was Wei-Yen Hsu.

2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 12, 2024

I. INTRODUCTION
Chatbots are essential for counseling in engineering institutes for a number of reasons. The first benefit is that they make counseling services more accessible by eliminating time and location constraints and by offering instant assistance to engineering students who may be facing difficulties in their personal, professional, or academic lives. Because chatbots work in real time, students can get help whenever it is convenient for them, which encourages a proactive approach to problem solving.
A chatbot is an artificially intelligent entity that can interact with humans and mimic conversations. The input to the chatbot can be text-based or spoken (voice-based queries). Chatbots are mostly used for information retrieval. A chatbot can run on a local computer or mobile phone, though most of

the time, chatbots are accessed through a web browser. A chatbot mainly works by being asked a question or query regarding a specific topic. Chatbots rely on Artificial Intelligence (AI) and Natural Language Processing (NLP) to answer the user's queries, and a predefined knowledge base helps to develop a response to the question [1].
There are three major types of chatbots: rule-based, retrieval-based, and generative models. In the rule-based model, the bot responds to queries using pre-programmed rules. This kind of chatbot can answer only a simple and limited set of questions. Retrieval-based models select an appropriate response from a group of pre-defined responses using limiting conditions (heuristics). Such a bot can understand the entire conversation and respond based on its context. Lastly, generative models generate responses from previous and current experiences. This highly sophisticated chatbot type requires complex computational models and vast amounts of data to train [2]. A large collected set of questions and answers can be used to pre-train the model, which is subsequently able to generate accurate responses. Usually, neural network-based models such as the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) are more efficient for such a use case.
Queries are asked in a natural language, e.g., English. Generating appropriate responses requires understanding and analyzing the query, which is done by Natural Language Processing (NLP). Based on natural language analysis of the queries, three major types of queries can be defined for experimentation:
• Simple Query: A simple query consists of a single, unconstrained query desire (desired output) and a single, unconstrained query input. For example, in the query "What is the capital of USA?", the desired output of the query (i.e., Capital) is explicit, single, and not bound to any constraint. The input to the query (i.e., USA) is also single and unconstrained.
• Complex Query: A complex query consists of a single query desire, which can be either constrained or unconstrained, and multiple explicit inputs, which can also be constrained or unconstrained. For example, in the query "What was the capital of the USA during World War II?", the query has multiple inputs (i.e., USA and World War II) and the desired output is single, unconstrained, and implicit (i.e., Capital).
• Compound Query: A compound query is a query with a conjunction or disjunction operator connecting two simple or complex queries. For example, "What are the capitals of the USA and Germany?" is a compound query because it contains "and".
To develop conversational agents, the broader field of AI and natural language processing continues to evolve with advancements in machine learning, deep learning, neural networks, and other technologies. A knowledge base with specific patterns and responses can be generated in a structured way through a dedicated markup language. AIML stands for Artificial Intelligence Markup Language, an XML-based markup language meant for creating artificially intelligent applications. In 2021, Md Mabrur Husan Dihyat et al. [6] wrote that AIML uses pattern-matching techniques to formulate query answers. The basic unit of an AIML script is the category tag, which is formed by user input patterns and the chatbot responses to that input. The question is stored in the <pattern> tag inside each category, while the corresponding answer is stored in the <template> tag. The pattern comprises words, spaces, and wildcard symbols such as ^ and *. Wildcard symbols are used to replace strings in AIML.
Specifically, the focus of this work is a conversational AI-based intelligent chatbot, which is a good solution for answering students' specific queries regarding the admission process. It will provide 24 × 7 assistance, and the information will be uniform and precise [7]. The contributions of the paper are:
• Preparation of questions related to the counseling process
• Handling various forms of the same query using semantic analysis
• Capability to process all types of questions: simple to complex
• Implementation and analysis of chatbots using various technologies

II. LITERATURE SURVEY
The integration of chatbots into counseling services within engineering institutes has gained traction as a means to provide accessible and timely support to students facing academic, personal, or career-related challenges. This section briefly describes some of the chatbots used in the counseling process and compares them. A study by Davis and Smith [8] emphasized the potential of chatbots in counseling to overcome geographical and time constraints. The implementation of chatbots allowed students to seek guidance beyond traditional office hours, leading to increased accessibility. The proposed chatbot emphasizes the set of questions that are frequently asked during the counseling process. Research by Johnson and Lee [9] explored the role of chatbots in providing emotional support to engineering students. The study found that chatbots equipped with sentiment analysis capabilities effectively identified and responded to students' emotional states, contributing to a supportive environment. The proposed chatbot focuses on all types of questions that generally need to be answered before deciding to join a university for a course. This decision process is quite emotional for students and parents, so answering the appropriate questions is a crucial task. Career-oriented chatbots were investigated by Patel et al. [10] for offering personalized career guidance to engineering students. Results indicated


that students who engaged with career-focused chatbots demonstrated a clearer understanding of their career paths and increased confidence in their choices. The proposed chatbot mainly focuses on the various preferences students weigh when deciding on a university for admission. The work of Chang and Wang [11] delved into privacy concerns associated with counseling chatbots. The study highlighted the importance of secure communication channels and transparent data handling practices to ensure the confidentiality of sensitive information shared during counseling sessions. User acceptance of counseling chatbots was explored by Yang and Liu [12]. Their research found a positive correlation between the user-friendliness of chatbots and students' willingness to engage. Clear communication of the chatbot's capabilities and limitations also played a crucial role in user acceptance.
Ranoliya et al. [13] developed a chatbot for university-related FAQs. This chatbot was implemented using Artificial Intelligence Markup Language (AIML) and Latent Semantic Analysis (LSA). The authors need to try other ways of implementation, especially when the dataset of questions increases.
Sharma et al. [14] discussed that ELIZA was created by a German computer scientist, Joseph Weizenbaum, in 1966. It is considered to be the first chatbot in computer history. ELIZA used "pattern matching" and substitution methodologies to simulate conversations.
In 2020, Adamopoulou and Moussiades [15] wrote that ELIZA responds like a psychotherapist by returning the user's query in an interrogative form. The downside of ELIZA is its limited knowledge, so it can only discuss a limited range of topics. Furthermore, ELIZA cannot maintain long conversations and cannot learn context from the conversation.
Khan and Raza Rabbani [16] implemented a chatbot using AI and NLP models for Islamic finance and banking customers. The authors used a traditional NLP model, and the chatbot's performance is not compared with other methods.
Adamopoulou and Moussiades [15] discussed a chatbot called PARRY, created in 1971. PARRY is considered to be more evolved than ELIZA since it has a "personality" and a more effective control structure. In 2012, Sandeep A Thorat et al. [17] wrote that the PARRY chatbot system also has language comprehension capabilities and can have variables like mistrust, anger, and fear.
Tiwari et al. [18] implemented a chatbot using neural networks and NLP for COVID-19-related queries. A dataset of questions and answers was used to train the model and generate the responses. The neural network model requires a considerable dataset, and the queries are vague, so preparing the model and developing specific responses takes longer. Ranavare and Kamath [19] implemented a chatbot for placement activity using the DialogFlow method. The proposed approach needs structured data handling according to the pre-defined dialogs and cannot accommodate semantic queries.
Fryer and Carpenter [20] described Jabberwacky, written in CleverScript, an artificial intelligence tool. Adamopoulou and Moussiades [15] wrote that Jabberwacky was created in 1988 and used contextual pattern-matching algorithms to answer queries based on previous discussions.
In 2020, Shivang et al. [21] mentioned that Jabberwacky's main goal was to transition from a text-based system to a fully voice-driven system. Mathew et al. [22] implemented an NLP-based personal learning assistant for school education. The chatbot proposed in that paper needs the potential to cover the whole subject's contents, which can be achieved by enhancing the ontology and knowledge base.
In 2020, Verma et al. [21] described the Artificial Linguistic Internet Computer Entity (ALICE), a Natural Language Processing chatbot. This chatbot uses heuristic pattern-matching algorithms to conduct conversations. ALICE was written using Artificial Intelligence Markup Language (AIML), an XML-based schema for writing heuristic conversational rules.
In 2020, Adamopoulou and Moussiades [15] wrote that ALICE relied entirely on pattern-matching algorithms without recognizing the context of the entire conversation. Also, ALICE lacks intelligent traits and cannot generate human-like responses that express emotions and attributes.
Mittal et al. [23] developed a web-based chatbot for Frequently Asked Queries (FAQ) in hospitals. The authors used ML algorithms to train on the dataset and NLP methods for text processing. They used the gradient descent algorithm, but no comparison with other algorithms is provided.
In 2020, Verma et al. [21] described Mitsuku, an intelligent chatbot created by Steve Worswick using AIML. Maher et al. [maher2020chatbots] implemented Mitsuku for general conversation, interacting with the user through rules written in AIML. Mitsuku can also be integrated with social media platforms like Telegram or Twitter. The chatbot is hosted at Pandorabot and employs NLP with heuristic patterns. Mitsuku can retain and utilize large amounts of conversational history in future conversations.
Nguyen et al. [24] performed an empirical study of user interaction with a chatbot versus a menu interface. The results indicate that chatbots provide lower user satisfaction than the menu interface due to the vague nature of queries and generated answers. The authors suggested that chatbot implementations should focus on perceived autonomy, perceived competence, and cognitive effort. Hence, various methods of implementation need to be compared.
In 2020, Verma et al. [21] wrote about Siri, a virtual assistant developed by Apple and launched in 2010. Siri uses a natural language interface that enables it to make recommendations and perform specific actions in response to voice queries. With continued use, Siri can adapt to the user's language usage and searches. Siri has many features,


including handling device settings, scheduling events, and searching queries online.
Adamopoulou and Moussiades [15] wrote about Siri's weaknesses. Siri's main disadvantage is that it depends on the internet to function. Siri can understand many languages, but many others are not supported, and the navigational instructions are only available in English. Furthermore, Siri struggles to understand strong accents and commands in the presence of external noise.
Han and Lee [25] implemented a FAQ chatbot for massive open online courses. The authors suggest a conceptual framework for chatbots and explain why it is essential to implement and integrate chatbots for conversation-centric tasks.
In 2019, Qaffas [26] wrote about IBM's Watson. Watson is a question-and-answer unit that can answer questions in natural language. Watson uses NLP and machine learning algorithms to extract insights from previous conversations. The downside of Watson is that it only supports English. Adamopoulou and Moussiades [15] wrote that Watson was created in 2011, and later "Watson Health" helped doctors diagnose diseases.
In 2020, Adamopoulou and Moussiades [15] wrote about Google Assistant, the next generation of Google Now. Google Now was created by Google in 2012 and gave responses based on users' preferences and locations. Google Assistant has deeper artificial intelligence and a friendlier user interface. The main disadvantages of Google Assistant are that it has no personality and that it violates the user's privacy because it is directly linked to their Google account.
In 2020, Adamopoulou and Moussiades [15] wrote about Cortana, a digital assistant developed by Microsoft in 2014. It understands voice instructions, identifies time and location, sends emails, creates reminders, and manages lists. Cortana has a significant flaw in that it can run software that installs malware.
In 2016, Abadi et al. [27] wrote that TensorFlow is an interface for expressing machine learning algorithms and an implementation for executing them. With few or no adjustments, TensorFlow computations can be run on various heterogeneous systems, ranging from mobile devices like phones and tablets to large-scale distributed systems. The framework is extensible and may be used to define multiple algorithms, such as deep neural networks and inference approaches.
In 2018, Qaiser et al. [28] wrote that Term Frequency and Inverse Document Frequency (TF-IDF) is a numerical statistic that reflects how relevant a keyword is to a specific document.
In 2015, Ahmed et al. [29] wrote that Artificial Intelligence Markup Language (AIML) has a syntax similar to Extensible Markup Language (XML) and is used for pattern-matching algorithms.
In 2018, Rani et al. [30] wrote that in information retrieval (IR) systems, stop words are words with little or no semantic importance. Removing such common words can lead to more effective corpus indexing and boost the performance of an IR system.
In 2021, Ofer et al. [31] wrote that splitting text into atomic units of information in a selected language representation (called tokens) is known as tokenization. Although some approaches employ individual letters, most English NLP models use words as tokens. Individual-character tokens provide more versatility, particularly for out-of-vocabulary or misspelled words and for languages lacking unambiguous word division.
Based on the literature, the following research gaps were identified to formulate the research:
• Existing research does not identify the most suitable models for implementing chatbots. Hence, there is a need for implementation and comparative analysis of various models to select the best one.
• Chatbot implementations that consider all simple and complex queries related to a university or institution are largely missing.
• Existing chatbots lack domain information related to educational institutions. Hence, there is a need to generate an extensive question-answer repository for such chatbots.
• There is a requirement to implement conversational AI that considers domain knowledge and the semantics of questions while answering.

III. RESEARCH FORMULATION
Technical engineering colleges follow an online admission process along with counseling to provide information regarding various courses. Students and parents have many queries regarding the entrance process, which are clarified over calls or by visiting the university. Information may be miscommunicated, or officials are often busy on other calls, which prevents other students from getting their queries resolved.
Engineering colleges follow an online admission process that comprises a counseling process and stream selection. During this phase, students and parents have various kinds of questions, such as:
• Queries regarding the college
• Queries regarding the branch they are about to select
• Queries regarding the students' options after college
• Queries regarding the placements in the final year
• Queries related to the various branches and the differences between them, along with questions related to the relatively new branches.
All the queries can be clarified by visiting the college or over a phone call with the officials. Due to the sheer volume


of queries, it might be difficult for the officials to answer all the queries properly, which may lead to miscommunication of information, and sometimes they might be busy on another call, which prevents the student from getting the information they need.
Students may have to rely on online platforms such as Quora or Telegram groups to get information and have their queries resolved. These sources are not reliable because the information is not provided by university officials. Furthermore, students might need to navigate through the entire college website to find a particular piece of information, which can be tedious.

IV. METHODOLOGY
The development of university information chatbots involves defining objectives, understanding user requirements, selecting a suitable platform, integrating with university systems, and deploying across various channels. An overview of the developed solution is given step by step as follows:
• Preparation of Questions Related to Counseling Process: The solution begins with a meticulous preparation of questions related to the counseling process. This involves understanding the varied needs and concerns of users, including prospective students and parents. The question preparation phase includes input from counseling experts to ensure the chatbot is equipped to address a wide range of inquiries related to admissions, academic programs, career guidance, and support services.
• Handling Various Forms of the Same Query Using Semantic Analysis: To enhance the effectiveness of the chatbot, semantic analysis is employed to handle various forms of similar queries. Through natural language processing techniques, the chatbot is trained to recognize the semantic meaning behind different expressions of the same question. This enables the chatbot to provide consistent and accurate responses regardless of how users phrase their queries, ensuring a more user-friendly and efficient interaction.
• Capability to Process All Types of Questions, Simple to Complex: The developed solution ensures the chatbot's versatility in processing questions of varying complexity. Whether users have straightforward queries about admission deadlines or complex inquiries regarding academic policies, the chatbot is designed to comprehend and respond appropriately. The system is equipped with an extensive knowledge base and advanced algorithms to tackle a diverse set of questions, providing comprehensive support across the counseling spectrum.
• Implementation and Analysis of Chatbots Using Various Technologies: The implementation of the chatbot solution is characterized by the use of various technologies to optimize performance and user experience. Technologies such as natural language processing (NLP), machine learning algorithms, and possibly deep learning models are integrated to enhance the chatbot's understanding of user intent and context. The solution includes a comparative analysis of different technologies to select the most suitable ones, considering factors like accuracy, scalability, and ease of integration with existing systems.

FIGURE 1. Methodology for chatbot development.

Figure 1 depicts the methodology for developing a chatbot. The dataset is created by collecting all the questions asked about a particular technical university from various social media portals and from the university's students and faculty. Answers to these questions are obtained from authorized sources at the university. The dataset has around 250 questions formed in different ways. The following methodology is used to design a chatbot from these questions and answers. In the first step, raw data is pre-processed and converted into a format that is easier and more effective for further processing steps. Pre-processing also normalizes the raw data in the dataset and reduces the number of features in the feature set. This decreases the complexity of fitting the data to each classification model.
The pre-processing steps are explained below:
• Converting to Lowercase: The raw text is changed to lowercase to avoid numerous variants of the same word. All terms, regardless of their casing, are normalized to lowercase so they can be counted together.
• Tokenization: Tokenization is dividing a text stream into meaningful elements called tokens. Tokens can be words, sentences, or any other part of the sentence.


• Bag Of Words: Neural networks cannot understand

words and require numbers as input. Therefore, the
Bag of Words model is used to convert words into
machine-recognizable vectors of numbers. There are
two types of Bag of Words: a list of zeros and ones that
signify whether the word is present in the sentence. The
other kind counts the number of occurrences of the most
frequently used words. The size of the Bag of Words is
the number of unique root words. The Bag of Words is
represented as a list of 0s and 1s where each position in
the list means if a comment exists or not in the sentence. FIGURE 2. Structure of a neural network.
sentence=hello, how are you?
words = [‘‘are’’, ‘‘bye’’, ‘‘hello’’, ‘‘hi’’, ‘‘how’’, ‘‘i’’,
‘‘thank’’, ‘‘you’’] information in only one direction. Contrary to feed-forward
bag = [1,0,1,0,1,0,0,1] neural networks, Long Short-Term Memory (LSTM) uses
• Removing Stop Words: The stopwords are the words
recurrent neural networks, where the information flow is non-
that occur most frequently in a document and contain linear. While dealing with sequential data or data with a
very little information that is not usually relevant. For temporal link, LSTMs are favored. However, LSTMs have
example, in the English language, there are some words disadvantages: they are comparatively slow and require a
such as ‘‘a,’’ ‘‘about,’’ ‘‘above,’’ ‘‘after,’’ ‘‘again,’’ and sizeable high-quality dataset to get acceptable results.
‘‘against’’ all contain meagre information, and thus Once the model is ready, the next is to design a chatbot.
they are called stopwords. Removing stopwords reduces There are many ways of creating a chatbot; all of them will
vector space and improves the model’s performance by have different performances. Even if the same query is fed
increasing accuracy and reducing the training time and to all the chatbots, their responses might be different. In this
number of calculations. section, the following five chatbots are explained:
• Stemming: The stemming is the process of removing • Smart Bot: In this chatbot, the neural network is created
prefixes and suffixes(affixes) of words. When several using TensorFlow (TfLearn)
types of features are stemmed into a single feature, • Sam: In this chatbot, the neural network is created using
it reduces the amount of features in the feature space PyTorch
and increases the performance of the classifier. For example, the words “likes,” “like,” “likely,” “liked” and “liking” will all be stemmed to their root form “like”. Consolidating all versions of a term with the same root form guarantees that the root form’s frequency is assessed collectively, increasing the likelihood of the root form appearing.
• Lemmatization: Lemmatization and stemming are similar in that they combine several versions of a word into a single root form. When reducing and aggregating words, lemmatization considers the context and lemma of each word. As a result, it makes no distinction between words with slightly different meanings. The words “went,” “going,” “gone,” and “go” will all become “go,” even though “went” has distinct characters from “go,” which the stemming algorithms do not do.
After pre-processing, the next step is to use the processed data to build a model for the chatbot. This is done using a neural network, as shown in Figure 2.
The neural network framework is based on the biological neural network formed inside the human brain. An input layer, one or more hidden layers, and an output layer make up an artificial neural network. Each artificial neuron in a layer is connected to the neurons in the next layer, and each connection has a weight assigned to it.
A Feed Forward Neural Network is one of the most fundamental neural network types because it conveys
• Big Mouth: This chatbot is created using TF-IDF Vectorization
• Hercules: This chatbot is created using Sequential Modeling
• ALICE: This chatbot is created using AIML

A. SMART BOT: NEURAL NETWORK USING TENSORFLOW (TFLEARN)
As the name suggests, TensorFlow runs computations based on tensors. In machine learning, a tensor is a generalization of vectors and matrices, represented as an n-dimensional array of a base datatype. All elements of a tensor always have the same data type.
The explanation of Algorithm 1 is given in the following paragraphs. The data is stored in the intents.json file, which contains a list of intents. Each intent or class has a tag, a pattern, and a response. The “tag” defines the intent or class. The “pattern” is a list of possible questions for the corresponding class. The “response” is a list of possible answers to the questions of that “tag.” The chatbot will take the message from the user, identify the “tag” of the message, and give the corresponding response.
Every “pattern” of every “intent” is tokenized using nltk.word_tokenize() and is appended to the “words” list. All the tags are stored in the “labels” list. “words” is a list containing all the words in the database. Every word in “words” is converted to lowercase using the lower() function.

29638 VOLUME 12, 2024

G. Attigeri et al.: Advanced NLP Models for Technical University Information Chatbots

Algorithm 1 Algorithm for Chatbot Using TensorFlow
data ← load data from JSON file
If the model has already been trained, load the variables from the pickle file
Initialize lists words, labels, docsx, docsy
while intent ∈ data do
    while pattern ∈ intent do
        wrds ← tokenize words in the pattern
        Append wrds to words
        Append wrds to docsx
        Append tag of the intent to docsy
    end while
    if tag ∉ labels then
        Append tag of the intent to labels
    end if
end while
remove punctuations from “words”
Stem and convert to lowercase every word in “words” and store in “words”
sort “words” and “labels”
Initialize lists: training, output
while sentence ∈ docsx do
    Initialize “bag” (bag of words)
    stem every word in “sentence” and store in “wrds”
    while word ∈ words do
        if word ∈ wrds then
            append 1 to bag
        else
            append 0 to bag
        end if
    end while
    Initialize outputrow with zeros (one entry per label)
    outputrow[labels.index(docsy[x])] = 1
    Append “bag” to “training”
    Append outputrow to output
end while
convert “training” and “output” to arrays and save the variables in the pickle file
create a Deep Neural Network using tflearn. The size of the input layer is the same as the size of the bag of words. Then add 2 hidden layers (fully connected) of 8 neurons each. The size of the output layer is equal to the number of tags
set “number of epochs” to 1000
save the model

Now, all the words in the “words” list are stemmed using the LancasterStemmer().stem() function, and all the duplicate words are removed using set(words). “words” and “labels” are then sorted. A bag of words, “bag” (initially an empty list), is created, where the size of “bag” is the number of root words in the database. For every word in the “words” list, if that word exists in the sentence, then 1 is appended to “bag”; else 0 is appended to “bag”.
A Deep Neural Network is created using TfLearn. The size of the input layer is equal to the size of the “bag.” The input to the neural network is “bag.” Two fully connected hidden layers of eight neurons each are added to the network. Fully connected layers mean that all possible connections are present, wherein every input of the input vector influences every output of the output vector. An output layer of size equal to the number of tags in the dataset is added to the network. The softmax activation function is applied to each neuron in the output layer. “n_epochs” is the number of times the model will see the same training data. In this model, “n_epochs” is set to 1000. The softmax activation function converts the output to a list of probabilities, with each value denoting the probability that the sentence belongs to the corresponding tag, and the sum of all probabilities equals 1. After the model is trained, the variables are stored in the data-pickle file. The training dataset is passed through the model as a bag of words, and the model is trained. When the user query is passed through the neural network, the tag with the highest probability is chosen, and the corresponding response is given to the user [32].

Algorithm 2 Algorithm for Chatbot Using PyTorch
Data = Load JSON Data
Initialize lists tags, xy, allwords
while intent ∈ data do
    Tag of the intent is stored in “tag”
    while question ∈ intent[question] do
        xy ← xy + tokenized sentence
    end while
end while
Remove stop words
Apply stemming, remove all duplicates, sort it and add them to “tags”
Initialize lists Xtrain, ytrain
while pattern ∈ xy do
    while tag ∈ xy do
        Create bag of words using “pattern” and “allwords” and store in “bag”
        append “bag” to Xtrain
        append tag of intent to label
        append “label” to ytrain
    end while
end while
A Feed Forward Neural Network is created using PyTorch with two hidden layers with ReLU activation functions. The output layer is of size equal to the number of tags in the database. The ReLU activation function is applied to the input and hidden layers. The tag with the highest probability is chosen. Training loss is calculated and printed after every 100 epochs, and the final loss is computed.
query = input from user
X = query is tokenized and converted to bag of words
output = query is passed through the model
if probability > 0.75 then
    if tag ∈ tags then
        Randomly print one of the responses in that tag
    end if
else
    Print that the bot does not understand the query
end if

B. SAM: NEURAL NETWORK USING PYTORCH
The explanation of Algorithm 2 is given in the following paragraphs. The data is stored in the intents.json file, which contains a list of intents. Each intent or class has a tag, a pattern, and a response. The “tag” defines the intent or class. The “pattern” is a list of possible questions for the corresponding class. The “response” is a list of possible answers to the questions of that “tag.” The chatbot will take the message from the user, identify the “tag” of the message, and give the corresponding response.
Pre-processing steps are applied to the data. Every question of every intent is tokenized using nltk.word_tokenize() and is appended to the “all_words” list. Every unique tag is stored in the “tags” list. Now, “all_words” is a list that contains all the tokenized words of the dataset, and “tags” is a list that contains all the tags of the database. All the punctuation tokens are removed; every word in the dataset is converted to lower case using the lower() function, and the words are stemmed using the PorterStemmer().stem() function from nltk. All the duplicate words are removed using the set(all_words) function, and the “all_words” list is sorted using the sorted(all_words) function. The “tags” list is also sorted. To create a bag of words,


a “bag” list is created. The “bag” list is of length equal to the size of the list “all_words,” that is, the number of unique stemmed words in the database. The “bag” list is initialized with 0. For every word in “sentence,” its corresponding “index” in “all_words” is used to set bag[index] to 1.
A Feed Forward Neural Network is created using torch.nn.Module, the base class for all neural network modules. A feed-forward neural network is an artificial neural network in which the connections between nodes do not form a cycle. One linear input layer of size equal to the “bag” list is created. Two hidden linear layers having eight neurons are created. One output layer of size equal to the size of the “tags” list is formed. A ReLU activation function is defined. Training data is passed through the input layer; then, the activation function is applied. This data is fed to the hidden layer, and then the activation function is applied; the result is provided to another hidden layer. The activation function is applied, and this data is fed to the output layer. The learning rate is a vital hyperparameter that determines how fast the neural network converges to an optimum value. In this model, the value of the learning rate is 0.001. When the user query is passed through the neural network, the “tag” with the highest probability is chosen, and the response is given to the user.

C. BIG MOUTH: USING TF-IDF VECTORIZATION
“Term Frequency” (TF) counts how many times a term appears in a document [33]. Suppose there are 5000 words in document “T1,” and the word “alpha” appears ten times. The frequency of the term “alpha” in document “T1” will then be
TF = t/s
where t is the number of occurrences of the term in the document and s is the total number of words in the document:
TF = 10/5000 = 0.002
The inverse document frequency gives less weight to frequently occurring words and more weight to infrequently occurring words. For example, if we have ten documents and the term “alpha” appears in five of them, we may calculate the inverse document frequency as
IDF = log(M/m)
where M is the total number of documents in the corpus and m is the number of documents containing the required term. With a base-10 logarithm:
IDF = log(10/5) = 0.301
The detailed steps of TF-IDF Vectorization are shown in Algorithm 3 and Algorithm 4.
In 2020, Abhishek Jaglan et al. [34] wrote that textual data cannot be employed in the model directly; instead, it has to be converted to numerical vectors. This can be done by assigning a unique number to each word, and the given data can be encoded with the length of the vocabulary of known words. The Bag-of-Words model is a way of representing whether words exist in the “sentence” or not, regardless of their sequence of appearance.

Algorithm 3 Algorithm for TF-IDF Vectorization
Data = Query input from the user
Output = Response from chatbot
sentTokens ← Sentence tokenized data from database
Append the query to sentTokens
tfidf ← TF-IDF vectorized data with stop words removed
vals ← Cosine Similarity between the user query (tfidf[−1]) and tfidf
reqTFIDF ← Maximum Cosine Similarity in vals, excluding the query itself
if reqTFIDF > 0 then
    return the corresponding response
else
    return “I do not understand..”
end if

Algorithm 4 Algorithm for Greeting in TF-IDF Vector Chatbot
Data = Query from the user
Output = Response from the chatbot if the query is a greeting
greetingInput ← List of greeting input words
greetingOutput ← List of greeting output words
while word ∈ query do
    if lowerCase(word) ∈ greetingInput then
        return a random element of greetingOutput
    end if
end while

A Term Frequency-Inverse Document Frequency (TF-IDF) vector stores the resulting score assigned to each word. The goal of the TF-IDF vector is to highlight the words in the text that are more interesting (less common). Term Frequency measures the frequency of each word, whereas Inverse Document Frequency scales down the score of frequently occurring words.
The explanation of Algorithm 5 is given in the following paragraph. The corpus consists of full-stop-separated answers in the form of a text file. The corpus is loaded and tokenized using nltk.word_tokenize and nltk.sent_tokenize. The tokens are lemmatized using nltk.stem.WordNetLemmatizer().lemmatize(). Words that are in string.punctuation (the set of punctuation characters) are removed. Input is taken from the user in the form of a “query.” Greeting is implemented using a pattern-matching algorithm. If the query contains any words from GREETING_INPUT (list of predefined greeting inputs), then the chatbot will return a random response from GREETING_OUTPUT (list of predefined greeting outputs). The “query” is tokenized and lemmatized, and the result is appended to the list “sent_tokens.” sklearn is the library used for TF-IDF vectorization and to calculate the cosine similarity. Cosine similarity is calculated between every sentence from the corpus and the user query; the sentence having the highest cosine similarity is given as the output (cosine similarity values are sorted in descending order and the first value is selected) [35].

D. HERCULES: USING SEQUENTIAL MODELING
The data is stored in the intents.json file and contains a list of intents. Each intent or class has a tag, a pattern, and a response. The “tag” defines the intent or class. The “pattern”


is a list of possible questions of the corresponding class. The “response” is a list of possible answers to the questions of that “tag.” The chatbot will take the message from the user, identify the “tag” of the message, and give the corresponding response.

Algorithm 5 Algorithm for Chatbot Using TF-IDF Vectorization
data = load data.txt
senttokens = sentence tokenization of data
wordtokens = word tokenization of the text
Remove punctuations from the text
Initialize lists GREETINGINPUT, GREETINGOUTPUT with sample greeting inputs and outputs
while word ∈ sentence.split do
    if word ∈ GREETINGINPUT then
        Print out a random GREETINGOUTPUT
    end if
end while
userInput = Input from the user
userresponse = tokenized user query
Append userresponse to senttokens
tfidfVec = Create tfidf Vector of senttokens
vals = Cosine Similarity between “senttokens” and “userresponse”
idx = Id of the sentence with the highest Cosine Similarity
reqtfidf = Cosine Similarity value of that sentence
if reqtfidf = 0 then
    Print “I am sorry, I didnt understand you”
else
    Print senttokens[idx]
end if

Algorithm 6 Algorithm for Chatbot Using Sequential Modeling
data = load data from JSON file
Implement Lemmatization
Initialize lists words, labels, docsx, docsy
while intent ∈ data do
    while pattern ∈ intent do
        wrds = tokenize words in the pattern
        Append “wrds” to words
        Append “wrds” to docsx
        Append “tag” to docsy
    end while
    if tag ∉ labels then
        Append “tag” to labels
    end if
end while
remove punctuations from “words”
words = lemmatized “words” converted to lowercase
sort “words” and “labels”
Initialize lists: training, output
while sentence ∈ docsx do
    Initialize “bag” (bag of words)
    wrds = lemmatize every word in “sentence”
    while word ∈ words do
        if word ∈ wrds then
            append 1 to bag
        else
            append 0 to bag
        end if
    end while
    Initialize outputrow with zeros (one entry per label)
    outputrow[labels.index(docsy[x])] = 1
    Append “bag” to training
    Append outputrow to output
end while
create a Sequential model with softmax activation function

The explanation of Algorithm 6 is given in the following paragraphs. Every question of every intent is tokenized using nltk.word_tokenize() and is appended to the “words” list. All the tags are stored in the “labels” list. “words” contains all the words in the database. Every word in “words” is converted to lowercase using the lower() function. Now, all the words in the “words” list are lemmatized using the WordNetLemmatizer().lemmatize() function, and all the duplicate words are removed using set(words). Both “words” and “labels” are sorted. A bag of words is created with the variable name “bag,” where the size of “bag” is the number of root words in the database. For every word in the “words” list, if that word exists in the sentence, then 1 is appended to “bag”; else, 0 is appended to “bag”.
In 2014, Srivastava et al. [36] wrote that dropout is a technique that prevents overfitting and provides a way of efficiently combining different neural networks. The term “dropout” refers to dropping out units randomly from the hidden or visible layers in the neural network. When a unit is dropped, it is temporarily removed from the network, along with its incoming and outgoing connections, by setting its weight to zero.
A Neural Network with three layers is created. A dense (fully connected) input layer of size equal to the size of “bag” is created with a ReLU activation function. Dropout is used, which will drop 50% of the units. A fully connected hidden layer of 64 neurons is created, and the ReLU activation function is applied. Dropout is used, which will drop 30% of the units. A dense output layer of size equal to the number of tags in the database is created with the Softmax activation function. The Softmax activation function converts the output to a list of probabilities, wherein each value denotes the likelihood of the sentence belonging to the corresponding tag, and the sum of all probabilities equals 1. The model is optimized using “Adam.” The Adam Optimizer is an adaptive learning rate method, which computes individual learning rates for different parameters. When the user query is passed through the neural network, the tag with the highest probability is chosen, and the response is given to the user.

E. ALICE: USING AIML
In AIML, categories are the basic unit of knowledge. Each category has a pattern and a template. The pattern describes the query, and the template describes the chatbot’s responses. The template tag can have a list of possible responses for the chatbot to choose from, and it will randomly give one response.
There are two types of AIML categories:
• Atomic Category: It is an AIML category where the query must be an exact match. This type of category does not contain any wildcards.
<category>
<pattern> Good Morning </pattern>
<template> Good Morning to you too! </template>
</category>
• Default category: wildcard symbols such as ∧ and ∗ are used in the pattern. The ∗ wildcard captures one or more


words, and the ∧ wildcard captures zero or more words.
<category>
<pattern> Hi, ∧ </pattern>
<template> Hi, Good to see you </template>
</category>
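The atomic and default categories described above can be imitated with a small pattern matcher. The sketch below is only an illustration of the matching idea, not an AIML interpreter and not the authors' implementation: the CATEGORIES list, the third sample category, and the function names are hypothetical, the ASCII caret stands for the ∧ wildcard, and the first matching category wins, whereas real AIML interpreters apply their own priority rules.

```python
import random
import re

# Hypothetical categories: each pairs a pattern with its candidate responses.
CATEGORIES = [
    ("GOOD MORNING", ["Good Morning to you too!"]),            # atomic category
    ("HI ^", ["Hi, Good to see you"]),                         # default category
    ("TELL ME ABOUT *", ["Here is what I know about that."]),  # assumed example
]


def normalize(text):
    # Uppercase and drop punctuation, so "Hi, there!" becomes "HI THERE".
    return " ".join(re.sub(r"[^A-Za-z0-9\s]", " ", text).upper().split())


def pattern_to_regex(pattern):
    # Every word (literal or wildcard) is matched with one leading space.
    parts = []
    for token in pattern.split():
        if token == "*":
            parts.append(r"(?: \S+)+")   # one or more words
        elif token == "^":
            parts.append(r"(?: \S+)*")   # zero or more words
        else:
            parts.append(" " + re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


def respond(query, categories=CATEGORIES):
    text = " " + normalize(query)
    for pattern, responses in categories:
        if pattern_to_regex(pattern).match(text):
            return random.choice(responses)  # pick one response at random
    return None  # no category matched


print(respond("Good Morning"))
print(respond("Hi, how are you?"))
print(respond("Good Morning everyone"))
```

Because the atomic pattern contains no wildcard, "Good Morning everyone" is not matched by it, while the default pattern accepts "Hi" with or without trailing words.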
These five chatbots were created to see how different technologies and algorithms impact a chatbot’s performance. Confusion matrices were calculated using the sklearn library. Furthermore, all five chatbots were implemented, and the number of epochs was varied. Lastly, 250 queries were executed on all the chatbots, and all the responses were observed.

FIGURE 3. Query-wise accuracy of all the chatbots.

V. RESULT ANALYSIS
The results of all the implemented chatbots are presented in terms of accuracy and validation. The first section contains the confusion matrices and accuracy, calculated based on a sample test dataset. In the second section, 150 queries were run on all the chatbots, and their responses were observed to check whether they categorized the query correctly. The last section contains screenshots of the conversations with the bots.

A. CONFUSION MATRICES AND ACCURACY BASED ON SAMPLE TEST DATASET
A test dataset of 144 queries about one university is used to test the chatbot models. Confusion matrices and accuracies are calculated using the sklearn metrics library (using the confusion_matrix and accuracy_score functions).
The neural network was created using TensorFlow in this model, and multiple pre-processing steps were applied. The Lancaster Stemming algorithm was used in the pre-processing phase, which is more accurate. Furthermore, the softmax activation function is applied to the output layer, increasing the neural network’s performance. Table 1 is the confusion matrix of Smart Bot.
In this model, the neural network was created using PyTorch, and the ReLU activation function was applied to the input and hidden layers. This impacted the performance of the model. Table 1 is the confusion matrix of Sam.
In this model, no neural network is created. Instead, TF-IDF Vectorization converts every sentence into a vector, and Cosine Similarity calculates the similarity between every sentence and the query. This model does not understand the meaning of the query; it simply finds the most similar sentence. Table 1 is the confusion matrix of Big Mouth.
In this model, Sequential modeling is used while creating the neural network, which was designed to prevent the problem of overfitting; this improves the model’s performance. Table 1 is the confusion matrix of Hercules.
In this model, AIML is used to create pattern-matching rules. No computation is done here, and the query is matched to the predefined rules. The programmer needs to understand the AIML functionalities deeply to get acceptable results. Table 1 is the confusion matrix of ALICE.

TABLE 1. Confusion matrix of all the chatbots.

B. QUERY ANALYSIS ON CHATBOTS
One hundred fifty simple queries were created, 15 of which had spelling mistakes. All these queries were run on all the chatbots, and their responses were observed to check whether they categorized the question correctly. Figure 3 shows the number of queries correctly answered by each chatbot along with the accuracy of each model. Table 2 depicts the various queries and how many questions were correctly answered by each model.

C. CONVERSATIONAL ANALYSIS OF CHATBOTS
In this section, simple and compound queries are run on all the chatbots, and the conversations are shown in the form of screenshots for sample queries. For Smart Bot, Sam, and Hercules, training loss is calculated, and its variations are noted when the number of epochs changes.

1) SMART BOT
As shown in Figure 4, at the 1000th epoch, the training loss of the model is 0.35567, and the accuracy is 0.9738. If we increase the number of epochs to 1500, the training loss of the model reduces to 0.15079, and the accuracy increases to 0.9949. If the number of epochs is instead set to 500, the training loss becomes 0.24558, and the accuracy becomes 0.9817.


TABLE 2. Question-wise performance of chatbots.

FIGURE 4. Training of smart bot.

FIGURE 5. Interface of smart bot.

It can be observed that increasing the number of epochs to 1500 yields the lowest training loss and the highest accuracy.
As shown in Figure 5, the chatbot gives the correct output even for queries on which it was not trained. For example, the first query is in the dataset, whereas the second query is not in the dataset. Furthermore, even if there is a spelling mistake in the query asked by the user, the model gives the correct output.
For complex and compound queries, the chatbot gives accurate results. It can be seen in the above figures that the chatbot provides the answers to both queries.

2) SAM
As shown in Figure 6, at the 1000th epoch, the training loss of the chatbot is 0.0003. If the number of epochs is increased to 1500, the training loss decreases to 0.0001. On the contrary, if the number of epochs is reduced to 500, the training loss rises to 0.0017.
It can be observed that by increasing the number of epochs, the training loss decreases.


FIGURE 6. Training of sam.

FIGURE 8. Interface of big mouth.

FIGURE 9. Training of Hercules.

FIGURE 7. Interface of sam.

As shown in Figure 7, the model gives the correct answers to the queries in the dataset but the wrong answers to the queries not in the dataset. One of the possible explanations for this is the overfitting of the model on the training dataset. Overfitting makes the model relevant only to the dataset on which it was trained and irrelevant to all the other datasets. In the above figure, it is also observed that the chatbot does not give the correct response if there is a spelling mistake in the query.
For complex and compound queries, the chatbot gives accurate results. It can be seen in the above figures that the chatbot provides the answers to both queries.

3) BIG MOUTH
As shown in Figure 8, the model gives the correct answers to the direct queries, but if the queries are asked differently, it may provide a different response. It is also observed that the chatbot might not give the correct response if there is a spelling mistake in the query.
For complex and compound queries, the chatbot gives accurate results. It can be seen in the above figures that the chatbot provides answers to both queries.

4) HERCULES
As shown in Figure 9, at the 200th epoch, the training loss of the model is 0.0354, and the accuracy is 0.9831. If we increase the number of epochs to 250, the training loss of the model rises to 0.2196, and the accuracy reduces to 0.9800. Similarly, if the number of epochs becomes 150, the training loss becomes 0.1089, and the accuracy becomes 0.9400.
It can be observed that the model performs best at 200 epochs: both increasing and reducing the number of epochs from this value raise the training loss and lower the accuracy.
In Figure 10, it can be seen that the chatbot gives the correct output for the queries for which it was not trained.


FIGURE 11. Interface of Alice.

FIGURE 10. Interface of Hercules.

For example, the first query is in the dataset, whereas the second query is not in the dataset. In the above figure, it is also observed that the bot can give the correct response even if there is a spelling mistake in the query.
For complex and compound queries, the chatbot gives accurate results. It can be seen in the above figures that the chatbot provides answers to both halves of the query.

5) ALICE
Figure 11 shows that the chatbot gives the correct output for the queries it was not trained for. For example, the first query is in the dataset, whereas the second query is not in the dataset.
The above figure shows that the bot does not give the correct response if there is a spelling mistake in the query.
For complex and compound queries, the chatbot does not give accurate results. It can be seen in the above figures that the answer provided by the chatbot is incomplete, as it only answers one query.
Considering Sam as the base chatbot, it can be observed that Smart Bot and Hercules perform better than Sam. Furthermore, it can be observed that Big Mouth and Alice do not perform as well as Sam. Smart Bot, Sam, and Hercules are all neural network-based models with a “yes” percentage greater than 60 percent. If these three chatbots are arranged in increasing order of the percentage of “yes,” then the order will be Sam < Smart Bot < Hercules. Since all three models are neural network-based, it can be concluded that:
• The stemming algorithm applied to the data might affect the neural network’s performance. In Smart Bot, the LancasterStemmer algorithm was used, whereas in Sam, the PorterStemmer algorithm was used.
• The type of activation function and the layers on which it is applied might affect the neural network’s performance. In Smart Bot, the softmax activation function is applied to the output layer, and no activation function is applied to any other layer. On the other hand, in Sam, the ReLU activation function is applied to each node in the input layer and every node in the hidden layers, and no activation function is applied to the output layer.
• It can be noticed that Hercules has a Sequential Neural Network designed to prevent overfitting. This may be why Hercules has the highest percentage of “yes.”
• It can be observed that Hercules is the only chatbot with any optimizer applied, improving its performance.
Considering Sam as the base chatbot, it can be observed that Alice (pattern matching) and Big Mouth (TF-IDF vectorization) might not be as effective as neural network-based models, and they require a deeper understanding of the technologies to obtain an acceptable model.


D. TIME COMPLEXITY OF CHATBOTS
The time complexity of chatbots implemented using neural networks (NN) and natural language processing (NLP) can vary depending on the specific architecture, algorithms, and models employed. The time complexity of the different components is:
• Natural Language Processing (NLP): O(n)
• Neural Networks (NN): O(e * n * h)
• Response Generation: O(1)

VI. CONCLUSION
Engineering colleges follow an online admission process that involves a counseling process for engineering stream selection. During the counseling phase, students and parents have many queries regarding the branches offered by the college and other related matters. These questions can be answered by visiting the college or over a phone call. The volume of the queries can be overwhelming, and due to this, there might be some miscommunication of information, or the officials might often be busy on other calls. Students might have to rely on unofficial sources like Quora to get information. Furthermore, the students have to navigate through the entire website for data, which can be tedious.
In this paper, five chatbot models were created using neural networks, TF-IDF vectorization, and pattern matching. In neural network-related models, pre-processing steps like converting to lowercase, stemming, lemmatization, tokenization, removing stop words, and creating a “bag of words” are applied to the training data before passing it through the neural network. A query is taken from the user; pre-processing steps are applied to it, and it is passed through the model, which returns the list of probabilities that the query belongs to a certain intent.
Hercules performs best among the five chatbots discussed in the project because it has sequential modeling designed to prevent overfitting on the training data. Furthermore, it is the only chatbot with any optimizer applied to it, improving its performance. Therefore, it can be concluded that a chatbot similar to Hercules can be implemented in real-time for university/institute counseling. This will be very helpful for the students because it can provide official and accurate results. Furthermore, it can give 24 × 7 assistance, and the information provided will be uniform. Lastly, students will be able to rely on reliable information sources to resolve their queries.
Future work for integrating ChatGPT into counseling could focus on enhancing emotional intelligence, enabling dynamic learning and adaptation, exploring multimodal interactions, addressing privacy concerns, ensuring cultural sensitivity, integrating with counseling resources, implementing a continuous user feedback mechanism, prioritizing accessibility features, optimizing scalability for concurrent interactions,
personalized, responsive, and inclusive tool in the counseling process.

REFERENCES
[1] T. Lalwani, S. Bhalotia, A. Pal, V. Rathod, and S. Bisen, “Implementation of a chatbot system using AI and NLP,” Int. J. Innov. Res. Comput. Sci. Technol. (IJIRCST), vol. 6, no. 3, pp. 26–30, 2018.
[2] J. Thukrul, A. Srivastava, and G. Thakkar, “Doctorbot—An informative and interactive chatbot for COVID-19,” Int. Res. J. Eng. Technol. (IRJET), vol. 7, no. 7, pp. 3033–3036, 2020.
[3] S. Maher, “Chatbots & its techniques using AI: A review,” Int. J. Res. Appl. Sci. Eng. Technol., vol. 8, no. 12, pp. 503–508, Dec. 2020.
[4] M. Aleedy, H. Shaiba, and M. Bezbradica, “Generating and analyzing chatbot responses using natural language processing,” Int. J. Adv. Comput. Sci. Appl., vol. 10, no. 9, 2019.
[5] P. Qi, Y. Zhang, Y. Zhang, J. Bolton, and C. D. Manning, “Stanza: A Python natural language processing toolkit for many human languages,” 2020, arXiv:2003.07082.
[6] M. M. H. Dihyat and J. Hough, “Can rule-based chatbots outperform neural models without pre-training in small data situations? A preliminary comparison of AIML and Seq2Seq,” in Proc. 25th Workshop Semantics Pragmatics Dialogue, 2021, pp. 22–26.
[7] A. Chandan, M. Chattopadhyay, and S. Sahoo, “Implementing chat-bot in educational institutes,” IJRAR J., vol. 6, no. 2, pp. 44–47, 2019.
[8] D. Davis and J. Smith, “The potential of chatbots in counseling,” J. Counsel., vol. 5, no. 2, pp. 123–136, Apr. 2019.
[9] R. Johnson and S. Lee, “Chatbots providing emotional support to engineering students,” in Proc. IEEE Eng. Educ. Conf., Austin, TX, USA, 2020, pp. 45–50.
[10] A. Patel, Personalized Career Guidance for Engineering Students. Springer, 2021.
[11] Y. Chang and W. Wang, “Privacy concerns in counseling chatbots,” in Proc. IEEE Int. Conf. Comput. Commun., Paris, France, 2018, pp. 234–239.
[12] H. Yang and Q. Liu, “User acceptance of counseling chatbots,” J. Comput., vol. 8, no. 3, pp. 210–225, May 2019. [Online]. Available: https://www.example-url.com
[13] B. R. Ranoliya, N. Raghuwanshi, and S. Singh, “Chatbot for university related FAQs,” in Proc. Int. Conf. Adv. Comput., Commun. Informat. (ICACCI), Sep. 2017, pp. 1525–1530.
[14] V. Sharma, M. Goyal, and D. Malik, “An intelligent behaviour shown by chatbot system,” Int. J. New Technol. Res., vol. 3, no. 4, 2017, Art. no. 263312.
[15] E. Adamopoulou and L. Moussiades, “Chatbots: History, technology, and applications,” Mach. Learn. Appl., vol. 2, Dec. 2020, Art. no. 100006.
[16] S. Khan and M. R. Rabbani, “Artificial intelligence and NLP-based chatbot for Islamic banking and finance,” Int. J. Inf. Retr. Res., vol. 11, no. 3, pp. 65–77, Jul. 2021.
[17] C. Curry and J. D. O’Shea, “The implementation of a story telling chatbot,” Adv. Smart Syst. Res., vol. 1, no. 1, p. 45, 2012.
[18] V. Tiwari, L. K. Verma, P. Sharma, R. Jain, and P. Nagrath, “Neural network and NLP based chatbot for answering COVID-19 queries,” Int. J. Intell. Eng. Informat., vol. 9, no. 2, pp. 161–175, 2021.
[19] S. S. Ranavare and R. Kamath, “Artificial intelligence based Chatbot for placement activity at college using DialogFlow,” Our Heritage, vol. 68, no. 30, pp. 4806–4814, 2020.
[20] L. Fryer and R. Carpenter, “Bots as language learning tools,” Lang. Learn. Technol., vol. 10, no. 3, pp. 8–14, 2006.
[21] S. Verma, L. Sahni, and M. Sharma, “Comparative analysis of chatbots,” in Proc. Int. Conf. Innov. Comput. Commun. (ICICC), 2020, pp. 67–78.
[22] A. N. Mathew, V. Rohini, and J. Paulose, “NLP-based personal learning assistant for school education,” Int. J. Electr. Comput. Eng. (IJECE), vol. 11, no. 5, pp. 4522–4530, Oct. 2021.
[23] M. Mittal, G. Battineni, D. Singh, T. Nagarwal, and P. Yadav, “Web-based chatbot for frequently asked queries (FAQ) in hospitals,” J. Taibah Univ. Med. Sci., vol. 16, no. 5, pp. 740–746, Oct. 2021.
[24] Q. N. Nguyen, A. Sidorova, and R. Torres, “User interactions with chatbot interfaces vs. menu-based interfaces: An empirical study,” Comput. Hum. Behav., vol. 128, Mar. 2022, Art. no. 107093.
[25] S. Han and M. K. Lee, ‘‘FAQ chatbot and inclusive learning in
and conducting rigorous evaluation studies for efficacy massive open online courses,’’ Comput. Educ., vol. 179, Apr. 2022,
validation. These developments aim to make ChatGPT a more Art. no. 104395.


[26] A. A. Qaffas, ‘‘Improvement of chatbots semantics using Wit.Ai and word sequence kernel: Education chatbot as a case study,’’ Int. J. Mod. Educ. Comput. Sci., vol. 11, no. 3, pp. 16–22, Mar. 2019.
[27] M. Abadi et al., ‘‘TensorFlow: Large-scale machine learning on heterogeneous distributed systems,’’ 2016, arXiv:1603.04467.
[28] S. Qaiser and R. Ali, ‘‘Text mining: Use of TF-IDF to examine the relevance of words to documents,’’ Int. J. Comput. Appl., vol. 181, no. 1, pp. 25–29, Jul. 2018.
[29] I. Ahmed and S. Singh, ‘‘AIML based voice enabled artificial intelligent chatterbot,’’ Int. J. u-e-Service, Sci. Technol., vol. 8, no. 2, pp. 375–384, Feb. 2015.
[30] R. Rani and D. K. Lobiyal, ‘‘Automatic construction of generic stop words list for Hindi text,’’ Proc. Comput. Sci., vol. 132, pp. 362–370, Jan. 2018.
[31] D. Ofer, N. Brandes, and M. Linial, ‘‘The language of proteins: NLP, machine learning & protein sequences,’’ Comput. Structural Biotechnol. J., vol. 19, pp. 1750–1758, 2021.
[32] G. Sperlí, ‘‘A cultural heritage framework using a deep learning based chatbot for supporting tourist journey,’’ Expert Syst. Appl., vol. 183, Nov. 2021, Art. no. 115277.
[33] S. D. Nithyanandam, S. Kasinathan, D. Radhakrishnan, and J. Jebapandian, ‘‘NLP for chatbot application: Tools and techniques used for chatbot application, NLP techniques for chatbot, implementation,’’ in Deep Natural Language Processing and AI Applications for Industry 5.0. Hershey, PA, USA: IGI Global, 2021, pp. 142–168.
[34] A. Jaglan, D. Trehan, M. Megha, and P. Singhal, ‘‘COVID-19 trend analysis using machine learning techniques,’’ Int. J. Sci. Eng. Res., vol. 11, no. 12, pp. 1162–1167, Dec. 2020.
[35] M. A. Al Muid, M. M. Reza, R. B. Kalim, N. Ahmed, M. T. Habib, and M. S. Rahman, ‘‘EduBot: An unsupervised domain-specific chatbot for educational institutions,’’ in Artificial Intelligence and Industrial Applications: Artificial Intelligence Techniques for Cyber-Physical, Digital Twin Systems and Engineering Applications. Cham, Switzerland: Springer, 2021, pp. 166–174.
[36] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, ‘‘Dropout: A simple way to prevent neural networks from overfitting,’’ J. Mach. Learn. Res., vol. 15, pp. 1929–1958, Sep. 2014. [Online]. Available: http://jmlr.org/papers/v15/srivastava14a.html

GIRIJA ATTIGERI (Member, IEEE) received the B.E. and M.Tech. degrees from Visvesvaraya Technological University, Karnataka, India, and the Ph.D. degree from Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal. She has 18 years of teaching and research experience in reputed institutes of Karnataka. She is currently an Associate Professor with the Department of Information and Communication Technology, Manipal Institute of Technology, Manipal. She has more than 16 publications in reputed international conferences and journals. She has conducted several seminars and workshops on big data and machine learning. She is working on several projects related to data analytics in health care, education, and agriculture. Her research interests include big data analytics, artificial intelligence, machine learning, deep learning, and semantic web.

ANKIT AGRAWAL received the bachelor’s degree in computer and communication engineering from Manipal Institute of Technology, Manipal. He is currently a working professional in the IT industry. He got the opportunity to write a research article and create a project that will solve a real-world problem for colleges and their applicants with Manipal Institute of Technology. He has always been intrigued by the concept of artificial intelligence and is grateful to the college and his mentors for providing him with the freedom and guidance to implement the project and use his capabilities to the maximum.

SUCHETA V. KOLEKAR (Member, IEEE) received the Ph.D. degree in adaptive e-learning from Manipal Academy of Higher Education, Manipal, Karnataka, India. She is currently an Associate Professor with the Department of Information and Communication Technology, MIT, Manipal Academy of Higher Education. She has around 15 years of experience in the field of teaching and research. She has published more than 20 papers in national and international journals/conference proceedings. Her primary research interests include E-learning, web usage mining, human–computer interaction, serious game development, and cloud computing. She received the E-learning Excellence Award in 2017 from Academic Conferences International for her research work in adaptive E-learning. She, along with her student team, has designed and developed a novel browser extension to capture the usage data of online courses provided by Coursera. She is one of the inventors of the patent titled ‘‘Smart sole-based diabetic foot ulcer prediction system,’’ which was granted by the Chennai Patent Office, India. She handles an additional responsibility with the institute to promote and enhance innovation and entrepreneurship culture.

