Generating and Analyzing Chatbot Responses
This project centers on the study of deep learning models, natural language generation, and the
evaluation of the generated results.
We believe that this contribution can improve response generation by applying appropriate
preprocessing steps, which may organize sentences better and help in generating proper responses.
We start with the existing text generative models, CNN and LSTM, try to improve them, and develop
an additional GRU model to compare results. We focus on evaluating the generated responses from
two aspects: the number of word matches between the reference response and the generated
response, and their semantic similarity.
The rest of this paper is organized as follows. Section II reviews the related work. The
methodological approach is described in Section III. Dataset collection and analysis are detailed
in Section IV. The implementation strategy and results of this project are discussed in Section V.
Finally, the conclusion of the project and its future work are provided in Sections VI and VII
respectively.
II. LITERATURE REVIEW
Developing computational conversational models (chatbots) has held the attention of AI scientists
for a number of years. Modern intelligent conversational and dialogue systems draw principles from
many disciplines, including philosophy, linguistics, computer science, and sociology [5]. This
section explores previous work on chatbots and their implementations.
A. Chatbots Applications and Uses
Artificial dialogue systems are interactive talking machines called chatbots. Chatbot applications
have been around for a long time; the first well-known chatbot is Joseph Weizenbaum's Eliza
program, developed in the mid-1960s. Eliza facilitated the interaction between human and machine
through simple pattern matching and a template-based response mechanism to emulate conversation
[6, 7].
Chatbots have become important in many areas of life. One of their primary uses is in education,
as a question answering system for a specific knowledge domain. In [8], the authors proposed a
system implemented as a personal agent to assist students in learning the Java programming
language. The developed prototype was evaluated to analyze how users perceive the interaction with
the system. Students can also get help in registering and dropping courses by using a chatbot
specialized in student administrative problems, as mentioned in [9]. The student administration
chatbot gives colleges 24/7 automated query resolution and gives students the right information
from a trusted source.
Information technology (IT) service management is another important application area for
enterprise chatbots. In many organizations and companies, the IT service desk is one of the
essential departments that helps to ensure the continuity of work and solves the technical
problems that employees and clients face. This variability demands manual intervention and
supervision, which affects the speed and quality of process execution. IT service providers are
under competitive pressure to continually improve their service quality and reduce operating costs
through automation. Hence, they need to adopt chatbots in order to speed up the work and ensure
its quality [10].
On the medical side, the field of healthcare has developed considerably in recent years, driven by
the use of information technology and AI. In [11], the authors proposed a mobile healthcare
application as a chatbot that gives fast treatment advice in response to accidents that may occur
in everyday life, and in response to sudden health changes that can affect patients and threaten
their lives.
Customer service agents are an application of chatbot technology in business, used to solve
customer problems and support the sales process. As companies become globalized in the new era of
digital marketing and artificial intelligence, brands are moving online to enhance the customer
purchasing experience and to provide new technical support channels for after-sales problems.
Moreover, fashion brands such as Burberry, Louis Vuitton, Tommy Hilfiger, Levi's, H&M, and eBay
are increasing the popularity of e-service agents [1].
B. Natural Language Processing
NLP allows users to communicate with computers in a natural way. The process of understanding
natural language can be decomposed into syntactic and semantic analysis. Syntax refers to the
arrangement of words in a sentence such that they make grammatical sense. Syntactic analysis
transforms sequences of words into structures that show how these words are related to each other.
Semantics, on the other hand, refers to the meaning of each word and sentence. The semantic
analysis of natural language content captures the real meaning; it processes the logical structure
of sentences to find the similarities between words and understand the topic discussed in the
sentences [12].
As part of the text mining process, the text needs considerable modification and cleaning before
it is used in prediction models. As mentioned in [13], the preprocessing steps include removing
URLs, punctuation marks, and stop words (such as a, most, and, is, and so on), because those words
do not contain any useful information. The steps also include tokenizing, which is the process of
breaking the text into single words, and stemming, which means reducing a word to its root, such
as "happiness" to "happy". For feature extraction, the authors use Bag of Words (BoW) to convert
the text into a set of feature vectors in numerical format. BoW transforms all texts into a
dictionary that consists of all words in the text paired with their word counts. Vectors are then
formed based on the frequency of each word appearing in the text.
Before entering the data into a model or a classifier, it is necessary to make sure that the data
are suitable, convenient, and free of outliers. In [14], the authors explain how to preprocess the
text data. The main idea was to simplify the text so that the classifier learns the features
quickly. For example, the
names can be replaced with a single feature {{Name}} in the feature set, instead of having the
classifier learn 100 names from the text as features. This helps in grouping similar features
together to build a better predictive classifier. In addition, emoticons and punctuation marks are
converted to indicators (tags). A list of emoticons is compiled from online sources and grouped
into categories. Other punctuation marks that are not relevant to the coding scheme are removed.
Chat language contains many abbreviations and contractions, in the form of short forms and
acronyms, that have to be expanded. Short forms are shorter representations of a word made by
omitting or replacing a few characters, e.g., grp → group and can't → cannot. The authors created
a dictionary of these words from the Urban Dictionary to replace abbreviations with their
expansions. Spell checking is performed as the next step of the pre-processing pipeline on all
word tokens, excluding the ones tagged in the previous steps [14].
Reducing the number of distinct words as much as possible during the text pre-processing phase is
very important for grouping similar features and obtaining a better prediction. In [15], the
authors suggest processing the text through stemming and lowercasing of words to reduce
inflectional forms and derivational affixes. The Porter Stemming algorithm is used to map
variations of words (e.g., run, running and runner) into a common root term (e.g., run).
Words cannot be used directly as inputs to machine learning models; each word needs to be
converted into a feature vector. In [4], the authors adopt the Word2vec word embedding method to
learn word representations of customer service conversations. Word2vec's idea is that each
dimension of the embedding is a possible feature of the word, which can capture useful grammatical
and semantic properties. Moreover, they tokenize the data by building a vocabulary of the most
frequent 100K words in the conversations.
C. Machine Learning Algorithm and Evaluation
A large number of researchers use artificial intelligence and deep learning techniques to develop
chatbots with different algorithms and methods. In [16], the authors use a repository of
predefined responses and a model that ranks these responses to pick an appropriate response for a
user's input. They also propose a topic aware convolutional neural tensor network (TACNTN) model
to classify whether or not a response is proper for a message. The matching model is used to
select a response for a user message in three stages: pre-processing the message, retrieving
response candidates from the pre-defined message-response pair index, and ranking the response
candidates with a pre-trained matching model.
In [17], the authors train two word-based machine learning models, a convolutional neural network
(CNN) and a bag of words SVM classifier. The resulting scores are measured by the Explanatory
Power Index (EPI). EPI is used to determine how much individual words contribute to the
classification decision and to filter relevant information without an explicit semantic
information extraction step.
The customer service agent is an important chatbot that maps conversations from request to
response using the sequence-to-sequence model. A sequence-to-sequence model has two networks: one
works as an encoder that maps a variable-length input sequence to a fixed-length vector, and the
other works as a decoder that maps the vector to a variable-length output sequence. In [4], the
authors generate word-embedding features and train word2vec models. They trained LSTMs jointly
with five layers and 640 memory cells, using stochastic gradient descent for optimization together
with gradient clipping. To evaluate the model, the system was compared with the responses of
actual human agents, and the similarity was measured by human judgments and by the automatic
evaluation metric BLEU.
Having reviewed work on conversational systems, text generation in English, and the role of social
media in customer support, this paper proposes work that aims to fill the gap left by the limited
number of studies on conversational systems for customer support, especially in the Twitter
environment. The hypothesis of this project is that the automated responses generated by different
deep learning algorithms such as LSTM, CNN, and GRU can be improved, compared against each other,
and evaluated using BLEU and cosine similarity. As a result, this project will help to improve the
text generation process in general, and in the customer support field in particular.
III. METHODOLOGICAL APPROACH
This section discusses the background of the implemented methods, explains why these methods are
appropriate, and gives an overview of the project methodology.
A. Text Generative Model
This project generates a proper response to every customer query on social media, which requires
sequence-to-sequence learning. Sequence-to-sequence learning means mapping a sequence of words
representing the query to another sequence of words representing the response; the lengths of
queries and responses can differ. This can be achieved with NLP and deep learning techniques.
Sequence-to-sequence models are used in many fields, including chat generation, text translation,
speech recognition, and video captioning. As shown in Fig. 1, a sequence-to-sequence model
consists of two networks, an encoder and a decoder. The input text enters the encoder network in
reverse order and is converted into a fixed-length context vector, which is then used by the
decoder to generate the output sequence [18].
Fig. 1. Sequence to Sequence Model.
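To make the encoder half of Fig. 1 more concrete, the following minimal sketch runs an untrained toy recurrent encoder over a reversed sequence of token indexes and returns a fixed-length context vector. It is not the model used in this project: the function name encode, the hidden size, the toy vocabulary size, and the random weights are all assumptions chosen for illustration.

```python
import math
import random

HIDDEN_SIZE = 4   # assumed context-vector length, for illustration only
VOCAB_SIZE = 10   # assumed toy vocabulary size

rng = random.Random(0)  # fixed seed so the toy weights are reproducible
# One input weight vector per vocabulary index (acts like an embedding lookup).
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)] for _ in range(VOCAB_SIZE)]
# Square recurrent weight matrix applied to the previous hidden state.
W_rec = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)] for _ in range(HIDDEN_SIZE)]


def encode(token_ids):
    """Return a fixed-length context vector for a variable-length input."""
    hidden = [0.0] * HIDDEN_SIZE
    # The input is fed to the encoder in reverse order, as described in the text.
    for token in reversed(token_ids):
        hidden = [
            math.tanh(
                W_in[token][j]
                + sum(W_rec[i][j] * hidden[i] for i in range(HIDDEN_SIZE))
            )
            for j in range(HIDDEN_SIZE)
        ]
    return hidden


short_context = encode([3, 7])
long_context = encode([1, 2, 3, 4, 5, 6])
```

Both calls return a vector of the same length even though the inputs differ in length, which is the property the decoder relies on. A trained model would learn the weights rather than draw them at random.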
Before the sequence of words is fed into the encoder
model, it must be converted into a numerical format
using NLP techniques. This project uses Bag of Words (BoW)
vector representations, one of the most commonly used
traditional vector representations for text.
BoW transforms all texts into a
dictionary that consists of all words that appear in the
document [13]. It then creates a numeric feature
vector for each text.
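As a small illustration of the BoW idea, the sketch below builds a vocabulary from two example sentences and produces one word-count vector per sentence. The function name bag_of_words and the example sentences are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter


def bag_of_words(texts):
    """Build a sorted vocabulary and one word-count vector per text."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One position per vocabulary word, holding that word's count.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


texts = ["my order is late", "is my order lost"]
vocabulary, vectors = bag_of_words(texts)
# vocabulary -> ['is', 'late', 'lost', 'my', 'order']
# vectors    -> [[1, 1, 0, 1, 1], [1, 0, 1, 1, 1]]
```

Note that word order is lost in this representation: both vectors only record which words occur and how often.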
B. Deep Learning Models
Fig. 3. The Architecture of RNN.
IV. IMPLEMENTATION STRATEGY
In this section, we explain the methodology followed in this project. First, the dataset is
prepared for modeling. The preparation process includes preprocessing and feature extraction.
The models are then trained on a training set and evaluated on a test set.
The experiments use three different models: LSTM, GRU, and CNN. The models are trained on a
dataset of around 700k query-response pairs and tested on 30k unseen pairs. Training time is
between 5 and 12 hours, depending on the model (see Table III).
A. Data Preprocessing
Raw text cannot be fed directly into machine learning or
deep learning methods. Therefore, the texts must first be
cleaned of impurities such as punctuation, expression
codes, and non-English words (Chinese, Spanish, French,
and others). To do this, a number of Python libraries are
used, including re (regular expressions), unicodedata,
langdetect, and contractions.
In this project, the preprocessing steps are: removing
links, images, Twitter IDs, numbers, punctuation, emoji,
and non-English words, and replacing abbreviations with
their long forms. Table II illustrates the changes in the
dataset before and after applying these steps.
The preprocessing steps are chosen carefully, because not
all preprocessing techniques suit this kind of project.
For example, stop word removal and stemming cannot be
applied because they would damage the sentence structure
and therefore the text generation process.
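The cleaning steps listed above can be sketched with the standard library alone. This is an illustrative approximation, not the project's code: the function name clean_tweet and the small ABBREVIATIONS table are hypothetical, and language detection (langdetect) and contraction expansion (contractions) are left out so that the example has no third-party dependencies.

```python
import re
import unicodedata

# Hypothetical abbreviation table; the project's actual list is not given.
ABBREVIATIONS = {"pls": "please", "dm": "direct message", "thx": "thanks"}


def clean_tweet(text):
    """Remove links, Twitter IDs, numbers, emoji, and punctuation from one tweet."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # links
    text = re.sub(r"@\w+", " ", text)                    # Twitter IDs
    text = re.sub(r"\d+", " ", text)                     # numbers
    # Drop emoji and other non-ASCII symbols after Unicode normalization.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z\s]", " ", text)                # punctuation
    # Replace known abbreviations with their long forms.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


cleaned = clean_tweet("@AmazonHelp pls check order 12345!! \U0001F621 https://t.co/abc")
# cleaned -> 'please check order'
```

Stop words are deliberately kept and no stemming is applied, in line with the reasoning above. In a fuller pipeline, contractions would need to be expanded before the punctuation step, since removing the apostrophe first would split a word such as "can't" in two.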
B. Feature Extraction
Before any modeling, the dataset needs to be transformed
into a numerical format suitable for training. The Bag of
Words (BoW) concept is applied to extract features from
the text dataset. First, all of the texts in the dataset
are split into arrays of tokens (words). Then, a
vocabulary dictionary is built that maps every word in
the dataset to its corresponding index value. Each array
of words is then converted to an array of indexes. This
process can be carried out with scikit-learn's
CountVectorizer class.
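The token, vocabulary, and index steps can be sketched as follows. The sketch uses the standard library as a stand-in for CountVectorizer, and the names build_vocabulary, to_indexes, and FIRST_WORD_INDEX are hypothetical. Keeping the lowest indexes free for special tokens is an assumption, and unknown words are simply skipped here.

```python
from collections import Counter

FIRST_WORD_INDEX = 4  # assumption: lower indexes are kept free for special tokens


def build_vocabulary(texts, max_size):
    """Map the max_size most frequent words to integer indexes."""
    counts = Counter(word for text in texts for word in text.split())
    most_common = [word for word, _ in counts.most_common(max_size)]
    return {word: FIRST_WORD_INDEX + i for i, word in enumerate(most_common)}


def to_indexes(text, vocabulary):
    """Convert one text to a list of indexes, skipping out-of-vocabulary words."""
    return [vocabulary[word] for word in text.split() if word in vocabulary]


texts = ["please check my order", "my order is late", "my package is late"]
vocabulary = build_vocabulary(texts, max_size=5)
indexes = to_indexes("my order is late", vocabulary)
```

Capping the vocabulary with max_size keeps only the most frequent words, so rare words fall outside the dictionary and need separate handling.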
To handle variable-length sentences, a maximum sentence
length must be fixed. Sequences shorter than this length
are padded by filling the remaining positions with a fixed
value ('1' in this case) so that all sequences have the
same length. Words that are not in the vocabulary
dictionary are represented by UNK, short for unknown. In
addition, each output text in the dataset starts with a
start flag ('2' in this case) to help in training. The
dataset is then ready for training.
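A minimal sketch of this padding scheme is shown below. The padding value 1 and the start flag 2 follow the text. The UNK index 3, the maximum length of 6, the toy vocabulary, and the function names are assumptions made for the example.

```python
PAD, START, UNK = 1, 2, 3   # PAD and START follow the text; UNK = 3 is an assumption
MAX_LEN = 6                 # assumed maximum length for this toy example

# Hypothetical vocabulary; real indexes come from the feature-extraction step.
vocabulary = {"my": 4, "order": 5, "is": 6, "late": 7}


def encode_input(text, vocabulary, max_len=MAX_LEN):
    """Index a query, map unknown words to UNK, then truncate and pad."""
    ids = [vocabulary.get(word, UNK) for word in text.split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def encode_output(text, vocabulary, max_len=MAX_LEN):
    """Index a response with a leading start flag, then truncate and pad."""
    ids = [START] + [vocabulary.get(word, UNK) for word in text.split()]
    ids = ids[:max_len]
    return ids + [PAD] * (max_len - len(ids))


query_ids = encode_input("my parcel is late", vocabulary)
# query_ids -> [4, 3, 6, 7, 1, 1]   ("parcel" is unknown, so it becomes UNK)
response_ids = encode_output("my order is late", vocabulary)
# response_ids -> [2, 4, 5, 6, 7, 1]
```

Every encoded sequence has exactly MAX_LEN entries, which allows the sequences to be stacked into fixed-size batches.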
C. Modeling
The experiments were run on Google Colaboratory and
Crestle, cloud services that provide GPU-enabled Jupyter
environments with powerful computing resources. Popular
scientific computing and deep learning packages are
pre-installed and configured to run on a GPU.
TABLE. II. THE CHANGES IN TEXT BEFORE AND AFTER APPLYING PREPROCESSING STEPS
TABLE. IV. THE COMMON PARAMETERS USED IN LSTM, GRU AND CNN MODELS
Parameter                        Value
Word embedding dimension size    100
Vocabulary size                  10,000
Context dimension size           100
Learning rate                    0.001
Optimization function            Adam
Batch size                       1000 (the maximum that our computer can handle)
Max message length               30
Fig. 10. The BLEU Scores for 1, 2, 3 and 4 Grams.
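The two measures reported in Fig. 10 and Fig. 11 can be illustrated with a short sketch. It computes the clipped n-gram precision that BLEU is built on, not the full BLEU score, which also combines the precisions across n-gram orders and applies a brevity penalty. The cosine similarity is computed on word-count vectors, which is an assumption, since the vector representation used for the reported scores is not stated here. The function names and both example sentences are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


reference = "sorry to hear this please dm us your tracking number"
candidate = "sorry to hear this please dm us your order number"
unigram_precision = modified_precision(reference, candidate, 1)  # 9 of 10 unigrams match
bigram_precision = modified_precision(reference, candidate, 2)   # 7 of 9 bigrams match
similarity = cosine_similarity(reference, candidate)
```

Higher-order n-grams are stricter because a single wrong word breaks every n-gram that contains it, which is why the bigram precision here is lower than the unigram precision.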
Fig. 11. The Cosine Similarity Scores.
TABLE. VI. EXAMPLE OF EMOTIONAL QUERY AND RESPONSES FROM ALL MODELS
Customer Query               my package is days late and i am leaving tomorrow on holidays could you
                             please help it is extremely
Customer Support Response    sorry to hear this please dm us your tracking and phone number
LSTM Generated Response      i am sorry for the trouble with your order please report this to our
                             support team here and we will check this
GRU Generated Response       i am sorry for the trouble with your order please reach out to us here
                             and we will look into this for you please do not provide your order
                             details
REFERENCES
[1] M. Chung, E. Ko, H. Joung, and S. J. Kim, “Chatbot e-service and customer satisfaction
    regarding luxury brands,” J. Bus. Res., Nov. 2018.
[2] J. Hill, W. Ford, and I. Farreras, “Real conversations with artificial intelligence: A
    comparison between human–human online conversations and human–chatbot conversations,”
    Comput. Human Behav., 2015.
[3] J. Hirschberg and C. D. Manning, “Advances in natural language processing,” Science,
    vol. 349, no. 6245, pp. 261–266, Jul. 2015.
[4] A. Xu, Z. Liu, Y. Guo, V. Sinha, and R. Akkiraju, “A New Chatbot for Customer Service on
    Social Media,” in Proceedings of the 2017 CHI Conference on Human Factors in Computing
    Systems - CHI ’17, 2017, pp. 3506–3510.
[5] S. Oraby, P. Gundecha, J. Mahmud, M. Bhuiyan, and R. Akkiraju, “Modeling Twitter Customer
    Service Conversations Using Fine-Grained Dialogue Acts,” in Proceedings of the 22nd
    International Conference on Intelligent User Interfaces - IUI ’17, 2017, pp. 343–355.
[6] H. Shah, K. Warwick, J. Vallverdú, and D. Wu, “Can machines talk? Comparison of Eliza with
    modern dialogue systems,” Comput. Human Behav., vol. 58, pp. 278–295, May 2016.
[7] R. Dale, “The return of the chatbots,” Nat. Lang. Eng., vol. 22, no. 05, pp. 811–817,
    Sep. 2016.
[8] M. Coronado, C. A. Iglesias, Á. Carrera, and A. Mardomingo, “A cognitive assistant for
    learning java featuring social dialogue,” Int. J. Hum. Comput. Stud., vol. 117, pp. 55–67,
    Sep. 2018.
[9] S. Jha, S. Bagaria, L. Karthikey, U. Satsangi, and S. Thota, “Student Information AI
    Chatbot,” International Journal of Advanced Research in Computer Science, vol. 9, no. 3,
    2018.