
(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 11, No. 12, 2020

Adaptive e-Learning AI-Powered Chatbot based on Multimedia Indexing

Salma El Janati1, Abdelilah Maach2, Driss El Ghanami3
LRIE Laboratory, Mohammadia School of Engineers (EMI)
Mohammed V University, Rabat, Morocco

Abstract—With the rapid evolution of e-learning technology, multiple sources of information become more and more accessible. However, the availability of a wide range of e-learning offers makes it difficult for learners to find the right content for their training needs. In this context, our paper aims to design an e-learning AI-powered Chatbot that interacts with learners and suggests the e-learning content adapted to their needs. To achieve these objectives, we first analysed the e-learning multimedia content to extract the maximum amount of information. Then, using Natural Language Processing (NLP) techniques, we introduced a new approach to extract keywords. After that, we suggest a new approach for multimedia indexing based on the extracted keywords. Finally, the Chatbot architecture is realized on top of the multimedia indexing and deployed on online messaging platforms. The suggested approach aims to provide an efficient way to represent multimedia content based on keywords. We compare our approach with approaches from the literature and deduce that the use of keywords in our approach results in a better representation and reduces the time needed to construct the multimedia index. The core of our Chatbot is this indexed multimedia content, which enables it to look up information quickly. Our designed Chatbot therefore reduces response time and meets the learner's needs.

Keywords—e-Learning; Chatbot; Speech-To-Text; NLP; Keywords Extraction; Text Clustering; Multimedia Indexing

I. INTRODUCTION

With the rapid evolution of the IT field and the continuous updating of available tools, learners must undergo continuous training in order to improve their skills and keep up with technology [1]. Indeed, training has been strongly affected by the digital transformation. e-Learning (online training) is a perfect example, with the digitization of learning. This distance learning technique eliminates the physical presence of a trainer [2]. It has many advantages and has become part of the learner journey. However, the quantity of multimedia content offered to the learner increases exponentially, which makes autonomous learning a complicated task, because it is necessary to choose the content adapted to the learner's context to ensure effective learning [3].

In this context, our study aims to create an indexed database of e-learning multimedia content in order to set up a Chatbot system that offers the appropriate content to each learner according to his or her e-learning needs. Advancements in artificial intelligence and NLP allow bots to converse more and more like real people while sharing e-learning content. Modern Chatbots do not depend exclusively on text: they frequently display rich cards, pictures, links, and forms, giving an app-like experience.

One of the most difficult areas of research is the development of effective Chatbots that emulate human dialogue. It is a difficult task that involves problems related to the Natural Language Processing (NLP) research field [4]. Thanks to NLP techniques and algorithms, it is possible to understand learners' requests based on what the learner is writing. Usually, this task is the core of the Chatbot, but there are some limitations: it is not possible to map all learner requests, and current Chatbots do not show remarkable performance due to the unpredictability of the learner's thinking during a conversation [5]. One of the most important tasks in setting up a Chatbot is the design of the conversational flow. We therefore suggest a new approach for a successful conversation based on keyword extraction, which makes it possible to handle all learners' requests and provide the adequate content.

This paper is structured as follows: the second section covers related work. In the third section, we present the suggested approach. The fourth section is devoted to our results and simulation. Finally, we conclude with a recommendation approach to improve the suggested method.

II. RELATED WORK

The use of advanced data science technology makes it possible to improve the quality of e-learning content. In this paper, we suggest a framework that uses Natural Language Processing (NLP) and keyword extraction as a chatbot engine for e-learning. Thus, in this section we review the literature related to chatbots and Automatic Keywords Extraction (AKE).

A. Chatbot

Chatbots are virtual assistants capable of chatting with users and responding to their requests. They increasingly use speech synthesis techniques to produce their messages as the user types their interventions into a text field on the web page.

The design of a Chatbot that meets the needs of users has always been a concern in the field of information retrieval [4]. Depending on how Chatbots are programmed, we can divide them into two large groups: those that are programmed according to predefined commands (rule-based Chatbots) [5] and those based on artificial intelligence (AI) [6].
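To make the rule-based family concrete, here is a minimal sketch of our own (not the paper's system, and the rules and replies are invented for illustration): each rule is a pattern mapped to a canned reply, so the bot can only answer requests its authors anticipated.

```python
import re

# Each rule pairs a regular expression with a canned reply (illustrative only).
RULES = [
    (re.compile(r"\b(hello|hi)\b", re.I), "Hello! What would you like to learn today?"),
    (re.compile(r"\bpython\b", re.I), "I can suggest Python video courses."),
    (re.compile(r"\bbye\b", re.I), "Good luck with your training!"),
]

def rule_based_reply(message: str) -> str:
    for pattern, reply in RULES:
        if pattern.search(message):
            return reply
    # Unmapped requests fall through: the limitation of predefined commands.
    return "Sorry, I don't understand that request."

print(rule_based_reply("Hi there"))
print(rule_based_reply("Any python course?"))
```

AI-based Chatbots replace this fixed rule table with a learned model of the user's intent, as the following paragraphs describe.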

299 | Page
www.ijacsa.thesai.org

AI Chatbots using machine learning are designed to understand the context and intent of a question before formulating an answer. There are two types of AI Chatbots: generative Chatbots [7] and information retrieval Chatbots [4]. Fig. 1 presents the Chatbot types.

The generative Chatbot is built to be able to react to any context or interlocutor, including non-programmed situations. Such a conversational agent relies on new artificial intelligence techniques such as deep learning and neural networks to generate its responses word by word [8]. Thus, these bots can construct answers to users' questions themselves. The problem with this approach is that it is too general: with current models, such Chatbots struggle to hold a coherent conversation, or even to produce syntactically valid sentences.

Information retrieval Chatbots are chatbots adapted to a given context, which build their responses using a set of sentences given to them in advance. These chatbots retrieve information from the user's question and seek the most suitable answer using NLP [9]. This type of Chatbot is best suited for closed-domain systems. This approach guarantees grammatically correct answers and simplifies the learning task for the algorithm, because it allows constructing the model on a training data set much smaller than the large amount of data required for a generative Chatbot [4].

Thus, in our case study we chose to design a Chatbot based on information retrieval, since it allows us to exploit the results obtained from video indexing and to keep some control over the responses generated by the Chatbot.

Our contribution consists of using a keyword extraction technique adapted to the multimedia e-learning content we have. Thus, the Chatbot will be based on the keywords instead of all the text, in order to find the adequate answer to the learner's need in a fast and efficient way. The next section is dedicated to the keyword extraction techniques found in the literature.

Fig. 1. Chatbots Types.

B. Automatic Keywords Extraction (AKE)

Keyword extraction involves identifying the words and phrases representing the main subjects of a document. High-quality keywords can make it easier to understand, organize and access the content of the document. AKE from a document has been used in many applications, such as information retrieval [10], text synthesis [11], text categorization [12], and opinion mining [13].

Most existing keyword extraction algorithms address this problem in three steps (Fig. 2). First, the candidate keywords (i.e. words and phrases that can be used as keywords) are selected from the content of the document. Second, candidates are either ranked using a candidate weighting function (unsupervised approaches) or classified into two classes (keyword / not keyword) using a set of extracted features (supervised approaches). Third, the N candidates with the highest weights or confidence scores are selected as keywords [14].

Fig. 2. General approach for Automatic Keywords Extraction.

In this section, we provide an overview of keyword extraction methods. Our goal is not to detail the functioning of these methods, but rather to give an overview of their basic principles and their classification, in order to identify the methods suitable for our case study.

1) Statistic approach: Statistical approaches are considered among the simplest techniques used to identify keywords within a text. These approaches do not require training data to extract the most important keywords. They seek to define what a keyword is based on certain statistical features, and study the relation of these features with the notion of importance of a candidate term: the more a candidate term is considered important in the analysed document, the more relevant it will be as a keyword.

TF-IDF [15] and Likey [16] are two methods which compare the behavior of a candidate term in the analysed document with its behavior in a collection of documents (reference corpus). The objective is to find candidate terms whose behavior in the document differs positively from their overall behavior in the collection. In both methods, this is expressed by the fact that a term is important in the analysed document if it is largely present there while being rare in the rest of the collection.

The Yake! approach [17] focuses on statistical features that do not require external dictionaries and relies only on characteristics that can be calculated from the current document, such as the position of the first occurrence of a candidate, the word frequency, the case, and the frequency with which a word appears in different sentences.
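To illustrate the statistic family, here is a small self-contained TF-IDF sketch of our own (a plain tf·idf score against a reference corpus, not the exact variants of [15] or [16]; the toy corpus is invented):

```python
import math
from collections import Counter

def tfidf_keywords(doc: str, corpus: list[str], top_n: int = 3) -> list[str]:
    """Rank the words of `doc` by TF-IDF against a reference corpus."""
    tokenize = lambda text: text.lower().split()
    doc_tokens = tokenize(doc)
    tf = Counter(doc_tokens)
    n_docs = len(corpus) + 1  # reference corpus plus the analysed document
    scores = {}
    for word, count in tf.items():
        # document frequency: in how many documents the word appears
        df = 1 + sum(word in tokenize(other) for other in corpus)
        # high score = frequent in this document, rare in the collection
        scores[word] = (count / len(doc_tokens)) * math.log(n_docs / df)
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]

corpus = ["learners watch video lessons online",
          "online training platforms host many lessons"]
doc = "chatbot chatbot recommends video lessons"
print(tfidf_keywords(doc, corpus))  # 'chatbot' ranks first: absent from the corpus
```

Note how "lessons", present in every document, gets an IDF of zero and is excluded, matching the behavior described above.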


2) Graph-based approach: Graph-based approaches consist of representing the content of a document as a graph. The methodology comes from PageRank [18], an algorithm for ranking web pages (graph nodes) based on the recommendation links that exist between them (graph edges). TextRank [19] and SingleRank [20] are the two basic adaptations of PageRank for AKE. In these, web pages are replaced by text units whose granularity is the word, and an edge is created between two nodes if the words they represent co-occur in a given word window. When running the algorithm, a score is associated with each candidate keyword, which represents the importance of its node in the graph.

To improve TextRank / SingleRank, Liu et al. suggest a method which aims to increase the coverage of all the key terms extracted from the analysed document (TopicalPageRank) [21]. To do this, they refine the importance of words in the document by taking into account their rank in each topic. The rank of a word for a topic is obtained by integrating into its PageRank score the probability that it belongs to the topic. The overall rank of a candidate term is then obtained by merging its ranks over all topics.

3) Word embedding approach: Word embedding [22] is a way of representing words as vectors, typically in a space of a few hundred dimensions: a word is transformed into numbers. The word representation vector is learned by an iterative algorithm from a large amount of text. The algorithm tries to place the vectors in space so as to bring semantically close words together and move semantically distant words apart. By finding the closest words in the embedding space of a given input word, the model can identify synonyms, or intruders in a list of words. Once a model is obtained, several standard tasks become possible.

Different methods based on word embedding have been suggested for representing entire documents or sentences [23]. Skip-thought [24] provides sentence embeddings trained to predict neighbouring sentences. Sent2Vec [25] generates sentence embeddings using word n-gram representations.

With word embeddings, keywords can be extracted using cosine similarity in the word space representation. EmbedRank [26] is a word-embedding-based approach to automatically extract key phrases. In this method, documents or word sequences of arbitrary length are embedded into the same feature space. This enables computing the semantic relatedness between a document and a candidate keyword using the cosine similarity measure.

4) Supervised approach: Supervised approaches are methods able to learn to perform a particular task, in this case the extraction of keywords. Learning is done through a corpus whose documents are annotated with keywords. The annotation allows extracting the examples and counter-examples whose statistical and/or linguistic features are used to train a binary classifier [27]. These classifiers indicate whether a candidate term is a keyword or not. Many supervised algorithms are used in various fields, and they can adapt to any task, including the AKE task. The algorithms used for this build probabilistic models, decision trees, Support Vector Machines (SVM) or even neural networks.

KEA is a method which uses naive Bayesian classification to assign a likelihood score to each candidate term, the aim being to indicate whether it is a keyword or not [28]. This approach uses three conditional distributions learned from the training corpus. The first is the probability that a candidate term is labelled yes (keyword) or no (not a keyword). The other two stand for two different statistical features: the TF-IDF weight of the candidate term and its first position in the document.

Nguyen et al. propose an improvement (WINGNUS) [29] by adding a set of features such as: first and last occurrences (word offset), length of phrases in words, whether a phrase is also part of the document title, and the number of times a phrase appears in the titles of other documents. Adding these features improves the performance of the original version of KEA, but only when the amount of data is large enough.

III. SUGGESTED APPROACH

In this section, we suggest an approach for an adaptive Chatbot based on information retrieval. This approach consists of indexing e-learning multimedia using keyword extraction. These indexed contents are integrated in the Chatbot engine in order to offer the adequate content for the learner's needs.

The suggested approach is based on four steps, which will be detailed below:

1) Extracting metadata from the e-learning database.
2) Speech-to-text processing.
3) Automatic keywords extraction.
4) Suggested Chatbot design.

The first step of our approach consists of analyzing the e-learning multimedia content and extracting as much information as possible from it in order to build our database. The second step is devoted to standardizing the e-learning content by transforming all content to text, using Speech-To-Text techniques to extract text from all multimedia content. In the third step, we suggest methods for extracting keywords from the extracted text; we test several approaches and suggest a new approach adapted to our problem. The last step describes the chatbot design based on the keywords resulting from the previous steps. Fig. 3 summarizes the methodology of our approach.

A. Extracting Metadata from e-Learning Database

In our study, we use various sources of e-learning content in order to construct a database, which will be the base of our Chatbot recommendation. e-Learning sources provide content in different types of information, such as video, speech, and text. The first step of our approach consists of extracting as much information as possible from the e-learning content in order to build our database.
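The paper's actual extraction script is shown in its Fig. 4, which is not reproduced here; as a hedged sketch, the metadata record described in the next subsection (ID, title, category, description, subtitle, publication date, author, and the audio track for videos) could be modelled like this, with field and function names of our own choosing:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative record mirroring the metadata fields the paper lists;
# the names are assumptions, not the authors' schema.
@dataclass
class ElearningMetadata:
    content_id: str
    title: str
    category: str
    description: str
    author: str
    published: str                    # online publication date
    subtitle: Optional[str] = None    # only when the author provides one
    audio_path: Optional[str] = None  # extracted audio for video content

def to_record(meta: ElearningMetadata) -> dict:
    """Flatten one item into a row for the indexing database."""
    return {
        "id": meta.content_id,
        "title": meta.title,
        "category": meta.category,
        "has_subtitle": meta.subtitle is not None,
        # A video with no subtitle must go through the ASR step below.
        "needs_speech_to_text": meta.audio_path is not None and meta.subtitle is None,
    }

row = to_record(ElearningMetadata("42", "Intro to NLP", "AI", "Basics of NLP",
                                  "J. Doe", "2020-12-01", audio_path="42.wav"))
print(row["needs_speech_to_text"])  # True: no subtitle, so speech-to-text applies
```

The `needs_speech_to_text` flag captures the routing decision of the normalization architecture described later (use the author's transcript when available, otherwise run ASR).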


Fig. 3. Methodology Flow Chart.

In order to extract metadata from the e-learning multimedia, we designed a Python script that automatically retrieves the metadata. This data frame is rich in information; it mainly contains: the ID, the title, the category, the description, the subtitle if it is available, the date of the online publication, and the author. In addition, the metadata contains the audio of the multimedia if it is a video content. This metadata information is used in the Chatbot engine in order to organize the multimedia e-learning content and to give easy content access. Fig. 4 shows the metadata extraction script.

Fig. 4. Metadata Extraction Script.

The second step of our approach is to standardize the e-learning content by transforming all types into text (video to text and speech to text). The next sub-section is dedicated to describing the speech-to-text approach used in our case study.

B. Speech to Text Processing

Automatic Speech Recognition (ASR) is the process by which speech is transcribed into text. This technology has many useful applications, ranging from hands-free car interfaces to home automation. Although speech recognition is an easy task for humans, it has always been difficult for machines. Since 2012, the use of Deep Neural Networks (DNN) has considerably improved the accuracy of speech recognition [30].

Deep-learning-based ASR systems today have achieved even better results than humans in languages like English (Baidu's Deep Speech 2) [31]. These advances have been propelled by the use of large amounts of data (up to tens of thousands of hours of transcribed speech) and by the enormous parallel computing power provided by GPUs.

The general deep-learning-based ASR approach, currently used by almost all systems offered by large groups such as Google and IBM, follows the architecture presented in Fig. 5. It generally starts by converting the audio into a feature matrix to feed into the neural network. This is done by creating spectrograms from the audio waveform; the spectrogram input can be considered as a vector at each timestamp. A 1D convolutional layer extracts features from each of these vectors to provide a sequence of feature vectors for the LSTM layer to process. The (Bi)LSTM output at each time step is passed to a fully connected layer, which gives a probability distribution over the characters at that time step using a softmax activation.

Fig. 5. Global Architecture of ASRs based on Deep Learning.

Automatically recognizing speech with the assistance of a computer is a difficult task, due to the complexity of human language. To solve this problem, several challenges must be addressed: poor microphone quality, background noise, speaker variability, and so on. All these possibilities must be included in the training step for the deep network to handle them. Thus, to create a voice-recognition system that achieves the performance of Siri, Google Now or Alexa, it is mandatory to have a large amount of data for the training step.

Given the difficulty of acquiring a massive database to train our own ASR system, we chose in our study to evaluate the APIs that are available on the market. Thus, we developed a Python script that calls different speech recognition APIs using the 'SpeechRecognition' library.

In order to compare the different automatic transcription methods proposed by the APIs, we need to evaluate the performance of each model. The key metric for transcription is accuracy: how closely the words in the generated transcript match the spoken words in the original audio.

To calculate the accuracy of the automatic transcription, we use the Word Error Rate (WER) metric. The WER is a very simple and widely used measure of transcription accuracy. It is calculated as the number of words that need to be inserted, changed or deleted to convert the transcript hypothesis into the reference transcript, divided by the number of words in the reference transcript. (It is the Levenshtein distance for words, measuring the minimum number of single-word edits needed to correct the transcription.) A perfect match has a WER of zero; larger values indicate lower accuracy and thus more editing.

Word Error Rate = (Insertions + Deletions + Substitutions) / (Number of Words in Reference Transcript)
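The WER formula above is word-level Levenshtein distance divided by the reference length; a minimal implementation (our own sketch, not the paper's evaluation script):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (insertions + deletions + substitutions) / reference length,
    computed as the word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum edits to turn hyp[:j] into ref[:i]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                 # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j                 # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on the mat"))  # 0.0
print(word_error_rate("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words
```

A perfect transcript yields 0.0; dropping two of six reference words yields 2/6, matching the formula above.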


Fig. 6 and Table I show the WER evaluation obtained for the APIs' transcriptions compared to the ground truth.

Fig. 6. WER Index of Transcription obtained from the ASR APIs.

TABLE I. WER INDEX OF TRANSCRIPTION OBTAINED FROM ASR APIS

         YouTube   Houndify   Google   IBM     Wit
  WER    97.04     91.12      86.83    86.13   75.23

The results obtained show that the transcription of YouTube is the most efficient, followed by the transcription of Houndify. This is because the YouTube transcription is generally based on the publisher's recommendation, but this transcription is not always available. The Houndify API specializes in detecting lyrics in music, which can explain its good performance, since our e-learning data source mainly contains speech with a music background. We can also notice that the IBM and Google services have similar performance, since they are based on similar resources for the learning phase. Finally, the Wit transcription is the least efficient, which is generally due to its rather limited learning-phase resources.

Based on the results obtained in this step, we decided to adopt the approach shown in Fig. 7 in order to normalize the multimedia e-learning types and extract text from video and audio e-learning content.

The architecture shown in Fig. 7 contains three different processing pipelines, each one adapted to one type of e-learning multimedia:

• Video processing: extract the plain text from the original video if it is provided by the author; otherwise apply a Speech-to-Text API in order to extract text from the video speech.

• Speech processing: apply speech recognition in order to extract text from e-learning speech.

• Text processing: extract the plain text from e-learning text.

This architecture yields a normalized database which contains plain text for every e-learning multimedia source. The next step is to extract from the plain text the important words (keywords) that will represent each multimedia content.

Fig. 7. Architecture of the approach chosen for Multimedia Normalization (Video/Speech to Text Normalization).

C. Automatic Keywords Extraction

In this subsection, we present a keyword extraction evaluation based on the techniques described in our literature review.

1) Keyword extraction approach evaluation: In order to evaluate the performance of the different algorithms, a first approach consists of applying a manual evaluation based on human judgement to decide whether the keywords are representative of the content of a document or not [17]. Nevertheless, manual evaluation of automatic keyword extraction is difficult and time-consuming.

Researchers have therefore developed automatic evaluation systems based on partial correspondence. Automatic key phrase extraction methods have generally been evaluated based on the number of top-N candidates which correctly correspond to the reference keywords. This number is then used to calculate the precision, recall and F-score for a set of keywords.

In the same perspective, we compare the keywords obtained by each approach with the set of tags provided by the authors of some multimedia content, which we consider to be the ground truth. We use the F1 score, which is based on precision and recall. Precision is defined as the number of correctly predicted keywords out of the number of all predicted keywords, and recall is defined as the number of correctly predicted keywords out of the total number of keywords in the ground-truth set. Note that, to determine the correspondence between two keywords, we use the Porter Stemmer as preprocessing, in order to match keywords which have the same root word.

Then, we calculated F1 for the first 20 keywords on our set of e-learning multimedia. Fig. 8 represents box plot graphs for the evaluated algorithms, and Table II presents the statistical measures (min, average and max).
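The F1 evaluation just described can be sketched as follows (our own illustration; the `crude_stem` helper is a deliberately simplified stand-in for the Porter stemmer, and the example keyword lists are invented):

```python
def crude_stem(word: str) -> str:
    """Very rough stand-in for the Porter stemmer used in the paper:
    strip a few common English suffixes (a simplification, not Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def keyword_f1(predicted: list[str], ground_truth: list[str]) -> float:
    """F1 between predicted keywords and author tags, matched on root words."""
    pred = {crude_stem(w.lower()) for w in predicted}
    truth = {crude_stem(w.lower()) for w in ground_truth}
    hits = len(pred & truth)          # correctly predicted keywords
    if hits == 0:
        return 0.0
    precision = hits / len(pred)      # hits / all predicted
    recall = hits / len(truth)        # hits / all ground-truth keywords
    return 2 * precision * recall / (precision + recall)

# 'indexing' and 'indexes' share the root 'index', so they count as a match.
print(keyword_f1(["chatbot", "indexing", "speech"],
                 ["chatbots", "indexes", "learning", "speech"]))  # 6/7 ≈ 0.857
```

Here precision is 3/3 and recall is 3/4, so F1 = 2·(1·0.75)/1.75 = 6/7, showing why stem-level matching credits morphological variants.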


Based on Fig. 8, we note that the F1-scores of all approaches show a very large variability; thus, the relevance of the generated keywords is not stable enough. We can deduce that none of the evaluated approaches is adapted to all the content of our e-learning database, and that each approach generates a set of relevant keywords only for a part of our e-learning corpus. In this context, we suggest a new framework based on ensemble methods in order to improve keyword consistency.

Fig. 8. Box Plot Evaluation Graph for each Keyword Extraction Approach.

TABLE II. STATISTICAL MEASURES (MIN, AVERAGE, MAX) OF THE F1 SCORES FOR EACH KEYWORD EXTRACTION APPROACH

                     Min    Average   Max
  YAKE               1      4.82      10
  TextRank           1      7.18      11.5
  SingleRank         1.5    7.38      11.5
  TopicRank          0      6.44      9.5
  TopicalRank        1.5    8.54      12
  PositionRank       3      8.1       12
  MultipartiteRank   0.5    6.36      10.5
  KEA                1      5.88      9.5
  WINGNUS            3.5    8.14      11.5
  EmbedRank          3      8.26      12

2) Suggested framework for automatic keyword extraction: In order to propose a framework adapted to our case study, our main idea consists of combining the results obtained from all the approaches by using a voting system.

Basically, the first step of our approach is to apply all the AKE methods in order to obtain the first N keywords with their weights according to each method.

After obtaining the keywords with their weights for each method, the second step is to normalize the weight of each keyword by dividing its weight for each method by the weight of the keyword with the maximum weight for that method. Thus, the first keyword of each method has a weight equal to 1:

w_ij = w_ij / max_i(w_ij)    for all i ∈ {keywords}, j ∈ {models}

Then, a global weight for each keyword is calculated by summing the normalized weights. The normalization step avoids favoring one method over another and thus yields a global weight based on a non-discriminatory vote. Also, to avoid generating similar keywords, the weights of keywords with the same root word are grouped together by considering them as a single candidate keyword:

w_i = Σ_{j ∈ models} w_ij    for all i ∈ {keywords}

Finally, candidate keywords are ranked based on their global weight, and the top N are chosen as keywords to represent each e-learning multimedia content.

We notice that the choice of the models considered when building the global weight of each candidate keyword is important. Thus, to set up an approach adequate for our case study, several configurations are evaluated, namely:

• A voting system based on all methods (Keyword Vote).

• A voting system based on one method per approach (statistic, graph-based, word embedding, supervised) (Keyword Vote 2).

• A voting system based on the best methods (PositionRank, TopicalRank, EmbedRank, WINGNUS) (Keyword Vote ++).

Fig. 9 represents a comparison between the different configurations of our approach and the previous methods. The results obtained confirm that the proposed approach (Keyword Vote) yields results with very little variability; thus, the keywords generated are relevant for most of the e-learning content. In addition, Fig. 9 shows that Keyword Vote ++ achieves the best performance. This is due to a good combination of models, which stabilizes the generation of relevant keywords.

Based on the results obtained, we can deduce that Keyword Vote ++ generates a set of relevant and diversified keywords to represent each e-learning content. This approach is chosen to extract the keywords from the text of each multimedia content.

3) Suggested chatbot design: To design our chatbot, we suggest an architecture composed of two main parts. The first part consists in designing the Chatbot Backend, which contains the NLP engine allowing to understand the intention of the user and to propose the most adequate response to the needs of the learner. The second part consists in developing the Chatbot Frontend, which constitutes the interface allowing the user to interact with the Chatbot. Fig. 10 shows the overall Chatbot architecture.

a) Chatbot BackEnd: The main role of the Chatbot Backend is to process the user's question using the NLP engine and then offer the answers most similar to the learner's needs. To design the Chatbot Backend, we use the proposed approach based on extracting keywords using Keyword Vote ++. Fig. 11 shows the backend architecture of the chatbot.
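The two-step weighting above (per-method normalization followed by a summed vote) can be sketched as follows; this is our own minimal reading of the scheme, with invented method names and weights, and it omits the root-word grouping step for brevity:

```python
from collections import defaultdict

def keyword_vote(method_outputs: dict[str, dict[str, float]], top_n: int = 3) -> list[str]:
    """Combine keyword weights from several AKE methods by a normalized vote.

    `method_outputs` maps a method name to its {keyword: weight} scores.
    Step 1: within each method, divide by the maximum weight (top keyword -> 1).
    Step 2: sum the normalized weights across methods and keep the top N.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for scores in method_outputs.values():
        max_w = max(scores.values())
        for keyword, weight in scores.items():
            totals[keyword] += weight / max_w  # normalized, non-discriminatory vote
    ranked = sorted(totals, key=lambda k: -totals[k])
    return ranked[:top_n]

votes = keyword_vote({
    "tfidf":    {"chatbot": 4.0, "index": 2.0, "speech": 1.0},
    "textrank": {"chatbot": 0.9, "index": 0.9, "lesson": 0.3},
})
print(votes)  # 'chatbot' wins: top-ranked by both methods
```

Without the normalization step, the tfidf weights (on a larger scale) would dominate the vote; dividing by each method's maximum puts both methods on an equal footing, which is exactly the point made above.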


Fig. 10. Chatbot Design.

The backend process consists of two steps. In the first step, the proposed keyword extraction approach is used to extract the relevant keywords from the e-learning content; the representation space of each multimedia e-learning item is then built using the terms/documents matrix. This first step is carried out in offline mode, since it is not linked to user demand, and enables the multimedia indexing. The second step is triggered at each user instruction and consists of computing the representation of the query in the multimedia indexing space, then recommending to the user the content most similar to their request.

Fig. 11. Backend Chatbot Architecture.

b) Chatbot FrontEnd: The Chatbot Frontend consists of the interface design that receives the user instructions and interacts with the Backend to display the appropriate response. Thus, to ensure interaction with the user, the Chatbot must have a user-friendly interface. In fact, the Chatbot interface is based on messaging platforms to interact with the user.

In order to develop the Frontend of our Chatbot, we use the design tools offered by messaging platforms. Our choice fell on the messenger Bot API offered by Slack [32] since, on the one hand, it provides the possibility of interacting with a Python script, which allows us to set up the approach developed in the Backend, and, on the other hand, Messenger is the messaging platform most used by learners to interact with each other. Thus, implementing our e-learning Chatbot on Slack Messenger has the advantage of offering Chatbot services to learners in an interface that they are used to, which will improve the user experience.

IV. RESULTS AND SIMULATION

A. Results

The previous section explained the suggested approach for designing the e-learning chatbot. Indeed, our chatbot is based on a feature space for representing multimedia content built from the most relevant keywords instead of the overall text extracted from the multimedia content.

Fig. 9. Box Plot of comparison between different Configurations of our approach with the previous methods of Keyword Extraction.

In order to show the advantage of the proposed approach, we apply it to a very large corpus with predefined categories. Indeed, we compare two approaches:

• The approach based on constructing the terms/documents matrix by computing the TF-IDF weight for all words in each document [15].

• The proposed approach, based on the Keyword Vote++ algorithm to extract the keywords, then using the list of keywords to build the similarity matrix between documents by computing the TF-IDF weight for only the relevant keywords.

Both approaches provide a similarity matrix which is used to obtain a hierarchical clustering of the multimedia content. In order to validate the clusters obtained from the two approaches, we use the known categories of the corpus as a ground truth. Thus, given the knowledge of the class assignments from the ground truth, it is possible to define an intuitive evaluation measure using conditional entropy analysis. We then measure two scores that identify the homogeneity and completeness of each clustering assignment:

• Homogeneity: all members of the same cluster belong to the same class.

• Completeness: all members of a given class are in the same cluster.

Both scores lie between 0 and 1. On the basis of these two scores, another measure called the V-measure can be calculated: the V-measure is the harmonic mean of the two scores.

To complete our evaluation, we use the Rand Index, which measures the similarity between two partitions. The Rand Index measures the percentage of correctly classified decisions and is defined as follows:

RI = (a + b) / (a + b + c + d)

Let X be the partition obtained from the clustering algorithm and G the partition obtained from the ground truth. Then:

• a: the number of element pairs that are in the same cluster in X and the same cluster in G.

• b: the number of element pairs that are in a different cluster in X and a different cluster in G.
• c: the number of element pairs that are in the same cluster in X but in different clusters in G.

• d: the number of element pairs that are in a different cluster in X but in the same cluster in G.
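The pair-counting definition above translates directly into code; a minimal stdlib sketch (the function name and toy labelings are ours):

```python
from itertools import combinations

def rand_index(X, G):
    """Rand Index RI = (a + b) / (a + b + c + d), counted over all
    element pairs, where X is the clustering and G the ground truth."""
    a = b = c = d = 0
    for i, j in combinations(range(len(X)), 2):
        same_x, same_g = X[i] == X[j], G[i] == G[j]
        if same_x and same_g:
            a += 1  # same cluster in X, same class in G
        elif not same_x and not same_g:
            b += 1  # different cluster in X, different class in G
        elif same_x:
            c += 1  # same cluster in X, different class in G
        else:
            d += 1  # different cluster in X, same class in G
    return (a + b) / (a + b + c + d)

# A clustering that matches the ground truth up to label renaming scores 1.0:
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Note that only pair co-membership matters, so the index is invariant to how the cluster labels themselves are named.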
Fig. 12 illustrates the results obtained by applying the two approaches to our corpus.
Based on the results obtained, we can confirm that the proposed approach reduces the representation space, making it possible to represent documents with only 35,103 terms instead of 174,555 terms (an 80% reduction in dimensionality). It also reduces the time required to construct the dendrogram, from 35 min to 10 min (a 70% decrease in execution time). Regarding the validation indices, we notice a remarkable improvement, with an 8% gain in performance. This confirms that selecting terms based on keywords helps create a more homogeneous tree structure for indexing multimedia content, with less execution time.

Fig. 13. Hierarchical view resulting from Multimedia Indexing.
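The dimensionality reduction can be reproduced in miniature: restricting the TF-IDF vocabulary to extracted keywords shrinks the terms/documents matrix. A stdlib sketch under our own assumptions (the toy documents and keyword list are invented for illustration, and this TF-IDF variant is one of several common formulations):

```python
import math
from collections import Counter

def tfidf(docs, vocabulary=None):
    """Terms/documents matrix with TF-IDF weights. Passing a keyword
    `vocabulary` restricts the representation space to those terms."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted(set(vocabulary) if vocabulary
                   else {t for toks in tokens for t in toks})
    df = Counter(t for toks in tokens for t in set(toks))
    n = len(docs)
    matrix = []
    for toks in tokens:
        tf = Counter(toks)
        matrix.append([(tf[t] / len(toks)) * math.log(n / df[t])
                       if df[t] else 0.0 for t in vocab])
    return vocab, matrix

docs = [
    "python pandas tutorial for data analysis",
    "spark streaming tutorial for big data",
    "neural network training with python",
]
full_vocab, _ = tfidf(docs)
kw_vocab, _ = tfidf(docs, vocabulary=["pandas", "spark", "neural"])
print(len(full_vocab), len(kw_vocab))  # the keyword space is far smaller
```

The same mechanism scales up: fewer columns in the matrix mean cheaper similarity computation and faster dendrogram construction.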
B. Simulation

In order to show the suggested chatbot in action, we propose a simulation based on different multimedia e-learning contents concerning tutorials for different Business Intelligence and Artificial Intelligence tools. The first step is to apply our suggested keyword extraction approach in order to construct an indexed e-learning content. Fig. 13 shows the dendrogram obtained by our approach.

The dendrogram provides a hierarchical representation of our database of e-learning contents. Indeed, we notice that multimedia contents are grouped in a way that lets us easily distinguish between the different categories contained in our e-learning database. This confirms that the proposed approach is well suited to our case study. The indexed multimedia representation allows us to organize our database in order to facilitate information access and offer content adapted to each user. Indeed, this indexed database is used in the chatbot engine to respond to the learner's needs. Our Chatbot design allows interaction with learners in two ways: Quick Reply and Carousel.

1) Quick reply: Quick reply allows the creation of short instant responses that can be selected by users. Indeed, we use this form to suggest course categories to the user. Fig. 14 shows an example of a quick reply.

Fig. 14. Messenger Bot Quick Reply Example.

2) Carousel: Another way to display results to the learner is the carousel, used when a lot of data must be presented to the learner. The buttons that accompany this form can either return a personalized message to the bot, as a specialized command to trigger a flow, or redirect to a URL. We use this form to recommend multimedia content which fits the user request. Fig. 15 illustrates an example of using the carousel in our Chatbot.

Fig. 15. Carousel example.
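With Slack's messaging API, a quick-reply prompt amounts to a message payload carrying button elements. A hedged sketch (the helper name and category labels are ours; the field layout follows Slack's public Block Kit format):

```python
def quick_reply_payload(prompt, categories):
    """Build a Slack Block Kit message offering course categories
    as clickable buttons (quick replies)."""
    return {
        "blocks": [
            {"type": "section",
             "text": {"type": "mrkdwn", "text": prompt}},
            {"type": "actions",
             "elements": [
                 {"type": "button",
                  "text": {"type": "plain_text", "text": c},
                  "value": c.lower()}
                 for c in categories
             ]},
        ]
    }

payload = quick_reply_payload(
    "Which topic would you like to learn?",
    ["Business Intelligence", "Artificial Intelligence"],
)
print(len(payload["blocks"][1]["elements"]))  # 2 buttons
```

When a learner clicks a button, Slack posts the button's `value` back to the bot, which can then trigger the corresponding retrieval flow.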
Fig. 12. Results comparison of the Proposed Approach with the Classic Approach for Creating a Tree Structure.

V. RECOMMENDATION

The proposed ChatBot design allows offering the learner multimedia content adapted to their needs. However, its functionality is limited and does not allow it to interact effectively in the case of general questions from the learner. One of the ways to improve the ChatBot is to integrate a Chit-Chat module to simulate human conversation.
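A gate between domain answers and chit-chat can be sketched as a tiny intent router. Everything below (the function name, the keyword list, and the two labels) is an illustrative assumption, not the classifier the paper envisions:

```python
# Hypothetical domain vocabulary; a trained classifier would replace this.
ELEARNING_TERMS = {"course", "tutorial", "video", "lesson", "python", "spark"}

def route(question):
    """Route a question to the e-learning engine when it mentions
    course-related vocabulary, otherwise to the chit-chat module."""
    words = set(question.lower().split())
    return "elearning" if words & ELEARNING_TERMS else "chitchat"

print(route("Show me a python tutorial"))  # elearning
print(route("How are you today?"))         # chitchat
```

A learned classifier would generalize beyond an explicit keyword list, but the routing contract stays the same: one label per incoming question.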
Fig. 16 shows the newly proposed architecture, which will be the aim of our future work.

Fig. 16. Architecture for Integrating Chit-Chat into the Design of ChatBot.

After receiving the question from the user, a classifier predicts the class of the question and determines whether it is related to the area of e-learning or is a general question. Then, depending on the class category, the ChatBot offers an answer according to two scenarios:

• Offer the adapted e-learning multimedia using the approach proposed in this paper.

• Suggest a response from the Chit-Chat database.

The integration of this module makes the conversation as natural as possible.

VI. CONCLUSION

The aim of our study is to develop a Chatbot allowing interaction with learners and suggesting the e-learning multimedia content which fits their learning needs. To achieve this objective, we first set up an analysis of e-learning contents in order to extract the maximum amount of information. Indeed, we propose an approach based on Speech-To-Text APIs to extract text from different sources of multimedia.

Based on the information obtained from the extracted text, we set up the keyword extraction step. In this step, we evaluate different algorithms proposed in the literature and conclude that they are not suitable for all the multimedia contained in our e-learning database. Thus, we propose a new approach making it possible to combine the results of different approaches using a voting system. After that, we proceed to the indexing of the e-learning content by constructing a tree structure allowing the organization of the information and facilitating access to the e-learning content.

Finally, we design the ChatBot core, which is divided into Backend/Frontend. On the one hand, the Backend design is mainly based on the proposed approach to indexing e-learning multimedia content, which constitutes the engine of the NLP used. On the other hand, the design of the Frontend is based on the Slack Messenger platform, which offers an interface facilitating interaction with learners.

Our methodology aims to provide an efficient way to represent the multimedia content based on keywords. The use of keywords in our approach results in a better representation and reduces the time needed to construct the multimedia indexing. The core of our chatbot is based on this indexed multimedia content, which enables it to look up information quickly. Our designed chatbot therefore reduces response time and meets the learner's needs.

The proposed Chatbot design allows the learner to get the multimedia adapted to their needs. However, its functionality is limited and does not allow it to interact effectively in the case of general questions from the learner. Our future work will focus on integrating a Chit-Chat module to simulate a human conversation; we will also integrate voice recognition into the chatbot in order to enlarge the scope of the chatbot interactions.

REFERENCES

[1] El Janati, S., Maach, A., El Ghanami, D., "Learning Analytics Framework for Adaptive E-learning System to Monitor the Learner's Activities". International Journal of Advanced Computer Science and Applications, 2019, vol. 10, no. 8.
[2] El Janati, S., Maach, A., El Ghanami, D., "SMART education framework for adaptation content presentation". Procedia Computer Science, 2018, vol. 127, p. 436-443.
[3] El Janati, S., Maach, A., El Ghanami, D., "Context aware in adaptive ubiquitous e-learning system for adaptation presentation content". Journal of Theoretical and Applied Information Technology, 2019, vol. 97, no. 16, p. 4424-4438.
[4] Yan, Z., Duan, N., Bao, J., Chen, P., Zhou, M., Li, Z., & Zhou, J., "DocChat: An information retrieval approach for chatbot engines using unstructured documents". In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016. p. 516-525.
[5] Singh, J., Joesph, M. H., & Jabbar, K. B. A., "Rule-based chabot for student enquiries". In: Journal of Physics: Conference Series. IOP Publishing, 2019. p. 012060.
[6] Zamora, J., "Rise of the chatbots: Finding a place for artificial intelligence in India and US". In: Proceedings of the 22nd International Conference on Intelligent User Interfaces Companion. 2017. p. 109-112.
[7] Sheikh, S. A., Tiwari, V., & Singhal, S., "Generative model chatbot for Human Resource using Deep Learning". In: 2019 International Conference on Data Science and Engineering (ICDSE). IEEE, 2019. p. 126-132.
[8] Wang, Z., Wang, Z., Long, Y., Wang, J., Xu, Z., & Wang, B., "Enhancing generative conversational service agents with dialog history and external knowledge". Computer Speech & Language, 2019, vol. 54, p. 71-85.
[9] Zhang, J., Huang, H., & Gui, G., "A Chatbot Design Method Using Combined Model for Business Promotion". In: International Conference in Communications, Signal Processing, and Systems. Springer, Singapore, 2018. p. 1133-1140.
[10] Amudha, S., & Shanthi, I. E., "Phrase Based Information Retrieval Analysis in Various Search Engines Using Machine Learning Algorithms". In: Data Management, Analytics and Innovation. Springer, Singapore, 2020. p. 281-293.
[11] Koka, R. S., "Automatic Keyword Detection for Text Summarization". PhD diss., 2019.
[12] Hulth, A., & Megyesi, B. B., "A study on automatically extracted keywords in text categorization". In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006. p. 537-544.
[13] Fernandez, R. R., & Uy, C., "Keywords on Online Video-ads Marketing Campaign: A Sentiment Analysis". Review of Integrative Business and Economics Research, 2020, vol. 9, p. 99-110.
[14] Siddiqi, S., & Sharan, A., "Keyword and keyphrase extraction techniques: a literature review". International Journal of Computer Applications, 2015, vol. 109, no. 2.
[15] Havrlant, L., & Kreinovich, V., "A simple probabilistic explanation of term frequency-inverse document frequency (tf-idf) heuristic (and variations motivated by this explanation)". International Journal of General Systems, 2017, vol. 46, no. 1, p. 27-36.
[16] Paukkeri, M., & Honkela, T., "Likey: Unsupervised language independent keyphrase extraction". In: Proceedings of the 5th international workshop on semantic evaluation, Uppsala, Sweden, 2010. p. 162-165.
[17] Campos, R., Mangaravite, V., Pasquali, A., Jorge, A. M., Nunes, C., & Jatowt, A., "YAKE! collection-independent automatic keyword extractor". In: European Conference on Information Retrieval. Springer, Cham, 2018. p. 806-810.
[18] Nara, N., Sharma, P., & Kumar, P., "Page Rank Algorithm: big data analytic". 2017.
[19] Mihalcea, R., & Tarau, P., "TextRank: Bringing order into text". In: Proceedings of the 2004 conference on empirical methods in natural language processing. 2004. p. 404-411.
[20] Wan, X., & Xiao, J., "Single Document Keyphrase Extraction Using Neighborhood Knowledge". In: AAAI. 2008. p. 855-860.
[21] Liu, Z., Huang, W., Zheng, Y., & Sun, M., "Automatic keyphrase extraction via topic decomposition". In: Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics, 2010. p. 366-376.
[22] Mikolov, T., Chen, K., Corrado, G., & Dean, J., "Efficient estimation of word representations in vector space". arXiv preprint arXiv:1301.3781, 2013.
[23] Hayati, H., Chanaa, A., Idrissi, M. K., & Bennani, S., "Doc2Vec & Naïve Bayes: Learners' Cognitive Presence Assessment through Asynchronous Online Discussion TQ Transcripts". International Journal of Emerging Technologies in Learning (iJET), 2019, vol. 14, no. 08, p. 70-81.
[24] Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., & Fidler, S., "Skip-thought vectors". In: Advances in neural information processing systems, 2015. p. 3294-3302.
[25] Lau, J. H., & Baldwin, T., "An empirical evaluation of doc2vec with practical insights into document embedding generation". arXiv preprint arXiv:1607.05368, 2016.
[26] Bennani-Smires, K., Musat, C., Hossmann, A., Baeriswyl, M., & Jaggi, M., "Simple unsupervised keyphrase extraction using sentence embeddings". arXiv preprint arXiv:1801.04470, 2018.
[27] Sarosa, M., Junus, M., Hoesny, M. U., Sari, Z., & Fatnuriyah, M., "Classification Technique of Interviewer-Bot Result using Naïve Bayes and Phrase Reinforcement Algorithms". International Journal of Emerging Technologies in Learning (iJET), 2018, vol. 13, no. 02, p. 33-47.
[28] Witten, I. H., Paynter, G. W., Frank, E., Gutwin, C., & Nevill-Manning, C. G., "Kea: Practical automated keyphrase extraction". In: Design and Usability of Digital Libraries: Case Studies in the Asia Pacific. IGI Global, 2005. p. 129-152.
[29] Nguyen, T. D., & Luong, M. T., "WINGNUS: Keyphrase extraction utilizing document logical structure". In: Proceedings of the 5th international workshop on semantic evaluation. Association for Computational Linguistics, 2010. p. 166-169.
[30] Hinton, G., et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups". IEEE Signal Processing Magazine, 2012, vol. 29, no. 6, p. 82-97. DOI: 10.1109/msp.2012.2205597.
[31] Amodei, D., Ananthanarayanan, S., Anubhai, R., et al., "Deep speech 2: End-to-end speech recognition in English and Mandarin". In: International conference on machine learning. 2016. p. 173-182.
[32] Lin, B., Zagalsky, A., Storey, M. A., & Serebrenik, A., "Why developers are slacking off: Understanding how software teams use Slack". In: Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion. 2016. p. 333-336.