Adaptive e-Learning AI-Powered Chatbot
Abstract—With the rapid evolution of e-learning technology, multiple sources of information become more and more accessible. However, the availability of a wide range of e-learning offers makes it difficult for learners to find the right content for their training needs. In this context, our paper aims to design an e-learning AI-powered Chatbot that interacts with learners and suggests the e-learning content adapted to their needs. To achieve these objectives, we first analysed the e-learning multimedia content to extract the maximum amount of information. Then, using Natural Language Processing (NLP) techniques, we introduced a new approach to extract keywords. After that, we suggest a new approach for multimedia indexing based on the extracted keywords. Finally, the Chatbot architecture is realized on top of the multimedia indexing and deployed on online messaging platforms. The suggested approach aims at an efficient keyword-based representation of the multimedia content. We compare our approach with approaches from the literature and find that the use of keywords in our approach results in a better representation and reduces the time needed to construct the multimedia index. The core of our Chatbot is based on this indexed multimedia content, which enables it to look up information quickly. Our Chatbot therefore reduces response time and meets the learner's needs.

Keywords—e-Learning; Chatbot; Speech-To-Text; NLP; Keywords Extraction; Text Clustering; Multimedia Indexing

I. INTRODUCTION

With the rapid evolution of the IT field and the continuous updating of the available tools, learners must undergo continuous training in order to improve their skills and ensure technological monitoring [1]. Indeed, training has been strongly affected by the digital transformation. e-Learning (online training) is a perfect example of the digitization of learning. This distance learning technique eliminates the physical presence of a trainer [2]. It has many advantages and has become a part of the learner journey. However, the quantity of multimedia content offered to the learner increases exponentially, which makes autonomous learning a complicated task: the learner must choose the content adapted to their context to ensure effective learning [3].

In this context, our study aims to create an indexed database of e-learning multimedia content in order to set up a Chatbot system that offers the appropriate content to each learner according to his or her e-learning needs. Advances in artificial intelligence and NLP allow bots to converse more and more like real people who share e-learning presentation content. Predominant Chatbots do not depend exclusively on text content, and frequently present cards, pictures, links, and forms, giving an app-like experience.

One of the most difficult areas of research is the development of effective Chatbots that emulate human dialogue. It is a difficult task that involves problems related to the NLP (Natural Language Processing) research field [4]. Thanks to NLP techniques and algorithms, it is possible to understand learners' requests based on what the learner is writing. This task is usually the core of the Chatbot, but there are limitations: it is not possible to map all learner requests, and current Chatbots do not show remarkable performance due to the unpredictability of the learner's thinking during a conversation [5]. One of the most important tasks in setting up a Chatbot is the design of the conversational flow. We therefore suggest a new approach for a successful conversation based on keyword extraction, which makes it possible to handle all learners' requests and provide the adequate content.

This paper is structured as follows: the second section reviews related work. In the third section, we present the suggested approach. The fourth section is devoted to our results and simulation. Finally, we conclude with a recommendation approach to improve the suggested approach.

II. RELATED WORK

The use of advanced data science technology makes it possible to improve the quality of e-learning content. In this paper, we suggest a framework that uses Natural Language Processing (NLP) and keyword extraction as a Chatbot engine for e-learning. Thus, in this section we review the literature related to Chatbots and Automatic Keywords Extraction (AKE).

A. Chatbot

Chatbots are virtual assistants capable of chatting with users and responding to their requests. They increasingly use speech synthesis techniques to produce their messages, while the user types their interventions into a text field on the web page.

The design of a Chatbot that meets the needs of users has always been a concern in the field of information retrieval [4]. Depending on how Chatbots are programmed, we can divide them into two large groups: those programmed according to predefined commands (rule-based Chatbots) [5] and those based on artificial intelligence (AI) [6].
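To make the rule-based category concrete, a minimal rule-based responder can be sketched in a few lines; the patterns and canned replies below are purely illustrative and not taken from the paper:

```python
# Minimal rule-based chatbot sketch: each rule maps a keyword pattern
# to a predefined reply; anything unmatched falls through to a default.
RULES = {
    "python": "Here is our Python course playlist.",
    "sql": "Here is our SQL tutorial series.",
    "hello": "Hi! Ask me about any e-learning topic.",
}

def rule_based_reply(message: str) -> str:
    text = message.lower()
    for pattern, reply in RULES.items():
        if pattern in text:  # predefined command matching
            return reply
    return "Sorry, I do not have content for that yet."
```

Such a bot only covers requests its author anticipated, which is precisely the limitation that motivates the AI-based alternatives discussed next.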
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 11, No. 12, 2020
AI Chatbots using machine learning are designed to understand the context and intent of a question before formulating an answer. There are two types of AI Chatbots: Generative Chatbots [7] and Information Retrieval Chatbots [4]. Fig. 1 presents the Chatbot types.

The generative Chatbot is built to be able to react to any context or interlocutor, including non-programmed situations. Such a conversational agent relies on new artificial intelligence techniques, such as deep learning and neural networks, to generate its responses word by word [8]. Thus, these bots can construct answers to users' questions themselves. The problem with this approach is that it is too general, and such Chatbots struggle to hold a coherent conversation or even to produce syntactically valid sentences with current models.

Information retrieval Chatbots are Chatbots adapted to a given context, which build their responses from a set of sentences that have been given to them in advance. These Chatbots retrieve information from the user's question and seek the most suitable answer using NLP [9]. This type of Chatbot is best suited for closed-domain systems. This approach guarantees grammatically correct answers and simplifies the learning task for the algorithm, because it allows the model to be built on a training data set smaller than the large amount of data required in the case of a generative Chatbot [4].

Thus, in our case study we chose to design a Chatbot based on information retrieval, since it allows us to exploit the results obtained from video indexing and to keep some control over the responses generated by the Chatbot.

Fig. 1. Chatbots Types.

Our contribution consists of using a keyword extraction technique adapted to the multimedia e-learning content we have. Thus, the Chatbot will be based on the keywords instead of all the text, in order to find the adequate answer to the learner's need in a fast and efficient way. The next section is dedicated to the keyword extraction techniques found in the literature.

B. Automatic Keywords Extraction (AKE)

Keyword extraction involves identifying the words and phrases representing the main subjects of a document. High-quality keywords can make it easier to understand, organize and access the content of the document. AKE from a document has been used in many applications, such as information retrieval [10], text synthesis [11], text categorization [12], and opinion mining [13].

Most existing keyword extraction algorithms address this problem in three steps (Fig. 2): First, the candidate keywords (i.e. words and phrases that can be used as keywords) are selected from the content of the document. Second, candidates are either ranked using a candidate weighting function (unsupervised approaches) or classified into two classes (keyword / non-keyword) using a set of extracted features (supervised approaches). Third, the N candidates with the highest weights or confidence scores are selected as keywords [14].

Fig. 2. General Approach for Automatic Keywords Extraction.

In this section, we provide an overview of keyword extraction methods. Our goal is not to detail the functioning of these methods, but rather to give an overview of their basic principle and their classification, in order to identify the suitable methods for our case study.

1) Statistic approach: Statistical approaches are considered among the simplest techniques used to identify keywords within a text. These approaches do not require training data in order to extract the most important keywords in a text. They seek to define what a keyword is based on certain statistical features, and study the relation of these features with the notion of importance of a candidate term: the more a candidate term is considered important in the analysed document, the more relevant it will be as a keyword.

TF-IDF [15] and Likey [16] are two methods which compare the behavior of a candidate term in the analysed document with its behavior in a collection of documents (reference corpus). The objective is to find candidate terms whose behavior in the document varies positively compared to their overall behavior in the collection. In both methods, this is expressed by the fact that a term is important in the analysed document if it is largely present there while it is not in the rest of the collection.

The Yake! approach [17] focuses on statistical features that require no external dictionaries and can be calculated using only the current document. These features include the position of the first occurrence of a candidate, the word frequency, the case, and the frequency with which a word appears in different sentences.
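As an illustration of the statistical family, the TF-IDF idea (score a term by its frequency in the document, discounted by how many reference-corpus documents contain it) can be sketched as follows; the smoothing constant and the toy corpus in the test are our own choices, not the paper's:

```python
import math
from collections import Counter

def tfidf_keywords(doc: str, corpus: list[str], top_n: int = 3) -> list[str]:
    """Rank the words of `doc` by TF-IDF against a reference corpus."""
    words = doc.lower().split()
    tf = Counter(words)
    n_docs = len(corpus)

    def idf(term: str) -> float:
        # Document frequency of the term in the reference corpus.
        df = sum(1 for d in corpus if term in d.lower().split())
        return math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF

    scored = {w: (tf[w] / len(words)) * idf(w) for w in tf}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]
```

A common word like "the" appears in every reference document, so its IDF collapses and it is ranked below document-specific terms.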
2) Graph-based approach: Graph-based approaches consist of representing the content of a document as a graph. The methodology comes from PageRank [18], an algorithm for ranking web pages (graph nodes) based on the recommendation links that exist between them (graph edges). TextRank [19] and SingleRank [20] are the two basic adaptations of PageRank for AKE. In these, web pages are replaced by text units whose granularity is the word, and an edge is created between two nodes if the words they represent co-occur within a given word window. When running the algorithm, a score is associated with each candidate keyword, representing the importance of the node in the graph.

To improve TextRank / SingleRank, Liu et al. suggest a method which aims to increase the coverage of all the key terms extracted from the analysed document (TopicalPageRank) [21]. To do this, they refine the importance of words in the document by taking into account their rank in each topic. The rank of a word for a topic is obtained by integrating into its PageRank score the probability that it belongs to the topic. The overall rank of a candidate term is then obtained by merging its ranks across topics.

3) Word embedding approach: Word embedding [22] is a way of representing words as vectors, typically in a space of a few hundred dimensions: a word is transformed into numbers. The word representation vector is learned by an iterative algorithm from a large amount of text. The algorithm tries to place the vectors in the space so as to bring semantically close words together and move semantically distant words apart. By finding the closest words in the embedding space of a given input word, the model identifies synonyms or intruders in a list of words. Once a model is obtained, several standard tasks become possible.

Different methods based on word embeddings have been suggested for representing entire documents or sentences [23]. Skip-thought [24] provides sentence embeddings trained to predict neighbouring sentences. Sent2Vec [25] generates sentence embeddings using word n-gram representations.

With word embeddings, keywords can be extracted using the cosine similarity in the word space representation. EmbedRank [26] is a word-embedding-based approach to automatically extract key phrases. In this method, documents or word sequences of arbitrary length are embedded into the same feature space. This enables computing the semantic relatedness between a document and a candidate keyword using cosine similarity measures.

4) Supervised approach: Supervised approaches are methods able to learn to perform a particular task, in this case the extraction of keywords. Learning is done through a corpus whose documents are annotated with keywords. The annotation allows the extraction of examples and counter-examples whose statistical and/or linguistic features are used to train a binary classifier [27]. These classifiers indicate whether a candidate term is a keyword or not. Many supervised algorithms are used in various fields and can adapt to any task, including the AKE task. The algorithms used for this construct probabilistic models, decision trees, Support Vector Machines (SVM) or even neural networks.

KEA is a method which uses naive Bayesian classification to assign a likelihood score to each candidate term, the aim being to indicate whether it is a keyword or not [28]. This approach uses three conditional distributions learned from the training corpus. The first is the probability that a candidate term is labelled yes (keyword) or no (non-keyword). The other two stand for two different statistical features: the TF-IDF weight of the candidate term and its first position in the document.

Nguyen et al. propose an improvement (WINGNUS) [29] by adding a set of features such as the first and last occurrences (word offset), the length of phrases in words, whether a phrase is also part of the document title, and the number of times a phrase appears in the titles of other documents. Adding these features improves the performance of the original version of KEA, but only when the amount of data is large enough.

III. SUGGESTED APPROACH

In this section, we suggest an approach for an adaptive Chatbot based on information retrieval. This approach consists of indexing e-learning multimedia using keyword extraction. These indexed contents will be integrated into the Chatbot engine in order to offer the content adequate to the learner's needs.

The suggested approach is based on four steps, which will be detailed:

1) Extracting metadata from the e-learning database.
2) Speech-to-text processing.
3) Automatic Keywords Extraction.
4) Suggested Chatbot design.

The first step of our approach consists of analyzing the e-learning multimedia content and extracting as much information as possible from it in order to build our database. The second step is devoted to standardizing the e-learning content by transforming all content to text, so we use Speech-To-Text techniques to extract text from all multimedia content. In the third step, we suggest methods for extracting keywords from the extracted text: we test several approaches and suggest a new approach adapted to our problem. The last step describes the Chatbot design based on the keywords resulting from the previous steps. Fig. 3 summarizes the methodology of our approach.

A. Extracting Metadata from e-Learning Database

In our study, we use various sources of e-learning content in order to construct the database, which will be the basis of our Chatbot recommendation. e-Learning sources provide content in different types of information, such as video, speech and text. The first step of our approach consists of extracting as much information as possible from the e-learning content in order to build our database.
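The graph-based ranking reviewed in Section II (a TextRank-style random walk over a word co-occurrence graph) can be sketched with a plain PageRank iteration; the window size, damping factor of 0.85 and whitespace tokenizer are the usual conventions, chosen here for illustration rather than taken from any cited implementation:

```python
from collections import defaultdict

def textrank_keywords(text: str, top_n: int = 3, window: int = 2,
                      damping: float = 0.85, iters: int = 30) -> list[str]:
    """Rank words by PageRank over an undirected co-occurrence graph."""
    words = [w.lower() for w in text.split()]
    # Build an undirected co-occurrence graph over a sliding window.
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    # Iterate the PageRank update until (approximately) converged.
    score = {w: 1.0 for w in graph}
    for _ in range(iters):
        score = {
            w: (1 - damping) + damping * sum(
                score[nb] / len(graph[nb]) for nb in graph[w])
            for w in graph
        }
    return sorted(score, key=score.get, reverse=True)[:top_n]
```

Words that co-occur with many distinct neighbours accumulate a higher stationary score, which is why recurring topical terms surface at the top.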
Fig. 6 and Table I show the WER evaluation obtained for the APIs' transcriptions compared to the ground truth.

The results obtained show that the transcription of YouTube is the most efficient, followed by the transcription of Houndify. This is because the YouTube transcription is generally based on the publisher's recommendation, but this transcription is not always available. The Houndify API specializes in detecting lyrics in music, which can explain the good performance of this API, since our e-learning data source mainly contains speech with a music background. We can also notice that the IBM and Google services have similar performance, since they are based on similar resources for the learning phase. Finally, the Wit transcription is the least efficient, generally due to its quite limited learning-phase resources.

Fig. 6. WER Index of Transcription obtained from the ASR APIs.

TABLE I. WER INDEX OF TRANSCRIPTION OBTAINED FROM ASR APIS

        YouTube   Houndify   Google   IBM     Wit
WER     97,04     91,12      86,83    86,13   75,23

Based on the results obtained in this step, we decided to adopt the approach shown in Fig. 7, in order to normalize the multimedia e-learning types and extract text from video and audio e-learning content.

The architecture shown in Fig. 7 contains three different processing paths, each one adapted to one type of e-learning multimedia:

• Video processing: Extract the plain text from the original video if it is provided by the author; otherwise apply a Speech-To-Text API in order to extract text from the video speech.

• Speech processing: Apply speech recognition in order to extract text from e-learning speech.

• Text processing: Extract the plain text from e-learning text.

Fig. 7. Architecture of the Approach chosen for Multimedia Normalization (Video/Speech to Text Normalization).

This architecture yields a normalized database which contains plain text for every e-learning multimedia source. The next step is to extract from the plain text the important words (keywords) that will represent each multimedia content.

C. Automatic Keywords Extraction

In this subsection, we present a keyword extraction evaluation based on the techniques described in our literature review.

1) Keyword extraction approach evaluation: In order to evaluate the performance of the different algorithms, a first approach consists of applying a manual evaluation based on human judgement, deciding whether the keywords are representative of the content of a document or not [17]. Nevertheless, manual evaluation of Automatic Keywords Extraction is difficult and time-consuming.

Researchers have therefore developed automatic evaluation systems based on partial correspondence. Automatic key phrase extraction methods have generally been evaluated based on the number of first N candidates which correctly correspond to the reference keywords. This number is then used to calculate the precision, recall and F-score for a set of keywords.

In the same perspective, we compare the keywords obtained by each approach with the set of tags provided by the author of some multimedia content, which we consider to be the ground truth. We use the F1 score, which is based on precision and recall. Precision is defined as the number of correctly predicted keywords out of the number of all predicted keywords, and recall is defined as the number of correctly predicted keywords out of the total number of keywords in the ground-truth set. Note that, to determine the correspondence between two keywords, we use the Porter Stemmer in preprocessing, in order to match keywords which have the same root word.

Then, we calculated F1 for the first 20 keywords on our set of e-learning multimedia. Fig. 8 represents box plot graphs for the evaluated algorithms, and Table II presents the statistical measures (Min, Max and mean).
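The precision / recall / F1 computation over predicted keywords and author tags can be sketched as follows; the suffix-stripping stemmer below is a crude stand-in for the Porter Stemmer used in the paper, and the example keywords are invented:

```python
def keyword_f1(predicted: list[str], ground_truth: list[str]) -> float:
    """F1 between predicted keywords and reference tags, after stemming."""
    def stem(w: str) -> str:
        # Crude stand-in for Porter stemming: strip a common suffix.
        for suf in ("ing", "es", "s"):
            if w.endswith(suf) and len(w) > len(suf) + 2:
                return w[: -len(suf)]
        return w

    pred = {stem(w.lower()) for w in predicted}
    truth = {stem(w.lower()) for w in ground_truth}
    correct = len(pred & truth)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)   # correct / all predicted
    recall = correct / len(truth)     # correct / all reference tags
    return 2 * precision * recall / (precision + recall)
```

Stemming before the set intersection is what lets "indexing" match the tag "index", as the paper's Porter-Stemmer preprocessing intends.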
Based on Fig. 8, we note that the F1-score of all approaches shows a very large variability; the relevance of the generated keywords is therefore not stable enough. We can deduce that the evaluated approaches are not adapted to all the content of our e-learning database, and that each approach generates a set of relevant keywords only for a part of our e-learning corpus. In this context, we suggest a new framework based on ensemble methods in order to improve keyword consistency.

2) Suggested framework for automatic keyword extraction: In order to propose a framework adapted to our case study, our main idea consists of suggesting an approach that combines the results obtained from all approaches by using a voting system.

Basically, the first step of our approach is to apply all the AKE methods in order to obtain the first N keywords with their weights according to each method.

After obtaining the keywords with their weights for each method, the second step of our approach is to normalize the weight of each keyword by dividing its weight for each method by the weight of the keyword with the maximum weight. Thus, the first keyword of each method will have a weight equal to 1:

w_ij = w_ij / max_i(w_ij)    ∀ i ∈ {keywords}, ∀ j ∈ {models}

Then, a global weight for each keyword is calculated by summing the normalized weights over the models. The normalization step avoids favoring one method over another and thus allows obtaining a global weight based on a non-discriminatory vote. Also, to avoid generating similar keywords, the weights of keywords with the same root word are grouped together by considering them as a single candidate keyword:

w_i = Σ_{j ∈ models} w_ij    ∀ i ∈ {keywords}

Finally, the candidate keywords are ranked based on their global weight, and the top N are chosen as keywords to represent each e-learning multimedia content.

We notice that the choice of the models to be considered in order to build the global weight of each candidate keyword is important. Thus, to set up an approach adequate to our case study, several configurations are evaluated, namely:

• A voting system based on all methods (Keyword Vote).

• A voting system based on one method per approach (statistic, graph-based, word embedding, supervised) (Keyword Vote 2).

• A voting system based on the best methods (PositionRank, TopicalRank, EmbedRank, WINGNUS) (Keyword Vote ++).
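The voting scheme described above (per-method max-normalization of the weights, then summation into a global weight and selection of the top N) can be sketched directly from the two formulas; the example method outputs in the test are invented:

```python
def keyword_vote(method_outputs: list[dict[str, float]], top_n: int = 3) -> list[str]:
    """Combine keyword weights from several AKE methods by voting.

    Each dict maps candidate keyword -> weight for one method. Weights are
    max-normalized per method (so each method's best keyword scores 1),
    then summed into a global weight, and the top N candidates are kept.
    """
    global_weight: dict[str, float] = {}
    for weights in method_outputs:
        w_max = max(weights.values())          # max_i(w_ij) for this method
        for kw, w in weights.items():
            global_weight[kw] = global_weight.get(kw, 0.0) + w / w_max
    return sorted(global_weight, key=global_weight.get, reverse=True)[:top_n]
```

Because every method's scores are rescaled to [0, 1] before summation, a method that happens to produce large raw weights cannot dominate the vote; grouping stemmed variants, as the paper does, would be a preprocessing step on the dict keys.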
The backend process consists of two steps. In the first step, the proposed keyword approach is used to extract the relevant keywords from the e-learning content, then the representation space of each e-learning multimedia item is built using the terms/documents matrix. This first step is carried out offline, since it is not linked to user demand, and provides the multimedia indexing. The second step is triggered at each user instruction and consists of calculating the representation of the query in the multimedia indexing space, then recommending to the user the content most similar to his request.

b) Chatbot Frontend: The Chatbot frontend consists of the interface design that receives the user's instructions and interacts with the backend to display the appropriate response. To ensure interaction with the user, the Chatbot must have a user-friendly interface. In fact, the Chatbot interface is based on messaging platforms to interact with the user.

In order to develop the frontend of our Chatbot, we use the design tools offered by messaging platforms. Our choice was the messenger Bot API offered by Slack [32] since, on the one hand, it provides the possibility of interacting with a Python script, which allows us to set up the approach developed in the backend, and, on the other hand, Messenger is the messaging platform most used by learners to

• Homogeneity: all the members of a given cluster belong to the same class.

• Completeness: all the members of a given class are in the same cluster.

Both scores are between 0 and 1. On the basis of these two scores, another measure called the V-measure can be calculated: the V-measure is the harmonic mean of the two scores.

To complete our evaluation, we use the Rand index. It measures the similarity between two partitions as the percentage of correctly classified pair decisions and is defined as follows:

RI = (a + b) / (a + b + c + d)

Let X be the partition obtained from the clustering algorithm and G the partition obtained from the ground truth. Then:

• a: the number of element pairs that are in the same cluster in X and the same cluster in G.

• b: the number of element pairs that are in different clusters in X and different clusters in G.
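The Rand index above can be checked with a small pair-counting sketch, where c and d count the two kinds of disagreeing pairs (grouped in only one of the two partitions):

```python
from itertools import combinations

def rand_index(x: list[int], g: list[int]) -> float:
    """Rand index between a clustering x and ground-truth labels g."""
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        same_x = x[i] == x[j]
        same_g = g[i] == g[j]
        if same_x and same_g:
            a += 1          # together in both partitions
        elif not same_x and not same_g:
            b += 1          # separated in both partitions
        elif same_x:
            c += 1          # together in X only
        else:
            d += 1          # together in G only
    return (a + b) / (a + b + c + d)
```

Identical partitions give RI = 1, since every pair decision (together or apart) agrees between X and G.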
Fig. 12. Results Comparison of the Proposed Approach with the Classic Approach for Creating a Tree Structure.

V. RECOMMENDATION

The proposed ChatBot design allows offering the learner the multimedia content adapted to their needs. However, its functionality is limited and does not allow it to interact effectively with general questions from the learner. One of the ways to improve the ChatBot is to integrate a Chit-Chat module to simulate human conversation.
Fig. 16 shows the new architecture proposed, which will be the aim of our future work.

Fig. 16. Architecture for Integrating Chit-Chat into the Design of ChatBot.

After receiving the question from the user, a classifier predicts the class of the question and determines whether it is related to the area of e-learning or is a general question. Then, depending on the class category, the ChatBot offers an answer according to two scenarios:

• Offer the adapted e-learning multimedia using the approach proposed in this paper.

• Suggest a response from the Chit-Chat database. The integration of this module makes the conversation as natural as possible.

VI. CONCLUSION

The aim of our study is to develop a Chatbot allowing interaction with learners and suggesting the e-learning multimedia content which fits their learning needs. To achieve this objective, we first set up an analysis of e-learning contents in order to extract the maximum amount of information. Indeed, we propose an approach based on Speech-To-Text APIs to extract text from different sources of multimedia.

Based on the information obtained from the extracted text, we set up the keyword extraction step. In this step, we evaluated different algorithms proposed in the literature and concluded that they are not suitable for all the multimedia contained in our e-learning database. Thus, we propose a new approach making it possible to combine the results of the different approaches using a voting system. After that, we proceed to the indexing of the e-learning content by constructing a tree structure that organizes the information and facilitates access to the e-learning content.

Finally, we designed the ChatBot core, which was divided into backend / frontend. On the one hand, the backend design is mainly based on the proposed approach for indexing e-learning multimedia content, which constitutes the NLP engine used. On the other hand, the frontend design is based on the Slack Messenger platform, which offers an interface facilitating interaction with learners.

Our methodology aims at an efficient way to represent the multimedia content based on keywords. The use of keywords in our approach results in a better representation and reduces the time needed to construct the multimedia indexing. The core of our Chatbot is based on this indexed multimedia content, which enables it to look up information quickly. Our designed Chatbot therefore reduces response time and meets the learner's needs.

The proposed Chatbot design allows the learner to get the multimedia adapted to their needs. However, its functionality is limited and does not allow it to interact effectively with general questions from the learner. Our future work will focus on integrating a Chit-Chat module to simulate a human conversation; we will also integrate voice recognition into the Chatbot in order to enlarge the scope of its interactions.

REFERENCES

[1] El Janati, S., Maach, A., El Ghanami, D., "Learning Analytics Framework for Adaptive E-learning System to Monitor the Learner's Activities," International Journal of Advanced Computer Science and Applications, vol. 10, no. 8, 2019.
[2] El Janati, S., Maach, A., El Ghanami, D., "SMART education framework for adaptation content presentation". Procedia Computer Science, 2018, vol. 127, p. 436-443.
[3] El Janati, S., Maach, A., El Ghanami, D., "Context aware in adaptive ubiquitous e-learning system for adaptation presentation content". Journal of Theoretical and Applied Information Technology, 2019, 97(16), pp. 4424-4438.
[4] Yan, Z., Duan, N., Bao, J., Chen, P., Zhou, M., Li, Z., & Zhou, J., "Docchat: An information retrieval approach for chatbot engines using unstructured documents". In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016. p. 516-525.
[5] Singh, J., Joesph, M. H., & Jabbar, K. B. A., "Rule-based chabot for student enquiries". In: Journal of Physics: Conference Series. IOP Publishing, 2019. p. 012060.
[6] Zamora, J., "Rise of the chatbots: Finding a place for artificial intelligence in India and US". In: Proceedings of the 22nd International Conference on Intelligent User Interfaces Companion. 2017. p. 109-112.
[7] Sheikh, S. A., Tiwari, V., & Singhal, S., "Generative model chatbot for Human Resource using Deep Learning". In: 2019 International Conference on Data Science and Engineering (ICDSE). IEEE, 2019. p. 126-132.
[8] Wang, Z., Wang, Z., Long, Y., Wang, J., Xu, Z., & Wang, B., "Enhancing generative conversational service agents with dialog history and external knowledge". Computer Speech & Language, 2019, vol. 54, p. 71-85.
[9] Zhang, J., Huang, H., & Gui, G., "A Chatbot Design Method Using Combined Model for Business Promotion". In: International Conference in Communications, Signal Processing, and Systems. Springer, Singapore, 2018. p. 1133-1140.
[10] Amudha, S., & Shanthi, I. E., "Phrase Based Information Retrieval Analysis in Various Search Engines Using Machine Learning Algorithms". In: Data Management, Analytics and Innovation. Springer, Singapore, 2020. p. 281-293.
[11] Koka, R. S., "Automatic Keyword Detection for Text Summarization". PhD diss., 2019.
[12] Hulth, A., & Megyesi, B. B., "A study on automatically extracted keywords in text categorization". In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006. p. 537-544.
[13] Fernandez, R. R., & Uy, C., "Keywords on Online Video-ads Marketing Campaign: A Sentiment Analysis". Review of Integrative Business and Economics Research, 2020, vol. 9, p. 99-110.
[14] Siddiqi, S., & Sharan, A., "Keyword and keyphrase extraction techniques: a literature review". International Journal of Computer Applications, 2015, vol. 109, no 2.
[15] Havrlant, L., & Kreinovich, V., "A simple probabilistic explanation of term frequency-inverse document frequency (tf-idf) heuristic (and variations motivated by this explanation)". International Journal of General Systems, 2017, vol. 46, no 1, p. 27-36.
[16] Paukkeri, M., & Honkela, T., "Likey: Unsupervised language independent keyphrase extraction". In: Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden, 2010. p. 162-165.
[17] Campos, R., Mangaravite, V., Pasquali, A., Jorge, A. M., Nunes, C., & Jatowt, A., "YAKE! collection-independent automatic keyword extractor". In: European Conference on Information Retrieval. Springer, Cham, 2018. p. 806-810.
[18] Nara, N., Sharma, P., & Kumar, P., "Page Rank Algorithm: big data analytic". 2017.
[19] Mihalcea, R., & Tarau, P., "Textrank: Bringing order into text". In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 2004. p. 404-411.
[20] Wan, X., & Xiao, J., "Single Document Keyphrase Extraction Using Neighborhood Knowledge". In: AAAI. 2008. p. 855-860.
[21] Liu, Z., Huang, W., Zheng, Y., & Sun, M., "Automatic keyphrase extraction via topic decomposition". In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010. p. 366-376.
[22] Mikolov, T., Chen, K., Corrado, G., & Dean, J., "Efficient estimation of word representations in vector space". arXiv preprint arXiv:1301.3781, 2013.
[23] Hayati, H., Chanaa, A., Idrissi, M. K., & Bennani, S., "Doc2Vec & Naïve Bayes: Learners' Cognitive Presence Assessment through Asynchronous Online Discussion TQ Transcripts". International Journal of Emerging Technologies in Learning (iJET), 2019, vol. 14, no 08, p. 70-81.
[24] Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., & Fidler, S., "Skip-thought vectors". In: Advances in Neural Information Processing Systems, 2015, pp. 3294-3302.
[25] Lau, J. H., & Baldwin, T., "An empirical evaluation of doc2vec with practical insights into document embedding generation". arXiv preprint arXiv:1607.05368, 2016.
[26] Bennani-Smires, K., Musat, C., Hossmann, A., Baeriswyl, M., & Jaggi, M., "Simple unsupervised keyphrase extraction using sentence embeddings". arXiv preprint arXiv:1801.04470, 2018.
[27] Sarosa, M., Junus, M., Hoesny, M. U., Sari, Z., & Fatnuriyah, M., "Classification Technique of Interviewer-Bot Result using Naïve Bayes and Phrase Reinforcement Algorithms". International Journal of Emerging Technologies in Learning (iJET), 2018, vol. 13, no 02, p. 33-47.
[28] Witten, I. H., Paynter, G. W., Frank, E., Gutwin, C., & Nevill-Manning, C. G., "Kea: Practical automated keyphrase extraction". In: Design and Usability of Digital Libraries: Case Studies in the Asia Pacific. IGI Global, 2005. p. 129-152.
[29] Nguyen, T. D., & Luong, M. T., "WINGNUS: Keyphrase extraction utilizing document logical structure". In: Proceedings of the 5th International Workshop on Semantic Evaluation. Association for Computational Linguistics, 2010. p. 166-169.
[30] Hinton, G., et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups". IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012. DOI: 10.1109/msp.2012.2205597.
[31] Amodei, D., Ananthanarayanan, S., Anubhai, R., et al., "Deep speech 2: End-to-end speech recognition in English and Mandarin". In: International Conference on Machine Learning. 2016. p. 173-182.
[32] Lin, B., Zagalsky, A., Storey, M. A., & Serebrenik, A., "Why developers are slacking off: Understanding how software teams use Slack". In: Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion. 2016. p. 333-336.