Final Presentation
Introduction to Chatbots
● Software used to communicate with humans
● ELIZA, one of the earliest chatbots, was written with a predefined script
● Today, chatbots use complex deep neural networks to remove the constraints that come with
predefining output responses
● Our research focused on generative models, in which the model formulates its own response
Goals:
● Improve on current chatbot design
● Promote bots in communication departments
○ by building a conversational chatbot
Significance
● Bridging the gap between human and machine intelligence
● Great versatility: can be used for customer service, personal assistance, and emotional companionship
● Stepping stone toward machines that can understand human thoughts
● Few departments specializing in human interaction, such as human resources, currently use chatbots
Methodology: Dataset
● 220,000 lines from the Cornell Movie-Dialogs Corpus and 700,000 lines from Twitter
Methodology: Data Preprocessing
● Before the textual input is fed into the model, it needs to be preprocessed into
a readable format (a sketch follows this list)
○ Punctuation and capitalization were removed
○ Uncommon words and repeated lines were removed
○ Sentences were padded so that all input sequences have the same length
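A minimal Python sketch of this preprocessing pipeline. The cutoff for "uncommon" words (MIN_COUNT) and the fixed sequence length (MAX_LEN) are illustrative assumptions, not the project's exact settings.

```python
import re
from collections import Counter

MIN_COUNT = 3       # assumed cutoff for "uncommon" words
MAX_LEN = 10        # assumed fixed sequence length
PAD_TOKEN = "<pad>"

def clean(line):
    # Remove capitalization and punctuation.
    line = line.lower()
    return re.sub(r"[^a-z0-9' ]", "", line)

def preprocess(lines):
    lines = [clean(l) for l in lines]
    lines = list(dict.fromkeys(lines))  # drop repeated lines, keep order
    counts = Counter(w for l in lines for w in l.split())
    result = []
    for l in lines:
        # Drop uncommon words, then truncate/pad to a fixed length.
        words = [w for w in l.split() if counts[w] >= MIN_COUNT]
        words = words[:MAX_LEN] + [PAD_TOKEN] * max(0, MAX_LEN - len(words))
        result.append(words)
    return result
```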
Methodology: Word Embeddings vs. One-Hot Encoding
● One-hot encoding converts each word input into vector form
○ Examples:
■ Outstanding = [1,0,0,0]
■ Amazing = [0,1,0,0]
■ Awesome = [0,0,1,0]
■ Great = [0,0,0,1]
● The problem with this approach is that synonyms are counted as different
words, so the model treats them differently (see the sketch below)
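A short sketch of the four-word example above, showing why one-hot vectors carry no notion of similarity (the vocabulary list is just the slide's example):

```python
import numpy as np

vocab = ["outstanding", "amazing", "awesome", "great"]

def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# "amazing" and "awesome" are synonyms, yet their vectors are orthogonal:
print(one_hot("amazing"))                       # [0. 1. 0. 0.]
print(one_hot("amazing") @ one_hot("awesome"))  # 0.0 -> no similarity
```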
Methodology: Word Embeddings vs. One-Hot Encoding, continued
● Word Embeddings
○ Representation of text where words that have similar meanings have a similar representation
○ Most commonly used in an embedding layer
■ A word embedding that is learned jointly with a neural network model on a specific NLP
task such as language modeling
■ Input to the layer must be integer-encoded, with a unique integer per word
■ Initialized with random weights and will learn an embedding for all the words in the
dataset
■ Arguments to be specified include the input dimension, output dimension, and the input
length (see the sketch below)
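A hedged Keras-style sketch of such an embedding layer; the dimensions below are illustrative assumptions, not the project's actual hyperparameters.

```python
from tensorflow.keras.layers import Embedding
from tensorflow.keras.models import Sequential

VOCAB_SIZE = 8000  # input dimension: number of unique integer word IDs (assumed)
EMBED_DIM = 128    # output dimension: size of each learned word vector (assumed)
MAX_LEN = 10       # input length: padded sequence length (assumed)

# Weights start random; the embedding is learned jointly with the NLP task.
model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM, input_length=MAX_LEN)
])
```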
Methodology: Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM)
● Advantages
○ Avoids the vanishing gradient problem commonly associated with standard recurrent networks
○ Advantage in retaining short-term memory across long sequences
● Main Idea
○ Gates filter out unnecessary information/words from a sentence that do not match the
intent
○ Example: This show is amazing. It is not bad. I need to buy tickets immediately.
■ The words the LSTM picks up are the ones that match the intent (highlighted on the
original slide)
Methodology: Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM), continued
● Structure (both cells are sketched in code after this list)
○ LSTM
■ 3 gates
● Input gate
● Forget gate
● Output gate
○ GRU
■ 2 gates
● Reset gate
● Update gate
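In a framework like Keras, both cells are available as drop-in layers; a minimal sketch (the unit count is an assumption for illustration):

```python
from tensorflow.keras.layers import LSTM, GRU

# LSTM cell: input, forget, and output gates.
lstm_layer = LSTM(units=256, return_sequences=True)

# GRU cell: reset and update gates (fewer parameters, often similar accuracy).
gru_layer = GRU(units=256, return_sequences=True)
```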
Methodology: Sequence to Sequence Model
● Used a sequence-to-sequence model (sketched below)
○ A 3-layer neural network
○ Includes an encoder and a decoder
○ Both the encoder and decoder are recurrent neural networks
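A hedged Keras-style sketch of an encoder-decoder model along these lines; the layer sizes are illustrative, and the project's exact architecture may differ.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

VOCAB_SIZE, EMBED_DIM, UNITS, MAX_LEN = 8000, 128, 256, 10  # assumed sizes

# Encoder: a recurrent network that summarizes the input sentence
# into its final hidden states.
enc_in = Input(shape=(MAX_LEN,))
enc_emb = Embedding(VOCAB_SIZE, EMBED_DIM)(enc_in)
_, state_h, state_c = LSTM(UNITS, return_state=True)(enc_emb)

# Decoder: a recurrent network that generates the response word by word,
# initialized with the encoder's final states.
dec_in = Input(shape=(MAX_LEN,))
dec_emb = Embedding(VOCAB_SIZE, EMBED_DIM)(dec_in)
dec_seq, _, _ = LSTM(UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
probs = Dense(VOCAB_SIZE, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], probs)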
Methodology: Decoder
Analysis: Results
● The table on the left shows examples of output that correctly responded to each input; the
table on the right shows examples of output that did not respond properly
● The chatbot was able to correctly respond 72.2% of the time
[Tables of example inputs and chatbot outputs]
Analysis: Learning Rate
● Initially, we set the learning rate higher and noticed that the model was giving
a higher loss despite a higher accuracy
○ One possibility was that the model was overconfident in its predictions: cross-entropy
penalizes confident wrong predictions heavily, so loss can stay high even as accuracy improves
● We settled on a learning rate of 1e-4 (see the sketch below)
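Only the 1e-4 learning rate comes from the slides; the Adam optimizer and loss function below are assumptions for a runnable sketch, continuing the seq2seq model defined earlier.

```python
from tensorflow.keras.optimizers import Adam

# Adam and sparse categorical cross-entropy are assumptions; the slides
# specify only the 1e-4 learning rate.
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```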
Analysis: Learning Loss and Accuracy
● Loss decreased from 1.3 (epoch 1) to 0.4 (epoch 50)
● Accuracy grew from 38.9% (epoch 1) to 72.2% (epoch 50)
Figure 11. Loss vs. Epoch Graph
Conclusion
● Due to the nature of our dataset, slang was incorporated
○ The bot understood popular topics such as the presidential election
● Ended with 72% accuracy
● Could not respond to personal questions
○ We plan to build a “personal information document” saved in memory
○ Then, if a question is classified as personal, a retrieval-based model can be used
● Has potential to provide valuable emotional-support services
● An improved encoder could read between the lines (infer meaning that is not stated explicitly)
Acknowledgements
We would like to thank Ryan Solgi, Laboni Sarker, S. Shailja, Dr. Lina Kim, and the
SRA staff for their support.
Questions
Contact Information: [email protected], [email protected], [email protected]