
Effective Chatbots Using Deep Learning and Natural Language Processing

John Ivanov,¹ Prajval Sharma,² Yarwin Liu³
¹Tesoro High School, ²Cupertino High School, ³Aliso Niguel High School

SRA Track #9 - Machine Learning and Optimization


SRA Capstone Seminar
7/23
Presentation Outline
1. Introduction to Chatbots
2. Research Question & Goals
3. Significance
4. Literature Review
5. Methodology
6. Analysis
7. Conclusion
8. Acknowledgements
9. References

Introduction to Chatbots
● Software used to communicate with humans
● ELIZA, the first chatbot, was written with a predefined script
● Today, chatbots utilize complex deep neural networks to remove the constraints that come with predefining output responses
● Our research focused on generative models, where the model formulates its own response

Figure 1. Classification of Chatbot Types


Research Question & Goals
How can chatbots be trained to have more natural conversations?

Goals:
● Improve on current chatbot design
● Promote the use of bots in communication-focused departments
○ by building a conversational chatbot

Image 1. Cartoon Chatbot

Significance
● Bridges the gap between human and machine intelligence
● Great versatility: can be used for customer accommodation, assistance, and emotional companionship
● A stepping stone toward allowing human thoughts to be understood by machines
● Few departments specializing in human interaction, such as human resources, currently use chatbots

Figure 2. Use of Chatbots by Department


Literature Review
● Creating a model requires both Natural Language Understanding (NLU) and Natural Language Generation (NLG)
● Good datasets include corpora of dialogues and data from communication applications
● Cleaning the data dramatically reduces the sentence error rate
● The sequence-to-sequence model is the state of the art in this field

Methodology: Dataset
● 220,000 lines from the Cornell Movie Dialogs Corpus and 700,000 lines from a Twitter dataset (a parsing sketch follows the figure captions below)

Figure 7. Twitter Dataset
Figure 8. Cornell Dataset
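As an illustration, a minimal sketch of loading the Cornell corpus. The slides do not show any loading code, so the movie_lines.txt file name and the " +++$+++ " field separator are assumptions based on the corpus's published format:

```python
# Minimal sketch: read utterances from the Cornell Movie Dialogs Corpus.
# Assumes the corpus's published movie_lines.txt format, where fields are
# separated by " +++$+++ " and the utterance text is the last field.
def load_cornell_lines(path="movie_lines.txt", encoding="iso-8859-1"):
    lines = []
    with open(path, encoding=encoding) as f:
        for row in f:
            fields = row.strip().split(" +++$+++ ")
            if len(fields) == 5:          # lineID, characterID, movieID, character, text
                lines.append(fields[4])   # keep only the utterance text
    return lines

utterances = load_cornell_lines()
print(len(utterances))
```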

Methodology: Data Preprocessing
● Before the textual input is fed into the model, it needs to be preprocessed into a readable format
○ Punctuation and capitalization were removed
○ Uncommon words and repeated lines were removed
○ Words were padded so that all of them have the same length

Example: [‘hello’, ‘everyone’] -> [‘hello000’, ‘everyone’]


● Each word is added to a dictionary mapping it to a unique integer index (a preprocessing sketch appears below)
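A minimal sketch of this pipeline, assuming plain Python; the '0'-padding mirrors the slide's example, and the function name and thresholds are illustrative:

```python
import string
from collections import Counter

def preprocess(sentences, min_count=2):
    """Lowercase, strip punctuation, drop rare words and duplicate lines, pad words."""
    table = str.maketrans("", "", string.punctuation)
    cleaned, seen = [], set()
    for s in sentences:
        s = s.lower().translate(table)          # remove capitalization and punctuation
        if s and s not in seen:                 # drop repeated lines
            seen.add(s)
            cleaned.append(s.split())
    counts = Counter(w for s in cleaned for w in s)
    cleaned = [[w for w in s if counts[w] >= min_count] for s in cleaned]  # drop uncommon words
    longest = max((len(w) for s in cleaned for w in s), default=0)
    padded = [[w.ljust(longest, "0") for w in s] for s in cleaned]         # 'hello' -> 'hello000'
    vocab = {w: i for i, w in enumerate(sorted({w for s in padded for w in s}))}
    return padded, vocab

padded, vocab = preprocess(["Hello, everyone!", "Hello, everyone!"], min_count=1)
print(padded)   # [['hello000', 'everyone']]
```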

Methodology: Word Embeddings vs One Hot Encoding
● One-hot encoding is used to convert each word input into vector form (see the sketch below)
○ Examples:
■ Outstanding = [1,0,0,0]
■ Amazing = [0,1,0,0]
■ Awesome = [0,0,1,0]
■ Great = [0,0,0,1]
● The problem with this approach is that synonyms are encoded as unrelated vectors, so the model treats them as entirely different words
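A minimal sketch of this encoding, assuming Python with NumPy; the vocabulary is the slide's four-word example:

```python
import numpy as np

vocab = ["outstanding", "amazing", "awesome", "great"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # Each word maps to a vector with a single 1 at its vocabulary index.
    v = np.zeros(len(vocab), dtype=int)
    v[index[word]] = 1
    return v

print(one_hot("amazing"))                       # [0 1 0 0]
# All pairs of distinct words are orthogonal, so similarity information is lost:
print(one_hot("amazing") @ one_hot("awesome"))  # 0
```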

Methodology: Word Embeddings vs One Hot Encoding, continued

● Word Embeddings
○ A representation of text where words that have similar meanings have similar representations
○ Most commonly used in an embedding layer
■ A word embedding learned jointly with a neural network model on a specific NLP task such as language modeling
■ Input to the layer must be a unique integer representation of each word
■ Initialized with random weights; learns an embedding for all the words in the dataset
■ Arguments to be specified include the input dimension, output dimension, and the input length (see the sketch below)
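The three arguments named above match the Embedding layer in Keras; a minimal sketch under that assumption, with illustrative sizes (note that newer Keras versions infer input_length automatically):

```python
# Minimal sketch of an embedding layer (assuming TensorFlow/Keras; sizes are illustrative).
import numpy as np
import tensorflow as tf

embedding = tf.keras.layers.Embedding(
    input_dim=8000,    # vocabulary size: number of unique integer word indices
    output_dim=128,    # dimension of each learned word vector
    input_length=20,   # length of each (padded) input sequence
)

# Input: integer word indices, as produced by the dictionary from preprocessing.
ids = np.zeros((1, 20), dtype=int)
ids[0, :4] = [4, 17, 391, 2]
vectors = embedding(ids)
print(vectors.shape)   # (1, 20, 128): one 128-d vector per word, learned during training
```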

Methodology: Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM)

● Advantages
○ Avoid the vanishing gradient problem commonly associated with plain recurrent networks
○ Better at carrying relevant information across long sequences
● Main Idea
○ Gates filter out unnecessary information/words from a sentence that do not match the intent
○ Example: This show is amazing. It is not bad. I need to buy tickets immediately.
■ The words the LSTM picks up are the ones that match the intent

Methodology: Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM), continued

● Structure
○ LSTM
■ 3 gates
● Input gate
● Forget gate
● Output gate
○ GRU
■ 2 gates
● Reset gate
● Update gate

Figure 3. Comparison of GRU and LSTM
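This structural difference shows up directly in parameter counts; a minimal sketch, assuming TensorFlow/Keras with illustrative sizes:

```python
# Minimal sketch (assuming TensorFlow/Keras): the LSTM's extra gate means
# roughly 4 weight sets versus the GRU's 3, visible in the parameter counts.
import tensorflow as tf

inputs = tf.keras.Input(shape=(20, 64))          # 20 timesteps, 64 features (illustrative)
lstm_out = tf.keras.layers.LSTM(128)(inputs)     # input, forget, output gates + cell candidate
gru_out = tf.keras.layers.GRU(128)(inputs)       # reset, update gates + candidate state

print(tf.keras.Model(inputs, lstm_out).count_params())  # 4 * 128 * (64 + 128 + 1)
print(tf.keras.Model(inputs, gru_out).count_params())   # 3 * 128 * (64 + 128 + 2)
```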

Methodology: Sequence to Sequence Model
● Used a sequence-to-sequence model (a sketch follows the figure caption below)
○ A 3-layer neural network
○ Includes an encoder and a decoder
○ Both encoder and decoder are recurrent neural networks

Figure 4. Sequence to Sequence Model
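A minimal encoder-decoder sketch, assuming TensorFlow/Keras with GRU units and illustrative sizes (the slides do not specify the framework or layer sizes):

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, units = 8000, 128, 256   # illustrative sizes

# Encoder: embeds the input sentence and compresses it into a final hidden state.
enc_in = tf.keras.Input(shape=(None,))
enc_emb = layers.Embedding(vocab_size, embed_dim)(enc_in)
_, enc_state = layers.GRU(units, return_state=True)(enc_emb)

# Decoder: generates the reply token by token, seeded with the encoder's state.
dec_in = tf.keras.Input(shape=(None,))
dec_emb = layers.Embedding(vocab_size, embed_dim)(dec_in)
dec_seq = layers.GRU(units, return_sequences=True)(dec_emb, initial_state=enc_state)
probs = layers.Dense(vocab_size, activation="softmax")(dec_seq)

model = tf.keras.Model([enc_in, dec_in], probs)
model.summary()
```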


Methodology: Encoder

Formula 1. Describes how each hidden state in the encoder is calculated
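In the standard formulation for a vanilla RNN encoder (cf. [3]), this recurrence is:

$$h_t = f\left(W^{(hh)} h_{t-1} + W^{(hx)} x_t\right)$$

where $x_t$ is the embedded input word at step $t$, $h_{t-1}$ is the previous hidden state, and $f$ is the recurrent unit's activation.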

Figure 5. Encoder Computation Graph [6]

Methodology: Decoder

Formula 2. Describes how each hidden state is calculated

Formula 3. How the output of the decoder is calculated
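In the same standard formulation (cf. [3]), the decoder's hidden state evolves from the previous state alone, and each output word distribution is a softmax over the vocabulary:

$$h_t = f\left(W^{(hh)} h_{t-1}\right), \qquad y_t = \mathrm{softmax}\left(W^{(S)} h_t\right)$$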


Figure 6. Decoder Computation Graph [6]

Analysis: Results
● The table on the left shows examples of outputs that correctly responded to each input, while the table on the right shows examples of outputs that did not respond properly
● The chatbot was able to respond correctly 72.2% of the time

Input                          Output
Where are you                  We have one before the sun very badly
Are you funny                  Do you think youll make sure I want one
Do you want to be president    I didnt get the message

Figure 9. Positive example inputs and outputs from the chatbot
Figure 10. Negative example inputs and outputs from the chatbot

Analysis: Learning Rate
● Initially, we set the learning rate higher and noticed that the model produced a higher loss despite a higher accuracy
○ One possibility is that the model was overconfident in its predictions
● We settled on a learning rate of 1e-4 (a configuration sketch appears below)
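A minimal sketch of that setting, reusing the model from the sequence-to-sequence sketch; only the 1e-4 learning rate comes from the slides, so the Adam optimizer and cross-entropy loss are assumptions:

```python
import tensorflow as tf

# Only the 1e-4 learning rate is specified on the slide; Adam and the
# sparse cross-entropy loss are assumed for this sketch.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```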

Analysis: Learning Loss and Accuracy
● Loss decreased from 1.3 (epoch 1) to 0.4 (epoch 50)
● Accuracy grew from 38.9% (epoch 1) to 72.2% (epoch 50)

Figure 11. Loss vs. Epoch Graph
Figure 12. Accuracy vs. Epoch Graph

Conclusion
● Due to the nature of our dataset, slang was incorporated
○ The bot understood popular topics such as the presidential election
● Ended with 72.2% accuracy
● Could not respond to personal questions
○ We plan on building a “personal information document” saved in memory
○ Then, if a question is classified as personal, a retrieval-based model can answer from that document
● Has great potential for providing emotional-support services
● An improved encoder could read between the lines

Acknowledgements
We would like to thank Ryan Solgi, Laboni Sarker, S. Shailja, Dr. Lina Kim, and the
SRA staff for their support.

References
[1] Ayanouz, Soufyane, et al. “A Smart Chatbot Architecture Based NLP and Machine Learning for Health Care Assistance.” ResearchGate, Association for Computing Machinery, 31 Mar. 2020, www.researchgate.net/publication/340678278_A_Smart_Chatbot_Architecture_based_NLP_and_Machine_Learning_for_Health_Care_Assistance.

[2] Brain. “Chatbot Report 2019: Global Trends and Analysis.” Medium, Chatbots Magazine, 19 Apr. 2019, chatbotsmagazine.com/chatbot-report-2019-global-trends-and-analysis-a487afec05b.

[3] Chablani, Manish. “Sequence to Sequence Model: Introduction and Concepts.” Medium, Towards Data Science, 23 June 2017, towardsdatascience.com/sequence-to-sequence-model-introduction-and-concepts-44d9b41cd42.

[4] “Chatbot Tutorial.” PyTorch Tutorials 1.9.0+cu102 Documentation, PyTorch, pytorch.org/tutorials/beginner/chatbot_tutorial.html?highlight=chatbot+tutorial.

[5] Fang, Hao, et al. “Sounding Board: A User-Centric and Content-Driven Social Chatbot.” arXiv, Cornell University, 26 Apr. 2018, arxiv.org/abs/1804.10202.

[6] Jwala, K., Sirisha, G. N. V. G., & Raju, G. V. P. “Developing a Chatbot Using Machine Learning.” June 2019, www.ijrte.org/wp-content/uploads/papers/v8i1S3/A10170681S319.pdf.

[7] Liu, Bing, et al. “Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems.” ACL Anthology, Association for Computational Linguistics, June 2018, aclanthology.org/N18-1187/.

[8] Mazaré, Pierre-Emmanuel, et al. “Training Millions of Personalized Dialogue Agents.” arXiv, Cornell University, 6 Sept. 2018, arxiv.org/abs/1809.01984.

[9] Siddhant, Aditya, et al. “Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents.” arXiv, Carnegie Mellon University, 13 Nov. 2018, arxiv.org/pdf/1811.05370.pdf.

[10] Suta, P., Lang, X., Wu, B., Mongkolnam, P., & Chan, J. H. “An Overview of Machine Learning in Chatbots.” 4 Apr. 2020, www.ijmerr.com/uploadfile/2020/0312/20200312023706525.pdf.

Questions
Contact Information: [email protected], [email protected], [email protected]
