Syed Imran
Abstract:
The pursuit of human-like conversational artificial intelligence (AI) in open domain chatbot
development presents both challenges and opportunities at the forefront of natural language
processing (NLP) research. This paper explores the multifaceted landscape of developing
chatbots capable of engaging in human-like conversations, addressing challenges such as
natural language understanding (NLU), contextual understanding, and generating human-like
responses. We examine the complexities of NLU, including handling ambiguity, context,
humour, and idiomatic expressions, as well as the importance of maintaining context and
memory over multiple turns in conversation. Additionally, we delve into the intricacies of
natural language generation (NLG) and the incorporation of emotional understanding in
chatbot responses. Ethical considerations, including privacy concerns, bias mitigation, and
fairness in conversational AI, are also explored. Despite these challenges, recent
advancements in deep learning architectures, transfer learning, and multimodal integration
offer promising opportunities for advancing human-like conversational AI. We discuss the
potential of context-aware and personalized chatbots, as well as the development of ethical
guidelines and frameworks for responsible AI design. Through case studies, examples, and
future directions, we elucidate the path forward for researchers and developers striving to
create chatbots that emulate human-like conversational abilities while upholding ethical
standards and promoting transparency.
Keywords: Artificial Intelligence (AI), Open Domain Chatbot, Natural Language
Understanding (NLU).
1. INTRODUCTION
Conversational AI represents a paradigm shift in human-computer interaction, aiming
to create intelligent systems capable of engaging in natural and meaningful conversations
with users. At its core, conversational AI seeks to bridge the gap between humans and
machines by enabling computers to understand and generate human language in a manner
that is intuitive, contextually relevant, and emotionally intelligent. This transformative
technology holds immense potential across a wide range of applications, including virtual
assistants, customer service chatbots, healthcare support systems, and educational
platforms.
One of the defining features of conversational AI is its ability to comprehend and
respond to natural language inputs from users [1]. This capability is facilitated by
advancements in natural language processing (NLP), which enable computers to analyse and
interpret text or speech data, extract relevant information, and generate appropriate responses.
Through techniques such as named entity recognition, sentiment analysis, and language
modelling, conversational AI systems can understand the intent behind user queries, discern
context, and tailor responses accordingly.
Furthermore, conversational AI systems strive to emulate human-like conversational
behaviour by incorporating elements of context, empathy, and personality into their
interactions [2]. By maintaining context over multiple turns of
dialogue, recognizing subtle cues such as tone of voice or facial expressions, and adapting
responses based on user preferences or past interactions, these systems aim to create more
engaging and personalized experiences for users.
This human-centric approach not only
enhances user satisfaction but also fosters deeper engagement and trust in AI-powered
interactions.
In addition to understanding and generating natural language, conversational AI
often integrates other modalities such as images, videos, or gestures to enrich the
communication experience. Multimodal interfaces enable more expressive and intuitive
interactions, allowing users to convey information more effectively and enabling AI systems
to provide richer and more informative responses [3]. By leveraging multiple modalities,
conversational AI systems can better understand user intent, disambiguate ambiguous queries,
and provide more contextually relevant and accurate information.
Overall, conversational AI
represents a transformative technology with the potential to revolutionize how humans
interact with machines. As these systems continue to evolve and improve, they hold the
promise of enabling more natural, seamless, and effective communication between humans
and computers, opening up new possibilities for innovation and enhancing the way we live,
work, and interact with technology.
The importance of human-like conversational AI in open domain chatbot development
cannot be overstated, as it directly impacts the user experience and the effectiveness of
interactions between humans and machines. A chatbot's ability to emulate human
conversation fosters engagement, trust, and satisfaction among users, ultimately enhancing
the overall utility and adoption of AI-powered systems.
Human-like conversational AI is
crucial for creating chatbots that users can interact with naturally and effortlessly. By
understanding and responding to natural language inputs in a manner that mimics human
conversation, chatbots can effectively bridge the gap between users and technology, making
interactions more intuitive and user-friendly. This human-centric approach is particularly
important in open domain chatbot development, where users may engage with the system on
a wide range of topics and in various contexts [4].
Moreover, human-like conversational AI
enables chatbots to adapt to the nuances of human communication, including humour,
sarcasm, and idiomatic expressions. By recognizing and appropriately responding to these
linguistic cues, chatbots can engage users in more authentic and meaningful conversations,
leading to higher levels of user satisfaction and retention. This adaptability is essential for
creating chatbots that can effectively handle the diverse and unpredictable nature of open
domain interactions.
Furthermore, human-like conversational AI enhances the ability of
chatbots to provide personalized and contextually relevant responses to user queries [5-6]. By
maintaining context over the course of a conversation and leveraging information about the
user's preferences, history, and situational context, chatbots can tailor their responses to meet
the individual needs and preferences of each user. This personalization not only improves the
user experience but also increases the likelihood of achieving the desired outcome from the
interaction [7].
In summary, human-like conversational AI is integral to the development of
open domain chatbots that are capable of providing engaging, natural, and effective
interactions with users. By emulating human conversation and adapting to the nuances of
human communication, chatbots can foster deeper engagement, trust, and satisfaction among
users, ultimately driving the widespread adoption and success of AI-powered systems in
various domains.
2. CHALLENGES IN DEVELOPING HUMAN-LIKE CONVERSATIONAL AI
A. Natural language understanding (NLU)
Natural language understanding (NLU) is a critical component of artificial
intelligence (AI) systems that enables computers to comprehend and interpret human
language. NLU is essential for a wide range of applications, including virtual assistants,
chatbots, sentiment analysis, and language translation [8]. At its core, NLU seeks to bridge
the gap between human communication and machine processing, allowing computers to
extract meaning, infer intent, and generate appropriate responses from text or speech data.
The
complexity of natural language poses significant challenges for NLU systems, as human
language is inherently ambiguous, context-dependent, and rich in nuances. NLU systems
must be able to decipher the meaning of words and phrases in different contexts, understand
syntactic and semantic relationships between words, and infer the underlying intent or
sentiment behind a given utterance. This requires sophisticated algorithms and techniques
from the field of natural language processing (NLP), including parsing, semantic analysis,
and machine learning.
One of the key challenges in NLU is handling ambiguity and
uncertainty in human language. Words and phrases can have multiple meanings depending on
the context in which they are used, making it difficult for NLU systems to accurately interpret
user input. For example, the word "bank" can refer to a financial institution or the side of a
river, and determining which meaning is intended requires contextual understanding.
Another challenge in NLU is understanding context and maintaining
coherence over the course of a conversation. Human communication is inherently sequential
and context-dependent, with meaning often derived from the surrounding discourse. NLU
systems must be able to track context, remember previous interactions, and adapt their
responses accordingly to ensure a coherent and meaningful dialogue.
Furthermore, NLU
systems must be able to understand and process variations in language, including slang,
dialects, and idiomatic expressions [9]. Human language is constantly evolving and diverse,
with regional and cultural differences influencing the way people communicate. NLU
systems must be robust enough to handle these variations and accurately interpret user input
regardless of linguistic differences.
Despite these challenges, advancements in NLP, machine
learning, and deep learning have led to significant progress in NLU technology. State-of-the-
art NLU systems leverage large-scale language models, neural network architectures, and
pre-trained embeddings to achieve impressive levels of accuracy and performance across a
wide range of tasks. By continuously refining and improving NLU capabilities, researchers
and developers are paving the way for more intelligent, intuitive, and natural interactions
between humans and machines.
Ambiguity and context pose substantial hurdles for natural language understanding
(NLU) systems, complicating the interpretation of human language in AI applications. Words
and phrases can carry multiple meanings depending on the context in which they are used,
making it challenging for NLU systems to accurately discern the intended interpretation [10].
For instance, the word "bank" can refer to a financial institution or the side of a river,
necessitating context to determine the correct interpretation. Resolving ambiguity requires
sophisticated algorithms capable of analysing surrounding text, identifying contextual clues,
and inferring the most likely meaning based on the broader context. Additionally, ambiguity
resolution may necessitate domain-specific knowledge or linguistic context to disambiguate
terms effectively. Without robust mechanisms to handle ambiguity and context, NLU systems
risk misinterpreting user input, leading to erroneous responses and degraded user
experiences.
Understanding humour, sarcasm, and idiomatic expressions represents another
significant challenge for NLU systems, as these linguistic phenomena often rely on
implicit cues and non-literal interpretations [11-13]. Humour, for example, frequently
involves wordplay, irony, or unexpected juxtapositions that defy literal interpretation.
Similarly, sarcasm relies on the use of tone, intonation, and context to convey meaning,
making it difficult for NLU systems to recognize and respond appropriately to sarcastic
remarks. Idiomatic expressions further compound the challenge by introducing culturally
specific phrases or figures of speech that may not have a direct equivalent in other languages
or contexts. Effectively understanding humour, sarcasm, and idiomatic expressions requires
NLU systems to possess a deep understanding of linguistic nuances, social context, and
cultural references. Furthermore, it necessitates the integration of advanced natural language
processing techniques, such as sentiment analysis, pragmatics, and discourse understanding,
to accurately interpret and respond to these forms of linguistic complexity. Failure to account
for humour, sarcasm, and idiomatic expressions can result in miscommunication,
misunderstandings, and decreased user satisfaction in AI-driven conversational systems.
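To make the ambiguity problem concrete, the sketch below uses contextual embeddings from a pre-trained BERT model to separate the financial and riverside senses of "bank". It is our illustration, not a published system; it assumes the Hugging Face `transformers` library, the public `bert-base-uncased` checkpoint, and arbitrary example sentences:

```python
# A minimal sketch (our illustration) of sense disambiguation via contextual
# embeddings. Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of `word`'s first occurrence in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_dim)
    position = (inputs["input_ids"][0] == tokenizer.convert_tokens_to_ids(word)).nonzero()[0]
    return hidden[position.item()]

finance = embed_word("I deposited cash at the bank this morning.", "bank")
river = embed_word("We walked along the bank of the river.", "bank")
query = embed_word("The bank approved my loan application.", "bank")

# The same surface form yields different vectors in different contexts; cosine
# similarity indicates which sense the new usage is closer to.
cos = torch.nn.functional.cosine_similarity
print("query vs finance sense:", cos(query, finance, dim=0).item())
print("query vs river sense:  ", cos(query, river, dim=0).item())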
3. OPPORTUNITIES FOR ADVANCING HUMAN-LIKE CONVERSATIONAL AI
A. Advancements in deep learning
1. Deep learning architectures
Several deep learning architectures underpin modern natural language understanding
and generation:
1. Recurrent Neural Networks (RNNs): RNNs are a class of neural networks designed to
process sequential data, making them well-suited for tasks such as language modelling,
sentiment analysis, and machine translation. RNNs have a recurrent structure that allows
them to maintain a hidden state representing the context of previous tokens in the input
sequence, enabling them to capture temporal dependencies and sequential patterns in text
data.
2. Long Short-Term Memory (LSTM) Networks: LSTMs are a type of RNN architecture
designed to address the vanishing gradient problem, which can hinder the training of standard
RNNs on long sequences of data. LSTMs incorporate gated units, including input gates,
forget gates, and output gates, which enable them to selectively update and retain information
over long time steps, making them well-suited for tasks requiring memory and context
preservation, such as language modelling and text generation.
3. Gated Recurrent Unit (GRU) Networks: GRUs are a variant of the LSTM architecture that
simplifies the gating mechanism while retaining similar capabilities for modelling long-range
dependencies in sequential data. GRUs have fewer parameters than LSTMs, making them
computationally more efficient and easier to train, while still achieving comparable
performance on many NLP tasks.
4. Convolutional Neural Networks (CNNs): Although initially developed for image
processing tasks, CNNs have also been adapted for NLP tasks, such as text classification and
sentiment analysis. In text processing, CNNs operate on one-dimensional sequences of word
embeddings or character embeddings, applying convolutional filters of varying sizes to
capture local patterns and features in the input text. CNNs are particularly effective for tasks
requiring feature extraction from fixed-size windows of text, such as identifying n-gram
patterns or local context.
5. Transformer Networks: Transformers represent a breakthrough in NLP architecture,
introduced by the seminal "Attention Is All You Need" paper [14]. Transformers dispense with
recurrent connections altogether, relying instead on self-attention mechanisms to capture
global dependencies between words in a sequence. This architecture has become the
backbone of state-of-the-art models such as BERT and GPT across a wide range of NLP
tasks, including language translation, achieving remarkable performance improvements and
scalability compared to traditional recurrent architectures (see the self-attention sketch
below).
These deep learning architectures have significantly advanced the state-of-the-art in NLP,
enabling more accurate, efficient, and scalable solutions to a wide range of tasks, from
language modelling and sentiment analysis to machine translation and question answering. As
research in deep learning continues to evolve, we can expect further innovations and
improvements in NLP architectures, driving continued progress in natural language
understanding and generation.
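As a concrete illustration of the mechanism that distinguishes Transformers from recurrent architectures, the following minimal PyTorch sketch implements single-head scaled dot-product self-attention. The dimensions and class structure are our own illustrative choices, not a reference implementation:

```python
# A minimal sketch of scaled dot-product self-attention, the core operation of
# the Transformer architecture [14]. Sizes and names are illustrative assumptions.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, embed_dim: int):
        super().__init__()
        # Learned projections mapping each token to query, key, and value vectors.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scores compare every position with every other position, capturing
        # global dependencies without any recurrence.
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = scores.softmax(dim=-1)       # (batch, seq_len, seq_len)
        return weights @ v                     # context-mixed representations

tokens = torch.randn(2, 10, 64)                # a batch of 10-token sequences
print(SelfAttention(64)(tokens).shape)         # torch.Size([2, 10, 64])
```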
2. Transfer learning and pre-trained language models
Transfer learning and pre-trained language models have revolutionized natural language
processing (NLP) by enabling the transfer of knowledge learned from one task or domain to
another, leading to significant improvements in model performance and efficiency. This
approach leverages large-scale pre-trained models trained on vast amounts of text data,
typically using unsupervised learning techniques, and fine-tunes them on specific
downstream tasks with smaller labelled datasets. Several key concepts and techniques
underpin transfer learning and pre-trained language models:
1. Pre-trained Language Models: Pre-trained language models, such as OpenAI's GPT
(Generative Pre-trained Transformer) and Google's BERT (Bidirectional Encoder
Representations from Transformers), are large-scale neural network architectures trained on
massive corpora of text data. These models learn to predict the next word in a sequence (in
the case of GPT) or to predict missing words in a sentence (in the case of BERT) based on the
surrounding context. By training on vast amounts of unlabelled text data, pre-trained
language models capture rich semantic and syntactic representations of language, enabling
them to perform well on a wide range of NLP tasks without task-specific fine-tuning (a
masked-word sketch follows this list).
2. Fine-tuning: Fine-tuning involves taking a pre-trained language model and adapting it to a
specific downstream task or domain by further training it on task-specific labelled data.
During fine-tuning, the parameters of the pre-trained model are updated using supervised
learning techniques to optimize performance on the target task. Fine-tuning allows the model
to leverage the knowledge and representations learned from the pre-training phase while
adapting to the nuances and requirements of the target task, leading to improved performance
and generalization.
3. Task-specific Head: In transfer learning with pre-trained language models, the final layer
or layers of the model, known as the task-specific head, are replaced or modified to suit the
requirements of the target task. For example, in text classification tasks, the task-specific
head may consist of a fully connected layer followed by a softmax activation function to
predict the class label. By customizing the task-specific head, the model can be tailored to the
specific output requirements of the target task while still benefiting from the rich
representations learned during pre-training (a fine-tuning sketch appears at the end of this
subsection).
4. Domain Adaptation: Transfer learning with pre-trained language models also facilitates
domain adaptation, allowing models to generalize to new domains or data distributions with
minimal labelled data. By fine-tuning a pre-trained model on labelled data from the target
domain, the model can adapt its representations to better capture the characteristics and
nuances of the new domain, leading to improved performance on domain-specific tasks.
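The masked-word objective described in item 1 can be demonstrated in a few lines. The sketch below assumes the Hugging Face `transformers` pipeline API and the public `bert-base-uncased` checkpoint; the example sentence is arbitrary:

```python
# A brief sketch of BERT's masked-word pre-training objective. Assumes the
# `transformers` library and the public `bert-base-uncased` checkpoint.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("I deposited the cheque at the [MASK].")[:3]:
    # Each prediction carries the filled-in token and the model's confidence.
    print(prediction["token_str"], round(prediction["score"], 3))
```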
Overall, transfer learning and pre-trained language models have become indispensable tools
in NLP, enabling researchers and practitioners to build more accurate, efficient, and scalable
models for a wide range of tasks. By leveraging the knowledge and representations learned
from pre-training and fine-tuning them on specific downstream tasks, transfer learning with
pre-trained language models has unlocked new levels of performance and capabilities in NLP,
driving significant advancements in natural language understanding and generation.
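The following sketch illustrates fine-tuning a pre-trained model with a task-specific classification head using the Hugging Face `Trainer` API. The dataset, label count, and hyperparameters are placeholders for demonstration, not recommendations:

```python
# A hedged sketch of fine-tuning with a task-specific classification head.
# Assumes the Hugging Face `transformers` library; dataset and hyperparameters
# are placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# The pre-trained encoder is loaded with a freshly initialised classification
# head (a linear layer over the pooled representation) for a two-class task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

# `train_dataset` is assumed to be a `datasets.Dataset` with "text" and "label"
# columns, e.g. obtained via `datasets.load_dataset(...)`:
# train_dataset = train_dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="finetuned-model",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)
# trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()  # updates all weights, adapting pre-trained representations
```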
B. Integration of multimodal inputs
1. Incorporating visual and auditory cues
The integration of multimodal inputs, including visual and auditory cues, is a rapidly
evolving area of research within artificial intelligence (AI) that aims to enhance the
capabilities of AI systems by enabling them to process and understand information from
multiple modalities simultaneously. By combining inputs from different modalities, such as
text, images, and audio, AI systems can achieve a more comprehensive understanding of the
world and perform more sophisticated tasks that require multimodal perception and
reasoning. Several key approaches and techniques are being explored to facilitate the
integration of multimodal inputs:
1. Fusion of Modalities: One approach to integrating multimodal inputs involves fusing
information from different modalities into a unified representation that captures the combined
features and characteristics of the input data. This may involve concatenating feature vectors
from each modality and feeding them into a shared neural network architecture for joint
processing. Alternatively, modalities may be fused at a higher level of abstraction, such as
through attention mechanisms or gating mechanisms that dynamically weight the
contributions of each modality based on their relevance to the task at hand (a simple
concatenation-based fusion sketch appears at the end of this subsection).
2. Cross-Modal Learning: Cross-modal learning techniques aim to leverage correlations and
relationships between different modalities to facilitate learning across modalities. For
example, models may be trained to predict one modality (e.g., text) based on input from
another modality (e.g., images), encouraging the model to learn meaningful representations
that capture shared semantic information between modalities. Cross-modal learning enables
AI systems to leverage information from multiple modalities to improve performance on
tasks such as image captioning, audio-visual speech recognition, and multimodal sentiment
analysis.
3. Attention Mechanisms: Attention mechanisms play a crucial role in multimodal integration
by enabling AI systems to selectively focus on relevant information from each modality while
ignoring irrelevant or noisy inputs. By dynamically allocating attention across modalities
based on the task and context, attention mechanisms allow AI systems to effectively integrate
information from different sources and weigh the contributions of each modality based on
their salience and informativeness. This enables more flexible and adaptive multimodal
processing, improving performance on tasks such as visual question answering and
multimodal translation.
4. Multimodal Pre-training: Pre-training techniques, similar to those used in natural language
processing, can also be extended to multimodal learning settings. Models may be pre-trained
on large-scale multimodal datasets, such as image-text pairs or video-audio-text triplets, using
unsupervised or self-supervised learning objectives. Pre-training enables models to learn rich
representations of multimodal data, which can then be fine-tuned on downstream tasks with
smaller labelled datasets. This approach has shown promising results in tasks such as image-
text retrieval, visual grounding, and multimodal dialogue generation.
By integrating visual and auditory cues with other modalities such as text, AI systems can
achieve a more comprehensive understanding of the world and perform a wide range of tasks
that require multimodal perception and reasoning. As research in multimodal integration
continues to advance, we can expect to see increasingly sophisticated AI systems that can
effectively leverage information from multiple modalities to achieve human-level
performance on complex real-world tasks.
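As a concrete example of the fusion strategies discussed above, the sketch below implements the simplest variant: concatenating per-modality feature vectors and passing them through a shared feed-forward classifier. The upstream encoders producing these features are assumed to exist, and all dimensions are illustrative:

```python
# A minimal sketch of fusion-by-concatenation for multimodal inputs.
# Feature dimensions and the upstream encoders are illustrative assumptions.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Fuse text and image feature vectors by concatenation, then classify
    with a shared feed-forward network."""
    def __init__(self, text_dim: int, image_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate modality features into a single joint representation.
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.classifier(fused)

# In practice, text_feat might come from a text encoder (e.g., BERT) and
# image_feat from a vision encoder (e.g., a CNN); random tensors stand in here.
text_feat = torch.randn(4, 768)
image_feat = torch.randn(4, 512)
logits = ConcatFusion(768, 512, num_classes=3)(text_feat, image_feat)
print(logits.shape)  # torch.Size([4, 3])
```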
2. Enhancing user experience through multimodal interaction
Enhancing user experience through multimodal interaction involves leveraging
multiple sensory modalities, such as text, speech, images, and gestures, to create more
intuitive, engaging, and effective human-computer interfaces. By providing users with
multiple channels for interaction and feedback, multimodal interfaces can cater to diverse
user preferences, accessibility needs, and contextual constraints, ultimately leading to a more
seamless and satisfying user experience. Multimodal interaction can enhance accessibility for
users with disabilities or special needs by providing alternative input and output modalities
that accommodate different abilities and preferences. For example, speech recognition and
synthesis technologies enable users with motor impairments to interact with devices using
voice commands, while screen readers and haptic feedback systems provide auditory or
tactile feedback for users with visual impairments.
Multimodal interfaces aim to mimic
natural human-human communication by allowing users to interact with devices using a
combination of speech, gestures, and touch. By enabling more intuitive and expressive forms
of interaction, multimodal interfaces can reduce the cognitive load on users and make
interactions feel more natural and fluid. For example, virtual assistants like Siri and Google
Assistant allow users to ask questions using natural language and receive spoken responses,
mimicking a conversation with a human assistant.
Multimodal interfaces can leverage
contextual information, such as user location, device orientation, and environmental
conditions, to adapt and personalize the user experience in real-time. By dynamically
adjusting the interface layout, content presentation, and interaction modalities based on
contextual cues, multimodal interfaces can provide users with more relevant and timely
information, improving overall usability and engagement. For example, navigation apps may
switch between visual and auditory cues depending on whether the user is driving or walking,
while smart home devices may adjust their behaviour based on the time of day and user
preferences.
Multimodal interfaces can enhance robustness and reliability by providing
redundant information across multiple modalities. By presenting information through both
visual and auditory channels, for example, multimodal interfaces can compensate for noisy
environments, distractions, or sensory impairments that may affect the effectiveness of
individual modalities. This redundancy can improve user comprehension, task performance,
and overall user satisfaction, particularly in challenging or unpredictable
conditions.
Multimodal interfaces can be personalized to individual user preferences, habits,
and characteristics, allowing for a more tailored and adaptive user experience. By tracking
user interactions, preferences, and feedback over time, multimodal interfaces can learn user
preferences and adapt their behaviour accordingly, providing personalized recommendations,
suggestions, and assistance. This personalization can enhance user engagement, retention,
and loyalty, fostering stronger connections between users and interactive systems.
Overall,
enhancing user experience through multimodal interaction involves designing interfaces that
are accessible, natural, context-aware, robust, and personalized to meet the diverse needs and
preferences of users. By leveraging multiple modalities for interaction and feedback,
multimodal interfaces can create more intuitive, engaging, and satisfying user experiences
across a wide range of applications and domains.
C. Context-aware and personalized chatbots
1. Adaptive dialogue systems
Context-aware and personalized chatbots represent a significant advancement in
conversational AI, enabling these systems to dynamically adapt their responses and behaviour
based on contextual cues and user-specific preferences. Adaptive dialogue systems leverage
contextual information such as conversation history, user intent, and situational context to
provide more relevant, timely, and engaging interactions. Context-aware chatbots are
equipped with mechanisms to understand and interpret the context of the conversation,
including the user's current task, previous interactions, and environmental factors. By
analysing dialogue history, user inputs, and contextual signals, such as timestamps or location
data, chatbots can infer the user's intent and adapt their responses accordingly. For example, a
chatbot assisting with travel bookings may reference previous bookings or user preferences to
provide personalized recommendations or assistance.
Context-aware chatbots generate
responses dynamically based on the current conversation context, rather than relying solely
on static predefined responses. By considering the dialogue history, user preferences, and
situational context, chatbots can generate more contextually relevant and personalized
responses that address the user's needs and goals. For instance, a chatbot providing customer
support may adjust its tone and content based on the user's mood or level of urgency,
ensuring that responses are empathetic and helpful.
Personalized chatbots tailor their
responses and interactions to match the preferences, characteristics, and behaviour of
individual users. By tracking user interactions, feedback, and demographic information,
chatbots can adapt their dialogue style, content recommendations, and service offerings to
align with the user's preferences and interests. Personalization can enhance user engagement,
satisfaction, and loyalty by providing a more customized and relevant experience. For
example, an e-commerce chatbot may recommend products based on the user's purchase
history, browsing behaviour, and preferences, increasing the likelihood of successful
conversions.
Context-aware and personalized chatbots continuously learn and improve over
time through adaptive learning mechanisms. By analysing user feedback, conversational
patterns, and outcomes of interactions, chatbots can refine their dialogue strategies, update
their knowledge base, and adapt their behaviour to better meet user needs. Adaptive learning
enables chatbots to evolve and adapt to changing contexts, user preferences, and domain-
specific knowledge, ensuring that they remain effective and relevant over time.
Context-aware
chatbots excel in managing multi-turn dialogues, where conversations unfold over multiple
exchanges. By maintaining context across turns and tracking the progression of the
conversation, chatbots can provide more coherent, contextually relevant, and engaging
interactions. This involves understanding the user's goals, tracking dialogue history, and
anticipating future actions or information needs, allowing chatbots to guide the conversation
smoothly toward successful outcomes.
Overall, context-aware and personalized chatbots
represent a significant advancement in conversational AI, offering more adaptive, engaging,
and effective interactions with users. By leveraging contextual understanding, dynamic
response generation, personalization, adaptive learning, and multi-turn dialogue management,
these chatbots can deliver more tailored, relevant, and satisfying experiences across a wide
range of applications and domains.
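To ground the discussion of multi-turn dialogue management, the following minimal sketch shows one way a chatbot might track conversation history and user preferences and expose them to a response generator. The class design and prompt format are illustrative assumptions, not a production design:

```python
# A minimal sketch of multi-turn context tracking for a chatbot. The class keeps
# a rolling window of turns plus a small user-preference store, and formats both
# into a prompt for whatever response generator the system uses. All names and
# the prompt layout are illustrative assumptions.
from collections import deque

class DialogueContext:
    def __init__(self, max_turns: int = 10):
        self.turns = deque(maxlen=max_turns)   # rolling conversation history
        self.preferences = {}                  # persisted user-specific facts

    def add_turn(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def remember(self, key: str, value: str) -> None:
        """Store a user preference learned during the conversation."""
        self.preferences[key] = value

    def build_prompt(self, user_message: str) -> str:
        """Combine preferences and history so the generator sees full context."""
        profile = "; ".join(f"{k}: {v}" for k, v in self.preferences.items())
        history = "\n".join(f"{s}: {t}" for s, t in self.turns)
        return f"User profile: {profile}\n{history}\nuser: {user_message}\nbot:"

ctx = DialogueContext()
ctx.remember("seat", "window")
ctx.add_turn("user", "I need a flight to Berlin on Friday.")
ctx.add_turn("bot", "There are morning and evening departures. Any preference?")
print(ctx.build_prompt("Morning, please."))
```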
4. CONCLUSION
The paper highlights the multifaceted nature of developing open-domain chatbots that
can engage users in meaningful, human-like conversations. It delves into the complexities of
natural language understanding, generation, and dialogue management, emphasizing the need
for robust models that can comprehend and respond to diverse linguistic patterns, intents, and
contexts.
Moreover, the paper elucidates the challenges of imbuing chatbots with human-like
qualities such as empathy, humour, and contextual awareness. While advancements in
machine learning and natural language processing have enabled chatbots to mimic human
conversation to some extent, achieving true human-like conversational AI remains an elusive
goal that requires addressing fundamental questions of consciousness, intentionality, and
social cognition.
Nevertheless, the paper also underscores the immense opportunities
presented by open-domain chatbot development. From improving customer service and
personal assistants to enhancing mental health support and educational resources,
conversational AI holds the potential to revolutionize various aspects of human-computer
interaction and everyday life.
In light of these challenges and opportunities, the paper
advocates for continued research and innovation in the field of conversational AI. By
fostering interdisciplinary collaboration, exploring novel approaches to modelling human
conversation, and ethically addressing concerns related to privacy, bias, and transparency, we
can advance towards the realization of more human-like conversational agents that enrich our
digital experiences and deepen our understanding of human communication.
REFERENCES
1. Chung, M., Ko, E., Joung, H., & Kim, S. J., "Chatbot e-service and customer satisfaction
regarding luxury brands," Journal of Business Research, 2018.
2. D. C. Ukpabi, B. Aslam, and H. Karjaluoto, “Chatbot Adoption in Tourism Services: A
Conceptual Exploration,” in Robots, Artificial Intelligence, and Service Automation in
Travel, Tourism and Hospitality, Emerald Publishing Limited, 2019, pp. 105–121.
3. Cameron, G., Cameron, D., Megaw, G., Bond, R., Mulvenna, M., O’Neill, S., McTear,
M., “Towards a chatbot for digital counselling.”, Proceedings of the 31st International
BCS Human Computer Interaction Conference, 2017
4. F. Clarizia, F. Colace, M. Lombardi, F. Pascale, and D. Santaniello, “Chatbot: An
Education Support System for Student,” in Cyberspace Safety and Security, Springer
International Publishing, 2018, pp. 291–302
5. Stefan Kojouharov, Ultimate Guide to Leveraging NLP & Machine Learning for your
Chatbot, Chatbots Life, Medium, September 2016.
6. Yu Wu, Wei Wu, Chen Xing, Ming Zhou, Zhoujun Li. Sequential Matching Network: A
New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots. arXiv,
Computer Science, Computation and Language. May 2017.
7. A. Veglis and T. A. Maniou, “Chatbots on the Rise: A New Narrative in Journalism,”
Studies in Media and Communication, vol. 7, no. 1, p. 1, Jan. 2019.
8. Kyle Swanson, Lili Yu, Christopher Fox, Jeremy Wohlwend, Tao Lei. Building a
Production Model for Retrieval-Based Chatbots. arXiv, Computer Science, Computation
and Language. Aug 2019.
9. Wu, Yu and Wu, Wei and Xing, Chen and Xu, Can and Li, Zhoujun and Zhou, Ming. A
Sequential Matching Framework for Multi-Turn Response Selection in Retrieval-Based
Chatbots. Computational Linguistics. Vol 45, 2019, pp. 163-197.
10. G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P.
Nguyen, T. Sainath, and B. Kingsbury. Deep neural networks for acoustic modelling in
speech recognition. IEEE Signal Processing Magazine, 2012.
11. Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with
neural networks." In Advances in neural information processing systems, pp. 3104-3112.
2014.
12. T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur. Recurrent neural
network-based language model. In INTERSPEECH, pages 1045–1048, 2010.
13. M. Sundermeyer, R. Schlüter, and H. Ney. LSTM neural networks for language
modelling. In INTERSPEECH, 2012.
14. Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I.
Polosukhin. "Attention Is All You Need." arXiv, Computer Science, Computation and
Language. 2017.
15. Dai, Andrew M., and Quoc V. Le. "Semi-supervised sequence learning." In Advances in
neural information processing systems, pp. 3079-3087. 2015.
16. Ashley Pilipiszyn. Better Language Models and Their Implications. OpenAI Blog.
February 2019.
17. Zhang, Yizhe, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao,
Jianfeng Gao, Jingjing Liu, and Bill Dolan. "DialoGPT: Large-Scale Generative Pre-
training for Conversational Response Generation." arXiv, Computer Science,
Computation and Language. 2019.