
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 March 2023 doi:10.20944/preprints202303.0438.v1

Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions, or products referred to in the content.

Exploring ChatGPT Capabilities and Limitations:
A Critical Review of the NLP Game Changer

Anis Koubaa, Wadii Boulila, Lahouari Ghouti, Ayyub Alzahem, Shahid Latif
Prince Sultan University, Riyadh, Saudi Arabia
{akoubaa,wboulila,lghouti,aalzahem,slatif}@psu.edu.sa

Abstract—ChatGPT, a groundbreaking natural language processing technology released just three months ago, has attracted significant attention due to its remarkable capabilities. This AI milestone has urged researchers, industry, decision-makers, and governments to examine this technology, including its implications, threats, and benefits. Despite the short period since its release, several researchers have examined ChatGPT from different perspectives. This paper presents a comprehensive review of ChatGPT, highlighting its technical novelties compared to previous models and analyzing existing research from various perspectives. We followed a rigorous methodology to conduct a critical review of existing research on ChatGPT and developed a taxonomy for the different areas of study. Additionally, we identify future challenges and research trends associated with ChatGPT. Our paper is the first critical review of the ChatGPT literature, providing valuable insights for practitioners and policy-makers. This paper serves as a reference for researchers seeking to advance research on ChatGPT, including its applications and development.

Index Terms—ChatGPT; Review; Technical Novelties; Taxonomy; Applications

Funding Agency: Prince Sultan University

© 2023 by the author(s). Distributed under a Creative Commons CC BY license.

I. INTRODUCTION

Natural Language Processing (NLP) has been a rapidly growing field for several years, but the release of ChatGPT (Chat Generative Pre-trained Transformer) in November 2022 sparked a surge of interest and excitement in the technology. ChatGPT, a large language model trained by OpenAI, demonstrated impressive capabilities in understanding and generating human-like language. Its ability to answer questions, carry out conversations, and generate coherent and contextually appropriate responses was a significant leap forward in the development of conversational AI.

Figure 1 illustrates the exponential rise in popularity of ChatGPT since its initial release, showcasing its dominance over other widespread technologies such as Transformers, NLP, and Computer Vision. The data was extracted from a web-based media analytics tool and covers the trends over the last three months. As evident from the graph, ChatGPT has surpassed the other technologies by a considerable margin in terms of interest and mentions. Interestingly, we can also observe a spike in the popularity of Transformers, which seems to be synchronized with the release of ChatGPT. Nonetheless, ChatGPT remains the frontrunner in the field of natural language processing, and its popularity only seems to be on the rise.

Fig. 1. Google Search Interest Over Time: "ChatGPT", "Transformers", "Computer Vision", "Natural Language Processing"

The popularity of ChatGPT has renewed interest in NLP research, with many companies and organizations investing in developing similar language models or building on its capabilities. The availability of large pre-trained language models, such as GPT-3, has also made it easier for researchers and developers to build sophisticated NLP applications without the need for extensive data training.

A. Why has ChatGPT become popular?

ChatGPT is, in the end, a sophisticated chatbot, and the chatbot concept is not new: chatbots have been developed since the early release of LSTMs and have been extensively used in several applications such as automated customer service support [1], e-commerce [2], healthcare [3], and education [4]. However, while these chatbots offer decent business services, they suffer from several limitations, which we summarize in three main shortcomings:

• Limited context awareness: Typical chatbot systems are trained on a limited context that serves the business requirements, such as customer service or e-commerce. This can limit their understanding capabilities and result in unsatisfying interactions for users. Furthermore, even within the same context, these chatbots may struggle to address user queries if the intent is not clear enough to the engine, due to limited pre-programmed rules and responses, which usually leads to an unsatisfying user experience. ChatGPT, in contrast, has demonstrated superior performance compared to existing chatbots in its ability to understand broad contexts in natural language conversations. Its advanced language model allows it to analyze and interpret the meaning behind user queries, leading to more accurate and satisfying responses.

• Limited scale: Another limitation of traditional chatbot systems is their limited scale. These systems are typically


trained on a relatively small amount of data related to the context of operation, due to the high cost of data labeling and training for large data sizes. In contrast, ChatGPT has overcome these barriers by being trained on a massive amount of data from the internet, with a size of 570 GB. The large-scale language model used in ChatGPT allows it to generate human-like responses that can mimic the tone, style, and humor of a human conversation. Traditional chatbots often provide robotic and impersonal responses, which can lead to unsatisfactory interactions with users, whereas ChatGPT's advanced language model and large-scale training allow it to generate more natural and engaging responses.

• Limited text generation ability: Traditional chatbot systems often lack the flexibility and adaptability required to handle complex and dynamic natural language understanding and generation. They often rely on pre-written intents, responses, or templates, leading to repetitive, predictable, or irrelevant answers that fail to engage users or meet their needs. Moreover, they struggle to generalize to new, unseen data, limiting their usefulness in real-world scenarios where the topics and contexts can vary widely. In contrast, ChatGPT leverages a powerful transformer architecture and a massive amount of training data to generate high-quality and diverse text outputs that closely resemble human language. By learning patterns and relationships between words and phrases from various sources, ChatGPT can capture the nuances and subtleties of different domains and produce relevant and coherent responses even to open-ended or ambiguous queries.

With the growing interest in ChatGPT, this paper aims to provide a comprehensive review of the language model, shedding light on its technical novelty and the reasons why it has become a hot topic in the field of information technology. Additionally, we survey recent research papers published since ChatGPT's release, categorize them, and discuss their contributions to the assessment and development of the ChatGPT language model. Finally, we suggest some potential areas of future research that could be explored in relation to ChatGPT.

B. Related Surveys

Only a few survey articles related to ChatGPT and its applications are available in the literature. In this subsection, we summarize the main contributions of the most relevant surveys.

The survey [5] explored the historical evolution and technology of ChatGPT, highlighting its potential applications in various domains, including healthcare, education, and research. The article also addressed some significant limitations and ethical concerns surrounding ChatGPT. One of the unique aspects of the article is the authors' attempt to engage ChatGPT in a conversation and obtain its perspective on the questions posed by the authors. Finally, the article provided insights into the capabilities, limitations, and future directions of ChatGPT technology. Lecler et al. [6] reviewed the current applications of GPT models in radiology, including image classification, segmentation, analysis, and natural language processing for radiology reports. The authors also discussed future possibilities for GPT models, such as personalized medicine and improving radiology education.

Omar et al. [7] compared the effectiveness of ChatGPT and traditional question-answering methods for knowledge graphs and discussed potential future directions for developing knowledge graph chatbots. First, an overview of knowledge graphs, their applications, and the current state of question-answering systems is presented. Then, the authors compared the performance of ChatGPT and traditional question-answering methods on a benchmark dataset of knowledge graph questions. The results demonstrated that ChatGPT outperformed traditional methods in terms of the accuracy and naturalness of responses. Haleem et al. [8] presented an overview of ChatGPT and its importance. Moreover, various progressive workflow processes of the ChatGPT tool are illustrated with diagrams. Their survey further examined the specific features and capabilities of ChatGPT as a support tool and explored its significant roles in current scenarios.

The distribution of currently published articles across publishers is presented in the bar graphs of Fig. 2 and Fig. 3. It demonstrates that limited literature is available on ChatGPT, and there is still room for improvements and contributions from upcoming researchers in this emerging research area.

Fig. 2. Distribution of selected articles based on publishers.

Fig. 3. Distribution of selected articles based on article type.

C. Contributions and research structure

The contributions of this survey are summarized in the following key points:
• We provide an in-depth analysis of the technical advancements and innovations that distinguish ChatGPT from its predecessors, including generative models and chatbot systems. This analysis elucidates the underlying mechanisms contributing to ChatGPT's enhanced performance and capabilities.

• We develop a comprehensive taxonomy of recent ChatGPT research, classifying studies based on their application domains. This classification enables a thorough examination of the contributions and limitations present in the current literature. Additionally, we conduct a comparative evaluation of emerging ChatGPT alternatives, highlighting their competitive advantages and drawbacks.

• We identify and discuss the limitations and challenges associated with ChatGPT, delving into potential areas of improvement and unexplored research opportunities. This discussion paves the way for future advancements in the field, guiding researchers and practitioners in addressing the current gaps in ChatGPT research and applications.

This paper is structured as follows: Section II delves into ChatGPT's background and technological innovations, such as the fusion of transformers and reinforcement learning with human feedback. Section III examines ChatGPT competitors and their comparative analysis. Section IV highlights emerging ChatGPT applications and research by summarizing pertinent studies. Section V outlines challenges and future directions, and Section VI concludes the paper.

II. BACKGROUND AND MAIN CONCEPTS

A. GPT-3 Model: Leveraging Transformers

The release of the GPT-3 model family by OpenAI has set the bar very high for its direct competitors, namely Google and Facebook. It has been a major milestone in the development of natural language processing (NLP) models. The largest GPT-3 configuration comprises 175 billion parameters, including 96 attention layers, and uses a batch size of 3.2 million training samples. GPT-3 was trained on 300 billion tokens (usually sub-words) [9].

The training process of GPT-3 builds on the successful strategies used in its predecessor, GPT-2. These strategies include modified initialization, pre-normalization, and reversible tokenization. However, GPT-3 also introduces a new refinement based on alternating dense and sparse attention patterns [9].

GPT-3 is designed as an autoregressive framework that can achieve task-agnostic goals using a few-shot learning paradigm [9]. The model can adapt to various tasks with minimal training data, making it a versatile and powerful tool for NLP applications.

To cater to different scenarios and computational resources, OpenAI has produced GPT-3 in various configurations. Table II-A5 summarizes these configurations, which range from a relatively small 125 million parameter model to the largest 175 billion parameter model. This allows users to choose a model that best fits their needs and resources.

All GPT (Generative Pre-trained Transformer) models, including the most recent GPT-3 model, are built on the core technology of Transformers. The Transformer architecture was first introduced in the seminal paper "Attention is All You Need" by Vaswani et al. in 2017 [10], which has significantly impacted the deep learning research community, starting with sequential models and extending to computer vision. In the next sub-section, we provide a detailed overview of the Transformer technology and how it was leveraged in GPT-3 for text generation.

1) Transformers as Core Technology: Transformers are the revolutionary core technology behind ChatGPT. They have transformed how sequence-to-sequence models are built, significantly outperforming traditional models based on recurrent neural networks. Although Transformers are based on the classical encoder-decoder architecture, they differ dramatically in integrating the concept of self-attention modules, which excel at capturing long-term dependencies between the elements (i.e., tokens) of the input sequence. The Transformer leverages this information to efficiently determine each element's importance in the input sequence. The importance of each element is determined through the self-attention mechanism, which computes a weight for each element based on its relevance to the other tokens in the sequence. This enables Transformers to handle variable-length sequences better and capture complex relationships between the sequence elements, improving performance on various natural language processing tasks. Another critical feature is positional embedding, which helps transformers learn the positional information of tokens within the sequence. It allows differentiating between tokens with the same content but at different positions, which provides a better context representation and improves the model's accuracy. These features represent a significant strength of ChatGPT in providing accurate natural language generation compared to its peers, particularly when combined with training on a large dataset of 570 GB of internet data.

In general, a transformer comprises three featured modules: (i) an Encoder-Decoder module, (ii) a Self-Attention module, and (iii) a Positional Embedding module. In the following sub-sections, we present the core functionalities of these modules.

2) Encoder-Decoder Architecture: When the Transformer architecture was first designed in [10], as shown in Figure 4, it was applied to machine translation, where an input sequence in a source language is transformed into an output sequence in the target language. The Transformer architecture followed an encoder-decoder model, where the encoder maps a discrete representation of the input sequence (i.e., words, characters, sub-words) to a continuous representation denoted as an embedding vector (i.e., a vector of continuous values). The decoder takes embeddings as input and generates an output sequence of elements one at a time. As the transformer is an autoregressive generative model, it predicts the probability distribution of the next element in the output sequence given the previous sequence, which can be seen as a special case of Hidden Markov Models (HMMs). However, HMMs do not have the ability to capture long-term dependencies bidirectionally as transformers do.

Unlike other models that use both an encoder and a decoder, or an encoder only, like the BERT model family from Google
[11], ChatGPT relies only on a pure decoder architecture, illustrated in Fig. 5, as defined in the first GPT paper [12]. Specifically, the underlying GPT model applies unidirectional attention using a language masking strategy to process the input sequences token-wise. The decoder is trained to take the first token of the input sequence as a start token and then generate subsequent output tokens based on the input sequence and the previously generated tokens. This architecture represents the standard model for language modeling, with the objective of generating the sequence of tokens with the highest likelihood given a context (i.e., the input sequence and the previously generated tokens). The reason why the ChatGPT architecture does not rely on an encoder is that the GPT models are trained on a large corpus of textual datasets, using unsupervised learning, to predict the next sequence of words given a context. Therefore, these models are trained to generate text rather than to map an input to an output, as in a typical encoder-decoder architecture. As such, the text embedding is directly fed into the self-attention modules to learn the complex relationships between tokens in a cascade of self-attention layers. The self-attention module is what makes transformers a powerful tool, as explained further in the next section.

Fig. 4. The Transformers Architecture as defined in [10] for Machine Translation

Fig. 5. The Transformers Architecture as defined in GPT [12]

3) Self-Attention Module: Self-attention stands as the core module that empowers transformers to achieve their remarkable performance. It has the ability to capture complex dependencies between the tokens in an input sequence and to efficiently determine the weight of each token in the input sequence, in addition to its relative importance. While the self-attention concept may look complex, it relies on the notion of semantic similarity between vectors (in our case, token embeddings) using the dot product. The dot product of two vectors is related to the cosine distance between them, considering their amplitudes and relative angle. The higher the dot product between two embeddings, the more semantically similar they are, indicating their importance in the overall context of the input sequence.

Self-attention in transformers also relies on the Query (Q), Key (K), and Value (V) concepts. This concept is not new and is borrowed from the Information Retrieval literature, more specifically from query processing and retrieval, such as in search engines. In Information Retrieval, a query is a set of tokens used to search for relevant documents in a collection of stored documents. A document's relevance score is calculated based on the similarity between the query and the document. The similarity score is determined by comparing the query tokens to the document tokens (i.e., keys) and their corresponding weights (i.e., values). The dot product measures the cosine similarity between these vectors and calculates the relevance scores. This is exactly what happens in the self-attention modules of transformers, illustrated in Fig. 6.

Fig. 6. Self-Attention Module Architecture [12]

In transformers, an input sequence is converted into a set of three vectors, namely the query, the key, and the value vectors. Consider a sentence with a sequence of tokens (i.e., words, sub-words, or characters):

• The Query (Q): This vector represents a single token (e.g., a word embedding) in the input sequence. This token is used as a query to measure its similarity to all other tokens in the input sequence, equivalent to a query over documents in an information retrieval context.

• The Key (K): This vector represents all other tokens in the input sequence apart from the query token. The key vectors are used to measure the similarity to the query vector.

• The Value (V): This vector represents all tokens in the input sequence. The value vectors are used to compute the weighted sum of the elements in the sequence, where the weights are determined by the attention weights computed from the query and key vectors through a dot product.

In summary, the dot product between the query vectors and the key vectors results in the attention weights, also known as the similarity scores. These similarity scores determine how much each value vector contributes to the final output.

Formally, the self-attention module is expressed as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{D_k}}\right) V = AV$$

where $Q \in \mathbb{R}^{N \times D_k}$, $K \in \mathbb{R}^{M \times D_k}$, and $V \in \mathbb{R}^{M \times D_v}$ are the packed matrix representations of queries, keys, and values, respectively. $N$ and $M$ denote the lengths of queries and keys (or values), while $D_k$ and $D_v$ denote the dimensions of keys (or queries) and values, respectively. The dot products of queries and keys are divided by $\sqrt{D_k}$ to alleviate the softmax function's gradient vanishing problem, control the magnitude of the dot products, and improve generalization. This is known as the scaled dot product. The result of the attention mechanism is given by the matrix multiplication of A and V. A is often called the attention matrix, and the softmax is applied row-wise.

In some cases, a mask can be applied to the relevance score matrix (the query-to-key dot product) to enforce specific dependency patterns between the tokens. For example, in text generation, as in ChatGPT, the self-attention module uses a mask that keeps the lower triangular part of the score matrix, with the elements above the diagonal zeroed out. This helps capture the dependency of a token on all previous tokens and not on future ones, which is the pattern needed in text generation.

The algorithm of a self-attention module (with mask) is presented in Algorithm 1.

Algorithm 1 Self-Attention Module with Mask
Require: Q, K, and V matrices of dimensions n × d_k, m × d_k, and m × d_v, respectively
Ensure: Z matrix of dimension n × d_v
1: Step 1: Compute the scaled dot product of the Q and K matrices: A ← softmax(QK^T / √d_k)
2: Step 2: Apply the mask to the computed attention scores (if applicable):
3: if mask is not None then
4:     A ← A ⊙ mask {Element-wise multiplication}
5: end if
6: Step 3: Compute the weighted sum of the V matrix using A as weights: Z ← AV
7: Step 4: Return the final output matrix Z.

Note that the algorithm takes in the matrices Q, K, and V, which are the query, key, and value matrices, respectively, and returns the output matrix Z. The softmax function is applied row-wise to the scaled dot product of the Q and K matrices, which is divided by the square root of the key dimension d_k. The mask is applied in Step 2, before computing the weighted sum in Step 3. The element-wise multiplication of the mask and the attention scores sets the attention scores corresponding to masked tokens to zero, ensuring that the model does not attend to those positions. The resulting matrix A is used as the weights to compute the weighted sum of the V matrix, yielding the output matrix Z.

4) Multi-Head Attention: Modern NLP tasks, and ChatGPT in particular, deal with vast and complex data. Therefore, a single attention head may not be sufficient for capturing all relevant information in a sequence. Multi-head attention allows for parallel processing, since the self-attention operation is applied across multiple heads. This can result in faster training and inference times than a single-head self-attention mechanism.

Multi-head attention also captures multiple relationships between the query and the key-value pairs in the input sequence, which enables the model to learn complex patterns and dependencies in the data. It also helps increase the model's capacity to learn advanced relationships over large data.

Formally, the multi-head attention function can be represented as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^{O}$$

where $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$, and $W_i^{Q} \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{\mathrm{model}} \times d_k}$, $W_i^{V} \in \mathbb{R}^{d_{\mathrm{model}} \times d_v}$, and $W^{O} \in \mathbb{R}^{h d_v \times d_{\mathrm{model}}}$.

5) Positional Embedding: If only the data, without its order, is considered, then the self-attention mechanism used in transformers becomes permutation invariant: it processes all tokens equally without considering their positional information. This may result in the loss of important semantic information, as the importance of each token with respect to the other tokens in the sequence is not captured. Therefore, it is necessary to leverage position information to capture the order and importance of tokens in the sequence.

To address the issue of losing important position information, the transformer model creates an encoding for each position in the sequence and adds it to the token embedding before passing it through the self-attention and feedforward layers. This allows the model to capture the importance of each token with respect to the others, considering its position in the sequence. In the transformer architecture of ChatGPT, the positional embedding is added to the input embeddings at the entrance of the decoder. The positional embedding is expressed as follows. Given a sequence of inputs $x_1, x_2, \ldots, x_n$, the position embedding matrix $E \in \mathbb{R}^{n \times d}$ is calculated as:

$$\mathrm{Pos}_i = \begin{cases} \sin\left(i / 10000^{2k/d}\right), & \text{if } k \text{ is even} \\ \cos\left(i / 10000^{2k/d}\right), & \text{if } k \text{ is odd} \end{cases}$$

This equation calculates the value of the position embedding Pos_i at position i, given the embedding size d and the embedding dimension index k. The parity of k determines whether the sin or cos function is used, and d is typically set to an even number in transformer models.

B. From GPT-3 to InstructGPT: Leveraging Reinforcement Learning

In an attempt to align its GPT-3-based chatbot with human intentions, OpenAI performed a significant and innovative overhaul of the GPT-3 model, including state-of-the-art supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) [13] algorithms. Their effort resulted in the InstructGPT model [14], which has laid the foundations for the groundbreaking ChatGPT platform [15].

Fig. 7. Basic RL model.
1) Reinforcement Learning from Human Feedback (RLHF): Markov decision process (MDP). MDP-based RL models are
Advanced reinforcement learning (RL) models interact with described using the following tuple [16]:
complex real-world environments to achieve pre-defined goals. 1) S: Set of states where each state, st , "encodes" the envi-
In most instances, these models learn independently by con- ronment (i.e., the user prompts and agent completions).
tinuous feedback from their environments on their recent 2) A: Set of actions that the dialogue agent can take at any
actions/decisions. However, this learning process can be time- step t by generating a new prompt completion.
consuming and expensive computationally. In their seminal 3) P (st+1 , st , at ): Transition probabilities when taking ac-
work, Christiano and his team from DeepMind showed that tion at at state st to reach state st+1 .
expressing the goals of the RL models using human preferences can significantly improve the learning experience in such complex environments where access to the reward functions is not possible [13]. Aside from resolving the lack of access to the reward function, the RL solution proposed by Christiano et al. requires human intervention in less than 1% of the time the RL model interacts with its environment. Such limited human intervention allows tackling problems related to real-world environments with higher complexities, including Atari games and simulated robot locomotion. To estimate the rewards collected from the environment, a non-linear reward estimation function is proposed as a trainable deep neural network. This network is augmented with "natural" preferences based on sporadic human intervention (< 1% of the learning time). Given the non-stationarity of the adopted reward function, Christiano et al. relied on policy gradient methods [16] to extract the best policy for the RL model, following the successful work of Ho and Ermon [17]. The RL models representing the simulated robot locomotion and Atari games are solved using the trust region policy optimization (TRPO) and advantage actor-critic (A2C) algorithms, respectively [18], [19]. Ouyang et al. devised a fine-tuning strategy for OpenAI's largest model, GPT3-175B [9], based on the PPO algorithm and a learnable reward function that takes human preferences into account [14]. Human preferences are centered on three main goals: helpfulness, harmlessness, and honesty (HHH). The fine-tuned model, called InstructGPT, not only met the human preferences to a large extent, but the resulting model size was also impressively smaller than its parent model. More specifically, the InstructGPT model consisted of 1.3 billion parameters, which makes it comparable to lightweight GPT-3 versions such as GPT3-XL and GPT3-2.7B.

Fig. 7 illustrates the basic configuration of RL models. In most of these models, the RL agent interacts with its surrounding environment by taking specific actions and collecting some rewards (which could be negative depending on the action taken). The possible actions and rewards are usually defined using a

4) R(s_{t+1}, s_t, a_t): Reward achieved when taking action a_t to transition from s_t to s_{t+1}. When fine-tuning dialogue agents, it is common practice to assume the rewards are independent of the actions.
5) γ: Discount (or forgetting) factor set to a positive number smaller than 1. For simplicity, γ can be fixed to a typical value such as 0.9.

In the RLHF setting, the cardinalities of the state space, S, and the action set, A, are of reasonable size, which ensures tractable policy solutions [14].

To account for the uncertainties in the user-agent dialogue interaction, the reward function is formulated using the expectation operator:

E(R_{t:∞}) = E( ∑_{i=1}^{∞} γ^{t+i} r_{t+i} )    (1)

where E denotes the expectation operator. The infinite sum in Eq. 1 will always converge, as γ is smaller than one by design. Therefore, Eq. 1 allows handling continuous user/agent interactions [16].

Once the reward function is defined, the value function of a state s_t is expressed as the expected cumulative reward accumulated by taking action a_t at this state and acting "optimally" till the end of the dialogue session [16]:

V(s_t) = E( ∑_{i=t+1}^{∞} γ^{i} r_i | s_t )    (2)

Similar to Eq. 2, we can define the state-action (Q) function as follows [16]:

Q(s_k, a_k) = E( ∑_{i=k+1}^{∞} γ^{i} r_i | s_k, a_k )    (3)

Using human feedback and neural-based reward function estimators, Eqs. 2-3 can be solved using advanced policy
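As an illustrative aside (not code from the surveyed works), the definitions in Eqs. (1)–(3) can be sketched in plain Python, with a finite trajectory standing in for the infinite sum; the reward sequences here are hypothetical:

```python
def discounted_return(rewards, gamma=0.9, t=0):
    """Eq. (1): sum over i >= 1 of gamma^(t+i) * r_{t+i} for one
    finite trajectory of rewards observed after time step t."""
    return sum(gamma ** (t + i) * r for i, r in enumerate(rewards, start=1))


def value_estimate(trajectories, gamma=0.9, t=0):
    """Eq. (2): V(s_t) approximated as the Monte Carlo average of the
    discounted returns of sampled trajectories starting from s_t."""
    returns = [discounted_return(tr, gamma, t) for tr in trajectories]
    return sum(returns) / len(returns)
```

With γ = 0.5 and three unit rewards, `discounted_return([1, 1, 1], gamma=0.5)` gives 0.5 + 0.25 + 0.125 = 0.875; the state-action function of Eq. (3) differs only in additionally conditioning on the first action a_k.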
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 March 2023 doi:10.20944/preprints202303.0438.v1

Model Name    No. of params  No. of layers  Embedding size  No. of heads  Head size  Batch size  Learning rate
GPT3-Small    125M           12             768             12            64         0.5M        6.0 × 10^-4
GPT3-Medium   350M           24             1024            16            64         0.5M        3.0 × 10^-4
GPT3-Large    760M           24             1536            16            96         0.5M        2.5 × 10^-4
GPT3-XL       1.3B           24             2048            24            128        1M          2.0 × 10^-4
GPT3-2.7B     2.7B           32             2560            32            80         1M          1.6 × 10^-4
GPT3-6.7B     6.7B           32             4096            32            128        2M          1.2 × 10^-4
GPT3-13B      13B            40             5140            40            128        2M          1.0 × 10^-4
GPT3-175B     175B           96             12288           96            128        3M          0.6 × 10^-4
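As a rough consistency check on the table above, the parameter counts can be approximated from the layer count and embedding size alone using the standard transformer accounting of about 12·d² weights per layer plus the token-embedding matrix. The vocabulary size of 50,257 (GPT's BPE vocabulary) and the omission of biases and positional embeddings are assumptions of this sketch:

```python
def approx_gpt_params(n_layers, d_model, vocab_size=50257):
    """Rough GPT-style parameter count: ~12*d_model^2 weights per layer
    (4*d^2 for the attention projections + 8*d^2 for the MLP block),
    plus the token-embedding matrix. Biases and positional embeddings
    are ignored, so the result is only approximate."""
    return 12 * n_layers * d_model ** 2 + vocab_size * d_model


# GPT3-Small (12 layers, d_model 768): ~123M, close to the table's 125M
small = approx_gpt_params(12, 768)
# GPT3-XL (24 layers, d_model 2048): ~1.3B, matching the table
xl = approx_gpt_params(24, 2048)
# GPT3-175B (96 layers, d_model 12288): ~175B
large = approx_gpt_params(96, 12288)
```

The estimate lands within a few percent of each row, which is why layer count and embedding size largely determine a model's scale.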

Fig. 8. Differences between AI-based and human-based text summarization processes.

gradient algorithms to extract a policy that the dialogue agent will execute to follow user intentions and achieve the preset HHH goals [18]–[21].

2) State-of-the-art related to RLHF-based dialogue systems: Over the last few years, three main players have emerged in the field of RLHF-based dialogue systems, including OpenAI [14], DeepMind [22], and Anthropic [23]. In their attempt to teach dialogue systems to summarize texts, the OpenAI team [24] reported one of the earliest success stories of RLHF adoption in the field of natural language processing (NLP). In this work, varying-size sentences are summarized considering human preferences in terms of summary accuracy, coverage, and coherence [24]. Fig. 8 depicts the main differences between AI-based and human-based text summarization processes.

By treating the summarization model as an RL agent, Stiennon et al. [24] devised a strategy consisting of policy training guided by the outputs of the reward model, as shown in Fig. 9. The LM policy is trained with a variant of the PPO algorithm that has undergone several rounds of hyperparameter tuning [24]. As it is impossible to have a human labeler intervene at each training episode, Stiennon et al. suggested using a transformer-based model to mimic the human scoring rationale and rate the summarizations produced by the pre-trained and fine-tuned models. As reported in [24], the reward module consists of a transformer model with six layers and eight attention heads, producing embedding vectors of size 512. The training approach used to train the reward module is illustrated in Fig. 10.

Fig. 9. Reward-based policy training.

The loss function of the reward module, parameterized with θ, is defined as follows [14]:

loss(θ) = −(1/C(K,2)) E_{(x, y_w, y_l) ∼ D}[ log( σ( r_θ(x, y_w) − r_θ(x, y_l) ) ) ]    (4)

where r_θ(x, y_w) and r_θ(x, y_l) represent the outputs of the reward module for the input text x and the summarizations preferred and less preferred by the human labeler, respectively. The set of summarizations labeled by a human is denoted by D. It is worth noting that the OpenAI team treated each of the C(K,2) possible comparisons as a separate training sample that will contribute to K − 1 different gradient updates [14].

Fig. 10. Reward module training.
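As a minimal sketch of the pairwise preference loss in Eq. (4), and not the implementation used in [14] or [24], the per-prompt loss over K ranked candidate summaries can be written in plain Python with scalar reward scores:

```python
import math
from itertools import combinations


def reward_model_loss(ranked_scores):
    """Eq. (4) for one prompt: `ranked_scores` holds the reward-model
    outputs r_theta(x, y) for K candidate summaries, ordered from most
    to least preferred by the human labeler. The loss averages
    -log(sigmoid(r_w - r_l)) over all C(K, 2) (preferred, less-preferred)
    pairs drawn from that ranking."""
    pairs = list(combinations(ranked_scores, 2))  # (w, l): w ranked above l
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    return -sum(math.log(sigmoid(w - l)) for w, l in pairs) / len(pairs)
```

When the reward model already separates a pair correctly (e.g. scores [2.0, 0.0]) the loss is small, while reversed scores are penalized heavily, which is what pushes r_θ toward the human ranking.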

III. CHATGPT COMPETITORS

ChatGPT is one of the most advanced natural language processing models available today. However, several other language models are considered competitors to ChatGPT, each with unique strengths and weaknesses. These models have been developed by some of the world's most prominent tech companies and research institutions. They are designed to tackle various language-based tasks in areas such as machine translation, language generation, education, and industry. This review will focus on the most well-known ChatGPT competitors, including Google Bard, Chatsonic, Jasper Chat, OpenAI Playground, Caktus AI, Replika, Chai AI, Neeva AI, Rytr, and PepperType.

A. Google Bard

Google Bard is a recently introduced chatbot technology that emerged due to the growing competition in AI, exemplified by counterparts such as ChatGPT. Its primary objective is to emulate a genuine conversation with a human user. It utilizes natural language processing and machine learning algorithms to furnish accurate and practical answers to various inquiries [25]. Such tools can be highly beneficial for smaller businesses that aim to provide natural language assistance to their customers but lack the resources to hire a large support team or rely on Google's search tools. In addition, Bard can seamlessly integrate into various digital systems, including websites, messaging platforms, and desktop and mobile applications, to enhance customer experience [25].

B. Chatsonic

ChatSonic is a highly potent conversational AI chatbot designed to overcome the constraints of ChatGPT by OpenAI. This advanced AI chatbot is based on the latest GPT-3.5 model. It utilizes cutting-edge Natural Language Processing (NLP) and Machine Learning (ML) technologies to automate the text and image generation process [26]. ChatSonic can provide accurate information by utilizing internet results to generate responses, which significantly reduces the possibility of errors. Additionally, this AI chatbot can remember previous conversations and build upon them, ensuring a seamless dialogue flow [26]. Furthermore, ChatSonic offers 16 distinct personas, allowing users to engage with different virtual personalities, ranging from an accountant to a poet.

C. Jasper Chat

Jasper has been a prominent player in the AI content generation industry and has garnered considerable acceptance from its users. Apart from its content creation capabilities and other offerings, Jasper has recently introduced a chatbot named Jasper Chat. This alternative to ChatGPT is based on GPT-3.5 and other language models, and Jasper has partnered with OpenAI [27]. However, unlike ChatGPT, which is accessible to anyone, this chatbot has been specifically developed for businesses in domains such as advertising and marketing.

D. OpenAI Playground

OpenAI Playground is an intuitive web-based platform that simplifies creating and testing predictive language models. This tool is both a predictive language and writing tool, allowing users to express themselves in diverse ways [28]. This chatbot enables users to type almost anything and receive an accurate and human-like response. It employs a range of models, including all the models in the GPT-3 series and others, to inspire creativity. OpenAI GPT-3 Playground can generate text, explain concepts, summarize text, translate text, write novels, and much more [28].

E. Caktus AI

Caktus AI is an educational tool specifically designed for students. This first-of-its-kind artificial intelligence tool enables students to automate their homework, freeing up time for other tasks [29]. Caktus AI offers a range of features tailored to students, such as essay and paragraph writers, as well as discussion, question, and coding tools. Additionally, students can access career guidance and language assistance through custom cover letter writers and language tutor lessons [29].

F. Replika

Replika is specifically designed to foster companionship and nurture relationships. Millions of people worldwide have utilized Replika as a chatting platform and a means of forming deep, personal connections [30]. Replika is equipped with the autoregressive GPT-3 language model, enabling it to acquire knowledge from its past inputs [30]. As it prioritizes meaningful conversation, it can draw from previous interactions and tailor itself to the individual user, creating a more personalized experience.

G. Chai AI

Chai AI is a comprehensive service that allows users to communicate with various chat AIs across the globe. The platform is designed to be highly intuitive, allowing users to easily browse and select from various bots with unique personalities [31]. Whether users seek a therapeutic or romantic encounter, Chai AI's bots respond in real time without significant delay. Moreover, they possess a multi-turn ability, which enables them to remember past conversations and respond accordingly [31].

H. Neeva AI

Neeva is an AI-powered search engine that provides users with a more convenient and user-friendly approach to web searches. It allows users to search for various topics, including recipes, gift ideas, and more, and provides answers without requiring users to navigate multiple search results. Unlike traditional search engines, Neeva delivers a single, synthesized answer summarizing the sites most relevant to a query. Additionally, Neeva embeds references and citations directly in the answer, making it easy for users to verify the credibility and reliability of the sources cited.

I. Rytr

Rytr is an AI writing tool that aims to assist users in generating high-quality content in various contexts. The platform employs a language model AI to help writers create content for ideation or practical use. Rytr offers versatility in generating content with over 40 use cases and 20 tones [32]. The tool also supports over 30 languages and claims to produce outputs that require minimal editing, making them pitch-perfect. To enhance its functionality, Rytr includes an SEO analyzer and plugins for WordPress, as well as a Chrome extension [32].

J. PepperType

Peppertype is an AI-based service that generates text-based content for various applications. Its capabilities include crafting Google Ad copy, answering Quora questions, suggesting blog topics, composing e-commerce product descriptions, writing blog introductions and conclusions, and rewriting existing content [33]. In addition, the user-friendly website allows users to browse and select the various content platforms by category. Peppertype provides comprehensive language coverage to cater to a diverse global audience, supporting more than 25 languages [33].

IV. APPLICATIONS OF CHATGPT

This section discusses emerging applications and research works of ChatGPT by summarizing the most relevant related studies. We have classified the literature related to ChatGPT into five distinct categories based on the context of their application: (i) Natural Language Processing, (ii) Healthcare, (iii) Ethics, (iv) Education, and (v) Industry.

A. Natural Language Processing

ChatGPT presents significant advances in the field of natural language processing (NLP) and has the potential to revolutionize the way we interact with machines and process natural language data.

The article [34] aims to evaluate the performance of ChatGPT compared to human experts regarding dialogue quality. The authors present a comparison corpus of human and ChatGPT dialogue and use three evaluation metrics to measure the quality of dialogue. In addition, they develop a detection model to identify human- and ChatGPT-generated conversations. The results of the experiments demonstrate that the performance of ChatGPT is close to that of human experts. However, this article lacks depth and breadth in evaluating ChatGPT's performance. The authors focus on a limited set of metrics without considering other important aspects such as accuracy, fluency, and generalizability. Furthermore, they didn't discuss or provide concrete solutions to the issues of scalability and customization of the model. Bang et al. [35] analyzed the performance of the ChatGPT multipurpose language and dialogue model across three tasks: reasoning, hallucination, and interactivity. The experimental outcomes showed that ChatGPT performs well in all three tasks, significantly improving reasoning and interactivity in a multiple-language and multimodal setting. The paper also provided a detailed analysis of ChatGPT performance, highlighting areas for improvement and potential applications. However, the article didn't present enough evidence to support the proposed multitasking, multilingual, and multimodal evaluation of ChatGPT. In addition, it doesn't provide sufficient details on the training data and evaluation metrics. Muennighoff et al. [36] proposed a new method, SGPT, for applying decoder-only transformers to semantic search and extracting meaningful sentence embeddings from them. SGPT can produce state-of-the-art sentence embeddings that outperform previous methods on the BEIR search benchmark. The authors also introduced two settings for SGPT, Cross-Encoder vs. Bi-Encoder and Symmetric vs. Asymmetric, and provided recommendations for which settings to use in different scenarios. However, the article primarily focuses on the performance of SGPT on the BEIR search benchmark, and it is unclear how well it performs on other semantic search tasks. Furthermore, the paper didn't provide a detailed analysis of the biases and limitations of the proposed method, which could limit its applicability in certain scenarios.

Zhou et al. [37] comprehensively surveyed recent research advancements, current and future challenges, and opportunities for Pretrained Foundation Models (PFMs) in text, image, graph, and other data modalities. The authors reviewed the basic components and existing pretraining techniques in natural language processing, computer vision, and graph learning. They also discussed advanced PFMs for other data modalities and unified PFMs considering the data quality and quantity. The paper is technical and may be challenging for non-experts to understand. It also focuses on PFMs and their applications, but it may not provide sufficient insights into AI models' general limitations and ethical implications. Qin et al. [38] provided an empirical analysis of the zero-shot learning ability of ChatGPT. The study evaluated the model's performance on 20 popular NLP datasets covering seven representative task categories: reasoning, natural language inference, question answering (reading comprehension), dialogue, summarization, named entity recognition, and sentiment analysis. The article also compared ChatGPT's performance with the most advanced GPT-3.5 model and reported recent work's zero-shot, fine-tuned, or few-shot fine-tuned results. However, this article does not provide a detailed analysis of the study's limitations. In addition, the article didn't discuss the ethical implications of using large language models for NLP tasks. Ortega et al. [39] introduced linguistic ambiguity, its varieties, and its relevance in modern Natural Language Processing (NLP). It also performs an extensive empirical analysis of linguistic ambiguity in ChatGPT, a transformer-based language model. The paper presented the strengths and weaknesses of ChatGPT related to linguistic ambiguity and provided strategies to get the most out of the model. However, the paper is limited to an empirical analysis of ChatGPT's performance in detecting linguistic ambiguity in English. It does not explore the potential impact of linguistic features or resource limitations on the model's performance. Borji et al. [40] presented a comprehensive analysis of ChatGPT's failures in 11 categories, including reasoning, factual errors, math, coding, and bias,

along with the limitations, risks, and societal implications of large language models. The paper aims to provide a reference point for evaluating the progress of chatbots like ChatGPT over time and to assist researchers and developers in enhancing future language models and chatbots. However, it lacks an in-depth analysis of the problems associated with ChatGPT. It does not address the underlying issues that cause the failures or provide any recommendations for improving the system. The article [41] explored the performance of ChatGPT on aspect-based and query-based text summarization tasks. The study evaluated ChatGPT's performance on four benchmark datasets, encompassing various summaries from Reddit posts, news articles, dialogue meetings, and stories. The authors reported that ChatGPT's performance is comparable to traditional fine-tuning methods regarding Rouge scores. However, the study relies solely on Rouge scores to evaluate the performance of ChatGPT, which may not be a sufficient indicator of summarization quality. The authors acknowledged this limitation and planned to conduct human evaluations shortly. Jiao et al. [42] provided a preliminary study on the performance of ChatGPT for machine translation tasks, including translation prompts, multilingual translation, and translation robustness. The authors explored the effectiveness of different prompts, evaluated the performance of ChatGPT on different language pairs, and investigated its translation robustness. In addition, the authors proposed an interesting strategy named pivot prompting that significantly improves translation performance for distant languages. However, the study results may not be fully reliable due to the randomness in the evaluation process. The paper also does not provide a detailed analysis of the factors affecting the translation performance of ChatGPT. Kocon et al. [43] evaluated the capabilities of ChatGPT on 25 diverse analytical NLP tasks, most of which are subjective in nature, including sentiment analysis, emotion recognition, offensiveness and stance detection, natural language inference, word sense disambiguation, linguistic acceptability, and question answering. The authors automated ChatGPT's querying process and analyzed more than 38k responses, comparing its results with state-of-the-art solutions. However, the study does not directly compare ChatGPT's performance with other chatbot models. The authors also acknowledge ChatGPT's biases but didn't provide a detailed analysis of these biases. Additionally, the study's focus on analytical NLP tasks may not fully reflect ChatGPT's performance in more practical settings, such as chatbot interactions.

The aforementioned discussion related to the applications of ChatGPT in NLP is summarized in Table I.

B. Healthcare

ChatGPT has the potential to revolutionize the healthcare industry by improving patient outcomes, reducing costs, and facilitating more efficient and accurate diagnosis and treatment.

Mann et al. [44] examined AI's potential role in translational medicine. It highlighted the need for further research into personalized medicine and the potential for AI-driven data analysis to be used in clinical decision-making. Furthermore, it discussed the ethical implications of AI-enabled personalized medicine and how to best utilize advanced AI technologies while minimizing risk. However, this article failed to provide adequate information about the potential risks of using AI in medical applications. Additionally, it didn't provide any concrete evidence or examples of successful applications of AI in medical contexts. Antaki et al. [45] conducted a study to evaluate the performance of ChatGPT in ophthalmology. The outcomes indicated that ChatGPT could provide a low-cost, accurate, personalized solution for ophthalmology consultations. Furthermore, ChatGPT can accurately detect ophthalmic diseases and create treatment plans consistent with guidelines. However, this article only evaluated the performance of ChatGPT in ophthalmology and did not consider its potential applications in other medical specialties. Additionally, it didn't provide any concrete recommendations on the performance improvement of ChatGPT. Jeblick et al. [3] used a ChatGPT-based natural language processing system to simplify radiology reports. The study found that the system can achieve high accuracy in terms of both content understanding and grammatical correctness. Furthermore, with improved readability and less medical jargon, the system could generate reports that are easier for laypeople to understand. However, this study has a few shortcomings. First, the authors only used a small sample size and did not comprehensively analyze the data collected. Second, this study did not assess the accuracy of the simplified reports produced by ChatGPT. Third, it didn't provide any information on the potential impact of this technology on medical care or patient outcomes.

The article [46] focuses on the potential of AI-assisted medical education using large language models. The authors test the performance of ChatGPT on the USMLE, a comprehensive medical board examination, and an internal medical knowledge base. The results show that ChatGPT outperforms current state-of-the-art models in both tasks. The authors concluded that large language models have great potential for use in AI-assisted medical education and could help drive personalized learning and assessment. This article has a few shortcomings. First, the sample size used in the study was small, as only seven medical students completed the study. This means that the results may not be representative of a larger population. Second, the article does not discuss the ethical implications of using AI-assisted technology in medical education. Finally, the article provides no evidence that AI-assisted medical education is superior to traditional methods. Dahmen et al. [47] analyzed the potential of ChatGPT as an AI-based tool to assist medical research. It evaluated the opportunities and challenges that ChatGPT brings, such as its ability to streamline research by eliminating manual labor and providing real-time access to the latest data, as well as its potential to introduce bias and reduce accuracy. It also provided recommendations on how to best utilize and regulate the use of ChatGPT in medical research. However, the main shortcoming of this article is that it relies heavily on one study, which is insufficient to establish definitive conclusions about the potential of the AI bot ChatGPT in medical research. Additionally, the article does not provide enough detail about the methods and results of the study, which limits the ability

TABLE I
A Comparative Analysis of ChatGPT-related Works in NLP

[34]
  Main contributions: Compared ChatGPT and humans in conversation quality using a dataset and three metrics; presented an approach for detecting chatbot impersonation.
  Strengths: Compared ChatGPT's performance to human experts; measured its accuracy in three areas.
  Shortcomings: The evaluation of ChatGPT's performance is incomplete; focused on limited metrics.

[35]
  Main contributions: Analyzed and evaluated the performance of ChatGPT in three tasks; identified areas for improvement and potential applications.
  Strengths: A novel evaluation framework for conversational agents.
  Shortcomings: Insufficient evidence for the proposed evaluation methods; lack of details on training data and evaluation metrics.

[39]
  Main contributions: Explored linguistic ambiguity in ChatGPT and its relevance to NLP.
  Strengths: Disambiguation tasks through context-based analysis; valuable insights for using transformer-based language models in NLP systems.
  Shortcomings: Did not address linguistic features or resource limitations; did not propose new solutions for addressing linguistic ambiguity.

[40]
  Main contributions: Analyzes ChatGPT's limitations and risks.
  Strengths: Provides relatable Twitter examples as a reference for evaluating chatbot progress.
  Shortcomings: Lacks recommendations for improvement and supporting evidence.

[41]
  Main contributions: ChatGPT's performance on aspect-based and query-based summarization tasks.
  Strengths: Identifies gaps in research.
  Shortcomings: Relies solely on Rouge scores for evaluation.

[42]
  Main contributions: Investigates ChatGPT's machine translation performance.
  Strengths: Discusses insights, strategies, and limitations in evaluating ChatGPT's translation performance.
  Shortcomings: Limited test data coverage; lack of detailed analysis of translation performance.

[43]
  Main contributions: Evaluates ChatGPT's performance on 25 diverse NLP tasks; analyzes ChatGPT's results against state-of-the-art solutions.
  Strengths: Highlights ChatGPT's strengths and weaknesses; discusses ethical implications; automates the querying process for increased efficiency.
  Shortcomings: Lacks direct comparison with other chatbot models; acknowledgement of biases without a detailed analysis.

[38]
  Main contributions: ChatGPT's zero-shot learning performance compared with GPT-3.5 on various NLP tasks.
  Strengths: Empirical studies; qualitative case studies; zero-shot performance comparisons.
  Shortcomings: Lacks analysis on fine-tuning and ethical considerations.
to draw meaningful conclusions. King et al. [48] presented a perspective on the future of artificial intelligence (AI) in medicine. It discusses the potential of AI to improve the accuracy and efficiency of medical diagnostics, reduce costs, and improve access to medical care in parts of the world without adequate healthcare infrastructure. The authors also explored the ethical considerations of using AI in medicine, such as privacy, data security, and regulatory compliance. However, the article does not provide an in-depth analysis of the ethical implications of using AI in health care or the potential risks associated with its use. Additionally, the article does not discuss how AI can be used to improve patient care or how it could be used to develop new medical treatments. Bhattacharya et al. [49] evaluated the performance of a novel deep learning and ChatGPT-based model for surgical practice. The proposed model can understand natural language inputs such as queries and generate relevant answers. The outcomes demonstrated that ChatGPT outperformed baseline models in accuracy and time spent. Additionally, the study discussed the strong potential of ChatGPT for future applications in surgical practice. However, this article lacks empirical evidence to support its claims. Additionally, the study was limited in scope, as it was conducted in one hospital and did not include a control group. Furthermore, the study did not consider the long-term effects of ChatGPT on patient satisfaction or outcomes, leaving these questions unanswered.

The editorial [50] explored the potential impact of the ChatGPT language model on medical education and research. It discusses the various ways ChatGPT can be used in the medical field, including providing medical information and assistance, assisting with medical writing, and helping with medical education and clinical decision-making. However, the editorial didn't provide new research data or empirical evidence to support its claims. Instead, it primarily relies on anecdotal evidence and expert opinions, which may not represent the wider medical community's views. This case

study in [51] explored the potential of generative AI in improving the translation of environmental health research to non-academic audiences. The study submitted five recently published environmental health papers to ChatGPT to generate summaries at different readability levels. It evaluated the quality of the generated summaries using a combination of Likert-scale, yes/no, and free-text questions to assess scientific accuracy, completeness, and readability at an 8th-grade level. However, this study is limited in scope, as it only evaluates the quality of generated summaries from five recently published environmental health papers. The study acknowledges the need for continuous improvement in generative AI technology but does not provide specific recommendations for improvement. Wang et al. [53] investigated the effectiveness of using the ChatGPT model for generating Boolean queries for systematic review literature searches. The study compared ChatGPT with state-of-the-art methods for query generation and analyzed the impact of prompts on the effectiveness of the queries produced by ChatGPT. However, the model's MeSH term handling is poor, which may impact the recall of the generated queries. In addition, the study is limited to standard test collections for systematic reviews, and the findings may not be generalizable to other domains. Kurian et al. [52] highlighted the role of ChatGPT and its potential to revolutionize communication. It also sheds light on the issue of HPV vaccination and the role of oral health care professionals in promoting it. In addition, the article suggested that training and education tools can improve healthcare providers' willingness and ability to recommend the vaccine. However, the article doesn't provide in-depth analysis or research on either topic and relies heavily on information provided by external sources. It also does not address potential concerns or criticisms of using AI chatbots or the HPV vaccine.

The aforementioned discussion related to the applications of ChatGPT in healthcare is summarized in Table II.

C. Ethics

ChatGPT has been widely discussed in the field of ethics due to its potential impact on society.

Graf et al. [54] discussed the implications of using ChatGPT in research. It emphasized the importance of responsible research that adheres to ethical, transparent, and evidence-based standards. It also highlighted the potential to use ChatGPT for more specific and in-depth research, such as understanding the impact of implicit biases and the potential risks associated with its use. The major shortcoming of this article is a lack of discussion regarding the ethical implications of using ChatGPT in research. Additionally, it does not address the potential risks associated with ChatGPT, and there is limited discussion of the potential benefits of using the technology. Hacker et al. [55] presented a brief discussion on various issues regulating large generative AI models, such as ChatGPT. The authors

benefit society and maintain public safety. However, this article didn't consider the potential impacts of using large generative AI models on privacy, data security, data ownership, and other ethical considerations. Additionally, it didn't provide concrete proposals or recommendations to effectively regulate large generative AI models.

Khalil et al. [56] explored the potential of using a GPT-based natural language processing (NLP) model for detecting plagiarism. The authors presented the ChatGPT model, trained on a large corpus of text to generate contextualized, paraphrased sentences. The experimental outcomes report that the model could identify plagiarism with an accuracy of 88.3% and detect previously unseen forms of plagiarism with an accuracy of 76.2%. However, the article does not offer an alternate solution to the problem of plagiarism detection. Additionally, there is not enough discussion around the ethical implications of using ChatGPT to detect plagiarism. Zhuo et al. [57] presented a comprehensive exploration and catalog of ethical issues in ChatGPT. The authors analyze the model from four perspectives: bias, reliability, robustness, and toxicity. In addition, they benchmark ChatGPT empirically on multiple datasets and identify several ethical risks that existing benchmarks cannot address. The paper also examined the implications of the findings for the AI ethics of ChatGPT and future practical design considerations for LLMs. However, the article does not provide a quantitative analysis of the identified ethical issues in ChatGPT, which precludes a more nuanced understanding of the risks posed by the model.

The aforementioned discussion related to the applications of ChatGPT in ethics is summarized in Table III.

D. Education

ChatGPT has the potential to enhance the quality of education by providing personalized, student-centered learning experiences that can help improve learning outcomes and promote student success.

Frieder et al. [4] discussed the capabilities of ChatGPT in education. The authors explored three key areas: first, the ability of ChatGPT to generate mathematically valid natural language statements; second, its ability to answer mathematics-related queries; and third, its ability to solve math problems. The experimental outcomes indicate that ChatGPT outperformed other models in generating mathematically valid statements and answering math questions. However, this article is largely theoretical and does not provide empirical evidence to prove the claims. Additionally, the authors do not provide any evaluation metrics or performance results to demonstrate the effectiveness of the proposed approach. Chen et al. [58] analyzed the potential impact of ChatGPT on library reference services. The article suggested that ChatGPT could help to reduce workloads, improve response times, provide accurate and comprehensive answers, and offer a way to answer complex questions. Additionally, it argued that ChatGPT could
explored the legal, economic, and ethical considerations of enhance the user experience and transform library reference
regulating the use of such models, as well as proposed a services. However, the article does not offer sufficient details
regulatory framework for the safe and responsible use of about the potential advantages and disadvantages of using
ChatGPT. Finally, they concluded that a strong regulatory ChatGPT. Furthermore, the article does not provide any infor-
framework is necessary to ensure that ChatGPT is used to mation about the actual implementation of ChatGPT in library
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 March 2023 doi:10.20944/preprints202303.0438.v1

TABLE II
A Comparative Analysis of ChatGPT-related Works in Healthcare

[45]
Main contributions: ChatGPT provides accurate and personalized ophthalmology consultations at a low cost.
Strengths: Accurate diagnoses and treatment options; provides doctors with multiple approaches for patient treatment.
Shortcomings: Lacks consideration of ChatGPT's potential in other medical specialties; lacks comparison to similar systems.

[3]
Main contributions: ChatGPT-based NLP systems can accurately simplify radiology reports.
Strengths: ChatGPT-based chatbot to simplify radiology reports for better understanding; provides a case study of existing methods and limitations.
Shortcomings: Small sample size; lack of accuracy assessment; no information on potential impact.

[46]
Main contributions: ChatGPT in AI-assisted medical education; discusses ethical concerns related to its implementation.
Strengths: ChatGPT's potential as a more accurate and faster AI model.
Shortcomings: Small sample size; lacks evidence of the superiority of AI-assisted medical education over traditional methods.

[51]
Main contributions: ChatGPT used to summarize environmental health research.
Strengths: ChatGPT improves research translation for a wider audience; ChatGPT summaries aid communication.
Shortcomings: Does not provide improvement recommendations; ignores ethical concerns.

[53]
Main contributions: Examines ChatGPT's query generation effectiveness; compares with state-of-the-art methods.
Strengths: Effective generation of Boolean queries for systematic reviews using ChatGPT.
Shortcomings: The model's MeSH term handling may affect recall.

TABLE III
A Comparative Analysis of ChatGPT-related Works in Ethics

[55]
Main contributions: ChatGPT for societal benefit and public safety.
Strengths: Covers current developments of chatbots; explanation of complex decisions.
Shortcomings: Lack of discussion on the impacts on privacy, security, and ethics.

[56]
Main contributions: Detecting plagiarism better than existing methods, with 88.3% accuracy.
Strengths: ChatGPT as a plagiarism detection tool.
Shortcomings: Insufficient evidence; inadequate ethical discussion regarding ChatGPT's plagiarism detection.

[57]
Main contributions: Analyzes ChatGPT biases, reliability, robustness, and toxicity; identifies ChatGPT ethical risks; examines implications for AI ethics and future LLM design.
Strengths: Identifies new ethical risks; fills gaps in previous research.
Shortcomings: No quantitative analysis; limited examination of other LLMs.

reference services. Susnjak et al. [59] evaluated the ability of ChatGPT to perform high-level cognitive tasks and produce text indistinguishable from human-generated text. The study shows that ChatGPT can exhibit critical thinking skills and generate highly realistic text with minimal input, making it a potential threat to the integrity of online exams. The study also discussed the challenges online exams face in maintaining academic integrity, the various strategies institutions use to mitigate the risk of academic misconduct, and the potential ethical concerns surrounding proctoring software. However, the study mainly focuses on the potential threat of ChatGPT to online exams and does not provide an in-depth analysis of other potential risks associated with online exams. Tlili et al. [60] presented a case study of ChatGPT in the context of education. The study examined the public discourse surrounding ChatGPT, its potential impact on education, and users' experiences in educational scenarios. It also identified various issues, including cheating, the honesty and truthfulness of ChatGPT, privacy, misleading information, and manipulation. However, this study only examines the experiences of a small number of users in educational scenarios, and the findings may not be generalizable to other contexts. In addition, it does not provide a comprehensive analysis of the ethical implications of using ChatGPT in education, which could be an area for future research. The editorial in [61] explored the potential use of an AI chatbot, ChatGPT, in scientific writing. It highlighted the ability of ChatGPT to assist in organizing material, generating an initial draft, and proofreading. The paper discussed the limitations and ethical concerns of using ChatGPT in scientific writing, such as plagiarism and inaccuracies. However, the article does not provide empirical evidence or case studies to support its claims. The potential benefits and limitations of using ChatGPT in scientific writing are discussed theoretically, but there is no practical demonstration of how this tool can assist scientific writing. Therefore, the paper lacks practical examples that can help readers better understand the potential of ChatGPT in scientific writing. King et al. [62] briefly discussed the history and evolution of AI and chatbot technology. It explored the growing concern of plagiarism in higher education and the potential for chatbots like ChatGPT to be used for cheating. Additionally, it provides suggestions for ways college professors can design assignments to minimize potential cheating via chatbots. However, it does not delve into the potential benefits of using chatbots in higher education.

The aforementioned discussion related to the applications of ChatGPT in education is summarized in Table IV.

E. Industry

ChatGPT has numerous potential applications in industry, particularly in the areas of customer service and marketing. It can improve efficiency, increase customer satisfaction, and provide valuable insights to companies across various industries.

Prieto et al. [63] investigated the potential of using ChatGPT to automate the scheduling of construction projects. Data mining techniques were used to extract scheduling information from project specifications and histories. The results showed that the algorithm could accurately predict the project timeline and duration with an average accuracy of 81.3%. However, the article lacks detailed information regarding the implementation of the ChatGPT interface and its effects on the scheduling process, as well as reliable data to support the authors' claims that the scheduling process is improved by incorporating ChatGPT. Graf et al. [64] discussed the applications of ChatGPT in the finance industry. First, it looked at the potential of using machine learning to facilitate the analysis of financial data, as well as its potential applications in finance. Next, it presented the "Bananarama Conjecture", which suggests that ChatGPT can provide greater insights into financial research and data analysis than conventional methods. However, the authors limit their scope to the Bananarama Conjecture, a relatively narrow topic that may not capture the breadth of the finance industry. Furthermore, they did not provide a detailed discussion of the implications of their findings, making it difficult to draw conclusions from the research.

Gozalo et al. [65] provided a concise taxonomy of recent large generative models of artificial intelligence, their sectors of application, and their implications for industry and society. The article described how these models generate novel content and differ from predictive machine learning systems. The study also highlights the limitations of these models, such as the need for enormous datasets and the difficulty in finding data for some models. However, the paper does not provide a detailed technical explanation of the models or their architecture, making it less suitable for readers interested in the technical aspects of generative AI. Additionally, the paper does not critically analyze the ethical and social implications of generative AI models. The case study in [66] examined the use of ChatGPT in a human-centered design process. The study aims to explore the various roles of fully conversational agents in the design process and understand the emergent roles these agents can play in human-AI collaboration. ChatGPT is used to simulate interviews with fictional users, generate design ideas, simulate usage scenarios, and evaluate user experience for a hypothetical design project on designing a voice assistant for the health and well-being of people working from home. However, this study is limited by its focus on a single hypothetical design project. As a result, the generalizability of the findings to other design contexts may be limited. The study also relies on subjective evaluations of ChatGPT's performance, which may be influenced by the authors' biases or expectations.

The aforementioned discussion related to the applications of ChatGPT in industry is summarized in Table V.

V. CHALLENGES AND FUTURE DIRECTIONS

A. Challenges

• Data Privacy and Ethics: The challenge of data privacy and ethics for ChatGPT is complex and multifaceted. One aspect of this challenge is related to data privacy, which involves protecting personal information collected by ChatGPT. ChatGPT relies on vast amounts of data to train its language model, which often includes sensitive user information, such as chat logs and personal details. Therefore, ensuring that user data is kept private and secure is essential to maintain user trust in the technology [67], [68].

Another aspect of the data privacy and ethics challenge for ChatGPT is related to ethical considerations. ChatGPT has the potential to be used in various applications, including social media, online communication, and customer service. However, the technology's capabilities also pose ethical concerns, particularly in areas such as spreading false information and manipulating individuals [69], [70]. The potential for ChatGPT to be used maliciously highlights the need for ethical considerations in its development and deployment.

To address these challenges, researchers and developers need to implement robust data privacy and security measures in the design and development of ChatGPT. This includes encryption, data anonymization, and access control mechanisms. Additionally, ethical considerations should be integrated into the development process, such as developing guidelines for appropriate use and ensuring transparency in how the technology is deployed. By taking these steps, the development and deployment of ChatGPT can proceed ethically and responsibly, safeguarding users' privacy and security while promoting its positive impact.

• Bias and Fairness: Bias and fairness are critical issues related to developing and deploying chatbot systems like ChatGPT. Bias refers to the systematic and unfair treatment of individuals or groups based on their personal characteristics, such as race, gender, or religion.
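One common way to make such bias empirically visible is a counterfactual probe: send the model paired prompts that differ only in a demographic term and compare the responses. The following is a minimal illustrative sketch, not a method from the surveyed works; `query_model` is a hypothetical stand-in for any real chat-model API call.

```python
from difflib import SequenceMatcher


def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-model API call.
    return f"Response to: {prompt}"


def counterfactual_gap(template: str, group_a: str, group_b: str) -> float:
    """Dissimilarity between responses to two prompts that differ only
    in the demographic term inserted into `template` (0.0 = identical)."""
    resp_a = query_model(template.format(group=group_a))
    resp_b = query_model(template.format(group=group_b))
    similarity = SequenceMatcher(None, resp_a, resp_b).ratio()
    return 1.0 - similarity


# A large gap on many templates would flag the model for closer auditing.
gap = counterfactual_gap("Describe a typical {group} engineer.", "male", "female")
print(round(gap, 2))
```

In practice an audit would aggregate such gaps over many templates and demographic pairs, and use a semantic rather than character-level similarity measure; the sketch only conveys the paired-prompt idea.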

TABLE IV
A Comparative Analysis of ChatGPT-related Works in Education

[4]
Main contributions: ChatGPT in answering math queries; solving math problems compared to other models.
Strengths: ChatGPT for intelligent tutoring systems.
Shortcomings: Theoretical approach with no empirical evidence; lacks metrics or results for effectiveness.

[58]
Main contributions: Improve and transform library reference services.
Strengths: ChatGPT impact analysis; discussion of benefits and implications.
Shortcomings: Lacks details on ChatGPT benefits, implementation, and ethics.

[59]
Main contributions: ChatGPT's potential threat to online exam integrity; strategies for maintaining academic integrity.
Strengths: Highlights ChatGPT's threat to online exam integrity; emphasizes the need for more research on this issue.
Shortcomings: Lack of thorough analysis of the potential threat to online exams.

[60]
Main contributions: ChatGPT issues in education.
Strengths: Insights into ChatGPT's use in education.
Shortcomings: Small sample size; limited analysis of ethics.

TABLE V
A Comparative Analysis of ChatGPT-related Works in Industry

[64]
Main contributions: Improved financial research through ChatGPT.
Strengths: Innovative approach using NLP.
Shortcomings: Limited financial data; narrow scope; insufficient discussion of implications.

[63]
Main contributions: ChatGPT for construction project scheduling.
Strengths: Data mining solutions; time-saving solutions.
Shortcomings: Incomplete information on ChatGPT implementation; lack of reliable data; no details on application to other projects.

[66]
Main contributions: ChatGPT's potential and limitations in design process exploration.
Strengths: Insights into benefits and limitations in the design process.
Shortcomings: Generalizability limitations; subjective evaluations; lack of analysis of ethical implications.

In chatbots, bias can occur in several ways [71], [72]. For example, biased language models can lead to biased responses that perpetuate stereotypes or discriminate against certain groups. Biased training data can also result in a chatbot system that provides inaccurate or incomplete information to users.

Fairness, on the other hand, relates to treating all users equally without discrimination. Chatbots like ChatGPT must be developed and deployed in a way that promotes fairness and prevents discrimination. For instance, a chatbot must provide equal access to information or services, regardless of the user's background or personal characteristics.

To address bias and fairness concerns, developers of chatbots like ChatGPT must use unbiased training data and language models. Additionally, the chatbot system must be regularly monitored and audited to identify and address any potential biases. Fairness can be promoted by ensuring that the chatbot system provides equal access to information or services and does not discriminate against any particular group. The development and deployment of chatbots must be done with a clear understanding of the ethical considerations and a commitment to uphold principles of fairness and non-discrimination.

• Robustness and Explainability: Robustness and explainability are two critical challenges that must be addressed when deploying ChatGPT in real-world applications [7].

Robustness refers to the ability of ChatGPT to maintain high performance even when faced with unexpected inputs or perturbations. In other words, robustness ensures that the model's predictions are reliable and consistent across different contexts. For example, if ChatGPT is used to generate responses in a chatbot, it must be robust to diverse language styles, accents, and topics to provide accurate and relevant responses.

However, achieving robustness is challenging, as the model's performance can be affected by various factors such as data quality, model architecture, training algorithms, and hyperparameters. In addition, adversarial attacks can also compromise the robustness of ChatGPT. Adversarial attacks refer to deliberately manipulating inputs to mislead the model's predictions. Therefore, robustness is a crucial challenge that must be addressed to ensure the reliability and trustworthiness of ChatGPT.

On the other hand, explainability refers to the ability of ChatGPT to provide transparent and interpretable explanations for its predictions. Explainability is crucial for building trust and accountability, especially in critical applications such as healthcare and finance. For example, suppose ChatGPT is used to diagnose a medical condition. In that case, it must be able to provide transparent and interpretable explanations for its diagnosis to ensure that healthcare professionals and patients can understand and trust its decisions.

However, explainability is also challenging, as deep learning models such as ChatGPT are often seen as black boxes, making it difficult to understand how they arrive at their decisions. Recent advances in explainable AI (XAI) have proposed techniques such as attention mechanisms and saliency maps to provide interpretable explanations for deep learning models. Therefore, explainability is a critical challenge that must be addressed to ensure the transparency and accountability of ChatGPT.

B. Future Directions

• Multilingual Language Processing: Multilingual language processing is a crucial area for future work related to ChatGPT [35]. Despite ChatGPT's impressive performance in English language processing, its effectiveness in multilingual contexts is still an area of exploration. To address this, researchers may explore ways to develop and fine-tune ChatGPT models for different languages and domains and investigate cross-lingual transfer learning techniques to improve the generalization ability of ChatGPT. Additionally, future work in multilingual language processing for ChatGPT may focus on developing multilingual conversational agents that can communicate with users in different languages. This may involve addressing challenges such as code-switching, where users may switch between languages within a single conversation. Furthermore, research in multilingual language processing for ChatGPT may also investigate ways to improve the model's handling of low-resource languages, which may have limited training data available.

• Low-Resource Language Processing: One of the future directions for ChatGPT is to extend its capabilities to low-resource language processing. This is particularly important, as a significant portion of the world's population speaks low-resource languages with limited amounts of labeled data for training machine learning models. Therefore, developing ChatGPT models that can effectively process low-resource languages could have significant implications for enabling communication and access to information in these communities.

To achieve this goal, researchers can explore several approaches. One possible solution is to develop pretraining techniques that can learn from limited amounts of data, such as transfer learning, domain adaptation, and cross-lingual learning. Another approach is to develop new data augmentation techniques that can generate synthetic data to supplement the limited labeled data available for low-resource languages. Additionally, researchers can investigate new evaluation metrics and benchmarks that are specific to low-resource languages.

Developing ChatGPT models that can effectively process low-resource languages is a crucial area of research for the future. It has the potential to enable access to information and communication for communities that have been historically marginalized due to language barriers.

• Domain-Specific Language Processing: Domain-specific language processing refers to developing and applying language models trained on text data from specific domains or industries, such as healthcare, finance, or law. ChatGPT, with its remarkable capabilities in natural language processing, has the potential to be applied to various domains and industries to improve communication, decision-making, and automation.

One potential future direction for ChatGPT is to develop domain-specific language models that can understand and generate text specific to a particular domain or industry. This would involve training the model on large amounts of domain-specific text data and fine-tuning the model to the specific language and terminology used in that domain.

Another future direction is to develop ChatGPT models that can transfer knowledge from one domain to another, allowing for more efficient training and adaptation of language models. This involves developing transfer learning techniques enabling the model to generalize from one domain to another while preserving the domain-specific language and context. Domain-specific language models could be used to address specific challenges in different disciplines. For example, ChatGPT could be used in healthcare to develop models for medical diagnosis, drug discovery, or patient monitoring. In finance, it could be used to develop models for fraud detection, investment analysis, or risk management. These applications require a deep understanding of domain-specific language and context, making ChatGPT an ideal tool for tackling these challenges.

VI. CONCLUSION

In conclusion, this survey presents a critical review of ChatGPT, its technical advancements, and its standing within the realm of conversational and generative AI. We have demystified the factors that contribute to ChatGPT's exceptional performance and capabilities by thoroughly analyzing its innovations, establishing a taxonomy of recent research, and conducting a comparative analysis of its competitors. Moreover, we have identified and discussed the challenges and limitations of ChatGPT, emphasizing areas of improvement and unexplored research opportunities.

We believe this survey lays the groundwork for a deeper understanding of the trending ChatGPT in generative AI and will

serve as a valuable reference for researchers and practitioners [20] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Prox-
seeking to harness the power of ChatGPT in their applications imal policy optimization algorithms,” arXiv preprint arXiv:1707.06347,
2017.
or address its gaps as part of ongoing development. [21] Y. Wang, H. He, and X. Tan, “Truly proximal policy optimization,” in
Uncertainty in Artificial Intelligence. PMLR, 2020, pp. 113–122.
[22] A. Glaese, N. McAleese, M. Tr˛ebacz, J. Aslanides, V. Firoiu, T. Ewalds,
R EFERENCES M. Rauh, L. Weidinger, M. Chadwick, P. Thacker et al., “Improving
alignment of dialogue agents via targeted human judgements,” arXiv
[1] A. S. George and A. H. George, “A review of chatgpt ai’s impact on preprint arXiv:2209.14375, 2022.
several business sectors,” Partners Universal International Innovation
[23] Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma,
Journal, vol. 1, no. 1, pp. 9–23, 2023.
D. Drain, S. Fort, D. Ganguli, T. Henighan et al., “Training a helpful and
[2] M. Verma, “Integration of ai-based chatbot (chatgpt) and supply chain harmless assistant with reinforcement learning from human feedback,”
management solution to enhance tracking and queries response.” arXiv preprint arXiv:2204.05862, 2022.
[3] K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stüber,
[24] N. Stiennon, L. Ouyang, J. Wu, D. Ziegler, R. Lowe, C. Voss, A. Rad-
J. Topalis, T. Weber, P. Wesp, B. Sabel, J. Ricke et al., “Chatgpt makes
ford, D. Amodei, and P. F. Christiano, “Learning to summarize with
medicine easy to swallow: An exploratory case study on simplified
human feedback,” Advances in Neural Information Processing Systems,
radiology reports,” arXiv preprint arXiv:2212.14882, 2022.
vol. 33, pp. 3008–3021, 2020.
[4] S. Frieder, L. Pinchetti, R.-R. Griffiths, T. Salvatori, T. Lukasiewicz,
[25] V. Malhotra, “Google introduces bard ai as a rival to chatgpt,” 2023,
P. C. Petersen, A. Chevalier, and J. Berner, “Mathematical capabilities
https://beebom.com/google-bard-chatgpt-rival-introduced/.
of chatgpt,” arXiv preprint arXiv:2301.13867, 2023.
[26] S. Garg, “What is chatsonic, and how to use it?” 2023, https://writesonic.
[5] S. Shahriar and K. Hayawi, “Let’s have a chat! a conversation with
com/blog/what-is-chatsonic/.
chatgpt: Technology, applications, and limitations,” arXiv preprint
[27] Jasper, “Jasper chat - ai chat assistant,” 2023, https://www.jasper.ai/.
arXiv:2302.13817, 2023.
[28] OpenAI, “Welcome to openai,” 2023, https://platform.openai.com/
[6] A. Lecler, L. Duron, and P. Soyer, “Revolutionizing radiology with gpt-
overview.
based models: Current applications, future possibilities and limitations
of chatgpt,” Diagnostic and Interventional Imaging, 2023. [29] C. AI, “Caktus - open writing with ai,” 2023, https://www.caktus.ai/
[7] R. Omar, O. Mangukiya, P. Kalnis, and E. Mansour, “Chatgpt versus caktus_student/.
traditional question answering for knowledge graphs: Current status and [30] E. Kuyda, “What is chatsonic, and how to use it?” 2023, https://help.
future directions towards knowledge graph chatbots,” arXiv preprint replika.com/hc/en-us/articles/115001070951-What-is-Replika-.
arXiv:2302.06466, 2023. [31] C. Research, “Chai gpt - ai more human, less filters,” 2023, https://www.
[8] A. Haleem, M. Javaid, and R. P. Singh, “An era of chatgpt as a significant chai-research.com/.
futuristic support tool: A study on features, abilities, and challenges,” [32] Rytr, “Rytr - take your writing assistant,” 2023, https://rytr.me/.
BenchCouncil Transactions on Benchmarks, Standards and Evaluations, [33] Peppertype.ai, “Peppertype.ai - your virtual content assistant,” 2023,
p. 100089, 2023. https://www.peppertype.ai/.
[9] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, [34] B. Guo, X. Zhang, Z. Wang, M. Jiang, J. Nie, Y. Ding, J. Yue, and
A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert- Y. Wu, “How close is chatgpt to human experts? comparison corpus,
Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, evaluation, and detection,” arXiv preprint arXiv:2301.07597, 2023.
J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, [35] Y. Bang, S. Cahyawijaya, N. Lee, W. Dai, D. Su, B. Wilie, H. Lovenia,
B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, Z. Ji, T. Yu, W. Chung et al., “A multitask, multilingual, multimodal
and D. Amodei, “Language models are few-shot learners,” CoRR, vol. evaluation of chatgpt on reasoning, hallucination, and interactivity,”
abs/2005.14165, 2020. arXiv preprint arXiv:2302.04023, 2023.
[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. [36] N. Muennighoff, “Sgpt: Gpt sentence embeddings for semantic search,”
Gomez, L. u. Kaiser, and I. Polosukhin, “Attention is all you arXiv preprint arXiv:2202.08904, 2022.
need,” in Advances in Neural Information Processing Systems, [37] C. Zhou, Q. Li, C. Li, J. Yu, Y. Liu, G. Wang, K. Zhang, C. Ji, Q. Yan,
I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, L. He et al., “A comprehensive survey on pretrained foundation models:
S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, A history from bert to chatgpt,” arXiv preprint arXiv:2302.09419, 2023.
Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper/ [38] C. Qin, A. Zhang, Z. Zhang, J. Chen, M. Yasunaga, and D. Yang,
2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf “Is chatgpt a general-purpose natural language processing task solver?”
[11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training arXiv preprint arXiv:2302.06476, 2023.
of deep bidirectional transformers for language understanding,” arXiv [39] M. Ortega-Martín, Ó. García-Sierra, A. Ardoiz, J. Álvarez, J. C. Ar-
preprint arXiv:1810.04805, 2018. menteros, and A. Alonso, “Linguistic ambiguity analysis in chatgpt,”
[12] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, arXiv preprint arXiv:2302.06426, 2023.
“Improving language understanding by generative pre- [40] A. Borji, “A categorical archive of chatgpt failures,” arXiv preprint
training,” URL https://s3-us-west-2. amazonaws. com/openai- arXiv:2302.03494, 2023.
assets/researchcovers/languageunsupervised/language_understanding_paper.pdf, 2018.
[13] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in neural information processing systems, vol. 30, 2017.
[14] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” 2022.
[15] H. H. Thorp, “Chatgpt is fun, but not an author,” Science, vol. 379, no. 6630, pp. 313–313, 2023.
[16] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
[17] J. Ho and S. Ermon, “Generative adversarial imitation learning,” Advances in neural information processing systems, vol. 29, 2016.
[18] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, “Trust region policy optimization,” in International conference on machine learning, 2015, pp. 1889–1897.
[19] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in International conference on machine learning, 2016, pp. 1928–1937.
[41] X. Yang, Y. Li, X. Zhang, H. Chen, and W. Cheng, “Exploring the limits of chatgpt for query or aspect-based text summarization,” arXiv preprint arXiv:2302.08081, 2023.
[42] W. Jiao, W. Wang, J.-t. Huang, X. Wang, and Z. Tu, “Is chatgpt a good translator? a preliminary study,” arXiv preprint arXiv:2301.08745, 2023.
[43] J. Kocoń, I. Cichecki, O. Kaszyca, M. Kochanek, D. Szydło, J. Baran, J. Bielaniewicz, M. Gruza, A. Janz, K. Kanclerz et al., “Chatgpt: Jack of all trades, master of none,” arXiv preprint arXiv:2302.10724, 2023.
[44] D. L. Mann, “Artificial intelligence discusses the role of artificial intelligence in translational medicine: A jacc: Basic to translational science interview with chatgpt,” Basic to Translational Science, 2023.
[45] F. Antaki, S. Touma, D. Milad, J. El-Khoury, and R. Duval, “Evaluating the performance of chatgpt in ophthalmology: An analysis of its successes and shortcomings,” medRxiv, pp. 2023–01, 2023.
[46] T. H. Kung, M. Cheatham, A. Medenilla, C. Sillos, L. De Leon, C. Elepaño, M. Madriaga, R. Aggabao, G. Diaz-Candido, J. Maningo et al., “Performance of chatgpt on usmle: Potential for ai-assisted medical education using large language models,” PLOS Digital Health, vol. 2, no. 2, p. e0000198, 2023.
[47] J. Dahmen, M. Kayaalp, M. Ollivier, A. Pareek, M. T. Hirschmann, J. Karlsson, and P. W. Winkler, “Artificial intelligence bot chatgpt in medical research: the potential game changer as a double-edged sword,” pp. 1–3, 2023.
[48] M. R. King, “The future of ai in medicine: a perspective from a chatbot,” Annals of Biomedical Engineering, pp. 1–5, 2022.
[49] K. Bhattacharya, A. S. Bhattacharya, N. Bhattacharya, V. D. Yagnik, P. Garg, and S. Kumar, “Chatgpt in surgical practice—a new kid on the block,” Indian Journal of Surgery, pp. 1–4, 2023.
[50] T. B. Arif, U. Munaf, and I. Ul-Haque, “The future of medical education and research: Is chatgpt a blessing or blight in disguise?” p. 2181052, 2023.
[51] L. B. Anderson, D. Kanneganti, M. B. Houk, R. H. Holm, and T. R. Smith, “Generative ai as a tool for environmental health research translation,” medRxiv, pp. 2023–02, 2023.
[52] N. Kurian, J. Cherian, N. Sudharson, K. Varghese, and S. Wadhwa, “Ai is now everywhere,” British Dental Journal, vol. 234, no. 2, pp. 72–72, 2023.
[53] S. Wang, H. Scells, B. Koopman, and G. Zuccon, “Can chatgpt write a good boolean query for systematic review literature search?” arXiv preprint arXiv:2302.03495, 2023.
[54] A. Graf and R. E. Bernardi, “Chatgpt in research: Balancing ethics, transparency and advancement.” Neuroscience, pp. S0306–4522, 2023.
[55] P. Hacker, A. Engel, and M. Mauer, “Regulating chatgpt and other large generative ai models,” arXiv preprint arXiv:2302.02337, 2023.
[56] M. Khalil and E. Er, “Will chatgpt get you caught? rethinking of plagiarism detection,” arXiv preprint arXiv:2302.04335, 2023.
[57] T. Y. Zhuo, Y. Huang, C. Chen, and Z. Xing, “Exploring ai ethics of chatgpt: A diagnostic analysis,” arXiv preprint arXiv:2301.12867, 2023.
[58] X. Chen, “Chatgpt and its possible impact on library reference services,” Internet Reference Services Quarterly, pp. 1–9, 2023.
[59] T. Susnjak, “Chatgpt: The end of online exam integrity?” arXiv preprint arXiv:2212.09292, 2022.
[60] A. Tlili, B. Shehata, M. A. Adarkwah, A. Bozkurt, D. T. Hickey, R. Huang, and B. Agyemang, “What if the devil is my guardian angel: Chatgpt as a case study of using chatbots in education,” Smart Learning Environments, vol. 10, no. 1, pp. 1–24, 2023.
[61] M. Salvagno, F. S. Taccone, A. G. Gerli et al., “Can artificial intelligence help for scientific writing?” Critical Care, vol. 27, no. 1, pp. 1–5, 2023.
[62] M. R. King and chatGPT, “A conversation on artificial intelligence, chatbots, and plagiarism in higher education,” Cellular and Molecular Bioengineering, pp. 1–2, 2023.
[63] S. A. Prieto, E. T. Mengiste, and B. G. de Soto, “Investigating the use of chatgpt for the scheduling of construction projects,” arXiv preprint arXiv:2302.02805, 2023.
[64] M. Dowling and B. Lucey, “Chatgpt for (finance) research: The bananarama conjecture,” Finance Research Letters, p. 103662, 2023.
[65] R. Gozalo-Brizuela and E. C. Garrido-Merchan, “Chatgpt is not all you need. a state of the art review of large generative ai models,” arXiv preprint arXiv:2301.04655, 2023.
[66] A. Baki Kocaballi, “Conversational ai-powered design: Chatgpt as designer, user, and product,” arXiv e-prints, pp. arXiv–2302, 2023.
[67] H. Harkous, K. Fawaz, K. G. Shin, and K. Aberer, “Pribots: Conversational privacy with chatbots,” in Workshop on the Future of Privacy Notices and Indicators, at the Twelfth Symposium on Usable Privacy and Security, SOUPS 2016, no. CONF, 2016.
[68] M. Hasal, J. Nowaková, K. Ahmed Saghair, H. Abdulla, V. Snášel, and L. Ogiela, “Chatbots: Security, privacy, data protection, and social aspects,” Concurrency and Computation: Practice and Experience, vol. 33, no. 19, p. e6426, 2021.
[69] E. Ruane, A. Birhane, and A. Ventresque, “Conversational ai: Social and ethical considerations.” in AICS, 2019, pp. 104–115.
[70] N. Park, K. Jang, S. Cho, and J. Choi, “Use of offensive language in human-artificial intelligence chatbot interaction: The effects of ethical ideology, social competence, and perceived humanlikeness,” Computers in Human Behavior, vol. 121, p. 106795, 2021.
[71] H. Beattie, L. Watkins, W. H. Robinson, A. Rubin, and S. Watkins, “Measuring and mitigating bias in ai-chatbots,” in 2022 IEEE International Conference on Assured Autonomy (ICAA). IEEE, 2022, pp. 117–123.
[72] Y. K. Dwivedi, N. Kshetri, L. Hughes, E. L. Slade, A. Jeyaraj, A. K. Kar, A. M. Baabdullah, A. Koohang, V. Raghavan, M. Ahuja et al., ““so what if chatgpt wrote it?” multidisciplinary perspectives on opportunities, challenges and implications of generative conversational ai for research, practice and policy,” International Journal of Information Management, vol. 71, p. 102642, 2023.
