trained on a relatively small amount of data related to the context of operation due to the high cost of data labeling and training for large data sizes. In contrast, ChatGPT has overcome these barriers by being trained on a massive amount of data from the internet, with a size of 570 GB. This large-scale language model used in ChatGPT allows it to generate human-like responses that can mimic the tone, style, and humor of a human conversation. Traditional chatbots often provide robotic and impersonal responses, which can lead to unsatisfactory interactions with users. However, ChatGPT's advanced language model and large-scale training allow it to generate more natural and engaging responses.

• Limited text generation ability: Traditional chatbot systems often lack the flexibility and adaptability required to handle complex and dynamic natural language understanding and generation. They often rely on pre-written intents, responses, or templates, leading to repetitive, predictable, or irrelevant answers that fail to engage users or meet their needs. Moreover, they struggle to generalize to new, unseen data, limiting their usefulness in real-world scenarios where the topics and contexts can vary widely. In contrast, ChatGPT leverages a powerful transformer architecture and a massive amount of training data to generate high-quality and diverse text outputs that closely resemble human language. By learning patterns and relationships between words and phrases from various sources, ChatGPT can capture the nuances and subtleties of different domains and produce relevant and coherent responses even to open-ended or ambiguous queries.

With the growing interest in ChatGPT, this paper aims to provide a comprehensive review of the language model, shedding light on its technical novelty and the reasons why it has become a hot topic in the field of information technology. Additionally, we survey recent research papers published since ChatGPT's release, categorize them, and discuss their contributions to the assessment and development of the ChatGPT language model. Finally, we suggest some potential areas of future research that could be explored in relation to ChatGPT.

B. Related Surveys

classification, segmentation, analysis, and natural language processing for radiology reports. The authors also discussed the future possibilities of using GPT models, such as in personalized medicine and improving radiology education. Omar et al. [7] compared the effectiveness of ChatGPT and traditional question-answering methods for knowledge graphs and discussed potential future directions for developing knowledge graph chatbots in their survey. First, an overview of knowledge graphs and their applications and the current state of question-answering systems is presented. Then, the authors compared the performance of ChatGPT and traditional question-answering methods on a benchmark dataset of knowledge graph questions. The results demonstrated that ChatGPT outperformed traditional methods regarding the accuracy and naturalness of responses. Haleem et al. [8] presented an overview of ChatGPT and its importance. Moreover, various progressive workflow processes of the ChatGPT tool are illustrated with diagrams. This survey further examined the specific features and capabilities of ChatGPT as a support tool and explored its significant roles in the current scenarios.

The distribution of currently published articles across various publishers is presented in the bar graphs of Fig. 2 and Fig. 3. It demonstrates that limited literature is available on ChatGPT, and there is still room for improvements and contributions for upcoming researchers in this emerging research area.

Fig. 2. Distribution of selected articles based on publishers. (Bar graph; y-axis: No. of Published Articles; publishers: ArXiv, Springer, Wiley, Nature, Taylor & Francis, IEEE, Elsevier.)

Fig. 3. (Bar graph; y-axis: No. of Published Articles; article types: Research, Review, Editorial, Perspective, Conference, Book Chapter.)
• We provide an in-depth analysis of the technical advancements and innovations that distinguish ChatGPT from its predecessors, including generative models and chatbot systems. This analysis elucidates the underlying mechanisms contributing to ChatGPT's enhanced performance and capabilities.
• We develop a comprehensive taxonomy of recent ChatGPT research, classifying studies based on their application domains. This classification enables a thorough examination of the contributions and limitations present in the current literature. Additionally, we conduct a comparative evaluation of emerging ChatGPT alternatives, highlighting their competitive advantages and drawbacks.
• We identify and discuss the limitations and challenges associated with ChatGPT, delving into potential areas of improvement and unexplored research opportunities. This discussion paves the way for future advancements in the field, guiding researchers and practitioners in addressing the current gaps in ChatGPT research and applications.

This paper is structured as follows: Section II delves into ChatGPT's background and technological innovations, such as the fusion of transformers and reinforcement learning with human feedback. Section III examines ChatGPT competitors and their comparative analysis. Section IV highlights emerging ChatGPT applications and research by summarizing pertinent studies. Section V outlines challenges and future directions, and Section VI concludes the paper.

II. BACKGROUND AND MAIN CONCEPTS

A. GPT-3 Model: Leveraging Transformers

The release of the GPT-3 model family by OpenAI has set the bar very high for its direct competitors, namely Google and Facebook. It has been a major milestone in developing natural language processing (NLP) models. The largest GPT-3 configuration comprises 175 billion parameters, including 96 attention layers and a batch size of 3.2 million training samples. GPT-3 was trained using 300 billion tokens (usually sub-words) [9].

The training process of GPT-3 builds on the successful strategies used in its predecessor, GPT-2. These strategies include modified initialization, pre-normalization, and reverse tokenization. However, GPT-3 also introduces a new refinement based on alternating dense and sparse attention patterns [9].

GPT-3 is designed as an autoregressive framework that can achieve task-agnostic goals using a few-shot learning paradigm [9]. The model can adapt to various tasks with minimal training data, making it a versatile and powerful tool for NLP applications.

To cater to different scenarios and computational resources, OpenAI has produced GPT-3 in various configurations. Table II-A5 summarizes these configurations, which range from a relatively small 125 million parameter model to the largest 175 billion parameter model. This allows users to choose a model that best fits their needs and resources.

All GPT (Generative Pre-trained Transformer) models, including the most recent GPT-3 model, are built based on the core technology of Transformers. The Transformer architecture was first introduced in the seminal paper "Attention is All You Need" by Vaswani et al. in 2017 [10], which has significantly impacted the deep learning research community, starting with sequential models and extending to computer vision. In the next sub-section, we provide a detailed overview of the Transformer technology and how it was leveraged in GPT-3 for text generation.

1) Transformers as Core Technology: Transformers refer to the revolutionary core technology behind ChatGPT. Transformers have transformed how sequence-to-sequence models are processed, significantly outperforming traditional models based on recurrent neural networks. Although Transformers are based on the classical encoder-decoder architecture, they differ dramatically in integrating the concept of self-attention modules, which excel at capturing long-term dependencies between the elements (i.e., tokens) of the input sequence. A Transformer leverages this information to determine each element's importance in the input sequence efficiently. The importance of each element is determined through the self-attention mechanism, which computes a weight for each element based on its relevance to other tokens in the sequence. This enables Transformers to handle variable-length sequences better and capture complex relationships between the sequence elements, improving performance on various natural language processing tasks. Another critical feature is positional embedding, which helps transformers learn the positional information of tokens within the sequence. It allows differentiating between tokens with the same contents but at different positions, which provides better context representation that improves the models' accuracy. These features represent a significant strength of ChatGPT in providing accurate natural language generation compared to its peers, particularly with being trained on large datasets of 570 GB of Internet data.

In general, a transformer comprises three featured modules: (i) the Encoder-Decoder module, (ii) the Self-Attention module, and (iii) the Positional Embedding module. In the following sub-sections, we present the core functionalities of these modules.

2) Encoder-Decoder Architecture: When the Transformer architecture was first designed in [10], as shown in Figure 4, it was applied to machine translation, where an input sequence in a source language is transformed into an output sequence in the target language. The Transformer architecture followed an encoder-decoder model, where the encoder maps a discrete representation of the input sequence (i.e., words, characters, sub-words) to a continuous representation denoted as an embedding vector (i.e., a vector of continuous values). The decoder takes embeddings as input and generates an output sequence of elements one at a time. As the transformer is an autoregressive generative model, it predicts the probability distribution of the next element in the output sequence given the previous sequence, which can be seen as a special case of Hidden Markov Models (HMMs). However, HMMs do not have the ability to capture long-term dependencies bidirectionally as transformers do.

Unlike other models that use both an encoder and a decoder, or an encoder only, like the BERT model family from Google
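As a toy illustration of the autoregressive formulation described in this sub-section, the sketch below generates a sequence by repeatedly turning a summary of the prefix into a probability distribution over the next token. The vocabulary size, the random embedding table, the context-averaging scorer, and the greedy decoding rule are all illustrative stand-ins for this sketch, not GPT-3's actual components.

```python
import numpy as np

# Illustrative stand-ins: a tiny vocabulary, a random embedding table, and a
# random output projection. None of these are GPT-3's actual components.
rng = np.random.default_rng(0)
VOCAB, DIM = 10, 8
embed = rng.normal(size=(VOCAB, DIM))   # token embedding table
W_out = rng.normal(size=(DIM, VOCAB))   # projection from context to vocab logits

def next_token_distribution(tokens):
    """Return p(next token | tokens) for a toy context-averaging model."""
    context = embed[tokens].mean(axis=0)      # crude summary of the prefix
    logits = context @ W_out
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

def generate(prompt, steps):
    """Autoregressive loop: append one predicted token at a time."""
    tokens = list(prompt)
    for _ in range(steps):
        p = next_token_distribution(tokens)
        tokens.append(int(p.argmax()))        # greedy decoding; sampling also works
    return tokens

out = generate([1, 2, 3], steps=5)            # 3 prompt tokens + 5 generated
```

The essential point is the loop structure: each step conditions only on previously emitted tokens, exactly the factorization p(x_1, ..., x_n) = Π p(x_t | x_1, ..., x_{t-1}) that GPT models optimize.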
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 March 2023 doi:10.20944/preprints202303.0438.v1
vectors are used to measure the similarity to the query vector.
• The Value (V): This vector represents all tokens in the input sequence. The value vectors are used to compute the weighted sum of the elements in the sequence, where the weights are determined by the attention weights computed from the query and key vectors through a dot-product.

In summary, the dot product between the query vectors and the key vectors produces similarity scores. These scores are then normalized into attention weights that determine how much each value vector contributes to the final output.

Formally, the self-attention module is expressed as follows:

    Attention(Q, K, V) = softmax(QK^T / √D_k) V = AV

where Q ∈ R^{N×D_k}, K ∈ R^{M×D_k}, and V ∈ R^{M×D_v} are the packed matrix representations of queries, keys, and values, respectively. N and M denote the lengths of queries and keys (or values), while D_k and D_v denote the dimensions of keys (or queries) and values, respectively. The dot products of queries and keys are divided by √D_k to alleviate the softmax function's gradient vanishing problem, control the magnitude of the dot products, and improve generalization. This is known as the scaled dot product. The result of the attention mechanism is given by the matrix multiplication of A and V. A is often called the attention matrix, and the softmax is applied row-wise.

In some cases, a mask can be applied to the relevance score matrix (the query-to-key dot product) to enforce specific dependency patterns between the tokens. For example, in text generation, as in ChatGPT, the self-attention module uses a mask that keeps the lower triangular part of the input tensor, with the elements above the diagonal set to zero. This helps capture the dependency of a token on all previous tokens and not on future ones, which is the pattern needed in text generation.

The algorithm of a self-attention module (with mask) is presented in Algorithm 1.

Algorithm 1 Self-Attention Module with Mask
Require: Q, K, and V matrices of dimensions n×d_k, m×d_k, and m×d_v, respectively
Ensure: Z matrix of dimension n×d_v
1: Step 1: Compute the scaled dot product of the Q and K matrices: A ← softmax(QK^T / √d_k)
2: Step 2: Apply the mask to the computed attention scores (if applicable):
3: if mask is not None then
4:   A ← A ⊙ mask {element-wise multiplication}
5: end if
6: Step 3: Compute the weighted sum of the V matrix using A as weights: Z ← AV
7: Step 4: Return the final output matrix Z.

Note that the algorithm takes in the matrices Q, K, and V, which are the query, key, and value matrices, respectively, and returns the output matrix Z. The softmax function is applied row-wise to the scaled dot product of the Q and K matrices, which is divided by the square root of the key dimension d_k. The mask is applied in Step 2 before computing the weighted sum in Step 3. The element-wise multiplication of the mask and the attention scores sets the attention scores corresponding to masked tokens to zero, ensuring that the model does not attend to those positions. The resulting matrix A is used as weights to compute the weighted sum of the V matrix, resulting in the output matrix Z.

4) Multi-Head Attention: Traditional NLP tasks, particularly in ChatGPT, deal with vast and complex data. Therefore, using only one attention head may not be sufficient to capture all relevant information in a sequence. Multi-head attention allows for parallel processing since the self-attention operation is applied across multiple heads. This can result in faster training and inference times than a single-head self-attention mechanism.

Multi-head attention also captures multiple relationships between the query and the key-value pairs in the input sequence, which enables the model to learn complex patterns and dependencies in the data. It also helps increase the model's capacity to learn advanced relationships over large data.

Formally, the multi-head attention function can be represented as follows:

    MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), and W_i^Q ∈ R^{d_model×d_k}, W_i^K ∈ R^{d_model×d_k}, W_i^V ∈ R^{d_model×d_v}, and W^O ∈ R^{h·d_v×d_model}.

5) Positional Embedding: If only the data without its order is considered, the self-attention mechanism used in transformers becomes permutation invariant: it processes all tokens equally without considering their positional information. This may result in the loss of important semantic information, as the importance of each token with respect to other tokens in the sequence is not captured. Therefore, it is necessary to leverage position information to capture the order and importance of tokens in the sequence.

To address the issue of losing important position information, the transformer model creates an encoding for each position in the sequence and adds it to the token embedding before passing it through the self-attention and feedforward layers. This allows the model to capture the importance of each token with respect to the others, considering its position in the sequence.

In the transformer architecture of ChatGPT, the positional embedding is added to the input embeddings at the entrance of the decoder. The positional embedding is expressed as follows. Given a sequence of inputs x_1, x_2, ..., x_n, the position embedding matrix E ∈ R^{n×d} is calculated as:

    Pos_{i,k} = sin(i / 10000^{2k/d}),  if k is even
    Pos_{i,k} = cos(i / 10000^{2k/d}),  if k is odd

This equation calculates the value of the position embedding Pos_{i,k} at position i and embedding dimension k, given the embedding size d. The value of k determines whether the sine or the cosine function is used.
rewards (could be negative depending on the action taken). The possible actions and rewards are usually defined using a

Using human feedback and neural-based reward function estimators, Eqs. 2–3 can be solved using advanced policy gradient algorithms to extract a policy that the dialogue agent will execute to follow user intentions and achieve the preset HHH goals [18]–[21].

[Table (GPT-3 configurations; header only): Model Name | No. of params | No. of layers | Embedding size | No. of heads | Head size | Batch size | Learning rate]

2) State-of-the-art related to RLHF-based dialogue systems: Over the last few years, three main players have emerged in the field of RLHF-based dialogue systems: OpenAI [14], DeepMind [22], and Anthropic [23]. In their attempt to teach dialogue systems to summarize texts, the OpenAI team [24] reported one of the earliest success stories of RLHF adoption in the field of natural language processing (NLP). In this work, varying-size sentences are summarized considering human preferences in terms of summary accuracy, coverage, and coherence [24]. Fig. 8 depicts the main differences between AI-based and human-based text summarization processes.

Fig. 8. Differences between AI-based and human-based text summarization processes.

By treating the summarization model as an RL agent, Stiennon et al. [24] devised a strategy consisting of policy

    loss(θ) = −(1 / C(K,2)) E_{(x, y_w, y_l) ∼ D}[log(σ(r_θ(x, y_w) − r_θ(x, y_l)))]    (4)

where r_θ(x, y_w) and r_θ(x, y_l) represent the outputs of the reward module for the input text x with the preferred and the less-preferred summarizations by the human labeler, respectively. The set of summarizations labeled by a human is denoted by D. It is worth noting that the OpenAI team has treated each of the C(K,2) possible comparisons as a separate training sample that will contribute to K − 1 different gradient updates [14].

Fig. 10. Reward module training.
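As a small numerical sketch of the pairwise loss in Eq. (4), the snippet below computes −(1/C(K,2)) · mean[log σ(r_w − r_l)] over a set of hypothetical reward scores. The arrays stand in for the reward module's outputs r_θ(x, y_w) and r_θ(x, y_l); in practice these come from a trained reward network, not hand-picked numbers.

```python
import numpy as np
from math import comb

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reward_loss(r_w, r_l, K):
    """Eq. (4): -1/C(K,2) * E[log(sigmoid(r_w - r_l))] over labeled pairs."""
    return -np.mean(np.log(sigmoid(r_w - r_l))) / comb(K, 2)

# Hypothetical reward-model outputs for the C(4,2) = 6 pairs formed from
# K = 4 ranked summaries of the same input text.
r_w = np.array([2.0, 1.5, 0.8, 2.2, 1.1, 0.4])   # preferred summarization scores
r_l = np.array([0.5, 0.2, 0.6, 1.0, 0.9, 0.1])   # less-preferred scores
loss = reward_loss(r_w, r_l, K=4)
```

The loss shrinks as the margin r_w − r_l grows, which is exactly the pressure that pushes the reward model to score human-preferred summaries higher than the rejected ones.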
C. Jasper Chat

Jasper has been a prominent player in the AI content generation industry and has garnered considerable acceptance from its users. Apart from its content creation capabilities and other offerings, Jasper has recently introduced a chatbot named Jasper Chat. This alternative to ChatGPT is based on GPT-3.5 and other language models, and Jasper has partnered with OpenAI [27]. However, unlike ChatGPT, which is accessible to anyone, this chatbot has been specifically developed for businesses in advertising, marketing, and other domains.

H. Neeva AI

Neeva is an AI-powered search engine that provides users with a more convenient and user-friendly approach to web searches. It allows users to search for various topics, including recipes, gift ideas, etc., and provides answers without requiring users to navigate multiple search results. Unlike traditional search engines, Neeva delivers a single, synthesized answer summarizing the most relevant sites to a query [?]. Additionally, Neeva embeds references and citations directly in the answer, making it easy for users to verify the credibility and reliability of the sources cited.
along with the limitations, risks, and societal implications of large language models. The paper aims to provide a reference point for evaluating the progress of chatbots like ChatGPT over time and to assist researchers and developers in enhancing future language models and chatbots. However, it lacks an in-depth analysis of the problems associated with ChatGPT. It does not address the underlying issues that cause the failures or provide any recommendations for improving the system.

The article [41] explored the performance of ChatGPT on aspect-based and query-based text summarization tasks. The study evaluated ChatGPT's performance on four benchmark datasets, encompassing various summaries from Reddit posts, news articles, dialogue meetings, and stories. The authors reported that ChatGPT's performance is comparable to traditional fine-tuning methods regarding ROUGE scores. However, the study relies solely on ROUGE scores to evaluate the performance of ChatGPT, which may not be a sufficient indicator of summarization quality. The authors acknowledged this limitation and planned to conduct human evaluations shortly.

Jiao et al. [42] provided a preliminary study on the performance of ChatGPT for machine translation tasks, including translation prompt, multilingual translation, and translation robustness. The authors explored the effectiveness of different prompts, evaluated the performance of ChatGPT on different language pairs, and investigated its translation robustness. In addition, the authors proposed an interesting strategy named pivot prompting that significantly improves translation performance for distant languages. However, the study results may not be fully reliable due to the randomness in the evaluation process. The paper also does not provide a detailed analysis of the factors affecting the translation performance of ChatGPT.

Kocon et al. [43] evaluated the capabilities of ChatGPT on 25 diverse analytical NLP tasks, most of which are subjective in nature, including sentiment analysis, emotion recognition, offensiveness and stance detection, natural language inference, word sense disambiguation, linguistic acceptability, and question answering. The authors automated ChatGPT's querying process and analyzed more than 38k responses, comparing its results with state-of-the-art solutions. However, the study does not directly compare ChatGPT's performance with other chatbot models. The authors also acknowledge ChatGPT's biases but didn't provide a detailed analysis of these biases. Additionally, the study's focus on analytical NLP tasks may not fully reflect ChatGPT's performance in more practical settings, such as chatbot interactions.

The aforementioned discussion related to the applications of ChatGPT in NLP is summarized in Table II.

B. Healthcare

ChatGPT has the potential to revolutionize the healthcare industry by improving patient outcomes, reducing costs, and facilitating more efficient and accurate diagnosis and treatment.

Mann et al. [44] examined AI's potential role in translational medicine. It highlighted the need for further research into personalized medicine and the potential for AI-driven data analysis to be used in clinical decision-making. Furthermore, it discussed the ethical implications of AI-enabled personalized medicine and how to best utilize advanced AI technologies while minimizing risk. However, this article failed to provide adequate information about the potential risks of using AI in medical applications. Additionally, it didn't provide any concrete evidence or examples of successful applications of AI in medical contexts.

Antaki et al. [45] conducted a study to evaluate the performance of ChatGPT in ophthalmology. The outcomes indicated that ChatGPT could provide a low-cost, accurate, personalized solution for ophthalmology consultations. Furthermore, ChatGPT can accurately detect ophthalmic diseases and create treatment plans consistent with guidelines. However, this article only evaluated the performance of ChatGPT in ophthalmology and did not consider its potential applications in other medical specialties. Additionally, it didn't provide any concrete recommendations on the performance improvement of ChatGPT.

Jeblick et al. [3] used a ChatGPT-based natural language processing system to simplify radiology reports. The study found that the system can achieve high accuracy in terms of both content understanding and grammatical correctness. Furthermore, with improved readability and less medical jargon, the system could generate reports that are easier for laypeople to understand. However, this study has a few shortcomings. First, the authors only used a small sample size and did not comprehensively analyze the data collected. Second, this study did not assess the accuracy of the simplified reports produced by ChatGPT. Third, it didn't provide any information on the potential impact of this technology on medical care or patient outcomes.

The article [46] focuses on the potential of AI-assisted medical education using large language models. The authors test the performance of ChatGPT on the USMLE, a comprehensive medical board examination, and an internal medical knowledge base. The results show that ChatGPT outperforms current state-of-the-art models in both tasks. The authors concluded that large language models have great potential for use in AI-assisted medical education and could help drive personalized learning and assessment. This article has a few shortcomings. First, the sample size used in the study was small, as only seven medical students completed the study. This means that the results may not be representative of a larger population. The article does not discuss the ethical implications of using AI-assisted technology in medical education. Finally, the article provides no evidence that AI-assisted medical education is superior to traditional methods.

Dahmen et al. [47] analyzed the potential of ChatGPT as an AI-based tool to assist medical research. It evaluated the opportunities and challenges that ChatGPT brings, such as its ability to streamline research by eliminating manual labor and providing real-time access to the latest data, as well as its potential to introduce bias and reduce accuracy. It also provided recommendations on how to best utilize and regulate the use of ChatGPT in medical research. However, the main shortcoming of this article is that it relies heavily on one study, which is insufficient to establish definitive conclusions about the potential of the AI bot ChatGPT in medical research. Additionally, the article does not provide enough detail about the methods and results of the study, which limits the ability
TABLE I
A COMPARATIVE ANALYSIS OF CHATGPT-RELATED WORKS IN NLP

[35]
• Contributions: Analyzed and evaluated the performance of ChatGPT in three tasks; identified areas for improvement and potential applications
• Strengths: A novel evaluation framework for conversational agents
• Limitations: Insufficient evidence for the proposed evaluation methods; lack of details on training data and evaluation metrics

[39]
• Contributions: Explored linguistic ambiguity in ChatGPT and its relevance to NLP
• Strengths: Disambiguation tasks through context-based analysis; valuable insights for using transformer-based language models in NLP systems
• Limitations: Did not address linguistic features or resource limitations; did not propose new solutions for addressing linguistic ambiguity

[40]
• Contributions: Analyzes ChatGPT's limitations and risks
• Strengths: Provides relatable Twitter examples as a reference for evaluating chatbot progress
• Limitations: Lacks recommendations for improvement and supporting evidence

[41]
• Contributions: ChatGPT's performance on aspect-based and query-based summarization tasks
• Strengths: Identifies gaps in research
• Limitations: Relies solely on ROUGE scores for evaluation

[42]
• Contributions: Investigates ChatGPT's machine translation performance
• Strengths: Discusses insights, strategies, and limitations in evaluating ChatGPT's translation performance
• Limitations: Limited test data coverage; lack of detailed analysis of translation performance

[43]
• Contributions: Evaluates ChatGPT's performance on 25 diverse NLP tasks; analyzes ChatGPT's results against state-of-the-art solutions
• Strengths: Highlights ChatGPT's strengths and weaknesses; discusses ethical implications; automates the querying process for increased efficiency
• Limitations: Lacks direct comparison with other chatbot models; acknowledges biases without a detailed analysis

[38]
• Contributions: ChatGPT's zero-shot learning performance compared with GPT-3.5 on various NLP tasks
• Strengths: Empirical studies; qualitative case studies; zero-shot performance comparisons
• Limitations: Lacks analysis of fine-tuning and ethical considerations
to draw meaningful conclusions.

King et al. [48] presented a perspective on the future of artificial intelligence (AI) in medicine. It discusses the potential of AI to improve the accuracy and efficiency of medical diagnostics, reduce costs, and improve access to medical care in parts of the world without adequate healthcare infrastructure. The authors also explored the ethical considerations of using AI in medicine, such as privacy, data security, and regulatory compliance. However, the article does not provide an in-depth analysis of the ethical implications of using AI in health care or the potential risks associated with its use. Additionally, the article does not discuss how AI can be used to improve patient care or how it could be used to develop new medical treatments.

Bhattacharya et al. [49] evaluated the performance of a novel deep learning and ChatGPT-based model for surgical practice. The proposed model can understand natural language inputs such as queries and generate relevant answers. The outcomes demonstrated that ChatGPT outperformed baseline models in accuracy and time spent. Additionally, the study discussed the strong potential of ChatGPT for future applications in surgical practice. However, this article lacks empirical evidence to support its claims. Additionally, the study was limited in scope as it was conducted in one hospital and did not include a control group. Furthermore, the study did not consider the long-term effects of ChatGPT on patient satisfaction or outcomes, leaving these questions unanswered.

The editorial [50] explored the potential impact of the ChatGPT language model on medical education and research. It discusses the various ways ChatGPT can be used in the medical field, including providing medical information and assistance, assisting with medical writing, and helping with medical education and clinical decision-making. However, the editorial didn't provide new research data or empirical evidence to support its claims. Instead, it primarily relies on anecdotal evidence and expert opinions, which may not represent the wider medical community's views. This case
study in [51] explored the potential of generative AI in improving the translation of environmental health research to non-academic audiences. The study submitted five recently published environmental health papers to ChatGPT to generate summaries at different readability levels. It evaluated the quality of the generated summaries using a combination of Likert-scale, yes/no, and free-text responses to assess scientific accuracy, completeness, and readability at an 8th-grade level. However, this study is limited in scope, as it only evaluates the quality of generated summaries from five recently published environmental health papers. The study acknowledges the need for continuous improvement in generative AI technology but does not provide specific recommendations for improvement. Wang et al. [53] investigated the effectiveness of using the ChatGPT model for generating Boolean queries for systematic review literature searches. The study compared ChatGPT with state-of-the-art methods for query generation and analyzed the impact of prompts on the effectiveness of the queries produced by ChatGPT. However, the model's MeSH term handling is poor, which may impact the recall of the generated queries. In addition, the study is limited to standard test collections for systematic reviews, and the findings may not be generalizable to other domains. Kurian et al. [52] highlighted the role of ChatGPT and its potential to revolutionize communication. It also sheds light on the issue of HPV vaccination and the role of oral health care professionals in promoting it. In addition, the article suggested that training and education tools can improve healthcare providers' willingness and ability to recommend the vaccine. However, the article doesn't provide in-depth analysis or research on either topic and relies heavily on information provided by external sources. It also does not address potential concerns or criticisms of using AI chatbots or the HPV vaccine.

The aforementioned discussion related to the applications of ChatGPT in healthcare is summarized in Table II.

C. Ethics

ChatGPT has been widely discussed in the field of ethics due to its potential impact on society.

Graf et al. [54] discussed the implications of using ChatGPT in research. It emphasized the importance of responsible research that adheres to ethical, transparent, and evidence-based standards. It also highlighted the potential to use ChatGPT for more specific and in-depth research, such as understanding the impact of implicit biases and the potential risks associated with its use. The major shortcoming of this article is a lack of discussion regarding the ethical implications of using ChatGPT in research. Additionally, it does not address the potential risks associated with ChatGPT, and there is limited discussion of the potential benefits of using the technology. Hacker et al. [55] presented a brief discussion of various issues in regulating large generative AI models, such as ChatGPT. The authors explored the legal, economic, and ethical considerations of regulating the use of such models and proposed a regulatory framework for the safe and responsible use of ChatGPT. Finally, they concluded that a strong regulatory framework is necessary to ensure that ChatGPT is used to benefit society and maintain public safety. However, this article didn't consider the potential impacts of using large generative AI models on privacy, data security, data ownership, and other ethical considerations. Additionally, it didn't provide concrete proposals or recommendations to effectively regulate large generative AI models.

Khalil et al. [56] explored the potential of using a GPT-based natural language processing (NLP) model for detecting plagiarism. The authors presented the ChatGPT model, trained on a large corpus of text to generate contextualized, paraphrased sentences. The experimental outcomes report that the model could identify plagiarism with an accuracy of 88.3% and detect previously unseen forms of plagiarism with an accuracy of 76.2%. However, the article does not offer an alternate solution to the problem of plagiarism detection. Additionally, there is not enough discussion around the ethical implications of using ChatGPT to detect plagiarism. Zhuo et al. [57] presented a comprehensive exploration and catalog of ethical issues in ChatGPT. The authors analyzed the model from four perspectives: bias, reliability, robustness, and toxicity. In addition, they benchmarked ChatGPT empirically on multiple datasets and identified several ethical risks that existing benchmarks cannot address. The paper also examined the implications of the findings for the AI ethics of ChatGPT and future practical design considerations for LLMs. However, the article does not provide a quantitative analysis of the identified ethical issues in ChatGPT, which precludes a more nuanced understanding of the risks posed by the model.

The aforementioned discussion related to the applications of ChatGPT in ethics is summarized in Table III.

D. Education

ChatGPT has the potential to enhance the quality of education by providing personalized, student-centered learning experiences that can help improve learning outcomes and promote student success.

Frieder et al. [4] discussed the capabilities of ChatGPT in education. The authors explored three key areas: first, the ability of ChatGPT to generate mathematically valid natural language statements; second, its ability to answer mathematics-related queries; and third, its ability to solve math problems. The experimental outcomes indicate that ChatGPT outperformed other models in generating mathematically valid statements and answering math questions. However, this article is largely theoretical and does not provide empirical evidence to prove the claims. Additionally, the authors do not provide any evaluation metrics or performance results to demonstrate the effectiveness of the proposed approach. Chen et al. [58] analyzed the potential impact of ChatGPT on library reference services. The article suggested that ChatGPT could help to reduce workloads, improve response times, provide accurate and comprehensive answers, and offer a way to answer complex questions. Additionally, it argued that ChatGPT could enhance the user experience and transform library reference services. However, the article does not offer sufficient details about the potential advantages and disadvantages of using ChatGPT. Furthermore, the article does not provide any information about the actual implementation of ChatGPT in library
TABLE II
A Comparative Analysis of ChatGPT-related Works in Healthcare

[3]
Key points: ChatGPT-based NLP systems can accurately simplify radiology reports.
Contributions: ChatGPT-based chatbot to simplify radiology reports for better understanding; provides a case study of existing methods and limitations.
Limitations: small sample size; lack of accuracy assessment; no information on potential impact.

[46]
Key points: ChatGPT in AI-assisted medical education; discusses ethical concerns related to its implementation.
Contributions: ChatGPT's potential as a more accurate and faster AI model.
Limitations: small sample size; lacks evidence of the superiority of AI-assisted medical education over traditional methods.

[51]
Key points: ChatGPT used to summarize environmental health research.
Contributions: ChatGPT improves research translation for a wider audience; ChatGPT summaries aid communication.
Limitations: does not provide improvement recommendations; ignores ethical concerns.

[53]
Key points: examines ChatGPT's query generation effectiveness; compares with state-of-the-art methods.
Contributions: effective generation of Boolean queries for systematic reviews using ChatGPT.
Limitations: the model's MeSH term handling may affect recall.
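The readability criterion used in [51] (summaries targeted at an 8th-grade level, Table II) can be made concrete with the standard Flesch-Kincaid grade-level formula. The sketch below is purely illustrative and is not the evaluation code used in that study; the syllable counter is a naive vowel-group approximation, and the 8.0 target is the 8th-grade threshold described in the text.

```python
import re

def count_syllables(word: str) -> int:
    # Naive approximation: count groups of consecutive vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Standard Flesch-Kincaid grade-level formula:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n_sent = max(1, len(sentences))
    n_words = max(1, len(words))
    return 0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59

def meets_grade_target(text: str, target: float = 8.0) -> bool:
    # True if the text scores at or below the (assumed) 8th-grade target.
    return flesch_kincaid_grade(text) <= target
```

Short, plain sentences score well below the target, while dense academic phrasing scores far above it, which is the gap the summaries in [51] were meant to close.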
TABLE III
A Comparative Analysis of ChatGPT-related Works in Ethics

[56]
Key points: detecting plagiarism better than existing methods, with 88.3% accuracy.
Contributions: ChatGPT as a plagiarism detection tool.
Limitations: insufficient evidence; inadequate ethical discussion regarding ChatGPT's plagiarism detection.

[57]
Key points: analyzes ChatGPT biases, reliability, robustness, and toxicity; identifies ChatGPT ethical risks; examines implications for AI ethics and future LLM design.
Contributions: identifies new ethical risks; fills gaps in previous research.
Limitations: no quantitative analysis; limited examination of other LLMs.
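The plagiarism-detection task evaluated for [56] (Table III) ultimately reduces to scoring how similar a candidate text is to a source. A minimal bag-of-words cosine-similarity check is sketched below as context only; it is not the GPT-based method of [56], and the 0.8 flagging threshold is an assumption for illustration.

```python
import math
import re
from collections import Counter

def term_freqs(text: str) -> Counter:
    # Lowercased word counts serve as a simple sparse document vector.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def flag_plagiarism(candidate: str, source: str, threshold: float = 0.8) -> bool:
    # Threshold is illustrative, not a value reported in [56].
    return cosine_similarity(term_freqs(candidate), term_freqs(source)) >= threshold
```

A bag-of-words score like this catches verbatim reuse but misses paraphrase, which is exactly the gap that motivates the contextualized, paraphrase-aware model described in [56].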
reference services. Susnjak et al. [59] evaluated the ability of ChatGPT to perform high-level cognitive tasks and produce text indistinguishable from human-generated text. The study shows that ChatGPT can exhibit critical thinking skills and generate highly realistic text with minimal input, making it a potential threat to the integrity of online exams. The study also discussed the challenges online exams face in maintaining academic integrity, the various strategies institutions use to mitigate the risk of academic misconduct, and the potential ethical concerns surrounding proctoring software. However, the study mainly focuses on the potential threat of ChatGPT to online exams and does not provide an in-depth analysis of other potential risks associated with online exams.

Tlili et al. [60] presented a case study of ChatGPT in the context of education. The study examined the public discourse surrounding ChatGPT, its potential impact on education, and users' experiences in educational scenarios. It also identified various issues, including cheating, the honesty and truthfulness of ChatGPT, privacy, misleading content, and manipulation. However, this study only examines the experiences of a small number of users in educational scenarios, and the findings may not be generalizable to other contexts. In addition, it does not provide a comprehensive analysis of the ethical implications of using ChatGPT in education, which could be an area for future research. The editorial in [61] explored the potential use of an AI chatbot, ChatGPT, in scientific writing. It highlighted the
ability of ChatGPT to assist in organizing material, generating an initial draft, and proofreading. The paper discussed the limitations and ethical concerns of using ChatGPT in scientific writing, such as plagiarism and inaccuracies. However, the article does not provide empirical evidence or case studies to support its claims. The potential benefits and limitations of using ChatGPT in scientific writing are discussed theoretically, but there is no practical demonstration of how this tool can assist scientific writing. Therefore, the paper lacks practical examples that can help readers better understand the potential of ChatGPT in scientific writing. King et al. [62] briefly discussed the history and evolution of AI and chatbot technology. It explored the growing concern of plagiarism in higher education and the potential for chatbots like ChatGPT to be used for cheating. Additionally, it provides suggestions for ways college professors can design assignments to minimize potential cheating via chatbots. However, it doesn't delve into the potential benefits of using chatbots in higher education.

The aforementioned discussion related to the applications of ChatGPT in education is summarized in Table IV.

E. Industry

ChatGPT has numerous potential applications in industry, particularly in the customer service and marketing areas. It can improve efficiency, increase customer satisfaction, and provide valuable insights to companies across various industries.

Prieto et al. [63] investigated the potential of using ChatGPT to automate the scheduling of construction projects. Data mining techniques were used to extract scheduling information from project specifications and histories. The results showed that the algorithm could accurately predict the project timeline and duration with an average accuracy of 81.3%. However, the article lacks detailed information regarding the implementation of the ChatGPT interface and its effects on the scheduling process, as well as reliable data to support the authors' claims that the scheduling process is improved by incorporating ChatGPT. Graf et al. [64] discussed the applications of ChatGPT in the finance industry. First, it looked at the potential of using machine learning to facilitate the analysis of financial data, as well as its potential applications in finance. Next, it presented the "Bananarama Conjecture", which suggests that ChatGPT can provide greater insights into financial research and data analysis than conventional methods. However, the authors limit their scope to the Bananarama Conjecture, a relatively narrow field that may not capture the breadth of the finance industry. Furthermore, they didn't provide a detailed discussion of their findings' implications, making it difficult to draw conclusions from the research.

Gozalo et al. [65] provided a concise taxonomy of recent large generative models of artificial intelligence, their sectors of application, and their implications for industry and society. The article described how these models generate novel content and differ from predictive machine learning systems. The study also highlights the limitations of these models, such as the need for enormous datasets and the difficulty in finding data for some models. However, the paper does not provide a detailed technical explanation of the models or their architecture, making it less suitable for readers interested in generative AI's technical aspects. Additionally, the paper does not critically analyze generative AI models' ethical and social implications. The case study [66] examined the use of ChatGPT in a human-centered design process. The study aims to explore the various roles of fully conversational agents in the design process and understand the emergent roles these agents can play in human-AI collaboration. ChatGPT is used to simulate interviews with fictional users, generate design ideas, simulate usage scenarios, and evaluate user experience for a hypothetical design project on designing a voice assistant for the health and well-being of people working from home. However, this study is limited by its focus on a single hypothetical design project. As a result, the generalizability of the findings to other design contexts may be limited. The study also relies on subjective evaluations of ChatGPT's performance, which may be influenced by the authors' biases or expectations.

The aforementioned discussion related to the applications of ChatGPT in industry is summarized in Table V.

V. CHALLENGES AND FUTURE DIRECTIONS

A. Challenges

• Data Privacy and Ethics: The challenge of data privacy and ethics for ChatGPT is complex and multifaceted. One aspect of this challenge is related to data privacy, which involves protecting personal information collected by ChatGPT. ChatGPT relies on vast amounts of data to train its language model, which often includes sensitive user information, such as chat logs and personal details. Therefore, ensuring that user data is kept private and secure is essential to maintain user trust in the technology [67], [68].
Another aspect of the data privacy and ethics challenge for ChatGPT is related to ethical considerations. ChatGPT has the potential to be used in various applications, including social media, online communication, and customer service. However, the technology's capabilities also pose ethical concerns, particularly in areas such as spreading false information and manipulating individuals [69], [70]. The potential for ChatGPT to be used maliciously highlights the need for ethical considerations in its development and deployment.
To address these challenges, researchers and developers need to implement robust data privacy and security measures in the design and development of ChatGPT. This includes encryption, data anonymization, and access control mechanisms. Additionally, ethical considerations should be integrated into the development process, such as developing guidelines for appropriate use and ensuring transparency in technology deployment. By taking these steps, the development and deployment of ChatGPT can proceed ethically and responsibly, safeguarding users' privacy and security while promoting its positive impact.
• Bias and Fairness: Bias and fairness are critical issues related to developing and deploying chatbot systems like ChatGPT. Bias refers to the systematic and unfair treatment of individuals or groups based on their personal characteristics, such as race, gender, or religion.
TABLE IV
A Comparative Analysis of ChatGPT-related Works in Education

[58]
Key points: improve and transform library reference services.
Contributions: ChatGPT impact analysis; benefits and implications discussion.
Limitations: lacks details on ChatGPT benefits, implementation, and ethics.

[59]
Key points: ChatGPT's potential threat to online exam integrity; strategies for maintaining academic integrity.
Contributions: ChatGPT's threat to online exam integrity; emphasizes the need for more research on this issue.
Limitations: lack of thorough analysis of the potential threat to online exams.

[60]
Key points: ChatGPT issues in education.
Contributions: insights into ChatGPT's use in education.
Limitations: small sample size; limited analysis of ethics.
TABLE V
A Comparative Analysis of ChatGPT-related Works in Industry

[63]
Key points: ChatGPT for construction project scheduling.
Contributions: data mining solutions; time-saving solutions.
Limitations: incomplete information on ChatGPT implementation; lack of reliable data; no details on application to other projects.

[66]
Key points: ChatGPT's potential and limitations in design process exploration.
Contributions: insights into benefits and limitations in the design process.
Limitations: generalizability limitations; subjective evaluations; lack of analysis of ethical implications.
In chatbots, bias can occur in several ways [71], [72]. For example, biased language models can lead to biased responses that perpetuate stereotypes or discriminate against certain groups. Biased training data can also result in a chatbot system that provides inaccurate or incomplete information to users.
Fairness, on the other hand, relates to treating all users equally without discrimination. Chatbots like ChatGPT must be developed and deployed in a way that promotes fairness and prevents discrimination. For instance, a chatbot must provide equal access to information or services, regardless of the user's background or personal characteristics.
To address bias and fairness concerns, developers of chatbots like ChatGPT must use unbiased training data and language models. Additionally, the chatbot system must be regularly monitored and audited to identify and address any potential biases. Fairness can be promoted by ensuring that the chatbot system provides equal access to information or services and does not discriminate against any particular group. The development and deployment of chatbots must be done with a clear understanding of the ethical considerations and a commitment to uphold principles of fairness and non-discrimination.
• Robustness and Explainability: Robustness and explainability are two critical challenges that must be addressed when deploying ChatGPT in real-world applications [7].
Robustness refers to the ability of ChatGPT to maintain high performance even when faced with unexpected inputs or perturbations. In other words, robustness ensures that the model's predictions are reliable and consistent across different contexts. For example, if ChatGPT is used to generate responses in a chatbot, it must be robust to diverse language styles, accents, and topics to provide accurate and relevant responses.
However, achieving robustness is challenging, as the model's performance can be affected by various factors such as data quality, model architecture, training algorithms, and hyperparameters. In addition, adversarial
attacks can also compromise the robustness of ChatGPT. Adversarial attacks refer to deliberately manipulating inputs to mislead the model's predictions. Therefore, robustness is a crucial challenge that must be addressed to ensure the reliability and trustworthiness of ChatGPT.
On the other hand, explainability refers to the ability of ChatGPT to provide transparent and interpretable explanations for its predictions. Explainability is crucial for building trust and accountability, especially in critical applications such as healthcare and finance. For example, suppose ChatGPT is used to diagnose a medical condition. In that case, it must be able to provide transparent and interpretable explanations for its diagnosis to ensure that healthcare professionals and patients can understand and trust its decisions.
However, explainability is also challenging, as deep learning models such as ChatGPT are often seen as black boxes, making it difficult to understand how they arrive at their decisions. Recent advances in explainable AI (XAI) have proposed techniques such as attention mechanisms and saliency maps to provide interpretable explanations for deep learning models. Therefore, explainability is a critical challenge that must be addressed to ensure the transparency and accountability of ChatGPT.

B. Future Directions

• Multilingual Language Processing: Multilingual language processing is a crucial area for future work related to ChatGPT [35]. Despite ChatGPT's impressive performance in English language processing, its effectiveness in multilingual contexts is still an area of exploration. To address this, researchers may explore ways to develop and fine-tune ChatGPT models for different languages and domains and investigate cross-lingual transfer learning techniques to improve the generalization ability of ChatGPT. Additionally, future work in multilingual language processing for ChatGPT may focus on developing multilingual conversational agents that can communicate with users in different languages. This may involve addressing challenges such as code-switching, where users may switch between languages within a single conversation. Furthermore, research in multilingual language processing for ChatGPT may also investigate ways to improve the model's handling of low-resource languages, which may have limited training data available.
• Low-Resource Language Processing: One of the future works for ChatGPT is to extend its capabilities to low-resource language processing. This is particularly important, as a significant portion of the world's population speaks low-resource languages with limited amounts of labeled data for training machine learning models. Therefore, developing ChatGPT models that can effectively process low-resource languages could have significant implications for enabling communication and access to information in these communities.
To achieve this goal, researchers can explore several approaches. One possible solution is to develop pretraining techniques that can learn from limited amounts of data, such as transfer learning, domain adaptation, and cross-lingual learning. Another approach is to develop new data augmentation techniques that can generate synthetic data to supplement the limited labeled data available for low-resource languages. Additionally, researchers can investigate new evaluation metrics and benchmarks that are specific to low-resource languages.
Developing ChatGPT models that can effectively process low-resource languages is a crucial area of research for the future. It has the potential to enable access to information and communication for communities that have been historically marginalized due to language barriers.
• Domain-Specific Language Processing: Domain-specific language processing refers to developing and applying language models trained on text data from specific domains or industries, such as healthcare, finance, or law. ChatGPT, with its remarkable capabilities in natural language processing, has the potential to be applied to various domains and industries to improve communication, decision-making, and automation.
One potential future direction for ChatGPT is to develop domain-specific language models that can understand and generate text specific to a particular domain or industry. This would involve training the model on large amounts of domain-specific text data and fine-tuning the model to the specific language and terminology used in that domain.
Another future direction is to develop ChatGPT models that can transfer knowledge from one domain to another, allowing for more efficient training and adaptation of language models. This involves developing transfer learning techniques enabling the model to generalize from one domain to another while preserving the domain-specific language and context. Domain-specific language models could be used to address specific challenges in different disciplines. For example, ChatGPT could be used in healthcare to develop models for medical diagnosis, drug discovery, or patient monitoring. In finance, it could be used to develop fraud detection, investment analysis, or risk management models. These applications require a deep understanding of domain-specific language and context, making ChatGPT an ideal tool for tackling these challenges.

VI. CONCLUSION

In conclusion, this survey presents a critical review of ChatGPT, its technical advancements, and its standing within the realm of conversational and generative AI. We have demystified the factors that contribute to ChatGPT's exceptional performance and capabilities by thoroughly analyzing its innovations, establishing a taxonomy of recent research, and conducting a comparative analysis of its competitors. Moreover, we have identified and discussed the challenges and limitations of ChatGPT, emphasizing areas of improvement and unexplored research opportunities.
We believe this survey lays the groundwork for a deeper understanding of the trending ChatGPT in generative AI and will
serve as a valuable reference for researchers and practitioners seeking to harness the power of ChatGPT in their applications or address its gaps as part of ongoing development.

REFERENCES

[1] A. S. George and A. H. George, "A review of ChatGPT AI's impact on several business sectors," Partners Universal International Innovation Journal, vol. 1, no. 1, pp. 9-23, 2023.
[2] M. Verma, "Integration of AI-based chatbot (ChatGPT) and supply chain management solution to enhance tracking and queries response."
[3] K. Jeblick, B. Schachtner, J. Dexl, A. Mittermeier, A. T. Stüber, J. Topalis, T. Weber, P. Wesp, B. Sabel, J. Ricke et al., "ChatGPT makes medicine easy to swallow: An exploratory case study on simplified radiology reports," arXiv preprint arXiv:2212.14882, 2022.
[4] S. Frieder, L. Pinchetti, R.-R. Griffiths, T. Salvatori, T. Lukasiewicz, P. C. Petersen, A. Chevalier, and J. Berner, "Mathematical capabilities of ChatGPT," arXiv preprint arXiv:2301.13867, 2023.
[5] S. Shahriar and K. Hayawi, "Let's have a chat! A conversation with ChatGPT: Technology, applications, and limitations," arXiv preprint arXiv:2302.13817, 2023.
[6] A. Lecler, L. Duron, and P. Soyer, "Revolutionizing radiology with GPT-based models: Current applications, future possibilities and limitations of ChatGPT," Diagnostic and Interventional Imaging, 2023.
[7] R. Omar, O. Mangukiya, P. Kalnis, and E. Mansour, "ChatGPT versus traditional question answering for knowledge graphs: Current status and future directions towards knowledge graph chatbots," arXiv preprint arXiv:2302.06466, 2023.
[8] A. Haleem, M. Javaid, and R. P. Singh, "An era of ChatGPT as a significant futuristic support tool: A study on features, abilities, and challenges," BenchCouncil Transactions on Benchmarks, Standards and Evaluations, p. 100089, 2023.
[9] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language models are few-shot learners," CoRR, vol. abs/2005.14165, 2020.
[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
[11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[12] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training," 2018. [Online]. Available: https://s3-us-west-2.amazonaws.com/openai-assets/researchcovers/languageunsupervised/language_understanding_paper.pdf
[13] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, "Deep reinforcement learning from human preferences," Advances in Neural Information Processing Systems, vol. 30, 2017.
[14] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe, "Training language models to follow instructions with human feedback," 2022.
[15] H. H. Thorp, "ChatGPT is fun, but not an author," Science, vol. 379, no. 6630, pp. 313-313, 2023.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[17] J. Ho and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems, vol. 29, 2016.
[18] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in International Conference on Machine Learning, 2015, pp. 1889-1897.
[20] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[21] Y. Wang, H. He, and X. Tan, "Truly proximal policy optimization," in Uncertainty in Artificial Intelligence. PMLR, 2020, pp. 113-122.
[22] A. Glaese, N. McAleese, M. Trębacz, J. Aslanides, V. Firoiu, T. Ewalds, M. Rauh, L. Weidinger, M. Chadwick, P. Thacker et al., "Improving alignment of dialogue agents via targeted human judgements," arXiv preprint arXiv:2209.14375, 2022.
[23] Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan et al., "Training a helpful and harmless assistant with reinforcement learning from human feedback," arXiv preprint arXiv:2204.05862, 2022.
[24] N. Stiennon, L. Ouyang, J. Wu, D. Ziegler, R. Lowe, C. Voss, A. Radford, D. Amodei, and P. F. Christiano, "Learning to summarize with human feedback," Advances in Neural Information Processing Systems, vol. 33, pp. 3008-3021, 2020.
[25] V. Malhotra, "Google introduces Bard AI as a rival to ChatGPT," 2023, https://beebom.com/google-bard-chatgpt-rival-introduced/.
[26] S. Garg, "What is Chatsonic, and how to use it?" 2023, https://writesonic.com/blog/what-is-chatsonic/.
[27] Jasper, "Jasper Chat - AI chat assistant," 2023, https://www.jasper.ai/.
[28] OpenAI, "Welcome to OpenAI," 2023, https://platform.openai.com/overview.
[29] Caktus AI, "Caktus - open writing with AI," 2023, https://www.caktus.ai/caktus_student/.
[30] E. Kuyda, "What is Replika?" 2023, https://help.replika.com/hc/en-us/articles/115001070951-What-is-Replika-.
[31] Chai Research, "Chai GPT - AI more human, less filters," 2023, https://www.chai-research.com/.
[32] Rytr, "Rytr - take your writing assistant," 2023, https://rytr.me/.
[33] Peppertype.ai, "Peppertype.ai - your virtual content assistant," 2023, https://www.peppertype.ai/.
[34] B. Guo, X. Zhang, Z. Wang, M. Jiang, J. Nie, Y. Ding, J. Yue, and Y. Wu, "How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection," arXiv preprint arXiv:2301.07597, 2023.
[35] Y. Bang, S. Cahyawijaya, N. Lee, W. Dai, D. Su, B. Wilie, H. Lovenia, Z. Ji, T. Yu, W. Chung et al., "A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity," arXiv preprint arXiv:2302.04023, 2023.
[36] N. Muennighoff, "SGPT: GPT sentence embeddings for semantic search," arXiv preprint arXiv:2202.08904, 2022.
[37] C. Zhou, Q. Li, C. Li, J. Yu, Y. Liu, G. Wang, K. Zhang, C. Ji, Q. Yan, L. He et al., "A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT," arXiv preprint arXiv:2302.09419, 2023.
[38] C. Qin, A. Zhang, Z. Zhang, J. Chen, M. Yasunaga, and D. Yang, "Is ChatGPT a general-purpose natural language processing task solver?" arXiv preprint arXiv:2302.06476, 2023.
[39] M. Ortega-Martín, Ó. García-Sierra, A. Ardoiz, J. Álvarez, J. C. Armenteros, and A. Alonso, "Linguistic ambiguity analysis in ChatGPT," arXiv preprint arXiv:2302.06426, 2023.
[40] A. Borji, "A categorical archive of ChatGPT failures," arXiv preprint arXiv:2302.03494, 2023.
[41] X. Yang, Y. Li, X. Zhang, H. Chen, and W. Cheng, "Exploring the limits of ChatGPT for query or aspect-based text summarization," arXiv preprint arXiv:2302.08081, 2023.
[42] W. Jiao, W. Wang, J.-t. Huang, X. Wang, and Z. Tu, "Is ChatGPT a good translator? A preliminary study," arXiv preprint arXiv:2301.08745, 2023.
[43] J. Kocoń, I. Cichecki, O. Kaszyca, M. Kochanek, D. Szydło, J. Baran, J. Bielaniewicz, M. Gruza, A. Janz, K. Kanclerz et al., "ChatGPT: Jack of all trades, master of none," arXiv preprint arXiv:2302.10724, 2023.
[44] D. L. Mann, "Artificial intelligence discusses the role of artificial intelligence in translational medicine: A JACC: Basic to Translational Science interview with ChatGPT," Basic to Translational Science, 2023.
[45] F. Antaki, S. Touma, D. Milad, J. El-Khoury, and R. Duval, "Evaluating the performance of ChatGPT in ophthalmology: An analysis of its successes and shortcomings," medRxiv, pp. 2023-01, 2023.
[46] T. H. Kung, M. Cheatham, A. Medenilla, C. Sillos, L. De Leon, C. Elepaño, M. Madriaga, R. Aggabao, G. Diaz-Candido, J. Maningo et al., "Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models," PLOS Digital Health, vol. 2, no. 2, p. e0000198, 2023.
[19] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, [47] J. Dahmen, M. Kayaalp, M. Ollivier, A. Pareek, M. T. Hirschmann,
D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep rein- J. Karlsson, and P. W. Winkler, “Artificial intelligence bot chatgpt in
forcement learning,” in International conference on machine learning, medical research: the potential game changer as a double-edged sword,”
2016, pp. 1928–1937. pp. 1–3, 2023.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 27 March 2023 doi:10.20944/preprints202303.0438.v1