A Survey of Large Language Models
Abstract—Ever since the Turing Test was proposed in the 1950s, humans have explored how machines can master language intelligence.
Language is essentially a complex, intricate system of human expression governed by grammatical rules. It poses a
significant challenge to develop capable artificial intelligence (AI) algorithms for comprehending and grasping a language. As a major
approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving
from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-
training Transformer models over large-scale corpora, showing strong capabilities in solving various natural language processing (NLP)
tasks. Since researchers have found that model scaling can lead to improved model capacity, they have further investigated the scaling
effect by increasing the parameter scale to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these
enlarged language models not only achieve a significant performance improvement, but also exhibit some special abilities (e.g., in-
context learning) that are not present in small-scale language models (e.g., BERT). To distinguish language models at different
parameter scales, the research community has coined the term large language models (LLMs) for PLMs of significant size (e.g.,
containing tens or hundreds of billions of parameters). Recently, the research on LLMs has been largely advanced by both academia
and industry, and a remarkable milestone is the launch of ChatGPT (a powerful AI chatbot developed based on LLMs), which has
attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI
community, which would revolutionize the way we develop and use AI algorithms. Considering this rapid technical progress, in this
survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular,
we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Furthermore, we
also summarize the available resources for developing LLMs and discuss the remaining issues for future directions. This survey provides
an up-to-date review of the literature on LLMs, which can be a useful resource for both researchers and engineers.
Index Terms—Large Language Models; Emergent Abilities; Adaptation Tuning; Utilization; Alignment; Capacity Evaluation
1 INTRODUCTION
“The limits of my language mean the limits of my world.”
—Ludwig Wittgenstein

LANGUAGE is a prominent ability in human beings to express and communicate, which develops in early childhood and evolves over a lifetime [3, 4]. Machines, however, cannot naturally grasp the abilities of understanding and communicating in the form of human language, unless equipped with powerful artificial intelligence (AI) algorithms. It has been a longstanding research challenge to achieve this goal, to enable machines to read, write, and communicate like humans [5].

Technically, language modeling (LM) is one of the major approaches to advancing language intelligence of machines. In general, LM aims to model the generative likelihood of word sequences, so as to predict the probabilities of future (or missing) tokens. The research of LM has received extensive attention in the literature, which can be divided into four major development stages:

• Statistical language models (SLM). SLMs [6–9] are developed based on statistical learning methods that rose in the 1990s. The basic idea is to build the word prediction model based on the Markov assumption, e.g., predicting the next word based on the most recent context. The SLMs with a fixed context length n are also called n-gram language models, e.g., bigram and trigram language models. SLMs have been widely applied to enhance task performance in information retrieval (IR) [10, 11] and natural language processing (NLP) [12–14]. However, they often suffer from the curse of dimensionality: it is difficult to accurately estimate high-order language models since an exponential number of transition probabilities need to be estimated. Thus, specially designed smoothing strategies such as back-off estimation [15] and Good–Turing estimation [16] have been introduced to alleviate the data sparsity problem (a minimal counting-based sketch is given below).
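To make the Markov assumption concrete, the following minimal sketch (our own illustration rather than code from the SLM literature cited above; all function and variable names are hypothetical) estimates bigram probabilities by counting and uses simple add-one smoothing as a stand-in for the back-off and Good–Turing schemes just mentioned.

from collections import Counter

def train_bigram(corpus):
    # Count unigrams and bigrams over tokenized sentences; boundary
    # markers let the first and last words be conditioned as well.
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    # Markov assumption: P(word | full history) is approximated by
    # P(word | prev). Add-one smoothing avoids zero probabilities for
    # unseen bigrams (the data sparsity problem discussed above).
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Under the bigram assumption, a sentence probability factorizes as
# P(w_1, ..., w_T) = prod_t P(w_t | w_{t-1}).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)
print(bigram_prob(unigrams, bigrams, "the", "cat"))

Because the number of distinct n-grams grows exponentially with the context length n, higher-order variants of this counting scheme quickly run into the sparsity problem that motivates the smoothing strategies above.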
• Neural language models (NLM). NLMs [1, 17, 18] characterize the probability of word sequences by neural networks, e.g., multi-layer perceptron (MLP) and recurrent neural networks (RNNs). As a remarkable contribution, the work in [1] introduced the concept of distributed representation of words and built the word prediction function conditioned on the aggregated context features (i.e., the distributed word vectors). By extending the idea of learning effective features for text data, a general neural network approach was developed to build a unified, end-to-end solution for various NLP tasks [2]; a simplified sketch of a fixed-window NLM is given below.
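As a concrete and deliberately simplified illustration of these ideas, the sketch below is our own example rather than the exact architecture of [1]; it assumes PyTorch is available and uses concatenation to aggregate the context vectors. Each context word is mapped to a distributed representation, and an MLP predicts a distribution over the next word.

import torch
import torch.nn as nn

class FixedWindowNLM(nn.Module):
    # A fixed-window neural language model: distributed word vectors are
    # looked up for the preceding words, aggregated by concatenation, and
    # fed to an MLP that scores every word in the vocabulary.
    def __init__(self, vocab_size, embed_dim=64, context_size=3, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # distributed representations
        self.mlp = nn.Sequential(
            nn.Linear(context_size * embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, vocab_size),            # next-word scores
        )

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the preceding words
        vectors = self.embed(context_ids)                  # (batch, context_size, embed_dim)
        features = vectors.flatten(start_dim=1)            # aggregated context features
        return self.mlp(features)                          # next-word logits

# Example: a probability distribution over the next word, given three
# (randomly chosen) context word indices.
model = FixedWindowNLM(vocab_size=10000)
context = torch.randint(0, 10000, (1, 3))
probs = torch.softmax(model(context), dim=-1)

Recurrent variants replace the fixed context window with a hidden state that summarizes the full history, which is how the RNN-based NLMs mentioned above condition their predictions.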
• Version: v14 (major update on September 25, 2024).
• GitHub link: https://github.com/RUCAIBox/LLMSurvey
• Chinese book link: lmbook-zh.github.io
• * K. Zhou and J. Li contribute equally to this work.
• The authors are mainly with Gaoling School of Artificial Intelligence and School of Information, Renmin University of China, Beijing, China; Jian-Yun Nie is with DIRO, Université de Montréal, Canada. Contact e-mail: [email protected]
• The authors of this survey paper reserve all the copyrights of the figures/tables, and any use of these materials for publication purpose must be officially granted by the survey authors.