
Large Language Models in Finance: A Survey

arXiv:2311.10723v2 [q-fin.GN] 8 Jul 2024

Yinheng Li∗ (Columbia University, New York, NY, USA)
Shaofei Wang∗ (Columbia University, New York, NY, USA)
Han Ding∗ (Columbia University, New York, NY, USA)
Hang Chen∗ (New York University, New York, NY, USA)

∗All authors contributed equally to this research. Order is random.

ICAIF-23, New York City, NY
Abstract

Recent advances in large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance. In this paper, we provide a practical survey focused on two key aspects of utilizing LLMs for financial tasks: existing solutions and guidance for adoption.

First, we review current approaches employing LLMs in finance, including leveraging pretrained models via zero-shot or few-shot learning, fine-tuning on domain-specific data, and training custom LLMs from scratch. We summarize key models and evaluate their performance improvements on financial natural language processing tasks.

Second, we propose a decision framework to guide financial professionals in selecting the appropriate LLM solution based on their use case constraints around data, compute, and performance needs. The framework provides a pathway from lightweight experimentation to heavy investment in customized LLMs.

Lastly, we discuss limitations and challenges around leveraging LLMs in financial applications. Overall, this survey aims to synthesize the state-of-the-art and provide a roadmap for responsibly applying LLMs to advance financial AI.

Keywords: Large Language Models, Generative AI, Natural Language Processing, Finance

1 Introduction

Recent advances in artificial intelligence, especially in natural language processing, have led to the development of powerful large language models (LLMs) like ChatGPT [33]. These models have demonstrated impressive capabilities in understanding, generating, and reasoning about natural language. The finance industry could benefit from applying LLMs, as effective language understanding and generation can inform trading, risk modeling, customer service, and more.

In this survey, we aim to provide a practical overview focused on two key aspects of utilizing LLMs for financial applications:

• Existing solutions and models that employ LLMs for various finance tasks. We summarize key techniques like finetuning pretrained LLMs and training domain-specific LLMs from scratch.
• Guidance on the decision process for applying LLMs in finance. We discuss factors to consider regarding whether LLMs are suitable for a task, cost/benefit trade-offs, risks, and limitations.

By reviewing current literature and developments, we hope to give an accessible synthesis of the state-of-the-art along with considerations for adopting LLMs in finance. This survey targets financial professionals and researchers exploring the intersection of AI and finance. It may also inform developers applying LLM solutions for the finance industry.

The remainder of the paper is organized as follows. Section 2 covers background on language modeling and recent advances leading to LLMs. Section 3 surveys current AI applications in finance and the potential for LLMs to advance these areas. Sections 4 and 5 provide LLM solutions and decision guidance for financial applications. Finally, Sections 6 and 7 discuss risks, limitations, and conclusions.

2 Basics of Language Models

A language model is a statistical model that is trained on extensive text corpora to predict the probability distribution of word sequences [4]. Consider a sequence of words denoted as W = w_1, w_2, ..., w_n, where w_i represents the i-th word in the sequence. The goal of a language model is to calculate the probability P(W), which can be expressed as:

P(W) = P(w_1, w_2, ..., w_n)
     = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) ... P(w_n | w_1, w_2, ..., w_{n-1})

The conditional probability P(w_i | w_1, w_2, ..., w_{i-1}) captures the likelihood of word w_i given the preceding words. Over the past few decades, language model architectures have undergone significant evolution. Initially, n-gram models represented word sequences as Markov processes [3], assuming that the probability of the next word depends solely on the preceding (n − 1) words. For example, in a bigram model, the probability of a word is conditioned only on the previous word.
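To make the bigram assumption concrete, the toy example below, which is not taken from the survey, estimates P(W) by counting word pairs in a tiny assumed corpus; a real model would need smoothing for unseen bigrams and proper sentence-boundary handling.

```python
# Toy bigram language model: P(W) ~= P(w1) * product of P(w_i | w_{i-1}), estimated by counting.
from collections import Counter

corpus = [
    "stocks rallied on strong earnings".split(),
    "stocks fell on weak earnings".split(),
    "bonds rallied on rate cuts".split(),
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
starts = Counter(sent[0] for sent in corpus)

def sequence_probability(words):
    p = starts[words[0]] / len(corpus)                 # P(w1): how often the sentence starts with this word
    for prev, cur in zip(words, words[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]     # P(w_i | w_{i-1}); no smoothing, toy denominator
    return p

print(sequence_probability("stocks rallied on weak earnings".split()))  # ~0.111 on this toy corpus
```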

Later, Recurrent Neural Network (RNN)-based models like LSTM [20] and GRU [10] emerged as neural network solutions capable of capturing long-term dependencies in sequential data. However, in 2017, the introduction of the transformer architecture [46] revolutionized language modeling, surpassing the performance of RNNs in tasks such as machine translation. Transformers employ self-attention mechanisms to model relationships between words in parallel, facilitating efficient training on large-scale datasets. Prominent transformer-based models include GPT (Generative Pre-trained Transformer) [5, 48], which is a decoder-only framework; BERT (Bidirectional Encoder Representations from Transformers) [13], which is an encoder-only framework; and T5 (Text-to-Text Transfer Transformer) [38], which leverages both encoder and decoder structures. These models have achieved state-of-the-art results on various natural language processing (NLP) tasks through transfer learning.

It is important to note that the evolution of language models has mainly been driven by advancements in computational power, the availability of large-scale datasets, and the development of novel neural network architectures. These models have significantly enhanced language understanding and generation capabilities, enabling their application across a wide range of industries and domains.

3 Overview of AI Applications in Finance

3.1 Current AI Applications in Finance

Artificial Intelligence (AI) has witnessed extensive adoption across various domains of finance in recent years [19]. In this survey, we focus on key financial applications, including trading and portfolio management [67], financial risk modeling [30], financial text mining [21, 36], and financial advisory and customer services [41]. While this list is not exhaustive, these areas have shown significant interest and high potential with the advancement of AI.

Trading and portfolio management have been early adopters of machine learning and deep learning models within the finance industry. The primary objective of trading is to forecast prices and generate profits based on these predictions. Initially, statistical machine learning methods such as Support Vector Machines (SVM) [23], XGBoost [68], and tree-based algorithms were utilized for profit and loss estimation. However, the emergence of deep neural networks introduced techniques like Recurrent Neural Networks (RNN), particularly Long Short-Term Memory (LSTM) networks [40], Convolutional Neural Networks (CNN), and transformers [51], which have proven effective in price forecasting. Additionally, reinforcement learning [47] has been applied to automatic trading and portfolio optimization.

Financial risk modeling encompasses various applications of machine learning and deep learning models. For instance, McKinsey & Company has developed a deep learning-based solution for financial fraud detection by leveraging user history data and real-time transaction data [39]. Similar approaches have been employed in credit scoring [29, 52] and bankruptcy or default prediction [8].

Financial text mining represents a popular area where deep learning models and natural language processing techniques are extensively utilized. According to [35], there are over 40 research publications on this topic. Financial text mining aims to extract valuable information from large-scale unstructured data in real time, enabling more informed decision-making in trading and risk modeling. For example, [15] employs financial market sentiment extracted from news articles to forecast the direction of the stock market index.

Applying AI in financial advisory and customer-related services is an emerging and rapidly growing field. AI-powered chatbots, as discussed in [32], already provide more than 37% of supporting functions in various e-commerce and e-service scenarios. In the financial industry, chatbots are being adopted as cost-effective alternatives to human customer service, as highlighted in the report "Chatbots in consumer finance" [2]. Additionally, banks like JPMorgan are leveraging AI services to provide investment advice, as mentioned in a report by CNBC [42].

The current generation of deep learning models offers significant advantages by efficiently extracting valuable insights from vast amounts of data within short time frames. This capability is particularly valuable in the finance industry, where timely and accurate information plays a crucial role in decision-making processes. With the emergence of LLMs, even more tasks that were previously considered intractable become possible, further expanding the potential applications of AI in the finance industry.

3.2 Advancements of LLMs in Finance

LLMs offer numerous advantages over traditional models, particularly in the field of finance. Firstly, LLMs leverage their extensive pre-training data to effectively process common-sense knowledge, enabling them to understand natural language instructions. This is valuable in scenarios where supervised training is challenging due to limited labeled financial data or restricted access to certain documents. LLMs can perform tasks through zero-shot learning [26], as demonstrated by their satisfactory performance on sentiment classification tasks across complexity levels [65]. For similar text mining tasks on financial documents, LLMs can automatically achieve acceptable performance.

Compared to other supervised models, LLMs offer superior adaptation and flexibility. Instead of training separate models for specific tasks, LLMs can handle multiple tasks by simply modifying the prompt under different task instructions [6]. This adaptability does not require additional training, enabling LLMs to simultaneously perform sentiment analysis, summarization, and keyword extraction on financial documents.
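To make this prompt-switching behavior concrete, the sketch below runs three finance NLP tasks against the same general-purpose chat model, changing only the instruction. It uses the OpenAI Python client for illustration; the model name, prompts, and sample headline are illustrative assumptions, not part of the original survey.

```python
# Illustrative sketch: one LLM, three finance NLP tasks, selected only by the prompt.
# Assumes the `openai` Python package (>=1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

TASK_PROMPTS = {
    "sentiment": "Classify the sentiment of this financial headline as positive, negative, or neutral.",
    "summary": "Summarize this financial text in one sentence.",
    "keywords": "Extract the five most important keywords from this financial text.",
}

def run_task(task: str, text: str, model: str = "gpt-3.5-turbo") -> str:
    """Zero-shot task switching: only the instruction changes, the model stays the same."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": TASK_PROMPTS[task]},
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

headline = "Acme Corp beats Q2 earnings expectations but lowers full-year guidance."
for task in TASK_PROMPTS:
    print(task, "->", run_task(task, headline))
```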

LLMs excel at breaking down ambiguous or complex tasks into actionable plans. Applications like Auto-GPT [1], Semantic Kernel [31], and LangChain [7] have been developed to showcase this capability. In this paper, we refer to this as Tool Augmented Generation. For instance [37], Auto-GPT can optimize a portfolio with global equity ETFs and bond ETFs based on user-defined goals. It formulates detailed plans, including acquiring financial data, utilizing Python packages for Sharpe ratio optimization, and presenting the results to the user. Previously, achieving such end-to-end solutions with a single model was unfeasible. This property makes LLMs an ideal fit for financial customer service or financial advisory, where they can understand natural language instructions and assist customers by leveraging available tools and information.

While the application of LLMs in finance is promising, it is crucial to acknowledge their limitations and associated risks, which will be further discussed in Section 6.

4 LLM Solutions for Finance

4.1 Utilizing Few-shot/Zero-shot Learning in Finance Applications

Accessing LLM solutions in finance can be done through two options: utilizing an API from LLM service providers or employing open-source LLMs. Companies like OpenAI¹, Google², and Microsoft³ offer LLM services through APIs. These services not only provide the base language model capabilities but also offer additional features tailored for specific use cases. For example, OpenAI's APIs include functionalities for chat, SQL generation, code completion, and code interpretation. While there is no dedicated LLM service exclusively designed for finance applications, leveraging these general-purpose LLM services can be a viable option, especially for common tasks. An example in this work [16] demonstrates the use of OpenAI's GPT-4 service for financial statement analysis.

In addition to LLM services provided by tech companies, open-source LLMs can also be applied to financial applications. Models such as LLaMA [45], BLOOM [54], Flan-T5 [12], and more are available for download from the Hugging Face model repository⁴. Unlike using APIs, running these open-source models requires self-hosting.

¹ https://openai.com/product
² https://bard.google.com/
³ https://azure.microsoft.com/en-us/products/cognitive-services/openai-service
⁴ https://huggingface.co/models

Similar to using LLM APIs, zero-shot or few-shot learning approaches can be employed with open-source models. Utilizing open-source models offers greater flexibility, as the model's weights are accessible and the model's output can be customized for downstream tasks. Additionally, it provides better privacy protection, since the model and data remain under the user's control. However, working with open-source models also has its drawbacks. Reported evaluation metrics suggest a performance gap between open-source models and proprietary models. For certain downstream tasks, zero-shot or few-shot learning may not yield optimal performance. In such cases, fine-tuning the model with labeled data, expertise, and computational resources is necessary to achieve satisfactory results. This may explain why, at the time of writing this paper, no direct examples of open-source models applied to financial applications have been found. In Section 5, we provide a more detailed discussion of which option is more favorable under different circumstances.

4.2 Fine-tuning a Model

Fine-tuning LLMs on the finance domain can enhance domain-specific language understanding and contextual comprehension, resulting in improved performance on finance-related tasks and more accurate, tailored outputs.

4.2.1 Common Techniques for LLM Fine-tuning. Modern techniques for fine-tuning LLMs typically fall into two main categories: standard fine-tuning and instructional fine-tuning.

In standard fine-tuning, the model is trained on the raw datasets without modification. The key context, question, and desired answer are directly fed into the LLM, with the answer masked during training so that the model learns to generate it. Despite its simplicity, this approach is widely effective.

Instruct fine-tuning [34] involves creating task-specific datasets that provide examples and guidance to steer the model's learning process. By formulating explicit instructions and demonstrations in the training data, the model can be optimized to excel at certain tasks or produce more contextually relevant and desired outputs. The instructions act as a form of supervision to shape the model's behavior.

Both methods have their merits: standard fine-tuning is straightforward to implement, while instructional fine-tuning allows for more precise guidance of the model. The ideal approach depends on the amount of training data available and the complexity of the desired behaviors. However, both leverage the knowledge already embedded in LLMs and fine-tune them for enhanced performance on downstream tasks.

In addition to the above methods, techniques such as Low-Rank Adaptation (LoRA) [22] and quantization [18] can enable fine-tuning with significantly lower computational requirements.
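As a rough illustration of how these techniques combine in practice, the sketch below attaches LoRA adapters to a causal language model loaded in reduced precision using the Hugging Face transformers and peft libraries; the base model name, target modules, and hyperparameters are illustrative assumptions rather than a recipe from any of the surveyed papers.

```python
# Minimal sketch of parameter-efficient fine-tuning with LoRA (assumes `transformers` and `peft` are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "huggyllama/llama-7b"  # illustrative choice; any causal LM on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,  # reduced precision roughly halves memory vs. float32 (see Section 4.2.1)
)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted in LLaMA-style models
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the weights are trainable

# The wrapped model can now be passed to a standard training loop or Trainer
# together with an (instruction-formatted) finance dataset.
```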

Table 1. Quick Overview of Finetuned Finance LLMs

Model Name | Finetune data size (samples) | Training budget | Model architecture | Release time
FinMA-7B | Raw: 70k, Instruction: 136k | 8 A100 40GB GPUs | LLaMA-7B | Jun 2023
FinMA-30B | Raw: 70k, Instruction: 136k | 128 A100 40GB GPUs | LLaMA-30B | Jun 2023
Fin-GPT (V1/V2/V3) | 50K | < $300 per training | ChatGLM, LLaMA | Jul 2023
Instruct-FinGPT | 10K instruction | 8 A100 40GB GPUs, ~1 hr | LLaMA-7B | Jun 2023
Fin-LLaMA [53] | 16.9K instruction | NA | LLaMA-33B | Jun 2023
Cornucopia (Chinese) [61] | 12M instruction | NA | LLaMA-7B | Jun 2023
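Table 1 counts "instruction" samples; the sketch below shows what one such instruction-formatted sample might look like, loosely following the instruction fine-tuning setup described in Section 4.2.1. The field names, template, and example sentence are assumptions for illustration, not the exact format used by FinMA, FinGPT, or the other models.

```python
# Illustrative conversion of a raw labeled example into an instruction-tuning record.
def to_instruction_sample(text: str, label: str) -> dict:
    return {
        "instruction": "What is the sentiment of this financial news? Answer with negative, neutral, or positive.",
        "input": text,
        "output": label,  # masked during training so the model learns to generate it
    }

raw_sample = ("Shares of the lender fell 8% after it disclosed unexpected loan losses.", "negative")
print(to_instruction_sample(*raw_sample))
```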

Table 2. Quick Overview of Finance LLMs Trained from Scratch

Pretrained LLM | Corpus size (tokens) | Training budget (A100·hours) | Model architecture | Release time
BloombergGPT | 363B finance tokens + 345B public tokens | 1,300,000 | 50B BLOOM | May 2023
XuanYuan 2.0 | 366B for pre-training + 13B for finetuning | Not released | 176B BLOOM | May 2023
Fin-T5 | 80B finance tokens | Days/weeks | 770M T5 | Feb 2023

LoRA allows for fine-tuning low-rank decomposed factors of the original weight matrices instead of the full matrices. This approach drastically reduces the number of trainable parameters, enabling training on less powerful hardware and shortening the total training time.

Another impactful approach is to use reduced numerical precision such as bfloat16 [24] or float16 instead of float32. By halving the bit-width, each parameter occupies only 2 bytes instead of 4 bytes, reducing memory usage by 50%. This also accelerates computation by up to 2x, since smaller data types speed up training. Moreover, the reduced memory footprint enables larger batch sizes, further boosting throughput.

4.2.2 Fine-tuned finance LLM evaluation. The performance of fine-tuned finance LLMs can be evaluated in two categories: finance classification tasks and finance generative tasks. In finance classification, we consider tasks such as Sentiment Analysis and News Headline Classification. In finance generative tasks, our focus is on Question Answering, News Summarization, and Named Entity Recognition. Table 1 provides detailed information about all the fine-tuned finance LLMs. Among the various fine-tuned LLMs, we focus on discussing three of them: (1) PIXIU (also known as FinMA) [56], which fine-tunes LLaMA on 136K task-specific instruction samples. (2) FinGPT [58], which presents an end-to-end framework for training and applying financial LLMs in the finance industry. FinGPT utilizes the lightweight Low-Rank Adaptation (LoRA) technique to fine-tune open-source LLMs (such as LLaMA and ChatGLM) using approximately 50k samples; however, FinGPT's evaluation is limited to finance classification tasks. (3) Instruct-FinGPT [63], on the other hand, fine-tunes LLaMA on 10k instruction samples derived from two financial sentiment analysis datasets and likewise evaluates performance only on finance classification tasks.

Based on the reported model performance, we summarize our findings as follows:

• Compared to the original base LLM (LLaMA) and other open-source LLMs (BLOOM, OPT [64], ChatGLM [14, 62]), all fine-tuned finance LLMs exhibit significantly better performance across all finance-domain tasks reported in the papers, especially classification tasks.
• The fine-tuned finance LLMs outperform BloombergGPT [55] in most finance tasks reported in the papers.
• When compared to powerful general LLMs like ChatGPT and GPT-4, the fine-tuned finance LLMs demonstrate superior performance in most finance classification tasks, which indicates their enhanced domain-specific language understanding and contextual comprehension abilities. However, in finance generative tasks, the fine-tuned LLMs show similar or worse performance, suggesting the need for more high-quality domain-specific datasets to improve their generative capabilities.

4.3 Pretrain from Scratch

The objective of training LLMs from scratch is to develop models that have even better adaptation to the finance domain. Table 2 presents the current finance LLMs that have been trained from scratch: BloombergGPT, XuanYuan 2.0 [66], and Fin-T5 [28].

As shown in Table 2, there is a trend of combining public datasets with finance-specific datasets during the pretraining phase.
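A minimal sketch of such corpus mixing, using the Hugging Face datasets library to interleave a general web corpus with a finance-specific one: the dataset names and the 50/50 weighting are illustrative assumptions (echoing the mixed-corpus idea), not the actual recipes of BloombergGPT, XuanYuan 2.0, or Fin-T5.

```python
# Sketch of mixing general and finance-specific text streams for pretraining (assumes `datasets` is installed).
from itertools import islice
from datasets import load_dataset, interleave_datasets

general_corpus = load_dataset("allenai/c4", "en", split="train", streaming=True)
finance_corpus = load_dataset("your-org/finance-filings", split="train", streaming=True)  # hypothetical dataset

# Roughly 50/50 general vs. finance text; real projects tune these weights carefully.
pretraining_stream = interleave_datasets(
    [general_corpus, finance_corpus],
    probabilities=[0.5, 0.5],
    seed=42,
)

for example in islice(pretraining_stream, 3):
    print(example["text"][:80])
```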

Notably, BloombergGPT serves as an example where the corpus comprises an equal mix of general and finance-related text. It is worth mentioning that BloombergGPT primarily relies on a subset of 5 billion tokens that pertain exclusively to Bloomberg, representing only 0.7% of its total training corpus. This targeted corpus contributes to the performance improvements achieved on finance benchmarks.

Both BloombergGPT and Fin-T5 have demonstrated superior performance on finance benchmarks compared to their original base models, BLOOM-176B and T5 respectively. These benchmarks encompass tasks such as market sentiment classification, multi-categorical and multi-label classification, and more. BloombergGPT achieves an impressive average score of 62.51, surpassing the open-source BLOOM-176B model, which only attains a score of 54.35. Similarly, Fin-T5 achieves an average score of 81.78, outperforming the T5 model's score of 79.56. Notably, BloombergGPT was also evaluated using an internal benchmark specifically designed by Bloomberg. The results of this evaluation showcased remarkable improvements, as BloombergGPT achieved an average score of 62.47, surpassing the performance of BLOOM-176B, which only attained a score of 33.39. This outcome highlights that even when an internal private training corpus constitutes less than 1% of the total training corpus, it can still lead to substantial enhancements on evaluation tasks within the same domain and distribution.

On finance-related generative tasks such as Question Answering, Named Entity Recognition, and summarization, both models also exhibited significantly better results than their respective general models by a considerable margin. Specifically, BloombergGPT achieved an impressive score of 64.83, surpassing BLOOM-176B's score of 45.43. Similarly, Fin-T5 outperformed T5 with a score of 68.69 versus 66.06. These findings further highlight the models' superior performance in generating finance-related content when compared to their general-purpose counterparts.

Although these models are not as powerful as closed-source models like GPT-3 or PaLM [11], they demonstrate similar or superior performance compared to similarly sized public models. In evaluations on various general generative tasks, such as BIG-bench Hard, knowledge assessments, reading comprehension, and linguistic tasks, BloombergGPT exhibited comparable or superior performance to similarly sized public models, albeit slightly inferior to larger models like GPT-3 or PaLM. Overall, BloombergGPT showcased commendable performance across a wide range of general generative tasks, positioning it favorably among models of comparable size. This indicates that the model's enhanced capabilities on finance-related tasks do not come at the expense of its general abilities.

5 Decision Process in Applying LLMs to Financial Applications

5.1 Determining the Need for an LLM

Before exploring LLM solutions, it is essential to ascertain whether employing such a model is truly necessary for the given task. The advantages of LLMs over smaller models can be summarized as follows, as outlined in the work by Yang et al. [59]:

Leveraging Pretraining Knowledge: LLMs can utilize the knowledge acquired from pretraining data to provide solutions. If a task lacks sufficient training data or annotated data but requires common-sense knowledge, an LLM may be a suitable choice.

Reasoning and Emergent Abilities: LLMs excel at tasks that involve reasoning or emergent abilities [49]. This property makes LLMs well-suited for tasks where task instructions or expected answers are not clearly defined, or when dealing with out-of-distribution data. In the context of financial advisory, for example, client requests in customer service often exhibit high variance and involve complex conversations; LLMs can serve as virtual agents to provide assistance in such cases.

Orchestrating Model Collaboration: LLMs can act as orchestrators between different models and tools. For tasks that require collaboration among various models, LLMs can serve as orchestrators to integrate and utilize these tools together [1, 7, 31]. This capability is particularly valuable when aiming for robust automation of a model solution pipeline.

While LLMs offer immense power, their use comes with significant cost, whether utilizing a third-party API [33] or fine-tuning an open-source LLM. Therefore, it is prudent to consider conventional models before fully committing to LLMs. In cases where the task has a clear definition (e.g., regression, classification, ranking), there is an ample amount of annotated training data, or the task relies minimally on common-sense knowledge or emergent capabilities like reasoning, relying on LLMs may not be necessary or justified initially.

5.2 A General Decision Guidance for Applying LLMs to Finance Tasks

Once the decision has been made to utilize LLMs for a finance task, a decision guidance framework can be followed to ensure efficient and effective implementation. The framework, illustrated in Figure 1, categorizes the usage of LLMs into four levels based on computational resources and data requirements. Progressing through the levels, the costs associated with training and data collection increase. It is recommended to start at Level 1 and move to higher levels (2, 3, and 4) only if the model's performance is not satisfactory. The following section provides detailed explanations of the decision and action blocks at each level.

Figure 1. Decision process flow chart
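Since the flow chart itself is not reproduced in this text version, the sketch below paraphrases the four-level escalation logic described in Sections 5.2.1–5.2.4 as simple Python control flow. The function signature and numeric thresholds are illustrative assumptions loosely based on Table 3 and the surrounding text, not an exact rendering of Figure 1.

```python
# Rough paraphrase of the decision guidance behind Figure 1 (Sections 5.2.1-5.2.4); thresholds are illustrative.
def choose_llm_level(confidential_data: bool,
                     example_qa_pairs: int,
                     tools_available: bool,
                     labeled_samples: int,
                     budget_usd: float,
                     performance_ok) -> str:
    """Walk the four levels in order, escalating only while performance is unsatisfactory.

    `performance_ok` is a callback that evaluates a candidate approach on a validation set.
    """
    provider = "self-hosted open-source LLM" if confidential_data else "third-party LLM API"

    if performance_ok("zero-shot"):
        return f"Level 1: zero-shot prompting with a {provider}"
    if example_qa_pairs >= 1 and performance_ok("few-shot"):
        return f"Level 2: few-shot prompting (1-10 examples) with a {provider}"
    if tools_available and performance_ok("tool-augmented"):
        return f"Level 3: tool-augmented generation with a {provider}"
    if labeled_samples >= 10_000 and performance_ok("fine-tuned"):
        return "Level 3: fine-tune an open-source or hosted LLM on the labeled data"
    if budget_usd >= 5_000_000:
        return "Level 4: pretrain a domain-specific LLM from scratch"
    return "Reconsider a conventional supervised model (Section 5.1) or collect more data/budget"

# Example: confidential data, a handful of labeled examples, and few-shot performance deemed acceptable.
print(choose_llm_level(True, 5, False, 2_000, 50_000, lambda approach: approach == "few-shot"))
```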

Table 3 presents an approximate cost range for the different options, based on pricing from various third-party services like AWS and OpenAI.

5.2.1 Level 1: Zero-shot Applications. The first decision block determines whether to use an existing LLM service or an open-source model. If the input question or context involves confidential data, it is necessary to proceed with the 1A action block, which involves self-hosting an open-source LLM. As of July 2023, several options are available, including LLaMA [45], OpenLLaMA [17], Alpaca [44], and Vicuna [9]. LLaMA offers models with sizes ranging from 7B to 65B, but they are limited to research purposes. OpenLLaMA provides options for 3B, 7B, and 13B models, with support for commercial usage. Alpaca and Vicuna are fine-tuned based on LLaMA, offering 7B and 13B options. Deploying your own LLM requires a robust local machine with a suitable GPU, such as an NVIDIA V100 for a 7B model or an NVIDIA A100 or A6000 for a 13B model.

If data privacy is not a concern, selecting third-party LLMs such as GPT-3.5/GPT-4 from OpenAI or Bard from Google is recommended. This option allows for lightweight experiments and early performance evaluation without significant deployment costs. The only cost incurred would be the fees associated with each API call, typically based on input length and the token count of the model's output.

5.2.2 Level 2: Few-shot Applications. If the model's performance at Level 1 is not acceptable for the application, few-shot learning can be explored when several example questions and their corresponding answers are available. Few-shot learning has shown advantages in various previous works [5, 48]. The core idea is to provide a set of example questions along with their corresponding answers as context, in addition to the specific question being asked. The cost associated with few-shot learning is similar to that of the previous levels, except for the requirement of providing examples each time. Generally, achieving good performance may require using 1 to 10 examples. These examples can be the same across different questions or selected based on the specific question at hand.
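A minimal sketch of Level 2 few-shot prompting follows, in which a handful of labeled question/answer pairs are prepended to the query before calling the model; the examples, helper name, and model choice are illustrative assumptions.

```python
# Few-shot prompting sketch (Level 2): prepend example Q/A pairs as context before the real question.
from openai import OpenAI

client = OpenAI()

FEW_SHOT_EXAMPLES = [  # illustrative labeled examples
    ("Company X reports record quarterly profit.", "positive"),
    ("Regulator fines Bank Y over compliance failures.", "negative"),
    ("Firm Z to hold its annual shareholder meeting in May.", "neutral"),
]

def few_shot_sentiment(headline: str, model: str = "gpt-3.5-turbo") -> str:
    lines = ["Classify the sentiment of each financial headline as positive, negative, or neutral.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Headline: {text}\nSentiment: {label}\n")
    lines.append(f"Headline: {headline}\nSentiment:")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "\n".join(lines)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(few_shot_sentiment("Chipmaker warns of weaker demand in the second half."))
```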

Options | Development Computational Cost ($) | Development Data Cost (samples) | Deployment Computational Cost ($/1k tokens generated)
OpenSource-ZeroShot | - | - | 0.006 - 0.037
3rd party-ZeroShot | - | - | 0.002 - 0.12
OpenSource-FewShot | - | - | 0.006 - 0.037
3rd party-FewShot | - | - | 0.002 - 0.12
OpenSource Tool Augmented Generation | Cost of developing tools | - | 0.006 - 0.037
3rd party Tool Augmented Generation | Cost of developing tools | - | 0.002 - 0.12
OpenSource-Finetune | 4 - 360,000 | 10,000 - 12,000,000 | 0.0016 - 0.12
3rd party-Finetune | 30 - 30,000 | 10,000 - 12,000,000 | 0.002 - 0.12
Train from Scratch | 5,000,000 | 700,000,000 | 0.0016 - 0.12

Table 3. Costs of Different LLM Options: This table gives an approximate range of data and dollar-cost requirements. The data and dollar-cost requirements for development are estimated based on the previous works listed in Table 2. The third-party deployment costs are listed at https://openai.com/pricing. The open-source deployment costs are calculated based on https://openai.com/pricing and https://aws.amazon.com/ec2/pricing/on-demand/, assuming an NVIDIA A100 GPU. The deployment cost in $/1k tokens = ($/second) × (seconds per 1k tokens), and it typically takes 3 to 33 seconds to generate 1k tokens, depending on model size.
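As a worked example of the deployment-cost formula in the note above, the snippet below converts an assumed hourly GPU price and generation speed into a $/1k-token figure; the $4/hour A100 price and 10-second generation time are illustrative assumptions, not quoted prices.

```python
# Worked example of: $/1k tokens = ($/second) * (seconds per 1k tokens).
gpu_price_per_hour = 4.0          # assumed on-demand price for one NVIDIA A100 ($/hour), illustrative
seconds_per_1k_tokens = 10.0      # the note cites a typical range of 3-33 seconds per 1k generated tokens

price_per_second = gpu_price_per_hour / 3600.0
cost_per_1k_tokens = price_per_second * seconds_per_1k_tokens
print(f"~${cost_per_1k_tokens:.4f} per 1k generated tokens")  # ~$0.0111, within the table's open-source range
```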

The challenge lies in determining the optimal number of examples and selecting relevant ones. This process involves experimentation and testing until the desired performance boundary is reached.

5.2.3 Level 3: Tool-Augmented Generation and Fine-tuning. If the task at hand is extremely complicated and in-context learning does not yield reasonable performance, the next option is to leverage external tools or plugins with the LLM, assuming a collection of relevant tools/plugins is available. For example, a simple calculator could assist with arithmetic-related tasks, while a search engine could be indispensable for knowledge-intensive tasks such as querying the CEO of a specific company or identifying the company with the highest market capitalization.

Integrating tools with LLMs can be achieved by providing the tools' descriptions. The cost associated with this approach is generally higher than that of few-shot learning due to the development of the tool(s) and the longer input sequence required as context. There may also be instances where the concatenated tool descriptions are too long, surpassing the input length limit of the LLM. In such cases, an additional step such as a simple tool retrieval or filtering step might be needed to narrow down the tools for selection. The deployment cost typically includes the cost of using the LLM as well as the cost of using the tool(s).

If the above options fail to produce satisfactory performance, fine-tuning the LLM can be attempted. This stage requires a reasonable amount of annotated data, computational resources (GPU, CPU, etc.), and expertise in tuning language models, as listed in Table 3.

5.2.4 Level 4: Train Your Own LLMs from Scratch. If the results are still unsatisfactory, the only option left is to train a domain-specific LLM from scratch, similar to what BloombergGPT did. However, this option comes with significant computational costs and data requirements. It typically requires millions of dollars in computational resources and training on a dataset with trillions of tokens. The intricacies of the training process are beyond the scope of this survey, but it is worth noting that it can take several months or even years of effort for a professional team to accomplish.

By following this decision guidance framework, financial professionals and researchers can navigate the various levels and options, making informed choices that align with their specific needs and resource constraints.

5.3 Evaluation

The evaluation of LLMs in finance can be conducted through various approaches. One direct evaluation method is to assess the model's performance on downstream tasks. Evaluation metrics can be categorized into two main groups, accuracy and performance, based on the taxonomy provided by [57]. The accuracy category can further be divided into metrics for regression (such as MAPE, RMSE, R²) and metrics for classification (Recall, Precision, F1 score).
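To illustrate the tool-augmented pattern in Section 5.2.3, the sketch below exposes two tools to the model through their descriptions and dispatches the model's chosen call; the tool set, prompt format, and JSON parsing are simplified assumptions rather than the API of any particular framework.

```python
# Simplified tool-augmented generation loop (Section 5.2.3); tool set and prompt format are illustrative.
import json

def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression, e.g. '(105.2 - 98.7) / 98.7'."""
    return str(eval(expression, {"__builtins__": {}}, {}))  # demo only; never eval untrusted input in production

def market_cap_lookup(ticker: str) -> str:
    """Stand-in for a real market-data API call."""
    return {"AAPL": "2.9T USD", "MSFT": "2.5T USD"}.get(ticker, "unknown")

TOOLS = {"calculator": calculator, "market_cap_lookup": market_cap_lookup}

TOOL_DESCRIPTIONS = """
calculator(expression): evaluates an arithmetic expression and returns the result.
market_cap_lookup(ticker): returns the market capitalization for a stock ticker.
Respond ONLY with JSON: {"tool": <name>, "argument": <string>}.
"""

def answer_with_tools(question: str, call_llm) -> str:
    """`call_llm` is any function that sends a prompt to an LLM and returns its text reply."""
    prompt = f"{TOOL_DESCRIPTIONS}\nQuestion: {question}\nWhich tool call answers it?"
    decision = json.loads(call_llm(prompt))            # e.g. {"tool": "market_cap_lookup", "argument": "AAPL"}
    tool_result = TOOLS[decision["tool"]](decision["argument"])
    return call_llm(f"Question: {question}\nTool result: {tool_result}\nFinal answer:")
```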

The performance category includes metrics or measurements that directly assess the model's performance on the specific task, such as total profit or Sharpe ratio in a trading-related task. These evaluations can be conducted using historical data, backtest simulations, or online experiments. While performance metrics are often more important in finance, it is crucial to ensure that accuracy metrics align with performance, to support meaningful decision-making and guard against overfitting.

In addition to task-specific evaluations, general metrics used for LLMs can also be applied. In particular, when evaluating the overall quality of an existing LLM or a fine-tuned one, comprehensive evaluation systems like the one presented in [27] can be utilized. This evaluation system covers tasks for various scenarios and incorporates metrics from different aspects, including accuracy, fairness, robustness, bias, and more. It can serve as a guide for selecting a language model or evaluating one's own model in the context of finance applications.

5.4 Limitations

While significant progress has been made in applying LLMs to revolutionize financial applications, it is important to acknowledge the limitations of these language models. Two major challenges are the production of disinformation and the manifestation of biases, such as racial, gender, and religious biases, in LLMs [43]. In the financial industry, accuracy of information is crucial for making sound financial decisions, and fairness is a fundamental requirement for all financial services. To ensure information accuracy and mitigate hallucination, additional measures like retrieval-augmented generation [25] can be implemented. To address biases, content censoring and output restriction techniques (such as only generating answers from a pre-defined list) can be employed to control the generated content and reduce bias.

LLMs also pose potential challenges in terms of regulation and governance. Although an LLM offers more interpretability than conventional deep learning models by providing reasoning steps or thinking processes for its generated answers when prompted correctly [50, 60], it remains a black box, and the explainability of the content it generates is highly limited.

Addressing these limitations and ensuring the ethical and responsible use of LLMs in finance applications is essential. Continuous research, the development of robust evaluation frameworks, and the implementation of appropriate safeguards are vital steps in harnessing the full potential of LLMs while mitigating potential risks.

6 Conclusion

In conclusion, this paper has conducted a timely and practical survey on the emerging application of LLMs for financial AI. We structured the survey around two critical pillars: solutions and adoption guidance.

Under solutions, we reviewed diverse approaches to harnessing LLMs for finance, including leveraging pretrained models, fine-tuning on domain data, and training custom LLMs. Experimental results demonstrate significant performance gains over general-purpose LLMs across natural language tasks like sentiment analysis, question answering, and summarization.

To provide adoption guidance, we proposed a structured framework for selecting the optimal LLM strategy based on constraints around data availability, compute resources, and performance needs. The framework aims to balance value and investment by guiding practitioners from low-cost experimentation to rigorous customization.

In summary, this survey synthesized the latest progress in applying LLMs to transform financial AI and provided a practical roadmap for adoption. We hope it serves as a useful reference for researchers and professionals exploring the intersection of LLMs and finance. As datasets and computation improve, finance-specific LLMs represent an exciting path to democratize cutting-edge NLP across the industry.

References

[1] 2023. Auto-GPT: An Autonomous GPT-4 Experiment. https://github.com/Significant-Gravitas/Auto-GPT
[2] 2023. Chatbots in consumer finance. https://www.consumerfinance.gov/data-research/research-reports/chatbots-in-consumer-finance/chatbots-in-consumer-finance/
[3] Talal Almutiri and Farrukh Nadeem. 2022. Markov models applications in natural language processing: a survey. Int. J. Inf. Technol. Comput. Sci 2 (2022), 1–16.
[4] Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. 2000. A neural probabilistic language model. Advances in Neural Information Processing Systems 13 (2000).
[5] Tom B. Brown et al. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]
[6] Tom B. Brown et al. 2020. Language Models are Few-Shot Learners. CoRR abs/2005.14165 (2020). https://arxiv.org/abs/2005.14165
[7] Harrison Chase. 2022. LangChain. https://github.com/hwchase17/langchain
[8] Mu-Yen Chen. 2011. Bankruptcy prediction in firms with statistical and intelligent techniques and a comparison of evolutionary computation approaches. Computers & Mathematics with Applications 62, 12 (2011), 4514–4524. https://doi.org/10.1016/j.camwa.2011.10.030

[9] Wei-Lin Chiang et al. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https://lmsys.org/blog/2023-03-30-vicuna/
[10] Kyunghyun Cho et al. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv:1406.1078 [cs.CL]
[11] Aakanksha Chowdhery et al. 2022. PaLM: Scaling Language Modeling with Pathways. arXiv:2204.02311 [cs.CL]
[12] Hyung Won Chung et al. 2022. Scaling Instruction-Finetuned Language Models. arXiv:2210.11416 [cs.LG]
[13] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]
[14] Zhengxiao Du et al. 2022. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 320–335.
[15] Bledar Fazlija and Pedro Harder. 2022. Using Financial News Sentiment for Stock Price Direction Prediction. Mathematics 10, 13 (2022). https://doi.org/10.3390/math10132156
[16] Peter Foy. 2023. GPT-4 for Financial Statements: Building an AI Analyst. MLQ AI. https://www.mlq.ai/gpt-4-financial-statements-ai-analyst/
[17] Xinyang Geng and Hao Liu. 2023. OpenLLaMA: An Open Reproduction of LLaMA. https://github.com/openlm-research/open_llama
[18] Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. A Survey of Quantization Methods for Efficient Neural Network Inference. arXiv:2103.13630 [cs.CV]
[19] John Goodell, Satish Kumar, Weng Marc Lim, and Debidutta Pattnaik. 2021. Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis. Journal of Behavioral and Experimental Finance 32 (2021). https://doi.org/10.1016/j.jbef.2021.100577
[20] Alex Graves. 2014. Generating Sequences With Recurrent Neural Networks. arXiv:1308.0850 [cs.NE]
[21] Aaryan Gupta, Vinya Dengre, Hamza Kheruwala, and Manan Shah. 2020. Comprehensive review of text-mining applications in finance. Journal of Financial Innovation 6 (2020). https://doi.org/10.1186/s40854-020-00205-1
[22] Edward J. Hu et al. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685 [cs.CL]
[23] Kyoung-jae Kim. 2003. Financial time series forecasting using support vector machines. Neurocomputing 55, 1 (2003), 307–319. https://doi.org/10.1016/S0925-2312(03)00372-2
[24] Dhiraj Kalamkar et al. 2019. A Study of BFLOAT16 for Deep Learning Training. arXiv:1905.12322 [cs.LG]
[25] Patrick Lewis et al. 2021. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401 [cs.CL]
[26] Yinheng Li. 2023. A Practical Survey on Zero-shot Prompt Design for In-context Learning. International Conference on Recent Advances in Natural Language Processing.
[27] Percy Liang et al. 2022. Holistic Evaluation of Language Models. arXiv:2211.09110 [cs.CL]
[28] Dakuan Lu et al. 2023. BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark. arXiv:2302.09432 [cs.CL]
[29] Cuicui Luo, Desheng Wu, and Dexiang Wu. 2017. A deep learning approach for credit scoring using credit default swaps. Engineering Applications of Artificial Intelligence 65 (2017), 465–470. https://doi.org/10.1016/j.engappai.2016.12.002
[30] Akib Mashrur, Wei Luo, Nayyar A. Zaidi, and Antonio Robles-Kelly. 2020. Machine Learning for Financial Risk Management: A Survey. IEEE Access 8 (2020), 203203–203223. https://doi.org/10.1109/ACCESS.2020.3036322
[31] Microsoft. 2023. Semantic Kernel. https://github.com/microsoft/semantic-kernel
[32] Chiara Valentina Misischia, Flora Poecze, and Christine Strauss. 2022. Chatbots in customer service: Their relevance and impact on service quality. Procedia Computer Science 201 (2022), 421–428. https://doi.org/10.1016/j.procs.2022.03.055
[33] OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
[34] Long Ouyang et al. 2022. Training language models to follow instructions with human feedback. arXiv:2203.02155 [cs.CL]
[35] Ahmet Murat Ozbayoglu, Mehmet Ugur Gudelek, and Omer Berat Sezer. 2020. Deep Learning for Financial Applications: A Survey. arXiv:2002.05786 [q-fin.ST]
[36] Cynthia Pagliaro, Dhagash Mehta, Han-Tai Shiao, Shaofei Wang, and Luwei Xiong. 2022. Investor Behavior Modeling by Analyzing Financial Advisor Notes: A Machine Learning Perspective. In Proceedings of the Second ACM International Conference on AI in Finance (ICAIF '21). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3490354.3494388
[37] Igor Radovanovic. 2023. Auto-GPT for finance – an exploratory guide. AlgoTrading101 Blog. https://algotrading101.com/learn/auto-gpt-finance-guide/
[38] Colin Raffel et al. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv:1910.10683 [cs.LG]
[39] Abhimanyu Roy, Jingyi Sun, Robert Mahoney, Loreto Alonzi, Stephen Adams, and Peter Beling. 2018. Deep learning detecting fraud in credit card transactions. In 2018 Systems and Information Engineering Design Symposium (SIEDS). 129–134. https://doi.org/10.1109/SIEDS.2018.8374722
[40] Omer Berat Sezer, Murat Ozbayoglu, and Erdogan Dogdu. 2017. A Deep Neural-Network Based Stock Trading System Based on Evolutionary Optimized Technical Analysis Parameters. Procedia Computer Science 114 (2017), 473–480. https://doi.org/10.1016/j.procs.2017.09.031
[41] Ashish Shah, Pratik Raj, Pushpam Kumar, Supriya P, and Asha H V. 2020. FinAID, A Financial Advisor Application using AI. 2282–2286. https://doi.org/10.35940/ijrte.a2951.059120
[42] Hugh Son. 2023. JPMorgan is developing a ChatGPT-like A.I. service that gives investment advice. CNBC. https://www.cnbc.com/2023/05/25/jpmorgan-develops-ai-investment-advisor.html
[43] Alex Tamkin, Miles Brundage, Jack Clark, and Deep Ganguli. 2021. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models. arXiv:2102.02503 [cs.CL]
[44] Rohan Taori et al. 2023. Stanford Alpaca: An Instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca
[45] Hugo Touvron et al. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971 [cs.CL]
[46] Ashish Vaswani et al. 2017. Attention Is All You Need. arXiv:1706.03762 [cs.CL]
[47] Junhao Wang, Yinheng Li, and Yijie Cao. 2019. Dynamic Portfolio Management with Reinforcement Learning. arXiv:1911.11880 [q-fin.PM]
[48] Yaqing Wang, Quanming Yao, James Kwok, and Lionel M. Ni. 2020. Generalizing from a Few Examples: A Survey on Few-Shot Learning. arXiv:1904.05046 [cs.LG]
[49] Jason Wei et al. 2022. Emergent Abilities of Large Language Models. arXiv:2206.07682 [cs.CL]
[50] Jason Wei et al. 2022. Chain of Thought Prompting Elicits Reasoning in Large Language Models. CoRR abs/2201.11903 (2022). https://arxiv.org/abs/2201.11903
[51] Qingsong Wen et al. 2023. Transformers in Time Series: A Survey. arXiv:2202.07125 [cs.LG]
[52] David West. 2000. Neural network credit scoring models. Computers & Operations Research 27, 11 (2000), 1131–1152. https://doi.org/10.1016/S0305-0548(99)00149-5
[53] William Todt, Ramtin Babaei, and Pedram Babaei. 2023. Fin-LLAMA: Efficient Finetuning of Quantized LLMs for Finance. https://github.com/Bavest/fin-llama
[54] BigScience Workshop: Teven Le Scao et al. 2023. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv:2211.05100 [cs.CL]
[55] Shijie Wu et al. 2023. BloombergGPT: A Large Language Model for Finance. arXiv:2303.17564 [cs.LG]
[56] Qianqian Xie et al. 2023. PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance. arXiv:2306.05443 [cs.CL]
[57] Frank Xing, Erik Cambria, and Roy Welsch. 2018. Natural language based financial forecasting: a survey. Artificial Intelligence Review 50 (2018). https://doi.org/10.1007/s10462-017-9588-9
[58] Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. 2023. FinGPT: Open-Source Financial Large Language Models. arXiv:2306.06031 [q-fin.ST]
[59] Jingfeng Yang et al. 2023. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. arXiv:2304.13712 [cs.CL]
[60] Shunyu Yao et al. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv:2305.10601 [cs.CL]
[61] YangMu Yu. 2023. Cornucopia-LLaMA-Fin-Chinese. https://github.com/jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese
[62] Aohan Zeng et al. 2023. GLM-130B: An Open Bilingual Pre-trained Model. In The Eleventh International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=-Aw0rrrPUF
[63] Boyu Zhang, Hongyang Yang, and Xiao-Yang Liu. 2023. Instruct-FinGPT: Financial Sentiment Analysis by Instruction Tuning of General-Purpose Large Language Models. arXiv:2306.12659 [cs.CL]
[64] Susan Zhang et al. 2022. OPT: Open Pre-trained Transformer Language Models. arXiv:2205.01068 [cs.CL]
[65] Wenxuan Zhang, Yue Deng, Bing Liu, Sinno Jialin Pan, and Lidong Bing. 2023. Sentiment Analysis in the Era of Large Language Models: A Reality Check. arXiv:2305.15005 [cs.CL]
[66] Xuanyu Zhang, Qing Yang, and Dongliang Xu. 2023. XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters. arXiv:2305.12002 [cs.CL]
[67] Zihao Zhang, Stefan Zohren, and Stephen Roberts. 2020. Deep Learning for Portfolio Optimization. The Journal of Financial Data Science 2, 4 (2020), 8–20. https://doi.org/10.3905/jfds.2020.1.042
[68] Ekaterina Zolotareva. 2021. Aiding Long-Term Investment Decisions with XGBoost Machine Learning Model. arXiv:2104.09341 [q-fin.CP]
