
Internship Report

« Quantitative Evaluation of Generative AI Models for Information Extraction »

Author: Alexandre WILK

Academic Tutor: Cyril Bénézet

Company: Amundi
Internship Supervisor: Tony Le Gallic

Final-Year Internship from April 15, 2024 to October 11, 2024


Internship Framework

Amundi
Amundi is a French asset management firm established on January 1, 2010, from the
merger of Crédit Agricole Asset Management and Société Générale Asset Management.
Today, it is the leading asset manager in Europe. Amundi specializes in active management
through UCITS (Undertakings for Collective Investment in Transferable Securities)
and in passive management by issuing ETFs. It is also active in real and alternative asset
investments, including real estate and private equity. Additionally, Amundi is involved
in employee savings plans, which is one of the reasons it is well-known to the general
public today. The company employs around 5,000 people worldwide, including 2,000 in
its French subsidiary. The Amundi Paris office is located at 90-91 Boulevard Pasteur in
the 15th arrondissement of Paris.

Internship Goals
The primary objective of this internship was to analyze and comprehensively compare
various generative AI models (such as GPT, BERT, Llama, etc.) to evaluate their strengths
and limitations in the context of information extraction. An additional objective was to
develop a methodology using AI to cross-verify and validate information from multiple
sources. This involves creating criteria and metrics to assess the accuracy and reliability
of the information. Experiments conducted to test the effectiveness of the selected AI
models required the collection of large datasets from various sources. The experimental
results will then be analyzed using statistical methods to evaluate the precision, reliability,
and efficiency of the AI models in the cross-verification process. Particular attention will
be given to identifying and evaluating potential risks and biases inherent in the AI models
used, especially concerning misinformation and source bias.

List of Acronyms and Abbreviations

• GenAI: Generative Artificial Intelligence

• LLM: Large Language Model

• NLP: Natural Language Processing

• FAISS: Facebook AI Similarity Search

• RAG: Retrieval-Augmented Generation

• ICL: In-Context Learning

• ETF: Exchange-Traded Fund

• GPT: Generative Pre-trained Transformer

• BERT: Bidirectional Encoder Representations from Transformers

• KID: Key Information Document

• PRIIP: Packaged Retail and Insurance-based Investment Products


Contents

1 Introduction

2 Overview of the possibilities
  2.1 Why use GenAI for Information Extraction?
  2.2 BERT
    2.2.1 What is BERT?
    2.2.2 BERT for Information Extraction
  2.3 LLMs
    2.3.1 Established LLM Models
    2.3.2 LLMs for Information Extraction

3 Experimentations
  3.1 Overview of the Business Requirements
    3.1.1 Sources of Data
    3.1.2 Evaluation Parameters
  3.2 Implementation
    3.2.1 In Context Learning vs Finetuning
    3.2.2 Leveraging In Context Learning: The Vector Database
    3.2.3 Final Design

4 Results and Limitations
  4.1 Results
    4.1.1 Accuracy, Precision, Recall
    4.1.2 Business feedback
  4.2 Possible Areas of Improvement
    4.2.1 Vote
    4.2.2 Tree of Thoughts and Chain of Thoughts
  4.3 Limitations
    4.3.1 Cost of Tokens
    4.3.2 The Issue with Closed Source Models

5 Conclusion

A Appendix: Sustainable Development and Social Responsibility
  A.1 Sustainable Development
  A.2 Social Responsibility

B Bibliography

1 Introduction
This report is written as part of my final-year internship for the Master’s program in
Quantitative Finance at Paris-Saclay University. I completed my six-month internship
with the Specialized Investment team at Amundi, supported by the Lab.

The Specialized Investment team is a diverse and multi-functional IT-dominant team


with varied profiles. The Lab consists of professionals specializing in machine learning,
with both NLP and LLM skills.

The goal of my internship was to evaluate and subsequently employ machine learning
models and/or generative AI to address several business requirements centered around a
common theme: transforming unstructured data from various sources (text files, PowerPoint
presentations, Word documents, PDFs, etc.) into data clean enough to be put in a
database.

This report is divided into three sections. In the first section, I will discuss the
state-of-the-art techniques for data extraction as depicted in the literature and the different
options I considered to address this issue. In the second section, I will detail my
experiments and the solutions I implemented. Finally, in the last section, I will address the
limitations of my work, the improvements I had time to implement, and those that remain
to be developed.

2 Overview of the possibilities

The first part of this internship report provides a comprehensive description focusing
on the use of BERT (Bidirectional Encoder Representations from Transformers) and Large
Language Models (LLMs) for extracting data from any input source. I believe it is
useful to have a brief overview of the choices one can make to handle data
extraction in order to understand the next steps of this report.

2.1 Why use GenAI for Information Extraction?


Generative AI, particularly architectures such as BERT and LLMs, has revolutionized
the field of information extraction due to their unprecedented ability to understand and
generate human-like text. One of the primary reasons for utilizing generative AI for in-
formation extraction from text, Word documents, PowerPoint presentations, and other
formats is its robust performance in tasks such as Named Entity Recognition (NER),
sentiment analysis, and text summarization. BERT, for instance, employs a transformer-
based model that reads the entire sentence bidirectionally, allowing it to understand the
context more accurately compared to traditional models. This capability is critical for
NER, where identifying proper nouns, dates, and specific terminologies within the proper
context can dramatically improve the accuracy of data extraction.

According to the research by Devlin et al. (2019), BERT achieved state-of-the-art re-
sults on various NER datasets, highlighting its efficacy in extracting meaningful informa-
tion [al19]. Furthermore, LLMs extend these capabilities by generating human-like text,
making them extremely useful for creating summaries or extracting essential information
from unstructured data. The ability of these models to understand and generalize across
different formats, such as text, Word, and PowerPoint, enables a more holistic approach
to data extraction, leading to improved efficiency and accuracy in transforming raw data
into actionable insights. Studies like Brown et al.’s GPT-3 research demonstrate the
unparalleled generative abilities of LLMs in diverse applications, thereby underscoring
the transformative impact these models have on information extraction tasks [al20]. As a
result, embracing generative AI for data extraction not only streamlines the process but
also enhances the capability to derive valuable insights from a wide array of data sources,
enabling more informed decision-making in various domains.

2.2 BERT

2.2.1 What is BERT?

The BERT model, introduced by researchers at Google AI, represents a significant ad-
vancement in NLP. To fully understand how BERT operates, it is essential to first grasp
the fundamentals of its underlying architecture—the Transformer model.

The Transformer, introduced in the paper "Attention is All You Need" by Vaswani et
al. in 2017 [al17], revolutionized NLP by departing from traditional recurrent neural
network (RNN) architectures. Instead of processing words sequentially, the Transformer
operates using a mechanism called self-attention, allowing it to consider the entire con-
text of a sentence at once. This mechanism calculates attention scores which indicate how
much focus to place on other parts of the input when encoding a particular word.

The Transformer architecture is built from two primary modules: an encoder $E$ and a
decoder $D$. The encoder processes an input sequence $X = (x_1, x_2, \ldots, x_n)$, with
$x_i \in \mathbb{R}^d$, and generates a contextual representation $Z = (z_1, z_2, \ldots, z_n)$,
where each $z_i \in \mathbb{R}^d$ is a fixed-size vector representation of the corresponding
word. Writing $E_l$ for the $l$-th encoder layer, the transformation can be formalized as

$$Z_l = E_l(Z_{l-1}), \qquad Z_0 = X, \qquad l = 1, \ldots, L,$$

where $L$ is the total number of encoder layers and $Z = Z_L$. The decoder, itself a stack
of $K$ layers, takes this representation $Z$ and generates the output sequence
$Y = (y_1, y_2, \ldots, y_m)$ autoregressively:

$$y_k = D(Z, y_{1:k-1}), \qquad k = 1, \ldots, m.$$

Each block in both the encoder and the decoder comprises two main components: a multi-head
self-attention mechanism (MHA) and a feed-forward neural network (FFN). The multi-head
self-attention mechanism is given by

$$\mathrm{MHA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O,$$

where each head is calculated as

$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V),$$

and the attention function is defined via scaled dot-product attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V.$$

Here, $Q$, $K$, and $V$ are the query, key, and value matrices, respectively, and the $W$ matrices
are learnable projection matrices.
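To make these formulas concrete, here is a minimal NumPy sketch of scaled dot-product attention and a single multi-head attention pass; the dimensions, number of heads, and random weights are purely illustrative.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n, n) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # row-wise softmax
    return weights @ V                                        # (n, d_v) context vectors

def multi_head_attention(X, W_Q, W_K, W_V, W_O):
    # Project the input into each head, attend, concatenate, then project with W_O.
    heads = [scaled_dot_product_attention(X @ wq, X @ wk, X @ wv)
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    return np.concatenate(heads, axis=-1) @ W_O

# Toy example: n = 4 tokens, model dimension d = 8, h = 2 heads of size d_k = 4.
rng = np.random.default_rng(0)
n, d, h, d_k = 4, 8, 2, 4
X = rng.normal(size=(n, d))
W_Q = [rng.normal(size=(d, d_k)) for _ in range(h)]
W_K = [rng.normal(size=(d, d_k)) for _ in range(h)]
W_V = [rng.normal(size=(d, d_k)) for _ in range(h)]
W_O = rng.normal(size=(h * d_k, d))
Z = multi_head_attention(X, W_Q, W_K, W_V, W_O)              # (4, 8) contextual representations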

Figure 1: The Transformer Architecture

BERT is based on the Transformer, using only its "encoder" part. BERT consists of
an initial lexical embedding layer to represent words as vectors. These embeddings are
then provided as input to successive Transformer blocks. The model ends with a layer
called the "head" that aligns the resulting vectors from the last Transformer block with
the model’s vocabulary, allowing for a probability distribution over the lexicon to predict
a missing word.

BERT was designed to accept up to two sentences as input. The sequence of lexical
units (tokens) in the input always starts with the special unit "[CLS]" (for "classify")
and ends with the special unit "[SEP]" (for "separate"). If the sequence of units
contains two sentences, another [SEP] unit is inserted between the two sentences.

BERT is a pre-trained model for the following two objectives:


• Masked Language Modeling: one of the units in the sequence is replaced by the
[MASK] unit. The objective is for the output probability distribution of the model
to maximize the probability of predicting the masked unit.
• Next Sentence Prediction: the input sequence is composed of two sentences. The
model must predict (true or false) whether the two sentences are consecutive in the
training data or not.
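To illustrate the masked language modeling objective, the following short sketch (assuming the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint) asks BERT to predict a masked token from its bidirectional context:

from transformers import pipeline

# Fill-mask pipeline: BERT outputs a probability distribution over its vocabulary
# for the [MASK] position, exactly as described above.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Amundi is the leading asset [MASK] in Europe."):
    print(prediction["token_str"], round(prediction["score"], 3))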

2.2.2 BERT for Information Extraction

To perform information extraction using a pre-trained deep learning model, we need


to fine-tune the model on a specific information extraction task. The process begins
with acquiring a pre-trained bidirectional transformer model, which is typically trained
on large corpora. Let us denote the sequence of token embeddings as $X = (x_1, x_2, \ldots, x_n)$.
These embeddings are first passed through an encoder consisting of multiple attention
heads and feedforward layers, resulting in contextually enriched embeddings
$H = (h_1, h_2, \ldots, h_n)$.

The next step involves preparing a labelled dataset specific to our information extraction
task. For instance, in NER, each token $x_i$ in the dataset has a corresponding label $y_i$.
Fine-tuning starts by slightly modifying the architecture: a classification layer is added on
top of the output embeddings $H$. This classification layer can be formulated as

$$P(y_i \mid x_i) = \mathrm{softmax}(W h_i + b),$$

where $W$ and $b$ are the learnable parameters of the classification layer, and the softmax
converts the logits into a probability distribution over the possible classes.

During training, the objective is to minimize the cross-entropy loss, defined as

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{i,j} \log \hat{y}_{i,j},$$

where $N$ is the total number of tokens in the training set, $C$ is the number of classes,
$y_{i,j}$ is a binary indicator (0 or 1) of whether class $j$ is the correct label for token $x_i$,
and $\hat{y}_{i,j} = P(y_j \mid x_i)$ is the predicted probability that token $x_i$ belongs to
class $j$. The loss is minimized by backpropagation with gradient-based optimizers such as Adam.
Once fine-tuned, the model can be employed for information extraction by feeding new
text inputs and using the trained model to predict labels. The final output consists of
tokens tagged with the appropriate classes indicating the extracted information.
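As a brief illustration of this fine-tuning setup, here is a minimal sketch using the Hugging Face transformers library; the label set, the example sentence, and the all-zero gold labels are placeholders for a real annotated dataset.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical label set for a NER-style extraction task.
labels = ["O", "B-TRANSACTION", "I-TRANSACTION", "B-AMOUNT", "I-AMOUNT"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)  # adds the classification head P(y_i | x_i) = softmax(W h_i + b) on top of BERT

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One illustrative training step on a single annotated sentence.
encoding = tokenizer("Capital call of 1,000,000 EUR", return_tensors="pt")
gold_labels = torch.zeros_like(encoding["input_ids"])    # placeholder labels (all "O")

outputs = model(**encoding, labels=gold_labels)          # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()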

2.3 LLMs

2.3.1 Established LLM Models

A large language model, such as Google’s Gemini or Meta’s Llama, is a type of artificial
intelligence that uses deep learning techniques to understand and generate human-like
text. Built on the transformer architecture, which employs self-attention mechanisms,
these models are trained on vast amounts of textual data, allowing them to learn the in-
tricacies of language and context. They work by processing input text and predicting the
next word or sequence of words in a sentence. Each word or token is transformed into a nu-
merical representation, or embedding, which captures semantic and syntactic information.

The architecture of large language models, particularly those based on the transformer
architecture, is designed to effectively process and generate natural language text. The
key components of this architecture are the multi-headed self-attention mechanism and
the position-wise feedforward neural networks.

• Multi-Headed Self-Attention Mechanism: This component is crucial for understand-


ing the context within sequences. Unlike traditional recurrent neural networks
(RNNs) that process data sequentially, the self-attention mechanism allows the

model to weigh the importance of each word in a sentence relative to others. This
means each word can directly attend to every other word, capturing dependencies
without regard for their distance in the text. Multi-headed attention further expands
this capability by allowing the model to jointly attend to information from different
representation subspaces at different positions, providing a richer representation of
the context.

• Positional Encoding: Since the self-attention mechanism does not inherently con-
sider the order of words in the input sequence, positional encodings are added to
give the model some information about the relative or absolute positioning of the
words. Positional encodings can be either learned or fixed and are added to the
input embeddings at the bottom of the model architecture.

• Position-Wise Feedforward Neural Networks: After processing through the self-


attention layer, the output is passed through a position-wise feedforward neural
network. This part of the transformer consists of fully connected layers applied to
each position separately and identically. This includes two linear transformations
with a ReLU activation in between, which transforms the data further before passing
it onto the next layer.

• Layer Normalization and Residual Connections: Each sub-layer (self-attention and


feedforward neural network) in each transformer block has a residual connection
around it followed by layer normalization. Residual connections help in mitigating
the vanishing gradient problem by allowing gradients to flow through the networks
directly. Layer normalization is used to stabilize the training process and improve
the convergence time.

• Stacking of Blocks: The transformer architecture stacks multiple identical layers


of the mentioned blocks to form a deep network. Each layer’s output is the input
to the next layer, and this depth helps in learning more complex patterns and
relationships in the data. The final output of the top layer can be used for different
types of language tasks.

Overall, this architecture enables the model to handle long-range dependencies and
various nuances in language, making it extremely powerful for tasks involving understanding
and generating human-like text. Through extensive pre-training on diverse text
data, the model acquires a vast amount of common-sense, linguistic, and world knowledge,
which facilitates its application to a wide array of downstream tasks, such as information
extraction, with minimal task-specific adjustments.
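To make the positional-encoding component described above concrete, here is a short sketch of the fixed sinusoidal encodings used in the original Transformer (learned positional embeddings, as used in BERT and many LLMs, would simply be a trainable lookup table instead); the sequence length and model dimension are illustrative.

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]             # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encodings are simply added to the input token embeddings.
token_embeddings = np.random.normal(size=(16, 64))       # 16 tokens, d_model = 64
encoder_inputs = token_embeddings + sinusoidal_positional_encoding(16, 64)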

2.3.2 LLMs for Information Extraction

Large language models (LLMs) have emerged as powerful tools for information and
data extraction tasks. These models can be leveraged to extract structured information
from unstructured text through various techniques, including prompt engineering and tool
calling.

One effective approach is to use carefully crafted prompts that instruct the LLM to
extract specific types of information and format the output in a desired structure. For
example, you can prompt the model to identify and extract entities, relationships, or
key facts from a given text passage. To further enhance the extraction process, one can
utilize tool-calling capabilities, such as JSON mode, which allows the LLM to generate
structured output directly in JSON format. This approach is particularly useful when you
need to extract multiple data points or complex relationships from text. Prompt
engineering plays a crucial role in optimizing the extraction process. By designing clear
and specific prompts, you can guide the LLM to focus on relevant information and ignore
irrelevant details. Additionally, few-shot learning techniques [al20] can be employed by
providing the model with examples of the desired extraction format, which can significantly
improve accuracy and consistency. It is important to note that while LLMs excel at
understanding context and semantics, they may sometimes produce inconsistent or
hallucinated information. To mitigate this, it is advisable to implement post-processing
steps or validation mechanisms to ensure the extracted data’s accuracy and reliability.
Furthermore, for domain-specific extraction tasks, fine-tuning the LLM on relevant
datasets can lead to improved performance and more accurate extractions.
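As an illustration of this prompting approach, here is a minimal sketch assuming the OpenAI Python SDK with a JSON-mode-capable chat model; the model name, the schema fields, and the sample text are illustrative only.

import json
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

schema_instructions = (
    "Extract the following fields from the text and answer with a single JSON object: "
    '{"transaction_type": string, "amount": number, "currency": string, '
    '"value_date": string}. Use null when a field is absent.'
)

text = "Notice: a capital call of EUR 250,000.00 is due with value date 2024-06-28."

response = client.chat.completions.create(
    model="gpt-4-turbo",                      # illustrative model name
    response_format={"type": "json_object"},  # "JSON mode": forces syntactically valid JSON
    messages=[
        {"role": "system", "content": schema_instructions},
        {"role": "user", "content": text},
    ],
)

extracted = json.loads(response.choices[0].message.content)
print(extracted["transaction_type"], extracted["amount"])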

3 Experimentations
In order to provide a comprehensive analysis of my internship experience, it is essential to
first understand the fundamental business needs that drive the organization’s operations.
This section will delve into the specific requirements and challenges faced by the business,
and how one can leverage the power of LLMs to tackle those issues.

3.1 Overview of the Business Requirements

3.1.1 Sources of Data

During my internship, I worked on two distinct projects, both centered around the
structured extraction of information from unstructured sources (texts, images, PDFs,
etc.).

• The first project was for the "Ingénieurie Opérationelle" team: This team ensures
that the data within the various documents describing Amundi’s structured products
(such as retail products or formula-based funds) is consistent. There is a list of
about twenty pieces of information to check in each document, and about twenty
documents to review for each product. These documents can include marketing
materials, regulatory papers, and financial agreement documents between different
counterparties (term sheets), among others.

Figure 2: On the Left: Example file On the Right: Expected output (control file)

• The second project was for the Private Equity team: This team focuses on investing
in private equity funds of funds. They receive around fifteen transaction notices
daily. From these notices, it is necessary to extract each type of listed transaction
(such as capital calls, distributions, and about ten other types) along with their
amounts, and input them into a database. Most of the time, these files take the
form of a scanned PDF document.

3.1.2 Evaluation Parameters

To assess the performance of LLMs in the context of data extraction, several key
metrics are employed, each providing distinct insights into various aspects of the model’s
efficacy:

• Reliability: This metric evaluates the consistency of the LLM by measuring the
percentage of times it returns valid labels without errors. It is computed as the
average success rate across all data rows:

$$\mathrm{Reliability} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{percent\_successful}_i$$
• Latency: Latency measures the time efficiency of the LLM, specifically the 95th
percentile of the time taken to process the data. This metric helps in understanding
the worst-case performance scenario:

$$\mathrm{Latency}_{95} = \text{95th percentile of } \{t_i\}$$

• Precision: The micro average of precision indicates the accuracy of the labels pro-
vided by the LLM by calculating the ratio of correctly identified positive labels to
the total number of positive labels identified:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where $TP$ represents true positives and $FP$ represents false positives.
• Recall: The micro average of recall measures the ability of the LLM to identify all
relevant instances within the data by calculating the ratio of correctly identified
positive labels to the total number of actual positive instances:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where $FN$ represents false negatives.
• F1 Score: The micro average of the F1 score:

$$F1_{\mathrm{micro}} = \frac{2 \cdot TP_{\mathrm{total}}}{2 \cdot TP_{\mathrm{total}} + FP_{\mathrm{total}} + FN_{\mathrm{total}}}$$

where $TP_{\mathrm{total}}$, $FP_{\mathrm{total}}$, and $FN_{\mathrm{total}}$ are the total numbers of true positives, false positives, and false negatives across all classes.
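As a brief illustration of how these micro-averaged metrics can be computed for an extraction task, here is a small sketch that compares predicted field values against expected ones (the field names are illustrative, and counting a wrong extraction as both a false positive and a false negative is one possible convention):

def micro_metrics(expected, predicted):
    # expected / predicted: dicts mapping field names to extracted values (None = absent).
    tp = fp = fn = 0
    for field, gold in expected.items():
        pred = predicted.get(field)
        if gold is not None and pred == gold:
            tp += 1
            continue
        if pred is not None and pred != gold:
            fp += 1          # a value was extracted but it is wrong
        if gold is not None and pred != gold:
            fn += 1          # an expected value was missed or mis-extracted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, f1

expected = {"transaction_type": "capital call", "amount": 250000.0, "currency": "EUR"}
predicted = {"transaction_type": "capital call", "amount": 250000.0, "currency": "USD"}
print(micro_metrics(expected, predicted))   # (0.667, 0.667, 0.667) up to rounding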

3.2 Implementation

3.2.1 In Context Learning vs Finetuning

LLMs are powerful tools trained on vast and diverse datasets. They exhibit remark-
able generalization abilities, making them suitable for a wide range of tasks. However, to
maximize their utility for domain-specific applications or specialized use cases, fine-tuning
or in-context learning is often employed.

General-purpose LLMs are designed to handle a broad spectrum of queries but may lack
the nuanced understanding required for specific domains, such as legal, medical, or finan-
cial applications. Fine-tuning and in-context learning are two ways to make the model
understand the specific needs of the task.

Fine-tuning and in-context learning are two primary paradigms for leveraging large lan-
guage models in specialized tasks like information extraction. Fine-tuning involves updat-
ing the pre-trained model’s parameters using task-specific labeled datasets. This process
adapts the model to a particular task or domain, improving performance for well-defined
objectives. Fine-tuning typically requires access to substantial labeled data and com-
putational resources, as well as careful management to avoid overfitting or catastrophic
forgetting of the model’s pre-existing capabilities.

In contrast, in-context learning (ICL) involves prompting the LLM with examples or
instructions during inference without modifying its parameters. By simply providing a
carefully designed input prompt that includes task-specific demonstrations or descriptions,
the model can generate accurate predictions for the task at hand. This method leverages
the LLM’s pre-trained knowledge and eliminates the need for additional training, making
it flexible, efficient, and less resource-intensive.

This study [al24a] highlights the comparative advantages of in-context learning, particu-
larly in scenarios with limited labeled data. The authors demonstrate that ICL achieves
competitive performance compared to fine-tuning for various information extraction tasks
while avoiding the extensive setup and computational cost associated with fine-tuning.
Additionally, ICL adapts more easily to new tasks or domains by simply reconfiguring
the prompt, whereas fine-tuning would require retraining. Another main advantage is
that one can make the LLM evolve as business requirements evolve (such as new
keywords, new formats, etc.) without retraining, by leveraging another tool at inference
time: the vector database.

3.2.2 Leveraging In Context Learning: The Vector Database

In-Context Learning (ICL) is effective in improving a large language model’s (LLM’s)


output by including examples in the prompt. However, when Many-Shot In-Context
Learning isn’t feasible due to the length of examples exceeding the context window (as
seen in two business issues I encountered), we must use Few-Shot ICL, which performs
significantly worse. This reduced performance is mainly because the limited set of exam-
ples is often not relevant. In contrast, providing 200 examples increases the likelihood of
at least one being useful to enhance the LLM’s output.

To address this issue, I propose combining Few-Shot In-Context Learning with a vector
database at runtime. Indeed, a vector database allows us to quickly and efficiently
store and retrieve similar items. When a new input is given to the LLM, it first
passes through the vector database, which selects the most similar examples to add to
the prompt. The design of the solution is described in Section 3.2.3; the main focus of
this part is to describe the vector database architecture.

A vector database is a specialized database optimized to store, index, and query high-
dimensional data represented as vectors. Vectors are mathematical entities often used to
encode data such as text, images, or audio into numerical formats suitable for machine
learning and similarity search. During this internship, I chose to use FAISS (Facebook
AI Similarity Search) to perform the queries on a vector database constructed using
OpenAI’s text-embedding-3 vector embeddings (the vectors being stored as blobs).

FAISS (Facebook AI Similarity Search) is an open-source library designed for efficient


similarity search and clustering of dense vectors. Developed by Meta AI, FAISS excels
in performing approximate nearest neighbor (ANN) searches on large datasets containing
high-dimensional vectors.

Let a dataset consist of $N$ vectors, each in $d$-dimensional space:

$$\mathcal{D} = \{v_1, v_2, \ldots, v_N\}, \qquad v_i \in \mathbb{R}^d$$
FAISS operates in two main phases. First, it builds a searchable index from the vector
database (in our case, it uses k-means clustering to organize vectors into groups). Then,
it can execute queries to find vectors similar to the given query using similarity search.
Formally, the result of the similarity search is a subset $\mathcal{D}'$ defined as follows:
given a dataset $\mathcal{D}$ and a query vector $q$, similarity search seeks the subset
$\mathcal{D}' \subset \mathcal{D}$ whose vectors maximize a similarity function $s(q, v)$:

$$\mathcal{D}' = \{v \in \mathcal{D} \mid s(q, v) \geq \tau\},$$

where $\tau$ is a similarity threshold. In our case, the similarity function I chose is the
cosine similarity (which also happens to be the most widely used, since it yields the best
performance [al24b]). The cosine similarity is defined as

$$\mathrm{sim}(q, v) = \frac{q \cdot v}{\lVert q \rVert\, \lVert v \rVert} = \frac{\sum_{i=1}^{d} q_i v_i}{\sqrt{\sum_{i=1}^{d} q_i^2}\, \sqrt{\sum_{i=1}^{d} v_i^2}},$$

where $q_i$ and $v_i$ are the components of the vectors $q$ and $v$, respectively.
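A minimal sketch of these two phases with FAISS is shown below; the vector dimension, number of clusters, and random data are illustrative, and cosine similarity is obtained by L2-normalizing the vectors and searching with the inner-product metric.

import numpy as np
import faiss

d = 1536                                     # embedding dimension (e.g. text-embedding-3-small)
stored_vectors = np.random.rand(10000, d).astype("float32")

# Phase 1: build the index. IndexIVFFlat partitions the vectors into nlist k-means
# clusters; with L2-normalized vectors, inner product is equivalent to cosine similarity.
faiss.normalize_L2(stored_vectors)
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, 100, faiss.METRIC_INNER_PRODUCT)
index.train(stored_vectors)
index.add(stored_vectors)

# Phase 2: query. Retrieve the k most similar stored examples for a new input.
query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
index.nprobe = 8                             # number of clusters explored per query
similarities, ids = index.search(query, 5)
print(ids[0], similarities[0])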

3.2.3 Final Design

The goal of the presented solution is to extract structured information with type safety
from unstructured data sources (such as PDFs, PowerPoint presentations, text files, etc.).
The extraction should leverage the full power of LLMs through in-context learning on a
specific targeted domain and also "learn" from its mistakes when a new input is wrongly
extracted. This is achieved through the vector database of examples, which at runtime
chooses the most relevant examples from the database and adds them to the prompt for extraction.

Figure 3: Design of the solution

The file refers to the document one wishes to extract structured information from.

• This file first goes through an OCR engine (such as Tesseract) or a vision LLM (such
as Microsoft Phi-3 Vision) so that text is extracted from it (either as a plain string
in the case of OCR, or as a Markdown string in the case of a vision LLM, which allows
for richer information about the text structure but is also more computationally
intensive).

• Then the text is processed (useless information is removed) and embedded into a vector
using OpenAI’s text-embedding-3, to be used for similarity search. The similarity
search yields a few examples that are added to the prompt. The prompt thus
contains the JSON schema format instructions, the data extraction instructions, the
text processed from the file, and a few examples similar to the file.

• The prompt is then sent to the LLM, which responds in text format. This text is
then processed through a parsing engine (in this case, the LangChain one is used)
that returns the well-formatted JSON output (and the Python dict with type safety
using pydantic schemas). A condensed sketch of this pipeline is given below.
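The following condensed sketch illustrates this pipeline end to end. It is only a sketch under several assumptions: the perform_ocr helper, the FAISS index of past examples, the schema fields, and the model names are illustrative, and LangChain's PydanticOutputParser is used here as one possible parsing engine.

import numpy as np
import faiss
from openai import OpenAI
from pydantic import BaseModel
from langchain_core.output_parsers import PydanticOutputParser

class TransactionNotice(BaseModel):          # illustrative target schema (type safety via pydantic)
    transaction_type: str
    amount: float
    currency: str

client = OpenAI()
parser = PydanticOutputParser(pydantic_object=TransactionNotice)

def embed(text: str) -> np.ndarray:
    # Embed the (pre-processed) text with OpenAI's embedding endpoint, normalized for cosine search.
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    vector = np.array(response.data[0].embedding, dtype="float32")[None, :]
    faiss.normalize_L2(vector)
    return vector

def extract(file_text: str, index: faiss.Index, stored_examples: list) -> TransactionNotice:
    # 1) Retrieve the most similar past examples from the vector database.
    _, ids = index.search(embed(file_text), 3)
    examples = "\n\n".join(stored_examples[i] for i in ids[0] if i != -1)
    # 2) Build the prompt: format instructions + similar examples + the new document's text.
    prompt = (
        f"{parser.get_format_instructions()}\n\n"
        f"Here are similar documents with their expected output:\n{examples}\n\n"
        f"Extract the information from this document:\n{file_text}"
    )
    # 3) Call the LLM and parse its answer into the typed schema.
    response = client.chat.completions.create(
        model="gpt-4-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return parser.parse(response.choices[0].message.content)

# file_text = perform_ocr("transaction_notice.pdf")   # hypothetical OCR / vision-LLM step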

4 Results and Limitations

So, does it work?

4.1 Results

4.1.1 Accuracy, Precision, Recall

statssssss

4.1.2 Business feedback

What do the business teams say?

4.2 Possible Areas of Improvement

4.2.1 Vote

Results when applying the vote.

4.2.2 Tree of Thoughts and Chain of Thoughts

Article to draw on.

4.3 Limitations

4.3.1 Cost of Tokens

While LLMs are a strong asset for information extraction, they are also costly. Indeed,
while BERT can run on any machine with a small GPU and is open source, this is not the
case for most LLMs. For instance, among the top 10 models on the Chatbot Arena
leaderboard (an Elo-like system where answers to prompts are rated by users, yielding an
Arena Score), there is currently only one open-source model (Meta’s Llama 3.1 405B-parameter
model). A company without a proper GPU infrastructure cannot run this type of model,
with hundreds of billions of parameters, at scale.

Figure 4: Chatbot Arena leaderboard as of 01/11/2024

This means companies have to resort to LLM providers and their pay-as-you-consume
token pricing. During my internship, I mostly used GPT-4 Turbo through Azure services,
with the following cost per token:

Model                      Input Cost (per 1K tokens)    Output Cost (per 1K tokens)
GPT-4 Turbo                $0.01                         $0.03
GPT-4 32K                  $0.06                         $0.12
GPT-3.5 Turbo              $0.001                        $0.002
GPT-3.5 Turbo Instruct     $0.0015                       $0.0020

Table 1: Azure OpenAI Service Pricing for GPT Models (as of November 2024)

Ignoring the embedding costs when building the vector database, which is generally a
one-time expense and relatively cheap (at $0.0001 per 1,000 tokens, it’s 100 times cheaper
than GPT-4 Turbo), the cost of a single document extraction with GPT-4 Turbo is as
follows:

Total Cost = (15K input tokens × $0.01/1K) + (1.5K output tokens × $0.03/1K) = $0.195

While 20 cents per document extraction might not seem like much, these extractions
occur daily for about 10 documents for the private equity team and twice a week for
around 15 documents for the Ingénieurie Opérationnelle team. Annually, this amounts to
approximately $820 solely for the cost of tokens consumed for these two relatively small,
data-intensive use cases. Additionally, because the GPU infrastructure is not managed
by Amundi and the models available are not open source, there is significant exposure to
price variations in both token costs and model availability.
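A small helper reproducing this back-of-the-envelope estimate (prices taken from Table 1, document volumes and token counts as described above; the number of business days is an assumption):

def annual_token_cost(docs_per_year, input_tokens, output_tokens,
                      input_price_per_1k=0.01, output_price_per_1k=0.03):
    # Cost per document = input and output token counts priced per 1K tokens.
    cost_per_doc = (input_tokens / 1000) * input_price_per_1k \
                 + (output_tokens / 1000) * output_price_per_1k
    return docs_per_year * cost_per_doc

# ~10 documents per business day for one team, ~15 documents twice a week for the other,
# with roughly 15K input and 1.5K output tokens per document.
docs_per_year = 10 * 252 + 15 * 2 * 52
print(round(annual_token_cost(docs_per_year, 15_000, 1_500), 2))   # ~ $800 per year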

4.3.2 The Issue with Closed Source Models

Example of GPT-4 randomness.

5 Conclusion

To conclude, blah blah...

Here is a reference to the introduction: 1.

Here is a reference to the page where the introduction begins: 1. Careful: the number
happens to be the same here, but these two references have nothing to do with each other!

The conclusion should place your work in a fairly broad context:
usage, economic assessment, ideas for future development,
etc. It must include:

• A summary of the contributions

• An evaluation of the mission and the project, of whether the objectives were reached
and whether the actions succeeded or not (the lessons-learned review, REX), clearly
stating the status of the work (in production, in testing, etc.)

• Which continuous-improvement steps are envisaged after the project is deployed,
and short-, medium-, or long-term perspectives, where applicable.

OK, see you — end of the main content.

A Appendix: Sustainable Development and Social Responsibility

Some placeholder text to introduce the section.

You can develop this appendix by presenting, for example, "Policy and objectives",
"Implementation", and "Measurement of results" (this is what I did in my own third-year report).
More precisely, I did this for four subsections: "Human rights", "Labour", "Environment",
and "Anti-corruption", rather than just sustainable development on one side and social
responsibility on the other.
Adapt it according to what you can say about your company.

A.1 Sustainable Development

Policy and objectives:

Filler text.

Implementation:

More filler text.

Measurement of results:

Still more filler text.

A.2 Social Responsibility

Same here.

B Bibliography

References

[al17] Vaswani et al. Attention Is All You Need. https://arxiv.org/abs/1706.03762. 2017.

[al19] Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805. 2019.

[al20] Brown et al. Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165. 2020.

[al24a] Agarwal et al. Many-Shot In-Context Learning. https://arxiv.org/pdf/2404.11018. 2024.

[al24b] Liu et al. In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering. https://arxiv.org/abs/2311.06668. 2024.
List of Figures

1 The Transformer Architecture
2 On the Left: Example file On the Right: Expected output (control file)
3 Design of the solution
4 Chatbot Arena leaderboard as of 01/11/2024

List of Tables

1 Azure OpenAI Service Pricing for GPT Models (as of November 2024)
Abstract

Write the abstract here.

The abstract must be written out; ten to fifteen lines are enough (it should be
possible to skim it quickly), and it is where the keywords are given; it is placed
on the back cover of the report.
Summarizing the report consists of presenting its content in abridged form, so
that readers can identify its essential content. It must guarantee quicker access
to the information provided. Precise and pleasant to read, the abstract must be
entirely faithful to the logic of the report and its development.
Simple and concise, it forms a whole that could be published independently as a
bibliographic record.

Keywords: keyword1, keyword2, keyword3, keyword4, keyword5

The abstract must be completed with a series of keywords. Numbering three to
five, these terms go from the most general to the most specific. They too serve
to characterize, in a very synthetic way, the essential content of the report.
