
Internship Report

« Quantitative Evaluation of Generative AI Models for Information Extraction »

Author: Alexandre WILK

Academic Tutor: Cyril Bénézet

Company: Amundi
Internship Supervisor: Tony Le Gallic

Final-Year Internship from April 15, 2024 to October 11, 2024


Internship Framework

Amundi
Amundi is a French asset management firm established on January 1, 2010, from the
merger of Crédit Agricole Asset Management and Société Générale Asset Management.
Today, it is the leading asset manager in Europe. Amundi specializes in active management
through UCITS (Undertakings for Collective Investment in Transferable Securities)
and in passive management by issuing ETFs. It is also active in real and alternative asset
investments, including real estate and private equity. Additionally, Amundi is involved
in employee savings plans, which is one of the reasons it is well-known to the general
public today. The company employs around 5,000 people worldwide, including 2,000 in
its French subsidiary. The Amundi Paris office is located at 90-91 Boulevard Pasteur in
the 15th arrondissement of Paris.

Internship Goals
The primary objective of this internship was to analyze and comprehensively compare
various generative AI models (such as GPT, BERT, Llama, etc.) to evaluate their strengths
and limitations in the context of information extraction. An additional objective was to
develop a methodology using AI to cross-verify and validate information from multiple
sources. This involves creating criteria and metrics to assess the accuracy and reliability
of the information. Experiments conducted to test the effectiveness of the selected AI
models required the collection of large datasets from various sources. The experimental
results will then be analyzed using statistical methods to evaluate the precision, reliability,
and efficiency of the AI models in the cross-verification process. Particular attention will
be given to identifying and evaluating potential risks and biases inherent in the AI models
used, especially concerning misinformation and source bias.

List of Acronyms and Abbreviations

• GenAI: Generative Artificial Intelligence

• LLM: Large Language Model

• NLP: Natural Language Processing

• FAISS: Facebook AI Similarity Search

• RAG: Retrieval-Augmented Generation

• ICL: In-Context Learning

• ETF: Exchange-Traded Fund

• GPT: Generative Pre-trained Transformer

• BERT: Bidirectional Encoder Representations from Transformers

• KID: Key Information Document

• PRIIP: Packaged Retail and Insurance-based Investment Products


Contents

1 Introduction

2 Overview of the possibilities
  2.1 Why use GenAI for Information Extraction?
  2.2 BERT
    2.2.1 What is BERT?
    2.2.2 BERT for Information Extraction
  2.3 LLMs
    2.3.1 Established LLM Models
    2.3.2 LLMs for Information Extraction

3 Experimentations
  3.1 Overview of the Business Requirements
    3.1.1 Sources of Data
    3.1.2 Evaluation Parameters
  3.2 Implementation
    3.2.1 In Context Learning vs Finetuning
    3.2.2 Leveraging In Context Learning: The Vector Database
    3.2.3 Final Design

4 Results and Limitations
  4.1 Results
    4.1.1 Accuracy, Precision, Recall
    4.1.2 Business feedback
  4.2 Possible Areas of Improvement
    4.2.1 Vote
    4.2.2 Tree of Thoughts and Chain of Thoughts
  4.3 Limitations
    4.3.1 Cost of Tokens
    4.3.2 The Issue with Closed Source Models

5 Conclusion

A Appendix: Sustainable Development and Social Responsibility
  A.1 Sustainable Development
  A.2 Social Responsibility

B Bibliography

1 Introduction
This report is written as part of my final-year internship for the Master’s program in
Quantitative Finance at Paris-Saclay University. I completed my six-month internship
with the Specialized Investment team at Amundi, supported by the Lab.

The Specialized Investment team is a diverse and multi-functional IT-dominant team


with varied profiles. The Lab consists of professionals specializing in machine learning,
with both NLP and LLM skills.

The goal of my internship was to evaluate and subsequently employ machine learning
models and/or generative AI to address several business requirements centered around a
common theme: transforming unstructured data from various sources (text files, PowerPoint
presentations, Word documents, PDFs, etc.) into data clean enough to be put in a
database.

This report is divided into three sections. In the first section, I will discuss the
state-of-the-art techniques for data extraction as depicted in the literature and the different
options I considered to address this issue. In the second section, I will detail my
experiments and the solutions I implemented. Finally, in the last section, I will address the
limitations of my work, the improvements I had time to implement, and those that remain
to be developed.

2 Overview of the possibilities

The first part of this internship report provides a comprehensive description focusing
on the use of BERT (Bidirectional Encoder Representations from Transformers) and Large
Language Models (LLMs) for extracting data from any input source. I believe it is
useful to have a brief overview of the choices one can make to handle data
extraction in order to understand the next steps of this report.

2.1 Why use GenAI for Information Extraction?


Generative AI, particularly architectures such as BERT and LLMs, has revolutionized
the field of information extraction due to their unprecedented ability to understand and
generate human-like text. One of the primary reasons for utilizing generative AI for in-
formation extraction from text, Word documents, PowerPoint presentations, and other
formats is its robust performance in tasks such as Named Entity Recognition (NER),
sentiment analysis, and text summarization. BERT, for instance, employs a transformer-
based model that reads the entire sentence bidirectionally, allowing it to understand the
context more accurately compared to traditional models. This capability is critical for
NER, where identifying proper nouns, dates, and specific terminologies within the proper
context can dramatically improve the accuracy of data extraction.

According to the research by Devlin et al. (2019), BERT achieved state-of-the-art re-
sults on various NER datasets, highlighting its efficacy in extracting meaningful informa-
tion [al19]. Furthermore, LLMs extend these capabilities by generating human-like text,
making them extremely useful for creating summaries or extracting essential information
from unstructured data. The ability of these models to understand and generalize across
different formats, such as text, Word, and PowerPoint, enables a more holistic approach
to data extraction, leading to improved efficiency and accuracy in transforming raw data
into actionable insights. Studies like Brown et al.’s GPT-3 research demonstrate the
unparalleled generative abilities of LLMs in diverse applications, thereby underscoring
the transformative impact these models have on information extraction tasks [al20]. As a
result, embracing generative AI for data extraction not only streamlines the process but
also enhances the capability to derive valuable insights from a wide array of data sources,
enabling more informed decision-making in various domains.

2.2 BERT

2.2.1 What is BERT?

The BERT model, introduced by researchers at Google AI, represents a significant ad-
vancement in NLP. To fully understand how BERT operates, it is essential to first grasp
the fundamentals of its underlying architecture—the Transformer model.

The Transformer, introduced in the paper "Attention is All You Need" by Vaswani et
al. in 2017 [al17], revolutionized NLP by departing from traditional recurrent neural
network (RNN) architectures. Instead of processing words sequentially, the Transformer
operates using a mechanism called self-attention, allowing it to consider the entire con-
text of a sentence at once. This mechanism calculates attention scores which indicate how
much focus to place on other parts of the input when encoding a particular word.

The Transformer architecture is built from two primary modules: an encoder $E$ and a
decoder $D$. The encoder processes an input sequence $X = (x_1, x_2, \ldots, x_n)$, with
$x_i \in \mathbb{R}^d$, and generates a contextual representation $Z = (z_1, z_2, \ldots, z_n)$,
where each $z_i \in \mathbb{R}^d$ is a fixed-size vector representation of the corresponding
word. Writing $E_l$ for the $l$-th encoder layer, the transformation can be formalized as

$$Z_l = E_l(Z_{l-1}), \qquad Z_0 = X, \qquad l = 1, \ldots, L,$$

where $L$ is the total number of encoder layers and $Z = Z_L$. The decoder, itself a stack
of $K$ layers, takes this representation $Z$ and generates the output sequence
$Y = (y_1, y_2, \ldots, y_m)$ autoregressively:

$$y_k = D(Z, y_{1:k-1}), \qquad k = 1, \ldots, m.$$

Each block in both the encoder and the decoder comprises two main components: a multi-head
self-attention mechanism (MHA) and a feed-forward neural network (FFN). The multi-head
self-attention mechanism is given by

$$\mathrm{MHA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O,$$

where each head is calculated as

$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V),$$

and the attention function is defined via scaled dot-product attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V.$$

Here, $Q$, $K$, and $V$ are the query, key, and value matrices, respectively, and the $W$ matrices
are learnable projection matrices.
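To make these formulas concrete, here is a minimal NumPy sketch of scaled dot-product attention and a single multi-head attention pass; the dimensions, number of heads, and random weights are purely illustrative.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n, n) attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # row-wise softmax
    return weights @ V                                        # (n, d_v) context vectors

def multi_head_attention(X, W_Q, W_K, W_V, W_O):
    # Project the input into each head, attend, concatenate, then project with W_O.
    heads = [scaled_dot_product_attention(X @ wq, X @ wk, X @ wv)
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    return np.concatenate(heads, axis=-1) @ W_O

# Toy example: n = 4 tokens, model dimension d = 8, h = 2 heads of size d_k = 4.
rng = np.random.default_rng(0)
n, d, h, d_k = 4, 8, 2, 4
X = rng.normal(size=(n, d))
W_Q = [rng.normal(size=(d, d_k)) for _ in range(h)]
W_K = [rng.normal(size=(d, d_k)) for _ in range(h)]
W_V = [rng.normal(size=(d, d_k)) for _ in range(h)]
W_O = rng.normal(size=(h * d_k, d))
Z = multi_head_attention(X, W_Q, W_K, W_V, W_O)              # (4, 8) contextual representations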

Figure 1: The Transformer Architecture

BERT is based on the Transformer, using only its "encoder" part. BERT consists of
an initial lexical embedding layer to represent words as vectors. These embeddings are
then provided as input to successive Transformer blocks. The model ends with a layer
called the "head" that aligns the resulting vectors from the last Transformer block with
the model’s vocabulary, allowing for a probability distribution over the lexicon to predict
a missing word.

BERT was designed to accept up to two sentences as input. The sequence of lexical
units (tokens) in the input always starts with the special unit "[CLS]" (for "classify")
and ends with the special unit "[SEP]" (for "separate"). If the sequence of units
contains two sentences, another [SEP] unit is inserted between the two sentences.

BERT is a pre-trained model for the following two objectives:


• Masked Language Modeling: one of the units in the sequence is replaced by the
[MASK] unit. The objective is for the output probability distribution of the model
to maximize the probability of predicting the masked unit.
• Next Sentence Prediction: the input sequence is composed of two sentences. The
model must predict (true or false) whether the two sentences are consecutive in the
training data or not.
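To illustrate the masked language modeling objective, the following short sketch (assuming the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint) asks BERT to predict a masked token from its bidirectional context:

from transformers import pipeline

# Fill-mask pipeline: BERT outputs a probability distribution over its vocabulary
# for the [MASK] position, exactly as described above.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Amundi is the leading asset [MASK] in Europe."):
    print(prediction["token_str"], round(prediction["score"], 3))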

2.2.2 BERT for Information Extraction

To perform information extraction using a pre-trained deep learning model, we need


to fine-tune the model on a specific information extraction task. The process begins
with acquiring a pre-trained bidirectional transformer model, which is typically trained
on large corpora. Let us denote the sequence of token embeddings as $X = (x_1, x_2, \ldots, x_n)$.
These embeddings are first passed through an encoder consisting of multiple attention
heads and feedforward layers, resulting in contextually enriched embeddings
$H = (h_1, h_2, \ldots, h_n)$.

The next step involves preparing a labelled dataset specific to our information extraction
task. For instance, in NER, each token $x_i$ in the dataset has a corresponding label $y_i$.
Fine-tuning starts by slightly modifying the architecture: a classification layer is added on
top of the output embeddings $H$. This classification layer can be formulated as

$$P(y_i \mid x_i) = \mathrm{softmax}(W h_i + b),$$

where $W$ and $b$ are the learnable parameters of the classification layer, and the softmax
converts the logits into a probability distribution over the possible classes.

During training, the objective is to minimize the cross-entropy loss, defined as

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{i,j} \log \hat{y}_{i,j},$$

where $N$ is the total number of tokens in the training set, $C$ is the number of classes,
$y_{i,j}$ is a binary indicator (0 or 1) of whether class $j$ is the correct label for token $x_i$,
and $\hat{y}_{i,j} = P(y_j \mid x_i)$ is the predicted probability that token $x_i$ belongs to
class $j$. The loss is minimized by backpropagation with gradient-based optimizers such as Adam.
Once fine-tuned, the model can be employed for information extraction by feeding new
text inputs and using the trained model to predict labels. The final output consists of
tokens tagged with the appropriate classes indicating the extracted information.
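As a brief illustration of this fine-tuning setup, here is a minimal sketch using the Hugging Face transformers library; the label set, the example sentence, and the all-zero gold labels are placeholders for a real annotated dataset.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical label set for a NER-style extraction task.
labels = ["O", "B-TRANSACTION", "I-TRANSACTION", "B-AMOUNT", "I-AMOUNT"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)  # adds the classification head P(y_i | x_i) = softmax(W h_i + b) on top of BERT

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One illustrative training step on a single annotated sentence.
encoding = tokenizer("Capital call of 1,000,000 EUR", return_tensors="pt")
gold_labels = torch.zeros_like(encoding["input_ids"])    # placeholder labels (all "O")

outputs = model(**encoding, labels=gold_labels)          # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()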

2.3 LLMs

2.3.1 Established LLM Models

A large language model, such as Google’s Gemini or Meta’s Llama, is a type of artificial
intelligence that uses deep learning techniques to understand and generate human-like
text. Built on the transformer architecture, which employs self-attention mechanisms,
these models are trained on vast amounts of textual data, allowing them to learn the in-
tricacies of language and context. They work by processing input text and predicting the
next word or sequence of words in a sentence. Each word or token is transformed into a nu-
merical representation, or embedding, which captures semantic and syntactic information.

The architecture of large language models, particularly those based on the transformer
architecture, is designed to effectively process and generate natural language text. The
key components of this architecture are the multi-headed self-attention mechanism and
the position-wise feedforward neural networks.

• Multi-Headed Self-Attention Mechanism: This component is crucial for understand-


ing the context within sequences. Unlike traditional recurrent neural networks
(RNNs) that process data sequentially, the self-attention mechanism allows the

model to weigh the importance of each word in a sentence relative to others. This
means each word can directly attend to every other word, capturing dependencies
without regard for their distance in the text. Multi-headed attention further expands
this capability by allowing the model to jointly attend to information from different
representation subspaces at different positions, providing a richer representation of
the context.

• Positional Encoding: Since the self-attention mechanism does not inherently con-
sider the order of words in the input sequence, positional encodings are added to
give the model some information about the relative or absolute positioning of the
words. Positional encodings can be either learned or fixed and are added to the
input embeddings at the bottom of the model architecture.

• Position-Wise Feedforward Neural Networks: After processing through the self-


attention layer, the output is passed through a position-wise feedforward neural
network. This part of the transformer consists of fully connected layers applied to
each position separately and identically. This includes two linear transformations
with a ReLU activation in between, which transforms the data further before passing
it onto the next layer.

• Layer Normalization and Residual Connections: Each sub-layer (self-attention and


feedforward neural network) in each transformer block has a residual connection
around it followed by layer normalization. Residual connections help in mitigating
the vanishing gradient problem by allowing gradients to flow through the networks
directly. Layer normalization is used to stabilize the training process and improve
the convergence time.

• Stacking of Blocks: The transformer architecture stacks multiple identical layers


of the mentioned blocks to form a deep network. Each layer’s output is the input
to the next layer, and this depth helps in learning more complex patterns and
relationships in the data. The final output of the top layer can be used for different
types of language tasks.

Overall, this architecture enables the model to handle long-range dependencies and
various nuances in language, making it extremely powerful for tasks involving understanding
and generating human-like text. Through extensive pre-training on diverse text
data, the model acquires a vast amount of common-sense, linguistic, and world knowledge,
which facilitates its application to a wide array of downstream tasks, such as information
extraction, with minimal task-specific adjustments.
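To make the positional-encoding component described above concrete, here is a short sketch of the fixed sinusoidal encodings used in the original Transformer (learned positional embeddings, as used in BERT and many LLMs, would simply be a trainable lookup table instead); the sequence length and model dimension are illustrative.

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]             # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encodings are simply added to the input token embeddings.
token_embeddings = np.random.normal(size=(16, 64))       # 16 tokens, d_model = 64
encoder_inputs = token_embeddings + sinusoidal_positional_encoding(16, 64)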

2.3.2 LLMs for Information Extraction

Large language models (LLMs) have emerged as powerful tools for information and
data extraction tasks. These models can be leveraged to extract structured information
from unstructured text through various techniques, including prompt engineering and tool
calling.

One effective approach is to use carefully crafted prompts that instruct the LLM to
extract specific types of information and format the output in a desired structure. For
example, you can prompt the model to identify and extract entities, relationships, or
key facts from a given text passage. To further enhance the extraction process, one can
utilize tool-calling capabilities, such as JSON mode, which allows the LLM to generate
structured output directly in JSON format. This approach is particularly useful when you
need to extract multiple data points or complex relationships from text. Prompt
engineering plays a crucial role in optimizing the extraction process. By designing clear
and specific prompts, you can guide the LLM to focus on relevant information and ignore
irrelevant details. Additionally, few-shot learning techniques [al20] can be employed by
providing the model with examples of the desired extraction format, which can significantly
improve accuracy and consistency. It is important to note that while LLMs excel at
understanding context and semantics, they may sometimes produce inconsistent or
hallucinated information. To mitigate this, it is advisable to implement post-processing
steps or validation mechanisms to ensure the extracted data’s accuracy and reliability.
Furthermore, for domain-specific extraction tasks, fine-tuning the LLM on relevant
datasets can lead to improved performance and more accurate extractions.
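As an illustration of this prompting approach, here is a minimal sketch assuming the OpenAI Python SDK with a JSON-mode-capable chat model; the model name, the schema fields, and the sample text are illustrative only.

import json
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

schema_instructions = (
    "Extract the following fields from the text and answer with a single JSON object: "
    '{"transaction_type": string, "amount": number, "currency": string, '
    '"value_date": string}. Use null when a field is absent.'
)

text = "Notice: a capital call of EUR 250,000.00 is due with value date 2024-06-28."

response = client.chat.completions.create(
    model="gpt-4-turbo",                      # illustrative model name
    response_format={"type": "json_object"},  # "JSON mode": forces syntactically valid JSON
    messages=[
        {"role": "system", "content": schema_instructions},
        {"role": "user", "content": text},
    ],
)

extracted = json.loads(response.choices[0].message.content)
print(extracted["transaction_type"], extracted["amount"])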

3 Experimentations
In order to provide a comprehensive analysis of my internship experience, it is essential to
first understand the fundamental business needs that drive the organization’s operations.
This section will delve into the specific requirements and challenges faced by the business,
and how one can leverage the power of LLMs to tackle those issues.

3.1 Overview of the Business Requirements

3.1.1 Sources of Data

During my internship, I worked on two distinct projects, both centered around the
structured extraction of information from unstructured sources (texts, images, PDFs,
etc.).

• The first project was for the "Ingénieurie Opérationelle" team: This team ensures
that the data within the various documents describing Amundi’s structured products
(such as retail products or formula-based funds) is consistent. There is a list of
about twenty pieces of information to check in each document, and about twenty
documents to review for each product. These documents can include marketing
materials, regulatory papers, and financial agreement documents between different
counterparties (term sheets), among others.

Figure 2: On the Left: Example file On the Right: Expected output (control file)

• The second project was for the Private Equity team: This team focuses on investing
in private equity funds of funds. They receive around fifteen transaction notices
daily. From these notices, it is necessary to extract each type of listed transaction
(such as capital calls, distributions, and about ten other types) along with their
amounts, and input them into a database. Most of the time, these files take the
form of a scanned PDF document.

3.1.2 Evaluation Parameters

To assess the performance of LLMs in the context of data extraction, several key
metrics are employed, each providing distinct insights into various aspects of the model’s
efficacy:

• Reliability: This metric evaluates the consistency of the LLM by measuring the
percentage of times it returns valid labels without errors. It is computed as the
average success rate across all data rows:

$$\mathrm{Reliability} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{percent\_successful}_i$$
• Latency: Latency measures the time efficiency of the LLM, specifically the 95th
percentile of the time taken to process the data. This metric helps in understanding
the worst-case performance scenario:

$$\mathrm{Latency}_{95} = \text{95th percentile of } \{t_i\}$$

• Precision: The micro average of precision indicates the accuracy of the labels pro-
vided by the LLM by calculating the ratio of correctly identified positive labels to
the total number of positive labels identified:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where $TP$ represents true positives and $FP$ represents false positives.
• Recall: The micro average of recall measures the ability of the LLM to identify all
relevant instances within the data by calculating the ratio of correctly identified
positive labels to the total number of actual positive instances:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where $FN$ represents false negatives.
• F1 Score: The micro average of the F1 score:

$$F1_{\mathrm{micro}} = \frac{2 \cdot TP_{\mathrm{total}}}{2 \cdot TP_{\mathrm{total}} + FP_{\mathrm{total}} + FN_{\mathrm{total}}}$$

where $TP_{\mathrm{total}}$, $FP_{\mathrm{total}}$, and $FN_{\mathrm{total}}$ are the total numbers of true positives, false positives, and false negatives across all classes.
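As a brief illustration of how these micro-averaged metrics can be computed for an extraction task, here is a small sketch that compares predicted field values against expected ones (the field names are illustrative, and counting a wrong extraction as both a false positive and a false negative is one possible convention):

def micro_metrics(expected, predicted):
    # expected / predicted: dicts mapping field names to extracted values (None = absent).
    tp = fp = fn = 0
    for field, gold in expected.items():
        pred = predicted.get(field)
        if gold is not None and pred == gold:
            tp += 1
            continue
        if pred is not None and pred != gold:
            fp += 1          # a value was extracted but it is wrong
        if gold is not None and pred != gold:
            fn += 1          # an expected value was missed or mis-extracted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, f1

expected = {"transaction_type": "capital call", "amount": 250000.0, "currency": "EUR"}
predicted = {"transaction_type": "capital call", "amount": 250000.0, "currency": "USD"}
print(micro_metrics(expected, predicted))   # (0.667, 0.667, 0.667) up to rounding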

3.2 Implementation

3.2.1 In Context Learning vs Finetuning

LLMs are powerful tools trained on vast and diverse datasets. They exhibit remark-
able generalization abilities, making them suitable for a wide range of tasks. However, to
maximize their utility for domain-specific applications or specialized use cases, fine-tuning
or in-context learning is often employed.

General-purpose LLMs are designed to handle a broad spectrum of queries but may lack
the nuanced understanding required for specific domains, such as legal, medical, or finan-
cial applications. Fine-tuning and in-context learning are two ways to make the model
understand the specific needs of the task.

Fine-tuning and in-context learning are two primary paradigms for leveraging large lan-
guage models in specialized tasks like information extraction. Fine-tuning involves updat-
ing the pre-trained model’s parameters using task-specific labeled datasets. This process
adapts the model to a particular task or domain, improving performance for well-defined
objectives. Fine-tuning typically requires access to substantial labeled data and com-
putational resources, as well as careful management to avoid overfitting or catastrophic
forgetting of the model’s pre-existing capabilities.

In contrast, in-context learning (ICL) involves prompting the LLM with examples or
instructions during inference without modifying its parameters. By simply providing a
carefully designed input prompt that includes task-specific demonstrations or descriptions,
the model can generate accurate predictions for the task at hand. This method leverages
the LLM’s pre-trained knowledge and eliminates the need for additional training, making
it flexible, efficient, and less resource-intensive.

This study [al24a] highlights the comparative advantages of in-context learning, particu-
larly in scenarios with limited labeled data. The authors demonstrate that ICL achieves
competitive performance compared to fine-tuning for various information extraction tasks
while avoiding the extensive setup and computational cost associated with fine-tuning.
Additionally, ICL adapts more easily to new tasks or domains by simply reconfiguring
the prompt, whereas fine-tuning would require retraining. Another main advantage is
that one can make the LLM evolve as business requirements evolve (such as new
keywords, new formats, etc.) without retraining, by leveraging another tool at inference
time: the vector database.

3.2.2 Leveraging In Context Learning: The Vector Database

In-Context Learning (ICL) is effective in improving a large language model’s (LLM’s)


output by including examples in the prompt. However, when Many-Shot In-Context
Learning isn’t feasible due to the length of examples exceeding the context window (as
seen in two business issues I encountered), we must use Few-Shot ICL, which performs
significantly worse. This reduced performance is mainly because the limited set of exam-
ples is often not relevant. In contrast, providing 200 examples increases the likelihood of
at least one being useful to enhance the LLM’s output.

To address this issue, I propose combining Few-Shot In-Context Learning with a vector
database at runtime. Indeed, a vector database allows us to quickly and efficiently
store and retrieve similar items. When a new input is given to the LLM, it first
passes through the vector database, which selects the most similar examples to add to
the prompt. The design of the solution is described in Section 3.2.3; the main focus of
this part is to describe the vector database architecture.

A vector database is a specialized database optimized to store, index, and query high-
dimensional data represented as vectors. Vectors are mathematical entities often used to
encode data such as text, images, or audio into numerical formats suitable for machine
learning and similarity search. During this internship, I chose to use FAISS (Facebook
AI Similarity Search) to perform the queries on a vector database constructed using
OpenAI’s text-embedding-3 vector embeddings (the vectors being stored as blobs).

FAISS (Facebook AI Similarity Search) is an open-source library designed for efficient


similarity search and clustering of dense vectors. Developed by Meta AI, FAISS excels
in performing approximate nearest neighbor (ANN) searches on large datasets containing
high-dimensional vectors.

Let a dataset consist of $N$ vectors, each in $d$-dimensional space:

$$\mathcal{D} = \{v_1, v_2, \ldots, v_N\}, \qquad v_i \in \mathbb{R}^d$$
FAISS operates in two main phases. First, it builds a searchable index from the vector
database (in our case, it uses k-means clustering to organize vectors into groups). Then,
it can execute queries to find vectors similar to the given query using similarity search.
Formally, the result of the similarity search is a subset $\mathcal{D}'$ defined as follows:
given a dataset $\mathcal{D}$ and a query vector $q$, similarity search seeks the subset
$\mathcal{D}' \subset \mathcal{D}$ whose vectors maximize a similarity function $s(q, v)$:

$$\mathcal{D}' = \{v \in \mathcal{D} \mid s(q, v) \geq \tau\},$$

where $\tau$ is a similarity threshold. In our case, the similarity function I chose is the
cosine similarity (which also happens to be the most widely used, since it yields the best
performance [al24b]). The cosine similarity is defined as

$$\mathrm{sim}(q, v) = \frac{q \cdot v}{\lVert q \rVert\, \lVert v \rVert} = \frac{\sum_{i=1}^{d} q_i v_i}{\sqrt{\sum_{i=1}^{d} q_i^2}\, \sqrt{\sum_{i=1}^{d} v_i^2}},$$

where $q_i$ and $v_i$ are the components of the vectors $q$ and $v$, respectively.
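A minimal sketch of these two phases with FAISS is shown below; the vector dimension, number of clusters, and random data are illustrative, and cosine similarity is obtained by L2-normalizing the vectors and searching with the inner-product metric.

import numpy as np
import faiss

d = 1536                                     # embedding dimension (e.g. text-embedding-3-small)
stored_vectors = np.random.rand(10000, d).astype("float32")

# Phase 1: build the index. IndexIVFFlat partitions the vectors into nlist k-means
# clusters; with L2-normalized vectors, inner product is equivalent to cosine similarity.
faiss.normalize_L2(stored_vectors)
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, 100, faiss.METRIC_INNER_PRODUCT)
index.train(stored_vectors)
index.add(stored_vectors)

# Phase 2: query. Retrieve the k most similar stored examples for a new input.
query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
index.nprobe = 8                             # number of clusters explored per query
similarities, ids = index.search(query, 5)
print(ids[0], similarities[0])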

3.2.3 Final Design

The goal of the presented solution is to extract structured information with type safety
from unstructured data sources (such as PDFs, PowerPoint presentations, text files, etc.).
The extraction should leverage the full power of LLMs through in-context learning on a
specific targeted domain and also "learn" from its mistakes when a new input is wrongly
extracted. This is achieved through the vector database of examples, which at runtime
chooses the most relevant examples from the database and adds them to the prompt for extraction.

Figure 3: Design of the solution

The file refers to the document one wishes to extract structured information from.

• This file first goes through an OCR engine (such as Tesseract) or a vision LLM (such
as Microsoft Phi-3 Vision) so that text is extracted from it (either as a plain string
in the case of OCR, or as a Markdown string in the case of a vision LLM, which allows
for richer information about the text structure but is also more computationally
intensive).

• Then the text is processed (useless information is removed) and embedded into a vector
using OpenAI’s text-embedding-3, to be used for similarity search. The similarity
search yields a few examples that are added to the prompt. The prompt thus
contains the JSON schema format instructions, the data extraction instructions, the
text processed from the file, and a few examples similar to the file.

• The prompt is then sent to the LLM, which responds in text format. This text is
then processed through a parsing engine (in this case, the LangChain one is used)
that returns the well-formatted JSON output (and the Python dict with type safety
using pydantic schemas). A condensed sketch of this pipeline is given below.
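The following condensed sketch illustrates this pipeline end to end. It is only a sketch under several assumptions: the perform_ocr helper, the FAISS index of past examples, the schema fields, and the model names are illustrative, and LangChain's PydanticOutputParser is used here as one possible parsing engine.

import numpy as np
import faiss
from openai import OpenAI
from pydantic import BaseModel
from langchain_core.output_parsers import PydanticOutputParser

class TransactionNotice(BaseModel):          # illustrative target schema (type safety via pydantic)
    transaction_type: str
    amount: float
    currency: str

client = OpenAI()
parser = PydanticOutputParser(pydantic_object=TransactionNotice)

def embed(text: str) -> np.ndarray:
    # Embed the (pre-processed) text with OpenAI's embedding endpoint, normalized for cosine search.
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    vector = np.array(response.data[0].embedding, dtype="float32")[None, :]
    faiss.normalize_L2(vector)
    return vector

def extract(file_text: str, index: faiss.Index, stored_examples: list) -> TransactionNotice:
    # 1) Retrieve the most similar past examples from the vector database.
    _, ids = index.search(embed(file_text), 3)
    examples = "\n\n".join(stored_examples[i] for i in ids[0] if i != -1)
    # 2) Build the prompt: format instructions + similar examples + the new document's text.
    prompt = (
        f"{parser.get_format_instructions()}\n\n"
        f"Here are similar documents with their expected output:\n{examples}\n\n"
        f"Extract the information from this document:\n{file_text}"
    )
    # 3) Call the LLM and parse its answer into the typed schema.
    response = client.chat.completions.create(
        model="gpt-4-turbo", messages=[{"role": "user", "content": prompt}]
    )
    return parser.parse(response.choices[0].message.content)

# file_text = perform_ocr("transaction_notice.pdf")   # hypothetical OCR / vision-LLM step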

4 Results and Limitations

So, does it work?

4.1 Results

4.1.1 Accuracy, Precision, Recall

statssssss

4.1.2 Business feedback

What do the business teams say?

4.2 Possible Areas of Improvement

4.2.1 Vote

Results when applying the vote.

4.2.2 Tree of Thoughts and Chain of Thoughts

Article to draw on.

4.3 Limitations

4.3.1 Cost of Tokens

While LLMs are a strong asset for information extraction, they are also costly. Indeed,
while BERT can run on any machine with a small GPU and is open source, this is not the
case for most LLMs. For instance, among the top 10 models on the Chatbot Arena
leaderboard (an Elo-like system where answers to prompts are rated by users, yielding an
Arena Score), there is currently only one open-source model (Meta’s Llama 3.1 405B-parameter
model). A company without a proper GPU infrastructure cannot run this type of model,
with hundreds of billions of parameters, at scale.

Figure 4: Chatbot Arena leaderboard as of 01/11/2024

This means companies have to resort to LLM providers and their pay-as-you-consume
token pricing. During my internship, I mostly used GPT-4 Turbo through Azure services,
with the following cost per token:

Model                      Input Cost (per 1K tokens)    Output Cost (per 1K tokens)
GPT-4 Turbo                $0.01                         $0.03
GPT-4 32K                  $0.06                         $0.12
GPT-3.5 Turbo              $0.001                        $0.002
GPT-3.5 Turbo Instruct     $0.0015                       $0.0020

Table 1: Azure OpenAI Service Pricing for GPT Models (as of November 2024)

Ignoring the embedding costs when building the vector database, which is generally a
one-time expense and relatively cheap (at $0.0001 per 1,000 tokens, it’s 100 times cheaper
than GPT-4 Turbo), the cost of a single document extraction with GPT-4 Turbo is as
follows:

Total Cost = (15K input tokens × $0.01/1K) + (1.5K output tokens × $0.03/1K) = $0.195

While 20 cents per document extraction might not seem like much, these extractions
occur daily for about 10 documents for the private equity team and twice a week for
around 15 documents for the Ingénieurie Opérationnelle team. Annually, this amounts to
approximately $820 solely for the cost of tokens consumed for these two relatively small,
data-intensive use cases. Additionally, because the GPU infrastructure is not managed
by Amundi and the models available are not open source, there is significant exposure to
price variations in both token costs and model availability.
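A small helper reproducing this back-of-the-envelope estimate (prices taken from Table 1, document volumes and token counts as described above; the number of business days is an assumption):

def annual_token_cost(docs_per_year, input_tokens, output_tokens,
                      input_price_per_1k=0.01, output_price_per_1k=0.03):
    # Cost per document = input and output token counts priced per 1K tokens.
    cost_per_doc = (input_tokens / 1000) * input_price_per_1k \
                 + (output_tokens / 1000) * output_price_per_1k
    return docs_per_year * cost_per_doc

# ~10 documents per business day for one team, ~15 documents twice a week for the other,
# with roughly 15K input and 1.5K output tokens per document.
docs_per_year = 10 * 252 + 15 * 2 * 52
print(round(annual_token_cost(docs_per_year, 15_000, 1_500), 2))   # ~ $800 per year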

4.3.2 The Issue with Closed Source Models

Example of GPT-4 randomness.

5 Conclusion

To conclude, blah blah...

Here is a reference to the introduction: 1.

Here is a reference to the page where the introduction begins: 1. Careful: the number
happens to be the same here, but these two references have nothing to do with each other!

The conclusion should place your work in a fairly broad context:
usage, economic assessment, ideas for future development,
etc. It must include:

• A summary of the contributions

• An evaluation of the mission and the project, of whether the objectives were reached
and whether the actions succeeded or not (the lessons-learned review, REX), clearly
stating the status of the work (in production, in testing, etc.)

• Which continuous-improvement steps are envisaged after the project is deployed,
and short-, medium-, or long-term perspectives, where applicable.

OK, see you — end of the main content.

A Appendix: Sustainable Development and Social Responsibility

Some placeholder text to introduce the section.

You can develop this appendix by presenting, for example, "Policy and objectives",
"Implementation", and "Measurement of results" (this is what I did in my own third-year report).
More precisely, I did this for four subsections: "Human rights", "Labour", "Environment",
and "Anti-corruption", rather than just sustainable development on one side and social
responsibility on the other.
Adapt it according to what you can say about your company.

A.1 Sustainable Development

Policy and objectives:

Filler text.

Implementation:

More filler text.

Measurement of results:

Still more filler text.

A.2 Social Responsibility

Same here.

B Bibliography

References

[al17] Vaswani et al. Attention Is All You Need. https://arxiv.org/abs/1706.03762. 2017.

[al19] Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805. 2019.

[al20] Brown et al. Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165. 2020.

[al24a] Agarwal et al. Many-Shot In-Context Learning. https://arxiv.org/pdf/2404.11018. 2024.

[al24b] Liu et al. In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering. https://arxiv.org/abs/2311.06668. 2024.
List of Figures

1 The Transformer Architecture
2 On the Left: Example file On the Right: Expected output (control file)
3 Design of the solution
4 Chatbot Arena leaderboard as of 01/11/2024

List of Tables

1 Azure OpenAI Service Pricing for GPT Models (as of November 2024)
Abstract

Write the abstract here.

The abstract must be written out; ten to fifteen lines are enough (it should be
possible to skim it quickly), and it is where the keywords are given; it is placed
on the back cover of the report.
Summarizing the report consists of presenting its content in abridged form, so
that readers can identify its essential content. It must guarantee quicker access
to the information provided. Precise and pleasant to read, the abstract must be
entirely faithful to the logic of the report and its development.
Simple and concise, it forms a whole that could be published independently as a
bibliographic record.

Keywords: keyword1, keyword2, keyword3, keyword4, keyword5

The abstract must be completed with a series of keywords. Numbering three to
five, these terms go from the most general to the most specific. They too serve
to characterize, in a very synthetic way, the essential content of the report.
