
Generative AI with Large Language Models (AWS & DeepLearning.AI) Full Course

Summary

Week 1
I. Examples of Generative AI with LLMs
II. LLM characteristics
III. Examples of LLMs
IV. The LLM process
V. Text generation before transformers: RNNs
VI. Presenting the Transformer architecture
VII. Understanding how Transformers work
VIII. Using transformers in text generation
IX. Summary of the roles of the Encoder and Decoder
X. Transformer variations
XI. The importance of prompt engineering for LLM output relevance
XII. Generative/inference configuration
XIII. Generative AI project life cycle
XIV. Lab 1: Dialogue summarization summary
XV. Video: Pre-training LLMs
XVI. Computational challenges of training LLMs
XVII. Efficient multi-GPU strategies
XVIII. Scaling laws and compute-optimal models
XIX. Pre-training for domain adaptation
Week 2
XX. LLM fine-tuning with supervised learning
XXI. Single-task fine-tuning
XXII. LLM model evaluation
XXIII. Model evaluation using benchmarks
XXIV. Parameter-Efficient Fine-Tuning (PEFT)
XXV. The LoRA technique
XXVI. PEFT with the soft prompts technique
Week 3
XXVII. Aligning models with human values
I. Reinforcement Learning from Human Feedback (RLHF)
II. Obtaining feedback from humans
III. Fourth step: Fine-tuning the LLM with an RL algorithm using the reward model
IV. Presenting Proximal Policy Optimization (PPO)
V. Reward hacking: a potential problem of RLHF
VI. Scaling human feedback using self-supervision
VII. Application integration: last phase of the generative AI project life cycle
VIII. LLM optimization techniques
IX. Cheat sheet on the time, effort, and steps of each stage in the presented phases of the life cycle
X. The need for external resources connected to the LLM
XI. Connecting to external datasets with RAG (Retrieval-Augmented Generation)
XII. Required steps to implement an LLM powered by external apps and APIs
XIII. LLMs struggle with complex (mathematical) reasoning problems
XIV. ReAct: Synergizing reasoning and acting in LLMs
XV. LangChain presentation
XVI. Final LLM architecture


Week 1
I. Examples of Generative AI with LLMs:
A. Q/A Chatbot :

B. Essay writer :


C. Dialogue summarization :

D. Traditional translation:


E. Natural language-machine code translation :

F. Entity extraction ( NER ) :


G. Augmented LLMs :

Here, we connect our LLMs to external APIs; thanks to that, the LLM can provide information that was unavailable in its pre-training data.

II. LLMs characteristics:


• They are a subset of generative AI models and architectures able to find statistical patterns in massive datasets originally created by humans.
• They are trained on billions of words for many weeks or months, using high computational power.

III. Examples of LLMs :


- Each circle is proportional to the number of parameters in the LLM.

- These models are called base models.
- They take a natural-language human task (a prompt) and generate output according to the prompt's content.

IV. LLMs process :

Any LLM workflow is composed of these components:

- Prompt: the text that we pass to the LLM.

- Context window: the memory space available for the prompt (generally a few thousand tokens).


- Completion: the model's output, which is considered a completion of the user's prompt; since the user's prompt was a question, its completion is the question's answer.
- Inference: the act of using the model to generate text.

V. Text generation before transformers: RNNs

With RNNs, we used to take only the N last words of the text to generate the following words.

The more words we take from the context, the more accurate the generated words are, but also the larger the RNN architecture becomes.


Even with large RNNs, we can get bad results because of their inability to take all the words of the context as input while generating more text.

H. Besides context length, human natural language is hard to understand perfectly:

Human language can be hard to understand and analyze, even for humans themselves, which makes the text generation task even more complicated.

That's why RNNs weren't sufficient.

VI. Presenting the Transformer architecture:


I. The Transformer's strength:

The Transformer's strength is its ability to capture the context of each word, not only with respect to its close neighbors but with respect to all the other words in the input, giving bigger weights to the most important word-to-word relationships (as shown in orange in the figure above).

This gives the model the ability to learn several pieces of information at the same time, such as: who has the book? who taught with the book?

This ability is called the self-attention mechanism.

J. Self-attention mechanism and attention Map :

• This figure shows the attention map, which groups together all the possible attention weights.
• These attention weights are learned during LLM training.

VII. Understanding how Transformers work:


K. Abstract presentation of the architecture :

- The Transformer architecture is split into two major parts: the Encoder and the Decoder.
- These components work in conjunction with each other and share some similarities.

L. First step : the tokenization


The inputs are going to be numbers instead of strings; they represent the word/token IDs given by the tokenizer.

- While using the model, we must use the same tokenizer that was used in the training phase.
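
As a rough illustration, here is a minimal sketch (using Hugging Face `transformers`; the checkpoint name is an example, not one prescribed by the notes) of turning text into token IDs and back:

```python
# Minimal tokenization sketch: text -> token IDs -> text.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")  # example checkpoint

text = "The teacher taught the student with the book."
token_ids = tokenizer(text).input_ids            # list of integer token IDs
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(token_ids)                                  # IDs are tokenizer-specific
print(tokens)                                     # sub-word pieces those IDs map to
print(tokenizer.decode(token_ids))                # round-trip back to text
```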

M. Second step : the embedding

- Each token is represented by a vector of 512 elements (512 is the size used in the paper "Attention Is All You Need") that encodes the semantic meaning of the word.


N. Third step : Positional encoding

By adding a positional embedding to each token embedding, we preserve the information about word order inside the text instead of losing it.


O. Fourth Step : Self-attention layer

Its role is to analyze all the relationships between the input tokens, which allows the model to attend to different parts of the input sequence and better capture the contextual dependencies between words. The self-attention weights that are learned during training and stored in these layers reflect the importance of each word in the input sequence relative to all other words in the sequence.


➢ In reality, there are MULTI-HEADED self-attention layers:

- This allows the model to learn many attention maps in parallel and independently; the number of heads is generally between 12 and 100.
- The purpose is to let each self-attention head learn a different aspect of language.

o Example:
▪ one head may focus on the relationships between people entities
▪ another may focus on the activity
▪ etc.


P. Last step: Feed-forward network + softmax output:

- It takes as input the attention-weighted representations.
- Its output is a probability for every word in the vocabulary.

- The token with the highest score (probability) is the most likely predicted token.


VIII. Using transformers in text generation:

The example used here is a Seq2Seq application: translation from French to English.

Q. First Step : passing the input to Encoder part

1. "J'aime l'apprentissage automatique" is split into tokens.
2. Each token is mapped to its corresponding ID.
3. We pass that array of token IDs through the Encoder:
   a. Embedding and positional encoding layers
   b. Multi-headed self-attention layers
   c. Feed-forward neural network
4. We pass the Encoder output into the middle of the Decoder to influence the output of the self-attention mechanism in the decoder part.


R. Second step: Generating the first output token using the Decoder

1. We pass the <start-of-sequence> (SOS) token to the Decoder input, which triggers the Decoder to start the generation.
2. The SOS token is passed through:
   a. the embedding and positional encoding layers
   b. the multi-headed self-attention layers, in conjunction with the Encoder output
   c. the feed-forward network, which generates the vector of probabilities
   d. We get token ID = 297 with the highest probability => the first predicted token.


S. Last step: generating the rest of the sequence using the Encoder output and the previously generated tokens

1. The token with ID 297 is concatenated to the SOS token at the input of the Decoder.
2. Repeat the whole process of the previous step to generate the next token.
3. Keep repeating that process until the <end-of-sequence> (EOS) token is generated.

4. We map the output tokens back to words to get the final translation:


IX. Summary of the roles of the Encoder and Decoder:

X. Transformers Variations :


T. Encoder-only models :

• Additional layers can be added on top to do text classification tasks such as sentiment analysis

• Example: BERT
• Less commonly used these days

U. Encoder-Decoder models:

• Perform well in Seq2Seq tasks such as translation

• Can be generalized to most text generation tasks
• Input and output lengths can differ
• Examples: BART and T5


V. Decoder-only models:

• Can be generalized to most tasks

• Examples: the GPT family, BLOOM, Jurassic, LLaMA
• Widely used these days

XI. The importance of prompt engineering for LLM output relevance:

Prompt engineering is a set of strategies to take into consideration when writing the LLM input, in order to increase the accuracy of the output.

One of these strategies is called in-context learning.

Briefly, it consists of giving examples of the desired task in the input, so that the LLM better understands the purpose of your prompt.

In-context learning has different types:


W. Zero-shot inference:

It means including only the task description and the input data in your prompt, without any example.

• Large LLMs generally provide good results with this kind of prompt, regardless of the nature of the data used in their training.
• However, small LLMs (for example GPT-2) often don't give accurate results in this setting.

X. One-shot inference:

In this kind of in-context learning, we provide in the input one complete, concrete example before giving our input data.

• This helps small LLMs give more accurate results.


Y. Few-shot inference:

Sometimes a single example isn't sufficient to make the LLM accurate, so we give it several examples instead.

Z. Summary: zero vs one vs few shot

Take into consideration that the prompt cannot exceed the context window, so in few-shot inference we can generally include only 5 to 6 examples.

• If the model still doesn't give satisfying results with few-shot inference, it means the model needs to be fine-tuned on our data.
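
As a rough sketch of how these prompt styles differ in practice, the helper below assembles zero-, one-, or few-shot prompts; the dialogue and summary strings are invented placeholders, not course data.

```python
# Assemble zero-/one-/few-shot prompts from in-context (input, output) examples.
def build_prompt(examples, task_input):
    """examples: list of (input, output) pairs shown to the model in-context."""
    prompt = ""
    for example_input, example_output in examples:
        prompt += (f"Summarize the following conversation.\n\n{example_input}\n\n"
                   f"Summary: {example_output}\n\n")
    # The actual request comes last; the completion is left for the model.
    prompt += f"Summarize the following conversation.\n\n{task_input}\n\nSummary:"
    return prompt

zero_shot = build_prompt([], "A: Hi, is the store open today?\nB: Yes, until 8 pm.")
one_shot = build_prompt(
    [("A: Can you fix my laptop?\nB: Sure, bring it tomorrow.",
      "B agrees to repair A's laptop tomorrow.")],
    "A: Hi, is the store open today?\nB: Yes, until 8 pm.",
)
print(one_shot)
```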

XII. Generative/inference configuration:

• These are parameters that influence how the model chooses the next token to generate during the inference (prediction) phase.


AA. Max new tokens:

It's the simplest one: it specifies the maximum number of tokens to generate before stopping. It obviously only matters in cases where the model would otherwise generate more than that maximum.

BB. Sample top-k and sample top-p:

➢ Greedy vs sampling techniques in LLM generation

Top-p and top-k apply to the sampling technique.


➢ Sample top-k:

It means taking into consideration only the k tokens with the highest probability during the sampling process.

➢ Sample top-p:

It means taking into consideration only the highest-probability tokens whose cumulative probability is <= p.


CC. Temperature

- It's the only configuration that directly modifies the probability distribution computed by the model.
- The default value is 1 → it does not change the result of the softmax layer.
- This value controls the amount of randomness of the generated output when sampling:
  o a lower (cooler) temperature makes the probability distribution more peaked
  o a higher temperature gives a flatter distribution
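
A minimal sketch of these inference parameters using the Hugging Face `generate` API (the checkpoint name and parameter values are illustrative assumptions):

```python
# Sampling-based generation with max_new_tokens, top_k, top_p and temperature.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"          # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Summarize: the meeting is moved to Friday at 10am.",
                   return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=50,    # stop after at most 50 generated tokens
    do_sample=True,       # sampling instead of greedy decoding
    top_k=50,             # keep only the 50 most probable tokens
    top_p=0.9,            # ...whose cumulative probability is <= 0.9
    temperature=0.7,      # < 1 makes the distribution more peaked
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```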

XIII. Generative AI project cycle :


DD. Scope:
Define the use case to focus on; the more specific we are, the more time, effort, and compute we save.

EE. Select:
Generally we choose an existing model, but sometimes we will have to pre-train our own.

FF. Adapt and align the model:

➢ Prompt engineering
We have seen how this makes the model outputs more accurate.

➢ Fine-tuning
This is discussed in the next week.

➢ Align with human feedback
Takes the client's preferences into consideration (discussed in the third week).

➢ Evaluate
Evaluate with relevant metrics and benchmarks.

GG. Application integration:

➢ Optimize and deploy the model
➢ Augment the LLM:
Sometimes it's necessary to connect our LLM to external APIs so that it can answer questions it couldn't answer from its training data alone.

XIV. Lab 1: Dialogue summarization summary

• We used the "google/flan-t5-base" model to do dialogue summarization without any additional training.
• We used the Hugging Face dataset "knkarthick/dialogsum" to evaluate how good this LLM is at dialogue summarization.
• We tried zero-shot, one-shot, and few-shot inference to observe how the model gives more promising results when we provide it with examples of the task it needs to do.
• Important: giving more than 5-6 examples with the few-shot technique will not make the model more accurate, so if the model is still giving bad results with few-shot inference, it's probably not the right model to use.


• Prompt engineering is important for picking the most suitable LLM for our task: we will favor models that give acceptable results with zero-, one-, and/or few-shot inference over LLMs that give bad results regardless of the prompt format.
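
A minimal sketch of the Lab 1 setup (one-shot dialogue summarization with FLAN-T5 and the dialogsum dataset); the sampled indices are arbitrary, not the ones used in the lab:

```python
# One-shot dialogue summarization with FLAN-T5 on the dialogsum dataset.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

dataset = load_dataset("knkarthick/dialogsum")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

example = dataset["test"][0]   # in-context example
target = dataset["test"][1]    # the dialogue we actually want summarized

prompt = (
    f"Summarize the following conversation.\n\n{example['dialogue']}\n\n"
    f"Summary: {example['summary']}\n\n"
    f"Summarize the following conversation.\n\n{target['dialogue']}\n\nSummary:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
print("Reference:", target["summary"])
```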

XV. Video : Pre-training LLMs


HH. Model Cards

Every model has its own model card, which helps us compare candidate models and choose the most suitable one for our use case.


II. How do the three Transformer variations get trained?

➢ Encoder-only LLMs (known as autoencoding models):

They use a technique called Masked Language Modeling (MLM), which randomly hides some tokens inside the sentence; the model's goal is to correctly identify the hidden tokens (denoising the masked text) using bidirectional context (analyzing the text from both sides).

- Use cases (generally anything related to text or token classification):
  o Sentiment analysis
  o Named entity recognition
  o Word classification
- Examples:
  o BERT
  o RoBERTa


➢ Decoder-only LLMs (known as autoregressive models):

This kind uses a technique called Causal Language Modeling (CLM), where the model has unidirectional context and its goal is to predict the next token (repeating this to generate a sequence of tokens instead of a single one).

- Use cases:
  o Text generation in general
- Examples:
  o GPT
  o BLOOM


➢ Encoder-Decoder LLMs (known as sequence-to-sequence models):

They use a technique called span corruption: a span of tokens is hidden and replaced with a single sentinel token <x>, and the model's aim is to predict the tokens that correspond to that sentinel.

- Use cases:
  o Translation
  o Text summarization
  o Question answering
- Examples:
  o T5
  o BART


➢ Summary :

JJ. Is there a Moore's law for LLM model size?

• The size of released LLMs keeps increasing year by year.

• Can we say (as Moore's law says about CPUs) that increasing the model size alone keeps making the model perform better, regardless of everything else? The short answer is no, and it will be justified in the coming sections.

XVI. Computational challenges of training LLMs:


KK. Parameter storage in memory

• While training a model, all its parameters need to be held in memory.

• One parameter in 32-bit precision takes 4 bytes => a model with 1 billion parameters takes a full 4 GB just for the weights.
• In fact, we don't only store the model parameters: we also need the optimizer states, activations, and gradients => 4 bytes per parameter plus roughly 20 extra bytes per parameter => on the order of 80 GB of GPU memory to train a 1-billion-parameter model.

➢ Solution: Quantization

• Instead of storing the variables in 32-bit full precision, we store them in FP16, which needs 2 bytes of memory instead of 4 (this reduces the representable [min, max] range).
• This makes the values less precise, but in most cases it doesn't impact the training accuracy much.
• There is also an optimized variant developed by Google and used to train T5: BFLOAT16, which keeps the same dynamic range as FP32 but with truncated precision.
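
A back-of-the-envelope sketch of the weight-storage figures above: memory needed just to hold the model weights at different precisions.

```python
# Weight-only memory footprint at different precisions.
def weights_memory_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9

one_billion = 1_000_000_000
for name, nbytes in [("FP32", 4), ("FP16 / BFLOAT16", 2), ("INT8", 1)]:
    print(f"{name}: {weights_memory_gb(one_billion, nbytes):.1f} GB for 1B parameters")
# FP32: 4.0 GB, FP16/BFLOAT16: 2.0 GB, INT8: 1.0 GB
```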


LL. Reducing parameter precision isn't enough for huge models:

Solution: split the model across multiple GPUs for training.

XVII. Efficient multi-GPU strategies:

MM. When to use distributed computing:

1. The model is too big to be handled by a single GPU
2. To train on data in parallel (to speed up training)


NN. Distributed Data Parallel (DDP) to train on data in parallel:

• We replicate the model and its parameters on each GPU.

• Each model copy (GPU) is given a different part of the data for its training step.
• After each step completes, a synchronization process updates the parameters so all copies stay identical.
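
A minimal single-node DDP sketch in PyTorch; the tiny model and random data are placeholders just to show the pattern described above.

```python
# Replicate the model on every GPU, feed each rank its own data shard,
# and let DDP synchronize gradients during backward().
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 2).cuda()       # model replicated on every GPU
    model = DDP(model, device_ids=[local_rank])  # gradients synced across GPUs
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(32, 128).cuda()          # each rank sees its own shard of data
        y = torch.randint(0, 2, (32,)).cuda()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()                           # DDP all-reduces gradients here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
```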

OO. Sharding the model across multiple GPUs:

The popular technique is ZeRO (Zero Redundancy Optimizer); its aim is to remove redundancy in what each GPU stores, and it has multiple stages:

- Stage 1: parameters and gradients are still replicated; the optimizer states are sharded.
- Stage 2: parameters are still replicated; gradients and optimizer states are sharded.
- Stage 3: nothing is replicated; everything is sharded across all the GPUs.


• Each GPU needs to communicate with the other GPUs to fetch the variables it doesn't hold locally.

PP. Fully Sharded Data Parallel (FSDP): a combination of DDP and ZeRO

- We split both the data and the model parameters across the GPUs, as illustrated in the figure.

XVIII. Scaling laws and compute-optimal models:


QQ. Scaling choices for pre-training:

Maximizing model performance means minimizing the loss, which comes from increasing the dataset size (number of tokens) and the model size (number of parameters); the constraint is the compute budget (hardware, training time, and cost).

RR. The Chinchilla paper: computing the optimal number of model parameters and tokens

• The Chinchilla paper states that very large models may be over-parameterized (having many more parameters than necessary) or under-trained (seeing too few tokens relative to the model size).


• The Chinchilla paper found that some models are bigger than they need to be, and smaller models can give similar or better results.

• The Chinchilla paper found that the compute-optimal model is trained on a number of tokens ≈ 20 × the number of model parameters.
• This rule wasn't respected by GPT-3 and BLOOM, which appear to be over-parameterized.
• On the other side, LLaMA-65B, a relatively small model, roughly follows the Chinchilla rule (and it is newer than the other models cited).

• Thanks to the Chinchilla paper, we can say that LLMs don't need to follow a Moore's-law-style growth in size; we can go back to smaller models with better performance thanks to larger training datasets.
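
A quick sketch of the ~20 tokens-per-parameter rule of thumb quoted above (the model sizes are arbitrary examples):

```python
# Compute-optimal training-token estimate from the Chinchilla rule of thumb.
def chinchilla_optimal_tokens(num_params, tokens_per_param=20):
    return num_params * tokens_per_param

for params in (1e9, 13e9, 65e9):
    print(f"{params/1e9:.0f}B params -> ~{chinchilla_optimal_tokens(params)/1e9:.0f}B training tokens")
# 1B -> ~20B tokens, 13B -> ~260B tokens, 65B -> ~1300B tokens
```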


XIX. Pre-training for domain adaptation

SS. LLMs need to be adapted for domain-specific language

• In specific domains (such as legal and medical), we find terminology that doesn't usually appear in a general-purpose corpus; that's why we need to adapt the LLM to the specific domain by feeding it data from that domain.


TT. BloombergGPT: a GPT-style model adapted to the finance domain

• It was trained on 51% finance-related data and only 49% general-purpose data.

The results show that it only roughly follows the Chinchilla law, due to the scarcity of finance data, which limited the number of available tokens; so we cannot always follow the Chinchilla rule.

Week 2


XX. LLM fine-tuning with supervised learning

UU. Fine-tuning using a dataset:

• Unlike LLM pre-training, which is a self-supervised task, fine-tuning is a supervised task where we feed the pre-trained LLM our own dataset containing two columns: the prompt and its appropriate completion. The fine-tuned LLM will then have improved performance on our desired task.
• Since this is full fine-tuning, which involves all the LLM weights, we must keep in mind that the whole model (weights, optimizer states, and activation values) needs to be loaded into memory.

VV. The dataset should be formatted as instructions


• The two columns should be formatted as an instruction (the way a user is expected to interact with our fine-tuned LLM), for example "Summarize the following text: {PROMPT} {COMPLETION}" for text summarization.
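
A minimal sketch of turning a (dialogue, summary) pair into an instruction-formatted training example; the exact template wording is an example, not the one prescribed by the course.

```python
# Wrap raw (input, output) pairs into an instruction template.
def to_instruction_example(dialogue, summary):
    prompt = f"Summarize the following conversation.\n\n{dialogue}\n\nSummary:"
    return {"prompt": prompt, "completion": " " + summary}

row = to_instruction_example(
    "A: The printer on floor 2 is jammed again.\nB: I'll log a ticket with IT.",
    "B will open an IT ticket for the jammed printer on floor 2.",
)
print(row["prompt"] + row["completion"])
```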

WW. Formatting the dataset according to our use case:

XX. Data Splitting :


XXI. Single-task fine-tuning

YY. Definition

- It means that our dataset focuses exclusively on a single task (such as text summarization).
- Often, only 500-1,000 examples are needed to fine-tune for a single task.

ZZ. Its disadvantage: catastrophic forgetting

- Fine-tuning a model on a single task may lead it to forget how to perform other tasks correctly.
- The example shows how an LLM fine-tuned on sentiment analysis degrades on a named entity recognition task.
- Sometimes this phenomenon is acceptable, if we only need our fine-tuned LLM to be accurate on a single task.

AAA. How to avoid catastrophic forgetting:

➢ Multi-task fine-tuning:
- This means we fine-tune our LLM with a dataset containing instructions for multiple tasks.
- That requires a large dataset (50,000-100,000 examples across many tasks).
- We will discuss this in the next section.


➢ Parameter-Efficient Fine-Tuning (PEFT):

- Instead of multi-task fine-tuning, we can do PEFT.
- PEFT is a set of techniques that train only a small number of weights (or small added modules) for our specific task, leaving the rest of the model's weights untouched.
- We will discuss it after multi-task fine-tuning.

BBB. Multi-task fine-tuning:

- As already said, it needs 50,000-100,000 examples to give good results.

CCC. FLAN models: an example of multi-task fine-tuned models:

• FLAN-T5 is a multi-task fine-tuned version of T5.


- It's fine-tuned using multiple datasets across multiple domains.

- For each dataset, it was trained on multiple prompt templates, as shown here for the SAMSum dialogue summarization dataset.
- For the dialogue summarization task, FLAN-T5 was trained on everyday conversations, so if we want it to summarize customer support chats, we give it additional fine-tuning with a dataset dedicated to customer support conversations (DIALOGSUM).

XXII. LLMs models evaluation


DDD. Metrics used

EEE. The n-gram concept

- Unigram: one single word/token

- Bigram: two consecutive words/tokens
- etc.

FFF. ROUGE score calculation

The picture above shows the ROUGE score computation with unigrams (ROUGE-1). We can see the weakness of ROUGE with single unigrams: it gives a high score to a wrong generated output when the difference is only one word (even if it's a critical word like 'not').

➢ Other problems with ROUGE:

- Word order isn't taken into consideration, so the outputs "it is very cold outside" and "outside is cold very it" get the same score.
- Word repetition acts like a hack: "cold cold cold cold cold" gets the same score as "it is very cold outside".

Solution: use ROUGE-L

- The idea is to use the longest common subsequence (LCS) between the reference and the generated output (in the course example its length is 2) and compute the ROUGE score from it.
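
A small sketch using the Hugging Face `evaluate` package (assumed to be installed) to reproduce the kind of comparison described above:

```python
# Compute ROUGE-1/2/L for candidate outputs against a common reference.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["it is cold outside", "outside is cold very it"],
    references=["it is very cold outside", "it is very cold outside"],
)
print(scores)  # dict with rouge1, rouge2, rougeL, rougeLsum F-scores
```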

GGG. BLEU SCORE :

XXIII. Model evaluations using Benchmarks


LLMs are complex, and simple evaluation metrics like the ROUGE and BLEU scores can
only tell you so much about the capabilities of your model. In order to measure and
compare LLMs more holistically, you can make use of pre-existing datasets, and
associated benchmarks that have been established by LLM researchers specifically for
this purpose. Selecting the right evaluation dataset is vital, so that you can accurately
assess an LLM's performance, and understand its true capabilities. You'll find it useful to
select datasets that isolate specific model skills, like reasoning or common sense
knowledge, and those that focus on potential risks, such as disinformation or copyright
infringement. An important issue that you should consider is whether the model has
seen your evaluation data during training. You'll get a more accurate and useful sense of
the model's capabilities by evaluating its performance on data that it hasn't seen before.

➢ Most used benchmarks :

XXIV. Parameter-Efficient Fine-Tuning (PEFT)

HHH. Definition:

A set of techniques used to train only certain weights of the LLM instead of fully fine-tuning the whole model, freezing most of the LLM's components. The goal is to be able to fine-tune on a single GPU, since there are only a few weights to train, which of course also takes less time.

III. Fine-tuning for multiple tasks: full fine-tuning vs PEFT

• When doing PEFT for multi-task fine-tuning, instead of storing a complete new version of the LLM for each task, we only store each task's own PEFT weights; we then load the appropriate PEFT weights into the original LLM, depending on the task, to obtain the model fine-tuned for that specific task.

JJJ. PEFT methods :

➢ Selective:
A set of methods that select a subset of the original model's weights to fine-tune, freezing all the rest.


➢ Reparameterization:
LoRA is a technique that leaves the model as it is and learns the necessary updates to apply to certain components of the Transformer, using an independent pair of trainable low-rank matrices for each reparameterized layer. It is discussed in detail in the next section.

➢ Additive:
A set of methods that add layers or parameters to the original architecture and train only these added ones.

- Adapters:

Add new trainable layers to the architecture of the model, typically inside the encoder or decoder components, after the attention or feed-forward layers.

- Soft prompts:

A set of techniques that keep the model architecture fixed and frozen and focus on manipulating the input to achieve better performance. This can be done by adding trainable parameters to the prompt embeddings, or by keeping the input fixed and retraining the embedding weights.

One of them is called "prompt tuning", which we discuss in the next section.

XXV. The LoRA technique:


• It's applied to the self-attention layers, and to the feed-forward layers if necessary.
  o Typically, applying it to the self-attention layers alone is enough.
• We freeze the weights of the layer and add two low-rank matrices A and B whose product has the same dimensions as the layer's weight matrix (d × k): (d × r) × (r × k) = (d × k).
• That product is added to the original weight matrix to construct an updated (reparameterized) version of this layer for the task it was trained on.
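
A minimal LoRA setup sketch using the `peft` library; the rank value and target module names ("q", "v" for T5 attention projections) follow common FLAN-T5 examples and are assumptions, not prescriptions.

```python
# Wrap a base model with LoRA adapters so that only the low-rank matrices train.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    r=32,                       # rank of the A and B matrices
    lora_alpha=32,              # scaling factor applied to the low-rank update
    target_modules=["q", "v"],  # inject LoRA into query/value projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM,
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction is trainable
```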

KKK. Training for multiple tasks:

Each task has its own A and B matrices, and we add the corresponding low-rank product to the base weights for the desired task.


LLL. LoRA vs full fine-tuning performance:

- By fine-tuning FLAN-T5 for dialogue summarization, we see there is only about a 3% difference between full fine-tuning and LoRA, even though LoRA training consumes much less time and memory.

MMM. Choosing a suitable rank (r):

- The authors show that the best r is usually between 4 and 32.

- Going beyond 32 brings little additional benefit.

XXVI. PEFT with soft prompts technique :


NNN. Definition:

- It consists of prepending 20-100 trainable (soft) tokens to the original input tokens.

- Each soft token is a vector with the same length as the embedding vectors of the input.

OOO. Full fine-tuning vs prompt tuning:

- Full fine-tuning updates all the model weights.

- Prompt tuning freezes the model weights and adds 20-100 trainable tokens at the input layer.
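
A minimal prompt-tuning sketch with the `peft` library; 20 virtual tokens is an example value within the 20-100 range quoted above, and the checkpoint name is an assumption.

```python
# Freeze the base model and train only a small set of soft prompt embeddings.
from transformers import AutoModelForSeq2SeqLM
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model, TaskType

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompt_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,  # random soft-token initialization
    num_virtual_tokens=20,                       # trainable soft tokens prepended to input
)

peft_model = get_peft_model(base_model, prompt_config)
peft_model.print_trainable_parameters()  # only the soft prompt embeddings train
```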


PPP. Prompt tuning for multiple tasks:

- Each task has its own trainable token values; by simply swapping in the tokens of the corresponding task, we achieve multi-task fine-tuning.

QQQ. Prompt tuning performance:

- Prompt tuning is mainly effective for large models (it becomes competitive with full fine-tuning as model size grows).

Week 3
XXVII. Aligning models with human values :
RRR. Models behaving badly :
An LLM behaves badly when:


- it uses toxic language

- it gives aggressive responses
- it provides dangerous information

SSS. The HHH rules for a well-behaved model:

A good model is helpful (gives useful answers), honest (doesn't give false information), and harmless (doesn't give dangerous information).

TTT. Aligning with human feedback: a new technique to adapt and align the model to our needs:

After prompt engineering and fine-tuning, there is a third method to align the model to our needs: involving human feedback in the model's learning using Reinforcement Learning.


UUU. The value of aligning with human feedback:

The graph shows how fine-tuning with human feedback can be an efficient way to improve model quality.

I. Reinforcement Learning from Human Feedback (RLHF):

- This method is one of the techniques used to involve human feedback in model fine-tuning.


VVV. Reinforcement learning: a high-level overview

• Reinforcement learning is a type of machine learning in which an agent learns to


make decisions related to a specific goal by taking actions in an environment,
with the objective of maximizing some notion of a cumulative reward.
• In this framework, the agent continually learns from its experiences by taking
actions, observing the resulting changes in the environment, and receiving
rewards or penalties, based on the outcomes of its actions. By iterating through
this process, the agent gradually refines its strategy or policy to make better
decisions and increase its chances of success.

➢ Example : Tic-tac-Toe

I. The agent is a model or policy acting as a Tic-Tac-Toe player. Its objective is to win the game. The environment is the three-by-three game board, and the state at any moment is the current configuration of the board.


II. The action space comprises all the possible positions a player can choose based
on the current board state. The agent makes decisions by following a strategy
known as the RL policy.
III. As the agent takes actions, it collects rewards based on how effective those actions are in progressing towards a win.
IV. The goal of reinforcement learning is for the agent to learn the optimal policy for
a given environment that maximizes their rewards.
V. This learning process is iterative and involves trial and error. Initially, the agent
takes a random action which leads to a new state. From this state, the agent
proceeds to explore subsequent states through further actions. The series of
actions and corresponding states form a playout, often called a rollout.
VI. As the agent accumulates experience, it gradually uncovers actions that yield the
highest long-term rewards, ultimately leading to success in the game.

WWW. Mapping reinforcement learning onto the LLM context

- The agent is our LLM.

- The objective is to generate text aligned with human preferences.
- The action space is the vocabulary used to generate the answer (choosing the next token).
- The environment is the LLM's context window.
- The current context is the state.
- The reward is the human feedback (for example, a value of 0 or 1 for the toxicity of the text generated by the LLM's action).

XXX. Replacing human feedback with a reward model:
- obtaining human feedback can be time consuming and expensive. As a practical
and scalable alternative, you can use an additional model, known as the reward
model, to classify the outputs of the LLM and evaluate the degree of alignment
with human preferences.


- The reward model is trained, with traditional supervised learning, on the (relatively few) human feedback examples and interactions we collected.
- Once trained, the reward model is used to evaluate the LLM's outputs by assigning a reward value, and the LLM's weights are updated according to that reward value.
- Rollout instead of playout: in this context, the sequence of actions and states is called a rollout, instead of the term playout used in classic reinforcement learning.

II. Obtaining feedback from humans

YYY. First step: prepare the dataset for human feedback

• Once we have fine-tuned our LLM, it's time to gather human feedback by preparing a dataset for it.
• The first step is generating multiple completions (in this example, 3) for the same prompt.


ZZZ. Second step: collect human feedback

➢ Define the model alignment criterion:

• It's one of the HHH criteria we defined; here it is helpfulness.
• Have human labelers rank the helpfulness of the three generated completions of the same prompt (rank = 1 is the best completion).
• Gather feedback from multiple labelers for each prompt to ensure the feedback is correct and consistent.

➢ Rules to take into consideration while collecting human feedback:


AAAA. Third step: prepare the labeled data for the reward model using the human feedback:

➢ The data must be formatted as pairwise training data for the reward model:

• We construct all the possible pairs of completions of the same prompt (3 completions => 3 possible pairs).
• We associate with each pair an array of two elements, giving the value 1 to the better completion (the one with the lowest rank number).

➢ Reorder the pairs:

• Always place the completion with the lowest rank (whose score is 1) on the left side: [1, 0].
  o y_j is always the preferred completion and y_k is the less preferred one.

➢ We can now explain why ranking feedback is better than thumbs-up / thumbs-down feedback:
- With ranking feedback we get a bigger dataset, composed of every possible pair of completions for the same prompt.

BBBB. Training the reward model :


- We train the model to predict the preferred completion of each pair by assigning a higher reward score to the preferred completion (y_j, reward r_j) and a lower one to the less preferred completion (y_k, reward r_k); the standard pairwise objective is to maximize log(sigmoid(r_j - r_k)).

➢ Example with toxicity:

- The reward model assigns the highest score to the less toxic answer.
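
A minimal sketch of that pairwise reward-model loss (the preferred completion y_j should receive a higher score r_j than y_k); the scores below are toy values.

```python
# Pairwise reward-model loss: -log(sigmoid(r_preferred - r_rejected)).
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_j, r_k):
    """r_j, r_k: reward scores for the preferred / less preferred completions."""
    return -F.logsigmoid(r_j - r_k).mean()

r_preferred = torch.tensor([1.2, 0.3, 2.0])   # toy scores for 3 preference pairs
r_rejected  = torch.tensor([0.1, 0.8, -0.5])
print(pairwise_reward_loss(r_preferred, r_rejected))  # lower when r_j >> r_k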

III. Fourth step: fine-tuning the LLM with an RL algorithm using the reward model:


- The training process is iterative (multiple iterations).

- We pass to the RL algorithm the prompt, the completion generated by our LLM, and the reward score given by the reward model.

- The RL algorithm then updates the LLM weights according to the reward score given by the reward model.


- After N iterations, the model starts generating completions with a fairly high score from the reward model.
  o The stopping criterion is defined either by the number of iterations
  o or by a reward score threshold.

➢ The RL technique used is called Proximal Policy Optimization (PPO).

IV. Presenting Proximal Policy Optimization (PPO):

CCCC. Presenting the method globally:

This approach is composed of two phases, which are repeated over many iterations to create the human-aligned LLM at the end.

DDDD. Phase 1: create completions


- For each prompt, we create completions that are used to evaluate the model's performance at the current iteration.

EEEE. Phase 2: update the model weights using the objective function

FFFF. Presenting the PPO objective function:

It is composed of the sum of three losses: the policy loss, the value loss, and the entropy loss.

➢ Value loss:

- It's a loss function that tries to minimize the difference between the actual future reward and the estimated reward value calculated by the value (baseline) function.


- This quantity is estimated during the first phase.

➢ Policy loss

- It's the most important ingredient of the PPO objective.

- It aims to make the expected reward higher.
- Its goal is to make the updated LLM generate better tokens while producing the completion, compared to the initial model. It uses the ratio of the policies π (the probability of taking action a_t given state s_t under the new policy vs the old one), so that the updated LLM assigns higher probability to advantageous tokens.
- The advantage term is a factor defining how much better the chosen action is compared to all the other possible actions (action = generating the next completion token).

- The min function and the clipped term keep the policy within the trust region, which means the changes made to the initial LLM at each step aren't too big.


➢ Entropy loss:

- The entropy term here maintains the creativity of the generated completions.
  o Low entropy = less creative
- Its role is similar to the temperature hyperparameter used at inference; the difference is that the entropy term acts during training, while the temperature acts at inference (prediction) time.

V. Reward hacking: a potential problem of RLHF


GGGG. Presenting the problem

• The first picture shows a completion generated by our initial LLM (somewhat toxic), with the lowest reward score.
• The second picture shows a completion generated by the RL-updated LLM, which gives a less toxic answer with a better reward score.
• The last one is an exaggerated answer that doesn't reflect the truth, yet it gets the highest reward score. This shows that our agent (the LLM) has learned to hack the reward by simply giving flowery, pleasing answers.

• Another form of reward hacking is giving completions completely irrelevant to the prompt that nevertheless obtain a higher reward score, which would push the LLM's weights in the wrong direction.


HHHH. Avoiding reward hacking by using a KL divergence loss:

• The role of this loss is to ensure that the updated LLM doesn't drift too far from the original model. It is computed as the divergence between the token probability distributions of the original (frozen reference) LLM and the updated version at each iteration (this penalty is applied during training).

- This KL divergence penalty is added to the PPO objective.
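
A small sketch of the KL-penalty idea: compare the next-token distributions of the frozen reference model and the updated policy, and subtract a scaled penalty from the reward. The distributions and the penalty weight below are toy values, not course numbers.

```python
# KL(ref || policy) computed over one next-token distribution, used as a penalty.
import torch

def kl_divergence(p_ref, p_policy, eps=1e-9):
    return torch.sum(p_ref * (torch.log(p_ref + eps) - torch.log(p_policy + eps)))

p_reference = torch.tensor([0.70, 0.20, 0.10])   # frozen original LLM
p_updated   = torch.tensor([0.30, 0.20, 0.50])   # RL-updated LLM drifting away

beta = 0.2                                        # penalty weight (assumed)
reward_from_model = 1.5
penalized_reward = reward_from_model - beta * kl_divergence(p_reference, p_updated)
print(penalized_reward)
```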

➢ Computing this loss requires loading both LLMs in memory:

- Here we can see that using PEFT instead of full fine-tuning is advantageous in terms of memory consumption, since the frozen base model can serve as the reference while only the small PEFT weights are updated.


➢ Observing the result by comparing the toxicity score :

VI. Scaling human feedback using self-supervision:

IIII. Problem statement and definition:
• The labeled dataset used to train the reward model typically requires large teams of labelers, sometimes many thousands of people, each evaluating many prompts.
• Constitutional AI is one approach to scaling supervision. It is a method for training models using a set of rules and principles that govern the model's behavior. Together with a set of sample prompts, these form the constitution.
• The model then criticizes its own outputs and revises them according to the defined rules.


JJJJ. Examples of constitutional rules mentioned in the research paper:

- They all aim to respect the HHH rules.

KKKK. Constitutional AI implementation for RLHF:

• It's composed of two phases.

➢ Step 1: Supervised learning stage

• The aim here is to make our helpful model less harmful, because its helpfulness tends to make it give dangerous information and become harmful as a result, as shown in the example.

1. We get our LLM to generate a harmful answer.

2. We ask the model to critique its own answer: does it follow our constitutional rules?

3. We use the pair (harmful prompt, self-criticized and revised answer) as a training dataset to fine-tune our LLM to become less harmful.


➢ Explaining the Step 1 dataset generation:

1. We give our prompt normally (e.g., "can you help me hack into my neighbor's wifi?").
2. Once the completion is generated, we prompt the LLM to check how harmful, unethical, racist, etc. its answer is.
3. If the LLM acknowledges the harmfulness of the generated completion, we ask it to rewrite the answer to the previous question while respecting the constitutional rules; by doing so, we generate the pairs used to fine-tune our LLM:


➢ Step 2: Reinforcement learning stage using Reinforcement Learning from AI Feedback (RLAIF instead of RLHF)

- This step is similar to RLHF, but here the model itself selects the most preferred response among the set of completions generated by the fine-tuned LLM for red-teaming prompts (red-teaming prompts: a set of prompts that push the model to generate harmful answers).
- We then use this feedback to train a reward model and run the RL algorithm exactly as we did with RLHF, producing a constitutional LLM at the end.

VII. Application integration: the last phase of the generative AI project life cycle


It's composed mainly of two aspects:

➢ Optimize and deploy the model for inference:

Here we answer questions about the budget needed, the acceptable latency for the LLM to return completions to the final applications that use it, and how much model accuracy we are willing to give up to improve performance.

➢ Augment the model and build LLM-powered applications

Here we answer questions related to the need to connect external datasets or APIs to our LLM, and how we can do it.

VIII. LLM optimization techniques:

There are three popular techniques that aim to reduce the model size while minimizing the loss of accuracy (i.e., obtaining a smaller LLM with accuracy close to the original).


LLLL. Distillation:
➢ Definition:
Use our large original LLM (the teacher) to train a smaller LLM (the student); the goal is to make the student LLM produce a next-token probability distribution as close as possible to the teacher's.

➢ Explaining the process:

1. We use training data (prompts) to generate completions with both the teacher LLM and the student LLM (the teacher LLM's weights are frozen: trainable = false).
2. We use the softmax outputs of the two LLMs (the next-token probability distributions) to calculate the distillation loss, which tells us how close the student LLM's distribution is to the teacher's.
   a. The reason we use a temperature > 1 is to prevent most token probabilities from collapsing to 0 (so the distillation loss carries useful signal); that's why the labels and predictions are called 'soft' here.


3. In parallel, we compute the student loss using the 'hard' labels (the ground-truth data, with the standard temperature = 1 softmax), and we use both losses to train the student LLM with backpropagation.
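
A toy sketch of the two losses described above: a temperature-softened KL term against the teacher (distillation loss) plus a standard cross-entropy term against the hard label (student loss). The vocabulary size, temperature, and mixing weight are illustrative assumptions.

```python
# Distillation loss (soft targets, T > 1) combined with student loss (hard labels, T = 1).
import torch
import torch.nn.functional as F

def distillation_losses(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    # Soft targets: temperature > 1 spreads probability mass across tokens.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    distill_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Hard loss: ordinary cross-entropy at temperature 1 against the true label.
    student_loss = F.cross_entropy(student_logits, hard_label)
    return alpha * distill_loss + (1 - alpha) * student_loss

student_logits = torch.randn(4, 32000)   # toy batch over a 32k-token vocabulary
teacher_logits = torch.randn(4, 32000)
labels = torch.randint(0, 32000, (4,))
print(distillation_losses(student_logits, teacher_logits, labels))
```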

➢ Distillation isn't a very effective approach for decoder-only generative models:

Studies show that this method is more effective with encoder-only models such as BERT.

MMMM. Post-Training Quantization (PTQ):

• PTQ's goal is to transform the model weights (after training, once the model is ready to be served) into a lower precision, for example 8-bit instead of 16-bit, in order to reduce the model size at the cost of a small loss of accuracy.
• PTQ requires a calibration step to determine the new min and max ranges of the model weights.
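
A minimal sketch of serving a model with 8-bit weights via bitsandbytes integration in `transformers` (assumes the `bitsandbytes` and `accelerate` packages are installed; the checkpoint name is an example):

```python
# Load a checkpoint with weights quantized to int8 at load time.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"          # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # quantize weights to int8 when loading
    device_map="auto",
)
print(model.get_memory_footprint() / 1e6, "MB")  # much smaller than FP32/FP16
```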


NNNN. Pruning:

• It consists of removing the model weights whose values are close to 0 (so they have a tiny impact on the model output).
  o There are several pruning methods: full re-training, PEFT/LoRA, and post-training approaches.
  o Removing these weights saves their storage space.

➢ Pruning generally isn't very effective in practice

- because, in general, only a small percentage of model weights are close to 0.

IX. Cheat sheet: time, effort, and steps of each stage in the presented phases of the life cycle


X. The need for external resources connected to the LLM

OOOO. Difficulties encountered by the LLM:

➢ Giving out-of-date information:

When the model is asked about information that changes over time (and postdates the cut-off of its training dataset).

➢ Wrong data

Especially when it comes to mathematical computation.

➢ Hallucination
When the LLM doesn't know the answer to a prompt, it generates a made-up completion (which is called a hallucination).

PPPP. Solution: LLM-powered applications

We use a framework (like "LangChain") that plays the role of an orchestration library: the library makes it possible to connect our LLM and the user application with external data sources (web, databases, documents) and external apps (including APIs).

XI. Connecting to external datasets with RAG (Retrieval-Augmented Generation):

QQQQ. Definition:
- RAG is a framework that enables the connection of our LLM to external data sources.
- It is useful for giving the LLM access to data sources it didn't see during the training phase.
- RAG has many implementations.

RRRR. RAG as implemented by Facebook researchers

1. The query encoder formats the needed query from the user prompt (the gray
square)
2. The query is passed by the Retriever to the external information sources (docs,
datasets, web), which return the query results (blue square)
3. The prompt and the query results (the two squares) are concatenated and passed
to the LLM, which uses both of them to generate the appropriate completion (purple
square); see the sketch below
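A minimal sketch of this retrieve-then-generate flow; `retriever.search` and `llm.generate` are hypothetical stand-ins, not a specific library API.

```python
def rag_answer(user_prompt: str, retriever, llm, top_k: int = 3) -> str:
    """Illustrative retrieve-then-generate flow (retriever and llm are hypothetical stand-ins)."""
    # 1. Encode the user prompt into a query and retrieve the matching passages
    passages = retriever.search(user_prompt, top_k=top_k)
    # 2. Concatenate the retrieved text with the original prompt
    context = "\n".join(passages)
    augmented_prompt = f"Context:\n{context}\n\nQuestion: {user_prompt}\nAnswer:"
    # 3. Let the LLM generate the completion from the augmented prompt
    return llm.generate(augmented_prompt)
```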


➢ Example: searching law documents

SSSS. RAG integrates with many data sources

- Interestingly, it can interact with a data source stored as a vector store, which
contains the embedding vectors directly (this eliminates the need to preprocess the
data source with a tokenization/embedding step, so the LLM can exploit it directly)


TTTT. Things to take into consideration while doing RAG:

➢ Data must fit inside the context window:

- When concatenating the user prompt and the data gathered from the external
dataset and putting them inside the context, we must take into consideration the
limited size of the context window
o The solution for long data is to divide it into multiple chunks using a
framework like 'LangChain', as sketched below
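A minimal chunking sketch with LangChain's text splitter; the chunk sizes and the file name are illustrative, and class names may vary between LangChain versions.

```python
# Split a long external document into chunks small enough to fit the context window
from langchain.text_splitter import RecursiveCharacterTextSplitter

long_document_text = open("contracts.txt").read()  # hypothetical external document

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # max characters per chunk, sized so each chunk fits the context window
    chunk_overlap=100,  # small overlap so no chunk starts mid-thought
)
chunks = splitter.split_text(long_document_text)
```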

➢ Data needs to be formatted as embedding vectors :


UUUU. Vector database search utility :

- In this format, the text is identified by a key

o This lets the data include a citation, which can then be included in the
generated completion (a toy search sketch follows below)
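A toy sketch of key-indexed vector search by cosine similarity, so the retrieved text can carry its key as a citation; the `embed` function and the documents are stand-ins for a real embedding model and a real vector store.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: a deterministic pseudo-embedding per text
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(8)

# Toy vector store: every entry is identified by a key, so the completion can cite its source
documents = {
    "contract-17": "Clause 4.2 allows order cancellation within 14 days.",
    "policy-42": "Refunds are processed within 5 business days.",
}
index = {key: embed(text) for key, text in documents.items()}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query: str, top_k: int = 1):
    """Return the keys (usable as citations) and texts closest to the query embedding."""
    q = embed(query)
    ranked = sorted(index, key=lambda k: cosine(q, index[k]), reverse=True)
    return [(k, documents[k]) for k in ranked[:top_k]]

print(search("Can I cancel my order?"))
```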

XII. Required steps to implement an LLM powered by external apps and APIs

The use case presented below is a conversation with a chatbot to cancel an e-commerce
order

VVVV. First step: Plan actions

- Formally write out and plan the expected steps and the conversation flow
between our bot and the user, covering the full use case


WWWW. Second step: Format outputs

- Once the required data is gathered from the user, we format the query needed to
obtain the desired output

XXXX. Third step: Validate actions

- Execute the required API calls to apply the order cancellation successfully (so
here, we perform backend operations instead of just fetching data from external
static sources), as sketched below
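A hypothetical sketch of the formatted output and the backend call for this use case; the JSON shape, the endpoint URL, and the field names are illustrative, not a real e-commerce API.

```python
import json
import requests

# Second step: the LLM formats the information gathered from the user as a structured output
llm_output = '{"intent": "cancel_order", "order_id": "A-1042", "email": "user@example.com"}'
action = json.loads(llm_output)

# Third step: validate the action and execute the backend API call that actually cancels the order
if action["intent"] == "cancel_order" and action["order_id"]:
    response = requests.post(
        "https://shop.example.com/api/orders/cancel",   # placeholder URL, not a real service
        json={"order_id": action["order_id"], "email": action["email"]},
    )
    print(response.status_code)
```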

XIII. LLMs struggle with complex reasoning problems (mathematical)

- When it comes to mathematical operations, LLMs are known to struggle, even with
relatively simple problems.


YYYY. First solution: formatting the prompt as chain of thought

• It consists of breaking the problem into a sequence of intermediate steps, which
helps the LLM understand the correct process for reaching the problem's solution
• This is done with one-shot and/or few-shot inference in the prompt, by giving a
concrete worked example before inserting the user prompt, as sketched below:
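A minimal one-shot chain-of-thought prompt sketch; `llm.generate` is a hypothetical stand-in for the model call.

```python
# One-shot chain-of-thought prompt: the worked example spells out the reasoning steps
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
A:"""

completion = llm.generate(cot_prompt)  # llm.generate is a hypothetical stand-in
```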


➢ Chain-of-thought reasoning isn't always a sufficient solution:

- When it comes to exact computations, like applying discounts on an e-commerce
website or calculating taxes, this is not enough to handle them; Program-Aided
Language models (PAL) are the solution for that

ZZZZ. Second solution: Program-Aided Language models (PAL):

➢ The problem is still mathematical operations:

- LLMs give wrong answers even with straightforward prompts

The solution for this kind of problem is to let the LLM interact with an application
that is strong at math computations, such as the Python interpreter,

and an interesting way to implement that is with the program-aided (PAL) framework


➢ Explaining the process:

1. Like the first solution, this approach requires one-shot or few-shot inference

2. The question part stays the same as in the original prompt
3. The answer part is broken into multiple sub-steps, as in chain of thought
a. The new thing here is writing the natural-language steps as comments
(with #) and adding the corresponding Python code right after them
b. The answer is a Python statement that computes the correct solution to the
presented problem
i. Here we need PAL to execute the entire generated code in the
answer part to give us the numerical solution (that is why we
wrote the human-language sub-steps as Python comments, to
prevent them from being executed by the Python interpreter), as
presented below

4. Then, the answer given by the Python interpreter is concatenated to the
PAL-formatted prompt


5. This augmented prompt is then used to generate a completion with the correct answer (see the sketch below)
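A minimal sketch of the full PAL loop described in these steps; `llm.generate` is a hypothetical stand-in, and the example problems are illustrative.

```python
# PAL-style sketch: reasoning steps are comments, the arithmetic is executable Python.
pal_prompt = '''
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many does he have?
# Roger started with 5 tennis balls
tennis_balls = 5
# 2 cans of 3 tennis balls each is
bought_balls = 2 * 3
# the answer is
answer = tennis_balls + bought_balls

Q: The bakery sold 14 loaves in the morning and twice as many in the afternoon. How many loaves in total?
'''

generated_code = llm.generate(pal_prompt)        # the model writes the Python 'answer' part
namespace = {}
exec(generated_code, namespace)                  # the interpreter does the actual arithmetic
final_prompt = pal_prompt + generated_code + f"\nThe answer is {namespace['answer']}."
completion = llm.generate(final_prompt)          # the model phrases the correct final answer
```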


➢ The orchestration framework is used to handle the communication and the
interaction between our LLM and the Python interpreter

- That is, it uses the same mechanism as for integrating external data sources
(databases, web, ...) and external applications

XIV. ReAct: Synergizing reasoning and acting in LLMs

AAAAA. Definition:


ReAct gives the LLM a structured example of reasoning about how to solve a similar
problem, so it learns how to reason through a problem and take the right action,
among the available ones, to find the solution. The structured example contains 4
building blocks:

➢ Question

➢ Thought:


➢ Action :

➢ Observation :


BBBBB. ReAct in action :

For the model to correctly interpret the prompt and obtain the result, the
Thought-Action-Observation cycle is repeated several times until the solution is
reached.

So after obtaining the creation year of Arthur's Magazine above, we now need to
search for the creation year of First for Women and then compare the two.


➢ Defining the action space :


The LLM can only choose among the actions we have defined before introducing the
structured example:

We end the instructions with the expression 'Here are some examples' to introduce our example


CCCCC. The full ReAct prompt :

- The instructions, which define the action space

- One full ReAct example (like the one presented just above) or several examples
- And finally, the question to answer (assembled as sketched below).
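A minimal sketch of how these three parts could be assembled into one prompt string; the instruction text is abbreviated and the new question is purely illustrative.

```python
# Assembling the full ReAct prompt: instructions (action space) + worked example + new question
instructions = (
    "Solve a question answering task with interleaving Thought, Action, Observation steps.\n"
    "Action can be of three types:\n"
    "(1) Search[entity]  (2) Lookup[keyword]  (3) Finish[answer]\n"
    "Here are some examples.\n"
)
react_example = (
    "Question: Which magazine was started first, Arthur's Magazine or First for Women?\n"
    "Thought 1: I need to search Arthur's Magazine and First for Women, and find which was started first.\n"
    "Action 1: Search[Arthur's Magazine]\n"
    "Observation 1: Arthur's Magazine was an American literary periodical first published in 1844.\n"
    "... (further Thought/Action/Observation cycles) ...\n"
    "Action 3: Finish[Arthur's Magazine]\n"
)
user_question = "Question: Which city hosted the 1992 Summer Olympics?\n"  # the new question to answer

react_prompt = instructions + react_example + user_question
```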

XV. LangChain presentation

This framework is used to make working with LLMs easier; it contains three
components:

➢ Prompt templates:
Templates for many different use cases that we can use to format both input examples
and model completions.


➢ Memory
To store the interactions with the LLM

➢ Pre-built tools
enable you to carry out a wide variety of tasks, including calls to external datasets and
various APIs.

• Connecting a selection of these individual components together results in a
chain, as sketched below.
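A minimal sketch wiring these components into a chain; API names may differ between LangChain versions, and `llm` stands for any LangChain-compatible model wrapper.

```python
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

# 1) Prompt template: formats the input before it reaches the model
prompt = PromptTemplate(
    input_variables=["dialogue"],
    template="Summarize the following conversation.\n\n{dialogue}\n\nSummary:",
)

# 2) Memory: stores the interactions with the LLM across turns (not wired into this chain)
memory = ConversationBufferMemory()

# 3) Chain: connects the selected components; 'llm' is a stand-in model wrapper
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(dialogue="Customer: I want to cancel my order. Agent: Sure, ..."))
```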

DDDDD. The need for agents for more flexibility in the chains

Sometimes your application workflow could take multiple paths depending on the
information the user provides. In this case, you can't use a pre-determined chain;
instead, you need the flexibility to decide which actions to take as the user moves
through the workflow. LangChain defines another construct, known as an agent, that
you can use to interpret the input from the user and determine which tool or tools to
use to complete the task.
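A minimal agent sketch, assuming the classic LangChain agent API (which may vary between versions); `llm` is again a stand-in for any LangChain-compatible model wrapper.

```python
from langchain.agents import initialize_agent, load_tools, AgentType

# Tools the agent may choose from at run time, e.g. a calculator for math questions
tools = load_tools(["llm-math"], llm=llm)

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is 15% VAT on an order of 249 euros?")
```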

➢ PAL and ReAct are included in the LangChain framework!

EEEEE. Need for large models for complicated tasks
Larger models are generally your best choice for techniques that use advanced
prompting, like PAL or ReAct. Smaller models may struggle to understand the tasks in


highly structured prompts and may require you to perform additional fine tuning to
improve their ability to reason and plan.

XVI. LLM Final architecture

1. Infrastructure to fine-tune the LLM and serve it

2. Our optimized LLM (fine-tuned, augmented with few-shot inference, etc.)

3. Information sources that the LLM may eventually need

4. Human feedback that we will need to ensure the LLM is Helpful, Honest and Harmless (HHH)

5. LLM tools and frameworks like LangChain, and model hubs, which make the
implementation of advanced techniques like ReAct and PAL much easier

6. Finally, the app/web service that will interact with the LLM and exploit it

