Presentation 11

Large Language Models (LLMs) are computational models for natural language processing that utilize unsupervised and semi-supervised learning to understand and generate text. The evolution of LLMs has progressed from statistical language modeling to advanced models like GPT-4, with various types such as autoregressive models, masked language models, and encoder-decoder models serving different use cases. Future applications of LLMs include personal AI assistants, medical advisors, and business analytics.


LLM

Introduction to Large Language Models (LLMs)


Definition
• A type of computational model designed for natural language processing tasks such as language generation
• Applies unsupervised and semi-supervised learning
• Learns statistical relationships from large amounts of text
History
• Began with statistical language modelling, pioneered at IBM in the 1980s–90s
• Advanced to Neural Machine Translation at Google in 2016
• Then to GPT-1 from OpenAI in 2018
• And to GPT-4 in 2023
Types of LLM and Use cases
• Autoregressive Models
Definition: Predict the next token based on previously generated tokens.
Examples: GPT series (GPT-3, GPT-4), LLaMA (Meta).
Use cases: Text generation and sequential tasks.
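A minimal sketch of the autoregressive decoding loop: the toy `next_token_logits` scoring function and vocabulary below are illustrative stand-ins for a trained model, not any real system.

```python
import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]

def next_token_logits(context_ids):
    # Stand-in for a trained model: strongly favors the token after the last one.
    logits = np.full(len(VOCAB), -1.0)
    logits[(context_ids[-1] + 1) % len(VOCAB)] = 5.0
    return logits

def generate(prompt_ids, max_new_tokens=4):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)        # score every vocabulary token
        next_id = int(np.argmax(logits))       # greedy: pick the most likely one
        ids.append(next_id)                    # feed it back in as new context
        if VOCAB[next_id] == "<eos>":          # stop at end-of-sequence
            break
    return " ".join(VOCAB[i] for i in ids)

print(generate([1, 2]))  # "the cat sat on mat <eos>"
```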

• Masked Language Models (MLMs)
Definition: Predict masked (hidden) tokens in a sentence by analyzing the entire context.
Examples: BERT (Bidirectional Encoder Representations from Transformers), RoBERTa.
Use cases: Excellent at understanding context and relationships in text; good for classification and question answering.
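A toy illustration of the masked-prediction objective: the hand-written scoring table below stands in for a trained bidirectional model such as BERT.

```python
# Toy masked-token filling: score each candidate by how well it fits
# the words on BOTH sides of the [MASK] (bidirectional context).
context_scores = {           # hand-written stand-in for a trained MLM
    ("the", "sat"): {"cat": 0.8, "mat": 0.1, "on": 0.05},
}

def fill_mask(tokens):
    i = tokens.index("[MASK]")
    left, right = tokens[i - 1], tokens[i + 1]   # look in both directions
    candidates = context_scores[(left, right)]
    best = max(candidates, key=candidates.get)   # highest-scoring filler
    return tokens[:i] + [best] + tokens[i + 1:]

print(fill_mask(["the", "[MASK]", "sat"]))  # ['the', 'cat', 'sat']
```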

• Encoder-Decoder Models (Seq2Seq)
Definition: Combine encoding of the input and decoding of the output to generate context-aware sequences.
Examples: T5 (Text-to-Text Transfer Transformer), BART (Bidirectional and Auto-Regressive Transformers).
Use cases: Designed for tasks like text summarization, translation, and question answering.
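A minimal sketch of using an encoder-decoder model for summarization, assuming the Hugging Face `transformers` package is installed and the public `t5-small` checkpoint can be downloaded:

```python
from transformers import pipeline

# T5 is trained text-to-text, so every task is "input text -> output text".
summarizer = pipeline("summarization", model="t5-small")

article = (
    "Large Language Models learn statistical relationships from large "
    "amounts of text and can generate, classify, and translate language."
)
print(summarizer(article, max_length=20, min_length=5)[0]["summary_text"])
```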
Architecture – Workflow
• Large datasets of text are processed
• The processed data passes through embedding layer(s)
• Embedded data is trained on stacked Transformer layers
• A self-attention mechanism is applied within each layer to capture context
• Masking is applied for different use cases
• Optimization is performed on the assembled model
• Outputs are decoded and mapped back to words through the vocabulary
Architecture – Process data
• Tokenization
• Encoding
• Data cleaning
• Data synthesis
• Fine-tuning
Process data - Tokenization
• Tokenize words into numbers (token IDs) so the model can process them
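A minimal sketch of word-level tokenization with a toy vocabulary; real LLMs use learned sub-word schemes such as BPE, so this mapping is purely illustrative.

```python
# Toy word-level tokenizer: map each word to an integer id.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def tokenize(text):
    # Unknown words fall back to the <unk> id.
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def detokenize(ids):
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)

ids = tokenize("The cat sat on the mat")
print(ids)              # [1, 2, 3, 4, 1, 5]
print(detokenize(ids))  # "the cat sat on the mat"
```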
Process data - Encoding
• Applies Positional Encoding to embed the position of
each token in the input sequence.
• Ensures the model understands the order of words.
Positional Encoding - Formula

PE(pos, 2i)   = sin(pos / 10000^(2i/d))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d))

• pos is the token's position in the sequence; i indexes the embedding dimension; d is the embedding size.

Self-Attention - Formula

Attention(Q, K, V) = softmax(Q·Kᵀ / √d) · V

• Q represents the Query vector.
• K represents the Key vector.
• V represents the Value vector.
• d is the dimension of the key/query vectors.

• Query: Used to match relevant information.
• Key: Enables finding matching queries.
• Value: The actual information passed based on the attention weight.
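A minimal numpy sketch of both formulas above; the sequence length and dimensions are toy values chosen for illustration.

```python
import numpy as np

def positional_encoding(seq_len, d):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...)
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / np.power(10000.0, i / d)
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)   # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions get cosine
    return pe

def attention(Q, K, V):
    # softmax(Q·Kᵀ / √d) · V, with a numerically stable softmax
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

x = np.random.rand(4, 8) + positional_encoding(4, 8)  # 4 tokens, dim 8
print(attention(x, x, x).shape)                       # (4, 8)
```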
Architecture – Embedding layers
• Word embeddings are dense vectors that words or tokens are converted into before being processed.
• Vectors are multi-dimensional so they can capture semantic similarities across the vocabulary.
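A minimal sketch of an embedding lookup table; the random matrix below stands in for weights that would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 6, 4
embedding_table = rng.normal(size=(vocab_size, dim))  # learned in practice

def embed(token_ids):
    return embedding_table[token_ids]   # one dense vector per token

def cosine(a, b):
    # Cosine similarity: how semantically "close" two vectors are.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

vecs = embed(np.array([1, 2, 3]))
print(vecs.shape)                 # (3, 4)
print(cosine(vecs[0], vecs[1]))   # similarity between two token vectors
```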
Architecture - Training
• Multiple stacked Transformer layers, each combining:
• Self-attention layers: focus on the relevant parts of the input
• Feed-forward neural networks: apply a non-linear transformation to the output of the self-attention layers
• Shortcut (residual) connections between layers to minimize information loss
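A minimal sketch of one Transformer block showing the residual shortcuts; layer normalization and multi-head splitting are omitted to keep the sketch short.

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def transformer_block(x, W1, W2):
    x = x + attention(x, x, x)          # residual shortcut around attention
    hidden = np.maximum(0.0, x @ W1)    # feed-forward with ReLU non-linearity
    return x + hidden @ W2              # residual shortcut around the FFN

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # 4 tokens, model dim 8
out = transformer_block(x, rng.normal(size=(8, 16)), rng.normal(size=(16, 8)))
print(out.shape)                        # (4, 8)
```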
Architecture – Self-Attention Mechanism
• Multi-Head Attention: allows the model to learn different aspects of word relationships in parallel.
• Each head performs a separate attention operation, then the results are combined.
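A minimal sketch of multi-head attention via splitting the model dimension across heads; the per-head projection matrices are omitted for brevity, so this shows only the split/attend/concatenate pattern.

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(x, num_heads):
    # Split the embedding dimension into equal slices, one per head.
    heads = np.split(x, num_heads, axis=-1)
    # Each head attends independently, learning a different relationship.
    outputs = [attention(h, h, h) for h in heads]
    return np.concatenate(outputs, axis=-1)       # combine the heads' results

x = np.random.rand(4, 8)                           # 4 tokens, dim 8
print(multi_head_attention(x, num_heads=2).shape)  # (4, 8)
```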
Architecture - Masking
• Bidirectional Attention (BERT): The model looks at both
left and right of a word to predict masked tokens.
• Autoregressive Attention (GPT): The model only looks at
prior words when generating text, useful for predicting
the next token.
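A minimal sketch of the two masking styles expressed as attention masks (1 = may attend, 0 = hidden):

```python
import numpy as np

n = 4  # sequence length

# Bidirectional (BERT-style): every token may attend to every other token.
bidirectional_mask = np.ones((n, n))

# Autoregressive (GPT-style): token i may only attend to positions <= i.
causal_mask = np.tril(np.ones((n, n)))

print(causal_mask)
# [[1. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 1.]]
# Before the softmax, masked (0) positions get a score of -inf so they
# receive zero attention weight.
```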
Architecture – Output processing
• A decoder step turns token IDs back into words
• Softmax turns the scores for the possible next words into probabilities
• Vocabulary and tokenization: a fixed vocabulary size (often 30,000–50,000 tokens), split into sub-word tokens to handle rare words and different languages effectively.
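A minimal sketch of turning output logits into a word via softmax; the logits and vocabulary are toy values.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([1.0, 3.5, 0.2, -1.0, 2.0])   # toy model output scores

probs = np.exp(logits - logits.max())
probs /= probs.sum()                             # softmax -> probabilities

print({w: round(float(p), 3) for w, p in zip(vocab, probs)})
print("next word:", vocab[int(np.argmax(probs))])  # "cat"
```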
Architecture - Optimization
• Optimizers: Adam or AdamW update the attention weights (and all other parameters) via backpropagation
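A minimal sketch of one AdamW-style update on a single weight vector; hyperparameters are the common defaults, and a real training run would loop over batches and all model parameters.

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=0.01):
    m = b1 * m + (1 - b1) * grad          # momentum (1st-moment) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # variance (2nd-moment) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # AdamW decouples weight decay from the gradient-based update.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

w = np.array([0.5, -0.3])
m = v = np.zeros_like(w)
for t in range(1, 4):                      # a few toy gradient steps
    grad = 2 * w                           # gradient of the toy loss w**2
    w, m, v = adamw_step(w, grad, m, v, t)
print(w)                                   # weights move toward zero
```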
Scale
• GPT-4 is estimated to have roughly 10x the parameters of GPT-3 (OpenAI has not disclosed the exact count)
• GPT-3 and LLaMA were trained on much more recent datasets that included newer sources: websites, social media, other online content, ...
Research papers
• GPT-4: https://arxiv.org/abs/2303.08774

• PaLM: https://arxiv.org/abs/2204.02311

• LLaMA: https://arxiv.org/abs/2302.13971
Github Links
• OpenLLM: https://github.com/bentoml/OpenLLM

• OpenLLaMA: https://github.com/openlm-research/open_llama

• GPT: https://github.com/openai/openai-cookbook
Comparing LLM models
Future Uses of LLMs
• Personal AI Assistants

• Medical Advisors

• Business Analytics
