LLM Cheatsheet

Introduction to LLMs

DEFINITIONS

Generative AI
AI systems that can produce realistic content (text, image, etc.)

Large Language Models (LLMs)
Large neural networks trained at internet scale to estimate the probability
of sequences of words
Ex: GPT, FLAN-T5, LLaMA, PaLM, BLOOM (transformers with billions of parameters)
Abilities (and the computing resources needed) tend to rise with the number of parameters
USE CASES
– Standard NLP tasks (classification, summarization, etc.)
– Content generation
– Reasoning (Q&A, planning, coding, etc.)

Token
Word or sub-word; the basic unit processed by transformers (see the tokenization sketch below)

In-context learning
Specifying the task to perform directly in the prompt
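The Token definition above can be made concrete with a short sketch. This is a minimal illustration assuming the Hugging Face transformers package is installed; bert-base-uncased is just an example checkpoint, not one prescribed by the cheatsheet.

```python
# Minimal tokenization sketch (assumes the Hugging Face `transformers`
# package; "bert-base-uncased" is an illustrative example checkpoint).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "LLMs process tokens, not words."
tokens = tokenizer.tokenize(text)   # sub-word pieces, e.g. ['ll', '##ms', ...]
ids = tokenizer.encode(text)        # integer ids, including special tokens
print(tokens)
print(ids)
```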
TRANSFORMERS

– Can scale efficiently to use multi-core GPUs
– Can process input data in parallel
– Pay attention to all other words when processing a word
Transformers’ strength lies in understanding the context and relevance of all words in a sentence.

Building blocks (see the self-attention sketch below):

Embedding layer
Maps each token to a trainable vector

Positional encoding vector
Added to the token embedding vector to keep track of the token’s position

Self-Attention
Computes the importance of each word in the input sequence to all other words in the sequence

Encoder
Processes the input sequence to generate a vector representation (or embedding) for each token

Decoder
Processes input tokens to produce new tokens
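A minimal NumPy sketch of the self-attention computation described above; the dimensions, random weights, and input are illustrative stand-ins, not any real model's parameters.

```python
# Minimal NumPy sketch of scaled dot-product self-attention.
# Dimensions and random weights are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings (plus positional encodings)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # how much each token attends to every other token
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16
X = rng.normal(size=(seq_len, d_model))   # stand-in for embedded + position-encoded tokens
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
print(attn.shape)   # (5, 5): importance of each token to every other token
```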
TYPES OF LLMS

Encoder only = Autoencoding model
Ex: BERT, RoBERTa (these are not generative models)
PRE-TRAINING OBJECTIVE: To predict tokens masked in a sentence (= Masked Language Modeling)
OUTPUT: Encoded representation of the text
USE CASE(S): Sentence classification (e.g., NER)
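A minimal sketch of the masked-token prediction objective at inference time, assuming the Hugging Face transformers package; bert-base-uncased is one example autoencoding checkpoint.

```python
# Masked Language Modeling sketch (assumes Hugging Face `transformers`;
# "bert-base-uncased" is an example autoencoding checkpoint).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
# The model predicts the token hidden behind [MASK] from its full context.
for pred in fill_mask("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```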
Decoder only = Autoregressive model
Ex: GPT, BLOOM
PRE-TRAINING OBJECTIVE: To predict the next token based on the previous sequence of tokens (= Causal Language Modeling)
OUTPUT: Next token
USE CASES: Text generation
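A minimal next-token generation sketch, again assuming the Hugging Face transformers package; gpt2 is an illustrative autoregressive checkpoint.

```python
# Causal Language Modeling sketch (assumes Hugging Face `transformers`;
# "gpt2" is an example autoregressive checkpoint).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# The model repeatedly predicts the next token given the previous ones.
out = generator("Large language models are", max_new_tokens=20, do_sample=False)
print(out[0]["generated_text"])
```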
Encoder-Decoder = Seq-to-seq model
Ex: T5, BART
PRE-TRAINING OBJECTIVE: Varies from model to model (e.g., span corruption, as in T5)
OUTPUT: Sentinel token + predicted tokens
USE CASES: Translation, Q&A, summarization
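A minimal sequence-to-sequence sketch, assuming the Hugging Face transformers package; t5-small and the English-to-French task are illustrative choices.

```python
# Sequence-to-sequence sketch (assumes Hugging Face `transformers`;
# "t5-small" is an example encoder-decoder checkpoint).
from transformers import pipeline

translator = pipeline("translation_en_to_fr", model="t5-small")
# The encoder reads the full input; the decoder generates the output sequence.
print(translator("The weather is nice today.")[0]["translation_text"])
```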
CONFIGURATION SETTINGS

Parameters to set at inference time

Max new tokens
Maximum number of tokens generated during completion

Decoding strategy (see the decoding sketch at the end of this section)
1 Greedy decoding: the word/token with the highest probability is selected from the final
probability distribution (prone to repetition)
2 Random sampling: the model chooses an output word at random, using the probability
distribution to weight the selection (can be too creative)

TECHNIQUES TO CONTROL RANDOM SAMPLING
– Top K: the next token is drawn from the k tokens with the highest probabilities
– Top P: the next token is drawn from the highest-probability tokens whose combined
probability exceeds p

Temperature
Influences the shape of the probability distribution through a scaling factor in the softmax layer
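A minimal NumPy sketch showing greedy decoding, random sampling, temperature, top-k, and top-p applied to a single next-token distribution; the toy vocabulary, logits, and the k, p, and temperature values are made up for illustration.

```python
# Decoding-strategy sketch over one next-token distribution.
# The toy vocabulary, logits, k, p, and temperature values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
vocab = np.array(["cake", "pie", "bread", "tea", "soup"])
logits = np.array([2.0, 1.5, 0.5, 0.2, -1.0])   # raw model scores per token

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Temperature: <1 sharpens the distribution, >1 flattens it (scaling inside softmax).
probs = softmax(logits / 0.7)

# 1 Greedy decoding: always take the single most probable token.
greedy = vocab[np.argmax(probs)]

# 2 Random sampling: draw a token weighted by the full distribution.
sampled = rng.choice(vocab, p=probs)

# Top K: keep only the k most probable tokens, renormalize, then sample.
k = 3
top_k_idx = np.argsort(probs)[-k:]
top_k_probs = probs[top_k_idx] / probs[top_k_idx].sum()
top_k_token = rng.choice(vocab[top_k_idx], p=top_k_probs)

# Top P (nucleus): keep the smallest high-probability set whose total mass exceeds p.
p = 0.9
order = np.argsort(probs)[::-1]
cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
nucleus_idx = order[:cutoff]
nucleus_probs = probs[nucleus_idx] / probs[nucleus_idx].sum()
top_p_token = rng.choice(vocab[nucleus_idx], p=nucleus_probs)

print(greedy, sampled, top_k_token, top_p_token)
```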
© 2024 Dataiku
