
LARGE LANGUAGE MODELS

A REVIEW FOR LE AI HACKATHON

Shay Zweig, 19/4/2023


WHAT ARE WE GOING TO COVER
• What are large language models
• LLM strengths and limitations
• Prompt engineering
• Context and short-term memory
• Retrieval augmented generation
• Augmenting LLMs: tools, plugins and agents
• Helpful libraries (LangChain)
WHAT ARE LANGUAGE MODELS?
Given a sequence of words – predict the next word.

The room in the hotel was ___
• Amazing: 0.4
• Disappointing: 0.3
• Clean: 0.19
• Haunted: 0.01

Has been around for a while.
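The word-prediction task above can be illustrated with a toy sketch: the model assigns a probability to each candidate next word, and greedy decoding picks the most likely one. (A real LLM scores an entire vocabulary, not four hand-picked words.)

```python
# Candidate next words and the probabilities from the slide's example.
candidates = {"Amazing": 0.4, "Disappointing": 0.3, "Clean": 0.19, "Haunted": 0.01}

def predict_next_word(probs):
    # Greedy decoding: return the highest-probability candidate.
    return max(probs, key=probs.get)

print("The room in the hotel was", predict_next_word(candidates))  # -> Amazing
```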
WHAT ARE LARGE LANGUAGE MODELS (LLM)?
Large language models are huge artificial neural networks trained on the word prediction task.
• Not magic – function optimization
• Only trained to predict the next word*

How big? (GPT-3)
• 175B parameters
• Trained on all the internet, ~500B tokens

Why does it work?
• Quality data
• Turns out word completion is a great task
• Also – we don't know

* also RLHF
WHAT IS IT GOOD FOR?
Completion – write a rap song about Luxury Escapes
Q&A – When was Luxury Escapes founded?
Summarization – summarize the following document...
Classification – given a hotel description, classify it into one or more of the following classes: [family friendly, city break...]
Knowledge extraction – given a hotel description, extract the name, location, number of rooms...
Sentiment analysis – what is the sentiment of the following text: "The hotel was..."
Paraphrasing – rewrite the following text in 10 different styles
Coding – write a Python function that takes a document and analyzes...

And much more!


LIMITATIONS – BE CAREFUL...
• Hallucinations and alignment
• Knowledge cutoff
• Consistency and predictability – how do I know I get the right result?
• Evaluation – how to evaluate the results?
• Number of tokens
• Cost of inference
• ...
OPENAI API
Completion: text-davinci-003

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")
response = openai.Completion.create(model="text-davinci-003", prompt="Say this is a test")

Chat completion: gpt-4 / gpt-3.5-turbo

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)

Embeddings: text-embedding-ada-002

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    return openai.Embedding.create(input=[text], model=model)["data"][0]["embedding"]
PROMPT ENGINEERING
The prompt: our main way to control the model's behavior.

From: task-specific model training and fine-tuning
To: emergent behaviour – zero-shot and few-shot learning
PROMPT ENGINEERING
The prompt: the "programming language" of the model.

Instructions: (Answer the user query given the specified hotel description. If there is no information in the description to answer the query, answer "I don't know".)

Context: (hotel description: Fly high in one of the world's ultimate...)

Examples (few shots):

Query: Did the hotel win any prizes?
Answer: Yes, it won the Tripadvisor Travellers' Choice for 2020

Query: Does it have a gluten free meal?
Answer: I don't know

User input: Query: Can I charge my electric car in the hotel?

Output indicator: Answer:
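The prompt parts above (instructions, context, few-shot examples, user input, output indicator) can be assembled into one string with plain Python. This is a sketch using the slide's own example texts, not a library API:

```python
# Assemble a prompt from its named parts: instructions, context,
# few-shot examples, the user query, and the output indicator.
def build_prompt(instructions, context, examples, query):
    shots = "\n\n".join(f"Query: {q}\nAnswer: {a}" for q, a in examples)
    return (
        f"{instructions}\n\n"
        f"Hotel description: {context}\n\n"
        f"{shots}\n\n"
        f"Query: {query}\n"
        f"Answer:"
    )

prompt = build_prompt(
    instructions=('Answer the user query given the specified hotel description. '
                  'If there is no information in the description to answer the query, '
                  'answer "I don\'t know".'),
    context="Fly high in one of the world's ultimate...",
    examples=[
        ("Did the hotel win any prizes?",
         "Yes, it won the Tripadvisor Travellers' Choice for 2020"),
        ("Does it have a gluten free meal?", "I don't know"),
    ],
    query="Can I charge my electric car in the hotel?",
)
```

The trailing `Answer:` output indicator nudges the model to continue with the answer itself rather than restating the question.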


CHAT PROMPTS – ROLES
Only relevant in the new chat models (GPT-3.5/GPT-4)

system: "You are LeGPT. You are an expert in travel; you can answer questions in reference to provided context. You answer questions in a fun and engaging way."
→ System prompts are used for general instructions to the model – they are more useful in GPT-4.

user: I want to take my family on a vacation in December, where should I go?
→ User prompts are used for the user interaction within the conversation.

assistant: December is a great time to take a family vacation! If you're looking for a fun and festive experience, I suggest visiting one of the many Christmas markets in Europe. Germany, Austria, and Switzerland are known for their beautiful markets...
→ Assistant prompts are used for the model's responses within the conversation.
PROMPT ENGINEERING - TIPS AND TRICKS
• Tell the model its role: "As an expert in..."
• Be as explicit and elaborate as possible:
• ...If you don't have the answer, say: I don't know
• ...No more than 60 words, but can be less than 60 words.

• Chain of thought reasoning (CoT) - Let's think step by step


• A good reference:
https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/
• Automatic prompt generator (careful – expensive....):
https://github.com/keirp/automatic_prompt_engineer
THE TEMPERATURE PARAMETER
The temperature parameter sets the randomness level of the model.

Temperature = 0 ==> an (almost) deterministic output

Temperature = 1 ==> increased randomness – different outputs every time, higher "creativity"
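Under the hood, temperature divides the model's logits before the softmax: low temperature sharpens the distribution toward the top choice, high temperature flattens it. A minimal sketch with made-up logits:

```python
import math

# Temperature-scaled softmax: divide logits by the temperature, then
# normalize. T -> 0 sharpens the distribution; T > 1 flattens it.
def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]          # hypothetical scores for 3 tokens
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform
```

With temperature 0.1 the top token gets almost all of the probability mass; with temperature 2.0 the mass is spread more evenly, which is why high-temperature sampling gives different outputs every time.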
CONTEXT – SHORT-TERM MEMORY
• LLMs are stateless; all the context needs to be passed in the prompt...

"total_tokens": 263
"total_tokens": 488

Problem: token* explosion
• Every model has a token limit (4K for ChatGPT / 8K for GPT-4)
• Billing is usually by the token, as well as runtime

Possible solutions:
• Context window (include only the last X iterations)
• Summarization (summarize the chat to this point)

* Tokens are the atoms of the language model – each token can be one or more words, or even part of a word.
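The "context window" solution above can be sketched in a few lines: keep the system message plus only the last N chat turns, so the prompt stays under the token limit. The message format mirrors the OpenAI chat API; the trimming logic itself is an illustrative assumption, not a library feature.

```python
# Keep the system message plus only the last `keep_last` turns.
def windowed_messages(messages, keep_last=4):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# Build a long fake chat history: 1 system message + 10 user/assistant turns.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = windowed_messages(history, keep_last=4)  # system + last 4 messages
```

The summarization alternative would instead replace the dropped turns with one extra message containing an LLM-generated summary of the conversation so far.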
CONTEXT – FEW SHOTS
Providing the model with examples of the desired behavior will greatly improve performance.

Problem:
• Few shots increase the number of tokens significantly...

Possible solutions:
• Example selection
• Fine-tuning (only available for GPT-3)
CONTEXT
Remember – Always count your tokens...
https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
RETRIEVAL AUGMENTED GENERATION
• The best way to handle: knowledge cutoff, hallucinations, referencing and using internal
knowledge
• Use the strengths of the generative model but ground it in external knowledge.
EMBEDDINGS
An embedding model maps each document to a vector of numbers. Each of the deal titles below gets its own vector; one example is shown:

• "Ultimate All-Inclusive Pullman Maldives Villas with Unlimited Drinks & Roundtrip Domestic Malé Flights"
• "Top-Rated Five-Star Maldives Paradise with Two Infinity Pools & Eight Restaurants" → [2, -79, 1, 4, -30, 26, 8, 94, -1, ...]
• "Vibrant Five-Star Pullman Stay in the Heart of Melbourne CBD's Shopping & Dining District with Daily Breakfast"
RETRIEVAL AUGMENTED GENERATION
• Embeddings + vector DB + retrieval + contextual generation

Pipeline (diagram): documents → embedding → embedding vectors → store in vector DB*; query → embedding → kNN similarity search → similar docs → context → LLM generation → response

* pinecone/Chroma/Faiss
Tutorial link
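The retrieval half of the pipeline can be sketched without any vector DB at all. Here a stand-in bag-of-words "embedding" and a brute-force kNN search illustrate the idea; a real system would use an embedding model (e.g. text-embedding-ada-002) and a vector store, and the tiny `vocab` is purely a hypothetical for the demo:

```python
import math

# Stand-in "embedding": count occurrences of a tiny fixed vocabulary.
def embed(text):
    vocab = ["maldives", "pullman", "melbourne", "pools", "breakfast"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Brute-force kNN: score every document against the query embedding.
def knn(query, docs, k=1):
    q = embed(query)
    scored = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:k]

docs = [
    "Ultimate All-Inclusive Pullman Maldives Villas",
    "Vibrant Pullman Stay in Melbourne with Daily Breakfast",
]
best = knn("five star maldives paradise with infinity pools", docs, k=1)
```

The retrieved documents would then be pasted into the prompt as context, which is the "contextual generation" step of the slide's pipeline.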
RETRIEVAL AUGMENTED GENERATION
• Use cases:
• Search
• Recommendation
• In-context QA
• …

The pipeline is the same as on the previous slide: documents are embedded and stored in a vector DB*; a query is embedded, similar documents are retrieved with kNN similarity search, and the results are passed as context to the LLM for generation of the response.

* pinecone/Chroma/Faiss
Tutorial link
RETRIEVAL AUGMENTED GENERATION
• Pros
• Reduces hallucinations dramatically!
• LLM augmentation with new and external knowledge (like organizational knowledge)
• Can reduce cost (embeddings are cheap and computed once)
• Leverages LLM strengths for generation
• Allows referencing

• Cons
• Complexity:
• preprocessing the data
• vector DB ops
• Cost: vector DB costs
AUGMENTING LLMS – TOOLS
• External APIs meant to augment LLMs, such as:
• Search
• Calculator
• DB query
• Bash / Python interpreter
• Other AI models (HuggingGPT)
• Humans!
• …

• Each tool should have a good description of its function and expected input.


AUGMENTING LLMS - PLUGINS
• Exposing external API to OpenAI’s ChatGPT
• You can think of it as an app store for the new chat UI
LLM AGENTS
• LLMs that can plan, use tools and self-improve:
• Plan a strategy to reach a goal
• Perform tasks (use of external tools or internal subtasks)
• Take observations from tasks
• Self-reflect and improve

Be careful! Use of agents can get expensive…

AutoGPT, Demo, LangChain Agents
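The plan → act → observe → reflect loop can be sketched in plain Python with a hypothetical calculator tool. In a real agent the LLM itself chooses the tool and its input at each step; here the "plan" step is hard-coded purely for illustration:

```python
# Toy tool -- never eval untrusted input in real code.
def calculator(expression):
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def run_agent(goal, max_steps=3):
    observations = []
    for _ in range(max_steps):
        # Plan: a real agent would ask the LLM which tool to call and with what.
        tool, tool_input = "calculator", goal
        observation = TOOLS[tool](tool_input)  # Act: call the chosen tool
        observations.append(observation)       # Observe: record the result
        if observation:                        # Reflect: stop once we have an answer
            return observation
    return None

answer = run_agent("17 * 3")
```

Each tool call in a real agent means at least one extra LLM round-trip, which is exactly why the slide warns that agents can get expensive.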


LANGCHAIN – A LIBRARY FOR LLM APP DEV
A lot of wrappers and functionality – makes life very easy!
• Multiple LLMs
• Prompt templates and selectors
• Chains (composing different LLM components together)
• Memory (multiple types)
• Agents
• Indexes (Vector db wrappers)
• Loaders (easily load text data) Tutorial link

Should be the go-to lib for the AI hackathon
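LangChain's central idea – chaining a prompt template into an LLM call – can be sketched in plain Python rather than with the library itself. `fake_llm` is a stand-in for a real model call, and `make_chain` is an illustrative assumption, not LangChain's actual API:

```python
# Stand-in for a real LLM call (e.g. an OpenAI request).
def fake_llm(prompt):
    return f"[model output for: {prompt}]"

# A "chain" is just composition: fill the template, then call the model.
def make_chain(template, llm):
    return lambda **kwargs: llm(template.format(**kwargs))

summarize = make_chain("Summarize the following hotel description: {doc}", fake_llm)
result = summarize(doc="Fly high in one of the world's ultimate...")
```

LangChain adds the same pattern plus memory, agents, and vector-store indexes on top, which is what makes it a good default for the hackathon.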


NOT ONLY OPENAI
• AI21Labs (Task specific APIs)
• CoHere (multilingual embeddings!)
• HuggingFace – many open-source models
• Anthropic (closed beta)
REFERENCES

• Great post about production LLMs: link
• Prompt engineering tricks tl;dr: link
• Retrieval augmented generation tutorial: link
• LangChain crash course: link
• Data preprocessing for link
